
A majorized PAM method with subspace correction for low-rank composite factorization model

Ting Tao (taoting@fosu.edu.cn), School of Mathematics, Foshan University, Foshan;   Yitian Qian (yitian.qian@polyu.edu.hk), Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong;   and Shaohua Pan (shhpan@scut.edu.cn), School of Mathematics, South China University of Technology, Guangzhou
Abstract

This paper concerns a class of low-rank composite factorization models arising from matrix completion. For this nonconvex and nonsmooth optimization problem, we propose a proximal alternating minimization algorithm (PAMA) with subspace correction, in which a subspace correction step is imposed on every proximal subproblem so as to guarantee that the corrected proximal subproblem has a closed-form solution. For this subspace correction PAMA, we prove subsequence convergence of the iterate sequence, and establish convergence of the whole iterate sequence and of the column subspace sequences of the factor pairs under the KL property of the objective function and a restrictive condition that holds automatically for the column \ell_{2,0}-norm function. Numerical comparison with the proximal alternating linearized minimization method on one-bit matrix completion problems indicates that PAMA attains lower relative errors in less time.

Keywords: low-rank factor model; alternating minimization method; subspace correction; global convergence

1 Introduction

Let \mathbb{R}^{n\times m}\ (n\leq m) be the space of all n\times m real matrices, equipped with the trace inner product \langle\cdot,\cdot\rangle and its induced Frobenius norm \|\cdot\|_{F}. Fix any r\in\{1,\ldots,n\} and write \mathbb{X}_{r}:=\mathbb{R}^{n\times r}\times\mathbb{R}^{m\times r}. We are interested in the low-rank composite factorization model

\min_{(U,V)\in\mathbb{X}_{r}}\Phi_{\lambda,\mu}(U,V):=f(UV^{\top})+\lambda\sum_{i=1}^{r}\big[\theta(\|U_{i}\|)+\theta(\|V_{i}\|)\big]+\frac{\mu}{2}\big[\|U\|_{F}^{2}+\|V\|_{F}^{2}\big], (1)

where f:\mathbb{R}^{n\times m}\to\mathbb{R} is a lower bounded L_{f}-smooth function (i.e., f is continuously differentiable and its gradient \nabla f is Lipschitz continuous with modulus L_{f}), \|U_{i}\| and \|V_{i}\| denote the Euclidean norms of the ith columns of U and V, \lambda>0 is a regularization parameter, \mu>0 is a small constant, and \theta:\mathbb{R}\to\overline{\mathbb{R}}:=(-\infty,\infty] is a proper lower semicontinuous (lsc) function that promotes sparsity and satisfies the following two conditions:

(C.1)

\theta(0)=0, \theta(t)>0 for t>0, and \theta(t)=\infty for t<0;

(C.2)

\theta is differentiable on \mathbb{R}_{++}, and its proximal mapping has a closed form.

Model (1) has wide applications in matrix completion and sensing (see, e.g., [3, 6, 14, 8]). Among others, the term \frac{\mu}{2}(\|U\|_{F}^{2}+\|V\|_{F}^{2}) plays a twofold role: one is to ensure that (1) has a nonempty set of optimal solutions and hence a nonempty set of stationary points, and the other is to guarantee that (1) has a balanced set of stationary points (see Proposition 1). The regularization term \lambda\big[\sum_{i=1}^{r}\theta(\|U_{i}\|)+\theta(\|V_{i}\|)\big] aims at promoting low-rank solutions via the column sparsity of the factors U and V. Table 1 below provides some examples of \theta satisfying conditions (C.1)-(C.2), where \theta_{4} and \theta_{5} satisfy (C.2) by [9, 26]. When \theta=\theta_{1}, model (1) is the column \ell_{2,0}-norm regularization problem studied in [24], and when \theta=\theta_{2}, it becomes the factorized form of the nuclear-norm regularization problem [15, 18]. In the sequel, we write

F(U,V):=f(UV^{\top})\ \ {\rm for}\ (U,V)\in\mathbb{X}_{r}\ \ {\rm and}\ \ \vartheta(W):=\sum_{i=1}^{r}\theta(\|W_{i}\|)\ \ {\rm for}\ W\in\mathbb{R}^{l\times r}. (2)
Table 1: Some common functions satisfying conditions (C.1)-(C.2)
\theta_{1}(t)=\left\{\begin{array}{cl}{\rm sign}(t)&{\rm if}\ t\in\mathbb{R}_{+},\\ \infty&{\rm otherwise}\end{array}\right.\qquad \theta_{2}(t)=\left\{\begin{array}{cl}t^{2}&{\rm if}\ t\in\mathbb{R}_{+},\\ \infty&{\rm otherwise}\end{array}\right.
\theta_{3}(t)=\left\{\begin{array}{cl}t&{\rm if}\ t\in\mathbb{R}_{+},\\ \infty&{\rm otherwise}\end{array}\right.\qquad \theta_{4}(t)=\left\{\begin{array}{cl}t^{1/2}&{\rm if}\ t\in\mathbb{R}_{+},\\ \infty&{\rm otherwise}\end{array}\right.
\theta_{5}(t)=\left\{\begin{array}{cl}t^{2/3}&{\rm if}\ t\in\mathbb{R}_{+},\\ \infty&{\rm otherwise}\end{array}\right.\qquad \theta_{6}(t)=\left\{\begin{array}{cl}1&{\rm if}\ t>\frac{2a}{\rho(a+1)},\\ \rho t-\frac{(\rho(a+1)t-2)^{2}}{4(a^{2}-1)}&{\rm if}\ \frac{2}{\rho(a+1)}<t\leq\frac{2a}{\rho(a+1)},\\ \rho t&{\rm if}\ 0\leq t\leq\frac{2}{\rho(a+1)},\\ \infty&{\rm if}\ t<0\end{array}\right.\ \ (a>1)
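For \theta_{1} and \theta_{3} in Table 1, the closed-form proximal mappings required by condition (C.2) can be worked out directly: \mathcal{P}_{\gamma}\theta_{1} is hard thresholding at \sqrt{2\gamma}, and \mathcal{P}_{\gamma}\theta_{3} is one-sided soft thresholding. A minimal sketch (the function names are ours, and ties at the hard threshold are broken to 0):

```python
import numpy as np

def prox_theta1(s, gamma):
    """P_gamma(theta1)(s): minimize (1/(2*gamma))*(x - s)^2 + theta1(x).

    For s > 0, compare x = s (cost 1) with x = 0 (cost s^2/(2*gamma)):
    keep s iff s > sqrt(2*gamma); for s <= 0 the minimizer is 0.
    """
    if s <= 0.0:
        return 0.0
    return s if s > np.sqrt(2.0 * gamma) else 0.0

def prox_theta3(s, gamma):
    """P_gamma(theta3)(s): minimize (1/(2*gamma))*(x - s)^2 + x over x >= 0,
    i.e. one-sided soft thresholding max(s - gamma, 0)."""
    return max(s - gamma, 0.0)
```

Analogous one-dimensional rules for \theta_{4}, \theta_{5} and \theta_{6} are the closed forms referenced via [9, 26].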

1.1 Related work

Problem (1) is a special case of the optimization models considered in [1, 2, 10, 16, 17, 25], for which the nonsmooth regularization term has a separable structure and the nonsmooth function associated with each block has a closed-form proximal mapping. Hence, the proximal alternating linearized minimization (PALM) methods and their inertial versions developed in [1, 2, 17, 25] are suitable for dealing with (1). Their basic idea is to minimize alternately a proximal version of the linearization of \Phi_{\lambda,\mu} at the current iterate, namely the sum of the linearization of F at this iterate and \lambda\sum_{i=1}^{r}\big[\theta(\|U_{i}\|)+\theta(\|V_{i}\|)\big]+\frac{\mu}{2}\big[\|U\|_{F}^{2}+\|V\|_{F}^{2}\big]. The iterations of these PALM methods depend on the global Lipschitz moduli L_{1}(V^{\prime}) and L_{1}(U^{\prime}) of the partial gradients \nabla_{U}F(\cdot,V^{\prime}) and \nabla_{V}F(U^{\prime},\cdot) for any fixed V^{\prime}\in\mathbb{R}^{m\times r} and U^{\prime}\in\mathbb{R}^{n\times r}. These constants are available whenever the Lipschitz modulus of \nabla f is known, but they are usually much larger than that of \nabla f, which makes solving the subproblems with a first-order method challenging. Although the descent lemma can be employed to search for a tighter constant during the computation, doing so is time-consuming because it requires solving additional subproblems. Since the Lipschitz constant of \nabla f is available or easier to estimate, it is natural to ask whether an efficient alternating minimization (AM) method can be designed by using the Lipschitz continuity of \nabla f only. The block coordinate variable metric algorithms in [10, 16] are also applicable to (1), but their efficiency depends on an appropriate choice of variable metric linear operators, and their convergence analysis requires exact solutions of the variable metric subproblems. It is currently unclear which kind of variable metric linear operators is suitable for (1), and whether the variable metric subproblems involving the nonsmooth \vartheta have a closed-form solution.

Recently, the authors proposed an AM method for problem (1) with \theta=\theta_{1} that leverages the Lipschitz continuity of \nabla f only (see [24, Algorithm 2]) and tested its efficiency on matrix completion problems with f(X)=\frac{1}{2}\|X\|_{F}^{2} for X\in\mathbb{R}^{n\times m}, but did not obtain any convergence results for the generated iterate sequence [24]. The present work extends [24] and aims to provide a convergence certificate for the iterate sequence of this AM method.

1.2 Main contribution

We construct a majorization of \Phi_{\lambda,\mu} at each iterate by using the Lipschitz continuity of \nabla f, and propose a proximal AM method that minimizes this majorization alternately, which extends the proximal AM algorithm of [24] to the general model (1). The main contributions of this work involve the following three aspects.

(i) The proposed PAMA only involves the Lipschitz constant of \nabla f, which is usually known or easier to estimate before the algorithm starts. Hence, compared with the existing PALM methods and their inertial versions in [1, 2, 17, 25], this PAMA has a potential advantage in running time because it avoids estimating L_{1}(V^{\prime}) and L_{1}(U^{\prime}) in each iteration.

(ii) As will be shown in Section 3, our PAMA is actually a variable metric proximal AM method. Different from the variable metric methods in [10, 16], our variable metric linear operators arise naturally from majorizing F via the Lipschitz continuity of \nabla f. In particular, by introducing a subspace correction step into each subproblem, we overcome the difficulty that the variable metric proximal subproblems involving \vartheta have no closed-form solutions.

(iii) Our majorized PAMA with subspace correction has a convergence certificate. Specifically, we prove subsequence convergence of the iterate sequence for all \theta_{i} in Table 1, and establish convergence of the whole iterate sequence and of the column subspace sequences of the factor pairs under the KL property of \Phi_{\lambda,\mu} and the restrictive conditions (29a)-(29b), which are satisfied by \theta_{1}, and by \theta_{4}-\theta_{5} if there is a stationary point with distinct nonzero singular values. To the best of our knowledge, this appears to be the first subspace correction AM method with a convergence certificate for low-rank composite factorization models. Hastie et al. [13] proposed a PAMA with subspace correction (named softImpute-ALS) for the factorization form of the nuclear-norm regularized least squares model, but did not provide a convergence analysis of the iterate sequence, or even of the objective value sequence.

1.3 Notation

Throughout this paper, I_{r} denotes the r\times r identity matrix, \mathbb{O}^{n_{1}\times n_{2}} denotes the set of all n_{1}\times n_{2} matrices with orthonormal columns, and \mathbb{O}^{n_{1}} means \mathbb{O}^{n_{1}\times n_{1}}. For a matrix X\in\mathbb{R}^{n_{1}\times n_{2}}, \|X\|, \|X\|_{*} and \|X\|_{2,0} denote the spectral norm, nuclear norm, and column \ell_{2,0}-norm of X, respectively, \sigma(X)=(\sigma_{1}(X),\ldots,\sigma_{n}(X))^{\top} with \sigma_{1}(X)\geq\cdots\geq\sigma_{n}(X), \mathbb{O}^{n,m}(X):=\{(U,V)\in\mathbb{O}^{n}\times\mathbb{O}^{m}\ |\ X=U[{\rm Diag}(\sigma(X))\ \ 0]V^{\top}\}, and \Sigma_{\kappa}(X)={\rm Diag}(\sigma_{1}(X),\ldots,\sigma_{\kappa}(X)). For an integer k\geq 1, [k]:=\{1,\ldots,k\}. For a matrix Z\in\mathbb{R}^{n_{1}\times n_{2}}, Z_{j} denotes the jth column of Z, J_{Z} denotes the index set of its nonzero columns, and {\rm col}(Z) and {\rm row}(Z) denote the subspaces spanned by the columns and rows of Z, respectively. For an index set J\subset[r], we write \overline{J}:=[r]\backslash J, and define

\vartheta_{J}(Z_{J})=\sum_{i\in J}\theta(\|Z_{i}\|)\ \ {\rm and}\ \ \vartheta_{\overline{J}}(Z_{\overline{J}})=\sum_{i\in\overline{J}}\theta(\|Z_{i}\|)\ \ {\rm for}\ Z\in\mathbb{R}^{l\times r}. (3)

2 Preliminaries

Recall that for a proper lsc function h:\mathbb{R}^{n}\to\overline{\mathbb{R}}, its proximal mapping associated with a parameter \gamma>0 is defined as \mathcal{P}_{\gamma}h(z):=\mathop{\arg\min}_{x\in\mathbb{R}^{n}}\big\{\frac{1}{2\gamma}\|z-x\|^{2}+h(x)\big\} for z\in\mathbb{R}^{n}. The mapping \mathcal{P}_{\gamma}h is generally multi-valued unless h is convex. The following lemma shows that the proximal mapping of the function \vartheta in (2) can be obtained from that of \theta. Since its proof follows immediately from condition (C.1), we omit it.

Lemma 2.1

Fix any \gamma>0 and u\in\mathbb{R}^{n}. Let S^{*} be the optimal solution set of

\min_{x\in\mathbb{R}^{n}}\Big\{\frac{1}{2}\|\gamma x-u\|^{2}+\lambda\theta(\|x\|)\Big\}.

Then S^{*}=\{0\} when u=0; otherwise S^{*}=\frac{u}{\|u\|}\mathcal{P}_{\lambda/\gamma^{2}}\theta(\|u\|/\gamma).
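In code, Lemma 2.1 reduces the vector problem to a scalar one along the ray spanned by u. A small sketch (the names are ours; scalar_prox is any realization of \mathcal{P}_{t}\theta), illustrated here with \theta=\theta_{3}, whose scalar prox is one-sided soft thresholding:

```python
import numpy as np

def prox_vector(u, gamma, lam, scalar_prox):
    """Solve min_x 0.5*||gamma*x - u||^2 + lam*theta(||x||) via Lemma 2.1.

    scalar_prox(s, t) should return an element of P_t(theta)(s), i.e. a
    minimizer of (1/(2*t))*(s - x)^2 + theta(x).
    """
    norm_u = np.linalg.norm(u)
    if norm_u == 0.0:
        return np.zeros_like(u)
    # The minimizer is radial: x* = (u/||u||) * P_{lam/gamma^2} theta (||u||/gamma).
    radial = scalar_prox(norm_u / gamma, lam / gamma ** 2)
    return (u / norm_u) * radial

# Example with theta = theta3 (scalar prox: one-sided soft threshold).
x_star = prox_vector(np.array([3.0, 4.0]), 1.0, 1.0, lambda s, t: max(s - t, 0.0))
```

Substituting x = \rho u/\|u\| turns the objective into \frac{\gamma^{2}}{2}(\rho-\|u\|/\gamma)^{2}+\lambda\theta(\rho), which is exactly the scalar proximal problem with parameter \lambda/\gamma^{2}.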

2.1 Stationary point of problem (1)

Before introducing the concept of stationary points for problem (1), we first recall from [19] the notion of subdifferentials.

Definition 2.1

Consider a function h:\mathbb{R}^{n\times m}\to\overline{\mathbb{R}} and a point x with h(x) finite. The regular subdifferential of h at x, denoted by \widehat{\partial}h(x), is defined as

\widehat{\partial}h(x):=\bigg\{v\in\mathbb{R}^{n\times m}\ \big|\ \liminf_{x\neq x^{\prime}\to x}\frac{h(x^{\prime})-h(x)-\langle v,x^{\prime}-x\rangle}{\|x^{\prime}-x\|_{F}}\geq 0\bigg\},

and the basic (also known as limiting or Mordukhovich) subdifferential of h at x is defined as

\partial h(x):=\Big\{v\in\mathbb{R}^{n\times m}\ |\ \exists\,x^{k}\to x\ {\rm with}\ h(x^{k})\to h(x)\ {\rm and}\ v^{k}\to v\ {\rm with}\ v^{k}\in\widehat{\partial}h(x^{k})\Big\}.

The following lemma characterizes the subdifferential of the function \vartheta in (2). Since the proof follows immediately from [19, Theorem 10.49 & Proposition 10.5], we omit it.

Lemma 2.2

At a given (\overline{U},\overline{V})\in\mathbb{X}_{r}, \partial\vartheta(\overline{U})=\overline{\mathcal{U}}_{1}\times\cdots\times\overline{\mathcal{U}}_{r} and \partial\vartheta(\overline{V})=\overline{\mathcal{V}}_{1}\times\cdots\times\overline{\mathcal{V}}_{r} with

\overline{\mathcal{U}}_{j}=\left\{\begin{array}{cl}\Big\{\frac{\theta^{\prime}(\|\overline{U}_{j}\|)\overline{U}_{j}}{\|\overline{U}_{j}\|}\Big\}&{\rm if}\ \overline{U}_{j}\neq 0,\\ \bigcup_{y\in\partial\theta(0)}\partial(y\|\cdot\|)&{\rm if}\ \overline{U}_{j}=0\end{array}\right.\ \ {\rm and}\ \ \overline{\mathcal{V}}_{j}=\left\{\begin{array}{cl}\Big\{\frac{\theta^{\prime}(\|\overline{V}_{j}\|)\overline{V}_{j}}{\|\overline{V}_{j}\|}\Big\}&{\rm if}\ \overline{V}_{j}\neq 0,\\ \bigcup_{y\in\partial\theta(0)}\partial(y\|\cdot\|)&{\rm if}\ \overline{V}_{j}=0.\end{array}\right.

Motivated by Lemma 2.2, we introduce the following concept of stationary points.

Definition 2.2

A factor pair (\overline{U},\overline{V})\in\mathbb{X}_{r} is called a stationary point of problem (1) if

0\in\partial\Phi_{\lambda,\mu}(\overline{U},\overline{V})=\begin{pmatrix}\nabla f(\overline{U}\,\overline{V}^{\top})\overline{V}+\mu\overline{U}+\lambda[\,\overline{\mathcal{U}}_{1}\times\cdots\times\overline{\mathcal{U}}_{r}]\\ \big[\nabla f(\overline{U}\,\overline{V}^{\top})\big]^{\top}\overline{U}+\mu\overline{V}+\lambda[\,\overline{\mathcal{V}}_{1}\times\cdots\times\overline{\mathcal{V}}_{r}]\end{pmatrix},

where, for each j\in[r], the sets \overline{\mathcal{U}}_{j} and \overline{\mathcal{V}}_{j} take the same form as in Lemma 2.2.

2.2 Relation between column 2,p\ell_{2,p}-norm and Schatten pp-norm

Fix any p\in(0,1]. Recall that the column \ell_{2,p}-norm of a matrix X\in\mathbb{R}^{n\times m} is defined as \|X\|_{2,p}:=\big(\sum_{j=1}^{m}\|X_{j}\|^{p}\big)^{1/p}, while its Schatten p-norm is defined as \|X\|_{\rm S_{p}}:=\big(\sum_{i=1}^{n}[\sigma_{i}(X)]^{p}\big)^{1/p}. The following lemma shows that \|X\|_{\rm S_{p}} is not greater than the column \ell_{2,p}-norm \|X\|_{2,p}.

Lemma 2.3

Fix any X\in\mathbb{R}^{n\times m} of rank r. For any R\in\mathbb{O}^{d\times r} with r\leq d\leq n, it holds that \|X\|^{p}_{\rm S_{p}}\leq\sum_{i=1}^{d}\big[(R\Sigma_{r}(X)R^{\top})_{ii}\big]^{p}, and consequently \|X\|^{p}_{\rm S_{p}}\leq\|X\|^{p}_{2,p}.

Proof: For each i\in[d], (R\Sigma_{r}(X)R^{\top})_{ii}=\sum_{j=1}^{r}R_{ij}^{2}\sigma_{j}(X), which implies that

\sum_{i=1}^{d}\big[(R\Sigma_{r}(X)R^{\top})_{ii}\big]^{p}=\sum_{i=1}^{d}\Big(\sum_{j=1}^{r}R_{ij}^{2}\sigma_{j}(X)\Big)^{p}.

For each i\in[d], let \alpha_{i}=\sum_{j=1}^{r}R_{ij}^{2}. As R\in\mathbb{O}^{d\times r}, we have \alpha_{i}\in[0,1] for each i\in[d]. Note that the function \mathbb{R}_{+}\ni t\mapsto t^{p} is concave because p\in(0,1]. Moreover, if \alpha_{i}\neq 0, then \sum_{j=1}^{r}R_{ij}^{2}/\alpha_{i}=1. For each i\in[d] with \alpha_{i}\neq 0, Jensen's inequality yields

\Big(\sum_{j=1}^{r}\frac{R_{ij}^{2}}{\alpha_{i}}\sigma_{j}(X)\Big)^{p}\geq\sum_{j=1}^{r}\frac{R_{ij}^{2}}{\alpha_{i}}(\sigma_{j}(X))^{p},

which, by \alpha_{i}\in(0,1] and 0<p\leq 1, implies that \big(\sum_{j=1}^{r}R_{ij}^{2}\sigma_{j}(X)\big)^{p}\geq\sum_{j=1}^{r}R_{ij}^{2}(\sigma_{j}(X))^{p}. Together with 1=\sum_{i=1}^{d}R_{ij}^{2}=\sum_{i:\,\alpha_{i}\neq 0}R_{ij}^{2} for each j\in[r], it follows that

\sum_{i=1}^{d}\big[(R\Sigma_{r}(X)R^{\top})_{ii}\big]^{p}=\sum_{i=1}^{d}\Big(\sum_{j=1}^{r}R_{ij}^{2}\sigma_{j}(X)\Big)^{p}=\sum_{i:\,\alpha_{i}\neq 0}\Big(\sum_{j=1}^{r}R_{ij}^{2}\sigma_{j}(X)\Big)^{p}
\geq\sum_{i:\,\alpha_{i}\neq 0}\sum_{j=1}^{r}R_{ij}^{2}(\sigma_{j}(X))^{p}=\sum_{j=1}^{r}(\sigma_{j}(X))^{p}. (4)

The first part then follows. For the second part, let X have the thin SVD X=P_{1}\Sigma_{r}(X)Q_{1}^{\top} with P_{1}\in\mathbb{O}^{n\times r} and Q_{1}\in\mathbb{O}^{m\times r}. By the definition, it holds that

\|X\|^{p}_{\rm S_{p}}=\sum_{i=1}^{r}(\sigma_{i}^{2}(X))^{p/2}\leq\sum_{i=1}^{m}\big[(Q_{1}\Sigma_{r}(X)^{2}Q_{1}^{\top})_{ii}\big]^{p/2}=\sum_{i=1}^{m}\big[(X^{\top}X)_{ii}\big]^{p/2}=\|X\|_{2,p}^{p},

where the inequality follows from (4) applied with the exponent p/2 and the values \sigma_{j}^{2}(X). The proof is completed. \Box
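The inequality \|X\|^{p}_{\rm S_{p}}\leq\|X\|^{p}_{2,p} is easy to sanity-check numerically. The following snippet (our own check, not from the paper) compares \sum_{i}\sigma_{i}(X)^{p} with \sum_{j}\|X_{j}\|^{p} on random matrices:

```python
import numpy as np

def schatten_p_p(X, p):
    """||X||_{S_p}^p: sum of p-th powers of the singular values of X."""
    return float(np.sum(np.linalg.svd(X, compute_uv=False) ** p))

def col_2p_p(X, p):
    """||X||_{2,p}^p: sum of p-th powers of the Euclidean column norms of X."""
    return float(np.sum(np.linalg.norm(X, axis=0) ** p))

rng = np.random.default_rng(0)
for p in (0.5, 2.0 / 3.0, 1.0):
    for _ in range(50):
        X = rng.standard_normal((4, 6))
        # Lemma 2.3: Schatten quasi-norm is dominated by the column l_{2,p} quasi-norm.
        assert schatten_p_p(X, p) <= col_2p_p(X, p) + 1e-10
```

Equality holds, for instance, for matrices with orthogonal columns, where the column norms coincide with the singular values.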

By invoking Lemma 2.3, we obtain the factorization form of the Schatten p-norm.

Proposition 2.1

For any matrix X\in\mathbb{R}^{n\times m} with rank r\leq d, it holds that

2\|X\|_{\rm S_{p}}^{p}=\min_{U\in\mathbb{R}^{n\times d},V\in\mathbb{R}^{m\times d}}\Big\{\|U\|_{2,2p}^{2p}+\|V\|_{2,2p}^{2p}\ \ {\rm s.t.}\ \ X=UV^{\top}\Big\}.

Proof: From [20, Theorem 2], it follows that

\|X\|_{\rm S_{p}}=\min_{U\in\mathbb{R}^{n\times d},V\in\mathbb{R}^{m\times d},\,X=UV^{\top}}\frac{1}{2^{1/p}}\Big(\|U\|^{2p}_{\rm S_{2p}}+\|V\|^{2p}_{\rm S_{2p}}\Big)^{1/p},

which together with Lemma 2.3 immediately implies that

\|X\|_{\rm S_{p}}\leq\min_{U\in\mathbb{R}^{n\times d},V\in\mathbb{R}^{m\times d},\,X=UV^{\top}}\frac{1}{2^{1/p}}\Big(\|U\|^{2p}_{2,2p}+\|V\|^{2p}_{2,2p}\Big)^{1/p}.

By taking \overline{U}=P_{1}[\Sigma_{d}(X)]^{1/2} and \overline{V}=Q_{1}[\Sigma_{d}(X)]^{1/2}, where X=P_{1}\Sigma_{d}(X)Q_{1}^{\top} with P_{1}\in\mathbb{O}^{n\times d} and Q_{1}\in\mathbb{O}^{m\times d} is the thin SVD of X, we get \|\overline{U}\|^{2p}_{2,2p}=\|\overline{V}\|^{2p}_{2,2p}=\|X\|^{p}_{\rm S_{p}}. The result then holds. \Box
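Proposition 2.1 can likewise be checked numerically: the balanced factors \overline{U}=P_{1}\Sigma^{1/2}, \overline{V}=Q_{1}\Sigma^{1/2} built from the thin SVD are feasible and attain the claimed value, with the column term read as \sum_{j}\|U_{j}\|^{2p} (our reading of the column quasi-norm raised to the power 2p):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.5
X = rng.standard_normal((5, 7))
P1, s, Q1t = np.linalg.svd(X, full_matrices=False)   # thin SVD: X = P1 diag(s) Q1^T
U = P1 * np.sqrt(s)                                  # balanced factor P1 * Sigma^{1/2}
V = Q1t.T * np.sqrt(s)                               # balanced factor Q1 * Sigma^{1/2}
assert np.allclose(U @ V.T, X)                       # feasibility: X = U V^T
lhs = 2.0 * np.sum(s ** p)                           # 2 * ||X||_{S_p}^p
rhs = (np.sum(np.linalg.norm(U, axis=0) ** (2 * p))
       + np.sum(np.linalg.norm(V, axis=0) ** (2 * p)))
assert np.isclose(lhs, rhs)                          # balanced factors attain the minimum
```

Here each column norm satisfies \|U_{j}\|=\sigma_{j}(X)^{1/2}, so \sum_{j}\|U_{j}\|^{2p}=\sum_{j}\sigma_{j}(X)^{p}, matching the proof.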

2.3 Kurdyka-Lojasiewicz property

We recall from [1] the concept of the KL property of an extended real-valued function.

Definition 2.3

Let h:\mathbb{X}\to\overline{\mathbb{R}} be a proper lower semicontinuous (lsc) function. The function h is said to have the Kurdyka-Lojasiewicz (KL) property at \overline{x}\in{\rm dom}\,\partial h if there exist \eta\in(0,+\infty], a continuous concave function \varphi:[0,\eta)\to\mathbb{R}_{+} satisfying

  • (i)

\varphi(0)=0 and \varphi is continuously differentiable on (0,\eta),

  • (ii)

\varphi^{\prime}(s)>0 for all s\in(0,\eta);

and a neighborhood \mathcal{U} of \overline{x} such that for all x\in\mathcal{U}\cap\big[h(\overline{x})<h(x)<h(\overline{x})+\eta\big],

\varphi^{\prime}(h(x)-h(\overline{x}))\,{\rm dist}(0,\partial h(x))\geq 1.

If h satisfies the KL property at each point of {\rm dom}\,\partial h, then it is called a KL function.

Remark 2.1

By Definition 2.3 and [1, Lemma 2.1], a proper lsc function has the KL property at every noncritical point, i.e., at every point of {\rm dom}\,\partial h not belonging to (\partial h)^{-1}(0). Thus, to show that a proper lsc h:\mathbb{X}\to\overline{\mathbb{R}} is a KL function, it suffices to check that h has the KL property at every critical point.

3 A majorized PAMA with subspace correction

Fix any (U^{\prime},V^{\prime})\in\mathbb{X}_{r}. Recall that \nabla f is Lipschitz continuous with modulus L_{f}. Then, for any (U,V)\in\mathbb{X}_{r}, it holds that

f(UV^{\top})\leq f(U^{\prime}{V^{\prime}}^{\top})+\langle\nabla f(U^{\prime}{V^{\prime}}^{\top}),UV^{\top}-U^{\prime}{V^{\prime}}^{\top}\rangle+\frac{L_{f}}{2}\|UV^{\top}-U^{\prime}{V^{\prime}}^{\top}\|_{F}^{2}=:\widehat{F}(U,V,U^{\prime},V^{\prime}), (5)

which together with the expression of \Phi_{\lambda,\mu} immediately implies that

\Phi_{\lambda,\mu}(U,V)\leq\widehat{F}(U,V,U^{\prime},V^{\prime})+\lambda\big[\vartheta(U)+\vartheta(V)\big]+\frac{\mu}{2}\big(\|U\|_{F}^{2}+\|V\|_{F}^{2}\big)=:\widehat{\Phi}_{\lambda,\mu}(U,V,U^{\prime},V^{\prime}). (6)

Note that \widehat{\Phi}_{\lambda,\mu}(U^{\prime},V^{\prime},U^{\prime},V^{\prime})=\Phi_{\lambda,\mu}(U^{\prime},V^{\prime}), so \widehat{\Phi}_{\lambda,\mu}(\cdot,\cdot,U^{\prime},V^{\prime}) is a majorization of \Phi_{\lambda,\mu} at (U^{\prime},V^{\prime}). Let (U^{k},V^{k}) be the current iterate. It is natural to develop an algorithm for problem (1) by minimizing the function \widehat{\Phi}_{\lambda,\mu}(\cdot,\cdot,U^{k},V^{k}) alternately, i.e., by the iteration

U^{k+1}\in\mathop{\arg\min}_{U\in\mathbb{R}^{n\times r}}\Big\{\langle\nabla_{U}F(U^{k},V^{k}),U\rangle+\lambda\vartheta(U)+\frac{\mu}{2}\|U\|_{F}^{2}+\frac{L_{f}}{2}\|(U-U^{k})(V^{k})^{\top}\|_{F}^{2}\Big\},
V^{k+1}\in\mathop{\arg\min}_{V\in\mathbb{R}^{m\times r}}\Big\{\langle\nabla_{V}F(U^{k+1},V^{k}),V\rangle+\lambda\vartheta(V)+\frac{\mu}{2}\|V\|_{F}^{2}+\frac{L_{f}}{2}\|U^{k+1}(V-V^{k})^{\top}\|_{F}^{2}\Big\}.

Compared with the PALM method, this majorized AM method is actually a variable metric proximal AM method. Unfortunately, due to the nonsmooth regularizer \vartheta, these two variable metric subproblems have no closed-form solutions, which poses a great challenge for the convergence analysis of the generated iterate sequence. Inspired by the fact that variable metric proximal methods are especially effective for ill-conditioned problems, we introduce a subspace correction step into each proximal subproblem so as to guarantee that the variable metric proximal subproblem at the corrected factor has a closed-form solution, and propose the following majorized PAMA with subspace correction.

Algorithm 1 (A majorized PAMA with subspace correction)

Initialization: Input parameters \varrho\in(0,1), \underline{\gamma_{1}}>0, \underline{\gamma_{2}}>0, \gamma_{1,0}>0 and \gamma_{2,0}>0. Choose \overline{P}^{0}\in\mathbb{O}^{m\times r}, \widehat{P}^{0}\in\mathbb{O}^{n\times r}, \overline{Q}^{0}=\overline{D}^{0}=I_{r}, and let (\overline{U}^{0},\overline{V}^{0})=(\widehat{P}^{0},\overline{P}^{0}).
For k=0,1,2,k=0,1,2,\ldots do

  • 1.

    Compute U^{k+1}\in\mathop{\arg\min}_{U\in\mathbb{R}^{n\times r}}\Big\{\widehat{\Phi}_{\lambda,\mu}(U,\overline{V}^{k},\overline{U}^{k},\overline{V}^{k})+\frac{\gamma_{1,k}}{2}\|U-\overline{U}^{k}\|_{F}^{2}\Big\}.

  • 2.

    Perform a thin SVD of U^{k+1}\overline{D}^{k} such that U^{k+1}\overline{D}^{k}=\widehat{P}^{k+1}(\widehat{D}^{k+1})^{2}(\widehat{Q}^{k+1})^{\top} with \widehat{P}^{k+1}\in\mathbb{O}^{n\times r}, \widehat{Q}^{k+1}\in\mathbb{O}^{r} and (\widehat{D}^{k+1})^{2}={\rm Diag}(\sigma(U^{k+1}\overline{D}^{k})), and set

    \widehat{U}^{k+1}:=\widehat{P}^{k+1}\widehat{D}^{k+1},\ \ \widehat{V}^{k+1}:=\overline{P}^{k}\widehat{Q}^{k+1}\widehat{D}^{k+1}\ \ {\rm and}\ \ \widehat{X}^{k+1}:=\widehat{U}^{k+1}(\widehat{V}^{k+1})^{\top}.
  • 3.

    Compute V^{k+1}\in\mathop{\arg\min}_{V\in\mathbb{R}^{m\times r}}\Big\{\widehat{\Phi}_{\lambda,\mu}(\widehat{U}^{k+1},V,\widehat{U}^{k+1},\widehat{V}^{k+1})+\frac{\gamma_{2,k}}{2}\|V-\widehat{V}^{k+1}\|_{F}^{2}\Big\}.

  • 4.

    Perform a thin SVD of V^{k+1}\widehat{D}^{k+1} such that V^{k+1}\widehat{D}^{k+1}=\overline{P}^{k+1}(\overline{D}^{k+1})^{2}(\overline{Q}^{k+1})^{\top} with \overline{P}^{k+1}\in\mathbb{O}^{m\times r}, \overline{Q}^{k+1}\in\mathbb{O}^{r} and (\overline{D}^{k+1})^{2}={\rm Diag}(\sigma(V^{k+1}\widehat{D}^{k+1})), and set

    \overline{U}^{k+1}:=\widehat{P}^{k+1}\overline{Q}^{k+1}\overline{D}^{k+1},\ \ \overline{V}^{k+1}:=\overline{P}^{k+1}\overline{D}^{k+1}\ \ {\rm and}\ \ \overline{X}^{k+1}:=\overline{U}^{k+1}(\overline{V}^{k+1})^{\top}.
  • 5.

    Set \gamma_{1,k+1}=\max(\underline{\gamma_{1}},\varrho\gamma_{1,k}) and \gamma_{2,k+1}=\max(\underline{\gamma_{2}},\varrho\gamma_{2,k}).

end (For)
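To make the scheme concrete, the following sketch instantiates Algorithm 1 for \theta=\theta_{1} and the simple smooth loss f(X)=\frac{1}{2}\|X-M\|_{F}^{2} (so \nabla f(X)=X-M and L_{f}=1). All function names and default parameter values are our own choices; the U- and V-updates use the closed-form column formulas derived in Remark 3.1(c), and diagonal matrices are stored as vectors:

```python
import numpy as np

def hard_prox(s, t):
    """P_t(theta1)(s) for s >= 0: keep s if s > sqrt(2*t), else 0 (tie broken to 0)."""
    return s if s > np.sqrt(2.0 * t) else 0.0

def column_prox(G, Lam, lam):
    """Columnwise closed form: U_i = (G_i/||G_i||) * P_{lam/Lam_ii^2} theta1 (||G_i||/Lam_ii)."""
    U = np.zeros_like(G)
    for i in range(G.shape[1]):
        gnorm = np.linalg.norm(G[:, i])
        if gnorm > 0.0:
            U[:, i] = (G[:, i] / gnorm) * hard_prox(gnorm / Lam[i], lam / Lam[i] ** 2)
    return U

def pama(M, r, lam=0.1, mu=1e-3, Lf=1.0, rho=0.5, g1=1.0, g2=1.0, gmin=1e-8,
         iters=50, seed=0):
    """Sketch of Algorithm 1 for f(X) = 0.5*||X - M||_F^2 and theta = theta1."""
    n, m = M.shape
    rng = np.random.default_rng(seed)
    Pbar = np.linalg.qr(rng.standard_normal((m, r)))[0]   # \bar P^0
    Phat = np.linalg.qr(rng.standard_normal((n, r)))[0]   # \hat P^0
    Qbar, Dbar = np.eye(r), np.ones(r)                    # \bar Q^0, diag of \bar D^0
    gradf = lambda X: X - M
    for _ in range(iters):
        # Step 1: closed-form U-update.
        Xbar = (Phat @ Qbar * Dbar) @ (Pbar * Dbar).T     # \bar U^k (\bar V^k)^T
        Zbar = Xbar - gradf(Xbar) / Lf
        Lam = np.sqrt(Lf * Dbar ** 2 + mu + g1)
        G = (Lf * Zbar @ Pbar + g1 * Phat @ Qbar) * Dbar / Lam
        U = column_prox(G, Lam, lam)
        # Step 2: subspace correction by a thin SVD of U^{k+1} \bar D^k.
        Phat, s, QhatT = np.linalg.svd(U * Dbar, full_matrices=False)
        Dhat, Qhat = np.sqrt(s), QhatT.T
        # Step 3: closed-form V-update.
        Xhat = (Phat * Dhat ** 2) @ (Pbar @ Qhat).T       # \hat U^{k+1} (\hat V^{k+1})^T
        Zhat = Xhat - gradf(Xhat) / Lf
        Delta = np.sqrt(Lf * Dhat ** 2 + mu + g2)
        H = (Lf * Zhat.T @ Phat + g2 * Pbar @ Qhat) * Dhat / Delta
        V = column_prox(H, Delta, lam)
        # Step 4: subspace correction by a thin SVD of V^{k+1} \hat D^{k+1}.
        Pbar, s, QbarT = np.linalg.svd(V * Dhat, full_matrices=False)
        Dbar, Qbar = np.sqrt(s), QbarT.T
        # Step 5: shrink the proximal parameters toward their lower bounds.
        g1, g2 = max(gmin, rho * g1), max(gmin, rho * g2)
    Ubar = (Phat @ Qbar) * Dbar                           # \bar U = \hat P \bar Q \bar D
    Vbar = Pbar * Dbar                                    # \bar V = \bar P \bar D
    return Ubar, Vbar
```

Per iteration the only nontrivial linear algebra is the two thin SVDs of n\times r and m\times r matrices, consistent with the cost discussion in Remark 3.1.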

Remark 3.1

(a) Steps 2 and 4 are the subspace correction steps. Step 2 constructs a new factor pair (\widehat{U}^{k+1},\widehat{V}^{k+1}) by performing an SVD of U^{k+1}\overline{D}^{k}, whose column space {\rm col}(\widehat{U}^{k+1}) can be regarded as a corrected version of that of U^{k+1}. This step guarantees that the proximal minimization of the majorization \widehat{\Phi}_{\lambda,\mu}(\cdot,\cdot,\widehat{U}^{k+1},\widehat{V}^{k+1}) with respect to V has a closed-form solution. Similarly, step 4 constructs a new factor pair (\overline{U}^{k+1},\overline{V}^{k+1}) by performing an SVD of V^{k+1}\widehat{D}^{k+1}, whose column space {\rm col}(\overline{V}^{k+1}) can be viewed as a corrected version of that of \widehat{V}^{k+1}. This step ensures that the proximal minimization of the majorization \widehat{\Phi}_{\lambda,\mu}(\cdot,\cdot,\overline{U}^{k+1},\overline{V}^{k+1}) with respect to U has a closed-form solution. By Theorem 4.1, the objective value at (\widehat{U}^{k+1},\widehat{V}^{k+1}) is strictly less than that at (\overline{U}^{k},\overline{V}^{k}), while the objective value at (\overline{U}^{k+1},\overline{V}^{k+1}) is strictly less than that at (\widehat{U}^{k+1},\widehat{V}^{k+1}). From this point of view, the subspace correction steps contribute to reducing the objective values. To the best of our knowledge, such a subspace correction technique first appeared in the alternating least squares method for the nuclear-norm regularized least squares factorized model [13], and here it is employed to treat the nonconvex and nonsmooth problem (1). From steps 2 and 4, for each k\in\mathbb{N},

U^{k+1}\overline{D}^{k}(\overline{P}^{k})^{\top}=U^{k+1}(\overline{V}^{k})^{\top}=\widehat{X}^{k+1}=\widehat{U}^{k+1}(\widehat{V}^{k+1})^{\top}, (8a)
\widehat{P}^{k+1}\widehat{D}^{k+1}(V^{k+1})^{\top}=\widehat{U}^{k+1}(V^{k+1})^{\top}=\overline{X}^{k+1}=\overline{U}^{k+1}(\overline{V}^{k+1})^{\top}. (8b)

(b) In steps 1 and 3, we introduce a proximal term with a uniformly positive proximal parameter to guarantee sufficient decrease of the objective value sequence. As will be shown in Section 5, its uniform lower bound \underline{\gamma_{1}} or \underline{\gamma_{2}} is easily chosen. By the optimality of U^{k+1} and V^{k+1} in steps 1 and 3 and [19, Exercise 8.8], for each k, it holds that

0\in\big[\nabla f(\overline{X}^{k})+L_{f}(\widehat{X}^{k+1}-\overline{X}^{k})\big]\overline{V}^{k}+\mu U^{k+1}+\gamma_{1,k}(U^{k+1}-\overline{U}^{k})+\lambda\partial\vartheta(U^{k+1});
0\in\big[\nabla f(\widehat{X}^{k+1})+L_{f}(\overline{X}^{k+1}-\widehat{X}^{k+1})\big]^{\top}\widehat{U}^{k+1}+\mu V^{k+1}+\gamma_{2,k}(V^{k+1}-\widehat{V}^{k+1})+\lambda\partial\vartheta(V^{k+1}).

(c) By step 1, equations (8a)-(8b) and the expression of \widehat{\Phi}_{\lambda,\mu}(\cdot,\overline{V}^{k},\overline{U}^{k},\overline{V}^{k}), we have

U^{k+1}\in\mathop{\arg\min}_{U\in\mathbb{R}^{n\times r}}\Big\{\frac{1}{2}\big\|G^{k}-U\Lambda^{k}\big\|_{F}^{2}+\lambda\sum_{i=1}^{r}\theta(\|U_{i}\|)\Big\},

where G^{k}:=\big(L_{f}\overline{Z}^{k}\overline{P}^{k}+\gamma_{1,k}\widehat{P}^{k}\overline{Q}^{k}\big)\overline{D}^{k}(\Lambda^{k})^{-1} with \overline{Z}^{k}=\overline{X}^{k}-L_{f}^{-1}\nabla f(\overline{X}^{k}) and \Lambda^{k}=\big[L_{f}(\overline{D}^{k})^{2}+(\mu+\gamma_{1,k})I_{r}\big]^{1/2} for each k. By invoking Lemma 2.1, for each i\in[r],

U_{i}^{k+1}\in\left\{\begin{array}{cl}\frac{G_{i}^{k}}{\|G_{i}^{k}\|}\mathcal{P}_{\lambda/(\Lambda_{ii}^{k})^{2}}\theta\Big(\frac{\|G^{k}_{i}\|}{\Lambda_{ii}^{k}}\Big)&{\rm if}\ \|G_{i}^{k}\|>0,\\ \{0\}&{\rm if}\ \|G_{i}^{k}\|=0.\end{array}\right. (9)

While by step 3, equations (8a)-(8b) and the expression of Φ^λ,μ(U^k+1,,U^k+1,V^k+1)\widehat{\Phi}_{\lambda,\mu}(\widehat{U}^{k\!+\!1}\!\!,\cdot,\!\widehat{U}^{k\!+\!1}\!\!,\!\widehat{V}^{k\!+\!1}),

Vk+1argminVm×r{12Hk+1VΔk+1F2+λi=1rθ(Vi)},V^{k+1}\in\mathop{\arg\min}_{V\in\mathbb{R}^{m\times r}}\Big{\{}\frac{1}{2}\big{\|}H^{k+1}\!-\!V\Delta^{k+1}\big{\|}_{F}^{2}\!+\!\lambda\sum_{i=1}^{r}\theta(\|V_{i}\|)\Big{\}},

where, for each kk\in\mathbb{N}, Hk+1=(Lf(Z^k+1)P^k+1+γ2,kP¯kQ^k+1)D^k+1(Δk+1)1H^{k+1}\!=\!\big{(}L_{\!f}(\widehat{Z}^{k+1})^{\top}\widehat{P}^{k+1}\!+\gamma_{2,k}\overline{P}^{k}\widehat{Q}^{k+1}\big{)}\widehat{D}^{k+1}(\Delta^{k+1})^{-1} with Z^k+1=X^k+1Lf1f(X^k+1)\widehat{Z}^{k+1}\!=\!\widehat{X}^{k+1}\!-\!L_{\!f}^{-1}\nabla\!f(\widehat{X}^{k+1}) and Δk+1=[Lf(D^k+1)2+(μ+γ2,k)Ir]1/2\Delta^{k+1}\!=\!\big{[}L_{\!f}(\widehat{D}^{k+1})^{2}\!+\!(\mu+\gamma_{2,k})I_{r}\big{]}^{1/2}. By Lemma 2.1,

Vik+1{Hik+1Hik+1𝒫λ/(Δiik+1)2θ(Hik+1Δiik+1)ifHik+1>0,{0}ifHik+1=0foreachi[r].V_{i}^{k+1}\in\left\{\!\begin{array}[]{cl}\frac{H_{i}^{k+1}}{\|H_{i}^{k+1}\|}\mathcal{P}_{\!{\lambda}/{(\Delta_{ii}^{k+1})^{2}}}\theta\Big{(}\frac{\|H_{i}^{k+1}\|}{\Delta_{ii}^{k+1}}\Big{)}&{\rm if}\ \|H_{i}^{k+1}\|>0,\\ \{0\}&{\rm if}\ \|H_{i}^{k+1}\|=0\end{array}\right.\ \ {\rm for\ each}\ i\in[r]. (10)
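Updates (9) and (10) reduce to one scalar proximal step per column. Below is a minimal numpy sketch of update (9) for the illustrative choice $\theta(t)=t$, whose scalar prox is a one-sided soft-threshold; the sizes, data and this particular choice of $\theta$ are assumptions for illustration only, not the paper's setting.

```python
import numpy as np

def prox_theta(t, nu):
    # Scalar prox of nu*theta at t for the illustrative choice theta(s) = s:
    # argmin_{s >= 0} 0.5*(s - t)^2 + nu*s, a one-sided soft-threshold.
    return max(t - nu, 0.0)

def update_U(G, Lam, lam):
    # Column-wise update (9): U_i = (G_i/||G_i||) * prox_{lam/Lam_ii^2} theta(||G_i||/Lam_ii).
    U = np.zeros_like(G)
    for i in range(G.shape[1]):
        gi = np.linalg.norm(G[:, i])
        if gi > 0:
            U[:, i] = (G[:, i] / gi) * prox_theta(gi / Lam[i], lam / Lam[i] ** 2)
    return U

def objective(U, G, Lam, lam):
    # 0.5*||G - U*Lam||_F^2 + lam * sum_i theta(||U_i||), with Lam the diagonal of Lambda^k.
    return 0.5 * np.linalg.norm(G - U * Lam) ** 2 + lam * np.linalg.norm(U, axis=0).sum()

rng = np.random.default_rng(0)
G, Lam, lam = rng.standard_normal((8, 3)), np.array([1.5, 2.0, 0.8]), 1.0
U_star = update_U(G, Lam, lam)
f_star = objective(U_star, G, Lam, lam)
```

Since the objective separates over columns, one can sanity-check the closed form by verifying that perturbations of `U_star` never decrease the objective.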

Recall that $\theta$ is assumed to have a closed-form proximal mapping, so the main cost of Algorithm 1 at each iteration is the SVD of $U^{k+1}\overline{D}^{k}$ and of $V^{k+1}\widehat{D}^{k+1}$, which is inexpensive because $r$ is usually chosen to be far less than $\min\{m,n\}$.
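To see why this SVD step is cheap, note that $U^{k+1}\overline{D}^{k}$ has only $r$ columns, so a thin SVD costs $O(nr^{2})$ work, versus roughly $O(nm\min\{n,m\})$ for a full SVD of an $n\times m$ matrix. A small numpy sketch with illustrative sizes (the dimensions below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 2000, 20                      # illustrative sizes with r << n
U = rng.standard_normal((n, r))
D = np.diag(rng.uniform(0.5, 2.0, r))

# Thin SVD of the tall-skinny n x r product: only O(n r^2) work.
P, d, QT = np.linalg.svd(U @ D, full_matrices=False)
```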

From Remark 3.1, Algorithm 1 is well defined. For its iterate sequences $\{(U^{k},V^{k})\}_{k\in\mathbb{N}}$, $\{(\widehat{U}^{k},\widehat{V}^{k})\}_{k\in\mathbb{N}}$ and $\{(\overline{U}^{k},\overline{V}^{k})\}_{k\in\mathbb{N}}$, the following two propositions establish the relations among their column spaces and their nonzero column indices. Since the proof of Proposition 3.2 is similar to that of [24, Proposition 4.2 (iii)], we do not include it here.

Proposition 3.1

Let {(Uk,Vk,U^k,V^k,U¯k,V¯k,X^k,X¯k)}k\big{\{}(U^{k},V^{k},\widehat{U}^{k},\widehat{V}^{k},\overline{U}^{k},\overline{V}^{k},\widehat{X}^{k},\overline{X}^{k})\big{\}}_{k\in\mathbb{N}} be the sequence generated by Algorithm 1. Then, for every kk\in\mathbb{N}, the following inclusions hold

col(U¯k+1)col(U^k+1)col(Uk+1)andcol(V^k+2)col(V¯k+1)col(Vk+1).{\rm col}(\overline{U}^{k+1})\subset{\rm col}(\widehat{U}^{k+1})\subset{\rm col}(U^{k+1})\ \ {\rm and}\ \ {\rm col}(\widehat{V}^{k+2})\subset{\rm col}(\overline{V}^{k+1})\subset{\rm col}(V^{k+1}).

Proof: From equation (8a), col(X^k+1)col(Uk+1){\rm col}(\widehat{X}^{k+1})\subset{\rm col}({U}^{k+1}) and row(X^k+1)col(V¯k){\rm row}(\widehat{X}^{k+1})\subset{\rm col}(\overline{V}^{k}). While from step 2 of Algorithm 1, X^k+1=P^k+1(D^k+1)2(P¯kQ^k+1)\widehat{X}^{k+1}=\widehat{P}^{k+1}(\widehat{D}^{k+1})^{2}(\overline{P}^{k}\!\widehat{Q}^{k+1})^{\top}, by which it is easy to check that col(U^k+1)=col(X^k+1){\rm col}(\widehat{U}^{k+1})={\rm col}(\widehat{X}^{k+1}) and col(V^k+1)=row(X^k+1){\rm col}(\widehat{V}^{k+1})={\rm row}(\widehat{X}^{k+1}). Then,

col(U^k+1)col(Uk+1)andcol(V^k+1)col(V¯k).{\rm col}(\widehat{U}^{k+1})\subset{\rm col}(U^{k+1})\ \ {\rm and}\ \ {\rm col}(\widehat{V}^{k+1})\subset{\rm col}(\overline{V}^{k}). (11)

Similarly, from equation (8b), col(X¯k+1)col(U^k+1){\rm col}(\overline{X}^{k+1})\subset{\rm col}(\widehat{U}^{k+1}) and row(X¯k+1)col(Vk+1){\rm row}(\overline{X}^{k+1})\subset{\rm col}({V}^{k+1}). From step 4 of Algorithm 1, X¯k+1=P^k+1Q¯k+1(D¯k+1)2(P¯k+1)\overline{X}^{k+1}=\widehat{P}^{k+1}\overline{Q}^{k+1}(\overline{D}^{k+1})^{2}(\overline{P}^{k+1})^{\top}. Then, it holds that

col(U¯k+1)=col(X¯k+1)col(U^k+1)andcol(V¯k+1)=row(X¯k+1)col(Vk+1).{\rm col}(\overline{U}^{k+1})={\rm col}(\overline{X}^{k+1})\subset{\rm col}(\widehat{U}^{k+1})\ \ {\rm and}\ \ {\rm col}(\overline{V}^{k+1})={\rm row}(\overline{X}^{k+1})\subset{\rm col}({V}^{k+1}). (12)

From the above equations (11)-(12), we immediately obtain the desired result. \Box
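The inclusions used above can also be checked numerically: since $\widehat{X}^{k+1}=U^{k+1}(\overline{V}^{k})^{\top}$, every column of $\widehat{X}^{k+1}$ lies in ${\rm col}(U^{k+1})$, so orthogonal projection onto that column space leaves $\widehat{X}^{k+1}$ unchanged. A minimal sketch on random data (sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 12, 15, 4
U = rng.standard_normal((n, r))
Vbar = rng.standard_normal((m, r))
X = U @ Vbar.T                 # X = U Vbar^T, so col(X) is contained in col(U)

Q, _ = np.linalg.qr(U)         # orthonormal basis of col(U)
residual = np.linalg.norm(X - Q @ (Q.T @ X))   # zero iff col(X) subset of col(U)
```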

Proposition 3.2

Let {(Uk,Vk,U^k,V^k,U¯k,V¯k,X^k,X¯k)}k\big{\{}(U^{k}\!,V^{k}\!,\widehat{U}^{k}\!,\widehat{V}^{k}\!,\overline{U}^{k}\!,\overline{V}^{k}\!,\widehat{X}^{k}\!,\overline{X}^{k})\big{\}}_{k\in\mathbb{N}} be the sequence generated by Algorithm 1. Then, there exists k¯\overline{k}\in\mathbb{N} such that for all kk¯k\geq\overline{k},

JVk=JUk=JU^k=JV^k=JV¯k=JU¯k=JU¯k+1,\displaystyle J_{V^{k}}=J_{U^{k}}=J_{\widehat{U}^{k}}=J_{\widehat{V}^{k}}=J_{\overline{V}^{k}}\!=\!J_{\overline{U}^{k}}\!=\!J_{\overline{U}^{k+1}},\qquad\qquad\qquad (13)
rank(X¯k)=rank(X^k)=rank(U^k)=rank(V^k)=rank(U¯k)=rank(V¯k)=U¯k2,0,\displaystyle{\rm rank}(\overline{X}^{k})={\rm rank}(\widehat{X}^{k})\!=\!{\rm rank}(\widehat{U}^{k})\!=\!{\rm rank}(\widehat{V}^{k})\!=\!{\rm rank}(\overline{U}^{k})\!=\!{\rm rank}(\overline{V}^{k})=\|\overline{U}^{k}\|_{2,0}, (14)

and hence col(U¯k+1)=col(U^k+1)=col(Uk+1){\rm col}(\overline{U}^{k+1})\!=\!{\rm col}(\widehat{U}^{k+1})\!=\!{\rm col}(U^{k+1}) and col(V^k+2)=col(V¯k+1)=col(Vk+1){\rm col}(\widehat{V}^{k+2})\!=\!{\rm col}(\overline{V}^{k+1})\!=\!{\rm col}(V^{k+1}).

4 Convergence analysis of Algorithm 1

This section will establish the convergence of the objective value sequence and the iterate sequence generated by Algorithm 1 under additional conditions for the function θ\theta.

4.1 Convergence of objective value sequence

To achieve the convergence of sequences {Φλ,μ(U¯k,V¯k)}k\{\Phi_{\lambda,\mu}(\overline{U}^{k},\overline{V}^{k})\}_{k\in\mathbb{N}} and {Φλ,μ(U^k,V^k)}k\{\Phi_{\lambda,\mu}(\widehat{U}^{k},\widehat{V}^{k})\}_{k\in\mathbb{N}}, we require the following assumption on ϑ\vartheta or θ\theta.

Assumption 1

For any given Xn×mX\!\in\mathbb{R}^{n\times m} of rank(X)κ{\rm rank}(X)\!\leq\!\kappa and (P,Q)𝕆n,m(X)(P,Q)\!\in\!\mathbb{O}^{n,m}(X), the factor pair (U¯,V¯)=(P1[Σκ(X)]12,Q1[Σκ(X)]12)(\overline{U},\overline{V})=\!(P_{1}[\Sigma_{\kappa}(X)]^{\frac{1}{2}},Q_{1}[\Sigma_{\kappa}(X)]^{\frac{1}{2}}) satisfies

ϑ(U¯)+ϑ(V¯)=inf(U,V)n×κ×m×κ{ϑ(U)+ϑ(V)s.t.X=UV},\vartheta(\overline{U})+\vartheta(\overline{V})=\inf_{(U,V)\in\mathbb{R}^{n\times\kappa}\times\mathbb{R}^{m\times\kappa}}\Big{\{}\vartheta(U)+\vartheta(V)\ \ {\rm s.t.}\ \ X=UV^{\top}\Big{\}},

where $P_{1}$ and $Q_{1}$ are the submatrices consisting of the first $\kappa$ columns of $P$ and $Q$, respectively.

Assumption 1 is rather mild, and it is satisfied by the function $\vartheta$ associated with several common sparsity-promoting choices of $\theta$. For example, for $\theta=\theta_{1}$ and $\theta_{6}$, Assumption 1 holds by [5]; for $\theta=\theta_{2}$, it holds due to [21, Lemma 1]; and for $\theta=\theta_{3}$-$\theta_{5}$, it holds by Proposition 2.1.

Under Assumption 1, by following a proof similar to that of [24, Proposition 4.2 (i)], we can establish the convergence of the objective value sequences, which is stated as follows.

Theorem 4.1

Let {(Uk,Vk,U^k,V^k,U¯k,V¯k,X^k,X¯k)}k\big{\{}(U^{k},V^{k},\widehat{U}^{k},\widehat{V}^{k},\overline{U}^{k},\overline{V}^{k},\widehat{X}^{k},\overline{X}^{k})\big{\}}_{k\in\mathbb{N}} be the sequence generated by Algorithm 1, and write γ¯:=min{γ1¯,γ2¯}\underline{\gamma}:=\min\{\underline{\gamma_{1}},\underline{\gamma_{2}}\}. Then, under Assumption 1, for each kk\in\mathbb{N},

Φλ,μ(U¯k,V¯k)\displaystyle\Phi_{\lambda,\mu}(\overline{U}^{k},\overline{V}^{k}) Φλ,μ(U^k+1,V^k+1)+(γ¯/2)Uk+1U¯kF2\displaystyle\geq{\Phi}_{\lambda,\mu}(\widehat{U}^{k+1},\widehat{V}^{k+1})+(\underline{\gamma}/2)\|U^{k+1}-\overline{U}^{k}\|_{F}^{2} (15)
Φλ,μ(U¯k+1,V¯k+1)+γ¯2(Uk+1U¯kF2+Vk+1V^k+1F2),\displaystyle\geq\Phi_{\lambda,\mu}(\overline{U}^{k+1},\overline{V}^{k+1})+\frac{\underline{\gamma}}{2}\big{(}\|U^{k+1}\!-\overline{U}^{k}\|_{F}^{2}+\|V^{k+1}\!-\widehat{V}^{k+1}\|_{F}^{2}\big{)}, (16)

so $\{\Phi_{\lambda,\mu}(\overline{U}^{k},\overline{V}^{k})\}_{k\in\mathbb{N}}$ and $\{\Phi_{\lambda,\mu}(\widehat{U}^{k},\widehat{V}^{k})\}_{k\in\mathbb{N}}$ converge to the same point, say, $\varpi^{*}$.

As a direct consequence of Theorem 4.1, we have the following conclusion.

Corollary 4.1

Let {(Uk,Vk,U^k,V^k,U¯k,V¯k,X^k,X¯k)}k\big{\{}(U^{k}\!,V^{k}\!,\widehat{U}^{k}\!,\widehat{V}^{k}\!,\overline{U}^{k}\!,\overline{V}^{k}\!,\widehat{X}^{k}\!,\overline{X}^{k})\big{\}}_{k\in\mathbb{N}} be the sequence given by Algorithm 1. Then, under Assumption 1, the following assertions are true.

  • (i)

    limkUk+1U¯kF=0\lim_{k\to\infty}\|U^{k+1}-\overline{U}^{k}\|_{F}=0 and limkVk+1V^k+1F=0\lim_{k\to\infty}\|V^{k+1}-\widehat{V}^{k+1}\|_{F}=0;

  • (ii)

    the sequence {(Uk,Vk,U^k,V^k,U¯k,V¯k,X^k,X¯k)}k\big{\{}(U^{k},V^{k},\widehat{U}^{k},\widehat{V}^{k},\overline{U}^{k},\overline{V}^{k},\widehat{X}^{k},\overline{X}^{k})\big{\}}_{k\in\mathbb{N}} is bounded;

  • (iii)

    by letting β¯=supkmax{V¯k,U^k}\overline{\beta}=\sup_{k\in\mathbb{N}}\max\big{\{}\|\overline{V}^{k}\|,\|\widehat{U}^{k}\|\big{\}}, for each kk\in\mathbb{N},

    Φλ,μ(U¯k,V¯k)\displaystyle\Phi_{\lambda,\mu}(\overline{U}^{k},\overline{V}^{k}) Φλ,μ(U¯k+1,V¯k+1)+γ¯4(Uk+1U¯kF2+Vk+1V^k+1F2)\displaystyle\geq{\Phi}_{\lambda,\mu}(\overline{U}^{k+1},\overline{V}^{k+1})\!+\!\frac{\underline{\gamma}}{4}\big{(}\|U^{k+1}-\overline{U}^{k}\|_{F}^{2}\!+\!\|V^{k+1}\!-\widehat{V}^{k+1}\|_{F}^{2}\big{)}
    +γ¯8β¯2(X^k+1X¯kF2+X¯k+1X^k+1F2)+γ¯16β¯2X¯kX¯k+1F2;\displaystyle\quad\!+\!\frac{\underline{\gamma}}{8\overline{\beta}^{2}}\big{(}\|\widehat{X}^{k+1}\!-\!\overline{X}^{k}\|^{2}_{F}\!+\!\|\overline{X}^{k+1}\!\!-\!\widehat{X}^{k+1}\|^{2}_{F}\big{)}\!+\!\frac{\underline{\gamma}}{16\overline{\beta}^{2}}\|\overline{X}^{k}\!\!\!-\!\overline{X}^{k+1}\|^{2}_{F};
  • (iv)

    limkX^k+1X¯kF=0\lim_{k\rightarrow\infty}\|\widehat{X}^{k+1}-\overline{X}^{k}\|_{F}=0 and limkX¯k+1X^k+1F=0\lim_{k\rightarrow\infty}\|\overline{X}^{k+1}-\widehat{X}^{k+1}\|_{F}=0.

Proof: (i)-(ii) Part (i) is obvious by Theorem 4.1, so it suffices to prove part (ii). By Theorem 4.1, Φλ,μ(U¯k+1,V¯k+1)Φλ,μ(U^k+1,V^k+1)Φλ,μ(U¯0,V¯0){\Phi}_{\lambda,\mu}(\overline{U}^{k+1},\overline{V}^{k+1})\leq{\Phi}_{\lambda,\mu}(\widehat{U}^{k+1},\widehat{V}^{k+1})\leq{\Phi}_{\lambda,\mu}(\overline{U}^{0},\overline{V}^{0}) for each kk\in\mathbb{N}. Recall that ff is lower bounded, so the function Φλ,μ{\Phi}_{\lambda,\mu} is coercive. Thus, the sequence {(U^k,V^k,U¯k,V¯k)}k\big{\{}(\widehat{U}^{k},\widehat{V}^{k},\overline{U}^{k},\overline{V}^{k})\big{\}}_{k\in\mathbb{N}} is bounded. Along with part (i), the sequence {(Uk,Vk)}k\{(U^{k},V^{k})\}_{k\in\mathbb{N}} is bounded. The boundedness of {(U^k,V^k,U¯k,V¯k)}k\big{\{}(\widehat{U}^{k},\widehat{V}^{k},\overline{U}^{k},\overline{V}^{k})\big{\}}_{k\in\mathbb{N}} implies that of {(X^k,X¯k)}k\{(\widehat{X}^{k},\overline{X}^{k})\}_{k\in\mathbb{N}} because X^k=U^k(V^k)\widehat{X}^{k}=\widehat{U}^{k}(\widehat{V}^{k})^{\top} and X¯k=U¯k(V¯k)\overline{X}^{k}=\overline{U}^{k}(\overline{V}^{k})^{\top} for each kk by equations (8a)-(8b).

(iii) Fix any kk\in\mathbb{N}. From equations (8a)-(8b), it holds that

X^k+1X¯kF=Uk+1(V¯k)U¯k(V¯k)FV¯kUk+1U¯kF,\displaystyle\!\|\widehat{X}^{k+1}-\overline{X}^{k}\|_{F}=\big{\|}U^{k+1}(\overline{V}^{k})^{\top}-\overline{U}^{k}(\overline{V}^{k})^{\top}\big{\|}_{F}\leq\|\overline{V}^{k}\|\|U^{k+1}-\overline{U}^{k}\|_{F},\qquad
X¯k+1X^k+1F=U^k+1(Vk+1)U^k+1(V^k+1)FU^k+1Vk+1V^k+1F.\displaystyle\!\|\overline{X}^{k+1}\!-\widehat{X}^{k+1}\|_{F}=\big{\|}\widehat{U}^{k+1}({V}^{k+1})^{\top}\!-\widehat{U}^{k+1}(\widehat{V}^{k+1})^{\top}\big{\|}_{F}\leq\|\widehat{U}^{k+1}\|\|V^{k+1}-\widehat{V}^{k+1}\|_{F}.

Combining these two inequalities with the definition of β¯\overline{\beta} and Theorem 4.1 leads to

Φλ,μ(U¯k,V¯k)\displaystyle\Phi_{\lambda,\mu}(\overline{U}^{k},\overline{V}^{k}) Φλ,μ(U¯k+1,V¯k+1)+(γ¯/4)(Uk+1U¯kF2+Vk+1V^k+1F2)\displaystyle\geq{\Phi}_{\lambda,\mu}(\overline{U}^{k+1},\overline{V}^{k+1})+({\underline{\gamma}}/{4})\big{(}\|U^{k+1}-\overline{U}^{k}\|_{F}^{2}+\|V^{k+1}-\widehat{V}^{k+1}\|_{F}^{2}\big{)}
+γ¯4β¯2(X^k+1X¯kF2+X¯k+1X^k+1F2).\displaystyle\quad+\frac{\underline{\gamma}}{4\overline{\beta}^{2}}\big{(}\|\widehat{X}^{k+1}-\overline{X}^{k}\|^{2}_{F}+\|\overline{X}^{k+1}-\widehat{X}^{k+1}\|^{2}_{F}\big{)}.

Along with 2X^k+1X¯kF2+2X¯k+1X^k+1F2X¯kX¯k+1F22\|\widehat{X}^{k+1}-\overline{X}^{k}\|_{F}^{2}+2\|\overline{X}^{k+1}\!-\!\widehat{X}^{k+1}\|_{F}^{2}\geq\|\overline{X}^{k}\!-\!\overline{X}^{k+1}\|_{F}^{2}, we get the result.

(iv) The result follows by part (iii) and the convergence of {Φλ,μ(U¯k,V¯k)}k\{\Phi_{\lambda,\mu}(\overline{U}^{k},\overline{V}^{k})\}_{k\in\mathbb{N}}. \Box
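Both ingredients of the step in part (iii), the bound $\|(U-\overline{U})\overline{V}^{\top}\|_{F}\leq\|\overline{V}\|\|U-\overline{U}\|_{F}$ (spectral norm times Frobenius norm) and the inequality $2\|A\|_{F}^{2}+2\|B\|_{F}^{2}\geq\|A+B\|_{F}^{2}$ with $A=\widehat{X}^{k+1}-\overline{X}^{k}$ and $B=\overline{X}^{k+1}-\widehat{X}^{k+1}$, are elementary; a quick numerical sanity check on random data (sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, r = 10, 14, 3
U, Ubar = rng.standard_normal((n, r)), rng.standard_normal((n, r))
Vbar = rng.standard_normal((m, r))

# ||U Vbar^T - Ubar Vbar^T||_F <= ||Vbar||_2 * ||U - Ubar||_F
lhs = np.linalg.norm((U - Ubar) @ Vbar.T)
rhs = np.linalg.norm(Vbar, 2) * np.linalg.norm(U - Ubar)

# 2||A||_F^2 + 2||B||_F^2 >= ||A + B||_F^2 (parallelogram-type inequality)
A, B = rng.standard_normal((n, m)), rng.standard_normal((n, m))
gap = 2 * np.linalg.norm(A) ** 2 + 2 * np.linalg.norm(B) ** 2 - np.linalg.norm(A + B) ** 2
```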

Remark 4.1

By Corollary 4.1 (ii) and Remark 3.1 (c), there exists a constant β^>0\widehat{\beta}>0 such that for all kk\in\mathbb{N} and i[r]i\in[r], (μ+γ¯)(Λiik)2β^(\mu+\underline{\gamma})\leq(\Lambda_{ii}^{k})^{2}\leq\widehat{\beta} and (μ+γ¯)(Δiik)2β^(\mu+\underline{\gamma})\leq(\Delta_{ii}^{k})^{2}\leq\widehat{\beta}, where Λk\Lambda^{k} and Δk\Delta^{k} are the diagonal matrices appearing in Remark 3.1 (c).

4.2 Subsequence convergence of iterate sequence

For convenience, for each kk\in\mathbb{N}, write Wk:=(U^k,V^k,U¯k,V¯k,X^k,X¯k)W^{k}\!:=\big{(}\widehat{U}^{k},\widehat{V}^{k},\overline{U}^{k},\overline{V}^{k},\widehat{X}^{k},\overline{X}^{k}\big{)}. The following theorem shows that every accumulation point of {Wk}k\{W^{k}\}_{k\in\mathbb{N}} is a stationary point of problem (1).

Theorem 4.2

Under Assumption 1, the following assertions hold true.

  • (i)

    The accumulation point set 𝒲\mathcal{W}^{*} of {Wk}k\{W^{k}\}_{k\in\mathbb{N}} is nonempty and compact.

  • (ii)

    For each W=(U^,V^,U¯,V¯,X^,X¯)𝒲W=(\widehat{U},\widehat{V},\overline{U},\overline{V},\widehat{X},\overline{X})\in\mathcal{W}^{*}, it holds that U^V^=X^=X¯=U¯V¯\widehat{U}\widehat{V}^{\top}=\widehat{X}=\overline{X}=\overline{U}\overline{V}^{\top} with rank(X¯)=rank(X^)=U¯2,0=U^2,0{\rm rank}(\overline{X})\!=\!{\rm rank}(\widehat{X})\!=\!\|\overline{U}\|_{2,0}\!=\!\|\widehat{U}\|_{2,0}, (U^,V^)=(P^[Σr(X^)]12,R^[Σr(X^)]12)(\widehat{U}\!,\widehat{V})\!=\!\big{(}\widehat{P}[\Sigma_{r}(\widehat{X})]^{\frac{1}{2}},\widehat{R}[\Sigma_{r}(\widehat{X})]^{\frac{1}{2}}\big{)} for some P^𝕆n×r\widehat{P}\in\mathbb{O}^{n\times r} and R^𝕆m×r\widehat{R}\in\mathbb{O}^{m\times r} such that X^=P^Σr(X^)R^\widehat{X}\!=\!\widehat{P}\Sigma_{r}(\widehat{X})\widehat{R}^{\top}, and (U¯,V¯)=(R¯[Σr(X¯)]12,Q¯[Σr(X¯)]12)(\overline{U},\overline{V})\!=\!\big{(}\overline{R}[\Sigma_{r}(\overline{X})]^{\frac{1}{2}}\!,\overline{Q}[\Sigma_{r}(\overline{X})]^{\frac{1}{2}}\big{)} for some R¯𝕆n×r,Q¯𝕆m×r\overline{R}\!\in\!\mathbb{O}^{n\times r},\overline{Q}\!\in\!\mathbb{O}^{m\times r} with X¯=R¯Σr(X¯)Q¯\overline{X}\!=\!\overline{R}\Sigma_{r}(\overline{X})\overline{Q}^{\top}.

  • (iii)

    For each $W=(\widehat{U},\widehat{V},\overline{U},\overline{V},\widehat{X},\overline{X})\in\mathcal{W}^{*}$, the factor pairs $(\overline{U},\overline{V})$ and $(\widehat{U},\widehat{V})$ are stationary points of (1), and $\Phi_{\lambda,\mu}(\overline{U},\overline{V})=\Phi_{\lambda,\mu}(\widehat{U},\widehat{V})=\varpi^{*}$.

Proof: Part (i) is immediate by Corollary 4.1 (ii). We next take a closer look at part (ii). Pick any W=(U^,V^,U¯,V¯,X^,X¯)𝒲W=(\widehat{U},\widehat{V},\overline{U},\overline{V},\widehat{X},\overline{X})\in\mathcal{W}^{*}. Then, there exists an index set 𝒦\mathcal{K}\subset\mathbb{N} such that lim𝒦kWk=W\lim_{\mathcal{K}\ni k\to\infty}W^{k}=W. From steps 2 and 4 of Algorithm 1, it follows that

U^=lim𝒦kU^k+1=lim𝒦kP^k+1D^k+1andV^=lim𝒦kV^k+1=lim𝒦kR^k+1D^k+1,\displaystyle\!\widehat{U}\!=\!\lim_{\mathcal{K}\ni k\to\infty}\widehat{U}^{k+1}\!=\!\lim_{\mathcal{K}\ni k\to\infty}\widehat{P}^{k+1}\widehat{D}^{k+1}\ \ {\rm and}\ \ \widehat{V}\!=\!\lim_{\mathcal{K}\ni k\to\infty}\widehat{V}^{k+1}\!=\!\lim_{\mathcal{K}\ni k\to\infty}\widehat{R}^{k+1}\widehat{D}^{k+1}, (17)
U¯=lim𝒦kU¯k+1=lim𝒦kR¯k+1D¯k+1andV¯=lim𝒦kV¯k+1=lim𝒦kP¯k+1D¯k+1\displaystyle\overline{U}=\lim_{\mathcal{K}\ni k\to\infty}\overline{U}^{k+1}=\lim_{\mathcal{K}\ni k\to\infty}\overline{R}^{k+1}\overline{D}^{k+1}\ \ {\rm and}\ \ \overline{V}=\lim_{\mathcal{K}\ni k\to\infty}\overline{V}^{k+1}=\lim_{\mathcal{K}\ni k\to\infty}\overline{P}^{k+1}\overline{D}^{k+1} (18)

with R^k+1=P¯kQ^k+1\widehat{R}^{k+1}\!=\!\overline{P}^{k}\widehat{Q}^{k+1} and R¯k+1=P^k+1Q¯k+1\overline{R}^{k+1}\!=\!\widehat{P}^{k+1}\overline{Q}^{k+1} for each kk. Note that {R^k+1}k𝕆m×r\{\widehat{R}^{k+1}\}_{k\in\mathbb{N}}\subset\mathbb{O}^{m\times r} and {P^k+1}k𝕆n×r\{\widehat{P}^{k+1}\}_{k\in\mathbb{N}}\subset\mathbb{O}^{n\times r}. By the compactness of 𝕆l×r\mathbb{O}^{l\times r}, there exists an index set 𝒦1𝒦\mathcal{K}_{1}\subset\mathcal{K} such that the sequences {R^k+1}k𝒦1\{\widehat{R}^{k+1}\}_{k\in\mathcal{K}_{1}} and {P^k+1}k𝒦1\{\widehat{P}^{k+1}\}_{k\in\mathcal{K}_{1}} are convergent, i.e., there are R^𝕆m×r\widehat{R}\in\mathbb{O}^{m\times r} and P^𝕆n×r\widehat{P}\in\mathbb{O}^{n\times r} such that lim𝒦1kR^k+1=R^\lim_{\mathcal{K}_{1}\ni k\to\infty}\widehat{R}^{k+1}=\widehat{R} and lim𝒦1kP^k+1=P^\lim_{\mathcal{K}_{1}\ni k\to\infty}\widehat{P}^{k+1}=\widehat{P}. Together with lim𝒦kD^k+1=[Σr(X^)]1/2\lim_{\mathcal{K}\ni k\to\infty}\widehat{D}^{k+1}=[\Sigma_{r}(\widehat{X})]^{{1}/{2}} and the above equation (17), we obtain

U^=lim𝒦1kP^k+1D^k+1=P^[Σr(X^)]12andV^=lim𝒦1kR^k+1D^k+1=R^[Σr(X^)]12.\widehat{U}=\lim_{\mathcal{K}_{1}\ni k\to\infty}\widehat{P}^{k+1}\widehat{D}^{k+1}=\widehat{P}[\Sigma_{r}(\widehat{X})]^{\frac{1}{2}}\ \ {\rm and}\ \ \widehat{V}=\lim_{\mathcal{K}_{1}\ni k\to\infty}\widehat{R}^{k+1}\widehat{D}^{k+1}=\widehat{R}[\Sigma_{r}(\widehat{X})]^{\frac{1}{2}}. (19)

Along with $\widehat{X}=\lim_{\mathcal{K}\ni k\to\infty}\widehat{X}^{k}=\lim_{\mathcal{K}\ni k\to\infty}\widehat{U}^{k}(\widehat{V}^{k})^{\top}=\widehat{U}\widehat{V}^{\top}$, we have $\widehat{X}=\widehat{P}\Sigma_{r}(\widehat{X})\widehat{R}^{\top}$. By using (18) and similar arguments, there are $\overline{R}\in\mathbb{O}^{n\times r}$ and $\overline{Q}\in\mathbb{O}^{m\times r}$ such that $\overline{U}=\overline{R}[\Sigma_{r}(\overline{X})]^{1/2}$ and $\overline{V}=\overline{Q}[\Sigma_{r}(\overline{X})]^{1/2}$. Along with $\overline{X}=\lim_{\mathcal{K}\ni k\to\infty}\overline{X}^{k}=\lim_{\mathcal{K}\ni k\to\infty}\overline{U}^{k}(\overline{V}^{k})^{\top}=\overline{U}\,\overline{V}^{\top}$, we obtain $\overline{X}=\overline{R}\Sigma_{r}(\overline{X})\overline{Q}^{\top}$. By Corollary 4.1 (iv), $0=\lim_{\mathcal{K}\ni k\to\infty}\|\widehat{X}^{k}-\overline{X}^{k}\|_{F}=\|\widehat{X}-\overline{X}\|_{F}$.

(iii) Pick any $W=(\widehat{U},\widehat{V},\overline{U},\overline{V},\widehat{X},\overline{X})\in\mathcal{W}^{*}$. There exists an index set $\mathcal{K}\subset\mathbb{N}$ such that $\lim_{\mathcal{K}\ni k\to\infty}W^{k}=W$. We first claim that the following two limits hold:

lim𝒦kϑ(Uk+1)=ϑ(U¯)andlim𝒦kϑ(Vk+1)=ϑ(V^).\lim_{\mathcal{K}\ni k\to\infty}\vartheta(U^{k+1})=\vartheta(\overline{U})\ \ {\rm and}\ \ \lim_{\mathcal{K}\ni k\to\infty}\vartheta(V^{k+1})=\vartheta(\widehat{V}). (20)

Indeed, for each kk\in\mathbb{N}, by the definition of Uk+1U^{k+1} in step 1 and the expression of Φ^λ,μ\widehat{\Phi}_{\lambda,\mu},

F^(Uk+1,V¯k,U¯k,V¯k)+μ2Uk+1F2+λϑ(Uk+1)+γ1,k2Uk+1U¯kF2\displaystyle\widehat{F}(U^{k+1},\overline{V}^{k},\overline{U}^{k},\overline{V}^{k})+\frac{\mu}{2}\|{U}^{k+1}\|_{F}^{2}+\lambda\vartheta({U}^{k+1})+\frac{\gamma_{1,k}}{2}\|U^{k+1}-\overline{U}^{k}\|_{F}^{2}
F^(U¯,V¯k,U¯k,V¯k)+μ2U¯F2+λϑ(U¯)+γ1,k2U¯U¯kF2.\displaystyle\leq\widehat{F}(\overline{U},\overline{V}^{k},\overline{U}^{k},\overline{V}^{k})+\frac{\mu}{2}\|\overline{U}\|_{F}^{2}+\lambda\vartheta(\overline{U})+\frac{\gamma_{1,k}}{2}\|\overline{U}-\overline{U}^{k}\|_{F}^{2}.

From Corollary 4.1 (i) and $\lim_{\mathcal{K}\ni k\to\infty}\overline{U}^{k}=\overline{U}$, we have $\lim_{\mathcal{K}\ni k\to\infty}U^{k+1}=\overline{U}$. Now passing the limit $\mathcal{K}\ni k\to\infty$ to the above inequality and using $\lim_{\mathcal{K}\ni k\to\infty}W^{k}=W$, Corollary 4.1 (i) and the continuity of $\widehat{F}$ yields $\limsup_{\mathcal{K}\ni k\to\infty}\vartheta(U^{k+1})\leq\vartheta(\overline{U})$. Along with the lower semicontinuity of $\vartheta$, we get $\lim_{\mathcal{K}\ni k\to\infty}\vartheta(U^{k+1})=\vartheta(\overline{U})$. Similarly, for each $k\in\mathbb{N}$, by the definition of $V^{k+1}$ in step 3 and the expression of $\widehat{\Phi}_{\lambda,\mu}$,

\displaystyle\widehat{F}(\widehat{U}^{k+1},\widehat{V}^{k+1},\widehat{U}^{k+1},\widehat{V}^{k+1})+\frac{\mu}{2}\|V^{k+1}\|_{F}^{2}+\lambda\vartheta(V^{k+1})+\frac{\gamma_{2,k}}{2}\|V^{k+1}-\widehat{V}^{k+1}\|_{F}^{2}
\displaystyle\leq\widehat{F}(\widehat{U}^{k+1},\widehat{V},\widehat{U}^{k+1},\widehat{V}^{k+1})+\frac{\mu}{2}\|\widehat{V}\|_{F}^{2}+\lambda\vartheta(\widehat{V})+\frac{\gamma_{2,k}}{2}\|V^{k+1}-\widehat{V}\|_{F}^{2}.

Following the same arguments as above leads to the second limit in (20).

Now passing the limit 𝒦k\mathcal{K}\ni k\to\infty to the inclusions in Remark 3.1 (b) and using (20) and Corollary 4.1 (i) and (iv) results in the following inclusions

0f(X¯)V¯+μU¯+λϑ(U¯)and 0f(X^)U^+μV^+λϑ(V^).0\in\nabla f(\overline{X})\overline{V}+\mu\overline{U}+\lambda\partial\vartheta(\overline{U})\ \ {\rm and}\ \ 0\in\nabla\!f(\widehat{X})^{\top}\widehat{U}+\mu\widehat{V}+\lambda\partial\vartheta(\widehat{V}).

Let r¯=U¯2,0,J=[r¯]\overline{r}\!\!=\!\!\|\overline{U}\|_{2,0},J\!\!=\!\![\overline{r}] and J¯=[r]J\overline{J}\!=\![r]\setminus J. By part (ii), U¯J¯=U^J¯=0\overline{U}_{\overline{J}}\!=\!\widehat{U}_{\overline{J}}\!\!=\!\!0 and V¯J¯=V^J¯=0\overline{V}_{\overline{J}}\!\!=\!\!\widehat{V}_{\overline{J}}\!\!=\!\!0. By Lemma 2.2 and the above inclusions, 0Γ:=yθ(0)(y)(0)0\in\Gamma\!:=\bigcup_{y\in\partial\theta(0)}\partial(y\|\cdot\|)(0) and for each jJj\in J,

0=f(X¯)V¯j+μU¯j+λθ(U¯j)U¯j1U¯j,\displaystyle 0=\nabla f(\overline{X})\overline{V}_{\!j}+\mu\overline{U}_{j}+\lambda\theta^{\prime}(\|\overline{U}_{\!j}\|)\|\overline{U}_{j}\|^{-1}\overline{U}_{j},
0=f(X^)U^j+μV^j+λθ(V^j)V^j1V^j.\displaystyle 0=\nabla f(\widehat{X})^{\top}\widehat{U}_{j}+\mu\widehat{V}_{j}+\lambda\theta^{\prime}(\|\widehat{V}_{j}\|)\|\widehat{V}_{j}\|^{-1}\widehat{V}_{j}.

By part (ii), U¯j=V^j=σj(X^)1/2=σj(X¯)1/2\|\overline{U}_{\!j}\|=\|\widehat{V}_{j}\|=\sigma_{j}(\widehat{X})^{1/2}=\sigma_{j}(\overline{X})^{1/2} for each jJj\in J, which implies that

Diag(U¯1,,U¯r¯)=Σ11/2andDiag(V^1,,V^r¯)=Σ11/2{\rm Diag}\big{(}\|\overline{U}_{1}\|,\ldots,\|\overline{U}_{\overline{r}}\|\big{)}=\Sigma_{1}^{1/2}\ \ {\rm and}\ \ {\rm Diag}\big{(}\|\widehat{V}_{1}\|,\ldots,\|\widehat{V}_{\overline{r}}\|\big{)}=\Sigma_{1}^{1/2}

with Σ1=Diag(σ1(X¯),,σr¯(X¯))\Sigma_{1}={\rm Diag}(\sigma_{1}(\overline{X}),\ldots,\sigma_{\overline{r}}(\overline{X})). Write Λ:=Diag(θ(U¯1),,θ(U¯r¯))\Lambda\!:={\rm Diag}(\theta^{\prime}(\|\overline{U}_{1}\|),\ldots,\theta^{\prime}(\|\overline{U}_{\overline{r}}\|)). Then, the above two equalities for all jJj\in J can be compactly written as

0=f(X¯)V¯J+μU¯J+λU¯JΣ11/2Λ,\displaystyle 0=\nabla f(\overline{X})\overline{V}_{\!J}+\mu\overline{U}_{\!J}+\lambda\overline{U}_{\!J}\Sigma_{1}^{-1/2}\Lambda, (22a)
0=f(X^)U^J+μV^J+λV^JΣ11/2Λ.\displaystyle 0=\nabla f(\widehat{X})^{\top}\widehat{U}_{\!J}+\mu\widehat{V}_{J}+\lambda\widehat{V}_{\!J}\Sigma_{1}^{-1/2}\Lambda. (22b)

By part (ii), Σr(X^)=Σr(X¯):=Σr\Sigma_{r}(\widehat{X})=\!\Sigma_{r}(\overline{X})\!:=\Sigma_{r} and P^ΣrR^=U^V^=X^=X¯=U¯V¯=R¯ΣrQ¯\widehat{P}\Sigma_{r}\widehat{R}^{\top}\!=\widehat{U}\widehat{V}^{\top}\!=\widehat{X}=\overline{X}=\overline{U}\overline{V}^{\top}=\overline{R}\Sigma_{r}\overline{Q}^{\top}. Recall that P^,R¯𝕆n×r\widehat{P},\overline{R}\in\mathbb{O}^{n\times r} and R^,Q¯𝕆m×r\widehat{R},\overline{Q}\in\mathbb{O}^{m\times r}. There exist P^,R¯𝕆n×(nr)\widehat{P}^{\perp},\overline{R}^{\perp}\in\mathbb{O}^{n\times(n-r)} and R^,Q¯𝕆m×(mr)\widehat{R}^{\perp},\overline{Q}^{\perp}\in\mathbb{O}^{m\times(m-r)} such that [P^P^],[R¯R¯]𝕆n[\widehat{P}\ \ \widehat{P}^{\perp}],[\overline{R}\ \ \overline{R}^{\perp}]\in\mathbb{O}^{n}, [R^R^],[Q¯Q¯]𝕆m[\widehat{R}\ \ \widehat{R}^{\perp}],[\overline{Q}\ \ \overline{Q}^{\perp}]\in\mathbb{O}^{m} and

[P^P^](Σr000)[R^R^]=[R¯R¯](Σr000)[Q¯Q¯].\big{[}\widehat{P}\ \ \widehat{P}^{\perp}\big{]}\begin{pmatrix}\Sigma_{r}&0\\ 0&0\end{pmatrix}\big{[}\widehat{R}\ \ \widehat{R}^{\perp}\big{]}^{\top}=\big{[}\overline{R}\ \ \overline{R}^{\perp}\big{]}\begin{pmatrix}\Sigma_{r}&0\\ 0&0\end{pmatrix}\big{[}\overline{Q}\ \ \overline{Q}^{\perp}\big{]}^{\top}.

Let $\overline{\mu}_{1}>\cdots>\overline{\mu}_{\kappa}$ be the distinct singular values of $\overline{X}=\widehat{X}$. For each $l\in[\kappa]$, write $\alpha_{l}:=\{j\in[n]\,|\,\sigma_{j}(\overline{X})=\overline{\mu}_{l}\}$. From the above equality and [11, Proposition 5], there is a block diagonal orthogonal matrix $\widetilde{Q}={\rm BlkDiag}(\widetilde{Q}_{1},\ldots,\widetilde{Q}_{\kappa})$ with $\widetilde{Q}_{l}\in\mathbb{O}^{|\alpha_{l}|}$ such that

[P^P^]=[R¯R¯]Q~,[R^R^]=[Q¯Q¯]Q~andQ~(Σr000)=(Σr000)Q~.\big{[}\widehat{P}\ \ \widehat{P}^{\perp}\big{]}=\big{[}\overline{R}\ \ \overline{R}^{\perp}\big{]}\widetilde{Q},\ \big{[}\widehat{R}\ \ \widehat{R}^{\perp}\big{]}=\big{[}\overline{Q}\ \ \overline{Q}^{\perp}\big{]}\widetilde{Q}\ \ {\rm and}\ \ \widetilde{Q}\begin{pmatrix}\Sigma_{r}&0\\ 0&0\end{pmatrix}=\begin{pmatrix}\Sigma_{r}&0\\ 0&0\end{pmatrix}\widetilde{Q}.

Let Q~1=BlkDiag(Q~1,,Q~κ1)𝕆r¯\widetilde{Q}^{1}={\rm BlkDiag}(\widetilde{Q}_{1},\ldots,\widetilde{Q}_{\kappa-1})\in\mathbb{O}^{\overline{r}}, and let P^1\widehat{P}^{1} and R^1\widehat{R}^{1} be the matrices consisting of the first r¯\overline{r} columns of P^\widehat{P} and R^\widehat{R}. Then, P^1=R¯Q~1\widehat{P}^{1}=\overline{R}\widetilde{Q}^{1} and R^1=Q¯Q~1\widehat{R}^{1}=\overline{Q}\widetilde{Q}^{1}. Also, Q~1Σ1=Σ1Q~1\widetilde{Q}^{1}\Sigma_{1}=\Sigma_{1}\widetilde{Q}^{1}. Along with U^=P^Σr1/2,U¯=R¯Σr1/2\widehat{U}=\widehat{P}\Sigma_{r}^{1/2},\overline{U}=\overline{R}\Sigma_{r}^{1/2} and V^=R^Σr1/2,V¯=Q¯Σr1/2\widehat{V}\!=\widehat{R}\Sigma_{r}^{1/2},\overline{V}=\overline{Q}\Sigma_{r}^{1/2} by part (ii), we have U^J=P^1Σ11/2=R¯Q~1Σ11/2=R¯Σ11/2Q~1=U¯JQ~1\widehat{U}_{J}=\widehat{P}^{1}\Sigma_{1}^{1/2}=\overline{R}\widetilde{Q}^{1}\Sigma_{1}^{1/2}=\overline{R}\Sigma_{1}^{1/2}\widetilde{Q}^{1}=\overline{U}_{\!J}\widetilde{Q}^{1} and V^J=R^1Σ11/2=Q¯Q~1Σ11/2=Q¯Σ11/2Q~1=V¯JQ~1\widehat{V}_{J}=\widehat{R}^{1}\Sigma_{1}^{1/2}=\overline{Q}\widetilde{Q}^{1}\Sigma_{1}^{1/2}=\overline{Q}\Sigma_{1}^{1/2}\widetilde{Q}^{1}=\overline{V}_{\!J}\widetilde{Q}^{1}. Substituting U^J=U¯JQ~1,V^J=V¯JQ~1\widehat{U}_{J}=\overline{U}_{\!J}\widetilde{Q}^{1},\,\widehat{V}_{J}=\overline{V}_{\!J}\widetilde{Q}^{1} into (22a)-(22b) yields

f(X¯)V^J+μU^J+λU¯JΣ112ΛQ~1=0,\displaystyle\nabla f(\overline{X})\widehat{V}_{\!J}+\mu\widehat{U}_{\!J}+\lambda\overline{U}_{\!J}\Sigma_{1}^{-\frac{1}{2}}\Lambda\widetilde{Q}^{1}=0,
f(X¯)U¯J+μV¯J+λV^JΣ112Λ(Q~1)=0.\displaystyle\nabla f(\overline{X})^{\top}\overline{U}_{\!J}+\mu\overline{V}_{\!J}+\lambda\widehat{V}_{J}\Sigma_{1}^{-\frac{1}{2}}\Lambda(\widetilde{Q}^{1})^{\top}=0.

By the expressions of Q~1\widetilde{Q}^{1} and Λ\Lambda, we have ΛQ~1=Q~1Λ\Lambda\widetilde{Q}^{1}=\widetilde{Q}^{1}\Lambda and Λ(Q~1)=(Q~1)Λ\Lambda(\widetilde{Q}^{1})^{\top}=(\widetilde{Q}^{1})^{\top}\Lambda. Then Σ112ΛQ~1=Σ112Q~1Λ=Q~1Σ112Λ\Sigma_{1}^{-\frac{1}{2}}\Lambda\widetilde{Q}^{1}=\!\Sigma_{1}^{-\frac{1}{2}}\widetilde{Q}^{1}\Lambda=\!\widetilde{Q}^{1}\Sigma_{1}^{-\frac{1}{2}}\Lambda and Σ112Λ(Q~1)=Σ112(Q~1)Λ=(Q~1)Σ112Λ\Sigma_{1}^{-\frac{1}{2}}\Lambda(\widetilde{Q}^{1})^{\top}\!=\Sigma_{1}^{-\frac{1}{2}}(\widetilde{Q}^{1})^{\top}\Lambda\!=(\widetilde{Q}^{1})^{\top}\Sigma_{1}^{-\frac{1}{2}}\Lambda. Along with the above two equalities, U^J=U¯JQ~1\widehat{U}_{J}=\overline{U}_{\!J}\widetilde{Q}^{1} and V^J=V¯JQ~1\widehat{V}_{J}=\overline{V}_{\!J}\widetilde{Q}^{1}, we obtain

f(X¯)V^J+μU^J+λU^JΣ112Λ=0,\displaystyle\nabla f(\overline{X})\widehat{V}_{\!J}+\mu\widehat{U}_{\!J}+\lambda\widehat{U}_{\!J}\Sigma_{1}^{-\frac{1}{2}}\Lambda=0, (24a)
f(X¯)U¯J+μV¯J+λV¯JΣ112Λ=0.\displaystyle\nabla f(\overline{X})^{\top}\overline{U}_{\!J}+\mu\overline{V}_{\!J}+\lambda\overline{V}_{\!J}\Sigma_{1}^{-\frac{1}{2}}\Lambda=0. (24b)

Combining (24a), (22b) and $0\in\Gamma$ and invoking Definition 2.2, we conclude that $(\widehat{U},\widehat{V})$ is a stationary point of (1); similarly, combining (24b), (22a) and $0\in\Gamma$ and invoking Definition 2.2 shows that $(\overline{U},\overline{V})$ is a stationary point of (1).

By part (ii) and the expression of $\Phi_{\lambda,\mu}$, we have $\Phi_{\lambda,\mu}(\overline{U},\overline{V})=\Phi_{\lambda,\mu}(\widehat{U},\widehat{V})$, so it remains to argue that $\Phi_{\lambda,\mu}(\overline{U},\overline{V})=\varpi^{*}$. Using the convergence of $\{\Phi_{\lambda,\mu}(\overline{U}^{k},\overline{V}^{k})\}_{k\in\mathbb{N}}$ and $\{\Phi_{\lambda,\mu}(\widehat{U}^{k},\widehat{V}^{k})\}_{k\in\mathbb{N}}$, equation (20) and the continuity of $\widehat{F}$ yields $\lim_{k\to\infty}\Phi_{\lambda,\mu}(\overline{U}^{k+1},\overline{V}^{k+1})=\Phi_{\lambda,\mu}(\overline{U},\overline{V})$ and $\lim_{k\to\infty}\Phi_{\lambda,\mu}(\widehat{U}^{k+1},\widehat{V}^{k+1})=\Phi_{\lambda,\mu}(\widehat{U},\widehat{V})$. The result then follows from the arbitrariness of $W\in\mathcal{W}^{*}$. $\Box$

Remark 4.2

Theorems 4.1 and 4.2 provide the theoretical guarantee for Algorithm 1 to solve problem (1) with $\vartheta$ associated with $\theta_{1}$-$\theta_{6}$ in Table 1. These results also provide a theoretical certificate for softImpute-ALS (see [13, Algorithm 3.1]).

4.3 Convergence of iterate sequence

Assumption 2

Fix any λβ^1γλ(μ+γ¯)1\lambda\widehat{\beta}^{-1}\leq\gamma\leq\lambda(\mu\!+\!\underline{\gamma})^{-1} with β^\widehat{\beta} same as in Remark 4.1. There exists cp>0c_{p}>0 such that for any t0t\geq 0, either 𝒫γθ(t)={0}\mathcal{P}_{\!\gamma}\theta(t)=\{0\} or mins𝒫γθ(t)scp\min_{s\in\mathcal{P}_{\!\gamma}\theta(t)}s\geq c_{p}.

It is easy to check that Assumption 2 holds for θ=θ1\theta=\theta_{1} and θ4\theta_{4}-θ5\theta_{5}. Under Assumptions 1-2, we prove that the sequences {rank(X^k)}k\{{\rm rank}(\widehat{X}^{k})\}_{k\in\mathbb{N}} and {rank(X¯k)}k\{{\rm rank}(\overline{X}^{k})\}_{k\in\mathbb{N}} converge to the same limit.

Lemma 4.1

Under Assumptions 1-2, for each W=(U^,V^,U¯,V¯,X^,X¯)𝒲W=\!(\widehat{U},\widehat{V},\overline{U},\overline{V},\widehat{X},\overline{X})\in\mathcal{W}^{*}, when kk¯k\geq\overline{k}, U¯2,0:=r¯=rank(X^k)=rank(X¯k)=U^k2,0=U¯k2,0\|\overline{U}\|_{2,0}\!:=\!\overline{r}\!=\!{\rm rank}(\widehat{X}^{k})\!=\!{\rm rank}(\overline{X}^{k})\!=\!\|\widehat{U}^{k}\|_{2,0}\!=\!\|\overline{U}^{k}\|_{2,0} and max{σi(X¯k),σi(X^k)}=0\max\{\sigma_{i}(\overline{X}^{k}),\sigma_{i}(\widehat{X}^{k})\}\!=\!0 for i=r¯+1,,ri=\overline{r}\!+\!1,\ldots,r. There exist α>0\alpha>0 and k~k¯\widetilde{k}\!\geq\overline{k} such that min{σr¯(X¯k),σr¯(X^k)}α\min\{\sigma_{\overline{r}}(\overline{X}^{k}),\sigma_{\overline{r}}(\widehat{X}^{k})\}\geq\alpha for kk~k\geq\widetilde{k}.

Proof: By Proposition 3.2, for all kk¯k\geq\overline{k}, JU¯k+1=JU¯k:=JJ_{\overline{U}^{k+1}}=J_{\overline{U}^{k}}:=J, so that U¯k+1=[U¯Jk+1 0]\overline{U}^{k+1}=[\overline{U}_{\!J}^{k+1}\ 0]. By combining Remark 4.1 with equation (9) and using Assumption 2,

mini[|J|][U¯Jk+1]icpforallkk¯,\min_{i\in[|J|]}\big{\|}[\overline{U}_{\!J}^{k+1}]_{i}\big{\|}\geq c_{p}\quad{\rm for\ all}\ k\geq\overline{k},

which means that $\|\overline{U}\|_{2,0}\geq|J|$. In addition, the lower semicontinuity of $\|\cdot\|_{2,0}$ implies that $|J|=\lim_{k\to\infty}\|\overline{U}^{k}\|_{2,0}\geq\|\overline{U}\|_{2,0}$. Then, $\lim_{k\to\infty}\|\overline{U}^{k}\|_{2,0}=\|\overline{U}\|_{2,0}$. Together with Proposition 3.2, we obtain the first part of the conclusions. Suppose on the contrary that the second part does not hold. Then there exists an index set $\mathcal{K}_{2}\subset\mathbb{N}$ such that $\lim_{\mathcal{K}_{2}\ni k\to\infty}\sigma_{\overline{r}}(\overline{X}^{k})=0$. By the boundedness of $\{\overline{X}^{k}\}_{k\in\mathcal{K}_{2}}$ and the continuity of $\sigma_{\overline{r}}(\cdot)$, the sequence $\{\overline{X}^{k}\}_{k\in\mathbb{N}}$ has a cluster point, say $\widetilde{X}$, satisfying $\sigma_{\overline{r}}(\widetilde{X})=0$, i.e., ${\rm rank}(\widetilde{X})\leq\overline{r}-1$, a contradiction to the first part. $\Box$

Next we apply the sinΘ\sin\Theta theorem (see [12]) to establish a crucial property of the sequence {Wk}k\{W^{k}\}_{k\in\mathbb{N}}, which will be used later to control the distance dist(0,Φλ,μ(U¯k,V¯k)){\rm dist}(0,\partial\Phi_{\lambda,\mu}(\overline{U}^{k},\overline{V}^{k})).

Lemma 4.2

For each kk, let D^1k=Diag(D^11k,,D^r¯r¯k)\widehat{D}_{1}^{k}\!\!=\!{\rm Diag}(\widehat{D}_{11}^{k},\ldots,\!\widehat{D}_{\overline{r}\overline{r}}^{k}) and D¯1k=Diag(D¯11k,,D¯r¯r¯k)\overline{D}_{1}^{k}\!\!=\!{\rm Diag}(\overline{D}_{11}^{k},\ldots,\!\overline{D}_{\overline{r}\overline{r}}^{k}). Then, under Assumptions 1-2, for each kk~k\geq\widetilde{k}, there exist R1k,R2k𝕆r¯R_{1}^{k},R_{2}^{k}\in\mathbb{O}^{\overline{r}} such that for the matrices Ak=BlkDiag((D^1k)1R1kD^1k,0)r×rA^{k}={\rm BlkDiag}\big{(}(\widehat{D}_{1}^{k})^{-1}R_{1}^{k}\widehat{D}_{1}^{k},0\big{)}\in\mathbb{R}^{r\times r} and Bk=BlkDiag((D¯1k)1R2kD¯1k,0)r×rB^{k}={\rm BlkDiag}\big{(}(\overline{D}_{1}^{k})^{-1}R_{2}^{k}\overline{D}_{1}^{k},0\big{)}\in\mathbb{R}^{r\times r},

max{U¯k+1U^k+1Ak+1F,V¯k+1V^k+1Ak+1F}α+4β¯2αX¯k+1X^k+1F,\displaystyle\max\big{\{}\|\overline{U}^{k+1}\!-\widehat{U}^{k+1}A^{k+1}\|_{F},\|\overline{V}^{k+1}\!-\widehat{V}^{k+1}A^{k+1}\|_{F}\big{\}}\leq\frac{\sqrt{\alpha}+4\overline{\beta}}{2\alpha}\big{\|}\overline{X}^{k+1}\!-\!\widehat{X}^{k+1}\big{\|}_{F},
max{U¯k+1U¯kBkF,V¯k+1V¯kBkF}α+4β¯2αX¯k+1X¯kF,\displaystyle\max\big{\{}\|\overline{U}^{k+1}\!-\overline{U}^{k}B^{k}\|_{F},\|\overline{V}^{k+1}\!-\overline{V}^{k}B^{k}\|_{F}\big{\}}\leq\frac{\sqrt{\alpha}+4\overline{\beta}}{2\alpha}\big{\|}\overline{X}^{k+1}\!-\!\overline{X}^{k}\big{\|}_{F},

where α\alpha is the same as in Lemma 4.1 and β¯\overline{\beta} is the same as in Corollary 4.1 (iii).

Proof: From steps 2 and 4 of Algorithm 1, X^k+1=P^k+1(D^k+1)2(Q~k+1)\widehat{X}^{k+1}=\widehat{P}^{k+1}(\widehat{D}^{k+1})^{2}(\widetilde{Q}^{k+1})^{\top} with Q~k+1=P¯kQ^k+1\widetilde{Q}^{k+1}=\overline{P}^{k}\widehat{Q}^{k+1} and X¯k+1=R~k+1(D¯k+1)2(P¯k+1)\overline{X}^{k+1}=\widetilde{R}^{k+1}(\overline{D}^{k+1})^{2}(\overline{P}^{k+1})^{\top} with R~k+1=P^k+1Q¯k+1\widetilde{R}^{k+1}=\widehat{P}^{k+1}\overline{Q}^{k+1}. By Lemma 4.1, for each kk~k\geq\widetilde{k}, rank(X^k)=rank(X¯k)=r¯{\rm rank}(\widehat{X}^{k})={\rm rank}(\overline{X}^{k})=\overline{r} and min{σr¯(X^k),σr¯(X¯k)}α\min\{\sigma_{\overline{r}}(\widehat{X}^{k}),\sigma_{\overline{r}}(\overline{X}^{k})\}\geq\alpha. Let J=[r¯]J=[\overline{r}] and J¯=[r]\J\overline{J}=[r]\backslash J. Then, for all kk~k\geq\widetilde{k}, D^iik+1=D¯iik+1=0\widehat{D}_{ii}^{k+1}=\overline{D}_{ii}^{k+1}=0 with iJ¯i\in\overline{J}, and

X^k+1=P^Jk+1(D^1k+1)2(Q~Jk+1)andX¯k+1=R~Jk+1(D¯1k+1)2(P¯Jk+1).\widehat{X}^{k+1}=\widehat{P}_{\!J}^{k+1}(\widehat{D}^{k+1}_{1})^{2}(\widetilde{Q}_{\!J}^{k+1})^{\top}\ \ {\rm and}\ \ \overline{X}^{k+1}=\widetilde{R}_{\!J}^{k+1}(\overline{D}_{1}^{k+1})^{2}(\overline{P}_{\!J}^{k+1})^{\top}.

For each kk~k\geq\widetilde{k}, by using [12, Theorem 2.1] with (A,A~)=(X^k+1,X¯k+1)(A,\widetilde{A})\!=\!(\widehat{X}^{k+1}\!,\overline{X}^{k+1}\!) and (X¯k,X¯k+1)(\overline{X}^{k}\!,\overline{X}^{k+1}\!), respectively, there exist R1k+1𝕆r¯R_{1}^{k+1}\in\mathbb{O}^{\overline{r}} and R2k𝕆r¯R_{2}^{k}\in\mathbb{O}^{\overline{r}} such that

P^Jk+1R1k+1R~Jk+1F2+Q~Jk+1R1k+1P¯Jk+1F22σr¯(X¯k+1)X¯k+1X^k+1F,\displaystyle\sqrt{\|\widehat{P}_{\!J}^{k+1}R_{1}^{k+1}-\!\widetilde{R}_{\!J}^{k+1}\|_{F}^{2}\!+\!\|\widetilde{Q}_{{J}}^{k+1}R_{1}^{k+1}-\overline{P}_{\!J}^{k+1}\|_{F}^{2}}\leq\frac{2}{\sigma_{\overline{r}}(\overline{X}^{k+1})}\big{\|}\overline{X}^{k+1}\!-\!\widehat{X}^{k+1}\big{\|}_{F}, (25)
R~JkR2kR~Jk+1F2+P¯JkR2kP¯Jk+1F22σr¯(X¯k+1)X¯kX¯k+1F.\displaystyle\sqrt{\|\widetilde{R}_{\!J}^{k}R_{2}^{k}-\widetilde{R}_{\!J}^{k+1}\|_{F}^{2}+\|\overline{P}_{\!J}^{k}R_{2}^{k}-\overline{P}_{\!J}^{k+1}\|_{F}^{2}}\leq\frac{2}{\sigma_{\overline{r}}(\overline{X}^{k\!+\!1})}\big{\|}\overline{X}^{k}\!-\overline{X}^{k\!+\!1}\big{\|}_{F}.\qquad (26)

Let Ak=BlkDiag((D^1k)1R1kD^1k,0)A^{k}={\rm BlkDiag}\big{(}(\widehat{D}_{1}^{k})^{-1}R_{1}^{k}\widehat{D}_{1}^{k},0\big{)} and Bk=BlkDiag((D¯1k)1R2kD¯1k,0)B^{k}={\rm BlkDiag}\big{(}(\overline{D}_{1}^{k})^{-1}R_{2}^{k}\overline{D}_{1}^{k},0\big{)} for each kk. By the expressions of U¯k+1,U^k+1\overline{U}^{k+1},\,\widehat{U}^{k+1} in steps 2 and 4 of Algorithm 1, for each kk~k\geq\widetilde{k},

U¯k+1U^k+1Ak+1F\displaystyle\big{\|}\overline{U}^{k+1}\!-\!\widehat{U}^{k+1}A^{k+1}\big{\|}_{F} =R~Jk+1D¯1k+1P^Jk+1R1k+1D^1k+1F\displaystyle=\big{\|}\widetilde{R}_{\!J}^{k+1}\overline{D}_{1}^{k+1}\!-\!\widehat{P}_{\!J}^{k+1}R_{1}^{k+1}\widehat{D}_{1}^{k+1}\big{\|}_{F}
D¯1k+1D^1k+1F+D^1k+1R~Jk+1P^Jk+1R1k+1F\displaystyle\leq\|\overline{D}_{1}^{k+1}-\widehat{D}_{1}^{k+1}\|_{F}+\|\widehat{D}_{1}^{k+1}\|\|\widetilde{R}_{J}^{k+1}-\widehat{P}^{k+1}_{{J}}R_{1}^{k+1}\|_{F}
α+4β¯2αX¯k+1X^k+1F\displaystyle\leq\frac{\sqrt{\alpha}+4\overline{\beta}}{2\alpha}\|\overline{X}^{k+1}-\widehat{X}^{k+1}\|_{F} (27)

where the last inequality uses (25), \sigma_{\overline{r}}(\overline{X}^{k+1})\geq\alpha, \|\widehat{D}_{1}^{k+1}\|\leq\overline{\beta} and the following relation

D¯1k+1D^1k+1F=i=1r¯[σi(X¯k+1)1/2σi(X^k+1)1/2]212αX¯k+1X^k+1F.\big{\|}\overline{D}_{1}^{k+1}-\widehat{D}_{1}^{k+1}\big{\|}_{F}=\sqrt{\textstyle{\sum_{i=1}^{\overline{r}}}\big{[}\sigma_{i}(\overline{X}^{k+1})^{1/2}-\sigma_{i}(\widehat{X}^{k+1})^{1/2}\big{]}^{2}}\leq\frac{1}{2\sqrt{\alpha}}\|\overline{X}^{k+1}\!-\widehat{X}^{k+1}\|_{F}.
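Indeed, the last relation can be verified directly: since \min\{\sigma_{\overline{r}}(\overline{X}^{k+1}),\sigma_{\overline{r}}(\widehat{X}^{k+1})\}\geq\alpha by Lemma 4.1, for each i\in[\overline{r}],

\big|\sigma_{i}(\overline{X}^{k+1})^{1/2}-\sigma_{i}(\widehat{X}^{k+1})^{1/2}\big|=\frac{|\sigma_{i}(\overline{X}^{k+1})-\sigma_{i}(\widehat{X}^{k+1})|}{\sigma_{i}(\overline{X}^{k+1})^{1/2}+\sigma_{i}(\widehat{X}^{k+1})^{1/2}}\leq\frac{|\sigma_{i}(\overline{X}^{k+1})-\sigma_{i}(\widehat{X}^{k+1})|}{2\sqrt{\alpha}},

and summing over i\in[\overline{r}] and invoking Mirsky's inequality \sum_{i}[\sigma_{i}(\overline{X}^{k+1})-\sigma_{i}(\widehat{X}^{k+1})]^{2}\leq\|\overline{X}^{k+1}-\widehat{X}^{k+1}\|_{F}^{2} yields the stated bound.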

By the expressions of V¯k+1\overline{V}^{k+1} and V^k+1\widehat{V}^{k+1} in steps 2 and 4 of Algorithm 1,

\displaystyle\|\overline{V}^{k+1}\!-\!\widehat{V}^{k+1}A^{k+1}\|_{F}=\big\|\overline{P}_{\!J}^{k+1}\overline{D}_{1}^{k+1}-\widetilde{Q}_{\!J}^{k+1}R_{1}^{k+1}\widehat{D}_{1}^{k+1}\big\|_{F}
D¯1k+1D^1k+1F+D^1k+1P¯Jk+1Q~Jk+1R1k+1F\displaystyle\leq\|\overline{D}_{1}^{k+1}-\widehat{D}_{1}^{k+1}\|_{F}+\|\widehat{D}_{1}^{k+1}\|\|\overline{P}_{\!J}^{k+1}-\widetilde{Q}_{\!J}^{k+1}R_{1}^{k+1}\|_{F}
α+4β¯2αX¯k+1X^k+1Fforeachkk~.\displaystyle\leq\frac{\sqrt{\alpha}+4\overline{\beta}}{2\alpha}\|\overline{X}^{k+1}-\widehat{X}^{k+1}\|_{F}\quad{\rm for\ each}\ k\geq\widetilde{k}. (28)

Inequalities (27) and (28) imply that the first inequality holds. Using inequality (26) and following the same arguments as those for (27) and (28) leads to the second one. \Box
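Bounds of the type (25)-(26) can also be probed numerically. The following Python sketch is purely illustrative and not part of Algorithm 1; the Procrustes construction of the rotation is our own choice (it can only improve on the rotation whose existence [12, Theorem 2.1] asserts). It aligns the thin SVD factors of a rank-3 matrix with those of a slightly perturbed copy and checks that the alignment error is of the order of the perturbation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 8, 10, 3

# Build a rank-r matrix A and a small perturbation A_t of it.
P0, _ = np.linalg.qr(rng.standard_normal((n, r)))
Q0, _ = np.linalg.qr(rng.standard_normal((m, r)))
A = P0 @ np.diag([3.0, 2.0, 1.0]) @ Q0.T
A_t = A + 1e-6 * rng.standard_normal((n, m))

# Thin SVD factors of both matrices.
P, s, Vh = np.linalg.svd(A)
P, Q = P[:, :r], Vh[:r, :].T
Pt, st, Vht = np.linalg.svd(A_t)
Pt, Qt = Pt[:, :r], Vht[:r, :].T

# Orthogonal Procrustes rotation minimizing ||P R - Pt||_F^2 + ||Q R - Qt||_F^2.
U, _, Wh = np.linalg.svd(P.T @ Pt + Q.T @ Qt)
R = U @ Wh
align_err = np.sqrt(np.linalg.norm(P @ R - Pt)**2 + np.linalg.norm(Q @ R - Qt)**2)
# align_err is of the order of ||A_t - A||_F here, despite the sign and
# rotation ambiguity of the individual SVD factors.
```

For a perturbation of size 10^{-6}, the computed alignment error stays within a couple of orders of magnitude of the perturbation norm, which is the qualitative behavior the displayed bounds capture.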

Proposition 4.1

Let J=[r¯]J=[\overline{r}] and J¯=[r]\J\overline{J}=[r]\backslash J. Let ϑJ\vartheta_{\!J} and ϑJ¯\vartheta_{\!\overline{J}} be defined by (3) with such JJ and J¯\overline{J}. Suppose that Assumptions 1-2 hold, and that for each kk~k\geq\widetilde{k},

ϑJ(U¯Jk+1)ϑJ(UJk+1)B1kFU¯Jk+1UJk+1B1kF,\displaystyle\|\nabla\vartheta_{\!J}(\overline{U}_{\!J}^{k+1})-\nabla\vartheta_{\!J}(U_{\!J}^{k+1})B_{1}^{k}\|_{F}\leq\|\overline{U}_{\!J}^{k+1}-{U}_{\!J}^{k+1}B_{1}^{k}\|_{F}, (29a)
ϑJ(V¯Jk+1)ϑJ(VJk+1)A1k+1FV¯Jk+1VJk+1A1k+1F\displaystyle\|\nabla\vartheta_{\!J}(\overline{V}_{\!J}^{k+1})-\nabla\vartheta_{\!J}(V_{\!J}^{k+1})A_{1}^{k+1}\|_{F}\leq\|\overline{V}_{\!J}^{k+1}-{V}_{\!J}^{k+1}A_{1}^{k+1}\|_{F} (29b)

with A1k=(D^1k)1R1kD^1kA_{1}^{k}\!=(\widehat{D}_{1}^{k})^{-1}R_{1}^{k}\widehat{D}_{1}^{k} and B1k=(D¯1k)1R2kD¯1kB_{1}^{k}\!=(\overline{D}_{1}^{k})^{-1}R_{2}^{k}\overline{D}_{1}^{k}. Then, there exists cs>0c_{s}>0 such that

dist(0,Φλ,μ(U¯k+1,V¯k+1))\displaystyle{\rm dist}\big{(}0,\partial\Phi_{\lambda,\mu}(\overline{U}^{k+1},\overline{V}^{k+1})\big{)} cs(X¯k+1X¯kF+X^k+1X¯k+1F)\displaystyle\leq c_{s}(\|\overline{X}^{k+1}\!-\!\overline{X}^{k}\|_{F}+\|\widehat{X}^{k+1}\!-\!\overline{X}^{k+1}\|_{F})
+cs(Uk+1U¯kF+Vk+1V^k+1F)forallkk~.\displaystyle\ +c_{s}(\|U^{k+1}\!-\!\overline{U}^{k}\|_{F}+\|V^{k+1}\!-\!\widehat{V}^{k+1}\|_{F})\ \ {\rm for\ all}\ k\geq\widetilde{k}.

Proof: From the inclusions in Remark 3.1 (b) and Lemma 2.2, for each kk~k\geq\widetilde{k},

0[f(X¯k)+Lf(X^k+1X¯k)]V¯k+μUk+1+γ1,k(Uk+1U¯k)\displaystyle 0\in\big{[}\nabla\!f(\overline{X}^{k})\!+\!L_{\!f}(\widehat{X}^{k+1}\!-\!\overline{X}^{k})\big{]}\overline{V}^{k}+\mu U^{k+1}+\gamma_{1,k}(U^{k+1}\!-\!\overline{U}^{k})
+λ[{ϑJ(UJk+1)}×ϑJ¯(UJ¯k+1)],\displaystyle\quad\ \ +\lambda\big{[}\{\nabla\vartheta_{J}(U_{\!J}^{k+1})\}\times\partial\vartheta_{\overline{J}}(U_{\overline{J}}^{k+1})\big{]},
0[f(X^k+1)+Lf(X¯k+1X^k+1)]U^k+1+μVk+1+γ2,k(Vk+1V^k+1)\displaystyle 0\in\big{[}\nabla\!f(\widehat{X}^{k+1})+L_{\!f}(\overline{X}^{k+1}\!-\!\widehat{X}^{k+1})\big{]}^{\top}\widehat{U}^{k+1}+\mu{V}^{k+1}+\gamma_{2,k}({V}^{k+1}\!-\!\widehat{V}^{k+1})
+λ[{ϑJ(VJk+1)}×ϑJ¯(VJ¯k+1)].\displaystyle\quad\ \ +\lambda\big{[}\{\nabla\vartheta_{J}(V_{\!J}^{k+1})\}\times\partial\vartheta_{\overline{J}}(V_{\!\overline{J}}^{k+1})\big{]}.

By Lemma 4.1, for each kk~k\geq\widetilde{k}, U¯J¯k=UJ¯k=0\overline{U}^{k}_{\overline{J}}={U}^{k}_{\overline{J}}=0 and V¯J¯k=VJ¯k=0\overline{V}^{k}_{\overline{J}}={V}^{k}_{\overline{J}}=0. Together with the above two inclusions, we have 0ϑJ¯(UJ¯k+1), 0ϑJ¯(VJ¯k+1)0\in\partial\vartheta_{\overline{J}}(U_{\!\overline{J}}^{k+1}),\,0\in\partial\vartheta_{\overline{J}}(V_{\!\overline{J}}^{k+1}) and the equalities

0=[f(X¯k)+Lf(X^k+1X¯k)]V¯Jk+μUJk+1+γ1,k(UJk+1U¯Jk)+λϑJ(UJk+1),\displaystyle 0\!=\!\big{[}\nabla\!f(\overline{X}^{k})\!+\!L_{\!f}(\widehat{X}^{k+1}-\!\overline{X}^{k})\big{]}\overline{V}_{\!J}^{k}+\mu U_{\!J}^{k+1}+\gamma_{1,k}(U_{\!J}^{k+1}\!-\!\overline{U}_{\!J}^{k})+\lambda\nabla\vartheta_{J}(U_{\!J}^{k+1}),\qquad
0=[f(X^k+1)+Lf(X¯k+1X^k+1)]U^Jk+1+μVJk+1+γ2,k(VJk+1V^Jk+1)+λϑJ(VJk+1).\displaystyle 0\!=\!\big{[}\nabla\!f(\widehat{X}^{k+1})\!\!+\!L_{\!f}(\overline{X}^{k+1}\!\!\!-\!\widehat{X}^{k+1})\big{]}^{\top}\widehat{U}_{\!J}^{k+1}\!+\!\mu{V}_{\!J}^{k+1}\!\!+\!\gamma_{2,k}({V}_{\!J}^{k+1}\!\!-\!\widehat{V}_{\!J}^{k+1})\!+\!\lambda\nabla\vartheta_{J}(V_{\!J}^{k+1}).

Multiplying the first equality from the right by B_{1}^{k} and the second one by A_{1}^{k+1}, we immediately have

0=[f(X¯k)+Lf(X^k+1X¯k)]V¯JkB1k+μUJk+1B1k\displaystyle 0=\big{[}\nabla f(\overline{X}^{k})+L_{\!f}(\widehat{X}^{k+1}\!-\!\overline{X}^{k})\big{]}\overline{V}_{\!J}^{k}B_{1}^{k}+\mu U_{\!J}^{k+1}B_{1}^{k}
+γ1,k(UJk+1U¯Jk)B1k+λϑJ(UJk+1)B1k,\displaystyle\qquad+\gamma_{1,k}(U_{\!J}^{k+1}-\overline{U}_{\!J}^{k})B_{1}^{k}+\lambda\nabla\vartheta_{\!J}(U_{\!J}^{k+1})B_{1}^{k}, (31a)
0=[f(X^k+1)+Lf(X¯k+1X^k+1)]U^Jk+1A1k+1+μVJk+1A1k+1\displaystyle 0=\big{[}\nabla f(\widehat{X}^{k+1})\!\!+\!L_{\!f}(\overline{X}^{k+1}\!\!\!-\!\widehat{X}^{k+1})\big{]}^{\top}\widehat{U}_{\!J}^{k+1}A_{1}^{k+1}\!\!+\!\mu{V}_{\!J}^{k+1}A_{1}^{k+1}
+γ2,k(VJk+1V^Jk+1)A1k+1+λϑJ(VJk+1)A1k+1.\displaystyle\qquad+\gamma_{2,k}({V}_{\!J}^{k+1}\!\!-\!\widehat{V}_{\!J}^{k+1})A_{1}^{k+1}+\lambda\nabla\vartheta_{\!J}(V_{\!J}^{k+1})A_{1}^{k+1}. (31b)

Together with 0ϑJ¯(UJ¯k+1), 0ϑJ¯(VJ¯k+1)0\in\partial\vartheta_{\overline{J}}(U_{\!\overline{J}}^{k+1}),\,0\in\partial\vartheta_{\overline{J}}(V_{\!\overline{J}}^{k+1}) and Definition 2.2, for each kk~k\geq\widetilde{k},

{\rm dist}\big(0,\partial\Phi_{\lambda,\mu}(\overline{U}^{k+1},\overline{V}^{k+1})\big)\leq\|S^{k+1}\|_{F}+\|T^{k+1}\|_{F}, (32)

with

S^{k+1}:=\nabla\!f(\overline{U}^{k+1}(\overline{V}^{k+1})^{\top})\overline{V}_{\!J}^{k+1}+\mu\overline{U}_{\!J}^{k+1}+\lambda\nabla\vartheta_{\!J}(\overline{U}_{\!J}^{k+1}),
T^{k+1}:=\big[\nabla\!f(\overline{U}^{k+1}(\overline{V}^{k+1})^{\top})\big]^{\top}\overline{U}_{\!J}^{k+1}+\mu\overline{V}_{\!J}^{k+1}+\lambda\nabla\vartheta_{\!J}(\overline{V}_{\!J}^{k+1}).

By comparing the expressions of S^{k+1} and T^{k+1} with (31a)-(31b), for each k\geq\widetilde{k},

Sk+1\displaystyle S^{k+1} =f(X¯k+1)V¯Jk+1[f(X¯k)+Lf(X^k+1X¯k)]V¯JkB1k+λϑJ(U¯Jk+1)\displaystyle=\nabla f(\overline{X}^{k+1})\overline{V}_{\!J}^{k+1}-\big{[}\nabla f(\overline{X}^{k})+L_{\!f}(\widehat{X}^{k+1}\!-\!\overline{X}^{k})\big{]}\overline{V}_{\!J}^{k}B_{1}^{k}\!+\lambda\nabla\vartheta_{\!J}(\overline{U}_{{J}}^{k+1})
λϑJ(UJk+1)B1k+μ(U¯Jk+1UJk+1B1k)γ1,k(UJk+1U¯Jk)B1k,\displaystyle\quad\ -\lambda\nabla\vartheta_{\!J}({U}_{\!J}^{k+1})B_{1}^{k}+\mu(\overline{U}_{\!J}^{k+1}-U_{\!J}^{k+1}B_{1}^{k})-\gamma_{1,k}(U_{\!J}^{k+1}-\overline{U}_{\!J}^{k})B_{1}^{k},
Tk+1\displaystyle T^{k+1} =f(X¯k+1)U¯Jk+1[f(X^k+1)+Lf(X¯k+1X^k+1)]U^Jk+1A1k+1+λϑJ(V¯Jk+1)\displaystyle=\!\nabla f(\overline{X}^{k+1})^{\top}\overline{U}_{\!J}^{k+1}\!\!\!-\!\big{[}\nabla f(\widehat{X}^{k+1})\!+\!L_{f}(\overline{X}^{k+1}\!\!\!-\!\widehat{X}^{k+1})\big{]}^{\top}\widehat{U}_{\!J}^{k+1}A_{1}^{k+1}\!+\!\lambda\nabla\vartheta_{\!J}(\overline{V}_{\!J}^{k+1})
λϑJ(VJk+1)A1k+1+μ(V¯Jk+1VJk+1A1k+1)γ2,k(VJk+1V^Jk+1)A1k+1.\displaystyle\quad\ -\lambda\nabla\vartheta_{\!J}({V}_{\!J}^{k+1})A_{1}^{k+1}+\mu(\overline{V}_{\!J}^{k+1}-V_{\!J}^{k+1}A_{1}^{k+1})-\gamma_{2,k}({V}_{\!J}^{k+1}-\widehat{V}_{\!J}^{k+1})A_{1}^{k+1}.

Recall that Ak=BlkDiag(A1k,0)A^{k}={\rm BlkDiag}(A_{1}^{k},0) and Bk=BlkDiag(B1k,0)B^{k}={\rm BlkDiag}(B_{1}^{k},0) for each kk\in\mathbb{N}. Together with U¯J¯k=UJ¯k=0\overline{U}^{k}_{\overline{J}}={U}^{k}_{\overline{J}}=0 and V¯J¯k=VJ¯k=0\overline{V}^{k}_{\overline{J}}={V}^{k}_{\overline{J}}=0, it follows that for each kk~k\geq\widetilde{k},

Sk+1F\displaystyle\|S^{k+1}\!\|_{F} f(X¯k+1)V¯k+1f(X¯k)V¯kBkF+Lfβ¯X^k+1X¯kF+μU¯k+1Uk+1BkF\displaystyle\leq\|\nabla\!f(\overline{X}^{k+1}\!)\overline{V}^{k+1}\!-\!\nabla\!f(\overline{X}^{k})\overline{V}^{k}B^{k}\|_{F}\!+\!L_{\!f}\overline{\beta}\|\widehat{X}^{k+1}\!\!-\!\overline{X}^{k}\!\|_{F}+\!\mu\|\overline{U}^{k+1}\!-\!U^{k+1}B^{k}\|_{F}
+γ1,k(β¯/α)Uk+1U¯kF+λϑJ(UJk+1)B1kϑJ(U¯Jk+1)F,\displaystyle\quad\ +\!\gamma_{1,k}(\overline{\beta}/{\sqrt{\alpha}})\|U^{k+1}\!-\!\overline{U}^{k}\|_{F}+\lambda\|\nabla\vartheta_{\!J}(U_{\!J}^{k+1})B_{1}^{k}-\nabla\vartheta_{\!J}(\overline{U}_{\!J}^{k+1})\|_{F}, (33)
\displaystyle\|T^{k+1}\|_{F} \leq\|\nabla\!f(\overline{X}^{k+1})^{\top}\overline{U}^{k+1}\!-\!\nabla\!f(\widehat{X}^{k+1})^{\top}\widehat{U}^{k+1}A^{k+1}\|_{F}+L_{\!f}\overline{\beta}\|\widehat{X}^{k+1}\!-\!\overline{X}^{k+1}\|_{F}
+μV¯k+1Vk+1Ak+1F+γ2,k(β¯/α)Vk+1V^k+1F\displaystyle\quad\ +\mu\|\overline{V}^{k+1}\!-V^{k+1}A^{k+1}\|_{F}+\gamma_{2,k}(\overline{\beta}/{\sqrt{\alpha}})\|V^{k+1}\!-\widehat{V}^{k+1}\|_{F}
+λϑJ(VJk+1)A1k+1ϑJ(V¯Jk+1)F\displaystyle\quad\ +\lambda\|\nabla\vartheta_{\!J}(V_{\!J}^{k+1})A_{1}^{k+1}-\nabla\vartheta_{\!J}(\overline{V}_{\!J}^{k+1})\|_{F} (34)

where we use max(Ak+1,Bk)β¯/α\max(\|A^{k+1}\|,\|B^{k}\|)\leq{\overline{\beta}}/{\sqrt{\alpha}} for each kk\in\mathbb{N}, implied by the expressions of A1kA_{1}^{k} and B1kB_{1}^{k}. Recall that f\nabla\!f is Lipschitz continuous with modulus LfL_{\!f}. For each kk~k\geq\widetilde{k},

f(X¯k+1)V¯k+1f(X¯k)V¯kBkF\displaystyle\|\nabla\!f(\overline{X}^{k+1})\overline{V}^{k+1}-\nabla\!f(\overline{X}^{k})\overline{V}^{k}B^{k}\|_{F}
f(X¯k+1)f(X¯k)FV¯k+1+f(X¯k)V¯k+1V¯kBkF\displaystyle\leq\|\nabla\!f(\overline{X}^{k+1})-\nabla\!f(\overline{X}^{k})\|_{F}\|\overline{V}^{k+1}\|+\|\nabla\!f(\overline{X}^{k})\|\|\overline{V}^{k+1}-\overline{V}^{k}B^{k}\|_{F}
Lfβ¯X¯k+1X¯kF+cf(α+4β¯)2αX¯k+1X¯kF\displaystyle\leq L_{f}\overline{\beta}\|\overline{X}^{k+1}-\overline{X}^{k}\|_{F}+\frac{c_{f}(\sqrt{\alpha}\!+\!4\overline{\beta})}{2\alpha}\|\overline{X}^{k+1}\!-\!\overline{X}^{k}\|_{F}

with c_{f}:=\sup_{k\in\mathbb{N}}\max\{\|\nabla\!f(\overline{X}^{k})\|,\|\nabla\!f(\widehat{X}^{k})\|\}, where the second inequality uses Lemma 4.2. For the term \|\overline{U}^{k+1}\!-U^{k+1}B^{k}\|_{F} in inequality (33), it holds that

U¯k+1Uk+1BkF\displaystyle\|\overline{U}^{k+1}\!-U^{k+1}B^{k}\|_{F} U¯k+1U¯kBkF+U¯kBkUk+1BkF\displaystyle\leq\|\overline{U}^{k+1}\!-\overline{U}^{k}B^{k}\|_{F}+\|\overline{U}^{k}B^{k}\!-U^{k+1}B^{k}\|_{F}
α+4β¯2αX¯k+1X¯kF+(β¯/α)U¯kUk+1F.\displaystyle\leq\frac{\sqrt{\alpha}\!+\!4\overline{\beta}}{2\alpha}\|\overline{X}^{k+1}\!-\!\overline{X}^{k}\|_{F}+({\overline{\beta}}/{\sqrt{\alpha}})\|\overline{U}^{k}\!-\!U^{k+1}\|_{F}.

Combining the above two inequalities with (33) and the given (29a), we get

Sk+1F\displaystyle\|S^{k+1}\|_{F} [Lfβ¯+0.5α1(cf+μ+λ)(α+4β¯)]X¯k+1X¯kF\displaystyle\leq\big{[}L_{f}\overline{\beta}+0.5\alpha^{-1}(c_{f}+\mu+\!\lambda)(\sqrt{\alpha}\!+\!4\overline{\beta})\big{]}\|\overline{X}^{k+1}-\overline{X}^{k}\|_{F}
+(μ+λ+γ1,k)(β¯/α)Uk+1U¯kF+Lfβ¯X^k+1X¯kF.\displaystyle\quad+(\mu\!+\!\lambda+\gamma_{1,k})({\overline{\beta}}/{\sqrt{\alpha}})\|U^{k+1}-\overline{U}^{k}\|_{F}+L_{\!f}\overline{\beta}\|\widehat{X}^{k+1}\!\!-\!\overline{X}^{k}\!\|_{F}. (35)

Using inequality (34) and following the same arguments as those for (35) leads to

\displaystyle\|T^{k+1}\|_{F} \leq\big[2L_{f}\overline{\beta}+0.5\alpha^{-1}({c}_{f}\!+\!\mu+\lambda)(\sqrt{\alpha}+4\overline{\beta})\big]\|\overline{X}^{k+1}-\!\widehat{X}^{k+1}\|_{F}
\quad+(\mu+\lambda+\gamma_{2,k})({\overline{\beta}}/{\sqrt{\alpha}})\|V^{k+1}-\widehat{V}^{k+1}\|_{F}.

The desired result follows by combining the above two inequalities with (32). \Box

Remark 4.3

When \theta=\theta_{1}, by noting that \nabla\vartheta_{\!J}(\overline{U}_{\!J}^{k+1})=0 and \nabla\vartheta_{\!J}(U_{\!J}^{k+1})=0, inequalities (29a)-(29b) hold automatically for all k\geq\widetilde{k}. If there exists W\in\mathcal{W}^{*} such that the nonzero singular values of \overline{X} are distinct from each other, then, as will be shown in Proposition 2, inequalities (29a)-(29b) still hold for all k\geq\widetilde{k}.

Now we are ready to establish the convergence of the iterate sequence {(X¯k,X^k)}k\{(\overline{X}^{k},\widehat{X}^{k})\}_{k\in\mathbb{N}} and the column subspace sequences {col(U^k),col(V^k)}\{{\rm col}(\widehat{U}^{k}),{\rm col}(\widehat{V}^{k})\} and {col(U¯k),col(V¯k)}\{{\rm col}(\overline{U}^{k}),{\rm col}(\overline{V}^{k})\} of factor pairs.

Theorem 4.3

Suppose that \Phi_{\lambda,\mu} is a KL function, that Assumptions 1-2 hold, and that inequalities (29a)-(29b) hold for all k\geq\widetilde{k}. Then, \{\overline{X}^{k}\}_{k\in\mathbb{N}} and \{\widehat{X}^{k}\}_{k\in\mathbb{N}} converge to the same point X^{*}, the pair \big(P_{1}^{*}(\Sigma_{r}(X^{*}))^{\frac{1}{2}},Q_{1}^{*}(\Sigma_{r}(X^{*}))^{\frac{1}{2}}\big) is a stationary point of (1), where P_{1}^{*} and Q_{1}^{*} are the matrices consisting of the first r columns of P^{*} and Q^{*}, respectively, with (P^{*},Q^{*})\in\mathbb{O}^{n,m}(X^{*}), and

limkcol(Uk)=limkcol(U¯k)=limkcol(U^k)=col(X);\displaystyle\lim_{k\to\infty}{\rm col}(U^{k})=\lim_{k\to\infty}{\rm col}(\overline{U}^{k})=\lim_{k\to\infty}{\rm col}(\widehat{U}^{k})={\rm col}(X^{*}); (36a)
limkcol(Vk)=limkcol(V¯k)=limkcol(V^k)=row(X).\displaystyle\lim_{k\to\infty}{\rm col}(V^{k})=\lim_{k\to\infty}{\rm col}(\overline{V}^{k})=\lim_{k\to\infty}{\rm col}(\widehat{V}^{k})={\rm row}(X^{*}). (36b)

where the set convergence is in the sense of Painlevé-Kuratowski convergence.

Proof: Using Theorems 4.1 and 4.2 and Proposition 4.1 and following the same arguments as those for [2, Theorem 1] yields the first two parts. For the last part, by Proposition 3.2, we only need to prove limkcol(U^k)=col(X)\lim_{k\to\infty}{\rm col}(\widehat{U}^{k})={\rm col}(X^{*}), which by [19, Exercise 4.2] is equivalent to

\mathcal{L}:=\big\{u\in\mathbb{R}^{n}\ |\ \lim_{k\to\infty}{\rm dist}(u,{\rm col}(\widehat{U}^{k}))=0\big\}={\rm col}(X^{*}). (37)

To this end, we first argue that col(P^J)=col(P~){\rm col}(\widehat{P}_{\!J})={\rm col}(\widetilde{P}^{*}). By Lemma 4.1, rank(X^k)=rank(X¯k)=r¯{\rm rank}(\widehat{X}^{k})\!={\rm rank}(\overline{X}^{k})=\overline{r} and min{σr¯(X^k),σr¯(X¯k)}α\min\{\sigma_{\overline{r}}(\widehat{X}^{k}),\sigma_{\overline{r}}(\overline{X}^{k})\}\geq\alpha for all kk~k\geq\widetilde{k}. Then, for each kk~k\geq\widetilde{k}, D^jjk=D¯jjk=0\widehat{D}_{jj}^{k}=\overline{D}_{jj}^{k}=0 for j=r¯+1,,rj=\overline{r}\!+\!1,\ldots,r. By step 2 of Algorithm 1, for each kk~k\geq\widetilde{k}, U^k=[P^JkD^1k 0]\widehat{U}^{k}=[\widehat{P}_{J}^{k}\widehat{D}_{1}^{k}\ \ 0] with J=[r¯]J=[\overline{r}] and D^1k=Diag(D^11k,,D^r¯r¯k)\widehat{D}_{1}^{k}={\rm Diag}(\widehat{D}_{11}^{k},\ldots,\widehat{D}_{\overline{r}\overline{r}}^{k}). This means that col(U^k)=col(P^Jk){\rm col}(\widehat{U}^{k})={\rm col}(\widehat{P}_{\!J}^{k}) for all kk~k\geq\widetilde{k}. Let P~Σr¯(X)(Q~)\widetilde{P}^{*}\Sigma_{\overline{r}}(X^{*})(\widetilde{Q}^{*})^{\top} be the thin SVD of XX^{*} with P~𝕆n×r¯\widetilde{P}^{*}\!\in\mathbb{O}^{n\times\overline{r}} and Q~𝕆m×r¯\widetilde{Q}^{*}\!\in\mathbb{O}^{m\times\overline{r}}. Clearly, col(X)=col(P~){\rm col}(X^{*})={\rm col}(\widetilde{P}^{*}). For each kk~k\geq\widetilde{k}, let P^2k𝕆n×(nr¯)\widehat{P}_{2}^{k}\in\mathbb{O}^{n\times(n-\overline{r})} be such that [P^JkP^2k]𝕆n[\widehat{P}_{\!J}^{k}\ \ \widehat{P}_{2}^{k}]\in\mathbb{O}^{n}. By invoking [7, Lemma 3], there exists η>0\eta>0 such that for all sufficiently large kk,

dist([P^JkP^2k],𝕆n(X(X)))ηX^k(X^k)X(X)Fη(X^k+X)X^kXF,{\rm dist}\big{(}[\widehat{P}_{\!J}^{k}\ \ \widehat{P}_{2}^{k}],\mathbb{O}^{n}(X^{*}(X^{*})^{\top})\big{)}\leq\eta\|\widehat{X}^{k}(\widehat{X}^{k})^{\top}\!-\!X^{*}(X^{*})^{\top}\|_{F}\leq\eta(\|\widehat{X}^{k}\|\!+\!\|X^{*}\|)\|\widehat{X}^{k}\!-\!X^{*}\|_{F},

which by the convergence of \{\widehat{X}^{k}\}_{k\in\mathbb{N}} means that \lim_{k\to\infty}{\rm dist}\big([\widehat{P}_{\!J}^{k}\ \ \widehat{P}_{2}^{k}],\mathbb{O}^{n}(X^{*}(X^{*})^{\top})\big)=0. Let \mathbb{O}^{\overline{r}}(X^{*}(X^{*})^{\top}):=\{P\in\mathbb{O}^{n\times\overline{r}}\,|\,X^{*}(X^{*})^{\top}=P\Sigma_{\overline{r}}(X^{*}(X^{*})^{\top})P^{\top}\}. Then, for any accumulation point \widehat{P}_{\!J} of \{\widehat{P}_{\!J}^{k}\}_{k\in\mathbb{N}}, we have {\rm dist}\big(\widehat{P}_{\!J},\mathbb{O}^{\overline{r}}(X^{*}(X^{*})^{\top})\big)=0, and hence {\rm col}(\widehat{P}_{\!J})={\rm col}(\widetilde{P}^{*}). Now pick any u\in\mathcal{L}. Then, 0=\lim_{k\to\infty}{\rm dist}(u,{\rm col}(\widehat{U}^{k}))=\lim_{k\to\infty}\|\widehat{P}_{\!J}^{k}(\widehat{P}_{\!J}^{k})^{\top}u-u\|. From the boundedness of \{\widehat{P}_{\!J}^{k}\}, there exists an accumulation point \widehat{P}_{\!J} such that u=\widehat{P}_{\!J}\widehat{P}_{\!J}^{\top}u, i.e., u\in{\rm col}(\widehat{P}_{\!J})={\rm col}(\widetilde{P}^{*})={\rm col}(X^{*}). This shows that \mathcal{L}\subset{\rm col}(X^{*}). For the converse inclusion, from \lim_{k\to\infty}{\rm dist}\big([\widehat{P}_{\!J}^{k}\ \ \widehat{P}_{2}^{k}],\mathbb{O}^{n}(X^{*}(X^{*})^{\top})\big)=0, it is not hard to deduce that

limk[minR𝕆r¯,P~R𝕆r¯(X(X))P~P^JkRF]=0,\lim_{k\to\infty}\Big{[}\min_{R\in\mathbb{O}^{\overline{r}},\widetilde{P}^{*}R\in\mathbb{O}^{\overline{r}}(X^{*}(X^{*})^{\top})}\|\widetilde{P}^{*}-\widehat{P}_{J}^{k}R\|_{F}\Big{]}=0,

which implies that limkP^Jk(P^Jk)P~(P~)F=0\lim_{k\to\infty}\|\widehat{P}_{\!J}^{k}(\widehat{P}_{\!J}^{k})^{\top}\!-\!\widetilde{P}^{*}(\widetilde{P}^{*})^{\top}\|_{F}=0. Now pick any ucol(X)=col(P~)u\in{\rm col}(X^{*})={\rm col}(\widetilde{P}^{*}). We have u=P~(P~)uu=\widetilde{P}^{*}(\widetilde{P}^{*})^{\top}u. Then, limkdist(u,col(U^k))=limkP^Jk(P^Jk)uu=0\lim_{k\to\infty}{\rm dist}(u,{\rm col}(\widehat{U}^{k}))=\lim_{k\to\infty}\|\widehat{P}_{\!J}^{k}(\widehat{P}_{\!J}^{k})^{\top}u-u\|=0. This shows that uu\in\mathcal{L}, and the converse inclusion follows. \Box

5 Numerical experiments

To validate the efficiency of Algorithm 1, we apply it to one-bit matrix completion problems with noise, and compare its performance with that of a line-search PALM method described in Appendix B. All numerical tests are performed in MATLAB 2024a on a laptop running a 64-bit Windows operating system with an Intel(R) Core(TM) i9-13905H CPU (2.60GHz) and 32 GB RAM.

5.1 One-bit matrix completions with noise

We consider one-bit matrix completion under a uniform sampling scheme, in which the unknown true M^{*}\in\mathbb{R}^{n\times m} is assumed to be low rank. Instead of observing noisy entries of M=M^{*}+E directly, where E is a noise matrix with i.i.d. entries, we observe with error the sign of a random subset of the entries of M^{*}. More specifically, assume that a random sample \Omega=((i_{1},j_{1}),(i_{2},j_{2}),\ldots,(i_{N},j_{N}))\in([n]\times[m])^{N} of index pairs is drawn i.i.d. with replacement according to the uniform sampling distribution \mathbb{P}\{(i_{t},j_{t})=(k,l)\}=\frac{1}{nm} for all t\in[N] and (k,l)\in[n]\times[m], and the entries Y_{ij} of a sign matrix Y with (i,j)\in\Omega are observed. Let \phi:\mathbb{R}\to[0,1] be the cumulative distribution function of -E_{11}. Then, the above observation model can be recast as

Yij={+1withprobabilityϕ(Mij),1withprobability 1ϕ(Mij)Y_{ij}=\left\{\!\begin{array}[]{ll}+1&\ {\rm with\ probability}\ \phi(M^{*}_{ij}),\\ -1&\ {\rm with\ probability}\ 1-\phi(M^{*}_{ij})\end{array}\right. (38)

and we observe noisy entries {Yit,jt}t=1N\{Y_{i_{t},j_{t}}\}_{t=1}^{N} indexed by Ω\Omega. More details can be found in [8]. Two common choices for the function ϕ\phi or the distribution of {Eij}\{E_{ij}\} are given as follows:

  1. (I)

    (Logistic regression/noise): The logistic regression model is described by (38) with ϕ(x)=ex1+ex\phi(x)=\frac{e^{x}}{1+e^{x}} and EijE_{ij} i.i.d. obeying the standard logistic distribution.

  2. (II)

    (Laplacian noise): In this case, EijE_{ij} i.i.d. obey a Laplacian distribution Laplace (0,b)(0,b) with the scale parameter b>0b>0, and the function ϕ\phi has the following form

    ϕ(x)={12exp(x/b)ifx<0,112exp(x/b)ifx0.\phi(x)=\left\{\!\begin{array}[]{cl}\frac{1}{2}\exp(x/b)&\ {\rm if}\ x<0,\\ 1-\frac{1}{2}\exp(-x/b)&\ {\rm if}\ x\geq 0.\end{array}\right.
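Both choices of \phi, and the drawing of the sign matrix Y via (38), can be rendered compactly. The Python snippet below is for illustration only (the experiments in Section 5 are run in MATLAB, and the function names are ours); the default b=2 matches the Laplacian setting used later.

```python
import numpy as np

def phi_logistic(x):
    """CDF of -E11 for standard logistic noise: phi(x) = e^x / (1 + e^x)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def phi_laplace(x, b=2.0):
    """CDF of -E11 for Laplace(0, b) noise, as in case (II)."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.5 * np.exp(x / b), 1.0 - 0.5 * np.exp(-x / b))

def sample_signs(M, phi, rng):
    """Draw Y_ij = +1 with probability phi(M_ij) and -1 otherwise, per (38)."""
    return np.where(rng.random(M.shape) < phi(M), 1.0, -1.0)
```

Note that both CDFs equal 1/2 at the origin, so entries of M^{*} near zero carry almost no sign information, which is what makes small-magnitude entries hard to recover.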

Given a collection of observations YΩ={Yit,jt}t=1NY_{\Omega}=\{Y_{i_{t},j_{t}}\}_{t=1}^{N} from the observation model (38), the negative log-likelihood function can be written as

f(X)=-\sum_{(i,j)\in\Omega}\Big(\mathbb{I}_{[Y_{ij}=1]}\ln\phi(X_{ij})+\mathbb{I}_{[Y_{ij}=-1]}\ln(1-\phi(X_{ij}))\Big).

Under case (I), for each (i,j)\in[n]\times[m] and X\in\mathbb{R}^{n\times m}, [\nabla^{2}\!f(X)]_{ij}=\frac{\exp(X_{ij})}{(1+\exp(X_{ij}))^{2}}, so \nabla\!f is Lipschitz continuous with L_{\!f}=1. Under case (II), for any X\in\mathbb{R}^{n\times m} and each (i,j)\in[n]\times[m], [\nabla^{2}\!f(X)]_{ij}=\frac{2\exp(-|X_{ij}|/b)}{b^{2}(2-\exp(-|X_{ij}|/b))^{2}} if X_{ij}Y_{ij}\geq 0, and [\nabla^{2}\!f(X)]_{ij}=0 otherwise. Clearly, for case (II), \nabla\!f is Lipschitz continuous with L_{\!f}={2}/{b^{2}}.
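For case (I), the loss and its gradient restricted to the observed entries have simple closed forms: the (i,j) summand of f equals \ln(1+e^{-Y_{ij}X_{ij}}) for either sign of Y_{ij}. The sketch below (Python, for illustration only; the paper's experiments use MATLAB, and the function names are ours) evaluates f and \nabla\!f for the logistic model, treating \Omega as a set of observed positions encoded by a boolean mask.

```python
import numpy as np

def nll_logistic(X, Y, mask):
    """Negative log-likelihood f(X) of the logistic one-bit model over the
    observed positions: the (i, j) summand equals ln(1 + e^{-Y_ij X_ij})."""
    return np.sum(np.logaddexp(0.0, (-Y * X)[mask]))

def grad_nll_logistic(X, Y, mask):
    """Gradient of f; it vanishes outside the observed positions."""
    G = -Y / (1.0 + np.exp(Y * X))
    return np.where(mask, G, 0.0)
```

A central finite-difference quotient of `nll_logistic` reproduces `grad_nll_logistic` entrywise, which is a cheap sanity check when adapting the loss to other noise models.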

5.2 Implementation details of Algorithms 1 and 2

We first describe the choice of parameters in Algorithm 1. From formulas (9)-(10) in Remark 3.1 (c), for fixed \lambda and \mu, a smaller \gamma_{1,0} (respectively, \gamma_{2,0}) leads to a smoother change of the iterate U^{k} (respectively, V^{k}), but the associated subproblems require a little more running time. As a trade-off, we choose \gamma_{1,0}=\gamma_{2,0}=10^{-2} and \varrho=0.8 for the subsequent numerical tests. The parameters \underline{\gamma_{1}} and \underline{\gamma_{2}} are both chosen to be 10^{-8}. The initial (U^{0},V^{0}) is generated by the MATLAB commands U^{0}={\rm orth}({\rm randn}(n,r)),\,V^{0}={\rm orth}({\rm randn}(m,r)) with r\in[n] specified in the subsequent experiments. We terminate Algorithm 1 at the iterate (U^{k},V^{k}) when k>200 or \frac{\|\overline{U}^{k}(\overline{V}^{k})^{\top}-\overline{U}^{k-1}(\overline{V}^{k-1})^{\top}\|_{F}}{\|\overline{U}^{k}(\overline{V}^{k})^{\top}\|_{F}}\leq\epsilon_{1} or \frac{\max_{1\leq i\leq 9}|\Phi_{\mu,\lambda}(\overline{U}^{k},\overline{V}^{k})-\Phi_{\mu,\lambda}(\overline{U}^{k-i},\overline{V}^{k-i})|}{\max\{1,\Phi_{\mu,\lambda}(\overline{U}^{k},\overline{V}^{k})\}}\leq\epsilon_{2}. The parameters of Algorithm 2 are chosen as \varrho_{1}=\varrho_{2}=5, \underline{\alpha}=10^{-10} and \overline{\alpha}=10^{10}.
For a fair comparison, Algorithm 2 uses the same starting point as Algorithm 1 and a stopping condition similar to that of Algorithm 1, i.e., it terminates at the iterate (U^{k},V^{k}) when k>200 or \frac{\|{U}^{k}({V}^{k})^{\top}-{U}^{k-1}({V}^{k-1})^{\top}\|_{F}}{\|{U}^{k}({V}^{k})^{\top}\|_{F}}\leq\epsilon_{3} or \frac{\max_{1\leq i\leq 9}|\Phi_{\mu,\lambda}({U}^{k},{V}^{k})-\Phi_{\mu,\lambda}({U}^{k-i},{V}^{k-i})|}{\max\{1,\Phi_{\mu,\lambda}({U}^{k},{V}^{k})\}}\leq\epsilon_{4}.
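The stopping tests shared by the two algorithms, namely an iteration cap, the relative change of the product factorization, and a ten-value objective plateau, can be sketched as follows (an illustrative Python rendering with hypothetical names, not the authors' MATLAB code):

```python
import numpy as np

def should_stop(k, X_prev, X_curr, phi_hist, kmax=200, eps1=5e-4, eps2=1e-3):
    """Return True when the stopping tests are met at iteration k.
    X_prev, X_curr are successive products U V^T; phi_hist holds the
    objective values Phi(U^j, V^j) up to the current iteration j = k."""
    if k > kmax:                                   # iteration cap
        return True
    denom = max(np.linalg.norm(X_curr), np.finfo(float).tiny)
    if np.linalg.norm(X_curr - X_prev) / denom <= eps1:
        return True                                # tiny relative change
    if len(phi_hist) >= 10:                        # need Phi(k-9), ..., Phi(k)
        recent = phi_hist[-10:]
        plateau = max(abs(recent[-1] - v) for v in recent[:-1])
        if plateau / max(1.0, abs(recent[-1])) <= eps2:
            return True                            # flat objective window
    return False
```

The plateau test guards against terminating on a single small step while the objective is still decreasing slowly but steadily.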

5.3 Numerical results for simulated data

We test the two solvers on simulated data for one-bit matrix completion problems. The true matrix M^{*} of rank r^{*} is generated by M^{*}=M_{L}^{*}(M_{R}^{*})^{\top} with M_{L}^{*}\in\mathbb{R}^{n\times r^{*}} and M_{R}^{*}\in\mathbb{R}^{m\times r^{*}}, where the entries of M_{L}^{*} and M_{R}^{*} are drawn i.i.d. from the uniform distribution on [-\frac{1}{2},\frac{1}{2}]. We obtain one-bit observations by adding noise and recording the signs of the resulting values. Specifically, the noise obeys the standard logistic distribution for case (I) and the Laplacian distribution {\rm Laplace}(0,b) with b=2 for case (II). The noisy observation entries Y_{i_{t},j_{t}} with (i_{t},j_{t})\in\Omega are obtained via (38), where the index set \Omega is given by uniform sampling. We use the relative error \frac{\|X^{\rm out}-M^{*}\|_{F}}{\|M^{*}\|_{F}} to evaluate the recovery performance, where X^{\rm out}=U^{\rm out}(V^{\rm out})^{\top} denotes the output of a solver.
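The generation of a test instance can be sketched in a few lines. The Python code below is illustrative only, with small default sizes (the experiments use MATLAB with n = m = 2000 and N = SR·nm samples drawn with replacement), and the function names are ours:

```python
import numpy as np

def make_instance(n=200, m=150, r_star=10, SR=0.4, rng=None):
    """Rank-r* ground truth M* = ML* (MR*)^T with Unif[-1/2, 1/2] factor
    entries, plus N = SR*n*m index pairs drawn i.i.d. with replacement."""
    rng = rng or np.random.default_rng(0)
    ML = rng.uniform(-0.5, 0.5, (n, r_star))
    MR = rng.uniform(-0.5, 0.5, (m, r_star))
    M_star = ML @ MR.T
    N = int(SR * n * m)
    omega = np.stack([rng.integers(0, n, N), rng.integers(0, m, N)], axis=1)
    return M_star, omega

def relative_error(X_out, M_star):
    """RE = ||X_out - M*||_F / ||M*||_F, used to assess recovery quality."""
    return np.linalg.norm(X_out - M_star) / np.linalg.norm(M_star)
```

Because the index pairs are drawn with replacement, `omega` may contain repeated positions, matching the sampling model of Section 5.1.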

We take n=m=2000, r=3r^{*} with r^{*}=10 and sample rate {\rm SR}=0.4 to test how the relative error (RE) and rank vary with the parameter \lambda=c_{\lambda}\max_{j\in[m]}(\|Y_{j}\|). The parameter \mu in model (1) is always chosen to be \mu=10^{-8}. Figures 1-2 plot the average RE, rank and time (in seconds) curves obtained by running five different instances with Algorithm 1 for \epsilon_{1}=\epsilon_{3}=5\times 10^{-4} and Algorithm 2 for \epsilon_{2}=\epsilon_{4}=10^{-3}, respectively. We see that when solving model (1) with a smaller \lambda (say, c_{\lambda}\leq 3.2 for Figure 1 and c_{\lambda}\leq 6.4 for Figure 2), Algorithm 1 returns lower RE within less time; and when solving model (1) with a larger \lambda (say, c_{\lambda}>3.2 for Figure 1 and c_{\lambda}>6.4 for Figure 2), the two solvers yield comparable RE and require comparable running time. Note that model (1) with a small \lambda is more difficult than the one with a large \lambda. This shows that Algorithm 1 is superior to Algorithm 2 in terms of RE and running time for those difficult test instances. In addition, during the tests, we find that the relative errors yielded by the two solvers exhibit a rebound as the number of iterations increases, especially in the scenario where the sample ratio is low, but the RE yielded by Algorithm 2 rebounds earlier than that of Algorithm 1; see Figure 3 below.

Figure 1: Curves of relative error, rank and time for the two solvers under Case I
Figure 2: Curves of relative error, rank and time for the two solvers under Case II
Figure 3: Curves of relative error for the two solvers under Case I with different \theta

6 Conclusion

For the low-rank composite factorization model (1), we proposed a majorized PAMA with subspace correction, which minimizes alternately the majorization \widehat{\Phi}_{\lambda,\mu} of \Phi_{\lambda,\mu} at each iterate and imposes a subspace correction step on each subproblem to ensure that it has a closed-form solution. We established the subsequence convergence of the generated iterate sequence, and achieved the convergence of the whole iterate sequence and the column subspace sequences of factor pairs under the KL property of \Phi_{\lambda,\mu} and a restrictive condition that is satisfied by the column \ell_{2,0}-norm function. To the best of our knowledge, this is the first subspace correction AM method with a convergence certificate for low-rank factorization models. The obtained convergence results also provide a convergence guarantee for softImpute-ALS proposed in [13]. Numerical comparison with a line-search PALM method on one-bit matrix completion problems validates the efficiency of the subspace correction PAMA.

References

  • [1] H. Attouch, J. Bolte, P. Redont and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research, 35(2010): 438–457.
  • [2] J. Bolte, S. Sabach and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming, 146(2014): 459–494.
  • [3] S. Bhojanapalli, B. Neyshabur and N. Srebro, Global optimality of local search for low rank matrix recovery, Advances in Neural Information Processing Systems, (2016): 3873–3881.
  • [4] J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA Journal of Numerical Analysis, 8(1988): 141–148.
  • [5] S. J. Bi, T. Tao and S. H. Pan, KL property of exponent 1/2 of \ell_{2,0}-norm and DC regularized factorizations for low-rank matrix recovery, Pacific Journal of Optimization, 18(2022): 1–26.
  • [6] E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 9(2009): 717–772.
  • [7] X. Chen and P. Tseng, Non-interior continuation methods for solving semidefinite complementarity problems, Mathematical Programming, 95(2003): 431–474.
  • [8] T. Cai and W. X. Zhou, A max-norm constrained minimization approach to 1-bit matrix completion, Journal of Machine Learning Research, 14(2013): 3619–3647.
  • [9] W. Cao, J. Sun and Z. Xu, Fast image deconvolution using closed-form thresholding formulas of L_{q}\ (q=\frac{1}{2},\frac{2}{3}) regularization, Journal of Visual Communication and Image Representation, 24(2013): 31–41.
  • [10] E. Chouzenoux, J. C. Pesquet and A. Repetti, A block coordinate variable metric forward-backward algorithm, Journal of Global Optimization, 66(2016): 457–485.
  • [11] C. Ding, D. F. Sun and K. C. Toh, An introduction to a class of matrix cone programming, Mathematical Programming, 144(2014): 141–179.
  • [12] F. M. Dopico, A note on \sin\Theta theorems for singular subspace variations, BIT Numerical Mathematics, 40(2000): 395–403.
  • [13] T. Hastie, R. Mazumder, J. D. Lee and R. Zadeh, Matrix completion and low-rank SVD via fast alternating least squares, Journal of Machine Learning Research, 16(2015): 3367–3402.
  • [14] P. Jain, P. Netrapalli and S. Sanghavi, Low-rank matrix completion using alternating minimization, in Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), (2013): 665–674.
  • [15] S. Negahban and M. Wainwright, Estimation of (near) low-rank matrices with noise and high-dimensional scaling, The Annals of Statistics, 39(2011): 1069–1097.
  • [16] P. Ochs, Unifying abstract inexact convergence theorems and block coordinate variable metric iPiano, SIAM Journal on Optimization, 29(2019): 541–570.
  • [17] T. Pock and S. Sabach, Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems, SIAM Journal on Imaging Sciences, 9(2016): 1756–1787.
  • [18] B. Recht, M. Fazel and P. A. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review, 52(2010): 471–501.
  • [19] R. T. Rockafellar and R. J-B. Wets, Variational Analysis, Springer, 1998.
  • [20] F. H. Shang, Y. Y. Liu and F. J. Shang, A unified scalable equivalent formulation for Schatten quasi-norms, Mathematics, 8(2020): 1–19.
  • [21] N. Srebro, J. D. M. Rennie and T. Jaakkola, Maximum-margin matrix factorization, Advances in Neural Information Processing Systems, 17, 2005.
  • [22] G. W. Stewart and J. G. Sun, Matrix Perturbation Theory, Academic Press, Boston, 1990.
  • [23] T. Tao, S. H. Pan and S. J. Bi, Error bound of critical points and KL property of exponent 1/2 for squared F-norm regularized factorization, Journal of Global Optimization, 81(2021): 991–1017.
  • [24] T. Tao, Y. T. Qian and S. H. Pan, Column \ell_{2,0}-norm regularized factorization model of low-rank matrix recovery and its computation, SIAM Journal on Optimization, 32(2022): 959–988.
  • [25] Y. Y. Xu and W. T. Yin, A globally convergent algorithm for nonconvex optimization based on block coordinate update, Journal of Scientific Computing, 72(2017): 700–734.
  • [26] Z. Xu, X. Chang, F. Xu and H. Zhang, \ell_{1/2} regularization: A thresholding representation theory and a fast solver, IEEE Transactions on Neural Networks and Learning Systems, 23(2012): 1013–1027.

Appendix A

The following proposition states that any stationary point (\overline{U},\overline{V}) of problem (1) satisfies the balance condition, i.e., \overline{U}^{\top}\overline{U}=\overline{V}^{\top}\overline{V}.

Proposition 1

Denote by \mathcal{S}^{*} the set of stationary points of (1). If \theta^{\prime}(t)\geq 0 for t>0, then \mathcal{S}^{*}\subset{\rm crit}\,F_{\mu}, and hence every (\overline{U},\overline{V})\in\mathcal{S}^{*} satisfies \overline{U}^{\top}\overline{U}=\overline{V}^{\top}\overline{V} and J_{\overline{U}}=J_{\overline{V}}, where F_{\mu}(U,V):=f(UV^{\top})+(\mu/2)(\|U\|_{F}^{2}+\|V\|_{F}^{2}) for (U,V)\in\mathbb{R}^{n\times r}\times\mathbb{R}^{m\times r}.

Proof: Pick any (\overline{U},\overline{V})\in\mathcal{S}^{*}. From Definition 2.2, it follows that

0\in\nabla f(\overline{U}\overline{V}^{\top})\overline{V}+\mu\overline{U}+\lambda\partial\vartheta(\overline{U})\ \ {\rm and}\ \ 0\in\big[\nabla f(\overline{U}\overline{V}^{\top})\big]^{\top}\overline{U}+\mu\overline{V}+\lambda\partial\vartheta(\overline{V}). (39)

For each j\in J_{\overline{U}}, from the first inclusion in (39) and Lemma 2.2, we have

0=\nabla f(\overline{U}\overline{V}^{\top})\overline{V}_{\!j}+\mu\overline{U}_{\!j}+\lambda\theta^{\prime}(\|\overline{U}_{\!j}\|)\frac{\overline{U}_{\!j}}{\|\overline{U}_{\!j}\|},

and hence \|\nabla f(\overline{U}\overline{V}^{\top})\overline{V}_{\!j}\|=\big(\mu+\lambda\frac{\theta^{\prime}(\|\overline{U}_{\!j}\|)}{\|\overline{U}_{\!j}\|}\big)\|\overline{U}_{\!j}\|. Recall that \theta^{\prime}(t)\geq 0 for t>0 and \mu>0. Then, for each j\in J_{\overline{U}}, it holds that \|\nabla f(\overline{U}\overline{V}^{\top})\overline{V}_{\!j}\|>0 and hence \overline{V}_{\!j}\neq 0. Consequently, J_{\overline{U}}\subset J_{\overline{V}}. For each j\in J_{\overline{V}}, from the second inclusion in (39) and Lemma 2.2,

0=\big[\nabla f(\overline{U}\overline{V}^{\top})\big]^{\top}\overline{U}_{\!j}+\mu\overline{V}_{\!j}+\lambda\theta^{\prime}(\|\overline{V}_{\!j}\|)\frac{\overline{V}_{\!j}}{\|\overline{V}_{\!j}\|},

which by using the same arguments as above implies that \overline{U}_{\!j}\neq 0, and then J_{\overline{V}}\subset J_{\overline{U}}. Thus, J_{\overline{U}}=J_{\overline{V}}:=J. Together with (39), it follows that

\nabla\!f(\overline{U}\overline{V}^{\top})\overline{V}_{\!J}+\mu\overline{U}_{\!J}=0\ \ {\rm and}\ \ [\nabla\!f(\overline{U}\overline{V}^{\top})]^{\top}\overline{U}_{\!J}+\mu\overline{V}_{\!J}=0,

which implies that \nabla\!F_{\mu}(\overline{U},\overline{V})=0 and (\overline{U},\overline{V})\in{\rm crit}\,F_{\mu}. Consequently, the desired inclusion follows. From [23, Lemma 2.2], every (\overline{U},\overline{V})\in\mathcal{S}^{*} satisfies \overline{U}^{\top}\overline{U}=\overline{V}^{\top}\overline{V}. \Box
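The balance condition of Proposition 1 can be checked numerically on the prototypical balanced pair built from a thin SVD: splitting the singular values evenly between the two factors gives U^{\top}U=\Sigma=V^{\top}V (a standalone sketch, not tied to a particular stationary point of (1)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 8, 6, 3

# Any X of rank r admits the balanced factorization
# U = P * sqrt(Sigma), V = Q * sqrt(Sigma) from its thin SVD X = P Sigma Q^T,
# for which U^T U = Sigma = V^T V by orthonormality of the columns of P, Q.
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
P, s, Qt = np.linalg.svd(X, full_matrices=False)
U = P[:, :r] * np.sqrt(s[:r])
V = Qt[:r, :].T * np.sqrt(s[:r])

assert np.allclose(U @ V.T, X)        # exact factorization of X
assert np.allclose(U.T @ U, V.T @ V)  # balance: U^T U = V^T V
```
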

Proposition 2

Suppose that Assumptions 1-2 hold with \theta^{\prime} being strictly continuous on \mathbb{R}_{++}, that there is W^{*}\in\mathcal{W}^{*} with \overline{X}^{*} having distinct nonzero singular values, and that \Phi_{\lambda,\mu} has the KL property at (\overline{U}^{*},\overline{V}^{*}). Let \delta=0.5\min_{1\leq i<j\leq\overline{r}+1}[\sigma_{i}(\overline{X}^{*})-\sigma_{j}(\overline{X}^{*})] for \overline{r}={\rm rank}(\overline{X}^{*}).

  • (i)

    Then, for any X,X^{\prime}\in\mathbb{B}(\overline{X}^{*},\delta)\cap\{X\in\mathbb{R}^{n\times m}\ |\ {\rm rank}(X)=\overline{r}\} and any (U,V)\in\mathbb{O}^{m,n}(X) and (U^{\prime},V^{\prime})\in\mathbb{O}^{m,n}(X^{\prime}),

    \max\big(\max_{1\leq i\leq\overline{r}}\|U_{i}-U_{i}^{\prime}\|,\max_{1\leq i\leq\overline{r}}\|V_{i}-V_{i}^{\prime}\|\big)\leq(2/\delta)\|X-X^{\prime}\|_{F};
  • (ii)

    there exists \overline{k}\in\mathbb{N} such that for all k\geq\overline{k}, \overline{X}^{k}\in\mathbb{B}(\overline{X}^{*},\delta)\cap\{X\in\mathbb{R}^{n\times m}\ |\ {\rm rank}(X)=\overline{r}\},

    {\rm dist}\big(0,\partial\Phi_{\lambda,\mu}(\overline{U}^{k+1},\overline{V}^{k+1})\big)\leq c_{s}(\|\overline{X}^{k+1}-\overline{X}^{k}\|_{F}+\|\widehat{X}^{k+1}-\overline{X}^{k}\|_{F})+c_{s}(\|U^{k+1}-\overline{U}^{k}\|_{F}+\|V^{k+1}-\widehat{V}^{k+1}\|_{F}).

Proof: (i) As \overline{X}^{*} has distinct nonzero singular values with \min_{1\leq i<j\leq\overline{r}+1}[\sigma_{i}(\overline{X}^{*})-\sigma_{j}(\overline{X}^{*})]=2\delta, from Weyl's theorem [22, Corollary 4.9], for any X\in\mathbb{B}(\overline{X}^{*},\delta)\cap\{X\in\mathbb{R}^{n\times m}\,|\,{\rm rank}(X)=\overline{r}\}, \min_{1\leq i<j\leq\overline{r}+1}[\sigma_{i}(X)-\sigma_{j}(X)]\geq\delta. Fix any X,X^{\prime}\in\mathbb{B}(\overline{X}^{*},\delta)\cap\{X\in\mathbb{R}^{n\times m}\,|\,{\rm rank}(X)=\overline{r}\} and any (U,V)\in\mathbb{O}^{m,n}(X) and (U^{\prime},V^{\prime})\in\mathbb{O}^{m,n}(X^{\prime}). By [12, Theorem 2.1], we get the result.

(ii) As \overline{X}^{*} is a cluster point of \{\widehat{X}^{k}\} and \{\overline{X}^{k}\} by Theorem 4.2 and \lim_{k\to\infty}\|\overline{X}^{k}-\widehat{X}^{k}\|_{F}=0 by Corollary 4.1 (iv), there exists \overline{k}\in\mathbb{N} such that \overline{X}^{\overline{k}+1},\widehat{X}^{\overline{k}+1}\in\mathbb{B}(\overline{X}^{*},\delta/2)\cap\{X\in\mathbb{R}^{n\times m}\,|\,{\rm rank}(X)=\overline{r}\}. From part (i) with X=\overline{X}^{\overline{k}+1} and X^{\prime}=\widehat{X}^{\overline{k}+1}, (25)-(26) hold with R_{1}^{\overline{k}+1}=R_{2}^{\overline{k}}=I_{\overline{r}}, so Lemma 4.2 holds with A_{1}^{\overline{k}+1}=B_{1}^{\overline{k}}=I_{\overline{r}}. By the local Lipschitz continuity of \nabla\vartheta_{\!J} with J=[\overline{r}], inequalities (29a)-(29b) hold with k=\overline{k}. From Proposition 4.1,

{\rm dist}\big(0,\partial\Phi_{\lambda,\mu}(\overline{U}^{\overline{k}+1},\overline{V}^{\overline{k}+1})\big)\leq c_{s}(\|\overline{X}^{\overline{k}+1}-\overline{X}^{\overline{k}}\|_{F}+\|\widehat{X}^{\overline{k}+1}-\overline{X}^{\overline{k}}\|_{F})+c_{s}(\|U^{\overline{k}+1}-\overline{U}^{\overline{k}}\|_{F}+\|V^{\overline{k}+1}-\widehat{V}^{\overline{k}+1}\|_{F}). (40)

Since \Phi_{\lambda,\mu} has the KL property at (\overline{U}^{*},\overline{V}^{*}), there exist \eta>0, a neighborhood \mathcal{N} of (\overline{U}^{*},\overline{V}^{*}), and a continuous concave function \varphi\!:[0,\eta)\to\mathbb{R}_{+} satisfying Definition 2.3 (i)-(ii) such that for all (U,V)\in\mathcal{N}\cap[\Phi_{\lambda,\mu}(\overline{U}^{*},\overline{V}^{*})<\Phi_{\lambda,\mu}<\Phi_{\lambda,\mu}(\overline{U}^{*},\overline{V}^{*})+\eta],

\varphi^{\prime}(\Phi_{\lambda,\mu}(U,V)-\Phi_{\lambda,\mu}(\overline{U}^{*},\overline{V}^{*}))\,{\rm dist}(0,\partial\Phi_{\lambda,\mu}(U,V))\geq 1. (41)

Let \Gamma_{k,k+1}:=\varphi\big(\Phi_{\lambda,\mu}(\overline{U}^{k},\overline{V}^{k})-\Phi_{\lambda,\mu}(\overline{U}^{*},\overline{V}^{*})\big)-\varphi\big(\Phi_{\lambda,\mu}(\overline{U}^{k+1},\overline{V}^{k+1})-\Phi_{\lambda,\mu}(\overline{U}^{*},\overline{V}^{*})\big). If necessary by increasing \overline{k}, (\overline{U}^{\overline{k}+1},\overline{V}^{\overline{k}+1})\in\mathcal{N}\cap[\Phi_{\lambda,\mu}(\overline{U}^{*},\overline{V}^{*})<\Phi_{\lambda,\mu}<\Phi_{\lambda,\mu}(\overline{U}^{*},\overline{V}^{*})+\eta]. Then

\varphi^{\prime}(\Phi_{\lambda,\mu}(\overline{U}^{\overline{k}+1},\overline{V}^{\overline{k}+1})-\Phi_{\lambda,\mu}(\overline{U}^{*},\overline{V}^{*}))\,{\rm dist}(0,\partial\Phi_{\lambda,\mu}(\overline{U}^{\overline{k}+1},\overline{V}^{\overline{k}+1}))\geq 1.

Together with the concavity of \varphi, it is easy to obtain that

\Phi_{\lambda,\mu}(\overline{U}^{\overline{k}+1},\overline{V}^{\overline{k}+1})-\Phi_{\lambda,\mu}(\overline{U}^{\overline{k}+2},\overline{V}^{\overline{k}+2})\leq{\rm dist}(0,\partial\Phi_{\lambda,\mu}(\overline{U}^{\overline{k}+1},\overline{V}^{\overline{k}+1}))\,\Gamma_{\overline{k}+1,\overline{k}+2}.

Combining this inequality with Corollary 4.1 (iii) and (40) above, it holds that

\sqrt{a}\,\Xi^{\overline{k}+2}\leq\frac{\sqrt{a}}{2}\Xi^{\overline{k}+1}+\frac{\sqrt{a}c_{s}}{2}\Gamma_{\overline{k}+1,\overline{k}+2}\quad{\rm with}\ a=\min\big(\frac{\overline{\gamma}}{4},\frac{\overline{\gamma}}{16\overline{\beta}^{2}}\big),

where \Xi^{k+1}:=\|\overline{X}^{k+1}-\overline{X}^{k}\|_{F}+\|\widehat{X}^{k+1}-\overline{X}^{k}\|_{F}+\|U^{k+1}-\overline{U}^{k}\|_{F}+\|V^{k+1}-\widehat{V}^{k+1}\|_{F} for each k. By Theorem 4.2 (iii), \Xi^{\overline{k}+1}+c_{s}\varphi\big(\Phi_{\lambda,\mu}(\overline{U}^{\overline{k}+1},\overline{V}^{\overline{k}+1})-\Phi_{\lambda,\mu}(\overline{U}^{*},\overline{V}^{*})\big)\leq\delta and \|\overline{X}^{k+1}-\widehat{X}^{k+1}\|_{F}\leq 0.25\delta for all k\geq\overline{k} (if necessary by increasing \overline{k}). Thus, we have \Xi^{\overline{k}+2}\leq 0.5\delta and hence \overline{X}^{\overline{k}+2},\widehat{X}^{\overline{k}+2}\in\mathbb{B}(\overline{X}^{*},\delta)\cap\{X\in\mathbb{R}^{n\times m}\,|\,{\rm rank}(X)=\overline{r}\}. By repeating the above arguments, we have

\sqrt{a}\,\Xi^{\overline{k}+3}\leq\frac{\sqrt{a}}{2}\Xi^{\overline{k}+2}+\frac{\sqrt{a}c_{s}}{2}\Gamma_{\overline{k}+2,\overline{k}+3}.

This implies that \sqrt{a}(\Xi^{\overline{k}+2}+\Xi^{\overline{k}+3})\leq\frac{\sqrt{a}}{2}\Xi^{\overline{k}+1}+\frac{\sqrt{a}c_{s}}{2}\Gamma_{\overline{k}+1,\overline{k}+3}, so \overline{X}^{\overline{k}+3},\widehat{X}^{\overline{k}+3}\in\mathbb{B}(\overline{X}^{*},\delta)\cap\{X\in\mathbb{R}^{n\times m}\,|\,{\rm rank}(X)=\overline{r}\}. By induction, we obtain the desired result. \Box
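A quick numerical sanity check of the subspace perturbation bound in part (i) (a sketch with an illustrative reference matrix: the singular values 3, 2, 1 and the perturbation scale are hypothetical choices, and the sign ambiguity of singular vectors is aligned before comparison):

```python
# Checks max_i ||U_i - U'_i|| <= (2/delta) * ||X - X'||_F for two nearby
# rank-rbar matrices, as in Proposition 2(i). Illustrative sizes only.
import numpy as np

rng = np.random.default_rng(2)
n, m, rbar = 6, 5, 3

P, _ = np.linalg.qr(rng.standard_normal((n, rbar)))
Q, _ = np.linalg.qr(rng.standard_normal((m, rbar)))
Xbar = P @ np.diag([3.0, 2.0, 1.0]) @ Q.T
delta = 0.5 * 1.0              # half the smallest singular-value gap of Xbar

def nearby_rank_rbar(scale):
    """Best rank-rbar approximation of a small perturbation of Xbar."""
    U, s, Vt = np.linalg.svd(Xbar + scale * rng.standard_normal((n, m)),
                             full_matrices=False)
    return U[:, :rbar] @ np.diag(s[:rbar]) @ Vt[:rbar, :]

X1, X2 = nearby_rank_rbar(1e-6), nearby_rank_rbar(1e-6)
U1, _, _ = np.linalg.svd(X1, full_matrices=False)
U2, _, _ = np.linalg.svd(X2, full_matrices=False)

# Align the sign ambiguity of each singular vector before comparing
signs = np.sign(np.sum(U1[:, :rbar] * U2[:, :rbar], axis=0))
gap = max(np.linalg.norm(U1[:, i] - signs[i] * U2[:, i]) for i in range(rbar))
bound = (2.0 / delta) * np.linalg.norm(X1 - X2)
assert 0.0 < gap <= bound
```
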


Appendix B: A line-search PALM method for problem (1).

As mentioned in the introduction, the iterations of the PALM methods in [2, 17, 25] depend on the Lipschitz constants of \nabla_{U}F(\cdot,V^{k}) and \nabla_{V}F(U^{k+1},\cdot). An immediate upper bound for them is L_{f}\max\{\|V^{k}\|^{2},\|U^{k+1}\|^{2}\}, but it is too large and worsens the performance of PALM methods. Here we present a PALM method that searches for a favorable estimate of these constants. For any given (U,V) and \tau>0, let \mathcal{Q}_{U}(U^{\prime},V^{\prime};U,V,\tau) and \mathcal{Q}_{V}(U^{\prime},V^{\prime};U,V,\tau) be defined by

\mathcal{Q}_{U}(U^{\prime},V^{\prime};U,V,\tau):=F(U,V)+\langle\nabla_{U}F(U,V),U^{\prime}-U\rangle+(\tau/2)\|U^{\prime}-U\|_{F}^{2},
\mathcal{Q}_{V}(U^{\prime},V^{\prime};U,V,\tau):=F(U,V)+\langle\nabla_{V}F(U,V),V^{\prime}-V\rangle+(\tau/2)\|V^{\prime}-V\|_{F}^{2}.
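Since \mathcal{Q}_{U}(\cdot,V;U,V,\tau) is a strongly convex quadratic in its first argument, its exact minimizer is the explicit gradient step (a routine computation from the definition above):

```latex
U^{+}=\mathop{\arg\min}_{U^{\prime}\in\mathbb{R}^{n\times r}}\mathcal{Q}_{U}(U^{\prime},V;U,V,\tau)
     =U-\frac{1}{\tau}\nabla_{U}F(U,V),
```

and analogously V^{+}=V-\frac{1}{\tau}\nabla_{V}F(U,V) for \mathcal{Q}_{V}; these closed forms are what the argmin steps of Algorithm 2 below compute.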

The iterations of the line-search PALM method are described as follows.

Algorithm 2 (A line-search PALM method)

Initialization: Choose \varrho_{1}>1, \varrho_{2}>1, 0<\underline{\alpha}<\overline{\alpha} and an initial point (U^{0},V^{0})\in\mathbb{X}_{r}.
For k=0,1,2,\ldots

  • 1.

    Select \alpha_{k}\in[\underline{\alpha},\overline{\alpha}] and compute

    U^{k+1}\in\mathop{\arg\min}_{U\in\mathbb{R}^{n\times r}}\mathcal{Q}_{U}(U,V^{k};U^{k},V^{k},\alpha_{k}).
  • 2.

    while F(U^{k+1},V^{k})>\mathcal{Q}_{U}(U^{k+1},V^{k};U^{k},V^{k},\alpha_{k}) do

    • (a)

      \alpha_{k}=\varrho_{1}\alpha_{k};

    • (b)

      Compute U^{k+1}\in\mathop{\arg\min}_{U\in\mathbb{R}^{n\times r}}\mathcal{Q}_{U}(U,V^{k};U^{k},V^{k},\alpha_{k})

    end (while)

  • 3.

    Select \alpha_{k}\in[\underline{\alpha},\overline{\alpha}] and compute

    V^{k+1}\in\mathop{\arg\min}_{V\in\mathbb{R}^{m\times r}}\mathcal{Q}_{V}(U^{k+1},V;U^{k+1},V^{k},\alpha_{k}).
  • 4.

    while F(U^{k+1},V^{k+1})>\mathcal{Q}_{V}(U^{k+1},V^{k+1};U^{k+1},V^{k},\alpha_{k}) do

    • (a)

      \alpha_{k}=\varrho_{2}\alpha_{k};

    • (b)

      Compute V^{k+1}\in\mathop{\arg\min}_{V\in\mathbb{R}^{m\times r}}\mathcal{Q}_{V}(U^{k+1},V;U^{k+1},V^{k},\alpha_{k}).

    end (while)

  • 5.

    Let k\leftarrow k+1, and go to step 1.

end
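A minimal sketch of Algorithm 2 in code, assuming the illustrative smooth coupling F(U,V)=\frac{1}{2}\|UV^{\top}-M\|_{F}^{2} (the constant initial stepsize 1.0 stands in for the BB choice of Remark 1 below, and \varrho_{1}=\varrho_{2}=2 are hypothetical values):

```python
# Sketch of the line-search PALM iterations: each block update is a
# gradient step on Q_U (resp. Q_V), with the stepsize parameter increased
# by rho1 (resp. rho2) until the majorization test F <= Q is satisfied.
import numpy as np

def palm_linesearch(M, r, iters=200, rho1=2.0, rho2=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((m, r))

    F = lambda U, V: 0.5 * np.linalg.norm(U @ V.T - M) ** 2
    grad_U = lambda U, V: (U @ V.T - M) @ V
    grad_V = lambda U, V: (U @ V.T - M).T @ U

    def Q(F_old, G, D_new, D_old, tau):
        # Majorization value Q(.; tau) at the candidate block D_new
        D = D_new - D_old
        return F_old + np.sum(G * D) + 0.5 * tau * np.linalg.norm(D) ** 2

    for _ in range(iters):
        # U-step: backtrack on tau until F(U_new, V) <= Q_U(U_new; tau)
        tau, G, F_old = 1.0, grad_U(U, V), F(U, V)
        U_new = U - G / tau
        while F(U_new, V) > Q(F_old, G, U_new, U, tau):
            tau *= rho1
            U_new = U - G / tau
        U = U_new
        # V-step, symmetrically
        tau, G, F_old = 1.0, grad_V(U, V), F(U, V)
        V_new = V - G / tau
        while F(U, V_new) > Q(F_old, G, V_new, V, tau):
            tau *= rho2
            V_new = V - G / tau
        V = V_new
    return U, V
```

Each accepted block update satisfies F \leq \mathcal{Q} with \mathcal{Q} evaluated at a gradient step, which forces a strict decrease of F, mirroring the role of the while-loops in steps 2 and 4.
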

Remark 1

In the implementation of Algorithm 2, we use the Barzilai-Borwein (BB) rule [4] to choose the initial \alpha_{k} in steps 1 and 3. That is, \alpha_{k} in step 1 is given by

\max\Big\{\min\Big\{\frac{\|\nabla_{U}F(U^{k},V^{k})-\nabla_{U}F(U^{k-1},V^{k})\|_{F}}{\|U^{k}-U^{k-1}\|_{F}},\overline{\alpha}\Big\},\underline{\alpha}\Big\},

while \alpha_{k} in step 3 is chosen to be

\max\Big\{\min\Big\{\frac{\|\nabla_{V}F(U^{k},V^{k})-\nabla_{V}F(U^{k},V^{k-1})\|_{F}}{\|V^{k}-V^{k-1}\|_{F}},\overline{\alpha}\Big\},\underline{\alpha}\Big\}.
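The safeguarded BB-type stepsize above can be sketched as a small helper (the default clipping values stand in for [\underline{\alpha},\overline{\alpha}] and are hypothetical):

```python
# Safeguarded BB-type initial stepsize as in Remark 1:
# ||G_new - G_old||_F / ||X_new - X_old||_F, clipped to [alpha_lo, alpha_hi].
import numpy as np

def bb_stepsize(G_new, G_old, X_new, X_old, alpha_lo=1e-4, alpha_hi=1e4):
    den = np.linalg.norm(X_new - X_old)
    if den == 0.0:
        return alpha_lo  # degenerate step: fall back to the lower safeguard
    ratio = np.linalg.norm(G_new - G_old) / den
    return float(np.clip(ratio, alpha_lo, alpha_hi))
```
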