
Learning idempotent representation for subspace clustering

Lai Wei, Shiteng Liu, Rigui Zhou and Changming Zhu
L. Wei, S. Liu, R. Zhou and C. Zhu are with the College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China (e-mail: weilai@shmtu.edu.cn; Liushiteng1218@163.com; rgzhou@shmtu.edu.cn; cmzhu@shmtu.edu.cn).
Abstract

The critical point for the success of spectral-type subspace clustering algorithms is to seek reconstruction coefficient matrices that faithfully reveal the subspace structures of data sets. An ideal reconstruction coefficient matrix should have two properties: 1) it is block diagonal, with each block corresponding to a subspace; 2) each block is fully connected. Although various spectral-type subspace clustering algorithms have been proposed, the reconstruction coefficient matrices constructed by these algorithms still have some defects. We find that a normalized membership matrix naturally satisfies the above two conditions. Therefore, in this paper, we devise an idempotent representation (IDR) algorithm to pursue reconstruction coefficient matrices that approximate normalized membership matrices. IDR designs a new idempotent constraint for reconstruction coefficient matrices. By combining it with doubly stochastic constraints, coefficient matrices that are close to normalized membership matrices can be obtained directly. We present the optimization algorithm for solving the IDR problem and analyze its computational burden as well as its convergence. Comparisons between IDR and related algorithms show the superiority of IDR. Extensive experiments conducted on both synthetic and real world datasets demonstrate that IDR is an effective and efficient subspace clustering algorithm.

Index Terms:
subspace clustering, idempotent matrix, doubly stochastic constraint, normalized membership matrix

I Introduction

High-dimensional data samples arising in computer vision can be viewed as being generated from a union of linear subspaces [1, 2, 3, 4]. Subspace clustering, whose goal is to partition the data samples into several clusters with each cluster corresponding to a subspace, has therefore attracted considerable attention. In the past decades, many kinds of subspace clustering algorithms have been proposed [5, 6, 7, 8, 9, 10]. Among them, spectral-type methods have shown excellent performance in many applications such as motion segmentation and face clustering [9, 10, 11, 12].

Without loss of generality, suppose that a clean data matrix $\mathbf{X}=[\mathbf{X}_{1},\mathbf{X}_{2},\cdots,\mathbf{X}_{k}]\in\mathcal{R}^{d\times n}$ contains $n$ data samples drawn from $k$ subspaces. $\mathbf{X}_{i}\subset\mathbf{X}$ denotes the sub-matrix containing the $n_{i}$ data samples lying in the $i$-th subspace, where $\sum_{i=1}^{k}n_{i}=n$ and $\mathbf{X}_{i}\cap\mathbf{X}_{j}=\emptyset$ for $i\neq j$ ($i,j=1,2,\cdots,k$). The framework of spectral-type subspace clustering algorithms consists of three parts. Firstly, they learn a reconstruction coefficient matrix $\mathbf{Z}\in\mathcal{R}^{n\times n}$ satisfying $\mathbf{X}=\mathbf{XZ}$. Secondly, an affinity matrix $\mathbf{A}$ is built from the obtained reconstruction coefficient matrix, i.e., $[\mathbf{A}]_{ij}=(|[\mathbf{Z}]_{ij}|+|[\mathbf{Z}^{\top}]_{ij}|)/2$, where $[\mathbf{A}]_{ij}$ and $[\mathbf{Z}]_{ij}$ denote the $(i,j)$-th elements of $\mathbf{A}$ and $\mathbf{Z}$ respectively, and $\mathbf{Z}^{\top}$ is the transpose of $\mathbf{Z}$. Finally, a spectral clustering algorithm, e.g., normalized cuts (Ncuts) [13], is applied to $\mathbf{A}$ to obtain the final clustering results. Clearly, the performance of a spectral-type algorithm mainly relies on the learned reconstruction coefficient matrix. An ideal coefficient matrix should have inter-subspace sparsity and intra-subspace connectivity. Namely, if $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ belong to the same subspace, $|[\mathbf{Z}]_{ij}|>0$; otherwise, $|[\mathbf{Z}]_{ij}|=0$.
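To make this pipeline concrete, the following minimal sketch (Python/NumPy, with scikit-learn's normalized spectral clustering standing in for Ncuts; the function names are our own) builds the affinity matrix from a given coefficient matrix and segments the data:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def affinity_from_coefficients(Z):
    """Affinity matrix A with [A]_ij = (|[Z]_ij| + |[Z^T]_ij|) / 2."""
    return (np.abs(Z) + np.abs(Z.T)) / 2.0

def segment(Z, k):
    """Spectral clustering on the affinity built from Z; sklearn's normalized
    spectral clustering is used here as a stand-in for Ncuts."""
    A = affinity_from_coefficients(Z)
    return SpectralClustering(n_clusters=k, affinity="precomputed").fit_predict(A)
```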

Different spectral-type methods use different regularizers to produce coefficient matrices with different characteristics. For instance, sparse subspace clustering (SSC) [9, 14] pursues sparse reconstruction coefficient matrices by introducing a sparsity constraint [15]. Low-rank representation (LRR) [10, 16] seeks a low-rank reconstruction coefficient matrix by minimizing the nuclear norm of the coefficient matrix. Least squares regression (LSR) [17] defines a Frobenius norm regularizer and produces a dense reconstruction coefficient matrix. Block diagonal representation (BDR) [18] provides a $k$ block diagonal reconstruction coefficient matrix by minimizing the sum of the $k$ smallest eigenvalues of the Laplacian matrix of the coefficient matrix. Though these representative methods achieve promising results in different kinds of subspace clustering tasks, the obtained coefficient matrices still have some drawbacks. The coefficient matrices obtained by SSC are usually too sparse, so connectedness within each subspace is lacking. The block diagonal constraint used in BDR may not lead to correct clustering, since each block may still not be fully connected. On the other hand, although connectedness within subspaces is guaranteed in the dense coefficient matrices constructed by LRR and LSR, the coefficients between inter-subspace samples are usually non-zero. To escape these dilemmas, three different types of methods have emerged. Firstly, some extensions of the classical regularizers have been developed. For example, Zhang et al. extended the nuclear norm regularizer used in LRR to a Schatten-$p$ norm regularizer [19]. Xu et al. proposed a scaled simplex representation by adding a non-negative constraint and a scaled affine constraint on the coefficient matrix obtained by LSR [20]. Secondly, researchers have begun to use mixed regularizers on the coefficient matrices. Li et al. proposed structured sparse subspace clustering (SSSC) [21] by adding a re-weighted $l_{1}$-norm regularizer to SSC. The elastic net (EN) method combines an $l_{1}$-norm and a Frobenius norm regularizer of the coefficient matrices [22, 23]. Zhuang et al. combined a sparsity constraint and a nuclear norm regularizer to propose a non-negative low-rank and sparse representation method (NNLRSR) [24]. Tang et al. generalized NNLRSR and devised a structure-constrained LRR (SCLRR) [25]. Lu et al. presented a graph-regularized LRR (GLRR) algorithm by minimizing the nuclear norm and the Laplacian regularizer of the coefficient matrix simultaneously [26]. Tang et al. designed a dense block and sparse representation (DBSR) method which uses the $2$-norm (the maximal singular value) and $l_{1}$-norm regularizers to compute a dense block and sparse coefficient matrix [27]. Thirdly, classical spectral-type subspace clustering algorithms have been integrated to build cascade models. Wei et al. devised a sparse relation representation by stacking SSC and LRR [28]. Sui et al. also provided a similar method to show the effectiveness of cascade models [29]. These extended methods outperform the classical algorithms to a certain extent, but they still may not be guaranteed to produce ideal coefficient matrices.

From the viewpoint of a spectral clustering algorithm, the best affinity matrix $\mathbf{M}^{*}$ of data set $\mathbf{X}$ should have the following property: if $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ belong to the same cluster, then $[\mathbf{M}^{*}]_{ij}=1$; otherwise, $[\mathbf{M}^{*}]_{ij}=0$. Namely, $\mathbf{M}^{*}$ should be $k$ block diagonal and have the following formulation:

\mathbf{M}^{*}=\left(\begin{array}{cccc}\mathbf{1}_{n_{1}}\mathbf{1}_{n_{1}}^{\top}&0&\cdots&0\\ 0&\mathbf{1}_{n_{2}}\mathbf{1}^{\top}_{n_{2}}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&\cdots&0&\mathbf{1}_{n_{k}}\mathbf{1}^{\top}_{n_{k}}\end{array}\right) \quad (1)

where $\mathbf{1}_{n_{i}}$ is a column vector of $n_{i}$ elements, each equal to $1$. In the correlation clustering domain, $\mathbf{M}^{*}$ is called a membership matrix [30, 31, 32]. Moreover, researchers have also proved that a variation of the membership matrix, called the normalized membership matrix, is likewise adequate for spectral clustering. The normalized membership matrix $\mathbf{A}^{*}$ corresponding to $\mathbf{M}^{*}$ is expressed as follows:

\mathbf{A}^{*}=\left(\begin{array}{cccc}\frac{1}{n_{1}}\mathbf{1}_{n_{1}}\mathbf{1}_{n_{1}}^{\top}&0&\cdots&0\\ 0&\frac{1}{n_{2}}\mathbf{1}_{n_{2}}\mathbf{1}^{\top}_{n_{2}}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&\cdots&0&\frac{1}{n_{k}}\mathbf{1}_{n_{k}}\mathbf{1}^{\top}_{n_{k}}\end{array}\right). \quad (2)

Back in the domain of subspace clustering, suppose an affinity matrix $\mathbf{A}$ is a normalized membership matrix. As mentioned above, in a spectral-type subspace clustering algorithm the affinity matrix is defined as $[\mathbf{A}]_{ij}=(|[\mathbf{Z}]_{ij}|+|[\mathbf{Z}^{\top}]_{ij}|)/2$. If we force $\mathbf{Z}=\mathbf{Z}^{\top}$ and $[\mathbf{Z}]_{ij}\geq 0$ (for all $i,j$), the best reconstruction coefficient matrix $\mathbf{Z}^{*}$ should be the same as $\mathbf{A}^{*}$, namely $\mathbf{Z}^{*}$ is also a normalized membership matrix. Such a $\mathbf{Z}^{*}$ is definitely inter-subspace sparse and intra-subspace connected. The fact that each element in a block (i.e., $\mathbf{Z}_{i}^{*}$) equals $1/n_{i}(>0)$ means $\mathbf{Z}^{*}$ is fully connected within each block. Hence, this kind of coefficient matrix is better than that obtained by BDR. Fig. 1 presents the coefficient matrices obtained by BDR and by the algorithm proposed in this paper on a synthetic data set. All the coefficient matrices are block diagonal, but each block in the coefficient matrices obtained by the proposed algorithm is denser than its counterpart obtained by BDR. The subspace structure of the data set is therefore revealed more faithfully by the proposed algorithm.
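To make the properties of a normalized membership matrix concrete, the short sketch below (Python/NumPy; the helper name is hypothetical) builds $\mathbf{A}^{*}$ of Eq. (2) from ground-truth labels and checks that it is idempotent, doubly stochastic, symmetric, non-negative and has trace equal to the number of clusters:

```python
import numpy as np

def normalized_membership(labels):
    """Build the normalized membership matrix A* of Eq. (2) from cluster labels."""
    labels = np.asarray(labels)
    n = labels.size
    A = np.zeros((n, n))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        A[np.ix_(idx, idx)] = 1.0 / idx.size   # each block equals (1/n_i) * 1 1^T
    return A

# toy check: two clusters of sizes 3 and 2
A = normalized_membership([0, 0, 0, 1, 1])
assert np.allclose(A @ A, A)                    # idempotent: A^2 = A
assert np.allclose(A.sum(axis=0), 1.0)          # columns sum to one
assert np.allclose(A, A.T) and (A >= 0).all()   # symmetric, non-negative
assert np.isclose(np.trace(A), 2)               # trace equals the number of clusters k
```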

Figure 1: We generate 5 subspaces, each of dimension $d=5$, in an ambient space of dimension $D=20$. We sample 50 data points from each subspace and construct a $D\times 250$ data matrix $\mathbf{X}=[\mathbf{X}_{1},\cdots,\mathbf{X}_{5}]$ without noise, where $\mathbf{X}_{i}$ ($i=1,2,\cdots,5$) contains the samples from the $i$-th subspace. We then use BDR and the algorithm proposed in this paper to compute the coefficient matrices. The two coefficient matrices obtained by BDR are illustrated in (a) and (b). The two coefficient matrices achieved by the proposed algorithm are shown in (c) and (d).

In [32], Lee et al. also suggested constructing a normalized membership matrix for subspace clustering. However, the resulting membership representation (MR) algorithm [32] takes three steps to obtain the final coefficient matrix. Firstly, a subspace clustering algorithm such as SSC or LRR is used to get an initial coefficient matrix. Secondly, MR seeks a membership matrix by using the obtained initial coefficient matrix. Finally, a normalized membership matrix is computed from the obtained membership matrix. In the last two steps, the augmented Lagrangian method (ALM) [33] is applied to solve the corresponding optimization problems. Hence, besides the computation time spent on finding an initial coefficient matrix, the time cost of the last two steps of MR is also high.

In this paper, we propose a new method to find a coefficient matrix that is as close to a normalized membership matrix as possible. The motivation of the proposed algorithm is the self-expressiveness property of the reconstruction coefficient vectors obtained by subspace clustering algorithms. As is well known, spectral-type subspace clustering algorithms assume that the original data samples obey the self-expressiveness property [9], i.e., each data point can be well reconstructed by a linear combination of the other points in the given dataset. The self-expressiveness property of the obtained coefficient vectors means that each coefficient vector can be linearly reconstructed by the other coefficient vectors. Based on this proposition and the doubly stochastic constraints [34, 35], an idempotent representation (IDR) method for subspace clustering is proposed. An optimization algorithm for solving the IDR problem is also presented, and the convergence as well as the complexity of the optimization algorithm are analyzed. We also compare IDR with closely related algorithms to show its superiority. Finally, extensive experiments conducted on both synthetic and real world databases show the effectiveness and efficiency of the IDR method.

The rest of the paper is organized as follows: we introduce the general formulation of spectral-type subspace clustering algorithms in Section II. In Section III, we present the idea of idempotent representation (IDR) and the optimization algorithm for solving the IDR problem. Further discussions of IDR, such as the analysis of the convergence and complexity of the optimization algorithm and the connections between IDR and related algorithms, are given in Section IV. Comparative subspace clustering experiments on both synthetic and real world data sets are reported in Section V. Section VI presents the conclusions.

II Preliminary

Though there is a wide variety of existing spectral-type subspace clustering algorithms, their general objective function can be expressed as follows:

\begin{array}{ll}\min_{\mathbf{Z}}&\Omega(\mathbf{Z})\\ s.t.&\mathbf{X}=\mathbf{XZ},\end{array} \quad (3)

where $\Omega(\mathbf{Z})$ denotes a certain norm regularizer of $\mathbf{Z}$ and $\mathbf{X}\in\mathcal{R}^{d\times n}$ is the data matrix. In real applications, data are often noisy or corrupted. Hence, a more robust version of the above problem can be defined as follows:

\begin{array}{ll}\min_{\mathbf{Z},\mathbf{E}}&\Omega(\mathbf{Z})+\lambda\Phi(\mathbf{E}),\\ s.t.&\mathbf{X}=\mathbf{XZ}+\mathbf{E},\end{array} \quad (4)

where $\mathbf{E}$ is the error term, $\Phi(\mathbf{E})$ is a certain measurement of $\mathbf{E}$, and $\lambda$ is a positive parameter that balances the effects of $\Omega(\mathbf{Z})$ and $\Phi(\mathbf{E})$. Moreover, some algorithms impose additional constraints on $\mathbf{Z}$, which can be expressed as $\Theta(\mathbf{Z})$. The main differences between the existing subspace clustering algorithms thus lie in the definitions of $\Omega(\cdot)$, $\Phi(\cdot)$ and $\Theta(\cdot)$. Table I summarizes the formulations of $\Omega(\mathbf{Z})$, $\Phi(\mathbf{E})$ and $\Theta(\mathbf{Z})$ used in some representative subspace clustering algorithms.

TABLE I: The residual terms, regularizers and additional constraints of coefficient matrices used in some subspace clustering algorithms.
Algorithms | $\Omega(\mathbf{Z})$ | $\Phi(\mathbf{E})$ | $\Theta(\mathbf{Z})$
SSC | $\|\mathbf{Z}\|_{1}$ | $\|\mathbf{E}\|_{1}$ | $diag(\mathbf{Z})=\mathbf{0}_{n}$
LRR | $\|\mathbf{Z}\|_{*}$ | $\|\mathbf{E}\|_{2,1}$ | -
LSR | $\|\mathbf{Z}\|_{F}^{2}$ | $\|\mathbf{E}\|_{F}^{2}$ | $diag(\mathbf{Z})=\mathbf{0}_{n}$
BDR | $\|\mathbf{Z}\|_{k}$ | $\|\mathbf{E}\|_{F}^{2}$ | $diag(\mathbf{Z})=\mathbf{0}_{n}$, $\mathbf{Z}=\mathbf{Z}^{\top}$, $\mathbf{Z}\geq 0$
SSSC | $\|(\mathbf{I}_{n}+\gamma\mathbf{Q})\bigodot\mathbf{Z}\|_{1}$ | $\|\mathbf{E}\|_{1}$ | $diag(\mathbf{Z})=\mathbf{0}_{n}$
EN | $\|\mathbf{Z}\|_{F}^{2}+\gamma\|\mathbf{Z}\|_{1}$ | $\|\mathbf{E}\|_{1}$ | $diag(\mathbf{Z})=\mathbf{0}_{n}$
SCLRR | $\|\mathbf{Z}\|_{*}+\gamma\|\mathbf{Z}\|_{1}$ | $\|\mathbf{E}\|_{2,1}$ | -
GLRR | $\|\mathbf{Z}\|_{*}+\gamma Tr(\mathbf{ZLZ}^{\top})$ | $\|\mathbf{E}\|_{2,1}$ | -
DBSR | $\|\mathbf{Z}\|_{2}+\gamma\|\mathbf{Z}\|_{1}$ | $\|\mathbf{E}\|_{2,1}$ | -

Notice: $\gamma>0$ is a parameter. $diag(\mathbf{Z})$ denotes the column vector composed of the diagonal elements of $\mathbf{Z}$, and $\mathbf{0}_{n}$ is a column vector with every element equal to $0$. In BDR, $\|\mathbf{Z}\|_{k}$ is a regularizer which forces $\mathbf{Z}$ to be $k$ block diagonal. In SSSC, $\mathbf{Q}\in\mathcal{R}^{n\times n}$ is a weight matrix updated by the segmentation results in each iteration and $\mathbf{I}_{n}$ is the $n\times n$ identity matrix. In GLRR, $Tr(\cdot)$ denotes the trace of a matrix and $\mathbf{L}$ is the Laplacian matrix built by using K-nearest-neighbors (KNN) [36] and $\mathbf{X}$.

All the algorithms listed in Table I can be solved by using ALM. Subsequently, Ncuts is used to obtain the final subspace clustering results.

III Idempotent representation

III-A Motivation

The key assumption that spectral-type subspace clustering algorithms have in common is that the data samples in $\mathbf{X}$ obey the self-expressiveness property [9]. Namely, each data sample can be approximately reconstructed by a linear combination of the other data points in the given dataset with tolerable errors. Thus, $\mathbf{X}\approx\mathbf{XZ}$ and $\mathbf{Z}$ records the reconstruction relationships among the original data samples.

In addition, as described in [10, 16], the obtained coefficient matrix $\mathbf{Z}$ is a representation of the original data matrix $\mathbf{X}$, with $\mathbf{z}_{i}$ being the representation of $\mathbf{x}_{i}$. Here, $\mathbf{z}_{i}$ and $\mathbf{x}_{i}$ are the $i$-th columns of $\mathbf{Z}$ and $\mathbf{X}$ respectively. It is then reasonable to assume that the coefficient vectors also obey the self-expressiveness property (the self-expressiveness property of coefficient vectors), namely that each coefficient vector can be linearly reconstructed by the other coefficient vectors in $\mathbf{Z}$. Therefore, we have

\mathbf{Z}=\mathbf{ZT}, \quad (5)

where $\mathbf{T}$ is a reconstruction coefficient matrix corresponding to $\mathbf{Z}$. Moreover, we hope $\mathbf{T}$ to be close to $\mathbf{Z}$: if $\mathbf{Z}$ is a good representation of $\mathbf{X}$, then $\mathbf{Z}$ should follow the reconstruction relationships of the original data set, and these relationships are exactly what $\mathbf{Z}$ records. Therefore, the following equation holds:

\mathbf{Z}\approx\mathbf{Z}\times\mathbf{Z}=\mathbf{Z}^{2}. \quad (6)

The above equation means that $\mathbf{Z}$ is approximately an idempotent matrix.

It is easy to verify that the $n\times n$ identity matrix $\mathbf{I}_{n}$ is idempotent and is a solution to the problem $\mathbf{X}=\mathbf{XZ}$. Hence, in a spectral-type subspace clustering algorithm, the above idempotent constraint (Eq. (6)) alone is not sufficient for finding a good coefficient matrix. Fortunately, it can be checked that a normalized membership matrix is also an idempotent matrix. We therefore show how to add the necessary constraints to compel an idempotent reconstruction coefficient matrix to be a normalized membership matrix.

In fact, Lee et al. pointed out that an idempotent matrix is a normalized membership matrix if and only if it is doubly stochastic [32]. And a doubly stochastic matrix $\mathbf{Z}\in\mathcal{R}^{n\times n}$ is completely described by the following doubly stochastic conditions [34, 35]:

\mathbf{1}_{n}^{\top}\mathbf{Z}=\mathbf{1}_{n}^{\top},\quad \mathbf{Z}=\mathbf{Z}^{\top},\quad \mathbf{Z}\geq\mathbf{0}. \quad (7)

However, these conditions still cannot prevent $\mathbf{Z}$ from being $\mathbf{I}_{n}$. As mentioned above, to reveal the subspace structure of a data set $\mathbf{X}$ with $k$ subspaces faithfully, a coefficient matrix should be $k$ block diagonal. For an idempotent and doubly stochastic coefficient matrix $\mathbf{Z}$, we can simply require $Tr(\mathbf{Z})=k$; then $\mathbf{Z}$ will be $k$ block diagonal. This constraint also prevents $\mathbf{Z}$ from degenerating to the trivial solution $\mathbf{Z}=\mathbf{I}_{n}$. Therefore, by integrating these constraints into the general formulation of subspace clustering algorithms, we define the idempotent representation (IDR) problem as follows:

\begin{array}{ll}\min_{\mathbf{Z},\mathbf{E}}&\|\mathbf{Z}\|_{id}+\lambda\|\mathbf{E}\|_{2,1},\\ s.t.&\mathbf{X}=\mathbf{XZ}+\mathbf{E},\\ &\mathbf{1}_{n}^{\top}\mathbf{Z}=\mathbf{1}_{n}^{\top},\ \mathbf{Z}=\mathbf{Z}^{\top},\ \mathbf{Z}\geq\mathbf{0},\ Tr(\mathbf{Z})=k,\end{array} \quad (8)

where $\|\mathbf{Z}\|_{id}$ denotes the idempotent regularizer of $\mathbf{Z}$, namely $\|\mathbf{Z}\|_{id}=\|\mathbf{Z}-\mathbf{Z}^{2}\|_{F}^{2}$. In most real applications, part of the data samples are corrupted, hence we use the $l_{2,1}$ norm to measure the error term $\mathbf{E}$.

Furthermore, all these restrictions imposed on $\mathbf{Z}$ limit its representation capability. To alleviate this problem, we introduce an intermediate variable and propose the following relaxed problem:

\begin{array}{ll}\min_{\mathbf{Z},\mathbf{S}}&\|\mathbf{Z}-\mathbf{S}\|_{F}^{2}+\gamma\|\mathbf{S}\|_{id}+\lambda\|\mathbf{E}\|_{2,1},\\ s.t.&\mathbf{X}=\mathbf{XZ}+\mathbf{E},\\ &\mathbf{1}_{n}^{\top}\mathbf{S}=\mathbf{1}_{n}^{\top},\ \mathbf{S}=\mathbf{S}^{\top},\ \mathbf{S}\geq\mathbf{0},\ Tr(\mathbf{S})=k,\end{array} \quad (9)

where $\gamma$ is also a positive parameter.

III-B Optimization

As with existing subspace clustering problems, we use ALM [33] to solve the IDR problem (i.e., Eq. (9)). Firstly, we transform Eq. (9) into the following equivalent problem:

\begin{array}{ll}\min_{\mathbf{Z},\mathbf{S},\mathbf{C},\mathbf{D}}&\|\mathbf{Z}-\mathbf{S}\|_{F}^{2}+\gamma\|\mathbf{S}-\mathbf{SC}\|_{F}^{2}+\lambda\|\mathbf{E}\|_{2,1},\\ s.t.&\mathbf{X}=\mathbf{XZ}+\mathbf{E},\\ &\mathbf{S}=\mathbf{C},\ \mathbf{S}=\mathbf{S}^{\top},\ \mathbf{S}\geq\mathbf{0},\\ &\mathbf{1}_{n}^{\top}\mathbf{C}=\mathbf{1}_{n}^{\top},\\ &\mathbf{S}=\mathbf{D},\ Tr(\mathbf{D})=k,\end{array} \quad (10)

where $\mathbf{C}$ and $\mathbf{D}$ are two auxiliary variables. The corresponding augmented Lagrangian function of Eq. (10) can be expressed as follows:

\begin{array}{ll}\mathfrak{L}&=\|\mathbf{Z}-\mathbf{S}\|_{F}^{2}+\gamma\|\mathbf{S}-\mathbf{SC}\|_{F}^{2}+\lambda\|\mathbf{E}\|_{2,1}\\ &+\langle\mathbf{Y}_{1},\mathbf{X}-\mathbf{XZ}-\mathbf{E}\rangle+\langle\mathbf{Y}_{2},\mathbf{S}-\mathbf{C}\rangle\\ &+\langle\mathbf{Y}_{3},\mathbf{1}_{n}^{\top}\mathbf{C}-\mathbf{1}_{n}^{\top}\rangle+\langle\mathbf{Y}_{4},\mathbf{S}-\mathbf{D}\rangle\\ &+\mu/2\big(\|\mathbf{X}-\mathbf{XZ}-\mathbf{E}\|_{F}^{2}+\|\mathbf{S}-\mathbf{C}\|_{F}^{2}\\ &+\|\mathbf{1}_{n}^{\top}\mathbf{C}-\mathbf{1}_{n}^{\top}\|_{F}^{2}+\|\mathbf{S}-\mathbf{D}\|_{F}^{2}\big), \end{array} \quad (11)

where $\mathbf{Y}_{1},\mathbf{Y}_{2},\mathbf{Y}_{3}$ and $\mathbf{Y}_{4}$ are four Lagrangian multipliers and $\mu>0$ is an additional parameter. By minimizing $\mathfrak{L}$, the variables $\mathbf{Z},\mathbf{S},\mathbf{C},\mathbf{D},\mathbf{E}$ can be optimized alternately, each while fixing the others.

1. Fix the other variables and update $\mathbf{Z}$. In the $h$-th iteration ($h$ is the iteration index),

\begin{array}{l}\mathbf{Z}^{h+1}=\arg\min_{\mathbf{Z}}\|\mathbf{Z}-\mathbf{S}^{h}\|_{F}^{2}+\langle\mathbf{Y}_{1}^{h},\mathbf{X}-\mathbf{XZ}-\mathbf{E}^{h}\rangle\\ \qquad+\mu^{h}/2\|\mathbf{X}-\mathbf{XZ}-\mathbf{E}^{h}\|_{F}^{2}\\ =\arg\min_{\mathbf{Z}}\|\mathbf{Z}-\mathbf{S}^{h}\|_{F}^{2}+\mu^{h}/2\|\mathbf{X}-\mathbf{XZ}-\mathbf{E}^{h}+\mathbf{Y}^{h}_{1}/\mu^{h}\|_{F}^{2},\end{array} \quad (12)

where $\mathbf{S}^{h},\mathbf{Y}^{h}_{1}$ and $\mu^{h}$ are the current values of the corresponding variables. It can easily be verified that $\mathbf{Z}^{h+1}=\big(2\mathbf{I}_{n}+\mu^{h}\mathbf{X}^{\top}\mathbf{X}\big)^{-1}\big(2\mathbf{S}^{h}+\mu^{h}(\mathbf{X}^{\top}\mathbf{X}-\mathbf{X}^{\top}\mathbf{E}^{h})+\mathbf{X}^{\top}\mathbf{Y}_{1}^{h}\big)$, where $\big(2\mathbf{I}_{n}+\mu^{h}\mathbf{X}^{\top}\mathbf{X}\big)^{-1}$ is the inverse of $(2\mathbf{I}_{n}+\mu^{h}\mathbf{X}^{\top}\mathbf{X})$, which always exists since this matrix is positive definite.
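For illustration, a minimal NumPy sketch of this closed-form update (the function name and the use of a linear solver instead of an explicit inverse are our own choices, not the released code) is:

```python
import numpy as np

def update_Z(X, S, E, Y1, mu):
    """Closed-form Z update of Eq. (12), assuming dense NumPy arrays."""
    n = X.shape[1]
    lhs = 2.0 * np.eye(n) + mu * (X.T @ X)               # 2 I_n + mu^h X^T X
    rhs = 2.0 * S + mu * (X.T @ X - X.T @ E) + X.T @ Y1  # 2 S^h + mu^h (X^T X - X^T E^h) + X^T Y1^h
    return np.linalg.solve(lhs, rhs)                     # solve the linear system instead of inverting
```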

2. Fix the other variables and update $\mathbf{S}$. Similar to updating $\mathbf{Z}$,

\begin{array}{l}\mathbf{S}^{h+1}=\arg\min_{\mathbf{S}}\|\mathbf{Z}^{h+1}-\mathbf{S}\|_{F}^{2}+\gamma\|\mathbf{S}-\mathbf{SC}^{h}\|_{F}^{2}\\ \qquad+\langle\mathbf{Y}^{h}_{2},\mathbf{S}-\mathbf{C}^{h}\rangle+\langle\mathbf{Y}^{h}_{4},\mathbf{S}-\mathbf{D}^{h}\rangle\\ \qquad+\mu^{h}/2\big(\|\mathbf{S}-\mathbf{C}^{h}\|_{F}^{2}+\|\mathbf{S}-\mathbf{D}^{h}\|_{F}^{2}\big)\\ =\arg\min_{\mathbf{S}}\|\mathbf{Z}^{h+1}-\mathbf{S}\|_{F}^{2}+\gamma\|\mathbf{S}-\mathbf{SC}^{h}\|_{F}^{2}\\ \qquad+\mu^{h}/2\big(\|\mathbf{S}-\mathbf{C}^{h}+\mathbf{Y}^{h}_{2}/\mu^{h}\|_{F}^{2}+\|\mathbf{S}-\mathbf{D}^{h}+\mathbf{Y}^{h}_{4}/\mu^{h}\|_{F}^{2}\big).\end{array} \quad (13)

Hence, $\mathbf{S}^{h+1}=\big(2\mathbf{Z}^{h+1}+\mu^{h}\mathbf{C}^{h}-\mathbf{Y}^{h}_{2}+\mu^{h}\mathbf{D}^{h}-\mathbf{Y}^{h}_{4}\big)\big((2+2\mu^{h})\mathbf{I}_{n}+2\gamma(\mathbf{I}_{n}-\mathbf{C}^{h})(\mathbf{I}_{n}-\mathbf{C}^{h})^{\top}\big)^{-1}$. Because of the non-negative and symmetric constraints on $\mathbf{S}$, we further let $\mathbf{S}^{h+1}=\max(\mathbf{S}^{h+1},\mathbf{0})$ and $\mathbf{S}^{h+1}=\big(\mathbf{S}^{h+1}+(\mathbf{S}^{h+1})^{\top}\big)/2$.
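A corresponding sketch of the $\mathbf{S}$ update, including the non-negative and symmetric projections, could be (same assumptions as above):

```python
import numpy as np

def update_S(Z_new, C, D, Y2, Y4, mu, gamma):
    """Closed-form S update of Eq. (13) followed by the non-negative and symmetric projections."""
    n = Z_new.shape[0]
    I = np.eye(n)
    num = 2.0 * Z_new + mu * C - Y2 + mu * D - Y4
    den = (2.0 + 2.0 * mu) * I + 2.0 * gamma * (I - C) @ (I - C).T
    S = num @ np.linalg.inv(den)   # right-multiplication by the inverse, as in the text
    S = np.maximum(S, 0.0)         # enforce S >= 0
    return (S + S.T) / 2.0         # enforce S = S^T
```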

3. Fix the other variables and update $\mathbf{C}$. We find that

\begin{array}{l}\mathbf{C}^{h+1}=\arg\min_{\mathbf{C}}\gamma\|\mathbf{S}^{h+1}-\mathbf{S}^{h+1}\mathbf{C}\|_{F}^{2}+\langle\mathbf{Y}^{h}_{2},\mathbf{S}^{h+1}-\mathbf{C}\rangle\\ \qquad+\langle\mathbf{Y}^{h}_{3},\mathbf{1}_{n}^{\top}\mathbf{C}-\mathbf{1}_{n}^{\top}\rangle+\mu^{h}/2\big(\|\mathbf{S}^{h+1}-\mathbf{C}\|_{F}^{2}+\|\mathbf{1}_{n}^{\top}\mathbf{C}-\mathbf{1}_{n}^{\top}\|_{F}^{2}\big)\\ =\arg\min_{\mathbf{C}}\gamma\|\mathbf{S}^{h+1}-\mathbf{S}^{h+1}\mathbf{C}\|_{F}^{2}+\mu^{h}/2\big(\|\mathbf{S}^{h+1}-\mathbf{C}+\mathbf{Y}^{h}_{2}/\mu^{h}\|_{F}^{2}\\ \qquad+\|\mathbf{1}_{n}^{\top}\mathbf{C}-\mathbf{1}_{n}^{\top}+\mathbf{Y}^{h}_{3}/\mu^{h}\|_{F}^{2}\big).\end{array} \quad (14)

Then $\mathbf{C}^{h+1}=\big(2\gamma(\mathbf{S}^{h+1})^{\top}\mathbf{S}^{h+1}+\mu^{h}(\mathbf{I}_{n}+\mathbf{1}_{n}\mathbf{1}_{n}^{\top})\big)^{-1}\big(2\gamma(\mathbf{S}^{h+1})^{\top}\mathbf{S}^{h+1}+\mathbf{Y}_{2}^{h}-\mathbf{1}_{n}\mathbf{Y}_{3}^{h}+\mu^{h}(\mathbf{S}^{h+1}+\mathbf{1}_{n}\mathbf{1}_{n}^{\top})\big)$.
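Likewise, a sketch of the $\mathbf{C}$ update (assuming $\mathbf{Y}_{3}$ is stored as a $1\times n$ row vector) is:

```python
import numpy as np

def update_C(S_new, Y2, Y3, mu, gamma):
    """Closed-form C update given below Eq. (14); Y3 is the 1 x n multiplier of the column-sum constraint."""
    n = S_new.shape[0]
    I, ones = np.eye(n), np.ones((n, 1))
    StS = S_new.T @ S_new
    lhs = 2.0 * gamma * StS + mu * (I + ones @ ones.T)
    rhs = 2.0 * gamma * StS + Y2 - ones @ Y3 + mu * (S_new + ones @ ones.T)
    return np.linalg.solve(lhs, rhs)
```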

4. Fix the other variables and update $\mathbf{D}$. For updating $\mathbf{D}$, we get the following problem:

\begin{array}{ll}\min_{\mathbf{D}}&\langle\mathbf{Y}_{4}^{h},\mathbf{S}^{h+1}-\mathbf{D}\rangle+\mu^{h}/2\|\mathbf{S}^{h+1}-\mathbf{D}\|_{F}^{2}\\ =\min_{\mathbf{D}}&\|\mathbf{S}^{h+1}-\mathbf{D}+\mathbf{Y}_{4}^{h}/\mu^{h}\|_{F}^{2}\\ =\min_{\mathbf{D}}&\|\mathbf{D}-\mathbf{M}\|_{F}^{2},\\ s.t.&Tr(\mathbf{D})=k,\end{array} \quad (15)

where $\mathbf{M}=\mathbf{S}^{h+1}+\mathbf{Y}_{4}^{h}/\mu^{h}$. Note that the constraint is only imposed on the diagonal elements of $\mathbf{D}$, hence $[\mathbf{D}^{h+1}]_{ij}=[\mathbf{M}]_{ij}$ for $i\neq j$ ($i,j=1,2,\cdots,n$). Let $\mathbf{d}=diag(\mathbf{D})$ and $\mathbf{m}=diag(\mathbf{M})$; then we have

\begin{array}{ll}\min_{\mathbf{d}}&\|\mathbf{d}-\mathbf{m}\|_{2}^{2},\\ s.t.&\mathbf{1}_{n}^{\top}\mathbf{d}=k.\end{array} \quad (16)

This problem can be solved by any off-the-shelf quadratic programming solver. Here we provide a more efficient way to obtain the solution to Problem (16). The Lagrangian function of Problem (16) is

\mathcal{L}=\|\mathbf{d}-\mathbf{m}\|_{2}^{2}-\eta(\mathbf{1}_{n}^{\top}\mathbf{d}-k), \quad (17)

where $\eta$ is a Lagrange multiplier. The optimal solution $\mathbf{d}$ should make the derivative of Eq. (17) w.r.t. $\mathbf{d}$ equal to zero, so we have

2(\mathbf{d}-\mathbf{m})-\eta\mathbf{1}_{n}=\mathbf{0}_{n}. \quad (18)

Then for the $j$-th element of $\mathbf{d}$, we have

d_{j}-m_{j}-\eta/2=0, \quad (19)

where $d_{j}$ and $m_{j}$ are the $j$-th elements of $\mathbf{d}$ and $\mathbf{m}$ respectively. According to the constraint $\mathbf{1}_{n}^{\top}\mathbf{d}=k$ in Problem (16),

\eta=2(k-\mathbf{1}_{n}^{\top}\mathbf{m})/n. \quad (20)

Hence,

\mathbf{d}=\mathbf{m}+\mathbf{1}_{n}(k-\mathbf{1}_{n}^{\top}\mathbf{m})/n. \quad (21)

By summarizing the above computations,

\mathbf{D}^{h+1}=\mathbf{M}+\mathbf{Diag}\big(\mathbf{1}_{n}(k-\mathbf{1}_{n}^{\top}diag(\mathbf{M}))/n\big), \quad (22)

where $\mathbf{Diag}\big(\mathbf{1}_{n}(k-\mathbf{1}_{n}^{\top}diag(\mathbf{M}))/n\big)$ is a diagonal matrix whose diagonal vector is $\mathbf{1}_{n}(k-\mathbf{1}_{n}^{\top}diag(\mathbf{M}))/n$.
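A short sketch of this $\mathbf{D}$ update, together with a quick check that the trace constraint is met, could be:

```python
import numpy as np

def update_D(S_new, Y4, mu, k):
    """D update of Eq. (22): keep M off the diagonal and shift its diagonal so that Tr(D) = k."""
    M = S_new + Y4 / mu
    n = M.shape[0]
    D = M.copy()
    D[np.diag_indices(n)] += (k - np.trace(M)) / n   # d = m + 1_n (k - 1_n^T m) / n
    return D

# sanity check on a random matrix: the trace equals k after the update
D = update_D(np.random.rand(6, 6), np.zeros((6, 6)), 1.0, k=3)
assert np.isclose(np.trace(D), 3.0)
```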

5. Fix the other variables and update $\mathbf{E}$. From Eq. (11), it can easily be derived that

\begin{array}{l}\mathbf{E}^{h+1}=\arg\min_{\mathbf{E}}\lambda\|\mathbf{E}\|_{2,1}+\langle\mathbf{Y}_{1}^{h},\mathbf{X}-\mathbf{XZ}^{h+1}-\mathbf{E}\rangle\\ \qquad+\mu^{h}/2\|\mathbf{X}-\mathbf{XZ}^{h+1}-\mathbf{E}\|_{F}^{2}\\ =\arg\min_{\mathbf{E}}\lambda\|\mathbf{E}\|_{2,1}+\mu^{h}/2\|\mathbf{X}-\mathbf{XZ}^{h+1}-\mathbf{E}+\mathbf{Y}^{h}_{1}/\mu^{h}\|_{F}^{2}.\end{array} \quad (23)

The above problem could be solved by following the Lemma presented in [10, 16].

Lemma 1. Let $\mathbf{Q}=[\mathbf{q}_{1},\mathbf{q}_{2},\cdots,\mathbf{q}_{i},\cdots]$ be a given matrix. If the optimal solution to

\min_{\mathbf{P}}\alpha\|\mathbf{P}\|_{2,1}+\frac{1}{2}\|\mathbf{P}-\mathbf{Q}\|_{F}^{2} \quad (24)

is $\mathbf{P}^{*}$, then the $i$-th column of $\mathbf{P}^{*}$ is

\mathbf{P}^{*}(:,i)=\left\{\begin{array}{ll}\frac{\|\mathbf{q}_{i}\|_{2}-\alpha}{\|\mathbf{q}_{i}\|_{2}}\mathbf{q}_{i},&\hbox{if $\alpha<\|\mathbf{q}_{i}\|_{2}$;}\\ 0,&\hbox{otherwise.}\end{array}\right. \quad (25)
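Lemma 1 is a column-wise shrinkage, which can be sketched as follows (the helper name is ours; the comment shows how it yields the $\mathbf{E}$ update of Eq. (23)):

```python
import numpy as np

def prox_l21(Q, alpha):
    """Column-wise shrinkage of Lemma 1, i.e. the proximal operator of alpha * ||.||_{2,1}."""
    P = np.zeros_like(Q)
    norms = np.linalg.norm(Q, axis=0)        # ||q_i||_2 for every column
    keep = norms > alpha                     # columns that survive the shrinkage
    P[:, keep] = (norms[keep] - alpha) / norms[keep] * Q[:, keep]
    return P

# the E update of Eq. (23) then reads (with the variables defined earlier):
# E_new = prox_l21(X - X @ Z_new + Y1 / mu, lam / mu)
```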

6. Fix the other variables and update the multipliers and parameters. The precise updating schemes for the multipliers and the parameter $\mu$ in Eq. (11) are summarized as follows:

\begin{array}{l}\mathbf{Y}_{1}^{h+1}=\mathbf{Y}^{h}_{1}+\mu^{h}(\mathbf{X}-\mathbf{XZ}^{h+1}-\mathbf{E}^{h+1}),\\ \mathbf{Y}_{2}^{h+1}=\mathbf{Y}^{h}_{2}+\mu^{h}(\mathbf{S}^{h+1}-\mathbf{C}^{h+1}),\\ \mathbf{Y}_{3}^{h+1}=\mathbf{Y}^{h}_{3}+\mu^{h}(\mathbf{1}_{n}^{\top}\mathbf{C}^{h+1}-\mathbf{1}_{n}^{\top}),\\ \mathbf{Y}_{4}^{h+1}=\mathbf{Y}^{h}_{4}+\mu^{h}(\mathbf{S}^{h+1}-\mathbf{D}^{h+1}),\\ \mu^{h+1}=\min(\mu_{max},\rho\mu^{h}),\end{array} \quad (26)

where $\mu_{max}$ and $\rho$ are two given positive parameters.

III-C Algorithm

We summarize the algorithmic procedure of IDR in Algorithm 1. For a data set, once the solutions to IDR are obtained, we use $\mathbf{Z}$ and $\mathbf{S}$ to define two affinity graphs $\mathbf{G}_{1}$ and $\mathbf{G}_{2}$ as $[\mathbf{G}_{1}]_{ij}=\big(|[\mathbf{Z}]_{ij}|+|[\mathbf{Z}^{\top}]_{ij}|\big)/2$ and $[\mathbf{G}_{2}]_{ij}=\big(|[\mathbf{S}]_{ij}|+|[\mathbf{S}^{\top}]_{ij}|\big)/2$. Ncuts is then performed on the two graphs to obtain two segmentation results, and the better one is chosen as the final result.

Algorithm 1 Idempotent representation (IDR)
0:    Data set $\mathbf{X}=[\mathbf{x}_{1},\mathbf{x}_{2},\cdots,\mathbf{x}_{n}]\in\mathcal{R}^{d\times n}$ with each column normalized to unit $l_{2}$ norm, parameters $\gamma,\lambda$, the number of subspaces $k$, the maximal number of iterations $Maxiter$;
0:    The coefficient matrices $\mathbf{Z}^{*}$ and $\mathbf{S}^{*}$ and the error term $\mathbf{E}^{*}$;
1:  Initialize the parameters, i.e., $h=0,\mu^{h}=10^{-6},\mu_{max}=10^{4},\rho=1.1,\varepsilon=10^{-7}$ and $\mathbf{Y}_{1}^{h}=\mathbf{Y}_{2}^{h}=\mathbf{Y}_{3}^{h}=\mathbf{Y}_{4}^{h}=\mathbf{0},\mathbf{Z}^{h}=\mathbf{S}^{h}=\mathbf{C}^{h}=\mathbf{D}^{h}=\mathbf{0}$.
2:  while $\|\mathbf{S}^{h}-\mathbf{C}^{h}\|_{\infty}>\varepsilon$, $\|\mathbf{S}^{h}-\mathbf{D}^{h}\|_{\infty}>\varepsilon$, $\|\mathbf{1}_{n}^{\top}\mathbf{C}^{h}-\mathbf{1}^{\top}_{n}\|_{\infty}>\varepsilon$ and $h<Maxiter$ do
3:     $h=h+1$;
4:     Update $\mathbf{Z}^{h+1}=\big(2\mathbf{I}_{n}+\mu^{h}\mathbf{X}^{\top}\mathbf{X}\big)^{-1}\big(2\mathbf{S}^{h}+\mu^{h}(\mathbf{X}^{\top}\mathbf{X}-\mathbf{X}^{\top}\mathbf{E}^{h})+\mathbf{X}^{\top}\mathbf{Y}_{1}^{h}\big)$;
5:     Update $\mathbf{S}^{h+1}=\big(2\mathbf{Z}^{h+1}+\mu^{h}\mathbf{C}^{h}-\mathbf{Y}^{h}_{2}+\mu^{h}\mathbf{D}^{h}-\mathbf{Y}^{h}_{4}\big)\big((2+2\mu^{h})\mathbf{I}_{n}+2\gamma(\mathbf{I}_{n}-\mathbf{C}^{h})(\mathbf{I}_{n}-\mathbf{C}^{h})^{\top}\big)^{-1}$, then let $\mathbf{S}^{h+1}=\max(\mathbf{0},\mathbf{S}^{h+1})$ and $\mathbf{S}^{h+1}=\big(\mathbf{S}^{h+1}+(\mathbf{S}^{h+1})^{\top}\big)/2$;
6:     Update $\mathbf{C}^{h+1}=\big(2\gamma(\mathbf{S}^{h+1})^{\top}\mathbf{S}^{h+1}+\mu^{h}(\mathbf{I}_{n}+\mathbf{1}_{n}\mathbf{1}_{n}^{\top})\big)^{-1}\big(2\gamma(\mathbf{S}^{h+1})^{\top}\mathbf{S}^{h+1}+\mathbf{Y}_{2}^{h}-\mathbf{1}_{n}\mathbf{Y}_{3}^{h}+\mu^{h}(\mathbf{S}^{h+1}+\mathbf{1}_{n}\mathbf{1}_{n}^{\top})\big)$;
7:     Update $\mathbf{D}^{h+1}=\mathbf{M}+\mathbf{Diag}\big(\mathbf{1}_{n}(k-\mathbf{1}_{n}^{\top}diag(\mathbf{M}))/n\big)$, where $\mathbf{M}=\mathbf{S}^{h+1}+\mathbf{Y}_{4}^{h}/\mu^{h}$;
8:     Update $\mathbf{E}^{h+1}$ by solving Problem (23);
9:     Update $\mathbf{Y}_{1}^{h+1},\mathbf{Y}_{2}^{h+1},\mathbf{Y}_{3}^{h+1},\mathbf{Y}_{4}^{h+1}$ and $\mu^{h+1}$ by using Eq. (26).
10:  end while
11:  return  $\mathbf{Z}^{*}=\mathbf{Z}^{h},\mathbf{S}^{*}=\mathbf{S}^{h},\mathbf{E}^{*}=\mathbf{E}^{h}$.
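For completeness, the sketches above can be strung together into a compact ALM driver mirroring Algorithm 1 (hyper-parameter defaults follow the initialization in step 1; this is an illustrative reimplementation built on the helper functions defined earlier, not the authors' released Matlab code):

```python
import numpy as np

def idr(X, k, lam=0.1, gamma=0.1, max_iter=500, eps=1e-7):
    """ALM loop of Algorithm 1, built from the sketches above
    (update_Z, update_S, update_C, update_D, prox_l21)."""
    n = X.shape[1]
    Z = np.zeros((n, n)); S = np.zeros((n, n)); C = np.zeros((n, n)); D = np.zeros((n, n))
    E = np.zeros_like(X); Y1 = np.zeros_like(X)
    Y2 = np.zeros((n, n)); Y4 = np.zeros((n, n)); Y3 = np.zeros((1, n))
    mu, mu_max, rho = 1e-6, 1e4, 1.1
    ones = np.ones((1, n))
    for _ in range(max_iter):
        Z = update_Z(X, S, E, Y1, mu)
        S = update_S(Z, C, D, Y2, Y4, mu, gamma)
        C = update_C(S, Y2, Y3, mu, gamma)
        D = update_D(S, Y4, mu, k)
        E = prox_l21(X - X @ Z + Y1 / mu, lam / mu)
        # multiplier and penalty updates of Eq. (26)
        Y1 = Y1 + mu * (X - X @ Z - E)
        Y2 = Y2 + mu * (S - C)
        Y3 = Y3 + mu * (ones @ C - ones)
        Y4 = Y4 + mu * (S - D)
        mu = min(mu_max, rho * mu)
        if max(np.abs(S - C).max(), np.abs(S - D).max(), np.abs(ones @ C - ones).max()) < eps:
            break
    return Z, S, E
```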

IV Further analyses

IV-A Complexity analysis

We can see that the complexity of Algorithm 1 is mainly determined by the updates of the five variables $\mathbf{Z},\mathbf{S},\mathbf{C},\mathbf{D},\mathbf{E}$. In each iteration, all these variables have closed-form solutions. Updating $\mathbf{Z},\mathbf{S},\mathbf{C}$ requires inverting an $n\times n$ matrix, so the computational burden is $O(n^{3})$. Updating $\mathbf{D}$ takes $O(n^{2})$ to form and modify an $n\times n$ matrix. And updating $\mathbf{E}$ via Lemma 1 only requires $n$ column-wise shrinkage operations. Hence, the overall time complexity of Algorithm 1 in each iteration is $O(n^{3})$. In our experiments, the number of iterations of Algorithm 1 is always less than $500$, hence its total complexity is $O(n^{3})$.

IV-B Convergence analysis

We now present a theoretical convergence proof for the proposed Algorithm 1.

Proposition 1: Algorithm 1 is convergent and the sequence $\{\mathbf{Z}^{h},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h}\}$ generated by Algorithm 1 converges to a stationary point.

Proof: Algorithm 1 minimizes the augmented Lagrangian function in Eq. (11) by alternately updating the variables $\mathbf{Z},\mathbf{S},\mathbf{C},\mathbf{D},\mathbf{E}$. Firstly, from the updating rule of $\mathbf{Z}^{h+1}$ in Eq. (12), we have

\mathbf{Z}^{h+1}=\arg\min_{\mathbf{Z}}\mathfrak{L}(\mathbf{Z},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h}). \quad (27)

Note that $\mathfrak{L}(\mathbf{Z},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h})$ is $\beta$-strongly convex w.r.t. $\mathbf{Z}$, so the following inequality holds:

\begin{array}{l}\mathfrak{L}(\mathbf{Z}^{h+1},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h})\leq\mathfrak{L}(\mathbf{Z}^{h},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h})\\ \qquad-\beta/2\|\mathbf{Z}^{h+1}-\mathbf{Z}^{h}\|_{F}^{2}.\end{array} \quad (28)

Here we use Lemma B.5 in [37].

Secondly, according to the updating schemes of the remaining variables, namely $\mathbf{S},\mathbf{C},\mathbf{D},\mathbf{E}$, these variables have properties similar to those of $\mathbf{Z}$. Hence, inequalities analogous to (28) hold for them as well. Adding these inequalities, we obtain

\begin{array}{l}\mathfrak{L}(\mathbf{Z}^{h+1},\mathbf{S}^{h+1},\mathbf{C}^{h+1},\mathbf{D}^{h+1},\mathbf{E}^{h+1})\leq\mathfrak{L}(\mathbf{Z}^{h},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h})\\ \qquad-\beta/2\Big(\|\mathbf{Z}^{h+1}-\mathbf{Z}^{h}\|_{F}^{2}+\|\mathbf{S}^{h+1}-\mathbf{S}^{h}\|_{F}^{2}+\|\mathbf{C}^{h+1}-\mathbf{C}^{h}\|_{F}^{2}\\ \qquad+\|\mathbf{D}^{h+1}-\mathbf{D}^{h}\|_{F}^{2}+\|\mathbf{E}^{h+1}-\mathbf{E}^{h}\|_{F}^{2}\Big).\end{array} \quad (29)

Hence, $\mathfrak{L}(\mathbf{Z}^{h},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h})$ is monotonically decreasing and thus upper bounded, which implies that $\{\mathbf{Z}^{h},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h}\}$ is also bounded. Now, summing inequality (29) over $h=1,2,\cdots$, we have

\begin{array}{l}\sum_{h=1}^{+\infty}\frac{\beta}{2}\Big(\|\mathbf{Z}^{h+1}-\mathbf{Z}^{h}\|_{F}^{2}+\|\mathbf{S}^{h+1}-\mathbf{S}^{h}\|_{F}^{2}\\ \qquad+\|\mathbf{C}^{h+1}-\mathbf{C}^{h}\|_{F}^{2}+\|\mathbf{D}^{h+1}-\mathbf{D}^{h}\|_{F}^{2}+\|\mathbf{E}^{h+1}-\mathbf{E}^{h}\|_{F}^{2}\Big)\\ \leq\mathfrak{L}(\mathbf{Z}^{0},\mathbf{S}^{0},\mathbf{C}^{0},\mathbf{D}^{0},\mathbf{E}^{0}). \end{array} \quad (30)

This implies that, as $h\rightarrow+\infty$,

\begin{array}{l}\mathbf{Z}^{h+1}-\mathbf{Z}^{h}\rightarrow 0,\\ \mathbf{S}^{h+1}-\mathbf{S}^{h}\rightarrow 0,\\ \mathbf{C}^{h+1}-\mathbf{C}^{h}\rightarrow 0,\\ \mathbf{D}^{h+1}-\mathbf{D}^{h}\rightarrow 0,\\ \mathbf{E}^{h+1}-\mathbf{E}^{h}\rightarrow 0.\end{array} \quad (31)

Moreover, according to its definition, $\mathfrak{L}(\mathbf{Z}^{h},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h})\geq 0$. Therefore, the convergence of Algorithm 1 is guaranteed and the sequence $\{\mathbf{Z}^{h},\mathbf{S}^{h},\mathbf{C}^{h},\mathbf{D}^{h},\mathbf{E}^{h}\}$ converges to a stationary point of Eq. (10).

IV-C Comparative analysis with related algorithms

We now discuss the relationships between IDR and some related algorithms.

IV-C1 Comparative analysis with membership representation (MR)

As mentioned in Section I, MR also proposes to learn a normalized membership matrix as the reconstruction coefficient matrix [32]. However, MR is a cascade model that consists of three steps:

Firstly, an initial coefficient matrix $\mathbf{W}$ is learned by using SSC or LRR.

Secondly, a membership matrix $\mathbf{M}$ is constructed by solving the following problem:

\begin{array}{ll}\min_{\mathbf{M}}&\|\mathbf{W}-\mathbf{W}\bigodot\mathbf{M}\|_{1}+\lambda\|\mathbf{M}\|_{F}^{2}\\ s.t.&diag(\mathbf{M})=\mathbf{1}_{n},\ \mathbf{M}\geq 0,\ \mathbf{M}\succeq\mathbf{0},\end{array} \quad (32)

where $\mathbf{M}\succeq\mathbf{0}$ requires $\mathbf{M}$ to be positive semi-definite and $\lambda>0$ is a positive parameter.

Thirdly, after $\mathbf{M}$ is obtained, a normalized membership matrix $\mathbf{Z}$ is achieved by optimizing the following problem:

\begin{array}{ll}\min_{\mathbf{Z}}&Tr(\mathbf{Z})\\ s.t.&\mathbf{1}_{n}^{\top}\mathbf{Z}=\mathbf{1}_{n}^{\top},\ \mathbf{Z}\geq 0,\ \mathbf{Z}\succeq\mathbf{0},\\ &\langle\mathbf{H},\mathbf{Z}\rangle\leq c,\end{array} \quad (33)

where $\mathbf{H}=\mathbf{1}_{n}\mathbf{1}_{n}^{\top}-\mathbf{M}$ and $c=\beta\|\mathbf{H}\|_{1}/n$, with $\beta>0$ a manually set constant. We can see that the symmetric constraint on $\mathbf{Z}$ is omitted in the above problem. Hence, the coefficient matrix found by MR may not be close to a normalized membership matrix.

Besides the computation needed for finding the initial coefficient matrix, Problems (32) and (33) also need to be solved by the ALM method. Clearly, MR is very time-consuming.

Additionally, the performance of MR depends on the learned initial coefficient matrices. The value of the parameter in SSC or LRR will influence its performance, and how to choose an initial coefficient matrix is not discussed in [32]. Moreover, the three hyper-parameters in MR make parameter tuning difficult.

IV-C2 Comparative analysis with doubly stochastic subspace clustering (DSSC) [38]

Based on the descriptions in Section III-A, it can be seen that the normalized membership matrix obtained by IDR is a special case of a doubly stochastic matrix. Recently, Lim et al. devised a doubly stochastic subspace clustering (DSSC) algorithm [38] which pursues a doubly stochastic coefficient matrix. The objective of DSSC can be expressed as follows:

\begin{array}{ll}\min_{\mathbf{Z},\mathbf{A}}&\gamma\|\mathbf{Z}\|_{1}+\lambda/2\|\mathbf{Z}-\eta\mathbf{A}\|_{F}^{2}+1/2\|\mathbf{X}-\mathbf{XZ}\|_{F}^{2}\\ s.t.&diag(\mathbf{Z})=0,\ \mathbf{A}\in\Psi_{n},\end{array} \quad (34)

where $\gamma,\lambda,\eta>0$ are three parameters and $\mathbf{A}\in\Psi_{n}$ means that $\mathbf{A}$ is an $n\times n$ doubly stochastic matrix, i.e., $\mathbf{A}$ satisfies the conditions in Eq. (7). By using two different strategies to solve the above problem, two models, joint DSSC (J-DSSC) and approximate DSSC (A-DSSC), are obtained. A-DSSC is a two-step algorithm which first uses LSR or EN to get an initial coefficient matrix and then computes a doubly stochastic matrix. The computational burden of J-DSSC, on the other hand, is high, because in each iteration of J-DSSC two intermediate $n\times n$ matrices have to be updated iteratively by the linearized alternating direction method of multipliers (ADMM) [39]. Moreover, DSSC has three hyper-parameters, which also makes parameter adjustment difficult.

IV-C3 Comparative analysis with self-representation constrained LRR (SRLRR) [40]

The idempotent constraint on the coefficient matrix was first proposed in our previous work [40]. SRLRR is defined by the following problem:

\begin{array}{ll}\min_{\mathbf{Z},\mathbf{E}}&\|\mathbf{Z}\|_{*}+\gamma\|\mathbf{Z}-\mathbf{Z}^{2}\|_{F}^{2}+\lambda\|\mathbf{E}\|_{2,1}\\ s.t.&\mathbf{X}=\mathbf{XZ}+\mathbf{E},\ \mathbf{1}_{n}^{\top}\mathbf{Z}=\mathbf{1}_{n}^{\top}.\end{array} \quad (35)

The main problem with SRLRR is that solid theoretical connections between $\mathbf{Z}$ and a normalized membership matrix were not established. The nuclear norm minimization and the affine constraint (i.e., $\mathbf{1}_{n}^{\top}\mathbf{Z}=\mathbf{1}_{n}^{\top}$) [41] in SRLRR are used to prevent $\mathbf{Z}$ from degenerating to $\mathbf{I}_{n}$ or $\mathbf{0}_{n\times n}$. This is quite different from IDR.

Based on these comparisons, we can see that the existing subspace clustering methods that also aim to seek normalized membership matrices or doubly stochastic matrices all rely on certain existing regularizers of $\mathbf{Z}$. IDR presents a substantially different way of tackling subspace clustering problems.

V Experiments

V-A Experiment setup

V-A1 Datasets

Both synthetic and real world data sets are used in our experiments to verify the effectiveness of IDR. Four benchmark databases, including the Hopkins 155 motion segmentation data set [42], the ORL face image database [43], the AR face image database [44] and the MNIST handwritten digit database (http://yann.lecun.com/exdb/mnist/), are used for evaluation.

V-A2 Comparison methods

Representative and closely related algorithms, including SSC [14], LRR [16, 10], LSR [17], BDR [18], MR [32] and DSSC [38], are used for comparison. We provide the Matlab codes of IDR, MR and DSSC at https://github.com/weilyshmtu/Learning-idempotent-representation-for-subspace-segmentation. The Matlab codes for SSC and LRR can be found at http://www.vision.jhu.edu/code/ and http://sites.google.com/site/guangcanliu/ respectively, and those of LSR and BDR at https://canyilu.github.io/code/. All the experiments are conducted on a Windows machine with an Intel i7-4790 CPU, 20 GB of memory and MATLAB R2017b.

V-A3 Parameters Setting

Because the values of the parameters influence the performance of the evaluated algorithms, for each compared method we tune all the parameters by following the suggestions in the corresponding references and retain those with the best performance on each data set. The parameter ranges searched for all algorithms are given in Table II. In particular, for MR, when SSC or LRR is used to obtain the initial coefficient matrix, the parameter of those two algorithms is chosen from $[0.001,0.01,0.1,1,5,10]$ according to the description in [32].

TABLE II: Parameters searched over for different methods.
Methods | Parameters
SSC | $\lambda\in\{0.0001,0.001,0.01,0.1,1,10,20,50,100,200,500,600,800,1000\}$
LRR | $\lambda\in\{0.0001,0.001,0.01,0.05,0.1,0.2,0.5,1,2,5,8,10,15,20,50\}$
LSR | $\lambda\in\{0.0001,0.001,0.01,0.05,0.1,0.2,0.5,1,2,5,8,10,15,20,50\}$
BDR | $\lambda\in\{0.001,0.01,0.05,0.1,0.2,0.5,1,2,3,5,8,10,15,20,50\}$, $\gamma\in\{0.1,1,10,20,30,40,50,60,70,80\}$
MR | $\lambda\in\{0.001,0.01,0.1,1,5,10\}$, $\beta\in\{0.01,0.1,1,5,10\}$
DSSC (J-DSSC) | $\lambda\in\{0.01,0.25,1,25\}$, $\eta\in\{0.01,0.05,0.1,0.2\}$, $\gamma\in\{0,0.01\}$
DSSC (A-DSSC) | $\lambda\in\{0.1,1,10,25,50\}$, $\eta\in\{0.0005,0.001,0.01,0.025,0.05,0.1\}$, $\gamma=0$
IDR | $\lambda,\gamma\in\{0.001,0.005,0.01,0.02,0.05,0.1,0.2,0.5,1,5,10,50,100,200\}$

V-A4 Evaluation metrics

For all the evaluated algorithms, we use the obtained coefficient matrices to construct the affinity graphs without any post-processing. For performance evaluation, we use the segmentation accuracy (SA) or segmentation error (SE), defined as follows:

\begin{array}{l}SA=\frac{1}{n}\sum_{i=1}^{n}\delta(s_{i},f(t_{i})),\\ SE=1-SA,\end{array} \quad (36)

where $s_{i}$ and $t_{i}$ represent the ground truth and the output label of the $i$-th point respectively, $\delta(x,y)=1$ if $x=y$ and $\delta(x,y)=0$ otherwise, and $f(\cdot)$ is the best mapping function that permutes the clustering labels to match the ground truth labels.
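The best mapping $f(\cdot)$ can be computed with the Hungarian algorithm; a sketch (using SciPy's linear_sum_assignment, with our own function name) is:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def segmentation_accuracy(truth, pred):
    """SA of Eq. (36): find the label permutation f via the Hungarian algorithm.
    A sketch; the released Matlab code may implement the mapping differently."""
    truth, pred = np.asarray(truth), np.asarray(pred)
    classes_t, classes_p = np.unique(truth), np.unique(pred)
    # cost[i, j] = number of points with true label classes_t[i] and predicted label classes_p[j]
    cost = np.array([[np.sum((truth == t) & (pred == p)) for p in classes_p]
                     for t in classes_t])
    row, col = linear_sum_assignment(-cost)          # maximize the number of matched points
    return cost[row, col].sum() / truth.size

print(segmentation_accuracy([0, 0, 1, 1, 2], [2, 2, 0, 0, 1]))  # 1.0
```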

V-B Experiments on a synthetic data set

We generate 5 subspaces, each of dimension $d=5$, in an ambient space of dimension $D=20$. We sample 50 data points from each subspace and construct a $D\times 250$ data matrix $\mathbf{X}$. Moreover, a certain percentage $p=0\%$ to $100\%$ of the data vectors are chosen randomly and corrupted by Gaussian noise with zero mean and variance $0.3\|\mathbf{x}\|_{2}^{2}$. Finally, the evaluated algorithms are used to segment the data into 5 subspaces. For each $p$, the experiment is repeated for 20 trials, so there are 220 subspace clustering tasks in total.
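A sketch of how such a data set can be generated (the column normalization and random-seed handling are our own choices) is:

```python
import numpy as np

def synthetic_union_of_subspaces(k=5, d=5, D=20, per_sub=50, p=0.3, rng=None):
    """Sample points from k random d-dimensional subspaces of R^D and corrupt a
    fraction p of the columns with Gaussian noise of variance 0.3 * ||x||_2^2."""
    rng = np.random.default_rng(rng)
    blocks, labels = [], []
    for i in range(k):
        basis, _ = np.linalg.qr(rng.standard_normal((D, d)))   # orthonormal basis of subspace i
        blocks.append(basis @ rng.standard_normal((d, per_sub)))
        labels += [i] * per_sub
    X = np.hstack(blocks)
    X /= np.linalg.norm(X, axis=0, keepdims=True)              # unit l2-norm columns
    n = X.shape[1]
    corrupt = rng.choice(n, size=int(p * n), replace=False)
    noise_std = np.sqrt(0.3) * np.linalg.norm(X[:, corrupt], axis=0)
    X[:, corrupt] += rng.standard_normal((D, corrupt.size)) * noise_std
    return X, np.array(labels)

X, labels = synthetic_union_of_subspaces(p=0.1, rng=0)
print(X.shape)   # (20, 250)
```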

Similar experiments can be found in some existing references [16, 21]. In those experiments, however, the parameters of each evaluated algorithm are fixed when the algorithm is run on every sub-database; by changing the parameter(s), the best results under certain parameter(s) are then selected. However, performing subspace clustering on each sub-database should be viewed as a separate segmentation task. In our experiments, we therefore let the parameter(s) of each algorithm vary over the corresponding sets in Table II and record the highest segmentation accuracy of each evaluated algorithm on each sub-database. The means of these highest segmentation accuracies (averaged over 20 random trials) versus the percentage of corruption are reported in Fig. 2.

In addition, IDR and BDR each produce two coefficient matrices that can be used to compute the clustering results. By using different methods (SSC and LRR) to construct the initial coefficient matrices, MR obtains two different results. Based on different strategies, DSSC has two sub-models, namely J-DSSC and A-DSSC. Hence, in Fig. 2(a) we plot, for each algorithm, the better of its two corresponding results. The detailed segmentation accuracies of IDR and BDR obtained from their two different coefficient matrices, the results of MR based on SSC and LRR (denoted as MR-SSC and MR-LRR), as well as those of J-DSSC and A-DSSC, are plotted in Fig. 2(b).

Figure 2: The segmentation accuracies of each method versus the percentage of corruption. (a) The best results of the evaluated algorithms; (b) the detailed results of IDR, BDR, MR and DSSC, where BDR-Z denotes the results obtained by the reconstruction coefficient matrix $\mathbf{Z}$ and BDR-B indicates the results obtained by the intermediate matrix introduced in the BDR problem.

From Fig. 2(a), we can see that 1) IDR consistently achieves the best results; 2) the performances of LRR, LSR, MR and DSSC are close to each other when the percentage of corruption is smaller than $50\%$; 3) when the percentage of corruption is larger than $50\%$, MR dominates LRR, LSR and DSSC; 4) SSC is inferior to the other algorithms.

From Fig. 2(b), it can be seen that 1) the results obtained by the two different coefficient matrices of IDR and of BDR are close to each other; 2) the performances of J-DSSC and A-DSSC are also similar; 3) however, the results of MR-LRR are much better than those of MR-SSC, which means that the performance of MR relies on the initial coefficient matrices.

In order to show the sensitivity of IDR to its two parameters $\gamma$ and $\lambda$, we report how the segmentation accuracy of IDR changes with the values of the parameters. The sub-databases with $p=10\%,50\%,90\%$ are used. The mean segmentation accuracies over the pairs of parameters are illustrated in Fig. 3.

Figure 3: The segmentation accuracy of IDR against the variation of its parameters. The vertical axis in each sub-figure denotes the subspace clustering accuracy. The three sub-figures in the left column show the segmentation accuracies obtained by $\mathbf{Z}$, and the three sub-figures in the right column record the segmentation accuracies obtained by $\mathbf{S}$. The first, second and third rows present the segmentation accuracies of IDR on the sub-databases with corruption percentages of $10\%,50\%,90\%$ respectively.

From Fig. 3, we can see that 1) the performance of IDR is stable when the parameters vary within relatively large intervals; 2) when the corruption percentage is low, IDR is insensitive to $\gamma$, whereas when the corruption percentage is high, small $\gamma$ and $\lambda$ help IDR achieve good results. We believe that when a data set is clean, a normalized-membership reconstruction coefficient matrix is easy to obtain, so the idempotent constraint can also be satisfied. However, when most data samples in the data set are corrupted, the normalized-membership reconstruction coefficient matrix is difficult to obtain. Hence, in such situations, the corresponding parameter $\gamma$ should be small.

V-C Experiments on Hopkins 155 data set

The Hopkins 155 database is a well-known benchmark for testing the performance of subspace clustering algorithms. It consists of 120 sequences with two motions and 35 sequences with three motions. Each sequence is a separate clustering task, so there are 155 clustering tasks in total. The feature points of each sequence are extracted and tracked along the motion in all frames, and errors are manually removed for each sequence. Sample images from the Hopkins 155 database are shown in Fig. 4.

Figure 4: Sample images from the Hopkins 155 database.

We perform the experiments both on the original data matrices and on data projected into a $4s$-dimensional subspace (where $s$ is the number of subspaces) by principal component analysis (PCA) [36]. The segmentation error (i.e., $SE=1-SA$) of each evaluated algorithm is then computed on each sequence.

Firstly, we collect the best results achieved by each algorithm on the 155 sequences with its parameters varying over the given sets. The mean, median and std. (standard deviation) of the results are reported in Tables III and IV. From the two tables, we can see that 1) IDR achieves the best results in these experiments; 2) BDR and LSR also achieve competitive results; 3) MR-LRR and MR-SSC do not outperform their corresponding classical methods LRR and SSC, which means that the post-processing of the coefficient matrices in MR may not always enhance the performance of LRR and SSC; 4) JDSSC fails to achieve satisfying results.

TABLE III: The segmentation errors (%) and average computation time (sec.) of different algorithms on the Hopkins 155 database with the original data points. The best results of these algorithms are emphasized in bold.
Methods | Avg. time (sec.) | 2 motions (mean / median / std.) | 3 motions (mean / median / std.) | All motions (mean / median / std.)
IDR-Z | 9.15 | 0.25 / 0 / 1.15 | 1.14 / 0.20 / 2.11 | 0.45 / 0 / 1.47
IDR-S | - | 0.50 / 0 / 1.89 | 2.23 / 0.56 / 3.49 | 0.89 / 0 / 2.44
SSC | 2.98 | 1.66 / 0 / 5.13 | 5.29 / 1.46 / 7.35 | 2.48 / 0 / 5.88
LRR | 9.378 | 1.15 / 0 / 3.19 | 4.17 / 1.20 / 5.99 | 1.83 / 0 / 4.17
LSR | 0.03 | 0.56 / 0 / 2.18 | 1.94 / 0.21 / 4.12 | 0.87 / 0 / 2.79
BDR-B | 5.50 | 0.58 / 0 / 2.78 | 2.72 / 0 / 4.73 | 1.06 / 0 / 3.42
BDR-Z | - | 0.6 / 0 / 2.76 | 2.77 / 0 / 5.10 | 1.09 / 0 / 3.53
MR-SSC | 41.15 | 2.71 / 0 / 6.56 | 9.22 / 6.05 / 9.18 | 4.33 / 0.21 / 7.78
MR-LRR | 43.29 | 1.39 / 0 / 3.95 | 6.52 / 2.85 / 6.82 | 2.66 / 0 / 5.28
JDSSC | 16.29 | 12.51 / 11.45 / 10.54 | 24.09 / 25.06 / 11.62 | 15.13 / 14.48 / 11.8
ADSSC | 0.07 | 2.42 / 0 / 5.67 | 8.74 / 5.37 / 9.65 | 3.85 / 0 / 7.24
(A "-" in the time column means that the time is shared with the preceding row, since both coefficient matrices are produced by a single run.)
TABLE IV: The segmentation errors (%) and average computation time (sec.) of different algorithms on the Hopkins 155 database with the $4s$-dimensional data points obtained by applying PCA. The best results of these algorithms are emphasized in bold.
Methods | Avg. time (sec.) | 2 motions (mean / median / std.) | 3 motions (mean / median / std.) | All motions (mean / median / std.)
IDR-Z | 9.43 | 0.30 / 0 / 1.24 | 1.20 / 0 / 2.30 | 0.50 / 0 / 1.58
IDR-S | - | 0.49 / 0 / 1.76 | 2.16 / 0.56 / 3.27 | 0.86 / 0 / 2.29
SSC | 2.29 | 1.66 / 0 / 5.13 | 5.29 / 1.46 / 7.35 | 2.48 / 0 / 5.88
LRR | 10.27 | 1.15 / 0 / 3.19 | 4.17 / 1.20 / 5.99 | 1.83 / 0 / 4.17
LSR | 0.03 | 0.56 / 0 / 2.18 | 1.94 / 0.21 / 4.12 | 0.87 / 0 / 2.79
BDR-Z | 4.41 | 0.65 / 0 / 2.82 | 2.91 / 0 / 5.12 | 1.16 / 0 / 3.59
BDR-B | - | 0.65 / 0 / 2.90 | 2.68 / 0 / 5.03 | 1.11 / 0 / 3.58
MR-SSC | 39.89 | 4.62 / 0 / 8.22 | 10.44 / 10.47 / 8.59 | 6.06 / 0.47 / 8.65
MR-LRR | 44.47 | 1.39 / 0 / 3.93 | 6.72 / 2.85 / 6.93 | 2.71 / 0 / 5.34
JDSSC | 14.18 | 12.51 / 11.45 / 10.54 | 24.09 / 25.06 / 11.62 | 15.13 / 14.48 / 11.8
ADSSC | 0.07 | 2.42 / 0 / 5.67 | 8.76 / 5.37 / 9.65 | 3.85 / 0 / 7.24
(A "-" in the time column means that the time is shared with the preceding row, since both coefficient matrices are produced by a single run.)

Moreover, we report the average computation time of each algorithm on the 155 motion sequences in Tables III and IV. Clearly, LSR and ADSSC are much more efficient than the other algorithms. The average computation time of IDR is close to that of LRR, hence the computation burden of IDR is acceptable. In contrast, MR is rather time-consuming.

Secondly, we analyse the experimental results in another way. For each algorithm, Fig. 5 presents the percentage of motion sequences whose SEs are less than or equal to a given segmentation error threshold. We can see that the segmentation errors obtained by IDR-Z on all motions are less than 0.2.
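The curves in Fig. 5 can be reproduced from the per-sequence errors with a few lines; the sketch below (NumPy only; names are illustrative) computes, for each error threshold, the percentage of sequences whose SE does not exceed it.

```python
import numpy as np

def error_cdf(errors, thresholds):
    """Percentage of sequences whose segmentation error is <= each threshold."""
    errors = np.asarray(errors)
    return np.array([100.0 * np.mean(errors <= t) for t in thresholds])

# illustrative usage: `se_per_sequence` holds one method's 155 segmentation errors
# curve = error_cdf(se_per_sequence, np.linspace(0.0, 0.5, 51))   # plotted as in Fig. 5
```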

Figure 5: The percentage of motions versus the variation of segmentation error for each method. (a) The results obtained on the original data samples, (b) The results obtained on the projected data samples.

Finally, we test the sensitivity of IDR to its parameters on the Hopkins 155 database. For a fixed pair (γ, λ), we compute the segmentation errors of all 155 segmentation tasks and record their mean. By varying the values of γ and λ, we illustrate the performance of IDR against its parameters in Fig. 6.

Figure 6: The segmentation error of IDR against the variation of its parameters on the Hopkins 155 data set. Different from the experiments conducted on synthetic data sets, the vertical axis of each sub-figure here denotes the subspace clustering error. The two sub-figures in the left column show the segmentation errors obtained by 𝐙 as the parameters change, and the two sub-figures in the right column show those obtained by 𝐒. The first and second rows present the segmentation errors of IDR on the original data sets and the projected data sets respectively.

Based on Fig. 6, we can still see that IDR is insensitive to its parameters and that small values of (γ, λ) help IDR achieve better results.
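A minimal sketch of this sensitivity test is given below; it assumes hypothetical helpers run_idr(X, gamma, lam) (returning a coefficient matrix) and cluster_and_score(Z, labels) (returning the segmentation error of the resulting clustering), and the candidate grids are only illustrative.

```python
import numpy as np

GAMMAS  = [0.001, 0.01, 0.1, 1.0, 10.0]   # illustrative candidate values
LAMBDAS = [0.001, 0.01, 0.1, 1.0, 10.0]

def sensitivity_grid(sequences, run_idr, cluster_and_score):
    """Mean segmentation error over all sequences for every (gamma, lambda) pair."""
    grid = np.zeros((len(GAMMAS), len(LAMBDAS)))
    for i, gamma in enumerate(GAMMAS):
        for j, lam in enumerate(LAMBDAS):
            errors = [cluster_and_score(run_idr(X, gamma, lam), y) for X, y in sequences]
            grid[i, j] = np.mean(errors)      # surface plotted as in Fig. 6
    return grid
```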

V-D Experiments on face image databases

We now perform experiments on two benchmark face image databases, i.e., the ORL database [43] and the AR database [44]. Brief information about the two databases is given as follows:

The ORL database contains 400 face images (without noise) of 40 persons, with 10 different images per individual. These images were taken at different times, with varying lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). In our experiments, each image is resized to 32×32 pixels.

The AR database consists of over 4000 face images of 126 individuals. For each individual, 26 pictures were taken in two sessions (separated by two weeks), with 13 images per session. The images include frontal views of faces with different expressions, illuminations and occlusions. In our experiments, each image is resized to 50×40 pixels.

Moreover, the pixel values of the images in both databases lie in [0, 255]. For efficient computation, we divide each pixel value by 255 so that the pixel values of each image fall into [0, 1]. This does not change the distribution of the original data sets. Some sample images from the ORL and AR databases are shown in Fig. 7.
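A small sketch of this preprocessing step (assuming Pillow is available; the path and target size are illustrative, e.g. 32×32 for ORL and 50×40 for AR):

```python
import numpy as np
from PIL import Image

def load_face(path, size=(32, 32)):
    """Load a grayscale face image, resize it, and scale its pixels to [0, 1]."""
    img = Image.open(path).convert("L").resize(size)        # size = (width, height)
    return np.asarray(img, dtype=np.float64).ravel() / 255.0
```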

Figure 7: Sample images from ORL database and AR database.

We first randomly choose the images of q persons from each of the two databases. For the ORL database, q ∈ {6, 12, 18, 24, 30, 36}, and for the AR database, q ∈ {4, 8, 12, 16, 20}. The performances of the evaluated methods are then tested on these sub-databases. With the parameters varying, the highest clustering accuracy obtained by each algorithm on each sub-database is collected. These experiments are repeated for 10 trials, and the mean and standard deviation of the SAs obtained by each algorithm are reported in Table V and Table VI respectively.
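This sampling protocol can be sketched as follows (NumPy only; evaluate is a placeholder for running one algorithm on a sub-database and returning its best segmentation accuracy):

```python
import numpy as np

def random_subdatabase(X, labels, q, rng):
    """Keep all images of q randomly chosen persons (columns of X are samples)."""
    persons = rng.choice(np.unique(labels), size=q, replace=False)
    mask = np.isin(labels, persons)
    return X[:, mask], labels[mask]

def repeated_trials(X, labels, q, evaluate, n_trials=10, seed=0):
    """Mean and standard deviation of accuracies over n_trials random sub-databases."""
    rng = np.random.default_rng(seed)
    accs = [evaluate(*random_subdatabase(X, labels, q, rng)) for _ in range(n_trials)]
    return float(np.mean(accs)), float(np.std(accs))
```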

TABLE V: Segmentation accuracies (%) ± standard deviations of the evaluated algorithms on the sub-databases of the ORL database. The best results (mean) of the algorithms are emphasized in bold.

Methods | 6 persons | 12 persons | 18 persons | 24 persons | 30 persons | 36 persons
IDR-Z | 92.33 ± 7.42 | 87.08 ± 3.91 | 87.28 ± 4.21 | 83.54 ± 2.56 | 82.40 ± 1.74 | **81.89 ± 0.87**
IDR-S | **94.17 ± 6.35** | **89.50 ± 1.63** | 87.39 ± 2.66 | 83.67 ± 2.28 | **83.03 ± 1.37** | 81.81 ± 0.85
SSC | 89.83 ± 7.22 | 80.25 ± 5.81 | 80.44 ± 5.12 | 79.00 ± 2.57 | 77.93 ± 2.88 | 77.36 ± 1.89
LRR | 89.00 ± 7.04 | 82.50 ± 5.61 | 84.67 ± 4.08 | 80.71 ± 2.88 | 79.43 ± 1.85 | 77.75 ± 1.26
LSR | 91.67 ± 5.88 | 86.08 ± 3.95 | 86.33 ± 3.47 | 82.92 ± 3.11 | 81.60 ± 1.14 | 80.69 ± 1.52
BDR-Z | 90.50 ± 7.20 | 85.67 ± 5.03 | 86.89 ± 3.49 | 83.38 ± 2.56 | 80.53 ± 1.31 | 80.36 ± 1.09
BDR-B | 91.83 ± 7.13 | 86.08 ± 2.97 | **88.28 ± 3.22** | 84.17 ± 2.76 | 81.20 ± 2.66 | 80.19 ± 0.67
MR-SSC | 91.67 ± 6.67 | 87.17 ± 6.59 | 86.00 ± 3.48 | **84.46 ± 2.22** | 82.90 ± 1.67 | 80.75 ± 1.06
MR-LRR | 85.00 ± 6.57 | 86.08 ± 3.51 | 84.94 ± 3.98 | 81.71 ± 3.54 | 78.23 ± 1.71 | 78.92 ± 0.55
JDSSC | 88.00 ± 7.93 | 87.83 ± 5.14 | 86.83 ± 3.46 | 81.75 ± 2.23 | 77.83 ± 1.89 | 77.36 ± 1.08
ADSSC | 91.17 ± 6.29 | 85.50 ± 4.01 | 85.83 ± 3.49 | 82.58 ± 2.25 | 81.10 ± 1.34 | 80.03 ± 0.79
TABLE VI: Segmentation accuracies (%) ± standard deviations of the evaluated algorithms on the sub-databases of the AR database. The best results (mean) of the algorithms are emphasized in bold.

Methods | 4 persons | 8 persons | 12 persons | 16 persons | 20 persons
IDR-Z | 90.29 ± 9.44 | 88.03 ± 7.90 | 88.40 ± 4.79 | 89.59 ± 2.38 | 87.65 ± 4.16
IDR-S | **91.44 ± 8.45** | **94.18 ± 4.67** | **94.04 ± 3.89** | **93.08 ± 3.22** | **90.54 ± 3.53**
SSC | 88.27 ± 11.28 | 82.69 ± 9.03 | 81.92 ± 6.63 | 79.18 ± 4.04 | 79.25 ± 4.33
LRR | 86.63 ± 10.22 | 85.53 ± 6.89 | 87.98 ± 6.12 | 87.28 ± 3.63 | 86.27 ± 3.04
LSR | 83.65 ± 12.22 | 85.10 ± 7.31 | 89.78 ± 4.66 | 86.56 ± 3.25 | 86.04 ± 3.98
BDR-Z | 85.29 ± 13.40 | 83.51 ± 6.03 | 85.83 ± 6.43 | 86.13 ± 3.94 | 83.35 ± 2.77
BDR-B | 88.17 ± 11.00 | 85.67 ± 7.28 | 84.65 ± 5.32 | 82.74 ± 3.77 | 81.58 ± 4.32
MR-SSC | 88.17 ± 13.58 | 84.23 ± 7.93 | 82.95 ± 7.64 | 82.12 ± 4.20 | 79.71 ± 3.47
MR-LRR | 84.71 ± 14.03 | 81.15 ± 4.57 | 86.25 ± 7.23 | 88.51 ± 3.67 | 86.11 ± 6.68
JDSSC | 66.83 ± 19.56 | 68.87 ± 7.67 | 76.60 ± 6.83 | 80.65 ± 2.13 | 76.73 ± 4.30
ADSSC | 83.41 ± 12.86 | 80.17 ± 5.72 | 83.41 ± 1.72 | 83.53 ± 1.67 | 81.49 ± 4.03

Clearly, the two tables show that in most cases IDR outperforms the other algorithms on the two databases. Especially on the AR database, IDR obtains much better results than the other evaluated algorithms.

We also compare the computation time of all the evaluated algorithms. For each face image database and each fixed q (number of persons), we compute the average computation time of each algorithm on the corresponding sub-databases. The computation time of each algorithm as a function of q is illustrated in Fig. 8. Similar to the results obtained on the Hopkins 155 database, the computation time of IDR is acceptable: when q is relatively small, the computation cost of IDR is close to that of LRR, and when q is relatively large, IDR is more efficient than LRR. In contrast, JDSSC spends much more time than the other algorithms.
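The timing comparison can be reproduced with a simple wall-clock measurement; the sketch below (names are illustrative) averages a solver's running time over the sub-databases of a fixed q.

```python
import time
import numpy as np

def average_runtime(solver, subdatabases):
    """Average wall-clock time (sec.) of `solver` over a list of (X, labels) pairs."""
    elapsed = []
    for X, _ in subdatabases:
        start = time.perf_counter()
        solver(X)                                # e.g. one coefficient-matrix solver
        elapsed.append(time.perf_counter() - start)
    return float(np.mean(elapsed))
```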

Figure 8: The average computation time (sec.) of each algorithm versus the number of persons. (a) the results obtained on ORL database, (b) the results obtained on AR database.

Finally, we test the convergence of IDR by using all the samples in the ORL database. The residuals of the three variables 𝐙, 𝐒, 𝐄 in Eq. (10) are defined as residualZ = ‖𝐙^h − 𝐙^{h+1}‖_F^2, residualS = ‖𝐒^h − 𝐒^{h+1}‖_F^2, residualE = ‖𝐄^h − 𝐄^{h+1}‖_F^2. Fig. 9 plots these residuals versus the number of iterations. It can be seen that the variables 𝐙, 𝐒, 𝐄 converge to a stationary point within a relatively small number of iterations of Algorithm 1, and when the number of iterations exceeds 200, the residuals are close to 0.
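The residual curves in Fig. 9 can be computed directly from the iterates of Algorithm 1; a minimal sketch (NumPy only; `history` is assumed to be a list of (Z, S, E) tuples collected during the iterations) is shown below.

```python
import numpy as np

def iteration_residuals(history):
    """Squared Frobenius-norm residuals between consecutive iterates of Z, S and E."""
    res = {"Z": [], "S": [], "E": []}
    for (Z0, S0, E0), (Z1, S1, E1) in zip(history[:-1], history[1:]):
        res["Z"].append(np.linalg.norm(Z0 - Z1, "fro") ** 2)
        res["S"].append(np.linalg.norm(S0 - S1, "fro") ** 2)
        res["E"].append(np.linalg.norm(E0 - E1, "fro") ** 2)
    return res
```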

Figure 9: The residuals of variables 𝐙,𝐒,𝐄\mathbf{Z},\mathbf{S},\mathbf{E} versus the iterations on the whole ORL database.

The performances of the evaluated algorithms on the whole ORL database are reported in Table VII. We can see that IDR still achieves the best results. Table VII also reports the average computation time of each algorithm over the different parameter settings. Moreover, the sensitivity of IDR to its parameters is illustrated in Fig. 10, which again shows that IDR is stable and obtains good results when γ is relatively small.

TABLE VII: The segmentation accuracies (%) and average computation time (sec.) of the evaluated algorithms on the whole ORL database. The best results of the algorithms are emphasized in bold.

Methods | Segmentation accuracy (%) | Average time (sec.)
IDR-Z | **81.50** | 36.19
IDR-S | 80.75 | 36.19
SSC | 78.00 | 16.23
LRR | 76.00 | 58.55
LSR | 78.25 | **0.06**
BDR-Z | 79.50 | 26.54
BDR-B | 81.25 | 26.54
MR-SSC | 80.25 | 113.84
MR-LRR | 75.25 | 136.78
JDSSC | 73.75 | 88.43
ADSSC | 78.25 | 0.11
Figure 10: The segmentation accuracies of IDR against the variation of parameters on the whole ORL database.

V-E Experiments on MNIST data set

The MNIST database has 10 subjects corresponding to the 10 handwritten digits ‘0’–‘9’. We first select the first 100 samples of each subject’s training set to form a sub MNIST database, and each image is resized to 28×28 pixels. Some sample images from the database are illustrated in Fig. 11.

Figure 11: Sample images from MNIST database.

We then follow methodologies similar to those used in the above experiments. Here, we randomly choose the images of {2, 4, 6, 8, 10} digits from the sub MNIST database to build sub-databases. We run the experiments for 10 trials and record the mean and standard deviation of the segmentation accuracies obtained by each algorithm in Table VIII.

TABLE VIII: Segmentation accuracies (%) ± standard deviations of the evaluated algorithms on the sub-databases of the MNIST database. The best results (mean) of the algorithms are emphasized in bold.

Methods | 2 digits | 4 digits | 6 digits | 8 digits | 10 digits
IDR-Z | 98.45 ± 1.99 | 84.92 ± 7.37 | 73.67 ± 4.25 | **71.26 ± 4.13** | 66.65 ± 2.38
IDR-S | **99.35 ± 0.91** | **85.60 ± 9.47** | **74.58 ± 4.03** | 70.97 ± 4.35 | **68.00 ± 2.09**
SSC | 97.05 ± 2.03 | 77.15 ± 10.67 | 67.83 ± 3.48 | 64.56 ± 4.08 | 62.71 ± 2.15
LRR | 96.80 ± 2.54 | 82.50 ± 7.36 | 68.38 ± 3.70 | 64.46 ± 4.10 | 61.02 ± 2.09
LSR | 94.70 ± 5.42 | 77.05 ± 7.54 | 66.73 ± 4.67 | 61.29 ± 5.50 | 56.55 ± 2.53
BDR-Z | 95.15 ± 5.29 | 76.35 ± 6.55 | 67.87 ± 4.85 | 60.78 ± 3.41 | 57.79 ± 2.11
BDR-B | 93.20 ± 6.33 | 74.78 ± 7.56 | 64.72 ± 6.83 | 56.92 ± 3.47 | 54.21 ± 1.65
MR-SSC | 97.35 ± 1.73 | 76.72 ± 9.17 | 67.33 ± 2.91 | 63.02 ± 2.66 | 59.54 ± 1.65
MR-LRR | 96.25 ± 2.76 | 77.40 ± 9.33 | 66.25 ± 3.27 | 61.46 ± 3.51 | 57.67 ± 2.98
JDSSC | 97.75 ± 1.57 | 77.38 ± 13.53 | 68.73 ± 5.14 | 63.96 ± 4.33 | 62.15 ± 1.75
ADSSC | 95.15 ± 4.74 | 76.13 ± 6.96 | 65.33 ± 4.78 | 60.24 ± 3.93 | 56.87 ± 2.79

From Table VIII, we find that IDR still dominates the other algorithms; in fact, IDR achieves much better results than the other algorithms. In addition, the performances of the other algorithms are close to each other.

Moreover, we plot the average computation time of each algorithm against the number of digits in Fig. 12(a), and show how the performances of IDR-Z and IDR-S change with the values of the parameters γ and λ in Fig. 12(b) and Fig. 12(c) respectively. For this sensitivity visualization, we use the 10 sub-databases containing all 10 digits.

Figure 12: (a) The average computation time (sec.) of each algorithm versus the number of digits. (b) The segmentation accuracies of IDR-Z against the variation of parameters. (c) The segmentation accuracies of IDR-S against the variation of parameters.

From Fig. 12, we can conclude that 1) the computation time of IDR is much less than that of MR and JDSSC; 2) the computation costs of MR-SSC and MR-LRR are much larger than those of the other algorithms; 3) IDR achieves better results with small γ and λ, which coincides with the experiments presented above.

Based on all the above experiments, we can draw the following conclusions: 1) IDR obtains satisfying subspace clustering results on different kinds of databases; 2) compared with closely related algorithms such as MR and DSSC, the computation cost of IDR is acceptable; 3) IDR is insensitive to its two parameters, though relatively small parameter values help IDR achieve better results.

VI Conclusions

Spectral-type subspace clustering algorithms show excellent performances in subspace clustering tasks. Classical spectral-type methods impose different norms on the reconstruction coefficient matrices in the hope of obtaining coefficient matrices satisfying intra-subspace connectivity and inter-subspace sparsity. In this paper, we design an idempotent constraint for reconstruction coefficient matrices based on the proposition that reconstruction coefficient vectors also obey the self-expressiveness property. By integrating doubly stochastic constraints, we present an idempotent representation (IDR) method for subspace clustering. Subspace clustering experiments conducted on both synthetic and real world data sets verify the effectiveness and efficiency of IDR.

References

  • [1] L. Parsons, E. Haque, and H. Liu, “Subspace clustering for high dimensional data: A review,” SIGKDD Explor. Newsl., vol. 6, no. 1, pp. 90–105, 2004.
  • [2] R. Vidal, “Subspace clustering,” IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 52–68, 2011.
  • [3] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, “Automatic subspace clustering of high dimensional data for data mining applications,” in SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA, L. M. Haas and A. Tiwary, Eds.   ACM Press, 1998, pp. 94–105.
  • [4] H. Kriegel, P. Kröger, and A. Zimek, “Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering,” ACM Trans. Knowl. Discov. Data, vol. 3, no. 1, pp. 1:1–1:58, 2009.
  • [5] R. Vidal and P. Favaro, “Low rank subspace clustering (lrsc),” Pattern Recognition Letters, vol. 43, pp. 47–61, 2014.
  • [6] Y. Chu, Y. Chen, D. Yang, and M. Chen, “Reducing redundancy in subspace clustering,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 10, pp. 1432–1446, 2009.
  • [7] S. Yi, Z. He, Y. Cheung, and W. Chen, “Unified sparse subspace learning via self-contained regression,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 10, pp. 2537–2550, 2018.
  • [8] Y. Ma, H. Derksen, W. Hong, and J. Wright, “Segmentation of multivariate mixed data via lossy data coding and compression,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 9, pp. 1546–1562, 2007.
  • [9] E. Elhamifar and R. Vidal, “Sparse subspace clustering: algorithm, theory, and applications,” IEEE Trans Pattern Anal Mach Intell, vol. 35, no. 11, pp. 2765–81, 2013.
  • [10] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
  • [11] K. Kanatani, “Motion segmentation by subspace separation and model selection,” in Proceedings of the Eighth International Conference On Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7-14, 2001 - Volume 2.   IEEE Computer Society, 2001, pp. 586–591.
  • [12] Y. Ma, A. Y. Yang, H. Derksen, and R. M. Fossum, “Estimation of subspace arrangements with applications in modeling and segmenting mixed data,” SIAM Review, vol. 50, no. 3, pp. 413–458, 2008.
  • [13] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, 2000.
  • [14] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, Florida, USA, Jun. 2009, pp. 2790–2797.
  • [15] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
  • [16] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-rank representation,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010,, Haifa, Israel, jun 2010, pp. 663–670.
  • [17] C.-Y. Lu, H. Min, Z.-Q. Zhao, L. Zhu, D.-S. Huang, and S. Yan, “Robust and efficient subspace segmentation via least squares regression,” in Proceedings of the 12th European Conference on Computer Vision, ECCV 2012, Florence, Italy, 2012, pp. 347–360.
  • [18] C. Lu, J. Feng, Z. Lin, T. Mei, and S. Yan, “Subspace clustering by block diagonal representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 487–501, 2019.
  • [19] H. Zhang, J. Yang, F. Shang, C. Gong, and Z. Zhang, “LRR for subspace segmentation via tractable Schatten-p norm minimization and factorization,” IEEE Transactions on Cybernetics, pp. 1–13, 2018.
  • [20] J. Xu, M. Yu, L. Shao, W. Zuo, D. Meng, L. Zhang, and D. Zhang, “Scaled simplex representation for subspace clustering,” IEEE Transactions on Cybernetics, 2019.
  • [21] C.-G. Li, C. You, and R. Vidal, “Structured sparse subspace clustering: A joint affinity learning and subspace clustering framework,” IEEE Trans Image Process, vol. 26, no. 6, pp. 2988–3001, 2017.
  • [22] Y. Panagakis and C. Kotropoulos, “Elastic net subspace clustering applied to pop/rock music structure analysis,” Pattern Recognition Letters, vol. 38, pp. 46–53, 2014.
  • [23] C. You, C. Li, D. P. Robinson, and R. Vidal, “Oracle based active set algorithm for scalable elastic net subspace clustering,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016.   IEEE Computer Society, 2016, pp. 3928–3937.
  • [24] L. Zhuang, S. Gao, J. Tang, J. Wang, Z. Lin, Y. Ma, and N. Yu, “Constructing a nonnegative low-rank and sparse graph with data-adaptive features,” IEEE Trans. Image Processing, vol. 24, no. 11, pp. 3717–3728, 2015.
  • [25] K. Tang, R. Liu, and J. Zhang, “Structure-constrained low-rank representation,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 12, pp. 2167–2179, 2014.
  • [26] X. Lu, Y. Wang, and Y. Yuan, “Graph-regularized low-rank representation for destriping of hyperspectral images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 7-1, pp. 4009–4018, 2013.
  • [27] K. Tang, D. B. Dunson, Z. Su, R. Liu, J. Zhang, and J. Dong, “Subspace segmentation by dense block and sparse representation,” Neural Network, vol. 75, pp. 66–76, 2016.
  • [28] L. Wei, F. Ji, H. Liu, R. Zhou, C. Zhu, and X. Zhang, “Subspace clustering via structured sparse relation representation,” IEEE Transactions on Neural Networks and Learning Systems, vol. PP, 2021.
  • [29] Y. Sui, G. Wang, and L. Zhang, “Sparse subspace clustering via low-rank structure propagation,” Pattern Recognition, vol. 95, pp. 261–271, 2019.
  • [30] C. Mathieu and W. Schudy, “Correlation clustering with noisy input,” in Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, M. Charikar, Ed.   SIAM, 2010, pp. 712–728.
  • [31] C. Swamy, “Correlation clustering: maximizing agreements via semidefinite programming,” in Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, New Orleans, Louisiana, USA, January 11-14, 2004, J. I. Munro, Ed.   SIAM, 2004, pp. 526–527.
  • [32] M. Lee, J. Lee, H. Lee, and N. Kwak, “Membership representation for detecting block-diagonal structure in low-rank or sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015.   IEEE Computer Society, 2015, pp. 1648–1656.
  • [33] Z. Lin, M. Chen, and Y. Ma, “The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices,” CoRR, vol. abs/1009.5055, 2010.
  • [34] R. Zass and A. Shashua, “A unifying approach to hard and probabilistic clustering,” in 10th IEEE International Conference on Computer Vision (ICCV 2005), 17-20 October 2005, Beijing, China.   IEEE Computer Society, 2005, pp. 294–301.
  • [35] ——, “Doubly stochastic normalization for spectral clustering,” in Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006, B. Schölkopf, J. C. Platt, and T. Hofmann, Eds.   MIT Press, 2006, pp. 1569–1576.
  • [36] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, 2nd Edition.   Wiley, 2001.
  • [37] J. Mairal, “Optimization with first-order surrogate functions,” in Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, ser. JMLR Workshop and Conference Proceedings, vol. 28.   JMLR.org, 2013, pp. 783–791.
  • [38] D. Lim, R. Vidal, and B. D. Haeffele, “Doubly stochastic subspace clustering,” CoRR, vol. abs/2011.14859, 2020.
  • [39] S. Ma, “Alternating proximal gradient method for convex minimization,” J. Sci. Comput., vol. 68, no. 2, pp. 546–572, 2016.
  • [40] L. Wei, X. Wang, A. Wu, R. Zhou, and C. Zhu, “Robust subspace segmentation by self-representation constrained low-rank representation,” Neural Processing Letters, vol. 48, no. 3, pp. 1671–1691, 2018.
  • [41] C. You, C.-G. Li, D. P. Robinson, and R. Vidal, “Is an affine constraint needed for affine subspace clustering?” ICCV, 2019.
  • [42] R. Tron and R. Vidal, “A benchmark for the comparison of 3-d motion segmentation algorithms,” in 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 18-23 June 2007, Minneapolis, Minnesota, USA.   IEEE Computer Society, 2007.
  • [43] F. Samaria and A. Harter, “Parameterisation of a stochastic model for human face identification,” in Proceedings of Second IEEE Workshop on Applications of Computer Vision, WACV 1994, Sarasota, FL, USA, December 5-7, 1994.   IEEE, 1994, pp. 138–142.
  • [44] A. Martinez and R. Benavente, “The AR face database,” CVC Technical Report 24, 1998.