SPONGE Extension

Mihai Cucuringu (mihai.cucuringu@stats.ox.ac.uk), Apoorv Vikram Singh (apoorv.singh@nyu.edu), Déborah Sulem (deborah.sulem@stats.ox.ac.uk), Hemant Tyagi (hemant.tyagi@inria.fr)

Mihai Cucuringu: University of Oxford, Department of Statistics and Mathematical Institute. The Alan Turing Institute, London, UK. This work was supported by EPSRC grant EP/N510129/1.
Apoorv Vikram Singh: New York University, Department of Computer Science and Engineering. This work was done while the author was visiting the MODAL team at Inria Lille-Nord Europe.
Déborah Sulem: University of Oxford, Department of Statistics.
Hemant Tyagi: Inria, Univ. Lille, CNRS, UMR 8524 - Laboratoire Paul Painlevé, F-59000.
Authors are listed in alphabetical order.

Regularized spectral methods for clustering signed networks

Abstract

We study the problem of $k$-way clustering in signed graphs. Considerable attention in recent years has been devoted to analyzing and modeling signed graphs, where the affinity measure between nodes takes either positive or negative values. Recently, [CDGT19] proposed a spectral method, namely SPONGE (Signed Positive over Negative Generalized Eigenproblem), which casts the clustering task as a generalized eigenvalue problem optimizing a suitably defined objective function. This approach is motivated by social balance theory, where the clustering task aims to decompose a given network into disjoint groups, such that individuals within the same group are connected by as many positive edges as possible, while individuals from different groups are mainly connected by negative edges. Through extensive numerical simulations, SPONGE was shown to achieve state-of-the-art empirical performance. On the theoretical front, [CDGT19] analyzed SPONGE, as well as the popular Signed Laplacian based spectral method under the setting of a Signed Stochastic Block Model, for $k=2$ equal-sized clusters, in the regime where the graph is moderately dense.

In this work, we build on the results in [CDGT19] on two fronts for the normalized versions of SPONGE and the Signed Laplacian. Firstly, for both algorithms, we extend the theoretical analysis in [CDGT19] to the general setting of $k\geqslant 2$ unequal-sized clusters in the moderately dense regime. Secondly, we introduce regularized versions of both methods to handle sparse graphs (a regime where standard spectral methods are known to underperform) and provide theoretical guarantees under the same setting of a Signed Stochastic Block Model. To the best of our knowledge, regularized spectral methods have so far not been considered in the setting of clustering signed graphs. We complement our theoretical results with an extensive set of numerical experiments on synthetic data.

Keywords: signed clustering, graph Laplacians, stochastic block models, spectral methods, regularization techniques, sparse graphs.

1 Introduction

Signed graphs.

Recent years have seen a significant increase in interest in the analysis of signed graphs, for tasks such as clustering [CHN+14, CDGT19], link prediction [LHK10, KSSF16] and visualization [KSL+10]. Signed graphs are an increasingly popular family of undirected graphs, for which the edge weights may take both positive and negative values, thus encoding a measure of similarity or dissimilarity between the nodes. Signed social graphs have also received considerable attention to model trust relationships between entities, with positive (respectively, negative) edges encoding trust (respectively, distrust) relationships.

Clustering is arguably one of the most popular tasks in unsupervised machine learning, aiming at partitioning the node set such that the average connectivity or similarity between pairs of nodes within the same cluster is larger than that of pairs of nodes spanning different clusters. While the problem of clustering undirected unsigned graphs has been thoroughly studied for the past two decades (and to some extent, also that of clustering directed graphs in recent years), a lot less research has been undertaken on studying signed graphs.

Spectral clustering and regularization.

Spectral clustering methods have become a fundamental tool with a broad range of applications in areas including network science, machine learning and data mining [vL07]. The appeal of spectral clustering methods stems, on one hand, from their computational scalability, achieved by leveraging state-of-the-art eigensolvers, and on the other hand, from the fact that such algorithms are amenable to a theoretical analysis under suitably defined stochastic block models that quantify robustness to noise and sparsity of the measurement graph. Furthermore, on the theoretical side, understanding the spectrum of the adjacency matrix and its Laplacians is crucial for the development of efficient algorithms with performance guarantees, and leads to a mathematically rich set of problems. One such example from the latter class is that of Cheeger inequalities for general graphs, which relate the dominant eigenvalues of the Laplacian to edge expansion on graphs [Chu96], extended to the setup of directed graphs [Chu05], and more recently, to the graph Connection Laplacian arising in the context of the group synchronization problem [BSS13], and higher-order Cheeger inequalities for multiway spectral clustering [LGT14]. There have been significant recent advances in theoretically analyzing spectral clustering methods in the context of stochastic block models; for a detailed account, we refer the reader to the comprehensive recent survey of Abbe [Abb17].

In general, spectral clustering algorithms for unsigned and signed graphs typically have a common pipeline, where a suitable graph operator is considered (e.g., the graph Laplacian), its (usually $k$) extremal eigenvectors are computed, and the resulting point cloud in $\mathbb{R}^{k}$ is clustered using a variation of the popular $k$-means algorithm [RCY+11]. The main motivation for our current work stems from the lack of statistical guarantees in the above literature for the signed clustering problem, in the context of sparse graphs and a large number of clusters $k\geqslant 3$. The problem of $k$-way clustering in signed graphs aims to find a partition of the node set into $k$ disjoint clusters, such that most edges within clusters are positive, while most edges across clusters are negative, thus altogether maximizing the number of satisfied edges in the graph. Another potential formulation to consider is to minimize the number of (unsatisfied) edges violating the partitions, i.e., the number of negative edges within clusters and positive edges across clusters.

A regularization step has been introduced in the recent literature motivated by the observation that properly regularizing the adjacency matrix $A$ of a graph can significantly improve performance of spectral algorithms in the sparse regime. It was well known beforehand that standard spectral clustering often fails to produce meaningful results for sparse networks that exhibit strong degree heterogeneity [ACBL13, J+15]. To this end, [CCT12] proposed the regularized graph Laplacian $L^{\tau}=D_{\tau}^{-1/2}AD_{\tau}^{-1/2}$, where $D_{\tau}=D+\tau I$, for $\tau\geqslant 0$. The spectral algorithm introduced and analyzed in [CCT12] splits the nodes into two random subsets and relies on the subgraph induced by only one of the subsets to compute the spectral decomposition. Qin and Rohe [QR13] studied the more traditional formulation of a spectral clustering algorithm that uses the spectral decomposition on the entire matrix [NJW+02], and proposed and analyzed a regularized spectral clustering algorithm. Subsequently, [JY16] provided a theoretical justification for the regularization $A_{\tau}=A+\tau J$, where $J$ denotes the all ones matrix, partly explaining the empirical findings of [ACBL13] that the performance of regularized spectral clustering becomes insensitive for larger values of the regularization parameter, and showed that such large values can lead to better results. It is this latter form of regularization that we leverage in our present work, in the context of clustering signed graphs. Additional references and discussion on the regularization literature are provided in Section 1.2.

Motivation & Applications.

The recent surge of interest in analyzing signed graphs has been fueled by a very wide range of real-world applications, in the context of clustering, link prediction, and node rankings. Signed social networks model trust relationships between users with positive (trust) and negative (distrust) edges. A number of online social services such as Epinions [epi] and Slashdot [sla] that allow users to express their opinions are naturally represented as signed social networks [LHK10]. [BSG+12] considered shopping bipartite networks that encode like and dislike preferences between users and products. Other domain specific applications include personalized rankings via signed random walks [JJSK16], node rankings and centrality measures [LFZ19], node classification [TAL16], community detection [YCL07, CWP+16], and anomaly detection, as in [KSS14] which classifies users of an online signed social network as malicious or benign. In the very active research area of synthetic data generation, generative models for signed networks inspired by Structural Balance Theory have been proposed in [DMT18]. Learning low-dimensional representations of graphs (network embeddings) has received tremendous attention in the recent machine learning literature, and graph convolutional network-based methods have also been proposed for the setting of signed graphs, including [DMT18, LTJC20], which provide network embeddings to facilitate subsequent downstream tasks, including clustering and link prediction.

A key motivation for our line of work stems from time series clustering [ASW15], a ubiquitous task arising in many applications that consider biological gene expression data [FSK+12], economic time series that capture macroeconomic variables [Foc05], and financial time series corresponding to large baskets of instruments in the stock market [ZJGK10, PPTV06]. Driven by the clustering task, a popular approach in the literature is to consider similarity measures based on the Pearson correlation coefficient that captures linear dependence between variables and takes values in $[-1,1]$. By construing the correlation matrix as a weighted network whose (signed) edge weights capture the pairwise correlations, we cluster the multivariate time series by clustering the underlying signed network. To increase robustness, tests of statistical significance are often applied to individual pairwise correlations, indicating the probability of observing a correlation at least as large as the measured sample correlation, assuming the null hypothesis is true. Such a thresholding step on the $p$-value associated with each individual sample correlation [HLN15] renders the correlation network a sparse matrix. This is one of the main motivations of our current work, which proposes and analyzes algorithms for handling such sparse signed networks. We refer the reader to the popular work of Smith et al. [SMSK+11] for a detailed survey and comparison of various methodologies for turning time series data into networks, where the authors explore the interplay between fMRI time series and the network generation process. Importantly, they conclude that, in general, correlation-based approaches can be quite successful at estimating the connectivity of brain networks from fMRI time series.

Paper outline.

This paper is structured as follows. The remainder of Section 1 establishes the notation used throughout the paper, followed by a brief survey of related work in the signed clustering literature and graph regularization techniques for general graphs, along with a brief summary of our main contributions. Section 2 lays out the problem setup leading to our proposed algorithms in the context of the signed stochastic block model we subsequently analyze. Section 3 is a high-level summary of our main results across the two algorithms we consider. Section 4 contains the analysis of the proposed SPONGEsym algorithm, for both the sparse and dense regimes, for a general number of clusters. Similarly, Section 5 contains the main theoretical results for the symmetric Signed Laplacian, under both sparsity regimes as well. Section 6 contains detailed numerical experiments on various synthetic data sets, showcasing the performance of our proposed algorithms as we vary the number of clusters, the relative cluster sizes, the sparsity regimes, and the regularization parameters. Finally, Section 7 is a summary and discussion of our main findings, with an outlook towards potential future directions. We defer to the Appendix additional proof details and a summary of the main technical tools used throughout.

1.1 Notation

We denote by $G=(V,E)$ a signed graph with vertex set $V$, edge set $E$, and adjacency matrix $A\in\{0,\pm 1\}^{n\times n}$. We will also refer to the unsigned subgraphs of positive (resp. negative) edges $G^{+}=(V,E^{+})$ (resp. $G^{-}=(V,E^{-})$) with adjacency matrices $A^{+}$ (resp. $A^{-}$), such that $A=A^{+}-A^{-}$. More precisely, $A_{ij}^{+}=\max\{A_{ij},0\}$ and $A_{ij}^{-}=\max\{-A_{ij},0\}$, with $E^{+}\cap E^{-}=\emptyset$, and $E^{+}\cup E^{-}=E$. We denote by $\overline{D}=D^{+}+D^{-}$ the signed degree matrix, with the unsigned versions given by $D^{+}:=A^{+}\mathds{1}$ and $D^{-}:=A^{-}\mathds{1}$. For a subset of nodes $C\subset V$, we denote its complement by $\overline{C}=V\setminus C$.
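
As a concrete illustration of this notation, the following minimal Python sketch (assuming NumPy; the function name `split_signed_adjacency` and the toy matrix are hypothetical and introduced only for illustration) splits a signed adjacency matrix into $A^{+}$ and $A^{-}$ and forms the degree matrices, which are stored here as diagonal matrices.

```python
import numpy as np

def split_signed_adjacency(A):
    """Return (A_plus, A_minus, D_plus, D_minus, D_bar) for a signed adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    A_plus = np.maximum(A, 0.0)     # A^+_{ij} = max(A_ij, 0)
    A_minus = np.maximum(-A, 0.0)   # A^-_{ij} = max(-A_ij, 0)
    D_plus = np.diag(A_plus.sum(axis=1))    # degrees in G^+ on the diagonal
    D_minus = np.diag(A_minus.sum(axis=1))  # degrees in G^- on the diagonal
    return A_plus, A_minus, D_plus, D_minus, D_plus + D_minus

# Toy signed graph on 4 nodes (illustrative data, not taken from the paper).
A = np.array([[ 0,  1, -1,  0],
              [ 1,  0,  0, -1],
              [-1,  0,  0,  1],
              [ 0, -1,  1,  0]])
A_plus, A_minus, D_plus, D_minus, D_bar = split_signed_adjacency(A)
```

By construction $A^{+}$ and $A^{-}$ have disjoint supports, and the diagonal of $\overline{D}$ counts all edges incident to a node regardless of sign.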

For a matrix $M\in\mathbb{R}^{m\times n}$, $\lVert M\rVert$ denotes its spectral norm $\lVert M\rVert_{2}$, i.e., its largest singular value, and $\|M\|_{F}$ denotes its Frobenius norm. When $M$ is an $n\times n$ symmetric matrix, we let $V_{k}(M)$ be the $n\times k$ matrix whose columns are given by the eigenvectors corresponding to the $k$ smallest eigenvalues, and let $\mathcal{R}(V_{k}(M))$ denote the range space of these eigenvectors. We denote the eigenvalues of $M$ by $(\lambda_{j}(M))_{j=1}^{n}$, with the ordering

\[
\lambda_{n}(M)\leqslant\lambda_{n-1}(M)\leqslant\dots\leqslant\lambda_{1}(M).
\]

We also denote by $M_{i*}$ the $i$-th row of $M$. We denote by $\mathds{1}=(1,\dots,1)$ (resp. $\mathds{1}_{k}$) the all ones column vector of size $n$ (resp. $k$) and set $\chi_{1}=\frac{1}{\sqrt{k}}\mathds{1}_{k}$. $I_{m}$ denotes the square identity matrix of size $m$ and is shortened to $I$ when $m=n$. $J_{mn}$ is the $m\times n$ matrix of all ones. Finally, for $a,b\geqslant 0$, we write $a\lesssim b$ if there exists a universal constant $C>0$ such that $a\leqslant Cb$. If $a\lesssim b$ and $b\lesssim a$, then we write $a\asymp b$.

1.2 Related literature on signed clustering and graph regularization techniques

Signed clustering.

There exists a very rich literature on algorithms developed to solve the $k$-way clustering problem, with spectral methods playing a central role in the developments of the last two decades. Such spectral techniques optimize an objective function via the eigen-decomposition of a suitably chosen graph operator (typically a graph Laplacian) built directly from the data, in order to obtain a low-dimensional embedding (most often of dimension $k$ or $k-1$). A clustering algorithm such as $k$-means or $k$-means++ is subsequently applied in order to extract the final partition.

Kunegis et al. [KSL+10] introduced the combinatorial Signed Laplacian $\overline{L}=\overline{D}-A$ for the 2-way clustering problem. For heterogeneous degree distributions, normalized extensions are generally preferred, such as the random-walk Signed Laplacian $\overline{L_{rw}}=I-\overline{D}^{-1}A$, and the symmetric Signed Laplacian $\overline{L_{sym}}=I-\overline{D}^{-1/2}A\overline{D}^{-1/2}$. Chiang et al. [CWD12] pointed out a weakness in the Signed Laplacian objective for $k$-way clustering with $k>2$, and proposed instead a Balanced Normalized Cut (BNC) objective based on the operator $\overline{L_{BNC}}=\overline{D}^{-1/2}(D^{+}-A)\overline{D}^{-1/2}$. Mercado et al. [MTH16] based their clustering algorithm on a new operator called the Geometric Mean of Laplacians, and later extended this method in [MTH19] to a family of operators called the Matrix Power Mean of Laplacians. Previous work [CDGT19] by a subset of the authors of the present paper introduced the symmetric SPONGE objective using the matrix operator $T=(L^{-}_{sym}+\tau^{+}I)^{-1/2}(L^{+}_{sym}+\tau^{-}I)(L^{-}_{sym}+\tau^{+}I)^{-1/2}$, using the unsigned normalized Laplacians $L_{sym}^{\pm}=I-(D^{\pm})^{-1/2}A^{\pm}(D^{\pm})^{-1/2}$ and regularization parameters $\tau^{+},\tau^{-}>0$. This work also provides theoretical guarantees for the SPONGE and Signed Laplacian algorithms, in the setting of a Signed Stochastic Block Model.
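
To make two of these operators concrete, here is a minimal Python sketch (assuming NumPy; the function name `signed_laplacians` and the toy graph are hypothetical) that builds $\overline{L}$ and $\overline{L_{sym}}$ for a small balanced graph. For $k=2$ it reads off the clusters from the sign pattern of the eigenvector of the smallest eigenvalue, which is a simplification adopted here for illustration rather than the full pipeline analyzed in this paper.

```python
import numpy as np

def signed_laplacians(A):
    """Combinatorial and symmetric Signed Laplacians of a signed adjacency matrix."""
    A = np.asarray(A, dtype=float)
    d_bar = np.abs(A).sum(axis=1)            # diagonal of D_bar = D^+ + D^-
    if np.any(d_bar == 0):
        raise ValueError("isolated node: D_bar^{-1/2} is undefined")
    L_bar = np.diag(d_bar) - A               # D_bar - A
    s = 1.0 / np.sqrt(d_bar)
    L_sym = np.eye(A.shape[0]) - s[:, None] * A * s[None, :]
    return L_bar, L_sym

# Balanced toy graph with clusters {0, 1} and {2, 3} (illustrative data only).
A = np.array([[ 0,  1, -1,  0],
              [ 1,  0,  0, -1],
              [-1,  0,  0,  1],
              [ 0, -1,  1,  0]])
L_bar, L_sym = signed_laplacians(A)
eigvals, eigvecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
v = eigvecs[:, 0]                            # eigenvector of the smallest eigenvalue
labels = (v > 0).astype(int)                 # sign pattern separates the two clusters
```

On a connected balanced graph such as this one, the smallest eigenvalue of $\overline{L_{sym}}$ is zero, and the corresponding eigenvector has one sign on each cluster.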

In [MTH16] and [MTH19], Mercado et al. study the eigenspaces, in expectation and in probability, of several graph operators in a certain Signed Stochastic Block Model. However, this generative model differs from the one proposed in [CDGT19] that we analyze in this work. In the former, the positive and negative adjacency matrices do not have disjoint support, contrary to the latter. Moreover, their analysis is performed in the case of equal-size clusters. We will later show in our analysis that their result for the symmetric Signed Laplacian is not applicable in our setting.

Hsieh et al. [HCD12] proposed to perform low-rank matrix completion as a preprocessing step, before clustering using the top $k$ eigenvectors of the completed matrix. For $k=2$, [Cuc15] showed that signed clustering can be cast as an instance of the group synchronization [Sin11] problem over $\mathbb{Z}_{2}$, potentially with constraints given by available side information, for which spectral, semidefinite programming relaxations, and message passing algorithms have been considered. In recent work, [CPv19] proposed a formulation for the signed clustering problem that relates to graph-based diffuse interface models utilizing the Ginzburg-Landau functionals, based on an adaptation of the classic numerical Merriman-Bence-Osher (MBO) scheme for minimizing such graph-based functionals [MSB14]. We refer the reader to [Gal13] for a recent survey on clustering signed and unsigned graphs.

In a different line of work, known as correlation clustering, Bansal et al. [BBC04] considered the problem of clustering signed complete graphs, proved that it is NP-complete, and proposed two approximation algorithms with theoretical guarantees on their performance. On a related note, Demaine and Immorlica [DEFI06] studied the same problem but for arbitrary weighted graphs, and proposed an $O(\log n)$ approximation algorithm based on linear programming. For correlation clustering, in contrast to $k$-way clustering, the number of clusters is not given in advance, and there is no normalization with respect to size or volume.

Regularization in the sparse regime.

In many applications, real-world networks are sparse. In this context, regularization methods have increased the performance of traditional spectral clustering techniques, both for synthetic Stochastic Block Models and real data sets [CCT12, ACBL13, JY16, LLV15].

Chaudhuri et al. [CCT12] regularize the Laplacian matrix by adding a (typically small) weight $\tau$ to the diagonal entries of the degree matrix, giving $L_{\tau}=I-D_{\tau}^{-1/2}AD_{\tau}^{-1/2}$ with $D_{\tau}=D+\tau I$. Amini et al. [ACBL13] regularize the graph by adding a weight $\tau/n$ to every edge, leading to the Laplacian $\widetilde{L}_{\tau}=I-D_{\tau}^{-1/2}A_{\tau}D_{\tau}^{-1/2}$ with $A_{\tau}=A+\frac{\tau}{n}\mathds{1}\mathds{1}^{\top}$ and $D_{\tau}=A_{\tau}\mathds{1}$. Le et al. [LLV17] proved that this technique makes the adjacency and Laplacian matrices concentrate for inhomogeneous Erdős-Rényi graphs. Zhang et al. [ZR18] showed that this technique prevents spectral clustering from overfitting through the analysis of dangling sets. In [LLV17], Le et al. propose a graph trimming method in order to reduce the degree of certain nodes. This is achieved by reducing the entries of the adjacency matrix that lead to high-degree vertices. Zhou and Amini [ZA18] added a spectral truncation step after this regularization method, and proved consistency results in the bipartite Stochastic Block Model.
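
The edge-weight regularization described above can be sketched in a few lines of Python (assuming NumPy; the function name `regularized_laplacian`, the toy graph and the value of $\tau$ are hypothetical choices made for illustration). The example uses a graph with an isolated node, for which the unregularized normalized Laplacian would be undefined.

```python
import numpy as np

def regularized_laplacian(A, tau):
    """I - D_tau^{-1/2} A_tau D_tau^{-1/2} with A_tau = A + (tau/n) 1 1^T."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    A_tau = A + (tau / n) * np.ones((n, n))   # add weight tau/n to every pair
    d_tau = A_tau.sum(axis=1)                 # D_tau = A_tau 1, i.e. degree + tau
    s = 1.0 / np.sqrt(d_tau)
    return np.eye(n) - s[:, None] * A_tau * s[None, :]

# Sparse unsigned toy graph in which node 3 is isolated (illustrative data only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]])
L_tau = regularized_laplacian(A, tau=0.5)
eigvals = np.linalg.eigvalsh(L_tau)           # ascending eigenvalues
```

With any $\tau>0$ every regularized degree is strictly positive, so the operator is well defined and its spectrum stays within $[0,2]$.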

Very recently, regularization methods using powers of the adjacency matrix have been introduced. Abbe et al. [ABARS20] transform the adjacency matrix into the operator $A_{r}=\mathds{1}\left\{(I+A)^{r}\geqslant 1\right\}$, where the indicator function is applied entrywise. With this method, spectral clustering achieves the fundamental limit for weak recovery in the sparse setting. Very similarly, Stephan and Massoulié [SM19] transform the adjacency matrix into a distance matrix of outreach $l$, which links pairs of nodes that are $l$ apart with respect to the graph distance.
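
The operator $A_{r}$ can be illustrated with a short Python sketch (assuming NumPy and an unsigned 0/1 adjacency matrix; the function name `power_regularized_adjacency` and the path graph are hypothetical). For a nonnegative $A$, the $(i,j)$ entry of $(I+A)^{r}$ is positive exactly when $j$ can be reached from $i$ in at most $r$ steps, so the entrywise indicator links all such pairs.

```python
import numpy as np

def power_regularized_adjacency(A, r):
    """Entrywise indicator 1{(I + A)^r >= 1} for an unsigned 0/1 adjacency matrix A."""
    A = np.asarray(A, dtype=np.int64)
    n = A.shape[0]
    walks = np.linalg.matrix_power(np.eye(n, dtype=np.int64) + A, r)
    return (walks >= 1).astype(int)

# Path graph 0 - 1 - 2 - 3 (illustrative data only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
A_2 = power_regularized_adjacency(A, r=2)
```

In this toy case, nodes 0 and 2 (distance two) become linked in `A_2`, while nodes 0 and 3 (distance three) do not; note that this direct construction also places ones on the diagonal.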

1.3 Summary of our main contributions

This work extends the results obtained in [CDGT19] by a subset of the authors of our present paper. This previous work introduced the SPONGE algorithm, a principled and scalable spectral method for the signed clustering task that amounts to solving a generalized eigenvalue problem. [CDGT19] provided a theoretical analysis of both the newly introduced SPONGE algorithm and the popular Signed Laplacian-based method [KSL+10], quantifying their robustness against the sampling sparsity and noise level, under the setting of a Signed Stochastic Block Model (SSBM). These were the first such theoretical guarantees for the signed clustering problem under a suitably defined stochastic graph model. However, the analysis in [CDGT19] was restricted to the setting of $k=2$ equally-sized clusters, which is less realistic in light of most real world applications. Furthermore, the same previous line of work considered the moderately dense regime in terms of the edge sampling probability; in particular, it operated in the setting where $\mathbb{E}[\overline{D}_{jj}]\gtrsim\ln n$, i.e., $p\gtrsim\frac{\ln n}{n}$. Many real world applications involve large but very sparse graphs, with $p=\Theta\left(\frac{1}{n}\right)$, which provides motivation for our present work.

We summarize below our main contributions, and start with the remark that the theoretical analysis in the present paper pertains to the normalized version of SPONGE (denoted as SPONGEsym) and the symmetric Signed Laplacian, while [CDGT19] analyzed only the un-normalized versions of these signed operators. The experiments reported in [CDGT19] also considered such normalized matrix operators, and reported their superior performance over their respective un-normalized versions, further providing motivational ground for our current work.

  (i) Our first main contribution is to analyze the two above-mentioned signed operators, namely SPONGEsym and the symmetric Signed Laplacian, in the general SSBM model with $k\geqslant 2$ and unequal-cluster sizes, in the moderately dense regime. In particular, we evaluate the accuracy of both signed clustering algorithms by bounding the mis-clustering rate of the entire pipeline, as achieved by the popular $k$-means algorithm.

  (ii) Our second contribution is to introduce and analyze new regularized versions of both SPONGEsym and the symmetric Signed Laplacian, under the same general SSBM model, but in the sparse graph regime $\mathbb{E}[\overline{D}_{jj}]\gtrsim 1$, a setting where standard spectral methods are known to underperform. To the best of our knowledge, this sparsity regime has not been previously considered in the literature of signed networks; such regularized spectral methods have so far not been considered in the setting of clustering signed networks, or more broadly in the signed networks literature, where such regularization could prove useful for other related downstream tasks. One important aspect of regularization techniques is the choice of the regularization parameters. We show that our proposed algorithms can benefit from careful regularization and attain a higher level of accuracy in the sparse regime, provided that the regularization parameters scale as an adequate power of the average degree in the graph.

2 Problem setup

This section details the two algorithms for the signed clustering problem that we will analyze subsequently, namely, SPONGEsym (Symmetric Signed Positive Over Negative Generalized Eigenproblem) and the symmetric Signed Laplacian, along with their respective regularized versions.

2.1 Clustering via the SPONGEsym algorithm

The symmetric SPONGE method, denoted as SPONGEsym, aims at jointly minimizing two measures of badness in a signed clustering problem. For an unsigned graph $G$ and $X,Y\subset V$, we define the cut function $\text{Cut}_{G}(X,Y):=\sum_{i\in X,j\in Y}A_{ij}$, and denote the volume of $X$ by $\text{Vol}_{G}(X):=\sum_{i\in X}\sum_{j=1}^{n}A_{ij}$.

For a given cluster set $C\subset V$, $\text{Cut}_{G}(C,\overline{C})$ is the total weight of edges crossing from $C$ to $\overline{C}$ and $\text{Vol}_{G}(C)$ is the sum of (weighted) degrees of nodes in $C$. With this notation in mind and motivated by the approach of [CKC+16] in the context of constrained clustering, the symmetric SPONGE algorithm for signed clustering aims at minimizing the following two measures of badness given by $\frac{\text{Cut}_{G^{+}}(C,\overline{C})}{\text{Vol}_{G^{+}}(C)}$ and $\Big(\frac{\text{Cut}_{G^{-}}(C,\overline{C})}{\text{Vol}_{G^{-}}(C)}\Big)^{-1}=\frac{\text{Vol}_{G^{-}}(C)}{\text{Cut}_{G^{-}}(C,\overline{C})}$. To this end, we consider “merging” the objectives, and aim to solve

\[
\min_{C\subset V}\frac{\frac{\text{Cut}_{G^{+}}(C,\overline{C})}{\text{Vol}_{G^{+}}(C)}+\tau^{-}}{\frac{\text{Cut}_{G^{-}}(C,\overline{C})}{\text{Vol}_{G^{-}}(C)}+\tau^{+}}\,,
\]

where $\tau^{+}>0,\tau^{-}\geqslant 0$ denote trade-off parameters. For $k$-way signed clustering into disjoint clusters $C_{1},\ldots,C_{k}$, we arrive at the combinatorial optimization problem

\[
\min_{C_{1},\ldots,C_{k}}\sum_{i=1}^{k}\left(\frac{\frac{\text{Cut}_{G^{+}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{+}}(C_{i})}+\tau^{-}}{\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}}\right)\,. \tag{1}
\]
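
As a sanity check on objective (1), the following Python sketch (assuming NumPy; the helper names `cut`, `vol` and `sponge_sym_objective`, the toy graph and the choice $\tau^{+}=\tau^{-}=1$ are hypothetical) evaluates it for two candidate partitions of a small signed graph. It assumes every cluster has nonzero volume in both $G^{+}$ and $G^{-}$.

```python
import numpy as np

def cut(A, X, Y):
    """Cut_G(X, Y): total weight of entries A_ij with i in X and j in Y."""
    return A[np.ix_(X, Y)].sum()

def vol(A, X):
    """Vol_G(X): sum of the (weighted) degrees of the nodes in X."""
    return A[X, :].sum()

def sponge_sym_objective(A_plus, A_minus, clusters, tau_plus, tau_minus):
    """Value of objective (1) for a partition given as a list of index lists."""
    n = A_plus.shape[0]
    total = 0.0
    for C in clusters:
        comp = [j for j in range(n) if j not in C]   # complement of C
        num = cut(A_plus, C, comp) / vol(A_plus, C) + tau_minus
        den = cut(A_minus, C, comp) / vol(A_minus, C) + tau_plus
        total += num / den
    return total

# Toy signed graph: positive edges 0-1 and 2-3, negative edges 0-2 and 1-3.
A = np.array([[ 0,  1, -1,  0],
              [ 1,  0,  0, -1],
              [-1,  0,  0,  1],
              [ 0, -1,  1,  0]])
A_plus, A_minus = np.maximum(A, 0), np.maximum(-A, 0)
good = sponge_sym_objective(A_plus, A_minus, [[0, 1], [2, 3]], 1.0, 1.0)
bad = sponge_sym_objective(A_plus, A_minus, [[0, 2], [1, 3]], 1.0, 1.0)
```

The partition that keeps positive edges inside clusters and negative edges across them attains the smaller value (here `good` is 1.0 and `bad` is 4.0), in line with the intent of the objective.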

Let $D^{+},L^{+}$ denote respectively the degree matrix and un-normalized Laplacian associated with $G^{+}$, and $L^{+}_{sym}=(D^{+})^{-1/2}L^{+}(D^{+})^{-1/2}$ denote the symmetric Laplacian matrix for $G^{+}$ (similarly for $L^{-}_{sym},D^{-},L^{-}$). For a subset $C_{i}\subset V$, denote $\mathds{1}_{C_{i}}$ to be the indicator vector for $C_{i}$ so that $(\mathds{1}_{C_{i}})_{j}$ equals $1$ if $j\in C_{i}$, and is $0$ otherwise. Now define the normalized indicator vector $x_{C_{i}}\in\mathbb{R}^{n}$ where

\[
x_{C_{i}}=\left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1/2}\frac{1}{\sqrt{\text{Vol}_{G^{+}}(C_{i})}}(D^{+})^{1/2}\mathds{1}_{C_{i}}.
\]

In light of this, one can verify that

\[
\begin{aligned}
x_{C_{i}}^{\top}x_{C_{i}} &=\left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1}\frac{\mathds{1}_{C_{i}}^{\top}D^{+}\mathds{1}_{C_{i}}}{\text{Vol}_{G^{+}}(C_{i})}=\left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1},\\
x_{C_{i}}^{\top}L^{+}_{sym}x_{C_{i}} &=\left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1}\frac{\mathds{1}_{C_{i}}^{\top}L^{+}\mathds{1}_{C_{i}}}{\text{Vol}_{G^{+}}(C_{i})}=\left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1}\frac{\text{Cut}_{G^{+}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{+}}(C_{i})}.
\end{aligned}
\]
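
These two identities can also be checked numerically. The Python sketch below (assuming NumPy; the toy graph, the cluster $C=\{0,1,2\}$ and the value $\tau^{+}=0.5$ are hypothetical choices) builds $x_{C}$ for one cluster and compares both sides of each identity.

```python
import numpy as np

# Toy signed graph on 5 nodes (illustrative data only); every node has a positive edge.
edges = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1,
         (0, 3): -1, (0, 4): -1, (1, 4): -1, (2, 4): -1}
n = 5
A = np.zeros((n, n))
for (i, j), sign in edges.items():
    A[i, j] = A[j, i] = sign
A_plus, A_minus = np.maximum(A, 0), np.maximum(-A, 0)

C, comp = [0, 1, 2], [3, 4]
tau_plus = 0.5
ind = np.zeros(n)
ind[C] = 1.0                                         # indicator vector 1_C

d_plus = A_plus.sum(axis=1)
L_plus = np.diag(d_plus) - A_plus                    # un-normalized Laplacian of G^+
L_plus_sym = L_plus / np.sqrt(np.outer(d_plus, d_plus))

cut_plus, vol_plus = A_plus[np.ix_(C, comp)].sum(), A_plus[C].sum()
cut_minus, vol_minus = A_minus[np.ix_(C, comp)].sum(), A_minus[C].sum()
c = cut_minus / vol_minus + tau_plus                 # denominator term of objective (1)

x = np.sqrt(d_plus) * ind / np.sqrt(c * vol_plus)    # normalized indicator x_C
lhs_norm, rhs_norm = x @ x, 1.0 / c
lhs_quad, rhs_quad = x @ L_plus_sym @ x, (cut_plus / vol_plus) / c
```

Both pairs agree up to floating-point error, which is what makes the sum of $x_{C_{i}}^{\top}(L^{+}_{sym}+\tau^{-}I)x_{C_{i}}$ over clusters coincide with objective (1).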

Hence (1) is equivalent to the following discrete optimization problem

\[
\min_{C_{1},\ldots,C_{k}}\sum_{i=1}^{k}x_{C_{i}}^{\top}(L^{+}_{sym}+\tau^{-}I)x_{C_{i}} \tag{2}
\]

which is NP-Hard. A common approach to solve this problem is to drop the discreteness constraints, and allow $x_{C_{i}}$ to take values in $\mathbb{R}^{n}$. To this end, we introduce a new set of vectors $z_{1},\ldots,z_{k}\in\mathbb{R}^{n}$ such that they are orthonormal with respect to the matrix $L^{-}_{sym}+\tau^{+}I$, i.e., $z_{i}^{\top}(L^{-}_{sym}+\tau^{+}I)z_{i^{\prime}}=\delta_{ii^{\prime}}$. This leads to the continuous optimization problem

\[
\min_{z_{i}^{\top}(L^{-}_{sym}+\tau^{+}I)z_{i^{\prime}}=\delta_{ii^{\prime}}}\sum_{i=1}^{k}z_{i}^{\top}(L^{+}_{sym}+\tau^{-}I)z_{i}. \tag{3}
\]

Note that the above choice of vectors $z_{1},\ldots,z_{k}$ is not really a relaxation of (2) since $x_{C_{1}},\dots,x_{C_{k}}$ are not necessarily $(L^{-}_{sym}+\tau^{+}I)$-orthonormal, but (3) can be conveniently formulated as a suitable generalized eigenvalue problem, similar to the approach in [CKC+16]. Indeed, denoting $y_{i}=(L^{-}_{sym}+\tau^{+}I)^{1/2}z_{i}$, and $Y=[y_{1},\ldots,y_{k}]\in\mathbb{R}^{n\times k}$, (3) can be rewritten as

\[
\min_{Y^{\top}Y=I}\text{Tr}\Big(Y^{\top}(L^{-}_{sym}+\tau^{+}I)^{-1/2}(L^{+}_{sym}+\tau^{-}I)(L^{-}_{sym}+\tau^{+}I)^{-1/2}Y\Big),
\]

the solution to which is well known to be given by the smallest $k$ eigenvectors of

\[
T=(L^{-}_{sym}+\tau^{+}I)^{-1/2}(L^{+}_{sym}+\tau^{-}I)(L^{-}_{sym}+\tau^{+}I)^{-1/2},
\]

see, e.g., [ST00, Theorem 2.1]. However, this is not practically viable for large scale problems, since computing $T$ itself is already expensive. To circumvent this issue, one can instead consider the embedding in $\mathbb{R}^{k}$ corresponding to the smallest $k$ generalized eigenvectors of the symmetric definite pair $(L^{+}_{sym}+\tau^{-}I,L^{-}_{sym}+\tau^{+}I)$. There exist many efficient solvers for large scale generalized eigenproblems for symmetric definite matrix pairs. In our experiments, we use the LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient method) solver introduced in [Kny01].

One can verify that (λ,v)(\lambda,v) is an eigenpair111With λ\lambda denoting its eigenvalue, and vv the corresponding eigenvector. of TT iff (λ,(Lsym+τ+I)1/2v)(\lambda,(L^{-}_{sym}+\tau^{+}I)^{-1/2}v) is a generalized eigenpair of (Lsym++τI,Lsym+τ+I)(L^{+}_{sym}+\tau^{-}I,L^{-}_{sym}+\tau^{+}I). Indeed, for symmetric matrices A,BA,B with A0A\succ 0, it holds true for w=A1/2vw=A^{-1/2}v that

A1/2BA1/2v=λvBw=λAw.A^{-1/2}BA^{-1/2}v=\lambda v\iff Bw=\lambda Aw.

Therefore, denoting Vk(T)\varmathbbRn×kV_{k}(T)\in\varmathbb{R}^{n\times k} to be the matrix consisting of the smallest kk eigenvectors of TT, and Gk(T)\varmathbbRn×kG_{k}(T)\in\varmathbb{R}^{n\times k} to be the matrix of the smallest kk generalized eigenvectors of (Lsym++τI,Lsym+τ+I)(L^{+}_{sym}+\tau^{-}I,L^{-}_{sym}+\tau^{+}I), it follows that

Gk(T)=(Lsym+τ+I)1/2Vk(T).G_{k}(T)=(L^{-}_{sym}+\tau^{+}I)^{-1/2}V_{k}(T). (4)

Hence, upon computing $G_k(T)$, we apply a suitable clustering algorithm, such as the popular $k$-means++ [AV07], to the rows of $G_k(T)$ to arrive at the final partition.
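The embedding step described above can be sketched numerically as follows. This is only an illustrative dense implementation for small graphs: it swaps the LOBPCG solver for a Cholesky reduction written in plain NumPy, and the function names `sym_laplacian` and `spongesym_embedding`, as well as the default values of the two regularization parameters, are assumptions made for this example rather than part of the method.

```python
import numpy as np

def sym_laplacian(A):
    """Normalized Laplacian I - D^{-1/2} A D^{-1/2} of an unsigned graph.
    Isolated nodes keep a diagonal entry equal to 1."""
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d, dtype=float)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

def spongesym_embedding(A_pos, A_neg, k, tau_pos=1.0, tau_neg=1.0):
    """Smallest k generalized eigenpairs of (L+_sym + tau^- I, L-_sym + tau^+ I)."""
    n = A_pos.shape[0]
    B = sym_laplacian(A_pos) + tau_neg * np.eye(n)  # L^+_sym + tau^- I
    M = sym_laplacian(A_neg) + tau_pos * np.eye(n)  # L^-_sym + tau^+ I, positive definite
    # Reduce B w = lambda M w to a standard symmetric problem via M = C C^T.
    C = np.linalg.cholesky(M)
    C_inv = np.linalg.inv(C)
    S = C_inv @ B @ C_inv.T
    vals, U = np.linalg.eigh(S)        # eigenvalues in ascending order
    W = C_inv.T @ U[:, :k]             # generalized eigenvectors, W^T M W = I
    return vals[:k], W
```

The rows of `W` would then be passed to a clustering routine such as $k$-means++. For large sparse graphs a dedicated solver (for instance `scipy.linalg.eigh(B, M)` for dense input, or `scipy.sparse.linalg.lobpcg`) is the more natural choice.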

Remark 2.1.

In [CDGT19], similar arguments were given for the SPONGE algorithm, which led to computing the $k$ smallest generalized eigenvectors of the matrix pair $(L^++\tau^-D^-,\ L^-+\tau^+D^+)$. SPONGEsym was proposed in [CDGT19], but no theoretical results were provided.

Clustering in the sparse regime.

We also provide a version of SPONGEsym for the case where $G$ is sparse, i.e., the graph has very few edges and is typically disconnected. In this setting, we consider a regularized version of SPONGEsym wherein a small weight is added to every pair of nodes (including self-loops) in the positive and negative subgraphs, respectively. Formally, for regularization parameters $\gamma^+,\gamma^-\geqslant 0$, let us define $A^{\pm}_{\gamma^{\pm}}:=A^{\pm}+\frac{\gamma^{\pm}}{n}\mathbf{1}\mathbf{1}^\top$ to be the regularized adjacency matrices for the unsigned graphs $G^+,G^-$ respectively. Denoting by $D^{\pm}_{\gamma^{\pm}}$ the degree matrix of $A^{\pm}_{\gamma^{\pm}}$, the normalized Laplacians corresponding to $A^{\pm}_{\gamma^{\pm}}$ are given by

$$L^{\pm}_{sym,\gamma^{\pm}}=I-(D^{\pm}_{\gamma^{\pm}})^{-1/2}A^{\pm}_{\gamma^{\pm}}(D^{\pm}_{\gamma^{\pm}})^{-1/2}.$$

Given the above modifications, let $V_k(T_{\gamma^+,\gamma^-})\in\mathbb{R}^{n\times k}$ denote the matrix consisting of the smallest $k$ eigenvectors of

$$T_{\gamma^+,\gamma^-}=(L^-_{sym,\gamma^-}+\tau^+I)^{-1/2}(L^+_{sym,\gamma^+}+\tau^-I)(L^-_{sym,\gamma^-}+\tau^+I)^{-1/2}.$$

For the same reasons discussed earlier, we will consider the embedding given by the smallest $k$ generalized eigenvectors of the matrix pencil $(L^+_{sym,\gamma^+}+\tau^-I,\ L^-_{sym,\gamma^-}+\tau^+I)$, namely $G_k(T_{\gamma^+,\gamma^-})$ where

$$G_k(T_{\gamma^+,\gamma^-})=(L^-_{sym,\gamma^-}+\tau^+I)^{-1/2}V_k(T_{\gamma^+,\gamma^-}),$$

as in (44). The rows of $G_k(T_{\gamma^+,\gamma^-})$ can then be clustered using an appropriate clustering procedure, such as $k$-means++.
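As a small illustration of the regularization step, the sketch below builds the regularized normalized Laplacian of one unsigned graph. The function name `regularized_sym_laplacian` is hypothetical, and the code assumes a dense NumPy array and a strictly positive regularization parameter (so that every regularized degree is positive).

```python
import numpy as np

def regularized_sym_laplacian(A, gamma):
    """I - D_g^{-1/2} A_g D_g^{-1/2}, with A_g = A + (gamma / n) * 1 1^T.
    Assumes gamma > 0, so every regularized degree d_i + gamma is positive."""
    n = A.shape[0]
    A_reg = A + (gamma / n) * np.ones((n, n))
    d_reg = A_reg.sum(axis=1)              # equals the original degree plus gamma
    inv_sqrt = 1.0 / np.sqrt(d_reg)
    return np.eye(n) - inv_sqrt[:, None] * A_reg * inv_sqrt[None, :]
```

Applying this to the positive and negative adjacency matrices with their respective parameters, and adding the two shifts, gives the matrix pencil whose smallest $k$ generalized eigenvectors form the embedding. Note that even for an empty graph the regularized Laplacian is well defined, which is the point of the construction in the sparse regime.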

Remark 2.2.

Regularized spectral clustering for unsigned graphs involves adding $\frac{\gamma}{n}\mathbf{1}\mathbf{1}^\top$ to the adjacency matrix, followed by clustering the embedding given by the smallest $k$ eigenvectors of the normalized Laplacian (of the regularized adjacency); see e.g. [ACBL13, LLV17]. To the best of our knowledge, regularized spectral clustering methods have not been explored thus far in the context of sparse signed graphs.

2.2 Clustering via the symmetric Signed Laplacian

The use of the (un-normalized) Signed Laplacian $\overline{L}$ for clustering is justified by Kunegis et al. in [KSL+10] using the signed ratio cut function. For $C\subset V$,

$$\mathrm{sRCut}(C,\overline{C})=\left(2\,\text{Cut}_{G^+}(C,\overline{C})+\text{Cut}_{G^-}(C,C)+\text{Cut}_{G^-}(\overline{C},\overline{C})\right)\left(\frac{1}{|C|}+\frac{1}{|\overline{C}|}\right). \tag{5}$$

For $2$-way clustering, minimizing this objective corresponds to minimizing the number of positive edges between the two classes and the number of negative edges inside each class. Moreover, minimizing (5) is equivalent to the following optimization problem

$$\min_{u\in\mathcal{U}}u^\top\overline{L}u,$$

where $\mathcal{U}\subset\mathbb{R}^n$ is the set of vectors $u$ of the form $u_i=\pm\frac{1}{2}\left(\sqrt{\frac{|C|}{|\overline{C}|}}+\sqrt{\frac{|\overline{C}|}{|C|}}\right)$ for all $i\in[n]$.

However, Gallier [Gal16] noted that this equivalence does not generalize to $k>2$, and defined a new notion of signed cut, called the signed normalized cut function. For a partition $C_1,\dots,C_k$ with membership matrix $X\in\{0,1\}^{n\times k}$,

$$\mathrm{sNCut}(C_1,\dots,C_k)=\sum_{i=1}^{k}\left(\frac{\text{Cut}_G(C_i,\overline{C_i})}{\text{Vol}_G(C_i)}+2\frac{\text{Cut}_{G^-}(C_i,C_i)}{\text{Vol}_G(C_i)}\right)=\sum_{i=1}^{k}\frac{(X^i)^\top\overline{L}X^i}{(X^i)^\top\overline{D}X^i},$$

with $X^i$ the $i$-th column of $X$. Compared to (5), this objective also penalizes the number of negative edges across two subsets, which may not be a desirable feature for signed clustering. Minimizing this function with a relaxation of the constraint that $X^i\in\{0,1\}^n$ leads to the following problem

$$\min_{Y^\top Y=I}\text{Tr}\Big(Y^\top\overline{L_{sym}}Y\Big).$$

A minimizer of this problem is obtained by stacking column-wise the $k$ eigenvectors of $\overline{L_{sym}}$ corresponding to the smallest eigenvalues, i.e. $V_k(\overline{L_{sym}})$. Therefore, one can apply a clustering algorithm to the rows of the matrix $V_k(\overline{L_{sym}})$ to find a partition of the set of nodes $V$.

In fact, we will consider using only the $k-1$ smallest eigenvectors of $\overline{L_{sym}}$ and applying the $k$-means++ algorithm on the rows of $V_{k-1}(\overline{L_{sym}})$. This will be justified in our analysis via a stochastic generative model, namely the Signed Stochastic Block Model (SSBM), introduced in the next subsection. Under this model assumption, we will see later that the embedding given by the $k-1$ smallest eigenvectors of the symmetric Signed Laplacian of the expected graph has $k$ distinct rows (with two rows being equal if and only if the corresponding nodes belong to the same cluster).
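To see why $k-1$ eigenvectors can suffice, here is a minimal sketch on a noiseless, fully connected two-cluster graph. The function name `signed_laplacian_embedding` is hypothetical, the code assumes a dense symmetric signed adjacency matrix with no isolated node, and the toy graph is chosen purely for illustration.

```python
import numpy as np

def signed_laplacian_embedding(A, k):
    """Rows of the k-1 smallest eigenvectors of I - Dbar^{-1/2} A Dbar^{-1/2}."""
    d_bar = np.abs(A).sum(axis=1)        # signed degrees, i.e. diagonal of D^+ + D^-
    inv_sqrt = 1.0 / np.sqrt(d_bar)      # assumes every node has at least one edge
    L_bar_sym = np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_bar_sym)  # eigenvalues returned in ascending order
    return vecs[:, :k - 1]

# Noiseless example: +1 inside each cluster, -1 across clusters, no self-loops.
labels = np.array([0, 0, 0, 1, 1, 1])
A = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
np.fill_diagonal(A, 0.0)
V = signed_laplacian_embedding(A, k=2)
```

In this example the single retained eigenvector is constant on each cluster with opposite signs across the two clusters, so its rows take exactly $k=2$ distinct values.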

Clustering in the sparse regime.

When $G$ is sparse, we propose a spectral clustering method based on a regularization of the signed graph, leading to a regularized Signed Laplacian. To this end, for $\gamma^+,\gamma^-\geqslant 0$, recall the regularized adjacency matrices $A^{\pm}_{\gamma^{\pm}}$, with degree matrices $D^{\pm}_{\gamma^{\pm}}$, for the unsigned graphs $G^+,G^-$ respectively. In light of this, the regularized signed adjacency and degree matrices are defined as follows

$$\begin{aligned}
A_\gamma &:= A^+_{\gamma^+}-A^-_{\gamma^-}=A+\frac{\gamma^+-\gamma^-}{n}\mathbf{1}\mathbf{1}^\top,\\
\overline{D}_\gamma &:= D^+_{\gamma^+}+D^-_{\gamma^-}=D^++\gamma^+I+D^-+\gamma^-I=\overline{D}+(\gamma^++\gamma^-)I=\overline{D}+\gamma I,
\end{aligned}$$

with $\gamma:=\gamma^++\gamma^-$. Our regularized Signed Laplacian is the symmetric Signed Laplacian on this regularized signed graph, i.e.

$$L_\gamma:=I-(\overline{D}_\gamma)^{-1/2}A_\gamma(\overline{D}_\gamma)^{-1/2}. \tag{6}$$

Similarly to the symmetric Signed Laplacian, our clustering algorithm in the sparse case finds the $k-1$ smallest eigenvectors of $L_\gamma$ and applies the $k$-means algorithm on the rows of $V_{k-1}(L_\gamma)$.

Remark 2.3.

For the choice $\gamma^+=\gamma^-$, we have $A_\gamma=A$, and the regularized Laplacian becomes

$$L_\gamma=I-(\overline{D}_\gamma)^{-1/2}A(\overline{D}_\gamma)^{-1/2},$$

with $\overline{D}_\gamma=\overline{D}+(\gamma^++\gamma^-)I$. This regularization scheme is very similar to the degree-corrected normalized Laplacian defined in [CCT12].
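The construction in (6) can be sketched in a few lines. The function name `regularized_signed_laplacian` is hypothetical, and the code assumes a dense signed adjacency matrix and a strictly positive total regularization (or no isolated node), so that all regularized degrees are positive.

```python
import numpy as np

def regularized_signed_laplacian(A, gamma_pos, gamma_neg):
    """L_gamma = I - Dbar_g^{-1/2} A_g Dbar_g^{-1/2} for a signed adjacency matrix A."""
    n = A.shape[0]
    # A_gamma = A + ((gamma^+ - gamma^-) / n) * 1 1^T
    A_reg = A + ((gamma_pos - gamma_neg) / n) * np.ones((n, n))
    # Dbar_gamma = Dbar + (gamma^+ + gamma^-) I, with Dbar = D^+ + D^-
    d_reg = np.abs(A).sum(axis=1) + gamma_pos + gamma_neg
    inv_sqrt = 1.0 / np.sqrt(d_reg)
    return np.eye(n) - inv_sqrt[:, None] * A_reg * inv_sqrt[None, :]
```

With equal parameters the rank-one term cancels and only the degrees are shifted, which is the situation described in Remark 2.3.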

2.3 Signed Stochastic Block Model (SSBM)

Our work theoretically analyzes the clustering performance of SPONGEsym and the symmetric Signed Laplacian algorithms under a signed random graph model, also considered previously in [CDGT19, CPv19]. We recall here its definition and parameters.

  • $n$: the number of nodes in the network;

  • $k$: the number of planted communities;

  • $p$: the probability that an edge is present;

  • $\eta$: the probability of flipping the sign of an edge;

  • $C_1,\dots,C_k$: an arbitrary partition of the vertices with sizes $n_1,\dots,n_k$.

We first partition the vertices (arbitrarily) into clusters $C_1,\dots,C_k$ where $\left\lvert C_i\right\rvert=n_i$. Next, we generate a noiseless measurement graph from the Erdős-Rényi model $G(n,p)$, wherein each edge takes value $+1$ if both its endpoints are contained in the same cluster, and $-1$ otherwise. To model noise, we flip the sign of each edge independently with probability $\eta\in[0,1/2)$. This results in the realization of a signed graph instance $G$ from the SSBM ensemble.

Let $A\in\{0,\pm 1\}^{n\times n}$ denote the adjacency matrix of $G$, and note that $(A_{jj'})_{j\leqslant j'}$ are independent random variables. Recall that $A=A^+-A^-$, where $A^+,A^-\in\{0,1\}^{n\times n}$ are the adjacency matrices of the unsigned graphs $G^+,G^-$ respectively. Then, $(A^+_{jj'})_{j\leqslant j'}$ are independent, and similarly $(A^-_{jj'})_{j\leqslant j'}$ are also independent. But for given $j,j'\in[n]$ with $j\neq j'$, $A^+_{jj'}$ and $A^-_{jj'}$ are dependent. Let $d_i^{\pm}$ denote the degree of a node in cluster $i$, for $i\in[k]$, in the graph $\mathbb{E}[A^{\pm}]$. Moreover, under this model, the expected signed degree matrix is the scaled identity matrix $\mathbb{E}\overline{D}=\overline{d}I$, with $\overline{d}=p(n-1)$.
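The generative process above can be sketched as a short sampler. The function name `sample_ssbm` and its argument layout are assumptions made for this example; it draws one symmetric signed adjacency matrix without self-loops, which matches the expected signed degree $p(n-1)$ stated above.

```python
import numpy as np

def sample_ssbm(sizes, p, eta, rng):
    """Draw a signed adjacency matrix from the SSBM with cluster sizes `sizes`."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    same = labels[:, None] == labels[None, :]
    clean = np.where(same, 1, -1)              # noiseless signs
    present = rng.random((n, n)) < p           # Erdos-Renyi edge indicators
    flip = rng.random((n, n)) < eta            # independent sign flips
    signs = np.where(flip, -clean, clean)
    upper = np.triu(present * signs, k=1)      # keep j < j' only, no self-loops
    A = upper + upper.T                        # symmetrize
    return A, labels
```

A typical call would be `A, labels = sample_ssbm([50, 50, 100], p=0.1, eta=0.2, rng=np.random.default_rng(0))`, after which the positive and negative parts are `np.maximum(A, 0)` and `np.maximum(-A, 0)`.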

Remark 2.4.

Contrary to stochastic block models for unsigned graphs, we do not require (for the purpose of detecting clusters) the intra-cluster edge probabilities to be different from those of inter-cluster edges, since the sign of the edges already achieves this purpose implicitly. In fact, it is the noise parameter $\eta$ that is crucial for identifying the underlying latent cluster structure.

To formulate our theoretical results we will also need the following notation. Let $s_i=n_i/n$ denote the fraction of nodes in cluster $i$, with $l$ (resp. $s$) denoting the fraction for the largest (resp. smallest) cluster. Hence, the size of the largest (resp. smallest) cluster is $nl$ (resp. $ns$). Following the notation in [LR15], we will denote by $\mathbb{M}_{n,k}$ the class of “membership” matrices of size $n\times k$, and by $\hat{\Theta}\in\mathbb{M}_{n,k}$ the ground-truth membership matrix containing $k$ distinct indicator row-vectors (one for each cluster), i.e., for $i\in[k]$ and $j\in[n]$,

$$\hat{\Theta}_{ji}=\begin{cases}1&\text{ if node }j\in\text{ cluster }C_i,\\ 0&\text{ otherwise}.\end{cases}$$

We also define the normalized membership matrix $\Theta$ corresponding to $\hat{\Theta}$, where for $i\in[k]$ and $j\in[n]$,

$$\Theta_{ji}=\begin{cases}1/\sqrt{n_i}&\text{ if node }j\in\text{ cluster }C_i,\\ 0&\text{ otherwise}.\end{cases}$$
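For concreteness, the two membership matrices can be built from a label vector as in the sketch below. The function name `membership_matrices` is hypothetical, and the code assumes integer labels in $\{0,\dots,k-1\}$ with every cluster non-empty.

```python
import numpy as np

def membership_matrices(labels, k):
    """Return (Theta_hat, Theta): the 0/1 membership matrix and its
    column-normalized version with entries 1/sqrt(n_i)."""
    Theta_hat = np.eye(k)[labels]            # row j is the indicator of node j's cluster
    sizes = Theta_hat.sum(axis=0)            # cluster sizes n_1, ..., n_k
    Theta = Theta_hat / np.sqrt(sizes)       # scale column i by 1/sqrt(n_i)
    return Theta_hat, Theta
```

The normalization makes the columns of $\Theta$ orthonormal, which is why $\Theta$ (up to a rotation) can play the role of an eigenvector matrix in the results that follow.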

3 Summary of main results

We now summarize our theoretical results for SPONGEsym and the symmetric Signed Laplacian methods, when the graph is generated from the SSBM ensemble.

3.1 Symmetric SPONGE

We begin by describing conditions under which the rows of the matrix $G_k(T)$ approximately preserve the ground truth clustering structure. Before explaining our results, let us denote by $\overline{T}$ the analogue of $T$ for the expected graph, i.e.,

$$\overline{T}=(\overline{L^-_{sym}}+\tau^+I)^{-1/2}(\overline{L^+_{sym}}+\tau^-I)(\overline{L^-_{sym}}+\tau^+I)^{-1/2},$$

where $\overline{L^{\pm}_{sym}}=I-(\mathbb{E}[D^{\pm}])^{-1/2}\,\mathbb{E}[A^{\pm}]\,(\mathbb{E}[D^{\pm}])^{-1/2}$. We first show that for suitable values of $\tau^+>0,\tau^-\geqslant 0$ (with $n$ large enough), the smallest $k$ eigenvectors of $\overline{T}$, denoted by $V_k(\overline{T})$, are given by $V_k(\overline{T})=\Theta R$, for some $k\times k$ rotation matrix $R$. Hence, the rows of $V_k(\overline{T})$ have the same clustering structure as that of $\Theta$. Denoting by $G_k(\overline{T})\in\mathbb{R}^{n\times k}$ the matrix consisting of the $k$ smallest generalized eigenvectors of $(\overline{L^+_{sym}}+\tau^-I,\ \overline{L^-_{sym}}+\tau^+I)$, and recalling (4), we can relate $G_k(\overline{T})$ and $V_k(\overline{T})$ via

$$G_k(\overline{T})=(\overline{L^-_{sym}}+\tau^+I)^{-1/2}V_k(\overline{T}). \tag{7}$$

It turns out that when $V_k(\overline{T})=\Theta R$, and in light of the expression for $\overline{L^-_{sym}}+\tau^+I$ from (24), we arrive at $G_k(\overline{T})=\Theta(C^-)^{-1/2}R$, where $C^-\succ 0$ is as in (18). Since $(C^-)^{-1/2}R$ is invertible, it follows that $G_k(\overline{T})$ has $k$ distinct rows, with the rows that belong to the same cluster being identical. The remaining arguments revolve around deriving concentration bounds on $\left\lVert T-\overline{T}\right\rVert$, which imply (for $p$ large enough) that the distance between the column spans of $V_k(T)$ and $V_k(\overline{T})$ is small, i.e., there exists an orthonormal matrix $O$ such that $\left\lVert V_k(T)-V_k(\overline{T})O\right\rVert$ is small. Finally, the expressions in (4) and (7) altogether imply that $\left\lVert G_k(T)-G_k(\overline{T})O\right\rVert$ is small, which is an indication that the rows of $G_k(T)$ approximately preserve the clustering structure encoded in $\Theta$.

The above discussion is summarized in the following theorem, which is our first main result for SPONGEsym in the moderately dense regime.

Theorem 3.1 (Restating Theorem 4.13; Eigenspace alignment of SPONGEsym in the dense case).

Assuming $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}$, suppose that $\tau^+>0,\tau^-\geqslant 0$ are chosen to satisfy

$$\tau^+>\frac{16\eta}{\beta s(1-2\eta)},\qquad\tau^-<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^+}{8}\right\}$$

where $\beta,\eta$ satisfy one of the following conditions

  1. $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and $0<\eta<\frac{1}{2}$, or

  2. $\beta=\frac{1}{2}$ and $\eta\leqslant\frac{s}{2s+4}$.

Then $V_k(\overline{T})=\Theta R$ and $G_k(\overline{T})=\Theta(C^-)^{-1/2}R$, where $R$ is a rotation matrix, and $C^-\succ 0$ is as defined in (18). Moreover, for any $\varepsilon,\delta\in(0,1)$, there exists a constant $\widetilde{c}_\varepsilon>0$ such that the following is true. If $p$ satisfies

$$p\geqslant\max\left\{\widetilde{c}_\varepsilon C_2(s,\eta,l),\ \frac{256\,C_1^4(\tau^+,\tau^-)(2+\tau^+)^4}{\delta^4(1+\tau^-)^4(1-\beta)^4}C_2(s,\eta,l),\ \frac{81}{(1-l)\delta^4}\right\}\frac{\ln(4n/\varepsilon)}{n}$$

with $C_1(\cdot),C_2(\cdot)$ as in (45), then with probability at least $1-2\varepsilon$, there exists an orthogonal matrix $O\in\mathbb{R}^{k\times k}$ such that

$$\left\lVert V_k(T)-V_k(\overline{T})O\right\rVert\leqslant\delta,\qquad\text{and}\qquad\left\lVert G_k(T)-G_k(\overline{T})O\right\rVert\leqslant\frac{\delta}{\sqrt{\tau^+}}+\frac{\delta}{(\tau^+)^2}.$$

Let us now interpret the scaling of the terms $n,p,\tau^+$ and $\tau^-$ in Theorem 3.1, and provide some intuition.

  1. In general, when no assumption is made on the noise level $\eta$, we have $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and the requirement on $n$ is $n\gtrsim\max\left\{\frac{1}{s(1-2\eta)},\frac{\eta}{1-l}\right\}$. Then a sufficient set of conditions on $\tau^+>0,\tau^-\geqslant 0$ is

    $$\tau^+\gtrsim 1+\frac{\eta}{s(1-2\eta)},\qquad\tau^-\lesssim\frac{\eta}{s(1-2\eta)+2\eta}. \tag{8}$$

    Moreover, we see from (45) that $C_1(\tau^+,\tau^-)\lesssim 1/\tau^+$, and thus $\frac{(2+\tau^+)C_1(\tau^+,\tau^-)}{1+\tau^-}\lesssim 1$. Hence, a sufficient condition on $p$ is

    $$p\gtrsim\frac{1}{\delta^4}\left(1+\frac{\eta}{s(1-2\eta)}\right)^4C_2(s,\eta,l)\frac{\ln n}{n}.$$

  2. In the “low-noise” regime where $\eta\leqslant\frac{s}{2s+4}$, the condition on $\tau^-$ in (8) becomes restrictive, especially as $\eta\rightarrow 0$. In this regime, the second condition in Theorem 3.1 allows for a wider range of values for $\tau^-$; in particular, the following set of conditions suffices

    $$\tau^+\gtrsim 1,\qquad\tau^-\lesssim\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}.$$

    Moreover, we then obtain that the condition $p\gtrsim\frac{1}{\delta^4}C_2(s,\eta,l)\frac{\ln n}{n}$ is sufficient.

  3. When $\tau^+\rightarrow\infty$, then $\left\lVert G_k(T)-G_k(\overline{T})O\right\rVert\rightarrow 0$, which might lead one to believe that the clustering performance improves accordingly. This is not the case however, since when $\tau^+$ is large, then $G_k(T)\approx\frac{1}{\sqrt{\tau^+}}V_k(T)$ and $G_k(\overline{T})\approx\frac{1}{\sqrt{\tau^+}}V_k(\overline{T})$, which means that clustering the rows of $G_k(T)$ (resp. $G_k(\overline{T})$) is roughly equivalent to clustering the rows of $V_k(T)$ (resp. $V_k(\overline{T})$). Moreover, note that for large $\tau^+$, we have $T\approx\frac{1}{\tau^+}(L^+_{sym}+\tau^-I)$ and $\overline{T}\approx\frac{1}{\tau^+}(\overline{L^+_{sym}}+\tau^-I)$, and thus the negative subgraph has no effect on the clustering performance.
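To make the explicit (non-asymptotic) conditions of Theorem 3.1 concrete, the following sketch evaluates the admissible ranges for a given choice of cluster fractions and noise level. The function name `theorem31_parameter_ranges` and the returned dictionary keys are hypothetical; the formulas are copied from the statement of the theorem, and the condition on $p$ is not included because it involves constants defined elsewhere.

```python
def theorem31_parameter_ranges(s, l, eta, tau_pos, low_noise=False):
    """Evaluate beta, the lower bound on tau^+, the upper bound on tau^-
    (for the supplied tau^+) and the minimal n from Theorem 3.1."""
    if low_noise:
        if eta > s / (2 * s + 4):
            raise ValueError("low-noise case requires eta <= s / (2s + 4)")
        beta = 0.5
    else:
        if not 0 < eta < 0.5:
            raise ValueError("general case requires 0 < eta < 1/2")
        beta = 4 * eta / (s * (1 - 2 * eta) + 4 * eta)
    signal = s * (1 - 2 * eta)
    tau_pos_lower = 16 * eta / (beta * signal)
    tau_neg_upper = (beta / 2) * (signal / (signal + 2 * eta)) \
        * min(1 / (4 * (1 - beta)), tau_pos / 8)
    n_min = max(2 * (1 - eta) / signal, 2 * eta / ((1 - l) * (1 - eta)))
    return {"beta": beta, "tau_pos_lower": tau_pos_lower,
            "tau_neg_upper": tau_neg_upper, "n_min": n_min}

# Two equal-sized clusters (s = l = 1/2) with 10% sign noise, and tau^+ = 16.
ranges = theorem31_parameter_ranges(s=0.5, l=0.5, eta=0.1, tau_pos=16.0)
```

For this example the general case gives $\beta=1/2$, the requirement $\tau^+>8$, and, for $\tau^+=16$, the requirement $\tau^-<1/12$, with $n\geqslant 4.5$. In the general case the lower bound on $\tau^+$ simplifies to $4+\frac{16\eta}{s(1-2\eta)}$, in line with the scaling in (8).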

SPONGEsym in the sparse regime.

Notice that the above theorem required the sparsity parameter $p=\Omega(\ln n/n)$, when $n$ is large enough. This condition on $p$ is essentially required to show concentration bounds on $\left\lVert L^{\pm}_{sym}-\overline{L^{\pm}_{sym}}\right\rVert$ in Lemma 4.11, which in turn implies a concentration bound on $\left\lVert T-\overline{T}\right\rVert$ (see Lemma 4.12). However, in the sparse regime $p$ is of the order $o(\ln n)/n$, and thus Lemma 4.11 does not apply in this setting. In fact, it is not difficult to see that the matrices $L^{\pm}_{sym}$ will not concentrate around $\overline{L^{\pm}_{sym}}$ in the sparse regime (see e.g. [LLV17]). On the other hand, by relying on a recent result in [LLV17, Theorem 4.1] on the concentration of the normalized Laplacian of regularized adjacency matrices of inhomogeneous Erdős-Rényi graphs in the sparse regime (see Theorem 4.15), we show concentration bounds on $\left\lVert L^+_{sym,\gamma^+}-\overline{L^+_{sym}}\right\rVert$ and $\left\lVert L^-_{sym,\gamma^-}-\overline{L^-_{sym}}\right\rVert$, which hold when $p\gtrsim 1/n$ and $\gamma^+,\gamma^-\asymp(np)^{6/7}$ (see Lemma 4.16). As before, these concentration bounds can then be shown to imply a concentration bound on $\left\lVert T_{\gamma^+,\gamma^-}-\overline{T}\right\rVert$ (see Lemma 4.17). Other than these technical differences, the remainder of the arguments follows the same structure as in the proof of Theorem 3.1, thus leading to the following result in the sparse regime.

Theorem 3.2 (Restating Theorem 4.18).

Assuming $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-\eta)(1-l)}\right\}$, suppose $\tau^+>0,\tau^-\geqslant 0$ are chosen to satisfy

$$\tau^+>\frac{16\eta}{\beta s(1-2\eta)},\qquad\tau^-<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^+}{8}\right\}$$

where $\beta,\eta$ satisfy one of the following conditions

  1. $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and $0<\eta<\frac{1}{2}$, or

  2. $\beta=\frac{1}{2}$ and $\eta\leqslant\frac{s}{2s+4}$.

Then $V_k(\overline{T})=\Theta R$ and $G_k(\overline{T})=\Theta(C^-)^{-1/2}R$, where $R$ is a rotation matrix, and $C^-\succ 0$ is as defined in (18). Moreover, there exists a constant $C>0$ such that for $r\geqslant 1$ and $\delta\in(0,1)$, if $p$ satisfies

$$p\geqslant\max\left\{1,\left(\frac{4C_1(\tau^+,\tau^-)(2+\tau^+)}{3(\tau^+)^2(1-\beta)(1+\tau^-)}\right)^{28}\right\}\frac{C_4^{14}(r,s,\eta,l)}{\delta^{28}(1-\eta)n},$$

and $\gamma^+,\gamma^-=[np(1-\eta)]^{6/7}$, then with probability at least $1-2e^{-r}$, there exists a rotation $O\in\mathbb{R}^{k\times k}$ so that

$$\left\lVert V_k(T_{\gamma^+,\gamma^-})-V_k(\overline{T})O\right\rVert\leqslant\delta,\qquad\text{and}\qquad\left\lVert G_k(T_{\gamma^+,\gamma^-})-G_k(\overline{T})O\right\rVert\leqslant\frac{\delta}{\sqrt{\tau^+}}+\frac{\delta}{(\tau^+)^2}.$$

Here, $C_4(r,s,\eta,l):=2^{5/2}Cr^2+3\sqrt{2C_2(s,\eta,l)}$, with $C_2(s,\eta,l)$ as defined in (45).

The following remarks are in order.

  1. It is clear that $\gamma^+,\gamma^-$ can neither be too small (since this would imply lack of concentration), nor too large (since this would destroy the latent geometries of $G^+,G^-$). The choice $\gamma^+,\gamma^-\asymp(np)^{6/7}$ provides a trade-off, and leads to the bounds $\left\lVert L^+_{sym,\gamma^+}-\overline{L^+_{sym}}\right\rVert,\ \left\lVert L^-_{sym,\gamma^-}-\overline{L^-_{sym}}\right\rVert=O((np)^{-1/14})$ when $p\gtrsim 1/n$ (see Lemma 4.16).

  2. In general, for $\eta\in(0,1/2)$, it suffices that $\tau^+,\tau^-$ satisfy (8) and $n\gtrsim\max\left\{\frac{1}{s(1-2\eta)},\frac{\eta}{1-l}\right\}$. As discussed earlier, $\frac{(2+\tau^+)C_1(\tau^+,\tau^-)}{1+\tau^-}\lesssim 1$, and hence it suffices that $p\gtrsim\frac{C_4^{14}(r,s,\eta,l)}{\delta^{28}n}$.

Mis-clustering error bounds.

Thus far, our analysis has shown that under suitable conditions on $n,p,\tau^+$ and $\tau^-$, the matrix $G_k(T)$ (or $G_k(T_{\gamma^+,\gamma^-})$ in the sparse regime) is close to $G_k(\overline{T})O$ for some rotation $O$, with the rows of $G_k(\overline{T})$ preserving the ground truth clustering. This suggests that by applying the $k$-means clustering algorithm on the rows of $G_k(T)$ (or $G_k(T_{\gamma^+,\gamma^-})$) one should be able to approximately recover the underlying communities. However, the $k$-means problem for clustering points in $\mathbb{R}^d$ is known to be NP-Hard in general, even for $k=2$ or $d=2$ [ADHP09, Das08, MNV12]. On the other hand, there exist efficient $(1+\xi)$-approximation algorithms (for $\xi>0$), such as the algorithm of Kumar et al. [KSS04], which has a running time of $O(2^{(k/\xi)^{O(1)}}nd)$.

Using standard tools [LR15, Lemma 5.1], we can bound the mis-clustering error when a $(1+\xi)$-approximate $k$-means algorithm is applied on the rows of $G_k(T)$ (or $G_k(T_{\gamma^+,\gamma^-})$), provided the estimation error bound $\delta$ is small enough. In the following theorem, the sets $S_i$, $i=1,\ldots,k$, contain those vertices in $C_i$ for which we cannot guarantee correct clustering.

Theorem 3.3 (Restating Theorem 4.20).

Under the notation and assumptions of Theorem 3.1, let $(\tilde{\Theta},\tilde{X})\in\mathbb{M}_{n,k}\times\mathbb{R}^{k\times k}$ be a $(1+\xi)$-approximate solution to the $k$-means problem $\min_{\Theta\in\mathbb{M}_{n,k},\,X\in\mathbb{R}^{k\times k}}\left\lVert\Theta X-G_k(T)\right\rVert_F^2$. Denoting

$$S_i=\left\{j\in C_i\ :\ \left\lVert(\tilde{\Theta}\tilde{X})_{j*}-(\Theta(C^-)^{-1/2}RO)_{j*}\right\rVert\geqslant\frac{1}{2\sqrt{n_i\left(\tau^++\frac{2}{1-l}\right)}}\right\}$$

it holds with probability at least $1-2\varepsilon$ that

$$\sum_{i=1}^{k}\frac{\left\lvert S_i\right\rvert}{n_i}\leqslant\delta^2(64+32\xi)k\left(\tau^++\frac{2}{1-l}\right)\left(\frac{(\tau^+)^3+1}{(\tau^+)^4}\right). \tag{9}$$

In particular, if $\delta$ satisfies

$$\delta<\frac{(\tau^+)^2}{\sqrt{(64+32\xi)k\left(\tau^++\frac{2}{1-l}\right)\left((\tau^+)^3+1\right)}}$$

then there exists a $k\times k$ permutation matrix $\pi$ such that $\tilde{\Theta}_G=\hat{\Theta}_G\pi$, where $G=\cup_{i=1}^{k}(C_i\setminus S_i)$.

In the sparse regime, the above statement holds under the notation and assumptions of Theorem 3.2 with $G_k(T)$ replaced with $G_k(T_{\gamma^+,\gamma^-})$, and with probability at least $1-2e^{-r}$.

We remark that when $\tau^+\rightarrow\infty$, the bound on $\delta$ becomes independent of $\tau^+$ and is of the form $\delta\lesssim\frac{1}{\sqrt{k}}$. This is also true for the mis-clustering bound in (9), which is of the form $\sum_{i=1}^{k}\frac{\left\lvert S_i\right\rvert}{n_i}\lesssim\delta^2k$.
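The two quantities in Theorem 3.3 are easy to evaluate numerically. In the sketch below, the function names `misclustering_bound` and `delta_threshold` are hypothetical; the formulas are the right-hand side of (9) and the threshold on $\delta$, respectively. The threshold is precisely the value of $\delta$ at which the right-hand side of (9) equals $1$.

```python
import math

def misclustering_bound(delta, k, tau_pos, l, xi):
    """Right-hand side of (9): upper bound on sum_i |S_i| / n_i."""
    c = (64 + 32 * xi) * k * (tau_pos + 2 / (1 - l))
    return delta**2 * c * (tau_pos**3 + 1) / tau_pos**4

def delta_threshold(k, tau_pos, l, xi):
    """Largest delta (exclusive) allowed by the last condition of Theorem 3.3."""
    c = (64 + 32 * xi) * k * (tau_pos + 2 / (1 - l))
    return tau_pos**2 / math.sqrt(c * (tau_pos**3 + 1))
```

For very large values of the positive regularization parameter, `delta_threshold` approaches $1/\sqrt{(64+32\xi)k}$, which is the $\delta\lesssim 1/\sqrt{k}$ scaling noted above.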

3.2 Symmetric Signed Laplacian

We now describe our results for the symmetric Signed Laplacian. We recall that $\mathbb{E}[A]=\mathbb{E}[A^+]-\mathbb{E}[A^-]$ and $\mathbb{E}[\overline{D}]$ denote the adjacency and degree matrices of the expected graph, under the SSBM ensemble. We define

$$\mathcal{L}_{sym}=I_n-(\mathbb{E}[\overline{D}])^{-1/2}\,\mathbb{E}[A]\,(\mathbb{E}[\overline{D}])^{-1/2}, \tag{10}$$

to be the normalized Signed Laplacian of the expected graph. Moreover, $\rho=\frac{s}{l}\leqslant 1$ denotes the aspect ratio, measuring the discrepancy between the smallest and largest cluster sizes in the SSBM.

We will first show that for $\rho$ large enough, the smallest $k-1$ eigenvectors of $\mathcal{L}_{sym}$, denoted by $V_{k-1}(\mathcal{L}_{sym})$, are given by $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$, with $R_{k-1}\in\mathbb{R}^{k\times(k-1)}$ a matrix whose columns are the $k-1$ smallest eigenvectors of a $k\times k$ matrix $\overline{C}$ defined in Lemma 5.1. We will then prove that the rows of $V_{k-1}(\mathcal{L}_{sym})$ impart the same clustering structure as that of $\Theta$. The remaining arguments revolve around deriving concentration bounds on $\left\lVert\overline{L_{sym}}-\mathcal{L}_{sym}\right\rVert$, which imply, for $n,p$ and $\rho$ large enough, that the distance between the column spans of $V_{k-1}(\overline{L_{sym}})$ and $V_{k-1}(\mathcal{L}_{sym})$ is small, i.e. there exists a unitary matrix $O$ such that $\left\lVert V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})O\right\rVert$ is small. Altogether, this allows us to conclude that the rows of $V_{k-1}(\overline{L_{sym}})$ approximately encode the clustering structure of $\Theta$. The above discussion is summarized in the following theorem, which is our first main result for the symmetric Signed Laplacian, in the moderately dense regime.

Theorem 3.4 (Eigenspace alignment in the dense case).

Assuming $\eta\in[0,1/2)$, $k\geqslant 2$, $n\geqslant 10$, suppose the aspect ratio satisfies

$$\sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})}, \tag{11}$$

and suppose that, for $\delta\in(0,\frac{1}{2})$, it holds true that

$$p>C(k,\eta,\delta)\frac{\ln n}{n}\qquad\text{with}\quad C(k,\eta,\delta)=\left(\frac{2Ck}{\delta(1-2\eta)}\right)^2\quad\text{and}\quad C<43. \tag{12}$$

Then there exists a universal constant $c>0$ such that, with probability at least $1-\frac{2}{n}-n\exp\left(\frac{-np}{c}\right)$, there exists an orthogonal matrix $O\in\mathbb{R}^{(k-1)\times(k-1)}$ such that

$$\left\lVert V_{k-1}(\overline{L_{sym}})-\Theta R_{k-1}O\right\rVert\leqslant 2\delta,$$

where $R_{k-1}\in\mathbb{R}^{k\times(k-1)}$ is a matrix whose columns are the $(k-1)$ smallest eigenvectors of the matrix $\overline{C}$ defined in Lemma 5.1.

Remark 3.5.

(Related work) As previously explained, for the special case where $k=2$ and with equal-size clusters, a similar result was proved in [CDGT19, Theorem 3]. Under a different SSBM model, the Signed Laplacian clustering algorithm was analyzed by Mercado et al. [MTH19] for general $k$. Although their generative model is more general than our SSBM, their results on the symmetric Signed Laplacian do not apply here. More precisely, one assumption of [MTH19, Theorem 3] translates into our model as $p(k-2)(1-2\eta)<0$, which does not hold for $\eta<\frac{1}{2}$ and $k\geqslant 2$.

Remark 3.6.

(Assumptions) The condition on the aspect ratio (11) is essential to apply a perturbation technique, where the reference is the setting with equal-size clusters, i.e. $n_i=\frac{n}{k}$ for all $i\in[k]$ (see Lemma 5.3). In the sparsity condition (12), we note that the constant $C(k,\eta,\delta)$ scales quadratically with the number of classes $k$ and as $\delta^{-2}$, with $\delta>0$ the error on the eigenspace. However, we conjecture that the aspect-ratio assumption (11) is only an artefact of the proof technique, and that the result could hold for more general graphs with very unbalanced cluster sizes.
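To get a feel for how demanding conditions (11) and (12) are, the sketch below evaluates the smallest admissible aspect ratio and the constant in the density condition. The function names `min_aspect_ratio` and `density_constant` are hypothetical, and since the theorem only states that the absolute constant $C$ is smaller than $43$, it is left as an explicit argument here.

```python
import math

def min_aspect_ratio(k):
    """Lower bound on rho = s / l implied by (11)."""
    return (1.0 - 1.0 / (4 * k * (2 + math.sqrt(k)))) ** 2

def density_constant(k, eta, delta, C):
    """C(k, eta, delta) from (12); C is the unspecified absolute constant."""
    return (2 * C * k / (delta * (1 - 2 * eta))) ** 2
```

For $k=2$, condition (11) already requires $\rho$ to exceed roughly $0.928$, and the bound moves closer to $1$ as $k$ grows, so the theorem covers nearly balanced clusters only.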

Regularized Signed Laplacian.

We now consider the sparse regime $p=o(\ln n)/n$ and show that we can recover the ground-truth clustering structure up to some small error using the regularized Signed Laplacian $L_\gamma$, provided that $n$, $p$ and $\rho$ are large enough, and that the regularization parameters $\gamma^+,\gamma^-$ are well-chosen. We denote by $\mathcal{L}_\gamma$ the equivalent of the regularized Laplacian for the expected graph in our SSBM, i.e.

$$\mathcal{L}_\gamma=I-(\mathbb{E}[\overline{D}_\gamma])^{-1/2}\,\mathbb{E}[A_\gamma]\,(\mathbb{E}[\overline{D}_\gamma])^{-1/2},$$

with $\mathbb{E}[A_\gamma]$, resp. $\mathbb{E}[\overline{D}_\gamma]$, denoting the adjacency matrix, resp. the degree matrix, of the expected regularized graph. The next theorem is an intermediate result, which provides a high probability bound on $\left\lVert L_\gamma-\mathcal{L}_\gamma\right\rVert$ and $\left\lVert L_\gamma-\mathcal{L}_{sym}\right\rVert$.

Theorem 3.7.

(Error bound for the regularized Signed Laplacian) Assuming $\eta\in[0,1/2)$, $k\geqslant 2$, and regularization parameters $\gamma^{+},\gamma^{-}\geqslant 0$ with $\gamma:=\gamma^{+}+\gamma^{-}$, it holds true that for any $r\geqslant 1$, with probability at least $1-7e^{-2r}$, we have

$$\|L_{\gamma}-\mathcal{L}_{\gamma}\|\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}}, \tag{13}$$

with $C>1$ an absolute constant. Moreover, it also holds true that

$$\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}}+\frac{\gamma}{\overline{d}+\gamma}. \tag{14}$$

In particular, for the choice $\gamma=\overline{d}^{7/8}$, if $p\geqslant 2/n$, we obtain

$$\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\left(128Cr^{2}+1\right)\overline{d}^{-\frac{1}{8}}.$$
Remark 3.8.

The above theorem shows the concentration of our regularized Laplacian $L_{\gamma}$ towards the regularized Laplacian $\mathcal{L}_{\gamma}$ (13) and the Signed Laplacian $\mathcal{L}_{sym}$ (14) of the expected graph. More precisely, if for some well-chosen parameters $\gamma^{+},\gamma^{-}\geqslant 0$ these upper bounds are small, e.g., $\|L_{\gamma}-\mathcal{L}_{sym}\|\ll 1$, then we have $\|L_{\gamma}-\mathcal{L}_{sym}\|\ll\lVert\mathcal{L}_{sym}\rVert$ since $\lVert\mathcal{L}_{sym}\rVert=2$ (see Appendix E).

Using this concentration bound, we can show that the eigenspaces $V_{k-1}(L_{\gamma})$ and $V_{k-1}(\mathcal{L}_{sym})$ are “close”, provided that $p=\Omega(1/n)$, $\rho$ is close enough to 1, and $\gamma$ is well chosen. This is stated in the next theorem.

Theorem 3.9 (Eigenspace alignment in the sparse case).

Assuming $\eta\in[0,1/2)$, $k\geqslant 2$, and $n\geqslant 10$, suppose that (11) holds true, and that for $\delta\in(0,\frac{1}{2})$ and $r\geqslant 1$, the sparsity $p$ satisfies

$$p>\left(\frac{2kC_{4}}{\delta(1-2\eta)}\right)^{8}\frac{2}{n}\qquad\text{with}\quad C_{4}=128Cr^{2}+1, \tag{15}$$

where $C>1$ is the constant defined in (13). If the regularization parameters $\gamma^{+},\gamma^{-}\geqslant 0$ are chosen so that $\gamma=\overline{d}^{7/8}$, then with probability at least $1-7e^{-2r}-\frac{2}{n}-ne^{-np/c}$, there exists an orthogonal matrix $O\in\mathbb{R}^{(k-1)\times(k-1)}$ so that

$$\|V_{k-1}(L_{\gamma})-\Theta R_{k-1}O\|\leqslant 2\delta.$$
Remark 3.10.

In the sparse setting, the constant in front of the factor $\frac{1}{n}$ in the sparsity condition (15) scales as $\left(\frac{k}{\delta}\right)^{8}$. However, for fixed $k$, condition (15) is satisfied for $n$ large enough whenever $p=\omega(1/n)$ as $n\to\infty$.

Remark 3.11.

In practice, one can choose the regularization parameters by first estimating the sparsity parameter $p$, e.g., from the fraction of connected pairs of nodes

$$\hat{p}=\frac{2}{n(n-1)}\sum_{i<j}|A_{ij}|,$$

then choosing $\gamma\geqslant 0$ so that $\gamma=(\hat{p}(n-1))^{7/8}$. However, from this analysis, it is not clear how one would suitably choose $\gamma^{+}$ and $\gamma^{-}$.
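As a rough illustration of the recipe in this remark, the following Python sketch estimates the sparsity from a signed adjacency matrix and returns the corresponding total regularization level $(\hat{p}(n-1))^{7/8}$. It assumes NumPy and a dense symmetric matrix with entries in $\{-1,0,+1\}$ and zero diagonal; the function names are hypothetical and are not taken from any SPONGE implementation. How to split the result between $\gamma^{+}$ and $\gamma^{-}$ is left open, as in the text.

```python
import numpy as np


def estimate_sparsity(A):
    """Fraction of connected node pairs in a signed adjacency matrix A.

    Assumption: A is symmetric with entries in {-1, 0, +1} and zero diagonal.
    """
    A = np.asarray(A)
    n = A.shape[0]
    # Sum |A_ij| over pairs i < j only, so each edge is counted once.
    edges = np.triu(np.abs(A), k=1).sum()
    return 2.0 * edges / (n * (n - 1))


def regularization_level(A):
    """Total regularization gamma = (p_hat * (n - 1)) ** (7 / 8)."""
    n = np.asarray(A).shape[0]
    p_hat = estimate_sparsity(A)
    return (p_hat * (n - 1)) ** (7.0 / 8.0)


# Toy signed graph on 4 nodes with 3 edges out of 6 possible pairs.
A_toy = np.array([
    [0, 1, -1, 0],
    [1, 0, 0, 0],
    [-1, 0, 0, 1],
    [0, 0, 1, 0],
])
p_hat_toy = estimate_sparsity(A_toy)
gamma_toy = regularization_level(A_toy)
```

For large sparse graphs one would likely replace the dense array by a sparse matrix type; the estimator itself is unchanged.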

Mis-clustering error bounds.

Since $V_{k-1}(\overline{L_{sym}})$ and $V_{k-1}(L_{\gamma})$ are “close” to $V_{k-1}(\mathcal{L}_{sym})$, we recover the ground-truth clustering structure up to some error, which we quantify in the following theorem. It bounds the mis-clustering rate when a $(1+\xi)$-approximate $k$-means algorithm is applied to the rows of $V_{k-1}(\overline{L_{sym}})$ (resp. $V_{k-1}(L_{\gamma})$).

Theorem 3.12.

(Number of mis-clustered nodes) Let $\xi>0$ and $\delta\in\left(0,\sqrt{\frac{1}{12(16+8\xi)(k-1)}}\right)$, and suppose that $\rho$ and $p$ satisfy the assumptions of Theorem 3.4 (resp. Theorem 3.9 with $r\geqslant 1$). Let $(\tilde{\Theta},\tilde{R}_{k-1})$ be a $(1+\xi)$-approximate solution of the $k$-means problem

$$\min_{\Theta\in\mathbb{M}_{n,k},\,R\in\mathbb{R}^{k\times(k-1)}}\left\lVert\Theta R-V_{k-1}(\overline{L_{sym}})\right\rVert_{F}\qquad\text{(resp. }\min_{\Theta\in\mathbb{M}_{n,k},\,R\in\mathbb{R}^{k\times(k-1)}}\left\lVert\Theta R-V_{k-1}(L_{\gamma})\right\rVert_{F}\text{)}.$$

Let $S_{i}=\left\{j\in C_{i}:\ \left\lVert(\tilde{\Theta}\tilde{R}_{k-1})_{j*}-(\Theta R_{k-1}O)_{j*}\right\rVert^{2}\geqslant\frac{2}{3n_{i}}\right\}$ and $\tilde{V}=\cup_{i=1}^{k}C_{i}\backslash S_{i}$. Then with probability at least $1-\frac{2}{n}-n\exp\left(\frac{-np}{c}\right)$ (resp. $1-7e^{-2r}-\frac{2}{n}-ne^{-np/c}$), there exists a permutation matrix $\pi\in\mathbb{R}^{k\times k}$ such that $\tilde{\Theta}_{\tilde{V}*}=\hat{\Theta}_{\tilde{V}*}\pi$ and

$$\sum_{i=1}^{k}\frac{|S_{i}|}{n_{i}}\leqslant 96(2+\xi)(k-1)\delta^{2}.$$

In particular, the set of mis-clustered nodes is a subset of $\cup_{i=1}^{k}S_{i}$.
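To get a feel for the scale of this bound, the short Python sketch below evaluates the admissible range of $\delta$ and the right-hand side $96(2+\xi)(k-1)\delta^{2}$. The helper names are illustrative assumptions, not part of the paper. At the upper end of the admissible range the right-hand side equals 1, so the bound only becomes informative for smaller $\delta$.

```python
import math


def max_delta(xi, k):
    """Upper end of the admissible range for delta in Theorem 3.12."""
    return math.sqrt(1.0 / (12.0 * (16.0 + 8.0 * xi) * (k - 1)))


def misclustering_bound(xi, k, delta):
    """Right-hand side 96 (2 + xi) (k - 1) delta^2 of the theorem."""
    if not (0.0 < delta < max_delta(xi, k)):
        raise ValueError("delta is outside the admissible range")
    return 96.0 * (2.0 + xi) * (k - 1) * delta ** 2


# Example: xi = 0.5, k = 4, and delta at half of its largest admissible value.
xi_example, k_example = 0.5, 4
delta_example = 0.5 * max_delta(xi_example, k_example)
bound_example = misclustering_bound(xi_example, k_example, delta_example)
```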

4 Analysis of SPONGE Symmetric

This section contains the proof of our main results for SPONGEsym, divided over the following subsections. Section 4.1 describes the eigen-decomposition of the matrix $\overline{T}$, thus revealing that a subset of its eigenvectors contains relevant information about $\Theta$. Section 4.2 provides conditions on $\tau^{+},\tau^{-}$ which ensure that $V_{k}(\overline{T})=\Theta R$ (for some rotation matrix $R$), along with a lower bound on the eigengap $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})$. Section 4.3 then derives concentration bounds on $\lVert T-\overline{T}\rVert$ using standard tools from the random matrix literature. These results are combined in Section 4.4 to derive error bounds for estimating $V_{k}(\overline{T})$ and $G_{k}(\overline{T})$ up to a rotation (using the Davis-Kahan theorem). The results summarized thus far pertain to the “dense” regime, where we require $p=\Omega(\ln n/n)$ when $n$ is large. Section 4.5 extends these results to the sparse regime where $p=o(\ln n)/n$, for the regularized version of SPONGEsym. Finally, we conclude in Section 4.6 by translating our results from Sections 4.4 and 4.5 into mis-clustering error bounds for a $(1+\xi)$-approximate $k$-means algorithm, by leveraging previous tools from the literature [LR15].

4.1 Eigen-decomposition of $\overline{T}$

The following lemma shows that a subset of the eigenvectors of $\overline{T}$ indeed contains information about $\Theta$, i.e., the ground-truth clustering.

Lemma 4.1 (Spectrum of $\overline{T}$).

Let $d_{i}^{+}=p\left(n(s_{i}(1-2\eta)+\eta)-(1-\eta)\right)$ and $d_{i}^{-}=p\left(n(-s_{i}(1-2\eta)+(1-\eta))-\eta\right)$ denote the expected positive and negative degrees of a node in cluster $C_{i}$, $i\in[k]$. Let $u^{+}=\left(\sqrt{\frac{n_{1}}{d_{1}^{+}}},\ldots,\sqrt{\frac{n_{k}}{d_{k}^{+}}}\right)^{\top}$, $u^{-}=\left(\sqrt{\frac{n_{1}}{d_{1}^{-}}},\ldots,\sqrt{\frac{n_{k}}{d_{k}^{-}}}\right)^{\top}$, $\alpha_{i}^{+}=1+\tau^{-}+p(1-\eta)/d_{i}^{+}$, and $\alpha_{i}^{-}=1+\tau^{+}+p\eta/d_{i}^{-}$, for $i\in[k]$, for some $\tau^{+}>0,\tau^{-}\geqslant 0$. Let the columns of $V^{\perp}$ contain eigenvectors of $\mathbb{E}[D^{+}]$ which are orthogonal to the column span of $\Theta$. It holds true that

$$\overline{T}=\begin{bmatrix}\Theta R&V^{\perp}\end{bmatrix}\begin{bmatrix}\Lambda\\ &\frac{\alpha_{1}^{+}}{\alpha_{1}^{-}}I_{n_{1}-1}\\ &&\ddots\\ &&&\frac{\alpha_{k}^{+}}{\alpha_{k}^{-}}I_{n_{k}-1}\end{bmatrix}\begin{bmatrix}(\Theta R)^{\top}\\ {V^{\perp}}^{\top}\end{bmatrix}, \tag{16}$$

where $R$ is a $k\times k$ rotation matrix, and $\Lambda$ is a diagonal matrix such that $(C^{-})^{-1/2}\,C^{+}\,(C^{-})^{-1/2}=R\Lambda R^{\top}$, where

$$C^{+}=-p\eta\,u^{+}(u^{+})^{\top}+\mathrm{diag}\left(1+\tau^{-}+\frac{p}{d_{i}^{+}}\left(1-\eta-n_{i}(1-2\eta)\right)\right), \tag{17}$$

$$C^{-}=-p(1-\eta)\,u^{-}(u^{-})^{\top}+\mathrm{diag}\left(1+\tau^{+}+\frac{p}{d_{i}^{-}}\left(\eta+n_{i}(1-2\eta)\right)\right). \tag{18}$$
Proof.

We first consider the spectrum of $D^{+},D^{-},A^{+},A^{-}$, followed by that of $(\overline{L^{+}_{sym}}+\tau^{-}I)$ and $(\overline{L^{-}_{sym}}+\tau^{+}I)$, which altogether will reveal the spectral decomposition of $\overline{T}$.

$\bullet$ Analysis in expectation of the spectra of $D^{+},D^{-},A^{+},A^{-}$. Without loss of generality, we may assume that cluster $C_{1}$ contains the first $n_{1}$ vertices, cluster $C_{2}$ the next $n_{2}$ vertices, and similarly for the remaining clusters. Note that $\mathbb{E}[D^{\pm}]=\mathrm{diag}\left(d_{1}^{\pm}I_{n_{1}},\ldots,d_{k}^{\pm}I_{n_{k}}\right)$, where for $i\in[k]$, straightforward calculations reveal that $d_{i}^{+}=p\left(n(s_{i}(1-2\eta)+\eta)-(1-\eta)\right)$, and $d_{i}^{-}=p\left(n(-s_{i}(1-2\eta)+(1-\eta))-\eta\right)$. One can rewrite the matrices $(\mathbb{E}[D^{\pm}])^{-1}$ in the more convenient form

$$(\mathbb{E}[D^{\pm}])^{-1}=[\Theta~V^{\perp}]~\mathrm{diag}\left(\frac{1}{d_{1}^{\pm}},\ldots,\frac{1}{d_{k}^{\pm}},\frac{1}{d_{1}^{\pm}}I_{n_{1}-1},\ldots,\frac{1}{d_{k}^{\pm}}I_{n_{k}-1}\right)~[\Theta~V^{\perp}]^{\top} \tag{19}$$

since the column vectors of $\Theta$ are eigenvectors of $(\mathbb{E}[D^{\pm}])^{-1}$, and the eigenvalues of $(\mathbb{E}[D^{\pm}])^{-1}$ are apparent because $\mathbb{E}[D^{\pm}]$ is a diagonal matrix. Note that (19) is true in general, and does not make any assumption on the placement of the vertices into their respective clusters $C_{i}$. Furthermore, one can verify that $\mathbb{E}[A^{+}]$ admits the decomposition

$$\mathbb{E}[A^{+}]=\Theta_{n\times k}\begin{bmatrix}n_{1}p(1-\eta)&\sqrt{n_{1}n_{2}}\,p\eta&\ldots&\sqrt{n_{1}n_{k}}\,p\eta\\ \sqrt{n_{2}n_{1}}\,p\eta&n_{2}p(1-\eta)&\ldots&\sqrt{n_{2}n_{k}}\,p\eta\\ \vdots&\vdots&\ddots&\vdots\\ \sqrt{n_{k}n_{1}}\,p\eta&\sqrt{n_{k}n_{2}}\,p\eta&\ldots&n_{k}p(1-\eta)\end{bmatrix}_{k\times k}\Theta^{\top}_{k\times n}-p(1-\eta)I_{n\times n} \tag{20}$$

and similarly, $\mathbb{E}[A^{-}]$ can be decomposed as

$$\mathbb{E}[A^{-}]=\Theta_{n\times k}\begin{bmatrix}n_{1}p\eta&\sqrt{n_{1}n_{2}}\,p(1-\eta)&\ldots&\sqrt{n_{1}n_{k}}\,p(1-\eta)\\ \sqrt{n_{2}n_{1}}\,p(1-\eta)&n_{2}p\eta&\ldots&\sqrt{n_{2}n_{k}}\,p(1-\eta)\\ \vdots&\vdots&\ddots&\vdots\\ \sqrt{n_{k}n_{1}}\,p(1-\eta)&\sqrt{n_{k}n_{2}}\,p(1-\eta)&\ldots&n_{k}p\eta\end{bmatrix}_{k\times k}\Theta^{\top}_{k\times n}-p\eta I_{n\times n}\,.$$

$\bullet$ Analysis of the spectra of $(\overline{L^{+}_{sym}}+\tau^{-}I)$ and $(\overline{L^{-}_{sym}}+\tau^{+}I)$. We start by observing that

$$\overline{L^{\pm}_{sym}}+\tau^{\mp}I=I-(\mathbb{E}[D^{\pm}])^{-1/2}\,\mathbb{E}[A^{\pm}]\,(\mathbb{E}[D^{\pm}])^{-1/2}+\tau^{\mp}I=(1+\tau^{\mp})I-(\mathbb{E}[D^{\pm}])^{-1/2}\,\mathbb{E}[A^{\pm}]\,(\mathbb{E}[D^{\pm}])^{-1/2}\,. \tag{21}$$

In light of (20), one can write $(\mathbb{E}[D^{+}])^{-1/2}\,\mathbb{E}[A^{+}]\,(\mathbb{E}[D^{+}])^{-1/2}$ as

$$(\mathbb{E}[D^{+}])^{-1/2}\,\mathbb{E}[A^{+}]\,(\mathbb{E}[D^{+}])^{-1/2}=\begin{bmatrix}\Theta&V^{\perp}\end{bmatrix}\begin{bmatrix}B^{+}&\mathbf{0}_{k\times(n-k)}\\ \mathbf{0}_{(n-k)\times k}&\mathbf{0}_{(n-k)\times(n-k)}\end{bmatrix}\begin{bmatrix}\Theta^{\top}\\ {V^{\perp}}^{\top}\end{bmatrix}-p(1-\eta)(\mathbb{E}[D^{+}])^{-1}\,, \tag{22}$$

where

$$B^{+}\stackrel{\text{def}}{=}\begin{bmatrix}\frac{n_{1}}{d_{1}^{+}}p(1-\eta)&\sqrt{\frac{n_{1}n_{2}}{d_{1}^{+}d_{2}^{+}}}\,p\eta&\ldots&\sqrt{\frac{n_{1}n_{k}}{d_{1}^{+}d_{k}^{+}}}\,p\eta\\ \sqrt{\frac{n_{2}n_{1}}{d_{2}^{+}d_{1}^{+}}}\,p\eta&\frac{n_{2}}{d_{2}^{+}}p(1-\eta)&\ldots&\sqrt{\frac{n_{2}n_{k}}{d_{2}^{+}d_{k}^{+}}}\,p\eta\\ \vdots&\vdots&\ddots&\vdots\\ \sqrt{\frac{n_{k}n_{1}}{d_{k}^{+}d_{1}^{+}}}\,p\eta&\sqrt{\frac{n_{k}n_{2}}{d_{k}^{+}d_{2}^{+}}}\,p\eta&\ldots&\frac{n_{k}}{d_{k}^{+}}p(1-\eta)\end{bmatrix}_{k\times k}.$$

Similarly, using the expression for $\mathbb{E}[A^{-}]$, the matrix $(\mathbb{E}[D^{-}])^{-1/2}\,\mathbb{E}[A^{-}]\,(\mathbb{E}[D^{-}])^{-1/2}$ can be written as

$$(\mathbb{E}[D^{-}])^{-1/2}\,\mathbb{E}[A^{-}]\,(\mathbb{E}[D^{-}])^{-1/2}=\begin{bmatrix}\Theta&V^{\perp}\end{bmatrix}\begin{bmatrix}B^{-}&\mathbf{0}_{k\times(n-k)}\\ \mathbf{0}_{(n-k)\times k}&\mathbf{0}_{(n-k)\times(n-k)}\end{bmatrix}\begin{bmatrix}\Theta^{\top}\\ {V^{\perp}}^{\top}\end{bmatrix}-p\eta(\mathbb{E}[D^{-}])^{-1}\,, \tag{23}$$

where

$$B^{-}\stackrel{\text{def}}{=}\begin{bmatrix}\frac{n_{1}}{d_{1}^{-}}p\eta&\sqrt{\frac{n_{1}n_{2}}{d_{1}^{-}d_{2}^{-}}}\,p(1-\eta)&\ldots&\sqrt{\frac{n_{1}n_{k}}{d_{1}^{-}d_{k}^{-}}}\,p(1-\eta)\\ \sqrt{\frac{n_{2}n_{1}}{d_{2}^{-}d_{1}^{-}}}\,p(1-\eta)&\frac{n_{2}}{d_{2}^{-}}p\eta&\ldots&\sqrt{\frac{n_{2}n_{k}}{d_{2}^{-}d_{k}^{-}}}\,p(1-\eta)\\ \vdots&\vdots&\ddots&\vdots\\ \sqrt{\frac{n_{k}n_{1}}{d_{k}^{-}d_{1}^{-}}}\,p(1-\eta)&\sqrt{\frac{n_{k}n_{2}}{d_{k}^{-}d_{2}^{-}}}\,p(1-\eta)&\ldots&\frac{n_{k}}{d_{k}^{-}}p\eta\end{bmatrix}_{k\times k}.$$

Combining (19), (22), and (23) into (21), we readily arrive at

$$(\overline{L_{sym}^{\pm}}+\tau^{\mp}I)=\begin{bmatrix}\Theta&V^{\perp}\end{bmatrix}\begin{bmatrix}\underbrace{[\mathrm{diag}(\alpha_{i}^{\pm})-B^{\pm}]_{k\times k}}_{\stackrel{\text{def}}{=}C^{\pm}}&\mathbf{0}_{k\times(n-k)}\\ &\alpha_{1}^{\pm}I_{n_{1}-1}\\ &&\alpha_{2}^{\pm}I_{n_{2}-1}\\ &&&\ddots\\ &&&&\alpha_{k}^{\pm}I_{n_{k}-1}\end{bmatrix}\begin{bmatrix}\Theta^{\top}\\ {V^{\perp}}^{\top}\end{bmatrix}, \tag{24}$$

where $\alpha_{i}^{\pm}$ and $C^{+},C^{-}$ are defined as in the statement of the lemma. The spectral decomposition of $\overline{T}$ now follows directly from (24), along with the spectral decomposition $(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}=R\Lambda R^{\top}$. ∎
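The decomposition in Lemma 4.1 can be sanity-checked numerically. The NumPy sketch below builds the expected matrices of the SSBM as in (20), forms $\overline{T}$, and compares its eigenvalues with those predicted by (16). Two assumptions are made here and should be checked against the definitions given earlier in the paper: that $\overline{T}=(\overline{L^{-}_{sym}}+\tau^{+}I)^{-1/2}(\overline{L^{+}_{sym}}+\tau^{-}I)(\overline{L^{-}_{sym}}+\tau^{+}I)^{-1/2}$, and that within-cluster (resp. across-cluster) pairs carry a positive edge with probability $p(1-\eta)$ (resp. $p\eta$). All function names are hypothetical.

```python
import numpy as np


def expected_adjacencies(sizes, p, eta):
    """Expected positive and negative adjacency matrices (zero diagonal)."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    A_pos = np.where(same, p * (1 - eta), p * eta)
    A_neg = np.where(same, p * eta, p * (1 - eta))
    np.fill_diagonal(A_pos, 0.0)
    np.fill_diagonal(A_neg, 0.0)
    return A_pos, A_neg


def sym_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}, with D the diagonal matrix of row sums of A."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def inv_sqrt(M):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V / np.sqrt(w)) @ V.T


def t_bar(sizes, p, eta, tau_pos, tau_neg):
    """Assumed form of T-bar (see the lead-in paragraph)."""
    A_pos, A_neg = expected_adjacencies(sizes, p, eta)
    n = A_pos.shape[0]
    S = inv_sqrt(sym_laplacian(A_neg) + tau_pos * np.eye(n))
    return S @ (sym_laplacian(A_pos) + tau_neg * np.eye(n)) @ S


def predicted_spectrum(sizes, p, eta, tau_pos, tau_neg):
    """Eigenvalues of T-bar predicted by (16), (17) and (18), sorted."""
    sizes = np.asarray(sizes, dtype=float)
    n = sizes.sum()
    s = sizes / n
    d_pos = p * (n * (s * (1 - 2 * eta) + eta) - (1 - eta))
    d_neg = p * (n * (-s * (1 - 2 * eta) + (1 - eta)) - eta)
    u_pos, u_neg = np.sqrt(sizes / d_pos), np.sqrt(sizes / d_neg)
    alpha_pos = 1 + tau_neg + p * (1 - eta) / d_pos
    alpha_neg = 1 + tau_pos + p * eta / d_neg
    C_pos = -p * eta * np.outer(u_pos, u_pos) + np.diag(
        1 + tau_neg + p / d_pos * (1 - eta - sizes * (1 - 2 * eta)))
    C_neg = -p * (1 - eta) * np.outer(u_neg, u_neg) + np.diag(
        1 + tau_pos + p / d_neg * (eta + sizes * (1 - 2 * eta)))
    S = inv_sqrt(C_neg)
    informative = np.linalg.eigvalsh(S @ C_pos @ S)
    # Each ratio alpha_i^+ / alpha_i^- appears with multiplicity n_i - 1.
    bulk = np.repeat(alpha_pos / alpha_neg, (sizes - 1).astype(int))
    return np.sort(np.concatenate([informative, bulk]))


sizes_demo = [3, 4, 5]
numeric = np.sort(np.linalg.eigvalsh(t_bar(sizes_demo, 0.5, 0.1, 1.0, 1.0)))
predicted = predicted_spectrum(sizes_demo, 0.5, 0.1, 1.0, 1.0)
```

Under the stated assumptions, `numeric` and `predicted` should agree up to floating-point error for any admissible cluster sizes and parameters.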

Lemma 4.1 reveals that we need to extract the $k$ informative eigenvectors $\Theta R$ from the $n$ eigenvectors $\begin{bmatrix}\Theta R&V^{\perp}\end{bmatrix}$ of $\overline{T}$. Clearly, it suffices to recover any orthonormal basis for the column span of $\Theta$, since the rows of any such corresponding matrix (one instance of which is $\Theta R$) will exhibit the same clustering structure as $\Theta$.

4.2 Ensuring $V_{k}(\overline{T})=\Theta R$ and bounding the spectral gap

In this section, our aim is to show that, for suitable values of $\tau^{+}>0,\tau^{-}\geqslant 0$, the eigenvectors corresponding to the smallest $k$ eigenvalues of $\overline{T}$ are given by $\Theta R$, i.e., $V_{k}(\overline{T})=\Theta R$. This is equivalent to ensuring (recall Lemma 4.1) that

$$\lambda_{n-k+1}(\overline{T})=\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert<\min_{i\in[k]}\frac{\alpha_{i}^{+}}{\alpha_{i}^{-}}=\lambda_{n-k}(\overline{T}). \tag{25}$$

Moreover, we will need to find a strictly positive lower bound on the spectral gap $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})$, as it will be used later on to show that the column span of $V_{k}(T)$ is close to that of $V_{k}(\overline{T})$. We first consider the equal-sized clusters case, and then proceed to the general-sized clusters case.

4.2.1 Spectral gap for equal-sized clusters

When the cluster sizes are equal, the analysis is considerably cleaner than the general setting. Let us first establish notation specific to the equal-sized clusters case.

Remark 4.2 (Notation for the equal-sized clusters).

For clusters of equal size, we have that $n_{1}=\cdots=n_{k}=n/k$, $d^{+}:=d_{1}^{+}=\cdots=d_{k}^{+}$, $d^{-}:=d_{1}^{-}=\cdots=d_{k}^{-}$, $\alpha^{+}:=\alpha_{1}^{+}=\cdots=\alpha_{k}^{+}$, and $\alpha^{-}:=\alpha_{1}^{-}=\cdots=\alpha_{k}^{-}$. Let $C^{+}_{e},C^{-}_{e}$, and $\overline{T_{e}}$ denote the respective counterparts of $C^{+},C^{-}$, and $\overline{T}$, for the equal-sized case. In light of (17) and (18), one can verify that $C^{+}_{e}$ and $C^{-}_{e}$ are simultaneously diagonalizable, which we show in Lemma D.1.

In the following lemma, we compute the exact value of $\lVert\Lambda\rVert=\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert$.

Lemma 4.3 (Bounding the spectral norm of $(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}$).

For equal-sized clusters, the following holds true:

$$\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert=\max\left\{\frac{\tau^{-}}{\tau^{+}},\ \frac{\tau^{-}+\frac{pn\eta}{d^{+}}}{\tau^{+}+\frac{pn(1-\eta)}{d^{-}}}\right\}\,.$$
Proof.

The lemma follows directly from Lemma D.1. ∎
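The closed form in Lemma 4.3 can be checked against a direct computation. The sketch below, which assumes NumPy and uses a hypothetical function name, builds $C^{+}_{e}$ and $C^{-}_{e}$ from (17) and (18) with $n_{i}=n/k$ and compares the spectral norm of $(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}$ with the maximum stated in the lemma. It is a numerical illustration only and does not replace the argument of Lemma D.1.

```python
import numpy as np


def equal_size_norm(n, k, p, eta, tau_pos, tau_neg):
    """Return (numerical spectral norm, closed form of Lemma 4.3)."""
    m = n / k  # common cluster size
    d_pos = p * (n * ((1 - 2 * eta) / k + eta) - (1 - eta))
    d_neg = p * (n * (-(1 - 2 * eta) / k + (1 - eta)) - eta)
    ones = np.ones((k, k))
    eye = np.eye(k)
    # (17) and (18) specialised to n_i = n / k, so u u^T = (m / d) * ones.
    C_pos = (-p * eta * (m / d_pos) * ones
             + (1 + tau_neg + p / d_pos * (1 - eta - m * (1 - 2 * eta))) * eye)
    C_neg = (-p * (1 - eta) * (m / d_neg) * ones
             + (1 + tau_pos + p / d_neg * (eta + m * (1 - 2 * eta))) * eye)
    w, V = np.linalg.eigh(C_neg)
    S = (V / np.sqrt(w)) @ V.T  # inverse square root of C_neg
    numeric = np.linalg.norm(S @ C_pos @ S, 2)
    closed_form = max(
        tau_neg / tau_pos,
        (tau_neg + p * n * eta / d_pos) / (tau_pos + p * n * (1 - eta) / d_neg),
    )
    return numeric, closed_form


norm_numeric, norm_closed = equal_size_norm(
    n=60, k=3, p=0.3, eta=0.2, tau_pos=2.0, tau_neg=0.5)
```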

Next, we derive conditions on $\tau^{+}>0,\tau^{-}\geqslant 0$ which ensure $V_{k}(\overline{T})=\Theta R$.

Lemma 4.4 (Conditions on $\tau^{-}$ and $\tau^{+}$).

Suppose $n\geqslant\frac{2k(1-\eta)}{1-2\eta}$, and $\tau^{-}\geqslant 0$, $\tau^{+}>0$. If $\tau^{-}$, $\tau^{+}$ satisfy

  1. $\tau^{-}\left(1+\frac{p\eta}{d^{-}}\right)<\tau^{+}\left(1+\frac{p(1-\eta)}{d^{+}}\right)$,

  2. $\tau^{-}\left[\frac{(1-2\eta)/k}{(1-\eta)-\frac{1-2\eta}{k}}\right]+\tau^{+}\left[\frac{(1-2\eta)/k}{\eta+\frac{1-2\eta}{k}}\right]+1>\frac{2\eta}{\eta+\frac{1-2\eta}{2k}}$,

then it holds true that $V_{k}(\overline{T})=\Theta R$, i.e., $\lambda_{n-k+1}(\overline{T})=\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert<\frac{\alpha^{+}}{\alpha^{-}}=\lambda_{n-k}(\overline{T})$.

Proof.

Recalling the expression for $\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert$ from Lemma 4.3, we will ensure that each term inside the max is less than $\alpha^{+}/\alpha^{-}$. To derive the first condition of the lemma, we simply ensure that

$$\frac{\tau^{-}}{\tau^{+}}<\frac{1+\tau^{-}+p(1-\eta)/d^{+}}{1+\tau^{+}+p\eta/d^{-}}\iff\tau^{-}\left(1+\frac{p\eta}{d^{-}}\right)<\tau^{+}\left(1+\frac{p(1-\eta)}{d^{+}}\right)\,.$$

Before deriving the second condition, let us note additional useful bounds on $\frac{np}{d^{-}},\frac{np}{d^{+}}$ which will be needed later.

  1. $d^{-}/np=1-\eta-(1-2\eta)/k-\eta/n\leqslant 1-\eta$.

  2. Since $n\geqslant k\geqslant 2$, we have $\eta/n\leqslant\eta/k$, and hence $d^{-}/np\geqslant(1-\eta)-(1-\eta)/k\geqslant\frac{1-\eta}{2}$. This also implies that $p\eta/d^{-}\leqslant 1$. Therefore, combining the above two bounds, we arrive at

     $$\frac{1}{1-\eta}\leqslant\frac{np}{d^{-}}\leqslant\frac{2}{1-\eta}\,.$$

  3. $d^{+}/np=(1-2\eta)/k+\eta-(1-\eta)/n\leqslant\eta+(1-2\eta)/k$.

  4. Since $n\geqslant\frac{2k(1-\eta)}{1-2\eta}$, it holds that $d^{+}/np=(1-2\eta)/k+\eta-(1-\eta)/n\geqslant\eta+(1-2\eta)/2k$.

  5. Therefore, combining the above two conditions yields

     $$\frac{1}{\eta+\frac{1-2\eta}{k}}\leqslant\frac{np}{d^{+}}\leqslant\frac{1}{\eta+\frac{1-2\eta}{2k}}\,.$$

To derive the second condition, we need to ensure $\frac{\tau^{-}+\frac{pn\eta}{d^{+}}}{\tau^{+}+\frac{pn(1-\eta)}{d^{-}}}<\frac{1+\tau^{-}+p(1-\eta)/d^{+}}{1+\tau^{+}+p\eta/d^{-}}$, which is equivalent to

$$\tau^{-}\left[1-\frac{np}{d^{-}}\left((1-\eta)-\frac{\eta}{n}\right)\right]<\tau^{+}\left[1-\frac{np}{d^{+}}\left(\eta-\frac{1-\eta}{n}\right)\right]+\underbrace{\left[\frac{np(1-\eta)}{d^{-}}\left(1+\frac{p(1-\eta)}{d^{+}}\right)-\frac{np\eta}{d^{+}}\left(1+\frac{p\eta}{d^{-}}\right)\right]}_{\text{term 2}}\,.$$

Now, we can lower bound “term 2” in the above equation as

$$\frac{np(1-\eta)}{d^{-}}\left(1+\frac{p(1-\eta)}{d^{+}}\right)-\frac{np\eta}{d^{+}}\left(1+\frac{p\eta}{d^{-}}\right)\geqslant 1-\frac{2\eta}{\eta+\frac{1-2\eta}{k}}\,.$$

Hence from the above two equations, we observe that it suffices that $\tau^{+},\tau^{-}$ satisfy

$$\tau^{-}\left[\frac{(1-2\eta)/k}{(1-\eta)-\frac{1-2\eta}{k}}\right]+\tau^{+}\left[\frac{(1-2\eta)/k}{\eta+\frac{1-2\eta}{k}}\right]+1>\frac{2\eta}{\eta+\frac{1-2\eta}{2k}}\,.$$

Next, we derive sufficient conditions on $\tau^{+},\tau^{-}$ which ensure a lower bound on the spectral gap

$$\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})=\frac{\alpha^{+}}{\alpha^{-}}-\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert.$$

Lemma 4.5 (Conditions on $\tau^{+},\tau^{-}$, and lower bound on spectral gap).

Suppose $n\geqslant\frac{2k(1-\eta)}{1-2\eta}$. Then the following holds.

  1. If $\tau^{+}>0,\tau^{-}\geqslant 0$ satisfy

     $$\tau^{+}>\frac{32\eta k}{3(1-2\eta)},\quad\tau^{-}<\min\left\{\frac{3}{2},\ \frac{3}{16}\tau^{+},\ \frac{3(1-\eta)}{8\left(\eta+\frac{1-2\eta}{k}\right)}\right\},$$

     then $V_{k}(\overline{T})=\Theta R$, and $\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert<\left(1-\frac{1-2\eta}{2k(1-\eta)}\right)\frac{\alpha^{+}}{\alpha^{-}}$, i.e., $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})>\left(\frac{1-2\eta}{2k(1-\eta)}\right)\frac{\alpha^{+}}{\alpha^{-}}$.

  2. If $\eta<\frac{1}{3k+2}$ and $\tau^{+}>0,\tau^{-}\geqslant 0$ satisfy

     $$\tau^{-}<\min\left\{\frac{\frac{1-2\eta}{k}-\eta}{\frac{1-2\eta}{k}+\eta},\ \frac{1}{2},\ \frac{\tau^{+}}{8}\right\},$$

     then $V_{k}(\overline{T})=\Theta R$, and $\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert<\frac{\alpha^{+}}{2\alpha^{-}}$, i.e., $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})>\frac{\alpha^{+}}{2\alpha^{-}}$.
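The first set of conditions in Lemma 4.5 is easy to test for concrete parameter values. The following Python sketch checks those conditions and computes the relative spectral gap $1-\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\rVert\,\alpha^{-}/\alpha^{+}$ from the closed form of Lemma 4.3. The function names and the example values are illustrative assumptions.

```python
def lemma45_part1_conditions(n, k, eta, tau_pos, tau_neg):
    """True if (n, tau_pos, tau_neg) satisfy part 1 of Lemma 4.5."""
    size_ok = n >= 2 * k * (1 - eta) / (1 - 2 * eta)
    tau_pos_ok = tau_pos > 32 * eta * k / (3 * (1 - 2 * eta))
    tau_neg_cap = min(
        1.5,
        3 * tau_pos / 16,
        3 * (1 - eta) / (8 * (eta + (1 - 2 * eta) / k)),
    )
    return size_ok and tau_pos_ok and 0 <= tau_neg < tau_neg_cap


def relative_gap(n, k, p, eta, tau_pos, tau_neg):
    """(lambda_{n-k} - lambda_{n-k+1}) / lambda_{n-k} for equal-sized clusters."""
    d_pos = p * (n * ((1 - 2 * eta) / k + eta) - (1 - eta))
    d_neg = p * (n * (-(1 - 2 * eta) / k + (1 - eta)) - eta)
    # Closed form of the spectral norm from Lemma 4.3.
    norm = max(
        tau_neg / tau_pos,
        (tau_neg + p * n * eta / d_pos) / (tau_pos + p * n * (1 - eta) / d_neg),
    )
    alpha_ratio = (1 + tau_neg + p * (1 - eta) / d_pos) / (
        1 + tau_pos + p * eta / d_neg)
    return 1 - norm / alpha_ratio


# Example with k = 3 clusters and 5% noise.
params = dict(n=300, k=3, eta=0.05, tau_pos=4.0, tau_neg=0.5)
conditions_hold = lemma45_part1_conditions(**params)
gap = relative_gap(p=0.2, **params)
guaranteed = (1 - 2 * params["eta"]) / (2 * params["k"] * (1 - params["eta"]))
```

When `conditions_hold` is true, the lemma guarantees that `gap` exceeds `guaranteed`; in this example the actual gap is considerably larger, which suggests the sufficient conditions are conservative here.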

Proof.

We need to ensure the following two conditions for a suitably chosen $\beta\in(0,1]$:

$$\frac{\tau^{-}+\frac{pn\eta}{d^{+}}}{\tau^{+}+\frac{pn(1-\eta)}{d^{-}}}<\beta\left(\frac{1+\tau^{-}+p(1-\eta)/d^{+}}{1+\tau^{+}+p\eta/d^{-}}\right), \tag{26}$$

$$\frac{\tau^{-}}{\tau^{+}}<\beta\left(\frac{1+\tau^{-}+p(1-\eta)/d^{+}}{1+\tau^{+}+p\eta/d^{-}}\right). \tag{27}$$

1. Ensuring (26)

We can rewrite (26) as

$$\tau^{-}\left(1+\frac{p\eta}{d^{-}}-\beta\frac{pn(1-\eta)}{d^{-}}\right)+\tau^{+}\left(\frac{pn\eta}{d^{+}}-\beta\left(1+\frac{p(1-\eta)}{d^{+}}\right)\right)+\tau^{+}\tau^{-}(1-\beta)<\beta\frac{pn(1-\eta)}{d^{-}}\left(1+\frac{p(1-\eta)}{d^{+}}\right)-\frac{pn\eta}{d^{+}}\left(1+\frac{p\eta}{d^{-}}\right). \tag{28}$$

Using the expressions for $d^{+},d^{-}$, we can write the coefficients of the terms $\tau^{+},\tau^{-}$ as follows.

$$1+\frac{p\eta}{d^{-}}-\beta\frac{pn(1-\eta)}{d^{-}}=\frac{-\left(\frac{1-2\eta}{k}\right)+(1-\eta)(1-\beta)}{-\left(\frac{1-2\eta}{k}\right)+(1-\eta)-\frac{\eta}{n}},$$

$$\frac{pn\eta}{d^{+}}-\beta\left(1+\frac{p(1-\eta)}{d^{+}}\right)=\frac{np}{d^{+}}\left(\eta-\beta\frac{1-\eta}{n}\right)-\beta=\frac{\eta(1-\beta)-\beta\left(\frac{1-2\eta}{k}\right)}{\frac{1-2\eta}{k}+\eta-\frac{1-\eta}{n}}.$$

Moreover, using the bounds on $\frac{d^{-}}{np},\frac{d^{+}}{np}$ derived in Lemma 4.4, we can lower bound the RHS term in (28) as

$$\beta\frac{pn(1-\eta)}{d^{-}}\left(1+\frac{p(1-\eta)}{d^{+}}\right)-\frac{pn\eta}{d^{+}}\left(1+\frac{p\eta}{d^{-}}\right)>\beta-\frac{2\eta}{\eta+\frac{1-2\eta}{k}}.$$

From the above considerations, we see that (28) is ensured provided

$$\tau^{-}\left[\frac{\left(\frac{1-2\eta}{k}\right)-(1-\eta)(1-\beta)}{-\left(\frac{1-2\eta}{k}\right)+(1-\eta)-\frac{\eta}{n}}\right]+\tau^{+}\left[\frac{-\eta(1-\beta)+\beta\left(\frac{1-2\eta}{k}\right)}{\frac{1-2\eta}{k}+\eta-\frac{1-\eta}{n}}\right]+\beta>\frac{2\eta}{\eta+\frac{1-2\eta}{k}}+\tau^{+}\tau^{-}(1-\beta). \tag{29}$$

We outline two possible ways in which (29) is ensured.

  • Note that the denominators of the coefficients of $\tau^{+},\tau^{-}$ in (29) are positive, while the numerators are non-negative provided $1-\beta\leqslant\frac{1-2\eta}{2k(1-\eta)}$. Therefore, choosing

    $$\beta=1-\frac{1-2\eta}{2k(1-\eta)}\quad\left(\geqslant\frac{3}{4}\right),$$

    note that (29) is ensured provided

    $$\tau^{-}\left[\frac{1-2\eta}{2k(1-\eta)}\right]+\tau^{+}\left[\frac{3(1-2\eta)}{8k\left(\eta+\frac{1-2\eta}{k}\right)}\right]+\frac{3}{4}>\frac{2\eta}{\eta+\frac{1-2\eta}{k}}+\tau^{+}\tau^{-}\left[\frac{1-2\eta}{2k(1-\eta)}\right]. \tag{30}$$

    Finally, we observe that in order for (30) to hold, it suffices that

    $$\tau^{+}\tau^{-}\left[\frac{1-2\eta}{2k(1-\eta)}\right]<\frac{\tau^{+}}{2}\left[\frac{3(1-2\eta)}{8k\left(\eta+\frac{1-2\eta}{k}\right)}\right]\iff\tau^{-}<\frac{3(1-\eta)}{8\left(\eta+\frac{1-2\eta}{k}\right)},\text{ and}$$

    $$\frac{2\eta}{\eta+\frac{1-2\eta}{k}}<\frac{\tau^{+}}{2}\left[\frac{3(1-2\eta)}{8k\left(\eta+\frac{1-2\eta}{k}\right)}\right]\iff\tau^{+}>\frac{32\eta k}{3(1-2\eta)}.$$

  • Alternatively, by setting $\beta=1/2$, (29) can be rewritten as

    $$\tau^{+}\left[\frac{-\frac{\eta}{2}+\frac{1-2\eta}{2k}}{\frac{1-2\eta}{k}+\eta-\frac{1-\eta}{n}}\right]+\frac{1}{2}>\frac{2\eta}{\eta+\frac{1-2\eta}{k}}+\tau^{-}\left[\frac{-\left(\frac{1-2\eta}{k}\right)+\frac{1-\eta}{2}}{-\left(\frac{1-2\eta}{k}\right)+(1-\eta)-\frac{\eta}{n}}\right]+\frac{\tau^{+}\tau^{-}}{2}. \tag{31}$$

    Clearly, it holds true that

    $$\frac{1}{2}>\frac{2\eta}{\eta+\frac{1-2\eta}{k}}\iff\eta<\frac{1}{3k+2},$$

    which also ensures that the numerator of the coefficient of $\tau^{+}$ is positive. Therefore, if $\eta<\frac{1}{3k+2}$, then in order for (31) to hold, it suffices that

    $$\tau^{-}<\left[\frac{-\eta+\frac{1-2\eta}{k}}{\frac{1-2\eta}{k}+\eta}\right]\implies\tau^{+}\left[\frac{-\frac{\eta}{2}+\frac{1-2\eta}{2k}}{\frac{1-2\eta}{k}+\eta-\frac{1-\eta}{n}}\right]>\frac{\tau^{+}\tau^{-}}{2}.$$
2. Ensuring (27)

Note that one can rewrite (27) as

$$\tau^{-}\tau^{+}(1-\beta)+\tau^{-}\left(1+\frac{p\eta}{d^{-}}\right)<\beta\tau^{+}\left(1+\frac{p(1-\eta)}{d^{+}}\right). \tag{32}$$

Since $\frac{p\eta}{d^{-}}\leqslant 1$, (32) is ensured provided

$$\tau^{-}\tau^{+}(1-\beta)+2\tau^{-}<\beta\tau^{+},$$

which in turn holds if each of the two LHS terms is less than half of the RHS term. This leads to the condition

$$\tau^{-}<\min\left\{\frac{\beta}{2(1-\beta)},\ \frac{\beta}{4}\tau^{+}\right\}.$$

Finally, plugging the choices $\beta=1-\frac{1-2\eta}{2k(1-\eta)}$ $(\geqslant 3/4)$ and $\beta=\frac{1}{2}$ into the above condition, and combining it with the conditions derived for ensuring (26), we readily arrive (after minor simplifications) at the statements in the lemma. ∎

4.2.2 Spectral gap for the general case

For the general-sized clusters case, it is difficult to find the exact value of $\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert$. Therefore, in the following lemma, we show an upper bound on this quantity by bounding the spectral norms of $C^{+}$ and $(C^{-})^{-1}$.

Lemma 4.6 (Bounding the spectral norm of $(C^{-})^{-1}$ and $C^{+}$).

Recall $s:=\min_{i\in[k]}n_{i}/n$. Then it holds true that

$$\lambda_{\max}(C^{+})\leqslant\tau^{-}+\frac{n\eta}{n(s(1-2\eta)+\eta)-(1-\eta)}, \tag{33}$$

$$\lambda_{\min}(C^{-})\geqslant\tau^{+}\,. \tag{34}$$

From the above two inequalities, it follows that

$$\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert\leqslant\frac{\lambda_{\max}(C^{+})}{\lambda_{\min}(C^{-})}\leqslant\frac{\tau^{-}+\frac{n\eta}{n(s(1-2\eta)+\eta)-(1-\eta)}}{\tau^{+}}\,.$$

The proof of the above lemma is deferred to Appendix D.
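The bounds (33) and (34) can be illustrated numerically for unequal cluster sizes. The NumPy sketch below builds $C^{+}$ and $C^{-}$ from (17) and (18) and returns the quantities compared in Lemma 4.6; the function name and the example parameters are assumptions made for illustration.

```python
import numpy as np


def lemma46_quantities(sizes, p, eta, tau_pos, tau_neg):
    """Return lambda_max(C+), its bound (33), lambda_min(C-), and the norm."""
    sizes = np.asarray(sizes, dtype=float)
    n = sizes.sum()
    frac = sizes / n
    s = frac.min()  # relative size of the smallest cluster
    d_pos = p * (n * (frac * (1 - 2 * eta) + eta) - (1 - eta))
    d_neg = p * (n * (-frac * (1 - 2 * eta) + (1 - eta)) - eta)
    u_pos, u_neg = np.sqrt(sizes / d_pos), np.sqrt(sizes / d_neg)
    C_pos = -p * eta * np.outer(u_pos, u_pos) + np.diag(
        1 + tau_neg + p / d_pos * (1 - eta - sizes * (1 - 2 * eta)))
    C_neg = -p * (1 - eta) * np.outer(u_neg, u_neg) + np.diag(
        1 + tau_pos + p / d_neg * (eta + sizes * (1 - 2 * eta)))
    lam_max_pos = np.linalg.eigvalsh(C_pos)[-1]  # eigenvalues are ascending
    w, V = np.linalg.eigh(C_neg)
    lam_min_neg = w[0]
    S = (V / np.sqrt(w)) @ V.T  # inverse square root of C_neg
    norm = np.linalg.norm(S @ C_pos @ S, 2)
    bound_pos = tau_neg + n * eta / (n * (s * (1 - 2 * eta) + eta) - (1 - eta))
    return lam_max_pos, bound_pos, lam_min_neg, norm


lam_max_pos, bound_pos, lam_min_neg, norm = lemma46_quantities(
    sizes=[10, 20, 30], p=0.4, eta=0.15, tau_pos=2.0, tau_neg=0.5)
```

Comparing `norm` with `bound_pos / tau_pos` gives a sense of how much slack the bound of Lemma 4.6 leaves for a given instance.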

Remark 4.7.

It is difficult to obtain more precise bounds on $\lambda_{\max}(C^{+})$ and $\lambda_{\min}(C^{-})$, given the expressions for $C^{+}$ in (17), and $C^{-}$ in (18). Clearly, a tighter bound on $\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert$ would yield a tighter analysis in the general case.

Recall $l:=\max_{i\in[k]}n_{i}/n$; with a slight abuse of notation, let $d_{l}^{\pm}$ denote the expected degrees of a node in the largest cluster (of size $nl$). As before, we now derive conditions on $\tau^{+}>0,\tau^{-}\geqslant 0$ which ensure $V_{k}(\overline{T})=\Theta R$, or equivalently,

$$\lambda_{n-k+1}(\overline{T})=\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert<\min_{i\in[k]}\frac{\alpha_{i}^{+}}{\alpha_{i}^{-}}=\frac{1+\tau^{-}+p(1-\eta)/d_{l}^{+}}{1+\tau^{+}+p\eta/d_{l}^{-}}=\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}=\lambda_{n-k}(\overline{T}). \tag{35}$$

Additionally, we find sufficient conditions on $\tau^{+}>0,\tau^{-}\geqslant 0$ which ensure a lower bound on the spectral gap $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})=\min_{i\in[k]}\frac{\alpha_{i}^{+}}{\alpha_{i}^{-}}-\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert$. These are shown in the following lemma.

Lemma 4.8 (Conditions on $\tau^{+},\tau^{-}$, and Lower-Bound on Spectral Gap).

Suppose $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}$. Then the following is true.

  1. If $\tau^{+}>0,\tau^{-}\geqslant 0$ satisfy

    $$2\tau^{-}+\frac{4\eta}{s(1-2\eta)+2\eta}<\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\tau^{+} \tag{36}$$

    then $V_{k}(\overline{T})=\Theta R$, i.e., $\lambda_{n-k+1}(\overline{T})=\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert<\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}=\lambda_{n-k}(\overline{T})$.

  2. For $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ with $0<\eta<\frac{1}{2}$, if $\tau^{+}>0,\tau^{-}\geqslant 0$ satisfy

    $$(1-\beta)\tau^{-}\tau^{+}+2\tau^{-}+\frac{4\eta}{s(1-2\eta)+2\eta}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\tau^{+} \tag{37}$$

    then $V_{k}(\overline{T})=\Theta R$, and $\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert<\beta\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}$, i.e., $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})>(1-\beta)\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}$. Moreover, for (37) to hold, it suffices that

    $$\tau^{+}>\frac{16\eta}{\beta s(1-2\eta)},\quad\tau^{-}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^{+}}{8}\right\}.$$

  3. The statement in part (2) also holds for the choice $\beta=\frac{1}{2}$, provided $\eta\leqslant\frac{s}{2s+4}$.

Proof.

From (35) and Lemma 4.6, it suffices to show for $\beta\in(0,1]$ that

$$\frac{\tau^{-}+\frac{\eta}{s(1-2\eta)+\eta-\frac{(1-\eta)}{n}}}{\tau^{+}}<\beta\left(\frac{1+\tau^{-}+p(1-\eta)/d_{l}^{+}}{1+\tau^{+}+p\eta/d_{l}^{-}}\right). \tag{38}$$

For the stated condition on $n$, it is easy to verify that

$$n\geqslant\frac{2(1-\eta)}{s(1-2\eta)}\implies s(1-2\eta)+\eta-\frac{(1-\eta)}{n}\geqslant\frac{s(1-2\eta)}{2}+\eta,$$
$$n\geqslant\frac{2\eta}{(1-l)(1-\eta)}\implies\frac{p\eta}{d_{l}^{-}}\leqslant\frac{2\eta}{n(1-\eta)(1-l)}\leqslant 1.$$

Using these bounds in (38), observe that it suffices that $\tau^{+},\tau^{-}$ satisfy

$$\frac{\tau^{-}+\frac{2\eta}{s(1-2\eta)+2\eta}}{\tau^{+}}<\beta\left(\frac{1+\tau^{-}}{2+\tau^{+}}\right). \tag{39}$$

Then for β=1\beta=1, we readily see that (39) is equivalent to (36).
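To spell out this equivalence, the short derivation below expands (39) for $\beta=1$. The shorthand $a$ is introduced here only for illustration and does not appear elsewhere in the paper.

```latex
% Shorthand (assumption for this sketch): a := 2\eta / (s(1-2\eta)+2\eta),
% so that 1 - a = s(1-2\eta) / (s(1-2\eta)+2\eta).
\begin{align*}
\frac{\tau^{-}+a}{\tau^{+}} < \frac{1+\tau^{-}}{2+\tau^{+}}
 &\iff (\tau^{-}+a)(2+\tau^{+}) < \tau^{+}(1+\tau^{-}) \\
 &\iff 2\tau^{-}+\tau^{-}\tau^{+}+2a+a\tau^{+} < \tau^{+}+\tau^{+}\tau^{-} \\
 &\iff 2\tau^{-}+2a < (1-a)\tau^{+},
\end{align*}
% which is (36) after substituting 2a = 4\eta / (s(1-2\eta)+2\eta).
```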

To establish the second part of the Lemma, we begin by rewriting (39) as

$$(1-\beta)\tau^{+}\tau^{-}+2\tau^{-}+\frac{4\eta}{s(1-2\eta)+2\eta}<\left(\beta-\frac{2\eta}{s(1-2\eta)+2\eta}\right)\tau^{+}=\left[\frac{\beta s(1-2\eta)-2\eta(1-\beta)}{s(1-2\eta)+2\eta}\right]\tau^{+}, \tag{40}$$

and observe that

$$\beta s(1-2\eta)\geqslant 4\eta(1-\beta)\iff\beta\geqslant\frac{4\eta}{s(1-2\eta)+4\eta}. \tag{41}$$

Under (41), we have $\beta s(1-2\eta)-2\eta(1-\beta)\geqslant\frac{\beta}{2}s(1-2\eta)$, so (37) implies (40). This verifies (37) in the statement of the Lemma. The “moreover” part is established by ensuring that each term on the LHS of (37) is a sufficiently small fraction of the RHS term. In particular, it is enough to choose this fraction to be $1/4$ for the first two terms, and $1/2$ for the third term.

Finally, the third part of the Lemma can be shown in the same manner as the second part. The starting point is to ensure (40), and we simply observe that for $\beta=1/2$, (41) is equivalent to $\eta\leqslant\frac{s}{2s+4}$. The rest follows identically. ∎
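As a numerical illustration of part 2 of Lemma 4.8, the following minimal sketch computes $\beta$, the thresholds from the "moreover" part, and then checks (37) directly. The function names and the example values of $s,\eta$ are assumptions made for this sketch; they are not taken from the paper.

```python
def beta_part2(s, eta):
    """beta from part 2 of Lemma 4.8 (requires 0 < eta < 1/2)."""
    return 4 * eta / (s * (1 - 2 * eta) + 4 * eta)


def ratio(s, eta):
    """The factor s(1-2eta) / (s(1-2eta) + 2eta) appearing in (36) and (37)."""
    return s * (1 - 2 * eta) / (s * (1 - 2 * eta) + 2 * eta)


def tau_bounds(s, eta, beta, tau_plus):
    """Return (lower threshold for tau^+, upper threshold for tau^-)
    from the sufficient conditions stated after (37)."""
    tau_plus_min = 16 * eta / (beta * s * (1 - 2 * eta))
    tau_minus_max = (beta / 2) * ratio(s, eta) * min(1 / (4 * (1 - beta)), tau_plus / 8)
    return tau_plus_min, tau_minus_max


def condition_37(s, eta, beta, tau_plus, tau_minus):
    """Evaluate inequality (37) directly."""
    lhs = ((1 - beta) * tau_minus * tau_plus + 2 * tau_minus
           + 4 * eta / (s * (1 - 2 * eta) + 2 * eta))
    rhs = (beta / 2) * ratio(s, eta) * tau_plus
    return lhs < rhs


# Hypothetical example: smallest cluster fraction s = 1/4, noise level eta = 0.1.
s, eta = 0.25, 0.1
beta = beta_part2(s, eta)
tau_plus = 24.0                                # any value above the threshold works
tau_plus_min, tau_minus_max = tau_bounds(s, eta, beta, tau_plus)
tau_minus = 0.5 * tau_minus_max                # strictly below the upper threshold
print(beta, tau_plus_min, tau_minus_max, condition_37(s, eta, beta, tau_plus, tau_minus))
```

The thresholds are only sufficient, so values of $\tau^{+},\tau^{-}$ outside them may still satisfy (37).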

4.3 Concentration bound for TT¯\left\lVert T-\overline{T}\right\rVert

In this section, we bound the “distance” between TT and T¯\overline{T}, i.e., TT¯\left\lVert T-\overline{T}\right\rVert. This is shown via individually bounding the terms Lsym+Lsym+¯\left\lVert L^{+}_{sym}-\overline{L^{+}_{sym}}\right\rVert, and LsymLsym¯\left\lVert L^{-}_{sym}-\overline{L^{-}_{sym}}\right\rVert. To this end, we first recall the following Theorem from [CR11].

Theorem 4.9 (Bounding LsymLsym¯\left\lVert L_{sym}-\overline{L_{sym}}\right\rVert, [CR11]).

Let LsymL_{sym} denote the normalized Laplacian of a random graph, and Lsym¯\overline{L_{sym}} the normalized Laplacian of the expected graph. Let δ\delta be the minimum expected degree of the graph. Choose ε>0\varepsilon>0. Then there exists a constant cεc_{\varepsilon} such that, if δcεlnn\delta\geqslant c_{\varepsilon}\ln n, then with probability at least 1ε1-\varepsilon, it holds true that

LsymLsym¯23ln(4n/ε)δ.\left\lVert L_{sym}-\overline{L_{sym}}\right\rVert\leqslant 2\sqrt{\frac{3\ln(4n/\varepsilon)}{\delta}}\,.
Remark 4.10.

A similar result appears in [Imb09] for the (unsigned) inhomogeneous Erdős-Rényi model, where LsymLsym¯=O(lnn/d0)\left\lVert L_{sym}-\overline{L_{sym}}\right\rVert=O(\sqrt{\ln n/d_{0}}), with d0d_{0} the smallest expected degree of the graph.

Using Theorem 4.9, we readily obtain the following concentration bounds for Lsym+Lsym+¯\left\lVert L^{+}_{sym}-\overline{L^{+}_{sym}}\right\rVert and LsymLsym¯\left\lVert L^{-}_{sym}-\overline{L^{-}_{sym}}\right\rVert.

Lemma 4.11 (Bounding Lsym±Lsym±¯\left\lVert L_{sym}^{\pm}-\overline{L_{sym}^{\pm}}\right\rVert).

Assuming nmax{2(1η)s(12η),2η(1l)(1η)}n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}, there exists a constant cε>0c_{\varepsilon}>0 such that if pcεlnnnmax{1s(12η)+2η,21l}p\geqslant\frac{c_{\varepsilon}\ln n}{n}\max\left\{\frac{1}{s(1-2\eta)+2\eta},\frac{2}{1-l}\right\}, then with probability at least 12ε1-2\varepsilon,

Lsym+Lsym+¯26ln(4n/ε)np[s(12η)+2η],andLsymLsym¯212ln(4n/ε)np(1l).\displaystyle\left\lVert L^{+}_{sym}-\overline{L^{+}_{sym}}\right\rVert\leqslant 2\sqrt{\frac{6\ln(4n/\varepsilon)}{np[s(1-2\eta)+2\eta]}},\qquad\text{and}\qquad\left\lVert L^{-}_{sym}-\overline{L^{-}_{sym}}\right\rVert\leqslant 2\sqrt{\frac{12\ln(4n/\varepsilon)}{np(1-l)}}.
Proof.

Note that the minimum expected degrees of the positive and negative subgraphs are given by ds+,dld_{s}^{+},d_{l}^{-}, respectively. For the stated condition on nn, it is easily seen that

ds+np2[s(12η)+2η],dlnp2(1l)(1η)np(1l)4.d_{s}^{+}\geqslant\frac{np}{2}\left[s(1-2\eta)+2\eta\right],\quad d_{l}^{-}\geqslant\frac{np}{2}(1-l)(1-\eta)\geqslant\frac{np(1-l)}{4}. (42)

Invoking Theorem 4.9, and observing that ds+,dlcε2lnnd_{s}^{+},d_{l}^{-}\geqslant\frac{c_{\varepsilon}}{2}\ln n are ensured for the stated condition on pp, the statement follows via the union bound. ∎
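To get a feel for the rates in Lemma 4.11, the sketch below evaluates the right-hand sides of the two bounds for given model parameters. The function names and example values are assumptions made for illustration. The lower bound on $p$ is not checked, since the constant $c_{\varepsilon}$ is not explicit.

```python
import math


def min_n(s, l, eta):
    """Smallest n permitted by the assumption of Lemma 4.11."""
    return max(2 * (1 - eta) / (s * (1 - 2 * eta)),
               2 * eta / ((1 - l) * (1 - eta)))


def laplacian_concentration_bounds(n, p, s, l, eta, eps):
    """Right-hand sides of the bounds on ||L^+_sym - E|| and ||L^-_sym - E||."""
    if n < min_n(s, l, eta):
        raise ValueError("n violates the assumption of the lemma")
    log_term = math.log(4 * n / eps)
    pos = 2 * math.sqrt(6 * log_term / (n * p * (s * (1 - 2 * eta) + 2 * eta)))
    neg = 2 * math.sqrt(12 * log_term / (n * p * (1 - l)))
    return pos, neg


# Hypothetical example: two equal clusters (s = l = 1/2), eta = 0.1, eps = 0.01.
pos, neg = laplacian_concentration_bounds(10_000, 0.5, 0.5, 0.5, 0.1, 0.01)
print(pos, neg)
```

Both quantities decay like $\sqrt{\ln n/(np)}$, so dividing $p$ by four doubles each bound.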

Next, using the above lemma, we can upper bound TT¯\left\lVert T-\overline{T}\right\rVert. This will help us show that Vk(T)V_{k}(T) and Vk(T¯)V_{k}(\overline{T}) are “close”.

Lemma 4.12 (Bounding TT¯\left\lVert T-\overline{T}\right\rVert).

Let P=(Lsym+τ+I)P=(L^{-}_{sym}+\tau^{+}I), P¯=(Lsym¯+τ+I)\;\overline{P}=(\overline{L^{-}_{sym}}+\tau^{+}I), Q=(Lsym++τI)\;Q=(L^{+}_{sym}+\tau^{-}I), and Q¯=(Lsym+¯+τI)\;\overline{Q}=(\overline{L^{+}_{sym}}+\tau^{-}I). Assume that PP¯ΔP\left\lVert P-\overline{P}\right\rVert\leqslant\Delta_{P}, and QQ¯ΔQ\;\left\lVert Q-\overline{Q}\right\rVert\leqslant\Delta_{Q}. Then it holds true that

TT¯(αs++ΔQ)τ+(ΔPτ++2ΔPτ+)+ΔQτ+\left\lVert T-\overline{T}\right\rVert\leqslant\frac{(\alpha_{s}^{+}+\Delta_{Q})}{\tau^{+}}\left(\frac{\Delta_{P}}{\tau^{+}}+2\sqrt{\frac{\Delta_{P}}{\tau^{+}}}\right)+\frac{\Delta_{Q}}{\tau^{+}}

where αs+=1+τ+p(1η)ds+\alpha_{s}^{+}=1+\tau^{-}+\frac{p(1-\eta)}{d_{s}^{+}} (see Lemma 4.1).

Proof.

Since $P,\overline{P},Q,\overline{Q}$ are positive definite, Proposition C.2 yields the bound

TT¯P1Q((P¯)1P¯P+2(P¯)1/2P¯P1/2)+(P¯)1QQ¯.\displaystyle\left\lVert T-\overline{T}\right\rVert\leqslant\left\lVert P^{-1}\right\rVert\left\lVert Q\right\rVert\left(\left\lVert(\overline{P})^{-1}\right\rVert\left\lVert\overline{P}-P\right\rVert+2\left\lVert({\overline{P}})^{-1/2}\right\rVert\left\lVert\overline{P}-P\right\rVert^{1/2}\right)+\left\lVert(\overline{P})^{-1}\right\rVert\left\lVert Q-\overline{Q}\right\rVert. (43)

We know that P1=1/τ+=P¯1\left\lVert P^{-1}\right\rVert=1/\tau^{+}=\left\lVert\overline{P}^{-1}\right\rVert and (P¯)1/2=1/τ+\left\lVert(\overline{P})^{-1/2}\right\rVert=1/\sqrt{\tau^{+}}. Moreover, QQ¯+ΔQ\left\lVert Q\right\rVert\leqslant\left\lVert\overline{Q}\right\rVert+\Delta_{Q} by Weyl’s inequality [Wey12] (see Appendix B). Hence (43) simplifies to

TT¯(Q¯+ΔQ)τ+(ΔPτ++2ΔPτ+)+ΔQτ+(αs++ΔQ)τ+(ΔPτ++2ΔPτ+)+ΔQτ+,\displaystyle\left\lVert T-\overline{T}\right\rVert\leqslant\frac{(\left\lVert\overline{Q}\right\rVert+\Delta_{Q})}{\tau^{+}}\left(\frac{\Delta_{P}}{\tau^{+}}+2\sqrt{\frac{\Delta_{P}}{\tau^{+}}}\right)+\frac{\Delta_{Q}}{\tau^{+}}\leqslant\frac{(\alpha_{s}^{+}+\Delta_{Q})}{\tau^{+}}\left(\frac{\Delta_{P}}{\tau^{+}}+2\sqrt{\frac{\Delta_{P}}{\tau^{+}}}\right)+\frac{\Delta_{Q}}{\tau^{+}}\,,

where the last inequality can be verified by examining the expression of Q¯\overline{Q} in (24), and noting from the definition of C+C^{+} that C+<max{α1+,,αk+}=αs+\left\lVert C^{+}\right\rVert<\max\left\{\alpha_{1}^{+},...,\alpha_{k}^{+}\right\}=\alpha_{s}^{+} holds (via Weyl’s inequality). ∎
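The bound of Lemma 4.12 is easy to evaluate numerically. The sketch below does so for given $\Delta_{P},\Delta_{Q},\tau^{+},\alpha_{s}^{+}$; the function name and the example values are assumptions made for illustration.

```python
import math


def t_perturbation_bound(delta_p, delta_q, tau_plus, alpha_s_plus):
    """Right-hand side of the bound on ||T - T_bar|| in Lemma 4.12."""
    x = delta_p / tau_plus
    return ((alpha_s_plus + delta_q) / tau_plus) * (x + 2 * math.sqrt(x)) + delta_q / tau_plus


# Hypothetical example: tau^+ = 1, alpha_s^+ = 2, Delta_P = Delta_Q = 0.01.
print(t_perturbation_bound(0.01, 0.01, 1.0, 2.0))
```

For small perturbations the term $2\sqrt{\Delta_{P}/\tau^{+}}$ dominates, which is the source of the fourth-root rate that appears later in (46).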

4.4 Estimating Vk(T¯)V_{k}(\overline{T}) and Gk(T¯)G_{k}(\overline{T}) up to a rotation

We are now ready to combine the results of the previous sections to show that if $n,p$ are large enough, then the distance between the subspaces spanned by $V_{k}(T)$ and $V_{k}(\overline{T})$ is small, i.e., there exists an orthogonal matrix $O$ such that $V_{k}(T)$ is close to $V_{k}(\overline{T})O$. For $\tau^{+},\tau^{-}$ chosen suitably, we have seen in Lemma 4.8 that $V_{k}(\overline{T})=\Theta R$ for a rotation $R$; this suggests that the rows of $V_{k}(T)$ will then also approximately preserve the clustering structure of $V_{k}(\overline{T})$.

With $P,\overline{P},Q,\overline{Q}$ as defined in Lemma 4.12, recall from (4), (7) that $G_{k}(T),G_{k}(\overline{T})$ can be written as

Gk(T¯)=P¯1/2Vk(T¯),Gk(T)=P1/2Vk(T).G_{k}(\overline{T})=\overline{P}^{-1/2}V_{k}(\overline{T}),\quad G_{k}(T)=P^{-1/2}V_{k}(T). (44)

Therefore if Vk(T¯)=ΘRV_{k}(\overline{T})=\Theta R, then using the expression for P¯\overline{P} from (24) we see that Gk(T¯)=Θ(C)1/2RG_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R, and thus the rows of Gk(T¯)G_{k}(\overline{T}) also preserve the ground truth clustering structure. Moreover, if Vk(T)Vk(T¯)O\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert is small, then it can be shown to imply a bound on Gk(T)Gk(T¯)O\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert. Hence the rows of Gk(T)G_{k}(T) will approximately preserve the clustering structure of Gk(T¯)G_{k}(\overline{T}).

Before stating the theorem, let us define the terms

C1(τ+,τ)=3((3+τ)(2τ++1)+τ+(τ+)2),C2(s,η,l)=max{1s(12η)+2η,21l}.C_{1}(\tau^{+},\tau^{-})=3\left(\frac{(3+\tau^{-})(2\sqrt{\tau^{+}}+1)+\tau^{+}}{(\tau^{+})^{2}}\right),\quad C_{2}(s,\eta,l)=\max\left\{\frac{1}{s(1-2\eta)+2\eta},\frac{2}{1-l}\right\}. (45)
Theorem 4.13.

Assuming $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}$, suppose $\tau^{+}>0,\tau^{-}\geqslant 0$ are chosen to satisfy

$$\tau^{+}>\frac{16\eta}{\beta s(1-2\eta)},\quad\tau^{-}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^{+}}{8}\right\}$$

where $\beta,\eta$ satisfy one of the following conditions.

  1. $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and $0<\eta<\frac{1}{2}$, or

  2. $\beta=\frac{1}{2}$ and $\eta\leqslant\frac{s}{2s+4}$.

Then $V_{k}(\overline{T})=\Theta R$ and $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R$ where $R$ is a rotation matrix, and $C^{-}\succ 0$ is as defined in (18). Moreover, for any $\varepsilon,\delta\in(0,1)$, there exists a constant $\widetilde{c}_{\varepsilon}>0$ such that the following is true. If $p$ satisfies

$$p\geqslant\max\left\{\widetilde{c}_{\varepsilon}C_{2}(s,\eta,l),\frac{256C_{1}^{4}(\tau^{+},\tau^{-})(2+\tau^{+})^{4}}{\delta^{4}(1+\tau^{-})^{4}(1-\beta)^{4}}C_{2}(s,\eta,l),\frac{81}{(1-l)\delta^{4}}\right\}\frac{\ln(4n/\varepsilon)}{n}$$

with $C_{1}(\cdot),C_{2}(\cdot)$ as in (45), then with probability at least $1-2\varepsilon$, there exists an orthogonal matrix $O\in\mathbb{R}^{k\times k}$ such that

$$\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert\leqslant\delta,\qquad\text{and}\qquad\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\frac{\delta}{(\tau^{+})^{2}}.$$
Proof.

We will first simplify the upper bound on TT¯\left\lVert T-\overline{T}\right\rVert in Lemma 4.12, starting by bounding αs+\alpha_{s}^{+}. If n2(1η)s(12η)n\geqslant\frac{2(1-\eta)}{s(1-2\eta)}, it is easy to verify that (1η)pds+1\frac{(1-\eta)p}{d_{s}^{+}}\leqslant 1 which implies αs+2+τ\alpha_{s}^{+}\leqslant 2+\tau^{-}. Moreover, we observe from Lemma 4.11 that ΔP,ΔQ1\Delta_{P},\Delta_{Q}\leqslant 1 is ensured if pc~εC2(s,η,l)ln(4n/ε)np\geqslant\widetilde{c}_{\varepsilon}C_{2}(s,\eta,l)\frac{\ln(4n/\varepsilon)}{n} where c~ε=max{24,cε}\widetilde{c}_{\varepsilon}=\max\left\{24,c_{\varepsilon}\right\}. These considerations altogether imply

TT¯(3+τ)(2τ++1)(τ+)2ΔP+ΔQτ+\displaystyle\left\lVert T-\overline{T}\right\rVert\leqslant\frac{(3+\tau^{-})(2\sqrt{\tau^{+}}+1)}{(\tau^{+})^{2}}\sqrt{\Delta_{P}}+\frac{\Delta_{Q}}{\tau^{+}} (3+τ)(2τ++1)+τ+(τ+)2max{ΔP,ΔQ}\displaystyle\leqslant\frac{(3+\tau^{-})(2\sqrt{\tau^{+}}+1)+\tau^{+}}{(\tau^{+})^{2}}\max\left\{\sqrt{\Delta_{P}},\sqrt{\Delta_{Q}}\right\}
C1(τ+,τ)C21/4(s,η,l)(ln(4n/ε)np)1/4\displaystyle\leqslant C_{1}(\tau^{+},\tau^{-})C_{2}^{1/4}(s,\eta,l)\left(\frac{\ln(4n/\varepsilon)}{np}\right)^{1/4} (46)

where in the penultimate inequality we used ΔQΔQ\Delta_{Q}\leqslant\sqrt{\Delta_{Q}}, and the last inequality uses Lemma 4.11.

Next, we will use the Davis-Kahan theorem [DK70] (see Appendix B) for bounding the distance $\left\lVert(I-V_{k}(\overline{T})V_{k}(\overline{T})^{T})V_{k}(T)\right\rVert$. Applied to our setup, it yields

$$\left\lVert(I-V_{k}(\overline{T})V_{k}(\overline{T})^{T})V_{k}(T)\right\rVert\leqslant\frac{\left\lVert T-\overline{T}\right\rVert}{\lambda_{n-k}(T)-\lambda_{n-k+1}(\overline{T})}, \tag{47}$$

provided $\lambda_{n-k}(T)-\lambda_{n-k+1}(\overline{T})>0$. From Weyl’s inequality, we know that $\lambda_{n-k}(T)\geqslant\lambda_{n-k}(\overline{T})-\left\lVert T-\overline{T}\right\rVert$. Moreover, under the stated conditions on $\tau^{+},\tau^{-}$, we obtain from Lemma 4.8 the bound

$$\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})\geqslant(1-\beta)\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}\geqslant(1-\beta)\left(\frac{1+\tau^{-}}{2+\tau^{+}}\right),$$

where in the last inequality we used the simplifications $p(1-\eta)/d_{l}^{+}\geqslant 0$ and $p\eta/d_{l}^{-}\leqslant 1$ in the expressions for $\alpha_{l}^{+},\alpha_{l}^{-}$. Hence using (46), we observe that if

C1(τ+,τ)C21/4(s,η,l)(ln(4n/ε)np)1/4(1β2)(1+τ2+τ+)p(16C14(τ+,τ)C2(s,η,l)(2+τ+)4(1+τ)4(1β)4)ln(4n/ε)n,C_{1}(\tau^{+},\tau^{-})C_{2}^{1/4}(s,\eta,l)\left(\frac{\ln(4n/\varepsilon)}{np}\right)^{1/4}\leqslant\left(\frac{1-\beta}{2}\right)\left(\frac{1+\tau^{-}}{2+\tau^{+}}\right)\iff p\geqslant\left(\frac{16C_{1}^{4}(\tau^{+},\tau^{-})C_{2}(s,\eta,l)(2+\tau^{+})^{4}}{(1+\tau^{-})^{4}(1-\beta)^{4}}\right)\frac{\ln(4n/\varepsilon)}{n},

then the RHS of (47) can be bounded as

(IVk(T¯)Vk(T¯)T)Vk(T)\displaystyle\left\lVert(I-V_{k}(\overline{T})V_{k}(\overline{T})^{T})V_{k}(T)\right\rVert 2(2+τ+)(1+τ)(1β)C1(τ+,τ)C21/4(s,η,l)(ln(4n/ε)np)1/4.\displaystyle\leqslant\frac{2(2+\tau^{+})}{(1+\tau^{-})(1-\beta)}C_{1}(\tau^{+},\tau^{-})C_{2}^{1/4}(s,\eta,l)\left(\frac{\ln(4n/\varepsilon)}{np}\right)^{1/4}.

It follows that there exists an orthogonal matrix $O\in\mathbb{R}^{k\times k}$ so that

$$\begin{aligned}\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert&\leqslant 2\left\lVert(I-V_{k}(\overline{T})V_{k}(\overline{T})^{T})V_{k}(T)\right\rVert\quad(\text{using Proposition B.3})\\&\leqslant\frac{4(2+\tau^{+})}{(1+\tau^{-})(1-\beta)}C_{1}(\tau^{+},\tau^{-})C_{2}^{1/4}(s,\eta,l)\left(\frac{\ln(4n/\varepsilon)}{np}\right)^{1/4}\\&\leqslant\delta\end{aligned}$$

for the stated bound on $p$. This establishes the first part of the Theorem.

In order to bound Gk(T)Gk(T¯)O\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert, we obtain from (44) that

Gk(T)Gk(T¯)O\displaystyle\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert =P1/2(Vk(T)Vk(T¯)O)+(P1/2P¯1/2)Vk(T¯)O\displaystyle=\left\lVert P^{-1/2}(V_{k}(T)-V_{k}(\overline{T})O)+(P^{-1/2}-\overline{P}^{-1/2})V_{k}(\overline{T})O\right\rVert
P1/2(τ+)1/2Vk(T)Vk(T¯)Oδ+P1/2P¯1/2Vk(T¯)=1\displaystyle\leqslant\underbrace{\left\lVert P^{-1/2}\right\rVert}_{(\tau^{+})^{-1/2}}\underbrace{\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert}_{\leqslant\delta}+\left\lVert P^{-1/2}-\overline{P}^{-1/2}\right\rVert\underbrace{\left\lVert V_{k}(\overline{T})\right\rVert}_{=1}
δτ++P1/2P¯1/2.\displaystyle\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\left\lVert P^{-1/2}-\overline{P}^{-1/2}\right\rVert. (48)

The term P1/2P¯1/2\left\lVert P^{-1/2}-\overline{P}^{-1/2}\right\rVert can be bounded as

P1/2P¯1/2=P1(P1/2P¯1/2)P¯1P1/2P¯1/2(τ+)2PP¯1/2(τ+)23(τ+)2[ln(4n/ε)np(1l)]1/4,\displaystyle\left\lVert P^{-1/2}-\overline{P}^{-1/2}\right\rVert=\left\lVert P^{-1}(P^{1/2}-\overline{P}^{1/2})\overline{P}^{-1}\right\rVert\leqslant\frac{\left\lVert P^{1/2}-\overline{P}^{1/2}\right\rVert}{(\tau^{+})^{2}}\leqslant\frac{\left\lVert P-\overline{P}\right\rVert^{1/2}}{(\tau^{+})^{2}}\leqslant\frac{3}{(\tau^{+})^{2}}\left[\frac{\ln(4n/\varepsilon)}{np(1-l)}\right]^{1/4}, (49)

where the penultimate inequality uses Proposition C.1, and the last inequality follows from Lemma 4.11 with a minor simplification of the constant. Plugging (49) in (48) leads to the stated bound for p81(1l)δ4ln(4n/ε)np\geqslant\frac{81}{(1-l)\delta^{4}}\frac{\ln(4n/\varepsilon)}{n}. ∎
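The sample-complexity condition of Theorem 4.13 can be evaluated with a few lines of code. The sketch below computes $C_{1}$, $C_{2}$ from (45) and the right-hand side of the condition on $p$. The function names are assumptions made for illustration, and `c_tilde` is a placeholder for the constant $\widetilde{c}_{\varepsilon}$, whose value is not explicit in the theorem.

```python
import math


def c1(tau_plus, tau_minus):
    """C_1(tau^+, tau^-) from (45)."""
    num = (3 + tau_minus) * (2 * math.sqrt(tau_plus) + 1) + tau_plus
    return 3 * num / tau_plus ** 2


def c2(s, eta, l):
    """C_2(s, eta, l) from (45)."""
    return max(1 / (s * (1 - 2 * eta) + 2 * eta), 2 / (1 - l))


def min_edge_probability(n, s, l, eta, beta, tau_plus, tau_minus, delta, eps, c_tilde):
    """Right-hand side of the condition on p in Theorem 4.13.

    c_tilde is a stand-in for the unspecified constant (an assumption).
    """
    gap_factor = (2 + tau_plus) / ((1 + tau_minus) * (1 - beta))
    term_a = c_tilde * c2(s, eta, l)
    term_b = 256 * c1(tau_plus, tau_minus) ** 4 * gap_factor ** 4 * c2(s, eta, l) / delta ** 4
    term_c = 81 / ((1 - l) * delta ** 4)
    return max(term_a, term_b, term_c) * math.log(4 * n / eps) / n


# Hypothetical example; c_tilde = 24 is an arbitrary illustrative choice.
print(min_edge_probability(10**6, 0.5, 0.5, 0.1, 0.5, 1.0, 0.0, 0.5, 0.01, 24.0))
```

A returned value above 1 means the condition cannot be met at that $n$; because the constants in the theorem are conservative, this can happen even for fairly large graphs.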

4.5 Clustering sparse graphs

We now turn our attention to the sparse regime where $p=o(\ln n)/n$. In this regime, Lemma 4.11 is no longer applicable since it requires $p=\Omega\left(\frac{\ln n}{n}\right)$. In fact, it is not difficult to see that the matrices $L^{\pm}_{sym}$ will not concentrate around $\overline{L^{\pm}_{sym}}$ in this sparsity regime. To circumvent this issue, we will aim to show that the normalized Laplacians $L^{\pm}_{sym,\gamma^{\pm}}$ corresponding to the regularized adjacency matrices $A_{\gamma^{\pm}}^{\pm}:=A^{\pm}+\frac{\gamma^{\pm}}{n}\mathds{1}\mathds{1}^{\top}$ concentrate around $\overline{L^{\pm}_{sym}}$, for carefully chosen values of $\gamma^{+},\gamma^{-}$.
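The regularization step can be sketched in a few lines. The code below forms $A_{\gamma}=A+\frac{\gamma}{n}\mathds{1}\mathds{1}^{\top}$ and the corresponding normalized Laplacian $I-D_{\gamma}^{-1/2}A_{\gamma}D_{\gamma}^{-1/2}$ using plain Python lists. The function name and the toy graph are assumptions made for illustration; a practical implementation would use a sparse linear algebra library instead.

```python
import math


def regularized_sym_laplacian(A, gamma):
    """Normalized Laplacian of A_gamma = A + (gamma / n) * ones * ones^T.

    A is a symmetric adjacency matrix given as a list of lists; gamma > 0.
    """
    n = len(A)
    A_reg = [[A[i][j] + gamma / n for j in range(n)] for i in range(n)]
    # Every degree grows by exactly gamma, so isolated nodes get degree gamma > 0.
    deg = [sum(row) for row in A_reg]
    return [[(1.0 if i == j else 0.0) - A_reg[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]


# Toy graph: one edge between nodes 0 and 1, node 2 isolated.
A = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
for row in regularized_sym_laplacian(A, 1.5):
    print(row)
```

Without regularization, the isolated node has degree zero and the normalized Laplacian is undefined for it; adding $\gamma/n$ to every entry removes this problem, which is one intuition for why regularization helps in the sparse regime.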

To show this, we rely on the following theorem from [LLV17], which states that the symmetric Laplacian Lsym,γL_{sym,\gamma} of the regularized adjacency matrix Aγ:=A+γn𝟙𝟙A_{\gamma}:=A+\frac{\gamma}{n}\mathds{1}\mathds{1}^{\top} is close to the symmetric Laplacian Lsym,γ¯\overline{L_{sym,\gamma}} of the expected regularized adjacency matrix, for inhomogeneous Erdős-Rényi graphs.

Theorem 4.14 (Theorem 4.1 of [LLV17]).

Consider a random graph from the inhomogeneous Erdős-Rényi model ($G=(n,p_{ij})$), and let $d=\max_{i,j}np_{ij}$. Choose a number $\gamma>0$. Then, for any $r\geqslant 1$, $C$ being an absolute constant, with probability at least $1-e^{-r}$

$$\left\lVert L_{sym,\gamma}-\overline{L_{sym,\gamma}}\right\rVert\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{d}{\gamma}\right)^{5/2}\,. \tag{50}$$

The above result leads to a bound on the distance between $L_{sym,\gamma}$ and the normalized Laplacian $\overline{L_{sym}}$ of the expected (un-regularized) adjacency matrix.

Theorem 4.15 (Concentration of Regularized Laplacians).

Consider a random graph from the inhomogeneous Erdős-Rényi model ($G=(n,p_{ij})$), and let $d=\max_{i,j}np_{ij}$, $d_{\min}=\min_{i}\sum_{j}p_{ij}$. Choose a number $\gamma>0$. Then, for any $r\geqslant 1$, $C$ being an absolute constant, with probability at least $1-e^{-r}$

$$\left\lVert L_{sym,\gamma}-\overline{L_{sym}}\right\rVert\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{d}{\gamma}\right)^{5/2}+3\sqrt{\frac{\gamma}{d_{\min}+\gamma}}\,. \tag{51}$$
Proof.

To establish the theorem, we use the triangle inequality $\left\lVert L_{sym,\gamma}-\overline{L_{sym}}\right\rVert\leqslant\left\lVert L_{sym,\gamma}-\overline{L_{sym,\gamma}}\right\rVert+\left\lVert\overline{L_{sym,\gamma}}-\overline{L_{sym}}\right\rVert$. The first term on the RHS is bounded by Theorem 4.14 (which holds with probability at least $1-e^{-r}$). To bound the second term on the RHS, note that

Lsym,γ¯Lsym¯\displaystyle\left\lVert\overline{L_{sym,\gamma}}-\overline{L_{sym}}\right\rVert =D¯1/2A¯D¯1/2D¯γ1/2A¯γD¯γ1/2\displaystyle=\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}_{\gamma}\overline{D}_{\gamma}^{-1/2}\right\rVert
=D¯1/2A¯D¯1/2D¯γ1/2A¯D¯γ1/2+D¯γ1/2A¯D¯γ1/2D¯γ1/2A¯γD¯γ1/2\displaystyle=\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}+\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}_{\gamma}\overline{D}_{\gamma}^{-1/2}\right\rVert
D¯1/2A¯D¯1/2D¯γ1/2A¯D¯γ1/2+D¯γ1/2A¯D¯γ1/2D¯γ1/2A¯γD¯γ1/2.\displaystyle\leqslant\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}\right\rVert+\left\lVert\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}_{\gamma}\overline{D}_{\gamma}^{-1/2}\right\rVert\,.

The second term of the inequality can be easily bounded as follows.

D¯γ1/2A¯D¯γ1/2D¯γ1/2A¯γD¯γ1/2D¯γ1/22A¯A¯γγdmin+γγdmin+γ.\left\lVert\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}_{\gamma}\overline{D}_{\gamma}^{-1/2}\right\rVert\leqslant\left\lVert\overline{D}_{\gamma}^{-1/2}\right\rVert^{2}\left\lVert\overline{A}-\overline{A}_{\gamma}\right\rVert\leqslant\frac{\gamma}{d_{\min}+\gamma}\leqslant\sqrt{\frac{\gamma}{d_{\min}+\gamma}}\,.

To analyse the first term, we observe that

D¯1/2A¯D¯1/2D¯γ1/2A¯D¯γ1/2\displaystyle\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}\right\rVert =D¯1/2A¯D¯1/2D¯γ1/2D¯1/2D¯1/2A¯D¯1/2D¯1/2D¯γ1/2\displaystyle=\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{D}^{1/2}\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2}\right\rVert
=(ILsym¯)(ID¯1/2D¯γ1/2)+(ID¯γ1/2D¯1/2)(ILsym¯)D¯1/2D¯γ1/2\displaystyle=\left\lVert(I-\overline{L_{sym}})(I-\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2})+(I-\overline{D}_{\gamma}^{-1/2}\overline{D}^{1/2})(I-\overline{L_{sym}})\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2}\right\rVert
ID¯1/2D¯γ1/2+ID¯γ1/2D¯1/2D¯1/2D¯γ1/2\displaystyle\leqslant\left\lVert I-\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2}\right\rVert+\left\lVert I-\overline{D}_{\gamma}^{-1/2}\overline{D}^{1/2}\right\rVert\left\lVert\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2}\right\rVert
(1dmindmin+γ)+(1dmindmin+γ)\displaystyle\leqslant\left(1-\sqrt{\frac{d_{\min}}{d_{\min}+\gamma}}\right)+\left(1-\sqrt{\frac{d_{\min}}{d_{\min}+\gamma}}\right)
2γdmin+γ,\displaystyle\leqslant 2\sqrt{\frac{\gamma}{d_{\min}+\gamma}}\,,

where in the first inequality we use the fact that ILsym¯1\left\lVert I-\overline{L_{sym}}\right\rVert\leqslant 1, and in the last inequality we use the fact that for two numbers a,b>0a,b>0 if a>ba>b then abab\sqrt{a}-\sqrt{b}\leqslant\sqrt{a-b}. We have all the components to plug into the triangle inequality, which yields the desired statement of the theorem. ∎
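For completeness, the last step of the chain above can be written out explicitly. The derivation below applies the elementary inequality $\sqrt{a}-\sqrt{b}\leqslant\sqrt{a-b}$ with $a=d_{\min}+\gamma$ and $b=d_{\min}$; it is a worked illustration rather than part of the original proof.

```latex
\begin{align*}
1-\sqrt{\frac{d_{\min}}{d_{\min}+\gamma}}
 &= \frac{\sqrt{d_{\min}+\gamma}-\sqrt{d_{\min}}}{\sqrt{d_{\min}+\gamma}} \\
 &\leqslant \frac{\sqrt{(d_{\min}+\gamma)-d_{\min}}}{\sqrt{d_{\min}+\gamma}}
  = \sqrt{\frac{\gamma}{d_{\min}+\gamma}}.
\end{align*}
```

Applying this to both summands gives the factor $2$; adding the bound on the other term yields the constant $3$ in (51).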

We now translate Theorem 4.15 to our setting for G+,GG^{+},G^{-} and show that if p=Ω(1/n)p=\Omega(1/n) for nn large enough, then for the choices γ+,γ(np)6/7\gamma^{+},\gamma^{-}\asymp(np)^{6/7}, the bounds Lsym,γ±±Lsym±¯=O(1(np)1/14)\left\lVert L^{\pm}_{sym,\gamma^{\pm}}-\overline{L^{\pm}_{sym}}\right\rVert=O\left(\frac{1}{(np)^{1/14}}\right) hold with sufficiently high probability.

Lemma 4.16.

Let $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-\eta)(1-l)}\right\}$ and $p\geqslant\frac{1}{n(1-\eta)}$. Then for the choices $\gamma^{+},\gamma^{-}=[np(1-\eta)]^{6/7}$, and any $r\geqslant 1$, there exists a constant $C>0$ such that with probability at least $1-2e^{-r}$, it holds true that

$$\left\lVert L^{+}_{sym,\gamma^{+}}-\overline{L^{+}_{sym}}\right\rVert\leqslant\left(2^{5/2}Cr^{2}+\frac{3\sqrt{2}}{\sqrt{s(1-2\eta)+2\eta}}\right)\frac{1}{[np(1-\eta)]^{1/14}}, \tag{52}$$
$$\left\lVert L^{-}_{sym,\gamma^{-}}-\overline{L^{-}_{sym}}\right\rVert\leqslant\left(2^{5/2}Cr^{2}+\frac{6}{\sqrt{1-l}}\right)\frac{1}{[np(1-\eta)]^{1/14}}. \tag{53}$$
Proof.

We will apply Theorem 4.15 to the subgraphs $G^{+},G^{-}$. Let $d^{\pm}$ denote the quantity $\max_{i,j}np_{ij}$, and $d_{\min}^{\pm}$ the minimum expected degree, for the positive and negative subgraphs, respectively. From the SSBM model, it can be verified that $d^{\pm}=np(1-\eta)$. We also know that $d_{\min}^{+}=d_{s}^{+}$ and $d_{\min}^{-}=d_{l}^{-}$, where for the stated condition on $n$, $d_{s}^{+},d_{l}^{-}$ satisfy the bounds in (42). The latter can be written as

$$d_{\min}^{+}\geqslant\frac{d^{+}}{2}[s(1-2\eta)+2\eta],\qquad d_{\min}^{-}\geqslant\frac{d^{-}(1-l)}{4}.$$

Let us denote $C_{3}(s,\eta)=s(1-2\eta)+2\eta$ for convenience. In order to show (52), we obtain from Theorem 4.15 that, with probability at least $1-e^{-r}$,

$$\left\lVert L_{sym,\gamma^{+}}^{+}-\overline{L^{+}_{sym}}\right\rVert\leqslant\frac{Cr^{2}}{\sqrt{\gamma^{+}}}\left(1+\frac{d^{+}}{\gamma^{+}}\right)^{5/2}+3\sqrt{\frac{\gamma^{+}}{d_{\min}^{+}+\gamma^{+}}}\leqslant\frac{Cr^{2}}{\sqrt{\gamma^{+}}}\left(1+\frac{d^{+}}{\gamma^{+}}\right)^{5/2}+3\sqrt{\frac{2\gamma^{+}}{C_{3}(s,\eta)d^{+}}},$$

where the last inequality uses $d_{s}^{+}+\gamma^{+}\geqslant d_{s}^{+}\geqslant\frac{d^{+}}{2}C_{3}(s,\eta)$. Now note that if $\gamma^{+}\leqslant d^{+}$, then the above bound simplifies to

Lsym,γ++Lsym+¯25/2Cr2(d+)5/2(γ+)3+32C3(s,η)γ+d+.\left\lVert L_{sym,\gamma^{+}}^{+}-\overline{L^{+}_{sym}}\right\rVert\leqslant\frac{2^{5/2}Cr^{2}(d^{+})^{5/2}}{(\gamma^{+})^{3}}+\frac{3\sqrt{2}}{\sqrt{C_{3}(s,\eta)}}\sqrt{\frac{\gamma^{+}}{d^{+}}}. (54)

Choosing γ+\gamma^{+} such that (d+)5/2(γ+)3=γ+d+\frac{(d^{+})^{5/2}}{(\gamma^{+})^{3}}=\sqrt{\frac{\gamma^{+}}{d^{+}}}, or equivalently, γ+=(d+)6/7\gamma^{+}=(d^{+})^{6/7}, and plugging this in (54), we arrive at (52). Clearly, γ+d+\gamma^{+}\leqslant d^{+} is equivalent to the stated condition on pp. The bound in (53) follows in an identical manner and is omitted. ∎
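The balancing argument behind the choice $\gamma^{+}=(d^{+})^{6/7}$ can be checked numerically. The sketch below evaluates the two $\gamma$-dependent factors in (54) with their constants ($2^{5/2}Cr^{2}$ and $3\sqrt{2}/\sqrt{C_{3}(s,\eta)}$) dropped, since $C$ is not explicit. The function names and the example value of $d$ are assumptions made for illustration.

```python
import math


def regularization_level(n, p, eta):
    """gamma = [n p (1 - eta)]^(6/7), valid when n p (1 - eta) >= 1."""
    d = n * p * (1 - eta)
    if d < 1:
        raise ValueError("requires p >= 1 / (n (1 - eta))")
    return d ** (6 / 7)


def bound_terms(d, gamma):
    """The two gamma-dependent factors in (54), constants omitted."""
    concentration = d ** 2.5 / gamma ** 3   # from the first term of (54)
    bias = math.sqrt(gamma / d)             # from the second term of (54)
    return concentration, bias


# Hypothetical example: d = 128, so gamma = 128^(6/7) = 64.
d = 128.0
gamma = d ** (6 / 7)
print(gamma, bound_terms(d, gamma), d ** (-1 / 14))
```

At $\gamma=d^{6/7}$ both factors equal $d^{-1/14}$; a smaller $\gamma$ inflates the first factor and a larger $\gamma$ inflates the second.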

We are now in a position to write the bound on Tγ+,γT¯\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert in terms of Lsym,γ±±Lsym±¯\left\lVert L^{\pm}_{sym,\gamma^{\pm}}-\overline{L^{\pm}_{sym}}\right\rVert, in a completely analogous manner to Lemma 4.12.

Lemma 4.17 (Adapting Lemma 4.12 for the sparse regime).

Let Pγ=(Lsym,γ+τ+I)P_{\gamma^{-}}=(L^{-}_{sym,\gamma^{-}}+\tau^{+}I), P¯=(Lsym¯+τ+I)\;\overline{P}=(\overline{L^{-}_{sym}}+\tau^{+}I), Qγ+=(Lsym,γ+++τI)\;Q_{\gamma^{+}}=(L^{+}_{sym,\gamma^{+}}+\tau^{-}I), and Q¯=(Lsym+¯+τI)\;\overline{Q}=(\overline{L^{+}_{sym}}+\tau^{-}I). Assume that PγP¯ΔPγ\left\lVert P_{\gamma^{-}}-\overline{P}\right\rVert\leqslant\Delta_{P_{\gamma^{-}}}, Qγ+Q¯ΔQγ+\left\lVert Q_{\gamma^{+}}-\overline{Q}\right\rVert\leqslant\Delta_{Q_{\gamma^{+}}}. Then it holds true that

Tγ+,γT¯(αs++ΔQγ+)τ+(ΔPγτ++2ΔPγτ+)+ΔQγ+τ+,\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert\leqslant\frac{(\alpha_{s}^{+}+\Delta_{Q_{\gamma^{+}}})}{\tau^{+}}\left(\frac{\Delta_{P_{\gamma^{-}}}}{\tau^{+}}+2\sqrt{\frac{\Delta_{P_{\gamma^{-}}}}{\tau^{+}}}\right)+\frac{\Delta_{Q_{\gamma^{+}}}}{\tau^{+}},

where αs+=1+τ+p(1η)ds+\alpha_{s}^{+}=1+\tau^{-}+\frac{p(1-\eta)}{d_{s}^{+}} (see Lemma 4.1).

Next, we derive the main theorem for SPONGEsym in the sparse regime, which is the analogue of Theorem 4.13. The first part of the Theorem remains unchanged, i.e., for nn large enough and τ+,τ\tau^{+},\tau^{-} chosen suitably, we have Vk(T¯)=ΘRV_{k}(\overline{T})=\Theta R and Gk(T¯)=Θ(C)1/2RG_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R for a k×kk\times k rotation RR, and C0C^{-}\succ 0. The remaining arguments follow the same outline of Theorem 4.13, i.e., (a) using Lemma 4.17 and Lemma 4.16 to obtain a concentration bound on Tγ+,γT¯\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert (when p=Ω(1/n)p=\Omega(1/n)), and (b) using the Davis-Kahan theorem to show that the column span of Vk(Tγ+,γ)V_{k}(T_{\gamma^{+},\gamma^{-}}) is close to Vk(T¯)V_{k}(\overline{T}). The latter bound then implies that Gk(Tγ+,γ)G_{k}(T_{\gamma^{+},\gamma^{-}}) is close (up to a rotation) to Gk(T¯)G_{k}(\overline{T}), where we recall

Gk(T¯)=P¯1/2Vk(T¯),Gk(Tγ+,γ)=Pγ1/2Vk(Tγ+,γ)G_{k}(\overline{T})=\overline{P}^{-1/2}V_{k}(\overline{T}),\quad G_{k}(T_{\gamma^{+},\gamma^{-}})=P_{\gamma^{-}}^{-1/2}V_{k}(T_{\gamma^{+},\gamma^{-}}) (55)

with Pγ,P¯P_{\gamma^{-}},\overline{P} as defined in Lemma 4.17.

Theorem 4.18.

Assuming $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}$, suppose $\tau^{+}>0,\tau^{-}\geqslant 0$ are chosen to satisfy

$$\tau^{+}>\frac{16\eta}{\beta s(1-2\eta)},\quad\tau^{-}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^{+}}{8}\right\}$$

where $\beta,\eta$ satisfy one of the following conditions.

  1. $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and $0<\eta<\frac{1}{2}$, or

  2. $\beta=\frac{1}{2}$ and $\eta\leqslant\frac{s}{2s+4}$.

Then $V_{k}(\overline{T})=\Theta R$ and $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R$ where $R$ is a rotation matrix, and $C^{-}\succ 0$ is as defined in (18). Moreover, there exists a constant $C>0$ such that for $r\geqslant 1$ and $\delta\in(0,1)$, if $p$ satisfies

$$p\geqslant\max\left\{1,\left(\frac{4C_{1}(\tau^{+},\tau^{-})(2+\tau^{+})}{3(\tau^{+})^{2}(1-\beta)(1+\tau^{-})}\right)^{28}\right\}\frac{C_{4}^{14}(r,s,\eta,l)}{\delta^{28}(1-\eta)n},$$

and $\gamma^{+},\gamma^{-}=[np(1-\eta)]^{6/7}$, then with probability at least $1-2e^{-r}$, there exists a rotation $O\in\mathbb{R}^{k\times k}$ so that

$$\left\lVert V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O\right\rVert\leqslant\delta,\qquad\left\lVert G_{k}(T_{\gamma^{+},\gamma^{-}})-G_{k}(\overline{T})O\right\rVert\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\frac{\delta}{(\tau^{+})^{2}}.$$

Here, $C_{4}(r,s,\eta,l):=2^{5/2}Cr^{2}+3\sqrt{2C_{2}(s,\eta,l)}$ with $C_{2}(s,\eta,l)$ as defined in (45).

Proof.

We first simplify the upper bound on $\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert$ in Lemma 4.17. Note that $n\geqslant\frac{2(1-\eta)}{s(1-2\eta)}$ implies $\alpha_{s}^{+}\leqslant 2+\tau^{-}$, and moreover, we can bound $\left\lVert L^{\pm}_{sym,\gamma^{\pm}}-\overline{L^{\pm}_{sym}}\right\rVert$ uniformly (from (52), (53)) as

\[ \left\lVert L^{\pm}_{sym,\gamma^{\pm}}-\overline{L^{\pm}_{sym}}\right\rVert\leqslant\frac{2^{5/2}Cr^{2}+3\sqrt{2C_{2}(s,\eta,l)}}{[np(1-\eta)]^{1/14}}\leqslant\frac{C_{4}(r,s,\eta,l)}{[np(1-\eta)]^{1/14}}\ (=\Delta_{P_{\gamma^{-}}},\Delta_{Q_{\gamma^{+}}}). \tag{56} \]

Note that $\Delta_{P_{\gamma^{-}}},\Delta_{Q_{\gamma^{+}}}\leqslant 1$ if $p\geqslant\frac{C_{4}^{14}(r,s,\eta,l)}{n(1-\eta)}$. Under these considerations, the bound in Lemma 4.17 simplifies to

\[ \left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert\leqslant\frac{(3+\tau^{-})(2\sqrt{\tau^{+}}+1)+\tau^{+}}{(\tau^{+})^{2}}\max\left\{\sqrt{\Delta_{P_{\gamma^{-}}}},\sqrt{\Delta_{Q_{\gamma^{+}}}}\right\}=\frac{C_{1}(\tau^{+},\tau^{-})\sqrt{C_{4}(r,s,\eta,l)}}{3(\tau^{+})^{2}[np(1-\eta)]^{1/28}}. \]

Following the steps in the proof of Theorem 4.13, we observe that $\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert\leqslant\frac{1}{2}(\lambda_{n-k+1}(T_{\gamma^{+},\gamma^{-}})-\lambda_{n-k}(\overline{T}))$ is guaranteed to hold, provided

\[ \frac{C_{1}(\tau^{+},\tau^{-})\sqrt{C_{4}(r,s,\eta,l)}}{3(\tau^{+})^{2}[np(1-\eta)]^{1/28}}\leqslant\left(\frac{1-\beta}{2}\right)\left(\frac{1+\tau^{-}}{2+\tau^{+}}\right)\iff p\geqslant\left(\frac{2C_{1}(\tau^{+},\tau^{-})(2+\tau^{+})}{3(\tau^{+})^{2}(1-\beta)(1+\tau^{-})}\right)^{28}\frac{C_{4}^{14}(r,s,\eta,l)}{n(1-\eta)}. \]

Then, we obtain via the Davis-Kahan theorem that there exists an orthogonal matrix $O\in\mathbb{R}^{k\times k}$ such that

\[ \left\lVert V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O\right\rVert\leqslant\frac{4\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert}{\lambda_{n-k+1}(\overline{T})-\lambda_{n-k}(\overline{T})}\leqslant\frac{4C_{1}(\tau^{+},\tau^{-})\sqrt{C_{4}(r,s,\eta,l)}(2+\tau^{+})}{3(\tau^{+})^{2}[np(1-\eta)]^{1/28}(1-\beta)(1+\tau^{-})}\leqslant\delta, \]

for the stated bound on $p$ in the theorem. This establishes the first part of the theorem.
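The quantity $\left\lVert V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O\right\rVert$ can be evaluated numerically once a rotation $O$ is fixed. The sketch below is an illustration only: it assumes $O$ is chosen as the orthogonal Procrustes solution, which minimizes the Frobenius norm and is not necessarily the matrix produced by the Davis-Kahan argument. The function name `aligned_distance` is hypothetical.

```python
import numpy as np

def aligned_distance(V, V_bar):
    """Spectral norm of V - V_bar @ O, where O is the orthogonal
    Procrustes solution (an assumed choice of alignment)."""
    # SVD of the k x k cross-Gram matrix V_bar^T V = W diag(s) Zt.
    W, _, Zt = np.linalg.svd(V_bar.T @ V)
    O = W @ Zt  # orthogonal k x k matrix
    return np.linalg.norm(V - V_bar @ O, 2), O

# Sanity check: two orthonormal bases of the same subspace align exactly.
rng = np.random.default_rng(0)
V_bar, _ = np.linalg.qr(rng.standard_normal((10, 3)))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
dist, O = aligned_distance(V_bar @ Q, V_bar)
```

When both inputs span the same subspace, the returned distance is zero up to floating-point error, which is the situation the bound approaches as $\delta\to 0$.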

In order to bound $\left\lVert G_{k}(T_{\gamma^{+},\gamma^{-}})-G_{k}(\overline{T})O\right\rVert$, first observe that

\[ \begin{aligned} \left\lVert G_{k}(T_{\gamma^{+},\gamma^{-}})-G_{k}(\overline{T})O\right\rVert &=\left\lVert P_{\gamma^{-}}^{-1/2}(V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O)+(P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2})V_{k}(\overline{T})O\right\rVert \\ &\leqslant\underbrace{\left\lVert P_{\gamma^{-}}^{-1/2}\right\rVert}_{\leqslant(\tau^{+})^{-1/2}}\underbrace{\left\lVert V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O\right\rVert}_{\leqslant\delta}+\left\lVert P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2}\right\rVert\underbrace{\left\lVert V_{k}(\overline{T})\right\rVert}_{=1} \\ &\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\left\lVert P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2}\right\rVert. \end{aligned} \tag{57} \]

The second term $\left\lVert P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2}\right\rVert$ can be bounded as

\[ \left\lVert P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2}\right\rVert=\left\lVert P_{\gamma^{-}}^{-1}(P_{\gamma^{-}}^{1/2}-\overline{P}^{1/2})\overline{P}^{-1}\right\rVert\leqslant\frac{\left\lVert P_{\gamma^{-}}^{1/2}-\overline{P}^{1/2}\right\rVert}{(\tau^{+})^{2}}\leqslant\frac{\left\lVert P_{\gamma^{-}}-\overline{P}\right\rVert^{1/2}}{(\tau^{+})^{2}}\leqslant\frac{\sqrt{C_{4}(r,s,\eta,l)}}{(\tau^{+})^{2}[np(1-\eta)]^{1/28}}, \tag{58} \]

where the penultimate inequality uses Proposition C.1, and the last inequality uses (56). Plugging (58) into (57) leads to the stated bound for $p\geqslant\frac{C_{4}^{14}(r,s,\eta,l)}{n(1-\eta)\delta^{28}}$. ∎

4.6 Mis-clustering rate from $k$-means

We now analyze the mis-clustering error rate when we apply a $(1+\xi)$-approximate $k$-means algorithm (e.g., [KSS04]) to the rows of $G_{k}(T)$ (respectively, $G_{k}(T_{\gamma^{+},\gamma^{-}})$ in the sparse regime). To this end, we rely on the following result from [LR15] which, when applied to our setting, shows that the mis-clustering error is bounded by the estimation error $\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert_{F}^{2}$ (or $\left\lVert G_{k}(T_{\gamma^{+},\gamma^{-}})-G_{k}(\overline{T})O\right\rVert_{F}^{2}$ in the sparse setting). By a $(1+\xi)$-approximate algorithm, we mean an algorithm whose output has cost provably within a factor $(1+\xi)$ of the optimal $k$-means cost.

Lemma 4.19 (Lemma 5.3 of [LR15], Approximate $k$-means error bound).

For any $\xi>0$, and any two matrices $\overline{U},U$, such that $\overline{U}=\overline{\Theta}\overline{X}$ with $(\overline{\Theta},\overline{X})\in\mathbb{M}_{n\times k}\times\mathbb{R}^{k\times k}$, let $(\tilde{\Theta},\tilde{X})\in\mathbb{M}_{n\times k}\times\mathbb{R}^{k\times k}$ be a $(1+\xi)$-approximate solution to the $k$-means problem $\min_{\Theta\in\mathbb{M}_{n\times k},X\in\mathbb{R}^{k\times k}}\left\lVert\Theta X-U\right\rVert_{F}^{2}$ so that

\[ \left\lVert\tilde{\Theta}\tilde{X}-U\right\rVert_{F}^{2}\leqslant(1+\xi)\min_{\Theta\in\mathbb{M}_{n\times k},X\in\mathbb{R}^{k\times k}}\left\lVert\Theta X-U\right\rVert_{F}^{2} \]

and $\tilde{U}=\tilde{\Theta}\tilde{X}$. For any $\delta_{i}\leqslant\min_{i^{\prime}\neq i}\left\lVert\overline{X}_{i^{\prime}*}-\overline{X}_{i*}\right\rVert$, define $S_{i}=\left\{j\in C_{i}~:~\left\lVert\tilde{U}_{j*}-\overline{U}_{j*}\right\rVert\geqslant\delta_{i}/2\right\}$. Then

\[ \sum_{i=1}^{k}\left\lvert S_{i}\right\rvert\delta_{i}^{2}\leqslant 4(4+2\xi)\left\lVert U-\overline{U}\right\rVert_{F}^{2}\,. \tag{59} \]

Moreover, if

\[ (16+8\xi)\left\lVert U-\overline{U}\right\rVert_{F}^{2}/\delta_{i}^{2}<n_{i}\qquad\forall i\in[k]\,, \tag{60} \]

then there exists a $k\times k$ permutation matrix $\pi$ such that $\tilde{\Theta}_{G}=\overline{\Theta}_{G}\pi$, where $G=\cup_{i=1}^{k}(C_{i}\setminus S_{i})$.
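To make the sets $S_{i}$ concrete, the following sketch computes the separations $\delta_{i}=\min_{i^{\prime}\neq i}\left\lVert\overline{X}_{i^{\prime}*}-\overline{X}_{i*}\right\rVert$ and the index sets $S_{i}$ from given arrays. The helper names `separation` and `misclustered_sets` are hypothetical, and `U_tilde` stands for the output $\tilde{\Theta}\tilde{X}$ of any $k$-means routine; no approximation guarantee is implemented here.

```python
import numpy as np

def separation(X_bar):
    """delta_i = min over i' != i of ||X_bar[i'] - X_bar[i]|| (largest admissible choice)."""
    diff = np.linalg.norm(X_bar[:, None, :] - X_bar[None, :, :], axis=2)
    np.fill_diagonal(diff, np.inf)  # exclude i' = i from the minimum
    return diff.min(axis=1)

def misclustered_sets(U_tilde, U_bar, labels, delta):
    """S_i = {j in C_i : ||U_tilde[j] - U_bar[j]|| >= delta_i / 2}."""
    dist = np.linalg.norm(U_tilde - U_bar, axis=1)
    return [np.flatnonzero((labels == i) & (dist >= delta[i] / 2))
            for i in range(len(delta))]

# Toy example: two clusters with centroids at distance 2.
X_bar = np.array([[0.0, 0.0], [2.0, 0.0]])
labels = np.array([0, 0, 1, 1])          # ground-truth cluster of each row
U_bar = X_bar[labels]
U_tilde = U_bar.copy()
U_tilde[1] = [1.5, 0.0]                  # row 1 drifts past delta_0 / 2 = 1
delta = separation(X_bar)
S = misclustered_sets(U_tilde, U_bar, labels, delta)
```

In this toy example only row 1 is moved farther than $\delta_{0}/2$ from its population centroid, so $S_{0}=\{1\}$ and $S_{1}=\emptyset$.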

Combining Lemma 4.19 with the perturbation results of Theorem 4.13 and Theorem 4.18, we readily arrive at mis-clustering error bounds for SPONGEsym.

Theorem 4.20 (Mis-clustering error for SPONGEsym).

Under the notation and assumptions of Theorem 4.13, let $(\tilde{\Theta},\tilde{X})\in\mathbb{M}_{n\times k}\times\mathbb{R}^{k\times k}$ be a $(1+\xi)$-approximate solution to the $k$-means problem $\min_{\Theta\in\mathbb{M}_{n\times k},X\in\mathbb{R}^{k\times k}}\left\lVert\Theta X-G_{k}(T)\right\rVert_{F}^{2}$. Denoting

\[ S_{i}=\left\{j\in C_{i}\ :\ \left\lVert(\tilde{\Theta}\tilde{X})_{j*}-(\Theta(C^{-})^{-1/2}RO)_{j*}\right\rVert\geqslant\frac{1}{2\sqrt{n_{i}(\tau^{+}+\frac{2}{1-l})}}\right\}, \]

it holds with probability at least $1-2\varepsilon$ that

\[ \sum_{i=1}^{k}\frac{\left\lvert S_{i}\right\rvert}{n_{i}}\leqslant\delta^{2}(64+32\xi)k\left(\tau^{+}+\frac{2}{1-l}\right)\left(\frac{(\tau^{+})^{3}+1}{(\tau^{+})^{4}}\right). \]

In particular, if $\delta$ satisfies

\[ \delta<\frac{(\tau^{+})^{2}}{\sqrt{(64+32\xi)k(\tau^{+}+\frac{2}{1-l})((\tau^{+})^{3}+1)}}, \]

then there exists a $k\times k$ permutation matrix $\pi$ such that $\tilde{\Theta}_{G}=\hat{\Theta}_{G}\pi$, where $G=\cup_{i=1}^{k}(C_{i}\setminus S_{i})$.

In the sparse regime, the above statement holds under the notation and assumptions of Theorem 4.18 with $G_{k}(T)$ replaced with $G_{k}(T_{\gamma^{+},\gamma^{-}})$, and with probability at least $1-2e^{-r}$.

Proof.

Since $G_{k}(T)-G_{k}(\overline{T})O$ has rank at most $2k$, we obtain from Theorem 4.13 that

\[ \left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert_{F}\leqslant\sqrt{2k}\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert\leqslant\delta\sqrt{2k}\left(\frac{(\tau^{+})^{3/2}+1}{(\tau^{+})^{2}}\right). \tag{61} \]

We now use Lemma 4.19 with $U=G_{k}(T)$ and $\overline{U}=G_{k}(\overline{T})O$. It follows from (44) and Lemma 4.1 that $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R=\Theta\Delta\,\Delta^{-1}(C^{-})^{-1/2}R$ where $\Delta=\textrm{diag}(\sqrt{n_{1}},\dots,\sqrt{n_{k}})$. Denoting $\overline{X}=\Delta^{-1}(C^{-})^{-1/2}RO$ and $\hat{\Theta}=\Theta\Delta$, we can write $G_{k}(\overline{T})O=\hat{\Theta}\overline{X}$, where $\hat{\Theta}\in\mathbb{M}_{n\times k}$ is the ground truth membership matrix, and for each $i\neq i^{\prime}\in[k]$, it holds true that

\[ \left\lVert\overline{X}_{i*}-\overline{X}_{i^{\prime}*}\right\rVert\geqslant\lambda_{\min}((C^{-})^{-1/2})\sqrt{1/n_{i}+1/n_{i^{\prime}}}\geqslant\frac{1}{\sqrt{\lambda_{\max}(C^{-})n_{i}}}\,. \]

From (18), one can verify using Weyl’s inequality that

\[ \lambda_{\max}(C^{-})\leqslant 1+\tau^{+}+\max_{i}\frac{p}{d_{i}^{-}}(\eta_{i}+s_{i}n(1-2\eta))\leqslant\tau^{+}+\frac{2}{1-l}, \]

where the last inequality holds if $n\geqslant\frac{2\eta}{(1-l)(1-\eta)}$. The above considerations imply that we may take $\delta_{i}=\frac{1}{\sqrt{n_{i}(\tau^{+}+\frac{2}{1-l})}}$. Now with $S_{i}$ as defined in the statement, we obtain from (59) and (61) that

\[ \sum_{i=1}^{k}\left\lvert S_{i}\right\rvert\delta_{i}^{2}=\frac{1}{\tau^{+}+\frac{2}{1-l}}\sum_{i=1}^{k}\frac{\left\lvert S_{i}\right\rvert}{n_{i}}\leqslant\delta^{2}(32+16\xi)k\frac{((\tau^{+})^{3/2}+1)^{2}}{(\tau^{+})^{4}}\leqslant\delta^{2}(64+32\xi)k\left(\frac{(\tau^{+})^{3}+1}{(\tau^{+})^{4}}\right), \]

where the last inequality uses $(a+b)^{2}\leqslant 2(a^{2}+b^{2})$ for $a,b\geqslant 0$. This yields the first part of the theorem.

For the second part, we need to ensure that (60) holds. Using (61) and the expression for $\delta_{i}$, it is easy to verify that (60) holds for the stated condition on $\delta$.

Finally, the statement for the sparse regime follows in an analogous manner (replacing $G_{k}(T)$ with $G_{k}(T_{\gamma^{+},\gamma^{-}})$), by following the same steps as above. ∎
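As a quick numerical companion to Theorem 4.20, the sketch below evaluates the right-hand side of the mis-clustering bound and the threshold on $\delta$ under which exact recovery on $G$ is claimed. The function names and the sample parameter values are assumptions made for illustration; only the two formulas are taken from the theorem.

```python
import math

def misclustering_bound(delta, tau_plus, xi, k, l):
    """Right-hand side of the bound on sum_i |S_i| / n_i in Theorem 4.20."""
    return (delta ** 2 * (64 + 32 * xi) * k
            * (tau_plus + 2 / (1 - l))
            * ((tau_plus ** 3 + 1) / tau_plus ** 4))

def delta_threshold(tau_plus, xi, k, l):
    """Upper limit on delta in the second part of Theorem 4.20."""
    return tau_plus ** 2 / math.sqrt(
        (64 + 32 * xi) * k * (tau_plus + 2 / (1 - l)) * (tau_plus ** 3 + 1))

# Hypothetical parameter values, chosen only to exercise the formulas.
thr = delta_threshold(tau_plus=1.0, xi=0.5, k=3, l=0.5)
bound_at_thr = misclustering_bound(thr, tau_plus=1.0, xi=0.5, k=3, l=0.5)
```

By construction the threshold is exactly the value of $\delta$ at which the first bound equals $1$, so the condition on $\delta$ amounts to requiring that bound to be below $1$.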

5 Concentration results for the symmetric Signed Laplacian

This section contains proofs of the main results for the symmetric Signed Laplacian, in both the dense regime $p\gtrsim\frac{\ln n}{n}$ and the sparse regime $p\gtrsim\frac{1}{n}$. Before proceeding with an overview of the main steps, for ease of reference, we summarize in the table below the notation specific to this section.

| Notation | Description |
| --- | --- |
| $\overline{L_{sym}}$ | symmetric Signed Laplacian |
| $\mathcal{L}_{sym}$ | population Signed Laplacian |
| $L_{\gamma}$ | regularized Laplacian |
| $\mathcal{L}_{\gamma}$ | population regularized Laplacian |
| $\gamma^{+},\gamma^{-}>0$ | regularization parameters |
| $\gamma=\gamma^{+}+\gamma^{-}$ | |
| $\overline{\alpha}=1+\frac{p}{\overline{d}}(1-2\eta)$ | |
| $\bar{d}=p(n-1)$ | expected signed degree |
| $\rho=\frac{n_{min}}{n_{max}}=\frac{s}{l}$ | aspect ratio |

The proof of Theorem 3.4 is built on the following steps. In Section 5.1, we compute the eigen-decomposition of the Signed Laplacian of the expected graph $\mathcal{L}_{sym}$. Then in Section 5.2, we show that $\overline{L_{sym}}$ and $\mathcal{L}_{sym}$ are “close”, and obtain an upper bound on the error $\left\lVert\overline{L_{sym}}-\mathcal{L}_{sym}\right\rVert$. Finally, in Section 5.3, we use the Davis-Kahan theorem (see Theorem B.2) to bound the error between the subspaces $V_{k-1}(\overline{L_{sym}})$ and $V_{k-1}(\mathcal{L}_{sym})$. To prove Theorem 3.7, in Section 5.4, we first use a decomposition of the set of edges $[n]\times[n]$ and characterize the behaviour of the regularized Signed Laplacian on each subset. This leads in Section 5.5 to the error bounds of Theorem 3.7. The proof of Theorem 3.9, which bounds the error on the eigenspace, relies on the same arguments as that of Theorem 3.4 and can be found in Section 5.6. As for SPONGEsym, the mis-clustering error is obtained using a $(1+\xi)$-approximate solution of the $k$-means problem applied to the rows of $V_{k-1}(\overline{L_{sym}})$ (resp. $V_{k-1}(L_{\gamma})$). This solution contains, in particular, an estimated membership matrix $\tilde{\Theta}$. The bound on the mis-clustering error of the algorithm given in Theorem 3.12 is derived in Section 5.7, using Lemma 4.19 (Lemma 5.3 of [LR15]).

5.1 Analysis of the expected Signed Laplacian

In this section, we compute the eigen-decomposition of the matrix $\mathcal{L}_{sym}$. In particular, we aim at proving a lower bound on the eigengap between the $(k-1)^{th}$ and $k^{th}$ smallest eigenvalues. For equal-size clusters, there is an explicit expression for this eigengap.

5.1.1 Matrix decomposition

Lemma 5.1.

Let $\Theta\in\mathbb{R}^{n\times k}$ denote the normalized membership matrix in the SSBM. Let $V^{\perp}\in\mathbb{R}^{n\times(n-k)}$ be a matrix whose columns form any orthonormal basis of the subspace orthogonal to $\mathcal{R}(\Theta)$. The Signed Laplacian of the expected graph has the following decomposition

\[ \mathcal{L}_{sym}=[\Theta\>V^{\perp}]\begin{pmatrix}\overline{C}&0\\ 0&\overline{\alpha}I_{n-k}\end{pmatrix}\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix}, \tag{62} \]

with $\overline{C}=\overline{\alpha}I_{k}-\overline{B}$, $\overline{\alpha}=1+\frac{p}{\overline{d}}(1-2\eta)$, and where $\overline{B}$ is a $k\times k$ matrix such that

\[ \overline{B}_{ii^{\prime}}=\begin{cases}\frac{n_{i}p}{\overline{d}}(1-2\eta); &\text{ if }i=i^{\prime}\\ -\frac{\sqrt{n_{i}n_{i^{\prime}}}p}{\overline{d}}(1-2\eta); &\text{ if }i\neq i^{\prime}.\end{cases} \tag{63} \]
Proof.

On one hand, we recall from Section 2.3 that the expected degree matrix is a scaled identity matrix $\mathbb{E}[\overline{D}]=\overline{d}I_{n}$, with $\overline{d}=p(n-1)$. Thus, any nonzero vector $v\in\mathbb{R}^{n}$ is an eigenvector of $\mathbb{E}[\overline{D}]$ with corresponding eigenvalue $\overline{d}$, and it holds true that

\[ \mathbb{E}[\overline{D}]^{-1/2}=\frac{1}{\sqrt{\overline{d}}}I_{n}=\frac{1}{\sqrt{\overline{d}}}[\Theta\>V^{\perp}]\>I_{n}\>\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix}. \tag{64} \]

On the other hand, the expected signed adjacency matrix can be written in the form

\[ \mathbb{E}[A]=\mathbb{E}[A^{+}]-\mathbb{E}[A^{-}]=M-p(1-2\eta)I_{n}, \tag{65} \]

where

\[ M=\begin{bmatrix}p(1-2\eta)J_{n_{1}}&-p(1-2\eta)J_{n_{1}\times n_{2}}&\ldots&-p(1-2\eta)J_{n_{1}\times n_{k}}\\ -p(1-2\eta)J_{n_{2}\times n_{1}}&p(1-2\eta)J_{n_{2}}&\ldots&-p(1-2\eta)J_{n_{2}\times n_{k}}\\ \vdots&\vdots&\ddots&\vdots\\ -p(1-2\eta)J_{n_{k}\times n_{1}}&\ldots&\ldots&p(1-2\eta)J_{n_{k}}\end{bmatrix}. \]

The matrix $M$ has the following decomposition

\[ M=\overline{d}\,\Theta\overline{B}\Theta^{T}=\overline{d}\,[\Theta\>V^{\perp}]\begin{pmatrix}\overline{B}&0\\ 0&0\end{pmatrix}\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix}, \]

with $\overline{B}$ defined in (63). Thus, combining (64) and (65), we arrive at

\[ \mathbb{E}[\overline{D}]^{-1/2}\mathbb{E}[A]\mathbb{E}[\overline{D}]^{-1/2}=\frac{1}{\overline{d}}M-p(1-2\eta)\frac{1}{\overline{d}}I_{n}=[\Theta\>V^{\perp}]\begin{pmatrix}\overline{B}&0\\ 0&0\end{pmatrix}\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix}-(1-2\eta)\frac{p}{\overline{d}}I_{n}. \]

This finally leads to the decomposition of $\mathcal{L}_{sym}$

\[ \mathcal{L}_{sym}=I-\mathbb{E}[\overline{D}]^{-1/2}\mathbb{E}[A]\mathbb{E}[\overline{D}]^{-1/2}=[\Theta\>V^{\perp}]\begin{pmatrix}\overline{C}&0\\ 0&\overline{\alpha}I_{n-k}\end{pmatrix}\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix}, \]

with $\overline{C}=\overline{\alpha}I_{k}-\overline{B}$ and $\overline{\alpha}=1+\frac{p}{\overline{d}}(1-2\eta)$. ∎
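The decomposition (62) can be checked numerically on a small instance. The sketch below builds $\mathcal{L}_{sym}=I-\mathbb{E}[A]/\overline{d}$ directly from the block structure of $\mathbb{E}[A]$ and compares it with $\Theta\overline{C}\Theta^{T}+\overline{\alpha}(I-\Theta\Theta^{T})$, which is (62) rewritten using $V^{\perp}(V^{\perp})^{T}=I-\Theta\Theta^{T}$. The function names and the cluster sizes are illustrative assumptions.

```python
import numpy as np

def population_laplacian(sizes, p, eta):
    """L_sym = I - E[A] / d_bar for the SSBM with the given cluster sizes."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    same = labels[:, None] == labels[None, :]
    # E[A]: +p(1-2eta) within clusters, -p(1-2eta) across, zero diagonal.
    EA = p * (1 - 2 * eta) * np.where(same, 1.0, -1.0)
    np.fill_diagonal(EA, 0.0)
    return np.eye(n) - EA / (p * (n - 1)), labels

def block_decomposition(sizes, p, eta):
    """Theta, C_bar = alpha_bar * I - B_bar, and alpha_bar as in Lemma 5.1."""
    sizes = np.asarray(sizes)
    n, k = sizes.sum(), sizes.size
    labels = np.repeat(np.arange(k), sizes)
    Theta = np.zeros((n, k))
    Theta[np.arange(n), labels] = 1.0 / np.sqrt(sizes[labels])  # unit-norm columns
    c = p * (1 - 2 * eta) / (p * (n - 1))                        # p(1-2eta)/d_bar
    root = np.sqrt(sizes)
    B_bar = -c * np.outer(root, root)
    np.fill_diagonal(B_bar, c * sizes)
    alpha_bar = 1 + c
    return Theta, alpha_bar * np.eye(k) - B_bar, alpha_bar

sizes, p, eta = [3, 4, 5], 0.4, 0.1
L, _ = population_laplacian(sizes, p, eta)
Theta, C_bar, alpha_bar = block_decomposition(sizes, p, eta)
rebuilt = Theta @ C_bar @ Theta.T + alpha_bar * (np.eye(L.shape[0]) - Theta @ Theta.T)
```

The two matrices agree to machine precision, which confirms in particular that $\overline{\alpha}$ carries the factor $p/\overline{d}$.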

We can infer from Lemma 5.1 that the spectrum of $\mathcal{L}_{sym}$ is the union of the spectrum of the matrix $\overline{C}\in\mathbb{R}^{k\times k}$ and $\{\overline{\alpha}\}$. Moreover, denoting $u=\frac{1}{\sqrt{\overline{d}}}(\sqrt{n_{1}},\dots,\sqrt{n_{k}})^{T}$, we have $\overline{C}=p(1-2\eta)uu^{T}+\text{diag}\left(1+\frac{p}{\overline{d}}(1-2\eta)(1-2n_{i})\right)$. For a SSBM with equal-size clusters, we are able to find explicit expressions for the eigenvalues of $\overline{C}$.

5.1.2 Spectrum of the Signed Laplacian: equal-size clusters

In this section, we assume that the clusters in the SSBM have equal sizes $n_{1}=n_{2}=\dots=n_{k}=\frac{n}{k}$. In this case,

\[ \frac{1}{\sqrt{\overline{d}}}(\sqrt{n_{1}},\dots,\sqrt{n_{k}})^{T}=\sqrt{\frac{n}{\overline{d}}}\chi_{1}, \]

and denoting by $\overline{C}_{e}$ the matrix $\overline{C}$ in this setting of equal clusters, we may write

\[ \overline{C}_{e}=\frac{np}{\overline{d}}(1-2\eta)\chi_{1}\chi_{1}^{T}+\left(1+\frac{p}{\overline{d}}(1-2\eta)\left(1-2\frac{n}{k}\right)\right)I_{k}. \tag{66} \]

Hence, the spectrum of $\overline{C}_{e}$ contains only two different values. The largest one has multiplicity 1, with $\chi_{1}$ as the corresponding eigenvector. The $k-1$ remaining eigenvalues are all equal. In fact, we have

\[ \lambda_{i}(\overline{C}_{e})=\begin{cases}1+\frac{p}{\overline{d}}(1-2\eta)(n+1-2\frac{n}{k});&\text{if }i=1\\ 1+\frac{p}{\overline{d}}(1-2\eta)\left(1-2\frac{n}{k}\right);&\text{if }2\leqslant i\leqslant k.\end{cases} \]

One can easily check that these eigenvalues are positive, and that the following inequality holds true

\[ \lambda_{1}(\overline{C}_{e})=\overline{\alpha}+\frac{p}{\overline{d}}(1-2\eta)\left(n-2\frac{n}{k}\right)\geqslant\overline{\alpha}>\overline{\alpha}-\frac{2np}{k\overline{d}}(1-2\eta)=\lambda_{2}(\overline{C}_{e}). \]

We finally have

\[ \lambda_{j}(\mathcal{L}_{sym})=\begin{cases}1+\frac{p}{\overline{d}}(1-2\eta)(n+1-2\frac{n}{k});&\text{if }j=1\\ \overline{\alpha};&\text{if }2\leqslant j\leqslant n-k+1\\ \lambda_{2}(\overline{C}_{e});&\text{if }n-k+2\leqslant j\leqslant n.\end{cases} \]

Note that for $k=2$, $\lambda_{1}(\overline{C}_{e})=\overline{\alpha}$ and the spectrum of $\mathcal{L}_{sym}$ contains only two values $\{\overline{\alpha},\lambda_{2}(\overline{C}_{e})\}$. For $k>2$, $\lambda_{1}(\mathcal{L}_{sym})>\overline{\alpha}>\lambda_{2}(\overline{C}_{e})$. Writing the spectral decomposition

\[ \overline{C}_{e}=R\>\Lambda\>R^{T}=[R_{k-1}\>\gamma_{1}]\>\Lambda\>\begin{bmatrix}R_{k-1}^{T}\\ \gamma_{1}^{T}\end{bmatrix}, \]

with $\gamma_{1}=\chi_{1}$ and $R_{k-1}\in\mathbb{R}^{k\times(k-1)}$ being the matrix of eigenvectors associated to $\lambda_{2}(\overline{C}_{e})$, we conclude that $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$. In fact, since $\Theta$ has $k$ distinct rows and $R$ is a unitary matrix, $\Theta R$ also has $k$ distinct rows. As $\chi_{1}$ is proportional to the all-ones vector, $\Theta R_{k-1}$ has $k$ distinct rows as well. These observations are summarized in the following lemma and lead to the expression of the eigengap.

Lemma 5.2 (Eigengap for equal-size clusters).

For the SSBM with $k\geqslant 2$ clusters of equal size $\frac{n}{k}$, we have that $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}\in\mathbb{R}^{n\times(k-1)}$, where $R_{k-1}$ corresponds to the $(k-1)$ smallest eigenvectors of $\overline{C}_{e}$. Moreover, with the eigengap defined as

\[ \lambda_{gap}:=\lambda_{n-k+1}(\mathcal{L}_{sym})-\lambda_{n-k+2}(\mathcal{L}_{sym}), \]

it holds true that

\[ \lambda_{gap}=\overline{\alpha}-\lambda_{2}(\overline{C}_{e})=\frac{2np}{k\overline{d}}(1-2\eta)\geqslant\frac{2}{k}(1-2\eta). \tag{67} \]
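The closed form (67) is easy to confirm numerically. The sketch below, under the assumption that eigenvalues are indexed in decreasing order as in the lemma, computes $\lambda_{n-k+1}-\lambda_{n-k+2}$ for the population Laplacian with equal-size clusters and compares it with $\frac{2np}{k\overline{d}}(1-2\eta)$. The helper name `population_eigengap` and the parameter values are illustrative.

```python
import numpy as np

def population_eigengap(sizes, p, eta):
    """lambda_{n-k+1} - lambda_{n-k+2} of L_sym = I - E[A]/d_bar (decreasing order)."""
    k = len(sizes)
    labels = np.repeat(np.arange(k), sizes)
    n = labels.size
    same = labels[:, None] == labels[None, :]
    EA = p * (1 - 2 * eta) * np.where(same, 1.0, -1.0)
    np.fill_diagonal(EA, 0.0)
    L = np.eye(n) - EA / (p * (n - 1))
    evals = np.linalg.eigvalsh(L)  # ascending order
    # In ascending order, lambda_{n-k+1} is entry k-1 and lambda_{n-k+2} is entry k-2.
    return evals[k - 1] - evals[k - 2]

n, k, p, eta = 30, 3, 0.5, 0.1
gap = population_eigengap([n // k] * k, p, eta)
closed_form = 2 * n * p / (k * p * (n - 1)) * (1 - 2 * eta)
```

Since $\frac{np}{\overline{d}}=\frac{n}{n-1}$, the edge probability $p$ cancels in the closed form, so the population eigengap depends only on $n$, $k$ and $\eta$.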

5.1.3 Non-equal-size clusters

In the general setting of non-equal-size clusters, it is difficult to obtain an explicit expression of the spectrum of $\mathcal{L}_{sym}$. Thus, using a perturbation method, we establish a lower bound on the eigengap, provided that the aspect ratio $\rho$ is close to 1. Recall that

\[ \begin{aligned} \overline{C} &=p(1-2\eta)uu^{T}+\text{diag}\left(1+\frac{p}{\overline{d}}(1-2\eta)(1-2n_{i})\right) \\ &=p(1-2\eta)uu^{T}-2p(1-2\eta)\text{diag}(u_{i}^{2})_{i=1}^{k}+\text{diag}\left(1+\frac{p}{\overline{d}}(1-2\eta)\right). \end{aligned} \tag{68} \]

We note that this matrix is of the form $\Lambda+vv^{T}$, with $\Lambda$ being a diagonal matrix and $v\in\mathbb{R}^{k}$ a vector. Using again the spectral decomposition

\[ \overline{C}=R\>\Lambda\>R^{T}=[R_{k-1}\>\gamma_{1}]\>\Lambda\>\begin{bmatrix}R_{k-1}^{T}\\ \gamma_{1}^{T}\end{bmatrix}, \tag{69} \]

where $\gamma_{1}$ is the largest eigenvector and $R_{k-1}\in\mathbb{R}^{k\times(k-1)}$ contains the smallest $(k-1)$ eigenvectors of $\overline{C}$, we would like to ensure that the smallest $(k-1)$ eigenvectors of $\mathcal{L}_{sym}$ are related to those of $\overline{C}$ via $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$. Note that $\gamma_{1}$ is not necessarily the all-ones vector, and $\Theta R_{k-1}$ has at least $k-1$ distinct rows. To this end, we would like to ensure that

\[ \{\lambda_{2}(\overline{C}),\dots,\lambda_{k-1}(\overline{C}),\lambda_{k}(\overline{C})\}=\{\lambda_{n-k+2}(\mathcal{L}_{sym}),\dots,\lambda_{n-1}(\mathcal{L}_{sym}),\lambda_{n}(\mathcal{L}_{sym})\}. \tag{70} \]

From Weyl’s inequality (see Theorem B.1), we know that

\[ |\lambda_{i}(\overline{C}_{e})-\lambda_{i}(\overline{C})|\leqslant\|\overline{C}-\overline{C}_{e}\|\quad\forall i=1,\dots,k, \]

which in particular implies

\[ \lambda_{2}(\overline{C})\leqslant\lambda_{2}(\overline{C}_{e})+\|\overline{C}-\overline{C}_{e}\|,\qquad\lambda_{1}(\overline{C})\geqslant\lambda_{1}(\overline{C}_{e})-\|\overline{C}-\overline{C}_{e}\|. \]

Moreover, $\lambda_{1}(\overline{C})=\overline{\alpha}$ when $k=2$, and $\lambda_{1}(\overline{C})>\overline{\alpha}$ when $k>2$. Thus, for condition (70) to be true, it suffices to ensure

\[ \begin{aligned} \lambda_{2}(\overline{C}_{e})+\|\overline{C}-\overline{C}_{e}\|<\overline{\alpha}-\|\overline{C}-\overline{C}_{e}\| &\iff\|\overline{C}-\overline{C}_{e}\|<\frac{\overline{\alpha}-\lambda_{2}(\overline{C}_{e})}{2} \\ &\iff\|\overline{C}-\overline{C}_{e}\|<\frac{np}{k\overline{d}}(1-2\eta), \end{aligned} \]

using (67). In this case, we indeed have that $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$. As it will be convenient later, we will ensure a slightly stronger condition, i.e.

\[ \|\overline{C}-\overline{C}_{e}\|<\frac{\overline{\alpha}-\lambda_{2}(\overline{C}_{e})}{4}=\frac{np}{2k\overline{d}}(1-2\eta). \tag{71} \]

Now we bound the error $\|\overline{C}-\overline{C}_{e}\|$. We recall that $\|u\|=\sqrt{\frac{n}{\overline{d}}}$ and denote $D_{u}:=\frac{1}{\|u\|^{2}}\text{diag}(u_{i}^{2})_{i=1}^{k}$. Then (68) becomes

\[ \overline{C}=\overline{\alpha}I_{k}+\frac{np}{\overline{d}}(1-2\eta)\left(\frac{u}{\|u\|}\right)\left(\frac{u}{\|u\|}\right)^{T}-2\frac{np}{\overline{d}}(1-2\eta)D_{u}. \]

Using (66), we obtain

\[ \begin{aligned} \|\overline{C}-\overline{C}_{e}\| &=\left\|\frac{np}{\overline{d}}(1-2\eta)\left(\left(\frac{u}{\|u\|}\right)\left(\frac{u}{\|u\|}\right)^{T}-\chi_{1}\chi_{1}^{T}\right)-2\frac{np}{\overline{d}}(1-2\eta)\left(D_{u}-\frac{1}{k}I_{k}\right)\right\| \\ &\leqslant\frac{np}{\overline{d}}(1-2\eta)\left\|\left(\frac{u}{\|u\|}\right)\left(\frac{u}{\|u\|}\right)^{T}-\chi_{1}\chi_{1}^{T}\right\|+2\frac{np}{\overline{d}}(1-2\eta)\left\|D_{u}-\frac{1}{k}I_{k}\right\|. \end{aligned} \tag{72} \]

For the first term on the RHS, we have

\[ \begin{aligned} \left\|\left(\frac{u}{\|u\|}\right)\left(\frac{u}{\|u\|}\right)^{T}-\chi_{1}\chi_{1}^{T}\right\| &\leqslant 2\left\|\frac{u}{\|u\|}-\chi_{1}\right\|\leqslant 2\sqrt{k}\max_{i}\left|\sqrt{\frac{n_{i}}{n}}-\sqrt{\frac{1}{k}}\right| \\ &\leqslant 2\sqrt{k}(\sqrt{l}-\sqrt{s})\leqslant 2\sqrt{k}(1-\sqrt{\rho}), \end{aligned} \tag{73} \]

while for the second term on the RHS, we have

\[ \left\|D_{u}-\frac{1}{k}I_{k}\right\|=\max_{i}\left|\sqrt{\frac{n_{i}}{n}}-\sqrt{\frac{1}{k}}\right|\leqslant 1-\sqrt{\rho}. \tag{74} \]

By combining (73) and (74) into (72), we arrive at

\[ \begin{aligned} \|\overline{C}-\overline{C}_{e}\| &\leqslant\frac{np}{\overline{d}}(1-2\eta)\sqrt{k}(1-\sqrt{\rho})+\frac{2np}{\overline{d}}(1-2\eta)(1-\sqrt{\rho}) \\ &\leqslant\frac{np}{\overline{d}}(1-2\eta)(1-\sqrt{\rho})\left(\sqrt{k}+2\right) \\ &\leqslant 2(2+\sqrt{k})(1-2\eta)(1-\sqrt{\rho}), \end{aligned} \]

using that $\frac{np}{\overline{d}}=\frac{n}{n-1}\leqslant 2$. Now since $\frac{np}{2k\overline{d}}(1-2\eta)\geqslant\frac{1-2\eta}{2k}$, condition (71) holds as soon as $\rho$ satisfies

\[ 2(2+\sqrt{k})(1-2\eta)(1-\sqrt{\rho})\leqslant\frac{1-2\eta}{2k}\iff 1-\sqrt{\rho}\leqslant\frac{1}{4k(2+\sqrt{k})}. \]

Finally, we can compute

\[ \begin{aligned} \lambda_{gap} &:=\lambda_{n-k+1}(\mathcal{L}_{sym})-\lambda_{n-k+2}(\mathcal{L}_{sym}) \\ &\geqslant\overline{\alpha}-\|\overline{C}-\overline{C}_{e}\|-(\lambda_{2}(\overline{C}_{e})+\|\overline{C}-\overline{C}_{e}\|) \\ &\geqslant\overline{\alpha}-\lambda_{2}(\overline{C}_{e})-2\|\overline{C}-\overline{C}_{e}\| \\ &\geqslant\frac{\overline{\alpha}-\lambda_{2}(\overline{C}_{e})}{2}=\frac{np}{k\overline{d}}(1-2\eta)\geqslant\frac{1-2\eta}{k}. \end{aligned} \]

Hence we arrive at the following lemma.

Lemma 5.3 (General lower-bound on the eigengap).

For a SSBM with $k\geqslant 2$ clusters of general sizes $(n_{1},\dots,n_{k})$ and aspect ratio $\rho$ satisfying

\[ \sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})}, \]

it holds true that $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$, where $R_{k-1}\in\mathbb{R}^{k\times(k-1)}$ corresponds to the $(k-1)$ smallest eigenvectors of $\overline{C}$. Furthermore, we can lower-bound the spectral gap $\lambda_{gap}$ as

\[ \lambda_{gap}:=\lambda_{n-k+1}(\mathcal{L}_{sym})-\lambda_{n-k+2}(\mathcal{L}_{sym})\geqslant\frac{1-2\eta}{k}. \]
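The lemma can be illustrated on a small unbalanced instance. The sketch below checks the aspect-ratio condition for given cluster sizes and compares the numerically computed population eigengap with the lower bound $\frac{1-2\eta}{k}$. The helper names and the sizes $(40,41,41)$ are assumptions chosen so that the condition on $\rho$ holds; this is a single example, not a verification of the lemma.

```python
import math
import numpy as np

def aspect_ratio_ok(sizes):
    """Check sqrt(rho) > 1 - 1 / (4k(2 + sqrt(k))) with rho = n_min / n_max."""
    k = len(sizes)
    rho = min(sizes) / max(sizes)
    return math.sqrt(rho) > 1 - 1 / (4 * k * (2 + math.sqrt(k)))

def population_eigengap(sizes, p, eta):
    """lambda_{n-k+1} - lambda_{n-k+2} of L_sym = I - E[A]/d_bar (decreasing order)."""
    k = len(sizes)
    labels = np.repeat(np.arange(k), sizes)
    n = labels.size
    same = labels[:, None] == labels[None, :]
    EA = p * (1 - 2 * eta) * np.where(same, 1.0, -1.0)
    np.fill_diagonal(EA, 0.0)
    evals = np.linalg.eigvalsh(np.eye(n) - EA / (p * (n - 1)))  # ascending
    return evals[k - 1] - evals[k - 2]

sizes, p, eta = [40, 41, 41], 0.3, 0.2
condition_holds = aspect_ratio_ok(sizes)
gap = population_eigengap(sizes, p, eta)
lower_bound = (1 - 2 * eta) / len(sizes)
```

For strongly unbalanced sizes such as $(10,40)$ the condition on $\rho$ fails, and the lemma then makes no claim about the eigengap.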

We will now show that $\overline{L_{sym}}$ concentrates around the population Laplacian $\mathcal{L}_{sym}$, provided the graph is dense enough.

5.2 Concentration of the Signed Laplacian in the dense regime

In the moderately dense regime where $p\gtrsim\frac{\ln n}{n}$, the adjacency and the degree matrices concentrate towards their expected counterparts, as $n$ increases. This can be established using standard concentration tools from the literature.

Lemma 5.4.

We have the following concentration inequalities for $A$ and $\overline{D}$.

  1. $\forall\, 0<\varepsilon\leqslant\frac{1}{2},\ \exists\, c_{\varepsilon}>0$,

    \[ \mathbb{P}\left(\|A-\mathbb{E}[A]\|\leqslant((1+\varepsilon)4\sqrt{2}+2)\sqrt{np}\right)\geqslant 1-n\exp\left(-\frac{np}{c_{\varepsilon}}\right). \]

    In particular, there exists a universal constant $c>0$ such that

    \[ \mathbb{P}\left(\|A-\mathbb{E}[A]\|\leqslant 12\sqrt{np}\right)\geqslant 1-n\exp\left(-\frac{np}{c}\right). \]

  2. If $p>12\frac{\ln n}{n}$,

    \[ \mathbb{P}\left(\|\overline{D}-\mathbb{E}[\overline{D}]\|\leqslant\sqrt{3np\ln n}\right)\geqslant 1-\frac{2}{n}. \]
Proof.

For the first statement, we recall that AA is a symmetric matrix, with Ajj=0A_{jj^{\prime}}=0 and with independent entries above the diagonal (Ajj)j<j(A_{jj^{\prime}})_{j<j^{\prime}}. We denote Zjj=Ajj\varmathbbE[Ajj]Z_{jj^{\prime}}=A_{jj^{\prime}}-\operatorname*{\varmathbb{E}}[A_{jj^{\prime}}]. If j,jj,j^{\prime} lie in the same cluster,

Zjj={1p(12η);w. p. p(1η)1p(12η);w. p. pηp(12η);w. p. 1p.Z_{jj^{\prime}}=\left\{\begin{array}[]{rl}1-p(1-2\eta)\quad;&\text{w. p. }p(1-\eta)\\ -1-p(1-2\eta)\quad;&\text{w. p. }p\eta\\ -p(1-2\eta)\quad;&\text{w. p. }1-p\\ \end{array}\right..

If j,jj,j^{\prime} lie in different clusters,

Zjj={1+p(12η);w. p. pη1+p(12η);w. p. p(1η)p(12η);w. p. 1p.Z_{jj^{\prime}}=\left\{\begin{array}[]{rl}1+p(1-2\eta)\quad;&\text{w. p. }p\eta\\ -1+p(1-2\eta)\quad;&\text{w. p. }p(1-\eta)\\ p(1-2\eta)\quad;&\text{w. p. }1-p\\ \end{array}\right..

One can easily check that in both cases, it holds true that

\[\begin{aligned}
\mathbb{E}[(Z_{jj'})^{2}]&=p\big[(1-\eta)(1-p(1-2\eta))^{2}+\eta(1+p(1-2\eta))^{2}+p(1-2\eta)^{2}(1-p)\big]\\
&\leqslant p(1+\eta(1+p)^{2}+p)\leqslant 4p.
\end{aligned}\]

Thus we can conclude that for each $j\in[n]$, the following holds

\[\sqrt{\sum_{j'=1}^{n}\mathbb{E}[(Z_{jj'})^{2}]}\leqslant\sqrt{4np}=2\sqrt{np}.\]

Hence, $\widetilde{\sigma}:=\max_{j}\sqrt{\sum_{j'=1}^{n}\mathbb{E}[(Z_{jj'})^{2}]}\leqslant 2\sqrt{np}$. Moreover, $\widetilde{\sigma}_{*}:=\max_{j,j'}\lVert Z_{jj'}^{+}\rVert_{\infty}=1+p(1-2\eta)\leqslant 2$. Therefore, we can apply the concentration bound for the norm of symmetric matrices by Bandeira and van Handel [BvH16, Corollary 3.12, Remark 3.13] (recalled in Appendix A.2) with $t=2\sqrt{np}$, in order to bound $\lVert Z\rVert=\lVert A-\mathbb{E}[A]\rVert$. For any given $0<\varepsilon\leqslant 1/2$, we have that

\[\lVert A-\mathbb{E}[A]\rVert\leqslant\big((1+\varepsilon)4\sqrt{2}+2\big)\sqrt{np},\]

with probability at least $1-n\exp\big(-\frac{pn}{c_{\varepsilon}}\big)$, where $c_{\varepsilon}$ only depends on $\varepsilon$.

For the second statement, we apply Chernoff’s bound (see Appendix A.1) to the random variables $\overline{D}_{jj}=\sum_{j'=1}^{n}\big(A^{+}_{jj'}+A^{-}_{jj'}\big)$, where we note that $(A^{+}_{jj'}+A^{-}_{jj'})_{j'=1}^{n}$ are independent Bernoulli random variables with mean $p$. Hence, $\mathbb{E}[\overline{D}_{jj}]=\overline{d}=p(n-1)$. Letting $\delta=\sqrt{\frac{6\ln n}{\overline{d}}}$ and assuming that $p>12\frac{\ln n}{n}$ (so that $\delta<1$), we obtain

\[\mathbb{P}\Big[\big|\overline{D}_{jj}-\overline{d}\big|\geqslant\sqrt{6\overline{d}\ln n}\Big]\leqslant\mathbb{P}\Big[\big|\overline{D}_{jj}-\overline{d}\big|\geqslant\sqrt{3np\ln n}\Big]\leqslant 2\exp\big(-2\ln n\big)=\frac{2}{n^{2}},\]

using that $n-1\geqslant\frac{n}{2}$. Applying the union bound, we finally obtain that

\[\mathbb{P}\bigg(\|\overline{D}-\mathbb{E}[\overline{D}]\|\geqslant\sqrt{3np\ln n}\bigg)\leqslant\frac{2}{n}.\]
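
The two bounds of Lemma 5.4 can be checked numerically on a single draw. The following is a minimal sketch, assuming an SSBM with $k$ near-equal clusters in which an edge is present with probability $p$ and its sign is flipped with probability $\eta$ (this matches the distribution of $Z_{jj'}$ used in the proof). The function name `sample_ssbm` and the parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def sample_ssbm(n, k, p, eta, rng):
    """Sample a signed adjacency matrix A and its expectation E[A] (assumed SSBM)."""
    labels = np.arange(n) % k                      # k near-equal-sized clusters
    same = labels[:, None] == labels[None, :]      # True when j, j' share a cluster
    iu = np.triu_indices(n, 1)                     # independent entries: j < j'
    present = rng.random(iu[0].size) < p           # edge present with probability p
    flipped = rng.random(iu[0].size) < eta         # sign flipped with probability eta
    sign = np.where(same[iu], 1.0, -1.0) * np.where(flipped, -1.0, 1.0)
    A = np.zeros((n, n))
    A[iu] = present * sign
    A = A + A.T                                    # symmetric with zero diagonal
    EA = p * (1 - 2 * eta) * np.where(same, 1.0, -1.0)
    np.fill_diagonal(EA, 0.0)
    return A, EA

n, k, p, eta = 400, 2, 0.5, 0.1                    # p > 12 ln(n)/n holds here
rng = np.random.default_rng(0)
A, EA = sample_ssbm(n, k, p, eta, rng)

adj_err = np.linalg.norm(A - EA, 2)                # spectral norm of A - E[A]
deg = np.abs(A).sum(axis=1)                        # unsigned degrees
deg_err = np.max(np.abs(deg - p * (n - 1)))        # norm of the diagonal matrix

print(adj_err, 12 * np.sqrt(n * p))
print(deg_err, np.sqrt(3 * n * p * np.log(n)))
```

In such an experiment the observed errors are expected to sit well below the stated thresholds, since the constants in the lemma are not optimized.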

Lemma 5.5.

If $\|A-\mathbb{E}[A]\|\leqslant\Delta_{A}$, $\|\overline{D}-\mathbb{E}[\overline{D}]\|\leqslant\Delta_{D}$ and $p>12\frac{\ln n}{n}$, then with probability at least $1-\frac{2}{n}$, it follows that

\[\|\overline{L_{sym}}-\mathcal{L}_{sym}\|\leqslant\frac{\Delta_{A}}{\overline{d}}+2\frac{\Delta_{D}}{\overline{d}}+\frac{\Delta_{D}^{2}}{\overline{d}^{2}}.\]
Proof.

We first note that, by the proof of Lemma 5.4, with probability at least $1-\frac{2}{n}$ we have $\big|\overline{D}_{jj}-\overline{d}\big|\leqslant\delta\overline{d}$ for all $j\in[n]$, with $\delta<1$. Consequently,

\[\|(\mathbb{E}[\overline{D}])^{-1/2}\overline{D}^{1/2}-I\|=\max_{j}\left|\sqrt{\frac{\overline{D}_{jj}}{\overline{d}}}-1\right|\leqslant\max_{j}\frac{|\overline{D}_{jj}-\overline{d}|}{\overline{d}}\leqslant\frac{\Delta_{D}}{\overline{d}},\]

since $|\sqrt{x}-1|\leqslant|x-1|$ for all $x\geqslant 0$. We now apply the first inequality of Proposition C.2 with $A^{-}=\overline{D}$, $A^{+}=A$, $B^{-}=\mathbb{E}[\overline{D}]$, $B^{+}=\mathbb{E}[A]$. We obtain

\[\|\overline{L_{sym}}-\mathcal{L}_{sym}\|\leqslant\frac{\Delta_{A}}{\overline{d}}+\lVert\overline{D}^{-1}\rVert\lVert A\rVert\left(\frac{\Delta_{D}^{2}}{\overline{d}^{2}}+2\frac{\Delta_{D}}{\overline{d}}\right).\]

It remains to prove that $\lVert\overline{D}^{-1}\rVert\lVert A\rVert\leqslant 1$. This holds since $\overline{D}$ is a diagonal matrix, thus $\lVert\overline{D}^{-1}\rVert\lVert A\rVert=\lVert\overline{D}^{-1}A\rVert$, and similarly to Lemma E.1, it is straightforward to prove that $\lVert I-\overline{D}^{-1}A\rVert\leqslant 2$, therefore $\lVert\overline{D}^{-1}A\rVert\leqslant 1$.

Combining the results from Lemma 5.4 and Lemma 5.5, we arrive at the concentration bound for $\|\overline{L_{sym}}-\mathcal{L}_{sym}\|$.

Lemma 5.6.

Under the assumptions of Theorem 3.4, if $n\geqslant 10$, there exists a universal constant $0<C<43$ such that, with probability at least $1-n\exp(-\frac{np}{c_{\epsilon}})-\frac{2}{n}$,

\[\|\overline{L_{sym}}-\mathcal{L}_{sym}\|\leqslant C\sqrt{\frac{\ln n}{np}}.\]
Proof.

If $p\geqslant\frac{12\ln n}{n}$, the bounds in Lemma 5.4 hold simultaneously with probability at least $1-n\exp(-\frac{np}{c})-\frac{2}{n}$ and we have, with the notation of Lemma 5.5, $\Delta_{A}\leqslant 12\sqrt{np}$ and $\Delta_{D}\leqslant\sqrt{3np\ln n}$. Applying Lemma 5.5 and using $\overline{d}=p(n-1)\geqslant\frac{np}{2}$, we then obtain

\[\|\overline{L_{sym}}-\mathcal{L}_{sym}\|\leqslant\frac{12\sqrt{np}}{\overline{d}}+2\frac{\sqrt{3np\ln n}}{\overline{d}}+\frac{3np\ln n}{\overline{d}^{2}}\leqslant\frac{24}{\sqrt{np}}+4\sqrt{3}\sqrt{\frac{\ln n}{np}}+\frac{12\ln n}{np}.\]

If $n\geqslant 10$, then $\ln n\geqslant 1$ and $\sqrt{\frac{\ln n}{np}}\geqslant\frac{1}{\sqrt{np}}$. Moreover, since $p\geqslant 12\frac{\ln n}{n}$, we have $\frac{\ln n}{np}\leqslant\frac{1}{12}<1$ and hence $\sqrt{\frac{\ln n}{np}}\geqslant\frac{\ln n}{np}$. We finally obtain

\[\|\overline{L_{sym}}-\mathcal{L}_{sym}\|\leqslant(24+4\sqrt{3}+12)\sqrt{\frac{\ln n}{np}}=C\sqrt{\frac{\ln n}{np}},\]

with $C=24+4\sqrt{3}+12\leqslant 43$. ∎
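
As a sanity check on the arithmetic of this proof, the sketch below evaluates the three-term bound of Lemma 5.5 (with $\Delta_A = 12\sqrt{np}$ and $\Delta_D = \sqrt{3np\ln n}$) against the final bound $C\sqrt{\ln n/(np)}$. The function names and the sampled values of $(n, p)$ are illustrative assumptions; only pairs with $12\ln n/n \leqslant p \leqslant 1$ are used.

```python
import math

C = 24 + 4 * math.sqrt(3) + 12   # constant obtained at the end of the proof

def three_term_bound(n, p):
    """Right-hand side of Lemma 5.5 with the Delta_A, Delta_D of Lemma 5.4."""
    d_bar = p * (n - 1)
    delta_a = 12 * math.sqrt(n * p)
    delta_d = math.sqrt(3 * n * p * math.log(n))
    return delta_a / d_bar + 2 * delta_d / d_bar + (delta_d / d_bar) ** 2

def final_bound(n, p):
    """C * sqrt(ln n / (n p)), the bound stated in Lemma 5.6."""
    return C * math.sqrt(math.log(n) / (n * p))

checks = []
for n in (100, 1_000, 10_000):
    for p in (12 * math.log(n) / n, 1.0):          # sparsest admissible p and p = 1
        checks.append((n, p, three_term_bound(n, p), final_bound(n, p)))

for n, p, lhs, rhs in checks:
    print(f"n={n:6d} p={p:.4f} three-term={lhs:.3f} final={rhs:.3f}")
```

The gap between the two columns reflects how much is lost by absorbing the three terms into a single constant.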

5.3 Proof of Theorem 3.4

The proof of this theorem relies on the Davis-Kahan theorem. Using Weyl’s inequality (see Theorem B.1) and Lemma 5.6, we obtain for all $1\leqslant j\leqslant n$,

\[|\lambda_{j}(\overline{L_{sym}})-\lambda_{j}(\mathcal{L}_{sym})|\leqslant C\left(\frac{\ln n}{np}\right)^{1/2}.\]

In particular, for the $k$-th smallest eigenvalue,

\[\begin{aligned}
\lambda_{n-k+1}(\overline{L_{sym}})&\geqslant\lambda_{n-k+1}(\mathcal{L}_{sym})-C\left(\frac{\ln n}{np}\right)^{1/2},\\
\lambda_{n-k+1}(\overline{L_{sym}})-\lambda_{n-k+2}(\mathcal{L}_{sym})&\geqslant\lambda_{n-k+1}(\mathcal{L}_{sym})-\lambda_{n-k+2}(\mathcal{L}_{sym})-C\left(\frac{\ln n}{np}\right)^{1/2}=\lambda_{gap}-C\left(\frac{\ln n}{np}\right)^{1/2}.
\end{aligned}\]

For $\delta\in(0,1)$, we would like to ensure that

\[\lambda_{gap}-C\left(\frac{\ln n}{np}\right)^{1/2}>\lambda_{gap}\left(1-\frac{\delta}{2}\right). \tag{75}\]

From Lemma 5.3, if $\sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})}$, then $\lambda_{gap}\geqslant\frac{1}{k}(1-2\eta)$. Hence, for condition (75) to hold, it is sufficient that

\[C\left(\frac{\ln n}{np}\right)^{1/2}<\frac{\delta}{2k}(1-2\eta)\iff p>\left(\frac{2Ck}{\delta(1-2\eta)}\right)^{2}\frac{\ln n}{n}=C(k,\eta,\delta)\frac{\ln n}{n}, \tag{76}\]

with $C(k,\eta,\delta)=\left(\frac{2Ck}{\delta(1-2\eta)}\right)^{2}$. Since $C(k,\eta,\delta)\geqslant C\geqslant 12$, condition (76) implies that $p>12\frac{\ln n}{n}$.

With this condition, we now apply the Davis-Kahan theorem (Theorem B.2):

\[\|(I-V_{k-1}(\overline{L_{sym}})V_{k-1}(\overline{L_{sym}})^{T})V_{k-1}(\mathcal{L}_{sym})\|\leqslant\frac{\lVert\overline{L_{sym}}-\mathcal{L}_{sym}\rVert}{\lambda_{gap}-C\left(\frac{\ln n}{np}\right)^{1/2}}\leqslant\frac{\delta\lambda_{gap}/2}{\lambda_{gap}(1-\delta/2)}=\frac{\delta/2}{1-\delta/2}\leqslant\delta.\]

Using Proposition B.3, there then exists an orthogonal matrix $O\in\mathbb{R}^{(k-1)\times(k-1)}$ so that

\[\|V_{k-1}(\overline{L_{sym}})-\Theta R_{k-1}O\|\leqslant 2\delta.\]
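
To get a feel for the sparsity condition (76), the sketch below evaluates $C(k,\eta,\delta)$ and the resulting lower bound on $p$, and checks the elementary inequality $\frac{\delta/2}{1-\delta/2}\leqslant\delta$ used in the Davis-Kahan step. The helper names and the parameter values ($k=2$, $\eta=0.1$, $\delta=0.5$, $n=10^7$) are illustrative assumptions.

```python
import math

C = 24 + 4 * math.sqrt(3) + 12   # constant of Lemma 5.6

def sparsity_constant(k, eta, delta):
    """C(k, eta, delta) = (2 C k / (delta (1 - 2 eta)))^2, as in (76)."""
    return (2 * C * k / (delta * (1 - 2 * eta))) ** 2

def davis_kahan_ratio(delta):
    """(delta / 2) / (1 - delta / 2), which is at most delta for delta in (0, 1)."""
    return (delta / 2) / (1 - delta / 2)

k, eta, delta, n = 2, 0.1, 0.5, 10**7
p_min = sparsity_constant(k, eta, delta) * math.log(n) / n

# At p = p_min the perturbation C * sqrt(ln n / (n p)) equals delta (1 - 2 eta) / (2 k).
perturbation = C * math.sqrt(math.log(n) / (n * p_min))
target = delta * (1 - 2 * eta) / (2 * k)

print(p_min, perturbation, target, davis_kahan_ratio(delta))
```

Because $C(k,\eta,\delta)$ grows like $k^2/\delta^2$, the condition only becomes non-vacuous ($p_{\min} \leqslant 1$) for fairly large $n$ with these constants.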

5.4 Properties of the regularized Laplacian in the sparse regime

The analysis of the signed regularized Laplacian differs from that of the unsigned regularized Laplacian. In particular, Lemma 4.14 cannot be directly applied, since the trimming approach of the adjacency matrix for unsigned graphs is not available in this case. However, we will also use arguments by Le et al. in [LLV15] and [LLV17] for unsigned directed adjacency matrices in the inhomogeneous Erdős-Rényi model $G(n,(p_{jj'})_{j,j'})$. More precisely, in Section 5.4.1, we will prove that the adjacency matrix concentrates on a large subset of edges called the core. On this subset, the unregularized (resp. regularized) Laplacian also concentrates towards the expected matrix $\mathcal{L}_{sym}$ (resp. $\mathcal{L}_{\gamma}$). In Section 5.4.2, we will show that on the remaining subset of edges, the norm of the regularized Laplacian is relatively small.

5.4.1 Properties of the signed adjacency and degree matrices

In this section, we adapt the results by [LLV17] for the signed adjacency matrix and the degree matrix in our SSBM. Similarly to Theorem 2.6 [LLV17] (see Theorem A.3), the following lemma shows that the set of edges can be decomposed into a large block, and two blocks with respectively few columns and few rows.

Lemma 5.7.

(Decomposition of the set of edges for the SSBM) Let $A$ be the signed adjacency matrix of a graph sampled from the SSBM. For any $r\geqslant 1$, with probability at least $1-6n^{-r}$, the set of edges $[n]\times[n]$ can be partitioned into three classes $\mathcal{N}$, $\mathcal{R}$ and $\mathcal{C}$ such that

  1. the signed adjacency matrix concentrates on $\mathcal{N}$:

    \[\|(A-\mathbb{E}A)_{\mathcal{N}}\|\leqslant Cr^{3/2}\sqrt{\overline{d}(1-\eta)},\]

    with $C>1$ a constant;

  2. $\mathcal{R}$ (resp. $\mathcal{C}$) intersects at most $4n/\overline{d}$ columns (resp. rows) of $[n]\times[n]$;

  3. each row (resp. column) of $A_{\mathcal{R}}$ (resp. $A_{\mathcal{C}}$) has at most $128r$ non-zero entries.

Remark 5.8.

We underline that this lemma is valid because the unsigned adjacency matrices $A^{+}$ and $A^{-}$ have disjoint support. We do not know if similar results could be obtained for the Signed Stochastic Block Model defined by Mercado et al. in [MTH16].

Proof.

We denote by $A^{\pm}_{sup}$ (resp. $A^{\pm}_{inf}$) the upper (resp. lower) triangular part of the unsigned adjacency matrices. Using this decomposition, we have

\[A=A^{+}_{inf}+A^{+}_{sup}-A^{-}_{inf}-A^{-}_{sup}.\]

We note that $A^{+}_{inf},A^{+}_{sup},A^{-}_{inf},A^{-}_{sup}$ have disjoint supports, and each of them has independent entries. We can hence apply Theorem A.3 to each of these matrices, where we note that for each matrix

\[d:=n\max_{j,j'}\mathbb{E}[A_{jj'}]=np(1-\eta)\leqslant 2\overline{d}(1-\eta).\]

With probability at least $1-2\times 3n^{-r}$, there exist four partitions $(\mathcal{N}^{\pm}_{inf},\mathcal{R}^{\pm}_{inf},\mathcal{C}^{\pm}_{inf})$ and $(\mathcal{N}^{\pm}_{sup},\mathcal{R}^{\pm}_{sup},\mathcal{C}^{\pm}_{sup})$ of $[n]\times[n]$ with the following properties. For example, for $A^{+}_{inf}$,

  • $\|(A^{+}_{inf}-\mathbb{E}A^{+}_{inf})_{\mathcal{N}^{+}_{inf}}\|\leqslant Cr^{3/2}\sqrt{d}\leqslant Cr^{3/2}\sqrt{2\overline{d}(1-\eta)}$;

  • $\mathcal{R}^{+}_{inf}$ (resp. $\mathcal{C}^{+}_{inf}$) intersects at most $n/d\leqslant n/\overline{d}$ columns (resp. rows) of $[n]\times[n]$;

  • each row (resp. column) of $(A^{+}_{inf})_{\mathcal{R}}$ (resp. $(A^{+}_{inf})_{\mathcal{C}}$) has at most $32r$ ones.

We note that this decomposition holds simultaneously for $A^{\pm}_{inf}$ and $A^{\pm}_{sup}$. Taking the unions of these subsets,

\[\mathcal{N}=\mathcal{N}^{+}_{inf}\cup\mathcal{N}^{+}_{sup}\cup\mathcal{N}^{-}_{inf}\cup\mathcal{N}^{-}_{sup},\]

and similarly for $\mathcal{R}$ and $\mathcal{C}$, we have, by the triangle inequality,

\[\begin{aligned}
\|(A-\mathbb{E}A)_{\mathcal{N}}\|&=\big\|(A^{+}_{inf}-\mathbb{E}A^{+}_{inf})_{\mathcal{N}^{+}_{inf}}+(A^{+}_{sup}-\mathbb{E}A^{+}_{sup})_{\mathcal{N}^{+}_{sup}}-(A^{-}_{inf}-\mathbb{E}A^{-}_{inf})_{\mathcal{N}^{-}_{inf}}-(A^{-}_{sup}-\mathbb{E}A^{-}_{sup})_{\mathcal{N}^{-}_{sup}}\big\|\\
&\leqslant\|(A^{+}_{inf}-\mathbb{E}A^{+}_{inf})_{\mathcal{N}^{+}_{inf}}\|+\|(A^{+}_{sup}-\mathbb{E}A^{+}_{sup})_{\mathcal{N}^{+}_{sup}}\|+\|(A^{-}_{inf}-\mathbb{E}A^{-}_{inf})_{\mathcal{N}^{-}_{inf}}\|+\|(A^{-}_{sup}-\mathbb{E}A^{-}_{sup})_{\mathcal{N}^{-}_{sup}}\|\\
&\leqslant 4Cr^{3/2}\sqrt{d}\leqslant C_{1}r^{3/2}\sqrt{\overline{d}(1-\eta)},
\end{aligned}\]

with $C_{1}=4C\sqrt{2}$. Moreover, each row of $A_{\mathcal{R}}$ (resp. each column of $A_{\mathcal{C}}$) has at most $2\times 32r$ entries equal to $1$ and $2\times 32r$ entries equal to $-1$, which means at most $128r$ non-zero entries. Finally, $\mathcal{R}$ (resp. $\mathcal{C}$) intersects at most $4n/\overline{d}$ columns (resp. rows) of $[n]\times[n]$. ∎
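
The splitting of $A$ into four unsigned triangular pieces with disjoint supports, on which the proof relies, can be illustrated on a toy example. This is only a sketch: the $6\times 6$ matrix below is an arbitrary random signed matrix rather than an SSBM sample, and the dictionary keys are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# Hypothetical small signed adjacency matrix with entries in {-1, 0, 1}.
upper = np.triu(rng.choice([-1, 0, 1], size=(n, n)), k=1)
A = upper + upper.T

A_pos = (A > 0).astype(int)          # unsigned matrix of positive edges
A_neg = (A < 0).astype(int)          # unsigned matrix of negative edges
parts = {
    "pos_inf": np.tril(A_pos, k=-1), "pos_sup": np.triu(A_pos, k=1),
    "neg_inf": np.tril(A_neg, k=-1), "neg_sup": np.triu(A_neg, k=1),
}

# A = A+_inf + A+_sup - A-_inf - A-_sup
recombined = parts["pos_inf"] + parts["pos_sup"] - parts["neg_inf"] - parts["neg_sup"]
# Each entry is non-zero in at most one of the four parts (disjoint supports).
overlap = sum(parts.values())

print(np.array_equal(recombined, A), overlap.max())
```

Each of the four pieces is a 0/1 matrix supported on one triangle, which is the form required to apply a result stated for unsigned directed adjacency matrices.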

For the degree matrix $\overline{D}$, we use inequality (4.3) from [LLV17]. Recall that the degree of node $j$ is $\overline{D}_{jj}=\sum_{j'=1}^{n}(A^{+}_{jj'}+A^{-}_{jj'})$, which is a sum of $n$ independent Bernoulli variables with variance bounded by $d/n$. We can thus find an upper bound on the error $\|\overline{D}-\mathbb{E}[\overline{D}]\|_{F}$. This bound is weaker than the one obtained in Lemma 5.4 with the assumption $p\gtrsim\frac{\ln n}{n}$.

Lemma 5.9.

There exists a constant $C'>0$ such that for any $r\geqslant 1$, with probability at least $1-e^{-2r}$, it holds true that

\[\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}\leqslant C'r^{2}nd\leqslant 2C'r^{2}n\overline{d}(1-\eta).\]

5.4.2 Properties of the regularized Laplacian outside the core

In this section, we will bound the norm of the Signed Laplacian restricted to the subsets of edges $\mathcal{R}$ and $\mathcal{C}$. The following “restriction lemma” is an extension of Lemma 8.1 in [LLV15] for Signed Laplacian matrices.

Lemma 5.10.

(Restriction of Signed Laplacian) Let $B$ be an $n\times n$ symmetric matrix, $B_{\gamma}$ its regularized form as described in Section 2.2, and $\mathcal{C}\subset[n]\times[n]$. We denote by $\overline{D}_{\gamma}$ the regularized degree matrix, by $\overline{L}_{\gamma}=\overline{D}_{\gamma}^{-1/2}B_{\gamma}\overline{D}_{\gamma}^{-1/2}$ the modified “Laplacian”, and by $B_{\mathcal{C}}$ the $n\times n$ matrix obtained from $B$ by setting the entries outside of $\mathcal{C}$ to $0$. Let $0<\varepsilon<1$ be such that the degree of each node in $(B_{\gamma})_{\mathcal{C}}$ is less than $\varepsilon$ times the corresponding degree in $B_{\gamma}$. Then we have

\[\|(\overline{L}_{\gamma})_{\mathcal{C}}\|\leqslant\sqrt{\varepsilon}.\]
Proof.

We denote by $\overline{D}_{r}$ (resp. $\overline{D}_{c}$) the degree matrix of $(B_{\gamma})_{\mathcal{C}}$ (resp. $(B_{\gamma})_{\mathcal{C}}^{T}$) and by $\tilde{L}$ its regularized “Laplacian” (it is not necessarily a symmetric matrix), where

\[\tilde{L}=(\overline{D}^{1/2}_{r})^{\dagger}(B_{\gamma})_{\mathcal{C}}(\overline{D}^{1/2}_{c})^{\dagger}.\]

By definition of $\overline{L}_{\gamma}$, $(\overline{L}_{\gamma})_{\mathcal{C}}=\overline{D}^{-1/2}_{\gamma}(B_{\gamma})_{\mathcal{C}}\overline{D}_{\gamma}^{-1/2}$. Since in $(B_{\gamma})_{\mathcal{C}}$ some entries of $B_{\gamma}$ are set to $0$, we have that for all $1\leqslant j\leqslant n$,

\[(\overline{D}_{c})_{jj}\leqslant[\overline{D}_{\gamma}]_{jj}.\]

Moreover, by assumption, $(\overline{D}_{r})_{jj}\leqslant\varepsilon[\overline{D}_{\gamma}]_{jj}$. We denote $X=(\overline{D}^{1/2}_{r})^{\dagger}$, $Y=(\overline{D}^{1/2}_{c})^{\dagger}$ and $Z=\overline{D}_{\gamma}^{-1/2}$, and now we have

\[(\overline{L}_{\gamma})_{\mathcal{C}}=Z(B_{\gamma})_{\mathcal{C}}Z=ZX^{\dagger}X(B_{\gamma})_{\mathcal{C}}YY^{\dagger}Z=ZX^{\dagger}\tilde{L}Y^{\dagger}Z.\]

Because $\|ZX^{\dagger}\|\leqslant\sqrt{\varepsilon}$ and $\|Y^{\dagger}Z\|\leqslant 1$, by sub-multiplicativity of the norm, we thus obtain

\[\|(\overline{L}_{\gamma})_{\mathcal{C}}\|\leqslant\|ZX^{\dagger}\|\cdot\|\tilde{L}\|\cdot\|Y^{\dagger}Z\|\leqslant\sqrt{\varepsilon}\|\tilde{L}\|.\]

In addition, by considering the $2n\times 2n$ symmetric matrix $\widetilde{L}'$

\[\widetilde{L}'=\begin{pmatrix}0_{n}&\tilde{L}\\ \tilde{L}^{T}&0_{n}\end{pmatrix},\]

we have $\|\widetilde{L}'\|=\|\tilde{L}\|\leqslant 1$. Indeed, $\widetilde{L}'$ is equal to the identity matrix minus the regularized Laplacian of

\[\begin{pmatrix}0_{n}&(B_{\gamma})_{\mathcal{C}}\\ (B_{\gamma})^{T}_{\mathcal{C}}&0_{n}\end{pmatrix}.\]

Using Appendix E, we can conclude that the eigenvalues of $\widetilde{L}'$ are between $-1$ and $1$, leading to $\|\widetilde{L}'\|\leqslant 1$. Hence, we finally arrive at $\|(\overline{L}_{\gamma})_{\mathcal{C}}\|\leqslant\sqrt{\varepsilon}$. ∎
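
The symmetrization step in this proof uses a standard fact: for any square matrix $M$, the symmetric dilation with off-diagonal blocks $M$ and $M^{T}$ has the same spectral norm as $M$, and its eigenvalues are plus and minus the singular values of $M$. A minimal numerical sketch follows, with a random Gaussian matrix standing in for the non-symmetric $\tilde{L}$ (an assumption made purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))          # stand-in for a non-symmetric matrix
zero = np.zeros((n, n))
dilation = np.block([[zero, M], [M.T, zero]])   # 2n x 2n symmetric dilation

eigs = np.linalg.eigvalsh(dilation)      # real eigenvalues of the symmetric dilation
sing = np.linalg.svd(M, compute_uv=False)

print(np.linalg.norm(dilation, 2), np.linalg.norm(M, 2))
print(np.sort(eigs), np.sort(np.concatenate([-sing, sing])))
```

This is why a bound on the eigenvalues of the dilation transfers directly to a bound on the operator norm of the original, possibly non-symmetric, matrix.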

Remark 5.11.

We note that this lemma is not specific to the rows of the matrix BB, and one could also derive the same lemma with the assumptions on the columns of the matrix.

5.5 Error bounds w.r.t the expected regularized Laplacian and expected Signed Laplacian

In this section, we prove an upper bound on the errors $\lVert L_{\gamma}-\mathcal{L}_{\gamma}\rVert$ and $\lVert L_{\gamma}-\mathcal{L}_{sym}\rVert$ from Theorem 3.7. We will use the decomposition of the set of edges $(\mathcal{N},\mathcal{R},\mathcal{C})$ from Lemma 5.7, and sum the errors on each of these subsets of edges. We recall that on the subset $\mathcal{N}$, we have an upper bound on $\|(A-\mathbb{E}A)_{\mathcal{N}}\|$. We will also use the fact that the regularized degrees $[\overline{D}_{\gamma}]_{jj}$ are lower-bounded by the regularization parameter $\gamma$. On the subsets $\mathcal{R}$ and $\mathcal{C}$, we will use Lemma 5.10 to upper bound the norm of the regularized Laplacian.

Lemma 5.12.

Under the assumptions of Theorem 3.7, for any $r\geqslant 1$, with probability at least $1-7e^{-2r}$, we have

\[\|L_{\gamma}-\mathcal{L}_{\gamma}\|\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}}. \tag{77}\]
Proof.

Let $L_{\gamma}-\mathcal{L}_{\gamma}=S+T$ with

\[\begin{aligned}
S&=(\overline{D}_{\gamma})^{-1/2}A_{\gamma}(\overline{D}_{\gamma})^{-1/2}-(\overline{D}_{\gamma})^{-1/2}\mathbb{E}A_{\gamma}(\overline{D}_{\gamma})^{-1/2}=(\overline{D}_{\gamma})^{-1/2}(A_{\gamma}-\mathbb{E}A_{\gamma})(\overline{D}_{\gamma})^{-1/2},\\
T&=(\overline{D}_{\gamma})^{-1/2}\mathbb{E}A_{\gamma}(\overline{D}_{\gamma})^{-1/2}-(\mathbb{E}\overline{D}_{\gamma})^{-1/2}\mathbb{E}A_{\gamma}(\mathbb{E}\overline{D}_{\gamma})^{-1/2}.
\end{aligned}\]

We will bound the norm of $S+T$ on $\mathcal{N}$, and the norms of $L_{\gamma}$ and $\mathcal{L}_{\gamma}$ on the residuals $\mathcal{R},\mathcal{C}$. We first use the triangle inequality to obtain

\[\begin{aligned}
\|L_{\gamma}-\mathcal{L}_{\gamma}\|&\leqslant\|(L_{\gamma}-\mathcal{L}_{\gamma})_{\mathcal{N}}\|+\|((L_{\gamma}-I)-(\mathcal{L}_{\gamma}-I))_{\mathcal{R}}\|+\|(L_{\gamma}-\mathcal{L}_{\gamma})_{\mathcal{C}}\|\\
&\leqslant\|(L_{\gamma}-\mathcal{L}_{\gamma})_{\mathcal{N}}\|+\|(I-L_{\gamma})_{\mathcal{R}}\|+\|(I-\mathcal{L}_{\gamma})_{\mathcal{R}}\|+\|(I-L_{\gamma})_{\mathcal{C}}\|+\|(I-\mathcal{L}_{\gamma})_{\mathcal{C}}\|\\
&=\|(S+T)_{\mathcal{N}}\|+\|(I-L_{\gamma})_{\mathcal{R}}\|+\|(I-\mathcal{L}_{\gamma})_{\mathcal{R}}\|+\|(I-L_{\gamma})_{\mathcal{C}}\|+\|(I-\mathcal{L}_{\gamma})_{\mathcal{C}}\|\\
&\leqslant\|S_{\mathcal{N}}\|+\|T_{\mathcal{N}}\|+\|(I-L_{\gamma})_{\mathcal{R}}\|+\|(I-\mathcal{L}_{\gamma})_{\mathcal{R}}\|+\|(I-L_{\gamma})_{\mathcal{C}}\|+\|(I-\mathcal{L}_{\gamma})_{\mathcal{C}}\|.
\end{aligned}\]

1. Bounding the norm $\|T_{\mathcal{N}}\|$.

Denoting $\gamma=\gamma^{+}+\gamma^{-}$, we have that

\begin{align}
\|T_{\mathcal{N}}\|^{2}\leqslant\|T_{\mathcal{N}}\|_{F}^{2}&\leqslant\sum_{j,j'=1}^{n}T_{jj'}^{2}\notag\\
&=\sum_{j,j'=1}^{n}\left(\mathbb{E}A_{jj'}+(\gamma^{+}-\gamma^{-})/n\right)^{2}\left[\frac{1}{\sqrt{(\overline{D}_{jj}+\gamma)(\overline{D}_{j'j'}+\gamma)}}-\frac{1}{\overline{d}+\gamma}\right]^{2}\tag{78}\\
&\leqslant\frac{(\overline{d}+\gamma)^{2}}{2n^{2}\gamma^{6}}\left[\sum_{j=1}^{n}(\overline{D}_{jj}+\gamma)^{2}\sum_{j'=1}^{n}(\overline{D}_{j'j'}-\overline{d})^{2}+n(\overline{d}+\gamma)^{2}\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}\right].\tag{79}
\end{align}

To upper bound (78) by (79), we have used the simplification trick in the proof of [LLV17, Theorem 4.1], which we now recall. Firstly, the second factor of (78) can be upper bounded in the following way. For $1\leqslant j,j'\leqslant n$,

\begin{align}
\left|\frac{1}{\sqrt{(\overline{D}_{jj}+\gamma)(\overline{D}_{j'j'}+\gamma)}}-\frac{1}{\overline{d}+\gamma}\right|&=\frac{|(\overline{D}_{jj}+\gamma)(\overline{D}_{j'j'}+\gamma)-(\overline{d}+\gamma)^{2}|}{(\overline{D}_{jj}+\gamma)(\overline{D}_{j'j'}+\gamma)(\overline{d}+\gamma)+\sqrt{(\overline{D}_{jj}+\gamma)(\overline{D}_{j'j'}+\gamma)}(\overline{d}+\gamma)^{2}}\notag\\
&\leqslant\frac{|(\overline{D}_{jj}+\gamma)(\overline{D}_{j'j'}+\gamma)-(\overline{d}+\gamma)^{2}|}{2\gamma^{3}}\notag\\
&=\frac{|(\overline{D}_{jj}+\gamma)(\overline{D}_{j'j'}+\gamma)-(\overline{d}+\gamma)(\overline{D}_{jj}+\gamma)+(\overline{d}+\gamma)(\overline{D}_{jj}+\gamma)-(\overline{d}+\gamma)^{2}|}{2\gamma^{3}}\notag\\
&=\frac{|(\overline{D}_{jj}+\gamma)(\overline{D}_{j'j'}-\overline{d})+(\overline{d}+\gamma)(\overline{D}_{jj}-\overline{d})|}{2\gamma^{3}},\tag{80}
\end{align}

where the inequality comes from the fact that $\overline{D}_{jj}+\gamma\geqslant\gamma$ for all $j$ and $\overline{d}+\gamma\geqslant\gamma$. Secondly, we use the inequality $(a+b)^{2}\leqslant 2(a^{2}+b^{2})$ and we recall that, by definition, we can bound the first factor of (78) by $|\mathbb{E}(A_{\gamma})_{jj'}|\leqslant\frac{\overline{d}+\gamma}{n}$. This finally leads to (79).
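
The algebra behind this simplification trick can be checked numerically. Writing $a=\overline{D}_{jj}+\gamma$, $b=\overline{D}_{j'j'}+\gamma$ and $c=\overline{d}+\gamma$, the add-and-subtract step amounts to the identity $ab-c^{2}=a(b-c)+c(a-c)$, and the bound uses only $a,b,c\geqslant\gamma$. The sketch below tests both on random inputs; the degree values and the choices $\gamma=2$, $\overline{d}=5$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, d_bar = 2.0, 5.0
D = rng.integers(0, 15, size=(1000, 2)).astype(float)   # hypothetical degree pairs
a, b, c = D[:, 0] + gamma, D[:, 1] + gamma, d_bar + gamma

lhs = np.abs(1 / np.sqrt(a * b) - 1 / c)     # left-hand side of the display
split = a * (b - c) + c * (a - c)            # result of the add-and-subtract step
rhs = np.abs(split) / (2 * gamma**3)         # right-hand side of the bound

print(np.max(lhs - rhs))                     # non-positive if the bound holds
```

Note that $b-c=\overline{D}_{j'j'}-\overline{d}$ and $a-c=\overline{D}_{jj}-\overline{d}$, so the two summands of `split` are exactly the terms that get squared and summed in (79).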

Now we will bound each term of (79). Using Lemma 5.9, we have, for any $r\geqslant 1$, with probability at least $1-e^{-2r}$,

\[\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}\leqslant 2C'r^{2}n\overline{d}(1-\eta)\leqslant 2C'r^{2}n\overline{d}.\]

If this holds, then the first term of (79) is upper bounded by

\[\begin{aligned}
\sum_{j=1}^{n}(\overline{D}_{jj}+\gamma)^{2}\sum_{j'=1}^{n}(\overline{D}_{j'j'}-\overline{d})^{2}&\leqslant\left(2\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}+2n(\overline{d}+\gamma)^{2}\right)\sum_{j'=1}^{n}(\overline{D}_{j'j'}-\overline{d})^{2}\\
&\leqslant 2C'r^{2}n\overline{d}\left(4C'r^{2}n\overline{d}+2n(\overline{d}+\gamma)^{2}\right)\\
&\leqslant 2C'r^{2}n(\overline{d}+\gamma)\left(4C'r^{2}n\overline{d}+2n(\overline{d}+\gamma)^{2}\right)\\
&\leqslant 2C'r^{2}n(\overline{d}+\gamma)\left(2(2C'+1)r^{2}n(\overline{d}+\gamma)^{2}\right)\\
&\leqslant C_{1}r^{4}n^{2}(\overline{d}+\gamma)^{3},
\end{aligned}\]

with $C_{1}=4C'(2C'+1)$. Similarly, we can bound the second term of (79)

\[n(\overline{d}+\gamma)^{2}\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}\leqslant 2C'(\overline{d}+\gamma)^{2}r^{2}n^{2}\overline{d}\leqslant 2C'(\overline{d}+\gamma)^{3}r^{2}n^{2}.\]

Hence, we obtain the following upper bound of (79)

\[\|T_{\mathcal{N}}\|^{2}\leqslant\frac{(C_{1}+2C')r^{4}}{2\gamma^{6}}(\overline{d}+\gamma)^{5}=\frac{C_{2}r^{4}}{\gamma}\left(1+\frac{\overline{d}}{\gamma}\right)^{5}, \tag{81}\]

with $C_{2}=(C_{1}+2C')/2$.

2. Bounding the norm $\|S_{\mathcal{N}}\|$.

We first note that

\[S=(\overline{D}_{\gamma})^{-1/2}(A_{\gamma}-\mathbb{E}A_{\gamma})(\overline{D}_{\gamma})^{-1/2}=(\overline{D}_{\gamma})^{-1/2}(A-\mathbb{E}A)(\overline{D}_{\gamma})^{-1/2}.\]

We also recall that the diagonal entries of $\overline{D}_{\gamma}$ are at least $\gamma$, so that $\|\overline{D}_{\gamma}^{-1/2}\|\leqslant\gamma^{-1/2}$. Hence, using Lemma 5.7, with probability at least $1-6n^{-r}$, we have

\[\|S_{\mathcal{N}}\|\leqslant\|\overline{D}_{\gamma}^{-1/2}\|\,\|(A-\mathbb{E}A)_{\mathcal{N}}\|\,\|\overline{D}_{\gamma}^{-1/2}\|\leqslant\|(A-\mathbb{E}A)_{\mathcal{N}}\|/\gamma\leqslant\frac{Cr^{3/2}}{\gamma}\sqrt{\overline{d}(1-\eta)}\leqslant\frac{Cr^{3/2}}{\gamma}\sqrt{\overline{d}}. \tag{82}\]

Summing the bounds in (81) and (82), we have the intermediate result

\begin{align}
\|(L_{\gamma}-\mathcal{L}_{\gamma})_{\mathcal{N}}\|&\leqslant\frac{Cr^{3/2}}{\gamma}\sqrt{\overline{d}}+\frac{\sqrt{C_{2}}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}\tag{83}\\
&\leqslant\frac{r^{2}}{\sqrt{\gamma}}\left(C\sqrt{\frac{\overline{d}}{\gamma}}+\sqrt{C_{2}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}\right)\tag{84}\\
&\leqslant\frac{r^{2}}{\sqrt{\gamma}}(C+\sqrt{C_{2}})\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}=\frac{C_{3}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2},\tag{85}
\end{align}

with $C_{3}=C+\sqrt{C_{2}}$.

3. Bounding $\lVert(L_{\gamma})_{\mathcal{R}}\rVert$, $\lVert(L_{\gamma})_{\mathcal{C}}\rVert$, $\lVert(\mathcal{L}_{\gamma})_{\mathcal{R}}\rVert$, $\lVert(\mathcal{L}_{\gamma})_{\mathcal{C}}\rVert$.

By Lemma 5.7, each row of $A_{\mathcal{R}}$ has at most $128r$ non-zero entries, and $\mathcal{R}$ intersects at most $4n/\overline{d}$ columns. Thus, for all $1\leqslant j\leqslant n$,

\[\sum_{j'=1}^{n}\left[(A^{+}_{\gamma}+A^{-}_{\gamma})_{\mathcal{R}}\right]_{jj'}\leqslant 128r+\frac{4\gamma}{\overline{d}}=\gamma\left(\frac{128r}{\gamma}+\frac{4}{\overline{d}}\right)\leqslant\sum_{j'}\left[A^{+}_{\gamma}+A^{-}_{\gamma}\right]_{jj'}\left(\frac{128r}{\gamma}+\frac{4}{\overline{d}}\right),\]

as $\sum_{j'}[A^{+}_{\gamma}+A^{-}_{\gamma}]_{jj'}\geqslant n\times\left(\frac{\gamma^{+}}{n}+\frac{\gamma^{-}}{n}\right)=\gamma$. We can thus apply Lemma 5.10 with $\varepsilon=\frac{128r}{\gamma}+\frac{4}{\overline{d}}$, and we arrive at

\[\lVert(L_{\gamma})_{\mathcal{R}}\rVert\leqslant\sqrt{\frac{128r}{\gamma}+\frac{4}{\overline{d}}}.\]

We also obtain the same bound for $\|(L_{\gamma})_{\mathcal{C}}\|$. Similarly, we have $\sum_{j'}\left[\mathbb{E}[A^{+}_{\gamma}]+\mathbb{E}[A^{-}_{\gamma}]\right]_{jj'}=(n-1)p+\gamma=\overline{d}+\gamma\geqslant\gamma$ and

\[\sum_{j'=1}^{n}\left[(\mathbb{E}[A^{+}_{\gamma}]+\mathbb{E}[A^{-}_{\gamma}])_{\mathcal{R}}\right]_{jj'}\leqslant 4\frac{np}{\overline{d}}+\frac{4\gamma}{\overline{d}}\leqslant 8+\frac{4\gamma}{\overline{d}}=\gamma\left(\frac{8}{\gamma}+\frac{4}{\overline{d}}\right)\leqslant\sum_{j'}\left[\mathbb{E}[A^{+}_{\gamma}]+\mathbb{E}[A^{-}_{\gamma}]\right]_{jj'}\left(\frac{8}{\gamma}+\frac{4}{\overline{d}}\right).\]

We arrive at $\lVert(\mathcal{L}_{\gamma})_{\mathcal{R}}\rVert\leqslant\sqrt{\frac{8}{\gamma}+\frac{4}{\overline{d}}}$, and finally, we also have $\lVert(\mathcal{L}_{\gamma})_{\mathcal{C}}\rVert\leqslant\sqrt{\frac{8}{\gamma}+\frac{4}{\overline{d}}}$.

4. Bounding $\|L_{\gamma}-\mathcal{L}_{\gamma}\|$.

Summing up the bounds obtained in the first three steps, with probability at least $1-e^{-2r}-6n^{-r}\geqslant 1-7e^{-2r}$, we finally arrive at the bound

\[\begin{aligned}
\|L_{\gamma}-\mathcal{L}_{\gamma}\|&\leqslant\frac{C_{3}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+2\sqrt{\frac{128r}{\gamma}+\frac{4}{\overline{d}}}+2\sqrt{\frac{8}{\gamma}+\frac{4}{\overline{d}}}\\
&\leqslant\frac{C_{3}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+4\sqrt{\frac{128r}{\gamma}+\frac{4}{\overline{d}}}\\
&\leqslant\frac{C_{3}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}}.
\end{aligned}\]
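
The last two inequalities of this chain only concern the residual terms: the first uses $8\leqslant 128r$ for $r\geqslant 1$, and the second uses $\sqrt{x+y}\leqslant\sqrt{x}+\sqrt{y}$ together with $4\sqrt{128r}=32\sqrt{2r}$. A short sketch checking them on a few hypothetical values of $(r,\gamma,\overline{d})$:

```python
import math

def residual_terms(r, gamma, d_bar):
    """Return the three successive bounds on the residual (R and C) contribution."""
    exact = (2 * math.sqrt(128 * r / gamma + 4 / d_bar)
             + 2 * math.sqrt(8 / gamma + 4 / d_bar))
    merged = 4 * math.sqrt(128 * r / gamma + 4 / d_bar)       # uses 8 <= 128 r
    split = 32 * math.sqrt(2 * r) / math.sqrt(gamma) + 8 / math.sqrt(d_bar)
    return exact, merged, split

# Hypothetical parameter values with r >= 1.
results = [residual_terms(r, gamma, d_bar)
           for r in (1, 2, 5)
           for gamma in (0.5, 3.0, 50.0)
           for d_bar in (1.0, 10.0, 1000.0)]

for exact, merged, split in results:
    print(f"{exact:.4f} <= {merged:.4f} <= {split:.4f}")
```

The residual contribution therefore decays like $\gamma^{-1/2}$ and $\overline{d}^{-1/2}$, which is what drives the choice of $\gamma$ later on.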

This bound also provides easily a bound on the norm of LγsymL_{\gamma}-\mathcal{L}_{sym}.

Corollary 5.13.

(Error bound of the regularized Laplacian) With the notations of Theorem 3.4 and Theorem 3.7, and $\gamma=\gamma^{+}+\gamma^{-}$, we have

\[
\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}}+\frac{\gamma}{\overline{d}+\gamma}=:\Delta_{L}(\gamma,\overline{d}). \tag{86}
\]

In particular, for the choice $\gamma=\overline{d}^{7/8}$, if $p\geqslant 2/n$, we obtain

\[
\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\left(128Cr^{2}+1\right)\overline{d}^{-1/8}.
\]
Proof.

By the triangle inequality,

\[
\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\|L_{\gamma}-\mathcal{L}_{\gamma}\|+\|\mathcal{L}_{\gamma}-\mathcal{L}_{sym}\|.
\]

For the second term on the RHS, we have

\[
\|\mathcal{L}_{\gamma}-\mathcal{L}_{sym}\|=\left\|\frac{1}{\overline{d}+\gamma}\mathbb{E}A-\frac{1}{\overline{d}}\mathbb{E}A\right\|=\frac{\gamma}{\overline{d}(\overline{d}+\gamma)}\|\mathbb{E}A\|\leqslant\frac{\gamma}{\overline{d}+\gamma}. \tag{87}
\]

The last inequality comes from the fact that $\|\mathbb{E}A\|\leqslant(n-1)p(1-\eta)\leqslant\overline{d}$. Thus, by summing the bound obtained in Lemma 5.12 and (87), we arrive at the claimed bound (86). Moreover, if $\gamma\leqslant\overline{d}$, since $C>1$, one can readily verify that

\[
\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant 128Cr^{2}\frac{\overline{d}^{\frac{5}{2}}}{\gamma^{3}}+\frac{\gamma}{\overline{d}}. \tag{88}
\]

If $\gamma=\overline{d}^{7/8}$, then $\gamma\leqslant\overline{d}$ holds provided $\overline{d}\geqslant 1$ or equivalently, $p\geqslant\frac{1}{n-1}$. The latter is ensured if $p\geqslant 2/n$ (since $n\geqslant 2$). Plugging $\gamma=\overline{d}^{7/8}$ into (88), both terms on the RHS scale as $\overline{d}^{-1/8}$, and we obtain the bound

\[
\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\left(128Cr^{2}+1\right)\overline{d}^{-1/8}.
\]

This concludes the proof of Corollary 5.13 and Theorem 3.7. ∎
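As a quick numerical sanity check of Corollary 5.13, the sketch below evaluates the right-hand side of (86) at $\gamma=\overline{d}^{7/8}$ and compares it with the simplified bound $(128Cr^{2}+1)\overline{d}^{-1/8}$. The values `C = 2.0` and `r = 1.0` are placeholders chosen only for illustration (the analysis only requires a constant $C>1$), and the function names are hypothetical.

```python
import math


def delta_L(gamma, d_bar, C=2.0, r=1.0):
    """Right-hand side of (86), i.e. Delta_L(gamma, d_bar).

    C and r are illustrative placeholder values, not constants from the paper.
    """
    concentration = C * r**2 / math.sqrt(gamma) * (1.0 + d_bar / gamma) ** 2.5
    row_col_terms = 32.0 * math.sqrt(2.0 * r) / math.sqrt(gamma) + 8.0 / math.sqrt(d_bar)
    bias = gamma / (d_bar + gamma)  # population term bounded in (87)
    return concentration + row_col_terms + bias


def simplified_bound(d_bar, C=2.0, r=1.0):
    """Simplified bound (128 C r^2 + 1) * d_bar^(-1/8) for gamma = d_bar^(7/8)."""
    return (128.0 * C * r**2 + 1.0) * d_bar ** (-1.0 / 8.0)


if __name__ == "__main__":
    for d_bar in (1.0, 10.0, 100.0, 1e4):
        gamma = d_bar ** (7.0 / 8.0)
        print(d_bar, delta_L(gamma, d_bar), simplified_bound(d_bar))
```

For these placeholder values, the first quantity stays below the second for every $\overline{d}\geqslant 1$ tried, which is what the corollary predicts.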

5.6 Error bound on the eigenspaces and mis-clustering rate in the sparse regime

This section provides a bound on the misalignment error of the eigenspaces of $L_{\gamma}$ and $\mathcal{L}_{sym}$, which then leads to a bound on the mis-clustering rate of the $k$-means clustering step.

5.6.1 Eigenspace alignment

Using the bound from Corollary 5.13, we can perform the same analysis of the eigenspaces of $L_{\gamma}$ and $\mathcal{L}_{sym}$, as in Theorem 3.4, which will prove Theorem 3.9. We apply, once again, Weyl's inequality and the Davis-Kahan theorem to bound the distance between the two subspaces $\mathcal{R}(V_{k-1}(L_{\gamma}))$ and $\mathcal{R}(V_{k-1}(\mathcal{L}_{sym}))$. We have that

\[
\lambda_{n-k+1}(L_{\gamma})-\lambda_{n-k+2}(\mathcal{L}_{sym})\geqslant\lambda_{gap}-\|L_{\gamma}-\mathcal{L}_{sym}\|\geqslant\lambda_{gap}-\Delta_{L}(\gamma,\overline{d}),
\]

using Corollary 5.13. If $\gamma=\gamma_{0}\overline{d}^{7/8}$, then

\[
\Delta_{L}(\gamma,\overline{d})\leqslant\left(128Cr^{2}+1\right)\overline{d}^{-1/8}=:\frac{C_{4}}{\overline{d}^{1/8}},
\]

with $C_{4}=128Cr^{2}+1$. For $0<\delta<1/2$, we would like to ensure that

\[
\lambda_{gap}-\Delta_{L}(\gamma,\overline{d})\geqslant\lambda_{gap}\left(1-\frac{\delta}{2}\right).
\]

Hence, using the lower bound on the eigengap from Lemma 5.3, it suffices that

\[
\lambda_{gap}-\frac{C_{4}}{\overline{d}^{1/8}}\geqslant\lambda_{gap}\left(1-\frac{\delta}{2}\right)\iff\overline{d}^{1/8}\geqslant\frac{2kC_{4}}{\delta(1-2\eta)}\iff p\geqslant\left(\frac{2kC_{4}}{\delta(1-2\eta)}\right)^{8}\frac{1}{n-1}.
\]

Thus, the condition $p\geqslant\left(\frac{2kC_{4}}{\delta(1-2\eta)}\right)^{8}\frac{2}{n}$ is sufficient. Applying the Davis-Kahan theorem, we arrive at

\[
\|(I-V_{k-1}(L_{\gamma})V_{k-1}(L_{\gamma})^{T})V_{k-1}(\mathcal{L}_{sym})\|\leqslant\frac{\delta\lambda_{gap}/2}{\lambda_{gap}(1-\delta/2)}\leqslant\frac{\delta/2}{1-\delta/2}\leqslant\delta,
\]

and using once again Proposition B.3, there exists an orthogonal matrix $O\in\mathbb{R}^{(k-1)\times(k-1)}$ such that

\[
\|V_{k-1}(L_{\gamma})-\Theta R_{k-1}O\|\leqslant 2\delta.
\]

5.7 Proof of Theorem 3.12

In this section, we finally prove our result on the clustering performance of the Signed Laplacian and regularized Laplacian algorithms. The proof essentially relies on the following lemma, which provides a lower bound on the distance between two rows of $\Delta^{-1}R_{k-1}$, with $\Delta=\text{diag}(\sqrt{n_{i}})$.

Lemma 5.14.

For all $1\leqslant i\neq i'\leqslant k$, we have $\|(R_{k-1})_{i*}-(R_{k-1})_{i'*}\|\geqslant 1$. Moreover, for $i\in[k]$, it holds that

\[
\min_{\substack{i,i'\in[k],\,i\neq i'\\ j\in C_{i},\,j'\in C_{i'}}}\left\|(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j'*}\right\|^{2}\geqslant\frac{2}{3n_{i}}.
\]
Proof.

Recall from (68) that $\overline{C}=p(1-2\eta)uu^{T}+\text{diag}(d_{i})$, with $d_{i}=u_{i}^{2}+\left(1+\frac{p}{\overline{d}}(1-2\eta)\right)$ and $u_{i}=\sqrt{\frac{n_{i}}{\overline{d}}}$, $1\leqslant i\leqslant k$. Moreover, from (69), $\overline{C}=R\Lambda R^{T}$ with $R=[R_{k-1}\ \gamma_{1}]$ and $\gamma_{1}$ the eigenvector associated with the largest eigenvalue of $\overline{C}$. We first show that the entries of $\gamma_{1}$ are necessarily of the same sign, i.e. $(\gamma_{1})_{i}\geqslant 0,\ \forall i$ or $(\gamma_{1})_{i}\leqslant 0,\ \forall i$. In fact, by definition, $\gamma_{1}$ is a solution of

\[
\max_{\|v\|=1}v^{T}\overline{C}v=\max_{\|v\|=1}\ p(1-2\eta)(v^{T}u)^{2}+\sum_{i=1}^{k}d_{i}v_{i}^{2}. \tag{89}
\]

Since all the entries of $u$ are positive, any solution $\gamma_{1}$ of (89) necessarily has entries of the same sign: otherwise, one could replace each negative entry $(\gamma_{1})_{i}$ by $-(\gamma_{1})_{i}$, which leaves $\sum_{i}d_{i}v_{i}^{2}$ unchanged and increases $(v^{T}u)^{2}$, hence increases the objective function.

Let $i\neq i'\in[k]$. As $R$ has orthonormal rows,

\[
\langle R_{i*},R_{i'*}\rangle=0\iff\langle(R_{k-1})_{i*},(R_{k-1})_{i'*}\rangle+\underbrace{(\gamma_{1})_{i}(\gamma_{1})_{i'}}_{\geqslant 0}=0\implies\langle(R_{k-1})_{i*},(R_{k-1})_{i'*}\rangle\leqslant 0.
\]

Hence,

\[
\begin{aligned}
\|(R_{k-1})_{i*}-(R_{k-1})_{i'*}\|^{2}
&=\|(R_{k-1})_{i*}\|^{2}+\|(R_{k-1})_{i'*}\|^{2}-2\underbrace{\langle(R_{k-1})_{i*},(R_{k-1})_{i'*}\rangle}_{\leqslant 0}\\
&\geqslant\|(R_{k-1})_{i*}\|^{2}+\|(R_{k-1})_{i'*}\|^{2}\\
&=2-\underbrace{[(\gamma_{1})_{i}^{2}+(\gamma_{1})_{i'}^{2}]}_{\leqslant 1}\geqslant 1.
\end{aligned}
\]

In particular, this implies that $R_{k-1}$ has $k$ distinct rows. Now let $j,j'\in[n]$ such that $j\in C_{i}$ and $j'\in C_{i'}$. Recalling that with $\Delta=\text{diag}(\sqrt{n_{i}})$, $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}=\Theta\Delta\Delta^{-1}R_{k-1}=\hat{\Theta}\Delta^{-1}R_{k-1}$, we have

\[
\begin{cases}
(\Delta^{-1}R_{k-1})_{j*}=\frac{1}{\sqrt{n_{i}}}(R_{k-1})_{i*},\\
(\Delta^{-1}R_{k-1})_{j'*}=\frac{1}{\sqrt{n_{i'}}}(R_{k-1})_{i'*}.
\end{cases}
\]

Hence,

\[
\begin{aligned}
\left\|(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j'*}\right\|^{2}
&=\frac{1}{n_{i}}\|(R_{k-1})_{i*}\|^{2}+\frac{1}{n_{i'}}\|(R_{k-1})_{i'*}\|^{2}-\frac{2}{\sqrt{n_{i}n_{i'}}}\underbrace{\langle(R_{k-1})_{i*},(R_{k-1})_{i'*}\rangle}_{\leqslant 0}\\
&\geqslant\frac{1}{n_{i}}\|(R_{k-1})_{i*}\|^{2}+\frac{1}{n_{i'}}\|(R_{k-1})_{i'*}\|^{2}\\
&\geqslant\frac{1}{n_{i}}+\frac{1}{n_{i'}}-\frac{(\gamma_{1})_{i}^{2}}{n_{i}}-\frac{(\gamma_{1})_{i'}^{2}}{n_{i'}}\\
&\geqslant\frac{1}{n_{i}}+\frac{1}{n_{i'}}-\frac{(\gamma_{1})_{i}^{2}+(\gamma_{1})_{i'}^{2}}{ns}\geqslant\frac{1}{n_{i}}+\frac{1}{n_{i'}}-\frac{1}{ns}\geqslant\frac{1}{n_{i}}+\frac{1}{nl}-\frac{1}{ns}.
\end{aligned}
\]

Besides, we know that $\frac{1}{nl}\geqslant\frac{\rho}{n_{i}}$ and $\frac{1}{ns}\leqslant\frac{1}{\rho n_{i}}$. Therefore, we obtain the bound

\[
\left\|(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j'*}\right\|^{2}\geqslant\frac{1}{n_{i}}\left(1+\rho-\frac{1}{\rho}\right).
\]

We will now prove that under the condition $\sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})}$, we have $1+\rho-\frac{1}{\rho}\geqslant\frac{2}{3}$, which leads to the final result. First, we note that $\rho>1-\frac{1}{2k(2+\sqrt{k})}$ and $2k(2+\sqrt{k})\geqslant 12$, and $\frac{2k(2+\sqrt{k})}{2k(2+\sqrt{k})-1}\leqslant\frac{5}{4}$ for $k\geqslant 2$. Thus,

\[
1+\rho-\frac{1}{\rho}\geqslant 2-\frac{1}{2k(2+\sqrt{k})}-\frac{2k(2+\sqrt{k})}{2k(2+\sqrt{k})-1}\geqslant 2-\frac{1}{12}-\frac{5}{4}=\frac{2}{3}.
\]
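The last step can be checked numerically. The sketch below evaluates $1+\rho-\frac{1}{\rho}$ at the smallest aspect ratio allowed by the condition $\sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})}$; since $\rho\mapsto 1+\rho-\frac{1}{\rho}$ is increasing on $(0,1]$, this boundary value is the worst case. The function names are hypothetical and introduced only for illustration.

```python
import math


def sqrt_rho_threshold(k):
    """Lower limit on sqrt(rho) in the condition of Lemma 5.14: 1 - 1/(4k(2 + sqrt(k)))."""
    return 1.0 - 1.0 / (4.0 * k * (2.0 + math.sqrt(k)))


def separation_factor(rho):
    """The factor 1 + rho - 1/rho multiplying 1/n_i in the row-separation bound."""
    return 1.0 + rho - 1.0 / rho


if __name__ == "__main__":
    for k in (2, 3, 5, 10, 20):
        rho_min = sqrt_rho_threshold(k) ** 2  # boundary value of rho
        print(k, rho_min, separation_factor(rho_min))
```

For every $k\geqslant 2$ the printed factor should exceed $2/3$, in agreement with the chain of inequalities above.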

Remark 5.15.

In the equal-size case $n_{i}=\frac{n}{k},\ \forall 1\leqslant i\leqslant k$, since $\gamma_{1}=\chi_{1}$, $R_{k-1}$ has orthogonal rows and

\[
\|(R_{k-1})_{i*}-(R_{k-1})_{i'*}\|^{2}=\|R_{i*}-R_{i'*}\|^{2}=2.
\]

This implies that

\[
\left\|(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j'*}\right\|^{2}=\frac{2k}{n}.
\]

From Lemma 5.14, we have that, for all $1\leqslant i\leqslant k$,

\[
\min_{\substack{i,i'\in[k],\,i\neq i'\\ j\in C_{i},\,j'\in C_{i'}}}\left\|(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j'*}\right\|^{2}\geqslant\frac{2}{3n_{i}}.
\]

Hence with $\delta_{i}^{2}:=\frac{2}{3n_{i}}$ and using Lemma 4.19, we obtain

\[
\begin{aligned}
\sum_{i=1}^{k}\delta_{i}^{2}|S_{i}|=\sum_{i=1}^{k}\frac{2|S_{i}|}{3n_{i}}
&\leqslant 4(4+2\xi)\left\|V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})\right\|_{F}^{2}\\
&\leqslant 4(16+8\xi)(k-1)\left\|V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})O\right\|^{2}\\
&\leqslant 8(16+8\xi)(k-1)\delta^{2},
\end{aligned}
\]

using Theorem 3.4. Moreover, we have

\[
\begin{aligned}
\left\|V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})\right\|_{F}^{2}
&\leqslant 2(k-1)\left\|V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})O\right\|^{2}\\
&\leqslant 8(k-1)\delta^{2}<8(k-1)\frac{1}{12(16+8\xi)(k-1)}=\frac{n_{i}\delta_{i}^{2}}{16+8\xi},\quad\forall 1\leqslant i\leqslant k.
\end{aligned}
\]

Therefore, we can use the second part of Lemma 4.19 and finally conclude that

\[
\sum_{i=1}^{k}\frac{|S_{i}|}{n_{i}}\leqslant 96(2+\xi)\delta^{2}.
\]

For the regularized Laplacian algorithm, the same computations are valid using the result from Theorem 3.9.

6 Numerical experiments

In this section, we report on the outcomes of numerical experiments that compare our two proposed algorithms with a suite of state-of-the-art methods from the signed clustering literature. We rely on a previous Python implementation of SPONGE and Signed Laplacian (along with their respective normalized versions), and of other methods from the literature, made available in the context of previous work of a subset of the authors of the present paper [CDGT19]. (Python implementations of a suite of algorithms for signed clustering are available at https://github.com/alan-turing-institute/signet.) More specifically, we consider algorithms based on the adjacency matrix $A$, the Signed Laplacian matrix $\overline{L}$, its symmetrically normalized version $\overline{L}_{sym}$ [KSL+10], SPONGE and its normalized version SPONGEsym, and the two algorithms introduced in [CWD12] that optimize the Balanced Ratio Cut and the Balanced Normalized Cut objectives.

We remark that once the low-dimensional embedding has been computed by any of the considered algorithms, the final partition is obtained after running $k$-means++ [AV07], which improves over the popular $k$-means algorithm by employing a careful seeding initialization procedure and is the typical choice in practice.
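To make the "careful seeding" concrete, here is a minimal standard-library sketch of the $D^{2}$-sampling initialization of $k$-means++ [AV07]: the first center is drawn uniformly, and each subsequent center is drawn with probability proportional to the squared distance to the nearest center already chosen. This is an illustration only, not the scikit-learn code used in the experiments; the function name and the `seed` argument are assumptions.

```python
import random


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers from `points` by D^2 sampling (k-means++ seeding).

    `points` is a list of equal-length tuples or lists (rows of the embedding).
    """
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    # Squared distance from each point to its nearest chosen center.
    d2 = [sum((a - b) ** 2 for a, b in zip(p, centers[0])) for p in points]
    while len(centers) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a center: fall back to uniform sampling.
            new_center = list(rng.choice(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            new_center = list(points[idx])
        centers.append(new_center)
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sum((a - b) ** 2 for a, b in zip(p, new_center)))
              for old, p in zip(d2, points)]
    return centers
```

The returned centers would then initialize the usual Lloyd iterations of $k$-means on the rows of the spectral embedding.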

6.1 Grid search for choosing the parameters $\tau^{+},\tau^{-}$

In the following experiments, the Signed Stochastic Block Model will be sampled with the following set of parameters

  • the number of nodes $n=5000$,

  • the number of communities $k\in\{3,5,10,20\}$,

  • the relative size of communities $\rho=1$ (equal-size clusters) and $\rho=1/k$ (non-equal size clusters).

For the edge density parameter $p$, we choose two sparsity regimes, "Regime I" and "Regime II", where Regime II is strictly harder than Regime I, in the sense that for the same value of $k$, the edge density in Regime I is significantly larger than in Regime II. The noise level $\eta$ is chosen such that the recovery of the clusters is unsatisfactory for a subset of pairs of parameters $(\tau^{+},\tau^{-})$. For each set of parameters, we sample 20 graphs from the SSBM and average the resulting ARI.
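For concreteness, a minimal sketch of an SSBM sampler is given below. It assumes the usual definition of the model (not restated in this section): each pair of nodes is connected independently with probability $p$, the edge sign is $+1$ within a cluster and $-1$ across clusters, and each sign is flipped independently with probability $\eta$. The function name, the edge-dictionary representation and the `seed` argument are hypothetical.

```python
import random


def sample_ssbm(sizes, p, eta, seed=0):
    """Sample a signed graph from an SSBM with the given cluster sizes.

    Returns (edges, labels): edges maps a pair (i, j) with i < j to +1 or -1,
    and labels[i] is the ground-truth cluster of node i.
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # edge present with probability p
                sign = 1 if labels[i] == labels[j] else -1
                if rng.random() < eta:  # sign flipped with probability eta
                    sign = -sign
                edges[(i, j)] = sign
    return edges, labels
```

The double loop costs $O(n^{2})$ operations, which is acceptable for a sketch but slow for $n=5000$ in pure Python; a vectorized or sparse sampler would be preferable in practice.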

Our experimental setup is summarized in the following steps

  1. Select a set of parameters $(k,\rho,p,\eta)$ from the regime of interest;

  2. Sample a graph from the SSBM$(n,k,\rho,p,\eta)$;

  3. Extract the largest connected component of the measurement graph (regardless of the sign of the edges);

  4. If the size of the latter is too small ($<n/2$), resample a graph until successful;

  5. For each pair of parameters $(\tau^{+},\tau^{-})$, compute the $k$-dimensional embeddings using the SPONGEsym algorithm (with the implementation in the signet package [CDGT19]);

  6. Obtain a partition of the graph into $k$ clusters using the scikit-learn implementation of the $k$-means++ algorithm, and compute the ARI between this estimated partition and the ground-truth clusters;

  7. Repeat steps 2-6 20 times.
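Steps 3 and 4 of the protocol can be sketched as follows, using a breadth-first search that ignores edge signs. This is a minimal illustration with hypothetical function names; it accepts any iterable of node pairs (for instance the keys of a dictionary of signed edges).

```python
from collections import deque


def largest_connected_component(n, edges):
    """Return the sorted nodes of the largest connected component, ignoring signs."""
    adj = [[] for _ in range(n)]
    for (i, j) in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = [False] * n
    best = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        comp, queue = [], deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return sorted(best)


def is_large_enough(component, n):
    """Step 4: keep the sample only if the component has at least n/2 nodes."""
    return len(component) >= n / 2
```

When `is_large_enough` returns `False`, the graph is discarded and a new one is sampled, as described in step 4.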

The results in the dense regimes are reported in Figure 1, and those in the sparse regimes in Figure 2. This set of results indicates that the gradient of the ARI in the space of parameters $(\tau^{+},\tau^{-})$ is larger when the cluster sizes are very unbalanced and the edge density is low. We attribute this to the fact that, for suitably chosen values, the parameters $(\tau^{+},\tau^{-})$ perform a form of regularization of the graph that can significantly improve the clustering performance.
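For reference, the Adjusted Rand Index used to score each partition can be computed from pair counts in the contingency table of the two labelings. The sketch below implements the standard (Hubert-Arabie) ARI in pure Python; we assume this is the variant reported here, since it is the one provided by scikit-learn's `adjusted_rand_score`. The function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard pair-counting Adjusted Rand Index between two labelings."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: the labelings trivially agree
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs  # expected index under random labeling
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions have a single cluster
    return (sum_cells - expected) / (max_index - expected)
```

The score equals 1 for identical partitions (up to a relabeling of the clusters) and is close to 0 for a partition that is independent of the ground truth.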

[Figure 1 panels, Regime I: rows $k=3,5,10,20$; columns: equal-size clusters, unequal-size clusters.]
Figure 1: Heatmaps of the Adjusted Rand Index between the ground truth and the partition obtained using the SPONGEsym algorithm with varying regularization parameters $(\tau^{+},\tau^{-})$, for a SSBM in Regime I, with $n=5000$ and $k\in\{3,5,10,20\}$ clusters of equal sizes (left column) and unequal sizes (right column).
[Figure 2 panels, Regime II: rows $k=3,5,10,20$; columns: equal-size clusters, unequal-size clusters.]
Figure 2: Heatmaps of the Adjusted Rand Index between the ground truth and the partition obtained using the SPONGEsym algorithm with varying regularization parameters $(\tau^{+},\tau^{-})$, for a SSBM in Regime II, with $n=5000$ and $k\in\{3,5,10,20\}$ clusters of equal sizes (left column) and unequal sizes (right column).

6.2 Comparison of a suite of spectral methods

This section compares the performance of the following spectral clustering algorithms. We rely on the same notation used in [CDGT19] when mentioning the names of the SPONGE algorithms, namely: SPONGE and SPONGEsym. The complete list of algorithms compared is as follows.

  • the combinatorial (un-normalized) Signed Laplacian $\overline{L}=\overline{D}-A$,

  • the symmetric Signed Laplacian $\overline{L}_{sym}=I-\overline{D}^{-1/2}A\overline{D}^{-1/2}$,

  • SPONGE and SPONGEsym with a suitably chosen pair of parameters $(\tau^{+},\tau^{-})$,

  • the Balanced Ratio Cut $L_{BRC}=D^{+}-A$,

  • the Balanced Normalized Cut $L_{BNC}=D^{-1/2}(D^{+}-A)D^{-1/2}$.
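As an illustration of the first, second and fourth operators in the list above, the following sketch builds them from a small dense signed adjacency matrix stored as nested lists. It assumes the standard conventions $\overline{D}_{ii}=\sum_{j}|A_{ij}|$ (signed degree) and $D^{+}_{ii}=\sum_{j}\max(A_{ij},0)$ (positive degree), which are not restated in this section, and it requires every node to have at least one edge. The function name is hypothetical.

```python
def signed_operators(A):
    """Return (L_bar, L_sym, L_brc) for a dense symmetric signed adjacency matrix A.

    Assumed conventions: D_bar_ii = sum_j |A_ij| and D_plus_ii = sum_j max(A_ij, 0).
    """
    n = len(A)
    d_bar = [sum(abs(a) for a in row) for row in A]      # signed degrees
    d_pos = [sum(a for a in row if a > 0) for row in A]  # positive degrees
    # Combinatorial Signed Laplacian: D_bar - A.
    L_bar = [[(d_bar[i] if i == j else 0.0) - A[i][j] for j in range(n)]
             for i in range(n)]
    # Symmetric Signed Laplacian: I - D_bar^{-1/2} A D_bar^{-1/2}.
    L_sym = [[(1.0 if i == j else 0.0) - A[i][j] / (d_bar[i] * d_bar[j]) ** 0.5
              for j in range(n)] for i in range(n)]
    # Balanced Ratio Cut operator: D_plus - A.
    L_brc = [[(d_pos[i] if i == j else 0.0) - A[i][j] for j in range(n)]
             for i in range(n)]
    return L_bar, L_sym, L_brc
```

On a perfectly balanced graph, the vector with entries $+1$ on one cluster and $-1$ on the other lies in the null space of $\overline{L}$, which is the property that the Signed Laplacian embedding exploits.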

For the combinatorial and symmetric Signed Laplacians $\overline{L}$ and $\overline{L}_{sym}$, we compute $(k-1)$-dimensional embeddings before applying the $k$-means++ algorithm. For all other methods, we use the eigenvectors associated with the $k$ smallest eigenvalues.

In this experiment, we fix the parameters $n=5000$, $k\in\{3,5,10,20\}$ and $p,\eta$ in a certain set, and for each plot, we vary the aspect ratio $\rho\in(0,1]$. The relative proportions of the classes $s_{i}=\frac{n_{i}}{n}$ are chosen according to the following procedure

  1. Fix $s_{1}'=1/k$, pick a value for $\rho$ and compute $s_{k}'=s_{1}'/\rho$.

  2. For $i\in[2,k-1]$, sample $s_{i}'$ from the uniform distribution in the interval $[s_{1}',s_{k}']$.

  3. Compute the proportions $s_{i}=\frac{s_{i}'}{\sum_{i=1}^{k}s_{i}'}$, and then sample the graph from the resulting SSBM.

  4. Repeat steps 1-3 above 20 times, and record the average performance over the 20 runs.
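Steps 1-3 of this procedure translate directly into code. The sketch below returns the normalized proportions $s_{i}$ for a given $k\geqslant 2$ and aspect ratio $0<\rho\leqslant 1$; the function name and the `seed` argument are hypothetical, and rounding the proportions to integer cluster sizes is left out.

```python
import random


def cluster_proportions(k, rho, seed=0):
    """Sample class proportions s_1, ..., s_k with aspect ratio rho (0 < rho <= 1)."""
    rng = random.Random(seed)
    s_first = 1.0 / k        # step 1: s'_1 = 1/k
    s_last = s_first / rho   # step 1: s'_k = s'_1 / rho
    # Step 2: intermediate values drawn uniformly in [s'_1, s'_k].
    middle = [rng.uniform(s_first, s_last) for _ in range(k - 2)]
    raw = [s_first] + middle + [s_last]
    total = sum(raw)
    # Step 3: normalize so that the proportions sum to one.
    return [s / total for s in raw]
```

By construction, the ratio between the smallest and the largest proportion equals $\rho$, so $\rho=1$ recovers equal-size clusters.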

The results are reported in Figure 3. We note that in almost all settings, the SPONGEsym algorithm outperforms the other clustering methods, in particular for low values of the aspect ratio $\rho$. With the exception of the symmetric Signed Laplacian, most methods seem to perform worse when the aspect ratio is lower, meaning that the clusters are more unbalanced, which is a more challenging regime.

[Figure 3 panels: one plot for each $k\in\{3,5,10,20\}$.]
Figure 3: Performance of the various clustering algorithms, as measured by the Adjusted Rand Index, versus the aspect ratio $\rho$ for a SSBM with $k\in\{3,5,10,20\}$ and $n=5000$. For a larger number of clusters, $k=10$ and especially $k=20$, SPONGEsym is essentially the only algorithm able to produce meaningful results, and clearly outperforms all the other methods. Note that no regularization has been used throughout this set of experiments.

6.3 Performance of the regularized algorithms in the sparse regime

In this final batch of experiments, we study how the regularized Signed Laplacian and the SPONGEsym sparse algorithms perform. We consider sparse settings of the SSBM ($p\leqslant 0.003$) with $n=5000$ nodes. For the SPONGEsym algorithm, we fix the parameters $(\tau^{+},\tau^{-})$ in each setting. Our parameter selection procedure is to choose a pair of parameters that leads to a "good" recovery of the clusters for the unregularized algorithm (see Figure 2). We perform a grid search on the parameters $(\gamma^{+},\gamma^{-})$ for each of the two regularized algorithms (see Figure 4 and Figure 5). For the regularized Signed Laplacian algorithm, we observe distinct regions of performance in the space of parameters $(\gamma^{+},\gamma^{-})$. This is not predictable from our theoretical results, where the positive and negative regularization parameters play symmetric roles. We conjecture this to be due to the difference in density between the positive and negative subgraphs in our signed random graph model. For the SPONGEsym sparse algorithm, we note that the performance gradient in the heatmaps (Figure 4, Figure 5) is similar to what was reported in Figure 2, which could be due to the fact that the parameters $(\tau^{+},\tau^{-})$ already have a regularization effect.

[Figure 4 panels, sparse regime, $k=3$: columns $L_{\gamma}$ and SPONGEsym; one row per sparse regime.]
Figure 4: Heatmaps of the Adjusted Rand Index between the ground truth and the partition obtained using the $L_{\gamma}$ and SPONGEsym algorithms with fixed parameters $(\tau^{+},\tau^{-})$ and varying regularization parameters $(\gamma^{+},\gamma^{-})$, for a SSBM in two sparse regimes, with $n=5000$ and $k=3$ clusters.
[Figure 5 panels, sparse regime, $k=5$: columns $L_{\gamma}$ and SPONGEsym; one row per sparse regime.]
Figure 5: Heatmaps of the Adjusted Rand Index between the ground truth and the partition obtained using the $L_{\gamma}$ and SPONGEsym algorithms with fixed parameters $(\tau^{+},\tau^{-})$ and varying regularization parameters $(\gamma^{+},\gamma^{-})$, for a SSBM in two sparse regimes, with $n=5000$ and $k=5$ clusters.

7 Concluding remarks and future research directions

In this work, we provided a thorough theoretical analysis of the robustness of the SPONGEsym and symmetric Signed Laplacian algorithms, for graphs generated from a Signed Stochastic Block Model. Under this model, the sign of the edges (rather than the usual discrepancy of the edge densities across clusters versus within clusters) is an essential attribute which induces the underlying cluster structure of the graph. We proved that our signed clustering algorithms, based on suitably defined matrix operators, are able to recover the clusters under certain favorable noise regimes, and under two regimes of edge sparsity. Although the sparse setting is particularly challenging, our algorithms based on regularized graphs perform well, provided that the regularization parameters are suitably chosen.

One theoretical question that has not been answered yet relates to the choice of the positive and negative regularization parameters $\gamma^{+},\gamma^{-}$. Having a data-driven approach to tune the regularization parameters would be of great use in many practical applications involving very sparse graphs. An interesting future line of work would be to study the latest regularizing techniques based on powers of adjacency matrices or certain graph distance matrices, in the context of sparse signed graphs.

Yet another approach is to consider a pre-processing stage that performs low-rank matrix completion on the adjacency matrix, whose output could subsequently be used as input for our proposed algorithms. An extension of the Cheeger inequality to the setting of signed graphs, analogous to the generalized Cheeger inequality previously explored in [CKC+16], is another interesting research question. Extensions to the time-dependent setting and online clustering [LSS16, MWD+18], or to the case when covariate information is available [YS20], are further research directions worth exploring, well motivated by real-world applications involving signed networks.

References

  • [ABARS20] Emmanuel Abbe, Enric Boix-Adserà, Peter Ralli, and Colin Sandon, Graph powering and spectral robustness, SIAM Journal on Mathematics of Data Science 2 (2020), no. 1, 132–157.
  • [Abb17] Emmanuel Abbe, Community detection and stochastic block models: recent developments, The Journal of Machine Learning Research 18 (2017), no. 1, 6446–6531.
  • [ACBL13] Arash A. Amini, Aiyou Chen, Peter J. Bickel, and Elizaveta Levina, Pseudo-likelihood methods for community detection in large sparse networks, The Annals of Statistics 41 (2013), no. 4, 2097–2122.
  • [ADHP09] Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning 75 (2009), no. 2, 245–248.
  • [ASW15] Saeed Aghabozorgi, Ali Seyed Shirkhorshidi, and Teh Ying Wah, Time-series clustering–a decade review, Information Systems 53 (2015), 16–38.
  • [AV07] David Arthur and Sergei Vassilvitskii, K-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (USA), SODA ’07, Society for Industrial and Applied Mathematics, 2007, p. 1027–1035.
  • [BBC04] Nikhil Bansal, Avrim Blum, and Shuchi Chawla, Correlation clustering, Mach. Learn. 56 (2004), no. 1-3, 89–113.
  • [Bha96] R. Bhatia, Matrix analysis, Springer New York, 1996.
  • [BSG+12] Sujogya Banerjee, Kaushik Sarkar, Sedat Gokalp, Arunabha Sen, and Hasan Davulcu, Partitioning signed bipartite graphs for classification of individuals and organizations, International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, Springer, 2012, pp. 196–204.
  • [BSS13] Afonso S Bandeira, Amit Singer, and Daniel A Spielman, A Cheeger inequality for the graph Connection Laplacian, SIAM Journal on Matrix Analysis and Applications 34 (2013), no. 4, 1611–1630.
  • [BvH16] Afonso S. Bandeira and Ramon van Handel, Sharp nonasymptotic bounds on the norm of random matrices with independent entries, Ann. Probab. 44 (2016), no. 4, 2479–2506.
  • [CCT12] Kamalika Chaudhuri, Fan Chung, and Alexander Tsiatas, Spectral clustering of graphs with general degrees in the extended planted partition model, 25th Annual Conference on Learning Theory (Edinburgh, Scotland) (Shie Mannor, Nathan Srebro, and Robert C. Williamson, eds.), Proceedings of Machine Learning Research, vol. 23, JMLR Workshop and Conference Proceedings, 25–27 Jun 2012, pp. 35.1–35.23.
  • [CDGT19] Mihai Cucuringu, Peter Davies, Aldo Glielmo, and Hemant Tyagi, Sponge: A generalized eigenproblem for clustering signed networks, Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 89, PMLR, 16–18 Apr 2019, pp. 1088–1098.
  • [CHN+14] Kai-Yang Chiang, Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S. Dhillon, and Ambuj Tewari, Prediction and clustering in signed networks: A local to global perspective, Journal of Machine Learning Research 15 (2014), 1177–1213.
  • [Chu96] Fan RK Chung, Laplacians of graphs and Cheeger’s inequalities, Combinatorics, Paul Erdos is Eighty 2 (1996), no. 157-172, 13–2.
  • [Chu05] Fan Chung, Laplacians and the Cheeger inequality for directed graphs, Annals of Combinatorics 9 (2005), no. 1, 1–19.
  • [CKC+16] Mihai Cucuringu, Ioannis Koutis, Sanjay Chawla, Gary Miller, and Richard Peng, Simple and scalable constrained clustering: a generalized spectral method, Artificial Intelligence and Statistics Conference (AISTATS) 2016 51 (2016), 445–454.
  • [CPv19] Mihai Cucuringu, Andrea Pizzoferrato, and Yves van Gennip, An MBO scheme for clustering and semi-supervised clustering of signed networks, To appear in Communications in Mathematical Sciences (2019), arXiv:1901.03091.
  • [CR11] Fan Chung and Mary Radcliffe, On the spectra of general random graphs, Electronic Journal of Combinatorics 18 (2011), no. 1, Paper 215, 14. MR 2853072
  • [Cuc15] M. Cucuringu, Synchronization over Z2 and community detection in multiplex networks with constraints, Journal of Complex Networks 3 (2015), 469–506.
  • [CWD12] Kai-Yang Chiang, Joyce Jiyoung Whang, and Inderjit S. Dhillon, Scalable clustering of signed networks using balance normalized cut, Proceedings of the 21st ACM International Conference on Information and Knowledge Management (New York, NY, USA), CIKM ’12, Association for Computing Machinery, 2012, p. 615–624.
  • [CWP+16] Lingyang Chu, Zhefeng Wang, Jian Pei, Jiannan Wang, Zijin Zhao, and Enhong Chen, Finding gangs in war from signed networks, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1505–1514.
  • [Das08] Sanjoy Dasgupta, The hardness of k-means clustering, Technical Report CS2007-0890, University of California, San Diego, 2008.
  • [DEFI06] Erik D. Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica, Correlation clustering in general weighted graphs, Theor. Comput. Sci. 361 (2006), no. 2, 172–187.
  • [DK70] Chandler Davis and W. M. Kahan, The rotation of eigenvectors by a perturbation. iii, SIAM Journal on Numerical Analysis 7 (1970), no. 1, 1–46.
  • [DMT18] Tyler Derr, Yao Ma, and Jiliang Tang, Signed graph convolutional networks, 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018, pp. 929–934.
  • [epi] Epinions data set, http://www.epinions.com, Accessed: 2010-09-30.
  • [Foc05] Sergio M Focardi, Clustering economic and financial time series: Exploring the existence of stable correlation conditions, The Intertek Group (2005).
  • [FSK+12] André Fujita, Patricia Severino, Kaname Kojima, João Ricardo Sato, Alexandre Galvão Patriota, and Satoru Miyano, Functional clustering of time series gene expression data by Granger causality, BMC systems biology 6 (2012), no. 1, 137.
  • [Gal13] Jean H. Gallier, Notes on elementary spectral graph theory. applications to graph clustering using normalized cuts, CoRR abs/1311.2492 (2013).
  • [Gal16] Jean Gallier, Spectral theory of unsigned and signed graphs. applications to graph clustering: a survey, CoRR abs / 1601.04692 (2016), 1–122.
  • [HCD12] Cho-Jui Hsieh, Kai-Yang Chiang, and Inderjit S. Dhillon, Low-Rank Modeling of Signed Networks, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012.
  • [HLN15] Gyeong-Gyun Ha, Jae Woo Lee, and Ashadun Nobi, Threshold network of a financial market using the p-value of correlation coefficients, Journal of the Korean Physical Society 66 (2015), no. 12, 1802–1808.
  • [Imb09] Roberto Imbuzeiro Oliveira, Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges, arXiv e-prints (2009), arXiv:0911.0600.
  • [J+15] Jiashun Jin et al., Fast community detection by score, The Annals of Statistics 43 (2015), no. 1, 57–89.
  • [JJSK16] Jinhong Jung, Woojeong Jin, Lee Sael, and U Kang, Personalized ranking in signed networks using signed random walk with restart, 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 973–978.
  • [JY16] Antony Joseph and Bin Yu, Impact of regularization on spectral clustering, Annals of Statistics 44 (2016), no. 4, 1765–1791.
  • [Kny01] A. Knyazev, Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method, SIAM Journal on Scientific Computing 23 (2001), no. 2, 517–541.
  • [KSL+10] Jérôme Kunegis, Stephan Schmidt, Andreas Lommatzsch, Jürgen Lerner, Ernesto W. De Luca, and Sahin Albayrak, Spectral analysis of signed graphs for clustering, prediction and visualization, pp. 559–570, SIAM, 2010.
  • [KSS04] A. Kumar, Y. Sabharwal, and S. Sen, A simple linear time $(1+\varepsilon)$-approximation algorithm for k-means clustering in any dimensions, 45th Annual IEEE Symposium on Foundations of Computer Science, 2004, pp. 454–462.
  • [KSS14] Srijan Kumar, Francesca Spezzano, and VS Subrahmanian, Accurately detecting trolls in Slashdot Zoo via decluttering, 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), IEEE, 2014, pp. 188–195.
  • [KSSF16] Srijan Kumar, Francesca Spezzano, V.S. Subrahmanian, and Christos Faloutsos, Edge weight prediction in weighted signed networks, ICDM, 2016.
  • [LFZ19] Xiaoming Li, Hui Fang, and Jie Zhang, Supervised user ranking in signed social networks, Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 184–191.
  • [LGT14] James R Lee, Shayan Oveis Gharan, and Luca Trevisan, Multiway spectral partitioning and higher-order Cheeger inequalities, Journal of the ACM (JACM) 61 (2014), no. 6, 1–30.
  • [LHK10] J. Leskovec, D. Huttenlocher, and J. Kleinberg, Predicting positive and negative links in online social networks, WWW, 2010, pp. 641–650.
  • [Li94] Ren-Cang Li, On perturbations of matrix pencils with real spectra, Mathematics of Computation 62 (1994), no. 205, 231–265.
  • [LLV15] Can M. Le, Elizaveta Levina, and Roman Vershynin, Sparse random graphs: regularization and concentration of the laplacian, 2015.
  • [LLV17] Can M. Le, Elizaveta Levina, and Roman Vershynin, Concentration and regularization of random graphs, Random Structures & Algorithms 51 (2017), no. 3, 538–561.
  • [LR15] Jing Lei and Alessandro Rinaldo, Consistency of spectral clustering in stochastic block models, Ann. Statist. 43 (2015), no. 1, 215–237.
  • [LSS16] Edo Liberty, Ram Sriharsha, and Maxim Sviridenko, An algorithm for online k-means clustering, 2016 Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), SIAM, 2016, pp. 81–89.
  • [LTJC20] Yu Li, Yuan Tian, Zhang Jiawei, and Yi Chang, Learning signed network embedding via graph attention, Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020), 4772–4779.
  • [MNV12] Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan, The planar k-means problem is NP-hard, Theoretical Computer Science 442 (2012), 13 – 21.
  • [MSB14] Ekaterina Merkurjev, Justin Sunu, and Andrea L. Bertozzi, Graph MBO method for multiclass segmentation of hyperspectral stand-off detection video, Image Processing (ICIP), 2014 IEEE International Conference on, IEEE, 2014, pp. 689–693.
  • [MTH16] Pedro Mercado, Francesco Tudisco, and Matthias Hein, Clustering signed networks with the geometric mean of laplacians, Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, eds.), Curran Associates, Inc., 2016, pp. 4421–4429.
  • [MTH19] Pedro Mercado, Francesco Tudisco, and Matthias Hein, Spectral clustering of signed graphs via matrix power means, 36th International Conference on Machine Learning (Long Beach, California, USA) (Kamalika Chaudhuri and Ruslan Salakhutdinov, eds.), Proceedings of Machine Learning Research, vol. 97, PMLR, 09–15 Jun 2019, pp. 4526–4536.
  • [MU05] Michael Mitzenmacher and Eli Upfal, Probability and computing: Randomized algorithms and probabilistic analysis, Cambridge University Press, New York, NY, USA, 2005.
  • [MWD+18] Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, and Ignacio Lopez Moreno, Links: A high-dimensional online clustering method, 2018.
  • [NJW+02] Andrew Y Ng, Michael I Jordan, Yair Weiss, et al., On spectral clustering: Analysis and an algorithm, Advances in neural information processing systems 2 (2002), 849–856.
  • [PPTV06] Nicos G Pavlidis, Vassilis P Plagianakos, Dimitris K Tasoulis, and Michael N Vrahatis, Financial forecasting through unsupervised clustering and neural networks, Operational Research 6 (2006), no. 2, 103–127.
  • [QR13] Tai Qin and Karl Rohe, Regularized spectral clustering under the degree-corrected stochastic blockmodel, Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, 2013, pp. 3120–3128.
  • [RCY+11] Karl Rohe, Sourav Chatterjee, Bin Yu, et al., Spectral clustering and the high-dimensional stochastic blockmodel, The Annals of Statistics 39 (2011), no. 4, 1878–1915.
  • [SgS90] G.W. Stewart and Ji guang Sun, Matrix perturbation theory, Academic Press, 1990.
  • [Sin11] A. Singer, Angular synchronization by eigenvectors and semidefinite programming, Appl. Comput. Harmon. Anal. 30 (2011), no. 1, 20–36.
  • [sla] Slashdot data set, http://www.slashdot.com, Accessed: 2010-09-30.
  • [SM19] Ludovic Stephan and Laurent Massoulié, Robustness of spectral methods for community detection, Proceedings of the Thirty-Second Conference on Learning Theory (Phoenix, USA) (Alina Beygelzimer and Daniel Hsu, eds.), Proceedings of Machine Learning Research, vol. 99, PMLR, 25–28 Jun 2019, pp. 2831–2860.
  • [SMSK+11] Stephen M. Smith, Karla L. Miller, Gholamreza Salimi-Khorshidi, Matthew Webster, Christian F. Beckmann, Thomas E. Nichols, Joseph D. Ramsey, and Mark W. Woolrich, Network modelling methods for FMRI, NeuroImage 54 (2011), no. 2, 875 – 891.
  • [ST00] Ahmed Sameh and Zhanye Tong, The trace minimization method for the symmetric generalized eigenvalue problem, Journal of Computational and Applied Mathematics 123 (2000), no. 1, 155 – 175, Numerical Analysis 2000. Vol. III: Linear Algebra.
  • [TAL16] Jiliang Tang, Charu Aggarwal, and Huan Liu, Node classification in signed social networks, SDM, 2016.
  • [vL07] U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17 (2007), 395–416.
  • [Wey12] Hermann Weyl, Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung), Mathematische Annalen 71 (1912), no. 4, 441–479.
  • [YCL07] B. Yang, W. K. Cheung, and J. Liu, Community mining from signed social networks, IEEE Trans Knowl Data Eng 19 (2007), no. 10, 1333–1348.
  • [YS20] Bowei Yan and Purnamrita Sarkar, Covariate regularized community detection in sparse graphs, Journal of the American Statistical Association 0 (2020), no. 0, 1–12.
  • [YWS15] Y. Yu, T. Wang, and R. J. Samworth, A useful variant of the Davis–Kahan theorem for statisticians, Biometrika 102 (2015), no. 2, 315–323.
  • [ZA18] Zhixin Zhou and Arash A. Amini, Analysis of spectral clustering algorithms for community detection: the general bipartite setting, 2018.
  • [ZJGK10] Hartmut Ziegler, Marco Jenny, Tino Gruse, and Daniel A Keim, Visual market sector analysis for financial time series data, Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on, IEEE, 2010, pp. 83–90.
  • [ZR18] Yilin Zhang and Karl Rohe, Understanding regularized spectral clustering via graph conductance, 2018.

Appendix A Useful concentration inequalities

A.1 Chernoff bounds

Recall the following Chernoff bound for sums of independent Bernoulli random variables.

Theorem A.1 ([MU05, Corollary 4.6]).

Let $X_{1},\dots,X_{n}$ be independent Bernoulli random variables with $\mathbb{P}[X_{i}=1]=p_{i}$. Let $X=\sum_{i=1}^{n}X_{i}$ and $\mu=\mathbb{E}[X]$. For $\delta\in(0,1)$, it holds true that

\[ \mathbb{P}\left[\left\lvert X-\mu\right\rvert\geqslant\delta\mu\right]\leqslant 2\exp(-\mu\delta^{2}/3). \]
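As a sanity check on Theorem A.1, the following minimal sketch compares the exact two-sided tail of a binomial variable (the special case $p_{i}=p$ for all $i$) with the bound $2\exp(-\mu\delta^{2}/3)$. The values of `n`, `p` and `delta`, and the helper names, are illustrative assumptions and do not come from the paper.

```python
import math


def binomial_two_sided_tail(n, p, delta):
    """Exact P[|X - mu| >= delta * mu] for X ~ Binomial(n, p), mu = n * p."""
    mu = n * p
    tail = 0.0
    for x in range(n + 1):
        if abs(x - mu) >= delta * mu:
            tail += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return tail


def chernoff_bound(mu, delta):
    """Right-hand side of Theorem A.1: 2 * exp(-mu * delta^2 / 3)."""
    return 2.0 * math.exp(-mu * delta**2 / 3.0)


n, p = 200, 0.3  # illustrative values, not taken from the paper
mu = n * p
for delta in (0.1, 0.3, 0.5, 0.9):
    exact = binomial_two_sided_tail(n, p, delta)
    bound = chernoff_bound(mu, delta)
    print(f"delta={delta:.1f}  exact tail={exact:.3e}  bound={bound:.3e}")
```

For small $\delta$ the bound exceeds 1 and is vacuous; it becomes informative once $\mu\delta^{2}$ is large, which is the regime in which such bounds are typically applied to degrees in random graphs.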

A.2 Spectral norm of random matrices

We will make use of the following result for bounding the spectral norm of symmetric matrices whose entries are independent, centered and bounded random variables.

Theorem A.2 ([BvH16, Corollary 3.12, Remark 3.13]).

Let $X$ be an $n\times n$ symmetric matrix whose entries $X_{ij}$ $(i\leqslant j)$ are independent, centered random variables. Then, for any $0<\varepsilon\leqslant 1/2$, there exists a universal constant $c_{\varepsilon}$ such that for every $t\geqslant 0$,

\[ \mathbb{P}\left[\left\lVert X\right\rVert\geqslant(1+\varepsilon)2\sqrt{2}\widetilde{\sigma}+t\right]\leqslant n\exp\left(-\frac{t^{2}}{c_{\varepsilon}\widetilde{\sigma}_{*}^{2}}\right) \tag{90} \]

where

\[ \widetilde{\sigma}:=\max_{i}\sqrt{\sum_{j}\mathbb{E}[X_{ij}^{2}]},\quad\widetilde{\sigma}_{*}:=\max_{i,j}\left\lVert X_{ij}\right\rVert_{\infty}. \]

Note that it suffices to employ upper bound estimates on $\widetilde{\sigma},\widetilde{\sigma}_{*}$ in (90). Indeed, if $\widetilde{\sigma}\leqslant\widetilde{\sigma}^{(u)}$ and $\widetilde{\sigma}_{*}\leqslant\widetilde{\sigma}_{*}^{(u)}$, then

\[
\begin{aligned}
\mathbb{P}\left[\left\lVert X\right\rVert\geqslant(1+\varepsilon)2\sqrt{2}\widetilde{\sigma}^{(u)}+t\right]
&\leqslant\mathbb{P}\left[\left\lVert X\right\rVert\geqslant(1+\varepsilon)2\sqrt{2}\widetilde{\sigma}+t\right] \\
&\leqslant n\exp\left(-\frac{t^{2}}{c_{\varepsilon}\widetilde{\sigma}_{*}^{2}}\right)
\leqslant n\exp\left(-\frac{t^{2}}{c_{\varepsilon}(\widetilde{\sigma}_{*}^{(u)})^{2}}\right).
\end{aligned}
\]

A.3 A graph decomposition result

The following graph decomposition result for inhomogeneous Erdős-Rényi graphs was established in [LLV17, Theorem 2.6].

Theorem A.3 ([LLV17, Theorem 2.6]).

Let $A$ be a directed adjacency matrix sampled from an inhomogeneous Erdős-Rényi $G(n,(p_{jj'})_{j,j'})$ model and let $d=n\max_{j,j'}p_{jj'}$. For any $r\geqslant 1$, with probability at least $1-3n^{-r}$, the set of edges $[n]\times[n]$ can be partitioned into three classes $\mathcal{N},\mathcal{R}$ and $\mathcal{C}$, such that

  1. the signed adjacency matrix concentrates on $\mathcal{N}$:

    \[ \|(A-\mathbb{E}A)_{\mathcal{N}}\|\leqslant Cr^{3/2}\sqrt{d}, \]

  2. $\mathcal{R}$ (resp. $\mathcal{C}$) intersects at most $n/d$ columns (resp. rows) of $[n]\times[n]$,

  3. each row (resp. column) of $A_{\mathcal{R}}$ (resp. $A_{\mathcal{C}}$) has at most $32r$ non-zero entries.

Appendix B Matrix perturbation analysis

In this section, we recall several standard tools from matrix perturbation theory for studying the perturbation of the spectra of Hermitian matrices. The reader is referred to [SgS90] for a more comprehensive overview of this topic.

Let $A\in\mathbb{C}^{n\times n}$ be Hermitian with eigenvalues $\lambda_{1}\geqslant\lambda_{2}\geqslant\cdots\geqslant\lambda_{n}$ and corresponding eigenvectors $v_{1},v_{2},\dots,v_{n}\in\mathbb{C}^{n}$. Let $\widetilde{A}=A+W$ be a perturbed version of $A$, with the perturbation matrix $W\in\mathbb{C}^{n\times n}$ being Hermitian. Let us denote the eigenvalues of $\widetilde{A}$ and $W$ by $\widetilde{\lambda}_{1}\geqslant\cdots\geqslant\widetilde{\lambda}_{n}$, and $\epsilon_{1}\geqslant\epsilon_{2}\geqslant\cdots\geqslant\epsilon_{n}$, respectively.

To begin with, one can quantify the perturbation of the eigenvalues of $\widetilde{A}$ with respect to the eigenvalues of $A$. Weyl’s inequality [Wey12] is a very useful result in this regard.

Theorem B.1 (Weyl’s Inequality [Wey12]).

For each $i=1,\dots,n$, it holds that

\[ \lambda_{i}+\epsilon_{n}\leqslant\widetilde{\lambda}_{i}\leqslant\lambda_{i}+\epsilon_{1}. \tag{91} \]

In particular, this implies that $\widetilde{\lambda}_{i}\in[\lambda_{i}-\left\lVert W\right\rVert,\lambda_{i}+\left\lVert W\right\rVert]$.
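Weyl's inequality can be checked by hand on $2\times 2$ real symmetric matrices, whose eigenvalues have a closed form. The sketch below does this with the standard library only; the matrices `A` and `W` and the helper `eigvals_sym2` are illustrative assumptions, not objects from the paper.

```python
import math


def eigvals_sym2(a, b, c):
    """Eigenvalues (largest first) of the real symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius


# Matrices are stored as triples (a, b, c) for [[a, b], [b, c]]; values are illustrative.
A = (2.0, 1.0, -1.0)
W = (0.3, -0.2, 0.1)
A_tilde = tuple(x + y for x, y in zip(A, W))

lam = eigvals_sym2(*A)              # lambda_1 >= lambda_2
lam_tilde = eigvals_sym2(*A_tilde)  # eigenvalues of the perturbed matrix
eps = eigvals_sym2(*W)              # epsilon_1 >= epsilon_2
spec_norm_W = max(abs(eps[0]), abs(eps[1]))

for i in range(2):
    lower, upper = lam[i] + eps[1], lam[i] + eps[0]
    print(f"i={i + 1}: {lower:.4f} <= {lam_tilde[i]:.4f} <= {upper:.4f}")
```

Each printed line is an instance of (91); the weaker statement $|\widetilde{\lambda}_{i}-\lambda_{i}|\leqslant\lVert W\rVert$ follows because $\lVert W\rVert=\max(|\epsilon_{1}|,|\epsilon_{n}|)$ for Hermitian $W$.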

One can also quantify the perturbation of the subspace spanned by eigenvectors of $A$, which was established by Davis and Kahan [DK70]. Before introducing the theorem, we need some definitions. Let $U,\widetilde{U}\in\mathbb{C}^{n\times k}$ (for $k\leqslant n$) each have orthonormal columns, and let $\sigma_{1}\geqslant\dots\geqslant\sigma_{k}$ denote the singular values of $U^{*}\widetilde{U}$. Also, let $\mathcal{R}(U)$ denote the range space of the columns of $U$, and similarly for $\mathcal{R}(\widetilde{U})$. Then the $k$ principal angles between $\mathcal{R}(U),\mathcal{R}(\widetilde{U})$ are defined as $\theta_{i}:=\cos^{-1}(\sigma_{i})$ for $1\leqslant i\leqslant k$, with each $\theta_{i}\in[0,\pi/2]$. It is usual to define the $k\times k$ diagonal matrices $\Theta(\mathcal{R}(U),\mathcal{R}(\widetilde{U})):=\text{diag}(\theta_{1},\dots,\theta_{k})$ and $\sin\Theta(\mathcal{R}(U),\mathcal{R}(\widetilde{U})):=\text{diag}(\sin\theta_{1},\dots,\sin\theta_{k})$. Denoting by $|||\cdot|||$ any unitarily invariant norm (Frobenius, spectral, etc.), the following relation holds (see, e.g., [Li94, Lemma 2.1], [SgS90, Corollary I.5.4]).

\[ |||\sin\Theta(\mathcal{R}(U),\mathcal{R}(\widetilde{U}))|||=|||(I-\widetilde{U}\widetilde{U}^{*})U|||. \]

With the above notation in mind, we now introduce a version of the Davis-Kahan theorem taken from [YWS15, Theorem 1] (see also [SgS90, Theorem V.3.6]).

Theorem B.2 (Davis-Kahan).

Fix $1\leqslant r\leqslant s\leqslant n$, let $d=s-r+1$, and let $U=(v_{r},v_{r+1},\dots,v_{s})\in\mathbb{C}^{n\times d}$ and $\widetilde{U}=(\widetilde{v}_{r},\widetilde{v}_{r+1},\dots,\widetilde{v}_{s})\in\mathbb{C}^{n\times d}$ have orthonormal columns, where $\widetilde{v}_{i}$ denotes an eigenvector of $\widetilde{A}$ associated with $\widetilde{\lambda}_{i}$. Write

\[ \delta=\inf\left\{\left\lvert\hat{\lambda}-\lambda\right\rvert:\lambda\in[\lambda_{s},\lambda_{r}],\ \hat{\lambda}\in(-\infty,\widetilde{\lambda}_{s+1}]\cup[\widetilde{\lambda}_{r-1},\infty)\right\} \]

where we define $\widetilde{\lambda}_{0}=\infty$ and $\widetilde{\lambda}_{n+1}=-\infty$ and assume that $\delta>0$. Then

\[ |||\sin\Theta(\mathcal{R}(U),\mathcal{R}(\widetilde{U}))|||=|||(I-\widetilde{U}\widetilde{U}^{*})U|||\leqslant\frac{|||W|||}{\delta}. \]

For instance, if $r=s=j$, then by using the spectral norm $\left\lVert\cdot\right\rVert$, we obtain

\[ \sin\Theta(\mathcal{R}(\widetilde{v}_{j}),\mathcal{R}(v_{j}))=\left\lVert(I-v_{j}v_{j}^{*})\widetilde{v}_{j}\right\rVert\leqslant\frac{\left\lVert W\right\rVert}{\min\left\{\left\lvert\widetilde{\lambda}_{j-1}-\lambda_{j}\right\rvert,\left\lvert\widetilde{\lambda}_{j+1}-\lambda_{j}\right\rvert\right\}}. \tag{92} \]
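The single-eigenvector bound (92) can be illustrated on a $2\times 2$ real symmetric example with $j=1$, where $\widetilde{\lambda}_{0}=\infty$ leaves only the gap $|\widetilde{\lambda}_{2}-\lambda_{1}|$ in the denominator. This is a minimal sketch under stated assumptions: the matrices and the helper `eig_sym2` are hypothetical, and the closed-form eigenvectors assume a non-zero off-diagonal entry.

```python
import math


def eig_sym2(a, b, c):
    """Eigenpairs of [[a, b], [b, c]] with b != 0, largest eigenvalue first.

    Returns [(lambda_1, v_1), (lambda_2, v_2)] with unit-norm eigenvectors.
    """
    assert b != 0.0, "this sketch assumes a non-diagonal matrix"
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # (a - lam) * v1 + b * v2 = 0 is solved by v = (b, lam - a).
        norm = math.hypot(b, lam - a)
        pairs.append((lam, (b / norm, (lam - a) / norm)))
    return pairs


A = (2.0, 1.0, -1.0)  # illustrative entries (a, b, c)
W = (0.3, -0.2, 0.1)  # illustrative symmetric perturbation
A_tilde = tuple(x + y for x, y in zip(A, W))

(lam1, v1), _ = eig_sym2(*A)
(_, v1_tilde), (lam2_tilde, _) = eig_sym2(*A_tilde)
norm_W = max(abs(lam) for lam, _ in eig_sym2(*W))

# In two dimensions, the sine of the angle between unit vectors is |det[v, v~]|.
sin_theta = abs(v1[0] * v1_tilde[1] - v1[1] * v1_tilde[0])
# With j = 1 only the gap |lambda~_2 - lambda_1| appears in the minimum.
bound = norm_W / abs(lam2_tilde - lam1)
print(f"sin(theta) = {sin_theta:.4f} <= bound = {bound:.4f}")
```

Note that the gap is measured between an eigenvalue of $A$ and an eigenvalue of $\widetilde{A}$, which is why the perturbed spectrum has to be computed before the bound can be evaluated.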

Finally, we recall the following standard result which states that given any pair of $k$-dimensional subspaces with orthonormal basis matrices $U,\widetilde{U}\in\mathbb{R}^{n\times k}$, there exists an alignment of $U,\widetilde{U}$ with the error after alignment bounded by the distance between the subspaces. We provide the proof for completeness.

Proposition B.3.

Let $U,\widetilde{U}\in\mathbb{R}^{n\times k}$ each have orthonormal columns. Then there exists a $k\times k$ rotation matrix $O$ such that

\[ \left\lVert\widetilde{U}-UO\right\rVert\leqslant 2\left\lVert(I-UU^{T})\widetilde{U}\right\rVert. \]
Proof.

Write the SVD as $U^{T}\widetilde{U}=V\Sigma(V^{\prime})^{T}$, where we recall that the $i$th largest singular value is $\sigma_{i}=\cos\theta_{i}$, with $\theta_{i}\in[0,\pi/2]$ denoting the principal angles between $\mathcal{R}(U)$ and $\mathcal{R}(\widetilde{U})$. Choosing $O=V(V^{\prime})^{T}$, we then obtain

\[
\begin{aligned}
\left\lVert\widetilde{U}-UV(V^{\prime})^{T}\right\rVert
&\leqslant\left\lVert\widetilde{U}-UU^{T}\widetilde{U}\right\rVert+\left\lVert UU^{T}\widetilde{U}-UV(V^{\prime})^{T}\right\rVert \\
&=\left\lVert(I-UU^{T})\widetilde{U}\right\rVert+\left\lVert U^{T}\widetilde{U}-V(V^{\prime})^{T}\right\rVert \\
&=\left\lVert(I-UU^{T})\widetilde{U}\right\rVert+\left\lVert I-\Sigma\right\rVert \\
&\leqslant 2\left\lVert(I-UU^{T})\widetilde{U}\right\rVert,
\end{aligned}
\]

where the last inequality follows from the fact that $\left\lVert I-\Sigma\right\rVert=1-\cos\theta_{k}\leqslant\sin\theta_{k}$. ∎
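The last step of the proof rests on two elementary facts, which can be spelled out as follows. This is a short worked sketch in the notation of the proof, using the relation between $\sin\Theta$ and the projection residual recalled before Theorem B.2.

```latex
% For \theta \in [0, \pi/2] we have 0 \le \cos\theta \le 1 and 0 \le \sin\theta \le 1, hence
1 - \cos\theta \;\leqslant\; (1 - \cos\theta)(1 + \cos\theta) \;=\; \sin^{2}\theta \;\leqslant\; \sin\theta .
% Since \sigma_k = \cos\theta_k is the smallest singular value of U^T \widetilde{U},
\lVert I - \Sigma \rVert = \max_{1 \leqslant i \leqslant k} (1 - \cos\theta_i) = 1 - \cos\theta_k ,
\qquad
\lVert (I - UU^{T})\widetilde{U} \rVert = \lVert \sin\Theta \rVert = \sin\theta_k .
```

Combining the two displays gives $\lVert I-\Sigma\rVert\leqslant\lVert(I-UU^{T})\widetilde{U}\rVert$, which is what the final inequality of the proof uses.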

Appendix C Summary of main technical tools

This section collects certain technical results that were used in the course of proving our main results.

Proposition C.1 ([Bha96, Theorem X.1.1]).

For matrices $A,B\succ 0$,

\[ \left\lVert A^{1/2}-B^{1/2}\right\rVert\leqslant\left\lVert A-B\right\rVert^{1/2} \]

holds as $(\cdot)^{1/2}$ is operator monotone.
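Proposition C.1 can be checked numerically on $2\times 2$ positive definite matrices, for which the principal square root has a closed form. The sketch below uses only the standard library; the matrices `A` and `B` and both helper functions are illustrative assumptions rather than anything defined in the paper.

```python
import math


def sqrt_spd2(a, b, c):
    """Principal square root of the positive definite matrix [[a, b], [b, c]].

    Uses the 2x2 closed form sqrt(M) = (M + s I) / t with s = sqrt(det M)
    and t = sqrt(trace M + 2 s).
    """
    det = a * c - b * b
    assert a > 0 and det > 0, "matrix must be positive definite"
    s = math.sqrt(det)
    t = math.sqrt(a + c + 2.0 * s)
    return (a + s) / t, b / t, (c + s) / t


def spec_norm_sym2(a, b, c):
    """Spectral norm of the real symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return max(abs(mean + radius), abs(mean - radius))


A = (4.0, 1.0, 3.0)   # illustrative positive definite matrices, stored as (a, b, c)
B = (2.0, -0.5, 1.0)

diff_of_roots = tuple(x - y for x, y in zip(sqrt_spd2(*A), sqrt_spd2(*B)))
lhs = spec_norm_sym2(*diff_of_roots)
rhs = math.sqrt(spec_norm_sym2(*(x - y for x, y in zip(A, B))))
print(f"||A^(1/2) - B^(1/2)|| = {lhs:.4f} <= ||A - B||^(1/2) = {rhs:.4f}")
```

The square-root dependence on $\lVert A-B\rVert$ is what produces the $\lVert B^{-}-A^{-}\rVert^{1/2}$ term in Proposition C.2.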

Proposition C.2.

For symmetric matrices $A^{+}$, $A^{-}$, $B^{+}$ and $B^{-}$ where $A^{-},B^{-}\succ 0$, the following holds.

\[
\begin{aligned}
&\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}B^{+}(B^{-})^{-1/2}\right\rVert \\
&\qquad\leqslant\left\lVert(A^{-})^{-1}\right\rVert\left\lVert A^{+}\right\rVert\left(\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert^{2}+2\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert\right)+\left\lVert(B^{-})^{-1}\right\rVert\left\lVert A^{+}-B^{+}\right\rVert \\
&\qquad\leqslant\left\lVert(A^{-})^{-1}\right\rVert\left\lVert A^{+}\right\rVert\left(\left\lVert(B^{-})^{-1}\right\rVert\left\lVert B^{-}-A^{-}\right\rVert+2\left\lVert(B^{-})^{-1/2}\right\rVert\left\lVert B^{-}-A^{-}\right\rVert^{1/2}\right)+\left\lVert(B^{-})^{-1}\right\rVert\left\lVert A^{+}-B^{+}\right\rVert\,.
\end{aligned}
\]
Proof.

By adding and subtracting $(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}$ and applying the triangle inequality,

\[
\begin{aligned}
&\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}B^{+}(B^{-})^{-1/2}\right\rVert \\
&\quad=\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}+(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}-(B^{-})^{-1/2}B^{+}(B^{-})^{-1/2}\right\rVert \\
&\quad\leqslant\left\lVert(B^{-})^{-1/2}(A^{+}-B^{+})(B^{-})^{-1/2}\right\rVert+\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}\right\rVert\,.
\end{aligned}
\]

Now, we bound the two terms separately. The first term is easy to bound.

\[
\begin{aligned}
\left\lVert(B^{-})^{-1/2}(A^{+}-B^{+})(B^{-})^{-1/2}\right\rVert
&\leqslant\left\lVert(B^{-})^{-1/2}\right\rVert\left\lVert A^{+}-B^{+}\right\rVert\left\lVert(B^{-})^{-1/2}\right\rVert \\
&=\left\lVert(B^{-})^{-1}\right\rVert\left\lVert A^{+}-B^{+}\right\rVert\,.
\end{aligned}
\tag{93}
\]

To bound the second term, we do the following manipulations,

\[
\begin{aligned}
&\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}\right\rVert \\
&\quad=\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(A^{-})^{-1/2}(A^{-})^{1/2}(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}(A^{-})^{1/2}(A^{-})^{-1/2}\right\rVert \\
&\quad=\left\lVert(A^{-})^{-1/2}\left(A^{+}-(A^{-})^{1/2}(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}(A^{-})^{1/2}\right)(A^{-})^{-1/2}\right\rVert \\
&\quad=\left\lVert(A^{-})^{-1/2}\left(A^{+}-\left((A^{-})^{1/2}(B^{-})^{-1/2}-I+I\right)A^{+}\left((B^{-})^{-1/2}(A^{-})^{1/2}-I+I\right)\right)(A^{-})^{-1/2}\right\rVert \\
&\quad=\Big\lVert(A^{-})^{-1/2}\Big(\big((A^{-})^{1/2}(B^{-})^{-1/2}-I\big)A^{+}\big((B^{-})^{-1/2}(A^{-})^{1/2}-I\big)+A^{+}\big((B^{-})^{-1/2}(A^{-})^{1/2}-I\big) \\
&\qquad\qquad+\big((A^{-})^{1/2}(B^{-})^{-1/2}-I\big)A^{+}\Big)(A^{-})^{-1/2}\Big\rVert \\
&\quad\leqslant\left\lVert(A^{-})^{-1}\right\rVert\left\lVert A^{+}\right\rVert\left(\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert^{2}+2\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert\right)\,.
\end{aligned}
\tag{94}
\]

The first inequality of the proposition follows by adding (93) and (94).

To see the second inequality of the proposition, observe that

\[
\begin{aligned}
\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert
&=\left\lVert(B^{-})^{-1/2}\left((B^{-})^{1/2}-(A^{-})^{1/2}\right)\right\rVert \\
&\leqslant\left\lVert(B^{-})^{-1/2}\right\rVert\left\lVert(B^{-})^{1/2}-(A^{-})^{1/2}\right\rVert \\
&\leqslant\left\lVert(B^{-})^{-1/2}\right\rVert\left\lVert B^{-}-A^{-}\right\rVert^{1/2}\quad(\text{using Proposition C.1})\,.
\end{aligned}
\tag{95}
\]

The second inequality of the proposition follows by substituting (95) in the first inequality, using $\left\lVert(B^{-})^{-1/2}\right\rVert^{2}=\left\lVert(B^{-})^{-1}\right\rVert$. ∎

Appendix D Proofs from Section 4

Lemma D.1 (Expression for $C^{+}_{e}$ and $C^{-}_{e}$).

\[ C^{+}_{e}=-p\eta\frac{n}{d^{+}}\chi_{1}\chi_{1}^{\top}+\left(1+\tau^{-}+\frac{p}{d^{+}}\left(1-\eta-\frac{n}{k}(1-2\eta)\right)\right)I\,, \]

\[ C^{-}_{e}=-p(1-\eta)\frac{n}{d^{-}}\chi_{1}\chi_{1}^{\top}+\left(1+\tau^{+}+\frac{p}{d^{-}}\left(\eta+\frac{n}{k}(1-2\eta)\right)\right)I\,. \]

It follows that these matrices can be written as $C^{+}_{e}=R\Sigma^{+}R^{\top}$ and $C^{-}_{e}=R\Sigma^{-}R^{\top}$, where $R$ is a rotation matrix, and

\[ \Sigma^{+}=\begin{bmatrix}1+\tau^{-}+\frac{p}{d^{+}}\left(1-\eta-n\left(\eta+\frac{1-2\eta}{k}\right)\right)&\\ &\left(1+\tau^{-}+\frac{p}{d^{+}}\left(1-\eta-n\left(\frac{1-2\eta}{k}\right)\right)\right)I_{k-1}\end{bmatrix}\,, \]

\[ \Sigma^{-}=\begin{bmatrix}1+\tau^{+}+\frac{p}{d^{-}}\left(\eta-n\left(1-\eta-\frac{1-2\eta}{k}\right)\right)&\\ &\left(1+\tau^{+}+\frac{p}{d^{-}}\left(\eta+n\left(\frac{1-2\eta}{k}\right)\right)\right)I_{k-1}\end{bmatrix}\,. \]

The above lemma shows that we know the spectrum of $(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}$ exactly, in the case of equal-sized clusters.
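To make this remark concrete, the spectrum can be read off from the two diagonal factors of Lemma D.1, since both matrices are diagonalized by the same $R$. The following is a short worked sketch, assuming the diagonal entries of $\Sigma^{-}$ are positive so that $(C^{-}_{e})^{-1/2}$ is well defined; the symbols $\mu_{1},\mu_{2}$ are introduced here for illustration only.

```latex
% Both factors share the rotation R, so they commute after conjugation:
(C^{-}_{e})^{-1/2} C^{+}_{e} (C^{-}_{e})^{-1/2}
  = R\,(\Sigma^{-})^{-1/2}\,\Sigma^{+}\,(\Sigma^{-})^{-1/2}\,R^{\top}
  = R\,\Sigma^{+}(\Sigma^{-})^{-1}\,R^{\top} .
% Hence the eigenvalues are ratios of matching diagonal entries:
\mu_{1} = \frac{1+\tau^{-}+\frac{p}{d^{+}}\left(1-\eta-n\left(\eta+\frac{1-2\eta}{k}\right)\right)}
               {1+\tau^{+}+\frac{p}{d^{-}}\left(\eta-n\left(1-\eta-\frac{1-2\eta}{k}\right)\right)}
\quad (\text{multiplicity } 1),
\qquad
\mu_{2} = \frac{1+\tau^{-}+\frac{p}{d^{+}}\left(1-\eta-n\,\frac{1-2\eta}{k}\right)}
               {1+\tau^{+}+\frac{p}{d^{-}}\left(\eta+n\,\frac{1-2\eta}{k}\right)}
\quad (\text{multiplicity } k-1).
```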

Proof of Lemma 4.6.

From (17) it follows that

\[ \lambda_{\max}(C^{+})\leqslant\max_{i\in[k]}\left(1+\tau^{-}+\frac{p}{d_{i}^{+}}(1-\eta-n_{i}(1-2\eta))\right)\,. \]

The maximum is attained for the smallest cluster, which proves (33).

The proof of (34) follows from the fact that in (24) we had decomposed the matrix $\overline{L^{-}_{sym}}+\tau^{+}I$ as a block-diagonal matrix, with blocks $C^{-},\alpha_{1}^{-}I_{n_{1}-1},\ldots,\alpha_{k}^{-}I_{n_{k}-1}$. Since $\overline{L^{-}_{sym}}$ is a symmetric Laplacian, we know that $\lambda_{\min}(\overline{L^{-}_{sym}}+\tau^{+}I)=\tau^{+}$. Also, $\alpha_{i}^{-}>\tau^{+}$ for $i\in[k]$. Thus (34) follows. ∎

Appendix E Spectrum of Signed Laplacians

This section extends some classical results for the unsigned Laplacian to the symmetric Signed Laplacian and the regularized Laplacian.

Lemma E.1.

For all $x\in\mathbb{R}^{n}$,

\[ x^{T}\overline{L_{sym}}x=\frac{1}{2}\sum_{j,j^{\prime}}|A_{jj^{\prime}}|\left(\frac{x_{j}}{\sqrt{d_{j}}}-\operatorname{sgn}(A_{jj^{\prime}})\frac{x_{j^{\prime}}}{\sqrt{d_{j^{\prime}}}}\right)^{2}. \tag{96} \]

Moreover, the eigenvalues of $\overline{L_{sym}}$ and $L_{\gamma}$ are in the interval $[0,2]$.

Proof.

Equation (96) is adapted from [Gal16, Proposition 5.2] and is obtained by replacing $x$ by $\bar{D}^{-1/2}x$. The second part of the lemma comes from the fact that $(a\pm b)^{2}\leqslant 2(a^{2}+b^{2})$. Indeed, for $x\in\mathbb{R}^{n}$ such that $\left\lVert x\right\rVert=1$, we have

\[
\begin{aligned}
x^{T}\overline{L_{sym}}x
&\leqslant\sum_{j,j^{\prime}}|A_{jj^{\prime}}|\left(\frac{x_{j}^{2}}{d_{j}}+\frac{x_{j^{\prime}}^{2}}{d_{j^{\prime}}}\right) \\
&=2\sum_{j,j^{\prime}}|A_{jj^{\prime}}|\frac{x_{j}^{2}}{d_{j}}=2\sum_{j}x_{j}^{2}=2.
\end{aligned}
\]

Similarly, we have

\[
\begin{aligned}
x^{T}L_{\gamma}x
&\leqslant\sum_{j,j^{\prime}}|(A_{\gamma})_{jj^{\prime}}|\left(\frac{x_{j}^{2}}{\bar{D}_{jj}+\gamma}+\frac{x_{j^{\prime}}^{2}}{\bar{D}_{j^{\prime}j^{\prime}}+\gamma}\right) \\
&\leqslant 2\sum_{j,j^{\prime}}\left(|A_{jj^{\prime}}|+\frac{\gamma}{n}\right)\frac{x_{j}^{2}}{\bar{D}_{jj}+\gamma} \\
&=2\sum_{j}\frac{(\bar{D}_{jj}+\gamma)x_{j}^{2}}{\bar{D}_{jj}+\gamma}=2.
\end{aligned}
\]

Moreover, $\overline{L_{sym}}$ and $L_{\gamma}$ are positive semi-definite, thus we can conclude that their eigenvalues are between 0 and 2. ∎
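The identity (96) and the resulting $[0,2]$ range can be verified directly on a small signed graph. The sketch below assumes that $\overline{L_{sym}}=I-\bar{D}^{-1/2}A\bar{D}^{-1/2}$, with $\bar{D}$ the diagonal matrix of degrees $d_{j}=\sum_{j^{\prime}}|A_{jj^{\prime}}|$; the adjacency matrix, the test vectors and the function names are illustrative assumptions.

```python
import math

# Small symmetric signed adjacency matrix with zero diagonal (illustrative).
A = [
    [0, 1, -1, 0],
    [1, 0, 1, -1],
    [-1, 1, 0, 1],
    [0, -1, 1, 0],
]
n = len(A)
d = [sum(abs(a) for a in row) for row in A]  # degrees of |A|, all positive here


def quad_form_matrix(x):
    """x^T (I - D^{-1/2} A D^{-1/2}) x, computed entry by entry."""
    total = sum(xj * xj for xj in x)
    for j in range(n):
        for jp in range(n):
            total -= A[j][jp] * x[j] * x[jp] / math.sqrt(d[j] * d[jp])
    return total


def quad_form_edges(x):
    """Right-hand side of (96): a weighted sum of squares over signed edges."""
    total = 0.0
    for j in range(n):
        for jp in range(n):
            if A[j][jp] != 0:
                sgn = 1.0 if A[j][jp] > 0 else -1.0
                diff = x[j] / math.sqrt(d[j]) - sgn * x[jp] / math.sqrt(d[jp])
                total += 0.5 * abs(A[j][jp]) * diff * diff
    return total


for x in ([1.0, 0.0, 0.0, 0.0], [0.5, -0.5, 0.5, -0.5], [0.1, 0.7, -0.7, 0.1]):
    lhs, rhs = quad_form_matrix(x), quad_form_edges(x)
    rayleigh = lhs / sum(v * v for v in x)
    print(f"matrix form={lhs:.6f}  edge form={rhs:.6f}  Rayleigh quotient={rayleigh:.6f}")
```

Because the edge form is a sum of squares, non-negativity of the quadratic form is immediate, and the Rayleigh quotients printed above stay within $[0,2]$ as the lemma states.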