SPONGE Extension

Mihai Cucuringu (mihai.cucuringu@stats.ox.ac.uk): University of Oxford, Department of Statistics and Mathematical Institute; The Alan Turing Institute, London, UK. This work was supported by EPSRC grant EP/N510129/1.
Apoorv Vikram Singh (apoorv.singh@nyu.edu): New York University, Department of Computer Science and Engineering. This work was done while the author was visiting the MODAL team at Inria Lille-Nord Europe.
Déborah Sulem (deborah.sulem@stats.ox.ac.uk): University of Oxford, Department of Statistics.
Hemant Tyagi (hemant.tyagi@inria.fr): Inria, Univ. Lille, CNRS, UMR 8524 - Laboratoire Paul Painlevé, F-59000.
Authors are listed in alphabetical order.

Regularized spectral methods for clustering signed networks

Abstract

We study the problem of $k$ -way clustering in signed graphs. Considerable attention in recent years has been devoted to analyzing and modeling signed graphs, where the affinity measure between nodes takes either positive or negative values. Recently, [CDGT19] proposed a spectral method, namely SPONGE (Signed Positive over Negative Generalized Eigenproblem), which casts the clustering task as a generalized eigenvalue problem optimizing a suitably defined objective function. This approach is motivated by social balance theory, where the clustering task aims to decompose a given network into disjoint groups, such that individuals within the same group are connected by as many positive edges as possible, while individuals from different groups are mainly connected by negative edges. Through extensive numerical simulations, SPONGE was shown to achieve state-of-the-art empirical performance. On the theoretical front, [CDGT19] analyzed SPONGE, as well as the popular Signed Laplacian based spectral method under the setting of a Signed Stochastic Block Model, for $k=2$ equal-sized clusters, in the regime where the graph is moderately dense.

In this work, we build on the results in [CDGT19] on two fronts for the normalized versions of SPONGE and the Signed Laplacian. Firstly, for both algorithms, we extend the theoretical analysis in [CDGT19] to the general setting of $k\geqslant 2$ unequal-sized clusters in the moderately dense regime. Secondly, we introduce regularized versions of both methods to handle sparse graphs – a regime where standard spectral methods are known to underperform – and provide theoretical guarantees under the same setting of a Signed Stochastic Block Model. To the best of our knowledge, regularized spectral methods have so far not been considered in the setting of clustering signed graphs. We complement our theoretical results with an extensive set of numerical experiments on synthetic data.

Keywords: signed clustering, graph Laplacians, stochastic block models, spectral methods, regularization techniques, sparse graphs.

1 Introduction

Signed graphs.

Recent years have seen a significant increase in interest in the analysis of signed graphs, for tasks such as clustering [CHN⁺14, CDGT19], link prediction [LHK10, KSSF16] and visualization [KSL⁺10]. Signed graphs are an increasingly popular family of undirected graphs, for which the edge weights may take both positive and negative values, thus encoding a measure of similarity or dissimilarity between the nodes. Signed social graphs have also received considerable attention to model trust relationships between entities, with positive (respectively, negative) edges encoding trust (respectively, distrust) relationships.

Clustering is arguably one of the most popular tasks in unsupervised machine learning, aiming at partitioning the node set such that the average connectivity or similarity between pairs of nodes within the same cluster is larger than that of pairs of nodes spanning different clusters. While the problem of clustering undirected unsigned graphs has been thoroughly studied for the past two decades (and to some extent, also that of clustering directed graphs in recent years), a lot less research has been undertaken on studying signed graphs.

Spectral clustering and regularization.

Spectral clustering methods have become a fundamental tool with a broad range of applications in areas including network science, machine learning and data mining [vL07]. The appeal of spectral clustering methods stems, on the one hand, from their computational scalability, which leverages state-of-the-art eigensolvers, and on the other hand, from the fact that such algorithms are amenable to a theoretical analysis under suitably defined stochastic block models that quantify robustness to noise and sparsity of the measurement graph. Furthermore, on the theoretical side, understanding the spectrum of the adjacency matrix and its Laplacians is crucial for the development of efficient algorithms with performance guarantees, and leads to a very mathematically rich set of problems. One such example from the latter class is that of Cheeger inequalities for general graphs, which relate the dominant eigenvalues of the Laplacian to edge expansion on graphs [Chu96], extended to the setup of directed graphs [Chu05], and more recently, to the graph Connection Laplacian arising in the context of the group synchronization problem [BSS13], and higher-order Cheeger inequalities for multiway spectral clustering [LGT14]. There have been significant recent advances in theoretically analyzing spectral clustering methods in the context of stochastic block models; we refer the reader to the comprehensive recent survey of Abbe [Abb17].

In general, spectral clustering algorithms for unsigned and signed graphs share a common pipeline, where a suitable graph operator is considered (e.g., the graph Laplacian), its (usually $k$ ) extremal eigenvectors are computed, and the resulting point cloud in $\varmathbb{R}^{k}$ is clustered using a variation of the popular $k$ -means algorithm [RCY⁺11]. The main motivation for our current work stems from the lack of statistical guarantees in the above literature for the signed clustering problem, in the context of sparse graphs and a large number of clusters $k\geqslant 3$ . The problem of $k$ -way clustering in signed graphs aims to find a partition of the node set into $k$ disjoint clusters, such that most edges within clusters are positive, while most edges across clusters are negative, thus altogether maximizing the number of satisfied edges in the graph. Another potential formulation to consider is to minimize the number of (unsatisfied) edges violating the partitions, i.e., the number of negative edges within clusters and positive edges across clusters.

A regularization step has been introduced in the recent literature, motivated by the observation that properly regularizing the adjacency matrix $A$ of a graph can significantly improve the performance of spectral algorithms in the sparse regime. It was well known beforehand that standard spectral clustering often fails to produce meaningful results for sparse networks that exhibit strong degree heterogeneity [ACBL13, J⁺15]. To this end, [CCT12] proposed the regularized graph Laplacian $L^{\tau}=D_{\tau}^{-1/2}AD_{\tau}^{-1/2},$ where $D_{\tau}=D+\tau I$ , for $\tau\geqslant 0$ . The spectral algorithm introduced and analyzed in [CCT12] splits the nodes into two random subsets and relies on the subgraph induced by only one of the subsets to compute the spectral decomposition. Qin and Rohe [QR13] studied the more traditional formulation of a spectral clustering algorithm that uses the spectral decomposition of the entire matrix [NJW⁺02], and proposed and analyzed a regularized spectral clustering algorithm. Subsequently, [JY16] provided a theoretical justification for the regularization $A_{\tau}=A+\tau J$ , where $J$ denotes the all ones matrix, partly explaining the empirical findings of [ACBL13] that the performance of regularized spectral clustering becomes insensitive for larger values of the regularization parameter, and showed that such large values can lead to better results. It is this latter form of regularization that we leverage in our present work, in the context of clustering signed graphs. Additional references and discussion on the regularization literature are provided in Section 1.2.

Motivation & Applications.

The recent surge of interest in analyzing signed graphs has been fueled by a very wide range of real-world applications, in the context of clustering, link prediction, and node rankings. Signed social networks model trust relationships between users with positive (trust) and negative (distrust) edges. A number of online social services such as Epinions [epi] and Slashdot [sla] that allow users to express their opinions are naturally represented as signed social networks [LHK10]. [BSG⁺12] considered shopping bipartite networks that encode like and dislike preferences between users and products. Other domain specific applications include personalized rankings via signed random walks [JJSK16], node rankings and centrality measures [LFZ19], node classification [TAL16], community detection [YCL07, CWP⁺16], and anomaly detection, as in [KSS14] which classifies users of an online signed social network as malicious or benign. In the very active research area of synthetic data generation, generative models for signed networks inspired by Structural Balance Theory have been proposed in [DMT18]. Learning low-dimensional representations of graphs (network embeddings) has received tremendous attention in the recent machine learning literature, and graph convolutional networks-based methods have also been proposed for the setting of signed graphs, including [DMT18, LTJC20], which provide network embeddings to facilitate subsequent downstream tasks, including clustering and link prediction.

A key motivation for our line of work stems from time series clustering [ASW15], a ubiquitous task arising in many applications that consider biological gene expression data [FSK⁺12], economic time series that capture macroeconomic variables [Foc05], and financial time series corresponding to large baskets of instruments in the stock market [ZJGK10, PPTV06]. Driven by the clustering task, a popular approach in the literature is to consider similarity measures based on the Pearson correlation coefficient that captures linear dependence between variables and takes values in $[-1,1]$ . By construing the correlation matrix as a weighted network whose (signed) edge weights capture the pairwise correlations, we cluster the multivariate time series by clustering the underlying signed network. To increase robustness, tests of statistical significance are often applied to individual pairwise correlations, indicating the probability of observing a correlation at least as large as the measured sample correlation, assuming the null hypothesis is true. Such a thresholding step on the $p$ -value associated to each individual sample correlation [HLN15] renders the correlation network a sparse matrix. This is one of the main motivations of our current work, which proposes and analyzes algorithms for handling such sparse signed networks. We refer the reader to the popular work of Smith et al. [SMSK⁺11] for a detailed survey and comparison of various methodologies for turning time series data into networks, where the authors explore the interplay between fMRI time series and the network generation process. Importantly, they conclude that, in general, correlation-based approaches can be quite successful at estimating the connectivity of brain networks from fMRI time series.
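The thresholding step described above can be sketched as follows. This is a minimal illustration rather than the procedure of [HLN15]: the function name `correlation_signed_network`, the two-sided $t$-test for a zero Pearson correlation, the significance level `alpha`, and the choice to keep only the sign of each significant correlation (so that the result lies in $\{0,\pm 1\}^{n\times n}$) are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def correlation_signed_network(X, alpha=0.05):
    """Hypothetical helper: X has shape (T, n), T observations of n time series.

    Returns a signed adjacency matrix with entries in {0, +1, -1}, keeping the
    sign of each pairwise Pearson correlation whose p-value is below alpha.
    """
    X = np.asarray(X, dtype=float)
    T, n = X.shape
    R = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(R, 0.0)  # no self-loops
    # Two-sided p-value under the null of zero correlation (t-test, T-2 dof).
    with np.errstate(divide="ignore"):
        t_stat = R * np.sqrt((T - 2) / (1.0 - R**2))
    p_values = 2.0 * stats.t.sf(np.abs(t_stat), df=T - 2)
    A = np.where(p_values < alpha, np.sign(R), 0.0)
    np.fill_diagonal(A, 0.0)
    return A


# Synthetic example: two groups of series driven by a common factor f,
# with opposite signs, plus independent noise.
rng = np.random.default_rng(0)
T = 200
f = rng.standard_normal(T)
signs = np.array([1, 1, 1, -1, -1, -1])
X = f[:, None] * signs[None, :] + 0.5 * rng.standard_normal((T, 6))
A = correlation_signed_network(X, alpha=0.01)
```

A stricter `alpha` removes more edges, which is exactly what pushes the resulting signed network into the sparse regime.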

Paper outline.

This paper is structured as follows. The remainder of this Section 1 establishes the notation used throughout the paper, followed by a brief survey of related work in the signed clustering literature and graph regularization techniques for general graphs, along with a brief summary of our main contributions. Section 2 lays out the problem setup leading to our proposed algorithms in the context of the signed stochastic block model we subsequently analyze. Section 3 is a high-level summary of our main results across the two algorithms we consider. Section 4 contains the analysis of the proposed SPONGE_sym algorithm, for both the sparse and dense regimes, for a general number of clusters. Similarly, Section 5 contains the main theoretical results for the symmetric Signed Laplacian, under both sparsity regimes as well. Section 6 contains detailed numerical experiments on various synthetic data sets, showcasing the performance of our proposed algorithms as we vary the number of clusters, the relative cluster sizes, the sparsity regimes, and the regularization parameters. Finally, Section 7 is a summary and discussion of our main findings, with an outlook towards potential future directions. We defer to the Appendix additional proof details and a summary of the main technical tools used throughout.

1.1 Notation

We denote by $G=(V,E)$ a signed graph with vertex set $V$ , edge set $E$ , and adjacency matrix $A\in\{0,\pm 1\}^{n\times n}$ . We will also refer to the unsigned subgraphs of positive (resp. negative) edges $G^{+}=(V,E^{+})$ (resp. $G^{-}=(V,E^{-})$ ) with adjacency matrices $A^{+}$ (resp. $A^{-}$ ), such that $A=A^{+}-A^{-}$ . More precisely, $A_{ij}^{+}=\max\left\{A_{ij},0\right\}$ and $A_{ij}^{-}=\max\left\{-A_{ij},0\right\}$ , with $E^{+}\cap E^{-}=\emptyset$ , and $E^{+}\cup E^{-}=E$ . We denote by $\overline{D}=D^{+}+D^{-}$ the signed degree matrix, with the unsigned versions given by the diagonal matrices $D^{+}:=\text{diag}(A^{+}\mathds{1})$ and $D^{-}:=\text{diag}(A^{-}\mathds{1})$ . For a subset of nodes $C\subset V$ , we denote its complement by $\overline{C}=V\setminus C$ .
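As a concrete illustration of this notation, the following sketch splits a small signed adjacency matrix into its positive and negative parts and forms the degree matrices. The function name `split_signed_adjacency` and the 4-node example are assumptions introduced here for illustration only.

```python
import numpy as np


def split_signed_adjacency(A):
    """Return A+, A-, D+, D- and the signed degree matrix D_bar = D+ + D-."""
    A = np.asarray(A, dtype=float)
    A_pos = np.maximum(A, 0.0)   # A+_ij = max{A_ij, 0}
    A_neg = np.maximum(-A, 0.0)  # A-_ij = max{-A_ij, 0}
    D_pos = np.diag(A_pos.sum(axis=1))  # diagonal matrix of positive degrees
    D_neg = np.diag(A_neg.sum(axis=1))  # diagonal matrix of negative degrees
    return A_pos, A_neg, D_pos, D_neg, D_pos + D_neg


# Illustrative 4-node signed graph with entries in {0, +1, -1}.
A = np.array([[0, 1, -1, 0],
              [1, 0, 0, -1],
              [-1, 0, 0, 1],
              [0, -1, 1, 0]])
A_pos, A_neg, D_pos, D_neg, D_bar = split_signed_adjacency(A)
```

By construction $A^{+}$ and $A^{-}$ have disjoint supports, and the diagonal of $\overline{D}$ is the row sum of $|A|$.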

For a matrix $M\in\varmathbb R^{m\times n}$ , $\left\lVert M\right\rVert$ denotes its spectral norm $\left\lVert M\right\rVert_{2}$ , i.e., its largest singular value, and $\|M\|_{F}$ denotes its Frobenius norm. When $M$ is an $n\times n$ symmetric matrix, we let $V_{k}(M)$ be the $n\times k$ matrix whose columns are given by the eigenvectors corresponding to the $k$ smallest eigenvalues, and let $\mathcal{R}(V_{k}(M))$ denote the range space of these eigenvectors. We denote the eigenvalues of $M$ by $(\lambda_{j}(M))_{j=1}^{n}$ , with the ordering

\lambda_{n}(M)\leqslant\lambda_{n-1}(M)\leqslant\dots\leqslant\lambda_{1}(M).

We also denote by $M_{i*}$ the $i$ -th row of $M$ . We denote by $\mathds{1}=(1,\dots,1)$ (resp. $\mathds{1}_{k}$ ) the all ones column vector of size $n$ (resp. $k$ ) and set $\chi_{1}=\frac{1}{\sqrt{k}}\mathds{1}_{k}$ . $I_{m}$ denotes the square identity matrix of size $m$ and is shortened to $I$ when $m=n$ . $J_{mn}$ is the $m\times n$ matrix of all ones. Finally, for $a,b\geqslant 0$ , we write $a\lesssim b$ if there exists a universal constant $C>0$ such that $a\leqslant Cb$ . If $a\lesssim b$ and $b\lesssim a$ , then we write $a\asymp b$ .

1.2 Related literature on signed clustering and graph regularization techniques

Signed clustering.

There exists a very rich literature on algorithms developed to solve the $k$ -way clustering problem, with spectral methods playing a central role in the developments of the last two decades. Such spectral techniques optimize an objective function via the eigen-decomposition of a suitably chosen graph operator (typically a graph Laplacian) built directly from the data, in order to obtain a low-dimensional embedding (most often of dimension $k$ or $k-1$ ). A clustering algorithm such as $k$ -means or $k$ -means++ is subsequently applied in order to extract the final partition.

Kunegis et al. in [KSL⁺10] introduced the combinatorial Signed Laplacian $\overline{L}=\overline{D}-A$ for the 2-way clustering problem. For heterogeneous degree distributions, normalized extensions are generally preferred, such as the random-walk Signed Laplacian $\overline{L_{rw}}=I-\overline{D}^{-1}A$ , and the symmetric Signed Laplacian $\overline{L_{sym}}=I-\overline{D}^{-1/2}A\overline{D}^{-1/2}$ . Chiang et al. [CWD12] pointed out a weakness in the Signed Laplacian objective for $k$ -way clustering with $k>2$ , and proposed instead a Balanced Normalized Cut (BNC) objective based on the operator $\overline{L_{BNC}}=\overline{D}^{-1/2}(D^{+}-A)\overline{D}^{-1/2}$ . Mercado et al. [MTH16] based their clustering algorithm on a new operator called the Geometric Mean of Laplacians, and later extended this method in [MTH19] to a family of operators called the Matrix Power Mean of Laplacians. Previous work [CDGT19] by a subset of the authors of the present paper introduced the symmetric SPONGE objective using the matrix operator $T=(L^{-}_{sym}+\tau^{+}I)^{-1/2}(L^{+}_{sym}+\tau^{-}I)(L^{-}_{sym}+\tau^{+}I)^{-1/2}$ , using the unsigned normalized Laplacians $L_{sym}^{\pm}=I-(D^{\pm})^{-1/2}A^{\pm}(D^{\pm})^{-1/2}$ and regularization parameters $\tau^{+},\tau^{-}>0$ . This work also provides theoretical guarantees for the SPONGE and Signed Laplacian algorithms, in the setting of a Signed Stochastic Block Model.
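The operators listed above can be formed directly from a signed adjacency matrix. The sketch below is a dense NumPy illustration under the assumption that every node has at least one incident edge (so that $\overline{D}$ is invertible); the function name `signed_laplacians` and the toy graph are hypothetical.

```python
import numpy as np


def signed_laplacians(A):
    """Return (L_bar, L_rw, L_sym, L_bnc) for a signed adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d_bar = np.abs(A).sum(axis=1)            # diagonal of D_bar = D+ + D-
    if np.any(d_bar == 0):
        raise ValueError("normalized operators need every node to have an edge")
    d_pos = np.maximum(A, 0.0).sum(axis=1)   # diagonal of D+
    s = 1.0 / np.sqrt(d_bar)                 # diagonal of D_bar^{-1/2}
    L_bar = np.diag(d_bar) - A                               # D_bar - A
    L_rw = np.eye(n) - A / d_bar[:, None]                    # I - D_bar^{-1} A
    L_sym = np.eye(n) - s[:, None] * A * s[None, :]          # I - D_bar^{-1/2} A D_bar^{-1/2}
    L_bnc = s[:, None] * (np.diag(d_pos) - A) * s[None, :]   # D_bar^{-1/2} (D+ - A) D_bar^{-1/2}
    return L_bar, L_rw, L_sym, L_bnc


# Balanced toy graph: clusters {0, 1} and {2, 3}, positive edges inside,
# negative edges across.
A = np.array([[0, 1, -1, -1],
              [1, 0, -1, -1],
              [-1, -1, 0, 1],
              [-1, -1, 1, 0]])
L_bar, L_rw, L_sym, L_bnc = signed_laplacians(A)
x = np.array([1.0, 1.0, -1.0, -1.0])  # signed cluster indicator
```

On this balanced example the signed indicator `x` lies in the null space of $\overline{L}$, which is the property the 2-way Signed Laplacian method exploits.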

In [MTH16] and [MTH19], Mercado et al. study the eigenspaces, in expectation and in probability, of several graph operators in a certain Signed Stochastic Block Model. However, this generative model differs from the one proposed in [CDGT19] that we analyze in this work. In the former, the positive and negative adjacency matrices do not have disjoint support, contrary to the latter. Moreover, their analysis is performed in the case of equal-size clusters. We will later show in our analysis that their result for the symmetric Signed Laplacian is not applicable in our setting.

Hsieh et al. [HCD12] proposed to perform low-rank matrix completion as a preprocessing step, before clustering using the top $k$ eigenvectors of the completed matrix. For $k=2$ , [Cuc15] showed that signed clustering can be cast as an instance of the group synchronization [Sin11] problem over $\varmathbb{Z}_{2}$ , potentially with constraints given by available side information, for which spectral, semidefinite programming relaxations, and message passing algorithms have been considered. In recent work, [CPv19] proposed a formulation for the signed clustering problem that relates to graph-based diffuse interface models utilizing the Ginzburg-Landau functionals, based on an adaptation of the classic numerical Merriman-Bence-Osher (MBO) scheme for minimizing such graph-based functionals [MSB14]. We refer the reader to [Gal13] for a recent survey on clustering signed and unsigned graphs.

In a different line of work, known as correlation clustering, Bansal et al. [BBC04] considered the problem of clustering signed complete graphs, proved that it is NP-complete, and proposed two approximation algorithms with theoretical guarantees on their performance. On a related note, Demaine et al. [DEFI06] studied the same problem but for arbitrary weighted graphs, and proposed an $O(\log n)$ approximation algorithm based on linear programming. For correlation clustering, in contrast to $k$ -way clustering, the number of clusters is not given in advance, and there is no normalization with respect to size or volume.

Regularization in the sparse regime.

In many applications, real-world networks are sparse. In this context, regularization methods have increased the performance of traditional spectral clustering techniques, both for synthetic Stochastic Block Models and real data sets [CCT12, ACBL13, JY16, LLV15].

Chaudhuri et al. [CCT12] regularize the Laplacian matrix by adding a (typically small) weight $\tau$ to the diagonal entries of the degree matrix, which yields $L_{\tau}=I-D_{\tau}^{-1/2}AD_{\tau}^{-1/2}$ with $D_{\tau}=D+\tau I$ . Amini et al. [ACBL13] regularize the graph by adding a weight $\tau/n$ to every edge, leading to the Laplacian $\widetilde{L}_{\tau}=I-D_{\tau}^{-1/2}A_{\tau}D_{\tau}^{-1/2}$ with $A_{\tau}=A+\frac{\tau}{n}\mathds{1}\mathds{1}^{\top}$ and $D_{\tau}=\text{diag}(A_{\tau}\mathds{1})$ . Le et al. [LLV17] proved that this technique makes the adjacency and Laplacian matrices concentrate for inhomogeneous Erdős-Rényi graphs. Zhang et al. [ZR18] showed that this technique prevents spectral clustering from overfitting through the analysis of dangling sets. In [LLV17], Le et al. also propose a graph trimming method in order to reduce the degree of certain nodes. This is achieved by reducing the entries of the adjacency matrix that lead to high-degree vertices. Zhou and Amini [ZA18] added a spectral truncation step after this regularization method, and proved consistency results in the bipartite Stochastic Block Model.
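To make the two regularization schemes concrete, the following sketch builds both Laplacians for a small unsigned graph that contains an isolated node, a case where the unregularized normalized Laplacian is undefined. The function names and the example graph are assumptions made for illustration; the formulas follow the definitions quoted above.

```python
import numpy as np


def laplacian_degree_regularized(A, tau):
    """L_tau = I - D_tau^{-1/2} A D_tau^{-1/2}, with D_tau = D + tau * I."""
    A = np.asarray(A, dtype=float)
    d_tau = A.sum(axis=1) + tau
    s = 1.0 / np.sqrt(d_tau)
    return np.eye(A.shape[0]) - s[:, None] * A * s[None, :]


def laplacian_edge_regularized(A, tau):
    """L~_tau = I - D_tau^{-1/2} A_tau D_tau^{-1/2}, with A_tau = A + (tau/n) 11^T."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    A_tau = A + (tau / n) * np.ones((n, n))
    d_tau = A_tau.sum(axis=1)  # equals the original degree plus tau
    s = 1.0 / np.sqrt(d_tau)
    return np.eye(n) - s[:, None] * A_tau * s[None, :]


# Path 0-1-2 plus an isolated node 3 (degree zero).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]])
tau = 0.5
L_deg = laplacian_degree_regularized(A, tau)
L_edge = laplacian_edge_regularized(A, tau)
```

Both schemes shift every degree by $\tau$, so any $\tau>0$ keeps the inverse square roots finite even for isolated nodes.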

Very recently, regularization methods using powers of the adjacency matrix have been introduced. Abbe et al. [ABARS20] transform the adjacency matrix into the operator $A_{r}=\mathds{1}\left\{(I+A)^{r}\geqslant 1\right\}$ , where the indicator function is applied entrywise. With this method, spectral clustering achieves the fundamental limit for weak recovery in the sparse setting. Very similarly, Stephan and Massoulié [SM19] transform the adjacency matrix into a distance matrix of outreach $l$ , which links pairs of nodes that are at distance $l$ from each other with respect to the graph distance.
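The operator $A_{r}$ can be computed literally from its definition on a small graph. The sketch below assumes an unsigned 0/1 adjacency matrix and uses a dense integer matrix power, which is only sensible for illustration (the entries of $(I+A)^{r}$ grow quickly and may overflow for large $r$ or dense graphs); the function name `graph_power_operator` is hypothetical.

```python
import numpy as np


def graph_power_operator(A, r):
    """Entrywise indicator 1{(I + A)^r >= 1} for a 0/1 adjacency matrix A."""
    A = np.asarray(A).astype(np.int64)
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n, dtype=np.int64) + A, r)
    return (M >= 1).astype(int)


# Path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
A_2 = graph_power_operator(A, 2)  # links nodes within graph distance 2
A_3 = graph_power_operator(A, 3)  # links nodes within graph distance 3
```

Entry $(i,j)$ of $A_{r}$ equals one exactly when $j$ is reachable from $i$ in at most $r$ steps, so on the path graph `A_2` connects the two endpoints to everything except each other.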

1.3 Summary of our main contributions

This work extends the results obtained in [CDGT19] by a subset of the authors of our present paper. This previous work introduced the SPONGE algorithm, a principled and scalable spectral method for the signed clustering task that amounts to solving a generalized eigenvalue problem. [CDGT19] provided a theoretical analysis of both the newly introduced SPONGE algorithm and the popular Signed Laplacian-based method [KSL⁺10], quantifying their robustness against the sampling sparsity and noise level, under the setting of a Signed Stochastic Block Model (SSBM). These were the first such theoretical guarantees for the signed clustering problem under a suitably defined stochastic graph model. However, the analysis in [CDGT19] was restricted to the setting of $k=2$ equally-sized clusters, which is less realistic in light of most real world applications. Furthermore, the same previous line of work considered the moderately dense regime in terms of the edge sampling probability, in particular, it operated in the setting where $\operatorname*{\varmathbb{E}}[\overline{D}_{jj}]\gtrsim\ln n$ , i.e., $p\gtrsim\frac{\ln n}{n}$ . Many real world applications involve large but very sparse graphs, with $p=\Theta\left(\frac{1}{n}\right)$ , which provides motivation for our present work.

We summarize below our main contributions, and start with the remark that the theoretical analysis in the present paper pertains to the normalized version of SPONGE (denoted as SPONGE_sym) and the symmetric Signed Laplacian, while [CDGT19] analyzed only the un-normalized versions of these signed operators. The experiments reported in [CDGT19] also considered such normalized matrix operators, and reported their superior performance over their respective un-normalized versions, further providing motivational ground for our current work.

(i) Our first main contribution is to analyze the two above-mentioned signed operators, namely SPONGE_sym and the symmetric Signed Laplacian, in the general SSBM model with $k\geqslant 2$ and unequal-cluster sizes, in the moderately dense regime. In particular, we evaluate the accuracy of both signed clustering algorithms by bounding the mis-clustering rate of the entire pipeline, as achieved by the popular $k$ -means algorithm.

(ii) Our second contribution is to introduce and analyze new regularized versions of both SPONGE_sym and the symmetric Signed Laplacian, under the same general SSBM model, but in the sparse graph regime $\operatorname*{\varmathbb{E}}[\overline{D}_{jj}]\gtrsim 1$ , a setting where standard spectral methods are known to underperform. To the best of our knowledge, this sparsity regime has not been previously considered in the literature of signed networks; such regularized spectral methods have so far not been considered in the setting of clustering signed networks, or more broadly in the signed networks literature, where such regularization could prove useful for other related downstream tasks. One important aspect of regularization techniques is the choice of the regularization parameters. We show that our proposed algorithms can benefit from careful regularization and attain a higher level of accuracy in the sparse regime, provided that the regularization parameters scale as an adequate power of the average degree in the graph.

2 Problem setup

This section details the two algorithms for the signed clustering problem that we will analyze subsequently, namely, SPONGE_sym (Symmetric Signed Positive Over Negative Generalized Eigenproblem) and the symmetric Signed Laplacian, along with their respective regularized versions.

2.1 Clustering via the SPONGE_sym algorithm

The symmetric SPONGE method, denoted as SPONGE_sym, aims at jointly minimizing two measures of badness in a signed clustering problem. For an unsigned graph $G$ and $X,Y\subset V$ , we define the cut function $\text{Cut}_{G}(X,Y):=\sum_{i\in X,j\in Y}A_{ij}$ , and denote the volume of $X$ by $\text{Vol}_{G}(X):=\sum_{i\in X}\sum_{j=1}^{n}A_{ij}$ .

For a given cluster set $C\subset V$ , $\text{Cut}_{G}(C,\overline{C})$ is the total weight of edges crossing from $C$ to $\overline{C}$ and $\text{Vol}_{G}(C)$ is the sum of (weighted) degrees of nodes in $C$ . With this notation in mind and motivated by the approach of [CKC⁺16] in the context of constrained clustering, the symmetric SPONGE algorithm for signed clustering aims at minimizing the following two measures of badness given by $\frac{\text{Cut}_{G^{+}}(C,\overline{C})}{\text{Vol}_{G^{+}}(C)}$ and $\Big{(}\frac{\text{Cut}_{G^{-}}(C,\overline{C})}{\text{Vol}_{G^{-}}(C)}\Big{)}^{-1}=\frac{\text{Vol}_{G^{-}}(C)}{\text{Cut}_{G^{-}}(C,\overline{C})}$ . To this end, we consider “merging” the objectives, and aim to solve

\min_{C\subset V}\frac{\frac{\text{Cut}_{G^{+}}(C,\overline{C})}{\text{Vol}_{G^{+}}(C)}+\tau^{-}}{\frac{\text{Cut}_{G^{-}}(C,\overline{C})}{\text{Vol}_{G^{-}}(C)}+\tau^{+}}\,,

where $\tau^{+}>0,\tau^{-}\geqslant 0$ denote trade-off parameters. For $k$ -way signed clustering into disjoint clusters $C_{1},\ldots,C_{k}$ , we arrive at the combinatorial optimization problem

\min_{C_{1},\ldots,C_{k}}\sum_{i=1}^{k}\left(\frac{\frac{\text{Cut}_{G^{+}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{+}}(C_{i})}+\tau^{-}}{\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}}\right)\,.

(1)
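As a sanity check on objective (1), the sketch below evaluates it for two candidate partitions of a small signed graph. The helper names (`cut`, `vol`, `sponge_sym_objective`), the 4-node graph, and the parameter values $\tau^{+}=\tau^{-}=1$ are assumptions chosen for illustration.

```python
import numpy as np


def cut(A, X, Y):
    """Cut_G(X, Y): total weight of entries A_ij with i in X and j in Y."""
    return A[np.ix_(X, Y)].sum()


def vol(A, X):
    """Vol_G(X): sum of the (weighted) degrees of the nodes in X."""
    return A[X, :].sum()


def sponge_sym_objective(A_pos, A_neg, clusters, tau_plus=1.0, tau_minus=1.0):
    """Value of the k-way objective (1) for a given list of clusters."""
    n = A_pos.shape[0]
    total = 0.0
    for C in clusters:
        C_bar = np.setdiff1d(np.arange(n), C)
        numerator = cut(A_pos, C, C_bar) / vol(A_pos, C) + tau_minus
        denominator = cut(A_neg, C, C_bar) / vol(A_neg, C) + tau_plus
        total += numerator / denominator
    return total


# Positive edges 0-1, 1-2, 2-3; negative edges 0-2, 0-3, 1-3.
A_pos = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
A_neg = np.array([[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0]], dtype=float)

good = sponge_sym_objective(A_pos, A_neg, [[0, 1], [2, 3]])  # mostly satisfied edges
bad = sponge_sym_objective(A_pos, A_neg, [[0, 2], [1, 3]])   # mostly violated edges
```

The partition $\{0,1\},\{2,3\}$, which keeps most positive edges inside clusters and most negative edges across, attains the smaller objective value.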

Let $D^{+},L^{+}$ denote respectively the degree matrix and un-normalized Laplacian associated with $G^{+}$ , and $L^{+}_{sym}=(D^{+})^{-1/2}L^{+}(D^{+})^{-1/2}$ denote the symmetric Laplacian matrix for $G^{+}$ (similarly for $L^{-}_{sym},D^{-},L^{-}$ ). For a subset $C_{i}\subset V$ , denote $\mathds{1}_{C_{i}}$ to be the indicator vector for $C_{i}$ so that $(\mathds{1}_{C_{i}})_{j}$ equals $1$ if $j\in C_{i}$ , and is $0$ otherwise. Now define the normalized indicator vector $x_{C_{i}}\in\varmathbb R^{n}$ where

x_{C_{i}}=\left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1/2}\frac{1}{\sqrt{\text{Vol}_{G^{+}}(C_{i})}}(D^{+})^{1/2}\mathds{1}_{C_{i}}.

In light of this, one can verify that

\begin{aligned}
x_{C_{i}}^{\top}x_{C_{i}} &= \left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1}\frac{\mathds{1}_{C_{i}}^{\top}D^{+}\mathds{1}_{C_{i}}}{\text{Vol}_{G^{+}}(C_{i})}=\left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1}, \\
x_{C_{i}}^{\top}L^{+}_{sym}x_{C_{i}} &= \left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1}\frac{\mathds{1}_{C_{i}}^{\top}L^{+}\mathds{1}_{C_{i}}}{\text{Vol}_{G^{+}}(C_{i})}=\left(\frac{\text{Cut}_{G^{-}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{-}}(C_{i})}+\tau^{+}\right)^{-1}\frac{\text{Cut}_{G^{+}}(C_{i},\overline{C_{i}})}{\text{Vol}_{G^{+}}(C_{i})}.
\end{aligned}
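These two identities can be checked numerically on a small example. The sketch below uses a hypothetical 4-node signed graph, the cluster $C=\{0,1\}$, and $\tau^{+}=\tau^{-}=1$; all names and values are assumptions for illustration.

```python
import numpy as np

# Positive edges 0-1, 1-2, 2-3; negative edges 0-2, 0-3, 1-3.
A_pos = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
A_neg = np.array([[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0]], dtype=float)
C, C_bar = [0, 1], [2, 3]
tau_plus, tau_minus = 1.0, 1.0

d_pos = A_pos.sum(axis=1)
L_pos = np.diag(d_pos) - A_pos                     # un-normalized Laplacian of G+
L_sym_pos = L_pos / np.sqrt(np.outer(d_pos, d_pos))  # (D+)^{-1/2} L+ (D+)^{-1/2}

vol_pos = A_pos[C].sum()
ratio_pos = A_pos[np.ix_(C, C_bar)].sum() / vol_pos       # Cut_{G+} / Vol_{G+}
ratio_neg = A_neg[np.ix_(C, C_bar)].sum() / A_neg[C].sum()  # Cut_{G-} / Vol_{G-}

indicator = np.zeros(4)
indicator[C] = 1.0
# Normalized indicator vector x_C as defined in the text.
x = (ratio_neg + tau_plus) ** -0.5 * np.sqrt(d_pos) * indicator / np.sqrt(vol_pos)

norm_sq = x @ x                                          # first identity
quad_form = x @ L_sym_pos @ x                            # second identity
summand = x @ (L_sym_pos + tau_minus * np.eye(4)) @ x    # one term of the discrete problem
```

Adding $\tau^{-}$ times the first identity to the second recovers the corresponding summand of objective (1), which is why the discrete reformulation is exact.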

Hence (1) is equivalent to the following discrete optimization problem

\min_{C_{1},\ldots,C_{k}}\sum_{i=1}^{k}x_{C_{i}}^{\top}(L^{+}_{sym}+\tau^{-}I)x_{C_{i}}

(2)

which is NP-Hard. A common approach to solve this problem is to drop the discreteness constraints, and allow $x_{C_{i}}$ to take values in $\varmathbb R^{n}$ . To this end, we introduce a new set of vectors $z_{1},\ldots,z_{k}\in\varmathbb{R}^{n}$ such that they are orthonormal with respect to the matrix $L^{-}_{sym}+\tau^{+}I$ , i.e., $z_{i}^{\top}(L^{-}_{sym}+\tau^{+}I)z_{i^{\prime}}=\delta_{ii^{\prime}}$ . This leads to the continuous optimization problem

\min_{z_{i}^{\top}(L^{-}_{sym}+\tau^{+}I)z_{i^{\prime}}=\delta_{ii^{\prime}}}\sum_{i=1}^{k}z_{i}^{\top}(L^{+}_{sym}+\tau^{-}I)z_{i}.

(3)

Note that the above choice of vectors $z_{1},...,z_{k}$ is not really a relaxation of (2) since $x_{C_{1}},\dots,x_{C_{k}}$ are not necessarily ( $L^{-}_{sym}+\tau^{+}I$ )-orthonormal, but (3) can be conveniently formulated as a suitable generalized eigenvalue problem, similar to the approach in [CKC⁺16]. Indeed, denoting $y_{i}=(L^{-}_{sym}+\tau^{+}I)^{1/2}z_{i}$ , and $Y=[y_{1},\ldots,y_{k}]\in\varmathbb{R}^{n\times k}$ , (3) can be rewritten as

\min_{Y^{\top}Y=I}\text{Tr}\Big{(}Y^{\top}(L^{-}_{sym}+\tau^{+}I)^{-1/2}(L^{+}_{sym}+\tau^{-}I)(L^{-}_{sym}+\tau^{+}I)^{-1/2}Y\Big{)},

the solution to which is well known to be given by the smallest $k$ eigenvectors of

T=(L^{-}_{sym}+\tau^{+}I)^{-1/2}(L^{+}_{sym}+\tau^{-}I)(L^{-}_{sym}+\tau^{+}I)^{-1/2},

see, e.g., [ST00, Theorem 2.1]. However, this is not practically viable for large scale problems, since computing $T$ itself is already expensive. To circumvent this issue, one can instead consider the embedding in $\varmathbb{R}^{k}$ corresponding to the smallest $k$ generalized eigenvectors of the symmetric definite pair $(L^{+}_{sym}+\tau^{-}I,L^{-}_{sym}+\tau^{+}I)$ . There exist many efficient solvers for large scale generalized eigenproblems for symmetric definite matrix pairs. In our experiments, we use the LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient method) solver introduced in [Kny01].

One can verify that $(\lambda,v)$ is an eigenpair of $T$ (with eigenvalue $\lambda$ and corresponding eigenvector $v$ ) iff $(\lambda,(L^{-}_{sym}+\tau^{+}I)^{-1/2}v)$ is a generalized eigenpair of $(L^{+}_{sym}+\tau^{-}I,L^{-}_{sym}+\tau^{+}I)$ . Indeed, for symmetric matrices $A,B$ with $A\succ 0$ , it holds true for $w=A^{-1/2}v$ that

A^{-1/2}BA^{-1/2}v=\lambda v\iff Bw=\lambda Aw.

Therefore, denoting $V_{k}(T)\in\varmathbb{R}^{n\times k}$ to be the matrix consisting of the smallest $k$ eigenvectors of $T$ , and $G_{k}(T)\in\varmathbb{R}^{n\times k}$ to be the matrix of the smallest $k$ generalized eigenvectors of $(L^{+}_{sym}+\tau^{-}I,L^{-}_{sym}+\tau^{+}I)$ , it follows that

G_{k}(T)=(L^{-}_{sym}+\tau^{+}I)^{-1/2}V_{k}(T).

(4)

Hence upon computing $G_{k}(T)$ , we will apply a suitable clustering algorithm on the rows of $G_{k}(T)$ such as the popular $k$ -means++ [AV07], to arrive at the final partition.
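The embedding step can be sketched on a small dense example as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: it uses SciPy's dense generalized symmetric solver `scipy.linalg.eigh` instead of LOBPCG, it requires every node to have at least one positive and one negative edge, and the names `sym_laplacian`, `sponge_sym_embedding` and the 6-node graph are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def sym_laplacian(A_unsigned):
    """I - D^{-1/2} A D^{-1/2} for an unsigned graph with positive degrees."""
    d = A_unsigned.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("every node needs positive degree in this subgraph")
    s = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - s[:, None] * A_unsigned * s[None, :]


def sponge_sym_embedding(A, k, tau_plus=1.0, tau_minus=1.0):
    """Smallest k generalized eigenpairs of (L+_sym + tau^- I, L-_sym + tau^+ I)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    P = sym_laplacian(np.maximum(A, 0.0)) + tau_minus * np.eye(n)
    N = sym_laplacian(np.maximum(-A, 0.0)) + tau_plus * np.eye(n)  # positive definite
    eigvals, eigvecs = eigh(P, N)  # solves P w = lambda N w, eigenvalues ascending
    return eigvals[:k], eigvecs[:, :k]


# Two clusters {0,1,2} and {3,4,5}: positive edges inside, negative edges across.
A = -np.ones((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)

eigvals, embedding = sponge_sym_embedding(A, k=2)
```

On this noiseless example the rows of `embedding` coincide within each cluster, so any reasonable clustering of the rows (for instance $k$-means++) recovers the partition. For large sparse graphs the dense solver would be replaced by an iterative one; SciPy provides a LOBPCG routine as `scipy.sparse.linalg.lobpcg`.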

Remark 2.1.

In [CDGT19], similar arguments as above were shown for the SPONGE algorithm which led to computing the $k$ smallest generalized eigenvectors of the matrix pair $(L^{+}+\tau^{-}D^{-},L^{-}+\tau^{+}D^{+})$ . SPONGE_sym was proposed in [CDGT19] but no theoretical results were provided.

Clustering in the sparse regime.

We also provide a version of SPONGE_sym for the case where $G$ is sparse, i.e., the graph has very few edges and is typically disconnected. In this setting, we consider a regularized version of SPONGE_sym wherein a weight is added to each edge (including self-loops) of the positive and negative subgraphs, respectively. Formally, for regularization parameters $\gamma^{+},\gamma^{-}\geqslant 0$ , let us define $A_{\gamma^{\pm}}^{\pm}:=A^{\pm}+\frac{\gamma^{\pm}}{n}\mathds{1}\mathds{1}^{\top}$ to be the regularized adjacency matrices for the unsigned graphs $G^{+},G^{-}$ respectively. Denoting $D_{\gamma^{\pm}}^{\pm}$ to be the degree matrix of $A_{\gamma^{\pm}}^{\pm}$ , the normalized Laplacians corresponding to $A_{\gamma^{\pm}}^{\pm}$ are given by

L_{sym,\gamma^{\pm}}^{\pm}=I-(D_{\gamma^{\pm}}^{\pm})^{-1/2}A_{\gamma^{\pm}}^{\pm}(D_{\gamma^{\pm}}^{\pm})^{-1/2}.

Given the above modifications, let $V_{k}(T_{\gamma^{+},\gamma^{-}})\in\varmathbb{R}^{n\times k}$ denote the matrix consisting of the smallest $k$ eigenvectors of

T_{\gamma^{+},\gamma^{-}}=(L_{sym,\gamma^{-}}^{-}+\tau^{+}I)^{-1/2}(L_{sym,\gamma^{+}}^{+}+\tau^{-}I)(L_{sym,\gamma^{-}}^{-}+\tau^{+}I)^{-1/2}\,.

For the same reasons discussed earlier, we will consider the embedding given by the smallest $k$ generalized eigenvectors of the matrix pencil $(L_{sym,\gamma^{+}}^{+}+\tau^{-}I,L_{sym,\gamma^{-}}^{-}+\tau^{+}I)$ , namely $G_{k}(T_{\gamma^{+},\gamma^{-}})$ where

G_{k}(T_{\gamma^{+},\gamma^{-}})=(L^{-}_{sym,\gamma^{-}}+\tau^{+}I)^{-1/2}V_{k}(T_{\gamma^{+},\gamma^{-}}),

as in (4). The rows of $G_{k}(T_{\gamma^{+},\gamma^{-}})$ can then be clustered using an appropriate clustering procedure, such as $k$-means++.
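The regularized construction can be sketched in the same way. The code below is an illustrative NumPy sketch under stated assumptions: the function names are hypothetical, and we assume $\gamma^{\pm}>0$ so that every regularized degree is positive.

```python
import numpy as np

def regularized_sym_laplacian(A, gamma):
    """L_{sym,gamma} = I - D_g^{-1/2} A_g D_g^{-1/2}, with A_g = A + (gamma/n) 1 1^T.
    Assumes every regularized degree is positive (true whenever gamma > 0)."""
    n = A.shape[0]
    A_reg = A + (gamma / n) * np.ones((n, n))
    d_reg = A_reg.sum(axis=1)              # equals the original degree plus gamma
    s = 1.0 / np.sqrt(d_reg)
    return np.eye(n) - s[:, None] * A_reg * s[None, :]

def regularized_sponge_sym(A_pos, A_neg, k, tau_pos, tau_neg, gamma_pos, gamma_neg):
    """Embedding G_k(T_{gamma^+, gamma^-}) for the regularized matrix pencil."""
    n = A_pos.shape[0]
    B = regularized_sym_laplacian(A_pos, gamma_pos) + tau_neg * np.eye(n)
    A = regularized_sym_laplacian(A_neg, gamma_neg) + tau_pos * np.eye(n)
    w, U = np.linalg.eigh(A)
    A_is = (U / np.sqrt(w)) @ U.T          # (L^-_{sym,gamma^-} + tau^+ I)^{-1/2}
    T = A_is @ B @ A_is
    evals, evecs = np.linalg.eigh((T + T.T) / 2)
    return A_is @ evecs[:, :k]
```

Note that the regularized matrices are dense even when $A^{\pm}$ are sparse; for large $n$ one would exploit the rank-one structure of $\frac{\gamma^{\pm}}{n}\mathds{1}\mathds{1}^{\top}$ rather than form them explicitly, which this sketch does not attempt.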

Remark 2.2.

Regularized spectral clustering for unsigned graphs involves adding $\frac{\gamma}{n}\mathds{1}\mathds{1}^{\top}$ to the adjacency matrix, followed by clustering the embedding given by the smallest $k$ eigenvectors of the normalized Laplacian (of the regularized adjacency); see e.g. [ACBL13, LLV17]. To the best of our knowledge, regularized spectral clustering methods have not been explored thus far in the context of sparse signed graphs.

2.2 Clustering via the symmetric Signed Laplacian

The rationale behind the use of the (un-normalized) Signed Laplacian $\overline{L}$ for clustering is justified by Kunegis et al. in [KSL⁺10] using the signed ratio cut function. For $C\subset V$ ,

sRCut(C,\overline{C})=\left(2\text{Cut}_{G+}(C,\overline{C})+\text{Cut}_{G-}(C,C)+\text{Cut}_{G-}(\overline{C},\overline{C})\right)\left(\frac{1}{|C|}+\frac{1}{|\overline{C}|}\right).

(5)

For $2$ -way clustering, minimizing this objective corresponds to minimizing the number of positive edges between the two classes and the number of negative edges inside each class. Moreover, (5) is equivalent to the following optimization problem

\min_{u\in\mathcal{U}}u^{\top}\overline{L}u,

where $\mathcal{U}\subset\varmathbb R^{n}$ is the set of vectors whose entries are of the form $u_{i}=\pm\frac{1}{2}\left(\sqrt{\frac{|C|}{|\overline{C}|}}+\sqrt{\frac{|\overline{C}|}{|C|}}\right)$ for all $i\in[n]$.

However, Gallier [Gal16] noted that this equivalence does not generalize to $k>2$ , and defined a new notion of signed cut, called the signed normalized cut function. For a partition $C_{1},\dots,C_{k}$ with membership matrix $X\in\{0,1\}^{n\times k}$ ,

sNCut(C_{1},\dots,C_{k})=\sum_{i=1}^{k}\frac{\text{Cut}_{G}(C_{i},\overline{C_{i}})}{\text{Vol}_{G}(C_{i})}+2\frac{\text{Cut}_{G-}(C_{i},C_{i})}{\text{Vol}_{G}(C_{i})}=\sum_{i=1}^{k}\frac{(X^{i})^{\top}\overline{L}X^{i}}{(X^{i})^{\top}\overline{D}X^{i}},

with $X^{i}$ the $i$ -th column of $X$ . Compared to (5), this objective also penalizes the number of negative edges across two subsets, which may not be a desirable feature for signed clustering. Minimizing this function with a relaxation of the constraint that $X^{i}\in\{0,1\}^{n}$ leads to the following problem

\min_{Y^{\top}Y=I}\text{Tr}\Big{(}Y^{\top}\overline{L_{sym}}Y\Big{)}.

The minimum of this problem is obtained by stacking column-wise the $k$ eigenvectors of $\overline{L_{sym}}$ corresponding to the smallest eigenvalues, i.e. $V_{k}(\overline{L_{sym}})$ . Therefore, one can apply a clustering algorithm to the rows of the matrix $V_{k}(\overline{L_{sym}})$ to find a partition of the set of nodes $V$ .

In fact, we will consider using only the $k-1$ smallest eigenvectors of $\overline{L_{sym}}$ and applying the $k$ -means++ algorithm on the rows of $V_{k-1}(\overline{L_{sym}})$ . This will be justified in our analysis via a stochastic generative model, namely the Signed Stochastic Block Model (SSBM), introduced in the next subsection. Under this model assumption, we will see later that the embedding given by the $k-1$ smallest eigenvectors of the symmetric Signed Laplacian of the expected graph has $k$ distinct rows (with two rows being equal if and only if the corresponding nodes belong to the same cluster).
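The claim about the expected graph can be illustrated on a small instance. The sketch below is a numerical check rather than a proof: it builds the expected adjacency matrix of the SSBM of Section 2.3 (off-diagonal entries $\pm p(1-2\eta)$, expected signed degree $p(n-1)$), and inspects the $k-1$ smallest eigenvectors of the corresponding symmetric Signed Laplacian. The cluster sizes and parameter values are arbitrary choices made for illustration.

```python
import numpy as np

def expected_signed_laplacian(sizes, p, eta):
    """Symmetric Signed Laplacian of the expected SSBM graph, with cluster labels."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    EA = p * (1 - 2 * eta) * np.where(same, 1.0, -1.0)   # E[A] off the diagonal
    np.fill_diagonal(EA, 0.0)                            # no self-loops
    d_bar = p * (n - 1)                                  # expected signed degree
    return np.eye(n) - EA / d_bar, labels

def smallest_eigenvectors(L, m):
    """Eigenvectors associated with the m smallest eigenvalues of a symmetric matrix."""
    evals, evecs = np.linalg.eigh(L)
    return evecs[:, :m]

sizes = [5, 4, 3]                                        # illustrative cluster sizes
L_exp, labels = expected_signed_laplacian(sizes, p=0.3, eta=0.1)
V = smallest_eigenvectors(L_exp, len(sizes) - 1)
```

In this example the rows of `V` indexed by the same cluster coincide up to round-off, while rows from different clusters differ, so $k-1$ eigenvectors already separate the $k$ clusters.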

Clustering in the sparse regime.

When $G$ is sparse, we propose a spectral clustering method based on a regularization of the signed graph, leading to a regularized Signed Laplacian. To this end, for $\gamma^{+},\gamma^{-}\geqslant 0$ , recall the regularized adjacency matrices $A_{\gamma^{\pm}}^{\pm}$ , with degree matrices $D_{\gamma^{\pm}}^{\pm}$ , for the unsigned graphs $G^{+},G^{-}$ respectively. In light of this, the regularized signed adjacency and degree matrices are defined as follows

A_{\gamma}:=A_{\gamma^{+}}^{+}-A_{\gamma^{-}}^{-}=A+\frac{\gamma^{+}-\gamma^{-}}{n}\mathds{1}\mathds{1}^{\top},

\overline{D}_{\gamma}:=D_{\gamma^{+}}^{+}+D_{\gamma^{-}}^{-}=D^{+}+\gamma^{+}I+D^{-}+\gamma^{-}I=\overline{D}+(\gamma^{+}+\gamma^{-})I=\overline{D}+\gamma I,

with $\gamma:=\gamma^{+}+\gamma^{-}$ . Our regularized Signed Laplacian is the symmetric Signed Laplacian on this regularized signed graph, i.e.

L_{\gamma}:=I-(\overline{D}_{\gamma})^{-1/2}A_{\gamma}(\overline{D}_{\gamma})^{-1/2}.

(6)

Similarly to the symmetric Signed Laplacian, our clustering algorithm in the sparse case finds the $k-1$ smallest eigenvectors of $L_{\gamma}$ and applies the $k$ -means algorithm on the rows of $V_{k-1}(L_{\gamma})$ .

Remark 2.3.

For the choice $\gamma^{+}=\gamma^{-}$ , the regularized Laplacian becomes

L_{\gamma}:=I-(\overline{D}_{\gamma})^{-1/2}A(\overline{D}_{\gamma})^{-1/2},

with $\overline{D}_{\gamma}=\overline{D}+(\gamma^{+}+\gamma^{-})I$ . This regularization scheme is very similar to the degree-corrected normalized Laplacian defined in [CCT12].
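A minimal NumPy sketch of the regularized Signed Laplacian (6) and of the associated embedding is given below. The function names are hypothetical, and we assume that $\overline{D}+\gamma I$ has a positive diagonal (which holds as soon as $\gamma>0$ or the graph has no isolated nodes).

```python
import numpy as np

def regularized_signed_laplacian(A, gamma_pos, gamma_neg):
    """L_gamma from (6); A is a symmetric signed adjacency matrix with entries in {0, +1, -1}.
    Assumes D_bar + gamma * I has a positive diagonal."""
    n = A.shape[0]
    A_gamma = A + ((gamma_pos - gamma_neg) / n) * np.ones((n, n))
    d_bar = np.abs(A).sum(axis=1)                 # signed degrees d^+ + d^-
    d_gamma = d_bar + gamma_pos + gamma_neg       # diagonal of D_bar + gamma I
    s = 1.0 / np.sqrt(d_gamma)
    return np.eye(n) - s[:, None] * A_gamma * s[None, :]

def regularized_signed_laplacian_embedding(A, k, gamma_pos, gamma_neg):
    """V_{k-1}(L_gamma): eigenvectors for the k-1 smallest eigenvalues."""
    L = regularized_signed_laplacian(A, gamma_pos, gamma_neg)
    evals, evecs = np.linalg.eigh(L)
    return evecs[:, : k - 1]
```

When `gamma_pos == gamma_neg`, the rank-one term in `A_gamma` vanishes and only the degrees are shifted, which is the situation described in Remark 2.3.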

2.3 Signed Stochastic Block Model (SSBM)

Our work theoretically analyzes the clustering performance of SPONGE_sym and the symmetric Signed Laplacian algorithms under a signed random graph model, also considered previously in [CDGT19, CPv19]. We recall here its definition and parameters.

• $n$: the number of nodes in the network;

• $k$: the number of planted communities;

• $p$: the probability that an edge is present;

• $\eta$: the probability of flipping the sign of an edge;

• $C_{1},\dots,C_{k}$: an arbitrary partition of the vertices with sizes $n_{1},\dots,n_{k}$.

We first partition the vertices (arbitrarily) into clusters $C_{1},\dots,C_{k}$ where $\left\lvert C_{i}\right\rvert=n_{i}$ . Next, we generate a noiseless measurement graph from the Erdős-Rényi model $G(n,p)$ , wherein each edge takes value $+1$ if both its endpoints are contained in the same cluster, and $-1$ otherwise. To model noise, we flip the sign of each edge independently with probability $\eta\in[0,1/2)$ . This results in the realization of a signed graph instance $G$ from the SSBM ensemble.

Let $A\in\left\{0,\pm 1\right\}^{n\times n}$ denote the adjacency matrix of $G$ , and note that $(A_{jj^{\prime}})_{j\leqslant j^{\prime}}$ are independent random variables. Recall that $A=A^{+}-A^{-}$ , where $A^{+},A^{-}\in\left\{0,1\right\}^{n\times n}$ are the adjacency matrices of the unsigned graphs $G^{+},G^{-}$ respectively. Then, $(A_{jj^{\prime}}^{+})_{j\leqslant j^{\prime}}$ are independent, and similarly $(A_{jj^{\prime}}^{-})_{j\leqslant j^{\prime}}$ are also independent. But for given $j,j^{\prime}\in[n]$ with $j\neq j^{\prime}$ , $A_{jj^{\prime}}^{+}$ and $A_{jj^{\prime}}^{-}$ are dependent. Let $d_{i}^{\pm}$ denote the degree of a node in cluster $i$ , for $i\in[k]$ in the graph $\operatorname*{\varmathbb{E}}[A^{\pm}]$ . Moreover, under this model, the expected signed degree matrix is the scaled identity matrix $\varmathbb{E}\overline{D}=\overline{d}I$ , with $\overline{d}=p(n-1)$ .
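The generative procedure can be sketched as follows. This is an illustrative sampler, with `sample_ssbm` a hypothetical name, and with the clusters taken to be contiguous blocks of indices purely for convenience.

```python
import numpy as np

def sample_ssbm(sizes, p, eta, rng):
    """Sample a signed adjacency matrix and cluster labels from the SSBM."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = len(labels)
    iu, ju = np.triu_indices(n, k=1)                      # pairs j < j'
    present = rng.random(len(iu)) < p                     # Erdos-Renyi G(n, p) edges
    sign = np.where(labels[iu] == labels[ju], 1, -1)      # noiseless signs
    flip = rng.random(len(iu)) < eta                      # independent sign flips
    values = present * sign * np.where(flip, -1, 1)
    A = np.zeros((n, n), dtype=int)
    A[iu, ju] = values
    A = A + A.T                                           # symmetric, zero diagonal
    return A, labels
```

The unsigned adjacency matrices are then recovered as `A_pos = (A > 0)` and `A_neg = (A < 0)`, so that $A=A^{+}-A^{-}$.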

Remark 2.4.

Contrary to stochastic block models for unsigned graphs, we do not require (for the purpose of detecting clusters) that the intra-cluster edge probabilities be different from those of inter-cluster edges, since the sign of the edges already achieves this purpose implicitly. In fact, it is the noise parameter $\eta$ that is crucial for identifying the underlying latent cluster structure.

To formulate our theoretical results we will also need the following notations. Let $s_{i}=n_{i}/n$ denote the fraction of nodes in cluster $i$ , with $l$ (resp. $s$ ) denoting the fraction for the largest (resp. smallest) cluster. Hence, the size of the largest (resp. smallest) cluster is $nl$ (resp. $ns$ ). Following the notation in [LR15], we will denote $\varmathbb{M}_{n,k}$ to be the class of “membership” matrices of size $n\times k$ , and denote $\hat{\Theta}\in\varmathbb{M}_{n,k}$ to be the ground-truth membership matrix containing $k$ distinct indicator row-vectors (one for each cluster), i.e., for $i\in[k]$ and $j\in[n]$ ,

\hat{\Theta}_{ji}=\begin{cases}1&\text{ if node }j\in\text{ cluster }C_{i},\\ 0&\text{ otherwise}.\end{cases}

We also define the normalized membership matrix $\Theta$ corresponding to $\hat{\Theta}$ , where for $i\in[k]$ and $j\in[n]$ ,

\Theta_{ji}=\begin{cases}1/\sqrt{n_{i}}&\text{ if node }j\in\text{ cluster }C_{i},\\ 0&\text{ otherwise}.\end{cases}
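For concreteness, both membership matrices can be built from a vector of cluster labels as in the short sketch below (the function name and the label encoding $0,\dots,k-1$ are assumptions made for illustration). The normalization makes the columns of $\Theta$ orthonormal, i.e. $\Theta^{\top}\Theta=I$.

```python
import numpy as np

def membership_matrices(labels, k):
    """Return (Theta_hat, Theta) for labels taking values in {0, ..., k-1}."""
    labels = np.asarray(labels)
    n = len(labels)
    Theta_hat = np.zeros((n, k))
    Theta_hat[np.arange(n), labels] = 1.0     # one indicator per row
    sizes = Theta_hat.sum(axis=0)             # n_1, ..., n_k (assumed nonzero)
    Theta = Theta_hat / np.sqrt(sizes)        # scales column i by 1 / sqrt(n_i)
    return Theta_hat, Theta
```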

3 Summary of main results

We now summarize our theoretical results for SPONGE_sym and the symmetric Signed Laplacian methods, when the graph is generated from the SSBM ensemble.

3.1 Symmetric SPONGE

We begin by describing conditions under which the rows of the matrix $G_{k}(T)$ approximately preserve the ground truth clustering structure. Before explaining our results, let us denote the matrix $\overline{T}$ to be the analogue of $T$ for the expected graph, i.e.,

\overline{T}=(\overline{L^{-}_{sym}}+\tau^{+}I)^{-1/2}(\overline{L^{+}_{sym}}+\tau^{-}I)(\overline{L^{-}_{sym}}+\tau^{+}I)^{-1/2}\,,

where $\overline{L_{sym}^{\pm}}=I-(\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1/2}\operatorname*{\varmathbb{E}}[A^{\pm}](\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1/2}$ . We first show that for suitable values of $\tau^{+}>0,\tau^{-}\geqslant 0$ (with $n$ large enough), the smallest $k$ eigenvectors of $\overline{T}$ , denoted by $V_{k}(\overline{T})$ , are given by $V_{k}(\overline{T})=\Theta R$ , for some $k\times k$ rotation matrix $R$ . Hence, the rows of $V_{k}(\overline{T})$ have the same clustering structure as that of $\Theta$ . Denoting $G_{k}(\overline{T})\in\varmathbb{R}^{n\times k}$ to be the matrix consisting of the $k$ smallest generalized eigenvectors of $(\overline{L^{+}_{sym}}+\tau^{-}I,\overline{L^{-}_{sym}}+\tau^{+}I)$ , and recalling (4), we can relate $G_{k}(\overline{T})$ and $V_{k}(\overline{T})$ via

G_{k}(\overline{T})=(\overline{L^{-}_{sym}}+\tau^{+}I)^{-1/2}V_{k}(\overline{T}).

(7)

It turns out that when $V_{k}(\overline{T})=\Theta R$ , and in light of the expression for $\overline{L^{-}_{sym}}+\tau^{+}I$ from (24), we arrive at $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R$ , where $C^{-}\succ 0$ is as in (18). Since $(C^{-})^{-1/2}R$ is invertible, it follows that $G_{k}(\overline{T})$ has $k$ distinct rows, with the rows that belong to the same cluster being identical. The remaining arguments revolve around deriving concentration bounds on $\left\lVert T-\overline{T}\right\rVert$ , which imply (for $p$ large enough) that the distance between the column spans of $V_{k}(T)$ and $V_{k}(\overline{T})$ is small, i.e., there exists an orthonormal matrix $O$ such that $\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert$ is small. Finally, the expressions in (4) and (7) altogether imply that $\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert$ is small, which is an indication that the rows of $G_{k}(T)$ approximately preserve the clustering structure encoded in $\Theta$ .

The above discussion is summarized in the following theorem, which is our first main result for SPONGE_sym in the moderately dense regime.

Theorem 3.1 (Restating Theorem 4.13; Eigenspace alignment of SPONGE_sym in the dense case).

Assuming $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}$ , suppose that $\tau^{+}>0,\tau^{-}\geqslant 0$ are chosen to satisfy

\tau^{+}>\frac{16\eta}{\beta s(1-2\eta)},\quad\quad\tau^{-}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^{+}}{8}\right\}

where $\beta,\eta$ satisfy one of the following conditions

1. $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and $0<\eta<\frac{1}{2}$, or

2. $\beta=\frac{1}{2}$ and $\eta\leqslant\frac{s}{2s+4}$.

Then $V_{k}(\overline{T})=\Theta R$ and $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R$ , where $R$ is a rotation matrix, and $C^{-}\succ 0$ is as defined in (18). Moreover, for any $\varepsilon,\delta\in(0,1)$ , there exists a constant $\widetilde{c}_{\varepsilon}>0$ such that the following is true. If $p$ satisfies

p\geqslant\max\left\{\widetilde{c}_{\varepsilon}C_{2}(s,\eta,l),\frac{256C_{1}^{4}(\tau^{+},\tau^{-})(2+\tau^{+})^{4}}{\delta^{4}(1+\tau^{-})^{4}(1-\beta)^{4}}C_{2}(s,\eta,l),\frac{81}{(1-l)\delta^{4}}\right\}\frac{\ln(4n/\varepsilon)}{n}

with $C_{1}(\cdot),C_{2}(\cdot)$ as in (45), then with probability at least $1-2\varepsilon$ , there exists an orthogonal matrix $O\in\varmathbb R^{k\times k}$ such that

\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert\leqslant\delta,\qquad\mbox{and}\qquad\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\frac{\delta}{(\tau^{+})^{2}}.

Let us now interpret the scaling of the terms $n,p,\tau^{+}$ and $\tau^{-}$ in Theorem 3.1, and provide some intuition.

1. In general, when no assumption is made on the noise level $\eta$, we have $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and the requirement on $n$ is $n\gtrsim\max\left\{\frac{1}{s(1-2\eta)},\frac{\eta}{1-l}\right\}$. Then a sufficient set of conditions on $\tau^{+}>0,\tau^{-}\geqslant 0$ is

$\tau^{+}\gtrsim 1+\frac{\eta}{s(1-2\eta)},\quad\tau^{-}\lesssim\frac{\eta}{s(1-2\eta)+2\eta}.$ (8)

Moreover, we see from (45) that $C_{1}(\tau^{+},\tau^{-})\lesssim 1/\tau^{+}$ , and thus $\frac{(2+\tau^{+})C_{1}(\tau^{+},\tau^{-})}{1+\tau^{-}}\lesssim 1$ . Hence, a sufficient condition on $p$ is

$p\gtrsim\frac{1}{\delta^{4}}\left(1+\frac{\eta}{s(1-2\eta)}\right)^{4}C_{2}(s,\eta,l)\frac{\ln n}{n}.$

2. In the “low-noise” regime where $\eta\leqslant\frac{s}{2s+4}$, the condition on $\tau^{-}$ in (8) becomes restrictive, especially as $\eta\rightarrow 0$. In this regime, the second condition in Theorem 3.1 allows for a wider range of values for $\tau^{-}$; in particular, the following set of conditions suffices

$\tau^{+}\gtrsim 1,\quad\tau^{-}\lesssim\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}.$

Moreover, we then obtain that the condition $p\gtrsim\frac{1}{\delta^{4}}C_{2}(s,\eta,l)\frac{\ln n}{n}$ is sufficient.

3. When $\tau^{+}\rightarrow\infty$, we have $\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert\rightarrow 0$, which might lead one to believe that the clustering performance improves accordingly. This is not the case, however, since when $\tau^{+}$ is large, $G_{k}(T)\approx\frac{1}{\sqrt{\tau^{+}}}V_{k}(T)$ and $G_{k}(\overline{T})\approx\frac{1}{\sqrt{\tau^{+}}}V_{k}(\overline{T})$, which means that clustering the rows of $G_{k}(T)$ (resp. $G_{k}(\overline{T})$) is roughly equivalent to clustering the rows of $V_{k}(T)$ (resp. $V_{k}(\overline{T})$). Moreover, note that for large $\tau^{+}$, we have $T\approx\frac{1}{\tau^{+}}(L^{+}_{sym}+\tau^{-}I)$ and $\overline{T}\approx\frac{1}{\tau^{+}}(\overline{L^{+}_{sym}}+\tau^{-}I)$, and thus the negative subgraph has no effect on the clustering performance.
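To get a feel for the admissible ranges of $\tau^{+}$ and $\tau^{-}$, the conditions of Theorem 3.1 can be evaluated directly. The helper below is a hypothetical convenience function, not part of the analysis; it selects $\beta=\frac{1}{2}$ whenever the low-noise condition $\eta\leqslant\frac{s}{2s+4}$ holds (an arbitrary choice, since the theorem allows either option in that case), and otherwise uses the first option.

```python
import math

def theorem31_parameters(s, l, eta, tau_pos):
    """Evaluate the sufficient conditions of Theorem 3.1 for given s, l, eta, tau^+.
    Returns (beta, n_min, tau_pos_min, tau_neg_max); tau_neg_max is None when
    tau_pos does not exceed the strict lower bound tau_pos_min."""
    if not (0 < eta < 0.5 and 0 < s <= l < 1):
        raise ValueError("need 0 < eta < 1/2 and 0 < s <= l < 1")
    low_noise = eta <= s / (2 * s + 4)
    beta = 0.5 if low_noise else 4 * eta / (s * (1 - 2 * eta) + 4 * eta)
    n_min = max(2 * (1 - eta) / (s * (1 - 2 * eta)),
                2 * eta / ((1 - l) * (1 - eta)))
    tau_pos_min = 16 * eta / (beta * s * (1 - 2 * eta))
    if tau_pos <= tau_pos_min:
        return beta, n_min, tau_pos_min, None
    ratio = s * (1 - 2 * eta) / (s * (1 - 2 * eta) + 2 * eta)
    tau_neg_max = (beta / 2) * ratio * min(1 / (4 * (1 - beta)), tau_pos / 8)
    return beta, n_min, tau_pos_min, tau_neg_max
```

With the first option for $\beta$, the lower bound on $\tau^{+}$ simplifies to $4+\frac{16\eta}{s(1-2\eta)}$, which is the scaling recorded in (8). Both bounds returned by the helper are strict: $\tau^{+}$ must exceed `tau_pos_min` and $\tau^{-}$ must stay below `tau_neg_max`.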

SPONGE_sym in the sparse regime.

Notice that the above theorem requires the sparsity parameter $p=\Omega(\ln n/n)$, when $n$ is large enough. This condition on $p$ is essentially required to show concentration bounds on $\left\lVert L^{\pm}_{sym}-\overline{L^{\pm}_{sym}}\right\rVert$ in Lemma 4.11, which in turn implies a concentration bound on $\left\lVert T-\overline{T}\right\rVert$ (see Lemma 4.12). However, in the sparse regime $p$ is of the order $o(\ln n)/n$, and thus Lemma 4.11 does not apply in this setting. In fact, it is not difficult to see that the matrices $L^{\pm}_{sym}$ will not concentrate around $\overline{L^{\pm}_{sym}}$ in the sparse regime (see e.g. [LLV17]). On the other hand, by relying on a recent result in [LLV17, Theorem 4.1] on the concentration of the normalized Laplacian of regularized adjacency matrices of inhomogeneous Erdős-Rényi graphs in the sparse regime (see Theorem 4.15), we show concentration bounds on $\left\lVert L^{+}_{sym,\gamma^{+}}-\overline{L^{+}_{sym}}\right\rVert$ and $\left\lVert L^{-}_{sym,\gamma^{-}}-\overline{L^{-}_{sym}}\right\rVert$, which hold when $p\gtrsim 1/n$ and $\gamma^{+},\gamma^{-}\asymp(np)^{6/7}$ (see Lemma 4.16). As before, these concentration bounds can then be shown to imply a concentration bound on $\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert$ (see Lemma 4.17). Other than these technical differences, the remainder of the arguments follows the same structure as in the proof of Theorem 3.1, thus leading to the following result in the sparse regime.

Theorem 3.2 (Restating Theorem 4.18).

Assuming $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-\eta)(1-l)}\right\}$ , suppose $\tau^{+}>0,\tau^{-}\geqslant 0$ are chosen to satisfy

\tau^{+}>\frac{16\eta}{\beta s(1-2\eta)},\qquad\tau^{-}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^{+}}{8}\right\}

where $\beta,\eta$ satisfy one of the following conditions

1. $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and $0<\eta<\frac{1}{2}$, or

2. $\beta=\frac{1}{2}$ and $\eta\leqslant\frac{s}{2s+4}$.

Then $V_{k}(\overline{T})=\Theta R$ and $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R$ , where $R$ is a rotation matrix, and $C^{-}\succ 0$ is as defined in (18). Moreover, there exists a constant $C>0$ such that for $r\geqslant 1$ and $\delta\in(0,1)$ , if $p$ satisfies

p\geqslant\max\left\{1,\left(\frac{4C_{1}(\tau^{+},\tau^{-})(2+\tau^{+})}{3(\tau^{+})^{2}(1-\beta)(1+\tau^{-})}\right)^{28}\right\}\frac{C_{4}^{14}(r,s,\eta,l)}{\delta^{28}(1-\eta)n},

and $\gamma^{+},\gamma^{-}=[np(1-\eta)]^{6/7}$ , then with probability at least $1-2e^{-r}$ , there exists a rotation $O\in\varmathbb R^{k\times k}$ so that

\left\lVert V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O\right\rVert\leqslant\delta,\qquad\mbox{and}\qquad\left\lVert G_{k}(T_{\gamma^{+},\gamma^{-}})-G_{k}(\overline{T})O\right\rVert\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\frac{\delta}{(\tau^{+})^{2}}.

Here, $C_{4}(r,s,\eta,l):=2^{5/2}Cr^{2}+3\sqrt{2C_{2}(s,\eta,l)}$ , with $C_{2}(s,\eta,l)$ as defined in (45).

The following remarks are in order.

1. It is clear that $\gamma^{+},\gamma^{-}$ can neither be too small (since this would imply lack of concentration), nor too large (since this would destroy the latent geometries of $G^{+},G^{-}$). The choice $\gamma^{+},\gamma^{-}\asymp(np)^{6/7}$ provides a trade-off, and leads to the bounds $\left\lVert L^{+}_{sym,\gamma^{+}}-\overline{L^{+}_{sym}}\right\rVert,\left\lVert L^{-}_{sym,\gamma^{-}}-\overline{L^{-}_{sym}}\right\rVert=O((np)^{-1/14})$ when $p\gtrsim 1/n$ (see Lemma 4.16).

2. In general, for $\eta\in(0,1/2)$, it suffices that $\tau^{+},\tau^{-}$ satisfy (8) and $n\gtrsim\max\left\{\frac{1}{s(1-2\eta)},\frac{\eta}{1-l}\right\}$. As discussed earlier, $\frac{(2+\tau^{+})C_{1}(\tau^{+},\tau^{-})}{1+\tau^{-}}\lesssim 1$, and hence it suffices that $p\gtrsim\frac{C_{4}^{14}(r,s,\eta,l)}{\delta^{28}n}$.

Mis-clustering error bounds.

Thus far, our analysis has shown that under suitable conditions on $n,p,\tau^{+}$ and $\tau^{-}$, the matrix $G_{k}(T)$ (or $G_{k}(T_{\gamma^{+},\gamma^{-}})$ in the sparse regime) is close to $G_{k}(\overline{T})O$ for some rotation $O$, with the rows of $G_{k}(\overline{T})$ preserving the ground truth clustering. This suggests that by applying the $k$-means clustering algorithm on the rows of $G_{k}(T)$ (or $G_{k}(T_{\gamma^{+},\gamma^{-}})$) one should be able to approximately recover the underlying communities. However, the $k$-means problem for clustering points in $\varmathbb R^{d}$ is known to be NP-Hard in general, even for $k=2$ or $d=2$ [ADHP09, Das08, MNV12]. On the other hand, there exist efficient $(1+\xi)$-approximation algorithms (for $\xi>0$), such as the algorithm of Kumar et al. [KSS04], which has a running time of $O(2^{(k/\xi)^{O(1)}}nd)$.

Using standard tools [LR15, Lemma 5.1], we can bound the mis-clustering error when a $(1+\xi)$ -approximate $k$ -means algorithm is applied on the rows of $G_{k}(T)$ (or $G_{k}(T_{\gamma^{+},\gamma^{-}})$ ), provided the estimation error bound $\delta$ is small enough. In the following theorem, the sets $S_{i}$ , $i=1,\ldots,k$ contain those vertices in $C_{i}$ for which we cannot guarantee correct clustering.

Theorem 3.3 (Restating Theorem 4.20).

Under the notation and assumptions of Theorem 3.1, let $(\tilde{\Theta},\tilde{X})\in\varmathbb{M}_{n,k}\times\varmathbb R^{k\times k}$ be a $(1+\xi)$-approximate solution to the $k$-means problem $\min_{\Theta\in\varmathbb{M}_{n,k},X\in\varmathbb R^{k\times k}}\left\lVert\Theta X-G_{k}(T)\right\rVert_{F}^{2}$. Denoting

S_{i}=\left\{j\in C_{i}\ :\ \left\lVert(\tilde{\Theta}\tilde{X})_{j*}-(\Theta(C^{-})^{-1/2}RO)_{j*}\right\rVert\geqslant\frac{1}{2\sqrt{n_{i}(\tau^{+}+\frac{2}{1-l})}}\right\}

it holds with probability at least $1-2\varepsilon$ that

\sum_{i=1}^{k}\frac{\left\lvert S_{i}\right\rvert}{n_{i}}\leqslant\delta^{2}{(64+32\xi)k}\left(\tau^{+}+\frac{2}{1-l}\right)\left(\frac{(\tau^{+})^{3}+1}{(\tau^{+})^{4}}\right).

(9)

In particular, if $\delta$ satisfies

\delta<\frac{(\tau^{+})^{2}}{\sqrt{(64+32\xi)k(\tau^{+}+\frac{2}{1-l})((\tau^{+})^{3}+1)}}

then there exists a $k\times k$ permutation matrix $\pi$ such that $\tilde{\Theta}_{G}=\hat{\Theta}_{G}\pi$ , where $G=\cup_{i=1}^{k}(C_{i}\setminus S_{i})$ .

In the sparse regime, the above statement holds under the notation and assumptions of Theorem 3.2 with $G_{k}(T)$ replaced with $G_{k}(T_{\gamma^{+},\gamma^{-}})$ , and with probability at least $1-2e^{-r}$ .

We remark that when $\tau^{+}\rightarrow\infty$ , the bound on $\delta$ becomes independent of $\tau^{+}$ and is of the form $\delta\lesssim\frac{1}{\sqrt{k}}$ . This is also true for the mis-clustering bound in (9), which is of the form $\sum_{i=1}^{k}\frac{\left\lvert S_{i}\right\rvert}{n_{i}}\lesssim\delta^{2}k.$
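A short calculation shows that the condition on $\delta$ in Theorem 3.3 is exactly the requirement that the right-hand side of (9) be smaller than $1$. The sketch below evaluates both quantities; the function names are hypothetical, and the code only transcribes the two displayed formulas.

```python
import math

def sponge_misclustering_bound(delta, k, xi, tau_pos, l):
    """Right-hand side of (9)."""
    a = tau_pos + 2 / (1 - l)
    return delta**2 * (64 + 32 * xi) * k * a * (tau_pos**3 + 1) / tau_pos**4

def sponge_delta_threshold(k, xi, tau_pos, l):
    """Upper limit on delta required in the second part of Theorem 3.3."""
    a = tau_pos + 2 / (1 - l)
    return tau_pos**2 / math.sqrt((64 + 32 * xi) * k * a * (tau_pos**3 + 1))
```

Evaluating `sponge_misclustering_bound` at `delta = sponge_delta_threshold(...)` gives $1$ up to round-off, and for very large `tau_pos` the threshold approaches $1/\sqrt{(64+32\xi)k}$, consistent with the $\delta\lesssim\frac{1}{\sqrt{k}}$ scaling noted above.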

3.2 Symmetric Signed Laplacian

We now describe our results for the symmetric Signed Laplacian. We recall that $\varmathbb{E}[A]=\varmathbb{E}[A^{+}]-\varmathbb{E}[A^{-}]$ and $\varmathbb{E}[\bar{D}]$ denote the adjacency and degree matrices of the expected graph, under the SSBM ensemble. We define

\mathcal{L}_{sym}=I_{n}-(\varmathbb{E}[\bar{D}])^{-1/2}\varmathbb{E}[A](\varmathbb{E}[\bar{D}])^{-1/2},

(10)

to be the normalized Signed Laplacian of the expected graph. Moreover, $\rho=\frac{s}{l}\leqslant 1$ denotes the aspect ratio, measuring the discrepancy between the smallest and largest cluster sizes in the SSBM.

We will first show that for $\rho$ large enough, the smallest $k-1$ eigenvectors of $\mathcal{L}_{sym}$ , denoted by $V_{k-1}(\mathcal{L}_{sym})$ , are given by $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$ , with $R_{k-1}\in\varmathbb R^{k\times(k-1)}$ a matrix whose columns are the $k-1$ smallest eigenvectors of a $k\times k$ matrix $\overline{C}$ defined in Lemma 5.1. We will then prove that the rows of $V_{k-1}(\mathcal{L}_{sym})$ impart the same clustering structure as that of $\Theta$ . The remaining arguments revolve around deriving concentration bounds on $\left\lVert\overline{L_{sym}}-\mathcal{L}_{sym}\right\rVert$ , which imply, for $n,p$ and $\rho$ large enough, that the distance between the column spans of $V_{k-1}(\overline{L_{sym}})$ and $V_{k-1}(\mathcal{L}_{sym})$ is small, i.e. there exists a unitary matrix $O$ such that $\left\lVert V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})O\right\rVert$ is small. Altogether, this allows us to conclude that the rows of $V_{k-1}(\overline{L_{sym}})$ approximately encode the clustering structure of $\Theta$ . The above discussion is summarized in the following theorem, which is our first main result for the symmetric Signed Laplacian, in the moderately dense regime.

Theorem 3.4 (Eigenspace alignment in the dense case).

Assuming $\eta\in[0,1/2)$ , $k\geqslant 2$ , $n\geqslant 10$ , suppose the aspect ratio satisfies

\displaystyle\sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})},

(11)

and suppose that, for $\delta\in(0,\frac{1}{2})$ , it holds true that

p>C(k,\eta,\delta)\frac{\ln n}{n}\qquad\text{ with }\quad C(k,\eta,\delta)=\left(\frac{2Ck}{\delta(1-2\eta)}\right)^{2}\quad\text{and}\>C<43.

(12)

Then there exists a universal constant $c>0$ , such that with probability at least $1-\frac{2}{n}-n\exp{(\frac{-np}{c})}$ , there exists an orthogonal matrix $O\in\varmathbb R^{(k-1)\times(k-1)}$ such that

\|V_{k-1}(\overline{L_{sym}})-\Theta R_{k-1}O\|\leqslant 2\delta,

where $R_{k-1}\in\varmathbb R^{k\times(k-1)}$ is a matrix whose columns are the $(k-1)$ smallest eigenvectors of the matrix $\overline{C}$ defined in Lemma 5.1.

Remark 3.5.

(Related work) As previously explained, for the special case where $k=2$ and with equal-size clusters, a similar result was proved in [CDGT19, Theorem 3]. Under a different SSBM, the Signed Laplacian clustering algorithm was analyzed by Mercado et al. [MTH19] for general $k$. Although their generative model is more general than our SSBM, their results on the symmetric Signed Laplacian do not apply here. More precisely, one assumption of [MTH19, Theorem 3] translates into our model as $p(k-2)(1-2\eta)<0$, which does not hold for $\eta<\frac{1}{2}$ and $k\geqslant 2$.

Remark 3.6.

(Assumptions) The condition on the aspect ratio (11) is essential to apply a perturbation technique, where the reference is the setting with equal-size clusters, i.e. $n_{i}=\frac{n}{k},\forall i\in[k]$ (see Lemma 5.3). However, we conjecture that this aspect-ratio assumption is only an artefact of the proof technique, and that the result could hold for more general graphs with very unbalanced cluster sizes. In the sparsity condition (12), we note that the constant $C(k,\eta,\delta)$ scales quadratically with the number of classes $k$ and as $\delta^{-2}$, with $\delta>0$ the error on the eigenspace.

Regularized Signed Laplacian.

We now consider the sparse regime $p=o(\ln n)/n$ and show that we can recover the ground-truth clustering structure up to some small error using the regularized Signed Laplacian $L_{\gamma}$ , provided that $n$ , $p$ and $\rho$ are large enough, and that the regularization parameters $\gamma^{+},\gamma^{-}$ are well-chosen. We denote $\mathcal{L}_{\gamma}$ to be the equivalent of the regularized Laplacian for the expected graph in our SSBM, i.e.

\displaystyle\mathcal{L}_{\gamma}=I-(\varmathbb{E}[\bar{D}_{\gamma}])^{-1/2}\varmathbb{E}[A_{\gamma}](\varmathbb{E}[\bar{D}_{\gamma}])^{-1/2},

with $\varmathbb{E}[A_{\gamma}]$ , resp. $\varmathbb{E}[\overline{D}_{\gamma}]$ , denoting the adjacency matrix, resp. the degree matrix, of the expected regularized graph. The next theorem is an intermediate result, which provides a high probability bound on $\left\lVert L_{\gamma}-\mathcal{L}_{\gamma}\right\rVert$ and $\left\lVert L_{\gamma}-\mathcal{L}_{sym}\right\rVert$ .

Theorem 3.7.

(Error bound for the regularized Signed Laplacian) Assuming $\eta\in[0,1/2)$ , $k\geqslant 2$ , and regularization parameters $\gamma^{+},\gamma^{-}\geqslant 0$ , $\gamma:=\gamma^{+}+\gamma^{-}$ , it holds true that for any $r\geqslant 1$ , with probability at least $1-7e^{-2r}$ , we have

\|L_{\gamma}-\mathcal{L}_{\gamma}\|\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}},

(13)

with $C>1$ an absolute constant. Moreover, it also holds true that

\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}}+\frac{\gamma}{\overline{d}+\gamma}.

(14)

In particular, for the choice $\gamma=\overline{d}^{7/8}$ , if $p\geqslant 2/n$ , we obtain

\displaystyle\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\left(128Cr^{2}+1\right)(\overline{d})^{-\frac{1}{8}}.

Remark 3.8.

The above theorem shows the concentration of our regularized Laplacian $L_{\gamma}$ towards the regularized Laplacian (13) and the Signed Laplacian (14) of the expected graph. More precisely, if for some well-chosen parameters $\gamma^{+},\gamma^{-}\geqslant 0$, these upper bounds are small, e.g. $\|L_{\gamma}-\mathcal{L}_{sym}\|\ll 1$, then we have $\|L_{\gamma}-\mathcal{L}_{sym}\|\ll\left\lVert\mathcal{L}_{sym}\right\rVert$ since $\left\lVert\mathcal{L}_{sym}\right\rVert=2$ (see Appendix E).

Using this concentration bound, we can show that the eigenspaces $V_{k-1}(L_{\gamma})$ and $V_{k-1}(\mathcal{L}_{sym})$ are “close”, provided that $p=\Omega(1/n)$ , $\rho$ is close enough to 1, and $\gamma$ is well-chosen. This is stated in the next theorem.

Theorem 3.9 (Eigenspace alignment in the sparse case).

Assuming $\eta\in[0,1/2)$ , $k\geqslant 2$ , and $n\geqslant 10$ , suppose that (11) holds true, and for $\delta\in(0,\frac{1}{2})$ and $r\geqslant 1$ , the sparsity $p$ satisfies

p>\left(\frac{2kC_{4}}{\delta(1-2\eta)}\right)^{8}\frac{2}{n}\qquad\text{ with }\quad C_{4}=128Cr^{2}+1

(15)

and $C>1$ the constant defined in (13). If the regularization parameters $\gamma^{+},\gamma^{-}\geqslant 0$ are chosen so that $\gamma=\overline{d}^{7/8}$ , then with probability at least $1-7e^{-2r}-\frac{2}{n}-ne^{-np/c}$ , there exists an orthogonal matrix $O\in\varmathbb R^{(k-1)\times(k-1)}$ so that

\|V_{k-1}(L_{\gamma})-\Theta R_{k-1}O\|\leqslant 2\delta.

Remark 3.10.

In the sparse setting, the constant before the factor $\frac{1}{n}$ in the sparsity condition (15) scales as $\left(\frac{k}{\delta}\right)^{8}$. However, for $k$ fixed, the condition holds as $n\to\infty$ whenever $p=\omega(1/n)$.

Remark 3.11.

In practice, one can choose the regularization parameters by first estimating the sparsity parameter $p$ , e.g. from the fraction of connected pairs of nodes

\displaystyle\hat{p}=\frac{2}{n(n-1)}\sum_{i<j}|A_{ij}|,

then choosing $\gamma\geqslant 0$ so that $\gamma=(\hat{p}(n-1))^{7/8}$ . However, from this analysis, it is not clear how one would suitably choose $\gamma^{+}$ and $\gamma^{-}$ .
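This plug-in rule is straightforward to implement. The sketch below (with a hypothetical function name) returns the estimate $\hat{p}$ together with the resulting value of $\gamma$, and leaves open how $\gamma$ is split between $\gamma^{+}$ and $\gamma^{-}$.

```python
import numpy as np

def estimate_regularization(A):
    """Return (p_hat, gamma) with gamma = (p_hat * (n - 1))^{7/8}."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)                         # pairs i < j
    p_hat = 2.0 * np.abs(A[iu]).sum() / (n * (n - 1))    # fraction of connected pairs
    gamma = (p_hat * (n - 1)) ** (7 / 8)
    return p_hat, gamma
```

One possible, purely illustrative choice is the symmetric split $\gamma^{+}=\gamma^{-}=\gamma/2$, for which Remark 2.3 applies; the analysis does not single out this choice.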

Mis-clustering error bounds.

Since $V_{k-1}(\overline{L_{sym}})$ and $V_{k-1}(L_{\gamma})$ are “close” to $V_{k-1}(\mathcal{L}_{sym})$, we recover the ground-truth clustering structure up to some error, which we quantify in the following theorem, where we bound the mis-clustering rate when using a $(1+\xi)$-approximate $k$-means algorithm on the rows of $V_{k-1}(\overline{L_{sym}})$ (resp. $V_{k-1}(L_{\gamma})$).

Theorem 3.12.

(Number of mis-clustered nodes) Let $\xi>0$ and $\delta\in\left(0,\sqrt{\frac{1}{12(16+8\xi)(k-1)}}\right)$ , and suppose that $\rho$ and $p$ satisfy the assumptions of Theorem 3.4 (resp. Theorem 3.9 and $r\geqslant 1$ ). Let $(\tilde{\Theta},\tilde{R}_{k-1})$ be the $(1+\xi)$ -approximation of the $k$ -means problem

\min_{\Theta\in\varmathbb{M}_{n,k},R\in\varmathbb R^{k\times(k-1)}}\left\lVert\Theta R-V_{k-1}(\overline{L_{sym}})\right\rVert_{F}\qquad\text{(resp. }\min_{\Theta\in\varmathbb{M}_{n,k},R\in\varmathbb R^{k\times(k-1)}}\left\lVert\Theta R-V_{k-1}(L_{\gamma})\right\rVert_{F}\text{ )}.

Let $S_{i}=\left\{j\in C_{i};\left\lVert(\tilde{\Theta}\tilde{R}_{k-1})_{j*}-(\Theta R_{k-1}O)_{j*}\right\rVert^{2}\geqslant\frac{2}{3n_{i}}\right\}$ and $\tilde{V}=\cup_{i=1}^{k}C_{i}\backslash S_{i}$ . Then with probability at least $1-\frac{2}{n}-n\exp(\frac{-np}{c})$ (resp. $1-7e^{-2r}-\frac{2}{n}-ne^{-np/c}$ ), there exists a permutation $\pi\in\varmathbb R^{k\times k}$ such that $\tilde{\Theta}_{\tilde{V}*}=\hat{\Theta}_{\tilde{V}*}\pi$ and

\displaystyle\sum_{i=1}^{k}\frac{|S_{i}|}{n_{i}}\leqslant 96(2+\xi)(k-1)\delta^{2}.

In particular, the set of mis-clustered nodes is a subset of $\cup_{i=1}^{k}S_{i}$ .
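To get a feel for the size of the bound in Theorem 3.12, the following minimal Python sketch evaluates its right-hand side for given $k$ , $\xi$ and $\delta$ . The function name is hypothetical; the only inputs are the quantities appearing in the theorem.

```python
import math


def misclustering_bound(k, xi, delta):
    """Right-hand side 96 (2 + xi) (k - 1) delta^2 of the bound in Theorem 3.12."""
    # Largest admissible delta according to the theorem statement.
    delta_max = math.sqrt(1.0 / (12.0 * (16.0 + 8.0 * xi) * (k - 1)))
    if not 0.0 < delta < delta_max:
        raise ValueError("delta must lie in (0, delta_max)")
    return 96.0 * (2.0 + xi) * (k - 1) * delta ** 2


# Example: k = 3 clusters, a 2-approximate k-means solution (xi = 1), delta = 0.02.
bound = misclustering_bound(3, 1.0, 0.02)
```

Since $12(16+8\xi)=96(2+\xi)$ , the right-hand side equals $1$ exactly at the upper end of the admissible range of $\delta$ , so the bound is always strictly smaller than $1$ inside that range.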

4 Analysis of SPONGE Symmetric

This section contains the proof of our main results for SPONGE_sym, divided over the following subsections. Section 4.1 describes the eigen-decomposition of the matrix $\overline{T}$ , thus revealing that a subset of its eigenvectors contain relevant information about $\Theta$ . Section 4.2 provides conditions on $\tau^{+},\tau^{-}$ which ensure that $V_{k}(\overline{T})=\Theta R$ (for some rotation matrix $R$ ), along with a lower bound on the eigengap $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})$ . Section 4.3 then derives concentration bounds on $\left\lVert T-\overline{T}\right\rVert$ using standard tools from the random matrix literature. These results are combined in Section 4.4 to derive error bounds for estimating $V_{k}(\overline{T})$ and $G_{k}(\overline{T})$ up to a rotation (using the Davis-Kahan theorem). The results summarized thus far pertain to the “dense” regime, where we require $p=\Omega(\ln n/n)$ when $n$ is large. Section 4.5 extends these results to the sparse regime where $p=o(\ln n)/n$ , for the regularized version of SPONGE_sym. Finally, we conclude in Section 4.6 by translating our results from Sections 4.4 and 4.5 to obtain mis-clustering error bounds for a $(1+\xi)$ -approximate $k$ -means algorithm, by leveraging previous tools from the literature [LR15].

4.1 Eigen-decomposition of $\overline{T}$

The following lemma shows that a subset of the eigenvectors of $\overline{T}$ indeed contain information about $\Theta$ , i.e., the ground-truth clustering.

Lemma 4.1 (Spectrum of $\overline{T}$ ).

Let $d_{i}^{+}=p\left(n(s_{i}(1-2\eta)+\eta)-(1-\eta)\right)$ , and $d_{i}^{-}=p\left(n(-s_{i}(1-2\eta)+(1-\eta))-\eta\right)$ denote the expected positive and negative degrees, respectively, of a node in cluster $C_{i}$ , $i\in[k]$ . Let $u^{+}=\left(\sqrt{\frac{n_{1}}{d_{1}^{+}}},\ldots,\sqrt{\frac{n_{k}}{d_{k}^{+}}}\right)^{\top}$ , $u^{-}=\left(\sqrt{\frac{n_{1}}{d_{1}^{-}}},\ldots,\sqrt{\frac{n_{k}}{d_{k}^{-}}}\right)^{\top}$ , $\alpha_{i}^{+}=1+\tau^{-}+p(1-\eta)/d_{i}^{+}$ , and $\alpha_{i}^{-}=1+\tau^{+}+p\eta/d_{i}^{-}$ , for $i\in[k]$ , for some $\tau^{+}>0,\tau^{-}\geqslant 0$ . Let the columns of $V^{\perp}$ contain eigenvectors of $\operatorname*{\varmathbb{E}}[D^{+}]$ which are orthogonal to the column span of $\Theta$ . It holds true that

\overline{T}=\begin{bmatrix}{\Theta R}&V^{\perp}\\ \end{bmatrix}\begin{bmatrix}\bm{\Lambda}\\ &\frac{\alpha_{1}^{+}}{\alpha_{1}^{-}}I_{n_{1}-1}\\ &&\ddots\\ &&&\frac{\alpha_{k}^{+}}{\alpha_{k}^{-}}I_{n_{k}-1}\end{bmatrix}\begin{bmatrix}(\Theta R)^{\top}\\ {V^{\perp}}^{\top}\end{bmatrix}\,,

(16)

where $R$ is a $k\times k$ rotation matrix, and $\Lambda$ is a diagonal matrix, such that $(C^{-})^{-1/2}\;C^{+}\;(C^{-})^{-1/2}=R\Lambda R^{T}$ , where

C^{+}=-p\eta u^{+}(u^{+})^{\top}+\textrm{diag}\left(1+\tau^{-}+\frac{p}{d_{i}^{+}}(1-\eta-n_{i}(1-2\eta))\right)\,,

(17)

C^{-}=-p(1-\eta)u^{-}(u^{-})^{\top}+\textrm{diag}\left(1+\tau^{+}+\frac{p}{d_{i}^{-}}(\eta+n_{i}(1-2\eta))\right)\,.

(18)

Proof.

We first consider the spectrum of $D^{+},D^{-},A^{+},A^{-}$ , followed by that of $(\overline{L^{+}_{sym}}+\tau^{-}I)$ and $(\overline{L^{-}_{sym}}+\tau^{+}I)$ , which altogether will reveal the spectral decomposition of $\overline{T}$ .

$\bullet$ Analysis in expectation of the spectra of $D^{+},D^{-},A^{+},A^{-}$ . Without loss of generality, we may assume that cluster $C_{1}$ contains the first $n_{1}$ vertices, cluster $C_{2}$ the next $n_{2}$ vertices and similarly for the remaining clusters. Note that $\operatorname*{\varmathbb{E}}[D^{\pm}]=\textrm{diag}\left(d_{1}^{\pm}I_{n_{1}},\ldots,d_{k}^{\pm}I_{n_{k}}\right)$ , where for $i\in[k]$ , straightforward calculations reveal that $d_{i}^{+}=p\left(n(s_{i}(1-2\eta)+\eta)-(1-\eta)\right)$ , and $d_{i}^{-}=p\left(n(-s_{i}(1-2\eta)+(1-\eta))-\eta\right)$ . One can rewrite the matrices $(\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1}$ in the more convenient form

(\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1}=[{\Theta}~V^{\perp}]~\textrm{diag}\left(\frac{1}{d_{1}^{\pm}},...,\frac{1}{d_{k}^{\pm}},\frac{1}{d_{1}^{\pm}}I_{n_{1}-1},...,\frac{1}{d_{k}^{\pm}}I_{n_{k}-1}\right)~[{\Theta}~V^{\perp}]^{\top}

(19)

since the column vectors of $\Theta$ are eigenvectors of $(\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1}$ , and the eigenvalues of $(\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1}$ are apparent because $\operatorname*{\varmathbb{E}}[D^{\pm}]$ is a diagonal matrix. Note that (19) is true in general, and does not make any assumption on the placement of the vertices into their respective $C_{i}$ cluster. Furthermore, one can verify that $\operatorname*{\varmathbb{E}}[A^{+}]$ admits the eigen-decomposition

\operatorname*{\varmathbb{E}}[A^{+}]=\Theta_{n\times k}{\begin{bmatrix}n_{1}p(1-\eta)&\sqrt{n_{1}n_{2}}p\eta&\ldots&\sqrt{n_{1}n_{k}}p\eta\\ \sqrt{n_{2}n_{1}}p\eta&n_{2}p(1-\eta)&\ldots&\sqrt{n_{2}n_{k}}p\eta\\ \vdots&\vdots&\ddots&\vdots\\ \sqrt{n_{k}n_{1}}p\eta&\sqrt{n_{k}n_{2}}p\eta&\ldots&n_{k}p(1-\eta)\end{bmatrix}_{k\times k}}{\Theta}^{\top}_{k\times n}-p(1-\eta)I_{n\times n}

(20)

and similarly, $\operatorname*{\varmathbb{E}}[A^{-}]$ can be decomposed as

\operatorname*{\varmathbb{E}}[A^{-}]=\Theta_{n\times k}{\begin{bmatrix}n_{1}p\eta&\sqrt{n_{1}n_{2}}p(1-\eta)&\ldots&\sqrt{n_{1}n_{k}}p(1-\eta)\\ \sqrt{n_{2}n_{1}}p(1-\eta)&n_{2}p\eta&\ldots&\sqrt{n_{2}n_{k}}p(1-\eta)\\ \vdots&\vdots&\ddots&\vdots\\ \sqrt{n_{k}n_{1}}p(1-\eta)&\sqrt{n_{k}n_{2}}p(1-\eta)&\ldots&n_{k}p\eta\end{bmatrix}_{k\times k}}{\Theta}^{\top}_{k\times n}-p\eta I_{n\times n}\,.

$\bullet$ Analysis of the spectra of $(\overline{L^{+}_{sym}}+\tau^{-}I)$ and $(\overline{L^{-}_{sym}}+\tau^{+}I)$ . We start by observing that

\displaystyle\overline{L^{\pm}_{sym}}+\tau^{\mp}I=I-(\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1/2}(\operatorname*{\varmathbb{E}}[A^{\pm}])(\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1/2}+\tau^{\mp}I=(1+\tau^{\mp})I-(\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1/2}(\operatorname*{\varmathbb{E}}[A^{\pm}])(\operatorname*{\varmathbb{E}}[D^{\pm}])^{-1/2}\,.

(21)

In light of (20), one can write $(\operatorname*{\varmathbb{E}}[D^{+}])^{-1/2}(\operatorname*{\varmathbb{E}}[A^{+}])(\operatorname*{\varmathbb{E}}[D^{+}])^{-1/2}$ as

(\operatorname*{\varmathbb{E}}[D^{+}])^{-1/2}(\operatorname*{\varmathbb{E}}[A^{+}])(\operatorname*{\varmathbb{E}}[D^{+}])^{-1/2}=

\begin{bmatrix}{\Theta}&V^{\perp}\\ \end{bmatrix}\begin{bmatrix}\overbrace{\begin{bmatrix}\frac{n_{1}}{d_{1}^{+}}p(1-\eta)&\sqrt{\frac{n_{1}n_{2}}{d_{1}^{+}d_{2}^{+}}}p\eta&\ldots&\sqrt{\frac{n_{1}n_{k}}{d_{1}^{+}d_{k}^{+}}}p\eta\\ \sqrt{\frac{n_{2}n_{1}}{d_{2}^{+}d_{1}^{+}}}p\eta&\frac{n_{2}}{d_{2}^{+}}p(1-\eta)&\ldots&\sqrt{\frac{n_{2}n_{k}}{d_{2}^{+}d_{k}^{+}}}p\eta\\ \vdots&\vdots&\ddots&\vdots\\ \sqrt{\frac{n_{k}n_{1}}{d_{k}^{+}d_{1}^{+}}}p\eta&\sqrt{\frac{n_{k}n_{2}}{d_{k}^{+}d_{2}^{+}}}p\eta&\ldots&\frac{n_{k}}{d_{k}^{+}}p(1-\eta)\end{bmatrix}_{k\times k}}^{\stackrel{{\scriptstyle\textup{def}}}{{=}}B^{+}}&\bm{0}_{k\times(n-k)}\\ \bm{0}_{(n-k)\times k}&\bm{0}_{(n-k)\times(n-k)}\\ \end{bmatrix}\begin{bmatrix}\Theta^{\top}\\ {V^{\perp}}^{\top}\end{bmatrix}-p(1-\eta)(\operatorname*{\varmathbb{E}}[D^{+}])^{-1}\,.

(22)

Similarly, using the expression for $\operatorname*{\varmathbb{E}}[A^{-}]$ , the expression for $(\operatorname*{\varmathbb{E}}[D^{-}])^{-1/2}(\operatorname*{\varmathbb{E}}[A^{-}])(\operatorname*{\varmathbb{E}}[D^{-}])^{-1/2}$ can be written as

(\operatorname*{\varmathbb{E}}[D^{-}])^{-1/2}(\operatorname*{\varmathbb{E}}[A^{-}])(\operatorname*{\varmathbb{E}}[D^{-}])^{-1/2}=

\begin{bmatrix}{\Theta}&V^{\perp}\\ \end{bmatrix}\begin{bmatrix}\overbrace{\begin{bmatrix}\frac{n_{1}}{d_{1}^{-}}p\eta&\sqrt{\frac{n_{1}n_{2}}{d_{1}^{-}d_{2}^{-}}}p(1-\eta)&\ldots&\sqrt{\frac{n_{1}n_{k}}{d_{1}^{-}d_{k}^{-}}}p(1-\eta)\\ \sqrt{\frac{n_{2}n_{1}}{d_{2}^{-}d_{1}^{-}}}p(1-\eta)&\frac{n_{2}}{d_{2}^{-}}p\eta&\ldots&\sqrt{\frac{n_{2}n_{k}}{d_{2}^{-}d_{k}^{-}}}p(1-\eta)\\ \vdots&\vdots&\ddots&\vdots\\ \sqrt{\frac{n_{k}n_{1}}{d_{k}^{-}d_{1}^{-}}}p(1-\eta)&\sqrt{\frac{n_{k}n_{2}}{d_{k}^{-}d_{2}^{-}}}p(1-\eta)&\ldots&\frac{n_{k}}{d_{k}^{-}}p\eta\end{bmatrix}_{k\times k}}^{\stackrel{{\scriptstyle\textup{def}}}{{=}}B^{-}}&\bm{0}_{k\times(n-k)}\\ \bm{0}_{(n-k)\times k}&\bm{0}_{(n-k)\times(n-k)}\\ \end{bmatrix}\begin{bmatrix}\Theta^{\top}\\ {V^{\perp}}^{\top}\end{bmatrix}-p\eta(\operatorname*{\varmathbb{E}}[D^{-}])^{-1}\,.

(23)

Combining (19), (22), and (23) into (21), we readily arrive at

(\overline{L_{sym}^{\pm}}+\tau^{\mp}I)=\begin{bmatrix}{\Theta}&V^{\perp}\\ \end{bmatrix}\begin{bmatrix}\underbrace{\left[\textrm{diag}(\alpha_{i}^{\pm})-B^{\pm}\right]_{k\times k}}_{\stackrel{{\scriptstyle\textup{def}}}{{=}}C^{\pm}}\\ &\alpha_{1}^{\pm}I_{n_{1}-1}\\ &&\alpha_{2}^{\pm}I_{n_{2}-1}\\ &&&\ddots\\ &&&&\alpha_{k}^{\pm}I_{n_{k}-1}\end{bmatrix}\begin{bmatrix}\Theta^{\top}\\ {V^{\perp}}^{\top}\end{bmatrix}\,,

(24)

where $\alpha_{i}^{\pm}$ and $C^{+},C^{-}$ are defined as in the statement of the lemma. The spectral decomposition of $\overline{T}$ now follows trivially using (24), along with the spectral decomposition $(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}=R\Lambda R^{T}$ . ∎

Lemma 4.1 reveals that we need to extract the $k$ informative eigenvectors $\Theta R$ from the $n$ eigenvectors $\begin{bmatrix}{\Theta R}&V^{\perp}\\ \end{bmatrix}$ of $\overline{T}$ . Clearly, it suffices to recover any orthonormal basis for the column span of $\Theta$ , since the rows of any such corresponding matrix (one instance of which is $\Theta R$ ) will exhibit the same clustering structure as $\Theta$ .
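The non-informative part of the spectrum in Lemma 4.1 can be checked numerically on a small instance. The sketch below is illustrative only: it assumes the expected adjacency entries $p(1-\eta)$ (within a cluster) and $p\eta$ (across clusters) for $A^{+}$ , and the reverse for $A^{-}$ , as in (20), builds the two matrices $\overline{L^{\pm}_{sym}}+\tau^{\mp}I$ with plain nested lists, and applies them to a vector supported on two nodes of the same cluster, which is orthogonal to the column span of $\Theta$ . All function and variable names are hypothetical.

```python
import math


def expected_regularized_laplacians(sizes, p, eta, tau_plus, tau_minus):
    """Return (Q_bar, P_bar, alpha_plus, alpha_minus) for the expected graph.

    Q_bar = L+_sym_bar + tau_minus * I and P_bar = L-_sym_bar + tau_plus * I.
    """
    n = sum(sizes)
    label = [c for c, size in enumerate(sizes) for _ in range(size)]
    # Expected degrees d_i^+ and d_i^- from Lemma 4.1 (with s_i = n_i / n).
    d_plus = [p * (ni * (1 - 2 * eta) + n * eta - (1 - eta)) for ni in sizes]
    d_minus = [p * (-ni * (1 - 2 * eta) + n * (1 - eta) - eta) for ni in sizes]
    Q = [[0.0] * n for _ in range(n)]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Expected adjacency has zero diagonal, so only I and tau remain.
                Q[i][j] = 1.0 + tau_minus
                P[i][j] = 1.0 + tau_plus
                continue
            same = label[i] == label[j]
            a_plus = p * (1 - eta) if same else p * eta
            a_minus = p * eta if same else p * (1 - eta)
            Q[i][j] = -a_plus / math.sqrt(d_plus[label[i]] * d_plus[label[j]])
            P[i][j] = -a_minus / math.sqrt(d_minus[label[i]] * d_minus[label[j]])
    alpha_plus = [1 + tau_minus + p * (1 - eta) / d for d in d_plus]
    alpha_minus = [1 + tau_plus + p * eta / d for d in d_minus]
    return Q, P, alpha_plus, alpha_minus


def matvec(M, v):
    """Plain matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


sizes = [3, 5]
Q, P, alpha_plus, alpha_minus = expected_regularized_laplacians(sizes, 0.5, 0.1, 2.0, 0.5)
v = [1.0, -1.0] + [0.0] * 6  # difference of two nodes of the first cluster
Qv, Pv = matvec(Q, v), matvec(P, v)
```

Here `Qv` equals $\alpha_{1}^{+}v$ and `Pv` equals $\alpha_{1}^{-}v$ up to rounding. Assuming $\overline{T}=\overline{P}^{-1/2}\,\overline{Q}\,\overline{P}^{-1/2}$ in the notation of Lemma 4.12, this gives $\overline{T}v=(\alpha_{1}^{+}/\alpha_{1}^{-})v$ , in agreement with (16).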

4.2 Ensuring $V_{k}(\overline{T})=\Theta R$ and bounding the spectral gap

In this section, our aim is to show that, for suitable values of $\tau^{+}>0,\tau^{-}\geqslant 0$ , the eigenvectors corresponding to the smallest $k$ eigenvalues of $\overline{T}$ are given by $\Theta R$ , i.e., $V_{k}(\overline{T})=\Theta R$ . This is equivalent to ensuring (recall Lemma 4.1) that

\lambda_{n-k+1}(\overline{T})=\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert<\min_{i\in[k]}\frac{\alpha_{i}^{+}}{\alpha_{i}^{-}}=\lambda_{n-k}(\overline{T}).

(25)

Moreover, we will need to find a strictly positive lower-bound on the spectral gap $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})$ , as it will be used later on, in order to show that the column span of $V_{k}(T)$ is close to that of $V_{k}(\overline{T})$ . We first consider the equal-sized clusters case, and then proceed to the general-sized clusters case.

4.2.1 Spectral gap for equal-sized clusters

When the cluster sizes are equal, the analysis is considerably cleaner than the general setting. Let us first establish notation specific to the equal-sized clusters case.

Remark 4.2 (Notation for the equal-sized clusters).

For clusters of equal size, we have that $n_{1}=...=n_{k}=n/k$ , $d^{+}:=d_{1}^{+}=...=d_{k}^{+}$ , $d^{-}:=d_{1}^{-}=...=d_{k}^{-}$ , $\alpha^{+}:=\alpha_{1}^{+}=...=\alpha_{k}^{+}$ , and $\alpha^{-}:=\alpha_{1}^{-}=...=\alpha_{k}^{-}$ . Let $C^{+}_{e},C^{-}_{e}$ , and $\overline{T_{e}}$ denote the respective counterparts of $C^{+},C^{-}$ , and $\overline{T}$ , for the equal-sized case. In light of (17) and (18), one can verify that $C^{+}_{e}$ and $C^{-}_{e}$ are simultaneously diagonalizable, which we show in Lemma D.1.

In the following lemma, we show the exact value of $\left\lVert\Lambda\right\rVert=\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert$ .

Lemma 4.3 (Bounding the spectral norm of $(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}$ ).

For equal-sized clusters, the following holds true

\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert=\max\left\{\frac{\tau^{-}}{\tau^{+}},\frac{\tau^{-}+\frac{pn\eta}{d^{+}}}{\tau^{+}+\frac{pn(1-\eta)}{d^{-}}}\right\}\,.

Proof.

The lemma follows directly from Lemma D.1. ∎
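For concreteness, the closed form of Lemma 4.3 and the resulting gap in (25) can be evaluated directly. The following is a minimal sketch for equal-sized clusters; the function name and the chosen parameter values are illustrative assumptions.

```python
def equal_size_spectral_gap(n, k, p, eta, tau_plus, tau_minus):
    """Return (norm, ratio, gap) for k equal-sized clusters.

    norm  = ||(C_e^-)^{-1/2} C_e^+ (C_e^-)^{-1/2}||  (closed form of Lemma 4.3)
    ratio = alpha^+ / alpha^-
    gap   = ratio - norm
    """
    # Expected degrees with s_i = 1/k for every cluster.
    d_plus = p * (n * ((1 - 2 * eta) / k + eta) - (1 - eta))
    d_minus = p * (n * (-(1 - 2 * eta) / k + (1 - eta)) - eta)
    alpha_plus = 1 + tau_minus + p * (1 - eta) / d_plus
    alpha_minus = 1 + tau_plus + p * eta / d_minus
    norm = max(
        tau_minus / tau_plus,
        (tau_minus + p * n * eta / d_plus) / (tau_plus + p * n * (1 - eta) / d_minus),
    )
    ratio = alpha_plus / alpha_minus
    return norm, ratio, ratio - norm


norm, ratio, gap = equal_size_spectral_gap(
    n=100, k=2, p=0.5, eta=0.1, tau_plus=3.0, tau_minus=0.5
)
```

For these values the maximum is attained by $\tau^{-}/\tau^{+}=1/6$ , and the gap is positive, so (25) holds for this particular choice of parameters.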

Next, we derive conditions on $\tau^{+}>0,\tau^{-}\geqslant 0$ which ensure $V_{k}(\overline{T})=\Theta R$ .

Lemma 4.4 (Conditions on $\tau^{-}$ and $\tau^{+}$ ).

Suppose $n\geqslant\frac{2k(1-\eta)}{1-2\eta}$ , and $\tau^{-}\geqslant 0$ , $\tau^{+}>0$ . If $\tau^{-}$ , $\tau^{+}$ satisfy

1.

$\tau^{-}\left(1+\frac{p\eta}{d^{-}}\right)<\tau^{+}\left(1+\frac{p(1-\eta)}{d^{+}}\right)\,,$

2.

$\tau^{-}\left[\frac{(1-2\eta)/k}{(1-\eta)-\frac{1-2\eta}{k}}\right]+\tau^{+}\left[\frac{(1-2\eta)/k}{\eta+\frac{1-2\eta}{k}}\right]+1>\frac{2\eta}{\eta+\frac{1-2\eta}{2k}}\,.$

Then it holds true that $V_{k}(\overline{T})=\Theta R$ , i.e., $\lambda_{n-k+1}(\overline{T})=\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert<\frac{\alpha^{+}}{\alpha^{-}}=\lambda_{n-k}(\overline{T}).$

Proof.

Recalling the expression for $\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert$ from Lemma 4.3, we will ensure that each term inside the max is less than $\alpha^{+}/\alpha^{-}$ . To derive the first condition of the lemma, we simply ensure that

\frac{\tau^{-}}{\tau^{+}}<\frac{1+\tau^{-}+p(1-\eta)/d^{+}}{1+\tau^{+}+p\eta/d^{-}}\Leftrightarrow\tau^{-}\left(1+\frac{p\eta}{d^{-}}\right)<\tau^{+}\left(1+\frac{p(1-\eta)}{d^{+}}\right)\,.

Before deriving the second condition, let us note additional useful bounds on $\frac{np}{d^{-}},\frac{np}{d^{+}}$ which will be needed later.

1.

$d^{-}/np=1-\eta-(1-2\eta)/k-\eta/n\leqslant 1-\eta$ .
2.

Since $n\geqslant k\geqslant 2$ , we obtain that $d^{-}/np\geqslant(1-\eta)-(1-\eta)/k\geqslant\frac{1-\eta}{2}$ . This also implies that $p\eta/d^{-}\leqslant 1$ .
Therefore, combining the above two bounds, we arrive at

$\frac{1}{1-\eta}\leqslant\frac{np}{d^{-}}\leqslant\frac{2}{1-\eta}\,.$
3.

$d^{+}/np=(1-2\eta)/k+\eta-(1-\eta)/n\leqslant\eta+(1-2\eta)/k$ .
4.

Since $n\geqslant\frac{2k(1-\eta)}{1-2\eta}$ , it holds that $d^{+}/np=(1-2\eta)/k+\eta-(1-\eta)/n\geqslant\eta+(1-2\eta)/2k$ .
5.

Therefore, combining the above two conditions yields

$\frac{1}{\eta+\frac{1-2\eta}{k}}\leqslant\frac{np}{d^{+}}\leqslant\frac{1}{\eta+\frac{1-2\eta}{2k}}\,.$

To derive the second condition, we need to ensure $\frac{\tau^{-}+\frac{pn\eta}{d^{+}}}{\tau^{+}+\frac{pn(1-\eta)}{d^{-}}}<\frac{1+\tau^{-}+p(1-\eta)/d^{+}}{1+\tau^{+}+p\eta/d^{-}}$ , which is equivalent to

\tau^{-}\left[1-\frac{np}{d^{-}}\left((1-\eta)-\frac{\eta}{n}\right)\right]<\tau^{+}\left[1-\frac{np}{d^{+}}\left(\eta-\frac{1-\eta}{n}\right)\right]+\underbrace{\left[\frac{np(1-\eta)}{d^{-}}\left(1+\frac{p(1-\eta)}{d^{+}}\right)-\frac{np\eta}{d^{+}}\left(1+\frac{p\eta}{d^{-}}\right)\right]}_{\text{term 2}}\,.

Now, we can lower bound “term 2” in the above equation as

{\frac{np(1-\eta)}{d^{-}}\left(1+\frac{p(1-\eta)}{d^{+}}\right)-\frac{np\eta}{d^{+}}\left(1+\frac{p\eta}{d^{-}}\right)}\geqslant 1-\frac{2\eta}{\eta+\frac{(1-2\eta)}{2k}}\,.

Hence from the above two equations, we observe that it suffices that $\tau^{+},\tau^{-}$ satisfy

\tau^{-}\left[\frac{(1-2\eta)/k}{(1-\eta)-\frac{1-2\eta}{k}}\right]+\tau^{+}\left[\frac{(1-2\eta)/k}{\eta+\frac{1-2\eta}{k}}\right]+1>\frac{2\eta}{\eta+\frac{1-2\eta}{2k}}\,.

∎
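The two sufficient conditions of Lemma 4.4 are straightforward to test for a candidate pair $(\tau^{+},\tau^{-})$ . The sketch below is a hedged illustration with hypothetical names; it simply evaluates both inequalities as stated in the lemma.

```python
def lemma_4_4_conditions(n, k, p, eta, tau_plus, tau_minus):
    """Evaluate the two sufficient conditions of Lemma 4.4 (equal-sized clusters).

    Returns a pair of booleans (condition_1, condition_2).
    """
    if n < 2 * k * (1 - eta) / (1 - 2 * eta):
        raise ValueError("Lemma 4.4 requires n >= 2k(1 - eta) / (1 - 2 eta)")
    d_plus = p * (n * ((1 - 2 * eta) / k + eta) - (1 - eta))
    d_minus = p * (n * (-(1 - 2 * eta) / k + (1 - eta)) - eta)
    q = (1 - 2 * eta) / k
    # Condition 1: tau^- (1 + p eta / d^-) < tau^+ (1 + p (1 - eta) / d^+).
    cond1 = tau_minus * (1 + p * eta / d_minus) < tau_plus * (1 + p * (1 - eta) / d_plus)
    # Condition 2: weighted sum of tau^-, tau^+ plus 1 exceeds 2 eta / (eta + q / 2).
    cond2 = (
        tau_minus * q / ((1 - eta) - q) + tau_plus * q / (eta + q) + 1
        > 2 * eta / (eta + q / 2)
    )
    return cond1, cond2


ok = lemma_4_4_conditions(n=100, k=2, p=0.5, eta=0.1, tau_plus=3.0, tau_minus=0.5)
bad = lemma_4_4_conditions(n=100, k=2, p=0.5, eta=0.1, tau_plus=1.0, tau_minus=5.0)
```

The first parameter choice satisfies both conditions, while the second one violates the first condition because $\tau^{-}$ is much larger than $\tau^{+}$ .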

Next, we derive sufficient conditions on $\tau^{+},\tau^{-}$ which ensure a lower bound on the spectral gap

\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})=\frac{\alpha^{+}}{\alpha^{-}}-\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert.

Lemma 4.5 (Conditions on $\tau^{+},\tau^{-}$ , and lower-bound on spectral gap).

Suppose $n\geqslant\frac{2k(1-\eta)}{1-2\eta}$ , then the following holds.

1.

If $\tau^{+}>0,\tau^{-}\geqslant 0$ satisfy

\tau^{+}>\frac{32\eta k}{3(1-2\eta)},\quad\tau^{-}<\min\left\{\frac{3}{2},\frac{3}{16}\tau^{+},\frac{3(1-\eta)}{8(\eta+\frac{1-2\eta}{k})}\right\},

then $V_{k}(\overline{T})=\Theta R$ , and $\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert<\left(1-\frac{(1-2\eta)}{2k(1-\eta)}\right)\frac{\alpha^{+}}{\alpha^{-}}$ , i.e., $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})>\left(\frac{(1-2\eta)}{2k(1-\eta)}\right)\frac{\alpha^{+}}{\alpha^{-}}$ .

2.

If $\eta<\frac{1}{3k+2}$ and $\tau^{+}>0,\tau^{-}\geqslant 0$ satisfy

$\tau^{-}<\min\left\{\left(\frac{\frac{1-2\eta}{k}-\eta}{\frac{1-2\eta}{k}+\eta}\right),\frac{1}{2},\frac{\tau^{+}}{8}\right\}\,,$

then $V_{k}(\overline{T})=\Theta R$ , and $\left\lVert(C^{-}_{e})^{-1/2}C^{+}_{e}(C^{-}_{e})^{-1/2}\right\rVert<\frac{\alpha^{+}}{2\alpha^{-}}$ , i.e., $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})>\frac{\alpha^{+}}{2\alpha^{-}}$ .

Proof.

We need to ensure the following two conditions for a suitably chosen $\beta\in(0,1]$ .

	$\displaystyle\frac{\tau^{-}+\frac{pn\eta}{d^{+}}}{\tau^{+}+\frac{pn(1-\eta)}{d^{-}}}$	$\displaystyle<\beta\left(\frac{1+\tau^{-}+p(1-\eta)/d^{+}}{1+\tau^{+}+p\eta/d^{-}}\right),$		(26)
	$\displaystyle\frac{\tau^{-}}{\tau^{+}}$	$\displaystyle<\beta\left(\frac{1+\tau^{-}+p(1-\eta)/d^{+}}{1+\tau^{+}+p\eta/d^{-}}\right).$		(27)

1. Ensuring (26)

We can rewrite (26) as

\tau^{-}\left(1+\frac{p\eta}{d^{-}}-\beta\frac{pn(1-\eta)}{d^{-}}\right)+\tau^{+}\left(\frac{pn\eta}{d^{+}}-\beta\left(1+\frac{p(1-\eta)}{d^{+}}\right)\right)+\tau^{+}\tau^{-}(1-\beta)<\beta\frac{pn(1-\eta)}{d^{-}}\left(1+\frac{p(1-\eta)}{d^{+}}\right)-\frac{pn\eta}{d^{+}}\left(1+\frac{p\eta}{d^{-}}\right).

(28)

Using the expressions for $d^{+},d^{-}$ , we can write the coefficients of the terms $\tau^{+},\tau^{-}$ as follows.

	$\displaystyle 1+\frac{p\eta}{d^{-}}-\beta\frac{pn(1-\eta)}{d^{-}}$	$\displaystyle=\frac{-(\frac{1-2\eta}{k})+(1-\eta)(1-\beta)}{-(\frac{1-2\eta}{k})+(1-\eta)-\frac{\eta}{n}},$
	$\displaystyle\frac{pn\eta}{d^{+}}-\beta\left(1+\frac{p(1-\eta)}{d^{+}}\right)$	$\displaystyle=\frac{np}{d^{+}}(\eta-\beta\frac{1-\eta}{n})-\beta=\frac{\eta(1-\beta)-\beta(\frac{1-2\eta}{k})}{\frac{1-2\eta}{k}+\eta-\frac{1-\eta}{n}}.$

Moreover, using the bounds on $\frac{d^{-}}{np},\frac{d^{+}}{np}$ derived in Lemma 4.4, we can lower bound the RHS term in (28) as

\beta\frac{pn(1-\eta)}{d^{-}}\left(1+\frac{p(1-\eta)}{d^{+}}\right)-\frac{pn\eta}{d^{+}}\left(1+\frac{p\eta}{d^{-}}\right)>\beta-\frac{2\eta}{\eta+\frac{1-2\eta}{k}}.

From the above considerations, we see that (28) is ensured provided

\tau^{-}\left[\frac{(\frac{1-2\eta}{k})-(1-\eta)(1-\beta)}{-(\frac{1-2\eta}{k})+(1-\eta)-\frac{\eta}{n}}\right]+\tau^{+}\left[\frac{-\eta(1-\beta)+\beta(\frac{1-2\eta}{k})}{\frac{1-2\eta}{k}+\eta-\frac{1-\eta}{n}}\right]+\beta>\frac{2\eta}{\eta+\frac{1-2\eta}{k}}+\tau^{+}\tau^{-}(1-\beta).

(29)

We outline two possible ways in which (29) is ensured.

•

Note that the denominators of the coefficients of $\tau^{+},\tau^{-}$ in (29) are positive, while the numerators are non-negative provided $1-\beta\leqslant\frac{(1-2\eta)}{2k(1-\eta)}$ . Therefore, choosing

\beta=1-\frac{(1-2\eta)}{2k(1-\eta)}\quad\left(\geqslant\frac{3}{4}\right),

note that (29) is ensured provided

\tau^{-}\left[\frac{(1-2\eta)}{2k(1-\eta)}\right]+\tau^{+}\left[\frac{3(1-2\eta)}{8k\left(\eta+\frac{1-2\eta}{k}\right)}\right]+\frac{3}{4}>\frac{2\eta}{\eta+\frac{1-2\eta}{k}}+\tau^{+}\tau^{-}\left[\frac{(1-2\eta)}{2k(1-\eta)}\right].

(30)

Finally, we observe that in order for (30) to hold, it suffices that

	$\displaystyle\tau^{+}\tau^{-}\left[\frac{(1-2\eta)}{2k(1-\eta)}\right]<\frac{\tau^{+}}{2}\left[\frac{3(1-2\eta)}{8k\left(\eta+\frac{1-2\eta}{k}\right)}\right]$	$\displaystyle\iff\tau^{-}<\frac{3(1-\eta)}{8\left(\eta+\frac{1-2\eta}{k}\right)},\text{ and }$
	$\displaystyle\frac{2\eta}{\eta+\frac{1-2\eta}{k}}<\frac{\tau^{+}}{2}\left[\frac{3(1-2\eta)}{8k\left(\eta+\frac{1-2\eta}{k}\right)}\right]$	$\displaystyle\iff\tau^{+}>\frac{32\eta k}{3(1-2\eta)}.$

•

Alternatively, by setting $\beta=1/2$ , (29) can be rewritten as

\tau^{+}\left[\frac{-\frac{\eta}{2}+\frac{1-2\eta}{2k}}{\frac{1-2\eta}{k}+\eta-\frac{1-\eta}{n}}\right]+\frac{1}{2}>\frac{2\eta}{\eta+\frac{1-2\eta}{k}}+\tau^{-}\left[\frac{-(\frac{1-2\eta}{k})+\frac{1-\eta}{2}}{-(\frac{1-2\eta}{k})+(1-\eta)-\frac{\eta}{n}}\right]+\frac{\tau^{+}\tau^{-}}{2}.

(31)

Clearly, it holds true that

\displaystyle\frac{1}{2}>\frac{2\eta}{\eta+\frac{1-2\eta}{k}}\iff\eta<\frac{1}{3k+2},

which also ensures that the numerator of the coefficient of $\tau^{+}$ is positive. Therefore, if $\eta<\frac{1}{3k+2}$ , then in order for (31) to hold, it suffices that

\tau^{-}<\left[\frac{-\eta+\frac{1-2\eta}{k}}{\frac{1-2\eta}{k}+\eta}\right]\implies\tau^{+}\left[\frac{-\frac{\eta}{2}+\frac{1-2\eta}{2k}}{\frac{1-2\eta}{k}+\eta-\frac{1-\eta}{n}}\right]>\frac{\tau^{+}\tau^{-}}{2}.

2. Ensuring (27)

Note that one can rewrite (27) as

\tau^{-}\tau^{+}(1-\beta)+\tau^{-}\left(1+\frac{p\eta}{d^{-}}\right)<\beta\tau^{+}\left(1+\frac{p(1-\eta)}{d^{+}}\right).

(32)

Since $\frac{p\eta}{d^{-}}\leqslant 1$ , (32) is ensured provided

\tau^{-}\tau^{+}(1-\beta)+2\tau^{-}<\beta\tau^{+}

which in turn holds if each LHS term is respectively less than half of the RHS term. This leads to the condition

\tau^{-}<\min\left\{\frac{\beta}{2(1-\beta)},\frac{\beta}{4}\tau^{+}\right\}.

Finally, plugging the choices $\beta=1-\frac{(1-2\eta)}{2k(1-\eta)}(\geqslant 3/4)$ and $\beta=\frac{1}{2}$ in the above equation, and combining it with the conditions derived for ensuring (26), we readily arrive (after minor simplifications) at the statements in the Lemma. ∎
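As a worked illustration of the first part of Lemma 4.5, the following minimal sketch computes the admissible upper limit for $\tau^{-}$ once $\tau^{+}$ has been fixed, together with the factor $\frac{1-2\eta}{2k(1-\eta)}$ multiplying $\frac{\alpha^{+}}{\alpha^{-}}$ in the spectral gap bound. The function name is hypothetical.

```python
def lemma_4_5_part1(k, eta, tau_plus):
    """Return (tau_minus_upper, gap_factor) for part 1 of Lemma 4.5.

    tau_minus_upper: any 0 <= tau^- < tau_minus_upper is admissible.
    gap_factor: the gap exceeds gap_factor * alpha^+ / alpha^-.
    """
    tau_plus_lower = 32 * eta * k / (3 * (1 - 2 * eta))
    if tau_plus <= tau_plus_lower:
        raise ValueError("tau_plus must exceed 32 eta k / (3 (1 - 2 eta))")
    tau_minus_upper = min(
        1.5,
        3 * tau_plus / 16,
        3 * (1 - eta) / (8 * (eta + (1 - 2 * eta) / k)),
    )
    gap_factor = (1 - 2 * eta) / (2 * k * (1 - eta))
    return tau_minus_upper, gap_factor


# Example: k = 2, eta = 0.1 requires tau^+ > 8/3; take tau^+ = 3.
tau_minus_upper, gap_factor = lemma_4_5_part1(k=2, eta=0.1, tau_plus=3.0)
```

In this example the binding constraint is $\tau^{-}<\frac{3}{16}\tau^{+}=0.5625$ , and the guaranteed gap is at least $\frac{2}{9}\cdot\frac{\alpha^{+}}{\alpha^{-}}$ .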

4.2.2 Spectral gap for the general case

For the general-sized clusters case, it is difficult to find the exact value of $\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert$ . Therefore, in the following lemma, we show an upper bound on this quantity by bounding the spectral norms of $C^{+}$ and $(C^{-})^{-1}$ .

Lemma 4.6 (Bounding the spectral norm of $(C^{-})^{-1}$ and $C^{+}$ ).

Recall $s:=\min_{i\in[k]}n_{i}/n$ . Then it holds true that

	$\displaystyle\lambda_{\max}(C^{+})$	$\displaystyle\leqslant\tau^{-}+\frac{n\eta}{n(s(1-2\eta)+\eta)-(1-\eta)},$		(33)
	$\displaystyle\lambda_{\min}(C^{-})$	$\displaystyle\geqslant\tau^{+}\,.$		(34)

From the above two inequalities, it follows that

\displaystyle\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert\leqslant\frac{\lambda_{max}(C^{+})}{\lambda_{min}(C^{-})}\leqslant\frac{\tau^{-}+\frac{n\eta}{n(s(1-2\eta)+\eta)-(1-\eta)}}{\tau^{+}}\,.

The proof of the above lemma is deferred to Appendix D.
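The upper bound of Lemma 4.6 depends only on $n$ , $s$ , $\eta$ and the two regularization parameters, so it is easy to evaluate. A minimal sketch, with a hypothetical function name:

```python
def general_norm_upper_bound(n, s, eta, tau_plus, tau_minus):
    """Upper bound on ||(C^-)^{-1/2} C^+ (C^-)^{-1/2}|| from Lemma 4.6."""
    # Bound (33) on lambda_max(C^+).
    lam_max_c_plus = tau_minus + n * eta / (n * (s * (1 - 2 * eta) + eta) - (1 - eta))
    # Bound (34): lambda_min(C^-) >= tau^+.
    return lam_max_c_plus / tau_plus


bound = general_norm_upper_bound(n=100, s=0.3, eta=0.1, tau_plus=12.0, tau_minus=0.1)
```

For these values the bound is roughly $0.034$ ; it decreases as $\tau^{+}$ grows and increases with the noise level $\eta$ .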

Remark 4.7.

It is difficult to obtain more precise bounds on $\lambda_{\max}(C^{+})$ and $\lambda_{\min}(C^{-})$ , given the expressions for $C^{+}$ in (17), and $C^{-}$ in (18). Clearly, a tighter bound on $\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert$ would yield a tighter analysis in the general case.

Recall $l:=\max_{i\in[k]}n_{i}/n$ ; with a slight abuse of notation, let $d_{l}^{\pm}$ denote the expected degrees of a node in the largest cluster (of size $nl$ ). As before, we now derive conditions on $\tau^{+}>0,\tau^{-}\geqslant 0$ which ensure $V_{k}(\overline{T})=\Theta R$ , or equivalently,

\lambda_{n-k+1}(\overline{T})=\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert<\min_{i\in[k]}\frac{\alpha_{i}^{+}}{\alpha_{i}^{-}}=\frac{1+\tau^{-}+p(1-\eta)/d_{l}^{+}}{1+\tau^{+}+p\eta/d_{l}^{-}}=\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}=\lambda_{n-k}(\overline{T}).

(35)

Additionally, we find sufficient conditions on $\tau^{+}>0,\tau^{-}\geqslant 0$ which ensure a lower bound on the spectral gap $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})=\min_{i\in[k]}\frac{\alpha_{i}^{+}}{\alpha_{i}^{-}}-\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert$ . These are shown in the following lemma.

Lemma 4.8 (Conditions on $\tau^{+},\tau^{-}$ , and Lower-Bound on Spectral Gap).

Suppose $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}$ , then the following is true.

1.

If $\tau^{+}>0,\tau^{-}\geqslant 0$ satisfy

$2\tau^{-}+\frac{4\eta}{s(1-2\eta)+2\eta}<\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\tau^{+}$ (36)

then $V_{k}(\overline{T})=\Theta R$ , i.e., $\lambda_{n-k+1}(\overline{T})=\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert<\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}=\lambda_{n-k}(\overline{T})$ .

2.

For $\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ with $0<\eta<\frac{1}{2}$ , if $\tau^{+}>0,\tau^{-}\geqslant 0$ satisfy

(1-\beta)\tau^{-}\tau^{+}+2\tau^{-}+\frac{4\eta}{s(1-2\eta)+2\eta}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\tau^{+}

(37)

then $V_{k}(\overline{T})=\Theta R$ , and $\left\lVert(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}\right\rVert<\beta\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}$ , i.e., $\lambda_{n-k}(\overline{T})-\lambda_{n-k+1}(\overline{T})>(1-\beta)\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}$ . Moreover, for (37) to hold, it suffices that

\tau^{+}>\frac{16\eta}{\beta s(1-2\eta)},\quad\tau^{-}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^{+}}{8}\right\}.

3.

The statement in part ( $2$ ) also holds for the choice $\beta=\frac{1}{2}$ , and provided $\eta\leqslant\frac{s}{2s+4}$ .

Proof.

From (35) and Lemma 4.6, it suffices to show for $\beta\in(0,1]$ that

\frac{\tau^{-}+\frac{\eta}{s(1-2\eta)+\eta-\frac{(1-\eta)}{n}}}{\tau^{+}}<\beta\left(\frac{1+\tau^{-}+p(1-\eta)/d_{l}^{+}}{1+\tau^{+}+p\eta/d_{l}^{-}}\right).

(38)

For the stated condition on $n$ , it is easy to verify that

	$\displaystyle n\geqslant\frac{2(1-\eta)}{s(1-2\eta)}\implies s(1-2\eta)+\eta-\frac{(1-\eta)}{n}$	$\displaystyle\geqslant\frac{s(1-2\eta)}{2}+\eta,$
	$\displaystyle n\geqslant\frac{2\eta}{(1-l)(1-\eta)}\implies\frac{p\eta}{d_{l}^{-}}\leqslant\frac{2\eta}{n(1-\eta)(1-l)}$	$\displaystyle\leqslant 1.$

Using these bounds in (38), observe that it suffices that $\tau^{+},\tau^{-}$ satisfy

\frac{\tau^{-}+\frac{2\eta}{s(1-2\eta)+2\eta}}{\tau^{+}}<\beta\left(\frac{1+\tau^{-}}{2+\tau^{+}}\right).

(39)

Then for $\beta=1$ , we readily see that (39) is equivalent to (36).

To establish the second part of the Lemma, we begin by rewriting (39) as

\displaystyle(1-\beta)\tau^{+}\tau^{-}+2\tau^{-}+\frac{4\eta}{s(1-2\eta)+2\eta}<\left(\beta-\frac{2\eta}{s(1-2\eta)+2\eta}\right)\tau^{+}=\left[\frac{\beta s(1-2\eta)-2\eta(1-\beta)}{s(1-2\eta)+2\eta}\right]\tau^{+},

(40)

and observe that

\beta s(1-2\eta)\geqslant 4\eta(1-\beta)\iff\beta\geqslant\frac{4\eta}{s(1-2\eta)+4\eta}

(41)

This verifies (37) in the statement of the Lemma. The “moreover” part is established by ensuring that each term on the LHS of (37) is a sufficiently small fraction of the RHS term. In particular, it is enough to choose this fraction to be $1/4$ for the first two terms, and $1/2$ for the third term.

Finally, the third part of the Lemma can be shown in the same manner as the second part. The starting point is to ensure (40), and we simply observe that for $\beta=1/2$ , (41) is equivalent to $\eta\leqslant\frac{s}{2s+4}$ . The rest follows identically. ∎
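The conditions of Lemma 4.8 can be checked mechanically for given $s$ , $\eta$ , $\tau^{+}$ and $\tau^{-}$ . The following minimal sketch evaluates both inequality (37) and the simpler sufficient conditions from the “moreover” part; the function name and the `use_half` switch (selecting $\beta=\frac{1}{2}$ as in part 3) are illustrative assumptions.

```python
def lemma_4_8_check(s, eta, tau_plus, tau_minus, use_half=False):
    """Return (beta, holds_37, holds_sufficient) for Lemma 4.8.

    use_half=False uses beta = 4 eta / (s (1 - 2 eta) + 4 eta)  (part 2);
    use_half=True  uses beta = 1/2 and requires eta <= s / (2 s + 4)  (part 3).
    """
    w = s * (1 - 2 * eta)
    if use_half:
        if eta > s / (2 * s + 4):
            raise ValueError("beta = 1/2 requires eta <= s / (2 s + 4)")
        beta = 0.5
    else:
        beta = 4 * eta / (w + 4 * eta)
    # Common factor (beta / 2) * s(1 - 2 eta) / (s(1 - 2 eta) + 2 eta).
    c = 0.5 * beta * w / (w + 2 * eta)
    # Inequality (37).
    lhs = (1 - beta) * tau_minus * tau_plus + 2 * tau_minus + 4 * eta / (w + 2 * eta)
    holds_37 = lhs < c * tau_plus
    # Sufficient conditions from the "moreover" part of the lemma.
    holds_sufficient = (
        tau_plus > 16 * eta / (beta * w)
        and tau_minus < c * min(1 / (4 * (1 - beta)), tau_plus / 8)
    )
    return beta, holds_37, holds_sufficient


beta, holds_37, holds_sufficient = lemma_4_8_check(
    s=0.3, eta=0.1, tau_plus=12.0, tau_minus=0.1
)
```

With $s=0.3$ and $\eta=0.1$ we get $\beta=0.625$ , the requirement $\tau^{+}>16\eta/(\beta s(1-2\eta))\approx 10.67$ is met by $\tau^{+}=12$ , and both checks succeed for $\tau^{-}=0.1$ .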

4.3 Concentration bound for $\left\lVert T-\overline{T}\right\rVert$

In this section, we bound the “distance” between $T$ and $\overline{T}$ , i.e., $\left\lVert T-\overline{T}\right\rVert$ . We do so by individually bounding the terms $\left\lVert L^{+}_{sym}-\overline{L^{+}_{sym}}\right\rVert$ and $\left\lVert L^{-}_{sym}-\overline{L^{-}_{sym}}\right\rVert$ . To this end, we first recall the following theorem from [CR11].

Theorem 4.9 (Bounding $\left\lVert L_{sym}-\overline{L_{sym}}\right\rVert$ , [CR11]).

Let $L_{sym}$ denote the normalized Laplacian of a random graph, and $\overline{L_{sym}}$ the normalized Laplacian of the expected graph. Let $\delta$ be the minimum expected degree of the graph. Choose $\varepsilon>0$ . Then there exists a constant $c_{\varepsilon}$ such that, if $\delta\geqslant c_{\varepsilon}\ln n$ , then with probability at least $1-\varepsilon$ , it holds true that

\left\lVert L_{sym}-\overline{L_{sym}}\right\rVert\leqslant 2\sqrt{\frac{3\ln(4n/\varepsilon)}{\delta}}\,.

Remark 4.10.

A similar result appears in [Imb09] for the (unsigned) inhomogeneous Erdős-Rényi model, where $\left\lVert L_{sym}-\overline{L_{sym}}\right\rVert=O(\sqrt{\ln n/d_{0}})$ , with $d_{0}$ the smallest expected degree of the graph.

Using Theorem 4.9, we readily obtain the following concentration bounds for $\left\lVert L^{+}_{sym}-\overline{L^{+}_{sym}}\right\rVert$ and $\left\lVert L^{-}_{sym}-\overline{L^{-}_{sym}}\right\rVert$ .

Lemma 4.11 (Bounding $\left\lVert L_{sym}^{\pm}-\overline{L_{sym}^{\pm}}\right\rVert$ ).

Assuming $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}$ , there exists a constant $c_{\varepsilon}>0$ such that if $p\geqslant\frac{c_{\varepsilon}\ln n}{n}\max\left\{\frac{1}{s(1-2\eta)+2\eta},\frac{2}{1-l}\right\}$ , then with probability at least $1-2\varepsilon$ ,

\displaystyle\left\lVert L^{+}_{sym}-\overline{L^{+}_{sym}}\right\rVert\leqslant 2\sqrt{\frac{6\ln(4n/\varepsilon)}{np[s(1-2\eta)+2\eta]}},\qquad\text{and}\qquad\left\lVert L^{-}_{sym}-\overline{L^{-}_{sym}}\right\rVert\leqslant 2\sqrt{\frac{12\ln(4n/\varepsilon)}{np(1-l)}}.

Proof.

Note that the minimum expected degrees of the positive and negative subgraphs are given by $d_{s}^{+},d_{l}^{-}$ , respectively. For the stated condition on $n$ , it is easily seen that

d_{s}^{+}\geqslant\frac{np}{2}\left[s(1-2\eta)+2\eta\right],\quad d_{l}^{-}\geqslant\frac{np}{2}(1-l)(1-\eta)\geqslant\frac{np(1-l)}{4}.

(42)

Invoking Theorem 4.9, and observing that $d_{s}^{+},d_{l}^{-}\geqslant\frac{c_{\varepsilon}}{2}\ln n$ are ensured for the stated condition on $p$ , the statement follows via the union bound. ∎
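To see how the two bounds of Lemma 4.11 behave numerically, one can evaluate them for concrete parameters. The sketch below is illustrative: the function name is hypothetical, and the lower bound on $p$ is not checked because it involves the unspecified constant $c_{\varepsilon}$ .

```python
import math


def laplacian_concentration_bounds(n, p, s, l, eta, eps):
    """Return the bounds of Lemma 4.11 on ||L+_sym - E|| and ||L-_sym - E||.

    The condition on p is NOT checked, since the constant c_eps is unspecified.
    """
    n_min = max(2 * (1 - eta) / (s * (1 - 2 * eta)), 2 * eta / ((1 - l) * (1 - eta)))
    if n < n_min:
        raise ValueError("n is too small for Lemma 4.11")
    log_term = math.log(4 * n / eps)
    pos = 2 * math.sqrt(6 * log_term / (n * p * (s * (1 - 2 * eta) + 2 * eta)))
    neg = 2 * math.sqrt(12 * log_term / (n * p * (1 - l)))
    return pos, neg


pos_bound, neg_bound = laplacian_concentration_bounds(
    n=10000, p=0.5, s=0.3, l=0.4, eta=0.1, eps=0.01
)
```

For these values both bounds are below $1/2$ , and both decay like $\sqrt{\ln n/(np)}$ as the expected degree grows.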

Next, using the above lemma, we can upper bound $\left\lVert T-\overline{T}\right\rVert$ . This will help us show that $V_{k}(T)$ and $V_{k}(\overline{T})$ are “close”.

Lemma 4.12 (Bounding $\left\lVert T-\overline{T}\right\rVert$ ).

Let $P=(L^{-}_{sym}+\tau^{+}I)$ , $\;\overline{P}=(\overline{L^{-}_{sym}}+\tau^{+}I)$ , $\;Q=(L^{+}_{sym}+\tau^{-}I)$ , and $\;\overline{Q}=(\overline{L^{+}_{sym}}+\tau^{-}I)$ . Assume that $\left\lVert P-\overline{P}\right\rVert\leqslant\Delta_{P}$ , and $\;\left\lVert Q-\overline{Q}\right\rVert\leqslant\Delta_{Q}$ . Then it holds true that

\left\lVert T-\overline{T}\right\rVert\leqslant\frac{(\alpha_{s}^{+}+\Delta_{Q})}{\tau^{+}}\left(\frac{\Delta_{P}}{\tau^{+}}+2\sqrt{\frac{\Delta_{P}}{\tau^{+}}}\right)+\frac{\Delta_{Q}}{\tau^{+}}

where $\alpha_{s}^{+}=1+\tau^{-}+\frac{p(1-\eta)}{d_{s}^{+}}$ (see Lemma 4.1).

Proof.

Since $P,\overline{P},Q,\overline{Q}$ are positive definite, Proposition C.2 yields the bound

\displaystyle\left\lVert T-\overline{T}\right\rVert\leqslant\left\lVert P^{-1}\right\rVert\left\lVert Q\right\rVert\left(\left\lVert(\overline{P})^{-1}\right\rVert\left\lVert\overline{P}-P\right\rVert+2\left\lVert({\overline{P}})^{-1/2}\right\rVert\left\lVert\overline{P}-P\right\rVert^{1/2}\right)+\left\lVert(\overline{P})^{-1}\right\rVert\left\lVert Q-\overline{Q}\right\rVert.

(43)

We know that $\left\lVert P^{-1}\right\rVert=1/\tau^{+}=\left\lVert\overline{P}^{-1}\right\rVert$ and $\left\lVert(\overline{P})^{-1/2}\right\rVert=1/\sqrt{\tau^{+}}$ . Moreover, $\left\lVert Q\right\rVert\leqslant\left\lVert\overline{Q}\right\rVert+\Delta_{Q}$ by Weyl’s inequality [Wey12] (see Appendix B). Hence (43) simplifies to

\displaystyle\left\lVert T-\overline{T}\right\rVert\leqslant\frac{(\left\lVert\overline{Q}\right\rVert+\Delta_{Q})}{\tau^{+}}\left(\frac{\Delta_{P}}{\tau^{+}}+2\sqrt{\frac{\Delta_{P}}{\tau^{+}}}\right)+\frac{\Delta_{Q}}{\tau^{+}}\leqslant\frac{(\alpha_{s}^{+}+\Delta_{Q})}{\tau^{+}}\left(\frac{\Delta_{P}}{\tau^{+}}+2\sqrt{\frac{\Delta_{P}}{\tau^{+}}}\right)+\frac{\Delta_{Q}}{\tau^{+}}\,,

where the last inequality can be verified by examining the expression of $\overline{Q}$ in (24), and noting from the definition of $C^{+}$ that $\left\lVert C^{+}\right\rVert<\max\left\{\alpha_{1}^{+},...,\alpha_{k}^{+}\right\}=\alpha_{s}^{+}$ holds (via Weyl’s inequality). ∎
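The bound of Lemma 4.12 is an explicit function of $\Delta_{P}$ , $\Delta_{Q}$ , $\tau^{+}$ and $\alpha_{s}^{+}$ . A minimal sketch of its evaluation, with a hypothetical function name and illustrative inputs:

```python
import math


def t_perturbation_bound(delta_p, delta_q, tau_plus, alpha_s_plus):
    """Upper bound on ||T - T_bar|| from Lemma 4.12."""
    ratio = delta_p / tau_plus
    return (
        (alpha_s_plus + delta_q) / tau_plus * (ratio + 2 * math.sqrt(ratio))
        + delta_q / tau_plus
    )


# Example: Delta_P = 0.04, Delta_Q = 0.1, tau^+ = 4, alpha_s^+ = 1.5.
bound = t_perturbation_bound(delta_p=0.04, delta_q=0.1, tau_plus=4.0, alpha_s_plus=1.5)
```

Here $\Delta_{P}/\tau^{+}=0.01$ , so the bound equals $0.4\cdot(0.01+0.2)+0.025=0.109$ . The term $2\sqrt{\Delta_{P}/\tau^{+}}$ dominates for small perturbations.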

4.4 Estimating $V_{k}(\overline{T})$ and $G_{k}(\overline{T})$ up to a rotation

We are now ready to combine the results of the previous sections to show that if $n,p$ are large enough, then the distance between the subspaces spanned by $V_{k}(T)$ and $V_{k}(\overline{T})$ is small, i.e., there exists an orthonormal matrix $O$ such that $V_{k}(T)$ is close to $V_{k}(\overline{T})O$ . For $\tau^{+},\tau^{-}$ chosen suitably, we have seen in Lemma 4.8 that $V_{k}(\overline{T})=\Theta R$ for a rotation $R$ , hence this suggests that the rows of $V_{k}(T)$ will then also approximately preserve the clustering structure of $V_{k}(\overline{T})$ .

With $P,\overline{P},Q,\overline{Q}$ as defined in Lemma 4.12, recall from (4), (7) that $G_{k}(T),G_{k}(\overline{T})$ can be written as

G_{k}(\overline{T})=\overline{P}^{-1/2}V_{k}(\overline{T}),\quad G_{k}(T)=P^{-1/2}V_{k}(T).

(44)

Therefore if $V_{k}(\overline{T})=\Theta R$ , then using the expression for $\overline{P}$ from (24) we see that $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R$ , and thus the rows of $G_{k}(\overline{T})$ also preserve the ground truth clustering structure. Moreover, if $\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert$ is small, then it can be shown to imply a bound on $\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert$ . Hence the rows of $G_{k}(T)$ will approximately preserve the clustering structure of $G_{k}(\overline{T})$ .

Before stating the theorem, let us define the terms

C_{1}(\tau^{+},\tau^{-})=3\left(\frac{(3+\tau^{-})(2\sqrt{\tau^{+}}+1)+\tau^{+}}{(\tau^{+})^{2}}\right),\quad C_{2}(s,\eta,l)=\max\left\{\frac{1}{s(1-2\eta)+2\eta},\frac{2}{1-l}\right\}.

(45)
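For concreteness, the constants in (45) can be evaluated numerically. The following is a minimal sketch; the function names `c1`, `c2` and `explicit_p_terms` are ours and purely illustrative. The last helper evaluates only the two explicit terms of the condition on $p$ in Theorem 4.13, and omits the term involving the unspecified constant $\widetilde{c}_{\varepsilon}$.

```python
import math


def c1(tau_plus: float, tau_minus: float) -> float:
    """C_1(tau^+, tau^-) as defined in (45)."""
    numerator = (3.0 + tau_minus) * (2.0 * math.sqrt(tau_plus) + 1.0) + tau_plus
    return 3.0 * numerator / tau_plus**2


def c2(s: float, eta: float, l: float) -> float:
    """C_2(s, eta, l) as defined in (45)."""
    return max(1.0 / (s * (1.0 - 2.0 * eta) + 2.0 * eta), 2.0 / (1.0 - l))


def explicit_p_terms(n, eps, delta, tau_plus, tau_minus, beta, s, eta, l):
    """Explicit part of the lower bound on p in Theorem 4.13.

    Hypothetical helper: the term with the constant c~_eps is left out,
    since its value is not specified in the text.
    """
    log_factor = math.log(4.0 * n / eps) / n
    gap_term = (
        256.0 * c1(tau_plus, tau_minus) ** 4 * (2.0 + tau_plus) ** 4
        / (delta**4 * (1.0 + tau_minus) ** 4 * (1.0 - beta) ** 4)
    ) * c2(s, eta, l)
    sqrt_term = 81.0 / ((1.0 - l) * delta**4)
    return max(gap_term, sqrt_term) * log_factor
```

Both explicit terms scale as $\delta^{-4}$, so halving $\delta$ multiplies this part of the requirement on $p$ by $16$.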

Theorem 4.13.

Assuming $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}$ , suppose $\tau^{+}>0,\tau^{-}\geqslant 0$ are chosen to satisfy

\tau^{+}>\frac{16\eta}{\beta s(1-2\eta)},\quad\tau^{-}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^{+}}{8}\right\}

where $\beta,\eta$ satisfy one of the following conditions.

1.

$\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and $0<\eta<\frac{1}{2}$ , or
2.

$\beta=\frac{1}{2}$ and $\eta\leqslant\frac{s}{2s+4}$ .

Then $V_{k}(\overline{T})=\Theta R$ and $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R$ where $R$ is a rotation matrix, and $C^{-}\succ 0$ is as defined in (18). Moreover, for any $\varepsilon,\delta\in(0,1)$ , there exists a constant $\widetilde{c}_{\varepsilon}>0$ such that the following is true. If $p$ satisfies

p\geqslant\max\left\{\widetilde{c}_{\varepsilon}C_{2}(s,\eta,l),\frac{256C_{1}^{4}(\tau^{+},\tau^{-})(2+\tau^{+})^{4}}{\delta^{4}(1+\tau^{-})^{4}(1-\beta)^{4}}C_{2}(s,\eta,l),\frac{81}{(1-l)\delta^{4}}\right\}\frac{\ln(4n/\varepsilon)}{n}

with $C_{1}(\cdot),C_{2}(\cdot)$ as in (45), then with probability at least $1-2\varepsilon$ , there exists an orthogonal matrix $O\in\varmathbb R^{k\times k}$ such that

\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert\leqslant\delta,\qquad\mbox{and}\qquad\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\frac{\delta}{(\tau^{+})^{2}}.

Proof.

We will first simplify the upper bound on $\left\lVert T-\overline{T}\right\rVert$ in Lemma 4.12, starting by bounding $\alpha_{s}^{+}$ . If $n\geqslant\frac{2(1-\eta)}{s(1-2\eta)}$ , it is easy to verify that $\frac{(1-\eta)p}{d_{s}^{+}}\leqslant 1$ which implies $\alpha_{s}^{+}\leqslant 2+\tau^{-}$ . Moreover, we observe from Lemma 4.11 that $\Delta_{P},\Delta_{Q}\leqslant 1$ is ensured if $p\geqslant\widetilde{c}_{\varepsilon}C_{2}(s,\eta,l)\frac{\ln(4n/\varepsilon)}{n}$ where $\widetilde{c}_{\varepsilon}=\max\left\{24,c_{\varepsilon}\right\}$ . These considerations altogether imply

	$\displaystyle\left\lVert T-\overline{T}\right\rVert\leqslant\frac{(3+\tau^{-})(2\sqrt{\tau^{+}}+1)}{(\tau^{+})^{2}}\sqrt{\Delta_{P}}+\frac{\Delta_{Q}}{\tau^{+}}$	$\displaystyle\leqslant\frac{(3+\tau^{-})(2\sqrt{\tau^{+}}+1)+\tau^{+}}{(\tau^{+})^{2}}\max\left\{\sqrt{\Delta_{P}},\sqrt{\Delta_{Q}}\right\}$
		$\displaystyle\leqslant C_{1}(\tau^{+},\tau^{-})C_{2}^{1/4}(s,\eta,l)\left(\frac{\ln(4n/\varepsilon)}{np}\right)^{1/4}$		(46)

where in the penultimate inequality we used $\Delta_{Q}\leqslant\sqrt{\Delta_{Q}}$ , and the last inequality uses Lemma 4.11.

Next, we will use the Davis-Kahan theorem [DK70] (see Appendix B) for bounding the distance $\left\lVert(I-V_{k}(\overline{T})V_{k}(\overline{T})^{T})V_{k}(T)\right\rVert$ . Applied to our setup, it yields

\left\lVert(I-V_{k}(\overline{T})V_{k}(\overline{T})^{T})V_{k}(T)\right\rVert\leqslant\frac{\left\lVert T-\overline{T}\right\rVert}{\lambda_{n-k+1}(T)-\lambda_{n-k}(\overline{T})},

(47)

provided $\lambda_{n-k+1}(T)-\lambda_{n-k}(\overline{T})>0$ . From Weyl’s inequality, we know that $\lambda_{n-k+1}(T)\geqslant\lambda_{n-k+1}(\overline{T})-\left\lVert T-\overline{T}\right\rVert$ . Moreover, under the stated conditions on $\tau^{+},\tau^{-}$ , we obtain from Lemma 4.8 the bound

\lambda_{n-k+1}(\overline{T})-\lambda_{n-k}(\overline{T})\geqslant(1-\beta)\frac{\alpha_{l}^{+}}{\alpha_{l}^{-}}\geqslant(1-\beta)\left(\frac{1+\tau^{-}}{2+\tau^{+}}\right),

where in the last inequality we used the simplifications $p(1-\eta)/d_{l}^{+}\geqslant 0$ and $p\eta/d_{l}^{-}\leqslant 1$ in the expressions for $\alpha_{l}^{+},\alpha_{l}^{-}$ . Hence using (46), we observe that if

C_{1}(\tau^{+},\tau^{-})C_{2}^{1/4}(s,\eta,l)\left(\frac{\ln(4n/\varepsilon)}{np}\right)^{1/4}\leqslant\left(\frac{1-\beta}{2}\right)\left(\frac{1+\tau^{-}}{2+\tau^{+}}\right)\iff p\geqslant\left(\frac{16C_{1}^{4}(\tau^{+},\tau^{-})C_{2}(s,\eta,l)(2+\tau^{+})^{4}}{(1+\tau^{-})^{4}(1-\beta)^{4}}\right)\frac{\ln(4n/\varepsilon)}{n},

then the RHS of (47) can be bounded as

\displaystyle\left\lVert(I-V_{k}(\overline{T})V_{k}(\overline{T})^{T})V_{k}(T)\right\rVert\leqslant\frac{2(2+\tau^{+})}{(1+\tau^{-})(1-\beta)}C_{1}(\tau^{+},\tau^{-})C_{2}^{1/4}(s,\eta,l)\left(\frac{\ln(4n/\varepsilon)}{np}\right)^{1/4}.

It follows that there exists an orthogonal matrix $O\in\varmathbb R^{k\times k}$ so that

	$\displaystyle\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert$	$\displaystyle\leqslant 2\left\lVert(I-V_{k}(\overline{T})V_{k}(\overline{T})^{T})V_{k}(T)\right\rVert\quad(\text{using Proposition~\ref{prop:orth_basis_align}})$
		$\displaystyle\leqslant\frac{4(2+\tau^{+})}{(1+\tau^{-})(1-\beta)}C_{1}(\tau^{+},\tau^{-})C_{2}^{1/4}(s,\eta,l)\left(\frac{\ln(4n/\varepsilon)}{np}\right)^{1/4}$
		$\displaystyle\leqslant\delta$

for the stated bound on $p$ . This establishes the first part of the Theorem.
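The alignment step can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the construction used in the cited proposition: it takes $O$ to be the orthogonal Procrustes solution obtained from the SVD of $V_{k}(\overline{T})^{T}V_{k}(T)$, and uses random orthonormal bases `V_bar`, `V` in place of the actual eigenvector matrices. For this choice of $O$, the factor-2 inequality used above can be checked directly.

```python
import numpy as np


def align_bases(V: np.ndarray, V_bar: np.ndarray) -> np.ndarray:
    """Orthogonal O minimizing ||V - V_bar @ O||_F (orthogonal Procrustes)."""
    U, _, Wt = np.linalg.svd(V_bar.T @ V)
    return U @ Wt


rng = np.random.default_rng(0)
n, k = 30, 3

# Synthetic stand-ins for V_k(T_bar) and V_k(T): orthonormal n x k matrices.
V_bar, _ = np.linalg.qr(rng.standard_normal((n, k)))
V, _ = np.linalg.qr(V_bar + 0.1 * rng.standard_normal((n, k)))

O = align_bases(V, V_bar)

# Spectral norms of the projection residual and of the aligned difference.
sin_theta = np.linalg.norm((np.eye(n) - V_bar @ V_bar.T) @ V, 2)
aligned_err = np.linalg.norm(V - V_bar @ O, 2)
```

Here `sin_theta` plays the role of the left-hand side of (47), and `aligned_err` that of $\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert$.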

In order to bound $\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert$ , we obtain from (44) that

$\displaystyle\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert$	$\displaystyle=\left\lVert P^{-1/2}(V_{k}(T)-V_{k}(\overline{T})O)+(P^{-1/2}-\overline{P}^{-1/2})V_{k}(\overline{T})O\right\rVert$
	$\displaystyle\leqslant\underbrace{\left\lVert P^{-1/2}\right\rVert}_{(\tau^{+})^{-1/2}}\underbrace{\left\lVert V_{k}(T)-V_{k}(\overline{T})O\right\rVert}_{\leqslant\delta}+\left\lVert P^{-1/2}-\overline{P}^{-1/2}\right\rVert\underbrace{\left\lVert V_{k}(\overline{T})\right\rVert}_{=1}$
	$\displaystyle\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\left\lVert P^{-1/2}-\overline{P}^{-1/2}\right\rVert.$	(48)

The term $\left\lVert P^{-1/2}-\overline{P}^{-1/2}\right\rVert$ can be bounded as

\displaystyle\left\lVert P^{-1/2}-\overline{P}^{-1/2}\right\rVert=\left\lVert P^{-1}(P^{1/2}-\overline{P}^{1/2})\overline{P}^{-1}\right\rVert\leqslant\frac{\left\lVert P^{1/2}-\overline{P}^{1/2}\right\rVert}{(\tau^{+})^{2}}\leqslant\frac{\left\lVert P-\overline{P}\right\rVert^{1/2}}{(\tau^{+})^{2}}\leqslant\frac{3}{(\tau^{+})^{2}}\left[\frac{\ln(4n/\varepsilon)}{np(1-l)}\right]^{1/4},

(49)

where the penultimate inequality uses Proposition C.1, and the last inequality follows from Lemma 4.11 with a minor simplification of the constant. Plugging (49) in (48) leads to the stated bound for $p\geqslant\frac{81}{(1-l)\delta^{4}}\frac{\ln(4n/\varepsilon)}{n}$ . ∎
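The penultimate inequality in (49), namely $\left\lVert P^{1/2}-\overline{P}^{1/2}\right\rVert\leqslant\left\lVert P-\overline{P}\right\rVert^{1/2}$ for positive semidefinite matrices, can be sanity-checked numerically. The following sketch uses synthetic matrices `P_bar` and `P` (not the matrices of Lemma 4.12) and a hypothetical helper `psd_sqrt`; it is only meant to illustrate the inequality.

```python
import numpy as np


def psd_sqrt(M: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semidefinite matrix via eigh."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T


rng = np.random.default_rng(1)
n = 20

# Synthetic positive definite matrix with smallest eigenvalue at least 0.5.
B = rng.standard_normal((n, n))
P_bar = B @ B.T / n + 0.5 * np.eye(n)

# Symmetric perturbation rescaled to spectral norm 0.2, so P stays positive definite.
E = rng.standard_normal((n, n))
E = (E + E.T) / 2.0
E *= 0.2 / np.linalg.norm(E, 2)
P = P_bar + E

lhs = np.linalg.norm(psd_sqrt(P) - psd_sqrt(P_bar), 2)
rhs = np.sqrt(np.linalg.norm(P - P_bar, 2))
```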

4.5 Clustering sparse graphs

We now turn our attention to the sparse regime where $p=o(\ln n)/n$ . In this regime, Lemma 4.11 is no longer applicable since it requires $p=\Omega\left(\frac{\ln n}{n}\right)$ . In fact, it is not difficult to see that the matrices $L^{\pm}_{sym}$ will not concentrate around $\overline{L^{\pm}_{sym}}$ in this sparsity regime. To circumvent this issue, we will show that the normalized Laplacians $L^{\pm}_{sym,\gamma^{\pm}}$ corresponding to the regularized adjacency matrices $A_{\gamma^{\pm}}^{\pm}:=A^{\pm}+\frac{\gamma^{\pm}}{n}\mathds{1}\mathds{1}^{\top}$ concentrate around $\overline{L^{\pm}_{sym}}$ , for carefully chosen values of $\gamma^{+},\gamma^{-}$ .

To show this, we rely on the following theorem from [LLV17], which states that the symmetric Laplacian $L_{sym,\gamma}$ of the regularized adjacency matrix $A_{\gamma}:=A+\frac{\gamma}{n}\mathds{1}\mathds{1}^{\top}$ is close to the symmetric Laplacian $\overline{L_{sym,\gamma}}$ of the expected regularized adjacency matrix, for inhomogeneous Erdős-Rényi graphs.

Theorem 4.14 (Theorem 4.1 of [LLV17]).

Consider a random graph from the inhomogeneous Erdős-Rényi model ( $G=(n,p_{ij})$ ), and let $d=\max_{i,j}np_{ij}$ . Choose a number $\gamma>0$ . Then there exists an absolute constant $C$ such that, for any $r\geqslant 1$ , with probability at least $1-e^{-r}$ ,

\left\lVert L_{sym,\gamma}-\overline{L_{sym,\gamma}}\right\rVert\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{d}{\gamma}\right)^{5/2}\,.

(50)
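The regularized Laplacian $L_{sym,\gamma}$ appearing in Theorem 4.14 can be formed directly from its definition. The sketch below assumes the convention $L_{sym,\gamma}=I-D_{\gamma}^{-1/2}A_{\gamma}D_{\gamma}^{-1/2}$ with $D_{\gamma}$ the diagonal degree matrix of $A_{\gamma}$; the function name is ours. It uses dense arrays for readability, which would not be the choice for large sparse graphs.

```python
import numpy as np


def regularized_sym_laplacian(A: np.ndarray, gamma: float) -> np.ndarray:
    """I - D_gamma^{-1/2} A_gamma D_gamma^{-1/2}, with A_gamma = A + (gamma/n) 11^T.

    Assumes A is symmetric with nonnegative entries and that every
    regularized degree is positive (automatic when gamma > 0).
    """
    n = A.shape[0]
    A_gamma = A + (gamma / n) * np.ones((n, n))
    deg = A_gamma.sum(axis=1)  # equals A.sum(axis=1) + gamma
    inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(n) - inv_sqrt[:, None] * A_gamma * inv_sqrt[None, :]


# A very sparse graph: a single edge and two isolated vertices.
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 1.0
L_reg = regularized_sym_laplacian(A, gamma=1.0)
```

Without regularization the isolated vertices have degree zero and the normalized Laplacian is undefined; with $\gamma>0$ every regularized degree is at least $\gamma$.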

The above result leads to a bound on the distance between $L_{sym,\gamma}$ and the normalized Laplacian $\overline{L_{sym}}$ of the expected (un-regularized) adjacency matrix.

Theorem 4.15 (Concentration of Regularized Laplacians).

Consider a random graph from the inhomogeneous Erdős-Rényi model ( $G=(n,p_{ij})$ ), and let $d=\max_{i,j}np_{ij}$ , $d_{\min}=\min_{i}\sum_{j}p_{ij}$ . Choose a number $\gamma>0$ . Then there exists an absolute constant $C$ such that, for any $r\geqslant 1$ , with probability at least $1-e^{-r}$ ,

\left\lVert L_{sym,\gamma}-\overline{L_{sym}}\right\rVert\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{d}{\gamma}\right)^{5/2}+3\sqrt{\frac{\gamma}{d_{\min}+\gamma}}\,.

(51)

Proof.

To establish the above theorem, we use the triangle inequality $\left\lVert L_{sym,\gamma}-\overline{L_{sym}}\right\rVert\leqslant\left\lVert L_{sym,\gamma}-\overline{L_{sym,\gamma}}\right\rVert+\left\lVert\overline{L_{sym,\gamma}}-\overline{L_{sym}}\right\rVert$ . The first term on the RHS is bounded by Theorem 4.14 (which holds with probability at least $1-e^{-r}$ ). To bound the second term on the RHS, note that

	$\displaystyle\left\lVert\overline{L_{sym,\gamma}}-\overline{L_{sym}}\right\rVert$	$\displaystyle=\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}_{\gamma}\overline{D}_{\gamma}^{-1/2}\right\rVert$
		$\displaystyle=\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}+\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}_{\gamma}\overline{D}_{\gamma}^{-1/2}\right\rVert$
		$\displaystyle\leqslant\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}\right\rVert+\left\lVert\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}_{\gamma}\overline{D}_{\gamma}^{-1/2}\right\rVert\,.$

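The second term on the right-hand side is easily bounded. Since $\overline{A}_{\gamma}-\overline{A}=\frac{\gamma}{n}\mathds{1}\mathds{1}^{\top}$ has spectral norm $\gamma$ , and every diagonal entry of $\overline{D}_{\gamma}$ is at least $d_{\min}+\gamma$ , we obtain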

\left\lVert\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}_{\gamma}\overline{D}_{\gamma}^{-1/2}\right\rVert\leqslant\left\lVert\overline{D}_{\gamma}^{-1/2}\right\rVert^{2}\left\lVert\overline{A}-\overline{A}_{\gamma}\right\rVert\leqslant\frac{\gamma}{d_{\min}+\gamma}\leqslant\sqrt{\frac{\gamma}{d_{\min}+\gamma}}\,.

To analyse the first term, we observe that

	$\displaystyle\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{A}\overline{D}_{\gamma}^{-1/2}\right\rVert$	$\displaystyle=\left\lVert\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}-\overline{D}_{\gamma}^{-1/2}\overline{D}^{1/2}\overline{D}^{-1/2}\overline{A}\overline{D}^{-1/2}\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2}\right\rVert$
		$\displaystyle=\left\lVert(I-\overline{L_{sym}})(I-\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2})+(I-\overline{D}_{\gamma}^{-1/2}\overline{D}^{1/2})(I-\overline{L_{sym}})\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2}\right\rVert$
		$\displaystyle\leqslant\left\lVert I-\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2}\right\rVert+\left\lVert I-\overline{D}_{\gamma}^{-1/2}\overline{D}^{1/2}\right\rVert\left\lVert\overline{D}^{1/2}\overline{D}_{\gamma}^{-1/2}\right\rVert$
		$\displaystyle\leqslant\left(1-\sqrt{\frac{d_{\min}}{d_{\min}+\gamma}}\right)+\left(1-\sqrt{\frac{d_{\min}}{d_{\min}+\gamma}}\right)$
		$\displaystyle\leqslant 2\sqrt{\frac{\gamma}{d_{\min}+\gamma}}\,,$

where in the first inequality we use the fact that $\left\lVert I-\overline{L_{sym}}\right\rVert\leqslant 1$ , and in the last inequality we use the fact that for two numbers $a,b>0$ if $a>b$ then $\sqrt{a}-\sqrt{b}\leqslant\sqrt{a-b}$ . We have all the components to plug into the triangle inequality, which yields the desired statement of the theorem. ∎
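The deterministic part of the argument, namely the bound $\left\lVert\overline{L_{sym,\gamma}}-\overline{L_{sym}}\right\rVert\leqslant 3\sqrt{\gamma/(d_{\min}+\gamma)}$ , can be checked numerically on a small example. The sketch below uses a hypothetical two-block matrix of edge probabilities `P` (sizes and probabilities chosen arbitrarily) as the expected adjacency matrix; the helper names are ours.

```python
import numpy as np


def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """D^{-1/2} A D^{-1/2} for a symmetric nonnegative matrix A."""
    scale = 1.0 / np.sqrt(A.sum(axis=1))
    return scale[:, None] * A * scale[None, :]


def regularization_bias_and_bound(P: np.ndarray, gamma: float):
    """Return (||L_bar_{sym,gamma} - L_bar_sym||, 3 * sqrt(gamma / (d_min + gamma)))."""
    n = P.shape[0]
    P_gamma = P + gamma / n  # adds (gamma/n) 11^T
    # The identity terms of the two Laplacians cancel in the difference.
    bias = np.linalg.norm(normalized_adjacency(P) - normalized_adjacency(P_gamma), 2)
    d_min = P.sum(axis=1).min()
    return bias, 3.0 * np.sqrt(gamma / (d_min + gamma))


# Illustrative expected adjacency: two blocks of sizes 6 and 10.
labels = np.repeat([0, 1], [6, 10])
P = np.where(labels[:, None] == labels[None, :], 0.30, 0.05)
np.fill_diagonal(P, 0.0)

results = {g: regularization_bias_and_bound(P, g) for g in (0.1, 1.0, 10.0)}
```

The bias term grows with $\gamma$ while the random term in (51) shrinks, which is the trade-off that fixes the choice of $\gamma$ in Lemma 4.16.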

We now translate Theorem 4.15 to our setting for $G^{+},G^{-}$ and show that if $p=\Omega(1/n)$ for $n$ large enough, then for the choices $\gamma^{+},\gamma^{-}\asymp(np)^{6/7}$ , the bounds $\left\lVert L^{\pm}_{sym,\gamma^{\pm}}-\overline{L^{\pm}_{sym}}\right\rVert=O\left(\frac{1}{(np)^{1/14}}\right)$ hold with sufficiently high probability.

Lemma 4.16.

Let $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-\eta)(1-l)}\right\}$ and $p\geqslant\frac{1}{n(1-\eta)}$ . Then for the choices $\gamma^{+},\gamma^{-}=[np(1-\eta)]^{6/7}$ , and any $r\geqslant 1$ , there exists a constant $C>0$ such that with probability at least $1-2e^{-r}$ , it holds true that

	$\displaystyle\left\lVert L^{+}_{sym,\gamma^{+}}-\overline{L^{+}_{sym}}\right\rVert$	$\displaystyle\leqslant\left(2^{5/2}Cr^{2}+\frac{3\sqrt{2}}{\sqrt{s(1-2\eta)+2\eta}}\right)\frac{1}{[np(1-\eta)]^{1/14}},$		(52)
	$\displaystyle\left\lVert L^{-}_{sym,\gamma^{-}}-\overline{L^{-}_{sym}}\right\rVert$	$\displaystyle\leqslant\left(2^{5/2}Cr^{2}+\frac{6}{\sqrt{1-l}}\right)\frac{1}{[np(1-\eta)]^{1/14}}.$		(53)

Proof.

We will apply Theorem 4.15 to the subgraphs $G^{+},G^{-}$ . Let $d^{\pm}$ denote the quantity $\max_{i,j}np_{ij}$ , and $d_{\min}^{\pm}$ the minimum expected degree, for the positive and negative subgraphs, respectively. From the SSBM, it can be verified that $d^{\pm}=np(1-\eta)$ . We also know that $d_{\min}^{+}=d_{s}^{+}$ and $d_{\min}^{-}=d_{l}^{-}$ , where for the stated condition on $n$ , $d_{s}^{+},d_{l}^{-}$ satisfy the bounds in (42). The latter can be written as

d_{\min}^{+}\geqslant\frac{d^{+}}{2}[s(1-2\eta)+2\eta],\qquad d_{\min}^{-}\geqslant\frac{d^{-}(1-l)}{4}.

Let us denote $C_{3}(s,\eta)=s(1-2\eta)+2\eta$ for convenience. In order to show (52), we obtain from Theorem 4.15 that, with probability at least $1-e^{-r}$ ,

\displaystyle\left\lVert L_{sym,\gamma^{+}}^{+}-\overline{L^{+}_{sym}}\right\rVert\leqslant\frac{Cr^{2}}{\sqrt{\gamma^{+}}}\left(1+\frac{d^{+}}{\gamma^{+}}\right)^{5/2}+3\sqrt{\frac{\gamma^{+}}{d_{\min}^{+}+\gamma^{+}}}\leqslant\frac{Cr^{2}}{\sqrt{\gamma^{+}}}\left(1+\frac{d^{+}}{\gamma^{+}}\right)^{5/2}+3\sqrt{\frac{2\gamma^{+}}{C_{3}(s,\eta)d^{+}}},

where the last inequality uses $d_{\min}^{+}+\gamma^{+}\geqslant d_{\min}^{+}\geqslant\frac{d^{+}}{2}C_{3}(s,\eta)$ . Now note that if $\gamma^{+}\leqslant d^{+}$ , then $1+\frac{d^{+}}{\gamma^{+}}\leqslant\frac{2d^{+}}{\gamma^{+}}$ and the above bound simplifies to

\left\lVert L_{sym,\gamma^{+}}^{+}-\overline{L^{+}_{sym}}\right\rVert\leqslant\frac{2^{5/2}Cr^{2}(d^{+})^{5/2}}{(\gamma^{+})^{3}}+\frac{3\sqrt{2}}{\sqrt{C_{3}(s,\eta)}}\sqrt{\frac{\gamma^{+}}{d^{+}}}.

(54)

Choosing $\gamma^{+}$ such that $\frac{(d^{+})^{5/2}}{(\gamma^{+})^{3}}=\sqrt{\frac{\gamma^{+}}{d^{+}}}$ , or equivalently, $\gamma^{+}=(d^{+})^{6/7}$ , and plugging this in (54), we arrive at (52). Clearly, $\gamma^{+}\leqslant d^{+}$ is equivalent to the stated condition on $p$ . The bound in (53) follows in an identical manner and is omitted. ∎
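The exponent arithmetic behind the choice $\gamma^{+}=(d^{+})^{6/7}$ can be verified with exact rational numbers. In the sketch below we write $\gamma^{+}=(d^{+})^{a}$ and compare the exponents of $d^{+}$ in the two terms of (54); the variable names are ours.

```python
from fractions import Fraction

# Balance (d^+)^{5/2} / (gamma^+)^3 against sqrt(gamma^+ / d^+) with gamma^+ = (d^+)^a:
#   5/2 - 3a = (a - 1)/2   <=>   a = (5/2 + 1/2) / (3 + 1/2).
a = (Fraction(5, 2) + Fraction(1, 2)) / (3 + Fraction(1, 2))

first_term_exp = Fraction(5, 2) - 3 * a  # exponent of d^+ in (d^+)^{5/2} / (gamma^+)^3
second_term_exp = (a - 1) / 2            # exponent of d^+ in sqrt(gamma^+ / d^+)

print(a, first_term_exp, second_term_exp)
```

Both terms therefore decay as $(d^{+})^{-1/14}$ , which is the rate appearing in (52) and (53).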

We are now in a position to write the bound on $\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert$ in terms of $\left\lVert L^{\pm}_{sym,\gamma^{\pm}}-\overline{L^{\pm}_{sym}}\right\rVert$ , in a completely analogous manner to Lemma 4.12.

Lemma 4.17 (Adapting Lemma 4.12 for the sparse regime).

Let $P_{\gamma^{-}}=(L^{-}_{sym,\gamma^{-}}+\tau^{+}I)$ , $\;\overline{P}=(\overline{L^{-}_{sym}}+\tau^{+}I)$ , $\;Q_{\gamma^{+}}=(L^{+}_{sym,\gamma^{+}}+\tau^{-}I)$ , and $\;\overline{Q}=(\overline{L^{+}_{sym}}+\tau^{-}I)$ . Assume that $\left\lVert P_{\gamma^{-}}-\overline{P}\right\rVert\leqslant\Delta_{P_{\gamma^{-}}}$ , $\left\lVert Q_{\gamma^{+}}-\overline{Q}\right\rVert\leqslant\Delta_{Q_{\gamma^{+}}}$ . Then it holds true that

\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert\leqslant\frac{(\alpha_{s}^{+}+\Delta_{Q_{\gamma^{+}}})}{\tau^{+}}\left(\frac{\Delta_{P_{\gamma^{-}}}}{\tau^{+}}+2\sqrt{\frac{\Delta_{P_{\gamma^{-}}}}{\tau^{+}}}\right)+\frac{\Delta_{Q_{\gamma^{+}}}}{\tau^{+}},

where $\alpha_{s}^{+}=1+\tau^{-}+\frac{p(1-\eta)}{d_{s}^{+}}$ (see Lemma 4.1).

Next, we derive the main theorem for SPONGE_sym in the sparse regime, which is the analogue of Theorem 4.13. The first part of the Theorem remains unchanged, i.e., for $n$ large enough and $\tau^{+},\tau^{-}$ chosen suitably, we have $V_{k}(\overline{T})=\Theta R$ and $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R$ for a $k\times k$ rotation $R$ , and $C^{-}\succ 0$ . The remaining arguments follow the same outline as the proof of Theorem 4.13, i.e., (a) using Lemma 4.17 and Lemma 4.16 to obtain a concentration bound on $\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert$ (when $p=\Omega(1/n)$ ), and (b) using the Davis-Kahan theorem to show that the column span of $V_{k}(T_{\gamma^{+},\gamma^{-}})$ is close to that of $V_{k}(\overline{T})$ . The latter bound then implies that $G_{k}(T_{\gamma^{+},\gamma^{-}})$ is close (up to a rotation) to $G_{k}(\overline{T})$ , where we recall

G_{k}(\overline{T})=\overline{P}^{-1/2}V_{k}(\overline{T}),\quad G_{k}(T_{\gamma^{+},\gamma^{-}})=P_{\gamma^{-}}^{-1/2}V_{k}(T_{\gamma^{+},\gamma^{-}})

(55)

with $P_{\gamma^{-}},\overline{P}$ as defined in Lemma 4.17.
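The quantities in (55) can be assembled as in the following minimal sketch. It rests on assumptions that are not restated in this section: we take $T_{\gamma^{+},\gamma^{-}}=P_{\gamma^{-}}^{-1/2}Q_{\gamma^{+}}P_{\gamma^{-}}^{-1/2}$ , and $V_{k}(\cdot)$ to consist of the eigenvectors of the $k$ smallest eigenvalues. The function names are ours, and dense linear algebra is used only for readability.

```python
import numpy as np


def sym_laplacian(A: np.ndarray, gamma: float = 0.0) -> np.ndarray:
    """Normalized Laplacian of A + (gamma/n) 11^T (degrees assumed positive)."""
    n = A.shape[0]
    A_g = A + gamma / n
    scale = 1.0 / np.sqrt(A_g.sum(axis=1))
    return np.eye(n) - scale[:, None] * A_g * scale[None, :]


def sponge_sym_embedding(A_pos, A_neg, k, tau_plus, tau_minus,
                         gamma_plus=0.0, gamma_minus=0.0):
    """Return G_k = P^{-1/2} V_k(T) under the assumptions stated in the lead-in."""
    n = A_pos.shape[0]
    P = sym_laplacian(A_neg, gamma_minus) + tau_plus * np.eye(n)
    Q = sym_laplacian(A_pos, gamma_plus) + tau_minus * np.eye(n)

    # P is positive definite for tau_plus > 0, so P^{-1/2} is well defined.
    w, U = np.linalg.eigh(P)
    P_inv_sqrt = (U / np.sqrt(w)) @ U.T

    T = P_inv_sqrt @ Q @ P_inv_sqrt
    T = (T + T.T) / 2.0          # remove round-off asymmetry
    _, V = np.linalg.eigh(T)     # eigenvalues in ascending order
    V_k = V[:, :k]               # eigenvectors of the k smallest eigenvalues
    return P_inv_sqrt @ V_k
```

On a noiseless complete signed graph with two equal clusters (all within-cluster edges positive, all across-cluster edges negative), the rows of the returned embedding are constant within each cluster, for $\tau^{+}=1$ , $\tau^{-}=0.01$ and for both $\gamma^{\pm}=0$ and $\gamma^{\pm}=1$ .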

Theorem 4.18.

Assuming $n\geqslant\max\left\{\frac{2(1-\eta)}{s(1-2\eta)},\frac{2\eta}{(1-l)(1-\eta)}\right\}$ , suppose $\tau^{+}>0,\tau^{-}\geqslant 0$ are chosen to satisfy

\tau^{+}>\frac{16\eta}{\beta s(1-2\eta)},\quad\tau^{-}<\frac{\beta}{2}\left(\frac{s(1-2\eta)}{s(1-2\eta)+2\eta}\right)\min\left\{\frac{1}{4(1-\beta)},\frac{\tau^{+}}{8}\right\}

where $\beta,\eta$ satisfy one of the following conditions.

1.

$\beta=\frac{4\eta}{s(1-2\eta)+4\eta}$ and $0<\eta<\frac{1}{2}$ , or
2.

$\beta=\frac{1}{2}$ and $\eta\leqslant\frac{s}{2s+4}$ .

Then $V_{k}(\overline{T})=\Theta R$ and $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R$ where $R$ is a rotation matrix, and $C^{-}\succ 0$ is as defined in (18). Moreover, there exists a constant $C>0$ such that for $r\geqslant 1$ and $\delta\in(0,1)$ , if $p$ satisfies

p\geqslant\max\left\{1,\left(\frac{4C_{1}(\tau^{+},\tau^{-})(2+\tau^{+})}{3(\tau^{+})^{2}(1-\beta)(1+\tau^{-})}\right)^{28}\right\}\frac{C_{4}^{14}(r,s,\eta,l)}{\delta^{28}(1-\eta)n},

and $\gamma^{+},\gamma^{-}=[np(1-\eta)]^{6/7}$ , then with probability at least $1-2e^{-r}$ , there exists a rotation $O\in\varmathbb R^{k\times k}$ so that

\left\lVert V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O\right\rVert\leqslant\delta,\qquad\left\lVert G_{k}(T_{\gamma^{+},\gamma^{-}})-G_{k}(\overline{T})O\right\rVert\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\frac{\delta}{(\tau^{+})^{2}}.

Here, $C_{4}(r,s,\eta,l):=2^{5/2}Cr^{2}+3\sqrt{2C_{2}(s,\eta,l)}$ with $C_{2}(s,\eta,l)$ as defined in (45).

Proof.

We will first simplify the upper bound on $\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert$ in Lemma 4.17. Note that $n\geqslant\frac{2(1-\eta)}{s(1-2\eta)}$ implies $\alpha_{s}^{+}\leqslant 2+\tau^{-}$ , and moreover, we can bound $\left\lVert L^{\pm}_{sym,\gamma^{\pm}}-\overline{L^{\pm}_{sym}}\right\rVert$ uniformly (from (52), (53)) as

\left\lVert L^{\pm}_{sym,\gamma^{\pm}}-\overline{L^{\pm}_{sym}}\right\rVert\leqslant\frac{2^{5/2}Cr^{2}+3\sqrt{2C_{2}(s,\eta,l)}}{[np(1-\eta)]^{1/14}}\leqslant\frac{C_{4}(r,s,\eta,l)}{[np(1-\eta)]^{1/14}}\ (=\Delta_{P_{\gamma^{-}}},\Delta_{Q_{\gamma^{+}}}).

(56)

Note that $\Delta_{P_{\gamma^{-}}},\Delta_{Q_{\gamma^{+}}}\leqslant 1$ if $p\geqslant\frac{C_{4}^{14}(r,s,\eta,l)}{n(1-\eta)}$ . Under these considerations, the bound in Lemma 4.17 simplifies to

\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert\leqslant\frac{(3+\tau^{-})(2\sqrt{\tau^{+}}+1)+\tau^{+}}{(\tau^{+})^{2}}\max\left\{\sqrt{\Delta_{P_{\gamma^{-}}}},\sqrt{\Delta_{Q_{\gamma^{+}}}}\right\}=\frac{C_{1}(\tau^{+},\tau^{-})\sqrt{C_{4}(r,s,\eta,l)}}{3(\tau^{+})^{2}[np(1-\eta)]^{1/28}}.

Following the steps in the proof of Theorem 4.13, we observe that $\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert\leqslant\frac{1}{2}(\lambda_{n-k+1}(\overline{T})-\lambda_{n-k}(\overline{T}))$ is guaranteed to hold, provided

\frac{C_{1}(\tau^{+},\tau^{-})\sqrt{C_{4}(r,s,\eta,l)}}{3(\tau^{+})^{2}[np(1-\eta)]^{1/28}}\leqslant(\frac{1-\beta}{2})\left(\frac{1+\tau^{-}}{2+\tau^{+}}\right)\iff p\geqslant\left(\frac{2C_{1}(\tau^{+},\tau^{-})(2+\tau^{+})}{3(\tau^{+})^{2}(1-\beta)(1+\tau^{-})}\right)^{28}\frac{C_{4}^{14}(r,s,\eta,l)}{n(1-\eta)}.

Then, we obtain via the Davis-Kahan theorem that there exists an orthogonal matrix $O\in\varmathbb R^{k\times k}$ such that

\left\lVert V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O\right\rVert\leqslant\frac{4\left\lVert T_{\gamma^{+},\gamma^{-}}-\overline{T}\right\rVert}{\lambda_{n-k+1}(\overline{T})-\lambda_{n-k}(\overline{T})}\leqslant\frac{4C_{1}(\tau^{+},\tau^{-})\sqrt{C_{4}(r,s,\eta,l)}(2+\tau^{+})}{3(\tau^{+})^{2}[np(1-\eta)]^{1/28}(1-\beta)(1+\tau^{-})}\leqslant\delta,

for the stated bound on $p$ in the theorem. This establishes the first part of the theorem.

In order to bound $\left\lVert G_{k}(T_{\gamma^{+},\gamma^{-}})-G_{k}(\overline{T})O\right\rVert$ , first observe that

$\displaystyle\left\lVert G_{k}(T_{\gamma^{+},\gamma^{-}})-G_{k}(\overline{T})O\right\rVert$	$\displaystyle=\left\lVert P_{\gamma^{-}}^{-1/2}(V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O)+(P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2})V_{k}(\overline{T})O\right\rVert$
	$\displaystyle\leqslant\underbrace{\left\lVert P_{\gamma^{-}}^{-1/2}\right\rVert}_{\leqslant(\tau^{+})^{-1/2}}\underbrace{\left\lVert V_{k}(T_{\gamma^{+},\gamma^{-}})-V_{k}(\overline{T})O\right\rVert}_{\leqslant\delta}+\left\lVert P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2}\right\rVert\underbrace{\left\lVert V_{k}(\overline{T})\right\rVert}_{=1}$
	$\displaystyle\leqslant\frac{\delta}{\sqrt{\tau^{+}}}+\left\lVert P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2}\right\rVert.$	(57)

The second term $\left\lVert P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2}\right\rVert$ can be bounded as

\displaystyle\left\lVert P_{\gamma^{-}}^{-1/2}-\overline{P}^{-1/2}\right\rVert=\left\lVert P_{\gamma^{-}}^{-1}(P_{\gamma^{-}}^{1/2}-\overline{P}^{1/2})\overline{P}^{-1}\right\rVert\leqslant\frac{\left\lVert P_{\gamma^{-}}^{1/2}-\overline{P}^{1/2}\right\rVert}{(\tau^{+})^{2}}\leqslant\frac{\left\lVert P_{\gamma^{-}}-\overline{P}\right\rVert^{1/2}}{(\tau^{+})^{2}}\leqslant\frac{\sqrt{C_{4}(r,s,\eta,l)}}{(\tau^{+})^{2}[np(1-\eta)]^{1/28}},

(58)

where the penultimate inequality uses Proposition C.1, and the last inequality uses (56). Plugging (58) into (57) leads to the stated bound for $p\geqslant\frac{C_{4}^{14}(r,s,\eta,l)}{n(1-\eta)\delta^{28}}$ . ∎

4.6 Mis-clustering rate from $k$ -means

We now analyze the mis-clustering error rate when we apply a $(1+\xi)$ -approximate $k$ -means algorithm (e.g., [KSS04]) on the rows of $G_{k}(T)$ (respectively, $G_{k}(T_{\gamma^{+},\gamma^{-}})$ in the sparse regime). By a $(1+\xi)$ -approximate algorithm, we mean an algorithm whose solution has cost provably within a $(1+\xi)$ factor of the cost of the optimal $k$ -means solution. To this end, we rely on the following result from [LR15], which, when applied to our setting, yields that the mis-clustering error is bounded by the estimation error $\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert_{F}^{2}$ (or $\left\lVert G_{k}(T_{\gamma^{+},\gamma^{-}})-G_{k}(\overline{T})O\right\rVert_{F}^{2}$ in the sparse setting).

Lemma 4.19 (Lemma 5.3 of [LR15], Approximate $k$ -means error bound).

For any $\xi>0$ , and any two matrices $\overline{U},U$ , such that $\overline{U}=\overline{\Theta}\overline{X}$ with $(\overline{\Theta},\overline{X})\in\varmathbb{M}_{n\times k}\times\varmathbb R^{k\times k}$ , let $(\tilde{\Theta},\tilde{X})\in\varmathbb{M}_{n\times k}\times\varmathbb R^{k\times k}$ be a $(1+\xi)$ -approximate solution to the $k$ -means problem $\min_{\Theta\in\varmathbb{M}_{n\times k},X\in\varmathbb R^{k\times k}}\left\lVert\Theta X-U\right\rVert_{F}^{2}$ so that

\left\lVert\tilde{\Theta}\tilde{X}-U\right\rVert_{F}^{2}\leqslant(1+\xi)\min_{\Theta\in\varmathbb{M}_{n\times k},X\in\varmathbb R^{k\times k}}\left\lVert\Theta X-U\right\rVert_{F}^{2}

and $\tilde{U}=\tilde{\Theta}\tilde{X}$ . For any $\delta_{i}\leqslant\min_{i^{\prime}\neq i}\left\lVert\overline{X}_{i^{\prime}*}-\overline{X}_{i*}\right\rVert$ , define $S_{i}=\left\{j\in C_{i}~:~\left\lVert\tilde{U}_{j*}-\overline{U}_{j*}\right\rVert\geqslant\delta_{i}/2\right\}$ then

\sum_{i=1}^{k}\left\lvert S_{i}\right\rvert\delta_{i}^{2}\leqslant 4(4+2\xi)\left\lVert U-\overline{U}\right\rVert_{F}^{2}\,.

(59)

Moreover, if

(16+8\xi)\left\lVert U-\overline{U}\right\rVert_{F}^{2}/\delta_{i}^{2}<n_{i}\qquad\forall i\in[k]\,,

(60)

then there exists a $k\times k$ permutation matrix $\pi$ such that $\tilde{\Theta}_{G}=\overline{\Theta}_{G}\pi$ , where $G=\cup_{i=1}^{k}(C_{i}\setminus S_{i})$ .
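Since the estimated membership matrix is only identified up to a permutation $\pi$ of the cluster labels, any empirical measure of mis-clustering has to minimize over relabelings. The following sketch computes the fraction of mis-clustered nodes in this way; the function name is ours, and the brute-force search over all $k!$ permutations is only reasonable for small $k$ .

```python
from itertools import permutations

import numpy as np


def misclustered_fraction(labels_true, labels_pred, k: int) -> float:
    """Smallest fraction of disagreements over all relabelings of the k clusters.

    Labels are assumed to be integers in {0, ..., k-1}.
    """
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    best = 1.0
    for perm in permutations(range(k)):
        relabeled = np.asarray(perm)[labels_pred]  # apply the permutation to predictions
        best = min(best, float(np.mean(relabeled != labels_true)))
    return best
```

For example, a predicted labelling that differs from the ground truth only by a renaming of the clusters has mis-clustered fraction zero.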

Combining Lemma 4.19 with the perturbation results of Theorem 4.13 and Theorem 4.18, we readily arrive at mis-clustering error bounds for SPONGE_sym.

Theorem 4.20 (Mis-clustering error for SPONGE_sym).

Under the notation and assumptions of Theorem 4.13, let $(\tilde{\Theta},\tilde{X})\in\varmathbb{M}_{n\times k}\times\varmathbb R^{k\times k}$ be a $(1+\xi)$ -approximate solution to the $k$ -means problem $\min_{\Theta\in\varmathbb{M}_{n\times k},X\in\varmathbb R^{k\times k}}\left\lVert\Theta X-G_{k}(T)\right\rVert_{F}^{2}$ . Denoting

S_{i}=\left\{j\in C_{i}\ :\ \left\lVert(\tilde{\Theta}\tilde{X})_{j*}-(\Theta(C^{-})^{-1/2}RO)_{j*}\right\rVert\geqslant\frac{1}{2\sqrt{n_{i}(\tau^{+}+\frac{2}{1-l})}}\right\},

it holds with probability at least $1-2\varepsilon$ that

\sum_{i=1}^{k}\frac{\left\lvert S_{i}\right\rvert}{n_{i}}\leqslant\delta^{2}{(64+32\xi)k}\left(\tau^{+}+\frac{2}{1-l}\right)\left(\frac{(\tau^{+})^{3}+1}{(\tau^{+})^{4}}\right).

In particular, if $\delta$ satisfies

\delta<\frac{(\tau^{+})^{2}}{\sqrt{(64+32\xi)k(\tau^{+}+\frac{2}{1-l})((\tau^{+})^{3}+1)}},

then there exists a $k\times k$ permutation matrix $\pi$ such that $\tilde{\Theta}_{G}=\hat{\Theta}_{G}\pi$ , where $G=\cup_{i=1}^{k}(C_{i}\setminus S_{i})$ .

In the sparse regime, the above statement holds under the notation and assumptions of Theorem 4.18 with $G_{k}(T)$ replaced with $G_{k}(T_{\gamma^{+},\gamma^{-}})$ , and with probability at least $1-2e^{-r}$ .

Proof.

Since $G_{k}(T)-G_{k}(\overline{T})O$ has rank at most $2k$ , we obtain from Theorem 4.13 that

\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert_{F}\leqslant\sqrt{2k}\left\lVert G_{k}(T)-G_{k}(\overline{T})O\right\rVert\leqslant\delta\sqrt{2k}\left(\frac{(\tau^{+})^{3/2}+1}{(\tau^{+})^{2}}\right).

(61)

We now use Lemma 4.19 with $U=G_{k}(T)$ and $\overline{U}=G_{k}(\overline{T})O$ . It follows from (44) and Lemma 4.1 that $G_{k}(\overline{T})=\Theta(C^{-})^{-1/2}R=\Theta\Delta~\Delta^{-1}(C^{-})^{-1/2}R$ where $\Delta=\textrm{diag}(\sqrt{n_{1}},\dots,\sqrt{n_{k}})$ . Denoting $\overline{X}=\Delta^{-1}(C^{-})^{-1/2}RO$ , we can write $G_{k}(\overline{T})O=\hat{\Theta}\overline{X}$ , where $\hat{\Theta}\in\varmathbb{M}_{n\times k}$ is the ground truth membership matrix, and for each $i\neq i^{\prime}\in[k]$ , it holds true that

\left\lVert\overline{X}_{i*}-\overline{X}_{i^{\prime}*}\right\rVert\geqslant\lambda_{\min}((C^{-})^{-1/2})\sqrt{1/n_{i}+1/n_{i^{\prime}}}\geqslant\frac{1}{\sqrt{\lambda_{\max}(C^{-})n_{i}}}\,.

From (18), one can verify using Weyl’s inequality that

\lambda_{\max}(C^{-})\leqslant 1+\tau^{+}+\max_{i}\frac{p}{d_{i}^{-}}(\eta_{i}+s_{i}n(1-2\eta))\leqslant\tau^{+}+\frac{2}{1-l},

where the last inequality holds if $n\geqslant\frac{2\eta}{(1-l)(1-\eta)}$ . The above considerations imply that $\delta_{i}=\frac{1}{\sqrt{n_{i}(\tau^{+}+\frac{2}{1-l})}}$ . Now with $S_{i}$ as defined in the statement, we obtain from (59) and (61) that

\displaystyle\sum_{i=1}^{k}\left\lvert S_{i}\right\rvert\delta_{i}^{2}=\frac{1}{\tau^{+}+\frac{2}{1-l}}\sum_{i=1}^{k}\frac{\left\lvert S_{i}\right\rvert}{n_{i}}\leqslant\delta^{2}(32+16\xi)k\frac{((\tau^{+})^{3/2}+1)^{2}}{(\tau^{+})^{4}}\leqslant\delta^{2}(64+32\xi)k\left(\frac{(\tau^{+})^{3}+1}{(\tau^{+})^{4}}\right),

where the last inequality uses $(a+b)^{2}\leqslant 2(a^{2}+b^{2})$ for $a,b\geqslant 0$ . This yields the first part of the Theorem.

For the second part, we need to ensure (60) holds. Using (61) and the expression for $\delta_{i}$ , it is easy to verify that (60) holds for the stated condition on $\delta$ .

Finally, the statement for the sparse regime readily follows in an analogous manner (replacing $G_{k}(T)$ with $G_{k}(T_{\gamma^{+},\gamma^{-}})$ ), by following the same steps as above. ∎

5 Concentration results for the symmetric Signed Laplacian

This section contains proofs of the main results for the symmetric Signed Laplacian, in both the dense regime $p\gtrsim\frac{\ln n}{n}$ and the sparse regime $p\gtrsim\frac{1}{n}$ . Before proceeding with an overview of the main steps, for ease of reference, we summarize in the Table below the notation specific to this section.

Notation	Description
$\overline{L_{sym}}$	symmetric Signed Laplacian
$\mathcal{L}_{sym}$	population Signed Laplacian
$L_{\gamma}$	regularized Laplacian
$\mathcal{L}_{\gamma}$	population regularized Laplacian
$\gamma^{+},\gamma^{-}>0$	regularization parameters
$\gamma=\gamma^{+}+\gamma^{-}$
$\overline{\alpha}=1+\frac{p}{\overline{d}}(1-2\eta)$
$\bar{d}=p(n-1)$	expected signed degree
$\rho=\frac{n_{min}}{n_{max}}=\frac{s}{l}$	aspect ratio

The proof of Theorem 3.4 is built on the following steps. In Section 5.1, we compute the eigen-decomposition of the Signed Laplacian of the expected graph $\mathcal{L}_{sym}$ . Then in Section 5.2, we show $\overline{L_{sym}}$ and $\mathcal{L}_{sym}$ are “close”, and obtain an upper bound on the error $\left\lVert\overline{L_{sym}}-\mathcal{L}_{sym}\right\rVert$ . Finally, in Section 5.3, we use the Davis-Kahan theorem (see Theorem B.2) to bound the error between the subspaces $V_{k-1}(\overline{L_{sym}})$ and $V_{k-1}(\mathcal{L}_{sym})$ . To prove Theorem 3.7, in Section 5.4, we first use a decomposition of the set of edges $[n]\times[n]$ and characterize the behaviour of the regularized Signed Laplacian on each subset. This leads in Section 5.5 to the error bounds of Theorem 3.7. Finally, the proof of Theorem 3.9, which bounds the error on the eigenspace, relies on the same arguments as Theorem 3.4 and can be found in Section 5.6. Similarly to the approach for SPONGE_sym, the mis-clustering error is obtained using a ( $1+\xi$ )-approximate solution of the $k$ -means problem applied to the rows of $V_{k-1}(\overline{L_{sym}})$ (resp. $V_{k-1}(L_{\gamma})$ ). This solution contains, in particular, an estimated membership matrix $\tilde{\Theta}$ . The bound on the mis-clustering error of the algorithm given in Theorem 3.12 is derived using Lemma 4.19 (Lemma 5.3 of [LR15]), in Section 5.7.

5.1 Analysis of the expected Signed Laplacian

In this section, we compute the eigen-decomposition of the matrix $\mathcal{L}_{sym}$ . In particular, we aim at proving a lower bound on the eigengap between the $(k-1)^{th}$ and $k^{th}$ smallest eigenvalues. For equal-size clusters, there is an explicit expression for this eigengap.

5.1.1 Matrix decomposition

Lemma 5.1.

Let $\Theta\in\varmathbb{R}^{n\times k}$ denote the normalized membership matrix in the SSBM. Let $V^{\perp}\in\varmathbb{R}^{n\times(n-k)}$ be a matrix whose columns form an orthonormal basis of the subspace orthogonal to $\mathcal{R}(\Theta)$. The Signed Laplacian of the expected graph has the following decomposition

\mathcal{L}_{sym}=[\Theta\>V^{\perp}]\begin{pmatrix}\bar{C}&0\\ 0&\overline{\alpha}I_{n-k}\end{pmatrix}\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix},

(62)

with $\overline{C}=\bar{\alpha}I_{k}-\overline{B}$ , $\overline{\alpha}=1+\frac{p}{\overline{d}}(1-2\eta)$ and $\overline{B}$ is a $k\times k$ matrix such that

\bar{B}_{ii^{\prime}}=\left\{\begin{array}[]{rl}\frac{n_{i}p}{\overline{d}}(1-2\eta);&\text{ if }i=i^{\prime}\\ -\frac{\sqrt{n_{i}n_{i^{\prime}}}p}{\overline{d}}(1-2\eta);&\text{ if }i\neq i^{\prime}.\end{array}\right.

(63)

Proof.

On one hand, we recall from Section 2.3 that the expected degree matrix is a scaled identity matrix $\varmathbb{E}[\bar{D}]=\bar{d}I_{n}$ , with $\overline{d}=p(n-1)$ . Thus, any vector $v\in\varmathbb R^{n}$ is an eigenvector of $\varmathbb{E}[\bar{D}]$ with corresponding eigenvalue $\overline{d}$ , and it holds true that

\displaystyle\varmathbb{E}[\bar{D}]^{-1/2}

\displaystyle=\frac{1}{\sqrt{\bar{d}}}I_{n}=\frac{1}{\sqrt{\bar{d}}}[\Theta\>(V^{\perp})]\>I_{n}\>\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix}.

(64)

On the other hand, the signed adjacency matrix can be written in the form

\displaystyle\varmathbb{E}[A]

\displaystyle=\varmathbb{E}[A^{+}]-\varmathbb{E}[A^{-}]=M-p(1-2\eta)I_{n},

(65)

where

\displaystyle M

\displaystyle=\begin{bmatrix}p(1-2\eta)J_{n_{1}}&-p(1-2\eta)J_{n_{1}\times n_{2}}&\ldots&-p(1-2\eta)J_{n_{1}\times n_{k}}\\ -p(1-2\eta)J_{n_{2}\times n_{1}}&p(1-2\eta)J_{n_{2}}&\ldots&-p(1-2\eta)J_{n_{2}\times n_{k}}\\ \vdots&\vdots&\ddots&\vdots\\ -p(1-2\eta)J_{n_{k}\times n_{1}}&\ldots&\ldots&p(1-2\eta)J_{n_{k}}\end{bmatrix}.

The matrix $M$ has the following decomposition

\displaystyle M

\displaystyle=\overline{d}\Theta\overline{B}\Theta^{T}=\overline{d}[\Theta\>V^{\perp}]\begin{pmatrix}\overline{B}&0\\ 0&0\end{pmatrix}\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix},

with $\overline{B}$ defined in (63). Thus, combining (64) and (65), we arrive at

\displaystyle\varmathbb{E}[\bar{D}]^{-1/2}\varmathbb{E}[A]\varmathbb{E}[\bar{D}]^{-1/2}=\frac{1}{\bar{d}}M-p(1-2\eta)\frac{1}{\bar{d}}I_{n}=[\Theta\>V^{\perp}]\begin{pmatrix}\overline{B}&0\\ 0&0\end{pmatrix}\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix}-(1-2\eta)\frac{p}{\bar{d}}I_{n}.

This finally leads to the decomposition of $\mathcal{L}_{sym}$

\mathcal{L}_{sym}=I-\varmathbb{E}[\bar{D}]^{-1/2}\varmathbb{E}[A]\varmathbb{E}[\bar{D}]^{-1/2}=[\Theta\>V^{\perp}]\begin{pmatrix}\bar{C}&0\\ 0&\overline{\alpha}I_{n-k}\end{pmatrix}\begin{bmatrix}\Theta^{T}\\ (V^{\perp})^{T}\end{bmatrix},

with $\overline{C}=\bar{\alpha}I_{k}-\overline{B}$ and $\overline{\alpha}=1+\frac{p}{\overline{d}}(1-2\eta)$ . ∎
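The decomposition of Lemma 5.1 can be checked numerically on a small example. The following is a minimal sketch in plain Python (standard library only): it builds $\mathcal{L}_{sym}=I-\frac{1}{\overline{d}}\varmathbb{E}[A]$ entrywise and compares it with $\Theta\overline{C}\Theta^{T}+\overline{\alpha}(I-\Theta\Theta^{T})$, using the identity $V^{\perp}(V^{\perp})^{T}=I-\Theta\Theta^{T}$. The function names, the cluster sizes and the values of $p$ and $\eta$ are illustrative assumptions and are not taken from the paper.

```python
import math

def cluster_labels(sizes):
    """Cluster index of each node, with nodes ordered cluster by cluster."""
    return [i for i, n_i in enumerate(sizes) for _ in range(n_i)]

def population_laplacian(sizes, p, eta):
    """Entrywise L_sym = I - E[A] / d_bar, where E[A] has zero diagonal."""
    n = sum(sizes)
    d_bar = p * (n - 1)
    s = p * (1 - 2 * eta)
    lab = cluster_labels(sizes)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for jp in range(n):
            if j == jp:
                L[j][jp] = 1.0
            elif lab[j] == lab[jp]:
                L[j][jp] = -s / d_bar   # E[A]_{jj'} = +p(1 - 2 eta)
            else:
                L[j][jp] = s / d_bar    # E[A]_{jj'} = -p(1 - 2 eta)
    return L

def block_decomposition(sizes, p, eta):
    """Entrywise Theta C_bar Theta^T + alpha_bar (I - Theta Theta^T)."""
    n, k = sum(sizes), len(sizes)
    d_bar = p * (n - 1)
    s = p * (1 - 2 * eta)
    alpha = 1 + s / d_bar
    lab = cluster_labels(sizes)
    # C_bar = alpha_bar * I_k - B_bar, with B_bar as in (63).
    C = [[alpha - sizes[i] * s / d_bar if i == ip
          else math.sqrt(sizes[i] * sizes[ip]) * s / d_bar
          for ip in range(k)] for i in range(k)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for jp in range(n):
            i, ip = lab[j], lab[jp]
            theta_c = C[i][ip] / math.sqrt(sizes[i] * sizes[ip])
            theta_theta = 1.0 / sizes[i] if i == ip else 0.0
            identity = 1.0 if j == jp else 0.0
            R[j][jp] = theta_c + alpha * (identity - theta_theta)
    return R

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

sizes, p, eta = [5, 3, 4], 0.3, 0.1
err = max_abs_diff(population_laplacian(sizes, p, eta),
                   block_decomposition(sizes, p, eta))
print(err)  # expected to be of the order of floating-point round-off
```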

We can infer from Lemma 5.1 that the spectrum of $\mathcal{L}_{sym}$ is the union of the spectrum of the matrix $\bar{C}\in\varmathbb R^{k\times k}$ and $\{\overline{\alpha}\}$ . Moreover, denoting $u=\frac{1}{\sqrt{\overline{d}}}(\sqrt{n_{1}},\dots,\sqrt{n_{k}})^{T}$ , we have $\overline{C}=p(1-2\eta)uu^{T}+\text{diag}\left(1+\frac{p}{\overline{d}}(1-2\eta)(1-2n_{i})\right)$ . For a SSBM with equal-size clusters, we are able to find explicit expressions for the eigenvalues of $\overline{C}$ .

5.1.2 Spectrum of the Signed Laplacian: equal-size clusters

In this section, we assume that the clusters in the SSBM have equal sizes $n_{1}=n_{2}=\dots=n_{k}=\frac{n}{k}$ . In this case,

\displaystyle\frac{1}{\sqrt{\overline{d}}}(\sqrt{n_{1}},\dots,\sqrt{n_{k}})^{T}=\sqrt{\frac{n}{\overline{d}}}\chi_{1},

and denoting by $\overline{C}_{e}$ the matrix $\overline{C}$ in this setting of equal clusters, we may write

\displaystyle\overline{C}_{e}

\displaystyle=\frac{np}{\overline{d}}(1-2\eta)\chi_{1}\chi_{1}^{T}+\left(1+\frac{p}{\overline{d}}(1-2\eta)\bigg{(}1-2\frac{n}{k}\bigg{)}\right)I_{k}.

(66)

Hence, the spectrum of $\overline{C}_{e}$ contains only two different values. The largest one has multiplicity 1, and $\chi_{1}$ is the corresponding largest eigenvector. The $k-1$ remaining eigenvalues are all equal. In fact, we have

\displaystyle\lambda_{i}(\overline{C}_{e})

\displaystyle=\begin{cases}1+\frac{p}{\overline{d}}(1-2\eta)(n+1-2\frac{n}{k});&\text{if }i=1\\ 1+\frac{p}{\overline{d}}(1-2\eta)\bigg{(}1-2\frac{n}{k}\bigg{)};&\text{if }2\leqslant i\leqslant k.\end{cases}

One can easily check that these eigenvalues are positive, and that the following inequality holds true

\displaystyle\lambda_{1}(\overline{C}_{e})=\overline{\alpha}+\frac{p}{\overline{d}}(1-2\eta)\left(n-2\frac{n}{k}\right)\geqslant\overline{\alpha}>\overline{\alpha}-\frac{2np}{k\overline{d}}(1-2\eta)=\lambda_{2}(\overline{C}_{e}).

We finally have

\displaystyle\lambda_{j}(\mathcal{L}_{sym})

\displaystyle=\begin{cases}1+\frac{p}{\overline{d}}(1-2\eta)(n+1-2\frac{n}{k});&\text{if }j=1\\ \overline{\alpha};&\text{if }2\leqslant j\leqslant n-k+1\\ \lambda_{2}(\overline{C}_{e});&\text{if }n-k+2\leqslant j\leqslant n.\end{cases}

Note that for $k=2$ , $\lambda_{1}(\overline{C}_{e})=\overline{\alpha}$ and the spectrum of $\mathcal{L}_{sym}$ contains only two values $\{\overline{\alpha},\lambda_{2}(\overline{C}_{e})\}$ . For $k>2$ , $\lambda_{1}(\mathcal{L}_{sym})>\overline{\alpha}>\lambda_{2}(\overline{C}_{e})$ . Writing the spectral decomposition

\overline{C}_{e}=R\>\Lambda\>R^{T}=[R_{k-1}\>\gamma_{1}]\>\Lambda\>\begin{bmatrix}R_{k-1}^{T}\\ \gamma_{1}^{T}\end{bmatrix},

with $\gamma_{1}=\chi_{1}$ and $R_{k-1}\in\varmathbb R^{k\times(k-1)}$ being the matrix of eigenvectors associated with $\lambda_{2}(\overline{C}_{e})$, we conclude that $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$. In fact, since $\Theta$ has $k$ distinct rows and $R$ is a unitary matrix, $\Theta R$ also has $k$ distinct rows. As $\chi_{1}$ is proportional to the all-ones vector, $\Theta R_{k-1}$ has $k$ distinct rows as well. These observations are summarized in the following lemma and lead to the expression of the eigengap.

Lemma 5.2 (Eigengap for equal-size clusters).

For the SSBM with $k\geqslant 2$ clusters of equal-size $\frac{n}{k}$ , we have that $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}\in\varmathbb R^{n\times(k-1)}$ , where $R_{k-1}$ corresponds to the $(k-1)$ smallest eigenvectors of $\overline{C}_{e}$ . Moreover, with the eigengap defined as

\lambda_{gap}:=\lambda_{n-k+1}(\mathcal{L}_{sym})-\lambda_{n-k+2}(\mathcal{L}_{sym}),

it holds true that

\displaystyle\lambda_{gap}=\overline{\alpha}-\lambda_{2}(\overline{C}_{e})=\frac{2np}{k\overline{d}}(1-2\eta)\geqslant\frac{2}{k}(1-2\eta).

(67)
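As an illustration of Lemma 5.2, the short Python sketch below (standard library only) builds $\overline{C}_{e}=\overline{\alpha}I_{k}-\overline{B}$ for equal cluster sizes, verifies the two eigenvalue equations by direct matrix-vector products, and evaluates the eigengap (67). The helper names and the parameter values are assumptions chosen for illustration.

```python
import math

def equal_size_matrix(n, k, p, eta):
    """C_bar_e = alpha_bar I_k - B_bar when all clusters have size n / k."""
    d_bar = p * (n - 1)
    c = p * (1 - 2 * eta) / d_bar
    alpha = 1 + c
    m = n / k
    C = [[alpha - m * c if i == ip else m * c for ip in range(k)]
         for i in range(k)]
    return C, alpha, c

def matvec(C, v):
    return [sum(a * b for a, b in zip(row, v)) for row in C]

n, k, p, eta = 60, 3, 0.2, 0.1
C_e, alpha, c = equal_size_matrix(n, k, p, eta)
lam1 = 1 + c * (n + 1 - 2 * n / k)      # largest eigenvalue, multiplicity 1
lam2 = 1 + c * (1 - 2 * n / k)          # remaining k - 1 eigenvalues

chi1 = [1 / math.sqrt(k)] * k           # eigenvector associated with lam1
w = [1.0, -1.0] + [0.0] * (k - 2)       # a vector orthogonal to chi1
res1 = max(abs(a - lam1 * b) for a, b in zip(matvec(C_e, chi1), chi1))
res2 = max(abs(a - lam2 * b) for a, b in zip(matvec(C_e, w), w))

gap = alpha - lam2                      # eigengap of Lemma 5.2
print(lam1, alpha, lam2, gap)
```

The residuals `res1` and `res2` measure how well the stated eigenpairs satisfy $\overline{C}_{e}v=\lambda v$; both should vanish up to round-off.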

5.1.3 Non-equal-size clusters

In the general setting of non-equal-size clusters, it is difficult to obtain an explicit expression of the spectrum of $\mathcal{L}_{sym}$ . Thus, using a perturbation method, we establish a lower bound on the eigengap, provided that the aspect ratio $\rho$ is close to 1. Recall that

\begin{aligned}\overline{C}&=p(1-2\eta)uu^{T}+\text{diag}\left(1+\frac{p}{\overline{d}}(1-2\eta)(1-2n_{i})\right)\\ &=p(1-2\eta)uu^{T}-2p(1-2\eta)\text{diag}(u_{i}^{2})_{i=1}^{k}+\text{diag}\left(1+\frac{p}{\overline{d}}(1-2\eta)\right).\end{aligned}

(68)

We note that this matrix is of the form $\Lambda+vv^{T}$ , with $\Lambda$ being a diagonal matrix and $v\in\varmathbb{R}^{k}$ a vector. Using again the spectral decomposition

\overline{C}=R\>\Lambda\>R^{T}=[R_{k-1}\>\gamma_{1}]\>\Lambda\>\begin{bmatrix}R_{k-1}^{T}\\ \gamma_{1}^{T}\end{bmatrix},

(69)

where $\gamma_{1}$ is the largest eigenvector and $R_{k-1}\in\varmathbb R^{k\times(k-1)}$ contains the smallest $(k-1)$ eigenvectors of $\overline{C}$, we would like to ensure that the smallest $(k-1)$ eigenvectors of $\mathcal{L}_{sym}$ are related to the smallest $(k-1)$ eigenvectors of $\overline{C}$ through $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$. Note that $\gamma_{1}$ is not necessarily the all-ones vector, and $\Theta R_{k-1}$ has at least $k-1$ distinct rows. To this end, we would like to ensure that

\{\lambda_{2}(\overline{C}),\dots,\lambda_{k-1}(\overline{C}),\lambda_{k}(\overline{C})\}=\{\lambda_{n-k+2}(\mathcal{L}_{sym}),\dots,\lambda_{n-1}(\mathcal{L}_{sym}),\lambda_{n}(\mathcal{L}_{sym})\}.

(70)

From Weyl’s inequality (see Theorem B.1), we know that

\displaystyle|\lambda_{i}(\overline{C}_{e})-\lambda_{i}(\overline{C})|\leqslant\|\overline{C}-\overline{C}_{e}\|\quad\forall i=1,\dots k,

which in particular implies

\displaystyle\lambda_{2}(\overline{C})\leqslant\lambda_{2}(\overline{C}_{e})+\|\overline{C}-\overline{C}_{e}\|,\qquad\lambda_{1}(\overline{C})\geqslant\lambda_{1}(\overline{C}_{e})-\|\overline{C}-\overline{C}_{e}\|.

Moreover, $\lambda_{1}(\overline{C})=\overline{\alpha}$ when $k=2$, and $\lambda_{1}(\overline{C})>\overline{\alpha}$ when $k>2$. Thus, for condition (70) to be true, it suffices to ensure

\begin{aligned}\lambda_{2}(\overline{C}_{e})+\|\overline{C}-\overline{C}_{e}\|<\overline{\alpha}-\|\overline{C}-\overline{C}_{e}\|&\iff\|\overline{C}-\overline{C}_{e}\|<\frac{\overline{\alpha}-\lambda_{2}(\overline{C}_{e})}{2}\\ &\iff\|\overline{C}-\overline{C}_{e}\|<\frac{np}{k\overline{d}}(1-2\eta),\end{aligned}

using (67). In this case, we indeed have that $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$ . As it will be convenient later, we will ensure a slightly stronger condition, i.e.

\displaystyle\|\overline{C}-\overline{C}_{e}\|<\frac{\overline{\alpha}-\lambda_{2}(\overline{C}_{e})}{4}=\frac{np}{2k\overline{d}}(1-2\eta).

(71)

Now we compute the error $\|\overline{C}-\overline{C}_{e}\|$. We recall that $\|u\|=\sqrt{\frac{n}{\overline{d}}}$ and denote $D_{u}:=\frac{1}{\|u\|^{2}}\text{diag}(u_{i}^{2})_{i=1}^{k}$; then (68) becomes

\displaystyle\overline{C}

\displaystyle=\overline{\alpha}I_{k}+\frac{np}{\overline{d}}(1-2\eta)\bigg{(}\frac{u}{\|u\|}\bigg{)}\bigg{(}\frac{u}{\|u\|}\bigg{)}^{T}-2\frac{np}{\overline{d}}(1-2\eta)D_{u}.

Using (66), we obtain

\begin{aligned}\|\overline{C}-\overline{C}_{e}\|&=\left\|\frac{np}{\overline{d}}(1-2\eta)\left(\left(\frac{u}{\|u\|}\right)\left(\frac{u}{\|u\|}\right)^{T}-\chi_{1}\chi_{1}^{T}\right)-2\frac{np}{\overline{d}}(1-2\eta)\left(D_{u}-\frac{1}{k}I_{k}\right)\right\|\\ &\leqslant\frac{np}{\overline{d}}(1-2\eta)\left\|\left(\frac{u}{\|u\|}\right)\left(\frac{u}{\|u\|}\right)^{T}-\chi_{1}\chi_{1}^{T}\right\|+2\frac{np}{\overline{d}}(1-2\eta)\left\|D_{u}-\frac{1}{k}I_{k}\right\|.\end{aligned}

(72)

For the first term on the RHS, we have

\begin{aligned}\left\|\left(\frac{u}{\|u\|}\right)\left(\frac{u}{\|u\|}\right)^{T}-\chi_{1}\chi_{1}^{T}\right\|&\leqslant 2\left\|\frac{u}{\|u\|}-\chi_{1}\right\|\leqslant 2\sqrt{k}\max_{i}\left|\sqrt{\frac{n_{i}}{n}}-\sqrt{\frac{1}{k}}\right|\\ &\leqslant 2\sqrt{k}(\sqrt{l}-\sqrt{s})\leqslant 2\sqrt{k}(1-\sqrt{\rho}),\end{aligned}

(73)

while for the second term on the RHS, we have

\displaystyle\left\|D_{u}-\frac{1}{k}I_{k}\right\|

\displaystyle=\max_{i}\left|\sqrt{\frac{n_{i}}{n}}-\sqrt{\frac{1}{k}}\right|\leqslant 1-\sqrt{\rho}.

(74)

By combining (73) and (74) into (72), we arrive at

\begin{aligned}\|\overline{C}-\overline{C}_{e}\|&\leqslant\frac{np}{\overline{d}}(1-2\eta)\sqrt{k}(1-\sqrt{\rho})+\frac{2np}{\overline{d}}(1-2\eta)(1-\sqrt{\rho})\\ &\leqslant\frac{np}{\overline{d}}(1-2\eta)(1-\sqrt{\rho})\left(\sqrt{k}+2\right)\\ &\leqslant 2(2+\sqrt{k})(1-2\eta)(1-\sqrt{\rho}),\end{aligned}

using that $\frac{np}{\overline{d}}=\frac{n}{n-1}\leqslant 2$. Now, since $\frac{np}{2k\overline{d}}(1-2\eta)\geqslant\frac{1-2\eta}{2k}$, in order to satisfy condition (71) it suffices that $\rho$ satisfies

\displaystyle 2(2+\sqrt{k})(1-2\eta)(1-\sqrt{\rho})

\displaystyle\leqslant\frac{1-2\eta}{2k}\iff 1-\sqrt{\rho}\leqslant\frac{1}{4k(2+\sqrt{k})}.

Finally, we can compute

\begin{aligned}\lambda_{gap}&:=\lambda_{n-k+1}(\mathcal{L}_{sym})-\lambda_{n-k+2}(\mathcal{L}_{sym})\\ &\geqslant\overline{\alpha}-\|\overline{C}-\overline{C}_{e}\|-(\lambda_{2}(\overline{C}_{e})+\|\overline{C}-\overline{C}_{e}\|)\\ &\geqslant\overline{\alpha}-\lambda_{2}(\overline{C}_{e})-2\|\overline{C}-\overline{C}_{e}\|\\ &\geqslant\frac{\overline{\alpha}-\lambda_{2}(\overline{C}_{e})}{2}=\frac{np}{k\overline{d}}(1-2\eta)\geqslant\frac{1-2\eta}{k}.\end{aligned}

Hence we arrive at the following lemma.

Lemma 5.3 (General lower-bound on the eigengap).

For a SSBM with $k\geqslant 2$ clusters of general sizes $(n_{1},\dots,n_{k})$ and aspect ratio $\rho$ satisfying

\displaystyle\sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})},

it holds true that $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}$, where $R_{k-1}\in\varmathbb R^{k\times(k-1)}$ corresponds to the $(k-1)$ smallest eigenvectors of $\overline{C}$. Furthermore, we can lower-bound the spectral gap $\lambda_{gap}$ as

\displaystyle\lambda_{gap}:=\lambda_{n-k+1}(\mathcal{L}_{sym})-\lambda_{n-k+2}(\mathcal{L}_{sym})\geqslant\frac{1-2\eta}{k}.
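To get a feeling for how restrictive the aspect-ratio condition of Lemma 5.3 is, the following minimal Python sketch evaluates the threshold $1-\frac{1}{4k(2+\sqrt{k})}$ and tests whether a given vector of cluster sizes satisfies it. The function names and the example cluster sizes are hypothetical and only serve as an illustration.

```python
import math

def aspect_ratio(sizes):
    """rho = n_min / n_max."""
    return min(sizes) / max(sizes)

def sqrt_rho_threshold(k):
    """Right-hand side of the condition sqrt(rho) > 1 - 1 / (4k(2 + sqrt(k)))."""
    return 1 - 1 / (4 * k * (2 + math.sqrt(k)))

def satisfies_lemma_5_3(sizes):
    """True if the cluster sizes meet the aspect-ratio condition of Lemma 5.3."""
    return math.sqrt(aspect_ratio(sizes)) > sqrt_rho_threshold(len(sizes))

def gap_lower_bound(k, eta):
    """Lower bound (1 - 2 eta) / k on the eigengap, valid under the condition."""
    return (1 - 2 * eta) / k

for sizes in ([50, 50], [100, 95], [60, 40]):
    print(sizes, satisfies_lemma_5_3(sizes), sqrt_rho_threshold(len(sizes)))
```

The threshold approaches 1 as $k$ grows, so the condition only allows mildly unbalanced clusters.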

We will now show that $\overline{L_{sym}}$ concentrates around the population Laplacian $\mathcal{L}_{sym}$ , provided the graph is dense enough.

5.2 Concentration of the Signed Laplacian in the dense regime

In the moderately dense regime where $p\gtrsim\frac{\ln n}{n}$ , the adjacency and the degree matrices concentrate towards their expected counterparts, as $n$ increases. This can be established using standard concentration tools from the literature.

Lemma 5.4.

We have the following concentration inequalities for $A$ and $\overline{D}$. For all $0<\varepsilon\leqslant\frac{1}{2}$, there exists $c_{\varepsilon}>0$ such that

\varmathbb{P}\bigg{(}\|A-\varmathbb{E}[A]\|\leqslant((1+\varepsilon)4\sqrt{2}+2)\sqrt{np}\bigg{)}\geqslant 1-n\exp\bigg{(}-\frac{np}{c_{\varepsilon}}\bigg{)}.

In particular, there exists a universal constant $c>0$ such that

\varmathbb{P}\bigg{(}\|A-\varmathbb{E}[A]\|\leqslant 12\sqrt{np}\bigg{)}\geqslant 1-n\exp\bigg{(}-\frac{np}{c}\bigg{)}.

If $p>12\frac{\ln n}{n}$ ,

\varmathbb{P}\bigg{(}\|\overline{D}-\varmathbb{E}[\overline{D}]\|\leqslant\sqrt{3np\ln n}\bigg{)}\geqslant 1-\frac{2}{n}.

Proof.

For the first statement, we recall that $A$ is a symmetric matrix, with $A_{jj}=0$ and with independent entries above the diagonal $(A_{jj^{\prime}})_{j<j^{\prime}}$. We denote $Z_{jj^{\prime}}=A_{jj^{\prime}}-\operatorname*{\varmathbb{E}}[A_{jj^{\prime}}]$. If $j,j^{\prime}$ lie in the same cluster,

Z_{jj^{\prime}}=\left\{\begin{array}[]{rl}1-p(1-2\eta)\quad;&\text{w. p. }p(1-\eta)\\ -1-p(1-2\eta)\quad;&\text{w. p. }p\eta\\ -p(1-2\eta)\quad;&\text{w. p. }1-p\\ \end{array}\right..

If $j,j^{\prime}$ lie in different clusters,

Z_{jj^{\prime}}=\left\{\begin{array}[]{rl}1+p(1-2\eta)\quad;&\text{w. p. }p\eta\\ -1+p(1-2\eta)\quad;&\text{w. p. }p(1-\eta)\\ p(1-2\eta)\quad;&\text{w. p. }1-p\\ \end{array}\right..

One can easily check that in both cases, it holds true that

\begin{aligned}\operatorname*{\varmathbb{E}}[(Z_{jj^{\prime}})^{2}]&=p\big[(1-\eta)(1-p(1-2\eta))^{2}+\eta(1+p(1-2\eta))^{2}+p(1-2\eta)^{2}(1-p)\big]\\ &\leqslant p(1+\eta(1+p)^{2}+p)\leqslant 4p.\end{aligned}

Thus we can conclude that for each $j\in[n]$ , the following holds

\displaystyle\sqrt{\sum_{j^{\prime}=1}^{n}\operatorname*{\varmathbb{E}}[(Z_{jj^{\prime}})^{2}]}

\displaystyle\leqslant\sqrt{4np}=2\sqrt{np}.

Hence, $\widetilde{\sigma}:=\max_{j}\sqrt{\sum_{j^{\prime}=1}^{n}\operatorname*{\varmathbb{E}}[(Z_{jj^{\prime}})^{2}\ ]}\leqslant 2\sqrt{np}$ . Moreover, $\widetilde{\sigma}_{*}:=\max_{j,j^{\prime}}\left\lVert Z_{jj^{\prime}}^{+}\right\rVert_{\infty}=1+p(1-2\eta)\leqslant 2$ . Therefore, we can apply the concentration bound for the norm of symmetric matrices by Bandeira and van Handel [BvH16, Corollary 3.12, Remark 3.13] (recalled in Appendix A.2) with $t=2\sqrt{np}$ , in order to bound $\left\lVert Z\right\rVert=\left\lVert A-\operatorname*{\varmathbb{E}}[A]\right\rVert$ . For any given $0<\varepsilon\leqslant 1/2$ , we have that

\left\lVert A-\operatorname*{\varmathbb{E}}[A]\right\rVert\leqslant\big{(}(1+\varepsilon)4\sqrt{2}+2\big{)}\sqrt{np},

with probability at least $1-n\exp{\Big{(}\frac{-pn}{c_{\varepsilon}}\Big{)}}$ , where $c_{\varepsilon}$ only depends on $\varepsilon$ .

For the second statement, we apply Chernoff’s bound (see Appendix A.1) to the random variables $\overline{D}_{jj}=\sum_{j^{\prime}=1}^{n}\left(A^{+}_{jj^{\prime}}+A^{-}_{jj^{\prime}}\right)$, where we note that $(A^{+}_{jj^{\prime}}+A^{-}_{jj^{\prime}})_{j^{\prime}=1}^{n}$ are independent Bernoulli random variables with mean $p$. Hence, $\operatorname*{\varmathbb{E}}[\overline{D}_{jj}]=\overline{d}=p(n-1)$. Letting $\delta=\sqrt{\frac{6\ln n}{\overline{d}}}$ and assuming that $p>12\frac{\ln n}{n}$ (so that $\delta<1$), we obtain

\operatorname*{\varmathbb{P}}\left[\big{|}\overline{D}_{jj}-\overline{d}\big{|}\geqslant\sqrt{6\overline{d}\ln n}\right]\leqslant\operatorname*{\varmathbb{P}}\left[\big{|}\overline{D}_{jj}-\overline{d}\big{|}\geqslant\sqrt{3np\ln n}\right]\leqslant 2\exp{\big{(}-2\ln n\big{)}}=\frac{2}{n^{2}},

using that $n-1\geqslant\frac{n}{2}$ . Applying the union bound, we finally obtain that

\displaystyle\varmathbb{P}\bigg{(}\|\overline{D}-\varmathbb{E}[\overline{D}]\|\geqslant\sqrt{3np\ln n}\bigg{)}\leqslant\frac{2}{n}.

∎
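The quantities appearing in Lemma 5.4 can be explored empirically. The sketch below (plain Python, standard library only) samples a signed adjacency matrix with the entry distribution used in the proof above: an edge is present with probability $p$, its sign is $+1$ inside a cluster and $-1$ across clusters, and the sign is flipped with probability $\eta$. It then compares the largest degree deviation with $\sqrt{3np\ln n}$. The sampler, its name and the parameter values are illustrative assumptions; a single random draw only illustrates the statement and does not verify it.

```python
import math
import random

def sample_ssbm(sizes, p, eta, seed=0):
    """Sample a symmetric signed adjacency matrix with zero diagonal."""
    rng = random.Random(seed)
    labels = [i for i, n_i in enumerate(sizes) for _ in range(n_i)]
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for j in range(n):
        for jp in range(j + 1, n):
            if rng.random() < p:                 # edge present w.p. p
                sign = 1 if labels[j] == labels[jp] else -1
                if rng.random() < eta:           # sign flipped w.p. eta
                    sign = -sign
                A[j][jp] = A[jp][j] = sign
    return A

def degrees(A):
    """Diagonal of D_bar: number of (positive or negative) neighbours."""
    return [sum(abs(a) for a in row) for row in A]

sizes, p, eta = [40, 30, 30], 0.6, 0.1
A = sample_ssbm(sizes, p, eta, seed=1)
n = len(A)
assert p > 12 * math.log(n) / n                  # assumption of Lemma 5.4
d_bar = p * (n - 1)
max_deviation = max(abs(d - d_bar) for d in degrees(A))
print(max_deviation, math.sqrt(3 * n * p * math.log(n)))
```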

Lemma 5.5.

If $\|A-\varmathbb{E}[A]\|\leqslant\Delta_{A}$ , $\|\overline{D}-\varmathbb{E}[\overline{D}]\|\leqslant\Delta_{D}$ and $p>12\frac{\ln n}{n}$ , then with probability at least $1-\frac{2}{n}$ , it follows that

\displaystyle\|\overline{L_{sym}}-\mathcal{L}_{sym}\|

\displaystyle\leqslant\frac{\Delta_{A}}{\overline{d}}+2\frac{\Delta_{D}}{\overline{d}}+\frac{\Delta_{D}^{2}}{\overline{d}^{2}}.

Proof.

We first note that using the proof of Lemma 5.4, with probability at least $1-\frac{2}{n}$ , we have that $\big{|}\overline{D}_{jj}-\overline{d}\big{|}\leqslant\delta\overline{d},\forall j\in[n]$ , with $\delta<1$ . Consequently,

\displaystyle\|(\varmathbb{E}[\bar{D}])^{-1/2}\bar{D}^{1/2}-I\|

\displaystyle=\max_{j}\left|\sqrt{\frac{\overline{D}_{jj}}{\overline{d}}}-1\right|\leqslant\max_{j}\frac{|\overline{D}_{jj}-\overline{d}|}{\overline{d}}\leqslant\frac{\Delta_{D}}{\overline{d}},

since $|\sqrt{x}-1|\leqslant|x-1|$ for all $x\geqslant 0$. We now apply the first inequality of Proposition C.2 with $A^{-}=\overline{D},A^{+}=A,B^{-}=\operatorname*{\varmathbb{E}}\left[\overline{D}\right],B^{+}=\operatorname*{\varmathbb{E}}\left[A\right]$. We obtain

\displaystyle\|\overline{L_{sym}}-\mathcal{L}_{sym}\|\leqslant\frac{\Delta_{A}}{\overline{d}}+\left\lVert\overline{D}^{-1}\right\rVert\left\lVert A\right\rVert\left(\frac{\Delta_{D}^{2}}{\overline{d}^{2}}+2\frac{\Delta_{D}}{\overline{d}}\right).

It remains to prove that $\left\lVert\overline{D}^{-1}\right\rVert\left\lVert A\right\rVert\leqslant 1$. This holds since $\overline{D}$ is a diagonal matrix, thus $\left\lVert\overline{D}^{-1}\right\rVert\left\lVert A\right\rVert=\left\lVert\overline{D}^{-1}A\right\rVert$ and, similarly to Lemma E.1, it is straightforward to prove that $\left\lVert I-\overline{D}^{-1}A\right\rVert\leqslant 2$, therefore $\left\lVert\overline{D}^{-1}A\right\rVert\leqslant 1$.

∎

Combining the results from Lemma 5.4 and Lemma 5.5, we arrive at the concentration bound for $\|\overline{L_{sym}}-\mathcal{L}_{sym}\|$ .

Lemma 5.6.

Under the assumptions of Theorem 3.4, if $n\geqslant 10$, there exists a universal constant $0<C<43$ such that, with probability at least $1-n\exp(-\frac{np}{c})-\frac{2}{n}$ (with $c$ the universal constant of Lemma 5.4),

\displaystyle\|\overline{L_{sym}}-\mathcal{L}_{sym}\|\leqslant C\sqrt{\frac{\ln n}{np}}.

Proof.

If $p\geqslant\frac{12\ln n}{n}$ , the bounds in Lemma 5.4 hold simultaneously with probability at least $1-n\exp(-\frac{np}{c})-\frac{2}{n}$ and we have, with the notations of Lemma 5.5, $\Delta_{A}\leqslant 12\sqrt{np}$ and $\Delta_{D}\leqslant\sqrt{3np\ln n}$ . Applying Lemma 5.5, we then obtain

\displaystyle\|\overline{L_{sym}}-\mathcal{L}_{sym}\|\leqslant\frac{12\sqrt{np}}{\overline{d}}+2\frac{\sqrt{3np\ln n}}{\overline{d}}+\frac{3np\ln n}{\overline{d}^{2}}\leqslant\frac{24}{\sqrt{np}}+4\sqrt{3}\sqrt{\frac{\ln n}{np}}+\frac{12\ln n}{np}.

If $n\geqslant 10$ , $\ln n\geqslant 1$ and $\sqrt{\frac{\ln n}{np}}\geqslant\frac{1}{\sqrt{np}}$ . Moreover, since $p\geqslant 12\frac{\ln n}{n}$ , then $\frac{\ln n}{np}\leqslant\frac{1}{12}<1$ and $\sqrt{\frac{\ln n}{np}}\geqslant\frac{\ln n}{np}$ . We finally obtain

\displaystyle\|\overline{L_{sym}}-\mathcal{L}_{sym}\|\leqslant(24+4\sqrt{3}+12)\sqrt{\frac{\ln n}{np}}=C\sqrt{\frac{\ln n}{np}},

with $C=24+4\sqrt{3}+12\leqslant 43$ . ∎
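The arithmetic behind the constant $C$ can be reproduced with a few lines of Python. The sketch below evaluates the three-term bound of Lemma 5.5 with $\Delta_{A}=12\sqrt{np}$ and $\Delta_{D}=\sqrt{3np\ln n}$, and compares it with the simplified bound $C\sqrt{\ln n/(np)}$. The function names and the values of $n$ and $p$ are assumptions made for illustration.

```python
import math

C = 24 + 4 * math.sqrt(3) + 12          # constant of Lemma 5.6

def three_term_bound(n, p):
    """Delta_A / d + 2 Delta_D / d + Delta_D^2 / d^2 with d = p (n - 1)."""
    d_bar = p * (n - 1)
    delta_A = 12 * math.sqrt(n * p)
    delta_D = math.sqrt(3 * n * p * math.log(n))
    return delta_A / d_bar + 2 * delta_D / d_bar + delta_D ** 2 / d_bar ** 2

def simplified_bound(n, p):
    """C * sqrt(ln n / (n p))."""
    return C * math.sqrt(math.log(n) / (n * p))

# Constant in the first statement of Lemma 5.4 at epsilon = 1/2.
adjacency_constant = (1 + 0.5) * 4 * math.sqrt(2) + 2

n, p = 1000, 0.1
assert n >= 10 and p >= 12 * math.log(n) / n   # assumptions of Lemma 5.6
print(C, adjacency_constant, three_term_bound(n, p), simplified_bound(n, p))
```

Both quantities are upper bounds with unoptimized constants, so their numerical values are mainly informative when $np/\ln n$ is large.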

5.3 Proof of Theorem 3.4

The proof of this theorem relies on the Davis-Kahan theorem. Using Weyl’s inequality (see Theorem B.1) and Lemma 5.6, we obtain for all $1\leqslant j\leqslant n$ ,

|\lambda_{j}(\overline{L_{sym}})-\lambda_{j}(\mathcal{L}_{sym})|\leqslant C\left(\frac{\ln n}{np}\right)^{1/2}.

In particular, for the $k$ -th smallest eigenvalue,

	$\displaystyle\lambda_{n-k+1}(\overline{L_{sym}})\geqslant\lambda_{n-k+1}(\mathcal{L}_{sym})-C\left(\frac{\ln n}{np}\right)^{1/2},$
	$\displaystyle\lambda_{n-k+1}(\overline{L_{sym}})-\lambda_{n-k+2}(\mathcal{L}_{sym})\geqslant\lambda_{n-k+1}(\mathcal{L}_{sym})-\lambda_{n-k+2}(\mathcal{L}_{sym})-C\left(\frac{\ln n}{np}\right)^{1/2}=\lambda_{gap}-C\left(\frac{\ln n}{np}\right)^{1/2}.$

For $\delta\in(0,1)$, we would like to ensure that

\displaystyle\lambda_{gap}-C\left(\frac{\ln n}{np}\right)^{1/2}>\lambda_{gap}\left(1-\frac{\delta}{2}\right).

(75)

From Lemma 5.3, if $\sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})}$ , then $\lambda_{gap}\geqslant\frac{1}{k}(1-2\eta)$ . Then for the previous condition (75) to hold, it is sufficient that

\displaystyle C\left(\frac{\ln n}{np}\right)^{1/2}<\frac{\delta}{2k}(1-2\eta)\iff p>\left(\frac{2Ck}{\delta(1-2\eta)}\right)^{2}\frac{\ln n}{n}=C(k,\eta,\delta)\frac{\ln n}{n},

(76)

with $C(k,\eta,\delta)=\left(\frac{2Ck}{\delta(1-2\eta)}\right)^{2}$. Since $C(k,\eta,\delta)\geqslant C\geqslant 12$, condition (76) implies that $p>12\frac{\ln n}{n}$.

With this condition, we now apply the Davis-Kahan theorem (Theorem B.2)

\displaystyle\|(I-V_{k-1}(\overline{L_{sym}})V_{k-1}(\overline{L_{sym}})^{T})V_{k-1}(\mathcal{L}_{sym})\|\leqslant\frac{\left\lVert\overline{L_{sym}}-\mathcal{L}_{sym}\right\rVert}{\lambda_{gap}-C\left(\frac{\ln n}{np}\right)^{1/2}}\leqslant\frac{\delta\lambda_{gap}/2}{\lambda_{gap}(1-\delta/2)}=\frac{\delta/2}{1-\delta/2}\leqslant\delta.

Using Proposition B.3, there then exists an orthogonal matrix $O\in\varmathbb R^{(k-1)\times(k-1)}$ so that

\|V_{k-1}(\overline{L_{sym}})-\Theta R_{k-1}O\|\leqslant 2\delta.
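The density requirement (76) and the final Davis-Kahan ratio can be evaluated numerically. The following Python sketch computes $C(k,\eta,\delta)$, the corresponding lower bound on $p$, and the ratio $\frac{\delta/2}{1-\delta/2}$. The function names and the parameter values are hypothetical; the constant $C$ is the one from Lemma 5.6.

```python
import math

C = 24 + 4 * math.sqrt(3) + 12          # constant of Lemma 5.6

def density_constant(k, eta, delta):
    """C(k, eta, delta) = (2 C k / (delta (1 - 2 eta)))^2."""
    return (2 * C * k / (delta * (1 - 2 * eta))) ** 2

def min_edge_probability(n, k, eta, delta):
    """Right-hand side of (76): p must exceed this value."""
    return density_constant(k, eta, delta) * math.log(n) / n

def davis_kahan_ratio(delta):
    """(delta / 2) / (1 - delta / 2), which is at most delta on (0, 1)."""
    return (delta / 2) / (1 - delta / 2)

k, eta, delta = 2, 0.1, 0.5
for n in (10 ** 6, 10 ** 8):
    print(n, min_edge_probability(n, k, eta, delta))
print(davis_kahan_ratio(delta))
```

Since $p\leqslant 1$, condition (76) can only be met when $n$ is large enough for the returned value to be at most 1.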

5.4 Properties of the regularized Laplacian in the sparse regime

The analysis of the signed regularized Laplacian differs from that of the unsigned regularized Laplacian. In particular, Lemma 4.14 cannot be directly applied, since the trimming approach of the adjacency matrix for unsigned graphs is not available in this case. However, we will also use arguments by Le et al. in [LLV15] and [LLV17] for unsigned directed adjacency matrices in the inhomogeneous Erdős-Rényi model $G(n,(p_{jj^{\prime}})_{j,j^{\prime}})$. More precisely, in Section 5.4.1, we will prove that the adjacency matrix concentrates on a large subset of edges called the core. On this subset, the unregularized (resp. regularized) Laplacian also concentrates towards the expected matrix $\mathcal{L}_{sym}$ (resp. $\mathcal{L}_{\gamma}$). In Section 5.4.2, we will show that on the remaining subsets of edges, the norm of the regularized Laplacian is relatively small.

5.4.1 Properties of the signed adjacency and degree matrices

In this section, we adapt the results by [LLV17] for the signed adjacency matrix and the degree matrix in our SSBM. Similarly to Theorem 2.6 [LLV17] (see Theorem A.3), the following lemma shows that the set of edges can be decomposed into a large block, and two blocks with respectively few columns and few rows.

Lemma 5.7.

(Decomposition of the set of edges for the SSBM) Let $A$ be the signed adjacency matrix of a graph sampled from the SSBM. For any $r\geqslant 1$ , with probability at least $1-6n^{-r}$ , the set of edges $[n]\times[n]$ can be partitioned into three classes $\mathcal{N},\mathcal{R}$ and $\mathcal{C}$ such that

1. the signed adjacency matrix concentrates on $\mathcal{N}$: $\|(A-\varmathbb{E}A)_{\mathcal{N}}\|\leqslant Cr^{3/2}\sqrt{\overline{d}(1-\eta)},$ with $C>1$ a constant;

2. $\mathcal{R}$ (resp. $\mathcal{C}$) intersects at most $4n/\overline{d}$ columns (resp. rows) of $[n]\times[n]$;

3. each row (resp. column) of $A_{\mathcal{R}}$ (resp. $A_{\mathcal{C}}$) has at most $128r$ non-zero entries.

Remark 5.8.

We underline that this lemma is valid because the unsigned adjacency matrices $A^{+}$ and $A^{-}$ have disjoint support. We do not know if similar results could be obtained for the Signed Stochastic Block Model defined by Mercado et al. in [MTH16].

Proof.

We denote $A^{\pm}_{sup}$ (resp. $A^{\pm}_{inf}$ ) the upper (resp. lower) triangular part of the unsigned adjacency matrices. Using this decomposition, we have

A=A^{+}_{inf}+A^{+}_{sup}-A^{-}_{inf}-A^{-}_{sup}.

We note that $A^{+}_{inf},A^{+}_{sup},A^{-}_{inf},A^{-}_{sup}$ have disjoint supports, and each of them has independent entries. We can hence apply Theorem A.3 to each of these matrices, where we note that for each matrix

d:=n\max_{j,j^{\prime}}\operatorname*{\varmathbb{E}}[A_{jj^{\prime}}]=np(1-\eta)\leqslant 2\overline{d}(1-\eta).

With probability at least $1-2\times 3n^{-r}$, there exist four partitions $(\mathcal{N}^{\pm}_{inf},\mathcal{R}^{\pm}_{inf},\mathcal{C}^{\pm}_{inf})$ and $(\mathcal{N}^{\pm}_{sup},\mathcal{R}^{\pm}_{sup},\mathcal{C}^{\pm}_{sup})$ of $[n]\times[n]$ with the following properties. For example, for $A^{+}_{inf}$,

•

$\|(A^{+}_{inf}-\varmathbb{E}A^{+}_{inf})_{\mathcal{N}^{+}_{inf}}\|\leqslant Cr^{3/2}\sqrt{d}\leqslant Cr^{3/2}\sqrt{2\overline{d}(1-\eta)}$ ;
•

$\mathcal{R}^{+}_{inf}$ (resp. $\mathcal{C}^{+}_{inf}$ ) intersects at most $n/d\leqslant n/\overline{d}$ columns (resp. rows) of $[n]\times[n]$ ;
•

each row (resp. column) of $(A^{+}_{inf})_{\mathcal{R}}$ (resp. $(A^{+}_{inf})_{\mathcal{C}}$ ) has at most $32r$ ones.

We note that this decomposition holds simultaneously for $A^{\pm}_{inf}$ and $A^{\pm}_{sup}$ . Taking the unions of these subsets,

\mathcal{N}=\mathcal{N}^{+}_{inf}\cup\mathcal{N}^{+}_{sup}\cup\mathcal{N}^{-}_{inf}\cup\mathcal{N}^{-}_{sup},

and similarly for $\mathcal{R}$ and $\mathcal{C}$ , we have, with the triangle inequality

\begin{aligned}\|(A-\varmathbb{E}A)_{\mathcal{N}}\|&=\|(A^{+}_{inf}-\varmathbb{E}A^{+}_{inf})_{\mathcal{N}^{+}_{inf}}+(A^{+}_{sup}-\varmathbb{E}A^{+}_{sup})_{\mathcal{N}^{+}_{sup}}-(A^{-}_{inf}-\varmathbb{E}A^{-}_{inf})_{\mathcal{N}^{-}_{inf}}-(A^{-}_{sup}-\varmathbb{E}A^{-}_{sup})_{\mathcal{N}^{-}_{sup}}\|\\ &\leqslant\|(A^{+}_{inf}-\varmathbb{E}A^{+}_{inf})_{\mathcal{N}^{+}_{inf}}\|+\|(A^{+}_{sup}-\varmathbb{E}A^{+}_{sup})_{\mathcal{N}^{+}_{sup}}\|+\|(A^{-}_{inf}-\varmathbb{E}A^{-}_{inf})_{\mathcal{N}^{-}_{inf}}\|+\|(A^{-}_{sup}-\varmathbb{E}A^{-}_{sup})_{\mathcal{N}^{-}_{sup}}\|\\ &\leqslant 4Cr^{3/2}\sqrt{d}\leqslant C_{1}r^{3/2}\sqrt{\overline{d}(1-\eta)},\end{aligned}

with $C_{1}=4C\sqrt{2}$. Moreover, each row of $A_{\mathcal{R}}$ (resp. each column of $A_{\mathcal{C}}$) has at most $2\times 32r$ entries equal to 1 and $2\times 32r$ entries equal to $-1$, that is, at most $128r$ non-zero entries. Finally, $\mathcal{R}$ (resp. $\mathcal{C}$) intersects at most $4n/\overline{d}$ columns (resp. rows) of $[n]\times[n]$. ∎
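The first step of the proof, namely the splitting $A=A^{+}_{inf}+A^{+}_{sup}-A^{-}_{inf}-A^{-}_{sup}$ into four unsigned matrices with disjoint supports, can be sketched as follows in plain Python. The function names, the dictionary keys and the small example matrix are hypothetical and only illustrate the construction.

```python
def split_signed_adjacency(A):
    """Split A into four 0/1 matrices: positive or negative entries,
    restricted to the lower ('inf') or upper ('sup') triangular part."""
    n = len(A)
    names = ("pos_inf", "pos_sup", "neg_inf", "neg_sup")
    parts = {name: [[0] * n for _ in range(n)] for name in names}
    for j in range(n):
        for jp in range(n):
            if j == jp or A[j][jp] == 0:
                continue
            sign = "pos" if A[j][jp] > 0 else "neg"
            triangle = "sup" if j < jp else "inf"
            parts[sign + "_" + triangle][j][jp] = 1
    return parts

def recombine(parts):
    """Return A^+_inf + A^+_sup - A^-_inf - A^-_sup."""
    n = len(parts["pos_inf"])
    return [[parts["pos_inf"][j][jp] + parts["pos_sup"][j][jp]
             - parts["neg_inf"][j][jp] - parts["neg_sup"][j][jp]
             for jp in range(n)] for j in range(n)]

A = [[0, 1, -1, 0],
     [1, 0, 0, -1],
     [-1, 0, 0, 1],
     [0, -1, 1, 0]]
parts = split_signed_adjacency(A)
# Disjoint supports: every position is non-zero in at most one part.
overlap = max(sum(parts[name][j][jp] for name in parts)
              for j in range(len(A)) for jp in range(len(A)))
print(recombine(parts) == A, overlap)
```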

For the degree matrix $\overline{D}$ , we use inequality (4.3) from [LLV17]. Recall that the degree of node $j$ is $\overline{D}_{jj}=\sum_{j^{\prime}=1}^{n}(A^{+}_{jj^{\prime}}+A^{-}_{jj^{\prime}})$ which is a sum of $n$ independent Bernoulli variables with bounded variance $d/n$ . We can thus find an upper bound on the error $\|\overline{D}-\varmathbb{E}[\overline{D}]\|_{F}$ . This bound is weaker than the one obtained in Lemma 5.4 with the assumption $p\gtrsim\frac{\ln n}{n}$ .

Lemma 5.9.

There exists a constant $C^{\prime}>0$ such that for any $r\geqslant 1$, with probability at least $1-e^{-2r}$, it holds that

\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}\leqslant C^{\prime}r^{2}nd\leqslant 2C^{\prime}r^{2}n\overline{d}(1-\eta).

5.4.2 Properties of the regularized Laplacian outside the core

In this section, we will bound the norm of the Signed Laplacian restricted to the subsets of edges $\mathcal{R}$ and $\mathcal{C}$. The following “restriction lemma” is an extension of Lemma 8.1 in [LLV15] to Signed Laplacian matrices.

Lemma 5.10.

(Restriction of Signed Laplacian) Let $B$ be an $n\times n$ symmetric matrix, $B_{\gamma}$ its regularized form as described in Section 2.2, and $\mathcal{C}\subset[n]\times[n]$. We denote by $\overline{D}_{\gamma}$ the regularized degree matrix, by $\overline{L}_{\gamma}=\overline{D}_{\gamma}^{-1/2}B_{\gamma}\overline{D}_{\gamma}^{-1/2}$ the modified “Laplacian”, and by $B_{\mathcal{C}}$ the $n\times n$ matrix obtained from $B$ by setting the entries outside of $\mathcal{C}$ to 0. Let $0<\varepsilon<1$ be such that the degree of each node in $(B_{\gamma})_{\mathcal{C}}$ is less than $\varepsilon$ times the corresponding degree in $B_{\gamma}$. Then we have

\|(\overline{L}_{\gamma})_{\mathcal{C}}\|\leqslant\sqrt{\varepsilon}.

Proof.

We denote $\overline{D}_{r}$ (resp. $\overline{D}_{c}$ ) the degree matrix of $(B_{\gamma})_{\mathcal{C}}$ (resp. $(B_{\gamma})_{\mathcal{C}}^{T}$ ) and $\tilde{L}$ its regularized “Laplacian” (it is not necessarily a symmetric matrix) where

\tilde{L}=(\overline{D}^{1/2}_{r})^{\dagger}(B_{\gamma})_{\mathcal{C}}(\overline{D}^{1/2}_{c})^{\dagger}.

By definition of $\overline{L}_{\gamma}$ , $(\overline{L}_{\gamma})_{\mathcal{C}}=\overline{D}^{-1/2}_{\gamma}(B_{\gamma})_{\mathcal{C}}\overline{D}_{\gamma}^{-1/2}$ . Since in $(B_{\gamma})_{\mathcal{C}}$ , some entries in $B$ are set to 0, we have that for all $1\leqslant j\leqslant n$ ,

(\overline{D}_{c})_{jj}\leqslant[\overline{D}_{\gamma}]_{jj}.

Moreover, by assumption, $(\overline{D}_{r})_{jj}\leqslant\varepsilon[\overline{D}_{\gamma}]_{jj}$ . We denote $X=(\overline{D}^{1/2}_{r})^{\dagger}$ , $Y=(\overline{D}^{1/2}_{c})^{\dagger}$ and $Z=\overline{D}_{\gamma}^{-1/2}$ , and now we have

\displaystyle\overline{L}_{\mathcal{C}}

\displaystyle=ZB_{\mathcal{C}}Z=ZX^{\dagger}XB_{\mathcal{C}}YY^{\dagger}Z=ZX^{\dagger}\tilde{L}Y^{\dagger}Z.

Because $\|ZX^{\dagger}\|\leqslant\sqrt{\varepsilon}$ and $\|Y^{\dagger}Z\|\leqslant 1$ , by sub-multiplicativity of the norm, we thus obtain

\displaystyle\|\overline{L}_{\mathcal{C}}\|\leqslant\|ZX^{\dagger}\|\cdot\|\tilde{L}\|\cdot\|Y^{\dagger}Z\|\leqslant\sqrt{\varepsilon}\|\tilde{L}\|.

In addition, by considering the $2n\times 2n$ symmetric matrix $\widetilde{L}^{\prime}$

\widetilde{L}^{\prime}=\begin{pmatrix}0_{n}&\tilde{L}\\ \tilde{L}^{T}&0_{n}\end{pmatrix},

we have $\|\widetilde{L}^{\prime}\|=\|\widetilde{L}\|\leqslant 1$ . In fact, $\widetilde{L}^{\prime}$ is equal to the identity matrix minus the regularized Laplacian of

\begin{pmatrix}0_{n}&(B_{\gamma})_{\mathcal{C}}\\ (B_{\gamma})^{T}_{\mathcal{C}}&0_{n}\end{pmatrix}.

Using Appendix E, we can conclude that the eigenvalues of $\widetilde{L}^{\prime}$ are between -1 and 1, leading to $\|\widetilde{L}^{\prime}\|\leqslant 1$ . Hence, we finally arrive at $\|(\overline{L}_{\gamma})_{\mathcal{C}}\|\leqslant\sqrt{\varepsilon}$ . ∎

Remark 5.11.

We note that this lemma is not specific to the rows of the matrix $B$ , and one could also derive the same lemma with the assumptions on the columns of the matrix.

5.5 Error bounds w.r.t. the expected regularized Laplacian and expected Signed Laplacian

In this section, we prove an upper bound on the errors $\left\lVert L_{\gamma}-\mathcal{L}_{\gamma}\right\rVert$ and $\left\lVert L_{\gamma}-\mathcal{L}_{sym}\right\rVert$ from Theorem 3.7. We will use the decomposition of the set of edges $(\mathcal{N},\mathcal{R},\mathcal{C})$ from Lemma 5.7, and sum the errors on each of these subsets of edges. We recall that on the subset $\mathcal{N}$ , we have an upper bound on $\|(A-\varmathbb{E}A)_{\mathcal{N}}\|$ . We will also use the fact that the regularized degrees $[\overline{D}_{\gamma}]_{jj}$ are lower-bounded by the regularization parameter $\gamma$ . On the subsets $\mathcal{R}$ and $\mathcal{C}$ , we will use Lemma 5.10 to upper bound the norm of the regularized Laplacian.

Lemma 5.12.

Under the assumptions of Theorem 3.7, for any $r\geqslant 1$ , with probability at least $1-7e^{-2r}$ , we have

\|L_{\gamma}-\mathcal{L}_{\gamma}\|\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}}.

(77)

Proof.

Let $L_{\gamma}-\mathcal{L}_{\gamma}=S+T$ with

	$\displaystyle S$	$\displaystyle=(\overline{D}_{\gamma})^{-1/2}A_{\gamma}(\overline{D}_{\gamma})^{-1/2}-(\overline{D}_{\gamma})^{-1/2}\varmathbb{E}A_{\gamma}(\overline{D}_{\gamma})^{-1/2}=(\overline{D}_{\gamma})^{-1/2}(A_{\gamma}-\varmathbb{E}A_{\gamma})(\overline{D}_{\gamma})^{-1/2},$
	$\displaystyle T$	$\displaystyle=(\overline{D}_{\gamma})^{-1/2}\varmathbb{E}A_{\gamma}(\overline{D}_{\gamma})^{-1/2}-(\varmathbb{E}\overline{D}_{\gamma})^{-1/2}\varmathbb{E}A_{\gamma}(\varmathbb{E}\overline{D}_{\gamma})^{-1/2}.$

We will bound the norm of $S+T$ on $\mathcal{N}$ , and the norms of $L_{\gamma}$ and $\mathcal{L}_{\gamma}$ on the residuals $\mathcal{R},\mathcal{C}$ . We first use the triangle inequality to obtain

	$\displaystyle\|L_{\gamma}-\mathcal{L}_{\gamma}\|$	$\displaystyle\leqslant\|\left(L_{\gamma}-\mathcal{L}_{\gamma}\right)_{\mathcal{N}}\|+\|\left((L_{\gamma}-I)-(\mathcal{L}_{\gamma}-I)\right)_{\mathcal{R}}\|+\|\left(L_{\gamma}-\mathcal{L}_{\gamma}\right)_{\mathcal{C}}\|$
		$\displaystyle\leqslant\|\left(L_{\gamma}-\mathcal{L}_{\gamma}\right)_{\mathcal{N}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{C}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{C}}\|$
		$\displaystyle=\|\left(S+T\right)_{\mathcal{N}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{C}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{C}}\|$
		$\displaystyle\leqslant\|S_{\mathcal{N}}\|+\|T_{\mathcal{N}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{C}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{C}}\|.$

1. Bounding the norm $\|T_{\mathcal{N}}\|$ .

Denoting $\gamma=\gamma^{+}+\gamma^{-}$ , we have that

$\displaystyle\|T_{\mathcal{N}}\|^{2}\leqslant\|T_{\mathcal{N}}\|_{F}^{2}$	$\displaystyle=\sum_{j,j^{\prime}=1}^{n}T_{jj^{\prime}}^{2}$
	$\displaystyle=\sum_{j,j^{\prime}=1}^{n}\left(\varmathbb{E}A_{jj^{\prime}}+(\gamma^{+}-\gamma^{-})/n\right)^{2}\left[\frac{1}{\sqrt{(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)}}-\frac{1}{\overline{d}+\gamma}\right]^{2}$	(78)
	$\displaystyle\leqslant\frac{(\overline{d}+\gamma)^{2}}{2n^{2}\gamma^{6}}\left[\sum_{j=1}^{n}(\overline{D}_{jj}+\gamma)^{2}\sum_{j^{\prime}=1}^{n}(\overline{D}_{j^{\prime}j^{\prime}}-\overline{d})^{2}+n(\overline{d}+\gamma)^{2}\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}\right].$	(79)

To upper bound (78) by (79), we have used the simplification trick in the proof of [LLV17, Theorem 4.1] which we now recall. Firstly, the second factor of (78) can be upper bounded in the following way. For $1\leqslant j,j^{\prime}\leqslant n$ ,

$\displaystyle\left\lvert\frac{1}{\sqrt{(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)}}-\frac{1}{\overline{d}+\gamma}\right\rvert$	$\displaystyle=\frac{\lvert(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)-(\overline{d}+\gamma)^{2}\rvert}{(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)(\overline{d}+\gamma)+\sqrt{(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)}(\overline{d}+\gamma)^{2}}$
	$\displaystyle\leqslant\frac{\lvert(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)-(\overline{d}+\gamma)^{2}\rvert}{2\gamma^{3}}$
	$\displaystyle=\frac{\lvert(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)-(\overline{d}+\gamma)(\overline{D}_{jj}+\gamma)+(\overline{d}+\gamma)(\overline{D}_{jj}+\gamma)-(\overline{d}+\gamma)^{2}\rvert}{2\gamma^{3}}$
	$\displaystyle=\frac{\lvert(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}-\overline{d})+(\overline{d}+\gamma)(\overline{D}_{jj}-\overline{d})\rvert}{2\gamma^{3}},$	(80)

where the inequality comes from the fact that $\overline{D}_{jj}+\gamma\geqslant\gamma$ . Secondly, we use the inequality $(a+b)^{2}\leqslant 2(a^{2}+b^{2})$ and we recall that by definition, we can bound the first factor of (78) by $|\varmathbb{E}(A_{\gamma})_{jj^{\prime}}|\leqslant\frac{\overline{d}+\gamma}{n}$ . This finally leads to (79).
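The scalar inequality behind (80) is easy to sanity-check numerically. The sketch below is illustrative only: the names `lhs`, `rhs` and `count_violations` are hypothetical, and `a`, `b`, `c` stand for $\overline{D}_{jj}+\gamma$, $\overline{D}_{j^{\prime}j^{\prime}}+\gamma$ and $\overline{d}+\gamma$, each assumed to be at least $\gamma$.

```python
import math
import random

def lhs(a, b, c):
    """|1/sqrt(a*b) - 1/c|, the left-hand side of (80)."""
    return abs(1.0 / math.sqrt(a * b) - 1.0 / c)

def rhs(a, b, c, gamma):
    """|a*b - c^2| / (2*gamma^3), the upper bound used in (80)."""
    return abs(a * b - c * c) / (2.0 * gamma ** 3)

def count_violations(gamma, trials=10000, seed=0):
    """Count random triples (a, b, c), all >= gamma, that violate the bound."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        a = gamma + rng.uniform(0.0, 20.0)
        b = gamma + rng.uniform(0.0, 20.0)
        c = gamma + rng.uniform(0.0, 20.0)
        if lhs(a, b, c) > rhs(a, b, c, gamma) + 1e-12:
            violations += 1
    return violations

violations_small_gamma = count_violations(gamma=0.5)
violations_large_gamma = count_violations(gamma=5.0)
```

No violation is expected, since the exact denominator $abc+\sqrt{ab}\,c^{2}$ is at least $2\gamma^{3}$ whenever $a,b,c\geqslant\gamma$.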

Now we will bound each term of (79). Using Lemma 5.9, we have, for any $r\geqslant 1$ , with probability at least $1-e^{-2r}$ ,

\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}\leqslant 2C^{\prime}r^{2}n\overline{d}(1-\eta)\leqslant 2C^{\prime}r^{2}n\overline{d}.

If this holds, then the first term of (79) is upper bounded by

	$\displaystyle\sum_{j=1}^{n}(\overline{D}_{jj}+\gamma)^{2}\sum_{j^{\prime}=1}^{n}(\overline{D}_{j^{\prime}j^{\prime}}-\overline{d})^{2}$	$\displaystyle\leqslant\left(2\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}+2n(\overline{d}+\gamma)^{2}\right)\sum_{j^{\prime}=1}^{n}(\overline{D}_{j^{\prime}j^{\prime}}-\overline{d})^{2}$
		$\displaystyle\leqslant 2C^{\prime}r^{2}n\overline{d}\left(4C^{\prime}r^{2}n\overline{d}+2n(\overline{d}+\gamma)^{2}\right)$
		$\displaystyle\leqslant 2C^{\prime}r^{2}n(\overline{d}+\gamma)\left(4C^{\prime}r^{2}n\overline{d}+2n(\overline{d}+\gamma)^{2}\right)$
		$\displaystyle\leqslant 2C^{\prime}r^{2}n(\overline{d}+\gamma)\left(2(2C^{\prime}+1)r^{2}n(\overline{d}+\gamma)^{2}\right)$
		$\displaystyle\leqslant C_{1}r^{4}n^{2}(\overline{d}+\gamma)^{3},$

with $C_{1}=4C^{\prime}(2C^{\prime}+1)$ . Similarly, we can bound the second term of (79)

\displaystyle n(\overline{d}+\gamma)^{2}\sum_{j=1}^{n}(\overline{D}_{jj}-\overline{d})^{2}

\displaystyle\leqslant 2C^{\prime}(\overline{d}+\gamma)^{2}r^{2}n^{2}\overline{d}\leqslant 2C^{\prime}(\overline{d}+\gamma)^{3}r^{2}n^{2}.

Hence, we obtain the following upper bound of (79)

\displaystyle\|T_{\mathcal{N}}\|^{2}

\displaystyle\leqslant\frac{(C_{1}+2C^{\prime})r^{4}}{2\gamma^{6}}(\overline{d}+\gamma)^{5}=\frac{C_{2}r^{4}}{\gamma}\left(1+\frac{\overline{d}}{\gamma}\right)^{5},

(81)

with $C_{2}=(C_{1}+2C^{\prime})/2$ .

2. Bounding the norm $\|S_{\mathcal{N}}\|$ .

We first note that

\displaystyle S

\displaystyle=(\overline{D}_{\gamma})^{-1/2}(A_{\gamma}-\varmathbb{E}A_{\gamma})(\overline{D}_{\gamma})^{-1/2}=(\overline{D}_{\gamma})^{-1/2}(A-\varmathbb{E}A)(\overline{D}_{\gamma})^{-1/2}.

We also recall that each diagonal entry of $\overline{D}_{\gamma}$ is at least $\gamma$ , so that $\|\overline{D}_{\gamma}^{-1/2}\|\leqslant\gamma^{-1/2}$ . Hence, using Lemma 5.7, with probability at least $1-6n^{-r}$ , we have

\displaystyle\|S_{\mathcal{N}}\|

\displaystyle\leqslant\|\overline{D}_{\gamma}^{-1/2}\|\>\|(A-\varmathbb{E}A)_{\mathcal{N}}\|\>\|\overline{D}_{\gamma}^{-1/2}\|\leqslant\|(A-\varmathbb{E}A)_{\mathcal{N}}\|/\gamma\leqslant\frac{Cr^{3/2}}{\gamma}\sqrt{\overline{d}(1-\eta)}\leqslant\frac{Cr^{3/2}}{\gamma}\sqrt{\overline{d}}.

(82)

Taking the square root in (81) and adding the bound in (82), we have the intermediate result

$\displaystyle\|(L_{\gamma}-\mathcal{L}_{\gamma})_{\mathcal{N}}\|$	$\displaystyle\leqslant\frac{Cr^{3/2}}{\gamma}\sqrt{\overline{d}}+\frac{\sqrt{C_{2}}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}$	(83)
	$\displaystyle\leqslant\frac{r^{2}}{\sqrt{\gamma}}\left(C\sqrt{\frac{\overline{d}}{\gamma}}+\sqrt{C_{2}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}\right)$	(84)
	$\displaystyle\leqslant\frac{r^{2}}{\sqrt{\gamma}}(C+\sqrt{C_{2}})\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}=\frac{C_{3}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2},$	(85)

with $C_{3}=C+\sqrt{C_{2}}$ .

3. Bounding $\left\lVert\left(L_{\gamma}\right)_{\mathcal{R}}\right\rVert,\left\lVert\left(L_{\gamma}\right)_{\mathcal{C}}\right\rVert,\left\lVert\left(\mathcal{L}_{\gamma}\right)_{\mathcal{R}}\right\rVert,\left\lVert\left(\mathcal{L}_{\gamma}\right)_{\mathcal{C}}\right\rVert$ .

As shown in the proof of Lemma 5.7, each row of $A_{\mathcal{R}}$ has at most $128r$ nonzero entries and intersects at most $4n/\overline{d}$ columns. Thus, for all $1\leqslant j\leqslant n$ ,

\displaystyle\sum_{j^{\prime}=1}^{n}\left[(A^{+}_{\gamma}+A^{-}_{\gamma})_{\mathcal{R}}\right]_{jj^{\prime}}

\displaystyle\leqslant 128r+\frac{4\gamma}{\overline{d}}=\gamma\left(\frac{128r}{\gamma}+\frac{4}{\overline{d}}\right)\leqslant\sum_{j^{\prime}}\left[A^{+}_{\gamma}+A^{-}_{\gamma}\right]_{jj^{\prime}}\left(\frac{128r}{\gamma}+\frac{4}{\overline{d}}\right),

as $\sum_{j^{\prime}}[A^{+}_{\gamma}+A^{-}_{\gamma}]_{jj^{\prime}}\geqslant n\times\left(\frac{\gamma^{+}}{n}+\frac{\gamma^{-}}{n}\right)=\gamma$ . We can thus apply Lemma 5.10 with $\varepsilon=\frac{128r}{\gamma}+\frac{4}{\overline{d}}$ , and we arrive at

\left\lVert(L_{\gamma})_{\mathcal{R}}\right\rVert\leqslant\sqrt{\frac{128r}{\gamma}+\frac{4}{\overline{d}}}.

We also obtain the same bound for $\|(L_{\gamma})_{\mathcal{C}}\|$ . Similarly, we have $\sum_{j^{\prime}}\left[\operatorname*{\varmathbb{E}}[A^{+}_{\gamma}]+\operatorname*{\varmathbb{E}}[A^{-}_{\gamma}]\right]_{jj^{\prime}}=(n-1)p+\gamma=\overline{d}+\gamma\geqslant\gamma$ and

\displaystyle\sum_{j^{\prime}=1}^{n}\left[(\operatorname*{\varmathbb{E}}[A^{+}_{\gamma}]+\operatorname*{\varmathbb{E}}[A^{-}_{\gamma}])_{\mathcal{R}}\right]_{jj^{\prime}}

\displaystyle\leqslant 4\frac{np}{\overline{d}}+\frac{4\gamma}{\overline{d}}\leqslant 8+\frac{4\gamma}{\overline{d}}=\gamma\left(\frac{8}{\gamma}+\frac{4}{\overline{d}}\right)\leqslant\sum_{j^{\prime}}\left[\operatorname*{\varmathbb{E}}[A^{+}_{\gamma}]+\operatorname*{\varmathbb{E}}[A^{-}_{\gamma}]\right]_{jj^{\prime}}\left(\frac{8}{\gamma}+\frac{4}{\overline{d}}\right).

We arrive at $\left\lVert(\mathcal{L}_{\gamma})_{\mathcal{R}}\right\rVert\leqslant\sqrt{\frac{8}{\gamma}+\frac{4}{\overline{d}}}$ , and finally, we also have $\left\lVert(\mathcal{L}_{\gamma})_{\mathcal{C}}\right\rVert\leqslant\sqrt{\frac{8}{\gamma}+\frac{4}{\overline{d}}}$ .

4. Bounding $\|L_{\gamma}-\mathcal{L}_{\gamma}\|$ .

Summing up the bounds obtained in the first three steps, with probability at least $1-e^{-2r}-6n^{-r}\geqslant 1-7e^{-2r}$ , we finally arrive at the bound

	$\displaystyle\|L_{\gamma}-\mathcal{L}_{\gamma}\|$	$\displaystyle\leqslant\frac{C_{3}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+2\sqrt{\frac{128r}{\gamma}+\frac{4}{\overline{d}}}+2\sqrt{\frac{8}{\gamma}+\frac{4}{\overline{d}}}$
		$\displaystyle\leqslant\frac{C_{3}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+4\sqrt{\frac{128r}{\gamma}+\frac{4}{\overline{d}}}$
		$\displaystyle\leqslant\frac{C_{3}r^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}}.$

∎

This bound readily yields a bound on the norm of $L_{\gamma}-\mathcal{L}_{sym}$ .

Corollary 5.13.

(Error bound of the regularized Laplacian) With the notations of Theorem 3.4 and Theorem 3.7, and $\gamma=\gamma^{+}+\gamma^{-}$ , we have

\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\frac{Cr^{2}}{\sqrt{\gamma}}\left(1+\frac{\overline{d}}{\gamma}\right)^{5/2}+\frac{32\sqrt{2r}}{\sqrt{\gamma}}+\frac{8}{\sqrt{\overline{d}}}+\frac{\gamma}{\overline{d}+\gamma}=:\Delta_{L}(\gamma,\overline{d}).

(86)

In particular, for the choice $\gamma=\overline{d}^{7/8}$ , if $p\geqslant 2/n$ , we obtain

\displaystyle\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\left(128Cr^{2}+1\right)\overline{d}^{-1/8}.

Proof.

By the triangle inequality,

\displaystyle\|L_{\gamma}-\mathcal{L}_{sym}\|

\displaystyle\leqslant\|L_{\gamma}-\mathcal{L}_{\gamma}\|+\|\mathcal{L}_{\gamma}-\mathcal{L}_{sym}\|.

For the second term on the RHS, we have

\displaystyle\|\mathcal{L}_{\gamma}-\mathcal{L}_{sym}\|

\displaystyle=\left\lVert\frac{1}{\overline{d}+\gamma}\varmathbb{E}A-\frac{1}{\overline{d}}\varmathbb{E}A\right\rVert=\frac{\gamma}{\overline{d}(\overline{d}+\gamma)}\|\varmathbb{E}A\|\leqslant\frac{\gamma}{\overline{d}+\gamma}.

(87)

The last inequality comes from the fact that $\left\lVert\varmathbb{E}A\right\rVert\leqslant(n-1)p(1-\eta)\leqslant\overline{d}$ . Thus, by summing the bound obtained in Lemma 5.12 and (87), we arrive at the expected result in (86). Moreover, if $\gamma\leqslant\overline{d}$ , since $C>1$ , one can readily verify that

\displaystyle\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant 128Cr^{2}\frac{\overline{d}^{\frac{5}{2}}}{\gamma^{3}}+\frac{\gamma}{\overline{d}}.

(88)

If $\gamma=\overline{d}^{7/8}$ , then $\gamma\leqslant\overline{d}$ holds provided $\overline{d}\geqslant 1$ or equivalently, $p\geqslant\frac{1}{n-1}$ . The latter is ensured if $p\geqslant 2/n$ (since $n\geqslant 2$ ). Plugging this in (88), we then obtain the bound

\displaystyle\|L_{\gamma}-\mathcal{L}_{sym}\|\leqslant\left(128Cr^{2}+1\right)\overline{d}^{-1/8}.

This concludes the proof of Corollary 5.13 and Theorem 3.7. ∎
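The passage from (86) to the simplified rate can be checked numerically. The sketch below evaluates $\Delta_{L}(\gamma,\overline{d})$ term by term and compares it with $(128Cr^{2}+1)\overline{d}^{-1/8}$ for $\gamma=\overline{d}^{7/8}$ . The function names are hypothetical, and the numerical values chosen for the constant $C$ are placeholders, since the analysis only asserts that $C>1$ .

```python
import math

def delta_L(gamma, d_bar, r, C):
    """Right-hand side of (86), evaluated term by term."""
    term_N = C * r ** 2 / math.sqrt(gamma) * (1.0 + d_bar / gamma) ** 2.5
    term_R = 32.0 * math.sqrt(2.0 * r) / math.sqrt(gamma)
    term_C = 8.0 / math.sqrt(d_bar)
    term_bias = gamma / (d_bar + gamma)   # from ||L_gamma (expected) - L_sym (expected)||
    return term_N + term_R + term_C + term_bias

def simplified_bound(d_bar, r, C):
    """The simplified rate (128 C r^2 + 1) * d_bar^(-1/8)."""
    return (128.0 * C * r ** 2 + 1.0) * d_bar ** (-1.0 / 8.0)

# Compare both expressions for gamma = d_bar^(7/8) over a few illustrative values.
comparisons = []
for d_bar in (1.0, 10.0, 1e3, 1e6):
    for r in (1.0, 2.0, 5.0):
        for C in (1.5, 3.0):              # placeholder values of the constant C
            gamma = d_bar ** (7.0 / 8.0)
            comparisons.append(
                (delta_L(gamma, d_bar, r, C), simplified_bound(d_bar, r, C))
            )
```

In each pair the first entry should not exceed the second, in line with the corollary for $\overline{d}\geqslant 1$ and $r\geqslant 1$ .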

5.6 Error bound on the eigenspaces and mis-clustering rate in the sparse regime

This section provides a bound on the misalignment error of the eigenspaces of $L_{\gamma}$ and $\mathcal{L}_{sym}$ , which then leads to a bound on the mis-clustering rate of the $k$ -means clustering step.

5.6.1 Eigenspace alignment

Using the bound from Corollary 5.13, we can perform the same analysis of the eigenspaces of $L_{\gamma}$ and $\mathcal{L}_{sym}$ , as in Theorem 3.4, which will prove Theorem 3.9. We apply, once again, Weyl’s inequality and the Davis-Kahan theorem to bound the distance between the two subspaces $\mathcal{R}(V_{k-1}(L_{\gamma}))$ and $\mathcal{R}(V_{k-1}(\mathcal{L}_{sym}))$ . We have that

\displaystyle\lambda_{n-k+1}(L_{\gamma})-\lambda_{n-k+2}(\mathcal{L}_{sym})\geqslant\lambda_{gap}-\|L_{\gamma}-\mathcal{L}_{sym}\|\geqslant\lambda_{gap}-\Delta_{L}(\gamma,\overline{d}),

using Corollary 5.13. If $\gamma=\gamma_{0}\overline{d}^{7/8}$ , then

\displaystyle\Delta_{L}(\gamma,\overline{d})

\displaystyle\leqslant\left(128Cr^{2}+1\right)\overline{d}^{-1/8}=:\frac{C_{4}}{\overline{d}^{1/8}},

with $C_{4}=128Cr^{2}+1$ . For $0<\delta<1/2$ , we would like to ensure that

\displaystyle\lambda_{gap}-\Delta_{L}(\gamma,\overline{d})\geqslant\lambda_{gap}\left(1-\frac{\delta}{2}\right).

Hence, using the lower bound on the eigengap from Lemma 5.3, it suffices that

\displaystyle\lambda_{gap}-\frac{C_{4}}{\overline{d}^{1/8}}\geqslant\lambda_{gap}\left(1-\frac{\delta}{2}\right)\iff\overline{d}^{1/8}\geqslant\frac{2kC_{4}}{\delta(1-2\eta)}\iff p\geqslant\left(\frac{2kC_{4}}{\delta(1-2\eta)}\right)^{8}\frac{1}{n-1}.

Thus, the condition $p\geqslant\left(\frac{2kC_{4}}{\delta(1-2\eta)}\right)^{8}\frac{2}{n}$ is sufficient. Applying the Davis-Kahan theorem, we arrive at

\displaystyle\|(I-V_{k-1}(L_{\gamma})V_{k-1}(L_{\gamma})^{T})V_{k-1}(\mathcal{L}_{sym})\|\leqslant\frac{\delta\lambda_{gap}/2}{\lambda_{gap}(1-\delta/2)}\leqslant\frac{\delta/2}{1-\delta/2}\leqslant\delta,

and using once again Proposition B.3, there exists an orthogonal matrix $O\in\varmathbb R^{(k-1)\times(k-1)}$ such that

\|V_{k-1}(L_{\gamma})-\Theta R_{k-1}O\|\leqslant 2\delta.
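The quantity controlled by the Davis-Kahan step, $\|(I-VV^{T})W\|$ for two matrices with orthonormal columns, can be computed directly. The following is a minimal sketch assuming NumPy; the helper names and the toy diagonal matrix with unit eigengap are illustrative and unrelated to the SSBM.

```python
import numpy as np

def bottom_eigenvectors(M, m):
    """Eigenvectors of a symmetric matrix M for its m smallest eigenvalues."""
    _, vecs = np.linalg.eigh(M)           # eigh returns eigenvalues in ascending order
    return vecs[:, :m]

def subspace_distance(V, W):
    """Spectral norm of (I - V V^T) W, for V and W with orthonormal columns."""
    residual = W - V @ (V.T @ W)
    return np.linalg.norm(residual, 2)

rng = np.random.default_rng(0)
M = np.diag([0.0, 0.0, 1.0, 1.0, 1.0])    # eigengap 1 between the bottom two and the rest
E = rng.standard_normal((5, 5))
E = (E + E.T) / 2.0                       # symmetric perturbation
E *= 0.01 / np.linalg.norm(E, 2)          # rescaled to spectral norm 0.01

V_clean = bottom_eigenvectors(M, 2)
V_noisy = bottom_eigenvectors(M + E, 2)
dist = subspace_distance(V_noisy, V_clean)
```

With a perturbation that is small relative to the eigengap, `dist` stays small, which is the mechanism exploited above with $\|L_{\gamma}-\mathcal{L}_{sym}\|$ in place of $\|E\|$ .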

5.7 Proof of Theorem 3.12

In this section, we finally prove our result on the clustering performance of the Signed Laplacian and regularized Laplacian algorithms. The proof essentially relies on the following lemma, which provides a lower bound on the distance between two rows of $\Delta^{-1}R_{k-1}$ , with $\Delta=\text{diag}(\sqrt{n_{i}})$ .

Lemma 5.14.

For all $1\leqslant i\neq i^{\prime}\leqslant k$ , we have $\left\lVert(R_{k-1})_{i*}-(R_{k-1})_{i^{\prime}*}\right\rVert\geqslant 1.$ Moreover, for all $i\in[k]$ , it holds that

\min_{\begin{subarray}{c}i^{\prime}\in[k],i^{\prime}\neq i\\ j\in C_{i},j^{\prime}\in C_{i^{\prime}}\end{subarray}}\left\lVert(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j^{\prime}*}\right\rVert^{2}\geqslant\frac{2}{3n_{i}}.

Proof.

Recall from (68) that $\overline{C}=p(1-2\eta)uu^{T}+\text{diag}(d_{i})$ , with $d_{i}=u_{i}^{2}+\left(1+\frac{p}{\overline{d}}(1-2\eta)\right)$ and $u_{i}=\sqrt{\frac{n_{i}}{\overline{d}}},1\leqslant i\leqslant k$ . Moreover, from (69), $\overline{C}=R\Lambda R^{T}$ with $R=[R_{k-1}\>\gamma_{1}]$ and $\gamma_{1}$ the eigenvector of $\overline{C}$ associated with its largest eigenvalue. We first show that the entries of $\gamma_{1}$ are necessarily of the same sign, i.e. $(\gamma_{1})_{i}\geqslant 0,\forall i$ or $(\gamma_{1})_{i}\leqslant 0,\forall i$ . In fact, by definition, $\gamma_{1}$ is the solution of

\max_{\left\lVert v\right\rVert=1}v^{T}\overline{C}v=\max_{\left\lVert v\right\rVert=1}p(1-2\eta)(v^{T}u)^{2}+\sum_{i=1}^{k}d_{i}v_{i}^{2}.

(89)

Since all the entries of $u$ are positive, any solution $\gamma_{1}$ of (89) necessarily has entries of the same sign: otherwise, one could replace the negative entries $(\gamma_{1})_{i}$ by $-(\gamma_{1})_{i}$ , which leaves the second term of the objective unchanged and increases $(v^{T}u)^{2}$ .

Let $i\neq i^{\prime}\in[k]$ . As $R$ has orthonormal rows,

\displaystyle\langle R_{i*},R_{i^{\prime}*}\rangle=0\iff\langle(R_{k-1})_{i*},(R_{k-1})_{i^{\prime}*}\rangle+\underbrace{(\gamma_{1})_{i}(\gamma_{1})_{i^{\prime}}}_{\geqslant 0}=0\implies\langle(R_{k-1})_{i*},(R_{k-1})_{i^{\prime}*}\rangle\leqslant 0.

Hence,

	$\displaystyle\left\lVert(R_{k-1})_{i*}-(R_{k-1})_{i^{\prime}*}\right\rVert^{2}$	$\displaystyle=\left\lVert(R_{k-1})_{i*}\right\rVert^{2}+\left\lVert(R_{k-1})_{i^{\prime}*}\right\rVert^{2}-2\underbrace{\langle(R_{k-1})_{i*},(R_{k-1})_{i^{\prime}*}\rangle}_{\leqslant 0}$
		$\displaystyle\geqslant\left\lVert(R_{k-1})_{i*}\right\rVert^{2}+\left\lVert(R_{k-1})_{i^{\prime}*}\right\rVert^{2}$
		$\displaystyle=2-\underbrace{[(\gamma_{1})_{i}^{2}+(\gamma_{1})_{i^{\prime}}^{2}]}_{\leqslant 1}\geqslant 1.$

In particular, this implies that $R_{k-1}$ has $k$ distinct rows. Now let $j,j^{\prime}\in[n]$ such that $j\in C_{i}$ and $j^{\prime}\in C_{i^{\prime}}$ . Recalling that with $\Delta=\text{diag}(\sqrt{n_{i}})$ , $V_{k-1}(\mathcal{L}_{sym})=\Theta R_{k-1}=\Theta\Delta\Delta^{-1}R_{k-1}=\hat{\Theta}\Delta^{-1}R_{k-1}$ , we have

\displaystyle\begin{cases}(\Delta^{-1}R_{k-1})_{j*}=\frac{1}{\sqrt{n_{i}}}(R_{k-1})_{i*},&\\ (\Delta^{-1}R_{k-1})_{j^{\prime}*}=\frac{1}{\sqrt{n_{i^{\prime}}}}(R_{k-1})_{i^{\prime}*}.\end{cases}

Hence,

	$\displaystyle\left\lVert(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j^{\prime}*}\right\rVert^{2}$	$\displaystyle=\frac{1}{n_{i}}\left\lVert(R_{k-1})_{i*}\right\rVert^{2}+\frac{1}{n_{i^{\prime}}}\left\lVert(R_{k-1})_{i^{\prime}*}\right\rVert^{2}-2\frac{1}{\sqrt{n_{i}n_{i^{\prime}}}}\underbrace{\langle(R_{k-1})_{i*},(R_{k-1})_{i^{\prime}*}\rangle}_{\leqslant 0}$
		$\displaystyle\geqslant\frac{1}{n_{i}}\left\lVert(R_{k-1})_{i*}\right\rVert^{2}+\frac{1}{n_{i^{\prime}}}\left\lVert(R_{k-1})_{i^{\prime}*}\right\rVert^{2}$
		$\displaystyle=\frac{1}{n_{i}}+\frac{1}{n_{i^{\prime}}}-\frac{(\gamma_{1})_{i}^{2}}{n_{i}}-\frac{(\gamma_{1})_{i^{\prime}}^{2}}{n_{i^{\prime}}}$
		$\displaystyle\geqslant\frac{1}{n_{i}}+\frac{1}{n_{i^{\prime}}}-\frac{(\gamma_{1})_{i}^{2}+(\gamma_{1})_{i^{\prime}}^{2}}{ns}\geqslant\frac{1}{n_{i}}+\frac{1}{n_{i^{\prime}}}-\frac{1}{ns}\geqslant\frac{1}{n_{i}}+\frac{1}{nl}-\frac{1}{ns}.$

Besides, we know that $\frac{1}{nl}\geqslant\frac{\rho}{n_{i}}$ and $\frac{1}{ns}\leqslant\frac{1}{\rho n_{i}}$ . Therefore, we obtain the bound

\displaystyle\left\lVert(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j^{\prime}*}\right\rVert^{2}\geqslant\frac{1}{n_{i}}\left(1+\rho-\frac{1}{\rho}\right).

We will now prove that with the condition $\sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})}$ , we have $1+\rho-\frac{1}{\rho}\geqslant\frac{2}{3}$ and this will lead to the final result. First, we note that $\rho>1-\frac{1}{2k(2+\sqrt{k})}$ and $2k(2+\sqrt{k})\geqslant 12$ , and $\frac{2k(2+\sqrt{k})}{2k(2+\sqrt{k})-1}\leqslant\frac{5}{4}$ for $k\geqslant 2$ . Thus,

\displaystyle 1+\rho-\frac{1}{\rho}

\displaystyle\geqslant 2-\frac{1}{2k(2+\sqrt{k})}-\frac{2k(2+\sqrt{k})}{2k(2+\sqrt{k})-1}\geqslant 2-\frac{1}{12}-\frac{5}{4}=\frac{2}{3}.

∎
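The last step of the proof can be verified numerically over a range of $k$ . The sketch below uses hypothetical helper names; it evaluates $1+\rho-\frac{1}{\rho}$ at the smallest aspect ratio allowed by the condition $\sqrt{\rho}>1-\frac{1}{4k(2+\sqrt{k})}$ , which is the worst case because the expression is increasing in $\rho$ .

```python
import math

def rho_threshold(k):
    """Smallest rho allowed by sqrt(rho) >= 1 - 1 / (4k(2 + sqrt(k)))."""
    x = 1.0 / (4.0 * k * (2.0 + math.sqrt(k)))
    return (1.0 - x) ** 2

def separation_factor(rho):
    """The factor 1 + rho - 1/rho multiplying 1/n_i in the lower bound."""
    return 1.0 + rho - 1.0 / rho

# Worst-case factor for each k in an illustrative range.
worst_case = {k: separation_factor(rho_threshold(k)) for k in range(2, 201)}
smallest_factor = min(worst_case.values())
```

The smallest value is attained at $k=2$ and remains above $2/3$ , consistent with the lemma.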

Remark 5.15.

In the equal-size case $n_{i}=\frac{n}{k},\forall 1\leqslant i\leqslant k$ , since $\gamma_{1}=\chi_{1}$ , $R_{k-1}$ has orthogonal rows and

\displaystyle\left\lVert(R_{k-1})_{i*}-(R_{k-1})_{i^{\prime}*}\right\rVert^{2}=\left\lVert R_{i*}-R_{i^{\prime}*}\right\rVert^{2}=2.

This implies that

\displaystyle\left\lVert(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j^{\prime}*}\right\rVert^{2}=\frac{2k}{n}.

From Lemma 5.14, we have that for all $1\leqslant i\leqslant k$ , $\min\limits_{\begin{subarray}{c}i^{\prime}\in[k],i^{\prime}\neq i\\ j\in C_{i},j^{\prime}\in C_{i^{\prime}}\end{subarray}}\left\lVert(\Delta^{-1}R_{k-1})_{j*}-(\Delta^{-1}R_{k-1})_{j^{\prime}*}\right\rVert^{2}\geqslant\frac{2}{3n_{i}}.$ Hence with $\delta_{i}^{2}:=\frac{2}{3n_{i}}$ and using Lemma 4.19, we obtain

	$\displaystyle\sum_{i=1}^{k}\delta_{i}^{2}|S_{i}|=\sum_{i=1}^{k}\frac{2|S_{i}|}{3n_{i}}$	$\displaystyle\leqslant 4(4+2\xi)\left\lVert V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})\right\rVert_{F}^{2}$
		$\displaystyle\leqslant 2(16+8\xi)(k-1)\left\lVert V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})O\right\rVert^{2}$
		$\displaystyle\leqslant 8(16+8\xi)(k-1)\delta^{2},$

using Theorem 3.4. Moreover, we have

	$\displaystyle\left\lVert V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})\right\rVert_{F}^{2}$	$\displaystyle\leqslant 2(k-1)\left\lVert V_{k-1}(\overline{L_{sym}})-V_{k-1}(\mathcal{L}_{sym})O\right\rVert^{2}$
		$\displaystyle\leqslant 8(k-1)\delta^{2}<8(k-1)\frac{1}{12(16+8\xi)(k-1)}=\frac{n_{i}\delta_{i}^{2}}{16+8\xi},\forall 1\leqslant i\leqslant k.$

Therefore, we can use the second part of Lemma 4.19 and finally conclude that

\displaystyle\sum_{i=1}^{k}\frac{|S_{i}|}{n_{i}}\leqslant 96(2+\xi)\delta^{2}.

For the regularized Laplacian algorithm, the same computations are valid using the result from Theorem 3.9.

6 Numerical experiments

In this section, we report on the outcomes of numerical experiments that compare our two proposed algorithms with a suite of state-of-the-art methods from the signed clustering literature. We rely on a previous Python implementation of SPONGE and Signed Laplacian (along with their respective normalized versions), and of other methods from the literature. (Python implementations of a suite of algorithms for signed clustering are available at https://github.com/alan-turing-institute/signet, made available in the context of previous work of a subset of the authors of the present paper [CDGT19].) More specifically, we consider algorithms based on the adjacency matrix $A$ , the Signed Laplacian matrix $\overline{L}$ , its symmetrically normalized version $\overline{L}_{sym}$ [KSL⁺10], SPONGE and its normalized version SPONGE_sym, and the two algorithms introduced in [CWD12] that optimize the Balanced Ratio Cut and the Balanced Normalized Cut objectives.

We remark that once the low-dimensional embedding has been computed by any of the considered algorithms, the final partition is obtained after running $k$ -means++ [AV07], which improves over the popular $k$ -means algorithm by employing a careful seeding initialization procedure and is the typical choice in practice.

6.1 Grid search for choosing the parameters $\tau^{+},\tau^{-}$

In the following experiments, the Signed Stochastic Block Model will be sampled with the following set of parameters:

• the number of nodes $n=5000$ ,
• the number of communities $k\in\{3,5,10,20\}$ ,
• the relative size of communities $\rho=1$ (equal-size clusters) and $\rho=1/k$ (unequal-size clusters).

For the edge density parameter $p$ , we choose two sparsity regimes, “Regime I” and “Regime II”, where Regime II is strictly harder than Regime I, in the sense that for the same value of $k$ , the edge density in Regime I is significantly larger than in Regime II. The noise level $\eta$ is chosen such that the recovery of the clusters is unsatisfactory for a subset of pairs of parameters $(\tau^{+},\tau^{-})$ . For each set of parameters, we sample 20 graphs from the SSBM and average the resulting ARI.

Our experimental setup is summarized in the following steps:

1. Select a set of parameters $(k,\rho,p,\eta)$ from the regime of interest;
2. Sample a graph from the SSBM $(n,k,\rho,p,\eta)$ ;
3. Extract the largest connected component of the measurement graph (regardless of the sign of the edges);
4. If the size of the latter is too small ( $<n/2$ ), resample a graph until successful;
5. For each pair of parameters $(\tau^{+},\tau^{-})$ , compute the $k$ -dimensional embeddings using the SPONGE_sym algorithm (with the implementation in the signet package [CDGT19]);
6. Obtain a partition of the graph into $k$ clusters using the scikit-learn implementation of the $k$ -means++ algorithm, and compute the ARI between this estimated partition and the ground-truth clusters;
7. Repeat steps 2-6 20 times.
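As an illustration of step 2, the following sketch samples a signed adjacency matrix with NumPy. It assumes the usual SSBM convention of [CDGT19]: each pair of nodes is connected independently with probability $p$ , the clean sign is $+1$ within a cluster and $-1$ across clusters, and each sign is flipped independently with probability $\eta$ . The function name and its arguments are hypothetical, and it is not the sampler from the signet package; the connected-component extraction, embedding and $k$ -means++ steps are not covered here.

```python
import numpy as np

def sample_ssbm(n, k, p, eta, sizes=None, seed=0):
    """Sample a signed adjacency matrix and ground-truth labels (illustrative SSBM)."""
    rng = np.random.default_rng(seed)
    if sizes is None:
        # Equal-size clusters (up to rounding) when no sizes are given.
        sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    labels = np.repeat(np.arange(k), sizes)
    same_cluster = labels[:, None] == labels[None, :]
    clean_sign = np.where(same_cluster, 1, -1)

    upper = np.triu(np.ones((n, n), dtype=bool), 1)      # strict upper triangle
    present = (rng.random((n, n)) < p) & upper           # edge present w.p. p
    flipped = rng.random((n, n)) < eta                   # sign flipped w.p. eta
    noisy_sign = np.where(flipped, -clean_sign, clean_sign)

    A = np.where(present, noisy_sign, 0)
    A = A + A.T                                          # symmetrize, zero diagonal
    return A, labels

A, labels = sample_ssbm(n=60, k=3, p=0.3, eta=0.1, seed=0)
```

With `eta=0`, every sampled edge inside a cluster is positive and every edge between clusters is negative, which is the noiseless structure that the clustering algorithms aim to recover.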

The results in the dense regimes are reported in Figure 1, while those for the sparse regimes are reported in Figure 2. This set of results indicates that the gradient of the ARI in the space of parameters $(\tau^{+},\tau^{-})$ is larger when the cluster sizes are very unbalanced and the edge density is low. We attribute this to the fact that, for suitably chosen values, the parameters $(\tau^{+},\tau^{-})$ perform a form of regularization of the graph that can significantly improve the clustering performance.

Figure 1: Heatmaps of the Adjusted Rand Index between the ground truth and the partition obtained using the SPONGE_sym algorithm with varying regularization parameters $(\tau^{+},\tau^{-})$ , for an SSBM in Regime I, with $n=5000$ and $k\in\{3,5,10,20\}$ clusters of equal sizes (left column) and unequal sizes (right column).

6.2 Comparison of a suite of spectral methods

This section compares the performance of several spectral clustering algorithms. We rely on the same notation used in [CDGT19] when mentioning the names of the SPONGE algorithms, namely SPONGE and SPONGE_sym. The complete list of algorithms compared is as follows:

• the combinatorial (un-normalized) Signed Laplacian $\overline{L}=\overline{D}-A$ ,
• the symmetric Signed Laplacian $\overline{L}_{sym}=I-\overline{D}^{-1/2}A\overline{D}^{-1/2}$ ,
• SPONGE and SPONGE_sym with a suitably chosen pair of parameters $(\tau^{+},\tau^{-})$ ,
• the Balanced Ratio Cut $L_{BRC}=D^{+}-A$ ,
• the Balanced Normalized Cut $L_{BNC}=D^{-1/2}(D^{+}-A)D^{-1/2}$ .

For the combinatorial and symmetric Signed Laplacians $\overline{L}$ and $\overline{L}_{sym}$ , we compute $(k-1)$ -dimensional embeddings before applying the $k$ -means++ algorithm. For all other methods, we use the eigenvectors associated with the $k$ smallest eigenvalues.
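A minimal NumPy sketch of the symmetric Signed Laplacian embedding is given below. It assumes that $\overline{D}$ is the diagonal matrix of absolute degrees $\overline{D}_{ii}=\sum_{j}|A_{ij}|$ and that the embedding uses the eigenvectors of the smallest eigenvalues; the function names are hypothetical, and this dense eigensolver is not the signet implementation.

```python
import numpy as np

def signed_laplacian_sym(A):
    """Symmetric Signed Laplacian I - D^{-1/2} A D^{-1/2}, with D_ii = sum_j |A_ij|."""
    deg = np.abs(A).sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])      # isolated nodes keep a zero row
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

def spectral_embedding(L, dim):
    """Eigenvectors of the symmetric matrix L for its dim smallest eigenvalues."""
    _, vecs = np.linalg.eigh(L)                          # ascending eigenvalues
    return vecs[:, :dim]

# Toy balanced graph: two clusters of 3 nodes, +1 inside clusters, -1 across.
s = np.array([1, 1, 1, -1, -1, -1])
A_toy = np.outer(s, s) - np.eye(6)
L_toy = signed_laplacian_sym(A_toy)
eigenvalues = np.linalg.eigvalsh(L_toy)
embedding = spectral_embedding(L_toy, dim=1)             # k - 1 = 1 dimension
```

On this noiseless example the one-dimensional embedding takes one sign on the first cluster and the opposite sign on the second, so any reasonable clustering of the rows separates them.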

In this experiment, we fix the parameters $n=5000,k\in\{3,5,10,20\}$ and $p,\eta$ in a certain set, and for each plot, we vary the aspect ratio $\rho\in(0,1]$ . The relative proportions of the classes $s_{i}=\frac{n_{i}}{n}$ are chosen according to the following procedure:

1. Fix $s_{1}^{\prime}=1/k$ , pick a value for $\rho$ and compute $s_{k}^{\prime}=s_{1}^{\prime}/\rho$ .
2. For $i\in[2,k-1]$ , sample $s_{i}^{\prime}$ from the uniform distribution in the interval $[s_{1}^{\prime},s_{k}^{\prime}]$ .
3. Compute the proportions $s_{i}=\frac{s_{i}^{\prime}}{\sum_{i=1}^{k}s_{i}^{\prime}}$ , and then sample the graph from the resulting SSBM.
4. Repeat steps 1-3 above 20 times, and record the average performance over the 20 runs.
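Steps 1-3 of this procedure can be sketched in a few lines of Python. The function name and the use of the standard `random` module are illustrative choices; how the proportions are rounded to integer cluster sizes $n_{i}$ is not specified in the text and is left out.

```python
import random

def sample_proportions(k, rho, seed=None):
    """Sample class proportions s_1, ..., s_k with smallest/largest ratio equal to rho."""
    rng = random.Random(seed)
    s_first = 1.0 / k                  # step 1: s'_1 = 1/k
    s_last = s_first / rho             # step 1: s'_k = s'_1 / rho (requires 0 < rho <= 1)
    middle = [rng.uniform(s_first, s_last) for _ in range(k - 2)]   # step 2
    raw = [s_first] + middle + [s_last]
    total = sum(raw)
    return [s / total for s in raw]    # step 3: normalize so the proportions sum to 1

proportions = sample_proportions(k=5, rho=0.25, seed=0)
aspect_ratio = min(proportions) / max(proportions)
```

By construction the first class is the smallest and the last class the largest, so the ratio of the smallest to the largest proportion equals $\rho$ after normalization.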

The results are reported in Figure 3. We note that in almost all settings, the SPONGE_sym algorithm outperforms the other clustering methods, in particular for low values of the aspect ratio $\rho$ . With the exception of the symmetric Signed Laplacian, most methods seem to perform worse when the aspect ratio $\rho$ is lower, meaning that the clusters are more unbalanced, which is a more challenging regime.

Figure 3: Performance of the various clustering algorithms, as measured by the Adjusted Rand Index, versus the aspect ratio $\rho$ for an SSBM with $k\in\{3,5,10,20\}$ and $n=5000$ . For larger numbers of clusters, $k=10$ and especially $k=20$ , SPONGE_sym is essentially the only algorithm able to produce meaningful results, and clearly outperforms all the other methods. Note that no regularization has been used throughout this set of experiments.

6.3 Performance of the regularized algorithms in the sparse regime

In this final batch of experiments, we study how the regularized Signed Laplacian and the SPONGE_sym sparse algorithms perform. We consider sparse settings of the SSBM ( $p\leqslant 0.003$ ) with $n=5000$ nodes. For the SPONGE_sym algorithm, we fix the parameters $(\tau^{+},\tau^{-})$ in each setting. Our parameter selection procedure is to choose a pair of parameters that leads to a “good” recovery of the clusters for the unregularized algorithm (see Figure 2). We perform a grid search on the parameters $(\gamma^{+},\gamma^{-})$ for each of the two regularized algorithms (see Figure 4 and Figure 5). For the regularized Signed Laplacian algorithm, we observe distinct regions of performance on the space of parameters $(\gamma^{+},\gamma^{-})$ . This is not predictable from our theoretical results, where the positive and negative regularization parameters play symmetric roles. We conjecture this to be due to the difference in density between the positive and negative subgraphs in our signed random graph model. For the SPONGE_sym sparse algorithm, we note that the gradient of performance in the heatmaps (Figure 4, Figure 5) is similar to what was reported in Figure 2, which could be due to the fact that the parameters $(\tau^{+},\tau^{-})$ already have a regularization effect.
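For concreteness, the regularized operator entering the grid search over $(\gamma^{+},\gamma^{-})$ can be sketched as follows. The sketch assumes, consistently with Section 5.5, that $\gamma^{+}/n$ and $\gamma^{-}/n$ are added to every entry of the positive and negative parts of $A$ respectively, so that the regularized degrees equal $\overline{D}_{jj}+\gamma^{+}+\gamma^{-}$ ; the function name and the toy graph are hypothetical, and this is not the implementation used in the experiments.

```python
import numpy as np

def regularized_normalized_adjacency(A, gamma_plus, gamma_minus):
    """Return D_gamma^{-1/2} A_gamma D_gamma^{-1/2} and the regularized degrees."""
    n = A.shape[0]
    A_plus = np.maximum(A, 0).astype(float)      # positive part of A
    A_minus = np.maximum(-A, 0).astype(float)    # negative part of A (as nonnegative weights)
    A_plus_reg = A_plus + gamma_plus / n         # add gamma^+/n to every entry
    A_minus_reg = A_minus + gamma_minus / n      # add gamma^-/n to every entry
    A_reg = A_plus_reg - A_minus_reg
    deg_reg = (A_plus_reg + A_minus_reg).sum(axis=1)   # D_jj + gamma^+ + gamma^-
    scale = 1.0 / np.sqrt(deg_reg)               # requires gamma^+ + gamma^- > 0
    return scale[:, None] * A_reg * scale[None, :], deg_reg

# Toy sparse signed graph with an isolated node (index 2).
A_sparse = np.array([[0, 1, 0, -1],
                     [1, 0, 0, 0],
                     [0, 0, 0, 0],
                     [-1, 0, 0, 0]])
M_reg, deg_reg = regularized_normalized_adjacency(A_sparse, gamma_plus=0.5, gamma_minus=0.3)
L_reg = np.eye(4) - M_reg                        # regularized Laplacian, under the sign convention assumed here
```

Even the isolated node receives a regularized degree of $\gamma^{+}+\gamma^{-}$ , which is the lower bound on the degrees exploited in the proof of Lemma 5.12.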

7 Concluding remarks and future research directions

In this work, we provided a thorough theoretical analysis of the robustness of the SPONGE_sym and symmetric Signed Laplacian algorithms, for graphs generated from a Signed Stochastic Block Model. Under this model, the sign of the edges (rather than the usual discrepancy of the edge densities across clusters versus within clusters) is an essential attribute which induces the underlying cluster structure of the graph. We proved that our signed clustering algorithms, based on suitably defined matrix operators, are able to recover the clusters under certain favorable noise regimes, and under two regimes of edge sparsity. Although the sparse setting is particularly challenging, our algorithms based on regularized graphs perform well, provided that the regularization parameters are suitably chosen.

One theoretical question that has not been answered yet relates to the choice of the positive and negative regularization parameters $\gamma^{+},\gamma^{-}$ . Having a data-driven approach to tune the regularization parameters would be of great use in many practical applications involving very sparse graphs. An interesting future line of work would be to study the latest regularization techniques based on powers of adjacency matrices or certain graph distance matrices, in the context of sparse signed graphs.

Yet another approach is to consider a pre-processing stage that performs low-rank matrix completion on the adjacency matrix, whose output could subsequently be used as input for our proposed algorithms. An extension of the Cheeger inequality to the setting of signed graphs, analogous to the generalized Cheeger inequality previously explored in [CKC⁺16], is another interesting research question. Extensions to the time-dependent setting and online clustering [LSS16, MWD⁺18], or when covariate information is available [YS20], are further research directions worth exploring, well motivated by real-world applications involving signed networks.

References

[ABARS20] Emmanuel Abbe, Enric Boix-Adserà, Peter Ralli, and Colin Sandon, Graph powering and spectral robustness, SIAM Journal on Mathematics of Data Science 2 (2020), no. 1, 132–157.
[Abb17] Emmanuel Abbe, Community detection and stochastic block models: recent developments, The Journal of Machine Learning Research 18 (2017), no. 1, 6446–6531.
[ACBL13] Arash A. Amini, Aiyou Chen, Peter J. Bickel, and Elizaveta Levina, Pseudo-likelihood methods for community detection in large sparse networks, The Annals of Statistics 41 (2013), no. 4, 2097–2122.
[ADHP09] Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning 75 (2009), no. 2, 245–248.
[ASW15] Saeed Aghabozorgi, Ali Seyed Shirkhorshidi, and Teh Ying Wah, Time-series clustering–a decade review, Information Systems 53 (2015), 16–38.
[AV07] David Arthur and Sergei Vassilvitskii, K-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (USA), SODA ’07, Society for Industrial and Applied Mathematics, 2007, p. 1027–1035.
[BBC04] Nikhil Bansal, Avrim Blum, and Shuchi Chawla, Correlation clustering, Mach. Learn. 56 (2004), no. 1-3, 89–113.
[Bha96] R. Bhatia, Matrix analysis, Springer New York, 1996.
[BSG⁺12] Sujogya Banerjee, Kaushik Sarkar, Sedat Gokalp, Arunabha Sen, and Hasan Davulcu, Partitioning signed bipartite graphs for classification of individuals and organizations, International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, Springer, 2012, pp. 196–204.
[BSS13] Afonso S Bandeira, Amit Singer, and Daniel A Spielman, A Cheeger inequality for the graph Connection Laplacian, SIAM Journal on Matrix Analysis and Applications 34 (2013), no. 4, 1611–1630.
[BvH16] Afonso S. Bandeira and Ramon van Handel, Sharp nonasymptotic bounds on the norm of random matrices with independent entries, Ann. Probab. 44 (2016), no. 4, 2479–2506.
[CCT12] Kamalika Chaudhuri, Fan Chung, and Alexander Tsiatas, Spectral clustering of graphs with general degrees in the extended planted partition model, 25th Annual Conference on Learning Theory (Edinburgh, Scotland) (Shie Mannor, Nathan Srebro, and Robert C. Williamson, eds.), Proceedings of Machine Learning Research, vol. 23, JMLR Workshop and Conference Proceedings, 25–27 Jun 2012, pp. 35.1–35.23.
[CDGT19] Mihai Cucuringu, Peter Davies, Aldo Glielmo, and Hemant Tyagi, Sponge: A generalized eigenproblem for clustering signed networks, Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 89, PMLR, 16–18 Apr 2019, pp. 1088–1098.
[CHN⁺14] Kai-Yang Chiang, Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S. Dhillon, and Ambuj Tewari, Prediction and clustering in signed networks: A local to global perspective, Journal of Machine Learning Research 15 (2014), 1177–1213.
[Chu96] Fan RK Chung, Laplacians of graphs and Cheeger’s inequalities, Combinatorics, Paul Erdos is Eighty 2 (1996), no. 157-172, 13–2.
[Chu05] Fan Chung, Laplacians and the Cheeger inequality for directed graphs, Annals of Combinatorics 9 (2005), no. 1, 1–19.
[CKC⁺16] Mihai Cucuringu, Ioannis Koutis, Sanjay Chawla, Gary Miller, and Richard Peng, Simple and scalable constrained clustering: a generalized spectral method, Artificial Intelligence and Statistics Conference (AISTATS) 2016 51 (2016), 445–454.
[CPv19] Mihai Cucuringu, Andrea Pizzoferrato, and Yves van Gennip, An MBO scheme for clustering and semi-supervised clustering of signed networks, To appear in Communications in Mathematical Sciences (2019), arXiv:1901.03091.
[CR11] Fan Chung and Mary Radcliffe, On the spectra of general random graphs, Electronic Journal of Combinatorics 18 (2011), no. 1, Paper 215, 14. MR 2853072
[Cuc15] M. Cucuringu, Synchronization over Z₂ and community detection in multiplex networks with constraints, Journal of Complex Networks 3 (2015), 469–506.
[CWD12] Kai-Yang Chiang, Joyce Jiyoung Whang, and Inderjit S. Dhillon, Scalable clustering of signed networks using balance normalized cut, Proceedings of the 21st ACM International Conference on Information and Knowledge Management (New York, NY, USA), CIKM ’12, Association for Computing Machinery, 2012, p. 615–624.
[CWP⁺16] Lingyang Chu, Zhefeng Wang, Jian Pei, Jiannan Wang, Zijin Zhao, and Enhong Chen, Finding gangs in war from signed networks, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1505–1514.
[Das08] Sanjoy Dasgupta, The hardness of k-means clustering, Technical Report CS2007-0890, University of California, San Diego, 2008.
[DEFI06] Erik D. Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica, Correlation clustering in general weighted graphs, Theor. Comput. Sci. 361 (2006), no. 2, 172–187.
[DK70] Chandler Davis and W. M. Kahan, The rotation of eigenvectors by a perturbation. iii, SIAM Journal on Numerical Analysis 7 (1970), no. 1, 1–46.
[DMT18] Tyler Derr, Yao Ma, and Jiliang Tang, Signed graph convolutional networks, 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018, pp. 929–934.
[epi] Epinions data set, http://www.epinions.com, Accessed: 2010-09-30.
[Foc05] Sergio M Focardi, Clustering economic and financial time series: Exploring the existence of stable correlation conditions, The Intertek Group (2005).
[FSK⁺12] André Fujita, Patricia Severino, Kaname Kojima, João Ricardo Sato, Alexandre Galvão Patriota, and Satoru Miyano, Functional clustering of time series gene expression data by Granger causality, BMC systems biology 6 (2012), no. 1, 137.
[Gal13] Jean H. Gallier, Notes on elementary spectral graph theory. applications to graph clustering using normalized cuts, CoRR abs/1311.2492 (2013).
[Gal16] Jean Gallier, Spectral theory of unsigned and signed graphs. applications to graph clustering: a survey, CoRR abs / 1601.04692 (2016), 1–122.
[HCD12] Cho-Jui Hsieh, Kai-Yang Chiang, and Inderjit S. Dhillon, Low-Rank Modeling of Signed Networks, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012.
[HLN15] Gyeong-Gyun Ha, Jae Woo Lee, and Ashadun Nobi, Threshold network of a financial market using the p-value of correlation coefficients, Journal of the Korean Physical Society 66 (2015), no. 12, 1802–1808.
[Imb09] Roberto Imbuzeiro Oliveira, Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges, arXiv e-prints (2009), arXiv:0911.0600.
[J⁺15] Jiashun Jin et al., Fast community detection by score, The Annals of Statistics 43 (2015), no. 1, 57–89.
[JJSK16] Jinhong Jung, Woojeong Jin, Lee Sael, and U Kang, Personalized ranking in signed networks using signed random walk with restart, 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 973–978.
[JY16] Antony Joseph and Bin Yu, Impact of regularization on spectral clustering, Annals of Statistics 44 (2016), no. 4, 1765–1791.
[Kny01] A. Knyazev, Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method, SIAM Journal on Scientific Computing 23 (2001), no. 2, 517–541.
[KSL⁺10] Jérôme Kunegis, Stephan Schmidt, Andreas Lommatzsch, Jürgen Lerner, Ernesto W. De Luca, and Sahin Albayrak, Spectral analysis of signed graphs for clustering, prediction and visualization, pp. 559–570, SIAM, 2010.
[KSS04] A. Kumar, Y. Sabharwal, and S. Sen, A simple linear time (1 + $\varepsilon$ )-approximation algorithm for k-means clustering in any dimensions, 45th Annual IEEE Symposium on Foundations of Computer Science, 2004, pp. 454–462.
[KSS14] Srijan Kumar, Francesca Spezzano, and VS Subrahmanian, Accurately detecting trolls in Slashdot Zoo via decluttering, 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), IEEE, 2014, pp. 188–195.
[KSSF16] Srijan Kumar, Francesca Spezzano, V.S. Subrahmanian, and Christos Faloutsos, Edge weight prediction in weighted signed networks, ICDM, 2016.
[LFZ19] Xiaoming Li, Hui Fang, and Jie Zhang, Supervised user ranking in signed social networks, Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 184–191.
[LGT14] James R Lee, Shayan Oveis Gharan, and Luca Trevisan, Multiway spectral partitioning and higher-order Cheeger inequalities, Journal of the ACM (JACM) 61 (2014), no. 6, 1–30.
[LHK10] J. Leskovec, D. Huttenlocher, and J. Kleinberg, Predicting positive and negative links in online social networks, WWW, 2010, pp. 641–650.
[Li94] Ren-Cang Li, On perturbations of matrix pencils with real spectra, Mathematics of Computation 62 (1994), no. 205, 231–265.
[LLV15] Can M. Le, Elizaveta Levina, and Roman Vershynin, Sparse random graphs: regularization and concentration of the laplacian, 2015.
[LLV17] Can M. Le, Elizaveta Levina, and Roman Vershynin, Concentration and regularization of random graphs, Random Structures & Algorithms 51 (2017), no. 3, 538–561.
[LR15] Jing Lei and Alessandro Rinaldo, Consistency of spectral clustering in stochastic block models, Ann. Statist. 43 (2015), no. 1, 215–237.
[LSS16] Edo Liberty, Ram Sriharsha, and Maxim Sviridenko, An algorithm for online k-means clustering, 2016 Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), SIAM, 2016, pp. 81–89.
[LTJC20] Yu Li, Yuan Tian, Zhang Jiawei, and Yi Chang, Learning signed network embedding via graph attention, Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020), 4772–4779.
[MNV12] Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan, The planar k-means problem is NP-hard, Theoretical Computer Science 442 (2012), 13 – 21.
[MSB14] Ekaterina Merkurjev, Justin Sunu, and Andrea L. Bertozzi, Graph MBO method for multiclass segmentation of hyperspectral stand-off detection video, Image Processing (ICIP), 2014 IEEE International Conference on, IEEE, 2014, pp. 689–693.
[MTH16] Pedro Mercado, Francesco Tudisco, and Matthias Hein, Clustering signed networks with the geometric mean of laplacians, Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, eds.), Curran Associates, Inc., 2016, pp. 4421–4429.
[MTH19] Pedro Mercado, Francesco Tudisco, and Matthias Hein, Spectral clustering of signed graphs via matrix power means, 36th International Conference on Machine Learning (Long Beach, California, USA) (Kamalika Chaudhuri and Ruslan Salakhutdinov, eds.), Proceedings of Machine Learning Research, vol. 97, PMLR, 09–15 Jun 2019, pp. 4526–4536.
[MU05] Michael Mitzenmacher and Eli Upfal, Probability and computing: Randomized algorithms and probabilistic analysis, Cambridge University Press, New York, NY, USA, 2005.
[MWD⁺18] Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, and Ignacio Lopez Moreno, Links: A high-dimensional online clustering method, 2018.
[NJW⁺02] Andrew Y Ng, Michael I Jordan, Yair Weiss, et al., On spectral clustering: Analysis and an algorithm, Advances in neural information processing systems 2 (2002), 849–856.
[PPTV06] Nicos G Pavlidis, Vassilis P Plagianakos, Dimitris K Tasoulis, and Michael N Vrahatis, Financial forecasting through unsupervised clustering and neural networks, Operational Research 6 (2006), no. 2, 103–127.
[QR13] Tai Qin and Karl Rohe, Regularized spectral clustering under the degree-corrected stochastic blockmodel, Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, 2013, pp. 3120–3128.
[RCY⁺11] Karl Rohe, Sourav Chatterjee, Bin Yu, et al., Spectral clustering and the high-dimensional stochastic blockmodel, The Annals of Statistics 39 (2011), no. 4, 1878–1915.
[SgS90] G.W. Stewart and Ji guang Sun, Matrix perturbation theory, Academic Press, 1990.
[Sin11] A. Singer, Angular synchronization by eigenvectors and semidefinite programming, Appl. Comput. Harmon. Anal. 30 (2011), no. 1, 20–36.
[sla] Slashdot data set, http://www.slashdot.com, Accessed: 2010-09-30.
[SM19] Ludovic Stephan and Laurent Massoulié, Robustness of spectral methods for community detection, Proceedings of the Thirty-Second Conference on Learning Theory (Phoenix, USA) (Alina Beygelzimer and Daniel Hsu, eds.), Proceedings of Machine Learning Research, vol. 99, PMLR, 25–28 Jun 2019, pp. 2831–2860.
[SMSK⁺11] Stephen M. Smith, Karla L. Miller, Gholamreza Salimi-Khorshidi, Matthew Webster, Christian F. Beckmann, Thomas E. Nichols, Joseph D. Ramsey, and Mark W. Woolrich, Network modelling methods for FMRI, NeuroImage 54 (2011), no. 2, 875 – 891.
[ST00] Ahmed Sameh and Zhanye Tong, The trace minimization method for the symmetric generalized eigenvalue problem, Journal of Computational and Applied Mathematics 123 (2000), no. 1, 155 – 175, Numerical Analysis 2000. Vol. III: Linear Algebra.
[TAL16] Jiliang Tang, Charu Aggarwal, and Huan Liu, Node classification in signed social networks, SDM, 2016.
[vL07] U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17 (2007), 395–416.
[Wey12] Hermann Weyl, Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung), Mathematische Annalen 71 (1912), no. 4, 441–479.
[YCL07] B. Yang, W. K. Cheung, and J. Liu, Community mining from signed social networks, IEEE Trans Knowl Data Eng 19 (2007), no. 10, 1333–1348.
[YS20] Bowei Yan and Purnamrita Sarkar, Covariate regularized community detection in sparse graphs, Journal of the American Statistical Association 0 (2020), no. 0, 1–12.
[YWS15] Y. Yu, T. Wang, and R. J. Samworth, A useful variant of the Davis–Kahan theorem for statisticians, Biometrika 102 (2015), no. 2, 315–323.
[ZA18] Zhixin Zhou and Arash A. Amini, Analysis of spectral clustering algorithms for community detection: the general bipartite setting, 2018.
[ZJGK10] Hartmut Ziegler, Marco Jenny, Tino Gruse, and Daniel A Keim, Visual market sector analysis for financial time series data, Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on, IEEE, 2010, pp. 83–90.
[ZR18] Yilin Zhang and Karl Rohe, Understanding regularized spectral clustering via graph conductance, 2018.

Appendix A Useful concentration inequalities

A.1 Chernoff bounds

Recall the following Chernoff bound for sums of independent Bernoulli random variables.

Theorem A.1 ([MU05, Corollary 4.6]).

Let $X_{1},\dots,X_{n}$ be independent Bernoulli random variables with $\operatorname*{\varmathbb{P}}\left[X_{i}=1\right]=p_{i}$ . Let $X=\sum_{i=1}^{n}X_{i}$ and $\mu=\operatorname*{\varmathbb{E}}[X]$ . For $\delta\in(0,1)$ , it holds true that

\operatorname*{\varmathbb{P}}\left[\left\lvert X-\mu\right\rvert\geqslant\delta\mu\right]\leqslant 2\exp(-\mu\delta^{2}/3).
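As a sanity check on Theorem A.1, the following minimal sketch compares a Monte Carlo estimate of the two-sided tail with the bound $2\exp(-\mu\delta^{2}/3)$ . The function name `chernoff_check`, the choice $n=200$ , $p_{i}=0.1$ , $\delta=0.5$ , and the number of trials are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def chernoff_check(p, delta, trials=20000, seed=0):
    """Empirical two-sided tail of a Bernoulli sum versus the bound of Theorem A.1.

    `p` holds the success probabilities p_i; all other arguments are
    illustrative choices, not values used in the paper.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    mu = p.sum()  # mu = E[X]
    # Each row is one independent draw of (X_1, ..., X_n); X is the row sum.
    X = (rng.random((trials, p.size)) < p).sum(axis=1)
    empirical = np.mean(np.abs(X - mu) >= delta * mu)
    bound = 2.0 * np.exp(-mu * delta**2 / 3.0)
    return empirical, bound

empirical, bound = chernoff_check(p=np.full(200, 0.1), delta=0.5)
print(f"empirical tail = {empirical:.4f}, Chernoff bound = {bound:.4f}")
```

The bound is typically loose for fixed $\delta$ ; what matters in the analysis is its exponential decay in $\mu$ .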

A.2 Spectral norm of random matrices

We will make use of the following result for bounding the spectral norm of symmetric matrices with independent, centered and bounded random variables.

Theorem A.2 ([BvH16, Corollary 3.12, Remark 3.13]).

Let $X$ be an $n\times n$ symmetric matrix whose entries $X_{ij}$ $(i\leqslant j)$ are independent, centered random variables. Then there exists, for any $0<\varepsilon\leqslant 1/2$ , a universal constant $c_{\varepsilon}$ such that for every $t\geqslant 0$ ,

\operatorname*{\varmathbb{P}}\left[\left\lVert X\right\rVert\geqslant(1+\varepsilon)2\sqrt{2}\widetilde{\sigma}+t\right]\leqslant n\exp\left(-\frac{t^{2}}{c_{\varepsilon}\widetilde{\sigma}_{*}^{2}}\right)

(90)

where

\widetilde{\sigma}:=\max_{i}\sqrt{\sum_{j}\operatorname*{\varmathbb{E}}[X_{ij}^{2}]},\quad\widetilde{\sigma}_{*}:=\max_{i,j}\left\lVert X_{ij}\right\rVert_{\infty}.

Note that it suffices to employ upper bound estimates on $\widetilde{\sigma},\widetilde{\sigma}_{*}$ in (90). Indeed, if $\widetilde{\sigma}\leqslant\widetilde{\sigma}^{(u)}$ and $\widetilde{\sigma}_{*}\leqslant\widetilde{\sigma}_{*}^{(u)}$ , then

\operatorname*{\varmathbb{P}}\left[\left\lVert X\right\rVert\geqslant(1+\varepsilon)2\sqrt{2}\widetilde{\sigma}^{(u)}+t\right]\leqslant\operatorname*{\varmathbb{P}}\left[\left\lVert X\right\rVert\geqslant(1+\varepsilon)2\sqrt{2}\widetilde{\sigma}+t\right]\leqslant n\exp\left(-\frac{t^{2}}{c_{\varepsilon}{\widetilde{\sigma}_{*}}^{2}}\right)\leqslant n\exp\left(-\frac{t^{2}}{c_{\varepsilon}(\widetilde{\sigma}_{*}^{(u)})^{2}}\right).

A.3 A graph decomposition result

The following graph decomposition result for inhomogeneous Erdős-Rényi graphs was established in [LLV17, Theorem 2.6].

Theorem A.3.

[LLV17, Theorem 2.6] Let $A$ be a directed adjacency matrix sampled from an inhomogeneous Erdős-Rényi $G(n,(p_{jj^{\prime}})_{j,j^{\prime}})$ model and let $d=n\max_{j,j^{\prime}}p_{jj^{\prime}}$ . For any $r\geqslant 1$ , with probability at least $1-3n^{-r}$ , the set of edges $[n]\times[n]$ can be partitioned into three classes $\mathcal{N},\mathcal{R}$ and $\mathcal{C}$ , such that

1. the adjacency matrix concentrates on $\mathcal{N}$ :

$\|(A-\varmathbb{E}A)_{\mathcal{N}}\|\leqslant Cr^{3/2}\sqrt{d},$

2. $\mathcal{R}$ (resp. $\mathcal{C}$ ) intersects at most $n/d$ columns (resp. rows) of $[n]\times[n]$ ,

3. each row (resp. column) of $A_{\mathcal{R}}$ (resp. $A_{\mathcal{C}}$ ) has at most $32r$ non-zero entries.

Appendix B Matrix perturbation analysis

In this section, we recall several standard tools from matrix perturbation theory for studying the perturbation of the spectra of Hermitian matrices. The reader is referred to [SgS90] for a more comprehensive overview of this topic.

Let $A\in\varmathbb{C}^{n\times n}$ be Hermitian with eigenvalues $\lambda_{1}\geqslant\lambda_{2}\geqslant\cdots\geqslant\lambda_{n}$ and corresponding eigenvectors $v_{1},v_{2},\dots,v_{n}\in\varmathbb{C}^{n}$ . Let $\widetilde{A}=A+W$ be a perturbed version of $A$ , with the perturbation matrix $W\in\varmathbb{C}^{n\times n}$ being Hermitian. Let us denote the eigenvalues of $\widetilde{A}$ and $W$ by $\widetilde{\lambda}_{1}\geqslant\cdots\geqslant\widetilde{\lambda}_{n}$ , and $\epsilon_{1}\geqslant\epsilon_{2}\geqslant\cdots\geqslant\epsilon_{n}$ , respectively.

To begin with, one can quantify the perturbation of the eigenvalues of $\widetilde{A}$ with respect to the eigenvalues of $A$ . Weyl’s inequality [Wey12] is a very useful result in this regard.

Theorem B.1 (Weyl’s Inequality [Wey12]).

For each $i=1,\dots,n$ , it holds that

\lambda_{i}+\epsilon_{n}\leqslant\widetilde{\lambda}_{i}\leqslant\lambda_{i}+\epsilon_{1}.

(91)

In particular, this implies that $\widetilde{\lambda}_{i}\in[\lambda_{i}-\left\lVert W\right\rVert,\lambda_{i}+\left\lVert W\right\rVert]$ .
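Weyl's inequality (91) is easy to verify numerically. The sketch below draws a random symmetric $A$ and a small symmetric perturbation $W$ , and returns the three sorted spectra; the helper name `weyl_check`, the dimension and the noise level are assumptions made purely for illustration.

```python
import numpy as np

def weyl_check(n=50, noise=0.1, seed=0):
    """Return the descending spectra of A, A + W and W for random symmetric A, W."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, n))
    A = (G + G.T) / 2
    H = rng.standard_normal((n, n))
    W = noise * (H + H.T) / 2
    # eigvalsh returns eigenvalues in ascending order; reverse them to match
    # the convention lambda_1 >= ... >= lambda_n used in the text.
    lam = np.linalg.eigvalsh(A)[::-1]
    lam_tilde = np.linalg.eigvalsh(A + W)[::-1]
    eps = np.linalg.eigvalsh(W)[::-1]
    return lam, lam_tilde, eps

lam, lam_tilde, eps = weyl_check()
tol = 1e-10  # slack for floating-point round-off
print(np.all(lam + eps[-1] <= lam_tilde + tol), np.all(lam_tilde <= lam + eps[0] + tol))
```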

One can also quantify the perturbation of the subspace spanned by eigenvectors of $A$ , a result established by Davis and Kahan [DK70]. Before introducing the theorem, we need some definitions. Let $U,\widetilde{U}\in\varmathbb{C}^{n\times k}$ (for $k\leqslant n$ ) each have orthonormal columns, and let $\sigma_{1}\geqslant\dots\geqslant\sigma_{k}$ denote the singular values of $U^{*}\widetilde{U}$ . Also, let $\mathcal{R}(U)$ denote the range space of the columns of $U$ , and similarly for $\mathcal{R}(\widetilde{U})$ . Then the $k$ principal angles between $\mathcal{R}(U),\mathcal{R}(\widetilde{U})$ are defined as $\theta_{i}:=\cos^{-1}(\sigma_{i})$ for $1\leqslant i\leqslant k$ , with each $\theta_{i}\in[0,\pi/2]$ . It is usual to define the $k\times k$ diagonal matrices $\Theta(\mathcal{R}(U),\mathcal{R}(\widetilde{U})):=\text{diag}(\theta_{1},\dots,\theta_{k})$ and $\sin\Theta(\mathcal{R}(U),\mathcal{R}(\widetilde{U})):=\text{diag}(\sin\theta_{1},\dots,\sin\theta_{k})$ . Denoting by $|||\cdot|||$ any unitarily invariant norm (Frobenius, spectral, etc.), the following relation holds (see, e.g., [Li94, Lemma 2.1], [SgS90, Corollary I.5.4]).

|||\sin\Theta(\mathcal{R}(U),\mathcal{R}(\widetilde{U}))|||=|||(I-\widetilde{U}\widetilde{U}^{*})U|||.

With the above notation in mind, we now introduce a version of the Davis-Kahan theorem taken from [YWS15, Theorem 1] (see also [SgS90, Theorem V.3.6]).

Theorem B.2 (Davis-Kahan).

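Fix $1\leqslant r\leqslant s\leqslant n$ , let $d=s-r+1$ , and let $U=(u_{r},u_{r+1},\dots,u_{s})\in\varmathbb{C}^{n\times d}$ and $\widetilde{U}=(\widetilde{u}_{r},\widetilde{u}_{r+1},\dots,\widetilde{u}_{s})\in\varmathbb{C}^{n\times d}$ have orthonormal columns satisfying $Au_{j}=\lambda_{j}u_{j}$ and $\widetilde{A}\widetilde{u}_{j}=\widetilde{\lambda}_{j}\widetilde{u}_{j}$ for $j=r,\dots,s$ . Write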

\delta=\inf\left\{\left\lvert\hat{\lambda}-\lambda\right\rvert:\lambda\in[\lambda_{s},\lambda_{r}],\hat{\lambda}\in(-\infty,\widetilde{\lambda}_{s+1}]\cup[\widetilde{\lambda}_{r-1},\infty)\right\}

where we define $\widetilde{\lambda}_{0}=\infty$ and $\widetilde{\lambda}_{n+1}=-\infty$ and assume that $\delta>0$ . Then

|||\sin\Theta(\mathcal{R}(U),\mathcal{R}(\widetilde{U}))|||=|||(I-\widetilde{U}\widetilde{U}^{*})U|||\leqslant\frac{|||W|||}{\delta}.

For instance, if $r=s=j$ , then by using the spectral norm $\left\lVert\cdot\right\rVert$ , we obtain

\sin\Theta(\mathcal{R}(\widetilde{v}_{j}),\mathcal{R}(v_{j}))=\left\lVert(I-v_{j}v_{j}^{*})\widetilde{v}_{j}\right\rVert\leqslant\frac{\left\lVert W\right\rVert}{\min\left\{\left\lvert\widetilde{\lambda}_{j-1}-\lambda_{j}\right\rvert,\left\lvert\widetilde{\lambda}_{j+1}-\lambda_{j}\right\rvert\right\}}.

(92)
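The bound of Theorem B.2 can be illustrated on a synthetic example with $r=1$ and $s=k$ , in which case $\delta=\lambda_{k}-\widetilde{\lambda}_{k+1}$ because the set $[\widetilde{\lambda}_{0},\infty)$ is empty. The following is a minimal sketch in the spectral norm; the spectrum of $A$ , the noise level and the function name `davis_kahan_check` are hypothetical choices made so that the gap $\delta$ is comfortably positive.

```python
import numpy as np

def davis_kahan_check(n=60, k=3, noise=0.05, seed=0):
    """Compare ||sin Theta|| with ||W|| / delta for the top-k eigenspace (r = 1, s = k)."""
    rng = np.random.default_rng(seed)
    # Symmetric A with k large eigenvalues separated from the remaining n - k.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    spectrum = np.concatenate([np.linspace(5.0, 4.0, k), np.linspace(1.0, 0.0, n - k)])
    A = (Q * spectrum) @ Q.T
    A = (A + A.T) / 2  # remove round-off asymmetry
    H = rng.standard_normal((n, n))
    W = noise * (H + H.T) / 2

    # eigh sorts ascending; flip to the descending convention of the text.
    lam, V = np.linalg.eigh(A)
    lam, V = lam[::-1], V[:, ::-1]
    lam_t, V_t = np.linalg.eigh(A + W)
    lam_t, V_t = lam_t[::-1], V_t[:, ::-1]

    U, U_t = V[:, :k], V_t[:, :k]
    # delta = lambda_s - lambda~_{s+1}; 0-based indices k-1 and k.
    delta = lam[k - 1] - lam_t[k]
    sin_theta = np.linalg.norm((np.eye(n) - U_t @ U_t.T) @ U, 2)
    bound = np.linalg.norm(W, 2) / delta
    return sin_theta, bound, delta

sin_theta, bound, delta = davis_kahan_check()
print(f"sin(theta) = {sin_theta:.4f}, ||W||/delta = {bound:.4f}, delta = {delta:.4f}")
```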

Finally, we recall the following standard result which states that given any pair of $k$ -dimensional subspaces with orthonormal basis matrices $U,\widetilde{U}\in\varmathbb{R}^{n\times k}$ , there exists an alignment of $U,\widetilde{U}$ with the error after alignment bounded by the distance between the subspaces. We provide the proof for completeness.

Proposition B.3.

Let $U,\widetilde{U}\in\varmathbb{R}^{n\times k}$ each have orthonormal columns. Then there exists a $k\times k$ rotation matrix $O$ such that

\left\lVert\widetilde{U}-UO\right\rVert\leqslant 2\left\lVert(I-UU^{T})\widetilde{U}\right\rVert.

Proof.

Write the SVD as $U^{T}\widetilde{U}=V\Sigma(V^{\prime})^{T}$ , where we recall that the $i$ th largest singular value $\sigma_{i}=\cos\theta_{i}$ with $\theta_{i}\in[0,\pi/2]$ denoting the principal angles between $\mathcal{R}(U)$ and $\mathcal{R}(\widetilde{U})$ . Choosing $O=V(V^{\prime})^{T}$ , we then obtain

	$\displaystyle\left\lVert\widetilde{U}-UV(V^{\prime})^{T}\right\rVert$	$\displaystyle\leqslant\left\lVert\widetilde{U}-UU^{T}\widetilde{U}\right\rVert+\left\lVert UU^{T}\widetilde{U}-UV(V^{\prime})^{T}\right\rVert$
		$\displaystyle=\left\lVert(I-UU^{T})\widetilde{U}\right\rVert+\left\lVert U^{T}\widetilde{U}-V(V^{\prime})^{T}\right\rVert$
		$\displaystyle=\left\lVert(I-UU^{T})\widetilde{U}\right\rVert+\left\lVert I-\Sigma\right\rVert$
		$\displaystyle\leqslant 2\left\lVert(I-UU^{T})\widetilde{U}\right\rVert,$

where the last inequality follows from the fact that $\left\lVert I-\Sigma\right\rVert=1-\cos\theta_{k}\leqslant\sin\theta_{k}=\left\lVert(I-UU^{T})\widetilde{U}\right\rVert$ . ∎
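The alignment used in the proof of Proposition B.3 is constructive: $O=V(V^{\prime})^{T}$ is read off from the SVD of $U^{T}\widetilde{U}$ . A minimal sketch follows, assuming NumPy; the helper name `align` and the random test subspaces are illustrative and not part of the paper.

```python
import numpy as np

def align(U, U_tilde):
    """Return O = V V'^T, where U^T U_tilde = V Sigma V'^T is an SVD."""
    V, _, Vp_T = np.linalg.svd(U.T @ U_tilde)  # Vp_T is already (V')^T
    return V @ Vp_T

rng = np.random.default_rng(0)
n, k = 30, 4
# Two nearby k-dimensional subspaces with orthonormal bases (illustrative data).
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
U_tilde, _ = np.linalg.qr(U + 0.1 * rng.standard_normal((n, k)))

O = align(U, U_tilde)
lhs = np.linalg.norm(U_tilde - U @ O, 2)
rhs = 2 * np.linalg.norm((np.eye(n) - U @ U.T) @ U_tilde, 2)
print(f"||U~ - U O|| = {lhs:.4f} <= {rhs:.4f}")
```

The same choice of $O$ is the solution of the orthogonal Procrustes problem, which is why it is the natural alignment here.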

Appendix C Summary of main technical tools

This section collects certain technical results that were used in the course of proving our main results.

Proposition C.1 ([Bha96, Theorem X.1.1]).

For matrices $A,B\succ 0$ ,

\left\lVert A^{1/2}-B^{1/2}\right\rVert\leqslant||A-B||^{1/2}

holds as $(\cdot)^{1/2}$ is operator monotone.
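A quick numerical illustration of Proposition C.1 in the spectral norm is given below. It is only a sketch: the square root is computed through an eigendecomposition, and the helper names `psd_sqrt` and `random_pd` as well as the test matrices are assumptions introduced for this example.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

def random_pd(n, rng):
    """Random symmetric positive definite matrix (illustrative test data)."""
    G = rng.standard_normal((n, n))
    return G @ G.T + 0.1 * np.eye(n)

rng = np.random.default_rng(0)
A, B = random_pd(20, rng), random_pd(20, rng)
lhs = np.linalg.norm(psd_sqrt(A) - psd_sqrt(B), 2)
rhs = np.sqrt(np.linalg.norm(A - B, 2))
print(f"||A^(1/2) - B^(1/2)|| = {lhs:.4f} <= ||A - B||^(1/2) = {rhs:.4f}")
```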

Proposition C.2.

For symmetric matrices $A^{+}$ , $A^{-}$ , $B^{+}$ and $B^{-}$ where $A^{-},B^{-}\succ 0$ , the following holds.

	$\displaystyle\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}B^{+}(B^{-})^{-1/2}\right\rVert$
	$\displaystyle\qquad\leqslant\left\lVert(A^{-})^{-1}\right\rVert\left\lVert A^{+}\right\rVert\left(\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert^{2}+2\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert\right)+\left\lVert(B^{-})^{-1}\right\rVert\left\lVert A^{+}-B^{+}\right\rVert$
	$\displaystyle\qquad\leqslant\left\lVert(A^{-})^{-1}\right\rVert\left\lVert A^{+}\right\rVert\left(\left\lVert(B^{-})^{-1}\right\rVert\left\lVert(B^{-})-(A^{-})\right\rVert+2\left\lVert(B^{-})^{-1/2}\right\rVert\left\lVert(B^{-})-(A^{-})\right\rVert^{1/2}\right)+\left\lVert(B^{-})^{-1}\right\rVert\left\lVert A^{+}-B^{+}\right\rVert\,.$

Proof.

	$\displaystyle\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}B^{+}(B^{-})^{-1/2}\right\rVert$
	$\displaystyle=\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}+(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}-(B^{-})^{-1/2}B^{+}(B^{-})^{-1/2}\right\rVert$
	$\displaystyle\leqslant\left\lVert(B^{-})^{-1/2}(A^{+}-B^{+})(B^{-})^{-1/2}\right\rVert+\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}\right\rVert\,.$

Now, we bound the two terms separately. The first term is easy to bound.

	$\displaystyle\left\lVert(B^{-})^{-1/2}(A^{+}-B^{+})(B^{-})^{-1/2}\right\rVert$	$\displaystyle\leqslant\left\lVert(B^{-})^{-1/2}\right\rVert\left\lVert A^{+}-B^{+}\right\rVert\left\lVert(B^{-})^{-1/2}\right\rVert$
		$\displaystyle=\left\lVert(B^{-})^{-1}\right\rVert\left\lVert A^{+}-B^{+}\right\rVert\,.$		(93)

To bound the second term, we do the following manipulations,

		$\displaystyle\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}\right\rVert$
		$\displaystyle\qquad=\left\lVert(A^{-})^{-1/2}A^{+}(A^{-})^{-1/2}-(A^{-})^{-1/2}(A^{-})^{1/2}(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}(A^{-})^{1/2}(A^{-})^{-1/2}\right\rVert$
		$\displaystyle\qquad=\left\lVert(A^{-})^{-1/2}\left(A^{+}-(A^{-})^{1/2}(B^{-})^{-1/2}A^{+}(B^{-})^{-1/2}(A^{-})^{1/2}\right)(A^{-})^{-1/2}\right\rVert$
		$\displaystyle\qquad=\left\lVert(A^{-})^{-1/2}\left(A^{+}-\left((A^{-})^{1/2}(B^{-})^{-1/2}-I+I\right)A^{+}\left((B^{-})^{-1/2}(A^{-})^{1/2}-I+I\right)\right)(A^{-})^{-1/2}\right\rVert$
		$\displaystyle\qquad=\left\lVert(A^{-})^{\frac{-1}{2}}\left(((A^{-})^{\frac{1}{2}}(B^{-})^{\frac{-1}{2}}-I)A^{+}((B^{-})^{\frac{-1}{2}}(A^{-})^{\frac{1}{2}}-I)+A^{+}((B^{-})^{\frac{-1}{2}}(A^{-})^{\frac{1}{2}}-I)+((A^{-})^{\frac{1}{2}}(B^{-})^{\frac{-1}{2}}-I)A^{+}\right)(A^{-})^{\frac{-1}{2}}\right\rVert$
		$\displaystyle\qquad\leqslant\left\lVert(A^{-})^{-1}\right\rVert\left\lVert A^{+}\right\rVert\left(\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert^{2}+2\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert\right)\,.$		(94)

The first inequality of the proposition follows by adding (93) and (94).

To see the second inequality of the proposition, observe that,

$\displaystyle\left\lVert I-(B^{-})^{-1/2}(A^{-})^{1/2}\right\rVert$	$\displaystyle=\left\lVert(B^{-})^{-1/2}((B^{-})^{1/2}-(A^{-})^{1/2})\right\rVert$
	$\displaystyle\leqslant\left\lVert(B^{-})^{-1/2}\right\rVert\left\lVert(B^{-})^{1/2}-(A^{-})^{1/2}\right\rVert$
	$\displaystyle\leqslant\left\lVert(B^{-})^{-1/2}\right\rVert\left\lVert B^{-}-A^{-}\right\rVert^{1/2}\quad(\text{using Proposition C.1})\,.$	(95)

The second inequality of the proposition follows by substituting (95) in the first inequality.

∎
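Both inequalities of Proposition C.2 can be checked numerically on small random instances. The sketch below evaluates the left-hand side and the two successive upper bounds in the spectral norm; the helper names (`spd_power`, `op_norm`, `prop_c2_terms`) and the random test matrices are hypothetical and serve only to illustrate the statement.

```python
import numpy as np

def spd_power(M, power):
    """Real power of a symmetric positive definite matrix via eigh."""
    w, Q = np.linalg.eigh(M)
    return (Q * w**power) @ Q.T

def op_norm(M):
    return np.linalg.norm(M, 2)

def prop_c2_terms(A_plus, A_minus, B_plus, B_minus):
    """Return (lhs, first bound, second bound) of Proposition C.2."""
    Ai, Bi = spd_power(A_minus, -0.5), spd_power(B_minus, -0.5)
    lhs = op_norm(Ai @ A_plus @ Ai - Bi @ B_plus @ Bi)
    x = op_norm(np.eye(len(A_minus)) - Bi @ spd_power(A_minus, 0.5))
    inv_A = op_norm(np.linalg.inv(A_minus))
    inv_B = op_norm(np.linalg.inv(B_minus))
    last = inv_B * op_norm(A_plus - B_plus)
    first = inv_A * op_norm(A_plus) * (x**2 + 2 * x) + last
    gap = op_norm(B_minus - A_minus)
    second = inv_A * op_norm(A_plus) * (inv_B * gap + 2 * op_norm(Bi) * np.sqrt(gap)) + last
    return lhs, first, second

rng = np.random.default_rng(0)
n = 15
G, E = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A_minus = G @ G.T / n + np.eye(n)        # positive definite
B_minus = A_minus + 0.01 * (E @ E.T)     # nearby positive definite matrix
S, P = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A_plus = (S + S.T) / 2
B_plus = A_plus + 0.01 * (P + P.T) / 2
lhs, first, second = prop_c2_terms(A_plus, A_minus, B_plus, B_minus)
print(f"lhs = {lhs:.4f} <= {first:.4f} <= {second:.4f}")
```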

Appendix D Proofs from Section 4

Lemma D.1 (Expression for $C^{+}_{e}$ & $C^{-}_{e}$ ).

C^{+}_{e}=-p\eta\frac{n}{d^{+}}\chi_{1}\chi_{1}^{\top}+\left(1+\tau^{-}+\frac{p}{d^{+}}\left(1-\eta-\frac{n}{k}(1-2\eta)\right)\right)I\,,

C^{-}_{e}=-p(1-\eta)\frac{n}{d^{-}}\chi_{1}\chi_{1}^{\top}+\left(1+\tau^{+}+\frac{p}{d^{-}}\left(\eta+\frac{n}{k}(1-2\eta)\right)\right)I\,.

It follows that they can be written as $C^{+}_{e}=R\Sigma^{+}R^{\top}$ and $C^{-}_{e}=R\Sigma^{-}R^{\top}$ , where $R$ is a rotation matrix, and

\Sigma^{+}=\begin{bmatrix}\left(1+\tau^{-}+\frac{p}{d^{+}}\left(1-\eta-n\left(\eta+\frac{1-2\eta}{k}\right)\right)\right)\\ &\left(1+\tau^{-}+\frac{p}{d^{+}}\left(1-\eta-n\left(\frac{1-2\eta}{k}\right)\right)\right)I_{k-1}\end{bmatrix}\,,

\Sigma^{-}=\begin{bmatrix}\left(1+\tau^{+}+\frac{p}{d^{-}}\left(\eta-n\left(1-\eta-\frac{1-2\eta}{k}\right)\right)\right)\\ &\left(1+\tau^{+}+\frac{p}{d^{-}}\left(\eta+n\left(\frac{1-2\eta}{k}\right)\right)\right)I_{k-1}\end{bmatrix}\,.

The above lemma shows that we know the spectrum of $(C^{-})^{-1/2}C^{+}(C^{-})^{-1/2}$ exactly, in the case of equal-sized clusters.

Proof of Lemma 4.6.

From (17) it follows that,

\lambda_{\max}(C^{+})\leqslant\max_{i\in[k]}\left(1+\tau^{-}+\frac{p}{d_{i}^{+}}(1-\eta-n_{i}(1-2\eta))\right)\,.

The maximum is achieved for the smallest cluster, which proves (33).

The proof of (34) follows from the fact that in (24) we decomposed the matrix $\overline{L^{-}_{sym}}+\tau^{+}I$ as a block-diagonal matrix, with blocks $C^{-},\alpha_{1}^{-}I_{n_{1}-1},\ldots,\alpha_{k}^{-}I_{n_{k}-1}$ . Since $\overline{L^{-}_{sym}}$ is a symmetric Laplacian, we know that $\lambda_{\min}(\overline{L^{-}_{sym}}+\tau^{+}I)=\tau^{+}$ . Also, $\alpha_{i}^{-}>\tau^{+}$ for $i\in[k]$ . Thus (34) follows. ∎

Appendix E Spectrum of Signed Laplacians

This section extends some classical results for the unsigned Laplacian to the symmetric Signed Laplacian and the regularized Laplacian.

Lemma E.1.

For all $x\in\varmathbb R^{n}$ ,

x^{T}\overline{L_{sym}}x=\frac{1}{2}\sum_{j,j^{\prime}}|A_{jj^{\prime}}|\left(\frac{x_{j}}{\sqrt{d_{j}}}-sgn(A_{jj^{\prime}})\frac{x_{j^{\prime}}}{\sqrt{d_{j^{\prime}}}}\right)^{2}

(96)

Moreover, the eigenvalues of $\overline{L_{sym}}$ and $L_{\gamma}$ are in the interval $[0,2]$ .

Proof.

Equation (96) is adapted from Proposition 5.2 from [Gal16] and is obtained by replacing $x$ by $\bar{D}^{-1/2}x$ . The second part of the lemma comes from the fact that $(a\pm b)^{2}\leqslant 2(a^{2}+b^{2})$ . In fact, for $x\in\varmathbb R^{n}$ such that $\left\lVert x\right\rVert=1$ , we have

	$\displaystyle x^{T}\overline{L_{sym}}x$	$\displaystyle\leqslant\sum_{j,j^{\prime}}\|A_{jj^{\prime}}\|\left(\frac{x_{j}^{2}}{d_{j}}+\frac{x_{j^{\prime}}^{2}}{d_{j^{\prime}}}\right)$
		$\displaystyle=2\sum_{j,j^{\prime}}\|A_{jj^{\prime}}\|\frac{x_{j}^{2}}{d_{j}}=2\sum_{j}x_{j}^{2}=2.$

Similarly, we have

	$\displaystyle x^{T}L_{\gamma}x$	$\displaystyle\leqslant\sum_{j,j^{\prime}}\|(A_{\gamma})_{jj^{\prime}}\|\left(\frac{x_{j}^{2}}{\bar{D}_{jj}+\gamma}+\frac{x_{j^{\prime}}^{2}}{\bar{D}_{j^{\prime}j^{\prime}}+\gamma}\right)$
		$\displaystyle\leqslant 2\sum_{j,j^{\prime}}(\|A_{jj^{\prime}}\|+\frac{\gamma}{n})\frac{x_{j}^{2}}{\bar{D}_{jj}+\gamma}$
		$\displaystyle=2\sum_{j}\frac{(\bar{D}_{jj}+\gamma)x_{j}^{2}}{\bar{D}_{jj}+\gamma}=2.$

Moreover $\overline{L_{sym}}$ and $L_{\gamma}$ are positive semi-definite, thus we can conclude that their eigenvalues are between 0 and 2. ∎
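The two claims of Lemma E.1 for $\overline{L_{sym}}$ can be verified numerically. The sketch below assumes the usual definition $\overline{L_{sym}}=I-\bar{D}^{-1/2}A\bar{D}^{-1/2}$ with $\bar{D}_{jj}=\sum_{j^{\prime}}|A_{jj^{\prime}}|$ ; the function names and the random signed graph are illustrative assumptions, and the regularized Laplacian $L_{\gamma}$ is not covered.

```python
import numpy as np

def signed_sym_laplacian(A):
    """Assumed form I - D^{-1/2} A D^{-1/2}, with D_jj = sum_j' |A_jj'|."""
    d = np.abs(A).sum(axis=1)
    if np.any(d == 0):
        raise ValueError("isolated node: degree is zero")
    inv_sqrt_d = 1.0 / np.sqrt(d)
    L = np.eye(len(A)) - inv_sqrt_d[:, None] * A * inv_sqrt_d[None, :]
    return L, d

def quadratic_form_rhs(A, d, x):
    """Right-hand side of (96)."""
    y = x / np.sqrt(d)
    diff = y[:, None] - np.sign(A) * y[None, :]
    return 0.5 * np.sum(np.abs(A) * diff**2)

rng = np.random.default_rng(0)
n = 40
# Random symmetric signed adjacency matrix with entries in {-1, 0, +1}.
T = rng.choice([-1.0, 0.0, 1.0], size=(n, n), p=[0.25, 0.5, 0.25])
A = np.triu(T, k=1)
A = A + A.T

L, d = signed_sym_laplacian(A)
eigs = np.linalg.eigvalsh(L)
x = rng.standard_normal(n)
lhs, rhs = x @ L @ x, quadratic_form_rhs(A, d, x)
print(f"spectrum in [{eigs.min():.4f}, {eigs.max():.4f}], quadratic forms: {lhs:.6f} vs {rhs:.6f}")
```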

	$\displaystyle\|\overline{C}-\overline{C}_{e}\|$	$\displaystyle=\left\|\frac{np}{\overline{d}}(1-2\eta)\left(\left(\frac{u}{\|u\|}\right)\left(\frac{u}{\|u\|}\right)^{T}-\chi_{1}\chi_{1}^{T}\right)-2\frac{np}{\overline{d}}(1-2\eta)\left(D_{u}-\frac{1}{k}I_{n}\right)\right\|$
		$\displaystyle\leqslant\frac{np}{\overline{d}}(1-2\eta)\left\|\left(\frac{u}{\|u\|}\right)\left(\frac{u}{\|u\|}\right)^{T}-\chi_{1}\chi_{1}^{T}\right\|+2\frac{np}{\overline{d}}(1-2\eta)\left\|D_{u}-\frac{1}{k}I_{n}\right\|.$		(72)

	$\displaystyle\|(A-\varmathbb{E}A)_{\mathcal{N}}\|$	$\displaystyle=\|(A^{+}_{inf}-\varmathbb{E}A^{+}_{inf})_{\mathcal{N}^{+}_{inf}}+(A^{+}_{sup}-\varmathbb{E}A^{+}_{sup})_{\mathcal{N}^{+}_{sup}}-(A^{-}_{inf}-\varmathbb{E}A^{-}_{inf})_{\mathcal{N}^{-}_{inf}}-(A^{-}_{sup}-\varmathbb{E}A^{-}_{sup})_{\mathcal{N}^{-}_{sup}}\|$
		$\displaystyle\leqslant\|(A^{+}_{inf}-\varmathbb{E}A^{+}_{inf})_{\mathcal{N}^{+}_{inf}}\|+\|(A^{+}_{sup}-\varmathbb{E}A^{+}_{sup})_{\mathcal{N}^{+}_{sup}}\|+\|(A^{-}_{inf}-\varmathbb{E}A^{-}_{inf})_{\mathcal{N}^{-}_{inf}}\|+\|(A^{-}_{sup}-\varmathbb{E}A^{-}_{sup})_{\mathcal{N}^{-}_{sup}}\|$
		$\displaystyle\leqslant 4Cr^{3/2}\sqrt{d}\leqslant C_{1}r^{3/2}\sqrt{\overline{d}(1-\eta)},$

	$\displaystyle\|L_{\gamma}-\mathcal{L}_{\gamma}\|$	$\displaystyle\leqslant\|\left(L_{\gamma}-\mathcal{L}_{\gamma}\right)_{\mathcal{N}}\|+\|\left((L_{\gamma}-I)-(\mathcal{L}_{\gamma}-I)\right)_{\mathcal{R}}\|+\|\left(L_{\gamma}-\mathcal{L}_{\gamma}\right)_{\mathcal{C}}\|$
		$\displaystyle\leqslant\|\left(L_{\gamma}-\mathcal{L}_{\gamma}\right)_{\mathcal{N}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{C}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{C}}\|$
		$\displaystyle=\|\left(S+T\right)_{\mathcal{N}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{C}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{C}}\|$
		$\displaystyle\leqslant\|S_{\mathcal{N}}\|+\|T_{\mathcal{N}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{R}}\|+\|\left(I-L_{\gamma}\right)_{\mathcal{C}}\|+\|\left(I-\mathcal{L}_{\gamma}\right)_{\mathcal{C}}\|.$

$\displaystyle\left\|\frac{1}{\sqrt{(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)}}-\frac{1}{\overline{d}+\gamma}\right\|$	$\displaystyle=\frac{\|(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)-(\overline{d}+\gamma)^{2}\|}{(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)(\overline{d}+\gamma)+\sqrt{(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)}(\overline{d}+\gamma)^{2}}$
	$\displaystyle\leqslant\frac{\|(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)-(\overline{d}+\gamma)^{2}\|}{2\gamma^{3}}$
	$\displaystyle=\frac{\|(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}+\gamma)-(\overline{d}+\gamma)(\overline{D}_{jj}+\gamma)+(\overline{d}+\gamma)(\overline{D}_{jj}+\gamma)-(\overline{d}+\gamma)^{2}\|}{2\gamma^{3}}$
	$\displaystyle=\frac{\|(\overline{D}_{jj}+\gamma)(\overline{D}_{j^{\prime}j^{\prime}}-\overline{d})+(\overline{d}+\gamma)(\overline{D}_{jj}-\overline{d})\|}{2\gamma^{3}},$	(80)
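The first two steps of this chain, namely the closed form of the difference and the lower bound $2\gamma^{3}$ on its denominator, can be sanity-checked numerically. The sketch below assumes only that $\overline{D}_{jj}$, $\overline{D}_{j^{\prime}j^{\prime}}$ and $\overline{d}$ are nonnegative and that $\gamma>0$. The sampled values and the function names `lhs` and `closed_form` are hypothetical and chosen purely for illustration.

```python
import math
import random


def lhs(a, b, c):
    """|1/sqrt(a*b) - 1/c|, with a = D_jj + gamma, b = D_j'j' + gamma, c = d + gamma."""
    return abs(1.0 / math.sqrt(a * b) - 1.0 / c)


def closed_form(a, b, c):
    """|a*b - c^2| / (a*b*c + sqrt(a*b)*c^2), the right-hand side of the first equality."""
    return abs(a * b - c * c) / (a * b * c + math.sqrt(a * b) * c * c)


rng = random.Random(1)
gamma = 1.5
checks = []
for _ in range(1000):
    # Hypothetical nonnegative degrees D_jj, D_j'j' and average degree d.
    d_j, d_jp, d_bar = (rng.uniform(0.0, 20.0) for _ in range(3))
    a, b, c = d_j + gamma, d_jp + gamma, d_bar + gamma
    # Since a, b, c >= gamma, the denominator of closed_form is at least 2 * gamma^3.
    bound = abs(a * b - c * c) / (2.0 * gamma ** 3)
    checks.append((lhs(a, b, c), closed_form(a, b, c), bound))

identity_ok = all(math.isclose(e, f, rel_tol=1e-9, abs_tol=1e-12) for e, f, _ in checks)
bound_ok = all(e <= bnd + 1e-12 for e, _, bnd in checks)
print(identity_ok, bound_ok)
```

The bound is loose when the degrees are much larger than $\gamma$, because the denominator is replaced by its smallest possible value.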

	$\displaystyle\left\lVert(R_{k-1})_{i}-(R_{k-1})_{i^{\prime}}\right\rVert^{2}$	$\displaystyle=\left\lVert(R_{k-1})_{i}\right\rVert^{2}+\left\lVert(R_{k-1})_{i^{\prime}}\right\rVert^{2}-2\underbrace{\langle(R_{k-1})_{i},(R_{k-1})_{i^{\prime}}\rangle}_{\leqslant 0}$
		$\displaystyle\geqslant\left\lVert(R_{k-1})_{i}\right\rVert^{2}+\left\lVert(R_{k-1})_{i^{\prime}}\right\rVert^{2}$
		$\displaystyle=2-\underbrace{[(\gamma_{1})_{i}^{2}+(\gamma_{1})_{i^{\prime}}^{2}]}_{\leqslant 1}\geqslant 1.$

Figure panels: Regime I and Regime II, each shown for equal-size and unequal-size clusters (columns) and for $k=3,5,10,20$ (rows).
