
Estimating mixed memberships in multi-layer networks

Huan Qing qinghuan@u.nus.edu;qinghuan@cqut.edu.cn;qinghuan07131995@163.com School of Economics and Finance, Lab of Financial Risk Intelligent Early Warning and Modern Governance, Chongqing University of Technology, Chongqing, 400054, China
Abstract

Community detection in multi-layer networks has emerged as a crucial area of modern network analysis. However, conventional approaches often assume that nodes belong exclusively to a single community, which fails to capture the complex structure of real-world networks where nodes may belong to multiple communities simultaneously. To address this limitation, we propose novel spectral methods to estimate the common mixed memberships in the multi-layer mixed membership stochastic block model. The proposed methods leverage the eigen-decomposition of three aggregate matrices: the sum of adjacency matrices, the debiased sum of squared adjacency matrices, and the sum of squared adjacency matrices. We establish rigorous theoretical guarantees for the consistency of our methods. Specifically, we derive per-node error rates under mild conditions on network sparsity, demonstrating their consistency as the number of nodes and/or layers increases under the multi-layer mixed membership stochastic block model. Our theoretical results reveal that the method leveraging the sum of adjacency matrices generally performs worse than the other two methods for mixed membership estimation in multi-layer networks. We conduct extensive numerical experiments to empirically validate our theoretical findings. For real-world multi-layer networks with unknown community information, we introduce two novel modularity metrics to quantify the quality of mixed membership community detection. Finally, we apply our algorithms and modularity metrics to real-world multi-layer networks, demonstrating their effectiveness in extracting meaningful community structures.

keywords:
Multi-layer networks, community detection, spectral methods, mixed membership, modularity

1 Introduction

Multi-layer networks have emerged as a powerful tool for describing complex real-world systems. Unlike single-layer networks, these structures consist of multiple layers, where nodes represent entities and edges represent their interactions [30, 24, 5]. In this paper, we focus on multi-layer networks that share a common node set across all layers, with nodes interacting only within their respective layers and no cross-layer connections. Such networks are ubiquitous, spanning social networks, transportation systems, biological networks, and trade networks. For instance, in social networks, individuals tend to form connections within different platforms (e.g., Facebook, Twitter, WeChat, LinkedIn), while cross-platform connections are typically not allowed. In transportation networks, layers might represent different modes of transportation (roads, railways, airways), with nodes corresponding to locations and edges indicating the availability of a specific mode between two locations [5]. The absence of cross-layer connections indicates the inability to directly switch modes without intermediate stops. Biological networks also exhibit rich multi-layer structures. In gene regulatory networks, layers could represent different types of gene interactions, with nodes representing genes and edges depicting their specific interactions [31, 4, 46]. The absence of cross-layer connections underscores the specialized nature of interactions within each layer and their vital role in governing cellular functions. The FAO-trade multi-layer network gathers trade relationships between countries across various products sourced from the Food and Agriculture Organization of the United Nations [10]. Figure 1 illustrates the networks of the first three products within this dataset.

Figure 1: Networks of the first three products of the FAO-trade multi-layer network.

Community detection in multi-layer networks is a crucial analytical tool, revealing latent structures within networks [23, 18]. A community (also known as a group, cluster, or block) typically comprises nodes more densely interconnected than those outside [35, 36, 11, 12, 19]. In social networks, individuals often form distinct communities based on interests, occupations, or locations. For instance, on Facebook, users might belong to hobby-based communities like photography or hiking, identifiable by dense interaction patterns among members. In transportation networks, communities emerge due to geographical proximity or functional similarity. For instance, cities might form communities linked by tightly integrated railways. In practical applications, a node can simultaneously belong to multiple communities. For example, in social networks, an individual may be a member of several different social groups. Similarly, in transportation networks, a specific location can function as a hub for multiple modes of transportation, bridging diverse communities. Likewise, in biological networks, a gene can belong to several overlapping communities, participating in a wide range of processes.

In the last few years, community detection in multi-layer networks, where each node belongs exclusively to a single community, has attracted considerable attention. For instance, several studies have been proposed under the multi-layer stochastic block model (MLSBM), a model that assumes the network in each layer can be modeled by the well-known stochastic block model (SBM) [17]. MLSBM also assumes that the community information is common to all layers. Within the MLSBM framework, [16] studied consistent community detection for the maximum likelihood estimate and for a spectral method that runs the K-means algorithm on a few eigenvectors of the sum of adjacency matrices, in the regime where only the number of layers grows. [37] studied the consistency of co-regularized spectral clustering, orthogonal linked matrix factorization, and the spectral method of [16] as both the number of nodes and the number of layers grow within the MLSBM context. Moreover, [26] established consistency results for a least squares estimation of community memberships within the MLSBM framework. Recently, [27] studied the consistency of a bias-adjusted (i.e., debiased) spectral clustering method using a novel debiased sum of squared adjacency matrices under MLSBM. Their numerical experiments demonstrated that this approach significantly outperforms the spectral method proposed by [16]. Also see [39, 21, 9, 43] for other recent works exploring variants of MLSBM.

However, the MLSBM has a significant limitation: it is tailored exclusively to multi-layer networks with non-overlapping communities. The mixed membership stochastic block (MMSB) model [1] is a popular statistical model for capturing mixed community memberships in single-layer networks. Under MMSB, several methods have been developed to estimate mixed memberships for single-layer networks, including variational expectation-maximization [1, 15], nonnegative matrix factorization [40, 45], tensor decomposition [2], and spectral clustering [29, 20]. In this paper, we consider the problem of estimating nodes' common community memberships in multi-layer networks generated from the multi-layer mixed membership stochastic block (MLMMSB) model, a multi-layer version of MMSB. The main contributions of this paper are summarized as follows:

  • 1.

    We introduce three spectral methods for estimating mixed memberships in multi-layer networks generated from MLMMSB. These methods employ vertex-hunting algorithms on a few selected eigenvectors of three aggregate matrices: the sum of adjacency matrices, the debiased sum of squared adjacency matrices, and the sum of squared adjacency matrices.

  • 2.

    We establish per-node error rates for these three methods under mild conditions on network sparsity, demonstrating their consistent mixed membership estimation as the number of nodes and/or layers increases within the MLMMSB framework. Our theoretical analysis reveals that the method utilizing the debiased sum of squared adjacency matrices consistently outperforms the method using the sum of squared adjacency matrices in terms of error rate. Additionally, both methods generally exhibit lower error rates than the method based on the sum of adjacency matrices. This underscores the advantage of debiased spectral clustering in mixed membership community detection for multi-layer networks. To the best of our knowledge, this is the first work to estimate mixed memberships using the above three aggregate matrices and the first work to establish consistency results within the MLMMSB framework.

  • 3.

    To assess the quality of mixed membership community detection in multi-layer networks, we introduce two novel modularity metrics: fuzzy sum modularity and fuzzy mean modularity. The first is derived from computing the fuzzy modularity of the sum of adjacency matrices, while the second is obtained by averaging the fuzzy modularity of adjacency matrices across individual layers. To the best of our knowledge, our two modularity metrics are the first to measure the quality of mixed membership community detection in multi-layer networks.

  • 4.

    We conduct extensive simulations to validate our theoretical findings and demonstrate the practical effectiveness of our methods and metrics through real-world multi-layer network applications.

The remainder of this paper is structured as follows: Section 2 introduces the model, followed by Section 3 outlining the proposed methods. Section 4 presents the theoretical results. Subsequently, Section 5 includes experimental results on computer-generated multi-layer networks, while Section 6 focuses on real-world multi-layer networks. Lastly, Section 7 concludes the paper, and technical proofs are provided in Appendix A.

Notation. We employ the notation $[m]$ to denote the set $\{1,2,\ldots,m\}$. Furthermore, $I_{m\times m}$ stands for the $m$-by-$m$ identity matrix. For a vector $x$, $\|x\|_{q}$ represents its $l_{q}$ norm. When considering a matrix $M$ and an index set $s\subseteq[m]$, $M(s,:)$ refers to the sub-matrix of $M$ comprising the rows indexed by $s$. Additionally, $M^{\prime}$ denotes the transpose of $M$, $\|M\|_{F}$ its Frobenius norm, $\|M\|_{\infty}$ the maximum absolute row sum, $\|M\|_{2\rightarrow\infty}$ the maximum row-wise $l_{2}$ norm, $\mathrm{rank}(M)$ its rank, and $\lambda_{k}(M)$ the $k$-th largest eigenvalue of $M$ in magnitude. The notation $\mathbb{E}[\cdot]$ denotes expectation, while $\mathbb{P}(\cdot)$ represents probability. Finally, $e_{i}$ is defined as the indicator vector with a 1 in the $i$-th position and zeros elsewhere.

2 Multi-layer mixed membership stochastic block model

Throughout, we consider undirected and unweighted multi-layer networks with $n$ common nodes and $L$ layers. For the $l$-th layer, let the $n\times n$ symmetric matrix $A_{l}$ be its adjacency matrix such that $A_{l}(i,j)=A_{l}(j,i)=1$ if there is an edge connecting node $i$ and node $j$ and $A_{l}(i,j)=A_{l}(j,i)=0$ otherwise, for all $i\in[n],j\in[n],l\in[L]$, i.e., $A_{l}=A^{\prime}_{l}\in\{0,1\}^{n\times n}$. Additionally, we allow for the possibility of self-edges (loops) in this paper. We assume that the multi-layer network consists of $K$ common communities

\mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{K}. \qquad (1)

Throughout this paper, we assume that the number of communities $K$ is known. The challenging task of theoretically estimating $K$ in multi-layer networks exceeds the scope of this paper. Let $\Pi\in[0,1]^{n\times K}$ be the membership matrix such that $\Pi(i,k)$ represents the "weight" with which node $i$ belongs to the $k$-th community $\mathcal{C}_{k}$ for $i\in[n],k\in[K]$. Suppose that $\Pi$ satisfies the following condition

\mathrm{rank}(\Pi)=K,\quad 0\leq\Pi(i,k)\leq 1,\quad \|\Pi(i,:)\|_{1}=1~\mathrm{for~}i\in[n],k\in[K]. \qquad (2)

We call node $i$ a "pure" node if one entry of $\Pi(i,:)$ is 1 while the other $(K-1)$ entries are zeros, and a "mixed" node otherwise. Assume that

\text{the }k\text{-th community }\mathcal{C}_{k}\text{ has at least one pure node for }k\in[K]. \qquad (3)

Based on Equation (3), let $\mathcal{I}$ be the index set of pure nodes such that $\mathcal{I}=\{p_{1},p_{2},\ldots,p_{K}\}$ with $p_{k}\in\{1,2,\ldots,n\}$ being an arbitrary pure node in the $k$-th community $\mathcal{C}_{k}$ for $k\in[K]$. Similar to [29], without loss of generality, we reorder the nodes such that $\Pi(\mathcal{I},:)=I_{K\times K}$.

Suppose that all the layers share a common mixed membership matrix $\Pi$ but with possibly different edge probabilities. In particular, we work with a multi-layer version of the popular mixed membership stochastic block model (MMSB) [1]. To be precise, we define the multi-layer MMSB below:

Definition 1.

Suppose that Equations (1)-(3) are satisfied; the multi-layer mixed membership stochastic block model (MLMMSB) for generating a multi-layer network with adjacency matrices $\{A_{l}\}^{L}_{l=1}$ is as follows:

\Omega_{l}:=\rho\Pi B_{l}\Pi^{\prime},\qquad A_{l}(i,j)=A_{l}(j,i)\sim\mathrm{Bernoulli}(\Omega_{l}(i,j))~\mathrm{for~}i\in[n],j\in[n],l\in[L], \qquad (4)

where $B_{l}=B^{\prime}_{l}\in[0,1]^{K\times K}$ for $l\in[L]$ and $\rho\in(0,1]$.

Since $B_{l}$ can vary across $l$, the matrices $A_{l}$ generated by Equation (4) may have different expectations $\Omega_{l}\equiv\mathbb{E}[A_{l}]$ for $l\in[L]$. Notably, when $L=1$, the MLMMSB model degenerates to the popular MMSB model. When all nodes are pure, MLMMSB reduces to the MLSBM studied in previous works [16, 37, 26, 27]. Define $\mathcal{B}=\{B_{1},B_{2},\ldots,B_{L}\}$. Equation (4) says that MLMMSB is parameterized by $\Pi,\rho$, and $\mathcal{B}$. For brevity, we denote the MLMMSB defined by Equation (4) as $\mathrm{MLMMSB}(\Pi,\rho,\mathcal{B})$. Since $\mathbb{P}(A_{l}(i,j)=1)=\Omega_{l}(i,j)=\rho\Pi(i,:)B_{l}\Pi^{\prime}(j,:)\leq\rho$, decreasing the value of $\rho$ results in a sparser multi-layer network, i.e., $\rho$ controls the overall sparsity of the multi-layer network. For this reason, we call $\rho$ the sparsity parameter in this paper. We allow the sparsity parameter $\rho$ to go to zero as the number of nodes $n$, the number of layers $L$, or both increase. We will study the impact of $\rho$ on the performance of the proposed methods in our theoretical analysis. The primary goal of community detection within the MLMMSB framework is to accurately estimate the common mixed membership matrix $\Pi$ from the observed $L$ adjacency matrices $\{A_{l}\}^{L}_{l=1}$. This estimation task is crucial for understanding the underlying community structure in multi-layer networks.
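To make the generative process in Equation (4) concrete, the following minimal Python/NumPy sketch samples $\{A_{l}\}^{L}_{l=1}$ given $\Pi$, $\rho$, and $\mathcal{B}$; the function name `sample_mlmmsb` and its interface are ours, not from the paper.

```python
import numpy as np

def sample_mlmmsb(Pi, rho, B_list, rng=None):
    """Draw symmetric 0/1 adjacency matrices A_1,...,A_L from MLMMSB:
    A_l(i,j) = A_l(j,i) ~ Bernoulli(rho * Pi(i,:) B_l Pi(j,:)'), loops allowed."""
    rng = np.random.default_rng() if rng is None else rng
    A_list = []
    for B in B_list:
        Omega = rho * Pi @ B @ Pi.T                       # E[A_l] under the model
        A = np.triu(rng.random(Omega.shape) < Omega).astype(int)
        A_list.append(A + np.triu(A, 1).T)                # mirror the upper triangle
    return A_list
```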

3 Spectral methods for mixed membership estimation

In this section, we propose three spectral methods designed to estimate the mixed membership matrix $\Pi$ within the MLMMSB model for multi-layer networks in which nodes may belong to multiple communities with different weights. Recall that $\Omega_{l}$ is the expectation of the $l$-th observed adjacency matrix $A_{l}$ for $l\in[L]$ and that $\Pi$ contains the common community memberships of all nodes. To provide intuition for the design of our methods, we consider the oracle case where $\{\Omega_{l}\}^{L}_{l=1}$ are directly observed. First, we define two distinct aggregate matrices formed from $\{\Omega_{l}\}^{L}_{l=1}$: $\Omega_{\mathrm{sum}}\equiv\sum_{l\in[L]}\Omega_{l}$ and $\tilde{S}_{\mathrm{sum}}\equiv\sum_{l\in[L]}\Omega^{2}_{l}$. The former is the sum of all expected adjacency matrices and the latter is the sum of all squared expected adjacency matrices; both contain information about the mixed memberships of nodes since $\Omega_{\mathrm{sum}}=\rho\Pi(\sum_{l\in[L]}B_{l})\Pi^{\prime}$ and $\tilde{S}_{\mathrm{sum}}=\rho^{2}\Pi(\sum_{l\in[L]}B_{l}\Pi^{\prime}\Pi B_{l})\Pi^{\prime}$. Since $\mathrm{rank}(\Pi)=K$, it is easy to see that $\mathrm{rank}(\Omega_{\mathrm{sum}})=K$ if $\mathrm{rank}(\sum_{l\in[L]}B_{l})=K$ and $\mathrm{rank}(\tilde{S}_{\mathrm{sum}})=K$ if $\mathrm{rank}(\sum_{l\in[L]}B^{2}_{l})=K$. Assuming that the number of communities $K$ is much smaller than the number of nodes $n$, both $\Omega_{\mathrm{sum}}$ and $\tilde{S}_{\mathrm{sum}}$ possess low-dimensional structure. This low-rank property is crucial for the development of our spectral methods, as it allows us to efficiently extract meaningful information from the high-dimensional data. Building on these insights, we present the following lemma characterizing the geometries of the two aggregate matrices $\Omega_{\mathrm{sum}}$ and $\tilde{S}_{\mathrm{sum}}$. This lemma forms the theoretical foundation of our methods, enabling accurate and computationally efficient estimation of the mixed membership matrix $\Pi$ within the MLMMSB model.

Lemma 1.

(Ideal Simplexes) Under the model $\mathrm{MLMMSB}(\Pi,\rho,\mathcal{B})$, depending on the conditions imposed on the set $\{B_{l}\}^{L}_{l=1}$, we arrive at the following conclusions:

  1.

    When $\mathrm{rank}(\sum_{l\in[L]}B_{l})=K$: Let $U\Sigma U^{\prime}$ denote the top-$K$ eigen-decomposition of $\Omega_{\mathrm{sum}}$, where the $n\times K$ matrix $U$ satisfies $U^{\prime}U=I_{K\times K}$ and the $k$-th diagonal entry of the $K\times K$ diagonal matrix $\Sigma$ is the $k$-th largest eigenvalue (in magnitude) of $\Omega_{\mathrm{sum}}$ for $k\in[K]$. Then, we have $U=\Pi U(\mathcal{I},:)$.

  2.

    When $\mathrm{rank}(\sum_{l\in[L]}B^{2}_{l})=K$: Let $V\Lambda V^{\prime}$ represent the top-$K$ eigen-decomposition of $\tilde{S}_{\mathrm{sum}}$, where the $n\times K$ matrix $V$ fulfills $V^{\prime}V=I_{K\times K}$ and the $k$-th diagonal entry of the $K\times K$ diagonal matrix $\Lambda$ is the $k$-th largest eigenvalue (in magnitude) of $\tilde{S}_{\mathrm{sum}}$ for $k\in[K]$. In this case, we have $V=\Pi V(\mathcal{I},:)$.

Recall that $\Pi$ satisfies Equations (2) and (3). By $U=\Pi U(\mathcal{I},:)$ from Lemma 1, the rows of $U$ form a $K$-simplex in $\mathbb{R}^{K}$, referred to as the Ideal Simplex of $U$ ($IS_{U}$ for brevity), whose $K$ vertices are the rows of $U(\mathcal{I},:)$. Since $U=\Pi U(\mathcal{I},:)$, it follows that $U(i,:)=\Pi(i,:)U(\mathcal{I},:)$ for $i\in[n]$, i.e., $U(i,:)$ is a convex combination of the $K$ vertices of $IS_{U}$ with weights determined by $\Pi(i,:)$. Consequently, a pure row of $U$ (we call $U(i,:)$ a pure row if node $i$ is a pure node and a mixed row otherwise) lies on one of the $K$ vertices of the simplex, while a mixed row occupies an interior position within the simplex. Such a simplex structure is also observed in mixed membership estimation for community detection in single-layer networks [29, 20, 41] and in topic matrix estimation for topic modeling [22]. Notably, the mixed membership matrix $\Pi$ can be recovered exactly by setting $\Pi=UU^{-1}(\mathcal{I},:)$ provided that the corner matrix $U(\mathcal{I},:)$ is available. Thanks to the simplex structure in $U=\Pi U(\mathcal{I},:)$, the vertices of the simplex can be found by a vertex-hunting technique such as the successive projection (SP) algorithm [3, 13, 14]. Applying SP to all rows of $U$ with $K$ clusters enables the exact recovery of the $K\times K$ corner matrix $U(\mathcal{I},:)$; details of the SP algorithm can be found in [14]. SP efficiently finds the $K$ distinct pure rows of $U$. Similarly, $V=\Pi V(\mathcal{I},:)$ also exhibits a simplex structure, leading to the recovery of $\Pi$ via $\Pi=VV^{-1}(\mathcal{I},:)$ by Lemma 1. Consequently, applying the SP algorithm to $V$ with $K$ clusters also yields an exact reconstruction of the mixed membership matrix $\Pi$.
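For reference, here is a minimal sketch of the SP vertex-hunting step, following the standard successive projection algorithm (the function name and interface are ours): at each round it picks the row with the largest residual norm as a new vertex and projects all rows onto the orthogonal complement of that direction.

```python
import numpy as np

def successive_projection(R, K):
    """SP vertex hunting: greedily select K rows of R to serve as simplex
    vertices.  R is the n-by-K matrix of (estimated) eigenvectors."""
    Y = R.astype(float).copy()
    pure = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(Y, axis=1)))  # row with largest residual norm
        pure.append(j)
        u = Y[j] / np.linalg.norm(Y[j])                # direction of the chosen vertex
        Y = Y - np.outer(Y @ u, u)                     # project every row off it
    return pure
```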

Remark 1.

Let $\tilde{\mathcal{I}}$ represent another index set, consisting of arbitrary pure nodes $\tilde{p}_{1},\tilde{p}_{2},\ldots,\tilde{p}_{K}$ from the respective communities $\mathcal{C}_{k}$ for $k\in[K]$. According to the proof of Lemma 1, we have $U=\Pi U(\tilde{\mathcal{I}},:)$. Furthermore, this implies that $U=\Pi U(\tilde{\mathcal{I}},:)=\Pi U(\mathcal{I},:)$, which in turn signifies that $U(\tilde{\mathcal{I}},:)$ is equal to $U(\mathcal{I},:)$. In essence, the corner matrix $U(\mathcal{I},:)$ remains unchanged regardless of the specific pure nodes chosen to form the index set. An analogous argument holds for $V(\mathcal{I},:)$.

For real-world multi-layer networks, the $L$ adjacency matrices $A_{1},A_{2},\ldots,A_{L}$ are observed instead of their expectations. For our first method, set $A_{\mathrm{sum}}=\sum_{l\in[L]}A_{l}$; we have $\mathbb{E}[A_{\mathrm{sum}}]=\Omega_{\mathrm{sum}}$ under MLMMSB. Subsequently, let $\hat{U}\hat{\Sigma}\hat{U}^{\prime}$ be the top-$K$ eigen-decomposition of $A_{\mathrm{sum}}$, where $\hat{U}$ has orthonormal columns with $\hat{U}^{\prime}\hat{U}=I_{K\times K}$ and $\hat{\Sigma}$ is a diagonal matrix whose $k$-th diagonal element is the $k$-th largest eigenvalue of $A_{\mathrm{sum}}$ in magnitude for $k\in[K]$. Given that the expectation of $A_{\mathrm{sum}}$ is $\Omega_{\mathrm{sum}}$, $\hat{U}$ can be interpreted as a slightly perturbed version of $U$. We then apply the SP algorithm to all rows of $\hat{U}$ with $K$ clusters, obtaining an estimated index set of pure nodes, denoted $\hat{\mathcal{I}}$; we infer that $\hat{U}(\hat{\mathcal{I}},:)$ should closely approximate the corner matrix $U(\mathcal{I},:)$. Finally, we estimate the mixed membership matrix $\Pi$ by computing $\hat{U}\hat{U}^{-1}(\hat{\mathcal{I}},:)$. Our first algorithm, called "Successive projection on the sum of adjacency matrices" (SPSum, Algorithm 1), summarizes the above analysis. It efficiently estimates the mixed membership matrix $\Pi$ by executing the SP algorithm on the top-$K$ eigenvectors of $A_{\mathrm{sum}}$.

Algorithm 1 SPSum
Input: adjacency matrices $A_{1},A_{2},\ldots,A_{L}$, and the number of communities $K$.
Output: estimated mixed membership matrix $\hat{\Pi}$.
1: Compute $A_{\mathrm{sum}}=\sum_{l\in[L]}A_{l}$.
2: Get $\hat{U}\hat{\Sigma}\hat{U}^{\prime}$, the top-$K$ eigen-decomposition of $A_{\mathrm{sum}}$.
3: Apply the SP algorithm to all rows of $\hat{U}$ with $K$ clusters to obtain the estimated index set $\hat{\mathcal{I}}$.
4: Set $\hat{\Pi}=\mathrm{max}(0,\hat{U}\hat{U}^{-1}(\hat{\mathcal{I}},:))$.
5: Normalize each row of $\hat{\Pi}$ by its $l_{1}$ norm: $\hat{\Pi}(i,:)=\frac{\hat{\Pi}(i,:)}{\|\hat{\Pi}(i,:)\|_{1}}$ for $i\in[n]$.
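Algorithm 1 translates directly into code. Below is a sketch assuming the `successive_projection` helper above; `sp_estimate` is a name we introduce for the core shared by all three methods.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def sp_estimate(M, K):
    """Shared core: top-K eigenvectors of an aggregate matrix M, SP vertex
    hunting, then membership reconstruction and row normalization."""
    _, U_hat = eigsh(M.astype(float), k=K, which='LM')   # top-K pairs by magnitude
    pure = successive_projection(U_hat, K)               # estimated pure-node indices
    Pi_hat = np.maximum(U_hat @ np.linalg.inv(U_hat[pure, :]), 0)
    return Pi_hat / Pi_hat.sum(axis=1, keepdims=True)    # l1 row normalization

def spsum(A_list, K):
    """SPSum: run the shared core on the sum of adjacency matrices."""
    return sp_estimate(sum(A_list), K)
```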

In our second methodological approach, we define the aggregate matrix $S_{\mathrm{sum}}$ as the bias-adjusted sum of squared adjacency matrices, specifically $S_{\mathrm{sum}}=\sum_{l\in[L]}(A_{l}^{2}-D_{l})$. Here, $D_{l}$ is a diagonal matrix whose $i$-th diagonal entry is the degree of node $i$ in layer $l$, i.e., $D_{l}(i,i)=\sum_{j\in[n]}A_{l}(i,j)$ for $i\in[n]$ and $l\in[L]$. The matrix $S_{\mathrm{sum}}$ was first introduced in [27] within the context of MLSBM as a debiased estimate of the sum of squared adjacency matrices. This debiasing is necessary because $\sum_{l\in[L]}A_{l}^{2}$ alone provides a biased approximation of $\sum_{l\in[L]}\Omega_{l}^{2}$. By subtracting the diagonal matrix $D_{l}$ from each squared adjacency matrix, we can effectively remove this bias, rendering $S_{\mathrm{sum}}$ a reliable estimator of $\tilde{S}_{\mathrm{sum}}$, as demonstrated in [27]. Subsequently, let $\hat{V}\hat{\Lambda}\hat{V}^{\prime}$ be the top-$K$ eigen-decomposition of $S_{\mathrm{sum}}$. Given that $\hat{V}$ provides a close approximation of $V$, applying the SP algorithm to all rows of $\hat{V}$ yields a reliable estimate of the corner matrix $V(\mathcal{I},:)$. In summary, our second spectral method, centered on $S_{\mathrm{sum}}$, is outlined in Algorithm 2. We refer to this method as "Successive projection on the debiased sum of squared adjacency matrices" (SPDSoS for brevity).

The third method utilizes the sum of squared adjacency matrices, $\sum_{l\in[L]}A^{2}_{l}$, as a substitute for $S_{\mathrm{sum}}$ in Algorithm 2. Notably, this approach omits the crucial bias-removal step inherent in $S_{\mathrm{sum}}$. We call this method "Successive projection on the sum of squared adjacency matrices", abbreviated as SPSoS. In the subsequent section, we will demonstrate that SPDSoS consistently outperforms SPSoS, and both methods are generally theoretically superior to SPSum.

Algorithm 2 SPDSoS
Input: adjacency matrices $A_{1},A_{2},\ldots,A_{L}$, and the number of communities $K$.
Output: estimated mixed membership matrix $\hat{\Pi}$.
1: Compute $S_{\mathrm{sum}}=\sum_{l\in[L]}(A^{2}_{l}-D_{l})$.
2: Get $\hat{V}\hat{\Lambda}\hat{V}^{\prime}$, the top-$K$ eigen-decomposition of $S_{\mathrm{sum}}$.
3: Apply the SP algorithm to all rows of $\hat{V}$ with $K$ clusters to obtain the estimated index set $\hat{\mathcal{I}}$.
4: Set $\hat{\Pi}=\mathrm{max}(0,\hat{V}\hat{V}^{-1}(\hat{\mathcal{I}},:))$.
5: Update $\hat{\Pi}$ by $\hat{\Pi}(i,:)=\frac{\hat{\Pi}(i,:)}{\|\hat{\Pi}(i,:)\|_{1}}$ for $i\in[n]$.
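SPDSoS and SPSoS differ from SPSum only in the aggregate matrix they decompose; a sketch reusing the `sp_estimate` helper above (again, function names are ours):

```python
import numpy as np

def spdsos(A_list, K):
    """SPDSoS: SP on the debiased sum of squared adjacency matrices."""
    n = A_list[0].shape[0]
    S = np.zeros((n, n))
    for A in A_list:
        A = A.astype(float)
        S += A @ A - np.diag(A.sum(axis=1))   # subtract D_l to remove the bias
    return sp_estimate(S, K)

def spsos(A_list, K):
    """SPSoS: the same aggregate, but without the bias-removal step."""
    return sp_estimate(sum(A.astype(float) @ A for A in A_list), K)
```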

4 Main results

In this section, we demonstrate the consistency of our methods by presenting theoretical upper bounds on their per-node error rates as the number of nodes $n$ and/or the number of layers $L$ increases within the context of the MLMMSB model. Assumption 1 provides a prerequisite lower bound on the sparsity parameter $\rho$ to ensure the theoretical validity of SPSum.

Assumption 1.

(sparsity requirement for SPSum) $\rho nL\geq\tau^{2}\mathrm{log}(n+L)$, where $\tau=\mathrm{max}_{i\in[n],j\in[n]}|\sum_{l\in[L]}(A_{l}(i,j)-\Omega_{l}(i,j))|$.

In Assumption 1, the choice of $\tau$ plays a crucial role in determining the sparsity regime. Since $\sum_{l\in[L]}A_{l}(i,j)\leq L$, $\tau$ is upper-bounded by $L$. However, to consider an even sparser scenario, we introduce a positive constant $\beta$ that is strictly less than $L$ and remains fixed regardless of the growth of $n$ or $L$. This ensures that the aggregate number of edges connecting any two nodes $i$ and $j$ across all layers remains below $\beta$ even in the limit as $n$ or $L$ approaches infinity. Under these conditions, Assumption 1 is revised to reflect a more stringent requirement: $\rho\geq\frac{\beta^{2}\mathrm{log}(n+L)}{nL}$. This revised assumption characterizes the most challenging scenario for community detection. To establish the theoretical bounds for SPSum, we need the following requirement on $\mathcal{B}$.

Assumption 2.

(SPSum's requirement on $\mathcal{B}$) $|\lambda_{K}(\sum_{l\in[L]}B_{l})|\geq c_{1}L$ for some constant $c_{1}>0$.

Assumption 2 is mild, as $|\lambda_{K}(\sum_{l\in[L]}B_{l})|$ represents the smallest singular value of a sum of $L$ matrices, and it is reasonable to presume that $|\lambda_{K}(\sum_{l\in[L]}B_{l})|$ is of order $O(L)$. To simplify our theoretical analysis, we also introduce the following condition:

Condition 1.

$K=O(1)$ and $\lambda_{K}(\Pi^{\prime}\Pi)=O(\frac{n}{K})$.

Condition 1 is mild: $K=O(1)$ implies that the number of communities remains constant, while $\lambda_{K}(\Pi^{\prime}\Pi)=O(\frac{n}{K})$ ensures balanced community sizes, where the size of the $k$-th community $\mathcal{C}_{k}$ is defined as $\sum_{i\in[n]}\Pi(i,k)$ for $k\in[K]$. Our main result for SPSum offers an upper bound on its per-node error rate in terms of $\rho,n$, and $L$ under the MLMMSB framework.

Theorem 1.

(Per-node error rate of SPSum) Under $\mathrm{MLMMSB}(\Pi,\rho,\mathcal{B})$, when Assumptions 1 and 2 and Condition 1 hold, let $\hat{\Pi}$ be obtained from the SPSum algorithm. Then there exists a $K\times K$ permutation matrix $\mathcal{P}$ such that, with probability at least $1-o(\frac{1}{n+L})$, we have

\mathrm{max}_{i\in[n]}\|e^{\prime}_{i}(\hat{\Pi}-\Pi\mathcal{P})\|_{1}=O(\sqrt{\frac{\mathrm{log}(n+L)}{\rho nL}}).

According to Theorem 1, as $n$ (and/or $L$) approaches infinity, SPSum's error rate diminishes to zero, highlighting the advantage of employing multiple layers in community detection and establishing SPSum's consistent community detection. Additionally, Theorem 1 says that to attain a sufficiently low error rate for SPSum, the sparsity parameter $\rho$ must decrease at a rate slower than $\frac{\mathrm{log}(n+L)}{nL}$ (i.e., $\rho$ must be significantly larger than $\frac{\mathrm{log}(n+L)}{nL}$). This finding aligns with the sparsity requirement stated in Assumption 1 for SPSum. It is noteworthy that the theoretical upper bound on SPSum's per-node error rate, $O(\sqrt{\frac{\mathrm{log}(n+L)}{\rho nL}})$ in Theorem 1, is independent of how we select the value of $\tau$ in Assumption 1. This underscores the generality of our theoretical findings.

Assumption 3, stated below, establishes the lower bound on the sparsity parameter $\rho$ that ensures the consistency of both SPDSoS and SPSoS.

Assumption 3.

(sparsity requirement for SPDSoS and SPSoS) $\rho^{2}n^{2}L\geq\tilde{\tau}^{2}\mathrm{log}(n+L)$, where $\tilde{\tau}=\mathrm{max}_{i\in[n]}\mathrm{max}_{j\in[n]}|\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))|$.

Analogous to Assumption 1, when considering an extremely sparse scenario where $\tilde{\tau}$ does not exceed a positive constant $\tilde{\beta}$, Assumption 3 dictates that $\rho$ must satisfy $\rho\geq\frac{1}{n}\sqrt{\frac{\tilde{\beta}^{2}\mathrm{log}(n+L)}{L}}$. This requirement aligns with the sparsity requirement in Theorem 1 of [27]. The subsequent assumption serves a similar purpose to Assumption 2.

Assumption 4.

(SPDSoS's and SPSoS's requirement on $\mathcal{B}$) $|\lambda_{K}(\sum_{l\in[L]}B^{2}_{l})|\geq c_{2}L$ for some constant $c_{2}>0$.

Theorems 2 and 3 are the main results of SPDSoS and SPSoS, respectively.

Theorem 2.

(Per-node error rate of SPDSoS) Under $\mathrm{MLMMSB}(\Pi,\rho,\mathcal{B})$, when Assumptions 3 and 4 and Condition 1 hold, let $\hat{\Pi}$ be obtained from the SPDSoS algorithm. Then there exists a $K\times K$ permutation matrix $\tilde{\mathcal{P}}$ such that, with probability at least $1-o(\frac{1}{n+L})$, we have

\mathrm{max}_{i\in[n]}\|e^{\prime}_{i}(\hat{\Pi}-\Pi\tilde{\mathcal{P}})\|_{1}=O(\sqrt{\frac{\mathrm{log}(n+L)}{\rho^{2}n^{2}L}})+O(\frac{1}{n}).
Theorem 3.

(Per-node error rate of SPSoS) Under the same conditions as in Theorem 2, and supposing additionally that $\rho nL\geq\mathrm{log}(n+L)$, let $\hat{\Pi}$ be obtained from the SPSoS algorithm. Then, with probability at least $1-o(\frac{1}{n+L})$, we have

\mathrm{max}_{i\in[n]}\|e^{\prime}_{i}(\hat{\Pi}-\Pi\tilde{\mathcal{P}})\|_{1}=O(\sqrt{\frac{\mathrm{log}(n+L)}{\rho^{2}n^{2}L}})+O(\frac{1}{\rho n}).

According to Theorems 2 and 3, both SPDSoS and SPSoS enjoy consistent mixed membership estimation: their per-node error rates tend to zero as the number of nodes $n$ (and/or the number of layers $L$) approaches infinity. Furthermore, similar to Theorem 1, the theoretical bounds outlined in Theorems 2 and 3 are independent of the selection of $\tilde{\tau}$ in Assumption 3.

By the proof of Theorem 3, SPSoS's theoretical upper bound on the per-node error rate is the sum of SPDSoS's theoretical upper bound and $O(\frac{1}{\rho n})$. Therefore, SPDSoS's error rate is always smaller than that of SPSoS, which indicates the benefit of the bias-removal step in $S_{\mathrm{sum}}$. Furthermore, to compare the theoretical performance of SPSum and SPSoS, we consider the following two simple cases:

  • 1.

    When SPSoS's error rate is of order $\frac{1}{\rho n}$: for SPSum to significantly outperform SPSoS, we need $\sqrt{\frac{\mathrm{log}(n+L)}{\rho nL}}\ll\frac{1}{\rho n}\Leftrightarrow L\gg\rho n\mathrm{log}(n+L)$.

  • 2.

    When SPSoS's error rate is of order $\sqrt{\frac{\mathrm{log}(n+L)}{\rho^{2}n^{2}L}}$: for SPSum to significantly outperform SPSoS, we need $\sqrt{\frac{\mathrm{log}(n+L)}{\rho nL}}\ll\sqrt{\frac{\mathrm{log}(n+L)}{\rho^{2}n^{2}L}}\Leftrightarrow\rho n\ll 1\Leftrightarrow\rho\ll\frac{1}{n}$. Since consistency also requires $\sqrt{\frac{\mathrm{log}(n+L)}{\rho nL}}\ll 1$, we have $\rho\gg\frac{\mathrm{log}(n+L)}{nL}$. Combining $\rho\gg\frac{\mathrm{log}(n+L)}{nL}$ with $\rho\ll\frac{1}{n}$ gives $\frac{\mathrm{log}(n+L)}{nL}\ll\frac{1}{n}\Leftrightarrow L\gg\mathrm{log}(n+L)$.

The preceding points indicate that SPSum's superior performance over SPSoS is limited to scenarios where the number of layers $L$ is exceptionally large ($L\gg\rho n\mathrm{log}(n+L)$ or $L\gg\mathrm{log}(n+L)$). However, this requirement is unrealistic for the majority of real-world multi-layer networks. Given that SPDSoS consistently outperforms SPSoS, it follows that SPDSoS also prevails over SPSum in most scenarios. Additionally, even in the rare scenarios where SPSum significantly surpasses SPDSoS for a significantly large $L$, Theorem 2 assures that SPDSoS's error rate is already negligible. Therefore, among these three methods, we arrive at the following conclusions: (a) SPDSoS consistently outperforms SPSoS; (b) SPSum's significant superiority over SPDSoS and SPSoS necessitates an unreasonably large number of layers $L$, which is impractical for most real-world multi-layer networks. Consequently, we can confidently assert that SPDSoS and SPSoS virtually always surpass SPSum.

5 Numerical results on synthetic multi-layer networks

In this section, we evaluate the performance of our proposed methods on computer-generated multi-layer networks. For these simulated networks, we know the ground-truth mixed membership matrix $\Pi$. To quantify the performance of each method, we employ two metrics: the Hamming error and the Relative error. The Hamming error is defined as $\mathrm{Hamming~error}=\mathrm{min}_{\mathcal{P}\in\mathcal{S}}\frac{\|\hat{\Pi}-\Pi\mathcal{P}\|_{1}}{n}$, while the Relative error is given by $\mathrm{Relative~error}=\mathrm{min}_{\mathcal{P}\in\mathcal{S}}\frac{\|\hat{\Pi}-\Pi\mathcal{P}\|_{F}}{\|\Pi\|_{F}}$. Here, $\mathcal{S}$ represents the set of all $K$-by-$K$ permutation matrices, accounting for potential label permutations. For each parameter setting in our simulation examples, we report the average of each metric for each proposed approach across 100 independent repetitions.
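A direct sketch of these two metrics follows, minimizing by brute force over the $K!$ column permutations (cheap for small $K$) and reading the matrix $l_{1}$ norm entrywise; the function name is ours.

```python
import numpy as np
from itertools import permutations

def membership_errors(Pi_hat, Pi):
    """Hamming and Relative errors, minimized over column permutations."""
    n, K = Pi.shape
    h_err = r_err = np.inf
    for perm in permutations(range(K)):          # all K! relabelings
        diff = Pi_hat[:, list(perm)] - Pi
        h_err = min(h_err, np.abs(diff).sum() / n)
        r_err = min(r_err, np.linalg.norm(diff) / np.linalg.norm(Pi))
    return h_err, r_err
```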

For all simulations conducted below, we set $K=3$ and let $n_{0}$ be the number of pure nodes within each community. For a mixed node $i$, we construct its mixed membership vector $\Pi(i,:)$ in the following manner. Initially, we generate two random values $r^{(1)}_{i}=\mathrm{rand}(1)$ and $r^{(2)}_{i}=\mathrm{rand}(1)$, where $\mathrm{rand}(1)$ represents a random value drawn from the Uniform distribution on $[0,1]$. Subsequently, we define the mixed membership vector as $\Pi(i,:)=(\frac{r^{(1)}_{i}}{2},\frac{r^{(2)}_{i}}{2},1-\frac{r^{(1)}_{i}}{2}-\frac{r^{(2)}_{i}}{2})$. Regarding the connectivity matrices, for the $l$-th matrix $B_{l}$, we assign $B_{l}(k,\tilde{k})=B_{l}(\tilde{k},k)=\mathrm{rand}(1)$ for all $1\leq k\leq\tilde{k}\leq K$ and $l\in[L]$. Finally, the number of nodes $n$, the sparsity parameter $\rho$, the number of layers $L$, and the number of pure nodes $n_{0}$ within each community are set independently for each simulation.
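A sketch of this simulation design is given below (function name ours); it can be paired with `sample_mlmmsb` from Section 2 to generate the layers.

```python
import numpy as np

def make_design(n, L, n0, rng=None):
    """Design of Section 5 with K = 3: n0 pure nodes per community, mixed
    nodes with Pi(i,:) = (r1/2, r2/2, 1 - r1/2 - r2/2), and symmetric B_l
    with Uniform(0,1) upper-triangular entries."""
    rng = np.random.default_rng() if rng is None else rng
    K = 3
    Pi = np.zeros((n, K))
    for k in range(K):                            # pure nodes first
        Pi[k * n0:(k + 1) * n0, k] = 1.0
    for i in range(K * n0, n):                    # then mixed nodes
        r1, r2 = rng.random(), rng.random()
        Pi[i] = (r1 / 2, r2 / 2, 1 - r1 / 2 - r2 / 2)
    B_list = []
    for _ in range(L):
        upper = np.triu(rng.random((K, K)))       # Uniform(0,1) upper triangle
        B_list.append(upper + np.triu(upper, 1).T)
    return Pi, B_list
```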

Experiment 1: Changing $\rho$. Fix $(n,L,n_{0})=(500,100,100)$ and let $\rho$ range in $\{0.02,0.04,\ldots,0.2\}$. The multi-layer network becomes denser as $\rho$ increases. The results shown in panels (a) and (b) of Figure 2 demonstrate that SPDSoS performs slightly better than SPSoS, and both methods significantly outperform SPSum. Meanwhile, the error rates of SPDSoS and SPSoS decrease rapidly as the sparsity parameter $\rho$ increases, while SPSum's error rates decrease slowly.

Figure 2: Numerical results. Panels: (a) Experiment 1, Hamming error; (b) Experiment 1, Relative error; (c) Experiment 2, Hamming error; (d) Experiment 2, Relative error; (e) Experiment 3, Hamming error; (f) Experiment 3, Relative error; (g) Experiment 4, Hamming error; (h) Experiment 4, Relative error.

Experiment 2: Changing $L$. Fix $(n,\rho,n_{0})=(500,0.1,100)$ and let $L$ range in $\{10,20,\ldots,100\}$. More layers are observed as $L$ increases. The results are presented in panels (c) and (d) of Figure 2. It is evident that SPDSoS and SPSoS exhibit comparable performance, both significantly surpassing SPSum. Furthermore, as $L$ increases, SPDSoS and SPSoS demonstrate improved performance, whereas SPSum's performance remains relatively unchanged throughout this experiment.

Experiment 3: Changing $n$. Fix $(L,\rho)=(40,0.1)$, let $n$ range in $\{200,400,\ldots,2000\}$, and set $n_{0}=\frac{n}{4}$ for each choice of $n$. Panels (e) and (f) of Figure 2 display the results, which demonstrate that SPDSoS and SPSoS enjoy similar performance and that both methods perform better than SPSum. Meanwhile, we also observe that the error rates of all methods decrease as $n$ increases.

Experiment 4: Changing $n_{0}$. Fix $(n,L,\rho)=(600,50,0.1)$ and let $n_{0}$ range in $\{20,40,\ldots,200\}$. The number of pure nodes increases as $n_{0}$ grows. The results are displayed in the final two panels of Figure 2. Our observations indicate that SPDSoS and SPSoS exhibit notably superior performance compared to SPSum. Furthermore, both SPDSoS and SPSoS exhibit improved performance in scenarios with a higher number of pure nodes, whereas SPSum demonstrates weak performance throughout this experiment.

6 Real data applications

In this section, we demonstrate the application of our methods to real-world multi-layer networks. Applying a mixed membership estimation algorithm to such networks yields an estimated mixed membership matrix, denoted $\hat{\Pi}$. Notably, $\hat{\Pi}$ may vary depending on the specific algorithm used. Consequently, accurately assessing the quality of the estimated mixed membership community partition becomes a crucial problem. To address this challenge, we introduce two modularity metrics in this paper, designed to quantitatively evaluate the quality of mixed membership community detection in real-world multi-layer networks.

Recall that $A_{\text{sum}}=\sum_{l\in[L]}A_{l}$ represents the summation of all adjacency matrices. This summation effectively quantifies the connection weight between nodes. Given that nodes sharing similar community memberships tend to exhibit stronger connections than those with different memberships, $A_{\text{sum}}$ can be interpreted as the weighted adjacency matrix of an assortative network [33, 34]. Our fuzzy sum modularity, $Q_{\text{fsum}}$, is defined as follows:

Q_{\text{fsum}}=\frac{1}{m_{\text{sum}}}\sum_{i\in[n]}\sum_{j\in[n]}\left(A_{\text{sum}}(i,j)-\frac{d_{\text{sum}}(i)d_{\text{sum}}(j)}{m_{\text{sum}}}\right)\hat{\Pi}(i,:)\hat{\Pi}^{\prime}(j,:),

where fsum stands for "fuzzy sum", $d_{\text{sum}}(i)=\sum_{j\in[n]}A_{\text{sum}}(i,j)$ for $i\in[n]$, and $m_{\text{sum}}=\sum_{i\in[n]}d_{\text{sum}}(i)$. Notably, when $L=1$, our fuzzy sum modularity $Q_{\text{fsum}}$ simplifies to the fuzzy modularity introduced in Equation (14) of [32]. Furthermore, when $L=1$ and all nodes are pure, our modularity metric reduces to the classical Newman-Girvan modularity [36].
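A minimal sketch of $Q_{\text{fsum}}$ (function name ours), using the identity $\sum_{i,j}B(i,j)[\hat{\Pi}\hat{\Pi}^{\prime}](i,j)=\mathrm{sum}(B\circ\hat{\Pi}\hat{\Pi}^{\prime})$ for the double sum:

```python
import numpy as np

def fuzzy_sum_modularity(A_list, Pi_hat):
    """Fuzzy sum modularity Q_fsum of an estimated membership matrix."""
    A_sum = sum(A.astype(float) for A in A_list)
    d = A_sum.sum(axis=1)                        # weighted degrees d_sum(i)
    m = d.sum()                                  # m_sum
    B = A_sum - np.outer(d, d) / m               # modularity matrix of A_sum
    return np.sum(B * (Pi_hat @ Pi_hat.T)) / m
```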

Our second modularity metric is the fuzzy mean modularity, defined as the average of fuzzy modularities across all layers. Here’s how it is formulated:

Q_{\text{fmean}}=\frac{1}{L}\sum_{l\in[L]}\sum_{i\in[n]}\sum_{j\in[n]}\frac{1}{m_{l}}\left(A_{l}(i,j)-\frac{D_{l}(i,i)D_{l}(j,j)}{m_{l}}\right)\hat{\Pi}(i,:)\hat{\Pi}^{\prime}(j,:),

where fmean stands for "fuzzy mean" and $m_{l}=\sum_{i\in[n]}D_{l}(i,i)$ represents the sum of diagonal elements in the degree matrix $D_{l}$ of layer $l$ for $l\in[L]$. When all nodes are pure, our fuzzy mean modularity $Q_{\text{fmean}}$ simplifies to the multi-normalized average modularity introduced in Equation (3.1) of [38]. When there is only a single layer ($L=1$), $Q_{\text{fmean}}$ reduces to the fuzzy modularity described in [32]. Furthermore, if both conditions hold ($L=1$ and all nodes are pure), $Q_{\text{fmean}}$ degenerates to the classical Newman-Girvan modularity [36].
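A corresponding sketch of $Q_{\text{fmean}}$ (again the function name is ours; an empty layer with $m_{l}=0$ would need to be skipped in practice):

```python
import numpy as np

def fuzzy_mean_modularity(A_list, Pi_hat):
    """Fuzzy mean modularity Q_fmean: per-layer fuzzy modularity, averaged."""
    Q = 0.0
    for A in A_list:
        d = A.sum(axis=1).astype(float)          # layer degrees D_l(i,i)
        m = d.sum()                              # m_l
        B = A - np.outer(d, d) / m               # layer-l modularity matrix
        Q += np.sum(B * (Pi_hat @ Pi_hat.T)) / m
    return Q / len(A_list)
```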

Analogous to the Newman-Girvan modularity [36], the fuzzy modularity [32], and the multi-normalized average modularity [38], a higher value of the fuzzy sum modularity $Q_{\text{fsum}}$ indicates a better community partition. Consequently, we consistently favor a larger value of $Q_{\text{fsum}}$. Similarly, a larger $Q_{\text{fmean}}$ also indicates a better community partition. To the best of our knowledge, our fuzzy sum modularity $Q_{\text{fsum}}$ and fuzzy mean modularity $Q_{\text{fmean}}$ are the first metrics designed to evaluate the quality of mixed membership community detection in multi-layer networks.

For real-world multi-layer networks, where the number of communities $K$ is usually unknown, we adopt the strategy introduced in [32] to estimate $K$. Specifically, we determine $K$ by selecting the value that maximizes the fuzzy sum modularity $Q_{\text{fsum}}$ (or the fuzzy mean modularity $Q_{\text{fmean}}$). This strategy ensures that we obtain an optimal community partition based on the chosen modularity metric.
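This selection strategy amounts to a simple loop; a sketch with a hypothetical candidate range $K\in\{2,\ldots,10\}$ (swap in `fuzzy_mean_modularity` to select by $Q_{\text{fmean}}$ instead):

```python
def estimate_K(A_list, method, K_range=range(2, 11)):
    """Pick the K maximizing Q_fsum; `method` is e.g. spsum, spdsos, or spsos."""
    scores = {K: fuzzy_sum_modularity(A_list, method(A_list, K)) for K in K_range}
    best_K = max(scores, key=scores.get)
    return best_K, scores[best_K]
```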

In this paper, we consider the following real-world multi-layer networks, which can be accessed at https://manliodedomenico.com/data.php:

  • 1.

    Lazega Law Firm: This data is a multi-layer social network with $n=71$ nodes and $L=3$ layers [25, 42]. For this data, nodes represent company partners and layers denote different cooperative relationships (Advice, Friendship, and Co-work) among partners.

  • 2.

    C.Elegans: This data is from biology and records the neural connections of Caenorhabditis elegans [7]. It has $n=279$ nodes and $L=3$ layers, with nodes denoting neurons and layers corresponding to different types of synaptic junctions (Electric, Chemical Monadic, and Chemical Polyadic).

  • 3.

    CS-Aarhus: This data is a multi-layer social network with $n=61$ nodes and $L=5$ layers, where nodes represent employees of the Computer Science department at Aarhus and layers denote different relationships (Lunch, Facebook, Coauthor, Leisure, Work) [28].

  • 4.

    FAO-trade: This data collects different types of trade relationships among countries from FAO (Food and Agriculture Organization of the United Nations) [10]. It has $n=214$ nodes and $L=364$ layers, where nodes represent countries, layers denote products, and edges denote trade relationships among countries.

The estimated number of communities and the corresponding modularity of each method for the real data analyzed in this paper are comprehensively presented in Table 1 and Table 2. After analyzing the results in these tables, we arrive at the following conclusions:

  • 1.

    $Q_{\text{fsum}}$ and $Q_{\text{fmean}}$ consistently produce similar values for each method, indicating a high degree of agreement. Specifically, SPSum achieves the highest modularity score under both metrics on several datasets, including Lazega Law Firm, CS-Aarhus, and FAO-trade. It is worth mentioning that this observation (SPSum surpassing the other two methods) contrasts with both the theoretical and numerical findings. A plausible explanation for this discrepancy is that $Q_{\text{fsum}}$ and $Q_{\text{fmean}}$ are computed directly from $\{A_{l}\}^{L}_{l=1}$ rather than $\{A^{2}_{l}\}^{L}_{l=1}$. Similarly, for SPSoS, its $Q_{\text{fsum}}$ and $Q_{\text{fmean}}$ scores on CS-Aarhus rank second among all methods, with the two scores closely aligned. This consistency across measures suggests that both $Q_{\text{fsum}}$ and $Q_{\text{fmean}}$ effectively capture similar aspects of community structure, providing reliable and comparable assessments of different methods.

  • 2.

    While SPSum may not perform as well in numerical studies, it nevertheless exhibits superior performance in terms of modularity for real-world datasets compared to SPDSoS and SPSoS. The only exception is the C.Elegans network, where SPSum’s modularity score is slightly lower than SPDSoS’s.

  • 3.

    The CS-Aarhus network exhibits a more distinct community structure compared to the other three real multi-layer networks, as evidenced by the higher modularity scores obtained by our methods for CS-Aarhus. Conversely, the FAO-trade network possesses the least discernible community structure among all real multi-layer networks, as the modularity scores achieved by all proposed methods for FAO-trade are lower than those of the other three networks.

  • 4.

    The results presented in Table 1 and Table 2 indicate that the optimal numbers of communities for the four real-world multi-layer networks, namely Lazega Law Firm, C.Elegans, CS-Aarhus, and FAO-trade, are 3, 2, 5, and 2, respectively. This determination is based on selecting the value of $K$ that yields the highest modularity score across all methods for each dataset.

Table 1: (Estimated $K$, $Q_{\text{fsum}}$) obtained by the proposed approaches for the real data used in this paper. The boldface values represent the highest $Q_{\text{fsum}}$ scores among the three methods.
Dataset SPSum SPDSoS SPSoS
Lazega Law Firm (3,0.2025) (3,0.1604) (3,0.1993)
C.Elegans (2,0.2778) (2,0.2808) (2,0.2509)
CS-Aarhus (5,0.3575) (4,0.3474) (4,0.3559)
FAO-trade (2,0.1508) (2,0.1329) (2,0.1412)
Table 2: (Estimated $K$, $Q_{\text{fmean}}$) obtained by the proposed approaches for the real data used in this paper. The boldface values represent the highest $Q_{\text{fmean}}$ scores among the three methods.
Dataset SPSum SPDSoS SPSoS
Lazega Law Firm (3,0.1990) (3,0.1572) (3,0.1961)
C.Elegans (2,0.2779) (2,0.2780) (2,0.2495)
CS-Aarhus (5,0.3681) (4,0.3404) (4,0.3529)
FAO-trade (2,0.1487) (2,0.1274) (2,0.1403)

To simplify our analysis, and considering that SPSum generates modularity scores that are either higher than or comparable to those of SPDSoS and SPSoS for the four real-world multi-layer networks under consideration, we focus our further analysis on the SPSum method. Let $\hat{\Pi}$ represent the estimated mixed membership matrix obtained by SPSum for a given real-world multi-layer network. For a node $i$ within the network, we define its estimated home base community as the one corresponding to the maximum value in the $i$-th row of $\hat{\Pi}$, i.e., $\tilde{k}=\mathrm{argmax}_{k\in[K]}\hat{\Pi}(i,k)$. Furthermore, we categorize nodes based on their membership distribution: a node is considered highly mixed if $\mathrm{max}_{k\in[K]}\hat{\Pi}(i,k)\leq 0.6$, highly pure if $\mathrm{max}_{k\in[K]}\hat{\Pi}(i,k)\geq 0.9$, and neutral otherwise. We introduce two additional metrics, $\varsigma_{\mathrm{mixed}}$ and $\varsigma_{\mathrm{pure}}$, which represent the proportions of highly mixed and highly pure nodes, respectively. Additionally, we define the balanced parameter $\upsilon$ as the ratio of the minimum to the maximum $l_{1}$ norm of the columns of $\hat{\Pi}$, i.e., $\upsilon=\frac{\mathrm{min}_{k\in[K]}\|\hat{\Pi}(:,k)\|_{1}}{\mathrm{max}_{k\in[K]}\|\hat{\Pi}(:,k)\|_{1}}$. A higher value of $\upsilon$ indicates a more balanced community structure. Table 3 presents the values of these three indices obtained by applying SPSum to the real-world multi-layer networks studied in this paper. Based on the results in Table 3, we draw the following conclusions:

  • 1.

    Across all networks, the number of highly pure nodes significantly exceeds the number of highly mixed nodes, indicating that most nodes are strongly associated with a single community, while only a few exhibit mixed membership.

  • 2.

    In the Lazega Law Firm network, approximately 10 nodes are highly mixed, 35 nodes are highly pure, and 26 are neutral. Meanwhile, this network exhibits the highest proportion of highly mixed nodes among the four real-world multi-layer networks analyzed.

  • 3.

    The C.Elegans network has the lowest proportion of highly mixed nodes and the highest proportion of highly pure nodes. Its balanced parameter $\upsilon$ is 0.9512, the highest among all networks, indicating that the sizes of the two estimated communities in C.Elegans are nearly identical.

  • 4.

    The CS-Aarhus network exhibits indices that are comparable to those of the Lazega Law Firm network.

  • 5.

    FAO-trade displays a proportion of highly mixed (and pure) nodes similar to C.Elegans. However, its balanced parameter is the lowest among all networks, indicating that FAO-trade exhibits the most unbalanced community structure among the four real-world datasets.

Table 3: $\varsigma_{\mathrm{mixed}}$, $\varsigma_{\mathrm{pure}}$, and $\upsilon$ computed from $\hat{\Pi}$, where $\hat{\Pi}$ is returned by running the SPSum algorithm on the real multi-layer networks used in this paper. Here, we use the estimated $K$ of SPSum from Table 1 for each dataset.
Dataset ςmixed\varsigma_{\mathrm{mixed}} ςpure\varsigma_{\mathrm{pure}} υ\upsilon
Lazega Law Firm 0.1408 0.4930 0.7276
C.Elegans 0.0609 0.6882 0.9512
CS-Aarhus 0.1311 0.4590 0.7006
FAO-trade 0.0701 0.6449 0.5480
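The three indices reported in Table 3 can be computed directly from $\hat{\Pi}$; a minimal sketch (function name ours; since the rows of $\hat{\Pi}$ are nonnegative, column sums equal column $l_{1}$ norms):

```python
import numpy as np

def purity_indices(Pi_hat):
    """Proportions of highly mixed / highly pure nodes and the balanced
    parameter upsilon."""
    top = Pi_hat.max(axis=1)                     # each node's largest weight
    sigma_mixed = float(np.mean(top <= 0.6))     # varsigma_mixed
    sigma_pure = float(np.mean(top >= 0.9))      # varsigma_pure
    col_l1 = Pi_hat.sum(axis=0)                  # column l1 norms
    return sigma_mixed, sigma_pure, col_l1.min() / col_l1.max()
```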
Figure 3: Ternary diagram of the $71\times 3$ estimated membership matrix $\hat{\Pi}$ for Lazega Law Firm. Each dot represents a company partner, and its location within the triangle corresponds to its membership scores.
Figure 4: Illustration of the 3 layers of Lazega Law Firm. Colors (shapes) indicate home base communities and black squares represent highly mixed nodes detected by SPSum with 3 communities. For each layer, we only plot the largest connected component.
Figure 5: Illustration of the 3 layers of C.Elegans. Colors (shapes) indicate home base communities and black squares represent highly mixed nodes detected by SPSum with 2 communities. For each layer, we only plot the largest connected component.
Figure 6: Illustration of the Lunch, Facebook, Leisure, and Work layers of CS-Aarhus; we do not show the Coauthor layer because it is too sparse. Colors (shapes) indicate home base communities and black squares represent highly mixed nodes detected by SPSum with 5 communities. For each layer, we only plot the largest connected component.
Figure 7: Illustration of the layers of six products in FAO-trade. Colors (shapes) indicate home base communities and black squares represent highly mixed nodes detected by SPSum with 2 communities. For each layer, we only plot the largest connected component.

Figure 3 presents a ternary diagram that visualizes the estimated community membership matrix $\hat{\Pi}$ obtained from the SPSum algorithm for the Lazega Law Firm network with three communities. In this diagram, node $i$ is positioned closer to one of the triangle's vertices than node $j$ if $\mathrm{max}_{k\in[3]}\hat{\Pi}(i,k)>\mathrm{max}_{k\in[3]}\hat{\Pi}(j,k)$ for $i,j\in[71]$. A pure node is located at one of the triangle's vertices, and a neutral node is closer to a vertex than a highly mixed node. Therefore, Figure 3 indicates the purity of each node in the Lazega Law Firm network.

Figures 4-7 present the communities estimated by our SPSum method for the four real multi-layer networks. In these figures, nodes sharing the same color belong to the same home base community, while black squares denote highly mixed nodes. From these figures, we can clearly see the communities detected by SPSum in each layer of each multi-layer network.

7 Conclusion

This paper considers the problem of estimating community memberships of nodes in multi-layer networks under the multi-layer mixed membership stochastic block model, a model that permits nodes to belong to multiple communities simultaneously. We have developed spectral methods leveraging the eigen-decomposition of several aggregate matrices, and have provided theoretical guarantees for their consistency, demonstrating the convergence of per-node error rates as the number of nodes and/or layers increases under MLMMSB. To the best of our knowledge, this is the first work to estimate mixed community memberships for multi-layer networks using these aggregate matrices. Our theoretical analysis reveals that the algorithm based on the debiased sum of squared adjacency matrices always outperforms the algorithm using the sum of squared adjacency matrices, while both generally outperform the method using the sum of adjacency matrices in the task of estimating mixed memberships for multi-layer networks. Such a result is new for mixed membership estimation in multi-layer networks to the best of our knowledge. Extensive simulation studies support our theoretical findings, validating the efficiency of our method based on the debiased sum of squared adjacency matrices. Additionally, the proposed fuzzy modularity measures offer a novel perspective for evaluating the quality of mixed membership community detection in multi-layer networks.

For future research, first, developing methods with theoretical guarantees for estimating the number of communities in MLMMSB remains a challenging and meaningful task. Second, accelerating our methods for detecting mixed memberships in large-scale multi-layer networks is crucial for practical applications. Third, exploring more efficient algorithms for estimating mixed memberships would further enrich our understanding of community structures in multi-layer networks. Finally, extending our framework to directed multi-layer networks would broaden the scope of our work and enable the analysis of even more complex systems.

CRediT authorship contribution statement

Huan Qing: Conceptualization; Data curation; Formal analysis; Funding acquisition; Methodology; Project administration; Resources; Software; Validation; Visualization; Writing - original draft; Writing - review & editing.

Declaration of competing interest

The author declares no competing interests.

Data availability

Data and code will be made available on request.

Acknowledgements

H.Q. was sponsored by the Scientific Research Foundation of Chongqing University of Technology (Grant No: 0102240003) and the Natural Science Foundation of Chongqing, China (Grant No: CSTB2023NSCQ-LZX0048).

Appendix A Proofs

A.1 Proof of Lemma 1

Proof.

Since $\Omega_{\mathrm{sum}}=\rho\Pi(\sum_{l\in[L]}B_{l})\Pi^{\prime}=U\Sigma U^{\prime}$ and $U^{\prime}U=I_{K\times K}$, we have

\[
\rho\Pi(\sum_{l\in[L]}B_{l})\Pi^{\prime}=U\Sigma U^{\prime}\Rightarrow\Pi(\rho\sum_{l\in[L]}B_{l})\Pi^{\prime}U=U\Sigma U^{\prime}U=U\Sigma\Rightarrow U=\Pi(\rho\sum_{l\in[L]}B_{l})\Pi^{\prime}U\Sigma^{-1}.
\]

Since $\Pi(\mathcal{I},:)=I_{K\times K}$, we have $U(\mathcal{I},:)=(\Pi(\rho\sum_{l\in[L]}B_{l})\Pi^{\prime}U\Sigma^{-1})(\mathcal{I},:)=\Pi(\mathcal{I},:)(\rho\sum_{l\in[L]}B_{l})\Pi^{\prime}U\Sigma^{-1}=(\rho\sum_{l\in[L]}B_{l})\Pi^{\prime}U\Sigma^{-1}$, i.e., $U(\mathcal{I},:)=(\rho\sum_{l\in[L]}B_{l})\Pi^{\prime}U\Sigma^{-1}$. Therefore, $U=\Pi U(\mathcal{I},:)$ holds. Similarly, we have $V=\Pi V(\mathcal{I},:)$. ∎
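Lemma 1 is what makes the mixed memberships recoverable from the eigenvectors: since $U=\Pi U(\mathcal{I},:)$ with $U(\mathcal{I},:)$ invertible, we have $\Pi=UU(\mathcal{I},:)^{-1}$. The sketch below assumes the index set of pure nodes has already been located (for instance, by a successive-projection-type step); the clipping and row renormalization are common post-processing assumptions rather than a statement of the paper's exact algorithm.

```python
import numpy as np

def recover_membership(U, pure_idx):
    """Given the n x K eigenvector matrix U and one pure-node index per
    community, recover Pi via Pi = U U(I,:)^{-1} (Lemma 1); negative
    entries from noise are clipped and rows are renormalized to the simplex."""
    Pi = U @ np.linalg.inv(U[pure_idx, :])
    Pi = np.maximum(Pi, 0.0)
    return Pi / Pi.sum(axis=1, keepdims=True)
```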

A.2 Proof of Theorem 1

Proof.

The following lemma bounds $\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}$.

Lemma 2.

Under $\mathrm{MLMMSB}(\Pi,\rho,\mathcal{B})$, when Assumption 1 holds, with probability at least $1-o(\frac{1}{n+L})$, we have

\[
\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}=O(\sqrt{\rho nL\mathrm{log}(n+L)}).
\]
Proof.

Recall that $\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}=\mathrm{max}_{i\in[n]}\sum_{j\in[n]}|A_{\mathrm{sum}}(i,j)-\Omega_{\mathrm{sum}}(i,j)|=\mathrm{max}_{i\in[n]}\sum_{j\in[n]}|\sum_{l\in[L]}(A_{l}(i,j)-\Omega_{l}(i,j))|$. Next, we bound this quantity using the Bernstein inequality stated below.

Theorem 4.

(Theorem 1.4 of [44]) Let $\{X_{i}\}$ be independent, random, self-adjoint matrices with dimension $d$ such that $\mathbb{E}[X_{i}]=0$ and $\|X_{i}\|\leq R$ almost surely. Then, for all $t\geq 0$,

\[
\mathbb{P}(\|\sum_{i}X_{i}\|\geq t)\leq d\cdot\mathrm{exp}(\frac{-t^{2}/2}{\sigma^{2}+Rt/3}),
\]

where $\sigma^{2}:=\|\sum_{i}\mathbb{E}[X^{2}_{i}]\|$ and $\|\cdot\|$ denotes the spectral norm.

Let $x$ be any $n\times 1$ vector. Set $y_{(ij)}=\sum_{l\in[L]}(A_{l}(i,j)-\Omega_{l}(i,j))$ for $i\in[n],j\in[n]$, and $T_{(i)}=\sum_{j\in[n]}y_{(ij)}x(j)$ for $i\in[n]$. We have:

  • 1.

$\mathbb{E}(y_{(ij)}x(j))=0$ for $i\in[n],j\in[n]$.

  • 2.

$|y_{(ij)}x(j)|\leq\tau\mathrm{max}_{j}|x(j)|=\tau\|x\|_{\infty}$ for $i\in[n],j\in[n]$.

  • 3.

Let $\mathrm{Var}(X)$ denote the variance of any random variable $X$. Combining $A_{l}\in\{0,1\}^{n\times n}$ for $l\in[L]$ with the fact that the entries $\{A_{l}(i,j)\}_{l\in[L]}$ are mutually independent across layers, we have

\begin{align*}
\sum_{j\in[n]}\mathbb{E}[y^{2}_{(ij)}x^{2}(j)]&=\sum_{j\in[n]}x^{2}(j)\mathbb{E}[y^{2}_{(ij)}]=\sum_{j\in[n]}x^{2}(j)\mathrm{Var}(y_{(ij)})=\sum_{j\in[n]}x^{2}(j)\sum_{l\in[L]}\mathrm{Var}(A_{l}(i,j)-\Omega_{l}(i,j))\\
&=\sum_{j\in[n]}x^{2}(j)\sum_{l\in[L]}\mathrm{Var}(A_{l}(i,j))=\sum_{j\in[n]}x^{2}(j)\sum_{l\in[L]}\Omega_{l}(i,j)(1-\Omega_{l}(i,j))\leq\sum_{j\in[n]}x^{2}(j)\sum_{l\in[L]}\Omega_{l}(i,j)\\
&\leq\sum_{j\in[n]}x^{2}(j)\sum_{l\in[L]}\rho=\rho L\|x\|^{2}_{F}.
\end{align*}

By Theorem 4, for any $t\geq 0$, we have

\[
\mathbb{P}(|T_{(i)}|\geq t)\leq\mathrm{exp}(\frac{-t^{2}/2}{\rho L\|x\|^{2}_{F}+\frac{\tau\|x\|_{\infty}t}{3}}).
\]

Set $t=\frac{\alpha+1+\sqrt{(\alpha+1)(\alpha+19)}}{3}\sqrt{\rho L\|x\|^{2}_{F}\mathrm{log}(n+L)}$ for any $\alpha\geq 0$. If $\rho L\|x\|^{2}_{F}\geq\tau^{2}\|x\|^{2}_{\infty}\mathrm{log}(n+L)$, we have

\[
\mathbb{P}(|T_{(i)}|\geq t)\leq\mathrm{exp}(-(\alpha+1)\mathrm{log}(n+L)\frac{1}{\frac{18}{(\sqrt{\alpha+1}+\sqrt{\alpha+19})^{2}}+\frac{2\sqrt{\alpha+1}}{\sqrt{\alpha+1}+\sqrt{\alpha+19}}\sqrt{\frac{\tau^{2}\|x\|^{2}_{\infty}\mathrm{log}(n+L)}{\rho L\|x\|^{2}_{F}}}})\leq\frac{1}{(n+L)^{\alpha+1}}.
\]

Setting $x\in\{-1,1\}^{n\times 1}$, we obtain the following: when $\rho nL\geq\tau^{2}\mathrm{log}(n+L)$, with probability at least $1-o(\frac{1}{(n+L)^{\alpha+1}})$ for any $\alpha\geq 0$,

\[
T_{(i)}\leq\frac{\alpha+1+\sqrt{(\alpha+1)(\alpha+19)}}{3}\sqrt{\rho nL\mathrm{log}(n+L)}.
\]

Setting $\alpha=1$: when $\rho nL\geq\tau^{2}\mathrm{log}(n+L)$, with probability at least $1-o(\frac{1}{n+L})$, we have

\[
\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}=\mathrm{max}_{i\in[n]}T_{(i)}=O(\sqrt{\rho nL\mathrm{log}(n+L)}).
\]
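The concentration step above can be checked numerically for a fixed sign vector, say $x=(1,\ldots,1)^{\prime}$, in which case $T_{(i)}$ is the total degree deviation of node $i$ across layers. A simulation sketch under assumed parameter choices (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, L, K, rho, alpha = 400, 10, 3, 0.1, 1

Pi = rng.dirichlet([0.2] * K, size=n)            # assumed membership matrix
B = 0.3 * np.ones((K, K)) + 0.7 * np.eye(K)      # assumed connectivity matrix
Omega = rho * Pi @ B @ Pi.T                      # same Omega_l for every layer

T = np.zeros(n)                                  # T_(i) with x = all-ones vector
for _ in range(L):
    A = np.triu(rng.random((n, n)) < Omega, 1).astype(float)
    A = A + A.T                                  # symmetric Bernoulli layer
    T += (A - Omega).sum(axis=1)

bound = (alpha + 1 + np.sqrt((alpha + 1) * (alpha + 19))) / 3 \
        * np.sqrt(rho * n * L * np.log(n + L))
print(np.abs(T).max(), bound)                    # max_i |T_(i)| stays below the bound
```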

By Theorem 4.2 of [6], when $|\lambda_{K}(\Omega_{\mathrm{sum}})|\geq 4\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}$, there is an orthogonal matrix $\mathcal{O}$ such that

\[
\|\hat{U}-U\mathcal{O}\|_{2\rightarrow\infty}\leq 14\frac{\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}\|U\|_{2\rightarrow\infty}}{|\lambda_{K}(\Omega_{\mathrm{sum}})|}.
\]

Since $\varpi:=\|\hat{U}\hat{U}^{\prime}-UU^{\prime}\|_{2\rightarrow\infty}\leq 2\|\hat{U}-U\mathcal{O}\|_{2\rightarrow\infty}$, we get

\[
\varpi\leq 28\frac{\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}\|U\|_{2\rightarrow\infty}}{|\lambda_{K}(\Omega_{\mathrm{sum}})|}.
\]

Since $\|U\|_{2\rightarrow\infty}=O(\sqrt{\frac{K}{n}})=O(\sqrt{\frac{1}{n}})$ by Lemma 3.1 of [29] and Condition 1, we get

\[
\varpi=O(\frac{\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}}{|\lambda_{K}(\Omega_{\mathrm{sum}})|\sqrt{n}}).
\]

For $|\lambda_{K}(\Omega_{\mathrm{sum}})|$, by Condition 1 and Assumption 4, we have

\[
|\lambda_{K}(\Omega_{\mathrm{sum}})|=|\lambda_{K}(\rho\Pi(\sum_{l\in[L]}B_{l})\Pi^{\prime})|=\rho|\lambda_{K}(\Pi(\sum_{l\in[L]}B_{l})\Pi^{\prime})|=\rho|\lambda_{K}(\Pi^{\prime}\Pi\sum_{l\in[L]}B_{l})|\geq\rho\lambda_{K}(\Pi^{\prime}\Pi)|\lambda_{K}(\sum_{l\in[L]}B_{l})|=O(\rho\frac{n}{K}L)=O(\rho nL),
\]

which gives that

\[
\varpi=O(\frac{\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}}{\rho n^{1.5}L}).
\]

By the proof of Theorem 3.2 of [29], there is a $K\times K$ permutation matrix $\mathcal{P}$ such that

\[
\mathrm{max}_{i\in[n]}\|e^{\prime}_{i}(\hat{\Pi}-\Pi\mathcal{P})\|_{1}=O(\varpi\kappa(\Pi^{\prime}\Pi)\sqrt{\lambda_{1}(\Pi^{\prime}\Pi)})=O(\varpi\sqrt{\frac{n}{K}})=O(\varpi\sqrt{n})=O(\frac{\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}}{\rho nL}).
\]

By Lemma 2, this theorem holds. Finally, recall that applying Theorem 4.2 of [6] requires $|\lambda_{K}(\Omega_{\mathrm{sum}})|\geq 4\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}$. Since $|\lambda_{K}(\Omega_{\mathrm{sum}})|=O(\rho nL)$, this requirement holds as long as $\rho nL\gg\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}$, which in turn holds naturally because we need the row-wise error bound $O(\frac{\|A_{\mathrm{sum}}-\Omega_{\mathrm{sum}}\|_{\infty}}{\rho nL})$ to be much smaller than 1. ∎
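In simulations, the per-node error $\mathrm{max}_{i\in[n]}\|e^{\prime}_{i}(\hat{\Pi}-\Pi\mathcal{P})\|_{1}$ that Theorem 1 bounds can be evaluated by searching over the $K!$ column permutations; a minimal sketch (feasible because $K=O(1)$):

```python
import numpy as np
from itertools import permutations

def rowwise_l1_error(Pi_hat, Pi):
    """max_i ||e_i'(Pi_hat - Pi P)||_1, minimized over all K! column
    permutations P, matching the error metric of Theorems 1-3."""
    K = Pi.shape[1]
    best = np.inf
    for perm in permutations(range(K)):
        err = np.abs(Pi_hat - Pi[:, perm]).sum(axis=1).max()
        best = min(best, err)
    return best
```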

A.3 An alternative proof of Theorem 1

Here, we provide an alternative proof of Theorem 1 by using Theorem 4.2 of [8].

Proof.

Set $H_{\hat{U}}=\hat{U}^{\prime}U$ and let $H_{\hat{U}}=U_{H_{\hat{U}}}\Sigma_{H_{\hat{U}}}V^{\prime}_{H_{\hat{U}}}$ be its top-$K$ singular value decomposition. Define $\mathrm{sgn}(H_{\hat{U}})$ as $U_{H_{\hat{U}}}V^{\prime}_{H_{\hat{U}}}$. Under $\mathrm{MLMMSB}(\Pi,\rho,\mathcal{B})$, the following results are true.

  • 1.

$\mathbb{E}[\sum_{l\in[L]}(A_{l}(i,j)-\Omega_{l}(i,j))]=0$ for $i\in[n],j\in[n]$.

  • 2.

$\mathbb{E}[(\sum_{l\in[L]}(A_{l}(i,j)-\Omega_{l}(i,j)))^{2}]=\mathbb{E}[\sum_{l\in[L]}(A_{l}(i,j)-\Omega_{l}(i,j))^{2}+\sum_{l_{1}\neq l_{2},l_{1}\in[L],l_{2}\in[L]}(A_{l_{1}}(i,j)-\Omega_{l_{1}}(i,j))(A_{l_{2}}(i,j)-\Omega_{l_{2}}(i,j))]=\sum_{l\in[L]}\mathbb{E}[(A_{l}(i,j)-\Omega_{l}(i,j))^{2}]=\sum_{l\in[L]}\Omega_{l}(i,j)(1-\Omega_{l}(i,j))\leq\sum_{l\in[L]}\Omega_{l}(i,j)\leq\sum_{l\in[L]}\rho=\rho L$ for $i\in[n],j\in[n]$.

  • 3.

$|A_{\mathrm{sum}}(i,j)-\Omega_{\mathrm{sum}}(i,j)|=|\sum_{l\in[L]}(A_{l}(i,j)-\Omega_{l}(i,j))|\leq\tau$ for $i\in[n],j\in[n]$.

  • 4.

Set $\mu=\frac{n\|U\|^{2}_{2\rightarrow\infty}}{K}$. By Lemma 3.1 of [29], we have $\frac{1}{K\lambda_{1}(\Pi^{\prime}\Pi)}\leq\|U\|^{2}_{2\rightarrow\infty}\leq\frac{1}{\lambda_{K}(\Pi^{\prime}\Pi)}$, which gives $\mu=O(1)$ by Condition 1.

  • 5.

Set $c_{b}=\frac{\tau}{\sqrt{\rho Ln/(\mu\mathrm{log}(n))}}$. Since $\mu=O(1)$, we have $c_{b}=O(\sqrt{\frac{\tau^{2}\mathrm{log}(n)}{\rho nL}})\leq O(1)$ by Assumption 1.

The above results ensure that the conditions of Assumption 4.1 in [8] are satisfied. Then, by Theorem 4.2 of [8], when $|\lambda_{K}(\Omega_{\mathrm{sum}})|\gg\sqrt{\rho nL\mathrm{log}(n)}$, with probability at least $1-O(\frac{1}{n^{5}})$, we have

\[
\|\hat{U}\mathrm{sgn}(H_{\hat{U}})-U\|_{2\rightarrow\infty}=O(\frac{\kappa(\Omega_{\mathrm{sum}})\sqrt{\rho L\mu K}+\sqrt{K\rho L\mathrm{log}(n)}}{|\lambda_{K}(\Omega_{\mathrm{sum}})|}).
\]

Since $\mu=O(1)$ and $K=O(1)$, we have

\[
\|\hat{U}\mathrm{sgn}(H_{\hat{U}})-U\|_{2\rightarrow\infty}=O(\frac{\kappa(\Omega_{\mathrm{sum}})\sqrt{\rho L}+\sqrt{\rho L\mathrm{log}(n)}}{|\lambda_{K}(\Omega_{\mathrm{sum}})|}).
\]

By Assumption 2 and Condition 1, we have $|\lambda_{K}(\Omega_{\mathrm{sum}})|=O(\rho nL)$ and $|\lambda_{1}(\Omega_{\mathrm{sum}})|=\|\Omega_{\mathrm{sum}}\|=\rho\|\Pi(\sum_{l\in[L]}B_{l})\Pi^{\prime}\|\leq\rho\|\Pi^{\prime}\Pi\|\|\sum_{l\in[L]}B_{l}\|=O(\frac{\rho nL}{K})=O(\rho nL)$, which gives $\kappa(\Omega_{\mathrm{sum}})=O(1)$ and

\[
\|\hat{U}\mathrm{sgn}(H_{\hat{U}})-U\|_{2\rightarrow\infty}=O(\frac{\sqrt{\rho L}+\sqrt{\rho L\mathrm{log}(n)}}{\rho nL})=O(\frac{1}{n}\sqrt{\frac{\mathrm{log}(n)}{\rho L}}).
\]

Since $\varpi=\|\hat{U}\hat{U}^{\prime}-UU^{\prime}\|_{2\rightarrow\infty}\leq 2\|U-\hat{U}\mathrm{sgn}(H_{\hat{U}})\|_{2\rightarrow\infty}$, we have

\[
\varpi=O(\frac{1}{n}\sqrt{\frac{\mathrm{log}(n)}{\rho L}}).
\]

By the proof of Theorem 3.2 in [29], there is a permutation matrix $\mathcal{P}$ such that

\[
\mathrm{max}_{i\in[n]}\|e^{\prime}_{i}(\hat{\Pi}-\Pi\mathcal{P})\|_{1}=O(\varpi\kappa(\Pi^{\prime}\Pi)\sqrt{\lambda_{1}(\Pi^{\prime}\Pi)})=O(\varpi\sqrt{\frac{n}{K}})=O(\varpi\sqrt{n})=O(\sqrt{\frac{\mathrm{log}(n)}{\rho nL}}).
\]

Recall that $|\lambda_{K}(\Omega_{\mathrm{sum}})|\geq O(\rho nL)$ and that we need $|\lambda_{K}(\Omega_{\mathrm{sum}})|\gg\sqrt{\rho nL\mathrm{log}(n)}$ to hold. It is easy to see that as long as $O(\rho nL)\gg\sqrt{\rho nL\mathrm{log}(n)}\Leftrightarrow\rho nL\gg\mathrm{log}(n)$, the condition $|\lambda_{K}(\Omega_{\mathrm{sum}})|\gg\sqrt{\rho nL\mathrm{log}(n)}$ always holds. The condition $\rho nL\gg\mathrm{log}(n)$ holds naturally since we require the row-wise error bound $O(\sqrt{\frac{\mathrm{log}(n)}{\rho nL}})$ to be much smaller than 1. ∎

A.4 Proof of Theorem 2

Proof.

Since $\mathbb{E}[S_{\mathrm{sum}}]\neq\tilde{S}_{\mathrm{sum}}$, i.e., $\mathbb{E}[S_{\mathrm{sum}}(i,j)-\tilde{S}_{\mathrm{sum}}(i,j)]\neq 0$ for some $i\in[n],j\in[n]$, we cannot use Theorem 4.2 in [8] to obtain the row-wise eigenspace error $\|\hat{V}\hat{V}^{\prime}-VV^{\prime}\|_{2\rightarrow\infty}$ for $S_{\mathrm{sum}}$ and $\tilde{S}_{\mathrm{sum}}$. To obtain SPDSoS's error rate, we use Theorem 4.2 in [6] instead. By the Bernstein inequality, we have the following lemma, which bounds $\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}$.
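For concreteness, the debiased aggregate $S_{\mathrm{sum}}=\sum_{l\in[L]}(A^{2}_{l}-D_{l})$, with $D_{l}$ the diagonal degree matrix of layer $l$, can be formed as in the sketch below; `adjacency_list`, holding the $L$ symmetric adjacency matrices, is a hypothetical input name.

```python
import numpy as np

def debiased_sum_of_squares(adjacency_list):
    """S_sum = sum_l (A_l^2 - D_l). Since the diagonal of A_l^2 equals the
    degree of each node in layer l, subtracting D_l zeroes out the random
    diagonal that biases the plain sum of squared adjacency matrices."""
    n = adjacency_list[0].shape[0]
    S = np.zeros((n, n))
    for A in adjacency_list:
        S += A @ A - np.diag(A.sum(axis=1))
    return S
```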

Lemma 3.

Under $\mathrm{MLMMSB}(\Pi,\rho,\mathcal{B})$, when Assumption 3 holds, with probability at least $1-o(\frac{1}{n+L})$, we have

\[
\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}=O(\sqrt{\rho^{2}n^{2}L\mathrm{log}(n+L)})+O(\rho^{2}nL).
\]
Proof.

Since $A_{l}\in\{0,1\}^{n\times n}$, we have $A^{2}_{l}(i,j)=A_{l}(i,j)$ entrywise (i.e., $A_{l}(i,j)\cdot A_{l}(i,j)=A_{l}(i,j)$) for $i\in[n],j\in[n],l\in[L]$, which gives that

\begin{align*}
\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}&=\mathrm{max}_{i\in[n]}\sum_{j\in[n]}|S_{\mathrm{sum}}(i,j)-\tilde{S}_{\mathrm{sum}}(i,j)|=\mathrm{max}_{i\in[n]}\sum_{j\in[n]}|\sum_{l\in[L]}(A^{2}_{l}-D_{l}-\Omega^{2}_{l})(i,j)|\\
&=\mathrm{max}_{i\in[n]}\sum_{j\in[n]}|\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))-\sum_{l\in[L]}D_{l}(i,j)|\\
&=\mathrm{max}_{i\in[n]}(\sum_{j\neq i,j\in[n]}|\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))|+|\sum_{l\in[L]}\sum_{m\in[n]}(A^{2}_{l}(i,m)-\Omega^{2}_{l}(i,m))-\sum_{l\in[L]}D_{l}(i,i)|)\\
&=\mathrm{max}_{i\in[n]}(\sum_{j\neq i,j\in[n]}|\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))|+|\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)-\Omega^{2}_{l}(i,m))-\sum_{l\in[L]}D_{l}(i,i)|)\\
&=\mathrm{max}_{i\in[n]}(\sum_{j\neq i,j\in[n]}|\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))|+\sum_{l\in[L]}\sum_{m\in[n]}\Omega^{2}_{l}(i,m))\\
&\leq\mathrm{max}_{i\in[n]}(\sum_{j\neq i,j\in[n]}|\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))|+\sum_{l\in[L]}\sum_{m\in[n]}\rho^{2})\\
&=\mathrm{max}_{i\in[n]}\sum_{j\neq i,j\in[n]}|\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))|+\rho^{2}nL.
\end{align*}

Next, we bound $\sum_{j\neq i,j\in[n]}|\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))|$ for $i\in[n]$. Let $\tilde{x}$ be any $(n-1)\times 1$ vector. Set $\tilde{y}_{(ij)}=\sum_{l\in[L]}\sum_{m\in[n]}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))$ for $i\in[n],j\neq i,j\in[n]$ and $\tilde{T}_{(i)}=\sum_{j\neq i,j\in[n]}\tilde{y}_{(ij)}\tilde{x}(j)$ for $i\in[n]$. The following results hold:

  • 1.

$\mathbb{E}(\tilde{y}_{(ij)}\tilde{x}(j))=0$ since $A_{l}(i,m)$ and $A_{l}(m,j)$ are independent when $j\neq i$ for $i\in[n],j\in[n]$.

  • 2.

$|\tilde{y}_{(ij)}\tilde{x}(j)|\leq\tilde{\tau}\mathrm{max}_{j}|\tilde{x}(j)|=\tilde{\tau}\|\tilde{x}\|_{\infty}$ for $i\in[n],j\in[n]$.

  • 3.

Combining $A_{l}\in\{0,1\}^{n\times n}$ for $l\in[L]$ with the fact that $A_{l}(i,m)$ and $A_{l}(m,j)$ are independent when $j\neq i$, we have

\begin{align*}
\sum_{j\neq i,j\in[n]}\mathbb{E}[\tilde{y}^{2}_{(ij)}\tilde{x}^{2}(j)]&=\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\mathbb{E}[\tilde{y}^{2}_{(ij)}]=\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\mathrm{Var}(\tilde{y}_{(ij)})\\
&=\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\sum_{l\in[L]}\sum_{m\in[n]}\mathrm{Var}(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))=\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\sum_{l\in[L]}\sum_{m\in[n]}\mathrm{Var}(A_{l}(i,m)A_{l}(m,j))\\
&=\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\sum_{l\in[L]}\sum_{m\in[n]}\mathbb{E}[(A_{l}(i,m)A_{l}(m,j)-\Omega_{l}(i,m)\Omega_{l}(m,j))^{2}]\\
&=\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\sum_{l\in[L]}\sum_{m\in[n]}\mathbb{E}[A^{2}_{l}(i,m)A^{2}_{l}(m,j)+\Omega^{2}_{l}(i,m)\Omega^{2}_{l}(m,j)-2A_{l}(i,m)A_{l}(m,j)\Omega_{l}(i,m)\Omega_{l}(m,j)]\\
&=\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\sum_{l\in[L]}\sum_{m\in[n]}(\mathbb{E}[A^{2}_{l}(i,m)A^{2}_{l}(m,j)]-\Omega^{2}_{l}(i,m)\Omega^{2}_{l}(m,j))\\
&=\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\sum_{l\in[L]}\sum_{m\in[n]}(\mathbb{E}[A_{l}(i,m)]\mathbb{E}[A_{l}(m,j)]-\Omega^{2}_{l}(i,m)\Omega^{2}_{l}(m,j))\\
&=\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\sum_{l\in[L]}\sum_{m\in[n]}\Omega_{l}(i,m)\Omega_{l}(m,j)(1-\Omega_{l}(i,m)\Omega_{l}(m,j))\\
&\leq\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\sum_{l\in[L]}\sum_{m\in[n]}\Omega_{l}(i,m)\Omega_{l}(m,j)\leq\sum_{j\neq i,j\in[n]}\tilde{x}^{2}(j)\sum_{l\in[L]}\sum_{m\in[n]}\rho^{2}=\rho^{2}nL\|\tilde{x}\|^{2}_{F}.
\end{align*}

By Theorem 4, for any $\tilde{t}\geq 0$, we have

\[
\mathbb{P}(|\tilde{T}_{(i)}|\geq\tilde{t})\leq\mathrm{exp}(\frac{-\tilde{t}^{2}/2}{\rho^{2}nL\|\tilde{x}\|^{2}_{F}+\frac{\tilde{\tau}\|\tilde{x}\|_{\infty}\tilde{t}}{3}}).
\]

Set $\tilde{t}=\frac{\alpha+1+\sqrt{(\alpha+1)(\alpha+19)}}{3}\sqrt{\rho^{2}nL\|\tilde{x}\|^{2}_{F}\mathrm{log}(n+L)}$ for any $\alpha\geq 0$. If $\rho^{2}nL\|\tilde{x}\|^{2}_{F}\geq\tilde{\tau}^{2}\|\tilde{x}\|^{2}_{\infty}\mathrm{log}(n+L)$, we have

\[
\mathbb{P}(|\tilde{T}_{(i)}|\geq\tilde{t})\leq\mathrm{exp}(-(\alpha+1)\mathrm{log}(n+L)\frac{1}{\frac{18}{(\sqrt{\alpha+1}+\sqrt{\alpha+19})^{2}}+\frac{2\sqrt{\alpha+1}}{\sqrt{\alpha+1}+\sqrt{\alpha+19}}\sqrt{\frac{\tilde{\tau}^{2}\|\tilde{x}\|^{2}_{\infty}\mathrm{log}(n+L)}{\rho^{2}nL\|\tilde{x}\|^{2}_{F}}}})\leq\frac{1}{(n+L)^{\alpha+1}}.
\]

Recall that $\tilde{x}$ is any $(n-1)\times 1$ vector; setting $\tilde{x}\in\{-1,1\}^{(n-1)\times 1}$ gives the following result: when $\rho^{2}n^{2}L\geq\rho^{2}n(n-1)L\geq\tilde{\tau}^{2}\mathrm{log}(n+L)$, with probability at least $1-o(\frac{1}{(n+L)^{\alpha+1}})$ for any $\alpha\geq 0$, we have

\[
\tilde{T}_{(i)}\leq\frac{\alpha+1+\sqrt{(\alpha+1)(\alpha+19)}}{3}\sqrt{\rho^{2}n(n-1)L\mathrm{log}(n+L)}.
\]

Setting $\alpha=1$: when $\rho^{2}n^{2}L\geq\tilde{\tau}^{2}\mathrm{log}(n+L)$, with probability at least $1-o(\frac{1}{n+L})$, we have

\[
\mathrm{max}_{i\in[n]}\tilde{T}_{(i)}=O(\sqrt{\rho^{2}n^{2}L\mathrm{log}(n+L)}).
\]

Hence, we have

\[
\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}=O(\sqrt{\rho^{2}n^{2}L\mathrm{log}(n+L)})+O(\rho^{2}nL).
\]

By Theorem 4.2 of [6], if $|\lambda_{K}(\tilde{S}_{\mathrm{sum}})|\geq 4\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}$, there is an orthogonal matrix $\tilde{\mathcal{O}}$ such that

\[
\|\hat{V}-V\tilde{\mathcal{O}}\|_{2\rightarrow\infty}\leq 14\frac{\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}\|V\|_{2\rightarrow\infty}}{|\lambda_{K}(\tilde{S}_{\mathrm{sum}})|}.
\]

Since $\tilde{\varpi}:=\|\hat{V}\hat{V}^{\prime}-VV^{\prime}\|_{2\rightarrow\infty}\leq 2\|\hat{V}-V\tilde{\mathcal{O}}\|_{2\rightarrow\infty}$ by basic algebra, we have

\[
\tilde{\varpi}\leq 28\frac{\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}\|V\|_{2\rightarrow\infty}}{|\lambda_{K}(\tilde{S}_{\mathrm{sum}})|}.
\]

Since $\|V\|_{2\rightarrow\infty}=O(\sqrt{\frac{1}{n}})$ by Lemma 3.1 of [29] and Condition 1, we have

\[
\tilde{\varpi}=O(\frac{\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}}{|\lambda_{K}(\tilde{S}_{\mathrm{sum}})|\sqrt{n}}).
\]

For $|\lambda_{K}(\tilde{S}_{\mathrm{sum}})|$, by Condition 1 and Assumption 4, we have

\begin{align*}
|\lambda_{K}(\tilde{S}_{\mathrm{sum}})|&=\sqrt{\lambda_{K}((\sum_{l\in[L]}\Omega^{2}_{l})^{2})}=\sqrt{\lambda_{K}((\sum_{l\in[L]}\rho^{2}\Pi B_{l}\Pi^{\prime}\Pi B_{l}\Pi^{\prime})^{2})}=\rho^{2}\sqrt{\lambda^{2}_{K}(\sum_{l\in[L]}\Pi B_{l}\Pi^{\prime}\Pi B_{l}\Pi^{\prime})}\\
&=\rho^{2}\sqrt{\lambda^{2}_{K}(\Pi(\sum_{l\in[L]}B_{l}\Pi^{\prime}\Pi B_{l})\Pi^{\prime})}=\rho^{2}\sqrt{\lambda^{2}_{K}(\Pi^{\prime}\Pi(\sum_{l\in[L]}B_{l}\Pi^{\prime}\Pi B_{l}))}\geq\rho^{2}\lambda_{K}(\Pi^{\prime}\Pi)\sqrt{\lambda^{2}_{K}(\sum_{l\in[L]}B_{l}\Pi^{\prime}\Pi B_{l})}\\
&=O(\rho^{2}\lambda^{2}_{K}(\Pi^{\prime}\Pi)|\lambda_{K}(\sum_{l\in[L]}B^{2}_{l})|)=O(\rho^{2}n^{2}L),
\end{align*}

which gives that

\[
\tilde{\varpi}=O(\frac{\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}}{\rho^{2}n^{2.5}L}).
\]

The proof of Theorem 3.2 in [29] gives that there is a $K\times K$ permutation matrix $\tilde{\mathcal{P}}$ such that

\[
\mathrm{max}_{i\in[n]}\|e^{\prime}_{i}(\hat{\Pi}-\Pi\tilde{\mathcal{P}})\|_{1}=O(\tilde{\varpi}\kappa(\Pi^{\prime}\Pi)\sqrt{\lambda_{1}(\Pi^{\prime}\Pi)})=O(\tilde{\varpi}\sqrt{\frac{n}{K}})=O(\tilde{\varpi}\sqrt{n})=O(\frac{\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}}{\rho^{2}n^{2}L}).
\]

By Lemma 3, this theorem holds. Finally, since $|\lambda_{K}(\tilde{S}_{\mathrm{sum}})|=O(\rho^{2}n^{2}L)$, the requirement $|\lambda_{K}(\tilde{S}_{\mathrm{sum}})|\geq 4\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}$ is satisfied as long as $\rho^{2}n^{2}L\gg\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}$, which holds naturally since we need the row-wise error bound $O(\frac{\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}}{\rho^{2}n^{2}L})$ to be much smaller than 1. ∎
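Computationally, the $\hat{V}$ analyzed in this proof is simply the matrix of the $K$ leading (in magnitude) eigenvectors of $S_{\mathrm{sum}}$. A dense numpy sketch of this spectral step (a sparse eigensolver would replace `eigh` at scale):

```python
import numpy as np

def leading_eigvectors(S, K):
    """Return the K eigenvectors of the symmetric matrix S whose
    eigenvalues are largest in magnitude, playing the role of hat(V)."""
    vals, vecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    order = np.argsort(np.abs(vals))[::-1]    # sort by |eigenvalue|
    return vecs[:, order[:K]], vals[order[:K]]
```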

A.5 Proof of Theorem 3

Proof.

Lemma 4 bounds $\|\sum_{l\in[L]}A^{2}_{l}-\sum_{l\in[L]}\Omega^{2}_{l}\|_{\infty}$.

Lemma 4.

Under $\mathrm{MLMMSB}(\Pi,\rho,\mathcal{B})$, if $\rho nL\geq\mathrm{log}(n+L)$, with probability at least $1-o(\frac{1}{n+L})$, we have

\[
\|\sum_{l\in[L]}A^{2}_{l}-\sum_{l\in[L]}\Omega^{2}_{l}\|_{\infty}=\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}+O(\rho nL).
\]
Proof.

Since $\sum_{l\in[L]}A^{2}_{l}-\sum_{l\in[L]}\Omega^{2}_{l}=S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}+\sum_{l\in[L]}D_{l}$ and each $D_{l}$ is a diagonal matrix for $l\in[L]$, we have

\begin{align*}
\|\sum_{l\in[L]}A^{2}_{l}-\sum_{l\in[L]}\Omega^{2}_{l}\|_{\infty}&=\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}+\sum_{l\in[L]}D_{l}\|_{\infty}=\mathrm{max}_{i\in[n]}\sum_{j\in[n]}|S_{\mathrm{sum}}(i,j)-\tilde{S}_{\mathrm{sum}}(i,j)+\sum_{l\in[L]}D_{l}(i,j)|\\
&\leq\mathrm{max}_{i\in[n]}\sum_{j\in[n]}|S_{\mathrm{sum}}(i,j)-\tilde{S}_{\mathrm{sum}}(i,j)|+\mathrm{max}_{i\in[n]}\sum_{j\in[n]}|\sum_{l\in[L]}D_{l}(i,j)|\\
&=\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}+\mathrm{max}_{i\in[n]}\sum_{l\in[L]}\sum_{j\in[n]}D_{l}(i,j)=\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}+\mathrm{max}_{i\in[n]}\sum_{l\in[L]}D_{l}(i,i).
\end{align*}

Since Lemma 3 provides an upper bound on $\|S_{\mathrm{sum}}-\tilde{S}_{\mathrm{sum}}\|_{\infty}$, we only need to bound $\mathrm{max}_{i\in[n]}\sum_{l\in[L]}D_{l}(i,i)$. To bound it, let $W_{(i)}=\sum_{l\in[L]}D_{l}(i,i)-\sum_{l\in[L]}\sum_{j\in[n]}\Omega_{l}(i,j)=\sum_{l\in[L]}\sum_{j\in[n]}(A_{l}(i,j)-\Omega_{l}(i,j))$ for $i\in[n]$. The following results hold:

  • 1.

$\mathbb{E}[A_{l}(i,j)-\Omega_{l}(i,j)]=0$ for $l\in[L],i\in[n],j\in[n]$.

  • 2.

$|A_{l}(i,j)-\Omega_{l}(i,j)|\leq 1$ for $l\in[L],i\in[n],j\in[n]$.

  • 3.

Since $\mathbb{E}[A_{l}(i,j)]=\Omega_{l}(i,j)$ and $\mathrm{Var}(A_{l}(i,j))=\Omega_{l}(i,j)(1-\Omega_{l}(i,j))$ for the Bernoulli distribution, we have

\[
\sum_{l\in[L]}\sum_{j\in[n]}\mathbb{E}[(A_{l}(i,j)-\Omega_{l}(i,j))^{2}]=\sum_{l\in[L]}\sum_{j\in[n]}\mathrm{Var}(A_{l}(i,j))=\sum_{l\in[L]}\sum_{j\in[n]}\Omega_{l}(i,j)(1-\Omega_{l}(i,j))\leq\sum_{l\in[L]}\sum_{j\in[n]}\Omega_{l}(i,j)\leq\sum_{l\in[L]}\sum_{j\in[n]}\rho=\rho nL.
\]

By Theorem 4, for any $\tilde{\tilde{t}}\geq 0$, we have

\[
\mathbb{P}(|W_{(i)}|\geq\tilde{\tilde{t}})\leq\mathrm{exp}(\frac{-\tilde{\tilde{t}}^{2}/2}{\rho nL+\frac{\tilde{\tilde{t}}}{3}}).
\]

Set $\tilde{\tilde{t}}=\frac{\alpha+1+\sqrt{(\alpha+1)(\alpha+19)}}{3}\sqrt{\rho nL\mathrm{log}(n+L)}$ for any $\alpha\geq 0$. If $\rho nL\geq\mathrm{log}(n+L)$, we have

\[
\mathbb{P}(|W_{(i)}|\geq\tilde{\tilde{t}})\leq\mathrm{exp}(-(\alpha+1)\mathrm{log}(n+L)\frac{1}{\frac{18}{(\sqrt{\alpha+1}+\sqrt{\alpha+19})^{2}}+\frac{2\sqrt{\alpha+1}}{\sqrt{\alpha+1}+\sqrt{\alpha+19}}\sqrt{\frac{\mathrm{log}(n+L)}{\rho nL}}})\leq\frac{1}{(n+L)^{\alpha+1}}.
\]

Hence, when $\rho nL\geq\mathrm{log}(n+L)$, with probability at least $1-o(\frac{1}{(n+L)^{\alpha+1}})$, we have

\[
|W_{(i)}|=|\sum_{l\in[L]}D_{l}(i,i)-\sum_{l\in[L]}\sum_{j\in[n]}\Omega_{l}(i,j)|\leq\tilde{\tilde{t}}.
\]

Setting $\alpha=1$: when $\rho nL\geq\mathrm{log}(n+L)$, with probability at least $1-o(\frac{1}{n+L})$, we have

\[
\mathrm{max}_{i\in[n]}|W_{(i)}|\leq\frac{2+2\sqrt{10}}{3}\sqrt{\rho nL\mathrm{log}(n+L)}.
\]

The bound $|\sum_{l\in[L]}D_{l}(i,i)-\sum_{l\in[L]}\sum_{j\in[n]}\Omega_{l}(i,j)|\leq\frac{2+2\sqrt{10}}{3}\sqrt{\rho nL\mathrm{log}(n+L)}$ for $i\in[n]$ gives $\sum_{l\in[L]}D_{l}(i,i)\leq\sum_{l\in[L]}\sum_{j\in[n]}\Omega_{l}(i,j)+\frac{2+2\sqrt{10}}{3}\sqrt{\rho nL\mathrm{log}(n+L)}\leq\rho nL+\frac{2+2\sqrt{10}}{3}\sqrt{\rho nL\mathrm{log}(n+L)}=O(\rho nL)$, since we require $\rho nL\geq\mathrm{log}(n+L)$ to hold. Therefore, we have $\mathrm{max}_{i\in[n]}\sum_{l\in[L]}D_{l}(i,i)=O(\rho nL)$, and this lemma holds. ∎

Following a proof similar to that of Theorem 2, for the SPSoS method we have

\[
\mathrm{max}_{i\in[n]}\|e^{\prime}_{i}(\hat{\Pi}-\Pi\mathcal{P})\|_{1}=O(\frac{\|\sum_{l\in[L]}A^{2}_{l}-\sum_{l\in[L]}\Omega^{2}_{l}\|_{\infty}}{\rho^{2}n^{2}L}).
\]

By Lemma 4 and the fact that $\rho\leq 1$, this theorem holds. ∎
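Lemma 4 makes the cost of skipping the debiasing explicit: relative to SPDSoS, the SPSoS aggregate carries the extra term $\mathrm{max}_{i\in[n]}\sum_{l\in[L]}D_{l}(i,i)=O(\rho nL)$. The simulation sketch below, under assumed parameters, compares the two $\infty$-norm deviations and typically shows the un-debiased one inflated by roughly this order:

```python
import numpy as np

rng = np.random.default_rng(1)
n, L, K, rho = 300, 10, 2, 0.1
Pi = rng.dirichlet([0.2] * K, size=n)
B = 0.3 * np.ones((K, K)) + 0.7 * np.eye(K)
Omega = rho * Pi @ B @ Pi.T
S_tilde = L * (Omega @ Omega)               # tilde(S)_sum = sum_l Omega_l^2

S_debiased = np.zeros((n, n))
S_plain = np.zeros((n, n))
for _ in range(L):
    A = np.triu(rng.random((n, n)) < Omega, 1).astype(float)
    A = A + A.T
    A2 = A @ A
    S_plain += A2                           # SPSoS aggregate, sum_l A_l^2
    S_debiased += A2 - np.diag(np.diag(A2)) # SPDSoS: D_l = diag(A_l^2)

inf_norm = lambda M: np.abs(M).sum(axis=1).max()
print(inf_norm(S_debiased - S_tilde))       # deviation without the diagonal bias
print(inf_norm(S_plain - S_tilde))          # larger by roughly O(rho * n * L)
```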

References

  • Airoldi et al. [2008] Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2008). Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9, 1981–2014.
  • Anandkumar et al. [2014] Anandkumar, A., Ge, R., Hsu, D., & Kakade, S. M. (2014). A tensor approach to learning mixed membership community models. Journal of Machine Learning Research, 15, 2239–2312.
  • Araújo et al. [2001] Araújo, M. C. U., Saldanha, T. C. B., Galvao, R. K. H., Yoneyama, T., Chame, H. C., & Visani, V. (2001). The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometrics and Intelligent Laboratory Systems, 57, 65–73.
  • Bakken et al. [2016] Bakken, T. E., Miller, J. A., Ding, S.-L., Sunkin, S. M., Smith, K. A., Ng, L., Szafer, A., Dalley, R. A., Royall, J. J., Lemon, T. et al. (2016). A comprehensive transcriptional map of primate brain development. Nature, 535, 367–375.
  • Boccaletti et al. [2014] Boccaletti, S., Bianconi, G., Criado, R., Del Genio, C. I., Gómez-Gardenes, J., Romance, M., Sendina-Nadal, I., Wang, Z., & Zanin, M. (2014). The structure and dynamics of multilayer networks. Physics Reports, 544, 1–122.
  • Cape et al. [2019] Cape, J., Tang, M., & Priebe, C. E. (2019). The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics. Annals of Statistics, 47, 2405–2439.
  • Chen et al. [2006] Chen, B. L., Hall, D. H., & Chklovskii, D. B. (2006). Wiring optimization can relate neuronal structure and function. Proceedings of the National Academy of Sciences, 103, 4723–4728.
  • Chen et al. [2021] Chen, Y., Chi, Y., Fan, J., & Ma, C. (2021). Spectral methods for data science: A statistical perspective. Foundations and Trends® in Machine Learning, 14, 566–806.
  • Chen & Mo [2022] Chen, Y., & Mo, D. (2022). Community detection for multilayer weighted networks. Information Sciences, 595, 119–141.
  • De Domenico et al. [2015] De Domenico, M., Nicosia, V., Arenas, A., & Latora, V. (2015). Structural reducibility of multilayer networks. Nature Communications, 6, 6864.
  • Fortunato [2010] Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486, 75–174.
  • Fortunato & Hric [2016] Fortunato, S., & Hric, D. (2016). Community detection in networks: A user guide. Physics Reports, 659, 1–44.
  • Gillis & Vavasis [2013] Gillis, N., & Vavasis, S. A. (2013). Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 698–714.
  • Gillis & Vavasis [2015] Gillis, N., & Vavasis, S. A. (2015). Semidefinite programming based preconditioning for more robust near-separable nonnegative matrix factorization. SIAM Journal on Optimization, 25, 677–698.
  • Gopalan & Blei [2013] Gopalan, P., & Blei, D. (2013). Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences of the United States of America, 110, 14534–14539.
  • Han et al. [2015] Han, Q., Xu, K., & Airoldi, E. (2015). Consistent estimation of dynamic and multi-layer block models. In International Conference on Machine Learning (pp. 1511–1520). PMLR.
  • Holland et al. [1983] Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5, 109–137.
  • Huang et al. [2021] Huang, X., Chen, D., Ren, T., & Wang, D. (2021). A survey of community detection methods in multilayer networks. Data Mining and Knowledge Discovery, 35, 1–45.
  • Javed et al. [2018] Javed, M. A., Younis, M. S., Latif, S., Qadir, J., & Baig, A. (2018). Community detection in networks: A multidisciplinary review. Journal of Network and Computer Applications, 108, 87–111.
  • Jin et al. [2024] Jin, J., Ke, Z. T., & Luo, S. (2024). Mixed membership estimation for social networks. Journal of Econometrics, 239, 105369.
  • Jing et al. [2021] Jing, B.-Y., Li, T., Lyu, Z., & Xia, D. (2021). Community detection on mixture multilayer networks via regularized tensor decomposition. Annals of Statistics, 49, 3181–3205.
  • Ke & Wang [2024] Ke, Z. T., & Wang, M. (2024). Using SVD for topic modeling. Journal of the American Statistical Association, 119, 434–449.
  • Kim & Lee [2015] Kim, J., & Lee, J.-G. (2015). Community detection in multi-layer graphs: A survey. ACM SIGMOD Record, 44, 37–48.
  • Kivelä et al. [2014] Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., & Porter, M. A. (2014). Multilayer networks. Journal of Complex Networks, 2, 203–271.
  • Lazega [2001] Lazega, E. (2001). The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. Oxford University Press, USA.
  • Lei et al. [2020] Lei, J., Chen, K., & Lynch, B. (2020). Consistent community detection in multi-layer network data. Biometrika, 107, 61–73.
  • Lei & Lin [2023] Lei, J., & Lin, K. Z. (2023). Bias-adjusted spectral clustering in multi-layer stochastic block models. Journal of the American Statistical Association, 118, 2433–2445.
  • Magnani et al. [2013] Magnani, M., Micenkova, B., & Rossi, L. (2013). Combinatorial analysis of multiple networks. arXiv preprint arXiv:1303.4986.
  • Mao et al. [2021] Mao, X., Sarkar, P., & Chakrabarti, D. (2021). Estimating mixed memberships with sharp eigenvector deviations. Journal of the American Statistical Association, 116, 1928–1940.
  • Mucha et al. [2010] Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J.-P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328, 876–878.
  • Narayanan et al. [2010] Narayanan, M., Vetta, A., Schadt, E. E., & Zhu, J. (2010). Simultaneous clustering of multiple gene expression and physical interaction datasets. PLoS Computational Biology, 6, e1000742.
  • Nepusz et al. [2008] Nepusz, T., Petróczi, A., Négyessy, L., & Bazsó, F. (2008). Fuzzy communities and the concept of bridgeness in complex networks. Physical Review E, 77, 016107.
  • Newman [2002] Newman, M. E. (2002). Assortative mixing in networks. Physical Review Letters, 89, 208701.
  • Newman [2003a] Newman, M. E. (2003a). Mixing patterns in networks. Physical Review E, 67, 026126.
  • Newman [2003b] Newman, M. E. (2003b). The structure and function of complex networks. SIAM Review, 45, 167–256.
  • Newman & Girvan [2004] Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69, 026113.
  • Paul & Chen [2020] Paul, S., & Chen, Y. (2020). Spectral and matrix factorization methods for consistent community detection in multi-layer networks. Annals of Statistics, 48, 230–250.
  • Paul & Chen [2021] Paul, S., & Chen, Y. (2021). Null models and community detection in multi-layer networks. Sankhya A, 1–55.
  • Pensky & Zhang [2019] Pensky, M., & Zhang, T. (2019). Spectral clustering in the dynamic stochastic block model. Electronic Journal of Statistics, 13, 678–709.
  • Psorakis et al. [2011] Psorakis, I., Roberts, S., Ebden, M., & Sheldon, B. (2011). Overlapping community detection using Bayesian non-negative matrix factorization. Physical Review E, 83, 066114.
  • Qing & Wang [2024] Qing, H., & Wang, J. (2024). Bipartite mixed membership distribution-free model: A novel model for community detection in overlapping bipartite weighted networks. Expert Systems with Applications, 235, 121088.
  • Snijders et al. [2006] Snijders, T. A., Pattison, P. E., Robins, G. L., & Handcock, M. S. (2006). New specifications for exponential random graph models. Sociological Methodology, 36, 99–153.
  • Su et al. [2024] Su, W., Guo, X., Chang, X., & Yang, Y. (2024). Spectral co-clustering in multi-layer directed networks. Computational Statistics & Data Analysis, 107987.
  • Tropp [2012] Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12, 389–434.
  • Wang et al. [2011] Wang, F., Li, T., Wang, X., Zhu, S., & Ding, C. (2011). Community discovery using nonnegative matrix factorization. Data Mining and Knowledge Discovery, 22, 493–521.
  • Zhang & Cao [2017] Zhang, J., & Cao, J. (2017). Finding common modules in a time-varying network with application to the drosophila melanogaster gene regulation network. Journal of the American Statistical Association, 112, 994–1008.