Limit Results for Estimation of Connectivity Matrix in Multi-layer Stochastic Block Models

Wenqing Su Department of Mathematical Sciences, Tsinghua University Xiao Guo School of Mathematics, Northwest University Ying Yang Department of Mathematical Sciences, Tsinghua University

Abstract

Multi-layer networks arise naturally in various domains including biology, finance and sociology, among others. The multi-layer stochastic block model (multi-layer SBM) is commonly used for community detection in the multi-layer networks. Most of current literature focuses on statistical consistency of community detection methods under multi-layer SBMs. However, the asymptotic distributional properties are also indispensable which play an important role in statistical inference. In this work, we aim to study the estimation and asymptotic properties of the layer-wise scaled connectivity matrices in the multi-layer SBMs. We develop a novel and efficient method to estimate the scaled connectivity matrices. Under the multi-layer SBM and its variant multi-layer degree-corrected SBM, we establish the asymptotic normality of the estimated matrices under mild conditions, which can be used for interval estimation and hypothesis testing. Simulations show the superior performance of proposed method over existing methods in two considered statistical inference tasks. We also apply the method to a real dataset and obtain interpretable results.

Keywords: Multi-layer networks, Asymptotic properties, Spectral methods

1 Introduction

The multi-layer network has gained widespread attention for its ability to represent multiple relationships among the same entities of interest, thereby enhancing the understanding of complex network data (Mucha et al., 2010; Holme and Saramäki, 2012; Kivelä et al., 2014; Han et al., 2015). For example, the Worldwide Food and Agricultural Trade (WFAT) multi-layer trade network (De Domenico et al., 2015), encompassing trade relationships among the same countries (nodes) across different commodities (layers), offers a richer insight than a single-layer trade network.

The multi-layer stochastic block model (multi-layer SBM) (Han et al., 2015; Paul and Chen, 2016) has been popularly used for the community detection task in the multi-layer network, where each layer corresponds to a stochastic block model (SBM) (Holland et al., 1983). In particular, given common unobserved communities (blocks), the edges in each layer are generated according to a layer-wise block probability matrix or the so-called connectivity matrix. Various methods have been proposed to detect the communities of multi-layer networks under the multi-layer SBM, see, e.g., Han et al. (2015); Bhattacharyya and Chatterjee (2018); Paul and Chen (2020); Jing et al. (2021); Arroyo et al. (2021); Lei and Lin (2023).

Most of the aforementioned literature on community detection of multi-layer SBMs is devoted to the statistical consistency. However, the asymptotic distributional properties are also vital and particularly useful in subsequent inference tasks. In the context of single-layer SBMs or other related models, various versions of asymptotically Gaussian behavior emerged for different purposes (Agterberg and Cape, 2023). Specifically, Tang and Priebe (2018) provided a central limit theorem for the eigenvectors of the normalized Laplacian matrix under the random dot product graph (RDPG) model (Young and Scheinerman, 2007). Rubin-Delanchy et al. (2022) extended these asymptotic results to the generalized RDPG and applied the theory to SBMs. Bickel et al. (2013) and Tang et al. (2022) established the asymptotic normality results for the estimation of the connectivity matrix using the maximum likelihood method and spectral embedding, respectively. Fan et al. (2022b) tested whether two nodes from a mixed-membership model (Airoldi et al., 2008) have the same membership parameters by establishing the asymptotic distributional properties. Jin et al. (2023) proposed a goodness-of-fit approach to select the number of communities and the asymptotic null distribution of the test statistic is normal. Also see Agterberg and Cape (2023) for a comprehensive overview of asymptotic normality in SBMs.

In the multi-layer SBM, the asymptotic distributional properties are crucial for subsequent inferences especially for those tailored for the multi-layer networks. However, as far as we are aware, the asymptotic properties under multi-layer SBMs are largely unexplored. An important inference task in multi-layer SBMs is to test whether networks across different layers are generated from the same SBM, essentially the same connectivity matrix given the consensus community memberships. For instance, in analyzing the WFAT multi-layer network, the goal extends beyond the community detection to ascertaining whether trading patterns across different commodities are similar. Asymptotic properties play a fundamental role in such inference. In the literature, Arroyo et al. (2021) and Zheng and Tang (2022) showed that the connectivity matrices estimated using the spectral embedding are asymptotically normal under the common subspace independent edge graph (COSIE) model. These studies necessitate that the connectivity matrices for all network layers are full rank. However, in practical multi-layer networks like WFAT, not all layers carry comprehensive cluster information, potentially leading to missing clusters, which results in the rank-deficiency of individual connectivity matrices (Su et al., 2024). Moreover, the methods in Arroyo et al. (2021) and Zheng and Tang (2022) treat multi-layer SBMs as special case of their COSIE model, rather than being specifically tailored for them.

Motivated by the above problems, we study the asymptotic properties in the estimation of connectivity matrices under multi-layer SBMs, without imposing the full rank assumption on their populations. We develop a simple and efficient spectral clustering-based method for the scaled connectivity matrix estimation, where the scaled connectivity matrix is also called the score matrix (Arroyo et al., 2021). Under proper conditions, we establish the asymptotic normality of the estimated scaled connectivity matrices under multi-layer SBMs and its variants, the degree-corrected multi-layer SBM. We emphasize that the asymptotic properties can be used to perform various statistical inferences, such as interval estimation and hypothesis testing.

The main contributions of this paper are threefold. First, we systematically study and specifically tailor the asymptotic normality of the scaled connectivity matrix estimation under the multi-layer SBM and its variant, the multi-layer degree-corrected SBM. To the best of our knowledge, the asymptotic normality under the degree-corrected SBM is less explored, let alone the multi-layer degree-corrected SBM. Second, the conditions under which our results hold are relatively mild. Compared to previous work which necessitated full rank in each layer (Arroyo et al., 2021; Zheng and Tang, 2022), we allow individual connectivity matrices to be rank-deficient and only require their squared summation to be full rank. This adaptability makes our approach particularly suitable for analyzing multi-layer networks where each layer only captures partial underlying communities. Third, we apply the established asymptotic normality to two statistical inference tasks, one for interval estimation of the scaled connectivity matrices, and the other for testing whether the population matrices corresponding to different layers are the same. The numerical results show superior performance of the proposed method over compared methods in these tasks. The analysis of a real-world trade network dataset provides interpretable results.

The rest of the paper is organized as follows. Section 2 introduces the estimation method of individual connectivity matrices in the multi-layer SBM. Section 3 provides the asymptotic normality of the estimates. Section 4 extends the method to multi-layer degree-corrected SBMs and presents the corresponding asymptotic normality results. Sections 5 and 6 include the simulation and real data experiment. Section 7 concludes the paper and provides possible extensions. Technical proofs are included in the Appendix.

2 Methodology

In this section, we first present and reparameterize the multi-layer stochastic block model. Then we propose the method for estimating the individual connectivity matrices. The following notation is used throughout the paper.

Notation: We use $[n]$ to denote the set $\{1,...,n\}$ . For a matrix $A\in\mathbb{R}^{n\times n}$ , $\|A\|_{F}$ and $\|A\|_{1,\infty}$ denote the Frobenius norm and the maximum row-wise $\ell_{1}$ norm, respectively. The spectral norm of a matrix and the Euclidean norm of a vector are denoted by $\|\cdot\|_{2}$ . The notation ${\rm diag}(\cdot)$ is used to denote a diagonal matrix formed from either a vector or the diagonal elements of a matrix. The notation ${\rm vec}(\cdot)$ refers to the vectorization of the upper triangular part of a symmetric matrix, proceeding in a column-wise order. For two sequences $a_{n}$ and $b_{n}$ , we write $a_{n}\lesssim b_{n}$ if there exists some constant $c>0$ such that $a_{n}\leq cb_{n}$ , and use $a_{n}\asymp b_{n}$ to denote the case where both $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$ . The notation $a_{n}=\omega(b_{n})$ denotes that $b_{n}/a_{n}\rightarrow 0$ as $n$ goes to infinity, while $a_{n}=o(b_{n})$ means that $b_{n}=\omega(a_{n})$ .

2.1 Multi-layer SBMs

Consider the multi-layer network comprising $L$ layers and $n$ shared nodes, with its symmetric adjacency matrices represented by $A_{l}$ , where $A_{l}\in\{0,1\}^{n\times n}$ for all $1\leq l\leq L$ . The multi-layer SBM (Han et al., 2015; Valles-Catala et al., 2016; Paul and Chen, 2016) is as follows.

Assume the $n$ nodes are partitioned into $K$ layer-independent communities with the community assignment of node $i$ denoted by $g_{i}\in[K]$ . With this community assignment, the element $A_{l,ij}(i<j)$ of $A_{l}(l\in[L])$ is generated independently as the following SBM (Holland et al., 1983),

A_{l,ij}\sim{\rm Bernoulli}(\rho B_{l,g_{i}g_{j}}),

where $\rho\in(0,1]$ controls the overall sparsity of the generated networks, $B_{l}\in\mathbb{[}0,1]^{K\times K}$ denotes the heterogeneous connectivity matrix, and we assume the diagonal entries $A_{l,ii}$ ’s are all 0.

It is easy to see that

Q_{l}=\rho\Theta B_{l}\Theta^{T}\in[0,1]^{n\times n}

is the population matrix for $A_{l}$ , in the sense that $Q_{l}-{\rm diag}(Q_{l})=\mathbb{E}(A_{l})$ , where $\Theta\in\{0,1\}^{n\times K}$ denotes the membership matrix with $\Theta_{ik}=1$ if and only if $k=g_{i}$ and $\Theta_{ik}=0$ otherwise. Define $\Delta={\rm diag}(n_{1},\ldots,n_{K})$ , where each diagonal element $n_{k}$ represents the number of nodes in community $k\in[K]$ , we can then rearrange $Q_{l}$ as

Q_{l}=\rho\Theta\Delta^{-1/2}\Delta^{1/2}B_{l}\Delta^{1/2}\Delta^{-1/2}\Theta^{T}:=UM_{l}U^{T},

(1)

where $U=\Theta\Delta^{-1/2}$ and $M_{l}=\rho\Delta^{1/2}B_{l}\Delta^{1/2}$ . Here, $U$ is an $n\times K$ orthogonal column matrix with $K$ different rows and serves as the eigenspace of the population matrix $Q_{l}$ . The $K\times K$ symmetric matrix $M_{l}$ is referred to as the scaled connectivity matrix or the score matrix. We will study its estimation and asymptotic normality in subsequent sections. Note that Arroyo et al. (2021) first considered this type of decomposition when they developed the COSIE model.

Through the decomposition in (1), the multi-layer SBM can be reparameterized by ( $U,M_{1},\ldots,M_{L}$ ). Note that such parametrization is identifiable up to orthogonal transformation because for any orthogonal matrix $Z\in\mathbb{O}(K)$ , $(UZ)Z^{T}M_{l}Z(UZ)^{T}$ provides another valid decomposition, where $\mathbb{O}(K)$ represents the set consists of all $K\times K$ orthogonal matrices.

2.2 Estimation of the scaled connectivity matrices

In order to estimate the scaled connectivity matrices $M_{l}(l\in[L])$ , we first estimate the common eigenspace $U$ .

To utilize the information across layers, we regard $U$ as the eigenspace of the following $\sum_{l=1}^{L}Q_{l}^{2}$ ,

\sum_{l=1}^{L}Q_{l}^{2}=\sum_{l=1}^{L}\rho^{2}\Theta B_{l}\Theta^{T}\Theta B_{l}\Theta^{T}=U\sum_{l=1}^{L}\rho^{2}\Delta^{1/2}B_{l}\Delta B_{l}\Delta^{1/2}U^{T},

where we squared $Q_{l}$ ’s before summing them up to avoid the community cancellation by direct summation (Lei and Lin, 2023). Suppose $\sum_{l=1}^{L}B_{l}^{2}$ is of full rank, and recalling that $\Delta$ is a full rank diagonal matrix, then $\sum_{l=1}^{L}\rho^{2}\Delta^{1/2}B_{l}\Delta B_{l}\Delta^{1/2}$ is also of full rank. Denote the $\sum_{l=1}^{L}\rho^{2}\Delta^{1/2}B_{l}\Delta B_{l}\Delta^{1/2}$ by $W\Lambda W^{T}$ , where $W\in\mathbb{O}(K)$ and $\Lambda$ is a diagonal matrix. Then we have

\sum_{l=1}^{L}Q_{l}^{2}=UW\Lambda W^{T}U^{T}.

Hence, $UW$ are the eigenvectors of $\sum_{l=1}^{L}Q_{l}^{2}$ .

Therefore, we can estimate $U$ using the eigenvectors of the sample version $\sum_{l=1}^{L}Q_{l}^{2}$ . Specifically, to mitigate the bias introduced by the diagonal part of $\sum_{l=1}^{L}A_{l}^{2}$ , we conduct eigendecomposition on the following bias-adjusted sum of squares of the adjacency matrices

\sum_{l=1}^{L}(A_{l}^{2}-D_{l}),

where $D_{l}$ is an $n\times n$ diagonal matrix with $D_{l,ii}=\sum_{j=1}^{n}A_{l,ij}$ representing the degree of node $i$ in layer $l$ . The resulting eigenvectors are denoted by $\hat{U}$ .

Given $\hat{U}$ , we are ready to estimate the $M_{l}$ matrices by recalling the decomposition of the population matrix $Q_{l}$ in (1). Specifically, we can obtain the following estimator for each layer $l$ :

\hat{M_{l}}=\operatorname*{arg\,min}\limits_{M\in\mathbb{R}^{K\times K}}\|A_{l}-\hat{U}M\hat{U}^{T}\|=\hat{U}^{T}A_{l}\hat{U}.

(2)

Remark 1.

Similar estimation approaches for scaled connectivity matrices have been used by Arroyo et al. (2021), where the common eigenspace $U$ is derived based on distributed estimation techniques (Fan et al., 2019). By contrast, our methodology utilizes the bias-adjusted sum of squares of the adjacency matrices to estimate the eigenspace, aiming to avoid potential signal cancellation. This aggregation method was initially proposed by Lei and Lin (2023); however, their primary focus was on the statistical consistency of community detection instead of the asymptotic normality in the estimation of connectivity matrices.

3 Asymptotic properties

In this section, we present the asymptotic properties of the individual estimates of the scaled connectivity matrices $M_{l}(l\in[L])$ in the multi-layer SBM.

To establish the asymptotic normality of $M_{l}$ , we need the following assumptions.

Assumption 1.

The number of communities $K$ is fixed. The community sizes are balanced, that is, there exists some constant $c_{1}>1$ such that each community size is in $[c_{1}^{-1}n/K,c_{1}n/K]$ .

Assumption 1 also indicates that there exist positive constants $c_{2},c_{3}$ , and an orthogonal matrix $Z\in\mathbb{O}(K)$ , such that $c_{2}/\sqrt{n}\leq|(UZ)_{is}|\leq c_{3}/\sqrt{n}$ for all $i\in[n]$ and $s\in[K]$ . This is referred to as eigenvector delocalization (Rudelson and Vershynin, 2015; He et al., 2019). Indeed, by recalling the relationship $U=\Theta\Delta^{-1/2}$ and choosing the orthogonal matrix $Z$ with $c_{2}/\sqrt{K}\leq Z_{st}\leq c_{3}/\sqrt{K}$ for each $K$ and each pair $1\leq s,t\leq K$ , the delocalization of $U$ follows by Assumption 1.

Assumption 2.

The minimum eigenvalue of $\sum_{l=1}^{L}B_{l}^{2}$ is at least $c_{4}L$ for some constant $c_{4}>0$ .

Assumption 2 specifies the growth rate of the minimum eigenvalue of $\sum_{l=1}^{L}B_{l}^{2}$ in order to make the theoretical bound of the estimated common eigenspace $\hat{U}$ around $U$ concrete. The linear growth rate is reasonable because when the connectivity matrices are common, Assumption 2 holds naturally.

Assumption 3.

Suppose the sum of the variance of the edges satisfies

s^{2}(Q_{l}):=\sum_{i=1}^{n}\sum_{j=1}^{n}Q_{l,ij}(1-Q_{l,ij})=\omega(1)

for all $l\in[L]$ .

Assumption 3 is critical to ensure the Lindeberg conditions for the Central Limit Theorem are met. For balanced community sizes, Assumption 3 simplifies to $\sum_{s=1}^{K}\sum_{t=1}^{K}\rho B_{l,st}(1-\rho B_{l,st})=\omega(1/n^{2})$ , which suffices if $\rho=\omega(1/n^{2})$ .

Before stating the next assumption, we first define the following $K(K+1)/2\times K(K+1)/2$ matrix $\Sigma^{l}$ ,

\Sigma_{\frac{2s+t(t-1)}{2},\frac{2s^{\prime}+t^{\prime}(t^{\prime}-1)}{2}}^{l}:=\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(U_{is}U_{jt}+U_{js}U_{it})\;(U_{is^{\prime}}U_{jt^{\prime}}+U_{js^{\prime}}U_{it^{\prime}})\;Q_{l,ij}(1-Q_{l,ij})

(3)

for each $1\leq s\leq t\leq K$ and $1\leq s^{\prime}\leq t^{\prime}\leq K$ . Also, define $G_{l}=A_{l}-P_{l}$ with $P_{l}=\mathbb{E}(A_{l})=Q_{l}-\text{diag}(Q_{l})$ to be the noise matrix. As we will see, $\Sigma^{l}$ actually serves as the covariance matrix of the vectorized $U^{T}G_{l}U$ , denoted by ${\rm vec}(U^{T}G_{l}U)\in\mathbb{R}^{K(K+1)/2}$ , where we focus on the upper triangular part of the symmetric $U^{T}G_{l}U$ . Specifically, for any $1\leq s\leq t\leq K$ , the $\frac{2s+t(t-1)}{2}$ th entry in ${\rm vec}(U^{T}G_{l}U)$ corresponds to $(U^{T}G_{l}U)_{s,t}$ .

Assumption 4.

$\lambda_{\min}(\Sigma^{l})=\omega(1/n^{2})$ for all $l\in[L]$ , where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue.

Similar to Assumption 3, Assumption 4 contributes to verifying the Lindeberg conditions for the Central Limit Theorem. By Assumption 3 and the discussions after Assumption 1, we can conclude that each entry of $\Sigma_{l}$ is $\omega(1/n^{2})$ . Hence, Assumption 4 is stronger than Assumption 3.

Remark 2.

Assumptions 3 and 4 were first used in Arroyo et al. (2021).

With these assumptions, we obtain the following entry-wise and vector-wise asymptotic properties of the estimated scaled connectivity matrices under the multi-layer SBM.

Theorem 1.

Suppose Assumptions 1 and 2 hold for the multi-layer SBM, and a positive constant $c_{5}$ exists such that $n\rho\geq c_{5}\log(L+n)$ . Then the estimate $\hat{M_{l}}$ obtained from (2) has the following asymptotic properties.

(a) If Assumption 3 holds, then for any given $l\in[L]$ and $s,t\in[K]$ , we have

\left(\Sigma_{\frac{2s+t(t-1)}{2},\frac{2s+t(t-1)}{2}}^{l}\right)^{-1/2}\left(Z\hat{M_{l}}Z^{T}-M_{l}-ZE_{l}Z^{T}\right)_{s,t}\rightarrow_{d}\mathcal{N}(0,1)

as $n$ goes to infinity.

(b) If Assumption 4 holds, then for any given $l\in[L]$ , we have

\left(\Sigma^{l}\right)^{-1/2}{\rm vec}\left(Z\hat{M_{l}}Z^{T}-M_{l}-ZE_{l}Z^{T}\right)\rightarrow_{d}\mathcal{N}(\bm{0},\bm{I})

as $n$ goes to infinity. Here, $\mathcal{N}(\bm{0},\bm{I})$ is a standard normal distribution in $K(K+1)/2$ dimensions.

In (a) and (b), $Z=\operatorname*{arg\,inf}\limits_{Z\in\mathbb{O}(K)}\|U^{T}\hat{U}-Z\|_{F}$ is a $K\times K$ orthogonal matrix, and $E_{l}$ is a diminishing term satisfying $\|E_{l}\|_{F}\leq c_{6}\left(\rho+\frac{\log^{1/2}(L+n)}{L^{1/2}}\right)$ with high probability for some positive constant $c_{6}$ .

Remark 3.

By Assumption 3 and the discussions after Assumption 1, we have $\Sigma_{\frac{2s+t(t-1)}{2},\frac{2s+t(t-1)}{2}}^{l}\\ =\frac{s^{2}(Q_{l})}{n^{2}}(1+o(1))$ . Thus the smaller edge variance $s^{2}(Q_{l})$ would lead to a more efficient estimate $(\hat{M}_{l})_{st}$ , as long as the edge variance satisfies Assumption 3.

Remark 4.

The bias term $\|E_{l}\|_{F}$ is negligible because it is dominated by $\|M_{l}\|_{F}\asymp n\rho$ . This is also numerically verified in Section 5.

Remark 5.

Compared with Arroyo et al. (2021), the merit of Theorem 1 lies in that we derive the non-asymptotic error bound for the bias term while their bound holds in expected value. In addition, we relax the full rank assumption of each connectivity matrix therein.

4 Extension to multi-layer degree-corrected SBMs

In this section, we extend the proposed method and the corresponding asymptotic normality results to multi-layer degree-corrected stochastic block models, which is a counterpart of the multi-layer SBM but in each layer the network is assumed to be generated from a degree-corrected SBM (DCSBM) (Karrer and Newman, 2011).

The SBM can not capture the degree heterogeneity inherent in the networks. To address this, the DCSBM extends the standard SBM by incorporating node specific parameters, allowing degrees to vary within the same community.

We now introduce the multi-layer DCSBM, where the layer-wise networks are generated from the DCSBM. Without specification, the notes and notation are the same as those in Section 2. Define $\psi\in\mathbb{R}^{n}_{+}$ to be the degree heterogeneity parameter which measures the propensity of a node in forming edges with other nodes and is consensus among layers. For identifiability, we assume $\max_{i\in C_{k}}\psi_{i}=1$ for all $k\in[K]$ (Lei and Rinaldo, 2015; Zhang et al., 2022). Given $\psi$ , the community assignments $g_{i}\in[K]$ , the connectivity matrix $B_{l}(l\in[L])$ , and the network sparsity parameter $\rho$ , the element $A_{l,ij}(i<j)$ is generated independently as follows

A_{l,ij}\sim{\rm Bernoulli}(\rho\psi_{i}\psi_{j}B_{l,g_{i}g_{j}}),

and $A_{l,ij}=A_{l,ji}$ and $A_{l,ii}=0$ . The population adjacency matrix of $A_{l}$ is then

Q_{l}=\rho{\rm diag}(\psi)\Theta B_{l}\Theta^{T}{\rm diag}(\psi).

To facilitate further analysis, we now give some additional notation. For each $k\in[K]$ , define $G_{k}=\{1\leq i\leq n:g_{i}=k\}$ as the set of nodes whose community membership is $k$ . Let $\phi_{k}$ be an $n\times 1$ vector that matches $\psi$ on the index set $G_{k}$ and is zero elsewhere. Define $\Psi={\rm diag}(\|\phi_{1}\|_{2}^{2},\ldots,\|\phi_{K}\|_{2}^{2})$ , where each diagonal element $\|\phi_{k}\|_{2}^{2}$ represents the effective community size of community $k$ . This allows us to reformat $Q_{l}$ as

Q_{l}=\rho{\rm diag}(\psi)\Theta\Psi^{-1/2}\Psi^{1/2}B_{l}\Theta^{T}\Psi^{1/2}\Psi^{-1/2}{\rm diag}(\psi):=UM_{l}U^{T},

where $U={\rm diag}(\psi)\Theta\Psi^{-1/2}$ and $M_{l}=\rho\Psi^{1/2}B_{l}\Psi^{1/2}$ . It can be shown that $U^{T}U=I$ .

As described in the case of multi-layer SBMs, we regard $U$ as the eigenspace of the following $\sum_{l=1}^{L}Q_{l}^{2}$ ,

\sum_{l=1}^{L}Q_{l}^{2}=\sum_{l=1}^{L}\rho^{2}{\rm diag}(\psi)\Theta B_{l}\Theta^{T}{\rm diag}^{2}(\psi)\Theta B_{l}\Theta^{T}{\rm diag}(\psi)=U\sum_{l=1}^{L}\rho^{2}\Psi^{1/2}B_{l}\Psi B_{l}\Psi^{1/2}U^{T}.

Suppose $\sum_{l=1}^{L}B_{l}^{2}$ is of full rank, then $\sum_{l=1}^{L}\rho^{2}\Psi^{1/2}B_{l}\Psi B_{l}\Psi^{1/2}$ is of full rank as well. Similar to the case of multi-layer SBMs, we perform the eigendecomposition on the bias-adjusted sum of squares of the adjacency matrices $\sum_{l=1}^{L}(A_{l}^{2}-D_{l})$ , where recall $D_{l}$ is an $n\times n$ diagonal matrix with $D_{l,ii}=\sum_{j=1}^{n}A_{l,ij}$ and the resulting eigenvectors are denoted by $\hat{U}$ . We further obtain $\hat{M_{l}}$ with the help of $\hat{U}$ through (2).

The following Assumptions E1, E3, and E4 are needed for establishing the asymptotic normality of $\hat{M}_{l}$ , which are counterparts of the assumptions under the multi-layer SBM.

Assumption E1.

The number of communities $K$ is fixed and $K\|\phi_{k}\|_{2}\asymp\|\psi\|_{2}$ for all $k\in[K]$ .

Assumption E3.

$\min\limits_{l\in[L]}\sum_{i=1}^{n}\sum_{j=1}^{n}Q_{l,ij}(1-Q_{l,ij})=\omega(n^{2}/\|\psi\|_{2}^{4})$ .

Assumption E4.

$\lambda_{\min}(\Sigma^{l})=\omega(1/\|\psi\|^{4}_{2})$ for all $l\in[L]$ , where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue.

Assumption E1 requires that the node propensity parameters restricted to each community have the same order of Euclidean norm. This condition is frequently imposed in the analysis of the DCSBM; see Su et al. (2020); Jin et al. (2023) and Agterberg et al. (2022). Generally, Assumptions E3 and E4 are more stringent than Assumptions 3 and 4 because $\|\psi\|_{2}\lesssim\sqrt{n}$ ; while in the special case of multi-layer SBMs where $\psi=\mathbf{1}$ and $\|\psi\|_{2}\asymp\sqrt{n}$ , Assumptions E3 and E4 reduce to Assumptions 3 and 4, respectively. With these assumptions, the entry-wise and vector-wise asymptotic normality hold for the estimated scaled connectivity matrix $\hat{M}_{l}$ under the multi-layer DCSBM.

Theorem 2.

Suppose Assumptions E1 and 2 hold for the multi-layer DCSBM, and a positive constant $c_{7}$ exists such that $n\rho\geq c_{7}\log(L+n)$ . Consider the estimate $\hat{M_{l}}$ obtained from (2), which has the following asymptotic properties.

(a) If Assumption E3 holds, then for any given $l\in[L]$ and $s,t\in[K]$ , we have

\left(\Sigma_{\frac{2s+t(t-1)}{2},\frac{2s+t(t-1)}{2}}^{l}\right)^{-1/2}\left(Z\hat{M_{l}}Z^{T}-M_{l}-ZE_{l}Z^{T}\right)_{s,t}\rightarrow_{d}\mathcal{N}(0,1)

as $n$ goes to infinity.

(b) If Assumption E4 holds, then for any given $l\in[L]$ , we have

\left(\Sigma^{l}\right)^{-1/2}{\rm vec}\left(Z\hat{M_{l}}Z^{T}-M_{l}-ZE_{l}Z^{T}\right)\rightarrow_{d}\mathcal{N}(\bm{0},\bm{I})

as $n$ goes to infinity. Here, $\mathcal{N}(\bm{0},\bm{I})$ is a standard normal distribution in $K(K+1)/2$ dimensions.

In (a) and (b), $Z=\operatorname*{arg\,inf}\limits_{Z\in\mathbb{O}(K)}\|U^{T}\hat{U}-Z\|_{F}$ is a $K\times K$ orthogonal matrix, and $E_{l}$ is a bias term satisfying $\|E_{l}\|_{F}\leq c_{8}\left(\frac{n^{2}\log^{1/2}(L+n)}{L^{1/2}\|\psi\|_{2}^{4}}+\max\{\frac{n^{3/2}\rho^{1/2}}{\|\psi\|_{2}^{4}},\rho\}\right)$ with high probability for some positive constant $c_{8}$ .

Remark 6.

$\|\psi\|_{2}$ reveals the degree of heterogeneity to some extent. A larger $\|\psi\|_{2}$ would lead to a smaller bias $\|E_{l}\|_{F}$ . When $\|\psi\|_{2}=\omega(n^{1/4})$ , the upper bound of $\|E_{l}\|_{F}$ does not exceed $c_{9}\left(n^{1/2}\rho^{1/2}+\frac{n}{L^{1/2}}\right)$ up to log factors, where $c_{9}$ is some positive constant. In this case, when $\rho L^{1/2}=\omega(1)$ , the bias term $\|E_{l}\|_{F}$ is dominated by $\|M_{l}\|_{F}\asymp n\rho$ .

Remark 7.

The discrepancy between $U$ and $\hat{U}$ , which we establish in Lemma A2, is important for Theorem 2. It is of independent interest in evaluating the misclassification performance of the spectral clustering-based algorithm under the multi-layer DCSBM.

5 Simulations

In this section, we conduct simulations to investigate the finite sample performance of the proposed method. In Section 5.1, we verify the vanishing behavior of the bias term $E_{l}$ in the formulation of asymptotic normality. In Sections 5.2 and 5.3, we test the efficacy of the derived asymptotic normality in two downstream statistical inference tasks, one for the interval estimation of the entries of the scaled connectivity matrices and the other for hypothesis testing to infer whether the multi-layer network exhibits homogeneity.

The proposed method is denoted by SCCE, namely, the abbreviation for spectral clustering-based method for connectivity matrix estimation. Since studies on the asymptotic properties under multi-layer SBMs are relatively limited, we compare SCCE with the Multiple Adjacency Spectral Embedding (MASE) method studied in Arroyo et al. (2021) and Zheng and Tang (2022).

5.1 Bias evaluation

As shown in Theorems 1 and 2, $\hat{M}_{l}$ exhibits asymptotic normality with a bias term $E_{l}$ for all $l\in[L]$ . The theoretical results indicate that for a fixed number of nodes $n$ , the bias tends to zero in Frobenius norm as the number of layers $L$ increases and the edge density $\rho$ decreases. The exact formulation for $E_{l}$ can be found in the proofs and (A1).

In this experiment, we numerically verify the vanishing behavior of the bias $E_{l}$ . To this end, we generate the following multi-layer SBM. Consider $n$ nodes per network with $K=3$ communities proportional to the number of nodes via $(0.4,0.3,0.3)$ . We set $B_{l}=\rho B^{(1)}$ for $l\in\{1,\ldots,L/2\}$ and $B_{l}=\rho B^{(2)}$ for $l\in\{L/2+1,\ldots,L\}$ , with

B^{(1)}=U\begin{bmatrix}1.5&0&0\\ 0&0.2&0\\ 0&0&0.5\end{bmatrix}U^{T}\approx\begin{bmatrix}0.675&0.175&0.46\\ 0.175&0.675&0.46\\ 0.46&0.46&0.85\end{bmatrix}

and

B^{(2)}=U\begin{bmatrix}1.5&0&0\\ 0&0.2&0\\ 0&0&-0.5\end{bmatrix}U^{T}\approx\begin{bmatrix}0.175&0.675&0.46\\ 0.675&0.175&0.46\\ 0.46&0.46&0.85\end{bmatrix},

where

U=\begin{bmatrix}1/2&1/2&-\sqrt{2}/2\\ 1/2&1/2&\sqrt{2}/2\\ \sqrt{2}/2&-\sqrt{2}/2&0\end{bmatrix}.

(4)

We test how the bias $\sum_{l=1}^{L}\|E_{l}\|_{F}/L$ varies against the number of layers $L$ . The average results over 100 replications are displayed in Figure 1. Specifically, in Figure 1(a), the number of nodes is fixed to be 500 and we test the method for various fixed $\rho\in\{0.3,0.2,0.1,0.05\}$ . The bias decreases as $L$ increases and $\rho$ decreases. In particular, when $\rho$ is large, our upper bound on the bias indicates that $\rho$ is the dominating term and large $L$ will not decrease the bias. The numerical result with $\rho=0.3$ coincides with this theoretical finding. In Figure 1(b), the overall edge density $\rho$ is fixed to be 0.1 and we test the results for various fixed $n\in\{200,300,400,500\}$ . The results are also consistent with the theoretical bound, showing that the bias decreases as the number of layers $L$ or the number of nodes $n$ increases.

5.2 Interval estimation

The asymptotic distribution of $\hat{M}_{l,ij}$ helps to establish the interval estimate of $M_{l,st}$ for $1\leq l\leq L$ and $1\leq s,t\leq K$ . In particular, the interval estimate for $M_{l,st}$ , as indicated by Theorems 1 and 2, is given by

(Z\hat{M}_{l}Z^{T})_{s,t}\pm u_{\alpha/2}\left(\hat{\Sigma}_{\frac{2s+t(t-1)}{2},\frac{2s+t(t-1)}{2}}^{l}\right)^{1/2},

where $u_{\alpha/2}$ is the $\alpha/2$ -th percentile of the standard normal distribution, and $\hat{\Sigma}$ represents the estimated covariance matrix, which is the counterpart of (3) with $U$ and $Q_{l}$ ’s replaced by $\hat{U}$ and $\hat{A}_{l}$ ’s, respectively.

In this experiment, we test the accuracy of the estimated interval using the probability of the estimated interval covering the true underlying parameter ${M}_{l,ij}$ over 200 replications. We consider two model set-ups, one for the multi-layer SBM and the other for the multi-layer DCSBM. In the multi-layer SBM, the network generation process is similar to that described in Section 5.1. We consider various combinations of $n\in\{100,200,300,400,500\}$ and $\rho\in\{0.3,0.2,0.1,0.05\}$ . For each combination of $(n,\rho)$ , we consider two different numbers of layers, $L$ , set at 50 and 100, respectively. In the multi-layer DCSBM, the network generation process is also similar to that in Section 5.1 with the number of nodes $n\in\{300,400,500\}$ except that we introduce the degree heterogeneity parameter $\psi$ . For each $\psi_{i}$ , we independently generate $\tilde{\psi}_{i}$ from the uniform distribution $U(2,3)$ and set $\psi_{i}=\beta_{n}\tilde{\psi}_{i}/\|\tilde{\psi}\|_{2}$ (Jin et al., 2023). Here we set the corresponding $\beta_{n}$ to 10.4, 12 and 13.4 when the number of nodes $n$ is 300, 400 and 500, respectively.

Tables 1 and 2 show the average rate of the 95% confidence intervals covering the true parameters over 200 replications for the multi-layer SBM and multi-layer DCSBM, respectively. In particular, we calculate the average coverage rate over all the triples $(s,t,L)$ . The results show that as $\rho$ , $n$ , and $L$ increase, the coverage rate under both multi-layer SBMs and multi-layer DCSBMs improves. The proposed method consistently outperforms the MASE over all range of the considered parameter settings. In addition, we observe that the 95% confidence intervals do not achieve a 95% coverage rate, which can be attributed to the presence of a bias term or to an inadequate number of nodes and network layers. Nevertheless, as the number of layers and nodes increases, the coverage rate becomes close to 95%.

Table 1: The average coverage rate of the 95% confidence intervals under multi-layer SBMs.

		$\rho=0.3$		$\rho=0.2$		$\rho=0.1$		$\rho=0.05$
${n}$	$L$	SCCE	MASE	SCCE	MASE	SCCE	MASE	SCCE	MASE
100	50	0.860	0.764	0.752	0.696	0.787	0.697	0.817	0.743
100	100	0.906	0.864	0.846	0.728	0.811	0.732	0.854	0.753
200	50	0.920	0.876	0.902	0.782	0.705	0.655	0.726	0.530
200	100	0.924	0.906	0.919	0.872	0.865	0.715	0.755	0.598
300	50	0.924	0.895	0.920	0.799	0.831	0.618	0.588	0.550
300	100	0.928	0.917	0.929	0.885	0.905	0.756	0.724	0.636
400	50	0.925	0.917	0.928	0.832	0.903	0.456	0.630	0.467
400	100	0.927	0.923	0.928	0.894	0.922	0.658	0.857	0.490
500	50	0.929	0.926	0.927	0.854	0.911	0.404	0.752	0.413
500	100	0.929	0.929	0.927	0.903	0.925	0.602	0.886	0.419

Table 2: The average coverage rate of the 95% confidence intervals under multi-layer DCSBMs.

		$\rho=0.3$		$\rho=0.2$		$\rho=0.1$		$\rho=0.05$
n	L	SCCE	MASE	SCCE	MASE	SCCE	MASE	SCCE	MASE
300	50	0.883	0.555	0.732	0.537	0.621	0.438	0.721	0.581
300	100	0.915	0.692	0.881	0.579	0.664	0.434	0.758	0.577
400	50	0.905	0.480	0.843	0.450	0.522	0.363	0.660	0.463
400	100	0.924	0.697	0.905	0.531	0.700	0.428	0.679	0.457
500	50	0.916	0.441	0.880	0.365	0.480	0.351	0.561	0.402
500	100	0.925	0.634	0.914	0.434	0.796	0.433	0.583	0.392

5.3 Hypothesis testing

In multi-layer networks, an interesting statistical inference task is to test whether there is homogeneity across different layers, specifically whether some layer-wise adjacency matrices come from the same population. In the multi-layer SBM, recall the model can be reparameterized by ( $U,M_{1},\ldots,M_{L}$ ), as described in (1), which implies that for any pair $(k,l)$ with $k\neq l$ and $k,l\in\{1,...,L\}$ , the populations are identical, namely $Q_{k}=Q_{l}$ , if and only if $M_{k}=M_{l}$ . Consequently, this type of homogeneity manifests as a partition of the scaled connectivity matrices $M_{l}$ , say

M_{i_{1}}=\cdots=M_{i_{r}};\quad M_{i_{r+1}}=\cdots=M_{i_{r+j}};\quad\cdots.

Here $i_{r}$ is the index in $\{1,\ldots,L\}$ . All pairs within a set of the partition are equal, and two $M_{l}$ ’s in different partitions are unequal. To infer the homogeneity of the multi-layer SBM, we consider the simultaneous testing of the $\binom{L}{2}$ hypotheses

H_{kl,0}:M_{k}=M_{l},\quad\quad 1\leq k<l\leq L,

(5)

and use a Holm type step-down procedure to control the family-wise error rate (Lehmann and Romano, 2005). The totality of acceptance and rejection statements resulting from the multiple comparison procedure may lead to a partition of the connectivity matrices. In the Holm procedure, null hypotheses are considered successively, from most significant to least significant, with further tests depending on the outcome of earlier ones. If any hypothesis is rejected at the level $\alpha^{\prime}=\alpha/\binom{L}{2}$ , the denominator of $\alpha^{\prime}$ for the next test is $\binom{L}{2}-1$ and the criterion continues to be modified in a stage-wise manner, with the denominator of $\alpha^{\prime}$ reduced by 1 each time a hypothesis is rejected, so that tests can be conducted at successively higher significance levels. This type of multiple comparison procedure is commonly used (Dudoit et al., 2008; Noble, 2009).

The primary challenge lies in how to test the $\binom{L}{2}$ individual hypotheses, which can be facilitated using the asymptotic distribution of the estimated scaled connectivity matrices. Specifically, to test each individual hypothesis $H_{kl,0}$ at the specified significance level, we employ the Frobenius norm of the difference between the estimated scaled connectivity matrices $T_{kl}:=\|\hat{M}_{k}-\hat{M}_{l}\|_{F}$ as the test statistic. The distribution of the test statistic can be determined with the help of asymptotic distribution of ${\rm vec}(\hat{M}_{k}-\hat{M}_{l})$ .

We first specify the distribution of ${\rm vec}(\hat{M}_{k}-\hat{M}_{l})$ . In Theorem 1, we have provided the asymptotic distribution of ${\rm vec}(Z\hat{M_{l}}Z^{T}-M_{l})$ , with the vanishing bias term excluded. However, the dependence between $\hat{M}_{k}$ and $\hat{M}_{l}$ prevents a straightforward summation to obtain the asymptotic distribution of ${\rm vec}(Z\hat{M_{k}}Z^{T}-M_{k}-Z\hat{M_{l}}Z^{T}+M_{l})$ , which, under the hypothesis $H_{kl,0}$ , reduces to the distribution of ${\rm vec}(Z(\hat{M}_{k}-\hat{M}_{l})Z^{T})$ . Fortunately, by leveraging the analogous technique used in the proof of Theorem 1, specifically the decomposition (A2), we can obtain the limiting distribution of ${\rm vec}(Z(\hat{M}_{k}-\hat{M}_{l})Z^{T}-M_{k}+M_{l})$ . Specifically, we have

\left(\Sigma^{k}+\Sigma^{l}\right)^{-1/2}{\rm vec}\left(Z(\hat{M}_{k}-\hat{M}_{l})Z^{T}-M_{k}+M_{l}-Z(E_{k}-E_{l})Z^{T}\right)\rightarrow_{d}\mathcal{N}(\bm{0},\bm{I})

for each distinct pair $k$ and $l$ satisfying $1\leq k\neq l\leq L$ , where $E_{k}$ and $E_{l}$ are negligible under certain conditions and the notation is the same as those in Theorem 1. The conclusion holds for both the multi-layer SBM and the multi-layer DCSBM. As a result, under the hypothesis $H_{kl,0}$ and omitting the bias terms, we can approximate the asymptotic distribution of ${\rm vec}(Z(\hat{M}_{k}-\hat{M}_{l})Z^{T})$ by $\mathcal{N}(\bm{0},\Sigma^{k}+\Sigma^{l})$ . With the help of asymptotic distribution of ${\rm vec}(Z(\hat{M}_{k}-\hat{M}_{l})Z^{T})$ , we estimate the distribution of the test statistic $T_{kl}=\|\hat{M}_{k}-\hat{M}_{l}\|_{F}=\|Z(\hat{M}_{k}-\hat{M}_{l})Z^{T}\|_{F}$ by drawing samples from $\mathcal{N}(\bm{0},\hat{\Sigma}^{k}+\hat{\Sigma}^{l})$ and calculating the empirical distribution of the Frobenius norm, where $\hat{\Sigma}^{k}$ and $\hat{\Sigma}^{l}$ denote the estimated covariance matrices defined in Section 5.2. The above procedure to determine the distribution of the test statistic is denoted by SCCE.

Experiment 1.

The distribution of the test statistic $T_{kl}$ enables the testing of the individual hypothesis $H_{kl,0}$ . In this experiment, we evaluate the power of the test statistic $T_{kl}$ in testing $H_{kl,0}$ for a given pair $k\neq l$ . Both the multi-layer SBM and the multi-layer DCSBM are considered. We fix $L=50$ , $n=300$ and $K=3$ . The number of nodes in three communities is assigned according to $(0.4,0.3,0.3)$ . The overall edge density $\rho$ is fixed at 0.2. The network generation processes of both models are similar with that in Section 5.2 except the definitions of $B_{l}$ ’s. In this experiment, we set $B_{l}=\rho B^{(1)}$ for $l\in\{3,\ldots,L/2+1\}$ , and $B_{l}=\rho B^{(2)}$ for $l\in\{L/2+2,\ldots,L\}$ , where $B^{(1)}$ and $B^{(2)}$ are defined in Section 5.1. For the first layer $B_{1}=\rho B^{(1)}$ , and for the second layer $B_{2}=\rho B^{(2^{\prime})}$ , the $B^{(2^{\prime})}$ is defined to be the same as $B^{(1)}$ except for the first entry $B^{(2^{\prime})}_{11}$ . We vary $B^{(2^{\prime})}_{11}$ to obtain a sequence of $B_{2}$ matrices. Under these parameter settings, we test the individual hypothesis $H_{12,0}:M_{1}=M_{2}$ at a specified significance level.

Figure 2 shows the averaged empirical power to reject the individual hypothesis $H_{12,0}$ at a 0.05 significance level with $\|B_{1}-B_{2}\|_{F}$ increasing over 100 replications. In our comparisons, we compare SCCE with a counterpart SCCE-Oracle, which differs from SCCE primarily in that it does not use the asymptotic distribution of ${\rm vec}(Z(\hat{M}_{k}-\hat{M}_{l})Z^{T})$ . Instead, SCCE-Oracle generates various samples from the population to obtain various $\hat{M}_{k}-\hat{M}_{l}$ and uses the empirical distribution to approximate the real distribution of the test statistic $T_{kl}$ . Another comparison method, MASE, is defined similarly with SCCE except that we use the estimator $\hat{M}_{l}$ and the asymptotic distribution derived in Arroyo et al. (2021). The results in Figure 2 show that the proposed method SCCE has power close to that of SCCE-Oracle, demonstrating the accuracy of the asymptotic distributions. In addition, the proposed method SCCE has higher power than MASE under both network models, showing the efficacy of the proposed method in hypothesis testing.

Experiment 2.

We shall now consider the simultaneous testing of the $\binom{L}{2}$ hypotheses (5) by means of a Holm type step-down procedure for inferring homogeneity of multi-layer networks. For this purpose, we consider the multi-layer SBM with two distinct connectivity matrices, where all connectivity matrices constitute a partition as follows:

M_{1}=M_{2}=\cdots=M_{L/2};\quad\quad M_{L/2+1}=M_{L/2+2}=\cdots=M_{L}.

All pairs within a set of the partition are equal, and two $M_{l}$ ’s in different partitions are unequal. It is clear that the overall network is heterogeneous, while the network layers within each partition exhibit homogeneity. Specifically, we set $B_{l}=\rho B^{(1)}$ for $l\in\{1,\ldots,L/2\}$ and $B_{l}=\rho B^{(2)}$ for $l\in\{L/2+1,\ldots,L\}$ , with

B^{(1)}=U\begin{bmatrix}1&0&0\\ 0&0.4&0\\ 0&0&0.1\end{bmatrix}U^{T}\approx\begin{bmatrix}0.4&0.3&0.212\\ 0.3&0.4&0.212\\ 0.212&0.212&0.7\end{bmatrix}

and

B^{(2)}=U\begin{bmatrix}1&0&0\\ 0&0.4&0\\ 0&0&-0.1\end{bmatrix}U^{T}\approx\begin{bmatrix}0.3&0.4&0.212\\ 0.4&0.3&0.212\\ 0.212&0.212&0.7\end{bmatrix},

where $U$ is defined as in (4). We set $L=20$ , $n=500$ and $K=3$ , with the number of nodes in three communities proportioned at $(0.4,0.3,0.3)$ . The overall edge density $\rho$ is fixed at 0.2. Recall the definition of $M_{l}$ , the difference in $M_{l}$ between different partitions is now given by $\rho\|B^{(1)}-B^{(2)}\|_{F}=0.04$ .

Figure 3 presents the outcomes of the multiple comparisons conducted using a Holm type step-down procedure, where olive green signifies acceptance and white signifies rejection of the individual hypotheses. We control the Holm procedure to ensure that the family-wise error rate is no bigger than $\alpha=0.05$ . In the results of the proposed method SCCE, two distinct blocks are observed within which all individual hypotheses are simultaneously accepted, demonstrating the presence of homogeneity within these blocks. This is consistent with our experimental setup. However, MASE cannot accurately infer the true homogeneity. Moreover, in cases where the true hypothesis is false, the proposed method SCCE almost always rejects them.

6 Real data analysis

In this section, we use the proposed method SCCE to measure the homogeneity across all layers of a real multi-layer network, specifically the WFAT dataset (De Domenico et al., 2015). For this purpose, we simultaneously test the $\binom{L}{2}$ hypotheses

H_{kl,0}:M_{k}=M_{l},\quad\quad 1\leq k<l\leq L,

using a Holm type step-down procedure. Next, we provide the data description and the results for the real dataset.

The WFAT dataset, sourced from the Food and Agriculture Organization (FAO) of the United Nations, contains annual trade records for over 400 food and agricultural products across all countries globally. Our analysis focuses specifically on 2020 trade data for cereals, a category defined by the FAO and detailed in Table A3. We constructed a multi-layer network using this data. Each layer represents a different type of cereal, with nodes representing countries and edges within each layer representing the trade relationships with respect a specific cereal. In particular, a trade link between countries is established if the trade value for a cereal exceeds $\$2000$ . To ensure network connectivity, we excluded countries with a total degree over 37 layers less than 23. This guarantees that each node is linked to at least one other node in at least half of the layers. The resulting network consists of 37 layers, each containing 114 common nodes, denoted by $A_{l},l\in\{1,\ldots,37\}$ . The five continents, including America, Africa, Asia, Europe, and Oceania, suggest the choice $K=5$ (Noroozi and Pensky, 2024).

Figure 4 shows the results of the multiple comparisons conducted using a Holm type step-down procedure, where olive green indicates the acceptance and white denotes the rejection of the individual hypotheses. We control the Holm procedure to ensure that the family-wise error rate is no greater than $\alpha=0.05$ . Additionally, the $p$ -value matrix for individual hypothesis tests across all network layers is shown. In Figure 4(a), several distinct blocks are observed where all individual hypotheses within each block are simultaneously accepted, demonstrating the presence of homogeneity within the multi-layer network. For instance, layers 5 to 14, corresponding to ‘Germ of maize’, ‘Triticale’, ‘Cereals n.e.c.’, ‘Millet’, ‘Sorghum’, ‘Bran of maize’, ‘Bran of cereals n.e.c.’, ‘Flour of mixed grain’, ‘Flour of rice’, and ‘Rye’, exhibit no rejections of the individual hypotheses in any pairwise comparisons. This outcome suggests a significant similarity in trade patterns among these cereal products. Notably, most of these cereals are consumed primarily by Asia, which might lead to simultaneous acceptance of the individual hypotheses. This observation suggests the need for an integrated analysis of the trade patterns of these cereals.

7 Conclusion

In this paper, we focused on the asymptotic normality in the estimation of the scaled connectivity matrix under multi-layer SBMs and multi-layer DCSBMs. To this end, we first leveraged the leading eigenvectors of the bias-adjusted sum of squares of the adjacency matrices to estimate the common eigenspaces across layers (Lei and Lin, 2023). Building on this, we further derived efficient estimates for the layer-wise scaled connectivity matrices. These estimates are shown to satisfy asymptotic normality under mild additional assumptions. To corroborate the theoretical results, we conducted a series of simulations to apply the derived asymptotic normality to statistical inference tasks, including interval estimation and hypothesis testing. Finally, we applied the method to a real-world network dataset to infer the population homogeneity within the multi-layer network.

There are many ways to extend the content of this paper. First, we assume the connectivity matrices are different among layers. However, in practice, the network layers are often grouped such that connectivity matrices are the same within each group (Jing et al., 2021; Fan et al., 2022a). It is thus meaningful to make use of all the network layers within each group to estimate the connectivity matrices, which would lead to improved estimation accuracy and a weakened requirement for network sparsity. Second, we focused on the undirected multi-layer networks. It is of great importance to extend the method and theory to multi-layer directed networks (Zheng and Tang, 2022; Su et al., 2024). Third, we assumed the number of communities is known and fixed. In the future, it is important to study the estimation of the number of communities under multi-layer SBMs.

Appendix

Appendix A.1 provides the technical lemmas that are needed to prove the central limit theorems. Appendix A.2 includes the proof of main theorems in the main text. Appendix A.3 contains auxiliary lemmas. Appendix A.4 provides the additional results for real data analysis.

A.1 Technical lemmas

Lemma A1.

Suppose the multi-layer SBM satisfies Assumptions 1 and 2. Let $\hat{U}$ denotes the leading eigenvectors of $\sum_{l=1}^{L}(A_{l}^{2}-D_{l})$ . If $n\rho\geq c_{1}\log(L+n)$ for some positive constant $c_{1}$ , then

\displaystyle\|\hat{U}-UZ\|_{F}\leq c_{2}\left(\frac{1}{n}+\frac{\log^{1/2}(L+n)}{L^{1/2}\rho^{1/2}n^{1/2}}\right)

with probability at least $1-O((L+n)^{-1})$ for some constant $c_{2}>0$ .

Proof.

The bias-adjusted sum-of-squares of the adjacency matrices $\sum_{l=1}^{L}(A_{l}^{2}-D_{l})$ can be decomposed into a signal term and for noise terms

\sum_{l=1}^{L}\left(A_{l}^{2}-D_{l}\right)=\sum_{l=1}^{L}Q_{l}^{2}+N_{1}+N_{2}+N_{3}+N_{4},

where $N_{1}=\mbox{diag}^{2}(Q_{l})-Q_{l}\mbox{diag}(Q_{l})-\mbox{diag}(Q_{l})Q_{l}$ , $N_{2}=\sum_{l=1}^{L}\left(P_{l}G_{l}^{T}+G_{l}P_{l}^{T}\right)$ , $N_{3}=\sum_{l=1}^{L}G_{l}^{2}-\mbox{diag}(\sum_{l=1}^{L}G_{l}^{2})$ , and $N_{4}=\mbox{diag}(\sum_{l=1}^{L}G_{l}^{2})-\sum_{l=1}^{L}D_{l}$ . Recall that in the context of multi-layer SBMs, $Q_{l}=\rho\Theta B_{l}\Theta^{T}$ , $P_{l}=Q_{l}-\mbox{diag}(Q_{l})$ and $G_{l}=A_{l}-P_{l}$ .

According to Theorem 1 of Lei and Lin (2023), the minimum eigenvalue of the signal term $\sum_{l=1}^{L}Q_{l}^{2}$ is lower bounded by $cL\rho^{2}n^{2}$ for some positive constant $c$ . Note that we use $c$ to represent the generic constant and it may be different from line to line. We proceed to establish upper bounds for the spectral norms of all noise terms. The first noise term satisfies $\|N_{1}\|_{2}\leq cLn\rho^{2}.$ Regarding $N_{2}$ , under the condition $n\rho\geq c_{1}\log(L+n)$ , we have $\|N_{2}\|_{2}\leq cL^{1/2}n^{3/2}\rho^{3/2}\log^{1/2}(n+L)$ with high probability. For $N_{3}$ , by Theorem 4 of Lei and Lin (2023), under the condition $n\rho\geq c_{1}\log(L+n)$ , we have $\|N_{3}\|_{2}\leq cL^{1/2}n\rho\log(L+n)$ with probability at least $1-O((L+n)^{-1})$ . Finally, for $N_{4}$ , given $(\sum_{l=1}^{L}G_{l}^{2})_{ii}\leq Ln\rho^{2}+D_{l,ii}$ for all $i\in[m]$ , it follows that $\|N_{4}\|_{2}\leq Ln\rho^{2}$ .

Integrating the bounds of all terms, we have with high probability,

\frac{\|N_{1}+N_{2}+N_{3}+N_{4}\|_{2}}{\lambda_{K}(\sum_{l=1}^{L}Q_{l}^{2})}\leq c\frac{Ln\rho^{2}+L^{1/2}n^{3/2}\rho^{3/2}\log^{1/2}(L+n)}{L\rho^{2}n^{2}}.

Let $\widehat{U}$ be the $n\times K$ matrices that contain the leading eigenvectors of $\sum_{l=1}^{L}\left(A_{l}^{2}-D_{l}\right)$ . In accordance with Proposition 2.2 of Vu and Lei (2013) and the Davis-Kahan sin $\Theta$ theorem (Theorem VII.3.1 of Bhatia (1997)), there exists a $K\times K$ orthogonal matrix $Z$ such that

\|\widehat{U}-UZ\|_{F}\leq\sqrt{K}\|\widehat{U}-UZ\|_{2}\leq c\left(\frac{1}{n}+\frac{\log^{1/2}(L+n)}{L^{1/2}\rho^{1/2}n^{1/2}}\right).

The proof is completed. ∎

Lemma A2.

Suppose the multi-layer DCSBM satisfies Assumptions E1 and 2. Let $\hat{U}$ denotes the leading eigenvectors of $\sum_{l=1}^{L}(A_{l}^{2}-D_{l})$ . If $n\rho\geq c_{1}\log(L+n)$ for some positive constant $c_{1}$ , then

\|\hat{U}-UZ\|_{F}\leq c_{2}\left(\frac{n}{\|\psi\|_{2}^{4}}+\frac{n^{3/2}\log^{1/2}(L+n)}{L^{1/2}\rho^{1/2}\|\psi\|_{2}^{4}}\right)

with probability at least $1-O((L+n)^{-1})$ for some positive constant $c_{2}$ .

Proof.

Following the approach used in the proof of Lemma A1, we separately bound the signal term and the noise terms. Recall that in the context of multi-layer degree-corrected SBMs, $P_{l}=Q_{l}-\mbox{diag}(Q_{l})$ , $Q_{l}=\rho\mbox{diag}(\psi)\Theta B_{l}\Theta^{T}\mbox{diag}(\psi)$ and $G_{l}=A_{l}-P_{l}$ . Given the balanced effective community sizes assumption and the fixed number of communities, the minimum eigenvalue of $\Psi$ is lower bounded by $c\|\psi\|_{2}^{2}$ for some positive constant $c$ . Then we have

$\displaystyle\sum_{l=1}^{L}Q_{l}^{2}$	$\displaystyle=$	$\displaystyle\rho^{2}\tilde{\Theta}\Psi^{1/2}\left[\sum_{l=1}^{L}B_{l}\Psi B_{l}\right]\Psi^{1/2}\tilde{\Theta}^{T}$
	$\displaystyle\succeq$	$\displaystyle c\rho^{2}\\|\psi\\|_{2}^{2}\tilde{\Theta}\Psi^{1/2}\sum_{l=1}^{L}B_{l}^{2}\Psi^{1/2}\tilde{\Theta}^{T}$
	$\displaystyle\succeq$	$\displaystyle cL\rho^{2}\\|\psi\\|_{2}^{4}\tilde{\Theta}\tilde{\Theta}^{T},$

where $\succeq$ denotes the Loewner partial order, in particular, let $A$ and $B$ be two Hermitian matrices of order $p$ , we say that $A\succeq B$ if $A-B$ is positive semi-definite. The latter $\succeq$ is derived from Assumption 2. Recall that $\tilde{\Theta}$ is an orthogonal column matrix, the smallest eigenvalue of $\sum_{l=1}^{L}Q_{l}^{2}$ is lower bounded by $cL\rho^{2}\|\psi\|_{2}^{4}$ . For the bounds of the spectral norms of all noise terms, they are consistent with those established in Lemma A1.

Combining the bounds of all terms, Proposition 2.2 of Vu and Lei (2013) and the Davis-Kahan sin $\Theta$ theorem, there exists a $K\times K$ orthogonal matrix $Z$ such that

\|\widehat{U}-UZ\|_{F}\leq\sqrt{K}\|\widehat{U}-UZ\|_{2}\leq c\frac{Ln\rho^{2}+L^{1/2}n^{3/2}\rho^{3/2}\log^{1/2}(L+n)}{L\rho^{2}\|\psi\|_{2}^{4}}

holds with high probability. The proof is completed. ∎

A.2 Main proofs

Proof of Theorem 1

Recall that $Q_{l}=\rho\Theta B_{l}\Theta^{T}$ and the noise matrix $G_{l}=A_{l}-P_{l}$ , where $P_{l}=Q_{l}-\text{diag}(Q_{l})$ for all $l\in[L]$ . Then the estimator $\hat{M_{l}}$ can be rearranged as

$\displaystyle\hat{M_{l}}$	$\displaystyle=$	$\displaystyle\hat{U}^{T}Q_{l}\hat{U}+\hat{U}^{T}G_{l}\hat{U}-\hat{U}^{T}{\rm diag}(Q_{l})\hat{U}$	(A1)
	$\displaystyle=$	$\displaystyle\hat{U}^{T}UM_{l}U^{T}\hat{U}+\hat{U}^{T}G_{l}\hat{U}-\hat{U}^{T}{\rm diag}(Q_{l})\hat{U}$
	$\displaystyle=$	$\displaystyle(\hat{U}^{T}U-Z^{T})M_{l}(U^{T}\hat{U})+Z^{T}M_{l}(U^{T}\hat{U}-Z)+Z^{T}M_{l}Z$
		$\displaystyle+\leavevmode\nobreak\ (\hat{U}^{T}-Z^{T}U^{T})G_{l}(\hat{U}-UZ)+Z^{T}U^{T}G_{l}(\hat{U}-UZ)-(Z^{T}U^{T}-\hat{U}^{T})G_{l}UZ$
		$\displaystyle+\leavevmode\nobreak\ Z^{T}U^{T}G_{l}UZ-\hat{U}^{T}{\rm diag}(Q_{l})\hat{U}$
	$\displaystyle=$	$\displaystyle Z^{T}M_{l}Z+Z^{T}U^{T}G_{l}UZ+E_{l},$

where $E_{l}$ in the final equality represents all terms from the penultimate equality excluding $Z^{T}M_{l}Z$ and $Z^{T}U^{T}G_{l}UZ$ . Here, $Z=\operatorname*{arg\,inf}\limits_{Z\in\mathbb{O}(K)}\|U^{T}\hat{U}-Z\|_{F}$ is a $K\times K$ orthogonal matrix. Thus,

Z\hat{M_{l}}Z^{T}-M_{l}-ZE_{l}Z^{T}=U^{T}G_{l}U.

(A2)

We initiate our argument by establishing that $ZE_{l}Z^{T}$ tends towards zero as the number of nodes and layers increase, and the edge density decreases. We accomplish this by exerting control over each term in $E_{l}$ .

$\displaystyle\\|E_{l}\\|_{F}$	$\displaystyle\leq$	$\displaystyle\\|(\hat{U}^{T}U-Z^{T})M_{l}(U^{T}\hat{U})\\|_{F}+\\|Z^{T}M_{l}(U^{T}\hat{U}-Z)\\|_{F}$
		$\displaystyle+\leavevmode\nobreak\ \\|(\hat{U}^{T}-Z^{T}U^{T})G_{l}(\hat{U}-UZ)\\|_{F}+2\\|Z^{T}U^{T}G_{l}(\hat{U}-UZ)\\|_{F}$
		$\displaystyle+\leavevmode\nobreak\ \\|\hat{U}^{T}{\rm diag}(Q_{l})\hat{U}\\|_{F}$
	$\displaystyle\leq$	$\displaystyle\\|\hat{U}^{T}U-Z^{T}\\|_{F}\\|M_{l}\\|_{2}+\\|(U^{T}\hat{U}-Z)\\|_{F}\\|M_{l}\\|_{2}$
		$\displaystyle+\leavevmode\nobreak\ \\|\hat{U}-UZ\\|_{F}^{2}\\|G_{l}\\|_{2}+2\\|\hat{U}-UZ\\|_{F}\\|G_{l}\\|_{2}+\\|{\rm diag}(Q_{l})\\|_{2}\\|\hat{U}\\|_{F}$
	$\displaystyle=$	$\displaystyle 4(\\|M_{l}\\|_{2}+\\|G_{l}\\|_{2})\\|\hat{U}-UZ\\|_{F}^{2}+2\\|\hat{U}-UZ\\|_{F}\\|G_{l}\\|_{2}+\sqrt{K}\\|{\rm diag}(Q_{l})\\|_{2}.$

The upper bound of $\|\hat{U}-UZ\|_{F}$ is provided in Lemma A1. With regards to the term $M_{l}$ , it holds that $\|M_{l}\|_{2}=\|UM_{l}U^{T}\|_{2}=\|Q_{l}\|_{2}$ . For $G_{l}$ , applying Lemma A3, we have

\|G_{l}\|_{2}\leq c\sqrt{\|Q_{l}\|_{1,\infty}}

with probability at least $1-O(n^{-1})$ . Note that we use $c$ and $c^{\prime}$ to represent the generic positive constants and it may be different from line to line. Consequently, it can be deduced that

	$\displaystyle\\|ZE_{l}Z^{T}\\|_{F}$	$\displaystyle\leq$	$\displaystyle c(\\|Q_{l}\\|_{2}+\sqrt{\\|Q_{l}\\|_{1,\infty}})\left(\frac{1}{n^{2}}+\frac{\log(L+n)}{Ln\rho}\right)$
			$\displaystyle\leavevmode\nobreak\ +c\sqrt{\\|Q_{l}\\|_{1,\infty}}\left(\frac{1}{n}+\frac{\log^{1/2}(L+n)}{L^{1/2}n^{1/2}\rho^{1/2}}\right)+\sqrt{K}\rho$

with high probability. Additionally, given that every element of $Q_{l}$ is at most $\rho$ , both $\|Q_{l}\|_{2}$ and $\|Q_{l}\|_{1,\infty}$ do not exceed $n\rho$ . Thus, we can deduce that $\|ZE_{l}Z^{T}\|_{F}\leq c\left(\rho+\frac{\log^{1/2}(L+n)}{L^{1/2}}\right)$ . As the number of layers $L$ increase, and the overall edge probability decrease, $ZE_{l}Z^{T}$ tends to vanish.

Consequently, demonstrating the central limit theorem can be reduced to verifying the asymptotic normality of $U^{T}G_{l}U$ . The covariance of any pair $(U^{T}G_{l}U)_{s,t}$ and $(U^{T}G_{l}U)_{s^{\prime},t^{\prime}}$ , where $1\leq s\leq t\leq K$ and $1\leq s^{\prime}\leq t^{\prime}\leq K$ , is

			$\displaystyle{\rm Cov}\left((U^{T}G_{l}U)_{s,t},\;(U^{T}G_{l}U)_{s^{\prime},t^{\prime}}\right)$
		$\displaystyle=$	$\displaystyle{\rm Cov}\left(\sum_{1\leq i,j\leq n}U_{is}G_{l,ij}U_{jt},\;\sum_{1\leq i,j\leq n}U_{is^{\prime}}G_{l,ij}U_{jt^{\prime}}\right)$
		$\displaystyle=$	$\displaystyle\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(U_{is}U_{jt}+U_{js}U_{it})\;(U_{is^{\prime}}U_{jt^{\prime}}+U_{js^{\prime}}U_{it^{\prime}})\;{\rm Cov}\left(G_{l,ij},\;G_{l,ij}\right)$
		$\displaystyle=$	$\displaystyle\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(U_{is}U_{jt}+U_{js}U_{it})\;(U_{is^{\prime}}U_{jt^{\prime}}+U_{js^{\prime}}U_{it^{\prime}})\;Q_{l,ij}(1-Q_{l,ij}).$

That is the $\Sigma_{\frac{2s+t(t-1)}{2},\frac{2s^{\prime}+t^{\prime}(t^{\prime}-1)}{2}}^{l}$ term defined in (3). Regarding the expectation term, given that $\mathbb{E}(G_{l})=0$ , it is straightforward to see that each element of $U^{T}G_{l}U$ has a zero-mean, as each element sums terms with zero-mean. In other words, we have $\mathbb{E}\left((U^{T}G_{l}U)_{s,t}\right)=0$ for all $1\leq s\leq t\leq K$ .

Part (a): Note that

(U^{T}G_{l}U)_{s,t}=\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(U_{is}U_{jt}+U_{js}U_{it})\;G_{l,ij}:=\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}F_{l,ij},

(A3)

where we let $F_{l,ij}$ denote $(U_{is}U_{jt}+U_{js}U_{it})\;G_{l,ij}$ . Given $l$ , it is clear that the $F_{l,ij}$ terms are zero-mean and independent of each other for all $1\leq i<j\leq n$ . By the definition of $U$ and according to Assumption 1, we have $|F_{l,ij}|\leq c^{\prime}/n$ for all $l\in[L]$ and $1\leq i<j\leq n$ , where $c^{\prime}$ is some positive constant. To affirm the Lindeberg condition, we proceed by calculating the sum of ${\rm Var}(F_{l,ij})$ for all $1\leq i<j\leq n$ . Given Assumption 3 and the premise of balanced community sizes, it follows that ${\rm Var}\left((U^{T}G_{l}U)_{s,t}\right)=\omega(1/n^{2})$ . Therefore, for any $\varepsilon>0$ and sufficiently large $n$ , we can state that $|F_{l,ij}|<\varepsilon\left[{\rm Var}\left((U^{T}G_{l}U)_{s,t}\right)\right]^{1/2}$ . This leads to the Lindeberg condition as given by

\lim\limits_{n\rightarrow\infty}\frac{1}{{\rm Var}\left((U^{T}G_{l}U)_{s,t}\right)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\mathbb{E}\left(|F_{l,ij}|^{2}\mathbb{I}_{\{|F_{l,ij}|>\varepsilon[{\rm Var}((U^{T}G_{l}U)_{s,t})]^{1/2}\}}\right)=0

(A4)

is satisfied. It should be emphasised that here $U$ , $G_{l}$ and $F_{l,ij}$ are all related to $n$ . For simplicity, we have not notated the $n$ display. By the Lindeberg-Feller Central Limit Theorem, we have

\left(\Sigma_{\frac{2s+t(t-1)}{2},\frac{2s+t(t-1)}{2}}^{l}\right)^{-1/2}(U^{T}G_{l}U)_{s,t}\rightarrow_{d}\mathcal{N}(0,1)

for each $l\in[L]$ . The claim follows by combining the above discussion with (A2).

Part (b): To complete the central limit theorem for vectorlized $U^{T}G_{l}U$ , we first show that this term can be expressed as a summation of independent zero-mean random variables. Let $\widetilde{\Sigma}^{l}:=(\Sigma^{l})^{-1/2}$ and consider an arbitrary vector $x\in\mathbb{R}^{K(K+1)/2}$ .

	$\displaystyle x^{T}(\Sigma^{l})^{-1/2}{\rm vec}(U^{T}G_{l}U)$	(A5)
$\displaystyle=$	$\displaystyle\sum_{1\leq s\leq t\leq K}x_{\frac{2s+t(t-1)}{2}}\sum_{1\leq s^{\prime}\leq t^{\prime}\leq K}\tilde{\Sigma}_{\frac{2s+t(t-1)}{2},\frac{2s^{\prime}+t^{\prime}(t^{\prime}-1)}{2}}^{l}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(U_{is^{\prime}}U_{jt^{\prime}}+U_{js^{\prime}}U_{it^{\prime}})\;G_{l,ij}$
$\displaystyle=$	$\displaystyle\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left\{G_{l,ij}\sum_{1\leq s\leq t\leq K}x_{\frac{2s+t(t-1)}{2}}\sum_{1\leq s^{\prime}\leq t^{\prime}\leq K}\tilde{\Sigma}_{\frac{2s+t(t-1)}{2},\frac{2s^{\prime}+t^{\prime}(t^{\prime}-1)}{2}}^{l}(U_{is^{\prime}}U_{jt^{\prime}}+U_{js^{\prime}}U_{it^{\prime}})\right\}$
$\displaystyle:=$	$\displaystyle\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\bar{F}_{l,ij},$

where we let $\bar{F}_{l,ij}$ denote the portion enclosed in braces in the second equality, and the first equality follows from (A3). Given $l$ , it is obvious that the $\bar{F}_{l,ij}$ terms are zero-mean and independent of each other for all $1\leq i<j\leq n$ . In accordance with Assumption 4, we have

$\displaystyle\|\bar{F}_{l,ij}\|$	$\displaystyle\leq$	$\displaystyle\frac{c^{\prime}}{n}\sum_{1\leq s\leq t\leq K}x_{\frac{2s+t(t-1)}{2}}\sum_{1\leq s^{\prime}\leq t^{\prime}\leq K}\left\|\tilde{\Sigma}_{\frac{2s+t(t-1)}{2},\frac{2s^{\prime}+t^{\prime}(t^{\prime}-1)}{2}}^{l}\right\|$
	$\displaystyle\leq$	$\displaystyle\frac{c^{\prime}}{n}\frac{K(K+1)}{2}\\|x\\|_{2}\left\{\sum_{\begin{subarray}{c}1\leq s\leq t\leq K\\[3.0pt] 1\leq s^{\prime}\leq t^{\prime}\leq K\end{subarray}}\left(\tilde{\Sigma}_{\frac{2s+t(t-1)}{2},\frac{2s^{\prime}+t^{\prime}(t^{\prime}-1)}{2}}^{l}\right)^{2}\right\}^{1/2}$
	$\displaystyle\leq$	$\displaystyle\frac{c^{\prime}}{n}K^{3}\\|x\\|_{2}\lambda_{\max}(\widetilde{\Sigma}^{l})$
	$\displaystyle\leq$	$\displaystyle\frac{c^{\prime}K^{3}\\|x\\|_{2}}{n\sqrt{\lambda_{\min}(\Sigma^{l})}}=o(\\|x_{2}\\|_{2})$

for all $l\in[L]$ and $1\leq i<j\leq n$ . To affirm the Lindeberg condition, we proceed by calculating the sum of ${\rm Var}(\bar{F}_{l,ij})$ for all $1\leq i<j\leq n$ . By a simple calculation, we have ${\rm Var}\left(x^{T}(\Sigma^{l})^{-1/2}{\rm vec}(U^{T}G_{l}U)\right)=\|x\|_{2}^{2}$ . Therefore, for any $\varepsilon>0$ and sufficiently large $n$ , it holds true that $|\bar{F}_{l,ij}|<\varepsilon\left[{\rm Var}\left(x^{T}(\Sigma^{l})^{-1/2}{\rm vec}(U^{T}G_{l}U)\right)\right]^{1/2}$ . This subsequently satisfies the Lindeberg condition as given by

\lim\limits_{n\rightarrow\infty}\frac{1}{\|x\|_{2}^{2}}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\mathbb{E}\left(|\bar{F}_{l,ij}|^{2}\mathbb{I}_{\{|\bar{F}_{l,ij}|>\varepsilon[{\rm Var}(x^{T}(\Sigma^{l})^{-1/2}{\rm vec}(U^{T}G_{l}U))]^{1/2}\}}\right)=0.

We should also emphasize that $U$ , $G_{l}$ , and $\bar{F}_{l,ij}$ are all dependent on $n$ . For the sake of simplicity, we have not explicitly displayed the dependence on $n$ in our notation. By the Lindeberg-Feller Central Limit Theorem and the Cramér-Wold theorem (Thoerem 3.9.5 of Durrett (2010)), we have

\left(\Sigma^{l}\right)^{-1/2}{\rm vec}\left(U^{T}G_{l}U\right)\rightarrow_{d}\mathcal{N}(\bm{0},\bm{I})

for each $l\in[L]$ . Here, $\mathcal{N}(\bm{0},\bm{I})$ is a $K(K+1)/2$ dimensional standard normal distribution. The claim follows by combining the above discussion with (A2). $\hfill\qed$

Proof of Theorem 2

Consistent with the proof of Theorem 1, a random matrix $E_{l}$ exists such that A2 holds. According to Theorem A2, the bias term $E_{l}$ satisfies

	$\displaystyle\\|ZE_{l}Z^{T}\\|_{F}$	$\displaystyle\leq$	$\displaystyle c(\\|Q_{l}\\|_{2}+\sqrt{\\|Q_{l}\\|_{1,\infty}})\left(\frac{n^{2}}{\\|\psi\\|_{2}^{8}}+\frac{n^{3}\log(L+n)}{L\rho\\|\psi\\|_{2}^{8}}\right)$
			$\displaystyle\leavevmode\nobreak\ +c\sqrt{\\|Q_{l}\\|_{1,\infty}}\left(\frac{n}{\\|\psi\\|_{2}^{4}}+\frac{n^{3/2}\log^{1/2}(L+n)}{L^{1/2}\rho^{1/2}\\|\psi\\|_{2}^{4}}\right)+\sqrt{K}\rho$

with high probability. Furthermore, given that every element of $Q_{l}$ is at most $\rho$ , neither $\|Q_{l}\|_{2}$ nor $\|Q_{l}\|_{1,\infty}$ exceeds $n\rho$ . Thus, we can deduce that $\|ZE_{l}Z^{T}\|_{F}\leq c\left(\frac{n^{2}\log^{1/2}(L+n)}{L^{1/2}\|\psi\|_{2}^{4}}+\max\{\frac{n^{3/2}\rho^{1/2}}{\|\psi\|_{2}^{4}},\rho\}\right)$ . Under certain setting for $\psi$ , $ZE_{l}Z^{T}$ tends to vanish as the number of layers $L$ increase, and as the overall edge density $\rho$ decrease.

To demonstrate the central limit theorem, it suffices to ascertain the asymptotic normality of $U^{T}G_{l}U$ . For part (a), recall that the decomposition of $(UG_{l}U^{T})_{s,t}$ in A3 for all $1\leq s\leq t\leq K$ . By the definition of $U$ , that is $\tilde{\Theta}$ here, and according to Assumption E1, we have $F_{l,ij}\leq c/\|\psi\|_{2}^{2}$ for all $l\in[L]$ and $1\leq i\leq j\leq n$ . Combing with Assumption E3, the Lindeberg condition in (A4) holds. For part (b), recall that the decomposition of $x^{T}(\Sigma^{l})^{-1/2}{\rm vec}(U^{T}G_{l}U)$ and the definition of $\bar{F}_{l,ij}$ in (A5). Under Assumption E4, it follows that $|\bar{F}_{l,ij}|\leq\frac{c\|x\|_{2}}{\|\psi\|_{2}^{2}\sqrt{\lambda_{\min}(\Sigma^{l})}}=o(\|x\|_{2})$ . The rest proof is similar to that of Theorem 1, we omit it here. $\hfill\qed$

A.3 Auxiliary lemmas

Lemma A3 (Theorem 5.2 of Lei and Rinaldo (2015)).

Let $A$ be the adjacency matrix of a random graph on $n$ nodes in which edges occur independently. Set $\mathbb{E}(A)=P=(p_{ij})$ for ${i,j=1,\ldots,n}$ , and assume that $n\max_{i,j}p_{ij}<d$ for $d>c\log n$ and $c>0$ . Then, for any $r>0$ there exists a constant $C=C(r,c)$ such that

\|A-P\|_{2}\leq C\sqrt{d}

with probability at least $1-n^{-r}$ .

A.4 Additional results for data analysis

Table A3: Categories of cereals as defined by the FAO, with ‘n.e.c.’ representing ‘not elsewhere classified’. Cereal numbers correspond to layers in the WFAT network depicted in Figure 4(b).

1.Malt, whether or not roasted; 2.Rice; 3.Canary seed; 4.Buckwheat; 5.Germ of

maize; 6.Triticale; 7.Cereals n.e.c.; 8.Millet; 9.Sorghum; 10.Bran of maize; 11.Bran

of cereals n.e.c.; 12.Flour of mixed grain; 13.Flour of rice; 14.Rye; 15.Malt extract;

16.Rice, milled (husked); 17.Oats; 18.Cereal preparations; 19.Oats, rolled; 20.Bran of

wheat; 21.Flour of cereals n.e.c.; 22.Rice, broken; 23.Flour of maize; 24.Gluten feed

and meal; 25.Bread; 26.Husked rice; 27.Wheat and meslin flour; 28.Communion

wafers, empty cachets of a kind suitable for pharmaceutical use, sealing wafers, rice

paper and similar products; 29.Barley; 30.Breakfast cereals; 31.Uncooked pasta, not

stuffed or otherwise prepared; 32.Maize (corn); 33.Wheat; 34.Mixes and doughs for

the preparation of bakers’ wares; 35.Food preparations of flour, meal or malt extract;

36.Rice, milled; 37.Pastry

This section presents the additional experimental results that are not shown in the main text. In Section 6, for the WFAT dataset, we initially chose the number of communities as $K=5$ in the multi-layer SBM analysis, corresponding naturally to the geographical division into five continents. Here, we adopt an alternative approach to determine $K$ . By observing an elbow in the scree plot of the top absolute eigenvalues of the sum of squared adjacency matrices at the 4th position, as shown in Figure A5, we select $K=4$ . Figure A6 presents the outcomes of the multiple comparisons using a Holm type step-down procedure, where olive green indicates the acceptance and white denotes the rejection of the individual hypotheses. We control the Holm procedure to ensure that the family-wise error rate is no greater than $\alpha=0.05$ . In this setup, layer $l$ corresponds to the cereal numbered $l$ in Table A3. Similar to Figure 4(a), the outcomes of this multiple comparison procedure also exhibit several distinct blocks where all individual hypotheses are simultaneously accepted within each block. Moreover, the results are largely consistent with those observed when $K=5$ . This strongly suggests that some global cereal trade patterns are similar and also demonstrates the stability of our method.

References

Agterberg and Cape (2023) Agterberg, J. and J. Cape (2023). An overview of asymptotic normality in stochastic blockmodels: Cluster analysis and inference. arXiv preprint arXiv:2305.06353.
Agterberg et al. (2022) Agterberg, J., Z. Lubberts, and J. Arroyo (2022). Joint spectral clustering in multilayer degree-corrected stochastic blockmodels. arXiv preprint arXiv:2212.05053.
Airoldi et al. (2008) Airoldi, E. M., D. Blei, S. Fienberg, and E. Xing (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9(65), 1981–2014.
Arroyo et al. (2021) Arroyo, J., A. Athreya, J. Cape, G. Chen, C. E. Priebe, and J. T. Vogelstein (2021). Inference for multiple heterogeneous networks with a common invariant subspace. Journal of Machine Learning Research 22(142), 1–49.
Bhatia (1997) Bhatia, R. (1997). Matrix analysis. Springer-Verlag, New York.
Bhattacharyya and Chatterjee (2018) Bhattacharyya, S. and S. Chatterjee (2018). Spectral clustering for multiple sparse networks: I. arXiv preprint arXiv:1805.10594.
Bickel et al. (2013) Bickel, P., D. Choi, X. Chang, and H. Zhang (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels1. The Annals of Statistics 41(4), 1922–1943.
De Domenico et al. (2015) De Domenico, M., V. Nicosia, A. Arenas, and V. Latora (2015). Structural reducibility of multilayer networks. Nature Communications 6, 6864.
Dudoit et al. (2008) Dudoit, S., M. J. Van Der Laan, and M. J. van der Laan (2008). Multiple testing procedures with applications to genomics. Springer, New York.
Durrett (2010) Durrett, R. (2010). Probability: theory and examples. Cambridge University Press, New York.
Fan et al. (2022a) Fan, J., Y. Fan, X. Han, and J. Lv (2022a). Asymptotic theory of eigenvectors for random matrices with diverging spikes. Journal of the American Statistical Association 117(538), 996–1009.
Fan et al. (2022b) Fan, J., Y. Fan, X. Han, and J. Lv (2022b). Simple: Statistical inference on membership profiles in large networks. Journal of the Royal Statistical Society Series B: Statistical Methodology 84(2), 630–653.
Fan et al. (2019) Fan, J., D. Wang, K. Wang, and Z. Zhu (2019). Distributed estimation of principal eigenspaces. The Annals of Statistics 47(6), 3009–3031.
Han et al. (2015) Han, Q., K. Xu, and E. Airoldi (2015). Consistent estimation of dynamic and multi-layer block models. In International Conference on Machine Learning, pp. 1511–1520. PMLR.
He et al. (2019) He, Y., A. Knowles, and M. Marcozzi (2019). Local law and complete eigenvector delocalization for supercritical Erdős–Rényi graphs. The Annals of Probability 47(5), 3278 – 3302.
Holland et al. (1983) Holland, P. W., K. B. Laskey, and S. Leinhardt (1983). Stochastic blockmodels: First steps. Social Networks 5(2), 109–137.
Holme and Saramäki (2012) Holme, P. and J. Saramäki (2012). Temporal networks. Physics Reports 519(3), 97–125.
Jin et al. (2023) Jin, J., Z. T. Ke, S. Luo, and M. Wang (2023). Optimal estimation of the number of network communities. Journal of the American Statistical Association 118(543), 2101–2116.
Jing et al. (2021) Jing, B.-Y., T. Li, Z. Lyu, and D. Xia (2021). Community detection on mixture multilayer networks via regularized tensor decomposition. The Annals of Statistics 49(6), 3181–3205.
Karrer and Newman (2011) Karrer, B. and M. E. Newman (2011). Stochastic blockmodels and community structure in networks. Physical Review E 83(1), 016107.
Kivelä et al. (2014) Kivelä, M., A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, and M. A. Porter (2014). Multilayer networks. Journal of Complex Networks 2(3), 203–271.
Lehmann and Romano (2005) Lehmann, E. L. and J. P. Romano (2005). Testing statistical hypotheses. Springer, New York.
Lei and Lin (2023) Lei, J. and K. Z. Lin (2023). Bias-adjusted spectral clustering in multi-layer stochastic block models. Journal of the American Statistical Association 118(544), 2433–2445.
Lei and Rinaldo (2015) Lei, J. and A. Rinaldo (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics 43(1), 215–237.
Mucha et al. (2010) Mucha, P. J., T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science 328(5980), 876–878.
Noble (2009) Noble, W. S. (2009). How does multiple testing correction work? Nature Biotechnology 27(12), 1135–1137.
Noroozi and Pensky (2024) Noroozi, M. and M. Pensky (2024). Sparse subspace clustering in diverse multiplex network model. Journal of Multivariate Analysis 203, 105333.
Paul and Chen (2016) Paul, S. and Y. Chen (2016). Consistent community detection in multi-relational data through restricted multi-layer stochastic blockmodel. Electronic Journal of Statistics 10(2), 3807–3870.
Paul and Chen (2020) Paul, S. and Y. Chen (2020). Spectral and matrix factorization methods for consistent community detection in multi-layer networks. The Annals of Statistics 48(1), 230–250.
Rubin-Delanchy et al. (2022) Rubin-Delanchy, P., J. Cape, M. Tang, and C. E. Priebe (2022). A statistical interpretation of spectral embedding: The generalised random dot product graph. Journal of the Royal Statistical Society Series B: Statistical Methodology 84(4), 1446–1473.
Rudelson and Vershynin (2015) Rudelson, M. and R. Vershynin (2015). Delocalization of eigenvectors of random matrices with independent entries. Duke Mathematical Journal 164(13), 2507–2538.
Su et al. (2020) Su, L., W. Wang, and Y. Zhang (2020). Strong consistency of spectral clustering for stochastic block models. IEEE Transactions on Information Theory 66(1), 324–338.
Su et al. (2024) Su, W., X. Guo, X. Chang, and Y. Yang (2024). Spectral co-clustering in multi-layer directed networks. Computational Statistics & Data Analysis 198, 107987.
Tang et al. (2022) Tang, M., J. Cape, and C. E. Priebe (2022). Asymptotically efficient estimators for stochastic blockmodels: The naive mle, the rank-constrained mle, and the spectral estimator. Bernoulli 28(2), 1049–1073.
Tang and Priebe (2018) Tang, M. and C. E. Priebe (2018). Limit theorems for eigenvectors of the normalized laplacian for random graphs. The Annals of Statistics 46(5), 2360–2415.
Valles-Catala et al. (2016) Valles-Catala, T., F. A. Massucci, R. Guimera, and M. Sales-Pardo (2016). Multilayer stochastic block models reveal the multilayer structure of complex networks. Physical Review X 6(1), 011036.
Vu and Lei (2013) Vu, V. Q. and J. Lei (2013). Minimax sparse principal subspace estimation in high dimensions. The Annals of Statistics 41(6), 2905–2947.
Young and Scheinerman (2007) Young, S. J. and E. R. Scheinerman (2007). Random dot product graph models for social networks. In International Workshop on Algorithms and Models for the Web-Graph, pp. 138–149. Springer.
Zhang et al. (2022) Zhang, H., X. Guo, and X. Chang (2022). Randomized spectral clustering in large-scale stochastic block models. Journal of Computational and Graphical Statistics 31(3), 887–906.
Zheng and Tang (2022) Zheng, R. and M. Tang (2022). Limit results for distributed estimation of invariant subspaces in multiple networks inference and pca. arXiv preprint arXiv:2206.04306.

$\displaystyle\\|E_{l}\\|_{F}$	$\displaystyle\leq$	$\displaystyle\\|(\hat{U}^{T}U-Z^{T})M_{l}(U^{T}\hat{U})\\|_{F}+\\|Z^{T}M_{l}(U^{T}\hat{U}-Z)\\|_{F}$
		$\displaystyle+\leavevmode\nobreak\ \\|(\hat{U}^{T}-Z^{T}U^{T})G_{l}(\hat{U}-UZ)\\|_{F}+2\\|Z^{T}U^{T}G_{l}(\hat{U}-UZ)\\|_{F}$
		$\displaystyle+\leavevmode\nobreak\ \\|\hat{U}^{T}{\rm diag}(Q_{l})\hat{U}\\|_{F}$
	$\displaystyle\leq$	$\displaystyle\\|\hat{U}^{T}U-Z^{T}\\|_{F}\\|M_{l}\\|_{2}+\\|(U^{T}\hat{U}-Z)\\|_{F}\\|M_{l}\\|_{2}$
		$\displaystyle+\leavevmode\nobreak\ \\|\hat{U}-UZ\\|_{F}^{2}\\|G_{l}\\|_{2}+2\\|\hat{U}-UZ\\|_{F}\\|G_{l}\\|_{2}+\\|{\rm diag}(Q_{l})\\|_{2}\\|\hat{U}\\|_{F}$
	$\displaystyle=$	$\displaystyle 4(\\|M_{l}\\|_{2}+\\|G_{l}\\|_{2})\\|\hat{U}-UZ\\|_{F}^{2}+2\\|\hat{U}-UZ\\|_{F}\\|G_{l}\\|_{2}+\sqrt{K}\\|{\rm diag}(Q_{l})\\|_{2}.$

	$\displaystyle\\|ZE_{l}Z^{T}\\|_{F}$	$\displaystyle\leq$	$\displaystyle c(\\|Q_{l}\\|_{2}+\sqrt{\\|Q_{l}\\|_{1,\infty}})\left(\frac{1}{n^{2}}+\frac{\log(L+n)}{Ln\rho}\right)$
			$\displaystyle\leavevmode\nobreak\ +c\sqrt{\\|Q_{l}\\|_{1,\infty}}\left(\frac{1}{n}+\frac{\log^{1/2}(L+n)}{L^{1/2}n^{1/2}\rho^{1/2}}\right)+\sqrt{K}\rho$

$\displaystyle\|\bar{F}_{l,ij}\|$	$\displaystyle\leq$	$\displaystyle\frac{c^{\prime}}{n}\sum_{1\leq s\leq t\leq K}x_{\frac{2s+t(t-1)}{2}}\sum_{1\leq s^{\prime}\leq t^{\prime}\leq K}\left\|\tilde{\Sigma}_{\frac{2s+t(t-1)}{2},\frac{2s^{\prime}+t^{\prime}(t^{\prime}-1)}{2}}^{l}\right\|$
	$\displaystyle\leq$	$\displaystyle\frac{c^{\prime}}{n}\frac{K(K+1)}{2}\\|x\\|_{2}\left\{\sum_{\begin{subarray}{c}1\leq s\leq t\leq K\\[3.0pt] 1\leq s^{\prime}\leq t^{\prime}\leq K\end{subarray}}\left(\tilde{\Sigma}_{\frac{2s+t(t-1)}{2},\frac{2s^{\prime}+t^{\prime}(t^{\prime}-1)}{2}}^{l}\right)^{2}\right\}^{1/2}$
	$\displaystyle\leq$	$\displaystyle\frac{c^{\prime}}{n}K^{3}\\|x\\|_{2}\lambda_{\max}(\widetilde{\Sigma}^{l})$
	$\displaystyle\leq$	$\displaystyle\frac{c^{\prime}K^{3}\\|x\\|_{2}}{n\sqrt{\lambda_{\min}(\Sigma^{l})}}=o(\\|x_{2}\\|_{2})$

	$\displaystyle\\|ZE_{l}Z^{T}\\|_{F}$	$\displaystyle\leq$	$\displaystyle c(\\|Q_{l}\\|_{2}+\sqrt{\\|Q_{l}\\|_{1,\infty}})\left(\frac{n^{2}}{\\|\psi\\|_{2}^{8}}+\frac{n^{3}\log(L+n)}{L\rho\\|\psi\\|_{2}^{8}}\right)$
			$\displaystyle\leavevmode\nobreak\ +c\sqrt{\\|Q_{l}\\|_{1,\infty}}\left(\frac{n}{\\|\psi\\|_{2}^{4}}+\frac{n^{3/2}\log^{1/2}(L+n)}{L^{1/2}\rho^{1/2}\\|\psi\\|_{2}^{4}}\right)+\sqrt{K}\rho$