
Overlapping and nonoverlapping models

Huan Qing (qinghuan@cumt.edu.cn)
School of Mathematics
China University of Mining and Technology
Xuzhou, 221116, P.R. China
Abstract

Consider a directed network with $K_r$ row communities and $K_c$ column communities. Previous works found that modeling directed networks in which all nodes have the overlapping property requires $K_r=K_c$ for identifiability. In this paper, we propose an overlapping and nonoverlapping model (ONM) to study directed networks in which row nodes have the overlapping property while column nodes do not. The proposed model is identifiable when $K_r\leq K_c$. Meanwhile, we provide an identifiable extension of ONM that models directed networks with variation in node degree. Two spectral algorithms with theoretical guarantees of consistent estimation are designed to fit the models. A small-scale numerical study illustrates the algorithms.

Keywords: Community detection, directed networks, spectral clustering, asymptotic analysis, SVD.

1 Introduction

In the study of social networks, various models have been proposed to learn the latent structure of networks. Since the literature on community detection is extremely large, we focus only on identifiable models that are closely relevant to our study. For undirected networks, the Stochastic Blockmodel (SBM) (Holland et al., 1983) is a classical and widely used generative model. The degree-corrected stochastic blockmodel (DCSBM) (Karrer and Newman, 2011) extends SBM by introducing degree heterogeneity. Under SBM and DCSBM, all nodes are pure in the sense that each node belongs to exactly one community. In real networks, however, some nodes may belong to multiple communities; such nodes have the overlapping (also known as mixed membership) property. To model undirected networks whose nodes have the overlapping property, Airoldi et al. (2008) designed the Mixed Membership Stochastic Blockmodel (MMSB). Jin et al. (2017) introduced the degree-corrected mixed membership model (DCMM), which extends MMSB by considering degree heterogeneity. Zhang et al. (2020) designed the OCCAM model, which is in fact equivalent to DCMM. Spectral methods with consistent estimation under the above models are provided in Rohe et al. (2011); Qin and Rohe (2013); Lei and Rinaldo (2015); Joseph and Yu (2016); Jin (2015); Jin et al. (2017); Mao et al. (2020, 2018). For directed networks in which all nodes have the nonoverlapping property, Rohe et al. (2016) proposed the Stochastic co-Blockmodel (ScBM) and its degree-corrected extension DCScBM, where ScBM (DCScBM) extends SBM (DCSBM). ScBM and DCScBM can model nonoverlapping directed networks in which row nodes belong to $K_r$ row communities and column nodes belong to $K_c$ column communities, where $K_r$ may differ from $K_c$. Zhou and Amini (2019); Qing and Wang (2021b) study the consistency of some adjacency-based spectral algorithms under ScBM. Wang et al. (2020) studies the consistency of the spectral method D-SCORE under DCScBM when $K_r=K_c$. Qing and Wang (2021a) designed the directed mixed membership stochastic blockmodel (DiMMSB) as an extension of ScBM and MMSB to model directed networks in which all nodes have the overlapping property; DiMMSB can also be seen as an extension of the two-way blockmodels with Bernoulli distribution of Airoldi et al. (2013). All the above models are identifiable under certain conditions. The identifiability of ScBM and DCScBM holds even when $K_r\neq K_c$, while DiMMSB is identifiable only when $K_r=K_c$. Of course, SBM, DCSBM, MMSB, DCMM, and OCCAM are identifiable with $K_r=K_c$ since they model undirected networks. In all the above models, row nodes and column nodes carry symmetric structural information: they have the nonoverlapping property or the overlapping property simultaneously. As the identifiability of DiMMSB shows, modeling a directed network in which all nodes have the overlapping property requires $K_r=K_c$. Naturally, there is a bridge model between ScBM and DiMMSB that can model a directed network in which row nodes and column nodes have asymmetric structural information, i.e., different overlapping properties. In this paper, we introduce this model and name it the overlapping and nonoverlapping model.

Our contributions in this paper are as follows. We propose an identifiable model for directed networks, the overlapping and nonoverlapping model (ONM for short). ONM allows nodes in a directed network to have different overlapping properties. Without loss of generality, we let row nodes have the overlapping property while column nodes do not. The proposed model is identifiable when $K_r\leq K_c$. Recall that ScBM, which models nonoverlapping directed networks, is identifiable even when $K_r\neq K_c$, while DiMMSB, which models overlapping directed networks, is identifiable only when $K_r=K_c$; this is why we view ONM, whose row nodes have a different overlapping property than its column nodes, as a bridge model between ScBM and DiMMSB. Just as DCScBM extends ScBM, we propose an identifiable overlapping and degree-corrected nonoverlapping model (ODCNM) that extends ONM by considering degree heterogeneity. We construct two spectral algorithms to fit ONM and ODCNM and show, by delicate spectral analysis, that our methods enjoy consistent estimation under mild conditions. In particular, our theoretical results under ODCNM match those under ONM when ODCNM degenerates to ONM.

Notations. We take the following general notations in this paper. For any positive integer $m$, let $[m]:=\{1,2,\ldots,m\}$. For a vector $x$ and fixed $q>0$, $\|x\|_q$ denotes its $l_q$-norm. For a matrix $M$, $M'$ denotes its transpose, $\|M\|$ denotes the spectral norm, $\|M\|_F$ denotes the Frobenius norm, and $\|M\|_{2\rightarrow\infty}$ denotes the maximum $l_2$-norm among the rows of $M$. Let $\sigma_i(M)$ be the $i$-th largest singular value of $M$, and let $\lambda_i(M)$ denote the $i$-th largest eigenvalue of $M$ ordered by magnitude. $M(i,:)$ and $M(:,j)$ denote the $i$-th row and the $j$-th column of $M$, respectively. $M(S_r,:)$ and $M(:,S_c)$ denote the rows and columns of $M$ indexed by the sets $S_r$ and $S_c$, respectively. For any matrix $M$, we write $Y=\mathrm{max}(0,M)$ to mean $Y_{ij}=\mathrm{max}(0,M_{ij})$ for all $i,j$. For any matrix $M\in\mathbb{R}^{m\times m}$, let $\mathrm{diag}(M)$ be the $m\times m$ diagonal matrix whose $i$-th diagonal entry is $M(i,i)$. $\mathbf{1}$ is a column vector with all entries equal to one, and $e_i$ is a column vector whose $i$-th entry is 1 while all other entries are zero.

2 The overlapping and nonoverlapping model

Consider a directed network $\mathcal{N}=(V_r,V_c,E)$, where $V_r=\{1,2,\ldots,n_r\}$ is the set of row nodes, $V_c=\{1,2,\ldots,n_c\}$ is the set of column nodes, and $E$ is the set of edges from row nodes to column nodes. Since the row nodes can be different from the column nodes, we may have $V_r\cap V_c=\varnothing$, where $\varnothing$ denotes the empty set. In this paper, we use the subscripts $r$ and $c$ to distinguish terms for row nodes and column nodes. Let $A\in\{0,1\}^{n_r\times n_c}$ be the bi-adjacency matrix of the directed network $\mathcal{N}$ such that $A(i_r,i_c)=1$ if there is a directed edge from row node $i_r$ to column node $i_c$, and $A(i_r,i_c)=0$ otherwise.

We propose a new block model, which we call the overlapping and nonoverlapping model (ONM for short). ONM models directed networks whose row nodes belong to $K_r$ overlapping row communities while column nodes belong to $K_c$ nonoverlapping column communities.

For row nodes, let $\Pi_r\in\mathbb{R}^{n_r\times K_r}$ be the membership matrix of row nodes such that

$$\Pi_r(i_r,:)\geq 0,\qquad \|\Pi_r(i_r,:)\|_1=1\qquad\mathrm{for~}i_r\in[n_r]. \tag{1}$$

Call row node $i_r$ pure if $\Pi_r(i_r,:)$ is degenerate (i.e., one entry is 1 and the other $K_r-1$ entries are 0) and mixed otherwise. By this definition, row node $i_r$ has mixed membership and may belong to more than one row community, for $i_r\in[n_r]$.

For column nodes, let $\ell$ be the $n_c\times 1$ vector whose $i_c$-th entry $\ell(i_c)=k$ if column node $i_c$ belongs to the $k$-th column community, where $\ell(i_c)$ takes values in $\{1,2,\ldots,K_c\}$ for $i_c\in[n_c]$. Let $\Pi_c\in\mathbb{R}^{n_c\times K_c}$ be the membership matrix of column nodes such that

$$\Pi_c(i_c,k)=1~\mathrm{when~}\ell(i_c)=k\mathrm{~and~}0~\mathrm{otherwise},\quad \|\Pi_c(i_c,:)\|_1=1\qquad\mathrm{for~}i_c\in[n_c],k\in[K_c]. \tag{2}$$

By this definition, column node $i_c$ belongs to exactly one of the $K_c$ column communities for $i_c\in[n_c]$; hence all column nodes are pure.

In this paper, we assume that

$$K_r\leq K_c. \tag{3}$$

Eq (3) is required for the identifiability of ONM.

Let $P\in\mathbb{R}^{K_r\times K_c}$ be the probability matrix (also known as the connectivity matrix) such that

$$0\leq P(k,l)\leq\rho\leq 1\qquad\mathrm{for~}k\in[K_r],l\in[K_c], \tag{4}$$

where $\rho$ controls the network sparsity and is called the sparsity parameter in this paper. For convenience, set $P=\rho\tilde{P}$, where $\tilde{P}(k,l)\in[0,1]$ for $k\in[K_r],l\in[K_c]$ and $\mathrm{max}_{k\in[K_r],l\in[K_c]}\tilde{P}(k,l)=1$ for model identifiability. For all pairs $(i_r,i_c)$ with $i_r\in[n_r],i_c\in[n_c]$, our model assumes that the $A(i_r,i_c)$ are independent Bernoulli random variables satisfying

$$\Omega:=\Pi_rP\Pi'_c,\qquad A(i_r,i_c)\sim\mathrm{Bernoulli}(\Omega(i_r,i_c))\qquad\mathrm{for~}i_r\in[n_r],i_c\in[n_c], \tag{5}$$

where $\Omega=\mathbb{E}[A]$ is called the population adjacency matrix in this paper.

Definition 1

Call model (1)-(5) the Overlapping and Nonoverlapping model (ONM) and denote it by $ONM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c)$.

The following conditions are sufficient for the identifiability of ONM:

  • (I1) $\mathrm{rank}(P)=K_r$, $\mathrm{rank}(\Pi_r)=K_r$, and $\mathrm{rank}(\Pi_c)=K_c$.

  • (I2) There is at least one pure row node for each of the $K_r$ row communities.

Here, $\mathrm{rank}(\Pi_r)=K_r$ means that $\sum_{i_r=1}^{n_r}\Pi_r(i_r,k)>0$ for all $k\in[K_r]$; $\mathrm{rank}(\Pi_c)=K_c$ means that each column community has at least one column node. For $k\in[K_r]$, let $\mathcal{I}^{(k)}_r=\{i\in\{1,2,\ldots,n_r\}:\Pi_r(i,k)=1\}$. By condition (I2), $\mathcal{I}^{(k)}_r$ is nonempty for all $k\in[K_r]$. For each $k\in[K_r]$, select one row node from $\mathcal{I}^{(k)}_r$ to construct the index set $\mathcal{I}_r$; that is, $\mathcal{I}_r$ contains the indices of $K_r$ pure row nodes, one from each row community. Without loss of generality, let $\Pi_r(\mathcal{I}_r,:)=I_{K_r}$ (Lemma 2.1 of Mao et al. (2020) uses a similar convention to design their spectral algorithms under MMSB). The set $\mathcal{I}_c$ is defined similarly for column nodes, so that $\Pi_c(\mathcal{I}_c,:)=I_{K_c}$. The next proposition guarantees that once conditions (I1) and (I2) hold, ONM is identifiable.

Proposition 2

If conditions (I1) and (I2) hold, ONM is identifiable: for eligible $(P,\Pi_r,\Pi_c)$ and $(\check{P},\check{\Pi}_r,\check{\Pi}_c)$, if $\Pi_rP\Pi'_c=\check{\Pi}_r\check{P}\check{\Pi}'_c$, then $P=\check{P}$, $\Pi_r=\check{\Pi}_r$, and $\Pi_c=\check{\Pi}_c$.

ONM differs from previous models for directed networks in the following respects.

  • When all row nodes are pure, our ONM reduces to ScBM with $K_r$ row clusters and $K_c$ column clusters (Rohe et al., 2016). Unlike ScBM, however, ONM allows row nodes to have overlapping memberships. Meanwhile, for model identifiability, ScBM does not require $\mathrm{rank}(P)=K_r$ while ONM does; this can be seen as the price ONM pays for modeling overlapping row nodes.

  • Though DiMMSB (Qing and Wang, 2021a) can model directed networks whose row and column nodes both have overlapping memberships, DiMMSB requires $K_r=K_c$ for model identifiability. In comparison, our ONM allows $K_r\leq K_c$ at the cost of losing the overlapping property of column nodes.

2.1 A spectral algorithm for fitting ONM

The primary goal of the proposed algorithm is to estimate the row membership matrix $\Pi_r$ and the column membership matrix $\Pi_c$ from the observed adjacency matrix $A$ with given $K_r$ and $K_c$.

We now discuss the intuition behind the design of our algorithm to fit ONM. Under conditions (I1) and (I2), basic algebra gives $\mathrm{rank}(\Omega)=K_r$. Let $\Omega=U_r\Lambda U'_c$ be the compact singular value decomposition of $\Omega$, where $U_r\in\mathbb{R}^{n_r\times K_r}$, $\Lambda\in\mathbb{R}^{K_r\times K_r}$, $U_c\in\mathbb{R}^{n_c\times K_r}$, $U'_rU_r=I_{K_r}$, $U'_cU_c=I_{K_r}$, and $I_{K_r}$ is the $K_r\times K_r$ identity matrix. Let $n_{c,k}=|\{i_c:\ell(i_c)=k\}|$ be the size of the $k$-th column community for $k\in[K_c]$, and set $n_{c,\mathrm{max}}=\mathrm{max}_{k\in[K_c]}n_{c,k}$ and $n_{c,\mathrm{min}}=\mathrm{min}_{k\in[K_c]}n_{c,k}$. Meanwhile, without causing confusion, let $n_{c,K_r}$ be the $K_r$-th largest size among all column communities. The following lemma guarantees that $U_r$ enjoys an ideal simplex structure and $U_c$ has $K_c$ distinct rows.

Lemma 3

Under $ONM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c)$, there exist a unique $K_r\times K_r$ matrix $B_r$ and a unique $K_c\times K_r$ matrix $B_c$ such that

  • $U_r=\Pi_rB_r$, where $B_r=U_r(\mathcal{I}_r,:)$. Meanwhile, $U_r(i_r,:)=U_r(\bar{i}_r,:)$ when $\Pi_r(i_r,:)=\Pi_r(\bar{i}_r,:)$ for $i_r,\bar{i}_r\in[n_r]$.

  • $U_c=\Pi_cB_c$. Meanwhile, $U_c(i_c,:)=U_c(\bar{i}_c,:)$ when $\ell(i_c)=\ell(\bar{i}_c)$ for $i_c,\bar{i}_c\in[n_c]$, i.e., $U_c$ has $K_c$ distinct rows. Furthermore, when $K_r=K_c=K$, we have $\|B_c(k,:)-B_c(l,:)\|_F=\sqrt{\frac{1}{n_{c,k}}+\frac{1}{n_{c,l}}}$ for all $1\leq k<l\leq K$.

Lemma 3 says that the rows of $U_r$ form a $K_r$-simplex in $\mathbb{R}^{K_r}$, which we call the Ideal Simplex (IS), with the $K_r$ rows of $B_r$ being its vertices. Such an IS also appears in Jin et al. (2017); Mao et al. (2020); Qing and Wang (2021a). Meanwhile, Lemma 3 says that $U_c$ has $K_c$ distinct rows, and if two column nodes $i_c$ and $\bar{i}_c$ are from the same column community, then $U_c(i_c,:)=U_c(\bar{i}_c,:)$.
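To make Lemma 3 concrete, here is a minimal numpy sketch (our own illustration, not part of the paper's algorithms; the toy sizes, $P$, and memberships below are arbitrary assumptions) that builds a small $\Omega=\Pi_rP\Pi'_c$, takes its compact SVD, and checks both claims: $U_r=\Pi_rB_r$ with the pure rows as simplex vertices, and identical $U_c$ rows within a column community.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_c, K_r, K_c = 8, 9, 2, 3                  # toy sizes (assumed)
P = np.array([[0.9, 0.2, 0.4],                   # K_r x K_c connectivity (assumed)
              [0.1, 0.8, 0.3]])
Pi_r = np.vstack([np.eye(K_r),                   # two pure row nodes, then mixed rows
                  rng.dirichlet(np.ones(K_r), size=n_r - K_r)])
ell = np.repeat(np.arange(K_c), n_c // K_c)      # 3 column nodes per column community
Pi_c = np.eye(K_c)[ell]

Omega = Pi_r @ P @ Pi_c.T                        # population adjacency, rank K_r
U, s, Vt = np.linalg.svd(Omega)
U_r, U_c = U[:, :K_r], Vt[:K_r, :].T             # compact SVD factors

B_r = U_r[:K_r]                                  # rows of the pure row nodes
assert np.allclose(U_r, Pi_r @ B_r)              # Lemma 3: U_r = Pi_r B_r (ideal simplex)
assert np.allclose(U_c[0], U_c[1])               # same column community => same U_c row
assert not np.allclose(U_c[0], U_c[n_c // K_c])  # different communities => distinct rows
```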

Under ONM, to recover $\Pi_c$ from $U_c$: since $U_c$ has $K_c$ distinct rows, applying the k-means algorithm to all rows of $U_c$ returns the true column communities by Lemma 3. Meanwhile, since $U_c$ has $K_c$ distinct rows, we can set $\delta_c=\mathrm{min}_{k\neq l}\|B_c(k,:)-B_c(l,:)\|_F$ to measure the minimum center separation of $B_c$. By Lemma 3, $\delta_c\geq\sqrt{\frac{2}{n_{c,\mathrm{max}}}}$ when $K_r=K_c=K$ under $ONM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c)$. However, when $K_r<K_c$, it is challenging to obtain a positive lower bound on $\delta_c$; see the proof of Lemma 3 for details.

Under ONM, to recover $\Pi_r$ from $U_r$: since $B_r$ is full rank, if $U_r$ and $B_r$ were known in advance, we could exactly recover $\Pi_r$ by setting $\Pi_r=U_rB'_r(B_rB'_r)^{-1}$ by Lemma 3. Setting $Y_r=U_rB'_r(B_rB'_r)^{-1}$, since $Y_r\equiv\Pi_r$ and $\|\Pi_r(i_r,:)\|_1=1$ for $i_r\in[n_r]$, we have

$$\Pi_r(i_r,:)=\frac{Y_r(i_r,:)}{\|Y_r(i_r,:)\|_1},\qquad i_r\in[n_r].$$

Given $U_r$, since it enjoys the IS structure $U_r=\Pi_rB_r\equiv\Pi_rU_r(\mathcal{I}_r,:)$, as long as we can obtain the row corner matrix $U_r(\mathcal{I}_r,:)$ (i.e., $B_r$), we can recover $\Pi_r$ exactly. As mentioned in Jin et al. (2017); Mao et al. (2020), for such an ideal simplex, the successive projection (SP) algorithm (Gillis and Vavasis, 2015) (for details of SP, see Algorithm 3) can be applied to $U_r$ with $K_r$ row communities to find $U_r(\mathcal{I}_r,:)$.

Based on the above analysis, we are now ready to give the following algorithm, which we call Ideal ONA. Input: $\Omega,K_r,K_c$ with $K_r\leq K_c$. Output: $\Pi_r$ and $\ell$.

  • Let $\Omega=U_r\Lambda U'_c$ be the compact SVD of $\Omega$ such that $U_r\in\mathbb{R}^{n_r\times K_r}$, $U_c\in\mathbb{R}^{n_c\times K_r}$, $\Lambda\in\mathbb{R}^{K_r\times K_r}$, $U'_rU_r=I_{K_r}$, $U'_cU_c=I_{K_r}$.

  • For row nodes,

    • Run the SP algorithm on all rows of $U_r$ assuming there are $K_r$ row communities to obtain $U_r(\mathcal{I}_r,:)$. Set $B_r=U_r(\mathcal{I}_r,:)$.

    • Set $Y_r=U_rB'_r(B_rB'_r)^{-1}$. Recover $\Pi_r$ by setting $\Pi_r(i_r,:)=\frac{Y_r(i_r,:)}{\|Y_r(i_r,:)\|_1}$ for $i_r\in[n_r]$.

    For column nodes,

    • Run k-means on all rows of $U_c$ assuming there are $K_c$ column communities, i.e., find the solution of the optimization problem

      $$M^*=\mathrm{argmin}_{M\in M_{n_c,K_r,K_c}}\|M-U_c\|^2_F,$$

      where $M_{n_c,K_r,K_c}$ denotes the set of $n_c\times K_r$ matrices with only $K_c$ distinct rows.

    • Use $M^*$ to obtain the label vector $\ell$ of the column nodes.

Following a proof similar to that of Theorem 1 of Qing and Wang (2021a), Ideal ONA exactly recovers the row nodes' memberships and the column nodes' labels, and this in turn verifies the identifiability of ONM. For convenience, we refer to the two steps for column nodes as "run k-means on $U_c$ assuming there are $K_c$ column communities to obtain $\ell$".

We now extend the ideal case to the real case. Let $\tilde{A}=\hat{U}_r\hat{\Lambda}\hat{U}'_c$ be the top-$K_r$-dimensional SVD of $A$ such that $\hat{U}_r\in\mathbb{R}^{n_r\times K_r}$, $\hat{U}_c\in\mathbb{R}^{n_c\times K_r}$, $\hat{\Lambda}\in\mathbb{R}^{K_r\times K_r}$, $\hat{U}'_r\hat{U}_r=I_{K_r}$, $\hat{U}'_c\hat{U}_c=I_{K_r}$, and $\hat{\Lambda}$ contains the top $K_r$ singular values of $A$. For the real case, we use $\hat{B}_r,\hat{B}_c,\hat{Y}_r,\hat{\Pi}_r,\hat{\Pi}_c$ given in Algorithm 1 to estimate $B_r,B_c,Y_r,\Pi_r,\Pi_c$, respectively. Algorithm 1, called the overlapping and nonoverlapping algorithm (ONA for short), is a natural extension of Ideal ONA to the real case. In ONA, we set the negative entries of $\hat{Y}_r$ to 0 via $\hat{Y}_r=\mathrm{max}(0,\hat{Y}_r)$ because the weights of any row node should be nonnegative, while $\hat{U}_r\hat{B}'_r(\hat{B}_r\hat{B}'_r)^{-1}$ may contain negative entries. Note that if, in a directed network, the column nodes have the overlapping property while the row nodes do not, community detection can be performed by applying our algorithm to the transpose of the adjacency matrix.

Algorithm 1 Overlapping and Nonoverlapping Algorithm (ONA)
Input: The adjacency matrix $A\in\mathbb{R}^{n_r\times n_c}$ of a directed network, the number of row communities $K_r$, and the number of column communities $K_c$ with $K_r\leq K_c$.
Output: The estimated $n_r\times K_r$ membership matrix $\hat{\Pi}_r$ for row nodes, and the estimated $n_c\times 1$ label vector $\hat{\ell}$ for column nodes.
1: Compute $\hat{U}_r\in\mathbb{R}^{n_r\times K_r}$ and $\hat{U}_c\in\mathbb{R}^{n_c\times K_r}$ from the top-$K_r$-dimensional SVD of $A$.
2: For row nodes:
  • Apply the SP algorithm (i.e., Algorithm 3) to the rows of $\hat{U}_r$ assuming there are $K_r$ row clusters to obtain the near-corner matrix $\hat{U}_r(\hat{\mathcal{I}}_r,:)\in\mathbb{R}^{K_r\times K_r}$, where $\hat{\mathcal{I}}_r$ is the index set returned by SP. Set $\hat{B}_r=\hat{U}_r(\hat{\mathcal{I}}_r,:)$.

  • Compute the $n_r\times K_r$ matrix $\hat{Y}_r=\hat{U}_r\hat{B}'_r(\hat{B}_r\hat{B}'_r)^{-1}$. Set $\hat{Y}_r=\mathrm{max}(0,\hat{Y}_r)$ and estimate $\Pi_r(i_r,:)$ by $\hat{\Pi}_r(i_r,:)=\frac{\hat{Y}_r(i_r,:)}{\|\hat{Y}_r(i_r,:)\|_1}$ for $i_r\in[n_r]$.

3: For column nodes: run k-means on $\hat{U}_c$ assuming there are $K_c$ column communities to obtain $\hat{\ell}$.
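For concreteness, the following self-contained Python sketch implements the steps of Algorithm 1 (a minimal illustration, not an official implementation; the compact `successive_projection` helper follows Algorithm 3 applied to rows):

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def successive_projection(U, r):
    """Indices of r near-corner rows of U (Algorithm 3 applied to rows)."""
    R, idx = U.copy(), []
    for _ in range(r):
        k = int(np.argmax(np.linalg.norm(R, axis=1)))
        u = R[k] / np.linalg.norm(R[k])
        R = R - np.outer(R @ u, u)   # project every row off the chosen direction
        idx.append(k)
    return np.array(idx)

def ONA(A, K_r, K_c):
    """Overlapping and Nonoverlapping Algorithm (Algorithm 1)."""
    # Step 1: top-K_r-dimensional SVD of A.
    U_r, _, VcT = svds(A.astype(float), k=K_r)
    U_c = VcT.T
    # Step 2 (row nodes): SP finds the corners B_r; invert the simplex, clip, normalize.
    B_r = U_r[successive_projection(U_r, K_r)]
    Y_r = np.maximum(U_r @ B_r.T @ np.linalg.inv(B_r @ B_r.T), 0)
    Pi_r_hat = Y_r / np.maximum(Y_r.sum(axis=1, keepdims=True), 1e-12)
    # Step 3 (column nodes): k-means on U_c with K_c clusters.
    ell_hat = KMeans(n_clusters=K_c, n_init=10).fit_predict(U_c)
    return Pi_r_hat, ell_hat
```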

2.2 Main results for ONA

In this section, we show the consistency of our algorithm for fitting ONM as the number of row nodes $n_r$ and the number of column nodes $n_c$ increase. Throughout this paper, $K_r\leq K_c$ are two known integers. First, we make the following assumption.

Assumption 4

$\rho\,\mathrm{max}(n_r,n_c)\geq\mathrm{log}(n_r+n_c)$.

Assumption 4 controls the sparsity of the directed networks considered in our theoretical study. By Lemma 4 of Qing and Wang (2021a), we have the following lemma.

Lemma 5

(Row-wise singular vector error) Under $ONM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c)$, when Assumption 4 holds and $\sigma_{K_r}(\Omega)\geq C\sqrt{\rho(n_r+n_c)\mathrm{log}(n_r+n_c)}$, with probability at least $1-o((n_r+n_c)^{-\alpha})$,

$$\|\hat{U}_r\hat{U}'_r-U_rU'_r\|_{2\rightarrow\infty}=O\Big(\frac{\sqrt{K_r}\big(\kappa(\Omega)\sqrt{\frac{\mathrm{max}(n_r,n_c)\mu}{\mathrm{min}(n_r,n_c)}}+\sqrt{\mathrm{log}(n_r+n_c)}\big)}{\sqrt{\rho}\,\sigma_{K_r}(\tilde{P})\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}\Big),$$

where $\mu=\mathrm{max}(\frac{n_r\|U_r\|^2_{2\rightarrow\infty}}{K_r},\frac{n_c\|U_c\|^2_{2\rightarrow\infty}}{K_r})$ is the incoherence parameter.

For convenience, set $\varpi=\|\hat{U}_r\hat{U}'_r-U_rU'_r\|_{2\rightarrow\infty}$ throughout this paper. To measure the performance of ONA on row node memberships, since row nodes have mixed memberships, we naturally use the $l_1$-norm difference between $\Pi_r$ and $\hat{\Pi}_r$. Since all column nodes are pure, we use the performance criterion defined in Joseph and Yu (2016) to measure the estimation error of ONA on column nodes; this criterion is introduced below.

Let $\mathcal{T}_c=\{\mathcal{T}_{c,1},\mathcal{T}_{c,2},\ldots,\mathcal{T}_{c,K_c}\}$ be the true partition of the column nodes $\{1,2,\ldots,n_c\}$ obtained from $\ell$, i.e., $\mathcal{T}_{c,k}=\{i_c:\ell(i_c)=k\}$ for $k\in[K_c]$. Let $\hat{\mathcal{T}}_c=\{\hat{\mathcal{T}}_{c,1},\hat{\mathcal{T}}_{c,2},\ldots,\hat{\mathcal{T}}_{c,K_c}\}$ be the estimated partition of the column nodes obtained from the $\hat{\ell}$ of ONA, i.e., $\hat{\mathcal{T}}_{c,k}=\{i_c:\hat{\ell}(i_c)=k\}$ for $k\in[K_c]$. The criterion is defined as

$$\hat{f}_c=\mathrm{min}_{\pi\in S_{K_c}}\,\mathrm{max}_{k\in[K_c]}\frac{|\mathcal{T}_{c,k}\cap\hat{\mathcal{T}}^c_{c,\pi(k)}|+|\mathcal{T}^c_{c,k}\cap\hat{\mathcal{T}}_{c,\pi(k)}|}{n_{c,k}},$$

where $S_{K_c}$ is the set of all permutations of $\{1,2,\ldots,K_c\}$ and the superscript $c$ denotes the complementary set. As mentioned in Joseph and Yu (2016), $\hat{f}_c$ measures the maximum proportion of column nodes in the symmetric difference of $\mathcal{T}_{c,k}$ and $\hat{\mathcal{T}}_{c,\pi(k)}$.
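Since $K_c$ is typically small, $\hat{f}_c$ can be computed by brute-force search over all permutations in $S_{K_c}$; below is a minimal sketch of this criterion (our own illustration).

```python
import numpy as np
from itertools import permutations

def f_hat_c(ell, ell_hat, K_c):
    """Maximum relative symmetric difference between true and estimated column
    communities, minimized over label permutations (Joseph & Yu, 2016)."""
    ell, ell_hat = np.asarray(ell), np.asarray(ell_hat)
    best = np.inf
    for pi in permutations(range(K_c)):
        worst = 0.0
        for k in range(K_c):
            T_k = ell == k                  # true community k
            T_hat = ell_hat == pi[k]        # matched estimated community
            err = np.sum(T_k & ~T_hat) + np.sum(~T_k & T_hat)
            worst = max(worst, err / max(T_k.sum(), 1))
        best = min(best, worst)
    return best

# toy check: prints 0.5 (one node per community lies in the symmetric difference)
print(f_hat_c([0, 0, 1, 1], [1, 0, 0, 0], 2))
```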

The next theorem gives theoretical bounds on the estimation of memberships for both row and column nodes; it is the main theoretical result for ONA.

Theorem 6

Under $ONM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c)$, suppose the conditions of Lemma 5 hold. Then with probability at least $1-o((n_r+n_c)^{-\alpha})$:

  • for row nodes, there exists a permutation matrix $\mathcal{P}_r$ such that

    $$\mathrm{max}_{i_r\in[n_r]}\|e'_{i_r}(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O(\varpi\kappa(\Pi'_r\Pi_r)K_r\sqrt{\lambda_1(\Pi'_r\Pi_r)});$$

  • for column nodes,

    $$\hat{f}_c=O\Big(\frac{K_rK_c\,\mathrm{max}(n_r,n_c)\mathrm{log}(n_r+n_c)}{\sigma^2_{K_r}(\tilde{P})\rho\delta^2_c\sigma^2_{K_r}(\Pi_r)n_{c,K_r}n_{c,\mathrm{min}}}\Big).$$

    In particular, when $K_r=K_c=K$,

    $$\hat{f}_c=O\Big(\frac{K^2\mathrm{max}(n_r,n_c)n_{c,\mathrm{max}}\mathrm{log}(n_r+n_c)}{\sigma^2_K(\tilde{P})\rho\sigma^2_K(\Pi_r)n^2_{c,\mathrm{min}}}\Big).$$

Adding conditions similar to those of Corollary 3.1 in Mao et al. (2020), we obtain the following corollary.

Corollary 7

Under $ONM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c)$, suppose the conditions of Lemma 5 hold, and further suppose that $\lambda_{K_r}(\Pi'_r\Pi_r)=O(\frac{n_r}{K_r})$ and $n_{c,\mathrm{min}}=O(\frac{n_c}{K_c})$. Then with probability at least $1-o((n_r+n_c)^{-\alpha})$:

  • for row nodes, when $K_r=K_c=K$,

    $$\mathrm{max}_{i_r\in[n_r]}\|e'_{i_r}(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O\Big(\frac{K^2(\sqrt{\frac{C\mathrm{max}(n_r,n_c)}{\mathrm{min}(n_r,n_c)}}+\sqrt{\mathrm{log}(n_r+n_c)})}{\sigma_K(\tilde{P})\sqrt{\rho n_c}}\Big);$$

  • for column nodes,

    $$\hat{f}_c=O\Big(\frac{K^2_rK^3_c\,\mathrm{max}(n_r,n_c)\mathrm{log}(n_r+n_c)}{\sigma^2_{K_r}(\tilde{P})\rho\delta^2_cn_rn^2_c}\Big).$$

    When $K_r=K_c=K$,

    $$\hat{f}_c=O\Big(\frac{K^4\mathrm{max}(n_r,n_c)\mathrm{log}(n_r+n_c)}{\sigma^2_K(\tilde{P})\rho n_rn_c}\Big).$$

In particular, when $n_r=O(n)$, $n_c=O(n)$, $K_r=O(1)$, and $K_c=O(1)$:

  • for row nodes, when $K_r=K_c$,

    $$\mathrm{max}_{i_r\in[n_r]}\|e'_{i_r}(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O\Big(\frac{\sqrt{\mathrm{log}(n)}}{\sigma_{K_r}(\tilde{P})\sqrt{\rho n}}\Big);$$

  • for column nodes,

    $$\hat{f}_c=O\Big(\frac{\mathrm{log}(n)}{\sigma^2_{K_r}(\tilde{P})\rho\delta^2_cn^2}\Big).$$

    When $K_r=K_c=K$,

    $$\hat{f}_c=O\Big(\frac{\mathrm{log}(n)}{\sigma^2_K(\tilde{P})\rho n}\Big).$$

When $K_r\neq K_c$, though it is challenging to obtain a lower bound on $\delta_c$, we can roughly use $\sqrt{\frac{2}{n_{c,\mathrm{max}}}}$ as the lower bound of $\delta_c$, since $\delta_c\geq\sqrt{\frac{2}{n_{c,\mathrm{max}}}}$ when $K_r=K_c$.

When ONM degenerates to SBM (by setting $\Pi_r=\Pi_c$ with all nodes pure), applying the separation condition and sharp threshold criterion developed in Qing (2021b) to the upper bounds on the error rates in Corollary 7 recovers the classical separation condition of a balanced network and the sharp threshold of the Erdös-Rényi random graph $G(n,p)$ of Erdos and Rényi (2011); this supports the optimality of our theoretical results.

3 The overlapping and degree-corrected nonoverlapping model

Just as DCSBM (Karrer and Newman, 2011) extends SBM by introducing node-specific parameters to allow for varying degrees, in this section we propose an extension of ONM that accounts for degree heterogeneity and build theoretical guarantees for the algorithm fitting the model.

Let $\theta_c$ be an $n_c\times 1$ vector whose $i_c$-th entry is the degree heterogeneity parameter of column node $i_c$, for $i_c\in[n_c]$. Let $\Theta_c$ be the $n_c\times n_c$ diagonal matrix whose $i_c$-th diagonal element is $\theta_c(i_c)$. The extended model generates $A$ as follows:

$$\Omega:=\Pi_rP\Pi'_c\Theta_c,\qquad A(i_r,i_c)\sim\mathrm{Bernoulli}(\Omega(i_r,i_c))\qquad\mathrm{for~}i_r\in[n_r],i_c\in[n_c]. \tag{6}$$
Definition 8

Call model (1), (2), (3), (4), (6) the Overlapping and Degree-Corrected Nonoverlapping model (ODCNM) and denote it by $ODCNM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c,\Theta_c)$.

Note that, under ODCNM, the maximum element of $P$ can be larger than 1, since $\mathrm{max}_{i_c\in[n_c]}\theta_c(i_c)$ can also control the sparsity of the directed network $\mathcal{N}$. The following proposition guarantees that ODCNM is identifiable in terms of $\Pi_r$ and $\Pi_c$; this form of identifiability is similar to that of DCSBM and DCScBM.

Proposition 9

If conditions (I1) and (I2) hold, ODCNM is identifiable for the membership matrices: for eligible $(P,\Pi_r,\Pi_c,\Theta_c)$ and $(\check{P},\check{\Pi}_r,\check{\Pi}_c,\check{\Theta}_c)$, if $\Pi_rP\Pi'_c\Theta_c=\check{\Pi}_r\check{P}\check{\Pi}'_c\check{\Theta}_c$, then $\Pi_r=\check{\Pi}_r$ and $\Pi_c=\check{\Pi}_c$.

Remark 10

By setting $\theta_c(i_c)=\rho$ for $i_c\in[n_c]$, ODCNM reduces to ONM, which is why ODCNM can be seen as an extension of ONM. Meanwhile, though DCScBM (Rohe et al., 2016) can model directed networks with degree heterogeneity for both row and column nodes, DCScBM does not allow nodes to have the overlapping property. In comparison, our ODCNM allows row nodes to have the overlapping property at the cost of losing the row-node degree heterogeneity and requiring $K_r\leq K_c$ for model identifiability. Furthermore, another identifiable model that extends ONM by considering degree heterogeneity for the overlapping row nodes is provided in Appendix D, where we also explain why we do not extend ONM by considering degree heterogeneity for both row and column nodes.

3.1 A spectral algorithm for fitting ODCNM

We now discuss the intuition behind the design of our algorithm to fit ODCNM. Without causing confusion, we continue to use $U_r,U_c,B_r,B_c,\delta_c,Y_r$, and so on under ODCNM. Let $U_{c,*}\in\mathbb{R}^{n_c\times K_r}$ be the row-normalized version of $U_c$, i.e., $U_{c,*}(i_c,:)=\frac{U_c(i_c,:)}{\|U_c(i_c,:)\|_F}$ for $i_c\in[n_c]$. Clustering the rows of $U_{c,*}$ with the k-means algorithm then returns a perfect clustering of the column nodes, as guaranteed by the next lemma.

Lemma 11

Under $ODCNM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c,\Theta_c)$, there exist a unique $K_r\times K_r$ matrix $B_r$ and a unique $K_c\times K_r$ matrix $B_c$ such that

  • $U_r=\Pi_rB_r$, where $B_r=U_r(\mathcal{I}_r,:)$. Meanwhile, $U_r(i_r,:)=U_r(\bar{i}_r,:)$ when $\Pi_r(i_r,:)=\Pi_r(\bar{i}_r,:)$ for $i_r,\bar{i}_r\in[n_r]$.

  • $U_{c,*}=\Pi_cB_c$. Meanwhile, $U_{c,*}(i_c,:)=U_{c,*}(\bar{i}_c,:)$ when $\ell(i_c)=\ell(\bar{i}_c)$ for $i_c,\bar{i}_c\in[n_c]$. Furthermore, when $K_r=K_c=K$, we have $\|B_c(k,:)-B_c(l,:)\|_F=\sqrt{2}$ for all $1\leq k<l\leq K$.

Recall that $\delta_c=\mathrm{min}_{k\neq l}\|B_c(k,:)-B_c(l,:)\|_F$; by Lemma 11, $\delta_c=\sqrt{2}$ when $K_r=K_c=K$ under $ODCNM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c,\Theta_c)$. However, when $K_r<K_c$, it is challenging to obtain a positive lower bound on $\delta_c$; see the proof of Lemma 11 for details.

Under ODCNM, to recover $\Pi_c$: since $U_{c,*}$ has $K_c$ distinct rows, applying the k-means algorithm to all rows of $U_{c,*}$ returns the true column communities by Lemma 11. To recover $\Pi_r$ from $U_r$, we follow the same idea as under ONM.

Based on the above analysis, we are now ready to give the following algorithm, which we call Ideal ODCNA. Input: $\Omega,K_r,K_c$ with $K_r\leq K_c$. Output: $\Pi_r$ and $\ell$.

  • Let $\Omega=U_r\Lambda U'_c$ be the compact SVD of $\Omega$ such that $U_r\in\mathbb{R}^{n_r\times K_r}$, $U_c\in\mathbb{R}^{n_c\times K_r}$, $\Lambda\in\mathbb{R}^{K_r\times K_r}$, $U'_rU_r=I_{K_r}$, $U'_cU_c=I_{K_r}$. Let $U_{c,*}$ be the row-normalized version of $U_c$.

  • For row nodes,

    • Run the SP algorithm on all rows of $U_r$ assuming there are $K_r$ row communities to obtain $U_r(\mathcal{I}_r,:)$. Set $B_r=U_r(\mathcal{I}_r,:)$.

    • Set $Y_r=U_rB'_r(B_rB'_r)^{-1}$. Recover $\Pi_r$ by setting $\Pi_r(i_r,:)=\frac{Y_r(i_r,:)}{\|Y_r(i_r,:)\|_1}$ for $i_r\in[n_r]$.

    For column nodes: run k-means on $U_{c,*}$ assuming there are $K_c$ column communities to obtain $\ell$.

Ideal ODCNA exactly recovers the row nodes' memberships and the column nodes' labels, and this also supports the identifiability of ODCNM.

We now extend the ideal case to the real case. Let $\hat{U}_{c,*}\in\mathbb{R}^{n_c\times K_r}$ be the row-normalized version of $\hat{U}_c$, i.e., $\hat{U}_{c,*}(i_c,:)=\frac{\hat{U}_c(i_c,:)}{\|\hat{U}_c(i_c,:)\|_F}$ for $i_c\in[n_c]$. Algorithm 2, called the overlapping and degree-corrected nonoverlapping algorithm (ODCNA for short), is a natural extension of Ideal ODCNA to the real case.

Algorithm 2 Overlapping and Degree-Corrected Nonoverlapping Algorithm (ODCNA)
Input: The adjacency matrix $A\in\mathbb{R}^{n_r\times n_c}$ of a directed network, the number of row communities $K_r$, and the number of column communities $K_c$ with $K_r\leq K_c$.
Output: The estimated $n_r\times K_r$ membership matrix $\hat{\Pi}_r$ for row nodes, and the estimated $n_c\times 1$ label vector $\hat{\ell}$ for column nodes.
1: Compute $\hat{U}_r\in\mathbb{R}^{n_r\times K_r}$ and $\hat{U}_c\in\mathbb{R}^{n_c\times K_r}$ from the top-$K_r$-dimensional SVD of $A$. Compute $\hat{U}_{c,*}$ from $\hat{U}_c$.
2: For row nodes:
  • Apply the SP algorithm (i.e., Algorithm 3) to the rows of $\hat{U}_r$ assuming there are $K_r$ row clusters to obtain the near-corner matrix $\hat{U}_r(\hat{\mathcal{I}}_r,:)\in\mathbb{R}^{K_r\times K_r}$, where $\hat{\mathcal{I}}_r$ is the index set returned by SP. Set $\hat{B}_r=\hat{U}_r(\hat{\mathcal{I}}_r,:)$.

  • Compute the $n_r\times K_r$ matrix $\hat{Y}_r=\hat{U}_r\hat{B}'_r(\hat{B}_r\hat{B}'_r)^{-1}$. Set $\hat{Y}_r=\mathrm{max}(0,\hat{Y}_r)$ and estimate $\Pi_r(i_r,:)$ by $\hat{\Pi}_r(i_r,:)=\frac{\hat{Y}_r(i_r,:)}{\|\hat{Y}_r(i_r,:)\|_1}$ for $i_r\in[n_r]$.

3: For column nodes: run k-means on $\hat{U}_{c,*}$ assuming there are $K_c$ column communities to obtain $\hat{\ell}$.
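Since ODCNA differs from ONA only in running k-means on the row-normalized $\hat{U}_{c,*}$, a code sketch needs just one extra line relative to the ONA sketch in Section 2.1 (again our own illustration; `successive_projection` is the helper defined there):

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def ODCNA(A, K_r, K_c):
    """Overlapping and Degree-Corrected Nonoverlapping Algorithm (Algorithm 2)."""
    U_r, _, VcT = svds(A.astype(float), k=K_r)
    U_c = VcT.T
    # row-normalize U_c to remove the column-degree effects (U_{c,*} in the text)
    U_c_star = U_c / np.maximum(np.linalg.norm(U_c, axis=1, keepdims=True), 1e-12)
    # row-node steps are identical to ONA (SP + simplex inversion)
    B_r = U_r[successive_projection(U_r, K_r)]   # helper from the ONA sketch
    Y_r = np.maximum(U_r @ B_r.T @ np.linalg.inv(B_r @ B_r.T), 0)
    Pi_r_hat = Y_r / np.maximum(Y_r.sum(axis=1, keepdims=True), 1e-12)
    ell_hat = KMeans(n_clusters=K_c, n_init=10).fit_predict(U_c_star)
    return Pi_r_hat, ell_hat
```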

3.2 Main results for ODCNA

Set $\theta_{c,\mathrm{max}}=\mathrm{max}_{i_c\in[n_c]}\theta_c(i_c)$, $\theta_{c,\mathrm{min}}=\mathrm{min}_{i_c\in[n_c]}\theta_c(i_c)$, and $P_{\mathrm{max}}=\mathrm{max}_{k\in[K_r],l\in[K_c]}P(k,l)$. We make the following assumption.

Assumption 12

$P_{\mathrm{max}}\,\mathrm{max}(\theta_{c,\mathrm{max}}n_r,\|\theta_c\|_1)\geq\mathrm{log}(n_r+n_c)$.

By the proof of Lemma 4.3 of Qing (2021a), we have the following lemma.

Lemma 13

(Row-wise singular vector error) Under $ODCNM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c,\Theta_c)$, when Assumption 12 holds and $\sigma_{K_r}(\Omega)\geq C\sqrt{\theta_{c,\mathrm{max}}(n_r+n_c)\mathrm{log}(n_r+n_c)}$, with probability at least $1-o((n_r+n_c)^{-\alpha})$,

$$\|\hat{U}_r\hat{U}'_r-U_rU'_r\|_{2\rightarrow\infty}=O\Big(\frac{\sqrt{\theta_{c,\mathrm{max}}K_r}\big(\kappa(\Omega)\sqrt{\frac{\mathrm{max}(n_r,n_c)\mu}{\mathrm{min}(n_r,n_c)}}+\sqrt{\mathrm{log}(n_r+n_c)}\big)}{\theta_{c,\mathrm{min}}\sigma_{K_r}(P)\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}\Big).$$

The next theorem is the main theoretical result for ODCNA, where we use the same measurements as for ONA to assess the performance of ODCNA.

Theorem 14

Under $ODCNM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c,\Theta_c)$, suppose the conditions of Lemma 13 hold. Then with probability at least $1-o((n_r+n_c)^{-\alpha})$:

  • for row nodes,

    $$\mathrm{max}_{i_r\in[n_r]}\|e'_{i_r}(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O(\varpi\kappa(\Pi'_r\Pi_r)K_r\sqrt{\lambda_1(\Pi'_r\Pi_r)});$$

  • for column nodes,

    $$\hat{f}_c=O\Big(\frac{\theta^2_{c,\mathrm{max}}K_rK_c\,\mathrm{max}(\theta_{c,\mathrm{max}}n_r,\|\theta_c\|_1)n_{c,\mathrm{max}}\mathrm{log}(n_r+n_c)}{\sigma^2_{K_r}(P)\theta^4_{c,\mathrm{min}}\delta^2_cm^2_{V_c}\sigma^2_{K_r}(\Pi_r)n_{c,K_r}n_{c,\mathrm{min}}}\Big),$$

    where $m_{V_c}$ is a parameter defined in the proof of this theorem, and it equals $1$ when $K_r=K_c$. In particular, when $K_r=K_c=K$,

    $$\hat{f}_c=O\Big(\frac{\theta^2_{c,\mathrm{max}}K^2\mathrm{max}(\theta_{c,\mathrm{max}}n_r,\|\theta_c\|_1)n_{c,\mathrm{max}}\mathrm{log}(n_r+n_c)}{\sigma^2_K(P)\theta^4_{c,\mathrm{min}}\sigma^2_K(\Pi_r)n^2_{c,\mathrm{min}}}\Big).$$

Adding some conditions on the model parameters, we obtain the following corollary.

Corollary 15

Under $ODCNM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c,\Theta_c)$, suppose the conditions of Lemma 13 hold, and further suppose that $\lambda_{K_r}(\Pi'_r\Pi_r)=O(\frac{n_r}{K_r})$ and $n_{c,\mathrm{min}}=O(\frac{n_c}{K_c})$. Then with probability at least $1-o((n_r+n_c)^{-\alpha})$:

  • for row nodes, when $K_r=K_c=K$,

    $$\mathrm{max}_{i_r\in[n_r]}\|e'_{i_r}(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O\Big(\frac{K^2\sqrt{\theta_{c,\mathrm{max}}}(\sqrt{\frac{C\mathrm{max}(n_r,n_c)}{\mathrm{min}(n_r,n_c)}}+\sqrt{\mathrm{log}(n_r+n_c)})}{\theta_{c,\mathrm{min}}\sigma_K(P)\sqrt{n_c}}\Big);$$

  • for column nodes,

    $$\hat{f}_c=O\Big(\frac{\theta^2_{c,\mathrm{max}}K^2_rK^2_c\,\mathrm{max}(\theta_{c,\mathrm{max}}n_r,\|\theta_c\|_1)\mathrm{log}(n_r+n_c)}{\sigma^2_{K_r}(P)\theta^4_{c,\mathrm{min}}\delta^2_cm^2_{V_c}n_rn_c}\Big).$$

    When $K_r=K_c=K$,

    $$\hat{f}_c=O\Big(\frac{\theta^2_{c,\mathrm{max}}K^4\mathrm{max}(\theta_{c,\mathrm{max}}n_r,\|\theta_c\|_1)\mathrm{log}(n_r+n_c)}{\sigma^2_K(P)\theta^4_{c,\mathrm{min}}n_rn_c}\Big).$$

In particular, when $n_r=O(n)$, $n_c=O(n)$, $K_r=O(1)$, and $K_c=O(1)$:

  • for row nodes, when $K_r=K_c$,

    $$\mathrm{max}_{i_r\in[n_r]}\|e'_{i_r}(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O\Big(\frac{\sqrt{\theta_{c,\mathrm{max}}\mathrm{log}(n)}}{\theta_{c,\mathrm{min}}\sigma_K(P)\sqrt{n}}\Big);$$

  • for column nodes,

    $$\hat{f}_c=O\Big(\frac{\theta^2_{c,\mathrm{max}}\mathrm{max}(\theta_{c,\mathrm{max}}n_r,\|\theta_c\|_1)\mathrm{log}(n)}{\sigma^2_{K_r}(P)\theta^4_{c,\mathrm{min}}\delta^2_cm^2_{V_c}n^2}\Big).$$

    When $K_r=K_c=K$,

    $$\hat{f}_c=O\Big(\frac{\theta^2_{c,\mathrm{max}}\mathrm{max}(\theta_{c,\mathrm{max}}n_r,\|\theta_c\|_1)\mathrm{log}(n)}{\sigma^2_K(P)\theta^4_{c,\mathrm{min}}n^2}\Big).$$

When $K_r\neq K_c$, though it is challenging to obtain lower bounds on $\delta_c$ and $m_{V_c}$, we can roughly use $\sqrt{2}$ and $1$ as the lower bounds of $\delta_c$ and $m_{V_c}$, respectively, since $\delta_c=\sqrt{2}$ and $m_{V_c}=1$ when $K_r=K_c$. Meanwhile, if we further set $\theta_{c,\mathrm{max}}=O(\rho)$ and $\theta_{c,\mathrm{min}}=O(\rho)$, we have the following corollary.

Corollary 16

Under $ODCNM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c,\Theta_c)$, suppose the conditions of Lemma 13 hold, and further suppose that $\lambda_{K_r}(\Pi'_r\Pi_r)=O(\frac{n_r}{K_r})$, $n_{c,\mathrm{min}}=O(\frac{n_c}{K_c})$, $\theta_{c,\mathrm{max}}=O(\rho)$, and $\theta_{c,\mathrm{min}}=O(\rho)$. Then with probability at least $1-o((n_r+n_c)^{-\alpha})$:

  • for row nodes, when $K_r=K_c=K$,

    $$\mathrm{max}_{i_r\in[n_r]}\|e'_{i_r}(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O\Big(\frac{K^2(\sqrt{\frac{C\mathrm{max}(n_r,n_c)}{\mathrm{min}(n_r,n_c)}}+\sqrt{\mathrm{log}(n_r+n_c)})}{\sigma_K(P)\sqrt{\rho n_c}}\Big);$$

  • for column nodes,

    $$\hat{f}_c=O\Big(\frac{K^2_rK^2_c\,\mathrm{max}(n_r,n_c)\mathrm{log}(n_r+n_c)}{\sigma^2_{K_r}(P)\rho\delta^2_cm^2_{V_c}n_rn_c}\Big).$$

    When $K_r=K_c=K$,

    $$\hat{f}_c=O\Big(\frac{K^4\mathrm{max}(n_r,n_c)\mathrm{log}(n_r+n_c)}{\sigma^2_K(P)\rho n_rn_c}\Big).$$

In particular, when $n_r=O(n)$, $n_c=O(n)$, $K_r=O(1)$, and $K_c=O(1)$:

  • for row nodes, when $K_r=K_c$,

    $$\mathrm{max}_{i_r\in[n_r]}\|e'_{i_r}(\hat{\Pi}_r-\Pi_r\mathcal{P}_r)\|_1=O\Big(\frac{\sqrt{\mathrm{log}(n)}}{\sigma_K(P)\sqrt{\rho n}}\Big);$$

  • for column nodes,

    $$\hat{f}_c=O\Big(\frac{\mathrm{log}(n)}{\sigma^2_{K_r}(P)\rho\delta^2_cm^2_{V_c}n}\Big).$$

    When $K_r=K_c=K$,

    $$\hat{f}_c=O\Big(\frac{\mathrm{log}(n)}{\sigma^2_K(P)\rho n}\Big).$$

By setting $\Theta_c=\rho I$, $ODCNM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c,\Theta_c)$ degenerates to $ONM_{n_r,n_c}(K_r,K_c,P,\Pi_r,\Pi_c)$. Comparing Corollary 7 and Corollary 16, we see that the theoretical results under ODCNM are consistent with those under ONM when ODCNM degenerates to ONM, for the case $K_r=K_c=K$.

4 Simulations

In this section, we present some simulations to investigate the performance of the two proposed algorithms. We measure performance by the Mixed-Hamming error rate (MHamm for short) for row nodes and the Hamming error rate (Hamm for short) for column nodes, defined as

$$\mathrm{MHamm}=\frac{\mathrm{min}_{\pi\in S_{K_r}}\|\hat{\Pi}_r\pi-\Pi_r\|_1}{n_r}\qquad\mathrm{and}\qquad\mathrm{Hamm}=\frac{\mathrm{min}_{\pi\in S_{K_c}}\|\hat{\Pi}_c\pi-\Pi_c\|_1}{n_c},$$

where $\hat{\Pi}_c\in\mathbb{R}^{n_c\times K_c}$ is defined by $\hat{\Pi}_c(i_c,k)=1$ if $\hat{\ell}(i_c)=k$ and $0$ otherwise, for $i_c\in[n_c],k\in[K_c]$.
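Both error rates minimize over permutations of the community labels; since $K_r$ and $K_c$ are small, a brute-force sketch (our own illustration) suffices:

```python
import numpy as np
from itertools import permutations

def mhamm(Pi_hat, Pi):
    """Mixed-Hamming error rate for (possibly mixed) membership matrices."""
    K = Pi.shape[1]
    # permute the columns of Pi_hat and take the best entrywise l1 match
    return min(np.abs(Pi_hat[:, list(p)] - Pi).sum()
               for p in permutations(range(K))) / Pi.shape[0]

def hamm(ell_hat, Pi_c):
    """Hamming error rate for column labels: build Pi_c_hat, then reuse mhamm."""
    Pi_c_hat = np.eye(Pi_c.shape[1])[np.asarray(ell_hat)]
    return mhamm(Pi_c_hat, Pi_c)
```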

For all simulations in this section, the parameters $(n_r,n_c,K_r,K_c,P,\rho,\Pi_r,\Pi_c,\Theta_c)$ are set as follows. Unless specified otherwise, set $n_r=400$, $n_c=300$, $K_r=3$, $K_c=4$. For column nodes, generate $\Pi_c$ by letting each column node belong to one of the column communities with equal probability. Let each row community have $100$ pure nodes, and let all mixed row nodes have memberships $(0.6,0.3,0.1)$. $P=\rho\tilde{P}$ is set independently under ONM and ODCNM. Under ONM, $\rho$ is 0.5 in Experiment 1, and we study the influence of $\rho$ in Experiment 2. Under ODCNM, for $z_c\geq 1$, we generate the degree parameters of the column nodes as follows: let $\theta_c\in\mathbb{R}^{n_c\times 1}$ with $1/\theta_c(i_c)\overset{iid}{\sim}U(1,z_c)$ for $i_c\in[n_c]$, where $U(1,z_c)$ denotes the uniform distribution on $[1,z_c]$. We study the influences of $z_c$ and $\rho$ under ODCNM in Experiments 3 and 4, respectively. For all settings, we report the averaged MHamm and the averaged Hamm over 50 repetitions.

Experiment 1: Changing $n_c$ under ONM. Let $n_c$ range over $\{50,100,150,\ldots,300\}$. For this experiment, $P$ is set as

$$P=\rho\begin{bmatrix}1&0.3&0.2&0.3\\ 0.2&0.9&0.1&0.2\\ 0.3&0.2&0.8&0.3\end{bmatrix}.$$

Let $\rho=0.5$ for this experiment, which is designed under ONM. The numerical results are shown in panels (a) and (b) of Figure 1: as $n_c$ increases, ONA and ODCNA perform better. The total run-time of this experiment is roughly 70 seconds. For row nodes, since both ONA and ODCNA apply the SP algorithm to $\hat{U}_r$ to estimate $\Pi_r$, the estimated row membership matrices of ONA and ODCNA coincide, and hence the MHamm of ONA always equals that of ODCNA.
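As an illustration of the data-generating process in Experiment 1, the following sketch (our own; the uniform community assignment and the mixed rows follow the setup described above) samples one adjacency matrix $A$:

```python
import numpy as np

rng = np.random.default_rng(2021)
n_r, n_c, K_r, K_c, rho = 400, 300, 3, 4, 0.5
P = rho * np.array([[1.0, 0.3, 0.2, 0.3],
                    [0.2, 0.9, 0.1, 0.2],
                    [0.3, 0.2, 0.8, 0.3]])

# 100 pure row nodes per row community, remaining rows mixed with (0.6, 0.3, 0.1)
Pi_r = np.vstack([np.repeat(np.eye(K_r), 100, axis=0),
                  np.tile([0.6, 0.3, 0.1], (n_r - 300, 1))])
ell = rng.integers(0, K_c, size=n_c)       # each column node picks a community uniformly
Pi_c = np.eye(K_c)[ell]

Omega = Pi_r @ P @ Pi_c.T                  # population adjacency matrix, Eq (5)
A = (rng.random((n_r, n_c)) < Omega).astype(int)   # independent Bernoulli(Omega) entries
```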

Figure 1: Estimation errors of ONA and ODCNA. Panels: (a) MHamm and (b) Hamm when changing $n_c$ under ONM; (c) MHamm and (d) Hamm when changing $\rho$ under ONM; (e) MHamm and (f) Hamm when changing $z_c$ under ODCNM; (g) MHamm and (h) Hamm when changing $\rho$ under ODCNM.

Experiment 2: Changing $\rho$ under ONM. $P$ is set as in Experiment 1, and we let $\rho$ range over $\{0.1,0.2,\ldots,1\}$ to study the influence of $\rho$ on the performance of ONA and ODCNA under ONM. The results are displayed in panels (c) and (d) of Figure 1. Both methods perform better as $\rho$ increases, since a larger $\rho$ generates more edges in the directed network. The total run-time of this experiment is roughly 136 seconds.

Experiment 3: Changing $z_c$ under ODCNM. $P$ is set as in Experiment 1. Let $z_c$ range over $\{1,2,\ldots,8\}$; increasing $z_c$ decreases the number of edges generated under ODCNM. Panels (e) and (f) of Figure 1 display the simulation results. Generally, increasing the variability of node degrees makes it harder for both ONA and ODCNA to detect node memberships. Though ODCNA is designed for ODCNM, its performance in this experiment is similar to that of ONA on directed networks whose column nodes have varying degrees, which is consistent with our theoretical findings in Corollaries 7 and 15. The total run-time of this experiment is around 131 seconds.

Experiment 4: Changing $\rho$ under ODCNM. Set $z_c=3$, set $P$ as in Experiment 1, and let $\rho$ range over $\{0.1,0.2,\ldots,1\}$ under ODCNM. Panels (g) and (h) of Figure 1 display the simulation results. The performances of the two proposed methods are similar to those in Experiment 2. The total run-time of this experiment is around 221 seconds.

5 Discussions

In this paper, we introduced the overlapping and nonoverlapping model and its extension that accounts for degree heterogeneity. These models describe directed networks with $K_r$ row communities and $K_c$ column communities in which a row node can belong to multiple row communities while a column node belongs to exactly one column community. The proposed models are identifiable when $K_r\leq K_c$, under some other popular constraints on the connectivity matrix and the membership matrices. In contrast, modeling a directed network in which row nodes have the overlapping property while column nodes do not is unidentifiable when $K_r>K_c$. Meanwhile, since previous works found that modeling directed networks in which both row and column nodes have the overlapping property with $K_r\neq K_c$ is unidentifiable, our identifiable ONM and ODCNM, as well as the DCONM in Appendix D, fill a gap in modeling overlapping directed networks when $K_r\neq K_c$. These models provide exploratory tools for studying community structure in directed networks where one side is overlapping while the other is nonoverlapping. Two spectral algorithms are designed to fit ONM and ODCNM, and we showed estimation consistency for our methods under mild conditions. In particular, when ODCNM reduces to ONM, our theoretical results under ODCNM are consistent with those under ONM. Perhaps the main limitation of the models is that $K_r$ and $K_c$ are assumed to be given; the same limitation holds for the ScBM and DCScBM of Rohe et al. (2016). In most community detection problems, the numbers of row and column communities are unknown, so a complete treatment requires not only the algorithms and the consistency theory described in this paper but also a method for estimating $K_r$ and $K_c$. We leave this problem to future work.

A Successive Projection algorithm

Algorithm 3 is the Successive Projection algorithm.

Algorithm 3 Successive Projection (SP) (Gillis and Vavasis, 2015)
Input: A near-separable matrix $Y_{sp}=S_{sp}M_{sp}+Z_{sp}\in\mathbb{R}^{m\times n}_+$, where $S_{sp},M_{sp}$ satisfy Assumption 1 of Gillis and Vavasis (2015), and the number $r$ of rows to be extracted.
Output: A set of indices $\mathcal{K}$ such that $Y_{sp}(\mathcal{K},:)\approx S_{sp}$ (up to permutation).
1: Let $R=Y_{sp}$, $\mathcal{K}=\{\}$, $k=1$.
2: While $R\neq 0$ and $k\leq r$ do
3:   $k_*=\mathrm{argmax}_k\|R(k,:)\|_F$.
4:   $u_k=R(k_*,:)$.
5:   $R\leftarrow R(I-\frac{u'_ku_k}{\|u_k\|^2_F})$.
6:   $\mathcal{K}=\mathcal{K}\cup\{k_*\}$.
7:   $k=k+1$.
8: end while
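A direct numpy transcription of Algorithm 3, extracting row indices as used by ONA and ODCNA (our own sketch; `sp` is a hypothetical helper name):

```python
import numpy as np

def sp(Y, r):
    """Successive Projection (Gillis & Vavasis, 2015): repeatedly select the row
    of largest norm and project all rows onto its orthogonal complement."""
    R = np.array(Y, dtype=float)
    K = []
    k = 1
    while np.any(np.abs(R) > 1e-12) and k <= r:
        k_star = int(np.argmax(np.linalg.norm(R, axis=1)))
        u = R[k_star]
        R = R - np.outer(R @ u, u) / (u @ u)   # R <- R(I - u'u / ||u||_F^2)
        K.append(k_star)
        k += 1
    return K
```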

B Proofs under ONM

B.1 Proof of Proposition 2

Proof  By Lemma 3, let $U_r\Lambda U'_c$ be the compact SVD of $\Omega$ such that $\Omega=U_r\Lambda U'_c$. Since $\Omega=\Pi_rP\Pi'_c=\check{\Pi}_r\check{P}\check{\Pi}'_c$, we have $\Omega(\mathcal{I}_r,\mathcal{I}_c)=P=\check{P}$, which gives $P=\check{P}$. By Lemma 3, since $U_r=\Pi_rU_r(\mathcal{I}_r,:)=\check{\Pi}_rU_r(\mathcal{I}_r,:)$, we have $\Pi_r=\check{\Pi}_r$, where we used the fact that the inverse of $U_r(\mathcal{I}_r,:)$ exists. Since $\Omega=\Pi_rP\Pi'_c=\check{\Pi}_r\check{P}\check{\Pi}'_c=\Pi_rP\check{\Pi}'_c$, we have $\Pi_rP\Pi'_c=\Pi_rP\check{\Pi}'_c$. By Lemma 7 of Qing and Wang (2021a), we have $P\Pi'_c=P\check{\Pi}'_c$, i.e., $\Pi_cX=\check{\Pi}_cX$, where we set $X=P'\in\mathbb{R}^{K_c\times K_r}$. Let $\check{\ell}$ be the $n_c\times 1$ vector of column node labels obtained from $\check{\Pi}_c$. For $i_c\in[n_c],k\in[K_r]$, from $\Pi_cX=\check{\Pi}_cX$ we have $(\Pi_cX)(i_c,k)=\Pi_c(i_c,:)X(:,k)=X(\ell(i_c),k)=X(\check{\ell}(i_c),k)$, which means that we must have $\ell(i_c)=\check{\ell}(i_c)$ for all $i_c\in[n_c]$, i.e., $\ell=\check{\ell}$ and $\Pi_c=\check{\Pi}_c$. Note that for the special case $K_r=K_c=K$, $\Pi_c=\check{\Pi}_c$ follows easily: since $P\Pi'_c=P\check{\Pi}'_c$ and $P\in\mathbb{R}^{K\times K}$ is assumed to be full rank, we have $\Pi_c=\check{\Pi}_c$. Thus the proposition holds.

B.2 Proof of Lemma 3

Proof  For UrU_{r}, since Ω=UrΛUc\Omega=U_{r}\Lambda U^{\prime}_{c} and UcUc=IKrU^{\prime}_{c}U_{c}=I_{K_{r}}, we have Ur=ΩUcΛ1U_{r}=\Omega U_{c}\Lambda^{-1}. Recall that Ω=ΠrPΠc\Omega=\Pi_{r}P\Pi^{\prime}_{c}, we have Ur=ΠrPΠcUcΛ1=ΠrBrU_{r}=\Pi_{r}P\Pi^{\prime}_{c}U_{c}\Lambda^{-1}=\Pi_{r}B_{r}, where we set Br=PΠcUcΛ1B_{r}=P\Pi^{\prime}_{c}U_{c}\Lambda^{-1}. Since Ur(r,:)=Πr(r,:)Br=BrU_{r}(\mathcal{I}_{r},:)=\Pi_{r}(\mathcal{I}_{r},:)B_{r}=B_{r}, we have Br=Ur(r,:)B_{r}=U_{r}(\mathcal{I}_{r},:). For ir[nr]i_{r}\in[n_{r}], Ur(ir,:)=eirΠrBr=Πr(ir,:)BrU_{r}(i_{r},:)=e^{\prime}_{i_{r}}\Pi_{r}B_{r}=\Pi_{r}(i_{r},:)B_{r}, so sure we have Ur(ir,:)=Ur(i¯r,:)U_{r}(i_{r},:)=U_{r}(\bar{i}_{r},:) when Πr(ir,:)=Πr(i¯r,:)\Pi_{r}(i_{r},:)=\Pi_{r}(\bar{i}_{r},:).

For UcU_{c}, follow similar analysis as for UrU_{r}, we have Uc=ΠcBcU_{c}=\Pi_{c}B_{c}, where Bc=PΠrUrΛ1B_{c}=P^{\prime}\Pi^{\prime}_{r}U_{r}\Lambda^{-1}. Note that BcKcKrB_{c}\in\mathbb{R}^{K_{c}*K_{r}}. Sure, Uc(ic,:)=Uc(i¯c,:)U_{c}(i_{c},:)=U_{c}(\bar{i}_{c},:) when (ic)=(i¯c)\ell(i_{c})=\ell(\bar{i}_{c}) for ic,i¯c[nc]i_{c},\bar{i}_{c}\in[n_{c}].

Now, we focus on the case when Kr=Kc=KK_{r}=K_{c}=K. For this case, since BcKcKrB_{c}\in\mathbb{R}^{K_{c}*K_{r}}, BcB_{c} is full rank when Kr=KcK_{r}=K_{c}. Since IKr=IK=UcUc=BcΠcΠcBcI_{K_{r}}=I_{K}=U^{\prime}_{c}U_{c}=B^{\prime}_{c}\Pi^{\prime}_{c}\Pi_{c}B_{c}, we have ΠcΠc=(BcBc)1\Pi^{\prime}_{c}\Pi_{c}=(B_{c}B^{\prime}_{c})^{-1}. Since ΠcΠc=diag(nc,1,nc,2,,nc,K)\Pi^{\prime}_{c}\Pi_{c}=\mathrm{diag}(n_{c,1},n_{c,2},\ldots,n_{c,K}), we have BcBc=diag(1nc,1,1nc,2,,1nc,K)B_{c}B^{\prime}_{c}=\mathrm{diag}(\frac{1}{n_{c,1}},\frac{1}{n_{c,2}},\ldots,\frac{1}{n_{c,K}}). When Kr=Kc=KK_{r}=K_{c}=K, we have Bc(k,:)Bc(l,:)=0B_{c}(k,:)B^{\prime}_{c}(l,:)=0 for any klk\neq l and k,l[K]k,l\in[K]. Then ,we have BcBc=diag(Bc(1,:)F2,Bc(2,:)F2,,Bc(K,:)F2)=diag(1nc,1,1nc,2,,1nc,K)B_{c}B^{\prime}_{c}=\mathrm{diag}(\|B_{c}(1,:)\|^{2}_{F},\|B_{c}(2,:)\|^{2}_{F},\ldots,\|B_{c}(K,:)\|^{2}_{F})=\mathrm{diag}(\frac{1}{n_{c,1}},\frac{1}{n_{c,2}},\ldots,\frac{1}{n_{c,K}}) and the lemma follows.

Note that when $K_{r}<K_{c}$, $B_{c}$ is no longer of full row rank, so we cannot obtain $\Pi^{\prime}_{c}\Pi_{c}=(B_{c}B^{\prime}_{c})^{-1}$ from $I_{K_{r}}=B^{\prime}_{c}\Pi^{\prime}_{c}\Pi_{c}B_{c}$. Therefore, when $K_{r}<K_{c}$, the equality $\|B_{c}(k,:)-B_{c}(l,:)\|_{F}=\sqrt{\frac{1}{n_{c,k}}+\frac{1}{n_{c,l}}}$ is no longer guaranteed for $k\neq l$. In this case we only know that $U_{c}$ has $K_{c}$ distinct rows, but we have no control on the minimum distance between any two distinct rows of $U_{c}$.
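The structure established in Lemma 3 is easy to check numerically. Below is a minimal sketch in Python (assuming numpy is available; the sizes $n_{r},n_{c},K$ and the matrix $P$ are illustrative choices of ours, not settings used in this paper) that builds $\Omega=\Pi_{r}P\Pi^{\prime}_{c}$ with $K_{r}=K_{c}=K$, takes its compact SVD, and verifies both $U_{c}=\Pi_{c}B_{c}$ and $\|B_{c}(k,:)-B_{c}(l,:)\|_{F}=\sqrt{\frac{1}{n_{c,k}}+\frac{1}{n_{c,l}}}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_c, K = 60, 90, 3

Pi_r = rng.dirichlet(np.ones(K), size=n_r)   # mixed row memberships, rows sum to 1
Pi_r[:K] = np.eye(K)                         # ensure pure row nodes (rank K)
labels = rng.integers(0, K, size=n_c)        # pure column nodes
labels[:K] = np.arange(K)                    # ensure every column community is nonempty
Pi_c = np.eye(K)[labels]
n_ck = Pi_c.sum(axis=0)                      # community sizes n_{c,k}

P = np.array([[0.9, 0.2, 0.1],
              [0.2, 0.8, 0.3],
              [0.1, 0.3, 0.7]])              # full-rank K x K mixing matrix
Omega = Pi_r @ P @ Pi_c.T

# Compact SVD: only the top K singular values of Omega are nonzero.
_, _, VT = np.linalg.svd(Omega, full_matrices=False)
U_c = VT[:K].T

reps = [np.flatnonzero(labels == k)[0] for k in range(K)]
B_c = U_c[reps]                              # B_c = U_c restricted to representatives
assert np.allclose(U_c, Pi_c @ B_c)          # Lemma 3: U_c = Pi_c B_c

for k in range(K):
    for l in range(k + 1, K):
        print(np.linalg.norm(B_c[k] - B_c[l]),
              np.sqrt(1 / n_ck[k] + 1 / n_ck[l]))   # the two columns agree
```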

B.3 Proof of Theorem 6

Proof  For row nodes, when the conditions in Lemma 5 hold, by Theorem 2 of Qing and Wang (2021a), with probability at least $1-o((n_{r}+n_{c})^{-\alpha})$ for any $\alpha>0$, there exists a permutation matrix $\mathcal{P}_{r}$ such that, for $i_{r}\in[n_{r}]$, we have

eir(Π^rΠr𝒫r)1=O(ϖκ(ΠrΠr)Krλ1(ΠrΠr)).\displaystyle\|e^{\prime}_{i_{r}}(\hat{\Pi}_{r}-\Pi_{r}\mathcal{P}_{r})\|_{1}=O(\varpi\kappa(\Pi^{\prime}_{r}\Pi_{r})K_{r}\sqrt{\lambda_{1}(\Pi^{\prime}_{r}\Pi_{r})}).

Next, we focus on column nodes. By the proof of Lemma 2.3 of Qing and Wang (2021b), there exists an orthogonal matrix O^\hat{O} such that

U^cO^UcF22KrAΩλKr(ΩΩ).\displaystyle\|\hat{U}_{c}\hat{O}-U_{c}\|_{F}\leq\frac{2\sqrt{2K_{r}}\|A-\Omega\|}{\sqrt{\lambda_{K_{r}}(\Omega^{\prime}\Omega)}}. (7)

Under ONMnr,nc(Kr,Kc,P,Πr,Πc)ONM_{n_{r},n_{c}}(K_{r},K_{c},P,\Pi_{r},\Pi_{c}), by Lemma 10 of Qing and Wang (2021a), we have

λKr(ΩΩ)ρσKr(P~)σKr(Πr)σKr(Πc).\displaystyle\sqrt{\lambda_{K_{r}}(\Omega^{\prime}\Omega)}\geq\rho\sigma_{K_{r}}(\tilde{P})\sigma_{K_{r}}(\Pi_{r})\sigma_{K_{r}}(\Pi_{c}). (8)

Since all column nodes are pure, σKr(Πc)=nc,Kr\sigma_{K_{r}}(\Pi_{c})=\sqrt{n_{c,K_{r}}}. By Lemma 3 of Qing and Wang (2021a), when Assumption (4) holds, with probability at least 1o((nr+nc)α)1-o((n_{r}+n_{c})^{-\alpha}), we have

AΩ=O(ρmax(nr,nc)log(nr+nc)).\displaystyle\|A-\Omega\|=O(\sqrt{\rho\mathrm{max}(n_{r},n_{c})\mathrm{log}(n_{r}+n_{c})}). (9)

Substituting the two bounds in Eqs (8) and (9) into Eq (7), we have

U^cO^UcFCKrmax(nr,nc)log(nr+nc)σKr(P~)ρσKr(Πr)nc,Kr.\displaystyle\|\hat{U}_{c}\hat{O}-U_{c}\|_{F}\leq C\frac{\sqrt{K_{r}\mathrm{max}(n_{r},n_{c})\mathrm{log}(n_{r}+n_{c})}}{\sigma_{K_{r}}(\tilde{P})\sqrt{\rho}\sigma_{K_{r}}(\Pi_{r})\sqrt{n_{c,K_{r}}}}. (10)

Let $\varsigma>0$ be a small quantity. By Lemma 2 of Joseph and Yu (2016), if

KcςUcU^cO^F(1nc,k+1nc,l)Bc(k,:)Bc(l,:)F,foreach1klKc,\displaystyle\frac{\sqrt{K_{c}}}{\varsigma}\|U_{c}-\hat{U}_{c}\hat{O}\|_{F}(\frac{1}{\sqrt{n_{c,k}}}+\frac{1}{\sqrt{n_{c,l}}})\leq\|B_{c}(k,:)-B_{c}(l,:)\|_{F},\mathrm{~{}for~{}each~{}}1\leq k\neq l\leq K_{c}, (11)

then the clustering error f^c=O(ς2)\hat{f}_{c}=O(\varsigma^{2}). Recall that we set δc=minklBc(k,:)Bc(l,:)F\delta_{c}=\mathrm{min}_{k\neq l}\|B_{c}(k,:)-B_{c}(l,:)\|_{F} to measure the minimum center separation of BcB_{c}. Setting ς=2δcKcnc,minUcU^cO^F\varsigma=\frac{2}{\delta_{c}}\sqrt{\frac{K_{c}}{n_{c,\mathrm{min}}}}\|U_{c}-\hat{U}_{c}\hat{O}\|_{F} makes Eq (11) hold for all 1klKc1\leq k\neq l\leq K_{c}. Then we have f^c=O(ς2)=O(KcUcU^cO^F2δc2nc,min)\hat{f}_{c}=O(\varsigma^{2})=O(\frac{K_{c}\|U_{c}-\hat{U}_{c}\hat{O}\|^{2}_{F}}{\delta^{2}_{c}n_{c,\mathrm{min}}}). By Eq (10), we have

f^c=O(KrKcmax(nr,nc)log(nr+nc)σKr2(P~)ρδc2σKr2(Πr)nc,Krnc,min).\displaystyle\hat{f}_{c}=O(\frac{K_{r}K_{c}\mathrm{max}(n_{r},n_{c})\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K_{r}}(\tilde{P})\rho\delta^{2}_{c}\sigma^{2}_{K_{r}}(\Pi_{r})n_{c,K_{r}}n_{c,\mathrm{min}}}).

In particular, when $K_{r}=K_{c}=K$, Lemma 3 gives $\delta_{c}\geq\sqrt{\frac{2}{n_{c,\mathrm{max}}}}$ under $ONM_{n_{r},n_{c}}(K_{r},K_{c},P,\Pi_{r},\Pi_{c})$, and hence

f^c=O(K2max(nr,nc)nc,maxlog(nr+nc)σK2(P~)ρσK2(Πr)nc,min2).\displaystyle\hat{f}_{c}=O(\frac{K^{2}\mathrm{max}(n_{r},n_{c})n_{c,\mathrm{max}}\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K}(\tilde{P})\rho\sigma^{2}_{K}(\Pi_{r})n^{2}_{c,\mathrm{min}}}).

 
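For intuition about the quantity $\hat{f}_{c}$ bounded above, the following minimal sketch samples $A$ from ONM and runs the column step analyzed in this proof via Lemma 2 of Joseph and Yu (2016): k-means on the rows of $\hat{U}_{c}$, the top-$K$ right singular vectors of $A$ (assuming numpy, scipy, and scikit-learn are available; all sizes and the matrix $P$ are illustrative choices of ours, not the paper's simulation settings):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_r, n_c, K = 600, 900, 3

Pi_r = rng.dirichlet(np.ones(K), size=n_r)   # mixed row memberships
Pi_r[:K] = np.eye(K)                         # one pure row node per row community
labels = rng.integers(0, K, size=n_c)        # pure column nodes
Pi_c = np.eye(K)[labels]
P = np.array([[0.9, 0.2, 0.1],
              [0.2, 0.8, 0.3],
              [0.1, 0.3, 0.7]])
Omega = Pi_r @ P @ Pi_c.T
A = rng.binomial(1, Omega)                   # A(i_r, i_c) ~ Bernoulli(Omega(i_r, i_c))

# Column step: k-means on the top-K right singular vectors of A.
_, _, VT = np.linalg.svd(A, full_matrices=False)
pred = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(VT[:K].T)

# Misclustering rate after the best permutation of estimated labels.
conf = np.zeros((K, K), dtype=int)
np.add.at(conf, (labels, pred), 1)
row, col = linear_sum_assignment(-conf)      # maximize correctly matched pairs
print("estimated f_c:", 1 - conf[row, col].sum() / n_c)
```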

B.4 Proof of Corollary 7

Proof  For row nodes, under the conditions of Corollary 7, we have

maxir[nr]eir(Π^rΠr𝒫r)1=O(ϖKrnrKr)=O(ϖKnr).\displaystyle\mathrm{max}_{i_{r}\in[n_{r}]}\|e^{\prime}_{i_{r}}(\hat{\Pi}_{r}-\Pi_{r}\mathcal{P}_{r})\|_{1}=O(\varpi K_{r}\sqrt{\frac{n_{r}}{K_{r}}})=O(\varpi\sqrt{Kn_{r}}).

Under the conditions of Corollary 7, $\kappa(\Omega)=O(1)$ and $\mu\leq C$ for some $C>0$ by the proof of Corollary 1 of Qing and Wang (2021a). Then, by Lemma 5, we have

ϖ\displaystyle\varpi =O(K(κ(Ω)max(nr,nc)μmin(nr,nc)+log(nr+nc))ρσK(P~)σK(Πr)nc,Kr)=O(K(Cmax(nr,nc)min(nr,nc)+log(nr+nc))ρσK(P~)σK(Πr)nc,min)\displaystyle=O(\frac{\sqrt{K}(\kappa(\Omega)\sqrt{\frac{\mathrm{max}(n_{r},n_{c})\mu}{\mathrm{min}(n_{r},n_{c})}}+\sqrt{\mathrm{log}(n_{r}+n_{c})})}{\sqrt{\rho}\sigma_{K}(\tilde{P})\sigma_{K}(\Pi_{r})\sqrt{n_{c,K_{r}}}})=O(\frac{\sqrt{K}(\sqrt{\frac{C\mathrm{max}(n_{r},n_{c})}{\mathrm{min}(n_{r},n_{c})}}+\sqrt{\mathrm{log}(n_{r}+n_{c})})}{\sqrt{\rho}\sigma_{K}(\tilde{P})\sigma_{K}(\Pi_{r})\sqrt{n_{c,\mathrm{min}}}})
=O(K1.5(Cmax(nr,nc)min(nr,nc)+log(nr+nc))σK(P~)ρnrnc),\displaystyle=O(\frac{K^{1.5}(\sqrt{\frac{C\mathrm{max}(n_{r},n_{c})}{\mathrm{min}(n_{r},n_{c})}}+\sqrt{\mathrm{log}(n_{r}+n_{c})})}{\sigma_{K}(\tilde{P})\sqrt{\rho n_{r}n_{c}}}),

which gives that

maxir[nr]eir(Π^rΠr𝒫r)1=O(K2(Cmax(nr,nc)min(nr,nc)+log(nr+nc))σK(P~)ρnc).\displaystyle\mathrm{max}_{i_{r}\in[n_{r}]}\|e^{\prime}_{i_{r}}(\hat{\Pi}_{r}-\Pi_{r}\mathcal{P}_{r})\|_{1}=O(\frac{K^{2}(\sqrt{\frac{C\mathrm{max}(n_{r},n_{c})}{\mathrm{min}(n_{r},n_{c})}}+\sqrt{\mathrm{log}(n_{r}+n_{c})})}{\sigma_{K}(\tilde{P})\sqrt{\rho n_{c}}}).

Note that, when $K_{r}<K_{c}$, we cannot conclude that $\mu\leq C$: in this case $B_{c}\in\mathbb{R}^{K_{c}\times K_{r}}$ has rank $K_{r}<K_{c}$, so the inverse of $B_{c}B^{\prime}_{c}$ does not exist. Therefore, Lemma 8 of Qing and Wang (2021a) does not apply, and we cannot obtain an upper bound of $\|U_{c}\|_{2\rightarrow\infty}$, hence no upper bound of $\mu$. This is the reason that we only consider the case $K_{r}=K_{c}$ for row nodes here.

For column nodes, under the conditions of Corollary 7, we have

f^c\displaystyle\hat{f}_{c} =O(KrKcmax(nr,nc)log(nr+nc)σKr2(P~)ρδc2σKr2(Πr)nc,Krnc,min)=O(KrKcmax(nr,nc)log(nr+nc)σKr2(P~)ρδc2(nr/Kr)(nc/Kc)(nc/Kc))\displaystyle=O(\frac{K_{r}K_{c}\mathrm{max}(n_{r},n_{c})\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K_{r}}(\tilde{P})\rho\delta^{2}_{c}\sigma^{2}_{K_{r}}(\Pi_{r})n_{c,K_{r}}n_{c,\mathrm{min}}})=O(\frac{K_{r}K_{c}\mathrm{max}(n_{r},n_{c})\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K_{r}}(\tilde{P})\rho\delta^{2}_{c}(n_{r}/K_{r})(n_{c}/K_{c})(n_{c}/K_{c})})
=O(Kr2Kc3max(nr,nc)log(nr+nc)σKr2(P~)ρδc2nrnc2).\displaystyle=O(\frac{K^{2}_{r}K^{3}_{c}\mathrm{max}(n_{r},n_{c})\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K_{r}}(\tilde{P})\rho\delta^{2}_{c}n_{r}n^{2}_{c}}).

For the special case Kr=Kc=KK_{r}=K_{c}=K, since nc,maxnc,min=O(1)\frac{n_{c,\mathrm{max}}}{n_{c,\mathrm{min}}}=O(1) when nc,min=O(ncK)n_{c,\mathrm{min}}=O(\frac{n_{c}}{K}), we have

f^c=O(K4max(nr,nc)log(nr+nc)σK2(P~)ρnrnc).\displaystyle\hat{f}_{c}=O(\frac{K^{4}\mathrm{max}(n_{r},n_{c})\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K}(\tilde{P})\rho n_{r}n_{c}}).

When nr=O(n),nc=O(n),Kr=O(1)n_{r}=O(n),n_{c}=O(n),K_{r}=O(1) and Kc=O(1)K_{c}=O(1), the corollary follows immediately by basic algebra.  
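As a quick sanity check on this rate, one can tabulate the dominant factor $\frac{K^{4}\mathrm{max}(n_{r},n_{c})\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K}(\tilde{P})\rho n_{r}n_{c}}$ for growing $n$. A minimal sketch (assuming numpy; the constants $K$, $\rho$, and $\sigma_{K}(\tilde{P})$ are illustrative placeholders of ours):

```python
import numpy as np

K, rho, sigma_K = 3, 0.5, 0.4            # illustrative placeholder constants
for n in [10**3, 10**4, 10**5, 10**6]:
    n_r = n_c = n                        # the regime n_r = O(n), n_c = O(n)
    bound = K**4 * max(n_r, n_c) * np.log(n_r + n_c) / (sigma_K**2 * rho * n_r * n_c)
    print(f"n = {n:>7d}: f_c bound ~ {bound:.2e}")   # decays like log(n)/n
```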

C Proofs under ODCNM

C.1 Proof of Proposition 9

Proof  Since $\Omega=\Pi_{r}P\Pi^{\prime}_{c}\Theta_{c}=\check{\Pi}_{r}\check{P}\check{\Pi}^{\prime}_{c}\check{\Theta}_{c}=U_{r}\Lambda U^{\prime}_{c}$, we have $U_{r}=\Pi_{r}U_{r}(\mathcal{I}_{r},:)=\check{\Pi}_{r}U_{r}(\mathcal{I}_{r},:)$ by Lemma 11, which gives $\Pi_{r}=\check{\Pi}_{r}$ since the inverse of $U_{r}(\mathcal{I}_{r},:)$ exists. Since $U_{c,*}=\Pi_{c}B_{c}=\Pi_{c}U_{c,*}(\mathcal{I}_{c},:)=\check{\Pi}_{c}U_{c,*}(\mathcal{I}_{c},:)$ by Lemma 11, we have $\Pi_{c}=\check{\Pi}_{c}$.

C.2 Proof of Lemma 11

Proof 

  • For $U_{r}$: since $\Omega=U_{r}\Lambda U^{\prime}_{c}$ and $U^{\prime}_{c}U_{c}=I_{K_{r}}$, we have $U_{r}=\Omega U_{c}\Lambda^{-1}$. Recalling that $\Omega=\Pi_{r}P\Pi^{\prime}_{c}\Theta_{c}$ under ODCNM, we have $U_{r}=\Pi_{r}P\Pi^{\prime}_{c}\Theta_{c}U_{c}\Lambda^{-1}=\Pi_{r}B_{r}$, where $B_{r}=P\Pi^{\prime}_{c}\Theta_{c}U_{c}\Lambda^{-1}$. Clearly, $U_{r}(i_{r},:)=U_{r}(\bar{i}_{r},:)$ holds when $\Pi_{r}(i_{r},:)=\Pi_{r}(\bar{i}_{r},:)$ for $i_{r},\bar{i}_{r}\in[n_{r}]$.

  • For $U_{c}$: let $D_{c}$ be a $K_{c}\times K_{c}$ diagonal matrix such that $D_{c}(k,k)=\frac{\|\Theta_{c}\Pi_{c}(:,k)\|_{F}}{\|\theta_{c}\|_{F}}$ for $k\in[K_{c}]$. Let $\Gamma_{c}$ be an $n_{c}\times K_{c}$ matrix such that $\Gamma_{c}(:,k)=\frac{\Theta_{c}\Pi_{c}(:,k)}{\|\Theta_{c}\Pi_{c}(:,k)\|_{F}}$ for $k\in[K_{c}]$. For such $D_{c}$ and $\Gamma_{c}$, we have $\Gamma^{\prime}_{c}\Gamma_{c}=I_{K_{c}}$ and $\Theta_{c}\Pi_{c}=\|\theta_{c}\|_{F}\Gamma_{c}D_{c}$, so that $\Omega=\Pi_{r}P\|\theta_{c}\|_{F}D_{c}\Gamma^{\prime}_{c}$.

    Since Ω=UrΛUc\Omega=U_{r}\Lambda U^{\prime}_{c} and UrUr=IKrU^{\prime}_{r}U_{r}=I_{K_{r}}, we have Uc=ΘcΠcPΠrUrΛ1U_{c}=\Theta_{c}\Pi_{c}P^{\prime}\Pi^{\prime}_{r}U_{r}\Lambda^{-1}. Since ΘcΠc=θcFΓcDc\Theta_{c}\Pi_{c}=\|\theta_{c}\|_{F}\Gamma_{c}D_{c}, we have Uc=ΓcθcFDcPΠrUrΛ1=ΓcVcU_{c}=\Gamma_{c}\|\theta_{c}\|_{F}D_{c}P^{\prime}\Pi^{\prime}_{r}U_{r}\Lambda^{-1}=\Gamma_{c}V_{c}, where we set Vc=θcFDcPΠrUrΛ1Kc×KrV_{c}=\|\theta_{c}\|_{F}D_{c}P^{\prime}\Pi^{\prime}_{r}U_{r}\Lambda^{-1}\in\mathbb{R}^{K_{c}\times K_{r}}. Note that since UcUc=IKr=VcΓcΓcVc=VcVcU^{\prime}_{c}U_{c}=I_{K_{r}}=V^{\prime}_{c}\Gamma^{\prime}_{c}\Gamma_{c}V_{c}=V^{\prime}_{c}V_{c}, we have VcVc=IKrV^{\prime}_{c}V_{c}=I_{K_{r}}. Now, for ic[nc],k[Kr]i_{c}\in[n_{c}],k\in[K_{r}], we have

    Uc(ic,k)\displaystyle U_{c}(i_{c},k) =eicUcek=eicΓcVcek=Γc(ic,:)Vcek\displaystyle=e^{\prime}_{i_{c}}U_{c}e_{k}=e^{\prime}_{i_{c}}\Gamma_{c}V_{c}e_{k}=\Gamma_{c}(i_{c},:)V_{c}e_{k}
    =θc(ic)[Πc(ic,1)ΘcΠc(:,1)FΠc(ic,2)ΘcΠc(:,2)FΠc(ic,Kc)ΘcΠc(:,Kc)F]Vcek\displaystyle=\theta_{c}(i_{c})[\frac{\Pi_{c}(i_{c},1)}{\|\Theta_{c}\Pi_{c}(:,1)\|_{F}}~{}~{}\frac{\Pi_{c}(i_{c},2)}{\|\Theta_{c}\Pi_{c}(:,2)\|_{F}}~{}~{}\ldots~{}~{}\frac{\Pi_{c}(i_{c},K_{c})}{\|\Theta_{c}\Pi_{c}(:,K_{c})\|_{F}}]V_{c}e_{k}
    =θc(ic)ΘcΠc(:,(ic))FVc((ic),k),\displaystyle=\frac{\theta_{c}(i_{c})}{\|\Theta_{c}\Pi_{c}(:,\ell(i_{c}))\|_{F}}V_{c}(\ell(i_{c}),k),

    which gives that

    \displaystyle U_{c}(i_{c},:)=\frac{\theta_{c}(i_{c})}{\|\Theta_{c}\Pi_{c}(:,\ell(i_{c}))\|_{F}}[V_{c}(\ell(i_{c}),1)~~V_{c}(\ell(i_{c}),2)~~\ldots~~V_{c}(\ell(i_{c}),K_{r})]=\frac{\theta_{c}(i_{c})}{\|\Theta_{c}\Pi_{c}(:,\ell(i_{c}))\|_{F}}V_{c}(\ell(i_{c}),:).

    Then we have

    Uc,(ic,:)=Vc((ic),:)Vc((ic),:)F.\displaystyle U_{c,*}(i_{c},:)=\frac{V_{c}(\ell(i_{c}),:)}{\|V_{c}(\ell(i_{c}),:)\|_{F}}. (12)

    Clearly, we have $U_{c,*}(i_{c},:)=U_{c,*}(\bar{i}_{c},:)$ when $\ell(i_{c})=\ell(\bar{i}_{c})$ for $i_{c},\bar{i}_{c}\in[n_{c}]$. Let $B_{c}\in\mathbb{R}^{K_{c}\times K_{r}}$ be such that $B_{c}(l,:)=\frac{V_{c}(l,:)}{\|V_{c}(l,:)\|_{F}}$ for $l\in[K_{c}]$. Eq (12) gives $U_{c,*}=\Pi_{c}B_{c}$, which guarantees the existence of $B_{c}$.

    Now we consider the case when Kr=Kc=KK_{r}=K_{c}=K. Since VcKc×KrV_{c}\in\mathbb{R}^{K_{c}\times K_{r}} and Uc=ΓcVcnc×KrU_{c}=\Gamma_{c}V_{c}\in\mathbb{R}^{n_{c}\times K_{r}}, we have VcK×KV_{c}\in\mathbb{R}^{K\times K} and rank(Vc)=K\mathrm{rank}(V_{c})=K. Since VcVc=IKrV^{\prime}_{c}V_{c}=I_{K_{r}}, we have VcVc=IKV^{\prime}_{c}V_{c}=I_{K} when Kr=Kc=KK_{r}=K_{c}=K. Then we have

    VcVc=IKVcVcVc=VcVc(VcVcIK)=0rank(Vc)=KVcVc=IK.\displaystyle V^{\prime}_{c}V_{c}=I_{K}\Rightarrow V^{\prime}_{c}V_{c}V^{\prime}_{c}=V^{\prime}_{c}\Rightarrow V^{\prime}_{c}(V_{c}V^{\prime}_{c}-I_{K})=0\overset{\mathrm{rank}(V_{c})=K}{\Rightarrow}V_{c}V^{\prime}_{c}=I_{K}. (13)

    Since VcVc=VcVc=IKV_{c}V^{\prime}_{c}=V^{\prime}_{c}V_{c}=I_{K}, we have Uc,(ic,:)=Vc((ic),:)U_{c,*}(i_{c},:)=V_{c}(\ell(i_{c}),:) by Eq (12), and Uc,(ic,:)Uc,(i¯c,:)F=Vc((ic),:)Vc((i¯c),:)F=2\|U_{c,*}(i_{c},:)-U_{c,*}(\bar{i}_{c},:)\|_{F}=\|V_{c}(\ell(i_{c}),:)-V_{c}(\ell(\bar{i}_{c}),:)\|_{F}=\sqrt{2} when (ic)(i¯c)\ell(i_{c})\neq\ell(\bar{i}_{c}) for ic,i¯c[nc]i_{c},\bar{i}_{c}\in[n_{c}], i.e., Bc(k,:)Bc(l,:)F=2\|B_{c}(k,:)-B_{c}(l,:)\|_{F}=\sqrt{2} for kl[K]k\neq l\in[K].

    Note that, when $K_{r}<K_{c}$, since $\mathrm{rank}(V_{c})=K_{r}$ and $V_{c}\in\mathbb{R}^{K_{c}\times K_{r}}$, $V_{c}$ is not invertible, so the last implication in Eq (13) does not hold and $\|B_{c}(k,:)-B_{c}(l,:)\|_{F}$ need not equal $\sqrt{2}$ for $k\neq l\in[K_{c}]$.

 
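As with Lemma 3, the conclusions of Lemma 11 can be verified numerically. A minimal sketch (assuming numpy; all parameter values are illustrative choices of ours) builds $\Omega=\Pi_{r}P\Pi^{\prime}_{c}\Theta_{c}$ with $K_{r}=K_{c}=K$, row-normalizes the right singular vectors to get $U_{c,*}$, and checks $U_{c,*}=\Pi_{c}B_{c}$ together with $\|B_{c}(k,:)-B_{c}(l,:)\|_{F}=\sqrt{2}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n_r, n_c, K = 60, 90, 3

Pi_r = rng.dirichlet(np.ones(K), size=n_r)
Pi_r[:K] = np.eye(K)
labels = rng.integers(0, K, size=n_c)
labels[:K] = np.arange(K)
Pi_c = np.eye(K)[labels]
theta_c = rng.uniform(0.2, 1.0, size=n_c)    # column degree heterogeneities
P = np.array([[0.9, 0.2, 0.1],
              [0.2, 0.8, 0.3],
              [0.1, 0.3, 0.7]])

Omega = Pi_r @ P @ (Pi_c * theta_c[:, None]).T   # Omega = Pi_r P Pi_c' Theta_c
_, _, VT = np.linalg.svd(Omega, full_matrices=False)
U_c = VT[:K].T
U_c_star = U_c / np.linalg.norm(U_c, axis=1, keepdims=True)  # row-normalize

reps = [np.flatnonzero(labels == k)[0] for k in range(K)]
B_c = U_c_star[reps]
assert np.allclose(U_c_star, Pi_c @ B_c)     # Lemma 11: U_{c,*} = Pi_c B_c
for k in range(K):
    for l in range(k + 1, K):
        print(np.linalg.norm(B_c[k] - B_c[l]))   # each equals sqrt(2) when K_r = K_c
```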

C.3 Proof of Theorem 14

Proof  For row nodes, when the conditions in Lemma 13 hold, by Theorem 2 of Qing and Wang (2021a), we have

maxir[nr]eir(Π^rΠr𝒫r)1=O(ϖκ(ΠrΠr)Krλ1(ΠrΠr)).\displaystyle\mathrm{max}_{i_{r}\in[n_{r}]}\|e^{\prime}_{i_{r}}(\hat{\Pi}_{r}-\Pi_{r}\mathcal{P}_{r})\|_{1}=O(\varpi\kappa(\Pi^{\prime}_{r}\Pi_{r})K_{r}\sqrt{\lambda_{1}(\Pi^{\prime}_{r}\Pi_{r})}).

Next, we focus on column nodes. By the proof of Lemma 2.3 of Qing and Wang (2021b), there exists an orthogonal matrix O^\hat{O} such that

U^cO^UcF22KrAΩλKr(ΩΩ).\displaystyle\|\hat{U}_{c}\hat{O}-U_{c}\|_{F}\leq\frac{2\sqrt{2K_{r}}\|A-\Omega\|}{\sqrt{\lambda_{K_{r}}(\Omega^{\prime}\Omega)}}. (14)

Under ODCNMnr,nc(Kr,Kc,P,Πr,Πc,Θc)ODCNM_{n_{r},n_{c}}(K_{r},K_{c},P,\Pi_{r},\Pi_{c},\Theta_{c}), by Lemma 4 of Qing (2021a), we have

λKr(ΩΩ)θc,minσKr(P)σKr(Πr)nc,Kr.\displaystyle\sqrt{\lambda_{K_{r}}(\Omega^{\prime}\Omega)}\geq\theta_{c,\mathrm{min}}\sigma_{K_{r}}(P)\sigma_{K_{r}}(\Pi_{r})\sqrt{n_{c,K_{r}}}. (15)

By Lemma 4.2 of Qing (2021a), when Assumption (12) holds, with probability at least 1o((nr+nc)α)1-o((n_{r}+n_{c})^{-\alpha}), we have

AΩ=O(max(θc,maxnr,θc1)log(nr+nc)).\displaystyle\|A-\Omega\|=O(\sqrt{\mathrm{max}(\theta_{c,\mathrm{max}}n_{r},\|\theta_{c}\|_{1})\mathrm{log}(n_{r}+n_{c})}). (16)

Substituting the two bounds in Eqs (15) and (16) into Eq (14), we have

U^cO^UcFCKrmax(θc,maxnr,θc1)log(nr+nc)σKr(P)θc,minσKr(Πr)nc,Kr.\displaystyle\|\hat{U}_{c}\hat{O}-U_{c}\|_{F}\leq C\frac{\sqrt{K_{r}\mathrm{max}(\theta_{c,\mathrm{max}}n_{r},\|\theta_{c}\|_{1})\mathrm{log}(n_{r}+n_{c})}}{\sigma_{K_{r}}(P)\theta_{c,\mathrm{min}}\sigma_{K_{r}}(\Pi_{r})\sqrt{n_{c,K_{r}}}}. (17)

For ic[nc]i_{c}\in[n_{c}], by basic algebra, we have

U^c,(ic,:)O^Uc,(ic,:)F2U^c(ic,:)O^Uc(ic,:)FUc(ic,:)F.\displaystyle\|\hat{U}_{c,*}(i_{c},:)\hat{O}-U_{c,*}(i_{c},:)\|_{F}\leq\frac{2\|\hat{U}_{c}(i_{c},:)\hat{O}-U_{c}(i_{c},:)\|_{F}}{\|U_{c}(i_{c},:)\|_{F}}.

Setting $m_{c}=\mathrm{min}_{1\leq i_{c}\leq n_{c}}\|U_{c}(i_{c},:)\|_{F}$, we have

U^c,O^Uc,F=ic=1ncU^c,(ic,:)O^Uc,(ic,:)F22U^cO^UcFmc.\displaystyle\|\hat{U}_{c,*}\hat{O}-U_{c,*}\|_{F}=\sqrt{\sum_{i_{c}=1}^{n_{c}}\|\hat{U}_{c,*}(i_{c},:)\hat{O}-U_{c,*}(i_{c},:)\|^{2}_{F}}\leq\frac{2\|\hat{U}_{c}\hat{O}-U_{c}\|_{F}}{m_{c}}.

Next, we provide a lower bound for $m_{c}$. By the proof of Lemma 11, we have

\displaystyle\|U_{c}(i_{c},:)\|_{F}=\|\frac{\theta_{c}(i_{c})}{\|\Theta_{c}\Pi_{c}(:,\ell(i_{c}))\|_{F}}V_{c}(\ell(i_{c}),:)\|_{F}=\frac{\theta_{c}(i_{c})}{\|\Theta_{c}\Pi_{c}(:,\ell(i_{c}))\|_{F}}\|V_{c}(\ell(i_{c}),:)\|_{F}\geq\frac{\theta_{c,\mathrm{min}}}{\theta_{c,\mathrm{max}}\sqrt{n_{c,\mathrm{max}}}}m_{V_{c}},

where we set $m_{V_{c}}=\mathrm{min}_{k\in[K_{c}]}\|V_{c}(k,:)\|_{F}$ and use $\|\Theta_{c}\Pi_{c}(:,\ell(i_{c}))\|_{F}\leq\theta_{c,\mathrm{max}}\sqrt{n_{c,\mathrm{max}}}$. Note that when $K_{r}=K_{c}=K$, by the proof of Lemma 11, we know that $V_{c}V^{\prime}_{c}=I_{K}$, which gives $\|V_{c}(k,:)\|_{F}=1$ for $k\in[K]$, i.e., $m_{V_{c}}=1$ when $K_{r}=K_{c}=K$. However, when $K_{r}<K_{c}$, it is challenging to obtain a positive lower bound of $m_{V_{c}}$. Hence, we have $\frac{1}{m_{c}}\leq\frac{\theta_{c,\mathrm{max}}\sqrt{n_{c,\mathrm{max}}}}{\theta_{c,\mathrm{min}}m_{V_{c}}}$. Then, by Eq (17), we have

U^c,O^Uc,F=O(θc,maxKrmax(θc,maxnr,θc1)nc,maxlog(nr+nc)σKr(P)θc,min2mVcσKr(Πr)nc,Kr).\displaystyle\|\hat{U}_{c,*}\hat{O}-U_{c,*}\|_{F}=O(\frac{\theta_{c,\mathrm{max}}\sqrt{K_{r}\mathrm{max}(\theta_{c,\mathrm{max}}n_{r},\|\theta_{c}\|_{1})n_{c,\mathrm{max}}\mathrm{log}(n_{r}+n_{c})}}{\sigma_{K_{r}}(P)\theta^{2}_{c,\mathrm{min}}m_{V_{c}}\sigma_{K_{r}}(\Pi_{r})\sqrt{n_{c,K_{r}}}}).

Let $\varsigma>0$ be a small quantity. By Lemma 2 of Joseph and Yu (2016), if

KcςUc,U^c,O^F(1nc,k+1nc,l)Bc(k,:)Bc(l,:)F,foreach1klKc,\displaystyle\frac{\sqrt{K_{c}}}{\varsigma}\|U_{c,*}-\hat{U}_{c,*}\hat{O}\|_{F}(\frac{1}{\sqrt{n_{c,k}}}+\frac{1}{\sqrt{n_{c,l}}})\leq\|B_{c}(k,:)-B_{c}(l,:)\|_{F},\mathrm{~{}for~{}each~{}}1\leq k\neq l\leq K_{c}, (18)

then the clustering error $\hat{f}_{c}=O(\varsigma^{2})$. Setting $\varsigma=\frac{2}{\delta_{c}}\sqrt{\frac{K_{c}}{n_{c,\mathrm{min}}}}\|U_{c,*}-\hat{U}_{c,*}\hat{O}\|_{F}$ makes Eq (18) hold for all $1\leq k\neq l\leq K_{c}$. Then we have $\hat{f}_{c}=O(\varsigma^{2})=O(\frac{K_{c}\|U_{c,*}-\hat{U}_{c,*}\hat{O}\|^{2}_{F}}{\delta^{2}_{c}n_{c,\mathrm{min}}})$. By the above bound on $\|\hat{U}_{c,*}\hat{O}-U_{c,*}\|_{F}$, we have

f^c=O(θc,max2KrKcmax(θc,maxnr,θc1)nc,maxlog(nr+nc)σKr2(P)θc,min4δc2mVc2σKr2(Πr)nc,Krnc,min).\displaystyle\hat{f}_{c}=O(\frac{\theta^{2}_{c,\mathrm{max}}K_{r}K_{c}\mathrm{max}(\theta_{c,\mathrm{max}}n_{r},\|\theta_{c}\|_{1})n_{c,\mathrm{max}}\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K_{r}}(P)\theta^{4}_{c,\mathrm{min}}\delta^{2}_{c}m^{2}_{V_{c}}\sigma^{2}_{K_{r}}(\Pi_{r})n_{c,K_{r}}n_{c,\mathrm{min}}}).

In particular, when $K_{r}=K_{c}=K$, Lemma 11 gives $\delta_{c}=\sqrt{2}$ under $ODCNM_{n_{r},n_{c}}(K_{r},K_{c},P,\Pi_{r},\Pi_{c},\Theta_{c})$ and $m_{V_{c}}=1$, and hence

f^c=O(θc,max2K2max(θc,maxnr,θc1)nc,maxlog(nr+nc)σK2(P)θc,min4σK2(Πr)nc,min2).\displaystyle\hat{f}_{c}=O(\frac{\theta^{2}_{c,\mathrm{max}}K^{2}\mathrm{max}(\theta_{c,\mathrm{max}}n_{r},\|\theta_{c}\|_{1})n_{c,\mathrm{max}}\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K}(P)\theta^{4}_{c,\mathrm{min}}\sigma^{2}_{K}(\Pi_{r})n^{2}_{c,\mathrm{min}}}).

 
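The row normalization of $\hat{U}_{c}$ used throughout this proof is the step that removes the column degree heterogeneity $\theta_{c}$ before clustering. A minimal sketch of this column step (assuming numpy and scikit-learn; the function name and the guard against zero rows are ours):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_columns(A, K_r, K_c):
    """Cluster column nodes: k-means on the row-normalized top-K_r
    right singular vectors of A, i.e., on the rows of the estimate of U_{c,*}."""
    _, _, VT = np.linalg.svd(A, full_matrices=False)
    U_c_hat = VT[:K_r].T                                  # n_c x K_r
    norms = np.linalg.norm(U_c_hat, axis=1, keepdims=True)
    U_c_star = U_c_hat / np.maximum(norms, 1e-12)         # guard against zero rows
    return KMeans(n_clusters=K_c, n_init=10, random_state=0).fit_predict(U_c_star)
```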

C.4 Proof of Corollary 15

Proof  For row nodes, under the conditions of Corollary 15, we have

maxir[nr]eir(Π^rΠr𝒫r)1=O(ϖKrnrKr)=O(ϖKnr).\displaystyle\mathrm{max}_{i_{r}\in[n_{r}]}\|e^{\prime}_{i_{r}}(\hat{\Pi}_{r}-\Pi_{r}\mathcal{P}_{r})\|_{1}=O(\varpi K_{r}\sqrt{\frac{n_{r}}{K_{r}}})=O(\varpi\sqrt{Kn_{r}}).

Under the conditions of Corollary 15, $\kappa(\Omega)=O(1)$ and $\mu\leq C\frac{\theta^{2}_{c,\mathrm{max}}}{\theta^{2}_{c,\mathrm{min}}}=O(1)$ for some $C>0$ by Lemma 2 of Qing (2021a). Then, by Lemma 13, we have

ϖ\displaystyle\varpi =O(θc,maxKr(κ(Ω)max(nr,nc)μmin(nr,nc)+log(nr+nc))θc,minσKr(P)σKr(Πr)nc,Kr)\displaystyle=O(\frac{\sqrt{\theta_{c,\mathrm{max}}K_{r}}(\kappa(\Omega)\sqrt{\frac{\mathrm{max}(n_{r},n_{c})\mu}{\mathrm{min}(n_{r},n_{c})}}+\sqrt{\mathrm{log}(n_{r}+n_{c})})}{\theta_{c,\mathrm{min}}\sigma_{K_{r}}(P)\sigma_{K_{r}}(\Pi_{r})\sqrt{n_{c,K_{r}}}})
=O(θc,maxK(κ(Ω)max(nr,nc)μmin(nr,nc)+log(nr+nc))θc,minσK(P)σK(Πr)nc,min)\displaystyle=O(\frac{\sqrt{\theta_{c,\mathrm{max}}K}(\kappa(\Omega)\sqrt{\frac{\mathrm{max}(n_{r},n_{c})\mu}{\mathrm{min}(n_{r},n_{c})}}+\sqrt{\mathrm{log}(n_{r}+n_{c})})}{\theta_{c,\mathrm{min}}\sigma_{K}(P)\sigma_{K}(\Pi_{r})\sqrt{n_{c,\mathrm{min}}}})
=O(K1.5θc,max(Cmax(nr,nc)min(nr,nc)+log(nr+nc))θc,minσK(P)nrnc),\displaystyle=O(\frac{K^{1.5}\sqrt{\theta_{c,\mathrm{max}}}(\sqrt{\frac{C\mathrm{max}(n_{r},n_{c})}{\mathrm{min}(n_{r},n_{c})}}+\sqrt{\mathrm{log}(n_{r}+n_{c})})}{\theta_{c,\mathrm{min}}\sigma_{K}(P)\sqrt{n_{r}n_{c}}}),

which gives that

maxir[nr]eir(Π^rΠr𝒫r)1=O(K2θc,max(Cmax(nr,nc)min(nr,nc)+log(nr+nc))θc,minσK(P)nc).\displaystyle\mathrm{max}_{i_{r}\in[n_{r}]}\|e^{\prime}_{i_{r}}(\hat{\Pi}_{r}-\Pi_{r}\mathcal{P}_{r})\|_{1}=O(\frac{K^{2}\sqrt{\theta_{c,\mathrm{max}}}(\sqrt{\frac{C\mathrm{max}(n_{r},n_{c})}{\mathrm{min}(n_{r},n_{c})}}+\sqrt{\mathrm{log}(n_{r}+n_{c})})}{\theta_{c,\mathrm{min}}\sigma_{K}(P)\sqrt{n_{c}}}).

The reason that we do not consider the case $K_{r}<K_{c}$ for row nodes is similar to that in the proof of Corollary 7, so we omit it here.

For column nodes, under the conditions of Corollary 15, we have

f^c\displaystyle\hat{f}_{c} =O(θc,max2KrKcmax(θc,maxnr,θc1)nc,maxlog(nr+nc)σKr2(P)θc,min4δc2mVc2σKr2(Πr)nc,Krnc,min)\displaystyle=O(\frac{\theta^{2}_{c,\mathrm{max}}K_{r}K_{c}\mathrm{max}(\theta_{c,\mathrm{max}}n_{r},\|\theta_{c}\|_{1})n_{c,\mathrm{max}}\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K_{r}}(P)\theta^{4}_{c,\mathrm{min}}\delta^{2}_{c}m^{2}_{V_{c}}\sigma^{2}_{K_{r}}(\Pi_{r})n_{c,K_{r}}n_{c,\mathrm{min}}})
=O(θc,max2Kr2Kc2max(θc,maxnr,θc1)log(nr+nc)σKr2(P)θc,min4δc2mVc2nrnc).\displaystyle=O(\frac{\theta^{2}_{c,\mathrm{max}}K^{2}_{r}K^{2}_{c}\mathrm{max}(\theta_{c,\mathrm{max}}n_{r},\|\theta_{c}\|_{1})\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K_{r}}(P)\theta^{4}_{c,\mathrm{min}}\delta^{2}_{c}m^{2}_{V_{c}}n_{r}n_{c}}).

For the case Kr=Kc=KK_{r}=K_{c}=K, we have

f^c\displaystyle\hat{f}_{c} =O(θc,max2K2max(θc,maxnr,θc1)nc,maxlog(nr+nc)σK2(P)θc,min4σK2(Πr)nc,min2)\displaystyle=O(\frac{\theta^{2}_{c,\mathrm{max}}K^{2}\mathrm{max}(\theta_{c,\mathrm{max}}n_{r},\|\theta_{c}\|_{1})n_{c,\mathrm{max}}\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K}(P)\theta^{4}_{c,\mathrm{min}}\sigma^{2}_{K}(\Pi_{r})n^{2}_{c,\mathrm{min}}})
=O(θc,max2K4max(θc,maxnr,θc1)log(nr+nc)σK2(P)θc,min4nrnc).\displaystyle=O(\frac{\theta^{2}_{c,\mathrm{max}}K^{4}\mathrm{max}(\theta_{c,\mathrm{max}}n_{r},\|\theta_{c}\|_{1})\mathrm{log}(n_{r}+n_{c})}{\sigma^{2}_{K}(P)\theta^{4}_{c,\mathrm{min}}n_{r}n_{c}}).

When nr=O(n),nc=O(n),Kr=O(1)n_{r}=O(n),n_{c}=O(n),K_{r}=O(1) and Kc=O(1)K_{c}=O(1), the corollary follows immediately by basic algebra.  

D The degree-corrected overlapping and nonoverlapping model

Here, we extend ONM by introducing degree heterogeneities for the row nodes, which have the overlapping property in the directed network $\mathcal{N}$. Let $\theta_{r}$ be an $n_{r}\times 1$ vector whose $i_{r}$-th entry is the degree heterogeneity of row node $i_{r}$, for $i_{r}\in[n_{r}]$. Let $\Theta_{r}$ be an $n_{r}\times n_{r}$ diagonal matrix whose $i_{r}$-th diagonal element is $\theta_{r}(i_{r})$. The extended model generates $A$ as follows:

Ω:=ΘrΠrPΠc,A(ir,ic)Bernoulli(Ω(ir,ic))forir[nr],ic[nc].\displaystyle\Omega:=\Theta_{r}\Pi_{r}P\Pi^{\prime}_{c},~{}~{}~{}A(i_{r},i_{c})\sim\mathrm{Bernoulli}(\Omega(i_{r},i_{c}))\qquad\mathrm{for~{}}i_{r}\in[n_{r}],i_{c}\in[n_{c}]. (19)
Definition 17

Call model (1), (2), (3), (4), (19) the Degree-Corrected Overlapping and Nonoverlapping model (DCONM) and denote it by $DCONM_{n_{r},n_{c}}(K_{r},K_{c},P,\Pi_{r},\Pi_{c},\Theta_{r})$.
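To make Definition 17 concrete, here is a minimal sketch of sampling a directed adjacency matrix $A$ from model (19) (assuming numpy; the sizes, $P$, and the range of $\theta_{r}$ are illustrative choices of ours). The sketch uses a full-rank $P$ with unit diagonal and places pure row nodes explicitly, matching the identifiability conditions stated below:

```python
import numpy as np

rng = np.random.default_rng(3)
n_r, n_c, K_r, K_c = 60, 90, 2, 3

Pi_r = rng.dirichlet(np.ones(K_r), size=n_r)     # overlapping row memberships
Pi_r[:K_r] = np.eye(K_r)                         # pure row nodes exist
labels = rng.integers(0, K_c, size=n_c)
labels[:K_c] = np.arange(K_c)                    # every column community nonempty
Pi_c = np.eye(K_c)[labels]                       # nonoverlapping column nodes

# Full-rank K_r x K_c matrix with P(k, k) = 1 for k in [K_r].
P = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4]])
theta_r = rng.uniform(0.3, 0.9, size=n_r)        # row degree heterogeneities

Omega = theta_r[:, None] * (Pi_r @ P @ Pi_c.T)   # Omega = Theta_r Pi_r P Pi_c'
assert Omega.max() <= 1                          # valid Bernoulli probabilities
A = rng.binomial(1, Omega)                       # model (19)
```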

The following conditions are sufficient for the identifiability of DCONM:

  • (II1) rank(P)=Kr,rank(Πr)=Kr,rank(Πc)=Kc\mathrm{rank}(P)=K_{r},\mathrm{rank}(\Pi_{r})=K_{r},\mathrm{rank}(\Pi_{c})=K_{c}, and P(k,k)=1P(k,k)=1 for k[Kr]k\in[K_{r}].

  • (II2) There is at least one pure row node for each of the KrK_{r} row communities.

For degree-corrected overlapping models, it is common to require that $P$ has unit diagonal elements for model identifiability; see the identifiability requirements of the DCMM model of Jin et al. (2017) and the OCCAM model of Zhang et al. (2020). Following a proof similar to that of Lemma 3, we obtain the following lemma.

Lemma 18

Under $DCONM_{n_{r},n_{c}}(K_{r},K_{c},P,\Pi_{r},\Pi_{c},\Theta_{r})$, there exist a unique $K_{r}\times K_{r}$ matrix $B_{r}$ and a unique $K_{c}\times K_{r}$ matrix $B_{c}$ such that

  • Ur=ΘrΠrBrU_{r}=\Theta_{r}\Pi_{r}B_{r} where Br=Θr1(r,r)Ur(r,:)B_{r}=\Theta^{-1}_{r}(\mathcal{I}_{r},\mathcal{I}_{r})U_{r}(\mathcal{I}_{r},:).

  • Uc=ΠcBcU_{c}=\Pi_{c}B_{c}. Meanwhile, Uc(ic,:)=Uc(i¯c,:)U_{c}(i_{c},:)=U_{c}(\bar{i}_{c},:) when (ic)=(i¯c)\ell(i_{c})=\ell(\bar{i}_{c}) for ic,i¯c[nc]i_{c},\bar{i}_{c}\in[n_{c}], i.e., UcU_{c} has KcK_{c} distinct rows. Furthermore, when Kr=Kc=KK_{r}=K_{c}=K, we have Bc(k,:)Bc(l,:)F=1nc,k+1nc,l\|B_{c}(k,:)-B_{c}(l,:)\|_{F}=\sqrt{\frac{1}{n_{c,k}}+\frac{1}{n_{c,l}}} for all 1k<lK1\leq k<l\leq K.

The following proposition guarantees the identifiability of DCONM.

Proposition 19

If conditions (II1) and (II2) hold, DCONM is identifiable: For eligible (P,Πr,Πc,Θr)(P,\Pi_{r},\Pi_{c},\Theta_{r}) and (Pˇ,Πˇr,Πˇc,Θˇr)(\check{P},\check{\Pi}_{r},\check{\Pi}_{c},\check{\Theta}_{r}), if ΘrΠrPΠc=ΘˇrΠˇrPˇΠˇc\Theta_{r}\Pi_{r}P\Pi^{\prime}_{c}=\check{\Theta}_{r}\check{\Pi}_{r}\check{P}\check{\Pi}^{\prime}_{c}, then P=Pˇ,Πr=Πˇr,Πc=Πˇc,Θr=ΘˇrP=\check{P},\Pi_{r}=\check{\Pi}_{r},\Pi_{c}=\check{\Pi}_{c},\Theta_{r}=\check{\Theta}_{r}.

Proof  By Lemma 18, since $U_{c}=\Pi_{c}B_{c}=\Pi_{c}U_{c}(\mathcal{I}_{c},:)=\check{\Pi}_{c}U_{c}(\mathcal{I}_{c},:)$, we have $\Pi_{c}=\check{\Pi}_{c}$. Since $\Omega(\mathcal{I}_{r},\mathcal{I}_{c})=\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})\Pi_{r}(\mathcal{I}_{r},:)P\Pi^{\prime}_{c}(\mathcal{I}_{c},:)=\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})P=U_{r}(\mathcal{I}_{r},:)\Lambda U^{\prime}_{c}(\mathcal{I}_{c},:)$, we have $\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})=\mathrm{diag}(U_{r}(\mathcal{I}_{r},:)\Lambda U^{\prime}_{c}(\mathcal{I}_{c},:))$ by the condition that $P(k,k)=1$ for $k\in[K_{r}]$. Therefore, we also have $\check{\Theta}_{r}(\mathcal{I}_{r},\mathcal{I}_{r})=\mathrm{diag}(U_{r}(\mathcal{I}_{r},:)\Lambda U^{\prime}_{c}(\mathcal{I}_{c},:))$, which gives $\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})=\check{\Theta}_{r}(\mathcal{I}_{r},\mathcal{I}_{r})$. Since $\check{\Theta}_{r}(\mathcal{I}_{r},\mathcal{I}_{r})\check{P}=U_{r}(\mathcal{I}_{r},:)\Lambda U^{\prime}_{c}(\mathcal{I}_{c},:)=\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})P$, we have $P=\check{P}$. By Lemma 18, since $U_{r}=\Theta_{r}\Pi_{r}\Theta^{-1}_{r}(\mathcal{I}_{r},\mathcal{I}_{r})U_{r}(\mathcal{I}_{r},:)=\check{\Theta}_{r}\check{\Pi}_{r}\check{\Theta}^{-1}_{r}(\mathcal{I}_{r},\mathcal{I}_{r})U_{r}(\mathcal{I}_{r},:)=\check{\Theta}_{r}\check{\Pi}_{r}\Theta^{-1}_{r}(\mathcal{I}_{r},\mathcal{I}_{r})U_{r}(\mathcal{I}_{r},:)$ and $U_{r}(\mathcal{I}_{r},:)$ is a nonsingular matrix, we have $\Theta_{r}\Pi_{r}=\check{\Theta}_{r}\check{\Pi}_{r}$. Since $\|\Pi_{r}(i_{r},:)\|_{1}=\|\check{\Pi}_{r}(i_{r},:)\|_{1}=1$ for $i_{r}\in[n_{r}]$, we have $\Pi_{r}=\check{\Pi}_{r}$ and $\Theta_{r}=\check{\Theta}_{r}$.
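The key identity in this proof, $\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})=\mathrm{diag}(U_{r}(\mathcal{I}_{r},:)\Lambda U^{\prime}_{c}(\mathcal{I}_{c},:))$, can also be checked numerically. A minimal self-contained sketch (assuming numpy; it reuses the illustrative DCONM setup from the previous sketch, with pure nodes placed first for convenience):

```python
import numpy as np

rng = np.random.default_rng(3)
n_r, n_c, K_r, K_c = 60, 90, 2, 3
Pi_r = rng.dirichlet(np.ones(K_r), size=n_r)
Pi_r[:K_r] = np.eye(K_r)                         # pure row nodes placed first
labels = rng.integers(0, K_c, size=n_c)
labels[:K_c] = np.arange(K_c)                    # pure column representatives first
Pi_c = np.eye(K_c)[labels]
P = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4]])                  # P(k, k) = 1 for k in [K_r]
theta_r = rng.uniform(0.3, 0.9, size=n_r)
Omega = theta_r[:, None] * (Pi_r @ P @ Pi_c.T)

U, s, VT = np.linalg.svd(Omega, full_matrices=False)
U_r, Lam, U_c = U[:, :K_r], np.diag(s[:K_r]), VT[:K_r].T
I_r, I_c = np.arange(K_r), np.arange(K_c)

# Proof identity: Theta_r(I_r, I_r) = diag(U_r(I_r, :) Lam U_c(I_c, :)').
M = U_r[I_r] @ Lam @ U_c[I_c].T                  # equals Theta_r(I_r, I_r) P
print(np.diag(M))                                # recovers theta_r on the pure rows
print(theta_r[:K_r])                             # matches, since P(k, k) = 1
```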

Remark 20

(The reason that we do not introduce a model extending ONM by considering degree heterogeneities for both row and column nodes) Suppose we propose an extension of ONM (call it the nontrivial-extension-of-ONM, ne-ONM for short) such that $\mathbb{E}[A]=\Omega=\Theta_{r}\Pi_{r}P\Pi^{\prime}_{c}\Theta_{c}$. If ne-ONM is identifiable, the following should hold: when $\Omega=\Theta_{r}\Pi_{r}P\Pi^{\prime}_{c}\Theta_{c}=\check{\Theta}_{r}\check{\Pi}_{r}\check{P}\check{\Pi}^{\prime}_{c}\check{\Theta}_{c}$, we have $\Theta_{r}=\check{\Theta}_{r},\Pi_{r}=\check{\Pi}_{r},P=\check{P},\Pi_{c}=\check{\Pi}_{c}$ and $\Theta_{c}=\check{\Theta}_{c}$. Now we check the identifiability of ne-ONM. Following the proof of Proposition 19, since $\Omega(\mathcal{I}_{r},\mathcal{I}_{c})=\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})\Pi_{r}(\mathcal{I}_{r},:)P\Pi^{\prime}_{c}(\mathcal{I}_{c},:)\Theta_{c}(\mathcal{I}_{c},\mathcal{I}_{c})=\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})P\Theta_{c}(\mathcal{I}_{c},\mathcal{I}_{c})=U_{r}(\mathcal{I}_{r},:)\Lambda U^{\prime}_{c}(\mathcal{I}_{c},:)$, we have $\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})P=U_{r}(\mathcal{I}_{r},:)\Lambda U^{\prime}_{c}(\mathcal{I}_{c},:)\Theta^{-1}_{c}(\mathcal{I}_{c},\mathcal{I}_{c})$. If we assume that $P(k,k)=1$ for $k\in[K_{r}]$, then $\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})=\mathrm{diag}(U_{r}(\mathcal{I}_{r},:)\Lambda U^{\prime}_{c}(\mathcal{I}_{c},:)\Theta^{-1}_{c}(\mathcal{I}_{c},\mathcal{I}_{c}))$. Similarly, $\check{\Theta}_{r}(\mathcal{I}_{r},\mathcal{I}_{r})=\mathrm{diag}(U_{r}(\mathcal{I}_{r},:)\Lambda U^{\prime}_{c}(\mathcal{I}_{c},:)\check{\Theta}^{-1}_{c}(\mathcal{I}_{c},\mathcal{I}_{c}))$, so it is impossible to guarantee $\Theta_{r}(\mathcal{I}_{r},\mathcal{I}_{r})=\check{\Theta}_{r}(\mathcal{I}_{r},\mathcal{I}_{r})$ unless we further assume that $\Theta_{c}(\mathcal{I}_{c},\mathcal{I}_{c})$ is a fixed matrix. However, when we fix $\Theta_{c}(\mathcal{I}_{c},\mathcal{I}_{c})$ so that ne-ONM is identifiable, the model becomes artificial precisely because $\Theta_{c}(\mathcal{I}_{c},\mathcal{I}_{c})$ is fixed in advance; the only natural choice is $\Theta_{c}(\mathcal{I}_{c},\mathcal{I}_{c})=I_{K_{c}}$, and for this choice ne-ONM is actually DCONM. This analysis explains why we do not extend ONM by considering $\Theta_{r}$ and $\Theta_{c}$ simultaneously.

Following an idea similar to that of Qing (2021a), we can design a spectral algorithm with consistent estimation to fit DCONM. Compared with ONM and ODCNM, however, the identifiability requirement of DCONM on $P$ is strict: DCONM can only model directed networks generated from a $P$ with unit diagonal elements. This is the reason we do not present DCONM in the main text or pursue further algorithmic and theoretical study of it.

References

  • Airoldi et al. (2008) Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014, 2008.
  • Airoldi et al. (2013) Edoardo M. Airoldi, Xiaopei Wang, and Xiaodong Lin. Multi-way blockmodels for analyzing coordinated high-dimensional responses. The Annals of Applied Statistics, 7(4):2431–2457, 2013.
  • Erdős and Rényi (2011) P. Erdős and A. Rényi. On the evolution of random graphs. pages 38–82, 2011.
  • Gillis and Vavasis (2015) Nicolas Gillis and Stephen A. Vavasis. Semidefinite programming based preconditioning for more robust near-separable nonnegative matrix factorization. SIAM Journal on Optimization, 25(1):677–698, 2015.
  • Holland et al. (1983) Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
  • Jin (2015) Jiashun Jin. Fast community detection by SCORE. Annals of Statistics, 43(1):57–89, 2015.
  • Jin et al. (2017) Jiashun Jin, Zheng Tracy Ke, and Shengming Luo. Estimating network memberships by simplex vertex hunting. arXiv: Methodology, 2017.
  • Joseph and Yu (2016) Antony Joseph and Bin Yu. Impact of regularization on spectral clustering. Annals of Statistics, 44(4):1765–1791, 2016.
  • Karrer and Newman (2011) Brian Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):16107, 2011.
  • Lei and Rinaldo (2015) Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. Annals of Statistics, 43(1):215–237, 2015.
  • Mao et al. (2018) Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti. Overlapping clustering models, and one (class) svm to bind them all. In Advances in Neural Information Processing Systems, volume 31, pages 2126–2136, 2018.
  • Mao et al. (2020) Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti. Estimating mixed memberships with sharp eigenvector deviations. Journal of the American Statistical Association, pages 1–13, 2020.
  • Qin and Rohe (2013) Tai Qin and Karl Rohe. Regularized spectral clustering under the degree-corrected stochastic blockmodel. Advances in Neural Information Processing Systems 26, pages 3120–3128, 2013.
  • Qing (2021a) Huan Qing. Directed degree corrected mixed membership model and estimating community memberships in directed networks. arXiv preprint arXiv:2109.07826, 2021a.
  • Qing (2021b) Huan Qing. A useful criterion on studying consistent estimation in community detection. arXiv preprint arXiv:2109.14950, 2021b.
  • Qing and Wang (2021a) Huan Qing and Jingli Wang. Directed mixed membership stochastic blockmodel. arXiv preprint arXiv:2101.02307, 2021a.
  • Qing and Wang (2021b) Huan Qing and Jingli Wang. Consistency of spectral clustering for directed network community detection. arXiv preprint arXiv:2109.10319, 2021b.
  • Rohe et al. (2011) Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Annals of Statistics, 39(4):1878–1915, 2011.
  • Rohe et al. (2016) Karl Rohe, Tai Qin, and Bin Yu. Co-clustering directed graphs to discover asymmetries and directional communities. Proceedings of the National Academy of Sciences of the United States of America, 113(45):12679–12684, 2016.
  • Wang et al. (2020) Zhe Wang, Yingbin Liang, and Pengsheng Ji. Spectral algorithms for community detection in directed networks. Journal of Machine Learning Research, 21:1–45, 2020.
  • Zhang et al. (2020) Yuan Zhang, Elizaveta Levina, and Ji Zhu. Detecting overlapping communities in networks using spectral methods. SIAM Journal on Mathematics of Data Science, 2(2):265–283, 2020.
  • Zhou and A.Amini (2019) Zhixin Zhou and Arash A. Amini. Analysis of spectral clustering algorithms for community detection: the general bipartite setting. Journal of Machine Learning Research, 20(47):1–47, 2019.