
Review on Determining the Number of Communities in Network Data

Zhengyuan Du and Jason Cui
Abstract

This paper reviews statistical methods for hypothesis testing and clustering in network models. We analyze the method of Bickel and Sarkar (2016) for deriving the asymptotic null distribution of the largest eigenvalue, noting its slow convergence and the need for bootstrap corrections. The SCORE method of Jin (2015) and the NCV method of Chen and Lei (2018) are evaluated for their efficacy in clustering within Degree-Corrected Block Models, with NCV facing challenges due to its time-intensive nature. We suggest exploring eigenvector entry distributions as a potential efficiency improvement.

1 Introduction

Network-structured data and network analysis are garnering increasing attention across various fields. Research in this area often focuses on understanding the structure of network data, with significant implications for social sciences, biology, and statistics. The practical applications of this research profoundly impact our daily lives in multiple ways. For example, search engines utilize discoveries and tools from this field to analyze the relationships among various keywords. See Kolaczyk and Csárdi, (2014), Hevey, (2018), Sun and Han, (2013), and Berkowitz, (2013) for further reading.

Community detection is a major interest of network analysis. Consider an $n$-node (undirected) graph $(\mathcal{N},E)$, where $\mathcal{N}=\{1,2,\ldots,n\}$ is the set of nodes and $E$ is the set of edges. We assume that $\mathcal{N}$ can be partitioned into $K$ disjoint subsets or "communities". The community structure is represented by a vector $g=(g_{1},\ldots,g_{n})$ with $g_{i}\in\{1,\ldots,K\}$ being the community that node $i$ belongs to. Nodes in the same community tend to share more common characteristics. Various algorithms have been proposed to partition the nodes, but most of them require $K$ to be fixed a priori. How to determine $K$ remains an open problem.

Bickel and Sarkar, (2016) proposed a test statistic based on the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix. However, they only have theoretical results for testing the null hypothesis $H_{0}:K=1$. For testing $H_{0}:K=j$ with $j>1$, they need to iteratively split the network into small sub-networks, and their test statistic does not seem to work well in the latter setting. Compared to the stochastic block model (SBM), the degree-corrected block model (DCBM) proposed by Karrer and Newman (2011) is more flexible. Jin, (2015) proposed the spectral clustering on ratios-of-eigenvectors (SCORE) method to detect communities under the DCBM. The idea is to cluster on the entry-wise ratios of eigenvectors of the adjacency matrix; taking ratios eliminates the effect of degree heterogeneity. Ji et al., (2016) further applied SCORE to the practical problem of detecting potential communities in coauthorship and citation networks for statisticians. However, there is little work on hypothesis testing for $K$ specifically under the DCBM. Instead of performing hypothesis testing, Chen and Lei, (2018) proposed a specially designed cross-validation method to determine $K$. In the following, we give a brief review and replicate the numerical results of some of the aforementioned papers.

2 Bickel and Sarkar, (2016)

In an SBM with $n$ nodes and $K$ communities, let $A$ be the adjacency matrix and $g=(g_{1},\ldots,g_{n})$, with $g_{i}\in\{1,\ldots,K\}$, the vector of community memberships. Given the membership vector $g$, each edge $A_{ij}$ $(i<j)$ is an independent Bernoulli variable satisfying

P\left(A_{ij}=1\right)=1-P\left(A_{ij}=0\right)=B_{g_{i}g_{j}}, \qquad (2.1)

where $B$ is a $K\times K$ symmetric matrix representing the community-wise edge probabilities.

Bickel and Sarkar, (2016) proposed a statistic for the testing problem with null hypothesis $H_{0}:K=1$ based on properties of the Erdős–Rényi graph. In their paper, they assume that the number of clusters $K$ and the edge probabilities are constant, whereas the number of nodes $n$ grows to $\infty$; thus the average degree grows linearly with $n$. In addition, Cragg and Donald, (1993), Cragg and Donald, (1997), Cui et al., (2024b), Cui et al., (2024a), Du, (2025) and Cui et al., (2023) discussed rank inference for a matrix, which can be viewed as a generalization of the problem studied by Bickel and Sarkar, (2016).

Note that the Erdős–Rényi graph can be viewed as a special case of the SBM when there is only one community and a single community-wise edge probability $p:=B_{11}$. We can estimate $p$ within $O_{P}(1/n)$ error by computing the proportion of pairs of nodes that form an edge, denoted by $\widehat{p}$.

Let $\widehat{P}:=n\widehat{p}\,\mathbf{e}\mathbf{e}^{\mathrm{T}}-\widehat{p}I$, which is an estimate of the mean of the adjacency matrix. Bickel and Sarkar, (2016) proved the following theorem:

Theorem 1.

(Bickel and Sarkar, (2016)) Let

\tilde{A}^{\prime}:=\frac{A-\widehat{P}}{\sqrt{(n-1)\widehat{p}(1-\widehat{p})}}. \qquad (2.2)

We have the following asymptotic distribution of the test statistic $\theta$:

\theta:=n^{2/3}\left\{\lambda_{1}\left(\tilde{A}^{\prime}\right)-2\right\}\stackrel{\mathrm{d}}{\rightarrow}\mathrm{TW}_{1}, \qquad (2.3)

where $\mathrm{TW}_{1}$ denotes the Tracy–Widom law with index 1. This is also the limiting law of the largest eigenvalue of GOE (Gaussian orthogonal ensemble) matrices.
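To make the statistic concrete, the following is a minimal sketch, assuming NumPy and a symmetric 0/1 adjacency matrix with zero diagonal, of how $\theta$ can be computed; the function name is ours.

```python
import numpy as np

def bs_test_statistic(A):
    """Compute theta = n^(2/3) * (lambda_1(A_tilde) - 2) from an adjacency matrix A."""
    n = A.shape[0]
    # proportion of node pairs that form an edge
    p_hat = A.sum() / (n * (n - 1))
    # estimated mean of A: p_hat off the diagonal, 0 on the diagonal
    P_hat = p_hat * (np.ones((n, n)) - np.eye(n))
    # centre and scale, then take the largest eigenvalue
    A_tilde = (A - P_hat) / np.sqrt((n - 1) * p_hat * (1 - p_hat))
    lam1 = np.linalg.eigvalsh(A_tilde)[-1]
    return n ** (2.0 / 3.0) * (lam1 - 2.0)
```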

Example 1. Here we perform a simulation study to examine the size of the test statistic. We construct an Erdős–Rényi graph with $n=500$, and $p$ ranges from 0.1 to 0.9 in steps of 0.1.

Figure 1: Empirical size of $\theta$ under 500 replications at significance level 0.05. The horizontal solid line represents the significance level 0.05.

We can see from the simulation study that the test cannot control the size. This is due to the slow convergence of the eigenvalues: whereas the largest eigenvalues of GOE matrices converge to the Tracy–Widom distribution quite quickly, those of adjacency matrices do not. Based on this, the authors proposed a small-sample correction, which computes the $p$-value using the empirical distribution of $\lambda_{1}$ generated by a parametric bootstrap step. The main idea is to compute the mean and the variance of the distribution from a few simulations, and then shift and scale the test statistic to match the first two moments of the limiting $\mathrm{TW}_{1}$ law. Table 1 shows the whole algorithm for performing the hypothesis test proposed in Bickel and Sarkar, (2016); a code sketch follows the table.

Table 1: Algorithm for the hypothesis test in Bickel and Sarkar, (2016) with correction

Step 1: $\widehat{p}\leftarrow\sum_{ij}A_{ij}/\{n(n-1)\}$
Step 2: $\theta\leftarrow n^{2/3}\left(\lambda_{1}\left[(A-\widehat{P})/\sqrt{(n-1)\widehat{p}(1-\widehat{p})}\right]-2\right)$
Step 3: $\mu_{\mathrm{TW}}\leftarrow E_{\mathrm{TW}_{1}}[X]$, $\sigma_{\mathrm{TW}}\leftarrow\sqrt{\operatorname{var}_{\mathrm{TW}_{1}}(X)}$
Step 4: for $i=1,\ldots,50$: $A_{i}\leftarrow$ Erdős–Rényi$(n,\widehat{p})$, $\theta_{i}\leftarrow n^{2/3}\left[\lambda_{1}\left((A_{i}-\widehat{P})/\sqrt{(n-1)\widehat{p}(1-\widehat{p})}\right)-2\right]$
Step 5: $\widehat{\mu}_{n,\widehat{p}}\leftarrow\operatorname{mean}(\{\theta_{i}\})$, $\widehat{\sigma}_{n,\widehat{p}}\leftarrow$ standard deviation$(\{\theta_{i}\})$
Step 6: $\theta^{\prime}\leftarrow\mu_{\mathrm{TW}}+\{(\theta-\widehat{\mu}_{n,\widehat{p}})/\widehat{\sigma}_{n,\widehat{p}}\}\,\sigma_{\mathrm{TW}}$
Step 7: pval $\leftarrow P_{\mathrm{TW}_{1}}(X>\theta^{\prime})$
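As a complement to Table 1, here is a minimal sketch of the corrected test, assuming NumPy. The $\mathrm{TW}_{1}$ moments are hard-coded to their approximate numerical values, `tw1_sf` is a user-supplied survival function of the $\mathrm{TW}_{1}$ law (e.g. from a table or a third-party package) and is an assumption of this sketch, and `bs_test_statistic` is the helper sketched above.

```python
import numpy as np

# Approximate mean and standard deviation of the TW_1 law
MU_TW = -1.2065
SIGMA_TW = np.sqrt(1.6078)

def er_adjacency(n, p, rng):
    """Symmetric Erdos-Renyi adjacency matrix without self-loops."""
    upper = np.triu(rng.binomial(1, p, size=(n, n)), 1)
    return upper + upper.T

def corrected_p_value(A, tw1_sf, n_boot=50, seed=0):
    """Parametric-bootstrap moment correction of theta (cf. Table 1)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    p_hat = A.sum() / (n * (n - 1))                      # Step 1
    theta = bs_test_statistic(A)                         # Step 2
    # Step 4: bootstrap replicates of theta under the fitted Erdos-Renyi model
    thetas = [bs_test_statistic(er_adjacency(n, p_hat, rng)) for _ in range(n_boot)]
    mu_hat = np.mean(thetas)                             # Step 5
    sigma_hat = np.std(thetas, ddof=1)
    theta_prime = MU_TW + (theta - mu_hat) / sigma_hat * SIGMA_TW   # Step 6
    return tw1_sf(theta_prime)                           # Step 7: P_TW1(X > theta')
```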

3 Spectral Clustering Method for DCBM

In this section we look at a spectral clustering method for the DCBM, the SCORE method proposed by Jin, (2015).

In a DCBM, given the membership vector $g$ and the community-wise connectivity matrix $B$, let $\psi$ and $\Psi$ be the $n\times 1$ vector and the $n\times n$ diagonal matrix defined as follows:

\psi=(\psi_{1},\psi_{2},\ldots,\psi_{n})^{\prime},\quad\Psi(i,i)=\psi(i),\quad 1\leq i\leq n.

The presence of an edge between nodes $i$ and $j$ is represented by a Bernoulli random variable $A_{ij}$ with

P\left(A_{ij}=1\right)=1-P\left(A_{ij}=0\right)=\psi_{i}\psi_{j}B_{g_{i}g_{j}}.

The main difference between the DCBM and the SBM is the degree heterogeneity parameter $\psi_{i}>0$, which represents the individual activeness of node $i$. The idea is that some individuals in a group may be more outgoing than others, so edge probabilities differ across individuals.
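To make the model concrete, here is a minimal sketch, assuming NumPy, of how one could sample a DCBM adjacency matrix; the function and argument names are ours.

```python
import numpy as np

def sample_dcbm(g, B, psi, seed=None):
    """Sample an undirected DCBM adjacency matrix without self-loops.

    g   : length-n integer array of community labels in {0, ..., K-1}
    B   : K x K symmetric community-wise connectivity matrix
    psi : length-n vector of degree heterogeneity parameters
    """
    rng = np.random.default_rng(seed)
    # edge probabilities P_ij = psi_i * psi_j * B_{g_i, g_j}
    P = np.outer(psi, psi) * B[np.ix_(g, g)]
    upper = np.triu(rng.binomial(1, np.clip(P, 0.0, 1.0)), 1)
    return upper + upper.T
```

Taking all $\psi_{i}$ equal recovers an ordinary SBM.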

While the traditional spectral clustering method for the SBM simply applies an existing clustering method to the eigenvectors of the adjacency matrix, detecting communities under the DCBM is not an easy problem; the main challenge lies in the degree heterogeneity.

The main observation of SCORE is that the effect of degree heterogeneity is largely ancillary and can be effectively removed by taking entry-wise ratios between eigenvectors. Let us first consider the simple case where there are only two groups. Specifically, let

P(A_{ij}=1)=\psi_{i}\psi_{j}\begin{cases}a,&g_{i}=g_{j}=1,\\ c,&g_{i}=g_{j}=2,\\ b,&\text{otherwise},\end{cases}

and denote $\Omega=E(A)$. We then have the following lemma:

Lemma 1.

(Jin, (2015)) If $ac\neq b^{2}$, then $\Omega$ has two simple nonzero eigenvalues

\frac{1}{2}\|\psi\|^{2}\left(ad_{1}^{2}+cd_{2}^{2}\pm\sqrt{\left(ad_{1}^{2}-cd_{2}^{2}\right)^{2}+4b^{2}d_{1}^{2}d_{2}^{2}}\right)

and the associated eigenvectors $\eta_{1}$ and $\eta_{2}$ (with possibly non-unit norms) are

\Psi\left(bd_{2}^{2}\cdot\mathbf{1}_{1}+\frac{1}{2}\left[cd_{2}^{2}-ad_{1}^{2}\pm\sqrt{\left(ad_{1}^{2}-cd_{2}^{2}\right)^{2}+4b^{2}d_{1}^{2}d_{2}^{2}}\right]\cdot\mathbf{1}_{2}\right),
where $\mathbf{1}_{1}$ and $\mathbf{1}_{2}$ denote the indicator vectors of the two communities.

The key observation is that if we let $r$ be the $n\times 1$ vector of coordinate-wise ratios between $\eta_{2}$ and $\eta_{1}$, with $r_{i}=\dfrac{\eta_{2i}/\left\|\eta_{2}\right\|}{\eta_{1i}/\left\|\eta_{1}\right\|}$ for $1\leq i\leq n$, then $r$ does not depend on the degree heterogeneity parameter $\psi$.

A similar phenomenon exists when there are $K>2$ groups.

Based on this observation, the SCORE clustering algorithm for the DCBM is as follows (a code sketch is given after the list):

  • Let $\widehat{\eta}_{1},\widehat{\eta}_{2},\ldots,\widehat{\eta}_{K}$ be $K$ unit-norm eigenvectors of $A$ associated with the $K$ largest eigenvalues (in magnitude), respectively.

  • Let $\widehat{R}$ be the $n\times(K-1)$ matrix of coordinate-wise ratios: $\widehat{R}(i,k)=\widehat{\eta}_{k+1,i}/\widehat{\eta}_{1,i}$, $1\leq k\leq K-1$, $1\leq i\leq n$.

  • Cluster the nodes by applying the $k$-means method to the rows of $\widehat{R}$, assuming there are $\leq K$ communities in total.
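The following is a minimal sketch of this procedure, assuming NumPy, SciPy and scikit-learn; it omits refinements such as truncating extreme ratios, and assumes $K\geq 2$.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def score(A, K):
    """SCORE: k-means on entry-wise ratios of the leading eigenvectors of A."""
    # K leading eigenvectors of A, ordered by eigenvalue magnitude
    vals, vecs = eigsh(A.astype(float), k=K, which="LM")
    vecs = vecs[:, np.argsort(-np.abs(vals))]
    # entry-wise ratios against the leading eigenvector cancel the psi_i factor
    R = vecs[:, 1:] / vecs[:, [0]]
    return KMeans(n_clusters=K, n_init=10).fit_predict(R)
```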

Example 2. Here we perform a simulation study to show the intuition of SCORE. We take $K=1$ and $K=2$ respectively to construct networks. In both cases we take $n=1000$ and generate $\psi$ by $\log(\psi_{i})\overset{i.i.d.}{\sim}N(0,0.2)$, $1\leq i\leq n$, and then normalize $\psi$ by $\psi=0.9\cdot\psi/\psi_{\max}$. For $K=2$, 250 nodes belong to one community and the remaining 750 to the other, with $B_{11}=B_{22}=1$ and $B_{12}=0.5$. The coordinate-wise plot is reported in Figure 2; when $K=1$, $\widehat{R}$ is still computed as the coordinate-wise ratio of $\eta_{2}$ and $\eta_{1}$. The patterns for $K=1$ and $K=2$ differ significantly: when $K=1$, the coordinates of $\widehat{R}$ gather around one constant, as shown before, whereas they settle on two levels rather than one when $K=2$.

Figure 2: The vector $\widehat{R}$. Left: the case $K=1$. Right: the case $K=2$.

4 Chen and Lei, (2018)

Chen and Lei, (2018) proposed a network cross-validation (NCV) approach to determine the number of communities for both the SBM and the DCBM. It automatically determines the number of communities based on a block-wise node-pair splitting technique. The main idea is to first randomly split the nodes into $V$ equal-sized subsets $\{\tilde{\mathcal{N}}_{v}:1\leq v\leq V\}$, and then split the adjacency matrix correspondingly into $V\times V$ equal-sized blocks:

A=\left(\tilde{A}^{(uv)}:1\leq u,v\leq V\right), \qquad (4.1)

where $\tilde{A}^{(uv)}$ is the submatrix of $A$ with rows in $\tilde{\mathcal{N}}_{u}$ and columns in $\tilde{\mathcal{N}}_{v}$. The blocks are split into fitting sets and validation sets. We can estimate the model parameters $(\widehat{g}^{(v)},\widehat{B}^{(v)})$ using, as the fitting set, the rectangular submatrix obtained by removing the rows of $A$ in the subset $\tilde{\mathcal{N}}_{v}$:

\tilde{A}^{(-v)}=\left(\tilde{A}^{(rs)}:r\neq v,\,1\leq r,s\leq V\right). \qquad (4.2)

Then $\tilde{A}^{(vv)}$ is treated as the validation set and used to calculate the predictive loss. Finally, $K$ is selected as the value that minimizes the loss. In general, cross-validation methods are insensitive to the number of folds, and the same intuition empirically holds for the proposed NCV method; the authors suggest using $V=3$.

Take $V=2$ for example. We can rearrange the adjacency matrix into a collapsed $2\times 2$ block form

A=\begin{pmatrix}A^{(11)}&A^{(12)}\\ A^{(21)}&A^{(22)}\end{pmatrix}.

Such a splitting puts the node pairs in $A^{(11)}$ and $A^{(12)}$ into the fitting sample and those in $A^{(22)}$ into the validation sample. The advantages of this kind of splitting are threefold. First, the fitting set carries full information about the network model parameters: we can consistently estimate the membership of all the nodes as well as the community-wise edge probability matrix using only data in the fitting set. Second, given the community membership, the data in the fitting set and in the testing set are independent. Third, it differs from cross-validation methods for network data based on a node-splitting technique, in which the nodes are split into a fitting set and a testing set and one typically assumes that the node memberships are generated independently with prior probability $\pi=(\pi_{1},\pi_{2},\cdots,\pi_{K})$. The main drawback of node splitting is that calculating the full likelihood in terms of the prior probability, in the presence of a missing membership vector $g$, is computationally demanding.

The details of their NCV algorithm are as follows; a code sketch is given after the description.

  • Step 1: Block-wise node-pair splitting as shown in (4.1).

  • Step 2: Estimating model parameters from the fitting set.

    Take SBM for example.

    1. Let $\widehat{U}$ be the $n\times d$ matrix consisting of the top $d$ right singular vectors of $A^{(1)}$. 2. Output $\widehat{g}$ by applying $k$-means clustering with $\widetilde{K}$ clusters to the rows of $\widehat{U}$.

    Once $\widehat{g}$ is obtained, let $\mathcal{N}_{j,k}$ be the set of nodes in $\tilde{\mathcal{N}}_{j}$ with estimated membership $k$, and $n_{j,k}=|\mathcal{N}_{j,k}|$ ($j=1,2$, $1\leq k\leq\widetilde{K}$). We can estimate $B$ using a simple plug-in estimator:

    \widehat{B}_{k,k^{\prime}}=\begin{cases}\dfrac{\sum_{i\in\mathcal{N}_{1,k},\,j\in\mathcal{N}_{1,k^{\prime}}\cup\mathcal{N}_{2,k^{\prime}}}A_{ij}}{n_{1,k}\left(n_{1,k^{\prime}}+n_{2,k^{\prime}}\right)},&k\neq k^{\prime},\\[2ex] \dfrac{\sum_{i,j\in\mathcal{N}_{1,k},\,i<j}A_{ij}+\sum_{i\in\mathcal{N}_{1,k},\,j\in\mathcal{N}_{2,k}}A_{ij}}{(n_{1,k}-1)n_{1,k}/2+n_{1,k}n_{2,k}},&k=k^{\prime}.\end{cases}
  • Step 3: Validation using the testing set. Consider the validated predictive loss $\widehat{L}(A,\widetilde{K})=\sum_{i,j\in\mathcal{N}_{2},\,i\neq j}\ell\left(A_{ij},\widehat{P}_{ij}\right)$, with negative log-likelihood loss $\ell(x,p)=-x\log p-(1-x)\log(1-p)$ or squared-error loss $\ell(x,p)=(x-p)^{2}$.

The algorithm for the DCBM is similar; the only difference is that in Step 2 we estimate the parameters $(g,B,\psi)$ under the DCBM framework.
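The following is a minimal, simplified sketch of NCV for the SBM, assuming NumPy and scikit-learn. It uses the squared-error loss, and the plug-in estimate of $B$ simply averages $A$ over the relevant node pairs, which is a mild simplification of the exact formula above; it is meant as an illustration rather than a faithful reimplementation of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def ncv_select_K(A, K_candidates, V=3, seed=0):
    """Network cross-validation for the SBM (simplified sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    folds = rng.integers(0, V, size=n)          # roughly equal random split of the nodes
    losses = dict.fromkeys(K_candidates, 0.0)
    off_diag = ~np.eye(n, dtype=bool)
    for v in range(V):
        held = folds == v                        # validation nodes
        fit = ~held                              # rows kept in the fitting set
        for K in K_candidates:
            # Step 2a: spectral clustering on the rectangular fitting submatrix
            _, _, Vt = np.linalg.svd(A[fit, :].astype(float), full_matrices=False)
            U_hat = Vt[:K, :].T                  # top-K right singular vectors (n x K)
            g_hat = KMeans(n_clusters=K, n_init=10).fit_predict(U_hat)
            # Step 2b: plug-in estimate of B from pairs with a row in the fitting set
            B_hat = np.zeros((K, K))
            for k in range(K):
                for l in range(K):
                    mask = np.outer(fit & (g_hat == k), g_hat == l) & off_diag
                    B_hat[k, l] = A[mask].mean() if mask.any() else 0.0
            # Step 3: squared-error predictive loss on the held-out block
            P_hat = B_hat[np.ix_(g_hat, g_hat)]
            val = np.outer(held, held) & off_diag
            losses[K] += np.sum((A[val] - P_hat[val]) ** 2)
    return min(losses, key=losses.get)
```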

Example 3. We replicate one of the simulation results in this paper. In the simulation study, the community-wise edge probability matrix is $B=rB_{0}$, where the diagonal entries of $B_{0}$ are 3 and the off-diagonal entries are 1, with $n=1000$ and $K=2$. The sparsity levels are chosen as $r\in\{0.01,0.02,0.05,0.1,0.2\}$. The size of the first community is $n_{1}$ and the size of the second community is $n-n_{1}$. Figure 3 shows the simulation results from 200 simulated data sets.

Figure 3: Simulation comparison. Left: our simulation result. Right: simulation result from Chen and Lei, (2018).

We can see that there is a slight difference between our simulation result and that from the paper. The main difference is that when $n_{1}=200$ and the sparsity level is $0.05$, we only observed a correct-selection percentage of 0.12, which differs from the value reported by the authors. The discrepancy might come from the different choice of clustering method in Step 2 of the algorithm.

The authors also gave theoretical properties for NCV under the SBM. They first introduce two notions of community recovery consistency. One is exactly consistent recovery: given a sequence of SBMs with $K$ blocks parameterized by $(g^{(n)},B^{(n)})$, we call a community recovery method $\widehat{g}$ exactly consistent if $P(\widehat{g}(A,K)=g^{(n)})\rightarrow 1$, where $A$ is a realization of the SBM $(g^{(n)},B^{(n)})$ and the equality is up to a possible label permutation. The other is approximately consistent recovery: for a sequence of SBMs with $K$ blocks parameterized by $(g^{(n)},B^{(n)})$ and a sequence $\eta_{n}=o(1)$, we say $\widehat{g}$ is approximately consistent with rate $\eta_{n}$ if

\lim_{n\rightarrow\infty}P\left[\operatorname{Ham}(\widehat{g}(A,K),g)\geq\eta_{n}n\right]=0,

where $\operatorname{Ham}(\widehat{g},g)$ is the smallest Hamming distance between $\widehat{g}$ and $g$ among all possible label permutations.

NCV is proved to be approximately consistent with rate $(n\rho_{n})^{-1}$, provided $\rho_{n}n/\log(n)>C$ for some constant $C$; see Theorem 2 in Chen and Lei, (2018). Therefore it also has the community recovery consistency property.

5 Conclusion

In this paper, we first examine the statistical approach proposed by Bickel and Sarkar, (2016) for hypothesis testing under the null hypothesis $H_{0}:K=1$. They derived the asymptotic null distribution of the largest eigenvalue of a suitably scaled and centered adjacency matrix. However, this method exhibits a slow convergence rate, necessitating bootstrap corrections in practical applications. Subsequently, we explore the SCORE method introduced by Jin, (2015), which is designed for clustering under the Degree-Corrected Block Model (DCBM). This method effectively mitigates the impact of degree heterogeneity by utilizing the coordinate ratios of the eigenvectors of the adjacency matrix. Additionally, Chen and Lei, (2018) introduced the Network Cross-Validation (NCV) method to automate the selection of $K$. This method demonstrates robust performance on both the Stochastic Block Model (SBM) and the DCBM, with accompanying consistency guarantees. However, its primary drawback is its time-intensive nature, a challenge common to all methods based on cross-validation. Optimizing this aspect could involve identifying potential patterns in the distribution of each element, or the entry-wise ratio, of the eigenvectors, which might streamline the process.

References

  • Berkowitz, (2013) Berkowitz, S. D. (2013). An introduction to structural analysis: The network approach to social research. Elsevier.
  • Bickel and Sarkar, (2016) Bickel, P. J. and Sarkar, P. (2016). Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(1):253–273.
  • Chen and Lei, (2018) Chen, K. and Lei, J. (2018). Network cross-validation for determining the number of communities in network data. Journal of the American Statistical Association, 113(521):241–251.
  • Cragg and Donald, (1993) Cragg, J. G. and Donald, S. G. (1993). Testing identifiability and specification in instrumental variable models. Econometric Theory, 9(2):222–240.
  • Cragg and Donald, (1997) Cragg, J. G. and Donald, S. G. (1997). Inferring the rank of a matrix. Journal of Econometrics, 76(1-2):223–250.
  • Cui et al., (2024a) Cui, S., Guo, X., Li, R., Yang, S., and Zhang, Z. (2024a). Double-estimation-friendly inference for high dimensional misspecified measurement error models. arXiv preprint arXiv:2409.16463.
  • Cui et al., (2024b) Cui, S., Li, D., Li, R., and Xue, L. (2024b). Hypothesis testing for high-dimensional matrix-valued data. arXiv preprint arXiv:2412.07987.
  • Cui et al., (2023) Cui, S., Sudjianto, A., Zhang, A., and Li, R. (2023). Enhancing robustness of gradient-boosted decision trees through one-hot encoding and regularization. arXiv preprint arXiv:2304.13761.
  • Du, (2025) Du, K. (2025). A short note of comparison between convex and non-convex penalized likelihood. arXiv preprint arXiv:2502.07655.
  • Hevey, (2018) Hevey, D. (2018). Network analysis: a brief overview and tutorial. Health Psychology and Behavioral Medicine, 6(1):301–328.
  • Ji et al., (2016) Ji, P., Jin, J., et al. (2016). Coauthorship and citation networks for statisticians. The Annals of Applied Statistics, 10(4):1779–1812.
  • Jin, (2015) Jin, J. (2015). Fast community detection by SCORE. The Annals of Statistics, 43(1):57–89.
  • Kolaczyk and Csárdi, (2014) Kolaczyk, E. D. and Csárdi, G. (2014). Statistical analysis of network data with R, volume 65. Springer.
  • Sun and Han, (2013) Sun, Y. and Han, J. (2013). Mining heterogeneous information networks: a structural analysis approach. ACM SIGKDD Explorations Newsletter, 14(2):20–28.