
Lost in the Shuffle: Testing Power in the Presence of Errorful Network Vertex Labels

Ayushi Saxena and Vince Lyzinski, Department of Mathematics, University of Maryland
Abstract

Two-sample network hypothesis testing is an important inference task with applications across diverse fields such as medicine, neuroscience, and sociology. Many of these testing methodologies operate under the implicit assumption that the vertex correspondence across networks is a priori known. This assumption is often untrue, and the power of the subsequent test can degrade when there are misaligned/label-shuffled vertices across networks. This power loss due to shuffling is theoretically explored in the context of random dot product and stochastic block model networks for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices. The loss in testing power is further reinforced by numerous simulations and experiments, both in the stochastic block model and in the random dot product graph model, where the power loss across multiple recently proposed tests in the literature is considered. Lastly, the impact that shuffling can have in real-data testing is demonstrated in a pair of examples from neuroscience and from social network analysis.

1 Introduction

Interest in graph and network-valued data has soared in the last decades [1], as networks have become a common data type for modeling complex dependencies and interactions in many fields of study, ranging from neuroscience [2, 3, 4] to sociology [5, 6] to biochemistry [7, 8], among others. As network data has become more commonplace, a number of statistical tools tailored for handling network data have been developed [9, 10], including methods for hypothesis testing [11, 12, 13], goodness-of-fit analysis [14, 15, 16], clustering [17, 18, 19, 20, 21, 22], and classification [23, 24, 25, 26], among others.

While early work on network inference often considered a single network as the data set, there has been a relatively recent proliferation of datasets that consist of multiple networks on the same set of vertices (which we will refer to as the “paired” setting) and multiple networks on differing vertex sets (the “unpaired” setting). In the paired setting, example datasets include the DTMRI and FMRI connectome networks considered in [11, 27, 28], and the paired social networks from Twitter (now X) in [29] and from Facebook [30, 31], to name a few. In the unpaired setting, example datasets include the social networks that partially share a user base from [32, 33] and the friendship networks in [34]. When considering multiple paired networks for a single inference task, these methods often make an implicit assumption that the graphs are “vertex-aligned,” i.e., that there is an a priori known, true, one–to–one correspondence across the vertex labels of the graphs. Often, this is not the case in practice (see the voluminous literature on network alignment and graph matching [35, 36, 37]), as node alignments may be obscured by numerous factors, including, for example, different usernames across social networks [33], unknown correspondence of neurons across brain hemispheres [38], and misalignments/misregistrations to a common brain atlas (to create connectomes) as discussed in [39]. Moreover, these errors across the observed vertex labels can have a dramatically detrimental impact on subsequent inferential performance [40].

An important inference task in the multiple network setting is two-sample network hypothesis testing [41, 42]. Two-sample network testing has been used, for example, to compare neurological properties (captured via patient connectomes) of populations of patients along various demographic characteristics like age, sex, and the presence/absence of a neurological disorder (indeed, this connectomic testing example will serve as motivation for us in Section 1.3). Among the work in this area, numerous methods exist for (semi)parametric hypothesis testing across network samples, where one of the parameters being leveraged is the correspondence of labels across networks; see for example [11, 43, 44, 45]. There are also nonparametric methods that estimate and compare network distributions, and ignore the information contained in the vertex labels; see, for example, [12, 46].

This paper considers an amalgam of the above settings in which there is signal to be leveraged in one portion of the vertex labels, and uncertainty across the remaining labels. Our principal task is then two-fold. First, we seek to understand how this label uncertainty impacts testing power both theoretically (Section 2) and in practice (Sections 3–5). Second, we seek to better understand how to mitigate the impact of this uncertainty via a graph matching preprocessing step that aligns the graphs before testing (Section 6). Before formally defining the shuffled testing problem, we will first set here some of the notational conventions that will appear throughout the paper. Note also that all necessary code to reproduce the figures in the paper can be found at https://www.math.umd.edu/~vlyzinsk/Shuffled_testing/.

1.1 Notation

Given an undirected $n$-vertex graph $G$ (all graphs considered will be undirected graphs with no self-loops), we let $[n]:=\{1,2,\ldots,n\}$ denote the vertices of $G$. The adjacency matrix $\mathbf{A}\in\{0,1\}^{n\times n}$ of $G$ is given by $A_{ij}=\mathds{1}\{\{i,j\}\in E(G)\}$ for all $i,j\in[n]$. We note here that we will refer to a graph and its adjacency matrix interchangeably, as these objects (in the setting we consider herein) encode equivalent information. We denote the $i$-th row of any matrix $\mathbf{M}$ with the notation $M_{i}$ (or via $\mathbf{M}_{n}[i,:]$ if the subscript is needed to denote explicit dependence on $n$).

We define the usual Frobenius norm $\|\cdot\|_{F}$ of a matrix $\mathbf{A}$ via

$$\|\mathbf{A}\|_{F}=\Big(\sum_{i,j=1}^{n}A_{ij}^{2}\Big)^{1/2}.$$

For positive integers $d$ and $n$, we will denote the set of orthogonal matrices in $\mathbb{R}^{d\times d}$ by $\mathcal{O}_{d}$ and the set of $n\times n$ permutation matrices by $\Pi_{n}$. We indicate a matrix of all ones of size $d\times d$ by $\mathbf{J}_{d}$, and $\mathbf{J}_{n,d}$ is the matrix of all ones of size $n\times d$. Similarly, we denote the corresponding matrices of all 0's by $\mathbf{0}_{d}=\{0\}^{d\times d}$ and $\mathbf{0}_{n,d}=\{0\}^{n\times d}$. Lastly, the direct sum of two matrices $\mathbf{A}$ and $\mathbf{B}$ is denoted by $\mathbf{A}\oplus\mathbf{B}$.

For functions $f,g:\mathbb{Z}_{\geq 0}\mapsto\mathbb{R}_{\geq 0}$, we use here the standard asymptotic notations:

$f=O(g)$ if $\exists\,C>0$ and $n_{0}\in\mathbb{Z}_{\geq 0}$ s.t. $f(n)\leq Cg(n)$ for $n\geq n_{0}$;
$f=\Omega(g)$ if $\exists\,C>0$ and $n_{0}\in\mathbb{Z}_{\geq 0}$ s.t. $Cg(n)\leq f(n)$ for $n\geq n_{0}$;
$f=\Theta(g)$ if $f=\Omega(g)$ and $f=O(g)$; $f\sim g$ if $\lim_{n\rightarrow\infty}f(n)/g(n)=1$;
$f=o(g)$ if $\lim_{n\rightarrow\infty}f(n)/g(n)=0$; $f=\omega(g)$ if $\lim_{n\rightarrow\infty}g(n)/f(n)=0$.

Note that when $f=o(g)$ (resp., $f=\omega(g)$, $f=\Theta(g)$) and $g$ is a complicated function of $n$, we will often write $f\ll g$ (resp., $f\gg g$, $f\approx g$) to ease notation.

1.2 Shuffled testing

To formalize our partially aligned graph setting, suppose we have two networks $\mathbf{A}_{1}$ and $\mathbf{A}_{2}$ on a common vertex set $V(\mathbf{A}_{1})=V(\mathbf{A}_{2})=[n]$, and that the label correspondence across networks is known for $n-k$ of the vertices; denote the set of these vertices via $M_{n,k}$. We further assume that the user has knowledge of which vertices are in $M_{n,k}$. This, for example, could be the result of graph matching algorithms that provide a measure of certainty for the validity of each matched vertex (see, for example, the soft matching approach of [47] or the vertex nomination work of [48, 49]). The veracity of the correspondence across the remaining $k$ vertices not in $M_{n,k}$ (which we shall denote via $U_{n,k}:=[n]\setminus M_{n,k}$) is assumed unknown a priori. This may be due, for example, to algorithmic uncertainty in aligning nodes across networks or noise in the data. We then have that there exists an unknown (where $\Pi_{n}$ is the set of $n\times n$ permutation matrices)

$$\widetilde{\mathbf{Q}}\in\Pi_{n,k}:=\Big\{\mathbf{Q}\in\Pi_{n}\text{ s.t. }\sum_{i\in U_{n,k}}Q_{ii}\leq k;\ \sum_{j\in M_{n,k}}Q_{jj}=n-k\Big\}$$

such that the practitioner observes $\mathbf{A}_{1}$ and $\mathbf{B}_{2}=\widetilde{\mathbf{Q}}\mathbf{A}_{2}\widetilde{\mathbf{Q}}^{T}$. Given the above framework, it is natural to consider the following semiparametric adaptation of the traditional parametric tests. From $\mathbf{A}_{1}$ and $\mathbf{B}_{2}$, the user seeks to test whether the distribution of $\mathbf{A}_{1}$ differs from that of $\mathbf{A}_{2}$, i.e., to test the following hypotheses (where $\mathcal{L}(\mathbf{A}_{i})$ denotes the distribution (law) of $\mathbf{A}_{i}$): $H_{0}:\mathcal{L}(\mathbf{A}_{1})=\mathcal{L}(\mathbf{A}_{2})$ versus $H_{1}:\mathcal{L}(\mathbf{A}_{1})\neq\mathcal{L}(\mathbf{A}_{2})$. In this work, we will be considering testing within the family of (conditionally) edge-independent graph models, so that $\mathcal{L}(\mathbf{A}_{i})$ is completely determined by $\mathbf{P}_{i}=\mathbb{E}(\mathbf{A}_{i})$. Focusing our attention first on a simple Frobenius-norm based hypothesis test (later considering the spectral embedding based tests of [45, 11, 12]), we reject $H_{0}$ if

$$T=T(\mathbf{A}_{1},\mathbf{B}_{2}):=\|\widehat{\mathbf{P}}_{1}-\widehat{\mathbf{P}}_{2,\widetilde{\mathbf{Q}}}\|_{F}^{2}$$

is suitably large; here $\widehat{\mathbf{P}}_{1}$ is an estimate of $\mathbf{P}_{1}$ obtained from $\mathbf{A}_{1}$, and $\widehat{\mathbf{P}}_{2,\widetilde{\mathbf{Q}}}$ an estimate of $\widetilde{\mathbf{Q}}\mathbf{P}_{2}\widetilde{\mathbf{Q}}^{T}=\mathbb{E}(\mathbf{B}_{2})$ (this will be formalized later in Section 1.5; for intuition and experimental validation on why the test statistic using $\widehat{\mathbf{P}}$ is preferable to the adjacency-based test using $\|\mathbf{A}_{1}-\mathbf{B}_{2}\|_{F}^{2}$ as its statistic, see Section 3). To account for the uncertainty in the labeling of $\mathbf{B}_{2}$, for each $\alpha>0$ and $\mathbf{Q}\in\Pi_{n,k}$, define $c_{\alpha,\mathbf{Q}}>0$ to be the smallest value such that

$$\mathbb{P}_{H_{0}}(\|\widehat{\mathbf{P}}_{1}-\mathbf{Q}\widehat{\mathbf{P}}_{2}\mathbf{Q}^{T}\|_{F}\geq c_{\alpha,\mathbf{Q}})\leq\alpha.$$

As we do not know which element of $\Pi_{n,k}$ yields the shuffling from $\mathbf{A}_{2}$ to $\mathbf{B}_{2}$, a valid (conservative) level-$\alpha$ test using the Frobenius norm test statistic would reject $H_{0}$ if

$$T(\mathbf{A}_{1},\mathbf{B}_{2})>\max_{\mathbf{Q}\in\Pi_{n,k}}c_{\alpha,\mathbf{Q}}.$$

The price of this validity is a loss in testing power against any fixed alternative, especially in the scenario where $\widetilde{\mathbf{Q}}$ (the true, but unknown, shuffling) shuffles fewer than $k$ vertices in $\mathbf{A}_{2}$. In this case the conservative test is over-correcting for the uncertainty in $U_{n,k}$. The question we seek to answer is how much testing power is lost in this shuffle, and how robust the adaptations of different testing methods (i.e., different $T$ test statistics) are to this shuffling.
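To fix ideas, the following minimal Python sketch (the helper names `shuffle`, `perm_in_Pi_nk`, and `T_frobenius` are ours, not from the paper) realizes the objects above: a permutation in $\Pi_{n,k}$ that fixes $M_{n,k}$ and permutes only the uncertain block $U_{n,k}$, the shuffling operator $\mathbf{A}\mapsto\mathbf{Q}\mathbf{A}\mathbf{Q}^{T}$, and the Frobenius test statistic $T$. The conservative critical value $\max_{\mathbf{Q}}c_{\alpha,\mathbf{Q}}$ must still be approximated separately (e.g., via the bootstrap of Section 1.3).

```python
import numpy as np

def shuffle(A, perm):
    # Q A Q^T for the permutation matrix Q with Q[i, perm[i]] = 1,
    # i.e., relabel vertex perm[i] as vertex i.
    return A[np.ix_(perm, perm)]

def perm_in_Pi_nk(n, U, rng):
    # A random element of Pi_{n,k}: fixes M_{n,k} = [n] \ U, permutes only U.
    perm = np.arange(n)
    perm[U] = rng.permutation(U)
    return perm

def T_frobenius(P1_hat, P2_hat):
    # The test statistic T = ||P1_hat - P2_hat||_F^2 of Section 1.2.
    return np.linalg.norm(P1_hat - P2_hat, "fro") ** 2
```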

Note that our choice of Frobenius norm for the test statistic is natural here in light of the metric’s extensive use for network comparison; see, e.g., its use in the estimation, testing, and matching literatures [50, 51, 40].

1.3 Motivating example: DTMRI connectome testing

To motivate the shuffled testing problem further, we first present the following real-data example. We consider the test/retest connectomic dataset from [52] processed via the algorithmic pipeline at [53] (note that this data is available for download at http://www.cis.jhu.edu/~parky/Microsoft/JHU-MSR/ZMx2/BNU1/DS01216-xyz.zip). This dataset represents human connectomes derived from DTMRI scans, where there are multiple (i.e., test/retest) scans for each of the 57 individuals in the study. We consider three such scans, yielding connectomes $\mathbf{A}_{1}$, $\mathbf{A}_{2}$, and $\mathbf{A}_{3}$. Here, $\mathbf{A}_{1}$ and $\mathbf{A}_{2}$ represent test/retest scans from one subject (subject 1) and $\mathbf{A}_{3}$ a scan from a different subject (subject 2, scan 1). In each scan, vertices represent voxel regions of the brain, with edges denoting whether a neuronal fiber bundle connects the two regions (so that the graphs are binary and undirected). Considering only vertices common to the three connectomes, we are left with three graphs, each with $n=1085$ vertices.

For $k>0$, let $\mathbf{Q}\in\Pi_{n,k}$ be an unknown permutation. Observing $\mathbf{A}_{1}$, $\mathbf{A}_{2}$, and $\mathbf{B}_{3}=\mathbf{Q}\mathbf{A}_{3}\mathbf{Q}^{T}$ (rather than $\mathbf{A}_{1},\mathbf{A}_{2}$, and $\mathbf{A}_{3}$), we seek to test whether $\mathbf{B}_{3}$ is a connectome drawn from the same person as $\mathbf{A}_{1}$ and $\mathbf{A}_{2}$ or from a different person (i.e., under the reasonable assumption that $\mathcal{L}(\mathbf{A}_{1})=\mathcal{L}(\mathbf{A}_{2})$, we seek to test whether $\mathbf{A}_{3}$ is from this same distribution). Ideally, we would then construct our test statistic as

$$T(\mathbf{A}_{i},\mathbf{A}_{j})=\|\widehat{\mathbf{P}}_{i}-\widehat{\mathbf{P}}_{j}\|_{F}\qquad(1)$$

where $\widehat{\mathbf{P}}_{i}$ is the estimate of $\mathbf{P}_{1}=\mathbb{E}(\mathbf{A}_{1})=\mathbb{E}(\mathbf{A}_{2})=\mathbf{P}_{2}$ or $\mathbf{P}_{3}=\mathbb{E}(\mathbf{A}_{3})$ derived from $\mathbf{A}_{i}$ as in Section 1.5.

Incorporating the unknown shuffling of $U_{n,k}$ in $\mathbf{B}_{3}$ is tricky here, as for moderate $k$ it is computationally infeasible to compute $c_{\alpha,\mathbf{Q}}$ for all $\mathbf{Q}\in\Pi_{n,k}$ (where we recall that $c_{\alpha,\mathbf{Q}}$ is the smallest value such that $\mathbb{P}(\|\widehat{\mathbf{P}}_{1}-\mathbf{Q}\widehat{\mathbf{P}}_{2}\mathbf{Q}^{T}\|_{F}^{2}\geq c_{\alpha,\mathbf{Q}})\leq\alpha$; indeed, the order of $\Pi_{n,k}$ is $k!$ given we know the vertices in $M_{n,k}$), and so it is difficult to compute the conservative critical value $\max_{\mathbf{Q}\in\Pi_{n,k}}c_{\alpha,\mathbf{Q}}$. Here, the task of finding the worst-case shuffling in $\Pi_{n,k}$ can be cast as finding an optimal matching between one graph and the complement of the second, which we suspect is computationally intractable. To proceed forward, then, we consider the following modification of the overall testing regime: we consider a fixed (randomly chosen) sequence of nested permutations $\mathbf{Q}_{k}\in\Pi_{n,k}$ for $k=0,\,50,\,150,\,200,\,250,\,350$ and consider shuffling $\mathbf{A}_{2}$ by $\mathbf{Q}_{k}$ and $\mathbf{A}_{3}$ by $\mathbf{Q}_{\ell}$ for all $\ell\leq k$. We then repeat this process $nMC=100$ times, each time obtaining an estimate (via bootstrapping as outlined below) of testing power against $H_{0}$. This is done here out of computational necessity, and although the test does not achieve level-$\alpha$ here, this will nonetheless be sufficient to demonstrate the dramatic loss in testing performance due to shuffling.

Figure 1: Results for 200 bootstrapped samples of the test statistic in Eq. 1 at approximate level $\alpha=0.05$. The $x$-axis represents the number of vertices shuffled via $\mathbf{Q}_{\ell}$ (from 0 to $k$) while the curve colors represent the maximum number of vertices potentially shuffled via $\Pi_{n,k}$, here all shuffled by $\mathbf{Q}_{k}$. The left panel displays testing power, the right type-I error; results are averaged over 100 Monte Carlo iterates (error bars are $\pm 2$ s.e.).

In order to estimate the testing power here, we rely on the bootstrapping heuristic inspired by [45, 51] for latent position networks. Informally, we will model $\mathbf{A}_{1}$, $\mathbf{A}_{2}$, and $\mathbf{A}_{3}$ as instantiations of the Random Dot Product Graph (RDPG) modeling framework of [54].

Definition 1.1.

Let $\mathbf{X}=[X_{1}|X_{2}|\cdots|X_{n}]^{T}\in\mathbb{R}^{n\times d}$ be such that $\langle X_{i},X_{j}\rangle\in[0,1]$ for all $i,j\in[n]$. We say that $\mathbf{A}\sim\text{RDPG}(\mathbf{X},\nu)$ is an instance of a $d$-dimensional Random Dot Product Graph (RDPG) with latent positions $\mathbf{X}$ and sparsity parameter $\nu$ if, given $\mathbf{X}$, the entries of the random, symmetric, hollow adjacency matrix $\mathbf{A}\in\{0,1\}^{n\times n}$ satisfy, for all $i<j$, $A_{ij}\overset{ind.}{\sim}\text{Bernoulli}(\nu X_{i}^{T}X_{j})$.
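As a concrete illustration, Definition 1.1 admits a direct sampler in a few lines of Python; the function name `sample_rdpg` and the example latent positions below are our own illustrative choices.

```python
import numpy as np

def sample_rdpg(X, nu=1.0, rng=None):
    # A_ij ~ Bernoulli(nu * <X_i, X_j>) independently for i < j;
    # the result is symmetric and hollow, per Definition 1.1.
    rng = rng if rng is not None else np.random.default_rng()
    P = nu * (X @ X.T)
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)  # strict upper triangle
    A = upper | upper.T                           # symmetrize; diagonal stays 0
    return A.astype(int)

# Example: rows of X lie in a compact set, with inner products in [0, 1]
rng = np.random.default_rng(1)
X = rng.uniform(0.2, 0.6, size=(300, 2))
A = sample_rdpg(X, rng=rng)
```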

In this framework (where we here have $\nu=1$), we posit matrices of latent positions $\mathbf{X}\in\mathbb{R}^{n\times d}$ for $\mathbf{A}_{1}$, $\mathbf{A}_{2}$, and $\mathbf{Y}\in\mathbb{R}^{n\times d}$ for $\mathbf{A}_{3}$ (with $\mathbf{X}\mathbf{X}^{T},\mathbf{Y}\mathbf{Y}^{T}\in[0,1]^{n\times n}$), such that the $i$-th row of $\mathbf{X}$ (resp., $\mathbf{Y}$) corresponds to the latent feature vector associated with the $i$-th vertex in $\mathbf{A}_{1}$ and $\mathbf{A}_{2}$ (resp., $\mathbf{A}_{3}$). In this setting, the distribution of networks from the same person (i.e., $\mathbf{A}_{1}$ and $\mathbf{A}_{2}$) will serve as our null distribution, and we seek to test $H_{0}:\mathbf{A}_{3}\sim\text{RDPG}(\mathbf{X},\nu=1)$, where in truth $\mathbf{A}_{3}\sim\text{RDPG}(\mathbf{Y},\nu=1)$.

The RDPG model posits a tractable parameterization of the networks provided by the latent position matrices, and there are a number of statistically consistent methods for estimating these parameters under a variety of model variants. Here, we will use the Adjacency Spectral Embedding (ASE) to estimate $\mathbf{X}$ and $\mathbf{Y}$; see [55] for a survey of recent work on estimation and inference in RDPGs.

Definition 1.2.

(Adjacency Spectral Embedding) Given the adjacency matrix $\mathbf{A}\in\{0,1\}^{n\times n}$ of an $n$-vertex graph, the Adjacency Spectral Embedding (ASE) of $\mathbf{A}$ into $\mathbb{R}^{d}$ is given by

$$\text{ASE}(\mathbf{A},d)=\widehat{\mathbf{X}}=U_{A}S_{A}^{1/2}\in\mathbb{R}^{n\times d}\qquad(2)$$

where $[U_{A}|\tilde{U}_{A}][S_{A}\oplus\tilde{S}_{A}][U_{A}|\tilde{U}_{A}]^{T}$ is the spectral decomposition of $|\mathbf{A}|=(\mathbf{A}^{T}\mathbf{A})^{\frac{1}{2}}$, $S_{A}$ is the diagonal matrix with the ordered $d$ largest singular values of $\mathbf{A}$ on its diagonal, and $U_{A}\in\mathbb{R}^{n\times d}$ is the matrix whose columns are the corresponding orthonormal eigenvectors of $\mathbf{A}$.
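For a symmetric adjacency matrix, one standard way to compute the ASE of Definition 1.2 is via the truncated SVD, since the singular values of $\mathbf{A}$ are the eigenvalues of $|\mathbf{A}|$; below is a minimal sketch, assuming the top-$d$ eigenvalues of $\mathbf{A}$ are positive (as holds with high probability for an RDPG).

```python
import numpy as np

def ase(A, d):
    # ASE(A, d) = U_A S_A^{1/2}: top-d singular pairs of A (Definition 1.2).
    U, s, _ = np.linalg.svd(A.astype(float))
    return U[:, :d] * np.sqrt(s[:d])

# Plug-in estimate of P = X X^T for the RDPG sampled above
Xhat = ase(A, d=2)
Phat = Xhat @ Xhat.T
```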

Once suitable estimates of the parameters are obtained—denoted $\widehat{\mathbf{X}}_{1},\widehat{\mathbf{X}}_{2}$ for those derived via ASE of $\mathbf{A}_{1},\mathbf{A}_{2}$ respectively, and $\widehat{\mathbf{Y}}$ derived via ASE of $\mathbf{A}_{3}$—we use a parametric bootstrap (here with 200 bootstrap samples) to estimate the null distribution critical value of $T(\mathbf{A}_{1},\mathbf{A}_{2})=\|\widehat{\mathbf{P}}_{1}-\mathbf{Q}_{k}\widehat{\mathbf{P}}_{2}\mathbf{Q}_{k}^{T}\|_{F}$, where in the $b$-th bootstrap sample, the test statistic

$$T_{b}(\mathbf{A}_{1},\mathbf{A}_{2})=\|\widehat{\mathbf{P}}_{1,b}-\mathbf{Q}_{k}\widehat{\mathbf{P}}_{2,b}\mathbf{Q}_{k}^{T}\|_{F}$$

is computed as follows: For each $i=1,2$,

  • (i)

    sample independent $\mathbf{A}_{i,b}\sim\text{RDPG}(\widehat{\mathbf{X}}_{i},\nu=1)$;

  • (ii)

    compute $\widehat{\mathbf{X}}_{i,b}=\text{ASE}(\mathbf{A}_{i,b},d)$;

  • (iii)

    set $\widehat{\mathbf{P}}_{i,b}=\widehat{\mathbf{X}}_{i,b}(\widehat{\mathbf{X}}_{i,b})^{T}$.

Note that we use a single embedding dimension $d$ for all the ASE's in the null, estimated as detailed in Remark 1.3. Given this estimated critical value, we then estimate the testing level and testing power as follows. For each $\ell\leq k$, we mimic the above procedure (again with 200 bootstrap samples) to estimate the distributions of $T^{(\ell)}(\mathbf{A}_{1},\mathbf{A}_{2})=\|\widehat{\mathbf{P}}_{1}-\mathbf{Q}_{\ell}\widehat{\mathbf{P}}_{2}\mathbf{Q}_{\ell}^{T}\|_{F}$ and $T^{(\ell)}(\mathbf{A}_{1},\mathbf{A}_{3})=\|\widehat{\mathbf{P}}_{1}-\mathbf{Q}_{\ell}\widehat{\mathbf{P}}_{3}\mathbf{Q}_{\ell}^{T}\|_{F}$, where $\widehat{\mathbf{P}}_{2}$ (resp., $\widehat{\mathbf{P}}_{3}$) is the ASE-derived estimate of $\mathbf{P}_{2}=\mathbb{E}(\mathbf{A}_{2})$ (resp., $\mathbf{P}_{3}=\mathbb{E}(\mathbf{A}_{3})$); note that similar bootstrapping procedures were considered in [11, 45]. The estimated testing level (resp., power) under the null hypothesis that $\mathcal{L}(\mathbf{A}_{1})=\mathcal{L}(\mathbf{A}_{2})$ (resp., under the alternative that $\mathcal{L}(\mathbf{A}_{1})=\mathcal{L}(\mathbf{A}_{3})$) is then computed by calculating the proportion of $T^{(\ell)}(\mathbf{A}_{1},\mathbf{A}_{2})$ (resp., $T^{(\ell)}(\mathbf{A}_{1},\mathbf{A}_{3})$) values greater than the estimated critical value.
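Putting the pieces above together, the null bootstrap of steps (i)–(iii) can be sketched as follows, reusing the illustrative helpers `sample_rdpg`, `ase`, and `shuffle` from the earlier sketches; this is a schematic of the procedure described above, not the authors' exact code.

```python
import numpy as np

def bootstrap_null_stats(X1_hat, X2_hat, perm_k, d, B=200, rng=None):
    # Bootstrap replicates of T_b = ||Phat_{1,b} - Q_k Phat_{2,b} Q_k^T||_F.
    rng = rng if rng is not None else np.random.default_rng()
    stats = np.empty(B)
    for b in range(B):
        Phat = []
        for Xh in (X1_hat, X2_hat):
            A_b = sample_rdpg(Xh, rng=rng)   # (i) resample from the fitted RDPG
            X_b = ase(A_b, d)                # (ii) re-embed
            Phat.append(X_b @ X_b.T)         # (iii) plug-in estimate of P
        stats[b] = np.linalg.norm(Phat[0] - shuffle(Phat[1], perm_k), "fro")
    return stats

# estimated critical value at approximate level alpha = 0.05:
# crit = np.quantile(bootstrap_null_stats(X1_hat, X2_hat, perm_k, d=2), 0.95)
```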

Estimated power for 200 bootstrapped samples of these test statistics at approximate level $\alpha=0.05$ is plotted in the left panel of Figure 1 (averaged over the $nMC=100$ Monte Carlo replicates). In the figure, the $x$-axis represents the number of vertices shuffled via $\mathbf{Q}_{\ell}$ (from 0 to $k$) while the curve colors represent the maximum number of vertices potentially shuffled via $\Pi_{n,k}$; here all shuffled by $\mathbf{Q}_{k}$. As seen in the figure, the power of this test increases as both $k$ and $\ell$ increase, implying that the test is able to correctly distinguish the difference between the two subjects when the effect of the shuffling is either minimal (small $k$) or when the shuffling is equally severe in both the null and alternative cases (i.e., $\mathbf{Q}_{\ell}$ shuffles as much as $\mathbf{Q}_{k}$). When $k$ is much bigger than $\ell$, the test is overly conservative (see Figure 1, right panel), as expected. In this case the shuffling in $H_{0}$ has the effect of inflating the critical value compared to the true (i.e., unshuffled) testing critical value, yielding an overly conservative test that cannot distinguish between the different test subjects.

1.4 Random graph models

As referenced above, to tackle the question of power loss statistically, we will anchor our analysis in commonly studied random graph models from the literature. In addition to the Random Dot Product Graph (RDPG) model [56, 54] mentioned above, we will also consider the Stochastic Blockmodel (SBM) [57] as a data generating mechanism. These models provide tractable settings for the analysis of graphs where the connectivity is driven by latent features—community membership in the SBM and the latent position vector in the RDPG.

The Stochastic Blockmodel—and its myriad variants, including mixed membership [58], degree corrected [59], and hierarchical SBMs [25, 60]—provides a simple framework for networks with latent community structure.

Definition 1.3.

We say that an $n$-vertex random graph $\mathbf{A}\sim\text{SBM}(K,\Lambda,b,\nu)$ is distributed according to a stochastic block model random graph with parameters $K\in\mathbb{Z}^{+}$ the number of blocks in the graph, $\Lambda\in[0,1]^{K\times K}$ the block probability matrix, $b$ the block membership function, and $\nu$ the sparsity parameter, if

  • i.

    The vertex set $V$ is partitioned into $K$ blocks $V=V_{1}\sqcup V_{2}\sqcup\cdots\sqcup V_{K}$, where for each $i\in[K]$, $|V_{i}|=n_{i}$ denotes the size of the $i$-th block (so that $\sum_{i=1}^{K}n_{i}=n$);

  • ii.

    The block membership function $b:V\to[K]$ is such that $b(v)=i$ iff $v\in V_{i}$, and we have for each $\{u,v\}\in\binom{V}{2}$, $A_{uv}\stackrel{ind.}{\sim}\text{Bernoulli}(\nu\Lambda_{b(u),b(v)})$.
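Definition 1.3 likewise admits a short sampler; as before, the function name `sample_sbm` and the fixed-block-size layout (block 1 first, then block 2, and so on) are our own illustrative conventions.

```python
import numpy as np

def sample_sbm(block_sizes, Lam, nu=1.0, rng=None):
    # A ~ SBM(K, Lam, b, nu) with fixed block sizes; b assigns the first n_1
    # vertices to block 1, the next n_2 to block 2, and so on.
    rng = rng if rng is not None else np.random.default_rng()
    b = np.repeat(np.arange(len(block_sizes)), block_sizes)  # memberships
    P = nu * Lam[np.ix_(b, b)]                               # edge probabilities
    n = len(b)
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int), b

# e.g., the two-block model of Eq. 8 below
Lam = np.array([[0.55, 0.40], [0.40, 0.45]])
A1, b = sample_sbm([250, 250], Lam)
```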

Note that the block membership vector in an SBM is often modeled as a random multinomial vector with block probability parameter $\vec{\pi}\in\mathbb{R}^{K}$ giving the probabilities of assigning vertices randomly to each of the $K$ blocks. Our analysis is done in the fixed block membership setting, although it translates immediately to the random membership setting. Note also that we will often be considering cases where the number of vertices in $G$ satisfies $n\rightarrow\infty$. In this case, we write $G\sim\text{SBM}(K_{n},\Lambda_{n},b_{n},\nu_{n})$ so that the model parameters may vary in $n$. However, to ease notation, we will suppress the $n$ subscript throughout, although the dependence on $n$ is implicitly understood.

In SBMs, the connectivity structure is driven by the latent community membership of the vertices. In the space of latent feature models, a natural extension of this idea is to have connectivity modeled as a function of more nuanced, vertex-level features. In this direction, we will also consider framing our inference in the popular Random Dot Product Graph model introduced in Definition 1.1. Note that the RDPG model encompasses SBM models with positive semidefinite $\Lambda$. Indefinite and negative definite SBMs are encompassed via the generalized RDPG [61], though the ordinary RDPG will be sufficient for our present purposes. Note also that our theory will be presented for the fixed latent position RDPG above, though it translates immediately to the random latent position setting (i.e., where the rows of $\mathbf{X}$, namely the $X_{i}$, are i.i.d. from an appropriate distribution $F$).

Remark 1.1.

An inherent non-identifiability of the RDPG model comes from the fact that for any orthogonal matrix $\mathbf{W}\in\mathcal{O}_{d}$, we get $\mathbf{A}|\mathbf{X}\stackrel{\mathcal{L}}{=}\mathbf{A}|(\mathbf{X}\mathbf{W})$. With this caveat, RDPGs are best suited to inference tasks that are rotation invariant, such as clustering [21, 20], classification [24], and appropriately defined hypothesis testing settings [11, 45].

Remark 1.2.

If the RDPG graphs are directed or weighted, then appropriate modifications to the ASE are required to embed the networks (see, for example, [62] and [63]). Analogous concentration results are available in both settings, and we suspect that we can derive theory analogous to Theorems 2.1–2.2. Herein, we restrict ourselves to the unweighted RDPG, and leave the necessary modifications to handle directed and weighted graphs to future work.

As in [43], we will choose to control the sparsity of our graphs via $\nu$ and not through the latent position matrix $\mathbf{X}$ or block probability matrix $\Lambda$. As such, we will implicitly make the following assumption throughout the remainder for all RDPGs and positive semidefinite SBMs (when viewed as RDPGs):

Assumption 1.

If we consider a random graph sequence $\mathbf{A}_{n}\sim\text{RDPG}(\mathbf{X}_{n},\nu_{n})$ where $\mathbf{X}_{n}\in\mathbb{R}^{n\times d}$, then we will assume that for all $n$ sufficiently large, we have that:

  • i.

    $\mathbf{X}_{n}$ is rank $d$, and if $\sigma_{1}(\mathbf{X}_{n})\geq\sigma_{2}(\mathbf{X}_{n})\geq\cdots\geq\sigma_{d}(\mathbf{X}_{n})$ are the singular values of $\mathbf{X}_{n}$, we have $\sigma_{1}(\mathbf{X}_{n})\approx\sigma_{d}(\mathbf{X}_{n})=\Theta(n)$;

  • ii.

    There exists a fixed compact set $\mathcal{X}$ such that the rows of $\mathbf{X}_{n}$ are in $\mathcal{X}$ for all $n$;

  • iii.

    There exists a fixed constant $a>0$ such that $\mathbf{X}_{n}\mathbf{X}_{n}^{T}\geq a$ entry-wise.

1.5 Model estimation

In the RDPG (and positive semidefinite SBM) setting, our initial hypothesis test will be predicated upon having a suitable estimate of $\mathbf{P}=\mathbb{E}(\mathbf{A}|\mathbf{X})$. In this setting, the Adjacency Spectral Embedding (ASE) (see Definition 1.2) of [21] has proven to be a practically useful and theoretically tractable means for obtaining such an estimate. The adjacency spectral embedding has a rich, recent history in the literature (see [55]) as a tool for estimating tractable graph representations, achieving its greatest estimation strength in the class of latent position networks (the RDPG being one such example). In these settings, it is often assumed that the rank of the latent position matrix $\mathbf{X}$ is $d$, and that $d$ is considerably smaller than $n$, the number of vertices in the graph.

A great amount of inference in the RDPG setting is predicated upon $\widehat{\mathbf{X}}$ being a suitably accurate estimate of $\mathbf{X}$. To this end, the key statistical properties of consistency and asymptotic residual normality are established for the ASE in [21, 64, 61] and [65, 55], respectively. These results (and analogues for unscaled variants of the ASE) have laid the groundwork for myriad subsequent inference results, including clustering [21, 64, 22, 66], classification [24], time-series analysis [41, 67], and vertex nomination [49, 68], among others.

Remark 1.3.

In practice, there are a number of heuristics for estimating the unknown embedding dimension $d$ in the ASE (see, for example, the work in [69, 70, 71]). In the real data experiments below, we will adopt an automated elbow-finder applied to the scree plot (as motivated by [72] and [70]); for the simulation experiments, we use the true $d$ value for the underlying RDPGs/SBMs. Estimating the correct dimension $d$ is of paramount importance in spectral graph inference, as underestimating $d$ introduces bias into the embedding estimate while overestimating $d$ introduces additional variance into the estimate. In our experience, underestimation of $d$ has the more dramatic impact on subsequent inferential performance.
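For concreteness, one common automated elbow heuristic in this spirit is the profile-likelihood method of Zhu and Ghodsi [70]; the sketch below is a generic stand-in for the paper's elbow finder, not the authors' exact implementation.

```python
import numpy as np

def profile_likelihood_elbow(svals):
    # Split the sorted scree values into two groups with a common variance
    # and pick the split maximizing the Gaussian profile log-likelihood.
    s = np.sort(np.asarray(svals, dtype=float))[::-1]
    n = len(s)
    best_q, best_ll = 1, -np.inf
    for q in range(1, n):
        g1, g2 = s[:q], s[q:]
        rss = ((g1 - g1.mean()) ** 2).sum() + ((g2 - g2.mean()) ** 2).sum()
        var = max(rss / n, 1e-12)                   # pooled variance estimate
        ll = -0.5 * n * np.log(2 * np.pi * var) - 0.5 * rss / var
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

d_hat = profile_likelihood_elbow(np.linalg.svd(A1.astype(float), compute_uv=False))
```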

2 Shuffled graph testing in theory

In complicated testing regimes (e.g., the embedding-based tests of [11, 45, 44]), analyzing the distribution of the test under the alternative is itself a challenging proposition (see, for example, the work in [73, 11]). Accounting for a second layer of uncertainty due to the shuffling adds further complexity to the analysis. In order to build intuition for these more complex settings in the context of the RDPG and SBM models (which we will explore empirically in Section 4), we examine the effect of shuffling on testing power in the simple Frobenius norm test considered in Section 1.3.

We consider first the case where $\mathbf{A}_{1}\sim\text{RDPG}(\mathbf{X}_{n},\nu_{n})$ and we have an independent $\mathbf{A}_{2}\sim\text{RDPG}(\mathbf{Y}_{n},\nu_{n})$. Under the null, $\mathbf{X}_{n}=\mathbf{Y}_{n}$, and we will consider elements of the alternative that satisfy the following: for all but $r=r_{n}$ rows of $\mathbf{Y}_{n}$, we have $\mathbf{Y}_{n}[i,:]=\mathbf{X}_{n}[i,:]$, so that $\mathbf{Y}\mathbf{Y}^{T}=\mathbf{X}\mathbf{X}^{T}+\mathbf{E}$ where (with the proper vertex reordering)

$$\mathbf{E}=\begin{pmatrix}\mathbf{E}_{r}&\mathbf{E}_{r}^{\prime}\\ (\mathbf{E}_{r}^{\prime})^{T}&\mathbf{0}_{n-r}\end{pmatrix}\qquad(3)$$

We will further assume that there exist constants $c_{2}>c_{1}>0$ and $\epsilon_{n}=\epsilon>0$ such that $c_{1}\epsilon\leq|e_{ij}|\leq c_{2}\epsilon$ for all entries of $\mathbf{E}_{r}$ and $\mathbf{E}_{r}^{\prime}$. We note that we will assume throughout that both $\mathbf{X}_{n}$ and $\mathbf{Y}_{n}$ satisfy the conditions of Assumption 1.

The principal challenge of testing in this regime is that the veracity of the (across-graph) labels of vertices in $U_{n,k}$ is unknown a priori. It could be the case that these vertices were all shuffled or all correctly aligned, and it is difficult to disentangle the effect on testing power of $\mathbf{E}$ versus the potential shuffling. To model this, we consider shuffled elements of the alternative, so that we observe $\mathbf{A}_{1}$ and $\mathbf{B}_{2}=\widetilde{\mathbf{Q}}\mathbf{A}_{2}(\widetilde{\mathbf{Q}})^{T}$, where the true but unknown shuffling of $\mathbf{A}_{2}$ is $\widetilde{\mathbf{Q}}$, which shuffles $\ell\leq k$ labels in $U_{n,k}$.

2.1 Power analysis and the effect of shuffling: $\widehat{\mathbf{P}}$ test

In this section, we will present a trio of theorems, namely Theorems 2.1–2.3, in which we characterize the impact on power of the two distinct sources of noise here: the shuffling error ($k$ and $\ell$) and the error in the alternative, captured here by $\epsilon$. When the difference between $k$ and $\ell$ is comparably large (for example, when $\ell\leq r$, $\epsilon\ll\sqrt{(k-\ell)/r}$, and $\epsilon\ll\frac{k-\ell}{\ell}$), then the power of the resulting test will be low even in the presence of modest error $\epsilon$. In this case, the relative size of the error in the alternative is overwhelmed by the excess shuffling in the null which is needed to maintain testing level $\alpha$. The actual shuffling error (i.e., $\ell$) is much less than the conservative null shuffling (i.e., $k$), and the test is not able to distinguish the two graphs in light of the conservative test's overcompensation. Even in the case where $k-\ell$ is relatively small, if the error $\epsilon$ is sufficiently small, we will have low testing power, as expected. However, when the difference between $k$ and $\ell$ is relatively small compared to $\epsilon$, or $k$ and $\ell$ are both relatively small compared to $\epsilon$ (see the conditions in Theorem 2.1 and Theorem 2.2), then the difference in the number of vertices being shuffled across the conservative null and truly shuffled in the alternative is overwhelmed by the error in the alternative. In this case, the noise created by the relatively small differences in shuffling between null and alternative can be overcome, and high power can still be achieved.

2.1.1 Small $k-\ell$ regime

Before presenting the trio of theorems, we will first establish the following notation

  • i.

    For $i=1,2$, let $\widehat{\mathbf{P}}_{i}$ be the ASE-based estimate of $\mathbf{P}_{i}$ derived from $\mathbf{A}_{i}$;

  • ii.

    For $i=1,2$, and for any $\mathbf{Q}\in\Pi_{n,k}$, let $\widehat{\mathbf{P}}_{i,\mathbf{Q}}$ be the ASE-based estimate of $\mathbf{P}_{i,\mathbf{Q}}=\mathbf{Q}\mathbf{P}_{i}\mathbf{Q}^{T}$ derived from $\mathbf{Q}\mathbf{A}_{i}\mathbf{Q}^{T}$;

  • iii.

    Let $\mathbf{Q}^{*}\in\text{argmax}_{\mathbf{Q}\in\Pi_{n,k}}\|\mathbf{P}_{1}-\mathbf{P}_{1,\mathbf{Q}}\|_{F}$; let $\widetilde{\mathbf{Q}}$ be the shuffling of $\ell\leq k$ vertices in $U_{n,k}$ such that we observe $\mathbf{B}_{2}=\widetilde{\mathbf{Q}}\mathbf{A}_{2}\widetilde{\mathbf{Q}}^{T}$.

In this section, we will be concerned with conditions under which power is asymptotically almost surely 1; specifically, conditions under which the following holds for all $n$ sufficiently large:

$$\mathbb{P}_{H_{1}}\left(\|\widehat{\mathbf{P}}_{1}-\widehat{\mathbf{P}}_{2,\widetilde{\mathbf{Q}}}\|_{F}>\max_{\mathbf{Q}\in\Pi_{n,k}}c_{\alpha,\mathbf{Q}}\right)\geq 1-n^{-2}.\qquad(4)$$

Our first result tackles the case in which $k$ is relatively small, and only modest error $\epsilon$ is needed to achieve high testing power. Note that the proof of Theorem 2.1 can be found in Appendix A.1.

Theorem 2.1.

With notation as above, assume there exists $\alpha\in(0,1]$ such that $r=\Theta(n^{\alpha})$ and $k,\ell\ll n^{\alpha}$, and that $\frac{\|\mathbf{P}_{1}-\mathbf{P}_{1,\mathbf{Q}^{*}}\|_{F}^{2}-\|\mathbf{P}_{1}-\mathbf{P}_{1,\widetilde{\mathbf{Q}}}\|_{F}^{2}}{\nu^{2}n}=O(k)$. In the sparse setting, consider $\nu\gg\frac{\log^{4c}(n)}{n^{\beta}}$ for $\beta\in(0,1]$ where $\alpha\geq\beta$. If either

  • i.

    $k=O\left(\frac{n^{\beta}}{\log^{2c}n}\right)$ and $\epsilon\gg\sqrt{\frac{n^{\beta-\alpha}}{\log^{2c}(n)}}$; or

  • ii.

    $k\gg\frac{n^{\beta}}{\log^{2c}(n)}$ and $\epsilon\gg\sqrt{\frac{k}{n^{\alpha}}}$

then Eq. 4 holds for all $n$ sufficiently large. In the dense case where $\nu=1$, if either

  • i.

    $k\gg\log^{2c}(n)$ and $\epsilon\gg\sqrt{k/n^{\alpha}}$; or

  • ii.

    $k\ll\log^{2c}(n)$ and $\epsilon\gg\sqrt{(\log^{c}(n))/n^{\alpha}}$,

then Eq. 4 holds for all $n$ sufficiently large.

Note that $\|\mathbf{P}_{1}-\mathbf{P}_{1,\mathbf{Q}^{*}}\|_{F}^{2}=O(nk\nu^{2})$, so the assumption on the growth rate of $\|\mathbf{P}_{1}-\mathbf{P}_{1,\mathbf{Q}^{*}}\|_{F}^{2}-\|\mathbf{P}_{1}-\mathbf{P}_{1,\widetilde{\mathbf{Q}}}\|_{F}^{2}$ in Theorem 2.1 considers the case where the shuffling due to $\ell$ does not compensate for the shuffling due to $k$, and the shuffling due to $k$ needs to be relatively minor to achieve the desired power (here, the growth rates on $\epsilon$ in terms of $k$).

Our second result tackles the case in which $k-\ell$ is relatively small, and only modest error $\epsilon$ is needed to achieve high testing power. The proof of Theorem 2.2 can be found in Appendix A.1.

Theorem 2.2.

With notation as above, assume there exists $\alpha\in(0,1]$ such that $r=\Theta(n^{\alpha})$ and $k,\ell\ll n^{\alpha}$, and that $\frac{\|\mathbf{P}_{1}-\mathbf{P}_{1,\mathbf{Q}^{*}}\|_{F}^{2}-\|\mathbf{P}_{1}-\mathbf{P}_{1,\widetilde{\mathbf{Q}}}\|_{F}^{2}}{\nu^{2}n}=O(k-\ell)$. In the sparse setting where $\nu\gg\frac{\log^{4c}(n)}{n^{\beta}}$ for $\beta\in(0,1]$ where $\alpha\geq\beta$, if $k\gg\frac{n^{\beta}}{\log^{2c}(n)}$ and either

  • i.

    $\frac{k-\ell}{k^{1/2}}\geq\frac{n^{\beta/2}}{\log^{2c}(n)}$; $\epsilon\gg\frac{\ell}{n^{\alpha}}$; and $\epsilon\gg\sqrt{\frac{k-\ell}{n^{\alpha}}}$; or

  • ii.

    $\frac{k-\ell}{k^{1/2}}\leq\frac{n^{\beta/2}}{\log^{2c}(n)}$; $\epsilon\gg\frac{\ell}{n^{\alpha}}$; and $\epsilon\gg\sqrt{\frac{n^{\beta/2}}{\log^{2c}(n)}\frac{k^{1/2}}{n^{\alpha}}}$

then Eq. 4 holds for all $n$ sufficiently large. In the dense case where $\nu=1$ and $k=\omega(\log^{2c}n)$, if either

  • i.

    $\frac{k-\ell}{k^{1/2}}\geq\log^{c}(n)$; $\epsilon\gg\frac{\ell}{n^{\alpha}}$; and $\epsilon\gg\sqrt{\frac{k-\ell}{n^{\alpha}}}$; or

  • ii.

    $\frac{k-\ell}{k^{1/2}}\leq\log^{c}(n)$; $\epsilon\gg\frac{\ell}{n^{\alpha}}$; and $\epsilon\gg\sqrt{\frac{k^{1/2}\log^{c}(n)}{n^{\alpha}}}$,

then Eq. 4 holds for all $n$ sufficiently large.

The assumption on the growth rate of $\|\mathbf{P}_{1}-\mathbf{P}_{1,\mathbf{Q}^{*}}\|_{F}^{2}-\|\mathbf{P}_{1}-\mathbf{P}_{1,\widetilde{\mathbf{Q}}}\|_{F}^{2}$ in Theorem 2.2 considers the case where the shuffling due to $\ell$ can compensate for the shuffling due to $k$ (i.e., when $k-\ell\ll k$). In this setting, it is possible to achieve the desired power in alternative regimes with significantly smaller $\epsilon$. Under mild assumptions, this growth rate condition will hold, for example, in the SBM where the shuffling is across blocks; see Section 2.1.2.

2.1.2 Shuffled graph testing in SBMs

We consider next the case where $\mathbf{A}_{1}\sim\text{SBM}(K,\Lambda,b,\nu)$, and we assume there exists a matrix $\mathbf{E}=[e_{ij}]\in\mathbb{R}^{n\times n}$ of the form (up to vertex reordering) of Eq. 3 such that, under $H_{1}$, $\mathbf{A}_{2}=[A_{2,ij}]$ is an independently sampled graph with independently drawn edges sampled according to

$$A_{2,ij}\stackrel{\text{ind.}}{\sim}\text{Bernoulli}\left(\nu\left[\Lambda_{b(i),b(j)}+e_{ij}\right]\right).$$

We will consider here $\mathbf{E}$ being block-structured (in which case $\mathbf{A}_{2}$ is itself an SBM). As before, we will assume that there exist constants $c_{2}>c_{1}>0$ and $\epsilon_{n}=\epsilon>0$ such that $c_{1}\epsilon\leq|e_{ij}|\leq c_{2}\epsilon$ for all entries of $\mathbf{E}_{r},\mathbf{E}_{r}^{\prime}$.

Consider the setting where $U_{n,k}\subset V_{1}\cup V_{2}$ and $|U_{n,k}\cap V_{1}|=|U_{n,k}\cap V_{2}|=k/2$, so that at most $k$ vertices have shuffled labels and $k/2$ of these are in each of blocks 1 and 2. Note that, as block labels are arbitrary, this captures the setting where vertices may be flipped between any two different blocks. In what follows below (see Proposition A.1 in the Appendix), we will see that we can bound $\max_{\mathbf{Q}\in\Pi_{n,k}}c_{\alpha,\mathbf{Q}}$ in terms of any permutation that interchanges exactly $k/2$ vertices between blocks 1 and 2. Without loss of generality we can then bound $\max_{\mathbf{Q}\in\Pi_{n,k}}c_{\alpha,\mathbf{Q}}$ in terms of $\mathbf{Q}_{k}$ defined via

$$\mathbf{Q}_{k}=\begin{pmatrix}\mathbf{R}_{k}&\mathbf{0}_{n_{1}+n_{2},n-n_{1}-n_{2}}\\ \mathbf{0}_{n-n_{1}-n_{2},n_{1}+n_{2}}&\mathbf{I}_{n-n_{1}-n_{2}}\end{pmatrix}$$

where

$$\mathbf{R}_{k}=\begin{pmatrix}\mathbf{0}_{k/2,k/2}&\mathbf{0}_{k/2,n_{1}-k/2}&\mathbf{I}_{k/2}&\mathbf{0}_{k/2,n_{2}-k/2}\\ \mathbf{0}_{n_{1}-k/2,k/2}&\mathbf{I}_{n_{1}-k/2}&\mathbf{0}_{n_{1}-k/2,k/2}&\mathbf{0}_{n_{1}-k/2,n_{2}-k/2}\\ \mathbf{I}_{k/2}&\mathbf{0}_{k/2,n_{1}-k/2}&\mathbf{0}_{k/2,k/2}&\mathbf{0}_{k/2,n_{2}-k/2}\\ \mathbf{0}_{n_{2}-k/2,k/2}&\mathbf{0}_{n_{2}-k/2,n_{1}-k/2}&\mathbf{0}_{n_{2}-k/2,k/2}&\mathbf{I}_{n_{2}-k/2}\end{pmatrix}$$

We again consider shuffled elements of the alternative, so that we observe $\mathbf{A}_{1}$ and $\mathbf{B}_{2}=\mathbf{Q}_{\ell}\mathbf{A}_{2}(\mathbf{Q}_{\ell})^{T}$, where $\mathbf{Q}_{\ell}$ is defined analogously to $\mathbf{Q}_{k}$ (i.e., for any $h\leq k$, $\mathbf{Q}_{h}$ shuffles the first $h/2$ vertices between blocks 1 and 2). In this SBM setting, note that

$$\begin{aligned}\|\mathbf{P}_{1}-\mathbf{Q}_{k}\mathbf{P}_{1}(\mathbf{Q}_{k})^{T}\|_{F}^{2}&=k^{2}\nu^{2}(\Lambda_{11}-\Lambda_{22})^{2}/2+2k\sum_{i=3}^{K}n_{i}\nu^{2}(\Lambda_{i1}-\Lambda_{i2})^{2}\\ &\quad+2k(n_{1}-k/2)\nu^{2}(\Lambda_{11}-\Lambda_{12})^{2}+2k(n_{2}-k/2)\nu^{2}(\Lambda_{22}-\Lambda_{12})^{2}\\ &=k^{2}\nu^{2}(\Lambda_{11}-\Lambda_{22})^{2}/2+2k\sum_{i=1}^{K}n_{i}\nu^{2}(\Lambda_{i1}-\Lambda_{i2})^{2}\\ &\quad-k^{2}\nu^{2}(\Lambda_{11}-\Lambda_{12})^{2}-k^{2}\nu^{2}(\Lambda_{22}-\Lambda_{12})^{2}\end{aligned}$$

If $n_{i}=\Theta(n)$ for each $i\in[K]$ and $k,\ell\ll n$ (as assumed in Theorems 2.1 and 2.2), then $\|\mathbf{P}_{1}-\mathbf{Q}_{k}\mathbf{P}_{1}(\mathbf{Q}_{k})^{T}\|_{F}^{2}-\|\mathbf{P}_{1}-\mathbf{Q}_{\ell}\mathbf{P}_{1}(\mathbf{Q}_{\ell})^{T}\|_{F}^{2}=\Theta(n\nu^{2}(k-\ell))$ under mild assumptions on $\Lambda$, and Theorem 2.2 applies.
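The permutation $\mathbf{Q}_{h}$ is straightforward to realize in code; in the sketch below (our own illustrative helper, matching the block layout of the `sample_sbm` sketch above), $\mathbf{Q}_{h}$ swaps the first $h/2$ vertices of block 1 with the first $h/2$ vertices of block 2 and fixes everything else.

```python
import numpy as np

def block_swap_perm(n1, n, h):
    # Realize Q_h of Section 2.1.2 as an index permutation: swap the first
    # h/2 vertices of block 1 (indices 0..h/2-1) with the first h/2 of
    # block 2 (indices n1..n1+h/2-1); h must be even.
    m = h // 2
    perm = np.arange(n)
    perm[:m], perm[n1:n1 + m] = np.arange(n1, n1 + m), np.arange(m)
    return perm

# shuffle an SBM adjacency matrix by Q_k with k = 50 (25 from each block)
perm_k = block_swap_perm(250, 500, 50)
B1 = A1[np.ix_(perm_k, perm_k)]
```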

2.1.3 Large $k-\ell$ regime

We next tackle the power lost by an overly conservative test (i.e., when $k$ is much bigger than $\ell$). In this case, it is reasonable to expect the power of the resulting test to be small, as in this setting the shuffling noise can hide the true discriminatory signal in the alternative (here represented by $\mathbf{E}$). Note that the proof of Theorem 2.3 can be found in Appendix A.2.

Theorem 2.3.

With notation as in Section 2.1.1, assume that

$$\frac{\|\mathbf{P}_{1}-\mathbf{P}_{1,\mathbf{Q}^{*}}\|_{F}^{2}-\|\mathbf{P}_{1}-\mathbf{P}_{1,\widetilde{\mathbf{Q}}}\|_{F}^{2}}{\nu^{2}n}=\Omega(k-\ell)$$

and $r=\Theta(n^{\alpha})$. Suppose further that $\frac{k-\ell}{\sqrt{k}}\gg\frac{\log^{c}n}{\sqrt{\nu}}$. Then if either

  • i.

    $\epsilon\ll\frac{k-\ell}{n^{\alpha}}$ and $\ell\geq r$; or

  • ii.

    $\epsilon\ll\sqrt{\frac{k-\ell}{n^{\alpha}}}$; $\epsilon\ll\frac{k-\ell}{\ell}$; and $\ell\leq r$,

we have that for all $n$ sufficiently large

$$\mathbb{P}_{H_{1}}\left(\|\widehat{\mathbf{P}}_{1}-\mathbf{Q}_{\ell}\widehat{\mathbf{P}}_{2}\mathbf{Q}_{\ell}^{T}\|_{F}>\max_{\mathbf{Q}\in\Pi_{n,k}}c_{\alpha,\mathbf{Q}}\right)\leq n^{-2}.\qquad(5)$$

Note again that, under mild assumptions, the growth rate requirement on $\|\mathbf{P}_{1}-\mathbf{P}_{1,\mathbf{Q}^{*}}\|_{F}^{2}-\|\mathbf{P}_{1}-\mathbf{P}_{1,\widetilde{\mathbf{Q}}}\|_{F}^{2}$ holds in the SBM setting considered in Section 2.1.2, and Theorem 2.3 applies (given the growth rates of $k$ and $r$).

3 $\widehat{\mathbf{P}}$ versus $\mathbf{A}$ in the Frobenius test

The issue at the heart of the problem with the Frobenius-norm test using adjacency matrices (rather than using $\widehat{\mathbf{P}}$) can be best understood via the following simple example:

Example 3.1.

Consider the simple setting where we have independent random variables

$$X\sim\text{Bernoulli}(p);\quad Y\sim\text{Bernoulli}(p);\quad Z\sim\text{Bernoulli}(q).$$

In this case

$$\mathbb{E}|X-Y|=2p(1-p),\qquad\mathbb{E}|X-Z|=p(1-q)+q(1-p).$$

Note that

$$p(1-q)+q(1-p)-2p(1-p)=p(p-q)+(q-p)(1-p)=(q-p)(1-2p)$$

is greater than 0 when $q>p$ and $p<1/2$, or when $q<p$ and $p>1/2$; and is less than 0 when $q>p$ and $p>1/2$, or when $q<p$ and $p<1/2$.
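A quick Monte Carlo check of these expectations (our own illustrative numbers: $p=0.7$, $q=0.6$, so $q<p$ and $p>1/2$):

```python
import numpy as np

# With p = 0.7, q = 0.6: E|X-Y| = 2p(1-p) = 0.42, while
# E|X-Z| = p(1-q) + q(1-p) = 0.46; the difference (q-p)(1-2p) = 0.04 > 0.
rng = np.random.default_rng(2)
p, q, N = 0.7, 0.6, 10**6
X = (rng.random(N) < p).astype(int)
Y = (rng.random(N) < p).astype(int)
Z = (rng.random(N) < q).astype(int)
print(np.abs(X - Y).mean())  # approx 0.42
print(np.abs(X - Z).mean())  # approx 0.46
```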

Consider next the task of testing $H_{0}:\mathcal{L}(\mathbf{A})=\mathcal{L}(\mathbf{B})$ for a pair of graphs $\mathbf{A}$ and $\mathbf{B}$. A natural first test statistic to use is $T=\|\mathbf{A}-\mathbf{B}\|_{F}^{2}$, and it is natural to then reject the null when $T$ is relatively large. In the case where $\mathbf{A}\sim\text{ER}(n,p)$ (i.e., all edges appear in $\mathbf{A}$ with i.i.d. probability $p$ independent of all other edges) and $\mathbf{B}\sim\text{ER}(n,q)$, the test becomes $H_{0}:p=q$. However, under $H_{0}$ we have $\mathbb{E}T=n(n-1)2p(1-p)$ and under the alternative $\mathbb{E}T=n(n-1)(p(1-q)+q(1-p))$. If $p<1/2$, then $p>q$ implies $\mathbb{E}_{1}T<\mathbb{E}_{0}T$, and rejecting for large values of $T$ would fail to reject for this range of alternatives. Of course, in the homogeneous Erdős–Rényi (ER) case, we would want a two-sided rejection region (or we can appropriately scale $T$ to render a one-sided test suitable), though in heterogeneous ER models, adapting $T$ is more nuanced, as we shall show below. While the test using $T=\|\widehat{\mathbf{P}}_{1}-\widehat{\mathbf{P}}_{2}\|_{F}^{2}$ does not suffer from this particular quirk, we do not claim it is the optimal test in the low-rank heterogeneous ER model. Indeed, we suspect the more direct spectral tests of [11, 45] would be more effective, though the effect of the shuffling is more nuanced in those tests. There, it is considerably more difficult to disentangle the shuffling from the embedding alignment steps of the testing regimes (Procrustes alignment in [11], and the Omnibus construction in [45]).

Note that, as here we are working in the 2-block SBM setting of Section 2.1.2, we adopt the notation $\mathbf{Q}_{2h}$ for $0\leq h\leq k$ to emphasize that $2h$ total vertices are being shuffled, with $h$ coming from each block. For $i=1,2$ and all $0\leq h\leq k$, let $\mathbf{A}_{i,h}$ be shorthand for $\mathbf{Q}_{2h}\mathbf{A}_{i}\mathbf{Q}_{2h}^{T}$ and $\mathbf{E}_{h}=[e_{i,j}^{(h)}]$ be shorthand for $\mathbf{Q}_{2h}\mathbf{E}\mathbf{Q}_{2h}^{T}$. Consider the hypothesis test for testing $H_{0}:\mathcal{L}(\mathbf{A}_{1})=\mathcal{L}(\mathbf{A}_{2})$ using the test statistic (where $\mathcal{G}_{n}$ is the set of all $n$-vertex undirected graphs) $T_{A}:\mathcal{G}_{n}\times\mathcal{G}_{n}\mapsto\mathbb{R}^{\geq 0}$ defined via $T_{A}(\mathbf{A}_{1},\mathbf{A}_{2}):=\frac{1}{2}\|\mathbf{A}_{1}-\mathbf{A}_{2}\|_{F}^{2}$.

Assume that we are in the dense setting (i.e., $\nu_{n}=1$ for all $n$) and that the following holds:

  • i.

    There exists an $\eta\in(0,1/2)$ such that $\eta\leq\Lambda\leq 1-\eta$ entry-wise;

  • ii.

    There exists a $\tilde{\eta}\in[0,\eta)$ such that for all $\{ij\}$, $|e_{ij}|\leq\tilde{\eta}$;

  • iii.

    $\min_{i}n_{i}=\Theta(n)$, and $\max_{i}|\Lambda_{1i}-\Lambda_{2i}|,|\Lambda_{11}-\Lambda_{22}|=\Theta(1)$.

In this case, we have that $T_{A}(\mathbf{A}_{1},\mathbf{A}_{2,k})$ is stochastically greater than $T_{A}(\mathbf{A}_{1},\mathbf{A}_{2,h})$ for $h<k$, and so the conservative level-$\alpha$ test—to account for the uncertainty in $U_{n,2k}$—using $T_{A}$ would reject $H_{0}$ if $T_{A}(\mathbf{A}_{1},\mathbf{A}_{2,\ell})>\mathfrak{c}_{\alpha,k}$, where $\mathfrak{c}_{\alpha,k}$ is the smallest value such that $\mathbb{P}_{H_{0}}(T_{A}(\mathbf{A}_{1},\mathbf{A}_{2,k})>\mathfrak{c}_{\alpha,k})\leq\alpha$. As the following proposition shows (proven in Appendix A.3), the decay of power for this adjacency-based test exhibits pathologies not present in the $\widehat{\mathbf{P}}$-based test (where $\sum_{\{ij\}}$ denotes the sum over unordered pairs of elements of $[n]$, $n_{*}=\min_{i}n_{i}$, $\delta:=\max_{i}|\Lambda_{1i}-\Lambda_{2i}|$, and $\gamma:=|\Lambda_{11}-\Lambda_{22}|$).

Proposition 3.1.

With notation as above, let $r=n$ and define $\xi_{ij}:=(2p^{(1)}_{ij}-1)e^{(\ell)}_{ij}$ and $\mu_{\xi}:=\sum_{\{ij\}}\xi_{ij}$. We have that

$$\mathbb{P}_{H_{1}}(T_{A}(\mathbf{A}_{1},\mathbf{A}_{2,\ell})\geq\mathfrak{c}_{\alpha,k})=o(1)$$

if

$$(k-\ell)\frac{n_{*}}{n}-\frac{k^{2}-\ell^{2}}{n}\delta^{2}-\frac{k-\ell}{n}\gamma^{2}+\frac{\mu_{\xi}}{n}=\omega(1).$$

Digging a bit deeper into this proposition, we see the phenomenon of Example 3.1 at play. Even when $k-\ell$ is relatively small, if sufficiently often we have that

$$e^{(\ell)}_{ij}<0\text{ when }p^{(1)}_{ij}<1/2\qquad(6)$$
$$e^{(\ell)}_{ij}>0\text{ when }p^{(1)}_{ij}>1/2\qquad(7)$$

then $\frac{\mu_{\xi}}{n}$ can itself be positive and divergent, driving power to 0.

Figure 2: Power results at level $\alpha=0.05$ for $nMC=200$ Monte Carlo replicates of the adjacency matrix-based test statistic and the null and alternative distributions presented in Eq. 8 (error bars are $\pm 2$ s.e.). In the figure the $x$-axis represents the number of vertices shuffled via $\mathbf{Q}_{\ell}$ (from 0 to $k$) while the curve colors represent the maximum number of vertices potentially shuffled via $\Pi_{n,k}$, here all shuffled by $\mathbf{Q}_{k}$.

3.0.1 Power loss in the presence of shuffling: adjacency versus $\widehat{\mathbf{P}}$-based tests

Note that in this section, the testing power was computed by directly sampling the distributions of the test statistic under the null and alternative. In this 2-block stochastic block model setting, all shufflings permuting the same number of vertices between blocks are stochastically equivalent, and hence we can directly estimate the testing critical value under the null, and the statistic under the alternative (for all $\ell\leq k$).

Figure 3: Power results at approximate level $\alpha=0.05$ for $nMC=200$ Monte Carlo replicates of the $\widehat{\mathbf{P}}$-based test statistic and the null and alternative distributions presented in Eq. 8 (error bars are $\pm 2$ s.e.). In the figure the $x$-axis represents the number of vertices shuffled via $\mathbf{Q}_{\ell}$ (from 0 to $k$) while the curve colors represent the maximum number of vertices potentially shuffled via $\Pi_{n,k}$, here all shuffled by $\mathbf{Q}_{k}$.

Here the number of Monte Carlo replicates used to estimate the critical values is $nMC=200$, and $200$ further Monte Carlo replicates are used to estimate the power curves under each value of $\epsilon$ under consideration. We note here that the true dimension $d=2$ was used to embed the graphs in this section.

To further explore the theoretical analysis of $T_{A}$ considered above, we consider the following simple, illustrative experimental setup. With $b(v)=2-\mathds{1}\{v\in\{1,2,\ldots,250\}\}$, we consider two $n=500$-vertex SBMs defined via

$$\mathbf{A}\sim\text{SBM}\left(2,\begin{bmatrix}0.55&0.4\\ 0.4&0.45\end{bmatrix},b,1\right);\quad\mathbf{B}\sim\text{SBM}\left(2,\begin{bmatrix}0.55&0.4\\ 0.4&0.45\end{bmatrix}+\mathbf{E}_{\epsilon},b,1\right)\qquad(8)$$

where $\mathbf{E}_{\epsilon}=\epsilon\times\mathbf{J}_{500\times 500}$, and $\epsilon$ ranges over $\{\pm 0.01,\pm 0.02,\pm 0.03\}$ (for an example of an analogous test—and result—in the sparse regime, see Appendix A.4). According to Eqs. 6 and 7, we would expect the power of the adjacency matrix-based test to be poor for the $\epsilon<0$ values, even when $k-\ell$ is relatively small (i.e., even when the shuffling has a negligible effect). We see this play out in Figure 2, where the adjacency matrix-based test (i.e., the test where $T(\mathbf{A},\mathbf{B})=\frac{1}{2}\|\mathbf{A}-\mathbf{B}\|_{F}^{2}$) demonstrates the following: diminishing power in the $\epsilon>0$ setting when $k-\ell$ is large, and uniformly poor power in the $\epsilon<0$ setting. Notably, when $k-\ell$ is small, the test power is (relatively) large when $\epsilon>0$ and is near 0 when $\epsilon<0$. In Figure 3, we see that the above phenomenon does not occur for the $\widehat{\mathbf{P}}$-based test (i.e., the test where $T(\mathbf{A},\mathbf{B})=\|\widehat{\mathbf{P}}_{A}-\widehat{\mathbf{P}}_{B}\|_{F}$), as for this test we see (nearly identical) diminishing power in the $\epsilon>0$ and $\epsilon<0$ settings when $k-\ell$ is large, and relatively high power when $k-\ell$ is small. In both cases, the power increases as $\epsilon$ increases, as expected. We note here the odd behavior in Figures 2 and 3 when $k=400$ and $\ell=300$ or $400$. When $k=400$, most of the vertices are being shuffled, and the least favorable shuffling under the null does not shuffle the full $k=400$ vertices (indeed, the critical value when $\ell=300$ is larger here). This is because when $k=\ell=400$ here, the graphs are closer to the original 2-dimensional, 2-block SBMs with the blocks reversed. In this case, the “overshuffling” of $\ell=400$ results in a smaller test statistic than the least favorable $k<400$ shuffling.
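A single draw from the Eq. 8 setup makes the comparison concrete; this is a sketch reusing the illustrative `sample_sbm`, `block_swap_perm`, and `ase` helpers above, with $\epsilon=-0.02$, one of the regimes where Example 3.1 predicts the adjacency test fails.

```python
import numpy as np

rng = np.random.default_rng(3)
Lam = np.array([[0.55, 0.40], [0.40, 0.45]])
eps = -0.02
A, _ = sample_sbm([250, 250], Lam, rng=rng)
B, _ = sample_sbm([250, 250], Lam + eps, rng=rng)    # E_eps = eps * J
perm = block_swap_perm(250, 500, 2 * 50)             # Q_{2h} with h = 50
B_shuf = B[np.ix_(perm, perm)]

T_adj = 0.5 * np.linalg.norm(A - B_shuf, "fro") ** 2   # adjacency-based statistic
Xa, Xb = ase(A, 2), ase(B_shuf, 2)
T_phat = np.linalg.norm(Xa @ Xa.T - Xb @ Xb.T, "fro")  # Phat-based statistic
```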

One possible solution to the issue presented in Example 3.1 (and exemplified in Eqs. 6–7) is to normalize the adjacency matrices to account for degree discrepancy. With the setup the same as in Example 3.1, consider

$$T(U,V)=\frac{\mathbb{E}|U-V|}{\mathbb{E}U(1-\mathbb{E}U)+\mathbb{E}V(1-\mathbb{E}V)},$$

so that

$$T(X,Y)=1;\quad T(X,Z)=\frac{p(1-q)+q(1-p)}{p(1-p)+q(1-q)}\geq 1.$$

With $\mathbf{A}\sim\text{ER}(n,p)$ and $\mathbf{B}\sim\text{ER}(n,q)$, rejecting $H_{0}:p=q$ for large values of $T$ will be asymptotically strongly consistent. However, in heterogeneous ER settings (see Figure 4), this degree normalization is less effective (especially when the expected degrees are equal across networks). In Figure 4, with

$$b(v)=2-\mathds{1}\{v\in\{1,2,\ldots,250\}\},$$

we consider two $n=500$-vertex SBMs defined via

$$\textbf{A}\sim\text{SBM}\left(2,\begin{bmatrix}0.55&0.4\\ 0.4&0.45\end{bmatrix},b,1\right);\quad\textbf{B}\sim\text{SBM}\left(2,\begin{bmatrix}0.6&0.35\\ 0.35&0.5\end{bmatrix},b,1\right)\tag{9}$$

and we consider testing $H_{0}:\mathcal{L}(\textbf{A})=\mathcal{L}(\textbf{B})$ in the presence of shuffling using the test statistic

$$T_{\text{norm}}(\textbf{A},\textbf{B})=\frac{\frac{1}{2\binom{n}{2}}\|\textbf{A}-\textbf{B}\|_{F}^{2}}{\frac{1}{2\binom{n}{2}}\|\textbf{A}\|_{F}^{2}\left(1-\frac{1}{2\binom{n}{2}}\|\textbf{A}\|_{F}^{2}\right)+\frac{1}{2\binom{n}{2}}\|\textbf{B}\|_{F}^{2}\left(1-\frac{1}{2\binom{n}{2}}\|\textbf{B}\|_{F}^{2}\right)},$$

for the normalized adjacency matrix test (left panel), and the usual $T(\textbf{A}_{1},\textbf{A}_{2})=\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2}\|_{F}$ for the $\widehat{\textbf{P}}$ test (right panel). From the figure, we see that in settings such as this, where $\|\textbf{A}\|_{F}^{2}\approx\|\textbf{B}\|_{F}^{2}$, the degree normalization is (unsurprisingly) unable to overcome the issues with the adjacency-based test outlined in Example 3.1.
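For concreteness, a minimal sketch of $T_{\text{norm}}$ computed from two observed adjacency matrices (our own illustration; the helper name T_norm is ours):

import numpy as np

def T_norm(A, B):
    n = A.shape[0]
    m = n * (n - 1)                               # equals 2 * binom(n, 2)
    d_ab = np.linalg.norm(A - B, "fro") ** 2 / m  # normalized edge disagreement
    p_a = np.linalg.norm(A, "fro") ** 2 / m       # plug-in edge density of A
    p_b = np.linalg.norm(B, "fro") ** 2 / m       # plug-in edge density of B
    return d_ab / (p_a * (1 - p_a) + p_b * (1 - p_b))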

Figure 4: Power results at approximate level $\alpha=0.05$ for $n_{MC}=500$ Monte Carlo replicates of the normalized test statistic and null and alternative distributions presented in Eq. 9 (error bars are $\pm 2$ s.e.). In the figure the $x$-axis represents the number of vertices shuffled via $\textbf{Q}_{2\ell}$ (from 0 to $k$) while the curve colors represent the maximum number of vertices potentially shuffled via $\Pi_{n,k}$, here all shuffled by $\textbf{Q}_{2k}$.

4 Empirically exploring shuffling in ASE-based tests

Figure 5: For the experimental setup considered in Section 4 (here $\lambda=0.75$), we plot the empirical testing power in the presence of shuffling for the four tests: the Frobenius norm difference between the adjacency matrices, between $\widehat{\textbf{P}}$'s, $T_{\text{Omni}}$, and $T_{\text{Semipar}}$. In the figure the $x$-axis represents the number of vertices actually shuffled in $U_{n,k}$ (i.e., the number shuffled in the alternative) while the curve colors represent the maximum number of vertices potentially shuffled in $U_{n,k}$.
Figure 6: For the experimental setup considered in Section 4 (here $\lambda=1$), we plot the empirical testing power in the presence of shuffling for the four tests: the Frobenius norm difference between the adjacency matrices, between $\widehat{\textbf{P}}$'s, $T_{\text{Omni}}$, and $T_{\text{Semipar}}$. In the figure the $x$-axis represents the number of vertices actually shuffled in $U_{n,k}$ (i.e., the number shuffled in the alternative) while the curve colors represent the maximum number of vertices potentially shuffled in $U_{n,k}$.

As mentioned in previous sections, multiple spectral-based hypothesis testing regimes have been proposed in the literature over the past several years; see, for example, [45, 11, 12, 44]. One of the chief advantages of the $\widehat{\textbf{P}}$-based test considered herein is the ease with which the analysis lends itself to understanding the effect of shuffling; indeed, this power analysis is markedly more complex for the tests considered in [45, 11], for example.

In the ASE-based tests in [45, 11], the authors consider $n$-vertex, $d$-dimensional RDPGs $\textbf{A}\sim\text{RDPG}(\textbf{X},\nu=1)$ and $\textbf{B}\sim\text{RDPG}(\textbf{Y},\nu=1)$, and seek to test

$$H_{0}:\textbf{X}\stackrel{\perp}{=}\textbf{Y},\text{ versus }H_{1}:\textbf{X}\stackrel{\perp}{\neq}\textbf{Y},\tag{10}$$

where $\textbf{X}\stackrel{\perp}{=}\textbf{Y}$ holds if there exists an orthogonal matrix $\mathbf{W}\in\mathcal{O}_{d}$ such that $\textbf{X}=\textbf{Y}\mathbf{W}$. This rotation accounts for the inherent non-identifiability of the RDPG model, as latent positions $\textbf{X}$ and $\textbf{Y}$ satisfying $\textbf{X}\stackrel{\perp}{=}\textbf{Y}$ yield the same graph distribution. The semiparametric test of [11] used as its test statistic a suitably scaled version of the Frobenius norm between suitably rotated ASE estimates of $\textbf{X}$ and $\textbf{Y}$; namely, an appropriately scaled version of $T_{\text{Semipar}}(\textbf{A},\textbf{B})=\min_{\textbf{W}\in\mathcal{O}_{d}}\lVert\widehat{\textbf{X}}\textbf{W}-\widehat{\textbf{Y}}\rVert_{F}$, where $\widehat{\textbf{X}}=\text{ASE}(\textbf{A},d)$ and $\widehat{\textbf{Y}}=\text{ASE}(\textbf{B},d)$. While consistency of the test based on $T_{\text{Semipar}}$ is shown in [11], the effect of shuffling vertex labels in the presence of the Procrustean alignment step is difficult to parse here, and is the subject of current research.
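For reference, the unscaled core of $T_{\text{Semipar}}$ can be computed via a closed-form orthogonal Procrustes alignment; the sketch below is our illustration (reusing the ase helper from the earlier sketch), not the fully scaled statistic of [11].

import numpy as np
from scipy.linalg import orthogonal_procrustes

def T_semipar(A, B, d):
    # min over orthogonal W of ||Xhat W - Yhat||_F, solved in closed form via SVD
    Xhat, Yhat = ase(A, d), ase(B, d)
    W, _ = orthogonal_procrustes(Xhat, Yhat)
    return np.linalg.norm(Xhat @ W - Yhat, "fro")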

The separate graph embeddings cannot be compared in $T_{\text{Semipar}}$ above without first being aligned (e.g., without the Procrustes alignment provided by $\textbf{W}$) due to the non-identifiability of the RDPG model, and this added variability (uncertainty) motivated the Omnibus joint embedding regime of [45]. The Omnibus matrix in the $m=2$ setting (with $m$ here the number of graphs to embed) is defined as follows. Given two adjacency matrices $\textbf{A},\textbf{B}\in\mathbb{R}^{n\times n}$ on the same vertex set with known vertex correspondence, the omnibus matrix $\textbf{M}\in\mathbb{R}^{2n\times 2n}$ is defined as

$$\mathbf{M}=\begin{bmatrix}\textbf{A}&\frac{\textbf{A}+\textbf{B}}{2}\\ \frac{\textbf{A}+\textbf{B}}{2}&\textbf{B}\end{bmatrix}.$$

Note that this definition extends easily to a sequence of matrices $\textbf{A}_{1},\dots,\textbf{A}_{m}$, where the $i,j$-th block of the omnibus matrix is $\textbf{M}_{ij}=(\textbf{A}_{i}+\textbf{A}_{j})/2$ for all $i,j\in[m]$; we present the case $m=2$ for simplicity. When combined with ASE, the Omni framework allows us to simultaneously produce directly comparable estimates of the latent positions of each network without the need for a rotation. Let the ASE of $\mathbf{M}$ be defined as $\text{ASE}(\mathbf{M},d)=\widetilde{\mathbf{Z}}=[\widetilde{\textbf{X}}^{T},\widetilde{\mathbf{Y}}^{T}]^{T}$, where $\widetilde{\mathbf{Z}}\in\mathbb{R}^{2n\times d}$ provides, via its first $n$ rows denoted $\widetilde{\textbf{X}}$, an estimate of $\textbf{X}$ and, via its second $n$ rows denoted $\widetilde{\mathbf{Y}}$, an estimate of $\mathbf{Y}$. The Omnibus test, as proposed in [45], seeks to test the hypotheses in Eq. 10 via the test statistic $T_{\text{Omni}}=\lVert\widetilde{\textbf{X}}-\widetilde{\textbf{Y}}\rVert_{F}$. Concentration and asymptotic normality of $T_{\text{Omni}}$ under $H_{0}$ is established in [45] (see also the work analyzing $T_{\text{Omni}}$ under the alternative in [73]), and in [45] the Omni-based test demonstrates superior empirical testing performance compared to the test in [11]. As in the case with $T_{\text{Semipar}}$, the effect of shuffling vertex labels in the presence of the omnibus structural/construction alignment is difficult to theoretically understand, and is the subject of current research.
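The construction of $\mathbf{M}$ and $T_{\text{Omni}}$ is mechanical; a minimal sketch (again our illustration, reusing the ase helper from the earlier sketch):

import numpy as np

def T_omni(A, B, d):
    n = A.shape[0]
    avg = (A + B) / 2.0
    M = np.block([[A, avg], [avg, B]])  # the 2n x 2n omnibus matrix
    Z = ase(M, d)                       # joint embedding; no Procrustes step needed
    Xtil, Ytil = Z[:n], Z[n:]           # rows 1..n estimate X; rows n+1..2n estimate Y
    return np.linalg.norm(Xtil - Ytil, "fro")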

In the setting above, we will compare the performance of the $\widehat{\textbf{P}}$- and adjacency-based tests with $T_{\text{Omni}}$ and $T_{\text{Semipar}}$ in a (slightly) modified version of the experimental setup of [45]. To wit, we consider paired $100$-vertex RDPG graphs where the rows of $\textbf{X}$ are i.i.d. Dirichlet($\alpha=(1,1,1)$) random vectors, with the exception that the first five rows of $\textbf{X}$ are fixed to be $(0.8,0.1,0.1)$ (in [45] they consider one fixed row). The rows of $\textbf{Y}$ are identical to those of $\textbf{X}$, with the exception that the first five rows of $\textbf{Y}$ are fixed to be $(1-\lambda)(0.8,0.1,0.1)+\lambda(0.1,0.1,0.8)$; here we consider $\lambda$ ranging over $(0,0.25,0.5,0.75,1)$ (plots for the $\lambda$ and $k$ values not shown here can be found in Appendix A.4). In the language of Section 2, we are setting here $r=5$, with the $\lambda$ controlling the level of error $e_{i,j}$ in each entry of $\mathbf{E}$. This change in notation (from $\mathbf{E}$ to $\lambda$) is to maintain consistency with the notation used in the motivating work of [45, 74]. Note that in each ASE computed in this experiment, the true $d=3$ dimension was used for the embedding.

As in the connectomic real-data example, incorporating the unknown shuffling of $U_{n,k}$ into the adjacency- and $\widehat{\textbf{P}}$-based tests and into $T_{\text{Omni}}$ and $T_{\text{Semipar}}$ is tricky here, as for moderate $k$ it is computationally infeasible to compute the conservative critical values exactly. Our compromise is that, in the Monte Carlo simulations below, we sample random permutations that fix no element of $U_{n,k}$ to act as the elements generating the least favorable null; while this does not guarantee that the worst-case shuffling is sampled (and so the test will most likely not achieve its desired level of $\alpha=0.05$), this seems reasonable in light of the non-fixed rows of $\textbf{X}$ being i.i.d.

In order to simulate testing $H_{0}:\textbf{X}\stackrel{\perp}{=}\textbf{Y}$, we consider the following two-tiered Monte Carlo simulation approach (a code sketch of the inner tier follows the list).

1. For $i=1,\ldots,n_{MC_1}=50$:

   i. Simulate $\textbf{X}$ and $\textbf{Y}$, drawn as above;

   ii. For $j=1,\ldots,n_{MC_2}=100$:

      a. Generate a nested sequence of random derangements of the elements of $U_{n,k}$ (ranging over $k$);

      b. Simulate $\textbf{A}_{1},\textbf{A}_{2}\stackrel{\text{i.i.d.}}{\sim}\text{RDPG}(\textbf{X},\nu=1)$ and independently simulate $\textbf{A}_{3}\sim\text{RDPG}(\textbf{Y},\nu=1)$;

      c. Compute the test statistic under the null and alternative for each $k$; namely, compute $T^{(k,j)}=\|\hat{P}_{1}-\hat{P}_{2,k}\|_{F}$ for the null and $T^{(\ell,j)}=\|\hat{P}_{1}-\hat{P}_{3,\ell}\|_{F}$ for the alternative;

   iii. Estimate the critical value of the test for each $k$ using $\{T^{(k,j)}\}_{j=1}^{100}$;

   iv. Estimate the power of the test at each $k$ by computing the proportion of $\{T^{(\ell,j)}\}_{j=1}^{100}$ greater than the critical value in step iii.

2. Compute the power average over the $n_{MC_1}=50$ outer Monte Carlo iterates.
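A minimal sketch of the inner tier (our illustration; sample_rdpg, cyclic_shuffle, and phat are our own simplified helpers, with ase as in the earlier sketch, and with the derangement taken to be a simple cycle on the first $m$ labels):

import numpy as np

def sample_rdpg(X, rng):
    # A_ij ~ Bernoulli((X X^T)_ij), symmetric and hollow
    P = np.clip(X @ X.T, 0.0, 1.0)
    A = np.triu((rng.random(P.shape) < P).astype(float), k=1)
    return A + A.T

def cyclic_shuffle(A, m):
    # Shuffle the first m vertex labels by one cycle (fixed-point-free for m >= 2)
    perm = np.arange(A.shape[0])
    perm[:m] = np.roll(perm[:m], 1)
    return A[np.ix_(perm, perm)]

def phat(A, d):
    Xh = ase(A, d)
    return Xh @ Xh.T

def power_at(X, Y, k, ell, d=3, n_inner=100, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    null_stats, alt_stats = [], []
    for _ in range(n_inner):
        A1, A2 = sample_rdpg(X, rng), sample_rdpg(X, rng)  # null pair (step b)
        A3 = sample_rdpg(Y, rng)                           # alternative graph
        null_stats.append(np.linalg.norm(phat(A1, d) - phat(cyclic_shuffle(A2, k), d), "fro"))
        alt_stats.append(np.linalg.norm(phat(A1, d) - phat(cyclic_shuffle(A3, ell), d), "fro"))
    crit = np.quantile(null_stats, 1 - alpha)              # step iii
    return float(np.mean(np.array(alt_stats) > crit))      # step iv

Averaging power_at over fresh draws of $\textbf{X}$ and $\textbf{Y}$ implements the outer tier (step 2).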

Results are displayed in Figures 5 and 6, where we range $k=(20,50,75,100)$ (in the figures in Appendix A.4, we range $k=(10,20,30,50,75,100)$) and different $k$ values are represented by different colors/shapes of the plotted lines. Here $\ell\leq k$, the number shuffled in the alternative (i.e., the number of actual incorrect labels in $U_{n,k}$), ranges over the values plotted on the $x$-axis. Figure 5 (resp., Figure 6) shows results for $\lambda=0.75$ (resp., $\lambda=1$); plots for the remaining values of $\lambda$ can be found in Appendix A.4. From the figures, we see that, as expected, larger values of $\lambda$ (i.e., more signal in the alternative) yield higher testing power, and that power diminishes greatly as the number of vertices shuffled in the null increases relative to the number shuffled in the alternative. We also note that the Omnibus-based test appears to be more robust to shuffling than the other tests; developing theory analogous to Theorems 2.1–2.3 for the Omnibus test is a natural next step, though we do not pursue this further here. We lastly note that the loss in power in the large-$k$ settings is more pronounced here than in the SBM simulations, even when $\ell\approx k$; we suspect this is due to the noise introduced by the large amount of shuffling being of higher order than the signal in the alternative.

Figure 7: In the left (resp., right) panel, we plot the Frobenius norm difference of the estimated $\widehat{\textbf{P}}$ within the same network (resp., across networks) as the number of shuffled vertex labels is increased within (resp., across) networks; the $x$-axis represents the number of vertices shuffled. Note the different scales on the $y$-axes of the two panels in the figure.

5 Shuffling in social networks

In this section, we explore the shuffled testing phenomenon in the context of the social media data found in [32]. The data contains a multilayer social network consisting of user activity across three distinct social networks: YouTube, Twitter, and FriendFeed; after initially cleaning the data (removing isolates, symmetrizing the networks, etc.), there are a total of 422 common users across the three networks. Given adjacency matrices of our three 422-user social networks $\textbf{A}=\textbf{A}_{\textbf{Y}}$ (YouTube), $\textbf{B}=\textbf{B}_{\mathbf{T}}$ (Twitter), and $\textbf{C}=\textbf{C}_{\mathbf{F}}$ (FriendFeed), we ultimately wish to understand the effect of vertex misalignment on testing the following hypotheses

$$H_{0}^{(1)}:\mathcal{L}(\textbf{A})=\mathcal{L}(\textbf{B});\quad H_{0}^{(2)}:\mathcal{L}(\textbf{A})=\mathcal{L}(\textbf{C});\quad H_{0}^{(3)}:\mathcal{L}(\textbf{B})=\mathcal{L}(\textbf{C})$$
$$H_{1}^{(1)}:\mathcal{L}(\textbf{A})\neq\mathcal{L}(\textbf{B});\quad H_{1}^{(2)}:\mathcal{L}(\textbf{A})\neq\mathcal{L}(\textbf{C});\quad H_{1}^{(3)}:\mathcal{L}(\textbf{B})\neq\mathcal{L}(\textbf{C})$$

In this case, it is difficult to get a handle on the critical values across these three tests under shuffling, so we consider the following simple, initial illustrative experiment to shed light on these tests. Assuming an underlying RDPG model for each of the three networks, and using FriendFeed as our fulcrum (note that similar results are obtained using Twitter as the fulcrum network), we consider testing the simple parametric hypotheses under the effect of shuffling

$$H_{0}^{(1)}:\textbf{A}\sim\text{RDPG}(\textbf{X}_{\textbf{C}});\quad H_{0}^{(2)}:\textbf{B}\sim\text{RDPG}(\textbf{X}_{\textbf{C}})$$
$$H_{1}^{(1)}:\textbf{A}\not\sim\text{RDPG}(\textbf{X}_{\textbf{C}});\quad H_{1}^{(2)}:\textbf{B}\not\sim\text{RDPG}(\textbf{X}_{\textbf{C}})\tag{11}$$

Letting $\textbf{P}_{\bullet}$ represent the edge probability matrix for the corresponding social network, we compute $T(\textbf{A},\textbf{C})=\frac{1}{2}\|\widehat{\textbf{P}}_{\textbf{A}}-\widehat{\textbf{P}}_{\textbf{C}}\|_{F}^{2}$ (similarly $T(\textbf{B},\textbf{C})$ and $T(\textbf{A},\textbf{B})$), where the embedding dimensions needed to compute $\widehat{\textbf{P}}_{\bullet}=\widehat{\textbf{X}}_{\bullet}\widehat{\textbf{X}}_{\bullet}^{T}$ ($\widehat{\textbf{X}}_{\bullet}$ being the ASE of the associated network) are each chosen via an automated elbow finder on the scree plots of $\textbf{A},\textbf{B},\textbf{C}$, inspired by [72] and [70]; we then set a common embedding dimension for the three networks to the max of the three estimated dimensions.
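A simple automated elbow finder in the spirit of [72] chooses the split of the sorted scree values maximizing a two-group Gaussian profile log-likelihood with pooled variance; the sketch below is one reasonable variant (our illustration, not necessarily the exact procedure used for the experiments).

import numpy as np

def elbow_dim(svals):
    # svals: scree values, e.g., the singular values of an adjacency matrix
    s = np.sort(np.asarray(svals, dtype=float))[::-1]
    n = len(s)
    best_d, best_ll = 1, -np.inf
    for d in range(1, n):
        g1, g2 = s[:d], s[d:]
        # pooled-variance Gaussian profile log-likelihood, up to constants
        var = (np.sum((g1 - g1.mean()) ** 2) + np.sum((g2 - g2.mean()) ** 2)) / n
        if var <= 0:
            continue
        ll = -(n / 2) * np.log(var)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d

# Common embedding dimension across the three networks, as described above:
# d = max(elbow_dim(sv_A), elbow_dim(sv_B), elbow_dim(sv_C))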

We plot these initial findings in Figure 7. In the left (resp., right) panel of the figure, we plot the Frobenius norm difference of the estimated $\widehat{\textbf{P}}$ within the same network (resp., across networks) as the number of shuffled vertex labels is increased within (resp., across) networks; the $x$-axis represents the number of vertices shuffled. Note the different scales on the $y$-axes of the two panels in the figure. From the figure, we see that although all network pairs differ significantly from each other, the FriendFeed and Twitter networks are more similar to each other (according to $T$) than either is to the YouTube network. However, this is obscured given enough vertex shuffling, as seen by the green curve crossing the blue/purple curves in the right panel. Given enough uncertainty in the vertex labels, we posit that a conservative test using $T$ (i.e., one that must assume the uncertain labels are shuffled) would compute the FriendFeed and Twitter networks to be more similar to each other (if the uncertain labels are, in fact, correct and not shuffled under $H_{1}$) than either is to itself, and so we should be less likely to reject $H_{0}^{(2)}$ of Eq. 11.

Figure 8: Using a parametric bootstrap with 200 bootstrapped replicates to estimate testing power for $H_{0}^{(1)}$ and $H_{0}^{(2)}$. The amount shuffled in the null is $k$; the different colored curves represent the different $k$-values. The amount shuffled in the alternative, $\ell\leq k$, is plotted on the $x$-axis. Here, the results are averaged over $n_{MC}=50$ Monte Carlo replicates ($\pm 2$ s.e.).

We see this play out in Figure 8, where we use a parametric bootstrap (assuming the RDPG model framework) with 200 bootstrapped replicates to estimate each of the critical values and the testing power, in order to see the effect of vertex shuffling on the testing power for $H_{0}^{(1)}$ and $H_{0}^{(2)}$ of Eq. 11; note that, as in Section 1.3, we consider the following modification of the overall testing regime. We consider a fixed (randomly chosen) sequence of nested permutations $\textbf{Q}_{k}\in\Pi_{n,k}$ for $k=25,\,50,\,100,\,150,\,200,\,250$ for $H^{(2)}$ and $k=5,\,10,\,15,\,20,\,25,\,50$ for $H^{(1)}$. We consider shuffling the null by $\textbf{Q}_{k}$ and the alternative by $\textbf{Q}_{\ell}$ for all $\ell\leq k$. We then repeat this process $n_{MC}=50$ times, each time obtaining a bootstrapped estimate of testing power against $H_{0}$. In the figure, we plot the average empirical testing power, where the amount shuffled in the conservative null is $k$ (so that $U_{n,k}$ is entirely shuffled); the different colored curves represent the different $k$-values. The amount actually shuffled in the alternative, $\ell\leq k$, is plotted on the $x$-axis. From the figure, we see that the test would reject the null in both cases when few vertices are shuffled (i.e., a small $k$-value) or when $k-\ell$ is small in the Twitter versus FriendFeed panel; this is as expected from Figure 7, as the networks all seem to differ significantly from each other. However, testing power degrades significantly when $k-\ell$ is large in the Twitter versus FriendFeed setting and in the large-$k$ setting for YouTube versus FriendFeed, and the test no longer rejects the null. While the former is as expected (indeed, with enough uncertainty the small differences across the Twitter and FriendFeed networks are lost in the shuffle, even when the networks are different), the latter is surprising. Further investigation reveals that this power degradation as a function of $k$ alone stems from a difference in density: the YouTube network is sparse and $\|\widehat{\textbf{P}}_{C}-\widehat{\textbf{P}}_{A}\|_{F}\approx\|\widehat{\textbf{P}}_{C}\|_{F}$; in this case, the error in shuffling under the null when $k$ is large overwhelms any effect of shuffling in the alternative.

If the sparse YouTube network is used as the fulcrum, then both power figures would have power equal to one uniformly. The sparse YouTube graph is markedly different in degree from the two denser networks, and the shuffling does not affect that difference. In general, testing for a measure of equality in graphs with markedly different degree distributions would require a modified version of the hypotheses and test statistic, perhaps of the form (inspired by [11]) $\|\widehat{\mathbf{P}}_{1}-c\widehat{\mathbf{P}}_{2}\|_{F}$ for a scaling constant $c$ (to test equality of latent positions up to rotation and constant scaling), or $\|\widehat{\mathbf{P}}_{1}-\mathbf{D}\widehat{\mathbf{P}}_{2}\mathbf{D}^{T}\|_{F}$ for a diagonal scaling matrix $\mathbf{D}$ (to test equality of latent positions up to rotation and diagonal scaling).
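For the constant-scaling variant, the scale $c$ can be profiled out in closed form; a minimal sketch (our illustration of one natural choice, not a statistic taken from the cited literature):

import numpy as np

def T_scaled(P1, P2):
    # least-squares scale: c = <P1, P2> / ||P2||_F^2 minimizes ||P1 - c P2||_F
    c = np.sum(P1 * P2) / np.sum(P2 * P2)
    return np.linalg.norm(P1 - c * P2, "fro")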

6 Graph matching (unshuffling)

Once we quantify the added uncertainty due to shuffles in the vertex correspondences, it is natural to try to remedy this shuffling via graph matching. In this context, the graph matching problem seeks the correspondence between the vertices of the graphs that minimizes the number of induced edge disagreements [37]. Formally, letting $\Pi_{n}$ again denote the set of $n\times n$ permutation matrices, the simplest formulation of the graph matching problem can be cast as seeking elements in $\operatorname*{arg\,min}_{\textbf{Q}\in\Pi_{n}}\lVert\textbf{A}-\textbf{Q}\textbf{B}\textbf{Q}^{T}\rVert_{F}^{2}$. For a survey of the current state of the graph matching literature, see [35, 36, 37].

The problem of graph matching can often be made easier by having prior information about the true vertex correspondence. This information can come in the form of seeds, i.e., a list of vertices for which the true, latent correspondence is known a priori; see [75, 47, 76]. In the current shuffled testing regime, only the vertices of $U_{n,k}$ have unknown correspondence, and so the graph matching problem reduces to seeking elements in $\operatorname*{arg\,min}_{\textbf{Q}\in\Pi_{n,k}}\lVert\textbf{A}-\textbf{Q}\textbf{B}\textbf{Q}^{T}\rVert_{F}^{2}$; i.e., those vertices in $M_{n,k}$ can be treated as seeds. While there are no efficient algorithms known for matching in general (with or without seeds), there are a number of approximate seeded graph matching algorithms (see, for example, [47, 76]) that have proven effective in practice. In the applications below, we will make use of the SGM algorithm of [47] to approximately solve the seeded graph matching problem.
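For readers who wish to experiment, scipy's FAQ-based quadratic assignment solver (a relative of the SGM algorithm of [47]) accepts seeds through its partial_match option; the sketch below is our illustration (taking $U_{n,k}$ to be the last $k$ vertices for concreteness), not the exact SGM implementation used in our experiments.

import numpy as np
from scipy.optimize import quadratic_assignment

def unshuffle(A, B, k):
    # Match B to A when only the last k labels are uncertain; the first n-k are seeds
    n = A.shape[0]
    seeds = np.column_stack([np.arange(n - k), np.arange(n - k)])
    res = quadratic_assignment(A, B, method="faq",
                               options={"partial_match": seeds, "maximize": True})
    perm = res.col_ind               # estimated correspondence of B's labels to A's
    return B[np.ix_(perm, perm)]     # B relabeled to align with A

The unshuffled pair can then be passed to the test statistic of choice, subject to the over-matching caveats discussed below.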

6.1 SBM Shuffling and Matching

Figure 9: (Top row) Shuffling different numbers of vertices across the two blocks of the SBM model of Section 6.1 with $\epsilon=0.01,\,0.05$ to calculate power for $n=500$ vertices. (Bottom row) The power of the test after using SGM to match the $k$ vertices in $U_{n,k}$ under the null, versus the $\ell$ vertices truly shuffled in the alternative. Results are averaged over 100 Monte Carlo replicates with error bars $\pm 2$ s.e.

We begin by testing the dual effects of shuffling and matching on testing power in the SBM setting (where, as in Section 3.0.1, the critical values and the power can be computed via direct simulation). Adopting the notation of Section 2.1.2, we first consider an SBM with $n_{1}=n_{2}=250$ vertices, where $\Lambda$ is given by:

$$\Lambda=\begin{pmatrix}0.55&0.4\\ 0.4&0.45\end{pmatrix}\in\mathbb{R}^{2\times 2}.$$

Letting $\textbf{Q}_{h}$ ($h\leq k$, so that $h/2$ vertices are shuffled between each of the two blocks) be as defined in Section 2.1.2, we consider the error matrix $\textbf{E}=\epsilon\mathbf{J}_{n}$; in this example we will consider $\epsilon\in\{0.01,0.05\}$. In this setting, we will simulate directly (using the true model parameters) from the SBM models to estimate the relevant critical values and testing powers; here, all critical values and power estimates are based on $n_{MC}=100$ Monte Carlo replicates.

In Figure 9, we plot (in the upper panels) the power loss due to shuffling in the $\epsilon=0.01$ and $0.05$ settings; in the bottom panels, we plot the effect of shuffling and then matching on testing power (where we first match the graphs, and then use the unshuffled statistic $\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2}\|_{F}$ for the hypothesis test). For each $k$, we would (in practice) use SGM to unshuffle all $k$ vertices in $U_{n,k}$ regardless of the value of $\ell$; this is represented here by the unshuffling in the $\ell=k$ case. Here, we also show the effect of unshuffling only the $\ell$ vertices truly shuffled in the alternative. From the $k=\ell$ case, we see that matching recovers the lost shuffling power. From the $\ell<k$ setting, we see that the (possible) downside of unshuffling is its propensity to align the two graphs better than the true, latent alignment. In general, the more vertices being matched, the smaller the graph matching objective function, and hence the smaller the test statistic, which here manifests as larger-than-desired type-I error probability. Note that in each ASE used to compute $\widehat{\textbf{P}}$ in this experiment, we used the true value of $d=2$.

Figure 10: (Top row) Shuffling different numbers of vertices in the RDPG model of Section 4 with $\epsilon=0.01$ and $0.05$ to calculate power for $n=500$ vertices. (Bottom row) The power of the test after using SGM to match the $k$ vertices in $U_{n,k}$ under the null, versus the $\ell$ vertices truly shuffled in the alternative. Error bars are $\pm 2$ s.e.

These results play out in more general random graph settings as well, with Figure 10 showing the corresponding results in the RDPG setting of Section 4; i.e., each graph has $n=500$ vertices and the rows of $\textbf{X}$ are i.i.d. Dirichlet$(1,1,1)$. The error here is added directly to the latent positions, as $\widetilde{\textbf{X}}=\textbf{X}+\epsilon\mathbf{J}_{n,3}$ for $\epsilon=0.01$ and $0.05$. In the left panels, where we fix the latent positions and test $H_{0}:\mathcal{L}(\textbf{A})=\mathcal{L}(\textbf{B})$, the results are similar: unshuffling recovers the testing power, and over-matching provides artificially high testing power. In the right panels, we show results for testing $H_{0}:\mathcal{L}(\textbf{X})=\mathcal{L}(\textbf{Y})$, where the latent positions are random and the rows of $\textbf{X}$ are i.i.d. In this case, shuffling has no effect on testing power (as seen in the upper panel), while over-matching can still detrimentally add artificial power (as seen in the lower panel).

6.2 Brain networks shuffling and matching

We explore the dual effects of shuffling and matching on testing power in the motivating brain testing example of Section 1.3 (again with 200 bootstrap samples and averaged over $n_{MC}=100$ Monte Carlo trials). In Figure 11, we plot (in the left panel) the power of the test when the brains are shuffled, then matched before testing (the power here is estimated using the parametric bootstrap procedure outlined in Section 1.3); note that, as we first match the graphs, we use the unshuffled statistic for this hypothesis test, i.e., we use $T(\textbf{A}_{1},\textbf{A}_{2})=\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2}\|_{F}$ when estimating the null critical value, and $T(\textbf{A}_{1},\textbf{A}_{3})=\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{3}\|_{F}$ when estimating the testing power. Again, we show the effect of unshuffling all $k$ unknown vertices in the null and only the $\ell$ vertices truly shuffled in the alternative. As in the simulations, from the $k=\ell$ case we see that matching recovers the lost shuffling power while maintaining the desired testing level. From the $\ell<k$ setting, we see that this recovered power comes at the expense of a testing level much greater than the desired $\alpha=0.05$. In the event that the matching is viewed as a pre-processing step, matching more vertices in the null than in the alternative could increase the true level of the test, resulting in heightened type-I error risk (and possibly artificially high testing power), as seen in the right panel of Figure 11.

Figure 11: Results for 200 bootstrap replicates to estimate power in the shuffled-then-unshuffled brain test. In the figure the $x$-axis represents the number of vertices unshuffled via $\textbf{Q}_{\ell}$ (from 0 to $k$) while the curve colors represent the maximum number of vertices potentially shuffled via $\Pi_{n,k}$, here all shuffled by $\textbf{Q}_{k}$. The left panel displays testing power, the right panel type-I error with the dashed line at $y=0.05$; results are averaged over 100 Monte Carlo iterates (error bars are $\pm 2$ s.e.).

7 Conclusions

As network data has become more commonplace, a number of statistical tools tailored for handling network data have been developed, including methods for hypothesis testing, goodness-of-fit analysis, clustering, and classification, among others. Classically, many of the paired or multiple network inference tasks have assumed the vertices are aligned across networks, and this known alignment is often leveraged to improve subsequent inference. Exploring the impact of shuffled/misaligned vertices on inferential performance (here, on testing power) is an increasingly important problem as these methods gain more traction in noisy network domains. By systematically breaking down a simple Frobenius-norm hypothesis test, we uncover and numerically analyze the decline of power as a function of both the distributional difference in the alternative and the number of shuffled nodes. Further analysis in a pair of real-data settings reinforces our findings, giving practical guidance for the level of tolerable noise in paired, vertex-aligned testing procedures.

Our most thorough analysis of power loss is done in the context of random dot product and stochastic block models, bolstered by extensive simulations and real-data experiments backing up our findings. While the goal of our research is to test the robustness of multiple network hypothesis testing methodologies, there still remains work to do in extending our findings to more general network models and to more complex network testing paradigms. A natural next step is to lift the simple Frobenius norm hypothesis test analysis to broader and more complex models, as well as to study misalignment of vertices in the Omnibus and Semiparametric testing settings (see Section 4). Within the context of SBMs, we aim to see how our power analysis changes for more esoteric shufflings (more blocks, different numbers flipped between blocks, etc.). Further extensions include extending the theory to non-edge-independent graph models, and to shuffling in two-sample tests where there are multiple graphs per sample and where there is an interplay to explore in shuffling within and across populations. In non-testing inference tasks, the vertices are often assumed aligned as well (e.g., tensor factorization, multiple graph embeddings, etc.), and exploring the inferential performance loss due to shuffled vertices in these settings is a natural next step.

In the event that vertex labels are incorrectly known, it is natural to use graph matching/network alignment methods to align the networks before proceeding with subsequent inference. There are a host of matching procedures in the literature that could be applied to recover the true vertex alignment, and in doing so recover the lost testing power [40]. We explore this in the context of our simulations and find that while matching recovers the lost power, "over-matching" can result in artificially high power. Care must be taken when using alignment tools for data pre-processing, as graph matching methods can induce artificial signal across even disparate network pairs (this is related to the "phantom alignment strength" phenomenon of [77]). Natural next steps in this direction include the following questions (among others): how the signal in an imperfectly recovered matching affects power loss, as opposed to a random misalignment; how a probabilistic alignment (where the uncertainty in vertex labels is encoded into a stochastic matrix giving probabilities of alignment) can be incorporated into the testing framework; and how to use matching metrics (e.g., alignment strength [50, 77]) to estimate the size and membership of $U_{n,k}$ when this is unknown a priori.

Acknowledgements: This material is based on research sponsored by the Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreement number FA8750-20-2-1001. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the AFRL and DARPA or the U.S. Government.

References

  • [1] A. Goldenberg, A. X. Zheng, S. E. Fienberg, E. M. Airoldi, A survey of statistical network models, Found. Trends Mach. Learn. 2 (2) (2009) 129–233. arXiv:0912.5410, doi:10.1561/2200000005.
  • [2] E. Bullmore, O. Sporns, Complex brain networks: graph theoretical analysis of structural and functional systems, Nature reviews neuroscience 10 (3) (2009) 186–198.
  • [3] D. Durante, D. B. Dunson, Bayesian inference and testing of group differences in brain networks, Bayesian Analysis 13 (1) (2018) 29–58.
  • [4] J. Chung, E. Bridgeford, J. Arroyo, B. D. Pedigo, A. Saad-Eldin, V. Gopalakrishnan, L. Xiang, C. E. Priebe, J. T. Vogelstein, Statistical connectomics, Annu. Rev. Stat. Appl. 8 (2021) 463–492.
  • [5] J. C. Mitchell, Social networks, Annual review of anthropology 3 (1) (1974) 279–299.
  • [6] P. J. Carrington, J. Scott, S. Wasserman, Models and methods in social network analysis, Vol. 28, Cambridge university press, 2005.
  • [7] A. Vazquez, A. Flammini, A. Maritan, A. Vespignani, Global protein function prediction from protein-protein interaction networks, Nature biotechnology 21 (6) (2003) 697–700.
  • [8] O. N. Temkin, A. V. Zeigarnik, D. Bonchev, Chemical reaction networks: a graph-theoretical approach, CRC Press, 2020.
  • [9] E. D. Kolaczyk, Statistical Analysis of Network Data: Methods and Models, Springer Science & Business Media, 2009.
  • [10] E. D. Kolaczyk, G. Csárdi, Statistical analysis of network data with R, Vol. 65, Springer, 2014.
  • [11] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, Y. Park, C. E. Priebe, A semiparametric two-sample hypothesis testing problem for random graphs, Journ. of Computational and Graphical Statistics 26 (2) (2017) 344–354.
  • [12] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, C. E. Priebe, A nonparametric two-sample hypothesis testing problem for random graphs, Bernoulli 23 (3) (2017) 1599–1630. arXiv:arXiv:1409.2344, doi:10.3150/15-BEJ789.
  • [13] C. E. Ginestet, J. Li, P. Balachandran, S. Rosenberg, E. D. Kolaczyk, Hypothesis testing for network data in functional neuroimaging, Ann. Appl. Stat. (2017) 725–750.
  • [14] D. R. Hunter, S. M. Goodreau, M. S. Handcock, Goodness of fit of social network models, J. Am. Stat. Assoc. 103 (481) (2008) 248–258.
  • [15] J. Lei, A goodness-of-fit test for stochastic block models, Ann. Stat. 44 (1) (2016) 401–424.
  • [16] Y. R. Wang, P. J. Bickel, Likelihood-based model selection for stochastic block models, Ann. Stat. 45 (2) (2017) 500–528.
  • [17] M. E. Newman, Clustering and preferential attachment in growing networks, Physical review E 64 (2) (2001) 025102.
  • [18] A. Clauset, M. E. Newman, C. Moore, Finding community structure in very large networks, Physical review E 70 (6) (2004) 066111.
  • [19] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journ. of statistical mechanics: theory and experiment 2008 (10) (2008) P10008.
  • [20] K. Rohe, S. Chatterjee, B. Yu, Spectral clustering and the high-dimensional stochastic blockmodel, Ann. Stat. 39 (4) (2011) 1878–1915.
  • [21] D. L. Sussman, M. Tang, D. E. Fishkind, C. E. Priebe, A consistent adjacency spectral embedding for stochastic blockmodel graphs, J. Am. Stat. Assoc. 107 (499) (2012) 1119–1128. arXiv:1108.2228, doi:10.1080/01621459.2012.699795.
  • [22] J. Lei, A. Rinaldo, Consistency of spectral clustering in stochastic block models, Ann. Stat. 43 (1) (2015) 215–237.
  • [23] J. T. Vogelstein, C. E. Priebe, Shuffled graph classification: Theory and connectome applications, J. Classif. 32 (1) (2015) 3–20.
  • [24] M. Tang, D. L. Sussman, C. E. Priebe, Universally consistent vertex classification for latent positions graphs, Ann. Stat. 41 (3) (2013) 1406–1430.
  • [25] V. Lyzinski, M. Tang, A. Athreya, Y. Park, C. E. Priebe, Community detection and classification in hierarchical stochastic blockmodels, IEEE Trans. on Network Science and Engineering 4 (1) (2016) 13–26.
  • [26] M. Zhang, Z. Cui, M. Neumann, Y. Chen, An end-to-end deep learning architecture for graph classification, in: Thirty-Second AAAI Conf. on Artificial Intelligence, 2018.
  • [27] D. Durante, D. B. Dunson, J. T. Vogelstein, Nonparametric bayes modeling of populations of networks, Journal of the American Statistical Association 112 (520) (2017) 1516–1530.
  • [28] J. Arroyo, A. Athreya, J. Cape, G. Chen, C. E. Priebe, J. T. Vogelstein, Inference for multiple heterogeneous networks with a common invariant subspace, arXiv preprint arXiv:1906.10026 (2019).
  • [29] V. Lyzinski, D. L. Sussman, Matchability of heterogeneous networks pairs, Information and Inference: A Journal of the IMA 9 (4) (2020) 749–783.
  • [30] B. Viswanath, A. Mislove, M. Cha, K. P. Gummadi, On the evolution of user interaction in facebook, in: Proceedings of the 2nd ACM workshop on Online social networks, 2009, pp. 37–42.
  • [31] M. Heimann, H. Shen, T. Safavi, D. Koutra, Regal: Representation learning-based graph alignment, in: Proceedings of the 27th ACM international conference on information and knowledge management, 2018, pp. 117–126.
  • [32] M. Magnani, L. Rossi, The ml-model for multi-layer social networks, in: Inter. Conf. Adv. Soc Netw Anal Min, IEEE, 2011, pp. 5–12.
  • [33] H. G. Patsolic, Y. Park, V. Lyzinski, C. E. Priebe, Vertex nomination via seeded graph matching, Statistical Analysis and Data Mining: The ASA Data Science Journal 13 (3) (2020) 229–244.
  • [34] R. Mastrandrea, J. Fournet, A. Barrat, Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys, PloS one 10 (9) (2015) e0136497.
  • [35] D. Conte, P. Foggia, C. Sansone, M. Vento, Thirty years of graph matching in pattern recognition, Intern J Pattern Recognit Artif Intell 18 (03) (2004) 265–298.
  • [36] P. Foggia, G. Percannella, M. Vento, Graph matching and learning in pattern recognition in the last 10 years, Intern J Pattern Recognit Artif Intell 28 (01) (2014) 1450001.
  • [37] J. Yan, X.-C. Yin, W. Lin, C. Deng, H. Zha, X. Yang, A short survey of recent advances in graph matching, in: Proceedings of the 2016 ACM on International Conf. on Multimedia Retrieval, 2016, pp. 167–174.
  • [38] B. D. Pedigo, M. Winding, C. E. Priebe, J. T. Vogelstein, Bisected graph matching improves automated pairing of bilaterally homologous neurons from connectomes, Network Neuroscience 7 (2) (2023) 522–538.
  • [39] M. Fiori, P. Sprechmann, J. Vogelstein, P. Musé, G. Sapiro, Robust multimodal graph matching: Sparse coding meets graph matching, Advances in neural information processing systems 26 (2013).
  • [40] V. Lyzinski, Information recovery in shuffled graphs via graph matching, IEEE Trans. on Information Theory 64 (5) (2018) 3254–3273.
  • [41] G. Chen, J. Arroyo, A. Athreya, J. Cape, J. T. Vogelstein, Y. Park, C. White, J. Larson, W. Yang, C. E. Priebe, Multiple network embedding for anomaly detection in time series of graphs, arXiv preprint arXiv:2008.10055 (2020).
  • [42] L. Chen, J. Zhou, L. Lin, Hypothesis testing for populations of networks, Communications in Statistics-Theory and Methods 52 (11) (2023) 3661–3684.
  • [43] X. Du, M. Tang, Hypothesis testing for equality of latent positions in random graphs, Bernoulli 29 (4) (2023) 3221–3254.
  • [44] D. Asta, C. R. Shalizi, Geometric network comparison, Proceedings of the 31st Annual Conference on Uncertainty in AI (UAI) (2015).
  • [45] K. Levin, A. Athreya, M. Tang, V. Lyzinski, C. E. Priebe, A central limit theorem for an omnibus embedding of multiple random dot product graphs, 2017 IEEE inter. conf. on data mining workshops (2017) 964–967.
  • [46] J. Agterberg, M. Tang, C. Priebe, Nonparametric two-sample hypothesis testing for random graphs with negative and repeated eigenvalues, arXiv preprint arXiv:2012.09828 (2020).
  • [47] D. E. Fishkind, S. Adali, H. G. Patsolic, L. Meng, D. Singh, V. Lyzinski, C. E. Priebe, Seeded graph matching, Pattern Recognit. 87 (2019) 203–215.
  • [48] G. Coppersmith, Vertex nomination, Wiley Interdisciplinary Reviews: Computational Statistics 6 (2) (2014) 144–153.
  • [49] D. E. Fishkind, V. Lyzinski, H. Pao, L. Chen, C. E. Priebe, Vertex nomination schemes for membership prediction, Ann. Appl. Stat. 9 (3) (2015) 1510–1532.
  • [50] D. E. Fishkind, L. Meng, A. Sun, C. E. Priebe, V. Lyzinski, Alignment strength and correlation for graphs, Pattern Recognit. Lett. 125 (2019) 295–302.
  • [51] K. Levin, E. Levina, Bootstrapping networks with latent space structure, arXiv preprint arXiv:1907.10821 (2019).
  • [52] X. Zuo, J. S. Anderson, P. Bellec, R. M. Birn, B. B. Biswal, J. Blautzik, J. C. Breitner, R. L. Buckner, V. D. Calhoun, F. X. Castellanos, et al., An open science resource for establishing reliability and reproducibility in functional connectomics, Scientific data 1 (1) (2014) 1–13.
  • [53] G. Kiar, E. W. Bridgeford, W. R. G. Roncal, Consortium for Reliability and Reproducibility (CoRR), V. Chandrashekhar, D. Mhembere, S. Ryman, X. Zuo, D. S. Margulies, R. C. Craddock, et al., A high-throughput pipeline identifies robust connectomes but troublesome variability, bioRxiv (2017) 188706.
  • [54] S. J. Young, E. R. Scheinerman, Random dot product graph models for social networks, in: International Workshop on Algorithms and Models for the Web-Graph, Springer, 2007, pp. 138–149.
  • [55] A. Athreya, D. E. Fishkind, M. Tang, C. E. Priebe, Y. Park, J. T. Vogelstein, K. Levin, V. Lyzinski, Y. Qin, D. L. Sussman, Statistical inference on random dot product graphs: A survey, J. Mach. Learn. Res. 18 (2018) 1–92.
  • [56] P. D. Hoff, A. E. Raftery, M. S. Handcock, Latent space approaches to social network analysis, J. Am. Stat. Assoc. 97 (460) (2002) 1090–1098.
  • [57] P. W. Holland, K. B. Laskey, S. Leinhardt, Stochastic blockmodels: First steps, Social networks 5 (2) (1983) 109–137.
  • [58] E. M. Airoldi, D. M. Blei, S. E. Fienberg, E. P. Xing, Mixed membership stochastic blockmodels, J. Mach. Learn. Res. (2008).
  • [59] B. Karrer, M. E. J. Newman, Stochastic blockmodels and community structure in networks, Physical review E 83 (1) (2011) 016107.
  • [60] T. Li, L. Lei, S. Bhattacharyya, K. Van den Berge, P. Sarkar, P. J. Bickel, E. Levina, Hierarchical community detection by recursive partitioning, J. Am. Stat. Assoc. (2020) 1–18.
  • [61] P. Rubin-Delanchy, J. Cape, M. Tang, C. E. Priebe, A statistical interpretation of spectral embedding: the generalised random dot product graph, J. R. Stat. Soc. Series B to appear (2022).
  • [62] D. L. Sussman, M. Tang, C. E. Priebe, Consistent latent position estimation and vertex classification for random dot product graphs, IEEE Trans. Pattern Anal. Mach. Intell. 36 (1) (2014) 48–57. arXiv:arXiv:1207.6745v1, doi:10.1109/TPAMI.2013.135.
  • [63] I. Gallagher, A. Jones, A. Bertiger, C. E. Priebe, P. Rubin-Delanchy, Spectral embedding of weighted graphs, Journal of the American Statistical Association (2023) 1–10.
  • [64] V. Lyzinski, D. L. Sussman, M. Tang, A. Athreya, C. E. Priebe, Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding, Electron. J. Stat. 8 (2014) 2905–2922. arXiv:1310.0532, doi:10.1214/14-EJS978.
  • [65] A. Athreya, C. E. Priebe, M. Tang, V. Lyzinski, D. J. Marchette, D. L. Sussman, A limit theorem for scaled eigenvectors of random dot product graphs, Sankhya A 78 (2016) 1–18.
  • [66] F. Sanna Passino, N. A. Heard, P. Rubin-Delanchy, Spectral clustering on spherical coordinates under the degree-corrected stochastic blockmodel, Technometrics (just-accepted) (2021) 1–28.
  • [67] K. Pantazis, A. Athreya, W. N. Frost, E. S. Hill, V. Lyzinski, The importance of being correlated: Implications of dependence in joint spectral inference across multiple networks, Journal of Machine Learning Research 23 (141) (2022) 1–77.
  • [68] J. Yoder, L. Chen, H. Pao, E. Bridgeford, K. Levin, D. E. Fishkind, C. Priebe, V. Lyzinski, Vertex nomination: The canonical sampling and the extended spectral nomination schemes, Computational Statistics & Data Analysis 145 (2020) 106916.
  • [69] D. E. Fishkind, D. L. Sussman, M. Tang, J. T. Vogelstein, C. E. Priebe, Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown, SIAM Journ. on Matrix Analysis and Applications 34 (1) (2013) 23–39.
  • [70] S. Chatterjee, Matrix estimation by universal singular value thresholding, Ann. Stat. 43 (1) (2015) 177–214.
  • [71] T. Li, E. Levina, J. Zhu, Network cross-validation by edge sampling, Biometrika 107 (2) (2020) 257–276.
  • [72] M. Zhu, A. Ghodsi, Automatic dimensionality selection from the scree plot via the use of profile likelihood, Computational Statistics & Data Analysis 51 (2) (2006) 918–930.
  • [73] B. Draves, D. L. Sussman, Bias-variance tradeoffs in joint spectral embeddings, arXiv preprint arXiv:2005.02511 (2020).
  • [74] K. Levin, A. Athreya, M. Tang, V. Lyzinski, Y. Park, C. E. Priebe, A central limit theorem for an omnibus embedding of multiple random graphs and implications for multiscale network inference, arXiv preprint arXiv:1705.09355 (2017).
  • [75] F. Fang, D. L. Sussman, V. Lyzinski, Tractable graph matching via soft seeding, arXiv preprint arXiv:1807.09299 (2018).
  • [76] E. Mossel, J. Xu, Seeded graph matching via large neighborhood statistics, Random Structures & Algorithms 57 (3) (2020) 570–611.
  • [77] D. E. Fishkind, F. Parker, H. Sawczuk, L. Meng, E. Bridgeford, A. Athreya, C. Priebe, V. Lyzinski, The phantom alignment strength conjecture: practical use of graph matching alignment strength to indicate a meaningful graph match, Appl. Netw. Sci. 6 (1) (2021) 1–27.
  • [78] C. Stein, Approximate computation of expectations, IMS, 1986.
  • [79] N. Ross, Fundamentals of Stein’s method, Probability Surveys 8 (2011) 210–293.

Appendix A Proofs of main and supporting results

Herein, we collect the proofs of the main and supporting results from the paper. We first present a useful theorem and corollary that will be used throughout.

In the shuffled testing analysis that we consider herein, we will use the following ASE consistency result from [61, 43]. Note that we write here that a random variable $X$ is $O_{\mathbb{P}}(g(n))$ if for any constant $A>0$ there exist $n_{0}\in\mathbb{Z}^{+}$ and a constant $B>0$ (both possibly depending on $A$) such that $\mathbb{P}(|X|\leq Bg(n))\geq 1-n^{-A}$ for all $n\geq n_{0}$.

Theorem A.1.

Given Assumption 1, let $\textbf{A}_{n}\sim\text{RDPG}(\textbf{X}_{n},\nu_{n})$ be a sequence of $d$-dimensional RDPGs, and let the adjacency spectral embedding of $\textbf{A}_{n}$ be given by $\widehat{\textbf{X}}_{n}\sim\text{ASE}(\textbf{A}_{n},d)$. There exists a sequence of orthogonal matrices $\mathbf{W}_{n}\in\mathcal{O}_{d}$ and a universal constant $c>0$ such that if the sparsity factor satisfies $\nu_{n}=\omega\left(\frac{\log^{4c}n}{n}\right)$, then (suppressing the dependence of $\textbf{X}$ and $\widehat{\textbf{X}}$ on $n$)

$$\max_{i=1,2,\ldots,n}\|\mathbf{W}_{n}\widehat{X}_{i}-\nu_{n}^{1/2}X_{i}\|=O_{\mathbb{P}}\left(\frac{\log^{c}n}{n^{1/2}}\right)\tag{12}$$

From Eq. 12, we can derive the following rough (though sufficient for our present needs) estimation bound on $\|\widehat{\textbf{P}}-\textbf{P}\|_{F}$.

Corollary A.1.

With notation and assumptions as in Theorem A.1, let $\textbf{P}=\nu\textbf{X}\textbf{X}^{T}$ and $\widehat{\textbf{P}}=\widehat{\textbf{X}}\widehat{\textbf{X}}^{T}$ (where $\widehat{\textbf{X}}\sim\text{ASE}(\textbf{A},d)$). We then have

$$\|\widehat{\textbf{P}}-\textbf{P}\|_{F}=O_{\mathbb{P}}\left(\sqrt{n\nu_{n}}\log^{c}n\right)\tag{13}$$
Proof.

Let $(\textbf{W}_{n})$ be the sequence of $\textbf{W}$'s from Theorem A.1. Note first that (suppressing the subscript dependence on $n$)

\begin{align*}
|\nu X_{i}^{T}X_{j}-\widehat{X}_{i}^{T}\widehat{X}_{j}|&\leq|\nu^{1/2}X_{i}^{T}(\nu^{1/2}X_{j}-\textbf{W}\widehat{X}_{j})|+|(\nu^{1/2}X_{i}^{T}-\widehat{X}_{i}^{T}\textbf{W}^{T})\textbf{W}\widehat{X}_{j}|\\
&\leq\|\nu^{1/2}X_{j}-\textbf{W}\widehat{X}_{j}\|_{2}\,\|\nu^{1/2}X_{i}\|_{2}+\|\nu^{1/2}X_{i}-\textbf{W}\widehat{X}_{i}\|_{2}\,\|\textbf{W}\widehat{X}_{j}\|_{2}\\
&\leq O_{\mathbb{P}}\left(\frac{\nu^{1/2}\log^{c}n}{n^{1/2}}\right)+O_{\mathbb{P}}\left(\frac{\log^{c}n}{n^{1/2}}\right)\left(\|\textbf{W}\widehat{X}_{j}-\nu^{1/2}X_{j}\|_{2}+\|\nu^{1/2}X_{j}\|_{2}\right)\\
&=O_{\mathbb{P}}\left(\frac{\nu^{1/2}\log^{c}n}{n^{1/2}}\right).
\end{align*}

Applying this entry-wise to $\|\widehat{\textbf{P}}-\textbf{P}\|_{F}^{2}$, we get

$$\|\widehat{\textbf{P}}-\textbf{P}\|_{F}^{2}=\sum_{i,j}|\nu X_{i}^{T}X_{j}-\widehat{X}_{i}^{T}\widehat{X}_{j}|^{2}=O_{\mathbb{P}}\left(n\nu\log^{2c}n\right),$$

as desired. ∎

A.1 Proof of Theorems 2.1 and 2.2

These proofs will proceed by using Corollary A.1 to sharply bound the critical value of the level-$\alpha$ test in terms of the error between the models (i.e., the difference of the $\textbf{P}$ matrices) and the sampling error (i.e., the difference between $\textbf{P}$ and $\widehat{\textbf{P}}$). Growth rate analysis on the difference of the $\textbf{P}$ matrices will then allow for the detailed power analysis.

We will begin by recalling/establishing some notation. Let $\textbf{P}_{1}=\mathbb{E}(\textbf{A}_{1})$ and $\textbf{P}_{2}=\mathbb{E}(\textbf{A}_{2})$. To ease notation moving forward, we will define:

  • i. For $i=1,2$, let $\widehat{\textbf{P}}_{i}$ be the ASE-based estimate of $\textbf{P}_{i}$ derived from $\textbf{A}_{i}$;
  • ii. For $i=1,2$ and any $\textbf{Q}\in\Pi_{n,k}$, let $\widehat{\textbf{P}}_{i,\textbf{Q}}$ be the ASE-based estimate of $\textbf{P}_{i,\textbf{Q}}=\textbf{Q}\textbf{P}_{i}\textbf{Q}^{T}$ derived from $\textbf{Q}\textbf{A}_{i}\textbf{Q}^{T}$.

Given $\textbf{A}_{1}$ and $\textbf{B}_{2}=\tilde{\textbf{Q}}\textbf{A}_{2}\tilde{\textbf{Q}}^{T}$ (where $\tilde{\textbf{Q}}$ shuffles $\ell\leq k$ vertices of $U_{n,k}$), a valid (conservative) level-$\alpha$ test using the Frobenius norm test statistic would correctly reject $H_{0}$ if

$$T(\textbf{A}_{1},\textbf{B}_{2}):=\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\tilde{\textbf{Q}}}\|_{F}\geq\max_{\textbf{Q}\in\Pi_{n,k}}c_{\alpha,\textbf{Q}}.$$

Our first proposition will bound $\max_{\textbf{Q}\in\Pi_{n,k}}c_{\alpha,\textbf{Q}}$ in terms of $\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}$ as follows:

Proposition A.1.

With notation and setup as above, we have that for any fixed $\alpha>0$, there exists an $M>0$ such that for all sufficiently large $n$,

$$\max_{\textbf{Q}\in\Pi_{n,k}}c_{\alpha,\textbf{Q}}\leq\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}+M\sqrt{n\nu_{n}}\log^{c}n$$

and

$$\max_{\textbf{Q}\in\Pi_{n,k}}c_{\alpha,\textbf{Q}}\geq\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}-M\sqrt{n\nu_{n}}\log^{c}n.$$
Proof.

Note that under the null hypothesis, $\textbf{P}_{1}=\textbf{P}_{2}$. As these critical values are computed under the null hypothesis assumption, we shall make use of this throughout. Note, however, that $\widehat{\textbf{P}}_{1}\neq\widehat{\textbf{P}}_{2}$ in general, as these are estimated from $\textbf{A}_{1}$ and $\textbf{A}_{2}$, which are equal only in distribution. Note that for the ASE-based estimate of $\textbf{P}_{2,\textbf{Q}}=\textbf{P}_{1,\textbf{Q}}$, we have $\widehat{\textbf{P}}_{2,\textbf{Q}}=\textbf{Q}\widehat{\textbf{P}}_{2}\textbf{Q}^{T}$. We then have

\begin{align*}
\max_{\textbf{Q}\in\Pi_{n,k}}\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F}&\leq\max_{\textbf{Q}\in\Pi_{n,k}}\left(\|\widehat{\textbf{P}}_{1}-\textbf{P}_{1}\|_{F}+\|\textbf{P}_{1}-\textbf{P}_{2,\textbf{Q}}\|_{F}+\|\textbf{P}_{2,\textbf{Q}}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F}\right)\\
&=\max_{\textbf{Q}\in\Pi_{n,k}}\left(\|\widehat{\textbf{P}}_{1}-\textbf{P}_{1}\|_{F}+\|\textbf{P}_{1}-\textbf{P}_{2,\textbf{Q}}\|_{F}+\|\textbf{Q}(\textbf{P}_{2}-\widehat{\textbf{P}}_{2})\textbf{Q}^{T}\|_{F}\right)\\
&=\left(\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{2,\textbf{Q}}\|_{F}\right)+\|\widehat{\textbf{P}}_{1}-\textbf{P}_{1}\|_{F}+\|\textbf{P}_{2}-\widehat{\textbf{P}}_{2}\|_{F}\\
&=\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}+O_{\mathbb{P}}(\sqrt{n\nu_{n}}\log^{c}n),
\end{align*}

where the last line follows from Corollary A.1. Therefore, for any $\epsilon\in(0,\alpha)$ there exist $M_{1}>0$ and $N_{1}>0$ such that for any $n\geq N_{1}$ and any $\widehat{\textbf{Q}}\in\Pi_{n,k}$ we have that

\begin{align*}
\mathbb{P}_{H_{0}}&\left(\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\widehat{\textbf{Q}}}\|_{F}>\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}+M_{1}\sqrt{n\nu_{n}}\log^{c}n\right)\\
&\leq\mathbb{P}_{H_{0}}\left(\max_{\textbf{Q}\in\Pi_{n,k}}\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F}>\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}+M_{1}\sqrt{n\nu_{n}}\log^{c}n\right)\leq\epsilon,
\end{align*}

implying (as $\widehat{\textbf{Q}}$ was chosen arbitrarily)

\begin{align*}
c_{\alpha,\widehat{\textbf{Q}}}&\leq\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}+M_{1}\sqrt{n\nu_{n}}\log^{c}n\\
\Rightarrow\max_{\textbf{Q}\in\Pi_{n,k}}c_{\alpha,\textbf{Q}}&\leq\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}+M_{1}\sqrt{n\nu_{n}}\log^{c}n.
\end{align*}

For the lower bound, recall that for $\widehat{\textbf{Q}}\in\Pi_{n,k}$, $c_{\alpha,\widehat{\textbf{Q}}}$ is the smallest value such that $\mathbb{P}_{H_{0}}(\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\widehat{\textbf{Q}}}\|_{F}\geq c_{\alpha,\widehat{\textbf{Q}}})\leq\alpha$. From the triangle inequality, we have that

\begin{align*}
\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F}&\geq-\|\widehat{\textbf{P}}_{1}-\textbf{P}_{1}\|_{F}+\|\textbf{P}_{1}-\textbf{P}_{2,\textbf{Q}}\|_{F}-\|\textbf{P}_{2,\textbf{Q}}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F}\\
\Leftrightarrow\;\|\textbf{P}_{2,\textbf{Q}}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F}+\|\widehat{\textbf{P}}_{1}-\textbf{P}_{1}\|_{F}&\geq\|\textbf{P}_{1}-\textbf{P}_{2,\textbf{Q}}\|_{F}-\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F},
\end{align*}

so that for any $\epsilon_{2}>0$, there exist $M_{2}$ and $N_{2}$ such that for all $n\geq N_{2}$ (recalling $\textbf{P}_{1}=\textbf{P}_{2}$ by assumption),

\begin{align*}
\mathbb{P}_{H_{0}}&\left(\|\textbf{P}_{2,\textbf{Q}}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F}+\|\widehat{\textbf{P}}_{1}-\textbf{P}_{1}\|_{F}\leq M_{2}\sqrt{n\nu_{n}}\log^{c}n\right)\geq 1-\epsilon_{2}\\
&\Rightarrow\mathbb{P}_{H_{0}}\left(\|\textbf{P}_{1}-\textbf{P}_{2,\textbf{Q}}\|_{F}-\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F}\leq M_{2}\sqrt{n\nu_{n}}\log^{c}n\right)\geq 1-\epsilon_{2}\\
&\Leftrightarrow\mathbb{P}_{H_{0}}\left(\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\textbf{Q}}\|_{F}\geq\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}-M_{2}\sqrt{n\nu_{n}}\log^{c}n\right)\geq 1-\epsilon_{2}.
\end{align*}

This then implies (for a well-chosen $\epsilon_{2}$) that there exist $M_{2}$ and $N_{2}$ such that for all $n\geq N_{2}$,

$$\max_{\textbf{Q}\in\Pi_{n,k}}c_{\alpha,\textbf{Q}}\geq\max_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}-M_{2}\sqrt{n\nu_{n}}\log^{c}n.$$

Letting $M=\max(M_{1},M_{2})$ yields the desired result. ∎
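To make Proposition A.1 concrete, the following minimal sketch estimates a shuffled critical value $c_{\alpha,\textbf{Q}}$ by Monte Carlo in a two-block SBM with known memberships. All choices here (the block matrix, $n$, $k$, and the single fixed shuffle standing in for the maximum over $\Pi_{n,k}$) are hypothetical, and $\widehat{\textbf{P}}$ is taken to be a simple block-averaging plug-in estimate rather than the spectral estimate analyzed above:

import numpy as np

rng = np.random.default_rng(0)

def sample_sbm(P, rng):
    # Symmetric, hollow Bernoulli adjacency matrix with edge-probability matrix P
    U = rng.random(P.shape)
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)
    return A + A.T

def phat_block(A, b):
    # Plug-in estimate of P by averaging over block pairs (memberships b known)
    Phat = np.zeros_like(A)
    for r in np.unique(b):
        for s in np.unique(b):
            idx = np.ix_(b == r, b == s)
            block = A[idx]
            m = block.shape[0]
            Phat[idx] = block.sum() / (m * (m - 1)) if r == s else block.mean()
    return Phat

n, alpha, k = 500, 0.05, 50
b = np.repeat([0, 1], n // 2)
P = np.array([[0.5, 0.3], [0.3, 0.4]])[np.ix_(b, b)]   # hypothetical Lambda

perm = np.arange(n)
perm[:k], perm[n - k:] = np.arange(n - k, n), np.arange(k)  # shuffle k labels across blocks

# Monte Carlo replicates of the test statistic under H_0 (P_1 = P_2 = P)
stats = []
for _ in range(500):
    P1h = phat_block(sample_sbm(P, rng), b)
    P2h = phat_block(sample_sbm(P, rng), b)[np.ix_(perm, perm)]  # shuffled estimate
    stats.append(np.linalg.norm(P1h - P2h))
print("Monte Carlo c_{alpha,Q}:", np.quantile(stats, 1 - alpha))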

For ease of notation let $\textbf{Q}^{*}\in\operatorname{argmax}_{\textbf{Q}\in\Pi_{n,k}}\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}}\|_{F}$, and define

\begin{align*}
T_{1,k,\ell}&:=\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|^{2}_{F}-\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|^{2}_{F};\\
T_{2,\ell,\ell}&:=\|\textbf{P}_{1}-\textbf{P}_{2,\widetilde{\textbf{Q}}}\|^{2}_{F}-\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|^{2}_{F}.
\end{align*}

We then have (under the assumptions of Theorem 2.1) that there exist constants $C_{1},C_{2}>0$ and an integer $n_{0}$ such that for $n\geq n_{0}$, the following holds with probability at least $1-n^{-2}$ under $H_{1}$:

\begin{align*}
\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\widetilde{\textbf{Q}}}\|_{F}&\geq\left(\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|^{2}_{F}-T_{1,k,\ell}+T_{2,\ell,\ell}\right)^{1/2}-C_{1}\sqrt{n\nu_{n}}\log^{c}n;\\
\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\widetilde{\textbf{Q}}}\|_{F}&\leq\left(\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|^{2}_{F}-T_{1,k,\ell}+T_{2,\ell,\ell}\right)^{1/2}+C_{2}\sqrt{n\nu_{n}}\log^{c}n.
\end{align*}

Recalling the form of $\mathbf{E}$, we have the following simplification of $T_{2,\ell,\ell}$ under $H_{1}$; first note that

$$\|\textbf{P}_{1}-\textbf{P}_{2,\widetilde{\textbf{Q}}}\|^{2}_{F}=\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|^{2}_{F}+2\,\text{trace}\left(\left(\textbf{P}_{1,\widetilde{\textbf{Q}}^{T}}-\textbf{P}_{1}\right)^{T}\mathbf{E}\right)+\|\mathbf{E}\|_{F}^{2},$$

so that (where $C>0$ and $c>0$ are constants that can change line-to-line)

$$T_{2,\ell,\ell}\geq\begin{cases}Cnr\nu^{2}\epsilon^{2}-cnr\nu^{2}\epsilon&\text{ if }\ell\geq r\\ Cnr\nu^{2}\epsilon^{2}-cn\ell\nu^{2}\epsilon&\text{ if }\ell\leq r\end{cases}\qquad T_{2,\ell,\ell}\leq\begin{cases}Cnr\nu^{2}\epsilon^{2}+cnr\nu^{2}\epsilon&\text{ if }\ell\geq r\\ Cnr\nu^{2}\epsilon^{2}+cn\ell\nu^{2}\epsilon&\text{ if }\ell\leq r\end{cases}$$
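The quantities $T_{1,k,\ell}$ and $T_{2,\ell,\ell}$ are population-level objects and can be computed exactly once $\textbf{P}_{1}$, $\mathbf{E}$, and the shuffles are fixed. A small sketch follows (hypothetical parameters throughout; the cross-block $k$-shuffle below merely stands in for the maximizing $\textbf{Q}^{*}$, which we do not search for):

import numpy as np

def cross_shuffle(P, m):
    # Permutation swapping the first m labels (block 1) with the last m (block 2)
    n = P.shape[0]
    perm = np.arange(n)
    perm[:m], perm[n - m:] = np.arange(n - m, n), np.arange(m)
    return P[np.ix_(perm, perm)]

n, k, l, eps = 500, 60, 20, 0.05
b = np.repeat([0, 1], n // 2)
P1 = np.array([[0.5, 0.3], [0.3, 0.4]])[np.ix_(b, b)]
P2 = P1 + eps * np.ones((n, n))   # H_1 perturbation with E = eps * J

P1_k = cross_shuffle(P1, k)       # stand-in for P_{1,Q^*}
P1_l = cross_shuffle(P1, l)       # P_{1,Q-tilde}
P2_l = cross_shuffle(P2, l)       # P_{2,Q-tilde}

T1 = np.linalg.norm(P1 - P1_k) ** 2 - np.linalg.norm(P1 - P1_l) ** 2
T2 = np.linalg.norm(P1 - P2_l) ** 2 - np.linalg.norm(P1 - P1_l) ** 2
print("T_{1,k,l} =", T1, "  T_{2,l,l} =", T2)
# With E = eps * J the trace term vanishes (entry sums are permutation-invariant),
# so here T2 = n^2 * eps^2 exactly.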

We are now ready to prove Theorem 2.1, which we restate here for completeness. Recall that we are concerned with showing that, for all sufficiently large $n$,

$$\mathbb{P}_{H_{1}}\left(\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\widetilde{\textbf{Q}}}\|_{F}>\max_{\textbf{Q}\in\Pi_{n,k}}c_{\alpha,\textbf{Q}}\right)\geq 1-n^{-2}.\quad(14)$$
Theorem 2.1.

With notation as above, assume there exists $\alpha\in(0,1]$ such that $r=\Theta(n^{\alpha})$ and $k,\ell\ll n^{\alpha}$, and that

$$\frac{\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|_{F}^{2}-\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|_{F}^{2}}{\nu^{2}n}=O(k).$$

In the sparse setting, consider $\nu\gg\frac{\log^{4c}(n)}{n^{\beta}}$ for $\beta\in(0,1]$ with $\alpha\geq\beta$. If either

  • i. $k=O\left(\frac{n^{\beta}}{\log^{2c}n}\right)$ and $\epsilon\gg\sqrt{\frac{n^{\beta-\alpha}}{\log^{2c}(n)}}$;

  • ii. $k\gg\frac{n^{\beta}}{\log^{2c}(n)}$ and $\epsilon\gg\sqrt{\frac{k}{n^{\alpha}}}$,

then Eq. 14 holds for all $n$ sufficiently large. In the dense case where $\nu=1$, if either

  • i. $k\gg\log^{2c}(n)$ and $\epsilon\gg\sqrt{k/n^{\alpha}}$;

  • ii. $k\ll\log^{2c}(n)$ and $\epsilon\gg\sqrt{(\log^{c}(n))/n^{\alpha}}$,

then Eq. 14 holds for all $n$ sufficiently large.

Proof.

Note that, as will be seen below, we require $k<c_{2}n^{\alpha}$ for an appropriate constant $c_{2}$, so the assumption that $k\ll n^{\alpha}$ is not overly stringent. We begin by noting that for sufficiently large $n$ (as $\ell\ll r$)

$$T_{2,\ell,\ell}-T_{1,k,\ell}\geq Cnr\nu^{2}\epsilon^{2}-cn\ell\nu^{2}\epsilon-\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|^{2}_{F}+\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|^{2}_{F}.\quad(15)$$

Now, for the power to be asymptotically almost surely 1 (i.e., bounded below by $1-n^{-2}$ for all $n$ sufficiently large), it suffices that under $H_{1}$ (as the critical value for the hypothesis test is bounded above, by Proposition A.1, by $\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|_{F}+M\sqrt{n\nu}\log^{c}n$) we have

\begin{align*}
\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\widetilde{\textbf{Q}}}\|_{F}&\geq\left(\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|^{2}_{F}-T_{1,k,\ell}+T_{2,\ell,\ell}\right)^{1/2}-C_{1}\sqrt{n\nu}\log^{c}n\\
&\geq\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|_{F}+M\sqrt{n\nu}\log^{c}n,
\end{align*}

which is implied by

\begin{align*}
Cnr\nu^{2}\epsilon^{2}\geq\,&cn\ell\nu^{2}\epsilon+(\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|^{2}_{F}-\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|^{2}_{F})+(M+C_{1})^{2}n\nu\log^{2c}n\\
&+2\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|_{F}(M+C_{1})\sqrt{n\nu}\log^{c}n.\quad(16)
\end{align*}

To show Eq. 16 holds, it suffices that all of the following hold

$$\epsilon\gg\frac{\ell}{n^{\alpha}};\qquad\epsilon\gg\sqrt{\frac{k}{n^{\alpha}}};\qquad\epsilon\gg\sqrt{\frac{\log^{2c}n}{n^{\alpha}\nu}};\qquad\epsilon\gg\sqrt{\frac{k^{1/2}\log^{c}n}{n^{\alpha}\nu^{1/2}}}.\quad(17)$$

In the sparse setting where $\nu\gg\frac{\log^{4c}(n)}{n^{\beta}}$ for $\beta\in(0,1]$, Eq. 17 is implied by the following (as $k,\ell\ll n^{\alpha}$ and $k\geq\ell$, so that $\frac{k}{n^{\alpha}}\ll 1$ and hence $\sqrt{\frac{k}{n^{\alpha}}}\gg\frac{k}{n^{\alpha}}\geq\frac{\ell}{n^{\alpha}}$):

$$\epsilon\gg\sqrt{\frac{k}{n^{\alpha}}};\qquad\epsilon\gg\sqrt{\frac{n^{\beta-\alpha}}{\log^{2c}(n)}};\qquad\epsilon\gg\sqrt{\frac{n^{\beta-\alpha}}{\log^{c}(n)}\cdot\frac{k^{1/2}}{n^{\beta/2}}};\quad(18)$$

note that for these conditions to be satisfiable, we must have $\alpha\geq\beta$. If $k=O\left(\frac{n^{\beta}}{\log^{2c}n}\right)$, then Eq. 18 is implied by $\epsilon\gg\sqrt{\frac{n^{\beta-\alpha}}{\log^{2c}(n)}}$. Consider next $k\gg\frac{n^{\beta}}{\log^{2c}(n)}$, in which case Eq. 18 is implied by $\epsilon\gg\sqrt{\frac{k}{n^{\alpha}}}$.

In the dense case where $\nu=1$, Eq. 17 is implied by the following:

$$\epsilon\gg\sqrt{\frac{k}{n^{\alpha}}};\qquad\epsilon\gg\sqrt{\frac{\log^{2c}n}{n^{\alpha}}};\qquad\epsilon\gg\sqrt{\frac{k^{1/2}\log^{c}n}{n^{\alpha}}}.\quad(19)$$

Eq. 19 is then implied by

\begin{align*}
\epsilon\gg\sqrt{\frac{k}{n^{\alpha}}}&\,\,\text{ if }\,\,k\gg\log^{2c}(n);\\
\epsilon\gg\sqrt{\frac{\log^{c}(n)}{n^{\alpha}}}&\,\,\text{ if }\,\,k\ll\log^{2c}(n),
\end{align*}

as desired. ∎
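To illustrate Theorem 2.1 empirically, the following sketch approximates the power of the $\widehat{\textbf{P}}$-based test when the critical value is calibrated with $k$-shuffles but the alternative is only $\ell$-shuffled. All parameters are hypothetical, Monte Carlo calibration is used in place of the theoretical critical-value bound, and block averaging stands in for the spectral estimate:

import numpy as np

rng = np.random.default_rng(1)

def sample_sbm(P, rng):
    U = rng.random(P.shape)
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)
    return A + A.T

def phat_block(A, b):
    Phat = np.zeros_like(A)
    for r in np.unique(b):
        for s in np.unique(b):
            idx = np.ix_(b == r, b == s)
            block = A[idx]
            m = block.shape[0]
            Phat[idx] = block.sum() / (m * (m - 1)) if r == s else block.mean()
    return Phat

def swap_perm(n, m):
    # Shuffle m labels across the two blocks
    perm = np.arange(n)
    perm[:m], perm[n - m:] = np.arange(n - m, n), np.arange(m)
    return perm

n, k, l, eps, alpha = 500, 50, 10, 0.05, 0.05
b = np.repeat([0, 1], n // 2)
P1 = np.array([[0.5, 0.3], [0.3, 0.4]])[np.ix_(b, b)]
P2 = P1 + eps                     # H_1: entrywise perturbation of size eps

def stat(P_second, perm):
    P1h = phat_block(sample_sbm(P1, rng), b)
    P2h = phat_block(sample_sbm(P_second, rng), b)[np.ix_(perm, perm)]
    return np.linalg.norm(P1h - P2h)

null = [stat(P1, swap_perm(n, k)) for _ in range(300)]   # k-shuffled null calibration
crit = np.quantile(null, 1 - alpha)
alt = [stat(P2, swap_perm(n, l)) for _ in range(300)]    # ell-shuffled alternative
print("empirical power:", np.mean(np.array(alt) > crit))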

We next turn our attention to Theorem 2.2.

Theorem 2.2.

With notation as above, assume there exists $\alpha\in(0,1]$ such that $r=\Theta(n^{\alpha})$ and $k,\ell\ll n^{\alpha}$, and that

$$\frac{\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|_{F}^{2}-\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|_{F}^{2}}{\nu^{2}n}=O(k-\ell).$$

In the sparse setting where $\nu\gg\frac{\log^{4c}(n)}{n^{\beta}}$ for $\beta\in(0,1]$ with $\alpha\geq\beta$, if $k\gg\frac{n^{\beta}}{\log^{2c}(n)}$ and either

  • i. $\frac{k-\ell}{k^{1/2}}\geq\frac{n^{\beta/2}}{\log^{2c}(n)}$; $\epsilon\gg\frac{\ell}{n^{\alpha}}$; and $\epsilon\gg\sqrt{\frac{k-\ell}{n^{\alpha}}}$; or

  • ii. $\frac{k-\ell}{k^{1/2}}\leq\frac{n^{\beta/2}}{\log^{2c}(n)}$; $\epsilon\gg\frac{\ell}{n^{\alpha}}$; and $\epsilon\gg\sqrt{\frac{n^{\beta/2}}{\log^{2c}(n)}\frac{k^{1/2}}{n^{\alpha}}}$,

then Eq. 14 holds for all $n$ sufficiently large. In the dense case where $\nu=1$ and $k=\omega(\log^{2c}n)$, if either

  • i. $\frac{k-\ell}{k^{1/2}}\geq\log^{c}(n)$; $\epsilon\gg\frac{\ell}{n^{\alpha}}$; and $\epsilon\gg\sqrt{\frac{k-\ell}{n^{\alpha}}}$; or

  • ii. $\frac{k-\ell}{k^{1/2}}\leq\log^{c}(n)$; $\epsilon\gg\frac{\ell}{n^{\alpha}}$; and $\epsilon\gg\sqrt{\frac{k^{1/2}\log^{c}(n)}{n^{\alpha}}}$,

then Eq. 14 holds for all $n$ sufficiently large.

Proof.

Mimicking the proof of Theorem 2.1, for Eq. 14 to hold, it suffices that all of the following hold

$$\epsilon\gg\frac{\ell}{n^{\alpha}};\qquad\epsilon\gg\sqrt{\frac{k-\ell}{n^{\alpha}}};\qquad\epsilon\gg\sqrt{\frac{\log^{2c}n}{n^{\alpha}\nu}};\qquad\epsilon\gg\sqrt{\frac{k^{1/2}\log^{c}n}{n^{\alpha}\nu^{1/2}}}.\quad(20)$$

In the sparse setting where $\nu\gg\frac{\log^{4c}(n)}{n^{\beta}}$ for $\beta\in(0,1]$, Eq. 20 is implied by the following:

$$\epsilon\gg\frac{\ell}{n^{\alpha}};\qquad\epsilon\gg\sqrt{\frac{k-\ell}{n^{\alpha}}};\qquad\epsilon\gg\sqrt{\frac{n^{\beta-\alpha}}{\log^{2c}(n)}};\qquad\epsilon\gg\sqrt{\frac{n^{\beta-\alpha}}{\log^{c}(n)}\cdot\frac{k^{1/2}}{n^{\beta/2}}};\quad(21)$$

note that for these conditions to be satisfiable, we must have $\alpha\geq\beta$. Recalling the assumption that $k\gg\frac{n^{\beta}}{\log^{2c}(n)}$, the behavior in this case hinges on $\ell$ as well, as Eq. 21 is implied by

\begin{align*}
\epsilon\gg\frac{\ell}{n^{\alpha}},\quad\epsilon\gg\sqrt{\frac{k-\ell}{n^{\alpha}}}&\,\,\text{ if }\,\,\frac{k-\ell}{k^{1/2}}\geq\frac{n^{\beta/2}}{\log^{2c}(n)};\\
\epsilon\gg\frac{\ell}{n^{\alpha}},\quad\epsilon\gg\sqrt{\frac{n^{\beta/2}}{\log^{2c}(n)}\frac{k^{1/2}}{n^{\alpha}}}&\,\,\text{ if }\,\,\frac{k-\ell}{k^{1/2}}\leq\frac{n^{\beta/2}}{\log^{2c}(n)}.
\end{align*}

In the dense case where $\nu=1$, Eq. 20 is implied by the following:

$$\epsilon\gg\frac{\ell}{n^{\alpha}};\qquad\epsilon\gg\sqrt{\frac{k-\ell}{n^{\alpha}}};\qquad\epsilon\gg\sqrt{\frac{\log^{2c}n}{n^{\alpha}}};\qquad\epsilon\gg\sqrt{\frac{k^{1/2}\log^{c}n}{n^{\alpha}}}.\quad(22)$$

Under the assumption that $k\gg\log^{2c}n$, Eq. 22 is implied by

\begin{align*}
\epsilon\gg\frac{\ell}{n^{\alpha}},\quad\epsilon\gg\sqrt{\frac{k-\ell}{n^{\alpha}}}&\,\,\text{ if }\,\,\frac{k-\ell}{k^{1/2}}\geq\log^{c}(n);\\
\epsilon\gg\frac{\ell}{n^{\alpha}},\quad\epsilon\gg\sqrt{\frac{k^{1/2}\log^{c}(n)}{n^{\alpha}}}&\,\,\text{ if }\,\,\frac{k-\ell}{k^{1/2}}\leq\log^{c}(n),
\end{align*}

as desired. ∎
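The case analyses in Theorems 2.1 and 2.2 are ultimately bookkeeping over the four $\epsilon$-scales in Eqs. 17 and 20; for a fixed parameter configuration one can simply tabulate the scales to see which condition binds. A purely illustrative sketch (taking $c=1$ and $\alpha=1$, i.e., $r=\Theta(n)$, with all values hypothetical):

import numpy as np

def eps_scales(n, k, l, nu, c=1.0, alpha=1.0):
    # The four lower-bound scales for epsilon in Eq. 20 (Eq. 17 replaces k - l by k)
    na, logc = n ** alpha, np.log(n) ** c
    return {
        "l / n^alpha": l / na,
        "sqrt((k - l) / n^alpha)": np.sqrt((k - l) / na),
        "sqrt(log^{2c} n / (n^alpha nu))": np.sqrt(logc ** 2 / (na * nu)),
        "sqrt(sqrt(k) log^c n / (n^alpha sqrt(nu)))":
            np.sqrt(np.sqrt(k) * logc / (na * np.sqrt(nu))),
    }

for nu in (1.0, 0.05):   # dense versus (modestly) sparse
    scales = eps_scales(n=10_000, k=200, l=50, nu=nu)
    binding = max(scales, key=scales.get)
    print(f"nu = {nu}: binding scale is {binding}", scales)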

A.2 Proof of Theorem 2.3

Recall that

$$T_{2,\ell,\ell}-T_{1,k,\ell}\leq\begin{cases}Cnr\nu^{2}\epsilon^{2}+cnr\nu^{2}\epsilon-\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|^{2}_{F}+\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|^{2}_{F}&\text{ if }\ell\geq r\\ Cnr\nu^{2}\epsilon^{2}+cn\ell\nu^{2}\epsilon-\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|^{2}_{F}+\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|^{2}_{F}&\text{ if }\ell\leq r\end{cases}$$

so that the assumption

$$\frac{\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|_{F}^{2}-\|\textbf{P}_{1}-\textbf{P}_{1,\widetilde{\textbf{Q}}}\|_{F}^{2}}{\nu^{2}n}=\Omega(k-\ell)$$

yields that there exists a constant $R>0$ such that for $n$ sufficiently large,

$$T_{2,\ell,\ell}-T_{1,k,\ell}\leq\begin{cases}Cnr\nu^{2}\epsilon^{2}+cnr\nu^{2}\epsilon-Rn\nu^{2}(k-\ell)&\text{ if }\ell\geq r\\ Cnr\nu^{2}\epsilon^{2}+cn\ell\nu^{2}\epsilon-Rn\nu^{2}(k-\ell)&\text{ if }\ell\leq r\end{cases}$$

Now, for the power to be asymptotically negligible (i.e., bounded above by $n^{-2}$ for all $n$ sufficiently large), it suffices that under $H_{1}$ we have, where $\mathfrak{c}$ is an appropriate constant that can change line-to-line (and as the critical value for the hypothesis test is bounded below, by Proposition A.1, by $\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|_{F}-M\sqrt{n\nu}\log^{c}n$),

\begin{align*}
\|\widehat{\textbf{P}}_{1}-\widehat{\textbf{P}}_{2,\widetilde{\textbf{Q}}}\|_{F}&\leq\left(\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|^{2}_{F}-T_{1,k,\ell}+T_{2,\ell,\ell}\right)^{1/2}+C_{2}\sqrt{n\nu}\log^{c}n\\
&\leq\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|_{F}-M\sqrt{n\nu}\log^{c}n.\quad(23)
\end{align*}

Eq. 23 is then implied by

$$T_{2,\ell,\ell}-T_{1,k,\ell}\leq(C_{2}+M)^{2}n\nu\log^{2c}n-2(M+C_{2})\sqrt{n\nu}\log^{c}n\cdot\|\textbf{P}_{1}-\textbf{P}_{1,\textbf{Q}^{*}}\|_{F},$$

which is implied by

$$\begin{cases}n^{1+\alpha}\nu^{2}\epsilon^{2}+n^{1+\alpha}\nu^{2}\epsilon\leq\mathfrak{c}n\nu^{2}(k-\ell)+\mathfrak{c}n\nu\log^{2c}n-\mathfrak{c}n\nu^{3/2}\sqrt{k}\log^{c}n&\text{ if }\ell\geq r\\ n^{1+\alpha}\nu^{2}\epsilon^{2}+n\ell\nu^{2}\epsilon\leq\mathfrak{c}n\nu^{2}(k-\ell)+\mathfrak{c}n\nu\log^{2c}n-\mathfrak{c}n\nu^{3/2}\sqrt{k}\log^{c}n&\text{ if }\ell\leq r\end{cases}\quad(24)$$

Suppose further that

$$\frac{k-\ell}{\sqrt{k}}\gg\frac{\log^{c}n}{\sqrt{\nu}}.$$

In this case Eq. 24 is implied by

\begin{align*}
n^{1+\alpha}\nu^{2}\epsilon^{2}+n^{1+\alpha}\nu^{2}\epsilon&\ll n\nu\log^{2c}n+n(k-\ell)\nu^{2}\quad\text{ if }\ell\geq r\\
n^{1+\alpha}\nu^{2}\epsilon^{2}+n\ell\nu^{2}\epsilon&\ll n\nu\log^{2c}n+n(k-\ell)\nu^{2}\quad\text{ if }\ell\leq r\\
&\Leftrightarrow\begin{cases}\epsilon^{2}+\epsilon\ll\frac{\log^{2c}n}{n^{\alpha}\nu}+\frac{k-\ell}{n^{\alpha}}&\text{ if }\ell\geq r\\ n^{\alpha}\epsilon^{2}+\ell\epsilon\ll\frac{\log^{2c}n}{\nu}+(k-\ell)&\text{ if }\ell\leq r\end{cases}\\
&\Leftarrow\begin{cases}\epsilon\ll\frac{k-\ell}{n^{\alpha}}&\text{ if }\ell\geq r\\ \epsilon\ll\sqrt{\frac{k-\ell}{n^{\alpha}}};\,\,\epsilon\ll\frac{k-\ell}{\ell}&\text{ if }\ell\leq r\end{cases}\quad(25)
\end{align*}

as desired.

A.3 Proof of Proposition 3.1

To ease notation, define $T_{h,A}=T_{A}(\textbf{A}_{1},\textbf{A}_{2,h}):=\frac{1}{2}\|\textbf{A}_{1}-\textbf{A}_{2,h}\|_{F}^{2}$. We adopt the following notation for the entries of the shuffled edge expectation matrices: for $\eta=1,2$, the $(i,j)$-th entry of $\textbf{P}_{\eta,h}=\textbf{Q}_{2h}\textbf{P}_{\eta}\textbf{Q}_{2h}^{T}$ is denoted $p_{ij}^{(\eta,h)}$ (and $p_{ij}^{(\eta)}$ will refer to the $(i,j)$-th entry of $\textbf{P}_{\eta}$). We also define (recall that $\sum_{\{i,j\}}$ signifies the sum over unordered pairs of elements of $[n]$) $\mu_{1}:=\sum_{\{i,j\}}p_{ij}^{(1)}$ and $\mu_{E}:=\sum_{\{i,j\}}e_{ij}$. Then, we have that (where we recall that $\mathbf{E}_{\ell}=\textbf{Q}_{2\ell}\mathbf{E}\textbf{Q}_{2\ell}^{T}=[e_{ij}^{(\ell)}]$ is the shuffled noise matrix)

\begin{align*}
\mathbb{E}_{H_{0}}(T_{k,A})&=2\mu_{1}-2\sum_{\{i,j\}}p_{ij}^{(1)}p_{ij}^{(1,k)}\\
\mathbb{E}_{H_{1}}(T_{\ell,A})&=2\mu_{1}+\mu_{E}-2\sum_{\{i,j\}}p^{(1)}_{ij}(p^{(1,\ell)}_{ij}+e^{(\ell)}_{ij}).
\end{align*}
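These population moments translate directly into code. As a sanity check, the following sketch (hypothetical block matrix; a cross-block swap stands in for $\textbf{Q}_{2k}$) compares a Monte Carlo estimate of $\mathbb{E}_{H_{0}}(T_{k,A})$ against the closed form above:

import numpy as np

rng = np.random.default_rng(2)
n, k = 300, 40
b = np.repeat([0, 1], n // 2)
P1 = np.array([[0.5, 0.3], [0.3, 0.4]])[np.ix_(b, b)]

perm = np.arange(n)
perm[:k], perm[n - k:] = np.arange(n - k, n), np.arange(k)  # stand-in for Q_{2k}
P1_k = P1[np.ix_(perm, perm)]

def sample(P, rng):
    U = rng.random(P.shape)
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)
    return A + A.T

def T_A(A1, A2k):
    # T_{k,A} = (1/2)||A_1 - A_{2,k}||_F^2 counts each unordered pair {i,j} once
    return 0.5 * np.linalg.norm(A1 - A2k) ** 2

mc = np.mean([T_A(sample(P1, rng), sample(P1, rng)[np.ix_(perm, perm)])
              for _ in range(500)])
iu = np.triu_indices(n, 1)                       # the unordered pairs {i,j}
closed = 2 * P1[iu].sum() - 2 * (P1[iu] * P1_k[iu]).sum()
print(mc, closed)                                # agree up to Monte Carlo error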

In the dense setting (i.e., $\nu_{n}=1$ for all $n$), letting $\xi_{ij}:=(2p^{(1)}_{ij}-1)e^{(\ell)}_{ij}$ and $\mu_{\xi}:=\sum_{\{i,j\}}\xi_{ij}$, it follows that (where, to ease notation, we define $\delta=\max_{i}|\Lambda_{1i}-\Lambda_{2i}|$)

\begin{align*}
\mathbb{E}_{H_{0}}(T_{k,A})-\mathbb{E}_{H_{1}}(T_{\ell,A})&=2\sum_{\{i,j\}}p^{(1)}_{ij}(p^{(1,\ell)}_{ij}-p^{(1,k)}_{ij})-\mu_{E}+2\sum_{\{i,j\}}p^{(1)}_{ij}e^{(\ell)}_{ij}\\
&=2\bigg(\sum_{h>2}n_{h}(k-\ell)(\Lambda_{1h}-\Lambda_{2h})^{2}+\binom{k-\ell}{2}(\Lambda_{11}-\Lambda_{22})^{2}\quad(26)\\
&\hskip 14.22636pt+(n_{1}-k)(k-\ell)(\Lambda_{11}-\Lambda_{21})^{2}+(n_{2}-k)(k-\ell)(\Lambda_{22}-\Lambda_{12})^{2}\quad(27)\\
&\hskip 14.22636pt-2\ell(k-\ell)(\Lambda_{11}-\Lambda_{12})(\Lambda_{22}-\Lambda_{21})\bigg)\quad(28)\\
&\hskip 14.22636pt-\mu_{E}+2\sum_{\{i,j\}}p^{(1)}_{ij}e^{(\ell)}_{ij}\quad(29)\\
&\leq 2n(k-\ell)\delta^{2}+\mu_{\xi}.\quad(30)
\end{align*}

To see this, we first focus on bounding the terms in Eqs. (26)–(28). To ease notation, let $x:=\Lambda_{11}-\Lambda_{21}$ and $y:=\Lambda_{22}-\Lambda_{12}$. For the desired bound, it suffices to show that

$$\binom{k-\ell}{2}(x-y)^{2}-k(k-\ell)(x^{2}+y^{2})-2\ell(k-\ell)xy\leq 0.$$

To see this, note that

\begin{align*}
\binom{k-\ell}{2}&(x-y)^{2}-k(k-\ell)(x^{2}+y^{2})-2\ell(k-\ell)xy\\
&=\frac{k-\ell}{2}\left((k-\ell-1)(x^{2}+y^{2})-2(k-\ell-1)xy-2k(x^{2}+y^{2})-4\ell xy\right)\\
&=\frac{k-\ell}{2}\left((-k-\ell-1)(x^{2}+y^{2})-2(k-\ell-1)xy-4\ell xy\right)\leq 0,
\end{align*}

as this yields that the terms in Eqs. (26)–(28) are bounded above by $2\sum_{h\geq 1}n_{h}(k-\ell)(\Lambda_{1h}-\Lambda_{2h})^{2}$, as desired. For a lower bound, we have that (where $\gamma:=|\Lambda_{11}-\Lambda_{22}|$)

\begin{align*}
\mathbb{E}_{H_{0}}(T_{k,A})-\mathbb{E}_{H_{1}}(T_{\ell,A})&\geq 2(k-\ell)\left[\sum_{h\geq 1}n_{h}(\Lambda_{1h}-\Lambda_{2h})^{2}-(k+\ell)\delta^{2}-\gamma^{2}/2\right]+\mu_{\xi}\\
&\geq 2(k-\ell)[\min_{i}n_{i}-(k+\ell)]\delta^{2}-(k-\ell)\gamma^{2}+\mu_{\xi}.
\end{align*}

To derive the desired lower bound on the terms in Eqs. (26)–(28), we see that

\begin{align*}
\binom{k-\ell}{2}&(x-y)^{2}-k(k-\ell)(x^{2}+y^{2})-2\ell(k-\ell)xy\\
&=-\frac{k-\ell}{2}\left((k+\ell)(x+y)^{2}+(x-y)^{2}\right)\\
&\geq-\frac{k-\ell}{2}\left((k+\ell)2\delta^{2}+\gamma^{2}\right)=-(k-\ell)(k+\ell)\delta^{2}-(k-\ell)\gamma^{2}/2,
\end{align*}

as desired.
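Both factorizations of the quadratic form in $x$ and $y$ above are routine but easy to get wrong; they can be verified symbolically, e.g. with sympy (a check on the algebra, not part of the argument):

import sympy as sp

k, l, x, y = sp.symbols("k ell x y")
binom = (k - l) * (k - l - 1) / 2   # binomial(k - ell, 2)
form = binom * (x - y) ** 2 - k * (k - l) * (x ** 2 + y ** 2) - 2 * l * (k - l) * x * y

# Factored form used for the upper (nonpositivity) bound:
upper = (k - l) / 2 * ((-k - l - 1) * (x ** 2 + y ** 2)
                       - 2 * (k - l - 1) * x * y - 4 * l * x * y)
# Factored form used for the lower bound:
lower = -(k - l) / 2 * ((k + l) * (x + y) ** 2 + (x - y) ** 2)

print(sp.expand(form - upper))  # 0
print(sp.expand(form - lower))  # 0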

Under our assumptions on $\Lambda$ and $\mathbf{E}$, we have that

\begin{align*}
2\eta(1-\eta)\mathbb{E}_{H_{0}}(T_{k,A})&\leq\text{Var}_{H_{0}}(T_{k,A})\leq(1-2\eta(1-\eta))\mathbb{E}_{H_{0}}(T_{k,A})\\
2(\eta-\hat{\eta})(1-\eta+\hat{\eta})\mathbb{E}_{H_{1}}(T_{\ell,A})&\leq\text{Var}_{H_{1}}(T_{\ell,A})\leq(1-2(\eta-\hat{\eta})(1-\eta+\hat{\eta}))\mathbb{E}_{H_{1}}(T_{\ell,A}).
\end{align*}

To see this for $T_{k,A}$ (with $T_{\ell,A}$ being analogous), let $\sigma_{k}$ (resp., $\sigma_{\ell}$) be the permutation associated with $\textbf{Q}_{2k}$ (resp., $\textbf{Q}_{2\ell}$), and define

\begin{align*}
\mathcal{S}_{k}&:=\{\,\{i,j\}\text{ s.t. }\{i,j\}\neq\{\sigma_{k}(i),\sigma_{k}(j)\}\}\\
\mathcal{S}_{\ell}&:=\{\,\{i,j\}\text{ s.t. }\{i,j\}\neq\{\sigma_{\ell}(i),\sigma_{\ell}(j)\}\}.
\end{align*}

For ease of notation, define

\begin{align*}
\mathfrak{a}_{ij}&:=p^{(1)}_{ij}(1-p_{ij}^{(1,k)})+p_{ij}^{(2,k)}(1-p^{(1)}_{ij})\\
\mathfrak{b}_{ij}&:=2p^{(1)}_{ij}(1-p^{(1)}_{ij}).
\end{align*}

Then

$$\text{Var}_{H_{0}}(T_{k,A})=\mathbb{E}_{H_{0}}(T_{k,A})-\sum_{\{i,j\}\in\mathcal{S}_{k}}\mathfrak{a}_{ij}^{2}-\sum_{\{i,j\}\notin\mathcal{S}_{k}}\mathfrak{b}_{ij}^{2};$$

noting that

\begin{align*}
2\eta(1-\eta)&\leq\mathfrak{a}_{ij}\leq 1-2\eta(1-\eta)\\
2\eta(1-\eta)&\leq\mathfrak{b}_{ij}\leq 1-2\eta(1-\eta)
\end{align*}

yields

\begin{align*}
\text{Var}_{H_{0}}(T_{k,A})&\geq\mathbb{E}_{H_{0}}(T_{k,A})-(1-2\eta(1-\eta))\sum_{\{i,j\}\in\mathcal{S}_{k}}\mathfrak{a}_{ij}-(1-2\eta(1-\eta))\sum_{\{i,j\}\notin\mathcal{S}_{k}}\mathfrak{b}_{ij}\\
&=2\eta(1-\eta)\mathbb{E}_{H_{0}}(T_{k,A}),
\end{align*}

and, similarly, $\text{Var}_{H_{0}}(T_{k,A})\leq(1-2\eta(1-\eta))\mathbb{E}_{H_{0}}(T_{k,A})$.
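The entrywise bounds on $\mathfrak{a}_{ij}$ and $\mathfrak{b}_{ij}$ use only that every edge probability lies in $[\eta,1-\eta]$; a quick numerical spot-check (with a hypothetical $\eta$) is below:

import numpy as np

rng = np.random.default_rng(3)
eta = 0.2
p = rng.uniform(eta, 1 - eta, 100_000)   # plays the role of p^{(1)}_{ij}
q = rng.uniform(eta, 1 - eta, 100_000)   # plays the role of p^{(2,k)}_{ij}
a = p * (1 - q) + q * (1 - p)            # the \mathfrak{a}_{ij} form
lo, hi = 2 * eta * (1 - eta), 1 - 2 * eta * (1 - eta)
print(a.min() >= lo, a.max() <= hi)      # True True: the extremes occur at the corners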

Stein's method (see [78, 79]) yields that under $H_{0}$ (resp., $H_{1}$), $T_{A}(\textbf{A}_{1},\textbf{A}_{2,k})$ (resp., $T_{A}(\textbf{A}_{1},\textbf{A}_{2,\ell})$) is asymptotically normally distributed, and hence the testing power is asymptotically equal to (where $\widetilde{\Phi}$ is the standard normal upper-tail (survival) function, $C>0$ is a constant that can change line-to-line, and $n_{*}=\min_{i}n_{i}$)

\begin{align*}
\mathbb{P}_{H_{1}}(T_{\ell,A}\geq\mathfrak{c}_{\alpha,k})&\approx\mathbb{P}_{H_{1}}(T_{\ell,A}\geq z_{\alpha}\sqrt{\text{Var}_{H_{0}}(T_{k,A})}+\mathbb{E}_{H_{0}}(T_{k,A}))\\
&=\mathbb{P}_{H_{1}}\left(\frac{T_{\ell,A}-\mathbb{E}_{H_{1}}(T_{\ell,A})}{\sqrt{\text{Var}_{H_{1}}(T_{\ell,A})}}\geq\frac{z_{\alpha}\sqrt{\text{Var}_{H_{0}}(T_{k,A})}+\mathbb{E}_{H_{0}}(T_{k,A})-\mathbb{E}_{H_{1}}(T_{\ell,A})}{\sqrt{\text{Var}_{H_{1}}(T_{\ell,A})}}\right)\\
&\approx\widetilde{\Phi}\left(\frac{z_{\alpha}\sqrt{\text{Var}_{H_{0}}(T_{k,A})}+\mathbb{E}_{H_{0}}(T_{k,A})-\mathbb{E}_{H_{1}}(T_{\ell,A})}{\sqrt{\text{Var}_{H_{1}}(T_{\ell,A})}}\right)\\
&\leq\widetilde{\Phi}\left(C\cdot\left(2(k-\ell)\left(\frac{n_{*}}{n}-\frac{k+\ell}{n}\right)\delta^{2}-\frac{k-\ell}{n}\gamma^{2}+\frac{\mu_{\xi}}{n}\right)\right)\\
&\leq\widetilde{\Phi}\left(C\cdot\left((k-\ell)\frac{n_{*}}{n}-\frac{k^{2}-\ell^{2}}{n}\delta^{2}-\frac{k-\ell}{n}\gamma^{2}+\frac{\mu_{\xi}}{n}\right)\right).
\end{align*}

Now, we have that power is asymptotically negligible if

$$(k-\ell)\frac{n_{*}}{n}-\frac{k^{2}-\ell^{2}}{n}\delta^{2}-\frac{k-\ell}{n}\gamma^{2}+\frac{\mu_{\xi}}{n}\gg 0,$$

as desired.
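The normal approximation above reduces power calculations to evaluating $\widetilde{\Phi}$ at a standardized mean gap. A sketch using scipy (the moment values below are hypothetical stand-ins for the expressions derived above):

import numpy as np
from scipy.stats import norm

def approx_power(mu0, var0, mu1, var1, alpha=0.05):
    # P(T >= z_alpha * sd_0 + mu_0) when T ~ N(mu_1, var_1); Phi-tilde is norm.sf
    z = norm.isf(alpha)
    return norm.sf((z * np.sqrt(var0) + mu0 - mu1) / np.sqrt(var1))

# As the (shuffling-deflated) mean gap mu_1 - mu_0 shrinks, power decays toward alpha:
for gap in (3.0, 1.0, 0.0):
    print(gap, approx_power(mu0=100.0, var0=50.0,
                            mu1=100.0 + gap * np.sqrt(50.0), var1=50.0))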

A.4 Additional Experiments

Herein, we include the additional experiments from Sections 3 and 4. We first show an example of the $\widehat{\textbf{P}}$ test in a sparser regime than that considered in Section 3. As in Section 3, letting $b(v)=2-\mathds{1}\{v\in\{1,2,\ldots,250\}\}$, we consider two $n=500$-vertex SBMs defined via

$$\textbf{A}\sim\text{SBM}\left(2,\begin{bmatrix}0.05&0.01\\ 0.01&0.04\end{bmatrix},b,1\right);\qquad\textbf{B}\sim\text{SBM}\left(2,\begin{bmatrix}0.05&0.01\\ 0.01&0.04\end{bmatrix}+\textbf{E}_{\epsilon},b,1\right)\quad(31)$$

where $\textbf{E}_{\epsilon}=\epsilon J_{500}$ for $\epsilon=0.01,0.05,-0.005$. Results are displayed in Figure 12; here we see the same trend in play in the modestly sparse regime (when $\epsilon=0.01,0.05$). Indeed, in this sparse regime, the testing power is high even for low noise ($\epsilon=0.01$), as the signal-to-noise ratio is still favorable for inference. As in the dense case, if the error is too small (here $\epsilon=-0.005$), the test is unable to distinguish the two networks regardless of the shuffling effect.

Figure 12: For the experimental setup considered in Section A.4, we plot the empirical testing power in the presence of shuffling for the $\widehat{\textbf{P}}$-based test in the sparse regime. In the figure, the x-axis represents the number of vertices actually shuffled in $U_{n,k}$ (i.e., the number shuffled in the alternative), while the curve colors represent the maximum number of vertices potentially shuffled via $U_{n,k}$; error bars are $\pm 2$ s.e.
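For concreteness, the sampling step of this sparse experiment can be sketched as follows (a minimal version of Eq. 31; the figure aggregates many replicates, shuffle levels, and a calibrated test, none of which are reproduced here):

import numpy as np

rng = np.random.default_rng(4)
n, eps, k = 500, 0.01, 50
b = np.repeat([0, 1], 250)                    # b(v) = 2 - 1{v in {1,...,250}}, zero-indexed
Lam = np.array([[0.05, 0.01], [0.01, 0.04]])
P_A = Lam[np.ix_(b, b)]
P_B = (Lam + eps)[np.ix_(b, b)]               # E_eps = eps * J_500 at the block level

def sample(P, rng):
    U = rng.random(P.shape)
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)
    return A + A.T

A, B = sample(P_A, rng), sample(P_B, rng)
perm = np.arange(n)
perm[:k], perm[n - k:] = np.arange(n - k, n), np.arange(k)   # shuffle k labels of B
B_shuffled = B[np.ix_(perm, perm)]
print(np.linalg.norm(A - B_shuffled))         # raw Frobenius statistic, pre-calibration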

We next consider additional experiments for the ASE-based testing of Section 4. Herein, we show results for more values of $k$, $\ell$, and $\lambda$.

Figure 13: For the experimental setup considered in Section 4, we plot the empirical testing power in the presence of shuffling for the four tests: the Frobenius norm difference between the adjacency matrices, the Frobenius norm difference between the $\widehat{\textbf{P}}$'s, $T_{\text{Omni}}$, and $T_{\text{Semipar}}$. In the figure, the x-axis represents the number of vertices actually shuffled in $U_{n,k}$ (i.e., the number shuffled in the alternative), while the curve colors represent the maximum number of vertices potentially shuffled via $U_{n,k}$.
Figure 14: As in Figure 13, for an additional combination of $k$, $\ell$, and $\lambda$.
Figure 15: As in Figure 13, for an additional combination of $k$, $\ell$, and $\lambda$.
Figure 16: As in Figure 13, for an additional combination of $k$, $\ell$, and $\lambda$.
Figure 17: As in Figure 13, for an additional combination of $k$, $\ell$, and $\lambda$.