Vertex Nomination in Richly Attributed Networks
Abstract
Vertex nomination is a lightly-supervised network information retrieval task in which vertices of interest in one graph are used to query a second graph to discover vertices of interest in the second graph. Similar to other information retrieval tasks, the output of a vertex nomination scheme is a ranked list of the vertices in the second graph, with the heretofore unknown vertices of interest ideally concentrating at the top of the list. Vertex nomination schemes provide a useful suite of tools for efficiently mining complex networks for pertinent information. In this paper, we explore, both theoretically and practically, the dual roles of content (i.e., edge and vertex attributes) and context (i.e., network topology) in vertex nomination. We provide necessary and sufficient conditions under which vertex nomination schemes that leverage both content and context outperform schemes that leverage only content or context separately. While the joint utility of both content and context has been demonstrated empirically in the literature, the framework presented in this paper provides a novel theoretical basis for understanding the potential complementary roles of network features and topology.
1 Introduction
Network data has become ubiquitous in the sciences, owing to the generality and flexibility of networks in modeling relations among entities. Networks appear in such varied fields as neuroscience, genomics, the social sciences, economics, and ecology, to name just a few (see, for example, [36]). As such, statistical analysis of network data has emerged as an important field within modern statistics [22, 23, 11]. Many classical statistical inference tasks, such as hypothesis testing [56, 57, 24, 16], regression [14, 33], and maximum likelihood estimation [4, 52, 2] have been adapted to network data. Inference tasks that are specific to network data, such as link-prediction [25], community detection [37, 47, 53], and vertex nomination [32, 9, 55, 12] have also seen increasing popularity in recent years. Among these network-specific tasks is the vertex nomination (VN) problem, in which the goal is to identify vertices similar to one or more vertices specified as being of interest to a practitioner. The VN task is similar in spirit to popular network-based information retrieval (IR) procedures such as PageRank [40] and personalized recommender systems on graphs [19]. In VN, the goal is as follows: Given vertices of interest in a graph $G_1$, produce a ranked list of the vertices in a second graph $G_2$ according to how likely they are judged to be interesting. Ideally, interesting vertices in $G_2$ should concentrate at the top of the ranked list. As an inference task, this formulation of VN is distinguished from other supervised network IR tasks by the generality of what may define vertices as interesting and the limited available training data. In contrast to typical IR problems, there is little or no training data available in the VN problem.
The vertex nomination problem was first introduced as a task involving only a single graph, and vertices of interest were modeled as belonging to a single community of vertices [9, 12, 28, 63]. The information provided by vertices with known community memberships, called seed vertices, was used to rank vertices with unknown membership, with both network topology and available vertex features leveraged to produce ranking schemes [32, 10, 55]. This single-graph, community-based definition of the problem is somewhat limited in its ability to capture network models beyond the stochastic blockmodel [18]. Subsequent work lifted the problem to the two-network setting considered here [42], allowing a generalization of what defines interesting vertices and a generalization of the network models that could be considered [42, 29, 1].
In many settings, observed networks are endowed with features at the vertex and/or edge level. For example, in social networks, vertices typically correspond to users for whom we have demographic information, and edges correspond to different types of social relations. The theoretical advances in both the single- and multiple-graph VN problem recounted above were established in the context of networks where no such features are available. It is natural, then, to seek to better understand the effect of network attributes on the theoretical VN framework developed in [29] and [1]. Motivated by this, in the present work we develop VN on richly-featured networks, and we explore how the incorporation of this information impacts the concepts of Bayes optimality and consistency for the VN problem. Furthermore, in Sections 4 and 5, adopting an information theoretic perspective, we give the first steps toward a theoretical understanding (which is borne out in subsequent experiments) of the potential benefit of VN schemata that use both content and context versus one of content or context alone.
The remainder of the paper is laid out as follows. In Section 2, we outline the extension of the vertex nomination framework to the attributed network setting, defining richly-featured graphs in Section 2.1, VN schema in Section 2.2, and VN performance measures in Section 2.3. In Section 3, we derive the Bayes optimal VN scheme in the setting of richly-featured networks, and in Sections 4 and 5 we compare VN performance in the richly-featured setting to that in the settings where feature information or network information, respectively, is missing. Experiments further illustrating the practical implications of our theoretical results are presented in Section 6.
2 Vertex Nomination with Features
In the initial works introducing vertex nomination, where the defining trait of interesting vertices was membership in a community of interest, graph models with latent community structure (e.g., the stochastic blockmodel of [18, 20]) were sensible models for the underlying network structure. The need for a more general notion of what renders a vertex interesting necessitated more nuanced models, culminating in the Nominatable Distribution network model introduced in [29]. We take this model as our starting point, and extend it by endowing it with both edge and vertex features.
2.1 Richly Featured Networks
We begin by defining the class of networks with vertex and edge features, which we call richly-featured networks. We note here that there is a large literature on inference within attributed networks, with graphs endowed with features arising in settings such as social network analysis [21, 62] and knowledge representation [38, 39], among others.
Definition 1.
Let and be discrete sets of possible vertex and edge features, respectively. A richly-featured network indexed by is an ordered tuple where
-
i.
is a labeled, undirected graph on vertices. The vertices of will be denoted via either or .
-
ii.
denotes the matrix of -dimensional vertex features, so that is the vector of features associated with vertex .
-
iii.
Let , where we use as a special symbol representing unavailable data. Letting , denotes the matrix of -dimensional edge features. Indexing lexicographically, for , we write for the vector of features associated with edge . The form of is then
We further require that for all , and for all .
We use to denote the set of all richly-featured networks indexed by .
Let . In the definition of richly-featured networks, for , we interpret the edge features as unavailable data. This is a sensible assumption in practice, and is commonly made in attributed network models; see, for example, [43, 65]. We note that the structure of encodes the edge structure of , but we choose to keep the redundant information in Definition 1, as encodes the purely topological structure of the graph, absent any edge- or vertex-level features. This fact will prove useful below, when we seek to investigate the role of graph structure in the absence of features and vice versa.
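For concreteness, the following minimal sketch (in Python, with hypothetical names such as RichNetwork) stores a richly-featured network in the spirit of Definition 1: an adjacency matrix, a vertex-feature matrix, and a map from vertex pairs to edge-feature vectors, with None standing in for the special symbol marking unavailable data. The check method verifies the compatibility requirement that edge features are unavailable exactly on non-edges.

```python
from dataclasses import dataclass
from itertools import combinations
import numpy as np


@dataclass
class RichNetwork:
    adj: np.ndarray           # n-by-n symmetric 0/1 adjacency matrix
    vertex_feats: np.ndarray  # n-by-d matrix; row i holds the features of vertex i
    edge_feats: dict          # (i, j) with i < j -> feature vector, or None if unavailable

    def check(self) -> None:
        n = self.adj.shape[0]
        assert self.vertex_feats.shape[0] == n
        for i, j in combinations(range(n), 2):
            # Edge features must be the "unavailable" symbol (None) exactly on non-edges.
            assert (self.edge_feats.get((i, j)) is not None) == bool(self.adj[i, j])
```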
Remark 2.
We use discrete vertex and edge feature sets in Definition 1, as this is both rich enough to model many real-world networks (where features often encode types or characteristics of nodes or edges, and edge weights often derive from count data) and amenable to the theoretical derivations in vertex nomination. Handling continuous features poses no practical problem, but it does raise subtle measure-theoretic difficulties in the theory to follow. See Remark 23 for further discussion.
Example 3.
Consider the graph with given by
The edge features for this network would then be of the form
Remark 4.
Let be an ordered tuple of nonnegative integers, and let and be discrete sets of vertex and edge features, respectively. In the definitions and exposition that follow, we consider -valued random variables. Implicitly, we mean the following: letting be a given probability space, is a -valued random variable if it is -measurable, where is the total sigma field on , is the total sigma field on , and is the total sigma field on .
With Definition 1 in hand, lifting the definition of Nominatable Distributions first introduced in [29] to the attributed graph setting is relatively straightforward.
Definition 5.
Given and sets of discrete vertex and edge features and , respectively, the set of Richly Featured Nominatable Distributions of order with feature sets and , which we denote , is the collection of all families of distributions of the form
where is a distribution on (recalling that ), parameterized by and satisfying the following conditions:
-
1.
The vertex sets and satisfy for . We refer to as the core vertices. These are the vertices that are shared across the two graphs and imbue the model with a natural vertex correspondence.
-
2.
Vertices in and satisfy . We refer to and as junk vertices. These are the vertices in each graph that have no corresponding vertex in the other graph.
-
3.
If is distributed according to , then is a -valued random variable and is a -valued random variable. The edge features and almost surely satisfy
and
-
4.
The richly-featured subgraphs induced by the junk vertices,
are conditionally independent given .
In Definition 5, the rows of are the vertex features of , with representing the feature associated with vertex in . Similarly, the rows of are the vertex features of , with representing the vertex feature of vertex in . We do not, a priori, assume that any vertex features are missing, although extending the definition to is straightforward. With this definition in place, we are ready to define feature-aware vertex nomination schemes.
Note: In order to ease notation below, we will write
and accordingly write for . In the sequel, we will assume that the feature sets and are given, and satisfy . We will suppress the dependence of the family of richly-featured nominatable distributions on the feature sets and , writing in place of .
2.2 Vertex Nomination Schemes
In vertex nomination, the labels of the vertices in the second graph, , are assumed unknown a priori. To model this in our richly-featured nominatable distribution framework, we introduce obfuscating functions as in [29]. Obfuscating functions serve to hide vertex labels, and can be interpreted as a non-probabilistic version of the vertex shuffling considered in [60] and [26].
Definition 6.
Consider graphs with vertex sets and , respectively. An obfuscating set, , of and of order is a set satisfying for and . Given obfuscating set , an obfuscating function is a bijection from to . We denote by the set of all such obfuscating functions. For a richly-featured network , we will write where
-
i.
denotes the graph with labels obfuscated by . That is, , where and is such that if and only if .
-
ii.
is the vertex feature matrix associated with , so that for ,
-
iii.
is the edge feature matrix associated with , so that for ,
Note that we will assume that is endowed with an arbitrary but fixed ordering, and that the edges of are ordered lexicographically according to this ordering on . We do not necessarily assume that the ordering of is the ordering induced by . That is, we do not necessarily assume that implies .
Relating this definition back to Definition 5, the purpose of the obfuscating function is to render the labels on the vertices in uninformative with respect to the correspondence between and encoded in the core vertices . Following this logic, it is sensible to require vertex nomination schemes (defined below) to be independent of vertex labels. Informally, if a set of vertices have identical features and edge structures among them, then their rankings in a VN scheme should be independent of the chosen obfuscating function . This is made precise in Definition 9 and Assumption 10 below, but requires some preliminary definitions.
Definition 7 (Action of a permutation on a richly-featured network).
Let be a richly-featured network. A permutation acts on to produce , where
-
i.
is the graph with its vertex labels permuted by .
-
ii.
is the vertex feature matrix associated with , so that for ,
-
iii.
is the edge feature matrix associated with , so that for ,
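As a small illustration of Definition 7, the sketch below (illustrative names, assuming numpy) applies a permutation to the adjacency structure and vertex features of a network; edge features, stored per vertex pair, would be re-indexed in the same way.

```python
import numpy as np

def permute_network(adj, vertex_feats, perm):
    """Relabel vertices by the permutation perm (perm[i] is the original
    index of the vertex now labeled i), permuting adjacency rows/columns
    and vertex-feature rows consistently."""
    perm = np.asarray(perm)
    return adj[np.ix_(perm, perm)], vertex_feats[perm]

# A permutation mapping (adj, vertex_feats) back to itself is an
# f-automorphism in the sense of Definition 8 below.
```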
Definition 8 (feature-preserving automorphisms and isomorphisms).
We call a permutation a feature-preserving automorphism (abbreviated f-automorphism) of if . Similarly, we call a permutation a feature-preserving isomorphism between and (abbreviated f-isomorphism) if .
Let be a richly-featured network. For each , define
With the above notation in hand, we are now ready to introduce the concept of a feature-aware vertex nomination scheme (often abbreviated VN scheme in the sequel). In the definition to follow, represents the set of vertices of interest in . These are usually assumed to be in , and the goal of a VN scheme is to have concentrate at the top of the produced rank list in .
Definition 9 (Feature-aware VN Scheme).
Let and , be given. Let be an obfuscating set of and of order , and let be given. For a set , let denote the set of all total orderings of the elements of . A feature-aware vertex nomination scheme (FA-VN scheme) on is a function
satisfying the consistency criteria in Assumption 10. We let denote the set of all such VN schemes.
The consistency criteria required of FA-VN schemes essentially forces the schemes to be agnostic to the labels in the obfuscated . To accomplish this, we define the following.
Assumption 10 (FA-VN Consistency criteria).
With notation as in Definition 9, for each and , define
to be the position of in the total ordering provided by . Further, define according to
For any , , obfuscating functions and any , we require
(1)
where denotes the -th element (i.e., the rank- vertex) in the ordering .
Figure 1 gives a simple illustrative example of this consistency criterion (i.e., Eq. 1) in action. Note here that if for all , then the consistency criterion forces
for any permutation and obfuscating function.
[Figure 1: (a) an internally consistent scheme; (b) an inconsistent scheme.]
In VN and other IR ranking problems, ties due to identical structure (here represented by f-isomorphisms in or ) cause theoretical complications. We refer the interested reader to [29] and [1] for examples of these complications and how they can be handled. In order to avoid the additional notational and definitional burdens required to deal with tie-breaking in these situations, we will make the following assumption on the distributions considered in .
Assumption 11.
Let and consider the events
satisfies .
This assumption is unrealistic if there are only a few categorical vertex features (for example, roles in a corporate hierarchy), but it is far less restrictive when there are a large number of available categorical features or the features are continuous. We stress that this assumption is made purely to ease the presentation of the theoretical material, and the practical impact of its violation is easily overcome.
2.3 Loss and Bayes Loss
A vertex nomination scheme is, essentially, a semi-supervised IR system for querying large networks. Similar to the recommender system framework [46], a VN scheme is judged to be successful if the top of the nomination list contains a high concentration of vertices of interest from the second network. This motivates the definition of VN loss based on the concept of precision-at-.
Definition 12.
Let be a VN scheme, an obfuscating set of and of order , and . Let be realized from
with a vertex of interest set . For , we define
-
(i)
For realized as , the level- nomination loss
-
(ii)
The level- error of is defined as
where and .
The level-k Bayes optimal scheme for is defined as any element
with corresponding Bayes error .
Remark 13.
Note that we could have also defined a recall-based loss function via
We focus on the more natural precision-based loss function, but we note in passing that consistency and Bayes optimality with respect to these two loss functions are equivalent when .
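As a concrete rendering of Definition 12, the following sketch (illustrative names) computes the level- nomination loss of a ranked list as one minus the precision at the given cutoff:

```python
def level_k_loss(ranking, interesting, k):
    """One minus the precision at k: the fraction of the top-k ranked
    vertices that are NOT vertices of interest."""
    hits = sum(v in interesting for v in ranking[:k])
    return 1.0 - hits / k

# Example: with vertices of interest {3, 7}, the ranking below places two
# of them in the top three, so the level-3 loss is 1 - 2/3.
print(level_k_loss([3, 1, 7, 2, 5], {3, 7}, k=3))
```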
3 Bayes Optimal Vertex Nomination Schemes
In [29], a Bayes optimal VN scheme (i.e., one that achieves optimal expected VN loss) was derived in the setting where one observes a network without features. In the feature-rich setting, derivations are similar, though they require a more careful technical treatment. After some preliminary work, this section culminates in the definition of the feature-aware Bayes optimal scheme in Section 3.2.
3.1 Obfuscating Features
We are now faced with the problem of modeling the effect of the obfuscating function on features under the VN framework. If we observe , then we have no knowledge of which member of
was obfuscated, but we do know what features are associated with each of the vertices and edges. In order to model this setting, we adopt the following conventions. Let and , be given. Furthermore, let the set of vertices of interest, , be fixed. Let be an obfuscating set of and of order , and . Define to be the set of asymmetric richly-featured graphs
Under Assumption 11, is supported on .
For each , define
Note that if the asymmetry of yields that
In light of the action of the obfuscating function on the features and vertex labels of , we view as the set of possible graph pairs that could have led to the observed graph pair .
For each and , we also define the following restriction:
and for define
3.2 Defining Bayes Optimality
We are now ready to define a Bayes optimal scheme for a given satisfying Assumption 11. We will define the scheme element-wise on each asymmetric , and then systematically lift the scheme to all of . To wit, let
form a partition of . To ease notation, we adopt the following shorthand.
-
1.
We use to denote the event .
-
2.
For , we use to denote the event
We will use this often with .
-
3.
We use to denote the event
We will use this often with .
Let denote the set of indices such that . That is, and are asymmetric as richly-featured networks. For each , writing for to ease notation, define
where we break ties in an arbitrary but fixed manner. For each element
choose the f-isomorphism such that , and define
For , any fixed and arbitrary definition of (subject to the consistency criterion in Definition 9) on will suffice, as this set has measure zero under by Assumption 11. Theorem 14 shows that defined above is indeed level- Bayes optimal for all and for any nominatable distribution satisfying Assumption 11. A proof is given in Appendix B.1.
Theorem 14.
Let satisfy Assumption 11. Then the scheme defined above is level- Bayes optimal for all .
For ease of reference in the sections to come, Table 1 summarizes the notation used in the paper so far.
Symbol | Description | Definition
---|---|---
- | For , this denotes | -
- | For a set , this represents the set | -
- | Denotes a discrete set of vertex features | -
- | Denotes a discrete set of edge features | -
- | The set , where is a special symbol representing unavailable or missing data | Def. 1
- | For , the set of all labeled, undirected graphs on vertices | -
- | For , the set of richly-featured networks of order with vertex (resp., edge) features in (resp., ) | Def. 1
- | For a graph , denotes the set of vertices of | -
- | For a graph , denotes the set of edges of | -
- | The vector of all ’s | -
- | This denotes the -th row of a matrix | -
- | For a set , this denotes the submatrix of with rows indexed by | -
- | If satisfy , then is isomorphic to | -
- | If satisfy , then is feature-preserving isomorphic to | Def. 8
- | A nomination scheme | Def. 9
- | The set of all feature-aware VN schemes | Def. 9
- | Level- error of a VN scheme | Def. 12
4 Feature-Oblivious Vertex Nomination
It is intuitively clear that incorporating features should improve VN performance, provided those features are correlated with vertex “interestingness”. Indeed, this is a common theme across many graph-based machine learning tasks (see, for example, [64, 5, 34]), and the same holds in the present VN setting. The combination of network structure and informative features can significantly improve the VN Bayes error. Consider, for instance, the following simple example set in the context of the stochastic blockmodel [18].
Definition 15.
An undirected -vertex graph is an instantiation of a Stochastic Blockmodel with parameters , abbreviated , if:
-
1.
The vertex set is partitioned into communities ;
-
2.
The community membership function agrees with the partition of the vertices, so that for each ;
-
3.
is a matrix of probabilities: for each , and the collection of random variables are mutually independent.
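The following sketch (illustrative names, assuming numpy) samples an adjacency matrix from the blockmodel of Definition 15 and may help fix ideas:

```python
import numpy as np

def sample_sbm(block_sizes, B, rng=None):
    """Sample an undirected, hollow SBM adjacency matrix: vertices are
    partitioned into blocks of the given sizes, and each pair of vertices
    is joined independently with the probability indexed by their blocks."""
    rng = np.random.default_rng(rng)
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = np.asarray(B)[np.ix_(labels, labels)]   # per-pair edge probabilities
    U = rng.random(P.shape)
    A = np.triu(U < P, k=1).astype(int)         # independent draws, strict upper triangle
    return A + A.T, labels
```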
Example 16.
Let be independent -vertex random graphs with
where are fixed (i.e., do not vary with ). Edges in both and are independent and the probability of an edge between vertices is equal to
Take with corresponding vertex of interest with . In the absence of features, , owing to the fact that vertices in the same community are stochastically identical. If the graphs are endowed with one-dimensional edge features ,
we see that a ranking scheme that ignores network structure can do no better than randomly ranking the vertices with feature , and thus has a loss . In contrast, if one considers the richly attributed graphs and , the Bayes optimal loss is improved to for all , as the network topology and vertex features offer complementary information.
Can Bayes optimal performance in VN ever be improved by ignoring features? To answer this question, we must first define a scheme that both ignores features and satisfies the consistency criteria in Definition 9, and it is not immediately obvious how to do so.
Firstly, we must define what it means to ignore features in a richly-featured network. Toward this end, consider the following definition, which gives a procedure for mapping a richly-featured network to a simple graph (i.e., the network stripped of its feature information).
Definition 17.
Let . For , we define to be the graph compatible with the edge features in . That is, , and is such that for all and for all
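A minimal sketch of the map in Definition 17 (illustrative names, reusing the earlier convention that None marks unavailable edge features):

```python
import numpy as np

def strip_features(edge_feats, n):
    """Recover the plain graph compatible with the edge features: an edge
    is present exactly where the edge feature is not the unavailable
    symbol (None in this sketch)."""
    A = np.zeros((n, n), dtype=int)
    for (i, j), feat in edge_feats.items():
        if feat is not None:
            A[i, j] = A[j, i] = 1
    return A
```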
We stress that a priori, it is not necessarily clear that a VN scheme exists that simultaneously ignores features and satisfies the consistency requirements of Definition 9. We illustrate this point with a brief example. A scheme that ignores features must rank vertices identically regardless of features, so long as the features are compatible in the sense of the mapping just defined. More formally, it must be that for all and all features
with and ,
(2) |
Now consider and that are asymmetric as attributed networks (i.e., have no non-trivial f-automorphisms), but for which the raw graphs and have non-trivial automorphism groups. By assumption, there exists a permutation such that
Then the consistency criterion in Definition 9 requires
while Equation (2) requires contradicting .
For Equation (2) to hold in general, we must consider a consistency criterion analogous to Equation (1) that is compatible with schemes that ignore the vertex features. The following definition, adapted from [1], suffices.
Definition 18 (Feature-Oblivious VN Scheme).
As with the FA-VN consistency criteria, we require FO-VN schemes to be label-agnostic with respect to the obfuscated labels of the second graph.
Assumption 19 (FO-VN Consistency Criteria).
With notation as in Definition 18, for each and , let
For any , , obfuscating functions and any , we require
(3)
where denotes the -th element in the ordering (i.e., the rank- vertex).
The criterion in Equation (3) is less restrictive than that in Equation (1), and it is not immediate that incorporating features yields an FA-VN scheme with smaller loss than the Bayes optimal FO-VN scheme. We illustrate this in the following example.
Example 20.
Let be a distribution such that and , where $K_n$ denotes the complete graph on $n$ vertices. If the f-automorphism group of is a.s. trivial, then any given FA-VN scheme can be outperformed by a well-chosen FO-VN scheme. Indeed, if there is a single vertex of interest in with corresponding vertex in , then there exists an FO-VN scheme that satisfies for almost all . Such a cannot satisfy Equation (1), and it is possible to have for FA-VN Bayes optimal .
However, consider distributions satisfying the following assumption.
Assumption 21.
Let and consider the events
satisfies .
Under Assumption 21, we have that , and the consistency criteria in Assumptions 10 and 19 are almost surely equivalent. It is then immediate that Bayes optimality cannot be improved by ignoring features. That is, an FO-VN scheme is almost surely an FA-VN scheme, and hence for all . This leads us to ask whether we can establish conditions under which ignoring features strictly decreases VN performance.
4.1 Feature oblivious Bayes optimality
We first establish the notion of a Bayes optimal FO-VN scheme for distributions satisfying Assumption 21. Defining
let be such that partitions . For supported on , it follows from [1] that a Bayes optimal FO-VN scheme, , can be constructed as follows. If , and is any featured extension of (note that this notation will be implicit below), we sequentially define (breaking ties in a fixed but arbitrary manner and writing for to ease notation)
where and (that is, the graphs without their features) are defined analogously to the featured and respectively:
For each choose the f-isomorphism such that , and define
For elements , any fixed and arbitrary definition of satisfying Equation (3) suffices (as this set has measure under by Assumption 21). Note that is almost surely well-defined, as the definition of on
is independent of the choice of the partition .
4.2 The Benefit of Features
With the FO-VN scheme defined, we seek to understand when, for distributions supported on , we have where and are the Bayes optimal FA-VN and FO-VN schemes, respectively, under . Toward this end, we first define the following -valued random variable.
Definition 22.
Let , let be a given set of vertices of interest, and let be an obfuscating set of and of order with obfuscating function . Let be a VN scheme (either feature-aware or feature-oblivious), and define, letting be our sample space, the -valued random variable
by . For each , define , where we define to be the set of all -tuples of distinct elements of (each such tuple can be viewed as specifying a total ordering of distinct elements of ).
Remark 23.
Note that in the setting of continuous features, the measurability of is not immediate (and indeed, is non-trivial to establish); this technical hurdle is the main impetus for discretizing the feature space.
We can now characterize the conditions under which incorporating features strictly improves VN performance. A proof of Theorem 24 can be found in Appendix B.2.
Theorem 24.
Consider the setup and notation of Definition 22 and suppose that satisfies Assumption 21. Letting and be Bayes optimal FA-VN and FO-VN schemes, respectively, under , we have that if and only if there exists a Bayes optimal FA-VN scheme with
where is the mutual information and the statistical entropy defined by
where we have written as shorthand for .
We note that, since , we can restate the result of Theorem 24 as if and only if for all Bayes optimal FA-VN schemes ,
Stated succinctly, there is excess uncertainty in after observing ; and is not deterministic given .
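Theorem 24 is stated in terms of the mutual information and entropy of discrete random variables. The following toy sketch (illustrative names) computes these quantities from a joint probability mass function and checks the strict inequality characterizing when features help; it illustrates the information-theoretic condition numerically rather than implementing the theorem itself.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint pmf given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

# Y leaves residual uncertainty about X exactly when I(X;Y) < H(X):
joint = np.array([[0.25, 0.15],
                  [0.10, 0.50]])
print(mutual_information(joint) < entropy(joint.sum(axis=1)))  # True here
```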
5 Network-Oblivious Vertex Nomination
In contrast to the feature-oblivious VN schemes considered in Section 4, one can also consider VN schemes that use only features and ignore network structure. Defining such a network-oblivious VN scheme (NO-VN scheme) is not immediately straightforward. Ideally, we would like to have that for all and all edge features , compatible with and respectively,
(4) |
for any choice of vertex features . As in the FO-VN scheme setting, this leads to potential violation of the internal consistency criteria of Equation (1). Indeed, consider and with asymmetric graphs but with symmetries in (i.e., there exists a non-identity permutation matrix such that ). On such networks, Equations (1) and (4) cannot both hold simultaneously. Thus, we consider a relaxed consistency criterion as in Assumption 19. We first define
and make the following consistency assumption.
Assumption 25 (NO-VN Consistency Criteria).
For any , letting be an obfuscating set of and of order with , be the set of vertices of interest, and taking , if is a VN scheme satisfying this assumption, then
(5)
where denotes the -th element in the ordering (i.e., the rank- vertex under ).
A network-oblivious VN scheme is then a VN scheme as in Definition 9, where the consistency criterion of Equation (1) is replaced with that in Equation (5) and we further require Equation (4) to hold. As with FA-VN schemes, we consider distributions satisfying the following assumption.
Assumption 26.
Let and define the events and . is such that .
We note the parallel between this assumption and Assumption 21, while noting that the two assumptions concern permutations acting on markedly different objects (graphs, in the case of Assumption 21, and vertex-level features in the case of Assumption 26). Under Assumption 26, we have that , and the consistency criteria in Assumptions 10 and 25 are almost surely equivalent. As in Section 4, under this assumption, we have that Bayes optimality cannot be improved by ignoring the network. Indeed, one can show that an NO-VN scheme is almost surely an FA-VN scheme, and we are led once again to ask under what circumstances VN performance will be strictly worsened by ignoring the network (and, with it, the edge features). To this end, we wish to compare Bayes optimality of NO-VN with that of FA-VN.
5.1 Network-oblivious Bayes optimality
We first establish the notion of a Bayes optimal NO-VN scheme for distributions satisfying Assumption 26. Define
and for , define
For a given satisfying Assumption 26, we will define the Bayes optimal NO-VN scheme, , element-wise on vertex feature matrices with no row repetitions (similar to in Section 3.2), and then lift the scheme to all richly-featured graphs with vertex features not in . For with and having distinct rows, let and be the unique graphs with edge structure compatible with and respectively. Writing and , we define, writing for to ease notation
where we write in the conditioning statement as shorthand for (writing and )
Note that once again, ties in the maximizations when constructing are assumed to be broken in an arbitrary but nonrandom manner. For each element
choose the f-isomorphism such that , and define
For elements and arbitrary edge features , any fixed and arbitrary definition of on (well-defined) graphs in suffices, subject to the internal consistency criterion in Equation (5), as this set has measure zero under Assumption 26.
5.2 The benefit of network topology
Once again, for distributions satisfying Assumption 26, our aim is to understand when . That is, when does incorporating the network topology into the vertex-level features strictly improve VN performance? Theorem 27 characterizes these conditions. The proof is completely analogous to the proof of Theorem 24, and is included in Appendix B.3 for completeness.
Theorem 27.
Let be a richly-featured nominatable distribution satisfying Assumption 26. Let be a given set of vertices of interest and let be an obfuscating set of and of order with . Let and be Bayes optimal FA-VN and NO-VN schemes, respectively, under . Then if and only if there exists a Bayes optimal FA-VN scheme with
Note that, since , Theorem 27 can be restated as (i.e., incorporating network structure improves performance) if and only if all Bayes optimal FA-VN schemes satisfy . Said yet another way, incorporating network structure improves VN performance if and only if there is excess uncertainty in conditional on the features . This is precisely the setting in which the network structure is informative: the FA-VN scheme incorporates both network and feature information into its ranking, while the NO-VN scheme incorporates only the feature information carried by .
6 Simulations and Experiments
We turn now to a brief experimental exploration of the VN problem as applied to both simulated and real data. We consider a VN scheme based on spectral clustering, which we denote . We refer the reader to [1] for details and a further exploration of this scheme in an adversarial version of vertex nomination without node or edge features.
In our experiments, edge features will appear as edge weights or edge directions, while vertex features will take the form of feature matrices and , following the notation of previous sections. The scheme proceeds as follows (a code sketch follows the list). Note that we have assumed for simplicity, but the procedure can be extended to pairs of differently-sized networks in a straightforward manner.
-
i.
Preprocess the (possibly weighted) adjacency matrices by passing edge weights to ranks and applying diagonal augmentation; see Appendix A.1 for details.
-
ii.
Embed the two networks into a common Euclidean space, using Adjacency Spectral Embedding [54]; see Appendix A.2 for details.
The embedding dimension is chosen by estimating the elbow in the scree plots of the adjacency matrices of the networks and [66], taking to be the larger of the two elbows.
Applying ASE to an -vertex graph results in a mapping of the vertices in the graph to points in . We denote the embeddings of graphs and by , respectively, with the -th row of each of these matrices corresponding to the embedding of the -th vertex in its corresponding network.
-
iii.
Given seed vertices (see Appendix A.3) whose correspondence is known a priori across networks, solve the orthogonal Procrustes problem [17] (see Appendix A.4) to align the rows of and . Apply this Procrustes rotation to the rows of , yielding . If -dimensional vertex features are available, append the vertex features to the embeddings as and .
-
iv.
Cluster the rows of both and using a Gaussian mixture modeling-based clustering procedure; see, for example, the mclust package in R [15].
For each vertex , let and be the mean and covariance of the normal mixture component containing . For each , compute the distances
-
v.
Rank the unseeded vertices in so that the vertex minimizing is ranked first, with ties broken in an arbitrary but fixed manner.
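The following is a minimal end-to-end sketch of steps (ii)-(v) in Python (assuming numpy, scipy, and scikit-learn; function and parameter names are ours, not part of the scheme's specification). It omits the preprocessing of step (i) and the optional appending of vertex features, assumes equal-sized networks, and, as a simplification of steps (iv)-(v), ranks all second-network vertices by Mahalanobis distance to the mixture component containing the vertex of interest.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from sklearn.mixture import GaussianMixture

def ase(A, d):
    # Adjacency spectral embedding (Appendix A.2): scale the eigenvectors of
    # the top-d eigenvalues (by magnitude) by the root-eigenvalues.
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def vn_nominate(A1, A2, seeds1, seeds2, voi, d=5, n_clusters=5):
    X1, X2 = ase(A1, d), ase(A2, d)
    # Step (iii): Procrustes-align the seed embeddings; rotate all of X2.
    Q, _ = orthogonal_procrustes(X2[seeds2], X1[seeds1])
    X2 = X2 @ Q
    # Step (iv): Gaussian mixture clustering of the pooled embeddings.
    gmm = GaussianMixture(n_components=n_clusters, random_state=0)
    gmm.fit(np.vstack([X1, X2]))
    comp = gmm.predict(X1[voi].reshape(1, -1))[0]
    mu, prec = gmm.means_[comp], np.linalg.inv(gmm.covariances_[comp])
    # Step (v): rank second-network vertices by distance to that component.
    diffs = X2 - mu
    dists = np.einsum('ij,jk,ik->i', diffs, prec, diffs)
    return np.argsort(dists)  # ranked vertex indices in the second graph
```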
Below, we apply this VN scheme in an illustrative simulation and in two real data network settings derived from neuroscience and text-mining applications.
6.1 Synthetic data
To further explore the complementary roles of network structure and features in vertex nomination, we consider the following simulation, set in the context of the stochastic blockmodel [18], as described in Definition 15. We consider independent of with , ,
and , where denotes the -by- matrix of all ones. We designate block 1 as the anomalous block, containing the vertices of interest across the two networks, with the signal in the anomalous block 1 dampened in compared to owing to the convex combination of and the “flat” matrix . We will consider the vertices of interest to be all such that . We select 10 vertices at random from block 1 in and from block 1 in to serve as “seeded” vertices, meaning vertices whose correspondences are known ahead of time.
We consider vertex features of the form (letting denote the -by- identity matrix)
independently over all and generating and independently of one another. Note that when applying our scheme to the above data, we set the number of blocks to be the “true” , with the number of clusters in step (iv) set to 5 as well. In practice, there are numerous principled heuristics to select this dimension parameter (e.g., USVT or finding an elbow in the scree plot [7, 66]) and the number of clusters (e.g., optimizing silhouette width or minimizing BIC [15]). We do not pursue these model selection problems further here.
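To fix ideas, the sketch below generates such features under one plausible reading of the display above, which we stress is an assumption made purely for illustration: Gaussian vertex features with identity covariance, whose mean is shifted coordinate-wise by the feature-signal parameter (called delta here) for block-1 vertices and is zero otherwise.

```python
import numpy as np

def sample_features(labels, delta, dim=5, rng=None):
    """ASSUMED feature model (illustration only): block-1 vertices
    (label 0) receive Gaussian features with mean delta in every
    coordinate; all other vertices have mean zero; identity covariance."""
    rng = np.random.default_rng(rng)
    means = np.where((labels == 0)[:, None], delta, 0.0)
    return means + rng.standard_normal((labels.size, dim))
```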
The effects of and are as follows. Larger values of provide more separation between the blocks in the underlying SBM, making it easier to distinguish the block of interest from the rest of the network. This is demonstrated in Figure 2, where we vary with held fixed. The figure shows, for different values of , the gain in precision at achieved by incorporating the graph topology as compared to a nomination scheme based on features alone. That is, defining
-
•
to be the number of vertices of interest in nominated in the top by applied to , and
-
•
to be the number of vertices of interest in nominated in the top by applied to , that is, step (iv) of the algorithm above applied only to the vertex features,
That is, letting be any of our nomination schemes under consideration (e.g., the features-only scheme) and letting denote the vertices of interest in , we evaluate performance according to the number of vertices ranked in the top
We note that we do not consider seeded vertices in our ranked list, so the maximum value achievable by either or is 40.
Figure 2 plots for . Results are averaged over 100 Monte Carlo replicates of the experiment, with error bars indicating two standard errors of the mean. Examining the figure, we see the expected phenomenon: as increases, the gain in VN precision from incorporating the network increases. For small values of , the graphs are detrimental to performance when compared to using features alone, since the structures of and are such that it is difficult to distinguish the communities from one another (and to distinguish the interesting community from the rest of the network). As increases, the community structure in networks and becomes easier to detect, and incorporating network structure into the VN procedure becomes beneficial to performance as compared to a procedure using only vertex features.
[Figure 2: gain in precision at k from incorporating network structure, as the network signal parameter varies.]
While controls the strength of the signal present in the network, controls the signal present in the features, with larger values of allowing stronger delineation of the block of interest from the rest of the graph based on features alone. To demonstrate this, we consider the same experiment as that summarized in Figure 2, but this time fixing and varying . The results are summarized in Figure 3, where we plot over where is the number of vertices of interest in nominated in the top by applied to (i.e., ignoring vertex features). As with Figure 2, we see that as increases, the gain in VN performance from incorporating vertex features increases. For small values of , features are slightly detrimental to performance, again owing to the fact that there is insufficient signal present in them to differentiate the vertices of interest from the rest of the network.
In each of Figures 2 and 3, using one of the two available data modalities (networks or features) gives performance that, while significantly better than chance, is suboptimal. These experiments suggest that combining informative network structure with informative features should yield better VN performance than utilizing either source in isolation.
[Figure 3: gain in precision at k from incorporating vertex features, as the feature signal parameter varies.]
6.2 C. Elegans
We next consider a real data example derived from the C. elegans connectome, as presented in [61, 59]. In this data, vertices correspond to neurons in the C. elegans brain, with edges encoding which pairs of neurons form synapses. The data capture the connectivity among the 302 labeled neurons in the hermaphroditic C. elegans brain for two different synapse types: electrical gap junctions and chemical synapses. These two synaptic types yield two distinct connectomes (i.e., brain networks) capturing the two different kinds of interactions between neurons. After preprocessing the data, including removing neurons that are isolates in either connectome, symmetrizing the directed chemical connectome, and removing self-loops (see [8] for details), we obtain two weighted networks on a shared vertex set: , capturing the chemical synapses, and , capturing the electrical gap junction synapses. The graphs are further endowed with vertex labels (i.e., vertex features), which assign each vertex (i.e., neuron) to one of three neuronal types: sensory neurons, motor neurons, or interneurons.
[Figure 4: per-trial differences in VN performance on the C. elegans connectomes; left: (network + features) minus (network only); right: (network + features) minus (features only).]
Each of the 253 neurons in has a known true corresponding neuron in . Thus, there is a sensible ground truth for the vertex nomination problem across and , in the sense that each vertex in has one and only one corresponding vertex in . As such, this data provides a natural setting for evaluating vertex nomination performance. We thus consider the following experiment: a vertex in is chosen uniformly at random and designated as the vertex of interest. An additional 20 vertices are sampled to serve as seeded vertices for the Procrustes alignment step, and the nomination scheme is applied as outlined previously. Performance is measured by computing the number of vertices of interest whose corresponding match is ranked in the top , according to
where denotes the vertex in corresponding to . We denote by , , and the performance of VN applied to, respectively, the network only, the features only, and both the network and features jointly. Figure 4 summarizes the results of 100 independent Monte Carlo trials of this experiment. Each curve in the figure corresponds to one trial. In each trial, we compared the performance of VN based on both network structure and vertex features against using either only network structure (i.e., feature-oblivious VN) or only vertex features (i.e., network-oblivious VN). The left panel of Figure 4 shows VN performance based on both network structure and neuronal type features, which we append onto and in step (iii) above, minus the performance of the scheme using the graph alone (i.e., ). Similarly, in the right panel of Figure 4, we consider VN performance based on the graph with the neuronal type features minus the performance of the scheme using only neuronal features (i.e., ). Within each plot, we have selected one line (i.e., one trial) to highlight, corresponding to a trial with a comparatively “good” seed set. Note that the same trial (and thus the same seed set) is highlighted in both panels. Performance is also summarized in Table 2.
Mean difference (increasing values of , left to right) | | | | | | | | |
---|---|---|---|---|---|---|---|---|
(network + features) minus (features only) | 1.73 | 4.76 | 7.60 | 7.83 | 8.12 | 7.25 | 6.14 | -1.29 |
(network + features) minus (network only) | 1.53 | 7.94 | 15.55 | 22.31 | 28.30 | 34.42 | 41.35 | 73.43 |
Using only the neuronal features for vertex nomination amounts to considering a coarse clustering of the neurons into the three neuronal types. As such, recovering correspondences across the two networks based only on this feature information is effectively at chance, conditioned on the neuronal type. When is approximately , the graph is effectively providing only enough information to coarsely cluster the vertices into their neuronal types. Examining Figure 4 and Table 2, it is clear that incorporating features adds significant signal compared to only considering network structure. Indeed, is uniformly positive.
Interestingly, the right-hand panel suggests that adding the network topology improves performance compared to a scheme that uses only features. Of course, in general we expect network structure to add significant signal to the features, but this observation is surprising in the present setting. In this data set, it is known that the network topology differs dramatically across the two synapse types. For example, has more than three times the edges of . As a result, it is notoriously difficult to discover the vertex correspondence across this pair of networks using only topology. Indeed, state-of-the-art network alignment algorithms only recover approximately of the correspondences correctly, even using a priori known seeded vertices [41]. It is thus not immediate that there is sufficient signal in the networks to identify individual neurons across networks beyond their vertex type. While the features add significant signal to the network, the graph also adds signal to the features. For small , which are typically the most important in vertex nomination problems, is positive on average, and for well-chosen seed sets (see the black lines in the figure), this difference can be dramatic.
6.3 Wikipedia data
As another illustration of vertex nomination with features on real data, we consider a pair of networks derived from Wikipedia articles in English and French. We begin with a network whose vertices correspond to English language Wikipedia articles and whose edges join pairs of articles reachable one from another via a hyperlink. The English language network corresponds to the articles within two hops of the article titled “Algebraic Geometry”, with the articles grouped into 6 “types” according to their broad topics; see [31] for a detailed description. We then consider a paired network of vertices corresponding to French language Wikipedia articles on the same topics, with correspondence across these networks encoding whether or not one article’s title is an exact or approximate translation of the other. The hyperlink structure among the articles within each language yields a natural network structure, and the semantic content of the pages, as encoded via a bag-of-words representation, provides a natural choice of vertex features. As in [51], we consider capturing both the network and semantic feature information within each network via dissimilarity matrices. We use shortest path dissimilarity in the hyperlink graph and cosine dissimilarity between extracted text features. This procedure yields four dissimilarity matrices, two for each of English and French Wikipedia. We then embed the pages according to these dissimilarity measures using canonical multidimensional scaling; see [6] for details. This yields four embeddings, corresponding to each pairing of language (English or French) and structure (network or semantic features).
In order to disentangle the information contained in the network structure and the features, we consider the following experiment. Using randomly chosen “seeded” vertices across the networks (recall that seeded vertices are those whose correspondence is known a priori across networks), we align the embeddings using orthogonal Procrustes alignment [17]. We then cluster the combined point clouds as in step (iv) of Section 6 and nominate across the datasets using step (v) of Section 6. We use six clusters in mclust to reflect the six different broad article types in the Wikipedia data. We next use the JOFC algorithm of [44, 30] to jointly embed the English dissimilarities and (separately) jointly embed the French dissimilarities. Within each language, we then average across the embedded dissimilarities, and repeat the above procedure: Procrustes alignment using randomly chosen seeded vertices across the networks, followed by clustering and nomination according to steps (iv) and (v) outlined at the beginning of Section 6. This procedure was repeated for 25 Monte Carlo replicates. We note here that while the embedding procedure differs from adjacency spectral embedding, the core nomination strategy post-embedding is unchanged from that presented at the start of Section 6.
As in the C. elegans example, we consider each of the articles as vertices of interest separately. For each vertex in , we consider the rank of its true corresponding match in , and record how many vertices have their true corresponding matches ranked in the top , for varying values of . Figure 5 summarizes how including both network structure and vertex features improves performance, as measured by the fraction of vertices whose true matches are ranked in the top . In the left (respectively, right) panel of Figure 5, we plot the performance when nominating across the jointly embedded graphs as compared with the performance when nominating across only the embedded graph dissimilarities (respectively, text feature dissimilarities). Each light-colored curve in the figure corresponds to one of the 25 Monte Carlo replicates, with the dark curve representing the average across all 25 replicates. In both settings, we see the overall positive effect of using both network and features when nominating across languages. However, from the figure it is clear that there is an asymmetry in information content across network and features, in that the network topology contributes markedly less to the total performance gain than the textual feature information. Moreover, while the inclusion of text features is nearly uniformly helpful as compared with the network alone (left panel), a poor choice of seed vertices may result in a situation wherein network information actually impedes performance compared to using text-derived features alone. This can be seen in the light blue curves below zero in the right-hand panel of Figure 5.
[Figure 5: fraction of Wikipedia articles whose true matches rank in the top k; left: joint embedding versus network dissimilarities alone; right: joint embedding versus text-feature dissimilarities alone.]
7 Discussion
It is intuitively clear that informative features and network topology will together yield better performance in most network inference tasks compared to using either mode in isolation. Indeed, in the context of vertex nomination, this has been established empirically across a host of application areas [10, 32]. However, examples abound where the underlying network does not offer additional information for subsequent information retrieval, and may even be detrimental; see, for example, [45]. In this paper, we have provided the first (to our knowledge) theoretical exploration of the dual roles of network and features, and we provide necessary and sufficient conditions under which VN performance can be improved by incorporating both network structure and features. Along the way, we have formulated a framework for vertex nomination in richly-featured networks, and derived the analogue of Bayes optimality in this framework. We view this work as an initial step toward a more comprehensive understanding of the benefits of incorporating features into network data and of complementing classical data with network structure. A core goal of future work is to extend the framework presented here to incorporate continuous features; establish theoretical results supporting our empirical findings on the utility of combining features and network; understand the role of missing or noisily observed features; and develop a framework for adversarial attack analysis in this richly-featured setting akin to that in [1].
Acknowledgments This material is based on research sponsored by the Air Force Research Laboratory and DARPA under agreement numbers FA8750-18-2-0035 and FA8750-20-2-1001, and by NSF grants DMS-1646108 and DMS-2052918. This work is also supported in part by the D3M program of the Defense Advanced Research Projects Agency. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory and DARPA or the U.S. Government. The authors also gratefully acknowledge the support of NIH grant BRAIN U01-NS108637. KL acknowledges the support of the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.
Appendix A Algorithmic primitives
Here, we provide background information and technical details related to the algorithmic primitives involved in the scheme described in Section 6.
A.1 Passing to ranks and diagonal augmentation
Consider a weighted adjacency matrix , and let be the vector of edge weights of . Note that we are agnostic here to the dimension of , which will vary according to whether is symmetric, hollow, etc. Define by taking to be the rank of in the weight vector , with ties broken by averaging ranks. By the pass-to-ranks operation, we mean replacing the edge weights in with the vector ; that is, we replace the weighted edges of by their ranks. Note that if is binary, the pass-to-ranks operation simply returns unchanged.
By diagonal augmentation we mean setting
for each . In experiments, we find that these preprocessing steps are essential for robust and reliable performance on real network data [58].
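A sketch of both preprocessing steps (assuming numpy and scipy; the rank scaling and the degree-based diagonal convention are common choices consistent with the description above, not necessarily the exact ones used in our experiments):

```python
import numpy as np
from scipy.stats import rankdata

def pass_to_ranks(A):
    """Replace each nonzero edge weight of a symmetric, hollow weighted
    adjacency matrix with its (scaled) rank among all edge weights,
    averaging ranks to break ties; a binary matrix is returned unchanged."""
    A = A.astype(float).copy()
    r, c = np.triu_indices_from(A, k=1)
    w = A[r, c]
    nz = w != 0
    # rankdata averages tied ranks; the scaling maps an all-ties
    # (i.e., binary) weight vector back to all ones.
    w[nz] = 2.0 * rankdata(w[nz]) / (nz.sum() + 1)
    A[r, c] = w
    A[c, r] = w  # restore symmetry
    return A

def diag_augment(A):
    """Diagonal augmentation, here using the common convention of setting
    each diagonal entry to the vertex's degree divided by n - 1."""
    A = A.astype(float).copy()
    n = A.shape[0]
    np.fill_diagonal(A, (A.sum(axis=1) - np.diag(A)) / (n - 1))
    return A
```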
A.2 Adjacency Spectral Embedding
Given an undirected network with adjacency matrix , the -dimensional Adjacency Spectral Embedding (ASE) of yields a mapping of the vertices in the network to points in -dimensional space in such a way that vertices that play similar structural roles in the network are mapped to nearby points in [53].
Definition 28 (Adjacency spectral embedding).
Given , the adjacency spectral embedding (ASE) of into is defined by where
is the spectral decomposition of , is the diagonal matrix with the largest eigenvalues of on its diagonal and has columns which are the eigenvectors corresponding to the eigenvalues of . The -th row of corresponds to the position in -dimensional Euclidean space to which the -th vertex is mapped.
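The following numerical sanity check (a sketch) illustrates Definition 28: for an exactly rank-d input matrix, the d-dimensional ASE recovers the latent positions up to an orthogonal rotation, so the Gram matrix of the embedding reproduces the input.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.2, 0.8, size=(50, 2))   # latent positions
P = X @ X.T                               # exactly rank-2, symmetric
vals, vecs = np.linalg.eigh(P)
idx = np.argsort(vals)[::-1][:2]          # two largest eigenvalues
Xhat = vecs[:, idx] * np.sqrt(vals[idx])  # ASE of P, per Definition 28
print(np.allclose(Xhat @ Xhat.T, P))      # True: ASE recovers X up to rotation
```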
A.3 Seeds
In vertex nomination, vertices in the core are shared across the two networks, although the correspondence between and is unknown owing to the obfuscating function. In many applications, however, some of these correspondences may be known ahead of time. We refer to vertices in for which this correspondence is known as seeded vertices, and denote them by . Said another way, seeded vertices are vertices in whose labels are not obfuscated. In this case, the obfuscating function would take the form where
and is an obfuscating set of order satisfying for . Seeded vertices, and the information they provide, have proven to be valuable resources across both VN (e.g., [12, 28, 42]) and other network-alignment tasks (e.g., [13, 27, 35]).
A.4 Orthogonal Procrustes
The -dimensional adjacency spectral embedding of a network on vertices yields a collection of points in , one point for each vertex. A natural way to compare two networks on vertices is to compare the point clouds produced by their adjacency spectral embeddings; see, e.g., [56]. Approaches of this sort are especially natural in low-rank models, such as the random dot product graph [3, 48] and the stochastic block model. In such models, we can write the expectation of the adjacency matrix as for , and the adjacency spectral embedding of is a natural estimate of , up to orthogonal rotation. That is, for some unknown orthogonal , and are close. Non-identifiabilities of this sort are inherent to latent space network models, whereby transformations that preserve pairwise similarity of the latent positions lead to identical distributions over networks [50]. Owing to this non-identifiability, comparison of two networks via their adjacency spectral embeddings and requires accounting for this unknown rotation.
Given matrices , the orthogonal Procrustes problem seeks the orthogonal matrix that minimizes (where is the Frobenius norm). The problem is solved by computing the singular value decomposition , with the optimal then given by [49]. We note that the orthogonal Procrustes problem is just one of a number of related alignment problems for point clouds [17].
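A sketch of the SVD solution just described (illustrative names), together with a check that it recovers a planted rotation:

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal W minimizing ||A W - B||_F, via the SVD of A^T B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Recover a planted rotation: with B = A Q for orthogonal Q, the solution is Q.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
print(np.allclose(procrustes_rotation(A, A @ Q), Q))  # True
```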
Appendix B Proofs and supporting results
Below we provide proofs of our main theoretical results and supporting lemmas.
B.1 Proof of Theorem 14
Recall that is the set of indices such that and are asymmetric as richly-featured networks (i.e., for there are no non-trivial f-automorphisms of ).
To compare the VN loss of to that of an arbitrary VN scheme , we will proceed as follows. Let be fixed. With
define for each . Then we have that
Next, note that for each and ,
To ease notation in what follows, we define the following key term for the support of satisfying Assumption 11; i.e., on all ,
and note that, by definition of as the optimal nomination scheme, for any ,
Thus, for any FA-VN scheme , we have
from which we deduce that , completing the proof.
B.2 Proof of Theorem 24
Suppose that , whence and thus for each with it holds for all that
For each , let denote the unique element in the support of . With this notation in hand, we define the FO-VN scheme as follows. For and , take
where satisfies
-
i.
;
-
ii.
is ordered lexicographically according to some predefined total ordering of .
Then is an FO-VN scheme by construction, and
from which , and it follows that , as desired.
To prove the other half of the theorem, we proceed as follows. The assumption that implies that (with notation as in Section B.1),
and therefore, since
for all , we conclude that
Therefore, there exists a tie-breaking scheme in the definition of that yields
for all , and hence
We therefore have that . Since is a constant given , we have , and therefore , completing the proof.
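Although the notation above is specific to the paper's constructions, the tie-breaking device, which recurs in the proof of Theorem 27 below, is simple to state operationally: rank vertices by the relevant score and break ties using a fixed total order on labels. A minimal Python sketch, assuming scores are supplied as a dictionary keyed by vertex label:

def rank_with_lexicographic_ties(scores):
    # Rank vertices by descending score, breaking ties via the
    # lexicographic order on vertex labels (a fixed total order).
    return sorted(scores, key=lambda v: (-scores[v], v))

# For example, rank_with_lexicographic_ties({"a": 0.3, "b": 0.7, "c": 0.3})
# returns ["b", "a", "c"]; "a" precedes "c" only by the tie-break.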
B.3 Proof of Theorem 27
We assume throughout that satisfies Assumption 26. Suppose , implying that . For each satisfying and each , we have
For each , let be the unique element in the support of . We define the NO-VN scheme as follows. For and , we take
where satisfies
i. ;
ii. is ordered lexicographically according to some predefined total ordering of .
is an NO-VN scheme by construction, and
from which , and we conclude that .
To prove the other half of the theorem, we note that implies that (with notation as in Section B.1),
since
for all , we conclude that
Thus, there exists a tie-breaking scheme in the definition of such that
for all , and hence
We then have that . Since is a constant given , we have , whence we conclude that , which completes the proof.
B.4 Supporting lemmas
The following lemma follows from our assumption of asymmetry.
Lemma 29.
If for , then
Proof.
By the assumption that , we have that , and there exists an isomorphism such that . From our assumption that and the consistency criteria in Definition 9,
A similar argument shows that . We then have that
as we set out to show. ∎
Lemma 30.
Let be a Bayes optimal VN scheme, and let be any other VN scheme. For any ,
Proof.
If there exists an such that , the result follows from the definition of . Consider then
and let be such that . We have that
where the inequality follows from the optimality of . ∎
References
- [1] J. Agterberg, Y. Park, J. Larson, C. White, C. E. Priebe, and V. Lyzinski. Vertex nomination, consistent estimation, and adversarial modification. Electronic Journal of Statistics, 14(2):3230–3267, 2020.
- [2] J. Arroyo, D. L. Sussman, C. E. Priebe, and V. Lyzinski. Maximum likelihood estimation and graph matching in errorfully observed networks. Journal of Computational and Graphical Statistics, 30(4):1111–1123, 2021.
- [3] A. Athreya, D. E. Fishkind, K. Levin, V. Lyzinski, Y. Park, Y. Qin, D. L. Sussman, M. Tang, J. T. Vogelstein, and C. E. Priebe. Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226):1–92, 2018.
- [4] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences of the United States of America, 106:21068–21073, 2009.
- [5] N. Binkiewicz, J. T. Vogelstein, and K. Rohe. Covariate-assisted spectral clustering. Biometrika, 104(2):361–377, 2017.
- [6] I. Borg and P. J. F. Groenen. Modern multidimensional scaling: Theory and applications. Springer Science & Business Media, 2005.
- [7] S. Chatterjee. Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43(1):177–214, 2015.
- [8] L. Chen, J. T. Vogelstein, V. Lyzinski, and C. E. Priebe. A joint graph inference case study: the C. elegans chemical and electrical connectomes. Worm, 5(2):e1142041, 2016.
- [9] G. Coppersmith. Vertex nomination. Wiley Interdisciplinary Reviews: Computational Statistics, 6(2):144–153, 2014.
- [10] G. A. Coppersmith and C. E. Priebe. Vertex nomination via content and context. arXiv preprint arXiv:1201.4118, 2012.
- [11] H. Crane. Probabilistic foundations of statistical network analysis. Chapman and Hall/CRC, 2018.
- [12] D. E. Fishkind, V. Lyzinski, H. Pao, L. Chen, and C. E. Priebe. Vertex nomination schemes for membership prediction. The Annals of Applied Statistics, 9(3):1510–1532, 2015.
- [13] D. E. Fishkind, S. Adali, H. G. Patsolic, L. Meng, D. Singh, V. Lyzinski, and C. E. Priebe. Seeded graph matching. Pattern Recognition, 87:203–215, 2019.
- [14] B. K. Fosdick and P. D. Hoff. Testing and modeling dependencies between a network and nodal attributes. Journal of the American Statistical Association, 110(511):1047–1056, 2015.
- [15] C. Fraley and A. E. Raftery. Mclust: Software for model-based cluster analysis. Journal of Classification, 16(2):297–306, 1999.
- [16] C. E. Ginestet, J. Li, P. Balachandran, S. Rosenberg, and E. D. Kolaczyk. Hypothesis testing for network data in functional neuroimaging. The Annals of Applied Statistics, 11(2):725–750, 2017.
- [17] J. C. Gower and G. B. Dijksterhuis. Procrustes Problems. Oxford University Press, 2004.
- [18] P. W. Holland, K. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
- [19] Z. Huang, W. Chung, and H. Chen. A graph model for e-commerce recommender systems. Journal of the American Society for Information Science and Technology, 55(3):259–274, 2004.
- [20] B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83:016107, 2011.
- [21] M. Kim and J. Leskovec. Multiplicative attribute graph model of real-world networks. Internet Mathematics, 8(1-2):113–160, 2012.
- [22] E. D. Kolaczyk. Statistical analysis of network data: Methods and models. Springer-Verlag New York, 2009.
- [23] E. D. Kolaczyk and G. Csárdi. Statistical analysis of network data with R, volume 65. Springer, 2014.
- [24] J. Lei. A goodness-of-fit test for stochastic block models. The Annals of Statistics, 44(1):401–424, 2016.
- [25] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
- [26] V. Lyzinski. Information recovery in shuffled graphs via graph matching. IEEE Transactions on Information Theory, 64(5):3254–3273, 2018.
- [27] V. Lyzinski, S. Adali, J. T. Vogelstein, Y. Park, and C. E. Priebe. Seeded graph matching via joint optimization of fidelity and commensurability. arXiv preprint arXiv:1401.3813, 2014.
- [28] V. Lyzinski, K. Levin, D. E. Fishkind, and C. E. Priebe. On the consistency of the likelihood maximization vertex nomination scheme: Bridging the gap between maximum likelihood estimation and graph matching. Journal of Machine Learning Research, 17(179):1–34, 2016.
- [29] V. Lyzinski, K. Levin, and C. E. Priebe. On consistent vertex nomination schemes. Journal of Machine Learning Research, 20(69):1–39, 2019.
- [30] V. Lyzinski, Y. Park, C. E. Priebe, and M. Trosset. Fast embedding for JOFC using the raw stress criterion. Journal of Computational and Graphical Statistics, 26(4):786–802, 2017.
- [31] Z. Ma, D. J. Marchette, and C. E. Priebe. Fusion and inference from multiple data sources in a commensurate space. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(3):187–193, 2012.
- [32] D. Marchette, C. E. Priebe, and G. Coppersmith. Vertex nomination via attributed random dot product graphs. In Proceedings of the 57th ISI World Statistics Congress, volume 6, 2011.
- [33] F. W. Marrs, B. K. Fosdick, and T. H. McCormick. Standard errors for regression on relational data with exchangeable errors. arXiv preprint arXiv:1701.05530, 2017.
- [34] Z. Meng, S. Liang, H. Bao, and X. Zhang. Co-embedding attributed networks. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 393–401. ACM, 2019.
- [35] E. Mossel and J. Xu. Seeded graph matching via large neighborhood statistics. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1005–1014. SIAM, 2019.
- [36] M. E. J. Newman. Networks. Oxford University Press, 2nd edition, 2018.
- [37] M. E. J. Newman and A. Clauset. Structure and inference in annotated networks. Nature Communications, 7:11863, 2016.
- [38] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2015.
- [39] M. Nickel, L. Rosasco, and T. Poggio. Holographic embeddings of knowledge graphs. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
- [40] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
- [41] H. Patsolic, S. Adali, J. T. Vogelstein, Y. Park, C. E. Priebe, G. Li, and V. Lyzinski. Seeded graph matching via joint optimization of fidelity and commensurability. arXiv preprint arXiv:1401.3813, 2014.
- [42] H. G. Patsolic, Y. Park, V. Lyzinski, and C. E. Priebe. Vertex nomination via seeded graph matching. Statistical Analysis and Data Mining: the ASA Data Science Journal, 13(3):229–244, 2020.
- [43] J. J. Pfeiffer III, S. Moreno, T. La Fond, J. Neville, and B. Gallagher. Attributed graph models: Modeling network structure with correlated attributes. In Proceedings of the 23rd International Conference on World Wide Web, pages 831–842. ACM, 2014.
- [44] C. E. Priebe, D. J. Marchette, Z. Ma, and S. Adali. Manifold matching: Joint optimization of fidelity and commensurability. Brazilian Journal of Probability and Statistics, 27(3):377–400, 2013.
- [45] P. Rastogi, V. Lyzinski, and B. Van Durme. Vertex nomination on the cold start knowledge graph. Technical report, Human Language Technology Center of Excellence, 2017.
- [46] F. Ricci, L. Rokach, and B. Shapira. Introduction to recommender systems handbook. In Recommender systems handbook, pages 1–35. Springer, 2011.
- [47] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Annals of Statistics, 39:1878–1915, 2011.
- [48] P. Rubin-Delanchy, J. Cape, M. Tang, and C. E. Priebe. A statistical interpretation of spectral embedding: the generalised random dot product graph. Journal of the Royal Statistical Society: Series B, 84(4):1446–1473, 2022.
- [49] P. H. Schönemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.
- [50] C. R. Shalizi and D. Asta. Consistency of maximum likelihood for continuous-space network models. arXiv preprint arXiv:1711.02123, 2017.
- [51] C. Shen, J. T. Vogelstein, and C. E. Priebe. Manifold matching using shortest-path distance and joint neighborhood selection. Pattern Recognition Letters, 92:41–48, 2017.
- [52] T. A. B. Snijders, J. Koskinen, and M. Schweinberger. Maximum likelihood estimation for social network dynamics. The Annals of Applied Statistics, 4(2):567, 2010.
- [53] D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe. A consistent adjacency spectral embedding for stochastic blockmodel graphs. Journal of the American Statistical Association, 107(499):1119–1128, 2012.
- [54] D. L. Sussman, M. Tang, and C. E. Priebe. Consistent latent position estimation and vertex classification for random dot product graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1):48–57, 2014.
- [55] S. Suwan, D. S. Lee, and C. E. Priebe. Bayesian vertex nomination using content and context. Wiley Interdisciplinary Reviews: Computational Statistics, 7(6):400–416, 2015.
- [56] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe. A semiparametric two-sample hypothesis testing problem for random graphs. Journal of Computational and Graphical Statistics, 26(2):344–354, 2017.
- [57] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, and C. E. Priebe. A nonparametric two-sample hypothesis testing problem for random dot product graphs. Bernoulli, 23(3):1599–1630, 2017.
- [58] R. Tang, M. Ketcha, A. Badea, E. D. Calabrese, D. S. Margulies, J. T. Vogelstein, C. E. Priebe, and D. L. Sussman. Connectome smoothing via low-rank approximations. IEEE Transactions on Medical Imaging, 38(6):1446–1456, 2018.
- [59] L. R. Varshney, B. L. Chen, E. Paniagua, D. H. Hall, and D. B. Chklovskii. Structural properties of the Caenorhabditis elegans neuronal network. PLoS Computational Biology, 7(2):e1001066, 2011.
- [60] J. T. Vogelstein and C. E. Priebe. Shuffled graph classification: Theory and connectome applications. Journal of Classification, 32(1):3–20, 2015.
- [61] J. G. White, E. Southgate, J. N. Thomson, and S. Brenner. The structure of the nervous system of the nematode Caenorhabditis elegans. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 314(1165):1–340, 1986.
- [62] J. Yang, J. McAuley, and J. Leskovec. Community detection in networks with node attributes. In 2013 IEEE 13th International Conference on Data Mining, pages 1151–1156. IEEE, 2013.
- [63] J. Yoder, L. Chen, H. Pao, E. Bridgeford, K. Levin, D. E. Fishkind, C. E. Priebe, and V. Lyzinski. Vertex nomination: The canonical sampling and the extended spectral nomination schemes. Computational Statistics & Data Analysis, 145, 2020.
- [64] S. Zhang and H. Tong. Final: Fast attributed network alignment. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1345–1354. ACM, 2016.
- [65] F. Zhou and F. De la Torre. Factorized graph matching. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 127–134. IEEE, 2012.
- [66] M. Zhu and A. Ghodsi. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2):918–930, 2006.