
Vertex Nomination in Richly Attributed Networks

Keith Levin, Department of Statistics, University of Wisconsin-Madison; Carey E. Priebe; Vince Lyzinski, Department of Mathematics, University of Maryland-College Park
Abstract

Vertex nomination is a lightly-supervised network information retrieval task in which vertices of interest in one graph are used to query a second graph to discover vertices of interest in the second graph. Similar to other information retrieval tasks, the output of a vertex nomination scheme is a ranked list of the vertices in the second graph, with the heretofore unknown vertices of interest ideally concentrating at the top of the list. Vertex nomination schemes provide a useful suite of tools for efficiently mining complex networks for pertinent information. In this paper, we explore, both theoretically and practically, the dual roles of content (i.e., edge and vertex attributes) and context (i.e., network topology) in vertex nomination. We provide necessary and sufficient conditions under which vertex nomination schemes that leverage both content and context outperform schemes that leverage only content or context separately. While the joint utility of both content and context has been demonstrated empirically in the literature, the framework presented in this paper provides a novel theoretical basis for understanding the potential complementary roles of network features and topology.

1 Introduction

Network data has become ubiquitous in the sciences, owing to the generality and flexibility of networks in modeling relations among entities. Networks appear in such varied fields as neuroscience, genomics, the social sciences, economics and ecology, to name just a few (see, for example, [36]). As such, statistical analysis of network data has emerged as an important field within modern statistics [22, 23, 11]. Many classical statistical inference tasks, such as hypothesis testing [56, 57, 24, 16], regression [14, 33], and maximum likelihood estimation [4, 52, 2] have been adapted to network data. Inference tasks that are specific to network data, such as link-prediction [25], community detection [37, 47, 53], and vertex nomination [32, 9, 55, 12] have also seen increasing popularity in recent years. Among these network-specific tasks is the vertex nomination (VN) problem, in which the goal is to identify vertices similar to one or more vertices specified as being of interest to a practitioner. The VN task is similar in spirit to popular network-based information retrieval (IR) procedures such as PageRank [40] and personalized recommender systems on graphs [19]. In VN, the goal is as follows: Given vertices of interest in a graph G1G_{1}, produce a ranked list of the vertices in a second graph G2G_{2} according to how likely they are judged to be interesting. Ideally, interesting vertices in G2G_{2} should concentrate at the top of the ranked list. As an inference task, this formulation of VN is distinguished from other supervised network IR tasks by the generality of what may define vertices as interesting and the limited available training data in G1G_{1}. In contrast to typical IR problems, there is little or no training data available in the VN problem.

The vertex nomination problem was first introduced as a task involving only a single graph, and vertices of interest were modeled as belonging to a single community of vertices [9, 12, 28, 63]. The information provided by vertices with known community memberships, called seed vertices, was leveraged to rank vertices with unknown membership, with both network topology and available vertex features being used to produce ranking schemes [32, 10, 55]. This single-graph, community-based definition of the problem is somewhat limited in its ability to capture network models beyond the stochastic blockmodel [18]. Subsequent work lifted the problem to the two-network setting considered here [42], allowing a generalization both of what defines interesting vertices and of the network models that can be considered [42, 29, 1].

In many settings, observed networks are endowed with features at the vertex and/or edge level. For example, in social networks, vertices typically correspond to users for whom we have demographic information, and edges correspond to different types of social relations. The theoretical advances in both the single- and multiple-graph VN problem recounted above were established in the context of networks where no such features are available. It is natural, then, to seek to better understand the effect of network attributes on the theoretical VN framework developed in [29] and [1]. Motivated by this, in the present work we develop VN on richly-featured networks, and we explore how the incorporation of this information impacts the concepts of Bayes optimality and consistency for the VN problem. Furthermore, in Sections 4 and 5, adopting an information-theoretic perspective, we give the first steps toward a theoretical understanding (which is borne out in subsequent experiments) of the potential benefit of VN schemata that use both content and context versus those that use content or context alone.

The remainder of the paper is laid out as follows. In Section 2, we outline the extension of the vertex nomination framework to the attributed network setting, defining richly-featured graphs in Section 2.1, VN schema in Section 2.2, and VN performance measures in Section 2.3. In Section 3, we derive the Bayes optimal VN scheme in the setting of richly-featured networks, and in Sections 4 and 5 we compare VN performance in the richly-featured setting to that in the settings where feature information or network information, respectively, is missing. Experiments further illustrating the practical implications of our theoretical results are presented in Section 6.

2 Vertex Nomination with Features

In the initial works introducing vertex nomination, where the defining trait of interesting vertices was membership in a community of interest, graph models with latent community structure (e.g., the stochastic blockmodel of [18, 20]) were sensible models for the underlying network structure. The need for a more general notion of what renders a vertex interesting necessitated more nuanced models, culminating in the Nominatable Distribution network model introduced in [29]. We take this model as our starting point, and extend it by endowing it with both edge and vertex features.

2.1 Richly Featured Networks

We begin by defining the class of networks with vertex and edge features, which we call richly-featured networks. We note here that there is a large literature on inference within attributed networks, with graphs endowed with features arising in settings such as social network analysis [21, 62] and knowledge representation [38, 39], among others.

Definition 1.

Let 𝒱\mathcal{V} and \mathcal{E} be discrete sets of possible vertex and edge features, respectively. A richly-featured network 𝐠{\bf g} indexed by (n,d1,d2,𝒱,)(n,d_{1},d_{2},\mathcal{V},\mathcal{E}) is an ordered tuple 𝐠=(g,𝐱,𝐰){\bf g}=(g,{\bf x},{\bf w}) where

  • i.

    g=(V,E)𝒢ng=(V,E)\in\mathcal{G}_{n} is a labeled, undirected graph on nn vertices. The vertices of gg will be denoted via either V={v1,v2,,vn}V=\{v_{1},v_{2},\cdots,v_{n}\} or V={u1,u2,,un}V=\{u_{1},u_{2},\cdots,u_{n}\}.

  • ii.

    𝐱𝒱n×d1{\bf x}\in\mathcal{V}^{n\times d_{1}} denotes the matrix of d1d_{1}-dimensional vertex features, so that 𝐱[v,:]{\bf x}[v,:] is the vector of features associated with vertex vv.

  • iii.

    Let ~={}\widetilde{\mathcal{E}}=\mathcal{E}\cup\{\star\}, where we use \star as a special symbol representing unavailable data. Letting N=(n2)N=\binom{n}{2}, 𝐰~N×d2{\bf w}\in\widetilde{\mathcal{E}}^{N\times d_{2}} denotes the matrix of d2d_{2}-dimensional edge features. Indexing (V2)\binom{V}{2} lexicographically, for e(V2)e\in\binom{V}{2}, we write 𝐰[e,:]{\bf w}[e,:] for the vector of features associated with edge ee. The form of 𝐰{\bf w} is then

    𝐰=(𝐰[{v1,v2},:]𝐰[{v1,v3},:]𝐰[{v1,v4},:]𝐰[{vn1,vn},:]).{\bf w}=\begin{pmatrix}{\bf w}[\{v_{1},v_{2}\},:]\\ {\bf w}[\{v_{1},v_{3}\},:]\\ {\bf w}[\{v_{1},v_{4}\},:]\\ \vdots\\ {\bf w}[\{v_{n-1},v_{n}\},:]\end{pmatrix}.

    We further require that 𝐰[e,:]=(,,,)~d2{\bf w}[e,:]=(\star,\star,\cdots,\star)\in\widetilde{\mathcal{E}}^{d_{2}} for all eEe\notin E, and 𝐰[e,:]d2{\bf w}[e,:]\in\mathcal{E}^{d_{2}} for all eEe\in E.

We use 𝒢n,𝒱,d1,d2\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},d_{2}} to denote the set of all richly-featured networks indexed by (n,d1,d2,𝒱,)(n,d_{1},d_{2},\mathcal{V},\mathcal{E}).

Let e(V2)e\in\binom{V}{2}. In the definition of richly-featured networks, for eEe\notin E, we interpret the edge features 𝐰[e,:]{\bf w}[e,:] as unavailable data. This is a sensible assumption in practice, and is commonly made in attributed network models; see, for example, [43, 65]. We note that the structure of 𝐰{\bf w} encodes the edge structure of gg, but we choose to keep the redundant information in Definition 1, as gg encodes the purely topological structure of the graph, absent any edge- or vertex-level features. This fact will prove useful below, when we seek to investigate the role of graph structure in the absence of features and vice versa.

Remark 2.

We use discrete vertex and edge feature sets in Definition 1, as this is both rich enough to model many real-world networks (where features often encode types or characteristics of nodes or edges, and edge weights often derive from count data) and amenable to the theoretical derivations in vertex nomination. Considering continuous features poses no practical difficulty, but it does raise subtle measure-theoretic complications in the theory to follow. See Remark 23 for further discussion.

Example 3.

Consider the graph 𝐠𝒢4,𝒱,d1,d2{\bf g}\in\mathcal{G}_{4,\mathcal{V},\mathcal{E}}^{\,d_{1},d_{2}} with gg given by

g=(Vg,Eg)=({1,2,3,4},{{1,2},{1,3},{1,4},{3,4}}).g=(V_{g},E_{g})=\left(\{1,2,3,4\},\{\ \{1,2\},\{1,3\},\{1,4\},\{3,4\}\ \}\right).

The edge features for this network would then be of the form

𝐰=(𝐰[{v1,v2},:]𝐰[{v1,v3},:]𝐰[{v1,v4},:]𝐰[{v2,v3},:]𝐰[{v2,v4},:]𝐰[{v3,v4},:])=(𝐰[{v1,v2},:]d2𝐰[{v1,v3},:]d2𝐰[{v1,v4},:]d2,,,,,,𝐰[{v3,v4},:]d2).{\bf w}=\begin{pmatrix}{\bf w}[\{v_{1},v_{2}\},:]\\ {\bf w}[\{v_{1},v_{3}\},:]\\ {\bf w}[\{v_{1},v_{4}\},:]\\ {\bf w}[\{v_{2},v_{3}\},:]\\ {\bf w}[\{v_{2},v_{4}\},:]\\ {\bf w}[\{v_{3},v_{4}\},:]\end{pmatrix}=\begin{pmatrix}{\bf w}[\{v_{1},v_{2}\},:]\in\mathcal{E}^{d_{2}}\\ {\bf w}[\{v_{1},v_{3}\},:]\in\mathcal{E}^{d_{2}}\\ {\bf w}[\{v_{1},v_{4}\},:]\in\mathcal{E}^{d_{2}}\\ \star,\star,\cdots,\star\\ \star,\star,\cdots,\star\\ {\bf w}[\{v_{3},v_{4}\},:]\in\mathcal{E}^{d_{2}}\end{pmatrix}.
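To make Definition 1 and Example 3 concrete, the following sketch (in Python; all names and the placeholder feature values are our own, and the string "*" stands in for the special symbol ⋆) builds the edge-feature matrix of Example 3, with the all-⋆ rows marking non-edges.

```python
from itertools import combinations

STAR = "*"  # stands in for the special symbol marking unavailable data

n, d1, d2 = 4, 2, 3
vertices = [1, 2, 3, 4]
edges = {frozenset(e) for e in [(1, 2), (1, 3), (1, 4), (3, 4)]}

# Vertex feature matrix x (n rows, d1 columns); values are placeholders.
x = [[f"v{v}_feat{j}" for j in range(d1)] for v in vertices]

# Edge feature matrix w with N = C(n,2) rows, indexed lexicographically.
# Rows for non-edges are the all-star row, as required by Definition 1.
pairs = list(combinations(vertices, 2))  # (1,2),(1,3),(1,4),(2,3),(2,4),(3,4)
w = [
    [f"e{u}{v}_feat{j}" for j in range(d2)] if frozenset((u, v)) in edges
    else [STAR] * d2
    for (u, v) in pairs
]

for pair, row in zip(pairs, w):
    print(pair, row)
```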
Remark 4.

Let (n,d1,d2)(n,d_{1},d_{2}) be an ordered tuple of nonnegative integers, and let 𝒱\mathcal{V} and \mathcal{E} be discrete sets of vertex and edge features, respectively. In the definitions and exposition that follow, we consider 𝒢n,𝒱,d1,d2\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{d_{1},d_{2}}-valued random variables. Implicitly, we mean the following: letting (Ω,,)(\Omega,\mathcal{F},\mathbb{P}) be a given probability space, (G,𝐗,𝐖):Ω𝒢n,𝒱,d1,d2(G,{\bf X},{\bf W}):\Omega\mapsto\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{d_{1},d_{2}} is a 𝒢n,𝒱,d1,d2\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{d_{1},d_{2}}-valued random variable if it is (,Gnd1d2)(\mathcal{F},\mathcal{F}_{G_{n}}\otimes\mathcal{F}_{d_{1}}\otimes\mathcal{F}_{d_{2}}^{*})-measurable, where Gn\mathcal{F}_{G_{n}} is the total sigma field on 𝒢n\mathcal{G}_{n}, d1\mathcal{F}_{d_{1}} is the total sigma field on 𝒱n×d1\mathcal{V}^{n\times d_{1}}, and d2\mathcal{F}_{d_{2}}^{*} is the total sigma field on ~N×d2\widetilde{\mathcal{E}}^{N\times d_{2}}.

With Definition 1 in hand, lifting the definition of Nominatable Distributions first introduced in [29] to the attributed graph setting is relatively straightforward.

Definition 5.

Given n,m>0n,m\in\mathbb{Z}_{>0} and sets of discrete vertex and edge features 𝒱\mathcal{V} and \mathcal{E}, respectively, the set of Richly Featured Nominatable Distributions of order (n,m)(n,m) with feature sets 𝒱\mathcal{V} and \mathcal{E}, which we denote 𝒱,(n,m)\mathcal{F}_{\mathcal{V},\mathcal{E}}^{(n,m)}, is the collection of all families of distributions of the form

𝐅(n,m)={\displaystyle\mathbf{F}^{(n,m)}=\Big{\{} Fc,θ,(d1,e1),(d2,e2)(n,m):(c,θ,(d1,e1),(d2,e2))0×d(n,m)×>02×>02,\displaystyle F^{(n,m)}_{c,\theta,(d_{1},e_{1}),(d_{2},e_{2})}\ :\ \big{(}c,\theta,(d_{1},e_{1}),(d_{2},e_{2})\big{)}\in\mathbb{Z}_{\geq 0}\times\mathbb{R}^{d(n,m)}\times\mathbb{Z}^{2}_{>0}\times\mathbb{Z}^{2}_{>0},
 and 0cmin(n,m)},\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}\text{ and }0\leq c\leq\min(n,m)\Big{\}},

where Fc,θ,(d1,e1),(d2,e2)(n,m)F^{(n,m)}_{c,\theta,(d_{1},e_{1}),(d_{2},e_{2})} is a distribution on 𝒢n×𝒱n×d1×~N×e1×𝒢m×𝒱m×d2×~M×e2\mathcal{G}_{n}\times\mathcal{V}^{n\times d_{1}}\times\widetilde{\mathcal{E}}^{N\times e_{1}}\times\mathcal{G}_{m}\times\mathcal{V}^{m\times d_{2}}\times\widetilde{\mathcal{E}}^{M\times e_{2}} (recalling that N=(n2),M=(m2)N=\binom{n}{2},M=\binom{m}{2}), parameterized by θd(n,m)\theta\in\mathbb{R}^{d(n,m)} and satisfying the following conditions:

  1. 1.

    The vertex sets V1={v1,v2,,vn}V_{1}=\{v_{1},v_{2},...,v_{n}\} and V2={u1,u2,,um}V_{2}=\{u_{1},u_{2},...,u_{m}\} satisfy vi=uiv_{i}=u_{i} for 1ic1\leq i\leq c. We refer to C={v1,v2,,vc}={u1,u2,,uc}C=\{v_{1},v_{2},...,v_{c}\}=\{u_{1},u_{2},...,u_{c}\} as the core vertices. These are the vertices that are shared across the two graphs and imbue the model with a natural vertex correspondence.

  2. 2.

    Vertices in J1=V1CJ_{1}=V_{1}\setminus C and J2=V2CJ_{2}=V_{2}\setminus C satisfy J1J2=J_{1}\cap J_{2}=\emptyset. We refer to J1J_{1} and J2J_{2} as junk vertices. These are the vertices in each graph that have no corresponding vertex in the other graph.

  3. 3.

    If (G1,𝐗,𝐖,G2,𝐘,𝐙)(G_{1},{\bf X},{\bf W},G_{2},{\bf Y},{\bf Z}) is distributed according to Fc,θ,(d1,e1),(d2,e2)(n,m)F^{(n,m)}_{c,\theta,(d_{1},e_{1}),(d_{2},e_{2})}, then (G1,𝐗,𝐖)(G_{1},{\bf X},{\bf W}) is a 𝒢n,𝒱,d1,e1\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},e_{1}}-valued random variable and (G2,𝐘,𝐙)(G_{2},{\bf Y},{\bf Z}) is a 𝒢m,𝒱,d2,e2\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}}-valued random variable. The edge features 𝐖~N×e1{\bf W}\in\widetilde{\mathcal{E}}^{N\times e_{1}} and 𝐙~M×e2{\bf Z}\in\widetilde{\mathcal{E}}^{M\times e_{2}} almost surely satisfy

    𝐖[e,:]={(,,,)~e1 if eE(G1);e1 if eE(G1);{\bf W}[e,:]=\begin{cases}(\star,\star,\cdots,\star)\in\widetilde{\mathcal{E}}^{e_{1}}&\text{ if }e\notin E(G_{1});\\ \in\mathcal{E}^{e_{1}}&\text{ if }e\in E(G_{1});\end{cases}

    and

    𝐙[e,:]={(,,,)~e2 if eE(G2);e2 if eE(G2).{\bf Z}[e,:]=\begin{cases}(\star,\star,\cdots,\star)\in\widetilde{\mathcal{E}}^{e_{2}}&\text{ if }e\notin E(G_{2});\\ \in\mathcal{E}^{e_{2}}&\text{ if }e\in E(G_{2}).\end{cases}
  4. 4.

    The richly-featured subgraphs induced by the junk vertices,

    (G1[J1],𝐗[J1,:],𝐖[(J12),:]) and (G2[J2],𝐘[J2,:],𝐙[(J22),:])\left(G_{1}[J_{1}],{\bf X}[J_{1},:],{\bf W}\left[\binom{J_{1}}{2},\,:\,\right]\right)\text{ and }\left(G_{2}[J_{2}],{\bf Y}[J_{2},:],{\bf Z}\left[\binom{J_{2}}{2},\,:\,\right]\right)

    are conditionally independent given θ\theta.

In Definition 5, the rows of 𝐗𝒱n×d1{\bf X}\in\mathcal{V}^{n\times d_{1}} are the vertex features of G1G_{1}, with 𝐗[i,:]{\bf X}[i,:] representing the feature associated with vertex ii in G1G_{1}. Similarly, the rows of 𝐘𝒱m×d2{\bf Y}\in\mathcal{V}^{m\times d_{2}} are the vertex features of G2G_{2}, with 𝐘[i,:]{\bf Y}[i,:] representing the vertex feature of vertex ii in G2G_{2}. We do not, a priori, assume that any vertex features are missing, although extending the definition to 𝒱~=𝒱{}\widetilde{\mathcal{V}}=\mathcal{V}\cup\{\star\} is straightforward. With this definition in place, we are ready to define feature-aware vertex nomination schemes.

Note: In order to ease notation below, we will write

Θ:=(c,θ,(d1,e1),(d2,e2)),\Theta:=(c,\theta,(d_{1},e_{1}),(d_{2},e_{2})),

and accordingly write FΘ(n,m)F^{(n,m)}_{\Theta} for Fc,θ,(d1,e1),(d2,e2)(n,m)F^{(n,m)}_{c,\theta,(d_{1},e_{1}),(d_{2},e_{2})}. In the sequel, we will assume that the feature sets \mathcal{E} and 𝒱\mathcal{V} are given, and satisfy ||=|𝒱|=|\mathcal{E}|=|\mathcal{V}|=\infty. We will suppress the dependence of the family of richly-featured nominatable distributions on the feature sets \mathcal{E} and 𝒱\mathcal{V}, writing (n,m)\mathcal{F}^{(n,m)} in place of 𝒱,(n,m)\mathcal{F}_{\mathcal{V},\mathcal{E}}^{(n,m)}.

2.2 Vertex Nomination Schemes

In vertex nomination, the labels of vertices in the second graph, 𝐠2{\bf g}_{2}, are assumed unknown a priori. In order to model this in our Richly Featured Nominatable Distribution framework, we introduce obfuscating functions as in [29]. Obfuscating functions serve to hide vertex labels, and can be interpreted as a non-probabilistic version of the vertex shuffling considered in [60] and [26].

Definition 6.

Consider graphs (𝐠1,𝐠2)𝒢n,𝒱,d1,e1×𝒢m,𝒱,d2,e2({\bf g}_{1},{\bf g}_{2})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{d_{1},e_{1}}\times\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{d_{2},e_{2}} with vertex sets V1V_{1} and V2V_{2}, respectively. An obfuscating set, HH, of V1V_{1} and V2V_{2} of order |V2|=m|V_{2}|=m is a set satisfying HVi=H\cap V_{i}=\emptyset for i=1,2,i=1,2, and |H|=|V2|=m|H|=|V_{2}|=m. Given obfuscating set HH, an obfuscating function 𝔬:V2H\mathfrak{o}:V_{2}\rightarrow H is a bijection from V2V_{2} to HH. We denote by 𝔒H\mathfrak{O}_{H} the set of all such obfuscating functions. For a richly-featured network 𝐠=(g,𝐱,𝐰)𝒢m,𝒱,d2,e2{\bf g}=(g,{\bf x},{\bf w})\in\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}}, we will write 𝔬(𝐠)=(𝔬(g),𝔬(𝐱),𝔬(𝐰))\mathfrak{o}({\bf g})=(\mathfrak{o}(g),\mathfrak{o}({\bf x}),\mathfrak{o}({\bf w})) where

  • i.

    𝔬(g)\mathfrak{o}(g) denotes the graph g=(Vg,Eg)g=(V_{g},E_{g}) with labels obfuscated by 𝔬\mathfrak{o}. That is, 𝔬(g)=(V𝔬(g),E𝔬(g))\mathfrak{o}(g)=(V_{\mathfrak{o}(g)},E_{\mathfrak{o}(g)}), where V𝔬(g)={𝔬(v):vVg}V_{\mathfrak{o}(g)}=\{\mathfrak{o}(v):v\in V_{g}\} and E𝔬(g)E_{\mathfrak{o}(g)} is such that {u,v}Eg\{u,v\}\in E_{g} if and only if {𝔬(u),𝔬(v)}E𝔬(g)\{\mathfrak{o}(u),\mathfrak{o}(v)\}\in E_{\mathfrak{o}(g)}.

  • ii.

    𝔬(𝐱)\mathfrak{o}({\bf x}) is the vertex feature matrix associated with 𝔬(g)\mathfrak{o}(g), so that for uHu\in H,

    (𝔬(𝐱))[u,:]=𝐱[𝔬1(u),:].(\mathfrak{o}({\bf x}))[u,:]={\bf x}[\mathfrak{o}^{-1}(u),:].
  • iii.

    𝔬(𝐰)\mathfrak{o}({\bf w}) is the edge feature matrix associated with 𝔬(g)\mathfrak{o}(g), so that for {v,u}(H2)\{v,u\}\in\binom{H}{2},

    (𝔬(𝐰))[{v,u},:]=𝐰[{𝔬1(v),𝔬1(u)},:].\left(\mathfrak{o}({\bf w})\right)[\{v,u\},:]={\bf w}\left[\left\{\mathfrak{o}^{-1}(v),\mathfrak{o}^{-1}(u)\right\},:\right].

Note that we will assume that HH is endowed with an arbitrary but fixed ordering, and that the edges of 𝔬(𝐰)\mathfrak{o}({\bf w}) are ordered lexicographically according to this ordering on HH. We do not necessarily assume that the ordering of HH is the ordering induced by V2V_{2}. That is, we do not necessarily assume that uvu\leq v implies 𝔬(u)𝔬(v)\mathfrak{o}(u)\leq\mathfrak{o}(v).

Relating this definition back to Definition 5, the purpose of the obfuscating function is to render the labels on the vertices in 𝐠2{\bf g}_{2} uninformative with respect to the correspondence between 𝐠1{\bf g}_{1} and 𝐠2{\bf g}_{2} encoded in the core vertices CC. Following this logic, it is sensible to require vertex nomination schemes (defined below) to be independent of vertex labels. Informally, if a set of vertices have identical features and edge structures among them, then their rankings in a VN scheme should be independent of the chosen obfuscating function 𝔬𝔒H\mathfrak{o}\in\mathfrak{O}_{H}. This is made precise in Definition 9 and Assumption 10 below, but requires some preliminary definitions.

Definition 7 (Action of a permutation on a richly-featured network).

Let 𝐠=(g,𝐱,𝐰)𝒢n,𝒱,d1,d2{\bf g}=(g,{\bf x},{\bf w})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},d_{2}} be a richly-featured network. A permutation σ:[n][n]\sigma:[n]\mapsto[n] acts on 𝐠{\bf g} to produce σ(𝐠)=(g,𝐱,𝐰)𝒢n,𝒱,d1,d2\sigma({\bf g})=(g^{\prime},{\bf x}^{\prime},{\bf w}^{\prime})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},d_{2}}, where

  • i.

    g=σ(g)g^{\prime}=\sigma(g) is the graph gg with its vertex labels permuted by σ\sigma.

  • ii.

    𝐱{\bf x}^{\prime} is the vertex feature matrix associated with gg^{\prime}, so that for v[n]v\in[n],

    𝐱[v,:]=𝐱[σ1(v),:].{\bf x}^{\prime}[v,:]={\bf x}[\sigma^{-1}(v),:].
  • iii.

    𝐰{\bf w}^{\prime} is the edge feature matrix associated with gg^{\prime}, so that for {u,v}([n]2)\{u,v\}\in\binom{[n]}{2},

    𝐰[{u,v},:]=𝐰[{σ1(u),σ1(v)},:].{\bf w}^{\prime}[\{u,v\},:]={\bf w}\left[\left\{\sigma^{-1}(u),\sigma^{-1}(v)\right\},:\right].
Definition 8 (feature-preserving automorphisms and isomorphisms).

We call a permutation σ\sigma a feature-preserving automorphism (abbreviated f-automorphism) of 𝐠{\bf g} if 𝐠=σ(𝐠){\bf g}=\sigma({\bf g}). Similarly, we call a permutation σ\sigma a feature-preserving isomorphism between 𝐠{\bf g} and 𝐠{\bf g}^{\prime} (abbreviated f-isomorphism) if 𝐠=σ(𝐠){\bf g}^{\prime}=\sigma({\bf g}).

Let 𝐠=(g,𝐱,𝐰)𝒢n,𝒱,d1,d2{\bf g}=(g,{\bf x},{\bf w})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},d_{2}} be a richly-featured network. For each uVgu\in V_{g}, define

(u;𝐠):={wVg s.t.  an f-automorphism σ of 𝐠, s.t. σ(u)=w}.\mathcal{I}(u;{\bf g}):=\{w\in V_{g}\text{ s.t. }\exists\text{ an f-automorphism }\sigma\text{ of }{\bf g},\text{ s.t. }\,\sigma(u)=w\}.
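For small networks, the orbit ℐ(u;𝐠) can be computed by brute force directly from Definitions 7 and 8. The sketch below (exponential in the number of vertices, so purely illustrative; all names are our own) checks every permutation against the edge-structure, vertex-feature, and edge-feature constraints.

```python
from itertools import combinations, permutations

def f_automorphism_orbits(n, edges, x, w):
    """Orbits I(u; g) under feature-preserving automorphisms of (g, x, w).

    Vertices are 0..n-1; edges is a set of frozensets; x[v] is the vertex
    feature tuple of v; w maps each edge (frozenset) to its feature tuple,
    with non-edges omitted (implicitly the all-star row)."""
    orbits = {u: {u} for u in range(n)}
    for sigma in permutations(range(n)):
        ok = True
        for u, v in combinations(range(n), 2):
            e, se = frozenset({u, v}), frozenset({sigma[u], sigma[v]})
            # sigma must preserve both the edge structure and the edge features
            if (e in edges) != (se in edges) or w.get(e) != w.get(se):
                ok = False
                break
        # sigma must also preserve vertex features: x[sigma(u)] == x[u]
        if ok and all(x[sigma[u]] == x[u] for u in range(n)):
            for u in range(n):
                orbits[u].add(sigma[u])
    return orbits

# A path 0-1-2 with identical features: the flip swapping 0 and 2 is an
# f-automorphism, so I(0) = I(2) = {0, 2} and I(1) = {1}.
edges = {frozenset({0, 1}), frozenset({1, 2})}
x = [("a",), ("a",), ("a",)]
w = {e: ("f",) for e in edges}
print(f_automorphism_orbits(3, edges, x, w))
```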

With the above notation in hand, we are now ready to introduce the concept of a feature-aware vertex nomination scheme (often abbreviated VN scheme in the sequel). In the definition to follow, VV^{*} represents the set of vertices of interest in 𝐠1{\bf g}_{1}. These are usually assumed to be in V1V2V_{1}\cap V_{2}, and the goal of a VN scheme is to have 𝔬(V)\mathfrak{o}(V^{*}) concentrate at the top of the produced rank list in 𝒯H\mathcal{T}_{H}.

Definition 9 (Feature-aware VN Scheme).

Let n,m,d1,e1,d2,e2>0n,m,d_{1},e_{1},d_{2},e_{2}\in\mathbb{Z}_{>0} and 𝒱\mathcal{V}, \mathcal{E} be given. Let HH be an obfuscating set of V1V_{1} and V2V_{2} of order |V2|=m|V_{2}|=m, and let 𝔬𝔒H\mathfrak{o}\in\mathfrak{O}_{H} be given. For a set AA, let 𝒯A{\mathcal{T}}_{A} denote the set of all total orderings of the elements of AA. A feature-aware vertex nomination scheme (FA-VN scheme) on 𝒢n,𝒱,d1,e1×𝔬(𝒢m,𝒱,d2,e2)\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},e_{1}}\times\mathfrak{o}(\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}}) is a function

Φ:𝒢n,𝒱,d1,e1×𝔬(𝒢m,𝒱,d2,e2)×2V1𝒯H\Phi:\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},e_{1}}\times\mathfrak{o}(\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}})\times 2^{V_{1}}\rightarrow{\mathcal{T}}_{H}

satisfying the consistency criteria in Assumption 10. We let 𝒩(n,m)=𝒩(d1,e1),(d2,e2)(n,m)\mathcal{N}^{(n,m)}=\mathcal{N}^{(n,m)}_{(d_{1},e_{1}),(d_{2},e_{2})} denote the set of all such VN schemes.

The consistency criteria required of FA-VN schemes essentially forces the schemes to be agnostic to the labels in the obfuscated 𝔬(𝐠2)\mathfrak{o}({\bf g}_{2}). To accomplish this, we define the following.

Assumption 10 (FA-VN Consistency criteria).

With notation as in Definition 9, for each uV2u\in V_{2} and VV1V^{*}\subseteq V_{1}, define

rankΦ(𝐠𝟏,𝔬(𝐠𝟐),V)(𝔬(u))\operatorname{rank}_{\Phi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*})}\big{(}\mathfrak{o}(u)\big{)}

to be the position of 𝔬(u)\mathfrak{o}(u) in the total ordering provided by Φ(𝐠𝟏,𝔬(𝐠𝟐),V)\Phi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*}). Further, define 𝔯Φ:𝒢nd1,e1×𝒢md2,e2×𝔒H×2V1×2V22[m]\mathfrak{r}_{\Phi}:\mathcal{G}_{n}^{\,d_{1},e_{1}}\times\mathcal{G}_{m}^{\,d_{2},e_{2}}\times\mathfrak{O}_{H}\times 2^{V_{1}}\times 2^{V_{2}}\mapsto 2^{[m]} according to

𝔯Φ(𝐠𝟏,𝐠𝟐,𝔬,V,S)={rankΦ(𝐠𝟏,𝔬(𝐠𝟐),V)(𝔬(u)) s.t. uS}.\mathfrak{r}_{\Phi}({\bf g_{1}},{\bf g_{2}},\mathfrak{o},V^{*},S)=\{\operatorname{rank}_{\Phi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*})}\big{(}\mathfrak{o}(u)\big{)}\text{ s.t. }u\in S\}.

For any 𝐠𝟏𝒢n,𝒱,d1,e1,{\bf g_{1}}\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},e_{1}}, 𝐠𝟐𝒢m,𝒱,d2,e2{\bf g_{2}}\in\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}}, VV1V^{*}\subset V_{1}, obfuscating functions 𝔬1,𝔬2𝔒H\mathfrak{o}_{1},\mathfrak{o}_{2}\in\mathfrak{O}_{H} and any uV(g2)u\in V(g_{2}), we require

𝔯Φ(𝐠𝟏,𝐠𝟐,𝔬1,V,(u;𝐠𝟐))=𝔯Φ(𝐠𝟏,𝐠𝟐,𝔬2,V,(u;𝐠𝟐))\displaystyle\mathfrak{r}_{\Phi}({\bf g_{1}},{\bf g_{2}},\mathfrak{o}_{1},V^{*},\mathcal{I}(u;{\bf g_{2}}))=\mathfrak{r}_{\Phi}\left({\bf g_{1}},{\bf g_{2}},\mathfrak{o}_{2},V^{*},\mathcal{I}(u;{\bf g_{2}})\right) (1)
𝔬2𝔬11((Φ(𝐠𝟏,𝔬1(𝐠𝟐),V)[k]);𝔬1(𝐠𝟐))=(Φ(𝐠𝟏,𝔬2(𝐠𝟐),V)[k];𝔬2(𝐠𝟐))\displaystyle\Leftrightarrow\mathfrak{o}_{2}\circ\mathfrak{o}_{1}^{-1}\big{(}\mathcal{I}(\Phi({\bf g_{1}},\mathfrak{o}_{1}({\bf g_{2}}),V^{*})[k]);\mathfrak{o}_{1}({\bf g_{2}})\big{)}=\mathcal{I}\left(\Phi({\bf g_{1}},\mathfrak{o}_{2}({\bf g_{2}}),V^{*})[k];\mathfrak{o}_{2}({\bf g_{2}})\right)
 for all k[m],\displaystyle\hskip 71.13188pt\text{ for all }k\in[m],

where Φ(𝐠𝟏,𝔬(𝐠𝟐),V)[k]\Phi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*})[k] denotes the kk-th element (i.e., the rank-kk vertex) in the ordering Φ(𝐠𝟏,𝔬(𝐠𝟐),V)\Phi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*}).

Figure 1 gives a simple illustrative example of this consistency criterion (i.e., Eq. 1) in action. Note here that if (u;𝐠2)={u}\mathcal{I}(u;{\bf g}_{2})=\{u\} for all uV2u\in V_{2}, then the consistency criterion forces

Φ(𝐠𝟏,σ(𝔬(𝐠𝟐)),V)=σ(Φ(𝐠𝟏,𝔬(𝐠𝟐),V))\Phi({\bf g_{1}},\sigma(\mathfrak{o}({\bf g_{2}})),V^{*})=\sigma(\Phi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*}))

for any permutation σ\sigma and obfuscating 𝔬𝔒H\mathfrak{o}\in\mathfrak{O}_{H}.

(a) Internally consistent scheme (b) Inconsistent scheme
Figure 1: Example of VN schemes that (a) satisfy and (b) do not satisfy the consistency criterion in Eq. 1. Both subplots (a) and (b) denote two different obfuscation functions applied to the same graph. Here, (1;G2)={1}\mathcal{I}(1;G_{2})=\{1\}, (2;G2)=(8;G2)={2,8}\mathcal{I}(2;G_{2})=\mathcal{I}(8;G_{2})=\{2,8\}, (3;G2)=(7;G2)={3,7}\mathcal{I}(3;G_{2})=\mathcal{I}(7;G_{2})=\{3,7\}, (4;G2)={4}\mathcal{I}(4;G_{2})=\{4\}, (5;G2)={5}\mathcal{I}(5;G_{2})=\{5\} and (6;G2)={6}\mathcal{I}(6;G_{2})=\{6\}. Under 𝔬1\mathfrak{o}_{1}, {2,8}\{2,8\} is mapped to {B,C}\{B,C\}, while under 𝔬2\mathfrak{o}_{2}, {2,8}\{2,8\} is mapped to {E,U}\{E,U\}. The consistency property in Definition 9 requires that the ranking of {2,8}\{2,8\} must be the same under 𝔬1\mathfrak{o}_{1} and 𝔬2\mathfrak{o}_{2}. That is, {B,C}\{B,C\} and {E,U}\{E,U\} should be ranked the same under 𝔬1\mathfrak{o}_{1} and 𝔬2\mathfrak{o}_{2}, respectively. This requirement is obeyed by the VN scheme illustrated in subplot (a). On the other hand, the scheme in subplot (b) violates the consistency property: it ranks {B,C}\{B,C\} as second and third under 𝔬1\mathfrak{o}_{1}, but ranks {E,U}\{E,U\} as second and fourth under 𝔬2\mathfrak{o}_{2}.

In VN and other IR ranking problems, ties due to identical structure (here represented by f-isomorphisms in 𝐠1{\bf g}_{1} or 𝐠2{\bf g}_{2}) cause theoretical complications. We refer the interested reader to [29] and [1] for examples of these complications and how they can be handled. In order to avoid the additional notational and definitional burdens required to deal with tie-breaking in these situations, we will make the following assumption on the distributions considered in (n,m)\mathcal{F}^{(n,m)}.

Assumption 11.

Let (𝐆1,𝐆2)FΘ(n,m)(n,m)({\bf G}_{1},{\bf G}_{2})\sim F^{(n,m)}_{\Theta}\in\mathcal{F}^{(n,m)} and consider the events

D1\displaystyle D_{1} ={ the only f-automorphism of 𝐆1 is σ=idn}\displaystyle=\{\text{ the only f-automorphism of }{\bf G}_{1}\text{ is }\sigma=\operatorname{id}_{n}\}
D2\displaystyle D_{2} ={ the only f-automorphism of 𝐆2 is σ=idm}.\displaystyle=\{\text{ the only f-automorphism of }{\bf G}_{2}\text{ is }\sigma=\operatorname{id}_{m}\}.

FΘ(n,m)F^{(n,m)}_{\Theta} satisfies FΘ(n,m)(D1)=FΘ(n,m)(D2)=1\mathbb{P}_{F^{(n,m)}_{\Theta}}(D_{1})=\mathbb{P}_{F^{(n,m)}_{\Theta}}(D_{2})=1.

This assumption is unrealistic if there are only a few categorical vertex features (for example, roles in a corporate hierarchy), but this assumption is less restrictive when there are a large number of available categorical features or the features are continuous. We stress that this assumption is made purely to ease the presentation of theoretical material, and the practical impact of this assumption being violated is easily overcome.

2.3 Loss and Bayes Loss

A vertex nomination scheme is, essentially, a semi-supervised IR system for querying large networks. Similar to the recommender system framework [46], a VN scheme is judged to be successful if the top of the nomination list contains a high concentration of vertices of interest from the second network. This motivates the definition of VN loss based on the concept of precision-at-kk.

Definition 12.

Let Φ𝒩(n,m)=𝒩(d1,e1),(d2,e2)(n,m)\Phi\in\mathcal{N}^{(n,m)}=\mathcal{N}^{(n,m)}_{(d_{1},e_{1}),(d_{2},e_{2})} be a VN scheme, HH an obfuscating set of V1V_{1} and V2V_{2} of order |V2|=m|V_{2}|=m, and 𝔬𝔒H\mathfrak{o}\in\mathfrak{O}_{H}. Let (𝐠𝟏,𝐠𝟐)({\bf g_{1}},{\bf g_{2}}) be realized from

(G1,𝐗,𝐖,G2,𝐘,𝐙)FΘ(n,m)(n,m),(G_{1},{\bf X},{\bf W},G_{2},{\bf Y},{\bf Z})\sim F^{(n,m)}_{\Theta}\in\mathcal{F}^{(n,m)},

with a vertex of interest set VC=V1V2V^{*}\subset C=V_{1}\cap V_{2}. For k[m1]k\in[m-1], we define

  • (i)

    For (𝐠1,𝐠2)({\bf g}_{1},{\bf g}_{2}) realized as (G1,𝐗,𝐖,G2,𝐘,𝐙)(G_{1},{\bf X},{\bf W},G_{2},{\bf Y},{\bf Z}), the level-kk nomination loss

    k(Φ,𝐠𝟏,𝔬(𝐠𝟐),V):\displaystyle\ell_{k}(\Phi,{\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*}): =11kvV𝟙{rankΦ(𝐠𝟏,𝔬(𝐠𝟐),V)(𝔬(v))k}\displaystyle=1-\frac{1}{k}\sum_{v\in V^{*}}\mathds{1}\{\operatorname{rank}_{\Phi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*})}(\mathfrak{o}(v))\leq k\}
  • (ii)

    The level-kk error of Φ\Phi is defined as

    Lk(Φ,V)\displaystyle L_{k}(\Phi,V^{*}) =Lk(Φ,V,𝔬):=𝔼(𝐆𝟏,𝐆𝟐)FΘ(n,m)[k(Φ,𝐆𝟏,𝔬(𝐆𝟐),V)]\displaystyle=L_{k}(\Phi,V^{*},\mathfrak{o}):=\mathbb{E}_{({\bf G_{1}},{\bf G_{2}})\sim F^{(n,m)}_{\Theta}}[\ell_{k}(\Phi,{\bf G_{1}},\mathfrak{o}({\bf G_{2}}),V^{*})]
    =11kvVFΘ(n,m)(rankΦ(𝐆𝟏,𝔬(𝐆𝟐),V)(𝔬(v))k),\displaystyle=1-\frac{1}{k}\sum_{v\in V^{*}}\mathbb{P}_{F^{(n,m)}_{\Theta}}\bigg{(}\operatorname{rank}_{\Phi({\bf G_{1}},\mathfrak{o}({\bf G_{2}}),V^{*})}(\mathfrak{o}(v))\leq k\bigg{)},

    where 𝐆𝟏=(G1,𝐗,𝐖){\bf G_{1}}=(G_{1},{\bf X},{\bf W}) and 𝐆𝟐=(G2,𝐘,𝐙){\bf G_{2}}=(G_{2},{\bf Y},{\bf Z}).

The level-k Bayes optimal scheme for FΘ(n,m)F^{(n,m)}_{\Theta} is defined as any element

Φk=Φk,VargminΦ𝒩(n,m)Lk(Φ,V),\Phi^{*}_{k}=\Phi^{*}_{k,V^{*}}\in\operatorname*{arg\,min}_{\Phi\in\mathcal{N}^{(n,m)}}L_{k}(\Phi,V^{*}),

with corresponding Bayes error LkL^{*}_{k}.
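The level-kk loss of Definition 12(i) is straightforward to compute from a nomination list. A minimal sketch, under the assumption (ours) that the ranking is available as a Python list of obfuscated labels, best-ranked first:

```python
def level_k_loss(ranked_list, interesting, k):
    """Level-k nomination loss from Definition 12(i):
    1 - (1/k) * #{vertices of interest ranked in the top k}.

    ranked_list: total ordering output by a VN scheme (obfuscated labels,
        best-ranked first);
    interesting: the set o(V*) of obfuscated vertices of interest;
    k: the level at which the loss is evaluated."""
    top_k = set(ranked_list[:k])
    hits = sum(1 for v in interesting if v in top_k)
    return 1.0 - hits / k

# With |V*| = 2 and both interesting vertices ranked in the top 3, the
# level-3 loss is 1 - 2/3.
print(level_k_loss(["u7", "u2", "u5", "u1"], {"u2", "u5"}, k=3))
```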

Remark 13.

Note that we could have also defined a recall-based loss function via

k(r)(Φ,𝐠𝟏,𝐠𝟐,V):=1|V|vV𝟙{rankΦ(𝐠𝟏,𝔬(𝐠𝟐),V)(𝔬(v))k+1}.\ell^{(r)}_{k}(\Phi,{\bf g_{1}},{\bf g_{2}},V^{*}):=\frac{1}{|V^{*}|}\sum_{v\in V^{*}}\mathds{1}\{\operatorname{rank}_{\Phi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*})}(\mathfrak{o}(v))\geq k+1\}.

We focus on the more natural precision-based loss function, but we note in passing that consistency and Bayes optimality with respect to these two loss functions are equivalent when |V|=O(1)|V^{*}|=O(1).

3 Bayes Optimal Vertex Nomination Schemes

In [29], a Bayes optimal VN scheme (i.e., one that achieves optimal expected VN loss) was derived in the setting where one observes a network without features. In the feature-rich setting, derivations are similar, though they require a more careful technical treatment. After some preliminary work, this section culminates in the definition of the feature-aware Bayes optimal scheme in Section 3.2.

3.1 Obfuscating Features

We are now faced with the problem of modeling the effect of the obfuscating function on features under the VN framework. If we observe 𝔬(𝐠2)\mathfrak{o}({\bf g}_{2}), then we have no knowledge of which member of

[𝐠2]:={𝐠2:𝐠2𝐠2}[{\bf g}_{2}]:=\{{\bf g}_{2}^{\prime}:{\bf g}_{2}\simeq{\bf g}_{2}^{\prime}\}

was obfuscated, but we do know what features are associated with each of the vertices and edges. In order to model this setting, we adopt the following conventions. Let n,m,d1,e1,d2,e2>0n,m,d_{1},e_{1},d_{2},e_{2}\in\mathbb{Z}_{>0} and 𝒱\mathcal{V}, \mathcal{E} be given. Furthermore, let the set of vertices of interest, VC=V1V2V^{*}\subseteq C=V_{1}\cap V_{2}, be fixed. Let HH be an obfuscating set of V1V_{1} and V2V_{2} of order |V2|=m|V_{2}|=m, and 𝔬𝔒H\mathfrak{o}\in\mathfrak{O}_{H}. Define 𝒜n,md1,e1,d2,e2\mathcal{A}^{\,d_{1},e_{1},d_{2},e_{2}}_{n,m} to be the set of asymmetric richly-featured graphs

𝒜n,m=𝒜n,md1,e1,d2,e2\displaystyle\mathcal{A}_{n,m}=\mathcal{A}^{\,d_{1},e_{1},d_{2},e_{2}}_{n,m} :={(𝐠1,𝐠2)𝒢n,𝒱,d1,e1×𝒢m,𝒱,d2,e2:there are no\displaystyle:=\Big{\{}({\bf g}_{1},{\bf g}_{2})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},e_{1}}\times\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}}~{}:~{}\text{there are no }
 non-trivial f-automorphisms of 𝐠1 or 𝐠2}.\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}\text{ non-trivial f-automorphisms of }{\bf g}_{1}\text{ or }{\bf g}_{2}\Big{\}}.

Under Assumption 11, FΘ(n,m)F^{(n,m)}_{\Theta} is supported on 𝒜n,m\mathcal{A}_{n,m}.

For each (𝐠𝟏,𝐠𝟐)𝒢n,𝒱,(d1,e1)×𝒢m,𝒱,(d2,e2)({\bf g_{1}},{\bf g_{2}})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{(d_{1},e_{1})}\times\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{(d_{2},e_{2})}, define

(𝐠𝟏,[𝔬(𝐠2)])\displaystyle\left({\bf g_{1}},[\mathfrak{o}({\bf g}_{2})]\right) ={(𝐠1,𝐠^2)𝒢n,𝒱,(d1,e1)×𝒢m,𝒱,(d2,e2) s.t. 𝔬(𝐠^2)𝔬(𝐠2)}\displaystyle=\bigg{\{}({\bf g}_{1},\widehat{\bf g}_{2})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{(d_{1},e_{1})}\times\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{(d_{2},e_{2})}\text{ s.t. }\mathfrak{o}(\widehat{\bf g}_{2})\simeq\mathfrak{o}({\bf g}_{2})\bigg{\}}
={(𝐠1,𝐠^2)𝒢n,𝒱,(d1,e1)×𝒢m,𝒱,(d2,e2) s.t. 𝐠^2𝐠2}.\displaystyle=\bigg{\{}({\bf g}_{1},\widehat{\bf g}_{2})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{(d_{1},e_{1})}\times\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{(d_{2},e_{2})}\text{ s.t. }\widehat{\bf g}_{2}\simeq{\bf g}_{2}\bigg{\}}.

Note that if (𝐠𝟏,𝐠𝟐)𝒜n,m({\bf g_{1}},{\bf g_{2}})\in\mathcal{A}_{n,m}, then the asymmetry of 𝐠𝟐{\bf g_{2}} yields that

|(𝐠𝟏,[𝔬(𝐠2)])|=m!.\Big{|}\Big{(}{\bf g_{1}},\big{[}\mathfrak{o}({\bf g}_{2})\big{]}\Big{)}\Big{|}=m!.

In light of the action of the obfuscating function on the features and vertex labels of 𝐠2{\bf g}_{2}, we view (𝐠1,[𝔬(𝐠2)])({\bf g}_{1},[\mathfrak{o}({\bf g}_{2})]) as the set of possible graph pairs that could have led to the observed graph pair (𝐠1,𝔬(𝐠2))({\bf g}_{1},\mathfrak{o}({\bf g}_{2})).

For each uHu\in H and vV2v\in V_{2}, we also define the following restriction:

(𝐠𝟏,[𝔬(𝐠2)])u=𝔬(v)={\displaystyle({\bf g_{1}},[\mathfrak{o}({\bf g}_{2})])_{u=\mathfrak{o}(v)}=\bigg{\{} (𝐠1,𝐠^2)𝒢n,𝒱,(d1,e1)×𝒢m,𝒱,(d2,e2) s.t. 𝔬(𝐠^2)=σ(𝔬(𝐠2)),\displaystyle({\bf g}_{1},\widehat{\bf g}_{2})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{(d_{1},e_{1})}\times\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{(d_{2},e_{2})}\text{ s.t. }\mathfrak{o}(\widehat{\bf g}_{2})=\sigma(\mathfrak{o}({\bf g}_{2})),
where σ is an f-isomorphism satisfying σ(u)=𝔬(v)}\displaystyle~{}~{}~{}~{}~{}~{}\text{where $\sigma$ is an f-isomorphism satisfying }\sigma(u)=\mathfrak{o}(v)\bigg{\}}
={\displaystyle=\bigg{\{} (𝐠1,𝐠^2)𝒢n,𝒱,(d1,e1)×𝒢m,𝒱,(d2,e2) s.t. 𝐠^2=σ(𝐠2),\displaystyle({\bf g}_{1},\widehat{\bf g}_{2})\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{(d_{1},e_{1})}\times\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{(d_{2},e_{2})}\text{ s.t. }\widehat{\bf g}_{2}=\sigma({\bf g}_{2}),
 where σ is an f-isomorphism satisfying σ(𝔬1(u))=v},\displaystyle~{}~{}~{}~{}~{}~{}\text{ where $\sigma$ is an f-isomorphism satisfying }\sigma(\mathfrak{o}^{-1}(u))=v\bigg{\}},

and for SV2,S\subseteq V_{2}, define (𝐠𝟏,[𝔬(𝐠2)])u𝔬(S)=vS(𝐠𝟏,[𝔬(𝐠2)])u=𝔬(v).({\bf g_{1}},[\mathfrak{o}({\bf g}_{2})])_{u\in\mathfrak{o}(S)}=\bigcup_{v\in S}\,\,({\bf g_{1}},[\mathfrak{o}({\bf g}_{2})])_{u=\mathfrak{o}(v)}.

3.2 Defining Bayes Optimality

We are now ready to define a Bayes optimal scheme Φ\Phi^{*} for a given FΘ(n,m)(n,m)F^{(n,m)}_{\Theta}\in\mathcal{F}^{(n,m)} satisfying Assumption 11. We will define the scheme element-wise on each asymmetric (𝐠1,[𝔬(𝐠2)])({\bf g}_{1},[\mathfrak{o}({\bf g}_{2})]), and then systematically lift the scheme to all of 𝒜n,md1,e1,d2,e2\mathcal{A}^{\,d_{1},e_{1},d_{2},e_{2}}_{n,m}. To wit, let

{(𝐠1(i),𝐠2(i)):iI}\left\{({\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)}):i\in I\right\}

be such that the corresponding equivalence classes {(𝐠1(i),[𝔬(𝐠2(i))]):iI}\{({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}^{(i)}_{2})]):i\in I\} form a partition of 𝒢n,𝒱,d1,e1×𝒢m,𝒱,d2,e2\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{d_{1},e_{1}}\times\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{d_{2},e_{2}}. To ease notation, we adopt the following shorthand.

  1. 1.

    We use (𝐠1(i),[𝔬(𝐠2(i))])\big{(}{\bf g}^{(i)}_{1},\left[\mathfrak{o}({\bf g}^{(i)}_{2})\right]\big{)} to denote the event {(𝐆𝟏,𝔬(𝐆𝟐))(𝐠1(i),[𝔬(𝐠2(i))])}\left\{({\bf G_{1}},\mathfrak{o}({\bf G_{2}}))\in\left({\bf g}^{(i)}_{1},\left[\mathfrak{o}({\bf g}^{(i)}_{2})\right]\right)\right\}.

  2. 2.

    For uHu\in H, we use (𝐠1(i),[𝔬(𝐠2(i))])u=𝔬(v)\big{(}{\bf g}^{(i)}_{1},\left[\mathfrak{o}({\bf g}^{(i)}_{2})\right]\big{)}_{u=\mathfrak{o}(v)} to denote the event

    {(𝐆𝟏,𝔬(𝐆𝟐))(𝐠1(i),[𝔬(𝐠2(i))])u=𝔬(v)}.\left\{({\bf G_{1}},\mathfrak{o}({\bf G_{2}}))\in\left({\bf g}^{(i)}_{1},\left[\mathfrak{o}({\bf g}^{(i)}_{2})\right]\right)_{u=\mathfrak{o}(v)}\right\}.

    We will use this often with u=Φ(𝐠1(i),𝔬(𝐠2(i)),V)[j]u=\Phi({\bf g}^{(i)}_{1},\mathfrak{o}({\bf g}^{(i)}_{2}),V^{*})[j].

  3. 3.

    We use (𝐠1(i),[𝔬(𝐠2(i))])u𝔬(V)\big{(}{\bf g}^{(i)}_{1},\left[\mathfrak{o}({\bf g}^{(i)}_{2})\right]\big{)}_{u\in\mathfrak{o}(V^{*})} to denote the event

    {(𝐆𝟏,𝔬(𝐆𝟐))(𝐠1(i),[𝔬(𝐠2(i))])u𝔬(V)}.\left\{({\bf G_{1}},\mathfrak{o}({\bf G_{2}}))\in\left({\bf g}^{(i)}_{1},\left[\mathfrak{o}({\bf g}^{(i)}_{2})\right]\right)_{u\in\mathfrak{o}(V^{*})}\right\}.

    We will use this often with u=Φ(𝐠1(i),𝔬(𝐠2(i)),V)[j]u=\Phi({\bf g}^{(i)}_{1},\mathfrak{o}({\bf g}^{(i)}_{2}),V^{*})[j].

Let 𝒮\mathcal{S} denote the set of indices ii such that (𝐠1(i),𝐠2(i))𝒜n,m({\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)})\in\mathcal{A}_{n,m}. That is, 𝐠1(i){\bf g}_{1}^{(i)} and 𝐠2(i){\bf g}_{2}^{(i)} are asymmetric as richly-featured networks. For each i𝒮i\in\mathcal{S}, writing ()\mathbb{P}(\cdot) for FΘ(n,m)()\mathbb{P}_{F^{(n,m)}_{\Theta}}(\cdot) to ease notation, define

Φ(𝐠1(i),𝔬(𝐠2(i)),V)[1]\displaystyle\Phi^{*}({\bf g}^{(i)}_{1},\mathfrak{o}({\bf g}^{(i)}_{2}),V^{*})[1] argmaxuH((𝐠1(i),[𝔬(𝐠2(i))])u𝔬(V)|(𝐠1(i),[𝔬(𝐠2(i))]))\displaystyle\in\operatorname*{arg\,max}_{\begin{subarray}{c}u\in H\end{subarray}}\,\,\mathbb{P}\bigg{(}({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}^{(i)}_{2})])_{u\in\mathfrak{o}(V^{*})}\,\,\big{|}\,\,({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}^{(i)}_{2})])\bigg{)}
Φ(𝐠1(i),𝔬(𝐠2(i)),V)[2]\displaystyle\Phi^{*}({\bf g}^{(i)}_{1},\mathfrak{o}({\bf g}^{(i)}_{2}),V^{*})[2] argmaxuH{Φ[1]}((𝐠1(i),[𝔬(𝐠2(i))])u𝔬(V)|(𝐠1(i),[𝔬(𝐠2(i))]))\displaystyle\in\operatorname*{arg\,max}_{\begin{subarray}{c}u\in H\setminus\{\Phi^{*}[1]\}\end{subarray}}\,\,\mathbb{P}\bigg{(}({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}^{(i)}_{2})])_{u\in\mathfrak{o}(V^{*})}\,\,\big{|}\,\,({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}^{(i)}_{2})])\bigg{)}
\displaystyle\vdots
Φ(𝐠1(i),𝔬(𝐠2(i)),V)[m]\displaystyle\Phi^{*}({\bf g}^{(i)}_{1},\mathfrak{o}({\bf g}^{(i)}_{2}),V^{*})[m] argmaxuH{j<m{Φ[j]}((𝐠1(i),[𝔬(𝐠2(i))])u𝔬(V)|(𝐠1(i),[𝔬(𝐠2(i))])),\displaystyle\in\operatorname*{arg\,max}_{\begin{subarray}{c}u\in H\setminus\{\cup_{j<m}\{\Phi^{*}[j]\}\end{subarray}}\,\,\mathbb{P}\bigg{(}({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}^{(i)}_{2})])_{u\in\mathfrak{o}(V^{*})}\,\,\big{|}\,\,({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}^{(i)}_{2})])\bigg{)},

where we break ties in an arbitrary but fixed manner. For each element

(𝐠1,𝐠2)(𝐠1(i),[𝔬(𝐠2(i))]),({\bf g}_{1},{\bf g}_{2})\in\left({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}^{(i)}_{2})]\right),

choose the f-isomorphism σ\sigma such that 𝔬(𝐠2)=σ(𝔬(𝐠2(i)))\mathfrak{o}({\bf g}_{2})=\sigma(\mathfrak{o}({\bf g}^{(i)}_{2})), and define

Φ(𝐠1,𝔬(𝐠2),V)=σ(Φ(𝐠1(i),𝔬(𝐠2(i)),V)).{\Phi^{*}}({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})=\sigma({\Phi^{*}}({\bf g}^{(i)}_{1},\mathfrak{o}({\bf g}^{(i)}_{2}),V^{*})).

For i𝒮i\not\in\mathcal{S}, any fixed and arbitrary definition of Φ\Phi^{*} (subject to the consistency criterion in Definition 9) on (𝐠1(i),[𝔬(𝐠2(i))])({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}^{(i)}_{2})]) will suffice, as this set has measure 0 under FΘ(n,m)F^{(n,m)}_{\Theta} by Assumption 11. Theorem 14 shows that Φ\Phi^{*} defined above is indeed level-kk Bayes optimal for all kk for any nominatable distribution satisfying Assumption 11. A proof is given in Appendix B.1.

Theorem 14.

Let FΘ(n,m)(n,m)F^{(n,m)}_{\Theta}\in\mathcal{F}^{(n,m)} satisfy Assumption 11, and let VC=V1V2V^{*}\subset C=V_{1}\cap V_{2} be a given set of vertices of interest in 𝐆1{\bf G}_{1}. The FA-VN scheme Φ\Phi^{*} defined above in Section 3.2 is a level-kk Bayes optimal scheme for FΘ(n,m)F^{(n,m)}_{\Theta} for all k[m]k\in[m] and any obfuscating set HH and obfuscating function 𝔬𝔒H\mathfrak{o}\in\mathfrak{O}_{H}; i.e., ΦargminΦ𝒩(n,m)Lk(Φ,V)\Phi^{*}\in\operatorname*{arg\,min}_{\Phi\in\mathcal{N}^{(n,m)}}L_{k}(\Phi,V^{*}) for all k[m]k\in[m].
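Operationally, once the conditional probabilities appearing in the sequential argmax construction of Section 3.2 are available for each candidate in HH, that construction reduces to sorting HH by those probabilities with a fixed tie-breaking rule. A minimal sketch, in which the dictionary `posterior` is a hypothetical stand-in for those conditional probabilities:

```python
def bayes_optimal_ranking(posterior):
    """Greedy construction mirroring the definition of Phi* above: at each
    step, rank the as-yet-unranked u in H with the largest posterior
    probability of corresponding to a vertex of interest. Ties are broken
    by a fixed (here lexicographic) rule."""
    remaining = dict(posterior)
    ranking = []
    while remaining:
        u = max(sorted(remaining), key=remaining.get)  # sorted() fixes tie-breaking
        ranking.append(u)
        del remaining[u]
    return ranking

# Since the conditional probabilities do not change as vertices are removed,
# the loop is equivalent to a single sort in decreasing order of posterior.
print(bayes_optimal_ranking({"A": 0.1, "B": 0.6, "C": 0.6, "D": 0.2}))
# -> ['B', 'C', 'D', 'A']
```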

For ease of reference in the sections to come, Table 1 summarizes the notation used in the paper so far.

Symbol | Description | Definition
[n][n] | For n>0n\in\mathbb{Z}_{>0}, this denotes {1,2,3,,n}\{1,2,3,\cdots,n\} | -
(S2)\binom{S}{2} | For a set SS, this represents the set {{u,v} s.t. u,vS}\{\,\{u,v\}\text{ s.t. }u,v\in S\} | -
𝒱\mathcal{V} | Denotes a discrete set of vertex features | -
\mathcal{E} | Denotes a discrete set of edge features | -
~\widetilde{\mathcal{E}} | The set {}\mathcal{E}\cup\{\star\}, where \star is a special symbol representing unavailable or missing data | Def. 1
𝒢n\mathcal{G}_{n} | For n>0n\in\mathbb{Z}_{>0}, the set of all labeled, undirected graphs on nn vertices | -
𝒢n,𝒱,d1,d2\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},d_{2}} | For n,d1,d2>0n,d_{1},d_{2}\in\mathbb{Z}_{>0}, the set of richly-featured networks of order (n,d1,d2)(n,d_{1},d_{2}) with vertex (resp., edge) features in 𝒱d1\mathcal{V}^{d_{1}} (resp., ~d2\widetilde{\mathcal{E}}^{d_{2}}) | Def. 1
VgV_{g} | For graph g𝒢ng\in\mathcal{G}_{n}, VgV_{g} denotes the set of vertices of gg | -
EgE_{g} | For graph g𝒢ng\in\mathcal{G}_{n}, EgE_{g} denotes the set of edges of gg | -
NN | (n2)\binom{n}{2} | -
MM | (m2)\binom{m}{2} | -
0\vec{0} | The vector of all 0’s | -
𝐗[i,:]{\bf X}[i,:] | This denotes the ii-th row of a matrix 𝐗{\bf X} | -
𝐗[S,:]{\bf X}[S,:] | For a set SS, this denotes the submatrix of 𝐗{\bf X} with rows indexed by SS | -
\simeq | If g1,g2𝒢ng_{1},g_{2}\in\mathcal{G}_{n} satisfy g1g2g_{1}\simeq g_{2}, then g1g_{1} is isomorphic to g2g_{2} | -
\simeq | If 𝐠1,𝐠2𝒢n,𝒱,d1,d2{\bf g}_{1},{\bf g}_{2}\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},d_{2}} satisfy 𝐠1𝐠2{\bf g}_{1}\simeq{\bf g}_{2}, then 𝐠1{\bf g}_{1} is feature-preserving isomorphic to 𝐠2{\bf g}_{2} | Def. 8
Φ\Phi | A nomination scheme | Def. 9
𝒩(n,m)=𝒩(d1,e1),(d2,e2)(n,m)\mathcal{N}^{(n,m)}=\mathcal{N}^{(n,m)}_{(d_{1},e_{1}),(d_{2},e_{2})} | The set of all feature-aware VN schemes | Def. 9
Lk,LkL_{k},L^{*}_{k} | Level-kk error of a VN scheme | Def. 12
Table 1: Notation and definitions used in the remainder of the paper.

4 Feature-Oblivious Vertex Nomination

It is intuitively clear that incorporating features should improve VN performance, provided those features are correlated with vertex “interestingness”. Indeed, this is a common theme across many graph-based machine learning tasks (see, for example, [64, 5, 34]), and the same holds in the present VN setting. The combination of network structure and informative features can significantly improve the VN Bayes error. Consider, for instance, the following simple example set in the context of the stochastic blockmodel [18].

Definition 15.

An undirected nn-vertex graph G=(V,E)G=(V,E) is an instantiation of a Stochastic Blockmodel with parameters (K,b,Λ)(K,b,\Lambda), abbreviated GSBM(K,b,Λ)G\sim\text{SBM}(K,b,\Lambda), if:

  • 1.

    The vertex set VV is partitioned into KK communities V=V1V2VKV=V_{1}\sqcup V_{2}\sqcup\ldots\sqcup V_{K};

  • 2.

    The community membership function b:V[K]b:V\rightarrow[K] agrees with the partition of the vertices, so that vVb(v)v\in V_{b(v)} for each vVv\in V;

  • 3.

    Λ\Lambda is a K×KK\times K matrix of probabilities: 𝟙{{u,v}E}Bern(Λ(b(u),b(v)))\mathbbm{1}\left\{\{u,v\}\in E\right\}\sim\text{Bern}(\Lambda(b(u),b(v))) for each {u,v}(V2)\{u,v\}\in\binom{V}{2}, and the collection of random variables {𝟙{{u,v}E}:{u,v}(V2)}\{\mathbbm{1}\left\{\{u,v\}\in E\right\}:\{u,v\}\in\binom{V}{2}\} are mutually independent.
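A minimal sampler for Definition 15, useful for reproducing small instances of the example that follows, is sketched below (our own code, not the authors' implementation; community sizes and Λ are passed explicitly and labels are 0-indexed).

```python
import numpy as np

def sample_sbm(block_sizes, Lambda, rng=None):
    """Sample an adjacency matrix from SBM(K, b, Lambda) as in Definition 15.

    block_sizes[k] is the size of the (k+1)-st community; Lambda is the
    K x K matrix of edge probabilities. Returns the symmetric 0/1
    adjacency matrix A and the block-membership labels b."""
    rng = np.random.default_rng(rng)
    b = np.repeat(np.arange(len(block_sizes)), block_sizes)
    n = b.size
    P = Lambda[np.ix_(b, b)]                      # per-pair edge probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)  # independent Bernoulli draws, no self-loops
    A = (upper | upper.T).astype(int)             # symmetrize
    return A, b

# A two-block instance in the spirit of Example 16 (b < a < c).
Lambda = np.array([[0.5, 0.2],
                   [0.2, 0.7]])
A, b = sample_sbm([50, 50], Lambda, rng=0)
```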

Example 16.

Let G1,G2G_{1},G_{2} be independent 2n2n-vertex SBM(2,b,Λ)\text{SBM}(2,b,\Lambda) random graphs with

Λ=(abbc),b(v)={1 if 1vn2 if n+1v2n,\Lambda=\begin{pmatrix}a&b\\ b&c\end{pmatrix},\hskip 8.53581ptb(v)=\begin{cases}1&\text{ if }1\leq v\leq n\\ 2&\text{ if }n+1\leq v\leq 2n\end{cases},

where b<a<cb<a<c are fixed (i.e., do not vary with nn). Edges in both G1G_{1} and G2G_{2} are independent and the probability of an edge between vertices {u,v}\{u,v\} is equal to

({u,v}E(Gi))={a if {u,v}[n];c if {u,v}{n+1,,2n};b otherwise. \mathbb{P}(\{u,v\}\in E(G_{i}))=\begin{cases}a&\text{ if }\{u,v\}\subset[n];\\ c&\text{ if }\{u,v\}\subset\{n+1,\cdots,2n\};\\ b&\text{ otherwise. }\end{cases}

Take V={v1}V^{*}=\{v_{1}\} with corresponding vertex of interest u=u1V2u^{*}=u_{1}\in V_{2} with b(v1)=b(u1)=1b(v_{1})=b(u_{1})=1. In the absence of features, Lk=(1+o(1))(1min(k,n)n)L^{*}_{k}=(1+o(1))(1-\frac{\min(k,n)}{n}), owing to the fact that vertices in the same community are stochastically identical. If the graphs are endowed with one-dimensional vertex features X,Y2nX,Y\in\mathbb{R}^{2n},

XT=YT=[1,1,,1n/2 total,2,2,,2n total,1,1,,1n/2 total],X^{T}=Y^{T}=[\underbrace{1,1,\cdots,1}_{n/2\text{ total}},\underbrace{2,2,\cdots,2}_{n\text{ total}},\underbrace{1,1,\cdots,1}_{n/2\text{ total}}],

we see that a ranking scheme that ignores network structure can do no better than randomly ranking the vertices with feature 11, and thus has a loss Lk=1min(k,n)nL^{*}_{k}=1-\frac{\min(k,n)}{n}. In contrast, if one considers the richly attributed graphs (G1,X)(G_{1},X) and (G2,Y)(G_{2},Y), the Bayes optimal loss is improved to Lk=(1+o(1))(1min(k,n/2)n/2)L^{*}_{k}=(1+o(1))(1-\frac{\min(k,n/2)}{n/2}) for all k,nk,n, as the network topology and vertex features offer complementary information.
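The gain in Example 16 comes entirely from shrinking the pool of candidates that are indistinguishable from the vertex of interest: community membership alone leaves n candidates, the feature alone leaves n candidates, but together they leave only n/2. A small sketch of the quantity min(k, pool)/pool that appears in the losses quoted above (assuming, as in the example, a single vertex of interest ranked uniformly at random within its candidate pool):

```python
def prob_top_k(pool_size, k):
    """Probability that a single vertex of interest lands in the top k when
    it is ranked uniformly at random within a pool of `pool_size`
    stochastically indistinguishable candidates: min(k, pool)/pool."""
    return min(k, pool_size) / pool_size

n, k = 100, 10
print(prob_top_k(n, k))       # topology only: all n vertices of community 1
print(prob_top_k(n, k))       # feature only: all n vertices carrying feature 1
print(prob_top_k(n // 2, k))  # topology + feature: the pool shrinks to n/2
```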

Can Bayes optimal performance in VN ever be improved by ignoring features? To answer this question, we must first define a scheme that both ignores features and satisfies the consistency criteria in Definition 9, and it is not immediately obvious how to do so.

First, we must define what it means to ignore features in a richly-featured network. Toward this end, consider the following definition, which gives a procedure for mapping a richly-featured network to its simple graph structure (i.e., the network stripped of its feature information).

Definition 17.

Let N=(n2)N=\binom{n}{2}. For 𝐰~N×d2{\bf w}\in\widetilde{\mathcal{E}}^{N\times d_{2}}, we define γ(𝐰)=(Vγ(𝐰),Eγ(𝐰))𝒢n\gamma({\bf w})=(V_{\gamma({\bf w})},E_{\gamma({\bf w})})\in\mathcal{G}_{n} to be the graph compatible with the edge features in 𝐰{\bf w}. That is, γ(𝐰)𝒢n\gamma({\bf w})\in\mathcal{G}_{n}, and 𝐰{\bf w} is such that 𝐰[e,:]=(,,,){\bf w}[e,:]=(\star,\star,\cdots,\star) for all eEγ(𝐰)e\not\in E_{\gamma({\bf w})} and 𝐰[e,:]d2{\bf w}[e,:]\in\mathcal{E}^{d_{2}} for all eEγ(𝐰).e\in E_{\gamma({\bf w})}.
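A sketch of the map γ of Definition 17, again with the string "*" standing in for ⋆ and the rows of w indexed lexicographically over pairs, as in Example 3 (names and abbreviated feature values are our own):

```python
from itertools import combinations

def gamma(w, n, star="*"):
    """Recover the graph compatible with the edge-feature matrix w: a pair
    is an edge exactly when its row is not the all-star row (per
    Definition 1, rows are either all-star or entirely feature-valued)."""
    pairs = list(combinations(range(1, n + 1), 2))
    edges = {pair for pair, row in zip(pairs, w)
             if any(entry != star for entry in row)}
    return set(range(1, n + 1)), edges

# The edge-feature matrix of Example 3 (feature values abbreviated) maps
# back to the edge set {1,2},{1,3},{1,4},{3,4}.
w = [["a", "b"], ["c", "d"], ["e", "f"], ["*", "*"], ["*", "*"], ["g", "h"]]
vertices, edges = gamma(w, 4)
print(sorted(edges))
```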

We stress that a priori, it is not necessarily clear that a VN scheme exists that simultaneously ignores features and satisfies the consistency requirements of Definition 9. We illustrate this point with a brief example. A scheme Φ\Phi that ignores features must rank vertices identically regardless of features, so long as the features are compatible in the sense of the mapping γ\gamma just defined. More formally, it must be that for all (g1,g2)𝒢n×𝒢m(g_{1},g_{2})\in\mathcal{G}_{n}\times\mathcal{G}_{m} and all features

(𝐱,𝐰,𝐲,𝐳),(𝐱,𝐰,𝐲,𝐳)𝒱n×d1×~N×e1×𝒱m×d2×~M×e2({\bf x},{\bf w},{\bf y},{\bf z}),({\bf x}^{\prime},{\bf w}^{\prime},{\bf y}^{\prime},{\bf z}^{\prime})\in\mathcal{V}^{n\times d_{1}}\times\widetilde{\mathcal{E}}^{N\times e_{1}}\times\mathcal{V}^{m\times d_{2}}\times\widetilde{\mathcal{E}}^{M\times e_{2}}

with γ(𝐰)=γ(𝐰)=g1\gamma({\bf w})=\gamma({\bf w}^{\prime})=g_{1} and γ(𝐳)=γ(𝐳)=g2\gamma({\bf z})=\gamma({\bf z}^{\prime})=g_{2},

Φ((g1,𝐱,𝐰),𝔬(g2,𝐲,𝐳),V)=Φ((g1,𝐱,𝐰),𝔬(g2,𝐲,𝐳),V).\Phi\left((g_{1},{\bf x},{\bf w}),\mathfrak{o}(g_{2},{\bf y},{\bf z}),V^{*}\right)=\Phi\left((g_{1},{\bf x}^{\prime},{\bf w}^{\prime}),\mathfrak{o}(g_{2},{\bf y}^{\prime},{\bf z}^{\prime}),V^{*}\right). (2)

Now consider 𝐠1=(g1,𝐱,𝐰){\bf g}_{1}=(g_{1},{\bf x},{\bf w}) and 𝐠2=(g2,𝐲,𝐳){\bf g}_{2}=(g_{2},{\bf y},{\bf z}) that are asymmetric as attributed networks (i.e., have no non-trivial f-automorphisms), but for which the raw graphs g1g_{1} and g2g_{2} have non-trivial automorphism groups. By assumption, g2g_{2} has a non-trivial automorphism, so there exist a permutation σid\sigma\neq\operatorname{id} and features 𝐲,𝐳{\bf y}^{\prime},{\bf z}^{\prime} compatible with g2g_{2} such that

σ(𝔬(𝐠2))=𝔬(g2,𝐲,𝐳).\sigma(\mathfrak{o}({\bf g}_{2}))=\mathfrak{o}(g_{2},{\bf y}^{\prime},{\bf z}^{\prime}).

Then the consistency criterion in Definition 9 requires

σ(Φ(𝐠1,𝔬(𝐠2),V))=Φ(𝐠1,σ(𝔬(𝐠2)),V),\sigma(\Phi\left({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*}\right))=\Phi\left({\bf g}_{1},\sigma(\mathfrak{o}({\bf g}_{2})),V^{*}\right),

while Equation (2) requires Φ(𝐠1,σ(𝔬(𝐠2)),V)=Φ(𝐠1,𝔬(𝐠2),V),\Phi\left({\bf g}_{1},\sigma(\mathfrak{o}({\bf g}_{2})),V^{*}\right)=\Phi\left({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*}\right), so that σ\sigma fixes the total ordering Φ(𝐠1,𝔬(𝐠2),V)\Phi\left({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*}\right). This is impossible, since applying a non-trivial permutation to a total ordering of HH necessarily changes it, contradicting σid\sigma\neq\operatorname{id}.

For Equation (2) to hold in general, we must consider a consistency criterion analogous to Equation (1) that is compatible with schemes that ignore features. The following definition, adapted from [1], suffices.

Definition 18 (Feature-Oblivious VN Scheme).

Let n,m,d1,e1,d2,e2>0n,m,d_{1},e_{1},d_{2},e_{2}\in\mathbb{Z}_{>0} and 𝒱\mathcal{V}, \mathcal{E} be given. Let HH be an obfuscating set of V1V_{1} and V2V_{2} of order |V2|=m|V_{2}|=m, and let 𝔬𝔒H\mathfrak{o}\in\mathfrak{O}_{H} be given. A feature-oblivious vertex nomination scheme (FO-VN scheme) is a function

Ψ:𝒢n,𝒱,d1,e1×𝔬(𝒢m,𝒱,d2,e2)×2V1𝒯H,\Psi:\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},e_{1}}\times\mathfrak{o}(\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}})\times 2^{V_{1}}\rightarrow{\mathcal{T}}_{H},

satisfying Equation (2) as well as the consistency criterion in Assumption 19 below.

As with the FA-VN consistency criteria, we require FO-VN schemes to be label-agnostic with respect to the obfuscated labels of the second graph.

Assumption 19 (FO-VN Consistency Criteria).

With notation as in Definition 18, for each 𝐠2=(g2,𝐲,𝐳)𝒢m,𝒱,d2,e2{\bf g}_{2}=(g_{2},{\bf y},{\bf z})\in\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{d_{2},e_{2}} and uV2u\in V_{2}, let

𝒥(u;𝐠2)={wV2 s.t.  an automorphism σ of g2, s.t. σ(u)=w}.\mathcal{J}(u;{\bf g}_{2})=\{w\in V_{2}\text{ s.t. }\exists\text{ an automorphism }\sigma\text{ of }g_{2},\text{ s.t. }\,\sigma(u)=w\}.

For any 𝐠𝟏𝒢n,𝒱,d1,e1,{\bf g_{1}}\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},e_{1}}, 𝐠𝟐𝒢m,𝒱,d2,e2{\bf g_{2}}\in\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}}, VV1V2V^{*}\subset V_{1}\cap V_{2}, obfuscating functions 𝔬1,𝔬2𝔒H\mathfrak{o}_{1},\mathfrak{o}_{2}\in\mathfrak{O}_{H} and any uV2u\in V_{2}, we require

𝔯Ψ(𝐠𝟏,𝐠𝟐,𝔬1,V,𝒥(u;𝐠𝟐))=𝔯Ψ(𝐠𝟏,𝐠𝟐,𝔬2,V,𝒥(u;𝐠𝟐))\displaystyle\mathfrak{r}_{\Psi}({\bf g_{1}},{\bf g_{2}},\mathfrak{o}_{1},V^{*},\mathcal{J}(u;{\bf g_{2}}))=\mathfrak{r}_{\Psi}({\bf g_{1}},{\bf g_{2}},\mathfrak{o}_{2},V^{*},\mathcal{J}(u;{\bf g_{2}})) (3)
\displaystyle~{}~{}~{}\Leftrightarrow
k[m]:\displaystyle\forall k\in[m]:
𝔬2𝔬11(𝒥(Ψ(𝐠𝟏,𝔬1(𝐠𝟐),V)[k]);𝔬1(𝐠𝟐))=𝒥(Ψ(𝐠𝟏,𝔬2(𝐠𝟐),V)[k];𝔬2(𝐠𝟐))\displaystyle~{}~{}~{}\mathfrak{o}_{2}\circ\mathfrak{o}_{1}^{-1}\big{(}\mathcal{J}(\Psi({\bf g_{1}},\mathfrak{o}_{1}({\bf g_{2}}),V^{*})[k]);\mathfrak{o}_{1}({\bf g_{2}})\big{)}=\mathcal{J}\left(\Psi({\bf g_{1}},\mathfrak{o}_{2}({\bf g_{2}}),V^{*})[k];\mathfrak{o}_{2}({\bf g_{2}})\right)

where Ψ(𝐠𝟏,𝔬(𝐠𝟐),V)[k]\Psi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*})[k] denotes the kk-th element in the ordering Ψ(𝐠𝟏,𝔬(𝐠𝟐),V)\Psi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*}) (i.e., the rank-kk vertex).

The criterion in Equation (3) is less restrictive than that in Equation (1), and it is not immediate that incorporating features yields an FA-VN scheme with smaller loss than the Bayes optimal FO-VN scheme. We illustrate this in the following example.

Example 20.

Let F(n,m)F\in\mathcal{F}^{(n,m)} be a distribution such that G1=a.s.KnG_{1}\stackrel{{\scriptstyle a.s.}}{{=}}K_{n} and G2=a.s.KmG_{2}\stackrel{{\scriptstyle a.s.}}{{=}}K_{m}, where KnK_{n} denotes the complete graph on nn vertices. If the f-automorphism groups of 𝐆1{\bf G}_{1} and 𝐆2{\bf G}_{2} are a.s. trivial, then any given FA-VN scheme can be outperformed by a well-chosen FO-VN scheme. Indeed, if there is a single vertex of interest vv^{*} in 𝐆1{\bf G}_{1} with corresponding vertex uu^{*} in 𝐆2{\bf G}_{2}, then there exists an FO-VN scheme Ψ\Psi that satisfies Ψ(𝐠1,𝔬(𝐠2),v)[1]=𝔬(u)\Psi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),v^{*})[1]=\mathfrak{o}(u^{*}) for almost all 𝐠1,𝔬(𝐠2){\bf g}_{1},\mathfrak{o}({\bf g}_{2}). Such a Ψ\Psi cannot satisfy Equation (1), and it is possible to have L1(Φ)>0L_{1}(\Phi^{*})>0 for the FA-VN Bayes optimal scheme Φ\Phi^{*}.

However, consider distributions satisfying the following assumption.

Assumption 21.

Let (𝐆1=(G1,𝐗,𝐖),𝐆2=(G2,𝐘,𝐙))FΘ(n,m)(n,m)({\bf G}_{1}=(G_{1},{\bf X},{\bf W}),{\bf G}_{2}=(G_{2},{\bf Y},{\bf Z}))\sim F^{(n,m)}_{\Theta}\in\mathcal{F}^{(n,m)} and consider the events

D3\displaystyle D_{3} ={ the only automorphism of G1 is σ=idn}\displaystyle=\{\text{ the only automorphism of }G_{1}\text{ is }\sigma=\operatorname{id}_{n}\}
D4\displaystyle D_{4} ={ the only automorphism of G2 is σ=idm}.\displaystyle=\{\text{ the only automorphism of }G_{2}\text{ is }\sigma=\operatorname{id}_{m}\}.

FΘ(n,m)F^{(n,m)}_{\Theta} satisfies FΘ(n,m)(D3)=FΘ(n,m)(D4)=1\mathbb{P}_{F^{(n,m)}_{\Theta}}(D_{3})=\mathbb{P}_{F^{(n,m)}_{\Theta}}(D_{4})=1.

Under Assumption 21, we have that (u;𝐠2)=a.s.𝒥(u;𝐠2)=a.s.{u}\mathcal{I}(u;{\bf g}_{2})\stackrel{{\scriptstyle a.s.}}{{=}}\mathcal{J}(u;{\bf g}_{2})\stackrel{{\scriptstyle a.s.}}{{=}}\{u\}, and the consistency criteria in Assumptions 10 and 19 are almost surely equivalent. It is then immediate that Bayes optimality cannot be improved by ignoring features. That is, an FO-VN scheme Ψ\Psi is almost surely an FA-VN scheme, and hence Lk(Ψ,V)Lk(Φ,V)L_{k}(\Psi,V^{*})\geq L_{k}(\Phi^{*},V^{*}) for all k[m]k\in[m]. This leads us to ask whether we can establish conditions under which ignoring features strictly decreases VN performance.

4.1 Feature-Oblivious Bayes Optimality

We first establish the notion of a Bayes optimal FO-VN scheme for distributions satisfying Assumption 21. Defining

𝔊n,m\displaystyle\mathfrak{G}_{n,m} ={(g1,𝔬(g2))𝒢n×𝔬(𝒢m) s.t. g1,g2 are asymmetric},\displaystyle=\{(g_{1},\mathfrak{o}(g_{2}))\in\mathcal{G}_{n}\times\mathfrak{o}(\mathcal{G}_{m})\text{ s.t. }g_{1},g_{2}\text{ are asymmetric}\},

let {(g1,g2)}i=1p\{(g_{1},g_{2})\}_{i=1}^{p} be such that {(g1,[𝔬(g2)])}i=1p\{(g_{1},[\mathfrak{o}(g_{2})])\}_{i=1}^{p} partitions 𝔊n,m\mathfrak{G}_{n,m}. For FΘ(n,m)F^{(n,m)}_{\Theta} supported on 𝔊n,m\mathfrak{G}_{n,m}, it follows from [1] that a Bayes optimal FO-VN scheme, Ψ\Psi^{*}, can be constructed as follows. If (g1i,g2i){(g1,g2)}i=1p(g_{1}^{i},g_{2}^{i})\in\{(g_{1},g_{2})\}_{i=1}^{p}, and (𝐠1i,𝔬(𝐠2i))({\bf g}_{1}^{i},\mathfrak{o}({\bf g}_{2}^{i})) is any featured extension of (g1i,g2i)(g_{1}^{i},g_{2}^{i}) (note that this notation will be implicit below), we sequentially define (breaking ties in a fixed but arbitrary manner and writing ()\mathbb{P}(\cdot) for FΘ(n,m)()\mathbb{P}_{F^{(n,m)}_{\Theta}}(\cdot) to ease notation)

Ψ(𝐠1i,𝔬(𝐠2i),V)[1]\displaystyle\Psi^{*}({\bf g}_{1}^{i},\mathfrak{o}({\bf g}_{2}^{i}),V^{*})[1] argmaxuH((g1i,[𝔬(g2i)])u𝔬(V)|(g1i,[𝔬(g2i)]))\displaystyle\in\operatorname*{arg\,max}_{\begin{subarray}{c}u\in H\end{subarray}}\,\,\mathbb{P}\bigg{(}(g_{1}^{i},[\mathfrak{o}(g_{2}^{i})])_{u\in\mathfrak{o}(V^{*})}\,\,\big{|}\,\,(g_{1}^{i},[\mathfrak{o}(g_{2}^{i})])\bigg{)}
Ψ(𝐠1i,𝔬(𝐠2i),V)[2]\displaystyle\Psi^{*}({\bf g}_{1}^{i},\mathfrak{o}({\bf g}_{2}^{i}),V^{*})[2] argmaxuH{Ψ[1]}((g1i,[𝔬(g2i)])u𝔬(V)|(g1i,[𝔬(g2i)]))\displaystyle\in\operatorname*{arg\,max}_{\begin{subarray}{c}u\in H\setminus\{\Psi^{*}[1]\}\end{subarray}}\mathbb{P}\bigg{(}(g_{1}^{i},[\mathfrak{o}(g_{2}^{i})])_{u\in\mathfrak{o}(V^{*})}\,\,\big{|}\,\,(g_{1}^{i},[\mathfrak{o}(g_{2}^{i})])\bigg{)}
\displaystyle\vdots
Ψ(𝐠1i,𝔬(𝐠2i),V)[m]\displaystyle\Psi^{*}({\bf g}_{1}^{i},\mathfrak{o}({\bf g}_{2}^{i}),V^{*})[m] argmaxuH{j<m{Ψ[j]}((g1i,[𝔬(g2i)])u𝔬(V)|(g1i,[𝔬(g2i)])),\displaystyle\in\operatorname*{arg\,max}_{\begin{subarray}{c}u\in H\setminus\{\cup_{j<m}\{\Psi^{*}[j]\}\end{subarray}}\mathbb{P}\bigg{(}(g_{1}^{i},[\mathfrak{o}(g_{2}^{i})])_{u\in\mathfrak{o}(V^{*})}\,\,\big{|}\,\,(g_{1}^{i},[\mathfrak{o}(g_{2}^{i})])\bigg{)},

where (g1,[𝔬(g2)])(g_{1},[\mathfrak{o}(g_{2})]) and (g1,[𝔬(g2)])u𝔬(V)(g_{1},[\mathfrak{o}(g_{2})])_{u\in\mathfrak{o}(V^{*})} (that is, the graphs without their features) are defined analogously to the featured (𝐠1,[𝔬(𝐠2)])({\bf g}_{1},[\mathfrak{o}({\bf g}_{2})]) and (𝐠1,[𝔬(𝐠2)])u𝔬(V)({\bf g}_{1},[\mathfrak{o}({\bf g}_{2})])_{u\in\mathfrak{o}(V^{*})} respectively:

(g1,[𝔬(g2)])\displaystyle({g_{1}},[\mathfrak{o}(g_{2})]) ={(g1,g^2)𝔊n,m s.t. 𝔬(g^2)𝔬(g2)}\displaystyle=\bigg{\{}(g_{1},\widehat{g}_{2})\in\mathfrak{G}_{n,m}\text{ s.t. }\mathfrak{o}(\widehat{g}_{2})\simeq\mathfrak{o}(g_{2})\bigg{\}}
(g1,[𝔬(g2)])u=𝔬(v)\displaystyle({g_{1}},[\mathfrak{o}(g_{2})])_{u=\mathfrak{o}(v)} ={(g1,g^2)𝔊n,m s.t. 𝔬(g^2)=σ(𝔬(g2)), where σ is an\displaystyle=\bigg{\{}(g_{1},\widehat{g}_{2})\in\mathfrak{G}_{n,m}\text{ s.t. }\mathfrak{o}(\widehat{g}_{2})=\sigma(\mathfrak{o}(g_{2})),\text{ where $\sigma$ is an}
 isomorphism satisfying σ(u)=𝔬(v)}.\displaystyle\hskip 42.67912pt\text{ isomorphism satisfying }\sigma(u)=\mathfrak{o}(v)\bigg{\}}.

For each (𝐠1,𝐠2)(𝐠1i,[𝔬(𝐠2i)]),({\bf g}_{1}^{\prime},{\bf g}_{2}^{\prime})\in({\bf g}_{1}^{i},[\mathfrak{o}({\bf g}_{2}^{i})]), choose the f-isomorphism σ\sigma such that 𝔬(𝐠2)=σ(𝔬(𝐠2i))\mathfrak{o}({\bf g}_{2}^{\prime})=\sigma(\mathfrak{o}({\bf g}_{2}^{i})), and define

Ψ(𝐠1,𝔬(𝐠2),V)=σ(Ψ(𝐠1i,𝔬(𝐠2i),V)).{\Psi^{*}}({\bf g}_{1}^{\prime},\mathfrak{o}({\bf g}_{2}^{\prime}),V^{*})=\sigma({\Psi^{*}}({\bf g}_{1}^{i},\mathfrak{o}({\bf g}_{2}^{i}),V^{*})).

For elements (𝐠1,𝔬(𝐠2))𝔊n,m({\bf g}_{1},\mathfrak{o}({\bf g}_{2}))\notin\mathfrak{G}_{n,m}, any fixed and arbitrary definition of Ψ\Psi^{*} satisfying Equation (3) suffices (as this set has measure 0 under FΘ(n,m)F^{(n,m)}_{\Theta} by Assumption 21). Note that Ψ\Psi^{*} is almost surely well-defined, as the definition of Ψ\Psi^{*} on

𝔄n,m=𝔄n,m,𝒱,(d1,e1,d2,e2):={\displaystyle\mathfrak{A}_{n,m}=\mathfrak{A}_{n,m,\mathcal{V},\mathcal{E}}^{(d_{1},e_{1},d_{2},e_{2})}:=\{ (𝐠1=(g1,𝐱,𝐰),𝐠2=(g2,𝐲,𝐳))𝒢n,𝒱,d1,e1×𝒢m,𝒱,d2,e2\displaystyle({\bf g}_{1}=(g_{1},{\bf x},{\bf w}),{\bf g}_{2}=(g_{2},{\bf y},{\bf z}))\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},e_{1}}\times\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}}
 s.t. g1,g2 are asymmetric}\displaystyle~{}~{}~{}~{}~{}~{}\text{ s.t. }g_{1},g_{2}\text{ are asymmetric}\}

is independent of the choice of the partition {(g1i,g2i):i=1,2,,p}\{(g_{1}^{i},g_{2}^{i}):i=1,2,\dots,p\}.

4.2 The Benefit of Features

With the FO-VN scheme defined, we seek to understand when, for distributions FΘ(n,m)F^{(n,m)}_{\Theta} supported on 𝔄n,m\mathfrak{A}_{n,m}, we have Lk(Φ,V)<Lk(Ψ,V)L_{k}(\Phi^{*},V^{*})<L_{k}(\Psi^{*},V^{*}) where Φ\Phi^{*} and Ψ\Psi^{*} are the Bayes optimal FA-VN and FO-VN schemes, respectively, under FΘ(n,m)F^{(n,m)}_{\Theta}. Toward this end, we first define the following 𝒯H\mathcal{T}_{H}-valued random variable.

Definition 22.

Let FΘ(n,m)(n,m)F^{(n,m)}_{\Theta}\in\mathcal{F}^{(n,m)}, let VV1V2V^{*}\subseteq V_{1}\cap V_{2} be a given set of vertices of interest, and let HH be an obfuscating set of V1V_{1} and V2V_{2} of order |V2|=m|V_{2}|=m with obfuscating function 𝔬𝔒H\mathfrak{o}\in\mathfrak{O}_{H}. Let Φ\Phi be a VN scheme (either feature-aware or feature-oblivious), and define, letting Ω\Omega be our sample space, the 𝒯H\mathcal{T}_{H}-valued random variable

XΦ:Ω𝒯HX_{\Phi}:\Omega\mapsto\mathcal{T}_{H}

by XΦ(ω)=Φ(𝐆1(ω),𝔬(𝐆2(ω)),V)X_{\Phi}(\omega)=\Phi({\bf G}_{1}(\omega),\mathfrak{o}({\bf G}_{2}(\omega)),V^{*}). For each kmk\leq m, define XΦk=XΦ[1:k]𝒯HkX_{\Phi}^{k}=X_{\Phi}[1:k]\in\mathcal{T}_{H}^{k}, where we define 𝒯Hk\mathcal{T}_{H}^{k} to be the set of all kk-tuples of distinct elements of HH (each such tuple can be viewed as specifying a total ordering of kk distinct elements of HH).

Remark 23.

Note that in the setting of continuous features, the measurability of XΦX_{\Phi} is not immediate (and indeed, is non-trivial to establish); this technical hurdle is the main impetus for discretizing the feature space.

We can now characterize the conditions under which incorporating features strictly improves VN performance. A proof of Theorem 24 can be found in Appendix B.2.

Theorem 24.

Consider the setup and notation of Definition 22 and suppose that FΘ(n,m)(n,m)F^{(n,m)}_{\Theta}\in\mathcal{F}^{(n,m)} satisfies Assumption 21. Letting Φ\Phi^{*} and Ψ\Psi^{*} be Bayes optimal FA-VN and FO-VN schemes, respectively, under FΘ(n,m)F^{(n,m)}_{\Theta}, we have that Lk(Φ,V)=Lk(Ψ,V)L_{k}(\Phi^{*},V^{*})=L_{k}(\Psi^{*},V^{*}) if and only if there exists a Bayes optimal FA-VN scheme Φ\Phi^{*} with

𝕀(XΦk;(G1,G2))=(XΦk),\mathbb{I}(X_{\Phi^{*}}^{k};(G_{1},G_{2}))=\mathbb{H}(X_{\Phi^{*}}^{k}),

where 𝕀\mathbb{I} is the mutual information and \mathbb{H} the statistical entropy defined by

(XΦk)=ξ𝒯Hk(XΦk=ξ)log((XΦk=ξ))\displaystyle\mathbb{H}(X_{\Phi^{*}}^{k})=-\sum_{\xi\in\mathcal{T}_{H}^{k}}\mathbb{P}(X_{\Phi^{*}}^{k}=\xi)\log(\mathbb{P}(X_{\Phi^{*}}^{k}=\xi))
𝕀(XΦk;(G1,G2))=ξ𝒯Hk(g1,g2)𝒢n×𝒢m(ξ,(g1,g2))log((ξ,(g1,g2))(ξ)((g1,g2))),\displaystyle\mathbb{I}(X_{\Phi^{*}}^{k};(G_{1},G_{2}))=\sum_{\xi\in\mathcal{T}_{H}^{k}}\sum_{\begin{subarray}{c}(g_{1},g_{2})\\ \in\mathcal{G}_{n}\times\mathcal{G}_{m}\end{subarray}}\mathbb{P}(\xi,(g_{1},g_{2}))\log\left(\frac{\mathbb{P}(\xi,(g_{1},g_{2}))}{\mathbb{P}(\xi)\mathbb{P}((g_{1},g_{2}))}\right),

where we have written (ξ,(g1,g2))\mathbb{P}(\xi,(g_{1},g_{2})) as shorthand for (XΦk=ξ,(G1,G2)=(g1,g2))\mathbb{P}(X_{\Phi^{*}}^{k}=\xi,(G_{1},G_{2})=(g_{1},g_{2})).

We note that, since 𝕀(XΦk;(G1,G2))(XΦk)\mathbb{I}(X_{\Phi^{*}}^{k};(G_{1},G_{2}))\leq\mathbb{H}(X_{\Phi^{*}}^{k}), we can restate the result of Theorem 24 as Lk(Φ,V)<Lk(Ψ,V)L_{k}(\Phi^{*},V^{*})<L_{k}(\Psi^{*},V^{*}) if and only if for all Bayes optimal FA-VN schemes Φ\Phi^{*},

𝕀(XΦk;(G1,G2))<(XΦk).\mathbb{I}(X_{\Phi^{*}}^{k};(G_{1},G_{2}))<\mathbb{H}(X_{\Phi^{*}}^{k}).

Stated succinctly, incorporating features strictly improves performance precisely when there is excess uncertainty in XΦkX_{\Phi^{*}}^{k} after observing (G1,G2)(G_{1},G_{2}); that is, when XΦkX_{\Phi^{*}}^{k} is not a deterministic function of (G1,G2)(G_{1},G_{2}).
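
To make the information-theoretic criterion concrete, the following minimal sketch (in Python, with a hypothetical two-by-two joint distribution that is not derived from any particular nominatable distribution) computes the entropy and the mutual information from a joint probability mass function and checks whether the two coincide, i.e., whether the ranking is a deterministic function of the observed graphs.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a pmf given as a 1-D array."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(joint):
    """I(X;G) from a joint pmf: rows index values of X, columns values of G."""
    px = joint.sum(axis=1)      # marginal of X
    pg = joint.sum(axis=0)      # marginal of G
    mi = 0.0
    for i in range(joint.shape[0]):
        for j in range(joint.shape[1]):
            if joint[i, j] > 0:
                mi += joint[i, j] * np.log(joint[i, j] / (px[i] * pg[j]))
    return mi

# Hypothetical toy joint: X is a deterministic function of G, so I(X;G) = H(X)
# and, in the language of Theorem 24, ignoring features loses nothing.
deterministic = np.array([[0.5, 0.0],
                          [0.0, 0.5]])
# Here X retains randomness given G, so I(X;G) < H(X) and features strictly help.
noisy = np.array([[0.3, 0.2],
                  [0.2, 0.3]])

for name, joint in [("deterministic", deterministic), ("noisy", noisy)]:
    H = entropy(joint.sum(axis=1))
    I = mutual_information(joint)
    print(f"{name}: H(X) = {H:.4f}, I(X;G) = {I:.4f}, equal: {np.isclose(H, I)}")
```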

5 Network-Oblivious Vertex Nomination

In contrast to the feature-oblivious VN schemes considered in Section 4, one can also consider VN schemes that use only features and ignore network structure. Defining such a network-oblivious VN scheme (NO-VN scheme) is not immediately straightforward. Ideally, we would like to have that for all (g1,g2),(g1,g2)𝒢n×𝒢m(g_{1},g_{2}),(g_{1}^{\prime},g_{2}^{\prime})\in\mathcal{G}_{n}\times\mathcal{G}_{m} and all edge features (𝐰,𝐳)({\bf w},{\bf z}), (𝐰,𝐳)({\bf w}^{\prime},{\bf z}^{\prime}) compatible with (g1,g2)(g_{1},g_{2}) and (g1,g2)(g_{1}^{\prime},g_{2}^{\prime}) respectively,

Φ((g1,𝐱,𝐰),𝔬(g2,𝐲,𝐳),V)=Φ((g1,𝐱,𝐰),𝔬(g2,𝐲,𝐳),V)\Phi\left((g_{1},{\bf x},{\bf w}),\mathfrak{o}(g_{2},{\bf y},{\bf z}),V^{*}\right)=\Phi\left((g_{1}^{\prime},{\bf x},{\bf w}^{\prime}),\mathfrak{o}(g_{2}^{\prime},{\bf y},{\bf z}^{\prime}),V^{*}\right) (4)

for any choice of vertex features 𝐱,𝐲{\bf x},{\bf y}. As in the FO-VN scheme setting, this leads to potential violation of the internal consistency criteria of Equation (1). Indeed, consider 𝐠1=(g1,𝐱,𝐰){\bf g}_{1}=(g_{1},{\bf x},{\bf w}) and 𝐠2=(g2,𝐲,𝐳){\bf g}_{2}=(g_{2},{\bf y},{\bf z}) with asymmetric graphs but with symmetries in 𝐲{\bf y} (i.e., there exists a non-identity permutation matrix PσP_{\sigma} such that Pσ𝐲=𝐲P_{\sigma}{\bf y}={\bf y}). On such networks, Equations (1) and (4) cannot both hold simultaneously. Thus, we consider a relaxed consistency criterion as in Assumption 19. We first define

𝒴(u;𝐠2)={wV(g2): bijection σ s.t. Pσ𝐲=𝐲 and σ(u)=w},\mathcal{Y}(u;{\bf g}_{2})=\{w\in V(g_{2}):\exists\text{ bijection }\sigma\text{ s.t. }P_{\sigma}{\bf y}={\bf y}\text{ and }\,\sigma(u)=w\},

and make the following consistency assumption.

Assumption 25 (NO-VN Consistency Criteria).

For any 𝐠𝟏𝒢n,𝒱,d1,e1,{\bf g_{1}}\in\mathcal{G}_{n,\mathcal{V},\mathcal{E}}^{\,d_{1},e_{1}}, 𝐠𝟐𝒢m,𝒱,d2,e2{\bf g_{2}}\in\mathcal{G}_{m,\mathcal{V},\mathcal{E}}^{\,d_{2},e_{2}}, let HH be an obfuscating set of V1V_{1} and V2V_{2} of order |V2|=m|V_{2}|=m with 𝔬1,𝔬2𝔒H\mathfrak{o}_{1},\mathfrak{o}_{2}\in\mathfrak{O}_{H}, let VV1V2V^{*}\subset V_{1}\cap V_{2} be the set of vertices of interest, and let uV(g2)u\in V(g_{2}). If Ξ\Xi is a VN scheme satisfying this assumption, then

𝔯Ξ(𝐠𝟏,𝐠𝟐,𝔬1,V,𝒴(u;𝐠𝟐))=𝔯Ξ(𝐠𝟏,𝐠𝟐,𝔬2,V,𝒴(u;𝐠𝟐))\displaystyle\mathfrak{r}_{\Xi}({\bf g_{1}},{\bf g_{2}},\mathfrak{o}_{1},V^{*},\mathcal{Y}(u;{\bf g_{2}}))=\mathfrak{r}_{\Xi}({\bf g_{1}},{\bf g_{2}},\mathfrak{o}_{2},V^{*},\mathcal{Y}(u;{\bf g_{2}})) (5)
\Leftrightarrow\mathfrak{o}_{2}\circ\mathfrak{o}_{1}^{-1}\big{(}\mathcal{Y}\left(\Xi({\bf g_{1}},\mathfrak{o}_{1}({\bf g_{2}}),V^{*})[k];\mathfrak{o}_{1}({\bf g_{2}})\right)\big{)}=\mathcal{Y}\left(\Xi({\bf g_{1}},\mathfrak{o}_{2}({\bf g_{2}}),V^{*})[k];\mathfrak{o}_{2}({\bf g_{2}})\right)
 for all k[m],\displaystyle\hskip 71.13188pt\text{ for all }k\in[m],

where Ξ(𝐠𝟏,𝔬(𝐠𝟐),V)[k]\Xi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*})[k] denotes the kk-th element in the ordering Ξ(𝐠𝟏,𝔬(𝐠𝟐),V)\Xi({\bf g_{1}},\mathfrak{o}({\bf g_{2}}),V^{*}) (i.e., the rank-kk vertex under Ξ\Xi).

A network-oblivious VN scheme Ξ\Xi is then a VN scheme as in Definition 9, where the consistency criterion of Equation (1) is replaced with that in Equation (5) and we further require Equation (4) to hold. As in Section 4, we consider distributions satisfying the following assumption.

Assumption 26.

Let ((G1,𝐗,𝐖),(G2,𝐘,𝐙))FΘ(n,m)(n,m)((G_{1},{\bf X},{\bf W}),(G_{2},{\bf Y},{\bf Z}))\sim F^{(n,m)}_{\Theta}\in\mathcal{F}^{(n,m)} and define the events D5={𝐗=Pσ𝐗σ=id}D_{5}=\{{\bf X}=P_{\sigma}{\bf X}\implies\sigma=\operatorname{id}\} and D6={𝐘=Pσ𝐘σ=id}D_{6}=\{{\bf Y}=P_{\sigma}{\bf Y}\implies\sigma=\operatorname{id}\}. FΘ(n,m)F^{(n,m)}_{\Theta} is such that FΘ(n,m)(D5)=FΘ(n,m)(D6)=1\mathbb{P}_{F^{(n,m)}_{\Theta}}(D_{5})=\mathbb{P}_{F^{(n,m)}_{\Theta}}(D_{6})=1.

We note the parallel between this assumption and Assumption 21, while noting that the two assumptions concern permutations acting on markedly different objects (graphs in the case of Assumption 21, and vertex-level features in the case of Assumption 26). Under Assumption 26, we have that (u;𝐠2)=a.s.𝒴(u;𝐠2)=a.s.{u}\mathcal{I}(u;{\bf g}_{2})\stackrel{{\scriptstyle a.s.}}{{=}}\mathcal{Y}(u;{\bf g}_{2})\stackrel{{\scriptstyle a.s.}}{{=}}\{u\}, and the consistency criteria in Equations (1) and (5) are almost surely equivalent. As in Section 4, under this assumption, Bayes optimality cannot be improved by ignoring the network. Indeed, one can show that an NO-VN scheme is almost surely an FA-VN scheme, and we are led once again to ask under what circumstances VN performance is strictly worsened by ignoring the network (and with it, the edge features). To this end, we wish to compare the Bayes optimality of NO-VN with that of FA-VN.

5.1 Network-oblivious Bayes optimality

We first establish the notion of a Bayes optimal NO-VN scheme for distributions satisfying Assumption 26. Define

𝔉n,m:={(𝐱,𝐲)n×d1×m×d2 s.t. 𝐱,𝐲 have distinct rows},\displaystyle\mathfrak{F}_{n,m}:=\{({\bf x},{\bf y})\in\mathbb{R}^{n\times d_{1}}\times\mathbb{R}^{m\times d_{2}}\text{ s.t. }{\bf x},{\bf y}\text{ have distinct rows}\},

and for (𝐱,𝐲)𝔉n,m({\bf x},{\bf y})\in\mathfrak{F}_{n,m}, define

(𝐱,[𝔬(𝐲)])\displaystyle({\bf x},[\mathfrak{o}({\bf y})]) ={(𝐱,𝐲^)𝔉n,m s.t. there exists permutation σ s.t. Pσ𝐲=𝐲^}\displaystyle=\bigg{\{}({\bf x},\widehat{\bf y})\in\mathfrak{F}_{n,m}\text{ s.t. there exists permutation }\sigma\text{ s.t. }P_{\sigma}{\bf y}=\widehat{\bf y}\bigg{\}}
(𝐱,[𝔬(𝐲)])u=𝔬(v)\displaystyle({\bf x},[\mathfrak{o}({\bf y})])_{u=\mathfrak{o}(v)} ={(𝐱,𝐲^)𝔉n,m s.t. there exists permutation σ s.t. Pσ𝐲=𝐲^\displaystyle=\bigg{\{}({\bf x},\widehat{\bf y})\in\mathfrak{F}_{n,m}\text{ s.t. there exists permutation }\sigma\text{ s.t. }P_{\sigma}{\bf y}=\widehat{\bf y}
 and σ satisfies σ(u)=𝔬(v)}.\displaystyle\hskip 56.9055pt\text{ and $\sigma$ satisfies }\sigma(u)=\mathfrak{o}(v)\bigg{\}}.

For a given FΘ(n,m)F^{(n,m)}_{\Theta} satisfying Assumption 26, we will define the Bayes optimal NO-VN scheme, Ξ\Xi^{*}, element-wise on vertex feature matrices with no row repetitions (similar to Section 3.2), and then lift the scheme to richly-featured graphs whose vertex features are not in 𝔉n,m\mathfrak{F}_{n,m}. For (𝐱,𝐰,𝐲,𝐳)({\bf x},{\bf w},{\bf y},{\bf z}) with 𝐱{\bf x} and 𝐲{\bf y} having distinct rows, let g1g_{1} and g2g_{2} be the unique graphs with edge structure compatible with 𝐰{\bf w} and 𝐳{\bf z} respectively. Writing 𝐠1=(g1,𝐱,𝐰){\bf g}_{1}=(g_{1},{\bf x},{\bf w}) and 𝐠2=(g2,𝐲,𝐳){\bf g}_{2}=(g_{2},{\bf y},{\bf z}), we define (writing ()\mathbb{P}(\cdot) for FΘ(n,m)()\mathbb{P}_{F^{(n,m)}_{\Theta}}(\cdot) to ease notation)

Ξ(𝐠1,𝔬(𝐠2),V)[1]\displaystyle\Xi^{*}({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[1] argmaxuH((𝐱,[𝔬(𝐲)])u𝔬(V)|(𝐱,[𝔬(𝐲)]))\displaystyle\in\operatorname*{arg\,max}_{\begin{subarray}{c}u\in H\end{subarray}}\,\,\mathbb{P}\bigg{(}({\bf x},[\mathfrak{o}({\bf y})])_{u\in\mathfrak{o}(V^{*})}\,\,\big{|}\,\,({\bf x},[\mathfrak{o}({\bf y})])\bigg{)}
Ξ(𝐠1,𝔬(𝐠2),V)[2]\displaystyle\Xi^{*}({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[2] argmaxuH{Ξ[1]}((𝐱,[𝔬(𝐲)])u𝔬(V)|(𝐱,[𝔬(𝐲)]))\displaystyle\in\operatorname*{arg\,max}_{\begin{subarray}{c}u\in H\setminus\{\Xi^{*}[1]\}\end{subarray}}\mathbb{P}\bigg{(}({\bf x},[\mathfrak{o}({\bf y})])_{u\in\mathfrak{o}(V^{*})}\,\,\big{|}\,\,({\bf x},[\mathfrak{o}({\bf y})])\bigg{)}
\displaystyle\vdots
Ξ(𝐠1,𝔬(𝐠2),V)[m]\displaystyle\Xi^{*}({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[m] argmaxuH{j<m{Ξ[j]}((𝐱,[𝔬(𝐲)])u𝔬(V)|(𝐱,[𝔬(𝐲)])),\displaystyle\in\operatorname*{arg\,max}_{\begin{subarray}{c}u\in H\setminus\{\cup_{j<m}\{\Xi^{*}[j]\}\end{subarray}}\mathbb{P}\bigg{(}({\bf x},[\mathfrak{o}({\bf y})])_{u\in\mathfrak{o}(V^{*})}\,\,\big{|}\,\,({\bf x},[\mathfrak{o}({\bf y})])\bigg{)},

where we write (𝐱,[𝔬(𝐲)])({\bf x},[\mathfrak{o}({\bf y})]) in the conditioning statement as shorthand for (writing 𝐠1=(g1,𝐱,𝐰){\bf g}_{1}=(g_{1},{\bf x},{\bf w}) and 𝐠2=(g2,𝐲,𝐳){\bf g}_{2}=(g_{2},{\bf y},{\bf z}))

(𝐆1,𝔬(𝐆2)){((g1,𝐱,𝐰),𝔬((g2,𝐲,𝐳))) s.t. (𝐱,𝐲)(𝐱,[𝔬(𝐲)])}.\left({\bf G}_{1},\mathfrak{o}({\bf G}_{2})\right)\in\left\{((g_{1},{\bf x},{\bf w}),\mathfrak{o}((g_{2},{\bf y},{\bf z})))\text{ s.t. }({\bf x},{\bf y})\in({\bf x},[\mathfrak{o}({\bf y})])\right\}.

Note that once again, ties in the maximizations when constructing Ξ\Xi^{*} are assumed to be broken in an arbitrary but nonrandom manner. For each element

(𝐠1,𝐠2)(𝐠1,[𝔬(𝐠2)]),({\bf g}_{1}^{\prime},{\bf g}_{2}^{\prime})\in({\bf g}_{1},[\mathfrak{o}({\bf g}_{2})]),

choose the f-isomorphism σ\sigma such that 𝔬(𝐠2)=σ(𝔬(𝐠2))\mathfrak{o}({\bf g}_{2}^{\prime})=\sigma(\mathfrak{o}({\bf g}_{2})), and define

Ξ(𝐠1,𝔬(𝐠2),V)=σ(Ξ(𝐠1,𝔬(𝐠2),V)).{\Xi^{*}}({\bf g}_{1}^{\prime},\mathfrak{o}({\bf g}_{2}^{\prime}),V^{*})=\sigma({\Xi^{*}}({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})).

For elements (𝐱,𝐲)𝔉n,m({\bf x},{\bf y})\notin\mathfrak{F}_{n,m} and arbitrary edge features 𝐰,𝐳{\bf w},{\bf z}, any fixed and arbitrary definition of Ξ\Xi^{*} on (well-defined) graphs in 𝒢n×{𝐱}×{𝐰}×𝒢m×{𝐲}×{𝐳}\mathcal{G}_{n}\times\{{\bf x}\}\times\{{\bf w}\}\times\mathcal{G}_{m}\times\{{\bf y}\}\times\{{\bf z}\} suffices, subject to the internal consistency criterion in Equation (5), as this set has measure 0 under FΘ(n,m)F^{(n,m)}_{\Theta} under Assumption 26.

5.2 The benefit of network topology

Once again, for distributions satisfying Assumption 26, our aim is to understand when Lk(Φ,V)<Lk(Ξ,V)L_{k}(\Phi^{*},V^{*})<L_{k}(\Xi^{*},V^{*}). That is, when does incorporating the network topology in addition to the vertex-level features strictly improve VN performance? Theorem 27 characterizes these conditions. The proof is completely analogous to the proof of Theorem 24, and is included in Appendix B.3 for completeness.

Theorem 27.

Let FΘ(n,m)(n,m)F^{(n,m)}_{\Theta}\in\mathcal{F}^{(n,m)} be a richly-featured nominatable distribution satisfying Assumption 26. Let VV1V2V^{*}\subseteq V_{1}\cap V_{2} be a given set of vertices of interest and let HH be an obfuscating set of V1V_{1} and V2V_{2} of order |V2|=m|V_{2}|=m with 𝔬𝔒H\mathfrak{o}\in\mathfrak{O}_{H}. Let Φ\Phi^{*} and Ξ\Xi^{*} be Bayes optimal FA-VN and NO-VN schemes, respectively, under FΘ(n,m)F^{(n,m)}_{\Theta}. Then Lk(Φ,V)=Lk(Ξ,V)L_{k}(\Phi^{*},V^{*})=L_{k}(\Xi^{*},V^{*}) if and only if there exists a Bayes optimal FA-VN scheme Φ\Phi^{*} with

𝕀(XΦk;(𝐗,𝐘))=(XΦk).\mathbb{I}(X_{\Phi^{*}}^{k};({\bf X},{\bf Y}))=\mathbb{H}(X_{\Phi^{*}}^{k}).

Note that, since 𝕀(XΦk;(𝐗,𝐘))(XΦk),\mathbb{I}(X_{\Phi^{*}}^{k};({\bf X},{\bf Y}))\leq\mathbb{H}(X_{\Phi^{*}}^{k}), Theorem 27 can be restated as Lk(Φ,V)<Lk(Ξ,V)L_{k}(\Phi^{*},V^{*})<L_{k}(\Xi^{*},V^{*}) (i.e., incorporating network structure improves performance) if and only if all Bayes optimal FA-VN schemes Φ\Phi^{*} satisfy 𝕀(XΦk;(𝐗,𝐘))<(XΦk)\mathbb{I}(X_{\Phi^{*}}^{k};({\bf X},{\bf Y}))<\mathbb{H}(X_{\Phi^{*}}^{k}). Said yet another way, incorporating network structure improves VN performance if and only if there is excess uncertainty in XΦkX_{\Phi^{*}}^{k} conditional on the features (𝐗,𝐘)({\bf X},{\bf Y}). This is precisely when the network structure is informative— the FA-VN scheme Φ\Phi^{*} incorporates both network and feature information into its ranking, while the NO-VN scheme incorporates only the feature information carried by (𝐗,𝐘)({\bf X},{\bf Y}).

6 Simulations and Experiments

We turn now to a brief experimental exploration of the VN problem as applied to both simulated and real data. We consider a VN scheme based on spectral clustering, which we denote VNGMMASE\text{VN}\circ\text{GMM}\circ\text{ASE}. We refer the reader to [1] for details and a further exploration of this scheme in an adversarial version of vertex nomination without node or edge features.

In our experiments, edge features will appear as edge weights or edge directions, while vertex features will take the form of feature matrices 𝐱{\bf x} and 𝐲{\bf y}, following the notation of previous sections. The scheme VNGMMASE\text{VN}\circ\text{GMM}\circ\text{ASE} proceeds as follows (a code sketch assembling the steps appears after the list). Note that we have assumed n=mn=m for simplicity, but the procedure can be extended to pairs of differently-sized networks in a straightforward manner.

  • i.

    Pass the edge weights to ranks, and augment the diagonal of the adjacency matrix by setting Ai,i=jiAi,j/(n1)A_{i,i}=\sum_{j\neq i}A_{i,j}/(n-1) [58]; see Appendix A.1 for detail.

  • ii.

    Embed the two networks into a common Euclidean space, d\mathbb{R}^{d} using Adjacency Spectral Embedding [54]; see Appendix A.2 for details.

    The embedding dimension dd is chosen by estimating the elbow in the scree plots of the adjacency matrices of the networks G1G_{1} and G2G_{2} [66], taking dd to be the larger of the two elbows.

    Applying ASE to an nn-vertex graph results in a mapping of the nn vertices in the graph to points in d\mathbb{R}^{d}. We denote the embeddings of graphs G1G_{1} and G2G_{2} by X^1,X^2n×d\widehat{X}_{1},\widehat{X}_{2}\in\mathbb{R}^{n\times d}, respectively, with the ii-th row of each of these matrices corresponding to the embedding of the ii-th vertex in its corresponding network.

  • iii.

    Given seed vertices SS (see Appendix A.3) whose correspondence is known a priori across networks, solve the orthogonal Procrustes problem [17] (see Appendix A.4) to align the rows of X^1[S,:]\widehat{X}_{1}[S,:] and X^2[S,:]\widehat{X}_{2}[S,:]. Apply this Procrustes rotation to the rows of X^2\widehat{X}_{2}, yielding Y^2n×d\widehat{Y}_{2}\in\mathbb{R}^{n\times d}. If pp-dimensional vertex features are available, append the vertex features to the embeddings as Z1=[X^1|𝐱]n×(d+p)Z_{1}=[\widehat{X}_{1}\,|\,{\bf x}]\in\mathbb{R}^{n\times(d+p)} and Z2=[Y^2|𝐲]n×(d+p)Z_{2}=[\widehat{Y}_{2}\,|\,{\bf y}]\in\mathbb{R}^{n\times(d+p)}.

  • iv.

    Cluster the rows of both Z1Z_{1} and Z2Z_{2} using a Gaussian mixture modeling-based clustering procedure; see, for example, the mclust package in R [15].

    For each vertex vv, let μv\mu_{v} and Σv\Sigma_{v} be the mean and covariance of the normal mixture component containing vv, and (abusing notation slightly) identify each vertex with its corresponding row of Z1Z_{1} or Z2Z_{2}. For each uV(G2)u\in V(G_{2}), compute the distance

    D(V^{*},u)=\min_{v^{*}\in V^{*}}\max\left\{\sqrt{(v^{*}-u)^{\top}\Sigma_{u}^{-1}(v^{*}-u)},\sqrt{(v^{*}-u)^{\top}\Sigma_{v^{*}}^{-1}(v^{*}-u)}\right\}.
  • v.

    Rank the unseeded vertices in G2G_{2} so that the vertex uu minimizing D(V,u)D(V^{*},u) is ranked first, with ties broken in an arbitrary but fixed manner.
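
The following is a minimal sketch, in Python, assembling steps (i)-(v) into a single routine. It is illustrative only: it assumes numpy, scipy, and scikit-learn, uses scikit-learn's GaussianMixture in place of the R mclust package referenced in step (iv), and the function names, the fixed embedding dimension, and the fixed number of mixture components are assumptions rather than the exact implementation used in our experiments.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.mixture import GaussianMixture

def pass_to_ranks(A):
    """Replace nonzero edge weights by 2*rank/(s+1) (ties averaged) and
    diagonally augment; see Appendix A.1."""
    A = np.asarray(A, dtype=float).copy()
    iu = np.triu_indices_from(A, k=1)
    w = A[iu]
    nz = w > 0
    if nz.any():
        w[nz] = 2.0 * rankdata(w[nz]) / (nz.sum() + 1)
    A[iu] = w
    A = np.triu(A, 1) + np.triu(A, 1).T
    np.fill_diagonal(A, A.sum(axis=1) / (A.shape[0] - 1))
    return A

def ase(A, d):
    """Adjacency spectral embedding into R^d (Definition 28, Appendix A.2)."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def procrustes_rotation(X, Y):
    """Orthogonal Q minimizing ||XQ - Y||_F (Appendix A.4)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def vn_gmm_ase(A1, A2, x, y, seeds, interest, d=5, n_clusters=5):
    """Steps (i)-(v): rank the non-seed vertices of G2 by proximity to the
    vertices of interest in G1.  `seeds` is a list of (v in G1, u in G2) pairs
    with known correspondence; `interest` indexes vertices of interest in G1."""
    X1 = ase(pass_to_ranks(A1), d)                      # steps (i)-(ii)
    X2 = ase(pass_to_ranks(A2), d)
    s1, s2 = map(list, zip(*seeds))
    Q = procrustes_rotation(X2[s2], X1[s1])             # step (iii): align G2 to G1
    Y2 = X2 @ Q
    Z1 = np.hstack([X1, x]) if x is not None else X1    # append vertex features
    Z2 = np.hstack([Y2, y]) if y is not None else Y2
    Z = np.vstack([Z1, Z2])
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(Z)
    labels = gmm.predict(Z)                             # step (iv): GMM clustering
    covs = gmm.covariances_
    n1 = Z1.shape[0]

    def mdist(zv, cov_v, zu, cov_u):
        diff = zv - zu
        return max(np.sqrt(diff @ np.linalg.solve(cov_u, diff)),
                   np.sqrt(diff @ np.linalg.solve(cov_v, diff)))

    scores = {}
    for u in range(Z2.shape[0]):                        # step (v): rank non-seeds
        if u in s2:
            continue
        scores[u] = min(mdist(Z1[v], covs[labels[v]], Z2[u], covs[labels[n1 + u]])
                        for v in interest)
    return sorted(scores, key=scores.get)               # smallest distance first
```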

Below, we apply this VN scheme in an illustrative simulation and in two real data network settings derived from neuroscience and text-mining applications.

6.1 Synthetic data

To further explore the complementary roles of network structure and features in vertex nomination, we consider the following simulation, set in the context of the stochastic blockmodel [18], as described in Definition 15. We consider G1SBM(5,b,Λ1)G_{1}\sim\text{SBM}(5,b,\Lambda_{1}) independent of G2SBM(5,b,Λ2),G_{2}\sim\text{SBM}(5,b,\Lambda_{2}), with V(Gi)={1,2,,250}V(G_{i})=\{1,2,\ldots,250\}, block membership function b(v)=v/50b(v)=\lceil v/50\rceil (so that each of the five blocks contains 50 consecutively labeled vertices),

Λ1=diag(ϵ+0.05,ϵ,ϵ,ϵ,ϵ)+0.3J5,\Lambda_{1}=\operatorname{diag}(\epsilon+0.05,\epsilon,\epsilon,\epsilon,\epsilon)+0.3*J_{5},

and Λ2=0.8Λ1+0.2J5\Lambda_{2}=0.8*\Lambda_{1}+0.2*J_{5}, where JpJ_{p} denotes the pp-by-pp matrix of all ones. We designate block 1 as the anomalous block, containing the vertices of interest across the two networks, with the signal in the anomalous block 1 dampened in G2G_{2} compared to G1G_{1} owing to the convex combination of Λ1\Lambda_{1} and the “flat” matrix J5J_{5}. We will consider the vertices of interest to be all vVv\in V such that b(v)=1b(v)=1. We select 10 vertices at random from block 1 in G1G_{1} and from block 1 in G2G_{2} to serve as “seeded” vertices, meaning vertices whose correspondences are known ahead of time.

We consider vertex features 𝐱,𝐲250×5{\bf x},{\bf y}\in\mathbb{R}^{250\times 5} of the form (letting IdI_{d} denote the dd-by-dd identity matrix)

𝐱(v){Normal(δ1,I5) if b(v)=1Normal(0,I5) if b(v)1𝐲(v){Normal(δ1,I5) if b(v)=1Normal(0,I5) if b(v)1{\bf x}(v)\sim\begin{cases}\operatorname{Normal}(\delta\vec{1},I_{5})\text{ if }b(v)=1\\ \operatorname{Normal}(\vec{0},I_{5})\text{ if }b(v)\neq 1\end{cases}\hskip 5.69054pt{\bf y}(v)\sim\begin{cases}\operatorname{Normal}(\delta\vec{1},I_{5})\text{ if }b(v)=1\\ \operatorname{Normal}(\vec{0},I_{5})\text{ if }b(v)\neq 1\end{cases}

independently over all vVv\in V and generating 𝐱{\bf x} and 𝐲{\bf y} independently of one another. Note that when applying our VNGMMASE\text{VN}\circ\text{GMM}\circ\text{ASE} scheme to the above data, we set the embedding dimension to the "true" value d=5d=5, with the number of clusters in step (iv) set to 5 as well. In practice, there are numerous principled heuristics to select this dimension parameter (e.g., USVT or finding an elbow in the scree plot [7, 66]) and the number of clusters (e.g., optimizing silhouette width or minimizing BIC [15]). We do not pursue these model selection problems further here.
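
For concreteness, the data-generating process just described can be sketched as follows (a minimal sketch in Python; the sample_sbm helper and the choice of random seed are illustrative assumptions, not code from our experiments).

```python
import numpy as np

def sample_sbm(block_sizes, Lambda, rng):
    """Sample a symmetric, hollow adjacency matrix from an SBM with block
    probability matrix Lambda; returns the matrix and the block labels."""
    b = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = Lambda[np.ix_(b, b)]
    A = np.triu((rng.random(P.shape) < P).astype(int), 1)
    return A + A.T, b

rng = np.random.default_rng(0)
eps, delta = 0.3, 1.0
Lambda1 = np.diag([eps + 0.05, eps, eps, eps, eps]) + 0.3 * np.ones((5, 5))
Lambda2 = 0.8 * Lambda1 + 0.2 * np.ones((5, 5))

A1, b = sample_sbm([50] * 5, Lambda1, rng)   # G1: 250 vertices in 5 blocks of 50
A2, _ = sample_sbm([50] * 5, Lambda2, rng)   # G2, drawn independently of G1

# Vertex features: block 1 (the block of interest) has mean delta*1, others mean 0.
x = rng.normal(size=(250, 5)) + delta * (b == 0)[:, None]
y = rng.normal(size=(250, 5)) + delta * (b == 0)[:, None]
```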

The effects of ϵ\epsilon and δ\delta are as follows. Larger values of ϵ\epsilon provide more separation between the blocks in the underlying SBM, making it easier to distinguish the block containing the vertices of interest from the remaining blocks. This is demonstrated in Figure 2, where we vary ϵ=0,0.1,0.2,0.3,0.5\epsilon=0,0.1,0.2,0.3,0.5 with δ=1\delta=1 held fixed. The figure shows, for different values of kk, the gain in precision at kk achieved by incorporating the graph topology as compared to a nomination scheme based on features alone. That is, defining

  • rGF(k){r_{\operatorname{GF}}}(k) to be the number of vertices of interest in G2G_{2} nominated in the top kk by VNGMMASE\text{VN}\circ\text{GMM}\circ\text{ASE} applied to (G1,𝐱,G2,𝐲)(G_{1},{\bf x},G_{2},{\bf y}), and

  • rF(k){r_{\operatorname{F}}}(k) to be the number of vertices of interest in G2G_{2} nominated in the top kk by VNGMMASE\text{VN}\circ\text{GMM}\circ\text{ASE} applied to (𝐱,𝐲)({\bf x},{\bf y}), that is, step (iv) of the algorithm above applied only to the vertex features,

we plot the difference rGF(k)rF(k){r_{\operatorname{GF}}}(k)-{r_{\operatorname{F}}}(k). More generally, letting Φ\Phi be any of the nomination schemes under consideration (e.g., the features-only scheme) and letting V(G2)V^{*}(G_{2}) denote the vertices of interest in G2G_{2}, we evaluate performance according to the number of vertices of interest ranked in the top kk,

r(k)=\left|\{u\in V^{*}(G_{2})\text{ s.t. }\operatorname{rank}_{\Phi}(u)\leq k\}\right|,~{}~{}~{}k=1,2,\dots,40.

We note that we do not consider seeded vertices in our ranked list, so the maximum value achievable by either rGF{r_{\operatorname{GF}}} or rF{r_{\operatorname{F}}} is 40.
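
In code, the quantities rGF(k) and rF(k) are simply counts over the top of a ranked nomination list; a minimal sketch follows (the function name is an illustrative assumption).

```python
def precision_at_k(nomination_list, interesting, k):
    """Number of vertices of interest among the top-k entries of a ranked
    nomination list (the quantity r(k) above)."""
    return sum(1 for u in nomination_list[:k] if u in interesting)

# e.g., r_GF(k) - r_F(k) compares nominations built from graphs plus features
# against nominations built from features alone, for k = 1, 10, 20, 30, 40.
```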

Figure 2 plots rGF(k)rF(k){r_{\operatorname{GF}}}(k)-{r_{\operatorname{F}}}(k) for k{1,10,20,30,40}k\in\{1,10,20,30,40\}. Results are averaged over 100 Monte Carlo replicates of the experiment, with error bars indicating two standard errors of the mean. Examining the figure, we see the expected phenomenon: as ϵ\epsilon increases, the gain in VN precision from incorporating the network increases. For small values of ϵ\epsilon, the graphs are detrimental to performance when compared to using features alone, since the structures of Λ1\Lambda_{1} and Λ2\Lambda_{2} are such that it is difficult to distinguish the communities from one another (and to distinguish the interesting community from the rest of the network). As ϵ\epsilon increases, the community structure in networks G1G_{1} and G2G_{2} becomes easier to detect, and incorporating network structure into the VN procedure becomes beneficial to performance as compared to a procedure using only vertex features.

Figure 2: Improvement in vertex nomination performance under the stochastic block model specified above, as a function of ϵ=0,0.1,0.2,0.3,0.5\epsilon=0,0.1,0.2,0.3,0.5 for fixed δ=1\delta=1, based on 1010 randomly chosen seeded vertices. The plot shows rGF(k)rF(k){r_{\operatorname{GF}}}(k)-{r_{\operatorname{F}}}(k), as defined in Section 6.1, for k{1,10,20,30,40}k\in\{1,10,20,30,40\}. Results are averaged over 100 Monte Carlo trials, with error bars indicating two standard errors of the mean.

While ϵ\epsilon controls the strength of the signal present in the network, δ\delta controls the signal present in the features, with larger values of δ\delta allowing stronger delineation of the block of interest from the rest of the graph based on features alone. To demonstrate this, we consider the same experiment as that summarized in Figure 2, but this time fixing ϵ=0.25\epsilon=0.25 and varying δ=0,0.5,1,1.5,2\delta=0,0.5,1,1.5,2. The results are summarized in Figure 3, where we plot rGF(k)rG(k){r_{\operatorname{GF}}}(k)-{r_{\operatorname{G}}}(k) over k(1,10,20,30,40)k\in(1,10,20,30,40) where rG(k){r_{\operatorname{G}}}(k) is the number of vertices of interest in G2G_{2} nominated in the top kk by VNGMMASE\text{VN}\circ\text{GMM}\circ\text{ASE} applied to (G1,G2)(G_{1},G_{2}) (i.e., ignoring vertex features). As with Figure 2, we see that as δ\delta increases, the gain in VN performance from incorporating vertex features increases. For small values of δ\delta, features are slightly detrimental to performance, again owing to the fact that there is insufficient signal present in them to differentiate the vertices of interest from the rest of the network.

In each of Figures 2 and 3, using one of the two available data modalities (networks or features) gives performance that, while significantly better than chance, is suboptimal. These experiments suggest that combining informative network structure with informative features should yield better VN performance than utilizing either source in isolation.

Figure 3: Improvement in vertex nomination performance under the stochastic block model specified above, as a function of δ=0,0.5,1,1.5,2\delta=0,0.5,1,1.5,2 for fixed ϵ=0.25\epsilon=0.25, based on 1010 randomly chosen seeded vertices. The plot shows rGF(k)rG(k){r_{\operatorname{GF}}}(k)-{r_{\operatorname{G}}}(k), as defined in Section 6.1, for k{1,10,20,30,40}k\in\{1,10,20,30,40\}. Results are averaged over 100 Monte Carlo trials, with error bars indicating two standard errors of the mean.

6.2 C. Elegans

We next consider a real data example derived from the C. elegans connectome, as presented in [61, 59]. In this data, vertices correspond to neurons of C. elegans, with edges encoding which pairs of neurons form synapses. The data capture the connectivity among the 302 labeled neurons in the hermaphroditic C. elegans brain for two different synapse types, electrical gap junctions and chemical synapses. These two synaptic types yield two distinct connectomes (i.e., brain networks) capturing the two different kinds of interactions between neurons. After preprocessing the data, including removing neurons that are isolates in either connectome, symmetrizing the directed chemical connectome, and removing self-loops (see [8] for details), we obtain two weighted networks on 253253 shared vertices: GcG_{c}, capturing the chemical synapses, and GeG_{e}, capturing the electrical gap junction synapses. The graphs are further endowed with vertex labels (i.e., vertex features), which assign each vertex (i.e., neuron) to one of three neuronal types: sensory, motor, or inter-neurons.

Figure 4: Improvement in vertex nomination performance when using both network structure and neuronal type features, compared to (a) using network structure only and (b) using neuronal type features only. Performance was measured according to the number of vertices of interest whose corresponding match was ranked in the top kk (i.e., |\{v\in V(G_{c})\text{ s.t. }\operatorname{rank}_{\Phi}(v^{\prime})\leq k\}|) as a function of kk. Each grey line corresponds to a single trial, and shows the improvement of this performance measure when using both network structure and vertex features as compared to the performance of its feature-oblivious (left) or network-oblivious (right) counterpart. We have highlighted in black a single "good" trial in each subplot.

Each of the 253 neurons in GcG_{c} has a known true corresponding neuron in GeG_{e}. Thus, there is a sensible ground truth in a vertex nomination problem across GcG_{c} and GeG_{e}, in the sense that each vertex in GcG_{c} has one and only one corresponding vertex in GeG_{e}. As such, this data provides a natural setting for evaluating vertex nomination performance. We thus consider the following experiment: a vertex vv in GcG_{c} is chosen uniformly at random and designated as the vertex of interest. An additional 20 vertices are sampled to serve as seeded vertices for the Procrustes alignment step, and the VNGMMASE\text{VN}\circ\text{GMM}\circ\text{ASE} nomination scheme is applied as outlined previously. Performance was measured by computing the number of vertices of interest whose corresponding match was ranked in the top kk, according to

r(k)=\left|\{v\in V(G_{c})\text{ s.t. }\operatorname{rank}_{\Phi}(v^{\prime})\leq k\}\right|,~{}~{}~{}k=1,2,\dots,253,

where vv^{\prime} denotes the vertex in GeG_{e} corresponding to vV(Gc)v\in V(G_{c}). We denote by rG{r_{\operatorname{G}}}, rF{r_{\operatorname{F}}}, and rGF{r_{\operatorname{GF}}} the performance of VN applied to, respectively, the network only, the features only, and both the network and features jointly. Figure 4 summarizes the result of 100 independent Monte Carlo trials of this experiment. Each curve in the figure corresponds to one trial. In each trial, we compared the performance of VN based on both network structure and vertex features against using either only network structure (i.e., feature-oblivious VN) or only vertex features (i.e., network-oblivious VN). The left panel of Figure 4 shows VN performance based on both network structure and neuronal type features, which we append onto X^c\widehat{X}_{c} and Y^e\widehat{Y}_{e} in step (iii) above, minus performance of the scheme using the graph alone (i.e., rGFrG{r_{\operatorname{GF}}}-{r_{\operatorname{G}}}). Similarly, in the right panel of Figure 4, we consider VN performance based on the graph with the neuronal type features minus performance of the scheme in the setting with only neuronal features (i.e., rGFrF{r_{\operatorname{GF}}}-{r_{\operatorname{F}}}). Within each plot, we have selected one line (i.e., one trial) to highlight, corresponding to a trial with a comparatively "good" seed set. Note that the same trial (and thus the same seed set) is highlighted in both panels. Performance is also summarized in Table 2.

k=1k=1 k=5k=5 k=10k=10 k=15k=15 k=20k=20 k=25k=25 k=30k=30 k=50k=50
rGFrF{r_{\operatorname{GF}}}-{r_{\operatorname{F}}} 1.73 4.76 7.60 7.83 8.12 7.25 6.14 -1.29
rGFrG{r_{\operatorname{GF}}}-{r_{\operatorname{G}}} 1.53 7.94 15.55 22.31 28.30 34.42 41.35 73.43
Table 2: Mean values of rGFrG{r_{\operatorname{GF}}}-{r_{\operatorname{G}}} and rGFrF{r_{\operatorname{GF}}}-{r_{\operatorname{F}}} over the range of values of kk considered in the experiment. Note that the mean of rGFrF{r_{\operatorname{GF}}}-{r_{\operatorname{F}}} is less than 0 for larger kk.

Using only the neuronal features for vertex nomination amounts to considering a coarse clustering of the neurons into the three neuronal types. As such, recovering correspondences across the two networks based only on this feature information is effectively at chance, conditioned on the neuronal type. When rGFrF{r_{\operatorname{GF}}}-{r_{\operatorname{F}}} is approximately 0, the graph is effectively providing only enough information to coarsely cluster the vertices into their neuronal types. Examining Figure 4 and Table 2, it is clear that incorporating features adds significant signal compared to only considering network structure. Indeed, rGFrG{r_{\operatorname{GF}}}-{r_{\operatorname{G}}} is uniformly positive.

Interestingly, the right-hand panel here suggests that adding the network topology improves performance compared to a scheme that only uses features. Of course, in general we expect that network structure should add significant signal to the features, but this observation is surprising in the present setting. In the present data set, it is known that the network topology differs dramatically across the two different synapse types. For example, GcG_{c} has more than three times the edges of GeG_{e}. As a result, it is notoriously difficult to discover the vertex correspondence across this pair of networks using only topology. Indeed, state-of-the-art network alignment algorithms only recover approximately 5%5\% of the correspondences correctly even using 5050 a priori known seeded vertices [41]. It is thus not immediate that there is sufficient signal in the networks to identify individual neurons across networks beyond their vertex type. While the features add significant signal to the network, the graph also adds signal to the features. For small kk, which is typically the regime of greatest interest in vertex nomination problems, rGFrF{r_{\operatorname{GF}}}-{r_{\operatorname{F}}} is positive on average, and for well-chosen seed sets (see the black lines in the figure), this difference can be dramatic.

6.3 Wikipedia data

As another illustration of vertex nomination with features on real data, we consider a pair of networks derived from Wikipedia articles in English and French. We begin with a network GENG_{\mathrm{EN}} whose vertices correspond to English language Wikipedia articles and whose edges join pairs of articles reachable one from another via a hyperlink. The English language network corresponds to the n=1382n=1382 articles within two hops of the article titled “Algebraic Geometry”, with the articles grouped into 6 “types” according to their broad topics; see [31] for a detailed description. We then consider a paired network GFRG_{\mathrm{FR}} of n=1382n=1382 vertices corresponding to French language Wikipedia articles on the same topics, with correspondence across these networks encoding whether or not one article’s title is an exact or approximate translation of the other. The hyperlink structure among the articles within each language yields a natural network structure, and the semantic content of the pages, as encoded via a bag-of-words representation, provides a natural choice of vertex features. As in [51], we consider capturing both the network and semantic feature information within each network via dissimilarity matrices. We use shortest path dissimilarity in the hyperlink graph and cosine dissimilarity between extracted text features. This procedure yields four dissimilarity matrices, two for each of English and French Wikipedia. We then embed the pages according to these dissimilarity measures using canonical multidimensional scaling; see [6] for detail. This yields four embeddings, corresponding to each pairing of language (English or French) and structure (network or semantic features).

In order to disentangle the information contained in the network structure and the features, we consider the following experiment. Using s=10s=10 randomly chosen "seeded" vertices across the networks (recall that seeded vertices are those whose correspondence is known a priori across networks), we align the embeddings using orthogonal Procrustes alignment [17]. We then cluster the combined point clouds as in step (iv) of Section 6 and nominate across the datasets using step (v) of Section 6. We use K=6K=6 clusters in mclust to reflect the six different broad article types in the Wikipedia data. We next use the JOFC algorithm of [44, 30] to jointly embed the English dissimilarities and (separately) jointly embed the French dissimilarities. Within each language, we then average across the embedded dissimilarities, and repeat the above procedure: Procrustes alignment using s=10s=10 randomly chosen seeded vertices across the networks, followed by clustering and nomination according to steps (iv) and (v) outlined at the beginning of Section 6. This procedure was repeated with 2525 Monte Carlo replicates. We note here that while the embedding procedure differs from adjacency spectral embedding, the core nomination strategy post-embedding is unchanged from that presented at the start of Section 6.

As in the C. elegans example, we consider each of the 13821382 articles as vertices of interest separately. For each vertex vv in GENG_{\mathrm{EN}}, we consider the rank of its true corresponding match vv^{\prime} in GFRG_{\mathrm{FR}}, and record how many vertices have their true corresponding matches ranked in the top kk, for varying values of kk. Figure 5 summarizes how including both network structure and vertex features improves performance, as measured by the fraction of vertices whose true matches are ranked in the top kk. In the left (respectively, right) panel of Figure 5, we plot the performance when nominating across the jointly embedded graphs as compared with the performance when nominating across only the embedded graph dissimilarities (respectively, text feature dissimilarities). Each light-colored curve in the figure corresponds to one of the 25 Monte Carlo replicates, with the dark curve representing the average across all 25 replicates. In both settings, we see the overall positive effect of using both network and features when nominating across languages. However, from the figure it is clear that there is an asymmetry in information content across network and features, in that the network topology contributes markedly less to the total performance gain than the textual feature information. Moreover, while the inclusion of text features is nearly uniformly helpful as compared with the network alone (left panel), a poor choice of seed vertices may result in a situation wherein network information actually impedes performance compared to using text-derived features alone. This can be seen in the light blue curves below zero in the right-hand panel of Figure 5.

Figure 5: Improvement in vertex nomination performance when using both network structure and semantic text features in the Wikipedia network example using (left) only network structure and (right) using semantic text features only. Performance was measured according to the number of vertices of interest whose corresponding match was ranked in the top kk (i.e., |{vV(GFR) s.t. rankΦ(v)k}||\{v\in V^{*}(G_{\mathrm{FR}})\text{ s.t. }\operatorname{rank}_{\Phi}(v^{\prime})\leq k\}|) versus kk. Each light-colored line corresponds to a single Monte Carlo replicate, with the dark curves representing the average across all replicates.

7 Discussion

It is intuitively clear that informative features and network topology will together yield better performance in most network inference tasks compared to using either mode in isolation. Indeed, in the context of vertex nomination, this has been established empirically across a host of application areas [10, 32]. However, examples abound where the underlying network does not offer additional information for subsequent information retrieval, and may even be detrimental; see, for example, [45]. In this paper, we have provided the first (to our knowledge) theoretical exploration of the dual roles of network topology and features in vertex nomination, and we provide necessary and sufficient conditions under which VN performance can be improved by incorporating both network structure and features. Along the way, we have formulated a framework for vertex nomination in richly-featured networks, and derived the analogue of Bayes optimality in this framework. We view this work as constituting an initial step towards a more comprehensive understanding of the benefits of incorporating features into network data and complementing classical data with network structure. A core goal of future work is to extend the framework presented here to incorporate continuous features; establish theoretical results supporting our empirical findings of the utility of features and network in the VNGMMASE\text{VN}\circ\text{GMM}\circ\text{ASE} algorithm; understand the role of missing or noisily observed features; and develop a framework for adversarial attack analysis in this richly-featured setting akin to that in [1].

Acknowledgments This material is based on research sponsored by the Air Force Research Laboratory and DARPA under agreement numbers FA8750-18-2-0035 and FA8750-20-2-1001, and by NSF grants DMS-1646108 and DMS-2052918. This work is also supported in part by the D3M program of the Defense Advanced Research Projects Agency. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory and DARPA or the U.S. Government. The authors also gratefully acknowledge the support of NIH grant BRAIN U01-NS108637. KL acknowledges the support of the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.

Appendix A Algorithmic primitives

Here, we provide background information and technical details related to the algorithmic primitives involved in the VNGMMASE\text{VN}\circ\text{GMM}\circ\text{ASE} scheme described in Section 6.

A.1 Passing to ranks and diagonal augmentation

Consider a weighted adjacency matrix An×nA\in\mathbb{R}^{n\times n}, and let wsw\in\mathbb{R}^{s} be the vector of edge weights of AA. Note that we are agnostic to the dimension of ww, which will vary according to whether AA is symmetric, hollow, etc. Define rsr\in\mathbb{R}^{s} by taking rir_{i} to be the rank of wiw_{i} in the weight vector ww, with ties broken by averaging ranks. By the pass-to-ranks operation, we mean replacing the edge weights in ww with the vector 2r/(s+1)2r/(s+1); that is, the weighted edges of AA are replaced by their normalized ranks. Note that if AA is binary, the pass-to-ranks operation simply returns AA unchanged.

By diagonal augmentation we mean setting

Ai,i=jiAi,j/(n1)A_{i,i}=\sum_{j\neq i}A_{i,j}/(n-1)

for each i=1,2,,ni=1,2,\cdots,n. In experiments, we find that these preprocessing steps are essential for robust and reliable performance on real network data [58].
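
As a concrete illustration, the following minimal sketch (in Python, on a hypothetical 3-vertex weighted graph; it is not the preprocessing code used in our experiments) carries out the pass-to-ranks operation followed by diagonal augmentation.

```python
import numpy as np
from scipy.stats import rankdata

# Toy weighted, undirected, hollow adjacency matrix.
A = np.array([[0.0, 2.5, 0.0],
              [2.5, 0.0, 7.1],
              [0.0, 7.1, 0.0]])

# Pass to ranks: replace each of the s = 2 edge weights by 2*rank/(s+1).
iu = np.triu_indices(3, k=1)
w = A[iu]
nz = w > 0
w[nz] = 2 * rankdata(w[nz]) / (nz.sum() + 1)
A[iu] = w
A = np.triu(A, 1) + np.triu(A, 1).T

# Diagonal augmentation: A_ii = sum_{j != i} A_ij / (n - 1).
np.fill_diagonal(A, A.sum(axis=1) / (A.shape[0] - 1))
print(A)
```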

A.2 Adjacency Spectral Embedding

Given an undirected network with adjacency matrix An×nA\in\mathbb{R}^{n\times n}, the dd-dimensional Adjacency Spectral Embedding (ASE) of AA yields a mapping of the nn vertices in the network to points in dd-dimensional space in such a way that vertices that play similar structural roles in the network are mapped to nearby points in d\mathbb{R}^{d} [53].

Definition 28 (Adjacency spectral embedding).

Given d>0d\in\mathbb{Z}_{>0}, the adjacency spectral embedding (ASE) of AA into d\mathbb{R}^{d} is defined by X^=UASA1/2n×d\widehat{{X}}={U}_{{A}}{S}_{{A}}^{1/2}\in\mathbb{R}^{n\times d} where

|{A}|=[{U}_{{A}}|{U}^{\perp}_{{A}}][{S}_{{A}}\oplus{S}^{\perp}_{{A}}][{U}_{{A}}|{U}^{\perp}_{{A}}]^{T}

is the spectral decomposition of |A|=(ATA)1/2|{A}|=({A}^{T}{A})^{1/2}, SAd×d{S}_{{A}}\in\mathbb{R}^{d\times d} is the diagonal matrix with the dd largest eigenvalues of |A||{A}| on its diagonal and UAn×d{U}_{{A}}\in\mathbb{R}^{n\times d} has columns which are the eigenvectors corresponding to the eigenvalues of SA{S}_{{A}}. The ii-th row of X^\widehat{X} corresponds to the position in dd-dimensional Euclidean space to which the ii-th vertex is mapped.
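
A minimal sketch of Definition 28 for a symmetric input follows (in Python; the helper name is an illustrative assumption), together with a sanity check on a rank-2 matrix of the form XX^{T}, for which the embedding recovers X up to an orthogonal transformation.

```python
import numpy as np

def ase(A, d):
    """d-dimensional adjacency spectral embedding of a symmetric matrix A."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]   # d largest eigenvalues of |A|
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

# Sanity check on a rank-2 expected adjacency matrix P = X X^T.
rng = np.random.default_rng(1)
X = rng.uniform(0.2, 0.6, size=(100, 2))
P = X @ X.T
Xhat = ase(P, d=2)
print(np.allclose(Xhat @ Xhat.T, P))   # True: ASE recovers X up to rotation
```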

A.3 Seeds

In vertex nomination, vertices in the core CC are shared across the two networks, although the correspondence between CV1C\cap V_{1} and CV2C\cap V_{2} is unknown owing to the obfuscating function. In many applications, however, some of these correspondences may be known ahead of time. We refer to vertices in CC for which this correspondence is known as seeded vertices, and denote them by SCS\subseteq C. Said another way, seeded vertices are vertices in CC whose labels are not obfuscated. In this case, the obfuscating function would take the form 𝔬S:V2SH\mathfrak{o}_{S}:V_{2}\mapsto S\cup H where

𝔬S(u)={u if uShH if uV2S\mathfrak{o}_{S}(u)=\begin{cases}u&\text{ if }u\in S\\ h\in H&\text{ if }u\in V_{2}\setminus S\end{cases}

and HH is an obfuscating set of order m|S|m-|S| satisfying HVi=H\cap V_{i}=\emptyset for i=1,2i=1,2. Seeded vertices, and the information they provide, have proven to be valuable resources across both VN (e.g., [12, 28, 42]) and other network-alignment tasks (e.g., [13, 27, 35]).

A.4 Orthogonal Procrustes

The dd-dimensional adjacency spectral embedding of a network on nn vertices yields a collection of nn points in d\mathbb{R}^{d}, one point for each vertex. A natural way to compare two networks on nn vertices is to compare the point clouds produced by their adjacency spectral embeddings; see, e.g.,[56]. Approaches of this sort are especially natural in low-rank models, such as the random dot product graph [3, 48] and the stochastic block model. In such models, we can write the expectation of the adjacency matrix as 𝔼A=XXT\mathbb{E}A=XX^{T} for Xn×dX\in\mathbb{R}^{n\times d}, and the adjacency spectral embedding of AA is a natural estimate of XX, up to orthogonal rotation. That is, for some unknown orthogonal Qd×dQ\in\mathbb{R}^{d\times d}, XX and X^Q\widehat{{X}}Q are close. Non-identifiabilities of this sort are inherent to latent space network models, whereby transformations that preserve pairwise similarity of the latent positions lead to identical distributions over networks [50]. Owing to this non-identifiability, comparison of two networks via their adjacency spectral embeddings X^\widehat{{X}} and Y^\widehat{{Y}} requires accounting for this unknown rotation.

Given matrices X,Yn×dX,Y\in\mathbb{R}^{n\times d}, the orthogonal Procrustes problem seeks the orthogonal matrix Qd×dQ\in\mathbb{R}^{d\times d} that minimizes XQYF\|XQ-Y\|_{F} (where F\|\cdot\|_{F} is the Frobenius norm). The problem is solved by computing the singular value decomposition XTY=UΣVTX^{T}Y=U\Sigma V^{T}, with the optimal QQ given then by Q=UVTQ^{*}=UV^{T} [49]. We note that the orthogonal Procrustes problem is just one of a number of related alignment problems for point clouds [17].
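
A minimal sketch of the orthogonal Procrustes solution follows (in Python; the helper name is an illustrative assumption), with a sanity check that a known orthogonal transformation is recovered.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal Q minimizing ||X Q - Y||_F via the SVD X^T Y = U S V^T, Q = U V^T."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check: recover a known orthogonal transformation.
rng = np.random.default_rng(2)
Y = rng.normal(size=(50, 3))
Q_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random orthogonal matrix
X = Y @ Q_true.T                                    # so that X Q_true = Y
Q_hat = procrustes_rotation(X, Y)
print(np.allclose(X @ Q_hat, Y))                    # True
```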

Appendix B Proofs and supporting results

Below we provide proofs of our main theoretical results and supporting lemmas.

B.1 Proof of Theorem 14

Recall that 𝒮\mathcal{S} is the set of indices ii such that 𝐠1(i){\bf g}_{1}^{(i)} and 𝐠2(i){\bf g}_{2}^{(i)} are asymmetric as richly-featured networks (i.e., for j=1,2j=1,2 there are no non-trivial f-automorphisms of 𝐠j(i){\bf g}_{j}^{(i)}).

To compare the VN loss of Φ\Phi^{*} to that of an arbitrary VN scheme Φ\Phi, we will proceed as follows. Let km1k\leq m-1 be fixed. With

(𝐆𝟏,𝐆𝟐)=((G1,𝐗,𝐖),(G2,𝐘,𝐙))FΘ(n,m),({\bf G_{1}},{\bf G_{2}})=\left((G_{1},{\bf X},{\bf W}),(G_{2},{\bf Y},{\bf Z})\right)\sim F^{(n,m)}_{\Theta},

define Avj:={rankΦ(𝐆𝟏,𝔬(𝐆𝟐),V)(𝔬(v))=j}A^{j}_{v}:=\{\operatorname{rank}_{\Phi({\bf G_{1}},\mathfrak{o}({\bf G_{2}}),V^{*})}(\mathfrak{o}(v))=j\} for each j[k]j\in[k]. Then we have that

(Avj)\displaystyle\mathbb{P}(A^{j}_{v}) =i𝒮[Avj|(𝐠1(i),[𝔬(𝐠2(i))])]((𝐠1(i),[𝔬(𝐠2(i))])).\displaystyle=\sum_{i\in\mathcal{S}}\mathbb{P}\left[A^{j}_{v}\Big{|}({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}_{2}^{(i)})])\right]\mathbb{P}\left(({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}_{2}^{(i)})])\right).

Next, note that for each vVv\in V^{*} and i𝒮i\in\mathcal{S},

{(𝐠1,𝐠2)((𝐠1(i),[𝔬(𝐠2(i))]):rankΦ(𝐠1,𝔬(𝐠2),V)(𝔬(v))=j}\displaystyle\left\{({\bf g}_{1},{\bf g}_{2})\in\big{(}({\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}:\operatorname{rank}_{\Phi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})}(\mathfrak{o}(v))=j\right\}
={(𝐠1,𝐠2)((𝐠1(i),[𝔬(𝐠2(i))]):Φ(𝐠1,𝔬(𝐠2),V)[j]=𝔬(v)}\displaystyle=\left\{({\bf g}_{1},{\bf g}_{2})\in\big{(}({\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}:\Phi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[j]=\mathfrak{o}(v)\right\}
={(𝐠1,𝐠2)((𝐠1(i),[𝔬(𝐠2(i))]): f-isomorphism σ s.t. σ(𝔬(𝐠2(i)))=𝔬(𝐠2)\displaystyle=\Big{\{}({\bf g}_{1},{\bf g}_{2})\in\big{(}({\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}:\exists\text{ f-isomorphism }\sigma\text{ s.t. }\sigma(\mathfrak{o}({\bf g}_{2}^{(i)}))=\mathfrak{o}({\bf g}_{2})
 and σ(Φ(𝐠1(i),𝔬(𝐠2(i)),V)[j])=𝔬(v)}\displaystyle\hskip 42.67912pt\text{ and }\sigma\big{(}\,\Phi({\bf g}_{1}^{(i)},\mathfrak{o}({\bf g}_{2}^{(i)}),V^{*})[j]\,\big{)}=\mathfrak{o}(v)\Big{\}}
=(𝐠1(i),[𝔬(𝐠2(i))])Φ(𝐠1(i),𝔬(𝐠2(i)),V)[j]=𝔬(v).\displaystyle=\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}_{\Phi({\bf g}_{1}^{(i)},\mathfrak{o}({\bf g}_{2}^{(i)}),V^{*})[j]=\mathfrak{o}(v)}.

To ease notation in what follows, we define the following key term for the support of FΘ(n,m)F^{(n,m)}_{\Theta} satisfying Assumption 11; i.e., on all (𝐠1,𝐠2)i𝒮(𝐠1(i),[𝔬(𝐠2(i))])({\bf g}_{1},{\bf g}_{2})\in\bigcup_{i\in\mathcal{S}}\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)},

Rk(Φ,𝐠1,𝐠2,V):\displaystyle R_{k}(\Phi,{\bf g}_{1},{\bf g}_{2},V^{*}): =jkvV[Avj|(𝐠1,[𝔬(𝐠2)])]\displaystyle=\sum_{j\leq k}\sum_{v\in V^{*}}\mathbb{P}\left[A^{j}_{v}\,\big{|}\,\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}\,\right]
=jkvV[(𝐠1,[𝔬(𝐠2)])Φ(𝐠1,𝔬(𝐠2),V)[j]=𝔬(v)|(𝐠1,[𝔬(𝐠2)])]\displaystyle=\sum_{j\leq k}\sum_{v\in V^{*}}\mathbb{P}\left[\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}_{\Phi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[j]=\mathfrak{o}(v)}\,\big{|}\,\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}\right]
=jk[(𝐠1,[𝔬(𝐠2)])Φ(𝐠1,𝔬(𝐠2),V)[j]𝔬(V)|(𝐠1,[𝔬(𝐠2)])],\displaystyle=\sum_{j\leq k}\mathbb{P}\left[\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}_{\Phi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[j]\in\mathfrak{o}(V^{*})}\,\big{|}\,\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}\right],

and note that, by definition of Φ\Phi^{*} as the optimal nomination scheme, for any i𝒮i\in\mathcal{S},

Rk(Φ,𝐠1(i),𝐠2(i),V)Rk(Φ,𝐠1(i),𝐠2(i),V).R_{k}(\Phi,{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})\leq R_{k}(\Phi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*}).

Thus, for any FA-VN scheme Φ\Phi, we have

1Lk(Φ,V)\displaystyle 1-L_{k}(\Phi,V^{*}) =1kvV(rankΦ(𝐆𝟏,𝔬(𝐆𝟐),V)(𝔬(v))k)=1kjkvV(Avj)\displaystyle=\frac{1}{k}\sum_{v\in V^{*}}\mathbb{P}(\operatorname{rank}_{\Phi({\bf G_{1}},\mathfrak{o}({\bf G_{2}}),V^{*})}(\mathfrak{o}(v))\leq k)=\frac{1}{k}\sum_{j\leq k}\sum_{v\in V^{*}}\mathbb{P}(A_{v}^{j})
=1kjkvVi𝒮(Avj|(𝐠1(i),[𝔬(𝐠2(i))]))((𝐠1(i),[𝔬(𝐠2(i))]))\displaystyle=\frac{1}{k}\sum_{j\leq k}\sum_{v\in V^{*}}\sum_{i\in\mathcal{S}}\mathbb{P}\left(A^{j}_{v}\,\big{|}\,({\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})])\right)\mathbb{P}\left(({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}_{2}^{(i)})])\right)
=1ki𝒮Rk(Φ,𝐠1(i),𝐠2(i),V)((𝐠1(i),[𝔬(𝐠2(i))]))\displaystyle=\frac{1}{k}\sum_{i\in\mathcal{S}}R_{k}(\Phi,{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})\mathbb{P}\left(({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}_{2}^{(i)})])\right)
1ki𝒮Rk(Φ,𝐠1(i),𝐠2(i),V)((𝐠1(i),[𝔬(𝐠2(i))]))=1Lk(Φ,V),\displaystyle\leq\frac{1}{k}\sum_{i\in\mathcal{S}}R_{k}(\Phi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})\mathbb{P}\left(({\bf g}^{(i)}_{1},[\mathfrak{o}({\bf g}_{2}^{(i)})])\right)=1-L_{k}(\Phi^{*},V^{*}),

from which we deduce that Lk(Φ,V)Lk(Φ,V)L_{k}(\Phi^{*},V^{*})\leq L_{k}(\Phi,V^{*}), completing the proof.

B.2 Proof of Theorem 24

Suppose that 𝕀(XΦk;(G1,G2))=(XΦk)\mathbb{I}(X_{\Phi^{*}}^{k};(G_{1},G_{2}))=\mathbb{H}(X_{\Phi^{*}}^{k}), whence (XΦk|(G1,G2))=0\mathbb{H}(X_{\Phi^{*}}^{k}|(G_{1},G_{2}))=0 and thus for each (g1,g2)(g_{1},g_{2}) with ((G1,G2)=(g1,g2))>0\mathbb{P}((G_{1},G_{2})=(g_{1},g_{2}))>0 it holds for all ξ𝒯Hk\xi\in\mathcal{T}_{H}^{k} that

(XΦk=ξ|(G1,G2)=(g1,g2)){0,1}.\mathbb{P}(X_{\Phi^{*}}^{k}=\xi\,|\,(G_{1},G_{2})=(g_{1},g_{2}))\in\{0,1\}.

For each (g1,g2)(g_{1},g_{2}), let ξg1,g2\xi_{g_{1},g_{2}} denote the unique element in the support of XΦk|(G1,G2)=(g1,g2)X_{\Phi^{*}}^{k}\,|\,(G_{1},G_{2})=(g_{1},g_{2}). With this notation in hand, we define the FO-VN scheme Ψ\Psi as follows. For 𝐠1=(g1,𝐱,𝐰){\bf g}_{1}=(g_{1},{\bf x},{\bf w}) and 𝐠2=(g2,𝐲,𝐳){\bf g}_{2}=(g_{2},{\bf y},{\bf z}), take

Ψ(𝐠1,𝔬(𝐠2),V)=ξ^g1,g2,\Psi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})=\widehat{\xi}_{g_{1},g_{2}},

where ξ^g1,g2𝒯H\widehat{\xi}_{g_{1},g_{2}}\in\mathcal{T}_{H} satisfies

  • i.

    ξ^g1,g2[1:k]=ξg1,g2\widehat{\xi}_{g_{1},g_{2}}[1:k]=\xi_{g_{1},g_{2}};

  • ii.

    ξ^g1,g2[k+1:m]\widehat{\xi}_{g_{1},g_{2}}[k+1:m] is ordered lexicographically according to some predefined total ordering of HH.

Then Ψ\Psi is an FO-VN scheme by construction, and

Ψ(𝐆1,𝔬(𝐆2),V)[1:k]=Φ(𝐆1,𝔬(𝐆2),V)[1:k] almost surely,\Psi({\bf G}_{1},\mathfrak{o}({\bf G}_{2}),V^{*})[1:k]=\Phi^{*}({\bf G}_{1},\mathfrak{o}({\bf G}_{2}),V^{*})[1:k]~{}\text{ almost surely,}

from which Lk(Φ,V)=Lk(Ψ,V)Lk(Ψ,V)Lk(Φ,V)L_{k}(\Phi^{*},V^{*})=L_{k}(\Psi,V^{*})\geq L_{k}(\Psi^{*},V^{*})\geq L_{k}(\Phi^{*},V^{*}) and it follows that Lk(Φ,V)=Lk(Ψ,V)L_{k}(\Phi^{*},V^{*})=L_{k}(\Psi^{*},V^{*}), as desired.

To prove the other half of the theorem, we proceed as follows. The assumption that Lk(Φ,V)=Lk(Ψ,V)L_{k}(\Phi^{*},V^{*})=L_{k}(\Psi^{*},V^{*}) implies that (with notation as in Section B.1),

0\displaystyle 0 =Lk(Φ,V)Lk(Ψ,V)\displaystyle=L_{k}(\Phi^{*},V^{*})-L_{k}(\Psi^{*},V^{*})
=1ki𝒮[Rk(Ψ,𝐠1(i),𝐠2(i),V)Rk(Φ,𝐠1(i),𝐠2(i),V)]((𝐠1(i),[𝔬(𝐠2(i))])),\displaystyle=\frac{1}{k}\sum_{i\in\mathcal{S}}\left[R_{k}(\Psi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})-R_{k}(\Phi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})\right]\mathbb{P}\left(({\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})])\right),

and therefore, since

Rk(Ψ,𝐠1(i),𝐠2(i),V)Rk(Φ,𝐠1(i),𝐠2(i),V)R_{k}(\Psi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})\leq R_{k}(\Phi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})

for all i𝒮i\in\mathcal{S}, we conclude that

Rk(Ψ,𝐠1(i),𝐠2(i),V)=Rk(Φ,𝐠1(i),𝐠2(i),V).R_{k}(\Psi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})=R_{k}(\Phi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*}).

Therefore, there exists a tie-breaking scheme in the definition of Φ\Phi^{*} that yields

Ψ(𝐠1(i),𝔬(𝐠2(i)),V)[1:k]=Φ(𝐠1(i),𝔬(𝐠2(i)),V)[1:k]\Psi^{*}({\bf g}_{1}^{(i)},\mathfrak{o}({\bf g}_{2}^{(i)}),V^{*})[1:k]{=}\Phi^{*}({\bf g}_{1}^{(i)},\mathfrak{o}({\bf g}_{2}^{(i)}),V^{*})[1:k]

for all i𝒮i\in\mathcal{S}, and hence

Ψ(𝐆1,𝔬(𝐆2),V)[1:k]=a.s.Φ(𝐆1,𝔬(𝐆2),V)[1:k].\Psi^{*}({\bf G}_{1},\mathfrak{o}({\bf G}_{2}),V^{*})[1:k]\stackrel{{\scriptstyle a.s.}}{{=}}\Phi^{*}({\bf G}_{1},\mathfrak{o}({\bf G}_{2}),V^{*})[1:k].

We therefore have that (XΦk|(G1,G2))=(XΨk|(G1,G2))\mathbb{H}(X_{\Phi^{*}}^{k}|(G_{1},G_{2}))=\mathbb{H}(X_{\Psi^{*}}^{k}|(G_{1},G_{2})). Since Ψ\Psi^{*} is a constant given (G1,G2)(G_{1},G_{2}), we have (XΨk|(G1,G2))=0\mathbb{H}(X_{\Psi^{*}}^{k}|(G_{1},G_{2}))=0, and therefore (XΦk|(G1,G2))=0\mathbb{H}(X_{\Phi^{*}}^{k}|(G_{1},G_{2}))=0, completing the proof.
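The argument above turns on the elementary fact that ℍ(X|Y)=0, equivalently 𝕀(X;Y)=ℍ(X), holds exactly when every conditional law of X given Y is a point mass, i.e., when X is almost surely a deterministic function of Y. A small illustrative sketch (with made-up joint distributions, not tied to the VN model) computing the conditional entropy of a discrete pair makes this concrete:

```python
from math import log2

def conditional_entropy(joint):
    """H(X | Y) in bits for a joint pmf given as a dict {(x, y): p}."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    h = 0.0
    for (x, y), p in joint.items():
        if p > 0:
            h -= p * log2(p / p_y[y])   # -sum p(x,y) log2 p(x|y)
    return h

# X is a deterministic function of Y (X = Y mod 2): H(X|Y) = 0, so
# I(X;Y) = H(X) and every conditional law P(X | Y=y) is a point mass.
deterministic = {(0, 0): 0.25, (1, 1): 0.25, (0, 2): 0.25, (1, 3): 0.25}
print(conditional_entropy(deterministic))   # 0.0

# If X carries randomness not explained by Y, then H(X|Y) > 0.
noisy = {(0, 0): 0.2, (1, 0): 0.3, (0, 1): 0.3, (1, 1): 0.2}
print(conditional_entropy(noisy))           # > 0
```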

B.3 Proof of Theorem 27

We assume throughout that FF satisfies Assumption 26. Suppose 𝕀(XΦk;(𝐗,𝐘))=(XΦk)\mathbb{I}(X_{\Phi^{*}}^{k};({\bf X},{\bf Y}))=\mathbb{H}(X_{\Phi^{*}}^{k}), implying that (XΦk|(𝐗,𝐘))=0\mathbb{H}(X_{\Phi^{*}}^{k}|({\bf X},{\bf Y}))=0. For each (𝐱,𝐲)({\bf x},{\bf y}) satisfying ((𝐗,𝐘)=(𝐱,𝐲))>0\mathbb{P}(({\bf X},{\bf Y})=({\bf x},{\bf y}))>0 and each ξ𝒯Hk\xi\in\mathcal{T}_{H}^{k}, we have

(XΦk=ξ|(𝐗,𝐘)=(𝐱,𝐲)){0,1}.\mathbb{P}(X_{\Phi^{*}}^{k}=\xi\,|\,({\bf X},{\bf Y})=({\bf x},{\bf y}))\in\{0,1\}.

For each (𝐱,𝐲)({\bf x},{\bf y}), let ξ𝐱,𝐲\xi_{{\bf x},{\bf y}} be the unique element in the support of XΦk|(𝐗,𝐘)=(𝐱,𝐲)X_{\Phi^{*}}^{k}\,|\,({\bf X},{\bf Y})=({\bf x},{\bf y}). We define the NO-VN scheme Ξ\Xi as follows. For 𝐠1=(g1,𝐱,𝐰){\bf g}_{1}=(g_{1},{\bf x},{\bf w}) and 𝐠2=(g2,𝐲,𝐳){\bf g}_{2}=(g_{2},{\bf y},{\bf z}), we take

Ξ(𝐠1,𝔬(𝐠2),V)=ξ^𝐱,𝐲,\Xi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})=\widehat{\xi}_{{\bf x},{\bf y}},

where ξ^𝐱,𝐲𝒯H\widehat{\xi}_{{\bf x},{\bf y}}\in\mathcal{T}_{H} satisfies

  • i. ξ^𝐱,𝐲[1:k]=ξ𝐱,𝐲\widehat{\xi}_{{\bf x},{\bf y}}[1:k]=\xi_{{\bf x},{\bf y}};
  • ii. ξ^𝐱,𝐲[k+1:m]\widehat{\xi}_{{\bf x},{\bf y}}[k+1:m] is ordered lexicographically according to some predefined total ordering of HH.

Then Ξ\Xi is an NO-VN scheme by construction, and

Ξ(𝐆1,𝔬(𝐆2),V)[1:k]=a.s.Φ(𝐆1,𝔬(𝐆2),V)[1:k],\Xi({\bf G}_{1},\mathfrak{o}({\bf G}_{2}),V^{*})[1:k]\stackrel{{\scriptstyle a.s.}}{{=}}\Phi^{*}({\bf G}_{1},\mathfrak{o}({\bf G}_{2}),V^{*})[1:k],

from which Lk(Φ,V)=Lk(Ξ,V)Lk(Ξ,V)Lk(Φ,V)L_{k}(\Phi^{*},V^{*})=L_{k}(\Xi,V^{*})\geq L_{k}(\Xi^{*},V^{*})\geq L_{k}(\Phi^{*},V^{*}), and we conclude that Lk(Φ,V)=Lk(Ξ,V)L_{k}(\Phi^{*},V^{*})=L_{k}(\Xi^{*},V^{*}).

To prove the other half of the theorem, we note that Lk(Φ,V)=Lk(Ξ,V)L_{k}(\Phi^{*},V^{*})=L_{k}(\Xi^{*},V^{*}) implies that (with notation as in Section B.1),

0\displaystyle 0 =Lk(Φ,V)Lk(Ξ,V)\displaystyle=L_{k}(\Phi^{*},V^{*})-L_{k}(\Xi^{*},V^{*})
=1ki𝒮[Rk(Ξ,𝐠1(i),𝐠2(i),V)Rk(Φ,𝐠1(i),𝐠2(i),V)]((𝐠1(i),[𝔬(𝐠2(i))])),\displaystyle=\frac{1}{k}\sum_{i\in\mathcal{S}}\left[R_{k}(\Xi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})-R_{k}(\Phi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})\right]\mathbb{P}\left(({\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})])\right),

and therefore, since

Rk(Ξ,𝐠1(i),𝐠2(i),V)Rk(Φ,𝐠1(i),𝐠2(i),V)R_{k}(\Xi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})\leq R_{k}(\Phi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})

for all i𝒮i\in\mathcal{S}, we conclude that

Rk(Ξ,𝐠1(i),𝐠2(i),V)=Rk(Φ,𝐠1(i),𝐠2(i),V).R_{k}(\Xi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*})=R_{k}(\Phi^{*},{\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)},V^{*}).

Thus, there exists a tie-breaking scheme in the definition of Φ\Phi^{*} such that

Ξ(𝐠1(i),𝔬(𝐠2(i)),V)[1:k]=Φ(𝐠1(i),𝔬(𝐠2(i)),V)[1:k]\Xi^{*}({\bf g}_{1}^{(i)},\mathfrak{o}({\bf g}_{2}^{(i)}),V^{*})[1:k]{=}\Phi^{*}({\bf g}_{1}^{(i)},\mathfrak{o}({\bf g}_{2}^{(i)}),V^{*})[1:k]

for all i𝒮i\in\mathcal{S}, and hence

Ξ(𝐆1,𝔬(𝐆2),V)[1:k]=a.s.Φ(𝐆1,𝔬(𝐆2),V)[1:k].\Xi^{*}({\bf G}_{1},\mathfrak{o}({\bf G}_{2}),V^{*})[1:k]\stackrel{{\scriptstyle a.s.}}{{=}}\Phi^{*}({\bf G}_{1},\mathfrak{o}({\bf G}_{2}),V^{*})[1:k].

We then have that (XΦk|(𝐗,𝐘))=(XΞk|(𝐗,𝐘))\mathbb{H}(X_{\Phi^{*}}^{k}|({\bf X},{\bf Y}))=\mathbb{H}(X_{\Xi^{*}}^{k}|({\bf X},{\bf Y})). Since Ξ\Xi^{*} is a constant given (𝐗,𝐘)({\bf X},{\bf Y}), we have (XΞk|(𝐗,𝐘))=0\mathbb{H}(X_{\Xi^{*}}^{k}|({\bf X},{\bf Y}))=0, whence we conclude that (XΦk|(𝐗,𝐘))=0\mathbb{H}(X_{\Phi^{*}}^{k}|({\bf X},{\bf Y}))=0, which completes the proof.

B.4 Supporting lemmas

The following lemma follows from our assumption of asymmetry.

Lemma 29.

If (𝐠1,𝐠2)(𝐠1(i),[𝔬(𝐠2(i))])({\bf g}_{1},{\bf g}_{2})\in\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)} for i𝒮i\in\mathcal{S}, then

(𝐠1,[𝔬(𝐠2)])Φ(𝐠1,𝔬(𝐠2),V)[j]=𝔬(v)=(𝐠1(i),[𝔬(𝐠2(i))])Φ(𝐠1(i),𝔬(𝐠2(i)),V)[j]=𝔬(v).\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}_{\Phi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[j]=\mathfrak{o}(v)}=\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}_{\Phi({\bf g}_{1}^{(i)},\mathfrak{o}({\bf g}_{2}^{(i)}),V^{*})[j]=\mathfrak{o}(v)}.
Proof.

By the assumption that (𝐠1,𝐠2)(𝐠1(i),[𝔬(𝐠2(i))])({\bf g}_{1},{\bf g}_{2})\in\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}, we have that 𝐠1=𝐠1(i){\bf g}_{1}={\bf g}_{1}^{(i)}, and there exists an isomorphism τ\tau such that 𝐠2=τ(𝐠2(i)){\bf g}_{2}=\tau({\bf g}_{2}^{(i)}). From our assumption that i𝒮i\in\mathcal{S} and the consistency criteria in Definition 9,

Φ(𝐠1,𝔬(𝐠2),V)=τ(Φ(𝐠1(i),𝔬(𝐠2(i)),V)).\Phi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})=\tau(\Phi({\bf g}_{1}^{(i)},\mathfrak{o}({\bf g}_{2}^{(i)}),V^{*})).

A similar argument shows that (𝐠1,[𝔬(𝐠2)])=(𝐠1(i),[𝔬(𝐠2(i))])\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}=\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}. We then have that

(𝐠1,[𝔬(𝐠2)])Φ(𝐠1,𝔬(𝐠2),V)[j]=𝔬(v)\displaystyle\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}_{\Phi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[j]=\mathfrak{o}(v)}
={(𝐠1,𝐠2)(𝐠1,[𝔬(𝐠2)]) s.t. rankΦ(𝐠1,𝔬(𝐠2),V)(𝔬(v))=j}\displaystyle~{}~{}~{}~{}~{}~{}=\left\{({\bf g}_{1}^{\prime},{\bf g}_{2}^{\prime})\in\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}\text{ s.t. }\operatorname{rank}_{\Phi({\bf g}_{1}^{\prime},\mathfrak{o}({\bf g}_{2}^{\prime}),V^{*})}(\mathfrak{o}(v))=j\right\}
={(𝐠1,𝐠2)(𝐠1(i),[𝔬(𝐠2(i))]) s.t. rankΦ(𝐠1,𝔬(𝐠2),V)(𝔬(v))=j}\displaystyle~{}~{}~{}~{}~{}~{}=\left\{({\bf g}_{1}^{\prime},{\bf g}_{2}^{\prime})\in\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}\text{ s.t. }\operatorname{rank}_{\Phi({\bf g}_{1}^{\prime},\mathfrak{o}({\bf g}_{2}^{\prime}),V^{*})}(\mathfrak{o}(v))=j\right\}
=(𝐠1(i),[𝔬(𝐠2(i))])Φ(𝐠1(i),𝔬(𝐠2(i)),V)[j]=𝔬(v),\displaystyle~{}~{}~{}~{}~{}~{}=\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}_{\Phi({\bf g}_{1}^{(i)},\mathfrak{o}({\bf g}_{2}^{(i)}),V^{*})[j]=\mathfrak{o}(v)},

as we set out to show. ∎

Lemma 30.

Let Φ\Phi^{*} be a Bayes optimal VN scheme, and let Φ\Phi be any other VN scheme. For any (𝐠1,𝐠2)i𝒮(𝐠1(i),[𝔬(𝐠2(i))])({\bf g}_{1},{\bf g}_{2})\in\bigcup_{i\in\mathcal{S}}\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)},

Rk(Φ,𝐠1,𝐠2,V)Rk(Φ,𝐠1,𝐠2,V).\displaystyle R_{k}(\Phi,{\bf g}_{1},{\bf g}_{2},V^{*})\leq R_{k}(\Phi^{*},{\bf g}_{1},{\bf g}_{2},V^{*}).
Proof.

If there exists an i𝒮i\in\mathcal{S} such that (𝐠1,𝐠2)=(𝐠1(i),𝐠2(i))({\bf g}_{1},{\bf g}_{2})=({\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)}), the result follows from the definition of Φ\Phi^{*}. Consider then

(𝐠1,𝐠2){i𝒮(𝐠1(i),[𝔬(𝐠2(i))])}\{{(𝐠1(i),𝐠2(i))}i𝒮},({\bf g}_{1},{\bf g}_{2})\in\left\{\bigcup_{i\in\mathcal{S}}\big{(}{\bf g}_{1}^{(i)},[\mathfrak{o}({\bf g}_{2}^{(i)})]\big{)}\right\}\bigg{\backslash}\left\{\{({\bf g}_{1}^{(i)},{\bf g}_{2}^{(i)})\}_{i\in\mathcal{S}}\right\},

and let i𝒮i^{\prime}\in\mathcal{S} be such that (𝐠1,𝐠2)(𝐠1(i),[𝔬(𝐠2(i))])({\bf g}_{1},{\bf g}_{2})\in\big{(}{\bf g}_{1}^{(i^{\prime})},[\mathfrak{o}({\bf g}_{2}^{(i^{\prime})})]\big{)}. We have that

Rk(Φ,𝐠1,𝐠2,V)\displaystyle R_{k}(\Phi,{\bf g}_{1},{\bf g}_{2},V^{*}) =jk((𝐠1,[𝔬(𝐠2)])Φ(𝐠1,𝔬(𝐠2),V)[j]𝔬(V)|(𝐠1,[𝔬(𝐠2)]))\displaystyle=\sum_{j\leq k}\mathbb{P}\left(\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}_{\Phi({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[j]\in\mathfrak{o}(V^{*})}\,\big{|}\,\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}\right)
=jk((𝐠1(i),[𝔬(𝐠2(i))])Φ(𝐠1(i),𝔬(𝐠2(i)),V)[j]𝔬(V)|(𝐠1(i),[𝔬(𝐠2(i))]))\displaystyle=\sum_{j\leq k}\mathbb{P}\left(\big{(}{\bf g}_{1}^{(i^{\prime})},[\mathfrak{o}({\bf g}_{2}^{(i^{\prime})})]\big{)}_{\Phi({\bf g}_{1}^{(i^{\prime})},\mathfrak{o}({\bf g}_{2}^{(i^{\prime})}),V^{*})[j]\in\mathfrak{o}(V^{*})}\,\big{|}\,\big{(}{\bf g}_{1}^{(i^{\prime})},[\mathfrak{o}({\bf g}_{2}^{(i^{\prime})})]\big{)}\right)
jk((𝐠1(i),[𝔬(𝐠2(i))])Φ(𝐠1(i),𝔬(𝐠2(i)),V)[j]𝔬(V)|(𝐠1(i),[𝔬(𝐠2(i))]))\displaystyle\leq\sum_{j\leq k}\mathbb{P}\left(\big{(}{\bf g}_{1}^{(i^{\prime})},[\mathfrak{o}({\bf g}_{2}^{(i^{\prime})})]\big{)}_{\Phi^{*}({\bf g}_{1}^{(i^{\prime})},\mathfrak{o}({\bf g}_{2}^{(i^{\prime})}),V^{*})[j]\in\mathfrak{o}(V^{*})}\,\big{|}\,\big{(}{\bf g}_{1}^{(i^{\prime})},[\mathfrak{o}({\bf g}_{2}^{(i^{\prime})})]\big{)}\right)
=jk((𝐠1,[𝔬(𝐠2)])Φ(𝐠1,𝔬(𝐠2),V)[j]𝔬(V)|(𝐠1,[𝔬(𝐠2)]))\displaystyle=\sum_{j\leq k}\mathbb{P}\left(\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}_{\Phi^{*}({\bf g}_{1},\mathfrak{o}({\bf g}_{2}),V^{*})[j]\in\mathfrak{o}(V^{*})}\,\big{|}\,\big{(}{\bf g}_{1},\left[\mathfrak{o}({\bf g}_{2})\right]\big{)}\right)
=Rk(Φ,𝐠1,𝐠2,V),\displaystyle=R_{k}(\Phi^{*},{\bf g}_{1},{\bf g}_{2},V^{*}),

where the second and third equalities follow from Lemma 29, and the inequality follows from the optimality of Φ\Phi^{*}. ∎


References

  • [1] J. Agterberg, Y. Park, J. Larson, C. White, C. E. Priebe, and V. Lyzinski. Vertex nomination, consistent estimation, and adversarial modification. Electronic Journal of Statistics, 14(2):3230–3267, 2020.
  • [2] J. Arroyo, D. L. Sussman, C. E. Priebe, and V. Lyzinski. Maximum likelihood estimation and graph matching in errorfully observed networks. Journal of Computational and Graphical Statistics, 30(4):1111–1123, 2021.
  • [3] A. Athreya, D. E. Fishkind, K. Levin, V. Lyzinski, Y. Park, Y. Qin, D. L. Sussman, M. Tang, J. T. Vogelstein, and C. E. Priebe. Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226):1–92, 2018.
  • [4] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences of the United States of America, 106:21068–73, 2009.
  • [5] N. Binkiewicz, J. T. Vogelstein, and K. Rohe. Covariate-assisted spectral clustering. Biometrika, 104(2):361–377, 2017.
  • [6] I. Borg and P. J. F. Groenen. Modern multidimensional scaling: Theory and applications. Springer Science & Business Media, 2005.
  • [7] S. Chatterjee. Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43(1):177–214, 2014.
  • [8] L. Chen, J. T. Vogelstein, V. Lyzinski, and C. E. Priebe. A joint graph inference case study: the C. elegans chemical and electrical connectomes. Worm, 5(2):e1142041, 2016.
  • [9] G. Coppersmith. Vertex nomination. Wiley Interdisciplinary Reviews: Computational Statistics, 6(2):144–153, 2014.
  • [10] G. A. Coppersmith and C. E. Priebe. Vertex nomination via content and context. arXiv preprint arXiv:1201.4118, 2012.
  • [11] H. Crane. Probabilistic foundations of statistical network analysis. Chapman and Hall/CRC, 2018.
  • [12] D. E. Fishkind, V. Lyzinski, H. Pao, L. Chen, and C. E. Priebe. Vertex nomination schemes for membership prediction. The Annals of Applied Statistics, 9(3):1510–1532, 2015.
  • [13] D. E. Fishkind, S. Adali, H. G. Patsolic, L. Meng, D. Singh, V. Lyzinski, and C. E. Priebe. Seeded graph matching. Pattern Recognition, 87:203–215, 2019.
  • [14] B. K. Fosdick and P. D. Hoff. Testing and modeling dependencies between a network and nodal attributes. Journal of the American Statistical Association, 110(511):1047–1056, 2015.
  • [15] C. Fraley and A. E. Raftery. Mclust: Software for model-based cluster analysis. Journal of Classification, 16(2):297–306, 1999.
  • [16] C. E. Ginestet, J. Li, P. Balachandran, S. Rosenberg, and E. D. Kolaczyk. Hypothesis testing for network data in functional neuroimaging. The Annals of Applied Statistics, 11(2):725–750, 2017.
  • [17] J. C. Gower and G. B. Dijksterhuis. Procrustes Problems. Oxford University Press, 2004.
  • [18] P. W. Holland, K. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
  • [19] Z. Huang, W. Chung, and H. Chen. A graph model for e-commerce recommender systems. Journal of the American Society for Information Science and Technology, 55(3):259–274, 2004.
  • [20] B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83, 2011.
  • [21] M. Kim and J. Leskovec. Multiplicative attribute graph model of real-world networks. Internet mathematics, 8(1-2):113–160, 2012.
  • [22] E. D. Kolaczyk. Statistical analysis of network data: Methods and models. Springer-Verlag New York, 2009.
  • [23] E. D. Kolaczyk and G. Csárdi. Statistical analysis of network data with R, volume 65. Springer, 2014.
  • [24] J. Lei. A goodness-of-fit test for stochastic block models. The Annals of Statistics, 44(1):401–424, 2016.
  • [25] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
  • [26] V. Lyzinski. Information recovery in shuffled graphs via graph matching. IEEE Transactions on Information Theory, 64(5):3254–3273, 2018.
  • [27] V. Lyzinski, S. Adali, J. T. Vogelstein, Y. Park, and C. E. Priebe. Seeded graph matching via joint optimization of fidelity and commensurability. arXiv preprint arXiv:1401.3813, 2014.
  • [28] V. Lyzinski, K. Levin, D. E. Fishkind, and C. E. Priebe. On the consistency of the likelihood maximization vertex nomination scheme: Bridging the gap between maximum likelihood estimation and graph matching. Journal of Machine Learning Research, 17(179):1–34, 2016.
  • [29] V. Lyzinski, K. Levin, and C. E. Priebe. On consistent vertex nomination schemes. Journal of Machine Learning Research, 20(69):1–39, 2019.
  • [30] V. Lyzinski, Y. Park, C. E. Priebe, and M. Trosset. Fast embedding for jofc using the raw stress criterion. Journal of Computational and Graphical Statistics, 26(4):786–802, 2017.
  • [31] Z. Ma, D. J. Marchette, and C. E. Priebe. Fusion and inference from multiple data sources in a commensurate space. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(3):187–193, 2012.
  • [32] D. Marchette, C. E. Priebe, and G. Coppersmith. Vertex nomination via attributed random dot product graphs. In Proceedings of the 57th ISI World Statistics Congress, volume 6, 2011.
  • [33] F. W. Marrs, B. K. Fosdick, and T. H. McCormick. Standard errors for regression on relational data with exchangeable errors. arXiv preprint arXiv:1701.05530, 2017.
  • [34] Z. Meng, S. Liang, H. Bao, and X. Zhang. Co-embedding attributed networks. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 393–401. ACM, 2019.
  • [35] E. Mossel and J. Xu. Seeded graph matching via large neighborhood statistics. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1005–1014. SIAM, 2019.
  • [36] M. E. J. Newman. Networks. Oxford University Press, 2nd edition, 2018.
  • [37] M. E. J. Newman and A. Clauset. Structure and inference in annotated networks. Nature Communications, 7(11863), 2016.
  • [38] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2015.
  • [39] M. Nickel, L. Rosasco, and T. Poggio. Holographic embeddings of knowledge graphs. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
  • [40] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
  • [41] H. Patsolic, S. Adali, J. T. Vogelstein, Y. Park, C. E. Priebe, G. Li, and V. Lyzinski. Seeded graph matching via joint optimization of fidelity and commensurability. arXiv preprint arXiv:1401.3813, 2014.
  • [42] H. G. Patsolic, Y. Park, V. Lyzinski, and C. E. Priebe. Vertex nomination via seeded graph matching. Statistical Analysis and Data Mining: the ASA Data Science Journal, 13(3):229–244, 2020.
  • [43] J. J. Pfeiffer III, S. Moreno, T. La Fond, J. Neville, and B. Gallagher. Attributed graph models: Modeling network structure with correlated attributes. In Proceedings of the 23rd international conference on World wide web, pages 831–842. ACM, 2014.
  • [44] C. E. Priebe, D. J. Marchette, Z. Ma, and S. Adali. Manifold matching: Joint optimization of fidelity and commensurability. Brazilian Journal of Probability and Statistics, 27(3):377–400, 2013.
  • [45] P. Rastogi, V. Lyzinski, and B. Van Durme. Vertex nomination on the cold start knowledge graph. Human Language Technology Center of Excellence: Technical report, 2017.
  • [46] F. Ricci, L. Rokach, and B. Shapira. Introduction to recommender systems handbook. In Recommender systems handbook, pages 1–35. Springer, 2011.
  • [47] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Annals of Statistics, 39:1878–1915, 2011.
  • [48] P. Rubin-Delanchy, J. Cape, M. Tang, and C. E. Priebe. A statistical interpretation of spectral embedding: the generalised random dot product graph. Journal of the Royal Statistical Society: Series B, 84(4):1446–1473, 2022.
  • [49] P. H. Schönemann. A generalized solution of the orthogonal procrustes problem. Psychometrika, 31(1):1–10, 1966.
  • [50] C. R. Shalizi and D. Asta. Consistency of maximum likelihood for continuous-space network models. arXiv:1711.02123, 2017.
  • [51] C. Shen, J. T. Vogelstein, and C. E. Priebe. Manifold matching using shortest-path distance and joint neighborhood selection. Pattern Recognition Letters, 92:41–48, 2017.
  • [52] T. A. B. Snijders, J. Koskinen, and M. Schweinberger. Maximum likelihood estimation for social network dynamics. The Annals of Applied Statistics, 4(2):567, 2010.
  • [53] D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe. A consistent adjacency spectral embedding for stochastic blockmodel graphs. Journal of the American Statistical Association, 107(499):1119–1128, 2012.
  • [54] D. L. Sussman, M. Tang, and C. E. Priebe. Consistent latent position estimation and vertex classification for random dot product graphs. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 36(1):48–57, 2014.
  • [55] S. Suwan, D. S. Lee, and C. E. Priebe. Bayesian vertex nomination using content and context. Wiley Interdisciplinary Reviews: Computational Statistics, 7(6):400–416, 2015.
  • [56] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe. A semiparametric two-sample hypothesis testing problem for random graphs. Journal of Computational and Graphical Statistics, 26(2):344–354, 2017.
  • [57] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, and C. E. Priebe. A nonparametric two-sample hypothesis testing problem for random dot product graphs. Bernoulli, 23(3):1599–1630, 2017.
  • [58] R. Tang, M. Ketcha, A. Badea, E. D. Calabrese, D. S. Margulies, J. T. Vogelstein, C. E. Priebe, and D. L. Sussman. Connectome smoothing via low-rank approximations. IEEE Transactions on Medical Imaging, 38(6):1446–1456, 2018.
  • [59] L. R. Varshney, B. L. Chen, E. Paniagua, D. H. Hall, and D. B. Chklovskii. Structural properties of the Caenorhabditis elegans neuronal network. PLoS Computational Biology, 7(2):e1001066, 2011.
  • [60] J. T. Vogelstein and C. E. Priebe. Shuffled graph classification: Theory and connectome applications. Journal of Classification, 32(1):3–20, 2015.
  • [61] J. G. White, E. Southgate, J. N. Thomson, and S. Brenner. The structure of the nervous system of the nematode Caenorhabditis elegans. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 314(1165):1–340, 1986.
  • [62] J. Yang, J. McAuley, and J. Leskovec. Community detection in networks with node attributes. In 2013 IEEE 13th International Conference on Data Mining, pages 1151–1156. IEEE, 2013.
  • [63] J. Yoder, L. Chen, H. Pao, E. Bridgeford, K. Levin, D. E. Fishkind, C. E. Priebe, and V. Lyzinski. Vertex nomination: The canonical sampling and the extended spectral nomination schemes. Computational Statistics & Data Analysis, 145, 2020.
  • [64] S. Zhang and H. Tong. Final: Fast attributed network alignment. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1345–1354. ACM, 2016.
  • [65] F. Zhou and F. De la Torre. Factorized graph matching. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 127–134. IEEE, 2012.
  • [66] M. Zhu and A. Ghodsi. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2):918–930, 2006.