Journal of Machine Learning Research 20 (2019) 1-39. Submitted 1/18; Revised 12/18; Published 4/19.

On Consistent Vertex Nomination Schemes

Vince Lyzinski, vlyzinski@umass.edu, Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, MA 01003, USA

Keith Levin, klevin@umich.edu, Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA

Carey E. Priebe, cep@jhu.edu, Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218-2608, USA

Editor: Edo Airoldi

Abstract

Given a vertex of interest in a network $G_1$, the vertex nomination problem seeks to find the corresponding vertex of interest (if it exists) in a second network $G_2$. A vertex nomination scheme produces a list of the vertices in $G_2$, ranked according to how likely they are judged to be the corresponding vertex of interest in $G_2$. The vertex nomination problem and related information retrieval tasks have attracted much attention in the machine learning literature, with numerous applications to social and biological networks. However, the current framework has often been confined to a comparatively small class of network models, and the concept of statistically consistent vertex nomination schemes has been only shallowly explored. In this paper, we extend the vertex nomination problem to a very general statistical model of graphs. Further, drawing inspiration from the long-established classification framework in the pattern recognition literature, we provide definitions for the key notions of Bayes optimality and consistency in our extended vertex nomination framework, including a derivation of the Bayes optimal vertex nomination scheme. In addition, we prove that no universally consistent vertex nomination schemes exist. Illustrative examples are provided throughout.

Keywords: vertex nomination, graph inference, recommender systems

1. Introduction

Statistical inference on graphs is an important branch of modern statistics and machine learning. In recent years, there have been numerous papers in the literature developing graph analogues of statistical inference tasks such as hypothesis testing (Asta and Shalizi, 2015; Tang et al., 2017b), classification (Tang et al., 2013; Chen et al., 2016a), and clustering (Luxburg, 2007; Rohe et al., 2011; Sussman et al., 2012; Newman and Clauset, 2016). Moreover, growth in the size and complexity of network data sets has necessitated techniques for network-specific data mining tasks such as link prediction (Liben-Nowell and Kleinberg, 2007; Lü and Zhou, 2011); entity resolution and network alignment (Conte et al., 2004; Lyzinski, 2018); and vertex nomination (Coppersmith and Priebe, 2012; Coppersmith, 2014; Suwan et al., 2015; Fishkind et al., 2015; Lyzinski et al., 2016).

Figure 1: A visual representation of the classical vertex nomination framework: given a community of interest in a network (here the red community) and some examples of vertices that are/are not part of the community of interest (colored red and green, respectively), rank the remaining vertices in the network into a nomination list, with those vertices from the community of interest concentrating at the top of the nomination list.
Akin to the development of classical statistics, algorithmic advancement has, in many ways, outpaced theoretical development in these emerging graph-driven domains. This imbalance has been driven by the dizzying pace of data generation, but there is nevertheless a need for a firm theoretical context in which to frame algorithmic progress. In this paper, drawing inspiration from the long-established classification framework in the pattern recognition literature (Devroye et al., 1997), we provide a rigorous theoretical framework for understanding statistical consistency in the vertex nomination inference task.

The vertex nomination (VN) task, which can be viewed as the graph analogue of the more classical recommender system task (Ricci et al., 2011), has traditionally been stated as follows: given a community of interest in a network and some examples of vertices that are or are not part of that community, vertex nomination seeks to rank the remaining vertices in the network into a nomination list, with those vertices from the community of interest (ideally) concentrating at the top of the nomination list. See Figure 1 for a visual representation of this classical vertex nomination framework. In limited-resource settings, vertex nomination tools have proven effective for efficiently searching and querying large networks, with applications including detecting fraudsters in the Enron email network (Coppersmith and Priebe, 2012; Marchette et al., 2011; Suwan et al., 2015), uncovering web advertisements associated with human trafficking (Fishkind et al., 2015), and identifying latent structure in connectome data (Fishkind et al., 2015; Yoder et al., 2018).

While related to the community detection problem (Newman, 2006; Luxburg, 2007; Bickel and Chen, 2009; Newman and Clauset, 2016), this traditional formulation of the VN problem is a semi-supervised inference task whose output is not an assignment of vertices to communities, but rather a ranked estimate of which vertices belong to a particular community of interest. That is, in contrast to community detection, the VN problem does not aim to recover the community memberships of any vertices not in the community of interest. Clearly, any method that can recover the community memberships of all vertices in a graph can recover the interesting community, and hence any community detection algorithm can be repurposed for the VN problem just described with minor adaptation (e.g., by ranking vertices according to their probability of membership in the community of interest); see, for example, the spectral vertex nomination scheme of Fishkind et al. (2015). The performance of such an adaptation is highly dependent on the fidelity of the base clustering procedure, and is often below that of semi-supervised VN-specific analogues (see Yoder et al., 2018).

The above formulation of the VN task assumes the presence of strong community structure among the vertices of interest in the graph. In practice, this is often a reasonable assumption, particularly if it is expected that interesting vertices will behave similarly to one another in the network. However, the particular features that mark a vertex as interesting are entirely task-dependent. To paraphrase the common proverb, interestingness is in the eye of the practitioner.
Interesting vertices may be, for example, those with large network centrality (Jeong et al., 2001; Newman, 2005), those with a particular role in the network (Lusseau and Newman, 2004), or those corresponding to a given user across social networks (Patsolic et al., 2017). In these applications, interesting vertices need not correspond precisely to the community structure captured by a generative network model, and hence such cases are ill-described by the community-based VN problem described above. To accommodate this task-dependency and broader notion of interesting vertices, we consider the following generalization and extension of the previously-presented VN problem: given a vertex of interest $v^*$ in a graph $G_1 = (V_1, E_1)$, find the corresponding vertex of interest $u^*$ (if it exists) in a second graph $G_2 = (V_2, E_2)$ by ranking the vertices of $G_2$ according to our confidence that they correspond to $v^*$ in graph $G_1$; see Figure 2 for a visual representation of this VN framework.

Figure 2: A visual representation of the generalized vertex nomination framework: given a vertex of interest $v^*$ (colored red) in a graph $G_1 = (V_1, E_1)$, find the corresponding vertex of interest $u^*$ (if it exists) in a second graph $G_2 = (V_2, E_2)$, ranking the vertices of $G_2$ into a nomination list so that $u^*$ ideally appears at the top of the nomination list.

In this formulation, which is a (potentially) unsupervised inference task, what defines $v^*$ as interesting is entirely model-dependent, and different network models can highlight different characteristics of interest in the graph. Potential application domains for this VN generalization abound, including identifying users of interest across social network platforms (see, for example, Patsolic et al., 2017), identifying structural signal across connectomes (see, for example, Sussman et al., 2018), and identifying topics of interest across graphical knowledge bases (see, for example, Sun and Priebe, 2013).

In Fishkind et al. (2015) and Lyzinski et al. (2016), the notion of a consistent vertex nomination scheme (i.e., an asymptotically optimal solution to the VN problem) was proposed for the original formulation of the VN problem, in which community membership entirely determines whether or not a given vertex is interesting. This definition of consistency was based on the mean average precision (MAP) of a nomination scheme operating on a graph model with explicit community structure encoded by the Stochastic Block Model (SBM) of Holland et al. (1983). Under this restricted notion of consistency, Fishkind et al. (2015) derived the analogue of universal Bayes optimality in the VN setting, namely a scheme that achieves the optimal mean average precision for all parameterizations of the underlying SBM. While this derivation of the Bayes optimal scheme somewhat parallels the derivation of the Bayes optimal classifier in the classical pattern recognition literature, the SBM model assumption and MAP formulation greatly narrow the set of models and sets of interesting vertices we can consider. In this paper, we revamp and generalize the concepts of VN consistency and VN Bayes optimality in the two-graph VN framework. This framework is quite general, and further allows us to highlight the similarities and differences between our new VN consistency formulation and its analogue in the classification literature defined in, for example, Devroye et al. (1997). The paper is laid out as follows.
In the remainder of this section, we provide brief overviews of information retrieval as it relates to vertex nomination (Section 1.1) and the Bayes optimal classifier in the classical setting (Section 1.2), and conclude the introduction by establishing notation for the remainder of the paper (Section 1.3). In Section 2, we define the VN problem framework that is the focus of this paper, and in Section 3 we derive the VN analogue of a Bayes optimal scheme. In Section 4, we define a new notion of VN consistency, and we prove that no universally consistent VN scheme exists, providing an interesting contrast to the standard classification setting. We conclude in Section 5 with a short summary comparing and contrasting VN with classical classification and a discussion of implications and future directions.

1.1. Connections to Information Retrieval

The vertex nomination task is, in some ways, similar to the task faced by recommender systems (Resnick and Varian, 1997; Ricci et al., 2011), in which the aim is to retrieve objects (e.g., documents or images) likely to be of interest to a user based on his or her previous behavior. For example, the celebrated Page Rank algorithm (Brin and Page, 1998) recommends webpages based on random walks on the world wide web graph, in which websites are nodes and (directed) edges reflect hyperlinks between pages. The information retrieval (IR) literature includes many such graph-based approaches. We refer the reader to Ricci et al. (2011) and Mihalcea and Radev (2011) for the state of the art circa 2010, and concentrate here on recapping more recent graph-based information retrieval techniques.

Many graph-based IR techniques rely on the assumption that similar objects (i.e., documents, webpages, etc.) will lie near one another in a suitably-constructed graph. Indeed, this intuition underlies many graph-based approaches throughout machine learning and related disciplines (see, for example, Belkin and Niyogi, 2003; Zhou et al., 2004). Techniques along these lines have been applied toward many tasks in natural language processing, typically inspired by Page Rank (Rothe and Schütze, 2014). Along similar lines, Ma et al. (2012) applied a diffusion-based method (Coifman and Lafon, 2006) to the world wide web graph to yield an approach to ranking for query completion and recommendation. These information retrieval techniques can be naturally adapted to the vertex nomination problem by treating the vertex or vertices of interest as the object or objects to be retrieved.

The vertex nomination problem also bears similarities to the task of learning to rank (Duh, 2009; Liu, 2009; Li, 2011), in which the goal is to learn an ordering on a set of objects (e.g., documents, images, videos, etc.) according to (estimated) similarity or relevance to a given query object. In the learning to rank literature, graphs usually appear as training instances, with nodes corresponding to objects and edges encoding preferences or similarities among them elicited from users (e.g., an undirected weighted edge may join two documents judged to be similar). The work in Agarwal et al. (2006) is among the earliest to consider the problem of ranking objects in a network; the authors modified the Page Rank algorithm to take preference information into account, rather than working solely with the hyperlink graph. In Agarwal (2010), the authors used a data graph encoding object similarities to obtain a regularizer, similar to Belkin et al. (2006), on the empirical ranking error, with the target ranking encoded in a preference graph. More recent efforts along these lines have focused on the problem of incorporating network structure present between entities of different types, for example, between users and events in a social network (Luo et al., 2014; Pham et al., 2016). Here again, any learning to rank algorithm has a natural adaptation to the VN problem: use the first graph, in which some vertices are labeled, as training data to learn a ranking on the vertices of the second graph.
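Since Page Rank recurs throughout this discussion, a minimal sketch may help fix ideas. The following power iteration is a standard textbook implementation, not code from any of the cited works; the damping factor and tolerance are illustrative choices.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10):
    """Power iteration for Page Rank on an adjacency matrix, where
    adj[i, j] = 1 if page i links to page j. Dangling pages (no outlinks)
    are treated as linking uniformly to all pages."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    trans = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)  # row-stochastic
    rank = np.full(n, 1.0 / n)
    while True:
        new = damping * (rank @ trans) + (1.0 - damping) / n
        if np.abs(new - rank).sum() < tol:
            return new
        rank = new

# Tiny 4-page web: the ranks sum to 1, and heavily-linked pages score higher.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 0],
                [0, 0, 1, 0]])
print(pagerank(adj))
```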
1.2. Bayes Error in Classical Pattern Recognition

In this section, we review the concepts of consistency and Bayes error from the statistical classification literature. We do not aim to give an exhaustive overview of the subject, but only to provide a rough outline of the structures that we would like to replicate in the context of vertex nomination. For a more thorough treatment, we refer the interested reader to Devroye et al. (1997), whose presentation we follow below. We begin by recalling the classical definition of Bayes error. Note that we will restrict our attention to the two-class problem to maximally bring forth the similarities (and differences) between statistical classification and VN, as in VN vertices are either of interest or not.

Definition 1 Consider a set of potential observations $\mathcal{X}$ and a set of unknown class labels $\{0, 1\}$ for objects in $\mathcal{X}$. A classifier is a function $h : \mathcal{X} \to \{0, 1\}$, which aims to predict the class label of a given observation in $\mathcal{X}$. Given a distribution $F$ supported on $\mathcal{X} \times \{0, 1\}$, the error for the classifier $h$ is given by $L(h) = P(h(X) \neq Y)$, where $(X, Y) \sim F$.

Any classifier that achieves the lowest possible error is said to be a Bayes optimal classifier. We write $h^*$ for any such optimal classifier, which by definition satisfies $h^* \in \arg\min_{h : \mathcal{X} \to \{0,1\}} L(h)$. It is easily seen in this two-class framework that the Bayes optimal classifier is given by

$$h^*(x) = \begin{cases} 1 & \text{if } E(Y \mid X = x) = P(Y = 1 \mid X = x) > 1/2; \\ 0 & \text{else.} \end{cases} \tag{1}$$

Practically speaking, the Bayes optimal scheme chooses the label which maximizes the class-conditional probability of the observed data. The corresponding error, $L^* = L(h^*)$, is called the Bayes error. Of course, $h^*$ depends on the distribution $F$ of $(X, Y)$, and, when appropriate, we will make this dependence explicit by writing $L^*_F$.

In practice, a classifier is often constructed based on training data $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, where the data $(X_i, Y_i)$ are drawn i.i.d. according to $F$. This supervised classification framework is defined as follows.

Definition 2 Consider a set of potential observations $\mathcal{X}$ and a set of unknown class labels $\{0, 1\}$ for objects in $\mathcal{X}$. A (supervised) classifier is a function $h_n : \mathcal{X} \times \{\mathcal{X} \times \{0, 1\}\}^n \to \{0, 1\}$, which aims to predict the class label of a given observation in $\mathcal{X}$ based on $n$ training observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \in \mathcal{X} \times \{0, 1\}$. Given a distribution $F$ supported on $\mathcal{X} \times \{0, 1\}$, the error for the classifier $h_n$ is given by

$$L_F(h_n) = P\left(h_n\big(X, (X_i, Y_i)_{i=1}^n\big) \neq Y \,\middle|\, (X_i, Y_i)_{i=1}^n\right),$$

where $(X, Y), (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n) \stackrel{\text{i.i.d.}}{\sim} F$.

Note that $L_F(h_n)$ is a random variable in which $\{(X_i, Y_i)\}_{i=1}^n$ are drawn i.i.d. from $F$, but then held fixed as we average over $(X, Y) \sim F$. A sequence of classifiers $h = (h_n)_{n=1}^\infty$ is called a classification rule. Informally, a good classification rule is one for which the probability of error becomes arbitrarily close to Bayes optimal as $n \to \infty$.
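To make Equation (1) concrete, here is a minimal sketch of the Bayes optimal classifier for a two-class model we choose purely for illustration (a Gaussian mixture; the prior and class-conditional densities below are our own assumptions, not from the paper), along with a Monte Carlo estimate of its Bayes error.

```python
import numpy as np
from scipy.stats import norm

# Illustrative model: P(Y = 1) = PI1, with Gaussian class-conditional densities.
PI1 = 0.3
F0, F1 = norm(loc=0.0, scale=1.0), norm(loc=2.0, scale=1.0)

def posterior_eta(x):
    """eta(x) = P(Y = 1 | X = x), computed by Bayes' rule."""
    p1 = PI1 * F1.pdf(x)
    p0 = (1.0 - PI1) * F0.pdf(x)
    return p1 / (p0 + p1)

def bayes_classifier(x):
    """h*(x) = 1 iff P(Y = 1 | X = x) > 1/2, as in Equation (1)."""
    return (posterior_eta(x) > 0.5).astype(int)

# Monte Carlo estimate of the Bayes error L* = L(h*).
rng = np.random.default_rng(0)
y = rng.random(100_000) < PI1
x = np.where(y, F1.rvs(100_000, random_state=rng), F0.rvs(100_000, random_state=rng))
print("estimated Bayes error:", np.mean(bayes_classifier(x) != y))
```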
The precise nature of what we mean by "close" is codified in the concept of statistical consistency.

Definition 3 A classification rule $h = (h_n)_{n=1}^\infty$ is consistent with respect to $F$ if $E_F(L_F(h_n)) \to L^*_F$. The rule $h$ is strongly consistent if $L_F(h_n) \to L^*_F$ almost surely. A rule that is (strongly) consistent for all distributions $F$ on $\mathcal{X} \times \{0, 1\}$ is called (strongly) universally consistent.

Perhaps surprisingly, given that $F$ can have arbitrary structure on $\mathcal{X} \times \{0, 1\}$, universally consistent classification rules exist; see Stone (1977) for the first proof of this phenomenon.

In Fishkind et al. (2015), a notion of consistency for vertex nomination was presented, roughly analogous to Definition 3. In contrast to the classification task presented above, vertex nomination requires a ranking of the vertices, rather than merely the classification of a single vertex. As such, a vertex nomination scheme is evaluated in Fishkind et al. (2015) based on average precision (Manning et al., 2008), rather than simply a fraction of correctly-classified vertices. In Fishkind et al. (2015), VN consistency is defined in the context of stochastic block model (SBM) random graphs with respect to a provably optimal canonical nomination scheme. This canonical scheme plays the analogous role of the Bayes optimal classifier in this restricted model framework (see Section 3 below). The goal of this paper is to explore and further develop a broader notion of VN consistency that encompasses a more expressive class of models than the SBM.

1.3. Notation and Background

We conclude this section by establishing notation and reviewing a few of the more popular statistical network models that we will make use of as examples in the sequel.

1.3.1. Notation

For a set $S$, we let $|S|$ denote its cardinality and $\binom{S}{2}$ denote the set of all unordered pairs of distinct elements from $S$. Throughout, we will denote graphs via the ordered pair $G = (V, E)$, with vertices $V$ and edges $E \subseteq \binom{V}{2}$. All graphs considered herein will be labeled, hollow (i.e., containing no self-edges), and undirected. We let $\mathcal{G}_n$ denote the set of all labeled, hollow, undirected graphs on $n$ vertices. Given a graph $G$, we will let $V(G)$ denote the vertices of $G$ and $E(G)$ denote its edges. We note that when $G$ is random, this latter set is a random subset of $\binom{V}{2}$. For a set of vertices $S \subseteq V(G)$, we let $G[S]$ denote the subgraph of $G$ induced by $S$, i.e., the graph $G' = (S, E')$ with $\{u, v\} \in E'$ if and only if $\{u, v\} \in E(G)$. In a few places, we will require the notion of an asymmetric graph: a graph $G \in \mathcal{G}_n$ is asymmetric if it has no nontrivial automorphisms (Erdős and Rényi, 1963). For a positive integer $n \in \mathbb{Z}$, we define $[n] = \{1, 2, \ldots, n\}$. In order to simplify notation, we will often suppress the dependence of parameters on $n$; unless specified otherwise, the reader should assume that all parameters depend on the number of vertices $n$.

1.3.2. Stochastic block models

The stochastic block model (SBM) is a widely studied model for edge-independent random graphs with latent community structure (Holland et al., 1983; Hoff et al., 2002; Karrer and Newman, 2011).

Definition 4 We say that a random graph $G = (V, E) \in \mathcal{G}_n$ is an instantiation of a stochastic block model with parameters $(K, B, b)$, written $G \sim \mathrm{SBM}(K, B, b)$, if

i. $V$ is partitioned into $K$ classes (called communities or blocks), $V = V_1 \cup V_2 \cup \cdots \cup V_K$.
ii. The block membership vector $b \in [K]^{|V|}$ is such that for all $k \in [K]$, $b_v = k$ if and only if $v \in V_k$.

iii. The symmetric matrix $B \in [0, 1]^{K \times K}$ denotes the edge probabilities between and within blocks, with $\mathbb{1}\{\{u, v\} \in E(G)\} \stackrel{\text{ind.}}{\sim} \mathrm{Bernoulli}(B_{b_u, b_v})$.

We note that when $K = 1$, we recover the Erdős–Rényi random graph (Erdős and Rényi, 1959), in which the edges of $G$ are present or absent independently with probability $p$. In this special case, we write $G \sim \mathrm{ER}(n, p)$. By a slight abuse of notation, for a symmetric matrix $P \in [0, 1]^{n \times n}$, we will write $G \sim \mathrm{ER}(P)$ if, identifying the vertices of $G$ with $[n]$, we have $\{i, j\} \in E(G)$ with probability $P_{i,j}$, independently of the other edges. With no restrictions on $P$, $\mathrm{ER}(P)$ random graphs can be viewed as $n$-block SBMs and are the most general edge-independent random graph model.

The latent community structure inherent to SBMs makes them a natural model for use in the traditional vertex nomination framework. Recall the traditional VN task: given a community of interest in a network and some examples of vertices that are or are not part of the community of interest, vertex nomination seeks to rank the remaining vertices in the network into a nomination list, with those vertices from the community of interest (ideally) concentrating at the top of the nomination list. As a result, previous work on VN consistency (Fishkind et al., 2015) has been posed within the SBM framework, with the optimal scheme only obtaining its optimality for SBMs. We note that we consider herein the SBM setting where communities are disjoint and each vertex can only belong to a single community. However, the results contained herein translate immediately to the mixed membership SBM setting (Airoldi et al., 2008); details are omitted for brevity.

1.3.3. Random dot product graphs

In stochastic block models, the block assignment vector can be viewed as a latent feature vector for the vertices in the network, with these features (i.e., block memberships) defining the connectivity structure in the network. The random dot product graph (RDPG) model (Young and Scheinerman, 2007) allows for more nuanced vertex features to be incorporated into the model and has been used as the setting for a VN formulation similar to the one proposed here; see Patsolic et al. (2017) for details.

Definition 5 We say that a random graph $G = (V, E) \in \mathcal{G}_n$ is an instantiation of a $d$-dimensional random dot product graph with parameter $X$, written $G \sim \mathrm{RDPG}(X)$, if

i. The matrix $X \in \mathbb{R}^{n \times d}$ is such that $0 \le (XX^T)_{i,j} \le 1$ for all $i, j \in [n]$. The rows of $X$ provide the latent features for the vertices in $V$.

ii. The edges of $G$ are present or absent independently, with $\{i, j\} \in E(G)$ with probability $(XX^T)_{i,j}$. Written succinctly, $G \sim \mathrm{ER}(XX^T)$.

We can view the RDPG model as an example of the more general latent position random graph model (Hoff et al., 2002), in which edge probabilities are determined by hidden vertex-level geometry. Estimating the latent position structure in RDPGs is particularly amenable to spectral methods, and this model has played a prominent role in recent theoretical developments of spectral graph methods (see, for example, Rohe et al., 2011; Sussman et al., 2012; Tang et al., 2017a). Note that the RDPG can be extended to a broader class of models, in which edge probabilities are given by evaluating a positive definite link function at pairs of latent positions as in, for example, Tang et al. (2013).
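The following minimal samplers illustrate Definitions 4 and 5. The block matrix and latent positions in the example at the bottom are our own illustrative choices (the latent positions are constructed so that the RDPG's edge-probability matrix matches the SBM's).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_edge_independent(P, rng):
    """Sample an undirected, hollow graph with independent edges:
    {i, j} is an edge with probability P[i, j] (the ER(P) model)."""
    n = P.shape[0]
    upper = rng.random((n, n)) < P
    A = np.triu(upper, k=1)        # one coin flip per pair, no self-loops
    return (A | A.T).astype(int)

def sample_sbm(K, B, b, rng):
    """SBM(K, B, b) of Definition 4: edge probabilities B[b_u, b_v]."""
    P = B[np.ix_(b, b)]
    return sample_edge_independent(P, rng)

def sample_rdpg(X, rng):
    """RDPG(X) of Definition 5: G ~ ER(X X^T)."""
    return sample_edge_independent(X @ X.T, rng)

# Example: a two-block SBM and an RDPG whose latent positions reproduce it.
B = np.array([[0.5, 0.1], [0.1, 0.4]])
b = np.repeat([0, 1], 50)                      # 0-indexed block labels
A_sbm = sample_sbm(2, B, b, rng)
X = np.vstack([np.tile([np.sqrt(0.5), 0.0], (50, 1)),
               np.tile([0.1 / np.sqrt(0.5), np.sqrt(0.4 - 0.01 / 0.5)], (50, 1))])
A_rdpg = sample_rdpg(X, rng)                   # same edge probabilities as A_sbm
```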
While incorporating this more general family of latent position graphs into the present VN framework would be straightforward, we restrict our focus to the RDPG model of Definition 5 for ease of exposition.

1.3.4. Correlation across networks

The vertex nomination problem we consider in this paper presupposes the existence of a vertex of interest in a network $G_1$ and, ideally, a corresponding vertex of interest in a second network $G_2$. Often, such correspondences across networks are encoded into random graph models via edge-wise graph correlation (see, for example, Fishkind et al., 2019). Arguably the simplest such structure is seen in the $\rho$-correlated Erdős–Rényi model (see, for example, Lyzinski et al., 2015).

Definition 6 We say that bivariate random graphs $(G_1, G_2) \in \mathcal{G}_n \times \mathcal{G}_n$ are an instantiation of a $\rho$-correlated $\mathrm{ER}(P)$ model, written $(G_1, G_2) \sim \rho\text{-}\mathrm{ER}(P)$, if

i. Marginally, $G_1 \sim \mathrm{ER}(P)$ and $G_2 \sim \mathrm{ER}(P)$.

ii. Edges are independent across $G_1$ and $G_2$, except that the indicators of the events $\{u, v\} \in E(G_1)$ and $\{u, v\} \in E(G_2)$ are jointly distributed as a pair of Bernoulli random variables with success probability $P_{u,v}$ and correlation $\rho$.

If the correlation is allowed to vary across edges, so that these two events have correlation $\rho_{u,v}$, then, collecting these correlations in a symmetric matrix $R = [\rho_{i,j}]_{i,j=1}^n$, we write $(G_1, G_2) \sim R\text{-}\mathrm{ER}(P)$ (see Lyzinski and Sussman, 2018). Ranging the values in $R$ from 0 to 1 allows for the consideration of graphs that range from independent ($R = 0$) to isomorphic ($R = 1$). Intermediate values of $R$ allow for the encoding of a correspondence across networks between these two extremes. We will also consider $R < 0$, in which case edges across networks are anti-correlated. This is particularly useful for modeling situations in which corresponding vertices stochastically behave differently across networks.
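A sketch of sampling from Definition 6 follows. It uses the standard conditional-Bernoulli construction (sample $G_1$, then flip each edge indicator of $G_2$ with probabilities chosen to preserve the $\mathrm{ER}(P)$ marginal and achieve correlation $R_{u,v}$); the sketch assumes $R \ge 0$ entrywise, and the constant $P$ and $R$ at the bottom are illustrative.

```python
import numpy as np

def sample_r_er(P, R, rng):
    """Sample (G1, G2) ~ R-ER(P): marginally ER(P), with
    corr(1{uv in G1}, 1{uv in G2}) = R[u, v].

    Given the edge indicator x in G1, the matching indicator in G2 is
    Bernoulli(P + R*(1 - P)) if x = 1 and Bernoulli(P*(1 - R)) if x = 0;
    a direct computation shows this preserves the ER(P) marginal and
    yields correlation R. Assumes R >= 0 entrywise.
    """
    n = P.shape[0]
    iu = np.triu_indices(n, k=1)
    x1 = rng.random(len(iu[0])) < P[iu]
    p_cond = np.where(x1, P[iu] + R[iu] * (1 - P[iu]), P[iu] * (1 - R[iu]))
    x2 = rng.random(len(iu[0])) < p_cond
    A1 = np.zeros((n, n), dtype=int); A1[iu] = x1; A1 += A1.T
    A2 = np.zeros((n, n), dtype=int); A2[iu] = x2; A2 += A2.T
    return A1, A2

rng = np.random.default_rng(1)
n = 200
A1, A2 = sample_r_er(np.full((n, n), 0.3), np.full((n, n), 0.8), rng)
```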
2. Vertex Nomination

Loosely stated, the vertex nomination problem we consider in this paper can be summarized as follows: given a vertex of interest $v^*$ in a graph $G_1 = (V_1, E_1)$, find the corresponding vertex of interest $u^*$ (if it exists) in a second graph $G_2 = (V_2, E_2)$ by ranking the vertices of $G_2$ according to our confidence that they correspond to $v^*$ in graph $G_1$. To formally define this version of vertex nomination, we will need to consider distributions on graphs with partially-overlapping node sets that have a built-in notion of vertex correspondence across graphs. To this end, we will consider distributions on $\mathcal{G}_n \times \mathcal{G}_m$, where $\mathcal{G}_n$ is the set of labeled graphs on $n$ vertices, with vertex labels given by $\{v_1, v_2, \ldots, v_n\}$, and $\mathcal{G}_m$ is the set of labeled graphs on $m$ vertices, with vertex labels given by $\{u_1, u_2, \ldots, u_m\}$. Note that for $i \in [n] \cap [m]$, $v_i$ and $u_i$ are merely vertex labels, and it is not necessarily the case that $v_i = u_i$. We follow this labeling convention in order to emphasize the reality that the vertex sets of $G_1$ and $G_2$ may only partially overlap.

Definition 7 (Nominatable Distributions) For given $n, m \in \mathbb{Z}_{>0}$, the set of nominatable distributions of order $(n, m)$, denoted $\mathcal{N}_{n,m}$, is the collection of all families of distributions of the form

$$F^{(n,m)}_\Theta = \left\{ F^{(n,m)}_{c,\theta} \ \text{s.t.}\ 0 \le c \le \min(n, m),\ c \in \mathbb{Z},\ \theta \in \Theta \subseteq \mathbb{R}^{d(n,m)} \right\},$$

where $F^{(n,m)}_{c,\theta}$ is a distribution on $\mathcal{G}_n \times \mathcal{G}_m$ parameterized by $\theta \in \Theta$ satisfying:

i. The vertex sets $V_1 = \{v_1, v_2, \ldots, v_n\}$ and $V_2 = \{u_1, u_2, \ldots, u_m\}$ satisfy $v_i = u_i$ for $0 < i \le c$. We refer to $C = \{v_1, v_2, \ldots, v_c\} = \{u_1, u_2, \ldots, u_c\}$ as the core vertices. These are the vertices that are shared across the two graphs and imbue the model with a natural notion of corresponding vertices.

ii. The vertices in $J_1 = V_1 \setminus C$ and $J_2 = V_2 \setminus C$ satisfy $J_1 \cap J_2 = \emptyset$. We refer to $J_1$ and $J_2$ as junk vertices. These are the vertices in each graph that have no corresponding vertex in the other graph.

iii. The induced subgraphs $G_1[J_1]$ and $G_2[J_2]$ are conditionally independent given $\theta$.

A few examples will serve to illustrate this definition. We will return to the three example settings below several times throughout the rest of the paper in order to highlight and illustrate phenomena of interest.

Example 1 ($R$-$\mathrm{ER}(P)$) Let $(G_1, G_2) \sim R\text{-}\mathrm{ER}(P)$ with $P, R \in \mathbb{R}^{n \times n}$ and $R > 0$ entrywise, so that $G_1$ and $G_2$ have correlated edges as described in Section 1.3.4. In this example, the model parameter is $\theta = (P, R)$, and the vertex sets of the two graphs can be thought of as fully overlapping, i.e., $V_1 = V_2 = C = [n]$ and $J_1 = J_2 = \emptyset$, since the correlation structure conveyed in the entries of $R$ encodes an explicit correspondence between the edges of $G_1$ and the edges of $G_2$ (and hence also a correspondence between $V_1$ and $V_2$). Note that if we consider $C = [k]$ with $k < n$, then we would require (after suitably ordering the vertices) $R_{u,v} = 0$ for $u, v > k$. This highlights the way in which $\theta$ (and hence the distribution $F^{(n,m)}_{c,\theta}$) can vary with $c$, and vice-versa.

Example 2 (RDPG) Let $m > n$ and suppose that $Y \in \mathbb{R}^{m \times d}$ has distinct rows and satisfies $(YY^T)_{i,j} \in [0, 1]$ for all $i, j \in [m]$. Let $X \in \mathbb{R}^{n \times d}$ be a submatrix of $Y$, and consider $G_1 \sim \mathrm{RDPG}(X)$ and $G_2 \sim \mathrm{RDPG}(Y)$, where $G_1$ and $G_2$ are conditionally independent given $Y$. In this example, we can consider $\theta = Y$, $V_1 = [n] = C$, $J_1 = \emptyset$, $V_2 = [n] \cup J_2$, and $J_2 = \{u_{n+1}, u_{n+2}, \ldots, u_m\}$. Note that as $G_1$ and $G_2$ are conditionally independent given $\theta$, we could also consider $0 < c < n$ here as well. This illustrates that $\theta$ need not necessarily vary with $c$, and hence $F^{(n,m)}_{c,\theta}$ need not vary with $c$, either.

Example 3 (Independent Erdős–Rényi graphs) Let $(G_1, G_2)$ be independent $\mathrm{ER}(n, p)$ random graphs. In this example, we can consider any $c \in [n]$. Note that if $c = 0$ here, then there is no corresponding vertex of interest in $G_2$, and this example serves as a natural boundary case between models in which nomination is possible and those in which it is not. As we will see below in Theorem 23, $c > 0$ may still yield chance performance for any nomination scheme, and the existence of a vertex correspondence does not necessarily imply any performance guarantees.

Remark 8 In addition to the edge-independent and conditionally edge-independent network models considered above, the class of nominatable distributions contains a host of other popular random graph models, including the exponential random graph model (Frank and Strauss, 1986; Snijders et al., 2006; Robins et al., 2007), the preferential attachment model (Albert and Barabási, 2002), and the Watts-Strogatz small world model (Watts and Strogatz, 1998), among others. Indeed, if we consider the case where $c = n = m$, then any parametric distribution on $\mathcal{G}_n \times \mathcal{G}_n$ is a nominatable distribution.

Remark 9 The core vertices $C$ in a nominatable distribution correspond to the vertices that can be sensibly identified across graphs. Note that this set does not require any further structure, aside from the conditional independence of $G_1[J_1]$ and $G_2[J_2]$ given the parameter $\theta$. Thus, we are largely free to specify any notion of correspondence we please.
Depending on the application, this correspondence may be that of vertices playing similar structural roles, belonging to the same community, or some more complicated application-specific notion of correspondence. That is, the notion of cross-graph correspondence, and hence the notion of vertex similarity, is largely left to the practitioner to specify when she or he specifies an appropriate random graph model.

Given a pair of graphs $(G_1, G_2) \sim F^{(n,m)}_{c,\theta} \in \mathcal{N}_{n,m}$, if the vertices in $C$ are known across graphs, then identifying the vertex corresponding to $v^* \in C$ is immediate from the vertex labels. In practice, this information is unknown, and the correspondences across graphs are only partially observed or even unobserved entirely. To model this added uncertainty, we consider passing the vertex labels of $G_2$ through an obfuscating function.

Definition 10 Let $(G_1, G_2) \sim F^{(n,m)}_{c,\theta} \in \mathcal{N}_{n,m}$. An obfuscating function $o : V_2 \to W$ is a bijection from $V_2$ to $W$ with $W \cap V_i = \emptyset$ for $i = 1, 2$. We call a set $W$ satisfying $W \cap V_i = \emptyset$ for $i = 1, 2$ an obfuscating set, and for a given obfuscating set $W$, we let $O_W$ be the set of all obfuscating functions $o : V_2 \to W$.

Here, $o$ models the practical reality that the correspondence of labels across graphs is unknown a priori. Note that to ease notation, we shall write $o(G_2)$ (resp., $o(g_2)$ and $o(\mathcal{G}_m)$) to denote the graph $G_2$ (respectively, $g_2$ and $\mathcal{G}_m$) whose labels have been obfuscated via $o$.

Before defining a VN scheme, we must make one additional definition: for a graph $g \in \mathcal{G}_m$ and $u \in V(g)$, define

$$I(u; g) = \{ w \in V(g) \ \text{s.t.}\ \exists \text{ an automorphism } \sigma \text{ of } g \text{ s.t. } \sigma(u) = w \}.$$

Note that by taking $\sigma$ to be the identity, we have $u \in I(u; g)$. The vertices in $I(u; g)$ are those that are, in a sense, topologically equivalent to the vertex $u$ in $g$, and hence, in the absence of labels, indistinguishable from one another. As such, any sensibly-defined vertex nomination scheme should view all vertices in $I(u; g)$ as being equally good matches to a vertex of interest $v^*$. Thus, a well-defined VN scheme should be label-independent in the following sense: the set of ranks of each set of equivalent vertices (i.e., each $I(u; g_2)$) needs to be invariant to the particular choice of obfuscating function; see Figure 3 for an illustration of this consistency criterion.

Figure 3: An illustration of the label-independence property of VN schemes. If the blue vertex in $o_1(g_2)$ (resp., $o_2(g_2)$) is $o_1(u)$ (resp., $o_2(u)$) for $u \in V_2$, then we require the ranks of $I(o_1(u); o_1(g_2))$ (outlined in red in the network $o_1(g_2)$ and colored red/blue in the ordering provided by $\Phi$) to be equal to the ranks of $I(o_2(u); o_2(g_2))$ (outlined in grey in the network $o_2(g_2)$ and colored grey/blue in the ordering provided by $\Phi$). Indeed, the set of ranks of $o(I(u; g_2))$ via $\Phi$ is independent of the choice of obfuscating function $o$.
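The set $I(u; g)$ is the orbit of $u$ under the automorphism group of $g$, and for small graphs it can be computed by brute force. The following sketch uses NetworkX's VF2 matcher (matching $g$ against itself enumerates automorphisms); this is our own illustrative code and is only practical for small graphs.

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def orbit(u, g):
    """I(u; g): all vertices w such that some automorphism sigma of g
    satisfies sigma(u) = w. Enumerates automorphisms via VF2 (g vs. g)."""
    return {sigma[u] for sigma in GraphMatcher(g, g).isomorphisms_iter()}

# A 4-path: the two endpoints are swapped by the nontrivial automorphism,
# so they are topologically indistinguishable; likewise the two interior vertices.
g = nx.path_graph(4)
print(orbit(0, g))  # {0, 3}
print(orbit(1, g))  # {1, 2}
```

Formally, we have the following definition.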
Definition 11 (Vertex Nomination (VN) Scheme) For a set $A$, let $\mathcal{T}_A$ denote the set of all total orderings of the elements of $A$. For $n, m > 0$ fixed, obfuscating set $W$, and obfuscating function $o \in O_W$, a vertex nomination scheme is a function $\Phi : \mathcal{G}_n \times o(\mathcal{G}_m) \times V_1 \to \mathcal{T}_W$ satisfying the following consistency property. For each $u \in V_2$, define $\mathrm{rank}_{\Phi(g_1, o(g_2), v^*)}(o(u))$ to be the position of $o(u)$ in the total ordering provided by $\Phi(g_1, o(g_2), v^*)$, and define $r_\Phi : \mathcal{G}_n \times \mathcal{G}_m \times O_W \times V_1 \times 2^{V_2} \to 2^{[m]}$ via

$$r_\Phi(g_1, g_2, o, v^*, S) = \big\{ \mathrm{rank}_{\Phi(g_1, o(g_2), v^*)}(o(u)) \ \text{s.t.}\ u \in S \big\}.$$

We then require that for any $g_1 \in \mathcal{G}_n$, $g_2 \in \mathcal{G}_m$, $v^* \in V_1$, obfuscating functions $o_1, o_2 \in O_W$, and any $u \in V(g_2)$,

$$r_\Phi\big(g_1, g_2, o_1, v^*, I(u; g_2)\big) = r_\Phi\big(g_1, g_2, o_2, v^*, I(u; g_2)\big), \tag{2}$$

or, equivalently,

$$o_2 \circ o_1^{-1}\Big[ I\big(\Phi(g_1, o_1(g_2), v^*)[k]; o_1(g_2)\big) \Big] = I\big(\Phi(g_1, o_2(g_2), v^*)[k]; o_2(g_2)\big) \quad \text{for all } k \in [m],$$

where $\Phi(g_1, o(g_2), v^*)[k]$ denotes the $k$-th element (i.e., the rank-$k$ vertex) in the ordering $\Phi(g_1, o(g_2), v^*)$. We let $\mathcal{V}_{n,m}$ denote the set of all such VN schemes.

Figure 4: An example of the consistency criterion, Equation (2), in action. The left panel (a) shows a well-defined nomination scheme, while the right panel (b) shows an ill-defined scheme. The key in this example is that any scheme satisfying Equation (2) must have $r_\Phi(g_1, g_2, o_1, v^*, \{1\}) = r_\Phi(g_1, g_2, o_2, v^*, \{1\})$; $r_\Phi(g_1, g_2, o_1, v^*, \{3\}) = r_\Phi(g_1, g_2, o_2, v^*, \{3\})$; $r_\Phi(g_1, g_2, o_1, v^*, \{2\}) = r_\Phi(g_1, g_2, o_2, v^*, \{2\})$; and $r_\Phi(g_1, g_2, o_1, v^*, \{4, 5\}) = r_\Phi(g_1, g_2, o_2, v^*, \{4, 5\})$.

Given $(G_1, G_2) \sim F^{(n,m)}_{c,\theta} \in \mathcal{N}_{n,m}$ realized as $G_1 = g_1$ and $G_2 = g_2$ with $v^* \in V_1$ the vertex of interest in $G_1$, a VN scheme $\Phi(\cdot, \cdot, \cdot)$ produces a ranked list $\Phi(g_1, o(g_2), v^*)$ of the vertices of $o(g_2)$ (i.e., the set $W$), ordered according to how likely each vertex in $V(o(g_2))$ is judged to correspond to $v^*$, with optimal performance corresponding to

$$\Phi(g_1, o(g_2), v^*)[1] = \begin{cases} o(v^*) & \text{if } v^* \in C; \\ \text{arbitrary } v \in W & \text{if } v^* \notin C. \end{cases}$$

Less formally, one can think of a VN scheme as ranking the vertices of $G_2$ according to how well they resemble the vertex of interest $v^*$ under some task-dependent measure.

Remark 12 Note that if $u \in V_2$ is such that $I(u; g_2) = \{u\}$ (i.e., $u$ is topologically distinct within $g_2$), then Equation (2) implies that $\mathrm{rank}_{\Phi(g_1, o_1(g_2), v^*)}(o_1(u)) = \mathrm{rank}_{\Phi(g_1, o_2(g_2), v^*)}(o_2(u))$ for any $o_1, o_2 \in O_W$. If $I(u; g_2)$ contains vertices in addition to $u$, then Equation (2) implies that the set of vertices topologically equivalent to $u$ (namely, those in $I(u; g_2)$) must achieve the same set of ranks via $\Phi$ under any two obfuscating functions; see Figure 4 for a simple example of this consistency criterion in action.
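As a concrete (and deliberately naive) illustration of the interface in Definition 11, the toy scheme below ranks the obfuscated vertices of $o(g_2)$ by degree proximity to $v^*$. Because topologically equivalent vertices share every label-independent invariant, each orbit receives an interchangeable block of positions; a fully well-defined scheme must additionally break any remaining ties in a label-independent way, which this sketch does not attempt. All names and parameters here are our own.

```python
import networkx as nx

def degree_vn_scheme(g1, og2, v_star):
    """Toy VN scheme: rank the vertices of o(g2) by how close their degree
    is to deg_{g1}(v*), refining ties by the sorted multiset of neighbor
    degrees (another label-independent invariant). Residual ties are left
    to Python's stable sort, so this is illustrative rather than a strict
    instance of Definition 11."""
    target = g1.degree[v_star]
    def key(w):
        return (abs(og2.degree[w] - target),
                tuple(sorted(og2.degree[x] for x in og2[w])))
    return sorted(og2.nodes, key=key)

g1 = nx.erdos_renyi_graph(20, 0.3, seed=1)
og2 = nx.relabel_nodes(nx.erdos_renyi_graph(20, 0.3, seed=2),
                       {i: f"w{i}" for i in range(20)})  # obfuscated labels
print(degree_vn_scheme(g1, og2, v_star=0)[:5])           # top 5 nominations
```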
Remark 13 (Relation to Fishkind et al. (2015); Lyzinski et al. (2016)) Recall the one-graph vertex nomination task considered in earlier works (Coppersmith, 2014; Fishkind et al., 2015; Lyzinski et al., 2016) and described in Section 1, in which vertices are considered interesting precisely when they belong to one of $K$ communities in a stochastic block model. While the two-graph VN formulation we consider in the present work (modulo symmetries) involves a single vertex of interest across graphs, the framework is easily extended to the setting where one may have multiple vertices of interest (and not of interest), and in particular it can encode instances of the one-graph version of the VN problem. To see this, consider an instance of the single-graph VN problem on a graph $G = (V, E)$, where $V$ is partitioned into $K$ communities as $V = V_1 \cup V_2 \cup \cdots \cup V_K$, and each of the communities is comprised of labeled (i.e., seed vertices, whose community memberships are observed) and unlabeled (i.e., nonseed, whose community memberships are unobserved) vertices, $V_k = S_k \cup U_k$, where $S_k \subseteq V_k$ is the set of seeds from the $k$-th block and $U_k \subseteq V_k$ is the set of nonseed vertices. We can encode this one-graph VN instance as an instance of the two-graph problem by encoding additional information in the graph $G_1$. Construct a vertex set $V' = V \cup \{\ell_1, \ell_2, \ldots, \ell_K\}$. The $K$ new vertices $\{\ell_k\}_{k=1}^K$ will encode the label information present in the graph $G$. Let $E' = E \cup L$, where $L = \{\{\ell_k, s\} : s \in S_k,\ k \in [K]\}$, so that edges connect seed vertices in $S = S_1 \cup S_2 \cup \cdots \cup S_K$ to their corresponding label vertices. Take $G_1 = (V', E')$, and let the interesting vertices (and possibly uninteresting vertices) be given by the elements of $S \subseteq V'$. The second graph $G_2$ is then the subgraph of $G$ induced by the unlabeled vertices $U \subseteq V$, passed through an appropriate obfuscating function. This pair $(G_1, G_2)$, with any $s \in S_1$ chosen to be the interesting vertex, encodes the label information present in the one-graph VN problem as well as the graph structure of $G$, as required.

3. Bayes Error and Bayes Optimality in Vertex Nomination

Viewing a VN scheme as an information retrieval system suggests that a scheme that puts $o(v^*)$ close to the top of the nomination list is potentially of great practical value, even if it fails to obtain perfect performance. Motivated by this, we adapt the recall-at-$k$ metric from classical information retrieval as a measure of performance. To wit, we define the level-$k$ loss function and error for VN as follows.

Definition 14 (VN loss function, level-$k$ error) Let $\Phi \in \mathcal{V}_{n,m}$ be a vertex nomination scheme and $o$ an obfuscating function. For $(g_1, g_2)$ realized from $(G_1, G_2) \sim F^{(n,m)}_{c,\theta} \in \mathcal{N}_{n,m}$ with vertex of interest $v^* \in C$, and for $k \in [m-1]$, we define the level-$k$ nomination loss via

$$\ell_k(\Phi, g_1, g_2, o, v^*) = \mathbb{1}\big\{\mathrm{rank}_{\Phi(g_1, o(g_2), v^*)}(o(v^*)) \ge k + 1\big\} = 1 - \mathbb{1}\big\{\mathrm{rank}_{\Phi(g_1, o(g_2), v^*)}(o(v^*)) \le k\big\}. \tag{3}$$

The level-$k$ error of $\Phi$ at $v^*$ is then defined to be

$$L_k(\Phi, v^*) = \mathbb{E}_{(G_1, G_2) \sim F^{(n,m)}_{c,\theta}}\big[\ell_k(\Phi, G_1, G_2, o, v^*)\big] = P_{(G_1, G_2) \sim F^{(n,m)}_{c,\theta}}\big(\mathrm{rank}_{\Phi(G_1, o(G_2), v^*)}(o(v^*)) \ge k + 1\big). \tag{4}$$

From the definition of the level-$k$ error in Eq. (4), it is immediate that

$$\begin{aligned}
L_1(\Phi, v^*) &= 1 - P\big(\mathrm{rank}_{\Phi(G_1, o(G_2), v^*)}(o(v^*)) = 1\big), \\
L_2(\Phi, v^*) &= 1 - P\big(\mathrm{rank}_{\Phi(G_1, o(G_2), v^*)}(o(v^*)) \in \{1, 2\}\big), \\
L_3(\Phi, v^*) &= 1 - P\big(\mathrm{rank}_{\Phi(G_1, o(G_2), v^*)}(o(v^*)) \in \{1, 2, 3\}\big), \\
&\ \ \vdots \\
L_{m-1}(\Phi, v^*) &= 1 - P\big(\mathrm{rank}_{\Phi(G_1, o(G_2), v^*)}(o(v^*)) \in [m-1]\big),
\end{aligned} \tag{5}$$

where all probabilities are over $(G_1, G_2) \sim F^{(n,m)}_{c,\theta}$. The level-1 loss function is analogous to the classical 0/1 loss function in classification, as $L_1(\Phi, v^*)$ is simply the probability that $\Phi$ fails to classify $o(v^*)$ as the vertex corresponding to $v^*$ in $o(G_2)$ (i.e., fails to rank it first). Considering $1 < k \le m$ enables us to model the practical loss associated with using a VN scheme to search for $o(v^*)$ in $o(V_2)$ given limited resources.

Remark 15 Unlike in the classification setting described in Section 1.2, where $L_F(h_n)$ is a random variable indexed by $n$, the nomination errors defined in Definition 14 are sequences indexed by $m$ and $n$ and are not random. In the classical setting, $L_F(h_n)$ denotes the error rate of a classifier that classifies a single observation $X$ based on $n$ training instances $\{(X_i, Y_i)\}_{i=1}^n$. In the case of VN, the notion of labeled training instances is hazier at best. Indeed, in the present setting, the training data and test data are inseparable. The graphs (or, more specifically, their edges) are the training data, and in the present work, the graph orders $n, m$ are better thought of as measuring problem dimension rather than training set size.
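The level-$k$ error of Equation (4) is an expectation over the graph pair, so for any scheme whose model we can sample from it can be estimated by Monte Carlo. The sketch below does this for a degree-based toy scheme under a constant-correlation $\rho$-ER model; the model, scheme, and all parameter values are illustrative assumptions, not constructs from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_rho_er_pair(n, p, rho, rng):
    """(G1, G2) ~ rho-ER(P) with constant P = p and R = rho (Definition 6)."""
    iu = np.triu_indices(n, k=1)
    x1 = rng.random(len(iu[0])) < p
    x2 = np.where(x1, rng.random(len(iu[0])) < p + rho * (1 - p),
                      rng.random(len(iu[0])) < p * (1 - rho))
    A1 = np.zeros((n, n), int); A1[iu] = x1; A1 += A1.T
    A2 = np.zeros((n, n), int); A2[iu] = x2; A2 += A2.T
    return A1, A2

def degree_scheme(A1, oA2, v_star):
    """Toy scheme: order obfuscated vertices by degree proximity to v*."""
    return np.argsort(np.abs(oA2.sum(1) - A1[v_star].sum()), kind="stable")

def level_k_error(scheme, k, n=100, p=0.3, rho=0.9, trials=500):
    """Monte Carlo estimate of L_k(Phi, v*) from Equation (4), with v* = 0,
    c = n, and the obfuscation a uniformly random relabeling of G2."""
    losses = 0
    for _ in range(trials):
        A1, A2 = sample_rho_er_pair(n, p, rho, rng)
        perm = rng.permutation(n)                       # obfuscating function o
        ranking = scheme(A1, A2[np.ix_(perm, perm)], v_star=0)
        o_vstar = int(np.where(perm == 0)[0][0])        # obfuscated label of v*
        rank0 = int(np.where(ranking == o_vstar)[0][0]) # 0-indexed rank of o(v*)
        losses += int(rank0 >= k)                       # loss iff rank >= k+1
    return losses / trials

print("estimated level-10 error:", level_k_error(degree_scheme, k=10))
```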
Analogous to the classification literature, we are now able to define the concept of Bayes optimality in the VN framework.

Definition 16 (Bayes error of a VN scheme) Let $(G_1, G_2) \sim F^{(n,m)}_{c,\theta}$ with vertex of interest $v^* \in C$ (where we recall that $C$ is the set of core vertices; see Definition 7), and let $o \in O_W$ be an obfuscating function. For $k \in [m-1]$, we define the level-$k$ Bayes optimal VN scheme to be any element $\Psi \in \arg\min_{\Phi \in \mathcal{V}_{n,m}} L_k(\Phi, v^*)$, and define the level-$k$ Bayes error to be $L^*_k(v^*) = L_k(\Psi, v^*)$ for level-$k$ Bayes optimal $\Psi$.

Now that we have a notion of Bayes error for VN, it is natural to ask whether an optimal VN scheme exists analogous to the Bayes optimal classifier of Equation (1). Toward this end, let $(g_1, g_2)$ be realized from $(G_1, G_2) \sim F^{(n,m)}_{c,\theta} \in \mathcal{N}_{n,m}$, and consider a vertex of interest $v^* \in C$ and obfuscating function $o : V_2 \to W$. In order to avoid the technical complexities associated with graph automorphisms, in what follows we will assume that $F^{(n,m)}_{c,\theta} \in \mathcal{N}_{n,m}$ is supported on $\mathcal{G}^a_n \times \mathcal{G}^a_m$, where $\mathcal{G}^a_n$ (resp., $\mathcal{G}^a_m$) is the set of asymmetric graphs in $\mathcal{G}_n$ (resp., $\mathcal{G}_m$). For analogous results in networks with symmetries, see Remark 18. Letting $\cong$ denote graph isomorphism, for $(g_1, g_2) \in \mathcal{G}_n \times \mathcal{G}_m$ define the set

$$(g_1, [o(g_2)]) = \big\{ (g_1, \tilde{g}_2) \ \text{s.t.}\ o(\tilde{g}_2) \cong o(g_2) \big\} = \big\{ (g_1, \tilde{g}_2) \ \text{s.t.}\ \tilde{g}_2 \cong g_2 \big\}. \tag{6}$$

In order to define the Bayes optimal scheme, we will also need the following restriction of $(g_1, [o(g_2)])$: for each $w \in W$, we define

$$(g_1, [o(g_2)])_{w = o(v^*)} = \big\{ (g_1, \tilde{g}_2) \ \text{s.t.}\ \exists \text{ isomorphism } \sigma \text{ s.t. } o(\tilde{g}_2) = \sigma(o(g_2)),\ \sigma(w) = o(v^*) \big\}. \tag{7}$$

We are now ready to define a Bayes optimal VN scheme. For ease of notation, in the sequel we will write $P_{F^{(n,m)}_{c,\theta}}$, or even simply $P$, in place of $P_{(G_1, G_2) \sim F^{(n,m)}_{c,\theta}}$ where there is no risk of ambiguity. Let $\mathfrak{g} = \{(g^{(i)}_1, g^{(i)}_2)\}_{i=1}^h$ be such that the sets $\{(g^{(i)}_1, [o(g^{(i)}_2)])\}_{i=1}^h$ partition $\mathcal{G}^a_n \times \mathcal{G}^a_m$. We will call this partition $\mathcal{P}_{n,m}$, where we suppress dependence on $\mathfrak{g}$ and $o$ for ease of notation. We will define a Bayes optimal scheme $\Phi^*$ (independent of the choice of $\mathfrak{g}$) piecewise on each element of this partition, and we will prove in Theorem 17 that $\Phi^*$ is level-$k$ Bayes optimal for all $k \in [m-1]$:

$$\begin{aligned}
\Phi^*\big(g^{(i)}_1, o(g^{(i)}_2), v^*\big)[1] &\in \operatorname*{argmax}_{u \in W} P\Big[ \big(g^{(i)}_1, [o(g^{(i)}_2)]\big)_{u = o(v^*)} \,\Big|\, \big(g^{(i)}_1, [o(g^{(i)}_2)]\big) \Big], \\
\Phi^*\big(g^{(i)}_1, o(g^{(i)}_2), v^*\big)[2] &\in \operatorname*{argmax}_{u \in W \setminus \{\Phi^*[1]\}} P\Big[ \big(g^{(i)}_1, [o(g^{(i)}_2)]\big)_{u = o(v^*)} \,\Big|\, \big(g^{(i)}_1, [o(g^{(i)}_2)]\big) \Big], \\
&\ \ \vdots \\
\Phi^*\big(g^{(i)}_1, o(g^{(i)}_2), v^*\big)[m] &\in \operatorname*{argmax}_{u \in W \setminus \{\cup_{j \in [m-1]} \Phi^*[j]\}} P\Big[ \big(g^{(i)}_1, [o(g^{(i)}_2)]\big)_{u = o(v^*)} \,\Big|\, \big(g^{(i)}_1, [o(g^{(i)}_2)]\big) \Big],
\end{aligned} \tag{9}$$

with ties broken arbitrarily but deterministically. We refer the interested reader to Appendix B.1 for discussion of the case where ties are allowed in the ranking function. For each element $(g_1, g_2) \in (g^{(i)}_1, [o(g^{(i)}_2)]) \setminus \{(g^{(i)}_1, g^{(i)}_2)\}$, choose the permutation $\sigma$ such that $o(g_2) = \sigma(o(g^{(i)}_2))$, and define $\Phi^*(g^{(i)}_1, o(g_2), v^*) = \sigma(\Phi^*(g^{(i)}_1, o(g^{(i)}_2), v^*))$.

Lastly, the following theorem shows that this scheme (uniquely defined up to tie-breaking) is indeed Bayes optimal. A proof can be found in Appendix A.1.

Theorem 17 Let $o \in O_W$ be an obfuscating function, and let $\mathfrak{g} = \{(g^{(i)}_1, g^{(i)}_2)\}_{i=1}^h$ be such that the sets $\{(g^{(i)}_1, [o(g^{(i)}_2)])\}_{i=1}^h$ partition $\mathcal{G}^a_n \times \mathcal{G}^a_m$. Let $\Phi^* = \Phi^*_{\mathfrak{g}}$ be as defined in Equation (9). Suppose that $(G_1, G_2) \sim F^{(n,m)}_{c,\theta} \in \mathcal{N}_{n,m}$ with $F^{(n,m)}_{c,\theta}$ supported on $\mathcal{G}^a_n \times \mathcal{G}^a_m$, and consider a vertex of interest $v^* \in C$. We have that $L_k(\Phi^*, v^*) = L^*_k(v^*)$ for all $k \in [m-1]$, all partitions $\mathfrak{g}$, and all obfuscating functions $o$.

Remark 18 The effect of symmetries on Theorem 17 is both subtle and cumbersome, as the specific tie-breaking procedures used in the analogue of Eq. (9) are of great import. To this end, consider $\mathfrak{g}$ to be defined as above, and let $T \in \mathcal{T}_W$ be the ordering that specifies the (fixed but otherwise arbitrary) scheme by which elements within each $I(v; o(g_2))$ are ordered.
Informally, we will first rank the sets $I(v; o(g_2))$ (rather than the individual vertices), and then use $T$ to rank within and across each of the $I(v; o(g_2))$. Full detail is provided below. For each $w \in W$ and $v \in V(g_2)$, define

$$(g_1, [o(g_2)])_{I(w; o(g_2)) = o(v)} = \big\{ (g_1, \tilde{g}_2) \ \text{s.t.}\ \exists \text{ iso. } \sigma \text{ with } o(\tilde{g}_2) = \sigma(o(g_2)) \text{ and } \sigma(u) = o(v) \text{ for some } u \in I(w; o(g_2)) \big\}. \tag{10}$$

As above, we will define the Bayes optimal VN scheme on each element of the partition provided via $\mathfrak{g}$. We first define a ranking $\Psi$ of the sets $(g^{(i)}_1, [o(g^{(i)}_2)])_{I(u; o(g^{(i)}_2)) = o(v^*)}$ indexed by $u$, and then will use $T$ to give the total ordering from $\Psi$. To wit, for each $i \in [h]$, iteratively define (where ties in the argmax are broken in an arbitrary but nonrandom manner)

$$\begin{aligned}
\Psi\big(g^{(i)}_1, o(g^{(i)}_2), v^*\big)[1] &\in \operatorname*{argmax}_{I(u; o(g^{(i)}_2)) \subseteq W} P\Big[ \big(g^{(i)}_1, [o(g^{(i)}_2)]\big)_{I(u; o(g^{(i)}_2)) = o(v^*)} \,\Big|\, \big(g^{(i)}_1, [o(g^{(i)}_2)]\big) \Big], \\
\Psi\big(g^{(i)}_1, o(g^{(i)}_2), v^*\big)[2] &\in \operatorname*{argmax}_{I(u; o(g^{(i)}_2)) \subseteq W \setminus \{\Psi[1]\}} P\Big[ \big(g^{(i)}_1, [o(g^{(i)}_2)]\big)_{I(u; o(g^{(i)}_2)) = o(v^*)} \,\Big|\, \big(g^{(i)}_1, [o(g^{(i)}_2)]\big) \Big], \\
&\ \ \vdots \\
\Psi\big(g^{(i)}_1, o(g^{(i)}_2), v^*\big)[\ell] &\in \operatorname*{argmax}_{I(u; o(g^{(i)}_2)) \subseteq W \setminus \{\cup_{j=1}^{\ell-1} \Psi[j]\}} P\Big[ \big(g^{(i)}_1, [o(g^{(i)}_2)]\big)_{I(u; o(g^{(i)}_2)) = o(v^*)} \,\Big|\, \big(g^{(i)}_1, [o(g^{(i)}_2)]\big) \Big].
\end{aligned} \tag{11}$$

For each element $(g_1, g_2) \in (g^{(i)}_1, [o(g^{(i)}_2)]) \setminus \{(g^{(i)}_1, g^{(i)}_2)\}$, choose an isomorphism $\sigma$ such that $o(g_2) = \sigma(o(g^{(i)}_2))$, and define $\Psi(g_1, o(g_2), v^*) = \sigma(\Psi(g^{(i)}_1, o(g^{(i)}_2), v^*))$. Note that the choice of isomorphism $\sigma$ does not impact the definition of $\Psi$.

For each $(g_1, g_2) \in \mathcal{G}_n \times \mathcal{G}_m$, we define a VN scheme $\Phi^*_T$ from $\Psi$ as follows:

1. Initialize $\Phi^*_T(g_1, o(g_2), v^*)$ as an empty list; initialize $j = 1$;

2. If $\Psi(g_1, o(g_2), v^*)[j]$ is nonempty, add the top ranked (according to $T$) element from $\Psi(g_1, o(g_2), v^*)[j]$ to the end of $\Phi^*_T(g_1, o(g_2), v^*)$, else do nothing; set $j = j + 1 \pmod{|\Psi(g_1, o(g_2), v^*)|}$;

3. Repeat Step 2 until there are no more vertices to add to $\Phi^*_T(g_1, o(g_2), v^*)$.

If $T[1] = o(v^*)$, then $\Phi^*_T(g_1, o(g_2), v^*)$ is Bayes optimal (as in Theorem 17) in the sense of Definition 16. See Appendix A.1 for details.

Example 1, continued. Let $(G_1, G_2) \sim R\text{-}\mathrm{ER}(P)$ for $P, R \in \mathbb{R}^{n \times n}$. Under mild model assumptions, we have that $\lim_{n \to \infty} L^*_k(v^*) = 0$ for any fixed $k$. This is due to the fact that the optimal graph matching of $G_1$ to $o(G_2)$ will almost surely recover the true vertex labels of $o(G_2)$ for $n$ suitably large; i.e.,

$$\operatorname*{argmin}_{Q \in \Pi_n} \|AQ - QB\|_F = \{I_n\}$$

with probability tending to 1, where $\Pi_n$ is the set of $n \times n$ permutation matrices, $A$ is the adjacency matrix for $G_1$, and $B$ is the adjacency matrix for $G_2$. More concretely, we have the following theorem, adapted to our present setting from Lyzinski and Sussman (2018). A proof sketch can be found in Appendix B.

Theorem 19 Let $(A, B) \sim R\text{-}\mathrm{ER}(P)$, and for any fixed permutation matrix $Q$ define the random variable $\delta(Q) := \|AQ - QB\|_F$. Define $0 < \epsilon := \min_{i,j;\, i \neq j} 2 R_{i,j} P_{i,j}(1 - P_{i,j})$. There exist positive constants $c_1, c_2$ such that if $\epsilon^2 > c_1 \log(n)/n$, then for sufficiently large $n$,

$$P\big(\exists\, Q \in \Pi_n \setminus \{I_n\} \ \text{s.t.}\ \delta(Q) \le \delta(I_n)\big) \le 2 \exp\big(-c_2 \epsilon^2 n\big).$$

Similarly to the Bayes optimal scheme, we define the graph matching VN scheme, denoted $\Phi_M$, separately on each element of $\mathcal{P}_{n,n}$. For a given $(\tilde{g}_1, [o(\tilde{g}_2)]) \in \mathcal{P}_{n,n}$, let $(g_1, g_2)$ be a fixed element in $(\tilde{g}_1, [o(\tilde{g}_2)])$. Define $\Phi_M(g_1, o(g_2), v^*)[1]$ to be a fixed but arbitrary element from

$$\Big\{ r^{-1}(v^*) \ \text{s.t.}\ Q_r \in \operatorname*{argmin}_{Q \in \Pi_n} \|AQ - QB\|_F \Big\}, \tag{12}$$

where each $r : W \to V_2$ appearing above is a bijection and $Q_r$ is its associated permutation matrix (having identified both $W$ and $V_2$ with the set $[n]$). Define

$$R_1 = \big\{ r : W \to V_2 \ \text{s.t.}\ r \text{ is a bijection with } r^{-1}(v^*) = \Phi_M(g_1, o(g_2), v^*)[1] \big\}.$$

If $j > 1$, define $\Phi_M(g_1, o(g_2), v^*)[j]$ to be any element of

$$\Big\{ r^{-1}(v^*) \ \text{s.t.}\ Q_r \in \operatorname*{argmin}_{Q \in \Pi_n \setminus \{Q_r : r \in \cup_{\ell=1}^{j-1} R_\ell\}} \|AQ - QB\|_F \Big\},$$

where $R_j$ is defined analogously to $R_1$. For each element $(g'_1, g'_2) \in (\tilde{g}_1, [o(\tilde{g}_2)]) \setminus \{(g_1, g_2)\}$, choose a permutation $\sigma$ such that $o(g'_2) = \sigma(o(g_2))$, and we then define $\Phi_M(g_1, o(g'_2), v^*) = \sigma(\Phi_M(g_1, o(g_2), v^*))$.

Theorem 19 states that under mild model assumptions, we have that $\Phi_M(G_1, o(G_2), v^*)[1] = o(v^*)$ asymptotically almost surely, and thus $\lim_{n \to \infty} L_1(\Phi_M, v^*) = 0$. Indeed, in this setting, for any fixed $k \ge 1$, $\lim_{n \to \infty} L_k(\Phi_M, v^*) = 0$. It is then immediate that $\lim_{n \to \infty} L^*_k(v^*) = 0$ in this model for any fixed $k$.
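Exact minimization over all $n!$ permutations, as $\Phi_M$ requires, is intractable in general. The sketch below is a practical stand-in for the top entry of $\Phi_M$ using SciPy's FAQ relaxation of the quadratic assignment problem: for 0/1 adjacency matrices, maximizing edge agreement is equivalent to minimizing $\|AQ - QB\|_F$ over permutation matrices. This approximate solver is our own substitution, not the scheme analyzed in Theorem 19.

```python
import numpy as np
from scipy.optimize import quadratic_assignment

def graph_matching_top_nomination(A1, oA2, v_star):
    """Approximate the first entry of the graph matching scheme Phi_M of
    Equation (12): match A1 to the obfuscated adjacency matrix oA2 with the
    FAQ solver and nominate the vertex matched to v_star. Filling out the
    rest of the nomination list would repeat the matching with previously
    nominated pairings excluded, per the definition of Phi_M."""
    res = quadratic_assignment(A1, oA2, method="faq", options={"maximize": True})
    return res.col_ind[v_star]   # res.col_ind[i]: vertex of o(G2) matched to i

# Under the conditions of Theorem 19 the true correspondence is recovered with
# high probability; e.g., with the rho-ER sampler from the earlier sketch:
#   A1, A2 = sample_rho_er_pair(n=150, p=0.4, rho=0.9, rng=rng)
#   graph_matching_top_nomination(A1, A2, v_star=0)   # -> 0, typically
```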
The next two examples serve to illustrate how the level-$k$ Bayes error behaves in the presence of stochastically indistinguishable vertices. In essence, we cannot hope to perform better than randomly ordering stochastically equivalent vertices.

Example 3, continued. Let $G_1$ and $G_2$ be independent $\mathrm{ER}(n, p)$ graphs. Since the vertices are stochastically indistinguishable within each of the two graphs, no nomination scheme can do better than random chance in this model. Thus, with $c = n$, we have that $L^*_k(v^*) = (1 - k/n)(1 + o(1))$ for all $k \in [n-1]$ and all $v^* \in [n]$.

Example 4 Let $p_1, p_2, q \in [0, 1]$ with $1 \ge p_1 > p_2 \ge 0$ and $q \notin \{p_1, p_2\}$. Define the matrix

$$B = \begin{pmatrix} p_1 & q \\ q & p_2 \end{pmatrix},$$

and let $G_1, G_2$ be independent $\mathrm{SBM}(2, B, b_n)$ graphs, where $b_n(i) = 1$ if $i \le n/2$ ($n$ even) and $b_n(i) = 2$ if $i > n/2$. With $c = n$ and the correspondence equal to the identity function, let $(k_n)_{n=2}^\infty$ be a nondecreasing divergent sequence satisfying $k_n \le n/2$ for all $n > 1$; then $\lim_{n \to \infty} L^*_{k_n}(v^*) = \lim_{n \to \infty} \big[(1 - 2k_n/n) \vee 0\big]$ for all $v^*$. Indeed, $L^*_{k_n}$ is asymptotically equivalent to the error attained by randomly ordering the $n/2$ vertices in $G_2$ that are stochastically equivalent to $v^*$.
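A quick simulation makes the chance-performance claim of Examples 3 and 4 tangible. Below, for independent ER pairs, a degree-proximity scheme (with randomized tie-breaking via a tiny jitter, so that the true vertex's rank is uniform by exchangeability) attains level-$k$ error approximately $1 - k/n$; the scheme and parameters are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

def empirical_level_k_error(k, n=60, p=0.3, trials=2000):
    """Example 3 by simulation: for independent ER(n, p) pairs, nominate by
    degree proximity to v* (vertex 0). The true vertex's rank is uniform,
    so the level-k error is approximately 1 - k/n (chance performance)."""
    losses = 0
    for _ in range(trials):
        A1 = np.triu(rng.random((n, n)) < p, 1); A1 = A1 | A1.T
        A2 = np.triu(rng.random((n, n)) < p, 1); A2 = A2 | A2.T
        score = np.abs(A2.sum(1) - A1[0].sum()) + 1e-9 * rng.random(n)
        losses += int(np.where(np.argsort(score) == 0)[0][0] >= k)
    return losses / trials

for k in (6, 15, 30):
    print(k, empirical_level_k_error(k), "vs chance", 1 - k / 60)
```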
4. VN Consistency

With the definition of Bayes optimality and the Bayes optimal scheme in hand, it is now possible to define a notion of consistent vertex nomination analogous to consistent classification in the pattern recognition literature. Before defining a consistent VN rule (i.e., a sequence of VN schemes), we must first define the notion of sequences of nominatable distributions with nested cores. Such sequences of distributions are necessary in order to speak sensibly of a sequence of vertex nomination problem instances.

Definition 20 Let $F = \big(F^{(n, m_n)}_{c_n, \theta_n}\big)_{n=n_0}^\infty$ be a sequence of distributions such that $F^{(n, m_n)}_{c_n, \theta_n} \in \mathcal{N}_{n, m_n}$ for all $n$. We say that $F$ has nested cores if for all $n_0 \le n_1 < n_2$, if $(G_1, G_2) \sim F_{n_1} = F^{(n_1, m_{n_1})}_{c_{n_1}, \theta_{n_1}}$ and $(\tilde{G}_1, \tilde{G}_2) \sim F_{n_2} = F^{(n_2, m_{n_2})}_{c_{n_2}, \theta_{n_2}}$, we have, letting $C_1$ and $C_2$ be the core vertices associated with $F_{n_1}$ and $F_{n_2}$ respectively, and denoting the junk vertices $J_{1,1}, J_{2,1}, J_{1,2}, J_{2,2}$ analogously,

i. $V(G_1) = C_1 \cup J_{1,1} \subseteq V(\tilde{G}_1) = C_2 \cup J_{2,1}$;

ii. $V(G_2) = C_1 \cup J_{1,2} \subseteq V(\tilde{G}_2) = C_2 \cup J_{2,2}$;

iii. $C_1 \subseteq C_2$.

We are now ready to define a consistent VN rule.

Definition 21 Let $F = \big(F_n = F^{(n, m_n)}_{c_n, \theta_n}\big)_{n=n_0}^\infty$ be a sequence of distributions such that $F_n \in \mathcal{N}_{n, m_n}$ with nested cores satisfying $\lim_{n \to \infty} m_n = \infty$. For a given non-decreasing sequence $(k_n)$, we say that a VN rule $\Phi = (\Phi_{n, m_n})_{n=n_0}^\infty$ is level-$(k_n)$ consistent at $v^*$ with respect to $F$ if

$$\lim_{n \to \infty} \big[ L_{k_n}(\Phi_{n, m_n}, v^*) - L^*_{k_n}(v^*) \big] = 0$$

for any sequence of obfuscating functions of $V_2$ with $|V_2| = m_n$. If a rule $\Phi$ is level-$(k_n)$ consistent at $v^*$ for a constant sequence $k_n = k$, $n = 1, 2, \ldots$, then we say simply that $\Phi$ is level-$k$ consistent.

Remark 22 Equation (5) has an interesting implication for VN consistency in the setting where $L^*_{k_n}(v^*) \to 0$. In this case, level-$(k_n)$ consistency of a VN rule $\Phi$ implies that $\Phi$ is level-$(k'_n)$ consistent for all $(k'_n)$ such that $\liminf k'_n / k_n \ge 1$. We conjecture that this implication holds true for the case where $L^*_{k_n}(v^*) \to c > 0$, but this problem remains open at present.

Example 1, continued. Let $F = \big(F^{(n,n)}_{n, \theta_n = (P_n, R_n)}\big)$ be a sequence of $R_n$-$\mathrm{ER}(P_n)$ random graph models in $\mathcal{N}_{n,n}$ for some sequence of probability matrices $(P_n)_{n=n_0}^\infty$ and correlation matrices $(R_n)_{n=n_0}^\infty$. Under mild model assumptions (see Theorem 19), the graph matching vertex nomination rule $\Phi_M$ defined in Equation (12) above is level-1 consistent, and hence level-$(k_n)$ consistent for all $(k_n)$ sequences.

Example 3, continued. Let $F = \big(F^{(n,n)}_{n, \theta_n = p}\big)$ be a sequence of independent $\mathrm{ER}(n, p)$ random graph models in $\mathcal{N}_{n,n}$. All vertex nomination rules are level-$(k_n)$ consistent for all $(k_n)$ sequences. This holds for all possible values of $c \in [n]$ in the nested sequence of $\mathrm{ER}(n, p)$ distributions, as all VN rules have effectively chance performance, regardless of core size, under this model.

We have defined the consistency of a VN rule with respect to a broad class of graph sequences, and it is perhaps no surprise that there cannot be any level-$(k_n)$ universally consistent VN rules (i.e., rules that are level-$(k_n)$ consistent for all sequences of nominatable distributions $F$ with nested cores); indeed, this holds even for constant sequences $k_n := k$. To prove this result, we will first establish an analogue of the arbitrarily-poor-performance theorems for classifiers (see, e.g., Devroye et al., 1997, Theorem 7.1): for fixed $n$ and $m$, any VN scheme can be shown to have arbitrarily poor performance with respect to a well-chosen adversarial distribution $F^{(n,m)}_{c,\theta}$. Our theorem mirrors the classical classification literature, where, for a given classification rule, there exists a sufficiently complex distribution for which the sample size $n$ is "hopelessly small" (Devroye et al., 1997, pg. 111), so that a classification rule can be made to perform arbitrarily poorly by selecting a suitably complex data distribution. Nonetheless, in the case of classification, this model complexity and the implicit dependence on $n$ can be overcome asymptotically by a classification rule: universally consistent classifiers exist (see, for example, Stone, 1977; Steinwart, 2002; Tang et al., 2013). In contrast, in the VN problem, the complexity of the model generating the data can also grow in $n$, which effectively thwarts the ability of a VN rule to asymptotically overcome a sequence of adversarial graph models. Formalizing the above, we arrive at the following theorem, a proof of which can be found in Appendix A.2.

Theorem 23 Let $n$ and $m$ be large enough to guarantee the existence of asymmetric graphs $g_1 \in \mathcal{G}_n$ and $g_2 \in \mathcal{G}_m$. Consider a VN scheme $\Phi \in \mathcal{V}_{n,m}$, an obfuscating function $o$, and a strictly increasing sequence $(\epsilon_i)_{i=1}^m$ satisfying $\epsilon_i \in (0, i/m)$. Then there exists a distribution $F^{(n,m)}_{c,\theta} \in \mathcal{N}_{n,m}$ over $\mathcal{G}_n \times \mathcal{G}_m$ and $v^* \in C$ such that for each $k \in [c-1]$,

$$L^*_k(v^*) \le \epsilon_{m-k} < 1 - \frac{k}{m} < 1 - \epsilon_k < L_k(\Phi, v^*),$$

where $1 - k/m$ represents the error probability of chance performance, i.e., the error probability of a VN scheme in the independent Erdős–Rényi setting.

In the remainder of the section, we will suppress the dependence of $m = m_n$ on $n$. If we consider sequences $(\epsilon_{m,i})_{i=1}^m$ satisfying the assumptions of Theorem 23 and $\lim_{n \to \infty} \epsilon_{m, m - k_n} = \epsilon \in (0, 1)$ for a given $(k_n)$ satisfying $k_n = o(m)$, we arrive at the following corollary, namely that level-$(k_n)$ universally consistent VN schemes do not exist for any sequence $(k_n)_{n=n_0}^\infty$ that does not grow as fast as $m = |V(G_2)|$.
Corollary 24 Let $\epsilon \in (0, 1)$ be arbitrary, and consider a VN rule $\Phi = (\Phi_{n,m})$. For any nondecreasing sequence $(k_n)_{n=n_0}^\infty$ satisfying $k_n = o(m)$, there exists a sequence of distributions $\big(F^{(n,m)}_{c,\theta} \in \mathcal{N}_{n,m}\big)_{n=n_0}^\infty$ with nested cores such that

$$\limsup_{n \to \infty} L^*_{k_n}(v^*) \le \epsilon < 1 = \lim_{n \to \infty} L_{k_n}(\Phi_{n,m}, v^*).$$

Corollary 24 has a number of practical implications, two of which we briefly outline below.

Unlike in the classification setting, where universally consistent rules (e.g., $k$-nearest neighbors) are theoretically guaranteed to perform well in big-data settings, the VN practitioner enjoys no such certainty. Indeed, in VN, the practitioner first needs to identify the consistency class of a VN rule (i.e., the set of models for which the VN rule is consistent) before applying it in real settings. Unfortunately, identifying and enumerating these consistency classes is theoretically and practically nontrivial, and we are investigating theory and heuristics for this at present.

In a streaming data setting, the performance of a universally consistent classifier will approach Bayes optimality for the distribution governing the data, and the classifier will be guaranteed to successfully adapt itself to any changes in the underlying data distribution. The lack of universal consistency in the VN setting implies that this is not the case for graph data, as the performance of a consistent VN scheme in the streaming setting could precipitously decline in the presence of distributional shifts in the data. Recognizing these shifts and their potential impact on VN performance is paramount and is the subject of current research.

4.1. Global Consistency

We have just seen that no universally consistent VN schemes exist. This is a consequence of the complexity of the models available when choosing a sequence of nominatable distributions. Indeed, nested-core nominatable sequences $F$ allow for (nearly) arbitrary dependence structure and model complexity as $n$ increases: corresponding vertex behavior may be correlated (see Example 1), independent (see Example 4), or negatively correlated (see Example 5) across networks. This model flexibility is in service of modeling the complexity of real world networks, but, as we will demonstrate below, restricting our model class to simpler dependency structures still does not necessarily guarantee the existence of universally consistent schemes. It is thus natural to explore a weaker notion of consistency, namely consistency for a sufficiently large family of nominatable sequences rather than for all nominatable sequences.

Definition 25 Let $\mathcal{F} = \big\{ F_\alpha = \big(F^{(n, m_n), \alpha}_{c_n, \theta_n}\big) \big\}_{\alpha \in A}$ be a family of nominatable sequences, indexed by some set $A$. We say that a VN scheme $\Phi$ is level-$(k_n)$ $\mathcal{F}$-globally consistent if $\Phi$ is level-$(k_n)$ consistent for every $F \in \mathcal{F}$. We call such a family level-$(k_n)$ globally consistent.

The question of the maximal family $\mathcal{F}$ for which a level-$(k_n)$ $\mathcal{F}$-globally consistent rule exists is of prime interest. While we cannot offer a satisfactory complete answer to that question in the present work, we do offer some examples of globally consistent families.

Example 1, continued. In settings where corresponding vertices have correlated neighborhood structures across networks, there is hope for finding globally consistent rules. In the ongoing Example 1, we have seen a simple example of this in the $R$-$\mathrm{ER}(P)$ model, in which the matrix of correlations $R$ encodes a correspondence across the two graphs.
As mentioned previously, Theorem 19 asserts that under some mild model assumptions on $R$ and $P$ in the R-ER(P) model, level-1 globally consistent VN rules exist (namely, the graph matching VN scheme). If $\mathfrak{F}$ denotes the set of model sequences obeying these assumptions, then level-$(k_n)$ $\mathfrak{F}$-globally consistent rules exist for all sequences $(k_n)$. While we do not expect the conditions of Theorem 19 to produce a maximal level-$(k_n)$ globally consistent family for any given $(k_n)$, this example nonetheless provides important intuition for the properties such maximal families might possess.

Example 4 continued: The SBM provides a prime example of global consistency. Working in the one-graph framework of Remark 13, under appropriate growth conditions on the parameters of every sequence in the family $\mathfrak{F}$, Theorem 6 in Lyzinski et al. (2016) implies the existence of a likelihood-based nomination scheme that is level-$(|U_1|)$ globally consistent for this family of models. Under similar growth conditions, Theorem 6 in Lyzinski et al. (2014) implies the existence of a level-$(|U_1|)$ globally consistent scheme based on spectral clustering, in which vertices are nominated based on their proximity to the vertex or vertices of interest.

Remark 26 An attempt at systematically constructing a maximal globally consistent family might begin by placing model restrictions on the elements of $\mathcal{N}_{n,m}$. A natural restriction to consider would be to demand that the models in $\mathcal{F}$ be nested in the following sense: for $\mathcal{F}\in\mathfrak{F}$, if $(G_1,G_2)\sim F^{(n_2,m_{n_2})}_{c_{n_2},\theta_{n_2}}$ with $n_1<n_2$, then
$$\bigl(G_1[n_1],\,G_2[m_{n_1}]\bigr)\ \stackrel{\mathcal{L}}{=}\ (G'_1,G'_2),\qquad\text{where }(G'_1,G'_2)\sim F^{(n_1,m_{n_1})}_{c_{n_1},\theta_{n_1}}$$
and $G[k]$ denotes the subgraph induced on the first $k$ vertices. This property would allow us to consider streaming network models $\mathcal{F}$, where for $n_1<n_2$, if $(g_1,g_2)$ is realized from $(G_1,G_2)\sim F^{(n_2,m_{n_2})}_{c_{n_2},\theta_{n_2}}$ and $(g'_1,g'_2)$ is realized from $(G'_1,G'_2)\sim F^{(n_1,m_{n_1})}_{c_{n_1},\theta_{n_1}}$, then $(g_1,g_2)$ can be constructed by appropriately adding $n_2-n_1$ vertices to $(g'_1,g'_2)$. Additionally, this would serve to mimic the nested nature of the data in the classification consistency literature. However, as we will see in Example 5, global consistency depends both on the dependency structure within each graph (as seen in Theorem 23) and on the vertex correspondence (i.e., the potential dependency structure across graphs) encoded in the model.

4.2. Behavioral (In)consistency and Global (In)consistency

We suspect that if the vertices of interest have a common distinguishing probabilistic and/or topological characteristic (e.g., correlated neighborhoods, common SBM block structure, high network centrality, etc.), then a globally consistent rule may exist. Indeed, under mild model assumptions, this is the case in the R-ER(P) model of Example 1; in the i.i.d. SBM of Example 4, where the correspondence is the identity function (Lyzinski et al., 2014); and in the i.i.d. ER model of Example 3, to name a few. In each of these examples, there is a stochastic/topological similarity (or, in the ER case, uniformity) between corresponding vertices across networks: corresponding vertices behave similarly across networks. A minimal sketch of the spectral nomination idea used in the SBM example is given below.
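The sketch below illustrates the spectral idea in the simplest possible setting. It is a sketch of ours under simplifying assumptions: one observed graph as in Remark 13, a known embedding dimension $d$, and nomination by Euclidean distance to a single seed vertex; it is not the exact scheme analyzed in Lyzinski et al. (2014).

```python
import numpy as np

def spectral_nomination(A, v_star, d=2):
    """Rank the vertices of a single graph by distance, in an adjacency
    spectral embedding, to a seed vertex of interest v_star.
    A: symmetric (n, n) adjacency matrix. Returns vertex indices,
    most promising nominees first (excluding v_star itself)."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:d]             # top-d eigenpairs by magnitude
    X = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))   # one embedded row per vertex
    dist = np.linalg.norm(X - X[v_star], axis=1)
    order = np.argsort(dist, kind="stable")
    return order[order != v_star]
```

Vertices in the same block as the seed concentrate near it in the embedding, so they dominate the top of the nomination list.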
While we suspect that this behavioral similarity is not sufficient for global consistency, Example 5 demonstrates that behavioral inconsistencies within a family of nominatable distributions can preclude the existence of globally consistent nomination rules.

Example 5 For each $n$, consider $n$-vertex random graphs $G_1\sim$ asym-SBM$(2,B_1,b^{(1)}_n)$ independent of $G_2\sim$ asym-SBM$(2,B_2,b^{(2)}_n)$, where asym-SBM denotes the stochastic blockmodel distribution restricted to have support on asymmetric graphs. This restriction is made to avoid the unpleasantries of symmetries, and is asymptotically negligible, as the SBMs considered in this example are asymptotically almost surely asymmetric.

Case 1. In this case, corresponding vertices behave similarly across networks. To wit, let $\mathcal{F}=(F_n)_{n=n_0}^{\infty}$ be the sequence of models where
$$B_1=B_2=\begin{pmatrix}p_1&q\\ q&p_2\end{pmatrix},\qquad b^{(1)}_n(i)=b^{(2)}_n(i)=\begin{cases}1&\text{if }i\le n/2;\\ 2&\text{if }i>n/2,\end{cases}$$
$p_1\ne p_2$, $c=n$, and the correspondence is the identity function. As stated before, in this model $L^*_{n/2}(v^*)\to 0$ for all $v^*\in C$. Without loss of generality, consider $v^*=v_1=u_1=1$. If $\Phi$ is consistent with respect to $\mathcal{F}$, then
$$P_{F_n}\bigl(\mathrm{rank}_{\Phi(G_1,o(G_2),v_1)}(o(u_1))\ge n/2+1\bigr)\to 0.$$
By the distributional equivalence of vertices within the same block, and the consistency property in the definition of a VN scheme, for any $u,v$ in block 1 of $G_2$ and any $k\in[n]$ we have that
$$P_{F_n}\bigl(\mathrm{rank}_{\Phi(G_1,o(G_2),v_1)}(o(u))=k\bigr)=P_{F_n}\bigl(\mathrm{rank}_{\Phi(G_1,o(G_2),v_1)}(o(v))=k\bigr).$$
Letting this common value be denoted $\alpha_{k,n}$ (with $\beta_{k,n}$ defined analogously as the common value of $P_{F_n}(\mathrm{rank}_{\Phi(G_1,o(G_2),v_1)}(o(u))=k)$ for $u$ in block 2), we have that
$$\sum_{i=1}^{n}P_{F_n}\bigl(\mathrm{rank}_{\Phi(G_1,o(G_2),v_1)}(o(u_i))=k\bigr)=1=\frac{n}{2}\bigl(\alpha_{k,n}+\beta_{k,n}\bigr),$$
giving us that $\alpha_{k,n}=2/n-\beta_{k,n}$. Consistency implies that $\sum_{k=1}^{n/2}\alpha_{k,n}\to 1$, which implies that
$$\sum_{k=1}^{n/2}\bigl(2/n-\beta_{k,n}\bigr)=1-\sum_{k=1}^{n/2}\beta_{k,n}\to 1,$$
implying $\sum_{k=1}^{n/2}\beta_{k,n}\to 0$. Therefore, for any $u$ in block 2,
$$P_{F_n}\bigl(\mathrm{rank}_{\Phi(G_1,o(G_2),v_1)}(o(u))\ge n/2+1\bigr)\to 1.$$

Case 2. In this case, corresponding vertices behave differently across networks. To wit, let $\widetilde{\mathcal{F}}=(\widetilde F_n)_{n=n_0}^{\infty}$ be the sequence of models where
$$B_1=\begin{pmatrix}p_1&q\\ q&p_2\end{pmatrix},\qquad B_2=\begin{pmatrix}p_2&q\\ q&p_1\end{pmatrix},\qquad b^{(1)}_n(i)=b^{(2)}_n(i)=\begin{cases}1&\text{if }i\le n/2;\\ 2&\text{if }i>n/2,\end{cases}$$
$c=n$, and the correspondence is the identity function. As in Case 1, in this model $L^*_{n/2}(v^*)\to 0$ for all $v^*\in C$, and, as above, consider $v^*=v_1=u_1$. If $\Phi$ is consistent with respect to $\widetilde{\mathcal{F}}$, then
$$P_{\widetilde F_n}\bigl(\mathrm{rank}_{\Phi(G_1,o(G_2),v_1)}(o(u_1))\ge n/2+1\bigr)\to 0.$$
Note that if $\sigma$ is the permutation defined by
$$\sigma(i)=\begin{cases}i+n/2&\text{if }i\le n/2;\\ i-n/2&\text{if }i>n/2,\end{cases}$$
then $P_{F_n}(g_1,g_2)=P_{\widetilde F_n}(g_1,\sigma(g_2))$. Define
$$E_n=\bigl\{(g_1,g_2)\ \text{s.t.}\ \mathrm{rank}_{\Phi(g_1,o(g_2),v_1)}(o(u_1))\ge n/2+1\bigr\}$$
and $\widetilde E_n=\{(g_1,g_2)\ \text{s.t.}\ (g_1,\sigma(g_2))\in E_n\}$, i.e.,
$$\widetilde E_n=\bigl\{(g_1,g_2)\ \text{s.t.}\ \mathrm{rank}_{\Phi(g_1,o(g_2),v_1)}(o(u_{n/2+1}))\ge n/2+1\bigr\}.$$
If $\Phi$ is consistent with respect to $\mathcal{F}$, we have that $P_{F_n}(E_n)\to 0$, which implies (as $(g_1,g_2)\in\widetilde E_n\Leftrightarrow(g_1,\sigma(g_2))\in E_n$) that $P_{\widetilde F_n}(\widetilde E_n)\to 0$. On the other hand, if $\Phi$ is consistent with respect to $\widetilde{\mathcal{F}}$, then $P_{\widetilde F_n}(E_n)\to 0$ and, by the Case 1 argument applied under $\widetilde F_n$ (where $u_{n/2+1}$ lies in block 2 of $G_2$), $P_{\widetilde F_n}(\widetilde E_n)\to 1$. We arrive at a contradiction, and $\Phi$ cannot be level-$(n/2)$ consistent with respect to both $\mathcal{F}$ and $\widetilde{\mathcal{F}}$.

Although Example 5 may seem artificial, it is a simple representation of a common phenomenon observed in network data: often the same entity can behave quite differently across networks; see Patsolic et al. (2017) for an example of this in social networks and Chen et al. (2016b) for an example in connectomics. In such a setting, intuition says that a universal scheme that works in both behavioral settings should not exist. Indeed, at least in the simple block model setting considered above, we see that no such scheme exists. Example 5 also highlights an important difference between the VN setting and the more standard classification framework. We already noted that classification's universal consistency relies on the distribution not changing in $n$, whereas in VN the distributions must vary with $n$ (indeed, the graph sizes grow in $n$). A minimal simulation contrasting the two cases of Example 5 is sketched below.
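The behavioral flip in Example 5 can be seen numerically. The following simulation (ours; the parameter values are arbitrary illustrations, and the rule is label-agnostic, so obfuscation is irrelevant and omitted) applies one fixed rule, ranking $V(G_2)$ by how close each degree is to the degree of $v^*$ in $G_1$, to both cases; it places the true corresponding vertex in the top half almost always in Case 1 and almost never in Case 2.

```python
import numpy as np

rng = np.random.default_rng(2)

def sbm(n, B, blocks):
    """Sample an SBM adjacency matrix with block membership vector `blocks`."""
    P = B[np.ix_(blocks, blocks)]
    A = np.triu(rng.random((n, n)) < P, 1)
    return (A + A.T).astype(int)

def top_half_rate(B1, B2, n=200, trials=200):
    """How often the fixed rule -- rank V(G2) by |deg_G2(u) - deg_G1(v*)| --
    puts the true corresponding vertex u1 = v* = 0 in the top n/2."""
    blocks = np.array([0] * (n // 2) + [1] * (n // 2))
    hits = 0
    for _ in range(trials):
        g1, g2 = sbm(n, B1, blocks), sbm(n, B2, blocks)
        target = g1.sum(axis=0)[0]                    # degree of v* in G1
        gap = np.abs(g2.sum(axis=0) - target)
        order = np.argsort(gap, kind="stable")
        hits += np.where(order == 0)[0][0] < n // 2
    return hits / trials

p1, p2, q = 0.6, 0.2, 0.3
B = np.array([[p1, q], [q, p2]])
B_swapped = np.array([[p2, q], [q, p1]])
print(top_half_rate(B, B))          # Case 1: near 1
print(top_half_rate(B, B_swapped))  # Case 2: near 0
```

A rule tuned to succeed in Case 2 would fail in Case 1 in the same way, which is the obstruction formalized in the example.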
Further, Example 5 shows that the nonexistence of a universally consistent scheme is not simply a consequence of changing the underlying distributional parameters with $n$, as these two SBM distributions are (essentially) fixed, in that the matrix $B$ does not change with $n$. In this example, the training data provided by $G_1$ cannot be uniformly beneficial to a single VN scheme across the two differing model settings we consider. Contrast this with the classification setting of Devroye et al. (1997), where the training data uniformly provides progressively better estimates of the class-conditional distributions. Indeed, the training data helps delineate potentially interesting vertices from non-interesting ones in $G_2$ in Case 1 for one VN scheme, and in Case 2 for another VN scheme, but there does not exist a single VN scheme that achieves this desired class separation across both cases.

Remark 27 In the cases considered in Example 5, if we introduce positive edge-wise correlation
$$\rho\le\min\Bigl(\sqrt{\tfrac{p_1(1-p_2)}{p_2(1-p_1)}},\,\sqrt{\tfrac{p_2(1-p_1)}{p_1(1-p_2)}}\Bigr)$$
(the maximal correlation attainable between Bernoulli$(p_1)$ and Bernoulli$(p_2)$ edges) into both Case 1 and Case 2, then under mild assumptions on the growth of $p_1$ and $p_2$, joint consistency can be recovered via a USVT-centered graph matching nomination scheme (see Lyzinski and Sussman, 2018, for details). This example demonstrates that it is sometimes possible to adjust a family of models to allow for global consistency. A note of caution is needed, however: in this particular example, the correlation $\rho$ introduces a behavioral consistency across networks that addresses precisely the behavioral inconsistency brought forth by Example 5. In other, more nuanced model families, we do not expect the global-consistency modification (if it indeed exists) to be as straightforward as adding additional edge-correlation to the model.

4.3. Vertex Nomination on Networks with Node Covariates

It is natural to ask whether incorporating vertex features into the VN framework can resolve the lack of universally consistent VN schemes. While straightforward to implement, the ameliorating effect of features is significantly more nuanced. Before defining the VN scheme with features, we need the following extension of $I(v;g)$ for $g\in\mathcal{G}_n$ and $v\in V(g)$. Letting $\mathcal{X}$ be the space of vertex features for graphs in $\mathcal{G}_n$, for $g\in\mathcal{G}_n$, $v\in V(g)$, and $X\in\mathcal{X}^n$, we define
$$\widetilde I(v;g,X)=\bigl\{u\in V(g)\ :\ \exists\ \text{an automorphism }\tau\text{ of }g\ \text{s.t.}\ \tau(v)=u\ \text{and}\ X_u=X_v\bigr\},$$
where $X_v$ is the feature associated to $v$ via $X$.

Definition 28 (Vertex Nomination (VN) Scheme with Features) Let $\mathcal{X}$ (resp., $\mathcal{Y}$) be the space of vertex features of graphs in $\mathcal{G}_n$ (resp., $\mathcal{G}_m$). For $n,m>0$ fixed, obfuscating set $W$ and obfuscating function $o\in\mathcal{O}_W$, a vertex nomination scheme with features is a function
$$\Phi:\mathcal{G}_n\times\mathcal{X}^n\times o(\mathcal{G}_m)\times\mathcal{Y}^m\times V_1\to\mathcal{T}_W$$
satisfying the following consistency property. If for each $u\in V_2$ we define $\mathrm{rank}_{\Phi(g_1,X,o(g_2),o(Y),v^*)}(o(u))$ to be the position of $o(u)$ in the total ordering provided by $\Phi(g_1,X,o(g_2),o(Y),v^*)$, and we define $r_\Phi:\mathcal{G}_n\times\mathcal{X}^n\times\mathcal{G}_m\times\mathcal{Y}^m\times\mathcal{O}_W\times V_1\times 2^{V_2}\to 2^{[m]}$ via
$$r_\Phi\bigl(g_1,X,o(g_2),o(Y),v^*,S\bigr)=\bigl\{\mathrm{rank}_{\Phi(g_1,X,o(g_2),o(Y),v^*)}(o(u))\ \text{s.t.}\ u\in S\bigr\},$$
then we require that for any $g_1\in\mathcal{G}_n$, $g_2\in\mathcal{G}_m$, $v^*\in V_1$, $X\in\mathcal{X}^n$, $Y\in\mathcal{Y}^m$, obfuscating functions $o_1,o_2\in\mathcal{O}_W$, and any $u\in V(g_2)$,
$$r_\Phi\bigl(g_1,X,o_1(g_2),o_1(Y),v^*,\widetilde I(u;g_2,Y)\bigr)=r_\Phi\bigl(g_1,X,o_2(g_2),o_2(Y),v^*,\widetilde I(u;g_2,Y)\bigr).\qquad(13)$$
We let $\widetilde{\mathcal{V}}_{n,m}$ denote the set of all such VN schemes.

It is immediate that if the features are sufficiently informative, consistency can be established with features where it could not be without; a minimal sketch of one feature-augmented rule is given below.
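As a toy illustration of the idea behind Definition 28 (a hypothetical rule of ours, not a scheme defined in the paper), one can rank candidates lexicographically: first by agreement with the features of $v^*$, then by a structural score such as degree similarity.

```python
import numpy as np

def feature_augmented_nomination(A1, X, A2, Y, v_star):
    """Hypothetical feature-augmented VN sketch: rank V(G2) primarily by
    feature agreement with v_star, and secondarily by degree similarity.
    A1, A2: adjacency matrices; X, Y: 1-d feature vectors for V(G1), V(G2).
    Returns vertex indices of G2, most promising nominees first."""
    feat_mismatch = (Y != X[v_star]).astype(float)   # 0 where features agree
    deg_gap = np.abs(A2.sum(axis=0) - A1.sum(axis=0)[v_star])
    # np.lexsort sorts by the last key first: features, then structure
    return np.lexsort((deg_gap, feat_mismatch))
```

The lexicographic design encodes the intuition that exact feature agreement is stronger evidence than structural similarity; any monotone combination of the two scores would serve equally well for illustration.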
Indeed, consider in Example 5 features that encode the community memberships of a few vertices (e.g., a few vertices whose correspondences across the two graphs are known a priori). Combined with spectral methods, these would be sufficient for consistent VN under either behavioral regime. It is also immediate that the fundamental idea presented in Example 5 has an analogue when vertex features are available, as illustrated by the following example.

Example 6 For each $n$, consider $n$-vertex random graphs $G_1\sim$ asym-SBM$(3,B_1,b^{(1)}_n)$ independent of $G_2\sim$ asym-SBM$(3,B_2,b^{(2)}_n)$, where asym-SBM again indicates the stochastic block model with support restricted to the asymmetric graphs.

Case 1. In this case, corresponding vertices behave similarly across networks. To wit, let $\mathcal{F}=(F_n)_{n=n_0}^{\infty}$ be the sequence of models where $3\,|\,n$,
$$B_1=B_2=\begin{pmatrix}p_1&q&q\\ q&p_2&q\\ q&q&p_2\end{pmatrix},\qquad b^{(1)}_n(i)=b^{(2)}_n(i)=\begin{cases}1&\text{if }i\le n/3;\\ 2&\text{if }i\in(n/3,\,2n/3];\\ 3&\text{if }i>2n/3,\end{cases}$$
$p_1\ne p_2$, $c=n$, the correspondence is the identity function, and $v^*=1$ is in block 1.

Case 2. In this case, corresponding vertices behave differently across networks. To wit, let $\widetilde{\mathcal{F}}=(\widetilde F_n)_{n=n_0}^{\infty}$ be the sequence of models where $3\,|\,n$,
$$B_1=\begin{pmatrix}p_1&q&q\\ q&p_2&q\\ q&q&p_2\end{pmatrix},\qquad B_2=\begin{pmatrix}p_2&q&q\\ q&p_1&q\\ q&q&p_2\end{pmatrix},\qquad b^{(1)}_n(i)=b^{(2)}_n(i)=\begin{cases}1&\text{if }i\le n/3;\\ 2&\text{if }i\in(n/3,\,2n/3];\\ 3&\text{if }i>2n/3,\end{cases}$$
$p_1\ne p_2$, $c=n$, the correspondence is the identity function, and $v^*=1$ is in block 1.

Similar to Example 5, without features no VN scheme can be level-$(n/3)$ consistent for both Cases 1 and 2 simultaneously. In a similar fashion, if we consider features $X$ and $Y$ defined via
$$X_v=Y_v=\begin{cases}1&\text{if }b^{(1)}(v)=1;\\ -1&\text{if }b^{(1)}(v)\in\{2,3\},\end{cases}$$
then joint consistency is achievable for both Cases 1 and 2, for example by relying on the features and ignoring the graph structure. However, if we instead consider features $X$ and $Y$ defined via
$$X_v=Y_v=\begin{cases}1&\text{if }b^{(1)}(v)\in\{1,2\};\\ -1&\text{if }b^{(1)}(v)=3,\end{cases}$$
then joint consistency is again not achievable for both Cases 1 and 2 simultaneously. This example demonstrates that features, in general, are not enough to ensure universal consistency. Nevertheless, insofar as features supply additional information, they have the potential to improve VN performance. A more thorough examination of the effect and effectiveness of vertex features in VN is beyond the scope of this work, and is the subject of current research.

5. Discussion

In this work, we have introduced a notion of consistency for the vertex nomination task that better reflects the broad range of models under which VN may be deployed. Rather than being restricted to the stochastic block model structure required in previous formulations of the problem, our framework allows for arbitrary dependence structure both within and between graph pairs, while encompassing the original SBM formulation of the problem. Additionally, we have demonstrated how this framework relates to the well-studied notion of Bayes optimal classifiers in the pattern recognition literature. Unlike in the classification setting, we have seen that while Bayes optimal VN schemes always exist, no universally consistent scheme exists. This fact is due essentially to the additional leeway provided by the graph model, in which observing more vertices does not necessarily correspond to receiving more information about the underlying distribution. This is in contrast to the classification setting studied in Stone (1977) and others (Devroye et al., 1997), in which observing more samples allows more accurate estimation of the underlying distribution and class boundary.
For this reason, one especially interesting line of investigation concerns the nominatable distributions for which larger $n$ does indeed correspond to more information about the underlying graph distribution. A simple example of this is the initial formulation of the vertex nomination problem, in which observing more vertices allows one to better estimate the model parameters, including the block memberships, and thus to more accurately identify the vertices from the interesting block. We suspect that the essential property at play here is that under models of this sort, each vertex is analogous to a sample from a single distribution, though this may not be, in and of itself, a sufficient condition for consistency. For example, in the case of $(G_1,G_2)$ being i.i.d. or $\rho$-correlated marginally identical draws from a random dot product graph model with the identity correspondence, each vertex (along with its incident edges) is, in a sense, a noisy sample from the underlying latent position distribution. Hence, for large $n$, one can estimate the latent positions or their distribution to arbitrary accuracy, and provided that the latent positions of the interesting vertices are suitably separated from those of the rest of the graph, one should have VN consistency for the collection of these latent position models. More broadly, it would be valuable to better understand whether there exist families of nominatable distributions $\mathfrak{F}$ for which certain VN schemes are consistent, and precisely how large these families can be made to be. In a similar vein, it would be of interest to explore how the dependence structure allowed both within and between graphs influences vertex nomination. In particular, if one rules out certain pathologically hard dependence structures such as those considered in Example 5, can one obtain global consistency with respect to this restricted set of distributions? We hope to explore these questions in future work.

We are also exploring alternative formulations of the VN problem and alternative formulations of the VN loss function. While the extension to multiple vertices of interest in each network $G_1$ and $G_2$ is straightforward, we are considering several generalizations of the VN problem considered here. One formulation of prime interest in applications (especially in connectomics and social networks) is as follows: given a collection of vertices of interest in one graph, find those vertices that play a similar structural role (based on the topology of the underlying network) or functional role (based on vertex or edge covariates) in the other graph. In addition, as seen in Section 4.3, the impact on VN consistency (and on the potential existence of universally consistent schemes) of incorporating edge and vertex covariates into the VN framework is of prime interest, and a deeper analysis of the VN inference task in this framework is the subject of our current and future work.

The loss function considered in the present work is an analogue of the 0/1 recall-at-$k$ loss function in the information retrieval literature; an empirical version is sketched below. Under this loss function, we have shown that no universally consistent VN rule exists. It is natural to ask whether alternative loss functions can be considered under which universal consistency is achievable. While we conjecture that Example 5 will nearly always provide a counterexample to universal consistency, this question remains open and is the subject of current research.
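For concreteness, the level-$k$ error used throughout, the analogue of 0/1 recall-at-$k$, can be estimated from Monte Carlo nomination experiments as follows (a minimal helper of ours; the function name is illustrative):

```python
import numpy as np

def level_k_error(rank_positions, k):
    """Empirical level-k error: the fraction of trials in which the true
    corresponding vertex was NOT among the top k nominees.
    rank_positions: iterable of 1-indexed ranks of the true vertex,
    one entry per Monte Carlo trial."""
    ranks = np.asarray(list(rank_positions))
    return float(np.mean(ranks > k))

# Example: ranks of the true vertex across six trials
print(level_k_error([1, 3, 12, 2, 40, 5], k=10))  # 2 of 6 misses -> 0.333...
```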
Acknowledgments

This work is supported in part by the D3M program of the Defense Advanced Research Projects Agency (DARPA), by NSF grant DMS-1646108, and by the Air Force Research Laboratory and DARPA under agreement number FA8750-18-2-0035. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory and DARPA, or the U.S. Government.

Appendix A. Proofs of Main Results

Here we collect the proofs of the two main theorems in this work, Theorems 17 and 23.

A.1. Theorem 17 and Remark 18

In this section, we present proofs of Theorem 17 and the claim in Remark 18.

Proof [Proof of Theorem 17] Let $\Phi$ be an arbitrary VN scheme and $\Phi^*$ the scheme defined at Eq. (9). Recall the definition
$$(g_1,[o(g_2)])_{w=o(v^*)}=\bigl\{(g_1,\widetilde g_2)\ \text{s.t.}\ \exists\ \text{iso.}\ \sigma\ \text{with}\ o(\widetilde g_2)=\sigma(o(g_2)),\ \text{and}\ \sigma(w)=o(v^*)\bigr\}.$$
With $g$ defined as in the theorem, note that for each $i\in[h]$,
$$\begin{aligned}
U^j_{i,g}:&=\bigl\{(g_1,g_2)\in\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\ \text{s.t.}\ \mathrm{rank}_{\Phi(g_1,o(g_2),v^*)}(o(v^*))=j\bigr\}\\
&=\bigl\{(g_1,g_2)\in\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\ \text{s.t.}\ \Phi(g_1,o(g_2),v^*)[j]=o(v^*)\bigr\}\\
&=\bigl\{(g_1,g_2)\in\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\ \text{s.t.}\ \exists\ \text{iso.}\ \sigma\ \text{s.t.}\ \sigma(o(g^{(i)}_2))=o(g_2)\ \text{and}\ \sigma\bigl(\Phi(g^{(i)}_1,o(g^{(i)}_2),v^*)[j]\bigr)=o(v^*)\bigr\}\\
&=\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)_{\Phi(g^{(i)}_1,o(g^{(i)}_2),v^*)[j]=o(v^*)}.
\end{aligned}$$
For each $i\in[h]$ define $p_{i,\Phi}\in[0,1]^m$ via
$$p_{i,\Phi}[j]:=P_{F^{(n,m)}_{c,\theta}}\Bigl(U^j_{i,g}\ \Big|\ \bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\Bigr).$$
Observe that for each $i\in[h]$, $p_{i,\Phi^*}$ majorizes $p_{i,\Phi}$. To see this, note that for any fixed $\xi\in[m]$, letting $q^{\xi}_{i,\Phi}$ be $(p_{i,\Phi}[j])_{j=1}^{\xi}$ with entries sorted in descending order, we have $p_{i,\Phi^*}[j]\ge q^{\xi}_{i,\Phi}[j]$ for all $j\in[\xi]$, and majorization follows immediately. With $\mathcal{P}_g$ denoting the partition induced by $g$, this majorization property implies
$$\begin{aligned}
L_\xi(\Phi,v^*)&=1-\sum_{j=1}^{\xi}P\bigl(\mathrm{rank}_{\Phi(G_1,o(G_2),v^*)}(o(v^*))=j\bigr)\\
&=1-\sum_{i=1}^{h}\sum_{j=1}^{\xi}P\Bigl[U^j_{i,g}\ \Big|\ \bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\Bigr]\,P\Bigl[\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\Bigr]\\
&=1-\sum_{i=1}^{h}\Bigl(\sum_{j=1}^{\xi}p_{i,\Phi}[j]\Bigr)P\Bigl[\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\Bigr]\\
&\ge 1-\sum_{i=1}^{h}\Bigl(\sum_{j=1}^{\xi}p_{i,\Phi^*}[j]\Bigr)P\Bigl[\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\Bigr]=L_\xi(\Phi^*,v^*).
\end{aligned}$$
As $\Phi$, $g$, and $o$ were arbitrary, the proof follows.

Proof [Proof of Remark 18] Fix $i$, and let $\xi_i=|\Psi(g^{(i)}_1,g^{(i)}_2,v^*)|$. Note that for each $j\le\xi_i$, the set of graphs $(g_1,g_2)\in\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)$ for which $\Psi(g_1,g_2,v^*)[j]=I(o(v^*);o(g_2))$ is precisely $\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)_{\Psi(g^{(i)}_1,o(g^{(i)}_2),v^*)[j]=o(v^*)}$. If the tie-breaking scheme $T$ satisfies $T[1]=o(v^*)$, then the set of graphs $(g_1,g_2)\in\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)$ for which $\Phi_T(g_1,g_2,v^*)[j]=o(v^*)$ is then also $\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)_{\Psi(g^{(i)}_1,o(g^{(i)}_2),v^*)[j]=o(v^*)}$. The proof proceeds as follows.
For an arbitrary VN scheme $\Phi$ and each $i\in[h]$, writing $\mathcal{C}_i$ as shorthand for $\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)$, we have
$$\begin{aligned}
U^j_{i,g}:&=\bigl\{(g_1,g_2)\in\mathcal{C}_i\ \text{s.t.}\ \mathrm{rank}_{\Phi(g_1,o(g_2),v^*)}(o(v^*))=j\bigr\}\\
&\subseteq\bigl\{(g_1,g_2)\in\mathcal{C}_i\ \text{s.t.}\ \exists\ \text{iso.}\ \sigma\ \text{s.t.}\ \sigma(o(g^{(i)}_2))=o(g_2)\ \text{and}\ \sigma(u)=o(v^*)\ \text{for some}\ u\in I\bigl(\Phi(g^{(i)}_1,o(g^{(i)}_2),v^*)[j];\,o(g^{(i)}_2)\bigr)\bigr\}\\
&=(\mathcal{C}_i)_{I(\Phi(g^{(i)}_1,o(g^{(i)}_2),v^*)[j];\,o(g^{(i)}_2))=o(v^*)}.
\end{aligned}$$
We then have that $T[1]=o(v^*)$ implies
$$\begin{aligned}
1-L_\ell(\Phi,v^*)&=\sum_{i}P[\mathcal{C}_i]\sum_{j=1}^{\ell}P\bigl[U^j_{i,g}\ \big|\ \mathcal{C}_i\bigr]\\
&\le\sum_{i}P[\mathcal{C}_i]\sum_{j\in J_i}P\Bigl[(\mathcal{C}_i)_{I(\Phi(g^{(i)}_1,o(g^{(i)}_2),v^*)[j];\,o(g^{(i)}_2))=o(v^*)}\ \Big|\ \mathcal{C}_i\Bigr]\\
&=\sum_{i}P[\mathcal{C}_i]\sum_{j\in J_i}P\Bigl[(\mathcal{C}_i)_{\Psi(g^{(i)}_1,o(g^{(i)}_2),v^*)[j]=o(v^*)}\ \Big|\ \mathcal{C}_i\Bigr]\\
&=\sum_{i}P[\mathcal{C}_i]\,P\Bigl[\bigl\{(g_1,g_2)\in\mathcal{C}_i\ \text{s.t.}\ \Phi_T(g_1,o(g_2),v^*)[j]=o(v^*)\ \text{for some}\ j\in J_i\bigr\}\ \Big|\ \mathcal{C}_i\Bigr]\\
&\le 1-L_\ell(\Phi_T,v^*),
\end{aligned}$$
where $J_i\subseteq[\ell]$ is the lexicographically smallest set of indices for which the sets $(\mathcal{C}_i)_{I(\Phi(g^{(i)}_1,o(g^{(i)}_2),v^*)[j];\,o(g^{(i)}_2))=o(v^*)}$ are distinct.

A.2. Proof of Theorem 23

Proof Define a probability vector $\xi\in\mathbb{R}^m$ by $\xi_i=\epsilon_i-\epsilon_{i-1}$ for $i\in[m-1]$ (where we take $\epsilon_0:=0$), and let $\xi_m=1-\epsilon_{m-1}$. Consider asymmetric graphs $(g_1,g_2)\in\mathcal{G}_n\times\mathcal{G}_m$, and construct a distribution $F^{(n,m)}_{c,\theta}\in\mathcal{N}_{n,m}$ as follows:

i. $c=n\wedge m$;

ii. the support of $F^{(n,m)}_{c,\theta}$ is $(g_1,[o(g_2)])$;

iii. for each $k\in[m]$, define
$$R_{\Phi,k}=\bigl\{(g_1,\widetilde g_2)\in(g_1,[o(g_2)])\ \text{s.t.}\ \Phi(g_1,o(\widetilde g_2),v^*)[k]=o(v^*)\bigr\},$$
and set $P_{F^{(n,m)}_{c,\theta}}(R_{\Phi,k}):=\xi[k]$, with all elements of $R_{\Phi,k}$ assigned equal mass under $F^{(n,m)}_{c,\theta}$.

It is then clear that $L_k(\Phi,v^*)=1-\sum_{i=1}^{k}\xi_i=1-\epsilon_k>1-k/m$. It is also clear that $L^*_k(v^*)\le\epsilon_{m-k}$: indeed, consider the scheme $\Phi'$ defined by reversing the order provided by $\Phi$; then $L_k(\Phi',v^*)=\epsilon_{m-k}$, which completes the proof.

Appendix B. Proof of Theorem 19

Herein we provide a sketch of the proof of Theorem 19 for completeness. Let $Q$ be a permutation matrix in $\Pi_n$ (with associated permutation $\sigma_Q$) that permutes precisely $k$ labels (i.e., $\sum_i Q_{i,i}=n-k$), let $T$ denote the set of transpositions induced by $\sigma_Q$, and note that $|T|\le k/2$. By exploiting the cyclic structure of $Q$ acting on $\mathrm{vec}(B)$, we have
$$\mathbb{E}\,\delta(Q)-\mathbb{E}\,\delta(I_n)=\mathbb{E}\|AQ-QB\|_F^2-\mathbb{E}\|A-B\|_F^2\ \ge\ \epsilon\Bigl((n-k)k+\binom{k}{2}\Bigr).$$
Combining this expectation bound with the following McDiarmid-like concentration result will yield the proof of Theorem 19.

Proposition 29 (Proposition 3.2 from Kim et al. (2002)) Let $X_1,\dots,X_n$ be a sequence of independent Bernoulli random variables with $\mathbb{E}[X_i]=p_i$. Let $f:\{0,1\}^n\to\mathbb{R}$ be such that changing any $X_i$ to $1-X_i$ changes $f$ by at most
$$M=\sup_{i}\ \sup_{X_1,\dots,X_n}\bigl|f(X_1,\dots,X_i,\dots,X_n)-f(X_1,\dots,1-X_i,\dots,X_n)\bigr|.$$
Let $\sigma^2=M^2\sum_i p_i(1-p_i)$ and let $Y=f(X_1,\dots,X_n)$. Then
$$\Pr\bigl[|Y-\mathbb{E}[Y]|\ge t\sigma\bigr]\le 2e^{-t^2/4}\qquad\text{for all }0<t<2\sigma/M.$$

Indeed, we see that $X_Q:=\delta(Q)-\delta(I_n)$ is a function of $N_Q$ independent Bernoulli random variables, where
$$N_Q=3\binom{k}{2}+k(n-k)-|T|\le 3kn.$$
Let $S_Q$ be the sum of these $N_Q$ Bernoulli random variables; it follows that $\mathrm{Var}(S_Q)\le N_Q/4$. Setting $t=C\epsilon kn/\sigma$ for an appropriate constant $C>0$ in Proposition 29, we have, for $n$ sufficiently large,
$$\Pr(X_Q\le 0)\le\Pr\bigl(|X_Q-\mathbb{E}(X_Q)|\ge\mathbb{E}(X_Q)\bigr)\le 2\exp\bigl(-\Theta(\epsilon^2 kn)\bigr).$$
A union bound over all such $Q$ (of which there are at most $n^k$) and over $k$ yields
$$P\bigl(\exists\,Q\in\Pi_n\setminus\{I_n\}\ \text{s.t.}\ \delta(Q)\le\delta(I_n)\bigr)\le\sum_{k=2}^{n}2\exp\bigl(k\log(n)-\Theta(\epsilon^2 kn)\bigr)=2\exp\bigl(-\Theta(\epsilon^2 n)\bigr),$$
as desired.

B.1. VN Schemes with Ties

We can incorporate ties into the VN framework as follows. With ties allowed, any sensibly defined vertex nomination scheme should view all vertices in $I(u;g)$ as being equally good matches to a vertex of interest $v^*$.
To this end, we will view VN schemes as providing weak orderings of the elements of $W$.

Definition 30 For a set $A$, let $\mathcal{W}_A$ denote the set of all weak orderings of the elements of $A$ (i.e., the set of all total orderings in which ties are allowed). For $x\in\mathcal{W}_A$, let $t_x$ be any maximum-length total ordering induced by $x$. For each $a\in A$, we define $\mathrm{rank}_x(a)=\mathrm{rank}_{t_x}(a')$, where $a'=a$ according to the ordering $x$.

Example 7 If $A=\{a,b,c,d,e\}$ and $x:a>c>d=e>b$, then $t_x:a>c>d>b$ or $t_x:a>c>e>b$; in either case, $\mathrm{rank}_x(a)=1$, $\mathrm{rank}_x(c)=2$, $\mathrm{rank}_x(d)=3$, $\mathrm{rank}_x(e)=3$, and $\mathrm{rank}_x(b)=4$.

A well-defined VN scheme should be label-independent in the following sense: each element of each $I(o(u);o(g_2))$ should be ranked identically by $\Phi$, and these ranks should be independent of the obfuscating function $o$. Formally, we have the following.

Definition 31 (Vertex Nomination (VN) Scheme) For $n,m>0$ fixed, obfuscating set $W$, and $o\in\mathcal{O}_W$, a vertex nomination scheme is a function $\Phi:\mathcal{G}_n\times o(\mathcal{G}_m)\times V_1\to\mathcal{W}_W$ satisfying the following properties for all $(g_1,g_2)\in\mathcal{G}_n\times\mathcal{G}_m$:

i. if $u_1\notin I(u_2;g_2)$, then either $o(u_1)>o(u_2)$ or $o(u_1)<o(u_2)$ in the ordering provided by $\Phi(g_1,o(g_2),v^*)$;

ii. if $u_1\in I(u_2;g_2)$, then $o(u_1)=o(u_2)$ in the ordering provided by $\Phi(g_1,o(g_2),v^*)$;

iii. (consistency criterion) if $o_1,o_2\in\mathcal{O}_W$, then for each $v\in V(g_2)$,
$$\mathrm{rank}_{\Phi(g_1,o_1(g_2),v^*)}(o_1(v))=\mathrm{rank}_{\Phi(g_1,o_2(g_2),v^*)}(o_2(v)).\qquad(14)$$

We let $\mathcal{V}_{n,m}$ denote the set of all such VN schemes. The VN loss functions and level-$k$ errors are defined as in the totally ordered setting; a minimal implementation of the rank convention of Definition 30 is sketched below.
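To make the rank convention of Definition 30 concrete, the following helper (ours, for illustration) takes a weak ordering expressed as a list of tie-classes, best first, and reproduces the ranks of Example 7:

```python
def weak_order_ranks(classes):
    """Ranks under Definition 30 for a weak ordering given as a list of
    tie-classes, best first, e.g. [['a'], ['c'], ['d', 'e'], ['b']].
    Tied elements share a rank, and a maximum-length induced total
    ordering keeps one representative per tie-class, so the rank of an
    element is the position of its class."""
    ranks = {}
    for position, tie_class in enumerate(classes, start=1):
        for element in tie_class:
            ranks[element] = position
    return ranks

print(weak_order_ranks([['a'], ['c'], ['d', 'e'], ['b']]))
# {'a': 1, 'c': 2, 'd': 3, 'e': 3, 'b': 4}
```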
To define the Bayes optimal scheme, let $(g_1,g_2)$ be realized from $(G_1,G_2)\sim F^{(n,m)}_{c,\theta}\in\mathcal{N}_{n,m}$, and consider a vertex of interest $v^*\in C$ and obfuscating function $o:V_2\to W$. Letting $\cong$ denote graph isomorphism, for $(g_1,g_2)\in\mathcal{G}_n\times\mathcal{G}_m$ define the set
$$(g_1,[o(g_2)])=\bigl\{(g_1,\widetilde g_2)\ \text{s.t.}\ o(\widetilde g_2)\cong o(g_2)\bigr\}=\bigl\{(g_1,\widetilde g_2)\ \text{s.t.}\ \widetilde g_2\cong g_2\bigr\}.\qquad(15)$$
In order to define the Bayes optimal scheme, we will also need the following restrictions of $(g_1,[o(g_2)])$: for each $w\in W$ and $v\in V_2$, we define
$$(g_1,[o(g_2)])_{I(w;o(g_2))=o(v)}=\bigl\{(g_1,\widetilde g_2)\ \text{s.t.}\ \exists\ \text{iso.}\ \sigma\ \text{with}\ o(\widetilde g_2)=\sigma(o(g_2)),\ \text{and}\ \sigma(u)=o(v)\ \text{for some}\ u\in I(w;o(g_2))\bigr\}.\qquad(16)$$
Note that for a fixed $v$, if $\{I(w;o(g_2))\}_{w\in\widetilde W}$ partitions $W$ (for some suitable $\widetilde W\subseteq W$), then $\{(g_1,[o(g_2)])_{I(w;o(g_2))=o(v)}\}_{w\in\widetilde W}$ partitions $(g_1,[o(g_2)])$.

We are now ready to define a Bayes optimal VN scheme. For ease of notation, in the sequel we will write $P_{F^{(n,m)}_{c,\theta}}$, or even simply $P$, in place of $P_{(G_1,G_2)\sim F^{(n,m)}_{c,\theta}}$ where there is no risk of ambiguity. Let $g=\{(g^{(i)}_1,g^{(i)}_2)\}_{i=1}^{h}$ be such that the sets $\{(g^{(i)}_1,[o(g^{(i)}_2)])\}_{i=1}^{h}$ partition $\mathcal{G}_n\times\mathcal{G}_m$. We will call this partition $\mathcal{P}_{n,m}$, where we suppress the dependence on $g$ and $o$ for ease of notation. We will define a Bayes optimal scheme $\Phi^*=\Phi^*_g$ piecewise on each element of this partition. For each $i\in[h]$, define (where ties in the argmaxes are broken in an arbitrary but nonrandom manner)
$$\begin{aligned}
\Phi^*\bigl(g^{(i)}_1,o(g^{(i)}_2),v^*\bigr)[1]&\in\operatorname*{argmax}_{I(u;o(g^{(i)}_2))\subseteq W}\;P\Bigl(\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)_{I(u;o(g^{(i)}_2))=o(v^*)}\ \Big|\ \bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\Bigr),\\
\Phi^*\bigl(g^{(i)}_1,o(g^{(i)}_2),v^*\bigr)[2]&\in\operatorname*{argmax}_{I(u;o(g^{(i)}_2))\subseteq W\setminus\{\Phi^*[1]\}}\;P\Bigl(\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)_{I(u;o(g^{(i)}_2))=o(v^*)}\ \Big|\ \bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\Bigr),\\
&\ \ \vdots\\
\Phi^*\bigl(g^{(i)}_1,o(g^{(i)}_2),v^*\bigr)[\ell]&\in\operatorname*{argmax}_{I(u;o(g^{(i)}_2))\subseteq W\setminus\{\cup_{j=1}^{\ell-1}\Phi^*[j]\}}\;P\Bigl(\bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)_{I(u;o(g^{(i)}_2))=o(v^*)}\ \Big|\ \bigl(g^{(i)}_1,[o(g^{(i)}_2)]\bigr)\Bigr),
\end{aligned}$$
so that the ranking provided by $\Phi^*$ is (where $\Phi^*(g^{(i)}_1,o(g^{(i)}_2),v^*)[j]=\{u^{(j)}_1,\dots,u^{(j)}_{n_j}\}$)
$$\underbrace{u^{(1)}_1=u^{(1)}_2=\cdots=u^{(1)}_{n_1}}_{n_1}\;>\;\underbrace{u^{(2)}_1=u^{(2)}_2=\cdots=u^{(2)}_{n_2}}_{n_2}\;>\;\cdots\;>\;\underbrace{u^{(\ell)}_1=u^{(\ell)}_2=\cdots=u^{(\ell)}_{n_\ell}}_{n_\ell}.$$
For each element $(g'_1,g'_2)\in(g^{(i)}_1,[o(g^{(i)}_2)])\setminus\{(g^{(i)}_1,g^{(i)}_2)\}$, choose any isomorphism $\sigma$ such that $o(g'_2)=\sigma(o(g^{(i)}_2))$, and define $\Phi^*(g'_1,o(g'_2),v^*)=\sigma\bigl(\Phi^*(g^{(i)}_1,o(g^{(i)}_2),v^*)\bigr)$, noting that $\Phi^*(g'_1,o(g'_2),v^*)$ is independent of the choice of isomorphism $\sigma$. The next proposition states that, modulo ties, the definition of $\Phi^*_g$ is independent of the choice of $g$.

Proposition 32 Let $o\in\mathcal{O}_W$ be an obfuscating function, and let $g=\{(g^{(i)}_1,g^{(i)}_2)\}_{i=1}^{h}$ and $\widetilde g=\{(\widetilde g^{(i)}_1,\widetilde g^{(i)}_2)\}_{i=1}^{h}$ be such that the sets $\{(g^{(i)}_1,[o(g^{(i)}_2)])\}_{i=1}^{h}$ and $\{(\widetilde g^{(i)}_1,[o(\widetilde g^{(i)}_2)])\}_{i=1}^{h}$ each partition $\mathcal{G}_n\times\mathcal{G}_m$. Suppose that $(G_1,G_2)\sim F^{(n,m)}_{c,\theta}\in\mathcal{N}_{n,m}$, and consider a vertex of interest $v^*\in C$. Then there exists a fixed strategy for breaking ties in the argmaxes for $\Phi^*_g$ and $\Phi^*_{\widetilde g}$ that yields $\Phi^*_g=\Phi^*_{\widetilde g}$. In particular, under any such tie-breaking strategy, we have that $L_k(\Phi^*_g,v^*)=L_k(\Phi^*_{\widetilde g},v^*)$ for all $k\in[m-1]$.

Lastly, the following theorem shows that this scheme (or these schemes) is indeed Bayes optimal. The proof is analogous to the totally ordered setting and is thus omitted.

Theorem 33 Let $o\in\mathcal{O}_W$ be an obfuscating function, and let $g=\{(g^{(i)}_1,g^{(i)}_2)\}_{i=1}^{h}$ be such that the sets $\{(g^{(i)}_1,[o(g^{(i)}_2)])\}_{i=1}^{h}$ partition $\mathcal{G}_n\times\mathcal{G}_m$. Let $\Phi^*=\Phi^*_g$ be as defined in Equation (9). Suppose that $(G_1,G_2)\sim F^{(n,m)}_{c,\theta}\in\mathcal{N}_{n,m}$, and consider a vertex of interest $v^*\in C$. We have that $L_k(\Phi^*,v^*)=L^*_k(v^*)$ for all $k\in[m-1]$ and all obfuscating functions $o$.

References

A. Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, pages 14–23, 2006.

S. Agarwal. Learning to rank on graphs. Machine Learning, 10(3):333–357, 2010.

E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. The Journal of Machine Learning Research, 9:1981–2014, 2008.

R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47, 2002.

D. Asta and C. R. Shalizi. Geometric network comparison. In M. Meila and T. Heskes, editors, Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, pages 102–110, 2015.

M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from examples. Journal of Machine Learning Research, 7:2399–2434, 2006.

P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences, USA, 106:21068–21073, 2009.

S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th International World Wide Web Conference, 1998.

L. Chen, C. Shen, J. T. Vogelstein, and C. E. Priebe. Robust vertex classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(3):578–590, 2016a.

L. Chen, J. T. Vogelstein, V. Lyzinski, and C. E. Priebe. A joint graph inference case study: The C. elegans chemical and electrical connectomes. Worm, 5(2), 2016b.

R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21:5–30, 2006.

D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(3):265–298, 2004.
G. Coppersmith. Vertex nomination. Wiley Interdisciplinary Reviews: Computational Statistics, 6(2):144–153, 2014.

G. A. Coppersmith and C. E. Priebe. Vertex nomination via content and context. arXiv preprint arXiv:1201.4118, 2012.

L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1997.

K. K. Duh. Learning to Rank with Partially-Labeled Data. PhD thesis, University of Washington, 2009.

P. Erdős and A. Rényi. On random graphs, I. Publicationes Mathematicae, 6:290–297, 1959.

P. Erdős and A. Rényi. Asymmetric graphs. Acta Mathematica Academiae Scientiarum Hungaricae, 14(3–4):295–315, 1963.

D. E. Fishkind, V. Lyzinski, H. Pao, L. Chen, and C. E. Priebe. Vertex nomination schemes for membership prediction. The Annals of Applied Statistics, 9(3):1510–1532, 2015.

D. E. Fishkind, S. Adali, H. G. Patsolic, L. Meng, D. Singh, V. Lyzinski, and C. E. Priebe. Seeded graph matching. Pattern Recognition, 87:203–215, 2019.

O. Frank and D. Strauss. Markov graphs. Journal of the American Statistical Association, 81(395):832–842, 1986.

P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.

P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

H. Jeong, S. P. Mason, A.-L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, 2001.

B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83, 2011.

J. H. Kim, B. Sudakov, and V. H. Vu. On the asymmetry of random regular graphs and random graphs. Random Structures and Algorithms, 21:216–224, 2002.

H. Li. Learning to rank for information retrieval and natural language processing. Synthesis Lectures on Human Language Technologies, 4(1), 2011.

D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology, 58(7):1019–1031, 2007.

T.-Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, 2009.

L. Lü and T. Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170, 2011.

C. Luo, W. Pang, Z. Wang, and C. Lin. Hete-CF: Social-based collaborative filtering recommendation using heterogeneous relations. In Proceedings of the IEEE International Conference on Data Mining, pages 917–922, 2014.

D. Lusseau and M. E. J. Newman. Identifying the role that animals play in their social networks. Proceedings of the Royal Society of London B: Biological Sciences, 271(Suppl 6):S477–S481, 2004.

U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

V. Lyzinski. Information recovery in shuffled graphs via graph matching. IEEE Transactions on Information Theory, 64(5):3254–3273, 2018.

V. Lyzinski and D. L. Sussman. Matchability of heterogeneous networks pairs. arXiv preprint arXiv:1705.02294, 2018.

V. Lyzinski, D. L. Sussman, M. Tang, A. Athreya, and C. E. Priebe. Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding. Electronic Journal of Statistics, 8:2905–2922, 2014.

V. Lyzinski, D. Fishkind, M. Fiori, J. T. Vogelstein, C. E. Priebe, and G. Sapiro. Graph matching: Relax at your own risk. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press, 2015.
V. Lyzinski, K. Levin, D. E. Fishkind, and C. E. Priebe. On the consistency of the likelihood maximization vertex nomination scheme: Bridging the gap between maximum likelihood estimation and graph matching. Journal of Machine Learning Research, 17(179):1–34, 2016.

H. Ma, I. King, and M. R. Lyu. Mining web graphs for recommendations. IEEE Transactions on Knowledge and Data Engineering, 24(6):1051–1064, 2012.

C. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

D. Marchette, C. E. Priebe, and G. Coppersmith. Vertex nomination via attributed random dot product graphs. In Proceedings of the 57th ISI World Statistics Congress, volume 6, page 16, 2011.

R. Mihalcea and D. Radev. Graph-based Natural Language Processing and Information Retrieval. Cambridge University Press, 2011.

M. E. J. Newman. A measure of betweenness centrality based on random walks. Social Networks, 27(1):39–54, 2005.

M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.

M. E. J. Newman and A. Clauset. Structure and inference in annotated networks. Nature Communications, 7:11863, 2016.

H. G. Patsolic, Y. Park, V. Lyzinski, and C. E. Priebe. Vertex nomination via local neighborhood matching. arXiv preprint arXiv:1705.00674, 2017.

T. A. N. Pham, X. Li, G. Cong, and Z. Zhang. A general recommendation model for heterogeneous networks. IEEE Transactions on Knowledge and Data Engineering, 28(12):3140–3153, 2016.

P. Resnick and H. R. Varian. Recommender systems. Communications of the ACM, 40(3):56–58, 1997.

F. Ricci, L. Rokach, and B. Shapira. Introduction to Recommender Systems Handbook. Springer, 2011.

G. Robins, P. Pattison, Y. Kalish, and D. Lusher. An introduction to exponential random graph (p*) models for social networks. Social Networks, 29(2):173–191, 2007.

K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Annals of Statistics, 39:1878–1915, 2011.

S. Rothe and H. Schütze. CoSimRank: A flexible & efficient graph-theoretic similarity measure. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1392–1402, 2014.

T. Snijders, P. Pattison, G. Robins, and M. Handcock. New specifications for exponential random graph models. Sociological Methodology, 36(1):99–153, 2006.

I. Steinwart. Support vector machines are universally consistent. Journal of Complexity, 18(3):768–791, 2002.

C. Stone. Consistent nonparametric regression. Annals of Statistics, 5:595–645, 1977.

M. Sun and C. E. Priebe. Efficiency investigation of manifold matching for text document classification. Pattern Recognition Letters, 34(11):1263–1269, 2013.

D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe. A consistent adjacency spectral embedding for stochastic blockmodel graphs. Journal of the American Statistical Association, 107(499):1119–1128, 2012.

D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe. Matched filters for noisy induced subgraph detection. arXiv preprint arXiv:1803.02423, 2018.

S. Suwan, D. S. Lee, and C. E. Priebe. Bayesian vertex nomination using content and context. Wiley Interdisciplinary Reviews: Computational Statistics, 7(6):400–416, 2015.

M. Tang, D. L. Sussman, and C. E. Priebe. Universally consistent vertex classification for latent positions graphs. The Annals of Statistics, 41(3):1406–1430, 2013.
M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe. A semiparametric two-sample hypothesis testing problem for random graphs. Journal of Computational and Graphical Statistics, 26(2):344–354, 2017a.

M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, and C. E. Priebe. A nonparametric two-sample hypothesis testing problem for random dot product graphs. Bernoulli, 23(3):1599–1630, 2017b.

D. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440, 1998.

J. Yoder, L. Chen, H. Pao, E. Bridgeford, K. Levin, D. E. Fishkind, C. E. Priebe, and V. Lyzinski. Vertex nomination: The canonical sampling and the extended spectral nomination schemes. arXiv preprint arXiv:1802.04960, 2018.

S. Young and E. Scheinerman. Random dot product graph models for social networks. In Proceedings of the 5th International Conference on Algorithms and Models for the Web-graph, pages 138–149, 2007.

D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In S. Thrun, L. K. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16, pages 321–328, 2004.