# On provable privacy vulnerabilities of graph representations

Ruofan Wu, Guanhua Fang, Mingyang Zhang, Qiying Pan, Tengfei Liu, and Weiqiang Wang
Ant Group, Fudan University, Shanghai Jiao Tong University
{ruofan.wrf, zhangmingyang.zmy, aaron.ltf, weiqiang.wwq}@antgroup.com
fanggh@fudan.edu.cn, sim10_arity@sjtu.edu.cn

Equal contribution. Corresponding author: aaron.ltf@antgroup.com. Code available at https://github.com/Rorschach1989/gnn_privacy_attack
38th Conference on Neural Information Processing Systems (NeurIPS 2024).

Graph representation learning (GRL) is critical for extracting insights from complex network structures, but it also raises security concerns due to potential privacy vulnerabilities in these representations. This paper investigates the structural vulnerabilities in graph neural models where sensitive topological information can be inferred through edge reconstruction attacks. Our research primarily addresses the theoretical underpinnings of similarity-based edge reconstruction attacks (SERA), furnishing a non-asymptotic analysis of their reconstruction capacities. Moreover, we present empirical corroboration indicating that such attacks can (almost) perfectly reconstruct sparse graphs as graph size increases. Conversely, we establish that sparsity is a critical factor for SERA's effectiveness, as demonstrated through analysis and experiments on (dense) stochastic block models. Finally, we explore the resilience of private graph representations produced via the noisy aggregation (NAG) mechanism against SERA. Through theoretical analysis and empirical assessments, we affirm the mitigation of SERA using NAG. In parallel, we also empirically delineate instances wherein SERA demonstrates both efficacy and deficiency in its capacity to function as an instrument for elucidating the trade-off between privacy and utility.

1 Introduction

With the surging developments of graph representation learning (GRL) [15], there has been growing apprehension concerning the security challenges associated with the deployment of graph neural models in real-world scenarios [10]. GRL models harness the topological information of the underlying graph for producing high-quality predictions or graph representations. Meanwhile, these models bear the risk of inadvertently divulging the same topological information through the graph representations they produce. Such security risks have been empirically validated through examinations of the attacking performance of edge reconstruction algorithms [11, 17, 34, 46], among which a simple form of attack based solely on the representation similarity of node pairs is shown to achieve strikingly strong performance, without requiring additional knowledge such as the encoder architecture or auxiliary datasets [17].

Despite the empirical evidence of topological vulnerabilities of graph representations, theoretical explanations delineating the effectiveness of such attacks remain largely unexplored: As demonstrated in previous studies [11, 17], similarity-based attacks are remarkably effective against sparse graphs that exhibit a generalized homophily pattern, i.e., there exists a significant correlation between the similarity of node features and edge adjacency information. This phenomenon suggests that feature similarity may serve as a confounding factor, potentially impacting the efficacy of similarity-based attacks.
It is therefore valuable to understand the influence of graph properties, such as feature similarity and sparsity, on the edge reconstruction process of the attacking procedures. Beyond their capability in characterizing the vulnerabilities of representations, attacking algorithms may also function as empirical attestations of privacy-preserving inference protocols that fulfill formal privacy guarantees such as differential privacy [9, Section 4]. As an illustrative case, membership inference attacks can be employed for auditing differential privacy [31]. Since edge reconstruction is equivalent to edge membership inference on graphs [43], it is thus pertinent to explore the performance of similarity-based attacks when confronted with privacy-preserving graph representations [30, 36].

In this paper, we take initial steps toward a principled understanding of the structural vulnerabilities of graph representations under the similarity-based edge reconstruction attack (hereafter abbreviated as SERA), which forms a realistic threat in many practical scenarios such as vertical federated learning [36]. In particular, we establish the following theoretical as well as empirical findings:

(i) Success modes of SERA: By applying SERA to sparse random graphs equipped with independent random node features, we show that SERA provably reconstructs the input graph via a non-asymptotic analysis. The result indicates that feature similarity is not necessary for SERA to succeed. We conduct both synthetic experiments and real-world data evaluations to empirically validate our theory.

(ii) Failure modes of SERA: We show, through theoretical analysis and corroborative synthetic experiments, performance lower bounds when applying SERA to stochastic block models (SBM) with independent random node features: When the underlying SBM has Θ(1) intra-group connection probability, edge recovery through graph representations becomes provably hard.

(iii) Mitigation of SERA: We assess the resilience of SERA using noisy aggregation (NAG) as the privacy protection mechanism. Theoretical guarantees of NAG are established which further extend previous results, accompanied by extensive empirical evaluations to corroborate our theoretical assertions. Intriguingly, our findings reveal instances wherein NAG provides significant resistance to SERA, even under some scenarios where it only guarantees very weak privacy. Such discoveries delineate the circumstances that elucidate both the strengths and limitations of SERA as a privacy auditing tool.

2 Related works

Typically, there exist two categories of private information that may potentially be compromised during the training or deployment phases of graph neural network models: the (sensitive) node attributes and the adjacency relations between nodes. In this paper we focus on the latter category, since edge adjacency relations are less informative: for each pair of nodes, the existence of an edge constitutes only a single bit of information.

2.1 Edge reconstruction attacks on graph-structured data

Contemporary developments on edge reconstruction attacks differ significantly in their conceptualization of adversaries, particularly in terms of their capabilities [45, 44] and the extent of prior knowledge they possess about the GRL model and the underlying graph dataset [17]. The mechanism of SERA was first proposed in [11] and later studied in [17].
Empirical evidence suggests that with only black-box access to node representations, the SERA mechanism obtains a high success rate (AUC > 0.9 for the Citeseer dataset). Subsequent developments have explored stronger attacks under more powerful adversaries. In [17] the authors investigated the impact of an adversary's prior knowledge, including the possession of node features, partial graph structure, and access to a shadow dataset, on the success rate of the corresponding attack strategies. Inspired by the information bottleneck principle, [46] improves SERA by carefully exploiting intermediate representations produced by GNNs. Notably, despite the adversaries in [17, 46] being equipped with substantially more information compared to SERA, the resulting enhancement in attack performance exhibited by these adversaries demonstrates only marginal improvements relative to SERA. The GraphMI attack [45] does not grant the adversary access to node representations but instead requires access to node features and labels, as well as white-box access to the GNN model. Recent works explored influence-based attacking schemes, wherein the adversary is allowed to alter the graph information: The LinkTeller attack [34] manipulates node features, while [23] infiltrates the underlying graph with malicious nodes.

2.2 Theoretical explorations in graph recovery from neural representations

In [6], the authors proposed an algorithm that provably recovers graph structure based on representations generated via DeepWalk, which is a factorization-based procedure and differs from GNN-produced representations. In [43] the authors showed that when block structure exists in the underlying graph, the performance of SERA is uneven across nodes in different blocks. In [46], the authors use information-theoretic arguments to construct more powerful attacks than SERA. Nevertheless, the aforementioned studies did not provide a theoretical rationale for the practical vulnerabilities manifested as a result of SERA. In a contemporary work [8], the authors derived generalization bounds of linear GNNs under the link prediction context, assuming the underlying graph is generated by a moderately sparse graphon model.

2.3 Privacy protection against edge reconstruction attacks

Edge differential privacy (EDP) [26] is the most popular privacy notion that offers formal protection against edge reconstruction attacks. Standard private training algorithms like DPSGD [1] may produce GNN models that are provably private in the sense that the membership information of any individual training sample is disclosed only to a limited extent.2 However, such approaches do not provide privacy during inference time [7]. Protection mechanisms against inference-time adversaries are mostly based on noisy versions of GNN encoding, such as edge-wise randomized response [34], which provides very strong privacy protection yet is overly destructive to model utility. Noisy aggregation (NAG) mechanisms [30, 36, 7] have recently been proposed and empirically achieve better privacy-utility trade-offs. Inspired by the information bottleneck principle, [33, 46] proposed to use regularization or saddle-point optimization techniques to control privacy leakage. Yet these proposals are not principled in theory.

3 Preliminaries

Setup and notations  Consider an undirected graph G = (V, E) with n = |V| nodes associated with node features $X \in \mathbb{R}^{n \times d}$. Denote A as the corresponding adjacency matrix and D as the diagonal matrix with the v-th diagonal entry being the degree of node v.
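To fix ideas, the following minimal numpy sketch builds the quantities introduced above (the adjacency matrix A, the degree matrix D, and a Gaussian node feature matrix X) from an edge list; the helper name and the toy graph are purely illustrative and not part of the paper's code release.

```python
import numpy as np

def build_graph_matrices(n, edges):
    """Adjacency matrix A and degree matrix D of an undirected graph on n nodes."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0  # undirected graph: A is symmetric
    D = np.diag(A.sum(axis=1))   # v-th diagonal entry = degree of node v
    return A, D

# Toy example: a 4-node path graph with d = 3 dimensional node features
A, D = build_graph_matrices(4, [(0, 1), (1, 2), (2, 3)])
X = np.random.randn(4, 3)        # node feature matrix X of shape (n, d)
```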
In this paper, we will study victim models taking the form of graph neural encoders. Our vulnerability analysis predominantly centers on the linear graph neural network [35] architecture, which has been widely adopted in previous theoretical studies on graph neural networks [3, 39, 37, 8]. Specifically, the node representation matrix of an L-layer linear GNN is computed as:

$$H^{(L)} = \left((D + I)^{-1}(A + I)\right)^{L} X W, \qquad (1)$$

where the identity matrix is added to ensure self-loops, and $W \in \mathbb{R}^{d \times d}$ is the weight matrix. Throughout this paper, we will assume the node feature dimension and the hidden dimension to be equal to d and refer to this as the feature dimension, as otherwise we may add an extra input projection to fulfill this requisite. We further denote $\|W\|_{\mathrm{op}}$ and $\kappa(W)$ as the operator norm (i.e., largest singular value) and condition number (i.e., the ratio of the largest and smallest singular values) of the matrix W.

Threat model  We assume the adversary knows the node set V and is able to query the node representations of an arbitrary node subset $V_{\mathrm{victim}} \subseteq V$. Hereafter we will refer to the subgraph induced by $V_{\mathrm{victim}}$ as the victim subgraph $G_{\mathrm{victim}} = (V_{\mathrm{victim}}, E_{\mathrm{victim}})$. The goal of the adversary is to recover an arbitrary fraction of $E_{\mathrm{victim}}$ based on the acquired node representations $H^{(L)}_{\mathrm{victim}} = \{h^{(L)}_v, v \in V_{\mathrm{victim}}\}$, $L > 0$.

2 Note that this requires a careful sensitivity analysis with respect to the correct privacy model like EDP.

We identify two representative scenarios that underscore the potential threat posed by such adversaries: The first scenario is API-style deployments of graph representations [34], wherein an adversary might query the node representations for a set of nodes using their node identifiers, with this particular subset of nodes constituting the victim nodes. The second scenario pertains to a two-party vertical federated learning (VFL) context [36], wherein the graph topology retained by party A is deemed confidential. Under such a setup, the privacy threat materializes as party B might adhere to the VFL protocol while simultaneously being curious about the topology. Note that the capabilities of the adversaries posited herein are intentionally constrained by denying them access to both the raw node features X and the model parameters. Additionally, the objectives of the adversary are decidedly ambitious, aiming at the potential recovery of the entire suite of edges within the victim subgraph. A more in-depth discussion regarding the threat model and the potent capabilities of the adversary is deferred to appendix B.1.

SERA is based on a similarity measure sim, with the adjacency relation between node u and node v inferred as

$$\hat{A}^{\mathrm{SERA}}_{uv}(\tau) = \mathbf{1}\left(\mathrm{sim}\left(h^{(L)}_u, h^{(L)}_v\right) \ge \tau\right), \qquad (2)$$

where we denote $\mathbf{1}(\cdot)$ as the indicator function. In this paper we will be primarily interested in two similarity measures: the cosine similarity $\cos(x, y) = \langle x, y \rangle / (\|x\|_2 \|y\|_2)$ and the correlation similarity $\mathrm{corr}(x, y) = \langle x - \bar{x}, y - \bar{y} \rangle / (\|x - \bar{x}\|_2 \|y - \bar{y}\|_2)$, which is essentially a centered version of cosine similarity ($\bar{x}, \bar{y}$ are the coordinate-wise averages of x and y), defined for node representations with dimension greater than 1. The cutoff threshold τ is allowed to depend on the embedding set $H_{\mathrm{victim}}$ but is uniform across all edge decisions. Hereafter, when there is no risk of confusion, we will drop the superscript and denote $\hat{A}(\tau)$ as the reconstructed adjacency matrix under threshold τ.
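A minimal numpy sketch of the linear encoder (1) and the SERA decision rule (2) is given below. It assumes the adversary has already obtained the victim representations H; the function names and the thresholding helper are illustrative only.

```python
import numpy as np

def linear_gnn(A, X, W, L):
    """L-layer linear GNN of eq. (1): H^(L) = ((D + I)^{-1}(A + I))^L X W."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                          # add self-loops
    P = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-normalized propagation (D+I)^{-1}(A+I)
    H = X
    for _ in range(L):
        H = P @ H                                  # repeated neighborhood averaging
    return H @ W

def sera_scores(H):
    """Pairwise cosine similarities of node representations (the scores behind eq. (2))."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    return Hn @ Hn.T

def sera_attack(H, tau):
    """Reconstructed adjacency: predict an edge whenever cos(h_u, h_v) >= tau."""
    A_rec = (sera_scores(H) >= tau).astype(int)
    np.fill_diagonal(A_rec, 0)                     # ignore self-pairs
    return A_rec
```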
To measure the performance of the attack, we use the false positive rate (FPR) and false negative rate (FNR), defined as

$$\mathrm{FPR}_{\hat{A}}(\tau) = \frac{\sum_{u,v} \mathbf{1}\left(\hat{A}_{uv}(\tau) = 1\right) \mathbf{1}\left(A_{uv} = 0\right)}{\sum_{u,v} \mathbf{1}\left(A_{uv} = 0\right)}, \qquad \mathrm{FNR}_{\hat{A}}(\tau) = \frac{\sum_{u,v} \mathbf{1}\left(\hat{A}_{uv}(\tau) = 0\right) \mathbf{1}\left(A_{uv} = 1\right)}{\sum_{u,v} \mathbf{1}\left(A_{uv} = 1\right)}. \qquad (3)$$

We further define the error rate ERR as the summation of FPR and FNR. Employing these metrics facilitates a more nuanced characterization of attack performance, particularly when the underlying graph is sparse. An alternative metric that is often used in practice [17] is the area under the receiver operating characteristic curve (AUROC),

$$\mathrm{AUROC}_{\hat{A}} = \int_{0}^{1} \left(1 - \mathrm{FNR}_{\hat{A}}\left(\mathrm{FPR}_{\hat{A}}^{-1}(s)\right)\right) \, ds, \qquad (4)$$

which quantifies the aggregate performance of $\hat{A}$ by integrating the trade-off between the false positive rate and the false negative rate across different thresholds.

Intuitively, the success of SERA is determined by the correlation between node representation similarity and edge presence. Previous empirical observations demonstrate the effectiveness of SERA against graphs that exhibit strong correlations between node feature similarity and edge presence [17]. We will refer to such kinds of graphs as being homophilous in a generalized sense [18, 22]. We defer a more formal introduction to homophily measures to appendix B.2. Due to the message-passing nature of GNN encoders, it is intuitively reasonable that recursive aggregation of node representations strengthens the correlation and results in successful edge reconstructions. However, it is non-trivial whether the SERA mechanism may succeed in the absence of the aforementioned generalized homophily pattern, which motivates our first analysis.

4 SERA against sparse random graphs

In this section, we study the behavior of SERA with the underlying (victim) graph generated according to a sparse random graph model. Here, the adjacency matrix is generated such that each entry is independently distributed (up to the symmetry constraint $A_{uv} = A_{vu}$) following a Bernoulli distribution $A_{uv} \sim \mathrm{Ber}(p_{uv})$. We focus on the sparse regime and allow $p_{uv}$ to depend on $X_u$ and $X_v$. We further assume that the node features $X_v$ are generated i.i.d. according to an isotropic Gaussian distribution $X_v \sim N(0, I_d)$. It follows that the correlation between node feature similarity and edge presence is zero. The following theorem characterizes the effectiveness of SERA under the sparse random graph setup.

Theorem 4.1. Let $C_1, C_2$ be universal constants. Assume the following: (i) The graph generation mechanism satisfies $\sum_{u \in V} p_{uv} \le C_1 \log n$ for all $v \in V$. (ii) The depth L of the GNN encoder and the feature dimension d satisfy $d \ge (C_2 \log n)^{6L+2} \log n$. (iii) The condition number of the GNN encoder weight satisfies

$$(\kappa(W))^2 \le \frac{1}{8 (C_2 \log n)^{3L}} \sqrt{\frac{d}{\log n}}. \qquad (5)$$

Then there exists a threshold $\tau = \Theta\left((C_2 \log n)^{-2L}\right)$ such that with probability at least $1 - 2/n^2$, the following holds for SERA with the similarity measure chosen either as cos or corr:

$$\mathrm{FNR}_{\hat{A}}(\tau) = 0, \qquad \mathrm{FPR}_{\hat{A}}(\tau) \le \frac{(C_2 \log n)^{2L}}{n}.$$

Consequently, on the above set of events we have $\mathrm{AUROC}_{\hat{A}} \ge 1 - \frac{(C_2 \log n)^{2L}}{n}$.

Theorem 4.1 implies that, even when SERA cannot borrow strength from the homophilous nature of the underlying graph, it is able to produce accurate reconstructions when the graph is sufficiently large and sparse, with sparsity defined in the sense that each node has at most $O(\log n)$ neighbors on average.
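For reference, the error metrics in (3) and (4) can be computed from a score matrix as in the sketch below (numpy plus scikit-learn); restricting the computation to unordered node pairs is an implementation choice, not a requirement of the definitions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fpr_fnr(A_true, A_pred):
    """FPR and FNR of eq. (3), computed over unordered node pairs (upper triangle)."""
    iu = np.triu_indices_from(A_true, k=1)
    y, y_hat = A_true[iu], A_pred[iu]
    fpr = np.sum((y_hat == 1) & (y == 0)) / max(np.sum(y == 0), 1)
    fnr = np.sum((y_hat == 0) & (y == 1)) / max(np.sum(y == 1), 1)
    return fpr, fnr

def attack_auroc(A_true, scores):
    """AUROC of eq. (4): a threshold-free summary of the pairwise similarity scores."""
    iu = np.triu_indices_from(A_true, k=1)
    return roc_auc_score(A_true[iu], scores[iu])
```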
An additional intriguing implication of theorem 4.1 pertains to the dependence of the reconstruction performance on the GNN encoder depth L: Provided that the node feature dimension is sufficiently large, the reconstruction performance degrades as the depth of the encoder increases, which is related to the renowned phenomenon of oversmoothing in the GNN literature [37]. Intuitively, as the depth of GNN encoders increases, the resulting node representations tend to converge [27], becoming less distinct from one another. This convergence diminishes the discriminative capacity of similarity metrics, thereby affecting the attack performance.

Remark 4.2 (Practicality). Theorem 4.1 requires the node feature dimension d to grow at a polylog(n) rate, a condition which may not consistently align with practical scenarios. At present, this requirement is a byproduct of our proof strategy. In section 7.1 we further examine the implications of feature dimensionality. The threshold whose existence theorem 4.1 establishes might not guide the choice of threshold in practice. Instead, we may rely on heuristics or side information [17] to determine the threshold. Furthermore, theorem 4.1 posits that the efficacy of SERA is contingent upon a reasonably conditioned weight matrix W. We will empirically validate this claim in section 7, wherein we demonstrate robust reconstruction capabilities of SERA across diverse scenarios, including when the weight matrix W is a fixed entity, when it is subject to random initialization, or when it has undergone extensive training iterations utilizing datasets from real-world supervised learning contexts.

5 SERA against dense SBMs

In this section, we reveal the limitations of SERA by constructing a reconstruction problem that is provably hard. We consider the following stochastic block model (SBM) [2], where each node is assigned a community membership from one of K groups, $k(v) \in [K]$. The (u, v)-th entry of the adjacency matrix is generated as

$$A_{uv} \sim \begin{cases} \mathrm{Ber}(p), & \text{if } k(u) = k(v), \\ \mathrm{Ber}(q), & \text{otherwise}. \end{cases} \qquad (7)$$

For ease of presentation, we further assume that the groups share the same size, i.e., n is a multiple of K. Denote the generation mechanism as $G \sim \mathcal{G}_{\mathrm{sbm}}(n, K, p, q)$. We have the following result:

Theorem 5.1. Let $G \sim \mathcal{G}_{\mathrm{sbm}}(n, K, p, q)$ and $p = \Theta(1)$. Assume the GNN encoder to be of depth L and feature dimension $d \ge \max\{\log n / p^2, K^2 \log^3 n\}$ with the weight matrix being the identity matrix. Then with probability at least $1 - 1/n^2$, for any fixed $\tau \in [0, 1]$, one of the following three statements must hold for SERA with the similarity measure chosen either as cos or corr:

(i) $\mathrm{FPR}_{\hat{A}}(\tau) \ge \frac{1-p}{2K}$ and $\mathrm{FNR}_{\hat{A}}(\tau) \ge q$;

(ii) $\mathrm{FPR}_{\hat{A}}(\tau) \ge 1 - p$;

(iii) $\mathrm{FNR}_{\hat{A}}(\tau) \ge \frac{p}{2K} + q$.

According to theorem 5.1, given any cutoff threshold, if the within-group connection probability is of the order Θ(1) and the number of groups K does not diverge (otherwise, we would return to the sparse regime of section 4), the performance of SERA measured by the error rate ERR is lower bounded by non-vanishing constants when the feature dimension is sufficiently large. The theorem characterizes the inherent limitations of SERA when the underlying graph is dense. As K gets large, the lower bounds on the false positive/negative rates decrease, which indicates that SERA is more successful when the graph is less densely connected.
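The dense SBM instance of eq. (7) studied in theorem 5.1 (and revisited in the experiments of section 7.2) can be sampled as in the following plain numpy sketch; it assumes n is a multiple of K, as in the theorem, and the function name is illustrative.

```python
import numpy as np

def sample_sbm(n, K, p, q, seed=None):
    """Sample an SBM adjacency matrix per eq. (7): within-group edge probability p,
    between-group probability q, K equally sized groups (n must be a multiple of K)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(K), n // K)               # group membership k(v)
    same_group = labels[:, None] == labels[None, :]
    probs = np.where(same_group, p, q)
    U = rng.random((n, n))
    A = (np.triu(U, 1) < np.triu(probs, 1)).astype(int)    # sample the upper triangle only
    return A + A.T, labels                                  # symmetrize; no self-loops

# Dense regime used in section 7.2: K = 3, p = 0.3, q = 0.05
A, labels = sample_sbm(n=300, K=3, p=0.3, q=0.05, seed=0)
```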
Remark 5.2. Alternatively, we may interpret theorem 5.1 as unveiling instances where SERA is constrained to revealing only population-level relational information, such as the affiliation of two nodes to a common group, rather than identifying the existence of specific edges, when the underlying graph is dense and admits certain group structures.

6 Defense by noisy aggregation: From theory to practice

Having demonstrated the susceptibility of GNN representations to SERA, it becomes an intriguing research question to examine the behavior of SERA within the context of privacy-preserving GRL. In this section, we explore the defensive efficacy of noisy aggregation (NAG), which has recently been proposed as a provably privacy-preserving algorithm [30, 36] under the edge differential privacy model [26]. Concretely, we study an L-layer noisy GNN with the l-th layer computed recursively as:

$$H^{(l)}_v = \mathrm{Act}\left(\mathrm{AGG}\left(\left\{W_l H^{(l-1)}_u / \|H^{(l-1)}_u\|_2,\ u \in \tilde{N}(v)\right\}\right) + \epsilon\right), \qquad \epsilon \sim N(0, \sigma^2 I_d), \qquad (8)$$

where $\tilde{N}(v) := N(v) \cup \{v\}$ denotes node v's extended neighborhood and $H^{(l)}_v$ denotes the representation of node v at the l-th layer. The aggregation mechanism AGG is a permutation-invariant function that defines the message-passing process and Act is some (possibly) non-linear transform. Intuitively, the NAG methodology can be understood as a privatization protocol that incorporates both a normalization step and an additive Gaussian perturbation phase into the conventional message-passing framework, which typically forms the backbone of a GNN. In this paper, we consider 5 representative GNN architectures that allow NAG privatization: GCN [20], GAT [32], SAGE [14] with mean or max pooling, and GIN [38], with their formal definitions deferred to appendix B.3. The following theorem characterizes the defensive capability of NAG:

Theorem 6.1. For any graph G and SERA under any type of similarity measure, the inference error regarding any specific edge is lower bounded by:

$$\min_{u \in V, v \in V} \left[ P\left(\hat{A}_{uv} = 1 \mid A_{uv} = 0\right) + P\left(\hat{A}_{uv} = 0 \mid A_{uv} = 1\right) \right] \ge 1 - \sqrt{1 - \exp\left(-\frac{C \sum_{l \in [L]} \|W_l\|_{\mathrm{op}}^2}{\sigma^2}\right)}. \qquad (9)$$

Here the constant C depends on the AGG mechanism of the GNN. In particular, for some standard GNN architectures we have $C_{\mathrm{GCN}} = C_{\mathrm{MEAN\text{-}SAGE}} = C_{\mathrm{GIN}} = 1$ and $C_{\mathrm{GAT}} = C_{\mathrm{MAX\text{-}SAGE}} = 4$.

Theorem 6.1 augments the existing literature in the sense that it extrapolates upon prior analyses [30, 36] by generalizing to a broader range of aggregation mechanisms, thereby encompassing the vast majority of foundational components integral to modern GNN models. Theorem 6.1 indicates that for any node pair in any graph, the summation of the type-I error and type-II error (in the language of binary hypothesis testing [21]) incurred by any SERA adversary is lower bounded by a constant, which will be significantly above zero when the noise scale is of the same order as the operator norms of the weight matrices of the GNN encoder. In fact, theorem 6.1 holds against a much stronger family of adversaries, which we discuss in appendix C.3.
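A minimal PyTorch sketch of one noisy-aggregation layer in the spirit of eq. (8), instantiated with mean pooling, is shown below. It assumes edge_index lists each undirected edge in both directions (the PyTorch Geometric convention), and all names are illustrative rather than the released implementation.

```python
import torch

def noisy_mean_aggregation(H, edge_index, W, sigma):
    """One NAG layer per eq. (8): normalize inputs, aggregate over N(v) ∪ {v},
    add Gaussian noise of scale sigma, then apply a nonlinearity."""
    n = H.size(0)
    src, dst = edge_index                                   # directed copies of each edge
    Hn = H / H.norm(dim=1, keepdim=True).clamp_min(1e-12)   # H_u / ||H_u||_2
    M = Hn @ W.T                                            # messages W_l H_u / ||H_u||_2
    out = M.clone()                                         # self contribution (u = v)
    out.index_add_(0, dst, M[src])                          # sum messages from neighbors
    deg = torch.ones(n, device=H.device)
    deg.index_add_(0, dst, torch.ones(src.numel(), device=H.device))
    out = out / deg.unsqueeze(1)                            # mean over the extended neighborhood
    out = out + sigma * torch.randn_like(out)               # additive Gaussian perturbation
    return torch.relu(out)
```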
Empirical protection of NAG  Implementing NAG with a large noise scale according to theorem 6.1 may seriously degrade model efficacy. Contemporary insights [5] suggest that strict adherence to theoretical prescripts may not always be necessary, especially in the face of empirical adversaries whose capabilities may not rise to the level presumed by the postulated defense mechanisms. In this paper we conduct a careful empirical investigation to assess the privacy-utility trade-off of NAG, with privacy evaluated by the SERA adversary. This investigation could also provide empirical evidence of SERA's viability as a tool for auditing private GRL algorithms [9]. Furthermore, theorem 6.1 identifies a key determinant of the theoretical privacy bound for NAG: the scale of the weight norms relative to the noise intensity. In light of this observation, we propose two distinct noise-infused training paradigms:

Unconstrained scheme  We choose a fixed noise scale σ during both training and inference, with no constraints over the weights. The resulting model might not produce meaningful privacy guarantees in the sense of theorem 6.1, as the operator norms of the weights are determined by the training dynamics.

Constrained scheme  We choose a fixed noise scale σ during both training and inference and use normalization techniques [25] to provide a priori control over the model weights, thereby providing tighter control of the formal privacy level according to theorem 6.1 (see the sketch at the end of this section).

We will empirically inspect the protection of NAG representations trained via both the unconstrained and constrained schemes against SERA in section 7.3.

Remark 6.2 (Alternative defenses). Beyond the scope of NAG, alternative defense mechanisms offer demonstrable protection assurances, one notable example being edge-wise randomized response (EdgeRR). A comparison with such alternatives is reported in appendix D.5. Preliminary experimental comparisons indicate that NAG customarily realizes a more favorable balance between privacy and utility.

Remark 6.3 (Impact of depth L). Theorem 6.1 posits that the privacy guarantees furnished by NAG diminish with an increment in model depth, which is underpinned by the composition theorems of privacy analysis [12, 24]. An extensive discussion concerning the implications of GNN architectural design on the privacy-utility trade-off, particularly as it pertains to the depth of GNN models trained with NAG, is provided in appendix E.
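As a rough illustration of the constrained scheme, the sketch below pairs spectral normalization [25], which keeps each layer's operator norm near 1, with a small helper that evaluates the right-hand side of the edge-level error bound (9); the constant C should be taken from theorem 6.1 for the architecture at hand, and all names here are illustrative only.

```python
import math
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sera_error_lower_bound(op_norms, sigma, C=1.0):
    """Right-hand side of (9): 1 - sqrt(1 - exp(-C * sum_l ||W_l||_op^2 / sigma^2))."""
    s = sum(w ** 2 for w in op_norms)
    return 1.0 - math.sqrt(1.0 - math.exp(-C * s / sigma ** 2))

# Constrained scheme: spectral normalization keeps ||W_l||_op close to 1 a priori,
# so the bound below is already non-vacuous at moderate noise scales.
layer = spectral_norm(nn.Linear(128, 128, bias=False))

# Example: a 2-layer GCN-style encoder (C = 1), unit operator norms, sigma = 1
print(sera_error_lower_bound([1.0, 1.0], sigma=1.0))  # ≈ 0.07
```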
7 Experiments

In this section, comprehensive empirical studies are conducted to evaluate the effectiveness of SERA against both non-private and private node representations. By default, cos is employed as the standard measure of similarity across all experiments. The results corresponding to the use of corr as a metric were found to align with those obtained from cos, a concurrence that aligns with observations from [17]. This investigation is oriented around three core research questions:

RQ1 (Efficacy of SERA on Sparse Graphs): We evaluate SERA on synthetic datasets generated according to theorem 4.1, in addition to 8 real-world datasets, to substantiate the effectiveness of SERA.

RQ2 (Deficiency of SERA on Dense Graphs): We evaluate SERA on synthetic stochastic block models to corroborate the theoretical assertions in theorem 5.1.

RQ3 (Mitigation of SERA through NAG): We evaluate SERA on privacy-enhanced node representations across three benchmark datasets generated using NAG with varied levels of noise. The outcomes affirm NAG's capacity for privacy preservation while concurrently delineating the limitations of SERA as a tool for privacy auditing.

Evaluation Metrics: Predominantly, this section documents the performance of attacks using the AUROC metric. A more expansive presentation of the results, inclusive of both AUROC and ERR, is postponed to appendix D.

7.1 Efficacy of SERA on sparse graphs

Erdős–Rényi experiments  In our first experiment, we test SERA on graph representations produced by (1) over Erdős–Rényi graphs with edge probability $p = \log n / n$ and graph size $n \in \{100, 500, 1000\}$, which is a representative random graph model with a controllable sparsity level. We set the weight to be the identity matrix and further present results under random weights in appendix D.1. We vary the feature dimension $d \in \{2^j, 2 \le j \le 11\}$ and network depth $1 \le L \le 10$ in order to obtain a fine-grained assessment of SERA. We present the evaluations in figure 4. The results corroborate our theoretical developments: SERA is able to achieve near-perfect reconstruction of all edges only in the "large d, small L" regime. Notably, we find SERA to be less successful under relatively deep network architectures (i.e., L ≥ 5) when the feature dimension is sufficiently large. Yet the behaviors in small-d regimes appear to be less predictable.3 Furthermore, the influence of the feature dimension appears to be more pronounced than that of the network depth. This suggests that a greater number of features, despite their independence from the graph topology, leads to potentially more privacy risk as transmitted through GNN representations. Conversely, augmenting the network depth does not necessarily correlate with an elevation in the success rate of SERA.

Table 1: Performance of SERA on eight datasets measured by the AUROC metric (%). The feature homophily $H_{\mathrm{feature}}(G, X) = \frac{1}{|E|} \sum_{(u,v) \in E} \cos(X_u, X_v)$ is an alternative measure of the correlation between feature similarity and edge presence. For each setup, the results (in the form of mean ± std) are obtained via 5 random trials.

| | Squirrel | Chameleon | Actor | Cora | Citeseer | Pubmed | Products | Reddit |
|---|---|---|---|---|---|---|---|---|
| $H_{\mathrm{feature}}$ | 0.01 | 0.01 | 0.16 | 0.17 | 0.19 | 0.27 | 0.01 | 0.12 |
| $\hat{A}_{\mathrm{FS}}$ | 46.2 | 55.2 | 44.7 | 80.3 | 87.4 | 87.6 | 52.0 | 95.9 |
| Victim model ($\hat{A}_{\mathrm{SERA}}$, non-trained) | | | | | | | | |
| LIN (L = 2) | 72.8 ± 0.0 | 76.1 ± 0.2 | 73.1 ± 0.1 | 93.1 ± 0.4 | 92.5 ± 0.9 | 93.9 ± 1.2 | 97.2 ± 0.3 | 96.4 ± 0.1 |
| LIN (L = 5) | 72.6 ± 0.0 | 76.0 ± 0.3 | 73.0 ± 0.2 | 95.9 ± 0.6 | 93.8 ± 0.4 | 96.0 ± 0.6 | 99.2 ± 0.1 | 95.4 ± 0.1 |
| GCN (L = 2) | 87.3 ± 0.3 | 87.9 ± 0.4 | 87.1 ± 0.6 | 99.8 ± 0.1 | 99.9 ± 0.0 | 99.7 ± 0.0 | 99.6 ± 0.0 | 97.3 ± 0.1 |
| GCN (L = 5) | 82.1 ± 0.3 | 84.3 ± 0.9 | 84.1 ± 0.9 | 99.4 ± 0.2 | 99.9 ± 0.0 | 99.5 ± 0.1 | 99.2 ± 0.1 | 96.1 ± 0.2 |
| Victim model ($\hat{A}_{\mathrm{SERA}}$, trained) | | | | | | | | |
| LIN (L = 2) | 74.6 ± 0.0 | 75.0 ± 0.3 | 59.9 ± 0.7 | 94.6 ± 0.1 | 93.7 ± 0.1 | 89.0 ± 0.1 | 91.6 ± 0.2 | 94.7 ± 0.1 |
| LIN (L = 5) | 74.1 ± 0.3 | 76.9 ± 0.2 | 61.6 ± 0.7 | 94.8 ± 0.3 | 93.3 ± 0.3 | 88.4 ± 0.9 | 98.6 ± 0.1 | 92.3 ± 0.2 |
| GCN (L = 2) | 79.4 ± 0.4 | 82.3 ± 0.3 | 78.5 ± 0.8 | 97.8 ± 0.1 | 99.0 ± 0.0 | 89.2 ± 0.3 | 94.5 ± 0.1 | 95.1 ± 0.1 |
| GCN (L = 5) | 77.4 ± 0.6 | 80.6 ± 0.8 | 78.4 ± 0.6 | 97.4 ± 0.3 | 98.7 ± 0.2 | 92.6 ± 0.4 | 98.4 ± 0.1 | 95.0 ± 0.1 |

Real-world data experiments  Given that the Erdős–Rényi model may not sufficiently capture the complexity of real-world graph structures, we evaluated the SERA algorithm on 8 diverse real-world graph datasets that exhibit contrasting patterns of feature similarity and edge formation. The analysis comprises the well-known Planetoid datasets [41], which are distinguished by their high homophily; the heterophilic datasets Squirrel, Chameleon, and Actor [29], which demonstrate a weak feature-edge correlation; and two larger-scale datasets, namely Amazon-Products [42] and Reddit [14]. Dataset statistics are comprehensively detailed in appendix B.2.
Half of the datasets analyzed manifest a strong positive correlation between feature similarity and edge presence, which is measured via the AUROC of the estimator $\hat{A}^{\mathrm{FS}}_{uv}(\tau) = \mathbf{1}\left(\cos(X_u, X_v) \ge \tau\right)$, while the other half show negligible correlations, an observation underscored in the baseline ($\hat{A}_{\mathrm{FS}}$) row of table 1. In all evaluations, we standardize the hidden dimension to d = 128, with the number of GNN layers set to $L \in \{2, 5\}$. Our analysis extends beyond the linear aggregation scheme (1) to encompass four additional prominent GNN architectures: GCN [20], GAT [32], GIN [38], and SAGE [14]. To discern the effect of training dynamics on the potency of attacks, we delineate results for both the pre-training (i.e., random initialization) and post-training stages. A precise account of the training methodologies can be found in appendix D.2. Results pertaining to the linear GNN (LIN) and GCN are presented in table 1, with a comprehensive evaluation reserved for appendix D.2. We have the following observations:

Homophily is not necessary for SERA to succeed: The efficacy of SERA on the Planetoid datasets aligns with expectations. However, the outcomes from the 4 heterophilic datasets illuminate significant privacy risks, despite a vacuous association between feature resemblance and edge formation. Notably, the Squirrel and Actor datasets, which demonstrate a mild negative feature-edge correlation, are still subject to substantial privacy breaches, particularly with nonlinear models. These empirical findings support our theoretical assertion that a graph's sparsity plays a more pivotal role in its susceptibility to edge reconstruction attacks than the degree of homophily it exhibits. Moreover, in instances of comparatively denser networks, such as the Reddit dataset, the homophily of features can be exploited to mount more sophisticated attacks.

3 In our analysis, one primary mathematical tool is the concentration of inner products of two Gaussian vectors, which is highly dependent on the dimension of the two vectors (i.e., the feature dimension). When the concentration is insufficient (a consequence of small d), our analysis would no longer be valid, and this partly explains why the attacking performance is limited in small-d regimes.

Figure 1: Attacking efficacy of SERA over sparse Erdős–Rényi graphs (panel (a)) and dense SBM graphs, with performance measured by the AUROC metric averaged over 5 random trials for each configuration; the horizontal axis of each panel is $\log_2 d$.

Efficacy of Linear GNNs as Proxies for Nonlinear Counterparts: Evidence presented in table 1 suggests that the trends exhibited by linear GNN models are broadly reflective of those displayed by their nonlinear GCN equivalents. It is typically observed that the attack efficacy is modestly reduced in the linear GNN setting, with further details deferred to appendix D.2.

Influence of Network Depth and Training Dynamics: Table 1 indicates that the post-training performance of SERA is frequently less effective compared to the scenarios with randomly initialized weights. This observation may be attributed to the notion that supervised training tends to adversely affect the conditioning of the weight matrices relative to their initialized state. Additionally, augmenting model depth does not correspond with enhanced attack efficacy, an outcome that is in alignment with our theoretical predictions.
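The feature-similarity baseline $\hat{A}_{\mathrm{FS}}$ reported in table 1 can be reproduced directly from the raw features, for example with the following sketch (numpy plus scikit-learn); evaluating over unordered node pairs is an implementation choice, and the function name is illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def feature_similarity_auroc(A_true, X):
    """AUROC of the baseline A^FS_uv(tau) = 1(cos(X_u, X_v) >= tau):
    edges are guessed from raw feature similarity, without any graph encoder."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    S = Xn @ Xn.T                            # pairwise cosine similarities of raw features
    iu = np.triu_indices_from(A_true, k=1)   # unordered node pairs
    return roc_auc_score(A_true[iu], S[iu])
```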
7.2 Deficiency of SERA on dense graphs

SBM experiments  In this section, we test SERA on graph representations over SBM graphs with K = 3, p = 0.3, q = 0.05, with the rest of the experimental setup analogous to that of the Erdős–Rényi experiments. The evaluations are presented in figure 7. The results reveal the presence of a pronounced barrier that hinders the success of the attack across a wide range of configurations corresponding to different network depths and feature dimensions. Furthermore, we observe that the results tend to stabilize as the size of the graph increases. We provide a further study on the impact of the SBM structure in appendix D.1.

7.3 Mitigation of SERA through NAG

In this section, we empirically study the defensive performance of noisy aggregation (8) against SERA. We use the Planetoid datasets [41] for evaluation. We consider a transductive node classification setting and use the standard train-test splits. The GNN models are trained using the training labels and evaluated on the test nodes. The performance of SERA is evaluated on the subgraphs induced by the test nodes. We report the configuration of the GNN encoding, as well as the attacking pipeline and training hyperparameters, in appendix D.3.1. Due to space limits, we report results on the Cora dataset under GCN and GAT in the main text and postpone the complete report to appendix D.3. We use the following two types of training configurations as proposed in section 6: Under the unconstrained scheme, we use aggressive perturbation plans by applying noise with scale range $\sigma \in \{0, 1, 2, 4\}$, with σ = 0 indicating no protection, and $d \in \{2^i, 5 \le i \le 13\}$. Under the constrained scheme, we adopt the spectral normalization technique [25] to control the spectral norm of each layer at approximately 1 (with relative error < 10%). We use conservative perturbation plans by applying noise with scale range $\sigma \in \{0, 0.01, 0.05, 0.1, 0.5, 1\}$, and $d \in \{2^i, 5 \le i \le 13\}$. Note that with σ = 1, we obtain a non-vacuous lower bound according to (9). We present the evaluations in figure 2 and summarize our observations and findings as follows:

Figure 2: Privacy and utility assessments on the Cora dataset with the underlying NAG model being GCN or GAT. The first row contains the attack performance of SERA measured using the AUROC metric under both the constrained and unconstrained training schemes; the second row presents the corresponding model performance. Each panel plots the metric against $\log_2 d$ for the noise scales listed above.

SERA empirically elicits a privacy-utility trade-off under the constrained scheme: When the noise level is moderate, i.e., $\sigma \in \{0.01, 0.05\}$, the results demonstrate that privacy and utility are, at least to some extent, at odds: under a lower noise level, SERA is able to achieve non-trivial success, especially when d is small. Furthermore, raising the feature dimension d results in both a decrease in utility and an increase in privacy.
This is actually predictable: Since we explicitly control the operator norm to be around 1, a larger d implies a smaller "signal-to-noise ratio", with the signal being (loosely) defined as the magnitude of the aggregated node representations.

SERA loses power against NAG with larger d in the unconstrained scheme: A surprising piece of evidence from figure 2 is that when the feature dimension d is sufficiently large, i.e., d > 1024, the attacking performance degrades. Consequently, we are able to achieve decent protection against SERA (AUROC < 0.6) while incurring only a slight degradation in model utility (> 0.7 accuracy on Cora). Moreover, the phenomenon is more evident for higher noise levels. While the outcome seems favorable insofar as we have identified GNN solutions that manifest both high performance and a degree of privacy (since the training procedure is unrelated to the attacking mechanism), these solutions may exhibit diminished robustness, as the corresponding Lipschitz constants are likely to be inadequately regulated [40]. Due to space limits, we postpone a more detailed discussion to appendix D.4.

8 Discussion and conclusion

In this paper, we have studied the behavior of the SERA adversary by characterizing its performance against different kinds of underlying graph structures as well as encoding mechanisms. Theoretically, we first identify sparse random graphs where SERA provably reconstructs the input graph, which ascertains the empirical findings of previous works. We then reveal limitations of SERA by showing performance lower bounds when the input graph follows a dense SBM. Additionally, we discuss protection mechanisms against SERA by exploring, both theoretically and empirically, the defensive capability of NAG. Empirical investigations corroborate our theoretical findings, while suggesting intriguing phenomena that question the viability of SERA as a formal privacy auditing procedure for private graph representations. Notwithstanding, several research problems warrant further study, which we discuss in appendix E alongside the limitations of this paper.

9 Acknowledgements

The authors from Ant Group are supported by the Leading Innovative and Entrepreneur Team Introduction Program of Hangzhou (Grant No. TD2022005). Guanhua Fang is partly supported by the National Natural Science Foundation of China (no. 12301376).

References

[1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318, 2016.
[2] E. Abbe. Community detection and stochastic block models: recent developments. Journal of Machine Learning Research, 18(177):1–86, 2018.
[3] P. Awasthi, A. Das, and S. Gollapudi. A convergence analysis of gradient descent on graph neural networks. Advances in Neural Information Processing Systems, 34:20385–20397, 2021.
[4] C. L. Canonne. A short note on an inequality between KL and TV. arXiv preprint arXiv:2202.07198, 2022.
[5] N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium (USENIX Security 19), pages 267–284, 2019.
[6] S.
Chanpuriya, C. Musco, K. Sotiropoulos, and C. Tsourakakis. Deepwalking backwards: from embeddings back to graphs. In International Conference on Machine Learning, pages 1473–1483. PMLR, 2021.
[7] E. Chien, W.-N. Chen, C. Pan, P. Li, A. Ozgur, and O. Milenkovic. Differentially private decoupled graph convolutions for multigranular topology protection. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[8] A. Chung, A. Saberi, and M. Austern. Statistical guarantees for link prediction using graph neural networks. arXiv preprint arXiv:2402.02692, 2024.
[9] R. Cummings, D. Desfontaines, D. Evans, R. Geambasu, M. Jagielski, Y. Huang, P. Kairouz, G. Kamath, S. Oh, O. Ohrimenko, et al. Challenges towards the next frontier in privacy. arXiv preprint arXiv:2304.06929, 2023.
[10] E. Dai, T. Zhao, H. Zhu, J. Xu, Z. Guo, H. Liu, J. Tang, and S. Wang. A comprehensive survey on trustworthy graph neural networks: Privacy, robustness, fairness, and explainability. arXiv preprint arXiv:2204.08570, 2022.
[11] V. Duddu, A. Boutet, and V. Shejwalkar. Quantifying privacy leakage in graph embedding. In MobiQuitous 2020 - 17th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, pages 76–85, 2020.
[12] C. Dwork, A. Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
[13] M. Fey and J. E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[14] W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 30, 2017.
[15] W. L. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584, 2017.
[16] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1026–1034, 2015.
[17] X. He, J. Jia, M. Backes, N. Z. Gong, and Y. Zhang. Stealing links from graph neural networks. In 30th USENIX Security Symposium (USENIX Security 21), pages 2669–2686, 2021.
[18] D. Jin, R. Wang, M. Ge, D. He, X. Li, W. Lin, and W. Zhang. RAW-GNN: Random walk aggregation based graph neural network. arXiv preprint arXiv:2206.13953, 2022.
[19] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
[20] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[21] E. L. Lehmann, J. P. Romano, and G. Casella. Testing Statistical Hypotheses, volume 3. Springer, 1986.
[22] S. Luan, C. Hua, M. Xu, Q. Lu, J. Zhu, X.-W. Chang, J. Fu, J. Leskovec, and D. Precup. When do graph neural networks help with node classification? Investigating the homophily principle on node distinguishability. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[23] L. Meng, Y. Bai, Y. Chen, Y. Hu, W. Xu, and H. Weng. Devil in disguise: Breaching graph neural networks privacy through infiltration. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 1153–1167, 2023.
[24] I. Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275.
IEEE, 2017.
[25] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
[26] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pages 75–84, 2007.
[27] K. Oono and T. Suzuki. Graph neural networks exponentially lose expressive power for node classification. arXiv preprint arXiv:1905.10947, 2019.
[28] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Z. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In NeurIPS, pages 8024–8035, 2019.
[29] H. Pei, B. Wei, K. C.-C. Chang, Y. Lei, and B. Yang. Geom-GCN: Geometric graph convolutional networks. arXiv preprint arXiv:2002.05287, 2020.
[30] S. Sajadmanesh, A. S. Shamsabadi, A. Bellet, and D. Gatica-Perez. GAP: Differentially private graph neural networks with aggregation perturbation. In USENIX Security 2023 - 32nd USENIX Security Symposium, 2023.
[31] F. Tramer, A. Terzis, T. Steinke, S. Song, M. Jagielski, and N. Carlini. Debugging differential privacy: A case study for privacy auditing, 2022.
[32] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.
[33] B. Wang, J. Guo, A. Li, Y. Chen, and H. Li. Privacy-preserving representation learning on graphs: A mutual information perspective. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 1667–1676, 2021.
[34] F. Wu, Y. Long, C. Zhang, and B. Li. LinkTeller: Recovering private edges from graph neural networks via influence analysis. In 2022 IEEE Symposium on Security and Privacy (SP), pages 2005–2024. IEEE, 2022.
[35] F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Weinberger. Simplifying graph convolutional networks. In International Conference on Machine Learning, pages 6861–6871. PMLR, 2019.
[36] R. Wu, M. Zhang, L. Lyu, X. Xu, X. Hao, X. Fu, T. Liu, T. Zhang, and W. Wang. Privacy-preserving design of graph neural networks with applications to vertical federated learning. arXiv preprint arXiv:2310.20552, 2023.
[37] X. Wu, Z. Chen, W. Wang, and A. Jadbabaie. A non-asymptotic analysis of oversmoothing in graph neural networks. arXiv preprint arXiv:2212.10701, 2022.
[38] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.
[39] K. Xu, M. Zhang, S. Jegelka, and K. Kawaguchi. Optimization of graph neural networks: Implicit acceleration by skip connections and more depth. In International Conference on Machine Learning, pages 11592–11602. PMLR, 2021.
[40] Y.-Y. Yang, C. Rashtchian, H. Zhang, R. R. Salakhutdinov, and K. Chaudhuri. A closer look at accuracy vs. robustness. Advances in Neural Information Processing Systems, 33:8588–8601, 2020.
[41] Z. Yang, W. Cohen, and R. Salakhudinov. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pages 40–48. PMLR, 2016.
[42] H. Zeng, H. Zhou, A. Srivastava, R. Kannan, and V. Prasanna. GraphSAINT: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931, 2019.
[43] H. Zhang, B. Wu, S. Wang, X. Yang, M. Xue, S. Pan, and X.
Yuan. Demystifying uneven vulnerability of link stealing attacks against graph neural networks. In International Conference on Machine Learning, pages 41737–41752. PMLR, 2023.
[44] Z. Zhang, Q. Liu, Z. Huang, H. Wang, C.-K. Lee, and E. Chen. Model inversion attacks against graph neural networks. IEEE Transactions on Knowledge and Data Engineering, 2022.
[45] Z. Zhang, Q. Liu, Z. Huang, H. Wang, C. Lu, C. Liu, and E. Chen. GraphMI: Extracting private graph data from graph neural networks. arXiv preprint arXiv:2106.02820, 2021.
[46] Z. Zhou, C. Zhou, X. Li, J. Yao, Q. Yao, and B. Han. On strengthening and defending graph reconstruction attack with Markov chain approximation. arXiv preprint arXiv:2306.09104, 2023.

Table of Contents
A Broader impacts
B Additional backgrounds
  B.1 VFL and the SERA adversary
  B.2 Measures of homophily
  B.3 Representative GNNs and their corresponding aggregation rules
C Proofs of Theorems
  C.1 Proof of theorem 4.1
  C.2 Proof of theorem 5.1
  C.3 Proof of theorem 6.1
D Further experiments
  D.1 Synthetic experiments
  D.2 A complete report of attack performance on real-world datasets
  D.3 A complete report of privacy-utility assessments on Planetoid datasets
  D.4 Spectrum study of GNN solutions obtained under the unconstrained scheme
  D.5 Privacy-utility trade-off comparisons: NAG vs EdgeRR
E Discussions and Limitations
  E.1 On the impact of depth L for NAG
  E.2 Stronger adversary for dense graphs or deep encoders
  E.3 Extension to more complicated victim GNN models
  E.4 Quantifying the advantage of adversaries with more knowledge

A Broader impacts

The pervasive integration of graph representation learning (GRL) into various sectors, from social networks to bioinformatics, underscores the necessity of addressing the security and privacy risks inherent in these technologies. This paper contributes to the understanding of such risks by dissecting the structural vulnerabilities of graph representations under similarity-based edge reconstruction attacks (SERA). Our work has significant ethical implications and societal consequences, as we aim to balance the need for advanced data analytics with the imperative of safeguarding individual and community privacy. By theoretically articulating the success and failure modes of SERA, our research offers a framework for evaluating GRL models against potential privacy breaches. The insights gained can guide the development of more secure algorithms that resist inadvertent information disclosure. By highlighting the efficacy of SERA in various settings, this paper also underscores the potential for such attacks to serve as auditing tools for privacy-preserving mechanisms, thereby fostering the creation of more trustworthy GRL systems. As GRL technologies continue to evolve, our work calls attention to the importance of proactive privacy research in the field.
It encourages the industry to adopt privacy-by-design principles and serves as a reminder to policymakers to consider the implications of GRL in legislation around data protection. Future societal consequences hinge on our ability to reconcile the benefits of GRL with the privacy rights of individuals, necessitating ongoing research, transparent practices, and informed governance to navigate this complex landscape.

B Additional backgrounds

B.1 VFL and the SERA adversary

Figure 3: Illustration of a typical vertically federated graph representation learning scenario (components shown: node features, sampler, GNN encoder, decoder, loss, labels, shared between party A and party B); the figure is adapted from [36].

We describe the scenario of vertically federated graph representation learning mentioned in section 1 in more detail, with the system architecture illustrated in figure 3.

VFL setup  Under this scenario, we assume that party A (on the left side of figure 3) holds the graph as well as the node features, and party B (on the right side of figure 3) holds the node labels. Such scenarios are encountered in applications like financial risk management [36]. Under VFL protocols, party A and party B iteratively exchange intermediate outputs to facilitate collaborative learning. The most representative method is split learning [36], where in each step, party A sends the node representations of a (possibly sampled) subgraph encoded using a graph neural network whose parameters are stored at party A. We call this operation the forward transmission step; it is highlighted in figure 3 as the red arrow.

SERA adversary  We assume that party B is honest-but-curious in the sense that party B strictly follows the VFL protocol but tries to infer the graph structure belonging to party A, both during training and during inference, as the forward step is required for both stages. The attack requires nothing more than the VFL protocol: In each step, the two parties agree on a list of node indices that participate in this step (typically carried out using some cryptographically secure primitives), which constitutes the potential victim nodes $V_{\mathrm{victim}}$. Upon receiving the node embeddings from party A, party B is then free to conduct SERA targeting the topological structure of $G_{\mathrm{victim}}$. Furthermore, party B can even target a larger subgraph by storing multiple batches of embeddings and conducting attacks based on the unioned collections. Note that the adversary does not have access to the GNN model parameters, as they are kept locally at party A, which manifests the practicality of the proposed SERA adversary.

B.2 Measures of homophily

The homophily metric is a way to describe the relationship between node properties and graph structure. Depending on the intrinsic nature of the property, we use two types of homophily measures: the label homophily (or edge homophily)

$$H_{\mathrm{label}}(G, Y) = \frac{1}{|E|} \sum_{(u,v) \in E} \mathbf{1}\left(Y_u = Y_v\right), \qquad (10)$$

which measures the average agreement of adjacent nodes' labels, and the feature homophily (or generalized homophily [18, 22])

$$H_{\mathrm{feature}}(G, X) = \frac{1}{|E|} \sum_{(u,v) \in E} \cos\left(X_u, X_v\right), \qquad (11)$$

which replaces the agreement measure in the definition of label homophily with the cosine similarity between adjacent nodes' features. Another important metric we use in the experiments is the AUROC of guessing edge presence using feature similarities, i.e., $\hat{A}^{\mathrm{FS}}_{uv}(\tau) = \mathbf{1}\left(\cos(x_u, x_v) \ge \tau\right)$. Note that $H_{\mathrm{feature}}$ might be related to, but does not always correlate well with, the AUROC of $\hat{A}^{\mathrm{FS}}$, as $H_{\mathrm{feature}}$ ignores the feature similarity of non-edges.
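A small numpy sketch computing the two homophily measures (10) and (11) from an edge list is given below; the function names are illustrative.

```python
import numpy as np

def label_homophily(edges, y):
    """H_label of eq. (10): fraction of edges whose endpoints share the same label."""
    return float(np.mean([y[u] == y[v] for u, v in edges]))

def feature_homophily(edges, X):
    """H_feature of eq. (11): average cosine similarity of adjacent nodes' features."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    return float(np.mean([Xn[u] @ Xn[v] for u, v in edges]))
```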
The feature homophily metric is sometimes related to, but not always correlated with the B.3 Representative GNNs and their corresponding aggregations rules Here we briefly review GNN architectures that are involved in theorem 6.1 in the message-passing form as in (8): Mean pooling [14] This is the most standard form of message passing GNN. With the un-normalized and un-perturbed version analyzed in section 4 and 5: H(l) v = Re LU Wl H(l 1) u H(l 1) u 2 (SAGE-meanpool) Summation pooling [38] This is a simplified version of the GIN model which is also analyzed in [36]: H(l) v = Re LU Wl H(l 1) u H(l 1) u 2 Max pooling [14] In its un-normalized and un-perturbed version, this corresponds to the mostly used SAGE model: H(l) v = Re LU max u N(v) {v} Wl H(l 1) u H(l 1) u 2 (SAGE-maxpool) GCN pooling [20] The GCN pooling takes the form H(l) v = Re LU Wl H(l 1) u du + 1 H(l 1) u 2 Attentive pooling [32] This is also know as the GAT model. To simplify notations, let H(l) v = H(l) v / H(l) v 2, then the GAT model is recursively defined as H(l) v = Re LU u N(v) {v} αuv Wl H(l 1) u + ϵ αuv = exp Leaky Re LU βsrc, Wl H(l 1) u + βdst, Wl H(l 1) v u N(v) {v} exp Leaky Re LU βsrc, Wl H(l 1) v + βdst, Wl H(l 1) v where βsrc, βdst Rd are learnable vector parameters. C Proofs of Theorems C.1 Proof of theorem 4.1 In the proof, for notational simplicity, we abuse notation by treating A = A + I and D = D + I (i.e., self-edge is included in the edge graph). We then define A(L) := A ... |{z} L times A and p(L) ij := ((D 1A)L)ij. Proof of the theorem. For any pair of two nodes i and j, we next recall the formula of cosine similarity, cos θ(H(L) i , H(L) j ), cos θ(H(L) i , H(L) j ) := H(L) i , H(L) j q H(L) i , H(L) i H(L) j , H(L) j , (12) which will be used recurrently in the following main proof. According to the generation mechanism of node features (i.e., isotropic Gaussian assumption), we have that d Xj 2 1| 3 d Xj, Xj | 3 for all j, j with probability at least 1 1/n2. Case 1: without considering the learnable weight matrix W. For the numerator in cos θ(H(L) i1 , H(L) i2 ), when i1 and i2 are truly connected, we have H(L) i1 , H(L) i2 = j=1 p(L) i1j p(L) i2j Xj 2 + X j =j p(L) i1j p(L) i2j Xj, Xj p(L) i1i1p(L) i2i1 Xi1 2 + X j =j p(L) i1j p(L) i2j Xj, Xj |N (L) i1 ||N (L) i2 | Xi1 2 + X j =j p(L) i1j p(L) i2j Xj, Xj 1 (C2 log n)2L 3 d (use the fact that X j =j p(L) i1j p(L) i2j 1) 2 3 1 (C2 log n)2L (14) when d > 9(C2 log n)4L+2 log n. On the other hand, when i1 and i2 are not connected, by Lemma C.2, we know that, with high probability, there are at most (C2 log n)2L n n(n 1)/2 pairs of i1, i2 such that P j A(L) i1j A(L) i2j 1 which is equivalent to P j p(L) i1j p(L) i2j > 0. For the rest of pairs, we have H(L) i1 , H(L) i2 = X j =j p(L) i1j p(L) i2j Xj, Xj < 1 3 1 (C2 log n)3L+1 , (15) when d > 9(C2 log n)6L+2 log n. For the denominator ( H(L) i1 H(L) i2 )1/2, we give the upper and lower bounds of H(L) i . We can compute H(L) i , H(L) i = j=1 p(L) ij p(L) ij Xj 2 + X j =j p(L) ij p(L) ij Xj, Xj 3 1 (C2 log n)2L+1 , (16) where we use the fact that P j p(L) ij p(L) ij 1. Conversely, we have H(L) i , H(L) i = j=1 p(L) ij p(L) ij Xj 2 + X j =j p(L) ij p(L) ij Xj, Xj 1 (C2 log n)L 3 1 (C2 log n)L 1 3 1 (C2 log n)2L+1 , (17) where we use the fact that P j p(L) ij p(L) ij 1/(C2 log n)L when |N (L) i | (C2 log n)L. To sum up, cos θ(H(L) i1 , H(L) i2 ) is at least 2 3 1 (C2 log n)2L /(1 + 1 3(C2 log n)2L+1 ) 1 2 1 (C2 log n)2L (18) when node i1 and i2 are connected. 
On the other hand, cos θ(H(L) i1 , H(L) i2 ) is at most 1 3 1 (C2 log n)3L+1 /( 1 (C2 log n)L 1 3 1 (C2 log n)2L+1 ) < 1 2 1 (C2 log n)2L (19) for all pairs (except at most (C2 log n)2L n n(n 1)/2 pairs) of disconnected nodes i1 and i2. By choosing the cutoff τ = 1 2 1 (C2 log n)2L , with probability at least 1 2/n2, we have the false negative is zero and the false positive is (C2 log n)2L Case 2: with considering the learnable weight matrix W. Additionally, if the learnable weight W is taken into account, we can derive the following results. We define κ1 and κ2 to be the largest and smallest positive constants such that κ1 X, X WX, WX κ2 X, X holds. It is easy to see that κ2/κ1 = (κ(W))2. Then the parallel version of (14) becomes H(L) i1 , H(L) i2 κ1 2 3 1 (C2 log n)2L . (20) The parallel version of (15) becomes H(L) i1 , H(L) i2 3κ2 The parallel version of (16) becomes H(L) i , H(L) i κ2(1 + 3 The parallel version of (17) becomes H(L) i1 , H(L) i2 κ1( 1 (C2 log n)L 3 Combining above results, we have that cos θ(H(L) i1 , H(L) i2 ) κ1 2 3 1 (C2 log n)2L 1 (C2 log n)2L =: cut1(L) (24) when i1, i2 are connected and d log2 n and cos θ(H(L) i1 , H(L) i2 ) 3κ2 q κ1( 1 (C2 log n)L 3 q d 1 (C2 log n)L =: cut2(L) (25) when i1, i2 are not connected and d (C2 log n)6L+2 log n. Therefore as long as d (C2 log n)6L+2 log n and κ2 )2 8(C2 log n)3L holds, we can choose any cutoff τ between cut1(L) and cut2(L) so that false negative rate is zero and false positive rate is no larger than (C2 log n)2L/n. This completes the proof regarding FPR and FNR. For the implications in AUROC, the result follows immediately by noting the discoveries are montone in τ. Supporting Lemma of Theorem 4.1 The following Lemmas are used for controlling the number of pairs of nodes (u, v) s which satisfy P j A(L) uj A(L) vj 1. Lemma C.1. Let B(n, p) denote the binomial distribution with probability p and size n. 1. Suppose X dominates B(n, p). For any a > 0, we have P(X < np a) exp{ a2/2np}. (27) 2. Suppose X is dominated by B(n, p). For any a > 0, we have P(X > np + a) min{exp{ a2/2np + a3/(np)3}, exp{ a2 2np + 2a/3}}. (28) Proof of Lemma C.1 is standard and we omit it here. The consequences of this Lemma is that C1 2 log n with high probability at least 1 1/n2 for all node i [n]. Lemma C.2. Given a graph with edge probability p (p C1 log n j A(L) i1j A(L) i2j 1) (C2 log n)2L where i1, i2 are two nodes uniformly randomly sampled from the graph. Proof of Lemma C.2. By recalling the definition of A(L) ij that A(L) ij equals one only when node i and node j can be connected within a path of length L. Therefore, with probability at least 1 1/n2, it holds |N (L) j | ( 3C1 log n 2 )L, where N (L) j := {i : A(L) ij = 1} Note that, given fixed j, Ai1j Ai2j is greater than 0 only if i1, i2 N (L) j . By the symmetry, we know that this happens with probability at most |N (L) j |(|N (L) j | 1) n(n 1) when i1, i2 are uniformly randomly sampled. Therefore, by union bound, we have j A(L) i1j A(L) i2j 1) X |N (L) j |(|N (L) j | 1) n(n 1) (1.5C1 log n)2L which concludes the proof. The implication of this lemma is that there are at most n (C2 log n)2L pairs of (u, v) such that P j A(L) uj A(L) vj 1. Extension to different similarities The cosine similarity can be replaced by other similarity metrics. 
For example, we can use correlation similarity, i.e., corr(H(L) i , H(L) j ) :=:= H(L) i H(L) i , H(L) j H(L) j q H(L) i H(L) i , H(L) i H(L) i H(L) j H(L) j , H(L) j H(L) j , where H(L) i is a d-dimensional vector whose elements are equal to the mean of H(L) i . By treating Xj Xj as Xj (j [n]), where Xj is a d-dimensional vector whose elements are equal to the mean of Xj. (13) changes to d Xj 2 1| 4 d Xj, Xj | 4 Therefore, the above proof still holds by adjusting the constant accordingly. C.2 Proof of theorem 5.1 To prove the desired result, we first need the following lemmas. In the rest of proof, we abuse the notation by treating p as p0 and q as q0. By applying the Hoeffding s inequality, we can obtain the following two lemmas. Lemma C.3. It holds | P j:i,j in the same group Aij n K p0| 3 log n =: ϵ1 for all i with probability at least 1 1/n2. Lemma C.4. Suppose i is in group k, it holds | P j:i,j in the group k ( = k) Aij n K q0| min{ 1 2 n K q0, 3 log n} =: ϵ2 for all i with probability at least 1 1/n2. Combining Lemma C.3 and Lemma C.4, we have the following lemma. Lemma C.5. It holds | P K p0 + (n n K ) q0)| ϵ1 + (K 1)ϵ2 with probability at least 1 2/n2. In summary, with high probability confidence, Lemma C.5 gives the characterization of degree (i.e. number of neighbours) of every node i. We then make a step forward and characterize the normalized degree p(L) ij for L 2 in the following lemmas. Lemma C.6. With probability at least 1 1/n2, it holds that |A(2) ij ( n K p2 0 + (n n/K)q0p0)| 2 n K q0 for i, j from the same group and |A(2) ij ( n K p0q0+(n n/K)q2 0)| min{ 2 K p0q0+ (n n/K)q2 0), 3(K 1) log n} for i, j from different groups. Lemma C.7. For L 2, suppose there exist constants a(L) 1 and a(L) 2 such that |A(L) ij a(L) 1 | ϵ(L) 1 when i, j are in the same group and |A(L) ij a(L) 2 | ϵ(L) 2 when i, j are not in the same group. It holds that |A(L+1) ij a(L+1) 1 | ϵ(L) 1 i, j in the same group |A(L+1) ij a(L+1) 2 | ϵ(L) 2 i, j not in the same group, (31) a(L+1) 1 := (a(L) 1 n K p0 + a(L) 2 (n n/K)q0), a(L+1) 2 := a(L) 1 n K q0 + a(L) 2 n K p0 + a(L) 2 (n 2n/K)q0 ϵ(L+1) 1 := ϵ(L) 1 n K p0 + ϵ1a(L) 1 + ϵ1ϵ(L) 1 + ϵ(L) 2 (n n/K)q0 + (K 1)ϵ2a(L) 2 + (K 1)ϵ2ϵ(L) 2 , ϵ(L+1) 2 := ϵ(L) 1 n K q0 + ϵ2a(L) 1 + ϵ2ϵ(L) 1 + ϵ(L) 2 n K p0 + ϵ1a(L) 2 + ϵ1ϵ(L) 2 + ϵ(L) 2 (n 2n/K)q0 + (K 2)ϵ2a(L) 2 + (K 2)ϵ2ϵ(L) 2 . Proof of Lemma C.6 is a special case of that of Lemma C.7. In the following, we prove Lemma C.7. Proof of Lemma C.7. By the definition, we know A(L+1) ij = P j A(L) ij Aj j. When i, j are from the same class (w.l.o.g, we denote it as class 1), then it holds |A(L+1) ij (a(L) 1 n K p0 + a(L) 2 (n n/K)q0)| j A(L) ij Aj j (a(L) 1 n K p0 + a(L) 2 (n n/K)q0)| j in class 1 A(L) ij Aj j a(L) 1 n K p0| + | X j not in class 1 A(L) ij Aj j a(L) 2 (n n/K)q0| = ϵ(L) 1 n K p0 + ϵ1a(L) 1 + ϵ1ϵ(L) 1 + ϵ(L) 2 (n n/K)q0 + (K 1)ϵ2a(L) 2 + (K 1)ϵ2ϵ(L) 2 .(32) Therefore, we can let a(L+1) 1 := (a(L) 1 n k p0 + a(L) 2 (n n/K)q0) and ϵ(L+1) 1 := ϵ(L) 1 n K p0 + ϵ1a(L) 1 + ϵ1ϵ(L) 1 + ϵ(L) 2 (n n/K)q0 + (K 1)ϵ2a(L) 2 + (K 1)ϵ2ϵ(L) 2 . When i, j are not from the same class (w.l.o.g. 
we assume i is from class 1 and j is from class 2), then it holds |A(L+1) ij (a(L) 1 n K q0 + a(L) 2 n K p0 + a(L) 2 (n 2n/K)q0)| j A(L) ij Aj j (a(L) 1 n K q0 + a(L) 2 n K p0 + a(L) 2 (n 2n/K)q0)| j in class 1 A(L) ij Aj j a(L) 1 n k q0| + | X j in class 2 A(L) ij Aj j a(L) 2 n K p0| j not in class 1 & 2 A(L) ij Aj j a(L) 2 (n 2n/K)q0| ϵ(L) 1 n K q0 + ϵ2a(L) 1 + ϵ2ϵ(L) 1 + ϵ(L) 2 n K p0 + ϵ1a(L) 2 + ϵ1ϵ(L) 2 +ϵ(L) 2 (n 2n/K)q0 + (K 2)ϵ2a(L) 2 + (K 2)ϵ2ϵ(L) 2 . (33) Therefore, we can let a(L+1) 2 := a(L) 1 n K q0+a(L) 2 n K p0+a(L) 2 (n 2n/K)q0 and ϵ(L+1) 2 := ϵ(L) 1 n K q0+ ϵ2a(L) 1 +ϵ2ϵ(L) 1 +ϵ(L) 2 n K p0+ϵ1a(L) 2 +ϵ1ϵ(L) 2 +ϵ(L) 2 (n 2n/K)q0+(K 2)ϵ2a(L) 2 +(K 2)ϵ2ϵ(L) 2 . By above induction, it can be seen that, for any fixed L, ϵ(L) 1 /a(L) 1 = Op( log n n ), ϵ(L) 2 /a(L) 1 = Op( log n n ). It also holds a(L) 2 /a(L) 1 = Op( log n n ) when true edge probability satisfies q0 = Op( log n n ), and ϵ(L) 2 /a(L) 2 = Op( log n n ) when q0 log n Recall the definition that p(L) ij = ((D 1A)L)ij, therefore p(L) ij A(L) ij for any fixed i. In other words, for fixed L 2, we have p(L) ij := p(L) ij + Op(k log n n2 ) = a(L) 1 n K a(L) 1 + (n n K ) a(L) 2 | {z } +Op(k log n n2 ), i, j in the same group, p(L) ij = p(L) ij + Op(k log n n2 ) = a(L) 2 n K a(L) 1 + (n n K ) a(L) 2 | {z } +Op(k log n n2 ), i, j not in the same group. Here, on a very high level, we can treat p(L) ij as the population version of ((D 1A)L)ij. When i, j in the same group, then p(L) ij p(L) 1 . Otherwise, p(L) ij p(L) 2 . With above preparations, we are ready to prove the theorem as follows. Proof of the theorem. We need to consider the case L 2 and L = 1 separately. Case 1: L 2. We define Xk := P i group k Xi and r(L) := a(L) 2 /a(L) 1 . For the numerator in cos θ(H(L) i1 , H(L) i2 ), when i1 and i2 are in the same group (w.l.o.g, suppose it is group 1), we have H(L) i1 , H(L) i2 = j=1 p(L) i1j p(L) i2j Xj 2 + X j =j p(L) i1j p(L) i2j Xj, Xj j=1 p(L) i1j p(L) i2j Xj 2 + X j =j p(L) i1j p(L) i2j Xj, Xj + error (35) = p(L) 1 p(L) 1 X1, X1 + 2 X k =1 p(L) 1 p(L) 2 X1, Xk + X k =1 p(L) 2 p(L) 2 Xk, Xk k =k =1 p(L) 2 p(L) 2 Xk, Xk + error = p(L) 1 p(L) 1 n K + (K 1)p(L) 2 p(L) 2 n K + Op((K 1)p(L) 1 p(L) 2 n K 1 +(K 1)(K 2)p(L) 2 p(L) 2 n K 1 d ) + error, where (36) uses the property of node feature generation mechanism that Xk, Xk = n 1/d) for any k and Xk, Xk = Op( n K d) for k = k . Here the error term in (36) is error := Pn j=1(p(L) i1j p(L) i2j p(L) i1j p(L) i2j ) Xj 2 + P j =j (p(L) i1j p(L) i2j p(L) i1j p(L) i2j ) Xj, Xj , which can be controlled as follows. |error| = | j=1 (p(L) i1j p(L) i2j p(L) i1j p(L) i2j ) Xj 2 + X j =j (p(L) i1j p(L) i2j p(L) i1j p(L) i2j ) Xj, Xj | j=1 (p(L) i1j p(L) i2j p(L) i1j p(L) i2j ) Xj 2| + | X j =j (p(L) i1j p(L) i2j p(L) i1j p(L) i2j ) Xj, Xj | = Op(k log n n2 + k log n where (37) utilizes the fact that P j pij 1 for any i and (34) by adjusting the constant. When i1, i2 are not in the same group (w.l.o.g, suppose i1 in group 1 and i2 in group 2), we have H(L) i1 , H(L) i2 = j=1 p(L) i1j p(L) i2j Xj 2 + X j =j p(L) i1j p(L) i2j Xj, Xj j=1 p(L) i1j p(L) i2j Xj 2 + X j =j p(L) i1j p(L) i2j Xj, Xj + error = p(L) 1 p(L) 2 ( X1, X1 + X2, X2 ) + (p(L) 1 p(L) 1 + p(L) 2 p(L) 2 ) X1, X2 +(p(L) 1 p(L) 2 + p(L) 2 p(L) 2 ) X k =1,2 X1 + X2, Xk + X k =1,2 p(L) 2 p(L) 2 Xk, Xk k =k =1,2 p(L) 2 p(L) 2 Xk, Xk + error = 2p(L) 1 p(L) 2 n K + (K 2)p(L) 2 p(L) 2 n K +Op((p(L) 1 p(L) 1 + Kp(L) 1 p(L) 2 + K2p(L) 2 p(L) 2 ) n 1 d) + error. 
(38) To sum up, if i1, i2 are in the same group, cos θ(H(L) i1 , H(L) i2 ) satisfies cos θ(H(L) i1 , H(L) i2 ) = H(L) i1 , H(L) i2 q H(L) i1 , H(L) i1 H(L) i2 , H(L) i2 = p(L) 1 p(L) 1 n k + (K 1)p(L) 2 p(L) 2 n K p(L) 1 p(L) 1 n K + (K 1)p(L) 2 p(L) 2 n K + C (Kp(L) 1 p(L) 2 + K2p(L)2 2 ) n K d + Op( K log n = 1 |{z} cut1(L) op(1) (39) as long as d K2 log3 n/b2. If i1, i2 are not in the same group, cos θ(H(L) i1 , H(L) i2 ) satisfies cos θ(H(L) i1 , H(L) i2 ) = H(L) i1 , H(L) i2 q H(L) i1 , H(L) i1 H(L) i2 , H(L) i2 = 2p(L) 1 p(L) 2 n K + (K 2)p(L) 2 p(L) 2 n K + C (Kp(L) 1 p(L) 2 + K2p(L)2 2 ) n K d + Op( K log n p(L) 1 p(L) 1 n K + (K 1)p(L) 2 p(L) 2 n K = 2r(L) + (K 2)r(L)2 1 + (K 1)r(L)2 | {z } cut2(L) +op(1). (40) Remark. As L , r(L) will converge to 1. Therefore, cut2(L) will eventually equal 1 cut1(L). Case 2: L = 1. For notational convenience, we define X(i) k,1 := P i group k b(i) i,1Xi where b(i) i,1 s are i.i.d. Bernoulli random variables with success probability p0 and X(i) k,2 := P i group k b(i) i,2Xi where b(i) i,2 s are i.i.d. Bernoulli random variables with success probability q0. Then it is straightforward to calculate that, if i1, i2 are in the same group 1, it holds H(1) i1 , H(1) i2 =d 1 Di1Di2 X(i1) 1,1 , X(i2) 1,1 + X k =1 X(i1) 1,1 , X(i2) k,2 + X k =1 X(i1) k,2 , X(i2) 1,1 k =1 X(i1) k,2 , X(i2) k,2 + X k,k =1 X(i1) k,2 , X(i2) k ,2 k p2 0 + (K 1) n K q2 0 + Op(p0 n log n d + p0q0 n log n +Kq0 n log n Similarly, if i1, i2 are not in the same group (w.l.o.g, they are in group 1 and 2 respectively), it holds H(1) i1 , H(1) i2 =d 1 Di1Di2 X(i1) 1,1 , X(i2) 1,2 + X(i1) 2,2 , X(i2) 1,1 + X(i1) 1,1 , X(i2) 2,1 + X(i1) 2,2 , X(i2) 1,2 k =1,2 X(i1) 1,1 + X(i1) 2,2 , X(i2) k,2 + X k =1,2 X(i1) k,2 , X(i2) 1,2 + X(i2) 2,1 k =1,2 X(i1) k,2 , X(i2) k,2 + X k,k =1,2 X(i1) k,2 , X(i2) k ,2 k p0q0 + (K 2) n K q2 0 + Op(p0 n log n d + p0q0 n log n +Kq0 n log n Moreover, if i1 = i2, it holds H(1) i1 , H(1) i1 =d 1 Di1Di1 X(i1) 1,1 , X(i1) 1,1 + 2 X k =1 X(i1) 1,1 , X(i1) k,2 k =1 X(i1) k,2 , X(i1) k,2 + X k,k =1 X(i1) k,2 , X(i1) k ,2 k p0 + (K 1) n K q0 + Op(p0 n log n d + p0q0 n log n +Kq0 n log n To sum up, we arrive at cos θ(H(1) i1 , H(1) i2 ) = H(1) i1 , H(1) i2 q H(1) i1 , H(1) i1 H(1) i2 , H(1) i2 n k p2 0 + (K 1) n K q2 0 n k p0 + (K 1) n K q0 + Op(p0 n log n d + p0q0 n log n d + Kq0 n log n n k p2 0 + (K 1) n K q2 0 n k p0 + (K 1) n K q0 | {z } cut1(1) +op(1) (44) for i1, i2 from the same group, when d log n. Similarly, we have cos θ(H(1) i1 , H(2) i2 ) = H(1) i1 , H(1) i2 q H(1) i1 , H(1) i1 H(1) i2 , H(1) i2 n k p2 0 + (K 1) n K q2 0 + Op(p0 n log n d + p0q0 n log n d + Kq0 n log n n k p0 + (K 1) n k p0q0 + (K 2) n K q2 0 n k p0 + (K 1) n K q0 | {z } cut2(1) +op(1) (45) for i1, i2 from different groups, when d log n/p2 0. Therefore, for any fixed L and any fixed cutoff τ cut1(L), then SERA will predict at least p K n K 1)/2 + q K(K 1)/2 + n K truly connected pairs as dis-connected. In other words, we have the false negative rate is at least p/(2k)+q/2. If the cutoff τ is between cut2(L) and cut1(L), then SERA will predict at least (1 p)K n K 1)/2 truly dis-connected pairs as connected and predict at least q K(K 1)/2 + n K truly connected pairs as dis-connected. That is, false positive rate is at least (1 p)/(2k) and false negative rate is at least (1 q)/2. If the cutoff τ is less than cut2(L), then SERA will predict at least (1 p)K n K 1)/2 + (1 q)K(K 1)/2 + n K truly connected pairs as dis-connected. That is, false positive rate is at least (1 p)/(2k) + (1 q)/2. 
This completes the proof. C.3 Proof of theorem 6.1 As mentioned in section 6, NAG actually protects against a much stronger class of adversaries. Specifically, let H = {H(l)}0 l L denote all the intermediate representations produced by the underlying GNN with weights W = {Wl}l [L]. The following theorem is a stronger version of theorem 6.1: Theorem C.8. Assume the adversary A has access to H and W, and outputs an estimate of graph adjacencies b A = A(H, W). Then for any graph G and any such adversary A, we have the lower bound: min u V,v V h P b Auv = 1|Auv = 0 + P b Auv = 0|Auv = 1 i 1 v u u t1 exp l [L] Wl 2 op σ2 Here the constant C depends on the AGG mechanism of the GNN. In particular, for some standard GNN architectures we have: CGCN = CMEAN-SAGE = CGIN = 1 and CGAT = CMAX-SAGE = 4. The theorem is a consequence of the following lemma: Lemma C.9. Fix an arbitrary node pair (u, v). Let H1 and H0 be the collection of node representations generated under Auv = 1 and Auv = 0, respectively. It follows that the Kullback-Leibler divergence between H1 and H0 is bounded: DKL (H1 H0) C l [L] Wl 2 op σ2 . (47) Here the constant C = 1 for (SAGE-meanpool), (GIN) and (GCN); and C = 4 for (SAGE-maxpool) and (GAT). Proof of lemma C.9. The proof is essentially a proof of Rényi differential privacy similar to that in [36]. First we fix a single l-th layer of GNN defined in (8). We rewrite (8) as: H(l) v = Act Wl H(l 1) u H(l 1) u 2 , u N(v) {v} := Act H(l 1) v + ϵ (48) Let the corresponding representation matrix be H(l) 1 for Auv = 1 and H(l) 0 for Auv = 0 for any l [L]. Further denote e Hl a = { H(l) v,a}v V as the intermediate representation defined as in (48) with Auv = a, a {0, 1}. Then by standard results on Rényi divergence [24], we have DKL Hl 1 Hl 0 = e H(l) 1 e H(l) 0 2 For some input Hl 1. It follows that given all the other edges, the only terms that contributes to e H(l) 1 e H(l) 0 2 2 are H(l) v,1 H(l) v,0 2 2 and H(l) u,1 H(l) u,0 2 2. Next we give the derivation of various GNN architectures: The case of (SAGE-meanpool) We let dv to be the degree of v assuming Auv = 1. Further let g(l) v = Wlh(l 1) u h(l 1) u 2 We have: H(l) u,1 H(l) u,0 2 = u N(v)\{v} g(l 1) u g(l 1) v 2 + 1 u N(v)\{v} Wl op = Wl op (53) Analogously we have H(l) u,1 H(l) u,0 2 2 Wl 2 op and thus DKL H(l) 1 H(l) 0 Wl 2 op σ2 . The result follows from adaptive composition as in [24, Proposition 1]. The case of (GIN) This follows by combining the preceding argument with [36, Proposition 1]. The case of (GCN) This follows by combining the preceding argument with [36, Proposition 2]. The case of (SAGE-maxpool) The result follows from the following fact that maxu N(v) gu maxu N\{v} gu 2 attains its maximum when gv = gu, u N\{v} since all the gus are unit vectors. Proof of theorem 6.1. We view the reconstruction problem regarding Auv as a binary hypothesis testing problem H0 : Auv = 0 v.s. H1 : Auv = 1. (54) Then according to hypothesis testing theory [21], we have h P b Auv = 1|Auv = 0 + P b Auv = 0|Auv = 1 i 1 d TV (H1, H0) , (55) where we use d TV (H1, H0) to denote the total variation distance of distributions induced by H1 and H0 respectively. By the Bretagnolle Huber bound [4, Theorem 1], we have d TV (H1, H0) p 1 exp ( DKL (H1 H0)) (56) The result then follows by combining (55), (56) and lemma C.9. 
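To connect the analysis above to an implementation, here is a minimal sketch of a single noisy-aggregation (NAG) layer with MEAN pooling, written by us for illustration and assuming PyTorch; the exact placement of the normalization and the Gaussian perturbation follows equation (8) of the main text, and what is shown below is one plausible instantiation rather than the released implementation.

```python
import torch

def noisy_mean_aggregation(H: torch.Tensor, adj: torch.Tensor,
                           W: torch.Tensor, sigma: float) -> torch.Tensor:
    """One noisy-aggregation (NAG) layer with MEAN pooling (illustrative sketch).

    H:     (n, d_in) input node representations
    adj:   (n, n) binary adjacency matrix; self-loops are added below
    W:     (d_out, d_in) layer weight W_l
    sigma: standard deviation of the Gaussian perturbation
    """
    n = H.shape[0]
    a_hat = adj + torch.eye(n, device=adj.device)               # N(v) ∪ {v}
    msgs = torch.nn.functional.normalize(H, p=2, dim=1) @ W.T   # W_l H_u / ||H_u||_2
    deg = a_hat.sum(dim=1, keepdim=True)
    agg = (a_hat @ msgs) / deg                                   # mean over the neighborhood
    return torch.relu(agg + sigma * torch.randn_like(agg))      # add noise, then activate
```

Because each message is unit-normalized before being multiplied by $W_l$, toggling a single edge $(u, v)$ moves the aggregated vectors of $u$ and $v$ by at most on the order of $\|W_l\|_{\mathrm{op}}$, which is the quantity that enters the Kullback-Leibler bound of lemma C.9.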
D Further experiments

D.1 Synthetic experiments

Erdős–Rényi experiments with random GNN weights The experimental setup in this section is essentially the same as that in section 7.1, except that the model weights are generated by the following process: for an L-layer linear GNN, we generate the weight matrix as
$$W = W_1 \cdots W_L. \qquad (57)$$
Here each $W_l$, $1 \le l \le 10$, is a random matrix generated using the initialization method proposed in [16]. The evaluations are shown in figure 5. The results exhibit a similar pattern to figure 4, where the weight matrix is set to identity. However, the attacking performance differs between the two scenarios: when the matrix W is poorly conditioned (a consequence of the construction (57)), the attacking performance degrades, especially when the feature dimension d is not sufficiently large.

Figure 4: Attacking efficacy of SERA over sparse Erdős–Rényi graphs, with each grid's value indicating SERA's performance measured in either AUROC (first row; darker color implies higher attacking performance) or ERR (second row; lighter color implies higher attacking performance). Horizontal axes show the feature dimension in log2 d.

Figure 5: Attacking efficacy of SERA over sparse Erdős–Rényi graphs with randomly generated GNN weights, with each grid's value indicating SERA's performance measured in either AUROC (first row; darker color implies higher attacking performance) or ERR (second row; lighter color implies higher attacking performance). Horizontal axes show the feature dimension in log2 d.

Figure 6: Performance of SERA on SBM with varying K and p (n = 100, L = 1, d = 2048, q = 0.05, p ∈ {0.1, 0.3, 0.5, 0.7}). All plots are based on 5 independent trials with shades indicating one standard deviation.

Impact of SBM structure To investigate the impact of SBM structure on the performance of SERA, we fix the GNN architecture at L = 1 and evaluate on a graph with 100 nodes and node feature dimension d = 2048. Note that we choose a relatively large node feature dimension to ensure that the assumptions listed in theorem 5.1 are approximately met. We vary the SBM within-group probability according to p ∈ {0.1, 0.3, 0.5, 0.7} and the number of groups according to 1 ≤ K ≤ 20. The results, plotted in figure 6, suggest that in general the attacking performance is positively correlated with the number of groups K, since more groups yield stronger sparsity according to the SBM generation law. This phenomenon is also in accordance with theorem 5.1.
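For readers who want to reproduce the flavor of this synthetic study, here is a minimal sketch, assuming NetworkX, NumPy, PyTorch, and scikit-learn are installed; the function name and default parameters are ours for illustration and the released code may differ in details. It samples a K-block SBM, draws isotropic Gaussian features, applies one round of noise-free mean aggregation with identity weights (matching L = 1), and scores node pairs by cosine similarity as SERA does.

```python
import networkx as nx
import numpy as np
import torch
from sklearn.metrics import roc_auc_score

def sbm_sera_auroc(n=100, K=5, p=0.3, q=0.05, d=2048, seed=0) -> float:
    """AUROC of SERA on a K-block SBM after one mean-aggregation layer (sketch)."""
    rng = np.random.default_rng(seed)
    sizes = [n // K] * K
    probs = np.full((K, K), q)
    np.fill_diagonal(probs, p)                                   # within-group prob p, cross-group q
    g = nx.stochastic_block_model(sizes, probs.tolist(), seed=seed)
    A = torch.tensor(nx.to_numpy_array(g), dtype=torch.float32)
    X = torch.tensor(rng.standard_normal((sum(sizes), d)), dtype=torch.float32)
    # One layer of mean aggregation over N(v) ∪ {v} with identity weights (L = 1).
    a_hat = A + torch.eye(A.shape[0])
    H = (a_hat @ X) / a_hat.sum(dim=1, keepdim=True)
    # SERA: score every unordered pair by cosine similarity of representations.
    Hn = torch.nn.functional.normalize(H, p=2, dim=1)
    iu = torch.triu_indices(A.shape[0], A.shape[0], offset=1)
    scores = (Hn @ Hn.T)[iu[0], iu[1]]
    labels = A[iu[0], iu[1]]
    return roc_auc_score(labels.numpy(), scores.numpy())
```

Sweeping K and p in this sketch reproduces the qualitative trend discussed above: larger K (hence sparser graphs) makes the attack stronger, while large within-group probability p suppresses it.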
D.2 A complete report of attack performance on real-world datasets

Dataset characteristics We report a brief summary of the datasets used in table 2.

Table 2: Summary of dataset characteristics

|            | Squirrel | Chameleon | Actor | Cora  | Citeseer | Pubmed | Products  | Reddit    |
|------------|----------|-----------|-------|-------|----------|--------|-----------|-----------|
| # nodes    | 5201     | 2277      | 7600  | 2708  | 3327     | 19717  | 1569960   | 232965    |
| # edges    | 36101    | 217073    | 30019 | 10556 | 9104     | 88648  | 264339468 | 114615892 |
| # features | 2089     | 2325      | 932   | 1433  | 3703     | 500    | 200       | 602       |
| # classes  | 5        | 5         | 5     | 7     | 6        | 3      | 107       | 41        |

Training configurations Across all experiments we use a hidden dimension of d = 128 with the number of GNN layers set to L ∈ {2, 5}. We use the Adam optimizer with learning rate 0.001 and train for 1000 epochs on the six relatively small datasets, and for 5 epochs on the Amazon-Products and Reddit datasets, using a stochastic training strategy that samples up to 20 neighbors per node in each layer.

Results We present a full list of results in table 3. Consistent with the findings explicated in section 7, SERA demonstrates proficient reconstruction capabilities irrespective of graph properties such as homophily. Additionally, the reconstruction potency is resilient across a spectrum of GNN architectures. These results imply that prevalent GNN architectures are likely to engender models that are vulnerable to exploitation by SERA adversaries.

Table 3: Performance of SERA on eight datasets measured by the AUROC metric (%). For each setup, the results (in the form of mean ± std) are obtained via 5 random trials.

|                | Squirrel | Chameleon | Actor | Cora | Citeseer | Pubmed | Products | Reddit |
|----------------|----------|-----------|-------|------|----------|--------|----------|--------|
| $H_{\text{label}}$   | 0.22 | 0.23 | 0.22 | 0.81 | 0.74 | 0.8  | 0.09 | 0.76 |
| $H_{\text{feature}}$ | 0.01 | 0.01 | 0.16 | 0.17 | 0.19 | 0.27 | 0.01 | 0.12 |
| $\hat{A}^{\mathrm{FS}}$ | 46.2 | 55.2 | 44.7 | 80.3 | 87.4 | 87.6 | 52.0 | 95.9 |
| Victim model ($\hat{A}^{\mathrm{SERA}}$, non-trained) |  |  |  |  |  |  |  |  |
| LIN (L = 2)  | 72.8 ± 0.0 | 76.1 ± 0.2 | 73.1 ± 0.1 | 93.1 ± 0.4 | 92.5 ± 0.9 | 93.9 ± 1.2 | 97.2 ± 0.3 | 96.4 ± 0.1 |
| LIN (L = 5)  | 72.6 ± 0.0 | 76.0 ± 0.3 | 73.0 ± 0.2 | 95.9 ± 0.6 | 93.8 ± 0.4 | 96.0 ± 0.6 | 99.2 ± 0.1 | 95.4 ± 0.1 |
| GCN (L = 2)  | 87.3 ± 0.3 | 87.9 ± 0.4 | 87.1 ± 0.6 | 99.8 ± 0.1 | 99.9 ± 0.0 | 99.7 ± 0.0 | 99.6 ± 0.0 | 97.3 ± 0.1 |
| GCN (L = 5)  | 82.1 ± 0.3 | 84.3 ± 0.9 | 84.1 ± 0.9 | 99.4 ± 0.2 | 99.9 ± 0.0 | 99.5 ± 0.1 | 99.2 ± 0.1 | 96.1 ± 0.2 |
| GAT (L = 2)  | 86.2 ± 0.5 | 87.9 ± 0.5 | 86.7 ± 0.6 | 99.8 ± 0.1 | 99.9 ± 0.0 | 99.7 ± 0.0 | 79.8 ± 7.7 | 88.5 ± 8.6 |
| GAT (L = 5)  | 82.0 ± 0.9 | 84.3 ± 0.4 | 82.9 ± 0.7 | 99.5 ± 0.1 | 99.9 ± 0.0 | 99.4 ± 0.0 | 91.3 ± 5.9 | 93.7 ± 1.7 |
| GIN (L = 2)  | 72.8 ± 0.0 | 76.1 ± 0.2 | 73.5 ± 0.1 | 92.1 ± 0.8 | 91.3 ± 0.9 | 96.6 ± 0.6 | 98.8 ± 0.1 | 96.4 ± 0.1 |
| GIN (L = 5)  | 72.2 ± 0.0 | 75.9 ± 0.4 | 74.4 ± 0.0 | 93.0 ± 0.3 | 89.9 ± 0.7 | 94.2 ± 0.5 | 97.6 ± 0.1 | 89.2 ± 0.6 |
| SAGE (L = 2) | 72.6 ± 0.1 | 75.5 ± 0.1 | 74.1 ± 0.2 | 87.6 ± 0.3 | 87.0 ± 0.9 | 93.7 ± 0.9 | 99.2 ± 0.0 | 94.2 ± 0.5 |
| SAGE (L = 5) | 71.4 ± 0.2 | 74.2 ± 0.1 | 72.4 ± 0.6 | 88.4 ± 1.2 | 85.6 ± 0.3 | 88.4 ± 1.0 | 97.6 ± 0.1 | 93.4 ± 0.6 |
| Victim model ($\hat{A}^{\mathrm{SERA}}$, trained) |  |  |  |  |  |  |  |  |
| LIN (L = 2)  | 74.6 ± 0.0 | 75.0 ± 0.3 | 59.9 ± 0.7 | 94.6 ± 0.1 | 93.7 ± 0.1 | 89.0 ± 0.1 | 91.6 ± 0.2 | 94.7 ± 0.1 |
| LIN (L = 5)  | 74.1 ± 0.3 | 76.9 ± 0.2 | 61.6 ± 0.7 | 94.8 ± 0.3 | 93.3 ± 0.3 | 88.4 ± 0.9 | 98.6 ± 0.1 | 92.3 ± 0.2 |
| GCN (L = 2)  | 79.4 ± 0.4 | 82.3 ± 0.3 | 78.5 ± 0.8 | 97.8 ± 0.1 | 99.0 ± 0.0 | 89.2 ± 0.3 | 94.5 ± 0.1 | 95.1 ± 0.1 |
| GCN (L = 5)  | 77.4 ± 0.6 | 80.6 ± 0.8 | 78.4 ± 0.6 | 97.4 ± 0.3 | 98.7 ± 0.2 | 92.6 ± 0.4 | 98.4 ± 0.1 | 95.0 ± 0.1 |
| GAT (L = 2)  | 73.3 ± 0.9 | 74.3 ± 1.0 | 66.4 ± 0.6 | 95.2 ± 0.2 | 96.2 ± 0.1 | 89.3 ± 0.4 | 98.2 ± 0.6 | 95.7 ± 0.2 |
| GAT (L = 5)  | 79.2 ± 0.6 | 77.2 ± 1.3 | 75.0 ± 2.1 | 95.6 ± 0.2 | 95.9 ± 0.6 | 91.6 ± 0.8 | 84.5 ± 10.1 | 88.5 ± 4.6 |
| GIN (L = 2)  | 73.3 ± 0.5 | 77.2 ± 0.5 | 71.5 ± 0.3 | 94.2 ± 0.3 | 95.3 ± 0.1 | 91.0 ± 0.1 | 99.0 ± 0.1 | 95.5 ± 0.4 |
| GIN (L = 5)  | 73.2 ± 0.3 | 77.0 ± 0.6 | 74.1 ± 0.1 | 99.8 ± 0.1 | 94.1 ± 0.3 | 92.7 ± 1.5 | 97.7 ± 0.1 | 92.4 ± 0.9 |
| SAGE (L = 2) | 71.6 ± 0.1 | 71.7 ± 1.3 | 67.8 ± 0.7 | 91.1 ± 0.2 | 93.3 ± 0.1 | 87.0 ± 0.2 | 96.2 ± 0.4 | 95.4 ± 0.1 |
| SAGE (L = 5) | 71.6 ± 0.1 | 74.5 ± 0.2 | 71.2 ± 0.6 | 93.6 ± 0.5 | 93.4 ± 0.8 | 85.5 ± 1.0 | 96.6 ± 0.8 | 87.5 ± 1.7 |

D.3 A complete report of privacy-utility assessments on Planetoid datasets

D.3.1 Training configurations and attacking pipeline

Network design For node v with label $y_v$, the prediction is defined as
$$\hat{y}_v = \arg\max_{c \in [C]} \, \mathrm{dec}\big(\mathrm{enc}(G, W)[v]\big)[c], \qquad (58)$$
where we use $[\cdot]$ to denote the operation of vector indexing. Here the encoder enc is designed by stacking L noisy GNN layers (in the sense of NAG) with aggregation mechanism AGG ∈ {MEAN, SUM, GCN, ATTENTION, MAX} as defined above. Note that the encoder maps input node features into node representations of dimension d, which might be larger than the number of classes C. The decoder dec is a linear map that maps node representations to predictions.
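The following self-contained sketch (ours; class and argument names are illustrative, and only the MEAN aggregation is shown) assembles L such noisy layers with a linear decoder to realize the prediction rule (58); it is not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyGNNClassifier(nn.Module):
    """Sketch of the NAG encoder-decoder design of (58): L noisy MEAN-aggregation
    layers followed by a linear decoder."""

    def __init__(self, d_in: int, d_hidden: int, n_classes: int, n_layers: int, sigma: float):
        super().__init__()
        dims = [d_in] + [d_hidden] * n_layers
        self.layers = nn.ModuleList([nn.Linear(dims[l], dims[l + 1], bias=False)
                                     for l in range(n_layers)])
        self.decoder = nn.Linear(d_hidden, n_classes)     # dec: representations -> logits
        self.sigma = sigma

    def forward(self, X: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.shape[0], device=adj.device)
        deg = a_hat.sum(dim=1, keepdim=True)
        H = X
        for layer in self.layers:
            msgs = layer(F.normalize(H, p=2, dim=1))               # W_l H_u / ||H_u||_2
            H = (a_hat @ msgs) / deg                               # MEAN over N(v) ∪ {v}
            H = F.relu(H + self.sigma * torch.randn_like(H))       # noisy aggregation
        return self.decoder(H)                                     # enc followed by dec, as in (58)

# Prediction rule (58): y_hat_v = argmax_c dec(enc(G, W)[v])[c]
# logits = NoisyGNNClassifier(d_in, 128, C, L, sigma)(X, adj); y_hat = logits.argmax(dim=1)
```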
Attacking paradigm The attacking procedure of SERA is based on the node representations produced by the GNN encoder enc under a dimension of d. The attack is conducted over the node representations corresponding to the test subset, i.e., the victim subgraph is the subgraph induced by the test nodes.

Training configurations Across all the experiments, we fix the GNN model to be of depth 2 and use full-batch training for 1000 steps (epochs) using the Adam optimizer with a learning rate of 0.001.

D.3.2 Unconstrained scheme

We plot the full experimental results under the unconstrained scheme for the Cora, Citeseer and Pubmed datasets in figure 8, figure 9 and figure 10, respectively, where we evaluate the performance of SERA under both the ERR and AUROC metrics. The results are consistent with the findings listed in section 7.3.

D.3.3 Constrained scheme

We plot the full experimental results under the constrained scheme for the Cora, Citeseer and Pubmed datasets in figure 11, figure 12 and figure 13, respectively, where we evaluate the performance of SERA under both the ERR and AUROC metrics. The results are consistent with the findings listed in section 7.3.

Figure 7: Attacking efficacy of SERA over dense SBM graphs, with each grid's value indicating SERA's performance measured in either AUROC (first row; darker color implies higher attacking performance) or ERR (second row; lighter color implies higher attacking performance). Horizontal axes show the feature dimension in log2 d.

Figure 8: Privacy-utility trade-off on the Cora dataset using the unconstrained training scheme, under 5 different aggregation types and noise levels σ ∈ {0, 1, 2, 4} (panels: (a) GNN model performance, (b) attacking performance of SERA measured by ERR, (c) attacking performance of SERA measured by AUROC). The horizontal axes measure feature dimension d in log2 scale and the vertical axes stand for performance measures. All plots are based on 5 independent trials with shades indicating one standard deviation.
Figure 9: Privacy-utility trade-off on the Citeseer dataset using the unconstrained training scheme, under 5 different aggregation types and noise levels σ ∈ {0, 1, 2, 4} (panels: (a) GNN model performance, (b) attacking performance of SERA measured by ERR, (c) attacking performance of SERA measured by AUROC). The horizontal axes measure feature dimension d in log2 scale and the vertical axes stand for performance measures. All plots are based on 5 independent trials with shades indicating one standard deviation.

Figure 10: Privacy-utility trade-off on the Pubmed dataset using the unconstrained training scheme, under 5 different aggregation types and noise levels σ ∈ {0, 1, 2, 4} (panels: (a) GNN model performance, (b) attacking performance of SERA measured by ERR, (c) attacking performance of SERA measured by AUROC). The horizontal axes measure feature dimension d in log2 scale and the vertical axes stand for performance measures. All plots are based on 5 independent trials with shades indicating one standard deviation.
Figure 11: Privacy-utility trade-off on the Cora dataset using the constrained training scheme, under 5 different aggregation types and noise levels σ ∈ {0, 0.01, 0.05, 0.1, 0.5, 1.0} (panels: (a) GNN model performance, (b) attacking performance of SERA measured by ERR, (c) attacking performance of SERA measured by AUROC). The horizontal axes measure feature dimension d in log2 scale and the vertical axes stand for performance measures. All plots are based on 5 independent trials with shades indicating one standard deviation.

Figure 12: Privacy-utility trade-off on the Citeseer dataset using the constrained training scheme, under 5 different aggregation types and noise levels σ ∈ {0, 0.01, 0.05, 0.1, 0.5, 1.0} (panels: (a) GNN model performance, (b) attacking performance of SERA measured by ERR, (c) attacking performance of SERA measured by AUROC). The horizontal axes measure feature dimension d in log2 scale and the vertical axes stand for performance measures. All plots are based on 5 independent trials with shades indicating one standard deviation.
Figure 13: Privacy-utility trade-off on the Pubmed dataset using the constrained training scheme, under 5 different aggregation types and noise levels σ ∈ {0, 0.01, 0.05, 0.1, 0.5, 1.0} (panels: (a) GNN model performance, (b) attacking performance of SERA measured by ERR, (c) attacking performance of SERA measured by AUROC). The horizontal axes measure feature dimension d in log2 scale and the vertical axes stand for performance measures. All plots are based on 5 independent trials with shades indicating one standard deviation.

Impact of different AGG mechanisms According to figures 8 through 13, the previously discovered phenomena are present for all 5 aggregation types. Nevertheless, the degree to which these phenomena manifest varies with the specific type of aggregation employed. Notably, the behaviors of ATTENTION, MEAN, and GCN pooling display similarities attributable to their shared mechanism of (weighted) average aggregation. Conversely, the efficacy of SERA against noisy aggregation (NAG) when SUM and MAX pooling are utilized appears less susceptible to changes in d.

D.4 Spectrum study of GNN solutions obtained under the unconstrained scheme

A closer look at GNN solutions obtained via NAG in the unconstrained scheme As SERA is just one form of attack mechanism under a weak adversary, protecting against SERA does not necessarily imply strict notions of privacy. Motivated by theorem 6.1, we conduct a spectrum study of the GNN solutions obtained via NAG in the unconstrained scheme. Specifically, we plot the operator norm of the weight matrices corresponding to the GNN layers across all scenarios and report them in figure 14. The results exhibit a rapidly growing trend of the weights' operator norms as both the feature dimension d and the noise level σ increase. For GNN models trained using noisy aggregation under large d, the corresponding bounds according to (8) become vacuous, i.e., practically zero. Additionally, these solutions may exhibit diminished robustness, as the corresponding Lipschitz constants are likely to be inadequately regulated [40].
To conclude, we have found successful empirical defenses against SERA without satisfying strict notions of privacy, suggesting that SERA has limitations as a tool for auditing private GRL training procedures.

Figure 14: Spectrum study on the Planetoid datasets under the unconstrained training scheme, under 5 different aggregation types and noise levels σ ∈ {0, 1, 2, 4} (panels: (a) Cora, (b) Citeseer, (c) Pubmed). The horizontal axes measure feature dimension d in log2 scale and the vertical axes measure the operator norm of the projection weights of the GNN. All plots are based on 5 independent trials with shades indicating one standard deviation.
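To relate the plotted operator norms to the guarantee of theorem 6.1, the following short sketch (ours, assuming PyTorch; the constant C and noise level σ are those appearing in theorem 6.1) evaluates the reconstruction lower bound directly from a trained encoder's weight matrices.

```python
import math
import torch

def sera_lower_bound(weights: list, sigma: float, C: float = 1.0) -> float:
    """Lower bound on FPR + FNR of any adversary, per theorem 6.1 (sketch).

    weights: list of layer weight matrices W_l of the noisy GNN encoder
    sigma:   NAG noise level
    C:       architecture constant (1 for GCN/MEAN-SAGE/GIN, 4 for GAT/MAX-SAGE)
    """
    total = sum(torch.linalg.matrix_norm(W, ord=2).item() ** 2 for W in weights)
    return 1.0 - math.sqrt(1.0 - math.exp(-C * total / sigma ** 2))
```

When the operator norms grow rapidly with d and σ, as observed in figure 14, the exponent blows up and the returned bound collapses toward zero, which is exactly the sense in which the guarantee becomes vacuous.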
D.5 Privacy-utility trade-off comparisons: NAG vs Edge RR

In this section we provide preliminary comparisons between NAG and Edge RR regarding their privacy-utility trade-offs. In particular, for a graph with n nodes, Edge RR with budget ε is implemented as a graph-level transform that perturbs the adjacency matrix, $\mathrm{EdgeRR}(A) \in \mathbb{R}^{n \times n}$, with each of its entries defined as:
$$\mathrm{EdgeRR}(A)_{u,v} = \begin{cases} A_{u,v} & \text{with probability } \frac{e^{\varepsilon}}{1+e^{\varepsilon}} \\ 1 - A_{u,v} & \text{otherwise.} \end{cases} \qquad (59)$$
It then follows from the theory of local differential privacy [19] that for any (u, v) the perturbed entry is an ε-LDP view of the underlying true adjacency. Combining the property of max-divergence with the proof technique in section C.3, we have the following performance lower bound for any adversary A:
$$\inf_{\mathcal{A}} \min_{u \in V, v \in V} \Big[ P\big(\hat{A}_{uv} = 1 \mid A_{uv} = 0\big) + P\big(\hat{A}_{uv} = 0 \mid A_{uv} = 1\big) \Big] \ge 1 - \sqrt{1 - e^{-\varepsilon}}. \qquad (60)$$
While Edge RR has a very strong privacy protection guarantee, it is also criticized for low utility. We provide a comparison using a two-layer GCN as the backbone under embedding dimension d = 128 on the Planetoid datasets. The results, which can be viewed in figure 15, suggest that when considering SERA as the basis for evaluating privacy, NAG achieves a Pareto-dominant privacy-utility trade-off curve relative to Edge RR.

Figure 15: Comparison of the privacy-utility trade-off between NAG and Edge RR on the Cora, Citeseer, and Pubmed datasets, with utility measured by the accuracy of a 2-layer GCN.

Software and hardware infrastructures. Our framework is built upon PyTorch [28] and PyTorch Geometric [13], which are open-source software released under BSD-style^4 and MIT^5 licenses, respectively. All datasets used throughout the experiments are publicly available. All experiments are done on a single NVIDIA A100 GPU (with 80GB memory).

4 https://github.com/pytorch/pytorch/blob/master/LICENSE
5 https://github.com/pyg-team/pytorch_geometric/blob/master/LICENSE

E Discussions and Limitations

E.1 On the impact of depth L for NAG

As elucidated in theorem 6.1, the privacy assurances provided by NAG are inclined to diminish as the depth parameter L increases, a phenomenon attributable to the compositional nature of (differential) privacy mechanisms [12, 24]. However, this same compositional principle enables NAG to disseminate all intermediate node representations $H^{(1)}, \ldots, H^{(L)}$ while preserving the identical level of privacy as would be the case if only $H^{(L)}$ were released. Consequently, this framework permits the design of superior privacy-preserving GNN architectures by leveraging a blend of $\{H^{(l)}\}_{l \in [L]}$ through inter-layer aggregation techniques, sometimes termed residual connections in GRL [39]. Probing the resilience of SERA against such intricate GNN configurations poses considerable challenges and falls outside the purview of this paper. Nonetheless, preliminary evaluations have been executed to discern the ramifications of the depth parameter L on the privacy-utility compromises of NAG, absent residual connections, with privacy quantified via SERA. The inquiries, undertaken using the Planetoid datasets with a fixed noise magnitude of σ = 0.05 and hidden dimensionality d = 128, are visually synthesized in figure 16. Findings reveal that optimal defensive utility typically transpires at L = 1, while greater GNN depths, notably those with L > 5, tend to undermine the model's utility.
Moreover, the apex of attack performance generally materializes at relatively incipient layers, a confirmation of our theoretical insights set forth in theorem 4.1. Finally, we postulate that incorporating residual connections might proffer an enhanced Pareto frontier for the model. Exploration of this hypothesis is deferred to subsequent research endeavors.

Figure 16: Privacy-utility trade-off on the Planetoid datasets regarding different model depths (1 ≤ L ≤ 10), with utility measured by GCN accuracy.

E.2 Stronger adversary for dense graphs or deep encoders

We have shown the limitations of SERA over dense SBM graphs as well as deep GNN encoders. As our analysis applies to the specific SERA adversary, it is thus of interest to ask whether there exist stronger attacking paradigms that are provably effective against dense graphs or deep GNN encoders. On the flip side, it is also valuable to understand whether the phenomenon of oversmoothing may fundamentally affect the performance of any black-box adversary.

E.3 Extension to more complicated victim GNN models

The theoretical analysis presented in section 4 and section 5 is dedicated to graph neural networks employing mean aggregation without nonlinear activation functions. Prospects exist for augmenting our theoretical framework to encompass alternative aggregation schemes, such as summation [38] and attention-based aggregation [32], conditional upon the satisfaction of specific prerequisites, namely some lower bound on the attention coefficients. A more challenging task lies in the extension of our analysis to graph neural networks (GNNs) that incorporate nonlinear activations between their layers. This inclusion significantly complicates the straightforward application of our non-asymptotic analysis in a cohesive end-to-end manner. Acknowledging the complexity of this endeavor, we leave a thorough investigation for future research initiatives.

E.4 Quantifying the advantage of adversaries with more knowledge

Despite its effectiveness, the knowledge available to SERA is rather limited. Although a previous study [17] has shown empirical evidence that equipping the adversary with more capability may result in stronger attacking algorithms, a theoretical explication of these enhancements has yet to be articulated. In particular, it is of interest to quantify the amplification of adversarial capacity afforded by scenarios in which the adversary is granted white-box access to the model weights or node features.

NeurIPS Paper Checklist

1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
Answer: [Yes]
Justification: All our contributions are concisely represented in the abstract.
Guidelines: The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers.
The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper. 2. Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: The limitations are discussed in section E Guidelines: The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate "Limitations" section in their paper. The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness. While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations. 3. Theory Assumptions and Proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] Justification: The assumptions are listed in the theory statements in the main text, and proofs are presented in section C. Guidelines: The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and crossreferenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material. Theorems and Lemmas that the proof relies upon should be properly referenced. 4. 
Experimental Result Reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [Yes] Justification: The experimental details are presented in the main text and in section D in the appendix. Guidelines: The answer NA means that the paper does not include experiments. If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not. If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable. Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general. releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed. While Neur IPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results. 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Codes for reproducing experiments are included in the supplementary material. All the data used in this paper are open-sourced. Guidelines: The answer NA means that paper does not include experiments requiring code. Please see the Neur IPS code and data submission guidelines (https://nips.cc/ public/guides/Code Submission Policy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. 
Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark). The instructions should contain the exact command and environment needed to run to reproduce the results. See the Neur IPS code and data submission guidelines (https: //nips.cc/public/guides/Code Submission Policy) for more details. The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc. The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable). Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. 6. Experimental Setting/Details Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [Yes] Justification: The details are contained in the main text and section D in the appendix. Guidelines: The answer NA means that the paper does not include experiments. The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them. The full details can be provided either with the code, in appendix, or as supplemental material. 7. Experiment Statistical Significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [Yes] Justification: All plots in this paper are presented with error bars indicating one standard deviation based on 5 random trials. All numbers reported in each tables are also accompanied by one standard deviation. Guidelines: The answer NA means that the paper does not include experiments. The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions). The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.) The assumptions made should be given (e.g., Normally distributed errors). It should be clear whether the error bar is the standard deviation or the standard error of the mean. It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates). If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text. 8. 
8. Experiments Compute Resources

Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?

Answer: [Yes]

Justification: We provide the hardware configurations used in our experiments in section D in the appendix.

Guidelines:
- The answer NA means that the paper does not include experiments.
- The paper should indicate the type of compute workers (CPU or GPU, internal cluster, or cloud provider), including relevant memory and storage.
- The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute.
- The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn't make it into the paper).

9. Code Of Ethics

Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics (https://neurips.cc/public/EthicsGuidelines)?

Answer: [Yes]

Justification: We conform in every aspect to the NeurIPS Code of Ethics.

Guidelines:
- The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.
- If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics.
- The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).

10. Broader Impacts

Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?

Answer: [Yes]

Justification: We discuss the broader impacts of this paper in section A.

Guidelines:
- The answer NA means that there is no societal impact of the work performed.
- If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.
- Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations.
- The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster.
- The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.
- If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).
11. Safeguards

Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?

Answer: [NA]

Justification: The paper poses no such risks.

Guidelines:
- The answer NA means that the paper poses no such risks.
- Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters.
- Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images.
- We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.

12. Licenses for existing assets

Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?

Answer: [Yes]

Justification: The license information for all code released with this paper is detailed in section D.

Guidelines:
- The answer NA means that the paper does not use existing assets.
- The authors should cite the original paper that produced the code package or dataset.
- The authors should state which version of the asset is used and, if possible, include a URL.
- The name of the license (e.g., CC-BY 4.0) should be included for each asset.
- For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.
- If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset.
- For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided.
- If this information is not available online, the authors are encouraged to reach out to the asset's creators.

13. New Assets

Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?

Answer: [NA]

Justification: This paper does not release new assets.

Guidelines:
- The answer NA means that the paper does not release new assets.
- Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc.
- The paper should discuss whether and how consent was obtained from people whose asset is used.
- At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.

14. Crowdsourcing and Research with Human Subjects

Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?

Answer: [NA]

Justification: This paper does not involve crowdsourcing nor research with human subjects.

Guidelines:
- The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
- Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper.
- According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.

15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects

Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?

Answer: [NA]

Justification: This paper does not involve crowdsourcing nor research with human subjects.

Guidelines:
- The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
- Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper.
- We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution.
- For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.