Published as a conference paper at ICLR 2023

CHARACTERIZING THE INFLUENCE OF GRAPH ELEMENTS

Zizhang Chen, Peizhao Li, Hongfu Liu, Pengyu Hong
Brandeis University
{zizhang2,peizhaoli,hongfuliu,hongpeng}@brandeis.edu

ABSTRACT

The influence function, a method from robust statistics, measures the change in model parameters, or in functions of model parameters, caused by the removal or modification of training instances. It is an efficient and useful post-hoc method for studying the interpretability of machine learning models without the need for expensive model retraining. Recently, graph convolution networks (GCNs), which operate on graph data, have attracted a great deal of attention. However, there is no preceding research on influence functions for GCNs that sheds light on the effects of removing training nodes or edges from an input graph. Since the nodes and edges in a graph are interdependent in GCNs, it is challenging to derive influence functions for GCNs. To fill this gap, we started with the simple graph convolution (SGC) model, which operates on an attributed graph, and formulated an influence function to approximate the change in model parameters when a node or an edge is removed from the graph. Moreover, we theoretically analyzed the error bound of the estimated influence of removing an edge. We experimentally validated the accuracy and effectiveness of our influence estimation function. In addition, we showed that the influence function of an SGC model can be used to estimate the impact of removing training nodes or edges on the test performance of the SGC without retraining the model. Finally, we demonstrated how to use influence functions to effectively guide adversarial attacks on GCNs.

1 INTRODUCTION

Graph data is pervasive in real-world applications such as online recommendation (Shalaby et al., 2017; Huang et al., 2021; Li et al., 2021), drug discovery (Takigawa & Mamitsuka, 2013; Li et al., 2017), and knowledge management (Rizun, 2019; Wang et al., 2018), to name a few. The growing need to analyze huge amounts of graph data has inspired work on Graph Neural Networks (GNNs), which bring deep learning to graph data (Gori et al., 2005; Scarselli et al., 2005; Li et al., 2016; Hamilton et al., 2017; Xu et al., 2019b; Jiang et al., 2019). Graph Convolutional Networks (GCNs) (Kipf & Welling, 2017; Zhang & Chen, 2018; Fan et al., 2019), the most cited GNN architecture, adopt convolution and message-passing mechanisms. To better understand GCNs from a data-centric perspective, we consider the following question: Without model retraining, how can we estimate the change in GCN parameters when the graph used for learning is perturbed by edge or node removals?

This question asks for an estimate of the counterfactual effect on the parameters of a well-trained model when basic elements of the graph are manipulated, where the ground truth of such an effect would otherwise be obtained by retraining the model. With a computational tool as the answer, we can efficiently manipulate edges or nodes in a graph to control the change in the parameters of trained GCNs. The solution would enable further applications such as graph data rectification, improved model generalization, and graph data poisoning attacks through pure data modeling. Yet, current methods for training GCNs offer limited interpretability of the interactions between the training graph and the GCN model.
More specifically, we fall short of understanding the influence of input graph elements on both the change in model parameters and the generalizability of a trained model (Ying et al., 2019; Huang et al., 2022; Yuan et al., 2021; Xu et al., 2019a; Zheng et al., 2021).

In the regime of robust statistics, an analysis tool called the influence function (Hampel, 1974; Koh & Liang, 2017) was proposed to study the counterfactual effect of training data on model performance. For independent and identically distributed (i.i.d.) data, influence functions approximately estimate the change in the model when an infinitesimal perturbation is added to the training distribution, e.g., a reweighing of some training instances. However, unlike with i.i.d. data, a manipulation of a graph incurs a knock-on effect through GCNs. For example, removing an edge cuts off all message passing that is supposed to pass through that edge, which consequently changes node representations and affects the final model optimization. Therefore, introducing influence functions to graph data and GCNs is non-trivial and requires extra consideration.

In this work, we aim to derive influence functions for GCNs. As a first attempt in this direction, we focused on Simple Graph Convolution (Wu et al., 2019). Our contributions are three-fold:

- We derived influence functions for Simple Graph Convolution. Based on influence functions, we developed computational approaches to estimate the changes in model parameters caused by two basic perturbations: edge removal and node removal.
- We derived theoretical error bounds that characterize the gap between the estimated changes and the actual changes in model parameters for both edge and node removal.
- We show that our influence analysis on the graph can be utilized to (1) rectify the training graph to improve model testing performance, and (2) guide adversarial attacks on SGC, or conduct grey-box attacks on GCNs via a surrogate SGC.

Code is publicly available at https://github.com/Cyrus9721/Characterizing_graph_influence.

2 PRELIMINARIES

In the following sections, we use a lowercase $x$ for a scalar or an entity, an uppercase $X$ for a constant or a set, a bold lowercase $\mathbf{x}$ for a vector, and a bold uppercase $\mathbf{X}$ for a matrix.

Influence Functions. Influence functions (Hampel, 1974) estimate the change in model parameters when the empirical weight distribution of i.i.d. training samples is perturbed infinitesimally. Such estimation is computationally efficient compared to leave-one-out retraining iterated over every training sample. For $N$ training instances $x$ with labels $y$, consider the empirical risk minimization (ERM)

$$\hat{\theta} = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{x, y} \ell(x, y) + \frac{\lambda}{2} \|\theta\|_2^2$$

for some loss function $\ell(\cdot, \cdot)$, a model parameterized by $\theta$, and a regularization term. When down-weighing a training sample $(x_i, y_i)$ by an infinitesimally small fraction $\epsilon$, the perturbed ERM can be expressed as

$$\hat{\theta}(x_i; \epsilon) = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{x, y} \ell(x, y) - \epsilon \, \ell(x_i, y_i) + \frac{\lambda}{2} \|\theta\|_2^2.$$

Influence functions estimate the actual change $\hat{\theta}(x_i; \epsilon) - \hat{\theta}$ for a strictly convex and twice-differentiable $\ell(\cdot, \cdot)$:

$$\mathcal{I}(x_i; \epsilon) = \lim_{\epsilon \to 0} \frac{\hat{\theta}(x_i; \epsilon) - \hat{\theta}}{\epsilon} = H_{\hat{\theta}}^{-1} \nabla_{\hat{\theta}} \ell(x_i, y_i), \quad (1)$$

where $H_{\hat{\theta}} := \frac{1}{N} \sum_{i=1}^{N} \nabla^2_{\hat{\theta}} \ell(x_i, y_i) + \lambda I$ is the Hessian matrix with regularization at parameter $\hat{\theta}$.
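To make Eq. (1) concrete, the following minimal sketch (our illustration, not the paper's released code) computes the estimated parameter change from removing one training sample of an $\ell_2$-regularized logistic regression and compares it against actual leave-one-out retraining. The synthetic data, the sample index, and the helper names are assumptions for the example.

```python
# Minimal sketch of Eq. (1): influence of removing one training sample
# from an L2-regularized logistic regression, checked against actual
# leave-one-out retraining. Illustrative only, not the paper's code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N, D, lam = 200, 5, 1e-2                          # assumed toy sizes

X = rng.normal(size=(N, D))
y = (X @ rng.normal(size=D) + 0.1 * rng.normal(size=N) > 0).astype(float)

def fit(X, y):
    # sklearn minimizes 0.5||w||^2 + C * sum_i loss_i, so C = 1/(lam*N)
    # matches the objective (1/N) sum_i loss_i + (lam/2)||theta||^2.
    clf = LogisticRegression(C=1.0 / (lam * len(y)), fit_intercept=False,
                             tol=1e-10, max_iter=10000)
    clf.fit(X, y)
    return clf.coef_.ravel()

theta = fit(X, y)
p = 1.0 / (1.0 + np.exp(-(X @ theta)))            # sigmoid predictions

# Per-sample gradients of the cross-entropy loss at theta_hat.
grads = (p - y)[:, None] * X                      # shape (N, D)

# Regularized Hessian: H = (1/N) sum_i p_i(1-p_i) x_i x_i^T + lam * I
W = p * (1 - p)
H = (X * W[:, None]).T @ X / N + lam * np.eye(D)

i = 7                                             # sample whose removal we study
# I(-x_i) ~ (1/N) H^{-1} grad_theta l(x_i, y_i), i.e. epsilon = 1/N in Eq. (1)
est_change = np.linalg.solve(H, grads[i]) / N

# Ground truth: retrain without sample i.
theta_loo = fit(np.delete(X, i, 0), np.delete(y, i))
print("estimated :", est_change)
print("actual    :", theta_loo - theta)
```

Note the sketch solves a linear system rather than forming $H_{\hat{\theta}}^{-1}$ explicitly; at scale one would use the implicit Hessian-vector-product techniques of Koh & Liang (2017).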
For some differentiable model evaluation function $f: \Theta \to \mathbb{R}$, such as the total model loss over a test set, the change in the evaluation result from down-weighing $(x_i, y_i)$ by $\epsilon$ can be approximated by $\nabla_{\hat{\theta}} f(\hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\hat{\theta}} \ell(x_i, y_i)$. When the size $N$ of the training data is large, setting $\epsilon = \frac{1}{N}$ approximates the change of $\hat{\theta}$ incurred by removing an entire training sample, $\mathcal{I}(x_i; \frac{1}{N}) = \mathcal{I}(-x_i)$, via linear extrapolation as $\frac{1}{N} \to 0$. Obviously, in terms of the estimated influence $\mathcal{I}$, removing a training sample has the opposite value of adding the same training sample: $\mathcal{I}(-x_i) = -\mathcal{I}(+x_i)$. In our work, we shall assume additivity of influence functions when several samples are removed, e.g., for two samples: $\mathcal{I}(-x_i, -x_j) = \mathcal{I}(-x_i) + \mathcal{I}(-x_j)$. Though efficient, influence functions on non-convex models suffer from estimation errors, due both to varying local minima and to the computational approximation of $H_{\hat{\theta}}^{-1}$ usually required for a non-invertible Hessian matrix. To introduce influence functions from i.i.d. data to graphs and precisely characterize the influence of graph elements on changes in model parameters, we consider a convex model from the GCN family called Simple Graph Convolution.

Simple Graph Convolution. By removing the non-linear activations between the layers of typical Graph Convolutional Networks, Simple Graph Convolution (SGC) (Wu et al., 2019) formulates a linear simplification of GCNs with competitive performance on various tasks (He et al., 2020; Rakhimberdina & Murata, 2019). Let $G = (V, E)$ denote an undirected attributed graph, where $V = \{v\}$ contains the vertices with corresponding features $\mathbf{X} \in \mathbb{R}^{|V| \times D}$, with $D$ the feature dimension, and $E = \{e_{ij}\}_{1 \le i < j \le |V|}$ contains the edges.
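As a companion illustration (our sketch, not from the paper), the snippet below implements SGC's $K$-step feature propagation $\mathbf{X}' = \mathbf{S}^K \mathbf{X}$ with $\mathbf{S} = \tilde{\mathbf{D}}^{-1/2}(\mathbf{A} + \mathbf{I})\tilde{\mathbf{D}}^{-1/2}$, following the definition in Wu et al. (2019), and shows how removing a single edge shifts the propagated features of every node within $K$ hops, the knock-on effect noted above. The toy path graph, features, and choice of $K$ are assumptions.

```python
# Minimal sketch of SGC's K-step feature propagation X' = S^K X,
# with S the symmetrically normalized adjacency with self-loops
# (Wu et al., 2019). Toy graph and K are illustrative assumptions.
import numpy as np

def sgc_features(A: np.ndarray, X: np.ndarray, K: int = 2) -> np.ndarray:
    """Return the propagated features S^K X for a symmetric adjacency A."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)                       # degrees of A + I
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A_tilde @ D_inv_sqrt         # symmetric normalization
    out = X
    for _ in range(K):                            # K rounds of message passing
        out = S @ out
    return out                                    # after this, SGC is linear in out

# Toy 4-node path graph 0-1-2-3; removing edge (1,2) perturbs the
# propagated features of every node within K hops of that edge.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)

before = sgc_features(A, X, K=2)
A_removed = A.copy()
A_removed[1, 2] = A_removed[2, 1] = 0.0
after = sgc_features(A_removed, X, K=2)
print("feature shift per node:\n", after - before)
```

Because the propagation is parameter-free and the classifier fitted on $\mathbf{S}^K \mathbf{X}$ is linear, the resulting objective stays convex, which is what makes SGC amenable to the influence analysis above.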