# Natural Graph Networks

Pim de Haan (Qualcomm AI Research; University of Amsterdam, QUVA Lab), Taco Cohen (Qualcomm AI Research), Max Welling (Qualcomm AI Research; University of Amsterdam)

A key requirement for graph neural networks is that they must process a graph in a way that does not depend on how the graph is described. Traditionally this has been taken to mean that a graph network must be equivariant to node permutations. Here we show that instead of equivariance, the more general concept of naturality is sufficient for a graph network to be well-defined, opening up a larger class of graph networks. We define global and local natural graph networks, the latter of which are as scalable as conventional message passing graph neural networks while being more flexible. We give one practical instantiation of a natural network on graphs which uses an equivariant message network parameterization, yielding good performance on several benchmarks.

## 1 Introduction

Graph-structured data is among the most ubiquitous forms of structured data used in machine learning, and efficient practical neural network algorithms for processing such data have recently received much attention [Wu et al., 2020]. Because of their scalability to large graphs, graph convolutional neural networks, or message passing networks, are widely used. However, it has been shown [Xu et al., 2018] that such networks, which pass messages along the edges of the graph and aggregate them in a permutation-invariant manner, are fundamentally limited in their expressivity.

Figure 1: (a) A global isomorphism. (b) Induced local isomorphisms. A global graph isomorphism corresponds, for each edge, to a local isomorphism of its neighbourhood, shown for three example edges (denoted with arrows). Hence, when a message passing kernel satisfies the naturality condition for local isomorphisms of the edge neighbourhood (Eq. 4), it also satisfies the global naturality condition (Eq. 2).

(Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.)

More expressive equivariant graph networks exist [Maron et al., 2018], but these treat the entire graph as a monolithic linear structure (e.g. an adjacency matrix), and as a result their computational cost scales superlinearly with the size of the graph. In this paper we ask the question: how can we design maximally expressive graph networks that are equivariant to global node permutations while using only local computations?

If we restrict a global node relabelling / permutation to a local neighbourhood, we obtain a graph isomorphism between local neighbourhoods (see Figure 1). If a locally connected network is to be equivariant to global node relabellings, the message passing scheme should thus process isomorphic neighbourhoods in an identical manner. Concretely, this means that weights must be shared between isomorphic neighbourhoods. Moreover, when a neighbourhood is symmetrical (Figure 1), it is isomorphic to itself in a non-trivial manner, so the convolution kernel has to satisfy an equivariance constraint with respect to the symmetry group of the neighbourhood. Local equivariance has previously been used in gauge equivariant neural networks [Cohen et al., 2019]. However, as the local symmetries of a graph are different on different edges, we do not have a single gauge group here. Instead, we have more general structures that can be captured by elementary category theory.
We thus present a categorical framework we call natural graph networks that can be used to describe maximally flexible global and local graph networks. In this framework, an equivariant kernel is just a natural transformation between two functors. We will not assume knowledge of category theory in this paper, and explicit category theory is limited to Section 5.

When natural graph networks (NGNs) are applied to graphs that are regular lattices, such as a 2D square grid, or to a highly symmetrical grid on the icosahedron, one recovers conventional equivariant convolutional neural networks [Cohen and Welling, 2016, Cohen et al., 2019]. However, when applied to irregular grids, like knowledge graphs, which generally have few symmetries, the derived kernel constraints by themselves lead to impractically little weight sharing. We address this by parameterizing the kernel with a message network, an equivariant graph network which takes as input the local graph structure. We show that our kernel constraints coincide with the constraints on the message network being equivariant to node relabellings, making this construction universal whenever the network that parameterizes the kernel is universal.

## 2 Global Natural Graph Networks

As mentioned before, there are many equivalent ways to encode (directed or undirected) graphs. The most common encoding used in the graph neural network literature is to encode a graph as a (node-node) adjacency matrix $A$, whose rows and columns correspond to the nodes and whose $(i, j)$-th entry signals the presence ($A_{ij} = 1$) or absence ($A_{ij} = 0$) of an edge between nodes $i$ and $j$. There are many other options, but here we will adopt the following definition:

Definition 2.1. A Concrete Graph $G$ is a finite set of nodes² $V(G) \subset \mathbb{N}$ and a set of edges $E(G) \subset V(G) \times V(G)$.

The natural number labels of the nodes of a concrete graph are essential for representing a graph in a computer, but contain no actual information about the underlying graph. Hence, different concrete graphs that are related by a relabelling encode graphs that are essentially the same. Such relabellings are called graph isomorphisms.

Definition 2.2 (Graph isomorphism and automorphism). Let $G$ and $G'$ be two graphs. An isomorphism $\varphi : G \to G'$ is a mapping (denoted by the same symbol) $\varphi : V(G) \to V(G')$ that is bijective and preserves edges, i.e. satisfies for all $(i, j) \in V(G) \times V(G)$:
$$(i, j) \in E(G) \iff (\varphi(i), \varphi(j)) \in E(G'). \qquad (1)$$
If there exists an isomorphism between $G$ and $G'$, we say they are isomorphic. An isomorphism from a graph to itself is also known as an automorphism, or simply a symmetry.

In order to define graph networks, we must first define the vector space of features on a graph. Additionally, we need to define how the feature spaces of isomorphic graphs are related, so that we can express a feature on one concrete graph on other isomorphic concrete graphs.

²Note that the set of node ids may be non-contiguous. This is useful because a graph may arise as a subgraph of another one, in which case we wish to preserve the node ids.

Figure 2: A graph feature $\rho$ assigns to each graph $G$ a vector space $\rho(G)$ (here $\rho(G) = \rho(G') = \mathbb{R}^4$ and $\rho = \rho'$) and to each graph isomorphism $\varphi : G \to G'$ a linear map $\rho(\varphi) : \rho(G) \to \rho(G')$ (here swapping the first and fourth row). A global natural graph network layer $K$ between features $\rho$ and $\rho'$ has for each graph $G$ a map $K_G : \rho(G) \to \rho'(G)$, such that for each graph isomorphism $\varphi : G \to G'$ the naturality diagram commutes.
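To make Definitions 2.1 and 2.2 concrete, the following minimal Python sketch (our illustration, not the authors' code; the names `ConcreteGraph` and `is_isomorphism` are our own) represents a concrete graph by its node ids and edge set, and checks whether a relabelling is a graph isomorphism in the sense of Eq. 1.

```python
# Minimal sketch of Definitions 2.1 and 2.2 (illustrative; not the authors' code).
from dataclasses import dataclass

@dataclass(frozen=True)
class ConcreteGraph:
    nodes: frozenset          # finite set of natural-number node ids, V(G)
    edges: frozenset          # set of ordered pairs (i, j) in V(G) x V(G), E(G)

def is_isomorphism(phi: dict, g: ConcreteGraph, g2: ConcreteGraph) -> bool:
    """Check that the relabelling phi: V(G) -> V(G') is bijective and preserves
    edges, i.e. (i, j) in E(G)  <=>  (phi(i), phi(j)) in E(G')  (Eq. 1)."""
    if set(phi.keys()) != set(g.nodes) or set(phi.values()) != set(g2.nodes):
        return False                       # not a bijection onto V(G')
    mapped = frozenset((phi[i], phi[j]) for (i, j) in g.edges)
    return mapped == g2.edges              # edge-preserving in both directions

# A directed triangle on nodes {1, 2, 5} and the same triangle relabelled to {3, 4, 7}.
G  = ConcreteGraph(frozenset({1, 2, 5}), frozenset({(1, 2), (2, 5), (5, 1)}))
G2 = ConcreteGraph(frozenset({3, 4, 7}), frozenset({(3, 4), (4, 7), (7, 3)}))
assert is_isomorphism({1: 3, 2: 4, 5: 7}, G, G2)   # an isomorphism
assert is_isomorphism({1: 2, 2: 5, 5: 1}, G, G)    # a non-trivial automorphism
```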
Definition 2.3 (Graph feature space). A graph feature space, or graph representation, $\rho$ associates to each graph $G$ a vector space $V_G = \rho(G)$, and to each graph isomorphism $\varphi : G \to G'$ an invertible linear map $\rho(\varphi) : V_G \to V_{G'}$, such that the linear maps respect composition of graph isomorphisms: $\rho(\varphi \circ \varphi') = \rho(\varphi) \circ \rho(\varphi')$.³

As the nodes in a concrete graph each carry a unique natural number as a label, the nodes can be ordered. A graph isomorphism $\varphi : G \to G'$ induces a permutation of that ordering. This gives a convenient way of constructing graph feature spaces. For example, for the vector representation, we associate with graph $G$ the vector space $\rho(G) = \mathbb{R}^{|V(G)|}$ and associate to graph isomorphisms the permutation matrix of the corresponding permutation. Similarly, for the matrix representation, we associate to graph $G$ the vector space $\rho(G) = \mathbb{R}^{|V(G)| \times |V(G)|}$ and to graph isomorphism $\varphi : G \to G'$ the linear map $\rho(\varphi)(v) = P v P^T$, where $P$ is the permutation matrix corresponding to $\varphi$.

A neural network operating on such graph features can, in general, operate differently on different graphs. Each of its (linear) layers, mapping from graph feature space $\rho$ to feature space $\rho'$, thus has for each possible graph $G$ a (linear) map $K_G : \rho(G) \to \rho'(G)$. However, as isomorphic graphs $G$ and $G'$ are essentially the same, we want $K_G$ and $K_{G'}$ to process the feature space in an equivalent manner.

Definition 2.4 (Global Natural Graph Network Layer). A layer (or linear layer) in a global natural graph network (GNGN) is for each concrete graph $G$ a map (resp. linear map) $K_G : \rho(G) \to \rho'(G)$ between the input and output feature spaces, such that for every graph isomorphism $\varphi : G \to G'$ the following condition (naturality) holds:
$$\rho'(\varphi) \circ K_G = K_{G'} \circ \rho(\varphi). \qquad (2)$$
Equivalently, the square formed by $K_G : \rho(G) \to \rho'(G)$, $K_{G'} : \rho(G') \to \rho'(G')$, $\rho(\varphi) : \rho(G) \to \rho(G')$ and $\rho'(\varphi) : \rho'(G) \to \rho'(G')$ should commute.

The constraint on the layer (Eq. 2) says that if we first transition from the input feature space $\rho(G)$ to the equivalent input feature space $\rho(G')$ via $\rho(\varphi)$ and then apply $K_{G'}$, we get the same result as first applying $K_G$ and then transitioning from the output feature space $\rho'(G)$ to $\rho'(G')$ via $\rho'(\varphi)$. Since $\rho(\varphi)$ is invertible, if we choose $K_G$ for some $G$ then we have determined $K_{G'} = \rho'(\varphi) \circ K_G \circ \rho(\varphi)^{-1}$ for any isomorphic $G'$. Moreover, for any automorphism $\varphi : G \to G$, we get an equivariance constraint $\rho'(\varphi) \circ K_G = K_G \circ \rho(\varphi)$. Thus, to choose a layer we must choose, for each isomorphism class of graphs, one map $K_G$ that is equivariant to automorphisms. For linear layers, these can in principle be learned by first finding a complete solution basis of the automorphism equivariance constraint and then linearly combining the solutions with learnable parameters.

³As is common in the category theory literature for functors (see Sec. 5), we overload the $\rho$ symbol: $\rho(G)$ denotes a vector space, while $\rho(\varphi)$ denotes a linear map.

The construction of graph isomorphisms, graph feature spaces and natural graph network layers resembles mathematical formalizations that are used widely in machine learning: groups, group representations and equivariant maps between group representations. However, the fact that a natural graph network layer can differ between graphs suggests that a different formalism is needed, namely the much more general concepts of a category, a functor and a natural transformation. How natural transformations generalize equivariant maps is described in Section 5.
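The following short numpy sketch (our illustration, not the authors' code; `perm_matrix` is our own helper) spells out the vector graph feature space and verifies Eq. 2 numerically: once $K_G$ is chosen on one graph, $K_{G'} = \rho'(\varphi) \, K_G \, \rho(\varphi)^{-1}$ is determined on any isomorphic graph.

```python
# Illustrative check of the vector graph feature space and the global naturality
# condition of Eq. 2 (assumes numpy only; not the authors' code).
import numpy as np

def perm_matrix(phi: dict, nodes, nodes2) -> np.ndarray:
    """rho(phi): permutation matrix sending features ordered by sorted V(G)
    to features ordered by sorted V(G'), for an isomorphism phi."""
    src, dst = sorted(nodes), sorted(nodes2)
    P = np.zeros((len(dst), len(src)))
    for j, i in enumerate(src):
        P[dst.index(phi[i]), j] = 1.0
    return P

# Two isomorphic 3-node graphs with node sets {1, 2, 5} and {3, 4, 7}.
phi = {1: 3, 2: 4, 5: 7}
rho_phi  = perm_matrix(phi, {1, 2, 5}, {3, 4, 7})   # input feature transport
rho2_phi = rho_phi                                   # same (vector) representation on output

K_G  = np.random.randn(3, 3)                         # a linear layer chosen for G
K_G2 = rho2_phi @ K_G @ np.linalg.inv(rho_phi)       # determined for G' by naturality

v = np.random.randn(3)                               # a feature on G
# Eq. 2: rho'(phi) . K_G  ==  K_G' . rho(phi)
assert np.allclose(rho2_phi @ (K_G @ v), K_G2 @ (rho_phi @ v))
```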
### 2.1 Relation to Equivariant Graph Networks

The GNGN is a generalization of equivariant graph networks (EGNs) [Maron et al., 2018, 2019], as an EGN can be viewed as a GNGN with a particular choice of graph feature spaces and layers. The feature space of an EGN for a graph of $n$ nodes is defined by picking a group representation of the permutation group $S_n$ over $n$ symbols. Such a representation consists of a vector space $V_n$ and an invertible linear map $\rho(\sigma) : V_n \to V_n$ for each permutation $\sigma \in S_n$, such that $\rho(\sigma\sigma') = \rho(\sigma) \circ \rho(\sigma')$. A typical example is $V_n = \mathbb{R}^{n \times n}$, with $\rho(\sigma)$ acting by permuting the rows and columns. The (linear) layers of an EGN between features $\rho$ and $\rho'$ are (linear) maps $K_n : V_n \to V'_n$, for each $n$, such that the map is equivariant: $\rho'(\sigma) \circ K_n = K_n \circ \rho(\sigma)$ for each permutation $\sigma \in S_n$. Comparing the definitions of EGN features and layers to GNGN features and layers, we note that the former are instances of the latter, with the restriction that an EGN picks a single representation vector space $V_n$ and a single equivariant map $K_n$ for all graphs of $n$ nodes, while in a general GNGN the representation vector space and the equivariant map can differ arbitrarily between non-isomorphic graphs.

In an EGN, the graph structure must be encoded as a graph feature. For example, the adjacency matrix can be encoded as a matrix representation of the permutation group. Such constructions are shown to be universal [Keriven and Peyré, 2019], but impose considerable constraints on the parameterization. For example, one may want to use a GNGN with completely separate sets of parameters for non-isomorphic graphs, which is impossible to express as an EGN.

## 3 Local Graph Networks

Global NGNs provide a general framework for specifying graph networks that process isomorphic graphs equivalently. However, in general, their layers perform global computations on entire graph features, which has high computational complexity for large graphs.

### 3.1 Local Invariant Graph Networks

Figure 3: Two regular graphs.

An entirely different strategy for building neural networks on graphs is to use graph convolutional neural networks or message passing networks [Kipf and Welling, 2016, Gilmer et al., 2017]. We will refer to this class of methods as local invariant graph networks (LIGNs). Such convolutional architectures are generally more computationally efficient than the global methods, as the cost of computing one linear transformation scales linearly with the number of edges. LIGNs are instances of GNGNs, where the feature space for a graph consists of a copy of the same vector space $V_N$ at each node, and graph isomorphisms permute these node vector spaces. In their simplest form, the linear layers of an LIGN pass messages along the edges of the graph:
$$v'_p = \sum_{(p,q) \in E} W v_q, \qquad (3)$$
where $v_p \in V_N$ is the feature vector at node $p$ and $W : V_N \to V'_N$ is a single matrix used on each edge of any graph. This model can be generalized by using aggregation functions other than the sum and by letting the messages also depend on $v_p$ instead of just $v_q$ [Gilmer et al., 2017]. It is easy to see that these constructions satisfy the GNGN constraint (Eq. 2), but they also result in the output $K_G(v)_p$ being invariant under a permutation of its neighbours, which is the reason for the limited expressivity noted by Xu et al. [2018]. For example, no invariant message passing network can discriminate between the two regular graphs in Figure 3. Furthermore, if applied to the rectangular pixel grid graph of an image, it corresponds to applying a convolution with isotropic filters.
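A brief sketch of the invariant message passing layer of Eq. 3, and of the expressivity limitation just mentioned. The two graphs below (a 6-cycle versus two disjoint triangles) are a classic pair of regular graphs that invariant message passing cannot distinguish; they are our illustrative choice and need not be the graphs shown in Figure 3.

```python
# Sketch of the invariant message passing layer of Eq. 3 (illustrative, not the
# authors' code), and of why it cannot separate certain regular graphs.
import numpy as np

def lign_layer(edges, v, W):
    """v'_p = sum_{(p,q) in E} W v_q, with one weight matrix W shared by all edges."""
    v_out = {p: np.zeros(W.shape[0]) for p in v}
    for p, q in edges:
        v_out[p] += W @ v[q]
    return v_out

def undirected(pairs):
    return [(p, q) for a, b in pairs for p, q in ((a, b), (b, a))]

hexagon   = undirected([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
triangles = undirected([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])

W = np.random.randn(4, 4)
v0 = {p: np.ones(4) for p in range(6)}          # identical initial node features
a = lign_layer(hexagon, v0, W)
b = lign_layer(triangles, v0, W)
# Every node has degree 2 in both graphs, so all node features stay identical, and any
# permutation-invariant readout assigns both graphs the same value.
assert all(np.allclose(a[p], b[p]) for p in range(6))
```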
Figure 4: A node feature $\rho$ assigns to each node neighbourhood $G_p$ (here the dark coloured nodes around node $p$) a vector space $\rho(G_p)$ (here $\rho(G_p) = \mathbb{R}^5$) and to each local node isomorphism $\psi : G_p \to G'_{p'}$ a linear map $\rho(\psi) : \rho(G_p) \to \rho(G'_{p'})$ (here swapping the third and fifth row).

### 3.2 Local Natural Graph Networks

The idea of a Local Natural Graph Network (LNGN) is to implement a scalable GNGN layer that passes messages along edges with a message passing kernel and then aggregates the incoming messages. It generalises local invariant graph networks by making the node features transform under isomorphisms of the neighbourhood of the node and by allowing different message passing kernels on non-isomorphic edges.

Definition 3.1 (Neighbourhoods and local isomorphisms). A node neighbourhood⁴ $G_p$ is a subgraph of a concrete graph $G$ in which one node $p \in V(G_p)$ is marked. The subgraph $G_p$ inherits the node labels from $G$, making $G_p$ a concrete graph itself. A local node isomorphism is a map between node neighbourhoods $\psi : G_p \to G'_{p'}$, consisting of a graph isomorphism $\psi : G_p \to G'_{p'}$ such that $\psi(p) = p'$. Similarly, an edge neighbourhood is a concrete graph $G_{pq}$ with a marked edge $(p, q)$, and a local edge isomorphism is a map between edge neighbourhoods such that the marked edge is mapped to the marked edge.

⁴In the graph literature, such graphs are also called node/edge rooted graphs.

Given a graph $G$, we can assign to a node $p \in V(G)$ a node neighbourhood $G_p$ in several ways. In our experiments, we choose $G_p$ to contain all nodes in $G$ that are at most $k$ edges removed from $p$, for some natural number $k$, and all edges between these nodes. Similarly, we pick for edge $(p, q) \in E(G)$ the neighbourhood $G_{pq}$ containing all nodes at most $k$ edges removed from $p$ or $q$ and all edges between these nodes. In all experiments, we chose $k = 1$, unless otherwise noted. General criteria for the selection of neighbourhoods are given in App. C. Neighbourhood selections satisfying these criteria are such that any global graph isomorphism $\varphi : G \to G'$, when restricted to a node neighbourhood $G_p$, equals a local node isomorphism $\varphi_p : G_p \to G'_{p'}$ and, when restricted to an edge neighbourhood $G_{pq}$, equals a local edge isomorphism $\varphi_{pq} : G_{pq} \to G'_{p'q'}$. Furthermore, they have the property that any local edge isomorphism $\psi : G_{pq} \to G'_{p'q'}$ can be restricted to local node isomorphisms $\psi_p : G_p \to G'_{p'}$ and $\psi_q : G_q \to G'_{q'}$ of the start and tail node of the edge.

Next, we choose a feature space for the local NGN by picking a node feature space $\rho$, which is a graph feature space (Def. 2.3) for node neighbourhoods, in complete analogy with the previous section on global NGNs. A node feature space $\rho$ consists of a selection, for any node neighbourhood $G_p$, of a vector space $\rho(G_p)$ and, for any local node isomorphism $\psi : G_p \to G'_{p'}$, of a linear bijection $\rho(\psi) : \rho(G_p) \to \rho(G'_{p'})$, respecting composition: $\rho(\psi) \circ \rho(\psi') = \rho(\psi \circ \psi')$. A node neighbourhood feature space $\rho$ defines a graph feature space $\hat\rho$ on global graphs by concatenating (taking the direct sum of) the node vector spaces: $\hat\rho(G) = \bigoplus_{p \in V(G)} \rho(G_p)$. For a global feature vector $v \in \hat\rho(G)$, we denote the feature vector at node $p \in V(G)$ by $v_p \in \rho(G_p)$. The global graph feature space assigns to a global graph isomorphism $\varphi : G \to G'$ a linear map $\hat\rho(\varphi) : \hat\rho(G) \to \hat\rho(G')$, which permutes the nodes and applies $\rho$ to the individual node features: $\hat\rho(\varphi)(v)_{\varphi(p)} = \rho(\varphi_p)(v_p)$.
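The $k$-hop neighbourhood choice described above can be sketched as follows (our illustration, not the authors' code; the helper names `k_hop`, `node_neighbourhood` and `edge_neighbourhood` are assumptions): $G_p$ is the subgraph induced by all nodes within $k$ hops of $p$, and $G_{pq}$ by all nodes within $k$ hops of $p$ or $q$.

```python
# Sketch of the k-hop node and edge neighbourhoods used in the experiments (k = 1 by default).
from collections import deque

def k_hop(nodes, edges, seeds, k=1):
    """Induced subgraph on all nodes at most k edges away from any seed node."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)                       # treat edges as undirected for distances
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        n = queue.popleft()
        if dist[n] == k:
            continue                        # do not expand beyond k hops
        for m in adj[n]:
            if m not in dist:
                dist[m] = dist[n] + 1
                queue.append(m)
    sub_nodes = set(dist)
    sub_edges = {(a, b) for (a, b) in edges if a in sub_nodes and b in sub_nodes}
    return sub_nodes, sub_edges

def node_neighbourhood(nodes, edges, p, k=1):
    return p, k_hop(nodes, edges, [p], k)           # marked node p plus subgraph G_p

def edge_neighbourhood(nodes, edges, p, q, k=1):
    return (p, q), k_hop(nodes, edges, [p, q], k)   # marked edge (p, q) plus subgraph G_pq
```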
Given two such node feature spaces $\rho$ and $\rho'$, we can define a (linear) local NGN message passing kernel $k$ by choosing for each possible edge neighbourhood $G_{pq}$ a (linear) map $k_{pq} : \rho(G_p) \to \rho'(G_q)$, which takes the role of $W$ in Eq. 3. These maps should satisfy that for any edge neighbourhood isomorphism $\psi : G_{pq} \to G'_{p'q'}$ we have
$$\rho'(\psi_q) \circ k_{pq} = k_{p'q'} \circ \rho(\psi_p). \qquad (4)$$
In words, this local naturality criterion states that passing the message along the edge from $p$ to $q$ and then transporting with a local isomorphism to $q'$ yields the same result as first transporting from $p$ to $p'$ and then passing the message along the edge to $q'$. In analogy to the global NGN layer, isomorphisms between different edge neighbourhoods bring about weight sharing, with a change of basis given by Eq. 4, while automorphisms create constraints on the kernel $k$.

Figure 5: A Local Natural Graph Network kernel $k$ between node features $\rho$ and $\rho'$ consists of a map $k_{pq} : \rho(G_p) \to \rho'(G_q)$ for each edge $(p, q)$, satisfying the commuting diagrams of Eq. 4 for each edge isomorphism $\psi : G_{pq} \to G'_{p'q'}$ and automorphism $\chi : G_{pq} \to G_{pq}$. In this example, the node neighbourhoods of $p$, $p'$, $q$ and $q'$ are coloured dark. The edge isomorphism $\psi$, which swaps nodes 1 and 5, restricts to node isomorphisms $\psi_p$ and $\psi_q$ on the input and output node neighbourhoods. The associated linear maps $\rho(\psi_p)$ and $\rho'(\psi_q)$ swap the second and third row and the first and second row respectively, corresponding to the reordering of the nodes in the neighbourhood by the node isomorphism. Similarly, the automorphism $\chi$ swaps nodes 3 and 5. The isomorphism leads to weight sharing between $k_{pq}$ and $k_{p'q'}$, and the automorphism to a kernel constraint on $k_{pq}$.

Using the local NGN kernel $k$ between node feature spaces $\rho$ and $\rho'$, we can define a global NGN layer between graph feature spaces $\hat\rho$ and $\hat\rho'$ as:
$$v'_q = \sum_{(p,q) \in E(G)} k_{pq}(v_p). \qquad (5)$$
The following main result, proven in Appendix D, shows that this gives a global NGN layer.

Theorem 1. Let $k$ be a local NGN kernel between node feature spaces $\rho$ and $\rho'$. Then the layer in Eq. 5 defines a global NGN layer between the global graph feature spaces $\hat\rho$ and $\hat\rho'$, satisfying the global NGN naturality condition (Eq. 2).

In Appendix F, we show that when a local NGN is applied to a regular lattice, which is a graph with a global transitive symmetry, the NGN is equivalent to a group equivariant convolutional neural network [Cohen and Welling, 2016], when the feature spaces and neighbourhoods are chosen appropriately. In particular, when the graph is a square grid with edges on the diagonals, we recover an equivariant planar CNN with 3x3 kernels. Bigger kernels are achieved by adding more edges. When the graph is a grid on a locally flat manifold, such as an icosahedron or another platonic solid, and the grid is a regular lattice except at some corner points, the NGN is equivalent to a gauge equivariant CNN [Cohen et al., 2019], except around the corners.
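The layer of Eq. 5 can be sketched as follows (our simplified illustration, not the authors' code). It looks up a shared kernel per edge-neighbourhood type via a placeholder canonicalisation, and it deliberately ignores the change-of-basis and automorphism constraints of Eq. 4 that a full implementation would enforce; the names `canonical_key` and `lngn_layer` are our own.

```python
# Simplified sketch of the local NGN layer of Eq. 5: per-edge kernels k_pq, shared
# between edges whose edge neighbourhoods are (crudely) identified as isomorphic.
import numpy as np

def canonical_key(sub_nodes, sub_edges, p, q):
    """Placeholder canonicalisation of the edge neighbourhood (p, q). A real
    implementation would use graph canonisation so that exactly the isomorphic
    neighbourhoods share a key; here we use a crude invariant for illustration."""
    return (len(sub_nodes), len(sub_edges))

def lngn_layer(nodes, edges, neighbourhoods, v, kernels, d_out):
    """v'_q = sum_{(p,q) in E} k_pq(v_p), with k_pq looked up by neighbourhood type."""
    v_out = {q: np.zeros(d_out) for q in nodes}
    for (p, q) in edges:
        sub_nodes, sub_edges = neighbourhoods[(p, q)]      # edge neighbourhood G_pq
        k_pq = kernels[canonical_key(sub_nodes, sub_edges, p, q)]
        v_out[q] += k_pq @ v[p]                            # message from p to q
    return v_out
```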
Figure 6: Local NGN message passing with an equivariant graph network kernel. The node feature $v_p$ at $p$ can be embedded into a graph feature $v_{p \to q}$ of the edge neighbourhood, to which any equivariant graph neural network can be applied. The output graph feature $v'_{p \to q}$ can be projected to obtain the message from $p$ to $q$, $v'_{pq}$. The messages to $q$ are invariantly aggregated to form the output feature $v'_q$.

## 4 Graph Neural Network Message Parameterization

Local naturality requires weight sharing only between edges with isomorphic neighbourhoods, so, in theory, one can use separate parameters for each isomorphism class of edge neighbourhoods to parameterize the space of natural kernels. In practice, graphs such as social graphs are quite heterogeneous, so that few edges are isomorphic and few weights need to be shared, making learning and generalization difficult. This can be addressed by re-interpreting the message from $p$ to $q$, $k_{pq} v_p$, as a function $k(G_{pq}, v_p)$ of the edge neighbourhood $G_{pq}$ and the feature value $v_p$ at $p$, potentially generalized to being non-linear in $v_p$, and then letting $k$ be a neural-network-based message network.

Local naturality (Eq. 4) can be guaranteed, even without explicitly solving kernel constraints for each edge, in the following way. By construction of the neighbourhoods, the node feature $v_p$ can always be embedded into an edge feature $v_{p \to q}$, a graph feature of the edge neighbourhood $G_{pq}$. The resulting graph feature can then be processed by an appropriate equivariant graph neural network operating on $G_{pq}$, in which nodes $p$ and $q$ have been distinctly marked, e.g. by an additional feature. The output graph feature $v'_{p \to q}$ can be restricted to create a node feature $v'_{pq}$ at $q$, which is the message output. The messages are then aggregated, e.g. by summing, to create the convolution output $v'_q = \sum_{(p,q) \in E} v'_{pq}$. This is illustrated in Figure 6. It is proven in Appendix E that the graph equivariance constraint on the message network ensures that the resulting message satisfies the local naturality constraint (Eq. 4).

The selection of the type of graph feature and message network forms a large design space of natural graph networks. If, as in the example above, the node feature $v_p$ is a vector representation of the permutation of the node neighbourhood, the feature can be embedded into an invariant scalar feature of the edge neighbourhood graph by assigning an arbitrary node ordering to the edge neighbourhood, transporting from the node neighbourhood to the edge neighbourhood, and setting a 0 for nodes outside the node neighbourhood. Any graph neural network with invariant features can subsequently be used to process the edge neighbourhood graph feature, whose output we restrict to obtain the message output at $q$. As a simplest example, we propose GCN2, which uses an invariant message passing algorithm, a Graph Convolutional Neural Network [Kipf and Welling, 2016], on the graph $G_{pq}$ as the message network.
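A rough sketch of the embed / process / project pipeline of Figure 6 (our illustration, not the authors' GCN2 implementation). The message network here is a single invariant message passing step on the edge neighbourhood $G_{pq}$, with $p$ and $q$ marked by extra input channels; the function names `message` and `lngn_conv` and the single-step architecture are assumptions.

```python
# Rough sketch of the embed -> invariant message network -> project pipeline of Figure 6.
import numpy as np

def message(sub_nodes, sub_edges, p, q, v_p_nodes, W):
    """Message from p to q on edge neighbourhood G_pq.

    v_p_nodes: dict node -> scalar feature, nonzero only on nodes of G_p
    (the embedding of v_p into an invariant node feature of G_pq)."""
    order = sorted(sub_nodes)
    idx = {n: i for i, n in enumerate(order)}
    # Input channels per node: [embedded feature, is p?, is q?]
    x = np.array([[v_p_nodes.get(n, 0.0), float(n == p), float(n == q)] for n in order])
    agg = np.zeros_like(x)
    for a, b in sub_edges:                    # one GCN-style neighbour aggregation step
        agg[idx[b]] += x[idx[a]]
    h = np.maximum((x + agg) @ W, 0.0)        # shared weights W (3 x d_out), relabelling-invariant
    return h[idx[q]]                          # project: read off the output feature at q

def lngn_conv(nodes, edges, neighbourhoods, node_feats, W):
    """v'_q = invariant sum over incoming edges of the message-network outputs."""
    out = {q: np.zeros(W.shape[1]) for q in nodes}
    for (p, q) in edges:
        sub_nodes, sub_edges = neighbourhoods[(p, q)]
        out[q] += message(sub_nodes, sub_edges, p, q, node_feats[p], W)
    return out
```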
## 5 Naturality as Generalization of Equivariance

As explained in Section 2.1, the difference between a global natural graph network and an equivariant graph network is that the GNGN does not require non-isomorphic graphs to be processed similarly, while the EGN requires all graphs to be processed the same. EGNs can be understood in terms of groups, representations and equivariant maps, but the more general GNGN requires the more general framework of category theory, originally developed in algebraic topology but recently also used as a modelling tool for more applied problems [Fong and Spivak, 2018]. Its constructions give rise to an elegant framework for building equivariant message passing networks, which we call Natural Networks, potentially applicable beyond graph networks. In this section, we will outline the key ingredients of natural networks. We refer a reader interested in learning more about category theory to Leinster [2016] and Fong and Spivak [2018].

A (small) category $\mathcal{C}$ consists of a set of objects $\mathrm{Ob}(\mathcal{C})$ and, for each two objects $X, Y \in \mathrm{Ob}(\mathcal{C})$, a set of abstract (homo)morphisms, or arrows, $f \in \mathrm{Hom}_{\mathcal{C}}(X, Y)$, $f : X \to Y$, between them. The arrows can be composed associatively into new arrows, and each object has an identity arrow $\mathrm{id}_X : X \to X$ with the obvious composition behaviour. When arrows $f : X \to Y$ and $g : Y \to X$ compose to the identities on $X$ and $Y$, they are isomorphisms (with $f^{-1} = g$). A map between two categories $\mathcal{C}$ and $\mathcal{D}$ is a functor $F : \mathcal{C} \to \mathcal{D}$ when it maps each object $X \in \mathrm{Ob}(\mathcal{C})$ to an object $F(X) \in \mathrm{Ob}(\mathcal{D})$ and each morphism $f : X \to Y$ in $\mathcal{C}$ to a morphism $F(f) : F(X) \to F(Y)$ in $\mathcal{D}$, such that $F(g \circ f) = F(g) \circ F(f)$. Given two functors $F, G : \mathcal{C} \to \mathcal{D}$, a natural transformation $\eta : F \Rightarrow G$ consists of, for each object $X \in \mathrm{Ob}(\mathcal{C})$, a morphism $\eta_X : F(X) \to G(X)$, such that for each morphism $f : X \to Y$ in $\mathcal{C}$ the naturality square commutes, meaning that the two compositions $\eta_Y \circ F(f)$ and $G(f) \circ \eta_X : F(X) \to G(Y)$ are the same.

A group is an example of a category with one object, in which all arrows, corresponding to group elements, are isomorphisms. Group representations are functors from this category to the category of vector spaces, mapping the single object to a vector space and morphisms to linear bijections of this space. The functor axioms specialise exactly to the axioms of a group representation. A natural transformation between such functors is exactly an equivariant map. As the group category has only one object, the natural transformation consists of a single morphism (linear map). Equivariant Graph Networks on graphs with $N$ nodes are examples of these, in which the group is the permutation group $S_N$, the representation space is the space of $N \times N$ matrices, whose columns and rows are permuted by the group action, and the layer is a single equivariant map.

To study global NGNs, we define a category of graphs, whose objects are concrete graphs and whose morphisms are graph isomorphisms. The graph feature spaces (Def. 2.3) are functors from this graph category to the category Vec of vector spaces. The GNGN layer is a natural transformation between such functors, consisting of a different map for each graph, but with a naturality constraint (Eq. 2) for each graph isomorphism (including automorphisms).

Similarly, for local NGNs, we define a category $\mathcal{C}$ of node neighbourhoods and local node isomorphisms and a category $\mathcal{D}$ of edge neighbourhoods and local edge isomorphisms. A functor $F_0 : \mathcal{D} \to \mathcal{C}$ maps an edge neighbourhood to the node neighbourhood of the start node and an edge isomorphism to the node isomorphism of the start node, which is well defined by the construction of the neighbourhoods. Similarly, a functor $F_1 : \mathcal{D} \to \mathcal{C}$ maps to the neighbourhood of the tail node of the edge. Node feature spaces are functors $\rho, \rho' : \mathcal{C} \to \mathrm{Vec}$. Composition of functors yields two functors $\rho \circ F_0, \rho' \circ F_1 : \mathcal{D} \to \mathrm{Vec}$, mapping an edge neighbourhood to the input feature at the start node or the output feature at the end node. A local NGN kernel $k$ is a natural transformation between these functors.
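As a compact restatement of this categorical formulation, the following LaTeX (tikz-cd) snippet draws the naturality square that a local NGN kernel, viewed as a natural transformation between $\rho \circ F_0$ and $\rho' \circ F_1$, must satisfy for every local edge isomorphism $\psi : G_{pq} \to G'_{p'q'}$. This is our own rendering of Eq. 4, not a figure from the paper.

```latex
\documentclass{standalone}
\usepackage{tikz-cd}
\begin{document}
% Naturality square for a local NGN kernel k (our rendering of Eq. 4).
% For every local edge isomorphism \psi : G_{pq} -> G'_{p'q'} the square commutes.
\begin{tikzcd}[column sep=huge, row sep=large]
\rho(G_p) \arrow[r, "k_{pq}"] \arrow[d, "\rho(\psi_p)"'] &
\rho'(G_q) \arrow[d, "\rho'(\psi_q)"] \\
\rho(G'_{p'}) \arrow[r, "k_{p'q'}"'] &
\rho'(G'_{q'})
\end{tikzcd}
\end{document}
```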
## 6 Related Work

As discussed above, prior graph neural networks can be broadly classified into local (message passing) and global equivariant networks. The former in particular has received a lot of attention, with early work by Gori et al. [2005] and Kipf and Welling [2016]. Many variants have been proposed, with some influential ones including Gilmer et al. [2017], Veličković et al. [2018] and Li et al. [2017]. Global methods include Hartford et al. [2018], Maron et al. [2018, 2019] and Albooyeh et al. [2019]. We note that in addition to these methods, there are graph convolutional methods based on spectral rather than spatial techniques [Bruna et al., 2014, Defferrard et al., 2016, Perraudin et al., 2018].

Covariant Compositional Networks (CCNs) [Kondor et al., 2018] are most closely related to NGNs, as they are also local equivariant message passing networks. CCNs also use node neighbourhoods and node features that are a representation of the group of permutations of the neighbourhood. CCNs are a special case of NGNs: a CCN is recovered when, in an NGN, (1) the node neighbourhood is chosen to be the receptive field of the node, so that the node neighbourhood grows in each layer, (2) the edge neighbourhood $G_{pq}$ is chosen to be the node neighbourhood of $q$, and (3) the kernel is additionally constrained by the full permutation group, rather than just its subgroup, the automorphism group of the edge neighbourhood. These specific choices mean that the feature dimensions grow as the network gets deeper, which can be problematic for large graphs. Furthermore, as the kernel is more restricted, only a subspace of equivariant kernels is used by CCNs.

| Dataset | MUTAG | PTC | PROTEINS | NCI1 | NCI109 | IMDB-B | IMDB-M |
| --- | --- | --- | --- | --- | --- | --- | --- |
| size | 188 | 344 | 113 | 4110 | 4127 | 1000 | 1500 |
| classes | 2 | 2 | 2 | 2 | 2 | 2 | 3 |
| avg node # | 17.9 | 25.5 | 39.1 | 29.8 | 29.6 | 19.7 | 14 |
| DGCNN [Zhang et al., 2018] | 85.83±1.7 | 58.59±2.5 | 75.54±0.9 | 74.44±0.5 | NA | 70.03±0.9 | 47.83±0.9 |
| PSCN [Niepert et al., 2016] (k=10) | 88.95±4.4 | 62.29±5.7 | 75±2.5 | 76.34±1.7 | NA | 71±2.3 | 45.23±2.8 |
| DCNN [Atwood and Towsley, 2016] | NA | NA | 61.29±1.6 | 56.61±1.0 | NA | 49.06±1.4 | 33.49±1.4 |
| ECC [Simonovsky and Komodakis, 2017] | 76.11 | NA | NA | 76.82 | 75.03 | NA | NA |
| DGK [Yanardag and Vishwanathan, 2015] | 87.44±2.7 | 60.08±2.6 | 75.68±0.5 | 80.31±0.5 | 80.32±0.3 | 66.96±0.6 | 44.55±0.5 |
| DiffPool [Ying et al., 2018] | NA | NA | 78.1 | NA | NA | NA | NA |
| CCN [Kondor et al., 2018] | 91.64±7.2 | 70.62±7.0 | NA | 76.27±4.1 | 75.54±3.4 | NA | NA |
| Invariant Graph Networks [Maron et al., 2018] | 83.89±12.95 | 58.53±6.86 | 76.58±5.49 | 74.33±2.71 | 72.82±1.45 | 72.0±5.54 | 48.73±3.41 |
| GIN [Xu et al., 2018] | 89.4±5.6 | 64.6±7.0 | 76.2±2.8 | 82.7±1.7 | NA | 75.1±5.1 | 52.3±2.8 |
| 1-2-3 GNN [Morris et al., 2019] | 86.1 | 60.9 | 75.5 | 76.2 | NA | 74.2 | 49.5 |
| PPGN v1 [Maron et al., 2019] | 90.55±8.7 | 66.17±6.54 | 77.2±4.73 | 83.19±1.11 | 81.84±1.85 | 72.6±4.9 | 50±3.15 |
| PPGN v2 [Maron et al., 2019] | 88.88±7.4 | 64.7±7.46 | 76.39±5.03 | 81.21±2.14 | 81.77±1.26 | 72.2±4.26 | 44.73±7.89 |
| PPGN v2 [Maron et al., 2019] | 89.44±8.05 | 62.94±6.96 | 76.66±5.59 | 80.97±1.91 | 82.23±1.42 | 73±5.77 | 50.46±3.59 |
| Ours (GCN2) | 89.39±1.60 | 66.84±1.79 | 71.71±1.04 | 82.74±1.35 | 83.00±1.89 | 74.80±2.01 | 51.27±1.50 |
| Rank | 5th | 2nd | 11th | 2nd | 1st | 2nd | 2nd |

Table 2: Results on the graph classification datasets, comparing to other deep learning methods, with data sets from Yanardag and Vishwanathan [2015].

## 7 Experiments

| Method | Fixed | Sym |
| --- | --- | --- |
| GCN | 96.17 | 96.17 |
| Ours | 98.82 | 98.82 |

Table 1: Icosahedral MNIST results.

Icosahedral MNIST. In order to experimentally show that our method is equivariant to global symmetries and increases expressiveness over an invariant message passing network (GCN), we classify MNIST projected onto the icosahedron, as is done in Cohen et al. [2019]. In the first column of Table 1, we show accuracy when trained and tested on one fixed projection, while in the second column we test the same model on projections that are transformed by a random icosahedral symmetry.
NGN outperforms the GCN, and the equality of the accuracies shows that the model is exactly equivariant. Experimental details can be found in Appendix A.

Graph Classification. We evaluate our model with the GCN2 message parameterization on a standard set of 8 graph classification benchmarks from Yanardag and Vishwanathan [2015], containing five bioinformatics data sets and three social graphs⁵. We use the 10-fold cross validation method as described by Zhang et al. [2018] and report the best averaged accuracy across the 10 folds, as described by Xu et al. [2018], in Table 2. Results from prior work are taken from Maron et al. [2019]. On most data sets, our local equivariant method performs competitively with global equivariant methods [Maron et al., 2018, 2019]. In Appendix B, we empirically show the expressiveness of our model, as well as its runtime cost.

⁵These experiments were run on QUVA machines.

## 8 Conclusion

In this paper, we have developed a new framework for building neural networks that operate on graphs, which pass messages with kernels that depend on the local graph structure and have features that are sensitive to the direction of flow of information over the graph. We define natural networks as neural networks that process data irrespective of how the data is encoded, which is critically important for graphs, whose typical encoding is highly non-unique, using naturality, a concept from elementary category theory. Local natural graph networks satisfy the naturality constraint with a message passing algorithm, making them scalable. We evaluate one instance of local natural graph networks using a message network on several benchmarks and find competitive results.

## 9 Broader Impact

The broader impact of this work can be analyzed in at least two different ways. Firstly, graph neural networks in general are particularly suited for analyzing human-generated data. This means that powerful graph neural nets can provide tremendous benefit in automating common business tasks. On the flip side, much human-generated data is privacy sensitive. Therefore, as a research community, we should not solely focus on developing better ways of analyzing such data, but also invest in technologies that help protect the privacy of those generating the data. Secondly, in this work we used some elementary applied category theory to precisely specify our problem of local equivariant message passing. We believe that applied category theory can and should be used more widely in the machine learning community. Formulating problems in a more general mathematical language makes it easier to connect disparate problem domains and solutions, as well as to communicate more precisely and thus efficiently, accelerating the research process. In the further future, we hope that having a better language with which to talk about machine learning problems and to specify models may make machine learning systems safer.

## 10 Funding Disclosure

Funding in direct support of this work: Qualcomm Technologies, Inc. Additional revenues for Max Welling not used to support this project: part-time employment at the University of Amsterdam.

## References

Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2020. doi: 10.1109/TNNLS.2020.2978386.

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.
Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902, 2018.

Taco Cohen, Maurice Weiler, Berkay Kicanaoglu, and Max Welling. Gauge equivariant convolutional networks and the icosahedral CNN. In ICML, pages 1321–1330, 2019.

Taco Cohen and Max Welling. Group equivariant convolutional networks. In International Conference on Machine Learning, pages 2990–2999, 2016.

Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. arXiv preprint arXiv:1905.11136, 2019.

Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. In Advances in Neural Information Processing Systems, pages 7092–7101, 2019.

Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, pages 1263–1272. JMLR.org, 2017.

Brendan Fong and David I. Spivak. Seven sketches in compositionality: An invitation to applied category theory. arXiv preprint arXiv:1803.05316, 2018.

Tom Leinster. Basic Category Theory. Cambridge University Press, 2016.

M. Gori, G. Monfardini, and F. Scarselli. A new model for learning in graph domains. Volume 2, pages 729–734, July 2005. doi: 10.1109/IJCNN.2005.1555942.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2018.

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2017.

Jason Hartford, Devon R. Graham, Kevin Leyton-Brown, and Siamak Ravanbakhsh. Deep models of interactions across sets. arXiv preprint arXiv:1803.02879, 2018.

Marjan Albooyeh, Daniele Bertolini, and Siamak Ravanbakhsh. Incidence networks for geometric deep learning. arXiv preprint arXiv:1905.11460, 2019.

Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR), 2014.

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. Pages 1–14, 2016.

Nathanaël Perraudin, Michaël Defferrard, Tomasz Kacprzak, and Raphael Sgier. DeepSphere: Efficient spherical convolutional neural network with HEALPix sampling for cosmological applications. arXiv preprint arXiv:1810.12186, 2018.

Risi Kondor, Hy Truong Son, Horace Pan, Brandon Anderson, and Shubhendu Trivedi. Covariant compositional networks for learning graphs. arXiv preprint arXiv:1801.02144, 2018.

Pinar Yanardag and S.V.N. Vishwanathan. Deep graph kernels. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1365–1374, 2015.

Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. An end-to-end deep learning architecture for graph classification. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pages 2014–2023, 2016.

James Atwood and Don Towsley. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1993–2001, 2016.

Martin Simonovsky and Nikos Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3693–3702, 2017.

Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure Leskovec. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pages 4800–4810, 2018.

Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4602–4609, 2019.

Giorgos Bouritsas et al. Improving graph neural network expressivity via subgraph isomorphism counting. 2020.

Emiel Hoogeboom, Jorn W.T. Peters, Taco S. Cohen, and Max Welling. HexaConv. arXiv preprint arXiv:1803.02108, 2018.