E(n) Equivariant Graph Neural Networks

Victor Garcia Satorras, Emiel Hoogeboom, Max Welling
UvA-Bosch Delta Lab, University of Amsterdam, Netherlands

Abstract

This paper introduces a new model to learn graph neural networks equivariant to rotations, translations, reflections and permutations, called E(n)-Equivariant Graph Neural Networks (EGNNs). In contrast with existing methods, our work does not require computationally expensive higher-order representations in intermediate layers, while it still achieves competitive or better performance. In addition, whereas existing methods are limited to equivariance on 3-dimensional spaces, our model is easily scaled to higher-dimensional spaces. We demonstrate the effectiveness of our method on dynamical systems modelling, representation learning in graph autoencoders and predicting molecular properties.

1. Introduction

Although deep learning has largely replaced hand-crafted features, many advances are critically dependent on inductive biases in deep neural networks. An effective method to restrict neural networks to relevant functions is to exploit the symmetry of problems by enforcing equivariance with respect to transformations from a certain symmetry group. Notable examples are translation equivariance in Convolutional Neural Networks and permutation equivariance in Graph Neural Networks (Bruna et al., 2013; Defferrard et al., 2016; Kipf & Welling, 2016a).

Many problems exhibit 3D translation and rotation symmetries. Some examples are point clouds (Uy et al., 2019), 3D molecular structures (Ramakrishnan et al., 2014) or N-body particle simulations (Kipf et al., 2018). The group corresponding to these symmetries is named the Euclidean group: SE(3), or E(3) when reflections are included. It is often desired that predictions on these tasks are either equivariant or invariant with respect to E(3) transformations.

Figure 1. Example of rotation equivariance on a graph with a graph neural network φ.

Recently, various forms and methods to achieve E(3) or SE(3) equivariance have been proposed (Thomas et al., 2018; Fuchs et al., 2020; Finzi et al., 2020; Köhler et al., 2020). Many of these works achieve innovations in studying types of higher-order representations for intermediate network layers. However, the transformations for these higher-order representations require coefficients or approximations that can be expensive to compute. Additionally, in practice for many types of data the inputs and outputs are restricted to scalar values (for instance temperature or energy, referred to as type-0 in the literature) and 3D vectors (for instance velocity or momentum, referred to as type-1 in the literature).

In this work we present a new architecture that is translation, rotation and reflection equivariant (E(n)), and permutation equivariant with respect to an input set of points. Our model is simpler than previous methods in that it does not require spherical harmonics as in (Thomas et al., 2018; Fuchs et al., 2020), while it can still achieve competitive or better results. In addition, equivariance in our model is not limited to 3-dimensional space and can be scaled to higher-dimensional spaces without a significant increase in computation.
We evaluate our method on modelling dynamical systems, representation learning in graph autoencoders and predicting molecular properties in the QM9 dataset. Our method reports the best or very competitive performance in all three experiments.

2. Background

In this section we introduce the relevant material on equivariance and graph neural networks which will later complement the definition of our method.

2.1. Equivariance

Let T_g : X → X be a set of transformations on X for the abstract group g ∈ G. We say a function φ : X → Y is equivariant to g if there exists an equivalent transformation on its output space S_g : Y → Y such that:

    φ(T_g(x)) = S_g(φ(x))    (1)

As a practical example, let φ(·) be a non-linear function, x = (x_1, ..., x_M) ∈ R^{M×n} an input set of M points embedded in an n-dimensional space, φ(x) = y ∈ R^{M×n} the transformed set of points, T_g a translation on the input set, T_g(x) = x + g, and S_g an equivalent translation on the output set, S_g(y) = y + g. If our function φ : X → Y is translation equivariant, translating the input set T_g(x) and then applying the function φ(T_g(x)) delivers the same result as first running the function y = φ(x) and then applying an equivalent translation to the output T_g(y), such that Equation 1 is fulfilled and φ(x + g) = φ(x) + g. In this work we explore the following three types of equivariance on a set of particles x:

1. Translation equivariance. Translating the input by g ∈ R^n results in an equivalent translation of the output. Let x + g be shorthand for (x_1 + g, ..., x_M + g). Then y + g = φ(x + g).

2. Rotation (and reflection) equivariance. For any orthogonal matrix Q ∈ R^{n×n}, let Qx be shorthand for (Qx_1, ..., Qx_M). Then rotating the input results in an equivalent rotation of the output: Qy = φ(Qx).

3. Permutation equivariance. Permuting the input results in the same permutation of the output, P(y) = φ(P(x)), where P is a permutation on the row indexes.

Note that velocities v ∈ R^{M×n} are unaffected by translations, but they transform equivalently under rotations (property 2) and permutations (property 3). Our method, introduced in Section 3, satisfies the three above mentioned equivariance constraints.

2.2. Graph Neural Networks

Graph Neural Networks are permutation equivariant networks that operate on graph structured data (Bruna et al., 2013; Defferrard et al., 2016; Kipf & Welling, 2016a). Given a graph G = (V, E) with nodes v_i ∈ V and edges e_ij ∈ E, we define a graph convolutional layer following the notation from (Gilmer et al., 2017) as:

    m_ij = φ_e(h_i^l, h_j^l, a_ij)
    m_i = Σ_{j ∈ N(i)} m_ij    (2)
    h_i^{l+1} = φ_h(h_i^l, m_i)

where h_i^l ∈ R^{nf} is the nf-dimensional embedding of node v_i at layer l, a_ij are the edge attributes and N(i) represents the set of neighbors of node v_i. Finally, φ_e and φ_h are the edge and node operations respectively, which are commonly approximated by Multilayer Perceptrons (MLPs).
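For concreteness, below is a minimal PyTorch sketch of the message passing layer in Equation 2. The MLP shapes for φ_e and φ_h and the edge-list format are illustrative assumptions, not code from the paper.

```python
import torch
from torch import nn

class GNNLayer(nn.Module):
    """Plain message-passing layer, Eq. (2): permutation equivariant only (a sketch)."""
    def __init__(self, nf, na):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * nf + na, nf), nn.SiLU(), nn.Linear(nf, nf))
        self.phi_h = nn.Sequential(nn.Linear(2 * nf, nf), nn.SiLU(), nn.Linear(nf, nf))

    def forward(self, h, edge_index, a):
        # h: (M, nf) node embeddings; edge_index: (2, E) receiver/sender indices; a: (E, na) edge attributes
        i, j = edge_index
        m_ij = self.phi_e(torch.cat([h[i], h[j], a], dim=-1))   # edge operation
        m_i = torch.zeros_like(h).index_add_(0, i, m_ij)        # aggregate messages over N(i)
        return self.phi_h(torch.cat([h, m_i], dim=-1))          # node update
```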
3. Equivariant Graph Neural Networks

In this section we present Equivariant Graph Neural Networks (EGNNs). Following the notation from background Section 2.2, we consider a graph G = (V, E) with nodes v_i ∈ V and edges e_ij ∈ E. In addition to the feature node embeddings h_i ∈ R^{nf}, we now also consider an n-dimensional coordinate x_i ∈ R^n associated with each of the graph nodes. Our model preserves equivariance to rotations and translations on this set of coordinates x_i, and it also preserves equivariance to permutations on the set of nodes V in the same fashion as GNNs.

Our Equivariant Graph Convolutional Layer (EGCL) takes as input the set of node embeddings h^l = {h_0^l, ..., h_{M-1}^l}, coordinate embeddings x^l = {x_0^l, ..., x_{M-1}^l} and edge information E = (e_ij), and outputs a transformation on h^{l+1} and x^{l+1}. Concisely: h^{l+1}, x^{l+1} = EGCL[h^l, x^l, E]. The equations that define this layer are the following:

    m_ij = φ_e(h_i^l, h_j^l, ||x_i^l - x_j^l||^2, a_ij)    (3)
    x_i^{l+1} = x_i^l + C Σ_{j≠i} (x_i^l - x_j^l) φ_x(m_ij)    (4)
    m_i = Σ_{j ∈ N(i)} m_ij    (5)
    h_i^{l+1} = φ_h(h_i^l, m_i)    (6)

Notice that the main differences between the above proposed method and the original Graph Neural Network from Equation 2 are found in Equations 3 and 4. In Equation 3 we now input the relative squared distance between two coordinates ||x_i^l - x_j^l||^2 into the edge operation φ_e. The embeddings h_i^l, h_j^l and the edge attributes a_ij are also provided as input to the edge operation, as in the GNN case. In our case the edge attributes will incorporate the edge values a_ij = e_ij, but they could also include additional edge information.

In Equation 4 we update the position of each particle x_i as a vector field in a radial direction. In other words, the position of each particle x_i is updated by the weighted sum of all relative differences (x_i - x_j) ∀j. The weights of this sum are provided as the output of the function φ_x : R^{nf} → R^1, which takes as input the edge embedding m_ij from the previous edge operation and outputs a scalar value. C is chosen to be 1/(M-1), which divides the sum by its number of elements. This equation is the main difference of our model compared to standard GNNs, and it is the reason why equivariances 1 and 2 are preserved (proof in Appendix A). Despite its simplicity, this equivariant operation is very flexible, since the embedding m_ij can carry information from the whole graph and not only from the given edge e_ij.

Finally, Equations 5 and 6 follow the same updates as standard GNNs. Equation 5 simply aggregates all the incoming messages from the neighbor nodes N(i) of node v_i, and Equation 6 performs the node operation φ_h, which takes as input the aggregated messages m_i and the node embedding h_i^l and outputs the updated node embedding h_i^{l+1}.

3.1. Analysis on E(n) equivariance

In this section we analyze the equivariance properties of our model for E(n) symmetries (i.e. properties 1 and 2 stated in Section 2.1). In other words, our model should be translation equivariant on x for any translation vector g ∈ R^n, and it should also be rotation and reflection equivariant on x for any orthogonal matrix Q ∈ R^{n×n}. More formally, our model satisfies:

    Qx^{l+1} + g, h^{l+1} = EGCL(Qx^l + g, h^l)

We provide a formal proof of this in Appendix A. Intuitively, let's consider a feature h^l which is already E(n) invariant; then the resultant edge embedding m_ij from Equation 3 will also be E(n) invariant, because in addition to h^l it only depends on the squared distances ||x_i^l - x_j^l||^2, which are E(n) invariant. Next, Equation 4 computes x_i^{l+1} as a weighted sum of differences (x_i - x_j) which is added to x_i; this transforms as a type-1 vector and preserves equivariance (see Appendix A). Finally, the last two Equations 5 and 6, which generate the next layer node embeddings h^{l+1}, remain E(n) invariant since they only depend on h^l and m_ij which, as we saw above, are E(n) invariant. Therefore the output h^{l+1} is E(n) invariant and x^{l+1} is E(n) equivariant with respect to x^l. Inductively, a composition of EGCLs will also be equivariant.
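Below is a minimal PyTorch sketch of the EGCL defined by Equations 3-6, followed by a numerical check of the equivariance argued above. The MLP shapes, the fully connected edge list and the helper names are illustrative assumptions, not the authors' reference implementation.

```python
import torch
from torch import nn

class EGCL(nn.Module):
    """Sketch of the Equivariant Graph Convolutional Layer, Eqs. (3)-(6)."""
    def __init__(self, nf, na):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * nf + 1 + na, nf), nn.SiLU(), nn.Linear(nf, nf), nn.SiLU())
        self.phi_x = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, 1))
        self.phi_h = nn.Sequential(nn.Linear(2 * nf, nf), nn.SiLU(), nn.Linear(nf, nf))

    def forward(self, h, x, edge_index, a):
        i, j = edge_index                                   # (E,) receiver / sender node indices
        diff = x[i] - x[j]                                  # relative differences x_i - x_j
        d2 = (diff ** 2).sum(-1, keepdim=True)              # ||x_i - x_j||^2, E(n) invariant
        m_ij = self.phi_e(torch.cat([h[i], h[j], d2, a], dim=-1))                        # Eq. (3)
        C = 1.0 / (h.size(0) - 1)
        x_new = x + C * torch.zeros_like(x).index_add_(0, i, diff * self.phi_x(m_ij))    # Eq. (4)
        m_i = torch.zeros_like(h).index_add_(0, i, m_ij)                                 # Eq. (5)
        h_new = self.phi_h(torch.cat([h, m_i], dim=-1))                                  # Eq. (6)
        return h_new, x_new

# Quick numerical check of the claimed E(n) equivariance (n = 3, fully connected graph).
M, nf, n = 5, 8, 3
h, x = torch.randn(M, nf), torch.randn(M, n)
idx = torch.tensor([(i, j) for i in range(M) for j in range(M) if i != j]).T
a = torch.ones(idx.shape[1], 1)                             # dummy edge attribute
layer = EGCL(nf, na=1)
Q = torch.linalg.qr(torch.randn(n, n)).Q                    # random orthogonal matrix (rotation/reflection)
g = torch.randn(n)                                          # random translation
h1, x1 = layer(h, x @ Q.T + g, idx, a)
h2, x2 = layer(h, x, idx, a)
assert torch.allclose(h1, h2, atol=1e-4)                    # h is E(n) invariant
assert torch.allclose(x1, x2 @ Q.T + g, atol=1e-4)          # x is E(n) equivariant
```

The assertions simply confirm the intuition above: rotating and translating the input coordinates rotates and translates the output coordinates while leaving the node embeddings unchanged.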
3.2. Extending EGNNs for vector type representations

In this section we propose a slight modification to the presented method such that we explicitly keep track of the particles' momentum. In some scenarios this can be useful not only to obtain an estimate of the particles' velocity at every layer, but also to provide an initial velocity value in those cases where it is not 0. We can include momentum in our proposed method by replacing Equation 4 with the following equations:

    v_i^{l+1} = φ_v(h_i^l) v_i^l + C Σ_{j≠i} (x_i^l - x_j^l) φ_x(m_ij)
    x_i^{l+1} = x_i^l + v_i^{l+1}    (7)

Note that this extends the EGCL layer as h^{l+1}, x^{l+1}, v^{l+1} = EGCL[h^l, x^l, v^l, E]. The only difference is that we have now broken down the coordinate update (Equation 4) into two steps: first we compute the velocity v_i^{l+1} and then we use this velocity to update the position x_i^l. The velocity v^l is scaled by a new function φ_v : R^{nf} → R^1 that maps the node embedding h_i^l to a scalar value. Notice that if the initial velocity is set to zero (v_i^{(0)} = 0), Equations 4 and 7 become exactly the same for the first layer l = 0, and they become equivalent for the next layers, since φ_v just re-scales all the outputs of φ_x from the previous layers by a scalar value. We prove the equivariance of this variant of the model in Appendix B.1. This variant is used in experiment 5.1, where we provide the initial velocity of the system and predict a relative position change.

3.3. Inferring the edges

Given a point cloud or a set of nodes, we may not always be provided with an adjacency matrix. In those cases we can assume a fully connected graph where all nodes exchange messages with each other; in other words, the neighborhood operator N(i) from Equation 5 would include all other nodes of the graph except for i. This fully connected approach may not scale well to large graphs, where we may want to locally limit the exchange of messages to avoid an overflow of information. Similarly to (Serviansky et al., 2020; Kipf et al., 2018), we present a simple solution to infer the relations/edges of the graph in our model. We can re-write the aggregation operation from our model (Equation 5) in the following way:

    m_i = Σ_{j≠i} e_ij m_ij    (8)

where e_ij takes value 1 if there is an edge between nodes (i, j) and 0 otherwise. Notice this does not yet modify the original equation used in our model; it is just a change in notation. Now we can choose to approximate the relations e_ij with the function e_ij = φ_inf(m_ij), where φ_inf : R^{nf} → [0, 1] resembles a linear layer followed by a sigmoid function that takes as input the current edge embedding and outputs a soft estimation of its edge value. This modification does not change the E(n) properties of the model, since we only operate on the messages m_ij, which are already E(n) invariant.
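The two extensions can be sketched in the same style as the EGCL above. The hypothetical `EGCLVelocity` module below combines the velocity update of Equation 7 with the soft edge weights e_ij = φ_inf(m_ij) of Equation 8; module names and shapes are assumptions for illustration, not the authors' code.

```python
import torch
from torch import nn

class EGCLVelocity(nn.Module):
    """EGCL variant of Section 3.2 with soft edge inference from Section 3.3 (a sketch)."""
    def __init__(self, nf, na):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * nf + 1 + na, nf), nn.SiLU(), nn.Linear(nf, nf), nn.SiLU())
        self.phi_x = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, 1))
        self.phi_v = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, 1))   # scales v_i^l
        self.phi_inf = nn.Sequential(nn.Linear(nf, 1), nn.Sigmoid())                 # e_ij = phi_inf(m_ij)
        self.phi_h = nn.Sequential(nn.Linear(2 * nf, nf), nn.SiLU(), nn.Linear(nf, nf))

    def forward(self, h, x, v, edge_index, a):
        i, j = edge_index
        diff = x[i] - x[j]
        d2 = (diff ** 2).sum(-1, keepdim=True)
        m_ij = self.phi_e(torch.cat([h[i], h[j], d2, a], dim=-1))          # Eq. (3)
        # Eq. (7): velocity update, then position update
        C = 1.0 / (h.size(0) - 1)
        agg = torch.zeros_like(x).index_add_(0, i, diff * self.phi_x(m_ij))
        v_new = self.phi_v(h) * v + C * agg
        x_new = x + v_new
        # Eq. (8): aggregate messages weighted by the inferred soft edges
        e_ij = self.phi_inf(m_ij)
        m_i = torch.zeros_like(h).index_add_(0, i, e_ij * m_ij)
        h_new = self.phi_h(torch.cat([h, m_i], dim=-1))                    # Eq. (6)
        return h_new, x_new, v_new
```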
Table 1. Comparison of different works from the literature under the message passing framework notation. We created this table with the aim of providing a clear and simple way to compare these different methods. The methods compared are: Graph Neural Networks (Gilmer et al., 2017); Radial Field from Equivariant Flows (Köhler et al., 2019); Tensor Field Networks (Thomas et al., 2018); Schnet (Schütt et al., 2017b); and our Equivariant Graph Neural Network. The difference between two points is written r_ij = (x_i - x_j).

Method        | Edge                                                                     | Aggregation                                      | Node                                                     | Property
GNN           | m_ij = φ_e(h_i^l, h_j^l, a_ij)                                           | m_i = Σ_{j∈N(i)} m_ij                            | h_i^{l+1} = φ_h(h_i^l, m_i)                              | Non-equivariant
Radial Field  | m_ij = φ_rf(||r_ij^l||^2) r_ij^l                                         | m_i = Σ_{j≠i} m_ij                               | x_i^{l+1} = x_i^l + m_i                                  | E(n)-Equivariant
TFN           | m_ij = Σ_k W^{lk}(r_ij^l) h_j^{lk}                                       | m_i = Σ_{j≠i} m_ij                               | h_i^{l+1} = w^{ll} h_i^l + m_i                           | SE(3)-Equivariant
Schnet        | m_ij = φ_cf(||r_ij^l||^2) h_j^l                                          | m_i = Σ_{j≠i} m_ij                               | h_i^{l+1} = φ_h(h_i^l, m_i)                              | E(n)-Invariant
EGNN          | m_ij = φ_e(h_i^l, h_j^l, ||r_ij^l||^2, a_ij);  m̂_ij = r_ij^l φ_x(m_ij)  | m_i = Σ_{j∈N(i)} m_ij;  m̂_i = C Σ_{j≠i} m̂_ij   | h_i^{l+1} = φ_h(h_i^l, m_i);  x_i^{l+1} = x_i^l + m̂_i   | E(n)-Equivariant

4. Related Work

Group equivariant neural networks have demonstrated their effectiveness in a wide variety of tasks (Cohen & Welling, 2016; 2017; Weiler & Cesa, 2019; Rezende et al., 2019; Romero & Cordonnier, 2021). Recently, various forms and methods to achieve E(3) or SE(3) equivariance have been proposed. Thomas et al. (2018); Fuchs et al. (2020) utilize the spherical harmonics to compute a basis for the transformations, which allows transformations between higher-order representations. A downside to this method is that the spherical harmonics need to be recomputed, which can be expensive. Currently, an extension of this method to arbitrary dimensions is unknown. Finzi et al. (2020) parametrize transformations by mapping kernels on the Lie algebra. For this method the neural network outputs are in certain situations stochastic, which may be undesirable. Horie et al. (2020) propose a set of isometric invariant and equivariant transformations for Graph Neural Networks. Köhler et al. (2019; 2020) propose an E(n) equivariant network to model 3D point clouds, but the method is only defined for positional data on the nodes without any feature dimensions.

Another related line of research concerns message passing algorithms on molecular data. Gilmer et al. (2017) presented a message passing setting (or Graph Neural Network) for quantum chemistry; this method is permutation equivariant but not translation or rotation equivariant. Kondor et al. (2018) extend the equivariance of GNNs such that each neuron transforms in a specific way under permutations, but this extension only affects the permutation group and not translations or rotations in a geometric space. Further works (Schütt et al., 2017b;a) build E(n) invariant message passing networks by inputting the relative distances between points. Klicpera et al. (2020b;a), in addition to relative distances, include a modified message passing scheme analogous to Belief Propagation that considers angles and directional information equivariant to rotations; they also use Bessel functions and spherical harmonics to construct an orthogonal basis. Anderson et al. (2019); Miller et al. (2020) include SO(3) equivariance in their intermediate layers for modelling the behavior and properties of molecular data. Our method is also framed as a message passing framework, but in contrast to these methods it achieves E(n) equivariance.

Relationship with existing methods

In Table 1 the EGNN equations are detailed together with some of its closest methods from the literature under the message passing notation from (Gilmer et al., 2017). This table aims to provide a simple way to compare these different algorithms. It is organized around three operations, i) the edge, ii) the aggregation and iii) the node update, together with the resulting equivariance property of each method. The GNN algorithm is the same as the one previously introduced in Section 2.2. Our EGNN algorithm is also equivalent to the description in Section 3, but the notation has been modified to match the (edge, aggregation, node) format. In all equations, r_ij^l = (x_i^l - x_j^l). Notice that except for the EGNN, all algorithms have the same aggregation operation and the main differences arise from the edge operation.

The algorithm that we call Radial Field is the E(n) equivariant update from (Köhler et al., 2019). This method is E(n) equivariant; however, its main limitation is that it only operates on x and it does not propagate node features h among nodes. In this method, φ_rf is modelled as an MLP.
Tensor Field Networks (TFN) (Thomas et al., 2018) instead propagate the node embeddings h, but use spherical harmonics to compute a learnable weight kernel W^{lk} : R^3 → R^{(2l+1)×(2k+1)}, which preserves SE(3) equivariance but is expensive to compute and limited to the 3-dimensional space. The SE(3) Transformer (Fuchs et al., 2020) (not included in this table) can be interpreted as an extension of TFN with attention. Schnet (Schütt et al., 2017b) can be interpreted as an E(n) invariant Graph Neural Network where φ_cf receives as input relative distances and outputs a continuous filter convolution that multiplies the neighbor embeddings h. Our EGNN differs from these other methods in that it performs two different updates for each of the operations in the table, one related to the embeddings h and another related to the coordinates x; these two variables exchange information in the edge operation. In summary, the EGNN can retain the flexibility of GNNs while remaining E(n) equivariant like the Radial Field algorithm, and without the need to compute expensive operations (i.e. spherical harmonics).

5. Experiments

5.1. Modelling a dynamical system: N-body system

In a dynamical system a function defines the time dependence of a point or set of points in a geometrical space. Modelling these complex dynamics is crucial in a variety of applications such as control systems (Chua et al., 2018), model based dynamics in reinforcement learning (Nagabandi et al., 2018), and physical systems simulations (Grzeszczuk et al., 1998; Watters et al., 2017). In this experiment we forecast the positions for a set of particles which are modelled by simple interaction rules, yet can exhibit complex dynamics. Similarly to (Fuchs et al., 2020), we extended the Charged Particles N-body experiment from (Kipf et al., 2018) to a 3-dimensional space. The system consists of 5 particles that carry a positive or negative charge and have an associated position and velocity in 3-dimensional space. The system is controlled by physics rules: particles are attracted or repelled depending on their charges. This is an equivariant task since rotations and translations on the input set of particles result in the same transformations throughout the entire trajectory.

Dataset: We sampled 3,000 trajectories for training, 2,000 for validation and 2,000 for testing. Each trajectory has a duration of 1,000 timesteps. For each trajectory we are provided with the initial particle positions p^(0) = {p_1^(0), ..., p_5^(0)} ∈ R^{5×3}, their initial velocities v^(0) = {v_1^(0), ..., v_5^(0)} ∈ R^{5×3} and their respective charges c = {c_1, ..., c_5} ∈ {-1, 1}^5. The task is to estimate the positions of the five particles after 1,000 timesteps. We optimize the averaged Mean Squared Error between the estimated positions and the ground truth ones.

Implementation details: In this experiment we used the extension of our model that includes velocity from Section 3.2. We input the position p^(0) as the first layer coordinates x^0 of our model and the velocity v^(0) as the initial velocity in Equation 7; the norms ||v_i^(0)|| are also provided as features to h_i^0 through a linear mapping. The charges are input as edge attributes a_ij = c_i c_j. The model outputs the last layer coordinates x^L as the estimated positions; a sketch of this input wiring is given below. We compare our method to its non-equivariant Graph Neural Network (GNN) cousin, and to the equivariant methods Radial Field (Köhler et al., 2019), Tensor Field Networks and the SE(3) Transformer.
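As a rough illustration of the input wiring described above (positions as x^0, velocity norms as node features, charge products as edge attributes), the following sketch assumes the hypothetical `EGCLVelocity` class from the Section 3.3 sketch is in scope; layer counts follow the text, everything else is an assumption rather than the authors' code.

```python
import torch
from torch import nn

# Hypothetical wiring of the N-body inputs: positions p0, velocities v0, charges c.
M, nf = 5, 64
p0, v0 = torch.randn(M, 3), torch.randn(M, 3)
c = torch.randint(0, 2, (M,)).float() * 2 - 1                  # charges in {-1, +1}

idx = torch.tensor([(i, j) for i in range(M) for j in range(M) if i != j]).T
a_ij = (c[idx[0]] * c[idx[1]]).unsqueeze(-1)                   # edge attributes a_ij = c_i * c_j
h0 = nn.Linear(1, nf)(v0.norm(dim=-1, keepdim=True))           # node features from ||v_i^(0)||

# EGCLVelocity is the sketch class from Sections 3.2/3.3 above (an assumption, not the paper's code).
layers = nn.ModuleList([EGCLVelocity(nf, na=1) for _ in range(4)])
h, x, v = h0, p0, v0
for layer in layers:
    h, x, v = layer(h, x, v, idx, a_ij)
# x plays the role of x^L, the estimated future positions; training would minimize the MSE
# between x and the ground-truth positions after 1,000 timesteps.
```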
All algorithms are composed of 4 layers and have been trained under the same conditions: batch size 100, 10,000 epochs, Adam optimizer; the learning rate was tuned independently for each model. We used 64 features for the hidden layers in the Radial Field, the GNN and our EGNN. As non-linearity we used the Swish activation function (Ramachandran et al., 2017). For TFN and the SE(3) Transformer we swept over different numbers of vector types and features and chose those that provided the best performance. Further implementation details are provided in Appendix C.1. A Linear model that simply considers the motion equation p(t) = p^(0) + v^(0) t is also included as a baseline. We also provide the average forward pass time in seconds for each of the models for a batch of 100 samples on a GTX 1080 Ti GPU.

Method                  MSE      Forward time (s)
Linear                  0.0819   .0001
SE(3) Transformer       0.0244   .1346
Tensor Field Network    0.0155   .0343
Graph Neural Network    0.0107   .0032
Radial Field            0.0104   .0039
EGNN                    0.0071   .0062

Table 2. Mean Squared Error for the future position estimation in the N-body system experiment, and forward time in seconds for a batch size of 100 samples running on a GTX 1080 Ti GPU.

Results: As shown in Table 2, our model significantly outperforms the other equivariant and non-equivariant alternatives while still being efficient in terms of running time. It reduces the error with respect to the second best performing method by 32%. In addition, it does not require the computation of spherical harmonics, which makes it more time efficient than Tensor Field Networks and the SE(3) Transformer.

Figure 2. Mean Squared Error in the N-body experiment for the Radial Field, GNN and EGNN methods when sweeping over different amounts of training data.

Analysis for different numbers of training samples: We want to analyze the performance of our EGNN in the small and large data regimes. In the following, we report on a similar experiment as above, but instead of using 3,000 training samples we generated a new training partition of 50,000 samples and we sweep over different amounts of data from 100 to 50,000 samples. We compare the performance of our EGNN to its non-equivariant GNN counterpart and the Radial Field algorithm. Results are presented in Figure 2. Our method outperforms both Radial Field and GNNs in the small and large data regimes. This shows the EGNN is more data efficient than GNNs, since it does not need to generalize over rotations and translations of the data, while it retains the flexibility of GNNs in the larger data regime. Due to its high model bias, the Radial Field algorithm performs well when data is scarce, but it is unable to learn the subtleties of the dataset as we increase the training size. In summary, our EGNN benefits from both the high bias of E(n) equivariant methods and the flexibility of GNNs.

5.2. Graph Autoencoder

A Graph Autoencoder can learn unsupervised representations of graphs in a continuous latent space (Kipf & Welling, 2016b; Simonovsky & Komodakis, 2018). In this experimental section we use our EGNN to build an Equivariant Graph Autoencoder. We will explain how Graph Autoencoders can benefit from equivariance and we will show how our method outperforms standard GNN autoencoders on the provided datasets. This problem is particularly interesting since the embedding space can be scaled to larger dimensions and is not limited to a 3-dimensional Euclidean space.
Similarly to the work of (Kipf & Welling, 2016b), further extended by Section 3.3 in (Liu et al., 2019), our graph autoencoder z = q(G) embeds a graph G into a set of latent nodes z = {z_1, ..., z_M} ∈ R^{M×n}, where M is the number of nodes and n the embedding size per node. Notice this may reduce the memory complexity of storing the graphs from O(M^2) to O(Mn), where n may depend on M for a certain approximation error tolerance. This differs from the variational autoencoder proposed in (Simonovsky & Komodakis, 2018), which embeds the graph in a single vector z ∈ R^K; this makes the reconstruction computationally very expensive, since the nodes of the decoded graph have to be matched again to the ground truth. In addition to the introduced graph generation and representation learning methods, it is worth mentioning that in the context of graph compression other methods (Candès & Recht, 2009) can be used.

More specifically, we will compare our Equivariant Graph Autoencoder on the task presented in (Liu et al., 2019), where a graph G = (V, E) with node features H ∈ R^{M×nf} and adjacency matrix A ∈ {0, 1}^{M×M} is embedded into a latent space z = q(H, A) ∈ R^{M×n}. Following (Kipf & Welling, 2016b; Liu et al., 2019), we are only interested in reconstructing the adjacency matrix A, since the datasets we work with do not contain node features. The decoder g(·) proposed by (Liu et al., 2019) takes as input the embedding space z and outputs the reconstructed adjacency matrix Â = g(z). This decoder function is defined as follows:

    Â_ij = g_e(z_i, z_j) = 1 / (1 + exp(w ||z_i - z_j||^2 + b))    (9)

where w and b are its only learnable parameters and g_e(·) is the decoder edge function applied to every pair of node embeddings. It reflects that edge probabilities will depend on the relative distances among node embeddings. The training loss is defined as the binary cross entropy between the estimated and the ground truth edges, L = Σ_ij BCE(Â_ij, A_ij).

The symmetry problem: The above stated autoencoder may seem straightforward to implement at first sight, but in some cases there is a strong limitation regarding the symmetry of the graph. Graph Neural Networks are convolutions on the edges and nodes of a graph, i.e. the same function is applied to all edges and to all nodes. In some graphs (e.g. those defined only by their adjacency matrix) we may not have input features in the nodes, and for that reason the difference among nodes relies only on their edges or neighborhood topology. Therefore, if the neighborhoods of two nodes are exactly the same, their encoded embeddings will be the same too. A clear example of this is a cycle graph (an example of a 4-node cycle graph is provided in Figure 3). When running a Graph Neural Network encoder on a node-featureless cycle graph, we will obtain the exact same embedding for each of the nodes, which makes it impossible to reconstruct the edges of the original graph from the node embeddings. The cycle graph is a severe example where all nodes have the exact same neighborhood topology, but these symmetries can be present in different ways in other graphs with different edge distributions, or even when including node features if these are not unique.

Figure 3. Visual representation of a Graph Autoencoder for a 4-node cycle graph. The bottom row illustrates that adding noise to the input graph breaks the symmetry of the embedding, allowing the reconstruction of the adjacency matrix.
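A minimal sketch of the decoder in Equation 9 and the binary cross entropy training loss; w and b are the decoder's only learnable parameters, while the data shapes below are purely illustrative.

```python
import torch
from torch import nn

class EdgeDecoder(nn.Module):
    """Eq. (9): edge probability from the distance between node embeddings; w, b are the only parameters."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # z: (M, n) latent node embeddings -> (M, M) edge probabilities
        d2 = torch.cdist(z, z) ** 2                   # pairwise squared distances ||z_i - z_j||^2
        return torch.sigmoid(-(self.w * d2 + self.b)) # 1 / (1 + exp(w d2 + b))

# Training loss: binary cross entropy between predicted and ground-truth adjacency
decoder = EdgeDecoder()
z = torch.randn(12, 8)                                # e.g. an EGNN output x^L used as the embedding
A = (torch.rand(12, 12) < 0.25).float()               # toy adjacency (diagonal included for simplicity)
loss = nn.functional.binary_cross_entropy(decoder(z), A)
```

Note that Equation 9 depends on z only through ||z_i - z_j||, so the reconstruction loss is invariant to E(n) transformations of the embedding.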
An ad-hoc method to break the symmetry of the graph is introduced by (Liu et al., 2019). This method introduces noise sampled from a Gaussian distribution into the input node features of the graph, h_i^0 ~ N(0, σI). This noise allows different representations for all node embeddings and as a result the graph can be decoded again, but it comes with a drawback: the network has to generalize over the newly introduced noise distribution. Our Equivariant Graph Autoencoder remains translation and rotation equivariant to this sampled noise, which we find makes the generalization much easier. Another way of looking at this is that the sampled noise makes the node representations go from structural to positional (Srinivasan & Ribeiro, 2019), where E(n) equivariance may be beneficial. In our case we simply input this noise as the input coordinates x^0 ~ N(0, σI) ∈ R^{M×n} of our EGNN, which outputs an equivariant transformation of them, x^L. This output is used as the embedding of the graph (i.e. z = x^L) and is the input to the decoder from Equation 9.

Dataset: We generated community-small graphs (You et al., 2018; Liu et al., 2019) by running the original code from (You et al., 2018). These graphs contain 12 ≤ M ≤ 20 nodes. We also generated a second dataset using the Erdos&Renyi generative model (Bollobás & Béla, 2001), sampling random graphs with an initial number of 7 ≤ M ≤ 16 nodes and edge probability p_e = 0.25. We sampled 5,000 graphs for training, 500 for validation and 500 for testing for both datasets. Each graph is defined as an adjacency matrix A ∈ {0, 1}^{M×M}.

Implementation details: Our Equivariant Graph Autoencoder is composed of an EGNN encoder followed by the decoder from Equation 9. The graph edges A_ij are input as edge attributes a_ij in Equation 3. The noise used to break the symmetry is input as the coordinates x^0 ~ N(0, σI) ∈ R^{M×n} in the first layer, and h^0 is initialized with ones since we are working with featureless graphs. As mentioned before, the encoder outputs an equivariant transformation on the coordinates, which is the graph embedding and the input to the decoder, z = x^L ∈ R^{M×n}. We use n = 8 dimensions for the embedding space. We compare the EGNN to its GNN cousin; we also compare to Noise-GNN, which is an adaptation of our GNN to match the setting from (Liu et al., 2019), and we include the Radial Field algorithm as an additional baseline. All four models have 4 layers, 64 features for the hidden layers and the Swish activation function as a non-linearity, and they were all trained for 100 epochs using the Adam optimizer and learning rate 10^-4. More details are provided in Appendix C.2.
Since the number of nodes is larger than the number of layers, the receptive field of a GNN may not comprise the whole graph, which could make the comparison with our EGNN unfair. To avoid this limitation, all models exchange messages among all nodes and the edge information is provided as edge attributes a_ij = A_ij in all of them.

              Community Small              Erdos&Renyi
Encoder       BCE     % Error   F1         BCE     % Error   F1
Baseline      -       31.79     .0000      -       25.13     0.000
GNN           6.75    1.29      0.980      14.15   4.62      0.907
Noise-GNN     3.32    0.44      0.993      4.56    1.25      0.975
Radial Field  9.22    1.19      0.981      6.78    1.63      0.968
EGNN          2.14    0.06      0.999      1.65    0.11      0.998

Figure 5. In the table at the left we report the Binary Cross Entropy, % Error and F1 scores for the test partition of the Graph Autoencoding experiment on the Community Small and Erdos&Renyi datasets. In the figure at the right, we report the F1 score when overfitting a training partition of 100 samples of the Erdos&Renyi dataset for different values of sparsity p_e. The GNN is not able to successfully auto-encode sparse graphs (small p_e values) for the Erdos&Renyi dataset even when training and testing on the same small subset.

Results: In the table from Figure 5 we report the Binary Cross Entropy loss between the estimated and ground truth edges, the % Error, which is defined as the percentage of wrongly predicted edges with respect to the total number of potential edges, and the F1 score of the edge classification; all numbers refer to the test partition. We also include a Baseline that predicts all edges as missing, Â_ij = 0. The standard GNN seems to suffer from the symmetry problem and provides the worst performance. When introducing noise (Noise-GNN), both the loss and the error decrease, showing that it is actually useful to add noise to the input nodes. Finally, our EGNN remains E(n) equivariant to this noise distribution and provides the best reconstruction, with a 0.11% error on the Erdos&Renyi dataset and a close to optimal 0.06% on the Community Small dataset. A further analysis of the reconstruction error for different embedding sizes n is reported in Appendix D.1.

Overfitting the training set: We explained the symmetry problem and showed that the EGNN outperforms other methods on the given datasets. Although we observed that adding noise to the GNN improves the results, it is difficult to exactly measure the impact of the symmetry limitation on these results independently from other factors such as generalization from the training to the test set. In this section we conduct an experiment where we train the different models on a subset of 100 Erdos&Renyi graphs with embedding size n = 16, with the aim of overfitting the data. We evaluate the methods on the training data. In this experiment the GNN is unable to fit the training data properly, while the EGNN can achieve perfect reconstruction and Noise-GNN close to perfect. We sweep over different p_e sparsity values from 0.1 to 0.9, since the symmetry limitation is more present in very sparse or very dense graphs. We report the F1 scores of this experiment in the right plot of Figure 5.

In this experiment we showed that E(n) equivariance improves performance when embedding graphs in a continuous space as a set of nodes in dimension n. Even though this is a simple reconstruction task, we think this can be a useful step towards generating graphs or molecules, where graphs (i.e. edges) are often decoded as pairwise distances or similarities between nodes, e.g. (Kipf & Welling, 2016b; Liu et al., 2019; Grover et al., 2019), and these metrics (e.g. Equation 9) are E(n) invariant.
Additionally, this experiment showed that our method can successfully perform an E(n) equivariant task in higher dimensional spaces where n > 3.

5.3. Molecular data: QM9

The QM9 dataset (Ramakrishnan et al., 2014) has become a standard in machine learning as a chemical property prediction task. The QM9 dataset consists of small molecules represented as a set of atoms (up to 29 atoms per molecule), each atom having an associated 3D position and a five dimensional one-hot node embedding that describes the atom type (H, C, N, O, F). The dataset labels are a variety of chemical properties for each of the molecules, which are estimated through regression. These properties are invariant to translations, rotations and reflections on the atom positions. Therefore, models that are E(3) invariant are highly suitable for this task. We imported the dataset partitions from (Anderson et al., 2019): 100K molecules for training, 18K for validation and 13K for testing. A variety of 12 chemical properties were estimated per molecule. We optimized and report the Mean Absolute Error between predictions and ground truth.

Implementation details: Our EGNN receives as input the 3D coordinate locations of each atom, which are provided as x_i^0 in Equation 3, and an embedding of the atom properties, which is provided as the input node features h_i^0. Since this is an invariant task and the x^0 positions are static, there is no need to update the particle positions x by running Equation 4 as we did in previous experiments. We tried both variants and did not notice any improvement from updating x. When not updating the particle positions (i.e. skipping Equation 4), our model becomes E(n) invariant, which is analogous to a standard GNN where all relative squared norms between pairs of points ||x_i - x_j||^2 are input to the edge operation (Equation 3). Additionally, since we are not provided with an adjacency matrix and molecules can scale up to 29 nodes, we use the extension of our model from Section 3.3 that infers a soft estimation of the edges. Our EGNN network consists of 7 layers, 128 features per hidden layer and the Swish activation function as a non-linearity. A sum-pooling operation preceded and followed by two-layer MLPs maps all the node embeddings h^L from the output of the EGNN to the estimated property value. Further implementation details are reported in Appendix C.

We compare to NMP (Gilmer et al., 2017), Schnet (Schütt et al., 2017b), Cormorant (Anderson et al., 2019), L1Net (Miller et al., 2020), LieConv (Finzi et al., 2020), DimeNet++ (Klicpera et al., 2020a), TFN (Thomas et al., 2018) and SE(3)-Tr. (Fuchs et al., 2020). Results are presented in Table 3.

Task        α       Δε    ε_HOMO  ε_LUMO  µ      C_v        G     H     R²      U     U_0   ZPVE
Units       bohr³   meV   meV     meV     D      cal/mol K  meV   meV   bohr³   meV   meV   meV
NMP         .092    69    43      38      .030   .040       19    17    .180    20    20    1.50
Schnet      .235    63    41      34      .033   .033       14    14    .073    19    14    1.70
Cormorant   .085    61    34      38      .038   .026       20    21    .961    21    22    2.03
L1Net       .088    68    46      35      .043   .031       14    14    .354    14    13    1.56
LieConv     .084    49    30      25      .032   .038       22    24    .800    19    19    2.28
DimeNet++*  .044    33    25      20      .030   .023       8     7     .331    6     6     1.21
TFN         .223    58    40      38      .064   .101       -     -     -       -     -     -
SE(3)-Tr.   .142    53    35      33      .051   .054       -     -     -       -     -     -
EGNN        .071    48    29      25      .029   .031       12    12    .106    12    11    1.55

Table 3. Mean Absolute Error for the molecular property prediction benchmark on the QM9 dataset. *DimeNet++ uses slightly different train/val/test partitions than the other papers listed here.

Our method reports very competitive results in all property prediction tasks while remaining relatively simple, i.e. not introducing higher order representations, angles or spherical harmonics. Perhaps surprisingly, we outperform other equivariant networks that consider higher order representations while, in this task, we are only using type-0 representations (i.e. relative distances) to define the geometry of the molecules.
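A hedged sketch of the invariant configuration described above: EGCL-style layers whose coordinate update is discarded, followed by sum pooling between two small MLPs. It reuses the hypothetical `EGCL` class from the Section 3 sketch, omits the soft edge inference of Section 3.3 for brevity, and its exact head structure is an assumption rather than the paper's architecture.

```python
import torch
from torch import nn

class InvariantEGNN(nn.Module):
    """E(n)-invariant property prediction: distances enter phi_e, coordinates are never updated (a sketch)."""
    def __init__(self, nf=128, n_layers=7):
        super().__init__()
        self.embed = nn.Linear(5, nf)                     # one-hot atom type (H, C, N, O, F) -> nf
        # EGCL is the sketch class from Section 3 (an assumption, not the authors' code).
        self.layers = nn.ModuleList([EGCL(nf, na=1) for _ in range(n_layers)])
        self.pre_pool = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, nf))
        self.post_pool = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, 1))

    def forward(self, atom_onehot, x, edge_index, a):
        # atom_onehot: (M, 5), x: (M, 3) static atom positions, single molecule (no batching)
        h = self.embed(atom_onehot)
        for layer in self.layers:
            h, _ = layer(h, x, edge_index, a)             # discard the coordinate update (invariant mode)
        h = self.pre_pool(h)
        return self.post_pool(h.sum(dim=0))               # sum pooling over atoms -> scalar property
```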
In Appendix E we prove that when only positional information is given (i.e. no velocity or higher-order type features), the geometry is completely defined by the norms between points, up to E(n) transformations. In other words, if two collections of points separated by E(n) transformations are considered to be identical, then the relative norms between points are a unique identifier of the collection.

6. Conclusions

Equivariant graph neural networks are receiving increasing interest from the natural and medical sciences, as they represent a new tool for analyzing molecules and their properties. In this work, we presented a new E(n) equivariant deep architecture for graphs that is computationally efficient, easy to implement, and significantly improves over the current state-of-the-art on a wide range of tasks. We believe these properties make it ideally suited to make a direct impact on topics such as drug discovery, protein folding and the design of new materials, as well as applications in 3D computer vision.

Acknowledgements

We would like to thank Patrick Forré for his support in formalizing the invariance features identification proof.

References

Anderson, B., Hy, T.-S., and Kondor, R. Cormorant: Covariant molecular neural networks. arXiv preprint arXiv:1906.04015, 2019.

Bollobás, B. and Béla, B. Random graphs. Number 73. Cambridge University Press, 2001.

Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.

Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.

Chua, K., Calandra, R., McAllister, R., and Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. arXiv preprint arXiv:1805.12114, 2018.

Cohen, T. and Welling, M. Group equivariant convolutional networks. In Balcan, M. and Weinberger, K. Q. (eds.), Proceedings of the 33rd International Conference on Machine Learning, ICML, 2016.

Cohen, T. S. and Welling, M. Steerable CNNs. In 5th International Conference on Learning Representations, ICLR, 2017.

Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844-3852, 2016.

Finzi, M., Stanton, S., Izmailov, P., and Wilson, A. G. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. arXiv preprint arXiv:2002.12880, 2020.

Fuchs, F., Worrall, D., Fischer, V., and Welling, M. SE(3)-Transformers: 3D roto-translation equivariant attention networks. Advances in Neural Information Processing Systems, 33, 2020.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.

Grover, A., Zweig, A., and Ermon, S. Graphite: Iterative generative modeling of graphs. In International Conference on Machine Learning, pp. 2434-2444. PMLR, 2019.

Grzeszczuk, R., Terzopoulos, D., and Hinton, G. NeuroAnimator: Fast neural network emulation and control of physics-based models. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 9-20, 1998.

Horie, M., Morita, N., Hishinuma, T., Ihara, Y., and Mitsume, N. Isometric transformation invariant and equivariant graph convolutional networks. arXiv preprint arXiv:2005.06316, 2020.

Kipf, T., Fetaya, E., Wang, K.-C., Welling, M., and Zemel, R. Neural relational inference for interacting systems. arXiv preprint arXiv:1802.04687, 2018.
Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016a.

Kipf, T. N. and Welling, M. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016b.

Klicpera, J., Giri, S., Margraf, J. T., and Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. arXiv preprint arXiv:2011.14115, 2020a.

Klicpera, J., Groß, J., and Günnemann, S. Directional message passing for molecular graphs. arXiv preprint arXiv:2003.03123, 2020b.

Köhler, J., Klein, L., and Noé, F. Equivariant flows: sampling configurations for multi-body systems with symmetric energies. CoRR, abs/1910.00753, 2019.

Köhler, J., Klein, L., and Noé, F. Equivariant flows: exact likelihood generative learning for symmetric densities. arXiv preprint arXiv:2006.02425, 2020.

Kondor, R., Son, H. T., Pan, H., Anderson, B., and Trivedi, S. Covariant compositional networks for learning graphs. arXiv preprint arXiv:1801.02144, 2018.

Liu, J., Kumar, A., Ba, J., Kiros, J., and Swersky, K. Graph normalizing flows. In Advances in Neural Information Processing Systems, pp. 13578-13588, 2019.

Miller, B. K., Geiger, M., Smidt, T. E., and Noé, F. Relevance of rotationally equivariant convolutions for predicting molecular properties. arXiv preprint arXiv:2008.08461, 2020.

Nagabandi, A., Kahn, G., Fearing, R. S., and Levine, S. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 7559-7566. IEEE, 2018.

Ramachandran, P., Zoph, B., and Le, Q. V. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2017.

Ramakrishnan, R., Dral, P. O., Rupp, M., and Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1(1):1-7, 2014.

Rezende, D. J., Racanière, S., Higgins, I., and Toth, P. Equivariant Hamiltonian flows. CoRR, abs/1909.13739, 2019.

Romero, D. W. and Cordonnier, J.-B. Group equivariant stand-alone self-attention for vision. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=JkfYjnOEo6M.

Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R., and Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8(1):1-8, 2017a.

Schütt, K. T., Kindermans, P.-J., Sauceda, H. E., Chmiela, S., Tkatchenko, A., and Müller, K.-R. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. arXiv preprint arXiv:1706.08566, 2017b.

Serviansky, H., Segol, N., Shlomi, J., Cranmer, K., Gross, E., Maron, H., and Lipman, Y. Set2Graph: Learning graphs from sets. Advances in Neural Information Processing Systems, 33, 2020.

Simonovsky, M. and Komodakis, N. GraphVAE: Towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pp. 412-422. Springer, 2018.

Srinivasan, B. and Ribeiro, B. On the equivalence between positional node embeddings and structural graph representations. arXiv preprint arXiv:1910.00452, 2019.

Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., and Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219, 2018.
Uy, M. A., Pham, Q.-H., Hua, B.-S., Nguyen, T., and Yeung, S.-K. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1588-1597, 2019.

Watters, N., Zoran, D., Weber, T., Battaglia, P., Pascanu, R., and Tacchetti, A. Visual interaction networks: Learning a physics simulator from video. Advances in Neural Information Processing Systems, 30:4539-4547, 2017.

Weiler, M. and Cesa, G. General E(2)-equivariant steerable CNNs. In Wallach, H. M., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E. B., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems, 2019.

You, J., Ying, R., Ren, X., Hamilton, W. L., and Leskovec, J. GraphRNN: Generating realistic graphs with deep autoregressive models. arXiv preprint arXiv:1802.08773, 2018.