E(n) Equivariant Graph Neural Networks

Victor Garcia Satorras, Emiel Hoogeboom, Max Welling
UvA-Bosch Delta Lab, University of Amsterdam, Netherlands

Abstract

This paper introduces a new model to learn graph neural networks equivariant to rotations, translations, reflections and permutations, called E(n)-Equivariant Graph Neural Networks (EGNNs). In contrast with existing methods, our work does not require computationally expensive higher-order representations in intermediate layers, while it still achieves competitive or better performance. In addition, whereas existing methods are limited to equivariance on 3-dimensional spaces, our model is easily scaled to higher-dimensional spaces. We demonstrate the effectiveness of our method on dynamical systems modelling, representation learning in graph autoencoders and predicting molecular properties.

1. Introduction

Although deep learning has largely replaced hand-crafted features, many advances are critically dependent on inductive biases in deep neural networks. An effective method to restrict neural networks to relevant functions is to exploit the symmetry of problems by enforcing equivariance with respect to transformations from a certain symmetry group. Notable examples are translation equivariance in Convolutional Neural Networks and permutation equivariance in Graph Neural Networks (Bruna et al., 2013; Defferrard et al., 2016; Kipf & Welling, 2016a).

Many problems exhibit 3D translation and rotation symmetries. Some examples are point clouds (Uy et al., 2019), 3D molecular structures (Ramakrishnan et al., 2014) or N-body particle simulations (Kipf et al., 2018). The group corresponding to these symmetries is named the Euclidean group: SE(3), or E(3) when reflections are included. It is often desired that predictions on these tasks are either equivariant or invariant with respect to E(3) transformations.

Figure 1. Example of rotation equivariance on a graph with a graph neural network φ.

Recently, various forms and methods to achieve E(3) or SE(3) equivariance have been proposed (Thomas et al., 2018; Fuchs et al., 2020; Finzi et al., 2020; Köhler et al., 2020). Many of these works achieve innovations in studying types of higher-order representations for intermediate network layers. However, the transformations for these higher-order representations require coefficients or approximations that can be expensive to compute. Additionally, in practice for many types of data the inputs and outputs are restricted to scalar values (for instance temperature or energy, referred to as type-0 in the literature) and 3D vectors (for instance velocity or momentum, referred to as type-1 in the literature).

In this work we present a new architecture that is translation, rotation and reflection equivariant (E(n)), and permutation equivariant with respect to an input set of points. Our model is simpler than previous methods in that it does not require spherical harmonics as in (Thomas et al., 2018; Fuchs et al., 2020), while it can still achieve competitive or better results. In addition, equivariance in our model is not limited to 3-dimensional space and can be scaled to higher-dimensional spaces without a significant increase in computation.
We evaluate our method on modelling dynamical systems, representation learning in graph autoencoders and predicting molecular properties in the QM9 dataset. Our method reports the best or very competitive performance in all three experiments.

2. Background

In this section we introduce the relevant material on equivariance and graph neural networks which will later complement the definition of our method.

2.1. Equivariance

Let T_g : X → X be a set of transformations on X for the abstract group g ∈ G. We say a function φ : X → Y is equivariant to g if there exists an equivalent transformation on its output space S_g : Y → Y such that:

    φ(T_g(x)) = S_g(φ(x))    (1)

As a practical example, let φ(·) be a non-linear function, x = (x_1, ..., x_M) ∈ R^{M×n} an input set of M points embedded in an n-dimensional space, φ(x) = y ∈ R^{M×n} the transformed set of points, T_g a translation on the input set, T_g(x) = x + g, and S_g an equivalent translation on the output set, S_g(y) = y + g. If our function φ : X → Y is translation equivariant, translating the input set T_g(x) and then applying the function φ(T_g(x)) delivers the same result as first running the function y = φ(x) and then applying an equivalent translation to the output T_g(y), such that Equation 1 is fulfilled and φ(x + g) = φ(x) + g. In this work we explore the following three types of equivariance on a set of particles x:

1. Translation equivariance. Translating the input by g ∈ R^n results in an equivalent translation of the output. Let x + g be shorthand for (x_1 + g, ..., x_M + g). Then y + g = φ(x + g).

2. Rotation (and reflection) equivariance. For any orthogonal matrix Q ∈ R^{n×n}, let Qx be shorthand for (Qx_1, ..., Qx_M). Then rotating the input results in an equivalent rotation of the output: Qy = φ(Qx).

3. Permutation equivariance. Permuting the input results in the same permutation of the output, P(y) = φ(P(x)), where P is a permutation on the row indexes.

Note that velocities v ∈ R^{M×n} are unaffected by translations, but they transform equivalently under rotations (property 2) and permutations (property 3). Our method, introduced in Section 3, satisfies the three above mentioned equivariance constraints.

2.2. Graph Neural Networks

Graph Neural Networks are permutation equivariant networks that operate on graph structured data (Bruna et al., 2013; Defferrard et al., 2016; Kipf & Welling, 2016a). Given a graph G = (V, E) with nodes v_i ∈ V and edges e_ij ∈ E, we define a graph convolutional layer following the notation from (Gilmer et al., 2017) as:

    m_ij = φ_e(h_i^l, h_j^l, a_ij)
    m_i = Σ_{j ∈ N(i)} m_ij    (2)
    h_i^{l+1} = φ_h(h_i^l, m_i)

where h_i^l ∈ R^{nf} is the nf-dimensional embedding of node v_i at layer l, a_ij are the edge attributes and N(i) represents the set of neighbors of node v_i. Finally, φ_e and φ_h are the edge and node operations respectively, which are commonly approximated by Multilayer Perceptrons (MLPs).
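For concreteness, below is a minimal PyTorch sketch of the message passing layer in Equation 2. The MLP shapes for φ_e and φ_h and the edge-list format are illustrative assumptions, not code from the paper.

```python
import torch
from torch import nn

class GNNLayer(nn.Module):
    """Plain message-passing layer, Eq. (2): permutation equivariant only (a sketch)."""
    def __init__(self, nf, na):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * nf + na, nf), nn.SiLU(), nn.Linear(nf, nf))
        self.phi_h = nn.Sequential(nn.Linear(2 * nf, nf), nn.SiLU(), nn.Linear(nf, nf))

    def forward(self, h, edge_index, a):
        # h: (M, nf) node embeddings; edge_index: (2, E) receiver/sender indices; a: (E, na) edge attributes
        i, j = edge_index
        m_ij = self.phi_e(torch.cat([h[i], h[j], a], dim=-1))   # edge operation
        m_i = torch.zeros_like(h).index_add_(0, i, m_ij)        # aggregate messages over N(i)
        return self.phi_h(torch.cat([h, m_i], dim=-1))          # node update
```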
3. Equivariant Graph Neural Networks

In this section we present Equivariant Graph Neural Networks (EGNNs). Following the notation from background Section 2.2, we consider a graph G = (V, E) with nodes v_i ∈ V and edges e_ij ∈ E. In addition to the feature node embeddings h_i ∈ R^{nf}, we now also consider an n-dimensional coordinate x_i ∈ R^n associated with each of the graph nodes. Our model preserves equivariance to rotations and translations on this set of coordinates x_i, and it also preserves equivariance to permutations on the set of nodes V in the same fashion as GNNs.

Our Equivariant Graph Convolutional Layer (EGCL) takes as input the set of node embeddings h^l = {h_0^l, ..., h_{M-1}^l}, coordinate embeddings x^l = {x_0^l, ..., x_{M-1}^l} and edge information E = (e_ij), and outputs a transformation on h^{l+1} and x^{l+1}. Concisely: h^{l+1}, x^{l+1} = EGCL[h^l, x^l, E]. The equations that define this layer are the following:

    m_ij = φ_e(h_i^l, h_j^l, ||x_i^l - x_j^l||^2, a_ij)    (3)
    x_i^{l+1} = x_i^l + C Σ_{j≠i} (x_i^l - x_j^l) φ_x(m_ij)    (4)
    m_i = Σ_{j ∈ N(i)} m_ij    (5)
    h_i^{l+1} = φ_h(h_i^l, m_i)    (6)

Notice that the main differences between the above proposed method and the original Graph Neural Network from Equation 2 are found in Equations 3 and 4. In Equation 3 we now input the relative squared distance between two coordinates ||x_i^l - x_j^l||^2 into the edge operation φ_e. The embeddings h_i^l, h_j^l and the edge attributes a_ij are also provided as input to the edge operation, as in the GNN case. In our case the edge attributes will incorporate the edge values a_ij = e_ij, but they could also include additional edge information.

In Equation 4 we update the position of each particle x_i as a vector field in a radial direction. In other words, the position of each particle x_i is updated by the weighted sum of all relative differences (x_i - x_j) ∀j. The weights of this sum are provided as the output of the function φ_x : R^{nf} → R^1, which takes as input the edge embedding m_ij from the previous edge operation and outputs a scalar value. C is chosen to be 1/(M-1), which divides the sum by its number of elements. This equation is the main difference of our model compared to standard GNNs, and it is the reason why equivariances 1 and 2 are preserved (proof in Appendix A). Despite its simplicity, this equivariant operation is very flexible, since the embedding m_ij can carry information from the whole graph and not only from the given edge e_ij.

Finally, Equations 5 and 6 follow the same updates as standard GNNs. Equation 5 simply aggregates all the incoming messages from the neighbor nodes N(i) of node v_i, and Equation 6 performs the node operation φ_h, which takes as input the aggregated messages m_i and the node embedding h_i^l and outputs the updated node embedding h_i^{l+1}.

3.1. Analysis on E(n) equivariance

In this section we analyze the equivariance properties of our model for E(n) symmetries (i.e. properties 1 and 2 stated in Section 2.1). In other words, our model should be translation equivariant on x for any translation vector g ∈ R^n, and it should also be rotation and reflection equivariant on x for any orthogonal matrix Q ∈ R^{n×n}. More formally, our model satisfies:

    Qx^{l+1} + g, h^{l+1} = EGCL(Qx^l + g, h^l)

We provide a formal proof of this in Appendix A. Intuitively, let's consider a feature h^l which is already E(n) invariant; then the resultant edge embedding m_ij from Equation 3 will also be E(n) invariant, because in addition to h^l it only depends on the squared distances ||x_i^l - x_j^l||^2, which are E(n) invariant. Next, Equation 4 computes x_i^{l+1} as a weighted sum of differences (x_i - x_j) which is added to x_i; this transforms as a type-1 vector and preserves equivariance (see Appendix A). Finally, the last two Equations 5 and 6, which generate the next layer node embeddings h^{l+1}, remain E(n) invariant since they only depend on h^l and m_ij which, as we saw above, are E(n) invariant. Therefore the output h^{l+1} is E(n) invariant and x^{l+1} is E(n) equivariant with respect to x^l. Inductively, a composition of EGCLs will also be equivariant.
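Below is a minimal PyTorch sketch of the EGCL defined by Equations 3-6, followed by a numerical check of the equivariance argued above. The MLP shapes, the fully connected edge list and the helper names are illustrative assumptions, not the authors' reference implementation.

```python
import torch
from torch import nn

class EGCL(nn.Module):
    """Sketch of the Equivariant Graph Convolutional Layer, Eqs. (3)-(6)."""
    def __init__(self, nf, na):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * nf + 1 + na, nf), nn.SiLU(), nn.Linear(nf, nf), nn.SiLU())
        self.phi_x = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, 1))
        self.phi_h = nn.Sequential(nn.Linear(2 * nf, nf), nn.SiLU(), nn.Linear(nf, nf))

    def forward(self, h, x, edge_index, a):
        i, j = edge_index                                   # (E,) receiver / sender node indices
        diff = x[i] - x[j]                                  # relative differences x_i - x_j
        d2 = (diff ** 2).sum(-1, keepdim=True)              # ||x_i - x_j||^2, E(n) invariant
        m_ij = self.phi_e(torch.cat([h[i], h[j], d2, a], dim=-1))                        # Eq. (3)
        C = 1.0 / (h.size(0) - 1)
        x_new = x + C * torch.zeros_like(x).index_add_(0, i, diff * self.phi_x(m_ij))    # Eq. (4)
        m_i = torch.zeros_like(h).index_add_(0, i, m_ij)                                 # Eq. (5)
        h_new = self.phi_h(torch.cat([h, m_i], dim=-1))                                  # Eq. (6)
        return h_new, x_new

# Quick numerical check of the claimed E(n) equivariance (n = 3, fully connected graph).
M, nf, n = 5, 8, 3
h, x = torch.randn(M, nf), torch.randn(M, n)
idx = torch.tensor([(i, j) for i in range(M) for j in range(M) if i != j]).T
a = torch.ones(idx.shape[1], 1)                             # dummy edge attribute
layer = EGCL(nf, na=1)
Q = torch.linalg.qr(torch.randn(n, n)).Q                    # random orthogonal matrix (rotation/reflection)
g = torch.randn(n)                                          # random translation
h1, x1 = layer(h, x @ Q.T + g, idx, a)
h2, x2 = layer(h, x, idx, a)
assert torch.allclose(h1, h2, atol=1e-4)                    # h is E(n) invariant
assert torch.allclose(x1, x2 @ Q.T + g, atol=1e-4)          # x is E(n) equivariant
```

The assertions simply confirm the intuition above: rotating and translating the input coordinates rotates and translates the output coordinates while leaving the node embeddings unchanged.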
3.2. Extending EGNNs for vector type representations

In this section we propose a slight modification to the presented method such that we explicitly keep track of the particles' momentum. In some scenarios this can be useful not only to obtain an estimate of the particles' velocity at every layer, but also to provide an initial velocity value in those cases where it is not 0. We can include momentum in our proposed method by replacing Equation 4 with the following equations:

    v_i^{l+1} = φ_v(h_i^l) v_i^l + C Σ_{j≠i} (x_i^l - x_j^l) φ_x(m_ij)
    x_i^{l+1} = x_i^l + v_i^{l+1}    (7)

Note that this extends the EGCL layer as h^{l+1}, x^{l+1}, v^{l+1} = EGCL[h^l, x^l, v^l, E]. The only difference is that we have now broken down the coordinate update (Equation 4) into two steps: first we compute the velocity v_i^{l+1} and then we use this velocity to update the position x_i^l. The velocity v^l is scaled by a new function φ_v : R^{nf} → R^1 that maps the node embedding h_i^l to a scalar value. Notice that if the initial velocity is set to zero (v_i^{(0)} = 0), Equations 4 and 7 become exactly the same for the first layer l = 0, and they become equivalent for the next layers, since φ_v just re-scales all the outputs of φ_x from the previous layers by a scalar value. We prove the equivariance of this variant of the model in Appendix B.1. This variant is used in experiment 5.1, where we provide the initial velocity of the system and predict a relative position change.

3.3. Inferring the edges

Given a point cloud or a set of nodes, we may not always be provided with an adjacency matrix. In those cases we can assume a fully connected graph where all nodes exchange messages with each other; in other words, the neighborhood operator N(i) from Equation 5 would include all other nodes of the graph except for i. This fully connected approach may not scale well to large graphs, where we may want to locally limit the exchange of messages to avoid an overflow of information. Similarly to (Serviansky et al., 2020; Kipf et al., 2018), we present a simple solution to infer the relations/edges of the graph in our model. We can re-write the aggregation operation from our model (Equation 5) in the following way:

    m_i = Σ_{j≠i} e_ij m_ij    (8)

where e_ij takes value 1 if there is an edge between nodes (i, j) and 0 otherwise. Notice this does not yet modify the original equation used in our model; it is just a change in notation. Now we can choose to approximate the relations e_ij with the function e_ij = φ_inf(m_ij), where φ_inf : R^{nf} → [0, 1] resembles a linear layer followed by a sigmoid function that takes as input the current edge embedding and outputs a soft estimation of its edge value. This modification does not change the E(n) properties of the model, since we only operate on the messages m_ij, which are already E(n) invariant.
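The two extensions can be sketched in the same style as the EGCL above. The hypothetical `EGCLVelocity` module below combines the velocity update of Equation 7 with the soft edge weights e_ij = φ_inf(m_ij) of Equation 8; module names and shapes are assumptions for illustration, not the authors' code.

```python
import torch
from torch import nn

class EGCLVelocity(nn.Module):
    """EGCL variant of Section 3.2 with soft edge inference from Section 3.3 (a sketch)."""
    def __init__(self, nf, na):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * nf + 1 + na, nf), nn.SiLU(), nn.Linear(nf, nf), nn.SiLU())
        self.phi_x = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, 1))
        self.phi_v = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, 1))   # scales v_i^l
        self.phi_inf = nn.Sequential(nn.Linear(nf, 1), nn.Sigmoid())                 # e_ij = phi_inf(m_ij)
        self.phi_h = nn.Sequential(nn.Linear(2 * nf, nf), nn.SiLU(), nn.Linear(nf, nf))

    def forward(self, h, x, v, edge_index, a):
        i, j = edge_index
        diff = x[i] - x[j]
        d2 = (diff ** 2).sum(-1, keepdim=True)
        m_ij = self.phi_e(torch.cat([h[i], h[j], d2, a], dim=-1))          # Eq. (3)
        # Eq. (7): velocity update, then position update
        C = 1.0 / (h.size(0) - 1)
        agg = torch.zeros_like(x).index_add_(0, i, diff * self.phi_x(m_ij))
        v_new = self.phi_v(h) * v + C * agg
        x_new = x + v_new
        # Eq. (8): aggregate messages weighted by the inferred soft edges
        e_ij = self.phi_inf(m_ij)
        m_i = torch.zeros_like(h).index_add_(0, i, e_ij * m_ij)
        h_new = self.phi_h(torch.cat([h, m_i], dim=-1))                    # Eq. (6)
        return h_new, x_new, v_new
```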
Table 1. Comparison of different works from the literature under the message passing framework notation. We created this table with the aim of providing a clear and simple way to compare these different methods. The methods compared are: Graph Neural Networks (Gilmer et al., 2017); Radial Field from Equivariant Flows (Köhler et al., 2019); Tensor Field Networks (Thomas et al., 2018); Schnet (Schütt et al., 2017b); and our Equivariant Graph Neural Network. The difference between two points is written r_ij = (x_i - x_j).

Method        | Edge                                                                     | Aggregation                                      | Node                                                     | Property
GNN           | m_ij = φ_e(h_i^l, h_j^l, a_ij)                                           | m_i = Σ_{j∈N(i)} m_ij                            | h_i^{l+1} = φ_h(h_i^l, m_i)                              | Non-equivariant
Radial Field  | m_ij = φ_rf(||r_ij^l||^2) r_ij^l                                         | m_i = Σ_{j≠i} m_ij                               | x_i^{l+1} = x_i^l + m_i                                  | E(n)-Equivariant
TFN           | m_ij = Σ_k W^{lk}(r_ij^l) h_j^{lk}                                       | m_i = Σ_{j≠i} m_ij                               | h_i^{l+1} = w^{ll} h_i^l + m_i                           | SE(3)-Equivariant
Schnet        | m_ij = φ_cf(||r_ij^l||^2) h_j^l                                          | m_i = Σ_{j≠i} m_ij                               | h_i^{l+1} = φ_h(h_i^l, m_i)                              | E(n)-Invariant
EGNN          | m_ij = φ_e(h_i^l, h_j^l, ||r_ij^l||^2, a_ij);  m̂_ij = r_ij^l φ_x(m_ij)  | m_i = Σ_{j∈N(i)} m_ij;  m̂_i = C Σ_{j≠i} m̂_ij   | h_i^{l+1} = φ_h(h_i^l, m_i);  x_i^{l+1} = x_i^l + m̂_i   | E(n)-Equivariant

4. Related Work

Group equivariant neural networks have demonstrated their effectiveness in a wide variety of tasks (Cohen & Welling, 2016; 2017; Weiler & Cesa, 2019; Rezende et al., 2019; Romero & Cordonnier, 2021). Recently, various forms and methods to achieve E(3) or SE(3) equivariance have been proposed. Thomas et al. (2018); Fuchs et al. (2020) utilize the spherical harmonics to compute a basis for the transformations, which allows transformations between higher-order representations. A downside to this method is that the spherical harmonics need to be recomputed, which can be expensive. Currently, an extension of this method to arbitrary dimensions is unknown. Finzi et al. (2020) parametrize transformations by mapping kernels on the Lie algebra. For this method the neural network outputs are in certain situations stochastic, which may be undesirable. Horie et al. (2020) propose a set of isometric invariant and equivariant transformations for Graph Neural Networks. Köhler et al. (2019; 2020) propose an E(n) equivariant network to model 3D point clouds, but the method is only defined for positional data on the nodes without any feature dimensions.

Another related line of research concerns message passing algorithms on molecular data. Gilmer et al. (2017) presented a message passing setting (or Graph Neural Network) for quantum chemistry; this method is permutation equivariant but not translation or rotation equivariant. Kondor et al. (2018) extend the equivariance of GNNs such that each neuron transforms in a specific way under permutations, but this extension only affects the permutation group and not translations or rotations in a geometric space. Further works (Schütt et al., 2017b;a) build E(n) invariant message passing networks by inputting the relative distances between points. Klicpera et al. (2020b;a), in addition to relative distances, include a modified message passing scheme analogous to Belief Propagation that considers angles and directional information equivariant to rotations; they also use Bessel functions and spherical harmonics to construct an orthogonal basis. Anderson et al. (2019); Miller et al. (2020) include SO(3) equivariance in their intermediate layers for modelling the behavior and properties of molecular data. Our method is also framed as a message passing framework, but in contrast to these methods it achieves E(n) equivariance.

Relationship with existing methods

In Table 1 the EGNN equations are detailed together with some of its closest methods from the literature under the message passing notation from (Gilmer et al., 2017). This table aims to provide a simple way to compare these different algorithms. It is organized around three operations, i) the edge, ii) the aggregation and iii) the node update, together with the resulting equivariance property of each method. The GNN algorithm is the same as the one previously introduced in Section 2.2. Our EGNN algorithm is also equivalent to the description in Section 3, but the notation has been modified to match the (edge, aggregation, node) format. In all equations, r_ij^l = (x_i^l - x_j^l). Notice that except for the EGNN, all algorithms have the same aggregation operation and the main differences arise from the edge operation.

The algorithm that we call Radial Field is the E(n) equivariant update from (Köhler et al., 2019). This method is E(n) equivariant; however, its main limitation is that it only operates on x and it does not propagate node features h among nodes. In this method, φ_rf is modelled as an MLP.
Tensor Field Networks (TFN) (Thomas et al., 2018) instead propagate the node embeddings h, but use spherical harmonics to compute a learnable weight kernel W^{lk} : R^3 → R^{(2l+1)×(2k+1)}, which preserves SE(3) equivariance but is expensive to compute and limited to the 3-dimensional space. The SE(3) Transformer (Fuchs et al., 2020) (not included in this table) can be interpreted as an extension of TFN with attention. Schnet (Schütt et al., 2017b) can be interpreted as an E(n) invariant Graph Neural Network where φ_cf receives as input relative distances and outputs a continuous filter convolution that multiplies the neighbor embeddings h. Our EGNN differs from these other methods in that it performs two different updates for each of the operations in the table, one related to the embeddings h and another related to the coordinates x; these two variables exchange information in the edge operation. In summary, the EGNN can retain the flexibility of GNNs while remaining E(n) equivariant like the Radial Field algorithm, and without the need to compute expensive operations (i.e. spherical harmonics).

5. Experiments

5.1. Modelling a dynamical system: N-body system

In a dynamical system a function defines the time dependence of a point or set of points in a geometrical space. Modelling these complex dynamics is crucial in a variety of applications such as control systems (Chua et al., 2018), model based dynamics in reinforcement learning (Nagabandi et al., 2018), and physical systems simulations (Grzeszczuk et al., 1998; Watters et al., 2017). In this experiment we forecast the positions for a set of particles which are modelled by simple interaction rules, yet can exhibit complex dynamics. Similarly to (Fuchs et al., 2020), we extended the Charged Particles N-body experiment from (Kipf et al., 2018) to a 3-dimensional space. The system consists of 5 particles that carry a positive or negative charge and have an associated position and velocity in 3-dimensional space. The system is controlled by physics rules: particles are attracted or repelled depending on their charges. This is an equivariant task since rotations and translations on the input set of particles result in the same transformations throughout the entire trajectory.

Dataset: We sampled 3,000 trajectories for training, 2,000 for validation and 2,000 for testing. Each trajectory has a duration of 1,000 timesteps. For each trajectory we are provided with the initial particle positions p^(0) = {p_1^(0), ..., p_5^(0)} ∈ R^{5×3}, their initial velocities v^(0) = {v_1^(0), ..., v_5^(0)} ∈ R^{5×3} and their respective charges c = {c_1, ..., c_5} ∈ {-1, 1}^5. The task is to estimate the positions of the five particles after 1,000 timesteps. We optimize the averaged Mean Squared Error between the estimated positions and the ground truth ones.

Implementation details: In this experiment we used the extension of our model that includes velocity from Section 3.2. We input the position p^(0) as the first layer coordinates x^0 of our model and the velocity v^(0) as the initial velocity in Equation 7; the norms ||v_i^(0)|| are also provided as features to h_i^0 through a linear mapping. The charges are input as edge attributes a_ij = c_i c_j. The model outputs the last layer coordinates x^L as the estimated positions; a sketch of this input wiring is given below. We compare our method to its non-equivariant Graph Neural Network (GNN) cousin, and to the equivariant methods Radial Field (Köhler et al., 2019), Tensor Field Networks and the SE(3) Transformer.
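As a rough illustration of the input wiring described above (positions as x^0, velocity norms as node features, charge products as edge attributes), the following sketch assumes the hypothetical `EGCLVelocity` class from the Section 3.3 sketch is in scope; layer counts follow the text, everything else is an assumption rather than the authors' code.

```python
import torch
from torch import nn

# Hypothetical wiring of the N-body inputs: positions p0, velocities v0, charges c.
M, nf = 5, 64
p0, v0 = torch.randn(M, 3), torch.randn(M, 3)
c = torch.randint(0, 2, (M,)).float() * 2 - 1                  # charges in {-1, +1}

idx = torch.tensor([(i, j) for i in range(M) for j in range(M) if i != j]).T
a_ij = (c[idx[0]] * c[idx[1]]).unsqueeze(-1)                   # edge attributes a_ij = c_i * c_j
h0 = nn.Linear(1, nf)(v0.norm(dim=-1, keepdim=True))           # node features from ||v_i^(0)||

# EGCLVelocity is the sketch class from Sections 3.2/3.3 above (an assumption, not the paper's code).
layers = nn.ModuleList([EGCLVelocity(nf, na=1) for _ in range(4)])
h, x, v = h0, p0, v0
for layer in layers:
    h, x, v = layer(h, x, v, idx, a_ij)
# x plays the role of x^L, the estimated future positions; training would minimize the MSE
# between x and the ground-truth positions after 1,000 timesteps.
```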
All algorithms are composed of 4 layers and have been trained under the same conditions: batch size 100, 10,000 epochs, Adam optimizer; the learning rate was tuned independently for each model. We used 64 features for the hidden layers in the Radial Field, the GNN and our EGNN. As non-linearity we used the Swish activation function (Ramachandran et al., 2017). For TFN and the SE(3) Transformer we swept over different numbers of vector types and features and chose those that provided the best performance. Further implementation details are provided in Appendix C.1. A Linear model that simply considers the motion equation p(t) = p^(0) + v^(0) t is also included as a baseline. We also provide the average forward pass time in seconds for each of the models for a batch of 100 samples on a GTX 1080 Ti GPU.

Method                  MSE      Forward time (s)
Linear                  0.0819   .0001
SE(3) Transformer       0.0244   .1346
Tensor Field Network    0.0155   .0343
Graph Neural Network    0.0107   .0032
Radial Field            0.0104   .0039
EGNN                    0.0071   .0062

Table 2. Mean Squared Error for the future position estimation in the N-body system experiment, and forward time in seconds for a batch size of 100 samples running on a GTX 1080 Ti GPU.

Results: As shown in Table 2, our model significantly outperforms the other equivariant and non-equivariant alternatives while still being efficient in terms of running time. It reduces the error with respect to the second best performing method by 32%. In addition, it does not require the computation of spherical harmonics, which makes it more time efficient than Tensor Field Networks and the SE(3) Transformer.

Figure 2. Mean Squared Error in the N-body experiment for the Radial Field, GNN and EGNN methods when sweeping over different amounts of training data.

Analysis for different numbers of training samples: We want to analyze the performance of our EGNN in the small and large data regimes. In the following, we report on a similar experiment as above, but instead of using 3,000 training samples we generated a new training partition of 50,000 samples and we sweep over different amounts of data from 100 to 50,000 samples. We compare the performance of our EGNN to its non-equivariant GNN counterpart and the Radial Field algorithm. Results are presented in Figure 2. Our method outperforms both Radial Field and GNNs in the small and large data regimes. This shows the EGNN is more data efficient than GNNs, since it does not need to generalize over rotations and translations of the data, while it retains the flexibility of GNNs in the larger data regime. Due to its high model bias, the Radial Field algorithm performs well when data is scarce, but it is unable to learn the subtleties of the dataset as we increase the training size. In summary, our EGNN benefits from both the high bias of E(n) equivariant methods and the flexibility of GNNs.

5.2. Graph Autoencoder

A Graph Autoencoder can learn unsupervised representations of graphs in a continuous latent space (Kipf & Welling, 2016b; Simonovsky & Komodakis, 2018). In this experimental section we use our EGNN to build an Equivariant Graph Autoencoder. We will explain how Graph Autoencoders can benefit from equivariance and we will show how our method outperforms standard GNN autoencoders on the provided datasets. This problem is particularly interesting since the embedding space can be scaled to larger dimensions and is not limited to a 3-dimensional Euclidean space.
Similarly to the work of (Kipf & Welling, 2016b), further extended by Section 3.3 in (Liu et al., 2019), our graph autoencoder z = q(G) embeds a graph G into a set of latent nodes z = {z_1, ..., z_M} ∈ R^{M×n}, where M is the number of nodes and n the embedding size per node. Notice this may reduce the memory complexity of storing the graphs from O(M^2) to O(Mn), where n may depend on M for a certain approximation error tolerance. This differs from the variational autoencoder proposed in (Simonovsky & Komodakis, 2018), which embeds the graph in a single vector z ∈ R^K; this makes the reconstruction computationally very expensive, since the nodes of the decoded graph have to be matched again to the ground truth. In addition to the introduced graph generation and representation learning methods, it is worth mentioning that in the context of graph compression other methods (Candès & Recht, 2009) can be used.

More specifically, we will compare our Equivariant Graph Autoencoder on the task presented in (Liu et al., 2019), where a graph G = (V, E) with node features H ∈ R^{M×nf} and adjacency matrix A ∈ {0, 1}^{M×M} is embedded into a latent space z = q(H, A) ∈ R^{M×n}. Following (Kipf & Welling, 2016b; Liu et al., 2019), we are only interested in reconstructing the adjacency matrix A, since the datasets we work with do not contain node features. The decoder g(·) proposed by (Liu et al., 2019) takes as input the embedding space z and outputs the reconstructed adjacency matrix Â = g(z). This decoder function is defined as follows:

    Â_ij = g_e(z_i, z_j) = 1 / (1 + exp(w ||z_i - z_j||^2 + b))    (9)

where w and b are its only learnable parameters and g_e(·) is the decoder edge function applied to every pair of node embeddings. It reflects that edge probabilities will depend on the relative distances among node embeddings. The training loss is defined as the binary cross entropy between the estimated and the ground truth edges, L = Σ_ij BCE(Â_ij, A_ij).

The symmetry problem: The above stated autoencoder may seem straightforward to implement at first sight, but in some cases there is a strong limitation regarding the symmetry of the graph. Graph Neural Networks are convolutions on the edges and nodes of a graph, i.e. the same function is applied to all edges and to all nodes. In some graphs (e.g. those defined only by their adjacency matrix) we may not have input features in the nodes, and for that reason the difference among nodes relies only on their edges or neighborhood topology. Therefore, if the neighborhoods of two nodes are exactly the same, their encoded embeddings will be the same too. A clear example of this is a cycle graph (an example of a 4-node cycle graph is provided in Figure 3). When running a Graph Neural Network encoder on a node-featureless cycle graph, we will obtain the exact same embedding for each of the nodes, which makes it impossible to reconstruct the edges of the original graph from the node embeddings. The cycle graph is a severe example where all nodes have the exact same neighborhood topology, but these symmetries can be present in different ways in other graphs with different edge distributions, or even when including node features if these are not unique.

Figure 3. Visual representation of a Graph Autoencoder for a 4-node cycle graph. The bottom row illustrates that adding noise to the input graph breaks the symmetry of the embedding, allowing the reconstruction of the adjacency matrix.
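A minimal sketch of the decoder in Equation 9 and the binary cross entropy training loss; w and b are the decoder's only learnable parameters, while the data shapes below are purely illustrative.

```python
import torch
from torch import nn

class EdgeDecoder(nn.Module):
    """Eq. (9): edge probability from the distance between node embeddings; w, b are the only parameters."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # z: (M, n) latent node embeddings -> (M, M) edge probabilities
        d2 = torch.cdist(z, z) ** 2                   # pairwise squared distances ||z_i - z_j||^2
        return torch.sigmoid(-(self.w * d2 + self.b)) # 1 / (1 + exp(w d2 + b))

# Training loss: binary cross entropy between predicted and ground-truth adjacency
decoder = EdgeDecoder()
z = torch.randn(12, 8)                                # e.g. an EGNN output x^L used as the embedding
A = (torch.rand(12, 12) < 0.25).float()               # toy adjacency (diagonal included for simplicity)
loss = nn.functional.binary_cross_entropy(decoder(z), A)
```

Note that Equation 9 depends on z only through ||z_i - z_j||, so the reconstruction loss is invariant to E(n) transformations of the embedding.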
An ad-hoc method to break the symmetry of the graph is introduced by (Liu et al., 2019). This method introduces noise sampled from a Gaussian distribution into the input node features of the graph, h_i^0 ~ N(0, σI). This noise allows different representations for all node embeddings and as a result the graph can be decoded again, but it comes with a drawback: the network has to generalize over the newly introduced noise distribution. Our Equivariant Graph Autoencoder remains translation and rotation equivariant to this sampled noise, which we find makes the generalization much easier. Another way of looking at this is that the sampled noise makes the node representations go from structural to positional (Srinivasan & Ribeiro, 2019), where E(n) equivariance may be beneficial. In our case we simply input this noise as the input coordinates x^0 ~ N(0, σI) ∈ R^{M×n} of our EGNN, which outputs an equivariant transformation of them, x^L. This output is used as the embedding of the graph (i.e. z = x^L) and is the input to the decoder from Equation 9.

Dataset: We generated community-small graphs (You et al., 2018; Liu et al., 2019) by running the original code from (You et al., 2018). These graphs contain 12 ≤ M ≤ 20 nodes. We also generated a second dataset using the Erdos&Renyi generative model (Bollobás & Béla, 2001), sampling random graphs with an initial number of 7 ≤ M ≤ 16 nodes and edge probability p_e = 0.25. We sampled 5,000 graphs for training, 500 for validation and 500 for testing for both datasets. Each graph is defined as an adjacency matrix A ∈ {0, 1}^{M×M}.

Implementation details: Our Equivariant Graph Autoencoder is composed of an EGNN encoder followed by the decoder from Equation 9. The graph edges A_ij are input as edge attributes a_ij in Equation 3. The noise used to break the symmetry is input as the coordinates x^0 ~ N(0, σI) ∈ R^{M×n} in the first layer, and h^0 is initialized with ones since we are working with featureless graphs. As mentioned before, the encoder outputs an equivariant transformation on the coordinates, which is the graph embedding and the input to the decoder, z = x^L ∈ R^{M×n}. We use n = 8 dimensions for the embedding space. We compare the EGNN to its GNN cousin; we also compare to Noise-GNN, which is an adaptation of our GNN to match the setting from (Liu et al., 2019), and we include the Radial Field algorithm as an additional baseline. All four models have 4 layers, 64 features for the hidden layers and the Swish activation function as a non-linearity, and they were all trained for 100 epochs using the Adam optimizer and learning rate 10^-4. More details are provided in Appendix C.2.
Since the number of nodes is larger than the number of layers, the receptive field of a GNN may not comprise the whole graph, which could make the comparison with our EGNN unfair. To avoid this limitation, all models exchange messages among all nodes and the edge information is provided as edge attributes a_ij = A_ij in all of them.

              Community Small              Erdos&Renyi
Encoder       BCE     % Error   F1         BCE     % Error   F1
Baseline      -       31.79     .0000      -       25.13     0.000
GNN           6.75    1.29      0.980      14.15   4.62      0.907
Noise-GNN     3.32    0.44      0.993      4.56    1.25      0.975
Radial Field  9.22    1.19      0.981      6.78    1.63      0.968
EGNN          2.14    0.06      0.999      1.65    0.11      0.998

Figure 5. In the table at the left we report the Binary Cross Entropy, % Error and F1 scores for the test partition of the Graph Autoencoding experiment on the Community Small and Erdos&Renyi datasets. In the figure at the right, we report the F1 score when overfitting a training partition of 100 samples of the Erdos&Renyi dataset for different values of sparsity p_e. The GNN is not able to successfully auto-encode sparse graphs (small p_e values) for the Erdos&Renyi dataset even when training and testing on the same small subset.

Results: In the table from Figure 5 we report the Binary Cross Entropy loss between the estimated and ground truth edges, the % Error, which is defined as the percentage of wrongly predicted edges with respect to the total number of potential edges, and the F1 score of the edge classification; all numbers refer to the test partition. We also include a Baseline that predicts all edges as missing, Â_ij = 0. The standard GNN seems to suffer from the symmetry problem and provides the worst performance. When introducing noise (Noise-GNN), both the loss and the error decrease, showing that it is actually useful to add noise to the input nodes. Finally, our EGNN remains E(n) equivariant to this noise distribution and provides the best reconstruction, with a 0.11% error on the Erdos&Renyi dataset and a close to optimal 0.06% on the Community Small dataset. A further analysis of the reconstruction error for different embedding sizes n is reported in Appendix D.1.

Overfitting the training set: We explained the symmetry problem and showed that the EGNN outperforms other methods on the given datasets. Although we observed that adding noise to the GNN improves the results, it is difficult to exactly measure the impact of the symmetry limitation on these results independently from other factors such as generalization from the training to the test set. In this section we conduct an experiment where we train the different models on a subset of 100 Erdos&Renyi graphs with embedding size n = 16, with the aim of overfitting the data. We evaluate the methods on the training data. In this experiment the GNN is unable to fit the training data properly, while the EGNN can achieve perfect reconstruction and Noise-GNN close to perfect. We sweep over different p_e sparsity values from 0.1 to 0.9, since the symmetry limitation is more present in very sparse or very dense graphs. We report the F1 scores of this experiment in the right plot of Figure 5.

In this experiment we showed that E(n) equivariance improves performance when embedding graphs in a continuous space as a set of nodes in dimension n. Even though this is a simple reconstruction task, we think this can be a useful step towards generating graphs or molecules, where graphs (i.e. edges) are often decoded as pairwise distances or similarities between nodes, e.g. (Kipf & Welling, 2016b; Liu et al., 2019; Grover et al., 2019), and these metrics (e.g. Equation 9) are E(n) invariant.
Additionally, this experiment showed that our method can successfully perform an E(n) equivariant task in higher dimensional spaces where n > 3.

5.3. Molecular data: QM9

The QM9 dataset (Ramakrishnan et al., 2014) has become a standard in machine learning as a chemical property prediction task. The QM9 dataset consists of small molecules represented as a set of atoms (up to 29 atoms per molecule), each atom having an associated 3D position and a five dimensional one-hot node embedding that describes the atom type (H, C, N, O, F). The dataset labels are a variety of chemical properties for each of the molecules, which are estimated through regression. These properties are invariant to translations, rotations and reflections on the atom positions. Therefore, models that are E(3) invariant are highly suitable for this task. We imported the dataset partitions from (Anderson et al., 2019): 100K molecules for training, 18K for validation and 13K for testing. A variety of 12 chemical properties were estimated per molecule. We optimized and report the Mean Absolute Error between predictions and ground truth.

Implementation details: Our EGNN receives as input the 3D coordinate locations of each atom, which are provided as x_i^0 in Equation 3, and an embedding of the atom properties, which is provided as the input node features h_i^0. Since this is an invariant task and the x^0 positions are static, there is no need to update the particle positions x by running Equation 4 as we did in previous experiments. We tried both variants and did not notice any improvement from updating x. When not updating the particle positions (i.e. skipping Equation 4), our model becomes E(n) invariant, which is analogous to a standard GNN where all relative squared norms between pairs of points ||x_i - x_j||^2 are input to the edge operation (Equation 3). Additionally, since we are not provided with an adjacency matrix and molecules can scale up to 29 nodes, we use the extension of our model from Section 3.3 that infers a soft estimation of the edges. Our EGNN network consists of 7 layers, 128 features per hidden layer and the Swish activation function as a non-linearity. A sum-pooling operation preceded and followed by two-layer MLPs maps all the node embeddings h^L from the output of the EGNN to the estimated property value. Further implementation details are reported in Appendix C.

We compare to NMP (Gilmer et al., 2017), Schnet (Schütt et al., 2017b), Cormorant (Anderson et al., 2019), L1Net (Miller et al., 2020), LieConv (Finzi et al., 2020), DimeNet++ (Klicpera et al., 2020a), TFN (Thomas et al., 2018) and SE(3)-Tr. (Fuchs et al., 2020). Results are presented in Table 3.

Task        α       Δε    ε_HOMO  ε_LUMO  µ      C_v        G     H     R²      U     U_0   ZPVE
Units       bohr³   meV   meV     meV     D      cal/mol K  meV   meV   bohr³   meV   meV   meV
NMP         .092    69    43      38      .030   .040       19    17    .180    20    20    1.50
Schnet      .235    63    41      34      .033   .033       14    14    .073    19    14    1.70
Cormorant   .085    61    34      38      .038   .026       20    21    .961    21    22    2.03
L1Net       .088    68    46      35      .043   .031       14    14    .354    14    13    1.56
LieConv     .084    49    30      25      .032   .038       22    24    .800    19    19    2.28
DimeNet++*  .044    33    25      20      .030   .023       8     7     .331    6     6     1.21
TFN         .223    58    40      38      .064   .101       -     -     -       -     -     -
SE(3)-Tr.   .142    53    35      33      .051   .054       -     -     -       -     -     -
EGNN        .071    48    29      25      .029   .031       12    12    .106    12    11    1.55

Table 3. Mean Absolute Error for the molecular property prediction benchmark on the QM9 dataset. *DimeNet++ uses slightly different train/val/test partitions than the other papers listed here.

Our method reports very competitive results in all property prediction tasks while remaining relatively simple, i.e. not introducing higher order representations, angles or spherical harmonics. Perhaps surprisingly, we outperform other equivariant networks that consider higher order representations while, in this task, we are only using type-0 representations (i.e. relative distances) to define the geometry of the molecules.
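A hedged sketch of the invariant configuration described above: EGCL-style layers whose coordinate update is discarded, followed by sum pooling between two small MLPs. It reuses the hypothetical `EGCL` class from the Section 3 sketch, omits the soft edge inference of Section 3.3 for brevity, and its exact head structure is an assumption rather than the paper's architecture.

```python
import torch
from torch import nn

class InvariantEGNN(nn.Module):
    """E(n)-invariant property prediction: distances enter phi_e, coordinates are never updated (a sketch)."""
    def __init__(self, nf=128, n_layers=7):
        super().__init__()
        self.embed = nn.Linear(5, nf)                     # one-hot atom type (H, C, N, O, F) -> nf
        # EGCL is the sketch class from Section 3 (an assumption, not the authors' code).
        self.layers = nn.ModuleList([EGCL(nf, na=1) for _ in range(n_layers)])
        self.pre_pool = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, nf))
        self.post_pool = nn.Sequential(nn.Linear(nf, nf), nn.SiLU(), nn.Linear(nf, 1))

    def forward(self, atom_onehot, x, edge_index, a):
        # atom_onehot: (M, 5), x: (M, 3) static atom positions, single molecule (no batching)
        h = self.embed(atom_onehot)
        for layer in self.layers:
            h, _ = layer(h, x, edge_index, a)             # discard the coordinate update (invariant mode)
        h = self.pre_pool(h)
        return self.post_pool(h.sum(dim=0))               # sum pooling over atoms -> scalar property
```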
In Appendix E we prove that when only positional information is given (i.e. no velocity or higher-order type features), the geometry is completely defined by the norms between points, up to E(n) transformations. In other words, if two collections of points separated by E(n) transformations are considered to be identical, then the relative norms between points are a unique identifier of the collection.

6. Conclusions

Equivariant graph neural networks are receiving increasing interest from the natural and medical sciences, as they represent a new tool for analyzing molecules and their properties. In this work, we presented a new E(n) equivariant deep architecture for graphs that is computationally efficient, easy to implement, and significantly improves over the current state-of-the-art on a wide range of tasks. We believe these properties make it ideally suited to make a direct impact on topics such as drug discovery, protein folding and the design of new materials, as well as applications in 3D computer vision.

Acknowledgements

We would like to thank Patrick Forré for his support in formalizing the invariance features identification proof.

References

Anderson, B., Hy, T.-S., and Kondor, R. Cormorant: Covariant molecular neural networks. arXiv preprint arXiv:1906.04015, 2019.

Bollobás, B. and Béla, B. Random graphs. Number 73. Cambridge University Press, 2001.

Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.

Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.

Chua, K., Calandra, R., McAllister, R., and Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. arXiv preprint arXiv:1805.12114, 2018.

Cohen, T. and Welling, M. Group equivariant convolutional networks. In Balcan, M. and Weinberger, K. Q. (eds.), Proceedings of the 33rd International Conference on Machine Learning, ICML, 2016.

Cohen, T. S. and Welling, M. Steerable CNNs. In 5th International Conference on Learning Representations, ICLR, 2017.

Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844-3852, 2016.

Finzi, M., Stanton, S., Izmailov, P., and Wilson, A. G. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. arXiv preprint arXiv:2002.12880, 2020.

Fuchs, F., Worrall, D., Fischer, V., and Welling, M. SE(3)-Transformers: 3D roto-translation equivariant attention networks. Advances in Neural Information Processing Systems, 33, 2020.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.

Grover, A., Zweig, A., and Ermon, S. Graphite: Iterative generative modeling of graphs. In International Conference on Machine Learning, pp. 2434-2444. PMLR, 2019.

Grzeszczuk, R., Terzopoulos, D., and Hinton, G. NeuroAnimator: Fast neural network emulation and control of physics-based models. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 9-20, 1998.

Horie, M., Morita, N., Hishinuma, T., Ihara, Y., and Mitsume, N. Isometric transformation invariant and equivariant graph convolutional networks. arXiv preprint arXiv:2005.06316, 2020.

Kipf, T., Fetaya, E., Wang, K.-C., Welling, M., and Zemel, R. Neural relational inference for interacting systems. arXiv preprint arXiv:1802.04687, 2018.
Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016a.

Kipf, T. N. and Welling, M. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016b.

Klicpera, J., Giri, S., Margraf, J. T., and Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. arXiv preprint arXiv:2011.14115, 2020a.

Klicpera, J., Groß, J., and Günnemann, S. Directional message passing for molecular graphs. arXiv preprint arXiv:2003.03123, 2020b.

Köhler, J., Klein, L., and Noé, F. Equivariant flows: sampling configurations for multi-body systems with symmetric energies. CoRR, abs/1910.00753, 2019.

Köhler, J., Klein, L., and Noé, F. Equivariant flows: exact likelihood generative learning for symmetric densities. arXiv preprint arXiv:2006.02425, 2020.

Kondor, R., Son, H. T., Pan, H., Anderson, B., and Trivedi, S. Covariant compositional networks for learning graphs. arXiv preprint arXiv:1801.02144, 2018.

Liu, J., Kumar, A., Ba, J., Kiros, J., and Swersky, K. Graph normalizing flows. In Advances in Neural Information Processing Systems, pp. 13578-13588, 2019.

Miller, B. K., Geiger, M., Smidt, T. E., and Noé, F. Relevance of rotationally equivariant convolutions for predicting molecular properties. arXiv preprint arXiv:2008.08461, 2020.

Nagabandi, A., Kahn, G., Fearing, R. S., and Levine, S. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 7559-7566. IEEE, 2018.

Ramachandran, P., Zoph, B., and Le, Q. V. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2017.

Ramakrishnan, R., Dral, P. O., Rupp, M., and Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1(1):1-7, 2014.

Rezende, D. J., Racanière, S., Higgins, I., and Toth, P. Equivariant Hamiltonian flows. CoRR, abs/1909.13739, 2019.

Romero, D. W. and Cordonnier, J.-B. Group equivariant stand-alone self-attention for vision. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=JkfYjnOEo6M.

Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R., and Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8(1):1-8, 2017a.

Schütt, K. T., Kindermans, P.-J., Sauceda, H. E., Chmiela, S., Tkatchenko, A., and Müller, K.-R. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. arXiv preprint arXiv:1706.08566, 2017b.

Serviansky, H., Segol, N., Shlomi, J., Cranmer, K., Gross, E., Maron, H., and Lipman, Y. Set2Graph: Learning graphs from sets. Advances in Neural Information Processing Systems, 33, 2020.

Simonovsky, M. and Komodakis, N. GraphVAE: Towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pp. 412-422. Springer, 2018.

Srinivasan, B. and Ribeiro, B. On the equivalence between positional node embeddings and structural graph representations. arXiv preprint arXiv:1910.00452, 2019.

Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., and Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219, 2018.
Uy, M. A., Pham, Q.-H., Hua, B.-S., Nguyen, T., and Yeung, S.-K. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1588-1597, 2019.

Watters, N., Zoran, D., Weber, T., Battaglia, P., Pascanu, R., and Tacchetti, A. Visual interaction networks: Learning a physics simulator from video. Advances in Neural Information Processing Systems, 30:4539-4547, 2017.

Weiler, M. and Cesa, G. General E(2)-equivariant steerable CNNs. In Wallach, H. M., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E. B., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems, 2019.

You, J., Ying, R., Ren, X., Hamilton, W. L., and Leskovec, J. GraphRNN: Generating realistic graphs with deep autoregressive models. arXiv preprint arXiv:1802.08773, 2018.