# Trajectory Prediction Using Equivariant Continuous Convolution

Published as a conference paper at ICLR 2021

Robin Walters* (Northeastern University, r.walters@northeastern.edu) · Jinxi Li* (Northeastern University, li.jinxi1@northeastern.edu) · Rose Yu (University of California, San Diego, roseyu@ucsd.edu)

*Equal contribution.

**Abstract.** Trajectory prediction is a critical part of many AI applications, for example, the safe operation of autonomous vehicles. However, current methods are prone to making inconsistent and physically unrealistic predictions. We leverage insights from fluid dynamics to overcome this limitation by considering internal symmetry in real-world trajectories. We propose a novel model, Equivariant Continuous COnvolution (ECCO), for improved trajectory prediction. ECCO uses rotationally-equivariant continuous convolutions to embed the symmetries of the system. On both vehicle and pedestrian trajectory datasets, ECCO attains competitive accuracy with significantly fewer parameters. It is also more sample efficient, generalizing automatically from few data points in any orientation. Lastly, ECCO improves generalization with equivariance, resulting in more physically consistent predictions. Our method provides a fresh perspective towards increasing trust and transparency in deep learning models. Our code and data can be found at https://github.com/Rose-STL-Lab/ECCO.

## 1 INTRODUCTION

Trajectory prediction is one of the core tasks in AI, arising in settings from the movement of basketball players to fluid particles to car traffic (Sanchez-Gonzalez et al., 2020; Gao et al., 2020; Shah & Romijnders, 2016). A common abstraction underlying these tasks is the movement of many interacting agents, analogous to a many-particle system. Therefore, understanding the states of these particles, their dynamics, and hidden interactions is critical to accurate and robust trajectory forecasting.

*Figure 1: Car trajectories in two scenes. Though the entire scenes are not related by a rotation, the circled areas are. ECCO exploits this symmetry to improve generalization and sample efficiency.*

Even for purely physical systems such as in particle physics, the complex interactions among a large number of particles make this a difficult problem. For vehicle or pedestrian trajectories, this challenge is further compounded by latent factors such as human psychology. Given these difficulties, current approaches require large amounts of training data and many model parameters. State-of-the-art methods in this domain, such as Gao et al. (2020), are based on graph neural networks. They do not exploit the physical properties of the system and often make predictions which are not self-consistent or physically meaningful. Furthermore, they predict a single agent trajectory at a time instead of multiple agents simultaneously.

Our model is built upon a key insight about many-particle systems: they possess intricate internal symmetry. Consider a model which predicts the trajectory of cars on a road. To be successful, such a model must understand the physical behavior of vehicles together with human psychology. It should distinguish left from right turns, and give consistent outputs for intersections rotated into different orientations. As shown in Figure 1, a driver's velocity rotates with the entire scene, whereas vehicle interactions are invariant to such a rotation.
Likewise, psychological factors such as reaction speed or attention may be considered vectors with prescribed transformation properties. Data augmentation is a common practice for dealing with rotational invariance, but it cannot guarantee invariance and requires longer training. Since rotation is a continuous group, augmentation requires sampling from infinitely many possible angles.

In this paper, we propose an equivariant continuous convolutional model, ECCO, for trajectory forecasting. Continuous convolution generalizes discrete convolution and is adapted to data in many-particle systems with complex local interactions. Ummenhofer et al. (2019) designed a model using continuous convolutions for particle-based fluid simulations. Meanwhile, equivariance to group symmetries has proven to be a powerful tool for integrating physical intuition in physical science applications (Wang et al., 2020; Brown & Lunter, 2019; Kanwar et al., 2020). Here, we test the hypothesis that an equivariant model can also capture internal symmetry in non-physical human behavior. Our model utilizes a novel weight sharing scheme, torus kernels, and is rotationally equivariant.

We evaluate our model on two real-world trajectory datasets: the Argoverse autonomous vehicle dataset (Chang et al., 2019) and the TrajNet++ pedestrian trajectory forecasting challenge (Kothari et al., 2020). We demonstrate prediction accuracy on par with or better than baseline models and data augmentation, with fewer parameters, better sample efficiency, and stronger generalization properties. Lastly, we demonstrate theoretically and experimentally that our polar coordinate-indexed filters have lower equivariance discretization error due to being better adapted to the symmetry group. Our main contributions are as follows:

- We propose Equivariant Continuous COnvolution (ECCO), a rotationally equivariant deep neural network that can capture internal symmetry in trajectories.
- We design ECCO using a novel weight sharing scheme based on orbit decomposition and polar coordinate-indexed filters. We implement equivariance for both the standard representation and the regular representation $L^2(SO(2))$.
- On the benchmark Argoverse and TrajNet++ datasets, ECCO demonstrates comparable accuracy while enjoying better generalization, fewer parameters, and better sample complexity.

## 2 RELATED WORK

**Trajectory Forecasting.** For vehicle trajectories, classic models in transportation include the Car Following model (Pipes, 1966) and the Intelligent Driver model (Kesting et al., 2010). Deep learning has also received considerable attention; for example, Liang et al. (2020) and Gao et al. (2020) use graph neural networks to predict vehicle trajectories. Djuric et al. (2018) use rasterizations of the scene with a CNN. See the review paper by Veres & Moussa (2019) for deep learning in transportation. For human trajectory modeling, Alahi et al. (2016) propose Social LSTM to learn human-human interactions. TrajNet (Sadeghian et al., 2018) and TrajNet++ (Kothari et al., 2020) introduce benchmarks for human trajectory forecasting. We refer readers to Rudenko et al. (2020) for a comprehensive survey. Nevertheless, many deep learning models are purely data-driven: they require large amounts of data, have many parameters, and can generate physically inconsistent predictions.
**Continuous Convolution.** Continuous convolutions over point clouds (CtsConv) have been successfully applied to classification and segmentation tasks (Wang et al., 2018; Lei et al., 2019; Xu et al., 2018; Wu et al., 2019; Su et al., 2018; Li et al., 2018; Hermosilla et al., 2018; Atzmon et al., 2018; Hua et al., 2018). More recently, a few works have used continuous convolution for modeling trajectories or flows. For instance, Wang et al. (2018) use CtsConv for inferring flow on LIDAR data. Schenck & Fox (2018) and Ummenhofer et al. (2019) model fluid simulation using CtsConv. Closely related to our work is Ummenhofer et al. (2019), who design a continuous convolution network for particle-based fluid simulations. However, they use a ball-to-sphere mapping which is not well adapted to rotational equivariance, and they encode only 3 frames of input. Graph neural networks (GNNs) are a related strategy which have been used for modeling particle system dynamics (Sanchez-Gonzalez et al., 2020). GNNs are also permutation invariant, but they do not natively encode relative positions and local interactions as a CtsConv-based network does.

**Equivariant and Invariant Deep Learning.** Developing neural nets that preserve symmetries has been a fundamental task in image recognition (Cohen et al., 2019b; Weiler & Cesa, 2019; Cohen & Welling, 2016a; Chidester et al., 2018; Lenc & Vedaldi, 2015; Kondor & Trivedi, 2018; Bao & Song, 2019; Worrall et al., 2017; Cohen & Welling, 2016b; Weiler et al., 2018; Dieleman et al., 2016; Maron et al., 2020). Equivariant networks have also been used to predict dynamics: for example, Wang et al. (2020) predict fluid flow using Galilean equivariance, but only for gridded data. Fuchs et al. (2020) use SE(3)-equivariant transformers to predict trajectories for a small number of particles as a regression task. As in this paper, both Bekkers (2020) and Finzi et al. (2020) address the challenge of parameterizing a kernel over continuous Lie groups. Finzi et al. (2020) apply their method to trajectory prediction on point clouds using a small number of points following strict physical laws. Worrall et al. (2017) also parameterize convolutional kernels using polar coordinates, but map them onto a rectilinear grid for application to image data. Weng et al. (2018) address rotational equivariance by inferring a global canonicalization of the input. Similar to our work, Esteves et al. (2018) use functions evenly sampled on the circle; however, their features live at a single point, whereas we assign feature vectors to each point in a point cloud. Thomas et al. (2018) introduce Tensor Field Networks, which are SO(3)-equivariant continuous convolutions. Unlike our work, both Worrall et al. (2017) and Thomas et al. (2018) define their kernels using harmonic functions. Our weight sharing method using orbits and stabilizers is simpler, as it does not require harmonic functions or Clebsch-Gordan coefficients. Unlike previous work, we implement a regular representation for the continuous rotation group SO(2) which is compatible with pointwise nonlinearities and enjoys an empirical advantage over irreducible representations.

## 3 BACKGROUND

We first review the necessary background on continuous convolution and rotational equivariance.

### 3.1 CONTINUOUS CONVOLUTION

Continuous convolution (CtsConv) generalizes discrete convolution to point clouds. It provides an efficient and spatially aware way to model the interactions of nearby particles.
Let $f^{(i)} \in \mathbb{R}^{c_{in}}$ denote the feature vector of particle $i$. Thus $f$ is a vector field which assigns to each point $x^{(i)}$ a vector in $\mathbb{R}^{c_{in}}$. The kernel of the convolution $K \colon \mathbb{R}^2 \to \mathbb{R}^{c_{out} \times c_{in}}$ is a matrix field: for each point $x \in \mathbb{R}^2$, $K(x)$ is a $c_{out} \times c_{in}$ matrix. Let $a$ be a radial local attention map with $a(r) = 0$ for $r > R$. The output feature vector $g^{(i)}$ of particle $i$ from the continuous convolution is given by

$$g^{(i)} = \mathrm{CtsConv}_{K,R}(x, f; x^{(i)}) = \sum_j a\big(\lVert x^{(j)} - x^{(i)} \rVert\big)\, K\big(x^{(j)} - x^{(i)}\big)\, f^{(j)}. \tag{1}$$

CtsConv is naturally equivariant to permutations of labels and is translation invariant. Equation 1 is closely related to graph neural networks (GNNs) (Kipf & Welling, 2017; Battaglia et al., 2018), which are also permutation invariant. Here the graph is dynamic and implicit, with nodes $x^{(i)}$ and edges $e_{ij}$ whenever $\lVert x^{(i)} - x^{(j)} \rVert < R$. Unlike a GNN, which applies the same weights to all neighbours, the kernel $K$ depends on the relative position vector $x^{(i)} - x^{(j)}$.

### 3.2 ROTATIONAL EQUIVARIANCE

Continuous convolution is not naturally rotationally equivariant. Fortunately, we can translate the techniques for rotational equivariance in CNNs to continuous convolutions. We use the language of Lie groups and their representations; for more background, see Hall (2015) and Knapp (2013).

More precisely, we denote the symmetry group of 2D rotations by $SO(2) = \{\mathrm{Rot}_\theta : 0 \le \theta < 2\pi\}$. As a Lie group, it has a group structure, $\mathrm{Rot}_{\theta_1} \mathrm{Rot}_{\theta_2} = \mathrm{Rot}_{(\theta_1 + \theta_2) \bmod 2\pi}$, which is continuous with respect to the topological structure. As a manifold, $SO(2)$ is homeomorphic to the circle $S^1 = \{x \in \mathbb{R}^2 : \lVert x \rVert = 1\}$. The group $SO(2)$ can act on a vector space $\mathbb{R}^c$ by specifying a representation map $\rho \colon SO(2) \to GL(\mathbb{R}^c)$ which assigns to each element of $SO(2)$ an element of the set of invertible $c \times c$ matrices $GL(\mathbb{R}^c)$. The map $\rho$ must be a homomorphism: $\rho(\mathrm{Rot}_{\theta_1})\rho(\mathrm{Rot}_{\theta_2}) = \rho(\mathrm{Rot}_{\theta_1} \mathrm{Rot}_{\theta_2})$. For example, the standard representation $\rho_1$ on $\mathbb{R}^2$ is by $2 \times 2$ rotation matrices. The regular representation $\rho_{reg}$ on $L^2(SO(2)) = \{\phi \colon SO(2) \to \mathbb{R} : |\phi|^2 \text{ is integrable}\}$ is $\rho_{reg}(\mathrm{Rot}_\varphi)(\phi) = \phi \circ \mathrm{Rot}_{-\varphi}$. Given input $f$ with representation $\rho_{in}$ and output with representation $\rho_{out}$, a map $F$ is $SO(2)$-equivariant if $F(\rho_{in}(\mathrm{Rot}_\theta) f) = \rho_{out}(\mathrm{Rot}_\theta) F(f)$.

Discrete CNNs are equivariant to a group $G$ if the input, output, and hidden layers carry a $G$-action and the linear layers and activation functions are all equivariant (Kondor & Trivedi, 2018). One method for constructing equivariant discrete CNNs is steerable CNNs (Cohen & Welling, 2016b). Cohen et al. (2019a) derive a general constraint for when a convolutional kernel $K \colon \mathbb{R}^b \to \mathbb{R}^{c_{out} \times c_{in}}$ is $G$-equivariant. Assume $G$ acts on $\mathbb{R}^b$ and that $\mathbb{R}^{c_{out}}$ and $\mathbb{R}^{c_{in}}$ are $G$-representations $\rho_{out}$ and $\rho_{in}$ respectively; then $K$ is $G$-equivariant if for all $g \in G$ and $x \in \mathbb{R}^b$,

$$K(gx) = \rho_{out}(g)\, K(x)\, \rho_{in}(g^{-1}). \tag{2}$$

For the group $SO(2)$, Weiler & Cesa (2019) solve this constraint using circular harmonic functions to give a basis of discrete equivariant kernels. In contrast, our method is much simpler and uses orbits and stabilizers to create continuous convolution kernels.
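Before turning to ECCO, a minimal NumPy sketch may help make Equation 1 concrete. This is our own illustration, not the released implementation; `cts_conv`, `kernel_fn`, and `attention_fn` are hypothetical names standing in for a learned kernel field and the radial window $a$:

```python
import numpy as np

def cts_conv(points, feats, query, kernel_fn, attention_fn, radius, c_out):
    """Continuous convolution (Equation 1) evaluated at one query particle.

    points       : (n, 2) array of particle positions x^(j)
    feats        : (n, c_in) array of feature vectors f^(j)
    query        : (2,) position x^(i) of the output particle
    kernel_fn    : maps a relative position (2,) to a (c_out, c_in) matrix K(x)
    attention_fn : radial window a(r) with a(r) = 0 for r > radius
    """
    out = np.zeros(c_out)
    for x_j, f_j in zip(points, feats):
        rel = x_j - query                  # relative position x^(j) - x^(i)
        r = np.linalg.norm(rel)
        if r <= radius:                    # only nearby particles contribute
            out += attention_fn(r) * (kernel_fn(rel) @ f_j)
    return out
```

For example, `attention_fn = lambda r: max(0.0, 1.0 - r / radius)` gives a linear falloff; Section 4 constrains the kernel so that this map also becomes rotation-equivariant.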
## 4 ECCO: TRAJECTORY PREDICTION USING ROTATIONALLY EQUIVARIANT CONTINUOUS CONVOLUTION

In trajectory prediction, given historical position and velocity data of $n$ particles over $t_{in}$ timesteps, we want to predict their positions over the next $t_{out}$ timesteps. Denote the ground truth dynamics as $\xi$, which maps $\xi(x_{t-t_{in}:t}, v_{t-t_{in}:t}) = x_{t:t+t_{out}}$.

Motivated by the observation in Figure 1, we wish to learn a model $f$ that approximates the underlying dynamics while preserving the internal symmetry in the data, specifically rotational equivariance. We introduce ECCO, a model for trajectory prediction based on rotationally equivariant continuous convolution. We implement rotationally equivariant continuous convolutions using a weight sharing scheme based on orbit decomposition. We also describe equivariant per-particle linear layers, which are a special case of continuous convolution with radius $R = 0$, analogous to 1×1 convolutions in CNNs. Such layers are useful for passing information between layers from each particle to itself.

### 4.1 ECCO MODEL OVERVIEW

*Figure 2: Overview of the model architecture. Past velocities are aggregated by an encoder Enc. Together with map information, this is then encoded by 3 CtsConvs into $\rho_{reg}$ features. Then $l + 1$ CtsConv layers are used to predict $\Delta x$. The predicted position is $\hat{x}_{t+1} = \tilde{x} + \Delta x$, where $\tilde{x}$ is numerically extrapolated using velocity and acceleration. Since $\Delta x$ is translation invariant, $\hat{x}$ is equivariant.*

The high-level architecture of ECCO is illustrated in Figure 2. It is important to remember that the input, output, and hidden layers are all vector fields over the particles. Oftentimes, environmental information is also available in the form of road lane markers. Denote marker positions by $x_{map}$ and direction vectors by $v_{map}$. This data is thus also a particle field, but a static one.

To design an equivariant network, one must choose the group representation. This choice plays an important role in shaping the learned hidden states. We focus on two representations of $SO(2)$: $\rho_1$ and $\rho_{reg}$. The representation $\rho_1$ is that of our input features, and $\rho_{reg}$ is used for the hidden layers. For $\rho_1$, we constrain the kernel in Equation 1. For $\rho_{reg}$, we further introduce a new operator: convolution with torus kernels.

In order to make continuous convolution rotationally equivariant, we translate the general condition for discrete CNNs developed in Weiler & Cesa (2019) to continuous convolution. We define the convolution kernel $K$ in polar coordinates, $K(\theta, r)$. Let $\mathbb{R}^{c_{out}}$ and $\mathbb{R}^{c_{in}}$ be $SO(2)$-representations $\rho_{out}$ and $\rho_{in}$ respectively; then the equivariance condition requires the kernel to satisfy

$$K(\theta + \varphi, r) = \rho_{out}(\mathrm{Rot}_\theta)\, K(\varphi, r)\, \rho_{in}(\mathrm{Rot}_\theta^{-1}). \tag{3}$$

Imposing such a constraint for continuous convolution requires us to develop an efficient weight sharing scheme for the kernels which solves Equation 3.

### 4.2 WEIGHT SHARING BY ORBITS AND STABILIZERS

Given a point $x \in \mathbb{R}^2$ and a group $G$, the set $O_x = \{gx : g \in G\}$ is the orbit of the point $x$. The set of orbits gives a partition of $\mathbb{R}^2$ into the origin and circles of radius $r > 0$. The set of group elements $G_x = \{g : gx = x\}$ fixing $x$ is called the stabilizer of the point $x$. We use the orbits and stabilizers to constrain the weights of $K$. Simply put, we share weights across orbits and constrain weights according to stabilizers, as shown in Figure 3 (left).

The ray $D = \{(0, r) : r \ge 0\}$ is a fundamental domain for the action of $G = SO(2)$ on the base space $\mathbb{R}^2$; that is, $D$ contains exactly one point from each orbit. We first define $K(0, r)$ for each $(0, r) \in D$. Then we compute $K(\theta, r)$ from $K(0, r)$ by setting $\varphi = 0$ in Equation 3:

$$K(\theta, r) = \rho_{out}(\mathrm{Rot}_\theta)\, K(0, r)\, \rho_{in}(\mathrm{Rot}_\theta^{-1}). \tag{4}$$

For $r > 0$, the group acts freely on $(0, r)$, i.e. the stabilizer contains only the identity. This means that Equation 3 imposes no additional constraints on $K(0, r)$. Thus $K(0, r) \in \mathbb{R}^{c_{out} \times c_{in}}$ is a matrix of freely learnable weights.
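As an illustration, the following is a minimal NumPy sketch (ours, not the paper's released code) of this weight sharing for the special case $\rho_{in} = \rho_{out} = \rho_1$, where Equation 4 reduces to conjugation by rotation matrices; the assertion at the end numerically verifies the kernel constraint of Equation 3:

```python
import numpy as np

def rot(theta):
    """Standard representation rho_1(Rot_theta): a 2x2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def shared_kernel(K0, theta):
    """Equation 4 for rho_in = rho_out = rho_1: recover K(theta, r) on the
    whole orbit from the freely learnable matrix K(0, r) on the ray D.
    Note rot(-theta) == rot(theta).T for rotation matrices."""
    return rot(theta) @ K0 @ rot(theta).T

# Numerical check of the kernel constraint (Equation 3):
# K(theta + phi, r) == rho_out(Rot_theta) K(phi, r) rho_in(Rot_theta^{-1})
K0 = np.random.randn(2, 2)        # learnable weights at (0, r) for some r > 0
theta, phi = 0.7, 1.3
lhs = shared_kernel(K0, theta + phi)
rhs = rot(theta) @ shared_kernel(K0, phi) @ rot(theta).T
assert np.allclose(lhs, rhs)
```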
For $r = 0$, however, the orbit $O_{(0,0)}$ consists of only one point. The stabilizer of $(0, 0)$ is all of $G$, which requires

$$K(0, 0) = \rho_{out}(\mathrm{Rot}_\theta)\, K(0, 0)\, \rho_{in}(\mathrm{Rot}_\theta^{-1}) \quad \text{for all } \theta. \tag{5}$$

Thus $K(0, 0)$ is an equivariant per-particle linear map $\rho_{in} \to \rho_{out}$.

*Table 1: Equivariant linear maps for $K(0, 0)$. Trainable weights are $c \in \mathbb{R}$ and $\kappa \colon S^1 \to \mathbb{R}$, where $S^1$ is the manifold underlying $SO(2)$.*

| $\rho_{in}$ | $\rho_{out} = \rho_1$ | $\rho_{out} = \rho_{reg}$ |
|---|---|---|
| $\rho_1$ | $(a, b) \mapsto (ca, cb)$ | $(a, b) \mapsto ca\cos(\theta) + cb\sin(\theta)$ |
| $\rho_{reg}$ | $f \mapsto c\big(\int_{S^1} f(\theta)\cos(\theta)\,d\theta,\ \int_{S^1} f(\theta)\sin(\theta)\,d\theta\big)$ | $f \mapsto \int_{S^1} \kappa(\theta - \varphi) f(\varphi)\,d\varphi$ |

We can analytically solve Equation 5 for $K(0, 0)$ using representation theory. Table 1 shows the unique solutions for the different combinations of $\rho_1$ and $\rho_{reg}$. For details see subsection A.3.

Note that 2D and 3D rotation-equivariant continuous convolutions are implemented in Worrall et al. (2017) and Thomas et al. (2018), respectively. They both use harmonic functions, which require expensive evaluation of analytic functions at each point. Instead, we provide a simpler solution: we require only knowledge of the orbits, stabilizers, and input/output representations. Additionally, we bypass the Clebsch-Gordan decomposition used in Thomas et al. (2018) by mapping directly between the representations in our network. Next, we describe an efficient implementation of equivariant continuous convolution.

### 4.3 POLAR COORDINATE KERNELS

Rotational equivariance informs our kernel discretization and implementation. We store the kernel $K$ of the continuous convolution as a 4-dimensional tensor by discretizing the domain. Specifically, we discretize $\mathbb{R}^2$ using polar coordinates with $k_\theta$ angular slices and $k_r$ radial steps. We then evaluate $K$ at any $(\theta, r)$ using bilinear interpolation from the four closest polar grid points. This method accelerates computation since we do not need to use Equation 4 to repeatedly compute $K(\theta, r)$ from $K(0, r)$. The special case of $K(0, 0)$ results in a polar grid with a bullseye at the center (see Figure 3, left).

*Figure 3: Left: A torus kernel field $K$ from a $\rho_{reg}$-field to a $\rho_{reg}$-field. The kernel is itself a field: at each point $x$ in space, the kernel $K(x)$ yields a different matrix. We denote the $(\varphi_2, \varphi_1)$ entry of the matrix at $x = (\theta, r)$ by $K(\theta, r)(\varphi_2, \varphi_1)$. The matrices along the red sector are freely trainable. The matrices at all white sectors are determined by those in the red sector according to the circular shifting rule illustrated above. The matrix at the red bullseye is trainable but constrained to be circulant, i.e. preserved by the circular shifting rule. Right: The torus kernel acts on features which are functions on the circle. By cutting open the torus and the features along the red and orange lines, we can identify the operation at each point with matrix multiplication.*

We discretize angles finely and radii more coarsely. This choice is inspired by the real-world observation that drivers tend to be more sensitive to the angle of an incoming car than to its exact distance. Our equivariant kernels are computationally efficient and have very few parameters. Moreover, as we discuss later in Section 4.5, despite discretization, the use of polar coordinates allows for very low equivariance error.
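To illustrate the lookup just described, here is a minimal NumPy sketch of bilinear interpolation on a polar kernel grid; the function name and the exact grid alignment are our assumptions, and the key detail is that the angular axis wraps around the $0/2\pi$ seam:

```python
import numpy as np

def lookup_kernel(W, theta, r, r_max):
    """Bilinearly interpolate a polar-grid kernel tensor at (theta, r).

    W     : (k_theta, k_r, c_out, c_in) kernel values on the polar grid,
            assuming W[i, j] = K(2*pi*i/k_theta, j*r_max/(k_r - 1))
    theta : query angle in [0, 2*pi); r : query radius in [0, r_max]
    """
    k_theta, k_r = W.shape[0], W.shape[1]
    t = (theta / (2.0 * np.pi)) * k_theta      # fractional angular index
    s = (r / r_max) * (k_r - 1)                # fractional radial index
    t0 = int(np.floor(t)) % k_theta
    t1 = (t0 + 1) % k_theta                    # the angular axis wraps around
    s0 = min(int(np.floor(s)), k_r - 2)        # clamp the radial axis
    s1 = s0 + 1
    wt, ws = t - np.floor(t), s - s0
    return ((1 - wt) * (1 - ws) * W[t0, s0] + wt * (1 - ws) * W[t1, s0] +
            (1 - wt) * ws * W[t0, s1] + wt * ws * W[t1, s1])
```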
### 4.4 HIDDEN LAYERS AS REGULAR REPRESENTATIONS

The regular representation $\rho_{reg}$ has shown better performance than $\rho_1$ for finite groups (Cohen et al., 2019a; Weiler & Cesa, 2019). But the naive $\rho_{reg} = \{\phi \colon G \to \mathbb{R}\}$ for an infinite group $G$ is too large to work with. We choose the space of square-integrable functions $L^2(G)$. It contains all irreducible representations of $G$ and is compatible with pointwise non-linearities.

**Discretization.** However, $L^2(SO(2))$ is still infinite-dimensional. We resolve this by discretizing the manifold $S^1$ underlying $SO(2)$ into $k_{reg}$ even intervals. We represent a function $f \in L^2(SO(2))$ by its vector of sampled values $[f(\mathrm{Rot}_{2\pi i / k_{reg}})]_{0 \le i < k_{reg}}$.
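A minimal sketch of this discretization, under our own indexing convention: a $\rho_{reg}$ feature is a length-$k_{reg}$ vector of samples on the circle, and rotation by a grid angle $2\pi i / k_{reg}$ acts on it by a circular shift:

```python
import numpy as np

k_reg = 16                              # number of samples on the circle S^1
f = np.random.randn(k_reg)              # f[i] samples f(Rot_{2*pi*i/k_reg})

def rho_reg(f, i):
    """Discretized regular representation: rotation by the grid angle
    2*pi*i/k_reg acts on a sampled function by a circular index shift,
    matching rho_reg(Rot_phi)(f) = f o Rot_{-phi}."""
    return np.roll(f, i)

# The action composes like the group: rotating by i and then by j is i + j
i, j = 3, 7
assert np.allclose(rho_reg(rho_reg(f, i), j), rho_reg(f, i + j))
```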