# Constants of motion network

M. F. Kasim & Y. H. Lim
Machine Discovery Ltd., Oxford, United Kingdom
{muhammad, yi.heng}@machine-discovery.com

Abstract

The beauty of physics is that there is usually a conserved quantity in an always-changing system, known as the constant of motion. Finding the constant of motion is important in understanding the dynamics of a system, but typically requires mathematical proficiency and manual analytical work. In this paper, we present a neural network that can simultaneously learn the dynamics of the system and the constants of motion from data. By exploiting the discovered constants of motion, it can produce better predictions of the dynamics and can work on a wider range of systems than Hamiltonian-based neural networks. In addition, the training progress of our method can be used as an indication of the number of constants of motion in a system, which could be useful in studying a novel physical system.

1 Introduction

Noether's theorem [1] states that if a system has a symmetry, then there is a constant of motion corresponding to it. As it turns out, constants of motion play a significant role in understanding the world around us and are ubiquitous in almost every aspect of physics. Among the most prominent examples are the conservation of energy, the conservation of momentum, and the conservation of angular momentum. Historically, constants of motion were discovered through analytical work on observational data or on mathematical descriptions of the systems' equations of motion. For example, the law of conservation of energy was proposed by Châtelet based on Newton's work on classical mechanics [2]. With the recent emergence of neural networks for scientific discovery and for learning systems' behaviour from observational data [3, 4, 5, 6], a question naturally arises: "can we find constants of motion of dynamical systems from their data and exploit them to make better predictions?"

A large body of literature has been moving in this direction by learning the Hamiltonian [6, 7] of a system or its variations [8, 9]. Because most of the previous works focus on the Hamiltonian and its variations, they work well in conserving the Hamiltonian or energy. However, when a system has other constants of motion, Hamiltonian-based methods fail to discover and exploit those quantities.

Here we present the "COnstant Of Motion nETwork" (COMET) that can discover constants of motion of a system and exploit them to make better predictions. In contrast to Hamiltonian-based networks [6, 7, 8, 9], COMET is not constrained to Hamiltonian systems or to a specific coordinate choice, making it applicable to a wider range of systems, as shown in Table 1. In addition, we found that the training progress of COMET can be used as an indication of how many independent constants of motion there are in the system (see section 6), which could be a valuable hint in studying a novel physical system. Our implementation and experiments can be found in the public domain¹.

¹https://github.com/machine-discovery/comet/

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

|                              | NODE [10] | HNN [6] | NSF [7] | LNN [8] | COMET |
|------------------------------|-----------|---------|---------|---------|-------|
| Conserves energy             | ✗         | ✓       | ✓       | ✓       | ✓     |
| Works on general coordinates | ✓         | ✗       | ✓       | ✓       | ✓     |
| Works on dissipative systems | ✓         | ✗       | ✗       | ✗       | ✓     |
| Conserves other quantities   | ✗         | ✗       | ✗       | ✗       | ✓     |

Table 1: Summary of the methods comparison. The compared methods are neural ODE, Hamiltonian neural network, neural symplectic form, Lagrangian neural network, and COMET (ours).
2 Related works

The simplest way to learn the dynamics of a system with neural networks is to use a neural ordinary differential equation (NODE) [10]. NODE takes the full states of the system and produces the dynamics of the system, i.e. the time derivative of the states. A simulation can then be run by solving the ODE defined by the output of NODE. As there is no inductive bias in NODE, it typically struggles to conserve quantities that are important in some systems' dynamics, such as energy.

Hamiltonian neural network (HNN) [6] is an attempt to solve this conservation problem by learning the Hamiltonian and calculating the state dynamics from the Hamiltonian. It has been shown that with HNN, one can conserve the energy and produce better motion predictions over a long time horizon. Due to its simplicity and the elegance of the idea, HNN has been applied to a wide range of tasks and neural network architectures [11, 12, 13], and even to dissipative systems by adding a dissipation term [14, 15]. HNN can also be combined with a symplectic integrator [16, 17] to produce better results from trajectory observations. Despite its ability to conserve the energy, HNN is limited by the requirement of using canonical coordinates instead of arbitrary coordinates. Works on Lagrangian neural networks [18, 8] remove this limitation by learning the Lagrangian. Other attempts use the neural symplectic form to learn a coordinate-free representation of the Hamiltonian [7] or Poisson neural networks to learn Poisson systems [9]. However, those works are still limited to Hamiltonian or Poisson systems.

3 COMET: constants of motion network

We start by denoting the set of states of a system as $\mathbf{s} \in \mathbb{R}^{n_s}$, where $n_s$ is the number of states. States are the internal parameters of a system that completely determine its dynamics in the absence of external influence. For example, in classical particle motion, the particle's position and velocity constitute the states of the system. Without external influence, the change of the states typically depends on the states themselves, i.e. $\mathrm{d}\mathbf{s}/\mathrm{d}t = \dot{\mathbf{s}}(\mathbf{s})$.

A constant of motion is a quantity that is conserved over time in the system, like energy. In some systems, such as integrable systems [19], quantities other than energy are also conserved, for example, momentum or angular momentum. These constants of motion can typically be described as a function of the states of the system, so we denote them as $\mathbf{c}(\mathbf{s}) \in \mathbb{R}^{n_c}$, where $n_c$ is the number of constants of motion. As these quantities are constant throughout the motion, their time derivative must be zero, i.e. $\mathrm{d}\mathbf{c}/\mathrm{d}t = 0$. Taking into account the dependency of $\mathbf{c}$ on $\mathbf{s}$, the condition on $\mathbf{c}$ can be written as

$$\frac{\partial \mathbf{c}}{\partial \mathbf{s}} \dot{\mathbf{s}} = 0, \qquad (1)$$

where $\partial \mathbf{c}/\partial \mathbf{s}$ is an $n_c \times n_s$ Jacobian matrix in which each row is the gradient of one constant of motion with respect to the states $\mathbf{s}$. The equation above means that the state dynamics $\dot{\mathbf{s}}$ must be perpendicular to the gradient of each constant of motion.

To design a deep learning architecture that can simultaneously learn the constants of motion and learn dynamics that conserve them, we define two functions of the states that can be constructed with neural networks, $\dot{\mathbf{s}}_0(\mathbf{s})$ and $\mathbf{c}(\mathbf{s})$. The function $\dot{\mathbf{s}}_0 : \mathbb{R}^{n_s} \rightarrow \mathbb{R}^{n_s}$ is the initial guess of the rate of change of the states. The function $\mathbf{c} : \mathbb{R}^{n_s} \rightarrow \mathbb{R}^{n_c}$ computes the constants of motion of the system.
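To make equation (1) concrete, here is a minimal sketch (assuming PyTorch; the helper name and the toy example are ours, not part of the COMET codebase) that evaluates $\mathrm{d}\mathbf{c}/\mathrm{d}t = (\partial\mathbf{c}/\partial\mathbf{s})\,\dot{\mathbf{s}}$ with automatic differentiation; each entry should be close to zero when the corresponding quantity is conserved along $\dot{\mathbf{s}}$.

```python
import torch

def constants_drift(c_fn, s, s_dot):
    """Evaluate dc/dt = (dc/ds) @ ds/dt for a single state vector.

    c_fn  : callable mapping a state vector of shape (ns,) to constants of
            motion of shape (nc,)
    s     : state vector, shape (ns,)
    s_dot : state time-derivative, shape (ns,)
    """
    jac = torch.autograd.functional.jacobian(c_fn, s)  # (nc, ns) Jacobian dc/ds
    return jac @ s_dot                                   # (nc,), ~0 if conserved

# Toy check: unit mass-spring with states s = (x, v) and energy E = (x^2 + v^2)/2.
energy = lambda s: (0.5 * (s[0] ** 2 + s[1] ** 2)).reshape(1)
s = torch.tensor([1.0, 0.5])
s_dot = torch.tensor([0.5, -1.0])         # dx/dt = v, dv/dt = -x
print(constants_drift(energy, s, s_dot))  # tensor([0.])
```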
To ensure the constants of motion are conserved as in equation (1), we compute the state dynamics by orthogonalizing the initial guess $\dot{\mathbf{s}}_0$ with respect to the gradient of every constant of motion,

$$\dot{\mathbf{s}} = \mathrm{ortho}\left(\dot{\mathbf{s}}_0, \{\nabla c_1, \nabla c_2, ..., \nabla c_{n_c}\}\right), \qquad (2)$$

where $c_i$ is the $i$-th element of the constants of motion $\mathbf{c}$ and $\mathrm{ortho}(\mathbf{a}, V)$ is an operation that orthogonalizes the vector $\mathbf{a}$ against every vector in the set $V$.

3.1 Orthogonalization process

One way to produce a vector orthogonal to a set of vectors is to use the QR decomposition, i.e.

$$\mathbf{A} = \left(\nabla c_1, \nabla c_2, ..., \nabla c_{n_c}, \dot{\mathbf{s}}_0\right), \qquad \mathbf{Q}, \mathbf{R} = \mathrm{QR}(\mathbf{A}), \qquad \dot{\mathbf{s}} = \mathbf{Q}_{(\cdot, n_c)} R_{(n_c, n_c)}, \qquad (3)$$

where $\mathbf{Q}_{(\cdot, n_c)}$ is the last column of the matrix $\mathbf{Q}$ and $R_{(n_c, n_c)}$ is the element in the last row and last column of the matrix $\mathbf{R}$. The first expression above constructs a tall matrix $\mathbf{A} \in \mathbb{R}^{n_s \times (n_c + 1)}$ whose first $n_c$ columns are the gradients of the constants of motion and whose last column is the initial guess of the states' rate of change, $\dot{\mathbf{s}}_0$. QR decomposition is usually implemented using Householder transformations [20], which in practice produce much smaller numerical errors than the alternative Gram-Schmidt process [21].

The QR procedure above imposes the constraint that the number of constants of motion must be less than the number of states, i.e. $n_c < n_s$. This agrees with the fact that the maximum number of independent constants of motion in an integrable system is $n_c = n_s - 1$ [19]. By using the QR decomposition, COMET will try to find $n_c$ independent constants of motion. The reasoning behind this is given in the appendix.

3.2 Training loss function

We need to train the two trainable functions in COMET, $\dot{\mathbf{s}}_0(\mathbf{s})$ and $\mathbf{c}(\mathbf{s})$, so that the state dynamics $\dot{\mathbf{s}}$ from equation (3) match the dynamics from the observation or training data, $\hat{\dot{\mathbf{s}}}$. The loss function for training COMET is constructed as

$$\mathcal{L} = \left\| \dot{\mathbf{s}} - \hat{\dot{\mathbf{s}}} \right\|^2 + w_1 \left\| \dot{\mathbf{s}}_0 - \hat{\dot{\mathbf{s}}} \right\|^2 + w_2 \sum_{i=1}^{n_c} \left\| \nabla c_i \cdot \dot{\mathbf{s}}_0 \right\|^2, \qquad (4)$$

where the $w$'s are tunable regularization weights. The first term of the loss function is the standard L2 error, requiring the prediction to match the training data. The second term is included to accelerate the training process by pushing the initial guess $\dot{\mathbf{s}}_0$ to be as close as possible to the actual value of the states' rate of change. The third term is an additional regularization that helps the discovery of the constants of motion.
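To make equations (2)-(4) concrete, here is a minimal per-sample sketch (assuming PyTorch; the function and variable names are ours, and the actual implementation in the COMET repository may differ) of how the dynamics and the loss can be computed from the two networks $\dot{\mathbf{s}}_0$ and $\mathbf{c}$:

```python
import torch

def comet_step(s, s_hat_dot, s0_net, c_net, w1=1.0, w2=1.0):
    """One COMET forward/loss evaluation for a single (unbatched) state vector.

    s         : (ns,) state vector
    s_hat_dot : (ns,) observed state time-derivative (training target)
    s0_net    : network mapping (ns,) -> (ns,), the initial guess of ds/dt
    c_net     : network mapping (ns,) -> (nc,), the constants of motion
    Returns (s_dot, loss) following equations (2)-(4).
    """
    s = s.requires_grad_(True)
    s_dot0 = s0_net(s)                    # initial guess of ds/dt
    c = c_net(s)                          # constants of motion, (nc,)
    nc = c.shape[0]
    # gradients of each constant of motion w.r.t. the states, stacked as columns
    grads = torch.stack(
        [torch.autograd.grad(c[i], s, create_graph=True)[0] for i in range(nc)],
        dim=1)                            # (ns, nc)
    # equation (3): QR-based orthogonalization of s_dot0 against the gradients
    A = torch.cat([grads, s_dot0.unsqueeze(1)], dim=1)   # (ns, nc + 1)
    Q, R = torch.linalg.qr(A)
    s_dot = Q[:, nc] * R[nc, nc]
    # equation (4): prediction loss + initial-guess loss + orthogonality regularizer
    loss = ((s_dot - s_hat_dot) ** 2).sum() \
         + w1 * ((s_dot0 - s_hat_dot) ** 2).sum() \
         + w2 * ((grads.T @ s_dot0) ** 2).sum()
    return s_dot, loss
```

In practice the computation is batched and the two networks are trained by gradient descent on the summed loss; the sketch only illustrates the structure of a single sample.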
4 Learning constants of motion from data

To demonstrate the capability of COMET to simultaneously learn both the dynamics and the constants of motion, we tested it on a variety of cases. For all the cases in this section, the training data were generated by simulating the dynamics of the system from t = 0 to t = 10. From the simulations, we collected the states $\mathbf{s}$ as well as the states' rate of change, $\hat{\dot{\mathbf{s}}}$, which were calculated analytically and perturbed with Gaussian noise of standard deviation σ = 0.05.

We performed 6 simple experiments to demonstrate the capability of COMET: (1) mass-spring, (2) 2D pendulum, (3) damped pendulum, (4) two body, (5) nonlinear spring, and (6) Lotka-Volterra. The cases were selected to cover a wide variety of settings: Hamiltonian systems in canonical coordinates (cases 1, 4, 5), Hamiltonian systems in non-canonical coordinates (cases 2, 6), a case with redundant states (case 2), a dissipative system (case 3), and a case with a moderate number of states (case 4). The details of each case, including the number of constants of motion that we set for COMET training, as well as the training setups, are described in the appendix.

For each case, we compared the performance of COMET with other methods: (1) simple neural ODE (NODE) [10], (2) Hamiltonian neural network (HNN) [6] with the coordinates given in each case below, (3) neural symplectic form (NSF) [7], and (4) Lagrangian neural network (LNN) [8]. The neural network architecture for each method is detailed in the appendix.

| Case             | NODE [10]           | HNN [6]             | NSF [7]              | LNN [8]              | COMET                  |
|------------------|---------------------|---------------------|----------------------|----------------------|------------------------|
| mass-spring      | 0.17 (+0.10, -0.13) | 0.19 (+0.24, -0.17) | 0.22 (+0.13, -0.17)  | 0.12 (+0.08, -0.09)  | 0.10 (+0.15, -0.09)    |
| 2D pendulum      | 0.087 (+30, -0.067) | 0.10 (+13, -0.09)   | 0.11 (+0.24, -0.10)  | 0.029 (+0.29, -0.013)| 0.18 (+0.17, -0.14)    |
| damped pendulum  | 0.14 (+0.03, -0.05) | 110 (+10, -110)     | fail                 | fail                 | 0.007 (+0.014, -0.005) |
| two body         | 460 (+980, -460)    | 0.49 (+340, -0.33)  | fail                 | fail                 | 0.42 (+0.48, -0.39)    |
| nonlinear spring | 0.63 (+0.38, -0.35) | 0.13 (+0.71, -0.11) | 0.19 (+0.70, -0.15)  | 0.17 (+0.70, -0.14)  | 0.23 (+0.40, -0.18)    |
| Lotka-Volterra   | 0.12 (+0.36, -0.10) | 0.65 (+1.6, -0.59)  | 0.080 (+0.20, -0.071)| N/A                  | 0.048 (+0.055, -0.041) |

Table 2: Root mean squared error of 100 randomly initialized simulations for each case and each method. The main number is the median, while the range in parentheses represents the 95% interval (i.e. the lower and upper bounds are the 2.5% and 97.5% percentiles, respectively). "fail" means that there were integration failures with scipy's solve_ivp, making it unable to integrate to t = 100 in a reasonable time.

4.1 Results

Figure 1: The contour plot of the constant of motion discovered by COMET (left) compared to the true constant of motion (right) for the mass-spring (top) and Lotka-Volterra (bottom) cases.

For each case, we tested each method by running another 100 simulations from t = 0 to t = 100 (10 times longer than the training range) using 1000 sampled points, with the initial conditions randomly initialized as above using a different seed. The root mean squared errors of the state predictions are shown in Table 2.
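As a rough illustration of this test protocol, the rollout and error computation might look like the following (a minimal sketch assuming NumPy/SciPy; `dynamics_fn` would wrap the trained model and `true_fn` the analytic dynamics, both hypothetical names):

```python
import numpy as np
from scipy.integrate import solve_ivp

def rollout_rmse(dynamics_fn, true_fn, s0, t_end=100.0, n_points=1000):
    """Roll out a learned dynamics model and the ground-truth dynamics from the
    same initial state, then return the RMSE over the sampled trajectory.
    Assumes both integrations reach t_end (the paper reports cases where they
    do not, marked as "fail" in Table 2).
    """
    t_eval = np.linspace(0.0, t_end, n_points)
    pred = solve_ivp(dynamics_fn, (0.0, t_end), s0, t_eval=t_eval).y
    true = solve_ivp(true_fn, (0.0, t_end), s0, t_eval=t_eval).y
    return np.sqrt(np.mean((pred - true) ** 2))
```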
From Table 2, we can see that COMET performs well across the test cases. In the mass-spring case, all methods perform well. However, in the pendulum cases and Lotka-Volterra, HNN fails to predict the dynamics because the chosen coordinates are not canonical coordinates. Although NSF performs reasonably well on the 2D pendulum, it fails on the damped pendulum case because the damped pendulum is not a Hamiltonian system. COMET takes advantage of the constants of motion present in the cases above, which can be exploited to guide the predicted trajectory; therefore, it achieves better predictions of the dynamics regardless of the chosen coordinates and regardless of whether energy is conserved.

Figure 1 shows the discovered constant of motion in the mass-spring and Lotka-Volterra cases. As can be seen from the figure, COMET successfully discovered the constants of motion from the data. Figure 2 shows the evolution of the known constants of motion for every method in the mass-spring, 2D pendulum, and two body cases. The periodic variations of the true constants of motion are due to the noise added to the training data. In the mass-spring case, figure 2(a) shows that HNN, NSF, and COMET conserve the energy, while NODE has the energy decreasing over time. A different story can be found in figure 2(b), which shows 3 constants of motion of the pendulum in 2D coordinates. In this case, NODE and HNN fail to conserve the constants of motion, while COMET conserves them over a long period of time. The failure of HNN can be attributed to the state coordinates not being canonical coordinates. This shows that COMET can discover the constants of motion with far fewer constraints on the state coordinates than HNN.

For the two body case in Figure 2(c), we can see that NODE and NSF diverge quite quickly. The failure of NSF in this case might be due to the added noise in the training data. Among the tested methods, only HNN and COMET conserve the energy. However, as HNN is designed only for Hamiltonian or energy conservation, it fails to conserve other quantities, such as the momentum and angular momentum. COMET, on the other hand, successfully conserves those quantities.

Figure 2: The constants of motion calculated for every method for (a) mass-spring, (b) 2D pendulum, and (c) two body. Please note that the integration of NSF and LNN for the two body case cannot be completed.

Figure 3: Motion trajectory of the simulated two body system using neural ODE, HNN, NSF, and COMET (ours) from t = 0 to t = 20.

Figure 3 shows the trajectory of the two body system simulated using the various methods tested here from t = 0 to t = 20. From the figure, we can see that only our method (COMET) produces a closed trajectory. HNN produces an almost closed trajectory, but it deviates slightly because it conserves only the energy and does not necessarily conserve the other quantities. By exploiting as many constants of motion as possible, COMET can reproduce the motion with a small error compared to the other methods.

5 Systems with external influences

One advantage of COMET is that it can easily work with systems under external influences, such as external forces. If the system has conserved quantities when the external influences are kept constant, then COMET with a simple modification can be used to learn the constants of motion and exploit them to get more accurate dynamics. The modification is simply to make the initial guess of the dynamics and the constants of motion depend on the external influences as well as the states, i.e. $\dot{\mathbf{s}}_0(\mathbf{s}, \mathbf{x})$ and $\mathbf{c}(\mathbf{s}, \mathbf{x})$, where $\mathbf{x} \in \mathbb{R}^{n_x}$ is the external influence. The dynamics can still be calculated following equation (2).

We conducted an experiment using the 2D pendulum from section 4, but with an additional external force in the x-direction, $F_x$. The training data were generated with an external force profile $F_x(t) = a_0 \cos(a_1 t + a_2)$ with uniformly-distributed random values $a_0 \sim U(-0.5, 0.5)$, $a_1 \sim U(0, 5)$, and $a_2 \sim U(0, 2\pi)$. The experiment was done similarly to section 4, with the force also added as an input to the neural networks for NODE, HNN, NSF, and LNN.

Figure 4: Constants of motion of the forced 2D pendulum case calculated using NODE, HNN, NSF, LNN, and COMET.

Figure 4 shows the constants of motion on the test system with a constant external force. As seen in the figure, the values of the true constants of motion produced by COMET oscillate slightly around a constant offset, due to the noise added during training. In contrast, NODE produces a drift of the energy values, NSF produces large oscillations even for the energy, and LNN quickly diverges. Although HNN produces an energy deviation similar to COMET's, it has larger deviations on the other conserved quantities. This shows that even with external forces, COMET can still find and exploit the constants of motion for its dynamics predictions.
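A minimal sketch of this modification (assuming PyTorch; the layer sizes, activation, and the class name are illustrative, not the paper's exact setup) simply concatenates the external influence with the states before feeding both networks; the dynamics are then computed from these outputs via the orthogonalization of equation (2) as before.

```python
import torch
import torch.nn as nn

class CometWithExternal(nn.Module):
    """Sketch of the section-5 modification: both the initial-guess dynamics
    and the constants of motion take the external influence x as extra input."""

    def __init__(self, ns, nx, nc, hidden=250):
        super().__init__()
        def mlp(nout):
            return nn.Sequential(
                nn.Linear(ns + nx, hidden), nn.LogSigmoid(),
                nn.Linear(hidden, hidden), nn.LogSigmoid(),
                nn.Linear(hidden, nout))
        self.s0_net = mlp(ns)   # initial guess of ds/dt, now a function of (s, x)
        self.c_net = mlp(nc)    # constants of motion, now a function of (s, x)

    def forward(self, s, x):
        sx = torch.cat([s, x], dim=-1)   # concatenate states and external input
        return self.s0_net(sx), self.c_net(sx)

# example: a 4-state 2D pendulum with one external force channel (sizes illustrative)
model = CometWithExternal(ns=4, nx=1, nc=3)
s = torch.randn(4)
x = torch.tensor([0.3])                  # Fx held constant during a rollout
s_dot0, c = model(s, x)
```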
6 Finding the number of constants of motion

For most systems, the number of independent constants of motion is usually not known beforehand and is not obvious. Knowing the number of constants of motion can be useful in understanding the manifold dimension of the motion; however, this is not an easy problem to solve. During this work, we observed that COMET's training progress might provide a valuable indication of the number of constants of motion. The quantity to look at is the first term in the loss function in equation (4), i.e.

$$\mathcal{L}_1 = \left\| \dot{\mathbf{s}} - \hat{\dot{\mathbf{s}}} \right\|^2. \qquad (5)$$

If we set the number of constants of motion greater than the true number, then this term cannot get below a certain value. This is because $\dot{\mathbf{s}}$ is constrained to be perpendicular to the gradients of the constants of motion, so if there are excessive constants of motion, it may not be able to match the observed values to a certain accuracy.

We ran a simple experiment to find the number of constants of motion for known systems. Specifically, COMET was trained on the damped pendulum, two body, and 2D nonlinear spring cases from section 4 without added noise and ran for 3000 epochs. Those cases are known to have 2, 7, and 2 constants of motion out of 4, 8, and 4 states, respectively. The number of constants of motion was scanned from 0 to the maximum value, $n_s - 1$.

Figure 5 shows the value of $\mathcal{L}_1$ for the various cases with a varying number of constants of motion. From figure 5 (top), we can see that once the number of constants of motion is set above a certain number, the value of $\mathcal{L}_1$ suddenly increases. This gives an indication of the actual number of constants of motion. If the system has the maximum number of constants of motion, then the values of $\mathcal{L}_1$ will always be similar to the values with $n_c = 0$. Besides the final value of $\mathcal{L}_1$, the evolution of $\mathcal{L}_1$ during training for various numbers of constants of motion can also indicate the true number of constants of motion, as we can see in figure 5 (bottom). From figure 5, we can see that the number of constants of motion for the damped pendulum case is 2, for the two body case is 7 (the maximum number), and for the nonlinear spring case is 2.

Figure 5: (Top row) The relative mean values of L1 from equation (5) for the damped pendulum, two body, and nonlinear spring cases, with the number of constants of motion nc scanned from 0 to ns - 1. The relative values were calculated by dividing by the value of L1 at nc = 0. The values and the error bars were obtained by taking the mean and standard deviation over 5 COMETs trained with different random seeds. (Bottom row) The values of L1 during training for various numbers of constants of motion for the damped pendulum, two body, and nonlinear spring cases.

6.1 Failure mode

This technique for determining the number of constants of motion depends on the ability of the neural network to find the constants of motion. Therefore, if the neural network is not expressive enough, it could fail to find the constants of motion and indicate a lower number of constants of motion than there should be. Figure 6 illustrates this case: we scanned the number of constants of motion from 0 to 3 in the 2D nonlinear spring case with a neural network that has only 50 hidden elements per layer instead of 250. It indicates the number of constants of motion to be 1 instead of the true number of 2.

Figure 6: The failure of finding the number of constants of motion using a smaller network.
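The scan described in this section could be organized as follows (a sketch only; `train_comet` is a hypothetical helper that trains one COMET with a given number of constants of motion and returns the final value of $\mathcal{L}_1$):

```python
def scan_num_constants(dataset, ns, train_comet, n_seeds=5):
    """Scan nc = 0 .. ns - 1 and report the final L1 relative to the nc = 0 baseline.

    `train_comet(dataset, nc, seed)` is assumed to train a COMET (e.g. for 3000
    epochs) and return the final value of L1 = ||s_dot - s_dot_observed||^2.
    """
    results = {}
    for nc in range(ns):                       # nc = 0, 1, ..., ns - 1
        finals = [train_comet(dataset, nc=nc, seed=seed) for seed in range(n_seeds)]
        results[nc] = sum(finals) / n_seeds    # mean final L1 over random seeds
    baseline = results[0]                      # L1 with no constants of motion
    # the estimated number of constants of motion is the largest nc whose
    # relative L1 stays close to the nc = 0 baseline, before the sudden jump
    return {nc: l1 / baseline for nc, l1 in results.items()}
```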
7 More complex cases

Simulating a system with an infinite number of states. The previous examples only involve systems with a finite or countable number of states. To demonstrate the general applicability of COMET, we ran an experiment on simulating systems with an infinite (but discretized) number of states. Specifically, we trained COMET to learn the dynamics of a shallow wave $u(x, t)$ following the Korteweg-De Vries (KdV) equation [22, 23],

$$\frac{\partial u}{\partial t} = -u \frac{\partial u}{\partial x} - \delta \frac{\partial^3 u}{\partial x^3}.$$

The states in this case are the values of u along the x-axis, which constitute an infinite number of states. In our experiment, we simulate the behaviour of u from x = 0 to x = L = 5 with periodic boundary conditions, sampled at 100 points with uniform spacing. We also set δ = 0.00022 for numerical stability. The training dataset was generated by running 100 simulations with random initial conditions from t = 0 to t = 10 with 100 steps. The initial condition in the training dataset is $u(x, 0) = a_0 + a_1 \cos(2\pi x / L + a_2)$, where $a_0$, $a_1$, and $a_2$ are randomly chosen within the ranges (1.5, 2.5), (0, 1), and (0, 2π), respectively.

The neural network was constructed with 1D convolutional layers with kernel size 5 and circular padding, each followed by a logsigmoid activation function. This pattern was repeated 4 times, without the activation function for the last layer, using 250 channels in the hidden layers. The training was done as described in section 4 and takes about 5-7 hours on an NVIDIA T4 GPU. The number of channels in the input is 1 (only for u), and in the output it is $1 + n_c$, where $n_c$ is the number of constants of motion that we set. The first output channel represents the initial guess of the dynamics, $\dot{u}_0(x)$. The last $n_c$ channels are what we call the constants of motion density, $p_i(x)$ for $i = 1, ..., n_c$. From $p_i(x)$, the constants of motion can be calculated as $c_i = \int_0^L p_i(x)\,\mathrm{d}x$. Using the outputs from the network, the dynamics can be calculated following equation (2). A sketch of this architecture is given below.

We compared the performance of NODE and COMET in solving the KdV equation for t = 0 to 20. It is not obvious how to apply HNN, NSF, and LNN, as the KdV equation has only u as its state and does not include velocity or momentum. Figure 7 shows the states u(x, t) at t = 20 of the simulations using the true dynamics, NODE, and COMET. From the figure, we can see that at t = 20 the simulation done by NODE has diverged while the COMET simulations are still intact. This shows that COMET can take advantage of the constants of motion to make its prediction more accurate.

Figure 7: Plot of u(x, t) at t = 20 from simulations done by the true analytic expression, NODE, and COMET using 1 and 2 constants of motion. All simulations were initialized to the same initial condition. The simulation run by NODE already diverges at t = 20.
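The convolutional COMET described above might be sketched as follows (assuming PyTorch; the exact published architecture may differ in details such as the channel layout):

```python
import torch
import torch.nn as nn

class KdvComet(nn.Module):
    """Sketch of the convolutional COMET for the KdV experiment: four 1D
    convolutions with kernel size 5, circular padding, and logsigmoid
    activations (none after the last layer); 1 input channel (u) and
    1 + nc output channels."""

    def __init__(self, nc, hidden=250):
        super().__init__()
        self.nc = nc
        layers = []
        chans = [1, hidden, hidden, hidden, 1 + nc]
        for i in range(4):
            layers.append(nn.Conv1d(chans[i], chans[i + 1], kernel_size=5,
                                    padding=2, padding_mode="circular"))
            if i < 3:
                layers.append(nn.LogSigmoid())
        self.net = nn.Sequential(*layers)

    def forward(self, u, dx):
        # u: (batch, 1, nx) discretized field; dx: grid spacing
        out = self.net(u)
        u_dot0 = out[:, :1]     # initial guess of du/dt at every grid point
        p = out[:, 1:]          # constants-of-motion densities p_i(x)
        c = p.sum(dim=-1) * dx  # c_i = integral of p_i(x) dx (rectangle rule)
        return u_dot0, c
```

Given $\dot{u}_0(x)$ and the integrated constants $c_i$, the dynamics then follow from the same QR orthogonalization as in equations (2)-(3), with the discretized grid values of u playing the role of the state vector.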
Learning from pixels. Another experiment that we ran to show COMET's capability is learning the dynamics from pixels, to see whether it can exploit the constants of motion in the latent space of an auto-encoder. In this case, we simulate the dynamics of a two-body system with gravitational interactions (as in section 4) and present the data as 30 × 30 pixel images that show the positions of the 2 objects. Our model consists of an encoder, a COMET, and a decoder. The inputs to our model are 2 consecutive frames, to capture the information about positions and velocities, which we denote as $\mathbf{x}$. The encoder converts the pixel inputs to latent variables, $\mathbf{s} = f_e(\mathbf{x})$. The dynamics of the latent variables are then learned by COMET, and the latent variables are converted back to pixel images, $\hat{\mathbf{x}} = f_d(\mathbf{s})$, by the decoder. The loss function in this case is simply the sum of the COMET loss from equation (4) and the auto-encoder loss, i.e. $\mathcal{L}_{ae} = \|\hat{\mathbf{x}} - \mathbf{x}\|^2$. The observed state derivatives $\hat{\dot{\mathbf{s}}}$ used to compute the COMET loss in equation (4) are approximated by the finite difference of the encoded pixel data from 2 consecutive time steps, i.e. $\hat{\dot{\mathbf{s}}} \approx [f_e(\mathbf{x}(t + \Delta t)) - f_e(\mathbf{x}(t))] / \Delta t$. In contrast to HNN and LNN, COMET has no requirement that half of the latent states be the time derivative of the other half.

In this experiment, we compared the performance of COMET, NODE, HNN, and NSF in learning the dynamics of the latent variables of the auto-encoder. The auto-encoder architectures are kept the same for all methods. The number of latent states is picked arbitrarily to be 10, which is more than the true number of states, 8. For COMET, the number of constants of motion was found by following the procedure in section 6, which gives 9. Figure 8 shows a sample of the decoded images of the dynamics predicted by COMET, NODE, HNN, and NSF, compared to the ground truth. From the figure, we can see that the dynamics predicted by NODE diverge as soon as t = 20, and at some point the decoder produces non-sensible images. Although NSF and HNN can produce good images until the end of the simulation at t = 180, the dynamics predicted by NSF and HNN diverge from the ground truth. In contrast, COMET can still match the dynamics of the ground-truth simulation until the end of the simulation. This shows COMET's capability to accurately learn the dynamics of the latent states of an auto-encoder.

Figure 8: Snapshots of the decoded images of the dynamics predicted by COMET, NODE, HNN, and NSF, as well as the ground truth images.

8 Discussions

Limitations. COMET works better if there is at least one constant of motion in the system. If there is no constant of motion, then COMET works similarly to a neural ODE [10]. Although we presented a way to find the number of constants of motion in section 6, it still requires multiple training processes and manual insight. Even with successful training, COMET sometimes produces dynamics that are stiffer than the true dynamics, although LNN and NSF produce stiffer dynamics more often. In rare cases, the dynamics from COMET are so stiff that the integration by scipy's solve_ivp cannot be completed in a reasonable time. This only happened in the KdV case and did not happen in the other cases we tested for this paper. We believe that the limitations above should be addressed to move forward.

Broader impact. The impact of deep learning on the physical sciences is expected to be similar to the impact of scientific computational methods. Although it enables new fields of study, it adds more points of failure. For example, if a new or unexpected result is discovered using deep learning methods, it could be a true discovery or a false discovery due to imperfections in training, an unsuitable neural network architecture, a bug in the code, among other things. Therefore, deep learning methods such as COMET should be accompanied by other, independent methods to confirm obtained results in scientific works.

9 Conclusions

We have shown that COMET can simultaneously learn the constants of motion and the dynamics of a system from observational data. Because the assumption made by COMET (i.e. the existence of constants of motion) is less strict than that of Hamiltonian-based neural networks, it can be applied to a wider range of systems, including dissipative systems and systems with external influences.
The training progress of COMET can also give an indication of the number of constants of motion in a system. With all the advantages we presented, we believe that COMET can be a valuable tool for scientific machine learning in the future.

Acknowledgement

We would like to thank Sam Vinko and Brett Larder for their helpful comments in improving the manuscript, as well as Ayesha Chairannisa for proofreading the paper.

References

[1] Emmy Noether. Invariant variation problems. Transport Theory and Statistical Physics, 1(3):186-207, 1971.
[2] Ruth Hagengruber. Emilie du Châtelet between Leibniz and Newton. Springer, 2012.
[3] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583-589, 2021.
[4] Li Li, Stephan Hoyer, Ryan Pederson, Ruoxi Sun, Ekin D. Cubuk, Patrick Riley, Kieron Burke, et al. Kohn-Sham equations as regularizer: Building prior knowledge into machine-learned physics. Physical Review Letters, 126(3):036401, 2021.
[5] Muhammad F. Kasim and Sam M. Vinko. Learning the exchange-correlation functional from nature with fully differentiable density functional theory. Physical Review Letters, 127(12):126403, 2021.
[6] Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks. Advances in Neural Information Processing Systems, 32, 2019.
[7] Yuhan Chen, Takashi Matsubara, and Takaharu Yaguchi. Neural symplectic form: Learning Hamiltonian equations on general coordinate systems. Advances in Neural Information Processing Systems, 34, 2021.
[8] Miles Cranmer, Sam Greydanus, Stephan Hoyer, Peter Battaglia, David Spergel, and Shirley Ho. Lagrangian neural networks. arXiv preprint arXiv:2003.04630, 2020.
[9] Pengzhan Jin, Zhen Zhang, Ioannis G. Kevrekidis, and George Em Karniadakis. Learning Poisson systems and trajectories of autonomous systems via Poisson neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2022.
[10] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018.
[11] Alvaro Sanchez-Gonzalez, Victor Bapst, Kyle Cranmer, and Peter Battaglia. Hamiltonian graph networks with ODE integrators. arXiv preprint arXiv:1909.12790, 2019.
[12] Peter Toth, Danilo Jimenez Rezende, Andrew Jaegle, Sébastien Racanière, Aleksandar Botev, and Irina Higgins. Hamiltonian generative networks. arXiv preprint arXiv:1909.13789, 2019.
[13] Chen-Di Han, Bryan Glaz, Mulugeta Haile, and Ying-Cheng Lai. Adaptable Hamiltonian neural networks. Physical Review Research, 3(2):023156, 2021.
[14] Sam Greydanus and Andrew Sosanya. Dissipative Hamiltonian neural networks: Learning dissipative and conservative dynamics separately. arXiv preprint arXiv:2201.10085, 2022.
[15] Yaofeng Desmond Zhong, Biswadip Dey, and Amit Chakraborty. Dissipative SymODEN: Encoding Hamiltonian dynamics with dissipation and control into deep learning. arXiv preprint arXiv:2002.08860, 2020.
[16] Zhengdao Chen, Jianyu Zhang, Martin Arjovsky, and Léon Bottou. Symplectic recurrent neural networks. In International Conference on Learning Representations, 2019.
[17] Yaofeng Desmond Zhong, Biswadip Dey, and Amit Chakraborty. Symplectic ODE-Net: Learning Hamiltonian dynamics with control. In International Conference on Learning Representations, 2019.
[18] Michael Lutter, Christian Ritter, and Jan Peters. Deep Lagrangian networks: Using physics as model prior for deep learning. arXiv preprint arXiv:1907.04490, 2019.
[19] Nigel J. Hitchin, Graeme B. Segal, and Richard Samuel Ward. Integrable Systems: Twistors, Loop Groups, and Riemann Surfaces, volume 4. OUP Oxford, 2013.
[20] Alston S. Householder. Unitary triangularization of a nonsymmetric matrix. Journal of the ACM (JACM), 5(4):339-342, 1958.
[21] Ward Cheney and David Kincaid. Linear algebra: Theory and applications. The Australian Mathematical Society, 110:544-550, 2009.
[22] Olivier Darrigol. Worlds of Flow: A History of Hydrodynamics from the Bernoullis to Prandtl. Oxford University Press, 2005.
[23] Norman J. Zabusky and Martin D. Kruskal. Interaction of "solitons" in a collisionless plasma and the recurrence of initial states. Physical Review Letters, 15(6):240, 1965.

Checklist

1. For all authors...
   (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
   (b) Did you describe the limitations of your work? [Yes] See sections 6.1 and 8.
   (c) Did you discuss any potential negative societal impacts of your work? [Yes] See section 8.
   (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
   (a) Did you state the full set of assumptions of all theoretical results? [N/A]
   (b) Did you include complete proofs of all theoretical results? [N/A]
3. If you ran experiments...
   (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the footnote at the end of section 1.
   (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See section 4.
   (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [Yes] See table 2 and figure 5.
   (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See section 4.
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
   (a) If your work uses existing assets, did you cite the creators? [N/A]
   (b) Did you mention the license of the assets? [N/A]
   (c) Did you include any new assets either in the supplemental material or as a URL? [N/A]
   (d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [N/A]
   (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
   (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
   (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
   (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]