Learning Macroscopic Dynamics from Partial Microscopic Observations

Mengyi Chen¹, Qianxiao Li¹,²
¹Department of Mathematics, National University of Singapore
²Institute for Functional Intelligent Materials, National University of Singapore
chenmengyi@u.nus.edu, qianxiao@nus.edu.sg

Abstract

Macroscopic observables of a system are of keen interest in real applications such as the design of novel materials. Current methods rely on microscopic trajectory simulations, where the forces on all microscopic coordinates need to be computed or measured. However, this can be computationally prohibitive for realistic systems. In this paper, we propose a method to learn macroscopic dynamics requiring only force computations on a subset of the microscopic coordinates. Our method relies on a sparsity assumption: the force on each microscopic coordinate depends only on a small number of other coordinates. The main idea of our approach is to map the training procedure on the macroscopic coordinates back to the microscopic coordinates, on which partial force computations can be used as stochastic estimates to update model parameters. We provide a theoretical justification of this under suitable conditions. We demonstrate the accuracy, force computation efficiency, and robustness of our method on learning macroscopic closure models from a variety of microscopic systems, including those modeled by partial differential equations or molecular dynamics simulations. Our code is available at https://github.com/MLDS-NUS/Learn-Partial.git.

1 Introduction

Macroscopic properties, including the structural and dynamical properties, provide a way to describe and understand the collective behaviors of complex systems. In a wide range of real applications, researchers focus mainly on the macroscopic properties of a system, e.g., the viscosity and ionic diffusivity of liquid electrolytes for Li-ion batteries (Dajnowicz et al., 2022). Macroscopic observables usually depend on the whole microscopic system; for example, the calculation of the mean squared displacement requires all microscopic coordinates during the simulation. With growing simulation and experimental data, data-driven learning of macroscopic properties from microscopic observations has become an active area of research (Zhang et al., 2018; Wang et al., 2019; Husic et al., 2020; Lee et al., 2020; Fu et al., 2023; Chen et al., 2024).

Accurate calculation of macroscopic properties requires large-scale microscopic simulation. However, accurate force computations on all microscopic coordinates for large systems are extremely expensive (Jia et al., 2020; Musaelian et al., 2023). For example, in ab initio molecular simulations, accurate forces need to be calculated from density functional theory (DFT). The computational cost of DFT limits its application to relatively small systems, typically ranging from a few hundred atoms to several thousand atoms, depending on the level of accuracy and computational resources (Hafner et al., 2006; Luo et al., 2020). This poses a dilemma: accurate macroscopic properties are obtained from large-scale microscopic simulation, but the computation of forces on all the microscopic coordinates is extremely challenging.

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

Figure 1: Overview of our method. Left. Data generation workflow.
For each configuration x, forces on a subset of all the microscopic coordinates are calculated by the microscopic force calculator. Right. Macroscopic dynamics identification. The macroscopic dynamics is mapped to the microscopic space first, then compared with the forces on the subset of the microscopic coordinates.

To solve the dilemma, the corresponding question is: can we still obtain accurate macroscopic observables when we only have access to forces on a subset of the microscopic coordinates? In this work, we develop a method to learn the dynamics of the macroscopic observables directly, while only forces on a subset of the microscopic coordinates are needed. Efficient partial computation of microscopic forces relies on a sparsity assumption, under which the cost of computing forces on a subset of microscopic coordinates does not scale with the microscopic system size. To learn the dynamics of the macroscopic observables, we first map the macroscopic dynamics back to the microscopic space and then compare it with the partial microscopic forces. Our key idea is summarized in Fig. 1. Our main contributions are as follows:

- We develop a novel method that can learn the macroscopic dynamics from partial computation of the microscopic forces. Our method can significantly reduce the computational cost of force computations.
- We theoretically justify that forces on a subset of the microscopic coordinates can be used as stochastic estimates to update latent model parameters, even if the macroscopic observables depend on all the microscopic coordinates.
- We empirically validate the accuracy, force computation efficiency, and robustness of our method on a variety of microscopic dynamics and latent model structures.

2 Related Work

Learning from Partial Observations. Several works have sought to learn dynamics from a partially observed state $\hat{x}$ using machine learning (Ruelle & Takens, 1971; Sauer et al., 1991; Takens, 2006; Ayed et al., 2019; Ouala et al., 2020; Huang et al., 2020; Schlaginhaufen et al., 2021; Lu et al., 2022; Stepaniants et al., 2023). For training, these methods first reconstruct the unobserved state $\tilde{x}$ and then model the dynamics of the full state $x = (\hat{x}, \tilde{x})$. Our work instead assumes a fully observed state x but partial forces f. Furthermore, we do not model the dynamics of the state x directly, but rather in a latent space, because the dimension of x would be extremely high for large systems.

Reduced Order Models. By modeling the dynamics in a latent space and then recovering the full states from it, reduced order models (ROMs) substitute expensive full-order simulation with cheaper reduced-order simulation (Schilders et al., 2008; Fresca et al., 2020; Lee & Carlberg, 2020; Hernandez et al., 2021; Fries et al., 2022). Our method can be thought of as falling within the scope of closure modeling. Unlike ROMs, we aim to model the dynamics of some given macroscopic observables directly and are not interested in recovering the microscopic states from the latent states.

Equation-free Framework. The equation-free framework (EFF) seeks to simulate the macroscopic dynamics efficiently (Kevrekidis et al., 2003; Samaey et al., 2006; Liu et al., 2015). The EFF is usually applied to partial differential equation (PDE) systems, with the macroscopic observables chosen to be the solution of the PDE on a coarse spatial grid. In EFF, the macroscopic observables depend locally on the microscopic coordinates, allowing the macroscopic dynamics to be directly estimated from microscopic simulations performed in small spatial domains.
In contrast, in our method the macroscopic observables may depend globally on the microscopic coordinates, and the macroscopic dynamics may not be easily estimated from microscopic simulations performed in small spatial domains. Another difference is that our method explicitly learns the macroscopic dynamics, while EFF can bypass the explicit derivation of a macroscopic evolution law by coupling microscale and macroscale dynamics. During simulation, EFF still requires microscopic simulations to be performed in small spatial domains and for short times, whereas our method enables fast macroscopic simulation without requiring any microscopic simulation. On the other hand, for systems where the macroscopic evolution equations conceptually exist but are not available in closed form, EFF can handle such cases efficiently, while the dynamics learned by our method may involve approximation or statistical errors that are often challenging to estimate.

3 Problem Setup

We consider a microscopic system consisting of n particles. Let the state of the microscopic system be $x = (x_1, \dots, x_n) \in \mathbb{R}^N$, $x_i \in \mathbb{R}^m$, $N = mn$, where $x_i \in \mathbb{R}^m$ is some physical quantity associated with the i-th particle, such as the position and velocity. Assume the dynamics of the microscopic system can be characterized by an ordinary differential equation (ODE):

$\frac{dx}{dt} = f(x(t)), \qquad (1)$

where $f(x) = (f_1(x), \dots, f_n(x)) \in \mathbb{R}^N$. We call $x_i$ the microscopic coordinate of the i-th particle and $f_i$ the force acting on the microscopic coordinate $x_i$.

In many real applications, we are interested in the dynamics of some macroscopic observables $z^* = \varphi^*(x)$. Here $\varphi^*$ is given beforehand and describes the functional dependence of $z^*$ on x. For example, $z^*$ can be chosen to be the instantaneous temperature or mean squared displacement of a Lennard-Jones system. The goal is to learn the dynamics of $z^*$ from microscopic simulation data. Existing methods that learn macroscopic or latent dynamics require microscopic trajectories or forces on all the microscopic coordinates for training (Champion et al., 2019; Fries et al., 2022; Fu et al., 2023; Chen et al., 2024). The problem is: when the microscopic system size N is so large that force computations on all the microscopic coordinates are impossible, these methods are no longer applicable. Instead, our method aims to learn from partial computation of the microscopic forces.

Suppose we are given a microscopic force calculator S for the computation of partial microscopic forces. Let the microscopic state x be sampled from a distribution D. For each x, the microscopic force calculator S first samples an n-dimensional random variable $I(x) = (I_1(x), \dots, I_n(x)) \sim P_x \in \{0, 1\}^n$ according to a certain strategy. Next, S calculates the corresponding partial forces $f_{I(x)} := (f_1 I_1(x), \dots, f_n I_n(x))$; for notational simplicity we sometimes write $f_{I(x)}$ as $f_I$. $I_i(x)$ indicates whether particle i is chosen for force calculation: if particle i is chosen, $I_i(x) = 1$ and $(f_I)_i = f_i$; otherwise $I_i(x) = 0$ and $(f_I)_i = 0$. We require the sampling strategy $P_x$ to satisfy:

1. For each $I(x) \sim P_x$, exactly $np$ entries are equal to 1 and the rest are 0.
2. Each particle is chosen with equal probability p, i.e., $\mathbb{P}(I_i(x) = 1) = p$, $\mathbb{P}(I_i(x) = 0) = 1 - p$, $i = 1, \dots, n$.

This means that the microscopic force calculator S can calculate forces on $np$ microscopic coordinates, and $0 < p < 1$ reflects the limited computation capacity of S.
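To make the sampling strategy concrete, here is a minimal sketch (not taken from the released code) of one strategy $P_x$ satisfying both requirements: drawing exactly $np$ indices uniformly without replacement gives each particle the same inclusion probability p.

```python
import numpy as np

def sample_partial_index(n: int, p: float, rng: np.random.Generator) -> np.ndarray:
    """Draw I(x) in {0,1}^n with exactly n*p ones; each particle is chosen w.p. p."""
    n_chosen = int(round(n * p))
    chosen = rng.choice(n, size=n_chosen, replace=False)  # uniform, without replacement
    mask = np.zeros(n, dtype=bool)
    mask[chosen] = True
    return mask

rng = np.random.default_rng(0)
mask = sample_partial_index(n=100, p=0.2, rng=rng)  # 20 particles selected
# the force calculator S then only evaluates f_i for i with mask[i] == True
```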
The above requirement is consistent with real applications, since it is difficult to calculate all the microscopic forces due to the computational cost. Furthermore, for efficient calculation of the partial microscopic forces, we assume f satisfies the following sparsity assumption.

Assumption. For a given error tolerance $\epsilon > 0$, there exists a constant $M \ll n$ such that for any $x \sim D$ and $i \in \{1, \dots, n\}$, we can always find an index set $J(x_i) \subset \{1, \dots, n\}$, $|J(x_i)| < M$, which satisfies:

$\|f_i(x_1, \dots, x_n) - f_i(\{x_j\}_{j \in J(x_i)})\|_2 < \epsilon. \qquad (2)$

Intuitively, the assumption implies that the computational cost of the force $f_i$ is independent of the microscopic system dimension N, so our microscopic force calculator S can compute partial forces efficiently. This assumption is prevalent in real-world applications; to better illustrate this, we give two examples here.

The first example concerns molecular dynamics. In molecular dynamics, each $x_i$ represents the position $r_i$ and velocity $v_i$ of the i-th atom, i.e., $x_i = (r_i, v_i) \in \mathbb{R}^6$. Then Eq. (1) becomes Newton's law of motion:

$m_i \ddot{r}_i = F_i(r_1, \dots, r_n). \qquad (4)$

It is common to limit the range of pairwise interactions to a cutoff distance (Allen et al., 2004; Zhou & Liu, 2022; Vollmayr-Lee, 2020). To calculate the force on an atom, we only need to consider its interactions with other atoms within the cutoff.

The second example concerns systems modeled by a partial differential equation (PDE). We consider a time-dependent PDE and apply a finite difference scheme to discretize the spatial derivatives. The resulting semi-discretized equation takes the form of Eq. (1), with each $x_i$ the value at the i-th grid point; $f_i$ only depends on the grid points used in the finite difference approximation of the spatial derivatives.

Let the training data generated by the microscopic force calculator S be $\{x_i, f_{I(x_i)}(x_i)\}_{i=1,\dots,K}$. The data generation procedure is provided in Algorithm 1. We introduce in the next section how to learn the macroscopic dynamics from training data with partial forces.

4 Method

Existing works for macroscopic dynamics identification consist of two parts: dimension reduction and macroscopic dynamics identification (Fu et al., 2023; Chen et al., 2024). We follow these two parts. We start with the most standard part, closure modeling with an autoencoder; next, we turn to the main difficulty of macroscopic dynamics identification from partial forces.

4.1 Autoencoder for Closure Modeling

We use an autoencoder to find the closure $\hat{z} = \hat{\varphi}(x)$ to $z^* = \varphi^*(x)$ such that $z = (z^*, \hat{z})$ forms a closed system. Here we say z forms a closed system if its dynamics $\dot{z}$ depends only on z and not on any external variables. Note that in $z^* = \varphi^*(x)$, $\varphi^*$ is determined beforehand and contains no trainable parameters; this ensures $z^*$ represents the desired macroscopic observables and remains unchanged during the training of the autoencoder. Denoting the encoder by $\varphi = (\varphi^*, \hat{\varphi})$ and the decoder by $\psi$, we minimize the following reconstruction loss:

$L_{rec} = \frac{1}{K}\sum_{i=1}^{K} \|x_i - \psi \circ \varphi(x_i)\|_2^2. \qquad (5)$

We also want $\varphi'(x)\varphi'(x)^T$ to be well-conditioned (see Section 4.2), so we impose a constraint on the condition number of $\varphi'(x)\varphi'(x)^T$:

$L_{cond} = \frac{1}{K}\sum_{i=1}^{K} |\kappa(\varphi'(x_i)\varphi'(x_i)^T) - 1|^2. \qquad (6)$

By enforcing $\varphi'(x_i)\varphi'(x_i)^T$ to be well-conditioned, we also enforce $\varphi'(x_i) \in \mathbb{R}^{d \times N}$, $d \ll N$, to have full row rank, which will be used later. The overall loss to train the autoencoder is:

$L_{AE} = L_{rec} + \lambda_{cond} L_{cond}. \qquad (7)$
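The following sketch shows how Eqs. (5)-(7) could be assembled in PyTorch. It is a minimal sketch, assuming an MLP encoder/decoder so that per-sample Jacobians can be taken with torch.func; the layer sizes are illustrative, not those used in the paper.

```python
import torch
from torch.func import jacrev, vmap

def ae_loss(encoder, decoder, x, lam_cond=1e-6):
    """L_AE = L_rec + lambda_cond * L_cond, cf. Eqs. (5)-(7)."""
    z = encoder(x)
    rec = ((x - decoder(z)) ** 2).sum(dim=1).mean()   # reconstruction loss, Eq. (5)
    J = vmap(jacrev(encoder))(x)                      # (B, d, N) Jacobians phi'(x)
    JJt = J @ J.transpose(1, 2)                       # (B, d, d)
    s = torch.linalg.svdvals(JJt)                     # eigenvalues of symmetric PSD JJ^T
    kappa = s[:, 0] / s[:, -1].clamp_min(1e-12)       # condition number per sample
    cond = ((kappa - 1.0) ** 2).mean()                # condition-number penalty, Eq. (6)
    return rec + lam_cond * cond

enc = torch.nn.Sequential(torch.nn.Linear(100, 64), torch.nn.Tanh(), torch.nn.Linear(64, 4))
dec = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(), torch.nn.Linear(64, 100))
print(ae_loss(enc, dec, torch.randn(8, 100)))
```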
Algorithm 1 Data generation
Require: D: configuration distribution; S: microscopic force calculator; K: training data size; P: partial index sampling strategy
1: for i = 1 to K do
2:     $x_i \sim D$
3:     $I(x_i) \sim P_{x_i}$
4:     calculate $f_{I(x_i)}(x_i)$ using S
5: return $\{x_i, f_{I(x_i)}(x_i)\}_{i=1,\dots,K}$

Algorithm 2 Training procedure
Require: $\{x_i, f_{I(x_i)}(x_i)\}_{i=1,\dots,K}$: training data; B: minibatch size; $\theta_0$: model parameters; opt: optimizer
1: while stopping criterion is not met do
2:     sample $J \subset \{1, \dots, K\}$, $|J| = B$
3:     calculate $L_{x,p}$ in Eq. (12) with $\{x_i, f_{I(x_i)}(x_i)\}_{i \in J}$
4:     $\theta_{t+1} \leftarrow \mathrm{opt}(\theta_t, L_{x,p})$
5: return model parameters

$\lambda_{cond}$ is a hyperparameter that adjusts the weight of $L_{cond}$; it is chosen to be quite small in our experiments, e.g., $10^{-5}$ or $10^{-6}$. The aim of training the decoder $\psi$ is to help the discovery of the closure variables; the decoder is not used for the subsequent macroscopic dynamics identification. To facilitate comparison between models trained with all forces and with partial forces, we train the autoencoder first and freeze it for macroscopic dynamics identification.

4.2 Macroscopic Dynamics Identification

We now address the difficulty of learning from data with partial microscopic forces. Substituting $z = \varphi(x)$ into Eq. (1) and using the chain rule, we obtain the dynamics of z:

$\frac{dz}{dt} = \varphi'(x) f(x), \quad z(0) = \varphi(x_0), \qquad (8)$

where we use $\varphi'(x)$ to denote the Jacobian $\partial_x \varphi(x)$ for notational simplicity. If the dynamics of z is closed, the right-hand side of Eq. (8) depends only on z, and we parametrize it with a neural network $g_\theta(z) \approx \varphi'(x) f(x)$. Since we are only interested in macroscopic dynamics identification, the loss is naturally defined on the macroscopic coordinates:

$L_z(\theta) = \frac{1}{K}\sum_{i=1}^{K} \|\varphi'(x_i) f(x_i) - g_\theta(z_i)\|_2^2. \qquad (9)$

Eq. (9) is commonly used in existing work (Champion et al., 2019; Fries et al., 2022; Bakarji et al., 2022; Park et al., 2024). The main difficulty of $L_z$ is that it includes the matrix-vector product $\varphi'(x) f(x)$. Note that the i-th entry of $\varphi'(x) f(x)$ can be written as $\sum_{j=1}^{n} \varphi'_{ij}(x) f_j(x)$, and it is difficult to find an unbiased estimate of this entry using only a subset of $\{f_j(x)\}_{j=1,\dots,n}$. Thus the accurate calculation of $L_z$ requires the forces on all the microscopic coordinates. The main idea of our method is to map the loss on the macroscopic coordinates back to the microscopic coordinates:

$L_x(\theta) = \frac{1}{K}\sum_{i=1}^{K} \|f(x_i) - (\varphi'(x_i))^\dagger g_\theta(z_i)\|_2^2, \qquad (10)$

where $(\varphi'(x_i))^\dagger \in \mathbb{R}^{N \times d}$ is the Moore-Penrose inverse. Since $\varphi'(x_i)$ has full row rank, $(\varphi'(x_i))^\dagger$ is in fact its right inverse, i.e., $\varphi'(x_i)(\varphi'(x_i))^\dagger$ is an identity matrix. Below we show our main theoretical result:

Theorem 1. Assume for any $x \sim D$, the eigenvalues of $\varphi'(x)\varphi'(x)^T$ are lower bounded by $b_1$ and upper bounded by $b_2$, $0 < b_1 \le b_2$. Then:

$b_1 (L_x(\theta) - C) \le L_z(\theta) \le b_2 (L_x(\theta) - C), \qquad (11)$

where C does not depend on $\theta$ and hence does not affect the optimization.

The proof relies on the singular value decomposition of $\varphi'(x)$; we provide the full proof in Appendix A.1. Theorem 1 states that by minimizing $L_x(\theta)$, we are narrowing the range of $L_z(\theta)$. Hence we want $b_1$ and $b_2$ to be as close as possible; this is why we constrain the condition number of $\varphi'(x)\varphi'(x)^T$ in Eq. (6). In the very special case $b_1 = b_2$, minimizing $L_x(\theta)$ is exactly equivalent to minimizing $L_z(\theta)$. Note that in the loss $L_x$, the term $\|f(x) - (\varphi'(x))^\dagger g_\theta(z)\|_2^2$ can be rewritten as $\sum_{j=1}^{n} \|f_j(x) - (\varphi'(x))^\dagger_j g_\theta(z)\|_2^2$, and $\frac{1}{p}\mathbb{1}_{j \in I(x)} \|f_j(x) - (\varphi'(x))^\dagger_j g_\theta(z)\|_2^2$ can be regarded as an unbiased stochastic estimate of each summand.
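As a concrete illustration, the following hedged sketch evaluates the mapped-back loss of Eq. (10); masking the residual to the labeled coordinates and rescaling by 1/p yields exactly the partial-force loss introduced next. The tensor shapes are assumptions made for the sketch, not the paper's implementation.

```python
import torch

def loss_x_partial(f_partial, mask, J, g_z, p=1.0):
    """f_partial: (B, N) forces, zero at unlabeled coordinates; mask: (B, N) in {0,1};
    J: (B, d, N) encoder Jacobians; g_z: (B, d) latent dynamics g_theta(z)."""
    pinv = torch.linalg.pinv(J)                            # (B, N, d) Moore-Penrose inverse
    pred = torch.bmm(pinv, g_z.unsqueeze(-1)).squeeze(-1)  # (B, N) back-mapped dynamics
    resid = (f_partial - pred) * mask                      # keep labeled coordinates only
    return (resid ** 2).sum(dim=1).mean() / p              # 1/p rescaling, cf. Eq. (13)

# With mask all ones and p = 1, this reduces to L_x of Eq. (10).
```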
Then, we can introduce our loss defined for partial forces:

$L_{x,p}(\theta) = \frac{1}{pK}\sum_{i=1}^{K} \left\|f_{I(x_i)}(x_i) - (\varphi'(x_i))^\dagger_{I(x_i)} g_\theta(z_i)\right\|_2^2. \qquad (12)$

Here the constant $1/p$ multiplying $L_{x,p}$ guarantees:

$\mathbb{E}_{x_1,\dots,x_K}\mathbb{E}_{I(x_1),\dots,I(x_K)} L_{x,p}(\theta) = \mathbb{E}_{x_1,\dots,x_K} L_x(\theta). \qquad (13)$

We provide the full proof of Eq. (13) in Appendix A.2. By training the model with $L_{x,p}$, we can use data with partial forces as stochastic estimates to update model parameters. The full training procedure is provided in Algorithm 2. Note that in Algorithm 1, $I(x_i)$ is sampled from its distribution during data generation; thus $L_{x,p}$ is deterministic once the samples $\{x_i, f_{I(x_i)}(x_i)\}_{i=1,\dots,K}$ are generated. In the limit, the estimation is unbiased:

Theorem 2 (informal). Let $\bar{L}_x(\theta) = \mathbb{E} L_x(\theta)$, $\theta^* \in \arg\min_\theta \bar{L}_x(\theta)$, $\theta^*_{K,p} \in \arg\min_\theta L_{x,p}(\theta)$; then under certain conditions:

$\bar{L}_x(\theta^*_{K,p}) - \bar{L}_x(\theta^*) \xrightarrow{a.s.} 0. \qquad (14)$

The proof utilizes Rademacher complexity, and a crucial assumption used in the proof is the uniform boundedness of $L_{x,p}$. The formal version of Theorem 2 and the complete proof are provided in Appendix A.3. Theorem 2 theoretically justifies that the expected risk $\bar{L}_x(\theta^*_{K,p})$ at the optimal parameter found by $L_{x,p}$ converges to the optimal expected risk $\bar{L}_x(\theta^*)$ as K goes to infinity.

5 Experiments

In this section, we experimentally validate the accuracy, force computation efficiency, and robustness of our method on a variety of microscopic dynamics.

5.1 Force Computation Efficiency

We first consider a one-dimensional spatiotemporal Predator-Prey system, mainly to validate the correctness and force computation efficiency of our method.

Predator-Prey System. The simplified form of the Predator-Prey system with diffusion (Murray, 2003) is

$\frac{\partial u}{\partial t} = u(1 - u - v) + D\frac{\partial^2 u}{\partial x^2}, \quad \frac{\partial v}{\partial t} = av(u - b) + \frac{\partial^2 v}{\partial x^2}, \quad x \in \Omega = [0, 1],\ t \in [0, \infty), \qquad (15)$

where u, v denote the dimensionless populations of the prey and predator, respectively, and a, b, D are three parameters. Eq. (15) describes the complex dynamics of predator-prey interaction, including the pursuit of the predator and the evasion of the prey in ecosystems. We discretize the spatial domain of Eq. (15) into 50 uniform grid points $x_i = (i - \frac{1}{2})\Delta x$, $\Delta x = 0.02$, $1 \le i \le 50$. Let $u(t) = (u(x_1, t), \dots, u(x_{50}, t))$, $v(t) = (v(x_1, t), \dots, v(x_{50}, t))$; then $(u(t), v(t)) \in \mathbb{R}^{100}$ is treated as the microscopic state. After approximating the spatial derivatives in Eq. (15) with the finite difference method, we consider the semi-discrete equation, an N = 100 dimensional ODE, as our microscopic evolution law (see Appendix B.1). We choose the macroscopic observable of interest to be $z^* = (\bar{u}, \bar{v})$, the spatial averages of the prey's and predator's populations:

$\bar{u} = \frac{1}{50}\sum_{i=1}^{50} u(x_i, t), \quad \bar{v} = \frac{1}{50}\sum_{i=1}^{50} v(x_i, t). \qquad (16)$

We find another 2 closure variables using the autoencoder, so the total dimension of the latent space z is 4. We choose D to be the trajectory distribution of the state x. For data generation with partial forces, given x we randomly choose forces on $100p$ microscopic coordinates. For example, if $p = 1/5$, then for each configuration, forces on 20 coordinates are calculated for training.

Figure 2: Mean relative error on the test dataset of the Predator-Prey system. The black dashed line represents test error $3 \times 10^{-3}$.
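To illustrate how the fixed observable $\varphi^*$ of Eq. (16) and the trainable closure $\hat{\varphi}$ combine into the encoder, here is a minimal sketch for the discretized Predator-Prey state; the hidden width is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

class ClosureEncoder(nn.Module):
    """phi = (phi*, phi_hat): parameter-free observable head plus trainable closure head."""
    def __init__(self, N=100, n_closure=2, hidden=128):
        super().__init__()
        self.closure = nn.Sequential(nn.Linear(N, hidden), nn.Tanh(),
                                     nn.Linear(hidden, n_closure))

    def forward(self, x):                    # x: (B, 100), concatenation of u and v
        u, v = x[:, :50], x[:, 50:]
        z_star = torch.stack([u.mean(dim=1), v.mean(dim=1)], dim=1)  # Eq. (16), fixed
        return torch.cat([z_star, self.closure(x)], dim=1)           # z = (z*, z_hat), dim 4
```

Because the observable head has no parameters, $z^*$ stays pinned to $(\bar{u}, \bar{v})$ while only the closure coordinates are learned.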
Results. Fig. 2 shows the results on the test dataset, which consists of 100 trajectories (for more details, see Appendix Table 2). For models trained with partial microscopic forces, we report the equivalent number of training data with full forces throughout the paper. For example, for a model trained with $L_{x,p}(p = 1/5)$ on $3 \times 10^3$ data, we report the number of training data as $3 \times 10^3 \times 0.2 = 600$. The test error is defined as the mean relative error of the macroscopic observables between the ground truth and the predicted trajectories, as in Appendix Eq. (43).

First, we observe that the mean relative error of all the models is around $10^{-4}$ when the number of training data is large enough. This tells us that training with $L_{x,p}$ is correct and accurate, which is consistent with Theorem 2. The predicted trajectories fit the ground truth trajectories quite well (see Appendix Fig. 6 and Fig. 7). We can conclude from Fig. 2 that, for the same number of training data, $L_{x,p}$ with smaller p (1/4, 1/5) performs better. Similarly, to achieve the same performance, $L_{x,p}$ with smaller p requires less training data. We set the error tolerance to $e_{tol} = 3 \times 10^{-3}$ and investigate how much training data is required to reach it. In Fig. 2, the x-coordinate of the intersection point between the black dashed line and each curve indicates the minimum data size required. If we rank the models by their minimal required training data, then $L_{x,p}(p = 1/5) \approx L_{x,p}(p = 1/4) < L_{x,p}(p = 1/2) < L_{x,p}(p = 3/4) < L_x$. Models trained with $L_{x,p}(p = 1/4, 1/5)$ require less data to reach $e_{tol}$, or equivalently, fewer force computations. This validates the force computation efficiency of our method. One explanation could be that there is much redundant information in the forces acting on all the microscopic coordinates. By using partial microscopic forces, $L_{x,p}$ can explore more configurations x given the same size of training data, and thus make use of more useful information. Another observation from Fig. 2 is that as the training data size increases, the gap between models trained with different p narrows. This is because as more data are provided, the data contain almost all the information of the Predator-Prey system, so additional information does not lead to significant improvement.

5.2 Robustness to Different Latent Structures

Having validated the correctness and force computation efficiency of our method, we are ready to apply it to a variety of latent structures. We tackle the Lennard-Jones system in this subsection and validate the robustness of our method to different latent model structures. Three latent model structures are considered: MLP, OnsagerNet (Yu et al., 2021), and GFINNs (Zhang et al., 2022). Both OnsagerNet and GFINNs endow the latent dynamical model with certain thermodynamic structure to ensure stability and interpretability; the specific implementations of these two models are slightly different.

Lennard-Jones System. The Lennard-Jones system is widely used in molecular simulation to study phase transitions, crystallization, and macroscopic properties of a system (Hansen & Verlet, 1969; Bengtzelius, 1986; Lin et al., 2003; Luo et al., 2004). The Lennard-Jones potential describes the interaction between two atoms i and j through a potential of the following form:

$V_{ij}(r) = \begin{cases} 4\epsilon_{ij}\left[(\sigma_{ij}/r)^{12} - (\sigma_{ij}/r)^{6}\right] & \text{if } r \le r_{cut}, \\ 0 & \text{if } r > r_{cut}. \end{cases} \qquad (17)$
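The truncated potential of Eq. (17) makes the sparsity assumption concrete: the force on atom i only involves neighbors within $r_{cut}$. Below is a minimal sketch (a naive distance filter instead of a proper neighbor list, and a cubic periodic box is assumed):

```python
import numpy as np

def lj_force_on_atom(i, pos, box, eps=1.0, sigma=1.0, rcut=2.5):
    """Force on atom i from the truncated Lennard-Jones potential, Eq. (17)."""
    rij = pos[i] - np.delete(pos, i, axis=0)   # displacements to all other atoms
    rij -= box * np.round(rij / box)           # minimum-image convention
    r2 = (rij ** 2).sum(axis=1)
    near = r2 < rcut ** 2                      # the neighbor set J(x_i) of Eq. (2)
    r2, rij = r2[near], rij[near]
    sr6 = (sigma ** 2 / r2) ** 3
    coef = 24.0 * eps * (2.0 * sr6 ** 2 - sr6) / r2   # magnitude of -dV/dr divided by r
    return (coef[:, None] * rij).sum(axis=0)

pos = np.random.default_rng(1).uniform(0.0, 10.0, size=(800, 3))
print(lj_force_on_atom(0, pos, box=10.0))
```

With a cell or Verlet neighbor list, the cost per selected atom is O(M) rather than O(n), which is what makes partial force computation efficient.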
Table 1: Summary of the results on each system. Results of the Predator-Prey and Lennard-Jones (small) systems are taken from Section 5.1 and Section 5.2. For each system, $L_z$ and $L_{x,p}$ are trained with the same size of data.

| System | Micro dim N | Observables | Latent dim d | Partial labels p | $L_z$ | $L_{x,p}$ |
|---|---|---|---|---|---|---|
| Predator-Prey system | 100 | $\bar{u}, \bar{v}$ | 4 | 1/5 | 3.19 ± 0.60 × 10⁻³ | 1.34 ± 0.16 × 10⁻³ |
| Allen-Cahn system | 40000 | free energy E(v) | 16 | 1/25 | 6.93 ± 2.80 × 10⁻³ | 3.98 ± 1.58 × 10⁻³ |
| Lennard-Jones system (small) | 4800 | temperature T | 32 | 1/16 | 4.45 ± 2.03 × 10⁻³ | 1.17 ± 0.18 × 10⁻³ |
| Lennard-Jones system (large) | 307200 | temperature T | 32 | 1/1024 | - | 4.96 ± 0.56 × 10⁻³ |

Figure 3: Results on the Lennard-Jones system with 800 atoms and N = 4800, for the MLP, OnsagerNet, and GFINNs latent structures. Forces on 50 atoms are used to train $L_{x,p}$ for all the latent model structures. Each model is trained with ten repeats.

All the results in this experiment are shown in reduced Lennard-Jones units. We consider a three-dimensional Lennard-Jones fluid with $N_{atoms} = 800$ atoms of the same type. The microscopic state consists of the positions and velocities of the 800 atoms; thus the microscopic dimension is N = 4800. We simulate the Lennard-Jones system under the NVE ensemble using LAMMPS (Thompson et al., 2022). We choose the instantaneous temperature T as our macroscopic observable:

$T = \frac{2}{3(N_{atoms} - 1)}\sum_{i=1}^{N_{atoms}} \frac{m_i v_i^2}{2}, \qquad (18)$

where $v_i$ is the velocity of the i-th atom and $m_i = 1$. We find another 31 closure variables using the autoencoder, so the latent dimension is 32. In this experiment, we again adopt the trajectory distribution of the microscopic state x for D. For data generation with partial forces, we choose p = 1/16: for each x we randomly choose 50 atoms for force computations.

Results. All the models are trained with the same size of data. Fig. 3 shows the test error on 10 test trajectories. The test errors of $L_{x,p}(p = 1/16)$ are relatively small ($\sim 10^{-3}$), which validates the accuracy of our model on the Lennard-Jones system. It is easy to observe from Fig. 3 that for all the latent model structures, models trained with $L_{x,p}$ consistently outperform those trained with $L_z$. This validates that $L_{x,p}$ is robust across different latent model structures.
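For reference, the observable of Eq. (18) is itself cheap to evaluate from the microscopic state; a small sketch in reduced units ($k_B = 1$):

```python
import numpy as np

def instantaneous_temperature(vel, masses):
    """Instantaneous temperature T of Eq. (18), reduced Lennard-Jones units."""
    n_atoms = vel.shape[0]
    kinetic = 0.5 * (masses[:, None] * vel ** 2).sum()   # sum_i m_i v_i^2 / 2
    return 2.0 * kinetic / (3.0 * (n_atoms - 1.0))       # 3(N_atoms - 1) degrees of freedom

vel = np.random.default_rng(2).normal(size=(800, 3))
print(instantaneous_temperature(vel, masses=np.ones(800)))
```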
5.3 Robustness to Different Microscopic Dynamics

We have already validated the accuracy and force computation efficiency of $L_{x,p}$ on the Predator-Prey system and the Lennard-Jones system, but their microscopic dimensions are still not very large. In this subsection, we focus on the robustness of our method to different systems, including those with much larger microscopic dimension. We consider two large systems: the Allen-Cahn system and a larger Lennard-Jones system with 51200 atoms.

Allen-Cahn System. The Allen-Cahn equation is widely used to model the phase transition process in binary mixtures (Allen & Cahn, 1979; Del Pino et al., 2008; Shen & Yang, 2010; Kim et al., 2021; Yang et al., 2023). We consider the 2-dimensional Allen-Cahn equation with zero Neumann boundary condition on a bounded domain:

$\frac{\partial v}{\partial t} = \epsilon^2 \Delta v - F'(v) \text{ on } \Omega = [0, 1] \times [0, 1], \quad \partial_n v = 0 \text{ on } \partial\Omega, \qquad (19)$

where v ($-1 \le v \le 1$) denotes the difference of the concentrations of the two phases. F(v) is usually chosen to be the double-well potential $F(v) = \frac{1}{4}(v^2 - 1)^2$. The Allen-Cahn equation is the $L^2$ gradient flow of the free energy functional $E(v) \in \mathbb{R}$ in Eq. (20) (Bartels, 2015):

$E(v) = \int_\Omega \left(\frac{\epsilon^2}{2}|\nabla v|^2 + F(v)\right) dx\, dy. \qquad (20)$

The free energy $z^* = E(v)$ is a macroscopic observable of wide interest; hence we choose E(v) as our target macroscopic observable. The spatial domain is discretized into 200 × 200 grids, so the dimension of the microscopic system is N = 40000. We find another 15 closure variables using the autoencoder; hence the total dimension of the macroscopic system is 16, much smaller than the dimension of the microscopic system. We consider D to be the trajectory distribution of x. We choose p = 1/25: each time, the forces on 1600 grid points are calculated for training $L_{x,p}$.

Lennard-Jones System (large). To further demonstrate the capacity of our method, we scale up the Lennard-Jones system of Section 5.2 to encompass 51200 atoms, so N = 307200. The size of the simulation box is increased from 10 × 10 × 10 to 40 × 40 × 40 to keep the density unchanged.

Results. For a summary of the experiments and the results, we refer the reader to Table 1. Note that for the Lennard-Jones system (large), we still use the forces on 50 atoms for training, so p = 1/1024. For the training of $L_{x,p}(p = 1/1024)$, only 5000 configurations with partial forces are used due to memory limits, which is equivalent to $5000/1024 \approx 5$ training data with forces on all the atoms. Obviously, a training set of size 5 is far too small, so the results of $L_z$ are not reported for this system. From the results shown in Table 1, one can observe that for a variety of problems, including those modeled by partial differential equations or molecular dynamics simulations, $L_{x,p}$ consistently outperforms $L_z$. This shows the robustness of our method on a variety of systems. Moreover, the success of our method on the Lennard-Jones system (large) demonstrates its capability and efficiency when scaled to very large systems.

Figure 4: Number of force computations required to achieve $e_{tol} = 3 \times 10^{-3}$ on Lennard-Jones systems of different sizes. Forces on 50 atoms are used to train $L_{x,p}$ for systems of different sizes.

We also compare the number of force computations required for models trained with $L_z$ and $L_{x,p}$ to reach the test error $e_{tol} = 3 \times 10^{-3}$. Fig. 4 shows the results on Lennard-Jones systems of different sizes. Systems with 800, 2700, 6400, and 21600 atoms are considered, and the density is 0.8 for all of them. The number of force computations here refers to the total number of atomic forces used. For example, if $L_{x,p}$ uses 100 configurations to train, and for each configuration forces on 50 atoms are calculated, then the number of force computations is 100 × 50 = 5000. From Fig. 4 we can observe that as the system size increases, the number of force computations required by $L_z$ keeps increasing. In the experiment, we find that the number of training data does not change much; the increase in the number of force computations is mainly because, as the system size increases, more force computations are required for each configuration. In contrast, the number of force computations required by $L_{x,p}$ even decreases slightly. One possible explanation is that, as the system size increases, the dynamics become less fluctuating and easier to learn.

6 Conclusion

We present a framework for modeling the dynamics of macroscopic observables from partial computation of microscopic forces. We theoretically and experimentally demonstrate the accuracy, force computation efficiency, and robustness of our method on different problems. Finally, we apply our method to a very large Lennard-Jones system containing 51200 atoms. While our method can learn the macroscopic dynamics from partial computation of microscopic forces, it relies on the sparsity assumption. For systems that do not satisfy the sparsity assumption, partial computation of the microscopic forces is no longer efficient, and thus it is not beneficial to learn from partial force computations.
For example, in the McKean-Vlasov system, the force on each microscopic coordinate depends on the collective behavior of all the other coordinates (Méléard, 1996). Another limitation is the structure of the autoencoder. For particle systems such as the Lennard-Jones system, an ideal encoder should be permutation-invariant; currently, we use an MLP for the encoder, which can be improved. Additionally, our method assumes the microscopic state to be sampled from a distribution D. We choose D to be the trajectory distribution in the experiments, but in reality the trajectory distribution of large systems may be impossible to obtain. Active learning is commonly applied to efficiently select microscopic configurations for training (Ang et al., 2021; Zhang et al., 2019; Farache et al., 2022; Kulichenko et al., 2023; Duschatko et al., 2024). It is of interest to combine active learning with our proposed method to overcome the difficulty of the choice of D. Furthermore, our method assumes the microscopic dynamics to be deterministic; another future direction could be generalizing our method to stochastic systems.

Acknowledgments

This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG3-RP-2022-028).

References

Michael P Allen et al. Introduction to molecular dynamics simulation. Computational Soft Matter: From Synthetic Polymers to Proteins, 23(1):1–28, 2004.

Samuel M Allen and John W Cahn. A microscopic theory for antiphase boundary motion and its application to antiphase domain coarsening. Acta Metallurgica, 27(6):1085–1095, 1979.

Shi Jun Ang, Wujie Wang, Daniel Schwalbe-Koda, Simon Axelrod, and Rafael Gómez-Bombarelli. Active learning accelerates ab initio molecular dynamics on reactive energy surfaces. Chem, 7(3):738–751, 2021.

Ibrahim Ayed, Emmanuel de Bézenac, Arthur Pajot, Julien Brajard, and Patrick Gallinari. Learning dynamical systems from partial observations. arXiv preprint arXiv:1902.11136, 2019.

Joseph Bakarji, Kathleen Champion, J Nathan Kutz, and Steven L Brunton. Discovering governing equations from partial measurements with deep delay autoencoders. arXiv preprint arXiv:2201.05136, 2022.

Sören Bartels. The Allen–Cahn equation, pp. 153–182. Springer International Publishing, Cham, 2015.

U Bengtzelius. Dynamics of a Lennard-Jones system close to the glass transition. Physical Review A, 34(6):5059, 1986.

Kathleen Champion, Bethany Lusch, J Nathan Kutz, and Steven L Brunton. Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences, 116(45):22445–22451, 2019.

Xiaoli Chen, Beatrice W Soh, Zi-En Ooi, Eleonore Vissol-Gaudin, Haijun Yu, Kostya S Novoselov, Kedar Hippalgaonkar, and Qianxiao Li. Constructing custom thermodynamics using deep learning. Nature Computational Science, 4(1):66–85, 2024.

Steven Dajnowicz, Garvit Agarwal, James M Stevenson, Leif D Jacobson, Farhad Ramezanghorbani, Karl Leswing, Richard A Friesner, Mathew D Halls, and Robert Abel. High-dimensional neural network potential for liquid electrolyte simulations. The Journal of Physical Chemistry B, 126(33):6271–6280, 2022.

Manuel Del Pino, Michał Kowalczyk, and Juncheng Wei. The Toda system and clustering interfaces in the Allen–Cahn equation. Archive for Rational Mechanics and Analysis, 190(1):141–187, 2008.

Blake R Duschatko, Jonathan Vandermause, Nicola Molinari, and Boris Kozinsky. Uncertainty driven active learning of coarse grained free energy models. npj Computational Materials, 10(1):9, 2024.
David E Farache, Juan C Verduzco, Zachary D McClure, Saaketh Desai, and Alejandro Strachan. Active learning and molecular dynamics simulations to find high melting temperature alloys. Computational Materials Science, 209:111386, 2022.

Stefania Fresca, Andrea Manzoni, Luca Dedè, and Alfio Quarteroni. Deep learning-based reduced order models in cardiac electrophysiology. PLoS ONE, 15(10):e0239416, 2020.

William D Fries, Xiaolong He, and Youngsoo Choi. LaSDI: Parametric latent space dynamics identification. Computer Methods in Applied Mechanics and Engineering, 399:115436, 2022.

Xiang Fu, Tian Xie, Nathan J Rebello, Bradley Olsen, and Tommi S Jaakkola. Simulate time-integrated coarse-grained molecular dynamics with multi-scale graph networks. Transactions on Machine Learning Research, 2023.

Jürgen Hafner, Christopher Wolverton, and Gerbrand Ceder. Toward computational materials design: the impact of density functional theory on materials research. MRS Bulletin, 31(9):659–668, 2006.

Jean-Pierre Hansen and Loup Verlet. Phase transitions of the Lennard-Jones system. Physical Review, 184(1):151, 1969.

Quercus Hernandez, Alberto Badias, David Gonzalez, Francisco Chinesta, and Elias Cueto. Deep learning of thermodynamics-aware reduced-order models from data. Computer Methods in Applied Mechanics and Engineering, 379:113763, 2021.

Zijie Huang, Yizhou Sun, and Wei Wang. Learning continuous system dynamics from irregularly-sampled partial observations. Advances in Neural Information Processing Systems, 33:16177–16187, 2020.

Brooke E Husic, Nicholas E Charron, Dominik Lemm, Jiang Wang, Adrià Pérez, Maciej Majewski, Andreas Krämer, Yaoyi Chen, Simon Olsson, Gianni de Fabritiis, et al. Coarse graining molecular dynamics with graph neural networks. The Journal of Chemical Physics, 153(19), 2020.

Weile Jia, Han Wang, Mohan Chen, Denghui Lu, Lin Lin, Roberto Car, Weinan E, and Linfeng Zhang. Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–14. IEEE, 2020.

Ioannis G Kevrekidis, C William Gear, James M Hyman, Panagiotis G Kevrekidis, Olof Runborg, Constantinos Theodoropoulos, et al. Equation-free, coarse-grained multiscale computation: enabling microscopic simulators to perform system-level analysis. Commun. Math. Sci., 1(4):715–762, 2003.

Yongho Kim, Gilnam Ryu, and Yongho Choi. Fast and accurate numerical solution of Allen–Cahn equation. Mathematical Problems in Engineering, 2021:1–12, 2021.

Maksim Kulichenko, Kipton Barros, Nicholas Lubbers, Ying Wai Li, Richard Messerly, Sergei Tretiak, Justin S Smith, and Benjamin Nebgen. Uncertainty-driven dynamics for active learning of interatomic potentials. Nature Computational Science, 3(3):230–239, 2023.

Kookjin Lee and Kevin T Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics, 404:108973, 2020.

Seungjoon Lee, Mahdi Kooshkbaghi, Konstantinos Spiliotis, Constantinos I Siettos, and Ioannis G Kevrekidis. Coarse-scale PDEs from fine-scale observations via machine learning. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30(1), 2020.

Shiang-Tai Lin, Mario Blanco, and William A Goddard III. The two-phase model for calculating thermodynamic properties of liquids from molecular dynamics: Validation for the phase diagram of Lennard-Jones fluids. The Journal of Chemical Physics, 119(22):11792–11805, 2003.
Ping Liu, Giovanni Samaey, C William Gear, and Ioannis G Kevrekidis. On the acceleration of spatially distributed agent-based computations: A patch dynamics scheme. Applied Numerical Mathematics, 92:54–69, 2015.

Peter Y Lu, Joan Ariño Bernad, and Marin Soljačić. Discovering sparse interpretable dynamics from partial observations. Communications Physics, 5(1):206, 2022.

Sheng-Nian Luo, Alejandro Strachan, and Damian C Swift. Nonequilibrium melting and crystallization of a model Lennard-Jones system. The Journal of Chemical Physics, 120(24):11640–11649, 2004.

Zhaolong Luo, Xinming Qin, Lingyun Wan, Wei Hu, and Jinlong Yang. Parallel implementation of large-scale linear scaling density functional theory calculations with numerical atomic orbitals in HONPAS. Frontiers in Chemistry, 8:589910, 2020.

Sylvie Méléard. Asymptotic behaviour of some interacting particle systems; McKean-Vlasov and Boltzmann models, pp. 42–95. Springer Berlin Heidelberg, Berlin, Heidelberg, 1996. ISBN 978-3-540-68513-5. doi: 10.1007/BFb0093177. URL https://doi.org/10.1007/BFb0093177.

JD Murray. Multi-species waves and practical applications. Mathematical Biology: II: Spatial Models and Biomedical Applications, pp. 1–70, 2003.

Albert Musaelian, Simon Batzner, Anders Johansson, and Boris Kozinsky. Scaling the leading accuracy of deep equivariant models to biomolecular simulations of realistic size. In SC23: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12. IEEE, 2023.

Said Ouala, Duong Nguyen, Lucas Drumetz, Bertrand Chapron, Ananda Pascual, Fabrice Collard, Lucile Gaultier, and Ronan Fablet. Learning latent dynamics for partially observed chaotic systems. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30(10), 2020.

Jun Sur Richard Park, Siu Wun Cheung, Youngsoo Choi, and Yeonjong Shin. tLaSDI: Thermodynamics-informed latent space dynamics identification. arXiv preprint arXiv:2403.05848, 2024.

David Ruelle and Floris Takens. On the nature of turbulence. Les rencontres physiciens-mathématiciens de Strasbourg-RCP25, 12:1–44, 1971.

Giovanni Samaey, Ioannis G Kevrekidis, and Dirk Roose. Patch dynamics with buffers for homogenization problems. Journal of Computational Physics, 213(1):264–287, 2006.

Tim Sauer, James A Yorke, and Martin Casdagli. Embedology. Journal of Statistical Physics, 65:579–616, 1991.

Wilhelmus HA Schilders, Henk A Van der Vorst, and Joost Rommes. Model order reduction: theory, research aspects and applications, volume 13. Springer, 2008.

Andreas Schlaginhaufen, Philippe Wenk, Andreas Krause, and Florian Dorfler. Learning stable deep dynamics models for partially observed or delayed dynamical systems. Advances in Neural Information Processing Systems, 34:11870–11882, 2021.

Jie Shen and Xiaofeng Yang. Numerical approximations of Allen-Cahn and Cahn-Hilliard equations. Discrete Contin. Dyn. Syst., 28(4):1669–1691, 2010.

George Stepaniants, Alasdair D Hastewell, Dominic J Skinner, Jan F Totz, and Jörn Dunkel. Discovering dynamics and parameters of nonlinear oscillatory and chaotic systems from partial observations. arXiv preprint arXiv:2304.04818, 2023.

Floris Takens. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980: Proceedings of a Symposium Held at the University of Warwick 1979/80, pp. 366–381. Springer, 2006.

A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in 't Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J.
Tranchida, C. Trott, and S. J. Plimpton. LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales. Comp. Phys. Comm., 271:108171, 2022. doi: 10.1016/j.cpc.2021.108171.

Katharina Vollmayr-Lee. Introduction to molecular dynamics simulations. American Journal of Physics, 88(5):401–422, 2020.

Martin J. Wainwright. Uniform laws of large numbers, pp. 98–120. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019.

Jiang Wang, Simon Olsson, Christoph Wehmeyer, Adrià Pérez, Nicholas E Charron, Gianni De Fabritiis, Frank Noé, and Cecilia Clementi. Machine learning of coarse-grained molecular dynamics force fields. ACS Central Science, 5(5):755–767, 2019.

Junxiang Yang, Yibao Li, Chaeyoung Lee, Yongho Choi, and Junseok Kim. Fast evolution numerical method for the Allen–Cahn equation. Journal of King Saud University–Science, 35(1):102430, 2023.

Haijun Yu, Xinyuan Tian, Weinan E, and Qianxiao Li. OnsagerNet: Learning stable and interpretable dynamics using a generalized Onsager principle. Physical Review Fluids, 6(11):114402, 2021.

Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, et al. DeePCG: Constructing coarse-grained models via deep neural networks. The Journal of Chemical Physics, 149(3), 2018.

Linfeng Zhang, De-Ye Lin, Han Wang, Roberto Car, and Weinan E. Active learning of uniformly accurate interatomic potentials for materials simulation. Physical Review Materials, 3(2):023804, 2019.

Zhen Zhang, Yeonjong Shin, and George Em Karniadakis. GFINNs: GENERIC formalism informed neural networks for deterministic and stochastic dynamical systems. Philosophical Transactions of the Royal Society A, 380(2229):20210207, 2022.

Kun Zhou and Bo Liu. Chapter 2 - Potential energy functions. In Kun Zhou and Bo Liu (eds.), Molecular Dynamics Simulation, pp. 41–65. Elsevier, 2022.

Let $x = (x_1, \dots, x_n) \in \mathcal{X} \subset \mathbb{R}^N$, $y = f(x) \in \mathcal{Y} \subset \mathbb{R}^N$, $\theta \in \Theta$, where $\Theta$ is the set of our model parameters. Write $D_K = \{x_i, y_i\}_{i=1,\dots,K}$; then $L_z(\theta)$ and $L_x(\theta)$ can be written as

$L_z(\theta) = \frac{1}{K}\sum_{(x,y)\in D_K} \|\varphi'(x)y - g_\theta(z)\|_2^2, \qquad L_x(\theta) = \frac{1}{K}\sum_{(x,y)\in D_K} \|y - (\varphi'(x))^\dagger g_\theta(z)\|_2^2. \qquad (21)$

In our method, we map the training procedure from the macroscopic coordinates to the microscopic coordinates and use partial computation of the microscopic forces to train. We treat $L_z(\theta)$ as the baseline and give a detailed theoretical analysis of the possible error introduced by using $L_{x,p}(\theta)$. The error can be controlled in two parts: (i) converting the loss from z space to x space, i.e., the error introduced by using $L_x(\theta)$; (ii) using partial labels, i.e., the error between $L_{x,p}(\theta)$ and $L_x(\theta)$. We analyze the first and second parts of the error accordingly (Appendix A). More experimental details are provided in Appendix B.

A Theoretical Analysis

A.1 Proof of Theorem 1

Theorem. Assume for any $x \sim D$, the eigenvalues of $\varphi'(x)\varphi'(x)^T$ are lower bounded by $b_1$ and upper bounded by $b_2$, $0 < b_1 \le b_2$. Then:

$b_1(L_x(\theta) - C) \le L_z(\theta) \le b_2(L_x(\theta) - C), \qquad (22)$

where C does not depend on $\theta$ and hence does not affect the optimization.

Proof. We can write $\varphi'(x)$ in the following form by leveraging the singular value decomposition:

$\varphi'(x) = U\Sigma V^T, \qquad (23)$

where $\varphi'(x) \in \mathbb{R}^{d \times N}$, $\Sigma \in \mathbb{R}^{d \times N}$ is a rectangular diagonal matrix, $U \in \mathbb{R}^{d \times d}$ and $V \in \mathbb{R}^{N \times N}$ are two orthogonal matrices, and $d \ll N$. U, $\Sigma$, V actually depend on x, but for simplicity we omit this dependence in the notation. During the training of the autoencoder, we enforce $\varphi'(x)$ to have full row rank, so the diagonal entries are all nonzero, i.e., $\Sigma_{i,i} = \lambda_i(x) \ne 0$, $i = 1, \dots, d$.
We define $\Sigma^\dagger \in \mathbb{R}^{N \times d}$ to be the rectangular diagonal matrix whose diagonal entries are $(\Sigma^\dagger)_{i,i} = \lambda_i^{-1}(x) \ne 0$, $i = 1, \dots, d$, with all remaining entries zero; $\Sigma^\dagger$ is in fact the Moore-Penrose inverse of $\Sigma$. Then $(\varphi'(x))^\dagger$ can be calculated by

$(\varphi'(x))^\dagger = V\Sigma^\dagger U^T. \qquad (24)$

If we denote the i-th column of V by $v_i$ and the i-th column of U by $u_i$, then $V = (v_1, \dots, v_N)$, $U = (u_1, \dots, u_d)$, and we can rewrite $L_z(\theta)$ and $L_x(\theta)$:

$L_z(\theta) = \frac{1}{K}\sum_{(x,y)\in D_K} \|\varphi'(x)y - g_\theta(z)\|_2^2 = \frac{1}{K}\sum_{(x,y)\in D_K} \|U\Sigma V^T y - g_\theta(z)\|_2^2 = \frac{1}{K}\sum_{(x,y)\in D_K} \|U\Sigma V^T y - UU^T g_\theta(z)\|_2^2$
$= \frac{1}{K}\sum_{(x,y)\in D_K} (\Sigma V^T y - U^T g_\theta(z))^T U^T U (\Sigma V^T y - U^T g_\theta(z)) = \frac{1}{K}\sum_{(x,y)\in D_K} \|\Sigma V^T y - U^T g_\theta(z)\|_2^2$
$= \frac{1}{K}\sum_{(x,y)\in D_K} \sum_{i=1}^{d} |\lambda_i(x) v_i^T y - u_i^T g_\theta(z)|^2, \qquad (25)$

$L_x(\theta) = \frac{1}{K}\sum_{(x,y)\in D_K} \|y - (\varphi'(x))^\dagger g_\theta(z)\|_2^2 = \frac{1}{K}\sum_{(x,y)\in D_K} \|y - V\Sigma^\dagger U^T g_\theta(z)\|_2^2 = \frac{1}{K}\sum_{(x,y)\in D_K} \|VV^T y - V\Sigma^\dagger U^T g_\theta(z)\|_2^2$
$= \frac{1}{K}\sum_{(x,y)\in D_K} (V^T y - \Sigma^\dagger U^T g_\theta(z))^T V^T V (V^T y - \Sigma^\dagger U^T g_\theta(z)) = \frac{1}{K}\sum_{(x,y)\in D_K} \|V^T y - \Sigma^\dagger U^T g_\theta(z)\|_2^2$
$= \frac{1}{K}\sum_{(x,y)\in D_K} \sum_{i=1}^{d} |v_i^T y - \lambda_i^{-1}(x) u_i^T g_\theta(z)|^2 + \frac{1}{K}\sum_{(x,y)\in D_K} \sum_{i=d+1}^{N} |v_i^T y|^2$
$= \frac{1}{K}\sum_{(x,y)\in D_K} \sum_{i=1}^{d} \lambda_i^{-2}(x)\,|\lambda_i(x) v_i^T y - u_i^T g_\theta(z)|^2 + \frac{1}{K}\sum_{(x,y)\in D_K} \sum_{i=d+1}^{N} |v_i^T y|^2. \qquad (26)$

We define:

$\hat{L}_x(\theta) = \frac{1}{K}\sum_{(x,y)\in D_K} \sum_{i=1}^{d} \lambda_i^{-2}(x)\,|\lambda_i(x) v_i^T y - u_i^T g_\theta(z)|^2, \qquad C = \frac{1}{K}\sum_{(x,y)\in D_K} \sum_{i=d+1}^{N} |v_i^T y|^2, \qquad (27)$

so that $L_x(\theta) = \hat{L}_x(\theta) + C$, where C does not depend on $\theta$, and:

$\min_\theta L_x(\theta) \iff \min_\theta \hat{L}_x(\theta). \qquad (28)$

Comparing $L_z(\theta)$ and $\hat{L}_x(\theta)$, we observe that the only difference is that every term $|\lambda_i(x) v_i^T y - u_i^T g_\theta(z)|^2$ carries a constant weight $\lambda_i^{-2}(x)$. Hence $\hat{L}_x(\theta)$ is a weighted version of $L_z(\theta)$. Note that if the singular values of $\varphi'(x)$ are $\lambda_i(x)$, $i = 1, \dots, d$, then the eigenvalues of $\varphi'(x)\varphi'(x)^T$ are $\lambda_i^2(x)$, $i = 1, \dots, d$. Since the eigenvalues of $\varphi'(x)\varphi'(x)^T$ are lower bounded by $b_1 > 0$ and upper bounded by $b_2$, i.e., $\forall x \in \mathcal{X}$, $0 < b_1 \le \lambda_i^2(x) \le b_2$, we have:

$b_2^{-1} L_z(\theta) \le L_x(\theta) - C \le b_1^{-1} L_z(\theta), \qquad (29)$

or equivalently,

$b_1(L_x(\theta) - C) \le L_z(\theta) \le b_2(L_x(\theta) - C). \qquad (30)$

By minimizing $\hat{L}_x(\theta)$, we are narrowing the region of $L_z(\theta)$; hence we want $b_1$ and $b_2$ to be as close as possible. In the extreme case $b_1 = b_2$, minimizing $\hat{L}_x(\theta)$ is exactly equivalent to minimizing $L_z(\theta)$. Another observation is that $\hat{L}_x(\theta)$ is a weighted version of $L_z(\theta)$, and if there exists i such that $\lambda_i(x)^2$ is too small compared to the others, that term will dominate the weighted sum in Eq. (27). The above insights guide us to constrain the condition number of $\varphi'(x)\varphi'(x)^T$ during the training of the autoencoder, i.e., we require $\varphi'(x)\varphi'(x)^T$ to be well-conditioned through Eq. (6).

A.2 Proof of Eq. (13)

$\mathbb{E}_{x_1,\dots,x_K}\mathbb{E}_{I(x_1),\dots,I(x_K)} L_{x,p}(\theta) = \mathbb{E}\left[\frac{1}{pK}\sum_{i=1}^{K} \|f_{I(x_i)}(x_i) - (\varphi'(x_i))^\dagger_{I(x_i)} g_\theta(z_i)\|_2^2\right]$
$= \frac{1}{p}\mathbb{E}_x \mathbb{E}_{I(x)} \|f_{I(x)}(x) - (\varphi'(x))^\dagger_{I(x)} g_\theta(z)\|_2^2 = \frac{1}{p}\mathbb{E}_x \mathbb{E}_{I(x)} \sum_{i=1}^{n} I_i(x)\,\|f_i(x) - (\varphi'(x))^\dagger_i g_\theta(z)\|_2^2$
$= \frac{1}{p}\mathbb{E}_x \sum_{i=1}^{n} \mathbb{E}_{I(x)}[I_i(x)]\,\|f_i(x) - (\varphi'(x))^\dagger_i g_\theta(z)\|_2^2 = \mathbb{E}_x \sum_{i=1}^{n} \|f_i(x) - (\varphi'(x))^\dagger_i g_\theta(z)\|_2^2$
$= \mathbb{E}_x \|f(x) - (\varphi'(x))^\dagger g_\theta(z)\|_2^2 = \mathbb{E}_{x_1,\dots,x_K} L_x(\theta). \qquad (31)$
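The identity proved above can also be checked numerically. The following toy sketch uses random stand-ins for $f(x)$ and $(\varphi'(x))^\dagger g_\theta(z)$ and verifies that the 1/p-rescaled masked loss matches the full loss in expectation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 50, 0.2, 100_000
f = rng.normal(size=n)              # stand-in for the force vector f(x)
pred = rng.normal(size=n)           # stand-in for (phi'(x))^dagger g_theta(z)
L_x = ((f - pred) ** 2).sum()       # full-force loss term for this sample

masks = np.zeros((K, n))            # K masks, each with exactly n*p ones
cols = np.argsort(rng.random((K, n)), axis=1)[:, : int(n * p)]
np.put_along_axis(masks, cols, 1.0, axis=1)
L_xp = ((masks * (f - pred)) ** 2).sum(axis=1) / p   # per-draw partial losses
print(L_xp.mean(), L_x)             # the two values agree up to Monte-Carlo error
```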
A.3 Proof of Theorem 2

In this section we prove the behavior of the minimizer found by $L_{x,p}$ in the limit. Our proof relies on statistical learning theory, and in particular Rademacher complexity. We first provide some background. Let $\mathcal{H}$ be a family of real-valued functions with domain $\mathcal{W}$, integrable w.r.t. P, where P is a probability distribution over $\mathcal{W}$. $W_n = (w_1, \dots, w_n)$ is a collection of i.i.d. samples from the probability distribution P defined over $\mathcal{W}$.

Definition 1. Let $\mathcal{H}$, $W_n$, P be defined as before. The empirical Rademacher complexity of $\mathcal{H}$ with respect to $W_n$ is defined as:

$\hat{R}_n(\mathcal{H}) = \mathbb{E}_\sigma\left[\sup_{h \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i h(w_i)\right], \qquad (32)$

where $\sigma = (\sigma_1, \dots, \sigma_n)$ and $\{\sigma_i\}_{i=1}^{n}$ are independent random variables uniformly chosen from $\{-1, 1\}$, with $\mathbb{P}(\sigma_i = 1) = \mathbb{P}(\sigma_i = -1) = 0.5$.

Taking the expectation with respect to $W_n$ yields the Rademacher complexity of the functional class $\mathcal{H}$:

$R_n(\mathcal{H}) = \mathbb{E}_{W_n}\mathbb{E}_\sigma\left[\sup_{h \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i h(w_i)\right]. \qquad (33)$

One can then derive the generalization bound in terms of the Rademacher complexity (Wainwright, 2019):

Theorem 3. Assume $\mathcal{H}$ is uniformly bounded by b (i.e., $\|h\|_\infty \le b$ for all $h \in \mathcal{H}$). Then for all $n \ge 1$ and $\delta \ge 0$, we have

$\sup_{h \in \mathcal{H}}\left|\frac{1}{n}\sum_{i=1}^{n} h(w_i) - \mathbb{E}[h]\right| \le 2R_n(\mathcal{H}) + \delta \qquad (34)$

with probability at least $1 - 2\exp\left(-\frac{n\delta^2}{8b^2}\right)$. Consequently, as long as $R_n(\mathcal{H}) = o(1)$, we have $\frac{1}{n}\sum_{i=1}^{n} h(W_i) - \mathbb{E}[h] \xrightarrow{a.s.} 0$ for all $h \in \mathcal{H}$.

Now, in our problem, let $\mathcal{H}_{x,p}$ be the following function class:

$\mathcal{H}_{x,p} = \left\{h_{\theta,p}: \mathcal{X}\times\mathcal{Y}\times\mathcal{I} \to \mathbb{R};\ h_{\theta,p}(x, y, I(x)) = \frac{1}{p}\left\|y_{I(x)} - (\varphi'(x))^\dagger_{I(x)} g_\theta(z)\right\|_2^2,\ \theta \in \Theta\right\}, \qquad (35)$

so that

$L_{x,p}(\theta) = \frac{1}{K}\sum_{(x,y)\in D_K} h_{\theta,p}(x, y, I(x)). \qquad (36)$

Now we can prove Theorem 2:

Theorem. Let $\bar{L}_x(\theta) = \mathbb{E}L_x(\theta)$, $\theta^* \in \arg\min_\theta \bar{L}_x(\theta)$, $\theta^*_{K,p} \in \arg\min_\theta L_{x,p}(\theta)$. If $\mathcal{H}_{x,p}$ is uniformly bounded by $b_{x,p}$ and $R_K(\mathcal{H}_{x,p}) = o(1)$, then:

$\bar{L}_x(\theta^*_{K,p}) - \bar{L}_x(\theta^*) \xrightarrow{a.s.} 0. \qquad (37)$

Proof. We define $\bar{L}_{x,p}(\theta) = \mathbb{E}L_{x,p}(\theta)$; then $\bar{L}_x = \bar{L}_{x,p}$ by Appendix A.2. Applying Theorem 3, we get

$\sup_{\theta \in \Theta}\left|L_{x,p}(\theta) - \bar{L}_{x,p}(\theta)\right| \le 2R_K(\mathcal{H}_{x,p}) + \delta \qquad (38)$

with probability at least $1 - 2\exp\left(-\frac{K\delta^2}{8b_{x,p}^2}\right)$, and $L_{x,p}(\theta) \xrightarrow{a.s.} \bar{L}_{x,p}(\theta)$ for all $\theta \in \Theta$. Note that $\theta^* \in \arg\min_\theta \bar{L}_x(\theta) = \arg\min_\theta \bar{L}_{x,p}(\theta)$; then

$0 \le \bar{L}_x(\theta^*_{K,p}) - \bar{L}_x(\theta^*) = \underbrace{\bar{L}_{x,p}(\theta^*_{K,p}) - L_{x,p}(\theta^*_{K,p})}_{\to 0 \text{ a.s.}} + \underbrace{L_{x,p}(\theta^*_{K,p}) - L_{x,p}(\theta^*)}_{\le 0} + \underbrace{L_{x,p}(\theta^*) - \bar{L}_{x,p}(\theta^*)}_{\to 0 \text{ a.s.}}, \qquad (39)$

thus $\bar{L}_x(\theta^*_{K,p}) - \bar{L}_x(\theta^*) \xrightarrow{a.s.} 0$.

B Experiment Details

Figure 5: Visualization of the microscopic state of each system: (a) Predator-Prey system, (b) Allen-Cahn system, (c) Lennard-Jones system.

B.1 Predator-Prey System

We consider the Neumann boundary condition $\partial_n u = 0$, $\partial_n v = 0$ on $\partial\Omega$ and the following initial conditions:

$u(x, 0) = \mu + \sigma\cos(5\pi x), \quad v(x, 0) = 1 - \mu - \sigma\cos(5\pi x), \quad (\mu, \sigma) \in [0, 0.2] \times [0.4, 0.6]. \qquad (40)$

The Neumann boundary condition is commonly used in mathematical models of ecosystems, restricting any movement of the species out of the boundary. We approximate the spatial derivatives in Eq. (15) with the finite difference method:

$\frac{\partial^2 v}{\partial x^2}(x_i, t) \approx \frac{v(x_{i+1}, t) - 2v(x_i, t) + v(x_{i-1}, t)}{\Delta x^2}, \quad 2 \le i \le 49,$
$\frac{\partial^2 v}{\partial x^2}(x_1, t) \approx \frac{v(x_2, t) - v(x_1, t)}{\Delta x^2}, \quad \frac{\partial^2 v}{\partial x^2}(x_{50}, t) \approx \frac{v(x_{49}, t) - v(x_{50}, t)}{\Delta x^2}. \qquad (41)$

Let $h_u(u, v) = u(1 - u - v)$, $h_v(u, v) = av(u - b)$, and let $h_u(u, v)$, $h_v(u, v)$ also denote the element-wise application of $h_u$, $h_v$ to each $u(x_i, t)$, $v(x_i, t)$, $1 \le i \le 50$. Then

$\frac{du}{dt} = h_u(u, v) + DAu, \quad \frac{dv}{dt} = h_v(u, v) + Av, \qquad (42)$

where $A \in \mathbb{R}^{50 \times 50}$ is the matrix defined according to Eq. (41). Hence the Predator-Prey system after spatial discretization can be written in the form of Eq. (1). In our experiment, we choose a = 3, b = 0.4, $\lambda$ = 0. The training parameter set $T_{train}$ of pairs $(\mu, \sigma)$ is sampled uniformly from $[0, 0.2] \times [0.4, 0.6]$. For testing, $T_{test}$ is also sampled uniformly from $[0, 0.2] \times [0.4, 0.6]$, but with a different random seed from $T_{train}$. The mean relative error is defined as:

$e(T_{test}) = \frac{1}{|T_{test}|}\sum_{(\mu,\sigma)\in T_{test}} \frac{\sum_n \|z_{true}(t_n; \mu, \sigma) - z_{pred}(t_n; \mu, \sigma)\|_2^2}{\sum_n \|z_{true}(t_n; \mu, \sigma)\|_2^2}, \qquad (43)$

where we use $z(\cdot\,; \mu, \sigma)$ to denote the dependence of the solution on the initial condition.
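For completeness, here is a sketch of the semi-discrete microscopic law of Eqs. (41)-(42); the diffusion coefficient D is a placeholder value, since its experimental value is not restated here.

```python
import numpy as np

def laplacian_neumann(w, dx=0.02):
    """Finite-difference Laplacian with the one-sided boundary stencils of Eq. (41)."""
    lw = np.empty_like(w)
    lw[1:-1] = (w[2:] - 2.0 * w[1:-1] + w[:-2]) / dx ** 2
    lw[0] = (w[1] - w[0]) / dx ** 2
    lw[-1] = (w[-2] - w[-1]) / dx ** 2
    return lw

def predator_prey_rhs(x, a=3.0, b=0.4, D=1.0, dx=0.02):
    """Microscopic evolution law f(x) of Eq. (42) for the state x = (u, v) in R^100."""
    u, v = x[:50], x[50:]
    du = u * (1.0 - u - v) + D * laplacian_neumann(u, dx)   # h_u(u, v) + D A u
    dv = a * v * (u - b) + laplacian_neumann(v, dx)         # h_v(u, v) + A v
    return np.concatenate([du, dv])

x0 = np.concatenate([np.full(50, 0.5), np.full(50, 0.5)])
print(predator_prey_rhs(x0)[:3])
```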
Table 2: Results on the Predator-Prey system. Models are trained with different training metrics $L_x$, $L_{x,p}$ (p = 3/4, 1/2, 1/4, 1/5). Mean and standard deviation are reported over three repeats.

| # of training data | $L_x$ | $L_{x,p}$(p = 3/4) | $L_{x,p}$(p = 1/2) | $L_{x,p}$(p = 1/4) | $L_{x,p}$(p = 1/5) |
|---|---|---|---|---|---|
| 6.0 × 10² | 1.09 ± 0.57 × 10⁻² | 1.79 ± 1.11 × 10⁻² | 9.22 ± 4.46 × 10⁻³ | 3.50 ± 1.33 × 10⁻³ | 4.00 ± 1.78 × 10⁻³ |
| 1.5 × 10³ | 4.70 ± 0.46 × 10⁻³ | 4.56 ± 1.97 × 10⁻³ | 3.30 ± 1.60 × 10⁻³ | 2.19 ± 0.57 × 10⁻³ | 1.92 ± 0.47 × 10⁻³ |
| 3.0 × 10³ | 2.90 ± 1.63 × 10⁻³ | 2.07 ± 0.41 × 10⁻³ | 1.57 ± 0.45 × 10⁻³ | 1.23 ± 0.16 × 10⁻³ | 1.34 ± 0.20 × 10⁻³ |
| 6.0 × 10³ | 1.24 ± 0.16 × 10⁻³ | 1.07 ± 0.18 × 10⁻³ | 1.46 ± 0.59 × 10⁻³ | 9.08 ± 0.25 × 10⁻⁴ | 1.13 ± 0.31 × 10⁻³ |
| 1.2 × 10⁴ | 9.20 ± 2.30 × 10⁻⁴ | 8.13 ± 0.64 × 10⁻⁴ | 8.23 ± 0.13 × 10⁻⁴ | 7.66 ± 3.14 × 10⁻⁴ | 8.36 ± 1.94 × 10⁻⁴ |
| 2.4 × 10⁴ | 6.76 ± 0.93 × 10⁻⁴ | 6.85 ± 0.95 × 10⁻⁴ | 6.98 ± 1.20 × 10⁻⁴ | 5.64 ± 0.18 × 10⁻⁴ | 5.70 ± 1.00 × 10⁻⁴ |

Data Generation. The microscopic equation is solved with a uniform time step $\Delta t = 0.01$ from t = 0 to t = 30 using the Euler method. We subsample every tenth snapshot for training. During testing, the microscopic evolution equation is solved with the same $\Delta t = 0.01$ using the fourth-order Runge-Kutta (RK4) solver. Then we encode the microscopic trajectories to obtain the ground-truth latent trajectories. The predicted latent trajectories are obtained by encoding the initial microscopic state first and then solving with the RK4 solver with $\Delta t = 0.1, 0.5$ on [0, 30]. Fig. 6 and Fig. 7 show the true and predicted trajectories.

Figure 6: Latent trajectories (predictions of $\bar{u}$, $\bar{v}$, and in the phase plane) with initial condition $\mu = 0.02$, $\sigma = 0.52$ and $\Delta t = 0.1$ in the Predator-Prey system.

Figure 7: Latent trajectories (predictions of $\bar{u}$, $\bar{v}$, and in the phase plane) with initial condition $\mu = 0.02$, $\sigma = 0.52$ and $\Delta t = 0.5$ in the Predator-Prey system.

B.2 Allen-Cahn System

In our experiment, we consider the initial condition of a torus (Kim et al., 2021):

$v(x, y, 0) = -1 + \tanh\frac{r_1 - d(x, y)}{\sqrt{2}\epsilon} - \tanh\frac{r_2 - d(x, y)}{\sqrt{2}\epsilon}, \qquad (44)$

where $d(x, y) = \sqrt{(x - 0.5)^2 + (y - 0.5)^2}$, $r_1 \in [0.3, 0.4]$ is the circumscribed circle radius, and $r_2 \in [0.1, 0.15]$ is the inscribed circle radius. The initial condition is visualized in Fig. 5(b). The free energy in Eq. (20) tends to decrease with time, following the energy dissipation law in Eq. (45); the minimization of the free energy thus drives the evolution of the system towards equilibrium:

$\frac{\partial E(v)}{\partial t} = -\int_\Omega |\partial_t v|^2\, dx\, dy \le 0. \qquad (45)$

Data Generation. For both training and testing, the microscopic evolution law is solved using the RK4 method with $\Delta t = 1/N = 2.5 \times 10^{-5}$ from t = 0 to $t = \min(t_f, 1)$, where $t_f$ is the time when the Allen-Cahn system reaches equilibrium. We subsample every hundredth snapshot for training. We choose $\epsilon$ in Eq. (19) and Eq. (44) to be $\frac{10/200}{2\sqrt{2}\tanh^{-1}(0.9)}$ as in (Kim et al., 2021). The test parameter set $T_{test}$ contains 50 parameter points $(r_1, r_2)$ chosen uniformly from $[0.3, 0.4] \times [0.1, 0.15]$, sampled with a different random seed from training, and we report the mean relative error.

B.3 Lennard-Jones System

We consider Lennard-Jones systems containing different numbers of atoms in this paper: $N_{atoms}$ = 800, 2700, 6400, 21600, 51200. We use periodic boundary conditions and fix the density at 0.8; the corresponding box side lengths are 10, 15, 20, 30, 40. We simulate the Lennard-Jones system under the NVE ensemble using LAMMPS (Thompson et al., 2022).
In our experiment, $\epsilon_{ij} = \sigma_{ij} = 1$ for all i, j, and $r_{cut} = 2.5$. The integration step is 0.001 and each trajectory is integrated for 250 steps. We sample the initial temperature randomly from [0.5, 1.5]; the initial velocities are then sampled from the Maxwell-Boltzmann distribution. For each system, the initial configuration has the same atom positions and velocity directions. For testing, the initial temperatures are also randomly sampled from [0.5, 1.5], but with a different random seed from the training data.

C Implementation Details

All the experiments are run on a single NVIDIA GeForce RTX 3090 GPU. For all the experiments, we use multilayer perceptrons (MLPs) for both the encoder and the decoder. The autoencoders are trained with $L_{AE}$ in Eq. (7). The condition number is the maximal eigenvalue $\lambda_{max}$ divided by the minimal eigenvalue $\lambda_{min}$ of $\varphi'(x)\varphi'(x)^T$:

$\kappa(\varphi'(x)\varphi'(x)^T) = \frac{|\lambda_{max}(\varphi'(x)\varphi'(x)^T)|}{|\lambda_{min}(\varphi'(x)\varphi'(x)^T)|} \ge 1. \qquad (46)$

Since $\varphi'(x) \in \mathbb{R}^{d \times N}$, $\varphi'(x)\varphi'(x)^T \in \mathbb{R}^{d \times d}$, and d is small, the condition number $\kappa(\varphi'(x)\varphi'(x)^T)$ can be calculated efficiently. In our experiments, we calculate $\lambda_{max}$ and $\lambda_{min}$ with torch.linalg.svd. To better compare $L_x$ and $L_{x,p}$, once the training of the autoencoders is finished, we freeze them and use the encoder for macroscopic dynamics identification. For the macroscopic dynamics identification, MLP, OnsagerNet, and GFINNs are used for the latent model in Section 5.2. For the remaining experiments, we adopt the structure of OnsagerNet for $g_\theta$ to enhance the stability of latent dynamics prediction (Yu et al., 2021).

D Additional Experiments

D.1 Loss Curve of $L_z$, $L_x$, $L_{x,p}$

To give the reader a better idea of the behaviors of the losses $L_z$, $L_x$, $L_{x,p}(p = 1/4)$ trained on the same number of training data, Fig. 8 shows the training and test loss curves for the different training metrics. Note that the training metrics of these models are different, but they are tested with the same metric, Eq. (43).

Figure 8: Loss curves of $L_z$, $L_x$, $L_{x,p}(p = 1/4)$ on the Predator-Prey system. Models are trained with different loss functions on the same number of training data.

D.2 Ablation Analysis of $\lambda_{cond}$

To evaluate the influence of the hyperparameter $\lambda_{cond}$ on the performance of the loss $L_{x,p}$, we conduct experiments with different values of $\lambda_{cond}$ and show the test error in Table 3. From Table 3 we observe that as $\lambda_{cond}$ increases from 0 to $10^{-6}$, the test error gradually decreases; as $\lambda_{cond}$ further increases from $10^{-6}$ to $10^{-2}$, the test error gradually increases. Among all the values of $\lambda_{cond}$ we tried, the test error is minimal at $\lambda_{cond} = 10^{-6}$.

Table 3: Results on the Predator-Prey system. We train the autoencoder with different $\lambda_{cond}$ and then train the macroscopic dynamics model with loss $L_{x,p}(p = 1/5)$. The mean relative error of the macroscopic dynamics model is reported over three repeats.

| $\lambda_{cond}$ | 0 | 10⁻⁸ | 10⁻⁷ | 10⁻⁶ | 10⁻⁵ | 10⁻⁴ | 10⁻² |
|---|---|---|---|---|---|---|---|
| Test error of $L_{x,p}(p = 1/5)$ | 6.42 × 10⁻² | 2.62 × 10⁻³ | 2.84 × 10⁻³ | 8.36 × 10⁻⁴ | 3.66 × 10⁻³ | 2.60 × 10⁻³ | 4.24 × 10⁻³ |

Theoretically, if $\lambda_{cond}$ is too low, then since $L_{AE} = L_{rec} + \lambda_{cond} L_{cond}$, there may not be enough constraint on $L_{cond}$, and the condition number of $\varphi'(x)\varphi'(x)^T$ may be very large. By Theorem 1, a small condition number of $\varphi'(x)\varphi'(x)^T$ can guarantee the effectiveness of $L_x$, but when the condition number is large, there is no guarantee that $L_x$, and thus $L_{x,p}$, performs well.
B.3 Lennard-Jones System

We consider Lennard-Jones systems with different numbers of atoms in this paper: N_atoms = 800, 2700, 6400, 21600, 51200. We use periodic boundary conditions and fix the density at 0.8, so the corresponding box side lengths are 10, 15, 20, 30, 40. We simulate the Lennard-Jones system in the NVE ensemble using LAMMPS (Thompson et al., 2022). In our experiment, ϵ_ij = σ_ij = 1 for all i, j, and r_cut = 2.5. The integration step is 0.001 and each trajectory is integrated for 250 steps. We sample the initial temperature randomly from [0.5, 1.5], and the initial velocities are then sampled from the Maxwell-Boltzmann distribution; for each system, the initial configuration has the same atom positions and velocity directions. For testing, the initial temperatures are also sampled randomly from [0.5, 1.5], but with a random seed different from that of the training data.
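The Maxwell-Boltzmann velocity initialization can be sketched in a few lines. This is a generic illustration in LJ reduced units (k_B = 1, m = 1), corresponding to what LAMMPS's `velocity ... create` command does, rather than our exact fixed-direction scheme; the function name is ours.

```python
import numpy as np

def maxwell_boltzmann_velocities(n_atoms, T, mass=1.0, seed=0):
    # In reduced units (kB = 1), each Cartesian velocity component is
    # Gaussian with standard deviation sqrt(kB * T / m).
    rng = np.random.default_rng(seed)
    v = rng.normal(0.0, np.sqrt(T / mass), size=(n_atoms, 3))
    v -= v.mean(axis=0)  # remove center-of-mass drift
    return v

# Example: an 800-atom system with initial temperature drawn from [0.5, 1.5].
rng = np.random.default_rng(1)
v0 = maxwell_boltzmann_velocities(800, T=rng.uniform(0.5, 1.5))
```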
C Implementation Details

All the experiments are run on a single NVIDIA GeForce RTX 3090 GPU. For all the experiments, we use multilayer perceptrons (MLPs) for both the encoder and the decoder. The autoencoders are trained with L_AE in Eq. (7). The condition number is the maximal eigenvalue λ_max divided by the minimal eigenvalue λ_min of φ′(x)φ′(x)^T:

$$\kappa\big(\varphi'(x)\varphi'(x)^T\big) = \frac{|\lambda_{\max}(\varphi'(x)\varphi'(x)^T)|}{|\lambda_{\min}(\varphi'(x)\varphi'(x)^T)|} \ge 1. \quad (46)$$

Since φ′(x) ∈ R^{d×N}, φ′(x)φ′(x)^T ∈ R^{d×d}, and d is small, the condition number κ(φ′(x)φ′(x)^T) can be calculated efficiently. In our experiments, we calculate λ_max and λ_min with torch.linalg.svd.
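A minimal sketch of this computation, assuming the encoder Jacobian φ′(x) has already been assembled (e.g., with torch.func.jacrev); we use torch.linalg.svdvals, a thin variant of the torch.linalg.svd call mentioned above that returns only the singular values, which coincide with the eigenvalues of this symmetric positive semi-definite matrix. The names `jac` and `condition_number` are ours.

```python
import torch

def condition_number(jac: torch.Tensor) -> torch.Tensor:
    # jac: encoder Jacobian phi'(x), shape (d, N) with d << N.
    gram = jac @ jac.T  # phi'(x) phi'(x)^T, a small (d, d) matrix
    # For a symmetric PSD matrix the singular values equal the eigenvalues,
    # so the largest/smallest singular values give lambda_max / lambda_min.
    s = torch.linalg.svdvals(gram)  # returned in descending order
    return s[0] / s[-1]

# Toy example: d = 4 latent dimensions, N = 1000 microscopic coordinates.
kappa = condition_number(torch.randn(4, 1000))
assert kappa >= 1.0
```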
To better compare L_x and L_{x,p}, once the training of the autoencoders is finished, we freeze them and use the encoder for macroscopic dynamics identification. For the macroscopic dynamics identification, MLPs and GFINNs are used for the latent model in Section 5.2. For the remaining experiments, we adopt the OnsagerNet structure for g_θ to enhance the stability of the latent dynamics prediction (Yu et al., 2021).

D Additional Experiments

D.1 Loss Curves of L_z, L_x, L_{x,p}

To give readers a better idea of the behavior of the losses, we train models with L_z, L_x, and L_{x,p} (p = 1/4) on the same number of training data. Fig. 8 shows the training and test loss curves of the different training metrics. Note that the training metrics of these models differ, but they are all tested with the same metric, Eq. (43).

Figure 8: Loss curves of L_z, L_x, and L_{x,p} (p = 1/4) on the Predator-Prey system over 2000 epochs. Models are trained with the different loss functions on the same number of training data.

D.2 Ablation Analysis of λ_cond

To evaluate the influence of the hyperparameter λ_cond on the performance of the loss L_{x,p}, we conduct experiments with different values of λ_cond and report the test error in Table 3. We observe that as λ_cond increases from 0 to 10⁻⁶, the test error gradually decreases; as λ_cond increases further from 10⁻⁶ to 10⁻², the test error gradually increases. Among all the values of λ_cond that we tried, the test error is minimal at λ_cond = 10⁻⁶.

Table 3: Results on the Predator-Prey system. We train the autoencoder with different λ_cond and then train the macroscopic dynamics model with loss L_{x,p} (p = 1/5). The mean relative error of the macroscopic dynamics model is reported over three repeats.

| λ_cond | 0 | 10⁻⁸ | 10⁻⁷ | 10⁻⁶ | 10⁻⁵ | 10⁻⁴ | 10⁻² |
|---|---|---|---|---|---|---|---|
| Test error of L_{x,p} (p = 1/5) | 6.42 × 10⁻² | 2.62 × 10⁻³ | 2.84 × 10⁻³ | 8.36 × 10⁻⁴ | 3.66 × 10⁻³ | 2.60 × 10⁻³ | 4.24 × 10⁻³ |

Theoretically, if λ_cond is too low, then since L_AE = L_rec + λ_cond L_cond there may not be enough constraint on L_cond, and the condition number of φ′(x)φ′(x)^T may become very large. By Theorem 1, a small condition number of φ′(x)φ′(x)^T guarantees the effectiveness of L_x; when the condition number is large, there is no guarantee that L_x, and thus L_{x,p}, performs well. If instead λ_cond is too large, L_AE is dominated by L_cond. The autoencoder may then fail to reconstruct the microscopic dynamics well and hence may not capture the closure terms well; if the latent space is not sufficiently closed, we cannot learn the macroscopic dynamics well.

NeurIPS Paper Checklist

1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
Answer: [Yes]
Justification: The main claims clearly reflect our contributions and scope, both in theory and in experiments.

2. Limitations
Question: Does the paper discuss the limitations of the work performed by the authors?
Answer: [Yes]
Justification: We discuss the limitations in Section 6.

3. Theory Assumptions and Proofs
Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?
Answer: [Yes]
Justification: The full set of assumptions and complete proofs of our theoretical results are provided in Appendix A.
4. Experimental Result Reproducibility
Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?
Answer: [Yes]
Justification: We explain our method in detail in Section 4.2, Algorithm 1, and Algorithm 2.

5. Open Access to Data and Code
Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
Answer: [Yes]
Justification: Our code is available at https://github.com/MLDS-NUS/Learn-Partial.

6. Experimental Setting/Details
Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?
Answer: [Yes]
Justification: Details of the experimental setting are provided in Section 5.

7. Experiment Statistical Significance
Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
Answer: [Yes]
Justification: We report error bars for all the experiments.
8. Experiments Compute Resources
Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?
Answer: [Yes]
Justification: Compute resources are described in Appendix C.

9. Code of Ethics
Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics (https://neurips.cc/public/EthicsGuidelines)?
Answer: [Yes]
Justification: Our research conforms with the NeurIPS Code of Ethics.

10. Broader Impacts
Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
Answer: [NA]
Justification: No societal impacts.

11. Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [NA]
Justification: No such risks.
12. Licenses for Existing Assets
Question: Are the creators or original owners of assets (e.g., code, data, models) used in the paper properly credited, and are the license and terms of use explicitly mentioned and properly respected?
Answer: [Yes]
Justification: Existing assets are properly cited.

13. New Assets
Question: Are new assets introduced in the paper well documented, and is the documentation provided alongside the assets?
Answer: [Yes]
Justification: The documentation is provided alongside the assets.

14. Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [NA]
Justification: Our research does not involve such subjects.

15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
Answer: [NA]
Justification: Our research does not involve such subjects.