# neural_diffeomorphic_nonuniform_bspline_flows__dd60c4bd.pdf Neural Diffeomorphic Non-uniform B-spline Flows Seongmin Hong1, Se Young Chun1,2* 1 Department of Electrical and Computer Engineering, Seoul National University, Republic of Korea 2 INMC, Interdisciplinary Program in AI, Seoul National University, Republic of Korea {smhongok, sychun}@snu.ac.kr Normalizing flows have been successfully modeling a complex probability distribution as an invertible transformation of a simple base distribution. However, there are often applications that require more than invertibility. For instance, the computation of energies and forces in physics requires the second derivatives of the transformation to be well-defined and continuous. Smooth normalizing flows employ infinitely differentiable transformation, but with the price of slow nonanalytic inverse transforms. In this work, we propose diffeomorphic non-uniform B-spline flows that are at least twice continuously differentiable while bi-Lipschitz continuous, enabling efficient parametrization while retaining analytic inverse transforms based on a sufficient condition for diffeomorphism. Firstly, we investigate the sufficient condition for Ck 2-diffeomorphic non-uniform kth-order B-spline transformations. Then, we derive an analytic inverse transformation of the non-uniform cubic B-spline transformation for neural diffeomorphic non-uniform B-spline flows. Lastly, we performed experiments on solving the force matching problem in Boltzmann generators, demonstrating that our C2-diffeomorphic non-uniform B-spline flows yielded solutions better than previous spline flows and faster than smooth normalizing flows. Our source code is publicly available at https://github.com/smhongok/Non-uniform-B-spline-Flow. Introduction Normalizing flows (Rezende and Mohamed 2015; Papamakarios et al. 2021) model complex probability distributions. Normalizing flows are not only performing the probability density estimation but also sampling from the learned probability distribution. Let X be a dataset from the true target probability distribution p x. Constructing normalizing flows fits a flow-based model px to the true target distribution p x using a simple base probability distribution pu and a diffeomorphic (i.e., invertible and differentiable) mapping T : Ω Ωwhere Ωis a compact subset of RD with the following density transformation: px(x) = pu(T 1(x)) det T 1 x (x) . (1) *Corresponding author Copyright 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. For X = {x(n)}N n=1 where x(1), . . . , x(N) are i.i.d. samples from p x, normalizing flows are trained on X by minimizing the following negative log-likelihood (NLL): n=1 log px(x(n)) Ex p x[log px(x)] (2) where the above approximation becomes true as N . Density estimation corresponds to obtaining T from x and sampling is referred to getting x = T(u) from u pu(u). Normalizing flows are often used to model physical systems. For example, one can model the probability density of a molecular system (K ohler, Klein, and No e 2020; Wu, K ohler, and No e 2020; Garcia Satorras et al. 2021; Xu et al. 2021), sample a lattice model (Li and Wang 2018; Albergo, Kanwar, and Shanahan 2019; Nicoli et al. 2020, 2021; Boyda et al. 2021), or estimate free energies (Wirnsberger et al. 2020; Ding and Zhang 2021). While a C1diffeomorphic T may be sufficient in a typical normalizing flow for generating images or texts, modeling physical systems requires more conditions such as a normalizing flow being a Ck-diffeomorphism. For example, Boltzmann generator (No e et al. 2019; K ohler, Kr amer, and No e 2021; Liu et al. 2022; Ahmad and Cai 2022; Jing et al. 2022) requires the condition that T is a C2-diffeomorphism. Boltzmann generators are generative models for sampling molecular structures whose existence probability distributions follow the Boltzmann distributions. Without loss of generality, the true target distribution can be expressed as p x(x) exp ( v(x)) where v is the potential energy of a molecular system and its force components are f(x) = xv(x) = x log p x(x) = xp x(x)/p x(x). (3) Since valid molecular systems have well-defined and continuous force components, p x must be at least once continuously differentiable (for continuous f) and nonzero (for well-defined f). When the Boltzmann generators are constructed using the normalizing flows, Equation (1) implies that T 1 must be at least twice continuously differentiable (for continuously differentiable px) and its derivative must have positive lower bound (for px to have positive lower-bound). In other words, T should be at least C2-diffeomorphic. In this case, it is possible to train through force matching (FM) by minimizing The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23) the mean squared error of the force components, which can improve the performance of Boltzmann generators. However, most existing normalizing flow models (Dinh, Sohl Dickstein, and Bengio 2017; M uller et al. 2019; Durkan et al. 2019b; Dolatabadi, Erfani, and Leckie 2020; Durkan et al. 2019a) can not be used for this problem since they are only C1-diffeomorphic. They can not be trained using FM and can produce physically invalid samples. Smooth normalizing flows (K ohler, Kr amer, and No e 2021) addressed this problem by employing an ensemble of smooth (C ) bump functions (Tu 2008). However, since those functions do not admit analytic inverses, either density estimation or sampling should be conducted in a time-consuming black box root finding method. To the best of the authors knowledge, there is no known normalizing flow to meet the conditions of both being at least C2-diffeomorphic and admitting analytic inverses. In this paper, we propose Ck-diffeomorphic non-uniform B-spline flows where k can be controlled so that they can be applicable to physics problems such as Boltzmann generators that require both at least C2-diffeomorphism and analytic inverses. We firstly investigate the sufficient condition for obtaining global invertibility. To ensure that the probability density is continuously differentiable and has a positive lower bound, we state and prove the sufficient conditions for non-uniform B-spline transformations of any order to be bi Lipschitz. Then, we propose a parameterization method that is applicable in periodic and aperiodic domains as a building block of our non-uniform B-spline flows to ensure surjectivity and sufficient expressive power. Bi-Lipschitz continuity, surjectivity, and the nature of the non-uniform Bsplines ensure that the proposed normalizing flow is a Ck 2diffeomorphism for the order k non-uniform B-splines (using polynomials with degree k 1). Our experiments demonstrate that the proposed non-uniform B-spline flow is capable of solving the FM problems in Boltzmann generators that can not be done with most existing normalizing flows and is able to solving the same problems quickly using analytic inverses for the low-order B-splines (i.e., polynomial root formula) that smooth normalizing flows can not admit. Here is the summary on the contributions of this paper: Investigating the sufficient conditions for the any order non-uniform B-spline transformation to be diffeomorphic on a compact domain. Proposing a parametrization method for the non-uniform B-spline transformation in normalizing flows to maintain expressive power on periodic / aperiodic domains. Showing that our non-uniform B-spline flows with FM yielded much better experimental results than RQ-spline flows and are as good as the state-of-the-art smooth normalizing flows in density estimation and sampling. Demonstrating that sampling with non-uniform loworder B-spline flows is much faster than that of smooth flows due to the admissibility of analytic inverses. Related Works Coupling transformations. A coupling transform (Dinh, Krueger, and Bengio 2015; Dinh, Sohl-Dickstein, and Ben- gio 2017) ϕ : Ω Ω RD is defined as ϕ(x)i = fθi(xi), θi = NN(x1:d 1) if d i D, xi if 1 i < d, (4) where NN is an arbitrary neural network and fθi : Ω Ω R is an invertible function parameterized by θi. The Jacobian determinant of this transform is easily obtained by the derivatives of f, expressed as det ( ϕ/ x) = ΠD i=d fθi/ xi. The inverse of ϕ is obtained as ϕ 1(x)i = f 1 θi (xi), θi = NN(x1:d 1) if d i D, xi if 1 i < d. Coupling transformations have the computational advantages for Jacobian and inverse as well as have sufficient expressive power; hence many normalizing flows employ multiple coupling layers. However, various invertible f functions have been proposed. Affine coupling flows. Many studies employ affine transformations as f, i.e., fθi(xi) = aixi + bi where θi = {ai, bi}, for computational advantages. Since affine transformations are easy to compute their Jacobian and inverse transformations (Kingma and Welling 2014; Dinh, Sohl Dickstein, and Bengio 2017), they are suitable for generating images (Kingma and Dhariwal 2018; Ho et al. 2019; Lugmayr et al. 2020; Sukthanker et al. 2022) or speeches (Prenger, Valle, and Catanzaro 2019; He et al. 2022) with relatively large dimensions D. Spline-based flows. Affine coupling flows have difficulty in learning discontinuous distributions even with small dimension D. To tackle this problem, some studies use more complex transformations such as splines, which are piecewise polynomials or rational functions (parameterized with θi), such as linear and quadratic splines (M uller et al. 2019), cubic splines (Durkan et al. 2019b), linear-rational splines (Dolatabadi, Erfani, and Leckie 2020), and rationalquadratic (RQ) splines (Durkan et al. 2019a). For the loworder splines with proper invertibility conditions, the inverse can be analytically obtained through the root formula, which is slower than inverse affine transformations. Smooth normalizing flows. Smooth normalizing flows (K ohler, Kr amer, and No e 2021) generate C - diffeomorphism using smooth compact bump functions (Tu 2008) as follows: fθi(xi) = X ρθij(xi) ρθij(xi) + ρθij(1 xi), ρθij(x) = exp 1/(αijxβij) , where θi = S j θij = S j{αij, βij}. Even though ρ(x) is a low-order polynomial such as x3 (which should have an analytic inverse), this function does not have an analytic inverse because f is an ensemble (i.e., linear combination) of rational functions. Diffeomorphic Non-uniform B-spline Flows Non-uniform B-splines (Curry and Schoenberg 1947; De Boor 1978) are highly attractive methods for compromising the trade-offs between spline spline-based flows and smooth normalizing flows. Non-uniform B-splines have several nice properties: continuously differentiable with any degrees, compact support, and locally analytic invertibility for low-orders. However, for constructing Ck-diffeomorphic normalizing flows using non-uniform B-splines, the following conditions should be satisfied: global invertibility, surjectivity on various domains, and appropriate parameterization for sufficient expressive power. Definition for Flow Models We utilize the definition of a non-uniform B-spline in (Curry and Schoenberg 1947) with some modifications on normalization suggested by (De Boor 1978). Let t := {tj} be an increasing sequence. The j-th non-uniform B-spline of order k (using polynomials with degree k 1) for the knot sequence t is denoted by Bj,k,t and is defined recursively as: Bj,1,t = 1[tj,tj+1), (6) Bj,k,t = ωjk Bj,k 1,t + (1 ωj+1,k)Bj+1,k 1,t, (7) with ωjk(x) := (x tj)/(tj+k 1 tj). (8) For simplicity, we often drop t so that Bj,k,t = Bjkt = Bjk. Then, for the flow model in Equation (4), we propose to construct the transformation f : R R using the nonuniform B-splines of order k (i.e., Bjkt) as follows: f(x; α, t) = j=r k+1 αj Bjkt(x), x [tr, ts] (9) where r, s Z, α = {αj}s 1 j=r k+1 Rs r+k 1 and t = {tj}s+k 2 j=r k+2 Rs r+k 3 are designed to be the outputs of an arbitrary neural network (NN) with some constraints that we further discuss later. The function f is a surjective mapping from [tr, ts] to [tr, ts] and this transformation is defined to be an identity mapping when the input is outside this range. Note that f( ; α, t) Ck 2, but there is no guarantee that it is diffeomorphic. A differentiable transformation f is diffeomorphic if it is bijective and the inverse f 1 is differentiable. If these functions are n times continuously differentiable, f is called a Cn-diffeomorphism. For a diffeomorphic f, (f 1) should exist (i.e., bounded and nonzero) and thus, there should be a positive lower bound for f (x; α, t). Since (f 1(x)) = 1/f (f 1(x)), both the lower and upper bounds for f and (f 1) should exist, respectively. Therefore, to enforce f to be diffeomorphic, NN should generate α and t so that f ( ; α, t) is bounded on both sides. Let {tj}s+k 2 j=r k+2 be an increasing sequence and u, l R, u > 1 > l > 0. Let S(t, l, u) Rs r+k 1 be the set of α that makes the derivative of f( ; α, t) has an upper bound u and a lower bound l, which can be written as S(t, l, u) = {α Rs r+k 1 : l < f (x; α, t) < u, x R}. Since f (x; α, t) is linear in α, S(t, l, u) is (open) convex. Sufficient Conditions for Diffeomorphism We investigate the sufficient condition for non-uniform Bspline transformations of any order to be diffeomorphic. There have been studies on sufficient conditions for diffeomorphic uniform B-spline transformations (Chun and Fessler 2009; Sdika 2013). We leverage these studies to further investigate the sufficient condition for diffeomorphic non-uniform B-spline transformations. Theorem 1 (Sufficient condition for diffeomorphic transformations). Let k N \ {1, 2}, r, s Z, r + k s. Let α = {αj}s 1 j=r k+1 Rs r+k 1 and t = {tj}s+k 1 j=r k+1 Rs r+2k 1 be (strictly) increasing sequences. Let f(x; α, t) = Ps 1 j=r k+1 αj Bjkt(x), where Bjkt is the jth non-uniform B-spline of order k (from polynomials with degree k 1) for the knot sequence t. For u > 1, 0 < l < 1 and j = r k + 2, . . . , s 1, l k 1 < αj αj 1 tj+k 1 tj < u k 1 (10) leads to l < f ( ; α, t) < u on [tr, ts]. Proof. See the supplementary material. Theorem 1 suggests that we can generate bi-Lipschitz non-uniform B-spline transformations by constraining the parameters α = {αj}s 1 j=r k+1and t = {tj}s+k 1 j=r k+1 to satisfy (10). Then, by the nature of non-uniform B-splines, we can guarantee that these bi-Lipschitz kth-order non-uniform B-spline transformations are in fact Ck 2-diffeomorphic. Existence of an Analytic Inverse Smooth normalizing flows exhibit good expressive power but with the price of slow non-analytic inverse transformations. In contrast, the non-uniform B-splines of order k have analytic inverses if k < 5 since they are piecewise (k 1)thorder polynomials. Inverse non-uniform B-spline transformation has been partially studied in (Trist an and Arribas 2007), which ensured its exactness only on the knots t, not on the entire continuous domain. Here, we propose an analytic inverse non-uniform B-spline transformation for Equation (9) on a compact domain in our normalizing flows with k < 5. The map f : [tr, ts] [tr, ts] in Equation (9) with α Rs r+k 1 satisfying the sufficient condition (10) in Theorem 1 is Ck 2-diffeomorphic. Like prior works (Durkan et al. 2019a,b), computing the inverse of a non-uniform B-spline at any location y requires finding the bin index j [r, s] where x lies with f(tj; α, t) y < f(tj+1; α, t). The following equation can easily identify such bin j: f(tj; α, t) = i=r k+1 αi Bik(tj) = i=j k+1 αi Bik(tj). This bin search does not increase computational burden due to strictly increasing (i.e., sorted) {f(tj; α, t)}j sequence. After finding the bin j, {i|supp(Bik) (tj, tj+1) = } = {j k + 1, j k + 2, . . . , j}. Thus, for the given y, the following equation holds for x [tj, tj+1), i=j k+1 αi Bik(x). (11) Equation (11) is a (k 1)th-order polynomial, so the root formulas of the polynomial equations can be used if k 5. In case of k = 4 (i.e., cubic B-spline), the root finding algorithm by (Peters 2016) can be used, which is a modification of the algorithm of (Blinn 2007). We used the code of cubicroot computation by (Durkan et al. 2019b). Definitions on Various Domains For the applications to physics problems, non-uniform Bspline transformation must be well-defined on domains such as closed interval and circle (i.e., periodic interval). Without loss of generality, we construct transformations on I (unit interval) and S1 (unit circle), which can be extended to arbitrary closed intervals and circles through affine transforms. The following subsections discuss how to construct well-defined diffeomorphic non-uniform B-spline transformations on I and S1. On I. Two constraints f(0) = 0 and f(1) = 1 must hold to make f be surjective from I to I. One trivial solution for a surjective f (i.e., f(0) = 0 and f(1) = 1) is to set αj = 0 for all j such that tj < 0 and αj = 1 for all j such that tj+k > 1. However, this na ıve solution severely decreases the expressive power of the non-uniform B-spline transformation near both endpoints (i.e., 0 and 1). This is because the kth-order non-uniform B-spline transformations are Ck 2, which results in f (m)(0) = 0 and f (m)(1) = 0 for m = 1, . . . , k 2 where f (m)(x) denotes the mth derivative of f(x). Therefore, we propose Algorithm 1 to generate parameters (i.e., t and α) to maintain the expressive power of the non-uniform B-spline transformation at both endpoints. This proposed algorithm consists of three main steps as follows: Line 1-7 : Generating an increasing sequence {tj}s+k 2 j=r k+2 such that tr = 0, ts = 1 and tj := tj+1 tj having a positive infimum for j = r k + 2, . . . , s + k 3. Such a sequence can be obtained by applying the softmax (line 1), cumulative summation (line 4-7), and the affine transform (line 2-3) to the output of an arbitrary neural network. Note that ϵt is set to a small positive constant (e.g., 10 6) to ensure tj has a positive infimum. Line 8-12 : Similar to the first step, generating another increasing sequence {αj}s 2 j=r k+2 such that αr k+2 > 0, αs 2 < 1 and αj := αj+1 αj having a positive infimum for j = r k + 2, . . . , s 3. ϵα is set to a small positive constant (e.g., 10 6) to ensure αj has a positive infimum. Line 13-17 : Computing αr k+1 and αs 1 such that f(0) = 0 and f(1) = 1, respectively. This step ensures surjectivity. Algorithm 1: Non-uniform B-spline parameter generation Input: t = tj s+k 3 j=r k+2 , α = { αj}s 2 j=r k+1 Parameter: ϵt, ϵα Output: t = {tj}s+k 2 j=r k+2 , α = {αj}s 1 j=r k+1 1: t softmax( t) 2: t ϵt + (1 (s r + 2k 4)ϵt) t 3: t t/ Ps 1 j=r( t)j 4: tr k+2 = Pr 1 j=r k+2 tj 5: for i = r k + 3, . . . , s + k 2 do 6: ti = tr k+2 + Pi 1 j=r k+2 tj 7: end for 8: α softmax( α) 9: α ϵα + (1 (s r + k 2)ϵα) α 10: for i = r k + 2, . . . , s 2 do 11: αi = Pi 1 j=r k+1 αj 12: end for 13: αr k+1 = 0 14: αs 1 = 1 15: fr = Pr 1 j=r k+1 αj Bjkt(tr) 16: fs = Ps 1 j=s k+1 αj Bjkt(ts) 17: α = ( α fr)/( fs fr) 18: return t, α On S1. Smooth normalizing flows implemented periodic transformations with non-zero derivatives in interval boundaries. Since the vanishing gradient at the endpoint makes the universal approximation of arbitrary periodic transformations impossible, it is important to implement periodic transformations. Similar to RQ circular spline flows (Durkan et al. 2019a), we propose to control the parameters to match all 1st to (k 2)th-order derivatives on both interval boundaries using the following theorem. Theorem 2. Let k N \ {1, 2}, r, s Z, r + k s. Let α = {αj}s 1 j=r k+1 Rs r+k 1 and t = {tj}s+k 1 j=r k+1 Rs r+2k 1 be (strictly) increasing sequences. Let f(x; α, t) = Ps 1 j=r k+1 αj Bjkt(x), where Bjkt is the jth non-uniform B-spline of order k for the knot sequence t. If αs i = 1 + αr i for i = 1, 2, . . . , k 1 and ts+j = 1 + tr+j for j = k + 2, k + 3, . . . , k 3, k 2, then f (m)(tr) = f (m)(ts) for m = 1, . . . , k 2. Proof. See the supplementary material. Theorem 2 suggests that by taking additional conditions on some of the parameters, Ck 2-diffeomorphic nonuniform B-spline transformations can be constructed on the unit circle. Therefore, when we generate diffeomorphic nonuniform B-spline transformations on S1, we use the Algorithm 1, but with αs i = 1 + αr i for i = 1, 2, . . . , k 1 and ts+j = 1+tr+j for j = k+2, k+3, . . . , k 3, k 2. These additional conditions can be implemented by setting αs i = 1 + αr i for i = 2, . . . , k 1 and ts+j = (a) Ground Truth (b) RQspline (d) Nonuniform B-spline (ours) Figure 1: Probability density (left) and its corresponding force field (right) of (a) ground truth, and their approximations by (b) RQ-spline flows, (c) Smooth flows, and (d) Nonuniform (cubic) B-spline flows (ours). 1 + tr+j for j = k + 2, k + 3, . . . , k 3 in the Algorithm 1. Experiments Illustrative Toy Example A simple toy example in this section shows that the proposed non-uniform B-spline flows generate feasible continuous forces. All flow models used NLL as a loss function. We provide full experimental details in the supplementary material. Figure 1 (a) shows energy and force by two-dimensional ground-truth distribution. In Figure 1 (b), RQ-spline flow shows an unstable force field with many singularities (i.e., unrealistic) because it is only a C1-diffeomorphism. On the other hand, since smooth normalizing flow is a C - diffeomorphism, the force field is well-defined (i.e., not diverging) and continuous in Figure 1 (c). In Figure 1 (d), nonuniform cubic B-spline flow, which is a C2-diffeomorphism, shows that the force field is well-defined and continuous, which is comparable to the smooth normalizing flow. We also depict some non-uniform B-spline transformations generated by our models in this toy example experiment in Figure 2. Figure 2 (a) shows the transformation in the aperiodic domain (I), and Figure 2 (b) illustrates the transformation in the periodic domain (S1). Note that Figure 2 (a) is extracted from the flow that models aperiodic prob- 0.0 0.2 0.4 0.6 0.8 1.0 (a) Aperiodic (I) 0.0 0.2 0.4 0.6 0.8 1.0 (b) Periodic (S1) Figure 2: Accumulation of non-uniform B-spline transformations on (a) I and (b) S1 generated by the flow model used in the illustrative toy example experiment ((a) from Figure 1(d) and (b) from the periodic case in supplemental, respectively). The displayed transformations were randomly selected. Note that learned transformations in (b) have continuous derivatives at both ends, demonstrating Theorem 2. ability density as in Figure 1, while Figure 2 (b) is obtained from the flow that models periodic probability density as in the periodic toy example in supplemental. We can observe that our diffeomorphic non-uniform B-spline transformation is surjective, differentiable, and diversely expressive. In particular, in Figure 2 (b), it can be observed that it also has the same slope at both endpoints as Theorem 2 guarantees while having diverse diffeomorphic transformations. Boltzmann Generator Training by Force Matching This section demonstrates that the proposed non-uniform B-spline flows can be applied to physical system modeling such asa Boltzmann generator, trained with FM, just like smooth normalizing flows, but unlike other prior normalizing flows such as RQ spline flows. The output of the Boltzmann generator is a 60-dimensional vector representing the structure (including bond length and angle) of a molecular system, alanine dipeptide. Alanine dipeptide is a typical example of molecular system structure sampling problems. Because alanine dipeptide has a highly nonlinear potential energy surface and singular points, common C1-diffeomorphic normalizing flows may have difficulty in learning its distribution model. We trained three normalizing flow models: RQ-spline flow, smooth normalizing flow, and our non-uniform Bspline flow. Two loss functions were used: NLL loss LNLL or NLL + FM loss (1 λFM)LNLL + λFMLFM where f(x(n)) x log px(x(n)) 2 Ex p x h f(x) x log px(x) 2 2 i , and we set λFM = 0.001. In density estimation, each transformation is performed in a forward direction, and in sampling, it is performed in a reverse direction (i.e., rootfinding). However, NLL + FM loss was not able to be used Method NLL FME 104 KLD RQ-spline -210.28 ( 0.05) 79.95 ( 92.80) 385.22 ( 8.05) Smooth -210.87 ( 0.05) 1.34 ( 0.07) 192.57 ( 0.22) Smooth+FM -210.25 ( 0.09) 0.909 ( 0.002) 196.85 ( 1.02) Non-uniform B-spline (ours) -211.03 ( 0.03) 4.59 ( 5.11) 192.62 ( 0.31) Non-uniform Bspline+FM (ours) -209.45 ( 1.78) 0.812 ( 0.345) 228.09 ( 54.6) Table 1: Negative log-likelihoods (NLLs), force matching errors (FMEs), and reverse Kullback-Leibler divergences (KLDs) of alanine dipeptide training with different methods. The +FM indication means that it is trained with NLL and FME; otherwise, it is trained with NLL only. The statistical values are the mean and twice the standard error for ten replication experiments. for RQ-spline flows since it diverged. So we trained Boltzmann generators in a total of five scenarios. Each experimental scenario was repeated ten times. Experimental details are described in the supplementary material. Table 1 shows NLLs, force matching errors (FMEs), and reverse Kullback-Leibler divergences (KLDs) on the test set after ten training epochs for five different scenarios, with standard error for ten replication experiments. RQ-spline flow yielded the worst FME and KLD. It seems obvious that RQ-spline yielded the largest FME because LFM can not be used. RQ-spline also yielded the largest KLD, which suggests that if the normalized flow model itself is not C2diffeomorphism, it may not be appropriate for Boltzmann generator even if LFM is not used. Smooth normalizing flow achieves great performance regardless of using LFM. Unlike RQ-spline flow, non-uniform B-spline flow achieved performance similar to smooth normalizing flow. One difference is that non-uniform B-spline flow yielded larger standard errors than smooth normalizing flow for some metrics. This is because non-uniform B-spline flow reaches similar metrics to smooth normalizing flow in most trials (about 9/10 trials) but has an outlier (about 1/10 trials). We discuss this phenomenon in the following section. Figure 3 depicts the scatterplot of the forces estimated by the flow model in five different scenarios. The top row shows the results of density estimation, and the bottom row shows the results of sampling. There was no significant difference between the two rows except for RQ-spline flows, which showed the worst performance. The other four scenarios showed similar performance, but each achieved slightly better performance when using FM. Therefore, both smooth normalizing flow and non-uniform spline flow seem to model the force well. Dynamics Simulation by Density Estimation In the previous experiment, our non-uniform B-spline flow seemed to model force well. However, simulating molecular dynamics through density estimation using normalizing Architecture #params Runtime (Reverse) Runtime (Forward) RQ-spline 285,497 0.59 ( 0.01) 0.49 ( 0.01) Smooth 314,456 19.8 ( 0.26) 0.64 ( 0.01) Non-uniform B-spline (ours) 380,982 1.12 ( 0.02) 0.72 ( 0.01) Table 2: Runtimes per sample (in ms) of RQ-spline flows, smooth flows and non-uniform B-spline flows. The runtime is averaged over 10,000 samples each. The statistical values are the mean and twice the standard error for ten replication experiments. All computations were conducted on NVIDIA Ge Force RTX3090. flows is much more challenging. Since the flow model does not have any internal algorithm that can suppress numerical errors, it quickly breaks the molecular structure unless it is very well-trained. Therefore, here we further verify our non-uniform B-spline flow thoroughly through the molecular dynamics simulation. We used three models for molecular dynamics simulation; RQ-spline flow, smooth normalizing flow with FM, and nonuniform B-spline flow with FM. Each model is identical to the one used in Figure 3. Figure 4 shows the potential energies during the molecular dynamics simulation. As in Figure 4 (a) and (b), RQspline flow failed to maintain the initial state for all ten different initial values. On the other hand, in Figure 4 (c) and (d), both non-uniform B-spline flow and smooth normalizing flow well-maintained all initial values. Runtime Comparison We compared the runtime of forward and reverse operations for the models used in the previous experiment. The forward operation performs density estimation, and the reverse operation performs sampling. Table 2 shows the average runtime for 10,000 samples. The table also shows the model s total number of parameters, indicating no significant difference in the size of the flow models. Because all three models performed forward operations analytically, they had similar runtimes (proportional to the number of parameters). In reverse operation, since only smooth normalizing flow was non-analytic (black-box root-finding), smooth normalizing flow was the slowest. Non-uniform B-spline flow, whose reverse operates analytically, was about 17 times faster than smooth normalizing flow. This demonstrates that the proposed non-uniform B-spline flow has a great advantage in runtime over smooth normalizing flow. We also observed that RQ-spline flow had an increased runtime in reverse operation since RQ-spline flow and non-uniform Bspline flow solve the quadratic and cubic equations, respectively. Discussion Outlier issue in non-uniform cubic B-spline flows In the Boltzmann generator experiment, our non-uniform cu- Figure 3: Scatterplot (10,000 samples) of forces estimated by each normalizing flow model (RQ-spline, smooth, and our nonuniform B-spline). The upper row is obtained by density estimation for test set samples, and the lower row is obtained by sampling with a flow model. The +FM indication means that it is trained with NLL + FM; otherwise, it is trained with NLL. 0.0 2.5 5.0 t [ps] Total Energy [k J/mol] (a) RQ-spline 0.0 2.5 5.0 t [ps] (b) RQ-spline (rescaled) 0.0 2.5 5.0 t [ps] (c) Non-uniform B-spline (ours) 0.0 2.5 5.0 t [ps] Figure 4: Potential energies during the molecular dynamics simulation, estimated by (a) RQ-spline flow ((b) rescaled), (c) non-uniform B-spline flow, and (d) smooth normalizing flow, respectively. The simulation started with ten stable random initial configurations. bic B-spline flow model showed an outlier with a large reverse KLD, about once in ten times. One possible explanation for this phenomenon can be the numerical instability of the cubic equation root-finding formula (Durkan et al. 2019b). Since the implementation of the model uses a float- ing point operation, it is impossible to guarantee whether the root-finding formula will actually find the root correctly beyond machine precision. This problem has also been raised in cubic spline flows (Durkan et al. 2019b), a study using the same root-finding formula as ours. Considering 32-bit floating-point precision, cubic spline flows clipped the input of every cubic-spline transformation to [10 6, 1 10 6] instead of [0, 1]. Similar trick of clipping the inputs or the increasement of the floating-point precision (e.g., 64-bit or 128-bit) may suppress outliers from the instability of the equation root formula for much more accurate physics system modeling. Limitations Spline-based normalizing flows, including our non-uniform B-spline flows, are much more computationally burdensome than affine coupling flows. For this reason, normalization flows dealing with images mainly employ affine transformations. Researchers in various fields such as medical imaging have studied invertibillity (Chun and Fessler 2009; Sdika 2013) and fast transformation (Unser, Aldroubi, and Eden 1993a,b) of uniform B-splines. Using uniform B-splines rather than non-uniform B-splines, it may be possible to construct normalizing flows as fast as affine coupling flows. We proposed non-uniform B-spline flows based on the sufficient conditions that kth-order non-uniform B-spline transformation is a Ck 2-diffeomorphism, by proving bi Lipschitz continuity and surjectivity on various compact domains. Experiments demonstrated that our non-uniform Bspline flows can solve the force matching problem in Boltzmann generators better than previous spline-based flows and as good as smooth flows. Our method can admit analytic inverses so that it is much faster than smooth flows. Acknowledgements This work was supported by the National Research Foundation of Korea(NRF) grants funded by the Korea government(MSIT) (NRF-2022R1A4A1030579, NRF2022M3C1A309202211) and by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education(NRF2017R1D1A1B05035810). Also, the authors acknowledged the financial supports from BK21 FOUR program of the Education and Research Program for Future ICT Pioneers, Seoul National University. Ahmad, R.; and Cai, W. 2022. Free energy calculation of crystalline solids using normalizing flows. Modelling and Simulation in Materials Science and Engineering, 30(6): 065007. Albergo, M. S.; Kanwar, G.; and Shanahan, P. E. 2019. Flow-based generative models for Markov chain Monte Carlo in lattice field theory. Physical Review D, 100(3): 034515. Blinn, J. F. 2007. How to solve a cubic equation, part 5: Back to numerics. IEEE Computer Graphics and Applications, 27(3): 78 89. Boyda, D.; Kanwar, G.; Racani ere, S.; Rezende, D. J.; Albergo, M. S.; Cranmer, K.; Hackett, D. C.; and Shanahan, P. E. 2021. Sampling using SU (N) gauge equivariant flows. Physical Review D, 103(7): 074504. Chun, S. Y.; and Fessler, J. A. 2009. A simple regularizer for B-spline nonrigid image registration that encourages local invertibility. IEEE journal of selected topics in signal processing, 3(1): 159 169. Curry, H. B.; and Schoenberg, I. J. 1947. On spline distributions and their limits: the P olya distribution functions. In Bulletin of the American Mathematical Society, volume 53, 1114. De Boor, C. 1978. A practical guide to splines, volume 27. springer-verlag New York. Ding, X.; and Zhang, B. 2021. Deep BAR: a fast and exact method for binding free energy computation. The journal of physical chemistry letters, 12(10): 2509 2515. Dinh, L.; Krueger, D.; and Bengio, Y. 2015. NICE: Nonlinear Independent Components Estimation. In ICLR (Workshop). Dinh, L.; Sohl-Dickstein, J.; and Bengio, S. 2017. Density estimation using Real NVP. In ICLR (Poster). Dolatabadi, H. M.; Erfani, S.; and Leckie, C. 2020. Invertible generative modeling using linear rational splines. In AISTATS, 4236 4246. Durkan, C.; Bekasov, A.; Murray, I.; and Papamakarios, G. 2019a. Neural spline flows. Neur IPS, 32. Durkan, C.; Bekasovs, A.; Murray, I.; and Papamakarios, G. 2019b. Cubic-Spline Flows. In ICML Workshop on Invertible Neural Nets and Normalizing Flows. Garcia Satorras, V.; Hoogeboom, E.; Fuchs, F.; Posner, I.; and Welling, M. 2021. E (n) Equivariant Normalizing Flows. Neur IPS, 34: 4181 4192. He, J.; Zhao, Z.; Ren, Y.; Liu, J.; Huai, B.; and Yuan, N. J. 2022. Flow-Based Unconstrained Lip to Speech Generation. In AAAI, 843 851. Ho, J.; Chen, X.; Srinivas, A.; Duan, Y.; and Abbeel, P. 2019. Flow++: Improving flow-based generative models with variational dequantization and architecture design. In ICML, 2722 2730. Jing, B.; Corso, G.; Chang, J.; Barzilay, R.; and Jaakkola, T. 2022. Torsional Diffusion for Molecular Conformer Generation. In ICLR Workshop MLDD. Kingma, D. P.; and Dhariwal, P. 2018. Glow: Generative flow with invertible 1x1 convolutions. Neur IPS, 31. Kingma, D. P.; and Welling, M. 2014. Auto-Encoding Variational Bayes. In ICLR. K ohler, J.; Klein, L.; and No e, F. 2020. Equivariant flows: exact likelihood generative learning for symmetric densities. In ICML, 5361 5370. K ohler, J.; Kr amer, A.; and No e, F. 2021. Smooth normalizing flows. Neur IPS, 34: 2796 2809. Li, S.-H.; and Wang, L. 2018. Neural network renormalization group. Physical review letters, 121(26): 260601. Liu, T.; Gao, W.; Wang, Z.; and Wang, C. 2022. Path Flow: A Normalizing Flow Generator that Finds Transition Paths. In UAI. Lugmayr, A.; Danelljan, M.; Gool, L. V.; and Timofte, R. 2020. Srflow: Learning the super-resolution space with normalizing flow. In ECCV, 715 732. M uller, T.; Mc Williams, B.; Rousselle, F.; Gross, M.; and Nov ak, J. 2019. Neural importance sampling. ACM Transactions on Graphics (TOG), 38(5): 1 19. Nicoli, K. A.; Anders, C. J.; Funcke, L.; Hartung, T.; Jansen, K.; Kessel, P.; Nakajima, S.; and Stornati, P. 2021. Estimation of thermodynamic observables in lattice field theories with deep generative models. Physical review letters, 126(3): 032001. Nicoli, K. A.; Nakajima, S.; Strodthoff, N.; Samek, W.; M uller, K.-R.; and Kessel, P. 2020. Asymptotically unbiased estimation of physical observables with neural samplers. Physical Review E, 101(2): 023304. No e, F.; Olsson, S.; K ohler, J.; and Wu, H. 2019. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science, 365(6457): eaaw1147. Papamakarios, G.; Nalisnick, E. T.; Rezende, D. J.; Mohamed, S.; and Lakshminarayanan, B. 2021. Normalizing Flows for Probabilistic Modeling and Inference. Journal of Machine Learning Research, 22(57): 1 64. Peters, C. 2016. How to solve a cubic equation, revisited. http://momentsingraphics.de/Cubic Roots.html. Accessed: 2022-07-16. Prenger, R.; Valle, R.; and Catanzaro, B. 2019. Waveglow: A flow-based generative network for speech synthesis. In IEEE ICASSP, 3617 3621. Rezende, D.; and Mohamed, S. 2015. Variational inference with normalizing flows. In ICML, 1530 1538. Sdika, M. 2013. A sharp sufficient condition for B-spline vector field invertibility. Application to diffeomorphic registration and interslice interpolation. SIAM Journal on Imaging Sciences, 6(4): 2236 2257. Sukthanker, R. S.; Huang, Z.; Kumar, S.; Timofte, R.; and Van Gool, L. 2022. Generative Flows with Invertible Attentions. In CVPR, 11234 11243. Trist an, A.; and Arribas, J. I. 2007. A fast B-spline pseudoinversion algorithm for consistent image registration. In CAIP, 768 775. Tu, L. W. 2008. Bump Functions and Partitions of Unity, 127 134. Springer New York. Unser, M.; Aldroubi, A.; and Eden, M. 1993a. B-spline signal processing. I. Theory. IEEE transactions on signal processing, 41(2): 821 833. Unser, M.; Aldroubi, A.; and Eden, M. 1993b. B-spline signal processing. II. Efficiency design and applications. IEEE transactions on signal processing, 41(2): 834 848. Wirnsberger, P.; Ballard, A. J.; Papamakarios, G.; Abercrombie, S.; Racani ere, S.; Pritzel, A.; Jimenez Rezende, D.; and Blundell, C. 2020. Targeted free energy estimation via learned mappings. The Journal of Chemical Physics, 153(14): 144112. Wu, H.; K ohler, J.; and No e, F. 2020. Stochastic normalizing flows. Neur IPS, 33: 5933 5944. Xu, M.; Luo, S.; Bengio, Y.; Peng, J.; and Tang, J. 2021. Learning Neural Generative Dynamics for Molecular Conformation Generation. In ICLR.