Published in Transactions on Machine Learning Research (09/2024)

The Kernel Perspective on Dynamic Mode Decomposition

Efrain H. Gonzalez ehgonza@sandia.gov
Department of Mathematics and Statistics, University of South Florida

Moad Abudia abudia@okstate.edu
School of Mechanical and Aerospace Engineering, Oklahoma State University

Michael Jury mjury@ufl.edu
Department of Mathematics, University of Florida

Rushikesh Kamalapurkar rkamalapurkar@ufl.edu
Department of Mechanical and Aerospace Engineering, University of Florida

Joel A. Rosenfeld rosenfeldj@usf.edu
Department of Mathematics and Statistics, University of South Florida

Reviewed on OpenReview: https://openreview.net/forum?id=sIR8xV7hGl

Now at the Department of Cognitive & Emerging Computing, Sandia National Laboratories. The following YouTube playlist discusses the content of this manuscript: https://youtube.com/playlist?list=PLldiDnQu2phuB64ccOYWxeSBZxq0Gg47g

Abstract

This manuscript takes a critical look at the interactions between Koopman theory and reproducing kernel Hilbert spaces (RKHSs) with an eye towards giving a tighter theoretical foundation for Koopman based dynamic mode decomposition (DMD), a data driven method for modeling a nonlinear dynamical system from snapshots. In particular, this paper explores the various necessary conditions imposed on the dynamics when a Koopman operator is bounded or compact over a RKHS. Ultimately, it is determined that for many RKHSs, the imposition of compactness or boundedness on a Koopman operator forces the dynamics to be affine. However, a numerical method is still recovered in more general cases through the consideration of the Koopman operator as a closed and densely defined operator, which requires a closer examination of the connection between the Koopman operator and a RKHS. By abandoning the feature representation of RKHSs, the tools of function theory are brought to bear, and a simpler algorithm is obtained for DMD than the one introduced in Williams et al. (2015b). This algorithm is also generalized to utilize vector valued RKHSs.

1 Introduction

Dynamic mode decomposition (DMD) has been gaining traction as a model-free method of making short-run predictions for nonlinear dynamical systems using data obtained as snapshots of trajectories. DMD aims to obtain a finite rank representation of the Koopman operator by studying its action on the full state observable (i.e., the identity function) (Rowley et al., 2009). Koopman operators over reproducing kernel Hilbert spaces (RKHSs) were studied to take advantage of infinite dimensional feature spaces to extract more information from the snapshots of a system in (Williams et al., 2015a). This perspective also enacts a dimensionality reduction by formulating the DMD method in a RKHS framework and implicitly using the kernel trick to compute inner products in the high-dimensional space of observables. In (Williams et al., 2015b), it is shown that kernel-based DMD produces a collection of Koopman modes that agrees with other DMD results in the literature. The introduction of kernel based techniques for Koopman analysis and DMD has catalyzed a new direction for Koopmanism. The new perspective added by kernel methods is that of approximations, and it is at the center of the work presented in this manuscript.
Universal RKHSs, such as those corresponding to Gaussian RBF and exponential dot product kernel functions, have the ability to approximate any continuous function over a compact subset of $\mathbb{R}^n$ to any desired accuracy up to computational precision. Moreover, when the kernel function is continuous or bounded, convergence in RKHS norm yields pointwise everywhere convergence and uniform convergence over compact subsets of $\mathbb{R}^n$ (Steinwart & Christmann, 2008). This everywhere convergence stands in contrast to the perspective through the lens of ergodic methods, where convergence results only hold almost everywhere. For additional insight into the pointwise everywhere convergence property described above, see Example 1 in Appendix A.1.

The study of dynamical systems through the Koopman formalism over RKHSs manifests as a search for functions that are close to being eigenfunctions of the Koopman operator, rather than the actual eigenfunctions. Since only a finite amount of data can be available for the study of an infinite dimensional operator, actual eigenfunctions typically cannot be computed. In fact, there is no requirement that $K_F$ even has any eigenfunctions. The universality property arises in the search for Koopman eigenfunctions, which, given a particular Koopman operator and kernel space, might not exist, since the existence of an eigenfunction depends on the Hilbert space. However, the formal equation, $\phi(F(x)) - \lambda\phi(x) = 0$, may still hold for some continuous function $\phi$. If one is working over a RKHS corresponding to a universal kernel, then for any given compact set and $\varepsilon > 0$, there is a function $\hat\phi \in H$ such that $|\hat\phi(x) - \phi(x)| < \varepsilon$ for all $x$ in that compact set, which provides for an approximate eigenfunction. Here we define approximate eigenfunctions as follows:

Definition 1. For $\epsilon > 0$, an approximate eigenfunction $\hat\phi$ is a function in the Hilbert space $H$, together with $\hat\lambda \in \mathbb{C}$, such that $|K_F \hat\phi(x) - \hat\lambda \hat\phi(x)| < \epsilon$ for all $x \in X$.

This is particularly important for DMD methods, which attempt to construct a finite rank approximation of a Koopman operator from a finite collection of observed snapshots. Note that obtaining approximate eigenfunctions as in Example 1 is not dissimilar to the objective of ergodic methods, where approximation of system invariants and eigenfunctions using time averages is sought. The existence of eigenfunctions depends on the selection of the Hilbert space, as will be shown in Section 4, and eigenfunctions may not be present even in the $L^2$ ergodic setting (Budišić et al., 2012).
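To make Definition 1 concrete, the following minimal sketch evaluates the residual $|K_F\hat\phi(x) - \hat\lambda\hat\phi(x)| = |\hat\phi(F(x)) - \hat\lambda\hat\phi(x)|$ on a grid of test points for a candidate function expressed as a combination of kernels. The dynamics, kernel, centers, candidate coefficients, and candidate eigenvalue below are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

# Sketch of Definition 1: numerically checking the approximate-eigenfunction
# residual of a candidate phi_hat for a Koopman operator K_F.

def k(x, y, mu=1.0):
    """Gaussian RBF kernel on R^n (assumed choice)."""
    return np.exp(-np.sum((x - y) ** 2) / mu)

def F(x):
    """Example discrete dynamics: a planar rotation (hypothetical)."""
    th = 0.1
    R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    return R @ x

# Candidate phi_hat(x) = sum_i c_i k(x, x_i) built from arbitrary centers.
centers = [np.array([np.cos(t), np.sin(t)]) for t in np.linspace(0, 2 * np.pi, 20)]
c = np.random.default_rng(0).normal(size=len(centers))

def phi_hat(x):
    return sum(ci * k(x, xi) for ci, xi in zip(c, centers))

# Since K_F phi_hat(x) = phi_hat(F(x)), the Definition 1 residual at x is
# |phi_hat(F(x)) - lam * phi_hat(x)|; take the max over a grid of test points.
lam = 0.95  # candidate eigenvalue (assumed)
grid = [np.array([a, b]) for a in np.linspace(-1, 1, 11) for b in np.linspace(-1, 1, 11)]
eps = max(abs(phi_hat(F(x)) - lam * phi_hat(x)) for x in grid)
print(f"max pointwise residual over the grid: {eps:.3e}")
```

A candidate is an $\epsilon$-approximate eigenfunction on the grid precisely when the printed residual falls below $\epsilon$; the DMD algorithm of Section 5 produces such candidates systematically.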
The objective of this manuscript is to present the kernel perspective of Koopmanism as a distinct study from the ergodic perspective. With that goal in mind, the paper is structured in the following way:

- Section 2: Introduces reproducing kernel Hilbert spaces (RKHSs), which will be necessary for the remainder of the paper.
- Section 3: Introduces Koopman operators, their relationship to DMD, and the differences between the ergodic and kernel perspectives. Here the pointwise convergence properties in the RKHS are presented.
- Section 4: Discusses properties of Koopman operators over RKHSs. It is demonstrated that assumptions of boundedness and compactness hold for a very small collection of Koopman operators.
- Section 5: A numerical algorithm for a Koopman based DMD method is presented, which relaxes the assumptions of bounded and compact Koopman operators to densely defined Koopman operators. This yields a new theoretical foundation for the study of Koopman DMD over RKHSs. The new algorithm still relies on some of the same matrices presented in (Williams et al., 2015b).
- Section 6: Extends the developed DMD algorithm to vector valued RKHSs.
- Section 7: Provides a numerical example that compares the developed DMD method to that presented in (Williams et al., 2015b).
- Appendix: Includes examples and proofs referenced in the text as well as a few numerical experiments which show that the developed DMD algorithm has nearly identical results to that developed by (Williams et al., 2015b).

2 Reproducing Kernel Hilbert Spaces

Definition 2. A Reproducing Kernel Hilbert Space (RKHS) $H$ over a set $X$ is a Hilbert space composed of functions from $X$ to $\mathbb{C}$ such that for all $x \in X$ the evaluation functional $E_x f := f(x)$ is bounded. Therefore, for all $x \in X$ there exists a function $K_x \in H$ such that $f(x) = \langle f, K_x \rangle_H$ for all $f \in H$. The function $K_x$ is the reproducing kernel centered at $x$, and the function $K : X \times X \to \mathbb{C}$ defined by $K(x, y) = \langle K_y, K_x \rangle_H$ is the unique kernel function corresponding to $H$ (Aronszajn, 1950).

Throughout most of this manuscript, the methods used will be restricted to RKHSs of real valued functions. However, for some specific examples in Section 4, it will be more convenient to employ RKHSs of complex valued functions of a real variable. Reproducing kernels can be equivalently expressed as realizations of inner products of feature space mappings in $\ell^2(\mathbb{N})$ (Steinwart & Christmann, 2008).

Proposition 1. Given an orthonormal basis $\{e_m(\cdot)\}_{m=1}^\infty$ for a RKHS $H$, the kernel function may be expressed as $K(x, y) = \sum_{m=1}^\infty e_m(x)e_m(y)$, where $\Psi(x) := (e_1(x), e_2(x), \ldots) \in \ell^2(\mathbb{N})$ is called a feature map. Equivalently, given a feature mapping $\Psi : X \to \ell^2(\mathbb{N})$, there is a RKHS whose kernel function is given as $K(x, y) = \langle \Psi(x), \Psi(y) \rangle_{\ell^2}$.

We will make repeated use of projections onto finite dimensional vector spaces arising from spans of collections of kernels centered at snapshots from a dynamical system.

Proposition 2. For a collection of centers $\{x_1, \ldots, x_m\}$, the projection of a function $g \in H$ onto $\alpha = \operatorname{span}\{K_{x_1}, \ldots, K_{x_m}\}$ is given as $\arg\min_{h \in \alpha} \|h - g\|_H$. The projection can be expressed in terms of $\alpha$ as $P_\alpha g := \sum_{i=1}^m w_i K_{x_i}$, where the weights $w_i$ satisfy
$$\begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_m) \\ \vdots & \ddots & \vdots \\ K(x_m, x_1) & \cdots & K(x_m, x_m) \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_m \end{pmatrix} = \begin{pmatrix} g(x_1) \\ \vdots \\ g(x_m) \end{pmatrix}. \tag{1}$$

Proof. The projection of a function in a Hilbert space onto a closed subspace is determined by finding the closest member of that subspace to the function being projected. The matrix equation may be established by expressing the projection as $P_\alpha g = \sum_{i=1}^m w_i K_{x_i}$, expanding $\|\sum_{i=1}^m w_i K_{x_i} - g\|_H^2$ via inner products, and setting the derivative with respect to $w := (w_1, \ldots, w_m)^\top \in \mathbb{R}^m$ to zero, resulting in the system of linear equations in equation 1.

If the functions $\{K_{x_1}, \ldots, K_{x_m}\}$ are linearly independent, then the Gram matrix in equation 1 is non-singular. In that case, the weights $\{w_i\}_{i=1}^m$ are unique.

Definition 3. A RKHS of real valued functions, $H$, over $\Omega \subset \mathbb{R}^n$ is said to be universal if for any compact $V \subset \Omega$, $\epsilon > 0$, and $h \in C(V)$, there is a function $\tilde h \in H$ such that $\sup_{x \in V}|h(x) - \tilde h(x)| < \epsilon$, where $C(V)$ denotes the set of continuous functions defined on $V$.

Many commonly used kernel functions satisfy this universality property, including the Gaussian RBF kernel functions and the exponential dot product kernel functions (Steinwart & Christmann, 2008).
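The following minimal sketch implements the projection of Proposition 2 by solving the Gram system in equation 1. The one-dimensional exponential dot product kernel, the centers, and the choice of $g$ are illustrative assumptions.

```python
import numpy as np

# Sketch of Proposition 2: projecting g onto span{K_x1, ..., K_xm}.

def k(x, y):
    return np.exp(x * y)  # exponential dot product kernel on R (assumed)

centers = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
g = np.sin  # function to project (illustrative choice)

# Gram matrix G_{ij} = K(x_i, x_j) and right-hand side g(x_i), as in equation 1.
G = k(centers[:, None], centers[None, :])
w = np.linalg.solve(G, g(centers))

def Pg(x):
    """Projection P_alpha g = sum_i w_i K_{x_i}."""
    return sum(wi * k(x, xi) for wi, xi in zip(w, centers))

# When G is nonsingular, the projection interpolates g at the centers.
print([abs(Pg(x) - g(x)) for x in centers])
```

Note the design point exploited throughout the paper: the projection is computed entirely through kernel evaluations, without ever forming the feature map of Proposition 1.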
3 Koopman Operators over RKHSs

The theory of Koopman operators has long been intertwined with ergodic theory, where ergodic theoretic methods justify almost everywhere convergence claims of time averaging methods to invariants of the Koopman operator. The Birkhoff and von Neumann ergodic theorems are posed over $L^1(\mathbb{R})$ and $L^p(\mathbb{R})$ for $p > 1$, respectively (Walters, 2000). However, the invariants for Koopman operators are not always analytic or even smooth. Hence, ergodic theorems do not give guarantees of convergence within most RKHSs, which are frequently composed of real analytic functions. Furthermore, even though ergodic theorems guarantee the existence of invariants over $L^2(\mathbb{R})$, the invariant itself is hidden behind a limiting operation via time averages (Walters, 2000). Hence, there is an expected error in the constructed invariant that stems from finiteness of data.

The objective of DMD methods is to find functions within the function space that nearly achieve eigenfunction behavior. Specifically, for a Koopman operator, $K_F$, and $\epsilon > 0$, the objective is to find $\hat\phi \in H$ and $\lambda \in \mathbb{C}$ for which $|K_F \hat\phi(x) - \lambda \hat\phi(x)| < \epsilon$ for all $x$ in a given workspace. When such a function is discovered, the eigenfunction witnesses the snapshots, $x_{i+1} = F(x_i)$, as an exponential function as
$$\hat\phi(x_{i+1}) = \lambda^i \hat\phi(x_1) + \frac{1 - \lambda^{i+1}}{1 - \lambda}\,\epsilon.$$
If $\hat\phi$ is a proper eigenfunction for $K_F$, then $\epsilon$ may be taken to be zero, and $\hat\phi(x_{i+1}) = \lambda^i \hat\phi(x_1)$. Ergodic methods generally yield $|K_F \hat\phi(x) - \lambda \hat\phi(x)| < \epsilon$ only for almost all $x$ within the domain of interest. For a RKHS, the condition $|K_F \hat\phi(x) - \lambda \hat\phi(x)| < \epsilon$ may be relaxed to $\|K_F \hat\phi - \lambda \hat\phi\|_H < \epsilon$, since $|K_F \hat\phi(x) - \lambda \hat\phi(x)| \le C\,\|K_F \hat\phi - \lambda \hat\phi\|_H < C\epsilon$, where $C > 0$ depends on the kernel function and the point $x$. If the kernel function is continuous and the domain is compact, a finite $C$ may be selected uniformly for that domain. In the special case of the Gaussian RBF kernel function and the domain being $\mathbb{R}^n$, $C$ may be taken to be 1.

Proposition 3. If $K_F$ is compact and the finite rank approximation of $K_F$, which we will denote as $\hat K_F$, is within $\epsilon$ of $K_F$ with respect to the operator norm, and if $\hat\phi$ is a normalized eigenfunction for $\hat K_F$ with eigenvalue $\lambda$, then
$$\|K_F \hat\phi - \lambda \hat\phi\|_H = \|K_F \hat\phi - \hat K_F \hat\phi\|_H \le \|K_F - \hat K_F\| \le \epsilon.$$

Hence, if we obtain a finite rank approximation of $K_F$ that is within $\epsilon$ with respect to the operator norm, then an eigenfunction of the finite rank approximation will be an approximate eigenfunction of $K_F$. This approximation is important in DMD, where the eigenfunctions are utilized to generate an approximation of the full state observable, $g_{id}(x) := x$, one dimension at a time, as outlined in Section 5. If an accurate finite rank approximation of the Koopman operator can be obtained in a RKHS, then the approximation of the overall model is accurate pointwise everywhere via the same proof given in (Rosenfeld et al., 2022), and uniformly over compact sets when the RKHS consists of continuous functions. In contrast, the approximation is only accurate almost everywhere when considering Koopman operators posed over $L^2(\mathbb{R})$. Pointwise everywhere approximation is a distinctive advantage of kernel based methods. This convergence result is less clear from an invocation of the kernel trick in machine learning, and exemplifies the advantage of the operator theoretic considerations introduced in this manuscript.
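The constant $C$ above comes from the reproducing property: $|h(x)| = |\langle h, K_x\rangle_H| \le \|h\|_H\sqrt{K(x,x)}$, so $C = \sup_x \sqrt{K(x,x)}$ over the domain of interest. The short sketch below evaluates this constant for two common kernels; the grid and kernels are illustrative assumptions.

```python
import numpy as np

# Sketch: the constant converting an RKHS-norm bound into a pointwise bound
# is C = sup_x sqrt(K(x, x)) over the chosen compact set, since
# |h(x)| = |<h, K_x>_H| <= ||h||_H * sqrt(K(x, x)).

X = np.linspace(-2.0, 2.0, 401)  # compact set [-2, 2] in R (assumed)

def C_of(kernel_diag):
    return np.sqrt(np.max(kernel_diag(X)))

gauss_diag = lambda x: np.ones_like(x)   # K(x, x) = 1 for the Gaussian RBF
expdot_diag = lambda x: np.exp(x * x)    # K(x, x) = e^{x^2} for exp dot product

print("C (Gaussian RBF):", C_of(gauss_diag))    # 1.0, uniformly on all of R^n
print("C (exp dot prod):", C_of(expdot_diag))   # grows with the domain
```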
The approximation of the Koopman operator in the operator norm topology naturally leads to the question of when a Koopman operator, or more generally, a composition operator, can be compact. Compactness is a central issue for DMD, as every approximation is indeed of finite rank, stemming from the observed data. It was established in (Singh & Kumar, 1979) that composition operators over $L^2(\mu)$ cannot be compact when $\mu$ is non-atomic. Indeed, $L^2(\mathbb{R})$ has no compact composition operators. Koopman operators over most frequently considered RKHSs are compact only for a very narrow range of dynamics. In fact, most Koopman operators over these spaces are not even bounded, as will be expanded upon in Section 4.2.3. Unboundedness is an added complication for the valid implementation of DMD, addressed in Section 5, using densely defined and potentially unbounded Koopman operators. Alternatively, other classes of compact operators over RKHSs connected to dynamical systems can be leveraged for DMD procedures to give convergence guarantees (see, e.g., Rosenfeld et al. (2022)). The remainder of this section introduces densely defined Koopman operators over RKHSs, where Lemma 1 enables the DMD algorithm introduced in Section 5.

Definition 4. Let $H$ be a RKHS over $\mathbb{R}^n$. For a function $F : \mathbb{R}^n \to \mathbb{R}^n$, we define the Koopman operator (sometimes called a composition operator), $K_F : D(K_F) \to H$, as $K_F g = g \circ F$, where $D(K_F) = \{g \in H : g \circ F \in H\}$. When $D(K_F)$ is dense in $H$, $K_F$ is said to be densely defined.

While not all densely defined Koopman operators over RKHSs are bounded, they are all closed operators (Pedersen, 2012).

Lemma 1. Let $F : X \to X$ be the symbol for a Koopman operator over a RKHS $H$ over a set $X$. Then $K_F : D(K_F) \to H$ is a closed operator.

The proof of the above lemma is provided in Appendix B.1.

Proposition 4. If a Koopman operator is densely defined, its adjoint is densely defined and closed.

Proof. Given the kernel function centered at $x$, $\langle K_F g, K_x \rangle_H = \langle g \circ F, K_x \rangle_H = g(F(x)) = \langle g, K_{F(x)} \rangle_H$. Therefore, the linear functional $g \mapsto \langle K_F g, K_x \rangle_H$ is bounded over $H$. Thus, $K_x \in D(K_F^*)$ for all $x \in X$, and $K_F^* K_x = K_{F(x)}$. Hence, each kernel function is in the domain of the adjoint of a densely defined Koopman operator, and as the span of kernel functions is dense in their RKHS, the adjoint is densely defined.

4 A Different Landscape for Koopman Operators

This section examines properties of Koopman operators over RKHSs. The selection of space fundamentally changes the behavior of Koopman operators over that space, where properties such as the lattice of eigenfunctions, common eigenfunctions for different discretizations, and boundedness of the operators may not hold. In the succeeding subsections we discuss each of these properties and provide counterexamples for each of them in Appendix A.3 for Koopman operators corresponding to the continuous time dynamics $\dot x = (-x_2,\ x_1)^\top$. We discuss that many RKHSs only support bounded Koopman operators when the discretized dynamics are affine, and in the case of the Gaussian RBF's native space, we provide a novel proof of this fact in Appendix B.2. Subsequently, the notation $F_{\Delta t}$ will denote the discretized dynamics corresponding to $\dot x = (-x_2,\ x_1)^\top$ with fixed time step $\Delta t > 0$.

Recently, kernel methods have been adapted for the study of DMD and Koopman operators, largely through the guise of extended DMD, where kernels are leveraged to simplify computations via the kernel trick.
However, the adjustment from the classical study of Koopman operators through ergodic theory to that of reproducing kernel Hilbert spaces leads to significant differences in the Koopman operators and their properties. In most cases, the ergodic theorem cannot be directly applied to recover invariants of the operation $g \mapsto g \circ F$, for a given $F$, since those invariants may be nonsmooth. This section exemplifies some of the distinguishing properties of Koopman operators over RKHSs, and in some cases, illustrates their limitations. Many of the classical properties of Koopman operators, established in a variety of specific contexts such as $L^2$ spaces of invariant measures and $L^1$ spaces, can be seen in (Budišić et al., 2012; Kawahara, 2016; Kutz et al., 2016b; Brunton & Kutz, 2019; Brunton et al., 2021). Properties of the Koopman operator strongly depend on the selection of the underlying vector space, and boundedness, compactness, eigenvalues, etc. change based on this selection. While Koopman operators were introduced by Koopman in 1931 in (Koopman, 1931) and then later picked up by the data science community in the early 2000s (e.g., (Mezić, 2005; Kutz et al., 2016b)), the study of such operators and their properties continued in earnest throughout the 20th century as composition operators (e.g., (Shapiro, 2012)). This is particularly important for RKHSs, where the specification of a bounded or densely defined Koopman operator over a particular space yields strong restrictions on the available dynamics.

4.1 Concerning Sampling and Discretizations

4.1.1 Forward Complete Dynamics

In applications, Koopman operators enter the theory of continuous time dynamics through a discretization of the continuous time dynamical system (Bittracher et al., 2015; Mauroy & Mezić, 2016). That is, given the dynamical system $\dot x = f(x)$, the system is discretized through the selection of a fixed time-step, $\Delta t > 0$, as
$$x_{m+1} = x_m + \int_{t_m}^{t_m + \Delta t} f(x(t))\,dt,$$
where the right hand side plays the role of the discrete dynamics. However, for such a discretization to exist for arbitrarily large values of $m$, it is necessary that the dynamics be forward complete. The forward completeness assumption restricts the class of continuous dynamics to which Koopman based methods may be applied. For example, the continuous time dynamics $\dot x = 1 + x^2$ does not admit a discretization, since it is not forward complete. Example 2 in Appendix A.2 demonstrates where discretization fails for $\dot x = 1 + x^2$. A DMD method that circumvents this requirement, by utilizing Liouville operators and occupation kernels, may be found in (Rosenfeld et al., 2022).
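The following sketch realizes the discrete map $F_{\Delta t}$ for a forward-complete system by numerically integrating $\dot x = f(x)$ over one time step. The planar rotation below is the running example of this section; the step size and integrator tolerances are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the discretization in Section 4.1.1: for forward-complete
# dynamics x' = f(x), the discrete map F_dt(x) integrates f over one step.

def f(t, x):
    return np.array([-x[1], x[0]])  # x' = (-x2, x1)^T, circular dynamics

def F_dt(x, dt=0.1):
    sol = solve_ivp(f, (0.0, dt), x, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

# Snapshots x_{m+1} = F_dt(x_m) sampled from a single trajectory.
x = np.array([1.0, 0.0])
snapshots = [x]
for _ in range(30):
    x = F_dt(x)
    snapshots.append(x)
print(np.round(snapshots[1], 6))  # approximately (cos 0.1, sin 0.1)
```

For $\dot x = 1 + x^2$, by contrast, no such loop can run for arbitrarily many steps; see Example 2 in Appendix A.2 and the numerical illustration there.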
4.1.2 Sampling and Data Science

Ergodic based methods as employed in (Budišić et al., 2012; Mezić, 2005; Kutz et al., 2016b; Takeishi et al., 2017) provide a methodology for obtaining invariants and eigenfunctions for a Koopman operator almost everywhere. That is, by selecting a continuous representative of an equivalence class in an $L^2$ space for the invariant measure, at almost every point within the domain, time averaging against that representative will converge to an invariant of the operator. However, this is a probability 1 result, and the number of points where it may fail can potentially be uncountable. Without any external information concerning the convergence, there is no true guarantee that at a particular selected point, the time-averaged approximation will be close to the evaluation of an actual invariant at that point.

Such computational issues are precisely where the strength of kernel methods manifests. To illustrate the kernel method, consider the following theorem:

Theorem 1. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be a discretization of a dynamical system with the corresponding Koopman operator, $K_F : H \to H$, and let $H$ be a RKHS over $\mathbb{R}^n$ consisting of continuous functions. Suppose further that $\epsilon > 0$ and $\hat K_F : H \to H$ is an approximation of $K_F$ such that the norm difference is bounded as $\|K_F - \hat K_F\| < \epsilon$. Suppose that $\hat\phi \in H$ is a normalized eigenfunction of $\hat K_F$ with eigenvalue $\lambda$. Then $\hat\phi$ is an approximate eigenfunction of $K_F$:
$$|K_F \hat\phi(x) - \lambda \hat\phi(x)| = |K_F \hat\phi(x) - \hat K_F \hat\phi(x)| = |\langle (K_F - \hat K_F)\hat\phi, K(\cdot, x) \rangle_H| \le \|K_F - \hat K_F\|\,\|K(\cdot, x)\|_H \le \epsilon C,$$
where $C > 0$ is a constant that depends on the kernel function and a prespecified compact domain. The compact domain may be extended to all of $\mathbb{R}^n$ in some cases, such as when the kernel function is the Gaussian RBF kernel function.

Thus it can be seen that kernel spaces and approximations that are close to the Koopman operator, in operator norm, can provide functions that behave similarly to eigenfunctions of the Koopman operator. Moreover, the difference in behavior from a proper eigenfunction is governed pointwise by how close the operator approximation is in the first place.

4.2 Properties of the Operators

This section will consider the classical Fock space, consisting of entire functions, as a function space over which the Koopman operator is defined. The Fock space is used extensively in quantum mechanics (Hall, 2013), and it is a space where operators have been well studied (Zhu, 2012).

Definition 5. The Fock space over $\mathbb{C}$ is the RKHS with kernel function $K(z, w) = e^{\bar w z}$. The kernel function for the Fock space over $\mathbb{C}^n$ may be obtained through a product of single variable kernels as $K(z, w) = e^{\bar w^\top z} = e^{\bar w_1 z_1} \cdots e^{\bar w_n z_n}$. The Fock space over $\mathbb{C}$ is given as
$$F^2(\mathbb{C}) = \left\{ g(z) = \sum_{m=0}^\infty a_m z^m : \sum_{m=0}^\infty |a_m|^2\, m! < \infty \right\}.$$

Closely related to the Fock space is the exponential dot product kernel, $e^{x^\top y}$, where for a single variable, the exponential dot product kernel's native space may be obtained by restricting the Fock space to the reals, and then taking the real part of the restricted functions. Through a conjugation of the exponential dot product kernel, the Gaussian RBF may be obtained as
$$K_G(x, y) = e^{-\|x\|_2^2/2}\, e^{x^\top y}\, e^{-\|y\|_2^2/2} = \exp\left(-\frac{\|x - y\|_2^2}{2}\right),$$
and performing the same operation on the Fock space kernel over $\mathbb{C}^n$ yields
$$K_G(z, w) = e^{-z^\top z/2}\, e^{\bar w^\top z}\, e^{-\bar w^\top \bar w/2} = \exp\left(-\frac{(z - \bar w)^\top (z - \bar w)}{2}\right),$$
which is the kernel corresponding to the complexified native space for the Gaussian radial basis function over $\mathbb{C}^n$ (cf. (Steinwart & Christmann, 2008)). This space may be expressed as
$$H_G^2(\mathbb{C}^n) = \left\{ g(z)\, e^{-z^\top z/2} : g \in F^2(\mathbb{C}^n) \right\},$$
and the native space corresponding to the Gaussian RBF can be obtained by taking the real parts of functions from $H_G^2$ and restricting to $\mathbb{R}^n$.
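The conjugation identity above can be verified numerically; the short sketch below checks the real case $e^{-\|x\|^2/2}\, e^{x^\top y}\, e^{-\|y\|^2/2} = e^{-\|x-y\|^2/2}$ at random points. The dimension and random seed are arbitrary choices.

```python
import numpy as np

# Sanity check of the conjugation identity relating the exponential dot
# product kernel to the Gaussian RBF kernel (Section 4.2).
rng = np.random.default_rng(1)
x, y = rng.normal(size=3), rng.normal(size=3)

lhs = np.exp(-x @ x / 2) * np.exp(x @ y) * np.exp(-y @ y / 2)
rhs = np.exp(-np.sum((x - y) ** 2) / 2)
print(abs(lhs - rhs))  # ~1e-16, i.e., equal up to floating point error
```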
4.2.1 Lattice of Eigenfunctions

As presented in (Budišić et al., 2012; Klus et al., 2015), the eigenfunctions of Koopman operators over $L^\infty(\mathbb{R})$ form a lattice. That is, if $\phi_1$ and $\phi_2$ are two eigenfunctions for the Koopman operator, then so is $\phi_1 \phi_2$. For the lattice to occur more generally, it is necessary for the product of the eigenfunctions to be a member of the underlying vector space. This closure property holds, for example, in the space of continuous functions and other Banach algebras. Hilbert spaces are not generally Banach algebras, and since it is desirable to work over Hilbert spaces for properties such as best approximations, projections, and orthonormal bases (cf. (Folland, 1999)), it is important to point out that the closure property of eigenfunctions of Koopman operators does not hold in general. For example, the eigenfunctions of $K_{F_\pi}$ do not form a lattice. Fundamentally, powers of $\phi(z) = e^{z^2/4} \in F^2(\mathbb{C})$ are not all contained in the Fock space. This is ultimately a consequence of growth conditions imposed by the RKHS norm. For more details, consult Appendix A.3.1.

4.2.2 Common Eigenfunctions

The intuition behind the use of Koopman operators in the study of continuous time dynamical systems is that eigenfunctions for the Koopman operators should be close to those of the Koopman generator for small timesteps. However, semi-groups of Koopman operators do not always share a common collection of eigenfunctions. Since each Koopman operator obtained through a fixed time-step may produce a different collection of eigenfunctions, there is no way to distinguish which, if any, should correspond to eigenfunctions of the Koopman generator. In Appendix A.3.2 we exhibit an eigenfunction for the Koopman operator corresponding to $F_{\pi/2}$ that is not an eigenfunction for the Koopman operator corresponding to $F_{\pi/3}$.

4.2.3 Boundedness of Koopman Operators

Throughout the literature, it is frequently assumed that Koopman operators are bounded. This assumption manifests as an unrestricted selection of observables in the study of the Koopman operator. When a Koopman operator is a densely defined operator whose domain is the entire Hilbert space, it is also closed (Pedersen, 2012). Hence, by the closed graph theorem (cf. (Folland, 1999, Theorem 5.12)), such an operator must be bounded. Furthermore, the collection of finite rank operators is dense in the collection of bounded operators over a Hilbert space in the strong operator topology (SOT) (cf. (Pedersen, 2012, Paragraph 4.6.2)). Convergence in SOT was independently studied in the work (Korda & Mezić, 2018), where the DMD routine was demonstrated to converge to a bounded Koopman operator in SOT. As mentioned in (Korda & Mezić, 2018), SOT convergence does not in general lead to convergence of the eigenvalues.

To preserve spectral convergence, the finite rank approximations produced by DMD algorithms need to converge to Koopman operators in the operator norm topology. The most direct approach, and one that leads to good pointwise estimates of eigenfunctions, is through the use of compact Koopman operators. However, it isn't immediately clear when one can expect a continuous dynamical system to yield a compact Koopman operator through discretization. For example, the Koopman operator corresponding to the discretization of the continuous time system $\dot x = 0$ is the identity operator, $I$, for any fixed time step, and $I$ is not compact over any infinite dimensional Hilbert space.

In addition, for any given RKHS, the collection of bounded Koopman operators is very small. It was demonstrated in (Carswell et al., 2003) that a Koopman operator over the Fock space is bounded only when the corresponding discrete dynamics are affine. It follows that the same result holds over the exponential dot product kernel's native space. It may perhaps be less obvious that this result extends to the Gaussian RBF's native space. A proof of this fact was first published in (Ikeda et al., 2022). We provide a novel proof of this result that leverages the Gaussian RBF kernel's connection to the Fock space in Appendix B.2. The theorem and corollary related to our proof are shown below. These proofs demonstrate that even for popular selections of RKHSs, the collection of bounded Koopman operators is small.

Theorem 2. If $F : \mathbb{C}^n \to \mathbb{C}^n$ is an entire function, and $K_F$ is bounded on $H_G$, then $F(z) = Az + b$ for a matrix $A \in \mathbb{C}^{n \times n}$ and vector $b \in \mathbb{C}^n$. Here, $H_G$ represents the Gaussian RBF's native space.

Corollary 1. If $F$ is a real entire vector valued function, and $K_F$ is bounded on the Gaussian RBF's native space over $\mathbb{R}^n$, then $F$ is affine.

Hence, for the most commonly used kernel function in machine learning, the collection of bounded (and hence compact) Koopman operators over its native space is restricted to only those Koopman operators corresponding to affine dynamics. Each selection of RKHS and kernel function will yield a correspondingly small collection of bounded Koopman operators. It should be noted that Koopman operators were completely classified over the classical sampling space, the Paley–Wiener space (Chacón & Giménez, 2007), as also being those that correspond to affine dynamics, and it is a simple exercise to show that the native space for the polynomial kernel also only admits bounded Koopman operators when the dynamics are affine. In summary, thus far it has been established that Koopman operators are only bounded in the case where the dynamics are affine over the following spaces:

- The polynomial kernel space, $K(x, y) = (1 + x^\top y)^n$,
- The Fock space, $K(z, w) = e^{\alpha \bar w^\top z}$,
- The exponential dot product kernel's native space, $K(x, y) = e^{\alpha x^\top y}$,
- The Gaussian RBF's native space, $K(x, y) = \exp\left(-\frac{1}{\mu}\|x - y\|_2^2\right)$,
- Paley–Wiener spaces, $K(x, y) = \operatorname{sinc}(\alpha(x - y))$, and
- Other spaces discussed in (Ikeda et al., 2022).

Consequently, in most practical respects, Koopman operators over RKHSs should not be assumed to be bounded, and certainly not compact.
These proofs demonstrate that even for popular selections of RKHSs, the collection of bounded Koopman operators is small. Theorem 2. If F : Cn Cn is an entire function, and KF is bounded on HG, then F(z) = Az + b for a matrix A Cn n and vector b Cn. Here, HG represents the Gaussian RBF s native space. Corollary 1. If F is a real entire vector valued function, and KF is bounded on the Gaussian RBF s native space over Rn, then F is affine. Hence, for the most commonly used kernel function in machine learning, the collection of bounded (and hence compact) Koopman operators over its native space is restricted to only those Koopman operators corresponding to affine dynamics. Each selection of RKHS and kernel function will yield a correspondingly small collection of bounded Koopman operators. It should be noted that Koopman operators were completely classified over the classical sampling space, the Paley-Wiener space (Chacón & Giménez, 2007), as also being those that correspond to affine dynamics, and it is a simple exercise to show that the native space for the polynomial kernel also only admits bounded Koopman operator when the dynamics are affine. In summary, thus far it has been established that Koopman operators are only bounded in the case where the dynamics are affine over the following spaces: The polynomial kernel space, K(x, y) = (1 + x T y)n, The Fock Space, K(z, w) = eαz w, The exponential dot product kernel s native space, K(x, y) = eαx T y, The Gaussian RBF s native space, K(x, y) = exp 1 µ x y 2 2 , Paley-Wiener spaces, K(x, y) = sinc(α(x y)), and Other spaces discussed in (Ikeda et al., 2022). Consequently, in most practical respects Koopman operators over RKHSs should not be assumed to be bounded, and certainly not compact. 5 Dynamic Mode Decomposition with Koopman Operators over RKHSs As a product of its genesis in the machine learning community, many DMD procedures appeal to feature space, and this holds in implementations of kernel-based extended DMD (Williams et al., 2015b), which casts the snapshots from a finite dimensional nonlinear system into an infinite feature space. The direct involvement of the feature space in the estimation of the Koopman operator leads to rather complicated numerical machinery. To avoid directly computing the infinite dimensional vectors that result, an involved collection of linear algebra techniques are leveraged to extract the Koopman modes. Here it is shown that this process may be simplified and that a procedure that directly involves the kernel functions centered at Published in Transactions on Machine Learning Research (09/2024) the snapshots simplifies the design of DMD algorithms. This approach keeps with the spirit of the kernel trick, where feature vectors are never directly evaluated and only accessed through evaluations of the kernel function itself. The algorithm presented in this section is designed around scalar valued RKHSs, which necessitates the decomposition of the (vector valued) full state observable component-wise. A complete vector valued algorithm is discussed in Section 6, where it is demonstrated that the present algorithm is computationally equivalent to the vector valued algorithm for certain selections of kernel operators. Recall from Proposition 2 that in a real valued Hilbert space, the projection of a function g onto a collection of linearly independent basis functions, u1, . . . 
We will use Definition 6 in order to aid us in creating and denoting the finite rank representation of $K_F$.

Definition 6. Let $E$ be a linear transformation between two finite dimensional vector spaces $V$ and $W$. Suppose that $\alpha = \{\alpha_1, \ldots, \alpha_N\}$ is an ordered basis for $V$, and suppose that $\beta = \{\beta_1, \ldots, \beta_M\}$ is an ordered basis for $W$. A matrix representation of the linear transformation, $E$, with respect to the ordered bases $\alpha$ and $\beta$ is denoted as $[E]_\alpha^\beta$, where the $j$-th column of this matrix contains the weights of the vector $E\alpha_j$ corresponding to the ordered basis $\beta$.

Throughout this algorithm, a Koopman operator will be assumed to be densely defined, as Section 4 demonstrated that most Koopman operators cannot be expected to be bounded or compact. An additional assumption will be made that the kernel functions themselves reside in the domain of the Koopman operator. It should be noted that since the kernels are always in the domain of the adjoint of the Koopman operator (see Section 3), a finite rank representation of the adjoint of the Koopman operator may be derived without assuming that the kernels are in the domain of the Koopman operator. For the sake of the derivation, it is also assumed that the Koopman operator is diagonalizable, which is not generally expected to be true. However, the finite rank representations leveraged in this manuscript are almost always diagonalizable, since the set of non-diagonalizable matrices is of measure zero in the collection of all matrices. Moreover, for periodic data sets, the adjoint of the Koopman operator will be invariant on the span of the collection of kernel functions centered at the snapshots, and thus, the finite rank representations will be explicitly the adjoint of the Koopman operator on that subspace, which supports the assumption of the availability of eigendecompositions for the Koopman operator in the periodic or quasiperiodic settings.

For a given collection of snapshots $\{x_1, x_2, \ldots, x_m\}$,¹ the goal is to determine a finite rank representation of $K_F$ that is derived from the kernel functions centered at the snapshots. To express a finite rank representation, the ordered basis $\alpha = \{k_{x_1}, \ldots, k_{x_{m-1}}\}$ is leveraged. In particular, if $P_\alpha$ is the projection onto $\operatorname{span}(\alpha)$, the operator $P_\alpha K_F$ maps $\operatorname{span}(\alpha)$ to itself, which enables the discussion of eigenfunctions and eigenvalues of $P_\alpha K_F$ using only functions in $\operatorname{span}(\alpha)$.

Proposition 5. Given a collection of snapshots $\{x_1, x_2, \ldots, x_m\} \subset X$ and the corresponding ordered basis $\alpha = \{k_{x_1}, \ldots, k_{x_{m-1}}\}$ of kernel functions, if $g = \sum_{i=1}^{m-1} a_i k_{x_i}$ and $P_\alpha K_F g = \sum_{i=1}^{m-1} b_i k_{x_i}$, then
$$Gb = Ia, \tag{2}$$
where $a := (a_1, \ldots, a_{m-1})^\top$, $b := (b_1, \ldots, b_{m-1})^\top$, and the Gram matrix $G$ and the interaction matrix $I$ are given by
$$G = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_{m-1}) \\ \vdots & \ddots & \vdots \\ K(x_{m-1}, x_1) & \cdots & K(x_{m-1}, x_{m-1}) \end{pmatrix}, \quad I = \begin{pmatrix} K(x_2, x_1) & \cdots & K(x_2, x_{m-1}) \\ \vdots & \ddots & \vdots \\ K(x_m, x_1) & \cdots & K(x_m, x_{m-1}) \end{pmatrix}.$$

Proof. Using Proposition 2, the reproducing property, and the fact that $K_F k_{x_i}(x) = k_{x_i}(F(x)) = K(F(x), x_i)$,
$$\begin{pmatrix} \langle K_F \sum_{i=1}^{m-1} a_i k_{x_i}, k_{x_1} \rangle_H \\ \vdots \\ \langle K_F \sum_{i=1}^{m-1} a_i k_{x_i}, k_{x_{m-1}} \rangle_H \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m-1} a_i \langle K_F k_{x_i}, k_{x_1} \rangle_H \\ \vdots \\ \sum_{i=1}^{m-1} a_i \langle K_F k_{x_i}, k_{x_{m-1}} \rangle_H \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m-1} a_i K(x_2, x_i) \\ \vdots \\ \sum_{i=1}^{m-1} a_i K(x_m, x_i) \end{pmatrix} = Ia,$$
and equation 2 follows from Proposition 2 applied to the projection $P_\alpha K_F g$.

¹ While availability of a time series of snapshots $\{x_1, x_2, \ldots, x_m\}$ such that $x_{i+1} = F(x_i)$ is a more typical use case, the developed method does not require such a time series. It can also be implemented using arbitrary snapshots $\{x_1, x_2, \ldots, x_m\}$ and $\{y_1, y_2, \ldots, y_m\}$ provided $y_i = F(x_i)$.
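The following sketch assembles the Gram matrix $G$ and interaction matrix $I$ of Proposition 5 from a time series of snapshots. The kernel choice and the rotation map used to generate snapshots are illustrative assumptions; later sketches in this section reuse these variables.

```python
import numpy as np

# Sketch of Proposition 5: assembling G and I from snapshots x_1, ..., x_m
# with x_{i+1} = F(x_i).

def k(x, y, mu=1.0):
    return np.exp((x @ y) / mu)  # exponential dot product kernel (assumed)

th = 0.1
A = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
m = 31
X = np.zeros((2, m))
X[:, 0] = [1.0, 0.0]
for i in range(m - 1):
    X[:, i + 1] = A @ X[:, i]  # snapshots from a planar rotation

# G_{i,l} = K(x_i, x_l) over the first m-1 snapshots; I_{i,l} = K(x_{i+1}, x_l).
G = np.array([[k(X[:, i], X[:, l]) for l in range(m - 1)] for i in range(m - 1)])
I = np.array([[k(X[:, i + 1], X[:, l]) for l in range(m - 1)] for i in range(m - 1)])
```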
If $k_{x_1}, \ldots, k_{x_{m-1}}$ are linearly independent, then $G$ is invertible and $b = G^{-1}Ia$; i.e., the operator $P_\alpha K_F$, restricted to $\operatorname{span}(\alpha)$, is uniquely represented by the matrix $[P_\alpha K_F]_\alpha^\alpha = G^{-1}I$. If $G$ is singular, then the projected function $P_\alpha K_F g$ admits multiple sets of coefficients $b_i$ that satisfy $P_\alpha K_F g = \sum_{i=1}^{m-1} b_i k_{x_i}$. In this case, the finite rank representation is not unique. In this paper, we study two specific approaches to compute a finite rank representation for the case where $G$ is singular or numerically singular. The first approach, summarized in Algorithm 1, is regularized regression, i.e., $[P_\alpha K_F]_\alpha^\alpha := (G + \epsilon I_{m-1})^{-1}I$, where $\epsilon > 0$ is a user-selected regularization coefficient and $I_{m-1}$ is an $(m-1) \times (m-1)$ identity matrix. The second approach is a truncated pseudoinverse approach, i.e., $[P_\alpha K_F]_\alpha^\alpha := G_\epsilon^+ I$, where $G_\epsilon^+$ is the truncated pseudoinverse of $G$, computed by zeroing the singular values of $G$ that are smaller than $\epsilon \ge 0$.

It should be noted that $G$ and $I$ are precisely the matrices examined in (Williams et al., 2015b) after the use of a truncated SVD. If the truncated pseudoinverse approach is used to implement the method developed in this paper, then the generated results are identical to those generated by the KDMD method in (Williams et al., 2015b). The results obtained by using the regularized regression approach differ from those obtained using the KDMD method in (Williams et al., 2015b), but not substantially so. In Appendix C we provide empirical evidence for the similarity of the results obtained from our method and the method employed in (Williams et al., 2015b).
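Continuing the previous sketch (reusing its `G` and `I`), the two finite-rank representations described above can be computed as follows; the tolerance `eps` is a user-selected value, here an arbitrary choice.

```python
import numpy as np

# Sketch of the two finite rank representations of P_alpha K_F.
eps = 1e-8

# Approach 1: regularized regression, [P K_F] = (G + eps * Id)^{-1} I.
K_reg = np.linalg.solve(G + eps * np.eye(G.shape[0]), I)

# Approach 2: truncated pseudoinverse, [P K_F] = G_eps^+ I, obtained by
# zeroing singular values of G below eps before inverting.
U, s, Vt = np.linalg.svd(G)
s_inv = np.where(s > eps, 1.0 / s, 0.0)
K_pinv = (Vt.T * s_inv) @ U.T @ I

print(np.linalg.norm(K_reg - K_pinv))  # small when G is well conditioned
```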
The objective of DMD is to use the finite rank representation determined above to create a data driven model of the dynamical system. This makes use of a fundamental property of eigenfunctions of the Koopman operator.

Lemma 2. Suppose that $\phi$ is an eigenfunction of $K_F$ with eigenvalue $\lambda$. Evaluating the eigenfunction at a snapshot reveals $\phi(x_{i+1}) = \lambda^i \phi(x_1)$.

Proof. Let $\phi$ be an eigenfunction of $K_F$ with eigenvalue $\lambda$, where $F$ represents the discrete time dynamics such that $x_{i+1} = F(x_i)$. Furthermore, recall the definition of the Koopman operator presented in Definition 4. Then
$$\lambda \phi(x_i) = K_F \phi(x_i) = \phi(F(x_i)) = \phi(x_{i+1}).$$
Therefore, $\phi(x_{i+1}) = \lambda^i \phi(x_1)$.

Proposition 6. Suppose that $\{\phi_j\}_{j=1}^\infty$ is a complete set of eigenfunctions of the Koopman operator, $K_F$, corresponding to the eigenvalues $\{\lambda_j\}_{j=1}^\infty$. For a state $x \in \mathbb{R}^n$, let $(x)_d$ be the $d$-th component of $x$ for $d = 1, \ldots, n$. If it is assumed that the mapping $x \mapsto (x)_d$ is in the RKHS (as it is when $H$ is the native space for the exponential dot product kernel (Steinwart & Christmann, 2008)), then each snapshot may be reconstructed as
$$x_{i+1} = \lim_{M \to \infty} \sum_{j=1}^M \xi_{j,M}\, \lambda_j^i\, \phi_j(x_1). \tag{3}$$

Proof. Since $\{\phi_j\}_{j=1}^\infty$ is an eigenbasis for $K_F$ corresponding to the eigenvalues $\{\lambda_j\}_{j=1}^\infty$ and the mapping $x \mapsto (x)_d$ is in the RKHS, $(x)_d = \lim_{M \to \infty} \sum_{j=1}^M (\xi_{j,M})_d\, \phi_j(x)$ for some coefficients $\{(\xi_{j,M})_d\}$. By stacking each $(x)_d$, the full state observable $g_{id}$, given by $g_{id}(x) = x$, is expressed as
$$g_{id}(x) = \lim_{M \to \infty} \sum_{j=1}^M \xi_{j,M}\, \phi_j(x). \tag{4}$$
Therefore, equation 3 can be derived by using equation 4 and Lemma 2.

Note that since the Koopman operator is not generally a normal operator, $\{\phi_i\}_{i=1}^\infty$ is not expected to be an orthonormal basis, and hence, there may be nonzero influences between the coefficients obtained by projection; this is expressed by the additional index $M$ in $\xi_{j,M}$. This means that a series representation of the decomposition as expressed in (Kawahara, 2016; Brunton & Kutz, 2019) is not always possible. Hence, Koopman modes are not fixed quantities unless there is an orthonormal basis of eigenfunctions for the Koopman operator.

Since the Koopman operator is approximated here by a finite rank representation, perfect reproduction of $g_{id}$ through a series of eigenfunctions is not possible. Instead, eigenfunctions determined through the finite rank representation are used to construct the approximation of $g_{id}$. In particular, the matrix $[P_\alpha K_F]_\alpha^\alpha$ is the matrix representation of $P_\alpha K_F$.

Proposition 7. If $v_j$ is an eigenvector for the matrix $[P_\alpha K_F]_\alpha^\alpha$ with eigenvalue $\lambda_j$, then $\sum_{i=1}^{m-1} (v_j)_i K(x, x_i)$ is an eigenfunction of $P_\alpha K_F$.

Proof. By the definition of eigenvector, $[P_\alpha K_F]_\alpha^\alpha v_j = \lambda_j v_j$. Therefore,
$$P_\alpha K_F \sum_{i=1}^{m-1} (v_j)_i K(\cdot, x_i) = \begin{pmatrix} K(\cdot, x_1) & \cdots & K(\cdot, x_{m-1}) \end{pmatrix} [P_\alpha K_F]_\alpha^\alpha v_j = \lambda_j \sum_{i=1}^{m-1} (v_j)_i K(\cdot, x_i).$$

Using Proposition 7, the corresponding normalized eigenfunction is denoted by
$$\hat\phi_j(x) := \frac{1}{\sqrt{v_j^* G v_j}} \sum_{i=1}^{m-1} (v_j)_i K(x, x_i), \tag{5}$$
where $G = (K(x_i, x_\ell))_{i,\ell=1}^{m-1}$ is the Gram matrix associated with the snapshots and the kernel function, and $(\cdot)^*$ denotes the conjugate transpose. Using a finite rank representation of equation 4, it is easy to see that the $d$-th row of the matrix $\hat\xi := (\hat\xi_1\ \cdots\ \hat\xi_{m-1})$ of Koopman modes is comprised of the components of $(x)_d$ along the (non-orthogonal) directions $\hat\phi_j$. That is,
$$g_{id}(x_i) = x_i = \sum_{j=1}^{m-1} \hat\xi_j\, \hat\phi_j(x_i), \tag{6}$$
which yields $\hat\xi V^\top G = X$, where $X := (x_1\ \cdots\ x_{m-1})$ is the data matrix and
$$V := \begin{pmatrix} \dfrac{v_1}{\sqrt{v_1^* G v_1}} & \cdots & \dfrac{v_{m-1}}{\sqrt{v_{m-1}^* G v_{m-1}}} \end{pmatrix}$$
is the matrix of normalized eigenvectors of $[P_\alpha K_F]_\alpha^\alpha$. Similar to the finite rank representation above, if $G$ is non-singular, then the matrix of Koopman modes is unique, given by $\hat\xi = X(V^\top G)^{-1}$. If $G$ is singular and if regularized regression is used, then the modes are computed using $\hat\xi = X(V^\top (G + \epsilon I_{m-1}))^{-1}$. If the truncated pseudoinverse approach is selected, then eigenvalues of the finite rank representation with absolute value smaller than $\epsilon$ are removed from the computation, along with the corresponding eigenvectors, and the modes are computed using $\hat\xi = X(V^\top G)_\epsilon^+$.

Using the approximate eigenfunctions, $\hat\phi_j$, equation 6, and Lemma 2, a data driven model for the system is obtained as
$$x_{i+1} \approx \sum_{j=1}^{m-1} \hat\xi_j\, \lambda_j^i\, \hat\phi_j(x_1). \tag{7}$$
Furthermore, the discrete-time dynamics $F$ may be approximated (by setting $i = 1$) as
$$F(x) \approx \hat F(x) := \sum_{j=1}^{m-1} \hat\xi_j\, \lambda_j\, \hat\phi_j(x). \tag{8}$$

Consider a continuous-time system $\dot x = f(x)$ with $f \in C^1$, and let a compact set $X$ be forward invariant under the flow of the system. If the discrete-time system $x_{k+1} = F(x_k)$ is obtained via discretization of the continuous-time system with time step $T$, and if $(e^{\gamma T}, \phi)$ is an eigenpair of the resulting Koopman operator $K_T$ for all $T > 0$, then along a solution $x(\cdot)$ of $\dot x = f(x)$, we have $\frac{d\phi(x(t))}{dt} = \gamma \phi(x(t))$ (Mauroy & Mezić, 2016, Section III-A). Motivated by this relationship, the continuous-time dynamics may be approximated as
$$f(x) \approx \hat f(x) := \sum_{j=1}^{m-1} \frac{\log \lambda_j}{T}\, \hat\xi_j\, \hat\phi_j(x). \tag{9}$$
Note that in the above construction, the eigenfunctions $\phi_j$ are assumed to be common across all time-steps $T$. In view of Example 5, not all eigenfunctions of the Koopman operator may be common eigenfunctions. As a result, the continuous-time model in equation 9 is a heuristic, albeit a useful one (see Figure 3 in the appendix).

Algorithm 1: Pseudocode for the kernel perspective based DMD algorithm. Upon obtaining the Koopman modes, the approximate eigenfunctions, and the eigenvalues, equation 7 is used to compute $x_{i+1}$.

Input: Snapshots, $X = \{x_1, x_2, \ldots, x_m\}$
Input: Kernel function $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ of an RKHS
Compute the Gram matrix, $G = (K(x_i, x_\ell))_{i,\ell=1}^{m-1}$
if $G$ is ill-conditioned then
    if regularized regression is used then
        Set $G = G + \epsilon I_{m-1}$
    else if truncated pseudoinverse is used then
        Set $G^{-1} = G_\epsilon^+$
    end if
end if
Compute the interaction matrix, $I = (K(x_i, x_\ell))_{i=2,\,\ell=1}^{i=m,\,\ell=m-1}$
Compute the finite rank representation of the Koopman operator, $[P_\alpha K_F]_\alpha^\alpha = G^{-1}I$
Compute eigenvalues, $\lambda_j$, and eigenvectors, $v_j$, of $[P_\alpha K_F]_\alpha^\alpha$
if truncated pseudoinverse is used then
    For all $j$, if $|\lambda_j| < \epsilon$, then set $\lambda_j = 0$ and $v_j = 0$
    Set $(V^\top G)^{-1} = (V^\top G)_\epsilon^+$
end if
Compute the matrix of Koopman modes, $\hat\xi = X(V^\top G)^{-1}$
Compute approximate eigenfunctions using equation 5
Output: Koopman modes, $\hat\xi_j$ for $j = 1, \ldots, m-1$
Output: Approximate eigenfunctions, $\hat\phi_j(x)$ for $j = 1, \ldots, m-1$
Output: Eigenvalues, $\lambda_j$ for $j = 1, \ldots, m-1$
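The following end-to-end sketch carries the previous sketches (reusing `k`, `X`, `m`, `G`, and `I`) through equations 5–7: eigendecomposition, eigenfunction normalization, Koopman modes, and reconstruction/prediction. It uses the regularized-regression representation with an assumed tolerance; taking the real part of the prediction is a hedge against floating point residue, since complex modes occur in conjugate pairs for real data.

```python
import numpy as np

# Sketch of Algorithm 1 continuing from the G and I assembled earlier.
K_hat = np.linalg.solve(G + 1e-8 * np.eye(G.shape[0]), I)

# Eigendecomposition of the finite rank representation.
lam, V = np.linalg.eig(K_hat)

# Normalize eigenvectors so the eigenfunctions in equation 5 have unit RKHS
# norm: phi_j = sum_i (v_j)_i K(., x_i) / sqrt(v_j^* G v_j).
norms = np.sqrt(np.real(np.einsum("ij,ik,kj->j", V.conj(), G, V)))
V = V / norms

# Koopman modes from equation 6: xi V^T G = X  =>  xi = X (V^T G)^{-1}.
Xdata = X[:, : m - 1]
xi = Xdata @ np.linalg.inv(V.T @ G)

# Eigenfunctions evaluated at the initial snapshot, phi_j(x_1).
phi1 = np.array([k(X[:, 0], X[:, i]) for i in range(m - 1)]) @ V

def predict(i):
    """Equation 7: x_{i+1} ~ sum_j xi_j lam_j^i phi_j(x_1); x_{i+1} is X[:, i]."""
    return np.real(xi @ (lam**i * phi1))

print(np.linalg.norm(predict(5) - X[:, 5]))  # should be small for this example
```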
6 Vector Valued Considerations

The attentive reader will notice that the ultimate objective of DMD is to achieve an approximation or decomposition of the function $g_{id}(x) := x$, the full state observable. However, the full state observable is $\mathbb{R}^n$ valued, whereas the RKHSs in question consist of scalar valued functions. Consequently, the coefficients that are determined to approximate the full state observable are vector valued. Up to now, the methods have simply separated the individual components of $g_{id}$ and established an approximation for each of them separately. The weights for these approximations are stacked to make the vector valued coefficients. This section shows how the vector valued coefficients arise naturally from a projection onto vector valued functions in a vector valued RKHS. With the right selection of kernel operator, this projection operation reduces to the setting of Section 5. If the goal is to decompose the full state observable all at once, then one must appeal to vector valued RKHSs, which produce an operator valued kernel function.

Proposition 8. Given a vector valued RKHS, $\mathcal{H}$, consisting of functions that map a set $X$ to a Hilbert space $\mathcal{Y}$, there is for each $x \in X$ and $\nu \in \mathcal{Y}$ a function $K_{x,\nu} \in \mathcal{H}$ such that $\langle g, K_{x,\nu} \rangle_{\mathcal{H}} = \langle g(x), \nu \rangle_{\mathcal{Y}}$.

The mapping $\nu \mapsto K_{x,\nu}$ is linear, and hence, the kernels are expressed as $K_x\nu$, where $K_x : \mathcal{Y} \to \mathcal{H}$ is a bounded operator. In this case, for each $g \in \mathcal{H}$, $K_x^* g = g(x)$. The operator valued kernel associated with $\mathcal{H}$ is given as $K(x, y) := K_x^* K_y : \mathcal{Y} \to \mathcal{Y}$, and note that when $\mathcal{Y} = \mathbb{R}^n$, this operator valued kernel is a matrix whose entries are scalar valued functions. When a vector valued RKHS, $\mathcal{H}$, is obtained from a scalar valued RKHS, $H$ with kernel $k(x, y)$, as $\mathcal{H} = H^n$, where the inner product of two elements, $g = (g_1, \ldots, g_n)^\top$ and $h = (h_1, \ldots, h_n)^\top$, from $\mathcal{H}$ is given as $\langle g, h \rangle_{\mathcal{H}} = \sum_{j=1}^n \langle g_j, h_j \rangle_H$, then the operator valued kernel is given as $k(x, y)I_n$, which has many convenient properties for computation. More details concerning vector valued RKHSs may be found in (Carmeli et al., 2010; Agler & McCarthy, 2023).

Here the Koopman operator is defined just as it would be for a scalar valued space: $K_F g = g \circ F$; the difference here is that $g$ is vector valued.
Similar relationships between the Koopman operator and the kernel functions still hold, where $\langle K_F g, K_x\nu \rangle_{\mathcal{H}} = \langle g(F(x)), \nu \rangle_{\mathcal{Y}} = \langle g, K_{F(x)}\nu \rangle_{\mathcal{H}}$; hence $K_F^* K_x = K_{F(x)}$.

Now given a collection of snapshots, $\{x_1, \ldots, x_M\}$, the associated operator valued kernels are $K_{x_1}, \ldots, K_{x_M}$. For decomposition of the full state observable, $g_{id}(x) = x$, one needs to interface with the Hilbert space and construct a finite rank approximation of $K_F$ as was done in the prior section. This means that a subspace within $\mathcal{H}$ needs to be constructed, say spanned by a collection of basis functions $\alpha$, so one can write $K_F \approx P_\alpha K_F P_\alpha$. The selection of the basis $\alpha$ is less constrained than in the scalar valued case, where only the kernels centered at the snapshots were considered. In this case, not only must the snapshots be considered, but also complete freedom of the choice of $\nu \in \mathbb{R}^n$ to select each $K_{x,\nu} = K_x\nu$. A natural choice is to select $\nu$ from the standard basis of $\mathbb{R}^n$. This leads to
$$\alpha = \{K_{x_0,e_1}, \ldots, K_{x_{M-1},e_1}, K_{x_0,e_2}, \ldots, K_{x_{M-1},e_2}, \ldots, K_{x_0,e_n}, \ldots, K_{x_{M-1},e_n}\},$$
which is a large basis for problems such as the cylinder flow example, where $n$ is of the order 80,000. Consequently, the Gram matrix corresponding to this basis is a square matrix of dimension $Mn$. So for arbitrary operator valued kernels, the DMD algorithm presented in this paper would have to invert an $Mn$ dimensional matrix, and the computation time would scale with the dimension of the space as $O((Mn)^3)$ using standard inversion algorithms. This scaling would make benchmarks such as the cylinder flow example completely infeasible.

Computational feasibility can be achieved through the judicious selection of the kernel operator. Here we leveraged a kernel operator of the form $K(x, y) = k(x, y)I_n$, with $k$ a scalar valued kernel function. In this case, $\langle K_x\nu, K_y\omega \rangle_{\mathcal{H}} = k(x, y)\langle \nu, \omega \rangle_{\mathbb{R}^n}$, and the two functions $K_x\nu$ and $K_y\omega$ are orthogonal when $\nu$ and $\omega$ are orthogonal in $\mathbb{R}^n$. This means that the Gram matrix composed of the $\alpha$ given above is going to be a block diagonal matrix with each block being the Gram matrix corresponding to the scalar valued kernel. Thus only one matrix of size $M$ must be inverted, and each dimension treated individually. Significantly, the inversion problem no longer scales with the dimension. Alternatively, if we instead use a different scalar valued space for each dimension, we would get $n$ different Gram matrices to invert. The complexity would scale with the dimension, but linearly rather than cubically.

Proposition 9. Suppose that $k(x, y)$ corresponds to a scalar valued RKHS, $H$, and let $\mathcal{H}$ be an $\mathbb{R}^n$ valued RKHS such that if $g \in \mathcal{H}$, then $g = (g_1, \ldots, g_n)^\top$, where $g_i \in H$ for each $i = 1, \ldots, n$, equipped with the inner product $\langle g, h \rangle_{\mathcal{H}} = \sum_{i=1}^n \langle g_i, h_i \rangle_H$. Then $\mathcal{H}$ is a vector valued RKHS.

Proof. Let $v \in \mathbb{R}^n$ and $x \in X$; then
$$|\langle g(x), v \rangle_{\mathbb{R}^n}| \le \|g(x)\|_{\mathbb{R}^n}\|v\|_{\mathbb{R}^n} = \left( \sum_{i=1}^n |\langle g_i, k(\cdot, x) \rangle_H|^2 \right)^{1/2} \|v\|_{\mathbb{R}^n} \le \sqrt{k(x, x)} \left( \sum_{i=1}^n \|g_i\|_H^2 \right)^{1/2} \|v\|_{\mathbb{R}^n} = \sqrt{k(x, x)}\, \|g\|_{\mathcal{H}}\, \|v\|_{\mathbb{R}^n};$$
hence the functional $g \mapsto \langle g(x), v \rangle_{\mathbb{R}^n}$ is bounded. In this setting, $K(x, y) = \operatorname{diag}(k(x, y), \ldots, k(x, y))$, and $\langle K_xe_i, K_ye_j \rangle_{\mathcal{H}} = 0$ if $i \ne j$. That $\mathcal{H}$ is a vector valued RKHS follows from the above.

If $\{x_1, \ldots, x_N\}$ is a collection of snapshots such that $F(x_i) = x_{i+1}$, then a finite rank approximation of $K_F$ may be constructed by examining the image of the operator on $K_{x_i,e_j}$ for $i = 1, \ldots, N$ and $j = 1, \ldots, n$, and then projecting back onto the span of these kernels. The projection operation requires the computation of the Gram matrix for the basis $\{K_{x_i,e_j}\}$, which is a block diagonal matrix, where each block corresponds to a selection of dimension through $e_j$. Thus, if $G = (k(x_s, x_\ell))_{s,\ell=1}^N$, the Gram matrix manifests as
$$\begin{pmatrix} G & & \\ & \ddots & \\ & & G \end{pmatrix},$$
with $n$ diagonal blocks. Similarly, the interaction matrix is block diagonal. Consequently, the vector of weights corresponding to each kernel function is composed of $n$ smaller vectors of length $N - 1$, each one corresponding to a different dimension. Hence, each dimension may be treated independently. The eigenfunctions for this finite rank representation of the Koopman operator are then composed of $n$ identical collections of $N - 1$ functions, differing only in the dimension they are supported in. Let $\phi_1, \ldots, \phi_{N-1} \in H$ be these scalar valued functions, and write $\phi_{s,i} := \phi_s e_i$ for the corresponding vector valued eigenfunctions. The full state observable is projected onto this collection of eigenfunctions as
$$g_{id}(x) = \sum_{i=1}^n \sum_{s=1}^{N-1} w_{s,i}\, \phi_{s,i}(x) = \sum_{s=1}^{N-1} \left( \sum_{i=1}^n w_{s,i}\, e_i \right) \phi_s(x).$$
Here, $\sum_{i=1}^n w_{s,i}\, e_i = \xi_s$, where $\xi_s$ is the Koopman mode from the previous section.
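The computational point above can be seen directly: for the kernel operator $K(x, y) = k(x, y)I_n$, the Gram matrix over the basis $\{K_{x_i,e_j}\}$ consists of $n$ identical scalar blocks, so only one $N \times N$ matrix is ever inverted. The sizes, kernel, and random snapshots in the sketch below are illustrative assumptions, and the full block matrix is formed here only for verification; in practice it never would be.

```python
import numpy as np
from scipy.linalg import block_diag

# Sketch: block diagonal structure of the vector valued Gram matrix for
# K(x, y) = k(x, y) I_n.
N, n = 4, 3  # number of snapshots and state dimension (illustrative)
rng = np.random.default_rng(2)
Xs = rng.normal(size=(n, N))

k = lambda x, y: np.exp(x @ y)  # scalar kernel (assumed)
G_scalar = np.array([[k(Xs[:, s], Xs[:, l]) for l in range(N)] for s in range(N)])

G_block = block_diag(*([G_scalar] * n))                 # (Nn) x (Nn)
G_block_inv = block_diag(*([np.linalg.inv(G_scalar)] * n))
print(np.allclose(G_block @ G_block_inv, np.eye(N * n)))  # True
```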
7 Numerical Example

In this section we compare the results obtained from the developed DMD method and the method developed in (Williams et al., 2015b) on a benchmark data set. Two additional numerical examples are included in Appendix C.

7.1 Periodic flow around a cylinder

The website (Kutz et al., 2016a) accompanying the textbook (Kutz et al., 2016b) provides several publicly available data sets that serve as benchmarks for spectral decomposition approaches to nonlinear modeling. This section utilizes the cylinder flow data set to demonstrate the results of the developed DMD method. The cylinder flow example is numerically generated, and the data provided corresponds to a planar steady state flow of the system. The data set consists of 151 snapshots, containing values of the vorticity of the flow at several mesh points in a plane, recorded every 0.02 seconds. In this demonstration, snapshots 1 through 30 are used as the input data, and snapshots 2 through 31 are used as output data, assuming that the $i$th and $(i+1)$th snapshots satisfy $x_{i+1} = F(x_i)$ for some unknown nonlinear function $F$. The snapshots are normalized so that the largest 2-norm among all snapshots is 1.

Figure 1: Reconstruction and prediction of the vorticity of a fluid flow past a cylinder using two different kernel functions. In each panel, the first row contains the ground truth, the second row leverages the Gaussian RBF kernel, and the third row uses the exponential dot product kernel. (a) Reconstruction of the vorticity field: the left column shows reconstruction of the initial state, $x_1$, the middle column shows reconstruction of the state $x_{15}$, and the right column corresponds to reconstruction of the state $x_{30}$. (b) Prediction of the vorticity field: the left column presents prediction of the state $x_{51}$, the middle column shows prediction of the state $x_{101}$, and the right column corresponds to prediction of the state $x_{151}$.
Figure 2: For the periodic flow example, this figure shows the 2-norm of the error, $\|x(t) - \hat x(t)\|_2$, between the true snapshots and the snapshots generated using DMD: (a) the method developed in this paper; (b) the kernel DMD method in (Williams et al., 2015b). The results for the Gaussian RBF kernel and the exponential dot product kernel are identical.

The Koopman modes, eigenvalues, and eigenfunctions are then computed using the developed technique, and snapshots 2 through 31 are reconstructed using equation 7. DMD is implemented using the exponential dot product kernel, $K(x, y) = \exp(\frac{1}{\mu}x^\top y)$ (with $\mu = 1$), and the Gaussian RBF kernel, $K(x, y) = \exp(-\frac{1}{\mu}\|x - y\|_2^2)$ (with $\mu = 1$). To further demonstrate the accuracy of the obtained finite-dimensional representation of the Koopman operator, snapshots 32 through 151 are predicted from snapshot 1 using equation 7. The code used to generate these results is publicly available, see Gonzalez et al. (2023).

7.2 Discussion

Figures 1a and 2 show the ability of the developed DMD technique to reconstruct the training data. The relative reconstruction errors associated with this system are of the order of $10^{-8}$. Additionally, Figure 1a shows that for the periodic flow example, similar reconstruction results can be obtained when using the Gaussian RBF kernel or the exponential dot product kernel. As shown in Figures 1b and 2, given the first 31 snapshots, the developed DMD technique is able to predict the remaining 120 snapshots, with a relative prediction error of order $10^{-4}$, without knowledge of the underlying physics, $F$. Furthermore, as expected, the performance of the developed method is nearly identical to the baseline kernel DMD method developed in (Williams et al., 2015b).

8 Conclusion

This manuscript revisits theoretical assumptions concerning DMD of Koopman operators, including the existence of lattices of eigenfunctions, common eigenfunctions between Koopman operators, and boundedness and compactness of Koopman operators. Counterexamples that illustrate the restrictiveness of the assumptions are provided for each of the assumptions. In particular, a major theoretical result is established to show that the native RKHS of the Gaussian RBF kernel function only supports bounded Koopman operators if the dynamics are affine. Moreover, a kernel-based DMD algorithm that simplifies the algorithm in (Williams et al., 2015b) and presents it in an operator theoretic context is developed and then validated through simulations. The developed DMD algorithm is also extended to vector valued RKHSs.

Acknowledgments

This research was supported by the Air Force Office of Scientific Research (AFOSR) under contract numbers FA9550-20-1-0127 and FA9550-21-1-0134, and the National Science Foundation (NSF) under award numbers 2027976, 2027999, and 1900364. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsoring agencies.

References

Jim Agler and John E. McCarthy. Pick Interpolation and Hilbert Function Spaces, volume 44. American Mathematical Society, 2023.

Nachman Aronszajn. Theory of reproducing kernels. Trans. Am. Math. Soc., 68(3):337–404, 1950.

Andreas Bittracher, Péter Koltai, and Oliver Junge. Pseudogenerators of spatial transfer operators. SIAM Journal on Applied Dynamical Systems, 14(3):1478–1517, 2015.
Ralph Philip Boas. Entire Functions. Academic Press, 2011.

Steven L. Brunton and J. Nathan Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2019.

Steven L. Brunton, Marko Budišić, Eurika Kaiser, and J. Nathan Kutz. Modern Koopman theory for dynamical systems. arXiv preprint arXiv:2102.12086, 2021.

Marko Budišić, Ryan Mohr, and Igor Mezić. Applied Koopmanism. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(4):047510, 2012.

Claudio Carmeli, Ernesto De Vito, Alessandro Toigo, and Veronica Umanitá. Vector valued reproducing kernel Hilbert spaces and universality. Analysis and Applications, 8(01):19–61, 2010.

Brent Carswell, Barbara D. MacCluer, and Alex Schuster. Composition operators on the Fock space. Acta Sci. Math. (Szeged), 69(3-4):871–887, 2003.

Gerardo Chacón and José Giménez. Composition operators on spaces of entire functions. Proceedings of the American Mathematical Society, 135(7):2205–2218, 2007.

Gerald B. Folland. Real Analysis: Modern Techniques and Their Applications, volume 40. John Wiley & Sons, 1999.

Efrain Gonzalez, Moad Abudia, Michael Jury, Rushikesh Kamalapurkar, and Joel A. Rosenfeld. https://github.com/scc-lab/publications-code/tree/master/2024-TMLR-KernelPerspective, 2023. Accessed: 2023-12-29.

Brian C. Hall. Quantum Theory for Mathematicians, volume 267. Springer, 2013.

Masahiro Ikeda, Isao Ishikawa, and Yoshihiro Sawano. Boundedness of composition operators on reproducing kernel Hilbert spaces with analytic positive definite functions. Journal of Mathematical Analysis and Applications, 511(1):126048, 2022.

Jakob Jakob, Markus Gross, and Tobias Günther. A fluid flow data set for machine learning and its application to neural flow map interpolation. IEEE Transactions on Visualization and Computer Graphics, 27(2):1279–1289, 2020.

Jakob Jakob, Markus Gross, and Tobias Günther. A fluid flow data set for machine learning. https://doi.org/10.3929/ethz-b-000515488, 2021. Accessed: 2023-12-29.

Yoshinobu Kawahara. Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis. In Advances in Neural Information Processing Systems, pp. 911–919, 2016.

Stefan Klus, Péter Koltai, and Christof Schütte. On the numerical approximation of the Perron–Frobenius and Koopman operator. arXiv preprint arXiv:1512.05997, 2015.

Bernard O. Koopman. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences of the United States of America, 17(5):315, 1931.

Milan Korda and Igor Mezić. On convergence of extended dynamic mode decomposition to the Koopman operator. Journal of Nonlinear Science, 28(2):687–710, 2018.

J. Nathan Kutz, Steven L. Brunton, Bingni W. Brunton, and Joshua L. Proctor. Dynamic mode decomposition. http://www.dmdbook.com/, 2016a. Accessed: 2023-12-29.

J. Nathan Kutz, Steven L. Brunton, Bingni W. Brunton, and Joshua L. Proctor. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM, Philadelphia, PA, USA, 2016b.

Alexandre Mauroy and Igor Mezić. Global stability analysis using the eigenfunctions of the Koopman operator. IEEE Trans. Autom. Control, 61(11):3356–3369, 2016.

Igor Mezić. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn., 41(1):309–325, 2005. doi: 10.1007/s11071-005-2824-x.

Gert K. Pedersen. Analysis Now. Springer Science & Business Media, 2012.
Joel A. Rosenfeld, Rushikesh Kamalapurkar, L. Gruss, and Taylor T. Johnson. Dynamic mode decomposition for continuous time systems with the Liouville operator. Journal of Nonlinear Science, 32(1):1–30, 2022.

Clarence W. Rowley, Igor Mezić, Shervin Bagheri, Philipp Schlatter, and Dan S. Henningson. Spectral analysis of nonlinear flows. Journal of Fluid Mechanics, 641:115–127, 2009.

Joel H. Shapiro. Composition operators: and classical function theory. Springer Science & Business Media, 2012.

R. K. Singh and Ashok Kumar. Compact composition operators. Journal of the Australian Mathematical Society, 28(3):309–314, 1979.

Ingo Steinwart and Andreas Christmann. Support vector machines. Springer Science & Business Media, 2008.

Naoya Takeishi, Yoshinobu Kawahara, and Takehisa Yairi. Learning Koopman invariant subspaces for dynamic mode decomposition. arXiv preprint arXiv:1710.04340, 2017.

Peter Walters. An introduction to ergodic theory, volume 79. Springer Science & Business Media, 2000.

Matthew O. Williams, Ioannis G. Kevrekidis, and Clarence W. Rowley. A data driven approximation of the Koopman operator: Extending dynamic mode decomposition. J. Nonlinear Sci., 25(6):1307–1346, 2015a.

Matthew O. Williams, Clarence W. Rowley, and Ioannis G. Kevrekidis. A kernel-based method for data-driven Koopman spectral analysis. J. Comput. Dyn., 2(2):247–265, 2015b.

Kehe Zhu. Analysis on Fock spaces, volume 263. Springer Science & Business Media, 2012.

In this appendix we provide detailed examples that support some of the claims made throughout this paper. The title of each subsection links the examples to the relevant section of the paper.

A.1 Introduction

Example 1. Given a Koopman operator $K_F$ corresponding to the discrete dynamics $F$, an $\epsilon > 0$, and a compact subset $\Omega$ that is invariant for the dynamics, it follows that if $\varphi$ satisfies $\|K_F \varphi - \lambda \varphi\|_H \leq \epsilon$, then

$$|\varphi(F^m(x)) - \lambda^m \varphi(x)| \leq C \epsilon\, \frac{1 - |\lambda|^{m+1}}{1 - |\lambda|} \quad \text{for } x \in \Omega,$$

where the accuracy of the resultant models depends on $\epsilon$ and $\lambda$. Significantly, as $\epsilon$ tends to zero, so does the difference $|\varphi(F^m(x)) - \lambda^m \varphi(x)|$ at every point $x \in \Omega$.
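The bound in Example 1 follows from a telescoping argument; the following is a sketch, assuming $|\lambda| \neq 1$ and reading $C := \sup_{y \in \Omega} \|K_y\|_H$, which is finite whenever the kernel is continuous and $\Omega$ is compact:

\begin{align*}
|\varphi(F^m(x)) - \lambda^m \varphi(x)| &= \Big| \sum_{k=0}^{m-1} \lambda^k \big( \varphi(F^{m-k}(x)) - \lambda\, \varphi(F^{m-k-1}(x)) \big) \Big| \\
&\leq \sum_{k=0}^{m-1} |\lambda|^k \big| \big\langle K_F \varphi - \lambda \varphi,\ K_{F^{m-k-1}(x)} \big\rangle_H \big| \\
&\leq C \epsilon \sum_{k=0}^{m-1} |\lambda|^k \leq C \epsilon\, \frac{1 - |\lambda|^{m+1}}{1 - |\lambda|}.
\end{align*}

The invariance of $\Omega$ guarantees that each intermediate point $F^{m-k-1}(x)$ remains in $\Omega$, so the reproducing property applies with a uniform constant.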
A.2 Forward Complete Dynamics

Example 2. Consider the one dimensional dynamics $\dot{x} = 1 + x^2$. For fixed $0 < t < \pi/2$, the corresponding discrete time dynamics are given by $x_{m+1} = \tan(\arctan(x_m) + t)$. When $x_m = \tan(\pi/2 - t)$, it is clear that $x_{m+1}$ is undefined. Consequently, the composition symbol, $F(x) = \tan(\arctan(x) + t)$, for the hypothetical Koopman operator is not well defined over $\mathbb{R}$ for any selection of $t$.

A.3 Properties of the Operators

The following simple example will be leveraged throughout the discussions in the ensuing subsections.

Example 3. Consider the dynamical system $\dot{x} = (x_2, -x_1)^T$, which corresponds to circular dynamics in the plane. For any fixed time-step $t$, setting $\theta := t$, the discretization of this system yields the linear discrete dynamics

$$x_{m+1} = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} x_m.$$

That is, the discretization corresponding to a fixed time-step results in a fixed rotation of $\mathbb{R}^2$. To simplify the presentation, we use $\mathbb{C}$ as a model for $\mathbb{R}^2$, where rotation of the complex plane reduces to multiplication by a unimodular constant,

$$z_{m+1} = e^{i\theta} z_m. \quad (10)$$

The corresponding discrete time dynamics will be written as $F_\theta(z) := e^{i\theta} z$.

A.3.1 Lattice of Eigenfunctions

Example 4. Consider the dynamical system mentioned in Example 3. If $\theta = \pi$ in equation 10, the discrete dynamics corresponding to rotation by $\pi$ in the complex plane become $z_{m+1} = e^{i\pi} z_m = -z_m$. The corresponding Koopman operator, $K_{F_\pi} : F^2(\mathbb{C}) \to F^2(\mathbb{C})$, is given as $K_{F_\pi} g(z) := g(-z)$. Hence, every even function is an eigenfunction of this Koopman operator with eigenvalue 1.

Any function $g \in F^2(\mathbb{C})$ exhibits a strict bound on its growth rate (cf. (Zhu, 2012)). To wit,

$$|g(z)| = |\langle g, K(\cdot, z) \rangle_{F^2(\mathbb{C})}| \leq \|g\|_{F^2(\mathbb{C})} \|K(\cdot, z)\|_{F^2(\mathbb{C})} = \|g\|_{F^2(\mathbb{C})} e^{|z|^2/2}.$$

That is, if a function is in the Fock space, then the function is of order at most 2, and if the function is of order 2, it has type at most 1/2 (cf. (Boas, 2011)). Conversely, if an entire function is of order less than 2, it is in the Fock space, and if it is of order 2 and type less than 1/2, then it is also in the Fock space. While some functions of order 2 and type 1/2 are in the Fock space, this does not hold for every such function. For example, $e^{z^2/2}$ is of order 2 and type 1/2, but is not in the Fock space. Thus, $\phi(z) = e^{z^2/4}$, being even and of order 2 with type $1/4 < 1/2$, is an eigenfunction for $K_{F_\pi}$ in the Fock space. However, $\phi \cdot \phi = e^{z^2/2}$ is not in the Fock space, and cannot be an eigenfunction for $K_{F_\pi} : F^2(\mathbb{C}) \to F^2(\mathbb{C})$. Hence, the eigenfunctions for $K_{F_\pi}$ do not form a lattice.
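The membership claims for $e^{z^2/4}$ and $e^{z^2/2}$ can also be verified directly; the following short computation is a sketch, assuming the standard Gaussian-weighted norm $\|f\|_{F^2(\mathbb{C})}^2 = \frac{1}{\pi} \int_{\mathbb{C}} |f(z)|^2 e^{-|z|^2}\, dA(z)$ (cf. (Zhu, 2012)). Writing $z = x + iy$,

$$|e^{z^2/2}|^2 e^{-|z|^2} = e^{\Re(z^2)} e^{-x^2 - y^2} = e^{-2y^2}, \qquad |e^{z^2/4}|^2 e^{-|z|^2} = e^{(x^2 - y^2)/2} e^{-x^2 - y^2} = e^{-x^2/2 - 3y^2/2}.$$

The first integrand is constant in $x$ and therefore has a divergent integral over $\mathbb{R}^2$, while the second is integrable, confirming that $e^{z^2/4} \in F^2(\mathbb{C})$ but $e^{z^2/2} \notin F^2(\mathbb{C})$.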
A.3.2 Common Eigenfunctions

Example 5. Consider again the dynamical system in Example 3, but instead set $\theta = \pi/2$, so that $F_{\pi/2}(z) = iz$. In this case, the polynomial $z^4 + z^8 \in F^2(\mathbb{C})$ is an eigenfunction for the Koopman operator corresponding to $\theta = \pi/2$ with eigenvalue 1. However, $z^4 + z^8$ is not an eigenfunction for $K_{F_{\pi/3}}$, as

$$(e^{i\pi/3} z)^4 + (e^{i\pi/3} z)^8 = e^{i4\pi/3} z^4 + e^{i2\pi/3} z^8,$$

and the constants cannot be factored out of the polynomial as a single eigenvalue.

B.1 Proof of Lemma 1

Lemma 1. Let $F : X \to X$ be the symbol for a Koopman operator over a RKHS $H$ over a set $X$. Then $K_F : D(K_F) \to H$ is a closed operator.

Proof. Suppose that $\{g_m\}_{m=1}^\infty \subset D(K_F)$ is such that $g_m \to g$ in $H$ and $K_F g_m \to h$ in $H$. To show that $K_F$ is closed, we must show that $g \in D(K_F)$ and $K_F g = h$. This amounts to showing that $h = g \circ F$, by the definition of $D(K_F)$. Fix $x \in X$; then

$$h(x) = \langle h, K_x \rangle_H = \lim_{m \to \infty} \langle K_F g_m, K_x \rangle_H = \lim_{m \to \infty} g_m(F(x)) = \lim_{m \to \infty} \langle g_m, K_{F(x)} \rangle_H = \langle g, K_{F(x)} \rangle_H = g(F(x)).$$

As $x$ was an arbitrary point in $X$, $h = g \circ F$, and the proof is complete.

B.2 Proof Regarding Boundedness of the Koopman Operator

As stated in Section 4.2.3, here we provide a novel proof that a Koopman operator over the Gaussian RBF's native space is bounded only when the corresponding discrete dynamics are affine. To achieve this, we begin by introducing and proving a few lemmas.

Lemma 4. If $K_F$ is a bounded operator over the Gaussian RBF's native space, then $F$ is a real analytic vector valued function over $\mathbb{R}^n$.

Proof. If $K_F$ is bounded, then $K_F K_y(x) = K_y(F(x)) = \exp(-\|F(x) - y\|_2^2)$ is in the RBF's native space for each $y \in \mathbb{R}^n$. Since every function in the RBF's native space is real analytic, so is $K_y(F(x))$, and thus the logarithm,

$$-\|F(x) - y\|_2^2 = -\|F(x)\|_2^2 + 2 y^T F(x) - \|y\|_2^2,$$

is real analytic. This holds for $y = 0$, and hence $\|F(x)\|_2^2$ is real analytic. Thus, for every $y$, the quantity $y^T F(x)$ is real analytic. That each component of $F(x)$ is real analytic follows from selecting $y$ to be the cardinal basis elements of $\mathbb{R}^n$, and this completes the proof.

Lemma 5. If $F$ is a real analytic vector valued function that yields a bounded Koopman operator, $K_F$, over the Gaussian RBF's native space, then its extension to an entire function, $F : \mathbb{C}^n \to \mathbb{C}^n$, induces a bounded operator over $H_G(\mathbb{C}^n)$.

Proof. Since an entire function on $\mathbb{C}^n$ is uniquely determined by its restriction to $\mathbb{R}^n$, it follows that the span of the complex valued Gaussian RBFs with centers in $\mathbb{R}^n$ is dense in $H_G$. Moreover, to demonstrate that $K_F$ is bounded, it suffices to show that there is a constant $C$ such that

$$C^2 K_G(z, w) - K_G(F(z), F(w)) \quad (11)$$

is a positive kernel. By the above remark, it suffices to show this for real $x, y \in \mathbb{R}^n$, but then this is equivalent to the statement that $K_F$ is bounded over the Gaussian RBF's native space over $\mathbb{R}^n$.

Theorem 2. If $F : \mathbb{C}^n \to \mathbb{C}^n$ is an entire function and $K_F$ is bounded on $H_G$, then $F(z) = Az + b$ for a matrix $A \in \mathbb{C}^{n \times n}$ and a vector $b \in \mathbb{C}^n$.

Proof. If $K_F$ is bounded, then it has a bounded adjoint, $K_F^*$, which acts on the complex Gaussian as $K_F^* K_G(\cdot, z) = K_G(\cdot, F(z))$. In particular, there is a constant $C > 0$ such that $\|K_G(\cdot, F(z))\|_{H_G}^2 \leq C^2 \|K_G(\cdot, z)\|_{H_G}^2$. Noting the identity $\|K_G(\cdot, z)\|_{H_G}^2 = \exp\left(2 \sum_{j=1}^n (\Im z_j)^2\right)$ and taking the logarithm, it follows that

$$\sum_{j=1}^n (\Im F_j(z))^2 \leq \log(C^2) + \sum_{j=1}^n (\Im z_j)^2 \leq \log(C^2) + \|z\|_2^2. \quad (12)$$

From this inequality, it follows that for each coordinate $j = 1, \ldots, n$, the harmonic function $v_j(z) = \Im F_j(z)$ has linear growth. That is, there is a constant $C$ so that $|v_j(z)| \leq C(1 + \|z\|_2)$ for all $z \in \mathbb{C}^n$. It follows (e.g., from the standard Cauchy estimates) that $v_j(z) = v_j(x + iy)$ must be an affine linear function of $x$ and $y$, and therefore so must its harmonic conjugate $u_j(z)$, and we conclude that $F(z) = Az + b$.

C Further Numerical Examples

C.1 The Duffing oscillator

In this numerical example, sixteen 10-second-long trajectories of the Duffing oscillator, given by $\dot{x}_1 = x_2$ and $\dot{x}_2 = -0.1 x_2 + x_1 - x_1^3$, are generated starting from initial states on a uniform grid on the set $[-1, 1] \times [-1, 1]$. The trajectories are sampled every 0.5 seconds to generate a set of snapshots. The snapshots are used to build a predictive model of the oscillator using the method developed in this paper (implemented using regularized regression with $\epsilon = 10^{-6}$ and the exponential dot product kernel with $\mu = 20$) and using the baseline method from (Williams et al., 2015b) (implemented using a threshold of $10^{-3}$ for zeroing singular values and the exponential dot product kernel with $\mu = 20$). To evaluate the predictive models, 15-second-long trajectories of the oscillator starting from 100 initial conditions randomly selected from the set $[-1.5, 1.5] \times [-1.5, 1.5]$ are generated by integrating the continuous-time model in equation 9 using a variable time-step fourth order Runge-Kutta integrator. These trajectories are compared against the true trajectories starting from the same initial conditions, generated by integrating the true dynamics using the same integrator. The metric used for comparison is the pointwise relative error

$$e(t) = \frac{\|x(t) - \hat{x}(t)\|}{\max_t \{\|x(t)\|\}},$$

where $x(\cdot)$ is the true trajectory and $\hat{x}(\cdot)$ is the predicted trajectory.
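A minimal sketch of the snapshot generation described above is given below; the $4 \times 4$ grid spacing, integrator tolerance, and function names are our assumptions. The resulting snapshot pairs would then be passed to a kernel DMD routine, such as the kernel_dmd sketch in Section 7, with $\mu = 20$ and $\epsilon = 10^{-6}$.

```python
# Sketch of the Duffing snapshot generation; solve_ivp's adaptive RK45 stands
# in for the variable time-step fourth order Runge-Kutta integrator.
import numpy as np
from scipy.integrate import solve_ivp

def duffing(t, x):
    # x1' = x2, x2' = -0.1*x2 + x1 - x1^3
    return [x[1], -0.1 * x[1] + x[0] - x[0] ** 3]

# Sixteen trajectories from a uniform 4x4 grid on [-1,1]^2, sampled every 0.5 s.
t_eval = np.arange(0.0, 10.0 + 1e-9, 0.5)
X_list, Y_list = [], []
for x1 in np.linspace(-1, 1, 4):
    for x2 in np.linspace(-1, 1, 4):
        sol = solve_ivp(duffing, (0.0, 10.0), [x1, x2], t_eval=t_eval, rtol=1e-9)
        X_list.append(sol.y[:, :-1])   # inputs  x_i
        Y_list.append(sol.y[:, 1:])    # outputs y_i = F(x_i)
X, Y = np.hstack(X_list), np.hstack(Y_list)   # n x M snapshot matrices
```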
Figure 3 shows the pointwise mean of the relative error across the 100 trajectories, along with the spread of the relative error from the pointwise minimum to the pointwise maximum.

[Figure 3: For the Duffing oscillator example, the pointwise mean of the relative error between the true trajectories and the trajectories generated using DMD; (a) method developed in this paper, reconstruction using the continuous-time model in equation 9; (b) kernel DMD from (Williams et al., 2015b), reconstruction using the continuous-time model in equation 9; (c) method developed in this paper, reconstruction using the discrete-time model in equation 8; (d) kernel DMD method in (Williams et al., 2015b), reconstruction using the discrete-time model in equation 8. The shaded area shows the pointwise maximum and the pointwise minimum over 100 randomly initialized trajectories.]

To demonstrate the effectiveness of the heuristic continuous-time model in equation 9, the true trajectories, sampled every 0.5 seconds, are also compared against trajectories generated using the discrete-time model in equation 8. Figure 3 also shows the pointwise mean of the relative error across the 100 trajectories, along with error bars that show the pointwise maximum and the pointwise minimum relative error.

C.2 Turbulent flow

Another numerical example uses a data set that accompanies (Jakob et al., 2020), which consists of flow simulations that range from laminar flow configurations to turbulent flow configurations. The data can be accessed from the accompanying website (Jakob et al., 2021). The flow data used in this paper (id number 7999) correspond to 2-dimensional turbulent flow with a Reynolds number of 4092.216 and a kinematic viscosity of $5.45 \times 10^{-5}$. The data consist of 1001 snapshots; each snapshot contains flow velocities in the horizontal and vertical directions measured on a regular $512 \times 512$ grid over the domain $[0, 1] \times [0, 1]$. The snapshots are separated by 0.01 seconds. In this demonstration, snapshots 1 through 300 are used as the input data, and snapshots 2 through 301 are used as output data, assuming that the $i$th and $(i+1)$th snapshots satisfy $x_{i+1} = F(x_i)$ for some unknown nonlinear function $F$. The snapshots are normalized so that the largest 2-norm among all snapshots is 1. The Koopman modes, eigenvalues, and eigenfunctions are then computed using the developed technique, and snapshots 2 through 301 are reconstructed from snapshot 1 using equation 7. DMD is implemented using the Gaussian RBF kernel, $K(x, y) = \exp\left(-\frac{1}{\mu} \|x - y\|_2^2\right)$ (with $\mu = 0.75$ and $\epsilon = 0.0001$). The baseline method from (Williams et al., 2015b) is implemented using the same kernel with $\mu = 0.000518$. The predictive model in equation 7 is also used to generate 700 additional snapshots.

[Figure 4: For the Duffing oscillator example, the pointwise mean of the error between the trajectories predicted using the continuous-time model in equation 9 generated by the method in (Williams et al., 2015b) and the method developed in this paper; (a) difference from the truncated pseudoinverse implementation of the method developed in this paper; (b) difference from the regularized regression implementation of the method developed in this paper. The shaded area shows the pointwise maximum and the pointwise minimum over 100 randomly initialized trajectories.]

[Figure 5: For the turbulent flow example, the 2-norm of the error, $\|x(t) - \hat{x}(t)\|_2$, between the true snapshots and the snapshots generated using DMD; (a) method developed in this paper, (b) kernel DMD method in (Williams et al., 2015b).]
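A sketch of the preprocessing described above is given below, under the assumption that the snapshots are stored as columns of a single matrix; the function names are ours.

```python
# Sketch of the turbulent flow preprocessing: snapshots are flattened velocity
# fields, normalized so that the largest 2-norm among all snapshots is 1.
import numpy as np

def normalize_snapshots(S):
    # S: n x M matrix whose columns are flattened 512*512*2 velocity snapshots.
    return S / np.max(np.linalg.norm(S, axis=0))

def gaussian_rbf_gram(X, Y, mu=0.75):
    # G_ij = exp(-||x_i - y_j||_2^2 / mu), computed without explicit loops.
    sq = np.sum(X**2, 0)[:, None] + np.sum(Y**2, 0)[None, :] - 2 * X.T @ Y
    return np.exp(-sq / mu)
```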
C.3 Additional Discussion

The ability of the developed DMD technique to reconstruct the training data from the same initial condition is apparent from figures 3 and 5. In the Duffing oscillator example, the pointwise mean of the relative error is shown to be nearly zero. The relative reconstruction errors for the turbulent flow are of the order of $10^{-2}$. The Duffing oscillator example is also used to highlight the similarity between the predictions generated by the DMD method developed in this manuscript and those of the method proposed in (Williams et al., 2015b). Figure 4 shows that the differences between the predictions generated by a truncated pseudoinverse implementation of the method developed in this paper and the predictions generated by the method in (Williams et al., 2015b) are of the order of $10^{-6}$. As expected, figure 4 further highlights that using the truncated pseudoinverse implementation, as opposed to the regularized regression implementation, results in predictions that are more similar to those obtained using the method in (Williams et al., 2015b). Furthermore, figure 3 also shows that for the Duffing oscillator example, the continuous-time model in equation 9 results in a smaller pointwise mean relative error than the discrete-time model in equation 8.

In the case of the turbulent flow, as shown in figure 5, the relative prediction errors for the method developed in this paper and the baseline kernel DMD method developed in (Williams et al., 2015b) quickly increase to 1. Since the data are normalized so that the largest 2-norm among all snapshots is 1, a relative prediction error of 1 signifies a very poor match between the predicted state and the true state. Such performance degradation is expected for turbulent flows, since the underlying assumption that the $i$th and $(i+1)$th snapshots satisfy $x_{i+1} = F(x_i)$ for some unknown nonlinear function $F$ is unlikely to generalize outside the training data when the flow is turbulent. From figure 5, the method developed in this paper appears to perform slightly better than the baseline. This performance improvement can be attributed to the explicit regularization of the Gram matrix, where the regularization parameter can be tuned in addition to the kernel width parameter. In (Williams et al., 2015b), the regularization is done implicitly through the pseudoinverse operation. We postulate that by manipulating the threshold used to decide which singular values are zeroed in the pseudoinverse computation, the performance of the kernel DMD method can be improved.
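To make the two regularization strategies concrete, the following sketch contrasts them on the Gram matrix $G$ and the interaction matrix $A$ of the kernel DMD construction; both functions are illustrative assumptions of ours rather than code from either paper.

```python
# Explicit vs. implicit regularization of the finite rank representation.
import numpy as np

def finite_rank_rep_regularized(G, A, eps=1e-6):
    # Explicit (ridge-style) regularization: (G + eps*I)^{-1} A, where eps is
    # tunable alongside the kernel width parameter mu.
    return np.linalg.solve(G + eps * np.eye(G.shape[0]), A)

def finite_rank_rep_pinv(G, A, rcond=1e-3):
    # Implicit regularization via a truncated pseudoinverse: singular values
    # below rcond times the largest singular value are treated as zero, as in
    # the thresholding used for the baseline in (Williams et al., 2015b).
    return np.linalg.pinv(G, rcond=rcond) @ A
```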