# Learnability of Linear Port-Hamiltonian Systems

Journal of Machine Learning Research 25 (2024) 1-56. Submitted 4/23; Published 2/24.

Juan-Pablo Ortega (Juan-Pablo.Ortega@ntu.edu.sg), Division of Mathematical Sciences, Nanyang Technological University, Singapore

Daiying Yin (yind0004@e.ntu.edu.sg), Division of Mathematical Sciences, Nanyang Technological University, Singapore

Editor: Maxim Raginsky

## Abstract

A complete structure-preserving learning scheme for single-input/single-output (SISO) linear port-Hamiltonian systems is proposed. The construction is based on the solution, when possible, of the unique identification problem for these systems, in ways that reveal fundamental relationships between classical notions in control theory and crucial properties in the machine learning context, like structure-preservation and expressive power. In the canonical case, it is shown that, up to initializations, the set of uniquely identified systems can be explicitly characterized as a smooth manifold endowed with global Euclidean coordinates, which allows concluding that the parameter complexity necessary for the replication of the dynamics is only $O(n)$ and not $O(n^2)$, as suggested by the standard parametrization of these systems. Furthermore, it is shown that linear port-Hamiltonian systems can be learned while remaining agnostic about the dimension of the underlying data-generating system. Numerical experiments show that this methodology can be used to efficiently estimate linear port-Hamiltonian systems out of input-output realizations, making the contributions in this paper the first example of a structure-preserving machine learning paradigm for linear port-Hamiltonian systems based on explicit representations of this model category.
Keywords: Linear port-Hamiltonian system, machine learning, structure-preserving algorithm, systems theory, physics-informed machine learning, unique identification problem, controllable representation, observable representation, canonical representation.

© 2024 Juan-Pablo Ortega and Daiying Yin. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v25/23-0450.html.

Contents

- 1 Introduction
- 2 Preliminaries
  - 2.1 State-space systems and morphisms
  - 2.2 Hamiltonian and port-Hamiltonian systems
  - 2.3 Controllability and observability
  - 2.4 The symplectic Lie group and its Lie algebra
  - 2.5 Williamson's normal form
- 3 Controllable and observable Hamiltonian representations
- 4 Unique identification of linear port-Hamiltonian systems
  - 4.1 The unique identification problem for filters in PHn
  - 4.2 Equivalence classes of port-Hamiltonian systems by system isomorphisms
  - 4.3 The quotient spaces as groupoid orbit spaces
  - 4.4 Characterization of canonical port-Hamiltonian systems
  - 4.5 The unique identifiability space for canonical port-Hamiltonian systems as a group orbit space
  - 4.6 Global Euclidean coordinates for the unique identifiability space of canonical port-Hamiltonian systems
- 5 Linear port-Hamiltonian systems in normal form are restrictions of higher dimensional ones
- 6 Practical implementation of the results
- 7 Numerical illustrations
  - 7.1 Non-dissipative circuit
  - 7.2 Positive definite Frenkel-Kontorova model
- 8 Conclusions
- Acknowledgments
- Glossary of Symbols
- Bibliography
- 9 Appendices
  - 9.1 Proof of Theorem 7 (i)
  - 9.2 Proof of Theorem 7 (ii)
  - 9.3 Proof of Proposition 18
  - 9.4 Proof of Theorem 22
  - 9.5 Proof of Proposition 24
  - 9.6 Proof of Proposition 30
  - 9.7 Proof of Proposition 31
  - 9.8 Proof of Proposition 32
  - 9.9 Proof of Proposition 33
  - 9.10 Proof of Theorem 34
  - 9.11 Proof of Proposition 35
  - 9.12 Proof of Proposition 36
  - 9.13 Proof of Proposition 37
  - 9.14 A note on the design of discrete integrators on the transformed space

## 1. Introduction

Machine learning has experienced substantial development in recent years due to significant advances in algorithmics and a fast growth in computational power. The universal approximation properties of neural networks (Cybenko (1989); Hornik et al. (1989)) and other similar families make it possible for them to learn any function with very few prior assumptions.
A typical modus operandi in supervised machine learning is first to choose a neural network architecture, to perform forward propagation using available data, to compute some loss function, and then to carry out backward propagation, that is, gradient descent, to recursively optimize the parameters. This paradigm has proved to be very successful in the learning of numerous complicated tasks, including time-series forecasting (Hochreiter and Schmidhuber (1997)), computer vision (Krizhevsky et al. (2012)), and natural language processing (Devlin et al. (2018)).

In physics and engineering, machine learning is called to play an essential role in predicting and integrating the equations associated with physical dynamical systems. Physical systems are primarily formulated in terms of ordinary, time-delay, and partial differential equations that can be deduced mostly from variational principles. Consequently, some researchers propose to learn adequately discretized versions of their corresponding vector fields (see, for instance, Raissi and Karniadakis (2017), Qin et al. (2019), Long et al. (2018), and references therein). In addition to vector field learning, researchers have proposed model-free methods like transformers (Shalova and Oseledets (2020); Acciaio et al. (2022)), reservoir computing (Jaeger and Haas (2004); Lu et al. (2018); Pathak et al. (2018a,b)), recurrent neural networks (Bailer-Jones et al. (1998)), convolutional neural networks (Mukhopadhyay and Banerjee (2020)), or LSTMs (Wang (2017)). Various universal approximation properties theoretically explain the empirical success of some of these learning paradigms (see, for instance, Grigoryeva and Ortega (2018a,b); Gonon and Ortega (2020, 2021)).

Nevertheless, for physics-related problems, like in mechanics or optics, it is natural to build into the learning algorithm any prior knowledge that we may have about the system based on physics first principles.
This may include specific forms of the laws of motion, conservation laws, symmetry invariance, as well as other underlying geometric and variational structures. This observation regarding the construction of structure-preserving schemes was profusely exploited, with much success, in the field of numerical integration before the emergence of machine learning (Gonzalez (2000); Marsden and West (2001); Leimkuhler and Reich (2004); McLachlan and Quispel (2006)). Many examples in that context show how the failure to maintain specific conservation laws can lead to physically inconsistent solutions. The translation of this idea to the context of machine learning has led to the emergence of a new domain collectively known as physics-informed machine learning (see Raissi et al. (2017); Wu et al. (2018); Karniadakis et al. (2021) and references therein).

In the specific case of Hamiltonian systems, the two main structural constraints are that the flow is symplectic and that the energy, that is, the Hamiltonian, is conserved along the flow. Additionally, symmetries are frequently present, which gives rise to additional conserved quantities in the form of the so-called momentum maps via Noether's Theorem (Abraham and Marsden (1978); Marsden and Ratiu (1999); Ortega and Ratiu (2004)). These are all examples of qualitative properties to be preserved by the learning algorithms. Needless to say, the above-mentioned model-free approaches generically fail to preserve all these structures.

With these in mind, several attempts have been made in the literature to develop tailor-made learning algorithms for Hamiltonian systems. For example, in Greydanus et al. (2019); Celledoni et al. (2023), neural methods are proposed to learn the Hamiltonian function directly. In Chen et al.
(2020), a symplectic recurrent neural network is proposed that uses symplectic integration while matching the predictions and observations and leads to a structure-preserving paradigm. Other structure-preserving methods include the so-called SympNet (Jin et al. (2020)), the generating function neural networks (GFNN) in Chen and Tao (2021), and the symplectic reversible neural networks in Valperga et al. (2022). SympNet constructs a universal approximating family of symplectic maps, while GFNN applies a modified KAM theory to control long-term prediction error. Symplectic reversible neural networks are also proposed as a family of universal approximating maps that concern, in particular, reversible symplectic dynamics. In Zhong et al. (2020), a parametric framework of learning Hamiltonian state dynamics with control is proposed, assuming that the Hamiltonian is separable. Under the same assumption, Tong et al. (2021) proposes to learn with a parametrized Hamiltonian in a Taylor series form.

This paper's focus differs from the references mentioned above in two ways. First, these methods are designed to learn the state evolution of Hamiltonian systems, whereas our approach focuses on learning the input-output dynamics of port-Hamiltonian systems while remaining agnostic about the physical state space. As will be introduced later, these systems have an underlying Dirac structure that describes the geometry of numerous physical systems with external inputs (van der Schaft and Jeltsema (2014)) and includes the dynamics of the observations of Hamiltonian systems as a particular case. Even though various learning schemes for these systems have already been proposed in the literature (Nageshrao et al. (2015); Cherifi (2020); Desai et al. (2021); Beckers et al. (2022)), most works on the learning of Hamiltonian systems deal with autonomous (separable) Hamiltonian systems on which one assumes access to the entire phase space and not only to its observations.
Second, instead of a general nonlinear system for which only the approximation error can possibly be estimated, we consider, as a first approach, exclusively linear systems. In this case we can obtain explicit representations of linear port-Hamiltonian systems in normal form and characterize the symmetries and quotient spaces associated to the invariance by system automorphisms. Thereby, we propose a structure-preserving learning paradigm with a provable minimal parameter space.

The contributions in this paper are contained in several results that we briefly introduce in the following lines. In Section 2, we define the notion of linear port-Hamiltonian systems in normal form and present some necessary introductory concepts. We start in Theorem 7 by introducing system morphisms that allow us to represent any linear port-Hamiltonian system in normal form as the image of another linear system of the same dimension in which the state equation is in controllable canonical form. Since the constructed linear system and the original port-Hamiltonian system are linked by a system morphism, the input/output relations of the former are input/output relations of the latter once the initial state conditions have been properly set up. In particular, the new system can be used to learn to reproduce the input/output dynamics of the original port-Hamiltonian system (for a subspace of initial conditions) and this learning paradigm is structure-preserving by construction. Similarly, Theorem 7 also contains another type of system morphisms that link any linear port-Hamiltonian system in normal form to some linear system of the same dimension in observable canonical form. Consequently, the input/output relations of the original port-Hamiltonian system with respect to any initial condition can be captured by the observable Hamiltonian representation.
Both representations are derived using classical techniques from control theory and the Cayley-Hamilton theorem, and are ultimately corollaries of the Williamson normal form (Williamson (1936, 1937); Ikramov (2018)). We show that the controllable and observable representations are closely related to each other, and both system morphisms become isomorphisms for canonical port-Hamiltonian systems. However, for the purpose of learning a general port-Hamiltonian system that may not be canonical, we reveal that there is a trade-off between the structure-preserving property and the expressive power. These results establish a strong link between classical notions in control theory, that is, controllability and observability, and those in machine learning, namely, structure-preservation and expressive power.

Based on these explicit constructions and using the parametrizations that come with them, we aim to tackle in Section 4 the unique identifiability of input-output dynamics of linear port-Hamiltonian systems in normal form. Such a characterization is needed to solve the model estimation problem since, in applications, we only have access to input/output data, and different state space systems can induce the same filter that produces that data. This fact has important implications when it comes to the learning of port-Hamiltonian systems out of finite-sample realizations of a given data-generating process because such degeneracy makes its exact recovery impossible. Said differently, it is not the space of port-Hamiltonian systems that needs to be characterized but its quotient space with respect to the equivalence relation defined by the constraint on inducing the same input/output filter.
We shall see in Subsection 4.1 that the presence of non-canonical systems in PHn and possible initialization inconsistencies make it, in general, difficult to directly characterize that quotient space by filter-equivalence and we shall settle for the closest to it that we can get, namely, the quotient space by system automorphisms that, as it will be justified, approximates the general case in a certain sense and admits an explicit characterization as a Lie groupoid orbit space (Subsection 4.3). In Subsection 4.4, we restrict our identification analysis to canonical port-Hamiltonian systems and show, first, that in that situation eliminating the system isomorphisms completely identifies the set of input/output systems up to state initializations (Sussmann (1976)), and second, that the corresponding quotient spaces can be characterized as orbit spaces with respect to a group (as opposed to a groupoid in the general unrestricted case) action, where the group is explicitly given by a semi-direct product. Moreover, (see Subsection 4.6) this orbit space can be explicitly endowed with a smooth manifold structure that has global Euclidean coordinates that can be used at the time of constructing estimation algorithms. Consequently, up to state initializations, canonical port-Hamiltonian dynamics can be identified fully and explicitly in either the controllable or the observable Hamiltonian representations and learned by estimating an initial state condition and a unique set of parameters in a smooth manifold that is obtained as a group orbit space. Another learning-related problem that we tackle is that, in applications, one is obliged to remain agnostic as to the dimension of the underlying data-generating port-Hamiltonian system. This leads to the difficulty of choosing the dimension of the controllable/observable Hamiltonian representations. 
We solve this issue by proving in Theorem 34 that, for $m \geq n$, any $2n$-dimensional linear port-Hamiltonian system in normal form can be regarded as the restriction of a $2m$-dimensional one to some subspace. This fact, together with some subsequent results, guarantees theoretically that we can choose a sufficiently large $m$ in practice and parametrize the observable Hamiltonian representation in dimension $2m$ and use it for learning without assuming any knowledge about the dimension of the data-generating system.

The paper concludes with some numerical examples in Section 7 that illustrate the viability of the method that we propose in systems with various levels of complexity and dimensions, as well as the computational advantages associated with using the parameter space in which unique identification is guaranteed. For the reader's convenience, the Python code necessary to reproduce these numerics is public and can be found in https://github.com/YINDAIYING/Learnability-of-Linear-Port-Hamiltonian-Systems.

## 2. Preliminaries

In this section, we introduce various notions and preliminary results necessary to understand the context and the contributions of the paper.

### 2.1 State-space systems and morphisms

A continuous time state-space system is given by the following two equations

$$\dot{z} = F(z, u), \qquad y = h(z), \tag{1}$$

where $u \in \mathcal{U}$ is the input, $z \in \mathcal{Z}$ is the internal state and $F : \mathcal{Z} \times \mathcal{U} \to \mathcal{Z}$ is called the state map. The first equation is called the state equation while the second one is usually referred to as the observation equation. The solutions of (1) (when available and unique) yield an input/output map that is by construction causal and time-invariant. State-space systems will be sometimes denoted using the triplet $(\mathcal{Z}, F, h)$.
**Definition 1** A map $f : \mathcal{Z}_1 \to \mathcal{Z}_2$ is called a system morphism (see Grigoryeva and Ortega (2021)) between the continuous-time state-space systems $(\mathcal{Z}_1, F_1, h_1)$ and $(\mathcal{Z}_2, F_2, h_2)$ if it satisfies the following two properties:

(i) System equivariance: $f(F_1(z_1, u)) = F_2(f(z_1), u)$, for all $z_1 \in \mathcal{Z}_1$ and $u \in \mathcal{U}$.

(ii) Readout invariance: $h_1(z_1) = h_2(f(z_1))$ for all $z_1 \in \mathcal{Z}_1$.

As a direct consequence of this definition, the composition of system morphisms is again a system morphism. In the case $f$ is invertible and $f^{-1}$ is also a morphism, we say that $f$ is a system isomorphism.

An elementary but very important fact is that if $f : \mathcal{Z}_1 \to \mathcal{Z}_2$ is a linear system-equivariant map between $(\mathcal{Z}_1, F_1, h_1)$ and $(\mathcal{Z}_2, F_2, h_2)$ ($\mathcal{Z}_1$ and $\mathcal{Z}_2$ are in this case vector spaces) then, for any solution $z_1 \in C^1(I, \mathcal{Z}_1)$ of the state equation associated to $F_1$ and to the input $u \in C^1(I, \mathcal{U})$, with $I \subset \mathbb{R}$ an interval, its image $f \circ z_1 \in C^1(I, \mathcal{Z}_2)$ is a solution for the state space system associated to $F_2$ with the same input. Indeed, for any $t \in I$ we have, by the linearity and the system equivariance of $f$:

$$\frac{d}{dt}\left[f(z_1(t))\right] = Df(z_1(t)) \cdot \dot{z}_1(t) = f(\dot{z}_1(t)) = f(F_1(z_1(t), u(t))) = F_2(f(z_1(t)), u(t)).$$

Notice that if at time $t = 0$, the output of both systems $(\mathcal{Z}_1, F_1, h_1)$ and $(\mathcal{Z}_2, F_2, h_2)$ are the same, that is, the initial conditions $z_1(0)$ and $z_2(0)$ at the time of integrating (1) are chosen so that $h_1(z_1(0)) = h_2(f(z_1(0)))$, then the two systems $(\mathcal{Z}_1, F_1, h_1)$ and $(\mathcal{Z}_2, F_2, h_2)$ have the same associated input/output relation, in the sense that we introduce later on in (7). This observation has an important consequence, namely that, in general, input/output systems are not uniquely identified since all the system-isomorphic state-space systems with appropriate initializations yield the same input/output map.

### 2.2 Hamiltonian and port-Hamiltonian systems

Hamiltonian systems are dynamical systems whose behavior is governed by Hamilton's variational principle.
Even though these autonomous systems can be in general formulated on any symplectic manifold (Abraham and Marsden (1978)), we will restrict in this paper to the case in which the phase space is the even-dimensional vector space $\mathbb{R}^{2n}$ endowed with the Darboux canonical symplectic form. In this case, the Hamiltonian system determined by the Hamiltonian function $H \in C^1(\mathbb{R}^{2n})$ is given by the differential equation

$$\dot{z} = \mathbb{J} \nabla H(z), \tag{2}$$

where

$$\mathbb{J} = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}$$

is the so-called canonical symplectic matrix. Note that $-\mathbb{J} = \mathbb{J}^{\top} = \mathbb{J}^{-1}$ and hence $\mathbb{J}$ also endows $\mathbb{R}^{2n}$ with a complex structure. In this paper, we will denote the canonical symplectic matrix as $\mathbb{J}$, unless the context requires specifying the dimension, in which case we denote it by $\mathbb{J}_n$. A linear Hamiltonian system is determined by a quadratic Hamiltonian function $H(z) = \frac{1}{2} z^{\top} Q z$, where $z \in \mathbb{R}^{2n}$ and $Q \in \mathbb{M}_{2n}$ is a square matrix that without loss of generality can be assumed to be symmetric. In this case, Hamilton's equations (2) reduce to

$$\dot{z} = \mathbb{J} Q z. \tag{3}$$

Port-Hamiltonian systems (see van der Schaft and Jeltsema (2014)) are state-space systems that generalize autonomous Hamiltonian systems to the case in which external signals or inputs control in a time-varying way the dynamical behavior of the Hamiltonian system. The family of input-state-output port-Hamiltonian systems are those port-Hamiltonian systems with no algebraic constraints on the state-space variables, and where the flow and effort variables of the resistive, control and interaction ports are split into conjugated pairs. In such cases, the implicit representation may be proved (see van der Schaft and Jeltsema (2014)) to be equivalent to the following explicit form:

$$\dot{x} = \left[J(x) - R(x)\right] \frac{\partial H}{\partial x}(x) + g(x) u, \qquad y = g^{\top}(x) \frac{\partial H}{\partial x}(x), \tag{4}$$

where $(u, y)$ is the input-output pair (corresponding to the control and output conjugated ports), $J(x)$ is a skew-symmetric interconnection structure and $R(x)$ is a symmetric positive-semidefinite dissipation matrix. Our work concerns linear port-Hamiltonian systems in the
Our work concerns linear port-Hamiltonian systems in the Juan-Pablo Ortega and Daiying Yin normal form which we define now: a linear port-Hamiltonian system (4) is in normal form if the skew-symmetric matrix J is constant and equal to the canonical symplectic matrix J, the Hamiltonian matrix Q is symmetric positive-definite, and the energy dissipation matrix R = 0, in which case (4) takes the form: z = JQz + Bu, with z R2n, u, y R, and where B R2n specifies the interconnection structure simultaneously at the input and output levels. By definition, such systems are fully determined by the pair (Q, B), and hence we define by (Q, B)|0 < Q M2n, Q = QT , B R2n the space of paramters of (5). Let θPHn : ΘPHn PHn the map that associates to the parameter (Q, B) θPHn the corresponding port-Hamiltonian state space system. For convenience, we shall often use (Q, B) to denote elements in PHn unless there is a risk of confusion. Note that the condition Q > 0 implies that the origin is a Lyapunov stable equilibrium of (3). All these systems have the existence and uniqueness of solutions property and hence determine a family of input/output systems, also known as filters, that will be denoted by PHn. More specifically, the elements in PHn are maps U(Q,B) : C1([0, 1]) R2n C1([0, 1]) given by U(Q,B) : C1([0, 1]) R2n C1([0, 1]) (u, x0) U(Q,B)(u, x0)t = BT Qe JQt t 0 e JQs Bu(s) ds + x0 (7) t [0, 1]. Note that PHn includes as a special case linear observations of autonomous linear Hamiltonian systems (case B = 0). Note that as a manifold ΘPHn = S+ 2n R2n, where S+ 2n denotes the space of symmetric positive-definite matrices (SPD). We recall that S+ 2n has a natural differentiable manifold structure whose tangent space at any point is the vector space of symmetric matrices S2n (see Quang Minh and Murino (2018), and references therein). 
Port-Hamiltonian systems are also closely linked to the so-called affine Hamiltonian input-output systems that have been considered as a natural extension of Hamiltonian systems with external forces and studied extensively in the literature (see Crouch and van der Schaft (1987) for the deterministic case and Bismut (1982); Lázaro-Camí and Ortega (2008) for stochastic extensions), which take the form

$$\dot{x} = X_H(x) + X_g(x) u, \qquad y = g(x), \tag{8}$$

where $X_H$ and $X_g$ are the Hamiltonian vector fields of $H, g \in C^1(\mathbb{R}^{2n})$. In the linear case, (8) reduces to

$$\dot{z} = \mathbb{J} Q z - \mathbb{J} B u, \qquad y = B^{\top} z. \tag{9}$$

The relation between (9) and (5) is that $\dot{y} = B^{\top} \dot{z} = B^{\top} \mathbb{J} Q z = (-\mathbb{J} B)^{\top} Q z$, showing that the time derivative of the output of the affine Hamiltonian input-output system has a port-Hamiltonian structure. Note that in the second equality, we used that $B^{\top} \mathbb{J} B = 0$ since $\mathbb{J}$ is antisymmetric.

Consider now a general linear single-input/single-output system that takes the form

$$\dot{x} = A x + B u, \qquad y = C^{\top} x, \tag{10}$$

where $A \in \mathbb{M}_n$ and $B, C \in \mathbb{R}^n$. Very often in control theory, it is the so-called transfer matrix rather than the input/output system which is studied. The transfer matrix $G(s)$ of (10) is defined as $G(s) = C^{\top} (Is - A)^{-1} B$ and converts the differential equations in the time domain to an algebraic equation in the Laplace frequency domain. It can be proved that the transfer matrix of the port-Hamiltonian systems (5) satisfies $G(s) = -G(-s)$ and that of systems of the type (9) satisfies $G(s) = G(-s)$. The converse statements also hold for canonical realizations (see the definition in the next section and Brockett and Rahimi (1972), Maschke and van der Schaft (1992)). These facts are a strong indication that the systems (5) and (9) carry intrinsic symmetries that should be explicitly characterized. We shall do so in Section 4 for port-Hamiltonian systems but only using the original state-space representation.
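The two parity properties of the transfer function can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original text: the matrices $Q$ and $B$ are randomly generated, the evaluation point `s0` is arbitrary (chosen off the imaginary axis so that $Is - \mathbb{J}Q$ is invertible), and the helper names `transfer`, `G_ph`, and `G_aff` are hypothetical.

```python
import numpy as np

def transfer(A, B, C, s):
    """SISO transfer function G(s) = C^T (sI - A)^{-1} B of system (10)."""
    return C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B)

# Hypothetical test data: random SPD Hamiltonian matrix Q and random B.
rng = np.random.default_rng(1)
n = 3
I, Z = np.eye(n), np.zeros((n, n))
J = np.block([[Z, I], [-I, Z]])
M = rng.standard_normal((2 * n, 2 * n))
Q = M @ M.T + np.eye(2 * n)
B = rng.standard_normal(2 * n)

def G_ph(s):
    """Port-Hamiltonian system (5): A = JQ, input vector B, readout B^T Q."""
    return transfer(J @ Q, B, Q @ B, s)

def G_aff(s):
    """Linear affine Hamiltonian system (9): A = JQ, input vector -JB, readout B^T."""
    return transfer(J @ Q, -J @ B, B, s)

s0 = 0.7 + 0.3j
odd_defect = abs(G_ph(s0) + G_ph(-s0))     # vanishes if G(s) = -G(-s)
even_defect = abs(G_aff(s0) - G_aff(-s0))  # vanishes if G(s) = G(-s)
print(odd_defect, even_defect)
```

Both defects should be zero up to floating-point round-off, in agreement with the stated symmetries.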
### 2.3 Controllability and observability

Given a general linear system like (10), we recall that its controllability and observability matrices are defined by

$$\left( B \mid AB \mid \cdots \mid A^{n-1} B \right) \qquad \text{and} \qquad \begin{pmatrix} C^{\top} \\ C^{\top} A \\ \vdots \\ C^{\top} A^{n-1} \end{pmatrix},$$

respectively. The system is called controllable (respectively, observable) if its controllability (respectively, observability) matrix has full rank. Any linear controllable (respectively, observable) system can be transformed into the so-called controllable (respectively, observable) canonical forms by using appropriate linear system isomorphisms (see Polderman and Willems (1998)). Conversely, systems in these canonical forms are automatically controllable (respectively, observable). In the next section, we characterize the controllable/observable/canonical systems in the linear port-Hamiltonian category.

Controllability and observability are intertwined concepts in the linear port-Hamiltonian category. Indeed, it can be proved (see Medianu et al. (2013)) that if a linear port-Hamiltonian system without dissipation is controllable and $\det(Q) \neq 0$, then it is also observable. Conversely, if it is observable, then this implies that $\det(Q) \neq 0$ and it is also controllable (see Medianu et al. (2013)). As it is customary in systems theory, we say a linear port-Hamiltonian system in normal form is canonical if it is both controllable and observable. In view of the results that we just recalled, if $\det(Q) \neq 0$, then either controllability or observability is equivalent to the system being canonical. Furthermore, it can be shown that being canonical is a generic property, that is, the set of canonical systems forms an open and dense subset. We shall denote by $PH_n^{can} \subset PH_n$ the subset of $PH_n$ made of canonical linear port-Hamiltonian systems. Later on in the paper, the significance of these observations will become apparent.
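As an illustration of how controllability and observability go together in the normal form (5), the following sketch computes the ranks of both matrices for two hand-picked examples with $n = 2$. The helper names (`ctrb`, `obsv`, `ph_ranks`) and the example data are assumptions made for this illustration; they do not come from the paper.

```python
import numpy as np

def ctrb(A, B):
    """Controllability matrix (B | AB | ... | A^{n-1} B)."""
    n = A.shape[0]
    return np.column_stack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

def obsv(A, C):
    """Observability matrix with rows C^T A^k, k = 0, ..., n-1."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

def ph_ranks(Q, B):
    """Ranks of the controllability and observability matrices of system (5)."""
    B = np.asarray(B, dtype=float)
    m = len(B) // 2
    I, Z = np.eye(m), np.zeros((m, m))
    J = np.block([[Z, I], [-I, Z]])
    A = J @ Q
    C = Q @ B  # readout y = B^T Q z, so C = Q B
    return (int(np.linalg.matrix_rank(ctrb(A, B))),
            int(np.linalg.matrix_rank(obsv(A, C))))

Q = np.diag([1.0, 2.0, 1.0, 2.0])  # already in Williamson form, d = (1, 2)

ranks_canonical = ph_ranks(Q, [1.0, 1.0, 0.0, 0.0])      # both oscillators are excited
ranks_non_canonical = ph_ranks(Q, [1.0, 0.0, 0.0, 0.0])  # only the first one is excited
print(ranks_canonical)      # (4, 4)
print(ranks_non_canonical)  # (2, 2)
```

In both cases the two ranks coincide, which is consistent with the equivalence recalled above for $\det(Q) \neq 0$; the second example shows a non-canonical system in which the input never reaches one of the two oscillators.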
### 2.4 The symplectic Lie group and its Lie algebra

A square matrix $S \in \mathbb{M}_{2n}$ in dimension $2n$ is called symplectic if it satisfies $S^{\top} \mathbb{J} S = \mathbb{J}$. The set of all symplectic matrices forms a Lie group denoted by $Sp(2n, \mathbb{R})$. It is well-known that if $S \in Sp(2n, \mathbb{R})$ then $\det(S) = 1$ and hence $Sp(2n, \mathbb{R})$ is a subgroup of the general linear group $GL(2n, \mathbb{R})$. The Lie algebra $\mathfrak{sp}(2n, \mathbb{R})$ of $Sp(2n, \mathbb{R})$ is given by the matrices $A \in \mathbb{M}_{2n}$ that satisfy the identity $A^{\top} \mathbb{J} + \mathbb{J} A = 0$. Equivalently, $A \in \mathfrak{sp}(2n, \mathbb{R})$ if and only if $A = \mathbb{J} R$, where $R \in \mathbb{M}_{2n}$ is symmetric. We will refer to the elements in $Sp(2n, \mathbb{R})$ as symplectic matrices and to those in $\mathfrak{sp}(2n, \mathbb{R})$ as infinitesimally symplectic. Notably, the eigenvalues of the elements in $\mathfrak{sp}(2n, \mathbb{R})$ appear in specific patterns that are spelled out in the following classical proposition (see (Abraham and Marsden, 1978, Section 3.1)).

**Proposition 2** The characteristic polynomial of any matrix $A \in \mathfrak{sp}(2n, \mathbb{R})$ is even. Thus, if $\lambda$ is an eigenvalue of $A$ then so are $-\lambda$, $\bar{\lambda}$, and $-\bar{\lambda}$.

The importance of this group in our developments is that the (constant) vector field associated with Hamilton's equations (3) is an element in $\mathfrak{sp}(2n, \mathbb{R})$. Its flow determines a one-parameter subgroup of elements in $Sp(2n, \mathbb{R})$.

We also introduce the unitary group $U(n, \mathbb{C})$, which consists of matrices $U \in \mathbb{M}_n(\mathbb{C})$ with $U U^{*} = U^{*} U = I_n$, where $U^{*}$ denotes the conjugate transpose of $U$. We denote by $U(n)$ (see De Gosson (2006)) the image of $U(n, \mathbb{C})$ in $Sp(2n, \mathbb{R})$ by the monomorphism

$$A + iB \longmapsto \begin{pmatrix} A & -B \\ B & A \end{pmatrix}.$$

The so-called 2-out-of-3 property (Arnold (1989)) implies that $U(n) = O(2n, \mathbb{R}) \cap GL(n, \mathbb{C}) \cap Sp(2n, \mathbb{R})$, and it is indeed the intersection of any two out of the three groups.

### 2.5 Williamson's normal form

The following classical result can be found in Williamson (1936, 1937); Ikramov (2018); De Gosson (2006).

**Theorem 3** Let $M \in \mathbb{M}_{2n}$ be a positive-definite symmetric real matrix. Then

(i) There exists a symplectic matrix $S \in Sp(2n, \mathbb{R})$ such that

$$M = S^{\top} \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} S,$$

with $D = \mathrm{diag}(d)$ an $n$-dimensional diagonal matrix with positive entries and $d = (d_1, \ldots, d_n)^{\top}$.

(ii) The values $d_1, \ldots, d_n$ are independent, up to reordering, of the choice of the symplectic matrix $S$ used to diagonalize $M$.

(iii) Assume $S$ and $S'$ are two elements of $Sp(2n, \mathbb{R})$ such that

$$M = S^{\top} \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} S = S'^{\top} \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} S',$$

where $D$ is as above, then $S (S')^{-1} \in U(n)$.

Later in this paper, we always use the notation $D = \mathrm{diag}(d)$ to denote that $D$ is a diagonal matrix with diagonal entries given by the vector $d = (d_1, \ldots, d_n)^{\top}$. The elements $d_i$ in the above theorem are called the symplectic eigenvalues of $M$ since the eigenvalues of $\mathbb{J} M$ are $\pm i d_1, \ldots, \pm i d_n$.

**Remark 4** The above theorem can be generalized to positive-semidefinite real symmetric matrices. Indeed, it can first be shown that if the kernel of $M$ is a symplectic subspace of $\mathbb{R}^{2n}$ of dimension $2m$, then the statement of Theorem 3 still holds true with the only added feature that exactly $m$ of the diagonal entries in $D$ are equal to 0 (see Son and Stykel (2022)). More generally, without the symplecticity assumption, all that can be said is that there exists $S \in Sp(2n, \mathbb{R})$ such that

$$M = S^{\top} \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix} S,$$

where $D_1$ and $D_2$ may contain diagonal zero entries (see Idel et al. (2017); Egusquiza and Parra-Rodriguez (2022)).

## 3. Controllable and observable Hamiltonian representations

In this section, we state two representation results for linear port-Hamiltonian systems in normal form, which are the main building blocks in our learnability results. More precisely, we define two subfamilies of linear systems of the type (10), respectively called controllable and observable Hamiltonian representations, that are by construction controllable and observable, respectively (Definition 5). We subsequently show in Theorem 7 that morphisms can be established between the elements in these families and those in the category $PH_n$ of normal form port-Hamiltonian systems.
As it will be spelled out later on in detail, the existence of these morphisms immediately guarantees that the complexity of the family of filters PHn is actually not O(n2), as it could be guessed from (5), but O(n). However, our proposed representations have certain limitations for non-canonical port-Hamiltonian systems. For example, the observable representation is guaranteed to capture all possible input-output dynamics of port-Hamiltonian systems (full expressive power), but it does not always produce port-Hamiltonian dynamics (fails to be structure-preserving). In the controllable case, structure preservation is guaranteed, but there is, in general, no full expressive power. Fortunately, for canonical port-Hamiltonian systems, all the morphisms that we shall introduce become isomorphisms, meaning that they are both structure-preserving and have full expressive power. Roughly speaking, the more canonical a port-Hamiltonian system is, the better the corresponding representations behave in terms of structure-preserving properties and expressive power. The representations introduced below can be seen as a reparametrization of the elements (Q, B) PHn in terms of a diagonal matrix D = diag(d) Mn, d Rn, and a vector v R2n, where D is obtained from Williamson s Theorem 3 as Q = ST v = S B. This makes it obvious that the learning problem for port-Hamiltonian systems Juan-Pablo Ortega and Daiying Yin has parameter complexity of at most O(n) even if the Hamiltonian matrix has complexity O(n2). We emphasize that even in the canonical situation, the availability of the controllable/observable representations does not yet provide a well-specified learning problem for this category since the invariance of these systems under system automorphisms implies the existence of symmetries (or degeneracies) in the parametrizations, which will be the focus of the next section. The proofs of all our results are provided in the appendices. Definition 5 Given d = (d1, . . . 
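, dn)T $\in \mathbb{R}^n$, with $d_i > 0$, and $v \in \mathbb{R}^{2n}$, we say that a $2n$-dimensional linear state-space system is a controllable Hamiltonian (respectively, observable Hamiltonian) representation if it takes the form

$$\dot{s} = g^{ctr}_1(d)\, s + (0, 0, \ldots, 0, 1)^\top u, \qquad y = g^{ctr}_2(d, v)\, s,$$

respectively,

$$\dot{s} = g^{obs}_1(d)\, s + g^{obs}_2(d, v)\, u, \qquad y = (0, 0, \ldots, 0, 1)\, s, \qquad (12)$$

where $g^{ctr}_1(d) \in M_{2n}$ and $g^{ctr}_2(d, v) \in M_{1,2n}$ (respectively, $g^{obs}_1(d) \in M_{2n}$ and $g^{obs}_2(d, v) \in \mathbb{R}^{2n}$) are constructed as follows:

(i) Given $d \in \mathbb{R}^n$, let $\{a_0, a_1, \ldots, a_{2n-1}\}$ be the real coefficients that make

$$\lambda^{2n} + \sum_{i=0}^{2n-1} a_i \lambda^i = (\lambda^2 + d_1^2)(\lambda^2 + d_2^2) \cdots (\lambda^2 + d_n^2)$$

an equality between the two polynomials in $\lambda$. Let $a_{2n} = 1$ by convention. Note that the entries $a_i$ with an odd index $i$ are zero. Define

$$g^{ctr}_1(d) := \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{2n-1} \end{pmatrix}$$

(respectively, $g^{obs}_1(d) = g^{ctr}_1(d)^\top$).

(ii) Given $d$ and $v$, then

$$g^{ctr}_2(d, v) := \begin{pmatrix} 0 & c_{2n-1} & 0 & c_{2n-3} & \cdots & 0 & c_1 \end{pmatrix}$$

(respectively, $g^{obs}_2(d, v) = g^{ctr}_2(d, v)^\top$), where

$$c_{2k+1} = v^\top \begin{pmatrix} F_k & 0 \\ 0 & F_k \end{pmatrix} v \quad \text{for } k = 0, \ldots, n-1, \qquad F_k = \mathrm{diag}(f_1, \ldots, f_n),$$

with

$$f_l = d_l \sum_{\substack{1 \leq j_1 < \cdots < j_k \leq n \\ j_1, \ldots, j_k \neq l}} (d_{j_1} \cdots d_{j_k})^2 .$$

We denote by $CH_n$ (respectively, $OH_n$) the set of controllable (respectively, observable) Hamiltonian representations, which are parametrized by the set $\Theta_{CH_n} = \Theta_{OH_n} := \{(d, v) \mid d_i > 0,\ v \in \mathbb{R}^{2n}\}$.

Sometimes later on in the paper we shall write $a_i(d)$ and $c_j(d, v)$ to indicate that $a_i$ and $c_j$ are functions of $d$ and $v$.

Remark 6 Observe that the controllable and the observable Hamiltonian representations of port-Hamiltonian systems are closely related to each other. The controllable Hamiltonian matrix $g^{ctr}_1$ is the transpose of the observable Hamiltonian matrix $g^{obs}_1$. Moreover, as can be directly observed from the construction, the input and readout matrices of the two representations are transposes of each other; in particular, $g^{ctr}_2 = (g^{obs}_2)^\top$.

Consider now the maps $\theta_{CH_n} : \Theta_{CH_n} \to CH_n$ and $\theta_{OH_n} : \Theta_{OH_n} \to OH_n$ that associate to each parameter value the corresponding state-space system. Note that the elements in $CH_n$ (respectively, in $OH_n$) of the form (12) are in canonical controllable (respectively, observable) form in the sense of Sontag (1998) and they are hence controllable (respectively, observable).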
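The construction in Definition 5 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ctr_representation` and `transfer` are hypothetical, and the coefficients $f_l$ are computed as $d_l$ times the elementary symmetric polynomial of degree $k$ in the remaining squared entries $d_j^2$, $j \neq l$.

```python
import numpy as np
from itertools import combinations


def ctr_representation(d, v):
    """Build (g1, b, g2) of a controllable Hamiltonian representation from (d, v)."""
    d = np.asarray(d, dtype=float)
    v = np.asarray(v, dtype=float)
    n = d.size
    # Coefficients of prod_l (lambda^2 + d_l^2), highest degree first.
    poly = np.real(np.poly(np.concatenate([1j * d, -1j * d])))
    a = poly[::-1][:-1]                      # a_0, ..., a_{2n-1}
    g1 = np.zeros((2 * n, 2 * n))
    g1[:-1, 1:] = np.eye(2 * n - 1)          # ones on the superdiagonal
    g1[-1, :] = -a                           # last row: -a_0, ..., -a_{2n-1}
    b = np.zeros(2 * n)
    b[-1] = 1.0                              # input vector (0, ..., 0, 1)^T
    w = v[:n] ** 2 + v[n:] ** 2              # v_l^2 + v_{n+l}^2
    g2 = np.zeros(2 * n)
    for k in range(n):
        c = 0.0
        for l in range(n):
            others = [d[j] ** 2 for j in range(n) if j != l]
            # f_l = d_l * (sum over k-subsets of the remaining squared entries)
            f_l = d[l] * sum(np.prod(s) for s in combinations(others, k))
            c += f_l * w[l]
        g2[2 * n - 1 - 2 * k] = c            # c_{2k+1} sits at 0-based index 2n-1-2k
    return g1, b, g2


def transfer(g1, b, g2, s):
    """Evaluate g2 (s I - g1)^{-1} b at a complex frequency s."""
    return g2 @ np.linalg.solve(s * np.eye(g1.shape[0]) - g1, b)


# Example with n = 2 (illustrative values).
g1, b, g2 = ctr_representation([1.0, 2.0], [1.0, 0.5, -1.0, 2.0])
```

Only the $n$ entries of $d$ and the $n$ sums $v_l^2 + v_{n+l}^2$ enter the matrices, which is consistent with the $O(n)$ parameter count discussed in the text.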
Our main result below establishes a relationship between port-Hamiltonian systems and the controllable (respectively, observable) Hamiltonian representations defined above, which will be used later on to discuss structure preservation and expressiveness in the modeling of $PH_n$.

Theorem 7 (i) For each $S \in \mathrm{Sp}(2n, \mathbb{R})$, there exists a map

$$\varphi_S : CH_n \to PH_n, \qquad \theta_{CH_n}(d, v) \mapsto \theta_{PH_n}\left(S^\top \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} S,\ S^{-1} v\right),$$

with $D = \mathrm{diag}(d)$, such that the controllable Hamiltonian system $\theta_{CH_n}(d, v) \in CH_n$ and its port-Hamiltonian image $\varphi_S(\theta_{CH_n}(d, v)) \in PH_n$ are linked by a linear system morphism $f^{(d,v)}_S : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$.

(ii) Given a port-Hamiltonian system $\theta_{PH_n}(Q, B) \in PH_n$, there exists an explicit linear system morphism $f^{(Q,B)} : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ between the state space of $\theta_{PH_n}(Q, B) \in PH_n$ and that of an observable Hamiltonian system $\theta_{OH_n}(d, v) \in OH_n$, where $(d, v) \in \Theta_{OH_n}$ is determined by a Williamson normal form decomposition of $Q$ given by $S \in \mathrm{Sp}(2n, \mathbb{R})$, that is, $Q = S^\top \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} S$, $D = \mathrm{diag}(d)$, and $v = SB$.

Remark 8 We emphasize that, given $(Q, B) \in \Theta_{PH_n}$, the pair $(d, v) \in \Theta_{CH_n} = \Theta_{OH_n}$ is not uniquely determined by Williamson's decomposition. This can be seen from Theorem 3 because the element $S \in \mathrm{Sp}(2n, \mathbb{R})$ in its statement is not unique and the entries $d_i$ of $d$ are independent of $S$ only up to their ordering.

Remark 9 (Controllability, observability, and invertibility) (i) In the proof of the theorem above (available in the Appendix), we define the linear system morphism $f^{(d,v)}_S : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ as $z = f^{(d,v)}_S(s) := Ls$, and an explicit construction of the matrix $L$ is provided. It turns out that the matrix $L$ is invertible if and only if the image port-Hamiltonian system (5) is controllable, or equivalently, observable.
Indeed, using the same notation as in the proof of Theorem 7, we have

$$L = S^{-1}\left(L_1 v \mid L_2 v \mid \cdots \mid L_{2n} v\right) = \left(S^{-1}L_1 v \mid S^{-1}L_2 v \mid \cdots \mid S^{-1}L_{2n} v\right),$$

where

$$\begin{aligned} S^{-1} L_{2n-k}\, v &= S^{-1}\left[(J_n S^{-\top} Q S^{-1})^k + a_{2n-1} (J_n S^{-\top} Q S^{-1})^{k-1} + \cdots + a_{2n-k} I_{2n}\right] v \\ &= S^{-1}\left[(S J_n Q S^{-1})^k + a_{2n-1} (S J_n Q S^{-1})^{k-1} + \cdots + a_{2n-k} I_{2n}\right] v \\ &= \left[(J_n Q)^k + a_{2n-1} (J_n Q)^{k-1} + \cdots + a_{2n-k} I_{2n}\right] B. \end{aligned}$$

Therefore, $L$ can be transformed by elementary column operations into the controllability matrix of (5), and hence $L$ being invertible, that is, the two systems being isomorphic, is equivalent to the controllability matrix of (5) having full rank (regardless of the choice of $S \in \mathrm{Sp}(2n, \mathbb{R})$), which is again equivalent to (5) being canonical. Additionally, the condition for $f^{(d,v)}_S$ to be invertible can also be formulated in terms of $D$ and $v$ directly, which we will discuss in Subsection 4.4.

(ii) Systems in $CH_n$ are by construction in controllable canonical form, and are therefore always controllable. If the image system (5) by $\varphi_S$ that we want to learn is controllable (or equivalently, observable), then by the previous point $L$ is necessarily an invertible matrix, which means that (12) and (5) are isomorphic systems by construction. As a consequence, (12) is not only controllable but also observable.

Remark 10 (Application to structure-preserving system learning) As a corollary of the previous result, we can use controllable Hamiltonian representations to learn port-Hamiltonian systems in an efficient and structure-preserving fashion. Indeed, given a realization of a port-Hamiltonian system, a system of the type $\theta_{CH_n}(d, v) \in CH_n$ can be estimated using an appropriate loss (see Section 7). A representation of this type is more advantageous than the original port-Hamiltonian one for two reasons:

(i) The model complexity of the controllable Hamiltonian representation is only of order $O(n)$, as opposed to $O(n^2)$ for the original port-Hamiltonian one.

(ii) This learning scheme is automatically structure-preserving.
Indeed, once a system $\theta_{CH_n}(d, v) \in CH_n$ has been estimated for a given realization, we have shown that there exists a family of linear morphisms, each of which is between the state space of $\theta_{CH_n}(d, v) \in CH_n$ and that of some $\theta_{PH_n}(Q, B) \in PH_n$, such that any solution of (12) is automatically a solution of some system in $PH_n$. Hence, even in the presence of estimation errors for $(d, v) \in \Theta_{CH_n}$, the solutions of $\theta_{CH_n}(d, v)$ still correspond to a port-Hamiltonian system, and hence this structure is preserved by the learning scheme.

Remark 11 (System learning and expressive power) Expressive power is an important property of any machine learning paradigm. As a continuation of the previous remarks, we emphasize that there is an important relation between the controllability of a system in $PH_n$ and the expressive power of the corresponding representation in $CH_n$. Indeed, if (5) is controllable, by point (ii) in Remark 9, the corresponding preimage system $\theta_{CH_n}(d, v) \in CH_n$ can capture all possible solutions of (5), which amounts to the learning scheme based on $\Theta_{CH_n}$ having full expressive power. To see this, let $z_0$ be an initial state of the controllable system $\theta_{PH_n}(Q, B) \in PH_n$ in (5). Since in that case we can find an invertible system isomorphism $f^{(d,v)}_S$ that links it to some $\theta_{CH_n}(d, v) \in CH_n$, there exists a corresponding initial state $s_0 = (f^{(d,v)}_S)^{-1}(z_0)$. Then, by Theorem 7 and the uniqueness of the solutions of ODEs, the solution of (12) with initial state $s_0$ is a representation of the solution of (5) with initial state $z_0$. However, if (5) fails to be controllable (that is, $f^{(d,v)}_S$ is not invertible), then such an initial condition $s_0$ may not exist. As a rule of thumb, the more controllable a system of the type (5) is, the higher the rank of $f^{(d,v)}_S$ is, and the more expressive the corresponding controllable Hamiltonian representations are.
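As a numerical illustration of Remarks 10 and 11, the sketch below builds a port-Hamiltonian system $(Q, B)$ from a pair $(d, v)$ and a symplectic matrix $S$, and checks that its transfer function $B^\top Q (sI - JQ)^{-1} B$ depends on the parameters only through $d$ and the sums $v_l^2 + v_{n+l}^2$. The partial-fraction expression used in `H_dv` follows from the Williamson form by a direct computation; it is stated here as our own sanity check. The particular way of generating $S$ (a product of two elementary symplectic factors) and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
I, Z = np.eye(n), np.zeros((n, n))
J = np.block([[Z, I], [-I, Z]])

# A symplectic S obtained from two elementary symplectic factors (illustrative choice).
A = np.eye(n) + 0.3 * np.tril(rng.normal(size=(n, n)), -1)   # unit lower triangular
C = rng.normal(size=(n, n))
C = C + C.T                                                  # symmetric
S = np.block([[A, Z], [Z, np.linalg.inv(A).T]]) @ np.block([[I, C], [Z, I]])

d = np.array([0.5, 1.3, 2.1])                   # symplectic eigenvalues
v = rng.normal(size=2 * n)
Q = S.T @ np.diag(np.concatenate([d, d])) @ S   # Williamson form of Q
B = np.linalg.solve(S, v)                       # so that v = S B


def H_ph(s):
    """Transfer function of the port-Hamiltonian system (Q, B)."""
    return B @ Q @ np.linalg.solve(s * np.eye(2 * n) - J @ Q, B)


def H_dv(s):
    """Same transfer function written in terms of d and v_l^2 + v_{n+l}^2 only."""
    w = v[:n] ** 2 + v[n:] ** 2
    return np.sum(d * w * s / (s ** 2 + d ** 2))


# The symplectic eigenvalues can be read off from the spectrum of JQ.
ev = np.linalg.eigvals(J @ Q)
d_recovered = np.sort(ev.imag[ev.imag > 1e-9])
```

In this sketch the $2n^2 + 3n$ entries of $(Q, B)$ collapse to $2n$ numbers as far as the input-output map from a zero initial state is concerned.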
Remark 12 (Expressive power and structure-preservation) We emphasize that systems in $OH_n$ always have full expressive power, which is guaranteed by the system morphism in Theorem 7. This implies that any input-output dynamics generated by the original port-Hamiltonian system will be captured by some of the observable Hamiltonian representations in the statement. However, unlike in the controllable case, the system morphism goes from $\theta_{PH_n}(Q, B) \in PH_n$ to $\theta_{OH_n}(d, v) \in OH_n$. Therefore, unless $(Q, B)$ is canonical, in which case the morphism becomes an isomorphism, we cannot, in general, assert the structure-preserving property of this representation.

Remark 13 (Positive semi-definite Hamiltonians) The above results can be easily generalized to positive semi-definite (PSD) Hamiltonians with the aid of the generalized Williamson's theorem in the references Son and Stykel (2022); Idel et al. (2017); Egusquiza and Parra-Rodriguez (2022) that we briefly discussed in Section 2.5. In general, the number of unknown parameters in the vector $d$ is doubled (because of the matrices $D_1$ and $D_2$ that appear in this case), and their relation with the coefficients $\{a_0, a_1, \ldots, a_{2n-1}\}$ has to be modified accordingly, that is,

$$\lambda^{2n} + \sum_{i=0}^{2n-1} a_i \lambda^i = (\lambda^2 + d_1 d_{n+1})(\lambda^2 + d_2 d_{n+2}) \cdots (\lambda^2 + d_n d_{2n}),$$

where some of the $d_i$'s could be $0$. The expression of $g^{ctr}_1(d)$ remains the same, whereas the expression of $g^{ctr}_2(d, v)$ becomes

$$c_{2k+1} = v^\top \begin{pmatrix} F_{k,0} & 0 \\ 0 & F_{k,1} \end{pmatrix} v, \qquad F_{k,p} = \mathrm{diag}(f_{1,p}, \ldots, f_{n,p}), \qquad f_{l,p} = d_{np+l} \sum_{\substack{1 \leq j_1 < \cdots < j_k \leq n \\ j_1, \ldots, j_k \neq l}} d_{j_1} d_{n+j_1} \cdots d_{j_k} d_{n+j_k}, \quad p \in \{0, 1\}.$$

4. Unique identification of linear port-Hamiltonian systems

[...]

Now, it is a natural question to ask what is the equivalence relation that corresponds to $\sim_{sys}$ on the parameter space $\Theta_{CH_n}$, and whether it is possible to explicitly characterize the quotient space $PH_n/\sim_{sys}$ on $\Theta_{CH_n}$ in a certain sense. All these questions are addressed step-by-step in the following subsections. In Subsection 4.1, we show that two canonical controllable Hamiltonian representations are filter-equivalent if and only if they are sys-equivalent.
In Subsection 4.2, we define an equivalence relation $\sim_\star$ on $\Theta_{CH_n}$ and we show that $PH_n/\sim_{sys} \simeq \Theta_{CH_n}/\sim_\star$ (see Theorem 22). In Subsection 4.3, we characterize the equivalence classes $PH_n/\sim_{sys}$ and $\Theta_{CH_n}/\sim_\star$ as Lie groupoid orbit spaces. In Subsection 4.4, we restrict our analysis exclusively to canonical port-Hamiltonian systems $PH^{can}_n$. We first show that the parameter subset $\Theta^{can}_{CH_n} \subset \Theta_{CH_n}$ that corresponds to $PH^{can}_n$ is open and dense in $\Theta_{CH_n}$, as it is determined by certain generic non-resonance and nondegeneracy conditions. If we define on $\Theta_{CH_n}$ the equivalence relation $\sim_{sys}$ of system automorphisms of the corresponding controllable/observable Hamiltonian representations (see Definition 17), then it can be proved that, restricted to the canonical subset $\Theta^{can}_{CH_n}$, the equivalence relation $\sim_\star$ coincides with $\sim_{sys}$, and hence

$$PH^{can}_n/\sim_{sys} \ \simeq\ \Theta^{can}_{CH_n}/\sim_\star \ \simeq\ \Theta^{can}_{CH_n}/\sim_{sys}.$$

In Subsection 4.5, we prove that restricting the above equivalence relations to canonical subsets allows us to characterize the corresponding quotients as orbit spaces with respect to a group action (as opposed to groupoids in the general unrestricted case), where the group is given by a semi-direct product $S_n \rtimes_\phi \mathbb{T}^n$ that will be specified in detail later on. Finally, in Subsection 4.6, we show that the orbit space $\Theta^{can}_{CH_n}/(S_n \rtimes_\phi \mathbb{T}^n)$ can be explicitly identified with the smooth manifold $\mathbb{R}^{n\uparrow}_+ \times \mathbb{R}^n_+$, where $\mathbb{R}^{n\uparrow}_+$ denotes the set of vectors with distinct positive entries in increasing order, and endowed with global Euclidean coordinates, and hence

$$PH^{can}_n/\sim_{sys} \ \simeq\ \Theta^{can}_{CH_n}/\sim_\star \ \simeq\ \Theta^{can}_{CH_n}/\sim_{filter} \ \simeq\ \Theta^{can}_{CH_n}/\sim_{sys} \ \simeq\ \Theta^{can}_{CH_n}/(S_n \rtimes_\phi \mathbb{T}^n) \ \simeq\ \mathbb{R}^{n\uparrow}_+ \times \mathbb{R}^n_+ .$$

Consequently, up to initializations, canonical port-Hamiltonian dynamics can be identified fully and explicitly in either the controllable or the observable Hamiltonian representations (12), and learned by estimating an initial state condition and a unique set of parameters in a smooth manifold that is obtained as a group orbit space.
4.1 The unique identification problem for filters in $PH_n$

In the context of model estimation/machine learning, we would like to characterize and identify the filters that constitute the elements in $PH_n$. In Section 2.1, we have seen that two systems that are system isomorphic and are initialized according to the isomorphism induce the same input-output dynamics, which indicates that these isomorphisms are redundancies/symmetries in $PH_n$. Our aim is to quotient out the symmetries given by system automorphisms and to investigate whether the quotient space uniquely identifies the filters in $PH_n$.

Definition 15 ($PH_n$ with equivalence relations $\sim_{sys}$ and $\sim_{filter}$)

(i) The fact that two systems $\theta_{PH_n}(Q_1, B_1)$ and $\theta_{PH_n}(Q_2, B_2)$ in $PH_n$ induce the same filter defines an equivalence relation in $PH_n$, which we denote by $(Q_1, B_1) \sim_{filter} (Q_2, B_2)$. Consequently, the set of filters induced by the systems in $PH_n$ is, by definition, the quotient $PH_n/\sim_{filter}$, which we call the unique identifiability space.

(ii) We observe that $\theta_{PH_n}(Q_1, B_1)$ and $\theta_{PH_n}(Q_2, B_2)$ in $PH_n$ are linearly system isomorphic according to Definition 1 if and only if there exists an invertible matrix $L$ such that

$$L J Q_1 = J Q_2 L, \qquad L B_1 = B_2, \qquad B_1^\top Q_1 = B_2^\top Q_2 L. \qquad (14)$$

It is straightforward to check that system isomorphisms determine an equivalence relation on $PH_n$. If $\theta_{PH_n}(Q_1, B_1)$ and $\theta_{PH_n}(Q_2, B_2)$ are system isomorphic, we write $(Q_1, B_1) \sim_{sys} (Q_2, B_2)$. We denote by $PH_n/\sim_{sys}$ the quotient space. The equivalence class in $PH_n/\sim_{sys}$ that contains the element $\theta_{PH_n}(Q, B)$ is denoted by $[Q, B] \in PH_n/\sim_{sys}$.

It is natural to ask about the relation between $PH_n/\sim_{sys}$ and $PH_n/\sim_{filter}$, and whether they are the same. However, in general, neither of the two equivalence relations $\sim_{sys}$ and $\sim_{filter}$ implies the other. To see that $\sim_{filter}$ does not imply $\sim_{sys}$, we note that, as Example 1 below shows, a filter in $PH_n$ can be realized by two elements in $PH_n$ that are not sys-equivalent, since filters identify exclusively the canonical part (that is, the minimal realization, see Kalman (1963)).
To see the other direction, that is, that $\sim_{sys}$ does not imply $\sim_{filter}$, we can simply consider the value of the filter at time $t = 0$ induced by two systems $(Q_1, B_1) \sim_{filter} (Q_2, B_2)$ from (7), which gives $B_1^\top Q_1 z_0 = B_2^\top Q_2 z_0$ for any $z_0$, leading to $B_1^\top Q_1 = B_2^\top Q_2$. On the other hand, we have seen in (14) that $(Q_1, B_1) \sim_{sys} (Q_2, B_2)$ only guarantees $B_1^\top Q_1 = B_2^\top Q_2 L$, but not $B_1^\top Q_1 = B_2^\top Q_2$, unless $L$ can be shown to be the identity matrix.

Example 1 Consider two systems $\theta_{PH_n}(Q_1, B_1), \theta_{PH_n}(Q_2, B_2) \in PH_n$ where

$$Q_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad Q_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}, \qquad B_1 = B_2 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$

Both systems induce the same filter

$$y(u, z_0)_t = \int_0^t \cos(t - s)\, u(s)\, ds + (\cos t, 0, \sin t, 0)\, z_0,$$

where $z_0$ is the initial state. However, these two systems cannot be system isomorphic, since by (14) in that case there would exist an invertible $L$ such that $L J Q_1 = J Q_2 L$, and hence $J Q_1$ would have the same set of eigenvalues as $J Q_2$, which is not the case.

We have seen from the above that, in general, $PH_n/\sim_{filter}$ and $PH_n/\sim_{sys}$ are different objects, and neither of the two relations implies the other. In practice, we are more interested in characterizing the former, which appears to be difficult due to issues that involve initialization consistency. Nevertheless, we can partially solve the problem by restricting to the generic subset of canonical port-Hamiltonian systems $PH^{can}_n$ and considering the controllable Hamiltonian representations $\theta_{CH_n}(\Theta^{can}_{CH_n})$ that are linked to them by system isomorphisms. On those representations, the two equivalence relations coincide exactly, that is, $\Theta^{can}_{CH_n}/\sim_{filter} = \Theta^{can}_{CH_n}/\sim_{sys}$, which we ultimately characterize in Section 4.5. We present rigorous definitions of these equivalence relations on the parameter space before we state our main result, Theorem 19.
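Example 1 can be checked numerically with the short sketch below. It assumes the normal form $\dot z = JQz + Bu$, $y = B^\top Q z$ with $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$ and $B_1 = B_2 = (1, 0, 0, 0)^\top$, and it compares the row vectors $B^\top Q (JQ)^k$ for the first few $k$, which are (up to factorials) the Taylor coefficients at $t = 0$ of the free response $B^\top Q e^{JQt}$. This is a finite sanity check rather than a proof; the function names are hypothetical.

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
J = np.block([[Z2, I2], [-I2, Z2]])
Q1 = np.eye(4)
Q2 = np.diag([1.0, 2.0, 1.0, 3.0])
B = np.array([1.0, 0.0, 0.0, 0.0])          # B1 = B2


def readout_rows(Q, K=8):
    """Rows B^T Q (JQ)^k, k = 0..K-1; multiplying by B gives Markov parameters."""
    A = J @ Q
    return np.array([B @ Q @ np.linalg.matrix_power(A, k) for k in range(K)])


def frequencies(Q):
    """Moduli of the (purely imaginary) eigenvalues of JQ, sorted."""
    return np.sort(np.abs(np.linalg.eigvals(J @ Q).imag))


same_filter = np.allclose(readout_rows(Q1), readout_rows(Q2))
same_spectrum = np.allclose(frequencies(Q1), frequencies(Q2))
```

Here `same_filter` compares the input-output data, while `same_spectrum` tests the necessary condition for a system isomorphism that is used in the example.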
Lemma 16 For $(d_1, v_1)$ and $(d_2, v_2) \in \Theta_{CH_n} = \Theta_{OH_n}$, $\theta_{CH_n}(d_1, v_1) \sim_{sys} \theta_{CH_n}(d_2, v_2)$ if and only if $\theta_{OH_n}(d_1, v_1) \sim_{sys} \theta_{OH_n}(d_2, v_2)$.

Proof The proof is basically a restatement of the fact that $g^{ctr}_1(d) = g^{obs}_1(d)^\top$ and $g^{ctr}_2(d, v) = g^{obs}_2(d, v)^\top$.

Definition 17 ($\Theta_{CH_n}$ with equivalence relations $\sim_{sys}$ and $\sim_{filter}$)

(i) We shall write $(d_1, v_1) \sim_{sys} (d_2, v_2)$ if $\theta_{CH_n}(d_1, v_1)$ and $\theta_{CH_n}(d_2, v_2)$ are system isomorphic for $(d_1, v_1), (d_2, v_2) \in \Theta_{CH_n}$. Note that system isomorphisms for controllable and observable Hamiltonian representations are indeed equivalent, as we showed in Lemma 16.

(ii) We shall write $(d_1, v_1) \sim_{filter} (d_2, v_2)$ if $\theta_{CH_n}(d_1, v_1)$ and $\theta_{CH_n}(d_2, v_2)$ induce the same filter for $(d_1, v_1), (d_2, v_2) \in \Theta_{CH_n}$. Note that, unlike $\sim_{sys}$, the relation $\sim_{filter}$ is defined specifically for $\Theta_{CH_n}$, and could be different if one replaces $\Theta_{CH_n}$ with $\Theta_{OH_n}$.

Proposition 18 Given $(d_1, v_1)$ and $(d_2, v_2)$ in $\Theta_{CH_n}$, then

(I) $\theta_{CH_n}(d_1, v_1) \sim_{sys} \theta_{CH_n}(d_2, v_2)$ if and only if $a_i(d_1) = a_i(d_2)$ and $c_i(d_1, v_1) = c_i(d_2, v_2)$ for all $i = 1, \ldots, n$. In other words, there exists a permutation matrix $P_\sigma \in M_n$ such that the following conditions hold true:

(i) $d_2 = P_\sigma d_1$,

(ii) $v_1^\top \begin{pmatrix} (F_1)_k & 0 \\ 0 & (F_1)_k \end{pmatrix} v_1 = v_2^\top \begin{pmatrix} (F_2)_k & 0 \\ 0 & (F_2)_k \end{pmatrix} v_2$, $\quad k = 0, \ldots, n-1$.

The matrices $(F_i)_k$ are the matrices $F_k$ introduced in Definition 5, evaluated at $d_i$.

(II) $\theta_{CH_n}(d_1, v_1) \sim_{filter} \theta_{CH_n}(d_2, v_2)$ if and only if $c_i(d_1, v_1) = c_i(d_2, v_2)$ and $e_i(d_1, v_1) = e_i(d_2, v_2)$ for all $i = 1, \ldots, n$, where the scalar functions $e_i$ are defined recursively as

$$\begin{aligned} e_1 &= c_1, \\ e_2 &= c_3 - a_{2n-2}\, e_1, \\ e_3 &= c_5 - a_{2n-2}\, e_2 - a_{2n-4}\, e_1, \\ &\ \ \vdots \\ e_n &= c_{2n-1} - a_{2n-2}\, e_{n-1} - a_{2n-4}\, e_{n-2} - \cdots - a_2\, e_1. \end{aligned}$$

Theorem 19 Given $(d_1, v_1)$ and $(d_2, v_2)$ in $\Theta^{can}_{CH_n}$, then $\theta_{CH_n}(d_1, v_1) \sim_{filter} \theta_{CH_n}(d_2, v_2)$ if and only if $\theta_{CH_n}(d_1, v_1) \sim_{sys} \theta_{CH_n}(d_2, v_2)$, that is, $\Theta^{can}_{CH_n}/\sim_{filter} = \Theta^{can}_{CH_n}/\sim_{sys}$.

Proof The first part of the statement immediately follows from Proposition 18 and the fact that $e_1 = c_1 \neq 0$, which is guaranteed by the fact that we are considering canonical systems; see the characterizations in Section 4.4.
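The recursion for the functions $e_i$ in Proposition 18 (II) can be sketched as follows. The code assumes the coefficient formulas of Definition 5 and uses the observation, obtained by expanding the transfer function $\sum_l d_l (v_l^2 + v_{n+l}^2)\, s / (s^2 + d_l^2)$ in powers of $1/s$, that $e_i$ should coincide with the odd Markov parameter $\sum_l d_l (v_l^2 + v_{n+l}^2)(-d_l^2)^{i-1}$. This closed form is our own sanity check and is not taken from the paper; the function names are hypothetical.

```python
import numpy as np


def e_from_recursion(d, v):
    """Compute e_1, ..., e_n from the a- and c-coefficients via the recursion."""
    d = np.asarray(d, dtype=float)
    v = np.asarray(v, dtype=float)
    n = d.size
    w = v[:n] ** 2 + v[n:] ** 2
    x_roots = -d ** 2                       # work in the variable x = lambda^2
    p = np.poly(x_roots)                    # p[j] = a_{2n-2j}, with p[0] = a_{2n} = 1
    q = np.zeros(n)
    for l in range(n):
        # coefficients of prod_{j != l} (x + d_j^2), weighted by d_l * w_l
        q += d[l] * w[l] * np.poly(np.delete(x_roots, l))
    # q[k] = c_{2k+1}
    e = np.zeros(n)
    for i in range(n):
        e[i] = q[i] - sum(p[j] * e[i - j] for j in range(1, i + 1))
    return e


def e_closed_form(d, v):
    """Odd Markov parameters sum_l d_l w_l (-d_l^2)^(i-1), i = 1..n."""
    d = np.asarray(d, dtype=float)
    v = np.asarray(v, dtype=float)
    n = d.size
    w = v[:n] ** 2 + v[n:] ** 2
    return np.array([np.sum(d * w * (-d ** 2) ** i) for i in range(n)])
```

For instance, both functions can be evaluated at $d = (1, 2)^\top$ and $v = (1, 0.5, -1, 2)^\top$ and compared with `np.allclose`.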
4.2 Equivalence classes of port-Hamiltonian systems by system isomorphisms

We have seen that $PH_n/\sim_{sys}$ is not the set of port-Hamiltonian filters due to the presence of non-canonical systems and possible initialization inconsistencies. However, it is still informative to study the quotient space $PH_n/\sim_{sys}$ because, when restricted to the canonical systems, $PH^{can}_n/\sim_{sys}$ uniquely identifies canonical port-Hamiltonian dynamics up to initializations. Furthermore, $PH^{can}_n/\sim_{sys}$ is isomorphic to $\Theta^{can}_{CH_n}/\sim_{filter}$. In other words, $PH^{can}_n/\sim_{sys}$ uniquely identifies the set of canonical controllable Hamiltonian representations $CH^{can}_n$. We shall make this point clearer in Sections 4.5 and 4.6.

In this section, we introduce a manageable characterization of the quotient space $PH_n/\sim_{sys}$ by using parameter spaces. First, motivated by Williamson's theorem, we consider the space $\Theta_{CH_n}$ defined before as the set of all pairs of the form $(d, v)$, where $d = (d_1, d_2, \ldots, d_n)^\top$ with $d_i > 0$, and $v = (v_1, v_2, \ldots, v_{2n})^\top \in \mathbb{R}^{2n}$. Inspired by the representation results, we now define an equivalence relation $\sim_\star$ on $\Theta_{CH_n}$ whose equivalence classes are denoted by $[d, v]_\star$. The importance of the next definition is that, as we shall prove in Theorem 22, the relation $\sim_\star$ on $\Theta_{CH_n}$ plays the same role as $\sim_{sys}$ on $PH_n$.

Definition 20 The pairs $(d_1, v_1)$ and $(d_2, v_2)$ in $\Theta_{CH_n}$ are $\sim_\star$-equivalent, that is, $(d_1, v_1) \sim_\star (d_2, v_2)$, if there exist a permutation matrix $P_\sigma \in M_n$ and an invertible matrix $A$ such that, for $D_i = \mathrm{diag}(d_i)$, $i \in \{1, 2\}$, and $P = \begin{pmatrix} P_\sigma & 0 \\ 0 & P_\sigma \end{pmatrix}$, the following conditions hold [...] (iv) $v_2 = P A v_1$.

Proposition 21 The relation $\sim_\star$ defined in Definition 20 is an equivalence relation on $\Theta_{CH_n}$.

In the next subsection, we shall give meaning to $\sim_\star$ in terms of groupoid orbits. Now, we aim to characterize the $\sim_{sys}$ equivalence relation on $PH_n$ as the equivalence relation $\sim_\star$ on the space $\Theta_{CH_n}$ of $(d, v)$-pairs, that is, we shall prove that $\Theta_{CH_n}/\sim_\star \ \simeq\ PH_n/\sim_{sys}$. This will be proved in three steps.
First, we show that for an arbitrary $S \in \mathrm{Sp}(2n, \mathbb{R})$, the map $\varphi_S$ defined in Theorem 7 composed with $\theta_{CH_n}$ is compatible with the equivalence relations $\sim_\star$ and $\sim_{sys}$, that is, $(d_1, v_1) \sim_\star (d_2, v_2)$ if and only if $\varphi_S(\theta_{CH_n}(d_1, v_1)) \sim_{sys} \varphi_S(\theta_{CH_n}(d_2, v_2))$. Then, we show that the unique map $\psi_S$ induced by $\varphi_S \circ \theta_{CH_n}$ on the quotient spaces does not depend on the choice of $S$, and hence the family of maps $\psi_S$ parameterized by $S \in \mathrm{Sp}(2n, \mathbb{R})$ induces a unique map $\Phi : \Theta_{CH_n}/\sim_\star \ \to PH_n/\sim_{sys}$, which is a homeomorphism.

Theorem 22 (Characterization of $PH_n/\sim_{sys}$ as $\Theta_{CH_n}/\sim_\star$) Given any arbitrary $S \in \mathrm{Sp}(2n, \mathbb{R})$, the map $\varphi_S \circ \theta_{CH_n}$ induces on the quotient spaces a map $\Phi : \Theta_{CH_n}/\sim_\star \ \to PH_n/\sim_{sys}$ which does not depend on $S \in \mathrm{Sp}(2n, \mathbb{R})$ and is given by

$$\Phi([d, v]_\star) = \left[ S^\top \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} S,\ S^{-1} v \right]_{sys},$$

where $D = \mathrm{diag}(d)$. Moreover, $\Phi$ is a homeomorphism with respect to the quotient topologies.

4.3 The quotient spaces as groupoid orbit spaces

Recall that, from a category theory point of view, a group can be seen as a category with a single object where all morphisms are invertible. Groupoids are a natural generalization of this notion and refer to categories with possibly more than one object, where again all morphisms are invertible (see Mackenzie (2005) for a comprehensive introduction). As is customary, groupoids will be denoted with the symbol $\alpha, \beta : G \rightrightarrows M$ (or simply $G \rightrightarrows M$), where $\alpha$ and $\beta$ are the target and the source maps, respectively. Given $m \in M$, the groupoid orbit that contains this point is given by $O_m = \alpha(\beta^{-1}(m)) \subset M$. The orbit space associated to $G \rightrightarrows M$ is denoted by $M/G$.

In this section, we provide an alternative point of view for Theorem 22 in terms of groupoid orbits. More precisely, we show first that the set of equivalence classes $PH_n/\sim_{sys}$ (resp. $\Theta_{CH_n}/\sim_\star$) is the orbit space $\Theta_{PH_n}/G_n$ (resp. $\Theta_{CH_n}/H_n$) of a groupoid $G_n \rightrightarrows \Theta_{PH_n}$ (resp. $H_n \rightrightarrows \Theta_{CH_n}$), which we construct in the following paragraphs. In a second step, we show that the statement in Theorem 22 is equivalent to saying that the orbit spaces $\Theta_{PH_n}/G_n$ and $\Theta_{CH_n}/H_n$ of the two groupoids coincide.
Definition 23

1. Let $G_n := \{(L, (Q, B)) \mid L \in GL(2n, \mathbb{R}),\ (Q, B) \in \Theta_{PH_n}$ such that (i) $J^\top L J Q L^{-1}$ is symmetric positive-definite, (ii) $B = J^\top L^\top J L B\}$.

2. Let the target and source maps $\alpha, \beta : G_n \to \Theta_{PH_n}$ be defined as $\alpha(L, (Q, B)) := (J^\top L J Q L^{-1}, LB)$ and $\beta(L, (Q, B)) := (Q, B)$.

3. Define the set of composable pairs as $G^{(2)}_n := \{((L_1, (Q_1, B_1)), (L_2, (Q_2, B_2))) \mid \beta((L_1, (Q_1, B_1))) = \alpha((L_2, (Q_2, B_2)))\}$.

4. Let the multiplication map $m : G^{(2)}_n \to G_n$ be defined as $m((L_1, (Q_1, B_1)), (L_2, (Q_2, B_2))) = (L_1 L_2, (Q_2, B_2))$.

5. Let the identity section $\varepsilon : \Theta_{PH_n} \to G_n$ be defined as $\varepsilon(Q, B) := (I_{2n}, (Q, B))$.

6. Let the inversion map $i : G_n \to G_n$ be defined as $i(L, (Q, B)) := (L^{-1}, (J^\top L J Q L^{-1}, LB))$.

Proposition 24 The definition above determines a Lie groupoid $G_n \rightrightarrows \Theta_{PH_n}$ with $G_n$ the total space, $\Theta_{PH_n}$ the base space, and structure maps $\alpha, \beta, m, \varepsilon, i$. We refer to $G_n \rightrightarrows \Theta_{PH_n}$ as the port-Hamiltonian groupoid. The orbit space of this groupoid $\Theta_{PH_n}/G_n$ coincides with $PH_n/\sim_{sys}$.

Definition 25

1. Let $H_n := \{((P_\sigma, A), (d, v)) \mid P_\sigma \in M_n$ is a permutation matrix, $A \in GL(2n, \mathbb{R})$, $(d, v) \in \Theta_{CH_n}$, such that (i) $A^\top$ [...] $A$ [...], where $D = \mathrm{diag}(d)\}$.

2. Let the target and source maps $\alpha, \beta : H_n \to \Theta_{CH_n}$ be defined as $\alpha((P_\sigma, A), (d, v)) := (d, v)$ and $\beta((P_\sigma, A), (d, v)) := (P_\sigma d, P A v)$, where $P = \begin{pmatrix} P_\sigma & 0 \\ 0 & P_\sigma \end{pmatrix}$.

3. Define the set of composable pairs as $H^{(2)}_n := \{(((P_{\sigma,1}, A_1), (d_1, v_1)), ((P_{\sigma,2}, A_2), (d_2, v_2))) \mid \beta((P_{\sigma,2}, A_2), (d_2, v_2)) = \alpha((P_{\sigma,1}, A_1), (d_1, v_1))\}$.

4. Let the multiplication map $m : H^{(2)}_n \to H_n$ be defined as $m(((P_{\sigma,1}, A_1), (d_1, v_1)), ((P_{\sigma,2}, A_2), (d_2, v_2))) = ((P_{\sigma,2} P_{\sigma,1},\ P^\top_{\sigma,1} A_2 P_{\sigma,1} A_1), (d_1, v_1))$.

5. Let the identity section $\varepsilon : \Theta_{CH_n} \to H_n$ be defined as $\varepsilon(d, v) := ((I_n, I_{2n}), (d, v))$.

6. Let the inversion map $i : H_n \to H_n$ be defined as $i((P_\sigma, A), (d, v)) := ((P^\top_\sigma,\ P_\sigma A^{-1} P^\top_\sigma), (P_\sigma d, P A v))$.

Proposition 26 The definition above determines a Lie groupoid $H_n \rightrightarrows \Theta_{CH_n}$ with $H_n$ the total space, $\Theta_{CH_n}$ the base space, and structure maps $\alpha, \beta, m, \varepsilon, i$. We refer to $H_n \rightrightarrows \Theta_{CH_n}$ as the reduced port-Hamiltonian groupoid.
The orbit space of this groupoid $\Theta_{CH_n}/H_n$ coincides with $\Theta_{CH_n}/\sim_\star$.

Theorem 22 can now be restated in terms of the elements that we just introduced.

Theorem 27 The orbit spaces of the Lie groupoids $G_n \rightrightarrows \Theta_{PH_n}$ and $H_n \rightrightarrows \Theta_{CH_n}$ are isomorphic.

4.4 Characterization of canonical port-Hamiltonian systems

In Subsections 4.2 and 4.3 we have provided a characterization of $PH_n/\sim_{sys}$ in terms of $\Theta_{CH_n}/\sim_\star$ and groupoid orbit spaces. Recall from Subsection 4.1 that the difficulty of the unique identifiability of filters in $PH_n$ comes from two sources: the possible presence of non-canonical systems, and the possible initialization inconsistency. We have shown in Subsection 4.1 that, by restricting to canonical systems, the filters induced by controllable Hamiltonian representations $CH^{can}_n$ can be uniquely identified, even though we still cannot do the same for $PH^{can}_n$. Hence, it is worth studying what the quotient spaces above look like when restricted to the subset that contains only canonical port-Hamiltonian systems. In this section, we take a step in that direction.

Recall that a port-Hamiltonian system in $PH_n$ of the form (5) is controllable (or equivalently, observable/canonical) if and only if

$$\det\left(B \mid JQB \mid \cdots \mid (JQ)^{2n-1} B\right) \neq 0. \qquad (16)$$

Using the Williamson decomposition of $Q$ into $D$ and $S$, and $v := SB$, this is equivalent to

$$\det\left(v \ \Big|\ J \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} v \ \Big|\ \cdots \ \Big|\ \left(J \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix}\right)^{2n-1} v\right) \neq 0. \qquad (17)$$

By definition, $PH^{can}_n$ (respectively, $\Theta^{can}_{CH_n}$) is the subset of $PH_n$ (respectively, $\Theta_{CH_n}$) made of systems that satisfy (16) (respectively, (17)). We now characterize the space of pairs $(d, v) \in \Theta_{CH_n}$ that correspond to canonical port-Hamiltonian systems in normal form. The calculation of the determinant in (17) shows that this condition holds if and only if the entries of $d$ are all different and $v_l^2 + v_{n+l}^2 > 0$ for $l \in \{1, \ldots, n\}$.

We shall refer to the statement on the entries of $d$ being all different as the non-resonance condition and to $v_l^2 + v_{n+l}^2 > 0$ for all $l \in \{1, \ldots, n\}$ as the nondegeneracy condition. There might be a concern about whether different choices of the matrix $S$ lead to different vectors $v$ and hence the notion of nondegeneracy would be ill-defined.
This is indeed not a problem since, as we show in Remark 28 below, once the non-resonance condition is assumed, different vectors $v$ are obtained by rotating the planes spanned by each pair of $l$-th and $(n+l)$-th entries, which preserves the value of $v_l^2 + v_{n+l}^2$. Thus, the nondegeneracy condition is actually based on the non-resonance condition.

Remark 28 (Williamson's decomposition in the canonical case) We have mentioned in Theorem 3 (iii) that two symplectic matrices $S$ and $S'$ that Williamson decompose the same $Q$ differ by a unitary matrix. We now note that for an element $Q$ that satisfies the non-resonance condition, $S$ and $S'$ do not differ by an arbitrary $U \in U(n)$ (see (11) for the definition of $U(n)$) but by a special one, $R$, that has the form

$$R = \begin{pmatrix} \cos\theta_1 & & & -\sin\theta_1 & & \\ & \ddots & & & \ddots & \\ & & \cos\theta_n & & & -\sin\theta_n \\ \sin\theta_1 & & & \cos\theta_1 & & \\ & \ddots & & & \ddots & \\ & & \sin\theta_n & & & \cos\theta_n \end{pmatrix}. \qquad (18)$$

This fact accounts for part of the symmetry that we shall spell out later on. The proof of this fact is purely computational: the assumption that the diagonal entries of $D$ are all positive and distinct, together with the fact that $U$ satisfies the equation $U^\top \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} U = \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix}$ and, at the same time, $U \in U(n) = SO(2n, \mathbb{R}) \cap \mathrm{Sp}(2n, \mathbb{R})$, guarantees the claim.

Remark 29 (Being canonical is a generic property) It is well-known that the set of canonical systems, as a subset of all linear systems, corresponds to a Zariski open set, which is open and dense in the usual topology (Tchoń (1983)). In particular, this also holds for linear port-Hamiltonian systems. Therefore, $PH^{can}_n$ is open and dense in $PH_n$. On the other hand, using the characterization provided above, it is clear that $\Theta^{can}_{CH_n}$ is also open and dense in $\Theta_{CH_n}$. The isomorphism in Theorem 22 naturally restricts to canonical subsets, that is, $PH^{can}_n/\sim_{sys} \ \simeq\ \Theta^{can}_{CH_n}/\sim_\star$.
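The non-resonance and nondegeneracy conditions can be illustrated numerically by computing the rank of the $(d, v)$-form of the controllability matrix, that is, the matrix with columns $\left(J \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix}\right)^k v$ for $k = 0, \ldots, 2n-1$. The sketch below is a minimal example under the assumption $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$; the helper name `ctrb_dv` and the test values are hypothetical.

```python
import numpy as np


def ctrb_dv(d, v):
    """Matrix with columns (J diag(D, D))^k v, k = 0, ..., 2n-1."""
    d = np.asarray(d, dtype=float)
    v = np.asarray(v, dtype=float)
    n = d.size
    D, Z = np.diag(d), np.zeros((n, n))
    A = np.block([[Z, D], [-D, Z]])         # J diag(D, D)
    cols = [v]
    for _ in range(2 * n - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)


rank = np.linalg.matrix_rank
# Distinct d and nonzero radii v_l^2 + v_{n+l}^2: both conditions hold.
rank_generic = rank(ctrb_dv([1.0, 2.0], [1.0, 0.5, -1.0, 2.0]))
# Repeated symplectic eigenvalue: non-resonance fails.
rank_resonant = rank(ctrb_dv([1.0, 1.0], [1.0, 0.5, -1.0, 2.0]))
# v_2 = v_4 = 0: nondegeneracy fails for l = 2.
rank_degenerate = rank(ctrb_dv([1.0, 2.0], [1.0, 0.0, -1.0, 0.0]))
```

Full rank $2n$ is expected only in the first case; in the other two the rank drops, which corresponds to a non-canonical system.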
On the other hand, we will see below another isomorphism result involving $PH^{can}_n/\sim_{sys}$.

Proposition 30 (Characterization of $PH^{can}_n/\sim_{sys}$ as $\Theta^{can}_{CH_n}/\sim_{sys}$) The map $\Phi : \Theta^{can}_{CH_n}/\sim_{sys} \ \to PH^{can}_n/\sim_{sys}$ defined by

$$\Phi([d, v]_{sys}) = \left[ S^\top \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} S,\ S^{-1} v \right]_{sys}, \qquad D = \mathrm{diag}(d),$$

is an isomorphism.

We just proved that both $\Theta^{can}_{CH_n}/\sim_\star$ and $\Theta^{can}_{CH_n}/\sim_{sys}$ are isomorphic to $PH^{can}_n/\sim_{sys}$, and even via the same isomorphism $\Phi$. Therefore, the equivalence relations $\sim_\star$ and $\sim_{sys}$ coincide when restricted to $\Theta^{can}_{CH_n}$. To summarize, we have proved in this subsection that

$$PH^{can}_n/\sim_{sys} \ \simeq\ \Theta^{can}_{CH_n}/\sim_\star \ \simeq\ \Theta^{can}_{CH_n}/\sim_{sys}.$$

In the next subsection, we continue the investigation of the above chain of isomorphisms.

4.5 The unique identifiability space for canonical port-Hamiltonian systems as a group orbit space

In Subsection 4.3, it is proved that the quotient space $PH_n/\sim_{sys}$ can be treated as a Lie groupoid orbit space. We now show that the quotient space restricted to canonical port-Hamiltonian systems, that is, $PH^{can}_n/\sim_{sys}$, is isomorphic to the orbit space of a certain group action on $\Theta^{can}_{CH_n}$, where the group is a semi-direct product of the $n$-permutation group and the $n$-torus, that is, $S_n \rtimes_\phi \mathbb{T}^n$. The intuition behind this fact is that restricting to the subset of canonical systems $PH^{can}_n$ removes the degeneracies in $PH_n$, which allows us to reduce the symmetry of the Lie groupoid $G_n \rightrightarrows \Theta_{PH_n}$ to that of the Lie group $S_n \rtimes_\phi \mathbb{T}^n$.

We start by defining the group action. First, let the permutation group $S_n$ act on $\mathbb{R}^n$ by permuting the entries $d_i$ of the vector $d \in \mathbb{R}^n$. For each $i \in \{1, \ldots, n\}$, the circle $S^1$ acts on the plane spanned by the $i$-th and $(n+i)$-th entries of $v$ by rotations. More precisely, we define the action of $\sigma \in S_n$ on elements $d$ and $v$ as

$$\Gamma_\sigma (d_1, \ldots, d_n)^\top = (d_{\sigma(1)}, \ldots, d_{\sigma(n)})^\top = P_\sigma (d_1, \ldots, d_n)^\top,$$

where $P_\sigma$ is the corresponding permutation matrix, and

$$\Gamma_\sigma (v_1, \ldots, v_{2n})^\top = (v_{\sigma(1)}, \ldots, v_{\sigma(n)}, v_{n+\sigma(1)}, \ldots, v_{n+\sigma(n)})^\top = \begin{pmatrix} P_\sigma & 0 \\ 0 & P_\sigma \end{pmatrix} (v_1, \ldots, v_{2n})^\top,$$

respectively.
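Then the $\sigma$-action on a pair $(d, v)$ is understood as acting on $d$ and $v$ simultaneously. We also define the action of the $i$-th circle of the torus $\mathbb{T}^n$ as the planar rotation of the space spanned by the $i$-th and $(n+i)$-th entries of $v$. This torus action is understood to leave $d$ invariant. More concretely, it is the action

$$\Gamma_{\theta_i} (d_1, \ldots, d_n, v_1, \ldots, v_{2n})^\top = (d_1, \ldots, d_n, v_1, \ldots, v_{i-1}, \cos\theta_i\, v_i - \sin\theta_i\, v_{n+i}, v_{i+1}, \ldots, v_n, v_{n+1}, \ldots, v_{n+i-1}, \sin\theta_i\, v_i + \cos\theta_i\, v_{n+i}, v_{n+i+1}, \ldots, v_{2n})^\top .$$

With these actions of the groups $S_n$ and $\mathbb{T}^n$ on $\Theta_{CH_n}$ we define the map $\Gamma_{(\sigma, (\theta_1, \ldots, \theta_n)^\top)} : \Theta_{CH_n} \to \Theta_{CH_n}$ by

$$\Gamma_{(\sigma, (\theta_1, \ldots, \theta_n)^\top)}(d, v) = \Gamma_{\theta_1} \circ \cdots \circ \Gamma_{\theta_n} \circ \Gamma_\sigma (d, v) = (P_\sigma d,\ \Gamma_{\theta_1} \circ \cdots \circ \Gamma_{\theta_n}(P v)) = (P_\sigma d,\ R P v), \qquad (19)$$

which constitutes an action of the semi-direct product group $S_n \rtimes_\phi \mathbb{T}^n$, where $\phi : S_n \to \mathrm{Aut}(\mathbb{T}^n)$ is given by the permutation $\phi(\sigma)((\theta_1, \ldots, \theta_n)^\top) = P_\sigma (\theta_1, \ldots, \theta_n)^\top$. Note that the matrix of $\Gamma_{\theta_1} \circ \cdots \circ \Gamma_{\theta_n}$ is given by $R$ in (18), $P_\sigma$ is the permutation matrix that corresponds to $\sigma \in S_n$, and $P = \begin{pmatrix} P_\sigma & 0 \\ 0 & P_\sigma \end{pmatrix}$.

Proposition 31 The map $\Gamma_{(\sigma, (\theta_1, \ldots, \theta_n)^\top)}$ defined in (19) for $\sigma \in S_n$ and $(\theta_1, \ldots, \theta_n)^\top \in \mathbb{T}^n$ is a left group action of $S_n \rtimes_\phi \mathbb{T}^n$ on $\Theta_{CH_n}$.

Using the definition of the $(S_n \rtimes_\phi \mathbb{T}^n)$-action on $\Theta_{CH_n}$, two elements $(d_1, v_1), (d_2, v_2) \in \Theta_{CH_n}$ are in the same orbit if and only if the following conditions hold true for some $\sigma \in S_n$:

(i) $d_{2,i} = d_{1,\sigma(i)}$,

(ii) $v_{2,i}^2 + v_{2,n+i}^2 = v_{1,\sigma(i)}^2 + v_{1,n+\sigma(i)}^2$, $\quad i = 1, \ldots, n$.

By Proposition 18 (I) parts (i) and (ii), it can be seen that there could be a close relation between the $(S_n \rtimes_\phi \mathbb{T}^n)$-action and the equivalence relation $\sim_{sys}$ on $\Theta_{CH_n}$. The next proposition demonstrates that the orbits of the $(S_n \rtimes_\phi \mathbb{T}^n)$-action coincide with the equivalence classes of the relation $\sim_{sys}$ when we restrict our attention to the subset $\Theta^{can}_{CH_n}$.

Proposition 32 (Characterization of $\Theta^{can}_{CH_n}/\sim_{sys}$ as $\Theta^{can}_{CH_n}/(S_n \rtimes_\phi \mathbb{T}^n)$) Given $(d_1, v_1)$ and $(d_2, v_2)$ in $\Theta^{can}_{CH_n}$, then $(d_1, v_1) \sim_{sys} (d_2, v_2)$ if and only if $(d_1, v_1)$ and $(d_2, v_2)$ lie in the same orbit of the $(S_n \rtimes_\phi \mathbb{T}^n)$-action.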
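The group action (19) and the orbit conditions (i) and (ii) can be sketched as follows. The permutation is encoded as a 0-based index array `sigma`, so that `d[sigma]` plays the role of $P_\sigma d$; this encoding and the function names `act`, `radii`, and `orbit_invariant` are illustrative assumptions.

```python
import numpy as np


def act(d, v, sigma, theta):
    """Apply (sigma, theta): permute d and both halves of v, then rotate each plane."""
    d = np.asarray(d, dtype=float)
    v = np.asarray(v, dtype=float)
    sigma = np.asarray(sigma)
    theta = np.asarray(theta, dtype=float)
    n = d.size
    q, p = v[:n][sigma], v[n:][sigma]       # P v, written on the two halves of v
    c, s = np.cos(theta), np.sin(theta)
    return d[sigma], np.concatenate([c * q - s * p, s * q + c * p])   # (P_sigma d, R P v)


def radii(v):
    """Vector of v_l^2 + v_{n+l}^2."""
    v = np.asarray(v, dtype=float)
    n = v.size // 2
    return v[:n] ** 2 + v[n:] ** 2


def orbit_invariant(d, v):
    """Sorted d together with the correspondingly ordered radii."""
    d = np.asarray(d, dtype=float)
    order = np.argsort(d)
    return d[order], radii(v)[order]


d1 = np.array([2.0, 0.5, 1.3])
v1 = np.array([0.2, -1.0, 0.7, 1.1, 0.3, -0.4])
sigma = np.array([2, 0, 1])
theta = np.array([0.3, -1.2, 2.0])
d2, v2 = act(d1, v1, sigma, theta)
```

When the entries of $d$ are distinct, the pair returned by `orbit_invariant` does not change along an orbit, which is the mechanism behind the explicit coordinates on the orbit space.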
4.6 Global Euclidean coordinates for the unique identifiability space of canonical port-Hamiltonian systems

Recall from Section 4.4 that $\Theta^{can}_{CH_n}$ contains pairs $(d, v)$ where $d \in \mathbb{R}^n_+$ and $v \in \mathbb{R}^{2n}$ are such that the entries $d_l$ are all distinct and $v_l^2 + v_{n+l}^2 > 0$ for all $l = 1, \ldots, n$. We define for convenience a function $R : \mathbb{R}^{2n} \to \mathbb{R}^n_{\geq 0}$ as

$$R((v_1, \ldots, v_{2n})^\top) = (v_1^2 + v_{n+1}^2, \ldots, v_n^2 + v_{2n}^2)^\top .$$

Now observe that the quotient space $\Theta^{can}_{CH_n}/(S_n \rtimes_\phi \mathbb{T}^n)$ naturally has a smooth manifold structure. We briefly prove this in the following lines. Note that the torus $\mathbb{T}^n$ is a connected abelian compact Lie group. The symmetric group $S_n$ is a finite group, and hence compact as well. Thus, it is easy to see that the semi-direct product $S_n \rtimes_\phi \mathbb{T}^n$ is also a compact Lie group, and hence its action on $\Theta^{can}_{CH_n}$ is automatically proper. On the other hand, since $\Theta^{can}_{CH_n}$ is the space of $(d, v)$ pairs such that $d$ contains distinct entries and $R(v)(l) > 0$ for $l = 1, \ldots, n$, the only element in $S_n \rtimes_\phi \mathbb{T}^n$ that can leave any element in $\Theta^{can}_{CH_n}$ invariant is the identity, which implies that the $(S_n \rtimes_\phi \mathbb{T}^n)$-action on $\Theta^{can}_{CH_n}$ is free. Classical results in Lie theory (Ortega and Ratiu, 2004, Proposition 2.3.8) guarantee that $\Theta^{can}_{CH_n}/(S_n \rtimes_\phi \mathbb{T}^n)$ admits a unique smooth structure such that the quotient map $\pi : \Theta^{can}_{CH_n} \to \Theta^{can}_{CH_n}/(S_n \rtimes_\phi \mathbb{T}^n)$ is a submersion.

With this as a motivation, we now determine the quotient space explicitly. For a fixed $d$, we denote by $d^\uparrow$ the reordered vector constructed out of $d$ by placing the entries in increasing order. Denote by $\mathbb{R}^{n\uparrow}_+$ the set of $d \in \mathbb{R}^n_+$ with distinct positive entries in increasing order. We have then the following proposition that explicitly characterizes the quotient space $\Theta^{can}_{CH_n}/(S_n \rtimes_\phi \mathbb{T}^n)$.

Proposition 33 (Global Euclidean coordinates for the orbit space $\Theta^{can}_{CH_n}/(S_n \rtimes_\phi \mathbb{T}^n)$) The map $f : \Theta^{can}_{CH_n}/(S_n \rtimes_\phi \mathbb{T}^n) \to \mathbb{R}^{n\uparrow}_+ \times \mathbb{R}^n_+$ defined by $f([d, v]) = (d^\uparrow, R(\Gamma_\sigma(v)))$, where $\sigma \in S_n$ is the unique permutation such that $\Gamma_\sigma(d) = d^\uparrow$, is an isomorphism.

5.
Linear port-Hamiltonian systems in normal form are restrictions of higher dimensional ones

In this section, we prove a theorem (Theorem 34), inspired by the classical Kalman Decomposition (Jacob and Zwart (2012)), which says that the filter induced by any $(Q, B) \in PH_n$ can be regarded as that induced by some $(Q', B') \in PH_m$, where $m$ can be any integer that is at least $n$. The motivation for these considerations is that, in many practical situations in which an input/output system has to be learned, the dimension of the underlying state-space system is not known. In that situation, we may want the flexibility of considering the actual system that needs to be learned as a lower-dimensional restriction of a much larger-dimensional one that we have picked for the learning task.

We shall carry this out by producing an explicit injective system morphism between the state space of $(Q, B)$ and that of $(Q', B')$ in Theorem 34. In Proposition 35, we show that the quotient space $PH_n/\sim_{sys}$ can be characterized as $PH_{m,n}/\sim_{sys}$, where $PH_{m,n} \subset PH_m$ is the space containing all the systems of the form $(Q', B')$. Motivated by the developments in Section 4, we then characterize the pair $(d', v')$ that corresponds to $(Q', B')$ in Proposition 36. Finally, in Proposition 37, we show that the isomorphism $PH_n/\sim_{sys} \cong \Theta_{CH_n}/\sim$ can be lifted to higher dimensions as well. We shall comment further at the end of this section on the significance of these results in the context of machine learning.

The following theorem states that the filter induced by $(Q, B) \in PH_n$ can be reproduced using systems in an arbitrarily higher dimension.
Theorem 34 Given any system $(Q, B) \in PH_n$, then

(i) For any $m \geq n$, there exists an orthogonal matrix $O \in O(2m, \mathbb{R})$ such that the filter induced by

$$(Q', B') = \left( O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top,\ O \begin{pmatrix} B \\ 0 \end{pmatrix} \right) \in PH_m$$

coincides with that induced by $(Q, B)$.

(ii) The map $f : \mathbb{R}^{2n} \to \mathbb{R}^{2m}$ defined by $f(z) = O \begin{pmatrix} z \\ 0 \end{pmatrix}$ is an injective system morphism between the state spaces of $(Q, B)$ and $(Q', B')$.

As can be seen in the proof (included in Appendix 9.10), the matrix $O \in O(2m, \mathbb{R})$ above is constructed so that

$$O \begin{pmatrix} J_n & 0 \\ 0 & J_{m-n} \end{pmatrix} O^\top = J_m. \tag{20}$$

From now on, we denote by $PH_{m,n} \subset PH_m$ the space of linear port-Hamiltonian systems parametrized by pairs $(Q', B')$ of the form

$$(Q', B') = \left( O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top,\ O \begin{pmatrix} B \\ 0 \end{pmatrix} \right),$$

where $(Q, B) \in PH_n$ and $O \in O(2m, \mathbb{R})$ satisfies (20), and equip it with the system automorphism relation $\sim_{sys}$ defined on $PH_m$. The following proposition states that, up to system isomorphism, $PH_n$ is indeed the same as $PH_{m,n}$. This means that, with appropriate initialization, we can exactly reproduce the input/output dynamics of $2n$-dimensional port-Hamiltonian systems in higher dimension by simply considering the elements $(Q', B')$ in $PH_{m,n}$.

Proposition 35 The function $f : PH_n/\sim_{sys} \to PH_{m,n}/\sim_{sys}$ defined by

$$f([Q, B]_{sys}) = \left[ O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top,\ O \begin{pmatrix} B \\ 0 \end{pmatrix} \right]_{sys}$$

is an isomorphism, where $O \in O(2m, \mathbb{R})$ is as in Theorem 34 and hence satisfies (20).

Recall that for a system $(Q, B) \in PH_n$, we derive the corresponding object $(d, v) \in \Theta_{CH_n}$ from Williamson's decomposition $Q = S^\top \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix} S$, with $D = \mathrm{diag}(d)$, and $v = S B$. We have seen that $(Q', B') \in PH_{m,n} \subset PH_m$ is also a linear port-Hamiltonian system in normal form. Therefore, it makes sense to investigate the relation between $(d, v)$ and the element $(d', v')$ which corresponds to $(Q', B')$. The following proposition asserts that $d'$ can be obtained from $d$ by padding it with ones and, similarly, $v'$ can be obtained by splitting $v$ and padding each segment with zeros.

Proposition 36 (Symplectic eigenvalues of the higher dimensional system) Let $(Q, B)$ and $(Q', B')$ be as in Theorem 34, and let $d$ and $d'$ be their corresponding symplectic eigenvalues. Then, up to reordering, $d' = (d_1, \ldots, d_n, 1, 1, \ldots, 1)^\top$.
Even though $v$ and $v'$ are not uniquely determined (see Remark 8), there exists a choice of $v'$ that is related to $v = (v_1, \ldots, v_n, v_{n+1}, \ldots, v_{2n})^\top$ via

$$v' = (v_1, \ldots, v_n, 0, \ldots, 0, v_{n+1}, \ldots, v_{2n}, 0, \ldots, 0)^\top.$$

In view of the above proposition, we call $d'$ the extended symplectic eigenvalues and $v'$ the extended vector. Now we define the space $\Theta_{CH_{m,n}}$ as the set of all pairs of the form $(d', v')$ and equip $\Theta_{CH_{m,n}}$ with the equivalence relation $\sim$ as in Definition 20 but in dimension $m$ instead of $n$. Recall that we proved $\Theta_{CH_n}/\sim\ \cong PH_n/\sim_{sys}$. Now we proceed to show that this isomorphism in dimension $2n$ can be lifted to dimension $2m$ by considering only the restricted parameter spaces with elements of the form $(d', v')$ and $(Q', B')$.

Proposition 37 The function $f : \Theta_{CH_{m,n}}/\sim\ \to PH_{m,n}/\sim_{sys}$ defined by

$$f([d', v']_{\sim}) = \left[ \begin{pmatrix} D & 0 \\ 0 & D \end{pmatrix},\ v' \right]_{sys},$$

where $D = \mathrm{diag}(d')$, is an isomorphism.

Note that in general $d'$ contains repeated symplectic eigenvalues because of all the ones used in the extension and that $v_l'^2 + v_{m+l}'^2 = 0$ for $l > n$. Therefore, it is impossible that $\Theta_{CH_{m,n}}$ contains canonical systems for $m > n$. In other words, lifting $PH_n$ to $PH_{m,n}$ introduces degeneracies that exclude the possibility of the systems being canonical.

We emphasize that the above series of results is crucial in machine learning applications. Very often in practice, the dimension $2n$ of the underlying data-generating process, that is, the latent port-Hamiltonian system (5), is not known, which causes a problem when choosing the dimension of the controllable/observable Hamiltonian representation for learning. This issue can be solved by composing the morphism in Theorem 34 (ii) (which is injective) and the one in Theorem 7 (not necessarily injective). The composition of system morphisms is still a system morphism, this time between the underlying system $\theta_{PH_n}(Q, B)$ and the observable Hamiltonian representation in an arbitrarily higher dimension $2m \geq 2n$.
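The lifting construction of Theorem 34 and the padding recipe of Proposition 36 can be illustrated numerically. The sketch below makes several assumptions that are not stated in this section: the normal form is taken to be $\dot z = J_n Q z + B u$, $y = B^\top Q z$ (as suggested by the morphism conditions in Appendix 9.1), $J_n$ is taken with the sign convention $\begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$, and $O$ is chosen as one particular permutation matrix satisfying (20). All function names are hypothetical.

```python
import numpy as np

def J(k):
    """Canonical symplectic matrix of size 2k x 2k (sign convention assumed)."""
    I, Z = np.eye(k), np.zeros((k, k))
    return np.block([[Z, I], [-I, Z]])

def lift_permutation(n, m):
    """A permutation matrix O with O diag(J_n, J_{m-n}) O^T = J_m.

    It sends the ordering (q_1..q_n, p_1..p_n, extra q's, extra p's)
    to the ordering (all q's, all p's).
    """
    k = m - n
    new_index = np.concatenate([
        np.arange(n),          # q_1..q_n   -> slots 0..n-1
        m + np.arange(n),      # p_1..p_n   -> slots m..m+n-1
        n + np.arange(k),      # extra q's  -> slots n..m-1
        m + n + np.arange(k),  # extra p's  -> slots m+n..2m-1
    ])
    O = np.zeros((2 * m, 2 * m))
    O[new_index, np.arange(2 * m)] = 1.0
    return O

def lift_system(Q, B, m):
    """Return (Q', B', O) built as in Theorem 34 from a 2n-dimensional (Q, B)."""
    n = Q.shape[0] // 2
    extra = 2 * (m - n)
    O = lift_permutation(n, m)
    Q_big = np.block([[Q, np.zeros((2 * n, extra))],
                      [np.zeros((extra, 2 * n)), np.eye(extra)]])
    B_big = np.concatenate([B, np.zeros(extra)])
    return O @ Q_big @ O.T, O @ B_big, O

def pad_parameters(d, v, m):
    """Padding recipe of Proposition 36: ones for d, zeros in each half of v."""
    n = d.size
    zeros = np.zeros(m - n)
    return (np.concatenate([d, np.ones(m - n)]),
            np.concatenate([v[:n], zeros, v[n:], zeros]))

def markov_parameters(Q, B, count):
    """First impulse-response coefficients B^T Q (J Q)^k B of the assumed normal form."""
    A = J(Q.shape[0] // 2) @ Q
    out, x = [], B.copy()
    for _ in range(count):
        out.append(B @ Q @ x)
        x = A @ x
    return np.array(out)
```

Two linear systems started at zero induce the same filter when their Markov parameters agree, so comparing `markov_parameters(Q, B, k)` with those of the lifted pair gives a quick sanity check of part (i) for a given example.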
In this way, the observable Hamiltonian representations in dimension $2m$ still have full expressive power to represent any $2n$-dimensional system in $PH_n$, and hence can be used for learning. Practically, one can choose a sufficiently large $m$, parameterize the observable Hamiltonian representation using $(d, v)$ (we use the notation $(d, v)$ instead of $(d', v')$ because in practice we do not know what $n$ is), and then estimate these parameters. We emphasize that the higher-dimensional port-Hamiltonian systems are in general not canonical, hence the $(d, v)$-pair that corresponds to the data-generating process is not guaranteed to be unique. Still, we always know there is at least one choice of $(d, v)$ that works no matter how large an $m$ we choose, and which is constructed using the recipe in Proposition 36.

6. Practical implementation of the results

We start with a diagram that summarizes the results that we have proved.

Theorem 38 The following diagram holds true using the isomorphisms explicitly constructed in all the preceding results. We denote the inclusion between one set and the other by a one-directional arrow.

$$\Theta_{CH_{m,n}}/\sim\ \ \cong\ \ PH_{m,n}/\sim_{sys}$$

$$\Theta_{CH_n}/H_n\ \ \cong\ \ \Theta_{CH_n}/\sim\ \ \cong\ \ PH_n/\sim_{sys}\ \ \cong\ \ \Theta_{PH_n}/G_n$$

$$\Theta^{can}_{CH_n}/(S_n \ltimes_\phi \mathbb{T}^n)\ \ \cong\ \ \Theta^{can}_{CH_n}/\sim_{sys}\ \ \cong\ \ \Theta^{can}_{CH_n}/\sim_{filter}$$

We now comment on how to use the results contained in the diagram above depending on the different learning situations that we may encounter. Indeed, we can use our statements to tackle three different learning scenarios:

Case 1: The target port-Hamiltonian system (the data generating process that we want to learn) is canonical and its state-space dimension is known, that is, $\theta_{PH_n}(Q, B) \in PH^{can}_n$ with $n$ known. This is the most favorable situation in the sense that we can exactly represent the system $\theta_{PH_n}(Q, B)$ by either the controllable or the observable Hamiltonian representations, which are both isomorphic to the original system.
Furthermore, since in this case the input/output map can be uniquely identified by properly setting up the initialization, it can be learned by estimating an initial state condition of the representation used and the unique parameters in $\mathbb{R}^{n\uparrow}_+ \times \mathbb{R}^n_+$.

Case 2: The target port-Hamiltonian system is not guaranteed to be canonical but its dimension is known, that is, $\theta_{PH_n}(Q, B) \in PH_n$ with $n$ known. In this case, there is a trade-off between the controllable Hamiltonian representation and the observable one. As mentioned before, the controllable one will be structure-preserving but its expressive power depends on the controllability of the target system $\theta_{PH_n}(Q, B)$. On the other hand, the observable one always possesses full expressive power but does not always guarantee the port-Hamiltonian structure of the induced filter.

Case 3: We are agnostic about the dimension of the target port-Hamiltonian system, that is, we are given $\theta_{PH_n}(Q, B) \in PH_n$ with $n$ unknown. In this case, we need to choose a sufficiently large $m$ so that $m \geq n$. Then, based on the composition of system morphisms, it suffices to learn some $(d, v) \in \Theta_{CH_m}$ and use the $2m$-dimensional observable Hamiltonian representation to reproduce the input-output dynamics of $(Q, B)$. Due to the loss of the canonical property, such a $(d, v)$ pair may not be unique. Additionally, we do not know the dimension $2n$ of the data generating process, and hence we do not know how many ones should be padded into $d$ (and similarly, how many zeros are padded into the vector $v$). However, we do know that an element $(d, v)$ that captures the input/output dynamics exists in some $\Theta_{CH_{m,n}} \subset \Theta_{CH_m}$, as given by Proposition 36.

An important special case is when there is no input to the port-Hamiltonian system, that is, $u(t) = 0$. In this case, the port-Hamiltonian system reduces to a linear Hamiltonian system with an arbitrary linear readout matrix.
We emphasize that the readout of the observable Hamiltonian representation in a higher dimension is totally independent of $B$ since it is simply given by

$$y = (0, 0, \ldots, 0, 1)\, s. \tag{21}$$

In other words, Hamiltonian systems with linear readout can be learned by adjusting the initial state $s_0$ and the symplectic eigenvalues $d_i$, without even knowing the linear readout function that yields the observations.

7. Numerical illustrations

In this section, we present two numerical examples to demonstrate the effectiveness of our representation results from a learning point of view.

7.1 Non-dissipative circuit

Similar to an example in Medianu and Lefevre (2021), we consider a circuit consisting of a power source with voltage $V = u(t)$, together with five parallel branches, each of them containing a capacitor $C_i$ with charge $Q_i$ and an inductor $L_i$ with magnetic flux linkage $\varphi_i$ for $i = 1, \ldots, 5$ (see Figure 1). Using Kirchhoff's laws, we obtain a port-Hamiltonian system in normal form, given by equations (22) and (23), where the Hamiltonian of the system is

$$H(Q_1, \ldots, Q_5, \varphi_1, \ldots, \varphi_5) = \sum_{i=1}^{5} \left( \frac{Q_i^2}{2 C_i} + \frac{\varphi_i^2}{2 L_i} \right).$$

Figure 1: Lossless circuit port-Hamiltonian system

This port-Hamiltonian system treats the power supply $V = u$ as input and the current through the power supply, that is $y$, as output. One verifies that such a system is noncanonical. Our purpose is to learn the input-output behavior of this system without any access to the internal physical state, training only with input-output observations. In our implementation, we choose for simplicity $C_i = 1$ and $L_i = 1$ for $i = 1, \ldots, 5$. We choose to learn with a 10-dimensional observable Hamiltonian representation to show that the dynamics can be captured even in the non-canonical case. (Indeed, with our choice of the $C_i$ and $L_i$, the system is readily checked to be noncanonical.)
We randomly generate an initial condition for the ground-truth system and integrate it using Euler's method (see Appendix 9.14 for more sophisticated structure-preserving integration methods) with a discretization step of 0.01 for 1000 time steps. The input is chosen as $u(t) = \sin(t)$. The 1000 pairs of input and output data are used as training data. During the training phase, we estimate the initial state $x \in \mathbb{R}^{10}$ as well as the parameters $d \in \mathbb{R}^5_+$ and $v \in \mathbb{R}^{10}$. This is carried out via gradient descent using a learning rate of $\lambda = 0.1$ for 500 epochs. At each gradient descent iteration we integrate the state-space equations corresponding to the current parameter values over 1000 time steps with Euler's method and then compute the squared error with respect to the training set. We set a testing period of 4000 time steps and demonstrate the robustness of our approach by testing our trained model not only on the original input $u(t) = \sin(t)$ but also on three other commonly used input signals (see Figures 2, 3, 4, and 5). The numerical experiments provide a strong indication that the underlying system is learned independently of the input signal and is robust with respect to various forms of inputs.

Figure 2: Training and testing on a sinusoidal signal. Panels: (a) input signal $u(t)$; (b) output $y(t)$.

Figure 3: Testing on a constant signal. The training had been carried out using a sinusoidal signal. See Figure 2. Panels: (a) input signal $u(t)$; (b) output $y(t)$.

7.2 Positive definite Frenkel-Kontorova model

As a second example, we consider a modification of the well-known Frenkel-Kontorova model such that it becomes a linear port-Hamiltonian system with a positive-definite Hamiltonian function. Recall that the general form of the Frenkel-Kontorova model describes the motion of classical particles with nearest neighbor interactions using periodic potentials.
Figure 4: Testing on a square signal. The training had been carried out using a sinusoidal signal. See Figure 2. Panels: (a) input signal $u(t)$; (b) output $y(t)$.

Figure 5: Testing on a ramp signal. The training had been carried out using a sinusoidal signal. See Figure 2. Panels: (a) input signal $u(t)$; (b) output $y(t)$.

The Hamiltonian function can be written as

$$H = \sum_n \left[ \frac{p_n^2}{2} + (1 - \cos q_n) + \frac{1}{2} g\, (q_{n+1} - q_n - a_0)^2 \right].$$

Since we are dealing with linear systems, we remove the periodic potential and rescale the potential coefficient. By fixing $a_0 = 0$, we obtain the Hamiltonian

$$H = \frac{1}{2} \sum_n \left[ p_n^2 + (q_{n+1} - q_n)^2 \right].$$

In order to consider a Hamiltonian that is strictly positive definite, we add a term $\frac{1}{2} q_1^2$ to the Hamiltonian, which carries the physical meaning that the particle $q_1$ interacts with the origin via a spring. In summary, our model of interest now has the positive-definite Hamiltonian

$$H = \frac{1}{2} q_1^2 + \frac{1}{2} \sum_n \left[ p_n^2 + (q_{n+1} - q_n)^2 \right].$$

For the sake of simplicity, we consider a Hamiltonian system of this form with $N = 2$ unit mass particles (so that $p_i = \dot{q}_i$) and an external force $F = u$ that is imposed on the first particle. This gives the following linear port-Hamiltonian system in normal form, with the output being the velocity of the first particle:

$$\frac{d}{dt} \begin{pmatrix} q_1 \\ q_2 \\ p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} 0 & I_2 \\ -I_2 & 0 \end{pmatrix} \begin{pmatrix} \partial H / \partial q_1 \\ \partial H / \partial q_2 \\ \partial H / \partial p_1 \\ \partial H / \partial p_2 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} u, \qquad y = \frac{\partial H}{\partial p_1}.$$

In contrast to the first example, this system is canonical. Therefore, based on our theoretical results, any input-output dynamics can be captured by either a controllable or an observable Hamiltonian representation and, furthermore, it is possible to uniquely identify the system by learning an initial condition and the parameters in the quotient space $\mathbb{R}^{2\uparrow}_+ \times \mathbb{R}^2_+$. For the sake of numerical illustration, we choose the initial state condition $x = (2, 1, 3, 3)^\top$ for the ground-truth system and integrate it over 1000 time steps using Euler's method with a step of 0.01 (see Appendix 9.14 for more sophisticated structure-preserving integration methods), where the input is chosen as $u(t) = \sin(t)$.
The 1000 pairs of input and output data are then used as training data. As motivated above, we apply two different training mechanisms to learn the initial state condition and the parameter values of the model: one uses the natural parameters from $\Theta_{OH_n}$ of the observable Hamiltonian representation, and the other uses those in the unique identifiability space $\mathbb{R}^{2\uparrow}_+ \times \mathbb{R}^2_+$. As in the previous example, we carry out the training using gradient descent with a learning rate of $\lambda = 0.02$ over 1500 epochs, starting from randomly chosen initial values for the initial state condition and the model parameters in $\Theta_{CH_n}$ and $\mathbb{R}^{2\uparrow}_+ \times \mathbb{R}^2_+$. We record the validation error during the 1500 gradient descent iterations of both training mechanisms to compare their convergence rates. Heuristically, it should be expected that the rate of convergence is faster when the models are trained using the coordinates that provide unique identifiability. This is empirically confirmed in Figure 6 (indeed, unique identifiability provides exponentially faster convergence). After 1500 iterations, the prediction accuracy obtained when training in the unique identifiability space is significantly better than in the other setting, as can be seen in Figure 7. Moreover, we found that the learned parameters $d \in \mathbb{R}^2_+$ are exactly the same as the eigenvalues of the Hamiltonian matrix, which is theoretically guaranteed by the unique identifiability. It is worth emphasizing that, despite the difference in the convergence rates, both mechanisms eventually lead to perfect path continuations of the input-output dynamics after enough training iterations.
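The data-generation step described above (forward Euler with step 0.01 over 1000 steps and input $u(t) = \sin(t)$) can be sketched as follows. This is only an illustration under stated assumptions: the normal form is taken to be $\dot z = J_n Q z + B u$, $y = B^\top Q z$, and the matrix `Q` below is a hypothetical positive-definite example for two unit masses (first particle attached to the origin by a unit spring, unit coupling between the particles), not necessarily the matrix used in the experiments. The function name `simulate_euler` is likewise hypothetical.

```python
import numpy as np

def simulate_euler(Q, B, x0, u, dt=0.01, steps=1000):
    """Forward-Euler simulation of dz/dt = J Q z + B u(t), y = B^T Q z (assumed form)."""
    n = Q.shape[0] // 2
    I, Z = np.eye(n), np.zeros((n, n))
    A = np.block([[Z, I], [-I, Z]]) @ Q   # J_n Q, with an assumed sign convention
    x = np.asarray(x0, dtype=float).copy()
    inputs, outputs = np.empty(steps), np.empty(steps)
    for k in range(steps):
        inputs[k] = u(k * dt)
        outputs[k] = B @ Q @ x            # output recorded before the state update
        x = x + dt * (A @ x + B * inputs[k])
    return inputs, outputs

# Hypothetical Hamiltonian H = q1^2/2 + (q2 - q1)^2/2 + (p1^2 + p2^2)/2, state (q1, q2, p1, p2).
Q = np.array([[2.0, -1.0, 0.0, 0.0],
              [-1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
B = np.array([0.0, 0.0, 1.0, 0.0])        # force applied to the first particle
u_train, y_train = simulate_euler(Q, B, [2.0, 1.0, 3.0, 3.0], np.sin)
```

With this choice of `B`, the recorded output is the momentum (and hence the velocity) of the first particle. Forward Euler does not preserve the Hamiltonian structure, which is why structure-preserving integrators may be preferable for long horizons.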
Figure 6: Logarithm of validation errors of the two training mechanisms based on using the natural parameters of the observable representation and the unique identifiability space.

Figure 7: Training and testing performance of the two training mechanisms after 1500 gradient descent iterations based on using the natural parameters of the observable representation (panel (a)) and the unique identifiability space (panel (b)). Panels: (a) using the observable representation; (b) using the unique identifiability space.

8. Conclusions

In this paper, we have introduced a complete structure-preserving learning scheme for single-input/single-output (SISO) linear port-Hamiltonian systems. The construction is based on the solution, when possible, of the unique identification problem for these systems, in ways that reveal fundamental relationships between classical notions in control theory and crucial properties in the machine learning context, like structure-preservation and expressive power.

The main building block in our construction is a representation result that we introduced for linear port-Hamiltonian systems in normal form that provides two subfamilies of linear systems that are by construction controllable and observable (Definition 5). We showed that morphisms can be established between the elements in these families and those in the category of normal form port-Hamiltonian systems. The existence of these morphisms immediately guarantees that the complexity of a generic subset of the family of port-Hamiltonian filters is actually not $O(n^2)$, as could be guessed from the standard parametrization of this family, but $O(n)$. We showed that the expressive power of our proposed representations is limited for non-canonical port-Hamiltonian systems.
Indeed, we saw that the observable representation is guaranteed to capture all possible input-output dynamics of port-Hamiltonian systems (full expressive power), but it does not always produce port-Hamiltonian dynamics (fails to be structure-preserving). In the controllable case, structure preservation is guaranteed, but there is, in general, no full expressive power. For canonical port-Hamiltonian systems, these representations are both structure-preserving and have full expressive power. We saw that even in the canonical situation, the availability of the controllable/observable representations did not yet provide a well-specified learning problem for this category since the invariance of these systems under system automorphisms implies the existence of symmetries (or degeneracies) in those parametrizations. We tackled this problem by solving the unique identifiability of input-output dynamics of linear port-Hamiltonian systems in normal form up to initializations by characterizing the quotient space by system automorphisms as a Lie groupoid orbit space. Moreover, we showed that in the canonical case the corresponding quotient spaces can be characterized as orbit spaces with respect to an explicit group action and can be explicitly endowed with a smooth manifold structure that has global Euclidean coordinates that can be used at the time of constructing estimation algorithms. Consequently, we showed that canonical port-Hamiltonian dynamics can be identified fully and explicitly in either the controllable or the observable Hamiltonian representations and learned by estimating an initial state condition and a unique set of parameters in a smooth manifold obtained as a group orbit space. Additionally, we complemented this learning scheme with results that allow us to extend it to situations where we remain agnostic regarding the dimension of the underlying data-generating port-Hamiltonian system. 
We concluded the paper with some numerical examples that illustrate the viability of the method we propose in systems with various levels of complexity and dimensions and the computational advantages associated with using the parameter space in which unique identification is guaranteed.

Acknowledgments

The authors thank Lyudmila Grigoryeva for helpful discussions and remarks and acknowledge partial financial support from the Swiss National Science Foundation (grant number 175801/1) and the School of Physical and Mathematical Sciences of the Nanyang Technological University. DY is funded by the Nanyang President's Graduate Scholarship of Nanyang Technological University.

Glossary of Symbols

$\Theta_{CH_{m,n}}$: The space of parameters $(d', v')$ for $PH_{m,n}$
$\mathbb{R}^{n\uparrow}_+$: The set of $n$-tuples of distinct positive real numbers in increasing order
$\mathbb{T}^n$: The $n$-torus
$\mathcal{CH}_n/\mathcal{OH}_n$: The space of filters induced by $CH_n/OH_n$
$G_n \rightrightarrows \Theta_{PH_n}$: Port-Hamiltonian groupoid, see Proposition 24
$H_n \rightrightarrows \Theta_{CH_n}$: Reduced port-Hamiltonian groupoid, see Proposition 26
$\mathcal{PH}_n$: The space of input-output dynamics/filters induced by systems in $PH_n$
$\mathcal{PH}^{can}_n$: The space of input-output dynamics/filters induced by systems in $PH^{can}_n$
$\mathfrak{sp}(2n, \mathbb{R})$: Lie algebra of the symplectic group
$\sim$: An equivalence relation defined on $\Theta_{CH_n}$
$\sim_{filter}$: The equivalence relation of inducing the same filter
$\sim_{sys}$: The equivalence relation of system automorphism
$\Theta_{CH_n}/\Theta_{OH_n}$: The space of parameters $(d, v)$ for $CH_n$ and/or $OH_n$, which are the same
$\theta_{CH_n}/\theta_{OH_n}$: The map that sends parameters in $\Theta_{CH_n}/\Theta_{OH_n}$ to the corresponding state space system in $CH_n/OH_n$
$\Theta^{can}_{CH_n}$: The subset of $\Theta_{CH_n}$ that corresponds to canonical systems
$\Theta_{PH_n}$: The space of parameters $(Q, B)$ for $PH_n$
$\theta_{PH_n}$: The map that sends parameters in $\Theta_{PH_n}$ to the corresponding state space system in $PH_n$
$B$: Input matrix of a port-Hamiltonian system in normal form
$CH_n/OH_n$: The space of $2n$-dimensional controllable/observable Hamiltonian representations
$F : Z \times U \to Z$: State equation
$H : \mathbb{R}^{2n} \to \mathbb{R}$: Hamiltonian function
$PH_n$: The
space of $2n$-dimensional linear normal form port-Hamiltonian systems (5)
$PH^{can}_n$: The subspace of $PH_n$ consisting of canonical linear normal form port-Hamiltonian systems
$PH_{m,n}$: The subspace of $PH_m$ containing all $(Q', B') = \left( O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top,\ O \begin{pmatrix} B \\ 0 \end{pmatrix} \right)$, with $(Q, B) \in PH_n$ and $O \in O(2m, \mathbb{R})$
$Q$: Quadratic form that determines a linear Hamiltonian system
$S_n$: Permutation group of $n$ elements
$Sp(2n, \mathbb{R})$: Symplectic group
$J_n$: Canonical symplectic matrix

References

Ralph Abraham and Jerrold E. Marsden. Foundations of Mechanics. Addison-Wesley, Reading, MA, 2nd edition, 1978.

Ralph Abraham, Jerrold E. Marsden, and Tudor S. Ratiu. Manifolds, Tensor Analysis, and Applications, volume 75. Applied Mathematical Sciences. Springer-Verlag, 1988.

Beatrice Acciaio, Anastasis Kratsios, and Gudmund Pammer. Metric hypertransformers are universal adapted maps. arXiv preprint arXiv:2201.13094, 2022.

V. I. Arnold. Mathematical Methods of Classical Mechanics. Springer, 1989.

Coryn A L Bailer-Jones, David J C MacKay, and Philip J Withers. A recurrent neural network for modelling dynamical systems. Network: Computation in Neural Systems, 9(4):531-547, 1998.

Thomas Beckers, Jacob Seidman, Paris Perdikaris, and George J Pappas. Gaussian process port-Hamiltonian systems: Bayesian learning with physics prior. In 2022 IEEE 61st Conference on Decision and Control (CDC), pages 1447-1453. IEEE, 2022.

Jean-Michel Bismut. Mécanique aléatoire. Springer, 1982.

Roger W Brockett and Abdolhossein Rahimi. Lie algebras and linear differential equations. Ordinary Differential Equations, pages 379-386, 1972.

Elena Celledoni, Andrea Leone, Davide Murari, and Brynjulf Owren. Learning hamiltonians of constrained mechanical systems. Journal of Computational and Applied Mathematics, 417:114608, 2023.

Renyi Chen and Molei Tao. Data-driven prediction of general Hamiltonian dynamics via learning exactly-symplectic maps.
In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 1717-1727. PMLR, 2021.

Zhengdao Chen, Jianyu Zhang, Martin Arjovsky, and Léon Bottou. Symplectic recurrent neural networks. In International Conference on Learning Representations, 2020.

Karim Cherifi. An overview on recent machine learning techniques for Port Hamiltonian systems. Physica D: Nonlinear Phenomena, 411:132620, 2020.

Peter E. Crouch and Arjan van der Schaft. Variational and hamiltonian control systems.

G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303-314, dec 1989.

Maurice A De Gosson. Symplectic geometry and quantum mechanics, volume 166. Springer Science & Business Media, 2006.

Shaan A Desai, Marios Mattheakis, David Sondak, Pavlos Protopapas, and Stephen J Roberts. Port-Hamiltonian neural networks for learning explicit time-dependent dynamical systems. Physical Review E, 104(3):34312, 2021.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.0, 2018.

I L Egusquiza and A Parra-Rodriguez. Algebraic canonical quantization of lumped superconducting networks. Physical Review B, 106(2):24510, jul 2022.

Lukas Gonon and Juan-Pablo Ortega. Reservoir computing universality with stochastic inputs. IEEE Transactions on Neural Networks and Learning Systems, 31(1):100-112, 2020.

Lukas Gonon and Juan-Pablo Ortega. Fading memory echo state networks are universal. Neural Networks, 138:10-13, 2021.

Oscar Gonzalez. Time integration and discrete Hamiltonian systems. In Mechanics: from theory to computation, pages 257-275. Springer, 2000.

Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks.
In Advances in Neural Information Processing Systems, pages 15353-15363, 2019.

Lyudmila Grigoryeva and Juan-Pablo Ortega. Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems. Journal of Machine Learning Research, 19(24):1-40, 2018a.

Lyudmila Grigoryeva and Juan-Pablo Ortega. Echo state networks are universal. Neural Networks, 108:495-508, 2018b.

Lyudmila Grigoryeva and Juan-Pablo Ortega. Dimension reduction in recurrent networks by canonicalization. Journal of Geometric Mechanics, 13(4):647-677, 2021.

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Martin Idel, Sebastián Soto Gaona, and Michael M Wolf. Perturbation bounds for Williamson's symplectic normal form. Linear Algebra and its Applications, 525:45-58, 2017.

Kh. D Ikramov. On the symplectic eigenvalues of positive definite matrices. Moscow University Computational Mathematics and Cybernetics, 42:1-4, 2018.

Birgit Jacob and Hans Zwart. Linear Port-Hamiltonian Systems on Infinite-dimensional Spaces. Birkhäuser, 2012.

Herbert Jaeger and Harald Haas. Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science, 304(5667):78-80, 2004.

Pengzhan Jin, Zhen Zhang, Aiqing Zhu, Yifa Tang, and George Em Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks, 132:166-179, 2020.

R. E. Kalman. Mathematical description of linear dynamical systems. Journal of the Society for Industrial and Applied Mathematics Series A Control, 1(2):152-192, 1963.

George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics-informed machine learning.
Nature Reviews Physics, 3(6):422-440, 2021.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In F Pereira, C J Burges, L Bottou, and K Q Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.

Joan-Andreu Lázaro-Camí and Juan-Pablo Ortega. Stochastic hamiltonian dynamical systems. Reports on Mathematical Physics, 61(1):65-122, 2008.

Benedict Leimkuhler and Sebastian Reich. Simulating Hamiltonian Dynamics. Cambridge University Press, 2004.

Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-Net: Learning PDEs from Data. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3208-3216. PMLR, 2018.

Zhixin Lu, Brian R. Hunt, and Edward Ott. Attractor reconstruction by machine learning. Chaos, 28(6), 2018.

Kirill C H Mackenzie. General theory of Lie groupoids and Lie algebroids. Number 213. Cambridge University Press, 2005.

Jerrold E. Marsden and Tudor S. Ratiu. Introduction to mechanics and symmetry. Springer-Verlag, New York, second edition, 1999.

Jerrold E. Marsden and Matthew West. Discrete mechanics and variational integrators. Acta Numerica, 10:357-514, 2001.

B.M. Maschke and A.J. van der Schaft. Port-controlled hamiltonian systems: Modelling origins and systemtheoretic properties. IFAC Proceedings Volumes, 25(13):359-365, 1992. 2nd IFAC Symposium on Nonlinear Control Systems Design 1992, Bordeaux, France, 24-26 June.

Robert I McLachlan and G Reinout W Quispel. Geometric integrators for ODEs. Journal of Physics A: Mathematical and General, 39(19):5251, 2006.

Silviu Medianu and Laurent Lefevre. Structural identifiability of linear port hamiltonian systems. Systems & Control Letters, 151:104915, 2021.

Silviu Medianu, Laurent Lefevre, and Dan Stefanoiu.
Identifiability of linear lossless Port-controlled Hamiltonian systems. In 2nd International Conference on Systems and Computer Science, pages 56-61, 2013.

Sumona Mukhopadhyay and Santo Banerjee. Learning dynamical systems in noise using convolutional neural networks. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30(10):103125, 2020.

S P Nageshrao, G A D Lopes, D Jeltsema, and R Babuska. Adaptive and learning control of port-Hamiltonian systems: a survey. IEEE Transactions on Automatic Control, page 37, 2015.

Juan-Pablo Ortega and Tudor S. Ratiu. Momentum Maps and Hamiltonian Reduction. Birkhäuser Verlag, 2004.

Jaideep Pathak, Brian Hunt, Michelle Girvan, Zhixin Lu, and Edward Ott. Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach. Physical Review Letters, 120(2):24102, 2018a.

Jaideep Pathak, Alexander Wikner, Rebeckah Fussell, Sarthak Chandra, Brian R. Hunt, Michelle Girvan, and Edward Ott. Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model. Chaos, 28(4), 2018b.

Jan Willem Polderman and Jan C. Willems. Introduction to Mathematical Systems Theory. Springer New York, NY, 1998.

Tong Qin, Kailiang Wu, and Dongbin Xiu. Data driven governing equations approximation using deep neural networks. Journal of Computational Physics, 395:620-635, oct 2019.

Hà Quang Minh and Vittorio Murino. Covariances in Computer Vision and Machine Learning. Morgan and Claypool Publishers, 2018.

Maziar Raissi and George E Karniadakis. Hidden physics models: machine learning of nonlinear partial differential equations. CoRR, abs/1708.0, 2017. URL http://arxiv.org/abs/1708.00588.

Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561, 2017.
Anna Shalova and Ivan Oseledets. Tensorized transformer for dynamical systems modeling. arXiv preprint arXiv:2006.03445, 2020.

Nguyen Son and Tatjana Stykel. Symplectic eigenvalues of positive-semidefinite matrices and the trace minimization theorem. 08 2022. doi: 10.48550/arXiv.2208.05291.

Eduardo Sontag. Mathematical Control Theory: Deterministic Finite Dimensional Systems. Springer-Verlag, 1998.

Hector J. Sussmann. Existence and uniqueness of minimal realizations of nonlinear systems. Mathematical Systems Theory, 1976.

Krzysztof Tchoń. On generic properties of linear systems: An overview. Kybernetika, 19, 01

Yunjin Tong, Shiying Xiong, Xingzhe He, Guanghan Pan, and Bo Zhu. Symplectic neural networks in Taylor series form for Hamiltonian systems. Journal of Computational Physics, 437:110325, 2021.

Riccardo Valperga, Kevin Webster, Dmitry Turaev, Victoria Klein, and Jeroen Lamb. Learning reversible symplectic dynamics. In Roya Firoozi, Negar Mehr, Esen Yel, Rika Antonova, Jeannette Bohg, Mac Schwager, and Mykel Kochenderfer, editors, Proceedings of The 4th Annual Learning for Dynamics and Control Conference, volume 168 of Proceedings of Machine Learning Research, pages 906-916. PMLR, 23-24 Jun 2022.

Arjan van der Schaft and Dimitri Jeltsema. Port-Hamiltonian systems theory: an introductory overview. Foundations and Trends in Systems and Control, 1(2-3):173-378, 2014.

Yu Wang. A new concept using LSTM Neural Networks for dynamic system identification. 2017 American Control Conference (ACC), pages 5324-5329, 2017.

John Williamson. On the algebraic problem concerning the normal forms of linear dynamical systems. American Journal of Mathematics, 58(1):141-163, 1936.

John Williamson. On the normal forms of linear canonical transformations in dynamics. American Journal of Mathematics, 59(3):599-617, 1937.

Jin-Long Wu, Heng Xiao, and Eric Paterson.
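Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework. Physical Review Fluids, 3(7):74602, 2018.

Yaofeng Desmond Zhong, Biswadip Dey, and Amit Chakraborty. Symplectic ODE-Net: Learning Hamiltonian dynamics with control. In International Conference on Learning Representations, 2020.

9. Appendices

9.1 Proof of Theorem 7

(i) Let $(d, v) \in \Theta_{CH_n}$ and let
$$\dot{s} = g_1^{ctr}(d)\, s + (0, 0, \dots, 0, 1)^\top u, \qquad y = g_2^{ctr}(d, v)\, s, \tag{26}$$
be the corresponding linear controllable state-space system. In the following paragraphs, we construct, for every $S \in Sp(2n, \mathbb{R})$, a linear system morphism $f_S^{(d,v)} : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ between (26) and the port-Hamiltonian system $(Q, B) = \phi_S(\theta_{CH_n}(d, v)) \in PH_n$ in the statement. Notice, first of all, that $Q$ is by construction symmetric and positive-definite. Let now $L \in M_{2n}$ be the matrix implementing the linear map $f_S^{(d,v)}$, that is, $f_S^{(d,v)}(s) = Ls$, $s \in \mathbb{R}^{2n}$. We now explicitly construct $L$ and prove that it provides a system morphism.

We start by denoting $A := J_n\,\mathrm{diag}(D, D)$, where $\mathrm{diag}(D, D)$ is the $2n \times 2n$ block-diagonal matrix with two copies of $D$, and define the matrices $L_1, \dots, L_{2n} \in M_{2n}$ by
$$L_{2n-k} := A^k + a_{2n-1} A^{k-1} + \cdots + a_{2n-k} I_{2n}, \qquad k = 0, \dots, 2n-1.$$
In particular, $L_{2n} = I_{2n}$. Then, $\widetilde{L}$ is constructed as
$$\widetilde{L} := \left( L_1 v \mid L_2 v \mid \cdots \mid L_{2n} v \right), \qquad \text{and} \qquad L := S^{-1} \widetilde{L}.$$
We now check that $f_S^{(d,v)}(s) = Ls$ is indeed a system morphism between (26) and the port-Hamiltonian system (5) with $Q = S^\top \mathrm{diag}(D, D)\, S$ and $B = S^{-1} v$. This amounts to checking that

(i) $L\, g_1^{ctr}(d) = J_n Q L$,

(ii) $L\, (0, 0, \dots, 0, 1)^\top = B$,

(iii) $g_2^{ctr}(d, v) = B^\top Q L$.

We note that (ii) trivially holds. Now, since $J_n S^\top = S^{-1} J_n$ for symplectic $S$, (i) is equivalent to
$$S^{-1} \widetilde{L}\, g_1^{ctr}(d) = J_n S^\top \mathrm{diag}(D, D)\, S S^{-1} \widetilde{L} = S^{-1} J_n\, \mathrm{diag}(D, D)\, \widetilde{L},$$
that is,
$$\left( L_1 v \mid L_2 v \mid \cdots \mid L_{2n} v \right) \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{2n-1} \end{pmatrix} = A \left( L_1 v \mid L_2 v \mid \cdots \mid L_{2n} v \right).$$
We compare the $k$-th columns of the left- and right-hand sides in this equality. When $k = 1$, the difference between the first columns in the left- and right-hand sides is
$$A L_1 v + a_0 v = A \left( A^{2n-1} + a_{2n-1} A^{2n-2} + \cdots + a_1 I_{2n} \right) v + a_0 v = \left( A^{2n} + a_{2n-1} A^{2n-1} + \cdots + a_1 A + a_0 I_{2n} \right) v = 0. \tag{27}$$
The last equality holds as a consequence of the Cayley-Hamilton theorem.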
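The construction of the matrices $L_k$ and the identity (27) can be checked numerically. The following is a minimal sketch, not part of the original proof, under the assumptions that $S = I_{2n}$ (so $Q = \mathrm{diag}(D, D)$ and $B = v$), that $J_n$ has blocks $(0, I_n; -I_n, 0)$, and that the last row of the companion matrix is $(-a_0, \dots, -a_{2n-1})$. The numerical values of `d` and `v` and all variable names are illustrative.

```python
import numpy as np

n = 2
d = np.array([1.5, 0.7])               # illustrative symplectic eigenvalues
v = np.array([0.3, -1.2, 0.8, 0.5])    # illustrative vector v in R^{2n}

I_n, Z = np.eye(n), np.zeros((n, n))
J = np.block([[Z, I_n], [-I_n, Z]])    # assumed sign convention for J_n
D2 = np.block([[np.diag(d), Z], [Z, np.diag(d)]])  # diag(D, D)
A = J @ D2

# Coefficients of prod_i (x^2 + d_i^2), highest degree first.
coeffs = np.array([1.0])
for di in d:
    coeffs = np.polymul(coeffs, [1.0, 0.0, di ** 2])
a = coeffs[::-1]                       # a[i] = a_i for i = 0..2n, with a[2n] = 1

# Companion matrix with ones on the superdiagonal and last row (-a_0, ..., -a_{2n-1}).
g1 = np.zeros((2 * n, 2 * n))
g1[:-1, 1:] = np.eye(2 * n - 1)
g1[-1, :] = -a[:-1]


def L_mat(j):
    """L_j = sum_i a_{j+i} A^i, i.e. L_{2n-k} = A^k + a_{2n-1} A^{k-1} + ... + a_{2n-k} I."""
    return sum(a[j + i] * np.linalg.matrix_power(A, i) for i in range(2 * n - j + 1))


# L_tilde = (L_1 v | L_2 v | ... | L_{2n} v)
L_tilde = np.column_stack([L_mat(j) @ v for j in range(1, 2 * n + 1)])

# Cayley-Hamilton: the characteristic polynomial evaluated at A vanishes.
cayley_hamilton_residual = np.linalg.norm(L_mat(0))
# Condition (i) with S = I: L_tilde g1 = A L_tilde.
morphism_residual = np.linalg.norm(L_tilde @ g1 - A @ L_tilde)
# Condition (ii) with S = I: the last column of L_tilde equals B = v.
last_column_residual = np.linalg.norm(L_tilde[:, -1] - v)

print(cayley_hamilton_residual, morphism_residual, last_column_residual)
```

All three residuals should vanish up to floating-point round-off; a general symplectic $S$ only multiplies $\widetilde{L}$ on the left by $S^{-1}$.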
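Indeed, by the definition of the entries $\{a_0, a_1, \dots, a_{2n-1}\}$, we have that the characteristic polynomial of $A$ is
$$\det(\lambda I_{2n} - A) = \det \begin{pmatrix} \lambda I_n & -D \\ D & \lambda I_n \end{pmatrix} = \det(\lambda I_n) \det\left( \lambda I_n + \lambda^{-1} D^2 \right) = (\lambda^2 + d_1^2) \cdots (\lambda^2 + d_n^2).$$
Consequently, since by the Cayley-Hamilton theorem $A$ is a root of its characteristic polynomial, we can conclude that $A^{2n} + a_{2n-1} A^{2n-1} + \cdots + a_1 A + a_0 I_{2n} = 0$ and hence (27) follows. When $1 < k \le 2n$, the difference between the $k$-th columns in the left- and right-hand sides is
$$(L_{k-1} v - a_{k-1} v) - A L_k v = (L_{k-1} - a_{k-1} I_{2n} - A L_k)\, v = 0,$$
since
$$L_{k-1} - a_{k-1} I_{2n} - A L_k = \left( A^{2n-k+1} + a_{2n-1} A^{2n-k} + \cdots + a_{k-1} I_{2n} \right) - a_{k-1} I_{2n} - A \left( A^{2n-k} + a_{2n-1} A^{2n-k-1} + \cdots + a_k I_{2n} \right) = 0.$$
We have hence proved that (i) holds.

We now proceed to check (iii). Writing $\mathbb{D} := \mathrm{diag}(D, D)$ for brevity, this amounts to computing
$$B^\top Q L = (S^{-1} v)^\top S^\top \mathbb{D}\, S S^{-1} \widetilde{L} = v^\top \mathbb{D} \left( L_1 v \mid L_2 v \mid \cdots \mid L_{2n} v \right).$$
Let us denote this row vector by $(c_{2n}, c_{2n-1}, c_{2n-2}, \dots, c_2, c_1)$. Since the characteristic polynomial of $A$ only contains even powers of $\lambda$, the coefficients $a_i$ with odd index vanish. Moreover, $J_n$ and $\mathbb{D}$ commute. Then we observe that, for $k = 1, \dots, n$,
$$c_{2k} = v^\top \mathbb{D} \left( A^{2k-1} + a_{2n-1} A^{2k-2} + \cdots + a_{2n-2k+1} I_{2n} \right) v = v^\top \left( \mathbb{D}^{2k} J_n^{2k-1} + a_{2n-2}\, \mathbb{D}^{2k-2} J_n^{2k-3} + \cdots + a_{2n-2k+2}\, \mathbb{D}^{2} J_n \right) v = 0.$$
The last equation follows from the fact that each summand is a skew-symmetric matrix. On the other hand, for $k = 0, \dots, n-1$,
$$c_{2k+1} = v^\top \mathbb{D} \left( A^{2k} + a_{2n-1} A^{2k-1} + \cdots + a_{2n-2k} I_{2n} \right) v = v^\top \left( (-1)^k \mathbb{D}^{2k+1} + a_{2n-2} (-1)^{k-1} \mathbb{D}^{2k-1} + \cdots + a_{2n-2k}\, \mathbb{D} \right) v.$$
Substituting the values of the coefficients $a_{2k}$ as expressions in terms of the $d_i$, we obtain that
$$c_{2k+1} = v^\top \begin{pmatrix} F_k & 0 \\ 0 & F_k \end{pmatrix} v$$
for $k = 0, \dots, n-1$, and $F_k = \mathrm{diag}(f_1, f_2, \dots, f_n)$ with
$$f_l = d_l \sum_{\substack{j_1, \dots, j_k \neq l \\ 1 \le j_1 < \cdots < j_k \le n}} d_{j_1}^2 \cdots d_{j_k}^2.$$

[...]

In conclusion, by combining the two special cases, we see that $\theta_{CH_n}(d_1, v_1)$ and $\theta_{CH_n}(d_2, v_2)$ induce the same filter if and only if $c_i(d_1, v_1) = c_i(d_2, v_2)$ and $e_i(d_1, v_1) = e_i(d_2, v_2)$ for all $0 \le i \le n-1$.

9.4 Proof of Theorem 22

$\phi_S \circ \theta_{CH_n}$ is compatible with $\sim_\star$ and $\sim_{sys}$. Fix a choice of $S \in Sp(2n, \mathbb{R})$. We need to show that $(d_1, v_1) \sim_\star (d_2, v_2)$ if and only if $(Q_1, B_1) \sim_{sys} (Q_2, B_2)$, where $(Q_i, B_i) := \phi_S(\theta_{CH_n}(d_i, v_i))$, $i = 1, 2$. This means there exists an invertible $L$ such that (14) holds. We claim that $L = S^{-1} P A S$ does the job, where $P$ and $A$ are given by Definition 20.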
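The first condition is
$$L J_n Q_1 = J_n Q_2 L \iff L J_n Q_1 L^{-1} = J_n Q_2 \iff S^{-1} P A\, J_n\, \mathrm{diag}(D_1, D_1)\, A^{-1} P^{-1} S = S^{-1} J_n\, \mathrm{diag}(D_2, D_2)\, S \iff A\, J_n\, \mathrm{diag}(D_1, D_1)\, A^{-1} = P^\top J_n\, \mathrm{diag}(D_2, D_2)\, P,$$
where we used $Q_i = S^\top \mathrm{diag}(D_i, D_i)\, S$ and $J_n S^\top = S^{-1} J_n$; the last identity holds by Definition 20. The second condition is true by construction, namely
$$L B_1 = B_2 \iff S^{-1} P A S S^{-1} v_1 = S^{-1} v_2 \iff v_2 = P A v_1.$$
The third condition is
$$B_1^\top Q_1 = B_2^\top Q_2 L \iff v_1^\top \mathrm{diag}(D_1, D_1)\, S = v_2^\top \mathrm{diag}(D_2, D_2)\, P A S \iff v_1^\top \mathrm{diag}(D_1, D_1) = v_2^\top \mathrm{diag}(D_2, D_2)\, P A,$$
which again holds by Definition 20.

Based on the compatibility result above, we know that $\phi_S \circ \theta_{CH_n}$ induces a unique map $\Phi_S : \Theta_{CH_n}/\!\sim_\star\ \to PH_n/\!\sim_{sys}$ defined as $\Phi_S([d, v]_\star) = [\phi_S(\theta_{CH_n}(d, v))]_{sys}$. We now verify that $\Phi_S$ does not depend on the choice of $S \in Sp(2n, \mathbb{R})$.

$\Phi_S$ is independent of $S$. It suffices to check that, for $S_1 \neq S_2$, we have $[\phi_{S_1}(\theta_{CH_n}(d, v))]_{sys} = [\phi_{S_2}(\theta_{CH_n}(d, v))]_{sys}$, which again goes back to checking that (14) holds for some invertible $L$. Write $(Q_i, B_i) = (S_i^\top \mathrm{diag}(D, D)\, S_i,\ S_i^{-1} v)$, $i = 1, 2$. We claim that $L = S_2^{-1} S_1$ does the job. The first condition is
$$L J_n Q_1 = S_2^{-1} S_1 S_1^{-1} J_n\, \mathrm{diag}(D, D)\, S_1 = S_2^{-1} J_n\, \mathrm{diag}(D, D)\, S_2 S_2^{-1} S_1 = J_n Q_2 L.$$
The second condition is
$$L B_1 = S_2^{-1} S_1 S_1^{-1} v = S_2^{-1} v = B_2.$$
The third condition is
$$B_2^\top Q_2 L = v^\top \mathrm{diag}(D, D)\, S_2 S_2^{-1} S_1 = v^\top \mathrm{diag}(D, D)\, S_1 = B_1^\top Q_1.$$
Since $\Phi_S$ does not depend on $S \in Sp(2n, \mathbb{R})$, we may as well choose $S = J_n$ and call it $\Phi$. Then $\Phi$ has the expression $\Phi([d, v]_\star) = [\phi_{J_n}(\theta_{CH_n}(d, v))]_{sys}$. We now verify that $\Phi$ is injective and surjective, and hence an isomorphism.

$\Phi$ is surjective. For an arbitrary equivalence class $[Q, B]_{sys}$, we take a representative $(Q, B)$. Since $Q$ is symmetric positive-definite, by Williamson's theorem, $Q = S^\top \mathrm{diag}(D, D)\, S$ for some $S \in Sp(2n, \mathbb{R})$, and the diagonal entries of $D$ are positive and can be identified with $d$. Let $v = SB$. Then we have $\Phi_S([d, v]_\star) = [Q, B]_{sys}$. Given that $\Phi_S = \Phi$ for any $S$, it holds that $\Phi([d, v]_\star) = [Q, B]_{sys}$. This proves that $\Phi$ is surjective.

$\Phi$ is injective. The equality $\Phi([d_1, v_1]_\star) = \Phi([d_2, v_2]_\star)$ means that there exists some invertible $L$ such that the conditions in (14) are all satisfied. We aim to show that $(d_1, v_1) \sim_\star (d_2, v_2)$. The first condition gives $L J_n Q_1 L^{-1} = J_n Q_2$, so that $J_n Q_1$ and $J_n Q_2$ have the same eigenvalues. Therefore, $\{d_{1,i} \mid i = 1, \dots, n\}$ is the same as $\{d_{2,i} \mid i = 1, \dots, n\}$ as a set, and this implies the existence of some $\sigma \in S_n$ such that $d_{2,i} = d_{1,\sigma(i)}$ for $i = 1, \dots, n$. In other words, there exists some permutation matrix $P_\sigma$ such that [...]. Thus, (i) of Definition 20 holds. Further, we have [...] if we denote $A := P^\top L$. Thus, (iii) of Definition 20 holds true. The second condition of (14) says $v_2 = L v_1 = P A v_1$.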
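Thus, (iv) of Definition 20 holds true. Lastly, the third condition implies [...]. Thus, (ii) in Definition 20 holds. We conclude that $\Phi$ is injective.

$\Phi$ is a homeomorphism with respect to the quotient topology. Before we prove this statement, we first quote a lemma (see, for instance, Abraham et al. (1988)).

Lemma 39 Let $X$ and $Y$ be sets equipped with equivalence relations $\sim_X$ and $\sim_Y$, respectively. If $\varphi : X \to Y$ is a map such that, for any $x_1, x_2 \in X$, $x_1 \sim_X x_2$ if and only if $\varphi(x_1) \sim_Y \varphi(x_2)$, then $\varphi$ projects to a unique map $\widehat{\varphi} : X/\!\sim_X\ \to Y/\!\sim_Y$ between the quotient spaces given by $\widehat{\varphi}([x]_X) = [\varphi(x)]_Y$ and such that the following diagram commutes.
$$\begin{array}{ccc} X & \xrightarrow{\ \varphi\ } & Y \\ \downarrow & & \downarrow \\ X/\!\sim_X & \xrightarrow{\ \widehat{\varphi}\ } & Y/\!\sim_Y \end{array}$$
In particular, if $\varphi$ is a homeomorphism between two topological spaces $X$ and $Y$, then $\widehat{\varphi}$ is also a homeomorphism.

We now proceed with the proof.

(i) If $(Q_1, B_1)$ and $(Q_2, B_2) \in PH_n$ are linked by some linear symplectic map $S \in Sp(2n, \mathbb{R})$ by $(Q_2, B_2) = (S^{-\top} Q_1 S^{-1}, S B_1)$, then $(Q_1, B_1) \sim_{sys} (Q_2, B_2)$. Therefore, as an immediate consequence of Williamson's normal form, we have $PH_n/\!\sim_{sys}\ = PH_n^{diag}/\!\sim_{sys}$, where
$$PH_n^{diag} := \left\{ \left( \mathrm{diag}(D, D),\ v \right) \ \middle|\ D = \mathrm{diag}(d),\ d_i > 0,\ v \in \mathbb{R}^{2n} \right\}.$$

(ii) There is an obvious homeomorphism $\phi : \Theta_{CH_n} \to PH_n^{diag}$ given by $\phi(d, v) = (\mathrm{diag}(D, D), v)$. Therefore, by identifying $PH_n/\!\sim_{sys}$ with $PH_n^{diag}/\!\sim_{sys}$, the induced map of $\phi$ on the quotients is exactly $\Phi$. By Lemma 39, $\Phi$ is also a homeomorphism. To summarize, we have that the following diagram commutes.
$$\begin{array}{ccc} \Theta_{CH_n} & \xrightarrow{\ \phi\ } & PH_n^{diag} \\ \downarrow & & \downarrow \\ \Theta_{CH_n}/\!\sim_\star & \xrightarrow{\ \Phi\ } & PH_n/\!\sim_{sys} \end{array}$$

9.5 Proof of Proposition 24

The groupoid axioms mostly follow from the definition. Here, we only check the closure of the multiplication operation $m$, that is, $(L_1 L_2, (Q_2, B_2)) \in G_n$. Note that, since $J J^\top = I_{2n}$,
$$J^\top (L_1 L_2) J Q_2 (L_1 L_2)^{-1} = J^\top L_1 J \left( J^\top L_2 J Q_2 L_2^{-1} \right) L_1^{-1} = J^\top L_1 J Q_1 L_1^{-1}$$
is symmetric positive-definite. On the other hand, we have
$$J^\top (L_1 L_2)^\top J (L_1 L_2) B_2 = J^\top L_2^\top L_1^\top J L_1 L_2 B_2 = J^\top L_2^\top \left( L_1^\top J L_1 B_1 \right) = J^\top L_2^\top J B_1 = J^\top L_2^\top J L_2 B_2 = B_2.$$
Thus, closure of multiplication is proved. We also need to show that $\alpha$ and $\beta$ are submersions.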
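Indeed, for $(L, (Q, B)) \in G_n$ and $(N, (P, C)) \in T_{(L,(Q,B))} G_n$, it holds that
$$T_{(L,(Q,B))}\alpha\,(N, (P, C)) = \left.\frac{d}{dt}\right|_{t=0} \left( J^\top (L + tN) J (Q + tP) (L + tN)^{-1},\ (L + tN)(B + tC) \right) = \left( J^\top N J Q L^{-1} + J^\top L J P L^{-1} - J^\top L J Q L^{-1} N L^{-1},\ LC + NB \right).$$
Obviously, $LC + NB$ can traverse $\mathbb{R}^{2n}$ with varying $N$ and $C \in \mathbb{R}^{2n}$. For the first component, we can take $N = L$ such that it becomes $J^\top L J P L^{-1}$. Since the tangent space of an open submanifold can be identified with the tangent space of the whole manifold, plus the fact that the tangent space of a vector space can be identified with itself, we naturally conclude that $T_{(L,(Q,B))}\alpha$ is surjective and hence $\alpha$ is a submersion. Similarly, one checks that $\beta$ is a submersion. Then, the orbit of the groupoid containing $(Q, B)$ is given by
$$\alpha(\beta^{-1}(Q, B)) = \alpha\left( \{ (L, (Q, B)) \mid L \text{ satisfies 1.(i) and 1.(ii) in Definition 23} \} \right) = \{ (J^\top L J Q L^{-1}, LB) \mid L \text{ satisfies 1.(i) and 1.(ii) in Definition 23} \} = \{ (Q', B') \mid (Q', B') \sim_{sys} (Q, B) \}.$$

9.6 Proof of Proposition 30

$f$ is well-defined. If $(d_1, v_1) \sim_{sys} (d_2, v_2)$, then there exists an invertible matrix $L_0$ such that
$$L_0\, g_1^{ctr}(d_1) = g_1^{ctr}(d_2)\, L_0, \qquad L_0\, (0, 0, \dots, 0, 1)^\top = (0, 0, \dots, 0, 1)^\top, \qquad g_2^{ctr}(d_1, v_1) = g_2^{ctr}(d_2, v_2)\, L_0.$$
Since we are restricting to canonical systems, we apply the representation theorem to deduce the existence of some invertible matrices $L_i$, $i = 1, 2$, such that
$$L_i\, g_1^{ctr}(d_i) = J Q_i L_i, \qquad L_i\, (0, 0, \dots, 0, 1)^\top = B_i, \qquad g_2^{ctr}(d_i, v_i) = B_i^\top Q_i L_i.$$
Now, check that $L = L_2 L_0 L_1^{-1}$ is invertible and satisfies
$$L J Q_1 = J Q_2 L, \qquad L B_1 = B_2, \qquad B_1^\top Q_1 = B_2^\top Q_2 L.$$
Therefore, $(Q_1, B_1) \sim_{sys} (Q_2, B_2)$.

$f$ is surjective. This is obvious, see the proof above.

$f$ is injective. Given that all the matrices are invertible, this can be shown by essentially reversing the proof of $f$ being well-defined.

9.7 Proof of Proposition 31

Writing $\theta = (\theta_1, \dots, \theta_n)^\top$ and $\bar{\theta} = (\bar{\theta}_1, \dots, \bar{\theta}_n)^\top$, we directly verify that
$$\begin{aligned} \Gamma_{(\sigma, \theta) \cdot (\bar{\sigma}, \bar{\theta})}(d, v) &= \Gamma_{(\sigma \bar{\sigma},\ \theta + P_\sigma \bar{\theta})}(d, v) \\ &= \left( P_{\sigma \bar{\sigma}}\, d,\ \Gamma_{\theta}\, \Gamma_{P_\sigma \bar{\theta}} \begin{pmatrix} P_{\sigma \bar{\sigma}} & 0 \\ 0 & P_{\sigma \bar{\sigma}} \end{pmatrix} v \right) \\ &= \left( P_\sigma P_{\bar{\sigma}}\, d,\ \Gamma_{\theta}\, \Gamma_{(\bar{\theta}_{\sigma(1)}, \dots, \bar{\theta}_{\sigma(n)})^\top} \begin{pmatrix} P_\sigma & 0 \\ 0 & P_\sigma \end{pmatrix} \begin{pmatrix} P_{\bar{\sigma}} & 0 \\ 0 & P_{\bar{\sigma}} \end{pmatrix} v \right) \\ &= \left( P_\sigma P_{\bar{\sigma}}\, d,\ \Gamma_{\theta} \begin{pmatrix} P_\sigma & 0 \\ 0 & P_\sigma \end{pmatrix} \Gamma_{\bar{\theta}} \begin{pmatrix} P_{\bar{\sigma}} & 0 \\ 0 & P_{\bar{\sigma}} \end{pmatrix} v \right) \\ &= \Gamma_{(\sigma, \theta)}\left( \Gamma_{(\bar{\sigma}, \bar{\theta})}(d, v) \right), \end{aligned}$$
where $\Gamma_{(\bar{\theta}_{\sigma(1)}, \dots, \bar{\theta}_{\sigma(n)})^\top}$ is the rotation-type matrix whose $n \times n$ blocks are the diagonal matrices with entries $\cos \bar{\theta}_{\sigma(i)}$ and $\sin \bar{\theta}_{\sigma(i)}$, $i = 1, \dots, n$.

9.8 Proof of Proposition 32

Recall that $(d_1, v_1)$ and $(d_2, v_2)$ lie in the same $(S_n \ltimes_\varphi \mathbb{T}^n)$-orbit if and only if, for some $\sigma \in S_n$ and $(\theta_1, \dots, \theta_n) \in \mathbb{T}^n$,

(i) $d_{2,i} = d_{1,\sigma(i)}$, $i = 1, \dots, n$,

(ii) $v_{1,\sigma(i)}^2 + v_{1,n+\sigma(i)}^2 = v_{2,i}^2 + v_{2,n+i}^2$, $i = 1, \dots, n$.

Clearly, (i) above is equivalent to Proposition 18 (i). Moreover, Proposition 18 (ii) implies that, for $k = 0, \dots, n-1$,
$$v_1^\top \begin{pmatrix} F_{1,k} & 0 \\ 0 & F_{1,k} \end{pmatrix} v_1 = (P^\top v_2)^\top \begin{pmatrix} F_{1,k} & 0 \\ 0 & F_{1,k} \end{pmatrix} (P^\top v_2),$$
where the $i$-th diagonal entry of $F_{1,k}$ multiplies $v_{1,i}^2 + v_{1,n+i}^2$ on the left-hand side and $v_{2,\sigma^{-1}(i)}^2 + v_{2,n+\sigma^{-1}(i)}^2$ on the right-hand side. Now, let $R_1 = (R_{1,1}, \dots, R_{1,n})^\top$, where $R_{1,i} = v_{1,i}^2 + v_{1,n+i}^2$. Let $R_2 = (R_{2,1}, \dots, R_{2,n})^\top$, where $R_{2,i} = v_{2,\sigma^{-1}(i)}^2 + v_{2,n+\sigma^{-1}(i)}^2$. Identify the diagonal matrix $F_{1,k}$ with a row vector in $\mathbb{R}^n$. Then, the above is equivalent to saying that the inner products of $F_{1,k}$ with $R_1$ and $R_2$ are the same for all $k = 0, \dots, n-1$. Rewriting these inner products as a matrix multiplication gives
$$\begin{pmatrix} F_{1,0} \\ \vdots \\ F_{1,n-1} \end{pmatrix} (R_1 - R_2) = 0.$$
The determinant of this matrix equals, up to sign, $\prod_{l=1}^n d_{1,l} \prod_{i<j} (d_{1,i}^2 - d_{1,j}^2)$. Since there are no repeated symplectic eigenvalues, we must have $R_1 = R_2$, namely $v_{1,i}^2 + v_{1,n+i}^2 = v_{2,\sigma^{-1}(i)}^2 + v_{2,n+\sigma^{-1}(i)}^2$ for all $i = 1, \dots, n$. Thus, (ii) holds by inverting the permutation $\sigma$. The converse is clearly true.

9.9 Proof of Proposition 33

$f$ is well-defined. Let $(d_1, v_1)$ and $(d_2, v_2)$ be in the same orbit of the $(S_n \ltimes_\varphi \mathbb{T}^n)$-action. This means there exist $\sigma \in S_n$ and $\Theta \in \mathbb{T}^n$ such that $\Gamma_\sigma(d_1) = d_2$ and $\Gamma_\Theta(\Gamma_\sigma(v_1)) = v_2$. This immediately implies that $d_1$ and $d_2$ have the same ordered rearrangement, $(d_1)^{\uparrow} = (d_2)^{\uparrow}$, as well as $R(v_2) = R(\Gamma_\sigma(v_1))$. Moreover, let $\sigma_i \in S_n$ be the unique permutation such that $\Gamma_{\sigma_i}(d_i) = (d_i)^{\uparrow}$, $i = 1, 2$. Then we have
$$\Gamma_{\sigma_2^{-1}}\left( (d_2)^{\uparrow} \right) = d_2 = \Gamma_\sigma(d_1) = \Gamma_\sigma\left( \Gamma_{\sigma_1^{-1}}\left( (d_1)^{\uparrow} \right) \right).$$
Since all the entries of $d$ are distinct, we necessarily have $\sigma = \sigma_2^{-1} \sigma_1$. We want to show $R(\Gamma_{\sigma_1}(v_1)) = R(\Gamma_{\sigma_2}(v_2))$, but since $R$ and $\Gamma_\sigma$ commute for any $\sigma \in S_n$, this is equivalent to
$$\Gamma_{\sigma_1}(R(v_1)) = \Gamma_{\sigma_2}(R(v_2)) \iff \Gamma_{\sigma_1}(R(v_1)) = \Gamma_{\sigma_2}(R(\Gamma_\sigma(v_1))) \iff \Gamma_{\sigma_1}(R(v_1)) = \Gamma_{\sigma_2}(R(\Gamma_{\sigma_2^{-1} \sigma_1}(v_1))) \iff \Gamma_{\sigma_1}(R(v_1)) = R(\Gamma_{\sigma_1}(v_1)),$$
which is clearly true.

$f$ is surjective. This is obvious.

$f$ is injective. Now suppose $\left( (d_1)^{\uparrow}, R(\Gamma_{\sigma_1}(v_1)) \right) = \left( (d_2)^{\uparrow}, R(\Gamma_{\sigma_2}(v_2)) \right)$.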
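This immediately implies the existence of some $\sigma \in S_n$ such that $\Gamma_\sigma(d_1) = d_2$. On the other hand, since $d_i = \Gamma_{\sigma_i^{-1}}((d_i)^{\uparrow})$, $i = 1, 2$, we have $\sigma = \sigma_2^{-1} \sigma_1$ and hence $d_2 = \Gamma_{\sigma_2^{-1} \sigma_1}(d_1)$. Moreover, $R(\Gamma_{\sigma_1}(v_1)) = R(\Gamma_{\sigma_2}(v_2))$ implies $R(\Gamma_{\sigma_2^{-1} \sigma_1}(v_1)) = R(v_2)$, which further implies the existence of some $\Theta \in \mathbb{T}^n$ such that $v_2 = \Gamma_\Theta(\Gamma_{\sigma_2^{-1} \sigma_1}(v_1))$. We conclude that $(d_1, v_1)$ and $(d_2, v_2)$ lie in the same orbit.

9.10 Proof of Theorem 34

Proof of part (i). Say we are given a latent system
$$\dot{z} = J_n Q z + B u, \qquad y = B^\top Q z, \tag{31}$$
where $B \in \mathbb{R}^{2n}$ and $Q$ is a $2n \times 2n$ symmetric positive-definite matrix. Consider the matrix
$$\begin{pmatrix} J_n & 0 \\ 0 & J_{m-n} \end{pmatrix} = \begin{pmatrix} 0 & I_n & 0 & 0 \\ -I_n & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{m-n} \\ 0 & 0 & -I_{m-n} & 0 \end{pmatrix}.$$
There exists a conjugation by an orthogonal matrix that turns this matrix into $J_m$, since only elementary row (column) permutation matrices are involved, and these elementary matrices are themselves orthogonal. That is, there exists $O$ with $O O^\top = O^\top O = I_{2m}$ such that
$$O \begin{pmatrix} J_n & 0 \\ 0 & J_{m-n} \end{pmatrix} O^\top = J_m.$$
Now, consider the following linear port-Hamiltonian system in normal form
$$\dot{z}' = J_m\, O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top z' + O \begin{pmatrix} B \\ 0 \end{pmatrix} u = O \begin{pmatrix} J_n & 0 \\ 0 & J_{m-n} \end{pmatrix} \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top z' + O \begin{pmatrix} B \\ 0 \end{pmatrix} u, \qquad y = \begin{pmatrix} B^\top & 0 \end{pmatrix} \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top z',$$
with the change of variable $\bar{z} = O^\top z'$, which is equivalent to
$$\dot{\bar{z}} = \begin{pmatrix} J_n Q & 0 \\ 0 & J_{m-n} \end{pmatrix} \bar{z} + \begin{pmatrix} B \\ 0 \end{pmatrix} u, \qquad y = \begin{pmatrix} B^\top Q & 0 \end{pmatrix} \bar{z},$$
which, restricted to the upper subspace, coincides with (31). Moreover, the matrix $O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top$ is again symmetric positive-definite by construction.

Proof of part (ii). According to the system morphism conditions, we just need to check
$$L J_n Q = J_m\, O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top L, \qquad L B = O \begin{pmatrix} B \\ 0 \end{pmatrix}, \qquad B^\top Q = \begin{pmatrix} B^\top & 0 \end{pmatrix} \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top L.$$
The first condition is
$$L J_n Q = J_m\, O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top L \iff O^\top L J_n Q = O^\top J_m O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top L = \begin{pmatrix} J_n & 0 \\ 0 & J_{m-n} \end{pmatrix} \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top L.$$
The second and third conditions are clear with $L = O \begin{pmatrix} I_{2n} \\ 0 \end{pmatrix}$, for which the first condition also holds.

9.11 Proof of Proposition 35

$f$ is well-defined. Given $(Q_1, B_1) \sim_{sys} (Q_2, B_2)$, there exists an invertible $L \in M_{2n}$ such that (14) is satisfied. Let $L' = O \begin{pmatrix} L & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top$. Check that $L'$ satisfies the conditions (14) together with $(Q_1', B_1')$ and $(Q_2', B_2')$. Therefore, $(Q_1', B_1') \sim_{sys} (Q_2', B_2')$.

$f$ is surjective. This is clear from the definition of $(Q', B')$.

$f$ is injective.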
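Given $(Q_1', B_1') \sim_{sys} (Q_2', B_2')$, there exists an invertible $L' \in M_{2m}$ such that $L'$ satisfies the conditions in (14) together with $(Q_1', B_1')$ and $(Q_2', B_2')$. Write the matrix $O^\top L' O$ in the form $\begin{pmatrix} L_1 & L_2 \\ L_3 & L_4 \end{pmatrix}$, where $L_1 \in M_{2n}$. Then check that $L_1$ satisfies the conditions (14) together with $(Q_1, B_1)$ and $(Q_2, B_2)$. Therefore, $(Q_1, B_1) \sim_{sys} (Q_2, B_2)$.

9.12 Proof of Proposition 36

Clearly, $Q'$ is also symmetric and positive-definite. Thus, again by Williamson's theorem, $Q' = (S')^\top \mathrm{diag}(D', D')\, S'$ for some $S' \in Sp(2m, \mathbb{R})$. As before, we have
$$\begin{aligned} \det\left( \lambda I_{2m} - J_m\, \mathrm{diag}(D', D') \right) &= \det(\lambda I_{2m} - J_m Q') \\ &= \det\left( \lambda I_{2m} - J_m\, O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top \right) \\ &= \det\left( \begin{pmatrix} \lambda J_n & 0 \\ 0 & \lambda J_{m-n} \end{pmatrix} + \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} \right) \\ &= \det(\lambda J_{m-n} + I_{2m-2n}) \det(\lambda J_n + Q) \\ &= (\lambda^2 + 1)^{m-n} \det(\lambda I_{2n} - J_n Q) \\ &= (\lambda^2 + 1)^{m-n} \prod_{i=1}^n (\lambda^2 + d_i^2). \end{aligned}$$
If we fix the order of the symplectic eigenvalues $d'$ according to $d' = (d_1, \dots, d_n, 1, \dots, 1)$, that is, $D' = \mathrm{diag}(D, I_{m-n})$, then
$$\begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} = \begin{pmatrix} S^\top & 0 \\ 0 & J_{m-n} \end{pmatrix} \mathrm{diag}(D, D, I_{m-n}, I_{m-n}) \begin{pmatrix} S & 0 \\ 0 & J_{m-n}^\top \end{pmatrix}, \qquad O\, \mathrm{diag}(D, D, I_{m-n}, I_{m-n})\, O^\top = \mathrm{diag}(D', D').$$
Now, we check that the matrix $S' := O \begin{pmatrix} S & 0 \\ 0 & J_{m-n}^\top \end{pmatrix} O^\top$ is symplectic, that is,
$$(S')^\top J_m S' = O \begin{pmatrix} S^\top & 0 \\ 0 & J_{m-n} \end{pmatrix} \begin{pmatrix} J_n & 0 \\ 0 & J_{m-n} \end{pmatrix} \begin{pmatrix} S & 0 \\ 0 & J_{m-n}^\top \end{pmatrix} O^\top = O \begin{pmatrix} J_n & 0 \\ 0 & J_{m-n} \end{pmatrix} O^\top = J_m.$$
Therefore, $S'$ is a symplectic matrix diagonalizing $Q'$ in Williamson's theorem. Then, we have
$$v' = S' B' = O \begin{pmatrix} S & 0 \\ 0 & J_{m-n}^\top \end{pmatrix} O^\top O \begin{pmatrix} B \\ 0 \end{pmatrix} = O \begin{pmatrix} v \\ 0 \end{pmatrix}.$$

9.13 Proof of Proposition 37

The proof is similar to that of Theorem 22: simply replace $Q$ with $O \begin{pmatrix} Q & 0 \\ 0 & I_{2m-2n} \end{pmatrix} O^\top$, and $B$, $D$, and $v$ with their higher-dimensional counterparts.

9.14 A note on the design of discrete integrators on the transformed space

Even though we used just a simple Euler integration scheme in the numerical illustration, structure-preserving integration algorithms could have been used. In particular, we could have used an implicit midpoint rule, which is symplectic (see Marsden and West (2001)), that is, it preserves the symplectic form $dq \wedge dp$. Recall that if $L_{Lag}(q, \dot{q})$ is the Lagrangian function of the system of interest, then the midpoint integrator is obtained by using the discrete Lagrangian
$$L_d^{\alpha}(q_0, q_1, h) = h\, L_{Lag}\left( (1 - \alpha) q_0 + \alpha q_1,\ \frac{q_1 - q_0}{h} \right), \qquad \alpha = \frac{1}{2},$$
to approximate the exact discrete Lagrangian
$$L_d^{E}(q_0, q_1, h) = \int_0^h L_{Lag}\left( q_{0,1}(t), \dot{q}_{0,1}(t) \right) dt.$$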
Explicitly, the midpoint integrator for a linear autonomous Hamiltonian system is
$$z_{n+1} - z_n = \frac{h}{2} J Q (z_{n+1} + z_n),$$
which in terms of the controllable Hamiltonian representation reads
$$L(s_{n+1} - s_n) = \frac{h}{2} J Q L (s_{n+1} + s_n) = \frac{h}{2} L\, g_1^{ctr}(d)(s_{n+1} + s_n), \tag{32}$$
where the second equality holds by the construction of $L$ in the proof of Theorem 7 part (i). Thus, for the symplectic structure to be preserved in the original space, we can merely integrate by requiring
$$s_{n+1} - s_n = \frac{h}{2} g_1^{ctr}(d)(s_{n+1} + s_n),$$
where $g_1^{ctr}(d)$, as we have seen, takes the form
$$g_1^{ctr}(d) = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{2n-1} \end{pmatrix}.$$
Therefore, the integrator is given by
$$s_{n+1} = \left( I_{2n} - \frac{h}{2} g_1^{ctr}(d) \right)^{-1} \left( I_{2n} + \frac{h}{2} g_1^{ctr}(d) \right) s_n,$$
where the matrix inverse is well-defined for a sufficiently small time step $h$. Indeed, the integrator can be defined on the quotient space by $\ker(L)$, since by (32), we may as well choose $s_{n+1}$ such that
$$s_{n+1} - s_n = \frac{h}{2} g_1^{ctr}(d)(s_{n+1} + s_n) + s_{\ker}$$
for an arbitrary $s_{\ker} \in \ker(L)$. By a similar argument, the midpoint rule in terms of the observable Hamiltonian representation reads
$$s_{n+1} - s_n = L(z_{n+1} - z_n) = \frac{h}{2} L J Q (z_{n+1} + z_n) = \frac{h}{2} g_1^{obs}(d)(s_{n+1} + s_n), \tag{33}$$
where the last equality holds by the construction of $L$ from Theorem 7 part (ii). Therefore, the integrator is
$$s_{n+1} = \left( I_{2n} - \frac{h}{2} g_1^{obs}(d) \right)^{-1} \left( I_{2n} + \frac{h}{2} g_1^{obs}(d) \right) s_n.$$
In the case of a port-Hamiltonian system, if the system is driven by some fiber-preserving external force $f_H$, that is, some input as in our case, then the discrete Lagrange-d'Alembert principle can be used to construct variational integrators so that all the correspondence relationships and error analysis of standard variational integrators still hold (see Marsden and West (2001)). For example, the midpoint rule applied to the controllable Hamiltonian representation becomes
$$z_{n+1} - z_n = \frac{h}{2} J Q (z_{n+1} + z_n) + [\dots] \implies L(s_{n+1} - s_n) = \frac{h}{2} L\, g_1^{ctr}(d)(s_{n+1} + s_n) + [\dots].$$
Note that this structure-preserving integrator is not explicit in general.
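To illustrate the unforced midpoint update $x_{n+1} = (I - \tfrac{h}{2} M)^{-1}(I + \tfrac{h}{2} M)\, x_n$ discussed in this section, the following minimal sketch compares it with the explicit Euler scheme on a linear Hamiltonian system $\dot{z} = JQz$. It is not the implementation used in the numerical illustrations: the matrix `Q`, the step size `h`, the number of steps, and all function names are illustrative assumptions, and the sign convention for `J` is assumed to be $(0, I_n; -I_n, 0)$.

```python
import numpy as np


def midpoint_step_matrix(M, h):
    """Return (I - h/2 M)^{-1} (I + h/2 M), the implicit midpoint update for x' = M x."""
    identity = np.eye(M.shape[0])
    return np.linalg.solve(identity - 0.5 * h * M, identity + 0.5 * h * M)


def energy(Q, z):
    """Quadratic Hamiltonian H(z) = z^T Q z / 2."""
    return 0.5 * z @ Q @ z


n = 2
rng = np.random.default_rng(0)
I_n, Z = np.eye(n), np.zeros((n, n))
J = np.block([[Z, I_n], [-I_n, Z]])           # assumed sign convention for J_n

G = rng.standard_normal((2 * n, 2 * n))
Q = G @ G.T / (2 * n) + np.eye(2 * n)         # illustrative symmetric positive-definite Q
z0 = rng.standard_normal(2 * n)
h, steps = 0.01, 1000

M = J @ Q
Phi = midpoint_step_matrix(M, h)

z_mid, z_eul = z0.copy(), z0.copy()
for _ in range(steps):
    z_mid = Phi @ z_mid                       # implicit midpoint rule
    z_eul = z_eul + h * (M @ z_eul)           # explicit Euler, for comparison

# The midpoint update matrix of a linear Hamiltonian system is symplectic.
symplectic_defect = np.linalg.norm(Phi.T @ J @ Phi - J)
mid_drift = abs(energy(Q, z_mid) - energy(Q, z0)) / energy(Q, z0)
eul_drift = abs(energy(Q, z_eul) - energy(Q, z0)) / energy(Q, z0)

print("symplectic defect:", symplectic_defect)
print("relative energy drift (midpoint):", mid_drift)
print("relative energy drift (Euler):", eul_drift)
```

The same `midpoint_step_matrix` helper applies verbatim with `M` set to the companion matrix $g_1^{ctr}(d)$, which gives the update for $s_n$ in the controllable representation. The midpoint rule conserves the quadratic energy up to round-off, whereas the Euler scheme exhibits a systematic drift.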