# The Gyro-Structure of Some Matrix Manifolds

Xuan Son Nguyen
ETIS, UMR 8051, CY Cergy Paris Université, ENSEA, CNRS, Cergy, France
xuan-son.nguyen@ensea.fr

In this paper, we study the gyrovector space structure (gyro-structure) of matrix manifolds. Our work is motivated by the success of hyperbolic neural networks (HNNs), which have demonstrated impressive performance in a variety of applications. At the heart of HNNs is the theory of gyrovector spaces, which provides a powerful tool for studying hyperbolic geometry. Here we focus on two matrix manifolds, i.e., Symmetric Positive Definite (SPD) and Grassmann manifolds, and consider connecting the Riemannian geometry of these manifolds with the basic operations, i.e., the binary operation and scalar multiplication, on gyrovector spaces. Our work reveals some interesting facts about SPD and Grassmann manifolds. First, SPD matrices with the Affine-Invariant (AI) and Log-Euclidean (LE) geometries have a rich structure with strong connections to hyperbolic geometry. Second, linear subspaces, when equipped with our proposed basic operations, form what we call gyrocommutative and gyrononreductive gyrogroups. Furthermore, they share remarkable analogies with gyrovector spaces. We demonstrate the applicability of our approach for human activity understanding and question answering.

## 1 Introduction

Data lying on matrix manifolds are commonly encountered in various applied areas such as medical imaging [3, 37], shape analysis [41], drone classification [6], image recognition [11], and human behavior analysis [10, 15, 16, 20, 21, 22, 31, 40]. These data arise from the constraint sets of the problem, for which there is a natural representation of elements in the form of matrix arrays [2]. Due to the non-Euclidean nature of these data, traditional optimization algorithms usually fail to obtain good results in the matrix manifold setting. While a large body of works [6, 7, 8, 10, 20, 21, 22, 31, 33, 48] has been developed to generalize traditional optimization algorithms to this setting, there is still a lack of works that translate the language of differential geometry into basic operations on matrix manifolds so that they can be used in computational building blocks of neural network models on these manifolds, just as basic operations on Euclidean spaces, e.g., matrix-matrix addition and scalar-matrix multiplication, are used in deep neural networks (DNNs).

To address the above issue, we propose a novel framework based on the theory of gyrovector spaces [44, 45, 46] that has been successfully applied in the context of HNNs [12, 39]. Our aim is to uncover hidden analogies between the target manifolds and Euclidean spaces in the same way that hidden analogies are uncovered between hyperbolic and Euclidean spaces [12, 39, 44, 45, 46]. Although there are some works [1, 17, 24, 25, 26, 27, 29] showing the gyro-structure of SPD manifolds with the AI geometry, none of them provides a rigorous mathematical formulation for the connection between the basic operations of [1, 17, 18, 24, 25, 26, 27] and the AI geometry of SPD manifolds. In this paper, we show how the basic operations can be constructed from the Riemannian geometry of matrix manifolds, and derive their compact expressions for SPD and Grassmann manifolds. In the case of SPD manifolds with the LE geometry [3], one recovers precisely the operations of [3] that give these manifolds a vector space structure.
In the case of Grassmann manifolds, we obtain gyrocommutative and gyrononreductive gyrogroups that share remarkable analogies with gyrovector spaces. To the best of our knowledge, our work is the first that studies the structure of Grassmann manifolds under the framework of gyrovector spaces. Our main contributions are: (1) we propose a method for constructing some basic operations, i.e., matrix-matrix addition and scalar-matrix multiplication, on SPD and Grassmann manifolds; (2) we derive compact expressions of these operations for the considered manifolds; (3) we verify the gyro-structure of SPD manifolds, and some axioms of gyrovector spaces for Grassmann manifolds; (4) we showcase our approach on the tasks of human activity understanding and question answering.

## 2 Background

### 2.1 Gyrovector Spaces

Gyrovector spaces form the setting for hyperbolic geometry in the same way that vector spaces form the setting for Euclidean geometry [44, 45, 46]. We first recap the definitions of gyrogroups and gyrocommutative gyrogroups proposed in [44, 45, 46]. For greater mathematical detail and in-depth discussion, we refer the interested reader to these papers.

**Definition 2.1 (Gyrogroups [46]).** A pair $(G, \oplus)$ is a groupoid in the sense that it is a nonempty set, $G$, with a binary operation, $\oplus$. A groupoid $(G, \oplus)$ is a gyrogroup if its binary operation satisfies the following axioms for $a, b, c \in G$:

(G1) There is at least one element $e \in G$ called a left identity such that $e \oplus a = a$.

(G2) There is an element $\ominus a \in G$ called a left inverse of $a$ such that $\ominus a \oplus a = e$.

(G3) There is an automorphism $\mathrm{gyr}[a, b] : G \to G$ for each $a, b \in G$ such that $a \oplus (b \oplus c) = (a \oplus b) \oplus \mathrm{gyr}[a, b]c$ (Left Gyroassociative Law). The automorphism $\mathrm{gyr}[a, b]$ is called the gyroautomorphism, or the gyration of $G$ generated by $a, b$.

(G4) $\mathrm{gyr}[a, b] = \mathrm{gyr}[a \oplus b, b]$ (Left Reduction Property).

**Definition 2.2 (Gyrocommutative Gyrogroups [46]).** A gyrogroup $(G, \oplus)$ is gyrocommutative if it satisfies $a \oplus b = \mathrm{gyr}[a, b](b \oplus a)$ (Gyrocommutative Law).

The following definition of gyrovector spaces is slightly different from Definition 3.2 in [46].

**Definition 2.3 (Gyrovector Spaces).** A gyrocommutative gyrogroup $(G, \oplus)$ equipped with a scalar multiplication $(t, x) \mapsto t \otimes x : \mathbb{R} \times G \to G$ is called a gyrovector space if it satisfies the following axioms for $s, t \in \mathbb{R}$ and $a, b, c \in G$:

(V1) $1 \otimes a = a$, $0 \otimes a = t \otimes e = e$, and $(-1) \otimes a = \ominus a$.

(V2) $(s + t) \otimes a = s \otimes a \oplus t \otimes a$.

(V3) $(st) \otimes a = s \otimes (t \otimes a)$.

(V4) $\mathrm{gyr}[a, b](t \otimes c) = t \otimes \mathrm{gyr}[a, b]c$.

(V5) $\mathrm{gyr}[s \otimes a, t \otimes a] = \mathrm{Id}$, where $\mathrm{Id}$ is the identity map.

Note that the axioms of gyrovector spaces considered in our work are more strict than those proposed in [24, 25, 26, 27]. Thus many results proved in [24, 25, 26, 27] can be applied to our case, which gives rise to interesting applications.

## 3 Proposed Approach

For simplicity of exposition, we concentrate on real matrices. Denote by $M_{n,m}$ the space of $n \times m$ matrices, $\mathrm{Sym}^+_n$ the space of $n \times n$ SPD matrices, $\mathrm{Sym}_n$ the space of $n \times n$ symmetric matrices, $O_n$ the space of $n \times n$ orthogonal matrices, and $Gr_{n,p}$ the set of $p$-dimensional subspaces of $\mathbb{R}^n$. Let $\mathcal{M}$ be a Riemannian homogeneous space and $T_P\mathcal{M}$ the tangent space of $\mathcal{M}$ at $P \in \mathcal{M}$. Denote by $\exp(P)$ and $\log(P)$ the usual matrix exponential and logarithm of $P$, $\mathrm{Exp}_P(W)$ the exponential map at $P$ that associates to a tangent vector $W \in T_P\mathcal{M}$ a point of $\mathcal{M}$, and $\mathrm{Log}_P(Q)$ the logarithmic map of $Q \in \mathcal{M}$ at $P$. Let $T_{P \to Q}(W)$ be the parallel transport of $W$ from $P$ to $Q$ along geodesics connecting $P$ and $Q$.
We will use superscripts for the exponential and logarithmic maps and the parallel transport to indicate their associated Riemannian metric (in the case of SPD manifolds) or the target manifolds (in the case of Grassmann manifolds). The following definitions generalize those of [12] to the matrix manifold setting.

**Definition 3.1.** Let $r$ be a positive real number. The binary operation $P \oplus_r Q$, where $P, Q \in \mathcal{M}$, is obtained by projecting $Q^r$ into the tangent space at the identity element $I \in \mathcal{M}$ with the logarithmic map, computing the parallel transport of this projection from $I$ to $P^r$ along geodesics connecting $I$ and $P^r$, and then projecting it back onto the manifold with the exponential map, i.e.,

$$P \oplus_r Q = \Big(\mathrm{Exp}_{P^r}\big(T_{I \to P^r}(\mathrm{Log}_I(Q^r))\big)\Big)^{\frac{1}{r}}. \tag{1}$$

**Definition 3.2.** The scalar multiplication $t \otimes P$, where $t \in \mathbb{R}$ and $P \in \mathcal{M}$, is obtained by projecting $P$ into the tangent space at the identity element $I \in \mathcal{M}$ with the logarithmic map, multiplying this projection by the scalar $t$ in $T_I\mathcal{M}$, and then projecting it back onto the manifold with the exponential map, i.e.,

$$t \otimes P = \mathrm{Exp}_I\big(t \,\mathrm{Log}_I(P)\big). \tag{2}$$

In addition to the basic operations defined above, we need to determine an automorphism in order to verify the gyro-structure of a given matrix manifold. In the following, we provide such automorphisms and derive compact expressions of the basic operations for SPD and Grassmann manifolds. The obtained expressions ease the task of verifying the axioms of gyrovector spaces. Also, they often lead to simple and efficient implementations of neural networks on the considered manifolds. Furthermore, they enable effective generalizations of some operations on Euclidean spaces, e.g., matrix scaling, to these manifolds (see Sections 4.1.2 and 4.2.2).

### 3.1 Gyrovector Spaces of SPD Matrices

We investigate the gyro-structure of SPD manifolds with the AI and LE geometries in Sections 3.1.1 and 3.1.2, respectively. The AI and LE frameworks are reviewed in the supplemental material. We refer the interested reader to that document for our theoretical results that reveal hidden analogies between SPD manifolds with the AI and LE geometries and Euclidean spaces.

#### 3.1.1 AI Gyrovector Spaces

We first examine SPD manifolds with the AI geometry. Lemma 3.3 gives a compact expression of the binary operation (matrix-matrix addition).

**Lemma 3.3.** For $P, Q \in \mathrm{Sym}^+_n$, the binary operation $P \oplus^r_{ai} Q$ is given by

$$P \oplus^r_{ai} Q = \big(P^{\frac{r}{2}} Q^r P^{\frac{r}{2}}\big)^{\frac{1}{r}}. \tag{3}$$

*Proof.* See the supplemental material.

An implicit assumption in Eq. (3) is that positive definite matrices are taken in all computations related to matrix powers. This assumption is used in all computations in Sections 3.1.1 and 3.1.2. The identity element of $\mathrm{Sym}^+_n$ is the $n \times n$ identity matrix $I_n$. Then, from Eq. (3), the inverse of $P$ is given by $\ominus^r_{ai} P = P^{-1}$.

**Lemma 3.4.** For $P \in \mathrm{Sym}^+_n$ and $t \in \mathbb{R}$, the scalar multiplication $t \otimes_{ai} P$ is given by

$$t \otimes_{ai} P = P^t. \tag{4}$$

*Proof.* See the supplemental material.

**Definition 3.5 (AI Gyrovector Spaces).** Define a binary operation $\oplus^r_{ai}$ and a scalar multiplication $\otimes_{ai}$ by Eqs. (3) and (4), respectively. Define a gyroautomorphism generated by $P$ and $Q$ as

$$\mathrm{gyr}^r_{ai}[P, Q]R = \big(F^r_{ai}(P, Q)\, R^r\, (F^r_{ai}(P, Q))^{-1}\big)^{\frac{1}{r}}, \tag{5}$$

where $F^r_{ai}(P, Q) = \big(P^{\frac{r}{2}} Q^r P^{\frac{r}{2}}\big)^{-\frac{1}{2}} P^{\frac{r}{2}} Q^{\frac{r}{2}}$.

**Theorem 3.6.** Gyrogroups $(\mathrm{Sym}^+_n, \oplus^r_{ai})$ with the scalar multiplication $\otimes_{ai}$ form gyrovector spaces $(\mathrm{Sym}^+_n, \oplus^r_{ai}, \otimes_{ai})$.

*Proof.* See the supplemental material.
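To make these expressions concrete, the following is a minimal NumPy/SciPy sketch of Eqs. (3)-(5), together with a numerical check of the Left Gyroassociative Law (G3). The helper names (`ai_add`, `ai_scale`, `ai_gyr`, `rand_spd`) are ours for illustration and are not taken from the paper's released code.

```python
# A minimal sketch of the AI gyrovector-space operations; helper names are ours.
import numpy as np
from scipy.linalg import fractional_matrix_power as powm

def ai_add(P, Q, r=1.0):
    """Binary operation of Eq. (3): (P^{r/2} Q^r P^{r/2})^{1/r}."""
    Pr2 = powm(P, r / 2.0)
    return powm(Pr2 @ powm(Q, r) @ Pr2, 1.0 / r)

def ai_scale(t, P):
    """Scalar multiplication of Eq. (4): P^t."""
    return powm(P, t)

def ai_gyr(P, Q, R, r=1.0):
    """Gyroautomorphism of Eq. (5): (F R^r F^{-1})^{1/r}."""
    Pr2 = powm(P, r / 2.0)
    F = powm(Pr2 @ powm(Q, r) @ Pr2, -0.5) @ Pr2 @ powm(Q, r / 2.0)
    return powm(F @ powm(R, r) @ np.linalg.inv(F), 1.0 / r)

# Numerical check of the Left Gyroassociative Law (G3) on random SPD matrices.
rng = np.random.default_rng(0)
def rand_spd(n=3):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

P, Q, R = rand_spd(), rand_spd(), rand_spd()
lhs = ai_add(P, ai_add(Q, R))
rhs = ai_add(ai_add(P, Q), ai_gyr(P, Q, R))
print(np.allclose(lhs, rhs, atol=1e-6))  # expected: True
```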
The expressions of the binary operation, scalar multiplication, and gyroautomorphism given in Eqs. (3), (4), and (5) have appeared in [1, 17, 18]. However, these works concern algebraic structures referred to as generalized gyrovector spaces, whose set of axioms is different from the one in Section 2.1. The expressions given in Eq. (5) with $r = 1$ and $r = 2$ have also appeared in [24, 25, 26, 27]. However, no general form of the gyroautomorphism is given in these works.

#### 3.1.2 LE Gyrovector Spaces

The LE metrics have proven to yield similar results as the AI metrics in practice, but with much simpler and faster computations [3]. This motivates us to study the gyro-structure of SPD manifolds with the LE geometry. As far as we know, this topic has not been investigated in previous works. The result given in the following lemma is not trivial, as closed formulae do not exist for the parallel transport associated with the LE geometry.

**Lemma 3.7.** For $P, Q \in \mathrm{Sym}^+_n$, the binary operation $P \oplus^r_{le} Q$ is given by

$$P \oplus^r_{le} Q = \exp\big(\log(P^r) + \log(Q^r)\big)^{\frac{1}{r}}. \tag{6}$$

*Proof.* See the supplemental material.

From Lemma 3.7, the inverse of $P$ is given by $\ominus^r_{le} P = P^{-1}$. Similarly to the scalar multiplication $\otimes_{ai}$, the scalar multiplication $\otimes_{le}$ is constructed from Eq. (2). Lemma 3.8 gives an expression of this operation that is straightforward.

**Lemma 3.8.** For $P \in \mathrm{Sym}^+_n$ and $t \in \mathbb{R}$, the scalar multiplication $t \otimes_{le} P$ is given by

$$t \otimes_{le} P = P^t. \tag{7}$$

**Definition 3.9 (LE Gyrovector Spaces).** Define a binary operation $\oplus^r_{le}$ and a scalar multiplication $\otimes_{le}$ by Eqs. (6) and (7), respectively. Define a gyroautomorphism generated by $P$ and $Q$ as $\mathrm{gyr}^r_{le}[P, Q] = \mathrm{Id}$.

**Theorem 3.10.** Gyrogroups $(\mathrm{Sym}^+_n, \oplus^r_{le})$ with the scalar multiplication $\otimes_{le}$ form gyrovector spaces $(\mathrm{Sym}^+_n, \oplus^r_{le}, \otimes_{le})$.

*Proof.* See the supplemental material.

The conclusion of Theorem 3.10 agrees with [3], which shows that the space of SPD matrices with the LE geometry has a vector space structure. This vector space structure is given by the operations defined in [3], which turn out to be the binary operation and scalar multiplication on LE gyrovector spaces in the specific case where $r = 1$. Indeed, it can be shown that the mapping $\exp : (\mathrm{Sym}_n, +, \cdot) \to (\mathrm{Sym}^+_n, \oplus^r_{le}, \otimes_{le})$ is a vector space isomorphism. Thus, for any $r$, the operations defined above also turn the space of SPD matrices with the LE geometry into a vector space. This generalizes the result of [3].
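For completeness, a corresponding sketch of the LE operations of Eqs. (6)-(7) is given below, with a commutativity check that reflects the vector space structure discussed above. Again, the helper names are illustrative only, and symmetric positive definite inputs are assumed.

```python
# A minimal sketch of the LE operations; helper names are ours.
import numpy as np
from scipy.linalg import expm, logm, fractional_matrix_power as powm

def le_add(P, Q, r=1.0):
    """Binary operation of Eq. (6): exp(log(P^r) + log(Q^r))^{1/r}."""
    return powm(expm(logm(powm(P, r)) + logm(powm(Q, r))), 1.0 / r)

def le_scale(t, P):
    """Scalar multiplication of Eq. (7): P^t."""
    return powm(P, t)

# Since gyr = Id (Definition 3.9), the binary operation is commutative and associative.
rng = np.random.default_rng(1)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
P, Q = A @ A.T + 3 * np.eye(3), B @ B.T + 3 * np.eye(3)
print(np.allclose(le_add(P, Q), le_add(Q, P)))  # expected: True
```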
### 3.2 Grassmann Gyrocommutative and Gyrononreductive Gyrogroups

The previous sections concern the non-positively curved spaces of SPD matrices. In this section, we focus on another family of matrix manifolds, i.e., the non-negatively curved Grassmann manifolds. We adopt the following definition for Grassmann manifolds: $Gr_{n,p} = \{P \in M_{n,n} \,|\, P^T = P,\ P^2 = P,\ \mathrm{rank}(P) = p\}$. The Riemannian geometry of Grassmann manifolds is reviewed in the supplemental material. Let $I_{n,p} = \begin{bmatrix} I_p & 0 \\ 0 & 0 \end{bmatrix}$ be the identity element of $Gr_{n,p}$. For the sake of convenience, we denote $\bar{P} = \mathrm{Log}^{gr}_{I_{n,p}}(P)$. For $P, Q \in Gr_{n,p}$, assuming that $I_{n,p}$ and $P$ are not in each other's cut locus (see the supplemental material for a definition of the cut locus of Grassmann manifolds), and $I_{n,p}$ and $Q$ are not in each other's cut locus, the binary operation is defined as $P \oplus_{gr} Q = \mathrm{Exp}^{gr}_P\big(T^{gr}_{I_{n,p} \to P}(\mathrm{Log}^{gr}_{I_{n,p}}(Q))\big)$.

**Lemma 3.11.** For $P, Q \in Gr_{n,p}$, the binary operation $P \oplus_{gr} Q$ is given by

$$P \oplus_{gr} Q = \exp([\bar{P}, I_{n,p}])\, Q \,\exp(-[\bar{P}, I_{n,p}]), \tag{8}$$

where $[\cdot, \cdot]$ denotes the matrix commutator.

*Proof.* See the supplemental material.

Lemma 3.12 gives a closed-form expression of the binary operation in terms of $P$ and $Q$.

**Lemma 3.12.** For $P, Q \in Gr_{n,p}$, the binary operation $P \oplus_{gr} Q$ is given by

$$P \oplus_{gr} Q = \exp\Big(\tfrac{1}{2} \log\big((I_n - 2P)(I_n - 2I_{n,p})\big)\Big)\, Q \,\exp\Big(-\tfrac{1}{2} \log\big((I_n - 2P)(I_n - 2I_{n,p})\big)\Big).$$

*Proof.* See the supplemental material.

For $P \in Gr_{n,p}$ such that $I_{n,p}$ and $P$ are not in each other's cut locus, the inverse $\ominus_{gr} P$ of $P$ is defined as $\ominus_{gr} P = \mathrm{Exp}^{gr}_{I_{n,p}}\big(-\mathrm{Log}^{gr}_{I_{n,p}}(P)\big)$. The scalar multiplication is defined as in Eq. (2) (subject to the assumption stated above).

**Lemma 3.13.** For $P \in Gr_{n,p}$ and $t \in \mathbb{R}$, the scalar multiplication $t \otimes_{gr} P$ is given by

$$t \otimes_{gr} P = \exp([t\bar{P}, I_{n,p}])\, I_{n,p} \,\exp(-[t\bar{P}, I_{n,p}]). \tag{9}$$

*Proof.* See the supplemental material.

Lemma 3.14 gives a closed-form expression of the scalar multiplication in terms of $t$ and $P$.

**Lemma 3.14.** For $P \in Gr_{n,p}$ and $t \in \mathbb{R}$, the scalar multiplication $t \otimes_{gr} P$ is given by

$$t \otimes_{gr} P = \exp\Big(\tfrac{t}{2} \log\big((I_n - 2P)(I_n - 2I_{n,p})\big)\Big)\, I_{n,p} \,\exp\Big(-\tfrac{t}{2} \log\big((I_n - 2P)(I_n - 2I_{n,p})\big)\Big).$$

*Proof.* See the supplemental material.

Grassmann manifolds, when equipped with the above operations, do not form gyrovector spaces. However, as we will show later, they still form spaces that verify most of the axioms of gyrovector spaces. Note that since these operations can only be defined under the assumptions stated at the beginning of this section, all the axioms in Section 2.1 must be considered under these assumptions. We thus leave them implicit in the following. In order to specify the structure of Grassmann manifolds, we need to define some new algebraic structures that are generalizations of groups (Definitions 3.15 and 3.16) and vector spaces (Definition 3.17).

**Definition 3.15 (Nonreductive Gyrogroups).** A groupoid $(G, \oplus)$ is a nonreductive gyrogroup if its binary operation satisfies axioms (G1), (G2), and (G3).

**Definition 3.16 (Gyrocommutative and Gyrononreductive Gyrogroups).** A nonreductive gyrogroup $(G, \oplus)$ is gyrocommutative and gyrononreductive if it satisfies the Gyrocommutative Law.

**Definition 3.17 (Nonreductive Gyrovector Spaces).** A gyrocommutative and gyrononreductive gyrogroup $(G, \oplus)$ equipped with a scalar multiplication $\otimes$ is called a nonreductive gyrovector space if it satisfies axioms (V1), (V2), (V3), (V4), and (V5).

Definition 3.18 gives the expression of a gyroautomorphism on Grassmann manifolds.

**Definition 3.18 (Grassmann Gyrocommutative and Gyrononreductive Gyrogroups).** Define a binary operation $\oplus_{gr}$ and a scalar multiplication $\otimes_{gr}$ by Eqs. (8) and (9), respectively. For $P, Q \in Gr_{n,p}$, assuming that $I_{n,p}$ and $P$ are not in each other's cut locus, $I_{n,p}$ and $Q$ are not in each other's cut locus, and $I_{n,p}$ and $P \oplus_{gr} Q$ are not in each other's cut locus, a gyroautomorphism generated by $P$ and $Q$ can be defined as

$$\mathrm{gyr}_{gr}[P, Q]R = F_{gr}(P, Q)\, R \,(F_{gr}(P, Q))^{-1}, \tag{10}$$

where $F_{gr}(P, Q)$ is given by

$$F_{gr}(P, Q) = \exp(-[\overline{P \oplus_{gr} Q}, I_{n,p}])\, \exp([\bar{P}, I_{n,p}])\, \exp([\bar{Q}, I_{n,p}]). \tag{11}$$
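A minimal NumPy/SciPy sketch of these Grassmann operations is given below, using the closed forms of Lemmas 3.12 and 3.14 and the gyration of Eqs. (10)-(11). Points are represented as $n \times n$ rank-$p$ projection matrices; the real part of the matrix logarithm is taken under the assumption that the cut-locus conditions above hold (so the principal logarithm is real), and the helper names are ours, not from the paper's code.

```python
# A minimal sketch of the Grassmann operations; helper names are ours.
import numpy as np
from scipy.linalg import expm, logm

def identity_element(n, p):
    """I_{n,p} = diag(I_p, 0)."""
    E = np.zeros((n, n)); E[:p, :p] = np.eye(p); return E

def commutator_with_identity(P, Inp):
    """[P_bar, I_{n,p}] = 0.5 * log((I - 2P)(I - 2I_{n,p})); real under the cut-locus assumption."""
    n = P.shape[0]
    return 0.5 * np.real(logm((np.eye(n) - 2 * P) @ (np.eye(n) - 2 * Inp)))

def gr_add(P, Q, Inp):
    """Binary operation of Eq. (8) / Lemma 3.12."""
    K = commutator_with_identity(P, Inp)
    return expm(K) @ Q @ expm(-K)

def gr_scale(t, P, Inp):
    """Scalar multiplication of Eq. (9) / Lemma 3.14."""
    K = commutator_with_identity(P, Inp)
    return expm(t * K) @ Inp @ expm(-t * K)

def gr_gyr(P, Q, R, Inp):
    """Gyroautomorphism of Eqs. (10)-(11)."""
    F = expm(-commutator_with_identity(gr_add(P, Q, Inp), Inp)) \
        @ expm(commutator_with_identity(P, Inp)) \
        @ expm(commutator_with_identity(Q, Inp))
    return F @ R @ np.linalg.inv(F)

# Left identity check (G1): I_{n,p} + Q = Q for a random rank-p projection Q.
n, p = 4, 2
Inp = identity_element(n, p)
U, _ = np.linalg.qr(np.random.default_rng(2).standard_normal((n, p)))
Q = U @ U.T
print(np.allclose(gr_add(Inp, Q, Inp), Q))  # expected: True
```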
The following identities are required for the verification of some axioms of nonreductive gyrovector spaces for Grassmann manifolds subject to certain conditions (Theorem 3.20). These identities are new to the best of our knowledge.

**Lemma 3.19.** For $P, F \in Gr_{n,p}$, $O \in O_n$, $W \in T_P Gr_{n,p}$, $m \in \mathbb{N}$, and $s \in \mathbb{R}$, the following identities hold:

1. $\mathrm{Log}^{gr}_{OFO^T}(OPO^T) = O\, \mathrm{Log}^{gr}_F(P)\, O^T$;
2. $\mathrm{Exp}^{gr}_{OPO^T}(OWO^T) = O\, \mathrm{Exp}^{gr}_P(W)\, O^T$;
3. $[W, P]^m = (I_n - 2P)\,(-[W, P])^m\,(I_n - 2P)$;
4. $\exp(s[W, P]) = (I_n - 2P)\, \exp(-s[W, P])\, (I_n - 2P)$.

*Proof.* See the supplemental material.

We are now ready to state the main results of this section.

**Theorem 3.20.** Groupoids $(Gr_{n,p}, \oplus_{gr})$ form gyrocommutative and gyrononreductive gyrogroups. Furthermore, gyrocommutative and gyrononreductive gyrogroups $(Gr_{n,p}, \oplus_{gr})$ with the scalar multiplication $\otimes_{gr}$ satisfy axioms (V1) and (V4), and axioms (V2), (V3), and (V5) under the following conditions:

(V2) $(s + t) \otimes_{gr} P = s \otimes_{gr} P \oplus_{gr} t \otimes_{gr} P$ for $P \in Gr_{n,p}$, $t \in \mathbb{R}$, and $s \in \mathbb{R}$ such that
$$|s| < \min_{\lambda_i} \Big\{ \frac{\pi}{2|\operatorname{Im}(\lambda_i)|} \Big\},$$
where $\lambda_i$ is an eigenvalue of $[\bar{P}, I_{n,p}]$, and $\operatorname{Im}(\lambda_i)$ is the imaginary part of $\lambda_i$.

(V3) $(st) \otimes_{gr} P = s \otimes_{gr} (t \otimes_{gr} P)$ for $P \in Gr_{n,p}$, $s \in \mathbb{R}$, and $t \in \mathbb{R}$ such that
$$|t| < \min_{\lambda_i} \Big\{ \frac{\pi}{2|\operatorname{Im}(\lambda_i)|} \Big\}.$$

(V5) $\mathrm{gyr}_{gr}[s \otimes_{gr} P, t \otimes_{gr} P] = \mathrm{Id}$ for $P \in Gr_{n,p}$ and $s, t \in \mathbb{R}$ such that
$$\max\{|s|, |t|, |s + t|\} < \min_{\lambda_i} \Big\{ \frac{\pi}{2|\operatorname{Im}(\lambda_i)|} \Big\}.$$

*Proof.* See the supplemental material.

The following corollaries are useful in practice for the verification of axioms (V2), (V3), and (V5).

**Corollary 3.21.** Gyrocommutative and gyrononreductive gyrogroups $(Gr_{n,p}, \oplus_{gr})$ with the scalar multiplication $\otimes_{gr}$ satisfy axioms (V2), (V3), and (V5) if the right-hand sides of the inequalities stated in Theorem 3.20 are replaced with $\frac{\pi}{2\,\|[\bar{P}, I_{n,p}]\|}$, where $\|\cdot\|$ denotes the Hilbert-Schmidt norm.

*Proof.* See the supplemental material.

**Corollary 3.22.** Gyrocommutative and gyrononreductive gyrogroups $(Gr_{n,p}, \oplus_{gr})$ with the scalar multiplication $\otimes_{gr}$ satisfy axioms (V2), (V3), and (V5) under the following conditions: (V2) $|s| \le 1$; (V3) $|t| \le 1$; (V5) $\max\{|s|, |t|, |s + t|\} \le 1$.

*Proof.* See the supplemental material.

It follows from Corollary 3.22 that if $\max\{|s|, |t|, |s + t|\} \le 1$, then Grassmann manifolds, when equipped with the basic operations defined in Eqs. (8) and (9), verify all the axioms of nonreductive gyrovector spaces.

## 4 Applications

To showcase our approach, we propose new methods for human activity understanding and question answering. We refer the interested reader to the supplemental material for more applications.

### 4.1 Human Activity Understanding

In this section, we develop a class of RNNs on SPD manifolds for human activity understanding. It is worth mentioning that the operations defined in Section 3.1, as well as those constructed in Section 4.1.2, can be used to build any type of neural network on SPD manifolds, e.g., convolutional neural networks. However, since RNNs are based on update equations that involve the basic operations on matrices, they are well suited for validating our approach.

#### 4.1.1 Problem Formulation

Human activities can be recognized from low/mid-level features [35, 36] or high-level poses [23, 32, 33, 34]. We use 3D skeleton data as they have been shown to outperform low/mid-level features for the considered task [23]. The goal is to build a model that, for each sequence of 3D positions of body (hand) joints, identifies the action performed by the person (or group of persons) in the sequence.

#### 4.1.2 Proposed Method

We will make use of the basic operations on AI and LE gyrovector spaces and the concept of gyroderivative in these spaces (see the supplemental material), which is similar to the hyperbolic derivative [4] in Möbius gyrovector spaces. We also need to generalize some operations of Euclidean RNNs to the SPD manifold setting. Here we focus on two operations, i.e., matrix scaling and pointwise nonlinearity; a short code sketch follows the two definitions below.

**Matrix Scaling.** If $P \in \mathrm{Sym}^+_n$, $W \in \mathbb{R}^n$, $W > 0$, then the matrix scaling $W \odot^v_{spd} P$ is given by $W \odot^v_{spd} P = U \operatorname{diag}(W \odot V) U^T$, where $U \operatorname{diag}(V) U^T$ is the eigenvalue decomposition of $P$, and $W \odot V$ is the element-wise multiplication.

**Pointwise Nonlinearity.** If $\varphi$ is a pointwise nonlinear activation function, then the pointwise nonlinearity $\varphi^a(P)$ is given by $\varphi^a(P) = U \operatorname{diag}(\max(\epsilon I, \varphi(V))) U^T$, where $\epsilon > 0$ is a rectification threshold, and $U \operatorname{diag}(V) U^T$ is the eigenvalue decomposition of $P$.
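A minimal NumPy sketch of the two operations just defined is given below; `np.tanh` is used only as a stand-in for the activation $\varphi$, which is not fixed here, and the helper names are illustrative.

```python
# A minimal sketch of the SPD matrix scaling and pointwise nonlinearity; names are ours.
import numpy as np

def spd_matrix_scaling(w, P):
    """W (.)^v_spd P = U diag(w * v) U^T, where P = U diag(v) U^T and w > 0 element-wise."""
    v, U = np.linalg.eigh(P)
    return U @ np.diag(w * v) @ U.T

def spd_pointwise_nonlinearity(P, phi=np.tanh, eps=1e-4):
    """phi^a(P) = U diag(max(eps, phi(v))) U^T; eigenvalues below eps are rectified to keep the output SPD."""
    v, U = np.linalg.eigh(P)
    return U @ np.diag(np.maximum(eps, phi(v))) @ U.T
```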
By adapting a class of models that are invariant to time rescaling [42] to the SPD manifold setting (see the supplemental material), we obtain the following update equations for our models:

$$P_t = \varphi^a\big(W_h \odot^v_{spd} H_{t-1} + W_x \odot^v_{spd} X_t\big), \tag{12}$$

$$H_t = H_{t-1} \oplus \alpha \otimes \big((\ominus H_{t-1}) \oplus P_t\big), \tag{13}$$

where $X_t \in \mathrm{Sym}^+_n$ is the input at frame $t$, $H_{t-1}, H_t \in \mathrm{Sym}^+_n$ are the hidden states at frames $t-1$ and $t$, respectively, $W_h, W_x \in \mathbb{R}^n$, and $\alpha \in \mathbb{R}$ is a learnable parameter.

#### 4.1.3 Implementation Details

In order to retain the correlation of neighboring joints [5, 49] and to increase the feature interactions encoded by covariance matrices, we first identify a closest left (right) neighbor of every joint based on their distance to the hip (wrist) joint (for joints having more than two neighbors, one of them can be chosen), and then combine the 3D coordinates of each joint and those of its left (right) neighbor to create a feature vector for the joint. For a given frame $t$, a mean vector $\mu_t$ and a covariance matrix $\Sigma_t$ are computed from the set of feature vectors of the frame and then combined [30] to create an SPD matrix as

$$Y_t = \begin{bmatrix} \Sigma_t + \mu_t \mu_t^T & \mu_t \\ \mu_t^T & 1 \end{bmatrix}.$$

The lower triangular part of the matrix $\log(Y_t)$ is flattened to obtain a vector $v_t$. All vectors $v_i$ within a time window $[t, t + c - 1]$, where $c$ is a constant, are used to compute a covariance matrix as $Z_t = \frac{1}{c} \sum_{i=t}^{t+c-1} (v_i - \bar{v}_t)(v_i - \bar{v}_t)^T$, where $\bar{v}_t = \frac{1}{c} \sum_{i=t}^{t+c-1} v_i$. Matrix $Z_t$ is then the input data at frame $t$ of the networks. Our network GyroAI-HAUNet is illustrated in Fig. 1a. For classification, the network output is projected to the tangent space at the identity matrix using the logarithmic map. The lower triangular part of the resulting matrix is flattened and then fed to a fully-connected layer.
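The per-frame SPD input construction described above can be sketched as follows; the covariance estimator and the small regularizer added for numerical stability are our choices, not specified in the paper, and the helper names are illustrative.

```python
# A minimal sketch of the per-frame SPD input construction of Section 4.1.3; names are ours.
import numpy as np
from scipy.linalg import logm

def gaussian_to_spd(feats):
    """feats: (num_joints, d) frame features -> (d+1) x (d+1) SPD matrix Y_t = [[Sigma + mu mu^T, mu], [mu^T, 1]]."""
    mu = feats.mean(axis=0)
    sigma = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])  # regularizer: our choice
    top = np.hstack([sigma + np.outer(mu, mu), mu[:, None]])
    bottom = np.hstack([mu[None, :], np.ones((1, 1))])
    return np.vstack([top, bottom])

def half_vectorize(Y):
    """Flatten the lower triangular part of log(Y_t) into a vector v_t."""
    L = np.real(logm(Y))
    return L[np.tril_indices_from(L)]

def window_covariance(frame_feats, c):
    """Z_t = (1/c) * sum_i (v_i - v_bar)(v_i - v_bar)^T over a window of c frames."""
    V = np.stack([half_vectorize(gaussian_to_spd(f)) for f in frame_feats[:c]])
    Vc = V - V.mean(axis=0)
    return (Vc.T @ Vc) / c
```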
Figure 1: Our proposed architectures for human activity understanding and question answering. (a) Architecture of GyroAI-HAUNet (d = 6). (b) Architecture of GyroGR-QANet, comprising an embedding layer, a projection layer, token/question/answer embeddings, and a similarity scoring layer. (Figure not reproduced here.)

Table 1: Accuracy comparison (%) of our networks against state-of-the-art SPD neural networks. Rows prefixed with # report model sizes.

| Dataset | SPDNet M=1 | SPDNet M=3 | SPDNetBN M=1 | SPDNetBN M=3 | SPD-SRU M=1 | SPD-SRU M=3 | GyroAI-HAUNet M=1 | GyroAI-HAUNet M=3 | GyroLE-HAUNet M=1 | GyroLE-HAUNet M=3 |
|---|---|---|---|---|---|---|---|---|---|---|
| HDM05 | 58.44 | 72.75 | 62.54 | 76.25 | 42.26 | 54.79 | 61.50 | 78.14 | 57.01 | 74.53 |
| #HDM05 | 0.12 MB | 0.77 MB | 0.13 MB | 0.82 MB | 0.05 MB | 0.32 MB | 0.05 MB | 0.31 MB | 0.05 MB | 0.31 MB |
| FPHA | 87.65 | 88.17 | 88.52 | 91.83 | 78.57 | 85.16 | 89.73 | 96.00 | 83.03 | 89.94 |
| #FPHA | 0.04 MB | 0.28 MB | 0.05 MB | 0.31 MB | 0.019 MB | 0.114 MB | 0.018 MB | 0.110 MB | 0.018 MB | 0.110 MB |
| NTU60 | 73.26 | 77.38 | 75.84 | 79.52 | 66.25 | 75.34 | 83.12 | 94.72 | 77.25 | 89.44 |
| #NTU60 | 0.03 MB | 0.20 MB | 0.04 MB | 0.28 MB | 0.004 MB | 0.027 MB | 0.004 MB | 0.026 MB | 0.004 MB | 0.026 MB |

#### 4.1.4 Results

We use three datasets, i.e., HDM05 [16], FPHA [13], and NTU RGB+D 60 (NTU60) [38]. These datasets include three different types of activities: body actions (HDM05), hand actions (FPHA), and interaction actions (NTU60). In the same spirit as previous works [6, 20, 40], we are interested in comparing manifold networks with a focus on SPD manifolds, and do not necessarily seek state-of-the-art performance for the target task. We use a temporal pyramid representation for each sequence. At temporal pyramid $M$, a given sequence is partitioned into $M$ subsequences of equal size. Each subsequence is then fed to a model with its own parameter set. The outputs from all the models are concatenated to create a final representation of the sequence. We run each model three times and report the best accuracy from these three runs [12]. The experimental settings can be found in the supplemental material. This document also reports the mean accuracies and standard deviations of some representative methods, an ablation study, and a comparison of our networks against Euclidean RNNs, transformers, HNNs, and graph neural networks (GNNs).

**Comparison against SPD Neural Networks.** Our networks, referred to as GyroAI-HAUNet and GyroLE-HAUNet, are compared against SPDNet [20], SPDNetBN [6], and SPD-SRU [8]. Results of these networks are obtained using their official code (https://github.com/zhiwu-huang/SPDNet, https://papers.nips.cc/paper/2019/hash/6e69ebbfad976d4637bb4b39de261bf7-Abstract.html, and https://github.com/zhenxingjian/SPD-SRU/tree/master) with default parameter settings. Results for $M = 1$ and $M = 3$ are given in Tab. 1 (for all the networks, setting $M > 3$ did not yield better results on the three datasets). On the HDM05 dataset, GyroAI-HAUNet outperforms SPDNet and SPD-SRU, and performs worse than SPDNetBN when $M = 1$. However, when $M = 3$, GyroAI-HAUNet gives the best result among the competing networks. On the FPHA and NTU60 datasets, GyroAI-HAUNet gives the best results for both $M = 1$ and $M = 3$. In terms of model size, GyroAI-HAUNet and GyroLE-HAUNet require far fewer parameters than SPDNet and SPDNetBN in all cases. For example, on the FPHA dataset, when $M = 3$, GyroAI-HAUNet outperforms SPDNetBN by 4.17% with 2.8 times fewer parameters. Also, on the NTU60 dataset, when $M = 3$, GyroAI-HAUNet outperforms SPDNetBN by 15.20% with 10.7 times fewer parameters.

### 4.2 Question Answering

In this section, we consider learning word embeddings in $Gr_{n,p}$. We also investigate the use of product manifolds $Gr_{n_1,p} \times \mathrm{Sym}^+_{n_2}$ for training word embeddings. This idea is mainly motivated by the work of [14], which shows the efficacy of mixed-curvature representations for graph embeddings.

#### 4.2.1 Problem Formulation

Let $Q$ be a list of questions and $A$ be a list of answers to the questions in $Q$. Each question $q \in Q$ has a list of candidate answers in $A$. The candidate set comes with relevancy judgements, where answers that are correct (positive) have labels equal to 1, and 0 otherwise. The goal is to build a model that, for each query $q$ and its list of candidate answers, generates an optimal ranking such that correct answers appear at the top of the list [29, 43].

#### 4.2.2 Proposed Method

The core idea is to learn a scoring function [29, 43] given as $\varphi_{qa}(q, a) = w_f d(Q, A) + w_b$, where $Q$ and $A$ are embeddings of question $q$ and answer $a$, respectively, $w_f, w_b \in \mathbb{R}$ are parameters of the model, and $d(\cdot, \cdot)$ is a distance function on the target manifolds. We define a matrix scaling operation as follows.

**Matrix Scaling.** We parameterize a point $P \in Gr_{n,p}$ by a matrix $B \in M_{p,n-p}$ such that $\begin{bmatrix} 0 & B \\ -B^T & 0 \end{bmatrix} = [\bar{P}, I_{n,p}]$. Then point $P$ can be computed (see Eq. (8) and the supplemental material) by

$$P = \exp([\bar{P}, I_{n,p}])\, I_{n,p} \,\exp(-[\bar{P}, I_{n,p}]) = \exp\left(\begin{bmatrix} 0 & B \\ -B^T & 0 \end{bmatrix}\right) I_{n,p} \exp\left(-\begin{bmatrix} 0 & B \\ -B^T & 0 \end{bmatrix}\right).$$

The matrix scaling $\odot^m_{gr}$ is defined as

$$A \odot^m_{gr} P = \exp\left(\begin{bmatrix} 0 & A \odot B \\ -(A \odot B)^T & 0 \end{bmatrix}\right) I_{n,p} \exp\left(-\begin{bmatrix} 0 & A \odot B \\ -(A \odot B)^T & 0 \end{bmatrix}\right),$$

where $A \in M_{p,n-p}$ and $A \odot B$ is the element-wise multiplication.

The embedding of question $q$ is computed from those of its tokens as $Q = B \oplus_{gr} (S \odot^m_{gr} T_1) \oplus_{gr} (S \odot^m_{gr} T_2) \oplus_{gr} \ldots \oplus_{gr} (S \odot^m_{gr} T_l)$, where $T_i \in Gr_{n,p}$, $i = 1, \ldots, l$, are the embeddings of the tokens in question $q$, and $S \in M_{p,n-p}$ and $B \in Gr_{n,p}$ are parameters of the model. The embedding of answer $a$ is the summation of the embeddings of its tokens using the operation $\oplus_{gr}$. For $P, Q \in Gr_{n,p}$, the distance function $d(\cdot, \cdot)$ is defined as $d(P, Q) = d_{gr}(P, Q) = \|P - Q\|_F$.
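The sketch below illustrates the Grassmann matrix scaling and the accumulation of the question embedding. For illustration, token embeddings are represented directly by their $B$ blocks in $M_{p,n-p}$, the bias parameter $B$ by a point, and the chained sum is evaluated left to right; these representation and parenthesization choices, as well as the helper names, are our assumptions rather than the paper's implementation.

```python
# A minimal sketch of the Grassmann scaling and question-embedding aggregation; names are ours.
import numpy as np
from scipy.linalg import expm, logm

def gr_add(P, Q, Inp):
    """Grassmann binary operation (Lemma 3.12), repeated here so the sketch is self-contained."""
    n = P.shape[0]
    K = 0.5 * np.real(logm((np.eye(n) - 2 * P) @ (np.eye(n) - 2 * Inp)))
    return expm(K) @ Q @ expm(-K)

def block_to_point(B_blk, Inp):
    """Map B in M_{p,n-p} to P = exp(K) I_{n,p} exp(-K) with K = [[0, B], [-B^T, 0]]."""
    p, q = B_blk.shape
    K = np.zeros((p + q, p + q))
    K[:p, p:] = B_blk
    K[p:, :p] = -B_blk.T
    return expm(K) @ Inp @ expm(-K)

def gr_matrix_scaling(A, B_blk, Inp):
    """A (.)^m_gr P: scale the block B element-wise by A, then map back to the manifold."""
    return block_to_point(A * B_blk, Inp)

def question_embedding(token_blocks, S, B_point, Inp):
    """Q = B + (S *_gr T_1) + (S *_gr T_2) + ..., accumulated left to right with gr_add."""
    Q = B_point
    for T_blk in token_blocks:
        Q = gr_add(Q, gr_matrix_scaling(S, T_blk, Inp), Inp)
    return Q
```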
We also train word embeddings in the product manifold $Gr_{n_1,p} \times \mathrm{Sym}^+_{n_2}$ using the distance function $d((P_{gr}, P_{spd}), (Q_{gr}, Q_{spd})) = d_{gr}(P_{gr}, Q_{gr}) + \tau\, d^{F_1}_{spd}(P_{spd}, Q_{spd})$, where $P_{gr}, Q_{gr} \in Gr_{n_1,p}$, $P_{spd}, Q_{spd} \in \mathrm{Sym}^+_{n_2}$, $d^{F_1}_{spd}(\cdot, \cdot)$ is the Finsler distance function proposed in [29], and $\tau$ is a constant. This results in three models based on the scaling, rotation, and reflection transformations of [29]. Our Grassmann model is shown in Fig. 1b.

#### 4.2.3 Results

We use two datasets, i.e., TrecQA [47] (clean version) and WikiQA [50]. The mean average precision (MAP) and mean reciprocal rank (MRR) are used as evaluation metrics [43]. We compare our models (code available at https://github.com/spratmnt/qa) against the SPD models of [29]. Since the codes of these models are not publicly available, we implemented them by following closely the instructions in [29]. The implementation details and experimental settings can be found in the supplemental material.

Table 2: Comparison of our models against the SPD models of [29]. The SPD models learn embeddings in $\mathrm{Sym}^+_{14}$. Our Grassmann model learns embeddings in $Gr_{14,7}$. Our models based on the product manifold learn embeddings in $Gr_{14,7} \times \mathrm{Sym}^+_8$. Results are computed over three runs. DOF stands for degrees of freedom; training and test times are in seconds on TrecQA.

| DOF | Model | TrecQA MAP | TrecQA MRR | WikiQA MAP | WikiQA MRR | Train/epoch (s) | Test (s) |
|---|---|---|---|---|---|---|---|
| 105 | SPD-R Sca | 47.77 ± 0.18 | 57.54 ± 0.32 | 59.42 ± 0.04 | 60.57 ± 0.05 | 15.35 | 1.41 |
| 105 | SPD-F1 Sca | 48.35 ± 1.24 | 57.64 ± 2.23 | 60.68 ± 0.42 | 62.01 ± 0.34 | 15.56 | 1.43 |
| 105 | SPD-R Rot | 48.07 ± 0.89 | 54.49 ± 0.87 | 59.37 ± 1.25 | 60.74 ± 1.50 | 15.98 | 1.41 |
| 105 | SPD-F1 Rot | 49.02 ± 1.62 | 58.72 ± 1.85 | 60.60 ± 0.67 | 62.28 ± 0.74 | 16.09 | 1.43 |
| 105 | SPD-R Ref | 48.79 ± 0.91 | 57.12 ± 1.23 | 59.07 ± 0.53 | 60.58 ± 0.52 | 15.98 | 1.41 |
| 105 | SPD-F1 Ref | 48.33 ± 0.48 | 56.18 ± 0.97 | 59.93 ± 0.18 | 61.80 ± 0.39 | 16.09 | 1.43 |
| 49 | GyroGR-QANet | 50.18 ± 1.29 | 58.19 ± 2.59 | 56.69 ± 1.45 | 58.26 ± 1.45 | 3.69 | 0.25 |
| 85 | GyroGR-SPD-F1-Sca-QANet | 50.10 ± 0.30 | 57.70 ± 0.93 | 60.62 ± 0.25 | 62.42 ± 0.16 | 11.62 | 0.90 |
| 85 | GyroGR-SPD-F1-Rot-QANet | 50.27 ± 0.56 | 58.62 ± 1.35 | 59.78 ± 0.15 | 61.66 ± 0.23 | 11.72 | 0.90 |
| 85 | GyroGR-SPD-F1-Ref-QANet | 48.83 ± 1.89 | 58.11 ± 0.87 | 60.41 ± 0.39 | 61.86 ± 0.35 | 11.72 | 0.90 |

Tab. 2 reports the means and standard deviations of MAP and MRR from three runs. Our Grassmann model GyroGR-QANet performs favorably against most of the SPD models on the TrecQA dataset. Also, the results of our models based on the product manifold show the efficacy of mixed-curvature representations [14] in question answering. Note that the numbers of DOF of our models are smaller than those of the SPD models. We also note that techniques in [19, 28] could potentially improve the performance of our models.

## 5 Limitations of Our Work

To develop our SPD models, we only construct the basic operations and two other operations, i.e., matrix scaling and pointwise nonlinearity (see Section 4.1.2). Other operations [39] should also be designed in order to improve the representation power of our networks. Also, the question of how to generalize a broader class of DNNs to the SPD manifold setting should be addressed in future work. Our experiments for question answering have shown the usefulness of product manifolds $Gr_{n_1,p} \times \mathrm{Sym}^+_{n_2}$ for learning word, entity, and relation embeddings. More investigation is needed to see whether a product of multiple smaller-dimension SPD and Grassmann manifolds will improve performance [14]. It would also be interesting to find patterns that show the relationship between the performance of the embeddings and the theoretical curvature of the manifolds in our problems [9].
## 6 Conclusion

We have shown that the AI and LE geometries of SPD manifolds have a strong connection to hyperbolic geometry, and that Grassmann manifolds share very similar properties with gyrovector spaces. We have presented new methods for generalizing Euclidean neural networks to the SPD and Grassmann manifold settings. Our experimental results on human activity understanding and question answering have demonstrated the effectiveness of the proposed approach.

## Acknowledgments

We are grateful for the constructive comments and feedback from the anonymous reviewers. We thank Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang, Guillermo Garcia-Hernando, Shanxin Yuan, Seungryul Baek, and Tae-Kyun Kim for giving us access to the NTU RGB+D and FPHA datasets.

## References

[1] T. Abe and O. Hatori. Generalized Gyrovector Spaces and a Mazur-Ulam Theorem. Publicationes Mathematicae Debrecen, 87:393-413, 2015.
[2] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2007.
[3] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache. Fast and Simple Computations on Tensors with Log-Euclidean Metrics. Technical Report RR-5584, INRIA, 2005.
[4] G. S. Birman and A. A. Ungar. The Hyperbolic Derivative in the Poincaré Ball Model of Hyperbolic Geometry. Journal of Mathematical Analysis and Applications, 254(1):321-333, 2001.
[5] Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. Learning Mid-level Features for Recognition. In CVPR, pages 2559-2566, 2010.
[6] D. A. Brooks, O. Schwander, F. Barbaresco, J.-Y. Schneider, and M. Cord. Riemannian Batch Normalization for SPD Neural Networks. In NeurIPS, pages 15463-15474, 2019.
[7] R. Chakraborty, J. Bouza, J. Manton, and B. C. Vemuri. ManifoldNet: A Deep Neural Network for Manifold-valued Data with Applications. TPAMI, 44(2):799-810, 2020.
[8] R. Chakraborty, C.-H. Yang, X. Zhen, M. Banerjee, D. Archer, D. E. Vaillancourt, V. Singh, and B. C. Vemuri. A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices. In NeurIPS, pages 8897-8908, 2018.
[9] C. Cruceru, G. Bécigneul, and O.-E. Ganea. Computationally Tractable Riemannian Manifolds for Graph Embeddings. In AAAI, pages 7133-7141, 2021.
[10] Z. Dong, S. Jia, C. Zhang, M. Pei, and Y. Wu. Deep Manifold Learning of Symmetric Positive Definite Matrices with Application to Face Recognition. In AAAI, pages 4009-4015, 2017.
[11] M. Engin, L. Wang, L. Zhou, and X. Liu. DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition. In ECCV, pages 629-645, 2018.
[12] O. Ganea, G. Bécigneul, and T. Hofmann. Hyperbolic Neural Networks. In NeurIPS, pages 5350-5360, 2018.
[13] G. Garcia-Hernando, S. Yuan, S. Baek, and T.-K. Kim. First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations. In CVPR, pages 409-419, 2018.
[14] A. Gu, F. Sala, B. Gunel, and C. Ré. Learning Mixed-Curvature Representations in Products of Model Spaces. In ICLR, 2019.
[15] M. Harandi, M. Salzmann, and R. Hartley. Dimensionality Reduction on SPD Manifolds: The Emergence of Geometry-Aware Methods. TPAMI, 40:48-62, 2018.
[16] M. T. Harandi, M. Salzmann, and R. Hartley. From Manifold to Manifold: Geometry-Aware Dimensionality Reduction for SPD Matrices. In ECCV, pages 17-32, 2014.
[17] O. Hatori. Examples and Applications of Generalized Gyrovector Spaces. Results in Mathematics, 71:295-317, 2017.
[18] O. Hatori. Extension of Isometries in Generalized Gyrovector Spaces of the Positive Cones. Contemporary Mathematics - American Mathematical Society, 687:145-156, 2017.
[19] K. Helfrich, D. Willmott, and Q. Ye. Orthogonal Recurrent Neural Networks with Scaled Cayley Transform. In ICML, pages 1974-1983, 2018.
[20] Z. Huang and L. V. Gool. A Riemannian Network for SPD Matrix Learning. In AAAI, pages 2036-2042, 2017.
[21] Z. Huang, C. Wan, T. Probst, and L. V. Gool. Deep Learning on Lie Groups for Skeleton-Based Action Recognition. In CVPR, pages 6099-6108, 2017.
[22] Z. Huang, J. Wu, and L. V. Gool. Building Deep Networks on Grassmann Manifolds. In AAAI, pages 3279-3286, 2018.
[23] H. Jhuang, J. Gall, S. Zuffi, C. Schmid, and M. J. Black. Towards Understanding Action Recognition. In ICCV, pages 3192-3199, 2013.
[24] S. Kim. Distributivity on the Gyrovector Spaces. Kyungpook Mathematical Journal, 55:13-20, 2015.
[25] S. Kim. Gyrovector Spaces on the Open Convex Cone of Positive Definite Matrices. Mathematics Interdisciplinary Research, 1(1):173-185, 2016.
[26] S. Kim. Operator Inequalities and Gyrolines of the Weighted Geometric Means. CoRR, abs/2009.10274, 2020.
[27] S. Kim. Ordered Gyrovector Spaces. Symmetry, 12(6), 2020.
[28] M. Lezcano-Casado. Trivializations for Gradient-Based Optimization on Manifolds. In NeurIPS, pages 9154-9164, 2019.
[29] F. López, B. Pozzetti, S. Trettel, M. Strube, and A. Wienhard. Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices. In NeurIPS, pages 18350-18366, 2021.
[30] M. Lovrić, M. Min-Oo, and E. A. Ruh. Multivariate Normal Distributions Parametrized As a Riemannian Symmetric Space. Journal of Multivariate Analysis, 74(1):36-48, 2000.
[31] X. S. Nguyen. GeomNet: A Neural Network Based on Riemannian Geometries of SPD Matrix Space and Cholesky Space for 3D Skeleton-Based Interaction Recognition. In ICCV, pages 13379-13389, 2021.
[32] X. S. Nguyen, L. Brun, O. Lézoray, and S. Bougleux. Skeleton-Based Hand Gesture Recognition by Learning SPD Matrices with Neural Networks. In IEEE International Conference on Automatic Face and Gesture Recognition, pages 1-5, 2019.
[33] X. S. Nguyen, L. Brun, O. Lézoray, and S. Bougleux. Learning Recurrent High-order Statistics for Skeleton-based Hand Gesture Recognition. In ICPR, pages 975-982, 2020.
[34] X. S. Nguyen, A.-I. Mouaddib, and T. P. Nguyen. Hierarchical Gaussian Descriptor Based on Local Pooling for Action Recognition. Machine Vision and Applications, 30(2):321-343, 2019.
[35] X. S. Nguyen, A.-I. Mouaddib, T. P. Nguyen, and L. Jeanpierre. Action Recognition in Depth Videos Using Hierarchical Gaussian Descriptor. Multimedia Tools and Applications, 77(16):21617-21652, 2018.
[36] X. S. Nguyen, T. P. Nguyen, F. Charpillet, and N.-S. Vu. Local Derivative Pattern for Action Recognition in Depth Images. Multimedia Tools and Applications, 77(7):8531-8549, 2018.
[37] X. Pennec. Statistical Computing on Manifolds for Computational Anatomy. Habilitation à diriger des recherches, Université Nice Sophia-Antipolis, 2006.
[38] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis. In CVPR, pages 1010-1019, 2016.
[39] R. Shimizu, Y. Mukuta, and T. Harada. Hyperbolic Neural Networks++. In ICLR, 2021.
[40] R. S. Sukthanker, Z. Huang, S. Kumar, E. G. Endsjo, Y. Wu, and L. V. Gool. Neural Architecture Search of SPD Manifold Networks. In IJCAI, pages 3002-3009, 2021.
[41] H. Tabia, H. Laga, D. Picard, and P.-H. Gosselin. Covariance Descriptors for 3D Shape Matching and Retrieval. In CVPR, pages 4185-4192, 2014.
[42] C. Tallec and Y. Ollivier. Can Recurrent Neural Networks Warp Time? In ICLR, 2018.
[43] Y. Tay, A. T. Luu, and S. C. Hui. Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering. In ACM International Conference on Web Search and Data Mining, pages 583-591, 2018.
[44] A. A. Ungar. Beyond the Einstein Addition Law and Its Gyroscopic Thomas Precession: The Theory of Gyrogroups and Gyrovector Spaces. Fundamental Theories of Physics, vol. 117, Springer, Netherlands, 2002.
[45] A. A. Ungar. Analytic Hyperbolic Geometry: Mathematical Foundations and Applications. World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2005.
[46] A. A. Ungar. Analytic Hyperbolic Geometry in N Dimensions: An Introduction. CRC Press, 2014.
[47] M. Wang, N. A. Smith, and T. Mitamura. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 22-32, 2007.
[48] R. Wang, X.-J. Wu, and J. Kittler. SymNet: A Simple Symmetric Positive Definite Manifold Deep Learning Method for Image Set Classification. IEEE Transactions on Neural Networks and Learning Systems, pages 1-15, 2021.
[49] X. Yang and Y. Tian. Super Normal Vector for Activity Recognition Using Depth Sequences. In CVPR, pages 804-811, 2014.
[50] Y. Yang, W.-t. Yih, and C. Meek. WikiQA: A Challenge Dataset for Open-Domain Question Answering. In Conference on Empirical Methods in Natural Language Processing, pages 2013-2018, 2015.

## Checklist

1. For all authors...
   (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
   (b) Did you describe the limitations of your work? [Yes] See the supplemental material.
   (c) Did you discuss any potential negative societal impacts of your work? [No]
   (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
   (a) Did you state the full set of assumptions of all theoretical results? [Yes]
   (b) Did you include complete proofs of all theoretical results? [Yes] See the supplemental material.
3. If you ran experiments...
   (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We included the code, data, and instructions for a part of our experiments as a URL and in the supplemental material.
   (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See the supplemental material.
   (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [No]
   (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See the supplemental material.
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
   (a) If your work uses existing assets, did you cite the creators? [Yes]
   (b) Did you mention the license of the assets? [Yes]
   (c) Did you include any new assets either in the supplemental material or as a URL? [Yes] We included new code as a URL and in the supplemental material.
   (d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [Yes] We clearly cited the authors of the codes and data used in our experiments.
   (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
   (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
   (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
   (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]