# the_generalized_skew_spectrum_of_graphs__62928243.pdf The Generalized Skew Spectrum of Graphs Armando Bellante 1 2 3 Martin Pl avala 4 5 Alessandro Luongo 6 7 This paper proposes a family of permutationinvariant graph embeddings, generalizing the Skew Spectrum of graphs of Kondor & Borgwardt (2008). Grounded in group theory and harmonic analysis, our method introduces a new class of graph invariants that are isomorphisminvariant and capable of embedding richer graph structures - including attributed graphs, multilayer graphs, and hypergraphs - which the Skew Spectrum could not handle. Our generalization further defines a family of functions that enables a tradeoff between computational complexity and expressivity. By applying generalization-preserving heuristics to this family, we improve the Skew Spectrum s expressivity at the same computational cost. We formally prove the invariance of our generalization, demonstrate its improved expressiveness through experiments, and discuss its efficient computation. 1. Introduction Graph-structured data is ubiquitous across domains, from molecular structures in chemistry (Kosmala et al., 2023) to social networks in computational social science (Nettleton, 2013), or fraud detection in banking (Pourhabibi et al., 2020). However, developing adequate graphs representations remains a fundamental challenge in graph analysis (Hamilton, 2020). Traditional graph representations, 1Max-Planck-Institut f ur Quantenoptik, Hans-Kopfermann-Str. 1, 85748 Garching, Germany 2Munich Center for Quantum Science and Technology (MCQST), Schellingstr. 4, 80799 M unchen, Germany 3Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy 4Institute of Theoretical Physics, Leibiniz Universit at Hannover, Appelstraße 2, 30167 Hannover, Germany 5Naturwissenschaftlich-Technische Fakult at, Universit at Siegen, Siegen, Germany 6Centre for Quantum Technologies, National University of Singapore, Singapore 7Inveriant Pte. Ltd., Singapore. Correspondence to: Armando Bellante . Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s). (d) Embedding in R2. Figure 1. Three graphs and a permutation-invariant embedding. such as adjacency matrices, are highly sensitive to the conventional node numbering used when acquiring the graph data. For instance, graphs G1 and G2 in Figure 1 are isomorphic, but their different node numbering would yield distinct adjacency matrices. Graph representations that are sensitive to initial node numbering make learning on graphs a harder task, similarly to how rotationand translation-sensitive representations make learning on images challenging. This motivates the need for permutation-invariant graph embeddings, also known as graph invariants. A permutation-invariant embedding is a mapping that represents graphs as vectors while ensuring that isomorphic graphs are assigned the same vector. The expressivity, or completeness, of such an embedding is determined by its ability to distinguish non-isomorphic graphs. An embedding is complete if it maps all non-isomorphic graphs to different vectors. However, as of today, achieving both computational efficiency and completeness requires a trade-off, as determining wether two graphs are isomorphic is a paramount example of an NP-intermediate problem (Babai, 2016). Various approaches have been proposed to construct graph invariants, and the challenge has gained renewed importance with the rise of graph neural networks (GNNs) (Scarselli et al., 2008) and their applications across scientific disciplines (Corso et al., 2024). Modern methods primarily rely on message-passing neural networks (Gilmer et al., 2017) The Generalized Skew Spectrum of Graphs or spectral properties of the graph Laplacian (Hammond et al., 2011). While these approaches have achieved practical success, they often lack theoretical guarantees about their expressivity beyond the Weisfeiler-Lehman test (Morris et al., 2019; Xu et al., 2019) or do not generalize naturally to richer graph structures like attributed graphs, multilayer graphs, and hypergraphs. This limitation becomes particularly crucial as such complex graph structures increasingly appear in real-world applications, from multi-omics biological networks to heterogeneous social networks. This leads us to investigate the following research questions: How can we construct permutation-invariant graph embeddings that (i) handle rich graph structures beyond simple graphs, (ii) are mathematically well-grounded, and (iii) offer practical computational efficiency? Furthermore, can we develop a theoretical framework that enables principled trade-offs between computational efficiency and completeness? Group theory offers a principled and complementary framework for constructing graph invariants with provable properties (Bronstein et al., 2021; Dehmamy et al., 2021; Batatia et al., 2023). A seminal contribution in this direction was the Skew Spectrum (Kondor & Borgwardt, 2008), which introduced a rigorous method for constructing graph invariants through functions on the symmetric group. While theoretically elegant, the original Skew Spectrum could not handle rich graph structures and offered no clear pathway to trade off between computational complexity and expressivity. Our work revisits and improves these older group-theoretic foundations to address modern challenges in graph representation, opening new research directions. In this work, we introduce a generalization of the Skew Spectrum. Our contributions are threefold: 1. Multi-Orbit Spectra: We develop a mathematically rigorous family of permutation-invariant graph embeddings that naturally extend to attributed graphs, multilayer graphs, and hypergraphs; 2. k-Correlation Spectra: We establish a theoretical framework that enables principled trade-offs between computational complexity and expressivity; 3. Doubly-Reduced k-Spectra: We introduce heuristics that preserve the generalizations and improve the expressivity of the Skew Spectrum without increasing its computational cost, bridging theory and practice. We illustrate our theoretical contributions with numerical experiments, demonstrating that our generalizations significantly improve the Skew Spectrum expressivity: distinguishing richer graphs, and distinguishing more non-isomorphic simple graphs at the same computational complexity. 2. Background Our work extends the seminal paper of Kondor & Borgwardt (2008) on the Skew Spectrum of graphs. In this section, we review the key concepts, laying the ground for our results. Notation. The symbol L denotes a direct sum. For example, Ln i=1 i denotes a diagonal matrix with entries 1, . . . , n along the diagonal. The symbol N denotes a sequence of tensor products. For instance, N2 i=1 vi = v1 v2. We use [n] to denote the set of integers {1, . . . , n}. Similarly, [a, b] refers to the set of integers between a and b, inclusive. 2.1. The Symmetric Group The symmetric group Sn consists of all permutations of n elements. A permutation σ Sn is a bijection of the indices of these elements, where σ(i) denotes the image of index i under σ. We use cycle notation, e.g., σ = (1, 4, 3) maps 1 4, 4 3, and 3 1. For k < n, the subgroup Sk a naturally embeds in Sn, by acting on the first k indices while stabilizing the last n k. This induces a natural partition of Sn into right and left cosets, leading to the coset tranversals Sk\Sn and Sn/Sk. These right (left) sets contain exactly one representative element per coset: a unique x Sk\Sn (y Sn/Sk), such that any σ Sn can be decomposed as σ = h1x (σ = yh2) for some h1 Sk (h2 Sk). This definition extends to double-coset transversals Sk\Sn/Sk, where any σ Sn can be written as σ = h3zh4 for a unique z Sk\Sn/Sk and h3, h4 Sk. For example, Sn 2\Sn/Sn 2 consists of the following 7 group elements: {(), (n 1, n), (n 2, n 1), (n 2, n), (n 2, n 1, n), (n 2, n, n 1), (n 3, n 1)(n 2, n)}. We denote irreducible representations (irreps) of Sn by ρ. These are mappings from group elements to unitary matrices, and can be labeled by partitions of n. Specifically, we use the Young Orthogonal Representation (YOR), which has real-valued entries. The set of YORs is denoted by Irr(Sn), and we define Λn as the subset of four irreps indexed by the partitions (n), (n 1, 1), (n 2, 2), (n 2, 1, 1). We refer the interested reader to Appendix A and the background sections (2.6 and 2.7) of Bellante (2024) for more. 2.2. Graph isomorphism and functions on Sn The adjacency matrix A of a graph encodes its structure, with Ai,j specifying the relationship between nodes i and j. The construction of A inherently depends on the node numbering, and since graphs are often labeled arbitrarily, the same structure can be represented by multiple distinct adjacency matrices, complicating graph comparison tasks. Graph isomorphism is the problem of determining whether two graphs G1 and G2 have the same structure despite The Generalized Skew Spectrum of Graphs 0 1.3 0 0 1.5 0 0.3 0.8 0 0 0 0 1 0 2 0 (a) Adjacency matrix. 0 [..., 1, 2] [..., 1, 3] [..., 1, 4] [..., 2, 1] [..., 2, 3] [..., 2, 4] [..., 3, 1] [..., 3, 2] [..., 3, 4] [..., 4, 1] [..., 4, 2] [..., 4, 3] (b) Function f(g) = Ag(n),g(n 1). g is represented as a vector where gi = g(i). Figure 2. A weighted, directed graph as a function on S4. differences in node numbering. Formally, for adjacency matrices A(1) and A(2), G1 and G2 are isomorphic if there exists a permutation σ Sn such that A(1) i,j = A(2) σ(i),σ(j). Here, σ reorders the node indices of G2 to align with G1, preserving the edge relationships. To encode adjacency information in a manner that reflects this relationship, we define a function f : Sn R: f(g) = Ag(n),g(n 1) for all g Sn. (1) This encoding captures information about node adjacency while naturally reflecting the action of permutations on node indices. Indeed, using this formalism, two graphs are isomorphic if and only if their corresponding functions satisfy: σ s.t. f (A1)(g) = f (A2)(σg) for all g Sn. (2) In other words, graph isomorphism reduces to verifying whether two functions are related by a translation σ Sn. The Skew Spectrum leverages this formulation to construct graph invariants through translation-invariant functions on Sn, using correlation functions and their Fourier transforms. To conclude, note that graph functions (1) are sparse: they depend only on the image of the last two indices under g (g(n 1) and g(n)). More formally, f(σh) = f(σ) for all h Sn 2. Thus, f is supported on the coset transversal Sn/Sn 2. As shown in Fig. 2, this sparsity implies that f takes distinct values on at most n2 elements, rather than n!. 2.3. Fourier transform on Sn The Fourier transform of a function over Sn is defined as ˆf(ρ) = 1 |Sn| g Sn f(g)ρ(g) (3) where ρ is an irreducible representation of Sn. Standard FFT algorithms require O(n!polylog(n!)) operations to evaluate ˆf(ρ) for all irreps (Maslen, 1998). However, for graph-derived functions, taking values on Sn/Sn 2, the complexity reduces to O(n3) (or even O(n2)) operations (Kondor & Borgwardt, 2008; Clausen & H uhne, 2017). A key factor in this reduction is the projection τSn 2(ρ), which maps ρ onto the trivial irrep of Sn 2. This projection induces a sparsity structure in the Fourier transform, enabling efficient computation. This sparsity also plays a critical role in the efficient computation of the graph invariants. We formalize this sparsity in the following lemma. Lemma 2.1 (Fourier sparsity). Let f : Sn/Sn 2 C. Its Fourier transform using YOR is a collection of sparse square matrices indexed by specific partitions of n. The sparsity pattern is independent of f: 1. (n); size 1, one scalar entry. 2. (n 1); size n 1, 2 non-zero columns. 3. (n 2, 2); size n(n 3) 2 , 1 non-zero columns. 4. (n 2, 1, 1); size (n 1)(n 2) 2 , 1 non-zero columns. We use Λn to denote this set of partitions. For a detailed discussion on τSn 2 and its connection to the sparsity pattern, see Appendix B. 2.4. The Skew Spectrum of Graphs Kondor & Borgwardt (2008) introduced a translationinvariant function on Sn as the spectrum (the Fourier transform) of a triple correlation function, which they termed the Skew Spectrum. Definition 2.2 (Skew Spectrum). Let f : Sn C. An entry of the Skew Spectrum is defined as ˆSf(g1, ρ) = X f( g)f( gg1)f( g2) |Sn|2 ρ( g) ρ( g2). (4) A Skew Spectrum entry is a (sparse) matrix. The Skew Spectrum is uniquely determined by a small subset of entries - specifically, it is indexed by 7 group elements (Sn 2\Sn/Sn 2) and 4 irreps (Λn). Moreover, (4) can be expressed as ˆSf(g1, ρ) = ˆr(g1, ρ) ˆf(ρ), where ˆr(g1, ρ) = P g Sn f( g)f( gg1) |Sn| ρ( g) is the Fourier transform of another function r(g1, σ). This decomposition, Fourier transforms, and sparsity are key to efficient computation. The Skew Spectrum has up to 85 non-zero scalar values (features) across all the entries, and they can be computed in O(n6) operations. To improve efficiency, the authors proposed a way to trade expressivity for computational cost, introducing a reduced invariant. Definition 2.3 (Reduced Skew Spectrum). Let f : Sn C. An entry of the reduced Skew Spectrum is defined as ˆSf(g1, ρ) = X f( g)f( gg1)f( g2) |Sn|2 ρ( g) τSn 2(ρ) ρ( g2). where τ projects ρ onto the trivial representation of Sn 2. The Generalized Skew Spectrum of Graphs This projection reduces the domain of r(g1, σ) to Sn/Sn 2, ensuring the same sparsity pattern as in ˆf(ρ) and further simplifying the computation. The reduced Skew Spectrum s efficiency is summarized in the following result. Theorem 2.4 (Reduced Skew Spectrum computation). Let f : Sn/Sn 2 C. We can compute its reduced Skew Spectrum in O(n3) operations. The reduced Skew Spectrum has up to 49 non-zero entries. The reduced Skew Spectrum is the key contribution of Kondor & Borgwardt (2008), validated experimentally as an effective and efficient tool for representing graphs. In the next sections, we will present our generalization. 3. Richer graph structures and multiple orbits While the Skew Spectrum is a powerful invariant for weighted and directed graphs, it does not fully capture richer graph structures such as graphs with node or edge features, multilayer graphs, or hypergraphs. To address these limitations, we introduce the Multi-Orbit Skew Spectrum, enabling the analysis of these complex structures. Richer graph structures are typically represented by a collection of data structures (e.g., adjacency matrices, node feature vectors, etc.) indexed by the graph nodes. Under node renumbering, the elements within these structures permute independently without mixing (e.g., node features remain distinct from edge weights). These distinct subsets of data are referred to as orbits, and two such graph structures are isomorphic if there exists a single permutation that aligns all their orbits. To illustrate, consider graphs G1 and G2 in Figure 1. Treating node numbers as labels, the two graphs are not isomorphic; e.g., node 5 has a high degree in G1 but a lower degree in G2. For isomorphism, both the adjacency matrix and the node labels must align under the same permutation. A natural extension of the invariant is to define separate functions for each orbit (e.g., one for the adjacency matrix and one for node labels) and concatenate their respective Skew Spectra. However, this approach fails to account for correlations between orbits. For example, the individual spectra of G1 and G2 would coincide, even though the permutations linking the adjacency matrix and node labels differ. This information loss limits the invariant s completeness. To overcome this, we replace scalar functions with vectorvalued functions, enabling a unified representation of multiple orbits. Specifically, we define a function f : Sn/Sn 2 Cd as the direct sum of single-orbit functions: i [d] fi(g) (5) The pointwise product required for the Skew Spectrum is then replaced by the tensor product. Definition 3.1 (Multi-Orbit Skew Spectrum). Let f : Sn Cd. A Multi-Orbit Skew Spectrum entry is defined as ˆSf(g1, ρ) = X f( g) f( gg1) f( g2) |Sn|2 ρ( g) ρ( g2) . Expanding the direct sums in f(g), we obtain: ˆSf(g1, ρ) = M (i1,i2,i3) [d]3 ˆr(i1,i2)(g1, ρ) ˆfi3(ρ) (6) where ˆr(i1,i2)(g1, ρ) = P g Sn fi1( g)fi2( gg1) |Sn| ρ( g). This formulation introduces terms fi( g)fj( gg1), for i = j, which capture correlations between orbits and enhance the spectrum s ability to distinguish between graphs. Our intuition comes from a quantum theory perspective, where these terms can be viewed as interference terms. In quantum theory interference terms capture superposition-related phenomena where the whole can be greater than sum of the parts. Similarly, the Multi-Orbit Skew Spectrum is more expressive than the concatenation of individual spectra. We formally prove the invariance of the Multi-Orbit Skew Spectrum in Section 4, where it emerges as a special case of the broader framework we propose. Its reduced version is akin to the one of the original Skew Spectrum: each term of the direct sum in (6) is computed as a reduced Skew Spectrum (Def. 2.3). Its computation is detailed in Section 6 and has the following complexity. Theorem 3.2 (Reduced Multi-Orbit Skew Spectrum). Let f : Sn/Sn 2 Cd. Computing its reduced Multi-Orbit Skew Spectrum takes O(d2n3 + d3n2) operations. The resulting embedding has up to 49d3 non-zero entries. In the remainder of this section, we demonstrate how to construct Multi-Orbit functions for various graph structures. These examples illustrate the spectrum s versatility in capturing complex data representations. Self-loops. Graphs with self-loops naturally split into two orbits: diagonal and off-diagonal entries, which do not mix under permutations. Define f1 = Aσ(n),σ(n 1) for offdiagonal entries and f2 = Aσ(n),σ(n) for diagonal entries. Node features. If each node has a vector xi of p features, the Multi-Orbit function f spans d = p + 1 orbits. Specifically, fi(σ) = xi,σ(n) for i [p] capturing the features, and fd(σ) = Aσ(n),σ(n 1), encoding the adjacency. Edge features. When each edge has up to d features, the graph can be represented as a tensor A Cd n n, where Al,i,j encodes the lth feature of the edge between nodes i and j. The function fi(σ) = Ai,σ(n),σ(n 1) captures these features, with the number of orbits scaling with the features. The Generalized Skew Spectrum of Graphs Multilayer graphs. Graphs with multiple edge layers, each representing distinct relationships (e.g., friendship or work connections in social networks), can be represented using separate adjacency matrices. For layer i, define fi(σ) = Ai,σ(n),σ(n 1), where i indexes the layers. Hypergraphs. In hypergraphs, edges connect multiple nodes. A simple representation uses A Cd n, with rows corresponds to edges and fi(σ) = Ai,σ(n) specifies nodeedge relationships. Although f is defined over Sn/Sn 1, potentially reducing the computation, the number of orbits matches the number of edges, which can be large. Alternatively, for hypergraphs with edges connecting up to q nodes, an adjacency tensor Ai1,...,iq can encode whether an edge exists between nodes i1, . . . , iq. A fictitious node n + 1 handles edges with fewer than q nodes. This defines f(σ) = Aσ(n),...,σ(n (q 1)) on Sn/Sn q. While beyond the scope of this paper, Clausen & H uhne (2017) provide efficient Fourier transforms on Sn/Sn q, which could be used to extend our invariant. 4. Spectra of higher correlations The Skew Spectrum and our Multi-Orbit generalization are Fourier transforms over the last function entry of a triple correlation S(3) f (σ1, σ2) = P σ Sn f( σ)f( σσ1)f( σσ2) In signal analysis, it is well-known that correlation functions constitute translation invariants, because summing over the entire domain effectively averages out translations. Our idea for the second generalization is to compute higher correlations, aiming to increase the completeness of the invariant at the expense of higher computation complexity. Formally, we define a (Multi-Orbit) k-correlation as follows. Definition 4.1 (k-correlation). Let f : Sn Cd. An entry of the k-correlation function is defined as S(k) f (Gk 1) = 1 |Sn| l=1 f( ggl) (7) where Gk 1 = (g1, . . . , gk 1) (Sn)k 1. Our family of invariants - the k-Spectrum entries - is given by Fourier transforms over the last entry of a k-correlation function. This establishes a bijection with the correlation functions, while helping the computation. Definition 4.2 (k-Spectrum). Let f : Sn Cd. An entry of the k-Spectrum is defined as ˆS(k) f (Gk 2, ρ) = M Ik ˆr(k) Ik 1(Gk 2, ρ) ˆfik(ρ). (8) Here Ik = (i1, . . . , ik) [d]k is a list of indices, and ˆr(k) Ik 1(Gk 2, ρ) = X fik 1( g) Qk 2 l=1 fil( ggl) |Sn| ρ( g). (9) The invariance of the k-Spectrum follows directly from the one of the k-correlation functions, holding for both the Skew Spectrum (k = 3) and our Multi-Orbit extension. Theorem 4.3 (Invariance). Let f1, f2 : Sn Cd be equivalent to up to left-translation, such that f1(σ) = f2(σσ) for some fixed σ Sn and every σ Sn. Then, the Multi-Orbit k-Spectra of f1 and f2 are identical. In other words, for each Gk 2 (Sn)k 2, ρ Irr(Sn), we have ˆS(k) f1 (Gk 2, ρ) = ˆS(k) f2 (Gk 2, ρ). Proof. Because of the bijection, we show the translation invariance of k-correlations. By substituting g = σ g, we obtain S(k) f1 (Gk 1) = P g Sn f2(σ g) Nk 1 l=1 f2(σ ggl) |Sn| = P g Sn f2(g ) Nk 1 l=1 f2(g gl) |Sn| = S(k) f2 (Gk 1). Intuition. There are reasons to believe that computing higher correlations will increase the expressivity of the invariant. First, k-correlations are group-invariant polynomials of f of order k. The Stone-Weierstrass theorem (Stone, 1948) tell us that polynomials are dense in continuous functions. Thus one can hope that the set of all k-correlations for arbitrary high k will span a dense subset of the space of all invariants. In other words, one can hope that if a complete invariant exist, then it can be arbitrary well approximated by the k-correlations. Indeed, Adler & Konheim (1962) observed that correlation functions for k = 1, 2, . . . form a complete translation invariant for real-valued, Haar-integrable functions on locally compact abelian groups. While (Chazan & Weiss, 1970) showed that in that general case the invariants are complete only for an infinite k, Kakarala (1992) references and shows settings for which 3-correlations are already complete. We conjecture that, for any finite n, there exists a finite kmax for which the collection of k-correlations, for all k < kmax, forms a complete translation invariant for functions on Sn/Sn 2. In Section 4.1, we provide arguments to believe that kmax O(n2). Another confirmation arises from the study of higher correlations on undirected, unweighted graphs, in which case Aij {0, 1} and Aii = 0. Here, higher correlation entries count increasingly larger graph substructures. For instance, S(1) f = P g Sn A g(n), g(n 1) |Sn| counts the number of edges in the graph. Going on with similar calculations, S(2) f (g1) re- covers the results of S(1) f for g1 = () and g2 = (n 1, n), counts the number of pairs of edges that share a common vertex for g {(n 2, n 1), (n 2, n), (n 2, n 1, n), (n The Generalized Skew Spectrum of Graphs 2, n, n 1)}, and counts the number of pairs of edges that do not share a vertex for g = (n 3, n 1)(n 2, n). Both the first and second correlations are sufficient to distinguish all graphs with up to three vertices. Finally, one can deduce that S(3) f ((n 2, n 1), (n 3, n 1)) is proportional to the number of triples of edges that all share one vertex, while S(3) f ((n 2, n), (n 2, n 1)) is proportional to the number of triples of edges that form a triangle, allowing the invariant to distinguish graphs with more nodes. 4.1. Sparsity and double reduction The increased expressivity of higher correlations comes with higher computational complexity. Indeed, (8) needs to be evaluated on k 2 group elements, each one taking up to n! values. Here we prove a result that reduces the number of k-Spectrum entries to compute. Theorem 4.4. Let f : Sn/Sn 2 Cd. The Multi-Orbit k Spectrum of f is uniquely determined by its subset of components { ˆS(k) f (Gk 2, ρ)|ρ Λn, Gk 2 (Sn/Sn 2)k 2}. Proof. Observe Def. 4.2. To restrict Gk 2, it suffices to see that for any Hk 2 (Sn 2)k 2, Qk 2 l=1 fil( gglhl) = Qk 2 l=1 fil( ggl). To restrict ρ, use Lemma 2.1. While the Skew Spectrum and its Multi-Orbit generalization only need to be evaluated on 7 group elements (Sn 2\Sn/Sn 2) and 4 irreps (Λn), the k-Spectrum requires O(n2(k 2)) group entries and 4 irreps. To reduce the computational complexity, while maintaining the generalization, we employ two heuristics: the reduction of Kondor & Borgwardt (2008), and a double reduction that further limits the entries to evaluate. Reduction. Analogously to the approach of Theorem 2.4, we define the reduced k-Spectrum as ˆS(k) f (Gk 2, ρ) = M Ik ˆs(k) Ik 1(Gk 2, ρ) ˆfik(ρ) (10) where ˆs(k) Ik 1(Gk 2, ρ) = ˆr(k) Ik 1(Gk 2, ρ)τSn 2(ρ) and τSn 2(ρ) is the projection of ρ onto the trivial irrep of Sn 2. τSn 2 makes the rows of ˆs(k) Ik 1(Gk 2, ρ) match the sparsity patterns of the Fourier transform columns (Lemma 2.1 and Appendix B), making ˆs(k) easier to compute and restricting the non-zero elements of each the direct sum term to 49. Double reduction. The Doubly-reduced k-Spectrum is given by the entries of the reduced k-Spectrum evaluated on Gk 2 taking values in (Sn 2\Sn/Sn 2)k 2, without repetitions or order. This heuristic reduces the number of possible entries from n2(k 2) to 7 k 2 . This effectively restricts the possible values of k to a maximum of 9, with the spectrum reaching its peak number of entries in the range [4, 6]. While this heuristics heavily sacrifices expressivity, we will show that it still adds completeness to the reduced Skew Spectrum, without increasing its computational complexity. The reasons that motivate these heuristics are two. First, the Skew Spectrum and the Multi-Orbit version only need to be evaluated on the 7 double cosets representatives Sn 2\Sn/Sn 2. We extend this to k-Spectra by analogy. Second, we note that repeated elements do not increase expressivity when representing unweighted graphs, and that the order of the elements in Gk 2 does not matter. Theorem 4.5 (Element distinctness). Let f : Sn {0, 1} and let Gk 2 the list obtained by adding one element to Gk 3. Then, ˆS(k) f (Gk 2, ρ) adds information over ˆS(k 1) f (Gk 3, ρ) only if the added element in Gk 2 is not already in Gk 3. Proof. Because of the Fourier bijection, we consider kcorrelations (see Def. 4.1). Define Gk 1 by taking Gk 2 and adding one element σ, and then Gk 2 by taking Gk 3 and adding the same σ. Using f(g)2 = f(g) and (7) one can verify S(k) f (Gk 1) = S(k 1) f (Gk 2), implying that ˆS(k) f (Gk 2, ρ) = ˆS(k 1) f (Gk 3, ρ). Theorem 4.6 (Element ordering). Let f : Sn C. ˆS(k) f (Gk 2, ρ) adds information over ˆS(k) f (G k 2, ρ) only if Gk 2 and G k 2 contain distinct sets of elements. Proof. Consider the k-Spectrum (Def. 4.2). Let G k 2 contain permuted elements of Gk 2. Since the terms f(ggl) commute, we have ˆr(k)(Gk 2, ρ) = ˆr(k)(G k 2, ρ). 5. Relationship to WL hierarchy and GNNs Understanding the expressive power of Graph Neural Networks has become a central theme in graph learning research. A key result in this area establishes that message-passing neural networks - a broad class of GNNs - are at most as expressive as the 1-dimensional Weisfeiler-Lehman (1-WL) graph isomorphism test (Xu et al., 2019; Morris et al., 2019). Recent work has extended this line of analysis to alternative architectures, including those based on relational pooling and spectral features, relating them to the broader Weisfeiler Lehman hierarchy (Zhou et al., 2023; Zhang et al., 2024). Within this landscape, exploring the relationship between our proposed k-Spectra (or k-correlations) framework and the WL hierarchy offers a promising avenue for both theoretical insight and practical application. Understanding this connection can shed light on the representational capabilities of k-Spectra and suggests ways it may enhance the The Generalized Skew Spectrum of Graphs Algorithm 1 Doubly-Reduced k-Spectrum Input: Functions fi : Sn/Sn 2 C, for i [d] Precompute s(k) for ρ {(n), (n 1, 1), (n 2, 2), (n 2, 1, 1)} do for i [d] do Compute ˆfik(ρ) end for for Gk 2 comb(Sn 2\Sn/Sn 2, k 2) do for Ik 1 [d]k 1 do Compute ˆs(k) Ik 1(Gk 2, ρ) end for ˆS(k) f (Gk 2, ρ) = L Ik [d]k ˆs(k) Ik 1(Gk 2, ρ) ˆfik(ρ) end for end for Output ˆS(k) f expressivity of graph learning models, such as GNNs. Theoretically, k-WL and k-correlation spectra capture distinct structural properties: while k-WL iteratively refines node representations based on patterns over k-tuples of node neighborhoods, k-correlations count edge-level substructures (see Intuition in the previous section). This distinction positions k-Spectra as a complementary framework for evaluating and augmenting expressivity. As an illustrative example, Gai et al. (2025, Corollary 3.14) show that spectral-invariant GNNs can count paths and cycles of up to 7 vertices. In contrast, non-reduced k-Spectra - with sufficiently large k - can capture longer paths and larger cycles, suggesting a path for practical advantage. To empirically examine the relationship with WL tests, in Section 7.2 we demonstrate that features derived from the doubly-reduced k-Spectra are neither strictly less expressive nor strictly more expressive than those derived from 1-WL. This supports the intuition that k-Spectra offer orthogonal structural insights not captured by WL-based approaches. Beyond theoretical comparisons with WL tests, we highlight three potential integration approaches into GNN pipelines: Post concatenation: Precompute a k-Spectra wholegraph embedding and concatenate it with the output of the GNN embedding, just before the prediction layers. k-Spectra aggregation: Incorporate k-Spectra subgraph invariants during the node aggregation step. k-Spectra pooling: Replace standard pooling functions (e.g., summation, max) with k-Spectra-based invariants to aggregate node features. Each of these strategies supports the incorporation of learnable parameters, such as orbit-specific weights (terms of Eq. 8), allowing the model to adaptively emphasize the most relevant graph information. We see these directions as Algorithm 2 Precomputing s(k) Input: Functions fi : Sn/Sn 2 C, for i [d] for i [d] do for j [n 2] do l [n]\{j} fi((n 1, j, n, l)) for σ Sn/Sn 2 do Pi(g1, σ, j) = fi(σ) Pi(g2, σ, j) = fi(σ (n 1, n)) Pi(g3, σ, j) = fi(σ (n 1, j)) Pi(g4, σ, j) = fi(σ (n, j)) Pi(g5, σ, j) = fi(σ (j, n 1, n)) Pi(g6, σ, j) = fi(σ (j, n, n 1)) Pi(g7, σ, j) = Fi(σ(j)) P l {3,6} Pi(gl, σ, j) n 3 end for end for end for for Ik 2 [d]k 2 do for Gk 2 comb(Sn 2\Sn/Sn 2, k 2) do for σ Sn/Sn 2 do QIk 2(Gk 2, σ) = Pn 2 j=1 Qk 2 l=1 Pil(gl, σ, j) for i [d] do Ik 1 = append(Ik 2, i) s(k) Ik 1(Gk 2, σ) = fik 1(σ) n 2 QIk 2(Gk 2, σ) end for end for end for end for Output s(k) promising opportunities for future work at the intersection of spectral methods and GNN design. 6. Computation and complexity We proceed to detail the computation of the Doubly Reduced k-Spectrum (10), which is the generalization of the Skew Spectrum with the lowest computational complexity. The main idea is summarized in Algorithm 1. We can treat ˆs(k) as the Fourier transform of a function s(k) Sn/Sn 2. s(k) Ik 1(Gk 2, σ) = fik 1(σ) l=1 fil(σhgl) (11) After precomputing s(k), we can compute its transform and that of one single orbit function, for all the input. Then, multiplying them produces one entry of the Spectrum. Precomputing s(k) efficiently requires some creativity. Algorithm 2 does so, using a divide-and-conquer strategy, coupled with dynamic programming. The procedure breaks the computation of s(k) into the computation of partial sums The Generalized Skew Spectrum of Graphs Figure 3. Synthetic dataset with four graphs families. Each family contains 1000 isomorphic graphs. Families are not isomorphic. that can be efficiently recombined when fixing the input parameters. We summarize this intermediate result in the following Lemma, discussing its correctness in Appendix C. Lemma 6.1 (Precomputing s(k)). Algorithm 2 precomputes s(k) Ik 1(Gk 2, σ) for all the choices of the input parameters Ik 1 [d]k 1, Gk 2 (Sn 2\Sn/Sn 2)k 2, σ Sn/Sn 2. It requires O(dk 2n3 +dk 1n2) operations and O(dn3 + dk 1n2) space. Combining this lemma with the efficient computation of Fourier transforms on Sn/Sn 2, we arrive to the following. Theorem 6.2 (Doubly-reduced k-Spectrum). Let f : Sn/Sn 2 Cd. We can compute its doubly-reduced kspectrum in O(dk 1n3 + dkn2) operations. The doublyreduced k-spectrum has up to 7 k 2 7dk non-zero entries. This result recovers the complexity of the reduced Multi Orbit Skew Spectrum (Theorem 3.2) for k = 3 and that of the reduced Skew Spectrum (Theorem 2.4) for d = 1. The double reduction limits the scaling with n. On a single orbit, this enables the computation of more expressive invariants at the same computational cost of the Skew Spectrum. 6.1. Complexity analysis We briefly prove Theorem 6.2, discussing the complexity of the algorithm and the number of non-zero entries. Main algorithm. Algorithm 1 precomputes s(k) (Lemma 6.1) and then proceeds to compute the entries of ˆS(k) f . The for loops compute dk 1 Fourier transforms on s(k) and d on fi, each of which takes O(n3) operations, since both are functions on Sn/Sn 2. Finally, the direct sum performs dk matrix multiplication that cost O(n2) each (see the sparsity patterns of Lemma 2.1). The overall complexity is indeed O(dk 1n3 + dkn2). Non-zero entries. For each of the 4 irreps, there are 7 k 2 lists Gk 2 on which evaluate the spectrum. Each spectrum entry has dk terms in the direct sum, with non-zero entries determined by the irrep. Using (10) and the column sparsity of the Fourier transform (Lemma 2.1), we observe that each ρ {(n), (n 1, 1), (n 2, 1, 1)} produces 1 non-zero entry, while ρ = (n 2, 2) produces 4: (7 = 1 3 + 4). 7. Numerical experiments We implemented a proof-of-concept for the Doubly-reduced k-Spectrum to validate our theoretical insights. We report experiments that illustrate our contributions and show how the new results add expressivity to the original embedding. 7.1. Multiple orbits First, we illustrate the potential of Multi-Orbit embeddings. Eigenvalue collision. A first demonstration considers graphs with node labels, represented by adjacency matrices with labels encoded in the diagonal. While eigenvalues and singular values also form valid permutation-invariant embeddings (permutations act as similarity transformations through orthogonal matrices, preserving the spectrum), they fail to distinguish such structures. (a) Graph 1. (b) Graph 2. Figure 4. Two directed, weighted, labeled non-isomorphic graphs. For example, the two graphs in Fig. 4 are non-isomorphic, as - for instance - the path from 1 to 2 has different weights. Their adjacency matrices, A1 = 1 1 2 2 and A2 = 2 1 2 1 share eigenvalues Eigs(A1) = Eigs(A2) = {3, 0} and singular values {3.16, 0}. However, the reduced 2-Orbit Skew Spectrum, using f1(σ) = Aσ(n),σ(n 1) and f2(σ) = Aσ(n),σ(n), correctly distinguishes them. Synthetic graphs. We evaluated the Multi-Orbit generalization on a synthetic dataset of labeled, directed, and weighted graphs. The dataset consists of four families: (15A, 15B, 6A, 6B) (Fig. 3). Graphs within a family are isomorphic under edgeand label-preserving permutations, but graphs across different families are not. Specifically, in 15A and 15B (and similarly in 6A and 6B), there is a permutation that links the edges and a separate permutation that links the labels, but these permutations are not the same. The task is to correctly classify the four families. We processed the graphs using two representations: (1) the concatenation of Single-Orbit Skew Spectra computed on graph structure and labels, and (2) the 2-Orbit Skew Spectrum. A Random Forest classifier (60 estimators, no max depth) trained on 80% of the data achieves 50% accuracy The Generalized Skew Spectrum of Graphs Table 1. Atomization energy regression on QM7, for different representations and models. Results in Mean Average Error (MAE). REPRESENTATION XGB GBR EN LINEAR SINGLE-ORBIT 29.15 36.55 114.68 61.15 2-ORBIT 18.28 27.12 58.60 49.45 C S EIGS 38.04 37.92 47.83 - LAPL. S EIGS 23.52 26.93 47.62 47.80 with the Single-Orbit representation and 100% with the 2Orbit representation. This is because the concatenation of Single-Orbit Skew Spectra cannot distinguish the couples 15A-15B and 6A-6B, whose labels and edges are linked by different permutations, while the 2-Orbit Skew Spectra capture interactions between edges and labels (6). QM7. We tested the Multi-Orbit generalization on the QM7 dataset, which contains Coulomb matrices of 7,165 molecules with up to 23 atoms (Rupp et al., 2012; Blum & Reymond, 2009). The Coulomb matrix C R23 23 is defined as Cii = 1 2Z2.4 i and Cij = Zi Zj |Ri Rj|, where Zi is the nuclear charge of the i-th atom of the molecule, and Ri is its position. The task is to predict molecular atomization energies (kcal/mol), renormalized from 1 to 1. We compare different representations: the Single-Orbit Skew Spectrum on the off-diagonals, the 2-Orbit Skew Spectrum, and eigenvalues of adjacency and Laplacian matrices. We train several models on a 80%-20% split: Extreme Gradient Boosting (XGB), Gradient Boosting Regressor (GBR), Elastic Net (EN), Linear Regression (Linear) (Pedregosa et al., 2011). Table 1 reports the results, with XGB and the 2-Orbit Skew Spectrum achieving the best performance. 7.2. Higher correlations We study how higher-order correlations affect expressivity using two datasets of non-isomorphic, unweighted, undirected graphs: the Atlas of all graphs with 7 nodes, and a set of connected chordal graphs with 8 nodes (Mc Kay). For each dataset, we computed the Doubly-Reduced k-Spectra for k [3, 9], their concatenation, the eigenvalues of the graph Laplacian, and the 1-WL test for multiple iterations. We then measured the number of collisions, i.e., cases where non-isomorphic graphs could not be distinguished by each method. Figure 5 summarizes the results. Increasing k significantly improves the expressivity of the Skew Spectrum (k = 3), reducing the number of collisions. For n = 7,the concatenated k-Spectra form a complete invariant - no collisions remain - outperforming both the Laplacian eigenvalues and 1-WL in distinguishing graph structures. Interestingly, 1-WL does not achieve a complete invariant for the n = 7 graphs, and the concatenated Representation (a) Graph Atlas, 7 nodes. Representation (b) Chordal graphs, 8 nodes. Figure 5. Shattering non-isomorphic graphs with k-Spectra (k [3, 9]), their concatenation (C), Laplacian s eigenvalues (L), and 1-WL tests with different iterations (1-WL-ITER). k-Spectra offer greater expressivity in that setting. On the other hand, for the chordal graphs with n = 8, 1-WL results in fewer collisions than the concatenated spectra. This outcome supports the perspective discussed in Section 5, reinforcing the idea that k-correlations and WL tests measure different structural properties and can serve as complementary expressivity metrics. 8. Conclusion This work generalizes the Skew Spectrum (Kondor & Borgwardt, 2008), advancing group-theoretic methods for permutation-invariant graph embeddings. We introduced Multi-Orbit Spectra for complex graph structures, k-Spectra to trade-off expressivity and complexity, and Doubly Reduced k-Spectra to enhance expressivity of the original Skew Spectrum without added cost. We support our theoretical insights with numerical experiments. This study complements modern approaches and opens new avenues for research in graph representation. Future directions. On the theoretical side, (1) a key open question is whether there exists a finite kmax such that the collection of all k-correlations up to kmax forms a complete invariant, which remains to be proven. (2) Our framework, focused on Sn/Sn 2, could naturally be extended to functions on G/H for arbitrary groups/subgroups, broadening its scope beyond graphs to invariants for other structured data domains. (3) Another avenue is exploring other families of translation-invariant polynomials beyond k-correlations. On the practical side, (1) developing a scalable, optimized open-source implementation with thorough benchmarking is a crucial step toward real-world adoption. (2) Integrating our methods with modern deep learning frameworks offers further opportunities: (2.1) introducing trainable parameters, such as learning the relative importance of correlations between orbits, or (2.2) using the generalized Skew Spectrum as a positional encoding, aggregation, or pooling function to enhance neural representations. The Generalized Skew Spectrum of Graphs Impact Statement This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here. Acknowledgements First and foremost, we would like to thank Ramakrishna Kakarala for providing us with a soft copy of his thesis. We also extend our sincere thanks to the Quantum Open Source Foundation (QOSF), through which the authors met and this project started. A.B. gratefully acknowledges Stefano Gogioso and Erickson Tjoa for insightful discussions that helped improve the mathematical notation; Filippo Guerranti for providing references, background material, and assistance in revising the manuscript and supporting the submission process; and Alessandro Palma for valuable feedback on experiments and the submission process. He also thanks Professors Stefano Zanero, Donatella Sciuto, and Ferruccio Resta for their support during his Ph.D., Patrick Rebentrost for hosting him at CQT during part of this research, and Professor Ignacio Cirac for his support during his postdoctoral work. A.B. s research was partially funded by THEQUCO as part of the Munich Quantum Valley, supported by the Bavarian State Government through the Hightech Agenda Bayern Plus. Additional financial support was provided by ICSC - National Research Centre in High Performance Computing, Big Data and Quantum Computing, Spoke 10, funded by the European Union - Next Generation EU, under grant PNRR-CN00000013-HPC. M.P. acknowledges support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, project numbers 447948357 and 440958198), the Sino-German Center for Research Promotion (Project M-0294), the German Ministry of Education and Research (Project Qu Ku K, BMBF Grant No. 16KIS1618K), the DAAD, the Alexander von Humboldt Foundation, and the Nieders achsisches Ministerium f ur Wissenschaft und Kultur. A.L. acknowledges the support of the National Research Foundation, Singapore, and A*STAR under its Centre for Quantum Technologies Funding Initiative (S24Q2d0009), and the Quantum Engineering Programme (QEP 2.0) under grants NRF2021-QEP2-02-P05 and NRF2021-QEP2-02P01. Adler, R. and Konheim, A. A note on translation invariants. Proceedings of The American Mathematical Society - PROC AMER MATH SOC, 13:425 425, 03 1962. doi: 10.2307/2034951. Babai, L. Graph isomorphism in quasipolynomial time. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pp. 684 697, 2016. Batatia, I., Geiger, M., Munoz, J., Smidt, T., Silberman, L., and Ortner, C. A general framework for equivariant neural networks on reductive lie groups. Advances in Neural Information Processing Systems, 36:55260 55284, 2023. Bellante, A. Quantum Algorithms for Sparse Recovery and Machine Learning. Phd thesis, Politecnco di Milano, Milan, Italy, 2024. Blum, L. C. and Reymond, J.-L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc., 131:8732, 2009. Bronstein, M. M., Bruna, J., Cohen, T., and Veliˇckovi c, P. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. ar Xiv preprint ar Xiv:2104.13478, 2021. Chazan, D. and Weiss, B. Higher order autocorrelation functions as translation invariants. Information and Control, 16(4):378 383, 1970. Clausen, M. and H uhne, P. Linear time fourier transforms of sn-k-invariant functions on the symmetric group sn. In Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, pp. 101 108, 2017. Corso, G., Stark, H., Jegelka, S., Jaakkola, T., and Barzilay, R. Graph neural networks. Nature Reviews Methods Primers, 4(1):17, 2024. Dehmamy, N., Walters, R., Liu, Y., Wang, D., and Yu, R. Automatic symmetry discovery with lie algebra convolutional network. Advances in Neural Information Processing Systems, 34:2503 2515, 2021. Gai, J., Du, Y., Zhang, B., Maron, H., and Wang, L. Homomorphism expressivity of spectral invariant graph neural networks. In 13th International Conference on Learning Representations, ICLR 2025, 2025. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In International conference on machine learning, pp. 1263 1272. PMLR, 2017. Grillet, P. A. Abstract algebra, volume 242. Springer Science & Business Media, 2007. Hamilton, W. L. Graph representation learning. Morgan & Claypool Publishers, 2020. The Generalized Skew Spectrum of Graphs Hammond, D. K., Vandergheynst, P., and Gribonval, R. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129 150, 2011. Kakarala, R. Triple correlation on groups. Phd thesis, University of California, Irvine, 1992. Kondor, I. R. Group theoretical methods in machine learning. Phd thesis, Columbia University, 2008. Kondor, R. The skew spectrum of functions on finite groups and their homogeneous spaces. ar Xiv preprint ar Xiv:0712.4259, 2007. Kondor, R. and Borgwardt, K. M. The skew spectrum of graphs. In Proceedings of the 25th international conference on Machine learning, pp. 496 503, 2008. Kosmala, A., Gasteiger, J., Gao, N., and G unnemann, S. Ewald-based long-range message passing for molecular graphs. In International Conference on Machine Learning, pp. 17544 17563. PMLR, 2023. Maslen, D. The efficient computation of fourier transforms on the symmetric group. Mathematics of Computation, 67(223):1121 1147, 1998. Mc Kay, B. Combinatorial graphs. https://web. archive.org/web/20241231180856/https: //users.cecs.anu.edu.au/ bdm/data/ graphs.html. Accessed: 2024-12-31. Morris, C., Ritzert, M., Fey, M., Hamilton, W. L., Lenssen, J. E., Rattan, G., and Grohe, M. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pp. 4602 4609, 2019. Nettleton, D. F. Data mining of social networks represented as graphs. Computer Science Review, 7:1 34, 2013. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. Scikit-learn: Machine learning in python. the Journal of machine Learning research, 12:2825 2830, 2011. Pourhabibi, T., Ong, K.-L., Kam, B. H., and Boo, Y. L. Fraud detection: A systematic literature review of graphbased anomaly detection approaches. Decision Support Systems, 133:113303, 2020. Raczka, R. and Barut, A. O. Theory of group representations and applications. World Scientific Publishing Company, 1986. Rotman, J. J. An introduction to the theory of groups, volume 148. Springer Science & Business Media, 1999. Rupp, M., Tkatchenko, A., M uller, K.-R., and von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Physical Review Letters, 108:058301, 2012. Sagan, B. E. The symmetric group: representations, combinatorial algorithms, and symmetric functions, volume 203. Springer Science & Business Media, 2013. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The graph neural network model. IEEE transactions on neural networks, 20(1):61 80, 2008. Stone, M. H. The generalized weierstrass approximation theorem. Mathematics Magazine, 21(5):237 254, 1948. Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019. Zee, A. Group Theory in a Nutshell for Physicists. In a Nutshell. Princeton University Press, 2016. ISBN 9780691162690. Zhang, B., Zhao, L., and Maron, H. On the expressive power of spectral invariant graph neural networks. In Proceedings of the 41st International Conference on Machine Learning, pp. 60496 60526, 2024. Zhang, Z., Bu, J., Ester, M., Zhang, J., Yao, C., Yu, Z., and Wang, C. Hierarchical graph pooling with structure learning. ar Xiv preprint ar Xiv:1911.05954, 2019. Zhao, Y. Young tableaux and the representations of the symmetric group. dimension, 3(1):3, 2008. Zhou, C., Wang, X., and Zhang, M. From relational pooling to subgraph gnns: A universal framework for more expressive graph neural networks. In International Conference on Machine Learning, pp. 42742 42768. PMLR, 2023. The Generalized Skew Spectrum of Graphs A. The Symmetric Group Sn We review key concepts related to the symmetric group Sn to support our discussion. For a more in-depth treatment, see Grillet (2007), Rotman (1999), Raczka & Barut (1986), and the background sections of Bellante (2024). The Symmetric Group Sn is the group of all permutations of n objects. Each permuation g Sn acts by mapping the indices of objects in an ordered set to new positions, effectively defining a bijective function g : [n] [n]. The group has |Sn| = n! elements, corresponding to all possible rearrangements of n indices. Besides the cycle notation used in the main text, permutations can be conveniently represented as arrays: g = 1 2 n 1 n [ ] g(1), g(2), g(n 1), g(n) (12) where g(i) denotes the image of index i under g. For instance, the transposition swapping the first two elements in S4 is g = 1 2 3 4 [ ] 2, 1, 3, 4 . This notation makes it easy to identify a left coset transversal for Sn/Sn 2 as a set containing one representative for each group element that fixes the last two positions g = 1 n 1 n [ ] , , i, j . Since i, j [n] and i = j, there are n(n 1) such representatives, giving |Sn/Sn 2| = n(n 1). Functions on graphs, such as f(σ) = Aσ(n),σ(n 1), depend only on the image of the last two indices. This makes it clear that f(σh) = f(σ), for all σ Sn, h Sn 2, (13) since elements h Sn 2 preserve the last two indices: h = 1 n 1 n [ ] , , n 1, n . A.1. Irreducible representations We introduce the concepts that are most relevant to our work. For a more comprehensive treatment of the irreducible representations of Sn, see Sagan (2013), Zhao (2008), and the background sections of Bellante (2024). For our focus, a representation of Sn is a mapping ρ : Sn U(dim(ρ)) that assigns each permutation σ Sn a unitary matrix of size dim(ρ) dim(ρ). This mapping satisfies the following properties: 1. ρ(e) = I, where e is the identity permutation and I the identity matrix. 2. ρ(σ1σ2) = ρ(σ1)ρ(σ2) for all σ1, σ2 Sn. 3. ρ(σ 1) = ρ(σ) . Since the direct sum of two representations is also a representation, we say that a representation is irreducible (irrep) if it cannot be decomposed into a direct sum of smaller representations. That is, if ρ(σ) = ρ1(σ) ρ2(σ) for all σ Sn, then either dim(ρ1) = 0 or dim(ρ2) = 0. Irreducible representations of Sn are classified by partitions of n, up to equivalence under invertible linear transformations. A partition of n is a sequence λ = (λ1, λ2, . . . , λl) of weakly decreasing positive integers (λi λi+1) that sum to n. We denote this by λ n and often use ρ interchangeably for both the partition and the corresponding representation. The trivial partition of n is (n), which corresponds to the trivial representation ρ : Sn C given by ρ(σ) = 1 for all σ. A convenient way to visualize partitions is through Ferres diagrams (or Young diagrams), which consist of rows of boxes, where the i-th row contains λi boxes. Because λ is weakly decreasing, each row contains at most as many boxes as the row above. For example, the partition (7, 4, 3, 1) 15 is represented as: The Generalized Skew Spectrum of Graphs Figure 6. Young lattice for S4. A Young tableau is a Young diagram filled with the numbers 1 through n without repetition. A standard Young tableau has strictly increasing numbers along each row and each column. The dimension of an irrep labeled by λ is given by the number of standard Young tableaux of shape λ. This number can be computed through the hook lenght formula. Table 2 lists the dimensions of irreps relevant to our work. Table 2. Dimensions of irreducible representations of Sn indexed by partitions λ n. The dimension is given by the hook-length formula f λ, which counts the number of standard tableaux of shape λ. These values are from Kondor (2008, Table 1.1). λ f λ or dim(ρλ) (n) 1 (n 1, 1) n 1 (n 2, 2) n(n 3) 2 (n 2, 1, 1) (n 1)(n 2) In this work, we use the Young Orthogonal Representation (YOR), a specific irrep of Sn with real-valued, sparse matrices. This representation is adapted to the subgroup chain Sn Sn 1 S1, meaning that any ρ k, when restricted to Sk 1, decomposes as: ρ Sk 1 (h) = M η:η<ρ η(h), h Sk 1. (14) Here ρ k, η k 1, and the notation η < ρ means that η can be obtained by removing one box from the Young diagram of ρ. This structure provides an efficient way to decompose irreps of Sn when acting on elements of a subgroup Sn k. Figure 6 illustrates this decomposition for S4. Such diagrams are often termed Bratteli diagrams or Young diagrams. Finally, a useful result in representation theory, sometimes referred to as the Great Orthogonality Theorem (Zee (2016, Section II.3)), states that for any two irreps ρ1, ρ2, g Sn ρ1(g) i,jρ(g)k,l = |Sn| dim(ρ)δρ1,ρ2δi,lδj,k. (15) The Generalized Skew Spectrum of Graphs (n 2, 1) (n 1) (n 1, 1) (n) (n 2, 2) (n 2, 1, 1) Figure 7. Partial Young lattice of the partitions of n reaching (n 2). Here, δρ1,ρ2 is the Kronecker delta, which is 1 if ρ1 and ρ2 are equivalent (i.e., represent the same irrep) and 0 otherwise. A direct consequence of (15) is that summing over all elements of Sn in a given irrep of ρ n yields: σ Sn ρ(σ) = ( |Sn| ρ = (n) 0 otherwise. (16) This follows from the fact that the trivial representation (n) has dimension 1 and consists entirely of ones. B. Fourier sparsity and projecting on the trivial irrep In this appendix, we examine the role of τSn 2(ρ), which represents the projection of an irrep ρ Irr(Sn) onto the trivial representation of Sn 2: τSn 2(ρ) = 1 |Sn 2| h Sn 2 ρ(h) (17) This projection naturally arises in the Fourier transform of functions Sn/Sn 2 C (Kondor, 2007): ˆf(ρ) = 1 |Sn| σ Sn f(σ)ρ(σ) = 1 |Sn| h Sn 2 f(yh)ρ(yh) (18) y Sn/Sn 2 f(y)ρ(y) 1 |Sn 2| h Sn 2 ρ(h) (19) y Sn/Sn 2 f(y)ρ(y)τSn 2(ρ). (20) Moreover, τSn 2(ρ) is responsible for the sparsity patterns described in Lemma 2.1. Using the fact that YOR is adapted (14) , along with a special case of the Great Orthogonality Theorem (16), we obtain: τSn 2(ρ) = 1 |Sn 2| h Sn 2 η(h) = M ( 1 if η = (n 2) 0 otherwise. (21) This shows that τSn 2(ρ) is a diagonal matrix containing mostly 0s, except for a few 1s corresponding to the positions where (n 2) appears in the decomposition of ρ. From the structure of the Young lattice, we see that there are exactly four partitions of n that contain (n 2): (n), (n 1, 1), (n 2, 2), (n 2, 1, 1). (22) This set of partitions is refereed to as Λn in the main text. Figure 7 provides a visualization of a partial Young lattice, illustrating these decompositions. The Generalized Skew Spectrum of Graphs Since all other irreps do not contain (n 2), their projection τSn 2(ρ) is the zero matrix, implying that their Fourier transform also vanishes. Further observation reveals that three of these partitions - (n), (n 1, 1), (n 2, 1, 1) - contain (n 2) exactly once, while (n 2, 2) contains it twice. The number of paths from ρ to (n 2) in the Young lattice determines how many diagonal entries of τSn 2(ρ) are 1. Combining this observation with the dimensions of the irreps (see Table 2) completes the proof of Lemma 2.1. Finally, note that, since τSn 2(ρ) is a diagonal matrix, it follows that τSn 2(ρ) = τSn 2(ρ) . C. Precomputing s(k) with Dynamic Programming In this Appendix, we prove the correctness of Lemma 6.1 and Algorithm 2. We begin by proving the following theorem. Theorem C.1. The Fourier transform of s(k) Ik 1(Gk 2, σ) = fik 1(σ) l=1 fil(σhgl) (23) over the last entry produces ˆs(k) Ik 1(Gk 2, ρ) = ˆr(k) Ik 1(Gk 2, ρ)τSn 2(ρ), where ˆr(k) Ik 1(Gk 2, ρ) = g Sn fik 1( g) Qk 2 l=1 fil( ggl)ρ( g) and τSn 2(ρ) is the projection of ρ onto the trivial irrep (n 2). Proof. We compute ˆs(k) using the expressions from (23) and (3): ˆs(k) Ik 1(Gk 2, ρ) = 1 |Sn| l=1 fil(ghgl)ρ(g) (use g = gh) (24) = 1 |Sn| 1 |Sn 2| h Sn 2 fik 1( gh 1) l=1 fil( ggl)ρ( gh 1) (25) g Sn fik 1( g) l=1 fil( ggl)ρ( g) h Sn 2 ρ(h) (use τSn 2(ρ) = τSn 2(ρ) ) (26) = ˆr(k) Ik 1(Gk 2, ρ)τSn 2(ρ). (27) It is also important to observe that s(k) Ik 1(Gk 2, σ) = s(k) Ik 1(Gk 2, σh) for any h Sn 2. This implies that one can use a fast Fourier transform on Sn/Sn 2 to obtain ˆs(k) Ik 1(Gk 2, ρ). Now, we address the task of efficiently precomputing (23) for i1, . . . , ik 1 [d], g1, . . . , gk 2 taking values in the double coset transversal Sn 2\Sn/Sn 2 = {(), (n 1, n), (n 2, n 1), (n 2, n), (n 2, n 1, n), (n 2, n, n 1), (n 3, n 1)(n 2, n)}, and σ Sn/Sn 2. Naively computing (23) for all the possible entries would be prohibitively expensive. Specifically, the sum over Sn 2 contains (n 2)! terms, making this computation infeasible even for small n. To circumvent this, we employ a divide-and-conquer strategy plus dynamic programming. The main idea is to decompose (23) as s(k) Ik 1(Gk 2, σ) = fik 1(σ) n 2 QIk 2(Gk 2, σ) (28) where QIk 2(Gk 2, σ) = Pn 2 j=1 Qk 2 l=1 Pil(gl, σ, j) and we have 7 functions P, one per element in the double coset The Generalized Skew Spectrum of Graphs 1. Pi(g1, σ, j) = fi(σ). 2. Pi(g2, σ, j) = fi(σ (n 1, n)). 3. Pi(g3, σ, j) = fi(σ (n 1, j)). 4. Pi(g4, σ, j) = fi(σ (n, j)). 5. Pi(g5, σ, j) = fi(σ (j, n 1, n)). 6. Pi(g6, σ, j) = fi(σ (j, n, n 1)). 7. Pi(g7, σ, j) = Fi(σ(j)) P l {3,6} Pi(gl, σ, j) with Fi(j) = P l [n] l =j fi((n 1, j, n, l)). Here, σ g represents the group operation combining two elements, while σ(j) denotes the image of j under σ. By utilizing dynamic programming, we can compute these P functions efficiently and precompute s(k) without resorting to a naive approach. We break Lemma 6.1 into two propositions, proving its complexity and correctness separately. C.1. Complexity Proposition C.2. Algorithm 2 requires O(dk 2n3 + dk 1n2) operations and O(dn3 + dk 1n2) space. Proof. We analyze the complexity of each step in Algorithm 2. Computing Fi(j): Each function Fi(j) requires O(n) operations. Since we need to compute O(dn) such functions, the total time and space complexity for this step is O(dn2). Computing Pi(g, σ, j): Each function Pi(g, σ, j) can be evaluated in constant time, O(1). We need to evaluate O(dn3) such functions, leading to a total time and space complexity of O(dn3). Computing QIk 2(Gk 2, σ): For a fixed constant k [3, 9], each QIk 2(Gk 2, σ) can be evaluated in O(n) operations. There are dk 2 7 k 2 n(n 1) entries, so the total time complexity of computing all the functions is O(dk 2n3), and the space complexity is O(dk 2n2). Computing s(k) Ik 1(Gk 2, σ): Each s(k) Ik 1(Gk 2, σ) can be evaluated in constant time, O(1). There are O(dk 1n2) such entries, requiring O(dk 1n2) time and space. Summing up the asymptotic complexities for time and space we obtain O(dk 2n3 + dk 1n2) operations and O(dn3 + dk 1n2) space. C.2. Correctness Proposition C.3. Algorithm 2 precomputes s(k) Ik 1(Gk 2, σ) for all the choices of the input parameters Ik 1 [d]k 1, Gk 2 (Sn 2\Sn/Sn 2)k 2, σ Sn/Sn 2. Proof. The core idea behind the correctness of Algorithm 2 lies in breaking down the sum over Sn 2. In particular, the crucial fact is that the sum over Sn 2 l=1 fil(σhgl) (29) can be broken as a sum of n 2 partial sums l=1 Pil(gl, σ, j). (30) To illustrate this, we study the term 1 |Sn 2| h Sn 2 fi(σhg) (31) The Generalized Skew Spectrum of Graphs for all g Sn 2\Sn/Sn 2 and show how it naturally breaks into n 2 partial sums indexed by j [n 2]. Specifically, we show that each j corresponds to the (n 3)! elements h Sn 2 such that h(n 2) = j. With this additional fact, analyzing (31) suffices to prove the equivalence of (29) and (31). In the following, for each of the 7 group elements, we consider the action of σhg on n and n 1 and simplify the expression. For an ease of notation, we rewrite each fi as fi(σ) = f i(σ(n 1), σ(n)), with f i : [n]2 C. σhg(n) = σh(n) = σ(n) (32) σhg(n 1) = σh(n 1) = σ(n 1). (33) h Sn 2 f i(σhg(n 1), σhg(n)) = f i(σ(n 1), σ(n)) = fi(σ) (34) So the term breaks in 1 n 2 Pn 2 j=1 Pi(g1, σ, j). The choice of how j divides Sn 2 is arbitrary here, so we can satisfy h(n 2) = j. 2. g2 = (n 1, n) σhg(n) = σh(n 1) = σ(n 1) (35) σhg(n 1) = σh(n) = σ(n) (36) h Sn 2 f i(σhg(n 1), σhg(n)) = f (σ(n), σ(n 1)) = f(σ (n 1, n)). (37) So the term breaks in 1 n 2 Pn 2 j=1 Pi(g2, σ, j). Similarly to the previous case, the choice of how j divides Sn 2 is arbitrary, so we satisfy h(n 2) = j. 3. g3 = (n 2, n 1) σhg(n) = σh(n) = σ(n) (38) σhg(n 1) = σh(n 2) = σ(j) for j [n 2] (39) There are (n 3)! elements h Sn 2 that can produce a certain j, and there are n 2 possible j. Grouping these terms, we can write h Sn 2 f i(σhg(n 1), σhg(n)) = 1 (n 2)! j=1 (n 3)!f (σ(j), σ(n)) = 1 n 2 j=1 f(σ (n 1, j)). (40) So the term breaks in 1 n 2 Pn 2 j=1 Pi(g3, σ, j). 4. g4 = (n 2, n) σhg(n) = σh(n 2) = σ(j) for j [n 2] (41) σhg(n 1) = σh(n 1) = σ(n 1) (42) Similarly to the case above, h Sn 2 f i(σhg(n 1), σhg(n)) = 1 (n 2)! j=1 (n 3)!f (σ(n 1), σ(j)) = 1 n 2 j=1 f(σ (n, j)). (43) The Generalized Skew Spectrum of Graphs So the term breaks in 1 n 2 Pn 2 j=1 Pi(g4, σ, j). 5. g5 = (n 2, n 1, n) σhg(n) = σh(n 2) = σ(j) for j [n 2] (44) σhg(n 1) = σh(n) = σ(n) (45) h Sn 2 f i(σhg(n 1), σhg(n)) = 1 (n 2)! j=1 (n 3)!f (σ(n), σ(j)) = 1 n 2 j=1 f(σ (j, n 1, n)). (46) So the term breaks in 1 n 2 Pn 2 j=1 Pi(g5, σ, j). 6. g6 = (n 2, n, n 1) σhg(n) = σh(n 1) = σ(n 1) (47) σhg(n 1) = σh(n 2) = σ(j) for j [n 2] (48) h Sn 2 f i(σhg(n 1), σhg(n)) = 1 (n 2)! j=1 (n 3)!f (σ(j), σ(n 1)) (49) j=1 f(σ (j, n, n 1)). (50) So the term breaks in 1 n 2 Pn 2 j=1 Pi(g6, σ, j). 7. g = (n 3, n 1)(n 2, n) σhg(n) = σh(n 2) = σ(j) for j [n 2] (51) σhg(n 1) = σh(n 3) = σ(l) for l [n 2], l = j (52) Then, we obtain h Sn 2 f i(σhg(n 1), σhg(n)) = 1 (n 2)! (n 4)!f (σ(j), σ(l)) (53) f (σ(j), σ(l)). (54) So the term breaks in 1 n 2 Pn 2 j=1 Pi(g7, σ, j). Here, Pi(g7, σ, j) = 1 n 3 Pn 2 l=1 l =j f (σ(j), σ(l)). To further optimize the computation of Pi(g7, σ, j), we can compute partial sums Fi(j) = Pn l=1 j =i f i(j, l) and write Pi(g7, σ, j) = Fi(σ(j)) f i(σ(j), σ(n 1)) f i(σ(j), σ(n)) n 3 . (55) Using the cyclic notation, we obtain Pi(g7, σ, j) = Fi(σ(j)) P l {3,6} Pi(gl, σ, j) n 3 and Fi(j) = Pn l=1 j =i fi((n 1, j, n, l)). The Generalized Skew Spectrum of Graphs Table 3. Regression on QM9 and ZINC with different representations. We tested the following machine learning models: Extreme Gradient Boosting (Xgboost), Gradient Boosting Regressor (GBR), Elastic Net (EN), Linear Regression (Linear) using the default parameters of sk-learn. The error is measured as Mean Absolute Error. Dataset Representation Xgboost GBR EN Linear Single-orbit 1.1560 1.1552 1.1571 1.1558 3-orbits 1.1570 1.1528 1.1559 1.1540 C s eigs 0.9602 0.9640 1.1571 0.9899 Laplacian s eigs 0.9245 0.9377 1.0773 0.9704 Single-orbit 1.5442 1.5432 1.5422 1.5423 3-orbits 1.5500 1.5441 1.5422 1.5423 C s eigs 1.3208 1.3876 1.5297 1.4249 Laplacian s eigs 1.3350 1.3933 1.5353 1.4324 D. Further experiments In this section, we report further experiments carried on during the conference rebuttal period. These results are preliminary and intended to provide further insight, though they merit deeper future analysis. D.1. Scalability and other real-world datasets To further evaluate the scalability and practical relevance of the multi-orbit k-Spectra, we extended the multi-orbit experiments on QM7 presented in Section 7.1 (Table 1) to two larger molecular datasets: QM9 and ZINC. Both datasets were loaded through torch.geometric. QM9 consists of 130,831 molecules, each with up to 9 heavy atoms (maximum 29 nodes). We used the first 100,000 molecules for training and the remaining 30,831 for testing. ZINC contains 249,456 molecules with up to 38 nodes. We used the default train/test/validation split from torch.geometric, training on 220,011 molecules and testing on 5,000, ignoring the validation set for simplicity. Both tasks are regression problems. For QM9, the goal is to predict the dipole moment (target label 0 in torch.geometric); for ZINC, the task is to predict penalized log P, also referred to as constrained solubility. Following the setup used for QM7, we compare different graph representations: the Single-Orbit Skew Spectrum on the off-diagonals, a 3-Orbit Skew Spectrum, and eigenvalues of adjacency and Laplacian matrices. For QM9, the three orbits are defined as follows: (1) the Coulomb matrix C; (2) an edge attribute obtained by summing the four edge attributes available in the dataset; and (3) the sixth node feature. For ZINC, the orbits correspond to: (1) the G matrix; (2) the single edge attribute; and (3) the single node feature. We trained multiple regression models on these representations: Extreme Gradient Boosting (XGB), Gradient Boosting Regressor (GBR), Elastic Net (EN), Linear Regression (Linear) (Pedregosa et al., 2011). Results are presented in Table 3. For QM9, , the 3-orbit representation consistently outperforms the single-orbit skew spectrum and is competitive with Laplacian and adjacency-based features, with the Laplacian eigenvalues achieving the best accuracy. In ZINC, the 3-orbit representation is again comparable to other methods, although the eigenvalues of the Coulomb matrix C yield the best performance. We imagine that a more thoughtful use of the k-Spectra could yield better results. D.2. Integration with GNNs To evaluate the practical impact of incorporating our graph representation into deep learning pipelines, we integrate the k-Spectra features into a standard Graph Neural Network. Specifically, we integrated our k-Spectra with the HGP-SL model (Hierarchical Graph Pooling with Structure Learning) (Zhang et al., 2019). We follow a straightforward integration strategy: after computing a whole-graph embedding using our Doubly-Reduced k-Spectra, we concatenate this vector with the graph embedding produced by the final convolutional layer of HGP-SL. The concatenated representation is then passed to a fully connected layer that produces the final prediction. This design corresponds to the post-concatenation approach discussed in Section 5. The Generalized Skew Spectrum of Graphs Table 4. Test accuracy on the PROTEINS dataset. Average standard deviation (Max) over six splits. k-Spectra features are concatenated to the HGP-SL embeddings, before the prediction layer. Baseline PR=0.5 0.508 0.004 (0.615) PR=0.2 0.518 0.004 (0.615) k=3 k=4 k=5 k=3 k=4 k=5 Single-O 0.609 0.020 (0.769) 0.622 0.011 (0.769) 0.612 0.010 (0.692) 0.642 0.016 (0.738) 0.625 0.001 (0.662) 0.588 0.009 (0.662) Multi-O 0.695 0.002 (0.754) 0.683 0.000 (0.708) 0.582 0.023 (0.738) 0.665 0.003 (0.723) 0.680 0.003 (0.738) 0.665 0.010 (0.815) The model is trained and evaluated on the PROTEINS dataset, which contains protein structures represented as graphs, filtered to contain only graphs with 3 to 30 nodes. We use the same training protocol and hyperparameters as in the original HGP-SL work, including a learning rate of 0.001, a batch size of 32, and a maximum of 1000 training epochs. Early stopping is applied based on validation loss. We experiment with two dropout rates: the original 0.5 and a lower value of 0.2. We test both single-orbit and multi-orbit variants of DRk SP, using k {3, 4, 5}. For each configuration, we report the average accuracy, standard deviation, and maximum accuracy across six random train/test splits. Table 4 summarizes the results. The baseline HGP-SL model achieves an average accuracy of 0.508 (dropout 0.5) and 0.518 (dropout 0.2). Augmenting HGP-SL with the spectra consistently improves accuracy across all settings. Notably, moving from k = 3 to k = 4 generally improves performance. Multi-orbit variants, built using the adjacency matrix and one orbit per node feature, outperform single-orbit variants in most cases. These results support the hypothesis that k-Spectra features provide complementary information that enhances the learning capacity of GNNs.