# hodgeaware_convolutional_learning_on_simplicial_complexes__fdcc7331.pdf

Published in Transactions on Machine Learning Research (08/2025)

Hodge-Aware Convolutional Learning on Simplicial Complexes

Maosheng Yang m.yang-2@tudelft.nl Department of Intelligent Systems Delft University of Technology

Geert Leus g.j.t.leus@tudelft.nl Department of Microelectronics Delft University of Technology

Elvin Isufi e.isufi-1@tudelft.nl Department of Intelligent Systems Delft University of Technology

Reviewed on Open Review: https: // openreview. net/ forum? id= Nm5sp09Q25

Neural networks on simplicial complexes (SCs) can learn representations from data residing on simplices such as nodes, edges, triangles, etc. However, existing works often overlook the Hodge theorem that decomposes simplicial data into three orthogonal characteristic subspaces, such as the identifiable gradient, curl and harmonic components of edge flows. This provides a universal tool to understand the machine learning models on SCs, thus, allowing for better principled and effective learning. In this paper, we study the effect of this data inductive bias on learning on SCs via the principle of convolutions. Particularly, we present a general convolutional architecture that respects the three key principles of uncoupling the lower and upper simplicial adjacencies, accounting for the inter-simplicial couplings, and performing higher-order convolutions. To understand these principles, we first use Dirichlet energy minimizations on SCs to interpret their effects on mitigating simplicial oversmoothing. Then, we show the three principles promote the Hodge-aware learning of this architecture, through the lens of spectral simplicial theory, in the sense that the three Hodge subspaces are invariant under its learnable functions and the learning in two nontrivial subspaces is independent and expressive. Third, we investigate the learning ability of this architecture in optic of perturbation theory on simplicial topologies and prove that the convolutional architecture is stable to small perturbations. Finally, we corroborate the three principles by comparing with methods that either violate or do not respect them. Overall, this paper bridges learning on SCs with the Hodge theorem, highlighting its importance for rational and effective learning from simplicial data, and provides theoretical insights to convolutional learning on SCs.

1 Introduction

In the line of geometric deep learning (Bronstein et al., 2021), there is a growing interest in learning from data defined on simplicial complexes. The motivation behind this comes from two limitations of standard graph neural networks (GNNs). First, graphs are limited to model pairwise interactions between data entites on nodes, yet polyadic (multi-way) interactions often arise in real-world networks (Battiston et al., 2020; Benson et al., 2021; Torres et al., 2021), such as friendship networks (Newman et al., 2002), collaboration networks (Benson et al., 2018), gene regulatory networks (Masoomy et al., 2021). Second, graphs are often used to support signals on the nodes, and standard graph signal processing and GNN approaches often

Published in Transactions on Machine Learning Research (08/2025)

revolve around signals and features on nodes. Yet, signals involved with multiple entities are less researched compared to signals on nodes (with one entity). They arise as signal flows on edges, signals on triangles and so on. For example, in physical networks, we may encounter water flows in a water supply network (Money et al., 2022), traffic flows in a road network (Jia et al., 2019), trading flows in financial networks (Lim, 2020) and information flows in brain networks (Anand et al., 2022), as well as in human-generated networks, we have collaboration data, such as triadic collaborations in coauthorship networks (Benson et al., 2018).

Simplicial complexes are a popular higher-order network model and have been shown effective to address both limitations of graph-based models (Bick et al., 2021). They are composed of topological objects, namely, nodes, edges, triangles, etc., which are simplices of different orders. Simplicial complexes naturally describe more topological (higher-order) relationships in networks, thus, having more topological expressive power than graphs. This has been the main motivation of recent neural networks developed on simplicial complexes (Roddenberry & Segarra, 2019; Bunch et al., 2020; Ebli et al., 2020; Roddenberry et al., 2021; Bodnar et al.,

2021b; Chen et al., 2022b; Giusti et al., 2022). We also refer readers to the recent surveys (Papamarkou et al., 2024; Besta et al., 2024). In analogy to standard GNNs relying on the adjacency between nodes, the central idea behind these works is to rely on the relationships between simplices to enable learning. Such relations can be twofold: first, two simplices can be lower and upper adjacent to each other, e.g., an edge can be (lower) adjacent to another via a shared node, and can also be (upper) adjacent to another by locating in a common triangle; and second, there exist inter-simplicial couplings (or simplicial incidences) between simplices of different orders, as shown in Fig. 1a. The aforementioned works mainly vary in terms of either message-passing or convolutional flavor, or the type of simplicial relationships relying, either on only simplicial adjacencies or on both adjacencies and incidences.

Furthermore, signals can be defined on simplices to model the data related to multiple entites in networks. This has been the main focus of topological signal processing literature Barbarossa & Sardellitti (2020); Schaub et al. (2021); Yang et al. (2022a). The celebrated combinatorial Hodge decomposition arising from discrete calculus (Grady & Polimeni, 2010; Lim, 2020) provides a unique and characteristic decomposition of simplicial signals into three components. This is particularly intuitive for edge flows which allows their decomposition into gradient flows, curl flows and harmonic flows, that are, respectively, curl-free, divergence-free or both. These notions from discrete calculus interestingly allow us to capture some physical properties of the simplicial signals, such as the conservation laws (Grady & Polimeni, 2010). More importantly, this decomposition offers a tool to better analyze simplicial signals, as reported in statistical ranking problems, financial exchange markets (Jiang et al., 2011), traffic networks (Jia et al., 2019), brain networks (Anand et al., 2022) and game theory (Candogan et al., 2011). We hypothesize it will further promote better principled and effective learning methods on simplicial complexes.

Given this context, we reckon that the aforementioned works on simplicial neural networks mostly focus on the pure topological aspect of simplicial complexes. It lacks theoretical analyses of their learning capabilities from the Hodge spectral perspective. Also, since SCs are often built from data and are prone to estimation uncertainty, the learning on SCs benefits from a stability analysis to investigate their robustness against perturbations on the simplicial topologies. Thus, in this paper, after reviewing some background on simplicial complexes and simplicial signals in Section 2, we propose a more general and unified framework, namely, simplicial complex convolutional neural network (SCCNN), and we focus on the following three theoretical aspects.

Contributions

In Section 3 we introduce SCCNN and emphasize its three principles, namely, uncoupling the lower and upper simplicial adjacencies, accounting for the inter-simplicial couplings, and performing higher-order convolutions. We then use the Dirichlet energy minimization on SCs to understand how uncoupling the lower and upper adjacencies in Hodge Laplacians, as well as the inter-simplicial couplings can mitigate simplicial oversmoothing.

In Section 4, we characterize the spectral behavior of SCCNN and its expressive power under the help of spectral simplicial theory (Steenbergen, 2013; Barbarossa & Sardellitti, 2020; Yang et al., 2021). We show that an SCCNN performs independent and expressive learning in the three subspaces of the

Published in Transactions on Machine Learning Research (08/2025)

Hodge decomposition, which are invariant under its learning operators. This Hodge-awareness (or Hodge-aided bias) allows for effective and rational learning on SCs compared to MLPs or simplicial message-passing networks (Bodnar et al., 2021b).

In Section 5, we obtain a theoretical stability bound on the SCCNN outputs against small perturbations on the simplicial connections. This allows us to see how the three principles and other network factors can affect the stability, as well as the limitations of SCCNNs. This analysis in turn guides the design of convolutional architectures.

In Section 6, we validate our theoretical findings and highlight the effect of the three principles, the need for the Hodge-aware learning, as well as the stability, based on different simplicial tasks including recovering foreign currency exchange (forex) rates, predicting triadic and tetradic collaborations, and ocean current trajectories. Finally, we conclude the paper in Section 7 with a discussion on this work and its relations to existing works.

2 Background

We first review simplicial complexes and data supported on simplices, which are natural generalizations of the corresponding notions on graphs. Then, we introduce discrete calculus on simplicial complexes, which is linked to the incidence matrices. Finally, we discuss the Hodge decomposition, which uniquely characterizes the simplicial signals from three subspaces.

2.1 Simplicial complex and simplicial signals

Given a set V = {1, . . . , n0} of vertices, a k-simplex sk is a subset of V with cardinality k + 1. Geometrically, a node is a 0-simplex, an edge connecting two vertices is a 1-simplex and a triangular face (we shorten it as a triangle) is a 2-simplex. A subset, with cardinality k, of sk is a face. A coface of sk is a (k + 1)-simplex that has sk as a face. Furthermore, one can collect k-simplices for k = 0, . . . , K to form a simplicial complex (SC) S of order K with the inclusion restriction that if a simplex is in the SC, so are its subsets. A graph is an SC of order one and by including some triangles, we obtain an SC of order two, as shown in Fig. 1a. We denote the set of all k-simplices in S as Sk = {sk i }i=1,...,nk where nk = |Sk|, i.e., S = K k=0Sk.

Simplicial adjacency For any two k-simplices, we say they are lower (upper) adjacent if they share a common face (coface), which natualy defines the notion of simplicial neighborhoods. For example, two nodes are (upper) adjacent in a graph if they are connected by an edge. In Fig. 1a, edges e1 and e3 are lower neighbors as they share node 1; while e1 and e2 are upper neighbors since they are located in the triangle t1.

Orientation For computational purposes, we annotate each simplex with an orientation, as an ordering of the labels of its vertices (a node has a trivial orientation). Here we consider the increasing ordering as the reference orientation, that is, a triangle s2 = {i, j, k} is oriented as [i, j, k] for i < j < k, and an edge s1 = {i, j} is oriented as [i, j] for i < j.

Algebraic representation We use the incidence matrix Bk Rnk 1 nk to describe the relationships between (k 1)- and k-simplices. Thus, B1 encodes the node-to-edge incidence and B2 the edge-to-triangle incidence. In an oriented SC S, the entries of B1 and B2 are given by

1, for e = [i, ] 1, for e = [ , i] 0, otherwise. [B2]et =

1, for e = [i, j], t = [i, j, k] 1, for e = [i, k], t = [i, j, k] 1, for e = [j, k], t = [i, j, k] 0, otherwise.

We further define the k-Hodge Laplacian

Lk = B k Bk + Bk+1B k+1 (2)

Published in Transactions on Machine Learning Research (08/2025)

(a) SC (b) Edge flow (c) Gradient flow (d) Curl flow (e) Harmonic flow

Figure 1: (a) A simplicial 2-complex where green shaded triangles denote 2-simplices and the arrows denote the chosen reference orientations. (b) An edge flow where we denote its divergence (div) and curl in purple and orange, respectively. (c)-(d) The Hodge decomposition of the edge flow in (b). The gradient flow is the gradient of some node signal (in blue) and is curl-free. The curl flow can be obtained from some triangle flow (in red), and is div-free. The harmonic flow has zero div and zero curl, and is circulating around the hole {1, 3, 4}. Note that in this figure, the flow numbers are rounded up to two decimal places. Thus, at some nodes or triangles with zero-div or zero-curl, the div or curl might not be exactly zero.

with the lower Laplacian Lk,d = B k Bk and the upper Laplacian Lk,u = Bk+1B k+1. We have a set of Lk, k = 1, . . . , K 1 in an SC of order K with L0 = B1B 1 the graph Laplacian, and LK = B KBK. Topologically, Lk,d and Lk,u encode the lower and upper adjacencies of k-simplices, respectively. For example, L1,d encodes the edge-to-edge adjacencies through nodes while L1,u encodes the adjacencies through triangles.

2.2 Simplicial signals and Hodge decomposition

Simplicial signals A k-simplicial signal (or data) xk Rnk supported on the simplicial set Sk is defined by an alternating map fk : Sk Rnk, which assigns a real value to a simplex, with the condition that if the orientation of a simplex is anti-aligned with the reference orientation, then the signal will change the sign (Lim, 2020). For convenience, we call x0 a node signal. We also refer to a 1-simplicial signal x1 as an edge flow and a 2-simplicial signal x2 as a triangle flow. A d-dimensional simplicial feature Xk Rnk d can be defined for a rich representation learning of simplices. For simplicity, we restrict our analysis to d = 1.

Incidence matrices as derivatives on SCs Given a simplicial signal xk, we can measure its variability with respect to the faces and cofaces of k-simplices by computing Bkxk and B k+1xk (Grady & Polimeni, 2010). Specifically, B 1 x0 computes the gradient of a node signal x0 as the signal difference between the adjacent nodes, i.e., [B 1 x0][i,j] = [x0]j [x0]i, which is often used in the GNN literature. For an edge flow x1, B1x1 computes its divergence, which is the difference between the total in-flow and out-flow at node j, i.e., [B1x1]j = P

i<j[x1][i,j] P

j<k[x1][j,k]. Moreover, B 2 x1 computes the curl of x1, i.e., [B 2 x1]t = [x1][i,j] [x1][i,k] + [x1][j,k], which is the net-flow circulation in triangle t = [i, j, k]. As illustrated in Fig. 1b, these two computations provide divergent and rotational variation measures of an edge flow, which are analogous to the notions of divergence and curl for vector fields in continuous domains. In the following, we introduce the Hodge decomposition (Hodge, 1989; Lim, 2020) which unfolds an edge flow into three unique characteristic components.

Lemma 1 ((Lim, 2020)). We have B 2 B 1 = 0, i.e., the curl of the gradient is zero.

Theorem 2 (Hodge decomposition). The k-simplicial signal space Rnk admits a direct sum decomposition

Rnk = im(B k ) ker(Lk) im(Bk+1), thus, xk = xk,G + xk,H + xk,C, (3)

where xk,G = B k xk 1 for some xk 1, and xk,C = Bk+1xk+1 for some xk+1. Moreover, we have ker(B k+1) = im(B k ) ker(Lk) and ker(Bk) = ker(Lk) im(Bk+1).

In the node space, this decomposition is trivial as Rn0 = ker(L0) im(B1) where the kernel of L0 contains constant node data and the image of B1 contains nonconstant data. In the edge case, the three subspaces carry more tangible meaning: the gradient space im(B 1 ) collects edge flows as the gradient of some node signal, which are curl-free; the curl space im(B2) consists of flows cycling around triangles, which are div-free; and flows in the harmonic space ker(L1) are both divand curl-free. In this paper, we inherit the names

Published in Transactions on Machine Learning Research (08/2025)

(a) SC example (b) Lower edge conv. (c) Upper edge conv. (d) Inter-simplicial locality

Figure 2: (a) An SC where arrows indicate the reference orientations of edges and triangles. 2-simplices are (filled) triangles shaded in green and open triangle {1, 3, 4} is not in the SC. (b) Lower convolution via H1 and H1,d on edge e1. (c) Upper convolution via H1 and H1,u on e1. (d) Node 1 (in black) contains information from its neighbors {2, 3, 4} (nodes in red), and projected information from edges which contribute to these neighbors (denoted by arrows in red from edges to nodes), and from triangles {t1, t2, t3} which contribute to those edges (denoted by double arrows in red from triangle centers to edges). This interaction is the coupling between the intraand the extended inter-simplicial locality.

of three edge subspaces to general k-simplices. The above theorem states that any simplicial signal xk can be uniquely expressed as xk = xk,G + xk,H + xk,C with the gradient part xk,G = B k xk 1, the curl part xk,C = Bk+1xk+1, for some xk 1, and the harmonic part following Lkxk,H = 0. Figs. 1c to 1e provide the three Hodge components of the edge flow in Fig. 1b.

3 Simplicial Complex CNNs

We first introduce the general convolutional architecture on SCs, then discuss the propreties of SCCNN and study the effects of the three principles from an energy minimization perspective.

In an SC, taking xl 1 k 1, xl 1 k and xl 1 k+1 as inputs, an SCCNN at layer l = 1, . . . , L computes the k-simplicial output xl k via a map

SCCNNl k : {xl 1 k 1, xl 1 k , xl 1 k+1} xl k, xl k = σ(Hl k,dxl 1 k,d + Hl kxl 1 k + Hl k,uxl 1 k,u ), (4)

t=0 wk,d,t Lt k,d +

t=0 wk,u,t Lt k,u, Hk,d =

t=0 w k,d,t Lt k,d, Hk,u =

t=0 w k,u,t Lt k,u. (5)

Here, Hk denotes a simplicial convolutional filter (SCF, (Yang et al., 2022b)) with two sets of learnable coefficients {wk,d,t}, {wk,u,t}, while Hk,d and Hk,u are the lower and upper SCFs, respectively. Moreover, xl 1 k,d = B k xl 1 k 1 and xl 1 k,u = Bk+1xl 1 k+1 are the lower and upper projections from (k 1)-simplices via incidence relations to k-simplices, respectively, and σ( ) is an elementwise nonlinearity. The convolution operations in this SCCNN can be understood as follows: 1) The previous k-simplicial output xl 1 k is passed to an SCF Hl k of orders Td, Tu, which performs a linear combination of the signals from the lower-adjacent (up to Td-hop) and upper-adjacent (up to Tu-hop) simplices. 2) The previous k 1-simplicial outputs xl 1 k 1 are first projected to k-simplices, which are then convolved using a lower SCF and an upper SCF, respectively. Example 3. In Fig. 2, we provide an example of SCCNN for the edge case k = 1. We focus on edge e1 and consider the cases Td = Tu = 2. On edge e1, the SCF H1 linearly combines the signals from its direct lower neighbors (edges in blue) and two-hop lower neighbors (edges in purple), as shown in Fig. 2b. It also combines the signals from the direct upper neighbors (edges in red) and two-hop upper neighbors (edges in orange), as shown in Fig. 2c. At the same time, the signals on nodes are projected on the edges, denoted by arrows in blue and purple from nodes to edges in Fig. 2b, which are then combined to edge e1 by the lower SCF H1,d. The signals on triangles are projected on the edges as well, denoted by double arrows in red and orange in Fig. 2c, which are combined to edge e1 by the upper SCF H1,u.

This architecture subsumes the convolutional learning methods on SCs in Bunch et al. (2020); Ebli et al. (2020); Roddenberry et al. (2021); Yang et al. (2022a); Chen et al. (2022b); Yang et al. (2022c). We refer

Published in Transactions on Machine Learning Research (08/2025)

to Appendix C for a detailed discussion. Particularly, we here emphasize on the key three principles of an SCCNN layer:

(P1) It uncouples the lower and upper parts in the Hodge Laplacian. This leads to an independent treatment of the lower and upper adjacencies, achieved by using two sets of learnable weights. We shall see in Section 4 that how this relates to the independent and expressive learning in the Hodge subspaces given in Theorem 2.

(P2) It accounts for the inter-simplicial couplings via the incidence relations. The projections xk,d and xk,u carry nontrivial information contained in the faces and cofaces of simplices (by Theorem 2).

(P3) It performs higher-order convolutions. We consider Td, Tu 1 in SCFs which leads to a multi-hop receptive field on SCs.

In short, each SCCNN layer propagates information across SCs based on two simplicial adjacencies and two incidences in a multi-hop fashion.

3.1 Properties

Simplicial locality The simplicial convolutions admit an intra-simplicial locality where the output Hkxk is localized in Td-hop lower and Tu-hop upper k-simplicial neighborhoods (Yang et al., 2022b). An SCCNN preserves such property as σ( ) does not alter the information locality. It also admits an inter-simplicial locality between kand (k 1)-simplices due to the inter-simplicial couplings. This further extends to simplices of orders k l if L l because Bkσ(Bk+1) = 0 (Schaub et al., 2021). Moreover, the intraand inter-simplicial localities are coupled in a multi-hop way through higher-order convolutions such that, for example, a node not only interacts with its incident edges and the triangles including it, but also with those further hops away, as shown in Fig. 2d. We refer to Appendix B.1 for a more formal discussion.

Complexity An SCCNN layer has a parameter complexity of order O(Td + Tu) and a computational complexity O(k(nk + nk+1) + nkmk(Td + Tu)), which are linear in the simplex dimensions. Here, mk is the maximum number of neighbors for k-simplices. We refer to Appendix B.2 for more details.

Equivariance SCCNNs are permutation-equivairant, which allows us to list simplices in any order. They are also orientation-equivariant if the activation function σ( ) is odd, which gives us the freedom to choose reference orientations. In Appendix B.3, we provide a more formal discussion on such equivariances and why permutations form a symmetry group of an SC and orientation changes are symmetries of data spaces but not SCs.

3.2 A simplicial Dirichlet energy perspective

Here we analyze the convolution architecture in Eq. (4) from an energy minimization perspective. First, we extend the notion of Dirichlet energy from graphs to SCs.

Definition 4. The Dirichlet energy of a k-simplicial signal xk is

D(xk) = Dd(xk) + Du(xk) := Bkxk 2 2 + B k+1xk 2 2. (6)

This Dirichlet energy returns the graph Dirichlet energy when k = 0. In this case, D(x0) = B 1 x0 2 2 = P

j x0,i x0,j 2 is the ℓ2-norm of the gradient of the node signal x0. For edge flow x1, D(x1) consists of two parts, B1x1 2 2 and B 2 x1 2 2, which measure the total divergence and curl of x1, respectively, i.e., the edge flow variations w.r.t. nodes and triangles. In the general case, Dd(xk) and Du(xk) measure the lower and upper k-simplicial signal variations w.r.t. the faces and cofaces, respectively. A harmonic k-simplicial signal xk has zero Dirichlet energy, e.g., a constant node signal and a divand curl-free edge flow.

Published in Transactions on Machine Learning Research (08/2025)

Simplicial shifting as Hodge Laplacian smoothing Bunch et al. (2020); Yang et al. (2022c) considered Hk to be a weighted variant of I Lk, generalizing the graph convolutional network (GCN) (Kipf & Welling, 2017). This is necessarily a Hodge Laplacian smoothing as in Schaub et al. (2021) given an initial signal x0 k, we consider the Dirichlet energy minimization:

min xk Bkxk 2 2 + γ B k+1xk 2 2, γ > 0,

gradient descent: xl+1 k,gd = (I ηLk,d ηγLk,u)xl k (7)

with step size η > 0. The simplicial shifting xl+1 k = w0(I Lk)xl k is a gradient descent step with η = γ = 1 and weighted by w0. A minimizer of Eq. (7) with γ = 1 is in fact in the harmonic space ker(Lk). Thus, a neural network composed of simplicial shifting layers may generate an output with an exponentially decreasing Dirichlet energy as it deepens, formulated by the following proposition. We refer to this as simplicial oversmoothing, a notion that generalizes the oversmoothing of a GCN and its variants (Nt & Maehara, 2019;

Cai & Wang, 2020; Rusch et al., 2023). Proposition 5. If w2 0 I Lk 2 2 < 1, D(xl+1 k ) in a neural network of simplicial shifting layers exponentially converges to zero.

However, when uncoupling the lower and upper parts of Lk in this shifting, associated to the case γ = 1, the decrease of D(xk) can slow down or cease because the objective function in Eq. (7) instead looks for a solution primarily in either ker(Bk) (for γ 1) or ker(B k+1) (for γ 1), not necessarily in ker(Lk), as we shall corroborate in Section 6.

Inter-simplicial couplings as sources Given some nontrivial xk 1 and xk+1, we consider the optimization

min xk Bkxk xk 1 2 2 + B k+1xk xk+1 2 2,

gradient descent: xl+1 k,gd = (I ηLk)xl k + η(xk,d + xk,u) (8)

with step size η > 0. It resembles the convolutional layer, xl+1 k = w0(I Lk)xl k + w1xk,d + w2xk,u with some learnable weights, in Bunch et al. (2020); Yang et al. (2022c). Proposition 6. We have the following bounds for the Dirichlet energy: xk 1 2 2 + xk+1 2 2 D(xl+1 k ) w2 0 I Lk 2 2D(xl k) + w2 1λmax(Lk,d) xk,d 2 2 + w2 2λmax(Lk,u) xk,u 2 2.

The signal projections from the lower and upper simplices act as energy sources for xl k, and also the objective function in Eq. (8) looks for an xk in the image spaces of Bk+1 and B k , instead of ker(Lk). This ensues that D(xl+1 k ) may not converge to zero, but rather to a nontrivial value xk 1 2 2 + xk+1 2 2. Thus, inter-simplicial couplings pay a role in mitigating the oversmoothing.

Here we showed that simply generalzing GCNs to simplices will inherit its oversmoothing risks. However, by uncoupling the lower and upper Laplacians and accounting for the inter-simplicial couplings we could mitigate this issue. This can also be explained by means of a diffusion process on SCs (Ziegler et al., 2022), which we discuss in Appendix B.4.

4 From convolutional to Hodge-aware

In this section, we first introduce the Hodge-invariant operator, which is an operator such that the three Hodge subspaces are invariant under it. Then, we show that the SCF is such an operator and SCCNN, guided by the three principles (P1-P3), performs Hodge-invariant learning, allowing for rational and effective learning on SCs while remaining expressive. Throughout the exposition, we rely on the simplicial spectral theory (Barbarossa & Sardellitti, 2020; Yang et al., 2021; 2022b), which also allows us to characterize the expressive power of SCCNNs. We refer to the detailed derivations and proofs in Appendix E. Definition 7 (Invariant subspace). Let V be a finite-dimensional vector space over R with dim(V ) 1, and let T : V V be an operator in V. A subspace U V is an invariant subspace under T if Tu U for all u U, i.e., the image of every vector in U under T remains within U. We denote this as T|U : U U where T|U is the restriction of T on U.

Published in Transactions on Machine Learning Research (08/2025)

Given the notion of invariant subspace, we then define the Hodge-invariant operators.

Definition 8 (Hodge-invariant operator). Let {im(B k ), im(Bk+1), ker(Lk)} be any Hodge subspace of Rnk. A linear transformation F : Rnk Rnk is a Hodge-invariant operator if for all xk it holds that F(xk) . That is, any simplicial signal in a certain Hodge subspace remains in that subspace under F.

Definition 9 ((Barbarossa & Sardellitti, 2020)). The simplicial Fourier transform (SFT) of xk is xk = U k xk where the eigenbasis Uk of Lk acts as the simplicial Fourier basis and the eigenvalues in Λk = diag(λk) are simplicial frequencies.

Proposition 10 (Yang et al. (2022b)). The SFT basis can be found as Uk = [Uk,H Uk,G Uk,C] where

Uk,H is the eigenvector matrix associated to the zero eigenvalues Λk,H = diag(λk,H), named as harmonic frequencies, Uk,G is associated to the nonzero eigenvalues in Λk,G = diag(λk,G) of Lk,d, named as gradient frequencies, and Uk,C is associated to the nonzero eigenvalues in Λk,C = diag(λk,C), named as curl frequencies.

Moreover, they span the Hodge subspaces:

span(Uk,H) = ker(Lk), span(Uk,G) = im(B k ), span(Uk,C) = im(Bk+1) (9)

where span( ) denotes all possible linear combinations of columns of .

Remark 11. The frequency notion in general carries the physical meaning of signal variations. In the simplicial case, gradient frequencies reflect the degree of lower variations Dd(uk,G) of the associated gradient Fourier basis, and curl frequencies reflect the degree of upper variations Du(uk,C) of the associated curl basis. Harmonic frequencies (zeros) correspond to the basis having zero lower and upper variations. In the edge case, the gradient and curl frequencies, respectively, correspond to the total divergence and total curl, measuring how divergent and rotational the associated basis is (Yang et al., 2022b).

Proposition 12. The SCF Hk is a Hodge-invariant operator. That is, for any xk , we have Hkxk , for {im(B k ), im(Bk+1), ker(Lk)}. Moreover, the SCF operation can be implicitly written as

Hkxk = Hk|im(B k )xk,G + Hk|ker(Lk)xk,H + Hk|im(Bk+1)xk,C (10)

where Hk|im(B k ) = PTd t=1 wk,d,t Lt k,d + (wk,d,0 + wk,u,0)I is the restriction of Hk on the gradient space im(B k ), Hk|ker(Lk) = (wk,d,0 + wk,u,0)I is the restriction of Hk on the harmonic space, and Hk|im(Bk+1) = PTu t=0 wk,u,t Lt k,u + (wk,d,0 + wk,u,0)I is the restriction on the curl space.

Provided with the Hodge-invariance of Hk and the SFT, we can perform a spectral analysis, which is of interest to further understand the SCCNN since simplicial frequencies reflect the variation characteristics of simplicial signals.

4.1 Spectral analysis

Consider the SFT xk = [ x k,H, x k,G, x k,C] of xk where each component is the intensity of xk at a certain simplicial frequency. We can understand how an SCCNN convolutional layer yk = Hk,dxk,d+Hkxk+Hk,uxk,u regulates/learns from the simplicial signals at different frequencies by performing the SFT

yk,H = hk,H xk,H yk,G = hk,d xk,d + hk,G xk,G yk,C = hk,C xk,C + hk,u xk,u, (11)

with the elementwise multiplication. The nk-dimensional vector hk = diag(U k Hk Uk) = [ h k,H h k,G h k,C]

is the frequency response vector of Hk with

hk,H = (wk,d,0 + wk,u,0)1, hk,G = PTd t=0 wk,d,tλ t k,G + wk,u,01, hk,C = PTu t=0 wk,u,tλ t k,C + wk,d,01, (12)

Published in Transactions on Machine Learning Research (08/2025)

G, G, G, G C, C, C, C

Hodge Lap. smoothing

G, G, G, G G, G

10 2 dist node dist edge dist tri

0 10 3 10 2 10 1 100

10 2 dist node dist edge dist tri

Figure 3: (a) (top): Independent gradient and curl learning responses. (bottom): Stability-selectivity tradeoff of SCFs where h G has better stability but smaller selectivity than g G. (b) Information spillage of nonlinearity. (top): the SFT of an input with only gradient components. (bottom): the SFT of the output shows that after applying a nonlinearity the output also contains information in non-gradient frequencies. (c) The distance between the perturbed outputs and true when node adjacencies are perturbed. (top): L = 1, triangle output remains clean. (bottom): L = 2, triangle output is perturbed.

where t is the elementwise t-th power of a vector. Likewise,

hk,d = PTd t=0 w k,d,tλ t k,G + w k,u,01, and hk,u = PTu t=0 w k,u,tλ t k,C + w k,d,01 (13)

are the frequency response vectors of Hk,d and Hk,u. The spectral relation in Eq. (11) shows that the gradient SFT xk,G is learned by a gradient response hk,G, while the curl SFT xk,C is learned by a curl response hk,C. The two learnable responses are independent and they only coincide at the trivial harmonic frequency, as shown by the two individual curves in Fig. 3a. Moreover, the lower and upper projections are independently learned by hk,d and hk,u, respectively.

The elementwise nonlinearity induces an information spillage that one type of spectra could be spread over other types. As illustrated in Fig. 3b, the top figure shows the SFT of an input with only gradient components, and the bottom figure plots the SFT of σ(yk), showing that it also contains information in harmonic or curl subspaces. This results from the nonlinearity, since applying a linear SCF leads to an output with only gradient components. In the following, we characterize the expressive power of SCCNN.

Proposition 13. An SCCNN layer with inputs xk,d, xk, xk,u is at most expressive as an MLP σ(G k,dxk,d + Gkxk + G k,uxk,u) with Gk = Gk,d + Gk,u where Gk,d and G k,d are analytical matrix functions of Lk,d, while Gk,u and G k,u are analytical matrix functions of Lk,u. This expressivity can be achieved by setting Td = T d = nk,G and Tu = T u = nk,C in Eq. (4) with nk,G the number of distinct gradient frequencies and nk,C the number of distinct curl frequencies.

The proof follows from Cayley-Hamilton theorem (Horn & Johnson, 2012). This expressive power can be better understood from the spectral perspective The gradient SFT xk,G can be learned as expressive as by an analytical vector-valued function gk,G, which collects the eigenvalues of Gk,d at gradient frequencies. The curl SFT xk,C can be learned as expressive as by another analytical vector-valued function gk,C, which collects the eigenvalues of Gk,u at curl frequencies. These two functions need only to coincide at the harmonic frequency. In addition, the SFTs of lower and upper projections can be learned as expressive as by two independent analytical vector-valued functions as well.

4.2 Hodge-aware learning

Given the expressive power in Proposition 13 and the spectral relation in Eq. (11), we show that SCCNN performs a Hodge-aware learning in the following sense, which comes with advantages over the existing approaches.

Theorem 14. An SCCNN is Hodge-aware: 1) The SCF Hk is a Hodge-invariant learning operator. Specifically, three Hodge subspaces are invariant under Hk; 2) The lower SCF Hk,d and upper SCF Hk,u are,

Published in Transactions on Machine Learning Research (08/2025)

Figure 4: An illustration of the Hodge-aware learning of an SCCNN. We show that how an edge flow x1, together with the lower and upper projections x1,d and x1,u, are transformed by an SCCNN in the spectral domain. The implicit operation H1x1 (in the dashed box on the right) reflects the Hodge-aware learning: 1) H1 is Hodge invariant: each component is learned within their own subspace, and H1 does not mix up the three subspaces; 2) The learning in the gradient and curl subspaces are independent where features at shared frequencies λG,2 and λC,1 can be separately learned; and 3) The learning operators are expressive in the sense that the spectral responses are as expressive as any analytical functions in the gradient and curl frequencies.

respectively, gradientand curl-invariant learning operators; 3) The learnings in the gradient and curl spaces are independent; And 4) the learnings in the gradient and curl spaces are expressive as in Proposition 13.

This theorem shows that an SCCNN performs an expressive and independent learning in the gradient and curl subspaces from the three inputs while preserving the three subspaces to be invariant w.r.t its learnable SCFs. This allows for the rational and effective learning on SCs, as illustrated in Fig. 4, from the two aspects. These three-fold properties of an SCCNN, respectively, come from the convolutional architecture choice, the uncoupling of the lower and upper adjacencies, and the higher-order convolutions in the SCCNN.

On the one hand, Proposition 12 shows that the operation of Hk on the simplicial signal space is equivalent to a summation of its restrictions Hk| on three smaller subspaces . This Hodge-invariant nature of the learnable SCFs substantially shrinks the learning space of an SCCNN and allows for an effective learning. On the other hand, simplicial signals often present implicit or explicit properties that different Hodge subspaces can capture. For example, water flows, traffic flows, electrical currents (Grady & Polimeni, 2010; Jia et al., 2019) follow flow conservation, i.e., being div-free in the gradient space ker(B1), while exchange rates can be modelled as curl-free edge flows (Jiang et al., 2011). Owing to the Hodge-invariance of H1 and its independent learning in the nontrivial subspaces, an SCCNN can capture such characteristics of real-world edge flows effectively. When it comes to regression tasks on SCs, an SCCNN can generate outputs respecting these physicial laws.

Remark 15 (Relation to message passing networks). Message-passing simplicial networks (MPSNs) (Bodnar et al., 2021b) using MLP to aggregate and update are non-Hodge-aware. Their learning functions pursue direct mappings between the much larger signal space Rnk, thus, requiring more training data for accurate learning, as well as a larger computational complexity. Moreover, MPSN does not preserve the Hodge subspaces, i.e., it is not Hodge-invariant. Thus, they might generate outputs with small losses (e.g., mean-squared-errors) in regression tasks, yet not respecting the physical laws being either divor curl-free properties such as the above simplicial signals. We shall corroborate this in Appendix G.

Published in Transactions on Machine Learning Research (08/2025)

Remark 16 (Relation to other convolutional methods). While most convolutional networks on SCs use Hodgeinvariant learning operators, they are not strictly Hodge-aware, resulting in practical limits. For example, Ebli et al. (2020) considered Hk = P

i wi Li k, which preserves the Hodge subspaces yet does not uncouple the lower and upper parts of Lk. This makes it strictly less expressive and non-Hodge-aware. Consider two frequencies λG = λC which share a common value but correspond to the gradient and curl subspaces, respectively. The simplicial signal components at these two frequencies are always learned in the same fashion, which induces contradicting issues when the underlying component in one subspace should be diminished while the one in the other subspace should be preserved. This underlines the importance of uncoupling the two adjacencies because the lower and upper Laplacians operate in different subspaces. Roddenberry et al. (2021) applied Hk with Td = Tu = 1. Spatially, this limits the receptive field of each simplex to its direct neighbors. Spectrally, it leads to a linear learning frequency response. A similar treatment was considered in Bunch et al. (2020); Yang et al. (2022c) which simply generalized the GCN without uncoupling the two adjacencies, and gave a limited low-pass linear spectral response, as shown in Fig. 3a and discussed in Section 3.2.

5 How robust are SCCNNs to domain perturbations?

In practice, an SCCNN is often built on a weighted SC to capture the strengths of simplicial adjacencies and incidences. We defer the explicit formulations in Appendix F.1 since it has the same form as Eq. (4) in this case, except for that the Hodge Laplacians are weighted, as well as the incidence matrices. These matrices are often defined following Grady & Polimeni (2010); Horak & Jost (2013); Guglielmi et al. (2023). For example, Bunch et al. (2020); Yang et al. (2022c) considered a particular random walk formulation (Schaub et al., 2020). They can also be learned from data, e.g., via an attention method (Goh et al., 2022; Giusti et al., 2022). For the weighted incidence matrices B k , Bk+1, we use operators Rk,d, Rk,u in this section.

To highlight the need for a stability analysis, note that, on the one hand, we may lack the true underlying topologies in SCs as they are often estimated from noisy data; and we may undergo adversarial attacks on the topologies. On the other hand, we want to characterize the stability-selectivity tradeoff of SCCNN, in analogy to the study for CNNs (Bruna & Mallat, 2013; Qiu et al., 2018; Bietti & Mairal, 2017) and GNNs (Gama et al., 2019b; 2020a; Kenlay et al., 2021; Parada-Mayorga et al., 2022).

This motivates us to investigate the stability of SCCNN: how far are the outputs of an SCCNN before and after perturbations are applied to SCs? We consider the following relative perturbation model, generalizing the graph perturbation model in Gama et al. (2019b)

Definition 17 (Relative perturbation). Consider some perturbation matrix of an appropriate dimension. For the weighted Hodge Laplacian Lk,d, its relative perturbed version is b Lk,d = Lk,d + Ek,d Lk,d + Lk,d Ek,d with perturbation Ek,d; likewise for b Lk,u by Ek,u. For the weighted incidence matrix Rk,d, its relative perturbed version is b Rk,d = Rk,d + Jk,d Rk,d with perturbation Jk,d; likewise for b Rk,u by Jk,u.

This models the domain perturbations on the strengths of adjacent and incident relations, e.g., a large weight is applied when two edges are weakly or not adjacent, or data on a node is projected on an edge not incident to it. Moreover, this quantifies the relative perturbations with respect to the local simplicial topology in the sense that weaker connections in an SC are deviated by perturbations proportionally less than stronger connections. We further define the integral Lipschitz property of spectral filters to measure the variability of spectral response functions of Hk.

Definition 18 (Intergral Lipschitz SCF). An SCF Hk is integral Lipschitz with constants ck,d, ck,u 0 if the derivatives of its spectral response functions hk,G(λ) and hk,C(λ) follow that |λ h k,G(λ)| ck,d and |λ h k,C(λ)| ck,u.

This property provides a stability-selectivity tradeoff of SCFs independently in the gradient and curl frequencies. A spectral response can have both a good selectivity and stability in small frequencies (a large | h k, | for λ 0), while it tends to be flat for having better stability at the cost of selectivity (a small variability for large λ) in large frequencies, as shown in Fig. 3a. As of the polynomial nature of responses, all SCFs of SCCNN are integral Lipschitz. We also denote the integral Lipschitz constant for the lower SCFs Hk,d by ck,d and for the upper SCFs Hk,u by ck,u without loss of generality.

Published in Transactions on Machine Learning Research (08/2025)

Under the following assumptions, we now characterize the stability bound of SCCNN.

Assumption 19. The perturbations are small such that Ek,d 2 ϵk,d, Jk,d 2 εk,d, Ek,u 2 ϵk,u and Jk,u 2 εk,u, where A 2 = max|x|1=1 Ax 2 is the operator norm (spectral radius) of a matrix A.

Assumption 20. The SCFs Hk of an SCCNN have a normalized bounded frequency response (for simplicity, though unnecessary), likewise for Hk,d and Hk,u.

Assumption 21. The lower and upper projections are finite such that Rk,d 2 rk,d and Rk,u 2 rk,u.

Assumption 22. The nonlinearity σ( ), e.g., relu, tanh, sigmoid, is cσ-Lipschitz with cσ 0.

Assumption 23. The initial inputs x0 k, for all k, are finite, such that x0 k 2 [β]k. We collect them in β = [β0, . . . , βK] .

Theorem 24. Let x L k be the k-simplicial signal output of an L-layer SCCNN on a weighted SC. Let ˆx L k be the output of the same SCCNN but on a relatively perturbed SC. Define δk,d = ( Vk,d Uk + 1)2 1 and δk,u = ( Vk,u Uk + 1)2 1, with Vk,d and Vk,u the eigenvectors of Ek,d and Ek,u, which measure the eigenvector misalignments between the perturbations and Laplacians. Under Assumptions 19 to 23, the Euclidean distance between the two outputs is finite and upper-bounded

ˆx L k x L k 2 [d]k with d = c L σ

l=1 b Zl 1T ZL lβ, (14)

where for K = 2,

t0 t0,u t1,d t1 t1,u t2,d t2

1 r0,u r1,d 1 r1,u r2,d 1

1 ˆr0,u ˆr1,d 1 ˆr1,u ˆr2,d 1

with ˆrk,d = rk,d(1+εk,d) and ˆrk,u = rk,u(1+εk,u). Notice that T , Z and b Z are tridiagonal and follow a similar structure for a general K. The diagonal entries of T are tk = ck,d k,dϵk,d + ck,u k,uϵk,u. The off-diagonal entries are tk,d = rk,dεk,d+ck,d k,dϵk,drk,d and tk,u = rk,uεk,u+ck,u k,uϵk,urk,u, where k,d = 2(1+δk,d) nk and k,u = 2(1 + δk,u) nk.

We refer to Appendix F.2 for a two-step proof. This result bounds the outputs of an SCCNN on all simplicial levels, showing that they are stable to small perturbations on the simplicial adjacencies and incidences. Specifically, we make two observations from the complex expression. First, the stability bound depends on i) the degree of perturbations including their magnitudes ϵk, and εk, , and the eigenspace misalignment δk, ; ii) the number of simplices nk; iii) the integral Lipschitz properties ck, of SCFs; and, iv) the degree of projections rk, . Second, the stability of the k-output depends on the factors related to not only k-simplices, but also simplices of adjacent orders due to inter-simplicial couplings. For example, when L = 1, the node output bound d0 is affected by factors in the node space, as well as the edge space factored by the projection degree. As the layer deepens, this mutual dependence expands further. When L = 2, the factors in the triangle space also affect the stability of the node output d0, as we observe in Fig. 3c.

More importantly, this stability bound provides intuitive practical implications for convolutional learning on SCs. While inter-simplicial couplings may be beneficial, SCCNN becomes less stable as the number of layers increases due to the mutual dependence between outputs on different simplicial levels. Thus, to maintain the expressive power, we expect to use higher-order SCFs in exchange for shallow layers. This yet does not harm the stability in the following two aspects:

First, high-frequency components can be spread over the low frequencies due to the information spillage of nonlinearity [cf. Fig. 3b] where the spectral responses are more selective and have better stability. If the signal has large high gradient frequency components, we need the SCCNN to be selective in high gradient frequencies. However, to guarantee the stability, the frequency response should be smooth (less selective) in these frequencies, as illustrated by h G in Fig. 3a (bottom). This selectivity-stability tradeoff can be mitigated by the nonlinearity the information in high gradient frequencies could spill over in lower frequencies, where the spectral responses are more selective and have better discriminating ability.

Published in Transactions on Machine Learning Research (08/2025)

Second, higher-order SCFs are easier to be learned with smaller integral Lipschitz constants than lower-order ones due to the increased degree of freedom, thus, leading to an increased stability. This can be easily seen by comparing one-order and two-order cases. We experimentally investigate this in Section 6.4. Moreover, we introduce the following regularizer to the loss function during training so to promote the integral Lipschitz property.

r IL = λk,G h k,G(λk,G) + λk,C h k,C(λk,C) =

t=0 twk,d,tλt k,G

t=0 twk,u,tλt k,C

for λk,G {λk,G,i}nk,G i=1 and λk,C {λk,C,i}nk,C i=1 , which are the gradient and curl frequencies. To avoid computing the eigendecomposition of the Hodge Laplacian, we can approximate the true frequencies by sampling certain number of points in the frequency band (0, λk,G,m] and (0, λk,C,m] where the maximal gradient and curl frequencies can be computed by efficient algorithms, e.g., power iteration.

6 Experiments

The goal of this section is to answer the following four research questions with experiments on various simplicial-level regression and classification tasks:

RQ 1 What are the effects of the three principles of SCCNN, i.e., uncoupling the lower and upper parts of Hodge Laplacians (P1), the inter-simplicial couplings (P2), and higher-order convolutions (P3)? RQ 2 How do the uncoupling of the lower and upper parts of Hodge Laplacians and the inter-simplicial couplings affect the simplicial oversmoothing? RQ 3 How does the Hodge-aware property of SCCNN play a role in different tasks on SCs, compared to non-Hodge-aware methods? RQ 4 How do different factors affect the stability of SCCNN, and how can we maintain the stability while keeping the expressive power?

For comparison, we consider the following learning methods on single-level simplices:

simplicial neural network (SNN) (Ebli et al., 2020), which does not respect P1 and P2 and is non-Hodge-aware; principled simplicial neural network (PSNN) (Roddenberry et al., 2021), which does not respect P2 and P3 and is non-Hodge-aware; simplicial convolutional neural networks (SCNN)1 (Yang et al., 2022a), which does not respect P2 but is Hodge-aware;

and the following learning methods on simplicial complexes:

Bunch (Bunch et al., 2020), which does not respect P1 and P2 and is non-Hodge-aware; MPSN (Bodnar et al., 2021b), which is based on message-passing and not Hodge-aware.

We also considered the MLP and standard GNN (Defferrard et al., 2016) as baselines to highlight the effect of SC topology on simplicial-level tasks. We refer to Appendix C for the detailed comparisons between these methods and SCCNN, as well as to Appendix G for the full experimental details.

6.1 Foreign currency exchange (RQs 1, 3)

In forex problems, to build a fair market, the arbitray-free condition implies that for any currencies i, j, k, it follows that ri/jrj/k = ri/k where ri/j is the exchange rate between i and j. That is, the exchange path

1Note that the difference between SCCNN and SCNN lies in that the latter does not include the inter-layer projections, as detailed in Appendix C thus, we refer to our method, simplicial complex CNN.

Published in Transactions on Machine Learning Research (08/2025)

Table 1: Forex results (nmse|total arbitrage, ).

Methods Random Noise Curl Noise Interpolation

Input 0.119 0.004|29.19 0.874 0.552 0.027|122.4 5.90 0.717 .030|106.4 0.902 Baseline (ℓ2 regularizer) 0.036 0.005|2.29 0.079 0.050 0.002|11.12 0.537 0.534 0.043|9.67 0.082 SNN 0.110 0.005|23.24 1.03 0.446 0.017|86.95 2.20 0.702 0.033|104.74 1.04 PSNN 0.008 0.001|0.984 0.170 0.000 0.000|0.000 0.000 0.009 0.001|1.13 0.329 MPSN 0.039 0.004|7.74 0.88 0.076 0.012|14.92 2.49 0.117 0.063|23.15 11.7

SCCNN, id 0.027 0.005|0.000 0.000 0.000 0.000|0.000 0.000 0.265 0.036|0.000 0.000 SCCNN, tanh 0.002 0.000|0.325 0.082 0.000 0.000|0.003 0.003 0.003 0.002|0.279 0.151

i j k provides no profit or loss over a direct exchange i k. Following Jiang et al. (2011), we model exchange rates as edge flows in an SC of order two, specifically, via [x1][i,j] = log(ri/j). This conveniently translates the arbitrage-free condition into x1 being curl-free, i.e., [x1][i,j] + [x1][j,k] [x1][i,k] = 0 in any triangle [i, j, k]. We consider a real-world forex market at three timestamps, which contains certain degree of arbitrage (Jia et al., 2019; Yang et al., 2024). We focus on recovering a fair market in two scenarios, first, denoising from noisy exchange rates where random noise and noise only in the curl space modelling random arbitrage ( curl noise ) are added, and second, interpolation, when only 50% of the total rates are observed. To evaluate the performance, we measure the normalized mean squared error (nmse) and total arbitrage (total curl), both equally important for achieving a fair market.

From Table 1, we make the following observations on the impacts of P1 and P3, as well as the Hodge-awareness.

1) MPSN performs poorly at this task: although it reduces the nmse, it outputs unfair rates with large arbitrage, against the forex principle, because it is not Hodge-aware and unable to capture the arbitrage-free property with small amount of data (cf. Remark 15). 2) SNN performs poorly as well: as discussed in Remark 16, it restricts the gradient and curl spaces to be always learned in the same fashion and makes it impossible to perform disjoint learning in two subspaces. However, since there are eigenvalues which share a common value but live in different subspaces in this SC, it requires preserving the gradient component while removing the curl one here. 3) PSNN can reconstruct relatively fair forex rates with small nmse. The reconstruction from curl noise is perfect, while in the other two cases, the nmse and arbitrage are three times larger than the proposed SCCNN due to the limited expressivity of linear learning responses. 4) SCCNN performs the best in both reducing the total error and the total arbitrage, ultimately, corroborating the impact of performing Hodge-aware learning.

We notice that with an identity activation function (σ = id), the arbitrage-free rule is fully learned by an SCCNN. However, it has relatively large errors in the random noise and interpolation cases due to its limited linear expressive power. With a nonlinearity σ = tanh, an SCCNN can tackle these more challenging cases, finding a good compromise between overall errors and data characteristics.

6.2 Simplicial oversmoothing analysis (RQ 2)

1 10 100 Layer

shift,node with proj.,node shift,edge with proj.,edge uncouple lower/upper,edge shift,tri. with proj.,tri.

Figure 5: Simplicial oversmoothing.

We use simplicial shifting layers (i.e., Eq. (7) composed with σ = tanh) to illustrate the evolution of Dirichlet energies of the outputs on nodes, edges and triangles in an SC of order two with respect to the number of layers. The corresponding inputs are randomly sampled from a uniform distribution U([ 5, 5]). Fig. 5 (the dashed lines) shows that simply generalizing the GCN on SCs as in Bunch method could lead to oversmoothing on simplices of all orders. This aligns with our theoretical results in Section 3.2. However, uncoupling the lower and upper parts of L1 (e.g., by setting γ = 2 in Eq. (7)) could mitigate

Published in Transactions on Machine Learning Research (08/2025)

Table 2: Simplex prediction (AUC, ).

MLP GNN SNN PSNN SCNN Bunch MPSN SCCNN

2-simplex 68.5 1.6 93.9 1.0 92.0 1.8 95.6 1.3 96.5 1.5 98.3 0.5 98.1 0.5 98.7 0.5 3-simplex 69.0 2.2 96.6 0.5 95.1 1.2 98.1 0.5 98.3 0.4 98.5 0.5 99.2 0.3 99.4 0.3

Table 3: Ablation study on different components in an SCCNN: the best results with the corresponding hyperparameters, and the results under the same hyperparameters.

Missing component 2-simplex Hyper Params. 2-simplex Hyper Params.

98.7 0.5 L = 2, T = 2 98.7 0.5 L = 2, T = 2 Edge-to-Node 93.9 1.0 L = 5, T = 2 89.8 1.9 L = 2, T = 2 Node-to-Node 98.7 0.4 L = 4, T = 2 96.1 0.9 L = 2, T = 2 Edge-to-Edge 98.5 1.0 L = 3, T = 2 97.4 0.8 L = 2, T = 2 Node-to-Edge 98.8 0.3 L = 4, T = 2 98.5 0.5 L = 2, T = 2

Node input 98.2 0.5 L = 2, T = 4 98.1 0.7 L = 2, T = 2 Edge input 98.1 0.4 L = 2, T = 3 97.2 0.9 L = 2, T = 2

the oversmoothing on edges, as shown by the dotted line. Lastly, when we account for the inter-simplicial coupling, as shown by the solid lines (where we applied Eq. (8)), it could almost prevent the oversmoothing, since it provides energy sources. We refer to Appendix G.1 for more results.

6.3 Simplex prediction (RQs 1, 3-4)

We consider the prediction task of 2and 3-simplices which extends the link (1-simplex) prediction in graphs. Our approach is to first learn the representations of lower-order simplices and then use an MLP with their concatenation as inputs to identify if a simplex is closed or open, which generalizes the link prediction method of Zhang & Chen (2018). Considering a coauthorship dataset (Ammar et al., 2018), we built an SC following Ebli et al. (2020) where nodes represent authors and (k 1)-simplices thus represent the collaborations of k-authors. The input simplicial signals are the numbers of citations, e.g., x1 and x2 are those of dyadic and triadic collaborations. Thus, 2-simplex (3-simplex) prediction amounts to predicting triadic (tetradic) collaborations. We evaluate the AUC (area under the curve) performance.

From Table 2, we make three observations on the effect of the three key principles. 1) SCCNN, MPSN and Bunch methods outperform the ones without inter-simplicial couplings. This highlights that accounting for contributions from faces and cofaces increases the representation power of the network. 2) SCNN performs better than an SNN, which shows that uncoupling the lower and upper parts in Lk improves the representation learning. 3) SCCNN performs better than Bunch (similarly, SCNN better than PSNN), showing that higherorder convolution further improves predictions. 4) While MPSN performs similar to SCCNN, it has three times more parameters than an SCCNN (Appendix G.3.6) under the settings of the best results.

Ablation study We then perform an ablation study to investigate the roles of different components in an SCCNN. As reported in Table 3, we remove certain simplicial relations in the SCCNN and evaluate the prediction performance. Without the edge-to-node incidence, when inputting the node features to the MLP predictor, it is equivalent to a GNN, which has a poor performance. When removing other adjacencies or incidences, the best performance remains similar but with an increased model complexity (more layers required). This however is not preferred, because the stability decreases as the architecture deepens and the model gets influenced by factors in other simplicial spaces, as discussed in Section 5 and shown in Fig. 3c. When keeping the hyperparameters fixed, the performance decreases more severely. We also consider the case with limited input where the input on nodes or on edges is missing. The best performance of an SCCNN only slightly drops with an increase of the convolution order.

Published in Transactions on Machine Learning Research (08/2025)

0 10 3 10 2 10 1 100 ϵ0

10 2 dist node dist edge dist tri

(a) Node pert.L = 1

0 10 3 10 2 10 1 100 ϵ0

dist node dist edge dist tri

(b) Node pert.L = 2

0 10 3 10 2 10 1 100 ϵ1

10 1 dist node dist edge dist tri

(c) Edge pert.L = 1

0 10 3 10 2 10 1 100 ϵ2

10 1 dist node dist edge dist tri

(d) Tri. pert.L = 1

0 10 3 10 2 10 1 100 ϵ2

dist node dist edge dist tri

(e) Tri. pert.L = 2

Figure 7: The relative difference of SCCNN outputs on simplices of different orders when different levels of perturbations are applied to only nodes, edges and triangles and the number of layers varies.

6.3.1 Stability analysis (RQ 4)

0 10 3 10 2 10 1 100

x L k ˆx L k

Node Edge Tri.

Node thm. Edge thm. Tri. thm.

Figure 6: Stability bounds.

Stability bounds To investigate the stability bound in Eq. (14), we add perturbations to relatively shift the eigenvalues of the Hodge Laplacians and the singular values of the projection matrices by ϵ [0, 1] (cf. Assumption 19). We compare the bound in Eq. (14) to the experimental ℓ2 distance on each simplex level. As shown in Fig. 6 where the dashed lines are the theoretical stability bounds whereas the solid ones are the experimental stability bounds, we see the bounds become tighter as perturbation increases.

Stability dependence across simplices For 2-simplex prediction of K = 2, we measure the distance between the simplicial outputs of SCCNN with and without perturbations on nodes, edges, and triangles, i.e., x L k ˆx L k / x L k , for k = 0, 1, 2. Fig. 7 shows that overall the stabilities of different simplicial outputs are dependent on each other. Specifically, we see that the triangle output is not influenced by the perturbation on node weights until L = 2; likewise, the node output is not influenced by the perturbations on triangle weights when L = 1. Also, perturbations on the edge weights will perturbe the outputs on nodes, edges, triangles when L = 1. This corroborates our discussions in Section 5.

Effect of number of simplices. We see that the same level of perturbations added to different simplices leads to different degrees of instability, owing to the effect of nk on stability in Eq. (14). Since n0 < n1 < n2, the perturbations on node weights cause less instability than those on edge and triangle weights.

Effect of number of layers. As the number of layers increases, Fig. 7 also shows that the stability of SCCNN degrades, which corresponds to our analysis of using shallow layers.

6.4 Trajectory prediction (RQ 1, 4)

We consider the task of predicting trajectories in a synthetic SC and ocean drifters from Schaub et al. (2020), following Roddenberry et al. (2021). From Table 4, we first observe that the SCCNN and Bunch methods do not always perform better than those without inter-simplicial couplings. This is because zero inputs are applied on nodes and triangles following Roddenberry et al. (2021), which makes inter-simplicial couplings inconsequential. Secondly, an SCCNN performs better than Bunch on average, and SCNN better than PSNN, showing the advantages of higher-order convolutions. Note that the prediction here aims to find the best candidate from the neighborhood of the end node, which depends on the node degree. Since the average node degree of the synthetic SC is 5.24 and that in the ocean drifter data is 4.81, a random guess has around 20% accuracy. The high standard derivations may result from the limited ocean drifter data size.

6.4.1 Stability analysis (RQ 4)

Different from Section 6.3.1, we further investigate the stability in terms of the integral Lipschitz properties and convolutional orders. We consider SCNNs (Yang et al., 2022a) with orders Td = Tu {1, 3, 5} and train them with regularizations on the integral Lipschitz constants. As shown in Fig. 8, the higher-order case has

Published in Transactions on Machine Learning Research (08/2025)

Table 4: Trajectory prediction (accuracy, ).

Methods Synthetic trajectories Ocean drifters

SNN 65.5 2.4 52.5 6.0 PSNN 63.1 3.1 49.0 8.0 SCNN 67.7 1.7 53.0 7.8 Bunch 62.3 4.0 46.0 6.2 SCCNN 65.2 4.1 54.5 7.9

T = 1 T = 3 T = 5

0.00 0.25 0.50 0.75 1.00 ϵ

x1 ˆx1 / x1

T = 1 T = 3 T = 5

Figure 8: Stability and accuracy of SCCNN versus convolutional orders.

better stability (smaller ℓ2 distance between the outputs without and with perturbations) and consistent better accuracy, compared to the lower-order case. This is because the additional flexibility in the higher-order case allows the filters to have better integral Lipschitz properties, thus, better stability, while maintaining the accuracy. Meanwhile, we also study the effect of the regularizer in Eq. (16) on improving the stability while not losing the performance. We refer to Appendix G.4.4 for the detailed experimental analysis where we numerically measure the improved integral Lipschitz properties with this regularizer.

7 Related Works, Discussions and Conclusion

Related works Our work is mainly related to learning methods on SCs. Roddenberry & Segarra (2019) first used L1,d to build neural networks on edges in a graph setting without the upper edge adjacency. Ebli et al. (2020) then generalized convolutional GNNs (Kipf & Welling, 2017; Defferrard et al., 2016) to simplices by using the Hodge Laplacian. Roddenberry et al. (2021); Yang et al. (2022a) instead uncoupled the lower and upper Laplacians to perform oneand multi-order convolutions, to which Goh et al. (2022); Giusti et al. (2022); Lee et al. (2022) added attention schemes. Keros et al. (2022) considered a variant of Roddenberry et al. (2021) to identify topological holes and Chen et al. (2022b) combined shifting on nodes and edges for link prediction. These works learned within a simplicial level and did not consider the incidence relations (inter-simplicial couplings) in SCs, which was included by Bunch et al. (2020); Yang et al. (2022c). These works considered convolutional-type methods, which can be subsumed by SCCNNs. Meanwhile, Bodnar et al. (2021b); Hajij et al. (2021) generalized the message passing on graphs (Xu et al., 2018a) to SCs, relying on both adjacencies and incidences. Most of these works focused on extending GNNs to SCs by varying the information propagation on SCs with limited theoretical insights into their components. Among them, Roddenberry et al. (2021) discussed the equivariance of PSNN to permutation and orientation, which SCCNNs admit as well. Bodnar et al. (2021b) studied the messgae-passing on SCs in terms of WL test of SCs built by completing cliques in a graph. The more closely related work is Yang et al. (2022a), which gave only a spectral formulation based on SCFs but not SCCNNs. We refer to Papamarkou et al. (2024); Besta et al. (2024) for an overview of the current progress on learning on SCs.

Discussions In our opinion, the advantage of SCs is not only about them being able to model higherorder network structures, but also support simplicial data, which can be both human-generated data like coauthorship, and physical data like flow-type data. This is why we approached the analysis from the perspectives of both simplicial structures and the simplicial data, i.e., the Hodge theory and spectral simplicial theory (Hodge, 1989; Lim, 2020; Yang et al., 2021; Barbarossa & Sardellitti, 2020; Steenbergen, 2013; Yang et al., 2022b; Govek et al., 2018). We provided insights into why the three principles (P1-P3) are needed and how they can guide the effective and rational learning from simplicial data. As we have practically found, SCCNNs perform well in applications where data exhibits properties characterized by the Hodge decomposition due to the Hodge-awareness, while non-Hodge-aware learners fail at giving rational results. In cases where data does not possess such properties, SCCNNs have better or comparable performance than the ones which violate or do not respect the three principles.

Published in Transactions on Machine Learning Research (08/2025)

Concurrently, there are works on more general cell complexes, e.g., (Hajij et al., 2020; 2022; Sardellitti et al., 2021; Roddenberry et al., 2022; Bodnar et al., 2021a), where 2-cells inlcude not only triangles, but also general polygon faces. We focus on SCs because a regular cell complex can be subdivided into an SC (Lundell et al., 1969; Grady & Polimeni, 2010) to which the analysis in this paper applies, or we can generalize our analysis by allowing B2 to include 2-cells. This is however informal and does not exploit the power of cell complexes, which relies on cellular sheaves, as studied in (Hansen & Ghrist, 2019; Bodnar et al., 2022).

Conclusion We proposed three principles (P1-P3) for convolutional learning on SCs, summarized in a general architecture, SCCNNs. Our analysis showed this architecture, guided by the three principles, demonstrates an awareness of the Hodge decomposition and performs rational, effective and expressive learning from simplicial data. Furthermore, our study reveals that SCCNNs exhibit stability and robustness against perturbations in the strengths of simplicial connections. Experimental results validate the benefits of respecting the three principles and the Hodge-awareness, as well as the stability results. Overall, our work establishes a solid theoretical fundation for convolutional learning on SCs, highlighting the importance of the Hodge theorem.

Reproducibility Statement

We refer to Learning_on_SCs for the reproducibility of our experiments. We also note that the proposed architecture is implemented in the Topo Model X framework (Hajij et al., 2024).

Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, et al. Construction of the literature graph in semantic scholar. ar Xiv preprint ar Xiv:1805.02262, 2018. Cited on pages 15 and 40.

D Vijay Anand, Soumya Das, and Moo K Chung. Hodge-decomposition of brain networks. ar Xiv preprint ar Xiv:2211.10542, 2022. Cited on pages 2 and 28.

Sergio Barbarossa and Stefania Sardellitti. Topological signal processing over simplicial complexes. IEEE Transactions on Signal Processing, 68:2992 3007, 2020. Cited on pages 2, 7, 8, 17, and 25.

Claudio Battiloro, Lucia Testa, Lorenzo Giusti, Stefania Sardellitti, Paolo Di Lorenzo, and Sergio Barbarossa. Generalized simplicial attention neural networks. ar Xiv preprint ar Xiv:2309.02138, 2023. Cited on page 29.

Federico Battiston, Giulia Cencetti, Iacopo Iacopini, Vito Latora, Maxime Lucas, Alice Patania, Jean-Gabriel Young, and Giovanni Petri. Networks beyond pairwise interactions: structure and dynamics. Physics Reports, 874:1 92, 2020. Cited on page 1.

Austin R Benson, Rediet Abebe, Michael T Schaub, Ali Jadbabaie, and Jon Kleinberg. Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences, 115(48):E11221 E11230, 2018. Cited on pages 1, 2, and 40.

Austin R Benson, David F Gleich, and Desmond J Higham. Higher-order network analysis takes off, fueled by classical ideas and new data. ar Xiv preprint ar Xiv:2103.05031, 2021. Cited on page 1.

Maciej Besta, Florian Scheidl, Lukas Gianinazzi, Shachar Klaiman, Jürgen Müller, and Torsten Hoefler. Demystifying higher-order graph neural networks. ar Xiv preprint ar Xiv:2406.12841, 2024. Cited on pages 2 and 17.

Christian Bick, Elizabeth Gross, Heather A Harrington, and Michael T Schaub. What are higher-order networks? ar Xiv preprint ar Xiv:2104.11329, 2021. Cited on page 2.

Alberto Bietti and Julien Mairal. Group invariance, stability to deformations, and complexity of deep convolutional representations. ar Xiv preprint ar Xiv:1706.03078, 2017. Cited on page 11.

Published in Transactions on Machine Learning Research (08/2025)

Cristian Bodnar, Fabrizio Frasca, Nina Otter, Yu Guang Wang, Pietro Liò, Guido F Montufar, and Michael Bronstein. Weisfeiler and lehman go cellular: Cw networks. Advances in Neural Information Processing Systems, 34, 2021a. Cited on pages 18 and 30.

Cristian Bodnar, Fabrizio Frasca, Yuguang Wang, Nina Otter, Guido F Montufar, Pietro Lio, and Michael Bronstein. Weisfeiler and lehman go topological: Message passing simplicial networks. In International Conference on Machine Learning, pp. 1026 1037. PMLR, 2021b. Cited on pages 2, 3, 10, 13, 17, 28, 29, 30,

39, 40, and 42.

Cristian Bodnar, Francesco Di Giovanni, Benjamin Chamberlain, Pietro Liò, and Michael Bronstein. Neural sheaf diffusion: A topological perspective on heterophily and oversmoothing in gnns. Advances in Neural Information Processing Systems, 35:18527 18541, 2022. Cited on pages 18 and 40.

Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. ar Xiv preprint ar Xiv:2104.13478, 2021. Cited on page 1.

Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE transactions on pattern analysis and machine intelligence, 35(8):1872 1886, 2013. Cited on page 11.

Eric Bunch, Qian You, Glenn Fung, and Vikas Singh. Simplicial 2-complex convolutional neural networks. In TDA & Beyond, 2020. URL https://openreview.net/forum?id=TLbns Krt6J-. Cited on pages 2, 5, 7,

11, 13, 17, 28, 29, 30, 31, 33, 39, 40, and 44.

Chen Cai and Yusu Wang. A note on over-smoothing for graph neural networks. ar Xiv preprint ar Xiv:2006.13318, 2020. Cited on page 7.

Ozan Candogan, Ishai Menache, Asuman Ozdaglar, and Pablo A Parrilo. Flows and decompositions of games: Harmonic and potential games. Mathematics of Operations Research, 36(3):474 503, 2011. Cited on page 2.

Yuzhou Chen, Yulia Gel, and H Vincent Poor. Time-conditioned dances with simplicial complexes: Zigzag filtration curve based supra-hodge convolution networks for time-series forecasting. In Advances in Neural Information Processing Systems, 2022a. Cited on page 40.

Yuzhou Chen, Yulia R. Gel, and H. Vincent Poor. Bscnets: Block simplicial complex neural networks. Proceedings of the AAAI Conference on Artificial Intelligence, 36(6):6333 6341, 2022b. doi: 10.1609/aaai. v36i6.20583. URL https://ojs.aaai.org/index.php/AAAI/article/view/20583. Cited on pages 2, 5, 17, and 30.

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016. URL https:// proceedings.neurips.cc/paper/2016/file/04df4d434d481c5bb723be1b6df1ee65-Paper.pdf. Cited on pages 13, 17, 29, and 40.

Stefania Ebli, Michaël Defferrard, and Gard Spreemann. Simplicial neural networks. In Neur IPS 2020 Workshop on Topological Data Analysis and Beyond, 2020. Cited on pages 2, 5, 11, 13, 15, 17, 29, 39, 40, and 44.

Fernando Gama, Antonio G Marques, Geert Leus, and Alejandro Ribeiro. Convolutional graph neural networks. In 2019 53rd Asilomar Conference on Signals, Systems, and Computers, pp. 452 456. IEEE, 2019a. Cited on page 29.

Fernando Gama, Alejandro Ribeiro, and Joan Bruna. Stability of graph scattering transforms. Advances in Neural Information Processing Systems, 32, 2019b. Cited on page 11.

Fernando Gama, Joan Bruna, and Alejandro Ribeiro. Stability properties of graph neural networks. IEEE Transactions on Signal Processing, 68:5680 5695, 2020a. Cited on pages 11 and 28.

Published in Transactions on Machine Learning Research (08/2025)

Fernando Gama, Elvin Isufi, Geert Leus, and Alejandro Ribeiro. Graphs, convolutions, and neural networks: From graph filters to graph neural networks. IEEE Signal Processing Magazine, 37(6):128 138, 2020b. Cited on page 29.

Lorenzo Giusti, Claudio Battiloro, Paolo Di Lorenzo, Stefania Sardellitti, and Sergio Barbarossa. Simplicial attention neural networks. ar Xiv preprint ar Xiv:2203.07485, 2022. Cited on pages 2, 11, 17, and 29.

Christopher Wei Jin Goh, Cristian Bodnar, and Pietro Lio. Simplicial attention networks. In ICLR 2022 Workshop on Geometrical and Topological Representation Learning, 2022. Cited on pages 11, 17, and 29.

Kiya W Govek, Venkata S Yamajala, and Pablo G Camara. Spectral simplicial theory for feature selection and applications to genomics. ar Xiv preprint ar Xiv:1811.03377, 2018. Cited on page 17.

Leo J Grady and Jonathan R Polimeni. Discrete calculus: Applied analysis on graphs for computational science, volume 3. Springer, 2010. Cited on pages 2, 4, 10, 11, 18, 28, 32, and 40.

Nicola Guglielmi, Anton Savostianov, and Francesco Tudisco. Quantifying the structural stability of simplicial homology. ar Xiv preprint ar Xiv:2301.03627, 2023. Cited on pages 11 and 32.

Mustafa Hajij, Kyle Istvan, and Ghada Zamzmi. Cell complex neural networks. In Neur IPS 2020 Workshop on Topological Data Analysis and Beyond, 2020. Cited on pages 18 and 30.

Mustafa Hajij, Ghada Zamzmi, Theodore Papamarkou, Vasileios Maroulas, and Xuanting Cai. Simplicial complex representation learning. ar Xiv preprint ar Xiv:2103.04046, 2021. Cited on pages 17 and 30.

Mustafa Hajij, Ghada Zamzmi, Theodore Papamarkou, Nina Miolane, Aldo Guzmán-Sáenz, and Karthikeyan Natesan Ramamurthy. Higher-order attention networks. ar Xiv preprint ar Xiv:2206.00606, 2022. Cited on pages 18 and 30.

Mustafa Hajij, Mathilde Papillon, Florian Frantzen, Jens Agerberg, Ibrahem Al Jabea, Rubén Ballester, Claudio Battiloro, Guillermo Bernárdez, Tolga Birdal, Aiden Brent, et al. Topox: a suite of python packages for machine learning on topological domains. Journal of Machine Learning Research, 25(374):1 8, 2024. Cited on page 18.

Jakob Hansen and Robert Ghrist. Toward a spectral theory of cellular sheaves. Journal of Applied and Computational Topology, 3(4):315 358, 2019. Cited on pages 18 and 30.

William Vallance Douglas Hodge. The theory and applications of harmonic integrals. CUP Archive, 1989. Cited on pages 4 and 17.

Danijela Horak and Jürgen Jost. Spectra of combinatorial laplace operators on simplicial complexes. Advances in Mathematics, 244:303 336, 2013. Cited on pages 11, 32, and 40.

Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge university press, 2012. Cited on pages 9

Elvin Isufi and Maosheng Yang. Convolutional filtering in simplicial complexes. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5578 5582, 2022. doi: 10.1109/ICASSP43922.2022.9746349. Cited on page 26.

Junteng Jia, Michael T Schaub, Santiago Segarra, and Austin R Benson. Graph-based semi-supervised & active learning for edge flows. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 761 771, 2019. Cited on pages 2, 10, and 14.

Xiaoye Jiang, Lek-Heng Lim, Yuan Yao, and Yinyu Ye. Statistical ranking and combinatorial hodge theory. Mathematical Programming, 127(1):203 244, 2011. Cited on pages 2, 10, and 14.

Henry Kenlay, Dorina Thano, and Xiaowen Dong. On the stability of graph convolutional neural networks under edge rewiring. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8513 8517. IEEE, 2021. Cited on page 11.

Published in Transactions on Machine Learning Research (08/2025)

Alexandros D Keros, Vidit Nanda, and Kartic Subr. Dist2cycle: A simplicial neural network for homology localization. Proceedings of the AAAI Conference on Artificial Intelligence, 36(7):7133 7142, 2022. doi: 10.1609/aaai.v36i7.20673. URL https://ojs.aaai.org/index.php/AAAI/article/view/20673. Cited on pages 17 and 29.

Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017. Cited on pages 7, 17, and 29.

See Hian Lee, Feng Ji, and Wee Peng Tay. Sgat: Simplicial graph attention network. ar Xiv preprint ar Xiv:2207.11761, 2022. Cited on page 17.

Lek-Heng Lim. Hodge laplacians on graphs. SIAM Review, 62(3):685 715, 2020. Cited on pages 2, 4, 17, and 27.

Albert T Lundell, Stephen Weingram, Albert T Lundell, and Stephen Weingram. Regular and semisimplicial cw complexes. The Topology of CW Complexes, pp. 77 115, 1969. Cited on page 18.

Hosein Masoomy, Behrouz Askari, Samin Tajik, Abbas K Rizi, and G Reza Jafari. Topological analysis of interaction patterns in cancer-specific gene regulatory network: persistent homology approach. Scientific Reports, 11(1):1 11, 2021. Cited on page 1.

Rohan Money, Joshin Krishnan, Baltasar Beferull-Lozano, and Elvin Isufi. Online edge flow imputation on networks. IEEE Signal Processing Letters, 2022. Cited on page 2.

James R Munkres. Elements of algebraic topology. CRC press, 2018. Cited on page 27.

Mark EJ Newman, Duncan J Watts, and Steven H Strogatz. Random graph models of social networks. Proceedings of the national academy of sciences, 99(suppl_1):2566 2572, 2002. Cited on page 1.

Hoang Nt and Takanori Maehara. Revisiting graph neural networks: All we have is low-pass filters. ar Xiv preprint ar Xiv:1905.09550, 2019. Cited on page 7.

Theodore Papamarkou, Tolga Birdal, Michael M Bronstein, Gunnar E Carlsson, Justin Curry, Yue Gao, Mustafa Hajij, Roland Kwitt, Pietro Lio, Paolo Di Lorenzo, et al. Position: Topological deep learning is the new frontier for relational learning. In Forty-first International Conference on Machine Learning, 2024. Cited on pages 2 and 17.

Alejandro Parada-Mayorga, Zhiyang Wang, Fernando Gama, and Alejandro Ribeiro. Stability of aggregation graph neural networks. ar Xiv preprint ar Xiv:2207.03678, 2022. Cited on page 11.

Qiang Qiu, Xiuyuan Cheng, Guillermo Sapiro, et al. Dcfnet: Deep neural network with decomposed convolutional filters. In International Conference on Machine Learning, pp. 4198 4207. PMLR, 2018. Cited on page 11.

T Mitchell Roddenberry and Santiago Segarra. Hodgenet: Graph neural networks for edge data. In 2019 53rd Asilomar Conference on Signals, Systems, and Computers, pp. 220 224. IEEE, 2019. Cited on pages

2, 17, and 44.

T Mitchell Roddenberry, Nicholas Glaze, and Santiago Segarra. Principled simplicial neural networks for trajectory prediction. In International Conference on Machine Learning, pp. 9020 9029. PMLR, 2021. Cited on pages 2, 5, 11, 13, 16, 17, 28, 29, 39, 40, 43, and 44.

T Mitchell Roddenberry, Michael T Schaub, and Mustafa Hajij. Signal processing on cell complexes. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8852 8856. IEEE, 2022. Cited on pages 18 and 30.

Vincent Rouvreau. Alpha complex. In GUDHI User and Reference Manual. GUDHI Editorial Board, 2015. URL http://gudhi.gforge.inria.fr/doc/latest/group__alpha__complex.html. Cited on page 38.

Published in Transactions on Machine Learning Research (08/2025)

T Konstantin Rusch, Michael M Bronstein, and Siddhartha Mishra. A survey on oversmoothing in graph neural networks. ar Xiv preprint ar Xiv:2303.10993, 2023. Cited on page 7.

Aliaksei Sandryhaila and José MF Moura. Discrete signal processing on graphs. IEEE Transactions on Signal Processing, 61(7):1644 1656, 2013. Cited on page 29.

Aliaksei Sandryhaila and José MF Moura. Discrete signal processing on graphs: Frequency analysis. IEEE Transactions on Signal Processing, 62(12):3042 3054, 2014. Cited on page 29.

Stefania Sardellitti, Sergio Barbarossa, and Lucia Testa. Topological signal processing over cell complexes. In 2021 55th Asilomar Conference on Signals, Systems, and Computers, pp. 1558 1562. IEEE, 2021. Cited on pages 18 and 30.

Michael T Schaub, Austin R Benson, Paul Horn, Gabor Lippner, and Ali Jadbabaie. Random walks on simplicial complexes and the normalized hodge 1-laplacian. SIAM Review, 62(2):353 391, 2020. Cited on pages 11, 16, 32, 40, and 44.

Michael T Schaub, Yu Zhu, Jean-Baptiste Seby, T Mitchell Roddenberry, and Santiago Segarra. Signal processing on higher-order networks: Livin on the edge... and beyond. Signal Processing, 187:108149, 2021. Cited on pages 2, 6, 7, and 26.

John Steenbergen. Towards a spectral theory for simplicial complexes. Ph D thesis, Duke University, 2013. Cited on pages 2 and 17.

Leo Torres, Ann S Blevins, Danielle Bassett, and Tina Eliassi-Rad. The why, how, and when of representations for complex systems. SIAM Review, 63(3):435 485, 2021. Cited on page 1.

Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International conference on machine learning, pp. 6861 6871. PMLR, 2019. Cited on page 29.

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? ar Xiv preprint ar Xiv:1810.00826, 2018a. Cited on page 17.

Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. In International conference on machine learning, pp. 5453 5462. PMLR, 2018b. Cited on page 26.

Maosheng Yang, Elvin Isufi, Michael T. Schaub, and Geert Leus. Finite Impulse Response Filters for Simplicial Complexes. In 2021 29th European Signal Processing Conference (EUSIPCO), pp. 2005 2009, August 2021. doi: 10.23919/EUSIPCO54536.2021.9616185. ISSN: 2076-1465. Cited on pages 2, 7, 17, 25, and 40.

Maosheng Yang, Elvin Isufi, and Geert Leus. Simplicial convolutional neural networks. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8847 8851, 2022a. doi: 10.1109/ICASSP43922.2022.9746017. Cited on pages 2, 5, 13, 16, 17, 29, 40, and 44.

Maosheng Yang, Elvin Isufi, Michael T. Schaub, and Geert Leus. Simplicial convolutional filters. IEEE Transactions on Signal Processing, 70:4633 4648, 2022b. doi: 10.1109/TSP.2022.3207045. Cited on pages 5,

6, 7, 8, 17, 25, 26, and 29.

Maosheng Yang, Viacheslav Borovitskiy, and Elvin Isufi. Hodge-compositional edge gaussian processes. In International Conference on Artificial Intelligence and Statistics, pp. 3754 3762. PMLR, 2024. Cited on page 14.

Ruochen Yang, Frederic Sala, and Paul Bogdan. Efficient representation learning for higher-order data with simplicial complexes. In The First Learning on Graphs Conference, 2022c. URL https://openreview. net/forum?id=n Gq JY4DODN. Cited on pages 5, 7, 11, 17, and 30.

Published in Transactions on Machine Learning Research (08/2025)

Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. Advances in neural information processing systems, 31, 2018. Cited on pages 15 and 39.

Cameron Ziegler, Per Sebastian Skardal, Haimonti Dutta, and Dane Taylor. Balanced hodge laplacians optimize consensus dynamics over simplicial complexes. Chaos: An Interdisciplinary Journal of Nonlinear Science, 32(2):023128, 2022. Cited on pages 7 and 28.

Published in Transactions on Machine Learning Research (08/2025)

1 Introduction 1

2 Background 3

2.1 Simplicial complex and simplicial signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.2 Simplicial signals and Hodge decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3 Simplicial Complex CNNs 5

3.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.2 A simplicial Dirichlet energy perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

4 From convolutional to Hodge-aware 7

4.1 Spectral analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

4.2 Hodge-aware learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

5 How robust are SCCNNs to domain perturbations? 11

6 Experiments 13

6.1 Foreign currency exchange (RQs 1, 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

6.2 Simplicial oversmoothing analysis (RQ 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

6.3 Simplex prediction (RQs 1, 3-4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

6.3.1 Stability analysis (RQ 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

6.4 Trajectory prediction (RQ 1, 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

6.4.1 Stability analysis (RQ 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

7 Related Works, Discussions and Conclusion 17

A Illustration for Background 25

B Simplicial 2-Complex CNNs and Details on Properties 26

C Related works 28

D Proofs for Section 3 30

E Proofs for Section 4 31

F Proofs for Section 5 32

G Experiment details 38

Published in Transactions on Machine Learning Research (08/2025)

A Illustration for Background

Ths paper relies on the Hodge decomposition and the spectral simplicial theory. To ease the exposition, we illustrate them for the edge flow space. We refer to Barbarossa & Sardellitti (2020); Yang et al. (2021; 2022b) for more details.

(a) u G,1, λG,1(0.80) (b) u G,2, λG,2(1.61) (c) u G,3, λG,3(2.43) (d) u G,4, λG,4(3.96) (e) u G,5, λG,5(5.12)

(f) u G,6, λG,6(6.08) (g) u C,1, λC,1(1.59) (h) u C,2, λC,2(3.00) (i) u C,3, λC,3(4.41) (j) u H, λH(0)

Figure 9: (a)-(f) Six gradient frequencies and the corresponding Fourier basis. We also annotate their divergences, and we see that these eigenvectors with a small eigenvalue have a small magnitude of total divergence, i.e., the edge flow variation in terms of the nodes. Gradient frequencies reflect the nodal variations. (g)-(i) Three curl frequencies and the corresponding Fourier basis. We annotate their curls and we see that these eigenvectors with a small eigenvalue have a small magnitude of total curl, i.e., the edge flow variation in terms of the triangles. Curl frequencies reflect the rotational variations. (j) Harmonic basis with a zero frequency, which has a zero nodal and zero rotational variation.

A.1 Spectral simplicial theory

Here we show how the eigenvalues of Lk carry the notion of simplicial frequency Yang et al. (2022b). Specifically, we show for k = 1 an eigenvalue measures the total divergence or curl of the eigenvector.

Gradient Frequency: the nonzero eigenvalues associated with the eigenvectors U1,G of L1,d, which span the gradient space im(B 1 ), admit L1,du1,G = λ1,Gu1,G for any eigenpair u1,G and λ1,G. Thus, we have λ1,G = u 1,GL1,du1,G = u 1,GB 1 B1u1,G = B1u1,G 2 2, which is an Euclidean norm of the divergence, i.e., the total nodal variation of u1,G. If an eigenvector has a larger eigenvalue, it has a larger total divergence. For the SFT of an edge flow, if the gradient embedding x1,G has a large weight on such an eigenvector, it contains components with a large divergence, and we say it has a large gradient frequency. Thus, we call such eigenvalues associated with U1,G gradient frequencies.

Curl Frequency: the nonzero eigenvalues associated with the eigenvectors U1,C of L1,u, which span the curl space im(B2), admit L1,uu1,C = λ1,Cu1,C for any eigenpair u1,C and λ1,C. Thus, we have λ1,C = u 1,CL1,uu1,C = u 1,CB2B 2 u1,C = B 2 u1,C 2 2, which is an Euclidean norm of the curl, i.e., the total rotational variation of u1,C. If an eigenvector has a larger eigenvalue, it has a larger total curl. For the SFT of an edge flow, if the curl embedding x1,C has a large weight on such an eigenvector, it contains components with a large curl, and we say it has a large curl frequency. Thus, we call such eigenvalues associated with U1,C curl frequencies.

Harmonic Frequency: the zero eigenvalues associated with the eigenvectors U1,H, which span the harmonic space ker(L1), admit L1u1,H = 0 for any eigenpair u1,H and λ1,H = 0. From the definition of L1, we have B1u1,H = B 2 u1,H = 0. That is, the eigenvector u1,H is divergenceand curl-free. We also say such an eigenvector has zero signal variation in terms of the nodes and triangles. This resembles the constant graph signal in the node space. We call such zero eigenvalues as harmonic frequencies.

Published in Transactions on Machine Learning Research (08/2025)

Fig. 9 shows the simplicial Fourier basis and the corresponding simplicial frequencies of the SC, from which we see how the eigenvalues of L1 can be interpreted as the simplicial frequencies.

For k = 0, the eigenvalues of L0 carry the notion of graph frequency, which measures the graph (node) signal smoothness w.r.t. the upper adjacent simplices, i.e., edges. Thus, the curl frequency of k = 0 coincides with the graph frequency and a constant graph signal has only harmonic frequency component. For a more general k, there exist these three types of simplicial frequencies, which measure the k-simplicial signal total variations in terms of faces and cofaces.

B Simplicial 2-Complex CNNs and Details on Properties

We give two examples where the first is a SCCNN on a SC of order two, and the second is the form of SCCNN with multi-features. Example 25. For k = 2, a SCCNN layer reads as

xl 0 = σ(Hl 0xl 1 0 + Hl 0,u B1xl 1 1 ),

xl 1 = σ(Hl 1,d B 1 xl 1 0 + Hl 1xl 1 1 + Hl 1,u B2xl 1 2 ),

xl 2 = σ(Hl 2,d B 2 xl 1 1 + Hl 2xl 1 2 ).

Recursively, we see that a SCCNN layer takes as inputs {xl 1 0 , xl 2 0 , xl 2 1 , xl 2 2 } to compute xl 0. One may find this familar as some type of skip connections in GNNs Xu et al. (2018b). Example 26 (Multi-Feature SCCNN). A multi-feature SCCNN at layer l takes {Xl 1 k 1, Xl 1 k , Xl 1 k+1} as inputs, each of which has Fl 1 features, and generates an output Xl k with Fl features as

t=0 Lt k,d B k Xl 1 k 1W l k,d,t +

t=0 Lt k,d Xl 1 k W l k,d,t +

t=0 Lt k,u Xl 1 k W l k,u,t +

t=0 Lt k,u Bk+1Xl 1 k+1W l k,u,t

(18) where Lt indicates the matrix t-power of L, while superscript l indicates the layer index.

B.1 Simplicial locality in details

The construction of SCFs has an intra-simplicial locality. Hkxk, which consists of basic operations Lk,dxk and Lk,uxk. They are given, on simplex sk i , by

[Lk,dxk]i = P

j N k i,d {i}[Lk,d]ij[xk]j, [Lk,uxk]i = P

j N k i,u {i}[Lk,u]ij[xk]j, (19)

where sk i aggregates signals from its lower and upper neighbors, N k i,d and N k i,u. We can compute the t-step shifting recursively as Lt k,dxk = Lk,d(Lt 1 k,d xk), a one-step shifting of the (t 1)-shift result; likewise for Lt k,uxk. A SCF linearly combines such multi-step simplicial shiftings based on lower and upper adjacencies. Thus, the output Hkxk is localized in Td-hop lower and Tu-hop upper k-simplicial neighborhoods (Yang et al., 2022b). SCCNNs preserve such intra-simplicial locality as the elementwise nonlinearity does not alter the information locality, shown in Figs. 2b and 2c.

A SCCNN takes the data on kand (k 1)-simplices at layer l 1 to compute xl k, causing interactions between k-simplices and their (co)faces when all SCFs are identity. In turn, xl 1 k 1 contains information on (k 2)-simplices from layer l 2. Likewise for xl 1 k+1, thus, xl k also contains information up to (k 2)-simplices if L 2, because Bkσ(Bk+1) = 0. Accordingly, this inter-simplicial locality extends to the whole SC if L K, unlike linear filters in a SC where the locality happens up to the adjacent simplices (Schaub et al., 2021; Isufi & Yang, 2022), which limits its expressive power. This locality is further coupled with the intra-locality through three SCFs such that a node not only interacts with the edges incident to it and direct triangles including it, but also edges and triangles further hops away which contribute to the neighboring nodes, as shown in Fig. 2d.

Published in Transactions on Machine Learning Research (08/2025)

B.2 Complexity

In a SCCNN layer for computing xl k, there are 2 + Td + Tu filter coefficients for the SCF Hl k, and 1 + Td and 1 + Tu for Hl k,d and Hl k,u, respectively, which gives the parameter complexity of order O(Td + Tu). This complexity will increase by Fl Fl 1 fold for the multi-feature case, and likewise for the computational complexity. Given the inputs {xl 1 k 1, xl 1 k , xl 1 k+1}, we discuss the computation complexity of xl k in Eq. (4).

First, consider the SCF operation Hl kxl 1 k . As discussed in the localities, it is a composition of Td-step lower and Tu-step upper simplicial shiftings. Each simplicial shifting has a computational complexity of order O(nkmk) dependent on the number of neighbors mk where nk is the number of k-simplices. Thus, this operation has a complexity of order O(nkmk(Td + Tu)).

Second, consider the lower SCF operation Hl k,d B k xl 1 k 1. As incidence matrix Bk is sparse, it has nk(k + 1) nonzero entries as each k-simplex has k + 1 faces. This leads to a complexity of order O(nkk) for operation B k xl 1 k 1. Followed by a lower SCF operation, i.e., a Td-step lower simplicial shifting, thus, a complexity of order O(knk + nkmk Td) is needed.

Third, consider the upper SCF operation Hl k,u Bk+1xl 1 k+1. Likewise, incidence matrix Bk+1 has nk+1(k + 2) nonzero entries. This leads to a complexity of order O(nk+1k) for the projection operation Bk+1xl 1 k+1. Followed by an upper SCF operation, i.e., a Tu-step upper simplicial shifting, thus, a complexity of order O(knk+1 + nkmk Tu) is needed.

Finally, we have a computational complexity of order O(k(nk + nk+1) + Nk Mk(Td + Tu)) in total. Remark 27. The lower SCF operation Hl k,d B k xl 1 k 1 can be further reduced if nk 1 nk. Note that we have

Hl k,d B k xl 1 k 1 =

t=0 w l k,d,t Lt k,d B k xl 1 k 1 = B k

t=0 w l k,d,t Lt k 1,uxl 1 k 1, (20)

where the second equality comes from that Lk,d B k = B k Bk B k = B k Lk 1,u, L2 k,d B k = (B k Bk)(B k Bk)B k = B k (Bk B k )(Bk B k ) = B k Lk 1,u and likewise for general t. Using the RHS of Eq. (20) where the simplicial shifting is performed in the (k 1)-simplicial space, we have a complexity of order O(knk + nk 1mk 1Td). Similarly, we have

Hl k,u Bk+1xl 1 k+1 =

t=0 w l k,u,t Lt k,u Bk+1xl 1 k+1 = Bk+1

t=0 w l k,u,t Lt k+1,dxl 1 k+1 (21)

where the simplicial shifting is performed in the (k + 1)-simplicial space. If it follows that nk+1 nk, we have a smaller complexity of O(knk+1 + nk+1mk+1Tu) by using the RHS of Eq. (21).

B.3 Symmetries of SCs and simplicial data, Equivariance of SCCNNs

Permutation symmetry of SCs. There exists a permutation group Pnk for each set Sk in a SC of order K. For K = 0, this gives the graph permutation group. We can combine these groups for different simplex orders by a group product to form a larger permutation group P = k Pnk, which is a symmetry group of SCs and simplicial data, assuming vertices in each simplex are consistently ordered. That is, we have, for p = (p0, p1, . . . , p K) P, [p Lk]ij = [Lk]p 1 k (i)p 1 k (j), [p Bk]ij = [Bk]p 1 k 1(i)p 1 k (j), and [p xk]i = [xk]p 1 k (i). This permutation symmetry of SCs gives us the freedom to list simplices in any order.

Orientation symmetry of simplicial data. The orientation of a simplex is an equivalence class that two orientations are equivalent if they differ by an even permutation Lim (2020); Munkres (2018). Thus, for a simplex sk i = {i0, . . . , ik} with k > 0, we have an orientation symmetry group Ok,i = {o+ k,i, o k,i} by a group homomorphism which maps all the even permutations of {i0, . . . , ik} to the identity element o+ k,i and all the odd permutations to the reverse operation o k,i.

We can further combine the orientation groups of all simplices in a SC as O = i,k Ok,i by using a group product. This however is not a symmetry group of an oriented SC because o k,i Lk changes the signs of

Published in Transactions on Machine Learning Research (08/2025)

Lk elements in ith column and row, and o k,i Bk changes the ith row, resulting in a different SC topology. Instead, it is a symmetry group of the data space, due to its alternating nature w.r.t. simplices. For o O we have [o xk]i = ok,i fk(sk i ) = fk(o 1 k,i sk i ), i.e., [xk]i remains unchanged w.r.t. the changed orientation of sk i . This gives us the freedom to choose reference orientations of simplices when working with simplicial data.

Theorem 28 (Permutation Equivariance). A SCCNN in Eq. (4) is P-equivariant. For all p P, we have p SCCNNk : {pk 1 xk 1, pk xk, pk+1 xk+1} pkxk.

Theorem 29 (Orientation Equivariance). A SCCNN in Eq. (4) is O-equivariant if σ( ) is odd. For all o O, we have o SCCNNk : {ok 1 xk 1, ok xk, ok+1 xk+1} ok xk.

Proof. (informal) Both the permutation group and orientation group have linear matrix representations. By following the same procedure in (Bodnar et al., 2021b, Appendix D) or Roddenberry et al. (2021), we can prove the equivariance.

B.4 Diffusion process on SCs

Diffusion process on graphs can be generalized to SCs to characterize the evolution of simplicial data over the SC, in analogy to data diffusion on nodes Anand et al. (2022); Ziegler et al. (2022); Grady & Polimeni (2010). Here we provide an informal treatment of how discretizing diffusion equations on SCs can give resemblances of simplicial shifting layers. Consider diffusion equation and its Euler discretization with a unit time step

xk(t) = Lkxk(t), Euler step: xk(t + 1) = xk(t) Lkxk(t) = (I Lk)xk(t) (22)

with an initial condition xk(t) = x0 k. The solution of this diffusion is xk(t) = exp ( Lkt)x0 k. As the time increases, the simplicial data reaches to a steady state xk(t) = 0, which lies in the harmonic space ker(Lk). The simplicial shifting layer resembles this Euler step with a weight and nonlinearity when viewing the time step as layer index. Thus, a NN composed of simplicial shifting layers can suffer from oversmoothing on SCs, giving outputs with decreasing Dirichlet energies as the number of layers increases.

Now let us consider the case where the two Laplacians have different coefficients

xk(t) = Lk,dxk(t) γLk,uxk(t), Euler step: xk(t) = (I Lk,d γLk,u)xk(t). (23)

The steady state of this diffusion equation follows (Lk,d + γLk,u)xk(t) = 0, where xk(t) would be in the kernal space of Lk still. However, before reaching this state, when the time increases, xk(t) would primarily approach to the kernel of B k+1 if γ 1, in which the lower part of the Dirichlet energy remains, i.e., the decrease of D(x(t)) slows down.

When accounting for inter-simplicial couplings, consider there are nontrivial xk 1 and xk+1 and the diffusion equation becomes xk(t) = Lkxk(t) + B k xk 1 + Bk+1xk+1, (24)

which has source terms B k xk 1+Bk+1xk+1. Consider a steady state xk = 0. We have Lkxk(t) = xk,d+xk,u, where xk is not in the kernel space of Lk. The Euler discretization gives

xk(t + 1) = (I Lk)xk(t) + xk,d + xk,u. (25)

The layer in Bunch et al. (2020) xl+1 k = w0(I Lk)xl k + w1xk,d + w2xk,u is a weighted variant of above step when viewing time steps as layers.

C Related works

We first compare SCCNN with other architectures on if they respect the three principles in Section 3 in Table 5. We then describe how the SCCNN in Eq. (18) generalize other NNs on graphs and SCs in Table 6. For simplicity, we use Y and X to denote the output and input, respectively, without the index l. Note that for GNNs, L0,d is not defined.

Published in Transactions on Machine Learning Research (08/2025)

Table 5: Comparisons between SCCNN and other architectures on if they respect the three principles.

Methods Scheme P1 P2 P3

MPSN Bodnar et al. (2021b) message-passing yes yes no, only direct neighborhoods Eq. (11) of MPSN, or Bunch et al. (2020) convolutional no yes no, only direct neighborhoods Eq. (27) of MPSN convolutional yes yes no, only direct neighborhoods SNN Ebli et al. (2020) convolutional no no yes PSNN Roddenberry et al. (2021) convolutional yes no no, only direct neighborhoods SCNN Yang et al. (2022a) convolutional yes no yes SCCNN convolutional yes yes yes

Table 6: SCCNNs generalize other convolutional architectures on SCs.

Methods Parameters (n.d. denotes not defined )

Ebli et al. (2020) wl k,d,t = wl k,u,t, Hl k,d, Hl k,u n.d. Roddenberry et al. (2021) Td = Tu = 1, Hl k,d, Hl k,u n.d. Yang et al. (2022a) Hl k,d, Hl k,u n.d. Bunch et al. (2020) Td = Tu = 1, Hl k,d = Hl k,u = I Bodnar et al. (2021b) Td = Tu = 1, Hl k,d = Hl k,u = I

Gama et al. (2020a) proposed to build a GNN layer with the form

t=0 Lt 0X0W0,u,t

where the convolution step is performed via a graph filter (Sandryhaila & Moura, 2013; 2014; Gama et al., 2019a; 2020b). This GNN can be easily built as a special SCCNN without contributions from edges. Furthermore, Defferrard et al. (2016) considered a fast implementation of this GNN via a Chebyshev polynomial, while Wu et al. (2019) simplified this by setting W0,t,u as zeros for t < Tu. Kipf & Welling (2017) further simplified this by setting Tu = 1, namely, GCN.

Yang et al. (2022a) proposed a simplicial convolutional neural network (SCNN) to learn from k-simplicial signals

t=0 Lt k,d Xk Wk,d,t +

t=0 Lt k,u Xk Wk,u,t

where the linear operation is also defined as a simplicial convolution filter in Yang et al. (2022b). This is a special SCCNN with a focus on one simplex level without taking into the lower and upper contributions consideration. The simplicial neural network (SNN) of Ebli et al. (2020) did not differentiate the lower and the upper convolutions with a form of Yk = σ(PT t=0 Lt k Xk Wk,t), which leads to a joint processing in the gradient and curl subspaces as analyzed in Section 4.

While Roddenberry et al. (2021) proposed an architecture (referred to as PSNN)of a particular form of

Eq. (27) with Td = Tu = 1, performing only a one-step simplicial shifting Eq. (19). Keros et al. (2022) also performs a one-step simplicial shifting but with an inverted Hodge Laplacian to localize the homology group in an SC. An attention mechanism was added to both SCNNs and PSNNs by Giusti et al. (2022) and Goh et al. (2022), respectively. Battiloro et al. (2023) added the attention mechanism to SCCNNs.

To account for the information from adjacent simplices, Bunch et al. (2020) proposed a simplicial 2-complex CNN (S2CCNN) Y0 = σ L0X0W0,u,1 + B1X1W 0,u,0

Y1 = σ B 1 X0W1,d,0 + L1X1W1,1 + B2X2W 1,u,0

Y2 = σ B 2 X1W2,d,0 + L2,u X2W2,u,1 (28)

Published in Transactions on Machine Learning Research (08/2025)

which is limited to SCs of order two. Note that instead of Hodge Laplacians, simplicial adjacency matrices with self-loops are used in Bunch et al. (2020), which encode equivalent information as setting all filter orders in SCCNNs as one. It is a particular form of the SCCNN where the SCF is a one-step simplicial shifting operation without differentiating the lower and upper shifting, and the lower and upper contributions are simply added, not convolved or shifted by lower and upper SCFs. That is, Bunch et al. (2020) can be obtained from Eq. (4) by setting lower and upper SCFs as identity, Hk,d = Hk,u = I, and setting wk,d,t = wk,u,t and Td = Tu = 1 for the SCF Hk. The convolution in Yang et al. (2022c, Eq. 3) is the same as Bunch et al. (2020) though it was performed in a block matrix fashion.

The combination of graph shifting and edge shifting in Chen et al. (2022b) can be again seen as a special S2CCNN, where the implementation was performed in a block matrix fashion. Bodnar et al. (2021b) proposed a message passing scheme which collects information from one-hop simplicial neighbors and direct faces and cofaces as Bunch et al. (2020) and Yang et al. (2022c), but replacing the one-step shifting and projections from (co)faces by some learnable functions. The same message passing was applied for simplicial representation learning by Hajij et al. (2021).

Lastly, there are works on signal processing and NNs on cell complexes. For example, Sardellitti et al. (2021); Roddenberry et al. (2022) generalized the signal processing techniques from SCs to cell complexes, Bodnar et al. (2021a); Hajij et al. (2020) performed message passing on cell complexes as in SCs and Hajij et al. (2022) added the attention mechanism. Cell complexes are a more general model compared to SCs, where k-cells compared to k-simplices contain any shapes homeomorphic to a k-dimensional closed balls in Euclidean space, e.g., a filled polygon is a 2-cell while only triangles are 2-simplices. We refer to Hansen & Ghrist (2019) for a more formal definition of cell complexes. Despite cell complexes are more powerful to model real-world higher-order structures, SCCNNs can be easily generalized to cell complexes by considering any k-cells instead of only k-simplices in the algebraic representations, and the theoretical analysis in this paper can be adapted to cell complexes as well.

D Proofs for Section 3

D.1 Dirichlet energy minimization perspective

Hodge Laplacian smoothing. We can find the gradient of problem Eq. (7) as D xk = B k Bkxk + γBk+1B k+1xk, thus, a gradient descent step follows as Eq. (7) with a step size η.

Proof of Proposition 5. Consider η = 1.

D(xl+1 k ) = w2 0 Bk(I Lk,d γLk,u)xl k 2 2 + w2 0 B k+1(I Lk,d γLk,u)xl k 2 2 = w2 0 (I Lk 1,u)Bkxl k 2 2 + w2 0 (I γLk+1,d)B k+1xl k 2 2 w2 0 (I Lk 1,u) 2 2 Bkxl k 2 2 + w2 0 (I γLk+1,d) 2 2 B k+1xl k 2 2

which follows from triangle inequality. By definition, we have I Lk 1,u 2 2 = I Lk,d 2 2 and I Lk,u 2 2 = I Lk+1,d 2 2. Also, we have I Lk 2 2 = max{ I Lk,d 2 2, I Lk,u 2 2} Thus, we have D(xl+1 k ) w2 0 I Lk 2 2D(xl k) when γ = 1. When w2 0 I Lk 2 2 < 1, Dirichlet energy D(xl+1 k ) will exponentially decrease as l increases.

When γ = 1, from Eq. (29), we have D(xl+1 k ) = Dd(xl+1 k ) + Du(xl+1 k ), which follows

Dd(xl+1 k ) w2 0 (I Lk,d) 2 2Dd(xl k) and Du(xl+1 k ) w2 0 (I γLk,u) 2 2Du(xl k) (30)

When γ = 1, the oversmoothing condition is I Lk 2 2 = max{ I Lk,d 2 2, I Lk,u 2 2} < 1 w2 0 . If

I Lk 2 2 = I Lk,d 2 2, under the oversmoothing condition, by not restricting γ to be 1, w2 0 (I γLk,u) 2 2 can be larger than 1 depending on the choice, which means Du(xl k) does not necessarily decrease, so does not D(xl k).

Published in Transactions on Machine Learning Research (08/2025)

Hodge Laplacian smoothing with sources. The gradient of the objective in Eq. (8) is given by Lkxl k B k xk 1 Bk+1xk+1, which gives the gradient descent update in Eq. (8) with a step size η.

Consider the layer in Bunch et al. (2020) xl+1 k = w0(I Lk)xl k + w1xk,d + w2xk,u with some weights. By triangle inequality, we have D(xl+1 k ) w2 0 I Lk 2 2D(xl k)+w2 1λmax(Lk,d) xk,d 2 2 +w2 2λmax(Lk,u) xk,u 2 2. If the weight w0 is small enough following the condition in Proposition 5, the contribution from the projections, controled by weights w1 and w2, can compromise the decrease by w0, maintaining the Dirichlet energy.

E Proofs for Section 4

E.1 The SCF is Hodge-invariant in Proposition 12

Proof. We first give the following lemma.

Lemma 30. Any finite set of eigenfunctions of a linear operator spans an invariant subspace.

Then, the proof follows from Lemma 30 and Proposition 10.

E.2 A derivation of the spectral frequency response in Eq. (11)

SFT of xk. First, the SFT of xk is given by xk = [ x k,H, x k,G, x k,C] with the harmonic embedding xk,H = U k,Hxk = U k,Hxk,H in the zero frequencies, the gradient embedding xk,G = U k,Gxk = U k,Gxk,G in the gradient frequencies, and the curl embedding xk,C = U k,Cxk = U k,Cxk,C in the curl frequencies.

SFT of Hkxk. By diagonalizing an SCF Hk with Uk, we have

Hkxk = Uk f Hk U k xk = Uk( hk xk) (31)

where f Hk = diag( hk). Here, hk = [ h k,H, h k,G, h k,C] is the frequency response, given by

harmonic response : hk,H = (wk,d,0 + wk,u,0)1, gradient response : hk,G = PTd t=0 wk,d,tλ t k,G + wk,u,01, curl response : hk,C = PTu t=0 wk,u,tλ t k,C + wk,d,01,

with ( ) t the elementwise t-th power of a vector. Thus, we can express hk xk as

[( hk,H xk,H) , ( hk,G xk,G) , ( hk,C xk,C) ] . (32)

SFT of projections. Second, the lower projection xk,d im(B k ) has only a nonzero gradient embedding xk,d = U k,Gxk,d. Likewise, the upper projection xk,u im(Bk+1) contains only a nonzero curl embedding xk,u = U k,Cxk,u. The lower SCF Hk,d has hk,d = PTd t=0 w k,d,tλ t k,G as the frequency response that modulates

the gradient embedding of xk,d and the upper SCF Hk,u has hk,u = PTu t=0 w k,u,tλ t k,C as the frequency response that modulates the curl embedding of xk,u.

SFT of yk. For the output yk = Hk,dxk,d + Hkxk + Hk,uxk,u, we have

yk,H = hk,H xk,H, yk,G = hk,d xk,d + hk,G xk,G, yk,C = hk,C xk,C + hk,u xk,u. (33)

E.3 Expressive power in Proposition 13

Proof. From the Cayley-Hamilton theorem Horn & Johnson (2012), we know that an analytical function f(A) of a matrix A can be expressed as a matrix polynomial of degree at most its minimal polynomial degree, which equals to the number of distinct eigenvalues if A is positive semi-definite.

Published in Transactions on Machine Learning Research (08/2025)

Consider an analytical function Gk,d of Lk,d, defined on the spectrum of Lk,d via analytical function gk,G(λ) where λ is in the set of zero and the gradient frequencies. Then, Gk,d can be implemented by a matrix polynomial of Lk,d of order up to nk,G where nk,G is the number of nonzero eigenvalues of Lk,d, i.e., the number of distinct gradient frequencies. Likewise, any analytical function Gk,u of Lk,u can be implemented by a matrix polynomial of Lk,u of order up to nk,C, which is the number of nonzero eigenvalues of Lk,u, i.e., the number of distinct curl frequencies.

Thus, as of the matrix polynomial definition of SCFs in a SCCNN, the expressive power of Hk,dxk,d +Hkxk + Hk,uxk,u is at most G k,dxk,d + (Gk,d + Gk,u)xk + G k,uxk,u, when the matrix polynomial orders (convolution orders) follow Tk,d = T k,d = nk,G and Tk,u = T k,u = nk,C.

E.4 Hodge-aware of SCCNN in Theorem 14

Proof. Consider a linear mapping T : V V . An invariant subspace W of T has the property that all vectors v W are transformed by T into vectors also contained in W, i.e., v W = T(v) W. For an input x im(B k ), the output Hkx is in im(B k ) too, because of

t Lt k,dx + X

t Lt k,ux = X

t Lt k,dx im(B k ) (34)

where the second equality comes from the orthogonality between im(B k ) and im(Bk+1). Similarly, we can show that for x im(Bk+1), the output Hkx im(Bk+1); for x ker(Lk), the output Hkx ker(Lk). This essentially says the three subspaces of the Hodge decomposition are invariant with respect to the SCF Hk. Likewise, the gradient space is invariant with respect to the lower SCF Hk,d, which says any lower projection remains in the gradient space after passed by Hk,d; and the curl space is invariant with respect to the upper SCF Hk,u.

Lastly, through the spectral relation in Eq. (11), the learning operator Hk in the gradient space is controlled by the learnable weights {wk,d,t}, which is independent of the learnable weights {wk,u,t}, associated to the learning of Hk in the curl space. Likewise, the lower SCF learns in the gradient space as well but with another set of learnable weights {w k,d,t}, and the upper SCF learns in the curl space with learnable weights {w k,u,t}. From the spectral expressive power, we see that above four independent learning in the two subspaces can be as expressive as any analytical functions of the corresponding frequencies (spectrum). This concludes the independent and expressive learning in the gradient and curl spaces.

F Proofs for Section 5

We first give the formulation of SCCNNs on weighted SCs, then we proceed the stability proof.

F.1 SCCNN on weighted SCs

A weighted SC can be defined through specifying the weights of simplices. We give the definition of a commonly used weighted SC with weighted Hodge Laplacians in Grady & Polimeni (2010); Horak & Jost (2013).

Definition 31 (Weighted SC and Hodge Laplacians). In an oriented and weighted SC, we have diagonal weighting matrices Mk with [M]ii measuring the weight of ith k-simplex. A weighted kth Hodge Laplacian is given by Lk = Lk,d + Lk,u = Mk B k M 1 k 1Bk + Bk+1Mk+1B k+1M 1 k . (35)

where Lk,d and Lk,u are the weighted lower and upper Laplacians. A symmetric version follows Ls k = M 1/2 k Lk M 1/2 k , and likewise, we have Ls k,d = M 1/2 k B k M 1 k 1Bk M 1/2 k and Ls k,u =

M 1/2 k Bk+1Mk+1B k+1M 1/2 k , with the weighted incidence matrix is M 1/2 k 1 Bk M 1/2 k (Horak & Jost, 2013; Guglielmi et al., 2023; Schaub et al., 2020).

SCCNNs in weighted SC. The SCCNN layer defined in a weighted SC is of form

xl k = σ(Hl k,d Rk,dxl 1 k 1 + Hl kxl 1 k + Hl k,u Rk,uxl 1 k+1) (36)

Published in Transactions on Machine Learning Research (08/2025)

where the three SCFs are defined based on the weighted Laplacians Eq. (35), and the lower and upper contributions xl k,d and xl k,u are obtained via projection matrices Rk,d Rnk nk 1 and Rk,u Rnk nk+1, instead of B k and Bk+1. For example, Bunch et al. (2020) considered R1,d = M1B 1 M 1 0 and R1,u = B2M2.

F.2 Proof of Stability of SCCNNs in Theorem 24

For a SCCNN in Eq. (36) in a weighted SC S, we consider its perturbed version in a perturbed SC b S at layer l, given by ˆxl k = σ(c Hl k,d b Rk,d ˆxl 1 k 1 + c Hl k ˆxl 1 k + c Hl k,u b Rk,u ˆxl 1 k+1) (37)

which is defined based on perturbed Laplacians with the same set of filter coefficients, and the perturbed projection operators following relativ perturbation model.

Given the initial input x0 k for k = 0, 1, . . . , K, our goal is to upper bound the Euclidean distance between the outputs xl k and ˆxl k for l = 1, . . . , L,

ˆxl k xl k 2 = σ(c Hl k,d b Rk,d ˆxl 1 k 1 Hl k,d Rk,dxl 1 k 1 + c Hl k ˆxl 1 k Hl kxl 1 k + c Hl k,u b Rk,u ˆxl 1 k+1 Hl k,u Rk,uxl 1 k+1) 2. (38)

We proceed the proof in two steps: first, we analyze the operator norm c Hl k Hl k 2 of a SCF Hl k and its perturbed version c Hl k; then we look for the bound of the output distance for a general L-layer SCCNN. To ease notations, we omit the subscript such that A = max x 2=1 Ax 2 is the operator norm (spectral radius) of a matrix A, and x is the Euclidean norm of a vector x.

In the first step we omit the indices k and l for simplicity since they hold for general k and l. We first give a useful lemma. Lemma 32. Given the ith eigenvector ui of L = UΛU , for lower and upper perturbations Ed and Eu, we have Edui = qdiui + E1ui, Euui = quiui + E2ui (39)

with eigendecompositions Ed = Vd Qd V d and Eu = Vu Qu V u where Vd, Vu collect the eigenvectors and Qd, Qu the eigenvalues. It holds that E1 ϵdδd and E2 ϵuδu, with δd = ( Vd U +1)2 1 and δu = ( Vu U +1)2 1 measuring the eigenvector misalignments.

Proof. We first prove that Edui = qdiui + E1ui. The perturbation matrix on the lower Laplacian can be written as Ed = E d + E1 with E d = UQd U and E1 = (Vd U)Qd(Vd U) + UQd(Vd U) + (Vd U)Qd U . For the ith eigenvector ui, we have that

Edui = E dui + E1ui = qdiui + E1ui (40)

where the second equality follows from E dui = qdiui. Since Ed ϵd, it follows that Qd ϵd. Then, applying the triangle inequality, we have that

E1 (Vd U)Qd(Vd U) + UQd(Vd U) + (Vd U)Qd U

Vd U 2 Qd +2 Vd U Qd U ϵd Vd U 2+2ϵd Vd U

=ϵd(( Vd U +1)2 1) = ϵdδd,

which completes the proof for the lower perturbation matrix. Likewise, we can prove for Euui.

F.2.1 Step I: Stability of the SCF

Proof. 1. Low-order approximation of c H H. Given a SCF H = PTd t=0 wd,t Lt d + PTu t=0 wu,t Lt u, we denote its perturbed version by c H = PTd t=0 wd,t b Lt d + PTu t=0 wu,t b Lt u, where the filter coefficients are the same. The difference between H and c H can be expressed as

t=0 wd,t(b Lt d Lt d) +

t=0 wu,t(b Lt u Lt u), (42)

Published in Transactions on Machine Learning Research (08/2025)

in which we can compute the first-order Taylor expansion of b Lt d as

b Lt d = (Ld + Ed Ld + Ld Ed)t = Lt d + Dd,t + Cd (43)

with Dd,t := Pt 1 r=0(Lr d Ed Lt r d + Lr+1 d Ed Lt r 1 d ) parameterized by t and Cd following Cd Pt r=2 t r Ed Ld + Ld Ed r Ld t r. Likewise, we can expand b Lt u as

b Lt u = (Lu + Eu Ld + Ldu)t = Lt u + Du,t + Cu (44)

with Du,t := Pt 1 r=0(Lr u Eu Lt r u + Lr+1 u Eu Lt r 1 u ) parameterized by t and Cu following Cu Pt r=2 t r Eu Lu + Lu Eu r Lu t r. Then, by substituting Eq. (43) and Eq. (44) into Eq. (42), we have

t=0 wd,t Dd,t +

t=0 wu,t Du,t + Fd + Fu (45)

with negligible terms Fd = O( Ed 2) and Fu = O( Eu 2) because perturbations are small and the coefficients of higher-order power terms are the derivatives of analytic functions h G(λ) and h C(λ), which are bounded [cf. Definition 18].

2. Spectrum of (c H H)x. Consider a simplicial signal x with an SFT x = U x = [ x1, . . . , xn] , thus, x = Pn i=1 xiui. Then, we study the effect of the difference of the SCFs on a simplicial signal from the spectral perspective via

t=0 wd,t Dt d,tui +

t=0 wu,t Dt u,tui + Fdx + Fux (46)

where we have

r=0 (Lr d Ed Lt r d + Lr+1 d Ed Lt r 1 d )ui, and Dt u,tui =

r=0 (Lr u Eu Lt r u + Lr+1 u Eu Lt r 1 u )ui. (47)

Since the lower and upper Laplacians admit the eigendecompositions for an eigenvector2 ui

Ldui = λdiui, Luui = λuiui, (48)

we can express the terms in Eq. (46) as

Lr d Ed Lt r d ui = Lr d Edλt r di ui = λt r di Lr d(qdiui + E1ui) = qdiλt diui + λt r di Lr d E1ui, (49)

where the second equality holds from Lemma 32. Thus, we have

Lr+1 d Ed Lt r 1 d ui = qdiλt diui + λt r 1 di Lr+1 d E1ui. (50)

With the results in Eq. (49) and Eq. (50), we can write the first term in Eq. (46) as

t=0 wd,t Dt d,tui =

r=0 2qdiλt diui

| {z } term 1

r=0 (λt r di Lr d E1ui + λt r 1 di Lr+1 d E1ui)

| {z } term 2

2Note that they can be jointly diagonalized.

Published in Transactions on Machine Learning Research (08/2025)

Term 1 can be further expanded as

t=0 twd,tλt diui = 2

i=1 xiqdiλdi h G(λdi)ui (52)

where we used the fact that PTd t=0 twd,tλt di = λdi h G(λdi). Using Ld = UΛd U we can write term 2 in Eq. (51) as

i=1 xi Udiag(gdi)U E1ui (53)

where gdi Rn has the jth entry

λt r di [Λd]r j + λt r 1 di [Λd]r+1 j

( 2λdi h G(λdi) for j = i, λdi+λdj λdi λdj ( h G(λdi) h G(λdj)) for j = i. (54)

Now, substituting Eq. (52) and Eq. (53) into Eq. (51), we have

t=0 wd,t Dt d,tui = 2

i=1 xiqdiλdi h G(λdi)ui +

i=1 xi Udiag(gdi)U E1ui. (55)

By following the same steps as in Eq. (51)-Eq. (54), we can express also the second term in Eq. (46) as

t=0 wu,t Dt u,tui = 2

i=1 xiquiλui h C(λui)ui +

i=1 xi Udiag(gui)U E2ui (56)

where gui Rn is defined as

λt r ui [Λu]r j + λt r 1 ui [Λu]r+1 j

( 2λui h C(λui) for j = i, λui+λuj λui λuj ( h C(λui) h C(λuj)) for j = i. (57)

3. Bound of (c H H)x . Now we are ready to bound (c H H)x based on triangle inequality. First, given the small perturbations Ed ϵd and Eu ϵu, we have for the last two terms in Eq. (46)

Fdx O(ϵ2 d) x , and Fux O(ϵ2 u) x . (58)

Second, for the first term Pn i=1 xi PTd t=0 wd,t Dt dui in Eq. (46), we can bound its two terms in Eq. (52) and Eq. (53) as

t=0 wd,t Dt d,tui

i=1 xiqdiλdi h G(λdi)ui

i=1 xi Udiag(gdi)U E1ui

For the first term on the RHS of Eq. (59), we can write

i=1 xiqdiλdi h G(λdi)ui

i=1 | xi|2|qdi|2|λdi h G(λdi)|2 4ϵ2 dc2 d x 2, (60)

which results from, first, |qdi| ϵd = Ed since qdi is an eigenvalue of Ed; second, the integral Lipschitz property of the SCF |λ h G(λ)| cd; and lastly, the fact that Pn i=1 | xi|2 = x 2 = x 2 and ui 2 = 1. We then have 2

i=1 xiqdiλdi h G(λdi)ui

2ϵdcd x . (61)

Published in Transactions on Machine Learning Research (08/2025)

For the second term in RHS of Eq. (59), we have

i=1 xi Udiag(gdi)U E1ui

i=1 | xi| Udiag(gdi)U E1 ui , (62)

which stems from the triangle inequality. We further have Udiag(gdi)U = diag(gdi) 2Cd resulting from U = 1 and the cd-integral Lipschitz of h G(λ) [cf. Definition 18]. Moreover, it follows that E1 ϵdδd from Lemma 32, which results in

i=1 xi Udiag(gdi)U E1ui

2Cdϵdδd n x (63)

where we use that Pn i=1 | xi| = x 1 n x = n x . By combining Eq. (60) and Eq. (63), we have

t=0 wd,t Dt d,tui

2ϵdcd x + 2Cdϵdδd n x . (64)

Analogously, we can show that

t=0 wu,t Dt u,tui

2ϵucu x + 2Cuϵuδu n x . (65)

Now by combining Eq. (58), Eq. (64) and Eq. (65), we can bound (c H H)x as

(c H H)x 2ϵdcd x + 2Cdϵdδd n x + O(ϵ2 d) x

+ 2ϵucu x + 2Cuϵuδu n x + O(ϵ2 u) x . (66)

By defining d = 2(1 + δd n) and u = 2(1 + δu n), we can obtain that

c H H cd dϵd + cu uϵu + O(ϵ2 d) + O(ϵ2 u). (67)

Thus, we have Hl k c Hl k ck,d k,dϵk,d+ck,u k,uϵk,u with k,d = 2(1+δk,d nk) and k,u = 2(1+δk,u nk) where we ignore the second and higher order terms on ϵk,d and ϵk,u. Likewise, we have Hl k,d c Hl k,d

ck,d k,dϵk,d for the lower SCF and Hl k,u c Hl k,u ck,u k,uϵk,u for the upper SCF.

F.2.2 Step II: Stability of SCCNNs

Proof. Given the initial input x0 k, the Euclidean distance between xl k and ˆxl k at layer l can be bounded by using triangle inequality and the cσ-Lipschitz property of σ( ) [cf. Assumption 22] as

ˆxl k xl k 2 cσ(ϕl k,d + ϕl k + ϕl k,u), (68)

with ϕl k,d := c Hl k,d b Rk,d ˆxl 1 k 1 Hl k,d Rk,dxl 1 k 1 ,

ϕl k := c Hl k ˆxl 1 k Hl kxl 1 k ,

ϕl k,u := c Hl k,u b Rk,u ˆxl 1 k+1 Hl k,u Rk,uxl 1 k+1 .

We now focus on upper bounding each of the terms.

1. Term ϕl k. By subtracting and adding c Hl kxl 1 k within the norm, and using the triangle inequality, we obtain ϕl k c Hl k(ˆxl 1 k xl 1 k ) + (c Hl k Hl k)xl 1 k ˆxl 1 k xl 1 k + c Hl k Hl k xl 1 k

ˆxl 1 k xl 1 k + (ck,d k,dϵk,d + ck,u k,uϵk,u) xl 1 k (70)

Published in Transactions on Machine Learning Research (08/2025)

where we used the SCF stability in Eq. (67) and that all SCFs have a normalized bounded frequency response in Assumption 20. Note that c Hl k is also characterized by h G(λ) with the same set of filter coefficients as Hl k.

2. Term ϕl k,d and ϕl k,u. By subtracting and adding a term c Hl k,d b Rk,dxl 1 k 1 within the norm, we have

ϕl k,d c Hl k,d b Rk,d(ˆxl 1 k 1 xl 1 k 1) + (c Hl k,d b Rk,d Hl k,d Rk,d)xl 1 k 1

b Rk,d ˆxl 1 k 1 xl 1 k 1 + c Hl k,d b Rk,d Hl k,d Rk,d xl 1 k 1 , (71)

where we used again triangle inequality and c Hl k,d 1 from Assumption 20. For the term b Rl k,d , we have b Rl k,d Rl k,d + Jk,d Rl k,d rk,d(1 + ϵk,d) where we used Rl k,d rk,d in Assumption 21 and

Jl k,d ϵk,d. For the second term of RHS in Eq. (71), by adding and subtracting c Hl k,d Rl k,d we have

c Hl k,d b Rk,d Hl k,d Rk,d = c Hl k,d b Rk,d c Hl k,d Rl k,d + c Hl k,d Rl k,d Hl k,d Rk,d

c Hl k,d b Rk,d Rk,d + c Hl k,d Hl k,d Rk,d

rk,dϵk,d + C k,d k,dϵk,drk,d

where we use the stability result of the lower SCF Hl k,d in Eq. (67). By substituting Eq. (72) into Eq. (71), we have ϕl k,d ˆrk,d ˆxl 1 k 1 xl 1 k 1 + (rk,dϵk,d + C k,d k,dϵk,drk,d) xl 1 k 1 . (73)

By following the same procedure [cf. Eq. (71) and Eq. (72)], we obtain

ϕl k,u ˆrk,u ˆxl 1 k+1 xl 1 k+1 + (rk,uϵk,u + C k,u k,uϵk,urk,u) xl 1 k+1 . (74)

3. Bound of ˆxl k xl k . Using the notations tk, tk,d and tk,u in Theorem 24, we then have a set of recursions, for k = 0, 1, . . . , K

ˆxl k xl k cσ(ˆrk,d ˆxl 1 k 1 xl 1 k 1 + tk,d xl 1 k 1 + ˆxl 1 k xl 1 k + tk xl 1 k

+ ˆrk,u ˆxl 1 k+1 xl 1 k+1 + tk,u xl 1 k+1 ). (75)

Define vector bl as [bl]k = ˆxl k xl k with b0 = 0. Let βl collect the energy of all outputs at layer l, with [βl]k := xl 1 k . We can express the Euclidean distances of all k-simplicial signal outputs for k = 0, 1, . . . , K, as bl cσ b Zbl 1 + cσT βl 1 (76)

where indicates elementwise smaller than or equal, and we have

t0 t0,u t1,d t1 t1,u ... ... ... t K 1,d t K 1 t K 1,u t K,d t K

1 ˆr0,u ˆr1,d 1 ˆr1,u ... ... ... ˆr K 1,d 1 ˆr K 1,u ˆr K,d 1

We are now interested in building a recursion for Eq. (76) for all layers l. We start with term xl k. Based on its expression in Eq. (36), we bound it as

xl k cσ( Hl k,d Rk,d xl 1 k 1 + Hl k xl 1 x + Hl k,u Rk,u xl 1 k+1 )

cσ(rk,d xl 1 k 1 + xl 1 x + rk,u xl 1 k+1 ), (78)

which holds for k = 0, 1, . . . , K. Thus, it can be expressed in the vector form as βl cσZβl 1, with

1 r0,u r1,d 1 r1,u ... ... ... r K 1,d 1 r K 1,u r K,d 1

Published in Transactions on Machine Learning Research (08/2025)

Similarly, we have βl 1 cσZβl 2, leading to βl cl σZlβ0 with β0 = β [cf. Assumption 23]. We can then express the bound Eq. (76) as bl cσ b Zbl 1 + cl σT Zl 1β. (80)

Thus, we have

b0 = 0, b1 cσT β, b2 c2 σ( b ZT β + T Zβ), b3 c3 σ( b Z2T β + b ZT Zβ + T Z2β), b4 . . . , (81)

which, inductively, leads to

i=1 b Zi 1T Zl iβ. (82)

Bt setting l = L, we obtain the bound b L d = c L σ PL l=1 b Zl 1T ZL lβ in Theorem 24.

G Experiment details

G.1 Synthetic experiments on Dirichlet energy evolution

We created a synthetic SC with 100 nodes, 241 edges and 135 triangles with the GUDHI toolbox Rouvreau (2015), and we set the initial inputs on three levels of simplices to be random sampled from U([ 5, 5]). We then built a SCCNN composed of simplicial shifting layers with weight w0 and nonlinearities including id, tanh and relu. When the weight follows the condition in Proposition 5, from Fig. 10 (the dashed lines labled as shift ), we see that the Dirichlet energies of all three outputs exponentially decrease as the number of layers increases. We then uncoupled the lower and upper parts of the Laplacians in the edge space in the shifting layers by setting γ = 1. As shown in Fig. 10 (the dotted lines), the Dirichlet energies of the edge outputs decrease at a slower rate than before. Lastly, we added the inter-simplicial couplings, which overcome the oversmoothing problems, as shown by the solid lines.

1 10 100 Layer

shift,node with proj.,node shift,edge with proj.,edge uncouple lower/upper,edge shift,tri. with proj.,tri.

(a) identity, γ = 2

1 10 100 Layer

shift,node with proj.,node shift,edge with proj.,edge uncouple lower/upper,edge shift,tri. with proj.,tri.

(b) tanh, γ = 2

1 10 100 Layer

shift,node with proj.,node shift,edge with proj.,edge uncouple lower/upper,edge shift,tri. with proj.,tri.

(c) relu, γ = 5

Figure 10: Oversmoothing effects of simplicial shifting and the mitigation effects of uncoupling lower and upper adjacencies and accounting for inter-simplicial couplings.

G.2 Additional details on Forex experiments

In the forex dataset, there are 25 currencies which can be exchanged pairwise at three timestamps. We first represented their exchange rates on the edges and took the logrithm, i.e., [x1][i,j] = log10 ri/j = [x1][j,i]. Then, the total arbitrage can be computed as the total curl B 2 x1.

We considered to recover the exchange rates under three types of settings: 1) random noise following normal distribution such that the signal-to-noise ration is 3d B, which is spread over the whole simplicial spectrum; 2) curl noise projected from triangle noise following normal distribution such that the signal-to-noise ration is 3d B, which is distributed only in the curl space; and 3) 50% of the total forex rates are recorded and the other half is not available, set as zero values.

Published in Transactions on Machine Learning Research (08/2025)

Table 7: Forex results (nmse, arbitrage) and the corresponding hyperparameters.

Methods Random noise Curl noise Interpolation

Input 0.119 0.004, 25.19 0.874 0.552 0.027, 122.36 5.90 0.717 0.030, 106.40 0.902

ℓ2-norm 0.036 0.005, 2.29 0.079 0.050 0.002, 11.12 0.537 0.534 0.043 , 9.67 0.082

SNN 0.11 0.005, 23.24 1.03 0.446 0.017, 86.947 2.197 0.702 0.033, 104.738 1.042 L = 5, F = 64, T = 4, tanh L = 6, F = 64, T = 3, tanh L = 2, F = 64, T = 1, tanh

PSNN 0.008 0.001, 0.984 0.17 0.000 0.000, 0.000 0.000 0.009 0.001, 1.128 0.329 L = 6, F = 64, tanh L = 5, F = 1, id L = 6, F = 64, tanh

Bunch 0.981 0.0 , 22.912 1.228 0.981 0.0, 22.912 1.228 0.983 0.005 , 19.887 6.341

MPSN 0.039 0.004, 7.748 0.943 0.076 0.012, 14.922 2.493 0.117 0.063, 23.147 11.674 L = 2, F = 64, id, sum L = 4, F = 64, tanh, mean L = 2, F = 64, tanh, sum

SCCNN, id 0.027 0.005, 0.000 0.000 0.000 0.000, 0.000 0.000 0.265 0.036 , 0.000 0.000 L = 2, F = 16, Td = 0, Tu = 3 L = 5, F = 1, Td = 1, Tu = 1 L = 2, F = 16, Td = 0, Tu = 3

SCCNN, tanh 0.002 0.000, 0.325 0.082 0.000 0.000, 0.003 0.003 0.003 0.002, 0.279 0.151 L = 6, F = 64, Td = 5, Tu = 2 L = 1, F = 64, Td = 2, Tu = 2 L = 6, F = 64, Td = 5, Tu = 1

First, as a baseline method, we chose ℓ2 norm of the curl B2x1 as a regularizer to reduce the total arbitrage, i.e., ˆx1 = (I +w L1,u) 1x1 with a regularization weight w [0, 10]. For the learning methods, we consider the following hyperparameter ranges: the number of layers to be L {1, 2, . . . , 6}, the number of intermediate features to be F {1, 16, 32, 64}. For the convolutional methods including SNN Ebli et al. (2020), PSNN Roddenberry et al. (2021), Bunch Bunch et al. (2020) and SCCNN, we considered the intermediate layers with nonlinearities including id and tanh. The convolution orders of SNN and SCCNN are set to be {1, 2, . . . , 5}. For the message-passing method, MPSN Bodnar et al. (2021b), we considered the setting from (Bodnar et al., 2021b, Eq. 35) where the sum and mean aggregations are used and each message update function is a two-layer MLP. With these noisy or masked rates as inputs and the clean arbitrage-free rates as outputs, we trained different learning methods at the first timestamp, and validated the hyperparameters at the second timestamp, and tested their performance at the thrid one. During the training of 1000 epochs, a normalized MSE loss function and adam optimizer with a fixed learning rate of 0.001 are used. We run the same experiments for 10 times. Table 7 reports the best results (nmse) and the total arbitrage, together with the hyperparameters.

G.3 Additional details on Simplex prediction

G.3.1 Method in Detail

The method for simplex prediction is generalized from link prediction based on GNNs by Zhang & Chen (2018): For k-simplex prediction, we use an SCCNN in an SC of order up to k to first learn the features of lower-order simplices up to order k 1. Then, we concatenate these embedded lower-order simplicial features and input them to a two-layer MLP which predicts if a k-simplex is positive (closed, shall be included in the SC) or negative (open, not included in the SC).

For example, in 2-simplex prediction, consider an SC of order two, which is built based on nodes, edges and (existing positive) triangles. Given the initial inputs on nodes x0 and on edges x1 and zero inputs on triangles x2 = 0 since we assume no prior knowledge on triangles, for an open triangle t = [i, j, k], an SCCNN is used to learn features on nodes and edges (denoted by y). Then, we input the concatenation of the features on three nodes or three edges to an MLP, i.e., MLPnode([y0]i [y0]j [y0]k) or MLPedge([y][i,j] [y][j,k] [y][i,k]), to predict if triangle t is positive or negative. A MLP taking both node and edge features is possible, but we keep it on one simplex level for complexity purposes. Similarly, we consider an SCCNN in an SC of order three for 3-simplex prediction, which is followed by an MLP operating on either nodes, edges or triangles.

Published in Transactions on Machine Learning Research (08/2025)

G.3.2 Data Preprocessing

We consider the data from the Semantic Scholar Open Research Corpus Ammar et al. (2018) to construct a coauthorship complex where nodes are authors and collaborations between k-author are represented by (k 1)-simplices. Following the preprocessing in Ebli et al. (2020), we obtain 352 nodes, 1472 edges, 3285 triangles, 5019 tetrahedrons (3-simplices) and a number of other higher-order simplices. The node signal x0, edge flow x1 and triangle flow x2 are the numbers of citations of single author papers and the collaborations of two and three authors, respectively.

For the 2-simplex prediction, we use the collaboration impact (the number of citations) to split the total set of triangles into the positive set TP = {t|[x2]t > 7} containing 1482 closed triangles and the negative set TN = {t|[x2]t 7} containing 1803 open triangles such that we have balanced positive and negative samples. We further split the 80% of the positive triangle set for training, 10% for validation and 10% for testing; likewise for the negative triangle set. Note that in the construction of the SC, i.e., the incidence matrix B2, Hodge Laplacians L1,u and L2,d, we ought to remove negative triangles in the training set and all triangles in the test set. That is, for 2-simplex prediction, we only make use of the training set of the positive triangles since the negative ones are not in the SC.

Similarly, we prepare the dataset for 3-simplex (tetrahedron) prediction, amounting to the tetradic collaboration prediction. We obtain balanced positive and negative tetrahedron sets based on the citation signal x3. In the construction of B3, L2,u and L3,d, we again only use the tetrahedrons in the positive training set.

G.3.3 Models

For comparison, we first use heuristic methods proposed in Benson et al. (2018) as baselines to determine if a triangle t = [i, j, k] is closed, namely, 1) Harmonic mean: st = 3/([x1] 1 [i,j] + [x1] 1 [j,k] + [x1] 1 [i,k]), 2) Geometric mean: st = limp 0[([x1]p [i,j] + [x1]p [j,k] + [x1]p [i,k])]1/p, and 3) Arithmetic mean: st = ([x1][i,j] + [x1][j,k] + [x1][i,k])/3, which compute the triangle weight based on its three faces. Similarly, we generalized these mean methods to compute the weight of a 3-simplex [i, j, k, m] based on the four triangle faces in 3-simplex prediction.

We then consider different learning methods. Specifically, 1) Bunch by Bunch et al. (2020) (we also generalized this model to 3-dimension for 3-simplex prediction); 2) Message passing simplicial network ( MPSN ) by Bodnar et al. (2021b) which provides a baseline of message passing scheme in comparison to the convolution scheme; 3) Principled SNN ( PSNN ) by Roddenberry et al. (2021); 4) SNN by Ebli et al. (2020); 5) SCNN by Yang et al. (2021); 6) GNN by Defferrard et al. (2016); 7) MLP: providing as a baseline for the effect of using inductive models.

For MLP, Bunch, MPSN and our SCCNN, we consider the outputs in the node and edge spaces, respectively, for 2-simplex prediction, which are denoted by a suffix -Node or -Edge . For 3-simplex prediction, the output in the triangle space can be used as well, denoted by a suffix -Tri. , where we also build SCNNs in both edge and triangle spaces.

G.3.4 Experimental Setup and Hyperparameters

We consider the normalized Hodge Laplacians and incidence matrices, a particular version of the weighted ones Horak & Jost (2013); Grady & Polimeni (2010). Specifically, we use the symmetric version of the normalized random walk Hodge Laplacians in the edge space, proposed by Schaub et al. (2020), which were used in Bunch et al. (2020); Chen et al. (2022a) as well. We generalized the definitions for triangle predictions.

Hyperparameters 1) the number of layers: L {1, 2, 3, 4, 5}; 2) the number of intermediate and output features to be the same as F {16, 32}; 3) the convolution orders for SCCNNs are set to be the same, i.e., T d = Td = Tu = T u = T {1, 2, 3, 4, 5}. We do so to avoid the exponential growth of the parameter search space. For GNNs (Defferrard et al., 2016) and SNNs (Ebli et al., 2020), we set the convolution orders to be T {1, 2, 3, 4, 5} while for SCNNs (Yang et al., 2022a), we allow the lower and upper convolutions to have different orders with Td, Tu {1, 2, 3, 4, 5}; 4) the nonlinearity in the feature learning phase: Leaky Re LU with a negative slope 0.01; 5) MPSN is set as Bodnar et al. (2022); 6) the MLP in the prediction phase: two

Published in Transactions on Machine Learning Research (08/2025)

layers with a sigmoid nonlinearity. For 2-simplex prediction, the number of the input features for the node features is 3F, and for the edge features is 3F. For 3-simplex prediction, the number of the input features for the node features is 4F, for the edge features is 6F and for the triangle features is 4F since a 3-simplex has four nodes, six edges and four triangles. The number of the intermediate features is the same as the input features, and that of the output features is one; and, 7) the binary cross entropy loss and the adam optimizer with a learning rate of 0.001 are used; the number of the epochs is 1000 where an early stopping is used. We compute the AUC to compare the performance and run the same experiments for ten times with random data splitting.

G.3.5 Results

In Table 8, we report the best results of each method with the corresponding hyperparameters. Different hyperparameters can lead to similar results, but we report the ones with the least complexity. All experiments for simplex predictions were run on a single NVIDIA A40 GPU with 48 GB of memory using CUDA 11.5.

Table 8: 2- (Left) and 3-Simplex (Right) prediction AUC (%) results.

Methods AUC Parameters

Harm. Mean 62.8 2.7 Arith. Mean 60.8 3.2 Geom. Mean 61.7 3.1 MLP-Node 68.5 1.6 L = 1, F = 32

GNN 93.9 1.0 L = 5, F = 32, T = 2 SNN-Edge 92.0 1.8 L = 5, F = 32, T = 5 PSNN-Edge 95.6 1.3 L = 5, F = 32 SCNN-Edge 96.5 1.5 L = 5, F = 32, Td = 5, Tu = 2

Bunch-Node 98.3 0.5 K = 1, L = 4, F = 32 MPSN-Node 98.1 0.5 K = 1, L = 3, F = 32 SCCNN-Node 98.7 0.5 K = 1, L = 2, F = 32, T = 2

Methods AUC Parameters

Harm. Mean 63.6 1.6 Arith. Mean 62.2 1.4 Geom. Mean 63.1 1.4 MLP-Tri. 69.0 2.2 L = 3, F = 32

GNN 96.6 0.5 L = 5, F = 32, T = 5 SNN-Tri. 95.1 1.2 L = 5, F = 32, T = 5 PSNN-Tri. 98.1 0.5 L = 5, F = 32 SCNN-Tri. 98.3 0.4 L = 5, F = 32, Td = 2, Tu = 1

Bunch-Edge 98.5 0.5 K = 3, L = 4, F = 16 MPSN-Edge 99.2 0.3 K = 3, L = 3, F = 32 SCCNN-Node 99.4 0.3 K = 3, L = 3, F = 32, T = 3

G.3.6 Complexity

Table 9: (Left) Complexity of the best three methods for 2-simplex prediction. (Right) Running time of SCCNN with different layers and convolution orders.

Method #params. Running time (seconds per epoch)

SCCNN 24288 0.073 Bunch 21728 0.140 MPSN 84256 0.028

Hyperparams. T = 2 T = 5

L = 2 0.073 0.082 L = 3 0.110 0.130 L = 5 0.192 0.237

Here we report the number of parameters and the running time of SCCNN for 2-simplex prediction on one NVIDIA Quadro K2200 with 4GB memory, compared with the two best alternatives. MPSN, compared to convolutional methods, has three times more parameters, analogous to the comparison between messagepassing and graph convolutional NNs. We also report the running time as the layers and convolution orders increase.

G.3.7 Ablation Study

We perform an ablation study to observe the roles of different components in SCCNNs.

SC Order K We investigate the influence of the SC order K. Table 10 reports the 2-simplex prediction results for K = {1, 2} and the 3-simplex prediction results for K = {1, 2, 3}. We observe that for k-simplex prediction, it does not necessarily guarantee a better prediction with a higher-order SC, which further

Published in Transactions on Machine Learning Research (08/2025)

indicates that a positive simplex could be well encoded by both its faces and other lower-order subsets. For example, in 2-simplex prediction, SC of order one gives better results than SC of order two (similar for Bunch), showing that in this coauthorship complex, triadic collaborations are better encoded by features on nodes than pairwise collaborations. In 3-simplex prediction, SCs of different orders give similar results, showing that tetradic collaborations can be encoded by nodes, as well as by pairwise and triadic collaborations.

Table 10: Prediction results of SCCNNs with different SC order K.

Method 2-Simplex Parameters

SCCNN-Node 98.7 0.5 K = 1, L = 2, F = 32, T = 2 SCCNN-Node 98.4 0.5 K = 2, L = 2, F = 32, T = 2 Bunch-Node 98.3 0.4 K = 1, L = 4, F = 32 Bunch-Node 98.0 0.4 K = 2, L = 4, F = 32 MPSN-Node 94.5 1.5 K = 1, L = 3, F = 32 MPSN-Node 98.1 0.5 K = 2, L = 3, F = 32

SCCNN-Edge 97.9 0.9 K = 1, L = 3, F = 32, T = 5 SCCNN-Edge 95.9 1.0 K = 2, L = 5, F = 32, T = 3 Bunch-Edge 97.3 1.1 K = 1, L = 4, F = 32 Bunch-Edge 94.6 1.2 K = 2, L = 4, F = 32 MPSN-Edge 94.1 2.4 K = 1, L = 3, F = 32 MPSN-Edge 97.0 1.2 K = 2, L = 2, F = 16

Method 3-Simplex Parameters

SCCNN-Node 99.3 0.3 K = 1, L = 2, F = 32, T = 1 SCCNN-Node 99.3 0.2 K = 2, L = 2, F = 32, T = 5 SCCNN-Node 99.4 0.3 K = 3, L = 3, F = 32, T = 3 MPSN-Node 96.0 1.2 K = 1, L = 3, F = 32 MPSN-Node 98.2 0.8 K = 2, L = 2, F = 32

SCCNN-Edge 98.9 0.5 K = 1, L = 3, F = 32, T = 5 SCCNN-Edge 99.2 0.4 K = 2, L = 5, F = 32, T = 5 SCCNN-Edge 99.0 1.0 K = 3, L = 5, F = 32, T = 5 MPSN-Edge 96.3 1.1 K = 1, L = 3, F = 32 MPSN-Edge 98.3 0.8 K = 2, L = 3, F = 32

SCCNN-Tri. 97.9 0.7 K = 2, L = 4, F = 32, T = 4 SCCNN-Tri. 97.4 0.9 K = 3, L = 4, F = 32, T = 4 MPSN-Tri. 99.1 0.2 K = 2, L = 3, F = 32

Missing Components in SCCNN With a focus on 2-simplex prediction with SCCNN-Node of order one, to avoid overcrowded settings, we study how each component of an SCCNN influences the prediction. We consider the following settings without: 1) Edge-to-Node , where the projection x0,u from edge to node is not included, equivalent to GNN; 2) Node-to-Node , where for node output, we have xl 0 = σ(Hl 0,u R1,uxl 1 1 ); 3) Node-to-Edge , where the projection x1,d from node to edge is not included, i.e., we have xl 1 = σ(Hl 1xl 1 1 ); and 4) Edge-to-Edge , where for edge output, we have xl 1 = σ(Hl 1,d R1,dxl 1 0 ).

Table 11: 2-Simplex prediction (SCCNN-Node without certain components or with limited inputs).

Missing Component AUC Parameters

98.7 0.5 L = 2, F = 32, T = 2 Edge-to-Node 93.9 0.8 L = 5, F = 32, T = 2 Node-to-Node 98.7 0.4 L = 4, F = 32, T = 2 Edge-to-Edge 98.5 1.0 L = 3, F = 32, T = 3 Node-to-Edge 98.8 0.3 L = 4, F = 32, T = 3

Missing Input AUC Parameters

98.7 0.5 L = 2, F = 32, T = 2 Node input 98.2 0.5 L = 2, F = 32, T = 4 Edge input 98.1 0.4 L = 2, F = 32, T = 3 Node, Edge inputs 50.0 0.0

From the results in Table 11 (Left), we see that No Edge-to-Node , i.e., GNN, gives much worse results as it leverages no information on edges with limited expressive power. For cases with other components missing, a similar performance can be achieved, however, at a cost of the model complexity, with either a higher convolution order or a larger number of layers L, while the latter in turn degrades the stability of the SCCNNs, as discussed in Section 5. SCCNNs with certain inter-simplicial couplings pruned/missing can be powerful as well (this is similarly shown by (Bodnar et al., 2021b, Thm. 6)), but if we did not consider certain component, it comes with a cost of complexity which might degrade the model stability if more number layers are required.

Limited Input We study the influence of limited input data for model SCCNN-Node of order two. Specifically, we consider the input on either nodes or edges is missing. From Table 11, we see that the prediction performance does not deteriorate at a cost of the model complexity (higher convolution orders) when a certain part of the input missing except with full zeros as input. This ability of learning from limited data shows the robustness of SCCNNs.

Published in Transactions on Machine Learning Research (08/2025)

G.3.8 Stability Analysis

We then perform a stability analysis of SCCNNs. We artificially add perturbations to the normalization matrices when defining the Hogde Laplacians, which resemble the weights of simplices. We consider small perturbations E0 on node weights which is a diagonal matrix following that E0 ϵ0/2. We generate its diagonal entries from a uniform distribution [ ϵ0/2, ϵ0/2) with ϵ0 [0, 1], which represents one degree of deviation of the node weigths from the true ones. Similarly, perturbations on edge weights and triangle weights are applied to study the stability. In a SCCNN-Node for 2-simplex prediction of K = 2, we measure the distance between the simplicial outputs with and without perturbations on nodes, edges, and triangles, i.e., x L k ˆx L k / x L k , for k = 0, 1, 2.

Stability dependence We first show the stability mutual dependence between different simplices in Fig. 11. We see that under perturbation on node weights, the triangle output is not influenced until the number of layers becomes two; likewise, the node output is not influenced by perturbations on triangle weights with a one-layer SCCNN. Also, a one-layer SCCNN under perturbations on edge weights will cause outputs on nodes, edges, triangles perturbed. Lastly, we observe that the same degree of perturbations added to different simplices causes different degrees of instability, owing to the number nk of k-simplices in the stability bound. Since N0 < N1 < N2, the perturbations on node weights cause less instability than those on edge and triangle weights.

0 10 3 10 2 10 1 100 ϵ0

10 2 dist node dist edge dist tri

(a) Node pert.L = 1

0 10 3 10 2 10 1 100 ϵ0

dist node dist edge dist tri

(b) Node pert.L = 2

0 10 3 10 2 10 1 100 ϵ1

10 1 dist node dist edge dist tri

(c) Edge pert.L = 1

0 10 3 10 2 10 1 100 ϵ2

10 1 dist node dist edge dist tri

(d) Tri. pert.L = 1

0 10 3 10 2 10 1 100 ϵ2

dist node dist edge dist tri

(e) Tri. pert.L = 2

Figure 11: The stabilities of different simplicial outputs are dependent on each other.

Number of Layers Fig. 12 shows that the stability of SCCNNs degrades as the number of layers increases as studied in Theorem 24. As the NN deepens, the stability deteriorates, which corresponds to our analysis of using shallow layers.

0 10 3 10 2 10 1 100 ϵ1

10 1 dist node dist edge dist tri

0 10 3 10 2 10 1 100 ϵ1

10 1 dist node dist edge dist tri

0 10 3 10 2 10 1 100 ϵ1

10 1 dist node dist edge dist tri

0 10 3 10 2 10 1 100 ϵ1

10 1 dist node dist edge dist tri

0 10 3 10 2 10 1 100 ϵ1

dist node dist edge dist tri

Figure 12: The relative difference of SCCNN outputs with and without perturbations in terms of different numbers of layers. We consider perturbations on edge weights.

G.4 Additional details on Trajectory prediction

G.4.1 Problem Formulation

A trajectory of length m can be modeled as a sequence of nodes [v0, v1, . . . , vm 1] in an SC. The task is to predict the next node vm from the neighbors of vm 1, Nvm 1. The algorithm in Roddenberry et al. (2021) first represents the trajectory equivalently as a sequence of oriented edges [[v0, v1], [v1, v2], . . . , [vm 2, vm 1]]. Then, an edge flow x1 is defined, whose value on an edge e is [x1]e = 1 if edge e is traversed by the trajectory

Published in Transactions on Machine Learning Research (08/2025)

in a forward direction, [x1]e = 1 if edge e is traversed in a backward direction by the trajectory, and [x1]e = 0, otherwise.

With the trajectory flow x1 as the input, together with zero inputs on the nodes and triangles, an SCCNN of order two is used to generate a representation x L 1 of the trajectory, which is the output on edges. This is followed by a projection step x L 0,u = B1W x L 1 , where the output is first passed through a linear transformation via W , then projected into the node space via B1. Lastly, a distribution over the candidate nodes Nvm 1 is computed via a softmax operation, nj = softmax([x L 0,u]j), j Nvm 1. The best candidate is selected as vm = argmaxjnj. We refer to Roddenberry & Segarra (2019, Alg. S-2) for more details.

Given that an SCCNN of order two generates outputs also on nodes, we can directly apply the node feature output x L 0 to compute a distribution over the candidate nodes Nvm 1 without the projection step. We refer to this as SCCNN-Node, and the method of using the edge features with the projection step as SCCNN-Edge.

G.4.2 Model

In this experiment, we consider the following methods: 1) PSNN by Roddenberry et al. (2021); 2) SNN by Ebli et al. (2020); 3) SCNN by Yang et al. (2022a) where we consider different lower and upper convolution orders Td, Tu; and 4) Bunch by Bunch et al. (2020) where we consider both the node features and edge features, namely, Bunch-Node and Bunch-Edge.

Synthetic Data Following the procedure in Schaub et al. (2020), we generate 1000 trajectories as follows. First, we create an SC with two holes by uniformly drawing 400 random points in the unit square, and then a Delaunay triangulation is applied to obtain a mesh, followed by the removal of nodes and edges in two regions. To generate a trajectory, we consider a starting point at random in the lower-left corner, and then connect it via a shortest path to a random point in the upper left, center, or lower-right region, which is connected to another random point in the upper-right corner via a shortest path.

We consider the random walk Hodge Laplacians Schaub et al. (2020). For Bunch method, we set the shifting matrices as the simplicial adjacency matrices defined in Bunch et al. (2020). We consider different NNs with three intermediate layers where each layer contains F = 16 intermediate features. The tanh nonlinearity is used such that the orientation equivariance holds. The final projection n generates a node feature of dimension one. In the 1000-epoch training, we use the cross-entropy loss function between the output d and the true candidate and we consider an adam optimizer with a learning rate of 0.001 and a batch size 100. To avoid overfitting, we apply a weight decay of 5 10 6 and an early stopping.

As done in Roddenberry et al. (2021), besides the standard trajectory prediction task, we also perform a reverse task where the training set remains the same but the direction of the trajectories in the test set is reversed and a generalization task where the training set contains trajectories running along the upper left region and the test set contains trajectories around the other region. We evaluate the correct prediction ratio by averaging the performance over 10 different data generations.

Real Data We also consider the Global Drifter Program dataset3 localized around Madagascar. It consists of ocean drifters whose coordinates are logged every 12 hours. An SC can then be created as Schaub et al. (2020) by treating each mesh as a node, connecting adjacent meshes via an edge and filling the triangles, where the hole is yielded by the island. Following the process in Roddenberry et al. (2021), it results in 200 trajectories and we use 180 of them for training. In the training, a batch size of 10 is used and no weight decay is used. The rest experiment setup remains the same as the synthetic case.

G.4.3 Results

We report the prediction accuracy of different tasks for both datasets in Table 12. We first investigate the effects of applying higher-order SCFs in the simplicial convolution and accounting for the lower and upper contributions. From the standard accuracy for both datasets, we observe that increasing the convolution orders improves the prediction accuracy, e.g., SCNNs become better as the orders Td, Tu increase and perform

3http://www.aoml.noaa.gov/envids/gld/,

Published in Transactions on Machine Learning Research (08/2025)

always better than PSNN, and SCCNNs better than Bunch. Also, differentiating the lower and upper convolutions does help improve the performance as SCNN of orders Td = Tu = 3 performs better than SNN of T = 3.

However, accounting for the node and triangle contributions in SCCNNs does not help the prediction compared to the SCNNs, likewise for Bunch compared to PSNN. This is due to the zero node and triangle inputs because there are no available node and triangle features. Similarly, the prediction directly via the node output features is not accurate compared to projection from edge features.

Moreover, we also observe that the performance of SCCNNs that are trained with the same data does not deteriorate in the reverse task because the orientation equivariance ensures SCCNNs to be unaffected by the orientations of the simplicial data. Lastly, we see that, like other NNs on SCs, SCCNNs have good transferability to the unseen data.

Table 12: Trajectory Prediction Accuracy. (Left): Synthetic trajectory in the standard, reverse and generalization tasks. (Right): Ocean drifter trajectories. For SCCNNs, we set the lower and upper convolution orders Td, Tu to be the same as T.

Methods Standard Reverse Generalization Parameters

PSNN 63.1 3.1 58.4 3.9 55.3 2.5 SCNN 65.6 3.4 56.6 6.0 56.1 3.6 Td = Tu = 2 SCNN 66.5 5.8 57.7 5.4 60.6 4.0 Td = Tu = 3 SCNN 67.3 2.3 56.9 4.8 59.4 4.2 Td = Tu = 4 SCNN 67.7 1.7 55.3 5.3 61.2 3.2 Td = Tu = 5 SNN 65.5 2.4 53.6 6.1 59.5 3.7 T = 3

Bunch-Node 35.4 3.4 38.1 4.6 29.0 3.0 Bunch-Edge 62.3 4.0 59.6 6.1 53.9 3.1 SCCNN-Node 46.8 7.3 44.5 8.2 31.9 5.0 T = 1 SCCNN-Edge 64.6 3.9 57.2 6.3 54.0 3.0 T = 1 SCCNN-Node 43.5 9.6 44.4 7.6 32.8 2.6 T = 2 SCCNN-Edge 65.2 4.1 58.9 4.1 56.8 2.4 T = 2

Standard Parameters

49.0 8.0 52.5 9.8 Td = Tu = 2 52.5 7.2 Td = Tu = 3 52.5 8.7 Td = Tu = 4 53.0 7.8 Td = Tu = 5 52.5 6.0 T = 3

35.0 5.9 46.0 6.2 40.5 4.7 T = 1 52.5 7.2 T = 1 45.5 4.7 T = 2 54.5 7.9 T = 2

G.4.4 Convolution Order and Integral Lipschitz Property

Here, to illustrate that the integral Lipschitz property of the SCFs helps the stability of NNs on SCs, we consider the effect of regularizer r IL in Eq. (16) against perturbations in PSNNs and SCNNs with different Td and Tu for the standard synthetic trajectory prediction. The regularization weight on r IL is set as 5 10 4

and the number of samples to approximate the frequencies is set such that the sampling interval is 0.01.

Fig. 13 shows the prediction accuracy and the relative distance between the edge outputs of the NNs trained with and without the integral Lipschitz regularizer in terms of different levels of perturbations. We see that the integral Lipschitz regularizer helps the stability of the NNs, especially for large SCF orders, where the edge output is less influenced by the perturbations compared to without the regularizer. Meanwhile, SCNN with higher-order SCFs, e.g., Td = Tu = 5, achieves better prediction than PSNN (with one-step simplicial shifting), while maintaining a good stability with its output not influenced by perturbations drastically.

We also measure the lower and upper integral Lipschitz constants of the trained NNs across different layers and features, given by maxλk,G |λk,G hk,G(λk,G)| and maxλk,C |λk,C hk,C(λk,C)|, shown in Fig. 14. We see that the SCNN trained with r IL indeed has smaller integral Lipschitz constants than the one trained without the regularizer, thus, a better stability, especially for NNs with higher-order SCFs.

Published in Transactions on Machine Learning Research (08/2025)

with r IL without r IL

0.00 0.25 0.50 0.75 1.00 ϵ

x1 ˆx1 2/ x1 2

with r IL without r IL

with r IL without r IL

0.00 0.25 0.50 0.75 1.00 ϵ

x1 ˆx1 2/ x1 2

with r IL without r IL

with r IL without r IL

0.00 0.25 0.50 0.75 1.00 ϵ

x1 ˆx1 2/ x1 2

with r IL without r IL

with r IL without r IL

0.00 0.25 0.50 0.75 1.00 ϵ

x1 ˆx1 2/ x1 2

with r IL without r IL

Figure 13: Effect of the integral Lipschitz regularizer r IL in the task of synthetic trajectory prediction against different levels ϵ of random perturbations on L1,d and L1,u. We show the accuracy (Top row) and the relative distance between the edge output (Bottom row) for different NNs on SCs with and without r IL. SCNN13 is the SCNN with Td = 1 and Tu = 3.

C1 1,d C1 1,u C2 1,d C2 1,u C3 1,d C3 1,u

(a) SCNN31, without r IL

C1 1,d C1 1,u C2 1,d C2 1,u C3 1,d C3 1,u 0

(b) SCNN31, with r IL

C1 1,d C1 1,u C2 1,d C2 1,u C3 1,d C3 1,u 0

(c) SCNN55, without r IL

C1 1,d C1 1,u C2 1,d C2 1,u C3 1,d C3 1,u

(d) SCNN55, with r IL

Figure 14: The integral Lipschitz constants of SCFs at each layer of the trained SCNNs with and without the integral Lipschitz regularizer r IL. We use symbols cl k,d and cl k,u to denote the lower and upper integral Lipschitz constants at layer l. Regularizer r IL promotes the integral Lipschitz property, thus, the stability, especially for NNs with large SCF orders.