# going_beyond_persistent_homology_using_persistent_homology__41ec4248.pdf Going beyond persistent homology using persistent homology Johanna Immonen University of Helsinki johanna.x.immonen@helsinki.fi Amauri H. Souza Aalto University Federal Institute of Ceará amauri.souza@aalto.fi Vikas Garg Aalto University Yai Yai Ltd vgarg@csail.mit.edu Representational limits of message-passing graph neural networks (MP-GNNs), e.g., in terms of the Weisfeiler-Leman (WL) test for isomorphism, are well understood. Augmenting these graph models with topological features via persistent homology (PH) has gained prominence, but identifying the class of attributed graphs that PH can recognize remains open. We introduce a novel concept of color-separating sets to provide a complete resolution to this important problem. Specifically, we establish the necessary and sufficient conditions for distinguishing graphs based on the persistence of their connected components, obtained from filter functions on vertex and edge colors. Our constructions expose the limits of vertexand edge-level PH, proving that neither category subsumes the other. Leveraging these theoretical insights, we propose Re PHINE for learning topological features on graphs. Re PHINE efficiently combines vertexand edge-level PH, achieving a scheme that is provably more powerful than both. Integrating Re PHINE into MP-GNNs boosts their expressive power, resulting in gains over standard PH on several benchmarks for graph classification. 1 Introduction Topological data analysis (TDA) is a rapidly growing field that provides tools from algebraic topology for uncovering the shape (or structure) of data, allowing for efficient feature extraction. Its flagship tool is persistent homology (PH) [8], which seeks to characterize topological invariants (e.g., connected components, loops) of an underlying manifold based on data samples. Notably, PH has been successfully applied in many scientific domains, including computer vision [17, 27], drug design [23], fluid dynamics [24], and material science [25]. For graphs, PH has been used to provide global topological signatures for graph-level prediction tasks [2, 12, 14, 33, 39] and act as local message modulators in graph neural networks (GNNs) for nodelevel tasks [4, 40]. By leveraging learnable filtration/vectorization maps, PH has also been integrated into neural networks as a building block in the end-to-end learning process [2, 3, 13, 15, 26]. These strategies allow us to exploit topological features to boost the predictive capabilities of graph models. However, in stark contrast with the developments on the representational power of GNNs [1, 11, 28 30, 34, 35, 37], the theoretical properties of PH on graphs remain much less explored. For instance, open questions include: Which graph properties can PH capture? What is the characterization of pairs of graphs that PH cannot separate? Can we improve the expressivity of PH on graphs? In a recent work, Rieck [32] discusses the expressivity of PH on graphs in terms of the Weisfeiler Leman (WL) hierarchy [36]. The paper shows that, given different k-WL colorings, there exists a filtration such that the corresponding persistence diagrams differ. This result provides a lower bound for the expressivity in terms of WL hierarchy, but it does not describe the class of graphs which can be distinguished via PH. In this paper, we aim to fully characterize this class of graphs. Equal contribution. 37th Conference on Neural Information Processing Systems (Neur IPS 2023). Theoretical contributions of this work On vertex-level filtrations (Section 2 and Section 3.1): Inconsistency issues due to injective vertex filtrations Lemma 1 Real holes (d = ) = Component-wise colors Lemma 2 Almost holes (b = d, d = ) = Separating sets Lemma 3 Distinct almost holes Color-separating set Lemma 4 Birth time of persistence tuples = Vertex color Lemma 5 The expressive power of vertex-color filtrations Theorem 1 On edge-level filtrations (Section 3.2): Almost holes = Disconnecting sets Lemma 6 Reconstruction of disconnecting sets Lemma 7 The expressive power of edge-color filtrations Theorem 2 Vertex-level vs. edge-level filtrations (Section 3.3): Vertex-level persistence edge-level persistence, and vice-versa Theorem 3 New method (Re PHINE) (Section 4): Re PHINE is isomorphism invariant Theorem 4 Re PHINE vertex-, edge-, and vertexedge-level diagrams Theorem 5 Figure 1: Overview of our theoretical results. We study the expressive power of PH on attributed (or colored) graphs, viewed as 1-dim simplicial complexes. We focus on the class of graph filtrations induced by functions on these colors. Importantly, such a class is rather general and reflects choices of popular methods (e.g., topological GNNs [15]). We first analyze the persistence of connected components obtained from vertex colors. Then, we extend our analysis to graphs with edge colors. To obtain upper bounds on the expressive power of color-based PH, we leverage the notion of separating/disconnecting sets. This allows us to establish the necessary and sufficient conditions for the distinguishability of two graphs from 0-dim persistence diagrams (topological descriptors). We also provide constructions that highlight the limits of vertexand edge-color PH, proving that neither category subsumes the other. Based on our insights, we present Re PHINE (short for Refining PH by Incorporating Node-color into Edge-based filtration ), a simple method that exploits a subtle interplay between vertexand edge-level persistence information to improve the expressivity of color-based PH. Importantly, Re PHINE can be easily integrated into GNN layers and incurs no computational burden to the standard approach. Experiments support our theoretical analysis and show the effectiveness of Re PHINE on three synthetic and six real datasets. We also show that Re PHINE can be flexibly adopted in different architectures and outperforms Pers Lay [2] a popular topological neural net. In sum, our contributions are three-fold: (Theory) We establish a series of theoretical results that characterize PH on graphs, including bounds on the expressivity of vertexand edge-level approaches, the relationship between these approaches, and impossibility results for color-based PH as summarized in Figure 1. (Methodology) We introduce a new topological descriptor (Re PHINE) that is provably more expressive than standard 0-dim and 1-dim persistence diagrams. (Experiments) We show that the improved expressivity of our approach also translates into gains in real-world graph classification problems. 2 Preliminaries We consider arbitrary graphs G = (V, E, c, X) with vertices V = {1, 2, . . . , n}, edges E V V and a vertex-coloring function c : V X, where X denotes a set of m colors or features {x1, x2, . . . , xm} such that each color xi Rd. We say two graphs G = (V, E, c, X) and G = (V , E , c , X ) are isomorphic (denoted by G = G ) if there is a bijection g : V V such that (v, w) E iff (g(v), g(w)) E and c = c g. Here, we also analyze settings where graphs have an edge-coloring function l. A filtration of a graph G is a finite nested sequence of subgraphs of G, that is, G1 G2 ... G. Although the design of filtrations can be flexible [12], a typical choice consists of leveraging a vertex filter (or filtration) function f : V R for which we can obtain a permutation π of n vertices such that f(π(1)) f(π(2)) f(π(n)). Then, a filtration induced by f is an indexed set {Gf(π(i))}n i=1 such that Gf(π(i)) G is the subgraph induced by the set of vertices Vf(π(i)) = {v V | f(v) f(π(i))}. Note that filtration functions which give the same permutation of vertices induce the same filtration. Persistent homology keeps track of the topological features of each subgraph in a filtration. For graphs G, these features are either the number of connected components or independent cycles (i.e., 0and 1-dim topological features, denoted respectively by the Betti numbers β0 G and β1 G) and can be efficiently computed using computational homology. In particular, if a topological feature first appears in Gf(π(i)) and disappears in Gf(π(j)), then we encode its persistence as a pair (f(π(i)), f(π(j))); if a feature does not disappear in Gf(π(n)) = G, then its persistence is (f(π(i)), ). The collection of all pairs forms a multiset that we call persistence diagram [5]. We use D0 and D1 to refer to the persistence diagrams for 0and 1-dim topological features respectively. Appendix A provides a more detailed treatment for persistent homology. Recent works have highlighted the importance of adopting injective vertex filter functions. Hofer et al. [13] show that injectivity of parameterized functions fθ : V R is a condition for obtaining well-defined gradients with respect to the parameters θ, enabling end-to-end filtration learning. Also, Horn et al. [15] show that for any non-injective function, we can find an arbitrarily close injective one that is at least as powerful at distinguishing non-isomorphic graphs as the original (non-injective) function. However, Lemma 1 shows that filtrations induced by injective functions on vertices may result in inconsistent persistence diagrams; namely, different diagrams for isomorphic graphs. Lemma 1 (Injective vertex-based filtrations can generate inconsistent persistence diagrams). Consider persistence diagrams obtained from injective vertex filter functions. There are isomorphic graphs G = G such that their persistence diagrams are different, i.e., DG = DG . To avoid inconsistent diagrams, we need to employ permutation equivariant filter functions see [32, Lemma 2]. Common choices include vertex degree [12], eigenvalues of the graph Laplacian [2], and GNN layers [13], which are permutation equivariant by construction. Another option is to define graph filtrations based on vertex/edge colors [15], which are also equivariant by design, i.e., if G = G with associated bijection g, then c(v) = c (g(v)) v V . Notably, color-based filtrations generalize the GNN-layers case since we could redefine vertex/edge-coloring functions to take the graph structure as an additional input. Thus, we now turn our attention to color-based filtrations. Color-based filtrations. Let f : X R be an injective function. Therefore, f must assign a strict total order for colors, i.e., there is a permutation π : {1, . . . , m} {1, . . . , m} such that f(xπ(1)) < < f(xπ(m)). We define the vertex-color filtration induced by f as the indexed set {Gi}m i=1 where Gi = (Vi, Ei, ci, Xi), with Xi = {xπ(1), xπ(2), . . . , xπ(i)}, Vi = {v V | c(v) Xi}, Ei = {(v, w) E | c(v) Xi, c(w) Xi}, and ci = {(v, c(v)) | v Vi}. Similarly, we can define the edge-color filtration induced by f as {Gi}m i=1 where Gi = (V, Ei, li, Xi) with Xi = {xπ(1), . . . , xπ(i)}, Ei = {(v, w) E | l(v, w) Xi}, and li = {((v, w), l(v, w)) | (v, w) Ei}. We denote the elements of a persistence diagram D as pairs (f(x(b)), f(x(d))), where x(b), x(d) X are the colors associated with the birth and death of a hole (topological feature) in a filtration induced by f( ). In the following, we use the notation {{ }} to denote multisets. 3 The power of 0-dim persistent homology under color-based filtrations In this section, we analyze the representational power of persistent homology when adopting colorbased filtrations. We focus on the persistence of connected components (0-dimensional holes). We separately discuss vertex-color (Section 3.1) and edge-color (Section 3.2) filtrations, and then compare these approaches in Section 3.3. Proofs for all Lemmas and Theorems are in Appendix B. 3.1 Vertex-color filtrations To help characterize the expressivity of persistent homology, we propose classifying persistence pairs (f(x(b)), f(x(d))) as either real holes, almost holes, or trivial holes. In particular, if f(x(d)) = and f(x(b)) = f(x(d)), we say the pair (f(x(b)), f(x(d))) is an almost hole. If f(x(b)) = f(x(d)), the pair is called a trivial hole. Finally, we call (f(x(b)), f(x(d))) a real hole if f(x(d)) = . 3.1.1 Real holes as connected components Real holes denote topological features that persist until a filtration reaches the entire graph. Thus, real holes from 0-dim persistence diagrams are associated with connected components, regardless of the filtration. Regarding the relationship between filtrations and real holes, Lemma 2 establishes the necessary and sufficient condition for the existence of a filtration that produces persistence diagrams with distinct real holes. Such a condition is associated with the notion of component-wise colors. Formally, let G and G be two graphs with connected components C1, . . . , Ck and C 1, . . . , C k , respectively. Also, let Xi = {c(v) | v VCi} be the set of colors in Ci and, similarly, X i be the colors in C i. We say that G and G have distinct component-wise colors if {{Xi}}k i=1 = {{X i}}k i=1. Lemma 2 (Equivalence between component-wise colors and real holes). Let G and G be two graphs. There exists some vertex-color filtration such that their persistence diagrams D0 G and D0 G have different multisets of real holes iff G and G have distinct component-wise colors. 3.1.2 Almost holes as separating sets Now we turn our attention to the characterization of almost holes. Our next result (Lemma 3) reveals the connection between almost holes and separating sets. Here, a separating set S of a graph G is a subset of its vertices whose removal disconnects some connected component of G. Lemma 3 (Almost holes and separating sets). Regarding the relationship between almost holes and separating sets, the following holds: 1. Let (f(x(b)), f(x(d))) be an almost hole from a vertex-color filtration. Then the set S = {w V | f(c(w)) f(x(d))} is a separating set of G. 2. Let S be a separating set of G that splits a connected component C G into k components C1, C2, . . . , Ck. Then, there exists a filtration that produces k 1 almost holes if the set of colors of vertices in k i=1VCi is disjoint from those of the remaining vertices, i.e., {c(v) | v V \ k i=1VCi} {c(v) | v k i=1VCi} = . Figure 2: We cannot use color-based separating sets to compare almost holes across graphs. Although these filtrations produce different almost holes, there is no way to remove colors s.t. the resulting graphs have different numbers of components. The relationship between almost holes and separating sets elucidated in Lemma 3 raises the question if we can use separating sets (obtained from colors) to compare almost holes across different graphs. The answer is no: even if the diagrams of two graphs only differ in their multisets of almost holes, the graphs might not have separating sets that split them into different numbers of components. For instance, consider the graphs in Figure 2, where numbers denote filter values. The persistence diagrams D0 G = {{(1, ), (2, 3), (2, 3), (3, 3), (3, 4), (4, 4)}} and D0 G = {{(1, ), (2, 3), (2, 4), (3, 3), (3, 3), (4, 4)}} only differ in almost holes. Still, we cannot pick colors whose removal of associated vertices would result in different numbers of components. Next, we introduce the notion of color-separating sets (Definition 1). Importantly, Lemma 4 leverages this definition to characterize the graphs that can be distinguished based on almost holes. Specifically, it establishes that whenever the diagrams of two graphs differ in their multiset of almost holes, we can build a color-separating set. Definition 1 (Color-separating sets). A color-separating set for a pair of graphs (G, G ) is a set of colors Q such that the subgraphs induced by V \{w V | c(w) Q} and V \{w V | c (w) Q} have distinct component-wise colors. We note that when G and G have identical component-wise colors, the sets {w V | c(w) Q} and {w V | c (w) Q} induced by the color-separating set Q are separating sets for G and G . Lemma 4 (Distinct almost holes imply distinct color-separating sets). Let D0 G, D0 G be persistence diagrams for G and G . If the diagrams D0 G, D0 G differ in their multisets of almost holes, then there is a color-separating set for G and G . 3.1.3 Bounds on the expressivity of vertex-color persistent homology Regardless of the filtration, vertex-color PH always allows counting the numbers of connected components and vertices of a graph. If all vertices have the same color, then we cannot have any expressive power beyond β0 and |V | when all vertices are added simultaneously, there cannot be almost holes as the finite living times of the holes are 0. Also, all real holes are identical, and we have D0 = {{(1, ), . . . , (1, ), (1, 1), ..., (1, 1)}}, with |D0| = |V |. For graphs with m 1 colors, Lemma 5 shows that sets of birth times correspond to vertex colors. As a consequence, if the multisets of vertex colors differ for graphs G and G , then the corresponding persistence diagrams are also different in all filtrations. Lemma 5 (Equivalence between birth times and vertex colors). There is a bijection between the multiset of birth times and the multiset of vertex colors in any vertex-color filtration. From Lemma 5, we can recover the multiset of colors from the persistence diagram and, consequently, distinguish graphs with different multisets. However, persistent homology uses vertex colors as input, and we do not need persistence diagrams to construct or compare such multisets. This highlights the importance of death times to achieve expressivity beyond identifying vertex colors. In fact, for non-trivial cases, the expressivity highly depends on the choice of filtration. We have discussed the importance of color-separating sets (Lemma 4) and component-wise vertex colors (Lemma 2). With these notions, Theorem 1 formalizes the limits of expressivity that may be achieved with suitable filtration and characterize which pairs of graphs can, at best, be distinguished by comparing their persistence diagrams. Here, we only consider pairs of graphs that cannot be distinguished by their multisets of colors, as this corresponds to a trivial case. Theorem 1 (The expressive power of vertex-color filtrations). For any two graphs G and G with identical multisets of colors {{c(v) : v V }} = {{c (v) : v V }}, there exists a filtration such that D0 G = D0 G if and only if there is a color-separating set for G and G . 3.2 Edge-color filtrations We now consider the expressivity of 0-dim persistent homology obtained from edge-color filtrations. The persistence diagrams are constructed exactly the same way. However, in this case, all holes are born at the same time (all vertices appear in G0). This implies that all real holes are identical. Also, the diagrams do not contain trivial holes since G0 does not have edges. All holes are either real holes or almost holes (of the form (0, d)). We also note that persistence diagrams will always have almost holes unless the graph is edgeless. Figure 3: (a) G and G differ in their multisets of colors, but no edge-color filtration can distinguish them. For instance, assume that f( blue ) = 1 < 2 = f( orange ). Then, D0 G = D0 G = {{(0, ), (0, 1), (0, 1), (0, 1)}}. The same holds for f( blue ) > f( orange ). (b) Graphs that have different disconnecting sets, but for which we can find filtrations that lead to identical diagrams. Analogously to separating sets in vertex-color filtrations, Lemma 6 characterizes edge-based almost holes as disconnecting sets a set of edges whose removal would increase the number of components. Lemma 6 (Edge-based almost holes as disconnecting sets). Let (0, f(x(d))) be an almost hole from an edge-color filtration. Then S = {e E | f(l(e)) f(x(d))} is a disconnecting set of G. Lemma 6 tells us how to construct a disconnecting set from an almost hole. Now, suppose we are given a disconnecting set S. Can we build an edge-color filtration for which S can be obtained from an almost hole? In other words, can we obtain a diagram with an almost hole (0, f(x(d))) such that {e E | f(l(e)) f(x(d))} is equal to S? Lemma 7 shows that if the colors of edges in S are distinct from those in E \ S, then there is a filtration that induces a persistence diagram with an almost hole from which we can reconstruct S. Lemma 7 (Reconstructing a disconnecting set). Let G = (V, E, l, X) be a graph and S E be a disconnecting set for G. If the set of colors of S is disjoint from that of E \ S, then there exists a filtration such that S = {e E | f(l(e)) f(x(d))} for an almost hole (0, f(x(d))) D0. 3.2.1 Bounds on the expressivity of edge-color persistent homology Similar to the vertex-color case, in any edge-color filtration, we have that |D0| = |V | and the number of real holes is β0. Also, the lowest expressivity is achieved when all edges have the same color. In this case, two graphs with different numbers of vertices or connected components have different persistence diagrams (and can be distinguished); otherwise, they cannot be distinguished. We have seen in Lemma 5 that vertex-color filtrations encode colors as birth times. In contrast, birth times from edge-color filtrations are always trivially equal to zero. Thus, we cannot generalize Lemma 5 to edge-color filtrations. Instead, we can show there are graphs with different multisets of edge colors such that the graphs have the same persistence diagrams for any filtration (see Figure 3(a)). Let us now consider lower limits for graphs with m > 1 edge colors. We can show that even if two graphs have different disconnecting sets (obtained from colors), there are filtrations that induce the same persistence diagrams. To see this behavior, consider the two graphs in Figure 3(b), where the deletion of blue edges disconnects one of the graphs but not the other. Although we can build an edge-color filtration that separates these graphs (i.e., D0 G = D0 G ), if we choose f( green ) = 3, f( orange ) = 2, and f( blue ) = 1, we obtain D0 G = D0 G = {{(0, ), (0, 1), (0, 2), (0, 2), (0, 2), (0, 2)}}. Interestingly, even if two graphs have different sets of edge colors, we might still find filtrations that induce identical diagrams. The reason is that unlike vertex-color filtrations where trivial holes make sure that all vertices are represented in the diagrams, in edge-color filtrations there are no trivial holes. As a result, persistence diagrams from edge-color filtrations do not account for edges that do not lead to the disappearance of connected components. Lemma 6 and Lemma 7 showed that edge-based almost holes can be characterized as disconnecting sets, somewhat analogously to vertex-based almost holes as separating sets. We complete the analogy by introducing the notion of color-disconnecting sets in Definition 2. We then use this notion to fully characterize the the expressive power of edge-color persistent homology in Theorem 2. More specifically, the existence of a color-disconnecting set between a given pair of graphs is a necessary and sufficient condition for distinguishing them based on 0-dimensional persistence diagrams. Definition 2 (Color-disconnecting sets). A color-disconnecting set for a pair of graphs (G, G ) is a set of colors Q such that if we remove the edges of colors in Q from G and G , we obtain subgraphs with different numbers of connected components. Theorem 2 (The expressive power of edge-color filtrations). Consider two graphs G and G . There exists an edge-color filtration such that D0 G = D0 G if and only if there is a color-disconnecting set for G and G . 3.3 Vertex-color versus edge-color filtrations To compare vertexand edge-color persistence diagrams, we consider graphs with vertex-coloring functions c( ) from which we derive edge-coloring ones l( ). In particular, for a graph G = (V, E, c, X), its edge-coloring function l : E X2 is defined as l(v, w) = {c(v), c(w)}. Figure 4: Illustration of graphs that cannot be distinguished based on (a) edge-color filtrations, (b) vertex-color filtrations, and (c) both vertexand edge-color filtrations. Recall that only vertex-color filtrations can 1) encode multisets of colors and 2) have real holes with different birth times. Naturally, we can find pairs of graphs (G, G ) for which we can obtain D0 G = D0 G from vertex-color filtrations, but not from edge-color ones. Consider the graphs in Figure 4(a). The vertex-color filtration f( blue ) = 1, f( orange ) = 2 produces D = {{(1, ), (1, ), (1, 2), (2, 2), (2, 2)}} and D = {{(1, ), (1, 1), (2, ), (2, 2), (2, 2)}}. However, there is no edge-color filtration that would tell them apart there are only two possible edge-color filtrations, leading to either D = {{(0, ), (0, ), (0, 1), (0, 2), (0, 2)}} = D , or D = {{(0, ), (0, ), (0, 1), (0, 1), (0, 2)}} = D . We can also show that there are graphs that can be distinguished by edge-color filtrations but not by vertex-color ones. Intuitively, one can think of this as a result of edge colors being more fine-tuned. For instance, consider the graphs in Figure 4(b). We can separate these graphs using the function f( orange ) = 1, f( blue-orange ) = 2, and f( blue ) = 3, which yields D0 G = {{(0, ), (0, 1), (0, 1), (0, 2), (0, 2), (0, 3)}} = {{(0, ), (0, 1), (0, 1), (0, 2), (0, 2), (0, 2)}} = D0 G . However, since there is no color-separating set for G and G , by Theorem 1, we have that DG = DG for all vertex-color filtrations. Theorem 3 formalizes the idea that none of the classes of colorbased filtrations subsumes the other. In addition, Figure 4(c) illustrates that there are very simple non-isomorphic graphs that PH under both vertexand edge-color filtrations cannot distinguish. Theorem 3 (Edge-color vs. vertex-color filtrations). There exist non-isomorphic graphs that vertex-color filtrations can distinguish but edge-color filtrations cannot, and vice-versa. 4 Going beyond persistent homology We now leverage the theoretical results in Section 3 to further boost the representational capability of persistent homology. In particular, we propose modifying edge-color persistence diagrams to account for structural information that is not captured via the original diagrams. We call the resulting approach Re PHINE (Refining PH by incorporating node-color into edge-based filtration). Notably, Re PHINE diagrams are not only provably more expressive than standard color-based ones but also make 1dimensional topological features redundant. Additionally, we show how to integrate Re PHINE into arbitrary GNN layers for graph-level prediction tasks. Edge-color diagrams with missing holes. A major drawback of edge-color filtrations is that information about the multisets of (edge) colors is lost, i.e., it cannot be recovered from persistence diagrams. To reconstruct disconnecting sets, we need the edge-color permutation given by the filtration function and the number of edges both of which cannot be deduced from diagrams alone. To fill this gap, we introduce the notion of missing holes. Conceptually, missing holes correspond to edges that are not associated with the disappearance of any connected component in a given filtration. By design, we set the birth time of missing holes to 1 this distinguishes them from real and almost holes, which have birth times equal to 0. The death time of a missing hole corresponds to the first filtration step f(x) that its corresponding edge color x appears. We note that missing holes correspond to cycles obtained from 1-dim persistence diagrams. As an example, consider the edge-color filtration in Figure 5, which produces D0 = {{(0, ), (0, 1), (0, 2), (0, 2), (0, 4)}}. We note that the orange edge and one of the orange-green ones do not kill any 0-dim hole. This results in the missing holes (1, 3) and (1, 4). Clearly, missing holes bring in additional expressivity as, e.g., it would be possible to distinguish graphs that only differ in the orange edge in Figure 5. Still, edge-color diagrams with missing holes are not more expressive than vertex-color ones. For instance, they cannot separate the two graphs in Figure 3(a). Augmenting edge-color diagrams with vertex-color information. To improve the expressivity of persistent homology, a simple approach is to merge tuples obtained from independent vertexand edge-color filtrations. However, this would double the computational cost while only allowing distinguishing pairs of graphs that could already be separated by one of the filtrations. Ideally, we would like to go beyond the union of vertexand edge-color persistence diagrams. As in Section 3.3, we consider graphs with edge colors obtained from vertex-coloring functions. Also, we assume that fv and fe are independent vertexand edge-color filtration functions, respectively. We propose adding two new elements to the tuples of edge-color diagrams with missing holes. Our augmented tuple is (b, d, α, γ) where α and γ are the additional terms. Recall that, in any edgecolor filtration, G0 has |V | connected components. Then, we can associate real or almost holes of edge-color diagrams with vertices in G. With this in mind, we define Re PHINE diagrams as follows. Figure 5: Re PHINE diagrams. At G1, one component dies and creates the almost hole (0, 1, 2, 1). We also save that two nodes were discovered at 1 (fourth component), with colors equal to 2 (third component). At step 2, two other holes are killed, resulting in two tuples (0, 2, 1, 2). At G3, we obtain the missing hole (1, 3, 0, 0). Finally, G4 creates one almost hole and one missing hole. Definition 3 (Re PHINE diagram). The Re PHINE diagram of a filtration on a graph G is a multiset of cardinality |V | + β1 G, with elements of form (b, d, α, γ). There are two cases: Case b = 0 (real or almost holes). Now, b and d correspond to birth and death times of a component as in edge-color filtration. We set α(w) = fv(c(w)) and γ(w) = minv N(w) fe({{c(w), c(v)}}), where w is the vertex that is associated with the almost or real hole. Vertices are matched with the diagram elements as follows: An almost hole (b,d) corresponds to an edge merging two connected components, T1, T2. Each of these connected components has exactly one vertex, w T1 or w T2, which has not yet been associated with any element of the Re PHINE diagram. Let w = arg maxw {w T1,w T2} fv(c(w )) , or if fv(c(w T1)) = fv(c(w T2)), then w = arg minw {w T1,w T2} γ(w ). The vertices that are associated with real holes are vertices that have not died after the last filtration step. Case b = 1 (missing holes). Here, the entry d is the filtration value of an edge e that did not kill a hole but produces a cycle that appears at the filtration step associated with adding the edge e. The entries α and γ take uninformative values (e.g., 0). Figure 5 provides an example of Re PHINE diagrams. Further details of the procedure can be found in Appendix C. Notably, our scheme can be computed efficiently at the same cost as standard persistence diagrams and is consistent we obtain identical diagrams for any two isomorphic colored graphs. Theorem 4 (Re PHINE is isomorphism invariant). Let G, G be isomorphic graphs. Then, any edge-color and vertex-color filtrations produce identical Re PHINE diagrams for G and G . In addition, Theorem 5 shows that Re PHINE diagrams are strictly more expressive than those from both vertexand edge-color filtrations, including 0and 1-dim topological features. Figure 4(c) provides an example of graphs that cannot be recognized by any color-based filtration, but for which we can obtain distinct Re PHINE diagrams. Theorem 5 (Re PHINE is strictly more expressive than color-based PH). Let D, D be the persistence diagrams associated with any edge or vertex-color filtration of two graphs. If D = D , then there is a filtration that produces different Re PHINE diagrams. The converse does not hold. Despite its power, there are simple non-isomorphic graphs Re PHINE cannot distinguish. In particular, if two graphs have one color, Re PHINE cannot separate graphs of equal size with the same number of components and cycles. For example, star and path graphs with 4 vertices of color c1 produce identical Re PHINE diagrams of the form {{(0, d, a, d), (0, d, a, d), (0, d, a, d), (0, , a, d)}}, where d = fe({{c1, c1}}) and a = fv(c1) for arbitrary edgeand vertex-color filtration functions. Combining Re PHINE and GNNs. Re PHINE diagrams can be easily incorporated into general GNN layers. For instance, one can follow the scheme in [15] to combine non-missing hole information with node features and leverage missing holes as graph-level attributes. However, here we adopt a simple scheme that processes Re PHINE tuples using Deep Sets [38]. These topological embeddings are then grouped using a pooling layer and concatenated with the graph-level GNN embedding. The resulting representation is fed to a feedforward network to obtain class predictions. Formally, let NG(u) denote the set of neighbors of vertex u in G, and h(0) u = c(u) for all u V . We compute Data (+1) Data (-1) 0 500 1000 1500 2000 Expressivity 0 500 1000 1500 2000 0 500 1000 1500 2000 Re PHINE PH GCN 0 500 1000 1500 2000 Expressivity 0 500 1000 1500 2000 0 500 1000 1500 2000 0 500 1000 1500 2000 Epochs Expressivity 0 500 1000 1500 2000 Epochs 0 500 1000 1500 2000 Epochs Figure 6: Average learning curves for Re PHINE, PH, and GCN on connected cubic graphs. Re PHINE can learn representations in cases where PH and GNNs struggle to capture structural information. Re PHINE shows better expressivity and fitting capability on Cub10-2 and Cub12-3. GNN and Re PHINE embeddings (denoted by r(ℓ)) at layer ℓrecursively as: h(ℓ) u = AGG(ℓ)({{h(ℓ 1) w | w NG(u)}}) u V h(ℓ) u = UPDATE(ℓ) h(ℓ 1) u , h(ℓ) u u V R(ℓ) = REPHINE(f (ℓ) v , f (ℓ) e , {{h(ℓ) u }}u V ) r(ℓ) = ϕ(ℓ)( X d R(ℓ) ψ(ℓ)(d)) where f (ℓ) v , f (ℓ) e , ψ(ℓ), ϕ(ℓ), AGG(ℓ), and UPDATE(ℓ) are arbitrary non-linear mappings, usually implemented as feedforward neural nets. After L layers, we obtain the combined Re PHINE-GNN graph-level representation as [POOL1({r(ℓ)}ℓ) POOL2({h(L) u }u)], where POOL1 is either mean or concatenation, and POOL2 is an order invariant operation. 5 Experiments In this section, we compare Re PHINE to standard persistence diagrams from an empirical perspective. Our main goal is to evaluate whether our method enables powerful graph-level representation, confirming our theoretical analysis. Therefore, we conduct two main experiments. The first one leverages an artificially created dataset, expected to impose challenges to persistent homology and MPGNNs. The second experiment aims to assess the predictive performance of Re PHINE in combination with GNNs on popular benchmarks for graph classification. All methods were implemented in Py Torch [31], and our code is available at https://github.com/Aalto-Qu ML/Re PHINE. Synthetic data. We consider three datasets of cubic graphs (or 3-regular graphs): cub08, cub10, and cub12 [6]. These graphs cannot be distinguished by 1-WL and color-based PH as all vertices share the same color. Thus, we modify the datasets by changing the colors of 1, 2, or 3 vertices in each graph sample, resulting in the modified datasets cub08-1, cub10-2, and cub12-3. Also, we randomly partition each dataset and create a balanced binary classification task. We expect this to keep the hardness of the task while allowing some distinguishability. We compare standard 0-dim persistence diagrams from vertex-color filtrations (referred to as PH) to 0-dim Re PHINE (i.e., no missing holes). Both approaches are processed using Deep Sets with exactly the same structure and optimization procedure. Also, they operate on the original colors, not on GNN embeddings. For completeness, we report results for a 2-layer graph convolutional network (GCN) [22] followed by an MLP. We are interested in assessing if the persistence modules can overfit the observed graphs. We also monitor if the methods obtain different representations for each graph, measured in terms of the proportion of unique graph embeddings over training (which we call expressivity). We provide further details and additional results with 1-dim persistence diagrams in the supplementary material. Table 1: Predictive performance on graph classification. We denote in bold the best results. For ZINC, lower is better. For most datasets, Re PHINE is the best-performing method. GNN Diagram NCI109 PROTEINS IMDB-B NCI1 MOLHIV ZINC GCN - 76.46 1.03 70.18 1.35 64.20 1.30 74.45 1.05 74.99 1.09 0.875 0.009 PH 77.92 1.89 69.46 1.83 64.80 1.30 79.08 1.06 73.64 1.29 0.513 0.014 Re PHINE 79.18 1.97 71.25 1.60 69.40 3.78 80.44 0.94 75.98 1.80 0.468 0.011 GIN - 76.90 0.80 72.50 2.31 74.20 1.30 76.89 1.75 70.76 2.46 0.621 0.015 PH 78.35 0.68 69.46 2.48 69.80 0.84 79.12 1.23 73.37 4.36 0.440 0.019 Re PHINE 79.23 1.67 72.32 1.89 72.80 2.95 80.92 1.92 73.71 0.91 0.411 0.015 Figure 6 shows the learning curves for 2000 epochs, averaged over five runs. Notably, for all datasets, the expressivity of Re PHINE is significantly higher than those from PH and similar to GNN s. On cub10-2, while PH and GNN obtain accuracies of around 0.5, Re PHINE allows a better fit to the observed data, illustrated by higher accuracy and lower loss values. Real-world data. To assess the performance of Re PHINE on real data, we use six popular datasets for graph classification (details in the Supplementary): PROTEINS, IMDB-BINARY, NCI1, NCI109, MOLHIV and ZINC [7, 16, 20]. We compare Re PHINE against standard vertex-color persistence diagrams (simply called PH here). Again, we do not aim to benchmark the performance of topological GNNs, but isolate the effect of the persistence modules. Thus, we adopt default (shallow) GNN architectures and process the persistence diagrams exactly the same way using Deep Sets. We report the mean and standard deviation of predictive metrics (AUC for MOLHIV, MAE for ZINC, and Accuracy for the remaining) over five runs. We provide further implementation details in Appendix C. Table 1 shows the results of PH and Re PHINE combined with GCN [22] and GIN [37] models. Notably, Re PHINE consistently outperforms PH, being the best-performing method in 10 out of 12 experiments. Overall, we note that GIN-based approaches achieve slightly better results. Our results suggest that Re PHINE should be the default choice for persistence descriptors on graphs. Table 2: Pers Lay vs. Re PHINE: Accuracy results on graph classification. Method NCI109 PROTEINS IMDB-B NCI1 Pers Lay 70.12 0.83 67.68 1.94 68.60 5.13 68.86 0.86 Re PHINE+Linear 73.27 1.69 71.96 1.85 70.40 2.97 74.94 1.35 Comparison to Pers Lay [2]. We also compare our method against another topological neural network, namely, Pers Lay. Since Pers Lay does not leverage GNNs, we adapted our initial design for a fair comparison. Specifically, we compute Re PHINE diagrams with learned filtration functions and apply a linear classifier to provide class predictions. Also, we concatenate the vectorial representations of the Re PHINE diagrams with the same graph-level features obtained using Pers Lay. We refer to our variant as Re PHINE+Linear. Table 2 reports accuracy results over 5 runs on 4 datasets. For all datasets, Re PHINE+Linear achieves higher accuracy, with a significant margin overall. 6 Conclusion, Broader Impact, and Limitations We resolve the expressivity of persistent homology methods for graph representation learning, establishing a complete characterization of attributed graphs that can be distinguished with general nodeand edge-color filtrations. Central to our analyses is a novel notion of color-separating sets. Much like how WL test has fostered more expressive graph neural networks (GNNs), our framework of color-separating sets enables the design of provably more powerful topological descriptors such as Re PHINE (introduced here). Re PHINE is computationally efficient and can be readily integrated into GNNs, yielding empirical gains on several real benchmarks. We have not analyzed here other types of filtrations, e.g., those based on the spectral decomposition of graph Laplacians. Future work should also analyze the stability, generalization capabilities, and local versions of Re PHINE. Overall, we expect this work to spur principled methods that can leverage both topological and geometric information, e.g., to obtain richer representations for molecules in applications such as drug discovery and material design. Acknowledgments This work was supported by Academy of Finland (Flagship programme: Finnish Center for Artificial Intelligence FCAI) and a tenure-track starting grant by Aalto University. We also acknowledge the computational resources provided by the Aalto Science-IT Project from Computer Science IT. We are grateful to the anonymous area chair and reviewers for their constructive feedback. Johanna Immonen thanks Tuan Anh Pham, Jannis Halbey, Negar Soltan Mohammadi, Yunseon (Sunnie) Lee, Bruce Nguyen and Nahal Mirzaie for their support and for all the fun during her research internship at Aalto University in the summer of 2022. [1] P. Barceló, E. V. Kostylev, M. Monet, J. Pérez, J. L. Reutter, and J.-P. Silva. The logical expressiveness of graph neural networks. In International Conference on Learning Representations (ICLR), 2020. [2] M. Carrière, F. Chazal, Y. Ike, T. Lacombe, M. Royer, and Y. Umeda. Pers Lay: A neural network layer for persistence diagrams and new graph topological signatures. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. [3] M. Carriere, F. Chazal, M. Glisse, Y. Ike, H. Kannan, and Y. Umeda. Optimizing persistent homology based functions. In International Conference on Machine Learning (ICML), 2021. [4] Y. Chen, B. Coskunuzer, and Y. Gel. Topological relational learning on graphs. In Advances in Neural Information Processing Systems (Neur IPS), 2021. [5] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. In Proceedings of the Twenty-First Annual Symposium on Computational Geometry, 2005. [6] K. Coolsaet, S. D hondt, and J. Goedgebeur. House of graphs 2.0: A database of interesting graphs and more. Discrete Applied Mathematics, 325:97 107, 2023. [7] V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, and X. Bresson. Benchmarking graph neural networks. Journal of Machine Learning Research, 24(43):1 48, 2023. [8] Edelsbrunner, Letscher, and Zomorodian. Topological persistence and simplification. Discrete & Computational Geometry, 28(4), 2002. [9] H. Edelsbrunner and J. Harer. Computational Topology - an Introduction. American Mathematical Society, 2010. [10] M. Fey and J. E. Lenssen. Fast graph representation learning with Py Torch Geometric. In Workshop track of the International Conference on Representation Learning (ICLR), 2019. [11] V. Garg, S. Jegelka, and T. Jaakkola. Generalization and representational limits of graph neural networks. In International Conference on Machine Learning (ICML), 2020. [12] C. Hofer, R. Kwitt, M. Niethammer, and A. Uhl. Deep learning with topological signatures. In Advances in Neural Information Processing Systems (Neur IPS), 2017. [13] C. Hofer, F. Graf, B. Rieck, M. Niethammer, and R. Kwitt. Graph filtration learning. In International Conference on Machine Learning (ICML), 2020. [14] C. D. Hofer, R. Kwitt, and M. Niethammer. Learning representations of persistence barcodes. Journal of Machine Learning Research, 20(126):1 45, 2019. [15] M. Horn, E. De Brouwer, M. Moor, Y. Moreau, B. Rieck, and K. Borgwardt. Topological graph neural networks. In International Conference on Learning Representations (ICLR), 2022. [16] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec. Open graph benchmark: Datasets for machine learning on graphs. ar Xiv preprint ar Xiv:2005.00687, 2020. [17] Xiaoling Hu, Fuxin Li, Dimitris Samaras, and Chao Chen. Topology-preserving deep image segmentation. In Advances in Neural Information Processing Systems (Neur IPS), 2019. [18] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015. [19] John J. Irwin, Teague Sterling, Michael M. Mysinger, Erin S. Bolstad, and Ryan G. Coleman. Zinc: A free tool to discover chemistry for biology. Journal of Chemical Information and Modeling, 52(7):1757 1768, 2012. [20] Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann. Benchmark data sets for graph kernels, 2016. URL http://graphkernels.cs.tu-dortmund.de. [21] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015. [22] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017. [23] V. Kovacev-Nikolic, P. Bubenik, D. Nikoli c, and G. Heo. Using persistent homology and dynamical distances to analyze protein binding. Statistical Applications in Genetics and Molecular Biology, 15(1): 19 38, 2016. [24] M. Kramár, R. Levanger, J. Tithof, B. Suri, M. Xu, M. Paul, M. F. Schatz, and K. Mischaikow. Analysis of Kolmogorov flow and Rayleigh Bénard convection using persistent homology. Physica D: Nonlinear Phenomena, 334:82 98, 2016. [25] Y. Lee, S. D. Barthel, P. Dłotko, S. M. Moosavi, K. Hess, and B. Smit. Quantifying similarity of pore-geometry in nanoporous materials. Nature Communications, 8(1), 2017. [26] J. Leygonie, S. Oudot, and U. Tillmann. A framework for differential calculus on persistence barcodes. Foundations of Computational Mathematics, 22(4):1069 1131, 2022. [27] C. Li, M. Ovsjanikov, and F. Chazal. Persistence-based structural recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. [28] A. Loukas. What graph neural networks cannot learn: depth vs width. In International Conference on Learning Representations (ICLR), 2020. [29] A. Loukas. How hard is to distinguish graphs with graph neural networks? In Advances in Neural Information Processing Systems (Neur IPS), 2020. [30] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In AAAI Conference on Artificial Intelligence (AAAI), 2019. [31] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. De Vito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in pytorch. In Advances in Neural Information Processing Systems (Neur IPS - Workshop), 2017. [32] B. Rieck. On the expressivity of persistent homology in graph learning. ar Xiv: 2302.09826, 2023. [33] B. Rieck, C. Bock, and K. Borgwardt. A persistent weisfeiler-lehman procedure for graph classification. In International Conference on Machine Learning (ICML), 2019. [34] R. Sato. A survey on the expressive power of graph neural networks. ar Xiv:2003.04078, 2020. [35] R. Sato, M. Yamada, and H. Kashima. Approximation ratios of graph neural networks for combinatorial problems. In Advances in Neural Information Processing Systems (Neur IPS), 2019. [36] B. Weisfeiler and A. A. Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, 2(9):12 16, 1968. [37] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations (ICLR), 2019. [38] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. Salakhutdinov, and A. Smola. Deep sets. In Advances in Neural Information Processing Systems (Neur IPS), 2017. [39] Q. Zhao and Y. Wang. Learning metrics for persistence-based summaries and applications for graph classification. In Advances in Neural Information Processing Systems (Neur IPS), 2019. [40] Q. Zhao, Z. Ye, C. Chen, and Y. Wang. Persistence enhanced graph neural network. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. Supplementary material: Going beyond persistent homology using persistent homology A Persistent homology Persistent homology (PH) is one of the workhorses for topological data analysis (TDA). A central idea underlying PH is to investigate the multiresolution structure in data through the lens of low-dimensional topological features such as connected components (0-dimensional), loops (1dimensional), and voids (2-dimensional). Here, we provide a brief description of PH, and how it extends to graphs. In particular, we do not present proofs and do not show that the constructions are well-defined. For a detailed treatment, we refer the reader to [13], [9]. We will first define homological groups. They allow to characterise p-dimensional holes in a topological space such as a simplicial complex. We present the theory for simplicial complexes, as our focus is on 1-dimensional simplicial complexes (i.e. graphs). Let K be a simplicial complex. The p-chains are formal sums c = P aiσi, where ai Z/2Z and σi are p-simplices in K. One can think of p-chain as a set of p-simplices such that ai = 1. Together with componentwise addition, p-chains form the group Cp(K). Now, consider a simplex σ = (v0, ..., vp) K. We can define a boundary for σ by j=0 (v0, ..., vj 1, vj+1, ..., vp), i.e., pσ is a sum of the (p 1)-dimensional faces of σ. We can extend this to define a boundary homomorphism p : Cp(K) Cp 1(K) where p P aiσi = P ai pσi. Thus, we can define a sequence of groups ...Cp+1(K) p+1 Cp(K) p Cp 1(K)..., each connected with a boundary homomorphism. This sequence is chain complex, and it is the last definition we need in order to consider homology groups. The pth homology group is a group of p-chains with empty boundary (i.e. pσ = 0) such that each of these particular p-chains (cycles) are a boundary of a different simplex in Cp+1(K). So, we can define the homology group as the quotient space Hp = ker p/Im (p+1). The rank of Hp is equal to the pth Betti number (βp). Then, let us see how the homology groups can be refined to gain persistent homology groups. Persistent homology tracks the evolution of Betti numbers in a sequence of chain complexes. For this, we need a filtration, which is an increasing sequence of simplicial complexes (Fi)r i=1 such that F1 = F2 . . . Fr = K. By constructing all homology groups for each of these simplicial complexes, we can capture changes. New holes (or, homology classes) may emerge, or they may be annihilated such that only the older remains. As such, we can associate a pair of timestamps, or persistence points, (i, j) for every hole to indicate the filtration steps it appeared and disappeared. The persistence of a point (i, j) is the duration for which the corresponding feature was in existence, i.e., the difference |i j|. We set j = if the hole does not disappear, i.e. is present at the last filtration step. The extension to persistent homology groups and persistent Betti numbers is natural: Hi,j p = ker p/(Im (p+1) ker p), and the pth persistent Betti number βi,j p are given by the rank of Hi,j p as earlier. Lastly, a persistent diagram that consists of the persistent points (i, j) with multiplicities µi,j p = (βi,j 1 p βi,j p ) (βi 1,j 1 p βi 1,j p ) where i < j, encodes the persistent homology groups entirely by the Fundamental Lemma of Persistent Homology. For graphs, the filtration may be viewed as creating an increasing sequence of subgraphs. This entails selecting a subset of vertices and edges of the graph at each step of the filtration. One can learn a parameterized function f (e.g., a neural network) to assign some value to each σ K, and thereby select the subsets Ki based on a threshold αi R. That is, f induces a filtration (Fi)r i=1 using a sequence (αi)r i=1 such that α1 α2 . . . αr: Fi F(f; αi) = {σ K : f(σ) αi}. We provide a detailed pseudocode in Algorithm 1 to compute the persistence diagram for an input graph. The algorithm uses the Union-Find data structure, also known as a disjoint-set forest. The code assumes we are given vertex-color filter values, stored in the variable v Values. The algorithm returns a multiset containing 0and 1-dimensional persistence tuples (i.e., persistence diagrams). Algorithm 1 Computing persistence diagrams Require: V, E, v Values Vertices, edges, and vertex-color filter values uf UNIONFIND(|V |) pers0 zeros(|V |, 2) Initialize the persistence tuples pers1 zeros(|E|, 2) for e E do (v, w) e e Values[e] max(v Values[v], v Values[w]) end for pers0[:, 1] v Values Pre-set the birth times SINDICES, SVALUES SORT(e Values) for e, weight Pair(SINDICES, SVALUES) do Pair is equivalent to the zip function in Python (v, w) e younger uf.find(v) younger denotes the component that will die older uf.find(w) if younger = older then A cycle was detected pers1[e, 1] weight pers1[e, 2] continue else if v Values[younger] < v Values[older] then younger, older, v, w older, younger, w, v end if end if pers0[younger, 2] weight uf.merge(v, w) Merge two connected components end for for r uf.roots() do pers0[r, 2] end for Dv JOIN(pers0, pers1) return Dv B.1 Proof of Lemma 1: Vertex-based filtrations can generate inconsistent diagrams v1 v2 v3 v4 v1 v2 v3 v4 v1 v2 v3 v4 Figure S1: Filtrations induced by injective vertex filter functions for two isomorphic graphs. Node ids are used as filter function. The top-row filtration induces the persistence diagram D0 1 = {{(1, ), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}}, while the second-row filtration produces D0 2 = {{(1, ), (2, 4), (3, 5), (4, 4), (5, 5), (6, 6)}}. Proof. Consider a simple cyclic graph with 6 vertices that share the same color. Since the vertices are structurally identical and have the same color, one would expect to get a single persistence diagram irrespective of the labeling of the vertices. However, this is not the case. Consider two different labelings for the vertices on the graph: ℓ1 = (v1, v2, v3, v4, v5, v6) and ℓ2 = (v1, v4, v2, v6, v3, v5) (see Figure S1). Now, consider an injective vertex-based filtration where f(vi) > f(vj) if i > j. Then, we obtain two different persistence diagrams, D1 = {{(1, ), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}} and D2 = {{(1, ), (2, 4), (3, 5), (4, 4), (5, 5), (6, 6)}}. We note that for any choice of vertex-based injective filter function on this cycle graph, we can follow a similar procedure to build two different labelings such that the persistence diagrams are different. B.2 Proof of Lemma 2: Equivalence between component-wise colors and real holes Proof. We consider two arbitrary graphs G = (V, E, c, X) and G = (V , E , c , X ) and an injective filter function f : X X R. We note that if G and G do not have the same number of connected components (i.e., β0 G = β0 G ), then G and G differ on the number of real holes, i.e., their multisets of real holes are different trivially. Thus, we now assume β0 G = β0 G = k. We also assume both graphs have same colors if there is a color in G that is not in G , the claim is trivial. [ ] Recall that Xi = {c(v) | v VCi} denotes the set of colors in the component Ci G. Similarly, X i is the set of colors in C i G . We want to show that if {{Xi}}k i=1 = {{X i}}k i=1, then there exists a filtration such that the multisets of real holes are different. We proceed with a proof by induction on the number of colors. If there is only 1 color, component-wise colors cannot differ for graphs with β0 G = β0 G . Let us thus consider 2 colors (say, b and w). For 2 colors, there are only three possibilities for what Xh {{Xi}}k i=1 may be: {b}, {w} or {b, w}. Now, let us denote the multiplicities of {b}, {w} and {b, w} in {{Xi}}k i=1 by n1, n2 and n3, respectively. Note that for G and G with β0 G = β0 G , we have n1 + n2 + n3 = n 1 + n 2 + n 3. Thus, when {{Xi}}k i=1 = {{X i}}k i=1, there are four cases to consider: 1. n1 = n 1, n2 = n 2, n3 = n 3: Here, n2 + n3 = n 2 + n 3 correspond to multiplicities of real holes (w, ) for G and G respectively, in a filtration that introduces the color w first. 2. n1 = n 1, n2 = n 2, n3 = n 3 : Again, n2 + n3 = n 2 + n 3 correspond to multiplicities of real holes (w, ) for G and G respectively in a filtration that introduces the color w first. 3. n1 = n 1, n2 = n 2, n3 = n 3: Now, n1 + n3 = n 1 + n 3 correspond to multiplicities of real holes (b, ) for G and G respectively in a filtration that introduces the color b first. 4. n1 = n 1, n2 = n 2, n3 = n 3: Similarly, n1 + n3 = n 1 + n 3 correspond to multiplicities of real holes (b, ) for G and G respectively in a filtration that introduces the color b first. Note that cases as n1 = n 1, n2 = n 2, n3 = n 3 are not possible as n1 + n2 + n3 = n 1 + n 2 + n 3. Let us then assume that there are l colors, and there exists a permutation of the colors {c1, c2, ..., cl} that induces a filtration giving different colored representatives. Let us consider graphs G and G with l + 1 colors. Now, if {{Xi}}k i=1 = {{X i}}k i=1 for subgraphs of G and G with only l colors, the permutation {cl+1, c1, c2, ..., cl} induces a filtration where the representatives of first l colors differ (and there may or may not be a difference also in the representatives of the l + 1-th color). However, if there are no such subgraphs, this means that each of the pairs of unmatched component-colors contain the l + 1 th color. Now {c1, c2, ..., cl, cl+1} must induce the wanted kind filtration, since now the representatives of each component are as in l colors. The claim follows by the induction principle. [ ] Now, we want to prove that if there is a filtration such that the multisets of real holes differ, then {{Xi}} = {{X i}}. We proceed with a proof by contrapositive. Assume that {{Xi}} = {{X i}}. Recall that, for a filter f, the color of the representatives of a real hole associated with Ci is given by arg minx Xi f(x). If {{Xi}} = {{X i}}, it implies that the multisets of colors of the representatives are identical. Finally, note that the birth times of real holes are functions of these colors and, therefore, are identical as well. B.3 Proof of Lemma 3: Almost holes and separating sets Figure S2: Graph to help illustrate the connectivity of almost holes. Statement 1: We want to show that if (f(x(b)), f(x(d))) is an almost hole, then S = {v V |f(c(v)) f(x(d))} is a separating set of G = (V, E, c, X). Proof. Let d = (f(x(b)), f(x(d))) be an almost hole. Then, we know there is at least one vertex w of color c(w) = x(b) that gives birth to a new connected component at the filtration step Gf(x(b)). Also, there is a distinct vertex w such that w and w are not in the same component at Gf(x(b)) but are connected at Gf(x(d)). The existence of w is guaranteed since if there was no such w that gets connected to w at Gf(x(d)), d would be a real hole, or if w was connected to all other nodes at Gf(x(b)), d would be a trivial hole. Figure S2 illustrates a filtration on a 5-vertex graph with 5 colors. The filtration produces the persistence diagram {{(1, ), (2, 2), (3, 4), (4, 4), (5, 5)}}, with a single almost hole (3, 4). According to our description, w corresponds to v3 (with x(b) = grey and f(x(b)) = 3), and v1 could be a candidate to w , for instance. The discovery of the vertices in T = {v V | f(c(v)) = f(x(d))} connects w to w since this set is added at the step when the component associated with w dies at f(x(d)). Equivalently, T is a separating set of Gf(x(d)). However, we want a separating set of G (not of Gf(x(d))). Finally, we note that expanding T to S = {v V | f(v) f(x(d))} suffices to obtain a separating set of G. Statement 2: Let S be a separating set of G that splits a connected component C G into k components C1, C2, . . . , Ck. Then, there exists a filtration that produces k 1 almost holes if the set of colors of vertices in k i=1VCi is disjoint from those of the remaining vertices, i.e., {c(v) | v V \ k i=1VCi} {c(v) | v k i=1VCi} = . Proof. Let us denote by C1, C2, ..., Ck the connected components that S separates C into. We can first set a restriction f| k i=1VCi to be any function mapping vertex colors to {1, 2, ..., | k i=1 VCi|} i.e., vertices in k i=1VCi must take filtration values in {1, 2, ..., | k i=1 VCi|}. Similarly, we can set f|V \ k i=1VCi to be any function to {| k i=1 VCi| + 1, ..., |V |}. The function f obtained by combining the domains of f| k i=1VCi and f|V \ k i=1VCi is well defined due to the assumption {c(v) | v V \ k i=1VCi} {c(v) | v k i=1VCi} = . Since C1, C2 ... , Ck are not path-connected, the persistence diagram induced by f must have k holes that are born at filtration steps in {1, 2, ..., | k i=1 VCi|}. Also, since the vertices of S are added at filtration steps in {| k i=1 VCi| + 1, ..., |V |}, all holes die, forcing the birth and death times to be different. Thus, there must be one real hole corresponding to the connected component C and k 1 almost holes. B.4 Proof of Lemma 4: Distinct almost holes imply distinct color-separating sets Proof. We will consider two cases. The first one assumes that the multisets of real holes of DG and DG are different. In the second case, we consider identical multisets of real holes and different multisets of almost holes. Case 1: multisets of real holes differ. By Lemma 2, we have that the graphs have distinct componentwise colors - that is, an empty set is a color-separating set. Case 2: D0 G and D0 G have identical real holes, but different multisets of almost holes. We want to show that there is a color-separating set for G and G . We note that we can split the condition of distinct multisets of almost holes into two sub-cases: (i) There is some color x0 such that there are more almost holes with birth time f(x0) in D0 G than in D0 G ; (ii) There is some color x0 such that there are more almost holes with death f(x0) in D0 G than in D0 G . Let us first consider case (i). By the definition of birth time, we have that Gf(x0) has more connected components of color set {x0} than G f(x0). As such, {x X X |f(x) > f(x0)} is a colorseparating set for G and G . For case (ii), we assume that there are equally many births of almost holes associated to the the color x0 otherwise we return to case (i), for which we showed how to build a color-separating set. We note that if there is a different number of connected components at any earlier filtration step than when x0 is introduced (i.e. f(y) < f(x0)), then {x X X |f(x) > f(y)} is a color separating set since Gf(y) and G f(y) do not have as many connected components, they cannot have identical component-wise colors. However, if there is no such filtration step f(y), it follows that Gf(x0) and G f(x0) cannot have the same number of components. This follows since vertices of color x0 kill more connected components in Gf(x0) than in G f(x0), while prior to this, the numbers of components were equal. Therefore, {x X X |f(x) > f(x0)} is a color-separating set. B.5 Proof of Lemma 5: Equivalence between birth times and vertex colors Proof. We consider a graph G = (V, E, c, X) and any injective vertex-color filter f : X R from which we obtain a persistence diagram D0. We want to show that there exists a bijection between the multiset of birth times B = {{b | (b, d) D0}} and the multiset of vertex colors X = {{c(v) | v V }}. Note that we can also represent a multiset as a pair B = (SB, m B) where SB is a set comprising the distinct elements of B, and m B : SB N is a multiplicity function that gives the number of occurrences of each element of SB in the multiset. If there is a bijection g : SB SX such that m B = m X g, then we say that g is also a bijection between the multisets B and X. We note that SX = Im[c] denotes the set of distinct colors in G. Without loss of generality, since we are interested in filtrations induced by f on G, we can constrain ourselves to filter values on SX . Thus, filtrations induced by f : SX R are increasing (i.e., for any consecutive filtration steps j > i, we have that Vj \Vi = ) and produce filtration steps T = {f(x) | x SX }. Because such filtrations are increasing, we have at least one vertex discovered at each step, resulting in the set of distinct birth times SB = T . The mapping g : SX SB where g(x) = f(x) for all x SX is a bijection. By definition, the number of vertices discovered at step f(x) equals the number of persistence pairs with birth time f(x), which is also equal to the number of vertices of color x. This implies that the multiplicity of an element x in X is the same as its corresponding element g(x) in B. B.6 Proof of Theorem 1: The expressive power of vertex-color filtrations Proof. We consider graphs G = (V, E, c, X) and G = (V , E , c , X ). and adopt the following notation. We use X = {{c(v) | v V }} and X = {{c (v) | v V }} to denote the multisets of vertex colors of G and G . Also, we denote by C1, . . . , Ck the components of G, and by C 1, . . . , C k the components of G . The set Xi = {c(w) | w VCi} denotes the distinct colors appearing in Ci. Similarly, X i = {c (w) | w V C i} refers to the distinct colors in C i. [Forward direction ] D0 G = D0 G there is a color-separating set The persistence diagrams D0 G and D0 G for graphs with X = X have the same birth times. It implies that if both the real holes and almost holes are identical, then the diagrams are also identical. As such, the assumption that D0 G = D0 G gives that either (1) their multisets of real holes or (2) their multiset of almost holes are different. In the following, we consider these two cases. Regarding case (1), Lemma 2 gives that if D0 G = D0 G with different multisets of real holes, then we have that {{Xi}}k i=1 = {{X i}}k i=1. Whenever this happens, the forward direction holds as even an empty set would work as a color-separating set here. Thus, it suffices to consider the case when D0 G and D0 G only differ in their multisets of almost holes. In this case, we can directly leverage Lemma 4 to obtain that there is a color-separating set for G and G . [Backward direction ] Now we want to show that if there is a color-separating set Q = for G and G , there exists a filtration such that D0 G = D0 G . If Q = , i.e. G and G have distinct component-wise colors, the claim follows by Lemma 2, with D0 G and D0 G having different multisets of real holes. If Q = , we can however use Lemma 2 to subgraphs G V and G V induced by V = V \ {w V | c(w) Q} and V = V \ {w V | c (w) Q} to gain a filter function g such that the diagrams for these subgraphs differ. Now, let s choose any a filter function such that f(x) = g(x) x X \ Q AND the filtration values for vertices with colors in X \ Q are smaller than those with colors in Q. It follows there is a filtration step j such that Gj = G V and G j = G V , and that the birth times for real holes (if the vertices of colors in Q do not merge the real holes of the subgraphs) or almost holes (if the vertices of colors in Q do merge components that would have been real holes in the subgraphs) differ. Thus, D0 G = D0 G . B.7 Proof of Lemma 6: Edge-based almost holes as disconnecting sets Proof. Initially, when none of the edges are added, there must be |V | connected components. By definition, each pair (0, d) D0 corresponds to the death of one component. It follows that Gf(x(d)) has c = |V | | {(0, d) D0 | d f(x(d))} | connected components. If G has β0 connected components, the subgraph with vertices V and edges E \ {e E | f(l(e)) f(x(d))} has more than β0 connected components. Thus, {e E | f(l(e)) f(x(d))} is a disconnecting set of G. B.8 Lemma 7: The reconstruction of a disconnecting set Proof. Let π be the permutation of colors associated with a vertex-color filter function f, i.e., for a set of colors X = (x1, . . . , xm), we have that f(xπ(i)) < f(xπ(i+1)) i = 1, . . . , m 1. Also, assume the colors associated with a disconnecting set S of a graph G = (V, E, l, X) is XS = {xπ(k), xπ(k+1), . . . , xπ(m)}. If S is a minimal disconnecting set, (0, f(xπ(k))) must be an almost hole in D since if we could add edges W S with color xπ(k) without killing some connected component of Gf(π(k 1)), the set of edges S = S \ W would form a proper disconnecting subset of a minimal disconnecting set If S is not a minimal disconnecting, then there must be a proper subset S S that is a minimal disconnecting set of G. Now, we choose S to be included first in the filtration, followed by elements of S \S . Thus, in both cases, there is a filtration s.t. an almost-hole (0, f(xk)) appears, which allows us to reconstruct S. B.9 Proof of Theorem 2: The expressive power of edge-color filtrations [ ] We split the proof of the backward direction into three cases. Proof. Case 1: The color-disconnecting set Q equals to X X . This is a trivial case. If Q = X X , G and G have distinct number of connected components when all the edges are removed from both graphs. This means |V | = |V |. Now, if |V | = |V |, then |D0 G| = |D0 G | for any filtration. Case 2: The color-disconnecting set Q is an empty set. Now, the graphs have distinct number of connected component (even if none of the edges are removed), i.e. β0 G = β0 G . The diagrams differ for any filtration since they have different numbers of real holes. Case 3: The color-disconnecting set Q = is a proper subset of X X : The existence of a color-disconnecting set implies there is a set S X X such that by removing the edges of colors S, the two graphs will have different number of connected components. Without loss of generality, we can assume that after removing the edges of colors S, G has more components than G . Now, we note that in a filtration where the colors of S are added the latest, there must either be more almost holes (0, f(x(d))) in D0 G than in D0 G with x(d) S, or alternatively β0 G = β0 G . In both cases, D0 G = D0 G for some filter function f. [ ] To prove the forward direction of the Theorem, we consider the cases where the edge-color diagrams differ in 1) their size, 2) the number of real holes, and 3) their almost holes. Proof. Case 1: |DG| = |DG |. Again, this corresponds to a trivial case, since if |DG| = |DG |, then |V | = |V |. Now, Q = X X is a color-disconnecting set. Case 2: DG and DG differ in their real holes. If there is a different count of real holes, then β0 G = β0 G , and Q = is a color-disconnecting set. Case 3: DG and DG only differ in their almost holes. We now assume that |DG| = |DG | and β0 G = β0 G , but DG = DG . This means that there is some (0, d) D such that there are more almost holes with this death time in DG than in DG , without loss of generality. There may be several such almost holes (for which the diagrams differ) with distinct death times. Let s denote the set of the death times for these almost holes by D. Then, let dmin be the minimum of the death times in D, i.e. dmin = mind D d. Let us show that the set Q = {x X X | f(x) > dmin} disconnects G into more connected components than G. For any lower filtration step, the induced subgraphs must have as many connected components because the almost holes corresponding to those steps match, and |DG| = |DG |, i.e. |V | = |V |, which means that at filtration step 0, we begin with equally many connected components. At filtration step dmin, we connect more components in G than in G because there are more almost holes corresponding to this step in DG than in DG . Now, Q must be a color-disconnecting set. B.10 Proof of Theorem 3: Edge-color vs. vertex-color filtrations Proof. This Theorem is proved in Section 3.3. In particular, Figure 3(a) provides an example of pairs of graphs that can be distinguished by vertex-color filtrations but not from edge-color ones. On the other hand, the graphs in Figure 3(b) can be distinguished by edge-color filtrations but not from vertex-color ones. This concludes the proof. B.11 Proof of Theorem 4: Re PHINE is isomorphism invariant Proof. Re PHINE diagram s isomorphism invariance stems from the fact that it is a function of a filtration on graph, and this filtration is gained from isomorphism invariant colorings. If this assumption is violated and the colorings are not gained in an invariant way, Re PHINE diagrams can also be inconsistent. It is easy to check that the tuples (b,d) are isomorphism invariant - when b = 0, these tuples correspond to diagrams gained from edge-color filtration. In this case, we can check the conditions given by Theorem 2 and note none of the conditions may be met with isomorphic graphs. With regard to b = 1, the set of missing holes is multiset of edge colors that did not appear in edge-color filtration diagram. This set can thus be gained by considering the multiset of edge colours and the edge-color diagram, which are both isomorphism invariant. Further, it is also easy to see that the tuples (α, γ) are invariant. When b = 0, the set of α s corresponds to the multiset of vertex colours, and for each vertex, γ = minv N(w) fe({c(w), c(v)}). However, the crucial part is how these two tuples are concatenated, i.e., how each of the vertices are associated with real and almost-holes. In particular, we need to check when two connected components are merged at a filtration step i and Re PHINE compares the representatives (i.e. vertices of a connected component which have not yet died ) of the two components, we will end up with same diagram elements of form (b, i, α, γ) regardless of the order we add the edges of color with filtration value i. In other words, while the Re PHINE algorithm considers one edge at a time and does only pairwise comparisons between merged connected components, the order or these comparisons must not affect the decision on which vertices are associated with death time i. Let s consider what happens when adding all the edges of a color results in merging more than two components. Assume there is a new connected components constituting of old connected components T1, T2, ..., Tn. Now, there are two different cases. Assume first that there is a strict minimum among the vertex filtration values of the old representatives. Then, any pairwise comparison will lead to choosing this minimum as the representative of the new connected component and all the other vertices will die at this filtration step. Then, assume there is no strict minimum but a tie between two or more representatives. Then, there will be comparisons based on γ, but choosing maximum of these is also permutation invariant function. In case there are two (or more) representatives such that there is a tie based on the vertex filtration values and γ values, choosing at random any of these leads to the same diagram. Lastly, note that for each real hole, (b, d) = (0, ), and so it does not thus matter how each of the vertices are matched to the real holes, when rest of vertices are associated with almost-holes in an invariant way. B.12 Proof of Theorem 5: Re PHINE is strictly more expressive than color-based PH Let RG denote the Re PHINE diagram for a graph G. Similarly, let Dv,G and De,G denote persistence diagrams associated with vertexand edge-color filtrations of G. We assume that Dv,G = (D0 v,G, D1 v,G) and De,G = (D0 e,G, D1 e,G) include 0and 1-dim persistence diagrams. We want to show that for two graphs G and G (i) if there is a vertex-color filtration such that Dv,G = Dv,G then there is a filtration that lead to RG = RG . (ii) if there is a edge-color filtration such that De,G = De,G then there is a filtration that lead to RG = RG . These results would show that Re PHINE is at least as expressive as color-based persistence diagrams. We further show that (iii) there is a pair of non-isomorphic graphs for which we can obtain RG = RG but Dv,G = Dv,G and De,G = De,G for all vertexand edge-color filtrations. Proof. Part (i): Dv,G = Dv,G RG = RG . Let f be the vertex-color function associated with the standard diagrams D. We can choose the Re PHINE s vertex-level function fv such that fv = f. We note that the original diagrams can be obtained from an auxiliary edge-level filter function fa where fa(u, w) = max(f(c(u)), f(c(w))). The procedure is described in Algorithm 1. Let fe be the edge-color filter function of Re PHINE. If we choose fe = fa, then Re PHINE contains in the second and third elements of its tuples exactly the same persistence information of the vertex-color diagrams. Note that in this case, we do not even need to require injectivity of the edge-color filter fe since the max function is not injective. Regarding the 1-dim features, for any tuple (d, ) in the 1-dim persistence diagram, we have a missing hole (1, d, , ) that comprises the same information. Thus, we have constructed vertexand edge-color functions such that Dv,G = Dv,G RG = RG . Part (ii): De,G = De,G RG = RG . This is a trivial case as Re PHINE consists of an augmented version of standard edge-color diagrams. Let f be the edge-color (injective) filter function associated with the standard diagrams. In this case, we can simply set fe = f, where fe is Re PHINE s edge-color filter. Then, the first and second elements of Re PHINE s tuples correspond to De. Regarding the 1-dim features, the only difference is the way the information is encoded. While we adopted the convention (1, d) for missing holes, the standard diagrams often use (d, ). The relevant information is the same. Therefore, Re PHINE is at least as expressive as edge-color persistence diagrams. Part (iii). To show that Re PHINE is strictly more expressive than color-based PH, it suffices to provide an example of two graphs for which there is a filtration such that RG = RG but these graphs cannot be separated from any vertexor edge-color filtration. We use the pair of graphs in Figure 3(c). We note that these graphs have no cycles, making 1-dim persistence information trivial. We first note that their multisets of colors are identical and there is no color-separating sets for these two graphs i.e., there is no subset of colors whose removal would separate the graphs into distinct component-wise colors. Thus, by Theorem 1, there is no vertex-color filtration s.t. Dv,G = Dv,G . Also, we have that |V | = |V | and X = X and β0 G = β0 G , and there is no color-disconnecting set for G and G (i.e., there is no edge colors whose removal would generate subgraphs with different number of components). By Theorem 2, these graphs cannot be separated by any edge-color filtration. However, if we choose the filter functions fv( blue ) = 1, fv( orange ) = 2, fe( blue-blue ) = 4, and fe( blue-orange ) = 3, we obtain distinct Re PHINE di- agrams given by RG = {{(0, 4, 1, 4), (0, , 1, 3), (0, 3, 2, 3), (0, 3, 2, 3)}} and RG = {{(0, , 1, 3), (0, 4, 1, 3), (0, 3, 2, 3), (0, 3, 2, 3)}}. C Implementation details C.1 Datasets Table S1 reports summary statistics of the real-world datasets used in this paper. For the IMDB-B dataset, we use uninformative features (vector of ones) for all nodes. NCI1, NCI109, Proteins, and IMDB-B are part of the TU Datasets2, a vast collection of datasets commonly used for evaluating graph kernel methods and GNNs. MOLHIV is the largest dataset (over 41K graphs) and is part of the Open Graph Benchmark3. We also consider a regression task using the ZINC dataset a subset of the popular ZINC-250K chemical compounds [19], which is particularly suitable for molecular property prediction [7]. Table S1: Statistics of the datasets. Dataset #graphs #classes #node labels Avg #nodes Avg #edges NCI1 4110 2 37 29.87 32.30 IMDB-B 1000 2 - 19.77 96.53 PROTEINS (full) 1113 2 3 39.06 72.82 NCI109 4127 2 38 29.68 32.13 MOLHIV 41127 2 9 25.5 27.5 ZINC 12000 - 28 23.16 49.83 The cubic datasets (Cubic08, Cubic10, and Cubic12) comprise non-isomorphic 3-regular graphs with 8, 10, and 12 vertices, respectively. These datasets contain 5 (Cubic08), 19 (Cubic10), and 85 (Cubic12) graphs and can be downloaded at https://houseofgraphs.org/meta-directory/ cubic. For each dataset, we create a balanced graph classification problem by randomly assigning each graph a binary class. Also, since the graphs do not have node features, we add a scalar feature to each vertex, i.e., c(v) = 1 for all v. However, this would make 1WL-GNNs and PH fail to distinguish any pair of graphs. Thus, we change the features of some arbitrary vertices of each graph, making c(v) = 1 for 1 vertex in graphs from Cubic08, 2 vertices in Cubic10, and 3 vertices in Cubic12 we denote the resulting datasets as Cubic08-1, Cubic10-2, and Cubic12-3. Given the modified datasets, we aim to assess if the existing methods can overfit (correctly classify all) the samples. C.2 Models We implement all models using the Py Torch Geometric Library [10]. Synthetic data. The GNN architecture consists of a GCN with 2 convolutional layers followed by a sum readout layer and an MLP (one hidden layer) with Re LU activation. The resulting architecture is: Conv(1, 36) Conv(36, 16) sum-readout BN(16) MLP(16, 24, 1), where BN denotes a batch norm layer [18]. For the PH model, we consider standard vertex-color filtration functions. In particular, we apply the same procedure as Hofer et al. [13], Horn et al. [15] to compute the persistence tuples. We only consider 0-dim persistence diagrams. The filtration function consists of an MLP with 8 hidden units and Re LU activation followed by a component-wise sigmoid function: Sigmoid(MLP(1, 8, 4)) i.e., we use 4 filtration functions with shared parameters. Since we can associate persistence tuples with vertices, we concatenate the resulting diagrams to obtain a |V | (4 2) matrix [D0 1, D0 2, D0 3, D0 4], where D0 i denotes the 0-dim diagram obtained using the i-th filtration function. This procedure was also employed by Horn et al. [15]. The obtained diagrams are processed using a Deep Set layer with mean aggregator and internal MLP function (Ψ) with 16 hidden and output units, MLP(4 * 2, 16, 16). We then apply a linear layer on top of the aggregated features. The overall Deep Set architecture is: MLP(4 * 2, 16, 16) Mean Aggregator Linear(16, 16). Finally, we obtain class predictions using Batch Norm followed by a single-hidden-layer MLP with 16 hidden units: BN(16) MLP(16, 16, 1). Re PHINE uses the same overall architecture as the PH model. The only differences are that i) Re PHINE tuples are 3-dimensional (as opposed to 2-dimensional in PH), and ii) Re PHINE addi- 2https://chrsmrrs.github.io/datasets/ 3https://ogb.stanford.edu tionally leverages an edge-level filtration function. Such a function follows the architecture of the vertex-level one, i.e., Sigmoid(MLP(1, 8, 4)). We note that Re PHINE tuples are 3-dimensional instead of 4-dimensional because we removed their uninformative first component (equal to 0) since we only use 0-dim diagrams. In other words, we do not leverage missing holes. Regarding the training, all models follow the same setting: we apply the Adam optimizer [21] for 2000 epochs with an initial learning rate of 10 4 that is decreased by half every 400 epochs. We use batches of sizes 5, 8, 32 for the cubic08, cubic10, and cubic12 datasets, respectively. All results are averaged over 5 independent runs (different seeds). For all models, we obtain the expressivity metric by computing the uniqueness of graph-level representations extracted before the final MLP, with a precision of 5 decimals. Importantly, these choices of hyperparameters ensure that all models have a similar number of learned parameters: 1177 (Re PHINE), 1061 (PH), and 1129 (GCN). Real-world data. For computing the standard vertex-color persistence diagrams, we use the code available by Horn et al. [15], which consists of a parallel implementation in Py Torch of the pseudocode in Algorithm 1. Moreover, we apply a multiple filtration scheme and concatenate the 0-dim persistence diagrams to form matrix representations again similarly to the design in [15]. Then, we apply a Deep Set architecture of the form: MLP(Tuple Size * n Filtrations, Out Dim, Out Dim) Mean Aggregator Linear(Out Dim, Out Dim). We use MLPs to define vertexand edge-level filtration functions. For the 1-dimensional persistence tuples (or missing holes), we first process the tuples from each filtration function using a shared Deep Set layer and then apply mean pooling to obtain graph-level representations this avoids possibly breaking isomorphism invariance by concatenating 1-dimensional diagrams. We sum the 0and 1-dim embeddings and send the resulting vector to an MLP head. The resulting topological embeddings are concatenated with last-layer GNN embeddings and fed into a final MLP classifier. We carry out grid-search for model selection. More specifically, we consider a grid comprised of a combination of {2, 3} GNN layers and {2, 4, 8} filtration functions. We set the number of hidden units in the Deep Set and GNN layers to 64, and of the filtration functions to 16 i.e., the vertex/edge-color filtration functions consist of a 2-layer MLP with 16 hidden units. For the largest datasets (ZINC and MOLHIV), we only use two GNN layers. The GNN node embeddings are combined using a global mean pooling layer. Importantly, for all datasets, we use the same architecture for Re PHINE and color-based persistence diagrams. For the TUDatasets, we obtain a random 80%/10%/10% (train/val/test) split, which is kept identical across five runs. The ZINC and MOLHIV datasets have public splits. All models are initialized with a learning rate of 10 3 that is halved if the validation loss does not improve over 10 epochs. We apply early stopping with patience equal to 40. Comparison to Pers Lay. We followed the guidelines in the official code repository regarding the choice of hyper-parameters. In particular, Pers Lay applies fixed filtration functions obtained from heat kernel signatures of the graphs with different parameters, resulting in extended and ordinary diagrams for 0 and 1-dimensional topological features. For Re PHINE+Linear, we carry out a simple model selection procedure using grid-search for the number of filtration functions ({4, 8}) and the number of hidden units ({16, 64}) in the Deep Set models. Hardware. For all experiments, we use Tesla V100 GPU cards and consider a memory budget of 32GB of RAM. C.3 Computing Re PHINE diagrams Algorithm 2 describes the computation of Re PHINE diagrams. The pseudocode has been written for clarity, not efficiency. The replacement for in real holes depends on the choice of edge and vertex filter functions. In all experiments, we employed the logistic function to the output of the feedforward networks (i.e., filtered values lie in [0, 1]) and used 1 to denote the death time of real holes. D Additional experiments Here, we complement the experiments on synthetic data, providing illustrations of the learned persistence diagrams and reporting results obtained when we combine 0and 1-dimensional diagrams. In Figure S3, we show the concatenation of the learned persistence diagrams at the end of the training procedure for Re PHINE and PH (i.e., standard vertex-color filtrations). In these examples, the Re PHINE diagrams are different while the PH ones are identical. We can observe this behavior by Algorithm 2 Re PHINE Require: V, E, e Values, v Values Vertices, edges, and edge/vertex-color filter values uf UNIONFIND(|V |) pers0 zeros(|V |, 4) Initialize the persistence tuples pers1 zeros(|E|, 4) pers0[:, 3] v Values Pre-set the birth times of each node SINDICES, SVALUES SORT(e Values) for e, weight Pair(SINDICES, SVALUES) do Pair is equivalent to the zip function in Python (v, w) e if pers0[v, 4] = 0 then pers0[v, 4] weight Save the first filtration step that a node is discovered end if if pers0[w, 4] = 0 then pers0[w, 4] weight end if younger uf.find(v) younger denotes the component that will die older uf.find(w) if younger = older then A cycle was detected pers1[e, 1] 1 pers1[e, 2] weight pers1[e, [3,4]] continue else if v Values[younger] = v Values[older] then if pers0[younger, 4] < pers0[older, 4] then Additional disambiguation step younger, older, v, w older, younger, w, v Flip younger, older, and node ids end if else if v Values[younger] < v Values[older] then younger, older, v, w older, younger, w, v end if end if pers0[younger, 2] weight uf.merge(v, w) Merge two connected components end for for r uf.roots() do pers0[r, 2] end for R JOIN(pers0, pers1) return R carefully inspecting the multisets of vectors at each row of the concatenated tuples (each row of the plots in Figure S3). For instance, consider the diagrams in Figure S3(b): in the Re PHINE diagram for the top graph, there is a row with 3 yellow entries which do not appear at the diagram for the bottom graph. However, the representations obtained from Standard PH are identical for these graphs. In Figure 6 we reported results using only 0-dimensional topological features. For completeness, Figure S4 shows learning curves when exploiting both 0 and 1-dimensional diagrams. Overall, we can again observe that Re PHINE produces higher expressivity and better fitting capability in comparison to vertex-color persistence diagrams. (a) Cubic08-1 (b) Cubic10-2 (c) Cubic12-3 (d) Cubic10-2 (e) Cubic12-3 (f) Cubic12-3 Figure S3: 0-dimensional diagrams obtained from Re PHINE and PH (standard vertex-color filtrations). These represent pairs of graphs for which the learning procedure in Re PHINE could yield different representations, whereas PH produced identical graph-level embeddings. Data (+1) Data (-1) 0 500 1000 1500 2000 Expressivity 0 500 1000 1500 2000 0 500 1000 1500 2000 Re PHINE PH 0 500 1000 1500 2000 Expressivity 0 500 1000 1500 2000 0 500 1000 1500 2000 1.6 0 500 1000 1500 2000 Epochs Expressivity 0 500 1000 1500 2000 Epochs 0 500 1000 1500 2000 Epochs Figure S4: Average learning curves for Re PHINE and PH on cubic graphs, using both 0-dim and 1-dim persistence diagrams. Again, Re PHINE can better fit the graph samples.