Published in Transactions on Machine Learning Research (12/2024)

Maximally Expressive Graph Neural Networks for Outerplanar Graphs

Franka Bause 1,2, Fabian Jogl 3,4, Patrick Indri 3, Tamara Drucks 3, David Penz 3,5, Nils M. Kriege 1,6, Thomas Gärtner 3, Pascal Welke 3, Maximilian Thiessen 3

1 Faculty of Computer Science, University of Vienna, Vienna, Austria
2 UniVie Doctoral School Computer Science, University of Vienna, Vienna, Austria
3 Machine Learning Research Unit, TU Wien, Vienna, Austria
4 Center for Artificial Intelligence and Machine Learning, Vienna, Austria
5 Multimedia Mining and Search, Johannes Kepler University Linz, Linz, Austria
6 Research Network Data Science, University of Vienna, Vienna, Austria

{firstname.lastname}@{univie, tuwien}.ac.at

Reviewed on OpenReview: https://openreview.net/forum?id=XxbQAsxrRC

Abstract

We propose a linear time graph transformation that enables the Weisfeiler-Leman (WL) algorithm and message passing graph neural networks (MPNNs) to be maximally expressive on outerplanar graphs. Our approach is motivated by the fact that most pharmaceutical molecules correspond to outerplanar graphs. Existing research predominantly enhances the expressivity of graph neural networks without specific graph families in mind. This often leads to methods that are impractical due to their computational complexity. In contrast, the restriction to outerplanar graphs enables us to encode the Hamiltonian cycle of each biconnected component in linear time. As the main contribution of the paper we prove that our method achieves maximum expressivity on outerplanar graphs. Experiments confirm that our graph transformation improves the predictive performance of MPNNs on molecular benchmark datasets at negligible computational overhead.
1 Introduction

We study graph neural networks (GNNs) for the class of outerplanar graphs and devise a linear time preprocessing step that enables message passing graph neural networks (MPNNs) to distinguish all non-isomorphic outerplanar graphs. Morris et al. (2019) and Xu et al. (2019) showed that MPNNs have limited expressivity, i.e., there exist non-isomorphic graphs on which every MPNN produces the same embedding. Such graphs exist even within the restricted class of outerplanar graphs (see Figure 1). This led to the development of GNNs that are more expressive than MPNNs, often called higher-order GNNs. However, the increase in expressivity usually comes with a significant increase in computational complexity. For example, k-GNNs (Morris et al., 2019) have a time complexity of Ω(|V|^k). Other higher-order GNNs count pattern graphs such as cliques (Bodnar et al., 2021b), cycles (Bodnar et al., 2021a;b), and general subgraphs (Bouritsas et al., 2022), which can take time exponential in the pattern size. However, for certain domains of interest, the graph structure can be exploited to build efficient higher-order GNNs. In this work, we focus on the pharmaceutical domain and on graphs that represent molecules. Between 92% and 97% of the graphs in widely used benchmark datasets in this domain are outerplanar (see Table 1). The properties of outerplanar graphs have been exploited by algorithms for graph mining (Horváth et al., 2010) and molecular similarity computation (Schietgat et al., 2013; Droschinsky et al., 2017), but not yet for GNNs.

Equal contribution.

Figure 1: The molecules bicyclopentyl (left) and decalin (right), which correspond to outerplanar graphs that cannot be distinguished by MPNNs and 1-WL.

We focus on this class of graphs and devise a linear time transformation that allows MPNNs to become maximally expressive on outerplanar graphs.
This implies that, in principle, our architecture can solve any learning task on outerplanar graphs. While this does not mean that any given maximally expressive model will perform well in practice, our experiments show that our proposed transformation improves the predictive performance of several GNN architectures on multiple benchmark learning tasks. We propose to decompose outerplanar graphs into biconnected outerplanar components and trees. Using the fact that each biconnected outerplanar component has a unique Hamiltonian cycle that can be computed in linear time, we split each component into two graphs corresponding to the two directed Hamiltonian cycles, and prove that MPNNs are maximally expressive on biconnected outerplanar graphs transformed in this way. Taking advantage of the well-known fact that MPNNs are maximally expressive on labeled trees (Arvind et al., 2015; Kiefer, 2020), we extend our result to a linear time graph transformation called Cyclic Adjacency Transform (CAT) that works on all outerplanar graphs. We benchmark CAT with common MPNNs on a variety of molecular graph benchmarks and show that CAT consistently boosts the performance of MPNNs.

Main contributions. We propose CAT, a linear time graph transformation that renders MPNNs maximally expressive on outerplanar graphs. Experimentally, CAT consistently improves the performance of MPNNs with little increase in runtime.

2 Discussion and Related Work

Since the expressivity of MPNNs is bounded by the 1-WL algorithm (Morris et al., 2019; Xu et al., 2019), any pair of non-isomorphic graphs that cannot be distinguished by 1-WL will be mapped to the same embedding by any given MPNN. One such pair of graphs is given by the molecules decalin and bicyclopentyl (see Fig. 1). As these two graphs are outerplanar, it follows that MPNNs are not sufficiently expressive for outerplanar graphs.
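To make this failure mode concrete, the following sketch (our own illustration, not code from the paper) runs plain 1-WL color refinement on the carbon skeletons of the two molecules from Figure 1 and confirms that both receive the same color histogram:

```python
from collections import Counter

def wl_colors(edges, rounds=10):
    """Plain 1-WL color refinement on an undirected, unlabeled graph.
    Returns the multiset (Counter) of node colors after refinement."""
    nodes = sorted({v for e in edges for v in e})
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colors = {v: 0 for v in nodes}  # uniform initial coloring
    for _ in range(rounds):
        # injectively re-encode (own color, multiset of neighbor colors)
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                      for v in nodes}
        relabel = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: relabel[signatures[v]] for v in nodes}
        if new_colors == colors:  # refinement has become stable
            break
        colors = new_colors
    return Counter(colors.values())

# Decalin: two 6-cycles sharing the edge (0, 5).
decalin = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
           (5, 6), (6, 7), (7, 8), (8, 9), (9, 0)]
# Bicyclopentyl: two 5-cycles joined by the bridge edge (0, 5).
bicyclopentyl = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
                 (5, 6), (6, 7), (7, 8), (8, 9), (9, 5), (0, 5)]

# Both graphs get the identical color histogram, so 1-WL (and hence
# any MPNN) cannot tell them apart.
assert wl_colors(decalin) == wl_colors(bicyclopentyl)
```

Both graphs stabilize with the same partition (two degree-3 nodes, four degree-2 nodes adjacent to a degree-3 node, four remaining degree-2 nodes), which is why the histograms coincide.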
Furthermore, in the graph mining community it is well known that many pharmaceutical molecules are outerplanar (Horváth et al., 2006; Horváth & Ramon, 2010). Outerplanarity has also been discussed in the context of reconstruction with GNNs (Cotta et al., 2021). This motivates the need for GNNs that are highly expressive on outerplanar graphs. Outerplanar graphs have treewidth at most two (Bodlaender, 1998), and Kiefer (2020) showed that 3-WL is sufficiently expressive to distinguish all outerplanar graphs. Hence, any GNN that matches the expressivity of 3-WL, such as 3-IGN (Maron et al., 2019) or 3-GNN (Morris et al., 2019), is capable of solving our main goal of distinguishing all outerplanar graphs. However, a single iteration of a naive implementation of the 3-WL algorithm on a graph with n vertices takes at least O(n^3) time (Immerman & Lander, 1990; Kiefer, 2020), which can be infeasible even for medium-sized real-world graphs. Similarly, a single 3-GNN or 3-IGN layer runs in roughly O(n^3) time (Maron et al., 2019; Morris et al., 2019). Even when additionally restricting the graph class to biconnected outerplanar graphs, MPNNs are not sufficiently expressive. Furthermore, Zhang et al. (2023b) have shown that most GNNs cannot even detect simple properties associated with biconnectivity, such as articulation vertices. They find that only their distance-based GNN and specific GNNs based on subgraphs (Bevilacqua et al., 2021; Frasca et al., 2022) are able to detect some of these properties. Again, these approaches have at least quadratic worst-case runtime. It is not straightforward to use outerplanarity to efficiently improve the expressivity of GNNs, as even subgraph isomorphism remains NP-hard on outerplanar graphs (Sysło, 1982). Thus, methods like the graph substructure network (Bouritsas et al., 2022) that rely on counting subgraphs remain computationally expensive even on outerplanar graphs.
Subgraph GNNs model graphs as a collection of subgraphs (Frasca et al., 2022), which usually requires pre-processing with at least quadratic runtime, depending on the method used to extract subgraphs.

Figure 2: A graph and both directions of its directed Hamiltonian cycle. Nodes are annotated with their HALs, the distances on the Hamiltonian cycle to their neighbors (Colbourn & Booth, 1981).

For example, Node-delete (Bevilacqua et al., 2021) creates all subgraphs that are obtained by deleting a single node, creating O(n^2) nodes in total. k-ego-net (Bevilacqua et al., 2021) extracts the k-hop neighborhood of each node, which for k ≥ 2 can create O(n^2) nodes in the worst case, for example for star graphs, which are also outerplanar. Recently, Dimitrov et al. (2023) have proposed PlanE, a GNN architecture that can distinguish all planar graphs in O(n^2) time. As a consequence, PlanE is also able to distinguish all outerplanar graphs. However, this comes at the cost of (1) runtime, (2) flexibility, and, counter-intuitively, (3) expressivity. (1) PlanE requires a quadratic time pre-processing, whereas our proposed CAT+MPNN runs in linear time. (2) PlanE is a GNN architecture while CAT is a graph transformation. This means that CAT can be combined with any GNN, while PlanE requires specialized GNN layers which are not easily combined with other architectures. (3) Without changes to PlanE's pre-processing, PlanE is incapable of operating on graphs with non-planar components, whereas CAT still achieves at least 1-WL expressivity on such graphs. This leads to the counter-intuitive result that vanilla PlanE is incomparable in expressivity to CAT (see Prop. 2 and 3 in the Appendix). Finally, while there exist many GNNs that are provably more expressive than 1-WL, little is known about the precise class of graphs for which such GNNs are provably maximally expressive.
Furthermore, proving an upper bound on the expressivity of an architecture is considered difficult and requires significant effort, as demonstrated by Zhang et al. (2023a). In contrast, we identify outerplanar graphs as a large practical graph family that our proposed method CAT can distinguish.

3 Preliminaries

A graph G = (V, E, µ, ν) consists of a set of nodes V, a set of edges E ⊆ V × V, and attributes (also called features) for the nodes and edges, µ: V → X and ν: E → X, respectively, where X is a set of arbitrary attributes. We refer to an edge from u to v by uv, and in the case of undirected graphs uv = vu. The in-neighbors of a node u ∈ V are denoted by N_in(u) = {v | vu ∈ E}. The out-neighbors of a node u ∈ V are denoted by N_out(u) = {v | uv ∈ E}, and in the case of undirected graphs, N_in = N_out. In this paper, the input graphs are undirected and are transformed into directed ones. A graph G′ = (V′, E′, µ′, ν′) is a subgraph of a graph G, denoted by G′ ⊆ G, iff V′ ⊆ V, E′ ⊆ E, ∀v ∈ V′: µ′(v) = µ(v), and ∀e ∈ E′: ν′(e) = ν(e). A (directed) cycle (v_1, . . . , v_k) is a sequence of k ≥ 3 distinct nodes with ∀i ∈ {1, . . . , k−1}: v_i v_{i+1} ∈ E and v_k v_1 ∈ E. A graph is acyclic if it does not contain a cycle. Given a graph G, we denote the shortest path distance between two nodes u and v by d_G(u, v), or d(u, v) if G is clear from the context. We denote the diameter of a graph G by Φ(G) = max_{u,v ∈ V(G)} d(u, v). A graph is outerplanar if it can be drawn in the plane without edge crossings and with all nodes belonging to the exterior face (see Appendix A for details). We call an undirected graph with at least three vertices biconnected if the removal of any single node does not disconnect the graph. A biconnected component is a maximal biconnected subgraph. We refer to the outerplanar biconnected components of a graph as blocks. Two graphs G and H are isomorphic if there exists a bijection ψ: V(G) → V(H) such that ∀v ∈ V(G): µ(v) = µ(ψ(v)), ∀u, v ∈ V(G): uv ∈ E(G) ⇔ ψ(u)ψ(v) ∈ E(H), and ∀uv ∈ E(G): ν(uv) = ν(ψ(u)ψ(v)).
We call ψ an isomorphism between G and H. An in-tree T is a directed, acyclic graph with a distinct root having no outgoing edges, all other nodes having exactly one outgoing edge, and for every node exactly one path from it to the root.

Figure 3: Biconnected outerplanar graph G, CAT∘(G), and unfolding tree of node b.

Weisfeiler-Leman. The 1-dimensional Weisfeiler-Leman algorithm (WL) assigns colors (usually represented by numbers) to nodes. The color of a node v ∈ V(G) is updated iteratively according to c_{i+1}(v) = h(c_i(v), {{(ν(uv), c_i(u)) | u ∈ N_in(v)}}), where h is an injective function and c_0 = µ. Note that this variant of WL makes use of edge features and works on directed graphs. While traditionally WL is defined for undirected and unlabeled graphs, this variant is common in similar lines of work. The unfolding tree with height i of a node v ∈ V(G) is defined as the in-tree F_i^v = (v, {{F_{i−1}^u | u ∈ N_in(v)}}), where F_0^v = ({v}, ∅). The unfolding trees F_i^v and F_i^w of two nodes v and w are isomorphic if and only if the colors of the nodes in iteration i are equal, see, e.g., Kriege (2022). The Weisfeiler-Leman algorithm has historically been used as a heuristic for graph isomorphism. Let WL(G) = {{c_∞(v) | v ∈ V(G)}} be the multiset of node colors in the stable coloring (Arvind et al., 2015). Two graphs G and H are not isomorphic if WL(G) ≠ WL(H). However, non-isomorphic graphs G and H with WL(G) = WL(H) exist. WL, for example, cannot distinguish the molecular graphs in Figure 1, or a 6-cycle from two triangles.

Expressivity. Let 𝒢 denote the set of all graphs and 𝒢_n = {G ∈ 𝒢 | |V(G)| ≤ n} for all n ∈ N. Let ϕ, ψ be two graph embedding algorithms, which map graphs to embedding spaces (e.g., R^d). We say ϕ is at least as expressive as ψ if for all pairs of graphs G, G′ with ψ(G) ≠ ψ(G′) it holds that ϕ(G) ≠ ϕ(G′). Let 𝒢′ ⊆ 𝒢 be a family of graphs, for example, the set of all outerplanar graphs.
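The WL variant above, with edge features and directed edges, can be sketched as follows. This is a minimal illustration under our own encoding choices: an injective re-labeling dictionary plays the role of the function h, and attributes are plain Python values.

```python
def wl_directed(nodes, edges, node_attr, edge_attr, iterations):
    """1-WL on a directed graph with node and edge features, following
    c_{i+1}(v) = h(c_i(v), {{(nu(uv), c_i(u)) | u in N_in(v)}})."""
    memo = {}  # injective re-labeling h: signature -> fresh integer

    def h(sig):
        return memo.setdefault(sig, len(memo))

    in_nbrs = {v: [] for v in nodes}
    for u, v in edges:
        in_nbrs[v].append(u)
    colors = {v: h(("init", node_attr[v])) for v in nodes}  # c_0 = mu
    for _ in range(iterations):
        colors = {v: h((colors[v],
                        tuple(sorted((edge_attr[(u, v)], colors[u])
                                     for u in in_nbrs[v]))))
                  for v in nodes}
    return colors

# Two directed triangles that differ only in one edge feature:
nodes = [0, 1, 2]
arcs = [(0, 1), (1, 2), (2, 0)]
same = {a: 1 for a in arcs}
diff = {(0, 1): 1, (1, 2): 1, (2, 0): 2}
c_same = wl_directed(nodes, arcs, {v: 0 for v in nodes}, same, 2)
c_diff = wl_directed(nodes, arcs, {v: 0 for v in nodes}, diff, 2)
assert len(set(c_same.values())) == 1  # uniform edge features: no refinement
assert len(set(c_diff.values())) == 3  # edge features break the symmetry
```

The usage example shows why edge features matter for this variant: without them, all nodes of a directed cycle stay indistinguishable.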
We say that a graph embedding algorithm ϕ is maximally expressive for 𝒢′ if for every non-isomorphic pair of graphs G, G′ ∈ 𝒢′ it holds that ϕ(G) ≠ ϕ(G′). We can generalize this to parameterized graph embeddings such as GNNs: Let ϕ_w be a graph embedding with parameters w (e.g., the weights of a neural network). A parameterized graph embedding ϕ_w is maximally expressive for 𝒢′ if for all n ∈ N there exists a parameter choice w_n such that for every non-isomorphic pair of graphs G, G′ ∈ 𝒢′ ∩ 𝒢_n it holds that ϕ_{w_n}(G) ≠ ϕ_{w_n}(G′).

Hamiltonian adjacency lists. A Hamiltonian cycle of a graph is a cycle containing each node exactly once. A Hamiltonian cycle (v_1, . . . , v_k) of an undirected graph can be separated into two directed Hamiltonian cycles: C = (v_1, . . . , v_k) with corresponding edges v_1v_2, . . . , v_kv_1 and C′ = (v_k, . . . , v_1) with corresponding edges v_kv_{k−1}, . . . , v_1v_k. Biconnected outerplanar graphs have a unique Hamiltonian cycle that can be found in linear time (Mitchell, 1979). Hamiltonian adjacency lists (HALs) are derived by annotating each node with the sorted distances d_C to all its neighbors on the two directed variants of the Hamiltonian cycle C. Figure 2 shows a graph annotated with its HALs in both directions of the Hamiltonian cycle. Following the Hamiltonian cycle in one direction and concatenating the HALs gives a HAL sequence S (and a reverse sequence R for the other direction). A sequence S of length n is a cyclic shift of another sequence S′ of length n if there exists an ℓ ∈ N such that S_i = S′_j for all i ∈ {1, . . . , n}, where j = (i + ℓ) mod n (indices understood cyclically). The HAL sequence uniquely identifies a biconnected outerplanar graph (if both directions and cyclic shifts are considered):

Lemma 1 (Colbourn & Booth 1981). Two biconnected outerplanar graphs G and H with HAL sequences S_G, S_H and reverse sequences R_G, R_H are isomorphic iff S_G is a cyclic shift of S_H or R_H.
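As an illustration of HALs and Lemma 1, the following sketch (ours, not the paper's code) computes the HAL sequence for one direction of a given Hamiltonian cycle and tests for cyclic shifts; it assumes the Hamiltonian cycle is already available as a node ordering:

```python
def hals(cycle, edges):
    """Hamiltonian adjacency lists: for each node (in cycle order), the sorted
    distances along the directed Hamiltonian cycle to all of its neighbors."""
    n = len(cycle)
    pos = {v: i for i, v in enumerate(cycle)}
    nbrs = {v: set() for v in cycle}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return [sorted((pos[w] - pos[v]) % n for w in nbrs[v]) for v in cycle]

def is_cyclic_shift(s, t):
    """True if sequence s is a cyclic shift of sequence t."""
    if len(s) != len(t):
        return False
    return any(s == t[k:] + t[:k] for k in range(len(t)))

# 4-cycle 0-1-2-3 with the chord (0, 2); the Hamiltonian cycle is (0, 1, 2, 3).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
S = hals([0, 1, 2, 3], edges)  # forward direction
R = hals([3, 2, 1, 0], edges)  # reverse direction
assert S == [[1, 2, 3], [1, 3], [1, 2, 3], [1, 3]]
# Rotating the start node of the cycle yields a cyclic shift of S (Lemma 1).
assert is_cyclic_shift(hals([1, 2, 3, 0], edges), S)
# This particular graph is isomorphic to its mirror image:
assert is_cyclic_shift(S, R)
```

Note that each HAL starts with 1 (the next node on the cycle) and ends with n − 1 (the previous node), matching the observation used in the proof of Theorem 1 below.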
Table 1: Common benchmark datasets and percentage of outerplanar graphs in them.

| Dataset | #Graphs | Outerplanar |
|---|---|---|
| ZINC | 12000 | 98 % |
| PCQM-Contact | 529434 | 98 % |
| MOLESOL | 1128 | 97 % |
| MOLTOXCAST | 8576 | 96 % |
| MOLTOX21 | 7831 | 96 % |
| MOLLIPO | 4200 | 96 % |
| MOLCLINTOX | 1477 | 94 % |
| NCI-2000 | 250251 | 94 % |
| peptides-func | 15535 | 93 % |
| MOLBACE | 1513 | 93 % |
| MOLSIDER | 1427 | 92 % |
| MOLBBBP | 2039 | 92 % |
| MOLHIV | 41127 | 92 % |

Table 2: Pre-processing time of CAT on the training splits of all datasets and relative additional training/evaluation time with CAT.

| Dataset | CAT Runtime | Train+Eval. Time Change |
|---|---|---|
| MOLESOL | 2 ± 1 s | +26 % |
| MOLBBBP | 5 ± 1 s | +36 % |
| MOLSIDER | 6 ± 1 s | +21 % |
| MOLBACE | 6 ± 1 s | +42 % |
| MOLLIPO | 14 ± 1 s | +38 % |
| MOLTOX21 | 15 ± 1 s | +27 % |
| MOLTOXCAST | 16 ± 1 s | +13 % |
| ZINC | 44 ± 1 s | +27 % |
| MOLHIV | 152 ± 1 s | +31 % |

4 Identifying Outerplanar Graphs Using Weisfeiler-Leman

We develop a graph transformation called cyclic adjacency transform (CAT) that enables WL to distinguish all outerplanar graphs. We first introduce CAT∘, enabling WL to distinguish any pair of non-isomorphic biconnected outerplanar graphs, and then extend it to all outerplanar graphs. In CAT∘ (see Section 4.1), nodes are duplicated to represent the Hamiltonian cycle in both directions. We annotate edges not in the Hamiltonian cycle with the distance their endpoints have on the Hamiltonian cycle. This allows the Weisfeiler-Leman algorithm to encode HAL sequences in the unfolding trees of the nodes and, in turn, to distinguish pairs of non-isomorphic biconnected outerplanar graphs. To extend our transformation to all outerplanar graphs (see Section 4.2), we need to ensure that the biconnected components keep their unique encoding. It should also be ensured that non-isomorphic graphs with the same biconnected components (but arranged or rotated differently) can be distinguished. We address this by introducing articulation and block pooling vertices.
These nodes contain all the information about their respective blocks and their position in these blocks. Together with the nodes outside of blocks, this forms a forest which encodes the entire original graph. As it is known that WL is maximally expressive on forests, it follows that WL is maximally expressive on such graphs.

4.1 Identifying Biconnected Outerplanar Graphs Using Weisfeiler-Leman

We first present a graph transformation called CAT∘ that allows the Weisfeiler-Leman algorithm to distinguish any two non-isomorphic biconnected outerplanar graphs. Figure 3 shows an example of CAT∘. Note that CAT∘(G) consists of two disjoint copies of G, with directed and annotated edges. We first describe the transformation informally for ease of understanding, followed by the formal definition. We start with an empty graph and add the (unique) Hamiltonian cycle of the original graph twice, directed in opposite ways (steps 1 and 2). Then, we add the remaining edges of the original graph to both cycles, directed in both directions each (step 3). Finally, we set node and edge features to the original features, and extend edge features with the distance of their endpoints in the Hamiltonian cycle (step 4).

Definition 1. The transformation CAT∘(G) = G′ maps a biconnected outerplanar graph G = (V, E, µ, ν) to a new graph G′ = (V′, E′, µ′, ν′) as follows:

1. Let (v_1, . . . , v_n) be the Hamiltonian cycle of G and let C and C′ be its directed versions.

2. Add node and edge disjoint copies of C and C′ together with the corresponding edges to an empty graph G′, and ∀e ∈ E′ set ν′(e) = (1, ν(e)).

Figure 4: One part of the CAT∘ transformation of the graph from Figure 3 and an example unfolding tree of one of its nodes, from which the HAL sequence of the original graph can be reconstructed.

3. Let D ⊆ E be the edges of G not on the (undirected) Hamiltonian cycle.
Add edges in both directions to G′ for the copies of C and C′ for each edge in D: E′ = E′ ∪ E_d ∪ E′_d with E_d = ⋃_{v_iv_j ∈ D} {v′_i v′_j, v′_j v′_i} and E′_d = ⋃_{v_iv_j ∈ D} {v″_i v″_j, v″_j v″_i} for copies v′_i of v_i in C (resp. v″_i in C′).

4. ∀v′_i, v″_i ∈ V′ set µ′(v′_i) = µ′(v″_i) = µ(v_i). ∀v′_i v′_j ∈ E_d set ν′(v′_i v′_j) = (d_C(v_j, v_i), ν(v_iv_j)) and ∀v″_i v″_j ∈ E′_d set ν′(v″_i v″_j) = (d_{C′}(v_j, v_i), ν(v_jv_i)).

Using CAT∘ we prove our first main result.

Theorem 1. Two biconnected outerplanar graphs G and H are isomorphic if and only if WL(CAT∘(G)) = WL(CAT∘(H)).

Proof. Two graphs are distinguished by WL if and only if the multisets of node colors of their stable colorings differ. Trivially, |V(G)| ≠ |V(H)| implies |V(CAT∘(G))| ≠ |V(CAT∘(H))| and thus WL(CAT∘(G)) ≠ WL(CAT∘(H)), so we only focus on graphs with |V(G)| = |V(H)|. Two nodes only get the same color if their unfolding trees are isomorphic. The first number in the HAL of each node is always 1, so it can be ignored, and the last number is always |V(G)| − 1, so it can simply be reconstructed from |V(CAT∘(G))|. The rest of the HAL sequence and the node labels of G can be reconstructed from the unfolding tree of any node in CAT∘(G): Trivially, each node has two direct neighbors in the Hamiltonian cycle. In the unfolding tree these are the parent and the single child with the 1-annotated edge. All other neighbors in the HAL can be reconstructed by looking at the weights of the edges that do not have weight 1. Figure 4 shows an example. Given any two biconnected outerplanar graphs with n nodes, the Weisfeiler-Leman algorithm will thus be able to distinguish them after at most n iterations iff they are non-isomorphic: since the HAL sequence is encoded in the unfolding trees from all starting points (cyclic shift) and in both directions (reverse direction), this decides isomorphism by Lemma 1.
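Definition 1 can be sketched concretely as follows. The encoding choices are ours (output nodes are pairs (copy, v) with copy 0 for C and copy 1 for the reversed cycle C′, features are plain Python values); it is an illustration of the construction, not the authors' implementation.

```python
def cat_block(cycle, edges, mu, nu):
    """Sketch of Definition 1: CAT applied to one biconnected outerplanar
    block, given its Hamiltonian cycle as a node ordering.
    Returns (nodes, node features, directed edges with features)."""
    n = len(cycle)
    pos = {v: i for i, v in enumerate(cycle)}
    cyc = {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}
    chords = [e for e in edges if frozenset(e) not in cyc]  # the set D

    nodes, mu2, ed = [], {}, {}
    for copy in (0, 1):
        # Steps 1+2: one directed copy of the Hamiltonian cycle per direction,
        # every cycle edge annotated with distance 1.
        order = cycle if copy == 0 else cycle[::-1]
        for v in cycle:
            nodes.append((copy, v))
            mu2[(copy, v)] = mu[v]
        for i in range(n):
            u, w = order[i], order[(i + 1) % n]
            ed[((copy, u), (copy, w))] = (1, nu[frozenset((u, w))])

        # Steps 3+4: chords in both directions, each annotated with the
        # distance from the edge's head back to its tail along this cycle.
        def d(a, b):  # distance from a to b along the directed cycle
            return (pos[b] - pos[a]) % n if copy == 0 else (pos[a] - pos[b]) % n

        for u, w in chords:
            ed[((copy, u), (copy, w))] = (d(w, u), nu[frozenset((u, w))])
            ed[((copy, w), (copy, u))] = (d(u, w), nu[frozenset((u, w))])
    return nodes, mu2, ed

# Square with one chord (the graph from the HAL example above).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
mu = {v: "C" for v in range(4)}
nu = {frozenset(e): "bond" for e in edges}
nodes, feats, dir_edges = cat_block([0, 1, 2, 3], edges, mu, nu)
assert len(nodes) == 2 * 4                 # two copies of every node
assert len(dir_edges) == 2 * 4 + 4 * 1     # 2n cycle edges + 4|D| chord edges
assert dir_edges[((0, 0), (0, 2))] == (2, "bond")  # chord annotated with d_C
```

For this 4-node block, both directed chord copies carry distance 2, and the total size of the transformed block is linear in the size of the original, as claimed.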
4.2 Extending CAT to All Outerplanar Graphs

While CAT∘ is defined for single blocks (biconnected outerplanar components), an outerplanar graph can contain multiple of them, as well as additional connections. We define the CAT transformation by applying CAT∘ to the blocks of the graph and adding additional nodes and edges, which enable WL to distinguish any two non-isomorphic outerplanar graphs. We first give an informal description to provide an intuition for the various steps before stating the formal definition. Starting from an empty graph, add the graph induced by all edges not in blocks and all nodes in more than one block (steps 1 and 2). For each block, add the result of CAT∘ applied to the block (step 3.1). We refer to nodes created in this step as Hamiltonian cycle nodes. Add pooling nodes connecting the two Hamiltonian cycle nodes corresponding to the same original node (step 3.2). Add a node for each block (block nodes) and connect it to the pooling nodes of that block (step 3.3). Relabel the nodes from step 2 that belong to at least one block (articulation nodes) and connect them to the pooling nodes corresponding to the same original node (step 3.6). Finally, create a global (block) pooling node and connect it to the block nodes (step 4). For each node we add, its initial features (if present) are extended by a label referring to its type (Hamiltonian cycle node, pooling node, articulation node, etc.).

Figure 5: A graph and its CAT transformation. Original node labels are represented by letters, edge and node labels from the CAT transform are represented by colors. Note that CAT(G) looks like a cat.

Definition 2. The transformation CAT(G) = G′ maps a graph G to a new graph G′ as follows:

1. Let B_1, . . . , B_ℓ be the blocks of G and let F be the graph induced by the edges of G that are not in any block, plus the nodes that are present in more than one block.
Let {ℓ_tree, ℓ_pool, ℓ_block, ℓ_art, ℓ_glob} be distinct node labels not in X (shown as colors in Figure 5).

2. Add F to G′ with labels µ′(v) = (ℓ_tree, µ(v)) for all v ∈ F.

3. For each block B_i of G:

3.1. Let B′_i, B″_i be the two connected components of CAT∘(B_i). Add B′_i and B″_i to G′.

3.2. For all pairs (v′, v″) of corresponding nodes in B′_i and B″_i, add a node p_v with µ′(p_v) = (ℓ_pool, µ(v)) and edges p_v v′, v′ p_v, p_v v″, v″ p_v to G′.

3.3. Add a node b_i to G′ with µ′(b_i) = ℓ_block. For all v ∈ V(B_i) add edges b_i p_v and p_v b_i.

3.4. Let A_i = V(B_i) ∩ V(F) be the nodes of B_i in F.

3.5. Let γ_i : A_i → V(B′_i) map nodes of F to their copy in B′_i.

3.6. For each a ∈ A_i, let µ′(a) = (ℓ_art, µ(a)) and add edges p_{γ_i(a)} a and a p_{γ_i(a)} to G′.

4. Add a node g with µ′(g) = ℓ_glob to G′ and, for all block nodes b_i, add edges g b_i and b_i g to G′.

5. Let CAT(G) = G′.

An example of the CAT transformation can be seen in Figure 5. Appendix F contains additional visualizations of the transformation on real-life molecular graphs. Informally, a graph is transformed using CAT by uniquely encoding all its blocks (using CAT∘) and then connecting the results so as to also encode their position within the graph and their orientation, using the original graph structure. All nodes and edges that are not part of a block are of course also added to the transformed graph. We can now prove our main result, which implies that MPNNs together with CAT are maximally expressive on outerplanar graphs.

Theorem 2. Two outerplanar graphs G and H are isomorphic, if and only if WL(CAT(G)) = WL(CAT(H)).

Proof. Following Theorem 1, each block will be uniquely identified by WL. Since the additional nodes have distinct labels, they will not cause WL to falsely report two blocks as isomorphic when they are not. The information about the entire HAL sequence of each block is stored in the block nodes b_i. The pooling nodes p_v connect the blocks and block nodes to the rest of the graph (through the articulation nodes a), determining the orientation of the block. Note that the graph returned by CAT without the CAT∘ blocks and the global pooling node g is a tree. Relying on the labels of the pooling and block nodes, we can reconstruct the original graph from this tree. As WL can distinguish non-isomorphic labeled trees (Arvind et al., 2015; Kiefer, 2020), it can thus distinguish non-isomorphic outerplanar graphs using CAT. For the other direction, note that CAT is permutation-invariant: for two isomorphic graphs G and H, the graphs CAT(G) and CAT(H) are isomorphic and WL will give the same coloring for both.

Importantly, we can compute CAT(G) in linear time. The computational complexity is dominated by the computation of the blocks (Tarjan, 1972) and their Hamiltonian cycles (Mitchell, 1979), which both require linear time. Note that we only add a linear number of nodes and edges. From Morris et al. (2019) and Xu et al. (2019) it follows that MPNNs that are as expressive as 1-WL can distinguish CAT(G) and CAT(H) for non-isomorphic outerplanar graphs G and H.

Table 3: Resistance and diameter before and after the CAT transformation. ρ(G) and max ρ(G) denote the average and maximum pair-wise effective resistance for graph G. Results are reported as mean and standard deviation across all graphs in the datasets. In all cases, smaller is better.

| Dataset | Φ(G) | Φ(CAT(G)) | ρ(G) | ρ(CAT(G)) | max ρ(G) | max ρ(CAT(G)) |
|---|---|---|---|---|---|---|
| ZINC | 12.5 ± 2.6 | 9.9 ± 1.6 | 4.0 ± 0.7 | 2.6 ± 0.4 | 10.0 ± 2.0 | 7.7 ± 1.9 |
| MOLESOL | 6.6 ± 3.3 | 6.9 ± 3.8 | 2.3 ± 1.0 | 2.1 ± 0.9 | 5.5 ± 2.3 | 6.0 ± 2.8 |
| MOLTOXCAST | 8.5 ± 4.7 | 8.4 ± 4.0 | 3.0 ± 1.5 | 2.6 ± 1.3 | 7.2 ± 4.3 | 7.4 ± 3.9 |
| MOLTOX21 | 8.8 ± 4.6 | 8.7 ± 4.0 | 3.1 ± 1.5 | 2.7 ± 1.3 | 7.5 ± 4.2 | 7.7 ± 3.9 |
| MOLLIPO | 13.8 ± 4.0 | 9.9 ± 2.1 | 4.3 ± 1.2 | 2.6 ± 0.5 | 10.7 ± 3.4 | 7.9 ± 2.3 |
| MOLBACE | 15.1 ± 3.2 | 11.5 ± 2.8 | 5.0 ± 1.3 | 2.9 ± 0.7 | 12.5 ± 3.4 | 9.1 ± 2.6 |
| MOLSIDER | 12.6 ± 11.8 | 10.4 ± 7.3 | 4.1 ± 3.8 | 2.9 ± 2.2 | 10.4 ± 11.0 | 8.9 ± 6.8 |
| MOLBBBP | 10.7 ± 3.7 | 9.1 ± 2.6 | 3.4 ± 1.1 | 2.4 ± 0.6 | 8.3 ± 3.8 | 7.5 ± 2.5 |
| MOLHIV | 11.9 ± 5.2 | 9.9 ± 3.8 | 3.9 ± 1.7 | 2.7 ± 1.2 | 9.3 ± 4.7 | 8.2 ± 3.8 |
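The linear-time Hamiltonian cycle computation cited above (Mitchell, 1979) can be sketched by repeatedly eliminating degree-2 vertices. The code below is our own simplified reconstruction, not the published algorithm verbatim, and it assumes the input is biconnected outerplanar with at least three nodes:

```python
def hamiltonian_cycle(nodes, edges):
    """Unique Hamiltonian cycle of a biconnected outerplanar graph, via
    repeated elimination of degree-2 vertices (a simplified Mitchell-style
    scheme). Returns the cycle as a list of nodes."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    stack = []  # (removed vertex, its two neighbors), in removal order
    queue = [v for v in nodes if len(adj[v]) == 2]
    alive = set(nodes)
    while len(alive) > 3:
        v = queue.pop()
        if v not in alive or len(adj[v]) != 2:
            continue  # stale queue entry
        u, w = adj[v]
        stack.append((v, u, w))
        alive.remove(v)
        adj[u].discard(v)
        adj[w].discard(v)
        if w in adj[u]:
            # shortcut edge already present: u and w each lost a neighbor
            for x in (u, w):
                if len(adj[x]) == 2:
                    queue.append(x)
        else:
            adj[u].add(w)
            adj[w].add(u)
    cycle = list(alive)  # the three remaining vertices form a triangle
    # Re-insert removed vertices in reverse order, each between its neighbors,
    # which are adjacent on the current cycle by outerplanarity.
    for v, u, w in reversed(stack):
        i, j = cycle.index(u), cycle.index(w)
        if (i + 1) % len(cycle) == j:
            cycle.insert(j, v)  # u directly precedes w
        else:
            cycle.insert(i, v)  # w directly precedes u
    return cycle

# Square with one chord: the unique Hamiltonian cycle is the outer square.
cyc = hamiltonian_cycle([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
square = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
assert sorted(cyc) == [0, 1, 2, 3]
assert {frozenset((cyc[i], cyc[(i + 1) % 4])) for i in range(4)} == square
```

Since every outerplanar graph has at most 2n − 3 edges, each vertex removal is constant-time amortized, which is what makes the overall pre-processing linear.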
Note that our proof used an important property of the WL algorithm: adding uniquely labeled nodes and edges to WL-distinguishable graphs never makes them WL-indistinguishable. We use this property to add a global pooling node in step 4 of CAT, which is connected to all block nodes. This allows messages to be passed between block nodes in fewer iterations in the subsequent MPNN step. CAT can also be applied to non-outerplanar graphs. In this case, our graph transformation performs the steps described in Definition 2. However, if a non-outerplanar biconnected component B_i is encountered, only one copy B′_i is created in CAT(G) and its vertices are connected to the corresponding pooling nodes. An example of this is depicted in Appendix D. While this never reduces expressivity, it is also not guaranteed to improve expressivity on non-outerplanar graphs. Note that it can be determined in linear time whether a block is outerplanar while trying to compute its Hamiltonian cycle (Mitchell, 1979). Hence, the CAT transformation always requires only linear time.

4.3 Influence of CAT on Graph Connectivity

As poor graph connectivity may negatively influence the predictive performance of GNNs (Alon & Yahav, 2021), we investigate the effects of CAT on different measures of graph connectivity, such as the diameter Φ(G) of a graph. We refer to the shortest path distance between two nodes as the distance between them.

Observation 1. For a block B of a graph G, it holds that Φ(CAT(B)) ≤ 4.

Proof. Let u, v ∈ V(CAT(B)). By definition, all nodes in CAT(B) are either from a Hamiltonian cycle created by CAT∘, a pooling node, or a block node. If both nodes are from a Hamiltonian cycle, then there is a path u, p_u, b, p_v, v between them, where p_u, p_v are pooling nodes and b is the block node. Hence, d_{CAT(G)}(u, v) ≤ 4. If u or v is a pooling or a block node, then the above path implies that d_{CAT(G)}(u, v) < 4.
Table 4: Predictive performance of MPNNs with and without CAT on different molecular benchmark datasets. Arrows indicate whether smaller (↓) or bigger (↑) results are better. Bold entries are an MPNN with CAT that outperforms the same MPNN without CAT.

| Model | ZINC MAE ↓ | ZINC250k MAE ↓ | MOLHIV ROC-AUC ↑ | MOLBACE ROC-AUC ↑ | MOLBBBP ROC-AUC ↑ |
|---|---|---|---|---|---|
| GIN | 0.168 ± 0.007 | 0.033 ± 0.003 | 77.9 ± 1.0 | 74.6 ± 3.2 | 66.0 ± 2.1 |
| CAT+GIN | **0.101 ± 0.004** | 0.034 ± 0.003 | 76.7 ± 1.8 | **79.5 ± 2.5** | **67.2 ± 1.8** |
| GCN | 0.184 ± 0.013 | 0.067 ± 0.005 | 76.7 ± 1.4 | 77.9 ± 1.7 | 66.1 ± 2.4 |
| CAT+GCN | **0.123 ± 0.008** | **0.034 ± 0.003** | **77.1 ± 1.6** | **79.2 ± 1.5** | **68.3 ± 1.7** |
| GAT | 0.375 ± 0.013 | 0.103 ± 0.004 | 76.6 ± 2.0 | 81.7 ± 2.3 | 66.2 ± 1.4 |
| CAT+GAT | **0.201 ± 0.022** | **0.046 ± 0.004** | 75.3 ± 1.6 | 79.3 ± 1.6 | 66.0 ± 1.9 |

| Model | MOLSIDER ROC-AUC ↑ | MOLESOL RMSE ↓ | MOLTOXCAST ROC-AUC ↑ | MOLLIPO RMSE ↓ | MOLTOX21 ROC-AUC ↑ |
|---|---|---|---|---|---|
| GIN | 56.6 ± 1.0 | 1.105 ± 0.077 | 65.3 ± 0.6 | 0.717 ± 0.016 | 75.8 ± 0.7 |
| CAT+GIN | **58.2 ± 0.9** | **0.985 ± 0.055** | **65.6 ± 0.5** | 0.798 ± 0.031 | 74.8 ± 1.0 |
| GCN | 56.7 ± 1.5 | 1.053 ± 0.087 | 64.4 ± 0.4 | 0.748 ± 0.018 | 76.4 ± 0.3 |
| CAT+GCN | **57.9 ± 1.8** | **1.006 ± 0.036** | **66.2 ± 0.8** | 0.771 ± 0.023 | 74.9 ± 0.8 |
| GAT | 58.4 ± 1.0 | 1.037 ± 0.063 | 63.8 ± 0.8 | 0.728 ± 0.024 | 76.3 ± 0.6 |
| CAT+GAT | 58.3 ± 1.3 | 1.09 ± 0.048 | **64.5 ± 0.8** | 0.754 ± 0.021 | 75.4 ± 0.7 |

Observation 2. Let B_i and B_j be two blocks of a graph G. In CAT(G), the maximum distance between any node in CAT(B_i) and any node in CAT(B_j) is 6.

Proof. Let u ∈ V(CAT(B_i)) and v ∈ V(CAT(B_j)). If B_i = B_j, then Observation 1 implies d_{CAT(G)}(u, v) ≤ 4. If B_i ≠ B_j, then there exists a path u, p_u, b_i, g, b_j, p_v, v, where p_u, p_v are pooling nodes, b_i, b_j are the block nodes for the blocks B_i, B_j, and g is the global block pooling node. Thus, d_{CAT(G)}(u, v) ≤ 6.

Observations 1 and 2 show that for some subgraphs CAT provides constant upper bounds on the diameter. On the graph level, CAT can only lead to a small increase of the diameter.

Proposition 1. For an outerplanar graph G, Φ(CAT(G)) ≤ Φ(G) + 7.

Proof sketch.
To prove the proposition, we bound the distance between any pair of nodes in CAT(G) by a case analysis based on the type of the nodes. We defer the full proof to Appendix B. In most practical cases, the short-cutting inside and between blocks leads to CAT reducing the graph diameter (see Observations 1 and 2). In Table 3, we demonstrate this on molecular benchmark datasets. Besides the diameter, another useful graph connectivity measure is the effective resistance. The notion of effective resistance originates in electrical engineering (Kirchhoff, 1847) and has implications for several graph properties. For example, the effective resistance between two nodes is proportional to the commute time between them (Chandra et al., 1989). Intuitively, a large effective resistance between two nodes suggests that information propagation between the nodes is hindered. Recently, effective resistance has in fact been linked to over-squashing (Black et al., 2023) in GNNs, a negative effect that leads to long-range interactions having little impact on the predictions of a GNN. Effective resistance as introduced by Kirchhoff (1847) is naturally only defined for undirected graphs. As CAT produces directed graphs, we therefore use an extension of effective resistance introduced by Young et al. (2015) that is applicable to directed graphs. We refer to Young et al. (2015) for more details. In Table 3 we demonstrate that CAT reduces the pair-wise effective resistance on molecular benchmark datasets.

5 Experimental Evaluation

We investigate whether our proposed method CAT can improve the predictive performance of MPNNs on molecular benchmark datasets.1 We utilize three commonly used MPNNs: GIN (Xu et al., 2019), GCN (Kipf & Welling, 2017), and GAT-v2 (Veličković et al., 2018; Brody et al., 2022).
We train on the commonly used ZINC (Gómez-Bombarelli et al., 2018; Sterling & Irwin, 2015) and MOLHIV (Hu et al., 2020) datasets, which contain graphs representing molecules. We supplement these with 7 small datasets (see Table 4) from the OGB collection (Hu et al., 2020). In addition to the ZINC dataset with 12k graphs, we also use the larger ZINC250k variant with 250k graphs. In total, we train three MPNNs on ten datasets, with and without CAT. For each configuration, we separately tune hyperparameters on the validation set, train a model with the best hyperparameters ten times on the training set, and evaluate it on the test set. The only exception is ZINC250k, where we evaluate the final hyperparameter configuration five times due to the large dataset size. For each dataset we report the mean and standard deviation of the most common evaluation metric on the test set in the epoch with the best validation performance. For ZINC we use a batch size of 128 and an initial learning rate of 10⁻³ that is halved if the validation metric does not improve for 20 epochs. Training stops after 500 epochs or if the learning rate dips below 10⁻⁵. For all other datasets we train with a fixed learning rate for 100 epochs and a batch size of 128. We use the same hyperparameter grid for all models and provide more details in Appendix E. Besides measuring the predictive performance, we also measure the time needed to apply CAT (averaged over ten runs), as well as the training and evaluation time for GIN and GIN+CAT with the same hyperparameters on all datasets (averaged over five runs). Finally, we report the values for the diameters and effective resistances as described in Section 4.3.

Results. Table 2 shows the pre-processing time of CAT. Note that this is the performance of running CAT on only a single CPU core. It is possible to achieve faster runtimes by simply parallelizing different graphs over different cores.
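Since the transformation is applied independently per graph, this parallelization is embarrassingly parallel. A minimal sketch with Python's multiprocessing; `apply_cat_stub` is a hypothetical stand-in, not the actual CAT implementation:

```python
from multiprocessing import Pool


def apply_cat_stub(graph):
    """Hypothetical stand-in for applying CAT to a single graph; the
    real transformation builds the Hamiltonian-cycle based encoding."""
    return {"n_nodes": len(graph)}


def preprocess(graphs, workers=2):
    # CAT is applied per graph, so the dataset can simply be
    # partitioned over CPU cores.
    with Pool(workers) as pool:
        return pool.map(apply_cat_stub, graphs)


if __name__ == "__main__":
    print(preprocess([[0, 1, 2], [0, 1]]))  # [{'n_nodes': 3}, {'n_nodes': 2}]
```

`Pool.map` preserves the input order, so the transformed dataset lines up with the original one.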
This negligible runtime allows applying the transformation even in realistic high-throughput screening applications (Krasoulis et al., 2022); for example, on MOLHIV it takes only 5 ms to process a graph. Training and prediction time on CAT-transformed graphs increases by 29% on average. Table 4 shows the predictive performance of all models. Note that our baseline models obtain strong results, often surpassing the performance of (higher-order) GNNs reported in the literature, and that we train each MPNN and MPNN+CAT with exactly the same sets of hyperparameters. Overall, CAT improves the predictive performance of GIN and GCN on the majority of datasets (6/10 and 8/10, respectively). For GIN and GCN, performance increases reliably on all datasets except MOLLIPO and MOLTOX21. Surprisingly, CAT does not work well with GAT and improves its performance on only 2/10 datasets. Most notably, on ZINC, CAT achieves very strong results, boosting the predictive performance of MPNNs by between 33% (GCN) and 46% (GAT). This is only surpassed by higher-order GNNs such as CW Networks (Bodnar et al., 2021a), which obtain an MAE of 0.079 ± 0.006 at the cost of potentially exponential pre-processing runtime due to enumerating cycles in the graph. Table 3 shows that CAT reduces both the graph diameter and the maximum pair-wise resistance on most datasets. Furthermore, CAT reduces the average pair-wise resistance on all datasets. This suggests that CAT is effective at improving graph connectivity in real-life molecular graphs.

6 Conclusion

We proposed the Cyclic Adjacency Transform (CAT), a linear time graph transformation that enables the Weisfeiler-Leman algorithm to be maximally expressive on outerplanar graphs. We rely on the fact that biconnected outerplanar graphs can be uniquely identified by their Hamiltonian adjacency list sequences, which CAT encodes in unfolding trees. It follows that a combination of MPNNs and CAT can distinguish all outerplanar graphs.
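For intuition, the 1-WL color refinement that CAT makes maximally expressive can be sketched as follows. This is a generic hash-based sketch on unlabeled graphs, not the paper's implementation; the example pair (a 6-cycle versus two triangles) is a classical case where plain 1-WL fails.

```python
def wl_colors(adj, rounds=3):
    """1-WL sketch: repeatedly refine each node's color by hashing it
    together with the sorted multiset of its neighbors' colors."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return sorted(colors.values())  # color histogram of the graph


c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}            # one 6-cycle
two_c3 = {i: [(i + 1) % 3 + 3 * (i // 3), (i + 2) % 3 + 3 * (i // 3)]
          for i in range(6)}                                       # two triangles
print(wl_colors(c6) == wl_colors(two_c3))  # True: 1-WL cannot tell them apart
```

Both graphs are 2-regular, so every node always receives the same color; a transformation such as CAT has to inject additional structure before refinement can separate such pairs.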
We achieved promising empirical results on standard molecular benchmark datasets, where CAT improved the performance of GIN and GCN in most cases, while for GAT we could not observe this benefit. Computing CAT takes linear time, and our implementation of CAT typically runs on the order of seconds. Motivated by the recent interest in the over-squashing phenomenon, we also studied the effect of CAT on graph connectivity. We proved that in the worst case CAT increases the diameter of outerplanar graphs by a small additive constant. However, inspecting CAT on real-world data, we find that the diameter decreases most of the time. Similarly, we observed that the maximum and average pair-wise effective resistance, which is associated with over-squashing, typically decreases after applying CAT. Our work highlights the value of GNNs designed for specific graph classes. This perspective aligns with a recent position paper, in which Morris et al. (2024) propose to investigate the runtime complexity of maximally expressive GNNs on practically relevant graphs (Challenge II.4). In general, achieving high expressivity is computationally expensive. However, for some graph classes this might not be the case. In this work, we have demonstrated that for outerplanar graphs maximal expressivity can be achieved in linear time. Focusing on specific graph classes allows making use of existing graph-theoretical results. This could lead to novel GNNs that are both runtime efficient and highly expressive. Possible applications are infrastructure and road networks, which are characterized by near-planar graphs that tend to have low graph degeneracy (Eppstein & Gupta, 2017). More generally, it seems promising to study sparse graphs, such as bounded expansion graphs (Nešetřil & De Mendez, 2012).

¹Our code can be found at https://github.com/ocatias/outerplanarGNN.
Finally, we plan to extend our work to k-outerplanar graphs in the future, which are known to capture even more molecular graphs (Horváth et al., 2010).

Acknowledgements

This work was supported in part by the Vienna Science and Technology Fund (WWTF) through projects [10.47379/VRG19009] and [10.47379/ICT22059]; by the TU Wien DK SecInt; and by the Austrian Science Fund (FWF) through project NanOX-ML (6728). MT acknowledges support from a DOC fellowship of the Austrian Academy of Sciences (ÖAW).

References

Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In ICLR, 2021.

Vikraman Arvind, Johannes Köbler, Gaurav Rattan, and Oleg Verbitsky. On the power of color refinement. In Fundamentals of Computation Theory, 2015.

Beatrice Bevilacqua, Fabrizio Frasca, Derek Lim, Balasubramaniam Srinivasan, Chen Cai, Gopinath Balamurugan, Michael Bronstein, and Haggai Maron. Equivariant subgraph aggregation networks. In ICLR, 2021.

Lukas Biewald. Experiment tracking with Weights and Biases, 2020. URL https://www.wandb.com/. Software available from wandb.com.

Mitchell Black, Zhengchao Wan, Amir Nayyeri, and Yusu Wang. Understanding oversquashing in GNNs through the lens of effective resistance. In ICML, 2023.

Hans L. Bodlaender. A partial k-arboretum of graphs with bounded treewidth. Theoretical Computer Science, 209(1–2):1–45, 1998.

Cristian Bodnar, Fabrizio Frasca, Nina Otter, Yu Guang Wang, Pietro Liò, Guido Montúfar, and Michael Bronstein. Weisfeiler and Lehman go cellular: CW networks. In NeurIPS, 2021a.

Cristian Bodnar, Fabrizio Frasca, Yu Guang Wang, Nina Otter, Guido Montufar, Pietro Lio, and Michael Bronstein. Weisfeiler and Lehman go topological: Message passing simplicial networks. In ICML, 2021b.

Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, and Michael M. Bronstein. Improving graph neural network expressivity via subgraph isomorphism counting. Transactions on Pattern Analysis and Machine Intelligence, 45(1):657–668, 2022.
Shaked Brody, Uri Alon, and Eran Yahav. How attentive are graph attention networks? In ICLR, 2022.

Ashok K. Chandra, Prabhakar Raghavan, Walter L. Ruzzo, and Roman Smolensky. The electrical resistance of a graph captures its commute and cover times. In STOC, 1989.

Charles J. Colbourn and Kellogg S. Booth. Linear time automorphism algorithms for trees, interval graphs, and planar graphs. SIAM Journal on Computing, 10(1):203–225, 1981.

Leonardo Cotta, Christopher Morris, and Bruno Ribeiro. Reconstruction for powerful graph representations. In NeurIPS, 2021.

Reinhard Diestel. Graph theory. Springer, 2024.

Radoslav Dimitrov, Zeyang Zhao, Ralph Abboud, and Ismail Ceylan. PlanE: Representation learning over planar graphs. In NeurIPS, 2023.

Andre Droschinsky, Nils Kriege, and Petra Mutzel. Finding largest common substructures of molecules in quadratic time. In International Conference on Current Trends in Theory and Practice of Informatics, 2017.

David Eppstein and Siddharth Gupta. Crossing patterns in nonplanar road networks. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2017.

István Fáry. On straight line representation of planar graphs. Acta Sci. Math. (Szeged), 11:229–233, 1948.

Stefan Felsner. Geometric graphs and arrangements: some chapters from combinatorial geometry. Springer, 2012.

Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.

Fabrizio Frasca, Beatrice Bevilacqua, Michael M. Bronstein, and Haggai Maron. Understanding and extending subgraph GNNs by rethinking their symmetries. In NeurIPS, 2022.

Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik.
Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 2018.

Tamás Horváth and Jan Ramon. Efficient frequent connected subgraph mining in graphs of bounded treewidth. Theoretical Computer Science, 411(31–33):2784–2797, 2010.

Tamás Horváth, Jan Ramon, and Stefan Wrobel. Frequent subgraph mining in outerplanar graphs. In KDD, 2006.

Tamás Horváth, Jan Ramon, and Stefan Wrobel. Frequent subgraph mining in outerplanar graphs. Data Mining and Knowledge Discovery, 21(3):472–508, 2010.

Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open Graph Benchmark: Datasets for machine learning on graphs. In NeurIPS, 2020.

Neil Immerman and Eric Lander. Describing graphs: A first-order approach to graph canonization. Yale University Press, 1990.

Sandra Kiefer. Power and limits of the Weisfeiler-Leman algorithm. PhD thesis, RWTH Aachen University, 2020.

Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.

Gustav Kirchhoff. Ueber die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wird. Annalen der Physik, 148(12):497–508, 1847.

Agamemnon Krasoulis, Nick Antonopoulos, Vassilis Pitsikalis, and Stavros Theodorakis. DENVIS: Scalable and high-throughput virtual screening using graph neural networks with atomic and surface protein pocket features. Journal of Chemical Information and Modeling, 62(19):4642–4659, 2022.

Nils M. Kriege. Weisfeiler and Leman go walking: Random walk kernels revisited. In NeurIPS, 2022.

Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. In NeurIPS, 2019.

Sandra L. Mitchell. Linear algorithms to recognize outerplanar and maximal outerplanar graphs. Information Processing Letters, 9(5):229–232, 1979.
Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In AAAI, 2019.

Christopher Morris, Fabrizio Frasca, Nadav Dym, Haggai Maron, Ismail Ilkan Ceylan, Ron Levie, Derek Lim, Michael M. Bronstein, Martin Grohe, and Stefanie Jegelka. Position: Future directions in the theory of graph machine learning. In ICML, 2024.

Jaroslav Nešetřil and Patrice Ossona De Mendez. Sparsity: graphs, structures, and algorithms, volume 28. Springer Science & Business Media, 2012.

Leander Schietgat, Jan Ramon, and Maurice Bruynooghe. A polynomial-time maximum common subgraph algorithm for outerplanar graphs and its application to chemoinformatics. Annals of Mathematics and Artificial Intelligence, 69(4):343–376, 2013.

Teague Sterling and John J. Irwin. ZINC 15 – ligand discovery for everyone. Journal of Chemical Information and Modeling, 2015.

Maciej M. Sysło. The subgraph isomorphism problem for outerplanar graphs. Theoretical Computer Science, 17(1):91–97, 1982.

Robert Endre Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146–160, 1972.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph Attention Networks. In ICLR, 2018.

Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 2018.

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR, 2019.

George Forrest Young, Luca Scardovi, and Naomi Ehrich Leonard. A new notion of effective resistance for directed graphs, Part I: Definition and properties. Transactions on Automatic Control, 61(7):1727–1736, 2015.

Bohang Zhang, Guhao Feng, Yiheng Du, Di He, and Liwei Wang.
A complete expressiveness hierarchy for subgraph GNNs via subgraph Weisfeiler-Lehman tests. In ICML, 2023a.

Bohang Zhang, Shengjie Luo, Liwei Wang, and Di He. Rethinking the expressive power of GNNs via graph biconnectivity. In ICLR, 2023b.

A Outerplanar Graphs

Let G = (V, E) be an undirected graph. A planar embedding of G consists of an injective mapping f of the nodes V to points in the plane ℝ² and, for each pair of adjacent nodes u, v, a continuous path between f(u) and f(v) such that no pair of such paths cross. More formally, a continuous path from f(u) to f(v) is a continuous function p : [0, 1] → ℝ² such that p(0) = f(u) and p(1) = f(v). Two such continuous paths p, p′ cross if there exist t, t′ ∈ (0, 1) such that p(t) = p′(t′). The graph G is planar if it has a planar embedding. By Fáry's theorem, each planar graph has a planar embedding with straight line segments as paths (Fáry, 1948). Intuitively, a planar embedding dissects the plane into regions called faces. Formally, these are the (topologically) connected regions of ℝ² with all the edge paths p removed. Any planar embedding has a unique face not bounded by paths (the only non-compact face), which is called the outer face. A graph is outerplanar if it has a planar embedding with all nodes lying on the boundary of the outer face. See, for example, Felsner (2012) for more details. We can also characterize outerplanar graphs using forbidden graph minors. A graph H is a minor of G if H can be obtained from G by a series of node deletions, edge deletions, and edge contractions (i.e., removing an edge and replacing its two endpoints by a new node). Let Ka,b be the complete bipartite graph with bipartition V = A ∪ B, where a = |A| and b = |B|, and let Kc be the complete graph on c nodes. A graph G is outerplanar if and only if G has no K2,3 minor and no K4 minor.
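A well-known consequence of this characterization is a simple recognition test: G is outerplanar if and only if adding one apex node adjacent to every node of G yields a planar graph. The sketch below is our own illustration, not the paper's code; with a linear-time planarity test, it also runs in linear time.

```python
import networkx as nx


def is_outerplanar(g):
    """Outerplanarity test via the apex reduction: G is outerplanar
    iff G plus a universal (apex) node is planar."""
    h = g.copy()
    apex = object()  # fresh hashable object, cannot collide with existing nodes
    h.add_edges_from((apex, v) for v in g.nodes)
    planar, _ = nx.check_planarity(h)
    return planar


print(is_outerplanar(nx.cycle_graph(6)))                  # True
print(is_outerplanar(nx.complete_graph(4)))               # False: K4 minor
print(is_outerplanar(nx.complete_bipartite_graph(2, 3)))  # False: K2,3 minor
```

The reduction works because adding the apex turns a K4 minor into a K5 minor and a K2,3 minor into a K3,3 minor, exactly the forbidden minors for planarity.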
Similarly, planar graphs can be characterized, by Wagner's theorem, as the graphs with no K3,3 and no K5 minor. For more details see, for example, Diestel (2024).

B Proof of Proposition 1

We prove Proposition 1, which states that Φ(CAT(G)) ≤ Φ(G) + 7 for every outerplanar graph G.

Proof. Let u, v ∈ V(CAT(G)) be such that d_CAT(G)(u, v) = Φ(CAT(G)). We call a node a tree node if it was not part of a block in G and was not created by CAT. A node that is not a tree node is either a Hamiltonian cycle node, a pooling node, a block pooling node, or a global block pooling node.

Case 1: Neither u nor v is a tree node. By Observation 2: d_CAT(G)(u, v) ≤ 6.

Case 2: Node u is a tree node and v is not. Let a ∈ V(G) be the closest articulation node to u in CAT(G). Then, there is a path of length d_G(u, a) in CAT(G) from u to a. We can extend this path by one node to reach a pooling node. By Definition 2, there exists a path of length at most 6 from this pooling node to v. Thus d_CAT(G)(u, v) ≤ d_G(u, a) + 7 ≤ Φ(G) + 7.

Case 3: Both u and v are tree nodes.

Case 3a: Suppose that the shortest path between u and v in G does not contain any edge inside of an outerplanar block. Then d_CAT(G)(u, v) = d_G(u, v) ≤ Φ(G).

Case 3b: Suppose that the shortest path between u and v in G contains one or more edges inside exactly one block. Then, we can enter and exit this block in CAT(G) through a path a1, p1, b, p2, a2, where a1, a2 are articulation nodes, p1, p2 are pooling nodes, and b is a block node. Note that the articulation nodes were part of the path in G, which implies d_CAT(G)(u, v) ≤ d_G(u, a1) + d_G(a2, v) + 4 = d_G(u, v) - d_G(a1, a2) + 4. Furthermore, we do not need to take the one or more edges inside the block to go from a1 to a2. Using d_G(a1, a2) ≥ 1 we obtain d_CAT(G)(u, v) ≤ d_G(u, v) - d_G(a1, a2) + 4 ≤ d_G(u, v) + 3 ≤ Φ(G) + 3.

Case 3c: Suppose that the shortest path between u and v in G contains two or more edges that are contained in two or more different blocks.
Then, for CAT(G) we can shortcut from the first to the last block node of the shortest path between u and v through a path a1, p1, b1, g, b2, p2, a2, where a1, a2 are articulation nodes, p1, p2 are pooling nodes, b1, b2 are block nodes, and g is the global block pooling node. Note that the articulation nodes were part of the path in G, which implies that d_CAT(G)(u, v) ≤ d_G(u, a1) + d_G(a2, v) + 6 = d_G(u, v) - d_G(a1, a2) + 6. By assumption, we know that d_G(a1, a2) ≥ 2, which implies d_CAT(G)(u, v) ≤ d_G(u, v) - d_G(a1, a2) + 6 ≤ d_G(u, v) + 4 ≤ Φ(G) + 4.

Figure 6: Additional examples for CAT. The biconnected outerplanar graphs on the left are transformed using Definition 1.

Figure 7: Two non-planar graphs which PlanE cannot distinguish. However, CAT can distinguish the graphs by their two central outerplanar biconnected components.

C Additional Examples for CAT

Here, we provide additional examples for CAT. Figure 6 shows two graphs and the results of transforming them with CAT. As described in Definition 1, the original graph is copied twice and the edges of the Hamiltonian cycle are directed in each direction once (with their label extended with 1). The remaining edges are added to each copy directed in both directions, with their label extended to describe the distance between their two nodes when only using edges from the Hamiltonian cycle.

D Incomparable Expressivity of CAT and PlanE

The expressivity of CAT and PlanE is incomparable. This follows from Propositions 2 and 3, which show that there are pairs of graphs that are distinguishable by CAT but not by PlanE, and vice versa.

Proposition 2. There exist non-planar graphs that CAT can distinguish and PlanE cannot distinguish.

Proof. The claim follows from Figure 7.

Figure 8: Two planar (but not outerplanar) graphs which CAT cannot distinguish.

Proposition 3.
There exist planar graphs that PlanE can distinguish and CAT cannot distinguish.

Proof. The claim follows from Figure 8.

E Details for Experimental Evaluation

Our models are implemented in PyTorch Geometric (Fey & Lenssen, 2019) and trained on a single NVIDIA GeForce RTX 3080 GPU. We use WandB (Biewald, 2020) for experiment tracking. The server used has 64 GB of RAM and an 11th Gen Intel(R) Core(TM) i9-11900KF CPU running at 3.50 GHz. Table 5 shows the hyperparameters of our MPNNs for the different datasets. We use the same hyperparameter grid for MPNNs combined with CAT. We used a smaller hyperparameter grid for MOLHIV, as MOLHIV is larger than most of the other datasets, meaning that training takes much longer. When benchmarking the speed of GIN against GIN+CAT, we train for 100 epochs with a batch size of 128 on all datasets with the same hyperparameters for both models (see Table 5).

CAT implementation. CAT adds an additional feature to each node which encodes the type of that node, i.e., whether it is a node from a Hamiltonian cycle, a block node, a pooling node, an articulation node, or a global block node. Furthermore, we create additional edge features encoding the types of the nodes incident to the edge; e.g., an edge between two different nodes in a Hamiltonian cycle has a different type than an edge from a pooling node to a block node. For newly created nodes and edges we set their remaining features to the features of the node / edge they are based on; for example, a pooling node has the features of the node it performs the pooling operation for. For nodes that have no natural representation in the graph (block and block pooling nodes) we set these features to 0. To ensure that only these nodes get assigned 0 features, we shift the values of these features for all other nodes by 1.² Note that our MPNNs treat the distance on edges in blocks as a categorical feature. Representing the distances as numerical features did not improve performance in preliminary experiments.
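The feature scheme just described can be sketched as follows. `NODE_TYPES` and `encode_node` are our own hypothetical names for illustration; the actual implementation operates on PyTorch Geometric data objects.

```python
import numpy as np

# Hypothetical type ids for the five node types created by CAT.
NODE_TYPES = {"cycle": 0, "block": 1, "pooling": 2, "articulation": 3, "global": 4}


def encode_node(orig_features, node_type):
    """Sketch of the node-feature scheme: nodes with a natural
    counterpart inherit its categorical features shifted by +1, purely
    structural nodes (block / global block pooling) get all zeros, and
    the node-type id is appended as an extra feature."""
    if node_type in ("block", "global"):       # no natural counterpart
        feats = np.zeros(len(orig_features), dtype=int)
    else:                                      # inherit and shift by 1
        feats = np.asarray(orig_features, dtype=int) + 1
    return np.append(feats, NODE_TYPES[node_type])


print(encode_node([2, 0, 5], "cycle"))  # [3 1 6 0]
print(encode_node([2, 0, 5], "block"))  # [0 0 0 1]
```

The +1 shift guarantees that the all-zero feature vector is reserved for structural nodes, so the model can never confuse them with an original atom whose categorical features happen to be zero.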
Table 5: Hyperparameter grids for GIN, GCN and GAT on different datasets.

| Parameter | All datasets except MOLHIV | MOLHIV | Speed Benchmarks (Table 2) |
|---|---|---|---|
| Message passing layers | 2, 3, 4, 5 | 4, 5 | 4 |
| Final MLP layers | 2 | 2 | 2 |
| Pooling operation | mean, sum | mean, sum | mean |
| Embedding dimension | 64, 128, 256 | 64, 128 | 64 |
| Jumping knowledge | last | concat | concat |
| Dropout rate | 0, 0.5 | 0.5 | 0 |

Dataset description. All graphs in our datasets represent molecules, and all tasks are graph-level. For all tasks we use the metrics that are commonly used by the community, e.g., MAE for ZINC or ROC-AUC for MOLHIV. For ZINC (10k graphs) and ZINC250k (250k graphs) (Gómez-Bombarelli et al., 2018; Sterling & Irwin, 2015) the task is to predict the constrained solubility, which is a regression task. All other datasets are from the Open Graph Benchmark (Hu et al., 2020) based on MoleculeNet (Wu et al., 2018), and the task is always to predict some molecular property. On MOLHIV, the task is to predict whether a molecule inhibits HIV virus replication or not (Hu et al., 2020). For the tasks of the other datasets, we refer to Wu et al. (2018).

²This assumes that the features are categorical and not numerical, which is the case for the used datasets.

F Additional Figures

We provide additional visualizations of the CAT transformation. In all figures, the colors of the vertices in the transformed graph have the following meaning: red nodes are from Hamiltonian cycles, blue nodes correspond to blocks, yellow nodes pool the nodes from Hamiltonian cycles, orange nodes correspond to articulation nodes, and the gray node pools block nodes. Figure 9 shows a synthetic example with a non-outerplanar graph. Figure 10 demonstrates CAT on various synthetic graphs. Figure 11 shows the result of CAT on molecular graphs from ZINC and MOLHIV. Somewhat ironically, CAT often generates frog graphs on MOLHIV, as can be seen in Figure 12.

Figure 9: Left: example non-outerplanar graph.
Right: result of applying CAT to the graph. Colors indicate the type of node (see Appendix F).

Figure 10: Left: example graphs. Right: result of applying CAT to these graphs. Colors indicate the type of node (see Appendix F).

Figure 11: Left: example graphs from MOLHIV (top) and ZINC (bottom). Right: result of applying CAT to these graphs. Colors indicate the type of node (see Appendix F).

Figure 12: Left: example graphs from MOLHIV. Right: result of applying CAT to the graph. Colors indicate the type of node (see Appendix F).