# piecewise_constant_spectral_graph_neural_network__ce655dbb.pdf

Published in Transactions on Machine Learning Research (04/2025)

Piecewise Constant Spectral Graph Neural Network

Vahan Martirosyan vahan.martirosyan@centralesupelec.fr Université Paris-Saclay Centrale Supélec, Inria, France

Jhony H. Giraldo jhony.giraldo@telecom-paris.fr LTCI, Télécom Paris Institut Polytechnique de Paris, France

Fragkiskos D. Malliaros fragkiskos.malliaros@centralesupelec.fr Université Paris-Saclay Centrale Supélec, Inria, France

Reviewed on Open Review: https: // openreview. net/ forum? id= s Td Vn DW0HX

Graph Neural Networks (GNNs) have achieved significant success across various domains by leveraging graph structures in data. Existing spectral GNNs, which use low-degree polynomial filters to capture graph spectral properties, may not fully identify the graph s spectral characteristics because of the polynomial s small degree. However, increasing the polynomial degree is computationally expensive and, beyond certain thresholds, leads to performance plateaus or degradation. In this paper, we introduce the Piecewise Constant Spectral Graph Neural Network (Pie Co N) to address these challenges. Pie Co N combines constant spectral filters with polynomial filters to provide a more flexible way to leverage the graph structure. By adaptively partitioning the spectrum into intervals, our approach increases the range of spectral properties that can be effectively learned. Experiments on nine benchmark datasets, including both homophilic and heterophilic graphs, demonstrate that Pie Co N is particularly effective on heterophilic datasets, highlighting its potential for a wide range of applications. The implementation of Pie Co N is available at https://github.com/vmart20/Pie Co N.

1 Introduction

Graph Neural Networks (GNNs) (Wu et al., 2021; Zhou et al., 2020) have demonstrated remarkable performance across various application domains. They have been successfully applied in areas such as social network analysis (Panagopoulos et al., 2023), recommendation systems (Wu et al., 2019; Ying et al., 2018), drug discovery (Jiang et al., 2021; Bongini et al., 2021), and materials modeling (Coley et al., 2019; Duval et al., 2023), where data can naturally be represented as graphs. GNNs can be broadly classified into two types: spatial and spectral GNNs. Spatial GNNs (Kipf & Welling, 2017; Veličković et al., 2018; Chien et al., 2021) use a message passing approach to learn node representations by collecting information from neighboring nodes. This approach allows spatial GNNs to capture local structural information and adapt to varying neighborhood sizes. On the other hand, spectral GNNs (Wang & Zhang, 2022; Defferrard et al., 2016; He et al., 2021; Castro-Correa et al., 2024) use the graph s spectral characteristics, such as the graph Laplacian s eigenvalues and eigenvectors, to transform node features. By leveraging the spectral domain, these models can capture global structural patterns and apply graph convolution operations in the frequency domain.

Many existing spectral GNNs use low-degree polynomial filters, which approximate the filtering functions by applying polynomials to the graph s Laplacian matrix or other graph-shift operators (Wang & Zhang, 2022; Defferrard et al., 2016; He et al., 2021). One disadvantage of these low-degree polynomial filters is that they are continuous and, because of the low degree, may not give enough weight to specific eigenvalues, as the

Published in Transactions on Machine Learning Research (04/2025)

change between closely spaced eigenvalues cannot vary significantly. This can be problematic, particularly in real-world graphs where certain eigenvalues like zero1 have large multiplicities (Lu et al., 2024; Lim et al., 2023).

In this paper, we propose the Piecewise Constant Spectral Graph Neural Network (Pie Co N) to overcome this limitation. Our approach combines constant spectral filters with polynomial filters to better capture the spectral properties of the graph. The constant filters are defined by setting the values in the diagonal eigenvalue matrix to ones within specific intervals and zeros elsewhere, effectively isolating different frequency bands.

-1.0 -0.8 -0.5 -0.2 0.0 0.2 0.5 0.8 1.0 Eigenvalues

Filter Output

Jacobi Conv Pie Co N

Figure 1: Comparison of Jacobi Conv and Pie Co N trained filters on the Chameleon dataset.

By partitioning the spectrum into intervals, Pie Co N expands the range of spectral characteristics that can be learned, improving the model s performance. Figure 1 compares the response of a Jacobi Conv filter (Wang & Zhang, 2022) versus the response of a Pie Co N filter across the spectrum. The figure shows how polynomial filters produce a smooth response, while piecewise constant filters can sharply focus on selected intervals, capturing crucial spectral properties that polynomial filters miss. For example, Pie Co N can create a sharp drop at eigenvalue 0, which is impossible to achieve with lowdegree polynomial filters like Jacobi Conv. This sharp discontinuity is crucial because eigenvalue 0 often has high multiplicity in real-world graphs (as detailed in Table 1) and contains important structural information. Our theoretical analysis provides important insights about spectral GNNs by establishing error bounds for polynomial spectral filtering and proving that our model is invariant to eigenvector sign flips and basis changes. We validate our approach on standard benchmark datasets, showing improved performance, particularly when handling graphs with multiple zero eigenvalues.

Our main contributions are as follows:

We introduce a novel spectral GNN model (Pie Co N) that uses piecewise constant filters combined with polynomial filters to enhance learning from the spectral properties of graphs.

We propose a new method to isolate frequency bands in the spectral domain by partitioning the eigenvalue spectrum into intervals, allowing the model to focus on crucial spectral properties.

We demonstrate, through experiments, that Pie Co N outperforms or shows competitive performance against spatial and spectral GNNs on real-world datasets.

2 Related Work

GNNs have evolved significantly since the early work by Bruna et al. (2014), who introduced the first modern spectral-based graph convolution network using the graph Fourier transform. Later, Kipf & Welling (2017) simplified this approach with the Graph Convolutional Network (GCN) model, which applies a first-order approximation of spectral filters to make GNNs more scalable. Other important advancements include Graph Attention Networks (GAT) (Veličković et al., 2018), which use attention mechanisms to weigh neighboring nodes differently, and Graph SAGE (Hamilton et al., 2017), which focuses on inductive learning by generating node embeddings through sampling and aggregation techniques.

Message-passing neural networks (MPNNs) (Gilmer et al., 2017) are a class of GNNs where nodes iteratively exchange and aggregate information with their neighbors. Besides GCN and GAT, several other messagepassing approaches have emerged to address specific challenges. For instance, the Graph Isomorphism Networks (GIN) model (Xu et al., 2019) is designed to be as powerful as the 1-WL test for distinguishing non-isomorphic graphs, thus improving expressiveness.

1Note that, for the Laplacian matrix, the multiplicity of eigenvalue zero corresponds to the number of connected components. This correspondence does not hold for the normalized adjacency matrix we employ here.

Published in Transactions on Machine Learning Research (04/2025)

Spectral GNNs leverage the eigenvalues and eigenvectors of the graph Laplacian to perform graph convolution in the spectral domain. Cheb Net (Defferrard et al., 2016) introduced the use of Chebyshev polynomials for spectral filtering, improving the scalability of spectral GNNs. Jacobi Conv (Wang & Zhang, 2022) further enhances this by employing orthogonal Jacobi polynomials to flexibly learn graph filters. Recent work by Maskey et al. (2023) introduces a novel fractional graph Laplacian approach defined in the singular value domain to address over-smoothing, providing theoretical guarantees for both directed and undirected graphs. For heterophilic graphs with label noise, R2LP (Cheng et al., 2024) demonstrates that increasing graph homophily can help mitigate the impact of noisy labels.

While previous spectral GNNs have leveraged polynomial filters to approximate spectral properties of graphs, they are limited by their lack of flexibility in handling critical eigenvalues because of the low degree of the polynomial filters. DSF (Guo et al., 2023) addresses this by employing a shared network on positional encodings to learn unique polynomial coefficients per node, highlighting the advantages of node-specific filters over node-unified ones. NFGNN (Zheng et al., 2023) proposes a node-oriented spectral filtering approach that learns specific filters for each node, better adapting to local homophily patterns. PP-GNN (Lingam et al., 2022) also explores piecewise spectral filtering, but our approach differs in several key aspects. While PP-GNN uses a fixed two-part partitioning with low-degree polynomials and continuity constraints, Pie Co N implements adaptive K-part partitioning with constant filters that allow for discontinuities between intervals. This design enables Pie Co N to better capture critical eigenvalues with sharper responses. Additionally, our approach decomposes filters into positive and negative parts (Eq. (10)) and maintains invariance to eigenvector sign flips and basis changes (Proposition 2).

3 Background

We consider an undirected graph G = (V, E, X) with n nodes. Here, V = {v1, v2, . . . , vn} is the set of nodes, E V V (|E| = m) represents the set of edges, and X Rn d is the node feature matrix. The adjacency matrix A {0, 1}n n of the graph is defined as Aij = 1 if there is an edge between nodes vi and vj, and Aij = 0 otherwise. The degree matrix D = diag(d1, . . . , dn) is a diagonal matrix with the i-th diagonal entry as di = P

j Aij, representing the degree of node i. The normalized adjacency matrix ˆ A is defined as ˆ A = D 1

2 . The normalized adjacency matrix is symmetric and can be decomposed as ˆ A = UΛU , where Λ is a diagonal matrix containing the eigenvalues λi of ˆ A, and U is an orthogonal matrix whose columns are the corresponding eigenvectors ui. Let s be the number of distinct eigenvalues of ˆ A, denoted by λ 1, λ 2, . . . , λ s. For an eigenvalue λ, we define ν(λ) as the algebraic multiplicity of λ, which is the number of times λ appears in the eigenvalues. A graph signal is a function that assigns a scalar value to each node in the graph. Formally, a graph signal can be represented as a vector x Rn, where each entry xi corresponds to the signal value at node vi.

3.1 Graph Neural Networks

Graph Neural Networks (GNNs) are deep learning architectures designed to learn from graph-structured data. They can be broadly categorized into two types: spatial GNNs and spectral GNNs. Both approaches aim to learn node representations by leveraging the graph structure and node features but differ fundamentally in how they process graph information.

Graph Fourier Transform and Spectral GNNs. The graph Fourier transform (Ortega et al., 2018) of a graph signal x Rn is defined as ˆx = U x, and its inverse is given by x = U ˆx (Shuman et al., 2013). This transform projects a graph signal from the spatial domain (node space) to the spectral domain (frequency space), similar to how the classical Fourier transform operates on time signals.

The spectral filtering of a signal x with a kernel v is defined as:

z = v G x = U (ˆv ˆx) = U ˆV U x, (1)

where ˆV = diag(ˆv1, . . . , ˆvn) represents the spectral kernel coefficients and denotes element-wise multiplication. Spectral GNNs leverage this graph Fourier transform to define convolution operations in the

Published in Transactions on Machine Learning Research (04/2025)

Constant filters

Polynomial filters

Trained constant filter

Trained polynomial filter

Combined Filter

Input Graph

Initial filters Trained filters

Figure 2: Overview of the Pie Co N model. Our method processes an input graph through eigenvalue segmentation (Alg. 1) to create constant filters, while separately applying polynomial filters. These filters are trained and combined to create the final spectral filter.

spectral domain. These models approach graph learning from a signal processing perspective, using the eigendecomposition of graph matrices to define filtering operations.

To avoid the computationally expensive eigen-decomposition, polynomial functions h( ˆ A) are often used to approximate different kernels in spectral GNNs. Specifically, the spectral filter h(λ) is parameterized as a polynomial of degree P:

p=0 βpλp, (2)

where βp are learnable coefficients. Consequently, the filtering process can be reformulated as:

p=0 βp ˆ A px = Uh(Λ)U x, (3)

which allows efficient computation of the filtered signal using only matrix multiplications.

Spatial GNNs. Spatial GNNs, also known as MPNNs, operate directly on the graph structure by aggregating information from neighboring nodes (Gilmer et al., 2017). The general form of message passing can be expressed as:

h(l+1) i = UPDATE h(l) i , AGGREGATE {h(l) j : j N(i)} , (4)

where h(l) i represents the feature vector of node i at layer l, N(i) is the set of neighboring nodes of i, AGGREGATE is a permutation-invariant function that combines information from the neighbors, and UPDATE is a function that updates the node s representation.

Popular examples of MPNNs include the GCN model (Kipf & Welling, 2017), which can be formulated as:

H(l+1) = σ ˆ AH(l)W(l) , (5)

Published in Transactions on Machine Learning Research (04/2025)

where H(l) is the matrix of node representations at layer l, W(l) is a learnable weight matrix, and σ( ) is a non-linear activation function.

4 Piecewise Constant Spectral Graph Neural Network (Pie Co N)

Current spectral GNNs often have limited flexibility in how they process graph structures due to their reliance on polynomial filters. We propose Pie Co N, a model that combines different types of spectral filters to process graph data in ways that complement the capabilities of polynomial filters. The key steps of our methodology, illustrated in Fig. 2, include: (1) partitioning the graph spectrum into intervals based on significant points identified through spectral analysis, (2) constructing constant spectral filters for each interval to capture global and local spectral properties, and (3) combining constant spectral filters with polynomial filters. Below, we describe the methodology in detail. All the proofs of theorems and propositions are provided in Appendix A.

4.1 Identifying Significant Points in the Spectrum

0 500 1000 1500 2000 2500 Eigenvalue Index

Eigenvalues & derivatives

Eigenvalues derivative Eigenvalues Significant Points

Figure 3: Using the derivative of eigenvalues to identify significant points which show relatively high changes in the spectrum.

Understanding the spectral properties of a graph is crucial for analyzing its structure. A key challenge is identifying significant points in the eigenvalue spectrum, which can reveal important structural insights (Fig. 3). Algorithm 1 addresses this by identifying large gaps in the eigenvalue spectrum, which can indicate distinct frequency bands in the graph s spectral representation and highlight structural changes or regions of high spectral variation (Von Luxburg, 2007; Fiedler, 1973; Chung, 1997). By adaptively partitioning the eigenvalue space around these critical points, Pie Co N can capture the most informative spectral features of the graph.

Algorithm 1 identifies significant points in the eigenvalue spectrum by analyzing local patterns in the distribution of eigenvalues. For each position i in the sorted sequence of eigenvalues, the algorithm quantifies how much eigenvalue λi deviates from the statistical properties of its neighboring eigenvalues. Specifically, it computes the deviation of the discrete derivative di = λi+1 λi from the mean of derivatives within windows before and after position i, normalized by their respective standard deviations. This normalization produces a significance score si that measures how abnormal each eigenvalue gap is relative to its local context. Positions with the highest scores represent points where the spectrum exhibits sudden changes, effectively identifying natural boundaries between different structural components of the graph. The algorithm also handles eigenvalue multiplicity by assigning zero significance to positions where consecutive eigenvalues are identical, ensuring that clusters of repeated eigenvalues remain intact. The identified significant points serve as adaptive partitioning boundaries for the spectrum, allowing Pie Co N to focus computational resources on the most informative regions of the eigenvalue distribution. This partitioning preserves sign and basis invariance (as proven in Section 5.2), making our approach robust to different eigenvector calculation methods. A parameter sensitivity analysis for Alg. 1 is presented in Appendix B.

4.2 Construction of Constant Filters

For each interval [ak, ak+1), we construct a spectral filter T k, as shown in Fig. 4. The filter is defined as:

T k = UEk U = U [:,ak:ak+1]U [:,ak:ak+1], (6)

where Ek is a binary diagonal matrix with non-zero entries corresponding to the eigenvalues within the k-th interval and U [:,ak:ak+1] is the submatrix of U with columns ak to ak+1 1. Specifically, the matrix Ek for

Published in Transactions on Machine Learning Research (04/2025)

Algorithm 1 Thresholding Algorithm for Identifying Significant Eigenvalue Gaps

1: function Identify_Significant_Gaps(d, λ, w, K) d: Discrete derivative of eigenvalues, λ: Sorted eigenvalues, w: Window size for averaging, K: Number of top indices (Number of spectral intervals) 2: ϵ small constant A very small positive value to avoid division by zero 3: s 0 Significance of each index

4: for i w to n w 1 do 5: µp, σp mean(di w:i), std(di w:i) Mean and standard deviation before i 6: µn, σn mean(di+1:i+w+1), std(di+1:i+w+1) Mean and standard deviation after i

7: si |d µp|

σp+ϵ + |di µn|

σn+ϵ Sum of normalized distances to adjacent means

8: if λi = λi 1 then

9: si 0 Set to zero if no gap exists 10: end if 11: end for 12: a0 = 0, a K 1 = n + 1 13: a1, a2, . . . , a K 2 indices of the largest K 2 values in s 14: return a0, a1, . . . , a K 1 15: end function

an interval [ak, ak+1) is defined as:

Ek = diag(0, . . . , 1 |{z} ak , 1, . . . , 1 |{z} ak+1 1 , . . . , 0). (7)

This construction ensures that Ek captures the eigenvectors in the specified interval, allowing the filter T k to encapsulate the corresponding spectral properties. Our approach differs from traditional polynomial-based spectral filters by enabling more flexible and tailored filtering of the graph s spectral components.

4.3 Polynomial Filters

In addition to the spectral filters T k, polynomial filters of the form ˆ A p are used (Fig. 5). These filters provide a way to incorporate local neighborhood information into the model. By adjusting the polynomial degree p, we can capture varying scales of locality in the graph.

Using the distinct eigenvalues of ˆ A we can get:

( ˆ A λ 1I) ( ˆ A λ s I) = 0. (8)

This implies that polynomial filters have at most s free parameters because any polynomial of degree higher than s can be reduced to a polynomial of degree s. The inclusion of polynomial filters complements the constant spectral filters by providing a smooth interpolation between different spectral components. This combination allows Pie Co N to capture both sharp and gradual changes in the graph s spectral properties.

4.4 Combining Constant and Polynomial Filters

The final embedding matrix of the nodes is computed by combining the constant spectral filters and polynomial filters. As in Jacobi convolutions (Wang & Zhang, 2022), we apply independent filtering on each of the channels in X simultaneously:

k=0 αkl T kσ(XW 1)W 2 :l | {z } Constant Filters

p=0 βpl ˆ A pσ(XW 1)W 2 :l

| {z } Polynomial Filters

Published in Transactions on Machine Learning Research (04/2025)

-1.0 -0.5 0.0 0.5 1.0 Eigenvalues

Filter Output

Figure 4: Constant filters.

-1.0 -0.5 0.0 0.5 1.0 Eigenvalues

Filter Output

Figure 5: Polynomial filters.

Matrix:l is the lth column of the Matrix;

X is the input feature matrix, representing the initial features of the nodes;

W 1 and W 2 are weight matrices to be learned during training. W 1 maps the input features to an intermediate space, and W 2 maps the intermediate representations to the embedding space;

σ( ) is a non-linear activation function applied element-wise, introducing non-linearity into the model;

αkl are learnable coefficients associated with the spectral filters T k for each dimension l;

βpl are learnable coefficients associated with the polynomial filters ˆ A p for each dimension l.

Each element of T k represents a similarity between two nodes in some range of frequencies. When we perform the matrix multiplication MX for some similarity matrix M = T k, each entry M ij in the similarity matrix M represents the weight or importance of node j in contributing to the feature vector of node i. If M is non-negative, it means each node contributes either positively or not at all to the feature aggregation. Negative values, on the other hand, would imply subtracting features from neighbors, which is typically not meaningful in most graph-based learning contexts, where the goal is to aggregate features to enhance node representations. Therefore, we separate T k into positive and negative parts, as follows:

k=0 α+ kl(T k)+σ(XW 1)W 2 :l | {z } Positive Part

k=0 α kl(T k) σ(XW 1)W 2 :l | {z } Negative Part

p=0 βpl ˆ A pσ(XW 1)W 2 :l

| {z } Polynomial Filters

where α+ kl and α kl are learnable coefficients associated with the spectral filters T + k and T k for each dimension l. The superscripts + and - indicate coefficients for the positive and negative parts of the spectral filters, respectively.

It is important to note that while our method could technically be classified as a piecewise polynomial approach, we do not construct separate polynomial functions for each interval. Instead, our approach combines constant functions within specific spectral intervals with global polynomial filters.

4.5 Computational Complexity

The computational complexity of our method is broken down as follows:

1. Eigendecomposition (precomputation): O(n3) for computing spectral components.

2. Filter construction and sparsification (precomputation): O(n3 + 2Kn2 log(m)) = O(n3) for constructing and sparsifying TK filters, where K is the number of spectral intervals, and m is the edge count.

Published in Transactions on Machine Learning Research (04/2025)

3. Model propagation: O((K + P)md) during training and inference, where P is the polynomial degree, and d is the feature dimension. This matches the theoretical time complexity of Jacobi Conv (Wang & Zhang, 2022) and GPRGNN (Chien et al., 2021), while being more efficient than Bern Net s (He et al., 2021) O(P 2md). An empirical comparison of the computational efficiency between our approach and the polynomial filtering method Jacobi Conv is provided in Appendix C.

It is worth noting that while spectral methods like Cheb Net (Defferrard et al., 2016), Bern Net (He et al., 2021), and Jacobi Conv (Wang & Zhang, 2022) avoid explicit eigendecomposition, they still face challenges when using higher-degree polynomials (needed for better approximation), as the recursive computation of ˆ A kx becomes computationally intensive in practice.

5 Theoretical Analysis

This section provides theoretical and empirical analyses to establish the advantages of piecewise constant spectral filters over polynomial filters and validate the design choices of Pie Co N. For the theoretical part, we explain an error bound from polynomial approximation theory, which shows the limits of polynomial spectral filtering when dealing with sharp changes in functions. We discuss the challenges with eigenvector representations, such as their invariance under sign flips and basis shifts, which can affect the generalization of graph learning models. Additionally, in Appendix D, we analyze how specific graph structures influence the eigenvalue spectrum, with a focus on the eigenvalue 0, and explain why separating constant filters into negative and positive parts helps improve model performance and reduces approximation errors.

5.1 Error Analysis for Polynomial Approximation

To analyze the fundamental limitations of polynomial approximations in spectral filtering, we establish a theorem characterizing the error bounds.

Theorem 1 (Approximation error for ϵ-dense eigenvalues). Let ˆ A Rn n be a normalized adjacency matrix with spectrum {λi}n i=1 where 1 λ1 λn 1. Assume that these eigenvalues are ϵ-dense on [ 1, 1] and that d2ϵ < 1. Let f : [ 1, 1] R be a filter function with f = supx [ 1,1] |f(x)| = 1. For any polynomial p Pd(the space of polynomials of degree at most d), the approximation error is:

i=1 |p(λi) f(λi)| p (1 d2ϵ) 1. (11)

Generally, as n increases, the eigenvalues become more densely packed in [ 1, 1], causing ϵ to approach zero. This means d2ϵ will also approach zero, making the lower bound on the approximation error converge to p 1. Therefore, to minimize the error bound while maintaining sufficient approximation power, setting p 1 is a natural choice since the target function satisfies f = 1. This normalization allows for fair comparison between different polynomial approximations and simplifies the analysis. Hence, in the following theorem, we specifically focus on polynomials with p 1 to analyze the particular challenge of approximating functions with jump discontinuities.

Theorem 2 (Approximation error for functions with jump discontinuities). Let ˆ A Rn n be a normalized adjacency matrix with spectrum {λi}n i=1 where 1 λ1 λn 1. Let f : [ 1, 1] R be some filter function with f = 1 that the model needs to find and suppose it has a jump discontinuity of magnitude h > 0 between consecutive eigenvalues λR and λR+1 (|f(λR+1) f(λR)| = h). For any polynomial p Pd(the space of polynomials of degree at most d), satisfying p 1, the approximation error is:

i=1 |p(λi) f(λi)| h |λR+1 λR| d2. (12)

This result demonstrates a key limitation: For small d when |λR+1 λR| h d2 , the approximation error E(p, f) E2 h. This is particularly problematic for spectral filtering where: (1) sharp transitions in the

Published in Transactions on Machine Learning Research (04/2025)

Table 1: Statistics of the datasets used for node classification. ν(0) is the multiplicity of eigenvalue 0.

Dataset Nodes Edges Classes Homophily Ratio ν(0)/Nodes

Chameleon 2,277 31,396 5 0.23 0.52 Squirrel 5,201 198,423 5 0.22 0.37 Actor 7,600 26,705 5 0.22 0.15 Amazon-Ratings 24,492 93,050 5 0.38 0.17 Texas 183 168 5 0.11 0.35 Cora 2,708 5,278 7 0.81 0.11 Citeseer 3,327 4,614 6 0.74 0.14 Amazon-Photo 7,650 71,831 8 0.83 0.02 Pubmed 19,717 44,324 3 0.80 0.61

filter response are often desired (h is large), (2) some eigenvalues may be very close together (|λR+1 λR| is small), and (3) using high-degree polynomials is computationally expensive.

In contrast, by using piecewise constant filters, we can add a constant filter only at point λR+1 with the value h and eliminate the jump entirely. The ablation study in Table 3 demonstrates that adding constant filters to polynomial filters improves performance. This theoretical result justifies combining polynomial filters with constant filters in our approach.

5.2 Sign and Basis Invariance

Eigenvectors corresponding to a given eigenvalue can have multiple representations. For example, if λ = 0 is an eigenvalue of the normalized adjacency matrix ˆ A, any eigenvector ui associated with λ = 0 can be replaced with its opposite ui or any linear combination of eigenvectors for the same eigenvalue. This variability introduces sign and basis ambiguity problems, which can lead to inconsistent or unpredictable results in learning tasks.

Proposition 1 (This follows directly from (Lim et al., 2023)). Polynomial filters are invariant to sign changes and basis choices.

The invariance of polynomial filters stems from the stability of eigenvector products U µU µ under different basis representations. This property ensures that matrix powers ˆ A p maintain consistent behavior regardless of the specific eigenbasis chosen, making polynomial filters reliable for spectral graph operations.

Proposition 2. Constant filters are invariant to sign changes and basis choices.

Both polynomial and constant filters are robust to different eigenvector representations, ensuring that learned representations are not affected by arbitrary sign changes or basis choices, thus improving model stability and generalization. By leveraging filters with this characteristic, our model can produce consistent outputs despite the inherent ambiguities in eigendecomposition.

6 Experimental Evaluation

6.1 Datasets

We evaluate Pie Co N on seven diverse node classification datasets with varying graph structures and homophily ratios (Table 1). Cora, Citeseer, and Pubmed are citation networks where nodes are research papers and edges represent citations. Photo is a product co-occurrence graph with nodes as products and edges representing co-purchase relationships. Actor is a graph where nodes are actors and edges denote co-occurrence in films. Chameleon and Squirrel are graphs derived from Wikipedia pages. Nodes represent web pages, and edges denote mutual links. Texas is an academic web graph where nodes are webpages from the University of Texas and edges represent hyperlinks between pages. Amazon-Ratings is a product co-purchasing network

Published in Transactions on Machine Learning Research (04/2025)

Table 2: Results on real-world node classification tasks.

Model Heterophilic Homophilic

Amazon-Ratings

Amazon-Photo

Spatial-based GNNs

GCN 68.10 1.20 50.11 1.21 34.65 0.68 48.80 0.22 78.69 1.80 87.18 0.87 81.04 0.67 85.87 0.83 87.31 0.31 GAT 63.13 1.93 44.49 0.88 33.93 2.47 50.28 0.55 77.54 0.98 88.03 0.79 80.52 0.71 90.94 0.68 87.29 0.48 H2GCN 57.11 1.58 36.42 1.89 35.86 1.03 48.17 0.52 88.36 2.62 86.92 1.37 77.07 1.64 93.02 0.91 88.93 0.35 GCNII 63.44 0.85 41.96 1.02 36.89 0.95 46.60 1.20 89.18 4.43 88.46 0.82 79.97 0.65 89.94 0.31 89.68 0.30 Spectral-based GNNs

Free eigenvalues 69.58 1.31 59.76 1.01 41.61 0.63 44.28 1.13 88.20 3.28 84.91 0.89 77.39 0.82 86.08 0.81 86.07 0.47 Lanczos Net 64.81 1.56 48.64 1.77 38.16 0.91 48.35 0.40 76.39 4.43 87.77 1.45 80.05 1.65 93.21 0.85 84.41 0.66 Cheby Net 59.28 1.25 40.55 0.42 37.61 0.89 50.20 0.52 77.21 2.95 86.67 0.82 79.11 0.75 93.77 0.32 90.11 0.26 GPR-GNN 67.28 1.09 50.15 1.92 39.92 0.67 49.37 0.71 88.53 3.36 88.57 0.69 80.12 0.83 93.85 0.28 91.36 0.40 Bern Net 68.29 1.58 51.35 0.73 41.79 1.01 48.82 0.20 89.02 3.45 88.52 0.95 80.09 0.79 93.63 0.35 88.98 0.46 PPGNN 69.45 1.05 48.47 2.51 39.65 0.66 47.96 0.89 85.57 2.62 88.32 0.60 80.98 0.51 95.09 0.31 90.11 0.26 Cheb Net II 71.37 1.01 57.72 0.59 41.75 1.07 48.79 0.21 89.11 3.43 88.71 0.93 80.53 0.79 94.92 0.33 89.76 0.32 Jacobi Conv 73.92 1.07 57.38 0.60 40.43 0.81 48.53 0.96 89.02 2.79 88.69 1.03 81.65 0.46 95.36 0.24 87.83 0.43 DSF-Jacobi-R 72.17 0.79 55.84 0.94 39.89 0.54 48.68 0.34 89.18 3.44 88.31 0.89 81.11 0.63 94.90 0.31 88.97 0.39 Opt Basis GNN 74.40 0.90 63.98 1.12 42.39 0.52 48.80 0.21 87.38 2.79 87.96 0.71 80.79 1.35 94.71 0.33 87.36 0.41 Specformer 74.92 0.98 64.26 1.18 41.56 1.25 OOM 83.60 3.27 87.55 0.87 80.98 0.79 95.29 0.30 OOM Uni Filter 74.11 1.68 63.52 1.30 40.11 1.31 50.02 0.70 86.72 3.77 89.10 1.07 81.21 1.66 94.96 0.74 91.36 0.45 Pie Co N 75.75 0.96 65.67 0.82 39.79 0.56 52.37 0.50 89.34 3.11 89.16 0.64 80.98 0.57 95.65 0.34 91.39 0.41 p < 0.05 (significant difference from Pie Co N) p < 0.001 (highly significant difference from Pie Co N)

where nodes are products and edges indicate frequent co-purchases, with the task of predicting product rating classes. The ratio of eigenvalue 0 multiplicity to the number of nodes is shown in the last column of Table 1.

All datasets were randomly split into 60% training, 20% validation, and 20% test sets for 10 different seeds. For each dataset, we report the average performance along with the 95% confidence interval. Details about hyperparameter optimization and the running environment are provided in Appendix E.

6.2 Baseline Models

We compare Pie Co N against several baseline models categorized into different groups based on their underlying graph learning methodologies:

Spatial-based GNNs: Graph Convolutional Network (GCN) (Kipf & Welling, 2017), Graph Attention Network (GAT) (Veličković et al., 2018), Higher-order GCN (H2GCN) (Zhu et al., 2020), and GCNII (Chen et al., 2020).

Spectral-based GNNs: Uni Filter (Huang et al., 2024), Lanczos Net (Liao et al., 2019), Cheby Net (Defferrard et al., 2016), Generalized Page Rank GNN (GPR-GNN) (Chien et al., 2021), Bern Net (He et al., 2021), PP-GNN (Lingam et al., 2022), Cheb Net II (He et al., 2022), DSF-Jacobi-R (Guo et al., 2023), Jacobi Conv (Wang & Zhang, 2022), Opt Basis GNN (Guo & Wei, 2023), and Specformer (Bo et al., 2023).

Free eigenvalues: A graph neural network that learns a spectral filter by directly parameterizing the eigenspectrum ˆ A = UΛU , where Λ contains trainable eigenvalues.

6.3 Results

Table 2 shows the node classification accuracy of Pie Co N compared to baseline models across various datasets. We observe that Pie Co N achieves the highest performance on seven datasets. Notably, the largest

Published in Transactions on Machine Learning Research (04/2025)

Table 3: Ablation study results.

Amazon-Ratings

Amazon-Photo

66.35 0.88 48.39 0.78 40.31 0.94 50.21 0.87 88.69 3.28 88.74 0.77 80.33 0.85 95.57 0.35 91.24 0.40 67.26 0.50 54.65 0.72 32.07 1.17 49.66 0.71 90.66 2.30 84.30 1.06 75.45 0.68 92.70 0.34 90.72 0.39 73.61 0.81 60.50 1.03 38.98 0.78 47.35 0.76 90.00 3.11 86.80 1.16 81.72 0.58 94.95 0.33 90.74 0.48 74.77 1.01 65.00 1.12 39.02 0.54 49.28 0.62 89.67 2.46 87.22 1.12 81.54 0.63 94.76 0.39 90.83 0.41 75.75 0.96 65.67 0.82 39.79 0.56 52.37 0.50 89.34 3.11 89.16 0.64 80.98 0.57 95.65 0.34 91.38 0.41 p < 0.05 (significant difference from full model) p < 0.001 (highly significant difference from full model)

improvements are observed on the heterophilic datasets Chameleon, Squirrel, and Amazon-Ratings, with gains of 0.83%, 1.41%, and 2.09%, respectively. This may be linked to the high multiplicity of the eigenvalue 0 in the normalized adjacency matrix of these graphs (see Table 1), to which our method gives more importance.

For homophilic datasets, such as Cora, Amazon-Photo, and Pubmed, Pie Co N also demonstrates competitive performance, achieving slight improvements over existing methods. The smaller gains suggest that traditional GNNs already perform well in these settings, as they inherently align with homophilic assumptions. Nonetheless, Pie Co N remains robust, indicating that its spectral filtering approach does not hinder its ability to learn from homophilic graphs. These results indicate that Pie Co N is effective in both heterophilic and homophilic settings.

We also run t-tests comparing Pie Co N with each baseline method to verify the statistical significance of our results. Stars in Table 2 show when a baseline performs significantly different than Pie Co N (* for p < 0.05, *** for p < 0.001). Most baselines show significant differences on heterophilic datasets, confirming that our gains are meaningful. For example, all methods except Specformer and Uni Filter show significantly lower performance on Chameleon. On homophilic datasets, several methods show no significant difference from Pie Co N, indicating competitive but not always superior performance.

6.4 Ablation Study

We have performed an ablation study using Eq. (10) to evaluate the contribution of each component on model performance. The results in Table 3 reveal several key findings. First, the full model incorporating all three components (positive part, negative part, and polynomial filters) achieves the best performance on 6 out of 9 datasets, with notable improvements on Chameleon (75.75%), Squirrel (65.67%), and Amazon-Ratings (52.37%). The combination of positive and negative parts without polynomial filters also shows strong performance, suggesting that these components capture complementary spectral information. For instance, on Squirrel, adding the negative part to the positive part improves accuracy from 60.50% to 65.00%. Interestingly, on the Actor dataset, using only polynomial filters yields the best performance (40.31%), while on Citeseer, the positive part alone achieves optimal results (81.72%).

In a separate experiment, we also create a simple spectral method with the eigenvalues as parameters. The results of this experiment are presented in Table 2 with the model name Free eigenvalues . However, this approach may be less effective because the method does not receive any explicit structural information associated with the eigenvalues.

7 Limitations

Our work has some limitations. The computational complexity of O(n3) for eigendecomposition presents scalability challenges for large-scale graphs. However, this preprocessing step occurs only once and has become increasingly practical with modern hardware, taking approximately 8 minutes for datasets with 25,000 nodes on an NVIDIA A100 GPU. It is worth noting that our approach is designed for graphs of reasonable size (up to tens of thousands of nodes), where the eigendecomposition is a worthwhile one-time preprocessing cost

Published in Transactions on Machine Learning Research (04/2025)

given the expressivity benefits. Also, our work does not address the problem of scaling GNNs to massive graphs with millions of nodes, which is a separate research area requiring specialized techniques like graph sampling or partitioning beyond the scope of this paper. Furthermore, the model s performance is highly dependent on how we partition the eigenvalue intervals, and our current approach using hard thresholding to identify significant spectral changes may not be optimal.

8 Conclusion

In this paper, we presented the Piecewise Constant Spectral Graph Neural Network (Pie Co N), a new approach to graph prediction tasks. Our method aims to address some limitations of existing spectral GNNs by combining constant spectral filters with polynomial filters to capture a broader range of spectral characteristics in real-world graphs. We introduced an adaptive spectral partitioning technique that analyzes the derivative of sorted eigenvalues to identify significant spectral changes. This helps focus on the most informative regions of the spectrum. Pie Co N expands the search space of possible eigenvalue filters beyond traditional polynomial-based filters, allowing for a more tailored capture of graph spectral properties. This is particularly useful when dealing with graphs that have high eigenvalue multiplicity. By integrating spectral filters with polynomial filters, our approach attempts to model both global graph structure and local neighborhood information. Our experiments on nine benchmark datasets, covering both homophilic and heterophilic graph structures, suggest that Pie Co N performs well on both types of datasets.

Acknowledgements

Vahan Martirosyan is the recipient of a Ph D scholarship from the STIC Doctoral School of Université Paris-Saclay.

Naum Achiezer. Theory of Approximation. Dover Publications, Inc., New York, 1992.

Anirban Banerjee. The Spectrum of the Graph Laplacian as a Tool for Analyzing Structure and Evolution of Networks. Ph.d. thesis, Universität Leipzig, Leipzig, Germany, 2008.

James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Neur IPS, 2011.

Deyu Bo, Chuan Shi, Lele Wang, and Renjie Liao. Specformer: Spectral graph neural networks meet transformers. In ICLR, 2023.

Pietro Bongini, Monica Bianchini, and Franco Scarselli. Molecular generative graph neural networks for drug discovery. Neurocomputing, 450:242 252, 2021.

Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann Le Cun. Spectral networks and locally connected networks on graphs. In ICLR, 2014.

Jhon A. Castro-Correa, Jhony H. Giraldo, Mohsen Badiey, and Fragkiskos D. Malliaros. Gegenbauer graph neural networks for time-varying signal reconstruction. IEEE Trans. Neural Netw. Learn. Syst., 35(9): 11734 11745, 2024.

Ming Chen, Zhewei Wei, Zengfeng Huang, Bolin Ding, and Yaliang Li. Simple and deep graph convolutional networks. In ICML, 2020.

Yao Cheng, Caihua Shan, Yifei Shen, Xiang Li, Siqiang Luo, and Dongsheng Li. Resurrecting label propagation for graphs with heterophily and label noise. In KDD, 2024.

Eli Chien, Jianhao Peng, Pan Li, and Olgica Milenkovic. Adaptive universal generalized pagerank graph neural network. In ICLR, 2021.

Published in Transactions on Machine Learning Research (04/2025)

Fan R. K. Chung. Spectral Graph Theory, volume 92 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.

Connor W. Coley, Wengong Jin, Luke Rogers, Timothy F. Jamison, Tommi S. Jaakkola, William H. Green, Regina Barzilay, and Klavs F. Jensen. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci., 10(2):370 377, 2019.

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Neur IPS, 2016.

Alexandre Duval, Victor Schmidt, Alex Hernández-García, Santiago Miret, Fragkiskos D. Malliaros, Yoshua Bengio, and David Rolnick. FAENet: Frame averaging equivariant gnn for materials modeling. In ICML, 2023.

Miroslav Fiedler. Algebraic connectivity of graphs. Czechoslovak Math. J., 23(98):298 305, 1973.

Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In ICML, 2017.

Jingwei Guo, Kaizhu Huang, Xinping Yi, and Rui Zhang. Graph neural networks with diverse spectral filtering. In WWW, 2023.

Yuhe Guo and Zhewei Wei. Graph neural networks with learnable and optimal polynomial bases. In ICML, 2023.

Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Neur IPS, 2017.

Mingguo He, Zhewei Wei, Zengfeng Huang, and Hongteng Xu. Bern Net: Learning arbitrary graph spectral filters via bernstein approximation. In Neur IPS, 2021.

Mingguo He, Zhewei Wei, and Ji-Rong Wen. Convolutional neural networks on graphs with chebyshev approximation, revisited. In Neur IPS, 2022.

Keke Huang, Yu Guang Wang, Ming Li, and Pietro Liò. How universal polynomial bases enhance spectral graph neural networks: Heterophily, over-smoothing, and over-squashing. In ICML, 2024.

Dejun Jiang, Zhe Wu, Chang-Yu Hsieh, Guangyong Chen, Ben Liao, Zixuan Wang, Chao Shen, Dongsheng Cao, Jian Wu, and Tingjun Hou. Could graph neural networks learn better molecular representation for drug discovery? a comparison study of descriptor-based and graph-based models. J. Cheminform., 13(1): 1 23, 2021.

Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.

Renjie Liao, Zhizhen Zhao, Raquel Urtasun, and Richard S. Zemel. Lanczos Net: Multi-scale deep graph convolutional networks. In ICLR, 2019.

Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, and Stefanie Jegelka. Sign and basis invariant networks for spectral graph representation learning. In ICLR, 2023.

Vijay Lingam, Chanakya Ekbote, Manan Sharma, Rahul Ragesh, Arun Iyer, and Sundararajan Sellamanickam. A piece-wise polynomial filtering approach for graph neural networks. In ICLR Workshop, 2022.

Kangkang Lu, Yanhua Yu, Hao Fei, Xuan Li, Zixuan Yang, Zirui Guo, Meiyu Liang, Mengran Yin, and Tat-Seng Chua. Improving expressive power of spectral graph neural networks with eigenvalue correction. In AAAI, 2024.

Sohir Maskey, Raffaele Paolino, Aras Bacho, and Gitta Kutyniok. A fractional graph laplacian approach to oversmoothing. In Neur IPS, 2023.

Published in Transactions on Machine Learning Research (04/2025)

Antonio Ortega, Pascal Frossard, Jelena Kovačević, José M. F. Moura, and Pierre Vandergheynst. Graph signal processing: Overview, challenges, and applications. Proc. IEEE, 106(5):808 828, 2018.

George Panagopoulos, Nikolaos Tziortziotis, Michalis Vazirgiannis, and Fragkiskos Malliaros. Maximizing influence with graph neural networks. In ASONAM, 2023.

Prasanna Sahoo and Thomas Riedel. Mean Value Theorems and Functional Equations. World Scientific, Singapore, 1998.

David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag., 30(3):83 98, 2013.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.

Ulrike Von Luxburg. A tutorial on spectral clustering. Stat. Comput., 17(4):395 416, 2007.

Xiyuan Wang and Muhan Zhang. How powerful are spectral graph neural networks. In ICML, 2022.

Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. Session-based recommendation with graph neural networks. In AAAI, 2019.

Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst., 32(1):4 24, 2021.

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR, 2019.

Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In KDD, 2018.

Shuai Zheng, Zhenfeng Zhu, Zhizhe Liu, Youru Li, and Yao Zhao. Node-oriented spectral filtering for graph neural networks. IEEE Trans. Pattern Anal. Mach. Intell., 46(1):388 402, 2023.

Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications. AI Open, 1:57 81, 2020.

Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. Beyond homophily in graph neural networks: Current limitations and effective designs. In Neur IPS, 2020.

Published in Transactions on Machine Learning Research (04/2025)

Theorem 1 (Approximation error for ϵ-dense eigenvalues). Let ˆ A Rn n be a normalized adjacency matrix with spectrum {λi}n i=1 where 1 λ1 λn 1. Assume that these eigenvalues are ϵ-dense on [ 1, 1] and that d2ϵ < 1. Let f : [ 1, 1] R be a filter function with f = supx [ 1,1] |f(x)| = 1. For any polynomial p Pd(the space of polynomials of degree at most d) the approximation error is:

i=1 |p(λi) f(λi)| p (1 d2ϵ) 1. (13)

Proof. Let x0 [ 1, 1] be a point where |p(x0)| = p , i.e., the point where the polynomial p attains its maximum absolute value on the interval [ 1, 1]. Since the eigenvalues {λi}n i=1 are ϵ-dense on [ 1, 1], there exists an eigenvalue λj such that |x0 λj| ϵ. By Markov s polynomial inequality (Sahoo & Riedel, 1998):

p d2 p . (14)

Using the mean value theorem (Achiezer, 1992), there exists ξ [x0, λj] (or [λj, x0] if λj > x0) such that:

|p(x0) p(λj)| = |p (ξ)| |x0 λj| p |x0 λj| d2 p ϵ. (15)

Therefore, |p(λj)| |p(x0)| |p(x0) p(λj)| p d2 p ϵ = p (1 d2ϵ). (16)

Since f = 1, we know that |f(λj)| 1. Then,

|p(λj) f(λj)| |p(λj)| |f(λj)| p (1 d2ϵ) 1. (17)

Since E(p, f) = Pn i=1 |p(λi) f(λi)| |p(λj) f(λj)|, we have:

E(p, f) p (1 d2ϵ) 1. (18)

This establishes the claimed lower bound on the approximation error.

Theorem 2 (Approximation error for functions with jump discontinuities). Let ˆ A Rn n be a normalized adjacency matrix with spectrum {λi}n i=1 where 1 λ1 λn 1. Let f : [ 1, 1] R be some filter function with f = 1 that the model needs to find and suppose it has a jump discontinuity of magnitude h > 0 between consecutive eigenvalues λR and λR+1. For any polynomial p Pd(the space of polynomials of degree at most d), satisfying p 1, the approximation error is:

i=1 |p(λi) f(λi)| h |λR+1 λR| d2. (19)

Proof. By Markov s polynomial inequality (Sahoo & Riedel, 1998):

p d2 p d2, p Pd. (20)

Next using the mean value theorem (Achiezer, 1992), we find that ξxy [x, y] such that:

|p(x) p(y)| = |p (ξxy)| |x y| d2|x y|, x, y [ 1, 1]. (21)

Now, we define the local error at consecutive eigenvalues as:

E2 = |p(λR) f(λR)| + |p(λR+1) f(λR+1)|. (22)

Applying the triangle inequality, we obtain:

E2 |f(λR+1) f(λR)| |p(λR+1) p(λR)|

h |λR+1 λR| d2. (23)

Since it holds that E(p, f) E2, the result follows.

Published in Transactions on Machine Learning Research (04/2025)

Proposition 1 (This follows directly from (Lim et al., 2023)). Polynomial filters are invariant to sign changes and basis choices.

Proof. The key property we use is that the product U µU µ , where U µ is a matrix of eigenvectors associated with eigenvalue µ, remains invariant under sign changes and different choices of basis (Lim et al., 2023). This implies that regardless of which orthonormal basis is chosen for a given eigenspace, the product U µU µ stays the same.

Consider two orthonormal bases U µ and V µ for the eigenspace corresponding to eigenvalue µ. There exists an orthogonal matrix Q such that: V µ = U µQ. (24)

Using this, we show the invariance:

V µV µ = (U µQ)(U µQ) = U µQQ U µ = U µU µ . (25)

where we used the fact that QQ = I since Q is orthogonal. This confirms that U µU µ is invariant under the change of basis.

Now, consider the matrix power ˆ A p, which can be expressed as:

i=1 λp i uiu i =

i=1 (λ i)p U λ i U λ i, (26)

where λ i are the distinct eigenvalues, and U λ i are the corresponding eigenvector matrices.

Since each term U λ i U λ i is invariant to basis choices and each term uiu i = ( ui)( ui) is invariant to

sign changes, it follows that ˆ A p is also invariant to both.

Proposition 2. Constant filters are invariant to sign changes and basis choices.

Proof. Consider the constant filter T k defined over the interval [ak, ak+1) for some k. Let tk be the index such that λ tk = λak, and thus λ tk+1 1 = λak+1 1.

Suppose l, r are indices such that λl 1 = λl = λl+1 = = λr 1 = λr. According to Alg. 1, no significant point i will be selected with l < i < r, ensuring that constant eigenvalue intervals are not split. Consequently, λ tk = λak 1 when ak > 1, and λ tk+1 1 = λak+1 when ak+1 n.

The distinct eigenvalues of ˆ A within the interval [λak, λak+1 1] are:

λ tk, λ tk+1, . . . , λ tk+1 1. (27)

Using Equation 6, we express T k as:

T k = U [:,ak:ak+1]U [:,ak:ak+1] =

i=ak uλiu λi =

i=tk U λ i U λ i. (28)

Since each term U λ i U λ i is invariant to basis choices and each term uiu i = ( ui)( ui) is invariant to sign changes, it follows that T k is also invariant to both.

B Parameter Sensitivity Analysis of Algorithm 1

We further investigate the impact of the window size (w) and the number of limits (K) from Alg. 1 on model performance. To isolate the effect of each parameter, we conduct controlled experiments where we fix one parameter at its optimal value while varying the other. Specifically, when examining window size effects, we fix K at its optimal value for each dataset, and when studying the number of spectral intervals,

Published in Transactions on Machine Learning Research (04/2025)

10 20 30 40 50 60 Window Size (w)

Test Accuracy (%)

Chameleon Accuracy

Figure 6: Effect of w on Chameleon dataset performance with optimal K.

5 10 15 20 25 30 Number of Spectral Intervals (K)

Test Accuracy (%)

Chameleon Accuracy

Figure 7: Effect of K on Chameleon dataset performance with optimal w.

10 20 30 40 50 60 Window Size (w)

Test Accuracy (%)

Cora Accuracy

Figure 8: Effect of w on Cora dataset performance with optimal K.

5 10 15 20 25 30 Number of Spectral Intervals (K)

Test Accuracy (%)

Cora Accuracy

Figure 9: Effect of K on Cora dataset performance with optimal w.

we fix w at its optimal value. As shown in Fig. 6 and Fig. 7, the Chameleon dataset exhibits distinctive parameter sensitivities. Performance peaks at w = 20 with an accuracy of 76.30%, followed by a sharp decline and stabilization around 74.70% for larger window sizes. For the number of spectral intervals, we observe a dramatic performance increase from K = 2 to K = 10, with optimal performance in the range 10 K 15 (reaching 75.84%), followed by a gradual decrease as K increases further. In contrast, Cora (Fig. 8 and Fig. 9) shows more stability across different window sizes with an optimal value at w = 40 reaching 89.13% accuracy. For spectral intervals, Cora demonstrates a similar pattern to Chameleon but with less pronounced differences, showing peak performance at 10 K 12.

These results indicate that the number of spectral intervals (K) significantly influences model performance, whereas window size (w) has a more limited effect. Too few partitions fail to capture important spectral characteristics, while too many may introduce noise.

C Computational Efficiency Comparison

To evaluate the computational efficiency of our approach, we conduct a running time analysis comparing the constant filter component of Pie Co N with Jacobi Conv s polynomial filters.

Published in Transactions on Machine Learning Research (04/2025)

2 5 10 20 30 40 50 60 70 80 90 100 Parameter Value (Degree / Number of Spectral Intervals (K))

Average Epoch Time (Seconds)

Jacobi Conv (Degree) Pie Co N (Number of Spectral Intervals (K))

Figure 10: Computation time of Pie Co N vs. Jacobi Conv.

Figure 10 illustrates the average epoch time (in seconds) as we increase the number of spectral intervals (K) in Pie Co N compared to increasing the polynomial degree in Jacobi Conv. For a fair comparison, we disabled the polynomial part in the Pie Co N implementation for this experiment.

The results show that Pie Co N s constant filters require less computation time than high-degree polynomial filters. At parameter value K = 100, Pie Co N takes approximately 0.05 seconds per epoch compared to Jacobi Conv s 0.13 seconds. This efficiency comes from our direct spectral interval filtering approach, which applies filters to specific eigenvalue intervals rather than performing sequential matrix multiplications needed for higher-degree polynomials.

D Analysis of Graph Structure and Eigenvalue Zero

Figure 11: Simple graph with duplicate subgraphs. After adding the duplicates, the multiplicity of eigenvalue 0 increases from 0 to 3.

The presence of eigenvalue 0 in graph spectra reveals important structural properties that many polynomial-based GNN methods overlook. Real-world datasets often exhibit high multiplicity of eigenvalue 0 (Table 1), yet methods like Jacobi Conv (Wang & Zhang, 2022), Bernnet (He et al., 2021), and Chebynet (Defferrard et al., 2016), which use low-degree polynomials of the normalized adjacency matrix ˆ A, do not adequately capture these properties.

Theorem 3 (Banerjee (2008)). Let J be a graph and H be a subgraph of J with eigenvalue 0. Then V (H) = p1, p2, . . . , pm V (J) and E(H) E(J) consists of edges between vertices in V (H). We define JH to be the graph obtained from J by attaching a copy of H as follows: (1) add a new set of vertices q1, q2, . . . , qm to J, where each qi corresponds to pi in H; (2) add edges between vertices in q1, q2, . . . , qm that mirror the edges in H; (3) for each i 1, 2, . . . , m and each vertex r V (J) \ V (H), add an edge qi, r if there exists an edge pi, r in J.

Then, the graph JH has an eigenvalue 0 with an associated eigenvector u that is nonzero only at the nodes pi and qi. Furthermore upi = uqi.

Theorem 3 reveals that when a graph contains duplicate substructures, it leads to eigenvalue 0 with eigenvector localized to specific node sets. This localization property is particularly relevant for node classification and community detection tasks, as nodes with similar structural roles often share the same labels (Fig. 11).

In analyzing how the negative and positive parts of T k affect and change the structure of the graph, we consider a simple graph with duplicate subgraphs, as illustrated in Fig. 11. In this graph, nodes with the same labels are duplicates. According to Theorem 3, these duplicates create eigenvalues equal to 0 in the

Published in Transactions on Machine Learning Research (04/2025)

Figure 12: Simple graph with added positive edges.

Figure 13: Simple graph with added negative edges.

Table 4: Hyperparameter ranges used for optimization.

Hyperparameter Values

Learning Rate (lr) 0.0005, 0.001, 0.005, 0.01, 0.05 Weight Decay (weight_decay) 0.0, 5e-5, 1e-4, 5e-4, 1e-3 Feature Dropout (feat_dropout) 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 Number of Layers (nlayer) 1, 2, 3, 4, 5 Hidden Dimension (hidden_dim) 16, 32, 64 Window size for Algorithm 1 (window_size) 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 Number of spectral intervals (num_limits) 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20

eigendecomposition of the normalized Adjacency matrix. Let U 0 denote the eigenvectors corresponding to the eigenvalue 0. We can decompose R0 = U 0U 0 into its negative (R 0 ) and positive parts (R+ 0 ). Using these parts, we construct two graphs by choosing the edges with the highest score in these matrices. The graphs resulting from this process, including original and added edges, are shown in Figures 12 and 13.

From these graphs, we observe that both negative and positive edges identify connections between duplicate motifs. The negative edges also reveal connections between duplicate nodes within these duplicate motifs, highlighting their role in capturing structural similarities.

Another intuition to split T k is that, for example, the second eigenvector provides a direction that best separates the graph into two groups while minimizing the connections cut between them. The positive and negative values show two clusters that are internally connected but separated from each other (Fiedler, 1973; Von Luxburg, 2007).

E Hyperparameter Optimization and Running Environment

All experiments were carried out on a Linux machine with an NVIDIA A100 GPU, Intel Xeon Gold 6230 CPU (20 cores @ 2.1GHz), and 24GB RAM. Hyperparameter tuning was performed using the Hyperopt Tree of Parzen Estimators (TPE) algorithm (Bergstra et al., 2011) with the hyperparameter ranges shown in Table 4.

The Adam optimizer was used for training with 2,000 epochs. Hyperparameters were selected to achieve the best performance on a validation set.