# signed_laplacians_for_constrained_graph_clustering__581171f2.pdf Signed Laplacians for Constrained Graph Clustering John Stewart Fabila Carrasco 1 He Sun 1 Given two weighted graphs G = (V, E, w G) and H = (V, F, w H) defined on the same vertex set, the constrained clustering problem seeks to find a subset S V that minimises the cut ratio between w G(S, V \ S) and w H(S, V \ S). In this work, we establish a Cheeger-type inequality that relates the solution of the constrained clustering problem to the spectral properties of G and H. To reduce computational complexity, we utilise the signed Laplacian of H, streamlining calculations while maintaining accuracy. By solving a generalised eigenvalue problem, our proposed algorithm achieves notable performance improvements, particularly in challenging scenarios where traditional spectral clustering methods struggle. We demonstrate its practical effectiveness through experiments on both synthetic and real-world datasets. 1. Introduction Clustering is a fundamental technique in machine learning, with extensive applications across computer science and various scientific disciplines. The primary goal of clustering is to partition data points into clusters, such that points within each cluster are more densely connected than those in other clusters. Traditional clustering, such as spectral clustering (von Luxburg, 2007) relies solely on the structure of the data. However, in many real-world scenarios, additional domain knowledge is available, introducing specific constraints that should be incorporated into the clustering process to achieve more accurate and meaningful results (Basu et al., 2008). Constrained clustering focuses on developing algorithms that effectively incorporate this domain knowledge to enhance clustering performance (Wagstaff et al., 2001). The 1School of Informatics, University of Edinburgh, Edinburgh, United Kingdom. Correspondence to: He Sun . Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s). domain knowledge is represented by two types of pairwise constraints: (1) MUST-LINK constraints, requiring that a pair of data points must be assigned to the same cluster, and (2) CANNOT-LINK constraints, requiring that a pair of data points must be assigned to different clusters. In the context of graph clustering, the goal is to partition the vertices of a graph based on edge connectivity while satisfying the given constraints. In this paper, we examine the constrained clustering problem by formalising these constraints using the graphs G = (V, E, w) and H = (V, E , w ), in which every data point corresponds to a graph vertex, every MUSTLINK (resp. CANNOT-LINK) constraint corresponds to an edge in G (resp. H), and the edge weights capture the strength of the user s preference for satisfying the corresponding constraint. For any set S V , we define the cut ratio of S between G and H by cut G H(S, V \ S) = w G(S, V \ S) w H(S, V \ S), (1) and the objective is to find S that achieves ΦG H = min S V cut G H(S, V \ S). We develop an efficient approximation algorithm for the constrained graph clustering problem. The key to our algorithm is a Cheeger-type inequality that upper bounds ΦG H with respect to λ2( G H) and λ2( H), where λ2 G H = min x 1 x, Gx x, Hx , (2) λ2 H = min x 1 x, Hx and G and H are the Laplacian operators of G and H respectively. By introducing several techniques to adjust G and H, we significantly reduce the time complexity for solving the generalised eigenvalue problem, while maintaining the one-to-one correspondence between the solution of the new reduced instance and the initial one. The empirical studies on both the synthetic and real-world data sets confirm that with the two sets of constraints our algorithm presents significantly better performance than the classical spectral clustering algorithm, and the running time of our algorithm is close to traditional spectral clustering methods. Signed Laplacians for Constrained Graph Clustering Related work. Cheeger-type inequalities for constrained graph clustering are studied in the literature. For example, Cucuringu et al. (2016) proved that ΦG H ΦG K 4λ2( G H); this inequality is based on a third graph K, which they call the demand graph. Koutis et al. (2023) showed that ΦG H 16λ2( G H)/Φ(G), where Φ(G) is the standard conductance of G. These two results cannot be directly compared with ours, since both inequalities upper bound ΦG H with respect to λ2( G H) and parameters of H, i.e., ΦG K in (Cucuringu et al., 2016) and Φ(G) in (Koutis et al., 2023). In contrast, we upper bound ΦG H with respect to λ2( G H) and λ2( H). Trevisan (2013) studied the computational complexity of the problem and proved that under the Unique Games Conjecture, it s impossible to find a cut that achieves approximation in polynomial time. Our work also relates to the studies on constrained graph clustering from practical perspectives (Jia et al., 2021; Wang & Davidson, 2010; Wang et al., 2014) and signed cuts using the signed Laplacian (Knyazev, 2017). Most of these studies, however, lack the rigorous analysis of the quality of the resulting clusters compared to the optimal solution. Our work is further linked to Cheeger-type inequalities for different graph Laplacians (Lange et al., 2015; Li et al., 2019) including the signed Laplacian (Atay & Liu, 2020), and their higher-order generalisations (Lee et al., 2014). Contribution. We propose a novel method for constrained graph clustering that establishes a Cheeger-type inequality explicitly linking the cut ratio objective to the spectral properties of two graphs G and H. Our approach introduces a signed Laplacian based implementation, which avoids additional parameters while improving both numerical stability and computational efficiency. Finally, our experiments on synthetic and real-world datasets confirm the robustness and scalability of our algorithm, significantly outperforming standard spectral clustering under challenging scenarios. 2. Background & Preliminaries We consider a finite, undirected graph G = (V, E), where V is the set of vertices and E is the set of edges. Each edge uv E denotes an undirected connection between vertices u and v. A self-loop in this context is represented by uu, indicating an edge that starts and ends at the same vertex u. The notation u v means that u and v are connected by an edge. We define weights in the graph G through the function w : E R+, where wuv = wvu specifies the weight of the edge between vertices u and v. By definition, each self-loop uu contributes twice to the degree of vertex u. For E0 E, we interpret w as a discrete measure on the corresponding sets, using the notation w(E0) = P For V0, V1 V , we denote the set of unoriented edges between V0 and V1 as E(V0, V1) = {uv E | u V0 and v V1}. We denote by Ev the set of all edges connected to v, and Nv the neighbourhood of v as the set of vertices adjacent to v, i.e., Ev = E({v}, V ) and Nv = {u V | v u}. The edge weights on a graph determine a weighted degree of a vertex, defined by deg(v) = w(Ev) = P e Ev we. The Laplacian. We define some standard spaces related to a finite and weighted graph G = (V, E) with weight function w. We define the Hilbert spaces ℓ2(V, w) and ℓ2(E, w) as ℓ2(V, w) = {φ : V R}, ℓ2(E, w) = {η : E R}. Furthermore, we consider the natural inner product for these spaces. For ℓ2(V, w), the inner product between functions φ and ψ is defined as φ, ψ V = P v V φ(v)ψ(v) deg(v). For ℓ2(E, w), the inner product between functions η and ξ is defined as η, ξ E = P e E ηeξewe. Let G = (V, W, w) be a weighted graph. The derivative d is defined as d : ℓ2(V, w) ℓ2(E, w), (dφ)e=(u,v) = φ(u) φ(v). The adjoint d : ℓ2(E, w) ℓ2(V, w) is given by (d η)(v) = 1 deg(v) The weighted Laplacian : ℓ2(V, w) ℓ2(V, w) is defined as = d d, and acts as ( φ)(v) = 1 deg(v) φ(v) φ(u) wuv. Let G = (V, E, w) be a weighted graph, and for all φ ℓ2(V, w) we have that φ, Gφ = φ, d dφ ℓ2(V ) = dφ, dφ ℓ2(E) u v |dφuv|2wuv = X u v (φ(u) φ(v))2wuv. Graph Signature. A signature of a graph G = (V, E) is a map α : E {+1, 1}, which assigns a sign to each edge. Let G = (V, E, w) be a weighted graph with a signature α. The signed Laplacian, denoted as α, is a linear operator α : ℓ2(V, w) ℓ2(V, w), defined by ( αφ)(v) = 1 deg(v) φ(v) αvuφ(u) wvu, where wuv is the weight of the edge uv, and αuv indicates the sign of the edge as given by the signature α. Observe that the Laplacian is a particular case of the signed Laplacian by Signed Laplacians for Constrained Graph Clustering taking αuv = 1 for all edges and the signless Laplacian by taking αuv = 1 for all edges. The signed Laplacian can be viewed as a special case of the magnetic Laplacian with a discrete magnetic potential taking values in {0, π}. The operators α (and therefore ) are positive, semi-definite, and self-adjoint, hence all of their eigenvalues are real and non-negative. 3. Algorithm & Analysis In this section, we present a constrained graph clustering algorithm called CC++. At a high level, our algorithm consists of the following: in the preprocessing step, we adjust the edge weights of G and add self-loops to the vertices of G, such that both of G and H have the same degree sequence. Then, we show that a desired cut can be found by a sweep-set algorithm when the eigenvector corresponding to a generalised eigenvalue problem is given as input. Taking the practical implementation into account, we introduce a negative self-loop in H for efficient computation, and justify its performance both in theory and in practice. Due to page limitations, proofs omitted from this section can be found in the appendix. 3.1. Preprocessing G and H In the preprocessing step, our algorithm first scales the weights of all the edges in G by the same factor, and adds self-loops to the resulting graph G. These two operations ensure that the constructed graph e G and H have the same degree sequence, while maintaining the optimal cut of the input instance. Scaling the graph G. For any c R+, we scale the edge weights of G by a factor of c and define G(c) = (V, E, c w), where (c w)e = c we for each edge e. By (1), we have that cut G(c) H (S, V \ S) = w G(c)(S, V \ S) w H(S, V \ S) = c w G(S, V \ S) w H(S, V \ S) = c cut G H(S, V \ S). Thus, the minimum cut problem for the scaled graph can be expressed as ΦG(c) H = c ΦG H. This equivalence indicates that choosing an appropriate scaling factor c is crucial for balancing the edge weights between G and H. To ensure that the degrees of the vertices in the scaled graph G(c0) do not exceed those in H, we define the scaling factor c0 by c0 = min v V ( deg H(v) deg G(v) This choice of c0 guarantees that for the scaled graph G(c0), the degree of each vertex v V satisfies that deg G(c0)(v) deg H(v) for all v V . With this scaling factor c0 established, we now proceed to study the properties of the graph G = (V, E, w) and its corresponding minimum cut value ΦG H in comparison to H = (V, E , w ), under the assumption that deg G(v) deg H(v) for all v V . Equalising the Degrees of G. To further refine this comparison, consider the subset of vertices V0 = v V | deg G(v) < deg H(v) . We now construct a new graph e G = (V, e E, ew), where e E = E {(v, v)}v V0, and the weight function ew is defined by ew |E= w and ewvv = deg H(v) deg G(v) Observe that the construction ensures that deg e G(v) = deg H(v) for all v V . Indeed, we have that deg e G(v) = X u: u v ewuv = X u: u v wuv + 2 X (v,v) ew(v, v) = deg G(v) + (deg H(v) deg G(v)) = deg H(v). This modification of G demonstrates that, despite the additional restriction imposed by c0, the problems remain equivalent. Specifically, we observe that ΦG H = min S V w G(S, V \ S) w H(S, V \ S) = min S V ew(S, V \ S) ew H(S, V \ S) = Φ e G H. Therefore, the generalized cut problem for G and H remains equivalent to that of e G and H, despite the degree adjustments made in e G. The following remark will be used in our analysis. Remark 3.1. For the weighted graph G = (V, E, w), the previous e G = (V, e E, ew) where e E includes additional self-loops at some vertices, and any function φ : V (G) = V ( e G) R, the following holds: adding self-loops does not affect the quadratic form associated with the graph Laplacian, i.e., φ, Gφ = φ, e Gφ . Specifically, u Gv |φ(u) φ(v)|2wuv u e Gv |φ(u) φ(v)|2 ewuv = φ, e Gφ . However, the addition of self-loops does affect the norm in the space of vertices: φ, φ ℓ2(V,w) = X v V φ(v)2 deg G(v) v V φ(v)2 deg e G(v) = φ, φ ℓ2(V, e w). Signed Laplacians for Constrained Graph Clustering Therefore, while the first quadratic form remains unchanged, the vertex norm in the extended space is generally increased due to the additional self-loops. 3.2. A Cheeger-type Inequality for Constrained Clustering Next we relate the constrained clustering problem to the generalised eigenvalue problem. Our key result is a Cheegertype inequality proving that the value of ΦG H can be upper bounded with respect to λ2( G H) and λ2( H), and the cut with the proven approximation guarantee can be found by a sweep-set algorithm. Our result is as follows: Theorem 3.2. Let G = (V, E, w) and H = (V, E , w ) be graphs such that deg G(v) = deg H(v) for all v V . Then, it holds that λ2( G H) λ2( H), (5) where H denotes the normalised Laplacian of the graph H, G H is the operator given by G H(x) = x, Gx and λ2 is the smallest non-trivial eigenvalue of the corresponding operator defined in (2). Moreover, the cut achieving this approximation guarantee can be found with a sweepset algorithm. Notice that we can assume that G is a connected graph. If H has m connected components, we will work with λm 1( H), i.e. x 1C for each connected component, ensuring x ker( H). Before presenting the proof, notice that we can assume without loss of generality that the graphs G and H have the same degree sequence due to the preprocessing step. Proof Sketch of Theorem 3.2. At a high level, our proof is similar with the one of the Cheeger inequality for graphs. We present the proof sketch here, and the full proof can be found in Appendix A. Step 1. Let x1 x2 . . . xn be the eigenvector associated with λ2( G H). It suffices to prove the existence of a non-empty, proper subset S V such that w G(S, V \ S) w H(S, V \ S) = 4 x, Hx 2 . (6) This suffices because the above inequality implies Step 2. We reduce the problem to proving (6) for a scaled version of x, defined as z = cx, where z2 n + z2 1 = 1. The scaling ensures the invariance of the expression: z, Gz z, Hz z, z z, Hz = c2 x, Gx c2 x, Hx c2 x, x c2 x, Hx x, Gx x, Hx x, x x, Hx . Thus, the next equation is sufficient to establish (6): w G(S, V \ S) w H(S, V \ S) = 4 z, Hz 2 . (7) Step 3. We now consider sweep sets St, defined as St = {v V | zv t}. To establish (7), it suffices to prove that there exists a threshold t0 [z1, zn] and defining S = St0: w G(St0, V \ St0) w H(St0, V \ St0) 4 z, Gz z, Hz z, z z, Hz . (8) Step 4. To establish (8), we will find a probability density function over [z1, zn], and prove E (w G(St, V \ St)) E (w H(St, V \ St)) 4 z, Hz 2 . (9) Thus, (8) holds because the existence of a threshold t0 is guaranteed by the next equation: w G(St0, V \ St0) w H(St0, V \ St0) E (w G(St, V \ St)) E (w H(St, V \ St)), and this implies ( w G(St0, V \ St0) w H(St0, V \ St0) 4 Step 5. To establish (9), it suffices to find a probability distribution over [z1, zn] such that E (w G(St, V \ St)) E (w H(St, V \ St)) u Gv |zu zv|(|zu| + |zv|)wuv u Hv |zu zv|2 2 wuv . (10) Using the Cauchy-Schwarz inequality, the definition of , and the property deg G(v) = deg H(v), we can derive (9) Signed Laplacians for Constrained Graph Clustering from (10) as follows: u Gv |zu zv| (|zu| + |zv|) wuv u Gv |zu zv|2 wuv X u Gv (|zu| + |zv|)2 wuv Thus, finding a distribution that satisfies (10) is sufficient to complete the proof. Step 6. We define a distribution, and choose t according to the probability density function 2|t|. Specifically, the probability that a value between [a, b] is chosen is P[t [zv, zu]] = Z zu zv 2|t|dt = sgn(zu) z2 u sgn(zv) z2 v. Since z2 1 + z2 n = 1, we have that P[t [z1, zn]] = 1. Step 7. For this distribution, and regardless of the sign of zu and zv, we have E [w G(St, V \ St)] = X u Gv P [zu t and t < zv] wuv u Gv |zu zv|(|zu| + |zv|)wuv, E [w H(St, V \ St)] = X u Hv P [zu t and t < zv] wuv This establishes (10) and concludes the proof. The complete details are in the appendix. Based on the proof of Theorem 3.2, to find the cut with the guaranteed approximation, we only need to order the vertices based on the entries of the eigenvector for the generalised eigenvalue problem, and construct n sweep sets. See Algorithm 1 for the formal description of our algorithm. Remark 3.3. Theorem 3.2 can be viewed as a generalisation of the classical Cheeger inequality for graphs (Chung, 1997). Specifically, if we consider the graph H as the complete Algorithm 1 The Constrained Clustering Algorithm Input: Graph G and graph H Output: A bi-partition of the vertex sets Compute the scaling factor c0 defined in (4). Scale all edge weights in G by multiplying them with c0. for each vertex v V do if deg H(v) > deg G(v) then Add a loop with weight 1 2(deg H(v) deg G(v)). end if end for Compute the Laplacians G and H for the graphs G and H. Solve the generalised eigenvalue problem f, Gf f, Hf subject to f 1, (11) where f is the eigenvector that minimises the ratio. Apply a sweep-set algorithm on the eigenvector f to partition the vertices of G into two clusters. Return: a bi-partition of the vertex set graph with wuv = 1 for all edges, then it is straightforward to show that min S V w G(S, V \ S) |S| |V \ S| 4 q where λ2( G) is the second smallest eigenvalue of the normalised graph Laplacian of G. Similarly, if we consider the graph H = (V, E , w H) as the complete graph with self-loops where w H uv = deg G(u) deg G(v) vol(G) , then min S V w G(S, V \ S) min(vol(S), vol(V \ S)) min S V vol(G) w G(S, V \ S) vol(S) vol(V \ S) 3.3. Practical Considerations The proof of Theorem 3.2 not only establishes the general cut bound but also provides a constructive method to find a subset S V that is close to minimising the generalised cut problem. However, this approach can be computationally expensive, particularly because the Laplacian H is not invertible. To address this issue, we modify the graph H by adding a negative self-loop at any vertex, effectively making the Laplacian invertible. This modification leverages the signed Laplacian, which adjusts the operator to ensure invertibility, and the introduction of a negative selfloop has little impact on the overall results, which will be demonstrated in Section 4 through experiments. Signed Laplacians for Constrained Graph Clustering Formally, we prove that adding a negative self-loop to H makes the signed Laplacian H α invertible, ensuring λ1( H α ) > 0. Since λ1( H α ) λ2( H), we can replace λ2( H) with λ1( H α ) in Theorem 3.2, maintaining the theorem s validity. Lemma 3.4. Let H = (V, E, w) be a weighted graph, and let H = (V, E , w ) be another weighted graph such that E = E {(v0, v0)}, where v0 is a vertex with a self-loop. Assume that w |E = w and consider the signature s = 1 for all E and s(v0,v0) = 1. Then, 0 < λ1( H α ) λ2( H), where H is the Laplacian of graph H. The proof of Lemma 3.4 shows that g, H α g g, Hg for a small weight in the self-loop, then we will solve (11) for the self-loop, because f, Gf f, H α f f, Gf The problem involves identifying the eigenfunction and eigenvalue of a linear operator using a Lagrangian-based framework. The Lagrangian L(φ, λ) is defined as L(φ, λ) = Gφ, φ λ( H α φ, φ 1), where φ is the function to be optimised, and λ is the Lagrange multiplier. To find the minimiser φ, we set the gradient of L with respect to φ to zero, i.e., φL(φ, λ) = 0. Expanding this condition yields that 2 Gφ 2λ H α φ = 0, which simplifies to Gφ = λ H α φ. This formulation leads to a generalised eigenvalue problem where φ is the eigenfunction, and λ is the eigenvalue. If H α is invertible, the equation can be reformulated as ( H α ) 1 Gφ = λφ, illustrating the relationship between the linear operators and providing a solution to the eigenvalue problem via the Lagrange multiplier method. This eigenvalue equation is crucial for extracting the optimal partitions of the graph based on the constraints encoded within H α . We prove that solving the generalised eigenvalue problem for G and H α produces all feasible solutions, as all eigenvalues are real and non-negative. This contrasts with the approach by Wang et al. (2014). Lemma 3.5. Let H α be the signed Laplacian of the weighted graph H , and G the normalized Laplacian of the weighted graph G. The operator ( H α ) 1 G is a positive, self-adjoint operator, with all its eigenvalues being real and non-negative. We remark that solving (11) becomes significantly more efficient for H α because it is symmetric positive definite and invertible. For dense matrices, this property allows for the use of the Cholesky decomposition, reducing the problem to a standard eigenvalue problem (Saad, 2011). This is an improvement over the general case for the QZ algorithm (the generalised Schur decomposition). Moreover, the Cholesky decomposition enhances numerical stability, leading to fewer round-off errors. For large, sparse graphs and positive definite operators, iterative methods such as the Lanczos algorithm could be used. The complexity of these solvers is O(nkm), where k is the number of eigenvalues to find and m related to the iterations required for convergence and the condition number. 4. Experiments We conducted experiments to compare the spectral clustering method with the constrained clustering approach, using both synthetic and real-world datasets. The clustering accuracy was evaluated using the Adjusted Rand Index (ARI). All simulations were run on a PC equipped with an Intel Core i7-10610U CPU running at 1.80 GHz and 32 GB of RAM, using MATLAB R2024a for computation. The three clustering algorithms compared in our experiments are as follows: SPECTRAL CLUSTERING (SC): We computed the normalized Laplacian G and used its second smallest eigenvector (Fiedler vector) for clustering the vertices. CONSTRAINED CLUSTERING (CC): We solved (11) and used the eigenvector corresponding to the smallest positive eigenvalue (excluding the trivial zero eigenvalue) for clustering. CONSTRAINED CLUSTERING WITH NEGATIVE SELFLOOPS (CC++): Our algorithm consisted in adding a negative self-loop and solved (11) for the signed Laplacian. 4.1. Stochastic Block Model We considered a binary Stochastic Block Model (SBM) with n = 1, 000 vertices divided into two equal-sized communities. Edges between vertices were generated based on intra-cluster probability p and inter-cluster probability q. Specifically, we fixed p = 0.2 and varied q from 0.12 to 0.2 in 30 equidistant steps. For each value of q, we generated two graphs generated from the SBM: G: a graph with intra-cluster edge probability p and intercluster edge probability q. H: a graph generated with intra-cluster edge probability q and inter-cluster edge probability p, effectively the complement of G in terms of edge probabilities. The vertex labels were kept consistent between G and H. Signed Laplacians for Constrained Graph Clustering This variation allows us to observe how the clustering performance changes as the distinction between communities becomes less pronounced (since higher q implies more intercluster edges). The experimental results, visualised in Figure 1, illustrate the superior performance of CC++ compared to traditional SC, particularly as the inter-cluster edge probability q increases. At low values of q both methods perform well, achieving near-perfect ARI values. As seen in Figure 1, both methods maintain the ARI values close to 1.0 when q 0.14. This is expected, as the strong intra-cluster edge probability p = 0.2 dominates the inter-cluster connections, making the community structure relatively easy to detect. However, as q increases, the performance of SC deteriorates rapidly. For instance, between q = 0.16 and q = 0.18, the ARI for SC drops sharply from approximately 0.7 to near 0.1. This decline occurs because higher values of q increase the number of inter-cluster edges, blurring the distinction between communities. SC relies solely on the structure of G, and struggles to correctly partition the vertices under these conditions. In contrast, both of CC and CC++ are significantly more robust to increasing q. Even as q approaches 0.17, the ARI remains above 0.5, significantly outperforming SC in this regime. This robustness stems from the ability of the generalised eigenvalue method to leverage the structural information of both G and H, the method mitigates the negative impact of increased inter-cluster edges, thus maintaining better clustering accuracy even when the community structure is less pronounced. This set of experimental results clearly demonstrates that the CC++ algorithm outperforms the traditional SC, particularly in challenging scenarios where the inter-cluster edge probability q is high. Figure 2 compares the mean execution time of SC, CC, and CC++ as the number of vertices increases. For smaller graphs, CC++ has nearly the same runtime as standard spectral clustering (SC), indicating minimal overhead from incorporating the second graph s constraints. As the graph size grows, however, CC++ clearly scales more efficiently than the original CC: the runtime of CC++ increases only modestly with n, remaining close to SC s runtime, whereas CC s runtime rises sharply. This demonstrates that CC++ retains the computational efficiency of SC while handling larger graphs far more effectively than CC, underscoring its superior scalability. 4.2. Varying Cluster Distance Based on the Geometric Random Graphs (RGG) (Avrachenkov et al., 2021; Dall & Christensen, 2002), we generated two clusters of vertices, 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 Inter-cluster Edge Probability (q) Adjusted Rand Index (ARI) Mean ARI vs q with Error Bars Figure 1. Mean ARI versus inter-cluster edge probability q with error bars. CC (red) and CC++ (blue) consistently outperform SC (yellow), especially as q increases. each containing 500 points, randomly distributed within two-dimensional circular regions (disks) of radius 0.2. The separation between the clusters centroids varied between -0.35 and 0.35, in 25 equidistant steps. This range allows us to simulate different levels of overlap between the clusters. Each configuration of cluster separation was repeated 10 times, and we generated G and H on the same set of vertices but with different connectivity structures: G: vertices within the same cluster were connected with a large radius rintra = 0.1, and vertices between clusters were connected with a smaller radius rinter = 0.05. H: the intra-cluster connection radius is reduced to rintra = 0.05, while the inter-cluster radius is increased to rinter = 0.1. Edges more likely connect vertices across the two clusters Figure 3 shows the RGG used in the experiments. The cluster distance is varied to simulate different levels of overlap between clusters. Figure 4 shows the ARI scores for both methods across different cluster distances. Each point represents the mean ARI, with error bars indicating the standard error across 10 repetitions. When the cluster distance is large (above 0.3), both methods achieve near-perfect ARI values. The separation between the clusters is clear, and both algorithms can detect the underlying structure accurately. As the clusters get closer, SC shows a significant drop in performance. For distances near zero, where clusters overlap, the ARI scores for SC drop to nearly zero, indicating its struggle to differentiate overlapping clusters. In contrast, the generalised eigenvalue method is more resilient to cluster overlap, maintaining significantly higher ARI values even when the clusters become indistinguishable by conventional means. This robustness Signed Laplacians for Constrained Graph Clustering 500 1000 1500 2000 Number of Vertices Mean Execution Time (seconds) Mean Execution Time vs Number of Vertices Figure 2. Mean execution time versus the number of vertices. The plot shows that CC++ (blue) performs similarly to SC (yellow) for smaller graphs but scales more efficiently than CC (red) as the number of vertices increases. Figure 3. Clusters generated by a RGG with varying separation. is due to the method s ability to leverage information from both graphs G and H, capturing both intraand inter-cluster relationships. 4.3. Experiments with Temperature Data We evaluated SC and CC++ on real-world temperature data from ground stations in Brittany, January 2014 (Girault, 2015). The experiments considered three data types: temperature, maximal temperature, and minimal temperature. Temperature values can be seen as graph signals (Ortega et al., 2018), with vertices representing readings. The objective is to cluster stations by proximity and similar temperature patterns, combining spatial and temperature data. This approach can be applied to identify micro climates (Cao et al., 2021) and segment regions for agricultural (Yao et al., 2022), where both location and temperature are important -0.3 -0.2 -0.1 0 0.1 0.2 0.3 Cluster Distance Adjusted Rand Index (ARI) Mean ARI vs Cluster Distance with Standard Error Bars Figure 4. Comparison of ARI vs Cluster Distance for SC, CC, and CC++. We construct a graph from the input data as follows: every station is represented as a vertex, and the edge weight is defined by spatial similarity via a Gaussian kernel: Wij = exp d(i,j)2 if d(i, j) σ2, and 0 otherwise, where d(i, j) is the Euclidean distance. Parameters were set to σ2 1 = 5 108 and σ2 = 105 (Girault, 2015). SC primarily grouped stations by proximity (Figure 5a). To construct the constraint graph H, we use the temperature values and an inverse Gaussian kernel, assigning edge weights close to 1 for large temperature differences and 0 for similar temperatures, analogous to how G is built using spatial proximity. This allows the clustering algorithm to incorporate both geographical and temperature constraints for a more nuanced partition. The output of CC++ is shown in Figure 5b, where only two cities differ from the clustering based on spatial proximity alone (SC). Finally, we compute the mean and standard error (SE) for every clustering method. For this specific hour, SC shows overlap between clusters temperature values, whereas CC++ achieves no overlap, indicating better clustering of stations with similar temperatures (Figure 5c). Figure 5. (a) Clustering based on SC, grouping stations by geographical proximity. (b) Clustering using CC++, which considers both location and temperature similarity. (c) Mean and SE of temperature, illustrating no overlap for CC++ compared to SC. Signed Laplacians for Constrained Graph Clustering We repeated this process for all available measurements, one per hour, across 24 hours over 31 days, totaling 744 observations. To evaluate clustering accuracy, we calculated the mean and SE for clustering using both methods. A separation was considered correct if the clusters values did not overlap, as determined by their SE. Table 1 summarises the results, showing the percentage of correct separations for each temperature data type when comparing SC and CC++, and demonstrates that CC++ consistently outperformed CC across all temperature data types. This can be attributed to the method s ability to leverage both the spatial structure of the stations and the actual temperature data. In contrast, SC, which relies solely on the graph structure, struggled to accurately separate stations into distinct clusters when the temperature differences were subtle. Table 1. Percentage of successful separation of regions using temperature and location data. Data Type SC CC CC++ Temperature 63.30% 79.16% 79.16% Maximal Temperature 62.90% 80.91% 81.04% Minimal Temperature 62.63% 79.16% 77.95% 5. Conclusion In this paper, we introduced a novel spectral method for the constrained clustering problem that incorporates MUSTLINK and CANNOT-LINK constraints on the same vertex set. We established a Cheeger-type inequality that provides a theoretical approximation guarantee, relating the optimal constrained cut value to the spectral properties of the graphs. Building on this insight, we proposed an efficient algorithm (CC++) that solves a generalised eigenvalue problem involving the signed Laplacian of the constraint graph, which significantly reduces computational complexity while maintaining accuracy. Empirically, CC++ demonstrated superior clustering performance on both synthetic and real-world datasets, particularly in challenging scenarios with weak or noisy cluster structure. It also achieved runtime comparable to standard spectral clustering (SC) and scaled much better than the original constrained clustering method (CC). These results highlight the significance of our proposed CC++ method as a principled and practical solution for constrained graph clustering, bridging the gap between theoretical guarantees and scalable real-world performance. Impact Statement This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here. Acknowledgements This work is supported by EPSRC Early Career Fellowship (EP/T00729X/1). Atay, F. M. and Liu, S. Cheeger constants, structural balance, and spectral clustering analysis for signed graphs. Discrete Mathematics, 343(1):111616, 2020. Avrachenkov, K., Bobu, A., and Dreveton, M. Higherorder spectral clustering for geometric graphs. Journal of Fourier Analysis and Applications, 27(2):22, 2021. Basu, S., Davidson, I., and Wagstaff, K. Constrained clustering: Advances in algorithms, theory, and applications. Chapman and Hall/CRC, 2008. Cao, J., Zhou, W., Zheng, Z., Ren, T., and Wang, W. Withincity spatial and temporal heterogeneity of air temperature and its relationship with land surface temperature. Landscape and Urban Planning, 206:103979, 2021. Chung, F. R. K. Spectral Graph Theory. American Mathematical Society, 1997. Cucuringu, M., Koutis, I., Chawla, S., Miller, G., and Peng, R. Simple and scalable constrained clustering: a generalized spectral method. In Artificial Intelligence and Statistics, pp. 445 454, 2016. Dall, J. and Christensen, M. Random geometric graphs. Physical review E, 66(1):016121, 2002. Girault, B. Stationary graph signals using an isometric graph translation. In 2015 23rd European Signal Processing Conference (EUSIPCO), pp. 1516 1520, 2015. Jia, Y., Hou, J., and Kwong, S. Constrained clustering with dissimilarity propagation-guided graph-laplacian pca. IEEE Transactions on Neural Networks and Learning Systems, 32(9):3985 3997, 2021. Knyazev, A. V. Signed Laplacian for spectral clustering revisited. ar Xiv preprint ar Xiv:1701.01394, 1, 2017. Koutis, I., Miller, G., and Peng, R. A generalized Cheeger inequality. Linear Algebra and its Applications, pp. 139 152, 2023. Lange, C., Liu, S., Peyerimhoff, N., and Post, O. Frustration index and Cheeger inequalities for discrete and continuous magnetic Laplacians. Calculus of variations and partial differential equations, 54:4165 4196, 2015. Signed Laplacians for Constrained Graph Clustering Lee, J. R., Gharan, S. O., and Trevisan, L. Multiway spectral partitioning and higher-order cheeger inequalities. Journal of the ACM (JACM), 61(6):1 30, 2014. Li, H., Sun, H., and Zanetti, L. Hermitian Laplacians and a Cheeger inequality for the Max-2-Lin problem. In 27th Annual European Symposium on Algorithms, pp. 71:1 71:14, 2019. Ortega, A., Frossard, P., Kovaˇcevi c, J., Moura, J. M., and Vandergheynst, P. Graph signal processing: Overview, challenges, and applications. Proceedings of the IEEE, 106(5):808 828, 2018. Saad, Y. Numerical methods for large eigenvalue problems: revised edition. SIAM, 2011. Trevisan, L. Is Cheeger-type approximation possible for nonuniform sparsest cut? Co RR, abs/1303.2730, 2013. von Luxburg, U. A tutorial on spectral clustering. Statistics and computing, 17:395 416, 2007. Wagstaff, K., Cardie, C., Rogers, S., Schr odl, S., et al. Constrained k-means clustering with background knowledge. In ICML, pp. 577 584, 2001. Wang, X. and Davidson, I. Flexible constrained spectral clustering. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 563 572, 2010. Wang, X., Qian, B., and Davidson, I. On constrained spectral clustering and its applications. Data Mining and Knowledge Discovery, 28:1 30, 2014. Yao, Y.-D., Li, X., Cui, Y.-P., Wang, J.-J., and Wang, C. Energy-efficient routing protocol based on multithreshold segmentation in wireless sensors networks for precision agriculture. IEEE Sensors Journal, 22(7):6216 6231, 2022. Signed Laplacians for Constrained Graph Clustering A. Omitted Details from Section 3 This section presents the details omitted from Section 3. Proof of Theorem 3.2. Our objective is to prove that λ2( G H) λ2( H). (12) Step 1. Let x : V R be the eigenfunction associated with λ2( G H). Without loss of generality, we renumber the vertices such that the coordinates of x satisfy x1 x2 . . . xn. It suffices to prove the existence of a non-empty, proper subset S V such that w G(S, V \ S) w H(S, V \ S) = 4 x, Hx 2 . (13) This is sufficient because ΦG H w G(S, V \ S) w H(S, V \ S) (Definition of ΦG H as the minimum ratio cut) x, Gx x, Hx x, x x, Hx (By (13)) x, Hx (Using x is the eigenvector of λ2( G H)) 1 λ2( H) (Since x, Hx x, x λ2( H)). This inequality proves that establishing (13) suffices to derive (12). Step 2. We now show that proving (13) for x is equivalent to proving the existence of S such that w G(S, V \ S) w H(S, V \ S) = 4 z, Hz 2 , (14) where z = cx is a scaled version of x, defined such that z2 n + z2 1 = 1. Because x 1, we have x1 < 0 < xn, and thus scaling x does not affect the ratio. Specifically, s z, Gz z, Hz z, z z, Hz = c2 x, Gx c2 x, Hx c2 x, x c2 x, Hx = x, Gx x, Hx x, x x, Hx . Thus, proving (14) suffices to establish (13). Step 3. Let t R, and define the set St = {v V | zv t}, commonly referred to as a sweep set. To establish (14), it suffices to prove that there exists a threshold to such that w G(Sto, V \ Sto) w H(Sto, V \ Sto) 4 z, Gz z, Hz z, z z, Hz . (15) Establishing S = St0, (15) directly implies (14). Signed Laplacians for Constrained Graph Clustering Step 4. To establish (15), it suffices to show that there exists a distribution f(t) over [z1, zn] such that the expectation satisfies E (w G(St, V \ St)) E (w H(St, V \ St)) 4 z, Hz 2 . (16) Here, St is treated as a random variable parameterized by t [z1, zn], where t is drawn from a probability density function f(t). St depends on the distribution of t, which influences the likelihood of including a vertex v in St based on its value zv. This suffices because, if we define φ = E (w G(St, V \ St)) E (w H(St, V \ St)) R, then, by linearity of expectation it holds that E [w G(St, V \ St) φ w H(St, V \ St)] = E [w G(St, V \ St)] φ E [w H(St, V \ St)] = 0. This implies that for some t0: P [w G(Sto, V \ Sto) φ w H(Sto, V \ Sto) 0] > 0, (17) because if for all t we have P [w G(St, V \ St) φ w H(St, V \ St) 0] = 0, it would follow that w G(St, V \ St) φ w H(St, V \ St) > 0 for all t, contradicting the fact that E [w G(St, V \ St) φ w H(St, V \ St)] = 0. Thus, (17) holds, which implies that for some t0 we have w G(St0, V \ St0) φ w H(St0, V \ St0) 0, and equivalently w G(St0, V \ St0) w H(St0, V \ St0) E (w G(St, V \ St)) E (w H(St, V \ St)). Combining this with (17), we obtain that P w G(St0, V \ St0) w H(St0, V \ St0) E (w G(St, V \ St)) E (w H(St, V \ St)) From Eq. (16), it follows that ( w G(St0, V \ St0) w H(St0, V \ St0) 4 Establishing (16) thus guarantees the existence of t satisfying (15). Step 5. To establish (16), it suffices to find a probability distribution over [z1, zn] such that the following inequality holds: E (w G(St, V \ St)) E (w H(St, V \ St)) u Gv |zu zv|(|zu| + |zv|)wuv u Hv |zu zv|2 2 wuv . (18) Signed Laplacians for Constrained Graph Clustering We now show that (18) implies (16) as follows: E (w G(St, V \ St)) E (w H(St, V \ St)) u Gv |zu zv|(|zu| + |zv|)wuv u Hv |zu zv|2 2 wuv (By (18)) u Gv |zu zv|2wuv q P u Gv(|zu| + |zv|)2wuv P u Hv |zu zv|2 2 wuv (Cauchy-Schwarz inequality) u Gv(|zu| + |zv|)2wuv P u Hv |zu zv|2 2 wuv (Definition of G) u Gv(z2u + z2v)wuv P u Hv |zu zv|2 2 wuv (Since (a + b)2 2(a2 + b2)) 4 z, z ℓ2(G) P u Hv |zu zv|2 2 wuv (Definition of the inner product in G) z, z ℓ2(G) 1 2 z, Hz (Definition of H) z, Gz z, z ℓ2(H) z, Hz 2 (Since deg G(v) = deg H(v)). Thus, (18) implies (16). Step 6. Consider the non-negative function 2|t| defined on the interval [z1, zn]. This function serves as a probability density function (PDF) over [z1, zn] because z2 1 + z2 n = 1, z1 is negative, and zn is positive. Specifically, Z zn z1 2|t| dt = sgn(zn) z2 n sgn(z1) z2 1 = 1. The normalization in step 2 ensures that the integral of 2|t| over [z1, zn] equals 1, validating it as a PDF. Hence, the probability that a value between [zu, zv] is given by P[t [zv, zu]] = Z zu zv 2|t|dt = sgn(zu) z2 u sgn(zv) z2 v. The expectation E [w G(St, V \ St)] represents the expected weight of the cut w G(St, V \St), which depends on the random threshold t sampled from the previously defined probability density function. By the linearity of expectation it holds that E [w G(St, V \ St)] = X u Gv E [1u St and v / St] wuv, where 1u St and v / St is the indicator function that equals 1 when u St and v / St, and 0 otherwise. Using the definition of probability, the expectation simplifies to E [w G(St, V \ St)] = X u Gv P [zu t and t < zv] wuv. If we can establish |zu zv|2 2 P [zu t and t < zv] (|zu| + |zv|)|zu zv|, (19) Signed Laplacians for Constrained Graph Clustering then we can bound the expectation of the weight of the cut as follows: E [w G(St, V \ St)] X u Gv (|zu| + |zv|)|zu zv|wuv. Similarly, for H, we have E [w H(St, V \ St)] = X u Hv P [zu t and t < zv] wuv 1 u Hv |zu zv|2wuv. These last two inequalities establish (18). Therefore, to conclude the proof, it remains to prove (19). Step 7. To prove (19), recall that P [zu t and t < zv] = Z zu zv 2|t| dt = ( |z2 u z2 v| if sgn(zu) = sgn(zv), z2 u + z2 v if sgn(zu) = sgn(zv). We first establish the upper bound in (19): P [zu t and t < zv] (|zu| + |zv|)|zu zv|. (20) Case 1: sgn(zu) = sgn(zv). In this case, we have |z2 u z2 v| = |(zu + zv)(zu zv)| = |zu + zv||zu zv|. Since |zu + zv| |zu| + |zv|, we have |z2 u z2 v| (|zu| + |zv|)|zu zv|. Case 2: sgn(zu) = sgn(zv). In this case, we have z2 u + z2 v (zu zv)2 = |zu zv|2, and thus z2 u + z2 v (|zu| + |zv|)|zu zv|. Combining both cases establishes the upper bound in (20). Now, we establish the lower bound in (19): 2 P [zu t and t < zv] . (21) Case 1: sgn(zu) = sgn(zv). In this case we have z2 u z2 v = |zu zv| |zu + zv| . Since |zu + zv| |zu zv|, we have z2 u z2 v |zu zv|2 (zu zv)2 Case 2: sgn(zu) = sgn(zv). Using 0 (zu + zv)2 = 2(z2 u + z2 v) (zu zv)2, it follows that (zu zv)2 2 z2 u + z2 v. Combining both cases establishes the lower bound in (21). Finally, combining (20) and (21) proves (19), completing the proof. Signed Laplacians for Constrained Graph Clustering Proof of Lemma 3.4. Let f be the eigenfunction corresponding to λ2( H). Consider the function g : V R, defined as: gv = fv fv0, for all v V , hence gv0 = 0. By computing the Rayleigh quotient for g with respect to the Laplacian of H: g, Hg H = X u Hv |gu gv|2wuv = X u Hv |fu fv|2wuv = λ2( H) f, f H. The Rayleigh quotient for g with respect to H is given by g, H g H = X u H v |gu gv|2w uv + 2|gv0|2w v0v0 = X u Hv |fu fv|2wuv = g, Hg H. The norm of g with respect to H satisfies that v V |gv|2 deg H (v) = X v V |gv|2 deg H(v) + g2 (v0)(deg H(v0) + 2) = g, g H. Thus, we have, since f is an eigenfunction f 1, hence 0 = f, 1 H: v V |fv fv0|2 deg H(v) = f, f H 2fv0 f, 1 H + f 2 v0 1, 1 H f, f H. Using this, we apply the Rayleigh quotient to bound λ1( H α ) as λ1( H α ) g, H g H g, g H g, Hg H g, g H λ2( H) f, f H f, f H = λ2( H). Finally, let f be a non-zero function. Consider the following expression: f, H α f = X u v |fu fv|2wuv + 2f 2 v0wv0v0. We have 0 f, H α f , and equality holds if and only if: f, H α f = 0 fu fv = 0 u, v and fv0 = 0. This implies that f = 0 for all v V , which contradicts the assumption that f is a non-zero function. Therefore, λ1( H α ) > 0. Proof of Lemma 3.5. Since H α is positive-definite and self-adjoint, its inverse ( H α ) 1 and its square root ( H α )1/2 exist and are self-adjoint operators. Define the operator C as C = ( H α ) 1/2 G( H α ) 1/2. Self-adjointness of C: The adjoint of C is C = ( H α ) 1/2 G( H α ) 1/2 = ( H α ) 1/2( G) ( H α ) 1/2 = ( H α ) 1/2 G( H α ) 1/2 = C, since H α and G are self-adjoint operators. Positive Semi-definiteness of C: For any function φ ℓ2(V, w), consider Cφ, φ = D ( H α ) 1/2 G( H α ) 1/2φ, φ E . Let ψ = ( H α ) 1/2φ. Then, Cφ, φ = Gψ, ψ . Since G is positive semi-definite, we have Gψ, ψ 0. Therefore, Cφ, φ 0, which means C is positive semi-definite. Signed Laplacians for Constrained Graph Clustering Because C is self-adjoint and positive semi-definite, it is diagonalizable with real, non-negative eigenvalues. Thus, there exists an orthogonal matrix W and a diagonal matrix Λ with non-negative entries such that We can express ( H α ) 1 G as ( H α ) 1 G = ( H α ) 1/2 ( H α ) 1/2 G( H α ) 1/2 ( H α )1/2 = ( H α ) 1/2C( H α )1/2 = ( H α ) 1/2WΛW ( H α )1/2. Let V = ( H α ) 1/2W. Then, ( H α ) 1 G = V ΛV . Since V is invertible (as the product of invertible matrices), ( H α ) 1 G is diagonalizable with real, non-negative eigenvalues. This completes the proof. B. Additional Experiments from Section 4 In this section, we present additional experiments to evaluate the performance of our algorithm. To provide comprehensive analysis, we include comparisons with the following method FLEXIBLE CONSTRAINED SPECTRAL CLUSTERING (FC): This method is presented in Wang & Davidson (2010). As in Subsection 4.1, we evaluate the performance of clustering methods using synthetic graphs generated by the stochastic block model (SBM). We analyse the algorithms for graphs of varying sizes, focusing particularly on smaller graphs, where subtle variations are more prominent. For larger graphs, the performance trends tend to stabilize and exhibit fewer differences. In these experiments, graphs were generated with the number of nodes n varying across four sizes: n = 250, 500, 750, 1000. The results are summarised in Figure 6, which illustrates the performance of the four clustering methods Spectral Clustering (SC), Constrained Clustering (CC), Constrained Clustering with Self-loops (CC++), and Flexible Clustering (FC) for varying inter-cluster edge probabilities q and different graph sizes. Based on the results presented in Figure 6, we highlight the following key observations and advantages of our method compared to the baseline approaches: Improved Performance on Smaller Graphs: Our method demonstrates superior performance on smaller graphs (n = 250, 500) in terms of the mean Adjusted Rand Index (ARI), as shown in panels (a) and (b). As the graph size increases (n = 750, 1000), the performance of our approach becomes comparable to that of the other methods, indicating that our algorithm is robust across different graph sizes. Parameter-Free Advantage: The Flexible Clustering (FC) method presented in Wang & Davidson (2010) requires the user to define an additional parameter (β) that directly influences the solution of the generalized eigenvalue problem. This parameter must be carefully chosen to ensure at least one feasible solution exists, as incorrect parameter selection can result in negative eigenvalues and infeasible outcomes. In contrast, our method avoids this issue entirely. As shown in Lemma 3.5, the introduction of self-loops ensures that the operator ( H α ) 1 G is positive and self-adjoint, with all eigenvalues guaranteed to be real and non-negative. Cheeger-type inequality: Unlike the FC method, which does not provide a theoretical guarantee linking the eigenfunctions used for clustering to the optimization objective, our approach establishes a Cheeger-type inequality. Signed Laplacians for Constrained Graph Clustering 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 Inter-cluster Edge Probability (q) Adjusted Rand Index (ARI) Mean ARI vs q (n = 250) CC++ CC SC FC (a) n = 250 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 Inter-cluster Edge Probability (q) Adjusted Rand Index (ARI) Mean ARI vs q (n = 500) CC++ CC SC FC (b) n = 500 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 Inter-cluster Edge Probability (q) Adjusted Rand Index (ARI) Mean ARI vs q (n = 750) CC++ CC SC FC (c) n = 750 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 Inter-cluster Edge Probability (q) Adjusted Rand Index (ARI) Mean ARI vs q (n = 1000) CC++ CC SC FC (d) n = 1000 Figure 6. Mean Adjusted Rand Index (ARI) as a function of inter-cluster edge probability q for four clustering methods. Each panel represents a different graph size: (a) n = 250, (b) n = 500, (c) n = 750, and (d) n = 1000.