# generalizing_constraint_models_in_constraint_acquisition__f6b01e97.pdf

Generalizing Constraint Models in Constraint Acquisition

Dimos Tsouros1, Senne Berden1, Steven Prestwich2 Tias Guns1

1Department of Computer Science, KU Leuven, Belgium 2School of Computer Science and Information Technology, University College Cork dimos.tsouros@kuleuven.be, senne.berden@kuleuven.be, s.prestwich@cs.ucc.ie, tias.guns@kuleuven.be

Constraint Acquisition (CA) aims to widen the use of constraint programming by assisting users in the modeling process. However, most CA methods suffer from a significant drawback: they learn a single set of individual constraints for a specific problem instance, but cannot generalize these constraints to the parameterized constraint specifications of the problem. In this paper, we address this limitation by proposing GENCON, a novel approach to learn parameterized constraint models capable of modeling varying instances of the same problem. To achieve this generalization, we make use of statistical learning techniques at the level of individual constraints. Specifically, we propose to train a classifier to predict, for any possible constraint and parameterization, whether the constraint belongs to the problem. We then show how, for some classes of classifiers, we can extract decision rules to construct interpretable constraint specifications. This enables the generation of ground constraints for any parameter instantiation. Additionally, we present a generate-and-test approach that can be used with any classifier, to generate the ground constraints on the fly. Our empirical results demonstrate that our approach achieves high accuracy and is robust to noise in the input instances.

Code https://github.com/Dimosts/Gen Con Models Extended Version https://arxiv.org/abs/2412.14950

Introduction Constraint Programming (CP) is considered one of the main paradigms for solving combinatorial problems in AI. It provides powerful modeling languages and solvers for decisionmaking, with many successful applications (Wallace 1996; Simonis 1999). In CP, the user declaratively states the constraints over a set of decision variables, thereby defining the feasible solutions to their problem. A solver is then used to generate a solution. However, modeling a new application as a constraint problem requires significant expertise, which is a barrier to the wider use of CP (Freuder and O Sullivan 2014; Freuder 2018). This has motivated the development of methods to assist the user in the modeling process (De Raedt, Passerini, and Teso 2018; Freuder 2018; Kolb 2016; Lombardi and Milano 2018). This is

Copyright 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

the focus of the research area of Constraint Acquisition (CA) (Bessiere et al. 2017), which has been identified as an important topic for CP (De Raedt, Passerini, and Teso 2018) and as progress toward the Holy Grail of computer science (Freuder 2018). In CA, constraints are either learned passively from a set of known solutions and (optionally) nonsolutions (Beldiceanu and Simonis 2012; Berden et al. 2022; Prestwich et al. 2021) or actively through interaction with a user (Bessiere et al. 2023; Tsouros and Stergiou 2020, 2021). Recent advancements in both passive and active acquisition systems show significant potential (Prestwich et al. 2021; Prestwich and Wilson 2024; Tsouros, Berden, and Guns 2024). For instance, a recent application of interactive CA in a real-world scheduling problem was presented in (Barral et al. 2024). However, a significant limitation of most (passive and active) CA systems is that they only learn the ground constraints of a specific problem instance (Arcangioli, Bessiere, and Lazaar 2016; Bessiere et al. 2017; Prestwich 2021; Prestwich et al. 2021; Tsouros, Stergiou, and Sarigiannidis 2018), while in practice it is common for the instance at hand to change over time. To accommodate these changes, well-known constraint modeling languages like Mini Zinc (Nethercote et al. 2007) and CPMpy (Guns 2019) allow the use of parameters in the constraint model. An assignment of values to the parameters then instantiates the ground constraints for a given instance of the problem. For example, in exam timetabling with the requirement exams of courses that are given in the same semester should be scheduled on different days , the actual ground constraints will be instantiated based on the following parameters: which semester each course belongs to, and how many timeslots are available per day. Achieving the same ability to generalize the constraint models learned using CA over different instances of the same problem class has been identified as a key challenge for CA (Simonis 2023). In the literature on CA, there are only a few works that support generalization. The first approach for generalizing constraints was introduced in interactive CA (Bessiere et al. 2014; Daoudi et al. 2015), to enhance the acquisition process within a single instance. Although it targets withininstance generalization, this approach can be used across instances. Another approach, called extrapolation, was re-

The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25)

cently explored (Prestwich 2022). This method requires learning ground constraints for multiple instances, which are then extrapolated to a new instance with different parameters. A limitation is that because it is based on genetic programming, it tends to learn contrived, non-interpretable expressions. A third approach is to learn interpretable parameterized forms of constraints during the acquisition process (Lallouet et al. 2010; Kumar, Kolb, and Guns 2022), given examples from different instances of the problem, with the corresponding instance parameters also explicitly given. In (Lallouet et al. 2010) an inductive logic programming approach is used to learn rules of the form condition constraint. On the other hand, COUNT-CP (Kumar, Kolb, and Guns 2022) first learns instance-specific constraints, which are then grouped to obtain first-order constraints that can generalize to unseen instances. In the literature, there is no standardized method for representing parameterized constraint specifications. As a result, the different approaches mentioned target different generalizations based on pre-defined partitions of variables. This prevents them from capturing the diverse range of constraint specifications that may exist in CP models. Finally, methods relying on symbolic search can be expected to struggle in the presence of noise in the input ground constraints, which can obscure the underlying patterns they search for. In this paper, we first formalize the elements of a parameterized constraint specification, capturing the requirements of a constraint problem in a generic and parameterized way. Then, we propose GENCON, a novel approach to learn such constraint specifications from the ground constraints of given instances of the problem. To achieve that, our method leverages the capabilities of statistical machine learning (ML) to learn complex functions from labeled data. In more detail, our contributions are:

We introduce an approach for generalizing from one or more given problem instances using constraint-level classification, and we present a parameterized feature representation to capture the constraint specifications. For classifiers that allow decision rule extraction, we present a method to translate them into interpretable constraint specifications. These can then be used to generate ground constraints for new problem instances. To enable the use of our method with classifiers that do not allow rule extraction, we present a generate-and-test approach that can be used with any classifier. We conduct a comprehensive experimental evaluation of our methods, using different classifiers across a range of problem classes and instances. We also show that our approach is robust to noise in the given ground constraints.

We now introduce the necessary concepts used in the paper.

Constraint Satisfaction Problems A constraint satisfaction problem (CSP) is a triple P = (V, D, C), defining:

a set of n decision variables V = {v1, v2, ..., vn}, representing the entities of the problem,

a set of n domains D = {Dv1, Dv2, ..., Dvn}, with Dvi Z being the domain of vi V , a constraint set C = {c1, ..., ct}. over the variables in V . A constraint c is a pair (rel(c), var(c)), where var(c) V is the scope of the constraint, and rel(c) is a relation over the domains of the variables in var(c), restricting their allowed value assignments. The arity of the constraint, denoted as |var(c)| = arity(rel(c)), indicates the number of variables involved. The set of solutions of a constraint set C is denoted by sol(C).

Constraint Acquisition In Constraint Acquisition (CA), the pair (V, D) is called the vocabulary of the problem at hand and is common knowledge shared by the user and the system. Besides the vocabulary, the learner is also given a language Γ consisting of a broad range of fixed arity constraint relations that may exist in the problem at hand. Using the vocabulary (V, D) and the constraint language Γ, the system generates the constraint bias B, which is the set of all expressions that are candidate constraints for the problem. The (unknown) target constraint set CT is a constraint set such that for every example e it holds that e sol(CT ) iff e is a solution to the problem the user has in mind. The goal of CA is to learn a constraint set CL that is equivalent to the unknown target constraint set CT .

Machine Learning Classification ML classification is a supervised learning task that involves learning a function over a given dataset. The dataset, denoted as E, is a collection of N training examples, E = {(x1, y1), (x2, y2), ..., (x N, y N)}. Each example is a pair (xi, yi), where xi is a feature vector from the input space X and yi is the corresponding class label from the output space Y . The feature vector xi is composed of m features, xi = (ϕi1, ϕi2, . . . , ϕim), with each feature ϕij being a (quantifiable) property of example i. In the case of classification, Y is a set of possible class labels. An ML classifier aims to learn a function fθ : X Y , using a set of learnable parameters θ. These parameters are adjusted during training to minimize a loss function L(fθ(x), y) measuring the error between the predicted and actual class labels.

Decision Rules With the rising importance of explainable AI (XAI) and interpretable ML, various approaches focus on extracting decision rules from ML models (Gilpin et al. 2018). These approaches represent the function fθ : X Y with if-then rules, denoted as a set R = {r1, r2, ..., rk}. Each decision rule ri is a pair (Qi, yi), with Qi being a set of conditions and yi a class label. Each condition qij Qi is a function qij : X {0, 1} that maps an example x to a binary value indicating whether the given example satisfies the condition. For a rule ri = (Qi, yi) to be satisfied, all of its conditions need to be satisfied, i.e., qij(x) = 1 | qij Qi.

Problem Definition Constraint problems are often not thought of as a single CSP, but as a set of requirements, with the specific instantiation of the ground CSPs depending on the values of some input parameters P. We illustrate this with the following example, which we will also use as a running example.

Example 1. Consider a simplified exam timetabling problem with s semesters and n courses per semester. The goal is to schedule the courses exams over d days, each having t timeslots. The parameters of the problem are P = {s, n, d, t}. The requirements are that all exams must be scheduled in different timeslots, while exams of courses from the same semester must be on different days. Different values for the parameters will lead to different ground CSPs for each problem instance. The parameters n and s determine the variables V of the problem, while d and t determine their domains D. The parameter values also determine the set of constraints C. The problem contains different_day constraints, which are defined over partitions of variables for courses in the same semester, and the allowed assignments depend on the timeslots per day t.

Definition 1. A parameterized constraint problem consists of a set of parameters P = {p1, p2, . . . , pq}, and a function mapping each parameter instantiation PA = {(p1, u A1), (p2, u A2), . . . , (pq, u Aq)} onto a ground CSP with (VA, DA, CA). The resulting tuple (PA, VA, DA, CA) is a problem instance.

Most CA techniques are aimed at learning the constraint set CT of a single ground CSP, from examples of solutions and non-solutions of that instance. However, in learning, one is often interested in generalizing beyond the instance used for learning, across other instances of the same underlying parameterized constraint problem.

Definition 2. Given one or more problem instances described by tuples (PA, VA, DA, CA), the objective of generalization is to construct a function F such that, for any target problem instance with a vocabulary (VT , DT ), defined by a unique set of parameter values PT = {(p1, u T 1), (p2, u T 2), . . . , (pq, u T q)}, F(PT , VT , DT ) will return the corresponding set of constraints CT . That is, the aim is to learn the function F from the given ground CSPs, to be able to accurately determine the set of constraints CT for any target instance with parameters PT .

Constraint Specifications

Being able to generalize constraint models involves finding such a function F (definition 2). Typically, in constraint problems, such a function F can be decomposed into several inner functions which we model as constraint specifications (CSs) each corresponding to a specific requirement of the problem; for example all courses must be scheduled in a different timeslot . The complete set of constraints CT of an instance T is then the union of the sets of constraints produced by each inner function. Each requirement is modeled by a constraint specification, which defines how to derive the pairs (rel(c), var(c)) for any target instance T, using the parameter values PT and the corresponding vocabulary (VT , DT ). We consider the following three key elements of a CS:

1. Constraint relation. The relation rel(c) of each constraint in this CS, which may optionally depend on parameters that determine constant values involved in the relation.

2. Variable partition(s). Typically, a pattern that appears in a constraint model concerns certain partitions of variables of the problem and is applied to sequences of variables in this partition. Such partitions can be the dimensions (e.g., rows and columns) of the (multi-dimensional) matrix the variables are given in, or based on latent dimensions in this tensor. 3. Sequence conditions. These define which scopes within a partition of variables to apply the constraints to. It is common to have a constraint apply to all possible scopes in a partition. This is done by using the sequence all_pairs for binary constraints, or more generally, the sequence combinations, to find combinations of size arity(r) (the arity of the given relation). However, there may also be sequence conditions, restricting the variable combinations that should be taken as scopes.

Using these three key elements, we now formally define constraint specifications:

Definition 3. A constraint specification (CS) is a triple (r, G, S), defining

a relation r, along with any parameters defining its constants, a variable partition function G : VT P(VT ), that partitions a given set of variables VT into subsets based on certain characteristics, a set of sequence conditions S restricting the scopes to which the constraints are applied.

A CS can generate the corresponding ground constraints of an instance T using the following generator template:

Foreach Y P(V T): Foreach scope combinations(Y, arity(r), S): c (rel(c) = r, var(c) = scope)

Example 2. Consider a problem with the requirement Values of consecutive variables in the same row must differ . The CS modeling this requirement uses the = relation. The different partitions used in the CS are the rows of the variable matrix. The CS alsone needs to express the sequence condition that the variables need to be consecutive, as it does not apply to all pairs of variables in the same row. In a modeling language, this requirement would be modeled as:

Foreach row all_rows: Foreach v1, v2 consecutive_pairs(row): c v1 = v2

This is equivalent to our formally defined CS generator, where P(VT ) corresponds to all_rows and consecutive_pairs corresponds to combinations with arity 2, and sequence condition column(v1) - column(v2) == 1.

Generalizing Constraint Models To generalize beyond one or more known instances and learn the CSs of a problem, we propose an approach named GENCON. The key idea is to use ML to identify patterns in the constraints of the known instance(s) and reconstruct the CSs

of the problem. This is especially promising because, recently, an approach using probabilistic classification during active CA (Tsouros, Berden, and Guns 2024) demonstrated that ML classifiers can effectively detect patterns within the learned constraint network. GENCON is shown in Figure 1. The given set of ground constraints in the input instance(s) is used in order to train a classifier to predict for any constraint whether it belongs to the set of constraints of any target instance of the problem. For this, we use a (parameterized) feature representation of the constraints, whose design is inspired by the different elements of CSs discussed in the previous section. For classes of classifiers allowing the extraction of decision rules, we directly translate these rules to the CSs of the problem, which can produce the ground constraints of any target instance. When decision rules cannot be extracted, a generate-and-test approach is used instead.

Building the Dataset We build a dataset on the constraint level, i.e., its examples correspond to individual constraints. Given a set of constraints CA for a problem instance A, and a distinct set of constraints C A, consisting of constraints that are not part of the model, we define a dataset E, wherein each training example is represented as a tuple (xi, yi), corresponding to a constraint ci C. For each example (xi, yi), we have xi = ϕσ(ci, P), which is a parameterized feature representation of constraint ci, and yi = [ci CA], a Boolean label that indicates whether ci is part of the set of true constraints or not. E = {(xi, yi) | xi = ϕσ(ci, P) yi = [ci CA],

ci {CA C A}}, (1)

However, realistically, we may only have access to the set of true constraints CA for each problem instance. But we also need a set of constraints C A consisting of constraints that will have a negative label, for the classifier to learn how to distinguish between the classes. To produce this set, we first generate a set of constraints BA, using as a language Γ all relations detected in the given set of constraints CA, i.e, Γ = {rel(c) | c CA}. The bias BA is created by applying each relation in Γ to all possible scopes in VA. The set C A then consists of all constraints in BA that are not part of the given instance(s), i.e., C A = BA \ CA.

Parameterized Feature Representation In our approach, we propose a framework for the (parameterized) feature representation of constraints, targeted at learning patterns based on the different elements of CSs discussed above. For any constraint c, the classifier expects a fixed-size feature representation ϕ(c) as input. As shown in the middle of Figure 1, this feature representation ϕ(c) is then transformed to a parameterized version, denoted by ϕσ(c), to be able to learn patterns across instances with different parameter values.

Feature Representation. The feature representation ϕ(c) must be designed based on the different elements of CSs that we want the classifier to detect in the problem, i.e., the

relations, partitioning functions, and sequence conditions. Hence, it must contain features that describe the constraint relations, variable characteristics that can be used to recognize partitions of the variables, and other attributes that can play a role in the sequence conditions. Based on this, we construct a feature representation consisting of three groups of features:

1. Relation features: Features that capture properties of the relation rel(c) of a given constraint c, along with numerical values of the constants present in the constraint. 2. Partitioning features: Features describing whether the variables in the scope of constraint c have characteristics in common that can be used in the partitioning function. These characteristics can be problem-specific variable properties (e.g., in what semester a course takes place in exam timetabling), or based on information regarding the structure the variables were given in. For example, in many cases, the variables V are given in the form of a matrix or tensor, and the position of each variable in this tensor often plays a crucial role in the partitioning function of the CSs. 3. Conditioning features: These features describe how the variables in the scope of the constraint relate in different ways, to capture sequence conditions that may exist in the CSs of the problem. For example, a constraint may only apply to pairs of variables that are a certain distance away from each other in the variable tensor. Thus, the distance between the variables in a constraint s scope may be included as a conditioning feature. Note that the partitioning features can also be used to capture the sequence conditions, since they describe whether the variables share a certain property or not. For example, a sequence condition may state that the variables must not be part of the same row in the variable matrix.

The grammar of relations, partitioning functions, and sequence conditions used can be considered as the inductive bias of our method. The feature representation needs to be able to capture the CSs existing in the problem at hand. In our implementation, a proof-of-concept feature representation was used based on structural properties of the variables matrix, as matrix modeling is common and beneficial in CP (Flener et al. 2001), with no problem-specific variable attributes.1 In more detail: 1. We used 3 relation features, describing the name of the constraint relation and its constant values. 2. As candidate attributes for partitioning, we used the indices of the variables in the dimensions of the tensor they were given in, along with latent dimensions that may be discovered using the problem parameters. We include one partitioning feature for each (latent) dimension, expressing whether all variables in the constraint s scope share the same index in them. 3. As additional conditioning features, we used the average difference between the variable indices in each dimension and latent dimension.

1More information regarding the feature representation used can be found in the appendix (in the extended version).

Figure 1: GENCON: Generalizing constraint models through constraint classification, using a parameterized feature representation of constraints

From numerical to categorical features over parameters. As the goal is to generalize beyond a single problem instance, the feature representation of the constraints should capture the characteristics of the constraints in a generic, parameterized way. Numerical attributes of the constraints typically are not static across instances but depend on parameters of the problem; e.g., in our running example, the constant present in the different_day constraints depends on the timeslots-per-day parameter and is not a static value. We want the classifier to be able to capture that. In this step of our approach, we thus replace numerical features with categorical features over the parameters. We do so using a numerical-to-categorical parameter mapping function σ : R { Na N } PA (where PA is the list of named parameters and their value), defined as follows:

σ(v) = pi v = ui | (pi, ui) PA Na N otherwise. (2)

For any numerical feature value that corresponds to a parameter value of the problem instance, function σ replaces the feature by the corresponding parameter s name. Note that, in a constraint model, it is sometimes not the value u of parameter p that comes up directly in the features. Instead, a trivial arithmetic adaptation of the parameter value may be used, e.g., u 1, u + 1 or the multiplication of two parameter values. To capture this, we extend the set of parameters P with these adaptations, along with the common basic constants 0 and 1, as is also done in COUNT-CP (Kumar, Kolb, and Guns 2022). For the new categorical features, there are thus |P| + 1 categories (where P is the extended set of parameters): one for every parameter, plus a Na N in case none of the parameters match the given value. Our categorical features will thus be able to represent the constraint in a parameterized way. Although arbitrary constants cannot be captured this way, we make the assumption that every constant present is related to the parameters of the problem. Also note that, when parameterizing the feature representation of a constraint, a single numerical feature value might correspond to multiple parameter values. When this occurs, one example is included in the dataset for each possible matching. Although this could add noise to the dataset, due to examples with a wrong parameter replacing the numerical feature, it ensures that the correct feature representations will definitely be included.

Extracting Constraint Specifications Decision rules continue to be popular due to their interpretability, with methods existing to extract rules from various classes of classifiers (Barakat and Bradley 2010; Iqbal 2012), in addition to traditional rule-based classification methods. The goal is to derive a set of rules R = {r1, r2, . . . , rk} that represent the classification function. Each rule ri specifies some conditions Qi on a subset of features, and a class label yi, such that Qi yi. In our context, rules that lead to a positive classification define the conditions for a constraint to be part of the target problem. Thus, these conditions can be converted into the CSs of the problem. The extracted CSs can then be used to generate the constraints of any given target instance. We now propose a method for extracting the interpretable CSs of the problem from such a set of learned decision rules. Our approach focuses on the positive-classification rules, iterating over their conditions to identify the relation, partitioning function, and sequence conditions of each CS. Our method is shown in more detail in Algorithm 1. First, the rules leading to a positive classification are extracted in Rpos (line 1). Then, for each rule r Rpos (line 3), a CS is constructed. The elements of the CS are first initialized (lines 5-7), and then the algorithm iterates over the rule conditions in Q to construct the CS (line 8) as follows: 1. Relation Extraction: Identify conditions in Q related to relation features (RF) (lines 10-11). These conditions determine which relations from Γ are used, and which constants are used in these relations, if any. 2. Partitioning Function (lines 12-14): Identify the partitioning function of the CS using conditions in Q involving partitioning features (PF). Use conditions that require certain characteristics to be equal in the constraint s variables, and thus can be used to partition the variables based on them. 3. Sequence Conditions (lines 15-18): Sequence conditions can be derived from both partitioning features and sequence condition features (SF). More concretely, if a condition Q involves a partitioning feature, and requires certain characteristics to not be equal, then this requirement is added to the sequence condition (lines 15-16). If a condition in Q involves a sequence condition feature, it is also added to the sequence conditions (lines 17-18). Finally, if more than one relation is allowed by the rule s conditions, we create one CS for each, retaining the partitioning function and sequence conditions (lines 19-21).

Algorithm 1: Extracting Constraint Specifications

Input: R: a set of decision rules, Γ: a set of relations, RF: a set of relation features, PF: a set of partitioning features, SF: a set of sequence condition features Output: CS: a set of constraint specifications 1: Rpos {r R | y(r) = True} 2: CS 3: for all r Rpos do 4: Q conditions(r) 5: rel Γ 6: partition 7: seq cond 8: for all q Q do 9: f feature(q) 10: if f RF then 11: rel rel {γ Γ | q is satisfied by γ} 12: else if f PF then 13: if value(q) = True then 14: partition partition {q} 15: else 16: seq cond seq cond {q}

17: else if f SF then 18: seq cond seq cond {q}

19: for all γ rel do 20: cs create CS(γ, partition, seq cond) 21: CS CS {cs}

22: return CS

Example 3. Recall the exam timetabling problem from Example 1, which has two requirements: all courses must be scheduled in different timeslots and exams of courses from the same semester must be scheduled on different days . In Figure 2, we can see a decision tree learned to classify constraints in this problem, using our parameterized feature representation. Recall that parameter t represents the timeslots per day. We can extract the following decision rules from this tree by following the paths from the root to the leaves:

r1: Relation == "different_day" & Dim0_same == "false" then 0 r2: Relation == "different_day" & Dim0_same == "true" & Constant_parameter != "t" then 0 r3: Relation == "different_day" & Dim0_same == "true" & Constant_parameter == "t" then 1 r4: Relation == "!=" then 1 Rules r3, r4 are the positive-classification rules, which will be used to construct our CSs. The two CSs that will be extracted, along with their generators, are: 1. CS1: Relation: different day(t) , Partitioning attribute: dim0 same, Sequence conditions: Foreach row all_rows: Foreach scope all_pairs(row): c (rel(c) = "different_day(t)", var(c) = scope)

Relation == different day

Dim0 same == false class: 0

Dim0 same == true

Constant parameter != t class: 0

Constant parameter == t class: 1

Relation == != class: 1

Figure 2: Decision Tree for Exam Timetabling in Example 3

2. CS2: Relation: != , Partitioning attribute(s): None, Sequence conditions: Foreach scope all_pairs(V): c (rel(c) = "!=", var(c) = scope)

Generate-and-Test

To enable the use of our method even when decision rules cannot be extracted, we now present a generate-and-test approach that can be used with any classifier, as an alternative to extracting the CSs from interpretable classifiers. The intuition of this approach is the following: Even if we cannot extract the CSs from the learned classifier fθ explicitly, we know that it implicitly represents them. Thus, we can use the classifier itself to recognize the true constraints for any problem instance. Our generate-and-test approach does so by generating a set of candidate constraints BT for the target problem T, using the language Γ as described above. For relations with constants, the set P provides candidate values. To decide which of the constraints from BT to use, each of them is featurized, and the function fθ predicts whether it should be part of the model. We keep all constraints with positive classification:

CT = {c | c BT fθ(ϕσ(c, P)) = True} (3)

Experimental Evaluation

We now experimentally evaluate GENCON, using ground CSPs of different instances on a variety of benchmarks. We evaluate our approach both when the given sets of constraints are correct and when noise exists. Noisy CSPs can result when the ground CSPs were themselves acquired using passive CA, on a noisy dataset of solutions and nonsolutions, or on a dataset containing too few examples. We recognize two different types of noise in our setting:

1. False positive (FP) noise, where the input set of constraints is not sound, also including wrong constraints. 2. False negative (FN) noise, where the input set of constraints is not complete, missing some true constraints.

We aim to answer the following experiment questions:

(Q1) To what extent does GENCON effectively generalize ground CSPs? (Q2) What is GENCON s performance when the input set of constraints also includes wrong constraints? (Q3) What is GENCON s performance when the input set of constraints does not include all true constraints?

Experimental Setup

Benchmarks. We focused on using benchmarks that have different constraint specifications so that our method is evaluated in distinct cases. Namely, we used the following benchmarks that are commonly used in CA: Sudoku, Golomb, Exam Timetabling (ET) and Nurse Rostering (NR). In each benchmark, we used 10 instances with different parameters.2 We employed a challenging variant of leave-one-out cross-validation, referred to as leave-one-in cross-validation: for each fold, we used just a single instance for training and the remaining nine instances for testing. We present the average results of this process. Metrics. We evaluate each method by identifying the correctly generated constraints and the number of constraints missing from the model of the target instance(s). Note that, for a constraint to have a positive ground truth label, our evaluation did not only check if a constraint is part of the ground truth model but also if it is logically implied by it. Based on that, we define as True Positives (TP) the correctly identified constraints, as False Positives (FP) the incorrectly identified constraints, and as False Negatives (FN) the missing constraints. Using the defined concepts, our evaluation is based on the following metrics that are common in ML:

Precision (Pr): It measures the accuracy of the identified constraints in the target instance.A high precision score signifies a low rate of false positives. When the precision score is 100%, the predicted set of constraints is sound. Recall (Re): It measures the method s ability to identify all relevant constraints.A high recall score indicates a low rate of false negatives. When the recall score is 100%, the predicted set of constraints is complete.

Comparison. To obtain the CSs of the problem from extracted decision rules, we used Decision trees (DT) and the rule-based classifier CN2. Then, these CSs were used to generate the ground CSPs of the target instances. We also evaluated the generate-and-test approach with a variety of classifiers: Random Forests (RF), Naive Bayes (NB), Multi-layer Perceptron (MLP), and K-Nearest Neighbours (KNN). We used CN2, DT, RF, and NB with their default parameters and tuned the most important hyperparameters for MLP and KNN2. We compare our method with the generalization approach used in COUNT-CP (Kumar, Kolb, and Guns 2022). For the experiments that evaluate the impact of noise, we changed the ground CSPs for the input instances, injecting noisy constraints in them. We evaluated our method on 4 different levels of noise (5%, 10%, 15%, 20%) w.r.t. the original size of the input set of constraints CA. To inject FP noise, we randomly add the respective percentage of constraints

2Details can be found in the appendix.

(a) Precision

Figure 3: Results comparing our method (using different classifiers) with COUNT-CP generalization

from the set constraints C A in CA, while for FN noise, we directly remove constraints randomly from CA. Implementation and hardware All experiments were conducted on a system with an Intel(R) Core(TM) i7-2600 CPU, 3.40GHz clock speed, with 16 GB of RAM. All methods and benchmarks were implemented in Python. We used the CPMpy library (Guns 2019) for constraint modeling, and the Scikit-Learn library (Pedregosa et al. 2011) for the ML classifiers, except CN2, for which the Orange library (Demˇsar et al. 2013) was used. For COUNT-CP, in the available implementation3 the generalization is mixed with learning, so we re-implemented it stand-alone.

Results Q1: To what extent does GENCON effectively generalize ground CSPs? Figure 3 shows the results of GENCON using different classifiers, and of COUNT-CP s generalization. Both the extraction of CSs for DT and CN2, as well as generate-and-test for the other classifiers, achieve high precision and recall in all benchmarks. The only exception is NB, which gets lower recall in ET and NR, and significantly lower precision in Sudoku. We believe that this is due to its feature independency assumption, which makes it hard for the classifier to recognise the relationship of the different features in the CSs. The benchmark that turned out to be the most difficult for all methods was ET, with the difficulty being recognizing the parameter of the CS regarding the different_day constraints. The problem occurred in instances with many parameters with the same value, with all

3https://github.com/ML-KULeuven/COUNT-CP

(a) Precision with FP noise

(b) Recall with FP noise

(c) Precision with FN noise

(d) Recall with FN noise

Figure 4: (a) and (b) Results with the presence of FP noise in the input constraint model, (c) and (d) Results with the presence of FN noise in the input constraint model (best viewed in color).

of them being recognized as part of the different_day CS. This led to the generation of additional constraints in the target instances, lowering the precision. COUNT-CP also demonstrates good performance in general. Besides ET, where it presents the same issue as our method, it achieves 100% precision in the other three benchmarks. However, its main drawback is illustrated in the recall results, as it was not able to capture all CSs in NR and Sudoku. In NR, the CSs regarding consecutive shifts cannot be captured, as COUNT-CP does not include sequence conditions in its generalization approach, and only searches for patterns that apply in all sequences of predefined partitions. In Sudoku, the block partitions are not automatically found. COUNT-CP includes an option to manually give custom partitions as input, and in this case, its recall in Sudoku is increased to 100%. Notably, when we include special features for these custom partitions in our approach, the results with all classifiers also increase to 100%.

Q2: What is the impact of FP noise in the performance of GENCON? Due to space limitations, we present the average results over all benchmarks for each method.4 The precision results are shown in Figure 4a, while the recall results are shown in Figure 4b. We can observe that the CN2 classifier (and thus also the CSs extracted from it) and NB are the most sensitive to false positives, presenting increasingly worse precision scores when noise increases. This is because the learning approach of CN2 is not very tolerant to this kind of noise, overfitting in many cases to non-existing patterns. When any of the other classifiers are used in GENCON, their performance remains about as good as in the noiseless setting. The classifiers that already presented high scores in Q1 stay around 95-100%, even when the noise percentage reaches 20%. Similarly, the COUNT-CP generalization approach keeps the same performance as in the original results without noise. That is because it only searches for specific partition patterns and symbolic expression bounds, and thus the randomly inserted constraints are directly disregarded.

Q3: What is the impact of FN noise on the performance of GENCON? As in Q2, we present the average results

4Detailed results per benchmark can be found in the appendix.

over all benchmarks. The precision results are shown in Figure 4c, while the recall results are shown in Figure 4d. We can observe that KNN is the most sensitive to FN noise, with worsening recall when more noise is added, meaning that it struggles to find all the constraints of the target instance. For CN2, the lack of noise tolerance shows up again, as in Q2, though precision and recall stay above 80%. When any of the other classifiers is used, results remain good, even for up to 20% noise. These results demonstrate the ability of our classification-based approach to generalize even in the presence of high percentages of noise. On the other hand, in the presence of false negatives, the COUNT-CP generalization fails to detect any patterns and does not find any constraints in the target instances, as it searches for partitions in which all sequences of variables share a given constraint. Importantly, COUNT-CP fails to generalize, even when only 5% noise is added.

Conclusions CP models are typically defined by parameterized specifications, rather than a flat list of ground constraints. However, most CA methods focus on learning a single ground CSP for a specific instance. Our work addresses this limitation by generalizing ground CSPs to parameterized models using a constraint-level classification approach named GENCON. We showed how interpretable CSs can be derived from decision rules, and introduced a generate-and-test method for non-interpretable classifiers. Our evaluation indicates that GENCON achieves high accuracy and robustness, even for high levels of noise, highlighting the potential of ML-based techniques for generalizing constraint models and making CA more robust. We recommend using decision trees as the classifier of choice, as they facilitate the extraction of interpretable CSs while presenting strong performance. Promising avenues for future work include exploring active learning to enhance generalization; and using generalization during interactive constraint learning to reduce queries, leveraging also the noise robustness demonstrated here. Additionally, GENCON can be applied during passive CA, enabling the learning of constraint models from a limited amount of solutions and non-solutions across various instances, a scenario common in real-world applications.

Acknowledgments

This research received funding from the European Research Council (ERC) under the EU Horizon 2020 research and innovation programme (Grant No. 101002802, CHAT-Opt); the Science Foundation Ireland under Grant No. 12/RC/2289-P2 which is co-funded under the European Regional Development Fund; and the Research Foundation Flanders (FWO) (Grant No. 11PQ024N).

References Arcangioli, R.; Bessiere, C.; and Lazaar, N. 2016. Multiple Constraint Acquisition. In 25th International Joint Conference on Artificial Intelligence. Barakat, N.; and Bradley, A. P. 2010. Rule extraction from support vector machines: a review. Neurocomputing, 74(13): 178 190. Barral, H.; Gaha, M.; Dems, A.; Cˆot e, A.; Nguewouo, F.; and Cappart, Q. 2024. Acquiring Constraints for a Nonlinear Transmission Maintenance Scheduling Problem. In International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Springer. Beldiceanu, N.; and Simonis, H. 2012. A model seeker: Extracting global constraint models from positive examples. In Principles and practice of constraint programming, 141 157. Springer. Berden, S.; Kumar, M.; Kolb, S.; and Guns, T. 2022. Learning MAX-SAT Models from Examples using Genetic Algorithms and Knowledge Compilation. In 28th International Conference on Principles and Practice of Constraint Programming (CP 2022). Bessiere, C.; Carbonnel, C.; Dries, A.; Hebrard, E.; Katsirelos, G.; Narodytska, N.; Quimper, C.-G.; Stergiou, K.; Tsouros, D. C.; and Walsh, T. 2023. Learning constraints through partial queries. Artificial Intelligence, 319: 103896. Bessiere, C.; Coletta, R.; Daoudi, A.; Lazaar, N.; Mechqrane, Y.; and Bouyakhf, E.-H. 2014. Boosting Constraint Acquisition via Generalization Queries. In ECAI, 99 104. Bessiere, C.; Koriche, F.; Lazaar, N.; and O Sullivan, B. 2017. Constraint acquisition. Artificial Intelligence, 244: 315 342. Daoudi, A.; Lazaar, N.; Mechqrane, Y.; Bessiere, C.; and Bouyakhf, E. H. 2015. Detecting types of variables for generalization in constraint acquisition. In 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), 413 420. IEEE. De Raedt, L.; Passerini, A.; and Teso, S. 2018. Learning constraints from examples. In Proceedings in Thirty-Second AAAI Conference on Artificial Intelligence.

Demˇsar, J.; Curk, T.; Erjavec, A.; Gorup, ˇC.; Hoˇcevar, T.; Milutinoviˇc, M.; Moˇzina, M.; Polajnar, M.; Toplak, M.; Stariˇc, A.; et al. 2013. Orange: data mining toolbox in Python. the Journal of machine Learning research, 14(1): 2349 2353.

Flener, P.; Frisch, A.; Hnich, B.; Kiziltan, Z.; Miguel, I.; and Walsh, T. 2001. Matrix modelling. In Proc. of the CP-01 Workshop on Modelling and Problem Formulation, 223. Freuder, E. C. 2018. Progress towards the Holy Grail. Constraints, 23(2): 158 171. Freuder, E. C.; and O Sullivan, B. 2014. Grand challenges for constraint programming. Constraints, 19(2): 150 162. Gilpin, L. H.; Bau, D.; Yuan, B. Z.; Bajwa, A.; Specter, M.; and Kagal, L. 2018. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on data science and advanced analytics (DSAA), 80 89. IEEE. Guns, T. 2019. Increasing modeling language convenience with a universal n-dimensional array, CPpy as pythonembedded example. In Proceedings of the 18th workshop on Constraint Modelling and Reformulation at CP (Modref 2019), volume 19. Iqbal, M. R. A. 2012. Rule extraction from ensemble methods using aggregated decision trees. In Neural Information Processing: 19th International Conference, ICONIP 2012, Doha, Qatar, November 12-15, 2012, Proceedings, Part II 19, 599 607. Springer. Kolb, S. M. 2016. Learning constraints and optimization criteria. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence. Kumar, M.; Kolb, S.; and Guns, T. 2022. Learning Constraint Programming Models from Data Using Generate And-Aggregate. In 28th International Conference on Principles and Practice of Constraint Programming (CP 2022). Schloss Dagstuhl-Leibniz-Zentrum f ur Informatik. Lallouet, A.; Lopez, M.; Martin, L.; and Vrain, C. 2010. On Learning Constraint Problems. In Proceedings of the IEEE International Conference on Tools With Artificial Intelligence, 45 52. Lombardi, M.; and Milano, M. 2018. Boosting Combinatorial Problem Modeling with Machine Learning. In Lang, J., ed., Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 1319, 2018, Stockholm, Sweden, 5472 5478. Nethercote, N.; Stuckey, P. J.; Becket, R.; Brand, S.; Duck, G. J.; and Tack, G. 2007. Mini Zinc: Towards a Standard CP Modelling Language. In International Conference on Principles and Practice of Constraint Programming. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vander Plas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; and Duchesnay, E. 2011. Scikitlearn: Machine Learning in Python. J. Mach. Learn. Res., 12: 2825 2830. Prestwich, S. 2021. Unsupervised Constraint Acquisition. In 33rd International Conference on Tools with Artificial Intelligence. Prestwich, S. 2022. Extrapolating Constraint Networks by Symbolic Classification. In 5th IJCAI Workshop on Data Science Meets Optimization.

Prestwich, S. D.; Freuder, E. C.; O Sullivan, B.; and Browne, D. 2021. Classifier-based constraint acquisition. Annals of Mathematics and Artificial Intelligence, 1 20. Prestwich, S. D.; and Wilson, N. 2024. A Statistical Approach to Learning Constraints. International Journal of Approximate Reasoning, Special Issue on Synergies Between Machine Learning and Reasoning, 171. To appear. Simonis, H. 1999. Building industrial applications with constraint programming. In International Summer School on Constraints in Computational Logics, 271 309. Springer. Simonis, H. 2023. Requirements for Practical Constraint Acquisition. In AAAI 2023 Bridge on Constraint Programming and Machine Learning. Tsouros, D. C.; Berden, S.; and Guns, T. 2024. Learning to Learn in Interactive Constraint Acquisition. In Wooldridge, M. J.; Dy, J. G.; and Natarajan, S., eds., Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, February 20-27, 2024, Vancouver, Canada, 8154 8162. AAAI Press. Tsouros, D. C.; and Stergiou, K. 2020. Efficient multiple constraint acquisition. Constraints, 25(3): 180 225. Tsouros, D. C.; and Stergiou, K. 2021. Learning Max-CSPs via Active Constraint Acquisition. In 27th International Conference on Principles and Practice of Constraint Programming (CP 2021). Schloss Dagstuhl-Leibniz-Zentrum f ur Informatik. Tsouros, D. C.; Stergiou, K.; and Sarigiannidis, P. G. 2018. Efficient Methods for Constraint Acquisition. In 24th International Conference on Principles and Practice of Constraint Programming, Lecture Notes in Computer Science vol. 11008, 373 388. Wallace, M. 1996. Practical applications of constraint programming. Constraints, 1(1-2): 139 168.