# Learning-Based Abstractions for Nonlinear Constraint Solving

Sumanth Dathathri¹ (sdathath@caltech.edu), Nikos Aréchiga², Sicun Gao³, and Richard M. Murray¹

¹Computing and Mathematical Sciences, California Institute of Technology
²Toyota InfoTechnology Center USA
³Computer Science and Engineering, University of California, San Diego

We propose a new abstraction-refinement procedure based on machine learning to improve the performance of nonlinear constraint-solving algorithms on large-scale problems. The proposed approach decomposes the original set of constraints into smaller subsets and uses learning algorithms to propose sequences of abstractions that take the form of conjunctions of classifiers. The core procedure is a refinement loop that keeps improving the learned results based on counterexamples obtained from partial constraints that are easy to solve. Experiments show that the proposed techniques significantly improve the performance of state-of-the-art constraint solvers on many challenging benchmarks. The mechanism is also capable of producing intermediate symbolic abstractions, which are important for many applications and for understanding the internal structure of hard constraint-solving problems.

1 Introduction

Constraint satisfaction problems play a key role in diverse applications, ranging from planning and scheduling for AI systems to program analysis. However, the constraints that arise in real-world systems are often nonconvex, nonlinear, and lacking in exploitable structure, which makes these problems computationally prohibitive. For instance, if the constraints contain only polynomial arithmetic, the constraint-solving problem is decidable [Tarski, 1998], but the known complete algorithms are doubly exponential [Brown and Davenport, 2007].
If the constraints contain transcendental functions, such as trigonometric or exponential functions, the constraint-solving problem is undecidable. A relaxed version of the problem that admits approximate solutions, however, is NP-complete [Gao et al., 2013]. Several approaches have been proposed to improve performance for discrete constraint-satisfaction problems using decompositions, including [Darwiche, 2001; Bacchus et al., 2009]. Recent work in this direction is [Friesen and Domingos, 2015], which proposes a problem-decomposition approach in which the original nonconvex optimization problem is decomposed into subproblems that are approximately independent. In contrast, our approach first decomposes the problem and then uses a learning procedure to find simpler abstractions of the problem that are more tractable to solve. The main drawback of finding such simplifying interpolants through syntactic manipulations [Albarghouthi and McMillan, 2013; Drews and Albarghouthi, 2016] is that these methods are typically slow and difficult to scale. A key concern with using learning-based approaches for constraint solving, however, is obtaining provable relaxations or restrictions of the original problem. Our approach decomposes a set of nonlinear arithmetic constraints into subsets and learns simpler relaxations and restrictions for them. To learn these simplifications, we begin by computing a large number of satisfying and falsifying instances for each subset of constraints using a sampling procedure. For a given set of samples, we use a semi-soft support vector machine (SVM) to learn asymmetric classifiers as candidate antecedents and consequents. The semi-soft SVM is embedded into a refinement loop in which a reduced constraint problem is solved to ensure that the candidate antecedent and consequent are indeed a provable antecedent and a provable consequent.
The simpler learned antecedents can be used to find a satisfying instance, and the learned consequents can be used to demonstrate that no satisfying instance exists. Other potential applications of the abstraction-learning framework proposed here include finding Craig interpolants and reachability analysis for dynamical systems. Other attempts at using learning to find interpolants in domains such as program analysis and safety analysis for hybrid systems [Sharma et al., 2012; Aréchiga et al., 2015] do not address the problem of scalability, which is the main focus of our work. We show that our techniques significantly improve the performance of constraint solving on various challenging benchmarks, including model-predictive control of a Dubins car, finding valid encoder expressions for FPGA design, and geometric reasoning in high-dimensional spaces.

The paper is structured as follows. First, we provide an overview of the problem statement and the terminology used in the paper. Next, we describe our asymmetric non-strict classifier, which is used to learn abstractions of sets of constraints. Subsequently, we describe the heuristics that we use to decompose the constraints into subsets for abstraction. Lastly, we describe our experimental results and conclude.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

2 Background

Constraint solving over real numbers. Suppose a set of constraints $A(x) = \{A_1(x), \ldots, A_m(x)\}$ is given, where $x$ represents a vector of variables. The real-valued constraint-solving problem, also called satisfiability modulo the theory of the reals, is to compute a real-valued vector $\hat{x}$ that simultaneously satisfies all of the constraints in $A$, or to prove that no such vector exists. If a satisfying instance $\hat{x}$ is found, then we say that the constraint set is satisfiable. If no satisfying instance exists, then we say that the constraint set is unsatisfiable.

Antecedents and consequents.
Let $A(x)$ be a logical formula with a vector of free variables $x$. A logical formula $\alpha(x)$ is an antecedent of $A(x)$ if $\forall x.\ \alpha(x) \Rightarrow A(x)$. This means that the values of $x$ that satisfy $\alpha$ are a subset of the values of $x$ that satisfy $A$, so we also call $\alpha(x)$ an underapproximator of $A(x)$. Conversely, a formula $\gamma(x)$ is a consequent of $A(x)$ if $\forall x.\ A(x) \Rightarrow \gamma(x)$. In this case, all values of $x$ that satisfy $A(x)$ also satisfy $\gamma(x)$, so we refer to $\gamma(x)$ as an overapproximator of $A(x)$. If $\{A_1, \ldots, A_k\}$ is a set of constraints, we say that $\alpha(x)$ is an antecedent for the set of constraints if it is an antecedent for their conjunction, i.e.,

$$\forall x.\ \alpha(x) \Rightarrow A_1 \wedge \cdots \wedge A_k. \tag{1}$$

The consequent of a set of constraints is similarly defined to be a logical consequence of the conjunction of the constraints.

3 Faster Solving With Abstractions

Our approach is to decompose the original set of constraints $A = \{A_1, \ldots, A_m\}$ into subsets $A_1, \ldots, A_k$, and then learn antecedents and consequents for each subset. We then use these antecedents and consequents to solve the original constraint-satisfaction problem. The details of our learning procedure are described in Section 4; for now we simply note that it relies on obtaining satisfying and falsifying instances and learning classifiers from them that are antecedents or consequents. Let $\alpha_j$ be the learned antecedent for the subset $A_j$, and let $\gamma_j$ be its consequent. We then solve the sets of constraints $\alpha = \{\alpha_1, \ldots, \alpha_k\}$ and $\gamma = \{\gamma_1, \ldots, \gamma_k\}$. If the antecedents $\alpha$ are satisfiable by some instance $\hat{x}$, then $\hat{x}$ satisfies the original constraint set, by the definition of antecedence. Conversely, if the consequents $\gamma$ are unsatisfiable, then the original set $A$ is also unsatisfiable: if $A$ were satisfiable, then each $A_j$ would be satisfiable, and so would each $\gamma_j$, by the definition of consequence. Alternatively, if $\alpha$ is unsatisfiable and $\gamma$ is satisfiable, then we cannot conclude anything about $A$.
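This case analysis amounts to a small decision procedure. The sketch below illustrates it with hypothetical helper names (`solve` stands in for any constraint solver that returns a model or `None`), using one-variable interval constraints as a toy theory:

```python
def solve_with_abstractions(solve, antecedents, consequents):
    """Decide the original problem A from its learned abstractions:
    a model of the antecedents is a model of A; unsatisfiable
    consequents prove A unsatisfiable; otherwise we must refine."""
    model = solve(antecedents)            # conjunction of the alpha_j
    if model is not None:
        return ("sat", model)
    if solve(consequents) is None:        # conjunction of the gamma_j
        return ("unsat", None)
    return ("unknown", None)              # refine abstractions, try again

def interval_solve(constraints):
    """Toy 'solver' for constraints of the form lo <= x <= hi."""
    lo = max(c[0] for c in constraints)
    hi = min(c[1] for c in constraints)
    return (lo + hi) / 2 if lo <= hi else None

# A = {x in [0,2], x in [1,3]}, with antecedent x in [1,2] and consequent x in [0,3]
print(solve_with_abstractions(interval_solve, [(1, 2)], [(0, 3)]))  # ('sat', 1.5)
```

The "unknown" branch is where the refinement loop of Section 4 takes over.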
We must refine our antecedents and consequents and try again. Our technique provides performance improvements under the following premises:

1. The subsets of constraints $A_j$ are simple enough that they can be solved quickly and repeatedly to obtain satisfying and falsifying instances.
2. The learned antecedents and consequents are simple enough that checking the properties of antecedence and consequence is fast.
3. The learned antecedents and consequents are accurate enough to establish the existence or non-existence of a solution.

These requirements are inherently conflicting. Greater simplification is obtained by considering larger subsets $A_j$, but larger subsets are harder to solve quickly to obtain samples. Similarly, simpler antecedents and consequents are easier to check, but provide poorer approximations of the original problem.

To illustrate the proposed approach, consider the problem of solving the constraint sets $A$ and $B$ shown in Figure 1. The region arising from constraint set $A$ is underapproximated by the region $A'$. The learned constraint set $A'$ is then solved in conjunction with $B$ to find a solution satisfying both $A$ and $B$. Since $A'$ has a simpler form, this enables faster computation.

4 Learning Abstractions

This section describes an approach for conservative learning of constraints with SVMs [Cortes and Vapnik, 1995]. First, we use a sampling procedure to generate satisfying and falsifying instances for the set of constraints. The proposed sampling procedure (Algorithm 1, described in Section 4.2) uses an existing constraint solver and is able to generate samples while keeping the constraint queries small. Next, we use a semi-soft SVM with boosting to generate a constraint that serves as either a candidate antecedent or a candidate consequent. This classifier is refined in an abstraction-refinement loop until the SMT solver is able to verify that the candidate constraint is a true antecedent or consequent.
This constraint is a conjunction of classifiers, and the procedure is described in Algorithm 2. We first present the formulation of the semi-soft SVM, followed by the approach for boosting, where the approximation is learned as a conjunction of classifiers.

Figure 1: Illustration of the approach

Algorithm 1: Sampling Procedure
  Input: set of constraints A(x); parameters l, m; radii R1 > R2 > ... > Rl
  Output: set of points X+ satisfying A(x)
   1  X+ := ∅
   2  for j = 1..m do
   3      sample p ⊨ A(x) such that d(p, s) > R1 for all s ∈ X+
   4      X+ := X+ ∪ {p}
   5  for i = 1..l-1 do
   6      Xnew := ∅
   7      for q ∈ X+ do
   8          for j = 1..m do
   9              sample p ⊨ A(x) such that d(p, q) < Ri and d(p, s) > Ri+1 for all s ∈ X+ ∪ Xnew
  10              Xnew := Xnew ∪ {p}
  11      X+ := X+ ∪ Xnew
  12  return X+

Algorithm 2: Learning abstractions for sets of nonlinear arithmetic constraints
  Input: set of constraints A(x); parameters m, k
  Output: g(x) such that A(x) ⇒ g(x) > 0
   1  sample a set of points X+ with A(x) as input to Algorithm 1
   2  sample a set of points X− with ¬A(x) as input to Algorithm 1
   3  initialize g0(x) := TRUE; iter := 0
   4  set X := X+ ∪ X−
   5  map X to a higher-dimensional feature space
   6  cluster the points of X− into N clusters
   7  learn a classifier separating each cluster of X− from X+ using the semi-soft SVM
   8  set g(x) := g0(x) ∧ (conjunction of the N classifiers)
   9  if A(x) ⇒ g(x) > 0 is valid or iter ≥ k then
  10      return g(x)
  11  else
  12      iter := iter + 1
  13      g0 := g(x)
  14      sample a set of points X' with ¬(A(x) ⇒ g(x) > 0) as input to Algorithm 1
  15      X+ := X+ ∪ X'
  16      go to 4

4.1 Semi-Soft SVM

As a supervised learning approach, SVMs are known to be effective at learning classifiers. Two types of SVM are commonly employed: hard margin and soft margin. Hard-margin SVMs do not allow any misclassification, but may be infeasible if the data points are not linearly separable.
On the other hand, soft-margin SVMs allow some points to be misclassified, but this does not serve our purpose: in general, an antecedent must exclude all falsifying instances and a consequent must include all satisfying instances, so soft-margin SVMs are not suitable. Instead, we use a semi-soft-margin SVM, in which only samples of one type may be misclassified. When searching for an antecedent (an underapproximation of the satisfying instances), we allow some positive examples to be misclassified. Similarly, when searching for a consequent, we allow some negative examples to be misclassified, resulting in an overapproximation of the satisfying instances. Asymmetric SVMs have previously been of interest for minimizing false negatives (or false positives) in certain applications. Some efforts directed at asymmetric learning [Wu et al., 2008; Wu et al., 2013] aim to minimize false negatives, but for learning consequents we must eliminate false negatives entirely.

The semi-soft SVM yields a quadratic program. To abstract the constraints $A_j$, we label its satisfying instances as $+1$ and its falsifying instances as $-1$. To compute a consequent, we set up the optimization problem to necessarily yield positive labels for the positive instances, while simply making a best-effort attempt to provide negative labels for the negative instances. Let $X_+$ be the set of satisfying instances of the subset of constraints $A_j(x)$ and $X_-$ the set of falsifying instances of $A_j(x)$; both are obtained by the sampling procedure described in Section 4.2. We search for a consequent via the following optimization problem, which learns the classifier $g(x) = \mathrm{sgn}(w_0 + w^T x)$:

$$
\begin{aligned}
\min_{w, w_0, e_b}\quad & \tfrac{1}{2}\left(w_0^2 + w^T w\right) + \lambda \sum_{b=1}^{N_-} e_b \\
\text{s.t.}\quad & y_-\left(w_0 + w^T x_b^-\right) \ge 1 - e_b, \quad b = 1, \ldots, N_-, \\
& y_+\left(w_0 + w^T x^+\right) \ge 1 \quad \text{for all } x^+ \in X_+,
\end{aligned}
\tag{2}
$$

where $X_- = \{x_1^-, x_2^-, \ldots, x_{N_-}^-\}$ and $N_- = |X_-|$. Here the $e_b \ge 0$ are slack variables allowing for misclassification of the points in $X_-$.
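A minimal numeric sketch of this asymmetric scheme follows. It approximates the hard constraints on the positive samples by a large hinge penalty and uses subgradient descent rather than an exact QP solver, so it only illustrates the asymmetry; all names are ours, not from the paper's implementation.

```python
import numpy as np

def semi_soft_svm(X_pos, X_neg, lam=1.0, hard_penalty=100.0,
                  lr=1e-3, iters=20000):
    """Learn w, w0 with (approximately) hard margins on positive samples
    and soft slack margins on negative samples."""
    X = np.vstack([X_pos, X_neg]).astype(float)
    y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg))])
    c = np.where(y > 0, hard_penalty, lam)    # asymmetric hinge weights
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        margins = y * (X @ w + w0)
        viol = margins < 1.0                   # hinge-active samples
        gw = w - (c[viol] * y[viol]) @ X[viol]
        g0 = -np.sum(c[viol] * y[viol])
        w, w0 = w - lr * gw, w0 - lr * g0
    return w, w0

# Toy 1-D data: the negative point at 2.5 is sacrificed with slack,
# while every positive point must end up on the positive side.
w, w0 = semi_soft_svm([[2.0], [3.0]], [[-2.0], [-1.0], [2.5]])
```

Proposition 1's guarantee corresponds here to `w0 + w·x > 0` for every positive sample after training, while the outlying negative sample is allowed to fall on the wrong side.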
Notice that this deviates from the standard soft-margin SVM in that only one type of data point is assigned slack variables. The labels $y_+ = +1$ and $y_- = -1$ correspond to points in $X_+$ and $X_-$, respectively. Note that this optimization problem imposes hard constraints for classifying positive points, but soft constraints for classifying negative points.

Proposition 1. For any given sets of points $X_+$ and $X_-$, the semi-soft SVM as formulated in (2) finds a hyperplane $w_0 + w^T x = 0$ such that $\forall x^+ \in X_+,\ w_0 + w^T x^+ > 0$.

Proof. For $x^+ \in X_+$, by the constraint $y_+(w_0 + w^T x^+) \ge 1$, we have $w_0 + w^T x^+ \ge 1 > 0$. The minimization problem in (2) is a quadratic program (QP). One solution to the constraints of the QP is $w_0 = 1$, $w = 0$, and $e_b = 2$, so the QP has a non-empty feasible set. Therefore, (2) always returns a hyperplane with the above property of correctly classifying the points in $X_+$.

The case for computing an antecedent is derived in a straightforward manner, by requiring that the classifier correctly provide negative labels for the negative instances, while making a best-effort attempt to provide positive labels for the positive instances.

4.2 Sampling for Learning

A key aspect of this problem is sampling points that reasonably cover both the regions that satisfy the constraints and the regions that do not, so that meaningful classifiers can be learned. One such sampling strategy follows. We use a distance metric $d(\cdot, \cdot)$ to enforce spacing between samples; for the experiments in this paper, the distance function is the $\ell_2$-norm. We begin by choosing a large radius $R_0$ and sampling a set of points $C = \{c_1, \ldots, c_k\}$ that are pairwise separated by a distance of at least $R_0$. These samples serve as centers of the circles from which we sample at the next iteration. Now, consider a smaller radius $R_1 < R_0$.
Define circles of radius $R_0$ around each of the sampled points, $\Omega_i = \{x \mid d(x, c_i) \le R_0\}$. In each $\Omega_i$, sample points that are at least $R_1$ apart. In this manner, we iterate through a sequence of decreasing radii $R_2, R_3, \ldots, R_l$, defining circles of radius $R_i$ around the sampled points and sampling points from these circles that are at least $R_{i+1}$ apart, and so on. The radii $R_i$ are chosen so that the points sampled in a particular layer are well distributed over the feasible region for that layer (the intersection of the feasible region of the problem with the circles defined above). Given a fixed number of samples $m$ per layer, a suitable radius for that layer can be determined by bisection search: approximately the largest radius at which $m$ samples can still be found. At this radius, no remaining point of the feasible region lies farther than $R_i$ from the $m$ sampled points; increasing the radius further would mean that $m$ such points cannot be found, and if more than $m$ points can be found, we increase the radius until only $m$ are found.

This approach provides good coverage in our experiments. However, even if the initial samples are poorly distributed, the CEGAR loop described in Section 4.3 iteratively searches for antecedents and consequents, and terminates only once provable abstractions are found. As a result, the quality of the initial samples affects only performance, not soundness. The approach is described more formally in Algorithm 1. The term "sample" in Algorithm 1 refers to a query to the constraint solver for a feasible point satisfying the constraints. Although Algorithm 1 provides no formal guarantee that $m$ samples are found in each iteration, the learning can be performed with fewer than $m$ samples. When the constraints have no solution and no sample can be found, the learned classifier is false.
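The layered scheme can be sketched as follows, with uniform random search standing in for the solver query `sample p ⊨ A(x)` (in the actual procedure each draw would be an SMT query); the function names and the rejection-sampling stand-in are ours:

```python
import math
import random

def layered_sample(feasible, radii, m, lo=-10.0, hi=10.0, tries=2000):
    """Layered 2-D sampling: seed points separated by radii[0]; then, for
    each consecutive pair (R_i, R_{i+1}), new points inside the R_i-ball
    around every existing point, separated by at least R_{i+1}."""
    def draw(pred):
        for _ in range(tries):
            p = (random.uniform(lo, hi), random.uniform(lo, hi))
            if feasible(p) and pred(p):
                return p
        return None                     # stand-in for an unsat solver query

    pts = []
    for _ in range(m):                  # first layer: well-separated seeds
        p = draw(lambda p: all(math.dist(p, s) > radii[0] for s in pts))
        if p is not None:
            pts.append(p)
    for r, r_next in zip(radii, radii[1:]):   # refinement layers
        new = []
        for q in list(pts):
            for _ in range(m):
                p = draw(lambda p: math.dist(p, q) < r and
                         all(math.dist(p, s) > r_next for s in pts + new))
                if p is not None:
                    new.append(p)
        pts += new
    return pts

random.seed(0)
inside_disc = lambda p: p[0] ** 2 + p[1] ** 2 <= 25.0   # A(x): a disc
samples = layered_sample(inside_disc, radii=[4.0, 1.0], m=3)
```

By construction every returned point satisfies the constraint and any two points are separated by more than the smallest radius, mirroring the spacing conditions in Algorithm 1.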
Besides good coverage, an advantage of this approach is that it prevents explosive growth of the formulas used for sampling with an SMT solver. Once the first set of samples is generated, the procedure is parallelized to generate samples in the balls around each of these points, since sampling within each ball is independent of the others.

4.3 Counterexample-Guided Abstraction Refinement (CEGAR)

Counterexample-guided abstraction refinement (CEGAR) was first proposed in [Clarke et al., 2000] for refining abstractions of control structures in programs. We develop our algorithm in the CEGAR paradigm. Once the classifier $g(x)$ has been learned by the semi-soft SVM, it remains to validate that the learned classifier is an overapproximation (respectively, underapproximation) of the constraint set we seek to abstract. Let $A_i$ be the conjunction of the constraints in $\mathcal{A}_i$. To verify that the learned classifier is indeed a consequent (antecedent) of the constraint set $A_i(x)$, we check the consequence condition $A_i(x) \Rightarrow g(x) > 0$ (for antecedence, $g(x) > 0 \Rightarrow A_i(x)$). If the negation of this formula is unsatisfiable, i.e., there is no assignment to $x$ for which the implication is false, we have learned a consequent $\gamma_i(x) \equiv (g(x) > 0)$ (respectively, an antecedent $\alpha_i(x) \equiv (g(x) > 0)$). Otherwise, we sample points that violate the formula as in Algorithm 1, add them to the training set, and learn a new classifier. This procedure is repeated until we have learned a consequent (antecedent). Note that although Algorithm 2 may fail to terminate, any abstraction it certifies is a true consequent for $A_i(x)$.

4.4 Boosting

In some problems, the CEGAR procedure may fail to converge after a large number of iterations, or may learn poor approximations of the original problem. This can occur when the constraints to be approximated represent regions of space that cannot be approximated by a single classifier.
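The refinement loop of Section 4.3 can be sketched generically as follows; `learn` and `find_counterexample` are stand-ins for the semi-soft SVM and the SMT validity query (all names ours), and the toy instance learns a threshold consequent for A(x) := 1 ≤ x ≤ 2:

```python
def cegar_consequent(learn, find_counterexample, pos, neg, max_iters=50):
    """Refine a candidate classifier g until no counterexample to
    A(x) => g(x) > 0 remains; counterexamples satisfy A, so they are
    added to the positive training set."""
    for _ in range(max_iters):
        g = learn(pos, neg)
        cex = find_counterexample(g)
        if cex is None:
            return g                  # verified consequent
        pos.append(cex)
    return None                       # gave up within the iteration bound

def learn_threshold(pos, neg):        # toy learner: g(x) = x - t
    t = min(pos) - 0.1
    return lambda x: x - t

def cex_on_grid(g):                   # stand-in for the SMT validity check
    for k in range(21):
        x = 1.0 + 0.05 * k            # grid over A = [1, 2]
        if g(x) <= 0:
            return x
    return None

g = cegar_consequent(learn_threshold, cex_on_grid, pos=[1.5], neg=[0.0])
```

Here the first candidate misclassifies the satisfying point x = 1.0; it is returned as a counterexample, added to the positive samples, and the second candidate is verified.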
This section describes how to sequentially learn classifiers that, in conjunction, form a better relaxation. As a first step, the falsifying points are partitioned into clusters using a clustering algorithm (K-means for all examples presented in this paper). A classifier is learned for each cluster, separating it from all the feasible points. These classifiers in conjunction form a first approximation of the constraints. Next, a few iterations of CEGAR are run in which this approximation is refined. Subsequently, we collect the misclassified falsifying points and learn a new classifier from the misclassified points and the satisfying points in the same manner. Note that, by construction of the semi-soft SVM, none of the feasible points is misclassified. During the CEGAR step that learns the new classifier, the condition $A_i(x) \Rightarrow \gamma_i(x)$ is checked, where $\gamma_i(x)$ is now the conjunction of the most recently learned classifier and the previous classifiers. The classifier learned at the end of these CEGAR iterations is the conjunction of the two classifiers (each of which is in turn a conjunction of classifiers). This differs from conventional boosting [Ting and Zhu, 2009], where the learned classifier is a weighted sum of an ensemble of classifiers, and from cascade SVMs [Graf et al., 2004], where the classifiers are learned in parallel and then combined into a single one.

Figure 2: Quadratic overapproximation of the infeasible region.

4.5 Example

To demonstrate the approach described above, consider the following constraint:

$$A = \left(x^2 + y^3 - 4x \le 4\right) \wedge \left(x^2 + y^3 + 30y \le 20\right) \wedge \left(x^5 + 3xy - 2x \le 5\right).$$

Suppose we want to find a point that lies outside $A$ but satisfies a constraint $C$, i.e., we are trying to solve $\neg A \wedge C$.
We first seek to learn an underapproximation of the feasible region ($\neg A$), or equivalently an overapproximation of the infeasible region ($A$), with quadratic classifiers, since a collection of quadratic constraints is easier to solve than a collection of higher-order polynomial constraints. Recall that by negating the overapproximation of $A$, we obtain an underapproximation of $\neg A$. For the constraint $A$ we obtain satisfying and falsifying instances using Algorithm 1. Next, the sampled instances are transformed to a higher-dimensional space by a feature map [Boser et al., 1992], which lets us learn quadratic classifiers in the higher-dimensional space by the semi-soft SVM procedure described in Section 4.1. An antecedent for $\neg A$ is learned as described in Algorithm 2; recall that antecedence is verified using an SMT solver. Here, the number of clusters is set to 2 and we repeat CEGAR until it converges. We learn an abstraction that is a conjunction of two quadratic classifiers:

$$A' = \left(0.109xy - 0.788x + 0.915x^2 + 0.935y + 0.135y^2 - 1.03 \ge 0\right) \wedge \left(5.077 - 0.338xy + 6.001x + 1.36x^2 + 0.887y + 0.257y^2 \ge 0\right).$$

$A'$ underapproximates $\neg A$. To solve $\neg A \wedge C$, we solve $A' \wedge C$. Figure 2 shows the original set of constraints (red), with the infeasible region ($A$) shaded and the learned classifiers drawn in blue. The feasible area is underapproximated by the intersection of the regions outside each of the classifiers. The solution space for the problem is restricted to $x \in [-2\pi, 2\pi]$ and $y \in [-2\pi, 2\pi]$.

5 Decomposition Methods

This section describes the heuristics that we use to select which constraints should be considered together when learning an abstraction. The first level of decomposition is based on the Hamming distance.

Hamming decomposition. This heuristic groups together constraints that share many variables, i.e., constraints whose sets of variables, treated as incidence vectors, have a small Hamming distance.
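Representing each constraint by its free-variable set, the distance and the single-pass grouping can be sketched as follows (names are ours):

```python
def hamming(fv_m, fv_n):
    """Size of the symmetric difference of two free-variable sets,
    i.e. the Hamming distance between their incidence vectors."""
    return len(fv_m ^ fv_n)

def hamming_decompose(fv_sets, theta):
    """Greedy pass: each ungrouped constraint seeds a class and collects
    every remaining constraint within distance theta of the seed."""
    groups, used = [], set()
    for i, fv in enumerate(fv_sets):
        if i in used:
            continue
        group = [i]
        used.add(i)
        for j in range(i + 1, len(fv_sets)):
            if j not in used and hamming(fv, fv_sets[j]) <= theta:
                group.append(j)
                used.add(j)
        groups.append(group)
    return groups

print(hamming_decompose([{"x", "y"}, {"x", "z"}, {"u", "v"}], theta=2))
# [[0, 1], [2]]
```

In the example, the first two constraints differ only in {y, z} (distance 2), so they share a class, while {u, v} is distance 4 from the seed and starts its own class.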
Let $FV(\cdot)$ be the function that takes a clause and returns its set of free variables. The Hamming distance between constraints $c_m$ and $c_n$ is then

$$H(c_m, c_n) = \left|\left(FV(c_m) \setminus FV(c_n)\right) \cup \left(FV(c_n) \setminus FV(c_m)\right)\right|.$$

We assume that a maximum distance bound $\theta$ is given, such that any two constraints with Hamming distance at most $\theta$ are grouped together for abstraction. Our implementation loops over the constraints: starting with the first constraint $c_0$, it chooses all constraints $c_k$ such that $H(c_0, c_k) \le \theta$. After the first pass, it chooses the next constraint that has not yet been grouped and loops over the remaining constraints. If we group the constraints into a few classes with many constraints each, then sampling each class to generate abstractions becomes computationally difficult. In the limit, if all constraints are grouped into a single class, then finding a single satisfying instance amounts to solving the original satisfiability problem.

Bounded-size decomposition. In some cases, the above decomposition may not produce classes small enough for sampling to be performed quickly. In this case, we set a bound $n$ and divide the large classes into smaller subsets of $n$ constraints each.

6 Experiments

This section details the results of experiments on various benchmark sets, run on a 32-core AMD Opteron machine at 2.3 GHz with 96 GB of RAM.¹ A timeout of 200 seconds is set for abstracting each subset; if the abstraction procedure fails to finish in this period, the constraints are used as-is.

¹Benchmark problem instances can be found at https://github.com/dathath/IJCAI_2017_SD

Figure 3: Performance on various benchmarks. (a) Car reachability: runtime (log scale, milliseconds) versus number of time steps, for the abstracted constraints with and without the time for abstraction. (b) Randomized spheres: runtime (log scale, seconds) versus number of constraints.

Figure 4: Performance on the Hong family: dReal runtime (ms) versus dimension $d$, before and after abstraction.

Model-predictive control for a Dubins car. The model-predictive control problem is to find a sequence of control inputs so that a car reaches a specific position. We use a Dubins car dynamics model with additional nonlinear terms:

$$a_{t+1} = a_t + c_1\left(a_t + x_t / y_t\right), \quad y_{t+1} = y_t + c_2 \cos(a_t)\, y_t^2, \quad x_{t+1} = x_t + c_3 \sin(a_t)\, x_t^2.$$

The constraints encode starting from an initial point, and the objective is to determine whether a target polytope can be reached, i.e., $(x_{\text{final}}, y_{\text{final}}) \in [a, b] \times [c, d]$. The benchmark comes from practical problems in mobile robotics, and its scale can be varied by changing the number of time steps. All the problem instances are satisfiable. Figure 3a plots the runtimes for dReal, for our abstraction approach, and for checking the abstractions alone. We cannot compare against the unabstracted performance of z3, because z3 does not support solving over sines and cosines. For the same reason, the sampling and abstraction checking are carried out with dReal, though we solve the simplified problem with z3: in this example the resulting abstractions are conjunctions of linear constraints, and z3 typically performs better than dReal on linear constraints. The classifier learned for this example is a conjunction of several linear classifiers, one learned during each step of CEGAR. The Hamming-distance threshold here was set to 4, which results in subsets of 3 constraints each. Since sampling these constraints is fast, further decomposition is not required.
The abstracted constraints are always solved faster than the original constraints, which could be useful in an application scenario where abstractions can be cached and reused. For large problems, the abstraction-based approach (including the time for abstraction) performs considerably better than solving directly with dReal.

Energy-efficient FPGA design. The constraint sets here correspond to the search for valid probabilistic encoder expressions in a value-deviation-bounded serial encoding technique [Stanley-Marbell et al., 2016]. The search problem is to find assignments of probabilities for a set of Bernoulli random variables such that they satisfy the constraints for valid encoder-family formulations. The Hamming-distance threshold is set to 3 and $n$ is set to 5. The number of clusters was set to 2. The classifiers learned were of the form

$$a_i x_i^2 + b_i x_i + c_0 > 0 \ \wedge\ d_i x_i^2 + e_i x_i + f_0 > 0.$$

Sphere packing in high dimensions. In this set of benchmarks, we solve for a point in the intersection of the exteriors of a set of randomized high-dimensional spheres of the form

$$\sum_{j \in I} (x_j - a_j)^2 \ge 30 \tag{5}$$

and cubes bounding the search region ($|x_k| \le 30$). For each sphere, the parameters $a_j,\ j \in I$, are chosen uniformly at random from the range $[-50, 50]$, and the variable indices $I$ of each constraint are chosen uniformly at random from $\{1, 2, \ldots, 15\}$. Figure 3b plots the runtimes for varying numbers of constraints.

| Solver | No. of Benchmarks Solved | Average Time for Benchmarks Fully Abstracted (ms) | Avg. Number of Constraints Input Per Instance | Avg. Degree of Constraints Input to Solver |
|---|---|---|---|---|
| z3 | 29 | 1793.4 | 32 | 5.93 |
| dReal | 28 | 1788.8 | 32 | 5.93 |
| Post-abstraction (with dReal) | 20 | 133.3 | 11.5 | 2.06 |

Table 1: Comparison on benchmarks for which the abstraction procedure completes.
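The randomized instances can be generated as in the sketch below (the generator and membership-check names are ours; in the benchmark each constraint (5) is handed to the solver symbolically):

```python
import random

def make_sphere_instance(n_spheres, dim=15):
    """Each constraint is a sphere-exterior condition
    sum_{j in I} (x_j - a_j)^2 >= 30 over a random variable subset I,
    with centers a_j drawn uniformly from [-50, 50]."""
    constraints = []
    for _ in range(n_spheres):
        k = random.randint(1, dim)
        idx = random.sample(range(dim), k)        # random variable subset I
        constraints.append({j: random.uniform(-50.0, 50.0) for j in idx})
    return constraints

def satisfies(x, constraints, box=30.0):
    """Check a candidate point against the box and all sphere exteriors."""
    return (all(abs(v) <= box for v in x) and
            all(sum((x[j] - a) ** 2 for j, a in c.items()) >= 30.0
                for c in constraints))

random.seed(7)
instance = make_sphere_instance(5)
```

A solver would search the box for a point on which `satisfies` holds; the abstraction procedure instead replaces groups of these nonlinear constraints with simpler learned classifiers before that search.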
The time for solving the abstracted constraints is significantly smaller than the time for solving directly with dReal or z3, and the time for abstraction scales almost linearly, since the time for abstracting each subset is independent of the total number of constraints. The parameters for the learning procedure are the same as those for the FPGA design problem.

Hong family. These benchmarks correspond to the Hong family [Jovanović and de Moura, 2013]. We restrict the solution space to the positive quadrant. The problem instances are unsatisfiable by construction. A parameterized generalization of the problem is

$$\sum_{i=1}^{n} x_i > 1, \qquad \sum_{i=1}^{n} x_i^2 < 1.$$

To facilitate learning abstractions, we rewrite the problem by replacing the quadratic constraint with constraints of the form $x_i^2 + x_{i-1}^2 \le z_j$ and $\sum_j z_j \le 2$. This makes the sampling process easier, because each of these constraints needs to be sampled in only three dimensions. The Hamming threshold is set to 3, and bounded-size decomposition is not used here. The classifiers learned are conjunctions of linear classifiers, as in the car benchmark. Figure 4 shows the runtimes. z3 does not scale beyond low dimensions, while dReal scales almost linearly. The abstraction-based scheme does reasonably well at low dimensions, but at higher dimensions the computed relaxations slow dReal down.

7 Complexity and Discussion

Since the benchmarks contain trigonometric and highly nonlinear functions, even the relaxed versions of the problem that admit solutions are NP-complete [Gao et al., 2013]. Existing solvers run into scalability issues when dealing with large constraint-solving instances, as in [Gorcitz et al., 2015]. In our approach, the number of samples grows exponentially with the number of variables in the constraints, but by restricting attention to small subsets of the original set of constraints, this exponential growth can be capped.
Lloyd's algorithm [Lloyd, 2006] for K-means clustering runs in linear time per iteration, and the semi-soft SVM yields a QP that is solvable in polynomial time. Verifying that a classifier learned from a small subset of constraints is a consequent (or an antecedent) is a reduced problem whose complexity is much smaller than solving the entire set of constraints, since the complexity of constraint solving scales exponentially. We thus use a sequence of polynomial-time algorithms to learn an approximate, simpler instance of the constraint-satisfaction problem. From the experiments, we observe that although the abstraction procedure is not complete, it can improve the handling of large sets of complex constraints by abstracting certain subsets of the constraints. The experiments demonstrate the potential of the learning-based abstraction approach for improving the scalability of constraint solvers.

8 Future Work

A future direction is to develop more sophisticated heuristics for deciding which constraints to abstract together: certain decompositions could yield regions that admit low-complexity, high-fidelity abstractions. Additionally, our current implementation samples points that satisfy and falsify a logical formula without considering the other logical constraints. In future work, we would like to address the scenario of proving a sequent, leveraging knowledge about the consequent to guide the search for abstractions of portions of the antecedent. Another immediate direction is to restrict the variables in the classifiers for a subset to those that the subset shares with the remaining constraints: a subset imposes restrictions on the other constraints only through the shared variables, so learning constraints over the shared variables should suffice.
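As a sketch of that last idea (the helper name is ours), the shared variables through which a subset interacts with the rest of the problem are just an intersection of unions of free-variable sets:

```python
def shared_variables(subset_fvs, rest_fvs):
    """Variables a constraint subset shares with the remaining
    constraints; classifiers for the subset could be restricted to
    these, since the subset constrains the rest only through them."""
    return set().union(*subset_fvs) & set().union(*rest_fvs)

print(shared_variables([{"x", "y"}], [{"y", "z"}, {"z", "w"}]))  # {'y'}
```

In this toy case a classifier for the first subset would then be learned over y alone, rather than over both x and y.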
Acknowledgments

This work was partially supported by STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, and in part by the Toyota Info Technology Center and NSF CPS-1446725. The authors would also like to thank Joel W. Burdick for helpful input.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

References

[Albarghouthi and McMillan, 2013] Aws Albarghouthi and Kenneth L. McMillan. Beautiful interpolants. In Natasha Sharygina and Helmut Veith, editors, Computer Aided Verification: 25th International Conference, CAV 2013, Saint Petersburg, Russia, July 13-19, 2013, Proceedings, pages 313–329, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.

[Aréchiga et al., 2015] Nikos Aréchiga, James Kapinski, Jyotirmoy V. Deshmukh, André Platzer, and Bruce Krogh. Forward invariant cuts to simplify proofs of safety. In Proceedings of the 12th International Conference on Embedded Software, EMSOFT '15, pages 227–236, Piscataway, NJ, USA, 2015. IEEE Press.

[Bacchus et al., 2009] Fahiem Bacchus, Shannon Dalmao, and Toniann Pitassi. Solving #SAT and Bayesian inference with backtracking search. J. Artif. Int. Res., 34(1):391–442, March 2009.

[Boser et al., 1992] Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92, pages 144–152, New York, NY, USA, 1992. ACM.

[Brown and Davenport, 2007] Christopher W. Brown and James H. Davenport. The complexity of quantifier elimination and cylindrical algebraic decomposition. In Proceedings of the 2007 International Symposium on Symbolic and Algebraic Computation, ISSAC '07, pages 54–60, New York, NY, USA, 2007. ACM.

[Clarke et al., 2000] Edmund Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith. Counterexample-guided abstraction refinement. In E. Allen Emerson and Aravinda Prasad Sistla, editors, Computer Aided Verification: 12th International Conference, CAV 2000, Chicago, IL, USA, July 15-19, 2000, Proceedings, pages 154–169, Berlin, Heidelberg, 2000. Springer Berlin Heidelberg.

[Cortes and Vapnik, 1995] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

[Darwiche, 2001] Adnan Darwiche. Recursive conditioning. Artificial Intelligence, 126(1):5–41, 2001.

[Drews and Albarghouthi, 2016] Samuel Drews and Aws Albarghouthi. Effectively propositional interpolants. In Swarat Chaudhuri and Azadeh Farzan, editors, Computer Aided Verification: 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part II, pages 210–229, Cham, 2016. Springer International Publishing.

[Friesen and Domingos, 2015] Abram L. Friesen and Pedro Domingos. Recursive decomposition for nonconvex optimization. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI '15, pages 253–259. AAAI Press, 2015.

[Gao et al., 2013] Sicun Gao, Soonho Kong, and Edmund M. Clarke. dReal: An SMT solver for nonlinear theories over the reals. In Automated Deduction – CADE-24: 24th International Conference on Automated Deduction, Lake Placid, NY, USA, June 9-14, 2013, Proceedings, pages 208–214, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.

[Gorcitz et al., 2015] Raul Gorcitz, Emilien Kofman, Thomas Carle, Dumitru Potop-Butucaru, and Robert de Simone. On the Scalability of Constraint Solving for Static/Off-Line Real-Time Scheduling, pages 108–123. Springer International Publishing, Cham, 2015.

[Graf et al., 2004] Hans P. Graf, Eric Cosatto, Léon Bottou, Igor Dourdanovic, and Vladimir Vapnik. Parallel support vector machines: The cascade SVM. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 521–528. MIT Press, Cambridge, MA, 2004.
[Jovanović and de Moura, 2013] Dejan Jovanović and Leonardo de Moura. Solving non-linear arithmetic. ACM Commun. Comput. Algebra, 46(3/4):104–105, January 2013.

[Lloyd, 2006] Stuart P. Lloyd. Least squares quantization in PCM. IEEE Trans. Inf. Theor., 28(2):129–137, September 2006.

[Sharma et al., 2012] Rahul Sharma, Aditya V. Nori, and Alex Aiken. Interpolants as classifiers. In Proceedings of the 24th International Conference on Computer Aided Verification, CAV '12, pages 71–87, 2012.

[Stanley-Marbell et al., 2016] Philip Stanley-Marbell, Pier Andrea Francese, and Martin Rinard. Encoder logic for reducing serial I/O power in sensors and sensor hubs. In 28th Annual IEEE Symposium on High-Performance Chips (Hot Chips 16), August 2016.

[Tarski, 1998] Alfred Tarski. A Decision Method for Elementary Algebra and Geometry. Springer Vienna, Vienna, 1998.

[Ting and Zhu, 2009] Kai Ming Ting and Lian Zhu. Boosting support vector machines successfully. In Jón Atli Benediktsson, Josef Kittler, and Fabio Roli, editors, Multiple Classifier Systems: 8th International Workshop, MCS 2009, Reykjavik, Iceland, June 10-12, 2009, Proceedings, pages 509–518, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.

[Wu et al., 2008] Shan-Hung Wu, Keng-Pei Lin, Chung-Min Chen, and Ming-Syan Chen. Asymmetric support vector machines: Low false-positive learning under the user tolerance. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 749–757, New York, NY, USA, 2008. ACM.

[Wu et al., 2013] Shan-Hung Wu, Keng-Pei Lin, Hao-Heng Chien, Chung-Min Chen, and Ming-Syan Chen. On generalizable low false-positive learning using asymmetric support vector machines. IEEE Transactions on Knowledge and Data Engineering, 25(5):1083–1096, 2013.