Journal of Artificial Intelligence Research 53 (2015) 497-540. Submitted 01/15; published 07/15.

Satisfiability and Systematicity

Matthew L. Ginsberg
Connected Signals, Inc.
355 Goodpasture Island Road, Suite 200
Eugene, Oregon 97401

We introduce a new notion of systematicity for satisfiability algorithms with restarts, saying that an algorithm is strongly systematic if it is systematic independent of restart policy but weakly systematic if it is systematic for some restart policies but not others. We show that existing satisfiability engines are generally only weakly systematic, and describe flex, a strongly systematic algorithm that uses an amount of memory polynomial in the size of the problem. On large number factoring problems, flex appears to outperform weakly systematic approaches.

1. Introduction

Once upon a time, dpll (Davis, Logemann, & Loveland, 1962; Davis & Putnam, 1960) was the algorithm of choice for solving Boolean satisfiability problems, or sat. There were three reasons for this.

First, dpll was systematic in that if a particular problem had a solution, dpll would eventually find it. If a problem had no solution, dpll would identify it as unsatisfiable. Such properties are essential if we want to simply invoke a solver, give it as much time as it needs to solve a problem, and be assured that some answer will always result.

Second, dpll used an amount of memory polynomial in the size of the problem. The amount of time used was, of course, exponential, given the NP-complete nature of sat. But the memory used was polynomial, and therefore only logarithmic in the running time.

And finally, dpll worked. Its success was only a harbinger of the uses to which later sat engines would be put, but the range of problems to which dpll could practically be applied exceeded that of any previous general-purpose NP algorithm.
Today that range is wider still, including microprocessor verification (Kaivola, Ghughal, Narasimhan, Telfer, Whittemore, Pandav, Slobodov, Taylor, Frolov, Reeber, & Naik, 2009), device driver validation (Moura & Bjørner, 2010) and many others (Biere, Heule, van Maaren, & Walsh, 2009).

Two changes led to significant algorithmic improvements. The first was the recognition that a dpll backtrack was best thought of not as a move in a tree-like search space, but instead as a resolution-based inference step. This idea appeared first in Stallman and Sussman's (1977) work on dependency-directed backtracking and Doyle's subsequent (1979) work on truth maintenance. All of these earlier authors, however, described algorithms that might accumulate an exponential number of resolution consequences as the search proceeded. Ginsberg's (1993) work on dynamic backtracking was the first to show that systematic nonchronological inference methods could be constructed using only a polynomial amount of memory, describing the algorithm as an extension of backjumping (Gaschnig, 1979). Bayardo and Schrag (1997) continued to work on the approach, renaming it (sensibly) relevance-bounded learning.

These ideas are currently known as conflict driven clause learning, or cdcl, a name that was introduced by Ryan (2002) and then popularized by Marques-Silva, Lynce and Malik (2009). The work through 1997 involved using a scheme for retaining learned clauses that guaranteed that only a polynomial number of such clauses would be kept at any particular point in the search. With grasp (Marques-Silva & Sakallah, 1999), this requirement was dropped.

© 2015 AI Access Foundation. All rights reserved.
Virtually all modern sat solvers use cdcl, generally modified in a way that either allows an unlimited number of clauses to be collected (as in tinisat, Huang, 2006; or Quest0.5, Lynce, Baptista, & Marques-Silva, 2001) or, more commonly, in a way that no longer guarantees systematicity (zChaff, Moskewicz, Madigan, Zhao, Zhang, & Malik, 2001; rsat, Pipatsrisawat & Darwiche, 2007; minisat, Sörensson & Eén, 2005; and many others).

The abandonment of systematicity was linked to the inclusion of restarts in satisfiability algorithms. This appears to have been due to Gomes, Selman and colleagues (1998); the basic idea is to escape from areas where the solver is thrashing (searching nearly endlessly without ever identifying the real source of a problem) by simply restarting from scratch. Put somewhat differently (but equivalently), restarts provide an out when the solver makes a mistake near the root of the search tree, what Harvey (1995) refers to as an early mistake. In the setting provided by dpll, restarting the prover will inevitably result in nonsystematicity. This is because crucial information about the state of the search is stored in the position in the search tree, and restarting the search abandons this information.

The work on restarts has matured in two separate directions. First, there has been a considerable body of work examining the theoretical properties of search with restarts in the cdcl case where all of the clauses are retained as the search proceeds. Such a search is obviously systematic, since only a limited number of new clauses can be learned before either a solution is found or the empty clause is derived (proving the original problem to be unsatisfiable). There are more interesting conclusions to be drawn, however. The fact that restarts can prevent thrashing appears to be related to the fact that they appear to allow for more flexible inference schemata than previous methods.
This work was begun by Beame and colleagues (2004), continued by Buss et al. (2008) and Hertel et al. (2008), and extended further by Pipatsrisawat and Darwiche (2011), who showed that cdcl with restarts is equivalent to general resolution. Cdcl without restarts is not known to be generally able to produce proofs exponentially shorter than tree-based resolution, and general resolution can lead to exponentially shorter proofs than tree-based resolution in many instances (Ben-Sasson, Impagliazzo, & Wigderson, 2000).¹

In addition to this theoretical work, practical work was proceeding as well, focusing in large measure on the development of strategies identifying points at which the search should be restarted. It was realized fairly quickly that virtually any restart policy was an improvement on not restarting at all. Luby et al. (1993) had developed a restart policy that they showed was within a logarithmic factor of optimal on a wide range of problems. The Luby restart policy was shown to outperform a variety of alternatives in the sat domain by Huang (2007).

The Luby restart policy involves gradually (but not monotonically) increasing the number of backtracks between restarts; roughly speaking, the 2^k-th restart is the first that allows 2^k backtracks before restarting once again. We will refer to the effort undertaken between consecutive restarts as a probe; the number of backtracks between the restarts will be called the size of the probe. The Luby approach has two useful consequences. The first is that the total number of backtracks grows quadratically with the number of restarts because of the infrequency of probes of large size. The second consequence of the Luby policy is that the search is once again systematic, assuming that the original procedure, without restarts, was itself systematic.

¹ It is also not known that tree-based resolution can't polynomially simulate general resolution (Bonet & Johannsen, 2014).
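The schedule just described is easy to state concretely. The following is a minimal sketch (ours, not taken from any solver) of the standard Luby sequence, under the common formulation in which the i-th probe is allowed luby(i) backtracks, scaled by some base constant:

```python
def luby(i):
    """i-th term (1-indexed) of the Luby sequence 1,1,2,1,1,2,4,1,1,2,...

    If i has the form 2^k - 1 the term is 2^(k-1); otherwise the sequence
    repeats its own prefix.  The value 2^k first appears at position
    2^(k+1) - 1, which is why large probes stay rare and the total number
    of backtracks grows only quadratically in the number of restarts.
    """
    k = 1
    while (1 << k) - 1 < i:  # smallest k with i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)  # recurse into the repeated prefix
```

The first fifteen terms are 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8; a solver using this as its restart policy typically multiplies each term by a base number of backtracks.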
The reason is that the Luby restart policy will eventually involve probes of sufficiently large size that systematicity is guaranteed.

This is a distinction worth formalizing. Given an algorithm A that involves a restart policy, we will say that A is strongly systematic if it is systematic independent of the restart policy chosen. We will say that A is weakly systematic if it is systematic for some restart policies but not for others, and we will say that A is nonsystematic if it is not systematic for any restart policy. The connection between systematicity and the amount of memory used is sufficiently important that we present the following classification of some sat solvers:

                 Strongly      Weakly
                 systematic    systematic    Nonsystematic
    Exp-space    tinisat
    Polyspace    ?             minisat       wsat

By exp-space, we mean algorithms that demonstrably may take an amount of space exponential in the size of the problem being solved; polyspace algorithms are those that remove learned clauses and thus could, at least in theory, use only polynomial amounts of space as a result.

Memory is cheap these days; perhaps exp-space is not as much of a drawback as it once was. But this turns out not to be the case, for the following reason. Most of the CPU time consumed by sat engines is spent in the unit propagation procedure, which finds obvious consequences of the variable assignments made at any particular point in the search. Although there have been a variety of engineering improvements that speed this process considerably (Moskewicz et al., 2001), the overall time can still be linear in the size of the accumulated clausal database. As this database grows exponentially in size, the time needed for a single inference can therefore grow exponentially as well.

Given tinisat's existence in the upper left, there has been little interest in an exp-space solver with weaker systematicity properties.
But tinisat cannot really be considered a production solver; on difficult problems, the number of stored clauses grows impractically and the solver itself essentially grinds to a halt.

We have not previously discussed the wsat family of algorithms (Selman, Kautz, & Cohen, 1993), nor shall we. These algorithms are very different in spirit from the ones that we have described thus far, working with complete variable assignments and using hill climbing to attempt to gradually convert these assignments to solutions of the problems being investigated. While these algorithms were competitive in the 1990s, progress on systematic algorithms has been such that these fundamentally nonsystematic approaches currently receive relatively little attention, although recent work has shown that the connection between these algorithms and the cdcl approach is closer than one might think (Goultiaeva & Bacchus, 2012).

Most modern sat solvers join minisat in the middle of the bottom row, weakly systematic but using only a relatively modest amount of memory to get the job done. Removing some clauses is essential in any practical solver, and there simply has been no known strongly systematic algorithm capable of doing so. The general view has been that although not strongly systematic, these systems work exceptionally well in practice, including managing to produce proofs of unsatisfiability when necessary.

It should be noted, however, that this loss of systematicity is more than a simple theoretical concession. It is obviously important, for example, that distinct probes begin their search from distinct locations in the search space. Should those starting locations be close in some sense, it is also important that the corresponding probes overlap as little as possible. We believe that the experimental results in Section 4 support this intuition.

As we have remarked, our work in dynamic backtracking introduced the first nonchronological, polyspace, weakly systematic sat engine.
Our goal in the current paper is to describe a new cdcl-based sat engine, flex, that is both polyspace and strongly systematic.

The outline of the paper is as follows. In Section 2, we introduce the necessary notation and describe a standard cdcl algorithm. We also describe the choices typically made in implementing such algorithms. Section 3 describes the flex algorithm; the proof that it is both systematic and polyspace is deferred to an appendix. We discuss the additional data structures needed to guarantee systematicity, and the restrictions that must be placed on some standard choices as a result. Experimental results are presented in Section 4 and concluding remarks are in Section 5.

2. Background, Notation and Existing Work

By a satisfiability problem, we will mean a collection of clauses over Boolean variables such as:

    a
    ¬a ∨ b
    ¬b ∨ ¬c

Each of the clauses is a disjunction, and the goal is to value all of the variables so that each clause is satisfied. In this particular example, we have to make a and b true, and c false.

Definition 2.1 A variable is a letter, such as a, b, c and so on. A literal is a variable or the negation of a variable. A clause is a disjunction of literals; the (unsatisfiable) empty clause will be denoted ⊥. A theory is a conjunction of clauses. An interpretation or bias is an assignment of true or false to each variable in a theory. An interpretation will be said to be a model of a theory T if every clause in T is satisfied by the interpretation.

We will often call theories sat instances or satisfiability problems, and will think of clauses simply as sets of the literals they contain, writing, for example, that a ∈ a ∨ b.
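Checking whether an interpretation is a model in the sense of Definition 2.1 is mechanical. A minimal sketch (ours), under an assumed encoding in which literals are signed integers and an interpretation is the set of literals it makes true:

```python
def is_model(theory, interpretation):
    """True iff every clause contains at least one satisfied literal.

    theory: iterable of clauses, each a set of nonzero ints, with +v and
    -v encoding a variable and its negation; interpretation: a set
    containing exactly one of +v, -v for each variable.
    """
    return all(any(lit in interpretation for lit in clause) for clause in theory)
```

For instance, a theory matching the example's description, encoded as [{1}, {-1, 2}, {-2, -3}], has the model {1, 2, -3} (a and b true, c false) but is not satisfied by {1, 2, 3}.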
We will similarly think of theories as sets of clauses, and will think of the size of a theory T as simply the number of clauses in T, denoting it by |T|.²

Modern satisfiability algorithms work by manipulating partial assignments of values to variables; the idea is to either extend the partial assignment to value more variables (and eventually to a model of the theory in question) or to show that a particular partial assignment cannot be extended to a model.

Definition 2.2 A partial assignment is a sequence P = ⟨l1, l2, . . . , ln⟩ of literals such that if i ≠ j, then li ≠ lj and li ≠ ¬lj. A variable v will be called valued by P if v ∈ P or ¬v ∈ P. A literal l will be called satisfied by P if l ∈ P, and will be called falsified by P if ¬l ∈ P. A clause c will be called falsified by P if every literal l ∈ c is falsified by P.

In practice, the variables assigned values in a partial assignment are typically annotated with reasons of some sort. A variable value may be the result of a choice made by the search engine, or the obvious result of previous choices:

Definition 2.3 An annotated partial assignment is a sequence ⟨(l1, c1), (l2, c2), . . . , (ln, cn)⟩ of pairs, where each pair contains a literal li and a clause ci. The clause ci will be called the reason for li, and the set {c1, . . . , cn} \ {⊤} will be denoted R(P) and called the reasons of the annotated partial assignment.

Given an annotated partial assignment P, the ith pair (l, c) ∈ P will be called justified if either c = ⊤, or l ∈ c and every literal in c other than l is ¬lj for some j < i. An annotated partial assignment P will be called justified if every (l, c) ∈ P is justified.

Note that a justified partial assignment can always contain pairs of the form (l, ⊤), since the justification condition is trivially satisfied. We will use this to indicate that l is a choice made by the search algorithm:

Definition 2.4 If P is an annotated partial assignment and (l, ⊤) ∈ P, we will say that l is a choice in P.
The set of choices in P will be denoted by C(P). The set of negations of choices in P will be denoted by ¬C(P).

Similarly, a justified partial assignment can always contain pairs of the form (l, l), which we will use to indicate that l itself has been proven to be a consequence of the theory being investigated. In general, all of the partial assignments that we consider will be justified.

Definition 2.5 Let P be an annotated partial assignment. Then if (l, c) is the ith element of P, we will say that the assertion level of l is

    |{j ≤ i : cj = ⊤}|

For a clause α, we will say that the assertion level of α is the maximum of the assertion levels of the literals in α.

In other words, the assertion level of a literal is the number of choice variables that precede it in P.

Definition 2.6 Let P be an annotated partial assignment for a theory T. A clause α = l1 ∨ · · · ∨ ln such that no literal in α is satisfied by P, and exactly one literal is unvalued by P, will be called unit.

If a single literal l ∈ α is unvalued by P for some unsatisfied clause α, then any extension of P that is a model of T must include l, and P will remain a justified partial assignment if we add (l, α) to its end. This is called unit propagation:

Definition 2.7 Unit propagation is a procedure unit that accepts as input a theory T and a justified partial assignment P. It returns a pair (P′, c) where P′ is a maximal justified partial assignment that extends P, so that P ⊆ P′, and either c = true or c ∈ T is falsified by P′.

If unit propagation returns c = true, unit propagation succeeded. If it returns a partial assignment P′ and a clause c falsified by P′, a contradiction has been found and c is the reason.

² From a complexity perspective, it might seem more natural to think of the size of a theory T as the total number of literals in the clauses in T. Our choice is a bit more convenient notationally, and for a theory involving v variables, the total number of literals in T is obviously at most v|T|.
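Definitions 2.3 and 2.7 can be made concrete. The sketch below is ours, not any solver's code: literals are signed integers, reason None stands in for ⊤, and clauses are rescanned naively rather than indexed with watched literals:

```python
def is_justified(assignment):
    """Check Definition 2.3: a pair (l, c) with an actual reason c must
    have l in c, with every other literal of c falsified earlier in the
    sequence.  A choice (reason None) is trivially justified."""
    seen = set()
    for lit, reason in assignment:
        if reason is not None:
            if lit not in reason or any(-l not in seen for l in reason if l != lit):
                return False
        seen.add(lit)
    return True

def unit_propagate(theory, assignment):
    """Naive unit procedure (Definition 2.7): extend the justified partial
    assignment until no unit clause remains, or return a clause of the
    theory falsified by the extension.  Returns (assignment, conflict)
    with conflict None on success."""
    assigned = {lit for lit, _ in assignment}
    changed = True
    while changed:
        changed = False
        for clause in theory:
            if any(l in assigned for l in clause):
                continue                      # clause already satisfied
            unvalued = [l for l in clause if -l not in assigned]
            if not unvalued:
                return assignment, clause     # clause falsified: conflict
            if len(unvalued) == 1:            # unit clause: forced literal
                assignment.append((unvalued[0], clause))
                assigned.add(unvalued[0])
                changed = True
    return assignment, None
```

A production solver replaces the rescanning loop with watched literals (Moskewicz et al., 2001); what Definition 2.7 specifies is the result, not the efficiency.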
Definition 2.8 We will say that the backtrack level of α is zero if every literal in α has the same assertion level, and the second greatest of the assertion levels of literals in α otherwise. Given an annotated partial assignment P and a clause α, we define the result of backtracking to α, to be denoted backtrack(P, α), to be the result of removing from P any literal with assertion level greater than the backtrack level of α.

If α is a clause with backtrack level n ≠ 0, there is clearly some literal l ∈ α such that the assertion level of l is greater than n. If this literal l is unique, then backtracking to the backtrack level of α will leave every element of α other than l unchanged, so that α itself (which was a contradiction prior to the backtrack) will be a unit clause after the backtrack is complete.

Most sat implementations spend the bulk of their time in the unit procedure; this is a consequence of the fact that they must search the theory T for unit clauses. Moskewicz et al.'s (2001) introduction of so-called watched literals in zChaff sped this process considerably, but it continues to consume the bulk of the computational resources in implemented systems.

Eventually, unit propagation either terminates without finding a contradiction (presumably because no unit clauses remain in T), or returns a clause c ∈ T that is falsified by the current justified partial assignment P. In the latter case, the satisfiability engine needs to respond to this new information and modify P in some way. Dpll, for example, would modify P simply by returning to the most recent choice point and then resuming the search.

Cdcl implementations work differently. They take P and c and produce a new clause that is entailed by T and that captures, as best possible, the reason for what went wrong. As an example, suppose that P gives α = ¬a ∨ ¬b ∨ c as the reason for c (so that P contains a, b and therefore c), and that β = ¬a ∨ ¬c appears in T.
This clause is falsified by P. We can now resolve α and β to get the new clause ¬a ∨ ¬b, which is also falsified by P and suggests (correctly) that either a or b is the source of the problem even if many other variable choices appear subsequently in P itself. This may enable a much deeper backtrack than simply returning to the most recent choice point.

Definition 2.9 Let P be a justified partial assignment, and c a clause falsified by P. By a (P, c)-conflict we will mean any clause γ that is a logical consequence of c together with the reasons in P and that is also falsified by P, so that {c} ∪ R(P) |= γ and γ is falsified by P.

As shown above, once unit propagation identifies a contradiction, we can use resolution to derive a (P, c)-conflict, where P is the current partial assignment and c is the conflicting clause in T. But we can also sometimes do more. Consider the following theory:

    ¬a ∨ b                  (1)
    ¬b ∨ ¬e ∨ c             (2)
    ¬a ∨ d                  (3)
    ¬d ∨ ¬e ∨ ¬f ∨ ¬c       (4)

and imagine that after e and f are true, we choose to make a true as well. Now unit propagation allows us to conclude b and then c from b (and e), and also d and then ¬c from d (and e and f). Now a single resolution of (2) (the reason for c) and (4) (the reason for ¬c) will allow us to conclude

    ¬b ∨ ¬d ∨ ¬e ∨ ¬f       (5)

But we can go further, resolving (5) with the reason (1) for b to get ¬a ∨ ¬d ∨ ¬e ∨ ¬f, and then also resolving with the reason (3) for d to get simply ¬a ∨ ¬e ∨ ¬f. This last conclusion correctly identifies a itself as the source of the problem once e and f are both true.

We see from this example that a single (P, c) pair may have many conflicts, and we will generally need to identify a single such conflict to continue the search. In general, we will want to select a conflict that has a single literal at its assertion level. That will allow the algorithm to redraw the conclusion in question after backtracking to the backtrack level of the conflict. Continuing with our example, suppose that the levels of the choice literals involved were 1 for e and f and 4 for the choice literal a.
As we explained, all of the following are possible learned clauses in this case:

    clause               assertion level   literals at assertion level   backtrack level
    ¬b ∨ ¬d ∨ ¬e ∨ ¬f    4                 ¬b, ¬d                        1
    ¬a ∨ ¬d ∨ ¬e ∨ ¬f    4                 ¬a, ¬d                        1
    ¬a ∨ ¬e ∨ ¬f         4                 ¬a                            1

It is only if we learn the last of these clauses that we will be able to draw a unit conclusion (¬a) after backtracking to level one.

Definition 2.10 Given a justified partial assignment P and a clause γ, let n be the assertion level of γ. Then γ will be called a unique implication point, or uip, if it has a unique literal whose assertion level is n.

Uips were introduced in grasp (Marques-Silva & Sakallah, 1999), and experience has shown that part of the strategy for selecting a conflict clause should be to always select a uip. This can be done by resolving away reasons for any multiple literals at the assertion level of γ.

We can now describe a cdcl-based inference engine. As we do so, recall from Definition 2.1 that a bias is a complete assignment of values to variables. If B is a bias and v a variable, we will denote by B(v) the value specified by the bias for v, so that B(v) ∈ B and B(v) = v or B(v) = ¬v. We also define:

Definition 2.11 For a bias B and set of literals S, we define the result of restricting B to S, to be denoted B|S, to be

    B|S = (B \ {¬l | l ∈ S}) ∪ S

Informally, the result of restricting a bias is to label all of the literals in S as true, removing any negations that appeared in the bias before the restriction was performed. The bias is intended to capture our current guess as to the most likely values in a model of T. So when we extend our partial assignment to take a value at a new variable, we choose the variable and then use the value suggested by the bias. A specific example of this idea was introduced by Pipatsrisawat and Darwiche (2007) in rsat and was called phase saving in that work.³
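The derivation in the example above is ordinary resolution and can be replayed mechanically. A sketch (ours), assuming the four theory clauses read ¬a ∨ b, ¬b ∨ ¬e ∨ c, ¬a ∨ d and ¬d ∨ ¬e ∨ ¬f ∨ ¬c, as their use as reasons indicates, encoded with a..f as 1..6 and negative integers for negations:

```python
def resolve(c1, c2, var):
    """Resolvent of clauses c1 and c2 on var, assuming var in c1 and -var in c2."""
    return (c1 - {var}) | (c2 - {-var})

# The four clauses of the running example.
r1 = frozenset({-1, 2})           # (1) ¬a ∨ b, the reason for b
r2 = frozenset({-2, -5, 3})       # (2) ¬b ∨ ¬e ∨ c, the reason for c
r3 = frozenset({-1, 4})           # (3) ¬a ∨ d, the reason for d
r4 = frozenset({-4, -5, -6, -3})  # (4) ¬d ∨ ¬e ∨ ¬f ∨ ¬c, the reason for ¬c

c5 = resolve(r2, r4, 3)  # (5) ¬b ∨ ¬d ∨ ¬e ∨ ¬f
c6 = resolve(r1, c5, 2)  # ¬a ∨ ¬d ∨ ¬e ∨ ¬f
c7 = resolve(r3, c6, 4)  # ¬a ∨ ¬e ∨ ¬f
```

Each resolvent is falsified by the running partial assignment, so all three are legitimate conflicts; only the last has a single literal (¬a) at assertion level 4.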
³ A similar idea was introduced slightly earlier by Selmann and Ansótegui (2006); earlier still, Beck (2005) had introduced the idea of saving some preferred values to use in the next portion of the search.

Procedure 2.12 (cdcl) Let T be a theory, Γ a set of clauses such that T |= Γ, and B a bias for T. Then to compute cdcl(T, Γ, B, n), one of: ⊥ if T is shown to be unsatisfiable, a model P of T if one is found, or unknown if no solution is found after n backtracks:

     1  P ← ⟨⟩; i ← 0;
     2  while i < n do
     3      (P, c) ← unit(T ∪ Γ, P);
     4      while c = true do
     5          if P assigns a value to every variable in T then return P;
     6          v ← a variable unvalued by P;                        // choice
     7          (P, c) ← unit(T ∪ Γ, ⟨P, (B(v), ⊤)⟩);
     8      γ ← any (P, c)-conflict that is a uip;                   // choice
     9      Γ ← Γ ∪ {γ};
    10      B ← B|P;
    11      if ⊥ ∈ Γ then return ⊥;
    12      P ← backtrack(P, γ);
    13      Γ ← Γ \ discard(Γ, P);                                   // choice
    14      i ← i + 1;
    15  return unknown

This algorithm is hardly original with us, but let us give an explanation in any event. As the algorithm proceeds, P is the justified partial assignment being considered and Γ is the collection of learned clauses. The algorithm works by extending the partial assignment P until either a solution is found (line 5) or a contradiction is encountered (line 7 with c ≠ true). In the latter case, we identify a new conflict-driven uip γ (line 8), add it to Γ, backtrack until γ is unit, and repeat. We update the bias as the search proceeds; we present in line 10 the rsat choice of updating the bias every time a variable is set by unit propagation. As we will see in Section 3, other choices are also possible.⁴

After the new clause is discovered and added to Γ on line 9, we discard some elements of Γ (line 13); it is here that we can potentially shrink Γ so that only a relatively small (and potentially polynomial) number of clauses is retained. Then we either give up if we have encountered n backtracks, or continue searching.

Recall that for an annotated partial assignment P, R(P) is the set of reasons for literals in P.
If (P, c) is a value such as is returned by unit propagation, we will abuse notation somewhat and use R(P, c) to denote simply R(P). We now have:

Lemma 2.13 Suppose that R(unit(T ∪ Γ, P)) ∩ discard(Γ, P) = ∅ for all Γ and P, so that we never discard reasons or unit clauses that are about to become reasons. Then throughout the execution of Procedure 2.12, we will have T |= Γ and P will be a justified partial assignment of T ∪ Γ. In addition:

1. If Procedure 2.12 returns ⊥, then T is unsatisfiable. If Procedure 2.12 returns P for some partial assignment P, then the literals in P are a model of T.

2. For a theory with v variables and n ≥ 2^v, Procedure 2.12 will not return unknown, independent of the further nature of discard.

⁴ The attentive reader may wonder why the test on line 11 does not appear until after γ is added to Γ on line 9 and the bias is updated on line 10. The reason is that flex may perform additional inference at this point.

Proof. We first show the invariants mentioned above. That P is a justified partial assignment follows from the fact that it is returned by unit; any justified partial assignment will obviously remain so during a backtrack such as appears on line 12. That T |= Γ follows from the fact that T entails any (P, c)-conflict if P is a justified partial assignment and c ∈ T.

Now consider the various points at which Procedure 2.12 returns. On line 5, P is a model of T. And on line 11, unit propagation and resolution have collectively derived ⊥ as a (P, c)-conflict, so that T |= ⊥ and T is unsatisfiable.

That Procedure 2.12 does not return unknown is a bit more subtle. To see this, suppose that we have a partial assignment P. Now we can associate to P a sequence of positive integers ⟨n0, n1, . . . , nk⟩, where nj is the number of literals in P with assertion level j. We will call this sequence the size of P and denote it by size(P). Now let n = ⟨n0, n1, . . . , nk⟩ and m = ⟨m0, m1, . . . , ml⟩ be two sizes.
We will say that m is smaller than n, writing m < n, if one of the following two conditions holds:

1. There is some j with mj ≠ nj and, for the first such j, mj > nj.

2. mi = ni for all i ≤ k, and there is a j with k < j ≤ l and mj > 1.

Informally, if the size of a partial assignment gets smaller, we have made progress in that we have derived more consequences earlier in the search tree, and the unexplored region of the search space has therefore gotten smaller. We formalize this via the following two claims:

1. Denote by Pi the partial assignment after line 3 in Procedure 2.12 for a given value of i. Then size(Pi+1) < size(Pi).

2. If s1 > s2 > · · · > sz is a properly descending sequence of sizes, then z ≤ 2^v, where v is the number of variables in the theory in question.

Note that these two results suffice to complete the proof, since it will follow that the loop in Procedure 2.12 can be repeated no more than 2^v times.

For the first claim, note that the new clause γ is unit after the backtrack on line 12. Since it is unit, it will be a reason in unit(T ∪ Γ, P) and therefore cannot be discarded by discard. Now let j be the backtrack level of γ. It is immediate that after the unit propagation on line 3, there will be at least one new variable assigned a value at level j.

Now suppose that size(Pi) = ⟨n0, n1, . . . , nk⟩ and size(Pi+1) = ⟨m0, m1, . . . , mj⟩. (The number of assertion levels in Pi+1 is necessarily equal to j.) Pi and Pi+1 agree before the point of the backtrack, so mh = nh for h < j. Now if j ≤ k (so that the backtrack brought us into Pi), then mj > nj and size(Pi+1) < size(Pi). If, on the other hand, j > k, then the sizes agree up to k and mj > 1 because the conclusion of γ is a unit consequence at this level. This proves the first claim.

For the second, note first that the ordering conditions are unchanged if we append to the end of any size ⟨n0, n1, . . . , nk⟩ enough 1s so that Σi ni = v.
This essentially pretends that we have added values for all subsequent variables, but there have been no unit propagations from those variables. If we write n̄ for the result of extending a sequence in this way, it is easy to see that n < m if and only if n̄ < m̄.

Now let n be a size. We claim that any descending sequence of length at least 2^(v−n0) steps beginning with n and ending in a size m will have m0 > n0. The proof is by induction on v − n0. If v − n0 = 1, then we have n0 = v − 1 and must have n̄ = ⟨v − 1, 1⟩. But now the only way to make it smaller at all is to have m = ⟨v⟩, and m0 > n0.

For the inductive step, suppose that n̄ = ⟨n0, 1, . . .⟩. Now after at most 2^(v−n0−1) steps, we will have gotten to ⟨n0, 2, . . .⟩ by the inductive hypothesis. After at most 2^(v−n0−2) further steps, we will have gotten to ⟨n0, 3, . . .⟩, and so on. We will therefore arrive at ⟨n0, v − n0⟩ after at most

    Σ_{i=1}^{v−n0−1} 2^(v−n0−i) = 2^(v−n0) − 2

steps. One more step, for a total of 2^(v−n0) − 1, causes n0 to increment. 2^(v−n0) steps thus cause n0 to increment as well. This completes the induction.

Now imagine that we add a new variable w to our theory, where w appears in no clauses but is evaluated first when the search begins. For our new theory with v + 1 variables, given that the size of any partial assignment must have its first component incremented after at most 2^((v+1)−1) steps, we will either have learned something about w (an impossibility) or derived the empty clause ⊥. Thus the second claim follows and the lemma is proved.

This proof is very similar in spirit to the proof of Theorem 3.10, our main theoretical result. In general, Lemma 2.13 appears to be common knowledge in the sat community, although I am unaware of any other published proof of the second claim and it is not completely clear if the sat community knows it to be true.⁵

Until this point, we have not yet included restarts in our algorithm, but doing so is straightforward.
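The ordering on sizes used in the proof of Lemma 2.13 is easy to state directly. A sketch (ours), with sizes represented as Python lists of positive integers:

```python
def smaller(m, n):
    """True iff size m is smaller than size n in the proof's ordering.

    Either the sequences first differ at a position where m is larger
    (more literals forced earlier in the search), or m agrees with n
    everywhere n is defined and some added component exceeds 1.
    """
    for mj, nj in zip(m, n):
        if mj != nj:
            return mj > nj          # condition 1: first difference favors m
    # condition 2: m properly extends n with a component greater than 1
    return len(m) > len(n) and any(x > 1 for x in m[len(n):])
```

Condition 1 is the loop; condition 2 is the final line. Appending trailing 1s to either size, as the proof does, changes neither verdict.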
We define:

Definition 2.14 A restart policy is a mapping r : ℕ → ℕ, where ℕ is the set of positive integers.

The restart policy simply gives the number of backtracks permitted as a function of the probe number.

⁵ A similar result, with a similar proof but based on the assumption that no clauses are ever deleted, does appear in Section 4.2 of Zhang's (2003) Ph.D. thesis.

Procedure 2.15 (cdcl with restarts) Let T be a theory and r a restart policy. Then to compute sat(T, r), either the empty clause ⊥ if T is unsatisfiable or a model of T:

     1  Γ ← ∅;
     2  B ← any bias for T;                                          // choice
     3  i ← 1;
     4  while true do
     5      x ← cdcl(T, Γ, B, r(i));
     6      if x ≠ unknown then return x;
     7      i ← i + 1;

2.1 Theoretical Results

We can now show the following:

Proposition 2.16 Let T be a theory involving v variables, and r a restart policy. Then if Procedure 2.15 returns ⊥, T is unsatisfiable. If Procedure 2.15 returns a partial assignment P, then the literals in P are a model of T. In addition:

1. If R(unit(T ∪ Γ, P)) ∩ discard(Γ, P) = ∅ and if there is some i such that r(i) ≥ 2^v, then Procedure 2.15 will always terminate.

2. If discard(Γ, P) = Γ \ R(unit(T ∪ Γ, P)), then |Γ| ≤ v as Procedure 2.15 is executed.

3. If discard(Γ, P) = ∅, then the size of Γ may grow to be exponential in v, independent of the restart policy r.

Proof. The proof of Lemma 2.13 also shows that Procedure 2.15 is correct. Claim 1 of Proposition 2.16 follows immediately from claim 2 of Lemma 2.13. Claim 2 follows from the fact that each variable can have at most one reason in R(P). Claim 3 follows from the fact that the amount of memory used by Procedure 2.15 is linear in the run time if no clauses are ever discarded, and there are known to be many problems for which the shortest resolution proof is of exponential length (Bonet, Pitassi, & Raz, 1997; Haken, 1985, 1995; Krajíček, 1997; Pudlák, 1997; Tseitin, 1970).

Corollary 2.17 There are weakly systematic, polyspace instances of Procedure 2.15. There are strongly systematic, exp-space instances of Procedure 2.15.

Proof.
Take discard(Γ, P) = Γ − R(unit(T ∪ Γ, P)) to get a weakly systematic polyspace instance by virtue of the first two claims of Proposition 2.16. If discard(Γ, P) = ∅, the resulting instance is strongly systematic because any instantiation will eventually run out of new clauses to learn, and may use an exponential amount of memory because of the third claim of Proposition 2.16.

Tinisat is an exp-space, strongly systematic instance. Minisat is a (potentially) polyspace, weakly systematic instance.

2.2 Practical Concerns

In addition to the selection of restart policy, there are three further points where an explicit choice must be made in Procedure 2.12, and one where a choice must be made in Procedure 2.15. In line 6 of Procedure 2.12, a variable is selected for addition to P. In line 8, a particular conflict clause is selected for addition to Γ. In line 13, the set Γ is reduced, presumably in order to conserve memory (and thereby speed unit propagation). And in line 2 of Procedure 2.15, an initial bias is selected.

Given that much work on sat currently focuses on one implementation of Procedure 2.15 (and therefore of Procedure 2.12) or another, there is a considerable volume of literature on each of these choices. We will summarize that work here, providing additional details in areas where our ideas will constrain the allowed options.

Variable and bias selection Variables are generally selected in a cdcl prover using a heuristic called vsids ("variable state independent decaying sum") that was introduced in zChaff (Moskewicz et al., 2001). We continue to use vsids in our work and do not describe it further here. We also need to specify an initial bias, before search or choice has valued any of the variables. There is no real reason to prefer one initial bias over another, especially given that the values are likely to be overwritten as the search moves forward.
We will follow minisat's example and set every variable to false in our initial bias.6

Conflict clause selection We have already remarked both that a particular (P, c) pair may have many associated conflicts, and that it is generally wise to select one that is a uip. That will allow the algorithm to redraw the conclusion in question after backtracking to the backtrack level of the conflict, and is central to the proof of Lemma 2.13. But we can sometimes do more. Recall that in our running example, we derived the uip ¬a ∨ ¬e ∨ ¬f. Suppose also that the actual choice literal at level one was g, with e and f following from ¬g ∨ e and ¬g ∨ f respectively. Now we can continue the resolution process, using ¬g ∨ ¬a as the learned clause instead of ¬a ∨ ¬e ∨ ¬f. Should we do so?

Definition 2.18 A uip γ will be called a decision uip, or duip, if C(P) |= ¬γ, so that every literal in γ is the negation of a choice in P.

Proposition 2.19 Let P be a justified partial assignment and γ a clause falsified by P. Then (P, γ) has a unique duip.

Proof. The only way to construct a duip is to resolve literals away until only decisions remain. It is not hard to see that the set of decisions so obtained is independent of the order of such resolutions.

Should we always work with a duip as our algorithm proceeds? Surprisingly, the answer appears to be no; the extra resolutions seem to move the conflict away from the real problem more often than not. But one thing we should do is to remove any literals that are simply unnecessary. So if in our running example the reason for f is ¬e ∨ f, it is obviously wise to resolve this into the conflict clause being learned, replacing ¬a ∨ ¬e ∨ ¬f with the properly stronger ¬a ∨ ¬e.

6. On naturally occurring problems, the effectiveness of biasing every variable to be false is perhaps rooted in the so-called closed world assumption (Clark, 1978; Reiter, 1978), but that will not concern us here.

Zhang et al.
define a first uip as follows.7 The basic idea is that we want to define a first uip as a uip that is both minimal and as close to the eventual conflict as possible.

Definition 2.20 For an annotated partial assignment P and conflict c, a (P, c)-conflict γ that is a uip will be called reduced if no proper subset of γ is a (P, c)-conflict.

By "close to the eventual conflict," Zhang et al. actually mean that the uip in question is as late in the conflict graph as possible:

Definition 2.21 Let P be an annotated partial assignment and suppose that γ1 and γ2 are clauses falsified by P. We will write γ1
The antecedent of γ, to be denoted γ−, is γ− = ¬(γ − γ+).
The conclusion of the learned clause γ is the disjunction of the maximal literals in γ, and the antecedent is the negation of what's left. Again, an example may help. Suppose that γ = a ∨ b ∨ c ∨ d and the partial order has a < c and b < d. Now c and d are the maximal elements of γ, so the conclusion γ+ is the disjunction c ∨ d. The antecedent is
γ− = ¬({a, b, c, d} − {c, d}) = ¬(a ∨ b) = ¬a ∧ ¬b
as one would expect. The learned clause has effectively been rewritten as ¬a ∧ ¬b → c ∨ d.
We have been cavalier in our use of notation, switching freely between representing γ as a disjunction and as a set.
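To make the decomposition concrete, here is a small sketch using our own illustrative encoding (literals as strings, with "-x" for ¬x; this is not the paper's implementation). It reproduces the example above: under a < c and b < d, the clause a ∨ b ∨ c ∨ d splits into conclusion c ∨ d and antecedent ¬a ∧ ¬b:

```python
# Split a clause (a set of literals) into its conclusion, the maximal
# literals under a partial order on variables, and its antecedent, the
# negated remainder (read conjunctively).
def var(lit):
    return lit.lstrip("-")

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def split(clause, order):
    """order is a set of (x, y) pairs meaning x < y; assumed transitive."""
    maximal = {l for l in clause
               if not any((var(l), var(m)) in order for m in clause)}
    conclusion = maximal
    antecedent = {neg(l) for l in clause - maximal}  # conjunction of literals
    return conclusion, antecedent

gamma = {"a", "b", "c", "d"}
order = {("a", "c"), ("b", "d")}
conc, ante = split(gamma, order)
# conc == {"c", "d"} and ante == {"-a", "-b"}, matching the worked example
```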
Definition 3.5 If Γ is a set of learned clauses and < a partial order, we will denote by Γ− the conjunction of the antecedents γ− for all γ ∈ Γ. We will denote by Γ+ the conjunction of the conclusions γ+ for all γ ∈ Γ.
We are finally in a position to describe the procedure by which we add a new clause γ to Γ. In Procedure 2.12, this was via the simple Γ ← Γ ∪ {γ} on line 9, but here it is a bit more complex. The addition of γ to Γ can also both modify the partial order and replace γ with a lexicographically stronger clause in some circumstances:
Procedure 3.6 To compute add(γ, Γ, <):
1 if γ = ∅ then return (γ, Γ ∪ {∅}, <);
2 select l ∈ γ; // l is the intended conclusion
3 if there is a ρ ∈ Γ with ρ+ = ¬l then
4   return add(resolve(γ, ρ), Γ, <)
5 add x < l to < for each x ∈ γ − {l};
7 remove from Γ any c with either c− ∩ {l, ¬l} ≠ ∅ or both c ∩ {x | l < x} ≠ ∅ and |c+| > 1;
8 return (γ, Γ ∪ {γ}, <)
Line 1 handles the edge case where we have actually derived a contradiction. Line 4 calls for a resolution if both l and ¬l appear as conclusions of clauses in Γ, and line 7 says that clauses should be deleted from Γ if they include either l or ¬l in their antecedent, or if they have ambiguous conclusions based on variables that follow l in the partial order. This is in keeping with our overall lexicographic approach: Once we make progress on a particular literal l, we can forget everything about following literals in the partial order, removing both their relative orderings (as per the weakening on line 6 of Procedure 3.6) and clauses that contain them (as per the removal on line 7).
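The deletion test on line 7 can be phrased directly in code. The sketch below is our reading of that test, not the paper's implementation; each learned clause is held as an (antecedent, conclusion) pair of literal sets, with "-x" denoting ¬x:

```python
# Hedged sketch of the clause-removal test on line 7 of Procedure 3.6:
# drop a clause if its antecedent mentions l or its negation, or if the
# clause mentions a variable above l while its conclusion is ambiguous.
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def should_remove(antecedent, conclusion, l, above):
    """above = the set of variables strictly greater than l's variable."""
    mentions_l = antecedent & {l, neg(l)}
    vars_in_clause = {lit.lstrip("-") for lit in antecedent | conclusion}
    ambiguous = vars_in_clause & above and len(conclusion) > 1
    return bool(mentions_l) or bool(ambiguous)
```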
Proposition 3.7 Suppose that Γ contains n clauses involving v variables. Then Procedure 3.6 can be executed in O(v³ + nv²) time.
Proof. Except for the recursion, the most expensive steps are line 6, which can potentially take time O(v²), and line 7, which can take time O(nv). This assumes that we can determine whether x < y in time O(1), but whatever the data structure used to retain the partial order, we can always compute the entire partial order at each iteration of the loop, which will take time at most O(v²). The total time taken without the recursion is thus bounded by O(v² + nv).
The maximum number of recursive calls is O(v), since each variable will be resolved upon at most once.
Just as the O(v²) unit propagation procedure remains practical on problems containing millions of variables, so we expect the add procedure of Procedure 3.6 to do so as well. (And for roughly the same reasons: clauses are not of length v, literals generally appear in only a relative handful of clauses, and so on.)
Procedure 3.6 is the primary difference between a standard cdcl algorithm and flex; we will replace the act of adding γ to Γ with the more involved Procedure 3.6. When we do so, note that Procedure 3.6 may actually derive the empty clause in line 4; we then return a theory containing the empty clause on line 1, leading to a failure on line 11 of Procedure 2.12. We have:
Procedure 3.8 (Pure FLEX) Let T be a theory, Γ a set of clauses such that T |= Γ, < a partial order on the variables in T, and B a bias for T. Then to compute flex(T, Γ, <, B, n), one of: ∅ if T is shown to be unsatisfiable, a model P of T if one is found, or unknown if no solution is found after n steps:
1 P ← ∅; i ← 0;
2 while i < n do
3   (P, c) ← unit(T ∪ Γ, P);
4   while c = true do
5     if P assigns a value to every variable in T then return P;
6     v ← a variable unvalued by P;
7     (P, c) ← unit(T ∪ Γ, P, (B(v), ∅));
8   γ ← any (P, c)-conflict that is a uip with B |= ¬γ;
9   (γ, Γ, <) ← add(γ, Γ, <);
10  B ← B modified so that B |= γ+;
11  if ∅ ∈ Γ then return ∅;
12  P ← backtrack(P, γ);
13  i ← i + 1;
14 return unknown
Note that we require the selection of a buip on line 8 of Procedure 3.8 (since we require B |= ¬γ), and that we adjust the bias only to cater to the conclusion of the new clause γ on line 10. Both of these observations will be important as we discuss practical implementation of our methods. The only modification needed to Procedure 2.15 is to initialize and then use the partial order <:
Procedure 3.9 (Pure FLEX with restarts) Let T be a theory and r a restart policy. Then to compute sat(T, n), either the empty clause ∅ if T is unsatisfiable or a model of T:
1 Γ ← ∅;
2 < ← the empty partial order;
3 B ← any bias for T;
4 i ← 1;
5 while true do
6   x ← flex(T, Γ, <, B, r(i));
7   if x ≠ unknown then return x;
8   i ← i + 1;
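The restart drivers above (Procedures 2.15 and 3.9) share a simple shape, which the following minimal Python sketch illustrates; the names are ours and the `engine` argument is a stub standing in for the underlying solver, not the paper's code:

```python
# A restart policy maps probe numbers to backtrack limits (Definition 2.14).
# The driver simply reinvokes the underlying engine, keeping the learned
# clauses and bias across probes, until something other than "unknown"
# (a model, or the empty clause) is returned.
def sat_with_restarts(theory, engine, restart_policy):
    learned = set()   # clauses retained across probes
    bias = {}         # initial bias; a free choice in the paper
    i = 1
    while True:
        x = engine(theory, learned, bias, restart_policy(i))
        if x != "unknown":
            return x
        i += 1

# One possible policy: allow i backtracks on probe i.
linear_policy = lambda i: i
```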
We finally have:
Theorem 3.10 Suppose that T is a theory with n clauses involving v variables. Then:
1. Each loop of Procedure 3.8 can be executed in time O(v³ + nv²),
2. |Γ| ≤ v as Procedure 3.9 is executed, and
3. For any n with Σ_{i=0}^{n} r(i) ≥ 2^v, Procedure 3.9 will finish before completing iteration n.
Corollary 3.11 Procedure 3.9 is strongly systematic and polyspace.
The proof of Theorem 3.10 is lengthy and appears in the appendix.
Note that our contribution here is not to show that it is possible to ensure systematicity by retaining a sufficient number of learned clauses. Tinisat has already shown this, and Lecoutre et al. (2007) have shown that retaining a single nogood per restart can suffice. But in this other work, the number of restarts grows polynomially with the number of backtracks, and therefore potentially exponentially with problem size. All of these earlier results thus may require keeping an exponential number of learned clauses; our contribution is to show that a polynomial number of nogoods can suffice.
3.2 Hybrid FLEX
While Procedures 3.8 and 3.9 achieve the basic theoretical goal set in this paper, a direct implementation of these procedures as written encounters two practical difficulties.
First, the add procedure is expensive. Executing it at every backtrack introduces an overall computational burden greater than that of unit propagation.
Second, the search in Procedure 3.8 is guided by the bias B, and that bias is modified not to match every unit propagation, but only to match the conclusions of newly learned clauses. Experimentation shows that the rsat bias, matching as it does the result of every unit propagation, is more effective in practice. We cannot value newly selected variables as in rsat, since if the bias doesn't support the nogood, there may be no buip on line 8 of Procedure 3.8. In fact, experimentation on number factoring problems shows that approximately 40% of the time, none of the clauses learned in any particular probe can be resolved to a buip.
Nevertheless, there is a straightforward way to deal with these issues and to still preserve the intuition underlying our approach. In each probe, we follow our bias only until a new clause is learned. That new clause will have a buip (because the bias was followed in constructing it), and we then incorporate that new clause into Γ as usual. We then complete the probe using the rsat bias. By doing this, we follow the rsat bias almost all of the time and only call add once per probe, eliminating the substantial computational overhead that this procedure might introduce.
We will refer to this compound procedure as hybrid flex, or simply flex. In order to implement it, we need to maintain two biases, which we refer to as B (for the flex component) and R (for the rsat component), and two sets of learned clauses, which we refer to as Γ (for the flex component) and Φ (for the conventional component). We finally have:
Procedure 3.12 (FLEX) Let T be a theory, Γ and Φ sets of clauses such that T |= Γ ∪ Φ, < a partial order on the variables in T, and B and R biases for T. Then to compute flex(T, Γ, <, B, Φ, R, n), one of: ∅ if T is shown to be unsatisfiable, a model P of T if one is found, or unknown if no solution is found after n steps:
1 P ← ∅; i ← 0;
2 while i < n do
3   (P, c) ← unit(T ∪ Γ ∪ Φ, P);
4   while c = true do
5     if P assigns a value to every variable in T then return P;
6     v ← a variable unvalued by P;
7     if i = 0 then l ← B(v);
8     else l ← R(v);
9     (P, c) ← unit(T ∪ Γ ∪ Φ, P, (l, ∅));
10  if i = 0 then
11    γ ← any (P, c)-conflict that is a uip with B |= ¬γ;
12    (γ, Γ, <) ← add(γ, Γ, <);
13  else
14    γ ← any (P, c)-conflict that is a uip;
15    Φ ← Φ ∪ {γ};
17  if ∅ ∈ Γ ∪ Φ then return ∅;
18  P ← backtrack(P, γ);
19  Φ ← Φ − discard(Φ);
20  i ← i + 1;
21 return unknown
Procedure 3.13 (FLEX with restarts) Let T be a theory and r a restart policy. Then to compute sat(T, n), either the empty clause ∅ if T is unsatisfiable or a model of T:
1 Γ ← Φ ← ∅;
2 < ← the empty partial order;
3 B ← R ← any bias for T;
4 i ← 1;
5 while true do
6   x ← flex(T, Γ, <, B, Φ, R, r(i));
7   if x ≠ unknown then return x;
8   i ← i + 1;
As in Theorem 3.10 and Corollary 3.11, we have:
Corollary 3.14 Procedure 3.13 is strongly systematic and polyspace for any discard procedure that ensures that Φ is polyspace.
4. Experimental Results
Before we present experimental results based on our methods, let us describe the setting in which these experiments were performed.

First, we expect the value of our results to vary with problem size. We have one major advantage over existing methods, in that the strong systematicity of our approach will cause probes late in the search to examine areas of the search space that were unexplored earlier. This advantage is also a drawback, however. Especially early in the search, where the probes are small, the rsat bias is known to be extremely effective and we are likely to see a performance degradation from the fact that we cannot follow this bias on the first set of choices in any particular probe. So for easy problems, we would expect our methods to perform worse than conventional ones because of our inability to begin probes by following the rsat bias. For difficult problems, we would expect our methods to perform better than conventional ones, as we manage to reach new areas of the search space while older methods are reexamining previously explored regions. What we don't know is where the transition will be between these two general problem types.

In order to evaluate this, we used two separate sets of problems in our experiments. The first was the set of 1030 problems in the sat 2013 benchmarks. These problems were solved with a 5000 second cutoff in order to ensure that all processes terminated. The second set of problems used were number factoring problems (Bebel & Yuen, 2013). Although subexponential algorithms for number factoring are known to exist (Buhler, Lenstra, & Pomerance, 1993, and a great deal of subsequent work), there is no reason to believe that the techniques involved will bear significantly on the sat encodings of such problems.
One advantage of using number factoring is that problems of gradually increasing difficulty can be run to completion in every case, avoiding the fact that a simple timeout is less informative regarding overall computational efficacy than is the time needed to actually solve a problem. Problems of gradually increasing difficulty can be generated by simply increasing the magnitudes of the numbers being factored.

The number factoring problems were generated by using an iterated Miller-Rabin test to generate pairs of primes and then using the Purdom/Sabry (2005) sat generator to produce a sat encoding of the problem of factoring the product of these primes. In order to avoid situations where the factoring problem might be too easy, the prime factors were selected so that the sizes of their binary representations were within three bits of one another. For any given problem size (number of bits in the number being factored), fifty instances were generated randomly.

As a baseline solver, we used minisat 2.2 (Sörensson & Eén, 2005). But there were in fact three separate sets of changes that we needed to make to produce flex:
1. First, we converted minisat from C to C++ and changed it to use the C++ standard library in many locations instead of the variety of specialized techniques included in minisat. We will refer to this solver as stlsat.
2. Second, flex can be expected to be more efficient as the number of restarts increases. In minisat and similar solvers, the number of backtracks before restarting is not given
Figure 1: minisat vs. stlsat, sat competition problems (CPU time in seconds); best fit y = 1.22x^0.9916
as specified by the original Luby policy (Luby et al., 1993), but instead by multiplying the Luby numbers by some constant (generally taken to be 100, so that the first probe involves 100 backtracks). In order to increase the number of probes, we reduced the multiplicative factor to four (so that the first probe involves only four backtracks). We will refer to this solver as stlsat4.
3. Finally, we modified stlsat to produce flex, also with an initial probe size of four.
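The scaled Luby schedule described in item 2 can be sketched directly. The following is our own illustrative implementation (solvers typically use an equivalent incremental formulation); with scale = 4, the first probe allows four backtracks, as in stlsat4 and flex:

```python
# The Luby sequence is 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ...
# The restart policy multiplies it by a constant backtrack budget.
def luby(i):
    """i-th element of the Luby sequence, 1-indexed."""
    while True:
        k = 1
        while (1 << k) - 1 < i:          # smallest k with 2^k - 1 >= i
            k += 1
        if (1 << k) - 1 == i:            # i ends a complete block
            return 1 << (k - 1)
        i -= (1 << (k - 1)) - 1          # otherwise recurse into the prefix

def restart_policy(i, scale=4):
    return scale * luby(i)
```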
We present results for the sat 2013 benchmarks first, followed by results for number factoring. We compare minisat to stlsat, stlsat to stlsat4, and then stlsat4 to flex. All experiments were run on a 24-core Intel Xeon machine running at 2.2GHz with 256K of L2 cache per processor, 24GB of main memory, and 75GB of swap space. 24 problems were solved at a time, and the swap space was needed to deal with some of the larger sat competition instances, since many were running in parallel.9
The software used in these experiments is available as an online appendix to this paper. stlsat.zip contains the source for stlsat, and flex.zip contains the source for flex. (Stlsat is just flex with the various extensions removed.)
4.1 Sat Competition Problems Figure 1 compares minisat to stlsat for the sat competition problems. The solver that is closer to original minisat will always be on the x-axis, with the modification on the y-axis,
9. We did not evaluate the impact of the use of swap on the running times of larger problems, although it obviously might be substantial. In practice, however, once significant amounts of swap were used, none of the systems being evaluated was able to solve the problem in question. In addition, running the problems serially would simply have been impractical given the available hardware.
Figure 2: stlsat vs. stlsat4, sat competition problems (CPU time in seconds); best fit y = 1.14x^0.9999
with both axes plotted using a log scale and a dashed line giving the y = x baseline. One point is plotted for each problem solved by either of the two methods. A point above x = y means that the old solver is better; a point below x = y, that the new solver is better.
In many cases, we will also give the best polynomial fit of the form y = ax^b for fixed a and b. So in Figure 1, stlsat is seen to be 22% worse than minisat at the outset, but the exponent slightly below one indicates that the disparity is shrinking as the problems become more difficult. (Indeed, this is apparent from looking at the graph itself.)
To find the polynomial fit, we did not use the standard technique of minimizing the squared vertical distance between a point and the line in question. The reason is that if we flipped the axes, we would then be minimizing the squared horizontal distance to the line, and a potentially different fit would be found. To ensure that the same fit was found independent of the axis selection, we minimized the sum of the squares of the actual (perpendicular) distances between the data points and the line in question.
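The perpendicular fit has a closed form once both axes are logged, since y = ax^b becomes a straight line there. The sketch below is our own illustration, not the authors' code; it uses the standard total-least-squares slope, which is symmetric in the two axes:

```python
import math

def perpendicular_power_fit(xs, ys):
    """Fit y = a * x**b, minimizing squared perpendicular distances in
    log-log space so that swapping the axes yields the same line.
    Assumes the logged data are correlated (nonzero cross term)."""
    u = [math.log(x) for x in xs]
    w = [math.log(y) for y in ys]
    n = len(u)
    mu, mw = sum(u) / n, sum(w) / n
    suu = sum((p - mu) ** 2 for p in u)
    sww = sum((q - mw) ** 2 for q in w)
    suw = sum((p - mu) * (q - mw) for p, q in zip(u, w))
    # Closed-form slope of the first principal axis of the centered data.
    b = (sww - suu + math.sqrt((sww - suu) ** 2 + 4 * suw ** 2)) / (2 * suw)
    a = math.exp(mw - b * mu)
    return a, b
```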
In some cases, we will split the data depending on whether the problem instances being solved were or were not satisfiable. This is not an interesting distinction here.
These results are as one might expect. Without the fine-tuned data structures of minisat, stlsat performs slightly worse. This is more significant on easy problems, and the two solvers appear to be virtually identical on more difficult instances.
Figures 2 and 3 show the impact of reducing the number of backtracks per probe. Overall (Figure 2), there is a 14% cost to making the change, and the cost is nearly independent of problem size. Somewhat curiously, however, the cost is significantly different for satisfiable instances (a 22% cost, top of Figure 3) when compared to unsatisfiable ones (a 7% savings, bottom of Figure 3).
Figure 3: stlsat vs. stlsat4, sat competition problems (CPU time in seconds); satisfiable instances only (top; best fit y = 1.22x^1.0043) and unsatisfiable instances only (bottom; best fit y = 0.93x^0.9999)
Figure 4: stlsat4 vs. flex, sat competition problems (CPU time in seconds); best fit y = 3.60x^0.9935
Note that the scatter in these cases is much more significant than that of Figure 1. This is as expected: stlsat is intended to be essentially unchanged from minisat from an algorithmic perspective, while the change in restart frequency can be expected to have fairly dramatic effects. Note also the collection of points with either x = 5000 or y = 5000, indicating that the problem was solved by one method but timed out using the other.
Finally, Figure 4 compares stlsat4 to flex on the sat competition problems, and flex is about a factor of 3.6 slower (4.53 times slower for satisfiable instances; 2.01 times slower for unsatisfiable ones).10 Many more problems time out for flex than for the modified version of minisat.
This is shown clearly in Figure 5, where we compare the number of problems being solved by stlsat4 and by flex on the sat competition problems. For any specific time cutoff, stlsat4 solves about 2.5 times as many problems as does flex.
It does appear to be the case, however, that the performance gap is narrowing as the problems become more difficult; for example, flex solves 36 problems in between 3000 and 5000 seconds; stlsat4 solves 42 problems in that time. It is for this reason that we examined number factoring problems as well.11
10. The graphs for satisfiable and unsatisfiable instances are not shown; they look basically the same as Figure 4. 11. Note that we cannot simply increase the timeout limit for the sat competition problems; doing so caused only a relative handful of new problems to be solved. If we are to generate an interesting number of more difficult problems, and want the instances to be known to be satisfiable, number factoring seems to be the simplest way to go.
Figure 5: stlsat4 vs. flex, sat competition problems (5000 second timeout); problems solved as a function of CPU time (secs)
Figure 6: stlsat vs. stlsat4, factoring problems (CPU time in seconds; all instances satisfiable); best fit y = 1.06x^1.0069
4.2 Number Factoring
For number factoring, we generated 50 factoring problems of each size from five to 45 bits. These problems were then run through stlsat, stlsat4 and flex.
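The generation recipe from earlier in this section (Miller-Rabin prime pairs whose binary representations are within three bits of one another) can be sketched as follows. This is our own illustration; the Purdom/Sabry step that encodes the product into sat is not reproduced here:

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)                 # modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a is a witness: n is composite
    return True

def random_prime(bits):
    """A random prime with its top bit set (exactly `bits` bits)."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def prime_pair(bits):
    """Primes whose binary representations differ in size by at most 3."""
    p = random_prime(bits)
    q = random_prime(bits + random.randint(0, 3))
    return p, q
```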
Figure 7: stlsat4 vs. flex, factoring problems (CPU time in seconds; 5000 second timeout; all instances satisfiable); best fit y = 1.41x^0.9743
The (relatively uninteresting) result of comparing stlsat and stlsat4 is shown in Figure 6. As for the sat competition problems, reducing the size of the first probe to four backtracks leads to a small degradation in performance on satisfiable problems. We include this primarily to ensure that the change in probe size is not responsible for the results in the following figures.

Flex and stlsat4 are compared in Figure 7. Here, we show only problems that one method or the other was able to solve in 5000 seconds or less, although exact times to solution are used for both solvers. The figure also includes dotted lines indicating the 5000 second timeout for either solver. The overall results are similar to those of the previous section, although flex's performance is improved somewhat. On sat competition problems, it was 3.6 times slower overall and 4.53 times slower for satisfiable problems specifically. On number factoring problems, flex is only 41% slower than is stlsat4. The curve fit is being dominated by the large number of easy problems in this case.

In addition, it appears that flex is actually doing better than stlsat4 as the problems become more difficult; many more problems appear to be timing out with stlsat4 (points to the right of x = 5000 seconds in Figure 7) than with flex (points above y = 5000 in the figure). This suggests that instead of looking at the easiest factoring problems, we look at the hardest ones.

This is done in Figure 8, where we consider only problems for which one of the two solvers required at least 5000 seconds to reach a solution. The results are strikingly different. A glance shows that flex is in general solving problems more quickly than is stlsat4; in fact,
Figure 8: stlsat4 vs. flex, factoring problems (CPU time in seconds; 5000 seconds or harder; all instances satisfiable); best fit y = 0.24x^1.0010
Figure 9: stlsat4 vs. flex, factoring problems (CPU time in seconds; 10,000 seconds or harder; all instances satisfiable); best fit y = 0.13x^1.0101
Figure 10: stlsat4 vs. flex, average CPU time (secs) to factor an n-bit number, for n from 5 to 45
the line of best fit shows that flex is about four times as fast as stlsat4. Restricting to problems that took at least 10,000 seconds makes the distinction sharper still; now (Figure 9) flex is eight times faster than is stlsat4.
Similar results appear in Figure 10, which shows the average running time needed by flex or by stlsat to factor a number of n bits. Flex is slower for smaller numbers but faster for larger ones. It is unfortunately impractical to extend this graph much further, since the average runtime for stlsat on a single 45-bit factoring problem is already approximately six hours.
As previously, we show the number of difficult problems (40 bits or more) solved by the solvers as a function of time in Figure 11. With a cutoff of less than 1000 seconds, stlsat4 performs best; as the problems get harder, this is reversed.
5. Conclusion and Future Work
The fundamental claim that we have made in this paper is that the sat community did not need to abandon the systematicity of its methods when it switched from dpll-style provers to cdcl-based ones. We showed that it was possible to simultaneously guarantee systematicity, maintain a polynomially-sized set of learned clauses, and restart the prover as frequently as desired. Furthermore, we showed that while a hybrid between a pure systematic prover and a more typical cdcl engine incurs significant computational cost in achieving these goals, that cost and more is recovered and run time appears to be reduced for difficult problems.
Looking forward, there are two obvious ways in which this work can be extended.
Figure 11: stlsat4 vs. flex, factoring problems (40 bits or larger); problems solved as a function of CPU time (secs)
First, we have seen that the value of our methods increases as the number of restarts increases. One of the principal reasons to limit the number of restarts in a cdcl-based prover is that restarting the prover is expensive. Recent work by Ramos et al. (2011), however, suggests that a significant part of this expense can be avoided. The basic idea is that any particular probe may well begin its search by repeating many of the literal choices and unit propagations from the previous probe. In such a case, there is no reason to backtrack past the point at which the two probes first diverge. Incorporating this idea into our methods should improve their value further by enabling a larger number of restarts and thus making the systematic component that we offer more effective.
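The divergence computation at the heart of this idea is straightforward; as a hedged sketch (the names are ours, not those of Ramos et al.):

```python
# Instead of undoing the whole assignment at a restart, backtrack only to
# the point where the new probe's intended decisions first diverge from
# the decisions of the previous probe.
def divergence_point(old_decisions, new_decisions):
    """Index of the first position where the two decision sequences differ."""
    i = 0
    limit = min(len(old_decisions), len(new_decisions))
    while i < limit and old_decisions[i] == new_decisions[i]:
        i += 1
    return i

# Everything before the divergence point survives the restart.
kept = divergence_point(["a", "-b", "c", "d"], ["a", "-b", "e"])
# kept == 2: the assignments to a and -b need not be undone
```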
Second, we hope that the overall approach that we have proposed (guaranteeing systematicity by retaining a partial order and an appropriate set of learned clauses) finally puts to rest the notion that sat engines achieve systematicity only by careful management of a search space defined by partial assignments. It is instead the learned clausal database that guarantees systematicity through semantic methods.
This general conclusion should lead to a variety of new algorithmic choices. As an example, the values represented by the bias are presumably best thought of as probabilistic estimates regarding the likelihood of any particular literal appearing in a model of the theory in question. This probabilistic approach gives a very different flavor to satisfiability in general, and suggests that recent results from Monte Carlo Tree Search, or mcts (Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, & Colton, 2012; Kocsis & Szepesvári, 2006, and others) may well find a place in sat as well.
Acknowledgments
I would like to thank my Connected Signals and On Time Systems coworkers, especially Aran Clauson and Heidi Dixon, for useful technical advice and assistance. I would also like to thank the anonymous referees for their many carefully considered comments and suggestions, which improved this paper enormously. Finally, I would like to thank David McAllester for many contributions to this work over the years; the work presented here draws greatly from McAllester's original work (1993) on partial-order dynamic backtracking.
Appendix A. Proof of Theorem 3.10
The proof of Theorem 3.10 will depend on certain invariants that are maintained as Procedure 3.8 is executed. We begin by discussing those invariants.
A.1 Learned Clauses

To understand them, suppose that a set of learned clauses Γ has been derived during the course of a systematic search. What conditions would we expect Γ to satisfy?

To answer this, imagine that we have a partial assignment P and that we have derived a new learned clause γ that we are about to add to Γ. We certainly expect P to satisfy the existing learned clauses. We will also assume that P satisfies the antecedents of those learned clauses; this is in keeping with the idea that the learned clauses should continue to be relevant to the portion of the search space being explored.

What about the new clause γ? First, it will turn out that γ should have a unique conclusion under the partial order <. In addition, the new learned clause γ should be new knowledge in the sense that it eliminates a portion of the search space that is still admissible under Γ. In other words, we expect the conclusion γ+ to be distinct from the conclusions Γ+ of other elements of Γ. Formally, we have:
Definition A.1 We will say that a partial order < parses a learned clause γ if |γ+| = 1. We will say that a set of learned clauses Γ is acceptable for a partial order < if the following conditions hold:
1. For any γ ∈ Γ, < parses γ,
2. For any γ1, γ2 ∈ Γ, if γ1+ = γ2+, then γ1 = γ2, and
3. Γ− ∧ Γ+ is satisfiable.
The first condition repeats the requirement that the conclusions γ+ are all individual literals. The second condition says that Γ does not contain two distinct learned clauses with the same conclusion. The third says that it is at least possible to find a partial assignment P as described above, since the collection of conclusions Γ+ is consistent with the antecedents Γ− of those learned clauses.
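As an illustration, the three acceptability conditions can be checked directly. This sketch uses our own representation (each clause an (antecedent, conclusion) pair of literal sets, assumed distinct, with "-x" for ¬x); condition 3, as we reconstruct it, reduces to a check for complementary literals because every conjunct involved is a literal:

```python
# Our sketch of Definition A.1's acceptability test for learned clauses.
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def acceptable(gamma):
    # 1. The order parses every clause: each conclusion is a single literal.
    if any(len(conclusion) != 1 for _, conclusion in gamma):
        return False
    # 2. No two distinct learned clauses share a conclusion.
    conclusions = [next(iter(conclusion)) for _, conclusion in gamma]
    if len(set(conclusions)) != len(conclusions):
        return False
    # 3. The conjunction of all antecedents and conclusions is satisfiable;
    #    since every conjunct is a literal, this holds exactly when no
    #    literal occurs alongside its negation.
    literals = set(conclusions)
    for antecedent, _ in gamma:
        literals |= antecedent
    return all(neg(l) not in literals for l in literals)
```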
Lemma A.2 If Γ is acceptable for a partial order <, then Γ contains at most one learned clause with any particular variable x in its conclusion.
Proof. If Γ contains clauses with both x and ¬x as conclusions, it will be impossible to satisfy both Γ+ and Γ−, contradicting the requirement of acceptability.
Proposition A.3 If Γ is a set of clauses involving v variables and is acceptable for a partial order ≺, then |Γ| ≤ v.
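The conditions of Definition A.1 and the bound of Proposition A.3 can be checked mechanically on small instances. The following Python sketch is ours, not part of flex: the clause encoding, the helper names, and the brute-force consistency test are all illustrative assumptions.

```python
from itertools import product

# Literals are nonzero ints (3 means the variable x3, -3 its negation).
# A clause is a frozenset of literals; prec is a set of (a, b) variable
# pairs meaning "a strictly precedes b" in the partial order.

def maximal_literals(clause, prec):
    """Literals whose variable is below no other variable in the clause."""
    return {l for l in clause
            if not any((abs(l), abs(m)) in prec for m in clause if m != l)}

def conclusion(clause, prec):
    (c,) = maximal_literals(clause, prec)   # requires that prec parse clause
    return c

def antecedent(clause, prec):
    """Negations of the non-conclusion literals."""
    return {-l for l in clause if l != conclusion(clause, prec)}

def satisfiable(clauses, nvars):
    """Brute-force satisfiability test over all 2^nvars assignments."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=nvars))

def acceptable(gamma, prec, nvars):
    """The three conditions of Definition A.1."""
    if any(len(maximal_literals(c, prec)) != 1 for c in gamma):
        return False                                    # condition 1
    if len({conclusion(c, prec) for c in gamma}) != len(gamma):
        return False                                    # condition 2
    units = [frozenset([l]) for c in gamma for l in antecedent(c, prec)]
    return satisfiable(list(gamma) + units, nvars)      # condition 3

# Alphabetic total order on variables 1..4 (a, b, c, d).
prec = {(i, j) for i in range(1, 5) for j in range(1, 5) if i < j}
gamma = [frozenset({-1, -2, 3}), frozenset({-2, 4})]    # a&b -> c, b -> d
assert acceptable(gamma, prec, 4) and len(gamma) <= 4   # Proposition A.3
# A second clause concluding c violates condition 2:
assert not acceptable(gamma + [frozenset({-1, 3})], prec, 4)
```

The exponential `satisfiable` check is only for demonstrating condition 3 on toy problems; a real solver never tests consistency this way.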
A.2 Search Space Size Before considering the general way in which the new clause γ allows us to improve the bias, and the way in which the new clause is incorporated into Γ, let us examine a simple special case where ≺ is in fact a total order on the variables in question. This means that any particular learned clause γ is always interpreted as forcing the value of the single variable it contains that occurs latest under the ordering ≺. Now given a set of learned clauses Γ, we extend P until we either solve the problem or derive a new learned clause γ. Given the bias B, we are assuming that B ⊭ γ so that the learned clause allows us to improve the bias, and we have already remarked that we assume that B is consistent with Γ ∪ Γ←. It follows that ¬γ is consistent with Γ ∪ Γ← as well. If l is the conclusion of γ, how are we to update Γ? Note first that there can be no α ∈ Γ with l as its conclusion. If there were, then we would have Γ ∪ Γ← ⊨ l and therefore Γ ∪ Γ← ⊨ γ also, contradicting the consistency just noted. There are now two cases: Γ may contain a learned clause with conclusion ¬l, or Γ may contain no such learned clause. If Γ does not contain a learned clause with conclusion ¬l, we would like to simply add γ to Γ. But doing so may cause Γ to become unacceptable, since Γ may contain a learned clause with ¬l in its antecedent. If we simply added γ to Γ, the result would be unacceptable with respect to ≺ because it would be impossible to satisfy all of the elements of Γ while simultaneously satisfying their antecedents. As an example, suppose that Γ contains the learned clause ¬a ∨ ¬c ∨ d, the antecedent of which is a ∧ c. If we learn ¬a ∨ ¬c and simply add it to Γ, then Γ ∪ Γ← includes both a ∧ c and ¬a ∨ ¬c, a contradiction. What we can do in this case is to remove from Γ every learned clause that has ¬l in its antecedent; call the result Γ′. We now take add(γ, Γ) to be Γ′ ∪ {γ}. ≺ parses γ and obviously continues to parse all of the learned clauses remaining in Γ′.
Γ′ ∪ {γ} is therefore acceptable with respect to ≺, since ¬l can no longer appear in the antecedent of any learned clause in Γ′. Given that we have discarded some of the learned clauses in Γ, do we have any basis for concluding that the procedure will eventually terminate? We do. As mentioned in the main body of the text, the key observation is that we have made lexicographic progress in the search space. While it is true that the allowed domains of all of the variables following l may have gotten bigger when we replaced Γ with Γ′, the domain for l itself has gotten smaller. Since l precedes these other variables in the total order, we have made lexicographic progress and systematicity remains guaranteed. Consider now the second case, where Γ does contain a learned clause with conclusion ¬l. Note first that since Γ is acceptable, the learned clause concluding ¬l must be unique. We can resolve this learned clause with γ to obtain a new learned clause ρ. Every variable in ρ precedes l according to the ordering ≺, so the conclusion of ρ must precede l as well. We can thus continue to make lexicographic progress if we define add(γ, Γ) to be add(ρ, Γ), where
we have essentially chosen to incorporate the lexicographically more powerful resolvent ρ instead of the original learned clause γ. To ensure that these ideas are clear, suppose that our original learned clause set Γ is given by the following, where the ordering ≺ ranks the variables alphabetically:

¬a ∨ ¬c ∨ d
¬b ∨ ¬c ∨ e (6)
Now if the new learned clause is ¬a ∨ ¬b ∨ f, we simply add it to Γ. No learned clause need be removed because the conclusion f does not appear in the antecedent of any learned clause in Γ. Having added ¬a ∨ ¬b ∨ f, suppose that we now learn ¬a ∨ ¬c. As in the example above, we add the learned clause to Γ and remove every learned clause whose antecedent includes c, which means that both of the original learned clauses are removed from Γ, although ¬a ∨ ¬b ∨ f is retained. The new Γ set is:

¬a ∨ ¬b ∨ f
¬a ∨ ¬c (7)
We have made lexicographic progress. If a is true, the original learned clause set (6) allowed c to be either true or false; the new learned clause set (7) forces c to be false. Reducing the number of possible values for c counts as progress, whatever happens to the possible values for subsequent variables. Suppose that instead of learning ¬a ∨ ¬c, we had learned ¬c ∨ ¬f. This conflicts with ¬a ∨ ¬b ∨ f, so we resolve the two together to obtain ¬a ∨ ¬b ∨ ¬c, which the total order causes us to interpret as a ∧ b → ¬c. We add this to Γ as in the previous paragraph; once again, lexicographic progress has been made. This restricted example contains the essence of the ideas we will use in the more general case. The general case is more complex because, just as we must ensure that Γ remains acceptable, we would also like to ensure that the partial order ≺ remains as flexible as possible. If ≺ were to approach a total order, it would tend to force specific interpretations of newly learned clauses, reducing our ability to reverse decisions that were made early in the search. This would mean that we would have relatively little flexibility in modifying the bias; we would always have to simply flip the bias variable that was most recently valued even as evidence accumulated that this variable was valued correctly. We begin by formalizing the notion of lexicographic progress.
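The total-order special case just described can be replayed in a few lines. The sketch below is illustrative only (variables a through f are numbered 1 through 6, literals are signed integers, and the helper names are ours); it carries out the eviction and resolution steps exactly as in the running example.

```python
# Variables a..f are 1..6; a negative literal is a negated variable.
def concl(clause):
    """Conclusion under the alphabetic total order: the latest variable."""
    return max(clause, key=abs)

def resolve(c1, c2, var):
    return (c1 | c2) - {var, -var}

def add_total(gamma, new):
    """Total-order special case of add: resolve away a clashing conclusion,
    otherwise evict every clause whose antecedent mentions -concl(new)."""
    l = concl(new)
    clash = [c for c in gamma if concl(c) == -l]
    if clash:
        return add_total(gamma, resolve(new, clash[0], abs(l)))
    # A clause's antecedent mentions -l exactly when the clause contains l
    # among its non-conclusion literals.
    kept = [c for c in gamma if l not in c - {concl(c)}]
    return kept + [new]

# Two clauses with c in their antecedents, as in (6).
gamma = [frozenset({-1, -3, 4}), frozenset({-2, -3, 5})]   # a&c->d, b&c->e
gamma = add_total(gamma, frozenset({-1, -2, 6}))           # learn a&b->f
assert len(gamma) == 3                                     # nothing evicted
g7 = add_total(gamma, frozenset({-1, -3}))                 # learn a->~c
assert g7 == [frozenset({-1, -2, 6}), frozenset({-1, -3})] # the set (7)
# Learning ~c|~f instead triggers the resolution case:
g8 = add_total(gamma, frozenset({-3, -6}))
assert frozenset({-1, -2, -3}) in g8                       # a&b -> ~c
```

Both asserted outcomes mirror the text: learning a → ¬c evicts the two clauses with c in their antecedents, while learning ¬c ∨ ¬f resolves against ¬a ∨ ¬b ∨ f before being incorporated.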
Definition A.4 If Γ is a set of learned clauses and ≺ is a total order, we will denote by |Γ|x the number of elements of Γ that have conclusion either x or ¬x. We will denote by |>x| the number of y with y > x. If v is the number of variables being considered, we now define the total order size of Γ under ≺ to be

sizet(Γ, ≺) = 2^v − Σ_x 2^|>x| |Γ|x (8)
Proposition A.5 If Γ is acceptable for a total order ≺, then 2^v ≥ sizet(Γ, ≺) > 0.
Proof. The first inequality is immediate. For the second, since |Γ|x ≤ 1,

sizet(Γ, ≺) ≥ 2^v − Σ_x 2^|>x|

But consider the summation in isolation. As x ranges over the v variables, |>x| takes each of the values 0, 1, ..., v − 1 exactly once, so that

Σ_x 2^|>x| = Σ_{i=1}^{v} 2^{i−1} = 2^v − 1.

Thus sizet(Γ, ≺) ≥ 2^v − (2^v − 1) = 1 > 0.
Before moving on, let us explain the intuition underlying Definition A.4. The basic idea is that the expression in (8) should reflect the size of the search space remaining to be considered, because

Σ_x 2^|>x| |Γ|x

counts the number of potential models eliminated by the learned clauses in Γ. That it does should be fairly clear. The unique learned clause counted by |Γ|x eliminates a value for x. Each such elimination corresponds to the removal of a subtree in the overall search space. The size of the subtree is given by 2^|>x|, since |>x| is the number of variables properly following x in the ordering.
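Equation (8) and the bound of Proposition A.5 are easy to spot-check. A small Python sketch (the encoding and helper names are illustrative assumptions, not the paper's code):

```python
def concl(clause, order):
    """Conclusion: the literal whose variable occurs latest in `order`."""
    return max(clause, key=lambda l: order.index(abs(l)))

def sizet(gamma, order):
    """Equation (8): 2^v minus the models excluded by the clauses in gamma.
    `order` lists the variables from earliest to latest."""
    v = len(order)
    return 2 ** v - sum(2 ** (v - 1 - order.index(abs(concl(c, order))))
                        for c in gamma)

order = [1, 2, 3, 4]                                  # a < b < c < d
gamma = [frozenset({-1, 3}), frozenset({-2, -3, 4})]  # a->c, b&c->d
s = sizet(gamma, order)
assert 0 < s <= 2 ** len(order)                       # Proposition A.5
# a->c excludes a subtree of 2^1 models (d is free); b&c->d excludes 2^0.
assert s == 16 - (2 + 1)
```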
Proposition A.6 Fix a total order ≺, and let Γ and Γ′ be two sets of learned clauses. Assume that there is some x with |Γ|x < |Γ′|x but |Γ|y = |Γ′|y for all y < x. Now if ≺′ is a total order that agrees with ≺ when restricted to the set {(y, z) | y, z ≤ x}, then sizet(Γ′, ≺′) < sizet(Γ, ≺).
Proof. This is simply a formalization of the lexicographic argument made earlier. To show sizet(Γ, ≺) − sizet(Γ′, ≺′) > 0, we have

sizet(Γ, ≺) − sizet(Γ′, ≺′) = Σ_y 2^|>′y| |Γ′|y − Σ_y 2^|>y| |Γ|y (9)

We will show:

1. For any y < x,
2^|>′y| |Γ′|y = 2^|>y| |Γ|y (10)

2. For y = x,
2^|>′x| |Γ′|x − 2^|>x| |Γ|x ≥ 2^|>x| (11)

3. For y > x,
Σ_{y>x} 2^|>y| |Γ|y < 2^|>x| (12)

Collectively, these suffice. The terms (10) with y < x have no effect on the difference in (9). The amount contributed by the term (11) with y = x is greater than the amount subtracted by the terms (12) with y > x, and the remaining terms of the first sum in (9) are nonnegative. Thus the total sum is positive and the result follows. (10) is a consequence of the fact that for y < x, |>′y| = |>y| and |Γ′|y = |Γ|y. For (11), since |Γ′|x > |Γ|x and |>′x| = |>x|, we get

2^|>x| |Γ|x ≤ 2^|>x| (|Γ′|x − 1) = 2^|>′x| |Γ′|x − 2^|>x|

For (12), the sum is essentially identical to that appearing at the end of the previous proof.
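The lexicographic decrease promised by Proposition A.6 can be checked numerically. In the sketch below (our encoding, not the paper's), Γ′ gains a clause concluding on the earlier variable b while losing its clause on the later variable c, and sizet still drops:

```python
def concl(clause, order):
    return max(clause, key=lambda l: order.index(abs(l)))

def sizet(gamma, order):
    v = len(order)
    return 2 ** v - sum(2 ** (v - 1 - order.index(abs(concl(c, order))))
                        for c in gamma)

order = [1, 2, 3]                       # a < b < c
g_old = [frozenset({-2, 3})]            # b -> c: concludes on c
g_new = [frozenset({-1, 2})]            # a -> b: concludes on the earlier b
# |g_new|_b = 1 > 0 = |g_old|_b, and the two (identical) orders agree
# below b, so Proposition A.6 promises a strict decrease:
assert sizet(g_new, order) < sizet(g_old, order)
```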
Definition A.7 A partial order ≺′ will be called a refinement of a partial order ≺ if x ≺ y implies x ≺′ y. A refinement of ≺ that is a total order will be called a total refinement of ≺.
Definition A.8 For a set of learned clauses Γ and partial order ≺, we define the size of Γ under ≺, denoted size(Γ, ≺), to be the maximum of sizet(Γ, ≺t) over all total refinements ≺t of ≺.
The following is an immediate consequence of Proposition A.5:
Corollary A.9 If Γ is acceptable for a partial order ≺, then 2^v ≥ size(Γ, ≺) > 0.
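On small instances, the maximization in Definition A.8 can be carried out by brute force, enumerating total refinements as permutations consistent with the partial order. A sketch under our illustrative encoding:

```python
from itertools import permutations

def concl(clause, order):
    return max(clause, key=lambda l: order.index(abs(l)))

def sizet(gamma, order):
    v = len(order)
    return 2 ** v - sum(2 ** (v - 1 - order.index(abs(concl(c, order))))
                        for c in gamma)

def size(gamma, prec, variables):
    """Definition A.8: the maximum of sizet over all total refinements
    of the partial order `prec` (a set of (x, y) pairs meaning x < y)."""
    return max(sizet(gamma, list(order))
               for order in permutations(variables)
               if all(order.index(a) < order.index(b) for a, b in prec))

prec = {(1, 3), (2, 3)}          # a and b both precede c; a, b unordered
gamma = [frozenset({-1, 3})]     # a -> c
# Every total refinement puts c last, so the two refinements agree here:
assert size(gamma, prec, [1, 2, 3]) == 2 ** 3 - 1
```

The factorial enumeration is of course only for checking the definitions; flex never materializes the refinements.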
The point of all this is that we can now state our desiderata for add a bit more precisely. We need:
1. Polynomial space: If Γ is acceptable with respect to ≺ and (γ′, Γ′, ≺′) = add(γ, Γ, ≺), then we want Γ′ to be acceptable with respect to ≺′. This will allow us to use Proposition A.3 to conclude that Γ′ can be stored in polynomial space as the algorithm proceeds. ≺′ and the bias B can obviously be stored in polynomial space as well.
2. Systematicity: If (γ′, Γ′, ≺′) = add(γ, Γ, ≺), then we want size(Γ′, ≺′) < size(Γ, ≺). This will allow us to use Corollary A.9 to conclude that add can be invoked at most 2^v times before the entire search space is eliminated.
A.3 Systematicity Recall:

Procedure 3.6 To compute add(γ, Γ, ≺):
1 if γ = ∅ then return (γ, Γ ∪ {∅}, ≺);
2 select l ∈ γ̄; // l is the intended conclusion
3 if there is a ρ ∈ Γ with ρ̄ = {¬l} then
4 return add(resolve(γ, ρ), Γ, ≺)
5 add x ≺ l to ≺ for each variable x appearing in γ other than the variable of l;
6 remove from ≺ any x ≺ y with l ≺ x; // weaken ≺ above l
7 remove from Γ any c with either c ∩ {l, ¬l} ≠ ∅, or both c mentioning some variable x with l ≺ x and |c̄| > 1;
8 return (γ, Γ ∪ {γ}, ≺)
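The control flow of Procedure 3.6 can be sketched directly. The code below is an illustrative reading of the pseudocode, not the flex implementation: literals are signed integers, the order is a raw set of precedence pairs (transitive-closure maintenance is omitted), and the tie-breaking choice of conclusion on line 2 is arbitrary.

```python
def maximal(clause, prec):
    """The set of literals of `clause` maximal under the variable order."""
    return {l for l in clause
            if not any((abs(l), abs(m)) in prec for m in clause if m != l)}

def resolve(c1, c2, var):
    return (c1 | c2) - {var, -var}

def add(gamma, new, prec):
    """A sketch of Procedure 3.6; prec holds (x, y) pairs meaning x < y."""
    if not new:                                        # line 1: empty clause
        return new, gamma + [new], prec
    l = next(iter(maximal(new, prec)))                 # line 2: pick conclusion
    for rho in gamma:                                  # lines 3-4
        if maximal(rho, prec) == {-l}:
            return add(gamma, resolve(new, rho, abs(l)), prec)
    prec = prec | {(abs(m), abs(l)) for m in new if m != l}         # line 5
    prec = {(x, y) for (x, y) in prec if (abs(l), x) not in prec}   # line 6
    gamma = [c for c in gamma                          # line 7: evictions
             if not c & {l, -l}
             and not (any((abs(l), abs(m)) in prec for m in c)
                      and len(maximal(c, prec)) > 1)]
    return new, gamma + [new], prec                    # line 8

prec0 = {(1, 2), (1, 3), (2, 3)}                       # a < b < c
# Learning a -> ~b evicts a&b -> c, whose clause mentions b:
_, g1, _ = add([frozenset({-1, -2, 3})], frozenset({-1, -2}), prec0)
assert g1 == [frozenset({-1, -2})]
# Learning ~b against a -> b triggers the resolution on lines 3-4:
_, g2, _ = add([frozenset({-1, 2})], frozenset({-2}), prec0)
assert g2 == [frozenset({-1})]
```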
Proposition A.10 Suppose that Γ is acceptable with respect to ≺, and that Γ ∪ Γ← ⊭ γ. Then if (γ′, Γ′, ≺′) = add(γ, Γ, ≺), Γ′ is acceptable with respect to ≺′.
Proof. Note first that the procedure terminates. The only possible source of nontermination is the recursive call on line 4, but the conclusion of the resolvent necessarily precedes the conclusion of γ in the order. The number of recursive calls is therefore bounded, and the procedure terminates. To show that Γ′ is acceptable, we consider first the nonrecursive case where there is no ρ ∈ Γ with conclusion ¬l. For the three conditions of Definition A.1, we have:
1. To show that ≺′ parses c for any c ∈ Γ′, there are two cases.
If c is the learned clause γ, we know by construction that ≺′ includes y ≺′ l for every other variable y appearing in c.
For the case where c ∈ Γ, we need the following lemma:
Lemma A.11 Suppose that x ≺ u but x ⊀′ u. Then l ≺ x and l ≺′ x.
Proof. Suppose that we denote by ≺+ the partial order constructed in line 5 of the procedure, where additional arcs are added to the original ≺. Since ≺+ is a refinement of ≺, it is clear that if x ≺ u, we will have x ≺+ u as well.
We now have that ≺′ is obtained from ≺+ by the removals of line 6, so the only way to have x ≺+ u but not x ≺′ u is if l ≺+ x; and since the arc from l to x is not itself removed, l ≺′ x as well.
To show that l ≺ x, note that if l ≺′ x, then l ≺+ x, because arcs are only removed in the construction of ≺′ from ≺+. But if l ≺+ x we must also have l ≺ x, since the arcs added in the construction of ≺+ are all of the form z ≺ l.
Returning to the proof of the proposition, suppose that y, z ∈ c̄′ with y and z incomparable under ≺′. Now if either l ≺′ y or l ≺′ z, the clause in question would have been eliminated by line 7 of the add construction, so it follows that l ⊀′ y and l ⊀′ z.
We claim that there is no u ∈ c with y ≺ u. If there were such a u, then we would have y ≺′ u by virtue of the lemma, so y ∉ c̄′, a contradiction. Similarly there is no u ∈ c with z ≺ u. Thus y, z ∈ c̄ and therefore y = z, since ≺ parses c.
2. To show that no two elements of Γ′ have the same conclusion, note first that no two elements of Γ′ have the same conclusion under the original partial order ≺. This is true by the acceptability of Γ for two elements of Γ. And if the new learned clause γ were to have the same conclusion as an element of Γ, we would have had Γ ∪ Γ← ⊨ γ, contradicting our assumptions.
The only way for the conclusion of a c ∈ Γ′ under ≺′ to be different than the conclusion under ≺ is if there are y, z ∈ c with y ≺ z but z ≺′ y. But then by the lemma we must have l ≺ y and l ≺′ y. Since l ≺ y ≺ z, we have l ≺ z and therefore l ≺′ z. But now the weakening in line 6 causes y and z to be incomparable under ≺′, a contradiction.
3. Finally, we must show that Γ′ ∪ Γ′← is consistent, or in other words that Γ′ and Γ′← can be satisfied simultaneously.
To see this, we can use the fact that Γ ∪ Γ← ⊭ γ to find a complete assignment P of values to variables that satisfies Γ and Γ← but falsifies γ. Note that P must both satisfy the antecedent γ← and include ¬l in order to falsify γ.
We now claim that P|l, the result of flipping the value of l in P, satisfies both Γ′ and Γ′←. P|l continues to satisfy the antecedent γ←, since the variable of l does not appear in γ←, and P|l satisfies γ since it makes l true. We therefore need only consider the c ∈ Γ′ that appear in Γ as well.
Since P satisfies both c and c←, the only way that P|l might not satisfy them is if P|l fails to satisfy c because the conclusion of c is no longer satisfied (recall that P satisfies c←, so the conclusion is the only literal of c that P satisfies), or P|l fails to satisfy c← because one of the terms in the antecedent is no longer satisfied. The first of these cannot happen because P|l differs from P only at l, and neither l nor ¬l can be c's conclusion. The second cannot happen because if l appeared in the antecedent of c, c would have been removed in the construction of Γ′.
This concludes the proof that Γ′ remains acceptable if a nonrecursive path is taken through add; to see that the recursive path preserves acceptability as well, we need to show that the fundamental requirement Γ ∪ Γ← ⊭ γ holds for the recursive call. In other words, the proof will be complete if we can show that Γ ∪ Γ← ⊭ resolve(γ, ρ) in line 4. Suppose instead that Γ ∪ Γ← ⊨ resolve(γ, ρ), and note that resolve(γ, ρ) is the disjunction of γ \ {l} and ρ \ {¬l}. Now ρ ∈ Γ, so Γ ∪ Γ← entails the antecedent ρ←, which falsifies every literal of ρ other than its conclusion ¬l. We must therefore have

Γ ∪ Γ← ⊨ γ \ {l}

But this would imply Γ ∪ Γ← ⊨ γ, contradicting the assumptions of the proposition. The proof is complete.
Proposition A.12 Suppose that Γ is acceptable with respect to ≺, and that Γ ∪ Γ← ⊭ γ. Then if (γ′, Γ′, ≺′) = add(γ, Γ, ≺), size(Γ′, ≺′) < size(Γ, ≺).
Proof. We need the following lemma:
Lemma A.13 Let ≺ be a partial order and x a point. Then if ≺Tx is a total refinement of ≺, there is a total refinement ≺T of ≺ such that ≺Tx and ≺T agree when restricted to {(y, z) | y, z ≼Tx x}.
Proof. Define y ≺T z if and only if any of the following three conditions hold:
1. y, z ≼Tx x and y ≺Tx z,
2. x