Journal of Artificial Intelligence Research 53 (2015) 497-540. Submitted 01/15; published 07/15.

Satisfiability and Systematicity

Matthew L. Ginsberg
Connected Signals, Inc.
355 Goodpasture Island Road, Suite 200
Eugene, Oregon 97401

We introduce a new notion of systematicity for satisfiability algorithms with restarts, saying that an algorithm is strongly systematic if it is systematic independent of restart policy but weakly systematic if it is systematic for some restart policies but not others. We show that existing satisfiability engines are generally only weakly systematic, and describe flex, a strongly systematic algorithm that uses an amount of memory polynomial in the size of the problem. On large number factoring problems, flex appears to outperform weakly systematic approaches.

1. Introduction

Once upon a time, dpll (Davis, Logemann, & Loveland, 1962; Davis & Putnam, 1960) was the algorithm of choice for solving Boolean satisfiability problems, or sat. There were three reasons for this.

First, dpll was systematic in that if a particular problem had a solution, dpll would eventually find it. If a problem had no solution, dpll would identify it as unsatisfiable. Such properties are essential if we want to simply invoke a solver, give it as much time as it needs to solve a problem, and be assured that some answer will always result.

Second, dpll used an amount of memory polynomial in the size of the problem. The amount of time used was, of course, exponential, given the NP-complete nature of sat. But the memory used was polynomial, and therefore only logarithmic in the running time.

And finally, dpll worked. Its success was only a harbinger of the uses to which later sat engines would be put, but the range of problems to which dpll could practically be applied exceeded that of any previous general-purpose NP algorithm.
Today that range is wider still, including microprocessor verification (Kaivola, Ghughal, Narasimhan, Telfer, Whittemore, Pandav, Slobodov, Taylor, Frolov, Reeber, & Naik, 2009), device driver validation (Moura & Bjørner, 2010) and many others (Biere, Heule, van Maaren, & Walsh, 2009).

Two changes led to significant algorithmic improvements. The first was the recognition that a dpll backtrack was best thought of not as a move in a tree-like search space, but instead as a resolution-based inference step. This idea appeared first in Stallman and Sussman's (1977) work on dependency-directed backtracking and Doyle's subsequent (1979) work on truth maintenance. All of these earlier authors, however, described algorithms that might accumulate an exponential number of resolution consequences as the search proceeded. Ginsberg's (1993) work on dynamic backtracking was the first to show that systematic nonchronological inference methods could be constructed using only a polynomial amount of memory, describing the algorithm as an extension of backjumping (Gaschnig, 1979). Bayardo and Schrag (1997) continued to work on the approach, renaming it (sensibly) relevance-bounded learning.

These ideas are currently known as conflict driven clause learning, or cdcl, a name that was introduced by Ryan (2002) and then popularized by Marques-Silva, Lynce and Malik (2009). The work through 1997 involved using a scheme for retaining learned clauses that guaranteed that only a polynomial number of such clauses would be kept at any particular point in the search. With grasp (Marques-Silva & Sakallah, 1999), this requirement was dropped.

© 2015 AI Access Foundation. All rights reserved.
Virtually all modern sat solvers use cdcl, generally modified in a way that either allows an unlimited number of clauses to be collected (as in tinisat, Huang, 2006; or Quest0.5, Lynce, Baptista, & Marques-Silva, 2001) or, more commonly, in a way that no longer guarantees systematicity (zChaff, Moskewicz, Madigan, Zhao, Zhang, & Malik, 2001; rsat, Pipatsrisawat & Darwiche, 2007; minisat, Sörensson & Eén, 2005; and many others).

The abandonment of systematicity was linked to the inclusion of restarts in satisfiability algorithms. This appears to have been due to Gomes, Selman and colleagues (1998); the basic idea is to escape from areas where the solver is thrashing (searching nearly endlessly without ever identifying the real source of a problem) by simply restarting from scratch. Put somewhat differently (but equivalently), restarts provide an out when the solver makes a mistake near the root of the search tree, what Harvey (1995) refers to as an early mistake. In the setting provided by dpll, restarting the prover will inevitably result in nonsystematicity. This is because crucial information about the state of the search is stored in the position in the search tree, and restarting the search abandons this information.

The work on restarts has matured in two separate directions. First, there has been a considerable body of work examining the theoretical properties of search with restarts in the cdcl case where all of the clauses are retained as the search proceeds. Such a search is obviously systematic, since only a limited number of new clauses can be learned before either a solution is found or the empty clause is derived (proving the original problem to be unsatisfiable). There are more interesting conclusions to be drawn, however. The fact that restarts can prevent thrashing appears to be related to the fact that they appear to allow for more flexible inference schemata than previous methods.
This work was begun by Beame and colleagues (2004), continued by Buss et al. (2008) and Hertel et al. (2008), and extended further by Pipatsrisawat and Darwiche (2011), who showed that cdcl with restarts is equivalent to general resolution. Cdcl without restarts is not known to be generally able to produce proofs exponentially shorter than tree-based resolution, and general resolution can lead to exponentially shorter proofs than tree-based resolution in many instances (Ben-Sasson, Impagliazzo, & Wigderson, 2000).¹

In addition to this theoretical work, practical work was proceeding as well, focusing in large measure on the development of strategies identifying points at which the search should be restarted. It was realized fairly quickly that virtually any restart policy was an improvement on not restarting at all. Luby et al. (1993) had developed a restart policy that they showed was within a logarithmic factor of optimal on a wide range of problems. The Luby restart policy was shown to outperform a variety of alternatives in the sat domain by Huang (2007).

The Luby restart policy involves gradually (but not monotonically) increasing the number of backtracks between restarts; roughly speaking, the 2^k-th restart is the first that allows 2^k backtracks before restarting once again. We will refer to the effort undertaken between consecutive restarts as a probe; the number of backtracks between the restarts will be called the size of the probe. The Luby approach has two useful consequences. The first is that the total number of backtracks grows quadratically with the number of restarts because of the infrequency of probes of large size. The second consequence of the Luby policy is that the search is once again systematic, assuming that the original procedure, without restarts, was itself systematic.

¹ It is also not known that tree-based resolution can't polynomially simulate general resolution (Bonet & Johannsen, 2014).
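The schedule just described is easy to state concretely. The following is a minimal sketch (ours, not taken from any solver) of the standard Luby sequence, under the common formulation in which the i-th probe is allowed luby(i) backtracks, scaled by some base constant:

```python
def luby(i):
    """i-th term (1-indexed) of the Luby sequence 1,1,2,1,1,2,4,1,1,2,...

    If i has the form 2^k - 1 the term is 2^(k-1); otherwise the sequence
    repeats its own prefix.  The value 2^k first appears at position
    2^(k+1) - 1, which is why large probes stay rare and the total number
    of backtracks grows only quadratically in the number of restarts.
    """
    k = 1
    while (1 << k) - 1 < i:  # smallest k with i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)  # recurse into the repeated prefix
```

The first fifteen terms are 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8; a solver using this as its restart policy typically multiplies each term by a base number of backtracks.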
The reason is that the Luby restart policy will eventually involve probes of sufficiently large size that systematicity is guaranteed.

This is a distinction worth formalizing. Given an algorithm A that involves a restart policy, we will say that A is strongly systematic if it is systematic independent of the restart policy chosen. We will say that A is weakly systematic if it is systematic for some restart policies but not for others, and we will say that A is nonsystematic if it is not systematic for any restart policy. The connection between systematicity and the amount of memory used is sufficiently important that we present the following classification of some sat solvers:

                 Strongly      Weakly
                 systematic    systematic    Nonsystematic
    Exp-space    tinisat
    Polyspace    ?             minisat       wsat

By exp-space, we mean algorithms that demonstrably may take an amount of space exponential in the size of the problem being solved; polyspace algorithms are those that remove learned clauses and thus could, at least in theory, use only polynomial amounts of space as a result.

Memory is cheap these days; perhaps exp-space is not as much of a drawback as it once was. But this turns out not to be the case, for the following reason. Most of the CPU time consumed by sat engines is spent in the unit propagation procedure, which finds obvious consequences of the variable assignments made at any particular point in the search. Although there have been a variety of engineering improvements that speed this process considerably (Moskewicz et al., 2001), the overall time can still be linear in the size of the accumulated clausal database. As this database grows exponentially in size, the time needed for a single inference can therefore grow exponentially as well.

Given tinisat's existence in the upper left, there has been little interest in an exp-space solver with weaker systematicity properties.
But tinisat cannot really be considered a production solver; on difficult problems, the number of stored clauses grows impractically and the solver itself essentially grinds to a halt.

We have not previously discussed the wsat family of algorithms (Selman, Kautz, & Cohen, 1993), nor shall we. These algorithms are very different in spirit from the ones that we have described thus far, working with complete variable assignments and using hill climbing to attempt to gradually convert these assignments to solutions of the problems being investigated. While these algorithms were competitive in the 1990s, progress on systematic algorithms has been such that these fundamentally nonsystematic approaches currently receive relatively little attention, although recent work has shown that the connection between these algorithms and the cdcl approach is closer than one might think (Goultiaeva & Bacchus, 2012).

Most modern sat solvers join minisat in the middle of the bottom row, weakly systematic but using only a relatively modest amount of memory to get the job done. Removing some clauses is essential in any practical solver, and there simply has been no known strongly systematic algorithm capable of doing so. The general view has been that although not strongly systematic, these systems work exceptionally well in practice, including managing to produce proofs of unsatisfiability when necessary.

It should be noted, however, that this loss of systematicity is more than a simple theoretical concession. It is obviously important, for example, that distinct probes begin their search from distinct locations in the search space. Should those starting locations be close in some sense, it is also important that the corresponding probes overlap as little as possible. We believe that the experimental results in Section 4 support this intuition.

As we have remarked, our work in dynamic backtracking introduced the first nonchronological, polyspace, weakly systematic sat engine.
Our goal in the current paper is to describe a new cdcl-based sat engine, flex, that is both polyspace and strongly systematic.

The outline of the paper is as follows. In Section 2, we introduce the necessary notation and describe a standard cdcl algorithm. We also describe the choices typically made in implementing such algorithms. Section 3 describes the flex algorithm; the proof that it is both systematic and polyspace is deferred to an appendix. We discuss the additional data structures needed to guarantee systematicity, and the restrictions that must be placed on some standard choices as a result. Experimental results are presented in Section 4 and concluding remarks are in Section 5.

2. Background, Notation and Existing Work

By a satisfiability problem, we will mean a collection of clauses over Boolean variables such as:

    a
    ¬a ∨ b
    ¬b ∨ ¬c

Each of the clauses is a disjunction, and the goal is to value all of the variables so that each clause is satisfied. In this particular example, we have to make a and b true, and c false.

Definition 2.1 A variable is a letter, such as a, b, c and so on. A literal is a variable or the negation of a variable. A clause is a disjunction of literals; the (unsatisfiable) empty clause will be denoted ⊥. A theory is a conjunction of clauses. An interpretation or bias is an assignment of true or false to each variable in a theory. An interpretation will be said to be a model of a theory T if every clause in T is satisfied by the interpretation.

We will often call theories sat instances or satisfiability problems, and will think of clauses simply as sets of the literals they contain, writing, for example, that a ∈ a ∨ b.
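Checking whether an interpretation is a model in the sense of Definition 2.1 is mechanical. A minimal sketch (ours), under an assumed encoding in which literals are signed integers and an interpretation is the set of literals it makes true:

```python
def is_model(theory, interpretation):
    """True iff every clause contains at least one satisfied literal.

    theory: iterable of clauses, each a set of nonzero ints, with +v and
    -v encoding a variable and its negation; interpretation: a set
    containing exactly one of +v, -v for each variable.
    """
    return all(any(lit in interpretation for lit in clause) for clause in theory)
```

For instance, a theory matching the example's description, encoded as [{1}, {-1, 2}, {-2, -3}], has the model {1, 2, -3} (a and b true, c false) but is not satisfied by {1, 2, 3}.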
We will similarly think of theories as sets of clauses, and will think of the size of a theory T as simply the number of clauses in T, denoting it by |T|.²

Modern satisfiability algorithms work by manipulating partial assignments of values to variables; the idea is to either extend the partial assignment to value more variables (and eventually to a model of the theory in question) or to show that a particular partial assignment cannot be extended to a model.

Definition 2.2 A partial assignment is a sequence P = ⟨l1, l2, . . . , ln⟩ of literals such that if i ≠ j, then li ≠ lj and li ≠ ¬lj. A variable v will be called valued by P if v ∈ P or ¬v ∈ P. A literal l will be called satisfied by P if l ∈ P, and will be called falsified by P if ¬l ∈ P. A clause c will be called falsified by P if every literal l ∈ c is falsified by P.

In practice, the variables assigned values in a partial assignment are typically annotated with reasons of some sort. A variable value may be the result of a choice made by the search engine, or the obvious result of previous choices:

Definition 2.3 An annotated partial assignment is a sequence ⟨(l1, c1), (l2, c2), . . . , (ln, cn)⟩ of pairs, where each pair contains a literal li and a clause ci. The clause ci will be called the reason for li, and the set {c1, . . . , cn} \ {⊤} will be denoted R(P) and called the reasons of the annotated partial assignment.

Given an annotated partial assignment P, the ith pair (l, c) ∈ P will be called justified if either c = ⊤, or l ∈ c and every literal in c other than l is ¬lj for some j < i. An annotated partial assignment P will be called justified if every (l, c) ∈ P is justified.

Note that a justified partial assignment can always contain pairs of the form (l, ⊤), since the justification condition is trivially satisfied. We will use this to indicate that l is a choice made by the search algorithm:

Definition 2.4 If P is an annotated partial assignment and (l, ⊤) ∈ P, we will say that l is a choice in P.
The set of choices in P will be denoted by C(P). The set of negations of choices in P will be denoted by ¬C(P).

Similarly, a justified partial assignment can always contain pairs of the form (l, l), which we will use to indicate that l itself has been proven to be a consequence of the theory being investigated. In general, all of the partial assignments that we consider will be justified.

Definition 2.5 Let P be an annotated partial assignment. Then if (l, c) is the ith element of P, we will say that the assertion level of l is

    |{j ≤ i : cj = ⊤}|

For a clause α, we will say that the assertion level of α is the maximum of the assertion levels of the literals in α.

In other words, the assertion level of a literal is the number of choice variables that precede it in P.

Definition 2.6 Let P be an annotated partial assignment for a theory T. A clause α = l1 ∨ · · · ∨ ln such that no literal in α is satisfied by P, and exactly one literal is unvalued by P, will be called unit.

If a single literal l ∈ α is unvalued by P for some unsatisfied clause α, then any extension of P that is a model of T must include l, and P will remain a justified partial assignment if we add (l, α) to its end. This is called unit propagation:

Definition 2.7 Unit propagation is a procedure unit that accepts as input a theory T and a justified partial assignment P. It returns a pair (P′, c) where P′ is a maximal justified partial assignment that extends P, so that P ⊆ P′, and either c = true or c ∈ T is falsified by P′.

If unit propagation returns c = true, unit propagation succeeded. If it returns a partial assignment P′ and a clause c falsified by P′, a contradiction has been found and c is the reason.

² From a complexity perspective, it might seem more natural to think of the size of a theory T as the total number of literals in the clauses in T. Our choice is a bit more convenient notationally, and for a theory involving v variables, the total number of literals in T is obviously at most v|T|.
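Definitions 2.3 and 2.7 can be made concrete. The sketch below is ours, not any solver's code: literals are signed integers, reason None stands in for ⊤, and clauses are rescanned naively rather than indexed with watched literals:

```python
def is_justified(assignment):
    """Check Definition 2.3: a pair (l, c) with an actual reason c must
    have l in c, with every other literal of c falsified earlier in the
    sequence.  A choice (reason None) is trivially justified."""
    seen = set()
    for lit, reason in assignment:
        if reason is not None:
            if lit not in reason or any(-l not in seen for l in reason if l != lit):
                return False
        seen.add(lit)
    return True

def unit_propagate(theory, assignment):
    """Naive unit procedure (Definition 2.7): extend the justified partial
    assignment until no unit clause remains, or return a clause of the
    theory falsified by the extension.  Returns (assignment, conflict)
    with conflict None on success."""
    assigned = {lit for lit, _ in assignment}
    changed = True
    while changed:
        changed = False
        for clause in theory:
            if any(l in assigned for l in clause):
                continue                      # clause already satisfied
            unvalued = [l for l in clause if -l not in assigned]
            if not unvalued:
                return assignment, clause     # clause falsified: conflict
            if len(unvalued) == 1:            # unit clause: forced literal
                assignment.append((unvalued[0], clause))
                assigned.add(unvalued[0])
                changed = True
    return assignment, None
```

A production solver replaces the rescanning loop with watched literals (Moskewicz et al., 2001); what Definition 2.7 specifies is the result, not the efficiency.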
Definition 2.8 We will say that the backtrack level of α is zero if every literal in α has the same assertion level, and the second greatest of the assertion levels of literals in α otherwise. Given an annotated partial assignment P and a clause α, we define the result of backtracking to α, to be denoted backtrack(P, α), to be the result of removing from P any literal with assertion level greater than the backtrack level of α.

If α is a clause with backtrack level n ≠ 0, there is clearly some literal l ∈ α such that the assertion level of l is greater than n. If this literal l is unique, then backtracking to the backtrack level of α will leave every element of α other than l unchanged, so that α itself (which was a contradiction prior to the backtrack) will be a unit clause after the backtrack is complete.

Most sat implementations spend the bulk of their time in the unit procedure; this is a consequence of the fact that they must search the theory T for unit clauses. Moskewicz et al.'s (2001) introduction of so-called watched literals in zChaff sped this process considerably, but it continues to consume the bulk of the computational resources in implemented systems.

Eventually, unit propagation either terminates without finding a contradiction (presumably because no unit clauses remain in T), or returns a clause c ∈ T that is falsified by the current justified partial assignment P. In the latter case, the satisfiability engine needs to respond to this new information and modify P in some way. Dpll, for example, would modify P simply by returning to the most recent choice point and then resuming the search.

Cdcl implementations work differently. They take P and c and produce a new clause that is entailed by T and that captures, as best possible, the reason for what went wrong. As an example, suppose that P gives α = ¬a ∨ ¬b ∨ c as the reason for c (so that P contains a, b and therefore c), and that β = ¬a ∨ ¬c appears in T.
This clause is falsified by P. We can now resolve α and β to get the new clause ¬a ∨ ¬b, which is also falsified by P and suggests (correctly) that either a or b is the source of the problem even if many other variable choices appear subsequently in P itself. This may enable a much deeper backtrack than simply returning to the most recent choice point.

Definition 2.9 Let P be a justified partial assignment, and c a clause falsified by P. By a (P, c)-conflict we will mean any clause γ that is a logical consequence of c together with the reasons in P and that is also falsified by P, so that {c} ∪ R(P) |= γ and γ is falsified by P.

As shown above, once unit propagation identifies a contradiction, we can use resolution to derive a (P, c)-conflict, where P is the current partial assignment and c is the conflicting clause in T. But we can also sometimes do more. Consider the following theory:

    ¬a ∨ b                  (1)
    ¬b ∨ ¬e ∨ c             (2)
    ¬a ∨ d                  (3)
    ¬d ∨ ¬e ∨ ¬f ∨ ¬c       (4)

and imagine that after e and f are true, we choose to make a true as well. Now unit propagation allows us to conclude b and then c from b (and e), and also d and then ¬c from d (and e and f). Now a single resolution of (2) (the reason for c) and (4) (the reason for ¬c) will allow us to conclude

    ¬b ∨ ¬d ∨ ¬e ∨ ¬f       (5)

But we can go further, resolving (5) with the reason (1) for b to get ¬a ∨ ¬d ∨ ¬e ∨ ¬f, and then also resolving with the reason (3) for d to get simply ¬a ∨ ¬e ∨ ¬f. This last conclusion correctly identifies a itself as the source of the problem once e and f are both true.

We see from this example that a single (P, c) pair may have many conflicts, and we will generally need to identify a single such conflict to continue the search. In general, we will want to select a conflict that has a single literal at its assertion level. That will allow the algorithm to redraw the conclusion in question after backtracking to the backtrack level of the conflict. Continuing with our example, suppose that the levels of the choice literals involved were 1 for e and f and 4 for the choice literal a.
As we explained, all of the following are possible learned clauses in this case:

    clause               assertion level   literals at assertion level   backtrack level
    ¬b ∨ ¬d ∨ ¬e ∨ ¬f    4                 ¬b, ¬d                        1
    ¬a ∨ ¬d ∨ ¬e ∨ ¬f    4                 ¬a, ¬d                        1
    ¬a ∨ ¬e ∨ ¬f         4                 ¬a                            1

It is only if we learn the last of these clauses that we will be able to draw a unit conclusion (¬a) after backtracking to level one.

Definition 2.10 Given a justified partial assignment P and a clause γ, let n be the assertion level of γ. Then γ will be called a unique implication point, or uip, if it has a unique literal whose assertion level is n.

Uips were introduced in grasp (Marques-Silva & Sakallah, 1999), and experience has shown that part of the strategy for selecting a conflict clause should be to always select a uip. This can be done by resolving away reasons for any multiple literals at the assertion level of γ.

We can now describe a cdcl-based inference engine. As we do so, recall from Definition 2.1 that a bias is a complete assignment of values to variables. If B is a bias and v a variable, we will denote by B(v) the value specified by the bias for v, so that B(v) ∈ B and B(v) = v or B(v) = ¬v. We also define:

Definition 2.11 For a bias B and set of literals S, we define the result of restricting B to S, to be denoted B|S, to be

    B|S = (B \ {¬l | l ∈ S}) ∪ S

Informally, the result of restricting a bias is to label all of the literals in S as true, removing any negations that appeared in the bias before the restriction was performed. The bias is intended to capture our current guess as to the most likely values in a model of T. So when we extend our partial assignment to take a value at a new variable, we choose the variable and then use the value suggested by the bias. A specific example of this idea was introduced by Pipatsrisawat and Darwiche (2007) in rsat and was called phase saving in that work.³
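The derivation in the example above is ordinary resolution and can be replayed mechanically. A sketch (ours), assuming the four theory clauses read ¬a ∨ b, ¬b ∨ ¬e ∨ c, ¬a ∨ d and ¬d ∨ ¬e ∨ ¬f ∨ ¬c, as their use as reasons indicates, encoded with a..f as 1..6 and negative integers for negations:

```python
def resolve(c1, c2, var):
    """Resolvent of clauses c1 and c2 on var, assuming var in c1 and -var in c2."""
    return (c1 - {var}) | (c2 - {-var})

# The four clauses of the running example.
r1 = frozenset({-1, 2})           # (1) ¬a ∨ b, the reason for b
r2 = frozenset({-2, -5, 3})       # (2) ¬b ∨ ¬e ∨ c, the reason for c
r3 = frozenset({-1, 4})           # (3) ¬a ∨ d, the reason for d
r4 = frozenset({-4, -5, -6, -3})  # (4) ¬d ∨ ¬e ∨ ¬f ∨ ¬c, the reason for ¬c

c5 = resolve(r2, r4, 3)  # (5) ¬b ∨ ¬d ∨ ¬e ∨ ¬f
c6 = resolve(r1, c5, 2)  # ¬a ∨ ¬d ∨ ¬e ∨ ¬f
c7 = resolve(r3, c6, 4)  # ¬a ∨ ¬e ∨ ¬f
```

Each resolvent is falsified by the running partial assignment, so all three are legitimate conflicts; only the last has a single literal (¬a) at assertion level 4.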
³ A similar idea was introduced slightly earlier by Selmann and Ansótegui (2006); earlier still, Beck (2005) had introduced the idea of saving some preferred values to use in the next portion of the search.

Procedure 2.12 (cdcl) Let T be a theory, Γ a set of clauses such that T |= Γ, and B a bias for T. Then to compute cdcl(T, Γ, B, n), one of: ⊥ if T is shown to be unsatisfiable, a model P of T if one is found, or unknown if no solution is found after n backtracks:

     1  P ← ⟨⟩; i ← 0;
     2  while i < n do
     3      (P, c) ← unit(T ∪ Γ, P);
     4      while c = true do
     5          if P assigns a value to every variable in T then return P;
     6          v ← a variable unvalued by P;                        // choice
     7          (P, c) ← unit(T ∪ Γ, ⟨P, (B(v), ⊤)⟩);
     8      γ ← any (P, c)-conflict that is a uip;                   // choice
     9      Γ ← Γ ∪ {γ};
    10      B ← B|P;
    11      if ⊥ ∈ Γ then return ⊥;
    12      P ← backtrack(P, γ);
    13      Γ ← Γ \ discard(Γ, P);                                   // choice
    14      i ← i + 1;
    15  return unknown

This algorithm is hardly original with us, but let us give an explanation in any event. As the algorithm proceeds, P is the justified partial assignment being considered and Γ is the collection of learned clauses. The algorithm works by extending the partial assignment P until either a solution is found (line 5) or a contradiction is encountered (line 7 with c ≠ true). In the latter case, we identify a new conflict-driven uip γ (line 8), add it to Γ, backtrack until γ is unit, and repeat. We update the bias as the search proceeds; we present in line 10 the rsat choice of updating the bias every time a variable is set by unit propagation. As we will see in Section 3, other choices are also possible.⁴

After the new clause is discovered and added to Γ on line 9, we discard some elements of Γ (line 13); it is here that we can potentially shrink Γ so that only a relatively small (and potentially polynomial) number of clauses is retained. Then we either give up if we have encountered n backtracks, or continue searching.

Recall that for an annotated partial assignment P, R(P) is the set of reasons for literals in P.
If (P, c) is a value such as is returned by unit propagation, we will abuse notation somewhat and use R(P, c) to denote simply R(P). We now have:

Lemma 2.13 Suppose that R(unit(T ∪ Γ, P)) ∩ discard(Γ, P) = ∅ for all Γ and P, so that we never discard reasons or unit clauses that are about to become reasons. Then throughout the execution of Procedure 2.12, we will have T |= Γ and P will be a justified partial assignment of T ∪ Γ. In addition:

1. If Procedure 2.12 returns ⊥, then T is unsatisfiable. If Procedure 2.12 returns P for some partial assignment P, then the literals in P are a model of T.

2. For a theory with v variables and n ≥ 2^v, Procedure 2.12 will not return unknown, independent of the further nature of discard.

⁴ The attentive reader may wonder why the test on line 11 does not appear until after γ is added to Γ on line 9 and the bias is updated on line 10. The reason is that flex may perform additional inference at this point.

Proof. We first show the invariants mentioned above. That P is a justified partial assignment follows from the fact that it is returned by unit; any justified partial assignment will obviously remain so during a backtrack such as appears on line 12. That T |= Γ follows from the fact that T entails any (P, c)-conflict if P is a justified partial assignment and c ∈ T.

Now consider the various points at which Procedure 2.12 returns. On line 5, P is a model of T. And on line 11, unit propagation and resolution have collectively derived ⊥ as a (P, c)-conflict, so that T |= ⊥ and T is unsatisfiable.

That Procedure 2.12 does not return unknown is a bit more subtle. To see this, suppose that we have a partial assignment P. Now we can associate to P a sequence of positive integers ⟨n0, n1, . . . , nk⟩, where nj is the number of literals in P with assertion level j. We will call this sequence the size of P and denote it by size(P). Now let n = ⟨n0, n1, . . . , nk⟩ and m = ⟨m0, m1, . . . , ml⟩ be two sizes.
We will say that m is smaller than n, writing m < n, if one of the following two conditions holds:

1. There is some j with mj ≠ nj and, for the first such j, mj > nj.

2. mi = ni for all i ≤ k, and there is a j with k < j ≤ l and mj > 1.

Informally, if the size of a partial assignment gets smaller, we have made progress in that we have derived more consequences earlier in the search tree, and the unexplored region of the search space has therefore gotten smaller. We formalize this via the following two claims:

1. Denote by Pi the partial assignment after line 3 in Procedure 2.12 for a given value of i. Then size(Pi+1) < size(Pi).

2. If s1 > s2 > · · · > sz is a properly descending sequence of sizes, then z ≤ 2^v, where v is the number of variables in the theory in question.

Note that these two results suffice to complete the proof, since it will follow that the loop in Procedure 2.12 can be repeated no more than 2^v times.

For the first claim, note that the new clause γ is unit after the backtrack on line 12. Since it is unit, it will be a reason in unit(T ∪ Γ, P) and therefore cannot be discarded by discard. Now let j be the backtrack level of γ. It is immediate that after the unit propagation on line 3, there will be at least one new variable assigned a value at level j.

Now suppose that size(Pi) = ⟨n0, n1, . . . , nk⟩ and size(Pi+1) = ⟨m0, m1, . . . , mj⟩. (The number of assertion levels in Pi+1 is necessarily equal to j.) Pi and Pi+1 agree before the point of the backtrack, so mh = nh for h < j. Now if j ≤ k (so that the backtrack brought us into Pi), then mj > nj and size(Pi+1) < size(Pi). If, on the other hand, j > k, then the sizes agree up to k and mj > 1 because the conclusion of γ is a unit consequence at this level. This proves the first claim.

For the second, note first that the ordering conditions are unchanged if we append to the end of any size ⟨n0, n1, . . . , nk⟩ enough 1s so that Σi ni = v.
This essentially pretends that we have added values for all subsequent variables, but there have been no unit propagations from those variables. If we write n̄ for the result of extending a sequence in this way, it is easy to see that n < m if and only if n̄ < m̄.

Now let n be a size. We claim that any descending sequence of length at least 2^(v−n0) steps beginning with n and ending in a size m will have m0 > n0. The proof is by induction on v − n0. If v − n0 = 1, then we have n0 = v − 1 and must have n̄ = ⟨v − 1, 1⟩. But now the only way to make it smaller at all is to have m = ⟨v⟩, and m0 > n0.

For the inductive step, suppose that n̄ = ⟨n0, 1, . . .⟩. Now after at most 2^(v−n0−1) steps, we will have gotten to ⟨n0, 2, . . .⟩ by the inductive hypothesis. After at most 2^(v−n0−2) further steps, we will have gotten to ⟨n0, 3, . . .⟩, and so on. We will therefore arrive at ⟨n0, v − n0⟩ after at most

    Σ_{i=1}^{v−n0−1} 2^(v−n0−i) = 2^(v−n0) − 2

steps. One more step, for a total of 2^(v−n0) − 1, causes n0 to increment. 2^(v−n0) steps thus cause n0 to increment as well. This completes the induction.

Now imagine that we add a new variable w to our theory, where w appears in no clauses but is evaluated first when the search begins. For our new theory with v + 1 variables, given that the size of any partial assignment must have its first component incremented after at most 2^((v+1)−1) steps, we will either have learned something about w (an impossibility) or derived the empty clause ⊥. Thus the second claim follows and the lemma is proved.

This proof is very similar in spirit to the proof of Theorem 3.10, our main theoretical result. In general, Lemma 2.13 appears to be common knowledge in the sat community, although I am unaware of any other published proof of the second claim and it is not completely clear if the sat community knows it to be true.⁵

Until this point, we have not yet included restarts in our algorithm, but doing so is straightforward.
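The ordering on sizes used in the proof of Lemma 2.13 is easy to state directly. A sketch (ours), with sizes represented as Python lists of positive integers:

```python
def smaller(m, n):
    """True iff size m is smaller than size n in the proof's ordering.

    Either the sequences first differ at a position where m is larger
    (more literals forced earlier in the search), or m agrees with n
    everywhere n is defined and some added component exceeds 1.
    """
    for mj, nj in zip(m, n):
        if mj != nj:
            return mj > nj          # condition 1: first difference favors m
    # condition 2: m properly extends n with a component greater than 1
    return len(m) > len(n) and any(x > 1 for x in m[len(n):])
```

Condition 1 is the loop; condition 2 is the final line. Appending trailing 1s to either size, as the proof does, changes neither verdict.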
We define:

Definition 2.14 A restart policy is a mapping r : ℕ → ℕ, where ℕ is the set of positive integers.

The restart policy simply gives the number of backtracks permitted as a function of the probe number.

⁵ A similar result, with a similar proof but based on the assumption that no clauses are ever deleted, does appear in Section 4.2 of Zhang's (2003) Ph.D. thesis.

Procedure 2.15 (cdcl with restarts) Let T be a theory and r a restart policy. Then to compute sat(T, r), either the empty clause ⊥ if T is unsatisfiable or a model of T:

     1  Γ ← ∅;
     2  B ← any bias for T;                                          // choice
     3  i ← 1;
     4  while true do
     5      x ← cdcl(T, Γ, B, r(i));
     6      if x ≠ unknown then return x;
     7      i ← i + 1;

2.1 Theoretical Results

We can now show the following:

Proposition 2.16 Let T be a theory involving v variables, and r a restart policy. Then if Procedure 2.15 returns ⊥, T is unsatisfiable. If Procedure 2.15 returns a partial assignment P, then the literals in P are a model of T. In addition:

1. If R(unit(T ∪ Γ, P)) ∩ discard(Γ, P) = ∅ and if there is some i such that r(i) ≥ 2^v, then Procedure 2.15 will always terminate.

2. If discard(Γ, P) = Γ \ R(unit(T ∪ Γ, P)), then |Γ| ≤ v as Procedure 2.15 is executed.

3. If discard(Γ, P) = ∅, then the size of Γ may grow to be exponential in v, independent of the restart policy r.

Proof. The proof of Lemma 2.13 also shows that Procedure 2.15 is correct. Claim 1 of Proposition 2.16 follows immediately from claim 2 of Lemma 2.13. Claim 2 follows from the fact that each variable can have at most one reason in R(P). Claim 3 follows from the fact that the amount of memory used by Procedure 2.15 is linear in the run time if no clauses are ever discarded, and there are known to be many problems for which the shortest resolution proof is of exponential length (Bonet, Pitassi, & Raz, 1997; Haken, 1985, 1995; Krajíček, 1997; Pudlák, 1997; Tseitin, 1970).

Corollary 2.17 There are weakly systematic, polyspace instances of Procedure 2.15. There are strongly systematic, exp-space instances of Procedure 2.15.

Proof.
Take discard(Γ, P) = Γ − R(unit(T ∪ Γ, P)) to get a weakly systematic polyspace instance by virtue of the first two claims of Proposition 2.16. If discard(Γ, P) = ∅, the resulting instance is strongly systematic because any instantiation will eventually run out of new clauses to learn, and may use an exponential amount of memory because of the third claim of Proposition 2.16.

Tinisat is an exp-space, strongly systematic instance. Minisat is a (potentially) polyspace, weakly systematic instance.

2.2 Practical Concerns

In addition to the selection of restart policy, there are three further points where an explicit choice must be made in Procedure 2.12, and one where a choice must be made in Procedure 2.15. In line 6 of Procedure 2.12, a variable is selected for addition to P. In line 8, a particular conflict clause is selected for addition to Γ. In line 13, the set Γ is reduced, presumably in order to conserve memory (and thereby speed unit propagation). And in line 2 of Procedure 2.15, an initial bias is selected.

Given that much work on sat currently focuses on one implementation of Procedure 2.15 (and therefore of Procedure 2.12) or another, there is a considerable volume of literature on each of these choices. We will summarize that work here, providing additional details in areas where our ideas will constrain the allowed options.

Variable and bias selection Variables are generally selected in a cdcl prover using a heuristic called vsids ("variable state independent decaying sum") that was introduced in zChaff (Moskewicz et al., 2001). We continue to use vsids in our work and do not describe it further here. We also need to specify an initial bias, before search or choice has valued any of the variables. There is no real reason to prefer one initial bias over another, especially given that the values are likely to be overwritten as the search moves forward.
We will follow minisat's example and set every variable to false in our initial bias.6

Conflict clause selection We have already remarked both that a particular (P, c) pair may have many associated conflicts, and that it is generally wise to select one that is a uip. That will allow the algorithm to redraw the conclusion in question after backtracking to the backtrack level of the conflict, and is central to the proof of Lemma 2.13. But we can sometimes do more. Recall that in our running example, we derived the uip ¬a ∨ ¬e ∨ ¬f. Suppose also that the actual choice literal at level one was g, with e and f following from ¬g ∨ e and ¬g ∨ f respectively. Now we can continue the resolution process, using ¬g ∨ ¬a as the learned clause instead of ¬a ∨ ¬e ∨ ¬f. Should we do so?

Definition 2.18 A uip γ will be called a decision uip, or duip, if C(P) |= ¬γ, so that every literal in γ is the negation of a choice in P.

Proposition 2.19 Let P be a justified partial assignment and γ a clause falsified by P. Then (P, γ) has a unique duip.

Proof. The only way to construct a duip is to resolve literals away until only decisions remain. It is not hard to see that the set of decisions so obtained is independent of the order of such resolutions.

Should we always work with a duip as our algorithm proceeds? Surprisingly, the answer appears to be no; the extra resolutions seem to move the conflict away from the real problem more often than not. But one thing we should do is to remove any literals that are simply unnecessary. So if in our running example the reason for f is ¬e ∨ f, it is obviously wise to resolve this into the conflict clause being learned, replacing ¬a ∨ ¬e ∨ ¬f with the properly stronger ¬a ∨ ¬e.

6. On naturally occurring problems, the effectiveness of biasing every variable to be false is perhaps rooted in the so-called closed world assumption (Clark, 1978; Reiter, 1978), but that will not concern us here.

Zhang et al.
define a first uip as follows.7 The basic idea is that we want to define a first uip as a uip that is both minimal and as close to the eventual conflict as possible.

Definition 2.20 For an annotated partial assignment P and conflict c, a (P, c)-conflict γ that is a uip will be called reduced if no proper subset of γ is a (P, c)-conflict.

By "close to the eventual conflict," Zhang et al. actually mean that the uip in question is as late in the conflict graph as possible:

Definition 2.21 Let P be an annotated partial assignment and suppose that γ1 and γ2 are clauses falsified by P. We will write γ1
The antecedent of γ, to be denoted γ−, is γ− = ¬(γ − γ+).
The conclusion of the learned clause γ is the disjunction of the maximal literals in γ, and the antecedent is the negation of what's left. Again, an example may help. Suppose that γ = a ∨ b ∨ c ∨ d and the partial order has a < c and b < d. Now c and d are the maximal elements of γ, so the conclusion γ+ is the disjunction c ∨ d. The antecedent is
γ− = ¬({a, b, c, d} − {c, d}) = ¬(a ∨ b) = ¬a ∧ ¬b
as one would expect. The learned clause has effectively been rewritten as ¬a ∧ ¬b → c ∨ d.
We have been cavalier in our use of notation, switching freely between representing γ as a disjunction and as a set.
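To make the decomposition concrete, here is a small sketch using our own illustrative encoding (literals as strings, with "-x" for ¬x; this is not the paper's implementation). It reproduces the example above: under a < c and b < d, the clause a ∨ b ∨ c ∨ d splits into conclusion c ∨ d and antecedent ¬a ∧ ¬b:

```python
# Split a clause (a set of literals) into its conclusion, the maximal
# literals under a partial order on variables, and its antecedent, the
# negated remainder (read conjunctively).
def var(lit):
    return lit.lstrip("-")

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def split(clause, order):
    """order is a set of (x, y) pairs meaning x < y; assumed transitive."""
    maximal = {l for l in clause
               if not any((var(l), var(m)) in order for m in clause)}
    conclusion = maximal
    antecedent = {neg(l) for l in clause - maximal}  # conjunction of literals
    return conclusion, antecedent

gamma = {"a", "b", "c", "d"}
order = {("a", "c"), ("b", "d")}
conc, ante = split(gamma, order)
# conc == {"c", "d"} and ante == {"-a", "-b"}, matching the worked example
```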
Definition 3.5 If Γ is a set of learned clauses and < a partial order, we will denote by Γ− the conjunction of the antecedents γ− for all γ ∈ Γ. We will denote by Γ+ the conjunction of the conclusions γ+ for all γ ∈ Γ.
We are finally in a position to describe the procedure by which we add a new clause γ to Γ. In Procedure 2.12, this was via the simple Γ ← Γ ∪ {γ} on line 9, but here it is a bit more complex. The addition of γ to Γ can also both modify the partial order and replace γ with a lexicographically stronger clause in some circumstances:
Procedure 3.6 To compute add(γ, Γ, <):
1 if γ = ∅ then return (γ, Γ ∪ {∅}, <);
2 select l ∈ γ; // l is the intended conclusion
3 if there is a ρ ∈ Γ with ρ+ = ¬l then
4   return add(resolve(γ, ρ), Γ, <)
5 add x < l to < for each x ∈ γ − {l};
7 remove from Γ any c with either c− ∩ {l, ¬l} ≠ ∅ or both c ∩ {x | l < x} ≠ ∅ and |c+| > 1;
8 return (γ, Γ ∪ {γ}, <)
Line 1 handles the edge case where we have actually derived a contradiction. Line 4 calls for a resolution if both l and ¬l appear as conclusions of clauses in Γ, and line 7 says that clauses should be deleted from Γ if they include either l or ¬l in their antecedent, or if they have ambiguous conclusions based on variables that follow l in the partial order. This is in keeping with our overall lexicographic approach: Once we make progress on a particular literal l, we can forget everything about following literals in the partial order, removing both their relative orderings (as per the weakening on line 6 of Procedure 3.6) and clauses that contain them (as per the removal on line 7).
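The deletion test on line 7 can be phrased directly in code. The sketch below is our reading of that test, not the paper's implementation; each learned clause is held as an (antecedent, conclusion) pair of literal sets, with "-x" denoting ¬x:

```python
# Hedged sketch of the clause-removal test on line 7 of Procedure 3.6:
# drop a clause if its antecedent mentions l or its negation, or if the
# clause mentions a variable above l while its conclusion is ambiguous.
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def should_remove(antecedent, conclusion, l, above):
    """above = the set of variables strictly greater than l's variable."""
    mentions_l = antecedent & {l, neg(l)}
    vars_in_clause = {lit.lstrip("-") for lit in antecedent | conclusion}
    ambiguous = vars_in_clause & above and len(conclusion) > 1
    return bool(mentions_l) or bool(ambiguous)
```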
Proposition 3.7 Suppose that Γ contains n clauses involving v variables. Then Procedure 3.6 can be executed in O(v³ + nv²) time.
Proof. Except for the recursion, the most expensive steps are line 6, which can potentially take time O(v²), and line 7, which can take time O(nv). This assumes that we can determine whether x < y in time O(1), but whatever the data structure used to retain the partial order, we can always compute the entire partial order at each iteration of the loop, which will take time at most O(v²). The total time taken without the recursion is thus bounded by O(v² + nv).
The maximum number of recursive calls is O(v), since each variable will be resolved upon at most once.
Just as the O(v²) unit propagation procedure remains practical on problems containing millions of variables, so we expect the add procedure of Procedure 3.6 to do so as well. (And for roughly the same reasons: clauses are not of length v, literals generally appear in only a relative handful of clauses, and so on.)
Procedure 3.6 is the primary difference between a standard cdcl algorithm and flex; we will replace the act of adding γ to Γ with the more involved Procedure 3.6. When we do so, note that Procedure 3.6 may actually derive the empty clause in line 4; we then return a theory containing the empty clause on line 1, leading to a failure on line 11 of Procedure 2.12. We have:
Procedure 3.8 (Pure FLEX) Let T be a theory, Γ a set of clauses such that T |= Γ, < a partial order on the variables in T, and B a bias for T. Then to compute flex(T, Γ, <, B, n), one of: ∅ if T is shown to be unsatisfiable, a model P of T if one is found, or unknown if no solution is found after n steps:
1 P ← ∅; i ← 0;
2 while i < n do
3   (P, c) ← unit(T ∪ Γ, P);
4   while c = true do
5     if P assigns a value to every variable in T then return P;
6     v ← a variable unvalued by P;
7     (P, c) ← unit(T ∪ Γ, P, (B(v), ∅));
8   γ ← any (P, c)-conflict that is a uip with B |= ¬γ;
9   (γ, Γ, <) ← add(γ, Γ, <);
10  B ← B modified so that B |= γ+;
11  if ∅ ∈ Γ then return ∅;
12  P ← backtrack(P, γ);
13  i ← i + 1;
14 return unknown
Note that we require the selection of a buip on line 8 of Procedure 3.8 (since we require B |= ¬γ), and that we adjust the bias only to cater to the conclusion of the new clause γ on line 10. Both of these observations will be important as we discuss practical implementation of our methods. The only modification needed to Procedure 2.15 is to initialize and then use the partial order <:
Procedure 3.9 (Pure FLEX with restarts) Let T be a theory and r a restart policy. Then to compute sat(T, n), either the empty clause ∅ if T is unsatisfiable or a model of T:
1 Γ ← ∅;
2 < ← the empty partial order;
3 B ← any bias for T;
4 i ← 1;
5 while true do
6   x ← flex(T, Γ, <, B, r(i));
7   if x ≠ unknown then return x;
8   i ← i + 1;
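The restart drivers above (Procedures 2.15 and 3.9) share a simple shape, which the following minimal Python sketch illustrates; the names are ours and the `engine` argument is a stub standing in for the underlying solver, not the paper's code:

```python
# A restart policy maps probe numbers to backtrack limits (Definition 2.14).
# The driver simply reinvokes the underlying engine, keeping the learned
# clauses and bias across probes, until something other than "unknown"
# (a model, or the empty clause) is returned.
def sat_with_restarts(theory, engine, restart_policy):
    learned = set()   # clauses retained across probes
    bias = {}         # initial bias; a free choice in the paper
    i = 1
    while True:
        x = engine(theory, learned, bias, restart_policy(i))
        if x != "unknown":
            return x
        i += 1

# One possible policy: allow i backtracks on probe i.
linear_policy = lambda i: i
```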
We finally have:
Theorem 3.10 Suppose that T is a theory with n clauses involving v variables. Then:
1. Each loop of Procedure 3.8 can be executed in time O(v³ + nv²),
2. |Γ| ≤ v as Procedure 3.9 is executed, and
3. For any n with Σ_{i=0}^{n} r(i) ≥ 2^v, Procedure 3.9 will finish before completing iteration n.
Corollary 3.11 Procedure 3.9 is strongly systematic and polyspace.
The proof of Theorem 3.10 is lengthy and appears in the appendix.
Note that our contribution here is not to show that it is possible to ensure systematicity by retaining a sufficient number of learned clauses. Tinisat has already shown this, and Lecoutre et al. (2007) have shown that retaining a single nogood per restart can suffice. But in this other work, the number of restarts grows polynomially with the number of backtracks, and therefore potentially exponentially with problem size. All of these earlier results thus may require keeping an exponential number of learned clauses; our contribution is to show that a polynomial number of nogoods can suffice.
3.2 Hybrid FLEX
While Procedures 3.8 and 3.9 achieve the basic theoretical goal set in this paper, a direct implementation of these procedures as written encounters two practical difficulties.
First, the add procedure is expensive. Executing it at every backtrack introduces an overall computational burden greater than that of unit propagation.
Second, the search in Procedure 3.8 is guided by the bias B, and that bias is modified not to match every unit propagation, but only to match the conclusions of newly learned clauses. Experimentation shows that the rsat bias, matching as it does the result of every unit propagation, is more effective in practice. We cannot value newly selected variables as in rsat, since if the bias doesn't support the nogood, there may be no buip on line 8 of Procedure 3.8. In fact, experimentation on number factoring problems shows that approximately 40% of the time, none of the clauses learned in any particular probe can be resolved to a buip.
Nevertheless, there is a straightforward way to deal with these issues and to still preserve the intuition underlying our approach. In each probe, we follow our bias only until a new clause is learned. That new clause will have a buip (because the bias was followed in constructing it), and we then incorporate that new clause into Γ as usual. We then complete the probe using the rsat bias. By doing this, we follow the rsat bias almost all of the time and only call add once per probe, eliminating the substantial computational overhead that this procedure might introduce.
We will refer to this compound procedure as hybrid flex, or simply flex. In order to implement it, we need to maintain two biases, which we refer to as B (for the flex component) and R (for the rsat component), and two sets of learned clauses, which we refer to as Γ (for the flex component) and Φ (for the conventional component). We finally have:
Procedure 3.12 (FLEX) Let T be a theory, Γ and Φ sets of clauses such that T |= Γ ∪ Φ, < a partial order on the variables in T, and B and R biases for T. Then to compute flex(T, Γ, <, B, Φ, R, n), one of: ∅ if T is shown to be unsatisfiable, a model P of T if one is found, or unknown if no solution is found after n steps:
1 P ← ∅; i ← 0;
2 while i < n do
3   (P, c) ← unit(T ∪ Γ ∪ Φ, P);
4   while c = true do
5     if P assigns a value to every variable in T then return P;
6     v ← a variable unvalued by P;
7     if i = 0 then l ← B(v);
8     else l ← R(v);
9     (P, c) ← unit(T ∪ Γ ∪ Φ, P, (l, ∅));
10  if i = 0 then
11    γ ← any (P, c)-conflict that is a uip with B |= ¬γ;
12    (γ, Γ, <) ← add(γ, Γ, <);
13  else
14    γ ← any (P, c)-conflict that is a uip;
15    Φ ← Φ ∪ {γ};
17  if ∅ ∈ Γ ∪ Φ then return ∅;
18  P ← backtrack(P, γ);
19  Φ ← Φ − discard(Φ);
20  i ← i + 1;
21 return unknown
Procedure 3.13 (FLEX with restarts) Let T be a theory and r a restart policy. Then to compute sat(T, n), either the empty clause ∅ if T is unsatisfiable or a model of T:
1 Γ ← Φ ← ∅;
2 < ← the empty partial order;
3 B ← R ← any bias for T;
4 i ← 1;
5 while true do
6   x ← flex(T, Γ, <, B, Φ, R, r(i));
7   if x ≠ unknown then return x;
8   i ← i + 1;
As in Theorem 3.10 and Corollary 3.11, we have:
Corollary 3.14 Procedure 3.13 is strongly systematic and polyspace for any discard procedure that ensures that Φ is polyspace.
4. Experimental Results
Before we present experimental results based on our methods, let us describe the setting in which these experiments were performed.

First, we expect the value of our results to vary with problem size. We have one major advantage over existing methods, in that the strong systematicity of our approach will cause probes late in the search to examine areas of the search space that were unexplored earlier. This advantage is also a drawback, however. Especially early in the search, where the probes are small, the rsat bias is known to be extremely effective and we are likely to see a performance degradation from the fact that we cannot follow this bias on the first set of choices in any particular probe. So for easy problems, we would expect our methods to perform worse than conventional ones because of our inability to begin probes by following the rsat bias. For difficult problems, we would expect our methods to perform better than conventional ones, as we manage to reach new areas of the search space while older methods are reexamining previously explored regions. What we don't know is where the transition will be between these two general problem types.

In order to evaluate this, we used two separate sets of problems in our experiments. The first was the set of 1030 problems in the sat 2013 benchmarks. These problems were solved with a 5000 second cutoff in order to ensure that all processes terminated. The second set of problems used were number factoring problems (Bebel & Yuen, 2013). Although subexponential algorithms for number factoring are known to exist (Buhler, Lenstra, & Pomerance, 1993, and a great deal of subsequent work), there is no reason to believe that the techniques involved will bear significantly on the sat encodings of such problems.
One advantage of using number factoring is that problems of gradually increasing difficulty can be run to completion in every case, avoiding the fact that a simple timeout is less informative regarding overall computational efficacy than is the time needed to actually solve a problem. Problems of gradually increasing difficulty can be generated by simply increasing the magnitudes of the numbers being factored.

The number factoring problems were generated by using an iterated Miller-Rabin test to generate pairs of primes and then using the Purdom/Sabry (2005) sat generator to produce a sat encoding of the problem of factoring the product of these primes. In order to avoid situations where the factoring problem might be too easy, the prime factors were selected so that the sizes of their binary representations were within three bits of one another. For any given problem size (number of bits in the number being factored), fifty instances were generated randomly.

As a baseline solver, we used minisat 2.2 (Sörensson & Eén, 2005). But there were in fact three separate sets of changes that we needed to make to produce flex:
1. First, we converted minisat from C to C++ and changed it to use the C++ standard library in many locations instead of the variety of specialized techniques included in minisat. We will refer to this solver as stlsat.
2. Second, flex can be expected to be more efficient as the number of restarts increases. In minisat and similar solvers, the number of backtracks before restarting is not given
Figure 1: minisat vs. stlsat, sat competition problems (CPU time in seconds); best fit y = 1.22x^0.9916
as specified by the original Luby policy (Luby et al., 1993), but instead by multiplying the Luby numbers by some constant (generally taken to be 100, so that the first probe involves 100 backtracks). In order to increase the number of probes, we reduced the multiplicative factor to four (so that the first probe involves only four backtracks). We will refer to this solver as stlsat4.
3. Finally, we modified stlsat to produce flex, also with an initial probe size of four.
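The scaled Luby schedule described in item 2 can be sketched directly. The following is our own illustrative implementation (solvers typically use an equivalent incremental formulation); with scale = 4, the first probe allows four backtracks, as in stlsat4 and flex:

```python
# The Luby sequence is 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ...
# The restart policy multiplies it by a constant backtrack budget.
def luby(i):
    """i-th element of the Luby sequence, 1-indexed."""
    while True:
        k = 1
        while (1 << k) - 1 < i:          # smallest k with 2^k - 1 >= i
            k += 1
        if (1 << k) - 1 == i:            # i ends a complete block
            return 1 << (k - 1)
        i -= (1 << (k - 1)) - 1          # otherwise recurse into the prefix

def restart_policy(i, scale=4):
    return scale * luby(i)
```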
We present results for the sat 2013 benchmarks first, followed by results for number factoring. We compare minisat to stlsat, stlsat to stlsat4, and then stlsat4 to flex. All experiments were run on a 24-core Intel Xeon machine running at 2.2GHz with 256K of L2 cache per processor, 24GB of main memory, and 75GB of swap space. 24 problems were solved at a time, and the swap space was needed to deal with some of the larger sat competition instances, since many were running in parallel.9
The software used in these experiments is available as an online appendix to this paper. stlsat.zip contains the source for stlsat, and flex.zip contains the source for flex. (Stlsat is just flex with the various extensions removed.)
4.1 Sat Competition Problems Figure 1 compares minisat to stlsat for the sat competition problems. The solver that is closer to original minisat will always be on the x-axis, with the modification on the y-axis,
9. We did not evaluate the impact of the use of swap on the running times of larger problems, although it obviously might be substantial. In practice, however, once significant amounts of swap were used, none of the systems being evaluated was able to solve the problem in question. In addition, running the problems serially would simply have been impractical given the available hardware.
Figure 2: stlsat vs. stlsat4, sat competition problems (CPU time in seconds); best fit y = 1.14x^0.9999
with both axes plotted using a log scale and a dashed line giving the y = x baseline. One point is plotted for each problem solved by either of the two methods. A point above x = y means that the old solver is better; a point below x = y, that the new solver is better.
In many cases, we will also give the best polynomial fit of the form y = ax^b for fixed a and b. So in Figure 1, stlsat is seen to be 22% worse than minisat at the outset, but the exponent slightly below one indicates that the disparity is shrinking as the problems become more difficult. (Indeed, this is apparent from looking at the graph itself.)
To find the polynomial fit, we did not use the standard technique of minimizing the squared vertical distance between a point and the line in question. The reason is that if we flipped the axes, we would then be minimizing the squared horizontal distance to the line, and a potentially different fit would be found. To ensure that the same fit was found independent of the axis selection, we minimized the sum of the squares of the actual (perpendicular) distances between the data points and the line in question.
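The perpendicular fit has a closed form once both axes are logged, since y = ax^b becomes a straight line there. The sketch below is our own illustration, not the authors' code; it uses the standard total-least-squares slope, which is symmetric in the two axes:

```python
import math

def perpendicular_power_fit(xs, ys):
    """Fit y = a * x**b, minimizing squared perpendicular distances in
    log-log space so that swapping the axes yields the same line.
    Assumes the logged data are correlated (nonzero cross term)."""
    u = [math.log(x) for x in xs]
    w = [math.log(y) for y in ys]
    n = len(u)
    mu, mw = sum(u) / n, sum(w) / n
    suu = sum((p - mu) ** 2 for p in u)
    sww = sum((q - mw) ** 2 for q in w)
    suw = sum((p - mu) * (q - mw) for p, q in zip(u, w))
    # Closed-form slope of the first principal axis of the centered data.
    b = (sww - suu + math.sqrt((sww - suu) ** 2 + 4 * suw ** 2)) / (2 * suw)
    a = math.exp(mw - b * mu)
    return a, b
```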
In some cases, we will split the data depending on whether the problem instances being solved were or were not satisfiable. This is not an interesting distinction here.
These results are as one might expect. Without the fine-tuned data structures of minisat, stlsat performs slightly worse. This is more significant on easy problems, and the two solvers appear to be virtually identical on more difficult instances.
Figures 2 and 3 show the impact of reducing the number of backtracks per probe. Overall (Figure 2), there is a 14% cost to making the change, and the cost is nearly independent of problem size. Somewhat curiously, however, the cost is significantly different for satisfiable instances (a 22% cost, top of Figure 3) when compared to unsatisfiable ones (a 7% savings, bottom of Figure 3).
Figure 3: stlsat vs. stlsat4, sat competition problems (CPU time in seconds); satisfiable instances only (top; best fit y = 1.22x^1.0043) and unsatisfiable instances only (bottom; best fit y = 0.93x^0.9999)
Figure 4: stlsat4 vs. flex, sat competition problems (CPU time in seconds); best fit y = 3.60x^0.9935
Note that the scatter in these cases is much more significant than that of Figure 1. This is as expected: stlsat is intended to be essentially unchanged from minisat from an algorithmic perspective, while the change in restart frequency can be expected to have fairly dramatic effects. Note also the collection of points with either x = 5000 or y = 5000, indicating that the problem was solved by one method but timed out using the other.
Finally, Figure 4 compares stlsat4 to flex on the sat competition problems, and flex is about a factor of 3.6 slower (4.53 times slower for satisfiable instances; 2.01 times slower for unsatisfiable ones).10 Many more problems time out for flex than for the modified version of minisat.
This is shown clearly in Figure 5, where we compare the number of problems being solved by stlsat4 and by flex on the sat competition problems. For any specific time cutoff, stlsat4 solves about 2.5 times as many problems as does flex.
It does appear to be the case, however, that the performance gap is narrowing as the problems become more difficult; for example, flex solves 36 problems in between 3000 and 5000 seconds; stlsat4 solves 42 problems in that time. It is for this reason that we examined number factoring problems as well.11
10. The graphs for satisfiable and unsatisfiable instances are not shown; they look basically the same as Figure 4. 11. Note that we cannot simply increase the timeout limit for the sat competition problems; doing so caused only a relative handful of new problems to be solved. If we are to generate an interesting number of more difficult problems, and want the instances to be known to be satisfiable, number factoring seems to be the simplest way to go.
Figure 5: stlsat4 vs. flex, sat competition problems (5000 second timeout); problems solved as a function of CPU time (secs)
Figure 6: stlsat vs. stlsat4, factoring problems (CPU time in seconds; all instances satisfiable); best fit y = 1.06x^1.0069
4.2 Number Factoring
For number factoring, we generated 50 factoring problems of each size from five to 45 bits. These problems were then run through stlsat, stlsat4 and flex.
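The generation recipe from earlier in this section (Miller-Rabin prime pairs whose binary representations are within three bits of one another) can be sketched as follows. This is our own illustration; the Purdom/Sabry step that encodes the product into sat is not reproduced here:

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)                 # modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a is a witness: n is composite
    return True

def random_prime(bits):
    """A random prime with its top bit set (exactly `bits` bits)."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def prime_pair(bits):
    """Primes whose binary representations differ in size by at most 3."""
    p = random_prime(bits)
    q = random_prime(bits + random.randint(0, 3))
    return p, q
```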
Figure 7: stlsat4 vs. flex, factoring problems (CPU time in seconds; 5000 second timeout; all instances satisfiable); best fit y = 1.41x^0.9743
The (relatively uninteresting) result of comparing stlsat and stlsat4 is shown in Figure 6. As for the sat competition problems, reducing the size of the first probe to four backtracks leads to a small degradation in performance on satisfiable problems. We include this primarily to ensure that the change in probe size is not responsible for the results in the following figures.

Flex and stlsat4 are compared in Figure 7. Here, we show only problems that one method or the other was able to solve in 5000 seconds or less, although exact times to solution are used for both solvers. The figure also includes dotted lines indicating the 5000 second timeout for either solver. The overall results are similar to those of the previous section, although flex's performance is improved somewhat. On sat competition problems, it was 3.6 times slower overall and 4.53 times slower for satisfiable problems specifically. On number factoring problems, flex is only 41% slower than is stlsat4. The curve fit is being dominated by the large number of easy problems in this case.

In addition, it appears that flex is actually doing better than stlsat4 as the problems become more difficult; many more problems appear to be timing out with stlsat4 (points to the right of x = 5000 seconds in Figure 7) than with flex (points above y = 5000 in the figure). This suggests that instead of looking at the easiest factoring problems, we look at the hardest ones.

This is done in Figure 8, where we consider only problems for which one of the two solvers required at least 5000 seconds to reach a solution. The results are strikingly different. A glance shows that flex is in general solving problems more quickly than is stlsat4; in fact,
Figure 8: stlsat4 vs. flex, factoring problems (CPU time in seconds; 5000 seconds or harder; all instances satisfiable); best fit y = 0.24x^1.0010
Figure 9: stlsat4 vs. flex, factoring problems (CPU time in seconds; 10,000 seconds or harder; all instances satisfiable); best fit y = 0.13x^1.0101
Figure 10: stlsat4 vs. flex, average CPU time (secs) to factor an n-bit number, for n from 5 to 45
the line of best fit shows that flex is about four times as fast as stlsat4. Restricting to problems that took at least 10,000 seconds makes the distinction sharper still; now (Figure 9) flex is eight times faster than is stlsat4.
Similar results appear in Figure 10, which shows the average running time needed by flex or by stlsat to factor a number of n bits. Flex is slower for smaller numbers but faster for larger ones. It is unfortunately impractical to extend this graph much further, since the average runtime for stlsat on a single 45-bit factoring problem is already approximately six hours.
As previously, we show the number of difficult problems (40 bits or more) solved by the solvers as a function of time in Figure 11. With a cutoff of less than 1000 seconds, stlsat4 performs best; as the problems get harder, this is reversed.
5. Conclusion and Future Work
The fundamental claim that we have made in this paper is that the sat community did not need to abandon the systematicity of its methods when it switched from dpll-style provers to cdcl-based ones. We showed that it was possible to simultaneously guarantee systematicity, maintain a polynomially-sized set of learned clauses, and restart the prover as frequently as desired. Furthermore, we showed that while a hybrid between a pure systematic prover and a more typical cdcl engine incurs significant computational cost in achieving these goals, that cost and more is recovered and run time appears to be reduced for difficult problems.
Looking forward, there are two obvious ways in which this work can be extended.
Figure 11: stlsat4 vs. flex, factoring problems (40 bits or larger); problems solved as a function of CPU time (secs)
First, we have seen that the value of our methods increases as the number of restarts increases. One of the principal reasons to limit the number of restarts in a cdcl-based prover is that restarting the prover is expensive. Recent work by Ramos et al. (2011), however, suggests that a significant part of this expense can be avoided. The basic idea is that any particular probe may well begin its search by repeating many of the literal choices and unit propagations from the previous probe. In such a case, there is no reason to backtrack past the point at which the two probes first diverge. Incorporating this idea into our methods should improve their value further by enabling a larger number of restarts and thus making the systematic component that we offer more effective.
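The divergence computation at the heart of this idea is straightforward; as a hedged sketch (the names are ours, not those of Ramos et al.):

```python
# Instead of undoing the whole assignment at a restart, backtrack only to
# the point where the new probe's intended decisions first diverge from
# the decisions of the previous probe.
def divergence_point(old_decisions, new_decisions):
    """Index of the first position where the two decision sequences differ."""
    i = 0
    limit = min(len(old_decisions), len(new_decisions))
    while i < limit and old_decisions[i] == new_decisions[i]:
        i += 1
    return i

# Everything before the divergence point survives the restart.
kept = divergence_point(["a", "-b", "c", "d"], ["a", "-b", "e"])
# kept == 2: the assignments to a and -b need not be undone
```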
Second, we hope that the overall approach that we have proposed (guaranteeing systematicity by retaining a partial order and an appropriate set of learned clauses) finally puts to rest the notion that sat engines achieve systematicity only by careful management of a search space defined by partial assignments. It is instead the learned clausal database that guarantees systematicity through semantic methods.
This general conclusion should lead to a variety of new algorithmic choices. As an example, the values represented by the bias are presumably best thought of as probabilistic estimates regarding the likelihood of any particular literal appearing in a model of the theory in question. This probabilistic approach gives a very different flavor to satisfiability in general, and suggests that recent results from Monte Carlo Tree Search, or mcts (Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, & Colton, 2012; Kocsis & Szepesvári, 2006, and others) may well find a place in sat as well.
Acknowledgments
I would like to thank my Connected Signals and On Time Systems coworkers, especially Aran Clauson and Heidi Dixon, for useful technical advice and assistance. I would also like to thank the anonymous referees for their many carefully considered comments and suggestions, which improved this paper enormously. Finally, I would like to thank David McAllester for many contributions to this work over the years; the work presented here draws greatly from McAllester's original work (1993) on partial-order dynamic backtracking.
Appendix A. Proof of Theorem 3.10
The proof of Theorem 3.10 will depend on certain invariants that are maintained as Procedure 3.8 is executed. We begin by discussing those invariants.
A.1 Learned Clauses

To understand them, suppose that a set of learned clauses Γ has been derived during the course of a systematic search. What conditions would we expect Γ to satisfy?

To answer this, imagine that we have a partial assignment P and that we have derived a new learned clause γ that we are about to add to Γ. We certainly expect P to satisfy the existing learned clauses. We will also assume that P satisfies the antecedents of those learned clauses; this is in keeping with the idea that the learned clauses should continue to be relevant to the portion of the search space being explored.

What about the new clause γ? First, it will turn out that γ should have a unique conclusion under the partial order <. In addition, the new learned clause γ should be new knowledge in the sense that it eliminates a portion of the search space that is still admissible under Γ. In other words, we expect the conclusion γ+ to be distinct from the conclusions Γ+ of other elements of Γ. Formally, we have:
Definition A.1 We will say that a partial order < parses a learned clause γ if |γ+| = 1. We will say that a set of learned clauses Γ is acceptable for a partial order < if the following conditions hold:
1. For any γ ∈ Γ, < parses γ,
2. For any γ1, γ2 ∈ Γ, if γ1+ = γ2+, then γ1 = γ2, and
3. Γ− ∧ Γ+ is satisfiable.
The first condition repeats the requirement that the conclusions γ+ are all individual literals. The second condition says that Γ does not contain two distinct learned clauses with the same conclusion. The third says that it is at least possible to find a partial assignment P as described above, since the collection of conclusions Γ+ is consistent with the antecedents Γ− of those learned clauses.
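As an illustration, the three acceptability conditions can be checked directly. This sketch uses our own representation (each clause an (antecedent, conclusion) pair of literal sets, assumed distinct, with "-x" for ¬x); condition 3, as we reconstruct it, reduces to a check for complementary literals because every conjunct involved is a literal:

```python
# Our sketch of Definition A.1's acceptability test for learned clauses.
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def acceptable(gamma):
    # 1. The order parses every clause: each conclusion is a single literal.
    if any(len(conclusion) != 1 for _, conclusion in gamma):
        return False
    # 2. No two distinct learned clauses share a conclusion.
    conclusions = [next(iter(conclusion)) for _, conclusion in gamma]
    if len(set(conclusions)) != len(conclusions):
        return False
    # 3. The conjunction of all antecedents and conclusions is satisfiable;
    #    since every conjunct is a literal, this holds exactly when no
    #    literal occurs alongside its negation.
    literals = set(conclusions)
    for antecedent, _ in gamma:
        literals |= antecedent
    return all(neg(l) not in literals for l in literals)
```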
Lemma A.2 If Γ is acceptable for a partial order <, then Γ contains at most one learned clause with any particular variable x in its conclusion.
Proof. If Γ contains clauses with both x and ¬x as conclusions, it will be impossible to satisfy both Γ+ and Γ−, contradicting the requirement of acceptability.
Proposition A.3 If Γ is a set of clauses involving v variables and is acceptable for a partial order ≺, then |Γ| ≤ v.
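The conditions of Definition A.1 and the bound of Proposition A.3 can be checked mechanically on small instances. The following Python sketch is ours, not part of flex: the clause encoding, the helper names, and the brute-force consistency test are all illustrative assumptions.

```python
from itertools import product

# Literals are nonzero ints (3 means the variable x3, -3 its negation).
# A clause is a frozenset of literals; prec is a set of (a, b) variable
# pairs meaning "a strictly precedes b" in the partial order.

def maximal_literals(clause, prec):
    """Literals whose variable is below no other variable in the clause."""
    return {l for l in clause
            if not any((abs(l), abs(m)) in prec for m in clause if m != l)}

def conclusion(clause, prec):
    (c,) = maximal_literals(clause, prec)   # requires that prec parse clause
    return c

def antecedent(clause, prec):
    """Negations of the non-conclusion literals."""
    return {-l for l in clause if l != conclusion(clause, prec)}

def satisfiable(clauses, nvars):
    """Brute-force satisfiability test over all 2^nvars assignments."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=nvars))

def acceptable(gamma, prec, nvars):
    """The three conditions of Definition A.1."""
    if any(len(maximal_literals(c, prec)) != 1 for c in gamma):
        return False                                    # condition 1
    if len({conclusion(c, prec) for c in gamma}) != len(gamma):
        return False                                    # condition 2
    units = [frozenset([l]) for c in gamma for l in antecedent(c, prec)]
    return satisfiable(list(gamma) + units, nvars)      # condition 3

# Alphabetic total order on variables 1..4 (a, b, c, d).
prec = {(i, j) for i in range(1, 5) for j in range(1, 5) if i < j}
gamma = [frozenset({-1, -2, 3}), frozenset({-2, 4})]    # a&b -> c, b -> d
assert acceptable(gamma, prec, 4) and len(gamma) <= 4   # Proposition A.3
# A second clause concluding c violates condition 2:
assert not acceptable(gamma + [frozenset({-1, 3})], prec, 4)
```

The exponential `satisfiable` check is only for demonstrating condition 3 on toy problems; a real solver never tests consistency this way.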
A.2 Search Space Size Before considering the general way in which the new clause γ allows us to improve the bias, and the way in which the new clause is incorporated into Γ, let us examine a simple special case where ≺ is in fact a total order on the variables in question. This means that any particular learned clause γ is always interpreted as forcing the value of the single variable it contains that occurs latest under the ordering ≺. Now given a set of learned clauses Γ, we extend P until we either solve the problem or derive a new learned clause γ. Given the bias B, we are assuming that B ⊭ γ so that the learned clause allows us to improve the bias, and we have already remarked that we assume that B is consistent with Γ ∪ Γ←. It follows that ¬γ is consistent with Γ ∪ Γ← as well. If l is the conclusion of γ, how are we to update Γ? Note first that there can be no α ∈ Γ with l as its conclusion. If there were, then we would have Γ ∪ Γ← ⊨ l and therefore Γ ∪ Γ← ⊨ γ also, contradicting the consistency just noted. There are now two cases: Γ may contain a learned clause with conclusion ¬l, or Γ may contain no such learned clause. If Γ does not contain a learned clause with conclusion ¬l, we would like to simply add γ to Γ. But doing so may cause Γ to become unacceptable, since Γ may contain a learned clause with ¬l in its antecedent. If we simply added γ to Γ, the result would be unacceptable with respect to ≺ because it would be impossible to satisfy all of the elements of Γ while simultaneously satisfying their antecedents. As an example, suppose that Γ contains the learned clause ¬a ∨ ¬c ∨ d, the antecedent of which is a ∧ c. If we learn ¬a ∨ ¬c and simply add it to Γ, then Γ ∪ Γ← includes both a ∧ c and ¬a ∨ ¬c, a contradiction. What we can do in this case is to remove from Γ every learned clause that has ¬l in its antecedent; call the result Γ′. We now take add(γ, Γ) to be Γ′ ∪ {γ}. ≺ parses γ and obviously continues to parse all of the learned clauses remaining in Γ′.
Γ′ ∪ {γ} is therefore acceptable with respect to ≺, since ¬l can no longer appear in the antecedent of any learned clause in Γ′. Given that we have discarded some of the learned clauses in Γ, do we have any basis for concluding that the procedure will eventually terminate? We do. As mentioned in the main body of the text, the key observation is that we have made lexicographic progress in the search space. While it is true that the allowed domains of all of the variables following l may have gotten bigger when we replaced Γ with Γ′, the domain for l itself has gotten smaller. Since l precedes these other variables in the total order, we have made lexicographic progress and systematicity remains guaranteed. Consider now the second case, where Γ does contain a learned clause with conclusion ¬l. Note first that since Γ is acceptable, the learned clause concluding ¬l must be unique. We can resolve this learned clause with γ to obtain a new learned clause ρ. Every variable in ρ precedes l according to the ordering ≺, so the conclusion of ρ must precede l as well. We can thus continue to make lexicographic progress if we define add(γ, Γ) to be add(ρ, Γ), where
we have essentially chosen to incorporate the lexicographically more powerful resolvent ρ instead of the original learned clause γ. To ensure that these ideas are clear, suppose that our original learned clause set Γ is given by the following, where the ordering ≺ ranks the variables alphabetically:

¬a ∨ ¬c ∨ d
¬b ∨ ¬c ∨ e (6)
Now if the new learned clause is ¬a ∨ ¬b ∨ f, we simply add it to Γ. No learned clause need be removed because the conclusion f does not appear in the antecedent of any learned clause in Γ. Having added ¬a ∨ ¬b ∨ f, suppose that we now learn ¬a ∨ ¬c. As in the example above, we add the learned clause to Γ and remove every learned clause whose antecedent includes c, which means that both of the original learned clauses are removed from Γ, although ¬a ∨ ¬b ∨ f is retained. The new Γ set is:

¬a ∨ ¬b ∨ f
¬a ∨ ¬c (7)
We have made lexicographic progress. If a is true, the original learned clause set (6) allowed c to be either true or false; the new learned clause set (7) forces c to be false. Reducing the number of possible values for c counts as progress, whatever happens to the possible values for subsequent variables. Suppose that instead of learning ¬a ∨ ¬c, we had learned ¬c ∨ ¬f. This conflicts with ¬a ∨ ¬b ∨ f, so we resolve the two together to obtain ¬a ∨ ¬b ∨ ¬c, which the total order causes us to interpret as a ∧ b → ¬c. We add this to Γ as in the previous paragraph; once again, lexicographic progress has been made. This restricted example contains the essence of the ideas we will use in the more general case. The general case is more complex because, just as we must ensure that Γ remains acceptable, we would also like to ensure that the partial order ≺ remains as flexible as possible. If ≺ were to approach a total order, it would tend to force specific interpretations of newly learned clauses, reducing our ability to reverse decisions that were made early in the search. This would mean that we would have relatively little flexibility in modifying the bias; we would always have to simply flip the bias variable that was most recently valued even as evidence accumulated that this variable was valued correctly. We begin by formalizing the notion of lexicographic progress.
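The total-order special case just described can be replayed in a few lines. The sketch below is illustrative only (variables a through f are numbered 1 through 6, literals are signed integers, and the helper names are ours); it carries out the eviction and resolution steps exactly as in the running example.

```python
# Variables a..f are 1..6; a negative literal is a negated variable.
def concl(clause):
    """Conclusion under the alphabetic total order: the latest variable."""
    return max(clause, key=abs)

def resolve(c1, c2, var):
    return (c1 | c2) - {var, -var}

def add_total(gamma, new):
    """Total-order special case of add: resolve away a clashing conclusion,
    otherwise evict every clause whose antecedent mentions -concl(new)."""
    l = concl(new)
    clash = [c for c in gamma if concl(c) == -l]
    if clash:
        return add_total(gamma, resolve(new, clash[0], abs(l)))
    # A clause's antecedent mentions -l exactly when the clause contains l
    # among its non-conclusion literals.
    kept = [c for c in gamma if l not in c - {concl(c)}]
    return kept + [new]

# Two clauses with c in their antecedents, as in (6).
gamma = [frozenset({-1, -3, 4}), frozenset({-2, -3, 5})]   # a&c->d, b&c->e
gamma = add_total(gamma, frozenset({-1, -2, 6}))           # learn a&b->f
assert len(gamma) == 3                                     # nothing evicted
g7 = add_total(gamma, frozenset({-1, -3}))                 # learn a->~c
assert g7 == [frozenset({-1, -2, 6}), frozenset({-1, -3})] # the set (7)
# Learning ~c|~f instead triggers the resolution case:
g8 = add_total(gamma, frozenset({-3, -6}))
assert frozenset({-1, -2, -3}) in g8                       # a&b -> ~c
```

Both asserted outcomes mirror the text: learning a → ¬c evicts the two clauses with c in their antecedents, while learning ¬c ∨ ¬f resolves against ¬a ∨ ¬b ∨ f before being incorporated.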
Definition A.4 If Γ is a set of learned clauses and ≺ is a total order, we will denote by |Γ|x the number of elements of Γ that have conclusion either x or ¬x. We will denote by |>x| the number of y with y > x. If v is the number of variables being considered, we now define the total order size of Γ under ≺ to be

sizet(Γ, ≺) = 2^v − Σ_x 2^|>x| |Γ|x (8)
Proposition A.5 If Γ is acceptable for a total order ≺, then 2^v ≥ sizet(Γ, ≺) > 0.
Proof. The first inequality is immediate. For the second, since |Γ|x ≤ 1,

sizet(Γ, ≺) ≥ 2^v − Σ_x 2^|>x|

But consider the summation in isolation. As x ranges over the v variables, |>x| takes each of the values 0, 1, ..., v − 1 exactly once, so that

Σ_x 2^|>x| = Σ_{i=1}^{v} 2^{i−1} = 2^v − 1.

Thus sizet(Γ, ≺) ≥ 2^v − (2^v − 1) = 1 > 0.
Before moving on, let us explain the intuition underlying Definition A.4. The basic idea is that the expression in (8) should reflect the size of the search space remaining to be considered, because

Σ_x 2^|>x| |Γ|x

counts the number of potential models eliminated by the learned clauses in Γ. That it does should be fairly clear. The unique learned clause counted by |Γ|x eliminates a value for x. Each such elimination corresponds to the removal of a subtree in the overall search space. The size of the subtree is given by 2^|>x|, since |>x| is the number of variables properly following x in the ordering.
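Equation (8) and the bound of Proposition A.5 are easy to spot-check. A small Python sketch (the encoding and helper names are illustrative assumptions, not the paper's code):

```python
def concl(clause, order):
    """Conclusion: the literal whose variable occurs latest in `order`."""
    return max(clause, key=lambda l: order.index(abs(l)))

def sizet(gamma, order):
    """Equation (8): 2^v minus the models excluded by the clauses in gamma.
    `order` lists the variables from earliest to latest."""
    v = len(order)
    return 2 ** v - sum(2 ** (v - 1 - order.index(abs(concl(c, order))))
                        for c in gamma)

order = [1, 2, 3, 4]                                  # a < b < c < d
gamma = [frozenset({-1, 3}), frozenset({-2, -3, 4})]  # a->c, b&c->d
s = sizet(gamma, order)
assert 0 < s <= 2 ** len(order)                       # Proposition A.5
# a->c excludes a subtree of 2^1 models (d is free); b&c->d excludes 2^0.
assert s == 16 - (2 + 1)
```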
Proposition A.6 Fix a total order ≺, and let Γ and Γ′ be two sets of learned clauses. Assume that there is some x with |Γ|x < |Γ′|x but |Γ|y = |Γ′|y for all y < x. Now if ≺′ is a total order that agrees with ≺ when restricted to the set {(y, z) | y, z ≤ x}, then sizet(Γ′, ≺′) < sizet(Γ, ≺).
Proof. This is simply a formalization of the lexicographic argument made earlier. To show sizet(Γ, ≺) − sizet(Γ′, ≺′) > 0, we have

sizet(Γ, ≺) − sizet(Γ′, ≺′) = Σ_y 2^|>′y| |Γ′|y − Σ_y 2^|>y| |Γ|y (9)

We will show:

1. For any y < x,
2^|>′y| |Γ′|y = 2^|>y| |Γ|y (10)

2. For y = x,
2^|>′x| |Γ′|x − 2^|>x| |Γ|x ≥ 2^|>x| (11)

3. For y > x,
Σ_{y>x} 2^|>y| |Γ|y < 2^|>x| (12)

Collectively, these suffice. The terms (10) with y < x have no effect on the difference in (9). The amount contributed by the term (11) with y = x is greater than the amount subtracted by the terms (12) with y > x, and the remaining terms of the first sum in (9) are nonnegative. Thus the total sum is positive and the result follows. (10) is a consequence of the fact that for y < x, |>′y| = |>y| and |Γ′|y = |Γ|y. For (11), since |Γ′|x > |Γ|x and |>′x| = |>x|, we get

2^|>x| |Γ|x ≤ 2^|>x| (|Γ′|x − 1) = 2^|>′x| |Γ′|x − 2^|>x|

For (12), the sum is essentially identical to that appearing at the end of the previous proof.
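The lexicographic decrease promised by Proposition A.6 can be checked numerically. In the sketch below (our encoding, not the paper's), Γ′ gains a clause concluding on the earlier variable b while losing its clause on the later variable c, and sizet still drops:

```python
def concl(clause, order):
    return max(clause, key=lambda l: order.index(abs(l)))

def sizet(gamma, order):
    v = len(order)
    return 2 ** v - sum(2 ** (v - 1 - order.index(abs(concl(c, order))))
                        for c in gamma)

order = [1, 2, 3]                       # a < b < c
g_old = [frozenset({-2, 3})]            # b -> c: concludes on c
g_new = [frozenset({-1, 2})]            # a -> b: concludes on the earlier b
# |g_new|_b = 1 > 0 = |g_old|_b, and the two (identical) orders agree
# below b, so Proposition A.6 promises a strict decrease:
assert sizet(g_new, order) < sizet(g_old, order)
```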
Definition A.7 A partial order ≺′ will be called a refinement of a partial order ≺ if x ≺ y implies x ≺′ y. A refinement of ≺ that is a total order will be called a total refinement of ≺.
Definition A.8 For a set of learned clauses Γ and partial order ≺, we define the size of Γ under ≺, denoted size(Γ, ≺), to be the maximum of sizet(Γ, ≺t) over all total refinements ≺t of ≺.
The following is an immediate consequence of Proposition A.5:
Corollary A.9 If Γ is acceptable for a partial order ≺, then 2^v ≥ size(Γ, ≺) > 0.
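On small instances, the maximization in Definition A.8 can be carried out by brute force, enumerating total refinements as permutations consistent with the partial order. A sketch under our illustrative encoding:

```python
from itertools import permutations

def concl(clause, order):
    return max(clause, key=lambda l: order.index(abs(l)))

def sizet(gamma, order):
    v = len(order)
    return 2 ** v - sum(2 ** (v - 1 - order.index(abs(concl(c, order))))
                        for c in gamma)

def size(gamma, prec, variables):
    """Definition A.8: the maximum of sizet over all total refinements
    of the partial order `prec` (a set of (x, y) pairs meaning x < y)."""
    return max(sizet(gamma, list(order))
               for order in permutations(variables)
               if all(order.index(a) < order.index(b) for a, b in prec))

prec = {(1, 3), (2, 3)}          # a and b both precede c; a, b unordered
gamma = [frozenset({-1, 3})]     # a -> c
# Every total refinement puts c last, so the two refinements agree here:
assert size(gamma, prec, [1, 2, 3]) == 2 ** 3 - 1
```

The factorial enumeration is of course only for checking the definitions; flex never materializes the refinements.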
The point of all this is that we can now state our desiderata for add a bit more precisely. We need:
1. Polynomial space: If Γ is acceptable with respect to ≺ and (γ′, Γ′, ≺′) = add(γ, Γ, ≺), then we want Γ′ to be acceptable with respect to ≺′. This will allow us to use Proposition A.3 to conclude that Γ′ can be stored in polynomial space as the algorithm proceeds. ≺′ and the bias B can obviously be stored in polynomial space as well.
2. Systematicity: If (γ′, Γ′, ≺′) = add(γ, Γ, ≺), then we want size(Γ′, ≺′) < size(Γ, ≺). This will allow us to use Corollary A.9 to conclude that add can be invoked at most 2^v times before the entire search space is eliminated.
A.3 Systematicity Recall:

Procedure 3.6 To compute add(γ, Γ, ≺):
1 if γ = ∅ then return (γ, Γ ∪ {∅}, ≺);
2 select l ∈ γ̄; // l is the intended conclusion
3 if there is a ρ ∈ Γ with ρ̄ = {¬l} then
4 return add(resolve(γ, ρ), Γ, ≺)
5 add x ≺ l to ≺ for each variable x appearing in γ other than the variable of l;
6 remove from ≺ any x ≺ y with l ≺ x; // weaken ≺ above l
7 remove from Γ any c with either c ∩ {l, ¬l} ≠ ∅, or both c mentioning some variable x with l ≺ x and |c̄| > 1;
8 return (γ, Γ ∪ {γ}, ≺)
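The control flow of Procedure 3.6 can be sketched directly. The code below is an illustrative reading of the pseudocode, not the flex implementation: literals are signed integers, the order is a raw set of precedence pairs (transitive-closure maintenance is omitted), and the tie-breaking choice of conclusion on line 2 is arbitrary.

```python
def maximal(clause, prec):
    """The set of literals of `clause` maximal under the variable order."""
    return {l for l in clause
            if not any((abs(l), abs(m)) in prec for m in clause if m != l)}

def resolve(c1, c2, var):
    return (c1 | c2) - {var, -var}

def add(gamma, new, prec):
    """A sketch of Procedure 3.6; prec holds (x, y) pairs meaning x < y."""
    if not new:                                        # line 1: empty clause
        return new, gamma + [new], prec
    l = next(iter(maximal(new, prec)))                 # line 2: pick conclusion
    for rho in gamma:                                  # lines 3-4
        if maximal(rho, prec) == {-l}:
            return add(gamma, resolve(new, rho, abs(l)), prec)
    prec = prec | {(abs(m), abs(l)) for m in new if m != l}         # line 5
    prec = {(x, y) for (x, y) in prec if (abs(l), x) not in prec}   # line 6
    gamma = [c for c in gamma                          # line 7: evictions
             if not c & {l, -l}
             and not (any((abs(l), abs(m)) in prec for m in c)
                      and len(maximal(c, prec)) > 1)]
    return new, gamma + [new], prec                    # line 8

prec0 = {(1, 2), (1, 3), (2, 3)}                       # a < b < c
# Learning a -> ~b evicts a&b -> c, whose clause mentions b:
_, g1, _ = add([frozenset({-1, -2, 3})], frozenset({-1, -2}), prec0)
assert g1 == [frozenset({-1, -2})]
# Learning ~b against a -> b triggers the resolution on lines 3-4:
_, g2, _ = add([frozenset({-1, 2})], frozenset({-2}), prec0)
assert g2 == [frozenset({-1})]
```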
Proposition A.10 Suppose that Γ is acceptable with respect to ≺, and that Γ ∪ Γ← ⊭ γ. Then if (γ′, Γ′, ≺′) = add(γ, Γ, ≺), Γ′ is acceptable with respect to ≺′.
Proof. Note first that the procedure terminates. The only possible source of nontermination is the recursive call on line 4, but the conclusion of the resolvent necessarily precedes the conclusion of γ in the order. The number of recursive calls is therefore bounded, and the procedure terminates. To show that Γ′ is acceptable, we consider first the nonrecursive case where there is no ρ ∈ Γ with conclusion ¬l. For the three conditions of Definition A.1, we have:
1. To show that ≺′ parses c for any c ∈ Γ′, there are two cases.
If c is the learned clause γ, we know by construction that ≺′ includes y ≺′ l for every other variable y appearing in c.
For the case where c ∈ Γ, we need the following lemma:
Lemma A.11 Suppose that x ≺ u but x ⊀′ u. Then l ≺ x and l ≺′ x.
Proof. Suppose that we denote by ≺+ the partial order constructed in line 5 of the procedure, where additional arcs are added to the original ≺. Since ≺+ is a refinement of ≺, it is clear that if x ≺ u, we will have x ≺+ u as well.
We now have that ≺′ is obtained from ≺+ by the removals of line 6, so the only way to have x ≺+ u but not x ≺′ u is if l ≺+ x; and since the arc from l to x is not itself removed, l ≺′ x as well.
To show that l ≺ x, note that if l ≺′ x, then l ≺+ x, because arcs are only removed in the construction of ≺′ from ≺+. But if l ≺+ x we must also have l ≺ x, since the arcs added in the construction of ≺+ are all of the form z ≺ l.
Returning to the proof of the proposition, suppose that y, z ∈ c̄′ with y and z incomparable under ≺′. Now if either l ≺′ y or l ≺′ z, the clause in question would have been eliminated by line 7 of the add construction, so it follows that l ⊀′ y and l ⊀′ z.
We claim that there is no u ∈ c with y ≺ u. If there were such a u, then we would have y ≺′ u by virtue of the lemma, so y ∉ c̄′, a contradiction. Similarly there is no u ∈ c with z ≺ u. Thus y, z ∈ c̄ and therefore y = z, since ≺ parses c.
2. To show that no two elements of Γ′ have the same conclusion, note first that no two elements of Γ′ have the same conclusion under the original partial order ≺. This is true by the acceptability of Γ for two elements of Γ. And if the new learned clause γ were to have the same conclusion as an element of Γ, we would have had Γ ∪ Γ← ⊨ γ, contradicting our assumptions.
The only way for the conclusion of a c ∈ Γ′ under ≺′ to be different than the conclusion under ≺ is if there are y, z ∈ c with y ≺ z but z ≺′ y. But then by the lemma we must have l ≺ y and l ≺′ y. Since l ≺ y ≺ z, we have l ≺ z and therefore l ≺′ z. But now the weakening in line 6 causes y and z to be incomparable under ≺′, a contradiction.
3. Finally, we must show that Γ′ ∪ Γ′← is consistent, or in other words that Γ′ and Γ′← can be satisfied simultaneously.
To see this, we can use the fact that Γ ∪ Γ← ⊭ γ to find a complete assignment P of values to variables that satisfies Γ and Γ← but falsifies γ. Note that P must both satisfy the antecedent γ← and include ¬l in order to falsify γ.
We now claim that P|l, the result of flipping the value of l in P, satisfies both Γ′ and Γ′←. P|l continues to satisfy the antecedent γ←, since the variable of l does not appear in γ←, and P|l satisfies γ since it makes l true. We therefore need only consider the c ∈ Γ′ that appear in Γ as well.
Since P satisfies both c and c←, the only way that P|l might not satisfy them is if P|l fails to satisfy c because the conclusion of c is no longer satisfied (recall that P satisfies c←, so the conclusion is the only literal of c that P satisfies), or P|l fails to satisfy c← because one of the terms in the antecedent is no longer satisfied. The first of these cannot happen because P|l differs from P only at l, and neither l nor ¬l can be c's conclusion. The second cannot happen because if l appeared in the antecedent of c, c would have been removed in the construction of Γ′.
This concludes the proof that Γ′ remains acceptable if a nonrecursive path is taken through add; to see that the recursive path preserves acceptability as well, we need to show that the fundamental requirement Γ ∪ Γ← ⊭ γ holds for the recursive call. In other words, the proof will be complete if we can show that Γ ∪ Γ← ⊭ resolve(γ, ρ) in line 4. Suppose instead that Γ ∪ Γ← ⊨ resolve(γ, ρ), and note that resolve(γ, ρ) is the disjunction of γ \ {l} and ρ \ {¬l}. Now ρ ∈ Γ, so Γ ∪ Γ← entails the antecedent ρ←, which falsifies every literal of ρ other than its conclusion ¬l. We must therefore have

Γ ∪ Γ← ⊨ γ \ {l}

But this would imply Γ ∪ Γ← ⊨ γ, contradicting the assumptions of the proposition. The proof is complete.
Proposition A.12 Suppose that Γ is acceptable with respect to ≺, and that Γ ∪ Γ← ⊭ γ. Then if (γ′, Γ′, ≺′) = add(γ, Γ, ≺), size(Γ′, ≺′) < size(Γ, ≺).
Proof. We need the following lemma:
Lemma A.13 Let ≺ be a partial order and x a point. Then if ≺Tx is a total refinement of ≺, there is a total refinement ≺T of ≺ such that ≺Tx and ≺T agree when restricted to {(y, z) | y, z ≼Tx x}.
Proof. Define y ≺T z if and only if any of the following three conditions hold:
1. y, z ≼Tx x and y ≺Tx z,
2. x