Improving Model Counting by Leveraging Definability

Jean-Marie Lagniez and Emmanuel Lonca and Pierre Marquis
CRIL-CNRS and Université d'Artois, Lens, France
email: {lagniez, lonca, marquis}@cril.univ-artois.fr

We present a new preprocessing technique for propositional model counting. This technique leverages definability, i.e., the ability to determine that some gates are implied by the input formula Σ. Such gates can be exploited to simplify Σ without modifying its number of models. Unlike previous techniques based on gate detection and replacement, gates do not need to be made explicit in our approach. Our preprocessing technique thus consists of two phases: computing a bipartition ⟨I, O⟩ of the variables of Σ where the variables from O are defined in Σ in terms of I, then eliminating some variables of O in Σ. Our experiments show the computational benefits which can be achieved by taking advantage of our preprocessing technique for model counting.

1 Introduction

Propositional model counting (alias the #SAT problem) is the task consisting in computing the number of models of a given propositional formula Σ. This problem and its direct generalization, weighted model counting, are central to many AI problems including probabilistic inference [Sang et al., 2005; Chavira and Darwiche, 2008; Apsel and Brafman, 2012; Choi et al., 2013] and forms of planning [Palacios et al., 2005; Domshlak and Hoffmann, 2006]. They also have many applications outside AI, e.g., in SAT-based automatic test pattern generation, for evaluating the vulnerability to malicious fault attacks in hardware circuits (see e.g., [Feiten et al., 2012]). However, propositional model counting is computationally hard (a #P-complete problem), actually much harder in practice than the satisfiability issue (the SAT problem).
Its significance explains why much effort has been spent over the last decade in developing new algorithms for model counting (either exact or approximate) which prove practical for larger and larger instances [Samer and Szeider, 2010; Bacchus et al., 2003; Gomes et al., 2009]. In this paper, we present a new preprocessing technique for improving exact model counting. Preprocessing techniques are nowadays acknowledged as computationally valuable for a number of automated reasoning tasks, especially SAT solving and QBF solving [Bacchus and Winter, 2004; Subbarayan and Pradhan, 2004; Lynce and Marques-Silva, 2003; Eén and Biere, 2005; Piette et al., 2008; Han and Somenzi, 2007; Heule et al., 2010; Järvisalo et al., 2012; Heule et al., 2011]. As such, they are now embodied in some state-of-the-art SAT solvers, like Glucose [Audemard and Simon, 2009], which takes advantage of the SatELite preprocessor [Eén and Biere, 2005], Lingeling [Biere, 2014], which has an internal preprocessor, and Riss [Manthey, 2012b], which takes advantage of the Coprocessor preprocessor [Manthey, 2012a]. Our approach elaborates on [Lagniez and Marquis, 2014], which describes a number of preprocessing techniques that can be exploited for improving the model counting task, computationally speaking. Among them is gate detection and replacement. Basically, every variable y of the input formula Σ which turns out to be defined in Σ in terms of other variables X = {x1, . . . , xk} can be replaced by its definition ΦX, while preserving the number of models of Σ. Indeed, whenever a partial assignment over the variables of X is considered, either it is jointly inconsistent with Σ or every model of Σ which extends this partial assignment gives to y the same truth value. In [Lagniez and Marquis, 2014], literal equivalences, AND/OR gates and XOR gates are detected (either syntactically or using Boolean Constraint Propagation).
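The count-preservation argument above can be checked on a toy formula (our own illustrative example, not taken from the paper): take the XOR gate y ↔ (x1 ⊕ x2) together with the extra clause y ∨ x3, then substitute the definition for y and drop the gate. A minimal brute-force sketch in Python:

```python
from itertools import product

# Toy CNF over x1=1, x2=2, x3=3, y=4 (literals are signed ints).
# GATE encodes y <-> (x1 xor x2); SIGMA adds the clause (y or x3).
GATE = [{-4, 1, 2}, {-4, -1, -2}, {4, 1, -2}, {4, -1, 2}]
SIGMA = GATE + [{4, 3}]
# Substituting y's definition into (y or x3) gives (x1 xor x2) or x3,
# whose CNF is the two clauses below; y and the gate disappear.
REPLACED = [{1, 2, 3}, {-1, -2, 3}]

def count_models(cnf, n):
    """Count assignments over variables 1..n satisfying every clause."""
    return sum(
        all(any((lit > 0) == bits[abs(lit) - 1] for lit in cl) for cl in cnf)
        for bits in product([False, True], repeat=n))

# Since y is defined, dropping it does not change the model count.
assert count_models(SIGMA, 4) == count_models(REPLACED, 3) == 6
```

Because y's value is a function of x1 and x2 in every model, removing y neither merges nor duplicates models, which is exactly why the counts agree.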
The empirical results reported in [Lagniez and Marquis, 2014] about the preprocessor pmc equipped with the so-called #eq combination of preprocessings clearly show that huge computational benefits can be achieved through the detection and the replacement of gates. However, pmc remains limited due to the small number of families of gates which are targeted (literal, AND, XOR gates and their negations). In order to fill this gap, our preprocessing technique for model counting aims at exploiting in a much more aggressive way the existence of gates within the input formula Σ. The key idea of our approach is that one does not need to identify the gates themselves: it can be enough to determine that such gates exist. To be more precise, it proves sufficient to detect that some definability relations between variables hold, without needing to identify the corresponding definitions. This distinction is of tremendous importance since, on the one hand, the search space for the possible definitions ΦX is very large (2^(2^k) elements up to logical equivalence, when X contains k variables), and on the other hand, in the general case, the size of any explicit definition ΦX of y in Σ is not polynomially bounded in |Σ| + |X| unless NP ∩ coNP ⊆ P/poly (which is considered unlikely in complexity theory) [Lang and Marquis, 2008]. Thus, in the following, we describe a new preprocessor B + E which associates with a given CNF formula [1] Σ a CNF formula Φ which has the same number of models as Σ, but is at least as simple as Σ w.r.t. the number of variables and the size. B + E consists of two parts: B, which aims at determining a Bipartition ⟨I, O⟩ of the variables of Σ such that every variable of O is defined in Σ in terms of the remaining variables (in I), and E, which aims at Eliminating in Σ some variables of O.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)
Our contribution mainly consists of the presentation of the algorithms B and E, a property establishing the correctness of our preprocessor, and some empirical results showing the computational improvements for model counting offered by B + E compared to pmc. The benchmarks used, the implementation (runtime code) of B + E, some detailed empirical results, and a version of the paper with full proofs are available online from www.cril.fr/KC/. The rest of the paper is organized as follows. Section 2 gives some background on propositional definability. In Section 3 we introduce our preprocessor B + E and prove that it is correct. Section 4 presents results from our large scale experiments, showing B + E as a challenging preprocessor for model counting, especially when compared with pmc. Finally, Section 5 concludes the paper and lists some perspectives for further research.

2 On Definability

Let L be the (classical) propositional language defined inductively from a countable set P of propositional variables, the usual connectives (¬, ∨, ∧, ↔, etc.) and including the Boolean constants ⊤ and ⊥. Formulae are interpreted in the classical way. |= denotes logical entailment and ≡ logical equivalence. For any formula Σ from L, Var(Σ) is the set of variables from P occurring in Σ, and ‖Σ‖ is the number of models of Σ over Var(Σ). A literal ℓ is a variable ℓ = x from P or a negated one ℓ = ¬x. When ℓ is a literal, var(ℓ) denotes the variable upon which ℓ is built. A term is a conjunction of literals or ⊤, and a clause is a disjunction of literals or ⊥. A CNF formula is a conjunction of clauses. Let X be any subset of P. A canonical term γX over X is a consistent term into which every variable from X appears (either as a positive literal or as a negative one, i.e., as a negated variable). ∃X.Σ denotes any formula from L equivalent to the forgetting of X in Σ, i.e., the strongest logical consequence of Σ which is independent of the variables from X.
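For small formulas, forgetting can be computed by Shannon expansion, since ∃x.Σ is equivalent to Σ[x ← ⊤] ∨ Σ[x ← ⊥]. A minimal brute-force sketch (the three-variable formula is our own illustration, not from the paper):

```python
from itertools import product

# Toy formula S = (x or y) and (not x or z), with x=1, y=2, z=3.
S = [{1, 2}, {-1, 3}]

def sat_rows(cnf, variables):
    """Enumerate the satisfying assignments of cnf over `variables`."""
    for bits in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if all(any(a[abs(l)] == (l > 0) for l in cl) for cl in cnf):
            yield a

def condition(cnf, x, value):
    """S[x := value]: drop satisfied clauses, strip falsified literals."""
    lit = x if value else -x
    return [cl - {x, -x} for cl in cnf if lit not in cl]

# Models of Exists x. S, obtained by Shannon expansion over x ...
forgotten = {tuple(sorted(a.items()))
             for v in (True, False)
             for a in sat_rows(condition(S, 1, v), [2, 3])}
# ... coincide with the projections of S's models onto {y, z}.
projected = {tuple(sorted((k, v) for k, v in a.items() if k != 1))
             for a in sat_rows(S, [1, 2, 3])}
assert forgotten == projected and len(projected) == 3
```

Here ∃x.S simplifies to y ∨ z, which indeed has 3 models over {y, z}.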
Let us now recall the two (equivalent) forms under which the concept of definability in propositional logic can be encountered:

[1] Requiring the input to be a CNF formula is not a major restriction since the Tseitin transformation [Tseitin, 1968] can be used to turn any propositional circuit into a CNF formula which has the same number of models: indeed, the variables which are introduced are actually defined from the original ones, and the transformation consists in adding gates to the input. Interestingly, the CNF format is the one considered by state-of-the-art model counters.

Definition 1 (implicit definability) Let Σ ∈ L, X ⊆ P and y ∈ P. Σ implicitly defines y in terms of X if and only if for every canonical term γX over X, we have Σ ∧ γX |= y or Σ ∧ γX |= ¬y.

Definition 2 (explicit definability) Let Σ ∈ L, X ⊆ P and y ∈ P. Σ explicitly defines y in terms of X if and only if there exists a formula ΦX built up from the variables of X s.t. Σ |= ΦX ↔ y. In such a case, ΦX is called a definition (or gate) of y on X in Σ, y is the output variable of the gate, and X are its input variables.

Example 1 Let Σ be the CNF formula consisting of the following clauses: a ∨ b, a ∨ c ∨ ¬e, a ∨ ¬d, b ∨ c ∨ ¬d, ¬a ∨ ¬b ∨ d, ¬a ∨ ¬c ∨ d, ¬a ∨ ¬b ∨ c ∨ ¬e, ¬a ∨ b ∨ ¬c ∨ ¬e, a ∨ e, b ∨ c ∨ e, ¬b ∨ ¬c ∨ e. d and e are implicitly defined in Σ in terms of X = {a, b, c}. For instance, the canonical term γX = a ∧ b ∧ c is such that Σ ∧ γX |= d ∧ e. On the other hand, γ'X = ¬a ∧ ¬b ∧ ¬c is such that Σ ∧ γ'X is inconsistent. d and e are also explicitly defined in Σ in terms of X = {a, b, c} since Σ implies d ↔ (a ∧ (b ∨ c)) and e ↔ (¬a ∨ (b ↔ c)).

What happens in this example is not fortuitous due to the following theorem from [Beth, 1953]:

Theorem 1 Let Σ ∈ L, X ⊆ P and y ∈ P. Σ implicitly defines y in terms of X if and only if Σ explicitly defines y in terms of X.

Since implicit definability and explicit definability coincide, one can simply say that y is defined in terms of X in Σ.
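At this small scale, Definition 1 can be tested by sheer enumeration of the canonical terms. The sketch below (our own brute-force reconstruction) encodes Example 1's CNF with a=1, ..., e=5, the polarities being chosen to match the two gates stated in the example, and checks both definability claims as well as the model count:

```python
from itertools import product

# Example 1's CNF with a=1, b=2, c=3, d=4, e=5 (literals are signed ints).
SIGMA = [{1, 2}, {1, 3, -5}, {1, -4}, {2, 3, -4}, {-1, -2, 4}, {-1, -3, 4},
         {-1, -2, 3, -5}, {-1, 2, -3, -5}, {1, 5}, {2, 3, 5}, {-2, -3, 5}]
VARS = [1, 2, 3, 4, 5]

def models(cnf, variables):
    """Enumerate the satisfying assignments of cnf over `variables`."""
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[abs(l)] == (l > 0) for l in cl) for cl in cnf):
            yield assign

def implicitly_defines(cnf, X, y):
    """Definition 1: every canonical term over X either is inconsistent
    with cnf or forces the truth value of y."""
    for bits in product([False, True], repeat=len(X)):
        gamma = dict(zip(X, bits))
        values = {m[y] for m in models(cnf, VARS)
                  if all(m[x] == gamma[x] for x in X)}
        if len(values) > 1:
            return False
    return True

assert len(list(models(SIGMA, VARS))) == 5       # ||Sigma|| = 5
assert implicitly_defines(SIGMA, [1, 2, 3], 4)   # d defined by {a, b, c}
assert implicitly_defines(SIGMA, [1, 2, 3], 5)   # e defined by {a, b, c}
assert not implicitly_defines(SIGMA, [1, 2], 4)  # but not by {a, b} alone
```

The last assertion shows that definability is relative to the chosen input set X: dropping c from X leaves d undetermined.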
An interesting consequence of this theorem is that it is not mandatory to point out a definition ΦX of y in terms of X in order to prove that such a definition exists. Indeed, it is enough to show that Σ implicitly defines y in terms of X to do the job, and this problem is only coNP-complete [Lang and Marquis, 2008]. To prove it, we can take advantage of the following result (Padoa's theorem [Padoa, 1903]), restricted to propositional logic and recalled in [Lang and Marquis, 2008]; this theorem gives an entailment-based characterization of (implicit) definability:

Theorem 2 For any Σ ∈ L and any X ⊆ P, let Σ'X be the formula obtained by replacing in Σ in a uniform way every propositional symbol z from Var(Σ) \ X by a new propositional symbol z'. Let y ∈ P. If y ∉ X, then Σ (implicitly) defines y in terms of X if and only if Σ ∧ Σ'X ∧ y ∧ ¬y' is inconsistent.[2]

3 A New Preprocessor for Model Counting

Instead of detecting gates and replacing them in Σ in order to remove output variables, our preprocessing technique consists in detecting output variables, then in forgetting them in Σ. To be more precise, the objective is first to find (if possible) a definability bipartition ⟨I, O⟩ of Σ where I contains as few elements as possible.

[2] Obviously enough, in the remaining case when y ∈ X, Σ defines y in terms of X.

Definition 3 (definability bipartition) Let Σ ∈ L. A definability bipartition of Σ is a pair ⟨I, O⟩ such that I ∪ O = Var(Σ), I ∩ O = ∅, and Σ defines every variable o ∈ O in terms of I.

Then, in a second step, variables from O are forgotten in Σ so as to simplify it. This leads to the preprocessing algorithm B + E (B(ipartition), then E(liminate)) given at Algorithm 1:

Algorithm 1: B + E
input : a CNF formula Σ
output: a CNF formula Φ such that ‖Φ‖ = ‖Σ‖
1: O ← B(Σ)
2: Φ ← E(O, Σ)

Interestingly, both steps in this algorithm can be tuned in order to keep the preprocessing phase light from a computational standpoint.
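Theorem 2 reduces definability to a single unsatisfiability test, which is what B + E delegates to a SAT solver. A minimal sketch of Padoa's method where brute-force enumeration stands in for the solver (same toy encoding of Example 1, a=1, ..., e=5, with polarities reconstructed from the stated gates):

```python
from itertools import product

SIGMA = [{1, 2}, {1, 3, -5}, {1, -4}, {2, 3, -4}, {-1, -2, 4}, {-1, -3, 4},
         {-1, -2, 3, -5}, {-1, 2, -3, -5}, {1, 5}, {2, 3, 5}, {-2, -3, 5}]
N = 5  # variables 1..N; their renamed copies are N+1..2N

def satisfiable(cnf, n):
    """Brute-force satisfiability over variables 1..n (a stand-in for
    the CDCL solver a real preprocessor would call)."""
    return any(
        all(any(bits[abs(l) - 1] == (l > 0) for l in cl) for cl in cnf)
        for bits in product([False, True], repeat=n))

def defines(cnf, X, y):
    """Padoa's method (Theorem 2): cnf defines y in terms of X iff
    cnf /\ cnf'_X /\ y /\ not y' is inconsistent (assuming y not in X)."""
    rename = lambda l: l if abs(l) in X else (abs(l) + N) * (1 if l > 0 else -1)
    copy = [{rename(l) for l in cl} for cl in cnf]
    return not satisfiable(cnf + copy + [{y}, {-(y + N)}], 2 * N)

assert defines(SIGMA, {1, 2, 3}, 4)   # d is defined in terms of {a, b, c}
assert defines(SIGMA, {1, 2, 3}, 5)   # so is e
assert not defines(SIGMA, {1, 2}, 4)  # d is not defined in terms of {a, b}
```

Note that only the variables outside X are renamed, so any model of the conjunction agrees with its copy on X while giving y two different values, i.e., a counterexample to definability.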
On the one hand, it is not necessary to determine a definability bipartition ⟨I, O⟩ of Σ for which the cardinality of I is minimal (identifying a reasonable amount of output variables can prove sufficient). On the other hand, it is not necessary to forget (i.e., eliminate) in Σ every variable from O; focusing on a subset E ⊆ O is enough. Formally, our approach is based on the following result, which establishes the correctness of B + E:

Proposition 1 Let Σ ∈ L. Let ⟨I, O⟩ be a definability bipartition of Σ. Let E ⊆ O. Then ‖Σ‖ = ‖∃E.Σ‖.

The ability to identify only a subset O of output variables in the bipartition generation phase, and to consider only a subset E of O in the elimination phase, is valuable. In fact, computing a shortest base (i.e., a subset I of minimal cardinality such that every variable not in I is definable in Σ in terms of I) [Lang and Marquis, 2008] would be prohibitive; indeed, computing such a base using a branch-and-bound algorithm would require, in the worst case, exponentially many definability tests in the number of variables occurring in Σ. Furthermore, while forgetting variables in Σ obviously diminishes the number of variables occurring in it, it may also lead to an exponential increase of its size. Eliminating in Σ only a subset E of the variables found in O makes it possible to focus on those variables for which the elimination step will not increase the size of Σ (à la NiVER [Subbarayan and Pradhan, 2004]), or only by a negligible factor. More generally, the elimination of an output variable can be committed only if the size of Σ after the elimination step remains small enough, once some additional preprocessing has been achieved.
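Proposition 1 can be illustrated on Example 1's formula (again encoded with a=1, ..., e=5, polarities reconstructed from the stated gates): forgetting a variable x in a CNF formula amounts to replacing the clauses mentioning x by their non-valid resolvents on x. The sketch below (our own brute-force reconstruction) forgets d and e, recovering the projected clauses given in Example 2 below and confirming that the model count is preserved:

```python
from itertools import product

SIGMA = [{1, 2}, {1, 3, -5}, {1, -4}, {2, 3, -4}, {-1, -2, 4}, {-1, -3, 4},
         {-1, -2, 3, -5}, {-1, 2, -3, -5}, {1, 5}, {2, 3, 5}, {-2, -3, 5}]

def forget(cnf, x):
    """Replace the clauses mentioning variable x by their non-valid
    resolvents on x (resolution-based forgetting)."""
    pos = [c for c in cnf if x in c]
    neg = [c for c in cnf if -x in c]
    rest = [c for c in cnf if x not in c and -x not in c]
    res = [(p - {x}) | (n - {-x}) for p in pos for n in neg]
    nonvalid = [r for r in res if not any(-l in r for l in r)]
    return rest + nonvalid

def count(cnf, variables):
    """Count the models of cnf over `variables` by enumeration."""
    total = 0
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[abs(l)] == (l > 0) for l in cl) for cl in cnf):
            total += 1
    return total

projected = forget(forget(SIGMA, 4), 5)  # Exists {d, e}. Sigma
# (a or b), (a or c), (a or b or c): same count as Sigma, as Proposition 1 states.
assert sorted(sorted(c) for c in projected) == [[1, 2], [1, 2, 3], [1, 3]]
assert count(SIGMA, [1, 2, 3, 4, 5]) == count(projected, [1, 2, 3]) == 5
```

No subsumption removal or size bound is applied in this sketch; algorithm E additionally prunes subsumed resolvents (removeSub) and commits an elimination only when the clause count does not grow.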
Among the equivalence-preserving preprocessings of interest are occurrence simplification [Lynce and Marques-Silva, 2003] and vivification [Piette et al., 2008] (already considered in [Lagniez and Marquis, 2014]), which aim at shortening some clauses, and at removing some clauses (for vivification).

Example 2 (Example 1 cont'd) No literal equivalences, AND/OR gates or XOR gates are logical consequences of Σ. Nevertheless, since Σ implies d ↔ (a ∧ (b ∨ c)) and e ↔ (¬a ∨ (b ↔ c)), a definability bipartition of Σ is ⟨{a, b, c}, {d, e}⟩. Now, forgetting d and e in Σ leads to the generation of two non-valid clauses a ∨ c and a ∨ b ∨ c, so that ∃{d, e}.Σ can then be computed as the conjunction of a ∨ b, a ∨ c, a ∨ b ∨ c, which can be simplified further into (a ∨ b) ∧ (a ∨ c). This CNF formula has only 5 models, hence this is also the case for Σ.

Algorithm 2: B
input : a CNF formula Σ
output: a set O of output variables, i.e., variables defined in Σ in terms of I = Var(Σ) \ O
1: ⟨Σ, O⟩ ← backbone(Σ)
2: V ← sort(Var(Σ)); I ← ∅
3: foreach x ∈ V do
4:   if defined?(x, Σ, I ∪ succ(x, V), max#C) then
5:     O ← O ∪ {x}
6:   else
7:     I ← I ∪ {x}
8: return O

Algorithm 2 shows how a bipartition ⟨I, O⟩ of Var(Σ) is computed by B in a greedy fashion. At line 1, backbone(Σ) computes the backbone of Σ (i.e., the set of all literals ℓ implied by Σ), and initializes O with the corresponding variables (indeed, a literal ℓ belongs to the backbone of Σ precisely when var(ℓ) is defined in Σ in terms of ∅). Boolean Constraint Propagation is also run on Σ completed by its backbone (this typically leads to simplifying Σ). While the variables of the backbone can be simplified away in Σ by fixing their values, they are nevertheless kept in O in order to ensure that the set O of variables returned by B is such that ⟨I, O⟩ is a bipartition of Var(Σ). At line 2, the remaining variables occurring in Σ are sorted by considering their number of occurrences, from less to more frequent. At line 4, defined?
takes advantage of Padoa's method (Theorem 2) for determining whether x is defined in Σ in terms of I ∪ succ(x, V), where succ(x, V) is the set of all variables of V which appear after x in V. defined? relies on an anytime SAT solver solve based on the CDCL architecture for achieving the (un)satisfiability test required by Padoa's method. In our implementation, the input of solve is the CNF formula Σ ∧ Σ′ ∧ ⋀ z ∈ Var(Σ) ((¬sz ∨ ¬z ∨ z′) ∧ (¬sz ∨ z ∨ ¬z′)), completed by assumptions: for every z belonging to I ∪ succ(x, V), the unit clause sz associated with z is added as an assumption to the CNF formula (its effect is to make z equivalent to its copy z′); then, x and ¬x′ are also added as assumptions. Interestingly, clauses which are learnt at each call to solve are kept for the subsequent calls. defined? is parameterized by max#C, which bounds the number of clauses which can be learnt. When no contradiction has been found before max#C is reached, defined? returns false (i.e., x is considered as not defined in Σ in terms of I ∪ succ(x, V), while this could be questioned had a larger bound been considered). Clearly, the number of output variables found by B is not guaranteed to be maximal, but this is on purpose, for the sake of efficiency (observe that the number of calls to solve does not exceed the number of variables occurring in Σ).

Algorithm 3: E
input : a CNF formula Σ and a set of output variables O ⊆ Var(Σ)
output: a CNF formula Φ such that Φ ≡ ∃E.Σ for some E ⊆ O
1: Φ ← Σ; iterate ← true
2: P ← O
3: while iterate do
4:   E ← P; P ← ∅; iterate ← false
5:   Φ ← vivificationSimpl(Φ, E)
6:   while E ≠ ∅ do
7:     x ← select(E, Φ)
8:     E ← E \ {x}
9:     Φ ← occurrenceSimpl(Φ, x)
10:    if #(Φx) × #(Φ¬x) > max#Res then
11:      P ← P ∪ {x}
12:    else
13:      R ← removeSub(Res(x, Φ), Φ)
14:      if #((Φ \ Φx,¬x) ∪ R) ≤ #(Φ) then
15:        Φ ← (Φ \ Φx,¬x) ∪ R
16:        iterate ← true
17:      else
18:        P ← P ∪ {x}
19: return Φ

Algorithm 3 shows how variables from O are eliminated in Σ by E. P contains the variables x from O which are candidates for elimination. P is initialized with the full set O (line 2).
The main loop at line 3 is repeated while the elimination of at least one variable is effective (line 16). At line 4, the set E of variables that will be tentatively eliminated during the iteration is initialized with P, and P is reset to ∅. At line 5, the clauses of Φ are successively vivified using a slight variant of the vivification algorithm vivificationSimpl reported in [Lagniez and Marquis, 2014]. Vivification [Piette et al., 2008] is a preprocessing technique which aims at reducing the input CNF formula, i.e., at removing some clauses from it and some literals from the remaining clauses while preserving equivalence, using Boolean Constraint Propagation. The additional parameter E is used to sort the literals within the clauses of Φ so that the literals over E come first (i.e., one tries in priority to eliminate occurrences of literals over E). At line 6, one enters the inner loop, which operates while there are remaining variables in E. At line 7, a variable x is selected in E for being possibly eliminated, by counting the number #(Φx) of clauses of Φ where x appears as a positive literal and the number #(Φ¬x) of clauses of Φ where x appears as a negative literal; x is retained if it minimizes #(Φx) × #(Φ¬x), which is an upper bound on the number of resolvents that the elimination of x in Φ may generate. At line 8, x is removed from E. Then, at line 9, one first tries to eliminate in Φ some occurrences of the variable x using occurrenceSimpl. occurrenceSimpl is a restriction of the algorithm for occurrence simplification reported in [Lagniez and Marquis, 2014], where instead of considering the whole set of literals occurring in Φ, we just focus on those in {x, ¬x}. At line 10, one recomputes #(Φx) × #(Φ¬x) and checks whether or not it exceeds a preset bound max#Res. If this is the case, then we possibly postpone the elimination of x in Φ to the next iteration by adding it to P (line 11).
Otherwise, we compute the set Res(x, Φ) of all non-valid resolvents of clauses from Φ on x, and we remove from it, using removeSub, every clause which is properly subsumed by a clause of Φ or by another clause from Res(x, Φ); the resulting set of clauses is R (line 13). At line 14, we test whether the elimination of x in Φ, obtained by removing from Φ its subset Φx,¬x of the clauses in which the variable x occurs (either as a positive literal or as a negative literal) and adding the resolvents from R, leads or not to increasing the number of clauses in Φ. If so, we possibly postpone the elimination of x in Φ to the next iteration by adding it to P (line 18). If not, the elimination of x in Φ is committed (line 15). Clearly, it can be the case that some variables of O are not eliminated by E, but again, this is on purpose, for efficiency reasons.

4 Empirical Results

In our experiments, we have considered 703 CNF instances from the SATLIB library.[3] They are gathered into 8 data sets, as follows: BN (Bayesian networks) (192), BMC (Bounded Model Checking) (18), Circuit (41), Configuration (35), Handmade (58), Planning (248), Random (104), Qif (7) (Quantitative Information Flow analysis, security). Our experiments have been conducted on Intel Xeon E5-2643 (3.30 GHz) processors with 32 GiB RAM on Linux CentOS. A time-out of 1h and a memory-out of 7.6 GiB have been considered for each instance. We set max#Res to 500. As a matter of comparison, we have considered the pmc preprocessor for model counting, described in [Lagniez and Marquis, 2014] and available from www.cril.fr/KC/. To be more precise, we considered pmc equipped with the #eq combination of preprocessings, which combines backbone simplification, occurrence elimination, vivification, and gate detection and replacement. pmc equipped with #eq proved empirically to be a very efficient preprocessor for model counting [Lagniez and Marquis, 2014].
We evaluated the impact of B + E (for several values of max#C) by coupling it with exact model counters. We considered the search-based model counters Cachet [4] [Sang et al., 2004] and SharpSAT [5] [Thurley, 2006], run with their default settings. Though compilation-based approaches do much more than model counting (since they compute equivalent, compiled representations of the input CNF formula Σ and not only the number of models of Σ), some of them appear competitive for the model counting purpose. Thus, we

[3] www.cs.ubc.ca/~hoos/SATLIB/index-ubc.html
[4] www.cs.rochester.edu/~kautz/Cachet/
[5] sites.google.com/site/marcthurley/sharpsat

also took advantage of the C2D compiler [6] [Darwiche, 2001; 2004] for achieving the downstream model counting task. C2D generates a Decision-DNNF representation of Σ. The size of this representation is exponential in the size of Σ in the worst case, but the number of models of Σ conditioned by any consistent term γ can be computed efficiently from it in every case. And when γ is ⊤, one gets the number of models of Σ. C2D has been invoked with the following options: -count -in_memory -smooth_all, which are suited when C2D is used as a model counter. By the way, it is worth noting that B + E cannot be used upstream of compilation-based approaches to model counting while preserving the possibility of counting efficiently the number of models of the input conditioned by any consistent term. Indeed, when B + E(Σ) is not equivalent to Σ, the Decision-DNNF representation of B + E(Σ) computed by C2D is not equivalent to Σ. Therefore, the possibility of efficient model counting after any conditioning is lost, and general conditioning must be downsized to a restricted form of conditioning where terms γ built up from I are allowed, but no other terms. Interestingly, such a restricted form of conditioning can prove enough in some scenarios.
Especially, when the set of variables of Σ can be partitioned into a set of controllable variables (those which may require to be conditioned) and a remaining set of uncontrollable variables, one may take advantage of a slight variant of B + E ensuring that every controllable variable is put into I in order to simplify the input CNF formula before compiling it. The next table makes precise the number of instances (out of 703) solved within 1h by each of the model counters Cachet, SharpSAT, and C2D (first column), when no preprocessing has been applied (second column), when pmc (equipped with #eq) has been applied first (third column), and finally when B + E has been applied first, for several values of max#C (the remaining columns). The preprocessing time is taken into account in the computations (it is part of the 1h CPU time allocated per instance).

model counter | no preprocessing | pmc | max#C=10 | max#C=100 | max#C=1000 | max#C=∞
Cachet        |              525 | 558 |      586 |       588 |        594 |     602
SharpSAT      |              507 | 537 |      575 |       581 |        586 |     593
C2D           |              547 | 602 |      605 |       613 |        616 |     621

The results reported in this table show the benefits which can be achieved by applying B + E before using a model counter. In particular, B + E leads to better performances than pmc. Since the best performances of B + E are achieved for max#C = ∞, we focus on this parameter setting in the following. The cactus plot given in the next figure illustrates the performances of Cachet, SharpSAT, and C2D, possibly empowered by pmc or by B + E.
For each value t on the y-axis (a model counting time, in seconds) and each dot of a curve for which this value is reached on the y-axis, the corresponding value on the x-axis makes precise how many instances have been solved by the approach associated with the curve within a time limit of t (which includes the preprocessing time, when a preprocessing has been used).

[6] reasoning.cs.ucla.edu/c2d/

[Cactus plot: model counting time (in seconds) vs. number of instances solved, for SharpSAT, Cachet, and C2D, each run on Σ, on pmc(Σ), and on B + E(Σ, ∞).]

For the sake of readability, only 10% of the dots have been printed. Again, the plot clearly shows B + E as a better preprocessor than pmc. In order to determine how much applying B + E reduces the input CNF formula Σ compared to pmc, we considered two measures for assessing the reduction of Σ: #var(Σ), the number of variables of Σ, and #lit(Σ), the number of literals occurring in Σ (i.e., the size of Σ). Empirically, the results are presented on the two scatter plots (a) and (b), where each point corresponds to an instance Σ; its x-coordinate corresponds to the value of the measure (#var for (a), #lit for (b)) on pmc(Σ) equipped with the #eq combination of preprocessings, while its y-coordinate corresponds to the value of the same measure on B + E(Σ) (with max#C = ∞). The scales used for both coordinates are logarithmic. Clearly enough, B + E often leads to much larger reductions than pmc for both measures. The benefits appear as very significant for instances from the Planning family. Finally, we have evaluated how much B + E reduces the overall model counting time compared to pmc. The results are presented on the two scatter plots (with logarithmic scales) (c) and (d), for the two model counters Cachet and C2D (which appeared as the best counters in our experiments) considered downstream.
Each point corresponds to an instance Σ; its x-coordinate corresponds to the time (in seconds) required to compute ‖Σ‖ by computing pmc(Σ) first, then calling the model counter on the resulting CNF formula, while its y-coordinate corresponds to the time required to compute ‖Σ‖ by computing B + E(Σ) (with max#C = ∞) first, then calling the model counter on the resulting CNF formula. Again, whatever the downstream model counter, B + E often appears as a more efficient preprocessor than pmc. The rightmost parts of the two scatter plots cohere with the results reported in the previous table, showing a number of instances which can be solved by any of the model counters when B + E has been applied first, while they cannot be solved within the time limit of 1h when pmc is used instead.

[Scatter plots, log scales: (a) #var and (b) #lit of B + E(Σ, ∞) vs. pmc(Σ); (c) overall time of B + E+Cachet vs. pmc+Cachet; (d) overall time of B + E+C2D vs. pmc+C2D. Instances from the Qif, Handmade, Circuit, and Configuration families are highlighted.]

Finally, note that considering only the preprocessing times (and not the overall time needed to count the number of models of the input) for evaluating the preprocessor would be misleading: for some instances, the preprocessing times can be (relatively) long (details are available from www.cril.fr/KC/), just because the preprocessor does almost all the job (it may happen that the simplification of the instance is so important that the downstream model counter has almost nothing to do afterwards).

5 Conclusion

We have defined a new preprocessing technique B + E which associates with a given CNF formula Σ a CNF formula B + E(Σ) which has the same number of models as Σ, but is often simpler w.r.t. the number of variables and the size. B + E is based on standard theorems in classical logic by Beth and Padoa.
Remarkably enough, while those results are quite old, they prove useful for defining a very effective preprocessing technique for model counting. Thus, experiments have shown that for many instances Σ, the overall computation time needed to calculate ‖B + E(Σ)‖ using state-of-the-art exact model counters is often much lower than the time needed to compute ‖Σ‖ with the same counters. This work opens a number of perspectives for further research. Considering other heuristics in B for determining a bipartition of the variables, and determining how to tune the constants max#C and max#Res depending on the instance at hand, will be studied in the future. Other perspectives concern the notion of projected model counting, as considered in [Aziz et al., 2015]. The purpose is to compute ‖∃E.Σ‖ given a set E of variables and a formula Σ. Instead of taking advantage of B + E followed by any model counter to compute ‖Σ‖, we could instead use B followed by any projected model counter (where the projection is onto I). The other way around, we could also exploit E, run on E and Σ, as a preprocessor for projected model counters. It would be interesting to implement both approaches and to determine whether they are helpful in practice.

Acknowledgments

We would like to thank the anonymous reviewers for their attentive reading, useful comments and advice.

References

[Apsel and Brafman, 2012] U. Apsel and R. I. Brafman. Lifted MEU by weighted model counting. In Proc. of AAAI'12, 2012.
[Audemard and Simon, 2009] G. Audemard and L. Simon. Predicting learnt clauses quality in modern SAT solvers. In Proc. of IJCAI'09, pages 399–404, 2009.
[Aziz et al., 2015] R. A. Aziz, G. Chu, Ch. J. Muise, and P. J. Stuckey. #∃SAT: Projected model counting. In Proc. of SAT'15, pages 121–137, 2015.
[Bacchus and Winter, 2004] F. Bacchus and J. Winter. Effective preprocessing with hyper-resolution and equality reduction. In Proc. of SAT'04, pages 341–355, 2004.
[Bacchus et al., 2003] F. Bacchus, S. Dalmao, and T.
Pitassi. Algorithms and complexity results for #SAT and Bayesian inference. In Proc. of FOCS'03, pages 340–351, 2003.
[Beth, 1953] E. W. Beth. On Padoa's method in the theory of definition. Indagationes Mathematicae, 15:330–339, 1953.
[Biere, 2014] A. Biere. Lingeling essentials, a tutorial on design and implementation aspects of the SAT solver Lingeling. In Proc. of POS'14, page 88, 2014.
[Chavira and Darwiche, 2008] M. Chavira and A. Darwiche. On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6-7):772–799, 2008.
[Choi et al., 2013] A. Choi, D. Kisa, and A. Darwiche. Compiling probabilistic graphical models using sentential decision diagrams. In Proc. of ECSQARU'13, pages 121–132, 2013.
[Darwiche, 2001] A. Darwiche. Decomposable negation normal form. Journal of the Association for Computing Machinery, 48(4):608–647, 2001.
[Darwiche, 2004] A. Darwiche. New advances in compiling CNF into decomposable negation normal form. In Proc. of ECAI'04, pages 328–332, 2004.
[Domshlak and Hoffmann, 2006] C. Domshlak and J. Hoffmann. Fast probabilistic planning through weighted model counting. In Proc. of ICAPS'06, pages 243–252, 2006.
[Eén and Biere, 2005] N. Eén and A. Biere. Effective preprocessing in SAT through variable and clause elimination. In Proc. of SAT'05, pages 61–75, 2005.
[Feiten et al., 2012] L. Feiten, M. Sauer, T. Schubert, A. Czutro, E. Böhl, I. Polian, and B. Becker. #SAT-based vulnerability analysis of security components: a case study. In Proc. of DFT'12, pages 49–54, 2012.
[Gomes et al., 2009] C. P. Gomes, A. Sabharwal, and B. Selman. Model counting. In Handbook of Satisfiability, pages 633–654. 2009.
[Han and Somenzi, 2007] H. Han and F. Somenzi. Alembic: an efficient algorithm for CNF preprocessing. In Proc. of DAC'07, pages 582–587, 2007.
[Heule et al., 2010] M. Heule, M. Järvisalo, and A. Biere. Clause elimination procedures for CNF formulas. In Proc. of LPAR'10, pages 357–371, 2010.
[Heule et al., 2011] M. Heule, M. Järvisalo, and A. Biere. Efficient CNF simplification based on binary implication graphs. In Proc. of SAT'11, pages 201–215, 2011.
[Järvisalo et al., 2012] M. Järvisalo, A. Biere, and M. Heule. Simulating circuit-level simplifications on CNF. Journal of Automated Reasoning, 49(4):583–619, 2012.
[Lagniez and Marquis, 2014] J.-M. Lagniez and P. Marquis. Preprocessing for propositional model counting. In Proc. of AAAI'14, pages 2688–2694, 2014.
[Lang and Marquis, 2008] J. Lang and P. Marquis. On propositional definability. Artificial Intelligence, 172(8-9):991–1017, 2008.
[Lynce and Marques-Silva, 2003] I. Lynce and J. Marques-Silva. Probing-based preprocessing techniques for propositional satisfiability. In Proc. of ICTAI'03, pages 105–110, 2003.
[Manthey, 2012a] N. Manthey. Coprocessor 2.0, a flexible CNF simplifier (tool presentation). In Proc. of SAT'12, pages 436–441, 2012.
[Manthey, 2012b] N. Manthey. Solver description of RISS 2.0 and PRISS 2.0. Technical report, TU Dresden, Knowledge Representation and Reasoning, 2012.
[Padoa, 1903] A. Padoa. Essai d'une théorie algébrique des nombres entiers, précédé d'une introduction logique à une théorie déductive quelconque. In Bibliothèque du Congrès International de Philosophie, pages 309–365. Paris, 1903.
[Palacios et al., 2005] H. Palacios, B. Bonet, A. Darwiche, and H. Geffner. Pruning conformant plans by counting models on compiled d-DNNF representations. In Proc. of ICAPS'05, pages 141–150, 2005.
[Piette et al., 2008] C. Piette, Y. Hamadi, and L. Saïs. Vivifying propositional clausal formulae. In Proc. of ECAI'08, pages 525–529, 2008.
[Samer and Szeider, 2010] M. Samer and S. Szeider. Algorithms for propositional model counting. J. Discrete Algorithms, 8(1):50–64, 2010.
[Sang et al., 2004] T. Sang, F. Bacchus, P. Beame, H. A. Kautz, and T. Pitassi. Combining component caching and clause learning for effective model counting. In Proc. of SAT'04, 2004.
[Sang et al., 2005] T. Sang, P. Beame, and H. A. Kautz. Performing Bayesian inference by weighted model counting. In Proc. of AAAI'05, pages 475–482, 2005.
[Subbarayan and Pradhan, 2004] S. Subbarayan and D. K. Pradhan. NiVER: Non-increasing variable elimination resolution for preprocessing SAT instances. In Proc. of SAT'04, pages 276–291, 2004.
[Thurley, 2006] M. Thurley. sharpSAT: counting models with advanced component caching and implicit BCP. In Proc. of SAT'06, pages 424–429, 2006.
[Tseitin, 1968] G. S. Tseitin. On the complexity of derivation in propositional calculus. In Structures in Constructive Mathematics and Mathematical Logic, pages 115–125. Steklov Mathematical Institute, 1968.