# engineering_an_exact_pseudoboolean_model_counter__90f9c64b.pdf

Engineering an Exact Pseudo-Boolean Model Counter*

Suwei Yang1,2,3, Kuldeep S. Meel3,4

1Grab Taxi Holdings 2Grab-NUS AI Lab 3National University of Singapore 4University of Toronto

Model counting, a fundamental task in computer science, involves determining the number of satisfying assignments to a Boolean formula, typically represented in conjunctive normal form (CNF). While model counting for CNF formulas has received extensive attention with a broad range of applications, the study of model counting for Pseudo-Boolean (PB) formulas has been relatively overlooked. Pseudo-Boolean formulas, being more succinct than propositional Boolean formulas, offer greater flexibility in representing real-world problems. Consequently, there is a crucial need to investigate efficient techniques for model counting for PB formulas. In this work, we propose the first exact Pseudo-Boolean model counter, PBCount, that relies on knowledge compilation approach via algebraic decision diagrams. Our extensive empirical evaluation shows that PBCount can compute counts for 1513 instances while the current state-of-the-art approach could only handle 1013 instances. Our work opens up several avenues for future work in the context of model counting for PB formulas, such as the development of preprocessing techniques and exploration of approaches other than knowledge compilation.

1 Introduction

Propositional model counting involves computing the number of satisfying assignments to a Boolean formula. Model counting is closely related to the Boolean satisfiability problem where the task is to determine if there exists an assignment of variables such that the Boolean formula evaluates to true. Boolean satisfiability and model counting have been extensively studied in the past decades and are the cornerstone of an extensive range of real-life applications such as software design, explainable machine learning, planning, and probabilistic reasoning (Bacchus, Dalmao, and Pitassi 2003; Narodytska et al. 2019; Jackson 2019; Fan, Miller, and Mitra 2020). Owing to decades of research, there are numerous tools and techniques developed for various aspects of Boolean satisfiability and model counting, from Boolean formula preprocessors to SAT solvers and model counters.

*The full version of the paper is available at https://arxiv.org/abs/2312.12341 and code is available at https://github.com/grab/pbcount Copyright 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

The dominant representation format of Boolean formulas is Conjunctive Normal Form (CNF), and accordingly, the tools in the early days focused on CNF as the input format. Over the past decade and a half, there has been considerable effort in exploring other representation formats: one such format that has gained significant interest from the community is Pseudo-Boolean (PB) formulas, which are expressed as the conjunction of linear inequalities. PB formulas are shown to be more succinct than CNF formulas and natural for problems such as Knapsack, sensor placement, binarized neural networks, and the like. Furthermore, PB formulas are able to express constraints more succinctly compared to Boolean formulas in CNF (Berre et al. 2018). As an example, a single PB constraint is sufficient to express atmost-k and at-least-k types of cardinal constraints whereas the equivalent in CNF would require a polynomial number of clauses (Sinz 2005). On a higher level, an arbitrary CNF clause can be expressed with a single PB constraint but the converse is not true (Berre et al. 2018). The past decade has witnessed the development of satisfiability solving techniques based on the underlying proof systems naturally suited to PB constraints, and accordingly, the state-ofthe-art PB solvers, such as Rounding Sat significantly outperform CNF solvers on problems that are naturally encoded in PB (Elffers and Nordstr om 2018; Devriendt 2020; Devriendt et al. 2021). In contrast to satisfiability, almost all the work in the context of model counting has focused on the representation of Boolean formulas in Conjunctive Normal Form (CNF), with the sole exception of the development of an approximate model counter for PB formulas (Yang and Meel 2021). The primary contribution of this work is to address the aforementioned gap through the development of a native scalable exact model counter, called PBCount, for PB formulas. PBCount is based on the knowledge compilation paradigm, and in particular, compiles a given PB formula into algebraic decision diagrams (ADDs) (Bahar et al. 1993), which allows us to perform model counting. We perform extensive empirical evaluations on benchmark instances arising from different applications, such as sensor placement, multi-dimension knapsack, and combinatorial auction benchmarks (Gens and Levner 1980; Blumrosen and Nisan 2007; Latour, Sen, and Meel 2023). Our evaluations highlighted the efficacy of PBCount against ex-

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

isting state-of-the-art CNF model counters. In particular, PBCount is able to successfully count 1513 instances while the prior state of the art could only count 1013 instances, thereby demonstrating significant runtime improvements. It is worth remarking that PBCount achieves superior performance with substantially weaker preprocessing techniques in comparison to techniques employed in CNF model counters, making a strong case for the advantages of native PB model counting and reasoning. Furthermore, given the crucial importance of preprocessing techniques for CNF counting, we hope our work will motivate the development of preprocessing techniques for PB model counting. The rest of the paper is organized as follows: We discuss the preliminaries and existing counting algorithm in Section 2. In Section 3, we discuss existing works and how they relate to our approach, which we detail in Section 4. Following that, we analyze the empirical results of PBCount against existing tools in Section 5 and conclude in Section 6.

2 Preliminaries Boolean Formula A Boolean variable can take values true or false. A literal is either a Boolean variable or its negation. Let F be a Boolean formula. F is in conjunctive normal form (CNF) if F is a conjunction of clauses, where each clause is a disjunction of literals. F is satisfiable if there exists an assignment τ of variables of F such that F evaluates to true. We refer to τ as a satisfying assignment of F and denote the set of all τ as Sol(F). Model counting for Boolean formula F refers to the task of determining |Sol(F)|.

Pseudo-Boolean Formula A PB constraint is either an equality or inequality of the form Pn i=1 aixi k where x1, ..., xn are Boolean literals, a1, ..., an, and k are integers, and is one of { , =, }. We refer to a1, ..., an as term coefficients in the PB constraint, where each term is of the form aixi. A PB formula, G, consists of a set of PB constraints. G is satisfiable if there exists an assignment τ of all variables of G such that all its PB constraints hold. PB model counting refers to the computation of |Sol(G)| where Sol(G) is the set of all satisfying assignments of G.

Projected Model Counting Let G be a formula defined over the set of variables X. Let Vi, Vj be subsets of X such that Vi Vj = and Vi Vj = X. Projected model counting of G on Vi refers to the number of assignments of all variables in Vi such that there exists an assignment of variables in Vj that makes G evaluate to true (Aziz et al. 2015). In the evaluations, CNF model counter baselines perform projected model counting on the original variables in the PB formula, to avoid additional counts due to auxiliary variables introduced in the PB to CNF conversion process.

Algebraic Decision Diagram An algebraic decision diagram (ADD) is a directed acyclic graph representation of a function f : 2X S where X is the set of Boolean variables that f is defined over, and S is an arbitrary set known as the carrier set. We denote the function represented by an ADD ψ as Func(ψ). The internal nodes of ADD represent decisions on variables x X and the leaf nodes represent s S. In this work, we focus on the setting where S Z.

As an example, an ADD representing 3x1 +4x2 is shown in Figure 1. In the figure, a dotted arrow from an internal node represents when the corresponding variable is set to false and a solid arrow represents when it is set to true.

Figure 1: An ADD representing 3x1 + 4x2

In addition, we make use of Apply and ITE operations on ADDs (Bryant 1986; Bahar et al. 1993). The Apply operation takes as input a binary operator , two ADDs ψ1, ψ2, and outputs an ADD ψ3 such that the Func(ψ3) = Func(ψ1) Func(ψ2). The ITE operation (if-then-else) involves 3 ADDs ψ1, ψ2, ψ3, where carrier set of ψ1 is restricted to {0, 1}. ITE outputs an ADD that is equivalent to having 1 valued leaf nodes in ψ1 replaced with ψ2 and 0 valued leaf nodes with ψ3.

Relation of Pseudo-Boolean Constraint to CNF Clause Given an arbitrary CNF clause D, one could always convert D to a PB constraint. Given that D is of the form Wm i=1 li, where l1, ..., lm are Boolean literals, D can be represented by a single PB constraint Pm i=1 aili 1 where all coefficients a1, ..., ai, ...am are 1. However, there are PB constraints that require polynomially many CNF clauses to represent. An example would be Pm i=1 li k which requires at least k of m literals to be true. We refer the reader to the Appendix for statistics of the number of variables and clauses before and after PB to CNF conversion for benchmarks used.

2.1 Model Counting with ADDs

In this work, we adapt the existing dynamic programming counting algorithm of ADDMC (Dudek, Phan, and Vardi 2020a), shown in Algorithm 1, to perform PB model counting with ADDs. This includes using the default ADDMC configurations for ADD variable ordering (MCS) and cluster ordering ρ (BOUQUET TREE). The algorithm takes in a list φ of ADDs, representing all constraints, and an order ρ of which to process the ADDs. ADD ψ is initialized with value 1. According to cluster ordering ρ, cluster ADDs ψj are formed using the Apply operation with operator on each of the individual constraint ADDs of constraints in the cluster. The cluster ADD ψj is combined with ψ using the same Apply operation. If variable x does not appear in later clusters in ρ, it is abstracted out from ψ (early projection process in ADDMC) using ψ W( x) ψ[x 7 0]+W(x) ψ[x 7 1] in line 8, where W( ) is the user provided literal weight function. In unweighted model counting, W( ) is 1 for all literals. Once all clusters have been processed, the unprocessed variables x of the formula G are abstracted out using the same operation as before (line 10).

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

After all variables are abstracted out, ψ is a constant ADD that represents the final count.

Algorithm 1: compute Count(φ, ρ) Input: φ - list of ADD, ρ - cluster merge ordering Output: model count 1: ψ constant ADD(1) 2: for cluster Aj ρ do 3: ψj constant ADD(1) 4: for constraint Ci Aj do 5: ψj ψj φ[Ci] 6: ψ ψ ψj 7: for each x ψ where x not in later clusters in ρ do 8: ψ W( x) ψ[x 7 0] + W(x) ψ[x 7 1] 9: for all unprocessed variable x do 10: ψ W( x) ψ[x 7 0] + W(x) ψ[x 7 1] 11: return get Value(ψ)

3 Related Work Boolean Formula Preprocessing Boolean formula preprocessing involves simplifying a given formula to reduce runtimes of downstream tasks such as determining satisfiability of the formula (SAT-solving) and model counting. Preprocessing is crucial to modern SAT solvers and model counters performance improvements in recent decades. There are numerous preprocessing techniques introduced over the years by the research community, some of which are unit propagation, bounded variable elimination, failed literal probing, and vivification (Dowling and Gallier 1984; Berre 2001; E en and Biere 2005; Piette, Hamadi, and Sais 2008). In this work, we adapt some of the SAT preprocessing techniques, namely unit propagation and a variant of failed literal probing, to simplify PB formulas.

Search-Based Model Counters Among the numerous existing CNF model counters, we can classify them into two main categories search-based model counters and decision diagram-based model counters. Notable existing search-based model counters include GPMC, Ganak, and Sharpsat-TD (Ryosuke Suzuki and Sakai 2017; Sharma et al. 2019; Korhonen and J arvisalo 2021). Search-based model counters work by setting values to variables in a given formula in an iterative manner, which is equivalent to implicitly exploring a search tree. In addition, search-based model counters adapt techniques such as sub-component caching from SAT solving for more efficient computation.

Decision Diagram-Based Model Counter Decision diagram-based model counters employ knowledge compilation techniques to compile a given formula into directed acyclic graphs (DAGs) and perform model counting with these DAGs. Some of the recent decision diagram-based model counters are D4, Exact MC, ADDMC, and its related variant DPMC (Lagniez and Marquis 2017; Dudek, Phan, and Vardi 2020a,b; Lai, Meel, and Yap 2021). D4 and Exact MC compile the formula in a top-down manner into the respective decision diagram forms. In contrast, ADDMC and DPMC (decision diagram mode) perform bottom-up

compilations of algebraic decision diagrams (ADDs). In this work, we based PBCount on ADDMC and introduced techniques to compile a PB constraint directly into an ADD and employ the same counting approach in ADDMC.

Pseudo-Boolean Conversion One way to perform PB model counting is to convert the PB formula to a Boolean formula and use existing CNF model counters. A notable tool for the conversion of PB to CNF is PBLib (Philipp and Steinke 2015). PBLib implements various encodings to convert PB formulas into CNF form, some of which include cardinality networks, sorting networks, and BDD-based encodings (E en and S orensson 2006; Ab ıo et al. 2011, 2013). In this work, we use default settings for the PBEncoder binary provided as part of PBLib to perform the required conversions. We subsequently compare PBCount against stateof-the-art CNF model counters. It is worth noting that the model counting task for PB formula becomes a projected model counting task of the corresponding CNF formula, as previously mentioned in Section 2.

We show the overall flow of PBCount in Figure 2. We first preprocess the PB formula using propagation and assumption probing. Subsequently, we compile each of the PB constraints into an algebraic decision diagram (ADD). Next, we merge constraint ADDs using Apply operation and perform model counting by abstracting out variables (Section 2.1). The model count would be the value after all variables are abstracted out. Without loss of generality, the algorithms described in this work handle PB constraints involving = and operators, as type constraints can be manipulated into type constraints.

Preprocess (Section 4.1)

Compile into individual ADD (Section 4.2)

Count with ADDs (Section 2.1)

PB formula G

Model Count PBCount

Figure 2: Overall flow of our PB model counter PBCount. Shaded boxes indicate our contributions.

4.1 Preprocessing

Propagate Assumption Probing PB formula G PB formula G

Preprocessing

Figure 3: Preprocessing of PB formula

The preprocessing phase of PBCount performs assumption probing and unit propagation (Biere, J arvisalo, and Kiesl 2021). PBCount repeatedly performs unit propagation and assumption probing until no change is detected, as shown in Algorithm 2.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Algorithm 2: Preprocess(G) Input: G - PB formula Output: G - preprocessed PB formula 1: mapping []; G G 2: repeat 3: for all single variable constraint C G do 4: mapping mapping Infer Decision(C) 5: G propagate(G , mapping) 6: for all variable x G do 7: mapping mapping Assum Probe(G , x) 8: G propagate(G , mapping) 9: until G does not change 10: return G

Sign Manipulation Let C be the PB constraint 3x1 4x2 3. One can multiply both sides of the constraint by 1 to form 3x1 + 4x2 3. In addition, one would be able to switch the sign of the coefficient of x2 as follows.

3x1 + 4x2 3 3x1 + 4(1 x2) 3 3x1 4 x2 1

In general, one is able to manipulate the sign of any term coefficient as shown in the example above. We use the above technique to optimize PB constraint compilation approaches which we discuss in later sections.

Propagation Propagation in the Pseudo-Boolean context refers to the simplification of the PB constraints if decisions on some PB variables can be inferred. In particular, one might be able to infer decisions on PB variable xi from PB constraint Cj when the constraint is of either 1) aixi k or 2) aixi = k forms. We defer the details of the Infer Decision algorithm to the Appendix.

Algorithm 3: Assum Probe(G, xi) Input: G - PB formula, xi - assumption variable Output: mapping of variable values 1: temp, mapping [] 2: for all constraint C G[xi 7 1] do 3: temp temp Infer Decision(C) 4: for all constraint C G[xi 7 0] do 5: temp temp Infer Decision(C) 6: for all variable xj, where j = i do 7: if exactly one literal of xj in temp then 8: mapping mapping temp[xj] 9: return mapping

Assumption Probing Assumption probing can be viewed as a weaker form of failed literal probing (Biere, J arvisalo, and Kiesl 2021) as well as single step look ahead propagation process. For an arbitrary variable xi G, where G is the PB formula, assumption probing involves performing propagation and decision inference independently for when xi = 0 and xi = 1. If another variable xj is inferred to have the same value assignment τ[xj] in both cases, then it

can be inferred that xj should be set to τ[xj] in all satisfying assignments of G. Algorithm 3 illustrates the process for a single variable xi, and in the preprocessing stage, we perform assumption probing on all variables in G.

4.2 Pseudo-Boolean Constraint Compilation In this work, we introduce two approaches, namely topdown and bottom-up, to compile each constraint of a PB formula into an ADD. We use T, k, and eq in place of PB constraint C when describing the compilation algorithms. T refers to the term list, which is a list of aixi terms of C. k is the constraint constant and eq indicates if C is = constraint.

Figure 4: An ADD ψ1 representing 3x1 + 4x2 3

Bottom-up ADD Constraint Compilation In order to compile an ADD which represents a PB constraint of the following form Pn i=1 aixi[ , =, ]k, we first start compiling the expression Pn i=1 aixi from literal and constant ADDs as shown by line 3 of Algorithm 4. A constant ADD which represents integer ai is a single leaf node that has value ai. A literal ADD comprises of an internal node, which represents variable x, and true and false leaf nodes, which represent the evaluated values of the literal if x is set to true and false. With the literal and constant ADDs, we use Apply with operator to form ADDs for each term aixi. We use Apply with + operator on term ADDs to form the ADD representing expression Pn i=1 aixi. As an example, the ADD ψ for the expression 3x1 + 4x2 is shown in Figure 1. To account for the inequality or equality, we look at the value of leaf nodes in expression ADD ψ and determine if they satisfy the constraint (lines 4 to 10). We replace the leaf nodes with 1 node if the constraint is satisfied and 0 node otherwise, the resultant ADD is illustrated in Figure 4.

Top-down ADD Constraint Compilation In contrast to the bottom-up ADD compilation approach, the top-down ADD compilation for a given PB constraint involves the ifthen-else (ITE) operation for decision diagrams. We only consider PB constraints that involve = or as mentioned previously. The top-down compilation algorithm (Algorithm 5) makes use of recursive calls of Algorithm 6 to construct an ADD that represents a given PB clause. In particular, Algorithms 5 and 6 work by iterating through the terms of the PB constraint using idx. The algorithms build the sub ADDs when the literal at position idx evaluates to true for the if-then case and otherwise for the else case of the ITE operation while updating the constraint constant k (lines 2-3 of Algorithm 5 and lines 9-10 of Algorithm 6). Notice that the top-down compilation approach allows for early termination when the current k value is negative for k case. However,

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Algorithm 4: compile Constraint Bottom Up(T, k, eq) Input: T - term list, k - constraint value, eq - indicator if constraint is = type Output: ψ - constraint ADD 1: ψ constant ADD(0) 2: for term t in T do 3: ψ += constant ADD(t.coeff) literal ADD(t.literal) 4: for node n in Leaf Node(ψ) do 5: if eq is true & n.value = k then 6: n.value 1 7: else if eq is false & n.value k then 8: n.value 1 9: else 10: n.value 0 11: return ψ

early termination is possible only if all unprocessed coefficients are positive, implying that k in subsequent recursive calls cannot increase. One way would be to sort the term list T in ascending order of term coefficients, processing terms with negative coefficients before positive coefficients.

Algorithm 5: compile Constraint Top Down(T, k, eq) Assumption: T is in ascending order of term coefficients or all coefficients are non-negative Input: T - term list, k - constraint value, eq - indicator if constraint is = type Output: ψ - constraint ADD 1: ψ literal ADD(T[0].literal) 2: ψlo compile TDRecur(T, k, eq, 1) 3: ψhi compile TDRecur(T, k T[0].coeff, eq, 1) 4: ψ.ITE(ψhi, ψlo) 5: return ψ

Optimizations for Bottom-up Compilation In the bottom-up compilation approach, an ADD is built from the individual literal and constant ADDs to represent the expression, before subsequently having leaf node values converted to 1 and 0 depending on if the PB constraint is satisfied. In the process, an ADD could be exponential in size with respect to the number of variables processed. In order to minimize the intermediate ADD during the compilation process, we introduce an optimization for bottom-up compilation. The key idea is to increase the number of shared sub-components of the intermediate ADD, and this amounts to processing the PB constraint terms in a manner that results in fewer distinct subset sums of term coefficients as every distinct subset sum requires a separate leaf node. To this end, we optimize the compilation process by sorting the terms according to the absolute values of their coefficients in ascending order. Subsequently, we manipulate the coefficients, using x = (1 x), of the terms such that alternate terms have coefficients of different signs. We defer the pseudo code to the Appendix.

Algorithm 6: compile TDRecur(T, k, eq, idx) Input: T - term list, k - current constraint value, eq-input constraint equality, idx-index of current term in T Output: ψ - constraint ADD from idx to end of T 1: if T[idx].coeff 0 then 2: is Pos true 3: if eq & is Pos & k < 0 then 4: return constant ADD(0) 5: else if !eq & is Pos & k 0 then 6: return constant ADD(1) 7: else if idx < T.length then 8: ψ literal ADD(T[idx].literal) 9: ψlo compile TDRecur(T, k, eq, idx + 1) 10: ψhi compile TDRecur(T, k T[idx].coeff, eq, idx + 1) 11: return ψ.ITE(ψhi, ψlo) 12: else 13: if eq & k = 0 then 14: return constant ADD(1) 15: else 16: return constant ADD(0)

Optimizations for Top-down Compilation Similarly, we also introduce optimizations for the top-down compilation approach. Recall that one would only be able to perform early termination for PB constraints of the form P aixi k after all negative coefficient terms have been processed. To this end, we manipulate all coefficients to be positive and adjust k accordingly so that early termination is possible. Furthermore, we sort the terms in descending value of the term coefficients as larger coefficients are more likely to satisfy the constraint. We defer the pseudo code to the Appendix.

Algorithm 7: compile Constraint Dynamic(T, k, eq) Input: T - term list, k - constraint value, eq-input constraint equality Output: ψ - constraint ADD Cond 1: T.length 25 and k < 25th percentile of T.coeff Cond 2: k < 25th percentile of T.coeff and unique coefficient rate 0.9 and unique adjacent difference rate 0.85 1: if cond 1 or cond 2 then 2: bottom Up false 3: else 4: bottom Up true 5: if bottom Up then 6: return optimize Compile Bottom Up(T, k, eq) 7: else 8: return optimize Compile Top Down(T, k, eq)

Dynamic Compilation A PB formula can include more than one PB constraint. As we will show in a case study in the experiments section, the choice of compilation approach has a substantial impact on overall runtime. To this end, we introduce a dynamic heuristic (Algorithm 7) to select the appropriate compilation approach and perform optimization of the compilation process as previously discussed. In Algo-

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

rithm 7, we choose top-down compilation if either condition 1 or 2 is met. Conditions 1 and 2 are designed to be in favor of the botttom-up compilation approach, we provide performance analysis in the experiments section.

5 Experiments We performed extensive empirical evaluations to compare the runtime performance of PBCount with state-of-the-art exact model counters. Our empirical evaluation focuses on benchmarks arising from three application domains: sensor placement, auctions, and multi-dimensional knapsack. Through our evaluations and analysis, we sought to answer the following research questions:

RQ 1 How does the runtime performance of PBCount compare to that of the state-of-the-art approaches? RQ 2 How does the dynamic compilation approach impact the runtime performance of PBCount?

Setup We performed our evaluations on machines with AMD EPYC 7713 processors. Each benchmark instance is provided with 1 core, 16GB memory, and a timeout of 3600 seconds. Since all the state-of-the-art exact model counters take CNF as input, we employed the CNF model counters with the help of PB to CNF conversion tool PBLib1 (Philipp and Steinke 2015). We evaluated PBCount against stateof-the-art projected counters: DPMC, D42 and GPMC; D4 and GPMC are among the winners of the Projected counting track at Model Counting Competition 2022 and 2023.

Benchmarks We generated 3473 benchmarks of the following application areas sensor placement, auctions, and multi-dimension knapsack. We detail the benchmark statistics (number of variables and constraints) in the Appendix.

The sensor placement benchmark setting (1473 instances after removal of 0 counts) is adapted from prior work on identifying code sets (Latour, Sen, and Meel 2023). Given a network graph, a maximum number of sensors allowed, count the number of ways to place sensors such that failures in the network are uniquely identifiable. For the auction benchmark setting (1000 instances), we adapt the combinatorial auction setting (Blumrosen and Nisan 2007) to a counting variant. There are m participants and n items, each of which can be shared by one or more participants. Given that each participant has a minimum utility threshold, we count the number of ways the n items can be shared such that all participants achieve their minimum threshold. The utilities are additive and can be negative. For the multi-dimension knapsack benchmark setting (Gens and Levner 1980) (1000 instances), there are n items and constraints on m different features or dimensions of the items in the form of the sum of each dimension should not exceed a given constant. Given such a setting, the goal would be the count the number of subsets of items that satisfies the constraints.

1We used the provided PBEncoder for conversion. 2Binary from Model Counting Competition 2022

5.1 RQ1: Runtime Comparison

0 400 800 1200 1600 2000 Instances completed

Time elapsed (s)

PBCount GPMC DPMC D4

Figure 5: Cactus plot of number of benchmark instances completed by different counters. A point (x, y) on each line plot indicates the corresponding counter completes x number of benchmarks after y seconds has elapsed.

We show the cactus plot of the number of instances completed by each counter out of the 3473 benchmarks in Figure 5. The exact number of instances completed by each counter for each benchmark set is shown in Table 1. Additionally, we provide individual cactus plots for each set of benchmarks in the Appendix.

Benchmarks DPMC D4 GPMC PBCount

Sensor placement 625 566 575 638 M-dim knapsack 81 281 279 503 Auction 76 116 159 372

Total 782 963 1013 1513

Table 1: Number of benchmark instances completed by each counter in 3600s, higher is better.

In sensor placement benchmarks, PBCount count completed 638 instances, narrowly ahead of DPMC (625 instances), and more than D4 (566 instances) and GPMC (575 instances). In multi-dimension knapsack (M-dim knapsack) and auction benchmarks, PBCount significantly outperforms the competing counters. PBCount completed 503 M-dim knapsack instances, around 1.8 that of GPMC (279 instances) and D4 (281 instances), and 6.2 that of DPMC (81 instances). In auction benchmarks, PBCount completed 372 instances, around 2.3 that of GPMC (159 instances), 3.2 of D4 (116 instances), and 4.9 of DPMC (76 instances). Overall, PBCount completed 1513 instances out of 3473 total instances, around 1.5 that of GPMC, 1.6 of D4, and 1.9 that of DPMC. Note that PBCount achieved superior performance with minimal preprocessing over GPMC, which has advanced preprocessing capabilities. Our results demonstrate the significant performance advantages of counting natively for PB formulas and provide an affirmative answer to RQ1.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

5.2 RQ2: Analysis of Compilation Approaches We now focus on the analysis of different compilation approaches: top-down (Algorithm 5), bottom-up (Algorithm 4), and dynamic (Algorithm 7). The results in Table 2 show that for the benchmarks, bottom-up PB constraint compilation outperforms top-down approach significantly in auction and multi-dimension knapsack and to a lesser degree sensor placement. In addition, the evaluation result also highlights that our dynamic compilation heuristic and constraint term optimization closely match the bottomup approach, with the exception of completing 3 fewer instances in auction benchmarks. However, in the 372 auction instances completed by both bottom-up and dynamic approaches, the dynamic approach with term coefficient optimization completes the counting task faster for 257 instances. We show the scatter plot comparison in Figure 6.

Benchmarks Top-down Bottom-up Dynamic

Sensor placement 580 638 638 M-dim knapsack 109 503 503 Auction 158 375 372

Table 2: Number of benchmarks completed by PBCount when employing different compilation strategies, higher number indicates better performance.

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 Bottom up runtime (log10 axis)

Dynamic runtime (log10 axis)

Figure 6: Dynamic vs bottom-up runtime (log10) for auction benchmarks. Points beneath red diagonal line indicate dynamic compilation is faster (257 points), points above otherwise (115 points).

Compilation Approach Performance Case Study We provide an example to highlight the performance impact of the choice of compilation approach. The example involves the following PB formula in Equation 1 with a single constraint that has unique term coefficients:

i=0 2ixi+1 +

i=1 3ixi+13 +

i=1 7ixi+23 k (1)

We vary the value of k in the above PB constraint from 101 to 105 and compare the runtime between top-down and bottom-up compilation approaches in Table 3. Note that bottom-up compilation takes around the same time irrespective of k as there is no early termination. On the other hand for top-down compilation, the PB constraint is easily satisfied when k is small and thus allows for early termination, leading to significant time savings compared to when k is large. Notice that when top-down compilation is unable to terminate early, it is much slower than bottom-up compilation even when all term coefficients are unique.

Approach k value

101 102 103 104 105

Top-down 0.005 0.009 0.228 8.586 46.071 Bottom-up 6.927 7.202 7.198 7.434 6.732

Table 3: Runtime (seconds) to complete model counting for formula in Equation 1. Lower is better

Approach k value

101 102 103 104 105

Top-down 3.325 61.753 60.530 60.881 64.097 Bottom-up 0.005 0.004 0.004 0.004 0.004

Table 4: Runtime (seconds) to complete model counting for formula in Equation 1 with all coefficients set to 1.

As mentioned previously, bottom-up compilation benefits from having large numbers of same term coefficients or collisions in subset sums of coefficients. To this end, we changed all term coefficients of the PB constraint in equation 1 to 1 and compared runtimes in Table 4. We observed around three orders of magnitude reduction in the runtime of the bottom-up compilation approach. In contrast, the topdown approach terminates early only in k = 101 case and requires full enumeration in other cases. In the absence of early termination, top-down compilation approach is much slower than bottom-up compilation approach, and this is reflected in our dynamic compilation heuristic.

6 Conclusion In this work, we introduce the first exact PB model counter, PBCount. PBCount directly compiles PB formulas into ADDs, enabling us to reuse the ADD counting framework in ADDMC. In the design of PBCount, we introduce both top-down and bottom-up PB constraint compilation techniques and highlight the performance differences between them. While we introduced dynamic compilation heuristics to determine the per constraint compilation method and preliminary preprocessing techniques for PB formulas, it would be of interest to develop more advanced heuristics and preprocessing techniques in future works. A strong motivation is PBCount s performance lead over existing CNF model counters. We hope this work will gather more interest in PB formulas and PB model counting.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Acknowledgments

The authors thank Anna L.D. Latour for helping during benchmark generation. The authors thank Arijit Shaw and Jiong Yang for constructive discussions. The authors thank reviewers for providing feedback. This work was supported in part by the Grab-NUS AI Lab, a joint collaboration between Grab Taxi Holdings Pte. Ltd. and National University of Singapore, and the Industrial Postgraduate Program (Grant: S18-1198-IPP-II), funded by the Economic Development Board of Singapore. This work was supported in part by National Research Foundation Singapore under its NRF Fellowship Programme [NRF-NRFFAI12019-0004], Ministry of Education Singapore Tier 2 grant [MOE-T2EP20121-0011], and Ministry of Education Singapore Tier 1 Grant [R-252-000-B59-114]. The computational work for this article was performed on resources of the National Supercomputing Centre, Singapore.

Ab ıo, I.; Nieuwenhuis, R.; Oliveras, A.; and Rodr ıguezcarbonell, E. 2011. BDDs for Pseudo-Boolean Constraints - Revisited. In International Conference on Theory and Applications of Satisfiability Testing. Ab ıo, I.; Nieuwenhuis, R.; Oliveras, A.; and Rodr ıguezcarbonell, E. 2013. A Parametric Approach for Smaller and Better Encodings of Cardinality Constraints. In International Conference on Principles and Practice of Constraint Programming. Aziz, R. A.; Chu, G.; Muise, C.; and Stuckey, P. J. 2015. # SAT: Projected Model Counting. In International Conference on Theory and Applications of Satisfiability Testing. Bacchus, F.; Dalmao, S.; and Pitassi, T. 2003. Algorithms and complexity results for #SAT and Bayesian inference. 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings. Bahar, R. I.; Frohm, E. A.; Gaona, C. M.; Hachtel, G. D.; Macii, E.; Pardo, A.; and Somenzi, F. 1993. Algebraic decision diagrams and their applications. In International Conference on Computer Aided Design. Berre, D. L. 2001. Exploiting the real power of unit propagation lookahead. Electron. Notes Discret. Math., 9: 59 80. Berre, D. L.; Marquis, P.; Mengel, S.; and Wallon, R. 2018. Pseudo-Boolean Constraints from a Knowledge Representation Perspective. In International Joint Conference on Artificial Intelligence. Biere, A.; J arvisalo, M.; and Kiesl, B. 2021. Preprocessing in SAT Solving. In Handbook of Satisfiability. Blumrosen, L.; and Nisan, N. 2007. Algorithmic Game Theory, chapter 11, 267 300. Cambridge University Press. Bryant, R. E. 1986. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, C-35(8): 677 691. Devriendt, J. 2020. Watched Propagation of 0-1 Integer Linear Constraints. In International Conference on Principles and Practice of Constraint Programming.

Devriendt, J.; Gocht, S.; Demirovic, E.; Nordstr om, J.; and Stuckey, P. J. 2021. Cutting to the Core of Pseudo-Boolean Optimization: Combining Core-Guided Search with Cutting Planes Reasoning. In AAAI Conference on Artificial Intelligence. Dowling, W. F.; and Gallier, J. H. 1984. Linear-Time Algorithms for Testing the Satisfiability of Propositional Horn Formulae. J. Log. Program., 1: 267 284. Dudek, J. M.; Phan, V. H. N.; and Vardi, M. Y. 2020a. ADDMC: Weighted Model Counting with Algebraic Decision Diagrams. In AAAI Conference on Artificial Intelligence. Dudek, J. M.; Phan, V. H. N.; and Vardi, M. Y. 2020b. DPMC: Weighted Model Counting by Dynamic Programming on Project-Join Trees. In International Conference on Principles and Practice of Constraint Programming. E en, N.; and Biere, A. 2005. Effective Preprocessing in SAT Through Variable and Clause Elimination. In International Conference on Theory and Applications of Satisfiability Testing. E en, N.; and S orensson, N. 2006. Translating Pseudo Boolean Constraints into SAT. J. Satisf. Boolean Model. Comput., 2: 1 26. Elffers, J.; and Nordstr om, J. 2018. Divide and Conquer: Towards Faster Pseudo-Boolean Solving. In International Joint Conference on Artificial Intelligence. Fan, C.; Miller, K.; and Mitra, S. 2020. Fast and Guaranteed Safe Controller Synthesis for Nonlinear Vehicle Models. Computer Aided Verification, 12224: 629 652. Gens, G.; and Levner, E. 1980. Complexity of approximation algorithms for combinatorial problems: a survey. SIGACT News, 12: 52 65. Jackson, D. 2019. Alloy: a language and tool for exploring software designs. Commun. ACM, 62. Korhonen, T.; and J arvisalo, M. 2021. Integrating Tree Decompositions into Decision Heuristics of Propositional Model Counters. In International Conference on Principles and Practice of Constraint Programming. Lagniez, J.-M.; and Marquis, P. 2017. An Improved Decision-DNNF Compiler. In International Joint Conference on Artificial Intelligence. Lai, Y.; Meel, K. S.; and Yap, R. H. C. 2021. The Power of Literal Equivalence in Model Counting. In AAAI Conference on Artificial Intelligence. Latour, A.; Sen, A.; and Meel, K. 2023. Solving the Identifying Code Set Problem with Grouped Independent Support. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence. Narodytska, N.; Shrotri, A. A.; Meel, K. S.; Ignatiev, A.; and Marques-Silva, J. 2019. Assessing Heuristic Machine Learning Explanations with Model Counting. In International Conference on Theory and Applications of Satisfiability Testing. Philipp, T.; and Steinke, P. 2015. PBLib - A Library for Encoding Pseudo-Boolean Constraints into CNF. In International Conference on Theory and Applications of Satisfiability Testing.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Piette, C.; Hamadi, Y.; and Sais, L. 2008. Vivifying Propositional Clausal Formulae. In European Conference on Artificial Intelligence. Ryosuke Suzuki, K. H.; and Sakai, M. 2017. Improvement of Projected Model-Counting Solver with Component Decomposition Using SAT Solving in Components. In JSAI Technical Report. Sharma, S.; Roy, S.; Soos, M.; and Meel, K. S. 2019. GANAK: A Scalable Probabilistic Exact Model Counter. In Proceedings of International Joint Conference on Artificial Intelligence. Sinz, C. 2005. Towards an Optimal CNF Encoding of Boolean Cardinality Constraints. In International Conference on Principles and Practice of Constraint Programming. Yang, J.; and Meel, K. S. 2021. Engineering an Efficient PB-XOR Solver. In International Conference on Principles and Practice of Constraint Programming.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)