# combining_the__da6252a3.pdf

Combining the k-CNF and XOR Phase-Transitions

Jeffrey M. Dudek

Rice University

Kuldeep S. Meel

Rice University

Moshe Y. Vardi

Rice University

The runtime performance of modern SAT solvers on random k-CNF formulas is deeply connected with the phase-transition phenomenon seen empirically in the satisﬁability of random k-CNF formulas. Recent universal hashing-based approaches to sampling and counting crucially depend on the runtime performance of SAT solvers on formulas expressed as the conjunction of both k-CNF and XOR constraints (known as k-CNF-XOR formulas), but the behavior of random k-CNF-XOR formulas is unexplored in prior work. In this paper, we present the ﬁrst study of the satisﬁability of random k-CNF-XOR formulas. We show empirical evidence of a surprising phase-transition that follows a linear trade-off between k-CNF and XOR constraints. Furthermore, we prove that a phasetransition for k-CNF-XOR formulas exists for k = 2 and (when the number of k-CNF constraints is small) for k > 2.

1 Introduction

The Constraint-Satisfaction Problem (CSP) is one of the most fundamental problem in computer science, with a wide range of applications arising from diverse areas such as artiﬁcial intelligence, programming languages, biology and the like [Apt, 2003; Dechter, 2003]. The problem is, in general, NP-complete, and the study of run-time behavior of CSP techniques is a topic of major interest in AI, cf. [Dechter and Meiri, 1994]. Of speciﬁc interest is the behavior of CSP solvers on random problems [Cheeseman et al., 1991]. Speciﬁcally, a deep connection was discovered between the density (ratio of clauses to variables) of random propositional CNF ﬁxed-width (ﬁxed number of literals per clause) formulas and the runtime behavior of SAT solvers on such formulas [Mitchell et al., 1992; Crawford and Auton, 1993; Kirkpatrick and Selman, 1994]. The key experimental ﬁndings are: (1) as the density of random CNF instances increases, the probability of satisﬁability decreases with a precipitous drop, believed to be a phase-transition, around the

The author list has been sorted alphabetically by last name; this should not be used to determine the extent of authors contributions.

point where the satisﬁability probability is 0.5, and (2) instances at the phase-transition point are particularly challenging for DPLL-based SAT solvers. Indeed, phase-transition instances serve as a source of challenging benchmark problems in SAT competitions [Belov et al., 2014]. The connection between runtime performance and the satisﬁability phasetransition has propelled the study of such phase-transition phenomena over the past two decades [Achlioptas, 2009], including detailed studies of how SAT solvers scale at different densities [Coarfa et al., 2003; Mu and Hoos, 2015].

For random k-CNF formulas, where every clause contains exactly k literals, experiments suggest a speciﬁc phasetransition density, for example, density 4.26 for random 3SAT, but establishing this analytically has been highly challenging [Coja-Oghlan and Panagiotou, 2013], and it has been established only for for k = 2 [Chv atal and Reed, 1992; Goerdt, 1996] and all large enough k [Ding et al., 2015]. A phase-transition phenomenon has also been identiﬁed in random XOR formulas (conjunctions of XOR constraints). Creignou and Daud e [1999] proved a phase-transition at density 1 for variable-width random XOR formulas. Creignou and Daud e [2003] also proved the existence of a phase transition for random l-XOR formulas (where each XOR-clause contains exactly l literals), for l 1, without specifying an exact location for the phase-transition. Dubois and Mandler [2002] independently identiﬁed the location of a phase transition for random 3-XOR formulas. More recently, Pittel and Sorkin [2015] identiﬁed the location of the phase-transition for l-XOR formulas for l > 3.

Despite the abundance of prior work on the phasetransition phenomenon in the satisﬁability of random k-CNF formulas and XOR formulas, no prior work considers the satisﬁability of random formulas with both k-clauses and variable-width XOR-clauses together, henceforth referred as k-CNF-XOR formulas. Recently, successful hashing-based approaches to the fundamental problems of constrained sampling and counting employ SAT solvers to solve k-CNF-XOR formulas [Chakraborty et al., 2013a; 2013b; 2014b; 2016; Meel et al., 2016]. Unlike previous approaches to sampling and counting, hashing-based approaches provide strong theoretical guarantees and scale to real-world instances involving formulas with hundreds of thousands of variables. The scalability of these approaches crucially depends on the runtime performance of SAT solvers in handling k-CNF-XOR

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

formulas [Ivrii et al., 2015]. Moreover, since the phasetransition behavior of k-CNF constraints have been analyzed to explain runtime behavior of SAT solvers [Achlioptas and Coja-Oghlan, 2008], we believe that analysis of the phasetransition phenomenon for k-CNF-XOR formula is the ﬁrst step towards demystifying the runtime behavior of CNF-XOR solvers such as Crypto Mini SAT [Soos et al., 2009] and thus explain the runtime behavior of hashing-based algorithms.

The primary contribution of this work is the ﬁrst study of phase-transition phenomenon in the satisﬁability of random k-CNF-XOR formulas, henceforth referred to as the k-CNFXOR phase-transition. In particular:

1. We present (in Section 3) experimental evidence for a k-

CNF-XOR phase-transition that follows a linear tradeoff between k-CNF clauses and XOR clauses.

2. We prove (in Section 4) that the k-CNF-XOR phase-

transition exists when the ratio of k-CNF clauses to variables is small. This fully characterizes the phasetransition when k = 2.

3. We prove (in Section 4) upper and lower bounds on the

location of the k-CNF-XOR phase-transition region.

4. We conjecture (in Section 5) that the exact location of

a phase-transition for k 3 follows the linear trade-off between k-CNF and XOR clauses seen experimentally.

2 Notations and Preliminaries

Let X = {X1, , Xn} be a set of propositional variables and let F be a formula deﬁned over X. A satisfying assignment or witness of F is an assignment of truth values to the variables in X such that F evaluates to true. Let #F denote the number of satisfying assignments of F. We say that F is satisﬁable (or sat.) if #F > 0 and that F is unsatisﬁable (or unsat.) if #F = 0.

We use Pr(X) to denote the probability of event X. We say that an inﬁnite sequence of random events E1, E2, occurs with high probability (denoted, w.h.p.) if lim

n!1 Pr(En) = 1.

We use E [Y ] and Var [Y ] to denote respectively the expected value and variance of a random variable Y . We use Cov [Y, Z] to denote the covariance of random variables Y and Z. We use ok(1) to denote a term which converges to 0 as k ! 1.

A k-clause is the disjunction of k literals out of {X1, , Xn}, with each variable possibly negated. For ﬁxed positive integers k and n and a nonnegative real number r, let the random variable Fk(n, rn) denote the formula consisting of the conjunction of drne k-clauses, with each clause chosen uniformly and independently from all

2k possible k-clauses over n variables.

The early experiments on Fk(n, rn) [Mitchell et al., 1992; Crawford and Auton, 1993; Kirkpatrick and Selman, 1994] led to the following conjecture:

Conjecture 1 (Satisﬁability Phase-Transition Conjecture). For every integer k 2, there is a critical ratio rk such that:

1. If r < rk, then Fk(n, rn) is satisﬁable w.h.p.

2. If r > rk, then Fk(n, rn) is unsatisﬁable w.h.p.

The Conjecture was quickly proved for k = 2, where r2 = 1 [Chv atal and Reed, 1992; Goerdt, 1996]. In recent work, Ding, Sly, and Sun established the Satisﬁability Phase Transition Conjecture for all sufﬁciently large k [Ding et al., 2015]. The Conjecture has remained elusive for small values of k 3, although values for these critical ratios rk can be estimated experimentally, e.g., r3 seems to be near 4.26.

An XOR-clause over n variables is the exclusive or of either 0 or 1 together with a subset of the variables X1, , Xn. An XOR-clause including 0 (respectively, 1) evaluates to true if and only if an odd (respectively, even) number of the included variables evaluate to true. Note that all k-clauses contain exactly k variables, whereas the number of variables in an XOR-clause is not ﬁxed; a uniformly chosen XOR-clause over n variables contains n

2 variables in expectation. For a ﬁxed positive integer n and a nonnegative real number s, let the random variable Q(n, sn) denote the formula consisting of the conjunction of dsne XOR-clauses, with each clause chosen uniformly and independently from all 2n+1 XOR-clauses over n variables. Creignou and Daude [1999; 2003] proved a phase-transition in the satisﬁability of Q(n, sn): if s < 1 then Q(n, sn) is satisﬁable w.h.p., while if s > 1 then Q(n, sn) is unsatisﬁable w.h.p.

A k-CNF-XOR formula is the conjunction of some number of k-clauses and XOR-clauses. For ﬁxed positive integers k and n and ﬁxed nonnegative real numbers r and s, let the random variable k(n, rn, sn) denote the formula consisting of the conjunction of drne k-clauses and dsne XOR-clauses, with each clause chosen uniformly and independently from all possible k-clauses and XOR-clauses over n variables. (The motivation for using ﬁxed-width clauses and variable-width XOR-clauses comes from the hashing-based approaches to constrained sampling and counting discussed in Section 1.) Although random k-CNF formulas and XOR formulas have been well studied separately, no prior work considers the satisﬁability of random mixed formulas arising from conjunctions of k-clauses and XOR-clauses.

3 Experimental Results To explore empirically the behavior of the satisﬁability of k-CNF-XOR formulas, we built a prototype implementation in Python that employs the Crypto Mini SAT1 [Soos et al., 2009] solver to check satisﬁability of k-CNF-XOR formulas. We chose Crypto Mini SAT due to its ability to handle the combination of k-clauses and XOR-clauses efﬁciently [Chakraborty et al., 2014b; 2014a]. The objective of the experimental setup was to empirically determine the behavior of Pr( k(n, rn, sn) is sat) with respect to r and s, the k-clause and XOR-clause densities respectively, for ﬁxed k and n.

3.1 Experimental Setup We ran 11 experiments with various values of k and n. For k = 2, we ran experiments for n 2 {25, 50, 100, 150}. For k = 3, we ran experiments for n 2 {25, 50, 100}. For k = 4 and k = 5, we ran experiments for n 2 {25, 50}. We were not able to run experiments for values of n signiﬁcantly larger

1http://www.msoos.org/cryptominisat4/

0.0 0.5 1.0 1.5 2.0 r: Density of 2-clauses

s: Density of XOR-clauses

Figure 1: Phase transition for 2-CNF-XOR formulas

than those listed above: at some k-clause and XOR-clause densities, the run-time of Crypto Mini SAT scaled far beyond our computational capabilities.

In each experiment, the XOR-clause density s ranged from 0 to 1.2 in increments of 0.02. Since the location of phasetransition for k CNF depends on k, the range of k-clause density r also depends on k. For k = 3, r ranged from from 0 to 6 in increments of 0.04; for k = 5, r ranged from 0 to 26 in increments of 0.43, and the like.

To uniformly choose a k-clause we uniformly selected without replacement k out of the variables {X1, , Xn}. For each selected variable Xi, we include exactly one of the literals Xi or Xi in the k-clause, each with probability 1/2. The disjunction of these k literals is a uniformly chosen kclause. To uniformly choose an XOR-clause, we include each variable of {X1, , Xn} with probability 1/2 in a set A of variables. Additionally we include in A exactly one of 0 or 1, each with probability 1

2. The exclusive-or of all elements of A is a uniformly chosen XOR-clause. For each assignment of values to k, n, r, and s, we evaluated satisﬁability, using Crypto Mini SAT, of 100 uniformly generated formulas of k(n, rn, sn) by constructing the conjunction of drne kclauses and dsne XOR-clauses, with each clause chosen uniformly and independently as described above. The percentage of satisﬁable formulas gives us an empirical estimate of Pr( k(n, rn, sn) is satisﬁable).

Each experiment was run on a node within a highperformance computer cluster. These nodes contain 12processor cores at 2.83 GHz each with 48 GB of RAM per node. Each formula was given a timeout of 1000 seconds.

3.2 Results

We present scatter plots demonstrating the behavior of satisﬁability of k-CNF-XOR formulas. For lack of space, we present results only for three experiments2. The plots for k = 2, 3 and 5 are shown in Figure 1, 2, and 3 respectively. The value of n is set to 150, 100, and 50 respectively for the three experiments above.

2The data from all experiments is available at http://www.cs.rice. edu/CS/Veriﬁcation/Projects/CUSP/

0 1 2 3 4 5 6 r: Density of 3-clauses

s: Density of XOR-clauses

Figure 2: Phase transition for 3-CNF-XOR formulas

0 5 10 15 20 25 r: Density of 5-clauses

s: Density of XOR-clauses

Figure 3: Phase transition for 5-CNF-XOR formulas

Each ﬁgure is a 2D plot, representing the observed probability that k(n, rn, sn) is satisﬁable as the density of kclauses r and the density of XOR-clauses s varies. The xaxis indicates the density of k-clauses r. The y-axis indicates the density of XOR-clauses s. The dark (respectively, light) regions represent clause densities where almost all (respectively, no) sampled formulas were satisﬁable.

Note that k(n, rn, sn) consists only of XOR clauses when r = 0. Examining the ﬁgures along the line r = 0 the phase-transition location is around (r = 0, s = 1), which matches previous theoretical results on the phase-transition for XOR formulas [Creignou and Daud e, 1999]. Likewise, k(n, rn, 0) = Fk(n, rn) and, by examining the ﬁgures along the line s = 0, we observe phase-transition locations that match previous studies on the phase-transition for k CNF formulas for k = 2, 3, and 5 [Achlioptas, 2009]. Note that the phase-transition we observe for 2-CNF formulas is slightly above the true location at s = 1 [Chv atal and Reed, 1992; Goerdt, 1996]; the correct phase-transition point for 2CNF formulas is observed only when the number of variables is above 4096 [Wilson, 2000].

In all the plots, we observe a large triangular region where the probability that k(n, rn, sn) is satisﬁable is nearly 1. We likewise observe a separate region where the observed

probability that k(n, rn, sn) is satisﬁable is nearly 0. More surprisingly, the shared boundary between the two regions for large areas of the plots seems to be a constant-slope line. A closer examination of this line at the bottom-right corners of the ﬁgures for k = 2 and k = 3, where the k-clause density is large, reveals that the line appears to kink and abruptly change slope. We discuss this further in Section 5.

4 Establishing a Phase-Transition

The experimental results presented in Section 3 empirically demonstrate the existence of a k-CNF-XOR phase-transition. Theorem 1 shows that the k-CNF-XOR phase-transition exists when the density of k-clauses is small. In particular, the function φk(r) (deﬁned in Lemma 3) gives the location of a phase-transition between a region of satisﬁability and a region of unsatisﬁability in random k-CNF-XOR formulas.

Theorem 1. Let k 2. There is a function φk(r), a constant k 1, and a countable set of real numbers Ck (all deﬁned in Lemma 3) such that for all r 2 [0, k)\Ck and s 0:

(a). If s < φk(r), then w.h.p. k(n, rn, sn) is satisﬁable.

(b). If s > φk(r), then w.h.p. k(n, rn, sn) is unsatisﬁable.

Proof. Part (a) follows directly from Lemma 9. Part (b) follows directly from Lemma 14. The proofs of these lemmas are presented in Sections 4.1 and 4.2 respectively.

φk(r) is the free-entropy density of k-CNF, drawing on concepts from spin-glass theory [Gogioso, 2014]. From the expression for φk(r) in Lemma 3, it is easily veriﬁed that φk(0) = 1 and that φk(r) is a monotonically decreasing function of r. Thus when the k-clause density (r) is 0, Theorem 1 says that an XOR-clause density of 1 is a phasetransition for XOR-formulas, matching previously known results [Creignou and Daud e, 1999]. As the k-clause density increases, φ(r) is decreasing and so the XOR-clause density required to reach the phase-transition decreases.

Theorem 1 fully characterizes the random satisﬁability of k(n, rn, sn) when r < 1. In the case k = 2, prior results on random 2-CNF satisﬁability characterize the rest of the region. If r > 1, then F2(n, rn) is unsatisﬁable w.h.p. [Chv atal and Reed, 1992; Goerdt, 1996] and so the 2-clauses within 2(n, rn, sn) are unsatisﬁable w.h.p. without considering the XOR-clauses. Therefore 2(n, rn, sn) is unsatisﬁable w.h.p. if r > 1. This, together with Theorem 1, proves that φ2(r) is the complete location of the 2-CNF-XOR phase-transition.

Moreover, Lemma 4 shows that k (1 ok(1)) 2k ln(k)/k (where ok(1) denotes a term that converges to 0 as k ! 1) and so Theorem 1 shows that a phase-transition exists until near r = 2k ln(k)/k for sufﬁciently large k.

For small k 3, the region r < 1 characterized by Theorem 1 is only a small portion of the region where the subset of k-clauses remains satisﬁable. Moreover, the location of the phase-transition φk(r) given by Theorem 1 is difﬁcult to compute directly. Theorem 2 gives explicit lower and upper bounds on the location of a phase-transition region.

Theorem 2. Let k 3. There is a function b(k, r) (deﬁned in Lemma 5) such that for all s 0 and r 0:

Theorem 2.(a)

Theorem 2.(b)

r: Density of k-clauses

s: Density of XOR-clauses

Satisﬁable w.h.p. Unsatisﬁable w.h.p.

Figure 4: Satisﬁability of k(n, rn, sn) as n ! 1

(a). If s < 1 2 log2( b(k, r)) and r < 2k ln(2) 1

2((k + 1) ln(2) + 3), then w.h.p. k(n, rn, sn) is satisﬁable. (b). If s > r log2(1 2 k) + 1, then w.h.p. k(n, rn, sn) is

unsatisﬁable.

Proof. Part (a) follows directly from Lemma 10. Part (b) follows directly from Lemma 15. The proofs of these lemmas are presented in Sections 4.1 and 4.2 respectively.

Both the upper bound r log2(1 2 k) + 1 and (using the expression for b(k, r) in Lemma 5) the lower bound

1 2 log2( b(k, r)) are linear in r. When the k-clause density r is 0, Theorem 2 agrees with Theorem 1. As the k-clause density increases past (2k), Theorem 2 no longer gives a lower bound on the location of a possible phase-transition.

4.1 A Proof of the Lower Bound We now establish Theorem 1.(a) and Theorem 2.(a), which follow directly from Lemma 9 and Lemma 10 respectively.

The key idea in the proof of these lemmas is to decompose k(n, rn, sn) into independently generated k-CNF and XOR formulas, so that k(n, rn, sn) = Fk(n, rn) Q(n, sn). We can then bound the number of solutions to Fk(n, rn) from below with high probability and bound from below the probability that Fk(n, rn) becomes unsatisﬁable after including XOR-clauses on top of Fk(n, rn).

The following three lemmas achieve the ﬁrst of the two tasks. The ﬁrst, Lemma 3, gives a tight bound on #Fk(n, rn) for small k-clause densities. Lemma 3. Let k 2 and let k be the supremum of {r : 9δ > 0 s.t. Pr(Fk(n, rn) is unsat.) O(1/(log n)1+δ)}. Then k 1. Furthermore, there exists a countable set of real numbers Ck such that for all r 2 [0, k)\Ck:

(a). The sequence 1

n E [log2(#Fk(n, rn)) | Fk(n, rn) is sat.] converges to a limit as n ! 1. Let φk(r) be this limit.

(b). For all > 0, w.h.p. (2φk(r) )n #Fk(n, rn).

(c). For all > 0, w.h.p. (2φk(r)+ )n #Fk(n, rn).

Proof. These proofs are given in [Abbe and Montanari, 2014]. k 1 is given as Remark 2. Part (a) is given as Theorem 3. Parts (b) and (c) are given as Theorem 1.

We abuse notation to let φk(r) denote the limit of the sequence in Lemma 3.(a) for all r > 0, although a priori this sequence may not converge for r k. Later work reﬁned the value of k in Lemma 3 for sufﬁciently large k and so extended the tight bound on #Fk(n, rn). In particular, Lemma 4 implies that k (1 ok(1)) 2k ln(k)/k. Lemma 4. Let k 2. For all r 0, if r (1 ok(1)) 2k ln(k)/k then Pr(Fk(n, rn) is sat.) 1 O(1/n).

Proof. The proof of this is given as Theorem 1.3 of [Coja Oghlan and Reichman, 2013].

It is difﬁcult to compute φk(r) directly. Instead, Lemma 5 provides a weaker but explicit lower bound on #Fk(n, rn). Lemma 5. Let k 3, > 0, and r 0. Let βk be the smallest positive solution to βk(2 βk)k 1 = 1 and deﬁne b(k, r) = 4(((1 βk/2)k 2 k)2/(1 βk)k)r.

If r < 2k ln(2) 1

2((k + 1) ln(2) + 3), then w.h.p. 1 2( b(k, r) )n/2 #Fk(n, rn).

Proof. The proof of this is given on page 264 of [Achlioptas et al., 2011] within Section 6 (Proof of Theorem 6); the deﬁnition of b(k, r) is given as equation (20).

Before we analyze how the solution space of Fk(n, rn) interacts with the solution space of Q(n, sn), we must characterize the solution space of Q(n, sn). The following lemma shows that the solutions of Q(n, sn) are pairwise independent, meaning that a single satisfying assignment of Q(n, sn) gives no information on other satisfying assignments. Lemma 6. Let n 0 and s 0. If σ and σ0 are distinct assignments of truth values to the variables {X1, , Xn}:

(a). Pr(σ satisﬁes Q(n, sn)) = 2 dsne

(b). Pr(σ satisﬁes Q(n, sn) | σ0 satisﬁes Q(n, sn)) = 2 dsne

Proof. The proof of this is given in the proof of Lemma 1 of [Gomes et al., 2007].

The following lemma bounds from below the probability that a formula H (in Lemma 8 we take H = Fk(n, rn)) remains satisﬁable after including XOR-clauses on top of H. This result and proof is similar to Corollary 3 from [Gomes et al., 2006]. Lemma 7. Let 1, s 0, n 0, and let H be a formula deﬁned over {X1, , Xn}. Then Pr(H Q(n, sn) is satisﬁable | #H 2dsne+ ) 1 2 .

Proof. Let R be the set of all truth assignments to the variables in X that satisfy H; there are #H such truth assignments. For every truth assignment σ 2 R, let Yσ be a 0-1 random variable that is 1 if σ satisﬁes H Q(n, sn) and 0 otherwise. Note that Var [Yσ] = E

. Since Yσ is a 0-1 random variable, Y 2

σ = Yσ and thus Var [Yσ] E [Yσ].

Let σ and σ0 be distinct truth assignments in R. By Lemma 6, E [YσYσ0] = E [Yσ] E [Yσ0] = 2 dsne 2 dsne. Thus Cov [Yσ, Yσ0] = E [YσYσ0] E [Yσ] E [Yσ0] = 0.

Let the the random variable Y be the number of solutions to H Q(n, sn), so Y = #(H Q(n, sn)) = P

σ Yσ. Thus Var [Y ] = Var [P

σ Var [Yσ] + P

σ6=σ0 Cov [Yσ, Yσ0]. Since the covariance of Yσ and Yσ0 is 0 for all pairs of distinct truth assignments σ and σ0 in R, we get that Var [Y ] = P

σ Var [Yσ]. Since Var [Yσ] E [Yσ] for all truth assignments σ in R, we get that Var [Y ] P

σ E [Yσ]. Since E [Y ] = E [P

σ E [Yσ], we conclude that Var [Y ] E [Y ]. Moreover, since E [Yσ] = Pr(σ satisﬁes Q(n, sn)) = 2 dsne we get E [Y ] = #H 2 dsne.

Let the event En denote that H Q(n, sn) is unsatisﬁable. Thus if En occurs then then Y = 0 and so |Y E [Y ]| E [Y ]. This implies that Pr( En) Pr(|Y E [Y ]| E [Y ]). Chebyshev s inequality says that Pr(|Y E [Y ]| E [Y ]) Var [Y ] / E [Y ]2. It follows that Pr( En) Var [Y ] / E [Y ]2. Since Var [Y ] E [Y ], we get that Pr( En) E [Y ] 1. Therefore by plugging in the value for E [Y ] we get Pr( En) (#H) 1 2dsne.

Finally, if we assume that #H 2dsne+ then (#H) 1 2dsne 2 . Therefore Pr( En | #H 2dsne+ ) 2 .

Using the key behavior of XOR-clauses described in Lemma 7, we can transform lower bounds (w.h.p.) on the number of solutions to Fk(n, rn) into lower bounds on the location of a possible k-CNF-XOR phase-transition. Lemma 8. Let k 2, s 0, and r 0. Let B1, B2, be an inﬁnite convergent sequence of positive real numbers such that Bn

i #Fk(n, rn) occurs w.h.p. for all i 1. If s < log2(limi!1 Bi), then w.h.p. k(n, rn, sn) is satisﬁable.

Proof. For all integers n 0, let the event En denote the event when k(n, rn, sn) is satisﬁable. We would like to show that Pr(En) converges to 1 as n ! 1.

The general idea of the proof follows. We ﬁrst decompose k(n, rn, sn) as k(n, rn, sn) = Fk(n, rn) Q(n, sn). Let the event Ln denote the event when the number of solutions of Fk(n, rn) is bounded from below (by a lower bound to be speciﬁed later). We show that Ln occurs w.h.p.. Next, we use Lemma 7 to bound from below the probability that Fk(n, rn) Q(n, sn) remains satisﬁable given that Fk(n, rn) has enough solutions; we use this to show that Pr(En | Ln) converges to 1 as n ! 1. Finally, we combine these results to prove that Pr(En) converges to 1.

Since 2s < limi!1 Bi, there is some integer i 1 such that 2s < Bi. Deﬁne the event Ln as the event when #Fk(n, rn) Bn

i . Then Ln occurs w.h.p. by hypothesis. Next, we show that Pr(En | Ln) converges to 1. Choose δ > 0 and N > 0 such that 2s+δ+1/N < Bi; we can always ﬁnd sufﬁciently small δ and sufﬁciently large N such that this holds. Since we are concerned only with the behavior of Pr(En | Ln) in the limit, we can restrict our attention only to large enough n. In particular, consider n > 2N. Then we get that 2sn+δn+2 < Bn

i and so 2dsne+δn+1 < Bn

δn + 1, so that 2dsne+ Bn

i . Then Lemma 7 says that Pr(En | Ln) 1 2 δn 1. Since 1 2 δn 1 converges to 1 as n ! 1, Pr(En | Ln) must also converge to 1.

Thus both Pr(En | Ln) and Pr(Ln) converge to 1 as n ! 1. Since Pr(En \ Ln) = Pr(En | Ln) Pr(Ln), this implies that Pr(En \ Ln) also converges to 1. Since Pr(En \ Ln) Pr(En) 1, this implies that Pr(En) converges to 1.

Finally, it remains only to use Lemma 8 to obtain bounds on the k-CNF-XOR phase-transition. The tight lower bound on #Fk(n, rn) from Lemma 3.(b) corresponds to a tight lower bound on the location of the phase-transition.

Lemma 9. Let k 2, and let k, Ck, and φk(r) be as deﬁned in Lemma 3. For all r 2 [0, k)\Ck and s 2 [0, φk(r)), k(n, rn, sn) is satisﬁable w.h.p..

Proof. Let Bi = 2φk(r) 1/i. By Lemma 3.(b), Bn

i #Fk(n, rn) w.h.p. for all i 1. Furthermore, limi!1 Bi = 2φk(r) and so s < log2(limi!1 Bi). Thus k(n, rn, sn) is satisﬁable w.h.p. by Lemma 8.

The weaker lower bound on #Fk(n, rn) from Lemma 5 corresponds to a weaker lower bound on the location of the phase-transition.

Lemma 10. Let k 3, s 0, and r 0. If r < 2k ln(2) 1

2(k + 1) ln(2) + 3

2 and s < 1 2 log2( b(k, r)), then k(n, rn, sn) is satisﬁable w.h.p..

Proof. Let Bi = ( b(k, r) 1/i)1/2. This is an increasing sequence in i, so log2(Bi+1/Bi) is positive for all i 1. Consider one such i 1 and deﬁne Ni = 1/ log2(Bi+1/Bi). Then for all n > Ni it follows that 21/n < Bi+1/Bi and so Bn

i+1. By Lemma 5, 1

i+1 #Fk(n, rn) w.h.p. and therefore Bn

i+1 #Fk(n, rn) w.h.p. as well. Furthermore, limi!1 Bi = b(k, r)1/2 and so s < log2(limi!1 Bi). Thus k(n, rn, sn) is satisﬁable w.h.p. by Lemma 8.

4.2 A Proof of the Upper Bound We now establish Theorem 1.(b) and Theorem 2.(b), which follow directly from Lemma 14 and Lemma 15 respectively.

Similar to Section 4.1, the key idea in the proof of these lemmas is to decompose k(n, rn, sn) into independently generated k-CNF and XOR formulas, so that k(n, rn, sn) = Fk(n, rn) Q(n, sn). We can then bound the number of solutions to Fk(n, rn) from above with high probability and bound from below the probability that Fk(n, rn) becomes unsatisﬁable after including XORclauses on top of Fk(n, rn).

The ﬁrst of these two tasks is accomplished through Lemma 3.(c), which gives a tight upper bound on #Fk(n, rn) for small k-clause densities, and by Lemma 11, which gives a weaker explicit upper bound on #Fk(n, rn).

Lemma 11. For all > 1, k 2, and r 0, w.h.p. #Fk(n, rn) < (2 (1 2 k)r)n.

Proof. Let X = #Fk(n, rn). For a random assignment on n variables σ, note that Pr(σ satisﬁes Fk(n, 1)) = (1 2 k).

Since the drne k-clauses of Fk(n, rn) were chosen independently, this implies that E [X] = 2n(1 2 k)drne.

By Markov s inequality, we get Pr(X n E [X]) E [X] /( n E [X]) = n. Since 1 2 k < 1 and so 2n n(1 2 k)drne 2n n(1 2 k)rn, it follows that Pr(X n2n(1 2 k)rn) Pr(X n E [X]) n. Thus lim

n!1 Pr(X < n2n(1 2 k)rn) = 1.

The following lemma bounds from below the probability that a formula H (in Lemma 13 we take H = Fk(n, rn)) remains satisﬁable after including XOR-clauses on top of H. This result and proof is similar to Corollary 1 from [Gomes et al., 2006].

Lemma 12. Let 1, s 0, n 0, and let H be a formula deﬁned over X = {X1, , Xn}. Then Pr(H Q(n, sn) is unsatisﬁable | #H 2dsne ) 1 2 .

Proof. Let the random variable Y denote #(H Q(n, sn)) as in Lemma 7. Markov s inequality implies that Pr(Y 1) E [Y ]. Recall from Lemma 7 that E [Y ] = #H 2 dsne, so Pr(Y 1) #H 2 dsne. If #H 2dsne , then #H 2 dsne 2 . Thus Pr(Y 1 | #H 2dsne ) 2 . Since H Q(n, sn) is unsatisﬁable exactly when Y = 0, we conclude Pr(H Q(n, sn) is unsatisﬁable) 1 2 .

Using the key behavior of XOR-clauses described in Lemma 12, we can transform upper bounds (w.h.p.) on the number of solutions to Fk(n, rn) into upper bounds on the location of a possible k-CNF-XOR phase-transition.

Lemma 13. Let k 2, s 0, and r 0. Let B1, B2, be an inﬁnite convergent sequence of positive real numbers such that #Fk(n, rn) Bn

i occurs w.h.p. for all i 1. If s > log2(limi!1 Bi), then w.h.p. k(n, rn, sn) is unsatisﬁable.

Proof. For all integers n 0, let the event En denote the event when k(n, rn, sn) is unsatisﬁable. We would like to show that Pr( En) converges to 1 as n ! 1.

The general idea of the proof follows. Note that k(n, rn, sn) = Fk(n, rn) Q(n, sn) as in Lemma 8. Let the event Un denote the event when the number of solutions of Fk(n, rn) is bounded from above (by an upper bound to be speciﬁed later). We show that Un occurs w.h.p.. Next, we use Lemma 12 to bound from below the probability that Fk(n, rn) Q(n, sn) becomes unsatisﬁable given that Fk(n, rn) has few solutions; we use this to show that Pr( En | Un) converges to 1 as n ! 1. Finally, we combine these results to prove that Pr( En) converges to 1.

Since 2s > limi!1 Bi, there is some integer i 1 such that 2s > Bi. Deﬁne the event Un as the event when #Fk(n, rn) Bn

i . Then Un occurs w.h.p. by hypothesis. Next, we show that Pr( En | Un) converges to 1. Choose δ > 0 and N > 0 such that 2s δ 1/N > Bi. As in Lemma 8 we are concerned only with the behavior of Pr( En | Un) in the limit so we can restrict our attention only to large enough n. In particular, consider n > N. Then we get that 2dsne δn 1 > 2sn δn n/N > Bn

i . Let = δn + 1, so that 2dsne Bn

i . Then Lemma 12 says that Pr( En | Un)

1 2 δn 1. Since 1 2 δn 1 converges to 1 as n ! 1, Pr( En | Un) must also converge to 1.

Thus both Pr( En | Un) and Pr(Un) converge to 1 as n ! 1. Since Pr( En \ Un) = Pr( En | Un) Pr(Un), this implies that Pr( En \ Un) also converges to 1. Since Pr( En \ Un) Pr( En) 1, this implies that Pr( En) converges to 1.

Finally, it remains only to use Lemma 13 to obtain bounds on the k-CNF-XOR phase-transition. The tight upper bound on #Fk(n, rn) from Lemma 3.(c) corresponds to a tight upper bound on the location of the phase-transition.

Lemma 14. Let k 2, and let k, Ck, and φk(r) be as deﬁned in Lemma 3. Then for all r 2 [0, k)\Ck and s > φk(r), k(n, rn, sn) is unsatisﬁable w.h.p..

Proof. Let Bi = 2φk(r)+1/i. By Lemma 3.(c), Bn

i #Fk(n, rn) w.h.p. for all i 1. Furthermore, limi!1 Bi = 2φk(r) and so s > log2(limi!1 Bi). Thus k(n, rn, sn) is unsatisﬁable w.h.p. by Lemma 13.

The weaker upper bound on #Fk(n, rn) from Lemma 11 corresponds to a weaker upper bound on the phase-transition.

Lemma 15. Let k 2, s 0, and r 0. If s > 1 + r log2(1 2 k), then k(n, rn, sn) is unsatisﬁable w.h.p..

Proof. Let Bi = ((1 + 1/i) 2(1 2 k)r). By Lemma 11, Bn

i #Fk(n, rn) w.h.p. for all i 1. Furthermore, limi!1 Bi = 2(1 2 k)r and so s > log2(limi!1 Bi). Thus k(n, rn, sn) is unsatisﬁable w.h.p. by Lemma 13.

5 Extending the Phase-Transition Region

Section 4 proved that a phase-transition exists for k-CNFXOR formulas when the k-clause density is small. Our empirical observations in Section 3 suggest that a phase-transition exists for higher k-clause densities as well. In this section, we conjecture two possible extensions to our theoretical results.

The ﬁrst extension follows from Theorem 1, which implies that s = φk(r) gives the location of the phase-transition for small k-clause densities. It is thus natural to conjecture that φk(r) gives the location of the k-CNF-XOR phase-transition for all (except perhaps countably many) r > 0. This would follow from a conjecture of [Abbe and Montanari, 2014].

The second extension follows from the experimental results in Section 3, which suggest that the location of the phase-transition follows a linear trade-off between k-clauses and XOR-clauses. This leads to the following conjecture:

Conjecture 2 (k-CNF-XOR Linear Phase-Transition Conjecture). Let k 2. Then there exists a slope Lk < 0 and a constant

k > 0 such that for all r 2 [0,

k) and s 0:

(a). If s < r Lk + 1, then w.h.p. k(n, rn, sn) is satisﬁable.

(b). If s > r Lk+1, then w.h.p. k(n, rn, sn) is unsatisﬁable.

Theorem 2 bounds the possible values for Lk. Moreover, if the Linear k-CNF-XOR Phase-Transition Conjecture holds, then Theorem 1 implies that φk(r) is linear for all r < k and r <

k. Explicit computations of φk(r) (or sufﬁciently tight bounds) would resolve this conjecture.

Note that this conjecture does not necessarily describe the entire k-CNF-XOR phase-transition; a phase-transition may exist when r >

k as well. The experimental results in Section 3 for k = 2 and k = 3 suggest that the location of the phase-transition may kink and become non-linear for large enough k-clause densities. We leave the full characterization of the k-CNF-XOR phase-transition for future work, noting that a full characterization would resolve the Satisﬁability Phase-Transition Conjecture.

6 Conclusion

We presented the ﬁrst study of phase-transition phenomenon in the satisﬁability of k-CNF-XOR random formulas. We showed that the free-entropy density φk(r) of k-CNF formulas gives the location of the phase-transition for k-CNF-XOR formulas when the density of the k-CNF clauses is small. We conjectured in the k-CNF-XOR Linear Phase-Transition Conjecture that this phase-transition is linear. We leave further analysis and proof of this conjecture for future work.

Pittel and Sorkin [Pittel and Sorkin, 2015] recently identiﬁed the location of the phase-transition for random -XOR formulas, where each clause contains exactly literals. This suggests that a phase-transition may also exist in formulas that mix k-CNF clauses together with -XOR clauses.

In this work we did not explore the runtime of SAT solvers over the space of k-CNF-XOR formulas. Historically, other phase-transition phenomena have been closely connected empirically to solver runtime. Developing this connection in the case of k-CNF-XOR formulas is an exciting direction for future research and may lead to practical improvements to hashing-based sampling and counting algorithms.

Acknowledgments The authors would like to thank Dimitris Achlioptas for helpful discussions in the early stages of this project.

Work supported in part by NSF grants CCF-1319459 and IIS-1527668, by NSF Expeditions in Computing project Ex CAPE: Expeditions in Computer Augmented Program Engineering , by BSF grant 9800096, by the Ken Kennedy Institute Computer Science & Engineering Enhancement Fellowship funded by the Rice Oil & Gas HPC Conference, and Data Analysis and Visualization Cyberinfrastructure funded by NSF under grant OCI-0959097. Part of this work was done during a visit to the Israeli Institute for Advanced Studies.

References [Abbe and Montanari, 2014] E. Abbe and A. Montanari. On

the concentration of the number of solutions of random satisﬁability formulas. Random Structures & Algorithms, 45(3):362 382, 2014. [Achlioptas and Coja-Oghlan, 2008] D. Achlioptas and A. Coja-Oghlan. Algorithmic barriers from phase transitions. In Proc. of Fo CS, pages 793 802, 2008. [Achlioptas et al., 2011] D. Achlioptas, A. Coja-Oghlan,

and F. Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. Random Structures & Algorithms, 38(3):251 268, 2011.

[Achlioptas, 2009] D. Achlioptas. Random satisﬁability. In

Handbook of Satisﬁability, pages 245 270. IOS Press, 2009. [Apt, 2003] K. Apt. Principles of Constraint Programming.

Cambridge Univ. Press, 2003. [Belov et al., 2014] A. Belov, D. Diepold, M. J. Heule, and

M. J arvisalo. Sat competition 2014. 2014. [Chakraborty et al., 2013a] S. Chakraborty, K. S. Meel, and

M. Y. Vardi. A scalable and nearly uniform generator of SAT witnesses. In Proc. of CAV, pages 608 623, 2013. [Chakraborty et al., 2013b] S. Chakraborty, K. S. Meel, and

M. Y. Vardi. A scalable approximate model counter. In Proc. of CP, pages 200 216, 2013. [Chakraborty et al., 2014a] S. Chakraborty, D. J. Fremont,

K. S. Meel, S. A. Seshia, and M. Y. Vardi. Distributionaware sampling and weighted model counting for SAT. In Proc. of AAAI, pages 1722 1730, 2014. [Chakraborty et al., 2014b] S. Chakraborty, K. S. Meel, and

M. Y. Vardi. Balancing scalability and uniformity in SAT witness generator. In Proc. of DAC, pages 1 6, 2014. [Chakraborty et al., 2016] S. Chakraborty, K. S. Meel, and

M. Y. Vardi. Algorithmic improvements in approximate counting for probabilistic inference: From linear to logarithmic SAT calls. In Proc. of IJCAI, 2016. [Cheeseman et al., 1991] P. Cheeseman, B. Kanefsky, and

W. M. Taylor. Where the really hard problems are. In Proc. of IJCAI, pages 331 340, 1991. [Chv atal and Reed, 1992] V. Chv atal and B. Reed. Mick gets

some (the odds are on his side). In Proc. of Fo CS, pages 620 627, 1992. [Coarfa et al., 2003] C. Coarfa, D. D. Demopoulos, A. S. M.

Aguirre, D. Subramanian, and M. Y. Vardi. Random 3-sat: The plot thickens. Constraints, 8(3):243 261, 2003. [Coja-Oghlan and Panagiotou, 2013] A. Coja-Oghlan and

K. Panagiotou. Going after the k-sat threshold. In Proc. of STo C, pages 705 714, 2013. [Coja-Oghlan and Reichman, 2013] A. Coja-Oghlan and D. Reichman. Sharp thresholds and the partition function. In Journal of Physics: Conference Series, volume 473, page 012015, 2013. [Crawford and Auton, 1993] J. M. Crawford and L. D. Au-

ton. Experimental results on the crossover point in satisﬁability problems. In Proc. of AAAI, pages 21 27, 1993. [Creignou and Daud e, 1999] N. Creignou and H. Daud e.

Satisﬁability threshold for random xor-cnf formulas. Discrete Applied Mathematics, 9697:41 53, 1999. [Creignou and Daud e, 2003] N. Creignou and H. Daud e.

Smooth and sharp thresholds for random k-xor-cnf satisﬁability. RAIRO:ITA, 37(2):127 147, 2003. [Dechter and Meiri, 1994] R. Dechter and I. Meiri. Experimental evaluation of preprocessing algorithms for constraint satisfaction problems. Artiﬁcial Intelligence, 68:211 241, 1994.

[Dechter, 2003] R. Dechter. Constraint Processing. Morgan

Kaufmman, 2003. [Ding et al., 2015] J. Ding, A. Sly, and N. Sun. Proof of the

satisﬁability conjecture for large k. In Proc. of STo C, pages 59 68. ACM, 2015. [Dubois and Mandler, 2002] O. Dubois and J. Mandler. The

3-xorsat threshold. Comptes Rendus Mathematique, 335(11):963 966, 2002. [Goerdt, 1996] A. Goerdt. A threshold for unsatisﬁability.

Journal of Computer and System Sciences, 53(3):469 486, 1996. [Gogioso, 2014] S. Gogioso. Aspects of statistical physics in

computational complexity. ar Xiv:1405.3558, 2014. [Gomes et al., 2006] C. P. Gomes, A. Sabharwal, and B. Sel-

man. Model counting: A new strategy for obtaining good bounds. In Proc. of AAAI, volume 21, pages 54 61, 2006. [Gomes et al., 2007] C. P. Gomes, A. Sabharwal, and B. Sel-

man. Near-uniform sampling of combinatorial spaces using XOR constraints. In Proc. of NIPS, pages 670 676, 2007. [Ivrii et al., 2015] A. Ivrii, S. Malik, K. Meel, and M. Vardi.

On computing minimal independent support and its applications to sampling and counting. Constraints, pages 1 18, 2015. [Kirkpatrick and Selman, 1994] S. Kirkpatrick and B. Sel-

man. Critical behavior in the satisﬁability of random boolean expressions. Science, 264(5163):1297 1301, 1994. [Meel et al., 2016] K. S. Meel, M. Vardi, S. Chakraborty,

D. J. Fremont, S. A. Seshia, D. Fried, A. Ivrii, and S. Malik. Constrained sampling and counting: Universal hashing meets sat solving. In Proc. of Beyond NP Workshop, 2016. [Mitchell et al., 1992] D. Mitchell, B. Selman, and H. Levesque. Hard and easy distributions of sat problems. In Proc. of AAAI, pages 459 465, 1992. [Mu and Hoos, 2015] Z. Mu and H. H. Hoos. On the empiri-

cal time complexity of random 3-sat at the phase transition. In Proc. of IJCAI, pages 367 373, 2015. [Pittel and Sorkin, 2015] B. Pittel and G. B. Sorkin. The sat-

isﬁability threshold for k-xorsat. Combinatorics, Probability and Computing, First View:1 33, 10 2015. [Soos et al., 2009] M. Soos, K. Nohl, and C. Castelluccia.

Extending SAT Solvers to Cryptographic Problems. In Proc. of SAT. Springer-Verlag, 2009. [Wilson, 2000] D. B. Wilson. Random 2-sat data. http://dbwilson.com/2sat-data/, 2000.