# constraint_optimization_over_semirings__2f6f09dc.pdf

Constraint Optimization over Semirings

A. Pavan 1, r Kuldeep S. Meel2, r N. V. Vinodchandran 3, r Arnab Bhattacharyya2

1 Iowa State University 2 National University of Singapore 3 University of Nebraska-Lincoln

Interpretations of logical formulas over semirings (other than the Boolean semiring) have applications in various areas of computer science including logic, AI, databases, and security. Such interpretations provide richer information beyond the truth or falsity of a statement. Examples of such semirings include Viterbi semiring, min-max or access control semiring, tropical semiring, and fuzzy semiring.

The present work investigates the complexity of constraint optimization problems over semirings. The generic optimization problem we study is the following: Given a propositional formula ϕ over n variable and a semiring (K, +, , 0, 1), ﬁnd the maximum value over all possible interpretations of ϕ over K. This can be seen as a generalization of the well-known satisﬁability problem (a propositional formula is satisﬁable if and only if the maximum value over all interpretations/assignments over the Boolean semiring is 1). A related problem is to ﬁnd an interpretation that achieves the maximum value. In this work, we ﬁrst focus on these optimization problems over the Viterbi semiring, which we call opt Conf Val and opt Conf.

We ﬁrst show that for general propositional formulas in negation normal form, opt Conf Val and opt Conf are in FPNP. We then investigate opt Conf when the input formula ϕ is represented in the conjunctive normal form. For CNF formulae, we ﬁrst derive an upper bound on the value of opt Conf as a function of the number of maximum satisﬁable clauses. In particular, we show that if r is the maximum number of satisﬁable clauses in a CNF formula with m clauses, then its opt Conf value is at most 1/4m r. Building on this we establish that opt Conf for CNF formulae is hard for the complexity class FPNP[log]. We also design polynomial-time approximation algorithms and establish an inapproximability for opt Conf Val. We establish similar complexity results for these optimization problems over other semirings including tropical, fuzzy, and access control semirings.

The authors decided to forgo the old convention of alphabetical ordering of authors in favor of a randomized ordering, denoted by r . The publicly veriﬁable record of the randomization is available at https://www.aeaweb.org/journals/policies/random-author-order/ search Copyright c 2023, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

1 Introduction

Classically, propositional formulae are interpreted over the Boolean semiring B = ({F, T}, , , F, T) which is the standard semantics for the logical truth. In this setting, the variables take one of the two values T (true) or F (false). However, it is natural to extend the semantics to other semirings. Here, the idea is to interpret logical formulae when the variables take values over a semiring K = (K, +, , 0, 1). Such interpretations provide richer information beyond the truth or falsity of a statement and have applications in several areas such as databases, AI, logic, and security (see (Imieli nski and Lipski Jr 1989; Fuhr and R olleke 1997; Zim anyi 1997; Cui, Widom, and Wiener 2000; Cui 2002; Gr adel and Tannen 2020) and references therein). In particular, semiring provenance analysis has been successfully applied in several software systems, such as Orchestra and Propolis (see, e.g., (Amsterdamer, Deutch, and Tannen 2011; Deutch et al. 2014; Foster, Green, and Tannen 2008; Green 2011; Tannen 2013)). Examples of semirings that are studied in the literature include Viterbi semiring, fuzzy semiring, min-max or access control semiring, and tropical semiring. Semantics over the Viterbi semiring V = ([0, 1], max, , 0, 1) has applications in database provenance, where x [0, 1] is interpreted as a conﬁdence score (Gr adel and Tannen 2020; Green, Karvounarakis, and Tannen 2007; Tannen 2017; Gr adel and Mrkonjic 2021), in probabilistic parsing, in probabilistic CSPs, and in Hidden Markov Models (Viterbi 1967; Klein and Manning 2003; Bistarelli, Montanari, and Rossi 1995). The access control semiring can be used as a tool in security speciﬁcations (Gr adel and Tannen 2020). Other semirings of interest include the tropical semiring, used in cost analysis and algebraic formulation for shortest path algorithms (Mohri 2002), and fuzzy semirings used in the context of fuzzy CSPs (Bistarelli, Montanari, and Rossi 1995). Optimization problems over Boolean interpretations have been central in many application as well as foundation areas. Indeed, the classical satisﬁability problem is determining whether a formula φ(x1, , xn) has an interpretation/assignment over the Boolean semiring that evaluates to True. Even though semiring semantics naturally appear in a variety of applications, the optimization problems over semirings, other than the Boolean semiring, have not received

The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

much attention. In this work, we introduce and investigate the complexity of optimization problems over semiring semantics. Let K = (K, +, , 0, 1) be a semiring with a total order over K and ϕ be a propositional formula over a set X of variables. A K-interpretation π is a function from X to K. Such an interpretation can be naturally extended to formula ϕ, which we denote by Sem(ϕ, π). We study the following computational problem: Given a propositional formula ϕ in negation normal form over a set X of variables, compute the maximum value of Sem(ϕ, π) over all possible interpretations π. We call this problem opt Sem Val. A related problem, denoted opt Sem, is to compute an interpretation π that maximizes Sem(ϕ, π). Refer to Section 2 for a precise formulation of these problems. There has been a rich history of work which formulated the notion of CSP over semirings and investigated local consistency algorithms in the general framework (Bistarelli 2004; Bistarelli and Gadducci 2006; Bistarelli, Montanari, and Rossi 1995, 1997; Bistarelli et al. 1999; Meseguer, Rossi, and Schiex 2006). These works did not involve interpretations and did not focus on the computational complexity of the above-deﬁned problems. Relatedly, the computational complexity of sum-of-product problems over semirings has been studied recently (Eiter and Kiesel 2021). However, the problems they study are different from ours. To the best of our knowledge, optimization problems opt Sem and opt Sem Val that we consider over semirings have not been studied earlier and there are no characterizations of their computational complexity.

1.1 Our Results We comprehensively study the computational complexity of opt Sem and the related problem opt Sem Val over various semirings such as Viterbi semiring, tropical semiring, access control semiring and fuzzy semiring, from both an algorithmic and a complexity-theoretic viewpoint. When the underlying semiring is the Viterbi semiring, we call these problems opt Conf and opt Conf Val. Our results can be summarized as follows: 1. We establish that both opt Conf and opt Conf Val are in the complexity class FPNP. The crucial underlying observation is that even though π maps X to real values in the range [0, 1]; the solution to opt Conf Val can be represented using polynomially many bits. We then draw upon connections to Farey sequences to derive an algorithm with polynomially many NP calls (Theorem 3.2). 2. For CNF formulas, we establish an upper bound on opt Conf Val as a function of the number of maximum satisﬁable clauses (Theorem 3.7). 3. We also establish a lower bound on the complexity of opt Conf Val and opt Conf. In particular, we show that both the problems are hard for the complexity class FPNP[log]. To this end, we demonstrate a reduction from Max SATVal to opt Conf Val; this reduction crucially relies on the above-mentioned upper bound on opt Conf Val in terms of the number of maximum satisﬁable clauses (Theorem 3.9).

4. We design a polynomial-time approximation algorithm for opt Conf Val and establish an inapproximability result. In particular, for 3-CNF formulas with m clauses, we design a 0.716m-approximation algorithm and show that the approximation factor can not be improved to 0.845m unless P = NP (Theorems 4.3 and 4.4). 5. Finally, we show that for the access control semiring, the complexity of these optimization problems is equivalent to the corresponding problems over Boolean semiring (Theorem 5.3).

Remark 1. Since Viterbi semiring and tropical semiring are isomorphic via the mapping x ln x, results established for Viterbi semiring also hold for the tropical semiring. Fuzzy semiring can be seen as an inﬁnite reﬁnement of access control semiring with the same algebraic structure, results that we establish for access control semiring also hold for fuzzy semiring.

Organization. The rest of the paper is organized as follows. We give the necessary notation and deﬁnitions in Section 2. Section 3 details our results on the computational complexity of opt Conf and opt Conf Val. Section 4 deals with approximate algorithms and the hardness of approximation of opt Conf Val. In Section 5, we give complexity results for optimization problems for the access control semiring. Finally, we conclude in Section 6. Due to space constraints, many of the involved proofs are omitted and will appear in the full version (Pavan et al. 2023)

2 Preliminaries

We assume that the reader is familiar with deﬁnition of a semiring. We denote a generic semiring by K = (K, +, , 0, 1) where K is the underlying set. For interpreting formulas over K, we will add a negation function ℸ: K K. We assume ℸis a bijection so that ℸ(ℸ(x)) = x, and ℸ(0) = 1. For ease of presentation, we use the most natural negation function (depending on the semiring). However, many of our results hold for very general interpretations of negation. Finally, as our focus is on optimization problems, we will also assume a (natural) total order on the elements of K. For a set X = {x1, x2, . . . xn} of variables, we associate the set X = { x1, . . . , xn}. We call X X the literals and formulas we consider are propositional formulas over X X in negation normal form. We also view a propositional formula ϕ in negation normal form as a rooted directed tree wherein each leaf node is labeled with a literal, 1, or 0 and each internal node is labeled with conjunction ( ) or disjunction . Note that viewing ϕ as a tree ensures a similar size as its string representation. We call the tree representing the formula ϕ as formula tree and denote it with Tϕ. For a propositional formula ϕ(x1, , xn), in negation normal form we use m to denote the size of the formula, i.e. the total number of occurrences of each variable and its negation. When ϕ(x1, xn) is in CNF form, m denotes the number of clauses. We interpret a propositional formula over a semiring K by mapping the variables to K and naturally extending it.

Formally, a K-interpretation is a function π : X K. We extend π to an arbitrary propositional formula ϕ in negation normal form, which is denoted by Sem(ϕ, π) (Sem stands for semantics ), as follows.

- Sem(x, π) = π(x) - Sem( x, π) = ℸ(π(x)) - Sem(α β, π) = Sem(α, π) + Sem(β, π) - Sem(α β, π) = Sem(α, π) Sem(β, π)

2.1 Optimization Problems and Complexity Classes For a formula ϕ, we deﬁne opt Sem Val(ϕ) as

opt Sem Val(ϕ) = max π {Sem(ϕ, π)},

where max is taken over all possible K-interpretations from X to K.

Deﬁnition 2.1 (opt Sem and opt Sem Val). Given a propositional formula ϕ in negation normal form, the opt Sem Val problem is to compute opt Sem Val(ϕ). The opt Sem problem is to compute a K-interpretation that achieves opt Sem Val(ϕ), i.e, output π so that opt Sem Val(ϕ) = Sem(ϕ, π ).

Notice that when K is the Boolean semiring (with 0 < 1 ordering and standard negation interpretation), opt Sem Val is the well-known satisﬁability problem: the formula ϕ is satisﬁable if and only if opt Sem Val(ϕ) = 1. Also, the problem opt Sem is to output a satisfying assignment if the formula ϕ is satisﬁable. In this work, we consider the following semirings.

1. Viterbi semiring V = ([0, 1], max, , 0, 1). As mentioned, the Viterbi semiring has applications in database provenance, where x [0, 1] is interpreted as conﬁdence scores, in probabilistic parsing, in probabilistic CSPs, and in Hidden Markov Models. 2. The tropical semiring T = (R { }, min, +, , 0). The tropical semiring is isomorphic to the Viterbi semiring via the mapping x ln x. 3. The fuzzy semiring F = ([0, 1], max, min, 0, 1). 4. Access control semiring Ak = ([k], max, min, 0, k). Intuitively, each i [k] is associated with an access control level with natural ordering. Here 0 corresponds to public access and n corresponds to no access at all. [k] is the set {0 < 1 < < k}.

Most of our focus will be on complexity of opt Sem and opt Sem Val problems over the Viterbi semiring. We call the corresponding computational problems opt Conf and opt Conf Val respectively. We call the extended interpretation function Sem as Conf in this case.

Deﬁnition 2.2 (Max Sat and Max Sat Val). Given a propositional formula ϕ in CNF form, the Max Sat problem is to compute an assignment of ϕ that satisﬁes the maximum number of clauses. Given a propositional formula ϕ in CNF form, the Max Sat Val problem is to compute the maximum number of clauses of ϕ that can be satisﬁed.

We need a notion of reductions between functional problems. We use the notion of metric reductions introduced by Krentel (Krentel 1988). Deﬁnition 2.3 (Metric Reduction). For two functions f, g : {0, 1} {0, 1} , we say that f metric reduces to g if there are polynomial-time computable functions h1 and h2 where h1 : {0, 1} {0, 1} (the reduction function) and h2 : {0, 1} {0, 1} {0, 1} so that for any x, f(x) = h2(x, g(h1(x))).

Deﬁnition 2.4. For a function t : N N, FPNP[t(n)] denotes the class of functions that can be solved in polynomialtime with O(t(n)) queries to an NP oracle where n is the size of the input. When t(n) is some polynomial, we denote the class by FPNP. Metric reductions are used to deﬁne notions of completeness and hardness for function classes FPNP and FPNP[log]. The following result due to Krentel (Krentel 1988) characterizes the complexity of the Max Sat Val problem. Theorem 2.5 ((Krentel 1988)). Max Sat Val is complete for FPNP[log] under metric reductions. The following proposition is a basic ingredient in our results. It can be proved using basic calculus. Proposition 1. Let f(x) = xa(1 x)b where a, b are nonnegative integers, the maximum value of f(x) over the domain [0, 1] is attained when x = a a+b. The maximum value of the function is ( a a+b)a( b a+b)b.

3 Computational Complexity of Conﬁdence Maximization For semantics over Viterbi semiring we assume the standard closed world semantics and use the negation function ℸ(x) = 1 x. Thus we have Conf( x, π)+Conf(x, π) = 1. However, our upper bound proofs go through for any reasonable negation function. We discuss this in Remark 2. Since Conf(ϕ, π) can be computed in polynomial time, opt Conf is at least as hard as opt Conf Val. The following observation states that computing opt Conf Val and opt Conf are NP-hard. Observation 3.1. For a formula ϕ, opt Conf Val(ϕ) = 1 if and only if ϕ satisﬁable. Hence both opt Conf and opt Conf Val are NP-hard. While both opt Conf and opt Conf Val are NP-hard, we would like to understand their relation to other maximization problems. In the study of optimization problems, the complexity classes FPNP and FPNP[log] play a key role. In this section, we investigate both upper and lower bounds for these problems in relation to the classes FPNP and FPNP[log].

3.1 An Upper Bound for General Formulae We show that opt Conf Val and opt Conf can be computed in polynomial-time with oracle queries to an NP language. Theorem 3.2. opt Conf Val for formulas in negation normal form is in FPNP.

Proof Idea: In order to show that opt Conf Val is in FPNP, we use a binary search strategy using a language in NP. One of the challenges is that the conﬁdence value could potentially be any real number in [0, 1] and thus apriori we may not be able to bound the number of binary search queries. However, we ﬁrst argue that for any formula ϕ on n variables and with size m, opt Conf(ϕ) is a fraction of the form A/B where 1 A B 2nm log m. Ordered fractions of such form are known as Farey sequence of order 2nm log m (denoted as F2nm log m). Thus our task is to do a binary search over F2nm log m with time complexity O(nm log m). However, in general binary search for an unknown element in the Farey sequence FN with time complexity O(log N) appears to be unknown. We overcome this difﬁculty by using an NP oracle to aid the binary search. We will give the details now.

Deﬁnition 3.3. Let ϕ(x1, , xn) be a propositional formula in negation normal form with size m. Let Tϕ be its formula tree. A proof tree T of Tϕ is a subtree obtained by the following process: for every OR node v, choose one of the sub-trees of v. For every AND node v, keep all the subtrees.

Note that in a proof tree every OR node has only one child.

Deﬁnition 3.4. Let ϕ(x1, , xn) be a propositional formula in negation normal form and let T be a proof tree. We deﬁne the proof tree polynomial p T by inductively deﬁning a polynomial for the subtree at every node v (denoted by pv): If the node v is a variable xi, the polynoimal is xi and if it is xi, the polynomial is (1 xi). If v is an AND node with children v1, . . . , vs, then pv = Qs i=1 ps. If v is an OR node with a child u, then pv = pu.

Claim 3.4.1. Let ϕ(x1, , xn) be a propositional formula in negation normal form and let T be a proof tree of ϕ.

1. The proof tree polynomial p T is of the form

i=1 xai i (1 xi)bi

where 0 ai + bi m. 2. For a V-interpretation π,

Conf(T, π) = p T (π(x1), . . . , π(xn)) .

3. Both opt Conf(T) and opt Conf Val(T) can be computed in polynomial-time.

4. opt Conf Val(T) = Πn i=1 ai ai+bi

ai bi ai+bi

The next claim relates opt Conf of the formula ϕ to opt Conf of its proof trees. The proof of this claim follows from the deﬁnition of proof tree and standard induction.

Claim 3.4.2. For a formula ϕ,

opt Conf Val(ϕ) = max T opt Conf Val(T)

where maximum is taken over all proof trees T of Tϕ. If T is the proof tree for which opt Conf(T) is maximized, then opt Conf(T ) = opt Conf(ϕ).

The above claim states that opt Conf(ϕ) can be computed by cycling through all proof trees T of ϕ and computing opt Conf(T). Since there could be exponentially many proof trees, this process would take exponential time. Our task is to show that this process can be done in FPNP. To do this we establish a claim that restricts values that opt Conf Val(ϕ) can take. We need the notion of Farey sequence.

Deﬁnition 3.5. For any positive integer N, the Farey sequence of order N, denoted by FN, is the set of all irreducible fractions p/q with 0 < p < q N arranged in increasing order.

Claim 3.5.1. 1. For a propositional formula ϕ(x1, , xn), opt Conf Val(ϕ) belongs to the Farey sequence F2nm log m. 2. For any two fractions u and v from F2nm log m, |u v| 1/22nm log m

Consider the following language

Lopt = { ϕ, v | opt Conf Val(ϕ) v}

Claim 3.5.2. Lopt is in NP.

We need a method that given two fractions u and v and an integer N, outputs a fraction p/q : u p/q v, and p/q FN. We give an FPNP algorithm that makes O(N) queries to the NP oracle to achieve this. We ﬁrst deﬁne the NP language Lfarey. For this we ﬁx any standard encoding of fraction using the binary alphabet. Such an encoding will have O(log N) bit representation for any fraction in FN.

Lfarey = { N, u, v, z | z ; u zz v & zz FN}

The following claim is easy to see.

Claim 3.5.3. Lfarey NP.

Now we are ready to prove the Theorem 3.2.

Proof. (of Theorem 3.2). The algorithm performs a binary search over the range [0, 1] by making adaptive queries ϕ, v to the NP language Lopt starting with v = 1. At any iteration of the binary search, we have an interval I = [Il, Ir] and with the invariant Il opt Conf Val(ϕ) < Ir. The binary search stops when the size of the interval [Il, Ir] = 1/22nm log m. Since each iteration of the binary search reduces the size of the interval by a factor of 2, the search stops after making 2nm log m queries to Lopt. The invariant ensures that opt Conf Val(ϕ) is in this interval. Moreover, opt Conf Val(ϕ) F2nm log m (by item (1) of Claim 3.5.1) and there are no other fractions from F2nm log m in this interval (by item (2) of Claim 3.5.1). Now, by making O(nm log m) queries to Lfarey with N = 2nm log m, u = Il, v = Ir, we can construct the binary representation of the unique fraction in F2nm log m that lies between Il and Ir which is opt Conf Val(ϕ).

Next we show the optimal V-interpretation can also be computed in polynomial time with queries to an NP oracle.

Theorem 3.6. opt Conf for formulas in negation normal form can be computed in FPNP.

Remark 2. We revisit the semantics of negation. As stated earlier, by assuming the closed world semantics, we have ℸ(x) = 1 x. We note that this assumption is not strictly necessary for the above proof to go through. Recall that Item (1) of Claim 3.4.1 states that the proof tree polynomial is of the form Q xai i (1 xi)bi. For a general negation function ℸ, the proof tree polynomial is of the form Q xai i (ℸ(xi))bi. Now if the maximum value of a term xa(ℸ(x))b can be found, for example when ℸis an explicit differentiable function, the result will hold.

3.2 Relation to Max Sat for CNF Formulae

In this section we study the opt Conf Val problem for CNF formulae and establish its relation to the Max Sat problem. We ﬁrst exhibit an upperbound on the opt Conf Val(ϕ) using the maximum number of satisﬁable clauses. Building on this result, in Section 3.3 we show that opt Conf Val for CNF formulae is hard for the complexity class FPNP[log]. We ﬁrst deﬁne some notation that will be used in this and next subsections. Let ϕ(x1, xn) = C1 Cm be a CNF formula and let π be an optimal V-interpretation. For each clause C from ϕ, let π (C) be the value achieved by this interpretation, i.e π (C) = Conf(C, π ). Observe that since C is a disjunction of literals, π (C) = maxℓ C{π (ℓ)}. For a clause C, let

ℓC = argmaxℓ C{π (ℓ)}

In the above, if there are multiple maximums, we take the smallest literal as ℓC (By assuming an order x1 < x1 < x2 < x2 < xn < xn). Observe that, since we are working over the Viterbi semiring, Conf(C, π ) = π (ℓC). A literal ℓis maximizing literal for a clause C, if ℓC = ℓ. Since ϕ is a CNF formula, for any V-interpretation π Conf(ϕ, π) is of the form Πm i=1Conf(Ci, π). Given a collection of clauses D from ϕ, the contribution of D to Conf(ϕ, π) is deﬁned as Πc DConf(C, π). The following theorem provides an upperbound on opt Conf Val(ϕ) using Max Sat Val. This is the main result of this section.

Theorem 3.7. Let ϕ(x1, , xn) be a CNF formula with m clauses. Let r be the maximum number of clauses that can be satisﬁed. Then opt Conf Val(ϕ) 1/4(m r).

Proof. Let π be an optimal V-interpretation for ϕ. A clause C is called low-clause if π (C) < 1/2, C is called a high-clause of π (C) > 1/2, and C is a neutral-clause if π (C) = 1/2. Let L, H, and N respectively denote the number of low, high, and neutral clauses. We start with the following claim that relates the number of neutral clauses and the number of high-clauses to r.

Claim 3.7.1. N

Proof. Suppose that the number of low-clauses is strictly less than m r, thus number of high-clauses is more than r. For a variable x, let

px = |{C | C is neutral and ℓC = x}|

and qx = |{C | C is neutral and ℓC = x}|

That is px is the number of neutral clauses for which x is the maximizing literal and qx is the number of neutral clauses for which x is the maximizing literal. Consider the truth assignment that is constructed based on the following three rules: (1) For every high-clause C, set ℓC to True and ℓC to False, 2) For every variable x, if one of px or qx is not zero, then if px qx, then set x to True, otherwise set x to False. (3) All remaining variables are consistently assigned arbitrary to True/False values. We argue that this is a consistent assignment: I.e, for every literal ℓ, both ℓand ℓare not assigned the same truth value. Consider a literal ℓ. If there is a high clause C such that ℓ= ℓC, then this literal is assigned truth value True and ℓ is assigned False. In this case, since π (ℓ) > 1/2, π ( ℓ) < 1/2. Thus ℓcan not be maximizing literal for any highclause and thus Rule (1) does not assign True to ℓ. Again, since π (ℓ) > 1/2, there is no neutral-clause D such that ℓ= ℓD or ℓ= ℓD. Thus Rule (2) does not assign a truth value to either of ℓor ℓ. Since ℓand ℓare assigned truth values, Rule (3) does not assign a truth value to ℓor ℓ. Consider a variable x where at least one of px or qx is not zero. In this case x or x is maximizing literal for a neutral clause. Thus π (x) = π ( x) = 1/2 and neither x nor x is maximizing literal for a high-clause. Thus Rule (1) does not assign a truth value to x or x. Now x is True if and only if px qx, thus the truth value assigned to x (and x) is consistent. Since Rule (3) consistently assigns truth values of literals that are not covered by Rules (1) and (2), the constructed assignment is a consistent assignment. For every high clause C, literal ℓC is set to true. Thus the assignment satisﬁes all the high-clauses. Consider a variable x and let D be the (non-empty) collection of neutral clauses for which either x or x is a maximizing literal. As x is assigned True if and only if px qx, at least half the clauses from D are satisﬁed. Thus this assignment satisﬁes at least H + N

2 clauses. Since r is the maximum number of satisﬁable clauses, the claim follows.

For a literal ℓ, let aℓbe the number of low-clauses C for which ℓis a maximizing literal, i.e,

aℓ= |{C | C is a low-clause and ℓC = ℓ}|,

bℓ= |{C | C is a high-clause and ℓC = ℓ}|,

We show the following relation between aℓand bℓ.

Claim 3.7.2. For every literal ℓ, aℓ bℓ.

We next bound the contribution of neutral and low clauses to opt Conf Val(ϕ). For every neutral clause C, π (C) = 1/2, thus we have the following observation.

Observation 3.8. The contribution of neutral clauses to Conf(ϕ, π ) is exactly 1/2N.

We establish the following claim.

Claim 3.8.1.

Conf(ϕ, π ) = Y

π (ℓ)aℓ (1 π (ℓ))bℓ 1

Finally, we are ready to complete the proof of Theorem 3.7. For every literal ℓ, By Claim 3.7.2, aℓ bℓ. Let bℓ= aℓ+ cℓ, cℓ 0. Consider the following inequalities.

opt Conf Val(ϕ) = Conf(ϕ, π )

π (ℓ)aℓ (1 π (ℓ))bℓ 1

π (ℓ)aℓ (1 π (ℓ))aℓ+cℓ 1

ℓ (π (ℓ)aℓ (1 π (ℓ))aℓ) 1

2N = 1 4L+N/2

In the above, equality at line 2 is due to Claim 3.8.1. The inequality at line 4 follows because (1 π (ℓ)) 1. The last inequality follows because x(1 x) is maximized at x = 1/2. The last equality follows as P aℓ= L. Note that the number of clauses m = N + H + L and by Claim 3.7.1 H + N/2 r. It follows that L + N/2 m r. Thus opt Conf Val(ϕ) = Conf(ϕ, π ) 1 4L+N/2 1 4m r .

3.3 FPNP[log]- Hardness In this subsection, we show that opt Conf Val is hard for the class FPNP[log]. We show this by reducing Max Sat Val to opt Conf Val. Since Max Sat Val is complete for FPNP[log], the result follows. We also show that the same reduction can be used to compute a Max Sat assignment from an optimal V-interpretation. Theorem 3.9. Max Sat Val metric reduces to opt Conf Val for CNF formulae. Hence opt Conf Val is hard for FPNP[log] for CNF formulae.

Proof. Let ϕ(xi, . . . , xn) = C1 . . . Cm be a formula with m clauses on variables x1, . . . , xn. Consider the formula ϕ with m additional variables y1, . . . , ym constructed as follows: For each clause Ci of ϕ, add the clause C i = Ci yi in ϕ . Also add m unit clauses yi. That is

ϕ = (C1 y1) . . . (Cm ym) y1 ym

Claim 3.9.1. opt Conf Val(ϕ ) = 1 4m r where r is the maximum number of clauses that can be satisﬁed in ϕ.

Proof. We show this claim by ﬁrst showing that opt Conf Val(ϕ ) 1 4m r and exhibiting an interpretation π so that Conf(ϕ, π ) = 1 4m r . We claim that if r is the maximum number of clauses that can be satisﬁed in ϕ, then m + r is the maximum number of clauses that can be satisﬁed in ϕ . We will argue this by contradiction. Let a be an assignment that satisﬁes > m + r clause in ϕ . Let s be the number of yis that are set to False. This assignment will

satisfy m s clauses of the form Ci yi. However the total number of clauses of the form Ci yi that are satisﬁed is > m + r s. Thus there are > r clauses of the form Ci yi that are satisﬁed where yi is set to False. This assignment when restricted to xis will satisfy more than r clauses of ϕ. Hence the contradiction. Thus from Theorem 3.7, it follows that opt Conf Val(ϕ ) 1 4m r . Now we exhibit an interpretation π so that Conf(ϕ, π ) = 1 4m r . Consider an assignment a = a1, . . . , an for ϕ that satisﬁes r clauses. Consider the following interpretation π over the variable of ϕ : π (xi) = 1 if ai = True and π (xi) = 0 if ai = False. π (yi) = 0 if and only if Ci is satisﬁed by a. Else π (yi) = 1/2. For every satisﬁable clause Ci, Conf(Ci yi, π ) = 1 and Conf( yi, π ) = 1. For all other clauses C in ϕ , Conf(C, π ) = 1/2. Since there are r clauses that are satisﬁed, the number of clauses for which Conf(C, π ) = 1/2 is 2m 2r. Hence the Conf(ϕ , π ) = 1 4(m r) . Thus opt Conf Val(ϕ ) = 1 4m r .

Since opt Conf Val(ϕ ) = 1/4m r, Max Sat Val for ϕ can be computed by knowing the opt Conf Val.

While the above theorem shows that Max Sat Val can be computed from opt Conf Val, the next theorem shows that a maxsat assignment can be computed from an optimal Vinterpretation.

Theorem 3.10. Max Sat metric reduces to opt Conf.

Proof. Consider the same reduction as from the previous theorem. Our task is to construct a Max Sat assignment for ϕ, given an optimal V-interpretation π for ϕ . By the earlier theorem, Conf(ϕ , π) = 1 4m r , where r is the maximum number of satisﬁable clauses of ϕ. We ﬁrst state a set of claims without proof.

Claim 3.10.1. For every i, if yi is not maximizing literal for clause C i, then π(yi) = 0.

Claim 3.10.2. For all yi; π(yi) {0, 1/2}.

Claim 3.10.3. For all xi, if xi or xi is a maximizing literal, then π(xi) {0, 1, 1/2}

Claim 3.10.4. For every xi with π(xi) = 1/2, xi and xi are maximizing literals for exactly the same number of clauses.

We will show how to construct a Max Sat assignment from π: If π(xi) = 0, set the truth value of xi to False, else set the truth value of xi to True. By Claim 3.10.3, π(xi) = {0, 1/2, 1}. Let H be the number of clauses for which maximizing literal ℓis a x-variable and π(ℓ) = 1. Note that the above truth assignment will satisfy all the H clauses. Let N be number of clauses for which maximizing literal ℓis a x-variable and π(ℓ) = 1/2. By Claim 3.10.4, in exactly N/2 clauses a positive literal is maximizing, and thus all these N/2 clauses are satisﬁed by our truth assignment. Thus the total number of clauses satisﬁed by the truth assignment is N/2+H. Let Y the number of

clauses in which yi is maximizing literal. By Claim 3.10.2, π(yi) = 1/2 when yi is maximizing literal. Thus

Conf(ϕ , π) = 1H (1

2)2Y = 1 4N/2+Y = 1 4m r

The last equality follows from Claim 3.9.1. Thus m r = N/2 + Y , combining this with m = H + N + Y , we obtain that N/2 + H = r. Thus the truth assignment constructed will satisfy r clauses and is thus a Max Sat assignment.

4 Approximating opt Conf Val We study the problem of approximating opt Conf Val efﬁciently. Below, a k-SAT formula is a CNF formula with exactly k distinct variables in any clause. We start with the following proposition. Lemma 4.1. Let a1, an be an assignment, that satisﬁes r clauses of a CNF formula ϕ(x1, xn). There is an interpretation π so that Conf(ϕ, π) is m r

Hence, for example, if ϕ is a 3-SAT formula, since a random assignment satisﬁes 7/8 fraction of the clauses in expectation, for a random assignment r 7m/8, and by Lemma 4.1, opt Conf Val(ϕ) > 0.686m. The following lemma shows that one can get a better lower bound on opt Conf Val in terms of the clause sizes for CNF formulae. Lemma 4.2. For every CNF formula ϕ, opt Conf Val(ϕ) e P

i 1 ki where ki is the arity of the i th clause in ϕ. This yields a probabilistic algorithm. For example, if ϕ is a 3-SAT formula, opt Conf Val(ϕ) > 0.716m and thus improving on the result of Lemma 4.1. In fact, we can design a deterministic polynomial time algorithm that ﬁnds an interpretation achieving the trust value guaranteed by Lemma 4.2, using the well-known method of conditional expectation to derandomize the construction in the proof (For example, see (Alon and Spencer 2008; Goemans and Williamson 1994)). Theorem 4.3. There is a polynomial-time, e m/kapproximation algorithm for opt Conf, when the input formulas are k-CNF formulas with m-clauses. Next, we show that the approximation factor e m/k can not be signiﬁcantly improved. Theorem 4.4. There is no polynomial-time 1 4m(2 k ε) - approximation algorithm for opt Conf for k-SAT formulae, unless P = NP. Thus, for example for 3-SAT formulas, while we have a polynomial-time, 0.716m-approximation algorithm (by Theorem 4.3), we cannot expect an efﬁcient 0.845mapproximation algorithm by the above result unless P equals NP. It remains an interesting open problem to determine the optimal approximation ratio for this problem achievable by a polynomial time algorithm.

5 Complexity of Access Maximization In this section, we study the optimization problems for the access control semiring Ak = ([k], max, min, 0, k). We refer to the corresponding computational problems as

opt Access Val and opt Access. For this section we ﬁrst assume the negation function is the additive inverse modulo k. That is ℸ(a) = b such that a + b 0 (mod k). Theorem 5.1. Let ϕ(x1, xn) be a propositional formula in negation normal form and Ak = ([k], max, min, 0, k). The following statement holds. If ϕ is satisﬁable, then opt Access Val(ϕ) = k. If ϕ is not satisﬁable, then opt Access Val(ϕ) = k

2 . For a general negation function, we can establish an analogous theorem. For this, we deﬁne the notion of the index of negation. Given a negation function ℸ, its index denoted by Index(ℸ) is the largest ℓfor which there exists a [k], such that both a and ℸ(a) are at least ℓ. Theorem 5.2. Let ϕ(x1, xn) be a propositional formula in negation normal form and Ak = ([k], max, min, 0, k). The following statement holds. If ϕ is satisﬁable, then opt Access Val(ϕ) = k. If ϕ is not satisﬁable, then opt Access Val(ϕ) = Index(ℸ). The following is a corollary to the above result and its proof which states that the complexity of optimization problems over access control semiring is equivalent to their complexity over the Boolean semiring. Theorem 5.3. The problem opt Access Val and SAT are equivalent under metric reductions. Similarly, the problem opt Access and the problem of computing a satisfying assignment of a given Boolean formula are equivalent under metric reductions.

6 Conclusion In this work, we provided a comprehensive study of the computational complexity of opt Sem and the related problem opt Sem Val over various semirings such as Viterbi semiring, tropical semiring, access control semiring and fuzzy semiring, from both an algorithmic and a complexity-theoretic viewpoint. An exciting recent development in the ﬁeld of CSP/SAT solving has been the development of solvers for Lex SAT, which seeks to ﬁnd the smallest lexicographic satisfying assignment of a formula (Marques-Silva et al. 2011). In this regard, Theorem 3.2 opens up exciting directions of future work to develop efﬁcient techniques for opt Conf.

Acknowledgements We thank the anonymous reviewers of AAAI-23 for their valuable comments. We thank Val Tannen for introducing us to the world of semiring semantics and for helpful conversations during the nascent stage of this work. Meel s research is supported by the National Research Foundation under the NRF Fellowship Programme[NRF-NRFFAI1-20190004] and Campus for Research Excellence and Technological Enterprise (CREATE) programme. Bhattacharyya was supported in part by the NRF Fellowship Programme [NRFNRFFAI1-2019-0002] and an Amazon Research Award. Vinod was supported in part by NSF CCF-2130608 and NSF HDR:TRIPODS-1934884 awards. Pavan was supported in part by NSF CCF-2130536, and NSF HDR:TRIPODS1934884 awards.

References Alon, N.; and Spencer, J. H. 2008. The Probabilistic Method, Third Edition. Wiley-Interscience series in discrete mathematics and optimization. Wiley. Amsterdamer, Y.; Deutch, D.; and Tannen, V. 2011. Provenance for aggregate queries. In Proc. of PODS, 153 164. Bistarelli, S. 2004. Semirings for soft constraint solving and programming, volume 2962. Springer Science & Business Media. Bistarelli, S.; and Gadducci, F. 2006. Enhancing constraints manipulation in semiring-based formalisms. In ECAI, volume 141, 63 67. Bistarelli, S.; Montanari, U.; and Rossi, F. 1995. Constraint solving over semirings. In IJCAI (1), 624 630. Citeseer. Bistarelli, S.; Montanari, U.; and Rossi, F. 1997. Semiringbased constraint satisfaction and optimization. J. ACM, 44(2): 201 236. Bistarelli, S.; Montanari, U.; Rossi, F.; Schiex, T.; Verfaillie, G.; and Fargier, H. 1999. Semiring-based CSPs and valued CSPs: Frameworks, properties, and comparison. Constraints, 4(3): 199 240. Cui, Y. 2002. Lineage tracing in data warehouses. Ph.D. thesis, Stanford University. Cui, Y.; Widom, J.; and Wiener, J. L. 2000. Tracing the lineage of view data in a warehousing environment. ACM Transactions on Database Systems (TODS), 25(2): 179 227. Deutch, D.; Milo, T.; Roy, S.; and Tannen, V. 2014. Circuits for Datalog Provenance. In Proc. of ICDT, 201 212. Open Proceedings.org. Eiter, T.; and Kiesel, R. 2021. On the Complexity of Sum-of Products Problems over Semirings. In Proc. of AAAI, 6304 6311. AAAI Press. Foster, J. N.; Green, T. J.; and Tannen, V. 2008. Annotated XML: queries and provenance. In Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 271 280. Fuhr, N.; and R olleke, T. 1997. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems (TOIS), 15(1): 32 66. Goemans, M. X.; and Williamson, D. P. 1994. New 3/4Approximation Algorithms for the Maximum Satisﬁability Problem. SIAM J. Discret. Math., 7(4): 656 666. Gr adel, E.; and Mrkonjic, L. 2021. Elementary Equivalence Versus Isomorphism in Semiring Semantics. In Proc. of ICALP, volume 198 of LIPIcs, 133:1 133:20. Gr adel, E.; and Tannen, V. 2020. Provenance analysis for logic and games. Moscow Journal of Combinatorics and Number Theory, 9(3): 203 228. Green, T. J. 2011. Containment of conjunctive queries on annotated relations. Theory of Computing Systems, 49(2): 429 459. Green, T. J.; Karvounarakis, G.; and Tannen, V. 2007. Provenance semirings. In Proc. of PODS, 31 40.

Imieli nski, T.; and Lipski Jr, W. 1989. Incomplete information in relational databases. In Readings in Artiﬁcial Intelligence and Databases, 342 360. Elsevier. Klein, D.; and Manning, C. D. 2003. A* Parsing: Fast Exact Viterbi Parse Selection. In Hearst, M. A.; and Ostendorf, M., eds., Proc. of HLT-NAACL. The Association for Computational Linguistics. Krentel, M. W. 1988. The Complexity of Optimization Problems. J. Comput. Syst. Sci., 36(3): 490 509. Marques-Silva, J.; Argelich, J.; Grac a, A.; and Lynce, I. 2011. Boolean lexicographic optimization: algorithms & applications. Annals of Mathematics and Artiﬁcial Intelligence, 62(3): 317 343. Meseguer, P.; Rossi, F.; and Schiex, T. 2006. Soft constraints. In Foundations of Artiﬁcial Intelligence, volume 2, 281 328. Elsevier. Mohri, M. 2002. Semiring Frameworks and Algorithms for Shortest-Distance Problems. J. Autom. Lang. Comb., 7(3): 321 350. Pavan, A.; Meel, K. S.; Vinodchandran, N. V.; and Bhattacharyya, A. 2023. Constraint Optimization over Semirings. ar Xiv:2302.12937. Tannen, V. 2013. Provenance propagation in complex queries. In In Search of Elegance in the Theory and Practice of Computation, 483 493. Springer. Tannen, V. 2017. Provenance analysis for FOL model checking. ACM SIGLOG News, 4(1): 24 36. Viterbi, A. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE transactions on Information Theory, 13(2): 260 269. Zim anyi, E. 1997. Query evaluation in probabilistic relational databases. Theoretical Computer Science, 171(1-2): 179 219.