# büchi_lindenbaum_tarski_a_program_analysis_appetizer__a9e591d8.pdf B uchi, Lindenbaum, Tarski: A Program Analysis Appetizer Vijay D Silva Google Inc., San Francisco Caterina Urban ETH Z urich, Zurich One can prove that a program satisfies a correctness property in different ways. The deductive approach uses logic and is automated using decision procedures and proof assistants. The automata-theoretic approach reduces questions about programs to algorithmic questions about automata. In the abstract interpretation approach, programs and their properties are expressed in terms of fixed points in lattices and reasoning uses fixed point approximation techniques. We describe a research programme to establish precise, mathematical correspondences between these approaches and to develop new analyzers using these results. The theoretical tools we use are the theorems of B uchi that relate automata and logic and a construction of Lindenbaum and Tarski for generating lattices from logics. This research has lead to improvements in existing tools and we anticipate further theoretical and practical consequences. 1 Introduction The problem of determining if a program does what it is supposed to do dates back to the origins of computer science. Goldstine and von Neumann [1947] included assertion boxes in their language for the IAS machine and stated that a programmer should guarantee that assertions were not violated. At a meeting in Cambridge, Turing [1949] presented techniques for reasoning about program correctness and termination. Their work has been rediscovered and significantly extended by program verification research [Knuth, 2003]. We briefly recall three approaches for reasoning about programs and describe our efforts to relate them mathematically and combine them algorithmically. In satisfiability-based approaches, bounded executions of a program P are encoded as a formula Exec(P) and assertion violations are encoded as a formula Err. The theorem below states that no bounded execution of P violates the assertion. Exec(P) =) Err Solvers for satisfiability of formulae in a theory prove such theorem by showing that Exec(P) Err is unsatisfiable [Bjørner and de Moura, 2014]. Rather than viewing correctness as a theorem, in model checking, one checks if P is a model of the formula Err [Clarke et al., 1999; Baier and Katoen, 2008]. In the automata-theoretic approach to model checking [Vardi and Wolper, 1994], the executions of P are viewed as words accepted by an automaton AP and erroneous executions are viewed as words accepted by an automaton AErr. The program P contains no assertion violation exactly if the language of the product automaton is empty. L(AP AErr) = ; One appeal of this approach is that it reduces questions about complex structures such as temporal properties and programs to language inclusion. Moreover, automata are labelled, directed graphs, so the model checking problem becomes one amenable to graph algorithms. The lattice-theoretic approach to reasoning about programs has its origins in programming language semantics and compiler construction. Scott [1971] defined the meaning of a program, denoted JPK, as a fixed point of a function on a lattice. The abstract interpretation framework of Cousot and Cousot [1977], extended early work in compiler construction, by showing how to interpret P and Err in a lattice A of approximations. The program is error-free if the lattice element is separate from the lattice element denoted by the error. JPKA u JErr KA v ? Algorithms for approximation of fixed points are used to determine if the order above holds. There are currently both academic and commercial tools based on these techniques. These tools differ in their degree of automation, the programs they can reason about, and their performance. These differences have lead to research to combine these techniques. We describe here the initial steps of a programme that seeks to facilitate exchange of techniques between these approaches by establishing mathematical translations between them. Our main observation, illustrated in Figure 1, is that by applying classic theorems in logic, automata theory and lattice theory, one can translate between the mathematical structures used in these three approaches. A theorem of B uchi [1960] shows that a word is accepted by a finite automaton exactly Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) Automata L(AP AErr) ; Logic Exec(P) =) Err Concrete Lattice (P(Exec), ) Abstract Lattice JPKA u JErr KA v ? B uchi s Theorem Lindenbaum-Tarski Figure 1: One may use deductive, automata theoretic, or lattice-theoretic techniques to prove a property of a program. B uchi s theorem allows us to view automata as formulae, and by extending it, we view control-flow graphs as formulae. The Lindenbaum-Tarski construction allows for generating a lattice from a logic and by inverting it, we identify logics and proof systems corresponding to the lattice-theoretic approach. if that word is a model of a formula in the weak, monadic, second-order theory of the successor function (WS1S). That is, a regular language, which is an element of the lattice of languages, can be defined as the language L(A) of an automaton or the models mod(') of a formula. We show that by extending B uchi s construction, we can encode the executions of a CFG as the models of formulae in WS1S(T), an extension of WS1S that can represent program variables. To relate logic and lattices, we use the Lindenbaum-Tarski construction [Rasiowa and Sikorski, 1963], which was originally used to relate the propositional calculus and Boolean algebras. Our observation is that a logic L characterizes the lattice A used in an abstract interpreter if the Lindenbaum-Tarski algebra of L is isomorphic to A. The problem of giving a logical characterization of the lattice in an abstract interpreter amounts to inverting the Lindenbaum-Tarski construction. 2 Programs to Second-Order Logics Program verification algorithms, like compiler optimizations, are often not formulated in terms of programs but rather in terms of control flow graphs (CFGs). We now describe a connection between CFGs and second-order logic, which follows from a simple extension of B uchi s theorem. We use a condensed, non-standard, representation of CFGs, with labels on edges, to emphasise the similarity to automata. Figure 2 contains the CFG for a program that initializes a variable x to 0 and increments x by 2 as long as x is at most 9. The property we are interested in is encoded as the assertion that x is 10 after the program executes. The CFG has locations for the entry of the program (in), the loop head (hd), the loop body (bd), the loop exit (ex), an error (er), reached if the assertion is violated, and the safe location (sf), reached if the assertion holds. A CFG can be viewed as an automaton in which the states [x 6= 10] [x = 10] Figure 2: A CFG for a program with an assertion. 8i.First(i) ) Xin(i) 8i.8j.Xhd(j) Suc(i, j) ) ((suc(x) = 0)(i) Xin(i)) _ ((suc(x) = x + 2)(i) Xbd(i)) 8i.8j.Xbd(j) Suc(i, j) ) (x 9 ) suc(x) = x)(i) Xhd(i)) 8i.8j.Xex(j) Suc(i, j) ) (x > 9 ) suc(x) = x)(i) Xhd(i) 8i.8j.Xer(j) Suc(i, j) ) (x 6= 10 ) suc(x) = x)(i) Xex(i) 8i.8j.Xsf(j) Suc(i, j) ) (x = 10 ) suc(x) = x)(i) Xex(i) 8i.Last(i) ) Xex(i) Figure 3: A formula in the monadic, second order theory of one successor extended with a first-order theory of arithmetic. Model of this formula correspond to executions of the CFG. correspond to locations in code and transitions are labelled with statements that are executed when moving from one location to the other. The initial location is an initial state. There are several choices for the final location. For this example we consider the error location as the final state, so that we reason about the assertion violation in terms of reachability of the final state. An execution of a program is a path through a CFG and every execution corresponds to a word accepted by the automaton. The converse is not true: not every path from the initial to the final state corresponds to an execution. The sequence of locations and labels in, x := 0, hd, [x > 9], ex, [x 6= 10], er is a path in the automaton but it does not define an execution because the condition [x > 9] is not satisfied after the assignment x := 0. A path only defines an execution if it is possible to execute all statements on that path in sequence. The program respects the assertion if the CFG, viewed as an automaton, accepts a word that defines an executions. We now describe how, by extending B uchi s theorem, the question of existence of a feasible path can be viewed as that of satisfiability of a formula. First, we describe the structures over which such formulae are interpreted. Intuitively, a structure consists of a trace, which encodes the data flow of a single execution, and a position assignment, which encodes control flow. Formally, a state s : Var ! Val associates program variables with values and a trace = s0, s1, . . . , sn 1 is a sequence of states. A position assignment σ maps secondorder variables to finite subsets of the natural numbers. A structure ( , σ) consists of a trace of length n and a position assignment that only uses finite subsets of [0, n 1]. The formula corresponding to our running example is shown in Figure 3. The formula contains second-order variables, such as Xin or Xex, for CFG locations. The variables are monadic meaning they are interpreted as one-place predicates. The predicate Xex(i) is interpreted as being true at po- sition i of an execution if the program is in location ex after i steps. The predicate Suc(i, j) is true if there is a transition from position i to position j in the trace. The condition Xhd(j) Suc(i, j) ) Xin(i) expresses that the location hd is reached from the location in. The condition Xhd(j) Suc(i, j) ) (suc(x) = 0) Xin(i) additionally states that the program variable x is 0 after this transition. The conjuncts in Figure 3 logically encode the structure of the CFG and the constraints imposed by its labels. The logic in which this formula is expressed is called the weak, monadic, second-order theory of one successor, modulo a theory, abbreviated to WS1S(T), which extends B uchi s logic WS1S. The logic is weak because second-order variables are interpreted over finite (rather than infinite) sets of positions. The theory T enables reasoning about program data. Theorem 1 A location loc is reachable in a CFG G exactly if the formula Reach G,loc is satisfiable. The formula Reach G,loc, defined by D Silva and Urban [2015a], has similar structure to the one in Figure 3. Theorem 1 reduces the problem of finding an execution that violates an assertion to a satisfiability problem. It is important to recognize that B uchi s original theorem established a deeper connection between automata and logic than what our theorem does for CFGs. B uchi gave an algorithm for constructing a finite automaton from a WS1S formula. It followed, from the decidability of language inclusion for finite automata, that the satisfiability problem for WS1S was decidable. Our translation only goes from CFGs to formulae and not in the other direction. In fact, like reachability in programs, the satisfiability problem for WS1S(T) is not decidable. 3 Abstract Interpretation and the Lindenbaum-Tarski Construction Loop invariants are fundamental to reasoning about programs. The loop invariant below enables proving that the assertion is not violated in Figure 2. ' ˆ= x 0 x 10 x 0 (mod2) A central tenet of lattice-theoretic program analysis is that loop invariants are fixed points of functions on lattices [Cousot and Cousot, 1977; Clarke, 1977]. We now recall elements of abstract interpretation and our initial results in identifying logics corresponding to lattices. If the variable x in Figure 2 is an integer, the set of possible values of x is an element of the lattice (P(Z), ) of subsets of integers. The function below models the loop. F(X) = {0} [ {x + 2 | x 2 X, x 9} A fixed point of this function is a set X satisfying that F(X) = X. The set of values of x satisfying the formula ' above is a fixed point and in fact, the least one. In general, the strongest invariant of a loop is not computable. In abstract interpretation, one reasons about fixed points of a different function over a different lattice. Figure 4 [ 1, 70] [0, +1] [30, +1] [ 1, 100] [30, 70] [0, 70] [30, 100] [3, 5] [5, 7] [3, 3] [5, 5] [7, 7] Figure 4: The lattice of intervals. 1st iteration 2nd 6th/fixed point in x:> x:> x:> hd x:[0, 0] x:[0, 2] x:[0, 10] bd x:[2, 2] x:[2, 4] x:[2, 10] ex x:? x:? x:[10, 10] er x:? x:? x:? sf x:? x:? x:[10, 10] Figure 5: A fixed point computation in the lattice of intervals. depicts the lattice (Itv, v) of integer intervals. An interval is a pair [a, b], where a b and a and b are in Z [ { 1, 1}. Every set of integers is abstracted by the unique, smallest interval that contains it: for instance [1, 3] is the smallest interval containing {1, 3}. The interval abstracts {1, 3} because one loses the information that 2 is not in the set. The function G, below, lifts the loop to intervals. G([a, b]) = [0, 0] t ([a, b] u [ 1, 9]) [2, 2] This function states that the values of x are initially in the interval [0, 0] and in each iteration, the sub-interval of [a, b] below 9 is incremented by 2. By associating such a function with each edge in the CFG, one obtains a system of equations that can be solved to obtain a fixed point. Figure 5 demonstrates an interval analysis of the loop. Each column contains the interval associated with each program location at each iteration. Initially, the value of x is arbitrary at in, while ex, er and sf are considered unreachable. The final column contains bounds on x computed by the analysis. The interval at hd is [0, 10], which is a loop invariant but not the strongest one. The na ıve iteration shown here is inefficient and may not terminate, and numerous improvements are used in practice. It is folk wisdom in the abstract interpretation community that lattices such as the intervals can be viewed as logics that are closed under conjunction but not disjunction or negation. The intuition behind this wisdom is that the meet operation on intervals [0, 5] u [3, 7] = [3, 5] is the intersection of the values in these intervals. However, the join, such as [0, 3] t [5, 7] = [0, 7], may contain more values than the union. We now describe how this intuition can be made precise. Γ, x n ' [m n] UB-L Γ, x m ' Γ x m [m n] UB-R Γ x n Γ, x m ' [m n] LB-L Γ, x n ' Γ x n [m n] LB-R Γ x m [m < n] R Γ, x m x n Figure 6: A subset of the proof rules of the interval logic. Interval logic. The sets of values definable by the intervals correspond to models of formulae in the logic below. ' ::= x m | x n | ' ' For example the interval x:[ 1, 9] corresponds to x 9, x:[3, 9] to x 3 x 9, while the logical constants tt and correspond to the maximal interval [ 1, 1] and the empty interval ?, respectively. Some proof rules for reasoning about interval formulae are shown in Figure 6. The proof system is meant to capture the reasoning encoded in the lattice. For example, the meet operation of the lattice satisfies the identity [ 1, 3] u [5, 1] = ?, which we can derive, logically, by applying a proof rule. [3 < 5] R x 3 x 5 Another example is [ 1, 3]u[ 1, 9] = [ 1, 3], which we can view as two inequalities. [ 1, 3] u [ 1, 9] v [ 1, 3] [ 1, 3] v [ 1, 3] u [ 1, 9] These two inequalities correspond to the two proofs shown below, which use standard sequent calculus rules in addition to the interval rules. I x 3 x 3 L1 x 3 x 9 x 3 UB-L x 3 x 9 R x 3, x 3 x 3 x 9 CL x 3 x 3 x 9 A major question that now remains is whether we can rigorously argue that the logic we have provided captures the interval lattice. A closely related question was addressed in the early days of logic, by Tarski, extending a construction of Lindenbaum. Two formulae in a logic L are considered equivalent if they are interderivable in the proof system of L. The order relation below holds between two equivalence classes if some formula in the second is derivable from some formula in the first. A meet operation can be defined by lifting conjunction from formulae to equivalence classes. ' L if ' and '. [']L 4 [ ]L if 1 2 for some 1 2 [']L, 2 2 [ ]L. [']L f [ ]L ˆ= [ 1 2]L for 1 2 [']L, 2 2 [ ]L. The Lindenbaum-Tarski construction described above defines a lattice only for logics in which L is a congruence with respect to logical operations. Such logics are algebraizable [Rasiowa and Sikorski, 1963]. We have shown that applying this construction to the interval logic yields a lattice isomorphic to the intervals. By providing an explicit logic that characterizes the interval lattice, we have made precise the intuition about the reasoning capabilities of that lattice. This characterization allows for calculations that are performed during interval analysis to be viewed as deductions in a proof system. Other features of the lattice highlighted by this logical treatment are that the atomic predicates are one-way infinite intervals, which correspond to meet-irreducibles: lattice elements that are not derivable from other, distinct lattice elements by meet operations. In [D Silva and Urban, 2015a], we further show how the operation of an abstract interpreter, as illustrated in Figure 5, can be viewed as deduction in a satisfiability solver. 4 Discussion and Conclusion There are currently multiple techniques for reasoning about programs. These techniques appear fundamentally different in the mathematical foundations they use. This difference impedes our ability to combine the strengths of the techniques, both in theory and practice. We have shown that by using classic results in logic, lattice theory and automata theory, one can identify new relationships between the automata-theoretic, deductive and latticetheoretic approaches to program verification. Though the work described here is a first step in a longer research effort, it has already lead to improvements in a tool for termination analysis [D Silva and Urban, 2015b]. There are several immediate extensions that will deepen our understanding of the relationships we have identified. We have focused on languages of finite words and the connection to reachability analysis. To model termination, procedure calls and concurrency, one has to consider the analogues of B uchi s theorem for B uchi automata, nested word automata and asynchronous automata. We have restricted our study of abstract interpreters to lattices. We believe that functions in an abstract interpreter correspond to first-order modalities, and our proofs have to be extended using the Lindenbaum Tarski construction for modal logics. Our work has opened the door to a proof-theoretic interpretation and investigation of lattice-based analyzers. Questions concerning cut elimination, proof normalization, and lower bounds on proof size, which would not have made sense in the context before our work are now waiting to be answered. We believe that answering these and other questions will deepen our theoretical understanding of different approaches to reasoning about programs and also extend the boundary of what can be successfully automated in practice. [Baier and Katoen, 2008] Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. The MIT Press, 2008. [Bjørner and de Moura, 2014] N. Bjørner and L. de Moura. Applications of SMT solvers to program verification. In Notes for the Summer School on Formal Techniques, 2014. [B uchi, 1960] J. Richard B uchi. On a Decision Method in Restricted Second Order Arithmetic. In Logic, Methodology and Philosophy of Science, pages 1 11. Stanford Univ. Press, 1960. [Clarke et al., 1999] Edmund M. Clarke, Jr., Orna Grum- berg, and Doron A. Peled. Model Checking. MIT Press, Cambridge, MA, USA, 1999. [Clarke, 1977] Edmund M. Clarke. Program Invariants as Fixed Points. Foundations of Computer Science, pages 18 29, 1977. [Cousot and Cousot, 1977] Patrick Cousot and Radhia Cousot. Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Proc. of Principles of Programming Languages, pages 238 252. ACM Press, 1977. [D Silva and Urban, 2015a] Vijay D Silva and Caterina Ur- ban. Abstract Interpretation as Automated Deduction. In Proc. of Automated Deduction, pages 450 464. Springer, 2015. [D Silva and Urban, 2015b] Vijay D Silva and Caterina Ur- ban. Conflict-Driven Conditional Termination. In Proc. of Computer Aided Verification, pages 271 286. Springer, 2015. [Goldstine and von Neumann, 1947] Herman H. Goldstine and John von Neumann. Planning and Coding of Problems for an Electronic Computing Instrument. Technical report, Institute for Advanced Study, 1947. [Knuth, 2003] Donald E. Knuth. Robert w floyd, in memo- riam. SIGACT News, 34(4):3 13, December 2003. [Rasiowa and Sikorski, 1963] H. Rasiowa and R. Sikorski. The Mathematics of Metamathematics. Polish Academy of Science, Warsaw, 1963. [Scott, 1971] Dana S. Scott. The Lattice of Flow Diagrams. Semantics of Algorithmic Languages, 188:311 366, 1971. [Turing, 1949] Alan Turing. Checking a Large Routine. Re- port of a Conference on High Speed Automatic Calculating Machines, pages 67 69, 1949. [Vardi and Wolper, 1994] Moshe Y. Vardi and Pierre Wolper. Reasoning about Infinite Computations. Information and Computation, 115(1):1 37, 1994.