Journal of Artificial Intelligence Research 67 (2020) 881-903. Submitted 10/2019; published 04/2020.

Learning the Language of Software Errors

Hana Chockler (hana.chockler@kcl.ac.uk), Department of Informatics, King's College London
Pascal Kesseli (pascal.kesseli@diffblue.com)
Daniel Kroening (kroening@cs.ox.ac.uk), Department of Computer Science, University of Oxford, UK
Ofer Strichman (ofers@ie.technion.ac.il), Information Systems Engineering, Technion, Haifa, Israel

Abstract

We propose to use algorithms for learning deterministic finite automata (DFA), such as Angluin's L* algorithm, for learning a DFA that describes the possible scenarios under which a given program error occurs. The alphabet of this automaton is given by the user (for instance, a subset of the function call sites or branches), and hence the automaton describes a user-defined abstraction of those scenarios. More generally, the same technique can be used for visualising the behavior of a program or parts thereof. It can also be used for visually comparing different versions of a program (by presenting an automaton for the behavior in the symmetric difference between them), and for assisting in merging several development branches. We present experiments that demonstrate the power of an abstract visual representation of errors and of program segments, accessible via the project's web page. In addition, our experiments in this paper demonstrate that such automata can be learned efficiently over real-world programs. We also present lazy learning, which is a method for reducing the number of membership queries while using L*, and demonstrate its effectiveness on standard benchmarks.

1. Introduction

Many automated verification tools produce a counterexample trace when an error is found. These traces are often unintelligible because they are too long (an error triggered after a single second can correspond to a path with millions of states), too low-level, or both.
Moreover, a trace focuses on just one specific scenario. Thus, error traces are frequently not general enough to help focus the attention of the programmer on the root cause of the problem. A variety of methods have been proposed for the explanation of counterexamples, such as finding similar paths that satisfy the property (Groce et al., 2006) and analysing causality (Beer et al., 2012), but these focus on a single counterexample. The analysis of multiple counterexamples has been suggested in the hardware domain by Copty et al. (2003), who propose to compute all counterexamples and present those states that occur in all of them to the user. The analysis of multiple counterexamples has also been suggested in the context of

© 2020 AI Access Foundation. All rights reserved.

Chockler, Kesseli, Kroening & Strichman

[Figure 1: An abstract description of a merge-sort program, where the letters are the function calls. Edge labels include inspect before, seq sort, core merge, pair merge, single node, inspect after, and Learn_Assert.]

a push-down automaton (PDA) (representing software) and a deterministic finite automaton (DFA) (representing a negated property) by Basu et al. (2003), who describe the generation of all loop-free counterexamples of a certain category, and their presentation to the user in a tree-like structure. In the software domain, another notable example is the model checker SLAM, which reports multiple counterexamples if they are believed to relate to different causes (Ball et al., 2003). Each example is localized by comparing it to a trace that does not violate the property. We suggest a method (and a corresponding tool) that presents the user with a DFA that summarizes all the ways (up to a given bound, as will be explained) in which an assertion can fail. Furthermore, the alphabet of this automaton is user-defined and corresponds to events in the program execution.
For example, the user can specify that a call to a function f in a specific location is a letter and an entry to a specific branch is a different letter. With a small number of such user-defined events, the traces of the program that lead to an assertion failure can be visualized concisely. We argue that this combination of user-defined abstraction with a compact representation of multiple counterexamples addresses all three problems mentioned above. Moreover, the same idea can be applied to describing a program or, more realistically, parts of a program, by adding an assert(false) at the end of the sub-program to be explained. Figure 1, for instance, gives an automaton that describes the operation of a merge-sort program in terms of its possible function calls.1 We obtained it by inserting such a statement at the end of the main function. Our method is based on a DFA-learning algorithm, such as L* (Angluin, 1987b), RS (Rivest & Schapire, 1993), COMFORT (Chaki & Strichman, 2008), KV (Kearns & Vazirani, 1994), DT (Howar, 2012) or TTT (Isberner et al., 2014). These algorithms learn a minimal DFA that captures the (regular) language of a model U over a given alphabet Σ, the behavior of which is communicated to the learning algorithm via interface functions called the "teacher" and the "oracle". In the discussion that follows we will simply call this algorithm the learner, as our methods are orthogonal to the learning algorithm. The learner asks the teacher membership queries over Σ, namely whether w ∈ U, where w is a word and U is the language (the model), and conjecture queries, namely whether, for a given DFA A, L(A) = U. The number of queries that the algorithm performs is polynomial in the size of the alphabet, in the number of states of the resulting minimal DFA, and in the length of the longest counterexample (feedback) to a conjecture query returned by the oracle (see Sec. 2 for a detailed description).
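To make the two query types concrete, the following is a toy teacher and oracle in C for a hypothetical target language: words over {a, b} that end in b. This is an illustrative assumption for exposition only; in the paper's setting the teacher and oracle are implemented with CBMC, as described later. The oracle here answers conjecture queries by brute-force comparison over bounded words and returns a counterexample (feedback) when the conjecture disagrees with the target language.

```c
/* Toy teacher and oracle for the target language U = "words over {a,b}
 * that end in b" (an illustrative assumption, not the paper's setup). */
#include <string.h>

/* Membership query: is w in U? */
int member(const char *w) {
    size_t n = strlen(w);
    return n > 0 && w[n - 1] == 'b';
}

/* A conjectured DFA over {a,b}: delta[state][letter], accepting[state]. */
typedef struct { int delta[4][2]; int accepting[4]; } Dfa;

static int dfa_accepts(const Dfa *d, const char *w) {
    int s = 0;
    for (; *w; ++w) s = d->delta[s][*w - 'a'];
    return d->accepting[s];
}

/* Conjecture query: search all words up to the given length bound for a
 * disagreement; on disagreement, write the feedback word into out and
 * return 0. Return 1 if the conjecture matches U on all bounded words. */
int equivalent(const Dfa *d, int bound, char *out) {
    for (int len = 0; len <= bound; ++len) {
        int idx[8] = {0};            /* current word, as letter indices */
        while (1) {
            char w[9];
            for (int i = 0; i < len; ++i) w[i] = (char)('a' + idx[i]);
            w[len] = '\0';
            if (dfa_accepts(d, w) != member(w)) { strcpy(out, w); return 0; }
            int i = len - 1;         /* advance to the next word */
            while (i >= 0 && idx[i] == 1) idx[i--] = 0;
            if (i < 0) break;
            idx[i]++;
        }
    }
    return 1;
}
```

A correct conjecture (state 1 reached exactly after reading b) passes the equivalence check, while a conjecture that wrongly accepts the empty word is rejected with the empty word as feedback.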
The use of learners in the verification community has, to the best of our knowledge, been restricted so far to the verification process itself: to model components in an assume-guarantee framework, e.g., by Giannakopoulou et al. (2012), or to model the input-output relation in specific types of programs, in which that relation is sufficient for verifying certain properties (Botinčan & Babić, 2013; Argyros et al., 2016). Another popular direction of research that uses learners and has some similarities to our work is specification mining (Alur et al., 2005; Shoham et al., 2008). In both verification and specification mining the focus is on inferring the specification of an API, i.e., deducing its correct usage, and in both cases the answer is given in the form of a finite automaton whose alphabet is the API functions' identifiers. As the setting in specification mining is different from ours, the methodology and the results differ as well; for example, the specification-mining algorithms typically infer an overapproximating abstraction of the code under analysis.

1. Source code for all examples we use is available online (http://www.cprover.org/learning-errors/, 2015).

Trivially, the language that describes a part of a program or the behaviors that fail an assertion is not regular in the general case. We therefore bound the length of the traces we consider by a constant, and thereby obtain a finite set of bounded words. The automaton that we learn may accept unbounded words, but our guarantee to the user is limited: any word in L(A), up to the given bound, corresponds to a real trace in the program. We will formalize this concept in Sec. 3. The fact that A may have loops has both advantages and disadvantages. Consider, for example, the program in Figure 2 (left). Suppose that Σ is the set of functions that are called.
With a small bound on the word length we may get the automaton in Figure 2 (right), which, among others, accepts the word g^120 f. The bound is not large enough to exclude this word, rendering the language of the automaton a superset of the language of error. On the other hand, if g had no effect on the reachability of f (i.e., in this case, if g did not contain the exit(0) statement), then the automaton would capture the language of error precisely, despite the fact that we only examine bounded traces.

    void g(int x) { if (x > 100) exit(0); }
    void f() { assert(0); }
    int main(int argc, char* argv[]) {
      for (int i=0; i < argc; i++)
        g(i);
      f();
    }

Figure 2: A program and an automaton that we learn from it when using a low bound (< 100) on the word length.

We note that the automaton we generate is conceptually different from a control-flow graph (CFG) of a program mapped onto a set of interesting events. This is because a CFG is based on the structure of a program, whereas the automaton generated by the learner is based on the actual executions and, in general, cannot be deduced from the CFG. It is also very distinct from just showing the call graph of the program, even if we choose function calls as our alphabet: first, it is based on a semantic analysis of the program and hence excludes non-existing transitions; second, a call graph has one node per function, and hence cannot make a distinction between paths that go through it. For example, suppose that the program can progress (only) along the following two paths: main → f1 → g → h1 and main → f2 → g → h2.

[Figure 3: Example call graph.]

A call graph, depicted in Figure 3, illustrates this point: based on the call graph, we cannot distinguish our program from one that also allows the path main → f1 → g → h2. A DFA representation, on the other hand, does not make this abstraction.
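The distinction can be made concrete with a small sketch. The path set and function names below are the hypothetical example from the text, not tool output: checking a call sequence against the exact set of feasible paths differs from checking it against the call-graph edge relation, which admits the spurious path main f1 g h2.

```c
/* Sketch contrasting exact path semantics with the call-graph
 * abstraction. The feasible paths and function names (main, f1, f2,
 * g, h1, h2) are the hypothetical example from the text. */
#include <stdio.h>
#include <string.h>

static const char *feasible[] = { "main f1 g h1", "main f2 g h2" };
static const char *edges[] =
    { "main f1", "main f2", "f1 g", "f2 g", "g h1", "g h2" };

/* Exact semantics: p must be one of the feasible paths. */
int path_feasible(const char *p) {
    for (size_t i = 0; i < sizeof feasible / sizeof *feasible; ++i)
        if (strcmp(p, feasible[i]) == 0) return 1;
    return 0;
}

/* Call-graph abstraction: every consecutive pair of calls in p must be
 * an edge; paths mixing edges of different runs are still admitted. */
int callgraph_admits(const char *p) {
    char buf[64], t[8][16];
    int n = 0;
    strncpy(buf, p, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *tok = strtok(buf, " "); tok && n < 8; tok = strtok(NULL, " "))
        strcpy(t[n++], tok);
    for (int i = 0; i + 1 < n; ++i) {
        char pair[34];
        int found = 0;
        sprintf(pair, "%s %s", t[i], t[i + 1]);
        for (size_t j = 0; j < sizeof edges / sizeof *edges; ++j)
            if (strcmp(pair, edges[j]) == 0) found = 1;
        if (!found) return 0;
    }
    return 1;
}
```

Here the call graph admits "main f1 g h2" even though it is not feasible, which is exactly the abstraction the learned DFA avoids.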
In the next section we will briefly describe a DFA-learning algorithm (specifically, the L* algorithm, since it is the most basic of the various known DFA-learning algorithms that later improved on it). In Sec. 3 we will define precisely the language we learn, and in Sec. 4 we describe our method for answering the queries. In Sec. 5 we pause the technical description of the method and describe usage scenarios. We then continue in Sec. 6 by describing a performance optimization, which constitutes the main contribution of this article compared to the earlier proceedings version (Chapman et al., 2015). Specifically, we describe a technique we call lazy learning, which is targeted at reducing expensive membership queries. In Sec. 8 we describe various aspects of our system and our empirical evaluation of it. More examples can be found on the project's website (http://www.cprover.org/learning-errors/, 2015). We conclude with some ideas for future research in Sec. 9.

2. Preliminaries

The ideas and algorithms presented in this paper are based on the combination of learning and software (bounded) model checking. We now outline the relevant concepts from these two areas. In both cases we focus on the minimum that is necessary for understanding the contribution, as we mostly use them (especially the latter) as "black boxes".

2.1 Deterministic Finite Automata

We start by revisiting the (well-known) definition of deterministic finite automata (Hopcroft et al., 2000).

Definition 1 (Deterministic Finite Automaton). A deterministic finite automaton (DFA) A is a 5-tuple ⟨S, init, Σ, δ, F⟩, where S is a finite set of states, init ∈ S is the initial state, Σ is the alphabet, δ : S × Σ → S is the transition function, and F ⊆ S is the set of accepting states.

A finite string w over Σ is said to be accepted by A if and only if the path in A that starts at init and is induced by w ends in a state in F. The set of strings accepted by A is called A's language and is denoted by L(A).
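Definition 1 translates directly into code. The following is a minimal sketch in C (the fixed array bounds are an illustrative assumption); it also shows the complement construction, obtained by flipping the accepting states:

```c
/* Minimal sketch of Definition 1: a DFA as a 5-tuple, the acceptance
 * check, and the complement construction. Array bounds are
 * illustrative; states and letters are small integers. */
typedef struct {
    int n_states;      /* S = {0, ..., n_states-1} */
    int init;          /* initial state */
    int n_letters;     /* Sigma = {0, ..., n_letters-1} */
    int delta[8][8];   /* transition function delta : S x Sigma -> S */
    int accepting[8];  /* characteristic vector of F */
} Dfa;

/* Run the word w (array of letters, length len) from init; accept iff
 * the induced path ends in a state in F. */
int accepts(const Dfa *a, const int *w, int len) {
    int s = a->init;
    for (int i = 0; i < len; ++i) s = a->delta[s][w[i]];
    return a->accepting[s];
}

/* Complement: same structure with accepting and non-accepting states
 * swapped, so the complement accepts exactly the rejected words. */
Dfa complement(Dfa a) {
    for (int s = 0; s < a.n_states; ++s) a.accepting[s] = !a.accepting[s];
    return a;
}
```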
[Figure 4: The input and output of L*, and its interaction with the teacher and oracle.]

The complement operation on a DFA A is naturally defined as Ā = ⟨S, init, Σ, δ, S \ F⟩, that is, an automaton in which the accepting and non-accepting states are switched. A complement automaton accepts the complement language: L(Ā) = Σ* \ L(A). Given two DFAs A1, A2 over the same alphabet, there are well-known polynomial techniques (see, e.g., Sipser, 1997) for constructing the intersection automaton A∩, i.e., L(A∩) = L(A1) ∩ L(A2), the difference automaton A\, i.e., L(A\) = L(A1) \ L(A2), and the union automaton A∪, i.e., L(A∪) = L(A1) ∪ L(A2). The symmetric difference between A1 and A2 is computed as the union of L(A1) \ L(A2) and L(A2) \ L(A1). In the context of this paper, a language L over a given alphabet Σ is a subset of the set Σ* of all finite sequences (words) over Σ, i.e., L ∈ 2^(Σ*). The language L(A) of a given automaton A is the set of all finite words accepted by A (we also say that A recognizes L(A)). A language L is regular if it is the language of some DFA A. A DFA A is minimal if it has the smallest number of states of all DFAs over the same alphabet Σ that recognize L(A) (since a deterministic finite automaton has exactly one transition from each state on each letter of Σ, the number of transitions is linear in the number of states).

2.2 The L* Algorithm

The L* algorithm, developed by Angluin (1987b), introduces a framework for the iterative learning of DFAs. Essentially, L* learns an unknown regular language U by iteratively constructing a minimal DFA A such that L(A) = U. The algorithm includes two types of queries, as can be seen in Figure 4: membership queries (top arrow), namely whether, for a given word w ∈ Σ*, w ∈ U, and conjecture queries (third arrow from top), namely whether a given conjectured automaton A has the property L(A) = U.
The components that answer the two queries are called the "teacher" and the "oracle", respectively. L* begins with the trivial conjecture query corresponding to A0 such that L(A0) = ∅. If the answer is "yes" (i.e., U is the empty language), L* terminates with A0 as the answer. Otherwise it expects the oracle to provide a counterexample string σ such that σ ∈ U \ L(A0) or σ ∈ L(A0) \ U. In the first case, we call σ a positive feedback, because it should be added to L(A0). In the second case, we call σ a negative feedback, since it should be removed from L(A0). Based on the counterexample, L* initiates a series of membership queries, until it is able to build a new conjecture automaton A1 that is 1) a legal DFA and 2) consistent with the answers to the membership queries answered so far. It then makes a new conjecture query with A1. This process repeats, building conjectures A0, A1, A2, . . . until it converges on a DFA Ai for some i such that L(Ai) = U. Additional details about the inner workings of L* are immaterial for the current work. The interested reader is referred to Angluin's original paper (Angluin, 1987b), or one of its improvements (Rivest & Schapire, 1993; Kearns & Vazirani, 1994; Howar, 2012; Isberner et al., 2014).

Complexity: The number of conjecture queries is bounded by n, where n is the number of states of the resulting (minimal) DFA. The number of membership queries is bounded by O(km²n³), where k is the size of the alphabet and m is the length of the longest feedback (counterexample) returned by the oracle during learning.

2.3 Control Flow Graphs

A control flow graph (CFG) is a data structure representing an overapproximation of all paths that may be traversed in a program execution. For every statement in a program, the control flow graph provides all potential successor statements that may be executed after the original statement.
At runtime, for every given program state, only a single successor statement is actually executed, depending on loop guards and branching conditions.

2.4 Bounded Model Checking of Software

A key component in our solution is the software analyzer CBMC (Clarke et al., 2004), which we use in the implementation of the teacher and the oracle for the learner, as will be explained later. CBMC implements the bounded model checking algorithm described below. Let P be a program instrumented with an assertion ϕ (an assertion is a statement in the program parameterized by a propositional formula over the variables of P that are currently in scope; it is only used for debugging and has no impact on the execution of the program). The goal of model checking is to determine whether ϕ holds in P. The general model-checking problem is undecidable (this can be proved by a simple reduction from the halting problem), hence all model-checking tools work by solving a decidable fragment of it. SAT-based bounded model checking (BMC) is an approach that solves the model-checking problem up to a bound k, for a given program P with an assertion ϕ (Biere et al., 2003). BMC works by first unwinding the symbolic transition relation up to k steps, hence obtaining a program without loops or recursive calls. CBMC also supports setting a different bound for each loop and recursive call. It then generates a propositional formula that represents symbolically all the possible traces in this program (this is possible since all traces are of length at most k), and adds to the formula the negation of the assertion. It passes the resulting formula to a SAT solver to determine its satisfiability. If the BMC formula is unsatisfiable, the program satisfies ϕ within the bound k, and is hence safe with respect to this bound (note that errors might still occur on traces longer than k steps).
Otherwise, the program contains errors, and each satisfying assignment of the formula corresponds to a counterexample, i.e., an execution that falsifies ϕ. BMC is the standard approach to model checking of software and is widely used both in academia and in industry, the latter mostly for verifying safety-critical code. While CBMC (as well as other bounded model checkers) was originally built in the verification community for detecting errors in programs, here we use it for answering queries about programs as part of our learning algorithm.

3. The Language We Learn

Our learning scheme is based on user-defined events, which are whatever a user chooses as their atoms for describing the behaviors that lead to an assertion violation. At source level, events are identified by instrumenting the code with a Learn(id) instruction at the desired position, where id is an identifier of the event. Typical locations for such instrumentation are at the entry to functions and branches, both of which can be instrumented automatically by our tool. Each location obtains its own unique id. The set of event identifiers constitutes the alphabet Σ of the automaton A that we construct. A sequence of events is a Σ-word that may or may not be in L(A), the language of A. For an instrumented program P, a trace π of P induces a Σ-word by projection onto Σ, which we denote by α(π). The language of such a program, denoted L(P), is defined naturally by

L(P) := {α(π) | π ∈ P} .   (1)

Recall that our goal is to obtain a representation, over the set of events Σ, of P's traces that violate a given assertion. Let ϕ be that assertion, and denote by π ⊭ ϕ the fact that a given trace π violates ϕ. We now define

Fail(P) := {α(π) | π ∈ P ∧ π ⊭ ϕ} .   (2)

In general, this set is irregular and incomputable and, even in cases in which it is computable, it is likely to contain too much information to be useful.
However, if we bound the loops and recursion in P, this set becomes finite, and hence regular and computable. Let b be such a bound, and let

Fail(P, b) := {α(π) | π ∈ P ∧ Depth(π) ≤ b ∧ π ⊭ ϕ} ,   (3)

where Depth(π) denotes the maximal number of loop iterations or recursive calls made along π. Restricting the set of paths this way implicitly restricts the length of the abstract traces that we consider, i.e., |α(π)| ≤ b′, where b′ can be computed from P and b. We also allow users to bound the word length |α(π)| directly with another value bwl. In Sec. 8 we will describe strategies for obtaining such bounds automatically. Based on these bounds we define

Fail(P, b, bwl) := {α(π) | π ∈ P ∧ Depth(π) ≤ b ∧ |α(π)| ≤ bwl ∧ π ⊭ ϕ} .   (4)

The DFA A that we learn and present to the user has the following property for all π ∈ P:

Depth(π) ≤ b ∧ |α(π)| ≤ bwl ⟹ (α(π) ∈ L(A) ⟺ α(π) ∈ Fail(P, b, bwl)) .   (5)

Note that this means that for a trace π that has more loop iterations or recursive calls than b, or whose projection onto Σ is longer than bwl, we give no guarantee regarding whether it is accepted or rejected by A.

[Figure 5: A sequence of automata that are conjectured by the learner: (a) the empty conjecture; (b) an intermediate conjecture (over the letters g, f, assert); (c) the accepted automaton.]

    void g(int x) { if (x > 100) exit(0); }
    void f() { assert(0); }
    int main(int argc, char* argv[]) {
      for (int i=0; i < argc; i++)
        g(i);
      f();
    }

Figure 6: The program analyzed by the conjectures in Figure 5.

4. Answering the Queries

Consider the automaton in Figure 5c, which is learned by our system from the program in Figure 6 when b = bwl = 4. This automaton is the second conjecture of the learner. Let us briefly review the steps the learner follows that lead to this conjecture. Initially it has a single state with no transitions, as given in Figure 5a. Then it asks the teacher three single-letter membership queries: whether f, g and assert are in U.
The answer is "no" to all three since, e.g., we cannot reach an assertion failure on a path hitting f alone (in fact, the first two are trivially false because they do not end with assert). After answering these queries, the learner has a legal DFA that is consistent with the answers to the membership queries, which appears in Figure 5b: an automaton with one non-accepting state. It poses this automaton as a conjecture to the oracle, which answers "no" and returns σ = f·assert as positive feedback, i.e., this word should be added to L(A). Now the learner poses 12 further membership queries and conjectures the automaton in Figure 5c. The oracle answers "yes", which terminates the algorithm. We continue by describing the teacher and the oracle in our case, namely how we answer those queries. The source code of P is instrumented with two functions: Learn(id) at the location of each Σ-event (recall that id is the identifier of the event), and Learn_Assert at the location of the assertion that is being investigated. The implementation of these functions depends on whether we are checking a membership or a conjecture query, as we will now show.

     1: int word[|w|] = {3, 3, 6, 2, 0};   // The checked word w
     2: int idx = 0;
     3: function void Learn(int x)   // Event
     4:   if idx ≥ |w| ∨ word[idx] ≠ x then
     5:     assume(false);   // Block paths incompatible with the query
     6:   idx = idx + 1;
     7: function void Learn_Assert(bool assert_Condition)
     8:   if assert_Condition then   // Assertion fails
     9:     if idx = |w| − 1 then assert(false);   // w ∈ L(A): answer "true" to the query
    10:     assume(false);   // Arrived here at the wrong time: block the path

Figure 7: Pseudocode generated for a particular membership query.

4.1 Membership Queries

A membership query is as follows: given a word w, is there a π ∈ P such that α(π) = w and π ⊭ ϕ? Figure 7 gives the pseudocode that we generate for a membership query, in this case for the word (3 3 6 2 0).
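The following runnable mock illustrates how this instrumentation classifies paths, here for a shorter word (3, 0): a call to F (letter 3) followed by the failing assertion (letter 0). It is an illustration, not the generated code; in particular, CBMC's assume(false), which prunes a path, is emulated with a "blocked" flag so the sketch runs as ordinary C.

```c
/* Runnable mock of the membership-query instrumentation for the word
 * (3, 0). CBMC's assume(false) is emulated by the "blocked" flag; the
 * word and driver calls are illustrative assumptions. */
#include <stddef.h>

const int word[] = {3, 0};     /* the checked word w */
const size_t wlen = 2;
size_t idx;
int blocked, is_member;

void reset(void) { idx = 0; blocked = 0; is_member = 0; }

void Learn(int x) {            /* event with letter x */
    if (blocked) return;
    if (idx >= wlen || word[idx] != x) { blocked = 1; return; } /* assume(false) */
    idx++;
}

void Learn_Assert(int assert_condition) {  /* true = assertion fails */
    if (blocked || !assert_condition) return;
    if (idx == wlen - 1 && word[idx] == 0)
        is_member = 1;         /* failure exactly at the end of the word */
    else
        blocked = 1;           /* failed at the wrong time: block path */
}
```

A path that calls F and then fails the assertion witnesses membership; a path that calls G (letter 6) first is blocked, as is a path on which the assertion holds.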
The letter 0 always symbolizes an assertion-failure event, and indeed queries that do not end with 0 are trivially rejected. This code, which is an implementation of the instrumented functions mentioned above, is added to P, and the combined code is then checked with the bounded model checker for software CBMC (Clarke et al., 2004). CBMC supports assume(pred) statements, which block any path that does not satisfy the predicate pred. In lines 4-5 we use this feature to block paths that are not compatible with w. Learn_Assert is called when the path arrives at the checked assertion, and declares the membership to be true (i.e., w ∈ L(A)) if the assertion fails exactly at the end of the word. Figure 8 provides an example illustrating how a user program would be instrumented to invoke these API functions.

Optimisations. We bypass a CBMC call and answer "no" to a membership query if one of the following holds: the query does not end with a call to assert; the query contains more than one call to assert; w is incompatible with the control-flow graph. The last optimisation is based on the CFG properties described in Sec. 2.3. If there is no path π in the CFG, restricted by the configured loop unwinding, such that w = α(π), we skip exploration using membership queries in CBMC, since the CFG models an overapproximation of all possible paths in the program. We call these preliminary polynomial-time checks pre-checks. In our experiments they answer 96% of the membership queries. The remaining 4%, which involve a call to CBMC, dominate the overall run time. We will describe a heuristic that can significantly reduce the number of CBMC calls in Sec. 6.

     1: function void main(int input)   // Entry point
     2:   if input ≥ 10 then
     3:     F();   // Learn(3)
     4:   if input ≥ 20 then
     5:     F();   // Learn(3)
     6:   G();   // Learn(6)
     7:   H();   // Learn(2)
     8:   Learn_Assert(input < 20);   // Learn(0)
    10: function F
    11:   Learn(3);
    12:   // . . .
    13: function G
    14:   Learn(6);
    15:   // . . .
    16: function H
    17:   Learn(2);
    18:   // . . .

Figure 8: Sample instrumentation of an input program.

4.2 Conjecture Queries

A conjecture query is: given a DFA A, is there a π ∈ P such that α(π) ∈ L(A) ∧ π ⊨ ϕ, or α(π) ∉ L(A) ∧ π ⊭ ϕ? The two cases correspond to negative and positive feedback to the learner, respectively. Figure 9 presents the code that we add to P when checking a conjecture query. The candidate A is given as a two-dimensional array A (representing the edges in δ), and the accepting states of A are given in an array accepting (both are not shown here). We denote by path an array that captures the abstract path, as can be seen in the implementation of learn. Learn_Assert simulates the path accumulated so far (lines 7-8) on A in order to find the current state. It then aborts if one of the two conditions above holds. In both cases the path path serves as the feedback to the learner.

     1: function learn(int x)   // Event
     2:   assume(idx < bwl);
     3:   path[++idx] = x;
     4: function Learn_Assert(bool assert_Condition)
     5:   if assert_Condition then learn(0);   // 0 = the assert letter
     6:   char state = 0;   // Initial state
     7:   for (int i = 0; i < idx; ++i) do
     8:     state = A[state][path[i]];   // Finding the current state
     9:   if ¬assert_Condition ∧ accepting[state] then assert(false);   // negative feedback
    10:   if assert_Condition ∧ ¬accepting[state] then assert(false);   // positive feedback

Figure 9: Code added to P for checking conjecture queries.

Eliminating Spurious Words. The conjecture-query mechanism described above only applies to those paths that end with Learn_Assert. Other paths should be rejected, and for this we add a trap at the exit points of the program. The implementation of this function appears in Figure 11. It ends with negative feedback if the current path is a prefix of a path that a) reaches an accepting state in A (line 6), and b) was not marked earlier as belonging to L(A) (line 7). The reason for this filtering is that the same abstract path can belong both to a real path π ∈ P and to a path π′ ∉ P that we choose nondeterministically in this function (see line 9). For example, consider the program in Figure 10, which is the same as the one in Figure 2, with the sole difference that it has a branch before f(), allowing it to exit in case a particular input has been received. The final automaton is unchanged. At the exit point one line before the end (the call to exit), our instrumentation adds a call to the trap function. Now the path g f 0 can occur in two different ways: via the path that calls f, and via the path through the trap function that nondeterministically chooses f 0. Since this word is in the language, we do not want to report it as negative feedback, and this is the reason for the second condition.

    void g(int x) { if (x > 100) exit(0); }
    void f() { assert(0); }
    int main(int argc, char* argv[]) {
      for (int i=0; i < argc; i++)
        g(i);
      if (argv[0] == '1')
        exit(0);
      f();
    }

Figure 10: The same program as in Figure 2, but with a branch before calling f().

We now show that the above implementation indeed guarantees the properties described in Eqn. (5):

Theorem 4.1. The implementations in Figures 9 and 11 ensure that for all π ∈ P:

Depth(π) ≤ b ∧ |α(π)| ≤ bwl ⟹ (α(π) ∈ L(A) ⟺ α(π) ∈ Fail(P, b, bwl)) .

Proof. We need to show that for all π ∈ P,

Depth(π) ≤ b ∧ |α(π)| ≤ bwl ∧ α(π) ∈ L(A) ⟹ Depth(π) ≤ b ∧ |α(π)| ≤ bwl ∧ π ⊭ ϕ .

The first two conjuncts are trivially true. We prove the third by contradiction. Thus assume that π ⊨ ϕ. We separate the discussion into two cases:

• α(π) ends with a Learn_Assert statement. In the (last) conjecture query π calls that function, which appears in Figure 9.
Since π ⊨ ϕ, the guard in line 5 is false. In line 8, state, in the last iteration of the for loop, is accepting, because we know from the premise that α(π) ∈ L(A). This fails the assertion in line 9, and the conjecture is rejected. Contradiction.

• Otherwise, in the (last) conjecture query, π calls the trap function of Figure 11. In line 5, state, at the end of the for loop, is accepting, again because we know from the premise that α(π) ∈ L(A). The condition in line 7 is false, and the assert(false) in the following line is reached. The conjecture is rejected. Contradiction.

     1: function Learn_trap
     2:   char state = 0;
     3:   for (; idx < bwl; ++idx) do
     4:     for (int i = 0; i ≤ idx; ++i) do   // Compute the current state in A
     5:       state = A[state][path[i]];
     6:     if accepting[state] then   // state is an accepting state
     7:       if path ∈ L(A) is known then assume(false);   // Block the path
     8:       assert(false);   // Negative feedback
     9:     path[idx] = nondeterministic element of Σ;

Figure 11: Learn_trap is called at P's exit points. It gives negative feedback to conjecture queries in which there is a w ∈ L(A) that does not correspond to any path in P.

For the other direction, we need to show that for all π ∈ P,

Depth(π) ≤ b ∧ |α(π)| ≤ bwl ∧ π ⊭ ϕ ⟹ Depth(π) ≤ b ∧ |α(π)| ≤ bwl ∧ α(π) ∈ L(A) .

Again, the first two conjuncts are trivially true, and we prove the third by contradiction: assume that α(π) ∉ L(A). Since π ⊭ ϕ, π must end with a call to Learn_Assert. Hence, in the (last) conjecture query, π calls that function, which appears in Figure 9. By our premise, state is not accepting. Hence the condition in line 10 is met, and α(π) is returned as positive feedback to L*, which adds it to A. Contradiction.

The trap function has an additional benefit: it brings us close to the following desired property for every word w ∈ Σ*:

w ∈ L(A) ∧ |w| ≤ bwl ⟹ ∃π ∈ P. α(π) = w .   (6)

That is, ideally we should exclude from L(A) any word w with |w| ≤ bwl that does not correspond to a path in P.
The reason that this trap function does not guarantee (6) is that it only catches a word w ∈ L(A) if there is a path π ∈ P to an exit point such that α(π) is a prefix of w. In other cases, the user can check the legality of w ∈ L(A) either manually or with a membership query.

5. Usage Scenarios for the Learning Framework

We foresee three distinct use cases for our framework. The first is to help understand software errors. Given such an error, we build a DFA that presents an abstraction of the language of error over user-defined events, as characterized in Theorem 4.1. The DFA represents the set of error traces in a concise way and is amenable to standard analyses on DFAs, such as the computation of dominators and of doomed states and transitions. These analyses may aid in understanding the root cause of errors. The second use case is program explanation, i.e., providing a DFA that describes abstractly the behavior of a program fragment. The motivation for this should be clear to any programmer, because programs are frequently difficult to understand. This is either because of the sheer complexity of the implementation, because of a change in ownership of the code, or because the program was, at least in part, generated automatically. By adding a failing assertion at the exit point of the program, our framework produces a DFA that represents a bounded regular abstraction of the program behavior with respect to important events (as defined by the user or derived automatically). Finally, our framework can be used to assist in merging software development branches. In this common scenario, several developers make changes to different branches of the same (version-controlled) source code. Often, when the developers attempt to merge their changes back into the parent branch, automatic merging is performed by the version control system. This can introduce unexpected behavior.
For example, consider the source code in Figure 12, representing an original program, its two branches developed independently, and the result of the automatic merge (this example is based on Lee, 2012). Note that the merge operation creates an unexpected behavior, where func_Z() calls func_C(), despite this not being the intent of either developer.

(a) Original:
main { ... func_A(); func_B(); ... }

(b) Branch A:
main { ... func_A(); ... func_Z(); }
func_Z() { func_B(); }

(c) Branch B:
main { func_A(); if (guard) func_B(); else func_C(); ... }

(d) Automatic merge:
main { ... func_A(); ... func_Z(); }
func_Z() { if (guard) func_B(); else func_C(); ... }

Figure 12: The effects of an automatic merge in a version control system.

Using our framework, both versions and the merged program are represented as DFAs, and the difference between the merged program and each branch is computed as a difference between automata (see Sec. 2). Figure 13 (behavior not in Branch A) and Figure 14 (behavior not in Branch B) draw attention to the new behavior introduced by the automatic merge. The changes are easy to see in small examples; in arbitrarily large programs, however, the issues introduced by an automatic merge could be hard to identify. Moreover, this behavior in larger programs might be difficult to understand due to a lack of single ownership of the code. This mechanism can also be applied to merge conflicts (i.e., when different versions of the code cannot be merged automatically), in order to visually display differences between branches, rather than annotating the repository code directly with conflict markers.

6. Performance Optimization: Lazy Learning

In the previous sections we saw that the learning algorithm requires answering membership queries, which in our case may amount to a computationally expensive CBMC call.
More generally, while it is known that with an O(1)-time teacher and oracle, DFA-learners are polynomial in the size of the resulting automaton A, in all applications in verification that we are aware of, the teacher and oracle are exponential, as they involve some type of formal semantic reasoning about the program. Examples include assume-guarantee reasoning (Chaki & Strichman, 2008), model learning (Fiterau-Brostean et al., 2016) and synthesis (Drachsler-Cohen et al., 2017). The method we suggest here, lazy learning, is relevant to any application of DFA-learners in which answering membership queries is computationally expensive or impossible. We emphasize that we do not intervene in the learning algorithm itself (i.e., the generation of the queries), which is why this method is orthogonal to the variant of the learning algorithm that is used.

The idea is to guess or approximate the answer to membership queries whenever they are difficult to answer; we then detect wrong guesses during conjecture queries, cache the correct answers so as not to repeat the wrong guesses, and finally restart the process with the updated cache. What should that guess be? In the case of error explanation, we guess "no", since most paths do not fail the assertion and hence do not belong to the language that we wish to learn. In the case of program explanation, on the other hand, our default guess is "yes", because all paths fail the assertion assert(false) and hence belong to the language (the only abstract paths that do not belong to the language are those that do not have a corresponding concrete path in the program). In the following we focus only on the case of error explanation, and hence by default the guess is "no". Figure 15 visualizes our lazy learning algorithm. We recommend that the reader study the caption.
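Stripped of CBMC and of the learner itself, the guess-and-fix cycle can be sketched as a thin layer around the membership oracle. The following Python sketch is illustrative only: the class and method names are our own, and a cheap predicate stands in for the expensive CBMC call:

```python
# Sketch of the lazy-query layer. A cheap ground-truth predicate stands in
# for the expensive membership oracle (a CBMC call in pv); all names here
# are illustrative, not pv's actual code.
class LazyTeacher:
    def __init__(self, expensive_member):
        self.member = expensive_member   # ground-truth oracle
        self.cache = {}                  # word -> confirmed answer
        self.unconfirmed = set()         # words whose answer was guessed

    def membership(self, word):
        if word in self.cache:           # confirmed answers are reused
            return self.cache[word]
        self.unconfirmed.add(word)       # guess "no" (error explanation)
        return False

    def fix_wrong_guesses(self):
        # One batch check over all unconfirmed words; in pv this is a
        # single CBMC run restricted by an assume() disjunction.
        wrong = {w for w in self.unconfirmed if self.member(w)}
        for w in wrong:
            self.cache[w] = True         # fix the guess: "no" -> "yes"
        for w in self.unconfirmed - wrong:
            self.cache[w] = False        # guess confirmed correct
        self.unconfirmed.clear()
        return bool(wrong)               # True: the learner must restart

# Toy target language: words over {0, 1} that end in 1.
teacher = LazyTeacher(lambda w: len(w) > 0 and w[-1] == 1)
print(teacher.membership((0, 1)))   # guessed False (a wrong guess)
print(teacher.membership((0, 0)))   # guessed False (a correct guess)
print(teacher.fix_wrong_guesses())  # True: one guess needed fixing
print(teacher.membership((0, 1)))   # now answered from the cache: True
```

A wrong guess is only discovered at the batch check; the learner is then rolled back to its last conjecture and re-run against the enlarged cache, matching the restart loop of Figure 15.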
Figure 15: The three main elements of lazy learning: (top) membership queries with polynomial-time pre-checks and guessing (in our case the pre-checks can only answer "no" to a membership query, and the guesses are consistent with the cache of correct answers); (middle) a fixing stage (see Figure 16) for the wrong guesses, which ends with an enlarged cache of correct answers; and (bottom) a conjecture query. UCQueries holds the set of UnConfirmed queries. After a fixing step, the state after the last conjecture is reloaded from memory and the membership queries are answered against the updated cache. Note that the learner is adaptive, which means that the queries may differ as a result of the different answers to the membership queries. The conjecture query (bottom) is invoked only when we know that the conjecture was built with fully correct answers to the membership queries.

A summary of the part of the algorithm that answers the conjecture query while fixing previous wrong guesses to the membership queries (the middle part of Figure 15) appears in Figure 16. The queries in this algorithm (lines 2 and 7) are not normal membership queries; rather, the program is restricted to paths that match one of the membership queries for which the algorithm received possibly incorrect answers. We call such queries batch membership checks. Technically, this is done by automatically adding an assume statement, with a disjunction between the paths, to the C file with which we check the conjecture query.
For example, if we had two membership queries, (1, 2) and (2, 3), where 1, 2, 3 are letters in Σ, for which we guessed the answer, our tool will add the assume statement

assume((path[0] == 1 && path[1] == 2) || (path[0] == 2 && path[1] == 3));

where path is an array recording the events in the code corresponding to Σ. CBMC checks the membership property under this assumption (lines 2 and 7), and hence any feedback it finds corresponds to one of those words. In our case, since we always guess "no", the counterexample must be a positive feedback. We update our cache with the correct answer in line 5, remove that word from the assume statement in line 6, and re-run CBMC in line 7. We repeat this process as long as CBMC finds new positive feedback among the unconfirmed queries. Note that the number of CBMC calls equals the number of wrong guesses plus one; hence the success of this technique depends on the success rate of the guesses. Once all the wrong answers are fixed, we backtrack the learner to the last conjecture. For this purpose, we always save the state of the learner at the last conjecture in a variable Prev_Conjecture_Table.

1: function fix( )
2:   feedback = Batch_Membership(UCQueries);
3:   if (feedback.empty()) then return;
4:   do
5:     fix the cache according to the feedback;
6:     UCQueries.remove(feedback);
7:     feedback = Batch_Membership(UCQueries);
8:   while (!feedback.empty())
9:   UCQueries.clear();
10:  ls.table = Prev_Conjecture_Table;    ▷ ls is the learner object

Figure 16: Lazy learning with local restarts. In the case of our tool (called pv), where the guessed answers are always "false", entries in the cache are always fixed from "no" to "yes".

1: function Learn_Upto_Bound(Program P)
2:   for b ∈ [1 . . . bmax] do
3:     for bwl ∈ [bwl_min . . . bwl_max] do
4:       A = learn P with b and bwl;
5:       if A = Aprev and A does not have back edges then return A;
6:       Aprev = A;

Figure 17: The autonomous discovery of the appropriate bounds.
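Constructing the batch membership restriction from a set of unconfirmed words is purely mechanical string generation. A small Python sketch (illustrative; only the array name path is taken from the instrumentation described above):

```python
# Build the assume() statement that restricts CBMC to paths matching one
# of the unconfirmed words. Illustrative sketch; "path" is the event
# array from the instrumentation, everything else is our own naming.
def batch_assume(words):
    def clause(w):
        # One conjunction per word: path[0] == w[0] && path[1] == w[1] ...
        return "(" + " && ".join(
            f"path[{i}] == {letter}" for i, letter in enumerate(w)) + ")"
    return "assume(" + " || ".join(clause(w) for w in words) + ");"

print(batch_assume([(1, 2), (2, 3)]))
# assume((path[0] == 1 && path[1] == 2) || (path[0] == 2 && path[1] == 3));
```

Removing a word from the disjunction (line 6 of the fix function) then amounts to regenerating this statement from the reduced set of unconfirmed words.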
Since the cache has been updated in the process, we are guaranteed to advance to the next conjecture with all guesses either corrected or confirmed.

7. Usability Optimizations

In this section we describe further features that we implemented in pv in order to improve its usability. We first describe how we determine the bounds automatically, and then how we remove unnecessary edges from the resulting automaton.

7.1 Determining the Bounds

The automatic estimation of suitable values for both the loop bound b and the word length bwl contributes significantly to the usability of our framework. Our strategy is illustrated in Figure 17. We let b range between 1 and bmax, where bmax is relatively small (4 in our default configuration). This reflects the fact that higher values of b may have a negative impact on performance, and that in practice low values of b are usually sufficient for triggering the error. As the initial value of bwl (bwl_min), we take a conservative estimate of the shortest possible word, according to the length of a shortest loop-free path to the assertion violation on the control-flow graph of P. We increase the value of bwl up to a maximum of bwl_max, which is user-defined. The value of bwl_max reflects the user's estimate of how long these words can be before the explanation becomes too long to understand. Recall that the value of b implies a bound on the word length (we denoted it b in Sec. 3), and hence, for a given b, increasing the explicit bound bwl on the word length beyond a certain value is meaningless. In other words, for a given b, the process of increasing bwl converges.

Figure 18: Size of A (edges/states in A) as a function of bwl, for the bubble-sort example.
Until convergence, the number of states of A can both increase and decrease as a result of increasing bwl (it can decrease because paths not belonging to the language are caught in the conjecture query, which may lead to a smaller automaton). Figure 18 demonstrates this fact for one of the benchmarks (bubble sort with b = 2). We are not aware of a way to detect convergence in polynomial time, so in practice we terminate when two conditions hold (see line 5): (a) A has not changed from the previous iteration, and (b) A does not contain edges leaving an accepting state ("back edges"). Recall that a failing assertion aborts execution, and hence no path can continue beyond it. Therefore, the existence of such edges in A indicates that increasing b, bwl, or both should eventually remove them. These conditions can also help the user decide whether to continue increasing bwl_max.

The incremental nature of Learn_Upto_Bound is exploited by our system for improving performance. We maintain a cache of words that have already been proven to be in U, and consult it as the first step of answering membership queries. Negative results from membership queries can only be cached and reused if the result does not depend on the bound. For example, the optimisation mentioned in Sec. 4, by which we reject words that are not compatible with the control-flow graph, does not depend on the bound and hence can be cached and reused. In our experiments, caching reduces the number of membership queries sent to CBMC by 32% on average.

7.2 Post-processing

Our system performs the following post-processing on A in order to assist the user:

Marking dominating edges: edges that represent events that must occur in order to reach the accepting state. To detect these edges, we remove each event in turn (recall that the same event can label more than one edge) and check whether the accepting state is still reachable from the root.
Marking doomed states: states from which reaching the accepting state is inevitable (Hoenicke et al., 2010).

Removing the (non-accepting) sink state and its incoming edges: such a state always exists, because the outgoing edges of the accepting state must transition to it (recall that an assertion failure corresponds to aborting the execution). Missing transitions are then interpreted as rejection.

7.3 Using pv for Debugging

Our tool pv has an option for storing traces and using them later for debugging. If it is executed with the --debug option, the executions that correspond to membership queries are stored and can be used for guiding the debugging process. This option is useful when a particular execution demonstrating the assertion violation is possible only for a specific combination of parameters, such as values of input variables, which are not captured by the learned automaton. In this case, there are many executions corresponding to a particular sequence of important events represented as a trace in the automaton, and only some of them may actually exhibit the assertion violation. The debug option allows the user to replay the particular execution that demonstrates the error and to see the values of other relevant parameters in this execution.

8. Experimental Evaluation

Multiple programs with errors, and the corresponding DFAs that explain those errors, can be viewed in our online supplement (http://www.cprover.org/learning-errors/, 2015). On that web site the user can check the effect of changing the bounds b and bwl. We encourage the reader to visit that page in order to appreciate the value of visual explanations. In this section we describe our experimental setup and the results of implementing and executing the optimisation techniques described in Sec. 6 on a set of benchmarks; thus the focus here is on performance. We implemented lazy learning in pv, the error/program visualization tool (Chapman et al., 2015).
The implementation of the algorithm uses the automata library libalf (Bollig et al., 2010) as the learner. This library supports L*, its improvement RS (Rivest & Schapire, 1993), and several other variants; we experimented with the first two. pv uses the bounded software model checker CBMC 5.4 (Clarke et al., 2004) for answering queries. For each benchmark we measured both the run time and the number of CBMC calls. We evaluated two configurations:

baseline: CBMC only, as in (Chapman et al., 2015).

lazy: the method described in Sec. 6.

8.2 Results

We applied pv to visualize errors in a set of seven software verification benchmarks that are relatively easy verification targets for CBMC. They were drawn from three sources: the Competition on Software Verification (Beyer, 2015), the Software-artifact Infrastructure Repository2, and Docking Approach, a program describing the behavior of a space shuttle as it docks with the International Space Station (an open-source version of the NASA Docking Approach system). Each of these programs contains a single instrumented assertion. These are the same benchmarks used in (Chapman et al., 2015), apart from the addition of the JQ benchmark, a lightweight command-line JSON parser. We use JQ as a motivating industrial example which, as we explain below, benefits greatly from our lazy algorithm. Each benchmark is executed at different loop-unwinding and word-length settings, resulting in 83 data points. An Excel sheet with detailed results can be downloaded from the project's web page (http://www.cprover.org/learning-errors/, 2015). Tab. 1 summarizes the average runtime decrease as well as the percentage of saved CBMC queries when using lazy learning.

p                     97.9 %
Runtime decrease      30.06 %
Saved CBMC queries    75.31 %

Table 1: Saved CBMC queries and runtime decrease for lazy learning, compared to a run with standard learning.
We denote by p the ratio of correct guesses. As we mentioned in Sec. 6, in our application domain the strategy of guessing "no" performs best for learning the language of error, and "yes" for program explanation, when there is no information about the query. As we can see from Tab. 1, the probability of an answer being correct with this strategy is more than 0.9. In our experiments, an average of 75% of all CBMC queries can be avoided using lazy learning, which results in an average runtime decrease of 30%. This suggests that, while lazy learning invokes CBMC fewer times, the batch CBMC queries create larger formulas on average and thus take more time to solve; hence the runtime decrease does not exactly match the reduction in CBMC invocations.

Tab. 2 shows the number of saved CBMC queries and the runtime improvement for each of the examined benchmarks.

Benchmark                      p (%)   Runtime decrease (%)   Saved CBMC queries (%)
Docking Approach Example Ext   100.0   81.54                  85.45
ifstofunctions defroster        98.5   11.58                  44.50
merge_sort_false                95.5   23.61                  75.83
sll_to_dll_rev_false            99.0   11.92                  89.58
tcas auto instrumented         100.0    5.32                  92.10
JQ                              96.5   24.88                  63.77
schedule                        95.7   75.44                  75.97

Table 2: Experimental results per benchmark.

2. http://sir.unl.edu

Note that lazy learning leads to a runtime increase on the benchmark sll_to_dll_rev_false, owing to the fact that the CBMC queries amount to only a few seconds for this benchmark, so the overhead of constructing the more complicated batch queries negates any potential benefit. While this is an outlier in our benchmark set, it illustrates that lazy learning is not always beneficial.

In addition to these experiments, Appendix A describes in detail an industrial use case that we experimented with, based on the JQ JSON parser. It covers not only performance, but also the process of using pv, which helped us detect a previously unknown bug in this widely used application.

9. Conclusions and Future Work

Our definition of Fail(P) in (2) captures the "language of error", but this language is, in the general case, not computable. We have presented a method for automatically learning a DFA, A, that captures a well-defined subset of this language (see Theorem 4.1), for the purpose of assisting the user in understanding the cause of an error. More generally, the same technique can be used for visualising the behavior of a program or parts thereof, and for assisting in merging code.

There are multiple avenues for future research. On the application side, perhaps a similar method can be used for visually comparing different versions of a program (by presenting an automaton that captures the behavior in the symmetric difference between them). On the more theoretical side, an interesting direction is to adapt the framework to learn ω-regular languages, represented by Büchi automata (see Farzan, Chen, Clarke, Tsay, & Wang, 2008; Angluin & Fisman, 2016, for extensions of DFA-learners to ω-regular languages). This extension would enable the learning of behaviors that violate liveness properties of non-terminating programs. Another future direction is learning non-regular languages, which would enable the learning of richer abstract representations of the language of error for a given program. Context-free grammars are of particular interest because of the natural connection between context-free grammars and the syntax of programming languages; some subclasses of context-free grammars have been shown to be learnable, such as k-bounded context-free grammars (Angluin, 1987a), giving us the possibility of harnessing these algorithms in our framework. The class of unrestricted context-free grammars is not believed to be learnable (Angluin & Kharitonov, 1995).
Finally, since one of the main goals of our framework is to present the language of error (or of interesting behavior) in a compact, easy-to-analyze and understandable way, small automata are clearly preferable. Even for a given alphabet Σ, we believe it should be possible to reduce the size of the learned DFA A, based on the observation that we do not care whether a word w is accepted or rejected by the automaton when there is no path π ∈ P with |π| ≤ b, |α(π)| ≤ bwl, and α(π) = w. Adding a "don't care" value to the learning scheme requires a learning mechanism that can handle three-valued answers (see Leucker & Neider, 2012, for a learning algorithm with inconclusive answers).

Acknowledgments

This work was supported in part by the Google Faculty Research Award 2014. A preliminary version of the first part of the paper appeared in the Proceedings of the 13th International Symposium on Automated Technology for Verification and Analysis (ATVA 2015).

Appendix A. A Detailed Case Study: JQ

The JQ project3 is a lightweight command-line JSON processor. JQ is one of the most popular C projects on GitHub, with over 5000 stars, and is widely used in industrial and production settings. It comprises over 75k lines of C code and includes a multitude of complex heap data structures. The latter makes JQ a very challenging target for static analysis, since even simple operations such as string concatenation are implemented using sizeable heap data structures. The goal of our case study is to illustrate that for a large industrial program such as JQ, the improvements to the error-learning algorithm can reduce the analysis time significantly. In order to demonstrate the increased learning speed, we picked a known bug, open at the time of our first experiments, from the JQ bug tracker page4.
We give a brief description of the bug in the following paragraph, and then describe how we used the lazy error-learning framework to create a visual description of the bug, which allowed us to understand and fix it.5

The bug occurs during the module-loading phase, in which JQ links external modules into the current JSON context. JQ users can provide library search paths in which JQ will look for these additional modules. Users can specify absolute as well as relative paths for this purpose. However, depending on the input syntax, module lookup using relative paths fails. In particular, using, e.g., ./mod instead of mod triggers this bug. Figure 19 shows both a successful and a failing invocation of JQ.

> jq -L mod -n 'import "module0" as check; check::check'
true
> jq -L ./mod -n 'import "module0" as check; check::check'
jq: error: module not found: module0

jq: 1 compile error

Figure 19: Command-line example triggering the JQ bug.

In order to isolate and fix this bug, we first used a traditional GDB debugging session to identify the function build_lib_search_chain(...), which is responsible for creating the fully expanded list of library search paths from the user input. Expanded in this context means absolute file-system paths with all JQ environment variables replaced. Our debugging session further showed that this process works for some invocations of build_lib_search_chain(...), even with the ./ path syntax, but fails for others. Our assumption was that this discrepancy must be linked to the values of the three arguments passed into the function. We created user-defined learning events monitoring the status of these arguments and analysed the program using pv. All generated automata are published on the project's web page (http://www.cprover.org/learning-errors/, 2015).

3. https://github.com/stedolan/jq
4. https://github.com/stedolan/jq/issues/817
5. By the time we fixed the bug, a formal patch had already been committed.
The resulting automaton highlighted which events and input-parameter manipulations were necessary to reach the bug in build_lib_search_chain(...). Using this information we successfully implemented a patch fixing this issue. For this particular benchmark, the lazy learning algorithm described in Sec. 6 reduced the runtime of pv from over 59 minutes down to 28 minutes at user unwind limit 4 and maximum word length 12. On average, a runtime decrease of 25% was achieved across all unwind limits and word lengths. More detailed results can be found at the project's web page (http://www.cprover.org/learning-errors/, 2015).

References

Alur, R., Černý, P., Madhusudan, P., & Nam, W. (2005). Synthesis of interface specifications for Java classes. In Principles of Programming Languages (POPL) (pp. 98–109). ACM.

Angluin, D. (1987a). Learning k-bounded context-free grammars (Tech. Rep.). Dept. of Computer Science, Yale University.

Angluin, D. (1987b). Learning regular sets from queries and counterexamples. Information and Computation, 75(2), 87–106.

Angluin, D., & Fisman, D. (2016). Learning regular omega languages. Theor. Comput. Sci., 650, 57–72.

Angluin, D., & Kharitonov, M. (1995). When won't membership queries help? J. Comput. Syst. Sci., 50(2), 336–355.

Argyros, G., Stais, I., Kiayias, A., & Keromytis, A. D. (2016). Back in black: Towards formal, black box analysis of sanitizers and filters. In IEEE Symposium on Security and Privacy (S&P) (pp. 91–109). IEEE Computer Society.

Ball, T., Naik, M., & Rajamani, S. K. (2003). From symptom to cause: localizing errors in counterexample traces. In Principles of Programming Languages (POPL) (pp. 97–105).

Basu, S., Saha, D., Lin, Y., & Smolka, S. A. (2003). Generation of all counter-examples for push-down systems. In Formal Techniques for Networked and Distributed Systems (FORTE) (pp. 79–94). Springer.

Beer, I., Ben-David, S., Chockler, H., Orni, A., & Trefler, R. J. (2012). Explaining counterexamples using causality.
Formal Methods in System Design, 40(1), 20–40.

Beyer, D. (2015). Software verification and verifiable witnesses: Report on SV-COMP 2015. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (pp. 401–416). Springer.

Biere, A., Cimatti, A., Clarke, E. M., Strichman, O., & Zhu, Y. (2003). Bounded model checking. Advances in Computers, 58, 117–148.

Bollig, B., Katoen, J., Kern, C., Leucker, M., Neider, D., & Piegdon, D. R. (2010). libalf: The automata learning framework. In Computer Aided Verification (CAV) (pp. 360–364). Springer.

Botinčan, M., & Babić, D. (2013). Sigma*: Symbolic learning of input-output specifications. In Principles of Programming Languages (POPL) (pp. 443–456). ACM.

Chaki, S., & Strichman, O. (2008). Three optimizations for assume-guarantee reasoning with L*. Formal Methods in System Design, 32(3), 267–284.

Chapman, M., Chockler, H., Kesseli, P., Kroening, D., Strichman, O., & Tautschnig, M. (2015). Learning the language of error. In Automated Technology for Verification and Analysis (ATVA) (Vol. 9364, pp. 114–130). Springer.

Clarke, E., Kroening, D., & Lerda, F. (2004). A tool for checking ANSI-C programs. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (Vol. 2988, pp. 168–176). Springer.

Copty, F., Irron, A., Weissberg, O., Kropp, N. P., & Kamhi, G. (2003). Efficient debugging in a formal verification environment. STTT, 4(3), 335–348.

Drachsler-Cohen, D., Shoham, S., & Yahav, E. (2017). Synthesis with abstract examples. In Computer Aided Verification (CAV), part I (Vol. 10426, pp. 254–278). Springer.

Farzan, A., Chen, Y., Clarke, E. M., Tsay, Y., & Wang, B. (2008). Extending automated compositional verification to the full class of omega-regular languages. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (pp. 2–17). Springer.

Fiterau-Brostean, P., Janssen, R., & Vaandrager, F. (2016).
Combining model learning and model checking to analyze TCP implementations. In Computer Aided Verification (CAV) (pp. 454–471). Springer.

Giannakopoulou, D., Rakamaric, Z., & Raman, V. (2012). Symbolic learning of component interfaces. In Static Analysis, 19th International Symposium (SAS) (Vol. 7460, pp. 248–264). Springer.

Groce, A., Chaki, S., Kroening, D., & Strichman, O. (2006). Error explanation with distance metrics. STTT, 8(3), 229–247.

Hoenicke, J., Leino, K., Podelski, A., Schäf, M., & Wies, T. (2010). Doomed program points. Formal Methods in System Design, 37(2–3), 171–199.

Hopcroft, J., Motwani, R., & Ullman, J. (2000). Introduction to automata theory, languages, and computation (2nd edition). Addison-Wesley.

Howar, F. (2012). Active learning of interface programs (Unpublished doctoral dissertation). TU Dortmund University.

http://www.cprover.org/learning-errors/ (2015). Project web page.

Isberner, M., Howar, F., & Steffen, B. (2014). The TTT algorithm: A redundancy-free approach to active automata learning. In B. Bonakdarpour & S. A. Smolka (Eds.), Proceedings of RV (Vol. 8734, pp. 307–322). Springer.

Kearns, M., & Vazirani, U. (1994). An introduction to computational learning theory. MIT Press.

Lee, T. (2012). The problem of automatic code merging. http://www.personal.psu.edu/txl20/blogs/tks_tech_notes/2012/03/the-problem-of-automatic-code-merging.html

Leucker, M., & Neider, D. (2012). Learning minimal deterministic automata from inexperienced teachers. In Proc. of 5th ISoLA (pp. 524–538). Springer.

Rivest, R. L., & Schapire, R. E. (1993). Inference of finite automata using homing sequences. In S. J. Hanson, W. Remmele, & R. L. Rivest (Eds.), Machine Learning: From Theory to Applications (Vol. 661, pp. 51–73). Springer.

Shoham, S., Yahav, E., Fink, S. J., & Pistoia, M. (2008). Static specification mining using automata-based abstractions. IEEE Trans. Software Eng., 34(5), 651–666.

Sipser, M. (1997).
Introduction to the theory of computation. PWS Publishing Company.