The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

# Concurrency Debugging with Max SMT

Miguel Terra-Neves,1 Nuno Machado,2 Inês Lynce,1 Vasco Manquinho1
1 INESC-ID / Instituto Superior Técnico, Universidade de Lisboa, Portugal
2 Teradata Spain and HASLab - INESC TEC / Universidade do Minho, Portugal
{neves, ines, vmm}@sat.inesc-id.pt, nuno.machado@teradata.com

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Current Maximum Satisfiability (Max SAT) algorithms based on successive calls to a powerful Satisfiability (SAT) solver are now able to solve real-world instances in many application domains. Moreover, replacing the SAT solver with a Satisfiability Modulo Theories (SMT) solver enables effective Max SMT algorithms. However, Max SMT has seldom been used in debugging multi-threaded software. Multi-threaded programs are usually non-deterministic due to the huge number of possible thread operation schedules, which makes them much harder to debug than sequential programs. A recent approach to isolate the root cause of concurrency bugs in multi-threaded software is to produce a report that shows the differences between a failing and a non-failing execution. However, since these approaches rely solely on heuristics, the reports can be unnecessarily large. Hence, reports may contain operations that are not relevant to the bug's occurrence. This paper proposes the use of Max SMT for the generation of minimal reports for multi-threaded software with concurrency bugs. The proposed techniques report situations that the existing techniques are not able to identify. Experimental results show that using Max SMT can significantly improve the accuracy of the generated reports and, consequently, their usefulness in debugging the root cause of concurrency bugs.

1 Introduction

Nowadays, multicore and multiprocessor architectures have become the dominant platforms. To take full advantage of these computer architectures, one has to resort to parallel and concurrent programming. However, multi-threaded programs are much more challenging to develop than sequential programs, as they typically allow a huge diversity of thread operation schedules that can yield different outcomes. Among the many possible schedules, some may result in undesirable behavior, like a crash or incorrect output (Lu et al. 2008). Such failing schedules (i.e. schedules that cause the program to fail) are typically non-deterministic and hard to reproduce, which renders the debugging of multi-threaded programs notoriously difficult. A concurrency bug occurs when some shared memory is incorrectly changed due to a specific schedule of the operations executed by concurrent threads. A large body of work has focused on the ability to reproduce concurrency bugs in multi-threaded programs, namely deterministic execution (Berger et al. 2009; Devietti et al. 2010; Olszewski, Ansel, and Amarasinghe 2009) and deterministic record and replay (Lee et al. 2009; Altekar and Stoica 2009; Huang, Zhang, and Dolby 2013). However, simply replicating a failing schedule does not provide any special insight into the reason why the program has failed. As a result, finding the bug in the code that allows the failure to surface can still be a daunting task. One way to ease the diagnosis of concurrency bugs is to report the differences between a failing execution and a non-failing execution.
Given a failing schedule and a failure condition, some systems try to produce a similar non-failing schedule by systematically reordering pairs of thread events in the failing execution. For each alternative schedule resulting from a thread event swap, a Satisfiability Modulo Theories (SMT) model is built that encodes the program's logic and synchronization constraints of the original execution, along with a condition representing the absence of the failure. Next, an SMT solver is used to check if there is an alternate, non-failing schedule that satisfies the constraints. After finding a feasible non-failing schedule, a report is generated that highlights the differences between the original failing schedule and the non-failing one. Hence, the programmer only needs to focus on the differences in order to fix the bug. Although effective in some cases, this technique has several limitations. In particular, if more than two pairs of operations need to be reordered simultaneously in order to prevent the failure, then there is no guarantee that a non-failing schedule will be generated. In addition, there are systems that expose and isolate concurrency errors that depend on both the thread operation schedule and the execution path. However, these systems do not guarantee that the failing and non-failing executions exhibit similar orderings of operations. As a consequence, the generated reports can be unnecessarily long and hard to analyze.

Recently, new Maximum Satisfiability Modulo Theories (Max SMT) algorithms have been proposed. However, their application in many areas is still in its infancy. In this paper, we propose the usage of Max SMT to improve the effectiveness of reports produced by systems that analyze concurrency bugs in multi-threaded programs. In particular, we argue that a minimal explanation for the occurrence of a concurrency bug in a multi-threaded program can be obtained by maximizing the similarities between the failing schedule and non-failing schedules.

Figure 1: Schedule-dependent bug in a multi-threaded program.
(initially x = 0, y = 1, z = 1)
T1: 1. T2.join()  2. T3.join()  3. T4.join()  4. T5.join()  5. assert(x == 0)
T2: 1. x = x + y
T3: 1. x = x + z
T4: 1. y = y + 1
T5: 1. z = z + 1

Hence, the main contributions of this paper are as follows: (i) three complementary metrics for measuring schedule similarity and a Max SMT model that optimizes these metrics in order to produce a report with a minimal explanation of a concurrency bug; (ii) a novel approximation algorithm that considerably speeds up the Max SMT model solving procedure by exploiting domain knowledge; and (iii) an extensive experimental evaluation, with benchmark and real-world concurrency bugs, showing that the proposed model produces more accurate reports than the previous approaches based on heuristics.

The rest of the paper is organized as follows: Section 2 provides background on concurrency bugs, SMT and other concepts necessary for this work. Section 3 exemplifies limitations of current systems for concurrency debugging. Section 4 reviews Max SMT and describes the model for obtaining minimal reports. Section 5 evaluates the contributions of the paper and discusses the results of the experiments. Finally, Section 6 concludes the paper.

2 Background

This section defines the concepts used in the remainder of the paper. First, we review concurrency bugs. Next, we introduce the Satisfiability Modulo Theories formalism. Finally, two types of reports on concurrency bugs are described.
2.1 Concurrency Bugs

A concurrency bug is an error in multi-threaded code that may cause the program to fail depending on the order in which thread operations are executed. An operation is either a read/write of a variable or a synchronization primitive (e.g. lock, unlock, fork, etc.). As an example, consider the multi-threaded program with five threads (T1-T5) shown in Figure 1. The program has three shared variables: x, y and z. Variable x is accessed by T1, T2 and T3, y is accessed by T2 and T4, and z by T3 and T5. T1 simply waits for all other threads to terminate and then checks if x is 0. This program can fail under two different scenarios, as the assertion x == 0 is violated when T4 executes before T2 or T5 before T3, but not otherwise. For instance, a failing schedule can occur in which the value of x is 2 when the assertion is tested. In this case, the failure is caused by two concurrency bugs, namely a data race on y between T2 and T4 and a data race on z between T3 and T5. A data race occurs when two different threads access the same variable without synchronization, and at least one of the accesses is a write.[1] For this data race, T2 reads the value of y concurrently with the write on y by T4. Since the execution is non-deterministic, one may observe a schedule where T4 increments y before T2 adds y to x, thus causing x to be greater than 0 at the end of the program. However, if T2 increments x prior to T4 setting y to 2 (and T5 also executes before T3 setting z to 0), then x = 0 at the end of the execution and the program satisfies the assertion. In this example, each thread executes the exact same sequence of operations in both the failing and non-failing executions. Hence, since the difference between the executions concerns only the thread interleaving, the bug is said to be strictly schedule dependent.

However, not all concurrency bugs are strictly schedule dependent. For example, consider the multi-threaded program in Figure 2, which has three threads (T1-T3) and four shared variables (x, y, z and w). The program will fail if x is not greater than 0 at the end of T1's execution. As depicted in Figure 3, it may be the case that T2 runs before T1, causing w to be incremented at line 2 of T1, and that T3 sets y = 0 in-between lines 4 and 5 of T1, i.e. after T1 increments w but before checking if y == 0. This interleaving will cause x to be decremented to 0 at line 6 of T1 and, consequently, violate the assertion at line 7. Notice that the failing and non-failing executions of this program necessarily differ in their control-flow paths. As such, the bug is deemed path-schedule dependent.

2.2 Satisfiability Modulo Theories

A propositional formula φ in Conjunctive Normal Form (CNF), defined over a set X of n Boolean variables {x1, . . . , xn}, is a conjunction of clauses, where a clause is a disjunction of literals. A literal is either a variable xi or its complement ¬xi. The Propositional Satisfiability (SAT) problem consists of deciding whether there exists an assignment to the variables in X such that the formula φ is satisfied. For ease of explanation, we often use set notation to represent conjunctions of clauses and formulas. As an example, consider the CNF formula φ = {(¬x1 ∨ x2), (x1 ∨ ¬x2 ∨ x3), (¬x2 ∨ x3)}. In this case, a possible satisfying assignment would be x1 = 0, x2 = 0, x3 = 1, since it satisfies all clauses in φ. The Satisfiability Modulo Theories (SMT) problem is a generalization of SAT.
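The CNF example above can be checked mechanically. The following minimal sketch is an illustration added here, using Z3's Python bindings rather than anything prescribed by the paper; it encodes the three clauses and asks the solver for a satisfying assignment.

```python
# Illustrative sketch (not from the paper): checking the CNF example with Z3's Python API.
from z3 import Bools, Solver, Or, Not, sat

x1, x2, x3 = Bools("x1 x2 x3")

solver = Solver()
solver.add(Or(Not(x1), x2))      # clause (¬x1 ∨ x2)
solver.add(Or(x1, Not(x2), x3))  # clause (x1 ∨ ¬x2 ∨ x3)
solver.add(Or(Not(x2), x3))      # clause (¬x2 ∨ x3)

if solver.check() == sat:
    print(solver.model())        # one possible model: x1 = False, x2 = False, x3 = True
```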
Given a decidable first-order theory T, a T-atom is a ground atomic formula in T. A T-literal is either a T-atom t or its complement ¬t. A T-formula is similar to a propositional formula, but is composed of T-literals instead of propositional literals. Given a T-formula φ, the SMT problem consists of deciding if there exists a total assignment over the variables of φ such that φ is satisfied. Depending on the theory T, the variables can be of type integer, real, Boolean, etc.

Using SMT to encode multi-threaded executions. Some debugging systems represent multi-threaded executions as SMT formulations over symbolic variables (Huang, Zhang, and Dolby 2013; Machado, Lucia, and Rodrigues 2015; Machado, Lucia, and Rodrigues 2016). The constraints of a concurrent execution are composed of a set of sub-formulae that encode, respectively, the path conditions, the program order of operations in each individual thread, the read-write linkages of shared variables, the synchronization points, and the failure condition. Let vart,l denote the value of a shared variable var at line l of thread t. Consider the initial values of the variables as having line and thread equal to 0. Moreover, let the order variables Rt,l (Wt,l) denote the execution order of the read (write) operation at line l of thread t. A concrete instance of the SMT formula for the multi-threaded program in Figure 2 is as follows.[2]

[1] We refer to the literature for a more detailed background on concurrency bugs (Lu et al. 2006; Park, Vuduc, and Harrold 2010).
[2] Since the program in Figure 2 does not have synchronization points, we do not consider this type of constraints in the model description for the example.

Figure 2: A multi-threaded program with a path- and schedule-dependent bug. (Three threads T1-T3 over shared variables x, y, w and z, all initially 0; T1 branches on z > 0 at line 1 and on y == 0 at line 5, and asserts x > 0 at line 7; T2 executes x = x + 1 at line 2 and y = y + 1 at line 3; T3 branches on w > 0 at line 1.)

Figure 3: Failing schedule for the program in Figure 2.

Path Constraints. Let us assume an execution path in which the branch condition at line 1 of thread T1 evaluates to true, whereas the branch conditions at lines 5 and 1 of threads T1 and T3, respectively, evaluate to false. The path constraints are then encoded as:

    (z1,1 > 0) ∧ ¬(y1,5 = 0) ∧ ¬(w3,1 > 0)

Program Order Constraints. The following SMT formula enforces the schedule of operations in a particular thread:

    (R1,1 < W1,2 < W1,3 < W1,4 < R1,5 < R1,7) ∧ (W2,1 < W2,2 < W2,3) ∧ (R3,1 < W3,2)

Read-Write Constraints. To encode all possible data-flows over shared variables due to reads and writes performed by the three threads along their execution paths, one would add the following conjunction of constraints to the system:

    (x0,0 = 0) ∧ (y0,0 = 0) ∧ (w0,0 = 0) ∧ (z0,0 = 0)                                          [initial values]
    ((z1,1 = z0,0 ∧ R1,1 < W2,1) ∨ (z1,1 = z2,1 ∧ R1,1 > W2,1)) ∧ (z2,1 = 1)                    [reads/writes on z]
    (w1,2 = w0,0 + 1) ∧ ((w3,1 = w0,0 ∧ R3,1 < W1,2) ∨ (w3,1 = w1,2 ∧ R3,1 > W1,2))             [reads/writes on w]
    ((y1,5 = y1,4 ∧ (R1,5 < W2,3 ∨ W1,4 > W2,3)) ∨ (y1,5 = y2,3 ∧ R1,5 > W2,3 ∧ W2,3 > W1,4)) ∧
    ((y2,3 = y0,0 + 1 ∧ R2,3 < W1,4) ∨ (y2,3 = y1,4 + 1 ∧ R2,3 > W1,4))                         [reads/writes on y]
    ((x1,7 = x1,3 ∧ (R1,7 < W2,2 ∨ W1,3 > W2,2)) ∨ (x1,7 = x2,2 ∧ R1,7 > W2,2 ∧ W2,2 > W1,3)) ∧
    ((x2,2 = x0,0 + 1 ∧ R2,2 < W1,3) ∨ (x2,2 = x1,3 + 1 ∧ R2,2 > W1,3))                         [reads/writes on x]

Note that each conjunct expresses the possible values returned by a read operation on a shared variable by encoding the most recent write to that variable.
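To make the encoding concrete, the sketch below is an illustration using Z3's Python bindings; the variable names simply mirror the notation above and are not taken from any of the tools discussed later. It builds the program-order constraints together with the read-write constraints on x.

```python
# Sketch (Z3 Python bindings assumed): program-order constraints and the
# read-write constraints on x for the execution of Figure 2.
from z3 import Ints, Solver, And, Or

# Order variables for the operations of T1 and T2, and symbolic values of x.
R1_1, W1_2, W1_3, W1_4, R1_5, R1_7 = Ints("R1_1 W1_2 W1_3 W1_4 R1_5 R1_7")
W2_1, R2_2, W2_2, W2_3 = Ints("W2_1 R2_2 W2_2 W2_3")
x0_0, x1_3, x2_2, x1_7 = Ints("x0_0 x1_3 x2_2 x1_7")

s = Solver()
s.add(x0_0 == 0)  # initial value of x

# Program order within each thread (T3 omitted in this fragment).
s.add(R1_1 < W1_2, W1_2 < W1_3, W1_3 < W1_4, W1_4 < R1_5, R1_5 < R1_7)
s.add(W2_1 < W2_2, W2_2 < W2_3)

# Each read of x returns the value written by the most recent write to x.
s.add(Or(And(x1_7 == x1_3, Or(R1_7 < W2_2, W1_3 > W2_2)),
         And(x1_7 == x2_2, R1_7 > W2_2, W2_2 > W1_3)))
s.add(Or(And(x2_2 == x0_0 + 1, R2_2 < W1_3),
         And(x2_2 == x1_3 + 1, R2_2 > W1_3)))

print(s.check())  # sat: at least one ordering is consistent with this fragment
```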
For instance, the constraints ((x2,2 = x0,0 + 1 ∧ R2,2 < W1,3) ∨ (x2,2 = x1,3 + 1 ∧ R2,2 > W1,3)) indicate that the read on x by T2 at line 2 either returns the result of adding 1 to the initial value (therefore, read R2,2 happens before write W1,3) or the result of T1's write at line 3 (hence, write W1,3 happens before read R2,2 and is the most recent write to x).

Failure Constraint. Finally, to force the solver to produce a failing schedule, one adds a constraint encoding the negation of the assertion: ¬(x1,7 > 0). In contrast, to obtain a non-failing schedule, one would add the original assertion to the formulation instead of the constraint above. Furthermore, one would also need to add constraints limiting the order variables to have all distinct integer values within the range [1, n], where n is the number of operations of the program. In this example, the Linear Integer Arithmetic (LIA) theory is being used as T.

Using SMT in related areas. Over the last decade, there has been a growing interest in using SMT for concurrency debugging. For instance, several record and replay systems use SMT to deterministically reproduce concurrency bugs (Lee et al. 2009; Altekar and Stoica 2009; Huang, Zhang, and Dolby 2013). In addition, BugAssist (Jose and Majumdar 2011) pioneered root cause isolation with SMT constraint solving by leveraging unsatisfiable cores. However, BugAssist is only able to diagnose errors in sequential programs. ConcBugAssist (Khoshnood, Kusano, and Wang 2015) extends BugAssist to handle concurrency bugs and automatically generate fixes by casting the problem as a binate covering constraint formulation. Finally, SMT has also been extensively used in the testing of concurrent programs (Huang 2015; Farzan et al. 2013).

2.3 Differential Schedule Projections

In the context of root cause isolation using SMT solving, the SYMBIOSIS debugging tool (Machado, Lucia, and Rodrigues 2015) introduced the notion of Differential Schedule Projection (DSP). As the name suggests, a DSP is built by projecting a failing schedule onto an alternate schedule (which is a non-failing variant of the failing schedule). The DSP prunes out the common operations in the projection and reports solely the subset of thread events that differ between the schedules. These events correspond to the read-write linkages that cause the failure in the failing schedule. More formally, let us define an execution schedule as a 3-tuple (V, E_s, E_df), corresponding to a directed acyclic graph (DAG) with a set V of vertices, representing the thread operations in the execution, and two sets of directed edges connecting the vertices: schedule edges (E_s) and data-flow edges (E_df). A schedule edge links a pair of operations, belonging to the same thread or not, according to the execution order given by the schedule. In turn, a data-flow edge connects a write operation w to a read operation r, thus indicating that r reads the value written by w during the execution. Consider now that F = (V, E_s^F, E_df^F) and A = (V, E_s^A, E_df^A) denote the DAGs of the failing and alternate schedules, respectively. The portions of the schedules relevant to the bug's root cause are obtained by computing the union of the edges exclusive to each schedule: E_s^p = (E_s^F ∪ E_s^A) \ (E_s^F ∩ E_s^A) and E_df^p = (E_df^F ∪ E_df^A) \ (E_df^F ∩ E_df^A). The DSP is then given by the pair (F^p, A^p), where F^p = (V^p, E_s^F ∩ E_s^p, E_df^F ∩ E_df^p), A^p = (V^p, E_s^A ∩ E_s^p, E_df^A ∩ E_df^p), and V^p is the set of vertices (i.e. operations) appearing in the edges in E_s^p and E_df^p.
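The projection itself is a purely set-theoretic operation. The short sketch below uses plain Python with edges represented as pairs of operation identifiers; the names are illustrative and not taken from the SYMBIOSIS code base. It computes E_s^p, E_df^p and the two projected DAGs exactly as defined above.

```python
# Sketch: computing a differential schedule projection from a failing and an
# alternate schedule, each given as two sets of edges (pairs of operation ids).
# The representation is illustrative, not the one used inside SYMBIOSIS.

def differential_schedule_projection(failing, alternate):
    """failing, alternate: (schedule_edges, dataflow_edges), both sets of (src, dst)."""
    Es_F, Edf_F = failing
    Es_A, Edf_A = alternate

    # Edges exclusive to one of the two schedules (symmetric difference).
    Ep_s = (Es_F | Es_A) - (Es_F & Es_A)
    Ep_df = (Edf_F | Edf_A) - (Edf_F & Edf_A)

    # Operations that appear in at least one differing edge.
    Vp = {op for edge in (Ep_s | Ep_df) for op in edge}

    Fp = (Vp, Es_F & Ep_s, Edf_F & Ep_df)  # failing-side projection
    Ap = (Vp, Es_A & Ep_s, Edf_A & Ep_df)  # alternate-side projection
    return Fp, Ap
```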
By highlighting the data-flow variations between the schedules, the DSP obviates most of the execution's complexity (which is often irrelevant for the failure to occur) and steers the programmer's focus to the bug's root cause.

2.4 Differential Path-Schedule Projections

When the manifestation of the failure depends on the threads' execution path, an alternate, non-failing schedule will necessarily exhibit a variation in the program's control-flow with respect to the failing schedule. To handle such path-schedule dependent bugs, CORTEX (Machado, Lucia, and Rodrigues 2016) introduced Differential Path-Schedule Projections (DPSPs), an extension of DSPs that highlights differences in the control-flow in addition to differences in the data-flow. However, since a change in the control-flow of a multi-threaded program is, in practice, the result of a variation in the schedule of the thread operations that changes the data-flow on shared variables and causes a given branch condition to evaluate differently, in the rest of the paper we use the term DSP to refer to both a DSP and a DPSP.

It should also be noted that the accuracy of DSPs is heavily determined by the similarity between the schedules. The more the thread interleaving of the failing schedule differs from that of the alternate schedule, the fewer common portions their projection will exhibit and the longer the resulting DSP will be. An unnecessarily long DSP will contain events from the original execution that are not related to the bug's root cause and will, consequently, hinder the debugging task. It is thus of paramount importance to obtain failing and alternate schedules that are as similar as possible, in order to generate an accurate and useful differential projection.

3 SYMBIOSIS and CORTEX Shortcomings

This section describes the approaches of the SYMBIOSIS and CORTEX systems for generating Differential Schedule Projections (DSPs), and highlights their key shortcomings.

3.1 Swapping One Operation Pair is not Enough

SYMBIOSIS leverages symbolic execution and SMT constraint solving to produce DSPs that isolate concurrency bugs. SYMBIOSIS receives as input a set of lightweight per-thread path profiles recorded from an original failing execution. Next, it performs a guided symbolic execution along the thread path profiles with the goal of generating per-thread traces with symbolic information. The symbolic traces are used to generate an SMT constraint formulation encoding the failure condition, as described in Section 2.2. Afterwards, SYMBIOSIS resorts to an SMT solver to solve the constraint system and output a failure-inducing ordering of operations, i.e. the failing schedule. After obtaining the failing schedule, SYMBIOSIS has to devise an alternate, non-failing schedule in order to be able to compute the DSP and pinpoint the bug's root cause. The generation of the alternate schedule is done by applying the following heuristic:

1. Select a pair of events in the failing schedule, prioritizing pairs whose events are closer in the execution;
2. Generate a candidate alternate schedule by swapping the order of the two events in the failing schedule;
3. Check the satisfiability of the candidate alternate schedule by solving an SMT formula like the one used to produce the failing schedule, but modified to have the failure constraint inverted and new constraints enforcing the order of operations given by the candidate alternate schedule;
4. If the SMT formula is satisfiable, return the non-failing schedule.
Otherwise, SYMBIOSIS repeats steps 1 to 3 until a feasible alternate schedule is found, or until all possible event pairs have been reordered (in which case, SYMBIOSIS fails). Once a feasible alternate schedule is found, SYMBIOSIS proceeds to compute the DSP as described in Section 2.3. Experimental results showed that this heuristic works well in various scenarios (Machado, Lucia, and Rodrigues 2015). However, it falls short in cases where the failure may stem from more than one data-flow. As an example of this limitation, recall the buggy program in Figure 1. Note that it suffices that T4 runs before T2 or T5 runs before T3 for the program to fail. Consider the failing schedule of operations for Figure 1. SYMBIOSIS will attempt to generate an alternate schedule by reordering the following pairs of instructions: (R1,5, W3,1), (R1,5, W2,1), (R3,1, W2,1), (R3,1, W5,1), and (R2,1, W4,1). Swapping the former two event pairs renders the model unsatisfiable because it violates the synchronization semantics of the program, which state that the end of a thread must happen before the join operation by the parent thread. In turn, inverting the order of the remaining pairs will fix one of the two problematic data-flows (z = z + 1 before x = x + z, or y = y + 1 before x = x + y), but never both simultaneously. For that reason, SYMBIOSIS will fail to find a feasible alternate schedule.

3.2 The Disregard for Minimality

In contrast to SYMBIOSIS, CORTEX aims at exposing and isolating concurrency bugs starting from correct executions. CORTEX starts from a set of per-thread path profiles collected from one or more executions and produces a non-failing schedule using symbolic execution and SMT constraint solving. In fact, the model built by CORTEX is similar to the one employed by SYMBIOSIS to obtain the initial failing schedule. Afterwards, CORTEX systematically explores the state space of the program in an attempt to unveil a schedule that leads to a failure. More concretely, CORTEX inverts branch conditions in the original non-failing schedule, starting from branches that are closer to the assertion point. For each new combination of thread symbolic traces, CORTEX builds an SMT formula encoding the failure condition and checks its satisfiability. If the formula is satisfiable, then CORTEX obtains the failing schedule and proceeds to the computation of the DSP. Otherwise, CORTEX flips a different branch condition, synthesizes new thread traces via symbolic execution, and repeats the procedure until a maximum number of branch flips is reached. Using this approach, CORTEX is able to generate failing schedules for failures that are path-schedule dependent. Then, an alternate schedule is produced through an SMT solver, as in SYMBIOSIS. Branch conditions are flipped and new thread traces synthesized if necessary. Unfortunately, the alternate schedule found by the SMT solver does not come with any guarantees in terms of similarity to the initial non-failing schedule. In fact, CORTEX simply uses the first alternate schedule produced by the solver. To better illustrate this limitation, recall the program in Figure 2. Figure 3 shows a possible failing schedule for this program. Recall that the failure occurs because T3 sets y to 0 before the evaluation of the branch condition in line 5 of T1, which causes x to be decremented to 0 at line 6 and, consequently, the violation of the assertion.
For the purpose of this example, let us consider that CORTEX starts from the failing schedule in Figure 3 and aims at finding an alternate schedule that allows computing the DSP. Since this failure is path-schedule dependent, CORTEX needs to synthesize new execution traces in order to find a feasible alternate schedule. Suppose that CORTEX synthesizes an execution that flips the path conditions of lines 1 and 5 of T3 and T1, respectively. The alternate schedules depicted in Figures 4 and 5 are two possible interleavings found by the SMT solver for this scenario.

Figure 4: Possible alternate schedule for Figure 3.

Figure 5: An alternate schedule for Figure 3 that results in a smaller DPSP than the one in Figure 4.

The DSPs generated for these two alternate schedules would differ. The alternate schedule in Figure 5 is more similar to the failing schedule, as they share a longer common prefix (namely all the instructions executed by T2). Since the common prefix is irrelevant to the occurrence of the bug, it can be discarded when computing the DSP.

4 Generating DSPs with Max SMT

This section proposes an approach based on Maximum Satisfiability to address the limitations of SYMBIOSIS and CORTEX presented in Section 3.

4.1 Maximum Satisfiability in SMT

The Maximum Satisfiability (Max SAT) problem can be seen as an optimization version of the SAT problem. In Max SAT, the goal is to find an assignment to the variables of a CNF formula that maximizes the number of satisfied clauses. There are several variants of Max SAT (Li and Manyà 2009). In the context of this work, we consider the partial Max SAT problem, where a Max SAT formula φ = φh ∪ φs has some clauses considered as hard (φh), while others are declared as soft (φs). In partial Max SAT, the goal is to find an assignment to the variables in φ such that all hard clauses are satisfied and the number of satisfied soft clauses is maximized. One can also define the Maximum Satisfiability problem over SMT formulas. The Max SMT problem is similar to Max SAT, except that φh, as well as φs, contain T-formulas instead of propositional clauses. In Max SMT, the goal is to find an assignment that satisfies all T-formulas in φh and maximizes the number of satisfied soft T-formulas in φs.

Example 4.1. Consider the Max SMT formula φ = φh ∪ φs where φh = {(x1 ≥ 0), (x1 ≤ 4)} and φs = {(x2 ≥ 1), (x2 ≤ 0), (x1 + x2 = 5)}. Here, T is the Linear Integer Arithmetic (LIA) theory. An optimal solution would be x1 = 4, x2 = 1, where the soft constraint (x2 ≤ 0) is not satisfied.

Observe that algorithms that solve Max SAT through iterative calls to a SAT solver can be adapted to solve Max SMT: an SMT solver is simply used instead of a SAT solver. For instance, the SMT solver Z3 (de Moura and Bjørner 2008) is able to solve Max SMT formulas (Bjørner and Narodytska 2015). In the last decade, SMT solvers have become able to solve much larger problem instances than previously. As a result, concurrency debuggers such as CLAP (Huang, Zhang, and Dolby 2013), SYMBIOSIS and CORTEX have successfully used SMT constraint solving to achieve their goals. However, these tools rely on heuristic procedures with no guarantee of approximation to the optimum. One should note that soft formulas encode objectives or preferences.
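Example 4.1 can be reproduced directly with a Max SMT-capable solver. The sketch below uses Z3's Optimize engine through its Python bindings; it is an illustration of the formulation, not the encoding used by the tools evaluated later in the paper.

```python
# Sketch (Z3 Python bindings assumed): the Max SMT formula of Example 4.1.
from z3 import Int, Optimize, sat

x1, x2 = Int("x1"), Int("x2")
opt = Optimize()

# Hard T-formulas: every solution must satisfy them.
opt.add(x1 >= 0)
opt.add(x1 <= 4)

# Soft T-formulas: the solver maximizes the number of satisfied ones.
opt.add_soft(x2 >= 1)
opt.add_soft(x2 <= 0)
opt.add_soft(x1 + x2 == 5)

if opt.check() == sat:
    model = opt.model()
    # One optimal model is x1 = 4, x2 = 1: two of the three soft formulas hold.
    print(model[x1], model[x2])
```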
It is well known that any linear objective function can be easily encoded using a set of soft formulas in Max SAT or Max SMT. However, there is a vast number of applications with more than one objective function to optimize. When there are multiple objectives, one can define a different set of soft formulas for each individual objective. Moreover, if those objectives can be sorted according to some criterion, then each objective (encoded as a set of soft formulas) can be optimized in order by using lexicographic optimization (Marques-Silva et al. 2011).

4.2 Finding an Optimal DSP

In this section, we propose using Max SMT to compute alternate schedules that are as similar as possible to the corresponding failing schedules. Those alternate schedules can then be used to generate simpler and more informative DSPs than the ones currently produced by the SYMBIOSIS and CORTEX systems. We consider three DSP quality criteria when measuring schedule similarity:

Number of data-flow variations. This corresponds to the number of data-flows that appear in the alternate schedule but not in the failing schedule, i.e. |E_df^A ∩ E_df^p|.

Number of broken segments. Let a segment S = o1 o2 . . . ok be a maximal sequence of operations in the failing schedule such that all oi belong to the same thread T and, for every operation o not belonging to T, o occurs before o1 or after ok. S is considered broken if, in the alternate schedule, some operation of a thread T' ≠ T appears after o1 but before ok.

Number of context switch variations. This accounts for the context switches in the failing schedule that do not occur in the alternate schedule, i.e., contiguous pairs of operations oi oj in the failing schedule such that oi and oj belong to different threads and, in the alternate schedule, some other operation ok appears in-between oi and oj.

The rationale for the aforementioned DSP quality criteria stems from the observation that most real-world concurrency bugs can be triggered by only a few thread context switches (Musuvathi et al. 2008). By generating an alternate schedule that minimizes the variation in the data-flow and context switching with respect to the failing sequence of operations, one is able to reduce the cardinality of the projection edge sets E_df^p and E_s^p and thus produce a DSP that accurately isolates the bug's root cause. The soft formulas in our Max SMT model encode the minimization of these criteria, while the constraints presented in Section 2.2 are considered as hard formulas. For each data-flow edge (w, r) ∈ E_df^F, we add a soft formula stating that operation r must read the value written by w. Given some operation o, we denote its order variable as O. Therefore, let R and W be the order variables of r and w, respectively. For every data-flow edge (w, r), the Max SMT model considers, as a soft formula, the conjunction of W < R with (W' < W ∨ W' > R) for every write operation w' ≠ w, with order variable W', on the variable read by r. To encode the broken segments criterion, we consider every possible maximal sequence S = o1 o2 . . . ok of operations in the failing schedule belonging to the same thread, and add the soft formula Ok − O1 = k − 1. Finally, in order to minimize the number of context switch changes, for each schedule edge (oi, oj) such that oi and oj belong to different threads, the model considers the soft formula Oi = Oj − 1.
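As an illustration of how the last two families of soft formulas could be posted to a Max SMT solver, the sketch below walks over a failing schedule and adds one soft constraint per maximal same-thread segment and one per context switch. The data structures and helper name are hypothetical, not the actual SYMBIOSIS/CORTEX implementation; Z3's Python bindings are assumed.

```python
# Sketch (illustrative, not the actual tool code): soft formulas for the broken
# segment and context switch criteria, over Z3 integer order variables.
from z3 import Int, Optimize

def add_similarity_soft_formulas(opt, failing_schedule, order_var):
    """failing_schedule: list of (thread_id, op_id) in failing-schedule order.
       order_var: dict mapping each op_id to its Z3 Int order variable."""
    segment = [failing_schedule[0]]
    for prev, curr in zip(failing_schedule, failing_schedule[1:]):
        if curr[0] == prev[0]:
            segment.append(curr)          # still inside the same-thread segment
            continue
        # Segment o1..ok ends here: prefer O_k - O_1 = k - 1 (segment not broken).
        first, last = segment[0][1], segment[-1][1]
        opt.add_soft(order_var[last] - order_var[first] == len(segment) - 1)
        # Context switch (prev, curr): prefer the two operations to stay adjacent.
        opt.add_soft(order_var[prev[1]] == order_var[curr[1]] - 1)
        segment = [curr]
    first, last = segment[0][1], segment[-1][1]
    opt.add_soft(order_var[last] - order_var[first] == len(segment) - 1)

# Tiny usage example: failing schedule T1:a, T1:b, T2:c.
opt = Optimize()
order = {op: Int("O_" + op) for op in ("a", "b", "c")}
add_similarity_soft_formulas(opt, [("T1", "a"), ("T1", "b"), ("T2", "c")], order)
```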
We can look at this schedule computation problem as a multi-objective problem with three different sets of soft formulas, φ_s^df, φ_s^bs and φ_s^cs, representing the data-flow, broken segment and context switch criteria, respectively. We consider two types of Max SMT models: one where all criteria are equally significant (i.e., φs = φ_s^df ∪ φ_s^bs ∪ φ_s^cs) and one lexicographic model that optimizes φ_s^df first, followed by φ_s^bs and then φ_s^cs. By prioritizing the optimization of the data-flows, one ensures that the specific interleaving of operations that causes the bug is as accurate as possible. Having that, the optimization of the other criteria mainly serves to prune out additional thread segments that, although differing in the two executions, are not relevant to the failure.

Differential Path-Schedule Projections. Recall from Section 2.4 that, if the failure is path dependent, CORTEX synthesizes execution traces that follow a different control-flow than the one in the original non-failing schedule. As a result, some operations in the original schedule may not appear in the new traces and vice-versa. When building φ_s^df, we ignore data-flow edges containing at least one operation that does not appear in the synthesized traces. The same goes for the operation sequences (pairs) considered when building φ_s^bs (φ_s^cs).

4.3 Approximating an Optimal DSP

In our experiments, we observed that the Max SMT approach proposed in the previous section struggled to find alternate schedules within reasonable time. However, in practice, one does not necessarily have to find the alternate schedule that is the most similar to the failing schedule in order to produce a DSP that is sufficiently short and accurate. State-of-the-art Max SMT algorithms rely on multiple calls to an SMT oracle and compute intermediate suboptimal solutions during the search process (Bjørner and Phan 2014). Internally, modern SMT solvers apply an adaptation of the conflict-driven DPLL procedure used in state-of-the-art SAT solvers (Sebastiani 2007). A simple approach is to terminate the search early if the number of conflicts crosses a fixed threshold. In such a case, the Max SMT solver returns the best solution it was able to find. If no solution was found, the formula is assumed to be unsatisfiable. A better approach is to generate simpler Max SMT models by first focusing on the operations close to the assertion and progressively expanding until an alternate schedule is found. This is the idea behind Algorithm 1, which receives a failing schedule F as input and outputs an alternate schedule A.

Algorithm 1: Progressive algorithm for computing an alternate schedule.
 1  Sa ← AssertSegment(F)
 2  OPfix ← Operations(F) \ Sa
 3  Sf ← Sa; Sl ← Sa
 4  A ← nil
 5  repeat
 6      Sf ← PreviousSegment(F, Sf)
 7      Sl ← NextSegment(F, Sl)
 8      OPfix ← OPfix \ (Sf ∪ Sl)
 9      A ← FindAltSchedule(F, OPfix)
10  while A = nil ∧ OPfix ≠ ∅
11  return A

Recall that a failing schedule is a sequence of operations that results in a failure and that segments are maximal sequences of operations belonging to the same thread. The assertion segment Sa is the segment that includes the failing assertion. Algorithm 1 starts by setting OPfix to all operations except those in Sa (lines 1 and 2). At each iteration, the order of the operations in OPfix is fixed. The algorithm then retrieves the segments immediately before (Sf) and after (Sl) the assertion segment (lines 6 and 7). Next, it unfixes the corresponding operations from OPfix (line 8) and attempts to compute an alternate schedule by solving a Max SMT model (line 9).
If an alternate schedule is found, then that schedule is returned. Otherwise, the algorithm keeps removing operations from OPfix until it either finds an alternate schedule or the set of fixed operations becomes empty (line 10). Note that conflict thresholds can also be used in Algorithm 1 when computing the alternate schedule, in order to avoid getting stuck solving hard Max SMT formulas.

5 Experimental Evaluation

In order to evaluate the proposed Max SMT-based approaches, the Max SMT models and algorithms were integrated into the publicly available versions of the SYMBIOSIS (http://github.com/nunomachado/symbiosis) and CORTEX (http://github.com/nunomachado/cortex-tool) tools. The experimental evaluation is mainly focused on assessing the performance and the quality of the DSPs produced by our Max SMT-based approaches with respect to the base versions of SYMBIOSIS and CORTEX. The performance criterion is measured in terms of constraint solving time. For the latter criterion, we assume that reducing the size of the DSPs and the values of the metrics discussed in Section 4.2 improves the quality of the DSP and the productivity of the programmer when fixing the bug. For the comparison, we resorted to several buggy multi-threaded programs commonly used in the literature. We consider two sets of test cases. The first includes several C/C++ and Java applications with strictly schedule-dependent bugs, such as a shared buffer implementation and an adapted parallel file scanner (Elmas et al. 2013). The second set includes programs with path-schedule dependent bugs from the IBM ConTest benchmark suite, as well as the ExMCR benchmark. In our experiments, Z3 (Bjørner and Phan 2014) (version 4.6.0) was used as the Max SMT solver. Z3 has native support for lexicographic optimization and implements the Max SMT algorithms MaxRes and WMax.

5.1 Performance Comparison

Table 1 shows the execution times of the heuristics of SYMBIOSIS/CORTEX and of the Max SMT-based approaches. Columns Max SMT and Lexico correspond to the full Max SMT and lexicographic models, respectively. The Prog (X) columns correspond to the progressive algorithm (Algorithm 1), where X stands for the type of Max SMT model used. The cmp (complete) sub-column indicates that no conflict threshold was imposed. For the inc (incomplete) sub-column, a conflict threshold of 200000 was used. Each row corresponds to a different test case. A time limit of 3 hours was imposed for each execution on a particular test case. Entries with TO indicate that the algorithm timed out, while entries with FAIL indicate that the algorithm terminated within the time limit but failed to find an alternate schedule. The results in Table 1 show that algorithms using the full Max SMT model require, in general, more time to generate the DSPs. However, the progressive algorithm is always as fast as or much faster than all remaining approaches, including the heuristic of SYMBIOSIS/CORTEX. In particular, for the bufwriter test case, all configurations of the progressive algorithm generate the DPSP at least 60 times faster than the heuristic. The only exception is the manager test case, but there our approach produces a much improved DPSP (see Table 2). The Average row shows the average execution time for each algorithm considering only solved instances; note that some algorithms are not able to solve all instances within the time limit.
Nevertheless, the average execution time of the progressive algorithm is much smaller than that of the remaining approaches.

Table 1: Time in seconds required to generate DSPs (DPSPs) using SYMBIOSIS/CORTEX and Max SMT. Column groups: MS = full Max SMT model, Lex = lexicographic model, P-MS and P-Lex = progressive algorithm (Algorithm 1) with the respective model; MR = MaxRes, WM = WMax; cmp = complete (no conflict threshold), inc = incomplete (conflict threshold of 200000).

| Test case | SYMB/CORTEX | MS MR cmp | MS MR inc | MS WM cmp | MS WM inc | Lex MR cmp | Lex MR inc | Lex WM cmp | Lex WM inc | P-MS MR cmp | P-MS MR inc | P-MS WM cmp | P-MS WM inc | P-Lex MR cmp | P-Lex MR inc | P-Lex WM cmp | P-Lex WM inc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| crasher | 22.6 | 3373 | FAIL | 4275 | FAIL | 4321 | FAIL | TO | FAIL | 0.4 | 0.4 | 0.4 | 0.5 | 0.4 | 0.4 | 0.4 | 0.4 |
| pbzip2 (S) | 0.3 | 22.8 | 22.8 | 19.9 | 20.1 | 37.5 | 38.3 | 38.2 | 38.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| pbzip2 (M) | 0.9 | 122.8 | 122 | 112 | 112 | TO | 1280 | 4306 | 3751 | 0.4 | 0.4 | 0.4 | 0.4 | 0.3 | 0.3 | 0.3 | 0.3 |
| pbzip2 (L) | 28.7 | TO | TO | TO | TO | TO | TO | TO | TO | 2.3 | 2.3 | 2.3 | 3.0 | 2.2 | 2.2 | 2.2 | 2.1 |
| pfscan | 0.2 | 812 | 99.1 | 2.1 | 2 | 0.2 | 0.6 | 0.4 | 0.7 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 |
| airline | 0.2 | TO | 88.9 | 2.1 | 1.7 | 11.7 | 11.7 | 1.0 | 1.0 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 |
| bank | FAIL | 11.6 | 11.8 | 3.3 | 3.4 | 2.5 | 2.6 | 2.2 | 2.2 | 1.5 | 1.4 | 1.1 | 1.1 | 1.2 | 1.2 | 1.1 | 1.1 |
| cache4j (M) | 114.1 | 3048 | 3301 | 3508 | 3794 | 3489 | 3494 | 2821 | 2846 | 6.0 | 5.9 | 6.7 | 7.8 | 6.0 | 5.3 | 6.9 | 7.0 |
| cache4j (L) | 343 | TO | TO | TO | 10668 | TO | TO | TO | TO | 176 | 166 | 19.9 | 20.4 | 16.2 | 16.6 | 109 | 99.6 |
| bufwriter | 5790 | TO | TO | 8939 | 8940 | 3773 | 3828 | 5803 | 5782 | 91.7 | 91.5 | 32.9 | 33.0 | 60.6 | 62.6 | 38.5 | 38.4 |
| critical | 0.5 | 2.5 | 0.3 | 2.3 | 0.2 | 0.1 | 0.3 | 0.2 | 0.3 | 0.2 | 0.7 | 2.6 | 0.2 | 0.2 | 0.2 | 0.3 | 0.6 |
| exMCR | 1.1 | 0.6 | 0.6 | 0.2 | 0.2 | 0.2 | 0.6 | 2.1 | 2.2 | 0.5 | 0.7 | 0.6 | 0.6 | 0.4 | 0.6 | 0.6 | 0.6 |
| garage | 13.4 | TO | 673 | 39.7 | 40.8 | 62.9 | 59.6 | 77.3 | 76.8 | 0.5 | 0.4 | 0.5 | 0.4 | 0.4 | 0.4 | 0.5 | 0.4 |
| loader | 55.8 | 107 | 102 | 239 | 246 | 130 | 135 | TO | 701 | 97.9 | 96.8 | 10.5 | 10.5 | 22.4 | 22.0 | 31.3 | 31.2 |
| manager | 189 | TO | FAIL | TO | FAIL | TO | FAIL | TO | FAIL | TO | 2171 | TO | 1630 | TO | 1567 | TO | 1199 |
| ticketorder | 7.1 | 1101 | 1097 | 373 | 371 | 182 | 186 | 146 | 146 | 9.1 | 9.1 | 3.7 | 3.6 | 12.7 | 13.0 | 3.3 | 3.4 |
| Average | 437 | 782 | 502 | 1347 | 1862 | 1001 | 753 | 1200 | 1112 | 25.8 | 159 | 5.5 | 107 | 8.2 | 106 | 13 | 86.5 |

5.2 DSP Quality Comparison

Table 2 shows the quality of the DSPs found by the original version of SYMBIOSIS/CORTEX and by the proposed Max SMT-based approaches. We consider the number of operations, data-flow variations, broken segments and context switch variations in the DSP. Results are shown only for a representative set of configurations (see Table 3), which exhibit better performance. The Average row shows the average relative DSP quality improvement/degradation with respect to SYMBIOSIS/CORTEX.

Table 2: Quality of the DSPs (DPSPs) generated by SYMBIOSIS/CORTEX (S/C) and the Max SMT-based approaches (MS = Max SMT, Lex = Lexico, P-MS = Prog (Max SMT), P-Lex = Prog (Lexico)), measured using the number of operations (OP), data-flow variations (DF), broken segments (BS) and context switch variations (CTX).

| Test case | S/C OP | S/C DF | S/C BS | S/C CTX | MS OP | MS DF | MS BS | MS CTX | Lex OP | Lex DF | Lex BS | Lex CTX | P-MS OP | P-MS DF | P-MS BS | P-MS CTX | P-Lex OP | P-Lex DF | P-Lex BS | P-Lex CTX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| crasher | 9 | 1 | 1 | 2 | 9 | 1 | 1 | 2 | 9 | 1 | 0 | 3 | 9 | 1 | 1 | 2 | 9 | 1 | 0 | 3 |
| pbzip2 (S) | 7 | 1 | 2 | 1 | 7 | 1 | 1 | 1 | 7 | 1 | 1 | 1 | 7 | 1 | 1 | 1 | 7 | 1 | 1 | 1 |
| pbzip2 (M) | 4 | 1 | 2 | 1 | 3 | 1 | 1 | 1 | TO | TO | TO | TO | 3 | 1 | 0 | 2 | 3 | 1 | 0 | 2 |
| pbzip2 (L) | 7 | 1 | 2 | 1 | TO | TO | TO | TO | TO | TO | TO | TO | 7 | 1 | 1 | 1 | 7 | 1 | 1 | 1 |
| pfscan | 27 | 1 | 2 | 1 | 27 | 1 | 1 | 1 | 27 | 1 | 0 | 2 | 27 | 1 | 0 | 2 | 27 | 1 | 0 | 2 |
| airline | 11 | 2 | 2 | 1 | 3 | 1 | 1 | 1 | 3 | 1 | 1 | 1 | 8 | 2 | 1 | 1 | 8 | 2 | 1 | 1 |
| bank | FAIL | FAIL | FAIL | FAIL | 6 | 1 | 1 | 2 | 6 | 1 | 1 | 2 | 54 | 2 | 2 | 1 | 54 | 2 | 1 | 2 |
| cache4j (M) | 14 | 1 | 1 | 2 | 3 | 1 | 1 | 2 | 3 | 1 | 0 | 3 | 43 | 1 | 1 | 2 | 3 | 1 | 0 | 3 |
| cache4j (L) | 5 | 1 | 2 | 1 | TO | TO | TO | TO | TO | TO | TO | TO | 3 | 1 | 1 | 2 | 3 | 1 | 1 | 2 |
| bufwriter | 724 | 88 | 7 | 6 | 197 | 1 | 2 | 1 | 35 | 1 | 1 | 2 | 221 | 8 | 3 | 3 | 221 | 8 | 3 | 3 |
| critical | 9 | 4 | 3 | 4 | 7 | 4 | 2 | 2 | 7 | 4 | 2 | 2 | 7 | 4 | 2 | 2 | 7 | 4 | 2 | 2 |
| exMCR | 22 | 8 | 8 | 12 | 21 | 7 | 6 | 8 | 21 | 7 | 5 | 9 | 21 | 7 | 6 | 8 | 21 | 7 | 5 | 9 |
| garage | 5 | 2 | 1 | 2 | 3 | 1 | 2 | 1 | 3 | 1 | 1 | 2 | 5 | 2 | 0 | 3 | 5 | 2 | 0 | 3 |
| loader | 267 | 22 | 18 | 18 | 48 | 21 | 3 | 2 | 47 | 21 | 2 | 3 | 48 | 21 | 4 | 2 | 47 | 21 | 3 | 3 |
| manager | 120 | 46 | 58 | 147 | TO | TO | TO | TO | TO | TO | TO | TO | 78 | 33 | 34 | 109 | 57 | 32 | 9 | 70 |
| ticketorder | 178 | 4 | 5 | 5 | 46 | 4 | 1 | 2 | 46 | 4 | 1 | 2 | 47 | 4 | 2 | 1 | 46 | 4 | 1 | 2 |
| Average (%) | - | - | - | - | 61 | 82 | 67 | 70 | 57 | 80 | 34 | 92 | 89 | 91 | 47 | 102 | 68 | 91 | 28 | 109 |
For example, 61% in the OP column for Max SMT indicates that the alternate schedule found produces, on average, a DSP with 100% - 61% = 39% fewer operations than that of SYMBIOSIS/CORTEX. In general, using the full Max SMT model results in higher quality DSPs. For example, when Lexico is able to find an alternate schedule, it produces a DSP with the smallest number of operations, data-flow variations and broken segments. Max SMT performs better in terms of context switches, which is expected since Lexico prioritizes data-flow variations and broken segments. The DSPs produced by the progressive approaches are usually of equal quality to those produced with the full models. Moreover, all the Max SMT-based approaches outperform SYMBIOSIS/CORTEX, especially in the test cases with path-schedule dependent bugs.

Table 3: Representative configurations per type of approach.

| Approach | Max SMT Algorithm | Conflict Threshold |
|---|---|---|
| Max SMT | WMax | No |
| Lexico | MaxRes | No |
| Prog (Max SMT) | WMax | Yes |
| Prog (Lexico) | WMax | Yes |

6 Conclusions

This paper proposes the use of Maximum Satisfiability Modulo Theories (Max SMT) to generate minimal Differential Schedule Projections (DSPs) that automatically provide an explanation of a concurrency bug in a multi-threaded program. Also, in cases where solving the Max SMT model is too time-consuming, we exploit domain knowledge to build smaller models and reduce the constraint solving time while still providing a close approximation to an optimal DSP. The proposed Max SMT-based approaches were integrated into two state-of-the-art systems, SYMBIOSIS and CORTEX. Experimental results, using benchmark and real-world concurrency bugs, show that using a Max SMT model greatly improves the quality of the DSPs, especially when bugs are path and schedule dependent.

Acknowledgments

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with references UID/CEC/50021/2013, SFRH/BD/111471/2015, CMU/AIR/0022/2017 and PTDC/CCI-COM/31198/2017.

References

Altekar, G., and Stoica, I. 2009. ODR: output-deterministic replay for multicore debugging. In 22nd Symposium on Operating Systems Principles, 193-206. ACM.
Berger, E. D.; Yang, T.; Liu, T.; and Novark, G. 2009. Grace: safe multithreaded programming for C/C++. In 24th International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 81-96. ACM.
Bjørner, N., and Narodytska, N. 2015. Maximum satisfiability using cores and correction sets. In International Joint Conference on Artificial Intelligence, 246-252. AAAI Press.
Bjørner, N., and Phan, A. 2014. νZ - maximal satisfaction with Z3. In 6th International Symposium on Symbolic Computation in Software Science, 1-9.
de Moura, L. M., and Bjørner, N. 2008. Z3: an efficient SMT solver. In 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 337-340. Springer.
Devietti, J.; Lucia, B.; Ceze, L.; and Oskin, M. 2010. DMP: deterministic shared-memory multiprocessing. IEEE Micro 30(1):40-49.
Elmas, T.; Burnim, J.; Necula, G. C.; and Sen, K. 2013. Concurrit: a domain specific language for reproducing concurrency bugs. In International Conference on Programming Language Design and Implementation, 153-164. ACM.
Farzan, A.; Holzer, A.; Razavi, N.; and Veith, H. 2013. Con2colic testing. In 26th Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 37-47. ACM.
Huang, J.; Zhang, C.; and Dolby, J. 2013.
CLAP: recording local executions to reproduce concurrency failures. In 34th International Conference on Programming Language Design and Implementation, 141-152. ACM.
Huang, J. 2015. Stateless model checking concurrent programs with maximal causality reduction. In 36th International Conference on Programming Language Design and Implementation, 165-174. ACM.
Jose, M., and Majumdar, R. 2011. Cause clue clauses: error localization using maximum satisfiability. In 32nd International Conference on Programming Language Design and Implementation, 437-446. ACM.
Khoshnood, S.; Kusano, M.; and Wang, C. 2015. ConcBugAssist: constraint solving for diagnosis and repair of concurrency bugs. In International Symposium on Software Testing and Analysis, 165-176. ACM.
Lee, D.; Said, M.; Narayanasamy, S.; Yang, Z.; and Pereira, C. 2009. Offline symbolic analysis for multi-processor execution replay. In 42nd International Symposium on Microarchitecture, 564-575. ACM.
Li, C. M., and Manyà, F. 2009. MaxSAT, hard and soft constraints. In Handbook of Satisfiability, 613-631.
Lu, S.; Tucek, J.; Qin, F.; and Zhou, Y. 2006. AVIO: detecting atomicity violations via access interleaving invariants. In International Conference on Architectural Support for Programming Languages and Operating Systems, 37-48. ACM.
Lu, S.; Park, S.; Seo, E.; and Zhou, Y. 2008. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 329-339. ACM.
Machado, N.; Lucia, B.; and Rodrigues, L. 2015. Concurrency debugging with differential schedule projections. In 36th International Conference on Programming Language Design and Implementation, 586-595. ACM.
Machado, N.; Lucia, B.; and Rodrigues, L. 2016. Production-guided concurrency debugging. In 21st Symposium on Principles and Practice of Parallel Programming, 29:1-29:12. ACM.
Marques-Silva, J.; Argelich, J.; Graça, A.; and Lynce, I. 2011. Boolean lexicographic optimization: algorithms & applications. Annals of Mathematics and Artificial Intelligence 62(3-4):317-343.
Musuvathi, M.; Qadeer, S.; Ball, T.; Basler, G.; Nainar, P. A.; and Neamtiu, I. 2008. Finding and reproducing heisenbugs in concurrent programs. In Symposium on Operating Systems Design and Implementation, 267-280.
Olszewski, M.; Ansel, J.; and Amarasinghe, S. P. 2009. Kendo: efficient deterministic multithreading in software. In International Conference on Architectural Support for Programming Languages and Operating Systems, 97-108.
Park, S.; Vuduc, R. W.; and Harrold, M. J. 2010. Falcon: fault localization in concurrent programs. In International Conference on Software Engineering, 245-254. ACM.
Sebastiani, R. 2007. Lazy satisfiability modulo theories. Journal on Satisfiability, Boolean Modeling and Computation 3(3-4):141-224.