# neural_circuit_synthesis_from_specification_patterns__1b62feac.pdf

Neural Circuit Synthesis from Speciﬁcation Patterns

Frederik Schmitt CISPA Helmholtz Center for Information Security Saarbrücken, Germany frederik.schmitt@cispa.de

Christopher Hahn CISPA Helmholtz Center for Information Security Saarbrücken, Germany christopher.hahn@cispa.de

Markus N. Rabe Google Research Mountain View, California, USA mrabe@google.com

Bernd Finkbeiner CISPA Helmholtz Center for Information Security Saarbrücken, Germany finkbeiner@cispa.de

We train hierarchical Transformers on the task of synthesizing hardware circuits directly out of high-level logical speciﬁcations in linear-time temporal logic (LTL). The LTL synthesis problem is a well-known algorithmic challenge with a long history and an annual competition is organized to track the improvement of algorithms and tooling over time. New approaches using machine learning might open a lot of possibilities in this area, but suffer from the lack of sufﬁcient amounts of training data. In this paper, we consider a method to generate large amounts of additional training data, i.e., pairs of speciﬁcations and circuits implementing them. We ensure that this synthetic data is sufﬁciently close to human-written speciﬁcations by mining common patterns from the speciﬁcations used in the synthesis competitions. We show that hierarchical Transformers trained on this synthetic data solve a signiﬁcant portion of problems from the synthesis competitions, and even out-of-distribution examples from a recent case study.

1 Introduction

In reactive synthesis, a circuit is automatically constructed from a logical speciﬁcation given as a formula in linear-time temporal logic (LTL). LTL is widely used by the veriﬁcation community and is the basis for industrial speciﬁcation languages like the IEEE standard PSL [24]. Efﬁcient synthesis tools for LTL would simplify the hardware design process: a hardware designer could focus on specifying what the circuit is supposed to compute, instead of implementing how the computation is done. LTL synthesis procedures, however, have to invoke involved reasoning engines, which turn often out to be infeasible when facing real-world problem instances. Much research has been conducted to push this form of hardware construction process closer to practice (see, for example, the synthesis of the AMBA protocol [5]). The high computational complexity of the general problem (2-EXPTIME-complete), however, is so far a barrier that seems insurmountable with classical, e.g., automaton-based, approaches. Recent successful applications of machine learning for logical tasks, such as SAT solving [45, 46], higher-order theorem proving [36, 3], and the LTL trace generation problem [22] encourage new approaches to the LTL synthesis problem using machine learning. Similar to the success of machine learning for program synthesis, e.g., [37, 20, 42], machine learning approaches might open a lot of possibilities in hardware synthesis. For example, secondary design

35th Conference on Neural Information Processing Systems (Neur IPS 2021).

(assumptions)

( ( (i0))) ( (( (o2)) ((( (i4)) ( (i1))) U(( (i4)) (i1))))) (guarantees:)

((i0) ( (( (i0)) (o4)))) ((i2) ( (o0))) ((i1) ( (o0))) ( ( (o4))) ( (( (i4)) ( (i1)))) ((i4) ( (o3))) ( (( (o4)) ( (o2)))) ((o1) ( ((i1) R(((i1) (o2)) (( (i1)) (o0)))))) (( (o3)) (i3))

Figure 1: A speciﬁcation in our test set, consisting of 2 assumption patterns and 8 guarantee patterns (left). A circuit, predicted by a hierarchical Transformer, satisfying the speciﬁcation (right).

goals, which cannot be easily formalized, might be incorporated into the process using natural language. Applying machine learning to the area of hardware synthesis, however, suffers from a severe lack of sufﬁcient amounts of training data.

In this paper, we consider a method to generate large amounts of additional training data, i.e., pairs of speciﬁcations and circuits implementing them. We show that hierarchical Transformers [33] can be trained on the circuit synthesis problem using the generated data and that the models can solve a signiﬁcant portion of problems from the annual synthesis competition. In practice, logical hardware speciﬁcations follow speciﬁc design patterns [12]. To cope with the data scarcity of this problem, we propose a method that makes use of speciﬁcation patterns, from which data for a successful training can be derived.

For example, a common LTL speciﬁcation pattern looks as follows: (r g). The formula describes a response property, stating that at every point in time ( ), a request r must be eventually ( ) followed by a grant g. We obtain these patterns from the annual reactive synthesis competition [25]. We mined 2099 speciﬁcation patterns from 346 benchmarks, which we split into assumption patterns and guarantee patterns. Assumption patterns restrict the space of possible inputs (environment behavior), and guarantee patterns describe how the circuit has to react to the environment. From these speciﬁcation patterns, we generate larger speciﬁcations by conjoining assumption patterns to a speciﬁcation ϕA and by conjoining guarantee patterns to a speciﬁcation ϕG. The implication ϕA ϕG forms the ﬁnal speciﬁcation of the circuit. We obtained 200 000 speciﬁcations and used classical synthesis tools [14, 35] to compute circuits satisfying the speciﬁcations. Figure 1 shows an example held-out speciﬁcation constructed in this fashion and a circuit predicted by one of our models (details on the data representation can be found in Section 3). When checking, the predicted circuit indeed satisﬁes the speciﬁcation.

To train a machine learning model on the LTL synthesis task, we represent the decomposed speciﬁcations and circuits as sequences and use hierarchical Transformers [33]. We show that many of the model s predictions that differ from the circuits in our dataset satisfy the speciﬁcations when verifying the predictions1, i.e., the model constructs a different, yet correct solution. When using a beam search, models achieve an accuracy of up to 79.9% on our synthetic test data and up to 66.8% on the original formulas from SYNTCOMP. The Transformer can even solve out-of-distribution formulas, taken from a recent case study [1], i.e., formulas that were not used for the speciﬁcation pattern mining. Furthermore, the models can solve generated test instances on which classical LTL synthesis tools timed out. In practice, it is essential to handle both realizable (i.e., when a hardware implementation exists) and unrealizable (i.e., when no hardware implementation exists) speciﬁcations. We demonstrate that our approach achieves similar results on both realizable and unrealizable speciﬁcations.

1Note that verifying the solutions, i.e., model-checking, is a by-far easier problem (PSPACE vs 2-EXPTIME) and can typically be done in a fraction of the time needed to synthesize the circuits classically.

The remainder of this paper is structured as follows: Related work is presented in Section 2. The data representation and generation process is described in Section 3. The experimental setup and the experimental evaluation are presented in Section 4 and Section 5, respectively. We conclude the paper in Section 6.

2 Related Work

Neural architectures for logical reasoning. Neural architectures for logical and mathematical reasoning have been studied recently. The closest work is the application of Transformers to the LTL trace generation problem demonstrating the generalization abilities of Transformers to the semantics of logics [22]. Despite the substantially greater complexity of the LTL synthesis problem, we are able to demonstrate the same generalization in this work. In addition, we consider both realizable and unrealizable speciﬁcations while for the LTL trace generation problem the satisﬁability of LTL formulas was assumed. Lample and Charton trained Transformers on symbolic integration and solving differential equations and were able to outperform commercial systems on a synthetic dataset [31]. Similar to our ﬁndings Lample and Charton observed signiﬁcant improvements in the Transformer s accuracy when using a beam search. Rabe et al. applied Transformers to formal mathematical statements and demonstrated the Transformer s reasoning abilities on tasks such as type inference and completing missing assumptions [41]. In contrast to the supervised setting in this work, Rabe et al. trained Transformers on an unsupervised skip-tree task that outperforms skip-sequence tasks for language modeling. For propositional logic Selsam et al. applied graph neural networks [43, 18] to solve the satisﬁability problem [46]. In subsequent work Selsam and Bjørner applied the same architecture to the unsat-core prediction problem and demonstrated that their model can be used as a heuristic to speed up SAT solvers [45]. Lederman et al. applied graph neural networks to quantiﬁed Boolean formulas to learn heuristics for QBF solvers through deep reinforcement learning [32]. Paliwal et al. trained graph neural networks on higher-order logic terms to predict tactics for higher-order theorem proving [36]. When integrated with the Deep HOL [3] neural theorem prover the graph neural networks achieved state-of-the-art performance for higherorder proof search. Similar, Balunovi c et al. applied graph neural networks to SMT formulas to predict tactics for SMT solvers [2]. Strategies synthesized from their model demonstrated signiﬁcant improvements over hand-crafted strategies from state-of-the-art SMT solvers. Earlier works on applying learning to mathematics, has focused on ranking premises or clauses Cairns [8], Urban [48, 49], Urban et al. [50], Meng and Paulson [34], Schulz [44], Kaliszyk and Urban [28].

Classic synthesis tools. The hardware synthesis problem traces back to the deﬁnition of the problem by Alonzo Church in 1957 [11], thus also called Church s Problem. With theoretical solutions, already in 1969 by Büchi and Landweber [7], the ﬁeld has matured today. From a foundational point of view, advances have been made algorithmically, e.g., with a quasi-polynomial algorithm for parity games [9], conceptually with distributed [40] and bounded synthesis [16], or expressiveness-wise, e.g., GR(1) [39] synthesis, which is an efﬁcient fragment of LTL or synthesis for security properties [17]. From a practical point of view, the ﬁeld can build on a rich supply of tools (e.g. [6, 15, 35]). The ﬁrst synthesis competition (SYNTCOMP) [26] was held in 2014, as part of the annual international conference on computer-aided veriﬁcation (CAV).

Property speciﬁcation patterns. Property speciﬁcation patterns for temporal logics have already been identiﬁed by Dwyer et al. [12]. They proposed a general hierarchical speciﬁcation pattern system containing 55 patterns that are mapped to formal speciﬁcation languages such as LTL and CTL. More patterns for temporal logical formulas are identiﬁed by Etessami and Holzmann [13], Holeˇcek et al. [23], Pelánek [38]. Konrad and Cheng [30] identiﬁed real-time speciﬁcation patterns formulated in different real-time temporal logics and a structured English grammar. Grunske [19] presented a speciﬁcation pattern system for probabilistic properties formulated in probabilistic temporal logic and a structured English grammar.

In the following, we will ﬁrst exemplary describe the speciﬁcation language LTL and the circuit representation (the interested reader can ﬁnd the full formalizations in the appendix). We will then describe our dataset, which is generated from speciﬁcation patterns from the LTL track of SYNTCOMP 2020 [26].

[0,0,0,0,0,0]

[1,0,0,0,0,0]

[1,0,1,0,0,0] [0,1,1,0,0,0]

[1,0,0,1,1,0]

Figure 2: Example tree positional encoding for the LTL request pattern (r g).

3.1 LTL and And-Inverter Graphs

LTL can specify that some proposition P must hold at every point in time ( P) or that P must hold at some future point of time ( P). By combining these operators, one can specify that P must occur inﬁnitely often ( P). The propositions are usually partitioned into inputs and outputs. In the following, we provide a small example. For inputs r1, r2 and outputs g1, g2 the LTL formula

(g1 g2) (r1 g1) (r2 g2)

speciﬁes a simple arbiter using a mutual exclusion property for grant g1 and grant g2 and two response properties that guarantee that always request r1 is eventually answered by grant g1 and always request r2 is eventually answered by grant g2. Given an LTL speciﬁcation ϕ, i.e., an LTL formula ϕ over atomic propositions AP and a partition of AP in inputs I and outputs O, the LTL synthesis problem is to determine whether a circuit over inputs I and outputs O exists such that the circuit satisﬁes the speciﬁcation. If no such circuit exists, we call the speciﬁcation to be unrealizable. Typically, an LTL speciﬁcation is decomposed into assumptions, posed on the inputs from the environment, and guarantees, that determine how to react to the inputs. For training Transformers, we represent assumptions and guarantees as sequences with a tree positional encoding [47]. The basic idea is, to encode the path through the syntax tree for each character. Since LTL has only unary and binary operations, this is encoded by appending either 1, 0, representing the left child or 0, 1, representing the right child, in front of the encoding. Figure 2 shows an example tree positional encoding for the response pattern (r g).

The AIGER format became an established format for benchmarks, competitions, and tool implementations in both computer-aided veriﬁcation and reactive synthesis. The AIGER format represents sequential circuits as and-inverter graphs in both ASCII and binary format. In this work, we refer to the original version 20071012 in ASCII format [4]. The ﬁrst line in an AIGER ﬁle in ASCII format contains the header that is the format identiﬁer string aag followed by 5 non-negative integers indicating the maximum variable index M, the number of inputs I, the number latches L, the number of outputs O, and the number of AND gates A. The header is followed by I lines deﬁning the inputs, L lines deﬁning the latches, O lines deﬁning outputs, and A lines deﬁning the AND gates. An optional symbol table to name inputs, outputs, and latches and a comment section may follow after the deﬁnitions. Inputs, latches, outputs, and AND gates are deﬁned using variables and literals represented as non-negative integers. The relationship between literals and variables is that we divide the literal by 2 to obtain the variable and if the literal modulo 2 equals 1 it corresponds to the negated variable and if the literal modulo 2 equals 0 it corresponds to the unnegated variable. Further literal 0 represents the Boolean constant and literal 1 represents the Boolean constant . Inputs are deﬁned as unnegated literals. Latches are deﬁned as two literals separated by a space. The ﬁrst literal provides the current state of the latch and the second literal the next state of the latch. Outputs are deﬁned as arbitrary literals. AND gates are deﬁned as three literals separated by a space. The ﬁrst literal is the output of the AND gate and the second and third literals are the inputs of the AND gate.

For our small arbiter example above, we show below a circuit (left) and its AIGER representation (right), which is actually a prediction of a hierarchical Transformer.

aag 3 2 1 2 0 header 2 r1 4 r2 6 7 latch 6 g1 7 g2

The triangles represent inputs and outputs, the rectangles represent variables, the diamond-shaped variables represent latches and the black dots represent inverter (NOT gates). The circuit implementation ignores the inputs I0 and I1, which represent both requests r1 and r2 (except for unnecessarily assigning them to variables 2 and 4). The circuit implementation satisﬁes the speciﬁcation by alternating indeﬁnitely between both outputs O0 and O1, which represent both grants g1 and g2, independently of the given inputs. This is, in fact, the smallest solution satisfying the simple arbiter speciﬁcation above. The hierarchical Transformer also predicts correct circuit implementations for more involved speciﬁcations where the circuit has to react to inputs (see, for example, Section 5 for an arbiter that prioritizes a certain request).

3.2 Data Generation From the LTL track of SYNTCOMP 2020 [26] we collected 346 benchmarks in Temporal Logic Synthesis Format (TLSF) [27]. Using Sy FCo [27] we translated the TLSF speciﬁcations to the Bo Sy input format [14]. The Bo Sy input format is a JSON-based format representing speciﬁcations as a list of assumptions and a list of guarantees where assumptions and guarantees can be arbitrary LTL formulas. The LTL speciﬁcation results from the implication of the conjunction of assumptions to the conjunction of guarantees. An example of the format for a prioritized arbiter speciﬁcation is shown in Listing 1 in the appendix. From the 346 benchmarks in Bo Sy input format we collected assumptions and guarantees and ﬁltered LTL formulas with more than ﬁve inputs and more than ﬁve outputs, which is a restriction that is comparable with most SYNTCOMP specs: 242 of 346 have 5 inputs and 274 have 5 outputs. Further, we ﬁltered out speciﬁcations with an abstract syntax tree of size greater than 25 resulting in 157 instantiated assumption patterns and 1942 instantiated guarantee patterns. In a ﬁnal step, we renamed inputs and outputs with a uniform random choice from input atomic propositions i0, i1, i2, i3, i4 and a uniform random choice from output atomic propositions o0, o1, o2, o3, o4, respectively. The table below, shows three random examples of assumption patterns and three random examples of guarantee patterns.

assumption patterns guarantee patterns (i0 ( o0 o1) i0) (o2 U i3) o2 i0 (i0 (o3 i3 (o3 i3 (o3 i3)))) ( i0 o3 o2 o1 o0 i0) ( o2 o4)

Given the set of speciﬁcation patterns, we generate a dataset for supervised learning, i.e., pairs of speciﬁcations and systems, by combining randomly instantiated speciﬁcation patterns. Speciﬁcally, we alternate between sampling guarantees until the speciﬁcation becomes unrealizable and sampling assumptions until the speciﬁcation becomes realizable where the number of trials to ﬁnd a suitable assumption is limited to 5. Further, we implemented stopping criteria that limit the maximal number of guarantees to 10, the maximal number of assumptions to 3, and the runtime for the synthesis tool to 120 seconds. If the resulting speciﬁcation is unrealizable we also consider its realizable predecessor for our dataset. Apart from that intermediate speciﬁcations are discarded. To synthesize speciﬁcations, we use the LTL synthesis tool Strix [35]. Systems are represented in the AIGER format. For unrealizable speciﬁcations we provide an AIGER circuit representing the winning strategy for the environment, i.e., a counter strategy showing that the speciﬁcation is unrealizable. When synthesizing speciﬁcations, we provide all ﬁve inputs i0, i1, i2, i3, i4 and all ﬁve outputs o0, o1, o2, o3, o4 to the tool such that all AIGER circuits in our datasets have the same ﬁve inputs and the same ﬁve outputs. Based on the AIGER format, we apply two additional ﬁlters when generating data: 1) we ﬁlter AIGER circuits exceeding a maximum variable index of 50, 2) we ﬁlter AIGER circuits with k AND gates if the number of circuits in the dataset with k AND gates exceeds 20% of the dataset size. This ﬁltering especially reduces the number of circuits that have a low amount of AND gates.

Figure 3: A hierarchical Transformer [33] ﬁrst encodes assumption and guarantee patterns in isolation, before encoding them globally.

The data generation method allows to generate a large number of speciﬁcations from a comparatively small set of speciﬁcation patterns; especially the generation of speciﬁcations that include meaningful assumptions and are realized by complex implementations. Following the described method, we constructed a dataset containing 250 000 unique samples split into 200 000 training samples, 25 000 validation samples, and 25 000 test samples. We included the unrealizable speciﬁcations met through the ﬁrst stopping criteria such that half of the dataset consists of unrealizable speciﬁcations. Figure 1 shows an example of a realizable speciﬁcation in the test data.

4 Experimental Setup

Based on the code base of Deep LTL [21] (MIT license), we implemented a hierarchical Transformer [33] and augment it with a tree-positional encoding [47].2 In contrast to a baseline Transformer, the encoder has two types of layers, local and global layers.

aag 11 5 1 5 5 2 4 6 8 10 12 1 1 0 14 16 22 14 12 10 16 13 10 18 4 2 20 19 11 22 21 13 i0 i0 i1 i1 i2 i2 i3 i3 i4 i4 l0 l0 o0 o0 o1 o1 o2 o2 o3 o3 o4 o4

Figure 4: AIGER representation of the circuit in Figure 1.

The local layers encode individual assumptions and guarantees, and only the global layers can combine the representations of tokens across all assumptions and all guarantees. With this hierarchical encoding, we gain approximately 10% of accuracy across all models compared to using a standard Transformer (see Figure 6). Figure 3 sketches the use of local and global layers in the encoder for our setting.

We trained hierarchical Transformers with model dimension 256. The dimension of the feed-forward networks was set to 1024. The encoder employs 4 local layers followed by 4 global layers, and the decoder employs 8 (unmodiﬁed) layers. All our attention layers use 4 attention heads. We trained with a batch size of 256 for 30 000 steps and saved the model with the best accuracy per sequence on the validation data. We trained on an NVIDIA DGX A100 system for around 10 hours.

4.1 Training Details The Transformer architecture is a sequence-to-sequence model trained to predict a sequence of output tokens provided a sequence of input tokens. Similarly, we provide multiple sequences of input tokens to an hierarchical Transformer. Assumptions and guarantees are LTL formulas and can thus be directly represented as sequences of tokens with each atomic proposition, Boolean operator, temporal operator, and Boolean constant being a separate token. We omit

2The code, our data sets, and data generators are part of the Python library ML2 (https://github.com/ reactive-systems/ml2).

Number of AND Gates

5 10 15 20 25

Number of AND Gates

5 10 15 20 25

Number of AND Gates

Number of AND Gates

Figure 5: Accuracy with respect to the size of the synthesized circuits measured by the number of AND gates for test set (top, left), SYNTCOMP (top, right), timeouts (bottom, left), and smart home benchmarks (bottom, right). Number of AND gates are binned into intervals of size 5.

parentheses because we add a tree-positional encoding [47] that identiﬁes each token with its position in the abstract syntax tree of the LTL formula (see Figure 2). To distinguish assumptions from guarantees in the global step we prepend assumptions with a special assumption token. Circuits are in AIGER format that we represent as a sequence of tokens by representing each integer with a corresponding token and replacing each newline character with a special new line token. Since all circuits in our dataset have the same inputs and outputs we can omit the header and the symbol table when tokenizing an AIGER circuit. Additionally, we include a special realizability token at the beginning of the sequence indicating whether a speciﬁcation is realizable.

We trained all models using the Adam optimizer [29] with β1 = 0.9, β2 = 0.98 and ϵ = 10 9. The optimizer was used with a learning rate schedule proposed by Vaswani et al. [51] that increases the learning rate linear for a given number of warmup steps followed by a decreasing learning rate proportionally to the inverse square root of the step number. In our experiments, we used 4000 warmup steps as proposed by Vaswani et al. [51].

4.2 Performance Measures

There are inﬁnitely many circuits satisfying a realizable LTL speciﬁcation. To evaluate the performance of the trained models we thus distinguish between the syntactic accuracy and the semantic accuracy: For a dataset of speciﬁcations and systems, the syntactic accuracy measures the percentage of the Transformer s predictions that match the circuit in the dataset. Potentially, a prediction that does not match the system still satisﬁes the speciﬁcation. We thus also measure the semantic accuracy, i.e., the percentage of the Transformer s predictions that satisfy the speciﬁcation. Note that, when using a beam search algorithm only one of the predictions needs to match the system in the dataset or satisfy the speciﬁcation, respectively. To model check predictions we use the nu Xmv model checker [10]. When training Transformers on the (easier) LTL trace generation problem [22], a signiﬁcant difference between syntactic and semantic accuracy was observed. It appears that the Transformers rather generalize to the semantics of the logic than the particularities of the data generator. As we will see in the next section, our results are consistent with this observation even for the harder problem of predicting circuits.

0 5 000 10 000 15 000 20 000 25 000 30 000 0

accuracy per sequence

Figure 6: Accuracy per sequence over the training course shown for the training split (red) and validation split (light red) when training the hierarchical Transformer and for the training split (green) and the validation split (light green) when training the standard Transformer.

5 Experiments

In this section, we report on a variety of experiments that analyze the performance of hierarchical Transformers on the circuit synthesis task and their generalization behavior. In the following, we will ﬁrst analyze the overall performance of the models and see that they often construct different solutions, yet correct ones, than the classical tool we generated the training data with. For this, we consider four different test sets and group results on the size of the predicted circuits. Secondly, we compare the training with our data mining method against the ground truth, i.e., against a training of a hierarchical Transformer on the raw SYNTCOMP benchmarks. Thirdly, we compare the models performance on realizable and unrealizable speciﬁcations, where for the latter the model is supposed to construct a circuit representing a counter strategy. Lastly, we will take a deeper look into one of the speciﬁcations, which, compared to the example in Section 3 is an arbiter that prioritizes a certain request.

Overall results. We tested our models on four different datasets. A Testset consisting of heldout instances generated by our data mining method, the SYNTCOMP set, consisting of the synthesis competition benchmarks, a set Timeouts that consists of generated speciﬁcations on which Strix, the classical synthesis tool that we used for generating the circuits, timed out (< 120s), and an out-of-distribution (OOD) benchmark set Smart Home consisting of speciﬁcations for smart homes. We consistently observed in all experiments that the beam search signiﬁcantly increases the accuracy. When analyzing the results we found that the beam search often yields several correct circuits. For a beam size of 16 and the Testset, on average 4.6 of the 16 AIGER circuits satisfy the speciﬁcation.

In our Testset (see Table 1), we observe in many cases that the circuit prediction of our model is different from the circuit the tool would synthesize. Since it has already shown that this gap between syntactic and semantic accuracy exists for such tasks [22], we concentrate on the semantic accuracy, i.e., the total accuracy. When analyzing the size of those circuits, we found both smaller and larger

Dataset Beam Size 1 Beam Size 4 Beam Size 8 Beam Size 16

Testset 53.6(31.1) 2.4 70.4(39.0) 2.3 75.8(41.9) 2.1 79.9(44.5) 2.0 SYNTCOMP 51.9 2.2 60.0 1.5 63.6 1.9 66.8 1.2 Timeouts 11.7 1.1 21.1 0.9 25.9 1.0 30.1 1.2 Smart Home 22.9 3.6 31.4 7.1 44.8 6.5 40.0 6.5

Table 1: Accuracy reported on test data, SYNTCOMP benchmarks, timeouts, and smart home benchmarks for different beam sizes averaged over 5 trainings including standard deviation. For the test data we show the syntactic accuracy in parenthesis.

(assumptions) ( ( (rm))) (guarantees:) ( (( (gm)) ( (g0)))) ( ((r0) ( (g0)))) ( ((rm) ( (( (g0)) U(gm)))))

aag 7 5 1 5 1 2 4 6 8 10 12 4 14 0 13 14 0 14 12 5

Figure 7: The speciﬁcation (left), the predicted AIGER circuit (middle) and the visualization of the circuit (right) for a prioritizing arbiter.

circuits, with no signiﬁcant decrease or increase in average circuit size. In total, the model was able to solve 79.9% of the held-out generated test instances with a beam size of 16.

While the training data is based on speciﬁcation patterns extracted from SYNTCOMP benchmarks it is unlikely that our data generation process reassembles SYNTCOMP benchmarks. This allows to evaluate the model on them. After ﬁltering out benchmarks with more than 5 inputs/outputs, more than 12 properties, and properties of size greater than 25, the model achieved an accuracy of 66.8% for the resulting 145 benchmarks using a beam size of 16.

For a timed out speciﬁcation it is not known whether it is realizable or unrealizable. The model achieves an accuracy of 30.1% for beam size 16 demonstrates that our approach can yield performance gains in practice. To highlight the capabilities of our model we display in Figure 9 in the appendix the largest circuit that is predicted for a timed out speciﬁcation and satisﬁes the speciﬁcation.

We constructed the Smart Home set, with the same restriction as for SYNTCOMP, from a recently published benchmark set for synthesizing smart home applications [1]. The hierarchical Transformer is able to solve 44.8% of the provided instances. When compared to the full benchmark (i.e., without the size restrictions), the model solved 11.1% of the formulas. Note that this benchmark set was not used to mine speciﬁcations from and the benchmarks include instances with larger assumptions and guarantees than seen during training.

We also analyzed the performance of the model depending on the size of the predicted circuit. Results are shown in Figure 5. As expected, for larger circuit implementations, the model accuracy drops. The size distribution of the training data resembles the size distribution of the test set (top left in Figure 5 and Figure 8). Meaning that the model has seen a signiﬁcantly lower percentage of large circuits during training. Future experiments have to determine how large the training data and the hierarchical Transformers could be scaled, before the training process breaks down.

Training on raw SYNTCOMP benchmarks. We did a baseline experiment by training with various batch sizes on the raw SYNTCOMP benchmark. This training (not surprisingly) fails whereas our data generation method enables a stable training.

Unrealizabile Speciﬁcations. The training data contains both realizable and unrealizable speciﬁcations. In Table 3 we analyze the accuracy for realizable and unrealizable speciﬁcations separately on our test data. While the syntactic accuracy is higher for realizable speciﬁcations, in terms of the

Beam Size 1 Beam Size 4 Beam Size 8 Beam Size 16

Realizable 50.8 (39.0) 64.3 (48.0) 67.5 (50.0) 70.7 (52.6) Unrealizable 55.4 (23.0) 74.6 (31.9) 81.0 (35.2) 86.7 (39.0)

Table 2: Accuracy on Testset reported separately for realizable and unrealizable speciﬁcations. For different beam sizes we report the semantic accuracy and the syntactic accuracy in parenthesis.

semantic accuracy the model solves unrealizable speciﬁcations more accurately. Further, we found for a beam size of 1 that the Transformer predicts the correct realizability token for 91.4% of the speciﬁcations from the test data.

Prioritizing arbiter. Building on the example of Section 3, we show that the model can handle more interesting, real-world speciﬁcations. Figure 7 shows the speciﬁcation, AIGER ﬁle and the circuit visualization of an arbiter that prioritizes one of the requests whenever access is requested by both processes at the same time; meaning that the implementation can no longer ignore the input as for the example in Section 3.

6 Conclusion

We proposed a method to address the lack of data for training a neural network on the task of synthesizing circuits out of LTL speciﬁcations. We mine speciﬁcation patterns from the annual reactive synthesis competition (SYNTCOMP) and generate new formulas by combining multiple speciﬁcation patterns. We showed that this dataset can be used to successfully train hierarchical Transformers on the LTL synthesis problem for speciﬁcations composed of speciﬁcation patterns. We also showed that the models generalize to unseen speciﬁcations, including speciﬁcations that are both realizable and unrealizable and speciﬁcations that cannot be solved by a classical synthesis tool within a time limit of 120 seconds. Furthermore, we performed an out-of-distribution test on a recently added benchmark set on synthesis problems for smart homes. Experimental results suggest that the Transformer can be especially useful for predicting unrealizability.

Acknowledgments and Disclosure of Funding

This work was partially supported by the European Research Council (ERC) Grant OSARES (No. 683300) and the Collaborative Research Center Foundations of Perspicuous Software Systems (TRR 248, 389792660).

[1] J.A.R.V.I.S. TSL/TLSF benchmark suite, 2021. URL https://github.com/SYNTCOMP/ benchmarks/tree/master/tlsf/tsl_smart_home_jarvis.

[2] M. Balunovic, P. Bielik, and M. T. Vechev. Learning to solve SMT formulas. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, Neur IPS 2018, December 3-8, 2018, Montréal, Canada, pages 10338 10349, 2018.

[3] K. Bansal, S. M. Loos, M. N. Rabe, C. Szegedy, and S. Wilcox. HOList: An environment for machine learning of higher order logic theorem proving. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 454 463. PMLR, 2019.

[4] A. Biere. The AIGER and-inverter graph (AIG) format version 20071012. FMV Reports Series, Institute for Formal Models and Veriﬁcation, Johannes Kepler University, Altenbergerstr, 69: 4040, 2007.

[5] R. Bloem, S. Galler, B. Jobstmann, N. Piterman, A. Pnueli, and M. Weiglhofer. Automatic hardware synthesis from speciﬁcations: A case study. In 2007 Design, Automation & Test in Europe Conference & Exhibition, pages 1 6. IEEE, 2007.

[6] A. Bohy, V. Bruyere, E. Filiot, N. Jin, and J.-F. Raskin. Acacia+, a tool for LTL synthesis. In International Conference on Computer Aided Veriﬁcation, pages 652 657. Springer, 2012.

[7] J. R. Buchi and L. H. Landweber. Solving sequential conditions by ﬁnite-state strategies. In The Collected Works of J. Richard Büchi, pages 525 541. Springer, 1990.

[8] P. Cairns. Informalising formal mathematics: Searching the mizar library with latent semantics. In International Conference on Mathematical Knowledge Management, pages 58 72. Springer, 2004.

[9] C. S. Calude, S. Jain, B. Khoussainov, W. Li, and F. Stephan. Deciding parity games in quasi-polynomial time. SIAM Journal on Computing, (0):STOC17 152, 2020.

[10] R. Cavada, A. Cimatti, M. Dorigatti, A. Griggio, A. Mariotti, A. Micheli, S. Mover, M. Roveri, and S. Tonetta. The nu Xmv symbolic model checker. In Computer Aided Veriﬁcation - 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18-22, 2014. Proceedings, volume 8559 of Lecture Notes in Computer Science, pages 334 342. Springer, 2014. doi: 10.1007/978-3-319-08867-9\_22.

[11] A. Church. Application of recursive arithmetic to the problem of circuit synthesis. 1963.

[12] M. B. Dwyer, G. S. Avrunin, and J. C. Corbett. Property speciﬁcation patterns for ﬁnitestate veriﬁcation. In Proceedings of the Second Workshop on Formal Methods in Software Practice, March 4-5, 1998, Clearwater Beach, Florida, USA, pages 7 15. ACM, 1998. doi: 10.1145/298595.298598.

[13] K. Etessami and G. J. Holzmann. Optimizing büchi automata. In C. Palamidessi, editor, CONCUR 2000 - Concurrency Theory, 11th International Conference, University Park, PA, USA, August 22-25, 2000, Proceedings, volume 1877 of Lecture Notes in Computer Science, pages 153 167. Springer, 2000. doi: 10.1007/3-540-44618-4\_13. URL https://doi.org/ 10.1007/3-540-44618-4_13.

[14] P. Faymonville, B. Finkbeiner, and L. Tentrup. Bo Sy: An experimentation framework for bounded synthesis. In Computer Aided Veriﬁcation - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II, volume 10427 of Lecture Notes in Computer Science, pages 325 332. Springer, 2017. doi: 10.1007/978-3-319-63390-9\_17.

[15] P. Faymonville, B. Finkbeiner, and L. Tentrup. Bo Sy: An experimentation framework for bounded synthesis. In International Conference on Computer Aided Veriﬁcation, pages 325 332. Springer, 2017.

[16] B. Finkbeiner and S. Schewe. Bounded synthesis. International Journal on Software Tools for Technology Transfer, 15(5):519 539, 2013.

[17] B. Finkbeiner, C. Hahn, P. Lukert, M. Stenger, and L. Tentrup. Synthesis from hyperproperties. Acta informatica, 57(1):137 163, 2020.

[18] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 1263 1272. PMLR, 2017.

[19] L. Grunske. Speciﬁcation patterns for probabilistic quality properties. In 30th International Conference on Software Engineering (ICSE 2008), Leipzig, Germany, May 10-18, 2008, pages 31 40. ACM, 2008. doi: 10.1145/1368088.1368094.

[20] S. Gulwani, J. Hernández-Orallo, E. Kitzelmann, S. H. Muggleton, U. Schmid, and B. Zorn. Inductive programming meets the real world. Communications of the ACM, 58(11):90 99, 2015.

[21] C. Hahn, F. Schmitt, J. U. Kreber, M. N. Rabe, and B. Finkbeiner. Deepltl, 2020. URL https://github.com/reactive-systems/deepltl.

[22] C. Hahn, F. Schmitt, J. U. Kreber, M. N. Rabe, and B. Finkbeiner. Teaching temporal logics to neural networks. International Conference on Learning Representations, ICLR, 2021.

[23] J. Holeˇcek, T. Kratochvíla, V. ˇRehák, D. Šafránek, P. Šimeˇcek, et al. Veriﬁcation results in Liberouter project, 2004.

[24] IEEE-Commission et al. IEEE standard for property speciﬁcation language (PSL). IEEE Std 1850-2005, 2005.

[25] S. Jacobs and G. A. Pérez. The reactive synthesis competition. http://www.syntcomp.org, .

[26] S. Jacobs and G. A. Pérez. The 7th reactive synthesis competition: SYNTCOMP 2020. http://www.syntcomp.org/syntcomp-2020-results/, .

[27] S. Jacobs, F. Klein, and S. Schirmer. A high-level LTL synthesis format: TLSF v1.1. In Proceedings Fifth Workshop on Synthesis, SYNT@CAV 2016, Toronto, Canada, July 17-18, 2016, volume 229 of EPTCS, pages 112 132, 2016. doi: 10.4204/EPTCS.229.10.

[28] C. Kaliszyk and J. Urban. Learning-assisted automated reasoning with ﬂyspeck. Journal of Automated Reasoning, 53(2):173 213, 2014.

[29] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.

[30] S. Konrad and B. H. C. Cheng. Real-time speciﬁcation patterns. In 27th International Conference on Software Engineering (ICSE 2005), 15-21 May 2005, St. Louis, Missouri, USA, pages 372 381. ACM, 2005. doi: 10.1145/1062455.1062526.

[31] G. Lample and F. Charton. Deep learning for symbolic mathematics. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. Open Review.net, 2020.

[32] G. Lederman, M. N. Rabe, S. Seshia, and E. A. Lee. Learning heuristics for quantiﬁed boolean formulas through reinforcement learning. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. Open Review.net, 2020.

[33] W. Li, L. Yu, Y. Wu, and L. C. Paulson. Isar Step: a benchmark for high-level mathematical reasoning. In International Conference on Learning Representations, 2021.

[34] J. Meng and L. C. Paulson. Lightweight relevance ﬁltering for machine-generated resolution problems. Journal of Applied Logic, 7(1):41 57, 2009.

[35] P. J. Meyer, S. Sickert, and M. Luttenberger. Strix: Explicit reactive synthesis strikes back! In Computer Aided Veriﬁcation - 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, Flo C 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part I, volume 10981 of Lecture Notes in Computer Science, pages 578 586. Springer, 2018. doi: 10.1007/978-3-319-96145-3\_31.

[36] A. Paliwal, S. M. Loos, M. N. Rabe, K. Bansal, and C. Szegedy. Graph representations for higher-order logic and theorem proving. In The Thirty-Fourth AAAI Conference on Artiﬁcial Intelligence, AAAI 2020, New York, NY, USA, February 7-12, 2020, pages 2967 2974. AAAI Press, 2020.

[37] E. Parisotto, A.-r. Mohamed, R. Singh, L. Li, D. Zhou, and P. Kohli. Neuro-symbolic program synthesis. ar Xiv preprint ar Xiv:1611.01855, 2016.

[38] R. Pelánek. BEEM: benchmarks for explicit model checkers. In D. Bosnacki and S. Edelkamp, editors, Model Checking Software, 14th International SPIN Workshop, Berlin, Germany, July 1-3, 2007, Proceedings, volume 4595 of Lecture Notes in Computer Science, pages 263 267. Springer, 2007. doi: 10.1007/978-3-540-73370-6\_17. URL https://doi.org/10.1007/ 978-3-540-73370-6_17.

[39] N. Piterman, A. Pnueli, and Y. Sa ar. Synthesis of reactive(1) designs. In E. A. Emerson and K. S. Namjoshi, editors, Veriﬁcation, Model Checking, and Abstract Interpretation, 7th International Conference, VMCAI 2006, Charleston, SC, USA, January 8-10, 2006, Proceedings, volume 3855 of Lecture Notes in Computer Science, pages 364 380. Springer, 2006. doi: 10.1007/11609773\_24. URL https://doi.org/10.1007/11609773_24.

[40] A. Pneuli and R. Rosner. Distributed reactive systems are hard to synthesize. In Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science, pages 746 757. IEEE, 1990.

[41] M. N. Rabe, D. Lee, K. Bansal, and C. Szegedy. Mathematical reasoning via self-supervised skip-tree training. 2020.

[42] B. Rozière, M. Lachaux, L. Chanussot, and G. Lample. Unsupervised translation of programming languages. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, Neur IPS 2020, December 612, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/ ed23fbf18c2cd35f8c7f8de44f85c08d-Abstract.html.

[43] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61 80, 2009. doi: 10.1109/TNN.2008. 2005605.

[44] S. Schulz. System description: E 1.8. In International Conference on Logic for Programming Artiﬁcial Intelligence and Reasoning, pages 735 743. Springer, 2013.

[45] D. Selsam and N. Bjørner. Guiding high-performance SAT solvers with unsat-core predictions. In Theory and Applications of Satisﬁability Testing - SAT 2019 - 22nd International Conference, SAT 2019, Lisbon, Portugal, July 9-12, 2019, Proceedings, volume 11628 of Lecture Notes in Computer Science, pages 336 353. Springer, 2019. doi: 10.1007/978-3-030-24258-9\_24.

[46] D. Selsam, M. Lamm, B. Bünz, P. Liang, L. de Moura, and D. L. Dill. Learning a SAT solver from single-bit supervision. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. Open Review.net, 2019.

[47] V. L. Shiv and C. Quirk. Novel positional encodings to enable tree-based transformers. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d Alché-Buc, E. B. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, Neur IPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 12058 12068, 2019. URL https://proceedings.neurips.cc/paper/ 2019/hash/6e0917469214d8fbd8c517dcdc6b8dcf-Abstract.html.

[48] J. Urban. MPTP motivation, implementation, ﬁrst experiments. Journal of Automated Reasoning, 33(3-4):319 339, 2004.

[49] J. Urban. Malarea: a metasystem for automated reasoning in large theories. ESARLT, 257, 2007.

[50] J. Urban, G. Sutcliffe, P. Pudlák, and J. Vyskoˇcil. Malarea sg1-machine learner for automated reasoning with semantic guidance. In International Joint Conference on Automated Reasoning, pages 441 456. Springer, 2008.

[51] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998 6008, 2017. URL https://proceedings.neurips. cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.