# Learning Broadcast Protocols

Dana Fisman*¹, Noa Izsak*¹, Swen Jacobs*²
¹Ben-Gurion University  ²CISPA Helmholtz Center for Information Security
dana@cs.bgu.ac.il, izsak@post.bgu.ac.il, jacobs@cispa.de

*These authors contributed equally. This author was supported by ISF grant 2507/21 and the Frankel Center for Computer Science, BGU.

The problem of learning a computational model from examples has been receiving growing attention. For the particularly challenging problem of learning models of distributed systems, existing results are restricted to models with a fixed number of interacting processes. In this work we look, for the first time (to the best of our knowledge), at the problem of learning a distributed system with an arbitrary number of processes, assuming only that there exists a cutoff, i.e., a number of processes that is sufficient to produce all observable behaviors. Specifically, we consider fine broadcast protocols: broadcast protocols (BPs) with a finite cutoff and no hidden states. We provide a learning algorithm that can infer a correct BP from a sample that is consistent with a fine BP, and a minimal equivalent BP if the sample is sufficiently complete. On the negative side we show that (a) characteristic sets of exponential size are unavoidable, (b) the consistency problem for fine BPs is NP-hard, and (c) fine BPs are not polynomially predictable.

1 Introduction

Learning computational models has a long history, starting with the seminal works of Gold (1967; 1978) and Angluin (1987). Questions regarding learning computational models have raised a lot of interest both in the artificial intelligence community and in the verification community (Peled et al. 2002; Vaandrager 2017). Many results regarding the learnability of various computational models used in verification have already been obtained (Beimel et al. 2000; Bollig et al. 2013; Decker et al. 2014; Angluin et al. 2015; Nitay et al. 2023; Roy et al. 2023).

Particularly challenging is the learning of concurrent computational models. Compared to most sequential models, they offer another level of succinctness, and they usually have no unique minimal model. Both of these aspects can make learning significantly more difficult. Various results regarding learning concurrent models have already been obtained (Bollig et al. 2010; Esparza et al. 2011; Muscholl et al. 2022). However, these results are limited to models with a fixed number of processes, and therefore cannot reliably learn models for (distributed) protocols that are expected to run correctly for any number of processes.

Broadcast protocols (BPs) are a powerful concurrent computational model, allowing the synchronous communication of the sender of an action with an arbitrary number of receivers (Emerson and Namjoshi 1998). The basic model assumes that communication and processes are reliable, i.e., it does not consider communication failures or faulty processes. BPs have mainly been studied in the context of parameterized verification, i.e., proving functional correctness according to a formal specification for all systems where an arbitrary number of processes execute a given protocol.
The challenge in reasoning about parameterized systems such as BPs is that a parameterized system concisely represents an infinite family of systems: for each natural number n it includes the system where n indistinguishable processes interact. The system is correct only if it satisfies the specification for any number n of interacting processes. In the context of verification, a variety of approaches has been investigated to overcome this challenge. Some of these approaches are based on the notion of a cutoff, i.e., a number c of processes such that a given property holds for any instance of the system with n ≥ c processes if and only if it holds for the cutoff system, where the cutoff system is a system with exactly c interacting processes. In the literature, many results exist that provide cutoffs for certain classes of properties in a given computational model (Emerson and Namjoshi 2003; Emerson and Kahlon 2000; Außerlechner et al. 2016). Moreover, cutoffs also enable the synthesis of implementations for parameterized systems from formal specifications (Jacobs and Bloem 2014), a problem closely related to learning.

In this paper, we develop a learning approach for BPs. Given the expressiveness of BPs and the complexity of the general problem, we make some assumptions to keep the problem manageable. In particular, we assume (1) that the BP under consideration has no hidden states, i.e., every state has at least one broadcast sending action by which it can be recognized; and (2) that there exists a cutoff, i.e., a number c such that the language (of finite words over actions) derived by c processes is the same as the language derived by any number greater than c. We call such BPs fine, and note that many BPs studied in the literature are fine. Moreover, the restriction to fine BPs does not overly simplify the problem, as even under this assumption we obtain negative results for some basic learning problems.

We note that not all BPs have a cutoff (whether or not they have hidden states), and that when a cutoff exists the derived language is regular. The fact that the derived language is regular also holds in previous works on learning concurrent models (communicating automata (Bollig et al. 2010), workflow Petri nets (Esparza et al. 2011), and negotiation protocols (Muscholl et al. 2022)), simply because a finite number of essentially finite-state machines is in consideration; note, though, that the language of a BP in general need not be regular (Finkel and Schnoebelen 2001; Geeraerts, Raskin, and Begin 2007), and this is true also under the restriction to no hidden states. We emphasize that regularity does not reduce the problem to learning a regular language, since the aim is to obtain the concurrent representation, which we show to be much more succinct than a DFA for the language. Moreover, the problem we consider goes well beyond what has been considered in previous works, in the sense that our approach works if there exists a cutoff, but it does not require that the cutoff is known a priori; knowing the cutoff would be equivalent to the assumption of a (given) fixed number of processes made in existing approaches.

We focus mainly on passive learning paradigms (de la Higuera 2010). Specifically, we consider the following problems:
1. Inference: given a sample consistent with a BP, return a BP that is consistent with the sample;
2. Consistency: decide whether there exists a BP with at most k states that agrees with a given sample;
3. Polynomial data: decide whether characteristic sets are of polynomial size; and
4. Polynomial predictability: decide whether a learner can correctly classify an unknown word with high probability after asking polynomially many membership queries (MQ) and draw queries (DR).

We prove a few properties of fine BPs relevant to learning in Section 3. In Section 4, we provide an inference algorithm that, given a sample of words that are consistent with a fine BP, infers a correct BP. In Section 5, we show that the inference algorithm produces a minimal equivalent BP when the sample subsumes a characteristic set, and that characteristic sets of exponential size are unavoidable. In Section 6, we show that consistency is NP-hard for the class of fine BPs. In Section 7, we show that fine BPs are not polynomially predictable. For complete proofs see the extended version (Fisman, Izsak, and Jacobs 2023).

2 Preliminaries

Broadcast Protocols. In the following we define broadcast protocols as introduced by Emerson and Namjoshi (1998) and studied in the seminal paper by Esparza et al. (1999). Broadcast protocols are one of the most powerful computational models for which some parameterized verification problems are still decidable, and they are strictly more powerful than other standard communication primitives such as pairwise rendezvous or disjunctive guards (Emerson and Kahlon 2003).

Broadcast Protocols (BPs). A broadcast protocol B = (S, s0, L, R) consists of a finite set of states S with an initial state s0 ∈ S, a set of labels L, and a transition relation R ⊆ S × L × S, where L = {a!!, a?? | a ∈ A} for some set of actions A. A transition labeled with a!! is a broadcast sending transition, and a transition labeled with a?? is a broadcast receiving transition, also called a response (some models of BPs also consider rendezvous transitions, usually labeled with a! and a?, but these can be simulated by broadcast transitions with a quadratic blowup in the number of states). For each action a ∈ A, there must be exactly one outgoing response from every state.

Given a BP B = (S, s0, L, R) we consider systems Bn, composed of n identical processes that execute B. Let [n] denote the set {0, 1, . . . , n}. A configuration of Bn is a function q : S → [n], assigning to each state a number of processes. The initial configuration q0 is the configuration with q0(s0) = n and q0(s) = 0 for all s ≠ s0. In a global transition, all processes make a move: one process takes a sending transition (labeled a!!), modeling that it broadcasts the value a to all the other processes in the system. Simultaneously, all of the other processes take the receiving transition (labeled a??) from their current state (the formal semantics of Bn is given below).

Figure 1: A simple BP (left) and its algebraic representation (right), with broadcast matrices Ma = (1 1; 0 0) and Mb = (0 0; 1 1).

Example 2.1. Fig. 1 (left) depicts a simple BP B. In the initial configuration of the system B9 we have 9 processes in s0. If a process broadcasts a, it moves to s1 via the transition labeled a!!. The other processes respond following the a?? transition from their current state, hence they remain in s0. Now we are in a configuration q′ with one process in s1 and 8 in s0. If from q′ another process broadcasts a, it moves from s0 to s1. The processes in s0 stay there (following a?? from s0), and the process in s1 moves back to s0 (following a?? from s1). Thus, we return to q′. If from q′ the process in s1 broadcasts b, then it stays in s1 (following b!!), while all other processes move to s1 via b??, i.e., all will be in s1.

Following Esparza et al. (1999), we make the standard assumption that for each action a, there is a unique state sa with an outgoing sending transition labeled a!!.
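To make the definitions concrete, the following small sketch (ours, not an artifact of the paper) simulates the BP of Fig. 1 on nine processes and reproduces the walk-through of Example 2.1; the SEND and RESP dictionaries are an assumed encoding of the transitions shown in the figure.

```python
# Illustrative simulation of the BP of Fig. 1 (Example 2.1); not from the paper.
# A configuration maps each state to the number of processes currently in it.
from collections import Counter

SEND = {'a': ('s0', 's1'),   # a!! : s0 -> s1
        'b': ('s1', 's1')}   # b!! : s1 -> s1
RESP = {'a': {'s0': 's0', 's1': 's0'},   # a?? responses
        'b': {'s0': 's1', 's1': 's1'}}   # b?? responses

def broadcast(config, action):
    """One global transition: one sender broadcasts, all other processes respond."""
    src, dst = SEND[action]
    assert config[src] >= 1, f"{action} is not enabled"
    new = Counter()
    for state, count in config.items():    # first let every process respond ...
        new[RESP[action][state]] += count
    new[RESP[action][src]] -= 1            # ... then undo the response of the sender
    new[dst] += 1                          # and let the sender take its sending edge
    return new

q = Counter({'s0': 9})                     # initial configuration of B^9
for act in ['a', 'a', 'b']:
    q = broadcast(q, act)
    print(act, dict(q))
# a {'s0': 8, 's1': 1}   -- one process moves to s1, the rest stay in s0
# a {'s0': 8, 's1': 1}   -- the process in s1 responds back to s0, so q' repeats
# b {'s1': 9}            -- the b!! broadcast pulls every process into s1
```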
A state s in a broadcast protocol is said to be hidden if it has no outgoing sending transition. In this paper we consider broadcast protocols with no hidden states. Note that the additional assumption of no hidden states is modest, since many examples from the literature satisfy it (e.g., the MESI protocol in Esparza et al. (1999) or the last-in first-served protocol in Delzanno et al. (1999)), and every protocol that does not satisfy this restriction can easily be modified to satisfy it without changing its functionality.

Semantics of Bn. We can represent transitions of a system Bn algebraically. Assuming some ordering s0, s1, . . . , s|S|−1 on the set of states S, we can identify configurations of Bn with vectors from [n]^|S|, also called state-vectors. We use q[i] to denote the entry in position i of a state-vector q. For example, let uj be the unit vector with uj[j] = 1 and uj[i] = 0 for all i ≠ j. Then the configuration where all n processes are in s0 is the vector n·u0. If q is a state-vector with q[i] ≥ 1 we say that i is lit in q. If state si has an outgoing sending transition on action a we say that a is enabled in si; if i is lit in q we also say that a is enabled in q. With each action a we can associate (1) two unit vectors, va = ui for the origin and v′a = uj for the destination, following its sending transition (si, a!!, sj), and (2) a broadcast matrix Ma, which is an |S| × |S| matrix with Ma(m, k) = 1 if there is a response (sk, a??, sm) ∈ R, and Ma(m, k) = 0 otherwise. Every column of such a matrix is a unit vector. The transitions T of Bn are then defined as follows: there is a transition between configurations q and q′ on action a in Bn, denoted (q, a, q′) ∈ T, iff there exists (si, a!!, sj) ∈ R with q[i] ≥ 1 and

q′ = Ma · (q − va) + v′a.

To see this, note that the state-vector q − va corresponds to the sending process leaving the state si. The state-vector Ma · (q − va) describes the situation after the other processes take the responses on a. Finally, q′ is the resulting state-vector after the sending process arrives at its target location.

Example 2.2. Consider again the BP in Fig. 1, depicted with the broadcast matrices Ma and Mb for the two actions, and the associated origin and destination vectors va, v′a, vb, v′b. In configuration q = [2, 2], a is enabled (s0 is lit). Computing the effect of a, we first get q − va = [1, 2], then Ma · [1, 2] = [3, 0], and finally q′ = [3, 0] + v′a = [3, 1].

An execution of Bn is a finite sequence e = q0, a1, q1, a2, . . . , am, qm such that (qi, ai+1, qi+1) ∈ T for every i ∈ [m−1]. We say that e is based on the sequence of actions a1, . . . , am and that Bn(a1 . . . am) = qm. We say that a word w ∈ A* is feasible in Bn if there is an execution of Bn based on w. The language of Bn, denoted L(Bn), is the set of all words that are feasible in Bn, and the language of B, denoted L(B), is the union of L(Bn) over all n ∈ N. Let B1 and B2 be two BPs. We say that B1 and B2 are equivalent iff L(B1) = L(B2). A BP B is said to have a cutoff k ∈ N if for any k′ > k it holds that L(Bk) = L(Bk′). A BP with no hidden states is termed fine if it has a cutoff. We measure the size of a BP by its number of states. Thus, a BP is termed minimal if there is no equivalent BP with fewer states.
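The algebraic semantics can be checked directly on Example 2.2. The sketch below (our own, assuming numpy and the matrices of Fig. 1) computes q′ = Ma·(q − va) + v′a; iterating the same step from n·u0 along a word is exactly how feasibility of a word in Bn can be checked.

```python
# Checking the numbers of Example 2.2 with the matrix semantics; illustrative only.
import numpy as np

M_a = np.array([[1, 1],
                [0, 0]])                             # column k holds the a??-target of s_k
v_a, v_a_dst = np.array([1, 0]), np.array([0, 1])    # a!! : s0 -> s1

def step(q, M, v_src, v_dst):
    """One global transition q' = M (q - v_src) + v_dst; the sender state must be lit."""
    assert q @ v_src >= 1, "sending state is not lit, so the action is not enabled"
    return M @ (q - v_src) + v_dst

q = np.array([2, 2])                                 # two processes in s0, two in s1
print(step(q, M_a, v_a, v_a_dst))                    # [3 1], as computed in Example 2.2
```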
Note that, unlike the case of DFAs, there is no unique minimal fine BP, as shown by the following example.

Example 2.3. Fig. 2 shows two BPs B1 and B2. Note that L(B1^1) = a*, since with a single process, a is the only possible action from q0, and we arrive back in q0 after executing it. With 2 processes, after executing a we arrive in state-vector [1, 1], and we can execute either a or b, and each of them brings us to [1, 1] again. Therefore, L(B1^2) = a(a + b)*. Moreover, adding more processes does not change the language, i.e., L(B1^n) = L(B1^2) for all n. Hence L(B1) = L(B1^2) and the cutoff of B1 is 2. Similarly, we can show that L(B2) = a(a + b)* and the cutoff of B2 is 2. Note that B1 and B2 are not isomorphic, but they are equivalent.

We note that the aforementioned examples from the literature (Esparza et al. (1999) and Delzanno et al. (1999)) also have a cutoff, and thus are fine BPs.

Figure 2: Two non-isomorphic equivalent fine BPs.

Learning Problems. A sample for a BP B is a set S of triples in A* × N × {T, F}. A triple (w, n, T) (resp. (w, n, F)) is consistent with B if w is feasible (resp. infeasible) in Bn. A sample is consistent with B if all triples in it are consistent with B. The size of S is defined as the sum of the lengths of the words in it. We consider the following problems related to learning a class 𝒞 of computational models, phrased for BPs.

Problem 2.1 (Inference). Devise an algorithm that, given a sample S that is consistent with some BP in 𝒞, returns a BP B ∈ 𝒞 that is consistent with S.

We refer to such an algorithm as an inference algorithm. Obviously one would prefer the returned BP to be minimal or sufficiently small. Phrased as a decision problem, this is the consistency problem.

Problem 2.2 (Consistency). Given a sample S and k ∈ N, determine whether there exists a BP B ∈ 𝒞 consistent with S with at most k states.

The consistency problem is NP-hard even for DFAs (Gold 1978). Thus, inference algorithms are often based on SAT or SMT solvers. In Section 4 we provide such an inference algorithm for fine BPs. Note that in many cases it is possible to devise a trivial inference algorithm (e.g., for DFAs, the prefix-tree automaton) that is correct on the sample but makes no generalization and does not attempt to minimize the returned result. We show in Section 5 that if the sample is sufficiently complete (subsumes a characteristic set), the inference algorithm we provide in fact returns a minimal BP that agrees with the sample. One may thus ask how large a characteristic set should be. We show that, unfortunately, it can be of size exponential in the number of states of the BP.

Problem 2.3 (Polynomial data). Does there exist an inference algorithm A such that one can associate with every BP B ∈ 𝒞 a sample SB of size polynomial in the size of B, so that A correctly infers L(B) from SB or from any sample subsuming it?

The last problem we consider is in the active learning paradigm. Its definition is quite long and is deferred to Section 7.

Problem 2.4 (Polynomial predictability). Can a learner correctly classify an unknown word with high probability after asking polynomially many membership and draw queries?

3 Properties of Broadcast Protocols

Below we establish some properties regarding broadcast protocols that will be useful in devising the learning algorithm. It is not hard to see that the set of feasible words L(B) of a given BP is prefix-closed.
That is, if uv is feasible for u, v ∈ A*, then u is feasible as well. Additionally, if w ∈ A* is feasible in Bk then w is feasible in Bℓ for all ℓ > k.

Lemma 3.1 (Prefix-closedness and monotonicity). If B is a BP then L(B) is prefix-closed. Moreover, L(Bk) ⊆ L(Bℓ) for all ℓ > k.

The following lemma asserts that if wa is feasible with n processes but not with m < n processes, while w is feasible with m processes (for w ∈ A* and a ∈ A), then wa must be feasible with m + 1 processes. Intuitively, this is since m processes are enough to execute all actions in w, and therefore any additional processes will take only receiving transitions along w, and will all arrive in the same local state. Thus, if a is not enabled after w with m + 1 processes, then the additional process did not light the state sa enabling a, and the same is true if we add any larger number of processes.

Lemma 3.2 (Step-by-step progress). Let w ∈ A*, a ∈ A, and m < n. If w ∈ L(Bm) and wa ∉ L(Bm), yet wa ∈ L(Bn), then wa ∈ L(Bm+1).

Recall that fine BPs have no canonical minimal representation, in the sense that, as shown in Ex. 2.3, there can be two non-isomorphic minimal BPs for the same language. The lack of a canonical minimal representation often makes it difficult to achieve a learning algorithm. The following important lemma asserts that while two minimal fine BPs may be non-isomorphic, there is a tight correlation between them. Since every action is enabled by a unique state, and every state enables at least one action, in a minimal fine BP the set A is partitioned between the states, and if there is a state s1 in B1 whose set of enabled actions is A′ = {ai1, ai2, . . . , aik}, then there should be a state s2 in B2 whose set of enabled actions is exactly A′. So we can define such a mapping between the states of two minimal fine BPs, and it must be that on every word w, if pw and qw are the state-vectors B1 and B2 reach after reading w, resp., then if state s1 is lit in pw then the corresponding state s2 (that agrees on the set of enabled actions) is lit in qw. Moreover, the fact that L(B1) = L(B2) guarantees that L(B1^m) = L(B2^m) for any m ∈ N. This bundle of claims can be proven together by induction, first on the number of processes m, and second on the length of the word w. In the following we use f_act(s) = A′ if A′ is the set of actions enabled in s.

Lemma 3.3 (Relation between minimal fine equivalent BPs). Let B1 and B2 be minimal fine BPs with state sets S1 and S2 such that L(B1) = L(B2). Then for every m ∈ N it holds that L(B1^m) = L(B2^m), and there exists a bijection h : S1 → S2 satisfying f_act(s) = f_act(h(s)) for every s ∈ S1; and for every w ∈ A*, if B1^m(w) = pw and B2^m(w) = qw, then i is lit in pw if and only if h(i) is lit in qw, for every state i.

4 Inferring a BP from a Sample

Let S be a sample. The inference algorithm I we devise constructs a BP BS that agrees with S. Let AS be the set of actions that appear in S in at least one feasible word. In order to return a BP BS with no hidden states, we allow BS to have a set of actions A ⊇ AS (in Section 5 we show that it is enough to consider A = AS if S is sufficiently complete). We use S for the set of states of BS, and s0 for its initial state. We construct a set of constraints ΨS on the behavior of three partial functions f_st : A → S, f_!! : A → S, and f??_a : S → S for every a ∈ A, so that any valuation of these functions that satisfies ΨS implements a BP consistent with the sample.
Formally, we say that functions f_st, f_!!, {f??_a | a ∈ A} implement a BP B = (S, s0, L, R) if for every (si, a!!, sj) ∈ R we have f_st(a) = si and f_!!(a) = sj, and for every (si, a??, sj) ∈ R we have f??_a(si) = sj. We also use f_act(s) = A′ if A′ = {a ∈ A | f_st(a) = s}.

We now introduce some terminology regarding the sample. Let Pi be the set of words {w : (w, i, T) ∈ S}, and let Ni be {w : (w, i, F) ∈ S}. Note that by Lem. 3.1 it follows that if w ∈ Pi then w is feasible in Bj for every j ≥ i. Similarly, if w ∈ Ni, then w is infeasible in Bj for every j ≤ i. We define N (resp. P) as the union of all the Ni's (resp. Pi's).

We define a relation between actions a, b ∈ AS as follows. We say that a #S b if there exist a word w ∈ AS* and naturals n ≤ n′ such that (wa, n, T) ∈ S and (wb, n′, F) ∈ S, or vice versa (switching the roles of a and b). Following Lem. 3.3, a #S b means that the sample S has information contradicting that a and b are enabled in the same state.

1. Our first constraints are therefore that for every a, b ∈ A such that a #S b it holds that f_st(a) ≠ f_st(b).

2. Since we identify states by the set of actions they enable and we assume there are no hidden states, we define the set of states S = {f_st(a) : a ∈ A}, i.e., as the set of terms f_st(a). This definition guarantees that no states are hidden. Moreover, we designate one of the states, which we term s0, as the initial state: ∃s ∈ S : s = s0.

3. The rest of the constraints are gathered by scanning the words, first by length. For every word of length one, i.e., an action a, if au ∈ Pi for some i and some u ∈ A*, then we add f_st(a) = s0, and if a ∈ Ni then we add f_st(a) ≠ s0.

4. Next, we scan inductively, for every word w ∈ P, the minimal i ≥ 1 such that w ∈ Pi, and for every w ∈ N, the maximal i ≥ 1 such that w ∈ Ni. In the base case i = 1 we have w ∈ P1 ∪ N1. Let w = a1a2 . . . am. We define m + 1 variables p0, p1, . . . , pm. The variable pk indicates the state the process reaches after the system reads a1 . . . ak, and p0 should be s0. If w ∈ P1, we add the constraint ψ_{w,1} defined as

   (p0 = s0) ∧ ⋀_{1≤ℓ≤m} ( p_{ℓ−1} = f_st(aℓ) ∧ p_ℓ = f_!!(aℓ) ).

   This requires that the next letter aℓ is enabled in the state the process reached after a1a2 . . . a_{ℓ−1} was executed. If w ∈ N1 we add the constraint

   ⋁_{0≤ℓ<m} ( ψ_{w[..ℓ],1} ∧ p_ℓ ≠ f_st(a_{ℓ+1}) ),

   where w[..ℓ] denotes the ℓ-th prefix of w, namely a1a2 . . . aℓ, and we let ψ_{ε,1} = T. This requires that at least one of the letters in the word is not enabled in the state the process reached, implying that the entire word is infeasible with one process.

5. For the induction step i > 1, let w ∈ Pi ∪ Ni and assume w = a1a2 . . . am. We define i·(m + 1) variables p_{1,0}, p_{2,0}, . . . , p_{i,m}. The variable p_{j,k} indicates the state the j-th process reaches after the system reads a1 . . . ak. Accordingly, we set p_{j,0} = s0 for every 1 ≤ j ≤ i. The state of the processes after reading the next letter a_{ℓ+1} depends on their state after reading aℓ. If w ∈ Pi, we add the constraint ψ_{w,i} defined as

   ψ_{w,i} = ⋀_{1≤ℓ≤m} ⋁_{1≤j≤i} φ_{j,ℓ},  where  φ_{j,ℓ} = ( p_{j,ℓ−1} = f_st(aℓ) ∧ p_{j,ℓ} = f_!!(aℓ) ∧ ⋀_{j′≠j} p_{j′,ℓ} = f??_{aℓ}(p_{j′,ℓ−1}) ).

   Intuitively, ψ_{w,i} requires that for every letter aℓ of w, one of the processes, call it j, reached a state in which aℓ is enabled. The formula φ_{j,ℓ} states that the j-th process took the sending transition on aℓ and the rest of the processes took the respective receiving transition. If w ∈ Ni, we add the requirement

   ⋁_{0≤ℓ<m} ( ψ_{w[..ℓ],i} ∧ ⋀_{1≤j≤i} p_{j,ℓ} ≠ f_st(a_{ℓ+1}) ),

   where we let ψ_{ε,i} = T for every i.
Intuitively, if w is infeasible with i processes, then there exists a (possibly empty) prefix w[..ℓ] which is feasible with i processes, so that ψ_{w[..ℓ],i} holds, while w[..ℓ+1] is infeasible, meaning that none of the i processes is in a state where a_{ℓ+1} is enabled.

Theorem 4.1. Let S be a sample that is consistent with some fine BP. Let BS be a BP that satisfies the prescribed constraints ΨS. Then BS is a BP consistent with S.

Proof. We prove that if w ∈ Pi (resp. w ∈ Ni) then w is feasible (resp. infeasible) in B_S^i, by induction first on the length of w and then on i. For w of length 1, this holds by the constraints in item (3). Let w = a1a2 . . . an ∈ Pi. If i = 1 then this holds by induction on w, thanks to constraint (4). Next we consider words w ∈ Pi ∪ Ni with i > 1. If w ∈ Pi is already in Pj for some j < i, then by the induction hypothesis it is already feasible for j processes in the constructed BP, and by Lem. 3.1 it is also feasible with i processes. Otherwise, w ∈ Pi \ ⋃_{j<i} Pj, and the constraint ψ_{w,i} added in item (5) guarantees that w is feasible in B_S^i. The case w ∈ Ni is handled analogously by the constraints for negative words: if the constraint for w was added for i itself, it guarantees that none of the i processes can enable the failing letter; if it was added only for some j > i (the maximal index for which w ∈ Nj), then w is infeasible in B_S^j, and in the latter case Lem. 3.1 implies that w is infeasible in B_S^i as well.

Finally, note that our constraints are in the theory of equality with uninterpreted functions (EUF), and are therefore decidable (while constraint (2) depends on the unknown set A, we can bound the size of A, e.g., by the size of the prefix tree of S). Thus, an algorithm for inferring fine BPs can be implemented using an SMT solver.

Corollary 4.2. I is an inference algorithm for fine BPs.
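As noted in the proof, the constraints lie in EUF and can be handed to an off-the-shelf SMT solver. The sketch below shows how the length-one and single-process constraints might be encoded with z3; the tiny sample, the bound of two abstract states, and the identifiers f_st, f_bng, f_rsp are our illustrative assumptions, not the authors' implementation.

```python
# Encoding a fragment of the Section 4 constraints in z3 (EUF); illustrative only.
from z3 import EnumSort, Function, Consts, Solver, And

# With no hidden states, states can be identified with the actions they enable,
# so |A| abstract states suffice as an upper bound (two here, for actions a and b).
State, (s0, s1) = EnumSort('State', ['s0', 's1'])
Action, (a, b) = EnumSort('Action', ['a', 'b'])

f_st  = Function('f_st',  Action, State)          # state enabling each action
f_bng = Function('f_bng', Action, State)          # target of its sending transition
f_rsp = Function('f_rsp', Action, State, State)   # response functions, used in item (5)

def psi_pos_1(word, p):
    """psi_{w,1} of item (4): the word is feasible with a single process."""
    cs = [p[0] == s0]
    for l, act in enumerate(word, start=1):
        cs += [p[l - 1] == f_st(act), p[l] == f_bng(act)]
    return And(cs)

solver = Solver()
# Assumed sample: (a, 1, T) and (b, 1, F), as for the BP of Fig. 1.
solver.add(f_st(a) == s0)              # item (3), positive word of length one
solver.add(f_st(b) != s0)              # item (3), negative word of length one
p = Consts('p0 p1', State)
solver.add(psi_pos_1([a], p))          # item (4) for the word a in P_1
# The multi-process constraints of item (5) follow the same pattern, with one
# variable p_{j,k} per process and f_rsp constraining the responding processes.

print(solver.check())                  # sat
print(solver.model())                  # a satisfying assignment induces a BP B_S
```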
5 Returning a Minimal BP

In this section we show that when the sample is sufficiently complete we can guarantee that we return a minimal equivalent BP, and not just a BP that agrees with the sample. We thus first show that every fine BP B can be associated with a sample SB so that there exists an inference algorithm A that, when applied to any sample S that subsumes SB and is consistent with B, returns a minimal fine BP that is equivalent to B. We refer to such a sample as a characteristic set (CS). In the following, we first describe a procedure G that generates a sample SB from a fine BP B, and then we prove that an inference algorithm A can correctly infer a minimal BP B′ equivalent to B from any sample subsuming SB.

Generating a Characteristic Set. The CS generation algorithm G builds a sequence of trees Ti, starting with i = 0 and incrementing i by one until Ti+1 = Ti (we say that Ti+1 = Ti if they agree on the tree structure and the edge labels, regardless of the node annotations). The edges of the tree are labeled by actions. The name of a node is taken to be the unique sequence of actions w that leads to it. Thus, the root is named ε and a child of a node w ∈ A* is named wa for some a ∈ A. A node w in tree Ti is annotated with Bi(w) = p_{w,i}, i.e., the state-vector Bi reaches when reading w, if w is feasible in Bi, and with a special symbol (marking infeasibility) otherwise. We call a node in the tree positive if it is annotated with a state-vector, and negative otherwise. All nodes are either leaves or have exactly |A| children. Negative nodes are always leaves. The tree T0 consists of only a root ε, annotated with the all-zeros state-vector. The tree Ti+1 is constructed from the tree Ti by first re-annotating all its nodes: the annotation p_{w,i} of a positive node is replaced by p_{w,i+1}, and a negative node w in Ti may become positive in Ti+1 (if w is feasible with i+1 processes), in which case it is annotated accordingly with p_{w,i+1}. Then we check, for every positive node, whether further exploration is needed. A positive node is declared a leaf if it is of the form va and it has an ancestor u, a prefix of v, for which p_{u,i+1} = p_{v,i+1}; otherwise its |A| children are constructed. That is, once we reach a node whose state-vector is the same as that of one of its ancestors, we develop its children, but the children are not developed further. The entire process terminates when Ti+1 = Ti. Note that, given that the BP has a cutoff, such an i must exist. We use T for the last tree constructed, namely Ti+1. The sample is then produced as follows. For n ∈ [i + 1], let

Pn = {(u, n, T) | n is the minimal for which u is positive in Tn},  Nn = {(u, n, F) | n is the maximal for which u is negative in Tn}.

Then the sample is SB = ⋃_{n=1}^{i+1} (Pn ∪ Nn).

Proving That G Generates Characteristic Sets. We first note that for any reachable state s of the original BP, there exists at least one node v in the tree where s is lit (i.e., the entry for s in the state-vector annotating the node is at least 1). The following lemma strengthens this statement further.

Lemma 5.1. Let p be a state-vector that is reachable in Bn. Then for every shortest word w that reaches p in Bn there exists a node w in Tn such that pw = p.

When the sample S subsumes the set SB, the relation #S induces an equivalence relation between the actions:

Lemma 5.2. For two actions a and b, define a ∼S b iff it is not the case that a #S b. If S subsumes SB then ∼S is an equivalence relation.

Theorem 5.3. Let B be a fine minimal BP, and let SB be the sample generated for it as above. There is an inference algorithm A such that if B′ is the result of A when applied to any set subsuming SB and consistent with B, then B′ is minimal and L(B′) = L(B).

Proof. The inference algorithm A we use to prove this claim runs in two steps. First it runs a variation I′ of the inference algorithm I presented in Section 4 that turns constraint (1) into an iff constraint, i.e., adding that f_st(a) = f_st(b) unless a #S b. If running I′ returns that there is no satisfying assignment, then it runs I. In both cases Thm. 4.1 guarantees that the returned BP is consistent with the given sample. Therefore A is an inference algorithm. Next we claim that if the given sample subsumes SB then B′, the resulting BP, is minimal. This holds since Lem. 5.2 ensures that #S defines the desired equivalence ∼S between actions, and the revised constraint (1) guarantees that actions are not enabled from the same state if and only if the sample separates them. (Note that any word consistent with the BP cannot separate actions a and b if they are enabled from the same state.) Hence I′ will not return that there is no satisfying assignment. Next we note that by Lem. 5.1, for every state-vector p that is reachable in Bm and for every shortest word w that reaches p in Bm, there exists a node w in Tm such that pw = p. If w = a1a2 . . . an, then for each 1 ≤ i ≤ n one process took the sending transition ai!! and the rest of the processes responded with ai??. Constraint (5) makes sure the assignment to f_st, f_!! and the f??_a's respects all the possible options that enabled this, making sure that for every two options for enabling w that result in state-vectors p1 and p2, resp., the same states are lit in both p1 and p2. Hence, for any BP B′ that adheres to the constraints there exists a mapping h between the states of B and B′ satisfying the requirements of Lem. 3.3. Thus, L(B) = L(B′).

Regarding the problem of polynomial data, we show that there exist fine BPs for which there is no CS of polynomial size.
The proof constructs a family of BPs of size quadratic in n for which there exists an action a such that the length of the shortest word containing a is exponential in n. Thus, any CS has to include at least one such long word.

Theorem 5.4. There exists a family of fine BPs with no characteristic set of polynomial size.

The same family used in the proof of Thm. 5.4 also shows that fine BPs can be exponentially smaller than the minimal DFA accepting the same language. This is since in a DFA, for every state q, the length of the shortest word reaching q is bounded by the number of states of the DFA.

Corollary 5.5. There exists a family of fine BPs for which the corresponding minimal DFA is of exponential size.

6 Consistency Is NP-Hard for Fine BPs

Figure 3: Reduction of DFA-consistency to BP-consistency.

We show below that consistency is NP-hard even for fine BPs. We note that hardness is expected, since DFA consistency is NP-hard (Gold 1978), but it does not directly follow from the hardness of DFA consistency, since a DFA is not a special case of a fine BP. However, a fine BP can simulate a DFA, in the manner prescribed in Lem. 6.1. Fig. 3 provides a schematic illustration of the simulation. The states in the rectangle are the original states of the DFA, and ι is the initial state of the DFA. All responses that are not shown in the figure are self-loops; we omit them to avoid clutter. In addition, to make the BP fine, we need to allow each state q to enable some action, call it q. The simulation would like to ignore these actions, i.e., consider the projection of the words to words without these actions. Formally, let Γ, Γ′ be alphabets such that Γ ⊆ Γ′, and let w′ be a word over Γ′. We use πΓ(w′) for the word obtained from w′ by removing the letters in Γ′ \ Γ. If B is a BP over A′, such that A ⊆ A′, we refer to the words in {πA(w) | w ∈ L(B)}, abbreviated πA(L(B)), as the A-feasible words of B. Lem. 6.1 states in which manner the BP simulates the DFA.

Lemma 6.1. Let L be a non-trivial regular language over Σ (a language is non-trivial if it is neither the empty set nor Σ*), and assume n is the number of states in the minimal DFA for L. Let A = Σ ∪ {i, $, ✓, ✗, x}.
1. There exists a fine BP B over a set of actions A′ ⊇ A satisfying that πA(L(B1)) = {i} and πA(L(Bn)) ⊆ i(Σ)*$x*(✓* ∪ ✗*) for every n ≥ 2.
2. In addition, for every w ∈ Σ*: w ∈ L iff iw$✓ is feasible in B2, and w ∉ L iff iw$✗ is feasible in B2.
3. Moreover, every B satisfying the above has at least n + 5 states.

Proof sketch. In order to keep the number of processes in the DFA part exactly one, the i action from the initial state of the BP (state I) sends one process to ι and the rest to state C. To allow every letter σ ∈ Σ to be taken from every state of the DFA, the state C of the BP enables all letters in Σ, and the original transitions (q, σ, q′) of the DFA are transformed into responses (q, σ??, q′). Next, to deal with the fact that the language of a DFA is defined by the set of words that reach an accepting state, whereas the language of a BP is the set of feasible words, we introduce the letters $, ✓ and ✗. Upon $, the single process in one of the DFA states moves into either state ✓ or state ✗, depending on whether it was in an accepting or a rejecting state; and all processes in state C move to state X, where they wait to follow the respective ✓ or ✗. Now only {x, ✓} or {x, ✗} are enabled. An x!! transition would not change this situation, whereas a ✓ (resp. ✗) transition will ensure that only ✓ (resp. ✗) can be taken henceforth.
The complete proof (in the extended version) shows that the three requirements of the lemma are satisfied.

Theorem 6.2. BP consistency is NP-hard.

Proof sketch. The proof is by reduction from DFA consistency, which is NP-hard (Gold 1978). Given an input to DFA-consistency, namely a sample S and k ∈ N, we produce a sample S′ and k′ = k + 5 as an input to BP-consistency, so that the relation between the minimal BP consistent with S′ and the minimal DFA consistent with S is as stated in Lem. 6.1. The sample S′ consists of various triples ensuring that a BP consistent with S′ has the structure given in Fig. 3. For instance, (i, 1, T) and (ii, 2, F) enforce that i is enabled in the initial state but is not a self-loop. A pair (w, T) (resp. (w, F)) in S is altered to the triple (iw$✓, 2, T) (resp. (iw$✗, 2, T)) in S′, making sure that words accepted (resp. rejected) by the DFA create respective feasible words ending with ✓'s (resp. ✗'s). Each such pair carries some additional triples added to S′ to continue enforcing the structure of Fig. 3. See the extended version for an alternative proof via a direct reduction from all-eq-sat, inspired by Lingg, de Oliveira Oliveira, and Wolf (2024).

Note that given a BP B and a pair (w, n) ∈ A* × N, it is possible to check in polynomial time whether w is feasible in Bn by developing the state-vector n·u0 along the word w in B. Consequently, and since a BP with m states over a set of actions A can be described in size polynomial in m and |A|, if m is given in unary then BP-consistency is NP-complete.

7 BPs Are Not Polynomially Predictable

Here we show that fine BPs are not polynomially predictable with membership queries. The learning paradigm of polynomial predictability of a class 𝒞 can be explained as follows. The learner has access to an oracle answering membership queries (MQ) with regard to the target language C ∈ 𝒞, or draw queries (DR), which can be implemented using MQ. A membership query receives a word w as input and answers whether w is or is not in C. A draw query receives no input and returns a pair (w, b), where w is a word that is randomly chosen according to some probability distribution D and b is MQ(w). We assume some bound ℓ on the length of the relevant examples, so that D is a probability distribution on the set of relevant words. We assume the learner knows ℓ, but D is unknown to her. At some point, the learner is expected to ask for a word whose membership it needs to predict, in which case it is handed a word w (drawn randomly according to the same distribution D) and it should then answer whether w is or is not in C. We say that the class 𝒞 is polynomially predictable with membership queries if, given a bound s on the size of the target language, the mentioned bound ℓ on the length of relevant examples, and an accuracy parameter ε between 0 and 1, there exists a learner that classifies the word to predict correctly with probability at least 1 − ε, after asking a number of queries that is polynomial in the size of the minimal BP of the target language. We show that under plausible cryptographic assumptions, fine BPs (and thus BPs in general) are not polynomially predictable.

Theorem 7.1. Assuming the intractability of any of the following three problems: testing quadratic residues modulo a composite, inverting RSA encryption, or factoring Blum integers, fine BPs are not polynomially predictable with MQ.

Figure 4: A BP simulating the intersection of k DFAs.

Proof sketch. The proof is via a reduction from the class of intersections of DFAs, which Angluin and Kharitonov have shown is not polynomially predictable under the same assumptions (Angluin and Kharitonov 1995). We show that, given a predictor for fine BPs, we can construct a predictor for intersections of DFAs as follows. Given a set D1, D2, . . . , Dk of DFAs, we can construct a BP B as shown in Fig. 4 such that B simulates the run of the k DFAs together. As in the proof of Lem. 6.1, we can send one process to simulate each of the DFAs. Here we need k processes, to send a process to the initial state of each of the DFAs, and an additional process to enable all letters in Σ. Thus, the cutoff is k + 1. The BP detects whether a given word w is accepted by all the DFAs by checking whether uw$ is infeasible in B, where u is some initialization sequence that is required to send the processes to the initial states of the DFAs.

8 Conclusion

We investigated the learnability of the class of fine broadcast protocols. To the best of our knowledge, this is the first work on learning concurrent models that does not assume that a fixed number of processes interact. On the positive side, we showed a passive learning algorithm that can infer a BP consistent with a given sample, and even return a minimal equivalent BP if the sample is sufficiently complete. On the negative side, we showed that the consistency problem for fine BPs is NP-hard, that characteristic sets may inevitably be of exponential size, and that the class is not polynomially predictable.

References

Angluin, D. 1987. Learning Regular Sets from Queries and Counterexamples. Inf. Comput., 75(2): 87–106.
Angluin, D.; Eisenstat, S.; and Fisman, D. 2015. Learning Regular Languages via Alternating Automata. In IJCAI 2015, 3308–3314.
Angluin, D.; and Kharitonov, M. 1995. When Won't Membership Queries Help? J. Comput. Syst. Sci., 50(2): 336–355.
Außerlechner, S.; Jacobs, S.; and Khalimov, A. 2016. Tight Cutoffs for Guarded Protocols with Fairness. In VMCAI, volume 9583 of Lecture Notes in Computer Science, 476–494. Springer.
Beimel, A.; Bergadano, F.; Bshouty, N. H.; Kushilevitz, E.; and Varricchio, S. 2000. Learning functions represented as multiplicity automata. J. ACM, 47(3): 506–530.
Bollig, B.; Habermehl, P.; Leucker, M.; and Monmege, B. 2013. A Fresh Approach to Learning Register Automata. In DLT 2013, 118–130.
Bollig, B.; Katoen, J.; Kern, C.; and Leucker, M. 2010. Learning Communicating Automata from MSCs. IEEE Trans. Software Eng., 36(3): 390–408.
de la Higuera, C. 2010. Grammatical Inference: Learning Automata and Grammars. Cambridge University Press.
Decker, N.; Habermehl, P.; Leucker, M.; and Thoma, D. 2014. Learning Transparent Data Automata. In PETRI NETS 2014, 130–149.
Delzanno, G.; Esparza, J.; and Podelski, A. 1999. Constraint-Based Analysis of Broadcast Protocols. In CSL, volume 1683 of Lecture Notes in Computer Science, 50–66. Springer.
Emerson, E. A.; and Kahlon, V. 2000. Reducing Model Checking of the Many to the Few. In CADE, volume 1831 of Lecture Notes in Computer Science, 236–254. Springer.
Emerson, E. A.; and Kahlon, V. 2003. Model Checking Guarded Protocols. In LICS, 361–370. IEEE Computer Society.
Emerson, E. A.; and Namjoshi, K. S. 1998. On Model Checking for Non-Deterministic Infinite-State Systems. In LICS, 70–80. IEEE Computer Society.
Emerson, E. A.; and Namjoshi, K. S. 2003. On Reasoning About Rings. Int. J. Found. Comput. Sci., 14(4): 527–550.
Esparza, J.; Finkel, A.; and Mayr, R. 1999. On the Verification of Broadcast Protocols. In LICS, 352–359.
Esparza, J.; Leucker, M.; and Schlund, M. 2011. Learning Workflow Petri Nets. Fundam. Informaticae, 113(3-4): 205–228.
Finkel, A.; and Schnoebelen, P. 2001. Well-structured transition systems everywhere! Theor. Comput. Sci., 256(1-2): 63–92.
Fisman, D.; Izsak, N.; and Jacobs, S. 2023. Learning Broadcast Protocols. CoRR, abs/2306.14284.
Fisman, D.; Nitay, D.; and Ziv-Ukelson, M. 2023. Learning of Structurally Unambiguous Probabilistic Grammars. Log. Methods Comput. Sci., 19(1).
Geeraerts, G.; Raskin, J.; and Begin, L. V. 2007. Well-structured languages. Acta Informatica, 44(3-4): 249–288.
Gold, E. M. 1967. Language Identification in the Limit. Inf. Control., 10(5): 447–474.
Gold, E. M. 1978. Complexity of Automaton Identification from Given Data. Inf. Control., 37(3): 302–320.
Jacobs, S.; and Bloem, R. 2014. Parameterized Synthesis. Log. Methods Comput. Sci., 10(1).
Lingg, J.; de Oliveira Oliveira, M.; and Wolf, P. 2024. Learning from positive and negative examples: New proof for binary alphabets. Information Processing Letters, 183: 106427.
Muscholl, A.; and Walukiewicz, I. 2022. Active learning for sound negotiations. In LICS '22: 37th Annual ACM/IEEE Symposium on Logic in Computer Science, 21:1–21:12.
Peled, D. A.; Vardi, M. Y.; and Yannakakis, M. 2002. Black Box Checking. J. Autom. Lang. Comb., 7(2): 225–246.
Roy, R.; Gaglione, J.; Baharisangari, N.; Neider, D.; Xu, Z.; and Topcu, U. 2023. Learning Interpretable Temporal Properties from Positive Examples Only. In AAAI 2023, 6507–6515.
Vaandrager, F. W. 2017. Model learning. Commun. ACM, 60(2): 86–95.