Learning Residual Alternating Automata

Sebastian Berndt, Maciej Liśkiewicz, Matthias Lutter, Rüdiger Reischuk
Institute for Theoretical Computer Science, University of Lübeck
Ratzeburger Allee 160, 23562 Lübeck, Germany
{berndt,liskiewi,lutter,reischuk}@tcs.uni-luebeck.de

Residuality plays an essential role in learning finite automata. While residual deterministic and non-deterministic automata are quite well understood, fundamental questions concerning alternating automata (AFA) remain open. Recently, Angluin, Eisenstat, and Fisman (2015) initiated a systematic study of residual AFAs and proposed an algorithm called AL*, an extension of the popular L* algorithm, to learn AFAs. Based on computer experiments they conjectured that AL* produces residual AFAs, but were not able to give a proof. In this paper we disprove this conjecture by constructing a counterexample. As our main positive result we design an efficient learning algorithm, named AL**, and prove that it outputs residual AFAs only. In addition, we investigate the succinctness of these different FA types in more detail.

1 Introduction

Learning finite automata is an important topic in machine learning and of great practical significance for substantial learning problems such as pattern recognition, robot navigation, automated verification, and many others (see e.g. the textbook (De la Higuera 2010)). Depending on the application, different types of automata may be required as targets of learning. Those of particular concern are deterministic finite automata (DFA), non-deterministic finite automata (NFA), the dual of NFAs, namely universal finite automata (UFA), and their generalization, alternating finite automata (AFA). Although these models have the same expressive power, they differ in modeling capabilities and succinctness. A minimal (w.r.t.
the number of states) DFA might be exponentially larger than an NFA and double-exponentially larger than an AFA. Thus, for many applications, e.g. in automated verification, it is desirable to work directly with AFAs rather than with the other types, as the membership problem for AFAs is still efficiently solvable.

In the common exact learning framework for FAs, the learner can ask membership queries, to test whether a word is accepted by the unknown target automaton, and equivalence queries, to compare its current hypothesis with the target and, if there is a mismatch, to receive a counterexample.

(This work was supported by the Deutsche Forschungsgemeinschaft (DFG), grant LI 634/4-1. Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.)

This model was introduced in (Angluin 1987) and launched a tremendous amount of subsequent research, yielding many effective algorithms of relevance in machine learning and other areas. Angluin (1987) provided an algorithm, named L*, that learns a minimal DFA in polynomial time. Minimality of the resulting DFA plays an important role here, since this condition makes it unique (up to the naming of states). Thus, L* learns precisely the target automaton if the latter is minimal. Besides uniqueness, minimal DFAs have another nice property, termed residuality. An automaton A accepting a language L is residual if every state q of A can be associated with a word w_q such that the language accepted by A_q, the automaton A started in q, is exactly the set of words v for which w_q v ∈ L. Thus, every state q of A corresponds to the residual language of L determined by w_q. For many learning algorithms the residuality property plays an essential role in inferring the target automaton.
Angluin's L* algorithm makes heavy use of this concept: the states of a hypothesized automaton are represented by a prefix-closed set of strings such that for every state q_s corresponding to a string s, the language accepted from q_s is residual with respect to s and the target language.

Unfortunately, non-deterministic automata in general do not satisfy the residuality property. Even worse, for an NFA A the languages accepted by A_q, for states q of A, have no natural interpretation, and two minimal NFAs can be non-isomorphic. These disadvantageous properties may lead to ambiguity problems and to difficulties in learning automata. Moreover, the goal is often to learn automata exhibiting a certain structure that may be helpful for later use in specific applications, e.g. in model checking. Residuality is one such structural property: it allows one to assign a natural semantics to the states of a complex automaton, which in turn allows a simpler analysis of the (possibly) involved behaviour of the automaton.

Denis, Lemay, and Terlutte (2001) introduced the class of residual NFAs (RNFA). For every regular language L there is a unique RNFA A_L, called canonical, such that the number of states is minimal, the number of transitions between states is maximal, and for every state q of A_L the language accepted by (A_L)_q is residual. In addition, A_L can be exponentially more succinct than the equivalent minimal DFA. Using the residuality property, Bollig et al. (2009) proposed a sophisticated extension of Angluin's algorithm, named NL*, that learns a canonical RNFA with a polynomial number of membership and equivalence queries.

(Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17))

Recently, Angluin, Eisenstat, and Fisman (2015) extended the definition of residual automata to universal and alternating automata, and provided UL*, a learning algorithm for UFAs, and AL*, a learning algorithm for AFAs.
To analyze the advantages and trade-offs among these algorithms, the authors performed experiments and showed that for randomly generated automata, AL* outperforms the other algorithms w.r.t. the number of membership queries, but w.r.t. the number of equivalence queries L* is the best, followed by UL*, NL*, and AL* (which is justified by the succinctness obtained). However, as the authors write, they were not able to prove that AL* always outputs residual AFAs. Based on the experiments they conjectured that this property indeed holds, but left its proof as future work.

In this paper we disprove their conjecture by providing a counterexample that has been constructed with the help of specially designed software tools for learning residual automata. Next, we continue the systematic study of residual AFAs and discuss several properties to get a better understanding of these machines. As our main positive result we design an efficient learning algorithm, named AL**, and prove that it outputs residual AFAs only. In addition, we investigate the succinctness of these different FA types in more detail.

2 Preliminaries

We denote the symmetric difference of sets by Δ, the Boolean value true by ⊤, and the value false by ⊥. For a set S we denote by F(S) all formulas over S that can be generated with the binary operators ∧ and ∨. For R ∈ {∧, ∨}, the set F_R(S) is the subset of F(S) of formulas that are built from S and the operator R only. In order to help the reader understand the complex behaviour of alternating finite automata, we compare them to the well-known deterministic and non-deterministic models.

2.1 Automata

An alternating finite automaton (AFA), first introduced by Chandra, Kozen, and Stockmeyer (1981), over an alphabet Σ is a four-tuple (Q, Q0, F, δ), where Q is the set of states, Q0 ∈ F(Q) is the initial configuration, F ⊆ Q are the accepting states, and δ: Q × Σ → F(Q) is the transition function.
If Q0 ∈ Q and, for all q ∈ Q and all a ∈ Σ, the transition δ(q, a) consists of a single state, such an automaton is called a deterministic finite automaton (DFA). If Q0 ∈ F∨(Q) and δ(q, a) ∈ F∨(Q) for all q ∈ Q and all a ∈ Σ, the definition corresponds to non-deterministic finite automata (NFA). E.g., δ(q, a) = p1 ∨ p2 describes a non-deterministic choice between p1 and p2. If Q0 ∈ F∧(Q) and δ(q, a) ∈ F∧(Q) for all q ∈ Q and all a ∈ Σ, the definition corresponds to universal finite automata (UFA), where, e.g., a transition δ(q, a) = p1 ∧ p2 leads to state p1 and to state p2.

As usual, the function δ is extended to arbitrary formulas and strings: if ϕ ∈ F(Q), assume w.l.o.g. that ϕ is in disjunctive normal form (DNF) with ϕ = ∨_i M_i and M_i = ∧_j q_{i,j}; then we define δ(ϕ, a) = ∨_i ∧_j δ(q_{i,j}, a) for a single symbol a ∈ Σ, and δ(ϕ, ϵ) = ϕ for the empty string ϵ. For a nonempty string wa ∈ Σ*, we define δ(ϕ, wa) = δ(δ(ϕ, w), a). For an NFA, this definition simply reduces to δ(q ∨ p, a) = δ(q, a) ∨ δ(p, a) as usual, if one interprets the formula q ∨ p as the set {q, p}.

For an AFA A = (Q, Q0, F, δ) and a formula ϕ ∈ F(Q), we define the evaluation of ϕ, denoted ⟦ϕ⟧, recursively as follows: for the empty set of states we let ⟦∅⟧ = ⊥, for singletons we define ⟦q⟧ = ⊤ if q ∈ F and ⟦q⟧ = ⊥ if q ∉ F, and finally ⟦ϕ R ψ⟧ = ⟦ϕ⟧ R ⟦ψ⟧ for R ∈ {∧, ∨}. The AFA A accepts a word w if ⟦δ(Q0, w)⟧ = ⊤. For an NFA, ⟦δ(Q0, w)⟧ = ⊤ expresses the same as {q1, ..., qk} ∩ F ≠ ∅ for δ(Q0, w) = q1 ∨ ... ∨ qk (i.e., when starting in the initial configuration and reading the word w, some accepting state is reached), and for a UFA it is the same as {q1, ..., qk} ⊆ F for δ(Q0, w) = q1 ∧ ... ∧ qk (i.e., all states reached are accepting). The language L(A) of the automaton A is the set of all accepted strings. For an AFA A = (Q, Q0, F, δ) and a state q ∈ Q, we write A_q to denote the automaton A_q = (Q, q, F, δ) that starts with configuration q instead of Q0.

2.2 Residuality

Let L ⊆ Σ* be a regular language. For a word u ∈ Σ*, we define the residual language u⁻¹L as {v ∈ Σ* | uv ∈ L}.
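The extended transition function, the acceptance condition, and the residual languages just defined can be illustrated in code. The following is a minimal sketch of our own (not from the paper): DNF formulas are kept as sets of monomials, each monomial a frozenset of states; the names `Afa`, `accepts` and `residual` are ours, and `residual` only inspects a finite sample of suffixes.

```python
from itertools import product

class Afa:
    """Minimal AFA with DNF formulas: a formula is a set of monomials,
    each monomial a frozenset of states (a disjunction of conjunctions)."""
    def __init__(self, init, accepting, delta):
        self.init = init            # DNF formula: set of frozensets of states
        self.accepting = accepting  # set of accepting states
        self.delta = delta          # dict: (state, symbol) -> DNF formula

    def step(self, phi, a):
        # delta(∨_i ∧_j q_ij, a) = ∨_i ∧_j delta(q_ij, a): for each monomial,
        # distribute the conjunction over the successor DNF formulas.
        result = set()
        for monomial in phi:
            choices = [self.delta[(q, a)] for q in monomial]
            for picked in product(*choices):   # one monomial per conjunct
                result.add(frozenset().union(*picked))
        return result

    def accepts(self, word):
        phi = self.init
        for a in word:
            phi = self.step(phi, a)
        # ⟦phi⟧ = ⊤ iff some monomial consists of accepting states only
        return any(m <= self.accepting for m in phi)

def residual(afa, u, sample):
    """Brute-force u⁻¹L(afa), restricted to a finite sample of suffixes."""
    return {v for v in sample if afa.accepts(u + v)}
```

Encoding an NFA here simply means using singleton monomials; a genuine conjunction such as δ(q, a) = p1 ∧ p2 becomes the single monomial `frozenset({p1, p2})`, and an empty DNF `set()` plays the role of a rejecting dead end.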
The set of all residual languages of L is denoted by RES(L). A residual language u⁻¹L is called ∪-prime, resp. ∩-prime, if u⁻¹L cannot be written as the union, resp. intersection, of other residual languages. We denote the corresponding subsets of RES(L) by ∪-Primes(L), resp. ∩-Primes(L). An automaton A with states Q is residual if L(A_q) ∈ RES(L) for all q ∈ Q, i.e. if every state corresponds to a prefix u and its residual language u⁻¹L. Let RNFA, RUFA and RAFA denote the corresponding residual restrictions.

2.3 Learning Algorithms

All of the learning algorithms xL* for automata (i.e. L*, NL*, UL*, and AL*) as well as our AL** follow a very similar pattern. Two sets U, V ⊆ Σ* are constructed, where U is prefix-closed and V is suffix-closed. For all strings uv ∈ UV or uav ∈ UΣV a membership query is performed. The resulting matrix, indexed by U ∪ UΣ and V, is called a table. The rows indexed by U correspond to possible states. To minimize the number of states, a subset B of rows (a base) is constructed such that all rows can be built from the elements of B. The specific way to build a row depends on the type of automaton. A hypothesized automaton is constructed from this subset B: for a row r_u indexed by u ∈ U and a symbol a ∈ Σ, the transition δ(r_u, a) equals the formula that builds the row indexed by ua.

Formally, similarly to Bollig et al. (2009), for the prefix-closed set U and the suffix-closed set V, we define a |U ∪ UΣ| × |V| table T = (T, U, V) with entries in {+, −}, determined by a function T: Σ* → {+, −, ⊥} specified below. Let W(T) denote the set (U ∪ UΣ)V, for short. We call W(T) the set of words described by T. Define T for w ∈ Σ* as ⊥ if w ∉ W(T), + if w ∈ W(T) ∩ L, and − if w ∈ W(T) \ L. The entry of T in row x and column y is equal to T(xy). Note that to define T we only need the values of T on W(T); we extend the domain of T to all words over Σ for the sake of completeness. For an example of T and of the forthcoming definitions concerning tables, see Fig. 1.
Figure 1: Table T = (T, U, V) for the language L = ab⁺, with U = {ϵ, a}, V = {ϵ, ab, b}, and R = UΣ \ U = {b, aa, ab}:

        ϵ    ab   b
  ϵ     −    +    −
  a     −    −    +
  b     −    −    −
  aa    −    −    −
  ab    +    −    +

The entries of the table are determined by T: the value in row x and column y is equal to T(xy). For example, the value in row ab and column b is + since T(abb) = +, as abb ∈ L and abb ∈ W(T). An example row is rϵ = (− + −). Furthermore, Rows_high(T) = {rϵ, r_a}.

An automaton A and a table T are compatible if for all w ∈ W(T), A accepts w iff T(w) = +. For every u ∈ U ∪ UΣ, we associate with u a vector r_u of length |V| over {+, −}, with r_u[v] = T(uv) for v ∈ V. The vector r_u is called the row of u. The set of all rows is denoted by Rows(T). An important subset, denoted by Rows_high(T), consists of those r_u with u ∈ U. Finally, we say that T = (T, U, V) is consistent if for every u, u′ ∈ U and a ∈ Σ, r_u = r_{u′} implies r_{ua} = r_{u′a}. If T is consistent then, to simplify notation, for any r ∈ Rows_high(T) and a ∈ Σ we write r_a for the vector r_{ua}, where u ∈ U is any string with r_u = r.

3 Learning Residual Universal Automata

The classical result of Angluin (1987), that one can learn the unique minimal DFA for a regular language L from membership and equivalence queries (the algorithm L*), has been extended by Bollig et al. (2009) to NFAs. They designed an algorithm NL* that learns the unique residual NFA with a minimal number of states and a maximal number of transitions between these states (the canonical RNFA) accepting L. Angluin, Eisenstat, and Fisman (2015) presented a modification of NL*, named UL*, to learn a residual UFA, but without a detailed analysis. To better understand residual UFAs we introduce the following definition.

Definition 1 (Canonical RUFA). The canonical RUFA for a regular language L is the tuple (Q, Q0, F, δ) where Q = ∩-Primes(L), Q0 = {L′ ∈ Q | L ⊆ L′}, F = {L′ ∈ Q | ϵ ∈ L′}, and δ(L1, a) = {L2 ∈ Q | a⁻¹L1 ⊆ L2}.
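Definition 1 can be made concrete on the running example L = ab⁺. The sketch below is our own code (not from the paper): it approximates every residual language by its member words of length at most 3, which happens to separate all residuals of this particular L, and then selects the ∩-prime ones as the states of the canonical RUFA. The helper names `in_L` and `cap_prime` are ours.

```python
from itertools import product

def in_L(w):                        # membership oracle for L = ab+
    return len(w) >= 2 and w[0] == "a" and set(w[1:]) == {"b"}

alphabet = "ab"
words = [""] + ["".join(p) for n in range(1, 4)
                for p in product(alphabet, repeat=n)]

def residual(u):                    # u^{-1}L, restricted to the test words
    return frozenset(v for v in words if in_L(u + v))

# The finitely many distinct residual languages RES(L),
# reached here via prefixes of length at most 3.
res = {residual(u) for u in [""] + words}

def cap_prime(r):
    # r is ∩-prime iff it is not the intersection of the residuals above it
    above = [s for s in res if s != r and r <= s]
    return not above or frozenset.intersection(*above) != r

primes = {r for r in res if cap_prime(r)}      # states Q of the canonical RUFA
Q0 = {r for r in primes if residual("") <= r}  # initial configuration
F = {r for r in primes if "" in r}             # accepting states
```

The three ∩-primes found correspond to ab⁺, b⁺ and b* (restricted to short words); the residual ∅ = ab⁺ ∩ b⁺ is ∩-composite and is dropped. Transitions could be assembled in the same style via δ(L1, a) = {L2 ∈ Q | a⁻¹L1 ⊆ L2}.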
The canonical RUFA has the minimal number of states and the maximal number of transitions between these states, which makes it unique. In the following we prove that UL* always outputs such automata.

The order − ≤ + on the set {+, −} is extended to a partial order ≤ on vectors by requiring ≤ to hold in each component. The binary operators ⊓, ⊔ on the set {+, −} are defined by a ⊓ b = min{a, b} and a ⊔ b = max{a, b}. For vectors, these operators are extended by performing the operation componentwise. We say that a row r_u is ⊓-composite if there are rows r_{u1}, r_{u2}, ..., r_{uk} ∈ Rows_high(T), with r_{ui} ≠ r_u, such that r_u = ⊓_{i=1}^k r_{ui}. For example, in Fig. 1 the row r_b is ⊓-composite, as r_b = rϵ ⊓ r_a. Otherwise, r_u is ⊓-prime. Let Primes⊓(T) be the set of ⊓-prime rows in Rows_high(T). For the table T in Fig. 1, the rows rϵ, r_a, r_{ab} are ⊓-prime, and Primes⊓(T) = {rϵ, r_a}. To simplify notation, for every r_u ∈ Rows(T), let B⊓(r_u) = {r_{u′} ∈ Rows_high(T) | r_u ≤ r_{u′}}.

A table T is ⊓-closed if every row r_u ∈ Rows(T) can be generated from a subset of the rows in Primes⊓(T) combined with the ⊓ operator. A subset of rows that can generate all rows of a table T using ⊓ is called a ⊓-basis. Thus, T is ⊓-closed if Primes⊓(T) is a ⊓-basis for T. Note that the table T in Fig. 1 is not ⊓-closed, as the row r_{ab} ∈ Rows(T) is not composable from rows of Rows_high(T).

Algorithm UL* constructs the RUFA in a way dual to the non-deterministic case. For a consistent and ⊓-closed table T, let the UFA A(T) = (Q, Q0, F, δ) be defined by Q = Primes⊓(T), Q0 = B⊓(rϵ) ∩ Q, F = {r ∈ Q | r[ϵ] = +}, and, for all r ∈ Q and a ∈ Σ, δ(r, a) = B⊓(r_a) ∩ Q.

Lemma 1. For all u, u′ ∈ U with r_u, r_{u′} ∈ Q, all v ∈ V, and all r ∈ δ(Q0, u) it holds that:
1. r_u[v] = + ⟺ δ(r_u, v) ⊆ F,
2. rϵ[v] = + ⟺ δ(Q0, v) ⊆ F.
If T and A(T) are compatible, then additionally:
3. r_u ∈ δ(Q0, u) and r_u ≤ r,
4. r_u ≤ r_{u′} ⟹ for all w: δ(r_u, w) ⊆ F ⟹ δ(r_{u′}, w) ⊆ F.

Theorem 1. If T and A(T) are compatible, then A(T) is the canonical RUFA.

Proof. We first show that the automaton is residual. Let r ∈ Q.
Thus, for a string u_r ∈ U with r = r_{u_r}, Lemma 1 gives r ∈ δ(Q0, u_r) and hence (u_r)⁻¹L(A(T)) ⊆ L(A_r(T)). Furthermore, for all r′ ∈ δ(Q0, u_r) we have r ≤ r′ and thus L(A_r(T)) ⊆ L(A_{r′}(T)) by Lemma 1. This implies L(A_r(T)) ⊆ (u_r)⁻¹L(A(T)). Hence, L(A_r(T)) = (u_r)⁻¹L(A(T)). The language L(A_r(T)) is also ∩-prime, as r is ⊓-prime, due to Lemma 1.

4 Learning Alternating Automata

This section contains the main results of our work. In (Angluin, Eisenstat, and Fisman 2015), the algorithm AL* to learn alternating automata was presented and its running time analyzed. As noted by Angluin, Eisenstat, and Fisman, the properties of the automata it produces remain unclear. We first prove several properties of AL* and disprove the conjecture of residuality. Then we present the modified algorithm AL** that guarantees residuality. Finally, we discuss how to find a provably good basis for AFAs (defined in the next subsection).

4.1 Analysis of the AL* Algorithm

Let us review the construction of the automata generated by AL* and analyze the properties of these automata in detail. Note that we use the basic version of the algorithm AL* without the optimizations suggested by Angluin, Eisenstat, and Fisman (2015), as some of these optimizations may lead to non-terminating behaviour of the algorithm.

For a formula ϕ ∈ F(Rows(T)) over the rows of a table, we define the evaluation ⟦ϕ⟧ by ⟦r_u⟧ = r_u, ⟦ϕ ∧ ψ⟧ = ⟦ϕ⟧ ⊓ ⟦ψ⟧, and ⟦ϕ ∨ ψ⟧ = ⟦ϕ⟧ ⊔ ⟦ψ⟧, and extend this to a set B of formulas by ⟦B⟧ = {⟦ϕ⟧ | ϕ ∈ B}. For example, ⟦(+ −) ∧ ((− +) ∨ (+ +))⟧ = (+ −) ⊓ (+ +) = (+ −).

In the following, P will always denote a subset of Rows_high(T). P is a (⊓, ⊔)-basis for T (in the following simply called a basis) if Rows(T) ⊆ ⟦F(P)⟧. T is then called P-closed. T is called P-minimal if P is a minimal basis for T, i.e. for all p ∈ P, P \ {p} is not a basis. For a P-closed table T and v ∈ V, let M_P(v) be the monomial ∧_{p ∈ P, p[v]=+} p, which is a maximal one over all monomials M in F∧(P) such that ⟦M⟧[v] = +. For a P-closed table T and r ∈ Rows(T), let b_P(r) ∈ F(P) be the expression ∨_{v ∈ V, r[v]=+} M_P(v) representing r. Note that ⟦b_P(r)⟧ = r.
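Using the rows of the table in Fig. 1 as data, the operators ⊓ and ⊔ and the expressions M_P(v) and b_P(r) can be sketched as follows. This is a minimal illustration of our own (not the paper's code); True encodes + and False encodes −.

```python
from functools import reduce

# Rows of the table in Fig. 1 over the columns V = [ϵ, ab, b].
V = ["", "ab", "b"]
rows = {"ε": (False, True, False), "a": (False, False, True),
        "b": (False, False, False), "aa": (False, False, False),
        "ab": (True, False, True)}
P = [rows["ε"], rows["a"]]           # Rows_high(T)

def meet(r, s):                      # ⊓: componentwise minimum (and)
    return tuple(x and y for x, y in zip(r, s))

def join(r, s):                      # ⊔: componentwise maximum (or)
    return tuple(x or y for x, y in zip(r, s))

def M(v):
    """⟦M_P(v)⟧: meet of all p ∈ P with p[v] = +.
    The empty monomial evaluates to the all-+ vector."""
    i = V.index(v)
    return reduce(meet, [p for p in P if p[i]], (True,) * len(V))

def b(r):
    """⟦b_P(r)⟧: join of ⟦M_P(v)⟧ over all columns v with r[v] = +.
    The empty disjunction evaluates to the all-− vector."""
    return reduce(join, [M(v) for i, v in enumerate(V) if r[i]],
                  (False,) * len(V))
```

Here ⟦b_P(r)⟧ = r holds for rϵ, r_a and r_b, but not for r_{ab}: this reflects the fact noted above that the table of Fig. 1 is not closed under the basis {rϵ, r_a}, so r_{ab} would have to be added as a new state.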
Let M be a monomial and a ∈ Σ. We define M_a as the monomial derived from M by replacing every row r ∈ P of M by r_a. For a DNF formula ϕ consisting of monomials M_i, we use the notation M_i ∈ ϕ, and for a monomial M = ∧_j x_j the notation x_j ∈ M for its literals x_j. For formulas ϕ(x1, ..., xk) and ψ(x1, ..., xk) with literals x1, ..., xk that represent vectors over {+, −}, we say that ϕ and ψ are equivalent (in symbols ϕ ≡ ψ) if ⟦ϕ(r1, r2, ..., rk)⟧ = ⟦ψ(r1, ..., rk)⟧ for all vectors r1, ..., rk of identical length. For a formula ϕ, let ϕ_DNF denote a DNF formula that is equivalent to ϕ.

For a consistent and P-closed table T, let us define the AFA A_P(T) = (Q, Q0, F, δ) as follows: Q = P, Q0 = b_P(rϵ), F = {r ∈ P | r[ϵ] = +}, and for all r ∈ Q let δ(r, a) = b_P(r_a). Note that δ(r, a) = b_P(r_a) is always a DNF formula.

Lemma 2. For every ϕ ∈ F(Q) and every automaton A_P(T): ⟦ϕ⟧ = ⊤ (evaluating with respect to the accepting states F) iff ⟦ϕ⟧[ϵ] = + (evaluating over the rows).

In the following, fix a regular language L, a prefix-closed set U, a suffix-closed set V, the corresponding table T, and a minimal basis P of Rows_high(T).

Lemma 3. r[v] = ⟦δ(r, v)⟧[ϵ] for all r ∈ P and v ∈ V.

Lemma 4. ⟦ϕ⟧[v] = ⟦δ(ϕ, v)⟧[ϵ] for all ϕ ∈ F(P) and v ∈ V.

Lemma 5. If T and A_P(T) are compatible, then for every u ∈ U with r_u ∈ P it holds that L(A_P^{r_u}(T)) ⊆ u⁻¹L(A_P(T)).

Proof. Assume L(A_P^{r_u}(T)) ⊄ u⁻¹L(A_P(T)), i.e. there exists a string ω such that ω ∈ L(A_P^{r_u}(T)) and ω ∉ u⁻¹L(A_P(T)). Since ω ∈ L(A_P^{r_u}(T)), we have ⟦δ(r_u, ω)⟧[ϵ] = + by definition. Moreover, ω ∉ u⁻¹L(A_P(T)) implies uω ∉ L(A_P(T)) and thus ⟦δ(δ(Q0, u), ω)⟧[ϵ] = −.

We first prove that such an ω cannot come from V or ΣV, by showing that ω ∉ (Σ ∪ {ϵ})V. Assume that ω = av with a ∈ Σ ∪ {ϵ} and v ∈ V. By Lemma 4, r_{ua}[v] = ⟦δ(r_u, a)⟧[v] = ⟦δ(r_u, ω)⟧[ϵ] = +. This contradicts compatibility, as r_{ua}[v] = + implies that uav = uω ∈ L(A_P(T)).

Now, let δ(Q0, u)_DNF = M1 ∨ M2 ∨ ... ∨ Mk be the formula reached in the automaton after reading u. The fact ω ∉ (Σ ∪ {ϵ})V implies that this formula agrees with r_u, i.e. ⟦M1 ∨ ... ∨ Mk⟧ = r_u. Now let ω = aω′.
From the construction of δ, we know that the row r_{ua} is not completely filled with −, since then ⟦δ(r_u, ω)⟧[ϵ] = ⟦δ(r_u, aω′)⟧[ϵ] = ⟦δ(δ(r_u, a), ω′)⟧[ϵ] = ⟦δ(b_P(r_{ua}), ω′)⟧[ϵ] = −, contradicting ⟦δ(r_u, ω)⟧[ϵ] = + from above. For every column v ∈ V with r_{ua}[v] = +, consider all monomials M_i with ⟦(M_i)_a⟧[v] = +. There must be at least one such monomial, because otherwise uav ∉ L(A_P(T)), which would contradict the compatibility of T and A_P(T). We have M_P(v) ∈ δ(r_u, a) by the construction of δ(r_u, a) = b_P(r_{ua}). For every row r_{u′} ∈ M_i, we have δ(r_{u′}, a) = b_P(r_{u′a}) = ∨_{v̄ ∈ V, r_{u′a}[v̄]=+} M_P(v̄). Hence, M_P(v) ∈ δ(r_{u′}, a). Thus, M_P(v) ∈ δ(M_i, a)_DNF and M_P(v) ∈ δ(M1 ∨ ... ∨ Mk, a)_DNF. So, for every monomial M_P(v) ∈ δ(r_u, a), we have M_P(v) ∈ δ(M1 ∨ ... ∨ Mk, a)_DNF and thus M_P(v) ∈ δ(Q0, ua)_DNF. Hence, if ⟦δ(r_u, aω′)⟧[ϵ] = +, this directly implies ⟦δ(M1 ∨ ... ∨ Mk, aω′)⟧[ϵ] = +. But ⟦δ(M1 ∨ ... ∨ Mk, aω′)⟧[ϵ] = ⟦δ(δ(Q0, u), ω)⟧[ϵ] = −. Hence, this is a contradiction and no such ω exists.

For NFAs and UFAs, the reverse inclusion between the two languages in the statement of Lemma 5 also holds in the case of compatibility. Angluin, Eisenstat, and Fisman (2015) conjectured that this is the case for AFAs as well, since extensive tests of their algorithm AL* never produced a non-residual AFA. With the help of specially developed software that simulates and visualizes the run of AL* interactively, we have been able to construct a counterexample.

Lemma 6. There exists a regular language L for which the algorithm AL* constructs a table T defining a compatible AFA A_P(T), with L(A_P(T)) = L, such that for some r ∈ P and all ω ∈ Σ* the residual language ω⁻¹L is not contained in L(A_P^r(T)).

Figure 2: A non-residual AFA constructed by AL* (diagram not reproduced; among its states are the initial state s, an accepting state f, and a state labeled nr). The initial configuration is Q0 = s and the set of accepting states is F = {f}. A filled square indicates a conjunction of its successors.

Proof. It can be shown that the AFA in Fig. 2 is compatible with a table T that can be constructed by AL* on a carefully designed language L.
The state labeled nr is not residual.

4.2 Learning Residual Alternating Automata

Let L be a given regular language. In order to construct only residual AFAs for L, we build on AL* and design a new algorithm AL**, presented as Algorithm 1, that solves this problem. The main obstacle one encounters is the test of residuality of the constructed automaton. We use the power of the equivalence oracle to incorporate this task into AL**, by reducing it to a single equivalence query on a larger automaton. Let Suffs(w) denote the set of all suffixes of a string w.

We start the analysis of AL** with the following observation, which guarantees that the automata constructed successively from tables T are well defined.

Lemma 7. In the AL** algorithm, the table T is always consistent.

The main difference between AL* and AL** lies in the construction of the automaton A_{P′}(T) in line 12. This modification of AL* allows us to guarantee the residuality of the generated automaton. As shown in the previous section, the reason for the possible non-residuality of the automaton produced by AL* is that the reverse statement of Lemma 5 does not hold for AFAs. As we perform no basis reduction in the construction of A_{P′}(T), compatibility of the table and the automaton guarantees residuality of the automaton.

Lemma 8. If the AFA A_{P′}(T) constructed in line 12 is compatible with T, then A_{P′}(T) is residual.

Proof. Consider some u ∈ U.
As P′ = Rows_high(T), we have r_u ∈ P′ and thus L(A_{P′}^{r_u}(T)) ⊆ u⁻¹L(A_{P′}(T)) by Lemma 5. It remains to prove the inclusion in the other direction. Iterating over the length of u, one can show that for every configuration of the AFA, δ(Q0, u) ≡ r_u ∧ R_u, where R_u is some expression. By construction, every monomial of Q0 = b_{P′}(rϵ) contains rϵ. Therefore, Q0 ≡ rϵ ∧ R_ϵ for some expression R_ϵ. Hence, δ(Q0, ϵ) = Q0 ≡ rϵ ∧ R_ϵ. As U is prefix-closed, every prefix of u is also in U. If u = u′a, every monomial of δ(r_{u′}, a) contains r_{u′a} = r_u ∈ P′. Therefore, δ(r_{u′}, a) ≡ r_u ∧ R′_u, where R′_u is an expression. We thus have δ(Q0, u) = δ(δ(Q0, u′), a) ≡ δ(r_{u′} ∧ R_{u′}, a) ≡ (r_u ∧ R′_u) ∧ δ(R_{u′}, a) ≡ r_u ∧ R_u for some expression R_u. Since every configuration reached on u is thus a conjunction containing r_u, acceptance of uw implies acceptance of w from r_u. Therefore, u⁻¹L(A_{P′}(T)) ⊆ L(A_{P′}^{r_u}(T)).

Algorithm 1: AL** for the target language L.

 1:  U ← {ϵ}; V ← {ϵ};
 2:  initialize T = (T, U, V) with |Σ| + 1 membership queries;
 3:  while true do
 4:      P ← Rows_high(T);
 5:      while T is not P-closed do
 6:          find a row r_{ua} ∈ Rows(T) with r_{ua} ∉ ⟦F(P)⟧;
 7:          add ua to U;
 8:          complete T via membership queries;
 9:          P ← Rows_high(T);
10:      construct a minimal basis P and the automaton A_P(T);
11:      if L(A_P(T)) = L then
12:          construct A_{P′}(T) with P′ = Rows_high(T);
13:          if L(A_{P′}(T)) = L then
14:              return A_P(T);
15:          else
16:              get counterexample w ∈ L Δ L(A_{P′}(T));
17:              V ← V ∪ Suffs(w);
18:              complete T via membership queries;
19:      else
20:          get counterexample w ∈ L Δ L(A_P(T));
21:          V ← V ∪ Suffs(w);
22:          complete T via membership queries;

Computing the large residual automaton A_{P′}(T) in line 12 from the trivial basis P′ allows us to test the smaller automaton A_P(T) for residuality via the following lemma. If A_{P′}(T) passes the equivalence test, it certifies the residuality of A_P(T). Otherwise, our construction directly gives us a counterexample that helps A_P(T) to pass the equivalence test the next time.

Lemma 9. If the two AFAs A_P(T) and A_{P′}(T) constructed in line 10, resp. line 12, satisfy L(A_P(T)) = L = L(A_{P′}(T)), then A_P(T) is residual.

Proof. Assume L(A_P(T)) = L = L(A_{P′}(T)). Lemma 8 states that A_{P′}(T) is residual.
Consider a state q = r_u of A_P(T) with corresponding state q′ of A_{P′}(T). As r_u ∈ P ⊆ Rows_high(T) = P′, there always is such a corresponding state. Let a ∈ Σ be any alphabet symbol. For every monomial M′ ∈ δ(q′, a), there is a monomial M ∈ δ(q, a) such that every literal of M is in M′ (for the corresponding v we have M = M_P(v) and M′ = M_{P′}(v), and M_{P′}(v) may contain states not in P). Hence, for every w, ⟦δ(q′, w)⟧ = ⊤ implies ⟦δ(q, w)⟧ = ⊤. We know u⁻¹L = u⁻¹L(A_{P′}(T)) ⊆ L(A_{P′}^{q′}(T)) from Lemma 8. We also know L(A_P^q(T)) ⊆ u⁻¹L(A_P(T)) = u⁻¹L from Lemma 5. So we have u⁻¹L ⊆ L(A_{P′}^{q′}(T)) ⊆ L(A_P^q(T)) ⊆ u⁻¹L. Thus, u⁻¹L = u⁻¹L(A_P(T)) = L(A_P^q(T)), and the automaton A_P(T) is also residual.

For a language L, the reverse of L contains all strings a1...aℓ ∈ Σ* such that aℓ...a1 is in L. Now we are ready to give the main result of this section.

Theorem 2. For any given regular language L, the algorithm AL** always generates an RAFA A_P such that L(A_P) = L. Moreover, if the basis P constructed in the run of AL** is of minimal size, A_P has the minimal number of states over all RAFAs for L. The algorithm terminates after at most κ_L equivalence queries and κ_L · κ̂_L · (1 + |Σ|) · ℓ membership queries, where κ_L and κ̂_L denote the number of states of the minimal DFA for L, resp. for the reverse of L, and ℓ is the length of the longest counterexample obtained from the equivalence oracle.

4.3 Approximating the Minimum Basis

Assume T = (T, U, V) is a table for a regular language. Note that algorithm AL** constructs a minimal basis P (of Rows_high(T)), because computing a minimum basis (i.e. one of minimal cardinality) is NP-hard, as shown by Angluin, Eisenstat, and Fisman (2015). In order to guarantee that the basis (and hence the set of states) used by the algorithm is small enough, we give an approximation algorithm for this problem. In the optimization problem MIN-SET-COVER, one is given a ground set X and a set S of subsets of X, and one searches for the smallest S′ ⊆ S with ∪_{s ∈ S′} s = X (see e.g. (Williamson and Shmoys 2011)).
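The MIN-SET-COVER problem just defined admits a simple greedy approximation, which is the one used later in this subsection: in each round, pick the set covering the most still-uncovered elements. A minimal sketch (our own code and names, for the generic problem rather than the table-specific reduction):

```python
def greedy_set_cover(X, S):
    """Greedy approximation for MIN-SET-COVER: repeatedly pick the set
    covering the most still-uncovered elements. The returned cover has
    size at most (ln|X| + 1) times the optimum."""
    uncovered = set(X)
    cover = []
    while uncovered:
        best = max(S, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("instance has no cover")
        cover.append(best)
        uncovered -= best
    return cover
```

For example, on X = {1, 2, 3, 4, 5} with S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}], the greedy rule first picks {1, 2, 3} and then {4, 5}, yielding a cover of size 2.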
If M_P := {⟦M_P(v)⟧ | v ∈ V} for P ⊆ Rows_high(T), we obtain the following lemma.

Lemma 10. For any P it is true that M_{Rows_high(T)} = M_P iff P is a basis of Rows_high(T).

We now reduce the problem of finding a basis of Rows_high(T) to the problem of finding a solution to a SET-COVER instance.

Lemma 11. Let X = {(v, i) | v, i ∈ V, ⟦M_{Rows_high(T)}(v)⟧[i] = −} be the ground set and S = {m_u | u ∈ U}, with subsets m_u = {(v, i) ∈ X | r_u ∈ M_{Rows_high(T)}(v) and r_u[i] = −}, be an instance of SET-COVER. The set P is a basis of Rows_high(T) iff there exists a feasible solution C of the set cover instance above such that P = {r_u | m_u ∈ C}.

Proof. Every vector of M_P can be composed from the vectors of P by intersection, so requiring these compositions does not increase P. Now we apply the lemma above.

We can now use the well-known algorithm for the optimization problem MIN-SET-COVER due to Johnson (1974) that, on input (X, S), produces in polynomial time a feasible solution S′ ⊆ S with |S′| ≤ (ln(|X|) + 1)|S*|, where S* is an optimal solution to the instance. We obtain the following result.

Theorem 3. There exists a polynomial-time algorithm that, for a given table T = (T, U, V), returns a basis P of Rows_high(T) with |P| ≤ (2 ln(|V|) + 1)|P*|, where P* is a minimum basis of Rows_high(T).

Figure 3: The residual AFA for the language A_n of Theorem 4 with n = 2 (diagram not reproduced). The corresponding alphabet is Σ_n = Σ_a ∪ Σ_b with Σ_a = {a1, a2} and Σ_b = {b1, b2}, the initial configuration is Q0 = p1 ∧ p2, and the set of accepting states is F = {q1, q2}.

5 On the Size of Residual AFAs

In (Angluin, Eisenstat, and Fisman 2015) it was shown that RAFAs may be exponentially more succinct than RNFAs and RUFAs, and double-exponentially more succinct than DFAs. We strengthen these results by proving that RAFAs may be exponentially more succinct than every equivalent non-residual NFA or UFA. Furthermore, there exists an RAFA that is double-exponentially more succinct than the minimal DFA and uses only 2 non-deterministic (i.e.
∨) transitions and only a linear number of universal (i.e. ∧) transitions. Thus, the restriction to residual automata still allows a very compact representation. On the other hand, we give an example where the residuality of an automaton demands an exponentially larger state set.

Theorem 4. For every even n ∈ N there exists a language A_n that can be accepted by a residual AFA with 2n + 1 states, while every NFA or UFA for A_n needs at least (n choose n/2) states.

Proof Sketch. The alphabet Σ_n for A_n consists of the disjoint subsets Σ_a = {a1, a2, ..., an} and Σ_b = {b1, b2, ..., bn}. Let A_n = {w1w2 | w1 ∈ Σ_a*, w2 ∈ Σ_n*, w1 contains all symbols from Σ_a, and w2 does not contain all symbols from Σ_b}. We construct a residual AFA with states {p1, ..., pn, q1, ..., qn, x}, sketched for n = 2 in Fig. 3. To prove the second property, we show that every NFA has to remember all (n choose n/2) subsets of size n/2 of Σ_a, while every UFA needs to remember all (n choose n/2) subsets of size n/2 of Σ_b.

Figure 4: The (non-residual) AFA for the language B_n of Theorem 5 with n = 2 (diagram not reproduced).

Theorem 5. For every n ∈ N there exists a language B_n over a binary alphabet that can be accepted by a (non-residual) AFA with 2n + 2 states, but every residual AFA for B_n requires at least 2^n states.

One can construct the succinct AFAs of Theorem 5 as follows. Let Σ = {a, b} and consider B_n = {ww′ | w ∈ Σ^n, w is a prefix of w′} (based on the construction of (Vardi 1995)). For n = 2, the non-residual AFA A = (Q, Q0, δ, F) for B_n is sketched in Fig. 4. A closer look at the constructions of succinct automata for B_n reveals that the resulting AFAs are in fact UFAs. Dually, the complement B̄_n = Σ* \ B_n can be accepted by an NFA with the same number of states, 2n + 2. Thus, we obtain families of languages B_n and B̄_n, n = 1, 2, ..., such that every residual AFA for B_n, resp. B̄_n, is exponentially larger than the corresponding minimal UFA, resp. NFA.
As already noted in (Angluin, Eisenstat, and Fisman 2015), RAFAs may be double-exponentially smaller than the minimal DFAs. We give a more precise bound, inspired by a language defined by Chandra, Kozen, and Stockmeyer (1981).

Theorem 6. For every n ∈ N there exists a language C_n such that the minimal DFA for C_n needs at least 2^(2^n) states and there is a residual AFA with 2n² + 5n states for C_n.

The table below summarizes the results presented in this section. Here "A1 → A2: k1(n) vs. k2(n)" has the following meaning: for every n there exists a language L_n with a k1(n)-state automaton of type A1 for L_n, and every automaton of type A2 for L_n needs at least k2(n) states.

  RAFA → NFA/UFA:   2n + 1     vs.  (n choose n/2)
  NFA/UFA → RAFA:   2n + 2     vs.  2^n
  RAFA → DFA:       2n² + 5n   vs.  2^(2^n)

6 Discussion

We have disproved the conjecture that the algorithm AL* outputs residual AFAs only, and have designed a modified algorithm AL** that achieves this property. This algorithm has almost the same complexity as AL*. In fact, as all automata produced by AL* in the experiments in (Angluin, Eisenstat, and Fisman 2015) were residual, we expect that our new algorithm AL** typically performs only a single additional equivalence query, to verify residuality, compared to AL*. Thus, based on the performance experiments reported for randomly generated automata and regular expressions, AL** outperforms the algorithms L* and NL* w.r.t. the number of membership queries. Simultaneously, AL** infers an (approximately minimal) RAFA which is always smaller than (or equal to) the corresponding minimal DFA generated by L* and the RNFA produced by NL*. Typically, AL** generates automata that are significantly more succinct than DFAs and RNFAs. Theoretical analysis shows that residual AFAs can be exponentially smaller than NFAs and even double-exponentially more succinct than DFAs. This makes RAFAs an attractive choice for language representations in the design of learning algorithms.
While residual non-deterministic automata are quite well understood (Denis, Lemay, and Terlutte 2001; 2004; Bollig et al. 2009; Kasprzik 2010), fundamental questions concerning residual alternating automata remain open. In this paper we introduced a notion complementary to the canonical RNFA: the canonical RUFA. Recently, we have exhibited languages for which the canonical RNFA and RUFA differ, but both automata are minimal AFAs. Thus, a meaningful notion of canonical AFAs would be desirable, but this seems to be a difficult problem, which we leave for future work.

References

Angluin, D.; Eisenstat, S.; and Fisman, D. 2015. Learning regular languages via alternating automata. In Proc. 24th IJCAI, 3308–3314.

Angluin, D. 1987. Learning regular sets from queries and counterexamples. Information and Computation 75(2):87–106.

Bollig, B.; Habermehl, P.; Kern, C.; and Leucker, M. 2009. Angluin-style learning of NFA. In Proc. 21st IJCAI, 1004–1009.

Chandra, A. K.; Kozen, D. C.; and Stockmeyer, L. J. 1981. Alternation. J. ACM 28(1):114–133.

De la Higuera, C. 2010. Grammatical Inference: Learning Automata and Grammars. Cambridge University Press.

Denis, F.; Lemay, A.; and Terlutte, A. 2001. Residual finite state automata. In Proc. 18th STACS, LNCS 2010, 144–157. Springer.

Denis, F.; Lemay, A.; and Terlutte, A. 2004. Learning regular languages using RFSAs. Theoretical Computer Science 313(2):267–294.

Johnson, D. S. 1974. Approximation algorithms for combinatorial problems. J. Comput. Syst. Sci. 9(3):256–278.

Kasprzik, A. 2010. Learning residual finite-state automata using observation tables. In Proc. 12th DCFS, 205–212.

Vardi, M. Y. 1995. An automata-theoretic approach to linear temporal logic. In Logics for Concurrency: Structure versus Automata, LNCS 1043, 238–266. Springer.

Williamson, D. P., and Shmoys, D. B. 2011. The Design of Approximation Algorithms. Cambridge University Press.