
Anirban Majumdar, Université Libre de Bruxelles, Belgium, anirban.majumdar@ulb.be, https://orcid.org/0000-0003-4793-1892 (funded by a grant from Fondation ULB)
Sayan Mukherjee, Université Libre de Bruxelles, Belgium, sayan.mukherjee@ulb.be, https://orcid.org/0000-0001-6473-3172 (funded by a grant from Fondation ULB)
Jean-François Raskin, Université Libre de Bruxelles, Belgium, jean-francois.raskin@ulb.be, https://orcid.org/0000-0002-3673-1097 (receives support from the Fondation ULB)

ACM subject classification: Theory of computation – Timed and hybrid models
Supplementary material (Source Code): https://github.com/mukherjee-sayan/ERA-greybox-learn
ATVA 2024 Artifact Evaluation approved artifact: https://doi.org/10.5281/zenodo.12684215

Greybox Learning of Languages Recognizable by Event-Recording Automata

Anirban Majumdar    Sayan Mukherjee    Jean-François Raskin
Abstract

In this paper, we revisit the active learning of timed languages recognizable by event-recording automata. Our framework employs a method known as greybox learning, which enables the learning of event-recording automata with a minimal number of control states. This approach avoids learning the region automaton associated with the language, contrasting with existing methods. We have implemented our greybox learning algorithm with various heuristics to maintain low computational complexity. The efficacy of our approach is demonstrated through several examples.

keywords:
Automata learning, Greybox learning, Event-recording automata

1 Introduction

Formal modeling of complex systems is essential in fields such as requirements engineering and computer-aided verification. However, the task of writing formal models is both laborious and susceptible to errors. This process can be facilitated by learning algorithms [21]. For example, if a system can be represented as a finite state automaton, and thus if the system's set of behaviors constitutes a regular language, there exist thoroughly understood passive and active learning algorithms capable of learning the minimal DFA of the underlying language (minimal DFAs are a canonical representation of regular languages and are closely tied to the Myhill-Nerode equivalence relation). We call this problem the specification discovery problem in the sequel. In the realm of active learning, the $L^*$ algorithm [5], pioneered by Angluin, is a celebrated approach that employs a protocol involving membership and equivalence queries, enabling an efficient learning process for regular languages.

In this paper, we introduce an active learning framework designed to acquire timed languages that can be recognized by event-recording automata (ERA, for short) [3]. The class of ERA enjoys several nice properties over the more general class of Timed Automata (TA) [2]. First, ERA are determinizable and hence complementable, unlike TA, making them particularly well suited for verification, as inclusion checking is decidable for this class. Second, in ERA, clocks are explicitly associated to events and implicitly reset whenever their associated event occurs. This property leads to automata that are usually much easier to interpret than classical TA, where clocks are not tied to events. This makes the class of ERA a very interesting candidate for learning when interpretability is of concern.

Adapting the $L^*$ algorithm for ERA is feasible by employing a canonical form known as simple deterministic event-recording automata (SDERA). These SDERAs, introduced in [12] specifically for this application, are essentially the underlying region automaton of the timed language. Although the region automaton fulfills all the necessary criteria for the $L^*$ algorithm, its size is generally excessive. Typically, the region automaton is significantly larger than a deterministic ERA (DERA, for short) with the minimum number of locations for a given timed language (and it is known that it can be exponentially larger). This discrepancy arises because the region automaton explicitly incorporates the semantics of clocks in its states, whereas in a DERA, the semantics of clocks is used implicitly. The difficulty of working directly with DERA stems from the fact that there is no natural canonical form for DERA, and usually there are several DERA with the minimum number of states that recognize the same timed language. To overcome this difficulty, we propose in this paper an innovative application of the greybox learning framework, see for example [1], and demonstrate that it is possible to learn an automaton whose number of states matches that of a minimum DERA for the underlying timed language.

In our approach, rather than learning the region automaton of the timed language, we establish an active learning setup in which the semantics of the regions are already known to the learning algorithm, thus eliminating the need for the algorithm to learn the semantics of regions. Our greybox learning algorithm is underpinned by the following concept:

$$\mathsf{RW}_K(L) = L(C) \cap \mathsf{RegL}(\Sigma, K) \qquad (1)$$

Here, $\mathsf{RW}_K(L)$ is a set of consistent region words that exactly represents the target timed language $L$. Meanwhile, $\mathsf{RegL}(\Sigma,K)$ denotes the regular language of consistent region words over the alphabet $\Sigma$ and a maximal constant $K$. This language is precisely defined once $\Sigma$ and $K$ are fixed and remains independent of the timed language to be learned; therefore, it does not require learning. Essentially, deciding if a region word belongs to $\mathsf{RegL}(\Sigma,K)$ or not reduces to solving the following "consistency problem": given a region word, are there concrete timed words consistent with this (symbolic) region word? The answer to this question does not depend on the timed language to be learned but only on the semantics of clocks. This reduces the consistency problem to the problem of solving a linear program, which is known to be solvable efficiently.
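To make the reduction concrete, here is a minimal sketch of such a consistency check in Python, using an off-the-shelf LP solver. The encoding of region words (each position pairs a letter with a map from clocks to atomic constraints, a clock being identified with its letter) and all names are our own illustrative choices, and strict inequalities are approximated with a small slack, as is usual when handing them to an LP solver.

from scipy.optimize import linprog

EPS = 1e-6  # slack used to approximate strict inequalities in the LP

def is_consistent(region_word, K):
    """Decide whether some timed word is compatible with the region word.

    region_word: list of (letter, constraint) pairs, where constraint maps a
    letter s (standing for the clock x_s) to ('eq', c), ('open', d) meaning
    the interval (d, d+1), or ('gt', K) meaning x_s > K.
    The LP variables are the timestamps t_1, ..., t_n (with t_0 = 0).
    """
    n = len(region_word)
    A_ub, b_ub, A_eq, b_eq = [], [], [], []

    def diff(i, j):  # coefficient row encoding t_i - t_j (j = 0 is the origin)
        row = [0.0] * n
        row[i - 1] += 1.0
        if j > 0:
            row[j - 1] -= 1.0
        return row

    last = {}  # last position at which each letter occurred
    for i, (letter, constraint) in enumerate(region_word, start=1):
        if i > 1:  # time monotonicity: t_{i-1} <= t_i
            A_ub.append([a - b for a, b in zip(diff(i - 1, 0), diff(i, 0))])
            b_ub.append(0.0)
        for s, atom in constraint.items():
            j = last.get(s, 0)  # value of clock x_s at position i is t_i - t_j
            if atom[0] == 'eq':       # t_i - t_j = c
                A_eq.append(diff(i, j)); b_eq.append(float(atom[1]))
            elif atom[0] == 'open':   # d < t_i - t_j < d + 1
                A_ub.append(diff(i, j)); b_ub.append(atom[1] + 1 - EPS)
                A_ub.append([-x for x in diff(i, j)]); b_ub.append(-(atom[1] + EPS))
            else:                     # t_i - t_j > K
                A_ub.append([-x for x in diff(i, j)]); b_ub.append(-(K + EPS))
        last[letter] = i

    result = linprog(c=[0.0] * n, A_ub=A_ub or None, b_ub=b_ub or None,
                     A_eq=A_eq or None, b_eq=b_eq or None,
                     bounds=[(0, None)] * n)
    return result.success

For instance, the region word $(a,\ x_a=0\wedge x_b=1)(b,\ x_a=1\wedge x_b=0)$ (see Example 2.3 below) is reported inconsistent: the constraints $t_1=0$ and $t_1=1$ induced by the first position already conflict.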

The target of our greybox learning algorithm is a DERA $C$ recognizing the timed language $L$. We will demonstrate that if a DERA with $n$ states can accept the timed language to be learned, then the DERA $C$ returned by our algorithm has at most $n$ states. Here is an informal summary of our learning protocol: within the context of the $L^*$ vocabulary, we assume that the Teacher is capable of responding to membership queries about consistent region words. Additionally, the Teacher handles inclusion queries (instead of the equivalence queries of $L^*$) by determining whether a timed language is contained in another. If a discrepancy exists, the Teacher provides a counter-example in the form of a region word.

This learning framework can be applied to the specification discovery problem presented in the motivations above: we assume that the requirements engineer has a specific timed language in mind for which he wants to construct a DERA, but that he is not able to write this DERA formally on his own. However, the requirements engineer is capable of answering membership and inclusion queries related to this language. Since the engineer cannot directly draft a DERA that represents the timed language, it is crucial that the queries posed are semantic, focusing on the language itself rather than on a specific automaton representing the language. This distinction ensures that the questions are appropriate and manageable given the assumptions about the engineer's capabilities. This is in contrast with [15], where queries are syntactic and answered with respect to a specific DERA known by the Teacher. We will give more details about this comparison in Section 3.

Technical contribution.

First, we define a greybox learning algorithm to derive a DERA with the minimal number of states that recognizes a target timed language. To achieve the learning of a minimal automaton, we must solve an NP-hard optimization problem: computing a minimal finite state automaton that separates the regular languages of positive and negative examples derived from membership and equivalence queries [11]. For that, we adapt the approach of Chen et al. [7] to the timed setting. A similar framework has also been studied using SMT-based techniques by Moeller et al. in [16]. Second, we explore methods to circumvent this NP-hard optimization problem when the requirement for minimality is loosened. We introduce a heuristic that computes a candidate automaton in polynomial time using a greedy algorithm. This algorithm strives to keep the automaton small but does not guarantee the discovery of the absolute minimal solution; it is nevertheless guaranteed to terminate with a DERA that accepts the input timed language. We have implemented this heuristic and demonstrate with our prototype that, in practice, it yields compact automata.

Related works.

Grinchtein et al. in [12] also present an active learning framework for ERA-recognizable languages. However, their learning algorithm generates simple DERAs – these are a canonical form for ERA-recognizable languages, closely aligned with the underlying region automaton. In a simple DERA, transitions are labeled with region constraints, and notably, every path leading to an accepting state is satisfiable. This means the sequence of regions is consistent (as in the region automaton), and any two path prefixes reaching the same state are region equivalent. Typically, this results in a prohibitively large number of states, as the automaton maintains the semantics of regions explicitly. In contrast, our approach, while also labeling automaton edges with region constraints, permits the merging of prefixes that are not region equivalent. Enabled by Equation 1, this allows for a more compact representation of ERA languages (we generate DERA with a minimal number of states). While our algorithm has been implemented, there is, to the best of our knowledge, no implementation of Grinchtein et al.'s algorithm, most likely due to the high complexity of their approach.

Lin et al. in [15] propose another active-learning-style algorithm for inferring ERA, but with zones (coarser constraints than regions) as guards on the transitions. This allows them to infer smaller automata than [12]. However, it assumes that the Teacher is familiar with a specific DERA for the underlying language and is capable of answering syntactic queries about this DERA. In fact, their learning algorithm learns the regular language of words over the alphabet composed of events and zones of the specific DERA known to the Teacher. Unlike this approach, our learning algorithm does not presuppose a known and fixed DERA but only relies on queries that are semantic rather than syntactic. Although the work of [15] may seem superficially similar to ours, we dedicate an entire paragraph in Section 3 to a formal and thorough comparison between the two approaches.

Recently, Masaki Waga developed an algorithm [24] for learning the class of Deterministic Timed Automata (DTA), which includes the ERA-recognizable languages. Waga's method resets a clock on every transition and utilizes region-based guards, often producing complex automata. Consequently, this algorithm does not guarantee the minimality of the resulting automaton, but merely ensures that it accepts the specified timed language. In contrast, our approach guarantees learning of automata with the minimal number of states, featuring implicit clock resets that enhance readability. In the experimental section, we detail a performance comparison of our tool against their tool, LearnTA, across various examples. Further, in [20], the authors extend Waga's approach to reduce the number of clocks in learned automata, though they also do not achieve minimality. Their method also requires a significantly higher number of queries to implement these improvements.

Several other works have proposed learning methods for different models of timed languages, such as one-clock Timed Automata [25], Real-time Automata [4], and Mealy machines with timers [22, 6]. There have also been approaches other than active learning for such models, for example, genetic programming [19] and passive learning [23, 8].

2 Preliminaries and notations

DFA and 3DFA.

Let $\Sigma=\{\sigma_1,\sigma_2,\dots,\sigma_k\}$ be a finite alphabet. A deterministic finite state automaton (DFA) over $\Sigma$ is a tuple $A=(Q,q_{\mathsf{init}},\Sigma,\delta,F)$ where $Q$ is a finite set of states, $q_{\mathsf{init}}\in Q$ is the initial state, $\delta\colon Q\times\Sigma\rightarrow Q$ is a partial transition function, and $F\subseteq Q$ is the set of final (accepting) states. When $\delta$ is a total function, we say that $A$ is total. A DFA $A=(Q,q_{\mathsf{init}},\Sigma,\delta,F)$ defines a regular language, noted $L(A)$, which is a subset of $\Sigma^*$ and which contains the set of words $w=\sigma_1\sigma_2\dots\sigma_n$ such that there exists an accepting run of $A$ over $w$, i.e., there exists a sequence of states $q_0q_1\dots q_n$ such that $q_0=q_{\mathsf{init}}$, for all $i$, $1\leq i\leq n$, $\delta(q_{i-1},\sigma_i)=q_i$, and $q_n\in F$. We then say that the word $w$ is accepted by $A$. If such an accepting run does not exist, we say that $A$ rejects the word $w$.

A 3DFA $D$ over $\Sigma$ is a tuple $(Q,q_{\mathsf{init}},\Sigma,\delta,\mathsf{A},\mathsf{R},\mathsf{E})$ where $Q$ is a finite set of states, $q_{\mathsf{init}}\in Q$ is the initial state, $\delta\colon Q\times\Sigma\rightarrow Q$ is a total transition function, and $\mathsf{A}\uplus\mathsf{R}\uplus\mathsf{E}$ forms a partition of $Q$ into a set of accepting states $\mathsf{A}$, a set of rejecting states $\mathsf{R}$, and a set of 'don't care' states $\mathsf{E}$. The 3DFA $D$ defines the function $D\colon\Sigma^*\rightarrow\{0,1,?\}$ as follows: for $w=\sigma_1\sigma_2\dots\sigma_n$, let $q_0q_1\dots q_n$ be the unique run of $D$ on $w$. If $q_n\in\mathsf{A}$ (resp. $q_n\in\mathsf{R}$), then $D(w)=1$ (resp. $D(w)=0$) and we say that $w$ is strongly accepted (resp. strongly rejected) by $D$; if $q_n\in\mathsf{E}$, then $D(w)={?}$. We interpret "$?$" as "either way", i.e., the word can be either accepted or rejected. We say that a regular language $L$ is consistent with a 3DFA $D$, noted $L\models D$, if for all $w\in\Sigma^*$, if $D(w)=0$ then $w\notin L$, and if $D(w)=1$ then $w\in L$. Accordingly, we define $D^-$ as the DFA obtained from $D$ where the partition $\mathsf{A}\uplus\mathsf{R}\uplus\mathsf{E}$ is replaced by $F=\mathsf{R}$; clearly $L(D^-)=\{w\in\Sigma^*\mid D(w)=0\}$, the set of words that are strongly rejected by $D$. Similarly, we define $D^+$ as the DFA obtained from $D$ where the partition $\mathsf{A}\uplus\mathsf{R}\uplus\mathsf{E}$ is replaced by $F=\mathsf{A}$; then $L(D^+)=\{w\in\Sigma^*\mid D(w)=1\}$, the set of words that are strongly accepted by $D$. Now, it is easy to see that $L\models D$ iff $L(D^+)\subseteq L$ and $L(D^-)\subseteq\overline{L}$.
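As a small illustration, the following Python sketch (class and field names are ours) encodes a 3DFA together with the derived DFAs $D^+$ and $D^-$, and checks the consistency condition $L\models D$ on a finite sample of words.

from dataclasses import dataclass

@dataclass
class ThreeDFA:
    states: set
    init: object
    delta: dict        # total transition map: (state, letter) -> state
    accept: set        # A: leads to strong acceptance
    reject: set        # R: leads to strong rejection (E = states - A - R)

    def classify(self, word):
        q = self.init
        for a in word:
            q = self.delta[(q, a)]
        return 1 if q in self.accept else 0 if q in self.reject else '?'

    def d_plus(self):   # DFA with F = A; accepts {w | D(w) = 1}
        return (self.states, self.init, self.delta, self.accept)

    def d_minus(self):  # DFA with F = R; accepts {w | D(w) = 0}
        return (self.states, self.init, self.delta, self.reject)

# L |= D iff strongly accepted words are in L and strongly rejected words
# are not; on a finite sample of words this reads as:
def consistent_on_sample(D, sample, in_L):
    return all((D.classify(w) != 1 or in_L(w)) and
               (D.classify(w) != 0 or not in_L(w)) for w in sample)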

Timed words and timed languages.

A timed word over an alphabet $\Sigma$ is a finite sequence $(\sigma_1,t_1)(\sigma_2,t_2)\dots(\sigma_n,t_n)$ where each $\sigma_i\in\Sigma$ and $t_i\in\mathbb{R}_{\geq 0}$ for all $1\leq i\leq n$, and $t_i\leq t_j$ for all $1\leq i<j\leq n$ (time is monotonically increasing). We use $\mathsf{TW}(\Sigma)$ to denote the set of all timed words over the alphabet $\Sigma$.

A timed language is a (possibly infinite) set of timed words. In the existing literature, a timed language is called timed regular when there exists a timed automaton 'recognizing' the language. Timed Automata (TA) [2] extend deterministic finite-state automata with clocks. In what follows, we will use a subclass of TA where clocks have a pre-assigned semantics and are not reset arbitrarily. This class is known as the class of Event-recording Automata (ERA) [3]. We now introduce the necessary vocabulary and notations for their definition.

Constraints.

A clock is a non-negative real-valued variable, that is, a variable ranging over $\mathbb{R}_{\geq 0}$. Let $K$ be a positive integer. An atomic $K$-constraint over a clock $x$ is an expression of the form $x=c$, $x\in(c,d)$ or $x>K$ where $c,d\in\mathbb{N}\cap[0,K]$. A $K$-constraint over a set of clocks $X$ is a conjunction of atomic $K$-constraints over clocks in $X$. Note that $K$-constraints are closely related to the notion of zones [10] from the TA literature. However, zones also allow constraints on the difference between two clocks, which are not allowed in $K$-constraints.

An elementary $K$-constraint over a clock $x$ is an atomic $K$-constraint where the intervals are restricted to unit intervals; more formally, it is an expression of the form $x=c$, $x\in(d,d+1)$ or $x>K$ where $c,d,d+1\in\mathbb{N}\cap[0,K]$. A simple $K$-constraint over $X$ is a conjunction of elementary $K$-constraints over clocks in $X$, where each variable $x\in X$ appears in exactly one conjunct. The definition of simple constraints also appears in earlier works, for example [12]. One can again note that simple constraints relate closely to the classical notion of regions [2]. However, again, regions also take differences between two clocks into account, which simple constraints do not. Interestingly, we do not need the granularity that regions offer (by comparing clocks) for our purpose, because a sequence of simple constraints induces a unique region in the classical sense (see Lemma 2.6).

Satisfaction of constraints.

A valuation for a set of clocks $X$ is a function $v\colon X\rightarrow\mathbb{R}_{\geq 0}$. A valuation $v$ for $X$ satisfies an atomic $K$-constraint $\psi$, written $v\models\psi$, if: $v(x)=c$ when $\psi:=x=c$; $v(x)\in(c,d)$ when $\psi:=x\in(c,d)$; and $v(x)>K$ when $\psi:=x>K$.

A valuation $v$ satisfies a $K$-constraint over $X$ if $v$ satisfies all its conjuncts. Given a $K$-constraint $\psi$, we denote by $[\![\psi]\!]$ the set of all valuations $v$ such that $v\models\psi$. We say that a simple constraint $r$ satisfies an elementary constraint $\psi$, noted $r\models\psi$, if $[\![r]\!]\subseteq[\![\psi]\!]$. It is easy to verify that for any simple constraint $r$ and any elementary constraint $\psi$, either $[\![r]\!]\cap[\![\psi]\!]=\emptyset$ or $[\![r]\!]\subseteq[\![\psi]\!]$.

Let $\Sigma=\{\sigma_1,\sigma_2,\dots,\sigma_k\}$ be a finite alphabet. The set of event-recording clocks associated to $\Sigma$ is denoted by $X_\Sigma=\{x_\sigma\mid\sigma\in\Sigma\}$. We denote by $\mathsf{C}(\Sigma,K)$ the set of all $K$-constraints over the set of clocks $X_\Sigma$ and we use $\mathsf{Reg}(\Sigma,K)$ to denote the set of all simple $K$-constraints over the clocks in $X_\Sigma$. For simple $K$-constraints we adopt the notation $\mathsf{Reg}$ since, as remarked earlier, these constraints relate closely to the well-known notion of regions. Since simple $K$-constraints are also $K$-constraints, we have $\mathsf{Reg}(\Sigma,K)\subset\mathsf{C}(\Sigma,K)$.

Definition 2.1 (ERA).

A $K$-Event-Recording Automaton ($K$-ERA) $A$ is a tuple $(Q,q_{\mathsf{init}},\Sigma,E,F)$ where $Q$ is a finite set of states, $q_{\mathsf{init}}\in Q$ is the initial state, $\Sigma$ is a finite alphabet, $E\subseteq Q\times\Sigma\times\mathsf{C}(\Sigma,K)\times Q$ is the set of transitions, and $F\subseteq Q$ is the subset of accepting states. Each transition in $A$ is a tuple $(q,\sigma,g,q')$, where $q,q'\in Q$, $\sigma\in\Sigma$ and $g\in\mathsf{C}(\Sigma,K)$; $g$ is called the guard of the transition. $A$ is called a $K$-deterministic ERA ($K$-DERA) if for every state $q\in Q$ and every letter $\sigma$, if there exist two transitions $(q,\sigma,g_1,q_1)$ and $(q,\sigma,g_2,q_2)$ then $[\![g_1]\!]\cap[\![g_2]\!]=\emptyset$.

[Figure 1: A DERA with states $q_0$ (initial), $q_1$ and $q_2$, and transitions $q_0\xrightarrow{a,\ \top}q_1$, $q_1\xrightarrow{b,\ x_a=1}q_2$ and $q_2\xrightarrow{a,\ x_b\leq 1}q_1$.]
Figure 1: A DERA that accepts timed words where every $a$ is followed by a $b$ after exactly $1$ time unit and every $b$ is followed by an $a$ within $1$ time unit

For the semantics: initially, all the clocks start with the value $0$, and they all advance at the same rate. For every transition on a letter $\sigma$, once the transition is taken, the corresponding recording clock $x_\sigma$ gets reset to the value $0$.

Clocked words.

Given a timed word $\mathsf{tw}=(\sigma_1,t_1)(\sigma_2,t_2)\dots(\sigma_n,t_n)$, we associate with it a clocked word $\mathsf{cw}(\mathsf{tw})=(\sigma_1,v_1)(\sigma_2,v_2)\dots(\sigma_n,v_n)$ where each $v_i\colon X_\Sigma\rightarrow\mathbb{R}_{\geq 0}$ maps each clock of $X_\Sigma$ to a real value as follows: $v_i(x_\sigma)=t_i-t_j$ where $j=\max\{l<i\mid\sigma_l=\sigma\}$, with the convention that $\max(\emptyset)=0$ and $t_0=0$. In words, the value $v_i(x_\sigma)$ is the amount of time elapsed since the last occurrence of $\sigma$ in $\mathsf{tw}$, which is why we call the clocks $x_\sigma$ 'recording' clocks. Thus, although there are no explicit resets, each clock is implicitly reset at every occurrence of its associated event.
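The clocked word of a timed word can be computed in a single pass. The sketch below uses our own encoding, where a valuation is a dictionary keyed by letters (the letter $\sigma$ standing for the clock $x_\sigma$).

def clocked_word(timed_word, alphabet):
    """timed_word: list of (letter, timestamp) with non-decreasing timestamps."""
    last = {s: None for s in alphabet}   # timestamp of the last occurrence of s
    result = []
    for sigma, t in timed_word:
        # v(x_s) = time elapsed since the last s (since time 0 if none yet)
        v = {s: t - last[s] if last[s] is not None else t for s in alphabet}
        result.append((sigma, v))
        last[sigma] = t
    return result

# clocked_word([('a', 0.0), ('b', 1.0)], {'a', 'b'}) yields
# [('a', {'a': 0.0, 'b': 0.0}), ('b', {'a': 1.0, 'b': 1.0})].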

Timed language of a KK-ERA.

A timed word $\mathsf{tw}=(\sigma_1,t_1)(\sigma_2,t_2)\dots(\sigma_n,t_n)$, with its clocked word $\mathsf{cw}(\mathsf{tw})=(\sigma_1,v_1)(\sigma_2,v_2)\dots(\sigma_n,v_n)$, is accepted by $A$ if there exists a sequence of states $q_0q_1\dots q_n$ of $A$ such that $q_0=q_{\mathsf{init}}$, $q_n\in F$, and for all $1\leq i\leq n$, there exists $e=(q_{i-1},\sigma,\psi,q_i)\in E$ such that $\sigma_i=\sigma$ and $v_i\models\psi$. The set of all timed words accepted by $A$ is called the timed language of $A$, and will be denoted by $L^{\mathsf{tw}}(A)$. We now ask the curious reader: is it possible to construct an ERA with two (or fewer) states that accepts the timed language represented in Figure 1? We will answer this question in Section 5.

Lemma 2.2 ([3]).

For every $K$-ERA $A$, there exists a $K$-DERA $A'$ such that $L^{\mathsf{tw}}(A)=L^{\mathsf{tw}}(A')$ and there exists a $K$-DERA $\overline{A'}$ such that $\overline{L^{\mathsf{tw}}(A)}=L^{\mathsf{tw}}(\overline{A'})$.

A timed language $L$ is $K$-ERA (resp. $K$-DERA) definable if there exists a $K$-ERA (resp. $K$-DERA) $A$ such that $L^{\mathsf{tw}}(A)=L$. Now, due to Lemma 2.2, a timed language $L$ is $K$-ERA definable iff it is $K$-DERA definable.

Symbolic words

over $(\Sigma,K)$ are finite sequences $(\sigma_1,g_1)(\sigma_2,g_2)\dots(\sigma_n,g_n)$ where each $\sigma_i\in\Sigma$ and $g_i\in\mathsf{C}(\Sigma,K)$, for all $1\leq i\leq n$. Similarly, a region word over $(\Sigma,K)$ is a finite sequence $(\sigma_1,r_1)(\sigma_2,r_2)\dots(\sigma_n,r_n)$ where each $\sigma_i\in\Sigma$ and $r_i\in\mathsf{Reg}(\Sigma,K)$ is a simple $K$-constraint, for all $1\leq i\leq n$. Lemma 2.6 will justify why we refer to symbolic words over simple constraints as 'region' words.

We are now equipped to define when a timed word $\mathsf{tw}$ is compatible with a symbolic word $\mathsf{sw}$. Let $\mathsf{tw}=(\sigma_1,t_1)(\sigma_2,t_2)\dots(\sigma_n,t_n)$ be a timed word with clocked word $(\sigma_1,v_1)(\sigma_2,v_2)\dots(\sigma_n,v_n)$, and let $\mathsf{sw}=(\sigma'_1,g_1)(\sigma'_2,g_2)\dots(\sigma'_m,g_m)$ be a symbolic word. We say that $\mathsf{tw}$ is compatible with $\mathsf{sw}$, noted $\mathsf{tw}\models\mathsf{sw}$, if: $(i)$ $n=m$, i.e., both words have the same length; $(ii)$ $\sigma_i=\sigma'_i$; and $(iii)$ $v_i\models g_i$, for all $i$, $1\leq i\leq n$. We denote $[\![\mathsf{sw}]\!]=\{\mathsf{tw}\in\mathsf{TW}(\Sigma)\mid\mathsf{tw}\models\mathsf{sw}\}$. We say that a symbolic word $\mathsf{sw}$ is consistent if $[\![\mathsf{sw}]\!]\neq\emptyset$, and inconsistent otherwise. We will use $\mathsf{RegL}(\Sigma,K)$ to denote the set of all consistent region words over the set of all simple $K$-constraints $\mathsf{Reg}(\Sigma,K)$.
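Compatibility is then a pointwise check. A small sketch, reusing clocked_word from above and our earlier illustrative encoding of atomic constraints ('eq', 'open', 'in', 'gt'):

def satisfies(v, atom, K):
    """Does clock value v satisfy one atomic K-constraint?"""
    if atom[0] == 'eq':                 # x = c
        return v == atom[1]
    if atom[0] == 'open':               # x in (d, d+1)
        return atom[1] < v < atom[1] + 1
    if atom[0] == 'in':                 # x in (c, d)
        return atom[1] < v < atom[2]
    return v > K                        # x > K

def compatible(timed_word, symbolic_word, alphabet, K):
    if len(timed_word) != len(symbolic_word):
        return False                    # condition (i)
    return all(sigma == sigma2                                  # (ii)
               and all(satisfies(val[x], atom, K)               # (iii)
                       for x, atom in guard.items())
               for (sigma, val), (sigma2, guard)
               in zip(clocked_word(timed_word, alphabet), symbolic_word))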

Example 2.3.

Let $\Sigma=\{a,b\}$ be an alphabet and let $X_\Sigma=\{x_a,x_b\}$ be its set of recording clocks. The symbolic word $\mathsf{rw}_1:=(a,\ x_a=0\wedge x_b=0)(b,\ x_a=1\wedge x_b=1)$ is consistent, since the timed word $(a,0)(b,1)\models\mathsf{rw}_1$. On the other hand, consider the symbolic word $\mathsf{rw}_2:=(a,\ x_a=0\wedge x_b=1)(b,\ x_a=1\wedge x_b=0)$. Let $\mathsf{tw}=(a,t_1)(b,t_2)$ be a timed word and $\mathsf{cw}(\mathsf{tw})=(a,v_1)(b,v_2)$ be the clocked word of $\mathsf{tw}$. Then $\mathsf{tw}\models\mathsf{rw}_2$ would imply $t_1=0$ and $t_2-t_1=1$, but in that case $v_2(x_b)=1$, that is, $v_2\not\models x_a=1\wedge x_b=0$. Hence, $\mathsf{rw}_2$ is inconsistent.

Lemma 2.4.

For every timed word $\mathsf{tw}$ over an alphabet $\Sigma$ and every fixed positive integer $K$, there exists a unique region word $\mathsf{rw}\in\mathsf{RegL}(\Sigma,K)$ such that $\mathsf{tw}\in[\![\mathsf{rw}]\!]$.

Proof 2.5 (Proof sketch).

Given a timed word $\mathsf{tw}$, consider its clocked word $\mathsf{cw}(\mathsf{tw})$ (as defined under 'Clocked words' above). Now, it is easy to see that each clock valuation satisfies exactly one simple $K$-constraint. Therefore, we can construct a region word by replacing the valuations present in $\mathsf{cw}(\mathsf{tw})$ with the simple $K$-constraints they satisfy, thereby constructing the (unique) region word $\mathsf{rw}$ that $\mathsf{tw}$ is compatible with.
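This abstraction is immediate to implement: each clock value is mapped to the unique elementary $K$-constraint it satisfies. A sketch in the encoding of the previous snippets:

import math

def elementary_constraint(value, K):
    """The unique elementary K-constraint satisfied by a clock value."""
    if value > K:
        return ('gt', K)                   # x > K
    if value == int(value):
        return ('eq', int(value))          # x = c
    return ('open', math.floor(value))     # x in (d, d+1)

def region_word_of(timed_word, alphabet, K):
    """The unique region word compatible with the timed word (Lemma 2.4)."""
    return [(sigma, {x: elementary_constraint(v, K) for x, v in val.items()})
            for sigma, val in clocked_word(timed_word, alphabet)]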

Lemma 2.6.

For every $K$-ERA $A$ over an alphabet $\Sigma$ and for every region word $\mathsf{rw}\in\mathsf{RegL}(\Sigma,K)$, either $[\![\mathsf{rw}]\!]\subseteq L^{\mathsf{tw}}(A)$ or $[\![\mathsf{rw}]\!]\cap L^{\mathsf{tw}}(A)=\emptyset$. Equivalently, for all timed words $\mathsf{tw}_1,\mathsf{tw}_2\in[\![\mathsf{rw}]\!]$, $\mathsf{tw}_1\in L^{\mathsf{tw}}(A)$ iff $\mathsf{tw}_2\in L^{\mathsf{tw}}(A)$.

The above lemma states that two timed words that are both compatible with the same region word cannot be 'distinguished' by a DERA. This can be proved using Lemma 18 of [12].

Symbolic languages.

Note that every $K$-DERA $A$ can be transformed into another $K$-DERA $A'$ where every guard present in $A'$ is a simple $K$-constraint. This $A'$ can be constructed as follows: for every transition $(q,\sigma,g,q')$ in $A$ where $g$ is a $K$-constraint (and not a simple $K$-constraint), $A'$ contains the transitions $(q,\sigma,r,q')$ such that $r\models g$. Since, for every $K$-constraint $g$, there are only finitely many (possibly exponentially many) simple $K$-constraints $r$ that satisfy $g$, this construction indeed yields a finite automaton. As an example, the transition from $q_1$ to $q_2$ in the automaton in Figure 1 carries a non-simple $K$-constraint as guard. This transition can be replaced with the following set of transitions without altering the timed language: $(q_1,b,x_a=1\wedge x_b=0,q_2)$, $(q_1,b,x_a=1\wedge 0<x_b<1,q_2)$, $(q_1,b,x_a=1\wedge x_b=1,q_2)$, $(q_1,b,x_a=1\wedge x_b>1,q_2)$. Note that $A'$ in the above construction is different from the "simple DERA" for $A$, as defined in [12]. A key difference is that in $A'$ there can be paths leading to an accepting state that are not satisfiable by any timed word, which is not the case in the simple DERA of $A$. This relaxation ensures that $A'$ has the same number of states as $A$; in contrast, the simple DERA of [12] can have exponentially more states than the original DERA.
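This guard-splitting step can be sketched as follows, in the same illustrative encoding as before (a guard is a map from clocks to atomic constraints; unconstrained clocks are left free):

from itertools import product

def elementary_constraints(K):
    # all elementary K-constraints over a single clock
    return ([('eq', c) for c in range(K + 1)]
            + [('open', d) for d in range(K)]
            + [('gt', K)])

def refines(elem, atom, K):
    """[[elem]] is a subset of [[atom]], for one clock."""
    if atom[0] == 'eq':                    # atom is x = c
        return elem == atom
    if atom[0] == 'in':                    # atom is x in (c, d)
        if elem[0] == 'eq':
            return atom[1] < elem[1] < atom[2]
        if elem[0] == 'open':
            return atom[1] <= elem[1] and elem[1] + 1 <= atom[2]
        return False                       # (K, inf) never fits in (c, d)
    return elem[0] == 'gt'                 # atom is x > K

def split_guard(guard, clocks, K):
    """All simple K-constraints r with [[r]] contained in [[guard]]."""
    choices = [[e for e in elementary_constraints(K)
                if x not in guard or refines(e, guard[x], K)]
               for x in clocks]
    return [dict(zip(clocks, combo)) for combo in product(*choices)]

# With K = 1, split_guard({'a': ('eq', 1)}, ['a', 'b'], 1) yields exactly the
# four simple constraints listed above for the transition of Figure 1.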

With the above construction in mind, given a $K$-DERA $A$, we associate two regular languages to it, called the syntactic region language of $A$ and the region language of $A$, both over the finite alphabet $\Sigma\times\mathsf{Reg}(\Sigma,K)$, as follows:

  • the syntactic region language of $A$, denoted $L^{\mathsf{s\cdot rw}}(A)$, is the set of region words $\mathsf{rw}=(\sigma_1,r_1)(\sigma_2,r_2)\dots(\sigma_n,r_n)$ such that there exists a sequence of states $q_0q_1\dots q_n$ in $A$ with $q_0=q_{\mathsf{init}}$, $q_n\in F$, and for all $1\leq i\leq n$, there exists $e=(q_{i-1},\sigma,\psi,q_i)\in E$ such that $\sigma_i=\sigma$ and $r_i\models\psi$. It is easy to see that $A$ plays the role of a DFA for this language and that this language is thus a regular language over $\Sigma\times\mathsf{Reg}(\Sigma,K)$.

  • the (semantic) region language of $A$, denoted $L^{\mathsf{rw}}(A)$, is the subset of $L^{\mathsf{s\cdot rw}}(A)$ restricted to the region words that are consistent. This language is thus the intersection of $L^{\mathsf{s\cdot rw}}(A)$ with $\mathsf{RegL}(\Sigma,K)$, and so it is also a regular language over $\Sigma\times\mathsf{Reg}(\Sigma,K)$. (This is precisely the language accepted by the "simple DERA" corresponding to $A$, as defined in [12].)

Example 2.7.

Let $A$ be the automaton depicted in Figure 1. Now, consider the region word $\mathsf{rw}:=(a,x_a=1\wedge x_b=1)(b,x_a=1\wedge x_b=3)$. Note that $\mathsf{rw}\in L^{\mathsf{s\cdot rw}}(A)$, since there is the path $q_0\xrightarrow{a,\ \top}q_1\xrightarrow{b,\ x_a=1}q_2$ in $A$ and (i) $(x_a=1\wedge x_b=1)\models\top$, and (ii) $(x_a=1\wedge x_b=3)\models(x_a=1)$. However, one can show that $\mathsf{rw}\notin L^{\mathsf{rw}}(A)$. Indeed, $[\![\mathsf{rw}]\!]=\emptyset$: for every clocked word $\mathsf{cw}=(a,v_1)(b,v_2)$, $\mathsf{cw}\models\mathsf{rw}$ only if $v_2(x_a)=1$ and $v_2(x_b)=3$; however, this is not possible, since for $v_2(x_a)$ to be $1$, the time difference between the $a$ and the $b$ must be $1$ time unit, in which case $v_2(x_b)$ will always be $2$ (because $v_1(x_b)=1$).

Remark 2.8.

Given a DFA $C$ over the alphabet $\Sigma\times\mathsf{Reg}(\Sigma,K)$, one can interpret it as a $K$-DERA and associate with it the two – syntactic and semantic – languages defined above. Then, the regular language $L(C)$ is the set of all region words in $L^{\mathsf{s\cdot rw}}(C)$, when $C$ is interpreted as a DERA. Similarly, abusing notation, we write $L^{\mathsf{rw}}(C)$ to denote the set of all consistent region words accepted by $C$, and $L^{\mathsf{tw}}(C)$ to denote the set of all timed words accepted by $C$. This remark is crucial, as we will use this notation often in the rest of the paper.

Table 1: Different symbolic languages for the automaton $A$ in Figure 1. The table compares membership in $L(A)$, $L^{\mathsf{s\cdot rw}}(A)$ and $L^{\mathsf{rw}}(A)$ for the symbolic words $(a,\top)$; $(a,\ x_a=1\wedge x_b=1)(b,\ x_a=0\wedge x_b=1)$; and $(a,\ x_a>1\wedge x_b>1)(b,\ x_a>1\wedge x_b=0)$.

3 Greybox learning framework

In this work, we are given a target timed language $L$, and we aim to infer a DERA that recognizes $L$. To achieve this, we propose a greybox learning algorithm that involves a Teacher and a Student. Before giving details about this algorithm, we first make clear the assumptions that we make on the interaction between the Teacher and the Student. We then introduce some important notions that will be useful when we present the algorithm in Section 4.

3.1 The learning protocol

The purpose of the greybox active learning algorithm that we propose is to build a DERA for an ERA-definable timed language $L$ using the following protocol between a so-called Teacher, who knows the language, and a Student, who tries to discover it. We assume that the Student is aware of the alphabet $\Sigma$ and the maximum constant $K$ that can appear in a region word; thus, due to Lemma 3.1, the Student can determine the set of consistent region words $\mathsf{RegL}(\Sigma,K)$.

Lemma 3.1.

Given an alphabet $\Sigma$ and a positive integer $K$, the set $\mathsf{RegL}(\Sigma,K)$ consisting of all consistent region words over $(\Sigma,K)$ forms a regular language over the alphabet $\Sigma\times\mathsf{Reg}(\Sigma,K)$ and can be recognized by a finite automaton. In the worst case, the size of this automaton is exponential both in $|\Sigma|$ and $K$.

Proof 3.2 (Proof sketch).

Consider the one-state ERA $A$ whose single (initial and accepting) state has a transition to itself on every pair from $\Sigma\times\mathsf{Reg}(\Sigma,K)$. Now, due to Lemma 2.2, there exists a DERA $A^d$ such that $L^{\mathsf{tw}}(A)=L^{\mathsf{tw}}(A^d)$. Moreover, due to results presented in [12], $A^d$ can be transformed into a "simple DERA" $A^{sd}$ such that $L^{\mathsf{tw}}(A^d)=L^{\mathsf{tw}}(A^{sd})$. Now, from the definition of simple DERA, we get that for every region word $\mathsf{rw}\in L(A^{sd})$, $[\![\mathsf{rw}]\!]\neq\emptyset$, i.e., $L(A^{sd})\subseteq\mathsf{RegL}(\Sigma,K)$. On the other hand, since the timed language of $A$ is the set of all timed words, $L(A^{sd})$ must contain every consistent region word, i.e., $\mathsf{RegL}(\Sigma,K)\subseteq L(A^{sd})$. Therefore, $L(A^{sd})=\mathsf{RegL}(\Sigma,K)$. The size of $A^{sd}$ is exponential in both $|\Sigma|$ and $K$ [12].

Since the Student can compute the automaton for $\mathsf{RegL}(\Sigma,K)$, we assume that the Student only poses membership queries for consistent region words, that is, region words from the set $\mathsf{RegL}(\Sigma,K)$. The Student, that is, the learning algorithm, is allowed to formulate two types of queries to the Teacher:

Membership queries: given a consistent region word $\mathsf{rw}$ over the alphabet $\Sigma\times\mathsf{Reg}(\Sigma,K)$, the Teacher answers Yes if $[\![\mathsf{rw}]\!]\subseteq L$, and No if $[\![\mathsf{rw}]\!]\not\subseteq L$, which, for region words, is equivalent to $[\![\mathsf{rw}]\!]\cap L=\emptyset$ (cf. Lemma 2.6).

Equivalence queries: given a DERA $A$, the Student can ask two types of inclusion queries: $L^{\mathsf{tw}}(A)\subseteq L$, or $L\subseteq L^{\mathsf{tw}}(A)$. The Teacher answers Yes to an inclusion query if the inclusion holds, and otherwise returns a counterexample.
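In code, the protocol boils down to the following interface (an illustrative sketch; class and method names are ours). Both queries refer only to the language $L$, never to a particular automaton for it.

from abc import ABC, abstractmethod

class Teacher(ABC):
    """Oracle for the target timed language L."""

    @abstractmethod
    def membership(self, region_word):
        """Yes iff [[rw]] is contained in L; for a consistent region word the
        only alternative is that [[rw]] and L are disjoint (Lemma 2.6),
        answered No."""

    @abstractmethod
    def included_in_L(self, dera):
        """Does L^tw(dera) ⊆ L hold? None if yes, else a counterexample."""

    @abstractmethod
    def includes_L(self, dera):
        """Does L ⊆ L^tw(dera) hold? None if yes, else a counterexample."""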

Comparison with the work in [15].

The queries that are allowed in our setting are semantic queries: they can be answered if the Teacher knows the timed language $L$, and the answers are not bound to the particular ERA that the Teacher has for the language. This is in contrast with the learning algorithm proposed in [15]. Indeed, the learning algorithm proposed there considers queries that are answered for a specific DERA that the Teacher uses as a reference. In particular, a membership query in their setting is formulated according to a symbolic word $w_z$, i.e., a finite word over the alphabet $\Sigma\times\mathsf{C}(\Sigma,K)$. The membership query about the symbolic word $w_z$ is answered positively if there is a run that follows edges annotated exactly by this symbolic word in the automaton and that leads to an accepting state. For the equivalence query, the query asks if a candidate DERA is equal to the DERA that is known by the Teacher. Again, the answer is based on the syntax and not on the semantics. For instance, consider the automaton $A$ in Figure 1. When a membership query is performed in their algorithm for the symbolic word $w_z:=(a,\top)(b,x_a=1)(a,x_a=1)$, even though $[\![w_z]\!]\subseteq L^{\mathsf{tw}}(A)$, their Teacher will respond negatively – this is because the guard on the transition $q_2\xrightarrow{a,\ x_b\leq 1}q_1$ in $A$ does not syntactically match the expression $x_a=1$, which is present in the third position of $w_z$. That is, when treating $A$ as a DFA with alphabet $\Sigma\times\mathsf{C}(\Sigma,K)$, $w_z\notin L(A)$. So, the algorithm in [15] essentially learns the regular language of symbolic words over the alphabet $\Sigma\times\mathsf{C}(\Sigma,K)$ rather than the semantics of the underlying timed language. As a corollary, our result (which we develop in Section 4) on returning a DERA with the minimum number of states holds even if the input automaton has more states, contrary to [15].

3.2 The region language associated to a timed language

Definition 3.3 (The region language).

Given a timed language $L$, we define its $K$-region language as the set $\mathsf{RW}_K(L):=\{\mathsf{rw}\in\mathsf{RegL}(\Sigma,K)\mid[\![\mathsf{rw}]\!]\subseteq L\}$.

For every ERA-definable timed language $L$, its $K$-region language $\mathsf{RW}_K(L)$ is uniquely determined due to Lemma 2.6. We now present a set of useful results regarding the set $\mathsf{RW}_K(L)$.

Lemma 3.4.

Let $L$ be an ERA-definable timed language, and let $A$ be a $K$-DERA recognizing $L$. Then, the following statements hold:

  1. $\bigcup_{\mathsf{rw}\in\mathsf{RW}_K(L)}[\![\mathsf{rw}]\!]=L=L^{\mathsf{tw}}(A)=\bigcup_{\mathsf{rw}\in L^{\mathsf{rw}}(A)}[\![\mathsf{rw}]\!]$,

  2. $\mathsf{RW}_K(\overline{L})=\overline{\mathsf{RW}_K(L)}\cap\mathsf{RegL}(\Sigma,K)$, where $\overline{L}$ denotes the complement of $L$ (which is also ERA-definable, since the class of ERA is closed under complementation [3]).

Proof 3.5.

  1. From Definition 3.3, we get that $\bigcup_{\mathsf{rw}\in\mathsf{RW}_K(L)}[\![\mathsf{rw}]\!]\subseteq L$. Conversely, let $\mathsf{tw}\in L$. From Lemma 2.4 we know there exists a unique $\mathsf{rw}\in\mathsf{RegL}(\Sigma,K)$ such that $\mathsf{tw}\in[\![\mathsf{rw}]\!]$. Then $[\![\mathsf{rw}]\!]\cap L\neq\emptyset$, which implies (using Lemma 2.6) $[\![\mathsf{rw}]\!]\subseteq L$. Therefore, $\mathsf{rw}\in\mathsf{RW}_K(L)$. This shows the first equality in the statement. The second equality holds since $A$ recognizes $L$. For the third equality, again let $\mathsf{tw}\in L^{\mathsf{tw}}(A)$, and consider the unique region word $\mathsf{rw}$ such that $\mathsf{tw}\in[\![\mathsf{rw}]\!]$. It must be the case that $\mathsf{rw}\in L^{\mathsf{rw}}(A)$. Hence, we get $L^{\mathsf{tw}}(A)\subseteq\bigcup_{\mathsf{rw}\in L^{\mathsf{rw}}(A)}[\![\mathsf{rw}]\!]$. On the other hand, let $\mathsf{tw}\in[\![\mathsf{rw}]\!]$ for some $\mathsf{rw}\in L^{\mathsf{rw}}(A)$. Then, clearly $\mathsf{tw}\in L^{\mathsf{tw}}(A)$.

  2. Let $\mathsf{rw}\in\mathsf{RW}_K(\overline{L})$; then from Definition 3.3, $(\emptyset\neq)[\![\mathsf{rw}]\!]\subseteq\overline{L}$. Since $L\cap\overline{L}=\emptyset$, we get $[\![\mathsf{rw}]\!]\cap L=\emptyset$ and hence $\mathsf{rw}\in\overline{\mathsf{RW}_K(L)}\cap\mathsf{RegL}(\Sigma,K)$. Conversely, if $\mathsf{rw}\in\overline{\mathsf{RW}_K(L)}\cap\mathsf{RegL}(\Sigma,K)$, then $\mathsf{rw}$ is consistent and $[\![\mathsf{rw}]\!]\not\subseteq L$, so by Lemma 2.6, $[\![\mathsf{rw}]\!]\cap L=\emptyset$, i.e., $[\![\mathsf{rw}]\!]\subseteq\overline{L}$, and thus $\mathsf{rw}\in\mathsf{RW}_K(\overline{L})$.

In [12], the authors prove that for every ERA-definable timed language $L$, there exists a $K$-DERA $A$, for some positive integer $K$, such that $L^{\mathsf{tw}}(A)=L$. From the construction of such an $A$, it in fact follows that $L^{\mathsf{rw}}(A)=\mathsf{RW}_K(L)$. The authors then present an extension of the $L^*$ algorithm [5] that is able to learn a minimal DFA for the language $L^{\mathsf{rw}}(A)$. Now, the following lemma shows that a minimal DFA for the regular language $L^{\mathsf{rw}}(A)$ can be exponentially larger than a minimal DFA for the language $L^{\mathsf{s\cdot rw}}(A)$. In this paper, we show how to cast the problem of learning a minimal DFA for $L^{\mathsf{s\cdot rw}}(A)$ as an adaptation of the greybox active learning framework proposed by Chen et al. in [7].

Lemma 3.6.

Let $L$ be an ERA-definable timed language and $A$ be a $K$-DERA recognizing $L$; then the following statements hold:

  1. $\mathsf{RW}_K(L)=L^{\mathsf{s\cdot rw}}(A)\cap\mathsf{RegL}(\Sigma,K)=L^{\mathsf{rw}}(A)$,

  2. $\mathsf{RW}_K(\overline{L})=L^{\mathsf{s\cdot rw}}(\overline{A})\cap\mathsf{RegL}(\Sigma,K)=L^{\mathsf{rw}}(\overline{A})$,

  3. the minimal DFA over $\Sigma\times\mathsf{Reg}(\Sigma,K)$ that recognizes $L^{\mathsf{s\cdot rw}}(A)$ can be exponentially smaller than the minimal DFA that recognizes $L^{\mathsf{rw}}(A)$,

  4. there exists a DFA with the same number of states as $A$ that accepts $L^{\mathsf{s\cdot rw}}(A)$.

Proof 3.7.

  1. The second equality comes from the definition of the sets $L^{\mathsf{rw}}(A)$ and $L^{\mathsf{s\cdot rw}}(A)$ (cf. the paragraph 'Symbolic languages' above). For the first equality, since $A$ recognizes $L$, we know that every $\mathsf{rw}\in\mathsf{RW}_K(L)$ satisfies $\mathsf{rw}\in L^{\mathsf{rw}}(A)$ and hence (from the definition of $L^{\mathsf{rw}}(A)$) $\mathsf{rw}\in L^{\mathsf{s\cdot rw}}(A)\cap\mathsf{RegL}(\Sigma,K)$. On the other hand, if $\mathsf{rw}\in L^{\mathsf{s\cdot rw}}(A)\cap\mathsf{RegL}(\Sigma,K)$, then $\mathsf{rw}\in L^{\mathsf{rw}}(A)$ and thus $[\![\mathsf{rw}]\!]\subseteq L$.

  2. This can be proved similarly to the previous result.

  3. We know that checking emptiness of the timed language of an ERA is $\mathsf{PSPACE}$-complete [3]. Now, we can reduce the emptiness problem for an ERA $A$ to checking whether $L^{\mathsf{rw}}(A)$ is empty: from Lemmas 2.4 and 2.6, $L^{\mathsf{tw}}(A)=\emptyset$ iff $L^{\mathsf{rw}}(A)=\emptyset$. Since emptiness of a DFA can be decided in polynomial time, the DFA recognizing $L^{\mathsf{rw}}(A)$ must (in the worst case) be exponentially larger than the DFA recognizing $L^{\mathsf{s\cdot rw}}(A)$, which, by the next item, is no larger than $A$ itself.

  4. Note that $L^{\mathsf{s\cdot rw}}(A)$ may contain inconsistent region words, and $L(A)$ may contain symbolic words (not only region words). Following the construction presented in the paragraph 'Symbolic languages' above, we know that from $A$ we can construct another $K$-DERA $A'$ with the same number of states as $A$ such that (i) $A'$ contains only simple $K$-constraints as guards and (ii) $L^{\mathsf{tw}}(A)=L^{\mathsf{tw}}(A')$. It can then be verified easily that $L(A')=L^{\mathsf{s\cdot rw}}(A)$. This is because, for every region word $\mathsf{rw}\in L^{\mathsf{s\cdot rw}}(A)$, there exists a path in $A'$ corresponding to $\mathsf{rw}$ (that is, with transitions labeled by the letter/simple-constraint pairs in $\mathsf{rw}$) leading to an accepting state of $A'$.

Note that the first equation in Lemma 3.6 is similar to Equation 1, which we introduced in Section 1. This equation is the cornerstone of our approach, and it is what justifies the advantage of our approach over existing ones. It is crucial to understand that for a fixed $\Sigma$ and $K$, the regular language $\mathsf{RegL}(\Sigma,K)$ appearing in this equation is already known and does not require learning. Consequently, the learning method we are about to describe focuses on discovering an automaton $C$ that satisfies this equation, and thereby it is a greybox learning algorithm (as coined in [1]), since a part of this equation is already known.

In order for an automaton $C$ to satisfy the first equation in Lemma 3.6, we can deduce that $C$ must (i) accept all the region words in $\mathsf{RW}_K(L)$, i.e., all consistent region words $\mathsf{rw}$ for which $[\![\mathsf{rw}]\!]\subseteq L$, (ii) reject all the consistent region words $\mathsf{rw}$ for which $[\![\mathsf{rw}]\!]\subseteq\overline{L}$, and (iii) behave either way on inconsistent region words. This is precisely the flexibility that allows for solutions $C$ that have fewer states than the minimal DFA for the language $\mathsf{RW}_K(L)$. The fourth statement in Lemma 3.6 implies that there is a DFA over the alphabet $\Sigma\times\mathsf{Reg}(\Sigma,K)$ that has the same number of states as a minimal DERA recognizing the timed language $L$. We give an overview of our greybox learning algorithm in Section 4.

In the following, we establish a strong correspondence between the timed languages and the region languages of two DERAs:

Lemma 3.8.

Given two DERAs $A$ and $B$, $L^{\mathsf{rw}}(A)\subseteq L^{\mathsf{rw}}(B)$ iff $L^{\mathsf{tw}}(A)\subseteq L^{\mathsf{tw}}(B)$.

Proof 3.9.

$(\Rightarrow)$ This direction follows from Lemma 3.4.

$(\Leftarrow)$ Let $w\in L^{\mathsf{rw}}(A)$; then we know that $[\![w]\!]\neq\emptyset$. Choose a timed word $\mathsf{tw}\in[\![w]\!]$. Clearly, $\mathsf{tw}\in L^{\mathsf{tw}}(A)$, and from the assumption of the lemma, we get $\mathsf{tw}\in L^{\mathsf{tw}}(B)$. From Lemma 2.4 we know that $w$ is the only region word such that $\mathsf{tw}\in[\![w]\!]$. Finally, since $\mathsf{tw}\in L^{\mathsf{tw}}(B)$, we deduce from Lemma 2.6 that $w\in L^{\mathsf{rw}}(B)$.

3.3 (Strongly-)complete 3DFAs

Figure 2: Illustration of the two completeness criteria: (a) completeness, (b) strong-completeness.

In this part, we define some concepts regarding 3DFAs that we will use in our greybox learning algorithm. These concepts are adaptations to the timed setting of similar concepts presented in [7].

Definition 3.10 (Completeness).

A 3DFA $D$ over $\Sigma\times\mathsf{Reg}(\Sigma,K)$ is complete w.r.t. the timed language $L$ if the timed language of $D^+$, when interpreted as a DERA, is a subset of $L$, i.e., $L^{\mathsf{tw}}(D^+)\subseteq L$; and the timed language of $D^-$, when interpreted as a DERA, is a subset of $\overline{L}$, i.e., $L^{\mathsf{tw}}(D^-)\subseteq\overline{L}$.

Applying Lemmas 3.6 and 3.8 to the above definition, we get the following result:

Lemma 3.11.

Let $L$ be a timed language over $\Sigma$ recognized by a $K$-DERA $A$; then for any 3DFA $D$ over $\Sigma\times\mathsf{Reg}(\Sigma,K)$, $D$ is complete w.r.t. $L$ iff $L^{\mathsf{rw}}(D^+)\subseteq L^{\mathsf{rw}}(A)=\mathsf{RW}_K(L)$ and $L^{\mathsf{rw}}(D^-)\subseteq L^{\mathsf{rw}}(\overline{A})=\mathsf{RW}_K(\overline{L})$.

Proof 3.12.

Since $A$ accepts $L$, $L^{\mathsf{tw}}(A)=L$, and since $A$ is a DERA, $L^{\mathsf{tw}}(\overline{A})=\overline{L}$. Therefore, $L^{\mathsf{tw}}(D^+)\subseteq L$ iff $L^{\mathsf{tw}}(D^+)\subseteq L^{\mathsf{tw}}(A)$ iff $L^{\mathsf{rw}}(D^+)\subseteq L^{\mathsf{rw}}(A)$ (using Lemma 3.8). Similarly, $L^{\mathsf{tw}}(D^-)\subseteq\overline{L}$ iff $L^{\mathsf{tw}}(D^-)\subseteq L^{\mathsf{tw}}(\overline{A})$ iff $L^{\mathsf{rw}}(D^-)\subseteq L^{\mathsf{rw}}(\overline{A})$. Then from Lemma 3.6, we know that $L^{\mathsf{rw}}(A)=\mathsf{RW}_K(L)$ and $L^{\mathsf{rw}}(\overline{A})=\mathsf{RW}_K(\overline{L})$; therefore, the statement follows.

However, to show the minimality in the number of states of the final DERA returned by our greybox learning algorithm, we introduce strong-completeness, by strengthening the condition on the language accepted by a 3DFA $D$ as follows: $L(D^+)=L^{\mathsf{rw}}(D^+)$ and $L(D^-)=L^{\mathsf{rw}}(D^-)$. In other words, we require that all the inconsistent words lead to a 'don't care' state in $D$.

Definition 3.13 (Strong-completeness).

A 3DFA $D$ over $\Sigma\times\mathsf{Reg}(\Sigma,K)$ is strongly-complete w.r.t. the timed language $L$ if $D$ is complete w.r.t. $L$ and, additionally, every region word that is strongly accepted or strongly rejected by $D$ is consistent; i.e., if $L^{\mathsf{tw}}(D^+)\subseteq L$ and $L(D^+)=L^{\mathsf{rw}}(D^+)$, and $L^{\mathsf{tw}}(D^-)\subseteq\overline{L}$ and $L(D^-)=L^{\mathsf{rw}}(D^-)$.

Corollary 3.14.

Let $L$ be a timed language over $\Sigma$ recognized by a $K$-DERA $A$; then for any 3DFA $D$ over $\Sigma\times\mathsf{Reg}(\Sigma,K)$, $D$ is strongly-complete w.r.t. $L$ iff $L(D^+)\subseteq L^{\mathsf{rw}}(A)=\mathsf{RW}_K(L)$ and $L(D^-)\subseteq L^{\mathsf{rw}}(\overline{A})=\mathsf{RW}_K(\overline{L})$.

A pictorial representation of completeness and strong-completeness is given in Figure 2. A dual notion of completeness, called soundness, can also be defined in this context; however, since we will only need that notion for showing the termination of tLSep, we introduce it in the next section.

4 The tLSep algorithm

To learn an ERA-recognizable language $L$, we adopt the following greybox learning algorithm, which is an adaptation of the separation algorithm proposed in [7]. The main steps of the algorithm are the following: we first learn a (strongly-)complete 3DFA $D_i$ that maps finite words over the alphabet $\Sigma\times\mathsf{Reg}(\Sigma,K)$ to $\{0,1,?\}$, where $1$ means 'accept', $0$ means 'reject', and $?$ means 'don't care'. The 3DFA $D_i$ is used to represent a family of regular languages $L$ such that $L\models D_i$. We then extract from $D_i$ a minimal automaton $C_i$ that is compatible with the 3DFA $D_i$, i.e., such that $L(C_i)\models D_i$. Finally, we check, using two inclusion queries, whether $C_i$ satisfies the equation $L^{\mathsf{tw}}(A)=L^{\mathsf{tw}}(C_i)$. If this is the case, then we return $C_i$; otherwise, we use the counterexample to this equality check to learn a new 3DFA $D_{i+1}$. A sketch of this loop is given below; we give the details of the greybox algorithm in the following subsections, and we conclude this section with the properties of our algorithm.
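In pseudo-Python, the loop reads as follows; the helpers learn_complete_3dfa, minimal_consistent_dfa and refine_3dfa are hypothetical placeholders for the steps detailed in Sections 4.1-4.3, and the Teacher interface is the one sketched in Section 3.1.

def tlsep(teacher, letters):
    D = learn_complete_3dfa(teacher, letters)      # Section 4.1
    while True:
        C = minimal_consistent_dfa(D)              # L(C) |= D, Section 4.3
        # two inclusion queries: L^tw(C) in L, and L in L^tw(C)
        cex = teacher.included_in_L(C) or teacher.includes_L(C)
        if cex is None:
            return C                               # L^tw(C) = L: done
        D = refine_3dfa(D, cex, teacher)           # relearn with counterexample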

4.1 Learning the 3DFA $D_i$

In this step, we learn the 3DFA $D_i$ by relying on a modified version of $L^*$. As in $L^*$, we maintain an observation table $(S,E,T)$, where $S$ (the set of 'prefixes') and $E$ (the set of 'suffixes') are sets of region words, and $T\colon(S\cdot E)\to\{0,1,?\}$ is a function that maps a region word $s\cdot e$, where $s\in S$, $e\in E$, to $1$ if $[\![s\cdot e]\!]\subseteq L$, to $0$ if $[\![s\cdot e]\!]\cap L=\emptyset$, and to $?$ if $[\![s\cdot e]\!]=\emptyset$. Crucially, we restrict membership queries exclusively to consistent region words. The rationale is that queries for inconsistent words are redundant, given our understanding that such words invariably yield a '$?$' response from the Teacher. This approach streamlines the learning phase by reducing the number of necessary membership queries. We can see the 3DFA $D_i$ as defining a space of solutions for the automaton $C$ that must be discovered.
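A sketch of this three-valued observation table follows (our own helper names; the membership oracle is assumed to return '?' for inconsistent words without consulting the Teacher):

class ObservationTable:
    def __init__(self, letters, member):
        self.letters = letters     # the alphabet Sigma x Reg(Sigma, K)
        self.member = member       # region word -> 1 | 0 | '?'
        self.S = {()}              # prefixes, as tuples of letters
        self.E = [()]              # suffixes, kept in a fixed order
        self.T = {}

    def fill(self):
        # query T(s.e) for every row in S and S.letters, avoiding re-queries
        for s in self.S | {s + (a,) for s in self.S for a in self.letters}:
            for e in self.E:
                if s + e not in self.T:
                    self.T[s + e] = self.member(s + e)

    def row(self, s):
        return tuple(self.T[s + e] for e in self.E)

    def is_closed(self):
        rows = {self.row(s) for s in self.S}
        return all(self.row(s + (a,)) in rows
                   for s in self.S for a in self.letters)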

Figure 3: Overview of the tLSep algorithm

4.2 Checking if the 3DFA $D_i$ is (strongly-)complete

We can interpret the 3DFA $D_i$ as the definition of a space of solutions for recognizing the region language $\mathsf{RW}_K(L)$. In order to ensure that the final DERA of our algorithm has the minimum number of states, we rely on the fact that at every iteration $i$, the 3DFA $D_i$ is strongly-complete. To this end, we show the following result:

Lemma 4.1.

Let $L$ be a timed language, and let $A=(Q,q_{\mathsf{init}},\Sigma,E,F)$ be a $K$-DERA with the minimum number of states accepting $L$. Let $C_i=(Q',q'_{\mathsf{init}},\Sigma\times\mathsf{Reg}(\Sigma,K),\delta,F')$ be the minimal consistent DFA such that $L(C_i)\models D_i$. If $D_i$ is strongly-complete, then $|Q|\geq|Q'|$.

Proof 4.2.

We shall show that $L^{\mathsf{s\cdot rw}}(A)$ is consistent with $D_i$. First, if $\mathsf{w}\in L(D_i^+)$, then $\mathsf{w}\in L^{\mathsf{rw}}(D_i^+)$ (by strong-completeness); therefore, from Corollary 3.14, $\mathsf{w}\in L^{\mathsf{rw}}(A)$ and hence $\mathsf{w}\in L^{\mathsf{s\cdot rw}}(A)$. For the other side, if $\mathsf{w}\in L(D_i^-)$, then $\mathsf{w}\in L^{\mathsf{rw}}(D_i^-)$, and again from Corollary 3.14, $\mathsf{w}\in L^{\mathsf{rw}}(\overline{A})$, which is equivalent to $\mathsf{w}\in L^{\mathsf{s\cdot rw}}(\overline{A})\cap\mathsf{RegL}(\Sigma,K)$. Now note that $L^{\mathsf{s\cdot rw}}(\overline{A})=\overline{L^{\mathsf{s\cdot rw}}(A)}$, and hence $\mathsf{w}\in\overline{L^{\mathsf{s\cdot rw}}(A)}\cap\mathsf{RegL}(\Sigma,K)$, so $\mathsf{w}\in\overline{L^{\mathsf{s\cdot rw}}(A)}$. Therefore, $L^{\mathsf{s\cdot rw}}(A)$ is consistent with $D_i$. By hypothesis, $C_i$ is the minimal DFA that is consistent with $D_i$. Therefore, it follows that $|Q|\geq|Q'|$.

The lemma above implies that if we compute a minimal consistent DFA $C_i$ from a 3DFA $D_i$ that is strongly-complete w.r.t. the timed language $L$, and if this $C_i$ satisfies Equation 1, then $C_i$ (interpreted as a DERA) recognizes $L$ and has the minimum number of states among all DERA recognizing $L$. The two requirements – $D_i$ being strongly-complete and $C_i$ being a minimal consistent DFA – are necessary to ensure that the resulting DERA for $L$ has the minimum number of states. However, checking both of these requirements is computationally expensive. First, for strong-completeness, one needs to check that no inconsistent region word is strongly accepted or strongly rejected by the 3DFA $D_i$. This can be done by checking that no region word that is strongly accepted or strongly rejected by $D_i$ is accepted by a DFA recognizing the regular language $\overline{\mathsf{RegL}(\Sigma,K)}$ – this is computationally hard due to the size of the latter. Second, computing a minimal consistent DFA $C_i$ from $D_i$ essentially requires finding a DFA that accepts every region word strongly accepted by $D_i$ and rejects every region word strongly rejected by $D_i$. This is a generalization of the problem of finding a DFA with the minimum number of states accepting a set of positive words and rejecting a set of negative words, which is known – due to the work of Gold [11] – to be computationally hard. In practice, we generally do not require (as is also remarked in [7]) the automaton with the minimum number of states; it is instead sufficient to compute an automaton with a small number of states, provided this makes the algorithm computationally easier. To this end, instead of implementing the algorithm depicted in Figure 3, we implement the algorithm depicted in Figure 4, which employs two relaxations: (i) we only compute a complete 3DFA $D_i$, and (ii) we only compute a small (not necessarily minimal) $C_i$ consistent with $D_i$. Whereas checking whether a 3DFA is complete can be done by checking the two inclusions mentioned in Definition 3.10, we present a heuristic for computing a small consistent DFA in the next section. These two relaxations come at the cost of the minimality of the solution; however, we show in Theorem 4.14 that this version of our algorithm still correctly computes a DERA recognizing $L$. As we will see in Section 5, our implementation returns an automaton with a reasonably small, sometimes minimum, number of states.

Figure 4: Overview of the tLSep algorithm with heuristics

4.3 Computing a small candidate $C_i$ from $D_i$

In this section, we detail a heuristic that computes a small candidate DFA $C_i$ from a 3DFA $D_i$ that is complete with respect to $L$. For the rest of this section, we fix the notation for the 3DFA $D_i=(Q_i,q_{\sf init},(\Sigma\times{\sf Reg}(\Sigma,K)),\delta_i,{\sf A_i},{\sf R_i},{\sf E_i})$.

The first step is to find the set of all pairs of states that are incompatible in $D_i$. Two states are incompatible if there exists a region word $\mathsf{w}\in(\Sigma\times{\sf Reg}(\Sigma,K))^*$ such that $\mathsf{w}$ leads one of them to an accepting state in ${\sf A_i}$ and the other to a rejecting state in ${\sf R_i}$. Notice that this definition does not take the timing constraints of a word into account, and this is precisely why, in the final automaton, the inconsistent words may lead to any state – unlike in a “simple DERA” for the corresponding timed language, where the inconsistent words must be rejected syntactically by the automaton – enabling it to be (possibly) smaller than the “minimal simple DERA”.

We compute the set $S_{\sf bad}$ of incompatible pairs of states by a fixed-point iteration as follows: initially, add to $S_{\sf bad}$ every pair $(q,q')$ where $q\in{\sf A_i}$ and $q'\in{\sf R_i}$; then add new pairs as long as possible: add a pair $(q_1,q_2)$ to $S_{\sf bad}$ whenever there exists a letter $e\in\Sigma\times{\sf Reg}(\Sigma,K)$ such that $q_1\xrightarrow{e}q_1'$, $q_2\xrightarrow{e}q_2'$, and $(q_1',q_2')\in S_{\sf bad}$.
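The following Python fragment is a minimal sketch of this saturation. The data structures are hypothetical and do not reflect our actual implementation: `delta` is a total transition map from (state, letter) to state, and `A` and `R` are the accepting and rejecting sets, assumed disjoint.

```python
from itertools import product

def incompatible_pairs(states, alphabet, delta, A, R):
    """Fixed-point computation of the set S_bad of incompatible pairs."""
    # Base case: accepting and rejecting states are pairwise incompatible.
    bad = {frozenset((q, qp)) for q in A for qp in R}
    changed = True
    while changed:  # saturate: repeat until no new pair is discovered
        changed = False
        for q1, q2 in product(states, repeat=2):
            if q1 == q2 or frozenset((q1, q2)) in bad:
                continue
            # (q1, q2) is incompatible if some letter leads to a bad pair.
            if any(frozenset((delta[q1, e], delta[q2, e])) in bad
                   for e in alphabet):
                bad.add(frozenset((q1, q2)))
                changed = True
    return bad
```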

The second step is to find all $\subseteq$-maximal sets of states that are compatible in $D_i$. A set of states of $D_i$ is compatible if it does not contain any incompatible pair. The set of maximal compatible sets, denoted by $S_{\sf good}^{\sf max}$, can be computed iteratively as follows: initially, $S_{\sf good}^{\sf max}=\{Q_i\}$; at any iteration, if there exists a set $T\in S_{\sf good}^{\sf max}$ containing a pair $(q,q')\in S_{\sf bad}$, we perform the update $S_{\sf good}^{\sf max}:=(S_{\sf good}^{\sf max}\setminus\{T\})\cup\{T\setminus\{q\}\}\cup\{T\setminus\{q'\}\}$, with the condition that $T\setminus\{q\}$ is added to $S_{\sf good}^{\sf max}$ only if there exists no $T'$ in $S_{\sf good}^{\sf max}$ such that $T'\supseteq T\setminus\{q\}$, and similarly for $T\setminus\{q'\}$. We can then prove the correctness of this procedure as follows.
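Below is a minimal sketch of this refinement loop, over the same hypothetical data structures as above; `bad` is the set of incompatible pairs computed previously (each pair contains two distinct states, since ${\sf A_i}$ and ${\sf R_i}$ are disjoint).

```python
def maximal_compatible_sets(states, bad):
    """Refine {Q_i} until every set in the collection is compatible,
    keeping only subset-maximal candidates."""
    s_max = [frozenset(states)]
    while True:
        # pick a set T that still contains an incompatible pair, if any
        found = next(((T, p) for T in s_max for p in bad if p <= T), None)
        if found is None:
            return s_max  # every set is compatible, and maximal by construction
        T, pair = found
        q, qp = tuple(pair)
        s_max.remove(T)
        for cand in (T - {q}, T - {qp}):
            # add a candidate only if no remaining set subsumes it
            if not any(cand <= Tp for Tp in s_max):
                s_max.append(cand)
```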

Theorem 4.3.

The above procedure terminates. Moreover, the set $S_{\sf good}^{\sf max}$ returned by the algorithm is the set of all maximal compatible sets in the 3DFA $D_i$.

Proof 4.4.

Termination: The number of distinct subsets of $Q_i$ is bounded by $2^{|Q_i|}$, where $|Q_i|$ denotes the size of $Q_i$. Notice that, at every iteration, a set $T$ from $S_{\sf good}^{\sf max}$ is chosen and removed, and (at most two) sets of strictly smaller size get added to $S_{\sf good}^{\sf max}$. Therefore, the same set can never be added twice during the procedure. In the worst case, $S_{\sf good}^{\sf max}$ can contain all the distinct subsets of $Q_i$. Since the number of such sets is bounded and no set is added twice, the procedure terminates.

Correctness: Let us denote by $S_0$ the set $\{Q_i\}$, and let $S_1,S_2,\ldots$ denote the sets obtained after each iteration of the algorithm. Let us denote by $S_m$ the set returned by the algorithm, i.e., $S_m=S_{m+1}$. The proof of correctness follows from the following lemma:

Lemma 4.5.

All sets in $S_m$ are compatible, and are maximal w.r.t. the subset ordering, i.e., they form a $\subseteq$-antichain. Furthermore, for every $S\subseteq Q_i$ that is compatible, there is some $S'\in S_m$ such that $S\subseteq S'$.

Proof 4.6.

First, if $S_m$ contained a set $T$ that is not compatible, then $T$ would be deleted at the $(m+1)$-th iteration of the algorithm, which contradicts the fact that $S_m=S_{m+1}$. Therefore, all sets in $S_m$ are compatible.

The second statement can be shown by induction on $0\leq j\leq m$.

Base case: $S_0=\{Q_i\}$ is maximal.

Induction hypothesis: Suppose all sets in $S_{j-1}$ are maximal.

Induction step: Suppose we delete the set $T$ at iteration $j$, and let $(q,q')$ be the incompatible pair in $T$. Then $T\setminus\{q\}\in S_j\setminus S_{j-1}$ iff there is no $T'\in S_{j-1}$ such that $T'\supseteq T\setminus\{q\}$, iff $T\setminus\{q\}$ is maximal (the same argument applies to $T\setminus\{q'\}$). Since these are the only two sets that could possibly be added to $S_j$, and since, by the induction hypothesis, all sets in $S_{j-1}$ are maximal, the statement also follows for $j$.

The third statement can also be proved by induction on $0\leq j\leq m$.

Base case: Since $S_0=\{Q_i\}$ and $S\subseteq Q_i$, the statement trivially holds.

Induction hypothesis: Suppose the statement holds for $j-1$.

Induction step: Let $S\subseteq Q_i$ be a compatible set. Let $T$ be the set selected in iteration $j$, with the incompatible pair being $(q,q')$. From the induction hypothesis, there exists a set $S'\in S_{j-1}$ such that $S\subseteq S'$.

Case 1. $S'\neq T$. In this case, $S'\in S_j$ as well, and hence the result follows.

Case 2. $S'=T$. Now, note that, in place of $T$, the set $S_j$ contains two sets $T_1,T_2$ (both compatible, due to Lemma 4.5) where $T_1\supseteq(T\setminus\{q\})$ and $T_2\supseteq(T\setminus\{q'\})$, and $(q,q')$ is an incompatible pair. Since $S$ is compatible, $S$ (and also $T_1$, $T_2$) cannot contain both of the states $q,q'$. Therefore, $S\subseteq T_1$ if $q\notin S$, and $S\subseteq T_2$ if $q'\notin S$.

In the final step, we define a DFA $C_i=(Q_i',q'_{\sf init},(\Sigma\times{\sf Reg}(\Sigma,K)),\delta_i',F_i)$, where: $Q_i'=S_{\sf good}^{\sf max}$; $q'_{\sf init}=T\in S_{\sf good}^{\sf max}$ such that $q_{\sf init}\in T$ and $T$ has the highest cardinality; for any $T\in Q_i'$ and $e\in(\Sigma\times{\sf Reg}(\Sigma,K))$, $\delta_i'(T,e)=T'\in S_{\sf good}^{\sf max}$ such that $\bigcup_{q\in T}\{\delta_i(q,e)\}\subseteq T'$ and $T'$ has the largest cardinality; and $F_i:=\{T\in S_{\sf good}^{\sf max}\mid T\cap{\sf A_i}\neq\emptyset\}$.
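As an illustration, this construction can be sketched as follows, again over the hypothetical data structures used above; Lemma 4.7 below guarantees that the `max` choices are well-defined.

```python
def build_candidate_dfa(s_max, q_init, alphabet, delta, A):
    """Assemble the candidate DFA C_i from the maximal compatible sets,
    breaking ties towards sets of largest cardinality."""
    # initial state: a largest maximal compatible set containing q_init
    init = max((T for T in s_max if q_init in T), key=len)
    trans = {}
    for T in s_max:
        for e in alphabet:
            succ = {delta[q, e] for q in T}
            # target: a largest maximal set covering all e-successors of T
            trans[T, e] = max((Tp for Tp in s_max if succ <= Tp), key=len)
    accepting = {T for T in s_max if T & set(A)}
    return init, trans, accepting
```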

Note that the DFA $C_i$ constructed from the 3DFA $D_i$ is not unique, in the sense that (1) there might be more than one possible $T$ in $S_{\sf good}^{\sf max}$ such that $q_{\sf init}\in T$, and (2) there might be more than one possible $T'$ in $S_{\sf good}^{\sf max}$ such that $T'\supseteq\bigcup_{q\in T}\{\delta_i(q,e)\}$. It is also important to note that one can apply further heuristics at this step; in particular, while adding a new transition from $T$ on $e$, if some suitable maximal set $T'$ has already been discovered previously, one can set the target of this transition to $T'$, even if it is not of maximum cardinality.

Lemma 4.7.

For any 3DFA $D_i$, the procedure described above is well-defined. Moreover, for any $C_i$ constructed according to the procedure, $C_i$ is consistent with $D_i$, i.e., $L(D_i^+)\subseteq L(C_i)$ and $L(D_i^-)\subseteq\overline{L(C_i)}$.

Proof 4.8.

Well-definedness: First, notice that, from Lemma 4.5, there must exist a set $T\in S_{\sf good}^{\sf max}$ such that $\{q_{\sf init}\}\subseteq T$. Second, let $T$ be any maximal set in $S_{\sf good}^{\sf max}$, and let $e\in(\Sigma\times{\sf Reg}(\Sigma,K))$. Let $T''=\bigcup_{q\in T}\{\delta_i(q,e)\}$. Notice that, since $T$ is compatible, $T''$ has to be a compatible set. Indeed, if there existed a pair of incompatible states $(q,q')$ in $T''$, then there would exist $q_1,q_1'$ in $T$ such that $q_1\xrightarrow{e}q$ and $q_1'\xrightarrow{e}q'$ are two transitions of $D_i$, and hence $(q_1,q_1')$ would also be incompatible, contradicting the hypothesis that $T$ is compatible. Now that we have shown that $T''$ must be a compatible set, again by Lemma 4.5, there must exist $T'\in S_{\sf good}^{\sf max}$ such that $T''\subseteq T'$. This concludes the proof.

Consistency: Consider any $C_i$ constructed according to the procedure from the 3DFA $D_i$. Suppose $\mathsf{w}=e_1e_2\dots e_m$ is a region word over $(\Sigma,K)$ (not necessarily consistent). Let $\rho=q_0\xrightarrow{e_1}q_1\xrightarrow{e_2}\ldots\xrightarrow{e_m}q_m$, with $q_0=q_{\sf init}$, be the unique run of $\mathsf{w}$ in the 3DFA $D_i$. By applying Lemma 4.5 on the transitions of $\rho$, in particular by induction, one can show that there exists a unique run $\rho'=T_0\xrightarrow{e_1}T_1\xrightarrow{e_2}\ldots\xrightarrow{e_m}T_m$ in $C_i$, where, for all $0\leq j\leq m$, $T_j$ is a state in $Q_i'$ such that $q_j\in T_j$. Now suppose $\mathsf{w}\in L(D_i^+)$. Then $q_m\in{\sf A_i}\cap T_m$. By the definition of $F_i$ (the set of accepting states of $C_i$), $T_m\in F_i$, which implies that $\mathsf{w}\in L(C_i)$. A similar argument shows that, for any $\mathsf{w}\in L(D_i^-)$, $\mathsf{w}\in\overline{L(C_i)}$. This concludes the proof.

Taking the intersection with ${\sf RegL}(\Sigma,K)$ on both sides, together with Lemma 3.6, we get the following corollary:

Corollary 4.9.

$L^{\mathsf{rw}}(D_i^+)\subseteq L^{\mathsf{rw}}(C_i)$, and $L^{\mathsf{rw}}(D_i^-)\subseteq\overline{L^{\mathsf{rw}}(C_i)}\cap{\sf RegL}(\Sigma,K)$.

4.4 Checking if $C_i$ is a solution

Now that $C_i$ has been extracted from $D_i$, we can query the ${\tt Teacher}$ with two inclusion queries that check if $L^{\sf tw}(C_i)=L$. If the answer is yes, then $C_i$ is a solution to our learning problem, and we return the automaton $C_i$; otherwise, the Teacher returns a counterexample – a consistent region word that is either accepted by $C_i$ but not in $\mathsf{RW}_K(L)$, or rejected by $C_i$ but in $\mathsf{RW}_K(L)$. Notice that the equality check has been defined above as an equality check between two timed languages, whereas the counterexamples returned by the Teacher are region words. This is not contradictory: one can show, similarly to Lemma 3.8, that $L^{\mathsf{tw}}(C_i)=L$ iff $L^{\mathsf{rw}}(C_i)=\mathsf{RW}_K(L)$. More details on how the counterexample is extracted from the answer received from the Teacher are given in Section 5. We will show in Lemma 4.12 that a counterexample to the equality check is also a ‘discrepancy’ in the 3DFA $D_i$. Therefore, this counterexample is used to update the observation table, as in $L^*$, to produce the subsequent hypothesis $D_{i+1}$. A schematic view of this check is sketched below.
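The following Python fragment sketches this top-level check against a hypothetical Teacher interface; the method names and the shape of the answers are our own illustration, not a fixed API.

```python
def check_candidate(teacher, C_i):
    """Realize the equality check L^tw(C_i) = L as two inclusion queries."""
    ok, cex = teacher.includes(C_i)      # does L^tw(C_i) contain L?
    if not ok:
        return cex  # consistent region word in RW_K(L) rejected by C_i
    ok, cex = teacher.is_included(C_i)   # is L^tw(C_i) contained in L?
    if not ok:
        return cex  # consistent region word accepted by C_i, not in RW_K(L)
    return None     # C_i is a solution: return it as the learnt DERA
```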

4.5 Correctness of the tLSep algorithm

Definition 4.10 (Soundness).

A 3DFA $D$ over $\Sigma\times{\sf Reg}(\Sigma,K)$ is sound w.r.t. the timed language $L$ if the timed language of $D^+$, when interpreted as a DERA, contains $L$, i.e., $L\subseteq L^{\mathsf{tw}}(D^+)$; and the timed language of $D^-$, when interpreted as a DERA, contains $\overline{L}$, i.e., $\overline{L}\subseteq L^{\mathsf{tw}}(D^-)$.

Similar to Lemma 3.11, one can show the following result:

Lemma 4.11.

Let $L$ be a timed language over $\Sigma$ recognized by a $K$-DERA $A$ over $\Sigma$. Then, for any 3DFA $D$ over $\Sigma\times{\sf Reg}(\Sigma,K)$, $D$ is sound w.r.t. $L$ iff $\mathsf{RW}_K(L)=L^{\mathsf{rw}}(A)\subseteq L^{\mathsf{rw}}(D^+)$ and $\mathsf{RW}_K(\overline{L})=L^{\mathsf{rw}}(\overline{A})\subseteq L^{\mathsf{rw}}(D^-)$.

The following lemma shows that a counterexample to the equality check in Section 4.4 is also a counterexample to the soundness of $D_i$:

Lemma 4.12.

If a counterexample is generated for the equality check against $C_i$, then the corresponding 3DFA $D_i$ is not sound w.r.t. $L$.

Proof 4.13.

Recall from Corollary 4.9 that $L^{\sf rw}(D_i^+)\subseteq L^{\sf rw}(C_i)$ and $L^{\sf rw}(D_i^-)\subseteq\overline{L^{\mathsf{rw}}(C_i)}\cap{\sf RegL}(\Sigma,K)$. Let $\mathsf{w}$ be a counterexample for the equality check. Then:

(1) either there exists a consistent region word $\mathsf{w}\in\mathsf{RW}_K(L)\setminus L^{\sf rw}(C_i)$, which implies $\mathsf{w}\in\mathsf{RW}_K(L)\setminus L^{\sf rw}(D_i^+)$. By Lemma 4.11, $D_i$ is not sound w.r.t. $L$.

(2) or there exists a consistent region word $\mathsf{w}\in L^{\sf rw}(C_i)\setminus\mathsf{RW}_K(L)$. Since $\mathsf{w}$ is consistent, using Lemma 3.4, we can deduce that $\mathsf{w}\in\mathsf{RW}_K(\overline{L})\setminus(\overline{L^{\mathsf{rw}}(C_i)}\cap{\sf RegL}(\Sigma,K))$, which implies $\mathsf{w}\in\mathsf{RW}_K(\overline{L})\setminus L^{\sf rw}(D_i^-)$. Again by Lemma 4.11, $D_i$ is not sound w.r.t. $L$.

We can prove the following along lines of argument similar to those in [7].

Theorem 4.14.

Let $L$ be a timed language over $\Sigma$ recognizable by an ERA given as input to the tLSep algorithm. Then,

1. the algorithm tLSep terminates,

2. the algorithm tLSep is correct, i.e., letting $C$ be the automaton returned by the algorithm, $L^{\sf tw}(C)=L$,

3. at every iteration of tLSep, if strong-completeness is checked for the 3DFA $D_i$, and if a minimization procedure is used to construct $C_i$, then $C$ has a minimal number of states.

Proof 4.15.

1. One can define a ‘canonical’ 3DFA $D$ over $\Sigma\times{\sf Reg}(\Sigma,K)$ from $L$ as follows: for every consistent region word $w$ such that $w\in\mathsf{RW}_K(L)$, define $D(w)=1$; for every consistent region word $w$ such that $w\notin\mathsf{RW}_K(L)$, define $D(w)=0$; and for every inconsistent region word $w$, define $D(w)={?}$. It is then easy to verify that $D$ is sound and complete w.r.t. $L$. Due to Lemma 4.12, at iteration $i$, a counterexample is produced only if the 3DFA $D_i$ is not sound. Notice also that, from Lemma 3.1 and Lemma 3.6, the languages ${\sf RegL}(\Sigma,K)$, $\overline{{\sf RegL}(\Sigma,K)}$, $\mathsf{RW}_K(L)$ and $\overline{\mathsf{RW}_K(L)}$ are regular. Therefore, the 3DFAs constructed from the observation table at every iteration of the tLSep algorithm, as described in Section 4.1, gradually converge to the canonical sound and complete 3DFA $D$ (as shown in the case of regular separability in [12]).

2. Since $C$ is the final automaton, it must be the case that, when the equality check was posed to the Teacher with hypothesis $C$, no counterexample was generated, and hence $L^{\mathsf{tw}}(C)=L$.

3. This statement follows from Lemma 4.1 together with the fact that $C_i$ is the minimal consistent DFA for $D_i$.

5 Implementation and its performance

We have made a prototype implementation of tLSep in Python. Parts of our implementation are inspired by the implementation of $L^*$ in AALpy [17]. For the experiments, we assume that the Teacher has access to a DERA $A$ recognizing the target timed language $L$. Below, we describe how we implement the sub-procedures. Note that, in our implementation, we check for completeness, and not for strong-completeness, of the 3DFAs.

Emptiness of region words.

Since we do not perform membership queries for inconsistent region words, before every membership query we first check whether the region word is consistent (in agreement with our greybox learning framework described in Section 3.1). The consistency check is performed by encoding the region word as an SMT formula in Linear Real Arithmetic and then checking (using the SMT solver Z3 [9]) whether the formula is satisfiable. Unsatisfiability of the formula implies that the region word is inconsistent.
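For illustration, here is a minimal sketch of such an encoding with Z3's Python API. The representation of a region word is our own assumption for this sketch: each letter carries a guard mapping each tracked event $a$ to an interval on the event-recording clock $x_a$, whose value at position $i$ is the time elapsed since the last occurrence of $a$ (or since the start of the word, if $a$ has not yet occurred).

```python
from z3 import Real, Solver, sat

def is_consistent(region_word, events):
    """Check satisfiability of the timestamp constraints of a region word.
    region_word: list of (event, guard) pairs; guard maps an event a to
    (lo, hi, strict_lo, strict_hi), with hi=None meaning unbounded above."""
    s = Solver()
    ts = [Real(f"t_{i}") for i in range(len(region_word) + 1)]
    s.add(ts[0] == 0)
    last = {a: 0 for a in events}  # position of the last occurrence of a
    for i, (sigma, guard) in enumerate(region_word, start=1):
        s.add(ts[i] >= ts[i - 1])  # timestamps are non-decreasing
        for a, (lo, hi, strict_lo, strict_hi) in guard.items():
            x_a = ts[i] - ts[last[a]]  # current value of the clock x_a
            s.add(x_a > lo if strict_lo else x_a >= lo)
            if hi is not None:
                s.add(x_a < hi if strict_hi else x_a <= hi)
        last[sigma] = i
    return s.check() == sat  # unsat means the region word is inconsistent
```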

Inclusion of Timed languages.

We check inclusion between timed languages recognizable by ERA in two situations: (i) while making the 3DFA $D_i$ complete (cf. Definition 3.10), and (ii) when checking whether the constructed DERA $C_i$ recognizes $L$ (cf. Definition 4.10). Both of these checks reduce to checking emptiness of appropriate product automata. We perform the emptiness checks using the reachability algorithm (covreach) implemented in the tool TChecker [13].
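Schematically, such an emptiness check could be dispatched as follows; the exact command-line interface of TChecker (tool name, flags, and output format) is an assumption in this sketch and should be checked against the TChecker documentation.

```python
import subprocess

def product_is_empty(tck_file):
    """Run TChecker's covering-reachability algorithm on a product model
    (hypothetical invocation; adapt to the installed TChecker version)."""
    out = subprocess.run(["tck-reach", "-a", "covreach", tck_file],
                         capture_output=True, text=True)
    # non-emptiness corresponds to reachability of an accepting label
    return "REACHABLE false" in out.stdout
```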

Counterexample processing.

When one of the language inclusions during completeness checking returns False, we obtain a ‘concrete’ path from TChecker that acts as a certificate for non-emptiness. The path is of the form:

${\sf cex}:=(q_0,v_0)\xrightarrow[\{x_{\sigma_1}\}]{\sigma_1,\,g_1}(q_1,v_1)\xrightarrow[\{x_{\sigma_2}\}]{\sigma_2,\,g_2}(q_2,v_2)\cdots(q_{n-1},v_{n-1})\xrightarrow[\{x_{\sigma_n}\}]{\sigma_n,\,g_n}(q_n,v_n)$

Since the guards present in the automata $D_i^+,D_i^-$ are regions, every guard present in the product automaton is also a region. From ${\sf cex}$ we then construct the region word $(\sigma_1,g_1)(\sigma_2,g_2)\ldots(\sigma_n,g_n)$. We then use the algorithm proposed by Rivest and Schapire [14, 18] to compute the ‘witnessing suffix’ ${\sf ws}$ from ${\sf cex}$, and add ${\sf ws}$ to the set of suffixes ($E$) of the observation table. On the other hand, when we receive a counterexample during the soundness check, we instead add all the prefixes of this counterexample to the set $S$. Note that the guards present on the transitions in our output are simple (region) constraints, and not more general (zone) constraints.
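The suffix extraction can be sketched as the usual Rivest–Schapire binary search; here `hyp_access` (returning the access word of the hypothesis state reached on a prefix) and `member` (the membership oracle, queried only on consistent words) are hypothetical helpers, not the names used in our implementation.

```python
def witnessing_suffix(cex, hyp_access, member):
    """Binary search for a distinguishing suffix of the counterexample.
    Invariant: replacing the first lo letters of cex by their access word
    preserves the membership answer; replacing the first hi letters flips it."""
    lo, hi = 0, len(cex)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if member(hyp_access(cex[:mid]) + cex[mid:]) == member(cex):
            lo = mid
        else:
            hi = mid
    return cex[hi:]  # the witnessing suffix ws, added to E
```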

(a) Example-4
(b) DTA computed by LearnTA on Figure 5(a)
Figure 5: Explainability of tLSep

We now compare the performance of our prototype implementation of tLSep against the algorithm proposed by Waga in [24], on a set of synthetic timed languages and on a benchmark taken from their tool LearnTA. We observe encouraging results; we highlight the most important ones below.

Minimality.

We have shown in Section 4 that if one ensures strong-completeness of the 3DFAs throughout the procedure, then minimality in the number of states is assured. However, minimality is often achieved even when only completeness is checked. For instance, recall our running example from Figure 1. For this language, our algorithm was able to find a DERA with only two states (depicted in Figure 6(a)) – which is indeed the minimum number of states. On the other hand, LearnTA does not claim minimality of the computed automaton, and indeed, for this language, it ends up constructing the automaton in Figure 6(b) instead. Note that, for the sake of clarity, we draw a single edge from $q_0$ to $q_1$ on $a$ in Figure 6(a); however, our implementation constructs several parallel edges (on $a$) between these two states, one for each constraint in ${\sf Reg}(\Sigma,K)$.

(a) output of tLSep
(b) output of LearnTA
Figure 6: The automata computed by the two algorithms on Figure 1

Explainability.

From a practical point of view, explainability of the output automaton is a very important property of learning algorithms. One would ideally like the learning algorithm to return a ‘readable’, or easily explainable, model for the target language. We find that some of the automata returned by tLSep are easier to understand than the models returned by LearnTA. This is essentially due to the better readability of ERA compared to TA. A comparative example can be found in Figure 5.

Efficiency.

We treat completeness checks for a 3DFA and language-equivalence checks for a DERA as equivalence queries (EQ). Note that EQs are not new kinds of queries; each consists of (at most) two inclusion queries (IQ). We report the number of queries for tLSep and LearnTA on a set of examples in Table 2. For both algorithms, we only report the number of queries ‘with memory’, in the sense that we fetch the answer to a query from memory if it was already computed. Notice that LearnTA uses two types of membership queries, called symbolic queries (s-MQ) and timed queries (t-MQ), and in most of the examples, tLSep needs far fewer membership queries than LearnTA.

Although the class of ERA-recognizable languages is a strict subclass of DTA-recognizable languages, we found a set of benchmarks from the LearnTA tool that are in fact event-recording automata (after renaming the clocks). These examples are called Unbalanced; Table 2 shows that tLSep uses significantly fewer queries than LearnTA for these, illustrating the gains of our greybox approach.

(a) Example-1
(b) Example-2
(c) Example-3
Figure 7: Automata corresponding to the models ex1, ex2 and ex3 of the table, respectively

Below, we give brief descriptions of the languages that we have learnt using our tLSep algorithm. The timed language (ex1) has untimed language $(abab)^*$, where, in every such four-letter block, the second $a$ must occur before $1$ time unit has elapsed since the first $a$, and the second $b$ must occur more than $1$ time unit after the first $a$. The language (ex2) has untimed language $(ab)^*$, where the first $a$ happens at $1$ time unit and every subsequent $a$ occurs exactly $1$ time unit after the preceding $a$. The timed language (ex3) consists of timed words whose untimed language is $ab^*b$; the first $a$ can occur at any time, then there can be several $b$'s, all occurring exactly $1$ time unit after the $a$, and the last $b$ occurs strictly more than $1$ time unit after the first $a$. We also consider the language (ex4), represented by the automaton in Figure 5(a) (this language is taken from [12]), for which, as described above, tLSep constructs a more understandable automaton than LearnTA.

Model        | $K$ | $|Q|$ | $|\Sigma|$ | tLSep MQ | tLSep IQ | tLSep EQ | LearnTA s-MQ | LearnTA t-MQ | LearnTA EQ
-------------|-----|-------|------------|----------|----------|----------|--------------|--------------|-----------
Figure 1     | 1   | 4     | 2          | 98       | 8        | 5        | 230          | 219          | 4
ex1          | 1   | 5     | 2          | 219      | 11       | 6        | 296          | 365          | 4
ex2          | 1   | 3     | 2          | 220      | 12       | 7        | 247          | 233          | 5
ex3          | 2   | 4     | 2          | 87       | 7        | 4        | 123          | 173          | 3
ex4          | 1   | 3     | 1          | 26       | 5        | 3        | 40           | 43           | 2
Unbalanced-1 | 1   | 5     | 3          | 421      | 17       | 12       | 1717         | 2394         | 6
Unbalanced-2 | 2   | 5     | 3          | 1095     | 27       | 20       | 7347         | 13227        | 10
Unbalanced-3 | 3   | 5     | 3          | 2087     | 37       | 28       | 21200        | 45400        | 17

Table 2: Experimental results. Here $K$ denotes the maximum constant appearing in the automaton, and $|Q|$ and $|\Sigma|$ denote the number of states of the automaton provided as input to the algorithm and the size of its alphabet, respectively. Note that, for some of these automata, the number of states in the table is one more than the number of states depicted in the figures: for automata that are not total, we add an additional sink state and direct all missing transitions to it.

6 Discussions

Our greybox learning algorithm can produce automata with a minimal number of states, matching the minimal DERA for the target language. Currently, the transitions are labeled with region constraints. In the future, we plan to implement a procedure that consolidates edges labeled with the same event and with regions whose union is convex into a single edge with a zone constraint. This procedure aims not only to maintain an automaton with the fewest possible states, but also to minimize the number of edges and zones. Such an optimization could further enhance the readability of the models we produce, thereby improving explainability – an increasingly important aspect of machine learning.

Building on the future work proposed above, we also aim to refine our approach to membership queries. Instead of defaulting to queries on regions where this may not be necessary, we plan to consider queries on zones. Only if the response from the Teacher is not uniform across a zone will we break the query down into more specific sub-queries, ending with regions if necessary. This strategy will necessitate revising the learning protocol we currently use, but exploring these alternatives could enhance the efficiency and effectiveness of our learning process. Similar ideas have been used in [15].

Throughout this paper, we follow the convention used in prior research [12, 24] that the parameter $K$ (the maximal constant against which the clocks are compared) is predetermined. Similarly, in our experimental section, we assume that we know which events need to be tracked by a clock. While these assumptions are typically reasonable in practice, a more flexible approach is possible: we could start with no predetermined events to track with clocks, and introduce clocks for events and update the maximum constant as needed. These could be inferred from the counterexamples provided by the Teacher.

Overall, we believe that the greybox learning framework described in this work may also be well-suited for other classes of languages that enjoy a condition similar to Equation 1, where the automaton $C$ can be of relatively small size while a canonical model recognizing $L$ might be too large.

References

  • [1] Andreas Abel and Jan Reineke. Gray-box learning of serial compositions of mealy machines. In NFM, volume 9690 of Lecture Notes in Computer Science, pages 272–287. Springer, 2016.
  • [2] Rajeev Alur and David L. Dill. A theory of timed automata. Theor. Comput. Sci., 126(2):183–235, 1994.
  • [3] Rajeev Alur, Limor Fix, and Thomas A. Henzinger. Event-clock automata: A determinizable class of timed automata. Theor. Comput. Sci., 211(1-2):253–273, 1999.
  • [4] Jie An, Lingtai Wang, Bohua Zhan, Naijun Zhan, and Miaomiao Zhang. Learning real-time automata. Sci. China Inf. Sci., 64(9), 2021.
  • [5] Dana Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75(2):87–106, 1987.
  • [6] Véronique Bruyère, Bharat Garhewal, Guillermo A. Pérez, Gaëtan Staquet, and Frits W. Vaandrager. Active learning of mealy machines with timers. CoRR, abs/2403.02019, 2024. URL: https://doi.org/10.48550/arXiv.2403.02019.
  • [7] Yu-Fang Chen, Azadeh Farzan, Edmund M. Clarke, Yih-Kuen Tsay, and Bow-Yaw Wang. Learning minimal separating dfa’s for compositional verification. In TACAS, volume 5505 of Lecture Notes in Computer Science, pages 31–45. Springer, 2009.
  • [8] Lénaïg Cornanguer, Christine Largouët, Laurence Rozé, and Alexandre Termier. TAG: learning timed automata from logs. In AAAI, pages 3949–3958. AAAI Press, 2022.
  • [9] Leonardo Mendonça de Moura and Nikolaj S. Bjørner. Z3: an efficient SMT solver. In TACAS, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008.
  • [10] David L. Dill. Timing assumptions and verification of finite-state concurrent systems. In Automatic Verification Methods for Finite State Systems, volume 407 of Lecture Notes in Computer Science, pages 197–212. Springer, 1989.
  • [11] E. Mark Gold. Complexity of automaton identification from given data. Inf. Control., 37(3):302–320, 1978. doi:10.1016/S0019-9958(78)90562-4.
  • [12] Olga Grinchtein, Bengt Jonsson, and Martin Leucker. Learning of event-recording automata. Theor. Comput. Sci., 411(47):4029–4054, 2010.
  • [13] Frédéric Herbreteau and Gerald Point. TChecker. URL: https://github.com/ticktac-project/tchecker.
  • [14] Malte Isberner and Bernhard Steffen. An abstract framework for counterexample analysis in active automata learning. In ICGI, volume 34 of JMLR Workshop and Conference Proceedings, pages 79–93. JMLR.org, 2014.
  • [15] Shang-Wei Lin, Étienne André, Jin Song Dong, Jun Sun, and Yang Liu. An efficient algorithm for learning event-recording automata. In ATVA, volume 6996 of Lecture Notes in Computer Science, pages 463–472. Springer, 2011.
  • [16] Mark Moeller, Thomas Wiener, Alaia Solko-Breslin, Caleb Koch, Nate Foster, and Alexandra Silva. Automata learning with an incomplete teacher. In ECOOP, volume 263 of LIPIcs, pages 21:1–21:30. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
  • [17] Edi Muškardin, Bernhard Aichernig, Ingo Pill, Andrea Pferscher, and Martin Tappler. Aalpy: an active automata learning library. Innovations in Systems and Software Engineering, 18:1–10, 03 2022. doi:10.1007/s11334-022-00449-3.
  • [18] Ronald L. Rivest and Robert E. Schapire. Inference of finite automata using homing sequences. Inf. Comput., 103(2):299–347, 1993.
  • [19] Martin Tappler, Bernhard K. Aichernig, Kim Guldstrand Larsen, and Florian Lorber. Time to learn - learning timed automata from tests. In FORMATS, volume 11750 of Lecture Notes in Computer Science, pages 216–235. Springer, 2019.
  • [20] Yu Teng, Miaomiao Zhang, and Jie An. Learning deterministic multi-clock timed automata. In HSCC, pages 6:1–6:11. ACM, 2024.
  • [21] Frits W. Vaandrager. Model learning. Commun. ACM, 60(2):86–95, 2017.
  • [22] Frits W. Vaandrager, Roderick Bloem, and Masoud Ebrahimi. Learning mealy machines with one timer. In LATA, volume 12638 of Lecture Notes in Computer Science, pages 157–170. Springer, 2021.
  • [23] Sicco Verwer, Mathijs de Weerdt, and Cees Witteveen. One-clock deterministic timed automata are efficiently identifiable in the limit. In LATA, volume 5457 of Lecture Notes in Computer Science, pages 740–751. Springer, 2009.
  • [24] Masaki Waga. Active learning of deterministic timed automata with myhill-nerode style characterization. In CAV (1), volume 13964 of Lecture Notes in Computer Science, pages 3–26. Springer, 2023.
  • [25] Runqing Xu, Jie An, and Bohua Zhan. Active learning of one-clock timed automata using constraint solving. In ATVA, volume 13505 of Lecture Notes in Computer Science, pages 249–265. Springer, 2022.