Most Complex Deterministic Union-Free Regular Languages

Janusz A. Brzozowski (1) and Sylvie Davies (2)

(1) David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada N2L 3G1, brzozo@uwaterloo.ca
(2) Department of Pure Mathematics, University of Waterloo, Waterloo, ON, Canada N2L 3G1, sldavies@uwaterloo.ca

This work was supported by the Natural Sciences and Engineering Research Council of Canada grant No. OGP0000871.

Abstract

A regular language $L$ is union-free if it can be represented by a regular expression without the union operation. A union-free language is deterministic if it can be accepted by a deterministic one-cycle-free-path finite automaton; this is an automaton which has one final state and exactly one cycle-free path from any state to the final state. Jirásková and Masopust proved that the state complexities of the basic operations reversal, star, product, and boolean operations in deterministic union-free languages are exactly the same as those in the class of all regular languages. To prove that the bounds are met they used five types of automata, involving eight types of transformations of the set of states of the automata. We show that for each $n\geqslant 3$ there exists one ternary witness of state complexity $n$ that meets the bound for reversal and product. Moreover, the restrictions of this witness to binary alphabets meet the bounds for star and boolean operations. We also show that the tight upper bounds on the state complexity of binary operations that take arguments over different alphabets are the same as those for arbitrary regular languages. Furthermore, we prove that the maximal syntactic semigroup of a union-free language has $n^n$ elements, as in the case of regular languages, and that the maximal state complexities of atoms of union-free languages are the same as those for regular languages. Finally, we prove that there exists a most complex union-free language that meets the bounds for all these complexity measures. Altogether this proves that the complexity measures above cannot distinguish union-free languages from regular languages.

Keywords: atom, boolean operation, concatenation, different alphabets, most complex, one-cycle-free-path, regular, reversal, star, state complexity, syntactic semigroup, transition semigroup, union-free

1 Introduction

Formal definitions are postponed until Section 2.

The class of regular languages over a finite alphabet $\Sigma$ is the smallest class of languages containing the empty language $\emptyset$, the language $\{\varepsilon\}$, where $\varepsilon$ is the empty word, and the letter languages $\{a\}$ for each $a\in\Sigma$, and closed under the operations of union, concatenation, and (Kleene) star. Hence each regular language can be written as a finite expression involving the above basic languages and operations. An expression defining a regular language in this way is called a regular expression. Because regular languages are also closed under complementation, we may also consider regular expressions that allow complementation, which are called extended regular expressions. In this paper we deal exclusively with regular languages.

A natural question is: what kind of languages are defined if one of the operations in the definitions given above is missing? If the star operation is removed from the extended regular expressions we get the well known star-free languages [8, 18, 23], which have been extensively studied. Less attention was given to classes defined by removing an operation from ordinary regular expressions, but recently language classes defined without union or concatenation have been studied.

If we remove some operations from regular expressions, we obtain the following classes of languages:

Union only

subsets of $\{\varepsilon\}\cup\Sigma$.

Concatenation only

$\emptyset$ and $\{w\}$ for each $w\in\Sigma^*$.

Star only

$\emptyset$, $\{\varepsilon\}$, $\{a\}$ for each $a\in\Sigma$, and $\{a\}^*$ for each $a\in\Sigma$.

Union and Concatenation

Finite languages.

Concatenation and Star

These are the union-free languages that constitute the main topic of this paper.

Union and Star

These are the concatenation-free languages that were studied in [11, 13, 17].

Union-free regular languages were first considered by Brzozowski [3] in 1962 under the name star-dot regular languages, where dot stands for concatenation. He proved that every regular language is a union of union-free languages [3, p. 216, Theorem 9.5] (terminology changed to that of the present paper). Much more recently, in 2001, Crvenković, Dolinka and Ésik [11] studied equations satisfied by union-free regular languages, and proved that the class of these languages cannot be axiomatized by a finite set of equations. This is also known to be true for the class of all regular languages. In 2006 Nagy studied union-free languages in detail and characterized them in terms of the nondeterministic finite automata (NFAs) recognizing them [19], which he called one-cycle-free-path NFAs. In 2009 minimal union-free decompositions of regular languages were studied by Afonin and Golomazov [1]. They also presented a new algorithm for deciding whether a given deterministic finite automaton (DFA) accepts a union-free language. Decompositions of regular languages in terms of union-free languages were further studied by Nagy in 2010 [20]. The state complexities of operations on union-free languages were examined in 2011 by Jirásková and Masopust [15], who proved that the state complexities of basic operations on these languages are the same as those in the class of all regular languages. It was shown in [15] that the class of languages defined by DFAs with the one-cycle-free-path property is a proper subclass of that defined by one-cycle-free-path NFAs; the former class is called the class of deterministic union-free languages. In 2012 Jirásková and Nagy [16] proved that the class of finite unions of deterministic union-free languages is a proper subclass of the class of regular languages. They also showed that every deterministic union-free language is accepted by a special kind of one-cycle-free-path DFA called a balloon DFA. A summary of the properties of union-free languages was presented in 2017 in [13].

2 Preliminaries

Let $L$ be a regular language. We define the alphabet of $L$ to be the set of letters which appear at least once in a word of $L$. For example, consider the language $L=\{a,ab,ac\}$ and the subset $K=\{a,ac\}$; we say $L$ has alphabet $\{a,b,c\}$ and $K$ has alphabet $\{a,c\}$.

A deterministic finite automaton (DFA) is a quintuple $\mathcal{D}=(Q,\Sigma,\delta,q_0,F)$, where $Q$ is a finite non-empty set of states, $\Sigma$ is a finite non-empty alphabet, $\delta\colon Q\times\Sigma\to Q$ is the transition function, $q_0\in Q$ is the initial state, and $F\subseteq Q$ is the set of final states. We extend $\delta$ to functions $\delta\colon Q\times\Sigma^*\to Q$ and $\delta\colon 2^Q\times\Sigma^*\to 2^Q$ as usual (where $2^Q$ denotes the set of all subsets of $Q$). A DFA $\mathcal{D}$ accepts a word $w\in\Sigma^*$ if $\delta(q_0,w)\in F$. The language accepted by $\mathcal{D}$ is the set of all words accepted by $\mathcal{D}$, and is denoted by $L(\mathcal{D})$. If $q$ is a state of $\mathcal{D}$, then the language $L_q(\mathcal{D})$ of $q$ is the language accepted by the DFA $(Q,\Sigma,\delta,q,F)$. A state is empty (or dead or a sink state) if its language is empty. Two states $p$ and $q$ of $\mathcal{D}$ are equivalent if $L_p(\mathcal{D})=L_q(\mathcal{D})$. A state $q$ is reachable if there exists $w\in\Sigma^*$ such that $\delta(q_0,w)=q$. A DFA $\mathcal{D}$ is minimal if it has the smallest number of states among all DFAs accepting $L(\mathcal{D})$. We say a DFA has a minimal alphabet if its alphabet is equal to the alphabet of $L(\mathcal{D})$. It is well known that a DFA with a minimal alphabet is minimal if and only if all of its states are reachable and no two states are equivalent.
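The definitions above can be illustrated with a minimal Python sketch. It is not part of the paper: the toy automaton and the helper names `run`, `accepts` and `reachable` are assumptions chosen only for illustration.

```python
# Illustrative sketch: a DFA (Q, Sigma, delta, q0, F) stored as plain data.
# The example automaton is hypothetical; it accepts the words over {a, b}
# that contain an odd number of a's.

def run(delta, state, word):
    """Extended transition function: follow `word` letter by letter."""
    for letter in word:
        state = delta[(state, letter)]
    return state

def accepts(delta, q0, finals, word):
    """A word is accepted iff it leads from the initial state into F."""
    return run(delta, q0, word) in finals

def reachable(delta, q0, alphabet):
    """All states q such that delta(q0, w) = q for some word w."""
    seen, todo = {q0}, [q0]
    while todo:
        q = todo.pop()
        for x in alphabet:
            r = delta[(q, x)]
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}

print(accepts(delta, 0, {1}, "bab"))
print(accepts(delta, 0, {1}, "aba"))
print(reachable(delta, 0, "ab"))
```

Storing $\delta$ as a dictionary keyed by (state, letter) pairs keeps the sketch close to the quintuple notation; a table of per-letter transformations, as used later in the paper, would work equally well.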

A nondeterministic finite automaton (NFA) is a quintuple $\mathcal{N}=(Q,\Sigma,\delta,I,F)$, where $Q$, $\Sigma$ and $F$ are as in a DFA, $\delta\colon Q\times\Sigma\to 2^Q$, and $I\subseteq Q$ is the set of initial states. Each triple $(p,a,q)$ with $p,q\in Q$, $a\in\Sigma$ is a transition if $q\in\delta(p,a)$. A sequence $((p_0,a_0,q_0),(p_1,a_1,q_1),\dots,(p_{k-1},a_{k-1},q_{k-1}))$ of transitions, where $p_{i+1}=q_i$ for $i=0,\dots,k-2$, is a path in $\mathcal{N}$. The word $a_0a_1\cdots a_{k-1}$ is the word spelled by the path. A word $w$ is accepted by $\mathcal{N}$ if there exists a path with $p_0\in I$ and $q_{k-1}\in F$ that spells $w$. If $q\in\delta(p,a)$ we also use the notation $p\xrightarrow{a}q$. We extend this notation also to words, and write $p\xrightarrow{w}q$ for $w\in\Sigma^*$.

The state complexity of a regular language $L$, denoted by $\kappa(L)$, is the number of states in the minimal DFA accepting $L$. Henceforth we frequently refer to state complexity as simply complexity, and we denote a language of complexity $n$ by $L_n$, and a DFA with $n$ states by $\mathcal{D}_n$.

The state complexity of a regularity-preserving unary operation $\circ$ on regular languages is the maximal value of $\kappa(L^\circ)$, expressed as a function of one parameter $n$, where $L$ varies over all regular languages with complexity at most $n$. For example, the state complexity of the reversal operation is $2^n$; it is known that if $L$ has complexity at most $n$, then $\kappa(L^R)\leqslant 2^n$, and furthermore this upper bound is tight in the sense that for each $n\geqslant 1$ there exists a language $L_n$ such that $\kappa(L_n^R)=2^n$. In general, to show that an upper bound on $\kappa(L^\circ)$ is tight, we need to exhibit a sequence $(L_n\mid n\geqslant k)=(L_k,L_{k+1},\dots)$, called a stream, of languages of each complexity $n\geqslant k$ (for some small constant $k$) that meet this upper bound. Often we are not interested in the special-case behaviour of the operation that may occur at very small values of $n$; the parameter $k$ allows us to ignore these small values and simplify the statements of results.

The state complexity of a regularity-preserving binary operation $\circ$ on regular languages is the maximal value of $\kappa(L'\circ L)$, expressed as a function of two parameters $m$ and $n$, where $L'$ varies over all regular languages of complexity at most $m$ and $L$ varies over all regular languages of complexity at most $n$. In this case, to show that an upper bound on the state complexity is tight, we need to exhibit two classes $(L'_{m,n}\mid m\geqslant h,n\geqslant k)$ and $(L_{m,n}\mid m\geqslant h,n\geqslant k)$ of languages meeting the bound; the notation $L'_{m,n}$ and $L_{m,n}$ implies that $L'_{m,n}$ and $L_{m,n}$ depend on both $m$ and $n$. However, in most cases studied in the literature, it is enough to use witness streams $(L'_m\mid m\geqslant h)$ and $(L_n\mid n\geqslant k)$, where $L'_m$ is independent of $n$ and $L_n$ is independent of $m$.

For binary operations we consider two types of state complexity: restricted and unrestricted state complexity. For restricted state complexity the operands of the binary operations are required to have the same alphabet. For unrestricted state complexity the alphabets of the operands may differ. See [7] for more details.

Sometimes the same stream can be used for both operands of a binary operation, but this is not always possible. For example, for boolean operations when $m=n$, the state complexity of $L_n\cup L_n=L_n$ is $n$, whereas the upper bound is $mn=n^2$. However, in many cases the second language is a "dialect" of the first, that is, it "differs only slightly" from the first. A dialect of $L_n(\Sigma)$ is a language obtained from $L_n(\Sigma)$ by deleting some letters of $\Sigma$ in the words of $L_n(\Sigma)$ (by this we mean that words containing these letters are deleted) or replacing them by letters of another alphabet $\Sigma'$. In this paper we consider only the cases where $\Sigma=\Sigma'$, and we encounter only two types of dialects:

  1. A dialect in which some letters were deleted; for example, $L_n(a,b)$ is a dialect of $L_n(a,b,c)$ with $c$ deleted, and $L_n(a,-,c)$ is a dialect with $b$ deleted.

  2. A dialect in which the roles of two letters are exchanged; for example, $L_n(b,a)$ is such a dialect of $L_n(a,b)$.

These two types of dialects can be combined, for example, in $L_n(a,-,b)$ the letter $c$ is deleted, and $b$ plays the role that $c$ played originally. The notion of dialects also extends to DFAs; for example, if $\mathcal{D}_n(a,b,c)$ recognizes $L_n(a,b,c)$ then $\mathcal{D}_n(a,-,b)$ recognizes the dialect $L_n(a,-,b)$.
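One way to read the dialect notation, sketched here under our own assumptions rather than taken from the paper, is positional: the $i$-th argument names the letter that takes over the $i$-th role, and `-` drops that role. The function name `dialect` and the 3-state base automaton below are hypothetical.

```python
# Sketch of the dialect notation D_n(x1, x2, ...): the i-th listed letter
# takes over the role (transformation) of the i-th original letter, and
# "-" deletes that role. The base automaton is a hypothetical example.

def dialect(transformations, letters):
    """Pair the i-th transformation with letters[i]; '-' drops it."""
    return {x: t for x, t in zip(letters, transformations) if x != "-"}

# Roles of a, b, c in some 3-state DFA D_3(a, b, c), as tuples q -> image.
base = [(0, 2, 1), (1, 0, 2), (0, 0, 2)]

print(dialect(base, "abc"))  # D_3(a, b, c)
print(dialect(base, "ab"))   # D_3(a, b): the third role is deleted
print(dialect(base, "ba"))   # D_3(b, a): roles of a and b exchanged
print(dialect(base, "a-b"))  # D_3(a, -, b): b takes over the third role
```

Each result is a map from letters to state transformations, so a dialect automaton keeps the state set, initial state and final states of the original.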

We use $Q_n=\{0,\dots,n-1\}$ as our basic set with $n$ elements. A transformation of $Q_n$ is a mapping $t\colon Q_n\to Q_n$. The image of $q\in Q_n$ under $t$ is denoted by $qt$, and this notation is extended to subsets of $Q_n$. The preimage of $q\in Q_n$ under $t$ is the set $qt^{-1}=\{p\in Q_n:pt=q\}$, and this notation is extended to subsets of $Q_n$ as follows: $St^{-1}=\{p\in Q_n:pt\in S\}$. The rank of a transformation $t$ is the cardinality of $Q_nt$. If $s$ and $t$ are transformations of $Q_n$, their composition is denoted $st$ and we have $q(st)=(qs)t$ for $q\in Q_n$. The $k$-fold composition $tt\dotsb t$ (with $k$ occurrences of $t$) is denoted $t^k$, and for $S\subseteq Q_n$ we define $St^{-k}=S(t^k)^{-1}$. Let $\mathcal{T}_{Q_n}$ be the set of all $n^n$ transformations of $Q_n$; then $\mathcal{T}_{Q_n}$ is a monoid under composition.
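As a small sketch of this notation (our own illustration, with hypothetical helper names), a transformation of $Q_n$ can be stored as a tuple `t` with `t[q]` equal to $qt$; composition then follows the left-to-right convention $q(st)=(qs)t$.

```python
# Transformations of Q_n as tuples: t[q] is the image qt.
# All function names here are illustrative assumptions.

def compose(s, t):
    """Composition st, applied left to right: q(st) = (qs)t."""
    return tuple(t[s[q]] for q in range(len(s)))

def image(S, t):
    """The image St of a subset S of Q_n."""
    return frozenset(t[q] for q in S)

def preimage(S, t):
    """The preimage St^{-1} = {p in Q_n : pt in S}."""
    return frozenset(p for p in range(len(t)) if t[p] in S)

def rank(t):
    """The rank of t is the cardinality of Q_n t."""
    return len(set(t))

def power(t, k):
    """The k-fold composition t^k for k >= 1."""
    result = t
    for _ in range(k - 1):
        result = compose(result, t)
    return result

s = (1, 2, 0)   # the 3-cycle (0, 1, 2)
t = (0, 0, 2)   # the transformation (1 -> 0)

print(compose(s, t))
print(preimage({0}, t))
print(rank(t))
print(power(s, 3))
```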

For $k\geqslant 2$, a transformation $t$ of a set $P=\{q_0,q_1,\ldots,q_{k-1}\}\subseteq Q_n$ is a $k$-cycle if $q_0t=q_1,q_1t=q_2,\ldots,q_{k-2}t=q_{k-1},q_{k-1}t=q_0$. This $k$-cycle is denoted by $(q_0,q_1,\ldots,q_{k-1})$, and leaves the states in $Q_n\setminus P$ unchanged. A 2-cycle $(q_0,q_1)$ is called a transposition. A transformation that sends state $p$ to $q$ and acts as the identity on the remaining states is denoted by $(p\to q)$. The identity transformation is denoted by $\mathbf{1}$.

Let $\mathcal{D}=(Q_n,\Sigma,\delta,0,F)$ be a DFA. For each word $w\in\Sigma^*$, the transition function induces a transformation $\delta_w$ of $Q_n$ by $w$: for all $q\in Q_n$, $q\delta_w=\delta(q,w)$. The set $T_{\mathcal{D}}$ of all such transformations by non-empty words is the transition semigroup of $\mathcal{D}$ under composition. Often we use the word $w$ to denote the transformation $t$ it induces; thus we write $qw$ instead of $q\delta_w$. We also write $w\colon t$ to mean that $w$ induces the transformation $t$.

The size of the syntactic semigroup of a regular language is another measure of the complexity of the language [4]. Write $\Sigma^+$ for $\Sigma^*\setminus\{\varepsilon\}$. The syntactic congruence of a language $L\subseteq\Sigma^*$ is defined on $\Sigma^+$ as follows: for $x,y\in\Sigma^+$, $x\approx_L y$ if and only if $wxz\in L\Leftrightarrow wyz\in L$ for all $w,z\in\Sigma^*$. The quotient set $\Sigma^+/{\approx_L}$ of equivalence classes of $\approx_L$ is a semigroup, the syntactic semigroup $T_L$ of $L$. The syntactic semigroup is isomorphic to the transition semigroup of the minimal DFA of $L$ [21].

The (left) quotient of $L\subseteq\Sigma^*$ by a word $w\in\Sigma^*$ is the language $w^{-1}L=\{x:wx\in L\}$. It is well known that the number of quotients of a regular language is finite and equal to the state complexity of the language.

The atoms of a regular language are defined by a left congruence, where two words $x$ and $y$ are congruent whenever $ux\in L$ if and only if $uy\in L$ for all $u\in\Sigma^*$. Thus $x$ and $y$ are congruent whenever $x\in u^{-1}L$ if and only if $y\in u^{-1}L$ for all $u\in\Sigma^*$. An equivalence class of this relation is an atom of $L$ [10]. Atoms can be expressed as non-empty intersections of complemented and uncomplemented quotients of $L$. The number of atoms and their state complexities were suggested as measures of complexity of regular languages [4] because all quotients of a language and all quotients of its atoms are unions of atoms [9, 10, 14].

3 Main Results

The automata described in [19] that characterize union-free languages are called there one-cycle-free-path automata. They are defined by the property that there is only one final state and a unique cycle-free path from each state to the final state. We are now ready to define a most complex deterministic one-cycle-free-path DFA and its most complex deterministic union-free language.

The most complex stream below meets all of our complexity bounds. However, our witness uses three letters for restricted product whereas [15] uses binary witnesses. The same shortcoming of most complex streams occurs in the case of regular languages [4]; that seems to be the price of getting a witness for all operations rather than minimizing the alphabet for each operation.

Definition 1

For $n\geqslant 3$, let $\mathcal{D}_n=\mathcal{D}_n(a,b,c,d)=(Q_n,\Sigma,\delta_n,0,\{n-1\})$, where $\Sigma=\{a,b,c,d\}$, and $\delta_n$ is defined by the transformations $a\colon(1,\dots,n-1)$, $b\colon(0,1)$, $c\colon(1\to 0)$, and $d\colon\mathbf{1}$; see Figure 1. Let $L_n=L_n(a,b,c,d)$ be the language accepted by $\mathcal{D}_n(a,b,c,d)$.

Figure 1: Most complex minimal one-cycle-free-path DFA $\mathcal{D}_n(a,b,c,d)$ of Definition 1.
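Definition 1 can be turned into a small executable sketch. The code below is our own illustration, not the authors' implementation: `witness` builds the four transformations as tuples, and `is_minimal` checks reachability and pairwise distinguishability by a simple partition refinement.

```python
# Sketch of the DFA D_n(a, b, c, d) of Definition 1 (names are assumptions).

def witness(n):
    """Transformations a, b, c, d of Definition 1 as tuples over Q_n."""
    assert n >= 3
    return {
        "a": (0,) + tuple(range(2, n)) + (1,),  # cycle (1, ..., n-1)
        "b": (1, 0) + tuple(range(2, n)),       # transposition (0, 1)
        "c": (0, 0) + tuple(range(2, n)),       # (1 -> 0)
        "d": tuple(range(n)),                   # identity
    }

def accepts(delta, n, word):
    """Run `word` from the initial state 0; the only final state is n-1."""
    q = 0
    for x in word:
        q = delta[x][q]
    return q == n - 1

def is_minimal(delta, n, start, finals):
    """True iff all n states are reachable and pairwise distinguishable."""
    seen, todo = {start}, [start]
    while todo:
        q = todo.pop()
        for t in delta.values():
            if t[q] not in seen:
                seen.add(t[q])
                todo.append(t[q])
    if len(seen) != n:
        return False
    # Refine the partition {final, non-final} until it stops changing.
    cls = [int(q in finals) for q in range(n)]
    while True:
        sig = [(cls[q],) + tuple(cls[t[q]] for t in delta.values())
               for q in range(n)]
        ids = {s: i for i, s in enumerate(sorted(set(sig)))}
        new = [ids[s] for s in sig]
        if len(set(new)) == len(set(cls)):
            return len(set(new)) == n
        cls = new

for n in range(3, 7):
    print(n, is_minimal(witness(n), n, 0, {n - 1}))
```

For example, the word $ba^{n-2}$ leads from state 0 through state 1 around the cycle to the final state $n-1$, which `accepts(witness(n), n, "b" + "a" * (n - 2))` confirms.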

The DFA of Definition 1 bears some similarities to the DFA for reversal in Fig. 6 in [15, p. 1650]. It is evident that it is a one-cycle-free-path DFA. Let $E=(a(b\cup c\cup d)^*)^{n-3}a$; this expression describes the words that lead from state 1 to state $n-1$ without returning to state 1. One verifies that

\[
\begin{array}{rl}
L_n= & [(a\cup c\cup d)\cup b(d\cup E(b\cup c\cup d)^*a)^*(b\cup c)]^*\\
 & b(d\cup E(b\cup c\cup d)^*a)^*E(b\cup c\cup d)^*.
\end{array}
\]

Noting that $(E_1\cup E_2\cup\cdots\cup E_k)^*=(E_1^*E_2^*\cdots E_k^*)^*$ for all regular expressions $E_i$, $i=1,\dots,k$, we obtain a union-free expression for $L_n$.
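The elimination of unions can be spelled out as follows. This is our own expansion of the step above, not taken from the paper; the abbreviations $B$ and $X$ are introduced only for this sketch. The one union that is not directly under a star, $(b\cup c)$, is first distributed over the concatenation, and then the identity is applied.

```latex
\begin{align*}
B   &= (b \cup c \cup d)^* = (b^* c^* d^*)^*, \\
X   &= b\,(d \cup E B a)^* = b\,\bigl(d^* (E B a)^*\bigr)^*, \\
L_n &= \bigl[(a \cup c \cup d) \cup X (b \cup c)\bigr]^* X E B \\
    &= \bigl[a \cup c \cup d \cup X b \cup X c\bigr]^* X E B \\
    &= \bigl(a^* c^* d^* (X b)^* (X c)^*\bigr)^* X E B .
\end{align*}
```

Since $E$ contains unions only through $(b\cup c\cup d)^*$, replacing each such factor by $B$ makes $E$, and hence the last line, union-free.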

Theorem 3.1 (Most Complex Deterministic Union-Free Languages)

For each $n\geqslant 3$, the DFA of Definition 1 is minimal and recognizes a deterministic union-free language. The stream $(L_n(a,b,c)\mid n\geqslant 3)$ with some dialect streams is most complex in the class of deterministic union-free languages in the following sense:

  1. The syntactic semigroup of $L_n(a,b,c)$ has cardinality $n^n$, and at least three letters are required to reach this bound.

  2. Each quotient of $L_n(a,b)$ has complexity $n$.

  3. The reverse of $L_n(a,b,c)$ has complexity $2^n$. Moreover, $L_n(a,b,c)$ has $2^n$ atoms.

  4. Each atom $A_S$ of $L_n(a,b,c)$ has maximal complexity:

    \[
    \kappa(A_S)=\begin{cases}2^n-1,&\text{if $S\in\{\emptyset,Q_n\}$;}\\ 1+\sum_{x=1}^{|S|}\sum_{y=1}^{n-|S|}\binom{n}{x}\binom{n-x}{y},&\text{if $\emptyset\subsetneq S\subsetneq Q_n$.}\end{cases}
    \]

  5. The star of $L_n(a,b)$ has complexity $2^{n-1}+2^{n-2}$.

  6. (a) Restricted product: $\kappa(L_m(a,b,c)L_n(a,b,c))=(m-1)2^n+2^{n-1}$.

     (b) Unrestricted product: $\kappa(L_m(a,b,c)L_n(a,b,c,d))=m2^n+2^{n-1}$.

  7. (a) Restricted boolean operations: For $(m,n)\neq(3,3)$, $\kappa(L_m(a,b)\circ L_n(b,a))=mn$ for all binary boolean operations $\circ$ that depend on both arguments.

     (b) Additionally, when $m\neq n$, $\kappa(L_m(a,b)\circ L_n(a,b))=mn$.

     (c) Unrestricted boolean operations ($\oplus$ denotes symmetric difference):

      \[
      \begin{cases}\kappa(L_m(a,b,-,c)\circ L_n(b,a,-,d))=(m+1)(n+1)\text{ if }\circ\in\{\cup,\oplus\},\\ \kappa(L_m(a,b,-,c)\setminus L_n(b,a))=mn+n,\\ \kappa(L_m(a,b)\cap L_n(b,a))=mn.\end{cases}
      \]

All of these bounds are maximal for deterministic union-free languages.
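Several of these claims can be checked by brute force for small $n$. The sketch below is our own and is not the computation used by the authors; all helper names are assumptions. It covers items 1, 3 and 5: it closes the generators under composition, counts the subsets reachable by preimages from the final state (for a DFA with all states reachable, this subset automaton for the reverse is minimal), and determinizes and minimizes the usual NFA for star.

```python
# Brute-force checks for small n (illustrative sketch, names are assumptions).

def witness(n, letters="abcd"):
    """Transformations of Definition 1 over Q_n, restricted to `letters`."""
    full = {
        "a": (0,) + tuple(range(2, n)) + (1,),  # cycle (1, ..., n-1)
        "b": (1, 0) + tuple(range(2, n)),       # transposition (0, 1)
        "c": (0, 0) + tuple(range(2, n)),       # (1 -> 0)
        "d": tuple(range(n)),                   # identity
    }
    return {x: full[x] for x in letters}

def semigroup_size(gens):
    """Number of transformations induced by non-empty words."""
    seen, todo = set(gens), list(set(gens))
    while todo:
        s = todo.pop()
        for g in gens:
            t = tuple(g[v] for v in s)  # q(sg) = (qs)g
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return len(seen)

def reverse_complexity(delta, n, finals):
    """Subsets reachable from F by taking preimages under the letters."""
    start = frozenset(finals)
    seen, todo = {start}, [start]
    while todo:
        S = todo.pop()
        for t in delta.values():
            P = frozenset(p for p in range(n) if t[p] in S)
            if P not in seen:
                seen.add(P)
                todo.append(P)
    return len(seen)

def minimal_size(start, step, alphabet, is_final):
    """Explore the reachable states, then count classes by refinement."""
    states, todo = [start], [start]
    seen = {start}
    while todo:
        s = todo.pop()
        for x in alphabet:
            r = step(s, x)
            if r not in seen:
                seen.add(r)
                states.append(r)
                todo.append(r)
    cls = {s: int(is_final(s)) for s in states}
    while True:
        sig = {s: (cls[s],) + tuple(cls[step(s, x)] for x in alphabet)
               for s in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        new = {s: ids[sig[s]] for s in states}
        if len(set(new.values())) == len(set(cls.values())):
            return len(set(new.values()))
        cls = new

def star_complexity(n):
    """Complexity of the star of L_n(a, b) via the standard construction."""
    delta = witness(n, "ab")

    def step(S, x):
        # The new initial state "s" copies the transitions of state 0.
        T = {delta[x][0 if q == "s" else q] for q in S}
        if n - 1 in T:  # entering the final state also restarts at 0
            T.add(0)
        return frozenset(T)

    def is_final(S):
        return "s" in S or n - 1 in S

    return minimal_size(frozenset({"s"}), step, "ab", is_final)

for n in (3, 4):
    abc = witness(n, "abc")
    print(n, semigroup_size(list(abc.values())),
          reverse_complexity(abc, n, {n - 1}), star_complexity(n))
```

The printed triples can be compared with $n^n$, $2^n$ and $2^{n-1}+2^{n-2}$; such a check covers only the small values tried and does not replace the proof.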

Proof

All states of $\mathcal{D}_n$ are reachable, since $ba^{q-1}$ leads from state 0 to state $q$ for $1\leqslant q\leqslant n-1$. Only state 0 accepts $ba^{n-2}$, and the shortest word accepted by state $q$, $1\leqslant q\leqslant n-1$, is $a^{n-1-q}$. Hence all the states are distinguishable, and $\mathcal{D}_n$ is minimal. We noted above that it recognizes a deterministic union-free language.

  1. It is well known that the three transformations $a'\colon(0,\dots,n-1)$, $b\colon(0,1)$, and $c\colon(1\to 0)$ generate all $n^n$ transformations of $Q_n$. We have $b$ and $c$ in $\mathcal{D}_n$, and $a'$ is induced by $ab$. Hence our semigroup is maximal.

  2. This is easily verified.

  3. By [10] the number of atoms is the same as the complexity of the reverse. By [22] the complexity of the reverse is $2^n$.

  4. The proof in [5] applies here as well.

  5. We construct an NFA for $(L_n(a,b))^*$ by taking $\mathcal{D}_n(a,b)$ and adding a new initial accepting state $s$ with $s\xrightarrow{a}0$ and $s\xrightarrow{b}1$, and adding new transitions $n-2\xrightarrow{a}0$ and $n-1\xrightarrow{b}0$; then we determinize to get a DFA. For $S\subseteq Q_n$ and $a\in\Sigma$, the transition function of the DFA is given by

    \[
    Sa=\begin{cases}Sa\cup\{0\},&\text{if $n-1\in Sa$;}\\ Sa,&\text{otherwise,}\end{cases}
    \]

    where on the right-hand side $Sa$ denotes the image of $S$ under $a$ in $\mathcal{D}_n(a,b)$.

    We claim that the following states are reachable and pairwise distinguishable: the initial state $\{s\}$, states of the form $\{0\}\cup S$ with $S\subseteq Q_n\setminus\{0\}$, and non-empty states $S$ with $S\subseteq Q_n\setminus\{0,n-1\}$, for a total of $2^{n-1}+2^{n-2}$ states.

    First consider states $\{0\}\cup S$ with $S\subseteq Q_n\setminus\{0\}$. We prove by induction on $|S|$ that all of these states are reachable. In the process, we will also show that $S$ is reachable when $\emptyset\neq S\subseteq Q_n\setminus\{0,n-1\}$. For the base case $|S|=0$, note that we can reach $\{0\}$ from the initial state $\{s\}$ by $a$.

    To reach $\{0\}\cup S$ with $S\subseteq Q_n\setminus\{0\}$ and $|S|>0$, assume we can reach all states $\{0\}\cup T$ with $T\subseteq Q_n\setminus\{0\}$ and $|T|<|S|$. Let $q$ be the minimal element of $S$; then $1\in Sa^{1-q}$. More precisely, if $S=\{q,q_1,q_2,\dotsc,q_k\}$ with $1\leqslant q<q_1<\dotsb<q_k\leqslant n-1$, then $Sa^{1-q}=\{1,q_1-q+1,\dotsc,q_k-q+1\}$. Set $T=Sa^{1-q}\setminus\{1\}$ and note that $|T|<|S|$. By the induction hypothesis, we can reach $\{0\}\cup T$. Apply $b$ to reach either $\{0,1\}\cup T$ (if $n-1\in T$) or $\{1\}\cup T$ (if $n-1\not\in T$). Note that the only way we can have $n-1\in T$ is if $n-1\in S$ and $q=1$. Now apply $a^{q-1}$ to reach either $\{0\}\cup S$ (if $n-1\in S$) or just $S$ (if $n-1\not\in S$). In the latter case, we can apply $a^{n-1}$ to reach $\{0\}\cup S$.

    This shows that if $S\subseteq Q_n\setminus\{0\}$, then $\{0\}\cup S$ is reachable. Furthermore, if $\emptyset\neq S\subseteq Q_n\setminus\{0,n-1\}$ then $S$ is reachable.

    For distinguishability, if $S,T\subseteq Q_n$ and $S\neq T$, let $q$ be an element of the symmetric difference of $S$ and $T$. If $q\neq 0$ then $a^{n-1-q}$ distinguishes $S$ and $T$; if $q=0$ use $ba^{n-2}$. To distinguish the accepting state $\{s\}$ from accepting states $S\subseteq Q_n$, use $b$.

  6. To avoid confusion between the states of $\mathcal{D}_m$ and $\mathcal{D}_n$, we mark the states of $\mathcal{D}_m$ with primes: instead of $Q_m$ we use $Q'_m=\{0',1',2',\dotsc,(m-1)'\}$. In the restricted case, we construct an NFA for $L_m(a,b,c)L_n(a,b,c)$ by taking the disjoint union of $\mathcal{D}_m(a,b,c)$ and $\mathcal{D}_n(a,b,c)$, making state $(m-1)'$ non-final, and adding transitions $(m-2)'\xrightarrow{a}0$ and $(m-1)'\xrightarrow{\sigma}0$ for $\sigma\in\{b,c\}$; then we determinize to get a DFA. The states of this DFA are sets of the form $\{q'\}\cup S$, where $q'\in Q'_m$ and $S\subseteq Q_n$. For $a\in\Sigma$, the transition function is given by

    \[
    (\{q'\}\cup S)a=\begin{cases}\{q'a,0\}\cup Sa,&\text{if $q'a=(m-1)'$;}\\ \{q'a\}\cup Sa,&\text{otherwise.}\end{cases}
    \]

    In the unrestricted case, we use the same construction with $\mathcal{D}_m(a,b,c)$ and $\mathcal{D}_n(a,b,c,d)$, but there are additional reachable states. In the NFA, if we are in subset $\{q'\}\cup S$, then by input $d$ we reach $S$, since $d$ is not in the alphabet of $\mathcal{D}_m(a,b,c)$. So the determinization also has states $S$ where $S\subseteq Q_n$.

    We claim the following states of our DFA for product are reachable and pairwise distinguishable:

    • Restricted case: All states of the form $\{q'\}\cup S$ with $q'\neq(m-1)'$ and $S\subseteq Q_n$, and all states of the form $\{(m-1)',0\}\cup S$ with $S\subseteq Q_n\setminus\{0\}$.

    • Unrestricted case: All states from the restricted case, and all states $S$ where $S\subseteq Q_n$.

    The initial state is $\{0'\}$, and we have

    \[
    \{0'\}\xrightarrow{b}\{1'\}\xrightarrow{a^{m-2}}\{(m-1)',0\}\xrightarrow{a}\{1',0\}\xrightarrow{b}\{0',1\}.
    \]

    That is, $\{0'\}\xrightarrow{ba^{m-1}b}\{0',1\}$. For $0\leqslant k\leqslant n-2$ we have $\{0',1\}\xrightarrow{a^k}\{0',1+k\}$, and $\{0',1\}\xrightarrow{c}\{0',0\}$. Thus all states of the form $\{0',q\}$ for $q\in Q_n$ are reachable from $\{0'\}$, using the set of words $\{x,xa,xa^2,\dotsc,xa^{n-2},xc\}$ where $x=ba^{m-1}b$. Since all of these words except $xc$ induce permutations of $Q_n$, by [12, Theorem 2] all states of the form $\{0'\}\cup S$ with $S\subseteq Q_n$ are reachable. To reach $\{q'\}\cup S$ with $1\leqslant q\leqslant m-2$, note that $a$ fixes $0'$, so we first leave $0'$ by $b$: reach $\{0'\}\cup S(ba^{q-1})^{-1}$ and apply $ba^{q-1}$; since $b$ and $a$ induce permutations of $Q_n$ and the path $0'\to 1'\to\dotsb\to q'$ avoids $(m-1)'$, the result is $\{q'\}\cup S$. To reach $\{(m-1)',0\}\cup S$, reach $\{(m-2)'\}\cup Sa^{-1}$ and apply $a$. In the unrestricted case, we can also reach each state $S$ from $\{0'\}\cup S$ by $d$.

    To see that all of these states are distinguishable, consider two distinct states $X\cup S$ and $Y\cup T$. In the restricted case, $X$ and $Y$ are singleton subsets of $Q'_m$; in the unrestricted case they may be singletons or empty sets. In both cases $S$ and $T$ are arbitrary subsets of $Q_n$. If $S\neq T$, let $q$ be an element of the symmetric difference of $S$ and $T$. If $q\neq 0$ then $a^{n-1-q}$ distinguishes the states; if $q=0$ use $ba^{n-2}$. If $S=T$, then $X\neq Y$ and at least one of $X$ or $Y$ is non-empty. Assume without loss of generality that $Y$ is non-empty, say $Y=\{q'\}$, and assume $X$ is either empty or equal to $\{p'\}$ where $p<q$. We consider several cases:

    (i) If $0\not\in S$, then $a^{m-1-q}$ reduces this case to the case where $S\neq T$.

    (ii) If $0\in S$ and $1\not\in S$, and $\{p',q'\}\neq\{0',1'\}$, then $b$ reduces this to case (i).

    (iii) If $0,1\in S$, and $\{p',q'\}\neq\{0',1'\}$, then $c$ reduces this to case (ii).

    (iv) If $\{p',q'\}=\{0',1'\}$, then $a$ reduces this to case (i), (ii) or (iii).

    This shows that in both the restricted and unrestricted cases, all reachable states are pairwise distinguishable.

  7. (a) A binary boolean operation is proper if it depends on both arguments. For example, $\cup$, $\cap$, $\setminus$ and $\oplus$ are proper, whereas the operation $(L',L)\mapsto L$ is not proper since it depends only on the second argument. Since the transition semigroups of $\mathcal{D}_m$ and $\mathcal{D}_n$ are the symmetric groups $S_m$ and $S_n$, for $m,n\geqslant 5$, Theorem 1 of [2] applies, and all proper binary boolean operations have complexity $mn$. For $(m,n)\in\{(3,4),(4,3),(4,4)\}$ we have verified our claim by computation.

     (b) This holds by [2, Theorem 1] as well.

     (c) The upper bounds for unrestricted boolean operations on regular languages were derived in [7]. The proof that the bounds are tight is very similar to the corresponding proof of Theorem 1 in [7]. For $m,n\geqslant 3$, let $\mathcal{D}'_m(a,b,-,c)$ be the dialect of $\mathcal{D}'_m(a,b,c,d)$ where $c$ plays the role of $d$ and the alphabet is restricted to $\{a,b,c\}$, and let $\mathcal{D}_n(b,a,-,d)$ be the dialect of $\mathcal{D}_n(a,b,c,d)$ in which $a$ and $b$ are permuted, and the alphabet is restricted to $\{a,b,d\}$; see Figure 2.

      Figure 2: Witnesses $\mathcal{D}'_m(a,b,-,c)$ and $\mathcal{D}_n(b,a,-,d)$ for boolean operations.

      Next we complete the two DFAs by adding empty states. Restricting both DFAs to the alphabet $\{a,b\}$ leads us to the problem of determining the complexity of two DFAs over the same alphabet. In the direct product of the two DFAs, by [2, Theorem 1] and computation for the cases $(m,n)\in\{(3,4),(4,3),(4,4)\}$, all $mn$ states of the form $\{p',q\}$, $p'\in Q'_m$, $q\in Q_n$, are reachable and pairwise distinguishable by words in $\{a,b\}^*$ for all proper boolean operations. As shown in Figure 3, the remaining states of the direct product are reachable; hence all $(m+1)(n+1)$ states are reachable.

      The proof of distinguishability of pairs of states in the direct product for the union, intersection and symmetric difference is the same as that in [7]. The proof for difference given in [7] is incorrect, but a corrected version is available in [6]. ∎
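The notion of a proper boolean operation used in part (a) can be checked mechanically. The following is a minimal sketch, not part of the original proof: it models a binary boolean operation as a function on two membership bits (the first bit for $L'$, the second for $L$), and the names `is_proper`, `named`, `second_projection` and `all_binary_operations` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

BITS = (False, True)

def is_proper(op):
    """Return True if the binary boolean function `op` depends on both arguments."""
    depends_on_first = any(op(False, y) != op(True, y) for y in BITS)
    depends_on_second = any(op(x, False) != op(x, True) for x in BITS)
    return depends_on_first and depends_on_second

# The four operations named in the text, acting on membership bits.
named = {
    "union": lambda x, y: x or y,
    "intersection": lambda x, y: x and y,
    "difference": lambda x, y: x and not y,
    "symmetric difference": lambda x, y: x != y,
}

def second_projection(x, y):
    """The operation (L', L) -> L: it ignores its first argument."""
    return y

def all_binary_operations():
    """Yield all 16 binary boolean functions, one per truth table."""
    inputs = list(product(BITS, repeat=2))
    for outputs in product(BITS, repeat=4):
        table = dict(zip(inputs, outputs))
        yield lambda x, y, table=table: table[(x, y)]

proper_count = sum(is_proper(op) for op in all_binary_operations())

for name, op in named.items():
    print(name, is_proper(op))
print("second projection", is_proper(second_projection))
print("proper operations among all 16:", proper_count)
```

Of the 16 binary boolean functions, the two constants and the four functions that depend on a single argument are not proper, which leaves 10 proper operations.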

[Figure 3: part of the direct product for $m=3$, $n=4$. The states are the pairs $(p',q)$ with $p'\in\{0',1',2',\emptyset'\}$ and $q\in\{0,1,2,3,\emptyset\}$; the initial state is $(0',0)$ and the states marked final are those with $p'=2'$ or $q=3$. Edges shown: $(0',0)\xrightarrow{d}(\emptyset',0)$ and $(0',0)\xrightarrow{c}(0',\emptyset)$; $(\emptyset',0)\xrightarrow{a}(\emptyset',1)\xrightarrow{b}(\emptyset',2)\xrightarrow{b}(\emptyset',3)\xrightarrow{b}(\emptyset',1)$; $(0',\emptyset)\xrightarrow{b}(1',\emptyset)\xrightarrow{a}(2',\emptyset)\xrightarrow{a}(1',\emptyset)$; $(\emptyset',3)\xrightarrow{c}(\emptyset',\emptyset)$ and $(2',\emptyset)\xrightarrow{d}(\emptyset',\emptyset)$.]
Figure 3: Direct product for union shown partially.
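The reachability and distinguishability argument for union can be replayed on the small case drawn in Figure 3 ($m=3$, $n=4$). The sketch below is an illustration under stated assumptions, not the authors' computation: the transitions are read off Figure 2, primed states are encoded as integers, `EMPTY` stands for the added empty state, and the helper names (`table`, `complete`, `reachable_product`, `count_classes`) are hypothetical. Distinguishability is decided by Moore-style partition refinement, a standard technique swapped in here for convenience.

```python
EMPTY = "empty"  # assumed name for the empty (sink) state added to each DFA

def table(**maps):
    """Build {(state, letter): target} from per-letter lists of targets."""
    return {(q, x): t for x, targets in maps.items() for q, t in enumerate(targets)}

def complete(delta, states, alphabet):
    """Make `delta` total over `alphabet`; missing moves go to EMPTY."""
    return {(q, x): delta.get((q, x), EMPTY)
            for q in list(states) + [EMPTY] for x in alphabet}

def reachable_product(t1, t2, start, alphabet):
    """States of the direct product reachable from `start`."""
    seen, stack = {start}, [start]
    while stack:
        p, q = stack.pop()
        for x in alphabet:
            nxt = (t1[(p, x)], t2[(q, x)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def count_classes(states, step, is_final, alphabet):
    """Number of classes of indistinguishable states (partition refinement)."""
    block = {s: int(is_final(s)) for s in states}
    while True:
        ids = {}
        new_block = {}
        for s in states:
            signature = (block[s],) + tuple(block[step(s, x)] for x in alphabet)
            new_block[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)  # no block was split: the partition is stable
        block = new_block

# Witnesses of Figure 2 for m = 3 and n = 4 (transitions read off the figure).
delta1 = table(a=[0, 2, 1], b=[1, 0, 2], c=[0, 1, 2])            # D'_3(a,b,-,c)
delta2 = table(a=[1, 0, 2, 3], b=[0, 2, 3, 1], d=[0, 1, 2, 3])   # D_4(b,a,-,d)

alphabet = ["a", "b", "c", "d"]
t1 = complete(delta1, range(3), alphabet)
t2 = complete(delta2, range(4), alphabet)

def step(s, x):
    return (t1[(s[0], x)], t2[(s[1], x)])

def is_final_union(s):
    return s[0] == 2 or s[1] == 3  # final states are 2' and 3

states = reachable_product(t1, t2, (0, 0), alphabet)
union_complexity = count_classes(states, step, is_final_union, alphabet)
print(len(states), union_complexity)
```

For this instance both printed numbers are 20, which equals $(m+1)(n+1)$ for $m=3$, $n=4$.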

4 Conclusions

We have exhibited a single ternary language stream that is a witness for the maximal state complexities of star and reversal of union-free languages. Together with some dialects it also constitutes a witness for union, intersection, difference, symmetric difference, and product when the alphabets of the two operands are the same. As was shown in [15], these bounds are the same as those for regular languages. We have proved that our witness also has the largest syntactic semigroup and most complex atoms, and that these complexities are again the same as those for arbitrary regular languages. By adding to our witness a fourth input that induces the identity transformation, we obtain witnesses for unrestricted binary operations, where the alphabets of the operands are not the same. The bounds here are again the same as those for regular languages. In summary, this shows that the complexity measures proposed in [4] do not distinguish union-free languages from regular languages.

References

  • [1] Afonin, S., Golomazov, D.: Minimal union-free decompositions of regular languages. In: Dediu, A.H., et al. (eds.) LATA 2009. LNCS, vol. 5457, pp. 83–92. Springer (2009)
  • [2] Bell, J., Brzozowski, J.A., Moreira, N., Reis, R.: Symmetric groups and quotient complexity of boolean operations. In: Esparza, J., et al. (eds.) ICALP 2014. LNCS, vol. 8573, pp. 1–12. Springer (2014)
  • [3] Brzozowski, J.A.: Regular Expression Techniques for Sequential Circuits. Ph.D. thesis, Princeton University, Princeton, NJ (1962), http://maveric.uwaterloo.ca/~brzozo/publication.html
  • [4] Brzozowski, J.A.: In search of the most complex regular languages. Int. J. Found. Comput. Sc. 24(6), 691–708 (2013)
  • [5] Brzozowski, J.A., Davies, S.: Quotient complexities of atoms of regular ideal languages. Acta Cybernet. 22, 293–311 (2015)
  • [6] Brzozowski, J.A., Sinnamon, C.: Unrestricted state complexity of binary operations on regular and ideal languages (2016), updated 2017. http://arxiv.org/abs/1609.04439
  • [7] Brzozowski, J.A., Sinnamon, C.: Unrestricted state complexity of binary operations on regular and ideal languages. Journal of Automata, Languages and Combinatorics 22(1–3), 29–59 (2017)
  • [8] Brzozowski, J.A., Szykuła, M.: Large aperiodic semigroups. Int. J. Found. Comput. Sc. 26(7), 913–931 (2015)
  • [9] Brzozowski, J.A., Tamm, H.: Complexity of atoms of regular languages. Int. J. Found. Comput. Sc. 24(7), 1009–1027 (2013)
  • [10] Brzozowski, J.A., Tamm, H.: Theory of átomata. Theoret. Comput. Sci. 539, 13–27 (2014)
  • [11] Crvenković, S., Dolinka, I., Ésik, Z.: On equations for union-free regular languages. Inform. and Comput. 164, 152–172 (2001)
  • [12] Davies, S.: A new technique for reachability of states in concatenation automata (2017), https://arxiv.org/abs/1710.05061
  • [13] Holzer, M., Kutrib, M.: Structure and complexity of some subregular language families. In: Konstantinidis, S., Moreira, N., Reis, R., Shallit, J. (eds.) The Role of Theory in Computer Science, pp. 59–82. World Scientific (2017)
  • [14] Iván, S.: Complexity of atoms, combinatorially. Inform. Process. Lett. 116(5), 356–360 (2016)
  • [15] Jirásková, G., Masopust, T.: Complexity in union-free regular languages. Int. J. Found. Comput. Sc. 22(7), 1639–1653 (2011)
  • [16] Jirásková, G., Nagy, B.: On union-free and deterministic union-free languages. In: Baeten, J.C.M., Ball, T., de Boer, F.S. (eds.) TCS 2012. LNCS, vol. 7604, pp. 179–192. Springer (2012)
  • [17] Kutrib, M., Wendlandt, M.: Concatenation-free languages. Theoretical Computer Science 679(Supplement C), 83–94 (2017)
  • [18] McNaughton, R., Papert, S.: Counter-Free Automata. The MIT Press (1971)
  • [19] Nagy, B.: Union-free regular languages and 1-cycle-free-path-automata. Publ. Math. Debrecen 68(1-2), 183–197 (2006)
  • [20] Nagy, B.: On union complexity of regular languages. In: CINTI 2010. pp. 177–182. IEEE (2010)
  • [21] Pin, J.E.: Syntactic semigroups. In: Handbook of Formal Languages, vol. 1: Word, Language, Grammar, pp. 679–746. Springer, New York, NY, USA (1997)
  • [22] Salomaa, A., Wood, D., Yu, S.: On the state complexity of reversals of regular languages. Theoret. Comput. Sci. 320, 315–329 (2004)
  • [23] Schützenberger, M.: On finite monoids having only trivial subgroups. Inform. and Control 8, 190–194 (1965)