Waterloo, ON, Canada N2L 3G1
brzozo@uwaterloo.ca
Department of Pure Mathematics, University of Waterloo
Waterloo, ON, Canada N2L 3G1
sldavies@uwaterloo.ca
Most Complex Deterministic Union-Free Regular Languages
This work was supported by the Natural Sciences and Engineering Research Council of Canada grant No. OGP0000871.
Abstract
A regular language is union-free if it can be represented by a regular expression without the union operation. A union-free language is deterministic if it can be accepted by a deterministic one-cycle-free-path finite automaton; this is an automaton which has one final state and exactly one cycle-free path from any state to the final state. Jirásková and Masopust proved that the state complexities of the basic operations reversal, star, product, and boolean operations in deterministic union-free languages are exactly the same as those in the class of all regular languages. To prove that the bounds are met they used five types of automata, involving eight types of transformations of the set of states of the automata. We show that for each $n \ge 3$ there exists one ternary witness of state complexity $n$ that meets the bound for reversal and product. Moreover, the restrictions of this witness to binary alphabets meet the bounds for star and boolean operations. We also show that the tight upper bounds on the state complexity of binary operations that take arguments over different alphabets are the same as those for arbitrary regular languages. Furthermore, we prove that the maximal syntactic semigroup of a union-free language has $n^n$ elements, as in the case of regular languages, and that the maximal state complexities of atoms of union-free languages are the same as those for regular languages. Finally, we prove that there exists a most complex union-free language that meets the bounds for all these complexity measures. Altogether this proves that the complexity measures above cannot distinguish union-free languages from regular languages.
Keywords: atom, boolean operation, concatenation, different alphabets, most complex, one-cycle-free-path, regular, reversal, star, state complexity, syntactic semigroup, transition semigroup, union-free
1 Introduction
Formal definitions are postponed until Section 2.
The class of regular languages over a finite alphabet $\Sigma$ is the smallest class of languages containing the empty language $\emptyset$, the language $\{\varepsilon\}$, where $\varepsilon$ is the empty word, and the letter languages $\{a\}$ for each $a \in \Sigma$, and closed under the operations of union, concatenation, and (Kleene) star. Hence each regular language can be written as a finite expression involving the above basic languages and operations. An expression defining a regular language in this way is called a regular expression. Because regular languages are also closed under complementation, we may also consider regular expressions that allow complementation, which are called extended regular expressions. In this paper we deal exclusively with regular languages.
A natural question is: what kind of languages are defined if one of the operations in the definitions given above is missing? If the star operation is removed from the extended regular expressions we get the well-known star-free languages [8, 18, 23], which have been extensively studied. Less attention was given to classes defined by removing an operation from ordinary regular expressions, but recently language classes defined without union or concatenation have been studied.
If we remove some operations from regular expressions, we obtain the following classes of languages:
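- Union only: subsets of $\Sigma \cup \{\varepsilon\}$.
- Concatenation only: $\emptyset$ and $\{w\}$ for each $w \in \Sigma^*$.
- Star only: $\emptyset$, $\{\varepsilon\}$, $\{a\}$ for each $a \in \Sigma$, and $\{a\}^*$ for each $a \in \Sigma$.
- Union and Concatenation: finite languages.
- Concatenation and Star: these are the union-free languages that constitute the main topic of this paper.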
- Union and Star
Union-free regular languages were first considered by Brzozowski [3] in 1962 under the name star-dot regular languages, where dot stands for concatenation. He proved that every regular language is a union of union-free languages [3, p. 216, Theorem 9.5] (terminology changed to that of the present paper). Much more recently, in 2001, Crvenković, Dolinka and Ésik [11] studied equations satisfied by union-free regular languages, and proved that the class of these languages cannot be axiomatized by a finite set of equations. This is also known to be true for the class of all regular languages. In 2006 Nagy studied union-free languages in detail and characterized them in terms of nondeterministic finite automata (NFAs) recognizing them [19], which he called one-cycle-free-path NFAs. In 2009 minimal union-free decompositions of regular languages were studied in [1] by Afonin and Golomazov. They also presented a new algorithm for deciding whether a given deterministic finite automaton (DFA) accepts a union-free language. Decompositions of regular languages in terms of union-free languages were further studied by Nagy in 2010 [20]. The state complexities of operations on union-free languages were examined in 2011 by Jirásková and Masopust [15], who proved that the state complexities of basic operations on these languages are the same as those in the class of all regular languages. It was shown in [15] that the class of languages defined by DFAs with the one-cycle-free-path property is a proper subclass of that defined by one-cycle-free-path NFAs; the former class is called the class of deterministic union-free languages. In 2012 Jirásková and Nagy [16] proved that the class of finite unions of deterministic union-free languages is a proper subclass of the class of regular languages. They also showed that every deterministic union-free language is accepted by a special kind of a one-cycle-free-path DFA called a balloon DFA.
A summary of the properties of union-free languages was presented in 2017 in [13].
2 Preliminaries
Let be a regular language. We define the alphabet of to be the set of letters which appear at least once in a word of . For example, consider the language and the subset ; we say has alphabet and has alphabet .
A deterministic finite automaton (DFA) is a quintuple $D = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite non-empty set of states, $\Sigma$ is a finite non-empty alphabet, $\delta \colon Q \times \Sigma \to Q$ is the transition function, $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is the set of final states. We extend $\delta$ to functions $\delta \colon Q \times \Sigma^* \to Q$ and $\delta \colon 2^Q \times \Sigma^* \to 2^Q$ as usual (where $2^Q$ denotes the set of all subsets of $Q$). A DFA $D$ accepts a word $w \in \Sigma^*$ if $\delta(q_0, w) \in F$. The language accepted by $D$ is the set of all words accepted by $D$, and is denoted by $L(D)$. If $q$ is a state of $D$, then the language $L_q$ of $q$ is the language accepted by the DFA $(Q, \Sigma, \delta, q, F)$. A state is empty (or dead or a sink state) if its language is empty. Two states $p$ and $q$ of $D$ are equivalent if $L_p = L_q$. A state $q$ is reachable if there exists $w \in \Sigma^*$ such that $\delta(q_0, w) = q$. A DFA $D$ is minimal if it has the smallest number of states among all DFAs accepting $L(D)$. We say a DFA $D$ has a minimal alphabet if its alphabet is equal to the alphabet of $L(D)$. It is well known that a DFA with a minimal alphabet is minimal if and only if all of its states are reachable and no two states are equivalent.
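The minimality criterion above (all states reachable, no two states equivalent) can be checked mechanically. The following Python sketch is our own illustration, not code from the paper. It assumes a complete DFA whose states are $0, 1, \dots, n-1$ and encodes each letter by the tuple of images of the states under that letter; the function names `reachable_states` and `minimal_size` are hypothetical. It discards unreachable states and then merges equivalent states by Moore-style partition refinement.

```python
def reachable_states(delta, initial):
    """States reachable from `initial`; delta[x][q] is the image of state q under letter x."""
    seen, stack = {initial}, [initial]
    while stack:
        q = stack.pop()
        for t in delta.values():
            if t[q] not in seen:
                seen.add(t[q])
                stack.append(t[q])
    return seen


def minimal_size(delta, initial, finals):
    """Number of states of the minimal DFA accepting the same language."""
    states = sorted(reachable_states(delta, initial))
    letters = sorted(delta)
    # Initial partition: final states (block 1) versus non-final states (block 0).
    block = {q: int(q in finals) for q in states}
    while True:
        # Two states stay together iff they share a block and every letter sends
        # them into the same block; the new partition refines the old one.
        key = {q: (block[q],) + tuple(block[delta[x][q]] for x in letters)
               for q in states}
        ids = {}
        new_block = {q: ids.setdefault(key[q], len(ids)) for q in states}
        if len(ids) == len(set(block.values())):
            return len(ids)  # no block was split: states in one block are equivalent
        block = new_block


# A 4-state DFA over {a} accepting the words of even length: states 0 and 2
# (and likewise 1 and 3) are equivalent, so the minimal DFA has 2 states.
print(minimal_size({"a": (1, 2, 3, 0)}, 0, {0, 2}))  # 2
```

The number returned is also the number of distinct quotients of the language, since each class of equivalent reachable states corresponds to one quotient.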
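A nondeterministic finite automaton (NFA) is a quintuple $N = (Q, \Sigma, \delta, I, F)$, where $Q$, $\Sigma$ and $F$ are as in a DFA, $\delta \colon Q \times \Sigma \to 2^Q$, and $I \subseteq Q$ is the set of initial states. Each triple $(p, a, q)$ with $p, q \in Q$, $a \in \Sigma$ is a transition if $q \in \delta(p, a)$. A sequence $((p_0, a_0, q_0), (p_1, a_1, q_1), \dots, (p_{k-1}, a_{k-1}, q_{k-1}))$ of transitions, where $p_{i+1} = q_i$ for $i = 0, \dots, k-2$, is a path in $N$. The word $a_0 a_1 \cdots a_{k-1}$ is the word spelled by the path. A word $w$ is accepted by $N$ if there exists a path with $p_0 \in I$ and $q_{k-1} \in F$ that spells $w$. If $q \in \delta(p, a)$ we also use the notation $p \xrightarrow{a} q$. We extend this notation also to words, and write $p \xrightarrow{w} q$ for $w \in \Sigma^*$.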
The state complexity of a regular language $L$, denoted by $\kappa(L)$, is the number of states in the minimal DFA accepting $L$. Henceforth we frequently refer to state complexity as simply complexity, and we denote a language of complexity $n$ by $L_n$, and a DFA with $n$ states by $D_n$.
The state complexity of a regularity-preserving unary operation $\circ$ on regular languages is the maximal value of $\kappa(L^\circ)$, expressed as a function of one parameter $n$, where $L$ varies over all regular languages with complexity at most $n$. For example, the state complexity of the reversal operation is $2^n$; it is known that if $L$ has complexity at most $n$, then $\kappa(L^R) \le 2^n$, and furthermore this upper bound is tight in the sense that for each $n \ge 2$ there exists a language $L_n$ such that $\kappa(L_n^R) = 2^n$. In general, to show that an upper bound on $\kappa(L^\circ)$ is tight, we need to exhibit a sequence $(L_k, L_{k+1}, \dots)$, called a stream, of languages of each complexity $n \ge k$ (for some small constant $k$) that meet this upper bound. Often we are not interested in the special-case behaviour of the operation that may occur at very small values of $n$; the parameter $k$ allows us to ignore these small values and simplify the statements of results.
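The reversal bound $2^n$ can be observed on small examples. The Python sketch below is our illustration, not the paper's construction. It uses the three classical generators of the full transformation monoid (an $n$-cycle, a transposition, and a transformation of rank $n-1$) as the letters $a$, $b$, $c$ of a hypothetical witness with final state 0; this is an assumption chosen for illustration and is not the DFA $D_n$ of Definition 1. States are $0, \dots, n-1$ and each letter is the tuple of images of the states.

```python
def classic_witness(n):
    """Hypothetical letters a: (0, 1, ..., n-1), b: (0, 1), c: (n-1 -> 0) as image tuples."""
    a = tuple((q + 1) % n for q in range(n))
    b = (1, 0) + tuple(range(2, n))
    c = tuple(0 if q == n - 1 else q for q in range(n))
    return {"a": a, "b": b, "c": c}


def reversal_complexity(delta, finals):
    """Count the subsets reached by the subset construction on the reversed DFA.

    The reversed automaton starts in the set of final states, and reading a
    letter x replaces a subset S by its preimage {p : delta[x][p] in S}.
    If every state of the original DFA is reachable, the reached subsets are
    pairwise distinguishable (the fact behind Brzozowski's double-reversal
    minimization), so the count is the complexity of the reverse language.
    """
    n = len(next(iter(delta.values())))
    start = frozenset(finals)
    seen, stack = {start}, [start]
    while stack:
        S = stack.pop()
        for t in delta.values():
            pre = frozenset(p for p in range(n) if t[p] in S)
            if pre not in seen:
                seen.add(pre)
                stack.append(pre)
    return len(seen)


for n in range(3, 7):
    print(n, reversal_complexity(classic_witness(n), {0}))  # prints n and 2**n
```

Because these three letters generate every transformation of the state set, every subset arises as a preimage of $\{0\}$, which is why the count reaches $2^n$; the empty subset is included, as it is the empty state of the reversed DFA.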
The state complexity of a regularity-preserving binary operation $\circ$ on regular languages is the maximal value of $\kappa(L' \circ L)$, expressed as a function of two parameters $m$ and $n$, where $L'$ varies over all regular languages of complexity at most $m$ and $L$ varies over all regular languages of complexity at most $n$. In this case, to show an upper bound on the state complexity is tight, we need to exhibit two classes $(L'_{m,n})$ and $(L_{m,n})$ of languages meeting the bound; the notation $L'_{m,n}$ and $L_{m,n}$ implies that $L'$ and $L$ depend on both $m$ and $n$. However, in most cases studied in the literature, it is enough to use witness streams $(L'_m)$ and $(L_n)$, where $L'_m$ is independent of $n$ and $L_n$ is independent of $m$.
For binary operations we consider two types of state complexity: restricted and unrestricted state complexity. For restricted state complexity the operands of the binary operations are required to have the same alphabet. For unrestricted state complexity the alphabets of the operands may differ. See [7] for more details.
Sometimes the same stream can be used for both operands of a binary operation, but this is not always possible. For example, for boolean operations when $m = n$, the state complexity of $L_n \circ L_n$ is at most $n$, whereas the upper bound is $mn = n^2$. However, in many cases the second language is a "dialect" of the first, that is, it "differs only slightly" from the first. A dialect of $L$ is a language obtained from $L$ by deleting some letters of $\Sigma$ in the words of $L$ (by this we mean that words containing these letters are deleted) or replacing them by letters of another alphabet $\Sigma'$. In this paper we consider only the cases where $\Sigma' \subseteq \Sigma$, and we encounter only two types of dialects:
1. A dialect in which some letters were deleted; for example, is a dialect of with deleted, and is a dialect with deleted.
2. A dialect in which the roles of two letters are exchanged; for example, is such a dialect of .
These two types of dialects can be combined, for example, in the letter is deleted, and plays the role that played originally. The notion of dialects also extends to DFAs; for example, if recognizes then recognizes the dialect .
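At the level of DFAs, both kinds of dialect only relabel or drop letters. The Python sketch below is our own illustration; the encoding (each letter mapped to the tuple of images of the states), the 3-state example DFA `D`, and the function names `dialect` and `accepts` are hypothetical. A deleted letter simply has no transition in the dialect, so every word containing it is rejected, matching the convention that such words are deleted.

```python
def dialect(delta, roles):
    """Dialect of a DFA given as {letter: tuple of images of the states}.

    `roles` maps each letter of the dialect to the original letter whose
    transformation it takes over. Original letters not used as roles are
    deleted: the dialect has no transition on them.
    """
    return {new: delta[old] for new, old in roles.items()}


def accepts(delta, initial, finals, word):
    q = initial
    for x in word:
        if x not in delta:  # letter outside the dialect's alphabet: word deleted
            return False
        q = delta[x][q]
    return q in finals


# Hypothetical 3-state DFA over {a, b, c} with initial and final state 0.
D = {"a": (1, 2, 0), "b": (1, 0, 2), "c": (0, 1, 0)}
swapped = dialect(D, {"a": "b", "b": "a", "c": "c"})  # roles of a and b exchanged
no_c = dialect(D, {"a": "a", "b": "b"})                # letter c deleted
print(accepts(D, 0, {0}, "aaa"), accepts(swapped, 0, {0}, "bbb"), accepts(no_c, 0, {0}, "aac"))
# True True False
```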
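We use $Q_n = \{0, 1, \dots, n-1\}$ as our basic set with $n$ elements. A transformation $t$ of $Q_n$ is a mapping $t \colon Q_n \to Q_n$. The image of $q \in Q_n$ under $t$ is denoted by $qt$, and this notation is extended to subsets of $Q_n$. The preimage of $q \in Q_n$ under $t$ is the set $qt^{-1} = \{p \in Q_n : pt = q\}$, and this notation is extended to subsets $S$ of $Q_n$ as follows: $St^{-1} = \{p \in Q_n : pt \in S\}$. The rank of a transformation $t$ is the cardinality of $Q_n t$. If $s$ and $t$ are transformations of $Q_n$, their composition is denoted $st$ and we have $q(st) = (qs)t$ for $q \in Q_n$. The $k$-fold composition $tt \cdots t$ (with $k$ occurrences of $t$) is denoted $t^k$, and for $S \subseteq Q_n$ we define $St^{-k} = S(t^k)^{-1}$. Let $\mathcal{T}_{Q_n}$ be the set of all transformations of $Q_n$; then $\mathcal{T}_{Q_n}$ is a monoid under composition.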
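For $k \ge 2$, a transformation $t$ of a set $P = \{q_0, q_1, \dots, q_{k-1}\} \subseteq Q_n$ is a $k$-cycle if $q_0t = q_1, q_1t = q_2, \dots, q_{k-2}t = q_{k-1}, q_{k-1}t = q_0$. This $k$-cycle is denoted by $(q_0, q_1, \dots, q_{k-1})$, and leaves the states in $Q_n \setminus P$ unchanged. A 2-cycle $(q_0, q_1)$ is called a transposition. A transformation that sends state $p$ to $q$ and acts as the identity on the remaining states is denoted by $(p \to q)$. The identity transformation is denoted by 1.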
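The left-to-right convention $q(st) = (qs)t$ is easy to get backwards in code. This Python sketch is our own illustration: a transformation of $Q_n$ is stored as the tuple of images $(0t, 1t, \dots, (n-1)t)$, and the helper names (`compose`, `power`, `preimage`, `rank`, `cycle`, `send`) are hypothetical.

```python
def compose(s, t):
    """Composition st in the left-to-right convention q(st) = (qs)t."""
    return tuple(t[s[q]] for q in range(len(s)))


def power(t, k):
    """t^k, with t^0 the identity."""
    result = tuple(range(len(t)))
    for _ in range(k):
        result = compose(result, t)
    return result


def preimage(t, S):
    """S t^{-1} = {p : pt in S}."""
    return frozenset(p for p in range(len(t)) if t[p] in S)


def rank(t):
    """Cardinality of the image of t."""
    return len(set(t))


def cycle(n, *qs):
    """The k-cycle (q_0, ..., q_{k-1}) on {0, ..., n-1}; other states are fixed."""
    t = list(range(n))
    for i, q in enumerate(qs):
        t[q] = qs[(i + 1) % len(qs)]
    return tuple(t)


def send(n, p, q):
    """The transformation (p -> q): p goes to q, every other state is fixed."""
    t = list(range(n))
    t[p] = q
    return tuple(t)


a, c = cycle(4, 0, 1, 2, 3), send(4, 3, 0)
print(compose(a, c), compose(c, a))                   # (1, 2, 0, 0) (1, 2, 3, 1)
print(rank(compose(a, c)), sorted(preimage(c, {0})))  # 3 [0, 3]
```

The two compositions differ, which is why the order of letters in a word matters when reading off the transformation it induces.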
Let $D = (Q_n, \Sigma, \delta, 0, F)$ be a DFA. For each word $w \in \Sigma^*$, the transition function induces a transformation $\delta_w$ of $Q_n$ by $w$: for all $q \in Q_n$, $q\delta_w = \delta(q, w)$. The set $T_D$ of all such transformations by non-empty words is the transition semigroup of $D$ under composition. Often we use the word $w$ to denote the transformation it induces; thus we write $qw$ instead of $q\delta_w$. We also write $w \colon t$ to mean that $w$ induces the transformation $t$.
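The size of the syntactic semigroup of a regular language is another measure of the complexity of the language [4]. Write $\Sigma^+$ for $\Sigma\Sigma^*$. The syntactic congruence of a language $L \subseteq \Sigma^*$ is defined on $\Sigma^+$ as follows: for $x, y \in \Sigma^+$, $x \approx_L y$ if and only if $wxz \in L \Leftrightarrow wyz \in L$ for all $w, z \in \Sigma^*$. The quotient set $\Sigma^+/{\approx_L}$ of equivalence classes of $\approx_L$ is a semigroup, the syntactic semigroup $T_L$ of $L$. The syntactic semigroup is isomorphic to the transition semigroup of the minimal DFA of $L$ [21].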
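Since the syntactic semigroup is isomorphic to the transition semigroup of the minimal DFA, its size can be computed by closing the letter transformations under composition. The Python sketch below is our own illustration; it assumes the DFA is already minimal, encodes each letter as the tuple of images of the states $0, \dots, n-1$, and uses hypothetical names `transition_semigroup` and `generators`. The example letters (an $n$-cycle, a transposition, and $(n-1 \to 0)$) are the classical generators of all $n^n$ transformations, not the letters of Definition 1.

```python
def transition_semigroup(delta):
    """Transformations induced by all non-empty words (breadth-first closure).

    A word wx induces "first w, then x", so repeatedly right-multiplying the
    frontier by letter transformations eventually produces every element.
    """
    letters = list(delta.values())
    semigroup, frontier = set(letters), list(letters)
    while frontier:
        new = []
        for s in frontier:
            for t in letters:
                st = tuple(t[s[q]] for q in range(len(s)))  # q(st) = (qs)t
                if st not in semigroup:
                    semigroup.add(st)
                    new.append(st)
        frontier = new
    return semigroup


def generators(n):
    """Hypothetical example letters: (0, ..., n-1), (0, 1) and (n-1 -> 0)."""
    a = tuple((q + 1) % n for q in range(n))
    b = (1, 0) + tuple(range(2, n))
    c = tuple(0 if q == n - 1 else q for q in range(n))
    return {"a": a, "b": b, "c": c}


for n in (3, 4):
    print(n, len(transition_semigroup(generators(n))))  # prints 3 27, then 4 256
print(len(transition_semigroup({"a": generators(4)["a"]})))  # cyclic group of order 4: 4
```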
The (left) quotient of $L \subseteq \Sigma^*$ by a word $w \in \Sigma^*$ is the language $w^{-1}L = \{x : wx \in L\}$. It is well known that the number of quotients of a regular language is finite and equal to the state complexity of the language.
The atoms of a regular language $L$ are defined by a left congruence, where two words $x$ and $y$ are congruent whenever $ux \in L$ if and only if $uy \in L$ for all $u \in \Sigma^*$. Thus $x$ and $y$ are congruent whenever $x \in u^{-1}L$ if and only if $y \in u^{-1}L$ for all $u \in \Sigma^*$. An equivalence class of this relation is an atom of $L$ [10]. Atoms can be expressed as non-empty intersections of complemented and uncomplemented quotients of $L$. The number of atoms and their state complexities were suggested as measures of complexity of regular languages [4] because all quotients of a language and all quotients of its atoms are unions of atoms [9, 10, 14].
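In a minimal DFA the quotients of $L$ are exactly the languages of the states, and a word $w$ lies in the quotient of state $q$ if and only if $qw$ is final. Hence the atom containing $w$ is determined by the set $S_w = \{q : qw \in F\}$: it is the intersection of the quotients of the states in $S_w$ with the complements of the remaining quotients. The Python sketch below (our illustration; the function name `atoms` and the tuple encoding of letters are assumptions, and the DFA is assumed minimal) lists the distinct sets $S_w$ by enumerating the transformations of all words, the empty word included.

```python
def atoms(delta, finals):
    """Signatures S_w = {q : qw in F} of the atoms of a minimal DFA's language.

    Each distinct signature is one atom; the empty signature stands for the
    atom in which every quotient is complemented.
    """
    n = len(next(iter(delta.values())))
    identity = tuple(range(n))  # transformation induced by the empty word
    monoid, frontier = {identity}, [identity]
    while frontier:
        new = []
        for s in frontier:
            for t in delta.values():
                st = tuple(t[s[q]] for q in range(n))  # first s, then the letter
                if st not in monoid:
                    monoid.add(st)
                    new.append(st)
        frontier = new
    return {frozenset(q for q in range(n) if w[q] in finals) for w in monoid}


# (aa)* has the two quotients (aa)* and a(aa)*, and exactly two atoms.
print(sorted(sorted(S) for S in atoms({"a": (1, 0)}, {0})))  # [[0], [1]]
```

When the letters generate all transformations of $Q_n$, every subset of $Q_n$ occurs as a signature, giving $2^n$ atoms.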
3 Main Results
The automata described in [19] that characterize union-free languages are called there one-cycle-free-path automata. They are defined by the property that there is only one final state and a unique cycle-free path from each state to the final state. We are now ready to define a most complex deterministic one-cycle-free-path DFA and its most complex deterministic union-free language.
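The defining property can be tested by brute force on small automata. The following Python sketch is our own illustration, not an algorithm from [19] or [15]. It assumes a possibly partial DFA given as a map from each state to a dictionary `{letter: target}` (missing entries mean there is no transition, so a dead state is simply omitted), and it counts labelled paths. Two letters leading from $p$ to $q$ therefore give two different paths. The function name is hypothetical, and the exhaustive search is exponential in general.

```python
def is_one_cycle_free_path(delta, finals):
    """Exactly one final state f, and exactly one cycle-free path from every state to f."""
    if len(finals) != 1:
        return False
    (f,) = finals
    states = set(delta) | {f} | {p for row in delta.values() for p in row.values()}

    def paths_to_f(q, visited):
        if q == f:
            return 1  # only the empty path: any continuation would revisit f
        return sum(paths_to_f(p, visited | {p})
                   for p in delta.get(q, {}).values() if p not in visited)

    return all(paths_to_f(q, {q}) == 1 for q in states)


# (ab)*ac: from each state there is a single cycle-free path to the final state 2.
print(is_one_cycle_free_path({0: {"a": 1}, 1: {"b": 0, "c": 2}}, {2}))  # True
# {a, b}: two cycle-free paths (labelled a and b) lead from 0 to 1.
print(is_one_cycle_free_path({0: {"a": 1, "b": 1}}, {1}))  # False
```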
The most complex stream below meets all of our complexity bounds. However, our witness uses three letters for restricted product whereas [15] uses binary witnesses. The same shortcoming of most complex streams occurs in the case of regular languages [4]; that seems to be the price of getting a witness for all operations rather than minimizing the alphabet for each operation.
Definition 1
For , let , where , and is defined by the transformations , , , and ; see Figure 1. Let be the language accepted by .
The DFA of Definition 1 bears some similarities to the DFA for reversal in Fig. 6 in [15, p. 1650]. It is evident that it is a one-cycle-free-path DFA. Let . One verifies that
Noting that $(E \cup F)^* = (E^*F^*)^*$ for all regular expressions $E$, $F$, we obtain a union-free expression for $L_n$.
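The identity $(E \cup F)^* = (E^*F^*)^*$, which removes a union from under a star, can be spot-checked by brute force. In the Python sketch below (ours), $E = ab$ and $F = c$ are arbitrary illustrative choices, not the expressions used for $L_n$, and the function name `agree_up_to` is hypothetical. The check uses the standard `re` module and compares both expressions on every word up to a length bound, so it can refute an identity but does not prove one.

```python
import re
from itertools import product


def agree_up_to(pattern1, pattern2, alphabet, max_len):
    """True iff both regular expressions accept the same words of length <= max_len."""
    for k in range(max_len + 1):
        for letters in product(alphabet, repeat=k):
            w = "".join(letters)
            if bool(re.fullmatch(pattern1, w)) != bool(re.fullmatch(pattern2, w)):
                return False
    return True


# (E ∪ F)* versus (E*F*)* with E = ab and F = c.
print(agree_up_to(r"(?:ab|c)*", r"(?:(?:ab)*c*)*", "abc", 8))  # True
# Without the outer star the two differ, e.g. on the word cab.
print(agree_up_to(r"(?:ab|c)*", r"(?:ab)*c*", "abc", 8))  # False
```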
Theorem 3.1 (Most Complex Deterministic Union-Free Languages)
For each $n \ge 3$, the DFA $D_n$ of Definition 1 is minimal and recognizes a deterministic union-free language. The stream $(L_n : n \ge 3)$ with some dialect streams is most complex in the class of deterministic union-free languages in the following sense:
1. The syntactic semigroup of $L_n$ has cardinality $n^n$, and at least three letters are required to reach this bound.
2. Each quotient of $L_n$ has complexity $n$.
3. The reverse of $L_n$ has complexity $2^n$. Moreover, $L_n$ has $2^n$ atoms.
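4. Each atom $A_S$ of $L_n$ has maximal complexity:
$$\kappa(A_S) = \begin{cases} 2^n - 1, & \text{if } S \in \{\emptyset, Q_n\};\\ 1 + \sum_{x=1}^{|S|} \sum_{y=1}^{n-|S|} \binom{n}{x}\binom{n-x}{y}, & \text{if } \emptyset \subsetneq S \subsetneq Q_n. \end{cases}$$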
5. The star of $L_n$ has complexity $2^{n-1} + 2^{n-2}$.
6. (a) Restricted product: complexity $(m-1)2^n + 2^{n-1}$.
   (b) Unrestricted product: complexity $m2^n + 2^{n-1}$.
7. (a) Restricted boolean operations: For , for all binary boolean operations that depend on both arguments.
   (b) Additionally, when , .
   (c) Unrestricted boolean operations ($\oplus$ denotes symmetric difference):

All of these bounds are maximal for deterministic union-free languages.
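The star bound in item 5 concerns the determinized NFA described in the proof of item 5. That NFA has a new initial accepting state that copies the transitions of the old initial state, plus an extra transition to the initial state whenever a transition enters a final state. The Python sketch below is our illustration of that construction for an arbitrary complete DFA (states $0, \dots, n-1$, each letter encoded as the tuple of images of the states). It is not the authors' code. The names `star_complexity`, `minimize_count` and the label `"start"` for the new initial state are hypothetical. The sketch determinizes with the subset construction and then counts distinguishable states.

```python
def minimize_count(states, letters, step, accepting):
    """Number of pairwise distinguishable states (Moore refinement); all states reachable."""
    block = {s: int(s in accepting) for s in states}
    while True:
        key = {s: (block[s],) + tuple(block[step[s, x]] for x in letters) for s in states}
        ids = {}
        new_block = {s: ids.setdefault(key[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = new_block


def star_complexity(delta, initial, finals):
    """Complexity of L* for the language L of a complete DFA."""
    letters = sorted(delta)
    finals = frozenset(finals)
    START = "start"  # hypothetical label for the subset {s}, s the new initial state

    def move(S, x):
        T = {delta[x][q] for q in S}
        if T & finals:  # entering a final state lets the NFA restart L
            T.add(initial)
        return frozenset(T)

    states, stack, step = {START}, [START], {}
    while stack:
        S = stack.pop()
        for x in letters:
            # From the new initial state the NFA behaves like the old initial state.
            T = move({initial} if S == START else S, x)
            step[S, x] = T
            if T not in states:
                states.add(T)
                stack.append(T)
    accepting = {S for S in states if S == START or S & finals}
    return minimize_count(states, letters, step, accepting)


print(star_complexity({"a": (1, 0)}, 0, {0}))     # (aa)* is its own star: 2
print(star_complexity({"a": (1, 2, 2)}, 0, {1}))  # {a}* = a*: 1
```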
Proof
Only state 0 accepts , and the shortest word accepted by state , , is . Hence all the states are distinguishable, and is minimal. We noted above that it recognizes a deterministic union-free language.
1. It is well known that the three transformations , , and generate all transformations of $Q_n$. We have and in , and is generated by . Hence our semigroup is maximal.
2. This is easily verified.
3.
4. The proof in [5] applies here as well.
5. We construct an NFA for by taking and adding a new initial accepting state with and , and adding new transitions and ; then we determinize to get a DFA. For and , the transition function of the DFA is given by
We claim that the following states are reachable and pairwise distinguishable: the initial state , states of the form with , and non-empty states with , for a total of states.
First consider states with . We prove by induction on that all of these states are reachable. In the process, we will also show that is reachable when . For the base case , note that we can reach from the initial state by .
To reach with and , assume we can reach all states with and . Let be the minimal element of ; then . More precisely, if with , then . Set and note that . By the induction hypothesis, we can reach . Apply to reach either (if ) or (if ). Note that the only way we can have is if and . Now apply to reach either (if ) or just (if ). In the latter case, we can apply to reach .
This shows that if , then is reachable. Furthermore, if then is reachable.
For distinguishability, if and , let be an element of the symmetric difference of and . If then distinguishes and ; if use . To distinguish the accepting state from accepting states , use .
6. To avoid confusion between the states of and , we mark the states of with primes: instead of we use . In the restricted case, we construct an NFA for by taking the disjoint union of and , making state non-final, and adding transitions and for ; then we determinize to get a DFA. The states of this DFA are sets of the form , where and . For , the transition function is given by
In the unrestricted case, we use the same construction with and , but there are additional reachable states. In the NFA, if we are in subset , then by input we reach , since is not in the alphabet of . So the determinization also has states where .
We claim the following states of our DFA for product are reachable and pairwise distinguishable:
- Restricted case: All states of the form with and , and all states of the form with .
- Unrestricted case: All states from the restricted case, and all states where .
The initial state is , and we have
That is, . For we have , and . Thus all states of the form for are reachable from , using the set of words where . Since all of these words are permutations of except for , by [12, Theorem 2] all states of the form with are reachable. To reach with , reach and apply . To reach , reach and apply . In the unrestricted case, we can also reach each state from by .
To see all of these states are distinguishable, consider two distinct states and . In the restricted case, and are singleton subsets of ; in the unrestricted case they may be singletons or empty sets. In both cases and are arbitrary subsets of . If , let be an element of the symmetric difference of and . If then distinguishes the states; if use . If , then and at least one of or is non-empty. Assume without loss of generality that is non-empty, say , and assume is either empty or equal to where . We consider several cases:
(i) If , then reduces this case to the case where .
(ii) If and , and , then reduces this to case (i).
(iii) If , and , then reduces this to case (ii).
(iv) If , then reduces this to case (i), (ii) or (iii).
This shows that in both the restricted and unrestricted cases, all reachable states are pairwise distinguishable.
7. (a) A binary boolean operation is proper if it depends on both arguments. For example, , , and are proper, whereas the operation is not proper since it depends only on the second argument. Since the transition semigroups of and are the symmetric groups $S_m$ and $S_n$, for , Theorem 1 of [2] applies, and all proper binary boolean operations have complexity $mn$. For we have verified our claim by computation.
(b) This holds by [2, Theorem 1] as well.
(c) The upper bounds for unrestricted boolean operations on regular languages were derived in [7]. The proof that the bounds are tight is very similar to the corresponding proof of Theorem 1 in [7]. For , let be the dialect of where plays the role of and the alphabet is restricted to , and let be the dialect of in which and are permuted, and the alphabet is restricted to ; see Figure 2.
Figure 2: Witnesses and for boolean operations.
Next we complete the two DFAs by adding empty states. Restricting both DFAs to the alphabet leads us to the problem of determining the complexity of two DFAs over the same alphabet. In the direct product of the two DFAs, by [2, Theorem 1] and computation for the cases , all states of the form , , , are reachable and pairwise distinguishable by words in for all proper boolean operations. As shown in Figure 3, the remaining states of the direct product are reachable; hence all states are reachable.
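Items 6 and 7 of the proof rest on two constructions: the subset construction for product, and the direct product of DFAs completed with empty states for boolean operations. The Python sketch below is our illustration of both under stated assumptions, not the authors' code. DFAs are complete and are given as `(delta, initial, finals)` with `delta[x]` the tuple of images of the states $0, \dots, n-1$. Product is handled only in the restricted (same-alphabet) case. For boolean operations, each operand gets a new empty state $n$ for the letters missing from its alphabet. All function names are hypothetical. Both functions explore the reachable states and then count the distinguishable ones.

```python
import operator


def minimize_count(states, letters, step, accepting):
    """Number of pairwise distinguishable states (Moore refinement); all states reachable."""
    block = {s: int(s in accepting) for s in states}
    while True:
        key = {s: (block[s],) + tuple(block[step[s, x]] for x in letters) for s in states}
        ids = {}
        new_block = {s: ids.setdefault(key[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = new_block


def explore(start, letters, move):
    """Reachable states and transition table of a deterministic automaton."""
    states, stack, step = {start}, [start], {}
    while stack:
        s = stack.pop()
        for x in letters:
            t = move(s, x)
            step[s, x] = t
            if t not in states:
                states.add(t)
                stack.append(t)
    return states, step


def product_complexity(d1, d2):
    """Complexity of L'L in the restricted case (both DFAs use the same letters)."""
    (delta1, i1, f1), (delta2, i2, f2) = d1, d2
    letters = sorted(delta1)

    def enter(p, S):
        # Whenever D' is in a final state, D may start on the rest of the input.
        return (p, S | {i2}) if p in f1 else (p, S)

    def move(state, x):
        p, S = state
        return enter(delta1[x][p], frozenset(delta2[x][q] for q in S))

    states, step = explore(enter(i1, frozenset()), letters, move)
    accepting = {s for s in states if s[1] & set(f2)}
    return minimize_count(states, letters, step, accepting)


def boolean_complexity(d1, d2, op):
    """Complexity of the boolean combination op of L' and L; alphabets may differ."""
    (delta1, i1, f1), (delta2, i2, f2) = d1, d2
    letters = sorted(set(delta1) | set(delta2))

    def complete(delta):
        n = len(next(iter(delta.values())))  # state n is the added empty state
        return {x: delta[x] + (n,) if x in delta else (n,) * (n + 1) for x in letters}

    c1, c2 = complete(delta1), complete(delta2)
    states, step = explore((i1, i2), letters, lambda s, x: (c1[x][s[0]], c2[x][s[1]]))
    accepting = {(p, q) for p, q in states if op(p in f1, q in f2)}
    return minimize_count(states, letters, step, accepting)


even = ({"a": (1, 0)}, 0, {0})       # words of even length over {a}
triple = ({"a": (1, 2, 0)}, 0, {0})  # lengths divisible by 3
print(boolean_complexity(even, triple, operator.or_))  # 6
a_star, b_star = ({"a": (0,)}, 0, {0}), ({"b": (0,)}, 0, {0})
print(boolean_complexity(a_star, b_star, operator.or_))  # 4: a* ∪ b* over {a, b}
single_a = ({"a": (1, 2, 2)}, 0, {1})  # the language {a}
print(product_complexity(single_a, single_a))  # 4: {aa} needs 4 states
```

The second example shows why operands over different alphabets can exceed $mn$: here $m = n = 1$, but each operand needs an empty state over the joint alphabet.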
4 Conclusions
We have exhibited a single ternary language stream that is a witness for the maximal state complexities of star and reversal of union-free languages. Together with some dialects it also constitutes a witness for union, intersection, difference, symmetric difference, and product in case the alphabets of the two operands are the same. As was shown in [15] these bounds are the same as those for regular languages. We prove that our witness also has the largest syntactic semigroup and most complex atoms, and that these complexities are again the same as those for arbitrary regular languages. By adding a fourth input inducing the identity transformation to our witness we obtain witnesses for unrestricted binary operations, where the alphabets of the operands are not the same. The bounds here are again the same as those for regular languages. In summary, this shows that the complexity measures proposed in [4] do not distinguish union-free languages from regular languages.
References
- [1] Afonin, S., Golomazov, D.: Minimal union-free decompositions of regular languages. In: Dediu, A.H., et al. (eds.) LATA 2009. LNCS, vol. 5457, pp. 83–92. Springer (2009)
- [2] Bell, J., Brzozowski, J.A., Moreira, N., Reis, R.: Symmetric groups and quotient complexity of boolean operations. In: Esparza, J., et al. (eds.) ICALP 2014. LNCS, vol. 8573, pp. 1–12. Springer (2014)
- [3] Brzozowski, J.A.: Regular Expression Techniques for Sequential Circuits. Ph.D. thesis, Princeton University, Princeton, NJ (1962), http://maveric.uwaterloo.ca/~brzozo/publication.html
- [4] Brzozowski, J.A.: In search of the most complex regular languages. Int. J. Found. Comput. Sc. 24(6), 691–708 (2013)
- [5] Brzozowski, J.A., Davies, S.: Quotient complexities of atoms of regular ideal languages. Acta Cybernet. 22, 293–311 (2015)
- [6] Brzozowski, J.A., Sinnamon, C.: Unrestricted state complexity of binary operations on regular and ideal languages (2016), updated 2017. http://arxiv.org/abs/1609.04439
- [7] Brzozowski, J.A., Sinnamon, C.: Unrestricted state complexity of binary operations on regular and ideal languages. Journal of Automata, Languages and Combinatorics 22(1–3), 29–59 (2017)
- [8] Brzozowski, J.A., Szykuła, M.: Large aperiodic semigroups. Int. J. Found. Comput. Sc. 26(7), 913–931 (2015)
- [9] Brzozowski, J.A., Tamm, H.: Complexity of atoms of regular languages. Int. J. Found. Comput. Sc. 24(7), 1009–1027 (2013)
- [10] Brzozowski, J.A., Tamm, H.: Theory of átomata. Theoret. Comput. Sci. 539, 13–27 (2014)
- [11] Crvenković, S., Dolinka, I., Ésik, Z.: On equations for union-free regular languages. Inform. and Comput. 164, 152–172 (2001)
- [12] Davies, S.: A new technique for reachability of states in concatenation automata (2017), https://arxiv.org/abs/1710.05061
- [13] Holzer, M., Kutrib, M.: Structure and complexity of some subregular language families. In: Konstantinidis, S., Moreira, N., Reis, R., Shallit, J. (eds.) The Role of Theory in Computer Science, pp. 59–82. World Scientific (2017)
- [14] Iván, S.: Complexity of atoms, combinatorially. Inform. Process. Lett. 116(5), 356–360 (2016)
- [15] Jirásková, G., Masopust, T.: Complexity in union-free regular languages. Int. J. Found. Comput. Sc. 22(7), 1639–1653 (2011)
- [16] Jirásková, G., Nagy, B.: On union-free and deterministic union-free languages. In: Baeten, J.C.M., Ball, T., de Boer, F.S. (eds.) TCS 2012. LNCS, vol. 7604, pp. 179–192. Springer (2012)
- [17] Kutrib, M., Wendlandt, M.: Concatenation-free languages. Theoretical Computer Science 679(Supplement C), 83–94 (2017)
- [18] McNaughton, R., Papert, S.: Counter-Free Automata. The MIT Press (1971)
- [19] Nagy, B.: Union-free regular languages and 1-cycle-free-path-automata. Publ. Math. Debrecen 68(1-2), 183–197 (2006)
- [20] Nagy, B.: On union complexity of regular languages. In: CINTI 2010. pp. 177–182. IEEE (2010)
- [21] Pin, J.E.: Syntactic semigroups. In: Handbook of Formal Languages, vol. 1: Word, Language, Grammar, pp. 679–746. Springer, New York, NY, USA (1997)
- [22] Salomaa, A., Wood, D., Yu, S.: On the state complexity of reversals of regular languages. Theoret. Comput. Sci. 320, 315–329 (2004)
- [23] Schützenberger, M.: On finite monoids having only trivial subgroups. Inform. and Control 8, 190–194 (1965)