Sweep Complexity Revisited
This work was supported by JSPS KAKENHI Grant Number JP23K10976.
Email: szilard.fazekas@ie.akita-u.ac.jp
Loughborough University, Department of Computer Science. Email: R.G.Mercas@lboro.ac.uk
Abstract
We study the sweep complexity of DFA in one-way jumping mode, answering several questions posed earlier. This measure is the number of times, in the worst case, that such a machine has to return to the beginning of its input after having skipped some of the symbols. The class of languages accepted by these machines strictly includes the regular languages, and constant sweep complexity allows the acceptance of exactly the regular languages. However, we show that there exist machines with higher than constant complexity that still accept only regular languages and that, in general, the sweep complexity of an automaton does not distinguish between accepting regular and non-regular languages. We establish separation results for asymptotic classes defined by this complexity measure and give a surprising exponential/logarithmic relation between factors of certain inputs which can be verified by such machines.
Keywords: automata, deterministic one-way jumping, sweep complexity.
1 Introduction
In roughly the last three decades, several non-classical models of automata have been introduced to study the effect of processing inputs with simple machines in a non-sequential way. Such models include restarting automata [10], jumping automata [12], input revolving automata [4] and automata with translucent letters [13]. However, these models are either strictly more powerful than classical finite automata or accept a language class incomparable with the regular one.
One-way jumping finite automata (OWJFA) were introduced [5] to study the power of deterministic finite automata (DFA) performing non-sequential processing without completely discarding structural information about the inputs à la jumping automata. The resulting model is, in a sense, a minimal extension of finite automata. Machines are specified in exactly the same way as DFA with partial transition functions allowed. The only change is the behaviour of the machine when it encounters a letter for which the current state has no outgoing transition defined. In the classical case such inputs are rejected, but in one-way jumping mode the letters are skipped temporarily to be processed later. The relative order of the skipped symbols is maintained, and the automaton moves back to the beginning after each pass (called a sweep here), seeing only the symbols previously skipped. Therefore one can also view this model as a DFA with an input tape which works as a restricted queue, or as one that reads and erases symbols from a circular tape, always jumping clockwise to the nearest letter for which it has a defined transition from the current state. When the transition function is complete, no symbols are skipped, so the machine behaves as an ordinary DFA, which means that the class of languages accepted by DFA in one-way jumping mode trivially includes all regular languages.
Various properties of the accepted language class [1] and the status of fundamental decidability questions have been settled [2]. More powerful machines with this new processing mode have also been investigated, such as nondeterministic finite automata [3, 6], two-way finite automata [7], pushdown automata and linear bounded automata [6]. While the language classes defined by the models have no nontrivial closure properties under usual language operations, the accepting power and decidability issues raised some intriguing problems.
Except for linear bounded automata, the machine models mentioned above become more powerful when they are allowed to jump to the nearest symbol readable in the current state, which is not surprising. However, it has proven challenging to get a clear picture of just how powerful the new processing mode is, even in the simplest case when one starts from DFA. Such automata can accept all regular languages and the language class defined by them is incomparable with the context-free class, but included in the context-sensitive class and in DTIME() [1]. The separation results make use of combinations of a handful of regular languages together with a very simple type of non-regular languages which contain words having letter counts in a certain ratio, e.g., the frequently used accepted by the machine in Fig 2 (with states , or final). While this was enough to establish virtually all separations of interest, it left a significant gap in our understanding of the model: can such machines accept any (‘interesting’) non-regular languages apart from the ones which establish linear relationships among letter counts?
In this work we answer the question above, building on the investigation of sweep complexity of DFA in one-way jumping mode. Sweep count can be viewed as a measure of the non-regular resources used by a machine, which poses a natural question: how much of this resource is needed to be able to accept non-regular languages? It has been shown that constant sweep complexity does not increase the accepting power of the machines [9] and that superconstant sweep complexity requires cycles containing ‘complementary deficient’ states [8]. In the latter paper it was conjectured that, in fact, any automaton with higher than constant sweep complexity accepts a non-regular language. In Section 3 we refute that conjecture by exhibiting a small DFA accepting a regular language while processing some inputs of length $n$ in $\Theta(\log n)$ sweeps. We also show that there is no non-trivial upper bound on the sweep complexity of regular languages, that is, there are machines with linear complexity accepting regular languages.
A natural question regarding the new complexity measure is whether there exists a meaningful hierarchy which does not collapse to the extremes of constant and linear complexity. The aforementioned example shows that automata with logarithmic complexity exist, which answers another question posed earlier. Furthermore, following the line of computational complexity theory, we set out to explore whether the language classes defined through asymptotic complexity form a true hierarchy, that is, whether there are languages which can be accepted by a machine with $O(f(n))$ complexity but not by any with $o(f(n))$ complexity, for various functions $f$. In Section 4 we demonstrate that such a hierarchy exists by presenting languages with logarithmic and linear sweep complexity, respectively.
Finally we mention that sweep complexity as an idea has been studied in other contexts, too: an interesting and thorough investigation of a similar flavor established infinite hierarchies in terms of sweep count for iterated uniform finite transducers [11], although that model is significantly more powerful than ours, so the techniques used there do not translate here as far as we can tell.
2 Preliminaries
We consider words over a finite alphabet, e.g., . The set of all words over is , which includes the empty word .
A DFA is a quintuple , where is the finite set of states, is the finite input alphabet, , is the transition function, is the start state, and is the set of final states. Elements of are referred to as (transition) rules of and we write instead of . A configuration of is a string in .
A DFA transitions from a configuration to a configuration if and , with , and . By extending the meaning of we denote this by and the reflexive and transitive closure of by . A word is accepted by a DFA if there exists , such that . The language accepted by is .
One-way jumping automata
The one-way jumping relation (denoted by ) between configurations from , was originally defined in [5]. Here we follow the slightly different definition of [8], which does not change the accepting power of the model but is more convenient.
A tuple representing a deterministic right one-way jumping automaton (ROWJFA) is defined the same way as a DFA, where the configurations are also elements of the set . Let such that be the set of all of the letters from for which we have a transition defined from state p. A jumping transition (or jump, for short), denoted , is defined between configurations and if state p cannot read the letter , formally:
A ROWJFA can transition from configuration to configuration , which we denote by , if
A word is accepted by if . The language accepted by is defined by
While some texts define DFA having complete transition functions, our DFA allow partially defined ones. Indeed, the pairs for which no transition is defined enable the ROWJFA to perform a jump as opposed to rejecting the input as a DFA would. Hence, a ROWJFA with a complete transition function is just a DFA.
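The contrast between the two processing modes can be sketched in a few lines of Python. The dictionary encoding of the transition function and all names below (`accepts_classical`, `accepts_jumping`, `EQ`, `PARITY`, the state names) are assumptions of this sketch rather than notation from the paper. `EQ` is a two-state partial automaton which, in jumping mode, accepts exactly the words with equally many a's and b's; `PARITY` is a complete DFA for an even number of a's.

```python
from itertools import product

def accepts_classical(delta, start, finals, word):
    """Classical DFA mode: an undefined transition rejects the input."""
    state = start
    for letter in word:
        if (state, letter) not in delta:
            return False
        state = delta[(state, letter)]
    return state in finals

def accepts_jumping(delta, start, finals, word):
    """One-way jumping mode: unreadable letters are skipped and retried
    in a later pass, keeping their relative order."""
    state = start
    remaining = list(word)
    while remaining:
        skipped = []
        for letter in remaining:
            if (state, letter) in delta:
                state = delta[(state, letter)]
            else:
                skipped.append(letter)
        if len(skipped) == len(remaining):
            # Nothing was read in a whole pass: the machine is stuck.
            return False
        remaining = skipped
    return state in finals

# Hypothetical example automata (names and encoding are assumptions).
EQ = {("q0", "a"): "q1", ("q1", "b"): "q0"}            # partial
PARITY = {("e", "a"): "o", ("e", "b"): "e",
          ("o", "a"): "e", ("o", "b"): "o"}            # complete

print(accepts_classical(EQ, "q0", {"q0"}, "ba"))   # rejected by the DFA
print(accepts_jumping(EQ, "q0", {"q0"}, "ba"))     # accepted when jumping
```

With the complete automaton `PARITY` the two functions agree on every input, since no letter is ever skipped.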
Sweeps are contiguous sequences of transitions on a given input, consisting of the steps from reading or jumping over the leftmost remaining input letter to reading or jumping over the rightmost one. If a position is jumped over, then the input symbol in that position is processed in a later sweep. The number of sweeps needed to process the whole input is the number of times the automaton reaches the last position of the original input word or, equivalently, one more than the maximum number of times any position is jumped over.
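Counting sweeps can be sketched as follows. This is a minimal illustration, assuming a dictionary encoding of the transition function; the function name `run`, the automaton `EQ` and its state names are hypothetical and not taken from the paper. A pass in which no letter can be read is treated as a rejection, since the machine would jump forever.

```python
def run(delta, start, finals, word):
    """Simulate a ROWJFA and return (accepted, number_of_sweeps)."""
    state = start
    remaining = list(word)
    sweeps = 0
    while remaining:
        sweeps += 1
        skipped = []
        for letter in remaining:
            target = delta.get((state, letter))
            if target is None:
                skipped.append(letter)      # jumped over, kept for later
            else:
                state = target              # read and erased
        if len(skipped) == len(remaining):
            return False, sweeps            # stuck: no letter is readable
        remaining = skipped
    return state in finals, sweeps

# Hypothetical two-state automaton for words with equally many a's and b's.
EQ = {("q0", "a"): "q1", ("q1", "b"): "q0"}

for w in ["abab", "aabb", "baab"]:
    print(w, run(EQ, "q0", {"q0"}, w))
```

On `baab` the first letter is jumped over twice, so the word needs three sweeps, in line with the "one more than the maximum number of jumps over a position" characterization.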
For an intuitive picture of sweeps, consider the computation of a ROWJFA on input as a table with rows representing the sweeps needed to process and columns representing positions in the input word. Cell in the table contains either a letter or a symbol representing that the letter has been read, e.g., . Once a letter has been marked read and erased it stays that way, so each column is a word of the form () for some and .
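The table view can be reproduced with a short sketch. The marker `-` for an erased position, the function name `sweep_table` and the example automaton `EQ` are assumptions made for illustration; row i shows the tape at the start of sweep i.

```python
def sweep_table(delta, start, word, read_mark="-"):
    """Return one row per sweep: the tape contents when the sweep begins.
    read_mark must not be a letter of the input alphabet."""
    state = start
    tape = list(word)
    rows = []
    while any(cell != read_mark for cell in tape):
        rows.append("".join(tape))
        progressed = False
        for i, cell in enumerate(tape):
            if cell != read_mark and (state, cell) in delta:
                state = delta[(state, cell)]
                tape[i] = read_mark         # the letter is read and erased
                progressed = True
        if not progressed:
            break                           # stuck: the input is rejected
    return rows

# Hypothetical two-state automaton for words with equally many a's and b's.
EQ = {("q0", "a"): "q1", ("q1", "b"): "q0"}

for row in sweep_table(EQ, "q0", "baab"):
    print(row)
```

Reading the output column by column, each column is a run of one letter followed by a run of markers, as described above.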
Example 1
To analyze the boundary between regular and non-regular languages accepted by the one-way jumping model, and to quantify the use of resources beyond the capabilities of classical DFA, the following complexity measure was proposed [8]. It gives the number of sweeps performed by a machine in the ‘worst case’ for an input of length $n$.
Let be a ROWJFA and , and let
be the computation of on the input . Sweep consists of , and we say that sweep ends in configuration . Then, for all , if sweep ends in configuration , then sweep is the sequence of configurations . The last sweep ends in configuration , that is, when all input symbols have been read. We define
When , then we set . The sweep complexity of a machine is a function , with being the maximum number of sweeps makes on processing inputs of length , formally:
In a sense the “most non-regular” word (using the largest amount of non-classical resources) of each length is considered. With this in mind, we can define complexity classes in the usual manner: the class consists of languages accepted by some one-way jumping machine with sweep complexity .
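For small lengths the measure can be evaluated by brute force. The sketch below assumes that only accepted words contribute (rejected words count as 0 sweeps, which is how we read the convention above); the names `sweeps_if_accepted`, `sweep_complexity` and the automaton `EQ` are hypothetical.

```python
from itertools import product

def sweeps_if_accepted(delta, start, finals, word):
    """Number of sweeps on an accepted word, and 0 for a rejected one."""
    state = start
    remaining = list(word)
    sweeps = 0
    while remaining:
        sweeps += 1
        skipped = []
        for letter in remaining:
            if (state, letter) in delta:
                state = delta[(state, letter)]
            else:
                skipped.append(letter)
        if len(skipped) == len(remaining):
            return 0                        # stuck, hence rejected
        remaining = skipped
    return sweeps if state in finals else 0

def sweep_complexity(delta, start, finals, alphabet, n):
    """Maximum sweep count over all words of length n (exponential in n)."""
    return max(
        sweeps_if_accepted(delta, start, finals, "".join(w))
        for w in product(alphabet, repeat=n)
    )

# Hypothetical two-state automaton for words with equally many a's and b's.
EQ = {("q0", "a"): "q1", ("q1", "b"): "q0"}

for n in range(7):
    print(n, sweep_complexity(EQ, "q0", {"q0"}, "ab", n))
```

The enumeration grows as the alphabet size to the power n, so it is only meant for experimenting with small automata.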
Observe that the sweep complexity of a machine can be defined to also take into account the sweep count of rejected words. However, this would allow one to ‘artificially’ increase the sweep complexity of machines with complexity without affecting regularity. Let be a machine accepting a regular language and a non-regular language with sweep complexities and , respectively, such that . Then we can construct a ROWJFA accepting with sweep complexity by adding a new initial state from which reading takes us to the initial state of while reading takes us to the initial state of . We set all states of non-final and this way we get that on inputs starting with the machine performs ’s computations but never accepts anything. Moreover, is regular if and only if is (see Fig. 5).
Each machine considered up to the point when the above measures were introduced [8] had either constant or linear (the maximal possible) sweep complexity, so it seemed that there was a gap between the two. Moreover, the examples with linear complexity accepted non-regular languages, while, as the theorem below states, the constant complexity languages are exactly the regular languages.
Theorem 2.1 ([9])
ROWJFA with sweep complexity $O(1)$ accept regular languages.
The sufficient condition above was conjectured to be also necessary for regularity in general, evidenced by the known examples at that point.
Next, we investigate the apparent gap between constant and linear complexities and show that the presumed condition above is not necessary for regularity. Our search for machines with non-constant sweep complexity is directed by the following structural lemma, which says that such machines need to have two ‘complementary deficient states’ in a cycle.
Lemma 1 ([8])
If a ROWJFA has sweep complexity $\omega(1)$, then its state diagram has a closed walk containing states $p$ and $q$ such that, for some letters $a \neq b$, $p$ has no transition defined for $a$, while $q$ has no transition for $b$.
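The structural condition of the lemma can be tested mechanically. The sketch below is one possible reading of it, under the assumption that "on a closed walk" means the two states are mutually reachable by non-empty paths and that the two missing letters differ; the function name and the example automata are hypothetical, and unreachable states are not filtered out.

```python
def has_complementary_deficient_pair(delta, states, alphabet):
    """True if two mutually reachable states lack transitions on
    two different letters (assumed reading of the lemma)."""
    succ = {p: {delta[(p, a)] for a in alphabet if (p, a) in delta}
            for p in states}
    # reach[p]: states reachable from p by a path of length >= 1
    reach = {}
    for p in states:
        seen, stack = set(), list(succ[p])
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(succ[q])
        reach[p] = seen
    missing = {p: {a for a in alphabet if (p, a) not in delta}
               for p in states}
    for p in states:
        for q in states:
            if q in reach[p] and p in reach[q]:
                if any(a != b for a in missing[p] for b in missing[q]):
                    return True
    return False

# Hypothetical examples.
EQ = {("q0", "a"): "q1", ("q1", "b"): "q0"}             # partial, cyclic
PARITY = {("e", "a"): "o", ("e", "b"): "e",
          ("o", "a"): "e", ("o", "b"): "o"}             # complete
CHAIN = {("s", "a"): "t"}                               # partial, acyclic

print(has_complementary_deficient_pair(EQ, ["q0", "q1"], "ab"))
print(has_complementary_deficient_pair(PARITY, ["e", "o"], "ab"))
print(has_complementary_deficient_pair(CHAIN, ["s", "t"], "ab"))
```

Since the lemma gives a necessary condition only, a negative answer guarantees constant sweep complexity, whereas a positive answer says nothing by itself.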
3 Regular languages with non-constant sweep complexity
In this section we show that there is no sweep complexity separation between regular and non-regular languages by exhibiting automata which accept regular languages while requiring a superconstant number of sweeps.
Consider first the automaton with states where is initial and final, and transitions are , described in Fig. 5.
Proposition 1
is regular.
Proof
We claim that . This is obviously a regular language (i.e., Fig. 8 where is the final state).
The computation for a word is rejecting if it finishes in either or . However, the only time that the machine ends up in state is when it reads an odd number of ’s, and, similarly, it ends in when it reads an odd number of ’s. Since both of these types of words are rejected, we conclude. ∎
Theorem 3.1
The sweep complexity of is $\Theta(\log n)$.
Proof
Firstly, observe that in any sweep, while in or , the automaton fully reads any block of ’s, and, similarly, while in or , the automaton fully reads any block of ’s. Thus, the number of sweeps necessary to process a word consisting of unary blocks is never higher than that of processing the word . Now, for the inputs (and ), starting with the first (respectively, ) every third symbol is jumped over while the rest is read. This means that from an arbitrary word with unary blocks, after one sweep at most blocks remain. This immediately gives us that the machine makes at most logarithmically many sweeps. As for the other side, consider an input . Per the previous argument, after sweeps the remaining input will be or depending on the parity of , so the number of sweeps is at least . Eventually, the input is accepted according to Proposition 1, so the sweep complexity of is also .∎
The above results showcase the existence of ROWJFAs that accept regular languages while performing a logarithmic number of sweeps. Next we construct a ROWJFA that accepts a regular language while requiring a linear number of sweeps in the worst case. Consider the automaton in Fig. 6 defined as
where the transitions from are given by the edges in the figure.
Proposition 2
The sweep complexity of is $\Theta(n)$.
Proof
To see that the complexity is , consider the word , for . In this case, from we go first to where we jump over all the remaining ’s, then we move back to where we jump over all the remaining ’s, and we are left with to process. After the th sweep, we are only left with to process, which takes us from to , and we accept.
For the complexity, observe that the above computation is indeed the longest possible. Once we reach we either accept or reject a word in at most sweeps, same as in Theorem 3.1. Of course, this part also directly follows from the fact that all ROWJFA process their inputs in sweeps.∎
Proposition 3
is regular.
Proof
We show that . This is obviously a regular language (i.e., Fig. 8 where is the final state).
To show that is indeed the language containing every binary word that has an odd number of ’s and ’s, first note that the right-hand side automaton, consisting only of the -labelled states, accepts every word that has an even number of ’s and ’s, as shown by Proposition 1.
To reach we have to read exactly one and one starting from either or . Since from the start state we can reach or by processing an even number of ’s and ’s, possibly with jumps, our conclusion follows. ∎
As a consequence of Propositions 2 and 3, we know that the class of regular languages has no upper bound in terms of sweep complexity, since the sweep complexity of any ROWJFA is in $O(n)$. The left-hand cycle in the automaton described in Fig. 6 also showcases that while the conditions from Lemma 1 are necessary for non-regularity (as it requires superconstant complexity), they are not sufficient.
4 Separation results for the language classes and
Consider the prolongable morphism , starting from the word . We get , , etc. The infinite word is a fixed point of . It is easy to see that in all ’s stand alone, that is, we never have blocks of ’s longer than , and the lengths of the blocks of ’s are , and so on (the sequence given by the lengths of blocks is A001511 in OEIS; its most relevant characterization for us is that is the number of trailing zeros in the binary expansion of , since this means that is for powers of ). When applying , each introduces a new block of ’s of length and extends a block of ’s by one, while the number of ’s doubles. Thus every other block of ’s gets longer by one on each application of , because of the preceding it. A simple inductive argument shows that the last block of ’s in has length , and is preceded by occurrences of ’s, separated by blocks of ’s.
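The block structure described above can be checked experimentally. The definition of the morphism is not legible in this copy, so the sketch below assumes the morphism a ↦ abab, b ↦ b iterated from the word a; this is merely one morphism that matches the stated properties and should not be read as the paper's definition. The names `PHI`, `iterate`, `b_blocks` and `ruler` are likewise hypothetical. The function `ruler` computes A001511(n), the 2-adic valuation of 2n.

```python
import re

# ASSUMPTION: an illustrative morphism consistent with the description,
# not necessarily the one used in the paper.
PHI = {"a": "abab", "b": "b"}

def apply_morphism(word):
    return "".join(PHI[letter] for letter in word)

def iterate(k, start="a"):
    """Apply the morphism k times to the start word."""
    word = start
    for _ in range(k):
        word = apply_morphism(word)
    return word

def b_blocks(word):
    """Lengths of the maximal blocks of b's, from left to right."""
    return [len(block) for block in re.findall(r"b+", word)]

def ruler(n):
    """A001511(n): one more than the number of trailing zeros of n."""
    return (n & -n).bit_length()

for k in range(1, 5):
    print(k, b_blocks(iterate(k)))
```

Under this assumption the k-th iterate contains 2^k isolated a's, all blocks of b's except the last follow the ruler sequence, and the last block has length k.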
Lemma 2
Consider the morphism given by , . The following statements hold for any :
(i) ;
(ii) if , then ;
(iii) , where , and for all .
Proof
When , then , so for all three claims hold. Suppose they hold for . By and we have that has the form , satisfying for . Then,
From this we can conclude that also holds for . Further, by the equation above we have with . Finally, because of we also get that and for all . ∎
In what follows we analyze the language accepted by the automaton , described in Fig. 8.
Lemma 3
For any , the ROWJFA accepts in sweeps.
Proof
We show that the machine accepts , for any . From state after reading/jumping through a factor of the form the automaton gets back to state . In fact, , for any , so in one sweep the factor is reduced to . From Lemma 2 we can see that we can write , which means that one sweep of acts as the inverse of on those words when starting from state , that is,
This means that in sweeps the machine reduces to . Finally, for , we have , which is accepted by in a single sweep.∎
Lemma 4
The ROWJFA accepts a non-regular language.
Proof
By Lemma 3 we know that for any the machine accepts , which means that for arbitrarily long unary factors consisting of ’s, there is some word in having such a factor as a suffix. Our strategy is to first establish a non-linear relation between the length of those unary factors and the length of the preceding factors in all words accepted by . Then, by a pumping argument we show that a classical finite automaton cannot verify such a non-linear relation, therefore cannot be regular.
Claim 1. Words of the form are only accepted if .
Proof of Claim 1:
In any sweep, any block of ’s which the automaton starts to read is read and erased completely through a sequence of transitions . For the automaton to jump over a block of ’s, it needs to arrive at its start in state . Then it jumps over it to the next , after which it starts and reads completely the following block of ’s, as described earlier. This means that the machine can never jump over two consecutive blocks of ’s. From here we get that if at the beginning of the sweep the number of blocks was , then after the sweep it is at most .
Furthermore, in each sweep, each block of ’s is reduced by at most . This means that the automaton needs at least sweeps to read a block , in each of which it reduces the number of blocks by half (or more). Thus we can conclude that in order to accept a word with a suffix , we have to start out with at least blocks of ’s preceding it.
Claim 2. No finite automaton can accept .
Proof of Claim 2:
Suppose the opposite, i.e., that there exists some complete DFA having states such that . We know that there are words in the language with arbitrarily long suffixes of ’s, so there is a for some word and exponent . By a usual pumping argument, this means that there exists some with such that for any . However, for a large enough this contradicts Claim 1, as the block of ’s can outgrow any upper bound in terms of the length of .
The lemma follows from Claims 1 and 2.∎
Lemma 5
The sweep complexity of is $\Theta(\log n)$.
Proof
As , by Lemma 3 we have that the sweep complexity of is , so what remains to show is that it is also .
We first note that within a sweep all blocks of ’s separated by are fully processed (including any prefix of ’s), while for any symbols that were jumped over, the entire block they were part of was jumped over. Following the argument in the proof of Claim 1 of Lemma 4, in each sweep the number of blocks of ’s is reduced by at least half, which means that after sweeps there are no more blocks of on the tape. Then, the machine either accepts in one sweep or rejects the input. This leads to our conclusion.∎
Theorem 4.1
Proof
Lemma 6
Any automaton which accepts has sweep complexity .
Proof
We know that every machine has sweep complexity , so it is enough to show that it is not possible to accept with sublinear sweep complexity. For that we assume that such an automaton, say exists, and derive a contradiction.
If had linear sweep complexity, then it could have computations on infinitely many inputs in which all sweeps process a constant number of symbols. However, with sublinear complexity we get that for any constant and for all long enough inputs , during the processing of at least one sweep reads more than symbols. We also know that for any . Let where is the number of states of and consider an input with large enough that the machine reads more than symbols in some sweep while processing . The remaining input at the beginning of that sweep is for some such that . During the sweep the machine reads where . This means that either or . Without loss of generality we can assume . This gives us that while reading the automaton must visit some state at least twice while reading only ’s, so we get that for some . But then, by a usual pumping argument the machine also needs to accept contradicting our assumption that and concluding the proof.∎
Theorem 4.2
For any with we have .
Proof
By Lemma 6 we know that for any sublinear function . The two-state automaton accepts the language with sweep complexity . This is easy to see when considering the worst-case inputs of the form for .∎
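Linear growth of the sweep count can be observed on a concrete two-state machine. The sketch assumes the automaton with transitions q0 –a→ q1 and q1 –b→ q0 (q0 initial and final), which accepts the words with equally many a's and b's; the state names, the function `run` and the choice of the input families a^n b^n and b^n a^n are assumptions, since the worst-case family is not legible in this copy.

```python
def run(delta, start, finals, word):
    """Simulate a ROWJFA and return (accepted, number_of_sweeps)."""
    state = start
    remaining = list(word)
    sweeps = 0
    while remaining:
        sweeps += 1
        skipped = []
        for letter in remaining:
            if (state, letter) in delta:
                state = delta[(state, letter)]
            else:
                skipped.append(letter)
        if len(skipped) == len(remaining):
            return False, sweeps            # stuck, hence rejected
        remaining = skipped
    return state in finals, sweeps

# Assumed two-state automaton (state names are hypothetical).
EQ = {("q0", "a"): "q1", ("q1", "b"): "q0"}

for n in range(1, 6):
    _, s1 = run(EQ, "q0", {"q0"}, "a" * n + "b" * n)
    _, s2 = run(EQ, "q0", {"q0"}, "b" * n + "a" * n)
    print(f"n={n}: a^n b^n -> {s1} sweeps, b^n a^n -> {s2} sweeps")
```

Each sweep on these inputs removes at most one a and one b, so the number of sweeps grows linearly with the input length.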
5 Concluding remarks
Apart from the complexity considerations listed below we think the proof of Lemma 4 contains a detail worth emphasizing: the automaton can verify a logarithmic/exponential relation between two factors of suitably chosen inputs! We found this very surprising since we still basically deal with DFA which cannot store information and cannot ‘choose’ which symbols to read or jump over (iterated uniform finite transducers [11] can also verify such relationships, although their computing power is much greater).
We presented automata for all pairings of regular and non-regular languages with logarithmic and linear worst case sweep complexity. This way we disproved the conjecture on the constant sweep requirement for regularity [9] and answered several questions regarding sweep complexity posed in [8]:
1. Is the language of each machine with superconstant complexity non-regular? NO, by Section 3.
2.
3. Is there a language with sweep complexity between constant and linear, that is, all machines accepting it have superconstant complexity and at least one has sublinear? YES, by Theorem 4.1.
4.
5. Are machines less complex in the case of binary alphabets, given that the complementary deficient pairs of Lemma 1 are predetermined? NO, illustrated by the fact that all results have been obtained over a binary alphabet.
These coarser forms of Questions 2 and 3 have been answered here, but for a complete picture one would want to know whether there exist machines with arbitrary (constructible) sublinear complexity and its equivalent for languages. The most obvious choices for such a study would probably be complexities and , for constants and . Another angle related to Question 4 is to study the lower bound of non-regularity: logarithmic complexity can produce non-regular languages, but can we do it with less of this ‘non-regular’ resource? In the case of Question 5, our answer may be refined, as there may be some sublinear such that the machines of complexity all accept regular or all accept non-regular languages, although we have not seen anything that indicates such perplexing behaviour.
Another interesting direction relates to our original motivation in looking at the complexity of these automata, deciding regularity. The question more generally becomes, is it decidable given a machine or language and a function , whether the machine/language has complexity (or its one-sided variants with and )? We suspect that the answer is yes at least in the case of constant and linear functions but have no idea about the logarithmic and more complicated cases.
References
- [1] Beier, S., Holzer, M.: Properties of right one-way jumping finite automata. Theoretical Computer Science 798, 78–94 (2019)
- [2] Beier, S., Holzer, M.: Decidability of right one-way jumping finite automata. International Journal of Foundations of Computer Science 31(6), 805–825 (2020)
- [3] Beier, S., Holzer, M.: Nondeterministic right one-way jumping finite automata. Information and Computation 284, 104687 (2022), selected papers from DCFS 2019
- [4] Bensch, S., Bordihn, H., Holzer, M., Kutrib, M.: On input-revolving deterministic and nondeterministic finite automata. Information and Computation 207(11), 1140–1155 (2009)
- [5] Chigahara, H., Fazekas, S.Z., Yamamura, A.: One-way jumping finite automata. International Journal of Foundations of Computer Science 27(3), 391–405 (2016)
- [6] Fazekas, S.Z., Hoshi, K., Yamamura, A.: The effect of jumping modes on various automata models. Natural Computing (2021)
- [7] Fazekas, S.Z., Hoshi, K., Yamamura, A.: Two-way deterministic automata with jumping mode. Theoretical Computer Science 864, 92–102 (2021)
- [8] Fazekas, S.Z., Mercaş, R., Wu, O.: Complexities for jumps and sweeps. J. Autom. Lang. Comb. 27(1-3), 131–149 (2022)
- [9] Fazekas, S.Z., Yamamura, A.: On regular languages accepted by one-way jumping finite automata. In: 8th Workshop on Non-Classical Models of Automata and Applications, Short Papers. pp. 7–14 (2016)
- [10] Jančar, P., Mráz, F., Plátek, M., Vogel, J.: Restarting automata. In: Reichel, H. (ed.) Fundamentals of Computation Theory. pp. 283–292. Springer Berlin Heidelberg, Berlin, Heidelberg (1995)
- [11] Kutrib, M., Malcher, A., Mereghetti, C., Palano, B.: Descriptional complexity of iterated uniform finite-state transducers. Information and Computation 284, 104691 (2022)
- [12] Meduna, A., Zemek, P.: Jumping finite automata. International Journal of Foundations of Computer Science 23(7), 1555–1578 (2012)
- [13] Nagy, B., Otto, F.: Finite-state acceptors with translucent letters. In: BILC 2011 - 1st International Workshop on AI Methods for Interdisciplinary Research in Language and Biology, ICAART 2011. pp. 3–13 (2011)