
1 Akita University, Department of Mathematical Science and Electrical-Electronic-Computer Engineering, szilard.fazekas@ie.akita-u.ac.jp
2 Loughborough University, Department of Computer Science, R.G.Mercas@lboro.ac.uk

Sweep Complexity Revisited

This work was supported by JSPS KAKENHI Grant Number JP23K10976.

Szilárd Zsolt Fazekas (1; 0001-5319-0395) and Robert Mercaş (2; 0001-6034-433X)
Abstract

We study the sweep complexity of DFA in one-way jumping mode answering several questions posed earlier. This measure is the number of times in the worst case that such machines have to return to the beginning of their input after having skipped some of the symbols. The class of languages accepted by these machines strictly includes the regular class and constant sweep complexity allows exactly the acceptance of regular languages. However, we show that there exist machines with higher than constant complexity still only accepting regular languages and that in general the sweep complexity of an automaton does not distinguish between accepting regular and non-regular languages. We establish separation results for asymptotic classes defined by this complexity measure and give a surprising exponential/logarithmic relation between factors of certain inputs which can be verified by such machines.

Keywords: automata, deterministic one-way jumping, sweep complexity.

1 Introduction

In roughly the last three decades, several non-classical models of automata have been introduced to study the effect of processing inputs with simple machines in a non-sequential way. Such models include restarting automata [10], jumping automata [12], input revolving automata [4] and automata with translucent letters [13]. However, these models are either strictly more powerful or accept a class incomparable with the regular one.

One-way jumping finite automata (OWJFA) were introduced [5] to study the power of deterministic finite automata (DFA) performing non-sequential processing without completely discarding structural information about the inputs à la jumping automata. The resulting model is, in a sense, a minimal extension of finite automata. Machines are specified in exactly the same way as DFA allowing partial transition functions. The only change is the behaviour of the machine when encountering a letter for which the current state has no outgoing transition defined. In the classical case such inputs are rejected, but in one-way jumping mode the letters are skipped temporarily to be processed later. The relative order of the skipped symbols is maintained, and the automaton moves back to the beginning after each pass (called sweep here), seeing only the symbols previously skipped. Therefore one can also view this model as a DFA with an input tape which works as a restricted queue, or one that reads and erases symbols from a circular tape always jumping clockwise to the nearest letter for which it has a defined transition from the current state. When the transition function is complete, no symbols are skipped, so the machine behaves as ordinary DFA, which means that the class of languages accepted by DFA in one-way jumping mode trivially includes all regular languages.

Various properties of the accepted language class [1] and the status of fundamental decidability questions have been settled [2]. More powerful machines with this new processing mode have also been investigated, such as nondeterministic finite automata [3, 6], two-way finite automata [7], pushdown automata and linear bounded automata [6]. While the language classes defined by the models have no nontrivial closure properties under usual language operations, the accepting power and decidability issues raised some intriguing problems.

Except for linear bounded automata, the machine models mentioned above become more powerful when they are allowed to jump to the nearest symbol readable in the current state, which is not surprising. However, it has proven challenging to get a clear picture of just how powerful the new processing mode is, even in the simplest case when one starts from DFA. Such automata can accept all regular languages and the language class defined by them is incomparable with the context-free class, but included in the context-sensitive class and in $\mathrm{DTIME}(n^{2})$ [1]. The separation results make use of combinations of a handful of regular languages together with a very simple type of non-regular languages which contain words having letter counts in a certain ratio, e.g., the frequently used $L_{ab}=\{w\in\{a,b\}^{*}\mid w\textrm{ contains as many }a\textrm{'s as }b\textrm{'s}\}$ accepted by the machine $\mathcal{A}$ in Fig. 1 (with states $\mathbf{1}$, or $\mathbf{2}$ final). While this was enough to establish virtually all separations of interest, it left a significant gap in our understanding of the model: can such machines accept any (‘interesting’) non-regular languages apart from the ones which establish linear relationships among letter counts?

In this work we answer the question above, building on the investigation of sweep complexity of DFA in one-way jumping mode. Sweep count can be viewed as a measure of non-regular resources used by a machine, posing the natural question of how much of this resource is needed to be able to accept non-regular languages. It has been shown that constant sweep complexity does not increase the accepting power of the machines [9] and that superconstant sweep complexity requires cycles containing ‘complementary deficient’ states [8]. In the latter paper it was conjectured that, in fact, any automaton with higher than constant sweep complexity accepts a non-regular language. In Section 3 we refute that conjecture by exhibiting a small DFA accepting a regular language while processing some inputs of length $n$ in $\Omega(\log n)$ sweeps. We also show that there is no non-trivial upper bound on the sweep complexity of regular languages, that is, there are machines with linear complexity accepting regular languages.

A natural question regarding the new complexity measure is whether there exists a meaningful hierarchy which does not collapse to the extremes of $\mathcal{O}(1)$ and $\mathcal{O}(n)$. The aforementioned example shows that automata with logarithmic complexity exist, which answers another question posed earlier. Furthermore, following the line of computational complexity theory, we set out to explore whether the language classes defined through asymptotic complexity form a true hierarchy, that is, whether there are languages which can be accepted by a machine with $\mathcal{O}(f(n))$ complexity but not by any with $o(f(n))$ complexity, for various functions $f(n)$. In Section 4 we demonstrate that such a hierarchy exists by presenting languages with $\Theta(\log n)$ and $\Theta(n)$ sweep complexity, respectively.

Finally we mention that sweep complexity as an idea has been studied in other contexts, too: an interesting and thorough investigation of a similar flavor established infinite hierarchies in terms of sweep count for iterated uniform finite transducers [11], although that model is significantly more powerful than ours, so the techniques used there do not translate here as far as we can tell.

2 Preliminaries

We consider words over a finite alphabet, e.g., $\Sigma=\{a,b\}$. The set of all words over $\Sigma$ is $\Sigma^{*}$, which includes the empty word $\varepsilon$.

A DFA is a quintuple $M=(Q,\Sigma,R,\mathbf{s},F)$, where $Q$ is the finite set of states, $\Sigma$ is the finite input alphabet, $\Sigma\cap Q=\emptyset$, $R:Q\times\Sigma\rightarrow Q$ is the transition function, $\mathbf{s}\in Q$ is the start state, and $F\subseteq Q$ is the set of final states. Elements of $R$ are referred to as (transition) rules of $M$ and we write $\mathbf{p}y\rightarrow\mathbf{q}\in R$ instead of $R(\mathbf{p},y)=\mathbf{q}$. A configuration of $M$ is a string in $Q\times\Sigma^{*}$.

A DFA transitions from a configuration $\mathbf{p}w$ to a configuration $\mathbf{q}w'$ if $w=aw'$ and $\mathbf{p}a\rightarrow\mathbf{q}\in R$, with $\mathbf{p},\mathbf{q}\in Q$, $w,w'\in\Sigma^{*}$ and $a\in\Sigma$. By extending the meaning of $\rightarrow$ we denote this by $\mathbf{p}w\rightarrow\mathbf{q}w'$ and the reflexive and transitive closure of $\rightarrow$ by $\rightarrow^{*}$. A word $w$ is accepted by a DFA $M$ if there exists $\mathbf{f}\in F$, such that $\mathbf{s}w\rightarrow^{*}\mathbf{f}$. The language accepted by $M$ is $\{w\in\Sigma^{*}\mid\exists\mathbf{f}\in F:\mathbf{s}w\rightarrow^{*}\mathbf{f}\}$.

Figure 1: The only two-state ROWJFA satisfying Lemma 1 (states $\mathbf{1}$ and $\mathbf{2}$; edge labels $a$ and $b$).

$$\begin{array}{r|ccccccc}
\textrm{position:} & 0 & 1 & 2 & 3 & 4 & 5 & 6\\
\hline
\textrm{input} & a & d & c & b & c & b & a\\
\textrm{after sweep }1 & \varepsilon & d & c & b & c & b & \varepsilon\\
\textrm{after sweep }2 & \varepsilon & d & c & \varepsilon & c & \varepsilon & \varepsilon\\
\textrm{after sweep }3 & \varepsilon & d & \varepsilon & \varepsilon & \varepsilon & \varepsilon & \varepsilon\\
\textrm{after sweep }4 & \varepsilon & \varepsilon & \varepsilon & \varepsilon & \varepsilon & \varepsilon & \varepsilon
\end{array}$$

Figure 2: The computation table for $adcbcba$ by the machine in Example 1.

One-way jumping automata
The one-way jumping relation (denoted by $\circlearrowright$) between configurations from $Q\Sigma^{*}$ was originally defined in [5]. Here we follow the slightly different definition of [8], which does not change the accepting power of the model, but is more convenient.

A tuple $M=(Q,\Sigma,R,\mathbf{s},F)$ representing a deterministic right one-way jumping automaton (ROWJFA) is defined the same way as a DFA, where the configurations are also elements of the set $Q\times\Sigma^{*}$. Let $\Sigma_{p}=\{b\in\Sigma\mid\exists\mathbf{q}\in Q\mbox{ such that }\mathbf{p}b\rightarrow\mathbf{q}\in R\}$ be the set of all of the letters from $\Sigma$ for which we have a transition defined from state $\mathbf{p}$. A jumping transition (or jump, for short), denoted $\circlearrowright$, is defined between configurations $\mathbf{p}ax$ and $\mathbf{p}xa$ if state $\mathbf{p}$ cannot read the letter $a$, formally:

$$\mathbf{p}ax\circlearrowright\mathbf{p}xa,\mbox{ if }a\in\Sigma\setminus\Sigma_{p}.$$

A ROWJFA can transition from configuration $\mathbf{p}ax$ to configuration $\mathbf{q}y$, which we denote by $\mathbf{p}ax\vdash\mathbf{q}y$, if

(i) $\mathbf{p}ax\rightarrow\mathbf{q}y$, where $x=y$ and $\mathbf{p}a\rightarrow\mathbf{q}\in R$, as defined earlier, or
(ii) $\mathbf{p}ax\circlearrowright\mathbf{p}xa$, when $a\in\Sigma\setminus\Sigma_{p}$, $\mathbf{p}=\mathbf{q}$ and $xa=y$.

A word $w$ is accepted by $M$ if $\mathbf{s}w\vdash^{*}\mathbf{f}$ for some $\mathbf{f}\in F$. The language accepted by $M$ is defined by $L(M)=\{x\in\Sigma^{*}\mid\exists\mathbf{f}\in F:\mathbf{s}x\vdash^{*}\mathbf{f}\}$.

While some texts define DFA as having complete transition functions, our DFA allow partially defined ones. Indeed, the pairs $(\mathbf{p},a)\in Q\times\Sigma$ for which no transition is defined enable the ROWJFA to perform a jump as opposed to rejecting the input as a DFA would. Hence, a ROWJFA with a complete transition function is just a DFA.

Sweeps are contiguous sequences of transitions on a given input, consisting of the steps from reading or jumping over the leftmost remaining input letter to reading or jumping over the rightmost one. If a position is jumped over, then the input symbol in that position is processed in a later sweep. The number of sweeps needed to process the whole input is the number of times the automaton reaches the last position of the original input word or, equivalently, one more than the maximum number of times any position is jumped over.

For an intuitive picture of sweeps, consider the computation of a ROWJFA $M$ on input $w$ as a table with rows representing the $k$ sweeps needed to process $w$ and columns representing positions in the input word. Cell $i,j$ in the table contains either a letter or a symbol representing that the letter has been read, e.g., $\varepsilon$. Once a letter has been marked read and erased it stays that way, so each column is a word of the form $a^{\ell}\varepsilon^{k-\ell}$ ($=a^{\ell}$) for some $a\in\Sigma$ and $1\leq\ell\leq k$.

Figure 3: ROWJFA $M_{1}$ accepting all $w$ with $|w|_{a}=|w|_{b}=|w|_{c}=2$ and $|w|_{d}\geq 1$ (states $\mathbf{1}$ to $\mathbf{8}$; edge labels $a,a,b,b,c,c,d,d$).
Example 1

Consider the automaton $M_{1}$ in Fig. 3 and the input $adcbcba$, processed in the order $aabbccd$. The ROWJFA jumps over the letter $d$ three times before processing it, hence the number of sweeps is four. Its computation table is given in Fig. 2.
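The table of Fig. 2 can be reproduced mechanically. The sketch below assumes that $M_{1}$ is the eight-state chain reading $a,a,b,b,c,c,d$ in this order, with a $d$-loop on the final state $\mathbf{8}$; this is a reconstruction from the caption of Fig. 3 and the processing order stated in the example, not a definition taken from the text. The function name `computation_table` and the dictionary encoding of the rules are likewise only illustrative.

```python
# Assumed reconstruction of M_1 (hypothetical): a chain 1..8 plus a d-loop on 8.
M1_RULES = {
    (1, "a"): 2, (2, "a"): 3,
    (3, "b"): 4, (4, "b"): 5,
    (5, "c"): 6, (6, "c"): 7,
    (7, "d"): 8, (8, "d"): 8,
}

def computation_table(rules, start, finals, word):
    """Return (accepted, rows); rows[i] is the tape after sweep i+1,
    with already erased positions shown as 'ε'."""
    state = start
    alive = list(range(len(word)))        # positions not yet read
    rows = []
    while alive:
        skipped = []
        for pos in alive:
            nxt = rules.get((state, word[pos]))
            if nxt is None:
                skipped.append(pos)       # jump: the letter waits for a later sweep
            else:
                state = nxt               # ordinary DFA step: the letter is erased
        if len(skipped) == len(alive):
            return False, rows            # no letter is readable any more: reject
        alive = skipped
        keep = set(alive)
        rows.append("".join(ch if i in keep else "ε" for i, ch in enumerate(word)))
    return state in finals, rows

accepted, rows = computation_table(M1_RULES, 1, {8}, "adcbcba")
for number, row in enumerate(rows, start=1):
    print("after sweep", number, row)
```

Under this assumption the number of rows equals the number of sweeps, so the input of Example 1 yields four rows.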

In order to be able to analyze the boundary between regular and non-regular languages accepted by the one-way jumping model, as well as to quantify the use of resources beyond the capabilities of classical DFA, when it is the case, the following complexity measure was proposed [8], which gives us the number of sweeps performed by a machine in the ‘worst case’ for an input of length $n$.

Let $M$ be a ROWJFA and $w\in L(M)$, and let

$$\mathbf{p}_{0}w\vdash\mathbf{p}_{1}w_{1}\vdash\mathbf{p}_{2}w_{2}\vdash\dots\vdash\mathbf{p}_{m},\mbox{ where }\mathbf{p}_{0}=\mathbf{s}\mbox{ and }\mathbf{p}_{m}\in F,$$

be the computation of $M$ on the input $w$. Sweep $1$ consists of $\mathbf{p}_{0}w\vdash^{*}\mathbf{p}_{|w|}w_{|w|}$, and we say that sweep $1$ ends in configuration $\mathbf{p}_{|w|}w_{|w|}$. Then, for all $i\geq 1$, if sweep $i$ ends in configuration $\mathbf{p}_{s_{i}}w_{s_{i}}$, then sweep $i+1$ is the sequence of configurations $\mathbf{p}_{s_{i}}w_{s_{i}}\vdash^{*}\mathbf{p}_{s_{i}+|w_{s_{i}}|}w_{s_{i}+|w_{s_{i}}|}$. The last sweep ends in configuration $\mathbf{p}_{m}$, that is, when all input symbols have been read. We define

$$E(M,w)=\{\mbox{the number of sweeps performed by }M\mbox{ on }w\}.$$

When $w\notin L(M)$, we set $E(M,w)=0$. The sweep complexity of a machine $M$ is a function $sc_{M}:\mathbb{N}\rightarrow\mathbb{N}$, with $sc_{M}(n)$ being the maximum number of sweeps $M$ makes on processing inputs $w\in L(M)$ of length $n$, formally:

$$sc_{M}(n)=\max\{E(M,w)\mid w\in\Sigma^{n}\}.$$

In a sense the “most non-regular” word (using the largest amount of non-classical resources) of each length is considered. With this in mind, we can define complexity classes in the usual manner: the class $\mathrm{SWEEP}(f(n))$ consists of languages accepted by some one-way jumping machine with sweep complexity $\mathcal{O}(f(n))$.
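For small $n$ the definition of $sc_{M}(n)$ can be evaluated by brute force. The following sketch is only meant to make the definition concrete; the two-state machine `TOY_RULES` (reading $a$ in state 1 and $b$ in state 2) is a hypothetical example chosen for illustration, and the function names are not taken from the literature.

```python
from itertools import product

def sweeps(rules, start, finals, word):
    """E(M, w): the number of sweeps if w is accepted, and 0 otherwise."""
    state, tape, count = start, list(word), 0
    while tape:
        skipped = []
        for letter in tape:
            nxt = rules.get((state, letter))
            if nxt is None:
                skipped.append(letter)    # jump over the letter
            else:
                state = nxt               # read and erase the letter
        if len(skipped) == len(tape):
            return 0                      # nothing readable: the input is rejected
        tape = skipped
        count += 1
    return count if state in finals else 0

def sweep_complexity(rules, start, finals, alphabet, n):
    """sc_M(n): maximum of E(M, w) over all words w of length n."""
    return max(sweeps(rules, start, finals, "".join(w))
               for w in product(alphabet, repeat=n))

# Hypothetical toy machine: state 1 reads only a, state 2 reads only b.
TOY_RULES = {(1, "a"): 2, (2, "b"): 1}

for n in range(0, 9):
    print(n, sweep_complexity(TOY_RULES, 1, {1}, "ab", n))
```

On this toy machine the word $a^{k}b^{k}$ loses exactly one $a$ and one $b$ per sweep, so it alone already forces $k$ sweeps on an input of length $2k$.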

Observe that the sweep complexity of a machine can be defined to also take into account the sweep count of rejected words. However, this allows one to ‘artificially’ increase the sweep complexity of machines with complexity $o(n)$ without affecting regularity. Let $A$ be a machine accepting a regular language and $B$ a machine accepting a non-regular language, with sweep complexities $f(n)$ and $g(n)$, respectively, such that $f(n)\in o(g(n))$. Then we can construct a ROWJFA accepting $aL(A)$ with sweep complexity $g(n)$ by adding a new initial state from which reading $a$ takes us to the initial state of $A$ while reading $b$ takes us to the initial state of $B$. We set all states of $B$ non-final and this way we get that on inputs starting with $b$ the machine performs $B$'s computations but never accepts anything. Moreover, $aL(A)$ is regular if and only if $L(A)$ is (see Fig. 4).

Each machine considered up to the point when the above measures were introduced [8] had either constant or, the maximal possible, linear sweep complexity, so it seemed that there is a gap between them. Moreover, the examples with linear complexity accepted non-regular languages, while as the theorem below states, the constant complexity languages are exactly the regular languages.

Theorem 2.1 ([9])

ROWJFA with $\mathcal{O}(1)$ sweep complexity accept regular languages.

The sufficient condition above was conjectured to be also necessary for regularity in general, evidenced by the known examples at that point.

Next, we investigate the apparent gap between constant and linear complexities and show that the presumed condition above is not necessary for regularity. Our search for machines with non-constant sweep complexity is directed by the following structural lemma, which says that such machines need to have two ‘complementary deficient states’ in a cycle.

Lemma 1 ([8])

If a ROWJFA has sweep complexity $\omega(1)$ then its state diagram has a closed walk with states $\mathbf{p}$ and $\mathbf{q}$, such that $\mathbf{p}au\rightarrow^{*}\mathbf{q}bv\rightarrow^{*}\mathbf{p}$ for $a,b\in\Sigma$, $u,v\in\Sigma^{*}$ and $\mathbf{p}$ has no transition defined for $b$, while $\mathbf{q}$ has no transition for $a$.

3 Regular languages with non-constant sweep complexity

In this section we show that there is no sweep complexity separation between regular and non-regular languages by exhibiting automata which accept regular languages while requiring a superconstant number of sweeps.

Consider first the automaton $\mathcal{B}$ with states $\{\mathbf{1},\mathbf{2},\mathbf{3}\}$ where $\mathbf{1}$ is initial and final, and transitions are $\{\mathbf{1}a\rightarrow\mathbf{2},\mathbf{2}a\rightarrow\mathbf{1},\mathbf{1}b\rightarrow\mathbf{3},\mathbf{3}b\rightarrow\mathbf{1}\}$, described in Fig. 5.

Figure 4: Artificially increasing the automaton's complexity by adding non-functional states (all final states in $\mathbf{A}$). The diagram shows a new initial state $\mathbf{1}$ with an $a$-edge to $\mathbf{A}$ and a $b$-edge to $\mathbf{B}$.

Figure 5: ROWJFA $\mathcal{B}$ accepts $\{w\in\{a,b\}^{*}\mid|w|_{a}\mbox{ and }|w|_{b}\mbox{ are even}\}$ with sweep complexity $\Theta(\log n)$.
Proposition 1

$L(\mathcal{B})$ is regular.

Proof

We claim that $L(\mathcal{B})=\{w\in\{a,b\}^{*}\mid|w|_{a}\mbox{ and }|w|_{b}\mbox{ are even}\}$. This is obviously a regular language (see, e.g., Fig. 8 where $\mathbf{00}$ is the final state).

The computation for a word $w$ is rejecting if it finishes in either $\mathbf{2}$ or $\mathbf{3}$. However, the only time that the machine ends up in state $\mathbf{2}$ is when it reads an odd number of $a$'s, and, similarly, it ends in $\mathbf{3}$ when it reads an odd number of $b$'s. Since both of these types of words are rejected, we conclude. ∎

Theorem 3.1

The sweep complexity of $\mathcal{B}$ is $\Theta(\log n)$.

Proof

Firstly, observe that in any sweep, while in $\mathbf{1}$ or $\mathbf{2}$, the automaton fully reads any block of $a$'s, and, similarly, while in $\mathbf{1}$ or $\mathbf{3}$, the automaton fully reads any block of $b$'s. Thus, the number of sweeps necessary to process a word $w$ consisting of $2n$ unary blocks is never higher than that of processing the word $(ab)^{n}$. Now, for the inputs $(ab)^{n}$ (and $(ba)^{n}$), starting with the first $b$ (respectively, $a$) every third symbol is jumped over while the rest are read. This means that from an arbitrary word with $k$ unary blocks, after one sweep at most $\lfloor\frac{k}{3}\rfloor+1$ blocks remain. This immediately gives us that the machine makes at most logarithmically many sweeps. As for the other side, consider an input $w=(ab)^{6^{k}}$. Per the previous argument, after $i\leq k$ sweeps the remaining input will be $(ab)^{\frac{6^{k}}{3^{i}}}$ or $(ba)^{\frac{6^{k}}{3^{i}}}$ depending on the parity of $i$, so the number of sweeps is at least $k=\log_{6}\frac{|w|}{2}$. Eventually, the input is accepted according to Proposition 1, so the sweep complexity of $\mathcal{B}$ is also $\Omega(\log n)$. ∎
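The lower-bound family from the proof can be run directly, since the rules of $\mathcal{B}$ are given explicitly in the text. The following is a minimal sketch of a ROWJFA simulator; the helper name `run` and the encoding of rules as a dictionary are assumptions made for this illustration.

```python
def run(rules, start, finals, word):
    """Simulate a ROWJFA sweep by sweep; return (accepted, sweeps performed)."""
    state, tape, count = start, list(word), 0
    while tape:
        skipped = []
        for letter in tape:
            nxt = rules.get((state, letter))
            if nxt is None:
                skipped.append(letter)    # jump over the letter
            else:
                state = nxt               # read and erase the letter
        if len(skipped) == len(tape):
            return False, count           # stuck: no remaining letter is readable
        tape = skipped
        count += 1
    return state in finals, count

# Rules of the automaton B as listed in the text; state 1 is initial and final.
B_RULES = {(1, "a"): 2, (2, "a"): 1, (1, "b"): 3, (3, "b"): 1}

for k in range(1, 5):
    word = "ab" * 6 ** k
    print(k, len(word), run(B_RULES, 1, {1}, word))
```

The printed sweep counts grow by a bounded amount each time the input length is multiplied by six, which is the logarithmic behaviour claimed in the theorem.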

The above results showcase the existence of ROWJFAs that accept regular languages while performing a logarithmic number of sweeps. Next we construct a ROWJFA that accepts a regular language while requiring a linear number of sweeps in the worst case. Consider the automaton $\mathcal{C}$ in Fig. 6 defined as

$$\mathcal{C}=(\{\mathbf{A0},\mathbf{A1},\mathbf{A2},\mathbf{A3},\mathbf{B1},\mathbf{B2},\mathbf{B3}\},\{a,b\},R,\mathbf{A0},\{\mathbf{B1}\}),$$

where the transitions from $R$ are given by the edges in the figure.

Figure 6: ROWJFA $\mathcal{C}$ accepts $\{w\in\{a,b\}^{*}\mid|w|_{a}\mbox{ and }|w|_{b}\mbox{ are odd}\}$ with sweep complexity $\Theta(n)$ (states $\mathbf{A0}$ to $\mathbf{A3}$ and $\mathbf{B1}$ to $\mathbf{B3}$).
Proposition 2

The sweep complexity of $\mathcal{C}$ is $\Theta(n)$.

Proof

To see that the complexity is $\Omega(n)$, consider the word $a^{2n+1}b^{2n+1}$, for $n>1$. In this case, from $\mathbf{A0}$ we go first to $\mathbf{A2}$ where we jump over all the remaining $a$'s, then we move back to $\mathbf{A0}$ where we jump over all the remaining $b$'s, and we are left with $a^{2n-1}b^{2n-1}$ to process. After the $n$th sweep, we are only left with $ab$ to process, which takes us from $\mathbf{A0}$ to $\mathbf{B1}$, and we accept.

For the $\mathcal{O}(n)$ complexity, observe that the above computation is indeed the longest possible. Once we reach $\mathbf{B1}$ we either accept or reject a word in at most $\mathcal{O}(\log n)$ sweeps, same as in Theorem 3.1. Of course, this part also directly follows from the fact that all ROWJFA process their inputs in $\mathcal{O}(n)$ sweeps. ∎

Proposition 3

$L(\mathcal{C})$ is regular.

Proof

We show that $L(\mathcal{C})=\{w\in\{a,b\}^{*}\mid|w|_{a}\mbox{ and }|w|_{b}\mbox{ are odd}\}$. This is obviously a regular language (see, e.g., Fig. 8 where $\mathbf{11}$ is the final state).

To show that indeed $L(\mathcal{C})$ is the language containing every binary word that has an odd number of $a$'s and $b$'s, first note that the right hand side automaton, consisting only of the $\mathbf{B}$-labelled states, accepts every word that has an even number of $a$'s and $b$'s, as shown by Proposition 1.

To reach $\mathbf{B1}$ we have to read exactly one $a$ and one $b$ starting from either $\mathbf{A0}$ or $\mathbf{A2}$. Since from the start state $\mathbf{A0}$ we can reach $\mathbf{A0}$ or $\mathbf{A2}$ by processing an even number of $a$'s and $b$'s, possibly with jumps, our conclusion follows. ∎

As a consequence of Propositions 2 and 3, there is no non-trivial upper bound on the sweep complexity of machines accepting regular languages, since the sweep complexity of any ROWJFA is in $\mathcal{O}(n)$. The left hand cycle in the automaton $\mathcal{C}$ described in Fig. 6 also showcases that while the conditions from Lemma 1 are necessary for non-regularity (as it requires superconstant complexity), they are not sufficient.

4 Separation results for the language classes $\mathrm{SWEEP}(\log n)$ and $\mathrm{SWEEP}(n)$

Consider the prolongable morphism $\varphi(a)=abab$, $\varphi(b)=b$ starting from the word $ab$. We get $\varphi(ab)=ababb$, $\varphi^{2}(ab)=\varphi(ababb)=ababbababbb$, etc. The infinite word $\phi=\lim_{n\rightarrow\infty}\varphi^{n}(ab)=ababbababbb\dots$ is a fixed point of $\varphi$. It is easy to see that in $\phi$ all $a$'s stand alone, that is, we never have blocks of $a$'s longer than $1$, and the lengths of the blocks of $b$'s are $1,2,1,3$, and so on. (Footnote 1: The sequence $\{c(n)\}_{n=1}^{\infty}$ given by the lengths of $b$ blocks is A001511 in OEIS; its most relevant characterization for us is that $c(n)-1$ is the number of trailing zeros in the binary expansion of $n$, since this means that $c(n)-1$ is $\log n$ for powers of $2$.) When applying $\varphi$, each $a$ introduces a new block of $b$'s of length $1$ and extends a block of $b$'s by one, while the number of $a$'s doubles. Thus every other block of $b$'s gets longer by one on each application of $\varphi$, because of the $a$ preceding it. A simple inductive argument shows that the last block of $b$'s in $\varphi^{n}(ab)$ has length $n+1$, and is preceded by $2^{n}$ occurrences of $a$'s, separated by blocks of $b$'s.
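The block structure described above is easy to inspect for small $n$. The sketch below iterates the morphism and compares the lengths of the $b$ blocks of $\varphi^{n}(ab)$ with the characterization by trailing zeros; the function names `phi`, `phi_power`, `b_block_lengths` and `ruler` are introduced here only for illustration.

```python
import re

def phi(word):
    """Apply the morphism a -> abab, b -> b letter by letter."""
    return "".join("abab" if c == "a" else "b" for c in word)

def phi_power(n, seed="ab"):
    """Return phi^n(seed)."""
    word = seed
    for _ in range(n):
        word = phi(word)
    return word

def b_block_lengths(word):
    """Lengths of the maximal blocks of b's, from left to right."""
    return [len(block) for block in re.findall(r"b+", word)]

def ruler(i):
    """1 + number of trailing zeros in the binary expansion of i (i >= 1)."""
    return (i & -i).bit_length()

for n in range(4):
    word = phi_power(n)
    print(n, word, b_block_lengths(word))
```

For every $n$ tried, the block lengths of $\varphi^{n}(ab)$ coincide with `ruler(1), ..., ruler(2**n)`, in line with the description of the sequence as 1 plus the number of trailing zeros.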

Lemma 2

Consider the morphism $\varphi:\{a,b\}^{*}\rightarrow\{a,b\}^{*}$ given by $\varphi(a)=abab$, $\varphi(b)=b$. The following statements hold for any $n\geq 1$:

(i) $\varphi^{n}(ab)\in(ababb^{+})^{+}$;

(ii) if $\varphi^{n}(ab)=ab^{k_{1}}\cdots ab^{k_{m}}$, then $\varphi^{n+1}(ab)=abab^{k_{1}+1}abab^{k_{2}+1}\cdots abab^{k_{m}+1}$;

(iii) $\varphi^{n}(ab)=ab^{k_{1}}\cdots ab^{k_{m}}$, where $m=2^{n}$, $k_{m}=n+1$ and $k_{2i-1}=1$ for all $i\in\{1,\dots,2^{n-1}\}$.

Proof

When $n=1$, then $\varphi(ab)=ababb$, so for $n=1$ all three claims hold. Suppose they hold for $n$. By $(ii)$ and $(iii)$ we have that $\varphi^{n+1}(ab)$ has the form $abab^{k_{1}+1}abab^{k_{2}+1}\cdots abab^{k_{m}+1}$, satisfying $(i)$ for $n+1$. Then,

$$\begin{split}\varphi^{n+2}(ab)&=\varphi(abab^{k_{1}+1}\cdots abab^{k_{m}+1})=\varphi(ab)\varphi(ab^{k_{1}+1})\cdots\varphi(ab)\varphi(ab^{k_{m}+1})\\ &=(abab^{1+1})(abab^{k_{1}+2})\cdots(abab^{1+1})(abab^{k_{m}+2})\end{split}$$

From this we can conclude that $(ii)$ also holds for $n+1\geq 1$. Further, by the equation above we have $\varphi^{n+1}(ab)=ab^{\ell_{1}}\cdots ab^{\ell_{m'}}$ with $m'=2m=2\cdot 2^{n}=2^{n+1}$. Finally, because of $(ii)$ we also get that $\ell_{m'}=k_{m}+1=n+2$ and $\ell_{2i-1}=1$ for all $i\in\{1,\dots,2^{n}\}$. ∎

Figure 7: ROWJFA $\mathcal{D}$ accepts a non-regular language with $\Theta(\log n)$ sweeps.

Figure 8: DFA accepting words with even (for $\mathbf{00}$ final state) or odd (for $\mathbf{11}$ final state) number of $a$'s and $b$'s (states $\mathbf{00}$, $\mathbf{10}$, $\mathbf{11}$, $\mathbf{01}$).

In what follows we analyze the language accepted by the automaton $\mathcal{D}=\left(\{\mathbf{1},\mathbf{2},\mathbf{3}\},\{a,b\},\{\mathbf{1}a\rightarrow\mathbf{2},\mathbf{2}a\rightarrow\mathbf{2},\mathbf{2}b\rightarrow\mathbf{3},\mathbf{3}b\rightarrow\mathbf{1}\},\mathbf{1},\{\mathbf{3}\}\right)$, described in Fig. 7.

Lemma 3

For any $n\geq 0$, the ROWJFA $\mathcal{D}$ accepts $\varphi^{n}(ab)$ in $n+1$ sweeps.

Proof

We show that the machine accepts $\varphi^{n}(ab)$, for any $n\geq 0$. From state $\mathbf{1}$ after reading/jumping through a factor of the form $ababb^{+}$ the automaton gets back to state $\mathbf{1}$. In fact, $\mathbf{1}abab^{k}w\vdash^{*}\mathbf{1}wab^{k-1}$, for any $k\geq 1$, so in one sweep the factor $abab^{k}$ is reduced to $ab^{k-1}$. From Lemma 2 we can see that we can write $\varphi^{n+1}(ab)=abab^{k_{1}+1}abab^{k_{2}+1}\cdots abab^{k_{m}+1}$, which means that one sweep of $\mathcal{D}$ acts as the inverse of $\varphi$ on those words when starting from state $\mathbf{1}$, that is,

$$\mathbf{1}\varphi^{n+1}(ab)=\mathbf{1}abab^{k_{1}+1}abab^{k_{2}+1}\cdots abab^{k_{m}+1}\vdash^{*}\mathbf{1}ab^{k_{1}}ab^{k_{2}}\cdots ab^{k_{m}}=\mathbf{1}\varphi^{n}(ab).$$

This means that in $n$ sweeps the machine reduces $\varphi^{n}(ab)$ to $\varphi^{0}(ab)$. Finally, for $n=0$, we have $\varphi^{0}(ab)=ab$, which is accepted by $\mathcal{D}$ in a single sweep. ∎
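The statement of Lemma 3 can be checked experimentally for small $n$, using the rules of $\mathcal{D}$ exactly as listed in the text. This is a sketch under the same modelling assumptions as any sweep-by-sweep simulation; the helper names `run` and `phi_power` are illustrative.

```python
def phi_power(n, seed="ab"):
    """Return phi^n(seed) for the morphism a -> abab, b -> b."""
    word = seed
    for _ in range(n):
        word = "".join("abab" if c == "a" else "b" for c in word)
    return word

def run(rules, start, finals, word):
    """Simulate a ROWJFA sweep by sweep; return (accepted, sweeps performed)."""
    state, tape, count = start, list(word), 0
    while tape:
        skipped = []
        for letter in tape:
            nxt = rules.get((state, letter))
            if nxt is None:
                skipped.append(letter)    # jump over the letter
            else:
                state = nxt               # read and erase the letter
        if len(skipped) == len(tape):
            return False, count           # stuck: reject
        tape = skipped
        count += 1
    return state in finals, count

# Rules of D as listed in the text; start state 1, final state 3.
D_RULES = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (3, "b"): 1}

for n in range(6):
    word = phi_power(n)
    print(n, len(word), run(D_RULES, 1, {3}, word))
```

Each line reports acceptance together with a sweep count of $n+1$, matching the lemma on the inputs tried.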

Lemma 4

The ROWJFA $\mathcal{D}$ accepts a non-regular language.

Proof

By Lemma 3 we know that for any $n$ the machine accepts $\varphi^{n}(ab)$, which means that for arbitrarily long unary factors consisting of $b$'s, there is some word in $L(\mathcal{D})$ having such a factor as a suffix. Our strategy is to first establish a non-linear relation between the length of those unary factors and the length of the preceding factors in all words accepted by $\mathcal{D}$. Then, by a pumping argument we show that a classical finite automaton cannot verify such a non-linear relation, therefore $L(\mathcal{D})$ cannot be regular.

Claim 1. Words of the form $wb^{n}$ are only accepted if $|w|\in\Omega(2^{\frac{n}{2}})$.
Proof of Claim 1: In any sweep, any block of $a$'s which the automaton starts to read is read and erased completely through a sequence of transitions $\mathbf{1}a^{k}bu\rightarrow^{*}\mathbf{2}bu$. For the automaton to jump over a block of $a$'s, it needs to arrive at its start in state $\mathbf{3}$. Then it jumps over it to the next $b$, after which it starts and reads completely the following block of $a$'s, as described earlier. This means that the machine can never jump over two consecutive blocks of $a$'s. From here we get that if at the beginning of the sweep the number of $a$ blocks was $\ell$, then after the sweep it is at most $\lfloor\frac{\ell}{2}\rfloor+1$.

Furthermore, in each sweep, each block of $b$'s is reduced by at most $2$. This means that the automaton needs at least $\frac{n}{2}$ sweeps to read a block $b^{n}$, in each of which it reduces the number of $a$ blocks by half (or more). Thus we can conclude that in order to accept a word with a suffix $b^{n}$, we have to start out with at least $2^{\frac{n}{2}}$ blocks of $a$'s preceding it. $\nabla$

Claim 2. No finite automaton can accept $L(\mathcal{D})$.
Proof of Claim 2: Suppose the opposite, i.e., that there exists some complete DFA $\mathcal{F}$ having $N$ states such that $L(\mathcal{F})=L(\mathcal{D})$. We know that there are words in the language with arbitrarily long suffixes of $b$'s, so there is a $wb^{m}\in L(\mathcal{F})$ for some word $w$ and exponent $m>N$. By a usual pumping argument, this means that there exists some $\ell$ with $0<\ell<N$ such that $wb^{m+i\cdot\ell}\in L(\mathcal{F})$ for any $i\geq 0$. However, for a large enough $i$ this contradicts Claim 1, as the block of $b$'s can outgrow any upper bound in terms of $|w|$. $\nabla$

The lemma follows from Claims 1 and 2. ∎

Lemma 5

The sweep complexity of $\mathcal{D}$ is $\Theta(\log n)$.

Proof

As $|\varphi^{n}(ab)|=2^{n+1}+2^{n}-1$, by Lemma 3 we have that the sweep complexity of $\mathcal{D}$ is $\Omega(\log n)$, so what remains to show is that it is also $\mathcal{O}(\log n)$.
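The length formula can be verified by counting the two letters separately. The symbols $\alpha_{n}$ and $\beta_{n}$ below are introduced only for this computation and do not appear elsewhere; the recurrences use nothing beyond $\varphi(a)=abab$ and $\varphi(b)=b$.

```latex
\begin{align*}
\alpha_{n} &= |\varphi^{n}(ab)|_{a}, \qquad \beta_{n} = |\varphi^{n}(ab)|_{b}, \qquad \alpha_{0}=\beta_{0}=1,\\
\alpha_{n+1} &= 2\alpha_{n} \;\Longrightarrow\; \alpha_{n}=2^{n},\\
\beta_{n+1} &= \beta_{n}+2\alpha_{n} \;\Longrightarrow\; \beta_{n}=1+2\sum_{i=0}^{n-1}2^{i}=2^{n+1}-1,\\
|\varphi^{n}(ab)| &= \alpha_{n}+\beta_{n}=2^{n+1}+2^{n}-1 .
\end{align*}
```

In particular $|\varphi^{n}(ab)|<3\cdot 2^{n}$, so an input $w=\varphi^{n}(ab)$ is processed in $n+1>\log_{2}\frac{|w|}{3}$ sweeps, which gives the logarithmic lower bound.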

We first note that within a sweep all blocks of $a$'s separated by $bb$ are fully processed (including any prefix of $a$'s), while whenever a symbol $a$ is jumped over, the entire block containing it is jumped over. Following the argument in the proof of Claim 1 of Lemma 4, in each sweep the number of blocks of $a$'s is reduced by at least half, which means that after $\mathcal{O}(\log n)$ sweeps there are no more blocks of $a$'s on the tape. Then, the machine either accepts in one sweep or it rejects the input. This leads to our conclusion. ∎

The results of Lemmas 4 and 5 mean that we have separation between $\textrm{SWEEP}(1)$ and $\textrm{SWEEP}(\log n)$.

Theorem 4.1

$\textrm{SWEEP}(1)\subsetneq\textrm{SWEEP}(\log n)$

Proof

Lemma 5 says $L(\mathcal{D})\in\textrm{SWEEP}(\log n)$. By Theorem 2.1 we know that $\textrm{SWEEP}(1)$ is included in the class of regular languages. Finally, by Lemma 4 we have that $L(\mathcal{D})$ is not regular, which means that $L(\mathcal{D})\notin\textrm{SWEEP}(1)$.∎

Lemma 6

Any automaton which accepts $L_{ab}=\{w\in\{a,b\}^{*}\mid|w|_{a}=|w|_{b}\}$ has sweep complexity $\Theta(n)$.

Proof

We know that every machine has sweep complexity $\mathcal{O}(n)$, so it is enough to show that it is not possible to accept $L_{ab}$ with sublinear sweep complexity. For that we assume that such an automaton, say $\mathcal{F}=(Q,\Sigma,R,s,F)$, exists, and derive a contradiction.

If $\mathcal{F}$ had linear sweep complexity, then it could have computations on infinitely many inputs in which all sweeps process a constant number of symbols. However, with sublinear complexity we get that for any constant $C$ and for all long enough inputs $w\in L_{ab}$, during the processing of $w$ at least one sweep reads more than $C$ symbols. We also know that $a^{n}b^{n}\in L_{ab}$ for any $n\geq 0$. Let $C=2|Q|$ where $|Q|$ is the number of states of $\mathcal{F}$ and consider an input $w=a^{m}b^{m}$ with $m$ large enough that the machine reads more than $C$ symbols in some sweep while processing $w$. The remaining input at the beginning of that sweep is $a^{k}b^{\ell}$ for some $k,\ell$ such that $k+\ell>C$. During the sweep the machine reads $a^{k^{\prime}}b^{\ell^{\prime}}$ where $k^{\prime}+\ell^{\prime}>C$. This means that either $k^{\prime}>|Q|$ or $\ell^{\prime}>|Q|$. Without loss of generality we can assume $k^{\prime}>|Q|$. This gives us that while reading $a^{k^{\prime}}$ the automaton must visit some state $\mathbf{p}$ at least twice while reading only $a$’s, so we get that $\mathbf{p}a^{r}\rightarrow^{*}\mathbf{p}$ for some $r>0$. But then, by a usual pumping argument the machine also needs to accept $a^{n+r}b^{n}\notin L_{ab}$, contradicting our assumption that $L(\mathcal{F})=L_{ab}$ and concluding the proof.∎

Theorem 4.2

For any $f:\mathbb{N}\rightarrow\mathbb{N}$ with $f(n)\in o(n)$ we have $\textrm{SWEEP}(f(n))\subsetneq\textrm{SWEEP}(n)$.

Proof

By Lemma 6 we know that $L_{ab}\notin\textrm{SWEEP}(f(n))$ for any sublinear function $f(n)$. The two-state automaton $\mathcal{A}$ accepts the language with sweep complexity $\Theta(n)$. This is easy to see when considering the worst-case inputs of the form $a^{n}b^{n}$ for $n\geq 0$.∎
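
The linear behaviour on $a^{n}b^{n}$ can be observed with a small simulation. The following is a minimal sketch, not the construction from the paper: the transition table `DELTA` (state `p` reads `a` and moves to `q`, state `q` reads `b` and moves back to `p`, with `p` both initial and final) is an assumed two-state automaton for $L_{ab}$, and the function name `run_jumping_dfa` is hypothetical. A sweep is counted here as one left-to-right pass over the remaining input, which may differ by a constant from the formal definition.

```python
def run_jumping_dfa(delta, start, finals, word):
    """Simulate a partial DFA in one-way jumping mode.

    Returns (accepted, sweeps), where sweeps counts left-to-right passes
    over the input that is still unread (an assumption of this sketch).
    """
    state = start
    tape = list(word)
    sweeps = 0
    while tape:
        skipped = []
        for symbol in tape:
            nxt = delta.get((state, symbol))
            if nxt is None:
                skipped.append(symbol)  # no transition: jump over the symbol
            else:
                state = nxt             # transition defined: read the symbol
        sweeps += 1
        if len(skipped) == len(tape):
            return False, sweeps        # nothing was read: the machine is stuck
        tape = skipped                  # next pass sees only the skipped symbols
    return state in finals, sweeps


# Assumed two-state automaton for L_ab (illustrative, not taken from the paper).
DELTA = {("p", "a"): "q", ("q", "b"): "p"}

for n in (1, 2, 4, 8):
    print(n, run_jumping_dfa(DELTA, "p", {"p"}, "a" * n + "b" * n))
```

On inputs $a^{n}b^{n}$ each pass of this sketch reads exactly one $a$ and one $b$, so the pass count grows linearly with $n$, whereas an input such as $(ab)^{n}$ is consumed in a single pass.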

5 Concluding remarks

Apart from the complexity considerations listed below, we think the proof of Lemma 4 contains a detail worth emphasizing: the automaton can verify a logarithmic/exponential relation between two factors of suitably chosen inputs! We found this very surprising since we still basically deal with DFA which cannot store information and cannot ‘choose’ which symbols to read or jump over. (Iterated uniform finite transducers can also verify such relationships, albeit their computing power is much stronger [11].)

We presented automata for all pairings of regular and non-regular languages with logarithmic and linear worst case sweep complexity. This way we disproved the conjecture on the constant sweep requirement for regularity [9] and answered several questions regarding sweep complexity posed in [8]:

  1. Is the language of each machine with $\omega(1)$ complexity non-regular? NO, by Section 3.

  2. Is there a machine with sweep complexity between constant and linear, that is, $\omega(1)$ and $o(n)$? YES, by Theorem 3.1 (and Lemma 5).

  3. Is there a language with sweep complexity between constant and linear, that is, all machines accepting it have superconstant complexity and at least one has sublinear? YES, by Theorem 4.1.

  4. Is there an upper bound in terms of sweep complexity on machines accepting regular languages? NO, by Propositions 2 and 3.

  5. Are machines less complex in the case of binary alphabets, given that the complementary deficient pairs of Lemma 1 are predetermined? NO, illustrated by the fact that all results have been obtained over a binary alphabet.

These coarser forms of Questions 2 and 3 have been answered here, but for a complete picture one would want to know whether there exist machines with arbitrary (constructible) sublinear complexity, and the equivalent for languages. The most obvious choices for such a study would probably be complexities $\Theta(\log^{k}n)$ and $\Theta(n^{\epsilon})$, for constants $k>1$ and $\epsilon<1$. Another angle related to Question 4 is to study the lower bound of non-regularity: logarithmic complexity can produce non-regular languages, but can we do it with less of this ‘non-regular’ resource? In the case of Question 5, our answer may be refined, as there may be some sublinear $f(n)$ such that the machines of $\Theta(f(n))$ complexity all accept regular or all accept non-regular languages, although we have not seen anything that indicates such perplexing behaviour.

Another interesting direction relates to our original motivation in looking at the complexity of these automata: deciding regularity. More generally, the question becomes: is it decidable, given a machine or language and a function $f(n)$, whether the machine/language has $\Theta(f(n))$ complexity (or its one-sided variants with $\mathcal{O}$ and $\Omega$)? We suspect that the answer is yes at least in the case of constant and linear functions, but we do not know what happens in the logarithmic and more complicated cases.

References

  • [1] Beier, S., Holzer, M.: Properties of right one-way jumping finite automata. Theoretical Computer Science 798, 78–94 (2019)
  • [2] Beier, S., Holzer, M.: Decidability of right one-way jumping finite automata. International Journal of Foundations of Computer Science 31(6), 805–825 (2020)
  • [3] Beier, S., Holzer, M.: Nondeterministic right one-way jumping finite automata. Information and Computation 284, 104687 (2022), selected papers from DCFS 2019
  • [4] Bensch, S., Bordihn, H., Holzer, M., Kutrib, M.: On input-revolving deterministic and nondeterministic finite automata. Information and Computation 207(11), 1140–1155 (2009)
  • [5] Chigahara, H., Fazekas, S.Z., Yamamura, A.: One-way jumping finite automata. International Journal of Foundations of Computer Science 27(3), 391–405 (2016)
  • [6] Fazekas, S.Z., Hoshi, K., Yamamura, A.: The effect of jumping modes on various automata models. Natural Computing (2021)
  • [7] Fazekas, S.Z., Hoshi, K., Yamamura, A.: Two-way deterministic automata with jumping mode. Theoretical Computer Science 864, 92–102 (2021)
  • [8] Fazekas, S.Z., Mercaş, R., Wu, O.: Complexities for jumps and sweeps. J. Autom. Lang. Comb. 27(1-3), 131–149 (2022)
  • [9] Fazekas, S.Z., Yamamura, A.: On regular languages accepted by one-way jumping finite automata. In: 8th Workshop on Non-Classical Models of Automata and Applications, Short Papers. pp. 7–14 (2016)
  • [10] Jančar, P., Mráz, F., Plátek, M., Vogel, J.: Restarting automata. In: Reichel, H. (ed.) Fundamentals of Computation Theory. pp. 283–292. Springer Berlin Heidelberg, Berlin, Heidelberg (1995)
  • [11] Kutrib, M., Malcher, A., Mereghetti, C., Palano, B.: Descriptional complexity of iterated uniform finite-state transducers. Information and Computation 284, 104691 (2022)
  • [12] Meduna, A., Zemek, P.: Jumping finite automata. International Journal of Foundations of Computer Science 23(7), 1555–1578 (2012)
  • [13] Nagy, B., Otto, F.: Finite-state acceptors with translucent letters. In: BILC 2011 - 1st International Workshop on AI Methods for Interdisciplinary Research in Language and Biology, ICAART 2011. pp. 3–13 (2011)