The maximum length of shortest accepted strings for direction-determinate two-way finite automata

Olga Martynova    Alexander Okhotin
Abstract

It is shown that, for every $n\geqslant 2$, the maximum length of the shortest string accepted by an $n$-state direction-determinate two-way finite automaton is exactly $\binom{n}{\lfloor\frac{n}{2}\rfloor}-1$ (direction-determinate automata are those that always remember in the current state whether the last move was to the left or to the right). For two-way finite automata of the general form, a family of $n$-state automata with shortest accepted strings of length $\frac{3}{4}\cdot 2^{n}-1$ is constructed.

1 Introduction

A natural question about automata and related models of computation is the length of the shortest string an automaton accepts. A function mapping the size of an automaton to the maximum length of the shortest accepted string, with the maximum taken over all automata of that size, is a certain complexity measure for a family of automata.

For one-way finite automata, this measure is trivial: the length of the shortest string accepted by a nondeterministic finite automaton (NFA) with $n$ states is at most $n-1$, which bounds the length of the shortest path to an accepting state. On the other hand, Ellul et al. [4] proved that the length of shortest strings not accepted by an $n$-state NFA is exponential in $n$. Similar questions were studied for other models and some variants of the problem. Chistikov et al. [2] investigated the length of shortest strings in counter automata. The length of shortest strings in formal grammars under intersections with regular languages was studied by Pierre [10], and recently by Shemetova et al. [11]. Alpoge et al. [1] investigated shortest strings in intersections of deterministic one-way finite automata (DFA).

The maximum length of shortest strings for deterministic two-way finite automata (2DFA) has been investigated in two recent papers. First of all, from the well-known proof of the PSPACE-completeness of the emptiness problem for 2DFA by Kozen [7] it is understood that the length of the shortest string accepted by an $n$-state 2DFA can be exponential in $n$. There is also an exponential upper bound on this length, given by transforming a 2DFA to an NFA: the construction by Kapoutsis [6] uses at most $\binom{2n}{n+1}=\Theta(\frac{1}{\sqrt{n}}4^{n})$ states, and hence the length of the shortest string is slightly less than $4^{n}$. Overall, the maximum length of the shortest string is exponential, with the base bounded by 4.

The first attempt to determine the exact base was made by Dobronravov et al. [3], who constructed a family of $n$-state 2DFA with shortest strings of length $\Omega((\sqrt[5]{10})^{n})\geqslant\Omega(1.584^{n})$. The automata they have actually constructed belong to a special class of 2DFA: the direction-determinate automata. These are 2DFA with the set of states split into states accessible only by transitions from the right and states accessible only by transitions from the left: in other words, direction-determinate automata always remember the direction of the last transition in their state.

Later, Krymski and Okhotin [8] extended the method of Dobronravov et al. [3] to produce automata of a more general form, with longer shortest accepted strings. They constructed a family of non-direction-determinate 2DFA with shortest strings of length $\Omega((\sqrt[4]{7})^{n})\geqslant\Omega(1.626^{n})$.

This paper improves these bounds. First, the maximum length of the shortest string accepted by $n$-state direction-determinate 2DFA is determined precisely as $\binom{n}{\lfloor\frac{n}{2}\rfloor}-1=\Theta(\frac{1}{\sqrt{n}}2^{n})$. The upper bound on the length of the shortest string immediately follows from the complexity of transforming direction-determinate 2DFA to NFA, see Geffert and Okhotin [5]. A matching lower bound is proved by a direct construction of a family of $n$-state automata.

The second result of this paper is that not remembering the direction helps to accept longer shortest strings: a family of $n$-state non-direction-determinate automata with shortest strings of length $\frac{3}{4}\cdot 2^{n}-1$ is constructed. This is more than what is possible in direction-determinate automata.

2 Definitions

Definition 1.

A two-way deterministic finite automaton (2DFA) is a quintuple $\mathcal{A}=(\Sigma,Q,q_{0},\delta,F)$, in which:

  • $\Sigma$ is a finite alphabet, which does not contain two special symbols: the left end-marker ($\vdash$) and the right end-marker ($\dashv$);

  • $Q$ is a finite set of states;

  • $q_{0}\in Q$ is the initial state;

  • $\delta\colon Q\times(\Sigma\cup\{{\vdash},{\dashv}\})\to Q\times\{-1,+1\}$ is a partial transition function;

  • $F\subseteq Q$ is the set of accepting states, effective at the right end-marker ($\dashv$).

An input string $w=a_{1}\ldots a_{m}\in\Sigma^{*}$ is given to an automaton on a tape ${\vdash}a_{1}\ldots a_{m}{\dashv}$. The automaton starts at the left end-marker ${\vdash}$ in the state $q_{0}$. At each moment, if the automaton is in a state $q\in Q$ and sees a symbol $a\in\Sigma\cup\{{\vdash},{\dashv}\}$, then, according to the transition function $\delta(q,a)=(r,d)$, it enters a new state $r$ and moves to the left or to the right depending on the direction $d$. If the requested value $\delta(q,a)$ is not defined, then the automaton rejects. The automaton accepts the string, if it ever comes to the right end-marker $\dashv$ in any state from $F$. The automaton can also loop.

The language recognized by an automaton $A$, denoted by $L(A)$, is the set of all strings it accepts.
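The definition above can be turned into a small simulator. The following is a minimal sketch, not part of the paper: the dictionary encoding of $\delta$, the end-marker strings `LEFT_END` and `RIGHT_END`, and the function name `run_2dfa` are all assumptions made for illustration, as is the convention that stepping off the tape counts as rejection.

```python
# Hypothetical encodings of the end-markers; any values outside the alphabet work.
LEFT_END, RIGHT_END = "|-", "-|"


def run_2dfa(delta, q0, accepting, word):
    """Simulate a 2DFA given as a dict delta[(state, symbol)] = (new_state, step).

    `step` is -1 (left) or +1 (right). Returns "accept", "reject" or "loop".
    """
    tape = [LEFT_END, *word, RIGHT_END]
    state, pos = q0, 0                # start at the left end-marker in q0
    seen = set()                      # configurations visited so far
    while True:
        if tape[pos] == RIGHT_END and state in accepting:
            return "accept"
        if (state, pos) in seen:      # a repeated configuration means a loop
            return "loop"
        seen.add((state, pos))
        move = delta.get((state, tape[pos]))
        if move is None:              # undefined transition: reject
            return "reject"
        state, step = move
        pos += step
        if not 0 <= pos < len(tape):  # assumption: leaving the tape rejects
            return "reject"
```

Since the automaton is deterministic, a repeated pair (state, position) is enough to detect looping, which is why the simulation always terminates.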

This paper also uses a subclass of 2DFA, in which one can determine the direction of the previous transition from the current state.

Definition 2 ([9]).

A 2DFA is called direction-determinate, if there is a partition of the set of states $Q=Q^{+}\cup Q^{-}$, with $Q^{+}\cap Q^{-}=\varnothing$, such that for each transition $\delta(q,a)=(r,+1)$, the state $r$ must belong to $Q^{+}$, and for each transition $\delta(q,a)=(r,-1)$, the state $r$ is in $Q^{-}$.
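Whether a given transition table satisfies this definition can be checked mechanically. The sketch below assumes the same hypothetical dictionary encoding `delta[(state, symbol)] = (new_state, step)`; the function name `direction_partition` is likewise an illustrative choice.

```python
def direction_partition(delta):
    """Return (entered_by_right_moves, entered_by_left_moves), or None.

    None is returned when some state is the target of both a right move and
    a left move, so that no partition Q+ / Q- can exist.
    """
    q_plus = {r for (r, step) in delta.values() if step == +1}
    q_minus = {r for (r, step) in delta.values() if step == -1}
    if q_plus & q_minus:
        return None
    return q_plus, q_minus
```

States that are never entered by any transition do not appear in either set; they may be placed in $Q^{+}$ or in $Q^{-}$ freely.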

The known upper bounds on the length of the shortest accepted string are different for direction-determinate 2DFA and for 2DFA of the general form. These bounds are inferred from the complexity of transforming two-way automata with $n$ states to one-way NFA: for 2DFA of the general form, as proved by Kapoutsis [6], it is sufficient and in the worst case necessary to use $\binom{2n}{n}$ states in a simulating NFA, whereas for direction-determinate 2DFA the simulating NFA requires $\binom{n}{\lfloor\frac{n}{2}\rfloor}$ states in the worst case, see Geffert and Okhotin [5]. Since the shortest string in a language cannot be longer than the shortest path to an accepting state in an NFA, the following bounds hold.

Theorem 1 (Dobronravov et al. [3]).

Let $n\geqslant 1$, and let $A$ be a 2DFA with $n$ states, which accepts at least one string. Then the length of the shortest string accepted by $A$ is at most $\binom{2n}{n}-1$. If the automaton $A$ is direction-determinate, then the length of the shortest accepted string does not exceed $\binom{n}{\lfloor\frac{n}{2}\rfloor}-1$.

The first result of this paper is that this upper bound for direction-determinate automata is actually precise.

3 Shortest accepted strings for direction-determinate automata

This section constructs direction-determinate automata whose shortest accepted strings have the maximum possible length $\binom{n}{\lfloor\frac{n}{2}\rfloor}-1$, where $n$ is the number of states.

Automata are constructed for every $k$ and $\ell$, where $k$ is the number of states reachable by transitions to the right and $\ell$ is the number of states reachable by transitions to the left. The following theorem shall be proved.

Theorem 2.

For every $k\geqslant 2$ and $\ell\geqslant 0$ there exists a direction-determinate 2DFA with the set of states $Q=Q^{+}\cup Q^{-}$, where $|Q^{+}|=k$ and $|Q^{-}|=\ell$, such that the length of the shortest string it accepts is $\binom{k+\ell}{\ell+1}-1$.

The automaton constructed in the theorem works as follows. While working on its shortest string, it processes every pair of consecutive symbols by moving back and forth between them, thus effectively comparing them to each other. Eventually it moves on to the next pair and processes it in the same way. It cannot come back to the previous pair anymore, because it has no transitions for that.

The automaton’s motion between two neighbouring symbols begins when it first arrives from the first symbol to the second in some state from $Q^{+}$. Then it moves back and forth, alternating between states from $Q^{+}$ at the second symbol and states from $Q^{-}$ at the first symbol, and finally leaves the second symbol to the right. Among the states visited by the automaton during this back-and-forth motion, the number of states from $Q^{+}$ is greater by one than the number of states from $Q^{-}$. Two such sets of states will be denoted by a pair $(P,R)$, where $P\subseteq Q^{-}$, $R\subseteq Q^{+}$ and $|R|=|P|+1$.

Proposition 1.

There are $\binom{k+\ell}{\ell+1}$ different pairs $(P,R)$, such that $P\subseteq Q^{-}$, $R\subseteq Q^{+}$ and $|R|=|P|+1$.

Proof.

There are as many pairs $(P,R)$ as pairs $(Q^{-}\setminus P,R)$, where $|R|=|P|+1$. In each pair of the latter form, the two sets contain $(\ell-|P|)+(|P|+1)=\ell+1$ states in total, and so the number of such pairs is equal to the number of subsets of $Q$ of size $\ell+1$, that is, $\binom{k+\ell}{\ell+1}$. ∎
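The count in Proposition 1 can be cross-checked by direct enumeration for small $k$ and $\ell$. This is only an illustrative sketch; the function name `count_pairs` and the representation of states by integers are assumptions.

```python
from itertools import combinations
from math import comb


def count_pairs(k, l):
    """Count pairs (P, R) with P a subset of Q-, R a subset of Q+, |R| = |P| + 1."""
    total = 0
    for m in range(l + 1):                         # m = |P|
        for _P in combinations(range(l), m):
            # combinations() yields nothing when m + 1 > k
            for _R in combinations(range(k), m + 1):
                total += 1
    return total


# Compare with the closed form for a few small parameter values.
for k in range(1, 7):
    for l in range(0, 6):
        assert count_pairs(k, l) == comb(k + l, l + 1)
```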

Let the sets $Q^{+}$ and $Q^{-}$ be linearly ordered. Then one can define an order on the set of pairs $(P,R)$ as follows. In every such pair, let $P=\{p_{1},\ldots,p_{m}\}$, where $p_{1}<\ldots<p_{m}$, and $R=\{r_{1},\ldots,r_{m+1}\}$, where $r_{1}<\ldots<r_{m+1}$. There is a corresponding sequence to each pair, of the form $r_{1}$, $-p_{1}$, $r_{2}$, $-p_{2}$, …, $r_{m}$, $-p_{m}$, $r_{m+1}$, and different pairs are compared by the lexicographic order on these sequences. In Table 1, all pairs $(P,R)$, for $k=4$ and $\ell=2$, are given in increasing order, along with the corresponding sequences.

$$
\begin{array}{ll}
\text{pairs }(P,R) & \text{sequences}\\
\hline
\varnothing,\{1\} & (1)\\
\{2^{\prime}\},\{1,2\} & (1,-2^{\prime},2)\\
\{2^{\prime}\},\{1,3\} & (1,-2^{\prime},3)\\
\{2^{\prime}\},\{1,4\} & (1,-2^{\prime},4)\\
\{1^{\prime}\},\{1,2\} & (1,-1^{\prime},2)\\
\{1^{\prime},2^{\prime}\},\{1,2,3\} & (1,-1^{\prime},2,-2^{\prime},3)\\
\{1^{\prime},2^{\prime}\},\{1,2,4\} & (1,-1^{\prime},2,-2^{\prime},4)\\
\{1^{\prime}\},\{1,3\} & (1,-1^{\prime},3)\\
\{1^{\prime},2^{\prime}\},\{1,3,4\} & (1,-1^{\prime},3,-2^{\prime},4)\\
\{1^{\prime}\},\{1,4\} & (1,-1^{\prime},4)\\
\varnothing,\{2\} & (2)\\
\{2^{\prime}\},\{2,3\} & (2,-2^{\prime},3)\\
\{2^{\prime}\},\{2,4\} & (2,-2^{\prime},4)\\
\{1^{\prime}\},\{2,3\} & (2,-1^{\prime},3)\\
\{1^{\prime},2^{\prime}\},\{2,3,4\} & (2,-1^{\prime},3,-2^{\prime},4)\\
\{1^{\prime}\},\{2,4\} & (2,-1^{\prime},4)\\
\varnothing,\{3\} & (3)\\
\{2^{\prime}\},\{3,4\} & (3,-2^{\prime},4)\\
\{1^{\prime}\},\{3,4\} & (3,-1^{\prime},4)\\
\varnothing,\{4\} & (4)
\end{array}
$$

Table 1: All pairs $(P,R)$ for sets of states $Q^{+}=\{1,2,3,4\}$ and $Q^{-}=\{1^{\prime},2^{\prime}\}$.
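The ordering used in Table 1 can be reproduced with a few lines of code. In this sketch, states of $Q^{+}$ are the integers $1,\ldots,k$, a state $j^{\prime}$ of $Q^{-}$ is represented by the integer $j$ (so that $-j^{\prime}$ becomes $-j$), and Python's built-in tuple comparison plays the role of the lexicographic order. The names `sequence_of` and `ordered_pairs` are hypothetical.

```python
from itertools import combinations


def sequence_of(P, R):
    """Interleave sorted R and negated sorted P: r1, -p1, r2, -p2, ..., r_{m+1}."""
    seq = []
    for j, r in enumerate(R):
        seq.append(r)
        if j < len(P):
            seq.append(-P[j])
    return tuple(seq)


def ordered_pairs(k, l):
    """All pairs (P, R) with |R| = |P| + 1, sorted by their sequences."""
    pairs = [(P, R)
             for m in range(l + 1)
             for P in combinations(range(1, l + 1), m)
             for R in combinations(range(1, k + 1), m + 1)]
    # Tuples compare lexicographically, and a proper prefix is the smaller one.
    return sorted(pairs, key=lambda pr: sequence_of(*pr))


# Print the rows of the table for k = 4, l = 2.
for P, R in ordered_pairs(4, 2):
    print(set(P) or "{}", set(R), sequence_of(P, R))
```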

Let $N=\binom{k+\ell}{\ell+1}$ be the number of pairs. Then all pairs are enumerated in increasing order as $(P^{(1)},R^{(1)})<\ldots<(P^{(N)},R^{(N)})$, where $P^{(i)}=\{p^{(i)}_{1},\ldots,p^{(i)}_{m_{i}}\}$ and $R^{(i)}=\{r^{(i)}_{1},\ldots,r^{(i)}_{m_{i}+1}\}$. In particular, the least pair is $(P^{(1)},R^{(1)})=(\varnothing,\{\min Q^{+}\})$, because the corresponding sequence $(\min Q^{+})$ is lexicographically the least. The greatest pair is $(P^{(N)},R^{(N)})=(\varnothing,\{\max Q^{+}\})$.

The desired direction-determinate automaton $A$ with the shortest accepted string of length $N-1$ is defined over an alphabet $\Sigma=\{a_{1},\ldots,a_{N-1}\}$, and the shortest accepted string will be $w=a_{1}\ldots a_{N-1}$. The set of states is defined as $Q=Q^{+}\cup Q^{-}$, where $Q^{+}=\{1,\ldots,k\}$ and $Q^{-}=\{1^{\prime},\ldots,\ell^{\prime}\}$. The initial state is $q_{0}=1$. The only transition by the left end-marker ($\vdash$) leads from the initial state to the least state in $R^{(1)}$.

$$\delta(q_{0},{\vdash})=(r^{(1)}_{1},+1) \tag{1a}$$

For each symbol $a_{i}$, transitions are defined in the states $R^{(i)}\cup P^{(i+1)}$. If the automaton is at the symbol $a_{i}$ in any state from $R^{(i)}$ (except for the greatest state), then it moves to the left in the corresponding state from $P^{(i)}$.

$$\delta(r^{(i)}_{j},a_{i})=(p^{(i)}_{j},-1) \qquad (j\in\{1,\ldots,m_{i}\}) \tag{1b}$$

For the greatest state in $R^{(i)}$, there is no corresponding state in $P^{(i)}$, and so the automaton moves to the right (and this is the only way to move from $Q^{+}$ to $Q^{+}$, and hence the only way to advance from the symbol $a_{i}$ to the next symbol for the first time).

$$\delta(r^{(i)}_{m_{i}+1},a_{i})=(r^{(i+1)}_{1},+1) \tag{1c}$$

In each state from $P^{(i+1)}$, the automaton moves to the right in the next available state from $R^{(i+1)}$.

$$\delta(p^{(i+1)}_{j},a_{i})=(r^{(i+1)}_{j+1},+1) \qquad (j\in\{1,\ldots,m_{i+1}\}) \tag{1d}$$

There are no transitions at the right end-marker, and there is one accepting state: $F=\{r^{(N)}_{m_{N}+1}\}$.
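For very small parameters, the construction (1a)–(1d) can be built explicitly and its shortest accepted string found by exhaustive search. The sketch below is an illustration under several assumptions that are not in the paper: states of $Q^{+}$ and $Q^{-}$ are encoded as `('+', j)` and `('-', j)`, the symbol $a_{i}$ is the integer `i`, the end-markers are the strings `'L'` and `'R'`, and all function names are hypothetical. The search is exponential and only practical for tiny $k$ and $\ell$.

```python
from itertools import combinations, product


def ordered_pairs(k, l):
    """Pairs (P, R) with |R| = |P| + 1 in the lexicographic order of the text."""
    def seq(P, R):
        out = []
        for j, r in enumerate(R):
            out.append(r)
            if j < len(P):
                out.append(-P[j])
        return tuple(out)
    pairs = [(P, R) for m in range(l + 1)
             for P in combinations(range(1, l + 1), m)
             for R in combinations(range(1, k + 1), m + 1)]
    return sorted(pairs, key=lambda pr: seq(*pr))


def build(k, l):
    pairs = ordered_pairs(k, l)                 # pairs[i - 1] = (P^(i), R^(i))
    N = len(pairs)
    delta = {(('+', 1), 'L'): (('+', pairs[0][1][0]), +1)}        # (1a)
    for i in range(1, N):                       # transitions by the symbol a_i
        P, R = pairs[i - 1]
        P2, R2 = pairs[i]
        for j in range(len(P)):                                   # (1b)
            delta[(('+', R[j]), i)] = (('-', P[j]), -1)
        delta[(('+', R[-1]), i)] = (('+', R2[0]), +1)             # (1c)
        for j in range(len(P2)):                                  # (1d)
            delta[(('-', P2[j]), i)] = (('+', R2[j + 1]), +1)
    accepting = {('+', pairs[-1][1][-1])}
    return delta, ('+', 1), accepting, N


def accepts(delta, q0, accepting, word):
    tape = ['L', *word, 'R']
    state, pos, seen = q0, 0, set()
    while (state, pos) not in seen:             # a repeated configuration is a loop
        if tape[pos] == 'R' and state in accepting:
            return True
        seen.add((state, pos))
        move = delta.get((state, tape[pos]))
        if move is None:
            return False
        state, step = move
        pos += step
    return False


def shortest_accepted_length(k, l):
    """Brute-force search over all strings of length at most N - 1."""
    delta, q0, acc, N = build(k, l)
    for length in range(N):
        for word in product(range(1, N), repeat=length):
            if accepts(delta, q0, acc, word):
                return length
    return None
```

For instance, `shortest_accepted_length(3, 1)` explores strings over a five-symbol alphabet; the value it returns can be compared with $\binom{k+\ell}{\ell+1}-1$ from Theorem 2.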

Figure 1: The accepting computation of the automaton $A$ on the string $w$, for $k=4$ and $\ell=2$.

The computation of the automaton on the string $w=a_{1}\ldots a_{N-1}$ is illustrated in Figure 1. The automaton gradually advances, and moves between every two subsequent symbols, $a_{i-1}$ and $a_{i}$, according to the sets $P^{(i)}$ and $R^{(i)}$. Transitions at $a_{i}$ expect that there is $a_{i-1}$ to the left, whereas transitions at $a_{i-1}$ expect $a_{i}$ to the right. As long as every symbol is followed by the next symbol in order, these expectations will be fulfilled each time, and the automaton accepts in the end.

Lemma 1.

The automaton $A$ accepts the string $w=a_{1}\ldots a_{N-1}$.

Proof.

It is claimed that the automaton $A$, executed on the string $w$, eventually arrives at each symbol $a_{i}$ in the state $r^{(i)}_{m_{i}+1}$. This is proved by induction on $i$.

Base case $i=1$: the first transition (1a) moves the automaton to the state $r^{(1)}_{1}$. The first pair $(P^{(1)},R^{(1)})$ is $(\varnothing,\{1\})$, and so $r^{(1)}_{1}=r^{(1)}_{m_{1}+1}$.

Induction step. Assume that the automaton comes to the symbol $a_{i}$ in the state $r^{(i)}_{m_{i}+1}$. Then it makes a transition (1c) to the right in the state $r^{(i+1)}_{1}$. Then it executes the sequence of transitions (1b), (1d), defined by the pair $(P^{(i+1)},R^{(i+1)})$, moving back and forth between $a_{i+1}$ and $a_{i}$, and passing through the states $p^{(i+1)}_{1}$, $r^{(i+1)}_{2}$, $p^{(i+1)}_{2}$, …, $r^{(i+1)}_{m_{i+1}}$, $p^{(i+1)}_{m_{i+1}}$, $r^{(i+1)}_{m_{i+1}+1}$. And so it comes to the symbol $a_{i+1}$ in the state $r^{(i+1)}_{m_{i+1}+1}$, as shown in Figure 2.

Figure 2: The moves of $A$ between two neighbouring symbols of $w$.

In the end, the automaton comes to the last symbol $a_{N-1}$ in the state $r^{(N-1)}_{m_{N-1}+1}$. Then it makes a transition (1c) and moves to the right end-marker in the state $r^{(N)}_{1}$. And this is the accepting state $r^{(N)}_{m_{N}+1}$, because the last pair $(P^{(N)},R^{(N)})$ is $(\varnothing,\{k\})$. Therefore, the string $w$ is accepted. ∎

It is claimed that the automaton $A$ cannot accept any shorter string. It cannot accept the empty string; if it did, then the first transition would lead to the right end-marker in the state $1$, and the automaton would reject, because $k\neq 1$. Next, it will be shown that each accepted string begins with the symbol $a_{1}$ and ends with the symbol $a_{N-1}$. Finally, it will be proved that the automaton cannot skip any number, that is, the number of every next symbol, as compared to the number of the previous symbol, cannot increase by more than $1$. If the number decreases or does not change, this would make the string only longer; but in order to reach $a_{N-1}$ from $a_{1}$ without skipping any number, the automaton would have to move through all symbols of the alphabet, and therefore an accepted string cannot be shorter than $N-1$ symbols.

Lemma 2.

Every string accepted by the automaton $A$ begins with the symbol $a_{1}$.

Proof.

Let the automaton $A$ accept some string that starts with some symbol $a_{i}$. The transition from the initial configuration leads the automaton to the state $r^{(1)}_{1}$ at the first symbol $a_{i}$. As $(P^{(1)},R^{(1)})=(\varnothing,\{1\})$, the state $r^{(1)}_{1}$ is $1$.

Transitions by the symbol $a_{i}$ are defined only in states from $R^{(i)}\cup P^{(i+1)}$, and hence $1\in R^{(i)}$, for otherwise the automaton immediately rejects. If there is at least one more state in $R^{(i)}$, then the transition in the state $1$ by $a_{i}$ moves the automaton to the left. Then the automaton returns to the left end-marker, and then either loops or rejects, because there is only one transition defined there. Therefore, there are no other states in $R^{(i)}$ besides $1$, and so, $(P^{(i)},R^{(i)})=(\varnothing,\{1\})=(P^{(1)},R^{(1)})$, which implies $i=1$. ∎

Lemma 3.

Every string accepted by the automaton $A$ ends with the symbol $a_{N-1}$.

Proof.

Let a string accepted by $A$ end with a symbol $a_{i}$. To accept, the automaton should move from $a_{i}$ to the right using the transition (1c), and it arrives at the right end-marker in the state $r^{(i+1)}_{1}$. As the only accepting state is $k$, and the automaton rejects at the right end-marker in all other states, this state must be $r^{(i+1)}_{1}=k$. Because the state $r^{(i+1)}_{1}$ is the least in $R^{(i+1)}$, it follows that $R^{(i+1)}=\{k\}$ and $P^{(i+1)}=\varnothing$. Therefore, this is the last pair, and $i=N-1$. ∎

Lemma 4.

No string accepted by the automaton $A$ may contain any substring of the form $a_{i}a_{j}$, where $j>i+1$.

Proof.

The proof is by contradiction. Suppose that $A$ accepts a string that contains a substring $a_{i}a_{j}$, with $j>i+1$. In order to accept, the automaton should eventually reach this symbol $a_{j}$ for the first time, moving to it from the symbol $a_{i}$. To make this transition, the automaton should be at $a_{i}$ in some state from $Q^{+}$ (indeed, if it were in a state from $Q^{-}$, then it would have been at $a_{j}$ already at the previous step). Then the automaton must use the transition (1c) to move from $a_{i}$ to $a_{j}$, and this transition leads to the state $r^{(i+1)}_{1}$. For the computation to go onward, this state should lie in $R^{(j)}$. Moreover, the state $r^{(i+1)}_{1}$ should be the least in $R^{(j)}$, for otherwise the pair $(P^{(j)},R^{(j)})$ would be less than the pair $(P^{(i+1)},R^{(i+1)})$. Also $r^{(i+1)}_{1}$ cannot be the only state in $R^{(j)}$: if it were, then $(P^{(j)},R^{(j)})$ would either coincide with or be less than $(P^{(i+1)},R^{(i+1)})$.

It can be concluded that $r^{(i+1)}_{1}=r^{(j)}_{1}$, and the next transition from this state leads to the state $p^{(j)}_{1}$, moving to the symbol $a_{i}$. For the automaton to have a transition in the state $p^{(j)}_{1}$ at $a_{i}$, this state should belong to $P^{(i+1)}$. In addition, it should be the least among the states in $P^{(i+1)}$, because if there were a lesser state $p$, then the second term in the sequence for $(P^{(i+1)},R^{(i+1)})$ would be $-p$, and this pair would be greater than $(P^{(j)},R^{(j)})$. This leads to the equality $p^{(j)}_{1}=p^{(i+1)}_{1}$.

By analogous arguments, one can prove that the sequences for $(P^{(j)},R^{(j)})$ and for $(P^{(i+1)},R^{(i+1)})$ must coincide and continue infinitely. This is impossible, because the numbers of states increase, and there are finitely many of them. ∎

Corollary 1 (from Theorem 2).

For every $n\geqslant 2$, there is a direction-determinate 2DFA with $n$ states, such that the length of the shortest string it accepts is $\binom{n}{\lfloor\frac{n}{2}\rfloor}-1$.
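One way to obtain the corollary from Theorem 2 is sketched below. The particular choice of $k$ and $\ell$ is an assumption made here for illustration, since the text does not say which values are used; other choices, such as $k=\lfloor\frac{n}{2}\rfloor+1$ and $\ell=\lceil\frac{n}{2}\rceil-1$, work equally well by the symmetry of binomial coefficients.

```latex
% Assumed parameters: k = \lceil n/2 \rceil + 1, \ell = \lfloor n/2 \rfloor - 1.
% For n \geqslant 2 they satisfy k \geqslant 2, \ell \geqslant 0 and k + \ell = n.
\[
  \binom{k+\ell}{\ell+1}-1
  \;=\; \binom{n}{(\lfloor\frac{n}{2}\rfloor-1)+1}-1
  \;=\; \binom{n}{\lfloor\frac{n}{2}\rfloor}-1 .
\]
```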

4 Longer shortest strings for automata of the general form

Figure 3: Computations of automata $A_{2}$, $A_{3}$ and $A_{4}$ from the proof of Theorem 3 on their shortest strings $w_{2}$, $w_{3}$ and $w_{4}$.

The main result of this section is the construction of a family of 2DFA with shortest strings of length $3\cdot 2^{n-2}-1$, where $n$ is the number of states in an automaton. This is more than the maximum possible length of shortest strings for direction-determinate automata; in other words, forgetting the direction is useful.
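The gap between the two quantities is easy to tabulate. The following short sketch simply evaluates both formulas for small $n$; the function names are illustrative only.

```python
from math import comb


def direction_determinate_max(n):
    """Exact maximum for direction-determinate 2DFA: C(n, floor(n/2)) - 1."""
    return comb(n, n // 2) - 1


def general_lower_bound(n):
    """Length achieved by the general-form family: 3 * 2^(n-2) - 1."""
    return 3 * 2 ** (n - 2) - 1


# Print both values side by side for a few small n.
for n in range(2, 11):
    print(n, direction_determinate_max(n), general_lower_bound(n))
```

The second value exceeds the first for every $n\geqslant 2$, since $\binom{n}{\lfloor n/2\rfloor}\leqslant 2^{n-1}<3\cdot 2^{n-2}$.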

Theorem 3.

For each $n\geqslant 2$ there exists a 2DFA with $n$ states, such that the shortest string it accepts is of length $3\cdot 2^{n-2}-1$.

Proof.

The automata and the shortest strings they accept are constructed inductively; for small values of $n$ they are given in Figure 3.

For the inductive proof to work, the following set of properties is ensured for every $n$.

Claim.

For each $n\geqslant 2$ there exists a 2DFA $A_{n}=(\Sigma_{n},Q_{n},\delta_{n})$ with no transitions by end-markers, no initial state and no accepting states, with the set of states $Q_{n}=\{1,\ldots,n\}$, and there exists a string $w_{n}\in\Sigma_{n}^{*}$ of length $3\cdot 2^{n-2}-1$, such that the following two properties hold.

  1. If $A_{n}$ starts at any symbol of $w_{n}$ in the state $n$, then it eventually leaves this string by a transition from its rightmost symbol to the right in the state $1$.

  2. If for some non-empty string $u$ there exists a position, in which the automaton $A_{n}$ can start in the state $n$ and eventually leave the string $u$ by a transition from its rightmost symbol to the right in the state $1$, then $u$ is at least as long as $w_{n}$.

The first observation is that Theorem 3 follows from this claim. Let $n\geqslant 2$, and let $A_{n}$ and $w_{n}$ be an automaton and a string that satisfy the conditions in the claim. Then $A_{n}$ is supplemented with an initial state $n$, a set of accepting states $\{1\}$ and a single transition by the left end-marker: from the state $n$ to the state $n$, moving to the right; no transitions by the right end-marker are defined. The resulting automaton $A^{\prime}_{n}$ becomes a valid 2DFA, and it accepts the string $w_{n}$ as follows: from the initial state at $\vdash$ it moves to the first symbol of $w_{n}$ in the state $n$, then, by the first point of the claim, the automaton eventually leaves $w_{n}$ to the right in the state $1$, and thus arrives at the right end-marker $\dashv$ in an accepting state.

To see that every string accepted by $A^{\prime}_{n}$ is of length at least $|w_{n}|$, let $u$ be any accepted string. It is not empty, because on the empty string the automaton steps on the right end-marker in the state $n$ and rejects. Then, after the first step the automaton $A^{\prime}_{n}$ is at the first symbol of $u$ in the state $n$. It cannot return to $\vdash$, because it has already used the only transition at this marker, and if it ever comes back, it will reject or loop. Also the automaton cannot come to $\dashv$ in states other than $1$. In order to accept, it must arrive at $\dashv$ in the state $1$, and this is the first and the only time when it leaves the string $u$. Then, by the second point of the claim, the length of $u$ cannot be less than the length of $w_{n}$.

It remains to prove the claim, which is done by induction on $n$.

Base case: $n=2$.

The automaton $A_{2}=(\Sigma_{2},Q_{2},\delta_{2})$ for $n=2$ is constructed as follows. The alphabet is $\Sigma_{2}=\{a,b\}$, and the set of states is $Q_{2}=\{1,2\}$. The transition function is defined by

$$
\begin{aligned}
\delta_{2}(2,a) &= (2,+1),\\
\delta_{2}(2,b) &= (1,-1),\\
\delta_{2}(1,a) &= (1,+1),\\
\delta_{2}(1,b) &= (1,+1).
\end{aligned}
$$

The string $w_{2}$ is $ab$, and the computation of $A_{2}$ on $w_{2}$ is presented in Figure 3 (top left). To be precise, computations starting in the state $2$ either at $a$ or at $b$ both end by leaving the string to the right in the state $1$, as claimed. There are only two shorter non-empty strings: $a$ and $b$. If the automaton starts on the string $a$ in the state $2$, then it moves to the right in the state $2$; on $b$, it moves to the left in the state $1$. In either case, it does not go to the right in the state $1$. Thus, the second point of the claim is satisfied. The length of the string is $|w_{2}|=2=3\cdot 2^{0}-1$.
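The base case is small enough to trace by machine. The sketch below encodes $\delta_{2}$ as a dictionary and runs the automaton on a string without end-markers, reporting on which side and in which state the head leaves; the helper name `exit_of` and its return convention are assumptions made for this illustration.

```python
# Transition table of A_2: (state, symbol) -> (new state, step).
DELTA_2 = {
    (2, "a"): (2, +1),
    (2, "b"): (1, -1),
    (1, "a"): (1, +1),
    (1, "b"): (1, +1),
}


def exit_of(delta, word, pos, state):
    """Run from position `pos` (0-based) in `state` on a string without end-markers.

    Returns ("left" | "right", state) once the head leaves the string, or None
    if the run gets stuck on an undefined transition or enters a loop.
    """
    seen = set()
    while 0 <= pos < len(word):
        if (pos, state) in seen:
            return None
        seen.add((pos, state))
        move = delta.get((state, word[pos]))
        if move is None:
            return None
        state, step = move
        pos += step
    return ("right" if pos >= len(word) else "left", state)


# Both starting positions on w_2 = ab, then the two shorter strings.
print(exit_of(DELTA_2, "ab", 0, 2), exit_of(DELTA_2, "ab", 1, 2))
print(exit_of(DELTA_2, "a", 0, 2), exit_of(DELTA_2, "b", 0, 2))
```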

Induction step: $n\to n+1$.

Let an $n$-state 2DFA $A_{n}=(\Sigma_{n},Q_{n},\delta_{n})$ and a string $w_{n}\in\Sigma_{n}^{*}$ satisfy the claim. The $(n+1)$-state automaton $A_{n+1}$ satisfying the claim is constructed as follows. Let $A_{n+1}=(\Sigma_{n+1},Q_{n+1},\delta_{n+1})$.

  • Its alphabet is $\Sigma_{n+1}=\overrightarrow{\Sigma_{n}}\cup\overleftarrow{\Sigma_{n}}\cup\{\#\}$, where $\overrightarrow{\Sigma_{n}}=\{\,\overrightarrow{a}\mid a\in\Sigma_{n}\,\}$ and $\overleftarrow{\Sigma_{n}}=\{\,\overleftarrow{a}\mid a\in\Sigma_{n}\,\}$.

  • The set of states is $Q_{n+1}=Q_{n}\cup\{n+1\}=\{1,\ldots,n+1\}$.

  • The transition function is defined as follows. In the new state $n+1$, the automaton moves by all symbols with arrows in the directions pointed by the arrows.

    $$
    \begin{aligned}
    \delta_{n+1}(n+1,\overrightarrow{a}) &= (n+1,+1), && \text{for } a\in\Sigma_{n}\\
    \delta_{n+1}(n+1,\overleftarrow{a}) &= (n+1,-1), && \text{for } a\in\Sigma_{n}
    \end{aligned}
    $$

    In all old states $1,\ldots,n$, on symbols with arrows, the new automaton works in the same way as the automaton $A_{n}$ on the corresponding symbols without arrows.

    $$\delta_{n+1}(i,\overrightarrow{a})=\delta_{n+1}(i,\overleftarrow{a})=\delta_{n}(i,a), \qquad \text{for } a\in\Sigma_{n} \text{ and } i\in\{1,\ldots,n\}$$

    By the new separator symbol $\#$, only two transitions are defined. In the state $n+1$, the automaton moves to the left in the state $n$, thus starting the automaton $A_{n}$ on the substring to the left.

    $$\delta_{n+1}(n+1,\#)=(n,-1)$$

    And if the automaton gets to $\#$ in the state $1$ (which happens after concluding the simulation of $A_{n}$ on the substring to the left), then the automaton moves to the right in the state $n$ to start the simulation of $A_{n}$ also on the substring to the right of the separator $\#$.

    $$\delta_{n+1}(1,\#)=(n,+1)$$

    The rest of the transitions are undefined.

Figure 4: Computation of the automaton $A_{n+1}$ on the string $w_{n+1}$.

Note that once the automaton $A_{n+1}$ leaves the state $n+1$, it never returns to it, because there are no transitions to $n+1$ from any other state. Let $h\colon(\overrightarrow{\Sigma_{n}}\cup\overleftarrow{\Sigma_{n}})^{*}\to\Sigma_{n}^{*}$ be a string homomorphism which removes the arrow from the top of every symbol, that is, $h(\overrightarrow{a})=h(\overleftarrow{a})=a$ for all $a\in\Sigma_{n}$. The automaton $A_{n+1}$ works in the states $1,\ldots,n$ on symbols from $\overrightarrow{\Sigma_{n}}\cup\overleftarrow{\Sigma_{n}}$ as $A_{n}$ works on the corresponding symbols from $\Sigma_{n}$. Then, if $h(w)=w_{n}$ for some $w\in(\overrightarrow{\Sigma_{n}}\cup\overleftarrow{\Sigma_{n}})^{*}$, it follows that the automaton $A_{n+1}$, having started in the state $n$ at any symbol of $w$, eventually leaves the string $w$ by moving to the right in the state $1$. Furthermore, if $|w|<|w_{n}|$ for some string $w\in(\overrightarrow{\Sigma_{n}}\cup\overleftarrow{\Sigma_{n}})^{*}$, then the automaton $A_{n+1}$, having started in the state $n$ at any symbol of $w$, cannot leave the string by moving to the right in the state $1$.

The string $w_{n+1}$ is defined as $\overrightarrow{w_{n}}\#\overleftarrow{w_{n}}$, where $\overrightarrow{a_{1}\ldots a_{\ell}}=\overrightarrow{a_{1}}\ldots\overrightarrow{a_{\ell}}$ and $\overleftarrow{a_{1}\ldots a_{\ell}}=\overleftarrow{a_{1}}\ldots\overleftarrow{a_{\ell}}$ for every string $a_{1}\ldots a_{\ell}\in\Sigma_{n}^{*}$. The length of $w_{n+1}$ is $|w_{n+1}|=2|w_{n}|+1=2(3\cdot 2^{n-2}-1)+1=3\cdot 2^{n-1}-1$, as desired.
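The inductive construction of $A_{n}$ and $w_{n}$ can be carried out mechanically and both points of the claim tested for small $n$. The following is a sketch under stated assumptions, not the authors' code: arrowed symbols $\overrightarrow{a}$ and $\overleftarrow{a}$ are encoded as the tuples `('>', a)` and `('<', a)`, the separator is the string `'#'`, and all function names are hypothetical. The exhaustive check of the second point is exponential in the string length, so it is only feasible for $n\leqslant 3$.

```python
from itertools import product


def exit_of(delta, word, pos, state):
    """Return ("left" | "right", state) when the head leaves `word`, else None."""
    seen = set()
    while 0 <= pos < len(word):
        if (pos, state) in seen:          # loop
            return None
        seen.add((pos, state))
        move = delta.get((state, word[pos]))
        if move is None:                  # undefined transition
            return None
        state, step = move
        pos += step
    return ("right" if pos >= len(word) else "left", state)


def step_up(n, sigma, delta, w):
    """Build (Sigma_{n+1}, delta_{n+1}, w_{n+1}) from (Sigma_n, delta_n, w_n)."""
    new_sigma = [(">", s) for s in sigma] + [("<", s) for s in sigma] + ["#"]
    new_delta = {}
    for s in sigma:                       # new state n+1 follows the arrows
        new_delta[(n + 1, (">", s))] = (n + 1, +1)
        new_delta[(n + 1, ("<", s))] = (n + 1, -1)
    for (q, s), move in delta.items():    # old states ignore the arrows
        new_delta[(q, (">", s))] = move
        new_delta[(q, ("<", s))] = move
    new_delta[(n + 1, "#")] = (n, -1)     # start A_n on the left part
    new_delta[(1, "#")] = (n, +1)         # then start A_n on the right part
    new_w = [(">", s) for s in w] + ["#"] + [("<", s) for s in w]
    return new_sigma, new_delta, new_w


def build(n):
    """Return (Sigma_n, delta_n, w_n), starting from the base case n = 2."""
    sigma = ["a", "b"]
    delta = {(2, "a"): (2, +1), (2, "b"): (1, -1),
             (1, "a"): (1, +1), (1, "b"): (1, +1)}
    w = ["a", "b"]
    for m in range(2, n):
        sigma, delta, w = step_up(m, sigma, delta, w)
    return sigma, delta, w


def has_shorter_witness(n):
    """Exhaustively look for a string shorter than w_n violating point 2."""
    sigma, delta, w = build(n)
    return any(exit_of(delta, u, p, n) == ("right", 1)
               for length in range(1, len(w))
               for u in product(sigma, repeat=length)
               for p in range(length))
```

With these helpers, the first point of the claim for a given `n` amounts to checking that `exit_of(delta, w, p, n)` equals `("right", 1)` for every position `p` of `w`.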

First, it is proved that the automaton $A_{n+1}$ works on the string $w_{n+1}$ as stated in the first point of the claim. Let $A_{n+1}$ start its computation on the string $w_{n+1}$ at any symbol in the state $n+1$, as shown in Figure 4. By the symbols in $\overrightarrow{w_{n}}$, the automaton moves to the right, maintaining the state $n+1$; by the symbols in $\overleftarrow{w_{n}}$, it moves to the left in $n+1$. Thus, wherever the automaton begins, it eventually arrives at the separator $\#$ in the state $n+1$. Next, the automaton moves to the last symbol of $\overrightarrow{w_{n}}$ in the state $n$. Since $h(\overrightarrow{w_{n}})=w_{n}$, the automaton $A_{n+1}$ operates on $\overrightarrow{w_{n}}$ as $A_{n}$ on $w_{n}$, and leaves $\overrightarrow{w_{n}}$ by a transition to the right in the state $1$. Then $A_{n+1}$ arrives at the separator $\#$ again, now in the state $1$, and moves to the first symbol of $\overleftarrow{w_{n}}$ in the state $n$. As $h(\overleftarrow{w_{n}})=w_{n}$, the automaton $A_{n+1}$ works as $A_{n}$ on $w_{n}$, and leaves $\overleftarrow{w_{n}}$ (and the whole string $w_{n+1}$) by moving to the right in the state $1$.

Turning to the second point of the claim, it should be proved that computations of a certain form are impossible on any strings shorter than $w_{n+1}$. Let $w\in\Sigma_{n+1}^{*}$ be a string, and let there be a position in $w$, such that the automaton $A_{n+1}$, having started at this position in the state $n+1$, eventually leaves the string $w$ by a transition to the right in the state $1$. It is claimed that $|w|\geqslant|w_{n+1}|$.

Consider the computation of $A_{n+1}$ leading out of $w$ to the right in the state $1$. It begins in the state $n+1$, and the automaton maintains the state $n+1$ at all symbols except $\#$. In order to reach the state $1$, there should be a moment in the computation on $w$ when the automaton arrives at some symbol $\#$ in the state $n+1$. Let $u$ be the prefix of $w$ to the left of this $\#$, and let $v$ be the suffix to the right of this $\#$; note that the substrings $u$ and $v$ may contain more symbols $\#$. It is sufficient to prove that $|u|\geqslant|w_{n}|$ and $|v|\geqslant|w_{n}|$.

Figure 5: The partition $w=u\#v$ and the suffix $v_{0}$ of $v$.

Consider first the case of the suffix $v$. Let $v_{0}$ be the longest suffix of $v$ that does not contain the symbol $\#$; then the symbol preceding $v_{0}$ in $w$ is the separator $\#$, as shown in Figure 5. Once the automaton $A_{n+1}$ steps from the last $\#$ in $w$ to the right, it arrives at the first symbol of $v_{0}$ in the state $n$ (by the unique transition to the right at $\#$). The string $v_{0}$ cannot be empty: otherwise the automaton would leave $w$ to the right in the state $n$, whereas it must leave in the state $1$, and $n\neq 1$. Once the automaton is inside $v_{0}$, it cannot return to $\#$ anymore, since it has already used the only transition to the right from $\#$, and cannot use it again without looping. Therefore, the automaton $A_{n+1}$ starts on the string $v_{0}\in(\Sigma_{n+1}\setminus\{\#\})^{*}$ in the state $n$, and, operating as $A_{n}$, eventually leaves this string to the right in the state $1$. Then $|v_{0}|\geqslant|w_{n}|$ by the induction hypothesis, and hence $|v|\geqslant|w_{n}|$.

Figure 6: The case of computations on $u$ not reaching any separators.

Now consider the prefix $u$. Once the automaton $A_{n+1}$ comes in the state $n+1$ to the separator $\#$ between $u$ and $v$, it moves to the last symbol of $u$ in the state $n$. In order to leave the string $u$ to the right and proceed further, it must return to the separator $\#$ in the state $1$, because there are no transitions in any of the states $\{2,\ldots,n\}$ at this separator. If there are no symbols $\#$ in $u$, or if there are some, but the automaton does not reach them, then the entire computation of $A_{n+1}$ on $u$ takes place on a certain suffix of $u$ that does not contain $\#$, as illustrated in Figure 6. This computation follows a computation of $A_{n}$ on a string from $\Sigma_{n}^{*}$. Then, by the induction hypothesis, this suffix is not shorter than $w_{n}$, and therefore $|u|\geqslant|w_{n}|$.

The remaining case is when the automaton comes to some symbol $\#$ inside the string $u$. Let $u_{0}$ be the maximal suffix of $u$ not containing any symbols $\#$, as in Figure 7. The automaton $A_{n+1}$ visits the separator $\#$ to the left of $u_{0}$, and then immediately moves from this separator to the first symbol of $u_{0}$ in the state $n$ (the string $u_{0}$ is non-empty, because it is followed by $\#$, which has no transitions in the state $n$). Returning to the $\#$ to the left of $u_{0}$ is not an option, since the unique transition to the right at $\#$ has been used already. Therefore, the automaton leaves $u_{0}$ by a transition to the right, and comes to the separator $\#$ between $u$ and $v$. In order to continue the computation, it should come there in the state $1$. By the induction hypothesis for this computation on $u_{0}$, the length of $u_{0}$ is at least $|w_{n}|$. Then the length of the entire $u$ is also at least $|w_{n}|$.

Figure 7: The case of computations on $u$ reaching a separator $\#$ inside $u$.

This confirms that $|w|=|u|+1+|v|\geqslant|w_{n}|+1+|w_{n}|=|w_{n+1}|$ and completes the proof. ∎
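The final inequality uses the recurrence $|w_{n+1}|=2|w_{n}|+1$. As a small sanity check, the following Python sketch iterates this recurrence and compares it with the closed form $3\cdot 2^{n-2}-1=\frac{3}{4}\cdot 2^{n}-1$. It assumes that $|w_{n}|$ coincides with this lower-bound expression and that the recurrence starts from length 2 at $n=2$ (the value of the closed form there); the function names are hypothetical.

```python
def length_by_recurrence(n, base_n=2, base_len=2):
    """Iterate |w_{k+1}| = 2*|w_k| + 1 from an assumed base case up to n."""
    length = base_len  # assumption: length 2 at n = 2
    for _ in range(base_n, n):
        length = 2 * length + 1
    return length


def closed_form(n):
    """3 * 2^(n-2) - 1, that is, (3/4) * 2^n - 1, for n >= 2."""
    return 3 * 2 ** (n - 2) - 1


# Print both values side by side for a few small n.
for n in range(2, 8):
    print(n, length_by_recurrence(n), closed_form(n))
```

Under these assumptions the two columns agree, since $2(3\cdot 2^{n-2}-1)+1=3\cdot 2^{n-1}-1$.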

5 Conclusion

The maximum length of the shortest accepted string for direction-determinate 2DFA has been determined precisely, whereas for 2DFA of the general form, a lower bound of the order $2^{n}$ has been established. The known upper bound on this length is of the order $4^{n}$. Bounds on the maximum length of shortest strings for small values of the number of states $n$ are given in Table 2.

| $n$ | direction-determinate 2DFA: $\binom{n}{\lfloor n/2\rfloor}-1$ | 2DFA of the general form, lower bound: $3\cdot 2^{n-2}-1$ | 2DFA of the general form, computed values | 2DFA of the general form, upper bound: $\binom{2n}{n+1}-1$ |
|---|---|---|---|---|
| 2 | 1 | 2 | 2 | 3 |
| 3 | 2 | 5 | 6 | 14 |
| 4 | 5 | 11 | 17 | 55 |
| 5 | 9 | 23 | 32 | 209 |
| 6 | 19 | 47 | | 791 |

Table 2: The maximum length of shortest accepted strings for $n$-state 2DFA, for small $n$.
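The three theoretical columns of Table 2 can be recomputed directly from their formulas. The following Python sketch does so; the function names are hypothetical and chosen only for illustration.

```python
from math import comb


def direction_determinate_exact(n):
    """Exact maximum for direction-determinate 2DFA: C(n, floor(n/2)) - 1."""
    return comb(n, n // 2) - 1


def general_lower_bound(n):
    """Lower bound for 2DFA of the general form: 3 * 2^(n-2) - 1."""
    return 3 * 2 ** (n - 2) - 1


def general_upper_bound(n):
    """Upper bound for 2DFA of the general form: C(2n, n+1) - 1."""
    return comb(2 * n, n + 1) - 1


# One row per number of states, in the column order of the table
# (the computed-values column is not derived from a formula and is omitted).
for n in range(2, 7):
    print(n, direction_determinate_exact(n),
          general_lower_bound(n), general_upper_bound(n))
```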

In the table, besides the theoretical bounds, there are also some computed values of the length of shortest strings in some automata. The example for $n=3$ was obtained by exhaustive search, while the examples for $n=4$ and $n=5$ were found by heuristic search. Therefore, the maximum length of the shortest string for 3-state automata is now known precisely, for 4-state automata it is at least 17 and possibly more, and the value given for 5 states is most likely far below the true maximum. The computations of the automata found for $n=3$ and $n=4$ on their shortest strings are presented in Figure 8.
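A search of this kind needs, as an inner step, a way to find the shortest string accepted by one given automaton. The following Python sketch illustrates that step by brute force and is not the program used to obtain the values in the table. It rests on several assumptions: the tape is delimited by hypothetical end-marker symbols `<` and `>`, the automaton accepts when it reaches an accepting state at the right end-marker, and strings are enumerated in order of increasing length, which takes exponential time.

```python
from itertools import product

LEFT_END, RIGHT_END = "<", ">"  # hypothetical end-marker symbols


def accepts(delta, start, accepting, word):
    """Run a 2DFA on <word>; accept on reaching an accepting state
    at the right end-marker (an assumed acceptance convention)."""
    tape = [LEFT_END, *word, RIGHT_END]
    state, pos = start, 0
    seen = set()
    while (state, pos) not in seen:
        seen.add((state, pos))
        if pos == len(tape) - 1 and state in accepting:
            return True
        step = delta.get((state, tape[pos]))
        if step is None:
            return False  # undefined transition: reject
        state, move = step
        pos += move
        if not 0 <= pos < len(tape):
            return False  # moved off the tape: treated as rejection
    return False  # a configuration repeated: the automaton loops


def shortest_accepted(delta, start, accepting, alphabet, max_len):
    """Try all strings of length 0, 1, ..., max_len in turn."""
    for length in range(max_len + 1):
        for word in product(alphabet, repeat=length):
            if accepts(delta, start, accepting, word):
                return "".join(word)
    return None


# Toy automaton (hypothetical): sweep right to the end-marker,
# step back once, and accept only if the last symbol is 'b'.
delta = {
    (0, "<"): (0, +1),
    (0, "a"): (0, +1),
    (0, "b"): (0, +1),
    (0, ">"): (1, -1),
    (1, "b"): (2, +1),
}
print(shortest_accepted(delta, 0, {2}, "ab", 5))
```

An exhaustive search over all $n$-state automata would call such a procedure once per candidate transition table, which is feasible only for very small $n$.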

It should be noted that, for $n\geqslant 3$, these computed values exceed the theoretical lower bound $\frac{3}{4}\cdot 2^{n}-1$ proved in this paper, and are much less than the known upper bound $\binom{2n}{n+1}-1$. Thus, the bounds for 2DFA of the general form are still in need of improvement.

Figure 8: Automata found by computer programs, and their shortest strings: (top) 3 states, string of length 6; (bottom) 4 states, string of length 17.

References