
National Research University Higher School of Economics
Email: rubtsov99@gmail.com

A Linear-time Simulation of Deterministic $d$-Limited Automata

Alexander A. Rubtsov (ORCID: 0000-0001-8850-9749). Supported by Russian Science Foundation grant 20–11–20203.

A $d$-limited automaton is a Turing machine that may rewrite each input cell at most $d$ times. Hibbard (1967) showed that for every $d \geqslant 2$ such automata recognize all context-free languages and that deterministic $d$-limited automata form a strict hierarchy. Later, Pighizzini and Pisoni proved that the second level of this hierarchy coincides with deterministic context-free languages (DCFLs). We present a linear-time recognition algorithm for deterministic $d$-limited automata in the RAM model, thereby extending linear-time recognition beyond DCFLs. We further generalize this result to deterministic $d(n)$-limited automata, where the bound $d$ may depend on the input length $n$. In addition, we prove an $O(n \cdot k \cdot d(n) + m)$ bound for the membership problem, where the input includes both the word and the automaton's description, with $m$ denoting the size of the description and $k$ the number of states.

1 Introduction

Context-free languages (CFLs) play a central role in computer science. Their deterministic subclass (DCFLs) is especially important in compiler construction, where parsing is based on the connection between $\mathrm{LR}(1)$-grammars and deterministic pushdown automata (DPDAs). In 1965 Knuth showed [9] that $\mathrm{LR}(1)$-grammars generate exactly the class of DCFLs, and DPDAs provide linear-time parsing algorithms for them. Thus, DCFLs form a practically significant subclass of CFLs: they are recognizable in linear time, and $\mathrm{LR}(1)$-grammars admit linear-time construction of derivation trees. All these linear-time bounds, both in classical parsing theory and in this paper, are measured in the RAM model. We emphasize that the situation is different for Turing machines: Hennie showed [7] that any language recognizable in linear time on a Turing machine is regular, so stronger models such as RAM are required to capture the linear-time parsing of non-regular languages.

It is important to distinguish between two closely related problems. We follow the convention that in the recognition problem, the language is fixed and the input is only the word $w$, while in the membership problem, both a description of the language (for instance, a context-free grammar) and the word $w$ are given as input.

According to this convention, the best known upper bound for the recognition problem for context-free languages is $O(n^{\omega})$, where $\omega \leqslant 2.373$ is the exponent of fast matrix multiplication. This bound was obtained by Valiant in 1975 [20], and his algorithm decides whether a given word belongs to the language generated by a fixed context-free grammar, but it does not construct a parse tree. Subsequent work [12, 1] confirmed that the same setting — fixed grammar and variable input word — is considered, and provided strong evidence that this upper bound is hard to substantially improve.

1.1 $d$-Limited Automata and $d$-DCFLs

We next recall the notion of $d$-limited automata, introduced by Hibbard [8]. A $d$-limited automaton ($d$-LA) is a nondeterministic Turing machine that scans only the cells containing the input word together with end-markers, and is allowed to rewrite a symbol in each cell (except the end-markers) only during its first $d$ visits. Hibbard showed that for every $d \geqslant 2$, $d$-LAs recognize precisely the class of context-free languages; it is also known that $1$-LAs recognize exactly the class of regular languages [21]. For $d = \infty$, the model coincides with linear-bounded automata, which recognize the class of context-sensitive languages.

For the deterministic case, Pighizzini and Pisoni [14] proved that deterministic $2$-LAs recognize exactly the deterministic context-free languages (DCFLs) by providing an algorithm that transforms a $2$-LA into a PDA while preserving determinism; the inverse transformation had already been established by Hibbard [8]. Following Hibbard, we call a language recognized by a deterministic $d$-LA a $d$-deterministic language ($d$-DCFL) and denote such automata by $d$-DLAs. Hibbard also established that the hierarchy is strict: for every fixed $d$, the class of $(d+1)$-DCFLs properly contains the class of $d$-DCFLs.

Although DCFLs are widely used, they suffer from certain practical limitations. First, they are not closed under reversal: while both

$$L_{d} = \{d a^{n} b^{n} c^{m} \mid n, m \geqslant 0\}, \qquad L_{e} = \{e a^{m} b^{n} c^{n} \mid n, m \geqslant 0\}$$

are DCFLs, their union is a DCFL, but the reversal $(L_{d} \cup L_{e})^{R}$ is not. This language can still be recognized by a $3$-DLA, which can scan the input from right to left before simulating a $2$-DLA for $L_{d} \cup L_{e}$.

Second, DCFLs are not closed under union. The language

$$L_{d,e} = \{a^{n} b^{n} c^{m} \mid n, m \geqslant 0\} \cup \{a^{m} b^{n} c^{n} \mid n, m \geqslant 0\}$$

is a union of two DCFLs but is known to be inherently ambiguous [17]. Every DCFL can be generated by an unambiguous grammar, in particular by an $\mathrm{LR}(1)$ grammar, and Hibbard showed [8] that the same holds for $d$-DCFLs. Hence $L_{d,e}$ is not a $d$-DCFL for any $d$, so the union $\bigcup_{d \geqslant 1} d\text{-DCFL}$ does not cover all CFLs. This illustrates that $d$-DCFLs, like DCFLs, are not closed under union; recognizing such languages in linear time via $d$-DLAs requires parallelism.

1.2 $d(n)$-Limited Automata

We now consider a natural extension of $d$-LAs. Assume that $d$ is not a constant but a function $d(n)$ depending on the input length $n$. The automaton can then rewrite the content of a cell until the number of visits to that cell reaches $d(n)$. To our knowledge, this is the first time such a generalization has been considered. In classical restrictions on Turing machine computation, the time bound is imposed on the total number of cell visits, whereas here we impose a bound on the number of visits per individual cell (after which the cell remains accessible for reading but no longer for rewriting).

1.3 Our Contribution

We focus on the membership problem for $d(n)$-DLAs. Let $m$ be the length of the description of a $d(n)$-DLA ${\cal A}$, $k$ the number of its states, and $n$ the length of the input word $w$. We present an $O(n \cdot k \cdot d(n) + m)$ algorithm in the RAM model for the membership problem. In particular, when ${\cal A}$ is fixed and $d(n) = O(1)$ (i.e., for $d$-DLAs), this yields a linear-time algorithm. Thus every $d$-DCFL is recognizable in linear time in the RAM model.

Hennie proved in [7] that every language recognizable in linear time by a deterministic Turing machine is regular (and a more general result holds for nondeterministic TMs [19, 13]). Hence no $d$-DLA with $d \geqslant 2$ recognizing a non-regular language can be simulated by a linear-time Turing machine. Guillon and Prigioniero [6] showed that every $1$-DLA can be transformed into an equivalent linear-time Turing machine (and an analogous result holds for nondeterministic $1$-LAs), and a related construction was also used by Kutrib and Wendlandt [11]. Their approach relies on Shepherdson's classical simulation of two-way DFAs by one-way DFAs [18]. We build on this idea as well, but it cannot be applied directly because of Hennie's result: otherwise one could recognize a non-regular language in linear time on a Turing machine, contradicting the theorem.

To overcome this obstacle we transform a classical Turing machine into one that operates on a doubly-linked list instead of a tape and adapt Shepherdson’s construction to this model. This forms the basis of our membership algorithm. We also reinterpret Birget’s algebraic constructions [3] in graph-theoretic terms, which provides subroutines underlying the final version of our algorithm. A related algebraic approach was developed by Kunc and Okhotin [10], who employed transformation semigroups to capture, for each substring, the state-to-state behavior of two-way deterministic finite automata. This line of work closely mirrors Birget’s method via function composition, and our construction follows the same underlying ideas.

We further establish an upper bound of $O(d(n) \cdot n^{2})$ steps for direct simulation of $d(n)$-DLAs, provided the computation does not enter an infinite loop. In particular, this implies an $O(n^{2})$ upper bound for $d$-DLAs, which, to the best of our knowledge, was previously open. This bound is tight: it is witnessed by the classical $d$-DLA recognizing the language $\{a^{n} b^{n} \mid n \geqslant 0\}$.

From a theoretical perspective, our results show that some CFLs are easy (linear-time recognizable), while in general recognition of CFLs may require $O(n^{\omega})$ time, and by conditional results there must exist hard CFLs (recognizable only in superlinear time). We discuss this point in the following subsection.

1.4 Related Results

We begin with linear-time recognizable subclasses of context-sensitive languages (CSLs) and context-free languages (CFLs). E. Bertsch and M.-J. Nederhof [2] showed that a nontrivial subclass of CFLs, the regular closure of DCFLs, is recognizable in linear time. This class consists of all languages obtained by taking a regular expression and replacing each symbol with a DCFL. It evidently contains the aforementioned language $L_{d,e}$ (as a union of DCFLs), so it is a strict extension of DCFLs. Note also that $L_{d,e}$ is recognizable by 2-DPDAs.

A broad subclass of CSLs recognizable in linear time is given by the class of languages accepted by two-way deterministic pushdown automata (2-DPDAs). A linear-time simulation algorithm for 2-DPDAs was obtained by S. Cook [4]. This class clearly contains DCFLs, and it also includes the language of palindromes over an alphabet of at least two letters, which is a well-known example of a context-free language that is not a DCFL.

A. Rubtsov and N. Chudinov introduced in [15, 16] a computational model, DPPDA, for Parsing Expression Grammars (PEGs). This model extends DCFLs, remains recognizable in linear time, and is based on a modification of classical pushdown storage. It was also shown that the class of languages recognized by 2-DPPDAs is recognizable in linear time. Moreover, they proved that parsing expression languages (the class generated by PEGs) contain a highly nontrivial subclass, namely the Boolean closure of the regular closure of DCFLs.

It remains open whether 2-DPDAs, 2-DPPDAs, or PEGs recognize all CFLs. However, the works of L. Lee [12] and Abboud et al. [1] provide strong evidence that this is very unlikely due to complexity-theoretic considerations: any CFG parser with time complexity $O(g n^{3-\varepsilon})$, where $g$ is the size of the grammar and $n$ the input length, can be efficiently converted into an algorithm for multiplying $m \times m$ Boolean matrices in time $O(m^{3-\varepsilon/3})$. This naturally raises the question: can 2-DPDAs or 2-DPPDAs simulate $d$-DLAs for $d \geqslant 3$?

Finally, T. Yamakami presented another extension of Hibbard's approach [22, 23]: in several models the input tape is one-way read-only, while the work tape obeys a similar restriction, forbidding rewriting beyond the first $d$ visits. We leave the application of our simulation technique to such models as a direction for future research.

2 Definitions

In this section we give precise definitions of the computational models used in the paper. We begin with Hibbard's original model of $d$-limited automata, introduced in [8]. We provide below a concise definition; a more formal equivalent definition can be found in [14]. Next, we introduce a modified variant, called the deleting automaton, in which the tape is replaced by a doubly linked list and cells may be deleted; this auxiliary model is central to our simulation technique. Finally, we define $d(n)$-limited automata, a natural extension of $d$-limited automata where the rewriting bound depends on the length of the input word. For clarity, we formulate our simulation algorithm for the case of a fixed constant $d$, since the extension to $d(n)$ is straightforward.

2.1 $d$-Limited Automaton

Let $d \geqslant 0$ be a fixed integer. A deterministic $d$-limited automaton ($d$-DLA) is a deterministic single-tape Turing machine whose tape initially contains the input word bordered by the left endmarker $\vartriangleright$ and the right endmarker $\vartriangleleft$. Each tape symbol is annotated with an integer in $\{0, \dots, d\}$, called its rank. Initially, all letters of the input word have rank 0, while the endmarkers have rank $d$. Whenever the head visits a cell containing a symbol of rank $r < d$, the symbol may be overwritten with a new symbol of rank $r'$ such that $r < r' \leqslant d$. Symbols of rank $d$ are read-only and cannot be changed.

Formally, a $d$-DLA ${\cal A}$ is defined by a tuple

$${\cal A} = (Q, \Sigma, \Gamma, \delta, q_{0}, F),$$

where $Q$ is a finite set of states, $\Sigma$ is the input alphabet (symbols of rank 0), $\Gamma$ is the tape alphabet with $\Sigma \cup \{\vartriangleright, \vartriangleleft\} \subseteq \Gamma$, for each $r$ we let $\Gamma_{r} \subseteq \Gamma$ denote the set of symbols of rank $r$, $q_{0} \in Q$ is the initial state, $F \subseteq Q$ is the set of accepting states, and $\delta$ is the transition function

$$\delta : Q \times \Gamma \to Q \times \Gamma \times \{\leftarrow, \rightarrow\},$$

such that:

  • for $a_{r} \in \Gamma_{r}$ with $r < d$, each transition is of the form

    $$\delta(q, a_{r}) = (q', a_{r'}, m),$$

    where $a_{r'} \in \Gamma_{r'} \setminus \{\vartriangleright, \vartriangleleft\}$, $r < r' \leqslant d$, and $m \in \{\leftarrow, \rightarrow\}$;

  • for $a_{d} \in \Gamma_{d}$, each transition is of the form

    $$\delta(q, a_{d}) = (q', a_{d}, m),$$

    where $m \in \{\leftarrow, \rightarrow\}$; moreover, if $a_{d} = \vartriangleright$ then $m = \rightarrow$, and if $a_{d} = \vartriangleleft$ then $m = \leftarrow$.

A $d$-DLA starts in state $q_{0}$ with the head positioned on the first input symbol (immediately to the right of the left endmarker $\vartriangleright$). At each step, given a state $q$ and the symbol $a$ under the head, it computes $\delta(q, a) = (q', a', m)$, replaces $a$ with $a'$, updates its state to $q'$, and moves the head left or right according to $m$. The input is accepted if the automaton reaches a state $q_{f} \in F$ while scanning the right endmarker $\vartriangleleft$; in this case the computation halts.

We slightly modify Hibbard’s original definition by requiring the transition function to be total. This modification does not change the class of recognizable languages and comes at the cost of a single additional state.
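The step semantics above can be sketched in a few lines of Python. The encoding is ours, not the paper's: $\delta$ is a dict from (state, symbol) to (state, symbol, move) with moves $-1/+1$, ranks are stored in a dict, and a step cutoff stands in for a proper loop check.

```python
# A minimal sketch of the d-DLA semantics of Section 2.1 (our encoding, not the
# paper's). Endmarkers and rank-d symbols are read-only; every rewrite must
# strictly increase the rank, up to d.

LEFT_END, RIGHT_END = ">", "<"

def run_dla(delta, d, ranks, word, q0, accepting, max_steps=10_000):
    """Run a deterministic d-limited automaton on `word`; True iff accepted."""
    tape = [LEFT_END] + list(word) + [RIGHT_END]
    q, i = q0, 1  # the head starts on the first input symbol
    for _ in range(max_steps):
        a = tape[i]
        if q in accepting and a == RIGHT_END:
            return True  # accepting state while scanning the right endmarker
        q, b, move = delta[(q, a)]
        if a in (LEFT_END, RIGHT_END) or ranks[a] == d:
            assert b == a, "rank-d symbols and endmarkers are read-only"
        else:
            assert ranks[a] < ranks[b] <= d, "rewriting must increase the rank"
        tape[i] = b
        i += move
    return False  # cut off: in this sketch a loop counts as rejection

# Toy 2-DLA over {a}: rewrite each `a` to a rank-2 symbol `A` while tracking
# parity; it accepts exactly the words of even length.
DELTA = {
    ("even", "a"): ("odd", "A", +1), ("odd", "a"): ("even", "A", +1),
    ("even", "A"): ("even", "A", +1), ("odd", "A"): ("odd", "A", -1),
    ("odd", "<"): ("odd", "<", -1), ("odd", ">"): ("odd", ">", +1),
}
RANKS = {"a": 0, "A": 2}
```

On odd-length inputs the toy automaton ends up shuttling over read-only cells and never reaches an accepting configuration, which the cutoff turns into a rejection.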

2.2 Deterministic Linked-List Automaton

We next introduce an auxiliary model, which we call the Deterministic Linked-List Automaton (DLLA). In this model there is no constraint on the number of visits to a cell. The tape is replaced by a doubly linked list, so the automaton may delete any cell between the endmarkers (but never the endmarkers themselves). Formally, a DLLA has the same components as a $d$-DLA (states, input and tape alphabets, initial and accepting states), but with a modified transition function:

$$\delta : Q \times \Gamma \to \bigl(Q \times (\Gamma \cup \{\perp\}) \times \{\leftarrow, \rightarrow\}\bigr) \cup \{\uparrow\},$$

where the special symbol $\perp$ indicates that the current cell is to be deleted immediately after the head leaves it. Once a cell is deleted, the head moves directly from its left neighbor to its right neighbor when moving right, and symmetrically when moving left. If the transition function returns $\uparrow$, the computation halts and the input is rejected. No additional restrictions are imposed on the transition function.

It is easy to see that DLLAs recognize exactly the class of deterministic context-sensitive languages. They can simulate deterministic linear-bounded automata (DLBA) directly, and conversely, a DLBA can simulate a DLLA by marking a deleted cell with a special symbol. We employ the doubly linked list representation in order to achieve the claimed upper bounds for deterministic d(n)d(n)-DLAs.

2.3 $d(n)$-Limited Automaton

For $d$-DLAs it is convenient to associate with each letter $a_{r}$ its rank $r$, representing the number of visits to the corresponding cell. For $d(n)$-DLAs this is no longer possible, since the alphabet and the machine description are fixed and cannot grow with the input length. Instead, in a $d(n)$-DLA each tape cell maintains its own visit counter, so the rank is associated with the cell rather than the symbol. Formally, each cell (except the endmarkers $\vartriangleright, \vartriangleleft$) contains a pair $(a, e)$, where $a \in \Gamma$ and $e$ is a bit. The bit $e$ is initially 0, and remains 0 while the number of visits to the cell is less than $d(n)$; once the number of visits reaches $d(n)$, $e$ is set to 1, and from that point on the cell becomes read-only.

This modification of $d$-DLAs does not affect our simulation algorithm. The value of $d(n)$ is fixed for a given input length $n$, and the algorithm never relies on $d$ being a constant rather than a precomputed parameter. Therefore, it suffices to describe the simulation algorithm for $d$-DLAs; the extension to $d(n)$-DLAs is straightforward.
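A possible concrete reading of such a cell is the following sketch; the names (`Cell`, `visit`, `frozen`) are ours and chosen only for illustration.

```python
# Sketch of a d(n)-DLA tape cell: the visit count lives in the cell, not in the
# symbol, and the bit e (`frozen` here) makes the cell read-only once the
# number of visits reaches d(n).

class Cell:
    def __init__(self, symbol):
        self.symbol = symbol
        self.visits = 0
        self.frozen = False  # the bit e: 0 while visits < d(n), then 1

    def visit(self, d_n, new_symbol=None):
        """Register one head visit; a rewrite is applied only while e = 0."""
        if self.frozen:
            return  # read-only: the symbol can still be read, never rewritten
        self.visits += 1
        if new_symbol is not None:
            self.symbol = new_symbol
        if self.visits >= d_n:
            self.frozen = True
```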

3 Linear-Time Simulation Algorithm

In this section we present a linear-time simulation algorithm for deterministic $d$-limited automata ($d$-DLAs). We begin with the recognition problem, where the automaton ${\cal A}$ is fixed and only the input word $w$ of length $n$ varies. Later we extend the construction to the membership problem, where both the automaton and the word are part of the input.

High-level idea. Our approach is inspired by Shepherdson's classical simulation of two-way deterministic finite automata by one-way DFAs [18]. The key observation is that whenever a $d$-DLA ${\cal A}$ produces a block of cells all of rank $d$, the precise contents of these cells are no longer relevant: what matters is only how ${\cal A}$ can enter and leave this block. We therefore compress each such maximal block into a single cell containing a compact mapping that summarizes the block's effect on the computation. If two adjacent blocks are compressed, their mappings can be merged by composition.

To implement this idea, we simulate ${\cal A}$ by a deterministic linked-list automaton $M$ that uses a doubly linked list in place of a tape. The machine $M$ performs two types of steps:

  • ${\cal A}$-moves, which directly simulate moves of ${\cal A}$ on symbols of rank $<d$;

  • technical moves, which occur when $M$ encounters a compressed block. In this case $M$ consults the mapping stored in the corresponding cell to decide how ${\cal A}$ would leave the block, and moves accordingly.

In this way, long stretches of redundant rank-$d$ cells are collapsed into constant-size summaries, ensuring that each cell contributes only a bounded number of times to the overall running time.

In the remainder of this section we first present the simulation algorithm for recognition, together with a correctness proof and amortized analysis. We then introduce the technical machinery of mappings and composition, which allows us to extend the algorithm to the membership problem and to prove the claimed bound.

3.1 Preparations

Directed states.

For convenience, we write $\overrightarrow{p}$ (resp. $\overleftarrow{p}$) to denote a state $p$ entered from the left (resp. right). Formally, if $\delta(q, X) = (p, Y, m)$ with $m \in \{\leftarrow, \rightarrow\}$, we abbreviate it as $\delta(q, X) = (\overleftrightarrow{p}, Y)$, where we write $\overleftrightarrow{p} = (p, m)$. We call such pairs directed states, and use this notation for both $d$-DLAs and DLLAs. We write

$$\overleftarrow{Q} = Q \times \{\leftarrow\}, \quad \overrightarrow{Q} = Q \times \{\rightarrow\}, \quad \overleftrightarrow{Q} = \overleftarrow{Q} \cup \overrightarrow{Q},$$
$$A_{\uparrow} = A \cup \{\uparrow\} \quad \text{for } A \in \{\overleftarrow{Q}, \overrightarrow{Q}, \overleftrightarrow{Q}\}.$$

Mappings.

The key idea is to collapse long segments of rank-$d$ cells into a single object. When ${\cal A}$ makes the $d$-th visit to a cell and writes a symbol of rank $d$, we replace that cell by a mapping $f$. Formally, a mapping is a function

$$f : \overleftrightarrow{Q} \to \overleftrightarrow{Q}_{\uparrow};$$

it specifies, given the entry state and the entry direction, the exit state and the exit direction when ${\cal A}$ leaves the block (or $\uparrow$ if it never does). For a mapping describing a single rank-$d$ cell, the entry direction is irrelevant, but for multi-cell segments it matters.

Segment traversal.

We denote by $W_{\cal A}[i]$ the $i$-th cell of ${\cal A}$'s tape and by

$$W_{\cal A}[l, r] = W_{\cal A}[l] \cdots W_{\cal A}[r-1]$$

the segment of ${\cal A}$'s tape. When we say that the head enters the segment $W_{\cal A}[l, r]$ in the directed state $\overleftrightarrow{q}$, we mean that $\overleftrightarrow{q} = \overrightarrow{q}$ corresponds to the head entering $W_{\cal A}[l]$ from the left, and $\overleftrightarrow{q} = \overleftarrow{q}$ corresponds to entering $W_{\cal A}[r-1]$ from the right. Symmetrically, the head leaves the segment $W_{\cal A}[l, r]$ in the directed state $\overleftrightarrow{p}$, where $\overleftrightarrow{p} = \overrightarrow{p}$ means that the head exits through $W_{\cal A}[r-1]$ to the right, and $\overleftrightarrow{p} = \overleftarrow{p}$ means that it exits through $W_{\cal A}[l]$ to the left.

Segment description mappings.

We say that a mapping $f$ describes a segment $W_{\cal A}[l, r]$ if

  • all letters in $W_{\cal A}[l, r]$ have rank $d$;

  • $f(\overleftrightarrow{q}) = \overleftrightarrow{p}$ if, whenever the head enters the segment $W_{\cal A}[l, r]$ in the directed state $\overleftrightarrow{q}$, it leaves the segment in the directed state $\overleftrightarrow{p}$;

  • $f(\overleftrightarrow{q}) = \uparrow$ if, whenever the head enters the segment $W_{\cal A}[l, r]$ in the directed state $\overleftrightarrow{q}$, it never leaves the segment (i.e., the computation loops inside).

We denote the set of all possible mappings by

$${\cal F} = \{f \mid f : \overleftrightarrow{Q} \to \overleftrightarrow{Q}_{\uparrow}\}.$$

Operations with mappings.

Let $f$ and $g$ describe segments $W_{\cal A}[L, r]$ and $W_{\cal A}[r, R]$, respectively. We define the directed composition $\diamond$ of mappings by setting $h = f \diamond g$ whenever $h$ describes the concatenated segment $W_{\cal A}[L, R]$. Assume now that the head either enters the segment $W_{\cal A}[r, R]$ in the directed state $\overrightarrow{q}$, or enters $W_{\cal A}[L, r]$ in $\overleftarrow{q}$. We define the departure function

$$D : {\cal F} \times {\cal F} \times \overleftrightarrow{Q} \to \overleftrightarrow{Q}_{\uparrow},$$

which, given mappings $f, g$ and an entry state $\overleftrightarrow{q}$, returns the directed state $\overleftrightarrow{p}$ in which ${\cal A}$ leaves the concatenated segment $W_{\cal A}[L, R]$. If the head never leaves $W_{\cal A}[L, R]$, then $D(f, g, \overleftrightarrow{q}) = \uparrow$.

Finally, we denote by $CF : \Gamma_{d} \to {\cal F}$ the function that, for each symbol $X \in \Gamma_{d}$, returns the mapping $CF(X)$ describing the one-cell segment consisting only of $X$.

Since $Q$ is finite, the set ${\cal F}$ of mappings is finite as well; therefore, for a fixed automaton ${\cal A}$, the function $CF$, the departure function $D$, and the operation $\diamond$ are all computable in constant time. In the general case (varying ${\cal A}$) we later show that they are computable in time $O(|Q_{\cal A}|)$. The operation $\diamond$ and the departure function $D$ are well-defined; this follows directly from the definitions, and we will later give a formal justification.
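To make the bookkeeping concrete, here is a sketch of how $\diamond$ and $D$ can be computed by "bouncing" between the two segment mappings. The encoding is ours: a directed state is a pair (state, direction) with "R" for $\rightarrow$ and "L" for $\leftarrow$, a mapping is a dict, and `None` plays the role of $\uparrow$. A repeated (segment, directed state) pair witnesses an infinite loop, which bounds the work per entry state by the number of directed states.

```python
# Sketch: computing the directed composition f ◇ g and the departure function D
# for two adjacent segment mappings (f on the left, g on the right).

def departure(f, g, dstate):
    """D(f, g, ·): exit of the union, entering g in (q, "R") or f in (q, "L")."""
    return _exit(f, g, dstate, in_left=(dstate[1] == "L"))

def compose(f, g, states):
    """f ◇ g: the mapping describing the concatenation of the two segments."""
    return {(q, d): _exit(f, g, (q, d), in_left=(d == "R"))
            for q in states for d in ("L", "R")}

def _exit(f, g, dstate, in_left):
    seen = set()
    while (in_left, dstate) not in seen:
        seen.add((in_left, dstate))
        res = (f if in_left else g)[dstate]
        if res is None:                  # the head loops inside one segment
            return None
        state, move = res
        if in_left and move == "L":      # leaves the union to the left
            return (state, "L")
        if not in_left and move == "R":  # leaves the union to the right
            return (state, "R")
        dstate, in_left = (state, move), not in_left  # crosses the boundary
    return None  # a configuration repeats: the head bounces forever (↑)
```

For instance, composing two "pass-through" mappings yields a pass-through mapping, while a left segment that reflects rightward next to a right segment that reflects leftward yields $\uparrow$.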

3.2 Recognition Problem

We now present a high-level simulation algorithm for the recognition problem. Detailed constructions of the subroutines will be given later in Subsection 3.3, where the membership problem is analyzed. The algorithm below both formally defines the DLLA $M$ that simulates ${\cal A}$ and, at the same time, serves as the procedure for simulating $M$ in the RAM model.

Simulation Algorithm 1 is given in pseudocode; its description in natural language is as follows. Since the DLLA $M$ deletes cells during its run, we refer to the current left and right neighbors of the $i$-th cell as $i.\mathsf{prev}$ and $i.\mathsf{next}$, respectively. When we write $j = i.\mathsf{prev}$ we assume that both $i$ and $j$ refer to indices of ${\cal A}$'s tape cells; we do not re-enumerate the cells of $M$'s tape.

Thus, we denote the $i$-th cell of $M$'s tape by $W_{M}[i]$. If $W_{\cal A}[i]$ contains a letter of rank less than $d-1$, then $W_{M}[i] = W_{\cal A}[i]$, and $M$ behaves exactly as ${\cal A}$ (an ${\cal A}$-move). When $M$ visits a cell for the $d$-th time, where ${\cal A}$ would write a symbol $X$ of rank $d$ (not an end-marker), $M$ instead writes to that cell the mapping $g = CF(X)$. When $M$ writes a mapping $g$ into a cell $i$ for the first time, it scans the neighbors $i.\mathsf{prev}$ and $i.\mathsf{next}$ and performs a procedure that we call a deletion scan.

  • If neither neighbor contains a mapping, the scan is finished.

  • If only one of the neighbors $i.\mathsf{prev}$ or $i.\mathsf{next}$ contains a mapping (say $f$ or $h$, respectively), then $M$ replaces the content of cell $i$ with $f \diamond g$ (or $g \diamond h$) and deletes that neighboring cell.

  • If both neighbors contain mappings, then $M$ replaces the content of cell $i$ with $f \diamond g \diamond h$ and deletes both neighboring cells.

After a deletion scan the cell $W_{M}[i]$ contains the resulting mapping, say $g$, while both of its neighbors contain letters. Hence $g$ describes the segment $W_{\cal A}[i.\mathsf{prev}+1,\, i.\mathsf{next}]$. $M$ then moves the head to the neighbor ($i.\mathsf{prev}$ or $i.\mathsf{next}$) at which ${\cal A}$ would arrive after leaving the rank-$d$ segment $W_{\cal A}[i.\mathsf{prev}+1,\, i.\mathsf{next}]$. If the head of ${\cal A}$ does not exit this segment immediately after visiting $W_{\cal A}[i]$, the cell at which ${\cal A}$ arrives after it exits the segment is determined via the departure function $D$ during the deletion scan.
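Structurally, the deletion scan amounts to at most two $O(1)$ unlink operations on the doubly linked list. The sketch below uses our own names: `prev` and `nxt` are the link arrays of the tape, $\diamond$ is passed in as a black box `diamond`, and Python lists stand in for mappings merely to make the merging visible.

```python
# Sketch of the deletion scan: cell i has just received a fresh mapping g; any
# neighboring mapping is absorbed via ◇ and its cell unlinked in O(1).
# `is_mapping` distinguishes mappings from letters.

def deletion_scan(tape, prev, nxt, i, g, is_mapping, diamond):
    l = prev[i]
    if is_mapping(tape[l]):
        g = diamond(tape[l], g)        # g := f ◇ g
        prev[i] = prev[l]              # unlink the left neighbor
        nxt[prev[i]] = i
    r = nxt[i]
    if is_mapping(tape[r]):
        g = diamond(g, tape[r])        # g := g ◇ h
        nxt[i] = nxt[r]                # unlink the right neighbor
        prev[nxt[i]] = i
    tape[i] = g  # the merged mapping now describes the whole segment around i
```

In the full algorithm the directed state is also updated via the departure function $D$ before each merge; we omit that here.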

We have thus described the cases in which $M$ arrives:

  • at a cell of rank $r < d$ (from any neighbor), and

  • at a cell of rank $d$ (from another rank-$d$ cell).

It remains to describe the case in which $M$ arrives at a cell containing a mapping $f$ in a directed state $\overleftrightarrow{q}$, coming from a cell with a letter of rank $r < d$ (the last case of Algorithm 1). In this case $M$ computes $f(\overleftrightarrow{q}) = \overleftrightarrow{p}$ and moves the head to the left or right neighbor according to the direction of $\overleftrightarrow{p}$, arriving at that neighbor in state $p$.

$\overleftrightarrow{q} := \overleftarrow{q_{0}}$;  $i := 1$;
while no result returned do
    case $W_{M}[i] \in \Gamma_{r} \cup \{\vartriangleright, \vartriangleleft\}$ with $r < d-1$ do   /* ${\cal A}$-move */
        $(\overleftrightarrow{p}, a_{r'}) := \delta_{\cal A}(q, W_{M}[i])$;   $W_{M}[i] := a_{r'}$;
    case $W_{M}[i] \in \Gamma_{d-1}$ do   /* deletion scan */
        $(\overleftrightarrow{p}, X) := \delta_{\cal A}(q, W_{M}[i])$;
        $g := CF(X)$;   /* $\langle CF \rangle({\cal A})$ */
        if $i.\mathsf{prev} > 0$ and $W_{M}[i.\mathsf{prev}] \in {\cal F}$ then
            $f := W_{M}[i.\mathsf{prev}]$;
            if $\overleftrightarrow{p} = \overleftarrow{p}$ then $\overleftrightarrow{p} := D(f, g, \overleftarrow{p})$;   /* $\langle D \rangle({\cal A})$ */
            if $\overleftrightarrow{p} = \,\uparrow$ then return Reject;
            $g := f \diamond g$;   $i.\mathsf{prev} := (i.\mathsf{prev}).\mathsf{prev}$;   /* $\langle \diamond \rangle({\cal A})$ */
        end if
        if $i.\mathsf{next} < n+1$ and $W_{M}[i.\mathsf{next}] \in {\cal F}$ then
            $h := W_{M}[i.\mathsf{next}]$;
            if $\overleftrightarrow{p} = \overrightarrow{p}$ then $\overleftrightarrow{p} := D(g, h, \overrightarrow{p})$;   /* $\langle D \rangle({\cal A})$ */
            if $\overleftrightarrow{p} = \,\uparrow$ then return Reject;
            $g := g \diamond h$;   $i.\mathsf{next} := (i.\mathsf{next}).\mathsf{next}$;   /* $\langle \diamond \rangle({\cal A})$ */
        end if
        $W_{M}[i] := g$;
    case $W_{M}[i] \in {\cal F}$ do
        $f := W_{M}[i]$;
        if $f(\overleftrightarrow{q}) = \,\uparrow$ then return Reject else $\overleftrightarrow{p} := f(\overleftrightarrow{q})$;
    end case
    if $\overleftrightarrow{p} = \overleftarrow{p}$ then $i := i.\mathsf{prev}$ else $i := i.\mathsf{next}$;
    $\overleftrightarrow{q} := \overleftrightarrow{p}$;
    if $q \in F$ and $W_{M}[i] = \,\vartriangleleft$ then return Accept;
end while

Algorithm 1: Simulation Algorithm

Before proving correctness we fix notation for time indices. Let $W_{\cal A}^{t}[i]$ denote the content of cell $i$ of ${\cal A}$'s tape after $t$ steps of ${\cal A}$, and let $W_{M}^{t'}[i]$ denote the content of cell $i$ of $M$'s tape after $t'$ steps of $M$. We consider only regular steps, i.e., steps performed on symbols of rank $<d$ or on endmarkers. Every regular step $t$ of ${\cal A}$ has a corresponding regular step $t'$ of $M$, and the mapping $t \mapsto t'$ is strictly increasing (if $t_{1} < t_{2}$ then $t'_{1} < t'_{2}$). This correspondence will be used below.

Lemma 1

For each $d$-DLA ${\cal A}$, the corresponding DLLA $M$ simulates ${\cal A}$. More precisely, there exists an order-preserving correspondence $t \mapsto t'$ such that:

  • (i) for every regular step $t$ of ${\cal A}$ with the corresponding step $t'$ of $M$, if ${\cal A}$ visits cell $i$ with a symbol of rank $<d$ or an endmarker, then $W_{\cal A}^{t}[i] = W_{M}^{t'}[i]$;

  • (ii) at steps $t$ and $t'$, ${\cal A}$ and $M$ are in the same state when arriving at cell $i$;

  • (iii) $M$ accepts an input iff ${\cal A}$ does.

Proof

Let ${\cal A}$ perform $N$ moves

$$\delta_{1}, \ldots, \delta_{N}, \quad \delta_{i} \in Q \times \Gamma \times \{\leftarrow, \rightarrow\}, \quad \delta_{1} = \delta(q_{0}, W^{0}_{\cal A}[1]) \qquad (1)$$

on a fixed input, and either accepts the input or enters a loop.

We call a move $\delta_{i} = (q, a, m)$ a $d$-move if $a$ has rank $d$ but is not an endmarker; otherwise we call it a regular move. Thus a run is a sequence (1) partitioned into alternating segments of regular moves and $d$-moves.

For $M$ we define runs analogously: regular moves are the same, while a $d$-move is either a step into a cell containing a mapping, or a step of the deletion scan initiated on a cell with a symbol of rank $d-1$.

We claim:

  • (i) if we delete all $d$-moves from the runs of ${\cal A}$ and $M$, the resulting sequences of regular moves are identical;

  • (ii) after each maximal block of $d$-moves, both automata end in the same cell and in the same state.

From these two properties it follows that $M$ accepts exactly the same words as ${\cal A}$, because accepting configurations are reached by regular moves only. The correspondence $t \mapsto t'$ between regular steps of ${\cal A}$ and $M$ is then precisely the index matching described before the lemma.

Note that property (i) follows immediately from (ii), since between two blocks of dd-moves both automata perform the same sequence of regular moves. Indeed, once the heads are in the same cell with the same symbol of rank <d<d and in the same state, the subsequent regular moves of MM coincide with those of 𝒜{\cal A} by construction.

It remains to prove (ii). Assume that before some dd-block both 𝒜{\cal A} and MM are in the same state on the same cell. We distinguish two cases for the first move of the dd-block of 𝒜{\cal A}:

Case 1: the head visits a cell ii containing a symbol aa of rank d1d-1. Then 𝒜{\cal A} rewrites aa to a symbol XX of rank dd. In the corresponding move MM writes into ii the mapping fXf_{X} describing the one-cell segment at ii. If the dd-block of 𝒜{\cal A} ends immediately (or on the next step 𝒜{\cal A} visits another rank-(d1)(d-1) cell), then by the definition of fXf_{X} both automata end in the same cell and state. Otherwise, 𝒜{\cal A} proceeds into a neighboring rank-dd cell and eventually leaves the contiguous rank-dd segment. By construction of MM, after the deletion scan and subsequent use of the departure function DD, MM arrives at the same cell and in the same state as 𝒜{\cal A}. Thus both automata synchronize at the end of the dd-block.

Case 2: the head of 𝒜{\cal A} enters a cell of rank dd. Hence 𝒜{\cal A} moves inside a segment of contiguous rank-dd symbols, while MM is positioned at a cell containing a mapping ff describing this segment (the invariant maintained by the deletion scan). Since ff faithfully describes the segment, after MM executes the step f(q)f(\overleftrightarrow{q}), it reaches exactly the same exit cell and state as 𝒜{\cal A}. If 𝒜{\cal A} then continues into a rank-(d1)(d-1) cell, we return to Case 1; otherwise the dd-block ends with synchronization.

Finally, if in either case 𝒜{\cal A} enters a loop, the corresponding mapping for MM returns \uparrow, and MM rejects the input. Thus (ii) holds, which completes the proof. ∎

Now we prove that the simulation algorithm for dd-DLAs works in linear time. We present the proof for the general case of a d(n)d(n)-DLA. The simulation algorithm for d(n)d(n)-DLAs is identical to that for a fixed dd, since its behavior depends only on whether the number of visits to a cell is equal to d(n)1d(n)-1 or smaller. The counters for cell visits required in the case of d(n)d(n)-DLAs can be implemented in the RAM model with O(1)O(1) overhead per operation, and thus do not affect the asymptotic running time.

We denote by F(𝒜)\langle F\rangle({\cal A}) the time complexity of the operation FF on a dd-DLA 𝒜{\cal A}. Since we will prove later that the complexity of each auxiliary step depending on 𝒜{\cal A} is O(|Q|)O(|Q|), we replace CF(𝒜)\langle CF\rangle({\cal A}), D(𝒜)\langle D\rangle({\cal A}), and (𝒜)\langle\diamond\rangle({\cal A}) by

UB(𝒜)=CF(𝒜)+D(𝒜)+(𝒜).\langle U\!B\rangle({\cal A})\;=\;\langle CF\rangle({\cal A})\;+\;\langle D\rangle({\cal A})\;+\;\langle\diamond\rangle({\cal A}). (2)
Lemma 2

The automaton MM performs O(d(n)UB(𝒜)n)O(d(n)\cdot\langle U\!B\rangle({\cal A})\cdot n) steps on processing an input of length nn.

Proof

We use amortized analysis [5], namely the accounting method. Each cell i{1,,n}i\in\{1,\ldots,n\} on MM’s tape has its own budget B[i]B[i] (credit, in the terminology of [5]). We denote by Bt[i]B^{t}[i] the value of the budget of cell ii after step tt.

The budgets are updated according to the following rules:

  • B0[i]=2d(n)B^{0}[i]=2d(n) for all ii;

  • Bt[i]=Bt1[i]1B^{t}[i]=B^{t-1}[i]-1 if at step tt the head visits cell ii and this cell still contains a letter (i.e., it has been visited fewer than d(n)d(n) times);

  • Bt[j]=Bt1[j]1B^{t}[j]=B^{t-1}[j]-1 if at step tt the head enters cell ii from cell jj, where WM[i]W_{M}[i] is a mapping and WM[j]W_{M}[j] is a letter;

  • Bt[i]=Bt1[i]B^{t}[i]=B^{t-1}[i] otherwise.

Budgets are not changed during deletion scans.

Fix a step tt. Suppose that the ii-th cell currently contains a mapping f=WMt[i]f=W^{t}_{M}[i] describing a segment W𝒜[l,r]W_{\cal A}[l,r]. Its neighbors WM[l1]=W𝒜[l1]W_{M}[l-1]=W_{\cal A}[l-1] and WM[r+1]=W𝒜[r+1]W_{M}[r+1]=W_{\cal A}[r+1] have rank <d(n)<d(n), so each has been visited fewer than d(n)d(n) times. When the head moves from a neighbor into the segment, that neighbor pays $1. Until the neighbor itself turns into a mapping, it pays at most d(n)d(n) times for such visits. Once it becomes a mapping, further payments are taken over by the new neighbor. Thus each cell pays $1 for each of its own visits and at most $1 for each visit into an adjacent mapping before it itself turns into a mapping. Since this can happen only after at most d(n)d(n) visits of the cell, each cell pays at most 2d(n)2d(n) dollars in total. Therefore, the described budget strategy guarantees Bt[i]0B^{t}[i]\geq 0 for all tt and ii.

Deletion scans were not counted above. Clearly there are at most O(n)O(n) scans, since each cell can initiate at most one. During one scan, at most two directed compositions are computed, each in O(UB(𝒜))O(\langle U\!B\rangle({\cal A})). Hence all deletion scans together cost O(nUB(𝒜))O(n\cdot\langle U\!B\rangle({\cal A})) time.

Summing up, MM performs the following kinds of operations:

  1. moves that end on a letter, but not an endmarker;

  2. moves that end on a mapping;

  3. deletion scans;

  4. moves that end on an endmarker.

By amortized analysis, Cases 1 and 2 together take O(d(n)UB(𝒜)n)O(d(n)\cdot\langle U\!B\rangle({\cal A})\cdot n): the total number of such moves is O(d(n)n)O(d(n)\cdot n) (since i=1nB0[i]=2nd(n)\sum_{i=1}^{n}B^{0}[i]=2n\cdot d(n)), and each move costs O(UB(𝒜))O(\langle U\!B\rangle({\cal A})). Case 3 costs O(nUB(𝒜))O(n\cdot\langle U\!B\rangle({\cal A})), as discussed. For Case 4, note that after the head leaves the left endmarker \vartriangleright on the very first move, each endmarker can only be visited when arriving from an inner cell. Hence the total number of endmarker visits does not exceed the number of visits to all other cells, which is O(nd(n))O(n\cdot d(n)). Since each simulation step of Algorithm 1 takes at most O(UB(𝒜))O(\langle U\!B\rangle({\cal A})) time, endmarker visits also cost O(d(n)UB(𝒜)n)O(d(n)\cdot\langle U\!B\rangle({\cal A})\cdot n).

Thus the total running time of MM on an input of length nn is O(d(n)UB(𝒜)n)O(d(n)\cdot\langle U\!B\rangle({\cal A})\cdot n). ∎

3.3 Membership Problem

To analyze the membership problem for dd-DLAs we need to formalize the auxiliary operations on mappings. Recall that mappings represent contiguous segments of rank-dd cells and that the simulation algorithm relies on three basic subroutines:

  • the cell description function CFCF, which produces the mapping for a single rank-dd cell;

  • the directed composition \diamond, which merges mappings of adjacent segments into one;

  • the departure function DD, which determines the exit state and direction when the head is located at the boundary between two adjacent segments, i.e., when it enters one segment from the other.

We prove in this subsection that all these operations are well-defined and computable in O(|Q𝒜|)O(|Q_{\cal A}|) time, so UB(𝒜)=O(|Q𝒜|)\langle U\!B\rangle({\cal A})=O(|Q_{\cal A}|).

Our constructions rely on graph representations of mappings. A mapping ff\in{\cal F} describing a segment LL is encoded by a four-partite graph GfG_{f} with parts L𝗂𝗇,L𝗂𝗇,L𝗈𝗎𝗍,L𝗈𝗎𝗍\overrightarrow{L}_{\mathsf{in}},\overleftarrow{L}_{\mathsf{in}},\overrightarrow{L}_{\mathsf{out}},\overleftarrow{L}_{\mathsf{out}}, each a copy of Q𝒜Q_{\cal A}. These four parts form a partition of the set Q\overleftrightarrow{Q} according to their labeling. We also adjoin a distinguished sink vertex ()(\uparrow) of out-degree 0. For every qL𝗂𝗇\overleftrightarrow{q}\in\overleftrightarrow{L}_{\mathsf{in}} we add:

  • an edge qp\overleftrightarrow{q}\to\overleftrightarrow{p} to some pL𝗈𝗎𝗍\overleftrightarrow{p}\in\overleftrightarrow{L}_{\mathsf{out}} if f(q)=pf(\overleftrightarrow{q})=\overleftrightarrow{p},

  • or an edge q()\overleftrightarrow{q}\to(\uparrow) if f(q)=f(\overleftrightarrow{q})=\uparrow.

Thus every vertex has out-degree at most 11: inputs have either one outgoing edge to an output or to ()(\uparrow), while outputs and ()(\uparrow) have out-degree 0. We say that GfG_{f} represents the mapping ff (or the segment LL).

If gg describes an adjacent segment RR, to compute the composition fgf\diamond g and the departure function DD we use the intermediate graph (f,g){\cal I}(f,g) obtained by gluing the graphs GfG_{f} and GgG_{g} (with parts labeled by RR). Formally, (f,g){\cal I}(f,g) is defined as follows: part L𝗈𝗎𝗍\overrightarrow{L}_{\mathsf{out}} is glued with R𝗂𝗇\overrightarrow{R}_{\mathsf{in}}, part L𝗂𝗇\overleftarrow{L}_{\mathsf{in}} with R𝗈𝗎𝗍\overleftarrow{R}_{\mathsf{out}}, and the two sinks ()(\uparrow) are glued together. By gluing two vertices we mean that one of them is deleted and all its edges are reattached to the other. When we glue parts, we glue the vertices carrying the same state label (but with opposite directions). An illustration of (f,g){\cal I}(f,g) is given in Fig. 1.

Figure 1: Graph-based computation of fgf\diamond g and DD.
Proposition 1

The directed composition h=fgh=f\diamond g is determined via the intermediate graph as follows: h(q)=ph(\overleftrightarrow{q})=\overleftrightarrow{p} iff there is a path qp\overleftrightarrow{q}\leadsto\overleftrightarrow{p} to an output vertex, and h(q)=h(\overleftrightarrow{q})=\uparrow iff the unique path from q\overleftrightarrow{q} reaches ()(\uparrow) or falls into a directed cycle. Associativity follows from associativity of concatenating segments (hence of gluing graphs).

Lemma 3

For every dd-DLA 𝒜{\cal A}, the operations \diamond, DD, and CFCF are well-defined and computable in O(|Q𝒜|)O(|Q_{\cal A}|) time, assuming that the description of 𝒜{\cal A} is already stored in RAM.

Proof

The function CFCF is trivially computable in O(|Q𝒜|)O(|Q_{\cal A}|) time: for a rank-dd symbol ada_{d}, the mapping f=CF(ad)f=CF(a_{d}) satisfies f(q)=pf(\overleftrightarrow{q})=\overleftrightarrow{p} iff δ𝒜(q,ad)=(p,ad)\delta_{\cal A}(q,a_{d})=(\overleftrightarrow{p},a_{d}).
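For illustration, in a dict-based encoding of mappings (an ad-hoc choice of ours, with entry and exit points tagged by the side of the segment), CFCF amounts to one lookup in the transition function per directed state. The sketch below assumes a total transition table on the rank-dd symbol:

```python
def cell_description(delta, symbol, states):
    """CF for a single rank-d cell, in an ad-hoc dict encoding of mappings:
    ("L", q) / ("R", q) = the head enters the one-cell segment from the
    left / right in state q; the value is the exit point.  delta is assumed
    total here and maps (state, symbol) to (state, move), move in {-1, +1};
    a rank-d symbol is never rewritten, so it stays fixed."""
    f = {}
    for q in states:
        p, move = delta[(q, symbol)]
        exit_pt = ("R", p) if move == 1 else ("L", p)
        # On a one-cell segment the very first move already leaves it,
        # so the result does not depend on the side the head entered from.
        f[("L", q)] = exit_pt
        f[("R", q)] = exit_pt
    return f
```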

We next analyze the algorithms for \diamond and DD. Let VV and EE be the vertex and edge sets of the intermediate graph (f,g){\cal I}(f,g) (Fig. 1). Since each vertex has out-degree at most 1, we have |E|=O(|V|)|E|=O(|V|) and |V|=O(|Q𝒜|)|V|=O(|Q_{\cal A}|). We maintain two global arrays: hh for the resulting mapping and mm for marks on vertices. Initially all values of mm are set to a placeholder \downarrow. These arrays serve as memoization.

The core procedure is the function FindPath (Algorithm 2), which returns the endpoint of the unique path starting from q\overleftrightarrow{q}, or \uparrow if the path falls into a loop. We explain the steps of the algorithm while proving correctness.

Function FindPath(q\overleftrightarrow{q}):
    u:=q.𝗇𝖾𝗑𝗍u:=\overleftrightarrow{q}.\mathsf{next};
    while uL𝗈𝗎𝗍R𝗈𝗎𝗍{()}u\not\in\overleftarrow{L}_{\mathsf{out}}\cup\overrightarrow{R}_{\mathsf{out}}\cup\{(\uparrow)\} do
        if m[u]=m[u]=\downarrow then
            m[u]:=qm[u]:=\overleftrightarrow{q}; u:=u.𝗇𝖾𝗑𝗍u:=u.\mathsf{next};
        else if m[u]=qm[u]=\overleftrightarrow{q} then
            return \uparrow; /* Detected a cycle reachable from q\overleftrightarrow{q} */
        else
            return h[m[u]]h[m[u]]; /* Path merges with a processed input */
        end if
    end while
    return uu; /* Reached an output or the sink ()(\uparrow) */
end

Algorithm 2 FindPath on the intermediate graph

Correctness and complexity of FindPath. Let u0=qu_{0}=\overleftrightarrow{q} and define uk+1=uk.𝗇𝖾𝗑𝗍u_{k+1}=u_{k}.\mathsf{next}. The procedure terminates when one of the following holds:

  • uku_{k} is an output: then FindPath returns uku_{k};

  • uk=()u_{k}=(\uparrow): then FindPath returns \uparrow;

  • uku_{k} is marked: let s=m[uk]\overleftrightarrow{s}=m[u_{k}].

If s=q\overleftrightarrow{s}=\overleftrightarrow{q}, a loop is detected and FindPath returns \uparrow. Otherwise FindPath returns h(s)h(\overleftrightarrow{s}), since h(s)h(\overleftrightarrow{s}) has already been computed and stored in h[m[u]]h[m[u]]. As the paths from q\overleftrightarrow{q} and s\overleftrightarrow{s} merge at the vertex uku_{k}, their endpoints coincide.

The invariant “if m[u]=sm[u]=\overleftrightarrow{s} then h[m[u]]=h(s)h[m[u]]=h(\overleftrightarrow{s})” is established by Algorithm 3, which computes h=fgh=f\diamond g using FindPath.

m[u]:=m[u]:=\downarrow for all uVu\in V; /* clear marks */
h[q]:=h[\overleftrightarrow{q}]:=\downarrow for all qL𝗂𝗇R𝗂𝗇\overleftrightarrow{q}\in\overrightarrow{L}_{\mathsf{in}}\cup\overleftarrow{R}_{\mathsf{in}};
for qL𝗂𝗇R𝗂𝗇\overleftrightarrow{q}\in\overrightarrow{L}_{\mathsf{in}}\cup\overleftarrow{R}_{\mathsf{in}} do
    h[q]:=FindPath(q)h[\overleftrightarrow{q}]:=\textnormal{{FindPath}}(\overleftrightarrow{q});
end for

Algorithm 3 Directed composition h=fgh=f\diamond g

Throughout its execution each edge of (f,g){\cal I}(f,g) is traversed at most once before the corresponding vertex is marked, so the total running time is O(|E|+|V|)=O(|Q𝒜|)O(|E|+|V|)=O(|Q_{\cal A}|). Observe that D(f,g,q)D(f,g,\overleftrightarrow{q}) is defined exactly as the endpoint of the path from q\overleftrightarrow{q} in the intermediate graph (f,g){\cal I}(f,g). Hence a single call D(f,g,q)D(f,g,\overleftrightarrow{q}) is realized by FindPath(q)\textnormal{{FindPath}}(\overleftrightarrow{q}) and runs in O(|Q𝒜|)O(|Q_{\cal A}|) time. Therefore \diamond and DD are computable in O(|Q𝒜|)O(|Q_{\cal A}|) time, and together with the bound for CFCF this completes the proof. ∎
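The following Python sketch re-implements Algorithms 2 and 3 in an ad-hoc dictionary encoding of mappings (entry and exit points are tagged by the side of the segment; the encoding, names, and the LOOP sentinel are ours, not the paper's notation). It follows the unique path through the glued graph with memoization and per-start marks, in the spirit of FindPath:

```python
LOOP = "loop"   # our sentinel for the undefined value (the sink)

def compose(f, g):
    """Directed composition h = f <> g for adjacent segments L (left) and
    R (right).  A mapping sends an entry point of its segment to an exit
    point or LOOP: ("L", q) = enter from the left moving right, ("R", q) =
    enter from the right moving left; an exit ("L", p) leaves to the left,
    ("R", p) to the right.  Entries missing from a dict are treated as LOOP.
    """
    h = {}   # the resulting mapping (memoisation array of Algorithm 3)
    m = {}   # marks: m[vertex] = entry of h whose path first visited it

    def step(u):
        """Follow one edge of the intermediate graph: either the head
        exits the combined segment LR, or it crosses the internal border."""
        which, key = u
        exit_pt = (f if which == "f" else g).get(key, LOOP)
        if exit_pt == LOOP:
            return ("halt", LOOP)
        side, state = exit_pt
        if which == "f" and side == "L":
            return ("halt", ("L", state))       # leaves LR on the left
        if which == "g" and side == "R":
            return ("halt", ("R", state))       # leaves LR on the right
        # Internal border: glue L_out-> with R_in-> and L_in<- with R_out<-.
        return ("go", ("g", ("L", state)) if which == "f"
                      else ("f", ("R", state)))

    def find_path(start, hkey):                 # FindPath of Algorithm 2
        m[start] = hkey
        u = start
        while True:
            kind, v = step(u)
            if kind == "halt":
                return v
            u = v
            if u in m:
                # Own mark: a cycle; someone else's: reuse the stored value.
                return LOOP if m[u] == hkey else h[m[u]]
            m[u] = hkey

    for start, hkey in ([(("f", k), k) for k in f if k[0] == "L"]
                        + [(("g", k), k) for k in g if k[0] == "R"]):
        h[hkey] = find_path(start, hkey)
    return h
```

In the same spirit, a single call of find_path from a border vertex realizes the departure function DD.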

3.4 Main Result

We now summarize the simulation in the following theorem.

Theorem 3.1

For every d(n)d(n)-DLA 𝒜{\cal A}, the membership problem for 𝒜{\cal A} is solvable in time O(knd(n)+m)O(k\cdot n\cdot d(n)+m) on a RAM model, where nn is the length of the input word ww, k=|Q𝒜|k=|Q_{\cal A}| is the number of states of 𝒜{\cal A}, and mm is the length of the description of 𝒜{\cal A}, assuming that the function d(n)d(n) is computable in time O(n+m)O(n+m).

Proof

By Lemma 1, for each d(n)d(n)-DLA 𝒜{\cal A} there exists a corresponding DLLA MM (described by Algorithm 1) that simulates 𝒜{\cal A}. By Lemma 2, MM performs O(d(n)UB(𝒜)n)O(d(n)\cdot\langle U\!B\rangle({\cal A})\cdot n) steps, where UB(𝒜)\langle U\!B\rangle({\cal A}) is defined by Eq. (2). By Lemma 3, UB(𝒜)=O(|Q𝒜|)=O(k)\langle U\!B\rangle({\cal A})=O(|Q_{\cal A}|)=O(k). To simulate MM on a RAM model we use Simulation Algorithm 1 with subroutines from Algorithms 2 and 3. Before running the simulation we preprocess the description of 𝒜{\cal A} and store it in RAM, which takes O(m)O(m) time. Combining all these bounds, we obtain the claimed complexity O(knd(n)+m)O(k\cdot n\cdot d(n)+m). ∎

Corollary 1

The recognition problem for a d(n)d(n)-DLA 𝒜{\cal A} is solvable in O(nd(n))O(n\cdot d(n)) time. In particular, for each fixed dd\in\mathbb{N}, every dd-DCFL is recognizable in linear time.

4 Upper bound on d(n)d(n)-DLA runtime

In this section we establish an upper bound on the runtime of a d(n)d(n)-DLA in the classical simulation model (as defined above).

Theorem 4.1

A d(n)d(n)-DLA with kk states performs at most O(d(n)n2k)O(d(n)\cdot n^{2}\cdot k) steps on an input of length nn, provided the computation does not enter an infinite loop.

Proof

Suppose the head traverses a segment of tape consisting of rank-d(n)d(n) symbols for more than knkn steps. Within the segment there are at most knkn distinct configurations (a state paired with a cell), so by the pigeonhole principle some cell must be visited at least twice in the same state. Since the automaton is deterministic and rank-d(n)d(n) cells are never rewritten, the computation would then repeat forever, i.e., fall into an infinite loop.

As noted in the proof of Lemma 1, d(n)d(n)-DLAs have two types of moves: regular moves, when the head arrives at a cell of rank less than d(n)d(n), and d(n)d(n)-moves, when the head arrives at a segment consisting of rank-d(n)d(n) cells. Any series of d(n)d(n)-moves cannot exceed knkn steps unless the computation enters a loop. Each such series must be preceded by a regular move, and the total number of regular moves is O(nd(n))O(n\cdot d(n)): each regular move strictly increases the rank of the cell it arrives at, and each of the nn cells can reach rank at most d(n)d(n).

Each regular move is a single step, and the series of d(n)d(n)-moves that may follow it takes O(kn)O(kn) steps, so the total number of steps is bounded by O(nd(n))O(kn)=O(d(n)n2k)O(n\cdot d(n))\cdot O(kn)=O(d(n)\cdot n^{2}\cdot k). Therefore the claim follows. ∎
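The pigeonhole argument above can be illustrated by a short simulation: on rank-d(n)d(n) cells the automaton no longer rewrites, so within a fixed segment it acts as a two-way finite automaton, and a loop shows up as a repeated (state, position) pair. The encoding of the transition function below is a hypothetical sketch of ours:

```python
def loops_on_segment(delta, segment, state, pos):
    """Follow the head over a fixed segment of rank-d(n) cells.

    On such cells the automaton may no longer rewrite, so it behaves like
    a two-way finite automaton: delta maps (state, symbol) to (state, move)
    with move in {-1, +1} (an illustrative encoding).  There are at most
    k * len(segment) configurations (state, position), so more than that
    many steps force a repetition, i.e. an infinite loop.
    """
    seen = set()
    while 0 <= pos < len(segment):
        if (state, pos) in seen:
            return True              # repeated configuration: a loop
        seen.add((state, pos))
        state, move = delta[(state, segment[pos])]
        pos += move
    return False                     # the head left the segment
```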

Theorem 4.1 implies that a dd-DLA runs in O(dkn2)O(d\cdot k\cdot n^{2}) steps. This bound is asymptotically tight: for instance, the classical 22-LA recognizing the language {anbnn0}\{a^{n}b^{n}\mid n\geqslant 0\} runs in quadratic time. Its behavior is as follows. The head moves right until it encounters the first bb, then returns left to find the leftmost aa of rank 11 (which is then promoted to rank 22). After this aa is located, the automaton moves right again to check for a matching bb of rank 11; if found, it proceeds to the next matching aa, and so on. When the automaton reaches the right endmarker \vartriangleleft, it verifies that no aa’s of rank 11 remain; if none remain, the input is accepted, and otherwise it is rejected.
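The quadratic behavior can be checked with a high-level re-enactment of this strategy (a sketch of ours that counts head moves, not a cell-by-cell transition table; the shape check via a regular expression is a shortcut for what the real machine detects during its sweeps):

```python
import re

def run_two_la(word):
    """Head-move count of the 2-LA for { a^n b^n } described above.
    Promoting a cell to rank 2 corresponds to marking it.
    Returns (steps, accepted)."""
    if not re.fullmatch(r"a*b*", word):
        # The real machine notices a 'b' before an 'a' during a sweep;
        # we shortcut this check to keep the sketch small.
        return (len(word), False)
    na = word.count("a")
    steps, pos = 0, 0
    steps += na                        # right to the first b (or endmarker)
    pos = na
    for i in range(min(na, len(word) - na)):
        steps += pos - i               # left to the leftmost rank-1 a
        pos = i                        # ... and promote it to rank 2
        steps += (na + i) - pos        # right to the leftmost rank-1 b
        pos = na + i                   # ... and promote it to rank 2
    steps += (len(word) - pos) + 1     # final sweep to the right endmarker
    return (steps, na == len(word) - na)
```

On inputs anbna^{n}b^{n} the step count grows quadratically in nn, matching the bound above with dd and kk fixed.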

Acknowledgments

The author thanks Dmitry Chistikov for valuable feedback, for discussions of the results presented in this text, and for helpful suggestions that improved the presentation.

References

  • [1] Abboud, A., Backurs, A., Williams, V.V.: If the current clique algorithms are optimal, so is Valiant’s parser. In: FOCS ’15, pp. 98–117. IEEE Computer Society, USA (2015)
  • [2] Bertsch, E., Nederhof, M.J.: Regular closure of deterministic languages. SIAM Journal on Computing 29(1), 81–102 (1999)
  • [3] Birget, J.C.: Concatenation of inputs in a two-way automaton. Theoretical Computer Science 63(2), 141–156 (1989)
  • [4] Cook, S.A.: Linear time simulation of deterministic two-way pushdown automata. Department of Computer Science, University of Toronto (1970)
  • [5] Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms, fourth edition. MIT Press (2022)
  • [6] Guillon, B., Prigioniero, L.: Linear-time limited automata. Theor. Comput. Sci. 798, 95–108 (2019)
  • [7] Hennie, F.: One-tape, off-line Turing machine computations. Information and Control 8(6), 553–578 (1965)
  • [8] Hibbard, T.N.: A generalization of context-free determinism. Information and Control 11(1/2), 196–238 (1967)
  • [9] Knuth, D.: On the translation of languages from left to right. Information and Control 8, 607–639 (1965)
  • [10] Kunc, M., Okhotin, A.: Describing periodicity in two-way deterministic finite automata using transformation semigroups. In: Mauri, G., Leporati, A. (eds.) Developments in Language Theory. pp. 324–336. Springer Berlin Heidelberg, Berlin, Heidelberg (2011)
  • [11] Kutrib, M., Wendlandt, M.: Reversible limited automata. Fundamenta Informaticae 155(1-2), 31–58 (2017). https://doi.org/10.3233/FI-2017-1575, https://journals.sagepub.com/doi/abs/10.3233/FI-2017-1575
  • [12] Lee, L.: Fast context-free grammar parsing requires fast boolean matrix multiplication. J. ACM 49(1), 1–15 (2002)
  • [13] Pighizzini, G.: Nondeterministic one-tape off-line Turing machines and their time complexity. J. Autom. Lang. Comb. 14(1), 107–124 (2009)
  • [14] Pighizzini, G., Pisoni, A.: Limited automata and context-free languages. In: Fundamenta Informaticae. vol. 136, pp. 157–176. IOS Press (2015)
  • [15] Rubtsov, A.A., Chudinov, N.: Computational model for parsing expression grammars. In: Královic, R., Kucera, A. (eds.) 49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024, August 26-30, 2024, Bratislava, Slovakia. LIPIcs, vol. 306, pp. 80:1–80:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2024). https://doi.org/10.4230/LIPICS.MFCS.2024.80, https://doi.org/10.4230/LIPIcs.MFCS.2024.80
  • [16] Rubtsov, A.A., Chudinov, N.: Computational model for parsing expression grammars. CoRR abs/2406.14911 (2024). https://doi.org/10.48550/ARXIV.2406.14911, https://doi.org/10.48550/arXiv.2406.14911
  • [17] Shallit, J.O.: A Second Course in Formal Languages and Automata Theory. Cambridge University Press (2008)
  • [18] Shepherdson, J.C.: The reduction of two-way automata to one-way automata. IBM Journal of Research and Development 3(2), 198–200 (1959)
  • [19] Tadaki, K., Yamakami, T., Lin, J.C.: Theory of one-tape linear-time Turing machines. Theoretical Computer Science 411(1), 22–43 (2010)
  • [20] Valiant, L.G.: General context-free recognition in less than cubic time. J. Comput. Syst. Sci. 10(2), 308–315 (1975)
  • [21] Wagner, K., Wechsung, G.: Computational complexity. Springer Netherlands (1986)
  • [22] Yamakami, T.: Behavioral strengths and weaknesses of various models of limited automata (2021), https://arxiv.org/abs/2111.05000
  • [23] Yamakami, T.: What is the most natural generalized pumping lemma beyond regular and context-free languages? In: Malcher, A., Prigioniero, L. (eds.) Descriptional Complexity of Formal Systems - 26th IFIP WG 1.02 International Conference, DCFS 2025, Loughborough, UK, July 22-24, 2025, Proceedings. Lecture Notes in Computer Science, vol. 15759, pp. 196–210. Springer (2025). https://doi.org/10.1007/978-3-031-97100-6_14, https://doi.org/10.1007/978-3-031-97100-6\_14