email: rubtsov99@gmail.com
A Linear-time Simulation of Deterministic k-Limited Automata
A k-limited automaton is a Turing machine that may rewrite the content of each input cell only during the first k visits to that cell. Hibbard (1967) showed that for every k ≥ 2 such automata recognize exactly the context-free languages and that deterministic k-limited automata form a strict hierarchy. Later, Pighizzini and Pisoni proved that the second level of this hierarchy (k = 2) coincides with the deterministic context-free languages (DCFLs). We present a linear-time recognition algorithm for deterministic k-limited automata in the RAM model, thereby extending linear-time recognition beyond DCFLs. We further generalize this result to deterministic k(n)-limited automata, where the bound k(n) may depend on the input length n. In addition, we prove an O(|A| + |Q| · k(n) · n) bound for the membership problem, where the input includes both the word and the automaton’s description, with |A| denoting the size of the description and |Q| the number of states.
1 Introduction
Context-free languages (CFLs) play a central role in computer science. Their deterministic subclass (DCFLs) is especially important in compiler construction, where parsing is based on the connection between LR(k)-grammars and deterministic pushdown automata (DPDAs). In 1965 Knuth showed [9] that LR(k)-grammars generate exactly the class of DCFLs, and DPDAs provide linear-time parsing algorithms for them. Thus, DCFLs form a practically significant subclass of CFLs: they are recognizable in linear time, and LR(k)-grammars admit linear-time construction of derivation trees. All these linear-time bounds, both in classical parsing theory and in this paper, are measured in the RAM model. We emphasize that the situation is different for Turing machines: Hennie showed [7] that any language recognizable in linear time on a single-tape Turing machine is regular, so stronger models such as RAM are required to capture the linear-time parsing of non-regular languages.
It is important to distinguish between two closely related problems. We follow the convention that in the recognition problem, the language is fixed and the input is only the word w, while in the membership problem, both a description of the language (for instance, a context-free grammar) and the word w are given as input.
According to this convention, the best known upper bound for the recognition problem for context-free languages is O(n^ω), where ω is the exponent of fast matrix multiplication. This bound was obtained by Valiant in 1975 [20], and his algorithm decides whether a given word belongs to the language generated by a fixed context-free grammar, but it does not construct a parse tree. Subsequent work [12, 1] confirmed that the same setting — fixed grammar and variable input word — is considered, and provided strong evidence that this upper bound is hard to substantially improve.
1.1 k-Limited Automata and k-DCFLs
We next recall the notion of k-limited automata, introduced by Hibbard [8]. A k-limited automaton (k-LA) is a nondeterministic Turing machine that scans only the cells containing the input word together with end-markers, and is allowed to rewrite a symbol in each cell (except the end-markers) only during its first k visits. Hibbard showed that for every k ≥ 2, k-LAs recognize precisely the class of context-free languages; it is also known that 1-LAs recognize exactly the class of regular languages [21]. Without the restriction on the number of rewritings, the model coincides with linear-bounded automata, which recognize the class of context-sensitive languages.
For the deterministic case, Pighizzini and Pisoni [14] proved that deterministic 2-LAs recognize exactly the deterministic context-free languages (DCFLs) by providing an algorithm that transforms a 2-LA into a PDA while preserving determinism; the inverse transformation had already been established by Hibbard [8]. Following Hibbard, we call a language recognized by a deterministic k-LA a k-deterministic language (k-DCFL) and denote such automata by k-DLAs. Hibbard also established that the hierarchy is strict: for every k ≥ 2, the class of (k+1)-DCFLs properly contains the class of k-DCFLs.
Although DCFLs are widely used, they suffer from certain practical limitations. First, they are not closed under reversal: there is a pair of DCFLs whose union L is again a DCFL, while the reversal Lᴿ is not. The language Lᴿ can still be recognized by a 3-DLA, which can scan the input from right to left before simulating a 2-DLA for L.
Second, DCFLs are not closed under union. There is a language L′ that is a union of two DCFLs but is known to be inherently ambiguous [17]. Every DCFL can be generated by an unambiguous grammar, in particular by an LR(1) grammar, and Hibbard showed [8] that the same holds for k-DCFLs. Hence the union ⋃_k k-DCFL does not cover all CFLs. This illustrates that k-DCFLs, like DCFLs, are not closed under union; recognizing such unions in linear time via k-DLAs requires parallelism.
1.2 k(n)-Limited Automata
We now consider a natural extension of k-LAs. Assume that k is not a constant but a function k(n) of the input length n. The automaton can then rewrite the content of a cell until the number of visits to that cell reaches k(n). To our knowledge, this is the first time such a generalization has been considered. In classical restrictions on Turing machine computation, the time bound is imposed on the total number of cell visits, whereas here we impose a bound on the number of visits per individual cell (after which the cell remains accessible for reading but no longer for rewriting).
1.3 Our Contribution
We focus on the membership problem for k(n)-DLAs. Let |A| be the length of the description of a k(n)-DLA A, |Q| the number of its states, and n the length of the input word. We present an O(|A| + |Q| · k(n) · n)-time algorithm in the RAM model for the membership problem. In particular, when the automaton is fixed and k(n) = k is a constant (i.e., for k-DLAs), this yields a linear-time algorithm. Thus every k-DCFL is recognizable in linear time in the RAM model.
Hennie proved in [7] that every language recognizable in linear time by a deterministic single-tape Turing machine is regular (and a more general result holds for nondeterministic TMs [19, 13]). Hence no k-DLA with k ≥ 2 recognizing a non-regular language can be simulated by a linear-time Turing machine. Guillon and Prigioniero [6] showed that every 1-DLA can be transformed into an equivalent linear-time Turing machine (and an analogous result holds for nondeterministic 1-LAs), and a related construction was also used by Kutrib and Wendlandt [11]. Their approach relies on Shepherdson’s classical simulation of two-way DFAs by one-way DFAs [18]. We build on this idea as well, but it cannot be applied directly because of Hennie’s result: otherwise one could simulate a non-regular language in linear time on a Turing machine, contradicting the theorem.
To overcome this obstacle we transform a classical Turing machine into one that operates on a doubly-linked list instead of a tape and adapt Shepherdson’s construction to this model. This forms the basis of our membership algorithm. We also reinterpret Birget’s algebraic constructions [3] in graph-theoretic terms, which provides subroutines underlying the final version of our algorithm. A related algebraic approach was developed by Kunc and Okhotin [10], who employed transformation semigroups to capture, for each substring, the state-to-state behavior of two-way deterministic finite automata. This line of work closely mirrors Birget’s method via function composition, and our construction follows the same underlying ideas.
We further establish an upper bound of O(|Q| · k · n²) steps for the direct simulation of k-DLAs, provided the computation does not enter an infinite loop. In particular, this implies an O(n²) upper bound for 2-DLAs, which, to the best of our knowledge, was previously open. This bound is tight: it is witnessed by a classical 2-DLA.
From a theoretical perspective, our results show that some CFLs are easy (linear-time recognizable), while in general recognition of CFLs may require O(n^ω) time, and by conditional results there must exist hard CFLs (recognizable only in superlinear time). We discuss this point in the following subsection.
1.4 Related Results
We begin with linear-time recognizable subclasses of context-sensitive languages (CSLs) and context-free languages (CFLs). E. Bertsch and M.-J. Nederhof [2] showed that a nontrivial subclass of CFLs, the regular closure of DCFLs, is recognizable in linear time. This class consists of all languages obtained by taking a regular expression and replacing each symbol with a DCFL. It evidently contains the aforementioned inherently ambiguous language (as a union of DCFLs), so it is a strict extension of DCFLs. Note also that this language is recognizable by 2-DPDAs.
A broad subclass of CSLs recognizable in linear time is given by the class of languages accepted by two-way deterministic pushdown automata (2-DPDAs). A linear-time simulation algorithm for 2-DPDAs was obtained by S. Cook [4]. This class clearly contains DCFLs, and it also includes the language of palindromes over an alphabet of at least two letters, which is a well-known example of a context-free language that is not a DCFL.
A. Rubtsov and N. Chudinov introduced in [15, 16] a computational model, DPPDA, for Parsing Expression Grammars (PEGs). This model extends DCFLs, remains recognizable in linear time, and is based on a modification of classical pushdown storage. It was also shown that the class of languages recognized by 2-DPPDAs is recognizable in linear time. Moreover, they proved that parsing expression languages (the class generated by PEGs) contain a highly nontrivial subclass, namely the Boolean closure of the regular closure of DCFLs.
It remains open whether 2-DPDAs, 2-DPPDAs, or PEGs recognize all CFLs. However, the works of L. Lee [12] and Abboud et al. [1] provide strong evidence that this is very unlikely due to complexity-theoretic considerations: any CFG parser with time complexity O(g · n^{3−ε}), where g is the size of the grammar and n the input length, can be efficiently converted into an algorithm for multiplying Boolean n × n matrices in time O(n^{3−ε/3}). This naturally raises the question: can 2-DPDAs or 2-DPPDAs simulate k-DLAs for k ≥ 3?
Finally, T. Yamakami presented another extension of Hibbard’s approach [22, 23]: in several models the input tape is one-way read-only, while the work tape obeys a similar restriction, forbidding rewriting beyond the first k visits. We leave the application of our simulation technique to such models as a direction for future research.
2 Definitions
In this section we give precise definitions of the computational models used in the paper. We begin with Hibbard’s original model of k-limited automata, introduced in [8]. We provide below a concise definition; a more formal equivalent definition can be found in [14]. Next, we introduce a modified variant, called the deleting automaton, in which the tape is replaced by a doubly linked list and cells may be deleted; this auxiliary model is central to our simulation technique. Finally, we define k(n)-limited automata, a natural extension of k-limited automata where the rewriting bound depends on the length of the input word. For clarity, we formulate our simulation algorithm for the case of a fixed constant k, since the extension to k(n) is straightforward.
2.1 k-Limited Automaton
Let k ≥ 1 be a fixed integer. A deterministic k-limited automaton (k-DLA) is a deterministic single-tape Turing machine whose tape initially contains the input word bordered by the left endmarker ⊢ and the right endmarker ⊣. Each tape symbol is annotated with an integer in {0, 1, …, k}, called its rank. Initially, all letters of the input word have rank 0, while the endmarkers have rank k. Whenever the head visits a cell containing a symbol of rank r < k, the symbol may be overwritten with a new symbol of rank r′ such that r < r′ ≤ k. Symbols of rank k are read-only and cannot be changed.
Formally, a k-DLA is defined by a tuple

A = ⟨Q, Σ, Γ, δ, q₀, F⟩,

where Q is a finite set of states, Σ is the input alphabet (the symbols of rank 0), Γ is the tape alphabet with Σ ⊆ Γ, for each r ∈ {0, …, k} we let Γ_r denote the set of symbols of rank r, q₀ ∈ Q is the initial state, F ⊆ Q is the set of accepting states, and δ is the transition function

δ : Q × (Γ ∪ {⊢, ⊣}) → Q × (Γ ∪ {⊢, ⊣}) × {−1, +1}

such that:

• for a ∈ Γ_r with r < k, each transition is of the form δ(q, a) = (q′, a′, d), where q, q′ ∈ Q, a′ ∈ Γ_{r′} for some r′ > r, and d ∈ {−1, +1};

• for a ∈ Γ_k ∪ {⊢, ⊣}, each transition is of the form δ(q, a) = (q′, a, d), where q′ ∈ Q; moreover, if a = ⊢ then d = +1, and if a = ⊣ then d = −1.
A k-DLA starts in state q₀ with the head positioned on the first input symbol (immediately to the right of the left endmarker ⊢). At each step, given a state q and the symbol a under the head, it computes δ(q, a) = (q′, a′, d), replaces a with a′, updates its state to q′, and moves the head left or right according to d. The input is accepted if the automaton reaches a state q ∈ F while scanning the right endmarker ⊣; in this case the computation halts.
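The step semantics above can be sketched as a direct (not linear-time) simulator. The toy automaton below is ours, not from the paper: it merely sweeps right over a block of a's, rewriting each to a rank-1 symbol 'A', and accepts at the right endmarker. It illustrates only the mechanics of ranks and the acceptance condition.

```python
K = 1  # rewriting bound k of this toy 1-DLA

# delta[(state, symbol)] -> (written_symbol, move, new_state); move in {-1, +1}.
# Missing entries stand for an explicit reject (a total delta would list them).
delta = {
    ('q0', 'a'): ('A', +1, 'q0'),
    ('q0', '>'): ('>', +1, 'qacc'),   # '>' plays the role of the right endmarker
}
ACCEPT = {'qacc'}

def run_kdla(word, max_steps=10_000):
    tape = ['<'] + list(word) + ['>']        # '<' and '>' are the endmarkers
    rank = [K] + [0] * len(word) + [K]       # endmarkers are read-only (rank k)
    pos, state = 1, 'q0'                     # head starts right of '<'
    for _ in range(max_steps):
        key = (state, tape[pos])
        if key not in delta:
            return False
        sym, move, new_state = delta[key]
        if tape[pos] == '>' and new_state in ACCEPT:
            return True                      # accept on the right endmarker; halt
        if rank[pos] < K:                    # cell still writable: rewrite,
            tape[pos] = sym                  # consuming one unit of its budget
            rank[pos] += 1
        state = new_state
        pos += move
    return False                             # step limit as a crude loop guard
```

The loop guard `max_steps` is a simplification; Section 4 bounds the number of steps of a non-looping k-DLA precisely.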
We slightly modify Hibbard’s original definition by requiring the transition function to be total. This modification does not change the class of recognizable languages and comes at the cost of a single additional state.
2.2 Deterministic Linked-List Automaton
We next introduce an auxiliary model, which we call the Deterministic Linked-List Automaton (DLLA). In this model there is no constraint on the number of visits to a cell. The tape is replaced by a doubly linked list, so the automaton may delete any cell between the endmarkers (but never the endmarkers themselves). Formally, a DLLA has the same components as a k-DLA (states, input and tape alphabets, initial and accepting states), but with a modified transition function

δ : Q × (Γ ∪ {⊢, ⊣}) → (Q × (Γ ∪ {✗}) × {−1, +1}) ∪ {reject},

where the special symbol ✗ indicates that the current cell is to be deleted immediately after the head leaves it. Once a cell is deleted, the head moves directly from its left neighbor to its right neighbor when moving right, and symmetrically when moving left. If the transition function returns reject, the computation halts and the input is rejected. No additional restrictions are imposed on the transition function.
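The linked-list storage can be sketched as follows; the names (Cell, build_tape, delete) are illustrative, not from the paper. The point is that splicing a cell out takes O(1) time and makes its two neighbors adjacent, exactly as the DLLA semantics requires.

```python
class Cell:
    """One tape cell of the DLLA: a payload plus prev/next links."""
    __slots__ = ('sym', 'prev', 'next')
    def __init__(self, sym):
        self.sym, self.prev, self.next = sym, None, None

def build_tape(word):
    """Build <endmarker> word <endmarker> as a doubly linked list."""
    left, right = Cell('<'), Cell('>')   # endmarkers, never deleted
    prev = left
    for ch in word:
        c = Cell(ch)
        prev.next, c.prev = c, prev
        prev = c
    prev.next, right.prev = right, prev
    return left, right

def delete(cell):
    """Splice the cell out in O(1); its neighbors become adjacent."""
    cell.prev.next = cell.next
    cell.next.prev = cell.prev

def contents(left):
    """Read the remaining cells left to right (for inspection only)."""
    out, c = [], left
    while c is not None:
        out.append(c.sym)
        c = c.next
    return ''.join(out)
```

After `delete`, a head moving right from the left neighbor lands directly on the right neighbor, so deleted cells never cost any further steps.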
It is easy to see that DLLAs recognize exactly the class of deterministic context-sensitive languages. They can simulate deterministic linear-bounded automata (DLBAs) directly, and conversely, a DLBA can simulate a DLLA by marking a deleted cell with a special symbol. We employ the doubly linked list representation in order to achieve the claimed upper bounds for deterministic k-DLAs.
2.3 k(n)-Limited Automaton
For k-DLAs it is convenient to associate with each letter its rank, representing the number of visits to the corresponding cell. For k(n)-DLAs this is no longer possible, since the alphabet and the machine description are fixed and cannot grow with the input length. Instead, in a k(n)-DLA each tape cell maintains its own visit counter, so the rank is associated with the cell rather than the symbol. Formally, each cell (except the endmarkers) contains a pair (a, b), where a ∈ Γ and b is a bit. The bit is initially 0, and remains 0 while the number of visits to the cell is less than k(n); once the number of visits reaches k(n), b is set to 1, and from that point on the cell becomes read-only.
This modification of k-DLAs does not affect our simulation algorithm. The value of k(n) is fixed for a given input length n, and the algorithm never relies on k being a constant rather than a precomputed parameter. Therefore, it suffices to describe the simulation algorithm for k-DLAs; the extension to k(n)-DLAs is straightforward.
3 Linear-Time Simulation Algorithm
In this section we present a linear-time simulation algorithm for deterministic k-limited automata (k-DLAs). We begin with the recognition problem, where the automaton A is fixed and only the input word of length n varies. Later we extend the construction to the membership problem, where both the automaton and the word are part of the input.
High-level idea. Our approach is inspired by Shepherdson’s classical simulation of two-way deterministic finite automata by one-way DFAs [18]. The key observation is that whenever a k-DLA A produces a block of cells all of rank k, the precise contents of these cells are no longer relevant: what matters is only how A can enter and leave this block. We therefore compress each such maximal block into a single cell containing a compact mapping that summarizes the block’s effect on the computation. If two adjacent blocks are compressed, their mappings can be merged by composition.
To implement this idea, we simulate A by a deterministic linked-list automaton A′ that uses a doubly linked list in place of a tape. The machine A′ performs two types of steps:
• δ-moves, which directly simulate moves of A on symbols of rank less than k;

• technical moves, which occur when A′ encounters a compressed block. In this case A′ consults the mapping stored in the corresponding cell to decide how A would leave the block, and moves accordingly.
In this way, long stretches of redundant rank-k cells are collapsed into constant-size summaries, ensuring that each cell contributes only a bounded number of times to the overall running time.
In the remainder of this section we first present the simulation algorithm for recognition, together with a correctness proof and amortized analysis. We then introduce the technical machinery of mappings and composition, which allows us to extend the algorithm to the membership problem and to prove the claimed O(|A| + |Q| · k · n) bound.
3.1 Preparations
Directed states.
For convenience, we write (q, →) (resp. (q, ←)) to denote that state q is entered while the head moves to the right (resp. to the left), i.e., the cell is entered from the left (resp. from the right). We call such pairs directed states, and use this notation for both k-DLAs and DLLAs. We write Q± = Q × {→, ←} for the set of all directed states.
Mappings.
The key idea is to collapse long segments of rank-k cells into a single object. When A makes the k-th visit to a cell and writes there a symbol a of rank k, we replace that cell by a mapping f(a). Formally, a mapping is a function

M : Q± → Q± ∪ {⊥};

it specifies, given the entry state and the entry direction, the exit state and the exit direction when A leaves the block (or ⊥ if it never does). For a mapping describing a single rank-k cell, the entry direction is irrelevant, but for multi-cell segments it matters.
Segment traversal.
We denote by T[i] the i-th cell of A’s tape and by

T[i..j] = T[i] T[i+1] ⋯ T[j]

the corresponding segment of A’s tape. When we say that the head enters the segment T[i..j] in the directed state (q, d), we mean that d = → corresponds to the head entering T[i] from the left, and d = ← corresponds to entering T[j] from the right. Symmetrically, (p, →) means that the head exits through T[j] to the right, and (p, ←) means that it exits through T[i] to the left; in both cases we say that the head leaves the segment in the directed state (p, d).
Segment description mappings.
We say that a mapping M describes a segment T[i..j] if

• all letters in T[i..j] have rank k;

• if the head enters the segment in a directed state s with M(s) ≠ ⊥, it leaves the segment in the directed state M(s);

• if the head enters the segment in a directed state s with M(s) = ⊥, it never leaves the segment (i.e., the computation loops inside).

We denote the set of all possible mappings as

𝔐 = { M : Q± → Q± ∪ {⊥} }.
Operations with mappings.
Let M₁ and M₂ describe segments S₁ and S₂ respectively. We define the directed composition ∘ of mappings by setting M = M₁ ∘ M₂ whenever M describes the concatenated segment S₁S₂. Assume now that the head either enters the segment S₂ from S₁ in a directed state (q, →), or enters S₁ from S₂ in (q, ←). We define the departure function

D : 𝔐 × 𝔐 × Q± → Q± ∪ {⊥},

which, given mappings M₁, M₂ and an entry directed state (q, d), returns the directed state (p, d′) in which the head leaves the concatenated segment S₁S₂. If the head never leaves S₁S₂, then D(M₁, M₂, (q, d)) = ⊥.

Finally, we denote by f a function that, for each symbol a ∈ Γ_k, returns the mapping f(a) describing the one-cell segment consisting only of a.
Since Q is finite, the set 𝔐 is finite as well. Therefore the functions f, D, and the operation ∘ are computable in constant time for a fixed automaton A, because the number of mappings is finite. In the general case (varying A) we later show that they are computable in O(|Q|) time. The operation ∘ and the departure function D are well-defined; this follows directly from the definitions, and we will later give a formal justification.
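As a concrete illustration, the mapping f(a) of a one-cell segment can be tabulated directly from the transition function: since the head moves off a read-only cell after a single step, the cell is left immediately in the direction of that move. The sketch below uses our own names (`delta_k` for the transitions on rank-k symbols, 'L'/'R' for directions); it is not the paper's notation.

```python
def cell_mapping(a, delta_k, states):
    """Tabulate f(a) for a single read-only (rank-k) cell.

    delta_k[(q, a)] = (new_state, move) with move in {-1, +1}.
    The result maps a directed state (q, d_in) to the directed exit
    state (q', d_out); the entry direction is irrelevant for one cell.
    """
    m = {}
    for q in states:
        nq, move = delta_k[(q, a)]
        d_out = 'R' if move == +1 else 'L'   # side through which the head exits
        for d_in in ('L', 'R'):              # both entry directions give the same exit
            m[(q, d_in)] = (nq, d_out)
    return m
```

Note that ⊥ (looping inside the segment) cannot occur for a one-cell segment, since every step leaves the cell; ⊥ only arises once segments of several cells are composed.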
3.2 Recognition Problem
We now present a high-level simulation algorithm for the recognition problem. Detailed constructions of the subroutines will be given later in Subsection 3.3, where the membership problem is analyzed. The algorithm below both formally defines the DLLA A′ that simulates A, and at the same time serves as the procedure for simulating A on a RAM model.
Simulation Algorithm 1 is given in pseudocode; its description in natural language is as follows. Since the DLLA deletes cells during its run, we refer to the current left and right neighbors of the i-th cell as l(i) and r(i), respectively. When we write T′[i] we assume that both A and A′ refer to the indices of A’s tape cells; we do not reenumerate the cells of A′’s tape after deletions.
Thus, we denote the i-th cell of A′’s tape by T′[i]. If T′[i] contains a letter of rank less than k, then T′[i] = T[i], and A′ behaves exactly as A (a δ-move). When A visits a cell for the k-th time, so that A would write there a symbol a of rank k (not an end-marker), A′ instead writes to that cell the mapping f(a). When A′ writes a mapping in a cell for the first time, it scans the neighbors l(i) and r(i) and performs the procedure we call a deletion scan.
• If none of the neighbors contains a mapping, the scan is finished.

• If exactly one of the neighbors l(i) or r(i) contains a mapping (say M_l or M_r), then A′ replaces f(a) with M_l ∘ f(a) (or f(a) ∘ M_r) and deletes the neighboring cell.

• If both neighbors contain mappings, then A′ replaces f(a) with M_l ∘ f(a) ∘ M_r and deletes both neighboring cells.
After a deletion scan the cell contains the resulting mapping, say M, while both its neighbors contain letters. Hence M describes the maximal rank-k segment around the cell. A′ then moves the head to the neighbor (l(i) or r(i)) at which A’s head arrives after leaving the rank-k segment. If the head of A had not quit the segment right after visiting T[i], the cell where A arrives after it exits the segment is determined via the departure function D during the deletion scan.
We have thus described the cases in which A′ arrives:

• at a cell of rank less than k (from any neighbor), and

• at a cell of rank k, coming from another rank-k cell (handled via the deletion scan and the departure function).
It remains to describe the case when A′ arrives at a cell containing a mapping M in a directed state (q, d), coming from a cell with a letter of rank less than k (the corresponding lines of Algorithm 1). In this case A′ computes M((q, d)) = (p, d′) and moves the head to the left or right neighbor according to the direction d′, arriving at that neighbor in state p.
Before proving correctness we fix notation for time indices. Let T_t[i] denote the content of cell i of A’s tape after t steps of A, and let T′_s[i] denote the content of cell i of A′’s tape after s steps of A′. We consider only regular steps, i.e. steps performed on symbols of rank less than k or on endmarkers. Every regular step t of A has a corresponding regular step s(t) of A′, and the mapping t ↦ s(t) is strictly increasing (if t₁ < t₂ then s(t₁) < s(t₂)). This correspondence will be used below.
Lemma 1
For each k-DLA A, the corresponding DLLA A′ simulates A. More precisely, there exists an order-preserving correspondence t ↦ s(t) such that:

(i) for every regular step t of A with the corresponding step s(t) of A′, if A visits cell i with a symbol of rank less than k or an endmarker, then T_t[i] = T′_{s(t)}[i];

(ii) at steps t and s(t), A and A′ are in the same state when arriving at cell i;

(iii) A′ accepts an input iff A does.
Proof
Let A perform the moves

C₀ → C₁ → C₂ → ⋯ (1)

on a fixed input, and either accept the input or enter a loop.

We call a move of (1) a μ-move if the scanned symbol has rank k but is not an endmarker; otherwise we call it a regular move. Thus a run is a sequence (1) partitioned into alternating segments of regular moves and μ-moves.
For A′ we define runs analogously: regular moves are the same, while a μ-move is either a step into a cell containing a mapping, or a step of the deletion scan initiated on the cell where A would write a symbol of rank k.
We claim:

(i) if we delete all μ-moves from the runs of A and A′, the resulting sequences of regular moves are identical;

(ii) after each maximal block of μ-moves, both automata end in the same cell and in the same state.
From these two properties it follows that A′ accepts exactly the same words as A, because accepting configurations are reached by regular moves only. The correspondence between regular steps of A and A′ is then precisely the index matching described before the lemma.
Note that property (i) follows immediately from (ii), since between two blocks of μ-moves both automata perform the same sequence of regular moves. Indeed, once the heads are in the same cell with the same symbol of rank less than k and in the same state, the subsequent regular moves of A′ coincide with those of A by construction.

It remains to prove (ii). Assume that before some μ-block both A and A′ are in the same state on the same cell. We distinguish two cases for the first move of the μ-block of A:
Case 1: the head visits a cell containing a symbol of rank less than k. Then A rewrites it to a symbol a of rank k. In the corresponding move A′ writes into the cell the mapping f(a) describing the one-cell segment. If the μ-block of A ends immediately (or A on the next step visits another rank-k cell), then by the definition of A′ both automata end in the same cell and state. Otherwise, A proceeds into a neighboring rank-k cell and eventually leaves the contiguous rank-k segment. By construction of A′, after the deletion scan and subsequent use of the departure function D, A′ arrives at the same cell and in the same state as A. Thus both automata synchronize at the end of the μ-block.
Case 2: the head of A enters a cell of rank k. Hence A moves inside a segment of contiguous rank-k symbols, while A′ is positioned at a cell containing a mapping M describing this segment (the invariant maintained by the deletion scan). Since M faithfully describes the segment, after A′ executes the step prescribed by M, it reaches exactly the same exit cell and state as A. If A then rewrites the next cell to rank k, we return to Case 1; otherwise the μ-block ends with synchronization.
Finally, if in either case A enters a loop, the corresponding mapping for A′ returns ⊥, and A′ rejects the input. Thus (ii) holds, which completes the proof. ∎
Now we prove that the simulation algorithm for k-DLAs works in linear time. We present the proof for the general case of a k(n)-DLA. The simulation algorithm for k(n)-DLAs is identical to that for a fixed k, since its behavior depends only on whether the number of visits to a cell has reached k(n) or not. The counters for cell visits required in the case of k(n)-DLAs can be implemented in the RAM model with O(1) overhead per operation, and thus do not affect the asymptotic running time.
We denote by T_M the time complexity of the operations on mappings for a k-DLA A. Since we will prove later that the complexity of each auxiliary operation depending on A is O(|Q|), we replace the individual costs of f, ∘, and D by the common bound

T_M = max(T_f, T_∘, T_D). (2)
Lemma 2
The automaton A′ runs in O(T_M · k · n) time on processing an input of length n.
Proof
We use amortized analysis [5], namely the accounting method. Each cell on A′’s tape has its own budget (credit, in the terminology of [5]). We denote by b_t(i) the value of the budget of cell i after step t.
The budgets are updated according to the following rules:

• b₀(i) = 0 for all i;

• b_{t+1}(i) = b_t(i) + 1 if at step t + 1 the head visits cell i and this cell still contains a letter (i.e., it has been visited fewer than k times);

• b_{t+1}(i) = b_t(i) + 1 if at step t + 1 the head enters a cell containing a mapping from cell i, where cell i contains a letter;

• b_{t+1}(i) = b_t(i) otherwise.
Budgets are not changed during deletion scans.
Fix a step t. Suppose that the j-th cell currently contains a mapping describing a segment. Its neighbors l(j) and r(j) have rank less than k, so each has been visited fewer than k times. When the head moves from a neighbor into the segment, that neighbor pays $1. Until the neighbor itself turns into a mapping, it pays at most k times for such visits. Once it becomes a mapping, further payments are taken over by the new neighbor. Thus each cell pays $1 for each of its own visits and at most $1 for each visit into an adjacent mapping before it itself turns into a mapping. Since a cell turns into a mapping after at most k visits, each cell pays at most 2k dollars in total. Therefore, the described budget strategy guarantees b_t(i) ≤ 2k for all t and i.
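The total charge can be summarized by the following arithmetic (a sketch; here b(i) denotes the final budget of cell i, and each charged step is paid for by exactly one cell):

```latex
\text{total charged steps}
  \;=\; \sum_{i=1}^{n} b(i)
  \;\le\; \sum_{i=1}^{n} \Bigl(
      \underbrace{k}_{\text{own visits}}
      \;+\;
      \underbrace{k}_{\text{visits into adjacent mappings}}
    \Bigr)
  \;=\; 2kn \;=\; O(kn).
```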
Deletion scans were not counted above. Clearly there are at most n scans, since each cell can initiate at most one. During one scan, at most two directed compositions are computed, each in O(T_M) time. Hence all deletion scans together cost O(T_M · n) time.
Summing up, A′ performs the following kinds of operations:

1. moves that end on a letter, but not an endmarker;
2. moves that end on a mapping;
3. deletion scans;
4. moves that end on an endmarker.
By amortized analysis, Cases 1 and 2 together take O(T_M · k · n) time: the total number of such moves is O(k · n) (since the budgets sum to at most 2kn), and each move costs O(T_M). Case 3 costs O(T_M · n), as discussed. For Case 4, note that after the head leaves the left endmarker on the very first move, each endmarker can only be visited when arriving from an inner cell. Hence the total number of endmarker visits does not exceed the number of visits to all other cells, which is O(k · n). Since each simulation step of Algorithm 1 takes at most O(T_M) time, endmarker visits also cost O(T_M · k · n).
Thus the total running time of A′ on an input of length n is O(T_M · k · n). ∎
3.3 Membership Problem
To analyze the membership problem for k(n)-DLAs we need to formalize the auxiliary operations on mappings. Recall that mappings represent contiguous segments of rank-k cells and that the simulation algorithm relies on three basic subroutines:
• the cell description function f, which produces the mapping for a single rank-k cell;

• the directed composition ∘, which merges mappings of adjacent segments into one;

• the departure function D, which determines the exit state and direction when the head is located at the boundary between two adjacent segments, i.e., when it enters one segment from the other.
We prove in this subsection that all these operations are well-defined and computable in O(|Q|) time, so T_M = O(|Q|).
Our constructions rely on graph representations of mappings. A mapping M describing a segment S is encoded by a four-partite graph G_M with parts In_→, In_←, Out_→, Out_←, each a copy of Q. These four parts form a partition of the vertex set according to their labeling. We also adjoin a distinguished sink vertex ⊥ of out-degree 0. For every directed state s ∈ Q± we add:

• an edge from the input copy of s to the output copy of M(s) if M(s) ≠ ⊥,

• or an edge from the input copy of s to ⊥ if M(s) = ⊥.

Thus every vertex has out-degree at most 1: input vertices have either one outgoing edge to an output vertex or to ⊥, while output vertices and ⊥ have out-degree 0. We say that G_M represents the mapping M (or the segment S).
If M₂ describes a segment adjacent on the right to the segment of M₁, then to compute the composition M₁ ∘ M₂ and the departure function D we use the intermediate graph G obtained by gluing the graphs G_{M₁} and G_{M₂}. Formally, G is defined as follows: part Out_→ of G_{M₁} is glued with part In_→ of G_{M₂}, part Out_← of G_{M₂} is glued with part In_← of G_{M₁}, and the two sinks are glued together. By gluing two vertices we mean that one of them is deleted and all its edges are reattached to the other. When we glue parts, we glue the vertices carrying the same state label. An illustration of G is given in Fig. 1.
Proposition 1
The directed composition is determined via the intermediate graph G as follows: (M₁ ∘ M₂)(s) = t iff the path from the input copy of s reaches an output vertex labeled t, and (M₁ ∘ M₂)(s) = ⊥ iff the unique path from the input copy of s reaches ⊥ or falls into a directed cycle. Associativity of ∘ follows from associativity of concatenating segments (hence of gluing graphs).
Lemma 3
For every k-DLA A, the operations f, ∘, and D are well-defined and computable in O(|Q|) time, assuming that the description of A is already stored in RAM.
Proof
The function f is trivially computable in O(|Q|) time: for a rank-k symbol a, the mapping f(a) satisfies f(a)((q, d)) = (q′, d′) iff δ(q, a) = (q′, a, d′), where the entry direction d is irrelevant for a one-cell segment.
We next analyze the algorithms for ∘ and D. Let V and E be the vertex and edge sets of the intermediate graph G (Fig. 1). Since each vertex has out-degree at most 1, we have |V| = O(|Q|) and |E| = O(|Q|). We maintain two global arrays: endpoint, for the resulting mapping, and mark, for marks on vertices. Initially all values of endpoint are set to a placeholder value undefined. These arrays serve as memoization.
The core procedure is the function FindPath (Algorithm 2), which returns the endpoint of the unique path starting from a vertex v, or ⊥ if the path falls into a loop. We explain the steps of the algorithm while proving correctness.
Correctness and complexity of FindPath. Let v₀ = v and define v_{i+1} to be the successor of v_i. The procedure terminates when one of the following holds:

• v_i is an output vertex: then FindPath returns v_i;

• v_i = ⊥: then FindPath returns ⊥;

• v_i is marked: let u = v_i. If endpoint[u] is undefined, a loop is detected and FindPath returns ⊥. Otherwise FindPath returns endpoint[u], since the endpoint of the path from u has already been computed and stored in endpoint. As the paths from v and u merge at the vertex u, their endpoints coincide.
The invariant “if a vertex u is marked and endpoint[u] is defined, then endpoint[u] is the endpoint of the unique path from u” is maintained by Algorithm 3, which computes M₁ ∘ M₂ using FindPath.
Throughout its execution each edge of G is traversed at most once before the corresponding vertex is marked, so the total running time is O(|V| + |E|) = O(|Q|). Observe that D(M₁, M₂, s) is defined exactly as the endpoint of the path from the input copy of s in the intermediate graph G. Hence a single call to D is realized by FindPath and runs in O(|Q|) time. Therefore ∘ and D are computable in O(|Q|) time, and together with the O(|Q|) bound for f this completes the proof. ∎
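The path-following procedure with memoization and loop detection can be sketched as follows. The representation is ours: `succ[v]` is the unique successor of v (absent for the sink), `outputs` is the set of output vertices, and the shared dictionary `endpoint` plays the role of both arrays (a vertex is "marked" iff it is a key of `endpoint`, with `None` standing for ⊥).

```python
BOTTOM = None  # stands for the outcome "the head never leaves the segment"

def find_path(v, succ, outputs, endpoint, ):
    """Endpoint of the unique path from v in a graph of out-degree <= 1.

    Returns the output vertex the path reaches, or BOTTOM if it reaches
    the sink or falls into a directed cycle. Results are memoized in
    `endpoint`, so across all calls each edge is traversed O(1) times.
    """
    on_path, path = set(), []
    while True:
        if v in endpoint:            # memoized by an earlier call
            res = endpoint[v]
            break
        if v in on_path:             # revisit within this call: a cycle
            res = BOTTOM
            break
        if v in outputs or succ.get(v) is None:
            res = v if v in outputs else BOTTOM   # output vertex or sink
            break
        on_path.add(v)
        path.append(v)
        v = succ[v]
    for u in path:                   # all vertices on the path share the endpoint
        endpoint[u] = res
    return res
```

Composing two mappings then amounts to calling `find_path` from every input vertex of the glued graph, reusing one shared `endpoint` dictionary, which yields the O(|V| + |E|) = O(|Q|) bound.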
3.4 Main Result
We now summarize the simulation in the following theorem.
Theorem 3.1
For every -DLA , the membership problem for is solvable in time on a RAM model, where is the length of the input word , is the number of states of , and is the length of the description of , assuming that the function is computable in time .
Proof
By Lemma 1, for each -DLA there exists a corresponding DLLA (described by Algorithm 1) that simulates . By Lemma 2, performs steps, where is defined by Eq. (2). By Lemma 3, . To simulate on a RAM model we use Simulation Algorithm 1 with subroutines from Algorithms 2 and 3. Before running the simulation we preprocess the description of and store it in RAM, which takes time. Combining all these bounds, we obtain the claimed complexity .
Corollary 1
The recognition problem for a -DLA is solvable in time. In particular, for each fixed , every -DCFL is recognizable in linear time.
4 Upper bound on -DLA runtime
In this section we establish an upper bound on the runtime of a -DLA in the classical simulation model (as defined above).
Theorem 4.1
A -DLA with states performs at most steps on an input of length , provided the computation does not enter an infinite loop.
Proof
Suppose the head traverses a segment of tape consisting of rank- symbols for more than steps. Then some cell is visited at least twice in the same state, since the average number of state visits per cell exceeds . Hence the computation would fall into an infinite loop, contrary to our assumption.
As noted in the proof of Lemma 1, -DLAs have two types of moves: regular moves, when the head arrives at a cell of rank less than , and -moves, when the head arrives at a segment consisting of rank- cells. Any series of -moves cannot exceed steps unless the computation enters a loop. Each such series must be preceded by a regular move, and the total number of regular moves is : after each regular move the rank of the visited cell strictly increases, and it can increase only up to .
Since each regular move takes steps, and each subsequent series of -moves takes steps, the total number of steps is bounded by . Therefore the claim follows. ∎
Theorem 4.1 implies that a -DLA runs in steps. This bound is asymptotically tight: for instance, the classical -LA recognizing the language runs in quadratic time. Its behavior is as follows. The head moves right until it encounters the first , then returns left to find the leftmost of rank (which is then promoted to rank ). After this is located, the automaton moves right again to check for a matching of rank ; if found, it proceeds to the next matching , and so on. When the automaton reaches the right endmarker , it verifies that no ’s of rank remain; if the check succeeds, the input is accepted, and otherwise it is rejected.
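The quadratic behavior can be reproduced by a direct head-move count. The sketch below is a hypothetical concretization: it assumes, for illustration, inputs of the form of a block of `a`'s followed by an equally long block of `b`'s, and marking a cell (`A`/`B`) stands in for promoting its rank:

```python
def head_moves(n):
    """Count head moves of the back-and-forth scan described above
    on the input a^n b^n (illustrative model, not the paper's
    automaton verbatim)."""
    tape = list("a" * n + "b" * n)
    pos = steps = 0
    # move right until the first 'b'
    while pos < len(tape) and tape[pos] != "b":
        pos += 1; steps += 1
    for _ in range(n):
        # return left to the leftmost cell still holding an 'a'
        while pos > 0 and tape[pos - 1] != "A":
            pos -= 1; steps += 1
        tape[pos] = "A"            # "promote" the matched 'a'
        # move right to the leftmost unmarked 'b' and promote it
        while tape[pos] != "b":
            pos += 1; steps += 1
        tape[pos] = "B"
    # sweep to the right end, checking that no unmarked 'b' remains
    while pos < len(tape):
        assert tape[pos] != "b"
        pos += 1; steps += 1
    return steps
```

In this model each of the `n` rounds costs a left sweep plus a right sweep of roughly `n` cells, so the total head-move count grows quadratically in the input length.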
Acknowledgments
The author thanks Dmitry Chistikov for valuable feedback, for discussions of the results presented in this text, and for helpful suggestions that improved the presentation.
References
- [1] Abboud, A., Backurs, A., Williams, V.V.: If the current clique algorithms are optimal, so is Valiant’s parser. In: FOCS ’15, pp. 98–117. IEEE Computer Society, USA (2015)
- [2] Bertsch, E., Nederhof, M.J.: Regular closure of deterministic languages. SIAM Journal on Computing 29(1), 81–102 (1999)
- [3] Birget, J.C.: Concatenation of inputs in a two-way automaton. Theoretical Computer Science 63(2), 141–156 (1989)
- [4] Cook, S.A.: Linear time simulation of deterministic two-way pushdown automata. Department of Computer Science, University of Toronto (1970)
- [5] Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms, fourth edition. MIT Press (2022)
- [6] Guillon, B., Prigioniero, L.: Linear-time limited automata. Theor. Comput. Sci. 798, 95–108 (2019)
- [7] Hennie, F.: One-tape, off-line Turing machine computations. Information and Control 8(6), 553–578 (1965)
- [8] Hibbard, T.N.: A generalization of context-free determinism. Information and Control 11(1/2), 196–238 (1967)
- [9] Knuth, D.: On the translation of languages from left to right. Information and Control 8, 607–639 (1965)
- [10] Kunc, M., Okhotin, A.: Describing periodicity in two-way deterministic finite automata using transformation semigroups. In: Mauri, G., Leporati, A. (eds.) Developments in Language Theory. pp. 324–336. Springer Berlin Heidelberg, Berlin, Heidelberg (2011)
- [11] Kutrib, M., Wendlandt, M.: Reversible limited automata. Fundamenta Informaticae 155(1-2), 31–58 (2017). https://doi.org/10.3233/FI-2017-1575, https://journals.sagepub.com/doi/abs/10.3233/FI-2017-1575
- [12] Lee, L.: Fast context-free grammar parsing requires fast boolean matrix multiplication. J. ACM 49(1), 1–15 (2002)
- [13] Pighizzini, G.: Nondeterministic one-tape off-line Turing machines and their time complexity. J. Autom. Lang. Comb. 14(1), 107–124 (2009)
- [14] Pighizzini, G., Pisoni, A.: Limited automata and context-free languages. In: Fundamenta Informaticae. vol. 136, pp. 157–176. IOS Press (2015)
- [15] Rubtsov, A.A., Chudinov, N.: Computational model for parsing expression grammars. In: Královic, R., Kucera, A. (eds.) 49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024, August 26-30, 2024, Bratislava, Slovakia. LIPIcs, vol. 306, pp. 80:1–80:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2024). https://doi.org/10.4230/LIPICS.MFCS.2024.80, https://doi.org/10.4230/LIPIcs.MFCS.2024.80
- [16] Rubtsov, A.A., Chudinov, N.: Computational model for parsing expression grammars. CoRR abs/2406.14911 (2024). https://doi.org/10.48550/ARXIV.2406.14911, https://doi.org/10.48550/arXiv.2406.14911
- [17] Shallit, J.O.: A Second Course in Formal Languages and Automata Theory. Cambridge University Press (2008)
- [18] Shepherdson, J.C.: The reduction of two-way automata to one-way automata. IBM Journal of Research and Development 3(2), 198–200 (1959)
- [19] Tadaki, K., Yamakami, T., Lin, J.C.: Theory of one-tape linear-time Turing machines. Theoretical Computer Science 411(1), 22–43 (2010)
- [20] Valiant, L.G.: General context-free recognition in less than cubic time. J. Comput. Syst. Sci. 10(2), 308–315 (1975)
- [21] Wagner, K., Wechsung, G.: Computational complexity. Springer Netherlands (1986)
- [22] Yamakami, T.: Behavioral strengths and weaknesses of various models of limited automata (2021), https://arxiv.org/abs/2111.05000
- [23] Yamakami, T.: What is the most natural generalized pumping lemma beyond regular and context-free languages? In: Malcher, A., Prigioniero, L. (eds.) Descriptional Complexity of Formal Systems - 26th IFIP WG 1.02 International Conference, DCFS 2025, Loughborough, UK, July 22-24, 2025, Proceedings. Lecture Notes in Computer Science, vol. 15759, pp. 196–210. Springer (2025). https://doi.org/10.1007/978-3-031-97100-6_14, https://doi.org/10.1007/978-3-031-97100-6\_14