Computing with and without arbitrary large numbers
(Extended abstract)
Abstract
In the study of random access machines (RAMs) it has been shown that the availability of an extra input integer, having no special properties other than being sufficiently large, is enough to reduce the computational complexity of some problems. However, this has only been shown so far for specific problems. We provide a characterization of the power of such extra inputs for general problems.
To do so, we first correct a classical result by Simon and Szegedy (1992) as well as one by Simon (1981). In the former we show mistakes in the proof and correct these by an entirely new construction, with no great change to the results. In the latter, the original proof direction stands with only minor modifications, but the new results are far stronger than those of Simon (1981).
In both cases, the new constructions provide the theoretical tools required to characterize the power of arbitrary large numbers.
1 Introduction
The Turing machine (TM), first introduced in [17], is undoubtedly the most familiar computational model. However, for algorithm analysis it often fails to adequately represent real-life complexities, for which reason the random access machine (RAM), closely resembling the intuitive notion of an idealized computer, has become the common choice in algorithm design. Ben-Amram and Galil [2] write “The RAM is intended to model what we are used to in conventional programming, idealized in order to be better accessible for theoretical study.”
Here, “what we are used to in conventional programming” refers, among other things, to the ability to manipulate high-level objects by basic commands. However, this ability comes with some unexpected side effects. For example, one can consider a RAM that takes as an extra input an integer that has no special property other than being “large enough”. Contrary to intuition, it has been shown that such arbitrary large numbers (ALNs) can lower problem time complexities. For example, [4] shows that the availability of ALNs lowers the arithmetic time complexity of calculating from to . (Arithmetic complexity is the computational complexity of a problem under the model, which is defined later on in this section.) However, all previous attempts to characterize the contribution of ALNs dealt with problem-specific methods of exploiting such inputs, whereas the present work gives, for the first time, a broad characterization of the scenarios in which arbitrary numbers do and those in which they do not increase computational power.
In order to present our results, we first redefine, briefly, the RAM model. (See [1] for a more formal introduction.)
Computations on RAMs are described by programs. RAM programs are sets of commands, each given a label. Without loss of generality, labels are taken to be consecutive integers. The bulk of RAM commands belong to one of two types. One type is an assignment. It is described by a triplet containing a $k$-ary operator, $k$ operands and a target. The other type is a comparison. It is given two operands and a comparison operator, and is equipped with labels to proceed to if the comparison is evaluated as either true or false. Other command-types include unconditional jumps and execution halt commands.
The execution model for RAM programs is as follows. The RAM is considered to have access to an infinite set of registers, each marked by a non-negative integer. The input to the program is given as the initial state of the first registers. The rest of the registers are initialized to $0$. Program execution begins with the lowest-labeled command and proceeds sequentially, except in comparisons (where execution proceeds according to the result of the comparison) and in jumps. When executing assignments, the $k$-ary operator is evaluated based on the values of the $k$ operands and the result is placed in the target register. The output of the program is the state of the first registers at program termination.
In order to discuss the computational power of RAMs, we consider only RAMs that are comparable in their input and output types to TMs. Namely, these will be the RAMs whose inputs and outputs both lie entirely in their first register. We compare these to TMs working on one-sided-infinite tapes over a binary alphabet, where “0” doubles as the blank. A RAM will be considered equivalent to a TM if, given as an input an integer whose binary encoding is the initial state of the TM’s tape, the RAM halts with a non-zero output value if and only if the TM accepts on the input.
Furthermore, we assume, following e.g. [6], that all explicit constants used as operands in RAM programs belong to the set $\{0, 1\}$. This assumption does not make a material difference to the results, but it simplifies the presentation.
In this paper we deal with RAMs that use non-negative integers as their register contents. This is by far the most common choice. A RAM will be indicated by $\mathrm{RAM}[\mathit{op}]$, where op is the set of basic operations supported by the RAM. These basic operations are assumed to execute in a single unit of time. We use the syntax $O(f(n))$-$\mathrm{RAM}[\mathit{op}]$ to denote the set of problems solvable in $O(f(n))$ time by a $\mathrm{RAM}[\mathit{op}]$, where $n$ is the bit-length of the input. Replacing “$\mathrm{RAM}[\mathit{op}]$” by “TM” indicates that the computational model used is a Turing machine.
Note that because registers only store non-negative integers, such operations as subtraction cannot be supported without tweaking. The customary solution is to replace subtraction by “natural subtraction”, denoted “$\mathbin{\dot{-}}$” and defined by $a \mathbin{\dot{-}} b = \max(a - b, 0)$. We note that if the comparison operator “$\le$” (testing whether the first operand is less than or equal to the second operand) is not supported by the RAM directly, the comparison “$a \le b$” can be simulated by the equivalent equality test “$a \mathbin{\dot{-}} b = 0$”. Testing for equality is always assumed to be supported.
By the same token, regular bitwise negation is not allowed, and $\lnot a$ is tweaked to mean that the bits of $a$ are negated only up to and including its most significant “1” bit.
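As a minimal sketch of the conventions just described, the following C++ fragment models register contents as `std::uint64_t`. The names `monus`, `leq_via_monus` and `tweaked_not` are illustrative and do not come from the paper, and the fixed 64-bit width is an assumption made only so that the example runs (RAM registers are unbounded). The text does not say what the tweaked negation of zero should be; returning zero is an assumption of this sketch.

```cpp
#include <cstdint>

// Natural subtraction: a - b when a >= b, and 0 otherwise.
std::uint64_t monus(std::uint64_t a, std::uint64_t b) {
    return a >= b ? a - b : 0;
}

// The comparison "a <= b", expressed through natural subtraction
// followed by an equality test against zero.
bool leq_via_monus(std::uint64_t a, std::uint64_t b) {
    return monus(a, b) == 0;
}

// Bitwise negation restricted to the bits up to and including the
// most significant set bit of a.
std::uint64_t tweaked_not(std::uint64_t a) {
    if (a == 0) return 0;  // assumption: the text leaves this case open
    // Smear the most significant set bit downwards to build a mask
    // covering exactly the positions that are to be negated.
    std::uint64_t mask = a;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;
    return ~a & mask;
}
```

For instance, `tweaked_not(10)` negates the four bits of binary 1010 and yields 5, whereas an unrestricted negation would also set every higher bit.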
Operands to each operation can be explicit integer constants, the contents of explicitly named registers or the contents of registers whose numbers are specified by other registers. This last mode, which can also be used to define the target register, is known as “indirect addressing”. In [3] it is proved that for the RAMs considered here indirect addressing has no effect. We therefore assume throughout that it is unavailable to the RAMs.
The following are two classical results regarding RAMs. Operations appearing in brackets within the operation list are optional, in the sense that the theorem holds both when the operation is part of op and when it is not.
Theorem 1 ([14]).
and
Theorem 2 ([13]).
, where ER is the set of problems solvable by Turing machines in
time, where is the length of the input.
Here, “” indicates exact division, which is the same as integer division (denoted “”) but is only defined when the two operands divide exactly. The operations “” and “” indicate left shift () and right shift (), and Bool is shorthand for the set of all bitwise Boolean functions.
In this paper, we show that while Theorem 1 is correct, its original proof is not. Theorem 2, on the other hand, despite being a classic result and one sometimes quoted verbatim (see, e.g., [16]), is, in fact, erroneous.
We re-prove the former here, and replace the latter by a stronger result, for the introduction of which we first require several definitions.
Definition 1 (Expansion Limit).
Let be the largest number that can appear in any register of a working on inp as its input, during the course of its first execution steps.
We define to be the maximum of over all values of inp for which . This is the maximum number that can appear in any register of a that was initialized by an input of length at most , after execution steps.
The subscript op may be omitted if understood from the context.
As a slight abuse of notation, we use to be the maximum of over all inp of length at most , when is understood from the context and is independent of . (The following definition exemplifies this.)
Definition 2 (RAM-Constructability).
A set of operations op is RAM-constructable if the following two conditions are satisfied: (1) there exists a RAM program that, given inp and as its inputs, with being the length of inp, returns in time a value no smaller than , and (2) each operation in op is computable in space on a Turing machine, where is the total length of all operands and of the result.
Our results are as follows.
Theorem 3.
For a RAM-constructable and any function ,
where the new notations refer to nondeterministic Turing machines, space-bounded Turing machines and nondeterministic space-bounded Turing machines, respectively.
Among other things, this result implies that the computational power of polynomial-time RAMs is far greater than ER, contrary to what was previously believed.
The theoretical tools built for proving Theorem 3 and re-proving Theorem 1 then allow us to present the following new results regarding the power of arbitrary large numbers.
Theorem 4.
Theorem 5.
Any recursively enumerable (r.e.) set can be recognized by an in time.
Here, “ARAM” is the RAM model assisted by an arbitrary large number. Formally, we say that a set is computable by an in time if there exists a Boolean function , computable in time on a , such that implies for almost all (all but a finite number of ) whereas implies for almost all . Here, conventionally denotes the bit length of the input, but other metrics are also applicable.
We see, therefore, that the availability of arbitrary numbers has no effect on the computational power of a RAM without division. However, for a RAM equipped with integer division, the boost in power is considerable, to the extent that any problem solvable by a Turing machine in any amount of time or space can be solved by an ARAM in time.
2 Models without division
2.1 Errata on [14]
We begin with a definition.
Definition 3 (Straight Line Program).
A Straight Line Program, or , is a list of tuples, , where each is composed of an operator, , and integers, , all in the range , where is the number of operands taken by . This list is to be interpreted as a set of computations, whose targets are , which are calculated as follows: , , and for each , is the result of evaluating the operator on the inputs . The output of an SLP is the value of .
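The definition can be made concrete with a small interpreter. This is a sketch under stated assumptions rather than the paper's code: the type names `Op` and `Step`, the function `eval_slp`, the restriction to binary operators, the 64-bit value width, and in particular the seeding of the value list with 0 and 1 are all choices made here for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Operators available to this toy SLP (a hypothetical subset, all binary).
enum class Op { Add, Monus, Mul, And, Or };

// One step: an operator and the indices of two earlier values.
struct Step {
    Op op;
    std::size_t lhs;
    std::size_t rhs;
};

// Evaluates the steps in order and returns the last value computed.
// Assumption: the value list is seeded with v[0] = 0 and v[1] = 1.
std::uint64_t eval_slp(const std::vector<Step>& steps) {
    std::vector<std::uint64_t> v = {0, 1};
    for (const Step& s : steps) {
        // at() throws std::out_of_range if a step refers to a value that
        // has not been computed yet, so only earlier values can be used.
        const std::uint64_t a = v.at(s.lhs);
        const std::uint64_t b = v.at(s.rhs);
        std::uint64_t r = 0;
        switch (s.op) {
            case Op::Add:   r = a + b; break;
            case Op::Monus: r = a >= b ? a - b : 0; break;
            case Op::Mul:   r = a * b; break;
            case Op::And:   r = a & b; break;
            case Op::Or:    r = a | b; break;
        }
        v.push_back(r);
    }
    return v.back();
}
```

With these seeds, the three steps "add value 1 to itself, square the result, square again" produce 2, 4 and 16, which illustrates how quickly values can grow with the number of steps.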
A technique first formulated in a general form in [11] allows results on SLPs to be generalized to RAMs. Schönhage’s theorem, as worded for the special case that interests us, is that if there exists a Turing machine, running on a polynomial-sized tape and in finite time, that takes an as input and halts in an accepting state if and only if is nonzero, then there also exists a TM running on a polynomial-sized tape that simulates a . This technique is used both in [14] and in our new proof.
The proof of [14] follows this scheme, and attempts to create such a Turing machine. In doing so, this TM stores monomial-based representations of certain powers of two. These are referred to by the paper as “monomials” but are, for our purposes, integers.
The main error in [14] begins with the definition of a relation, called “vicinity”, between monomials, which is formulated as follows.
We define an equivalence relation called vicinity between monomials. Let and be two monomials. Let be a given parameter. If , then is in the vicinity of . The symmetric and transitive closure of this relation gives us the full vicinity relation. As it is an equivalence relation, we can talk about two monomials being in the same vicinity (in the same equivalence class).
It is unclear from the text whether the authors’ original intention was to define this relation in a universal sense, as it applies to the set of all monomials (essentially, the set of all powers of two), or whether it is only defined over the set of monomials actually used by any given program. If the former is correct, any two monomials are necessarily in the same vicinity, because one can bridge the gap between them by monomials that are only a single order of magnitude apart. If the latter is correct, it is less clear what the final result is. The paper does not argue any claim that would characterize the symmetric and transitive closure in this case.
However, the paper does implicitly assume throughout that the vicinity relation, as originally defined (in the sense) is its own symmetric and transitive closure. This is used in the analysis by assuming for any and which are in the same vicinity (in the equivalence relation sense) that they also satisfy , i.e. they are in the same vicinity also in the restrictive sense.
Unfortunately, this claim is untrue. It is quite possible to construct an SLP that violates this assumption, and because the assumption is central to the entire algorithm, the proof does not hold.
We therefore provide here an alternate algorithm, significantly different from the original, that bypasses the entire “vicinity” issue.
2.2 Our new construction
Our proof adapts techniques from two previous papers: [5] (which uses lazy evaluation to perform computations on operands that are too long to fit into a polynomial-sized tape) and [12] (which stores operands in a hierarchical format that notes only the positions of “interesting bits”, these being bit positions whose values are different from those of the less significant bit directly preceding them). The former method is able to handle multiplication but not bit shifting, and the latter the reverse. We prove Theorem 1 using a sequence of lemmas. All algorithms described are available as C++ code in Appendix A.
Lemma 1.
In an , the number of interesting bits in the output grows at most exponentially with . There exists a Turing machine working in polynomial space that takes such an SLP as its input, and that outputs an exponential-sized set of descriptions of bit positions, where bit positions are described as functions of , such that the set is a superset of the interesting bit positions of .
The fact that the number of interesting bits grows only exponentially given this operation set was noted in [14]. Our proof follows the reasoning of the original paper.
Proof.
Consider, for simplicity, the instruction set . Suppose that we were to change the meaning of the operator “”, so that, instead of calculating , its result would be where is a formal parameter, and a new formal parameter is generated every time the “” operator is used. The end result of the calculation will now no longer be an integer but rather a polynomial in the formal parameters. The following are some observations regarding this polynomial.
1. The number of formal parameters is at most $n$, the length of the SLP.
2. The power of each formal parameter is at most $2^{n-i}$, where $i$ is the step number in which the parameter was defined. (This exponent is at most doubled at each step in the SLP. Doubling may happen, for example, if the parameter is multiplied by itself.)
3. The sum of all multiplicative coefficients in the polynomial is at most $2^{2^n}$. (During multiplication, the sum of the product polynomial’s coefficients is the product of the sums of the operands’ coefficients. As such, this value can at most square itself at each operation. The maximal value it can attain at step $i$ is $2^{2^i}$.)
If we were to take each formal variable, , that was created at an “” operation, and substitute in it the value (a substitution that [14] refers to as the “standard evaluation”), then the value of the polynomial will equal the value of the SLP’s output. We claim that if is an interesting bit position, then there is some product of formal variables appearing as a monomial in the result polynomial such that its standard evaluation is , and .
The claim is clearly true for and . For , we will make the stronger claim . To prove this, note that any monomial whose standard evaluation is greater than cannot influence the value of bit and cannot make it “interesting”. On the other hand, if all remaining monomials are smaller than , the total value that they carry within the polynomial is smaller than times the sum of their coefficients, hence smaller than . Bits and , however, are both zero. Therefore, is not an interesting bit.
We proved the claim for the restricted operation set . Adding logical AND (“”) and logical OR (“”) can clearly not change the fact that bits and are both zero, nor can it make the polynomial coefficients larger than .
Incorporating “” and “” into the operation set has a more interesting effect: the values of bit and can both become “1”. This will still not make bit interesting, but it does require a small change in the argument. Instead of considering polynomials whose coefficients are between and , we can now consider polynomials whose coefficients are between and . This changes the original argument only slightly, in that we now need to argue that in taking the product over two polynomials the sum of the absolute values of the coefficients of the product is no greater than the product of the sums of the absolute values of the coefficients of the operands.
Similarly, adding “” into consideration, we no longer consider only formal variables of the form but also , where the standard evaluation of is and is treated as a bitwise Boolean operation (in the sense that, conceptually, it zeroes all bit positions that are “to the right of the decimal point” in the product).
We can therefore index the set of interesting bits by use of a tuple, as follows. If are the set of steps for which , the tuple will contain one number between and for each , to indicate the exponent of the formal parameter added at step , and an additional ’th element, between and to indicate a bit offset from this bit position.
Though this tuple may contain many non-interesting bits, or may describe a single bit position by many names, it is a description of a super-set of the interesting bits in polynomial space. ∎
In Appendix A, such an enumeration is implemented by the method Index::next. We refer to the set of bit positions thus described as the potentially-interesting bits, or po-bits, of the SLP.
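The tuple-based indexing lends itself to an odometer-style enumeration. The following is a minimal sketch of that idea only; the body of `Index::next` is not shown in this section, so nothing here should be read as its actual implementation. The struct `TupleIndex` and the functions `next_tuple` and `count_tuples` are hypothetical names, and the field names `counters` and `maxima` are borrowed from the `Index` declaration in Appendix A with an assumed meaning (current coordinate values and their inclusive upper bounds).

```cpp
#include <cstddef>
#include <vector>

// A tuple of bounded counters.
struct TupleIndex {
    std::vector<int> counters;  // current value of each coordinate
    std::vector<int> maxima;    // inclusive upper bound of each coordinate
};

// Advances to the next tuple in odometer order. Returns false once the
// enumeration has wrapped around to the all-zero tuple.
bool next_tuple(TupleIndex& idx) {
    for (std::size_t i = 0; i < idx.counters.size(); ++i) {
        if (idx.counters[i] < idx.maxima[i]) {
            ++idx.counters[i];
            return true;
        }
        idx.counters[i] = 0;  // this coordinate overflows, carry to the next
    }
    return false;
}

// Counts the tuples visited when starting from the all-zero tuple.
std::size_t count_tuples(TupleIndex idx) {
    idx.counters.assign(idx.maxima.size(), 0);
    std::size_t visited = 1;
    while (next_tuple(idx)) ++visited;
    return visited;
}
```

Only the current tuple is stored at any time, so the memory used is proportional to the length of the tuple even though the number of tuples visited is the product of the coordinate ranges.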
Lemma 2.
Let . Let be an Oracle that takes as input and outputs the descriptions of all its po-bits in order, from least-significant to most-significant, without repetitions. There exists a TM working in polynomial space but with access to that takes as inputs and the description of a po-bit position, , of , and that outputs the ’th bit of the output of .
Proof.
Given a way to iterate over the po-bits in order, the standard algorithms for most operations required work as expected. For example, addition can be performed bit-by-bit if the bits of the operands are not stored, but are, rather, calculated recursively whenever they are needed. The depth of the recursion required in this case is at most . (See Add::eval in Appendix A.)
The fact that iterating only over the po-bits, instead of over all bit positions, makes no difference to the results is exemplified in Figure 1.
[Figure 1: a worked binary addition whose columns are grouped into po-bits, non-po-bits and po-bits, with the carry bits marked.]
As can be seen, not only are the non-po-bits all equal to the last po-bit preceding them but, in addition, the carry bit going over from the last po-bit to the first non-po-bit is the same as the carry bit carried over from the last non-po-bit to the first po-bit. Because of this, the sequential carry bits across non-po-bits (depicted in light blue in Figure 1) can be replaced by a single non-contiguous carry operation (the red arrow).
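The carry argument can be checked on machine words. The sketch below is an illustration under simplifying assumptions, not the algorithm of Appendix A: operands are 64-bit words, a position counts as interesting if its bit differs from the bit below it in either operand (position 0 always counts), and the run boundaries are found by scanning rather than by an oracle. The names `bit_at`, `is_interesting` and `add_skipping_runs` are hypothetical. A full-adder step is performed only at interesting positions; across each run of uninteresting positions the carry is left unchanged and a single sum bit is replicated.

```cpp
#include <cstdint>

// Bit i of x (0 or 1).
static int bit_at(std::uint64_t x, int i) {
    return static_cast<int>((x >> i) & 1u);
}

// Position 0 is always interesting; any other position is interesting
// when its bit differs from the less significant bit next to it.
static bool is_interesting(std::uint64_t x, int i) {
    return i == 0 || bit_at(x, i) != bit_at(x, i - 1);
}

// Adds a and b modulo 2^64, doing a full-adder step only at positions
// that are interesting in a or in b.
std::uint64_t add_skipping_runs(std::uint64_t a, std::uint64_t b) {
    std::uint64_t sum = 0;
    int carry = 0;
    int i = 0;
    while (i < 64) {
        const int x = bit_at(a, i);
        const int y = bit_at(b, i);
        if ((x ^ y ^ carry) != 0) sum |= std::uint64_t{1} << i;
        carry = (x & y) | (carry & (x ^ y));

        // Find the end of the run of uninteresting positions after i.
        int j = i + 1;
        while (j < 64 && !is_interesting(a, j) && !is_interesting(b, j)) ++j;

        // Inside the run both operands repeat the bits x and y, so the
        // carry stays constant and every sum bit equals x ^ y ^ carry.
        const int run_length = j - (i + 1);
        if (run_length > 0 && (x ^ y ^ carry) != 0) {
            const std::uint64_t ones = (std::uint64_t{1} << run_length) - 1;
            sum |= ones << (i + 1);
        }
        i = j;  // jump directly to the next interesting position
    }
    return sum;
}
```

The carry is constant inside a run because, when the two repeated operand bits agree, the carry out equals that common bit, and when they differ, the carry out equals the carry in.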
This logic works just as well for subtraction and Boolean operations. The only operation acting differently is multiplication. Implementing multiplication directly leads to incorrect results. Instead, we re-encode the operand bits in a way that reflects our original observation, that the operands can be taken to be polynomials with small coefficients in absolute value, though these coefficients may not necessarily be nonnegative.
The new encoding is as follows: going from least significant bit to most significant bit, a “1” bit is encoded as a $-1$ if preceded by a “0” and as $0$ otherwise. A “0” bit is encoded as a $1$ if preceded by a “1” and as $0$ otherwise. It is easy to see that a number, $a$, encoded in regular binary notation but including a leading zero by a sequence, $a_0, \ldots, a_k$, denoting coefficients of a power series $a = \sum_{i=0}^{k} a_i 2^i$, does not change its value if the $a_i$ are switched for the $b_i = a_{i-1} - a_i$ (taking $a_{-1} = 0$) that are the result of the re-encoding procedure described. The main difference is that now the value of all non-po-bits is $0$. (See Mult::eval in Appendix A.)
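A short sketch may help to see why a re-encoding of this kind preserves the value and why it suits multiplication. It assumes 16-bit operands so that a leading zero bit is always available, takes the coefficient at position $i$ to be the bit below it minus the bit at $i$ (with the bit below position 0 taken to be zero), and multiplies by brute-force convolution of coefficients. The names `recode`, `decode`, `nonzero_terms` and `multiply_recoded` are hypothetical, and this is not the lazy, polynomial-space evaluation of `Mult::eval`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Re-encodes n as 17 coefficients in {-1, 0, 1}: coefficient i is
// (bit i-1) - (bit i), where the bit below position 0 is taken to be 0
// and position 16 holds the leading zero.
std::vector<int> recode(std::uint16_t n) {
    std::vector<int> c(17, 0);
    int prev = 0;
    for (int i = 0; i <= 16; ++i) {
        const int cur = (i < 16) ? ((n >> i) & 1) : 0;
        c[i] = prev - cur;
        prev = cur;
    }
    return c;
}

// Evaluates the power series sum of c[i] * 2^i.
std::int64_t decode(const std::vector<int>& c) {
    std::int64_t v = 0;
    for (std::size_t i = 0; i < c.size(); ++i) v += c[i] * (std::int64_t{1} << i);
    return v;
}

// Number of non-zero coefficients, i.e. of positions that matter.
int nonzero_terms(const std::vector<int>& c) {
    int count = 0;
    for (int x : c) count += (x != 0);
    return count;
}

// Multiplies through the re-encoded form, using bilinearity: only pairs
// of non-zero coefficients contribute a signed power of two.
std::int64_t multiply_recoded(std::uint16_t a, std::uint16_t b) {
    const std::vector<int> ca = recode(a);
    const std::vector<int> cb = recode(b);
    std::int64_t product = 0;
    for (std::size_t i = 0; i < ca.size(); ++i) {
        if (ca[i] == 0) continue;
        for (std::size_t j = 0; j < cb.size(); ++j) {
            if (cb[j] == 0) continue;
            product += ca[i] * cb[j] * (std::int64_t{1} << (i + j));
        }
    }
    return product;
}
```

A long run of identical bits collapses to two non-zero coefficients, one at each end of the run, so the amount of work in the product depends on the number of bit changes rather than on the bit length.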
Proving that multiplication works correctly after re-encoding is done by observing its base cases and bilinear properties. The carry in the calculation is exponential in size, so can be stored using a polynomial number of bits. ∎
Lemma 3.
Let be an Oracle that takes an and two po-bit positions of and determines which position is the more significant. Given access to , Oracle , described in Lemma 2, can be implemented as a polynomial space Turing machine.
Proof.
Given an Oracle able to compare between indices, the ability to enumerate over the indices in an arbitrary order allows creation of an ordered enumeration. Essentially, we begin by choosing the smallest value, then continue sequentially by choosing, at each iteration, the smallest value that is still greater than the current value. This value is found by iterating over all index values in an arbitrary order and trying each in turn. ∎
In Appendix A, this algorithm is implemented by the method Index::operator++.
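The selection scheme in the proof of Lemma 3 can be sketched generically. The code below is an illustration with hypothetical names (`ordered_scan`, `sorted_distinct`), not the body of `Index::operator++`; it works on an in-memory vector, whereas the paper enumerates index values on the fly. The point it demonstrates is that a comparison routine plus repeated full scans in arbitrary order suffice to visit all values in increasing order without repetitions, while remembering only the current value and the best candidate so far.

```cpp
#include <functional>
#include <vector>

// Visits every distinct item in increasing order. Each round scans all
// items and picks the smallest one that is still greater than the item
// visited last; only two pointers of state are kept between rounds.
template <class T, class Less, class Visit>
void ordered_scan(const std::vector<T>& items, Less less, Visit visit) {
    const T* current = nullptr;
    for (;;) {
        const T* best = nullptr;
        for (const T& x : items) {
            if (current && !less(*current, x)) continue;  // not above current
            if (!best || less(x, *best)) best = &x;
        }
        if (!best) return;  // nothing greater than current remains
        visit(*best);
        current = best;
    }
}

// Convenience wrapper used to demonstrate the scan on integers. The
// output vector exists only for the demonstration.
std::vector<int> sorted_distinct(const std::vector<int>& items) {
    std::vector<int> out;
    ordered_scan(items, std::less<int>(), [&out](int x) { out.push_back(x); });
    return out;
}
```

The price of keeping so little state is time: each value produced costs one full scan, which is acceptable here because the argument bounds space rather than time.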
Lemma 4.
Oracle , described in Lemma 3, can be implemented as a polynomial space Turing machine.
Proof.
Recall that an index position is an affine function of the coefficients of the formal variables introduced, in their standard evaluations. To determine which of two indices is larger, we subtract these, again reaching an affine function of the same form. The coefficients themselves are small, and can be stored directly. Determining whether the subtraction result is negative or not is a problem of the same kind as was solved earlier: subtraction, multiplication and addition need to be calculated over variables; in this case the variables are the coefficients, instead of the original formal variables.
However, there is a distinct difference in working with coefficients, in that they, themselves, are calculable as polynomials over formal variables. The calculation can, therefore, be transformed into addition, multiplication and subtraction, once again over the original formal variables.
Although it may seem as though this conclusion returns us to the original problem, it does not. Consider, among all formal variables, the one defined last. This variable cannot appear in the exponentiation coefficients of any of the new polynomials. Therefore, the new equation is of the same type as the old equation but with at least one formal parameter less. Repeating the process over at most recursion steps (a polynomial number) allows us to compare any two indices for their sizes. ∎
See the function cmp and the method Command::cmp in Appendix A for an implementation.
2.3 Incorporating arbitrary numbers
The framework described in Section 2.2 can readily incorporate simulation of arbitrary large number computation. We use it now to prove Theorem 4.
Proof of Theorem 4.
Having proved Theorem 1, what remains to be shown is
As in the proof of Theorem 1, it is enough to show that an SLP that is able to handle all operations can be simulated in PSPACE.
We begin by noting that because the PTIME-ARAM must work properly for all but a finite range of numbers as its ALN input, it is enough to show one infinite family of numbers that can be simulated properly. We choose , for any sufficiently large . In the simulation, we treat this as a new formal variable, as was done with outputs of “” operations.
Lemmas 1–3 continue to hold in this new model. They rely on the ability to compare between two indices, which, in the previous model, was guaranteed by Lemma 4. The technique by which Lemma 4 was previously proved was to show that comparison of two indices is tantamount to evaluating the sign of an affine combination of the exponents associated with a list of formal variables, when using their standard evaluation. This was performed recursively. The recursion was guaranteed to terminate, because at each step the new affine combination must omit at least one formal variable, namely the last one to be defined. Ultimately, the sign to be evaluated is of a scalar, and this can be performed directly.
When adding the new formal variable , the same recursion continues to hold, but the terminating condition must be changed. Instead of evaluating the sign of a scalar, we must evaluate the sign of a formal expression of the form . For a sufficiently large (which we assume to be), the sign is the result of lexicographic evaluation. ∎
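The modified terminating condition can be illustrated as follows. The exact form of the expression is not legible in the text above, so this sketch assumes an affine expression with an integer coefficient `a` multiplying the large quantity and an integer constant term `b`; the function names are hypothetical. Under that assumption the sign is decided by `a` whenever `a` is non-zero, and by `b` only when `a` vanishes.

```cpp
// Sign of an integer: -1, 0 or +1.
int sign_of(long long v) {
    return (v > 0) - (v < 0);
}

// Sign of a * omega + b for every sufficiently large omega, decided
// lexicographically: first by a, then by b. (Assumed form, see lead-in.)
int sign_for_large_omega(long long a, long long b) {
    return a != 0 ? sign_of(a) : sign_of(b);
}

// Direct evaluation for one concrete omega, for comparison only. The
// caller must keep the operands small enough to avoid overflow.
int sign_at(long long a, long long b, long long omega) {
    return sign_of(a * omega + b);
}
```

The two functions agree as soon as the concrete value exceeds the absolute value of `b`, which is the sense in which the large quantity only needs to be "sufficiently large".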
A full C++ implementation of the solution appears in Appendix B.
3 Models with division
Our proof of Theorem 3 appears in Appendix C. It resembles [13] in that it uses Simon’s ingenious argument that, for any given , the value can be calculated in -time by considering geometric series summation techniques. The result is an integer that includes, in windows of length bits, every possible bit-string of length . The simulating RAM acts by verifying whether any of these bit-strings is a valid tableau for an accepting computation by the simulated TM. This verification is performed using bitwise Boolean operations, in parallel over all options. The most salient differences between the proofs, being the errors in Simon’s original argument that this paper corrects, are as follows.
1. Simon does not show how a TM can simulate an arbitrary RAM in ER-time, making his result a lower-bound only.
2. Simon uses what he calls “oblivious Turing machines” (which are different from those of [7]) in a way that simultaneously limits the TM’s tape size and maximum execution time (only the latter condition being considered in the proof), and, moreover, are defined in a way that is non-uniform, in the sense that adding more tape may require a different TM, with potentially more states, a fact not accounted for in the proof.
3. Most importantly, Simon underestimates the length needed for the tableau, taking it to be the value of the input. TMs are notorious for using up far more tape than the value of their inputs (see [9]).
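The idea of packing every bit-string of a given length into a single integer by means of a closed-form series sum can be illustrated on a small scale. The following sketch is not Simon's construction and not the one in Appendix C; it is one simple way, chosen here for illustration, to obtain such an integer with a constant number of arithmetic operations and one exact division once the required powers are available. It places the value $i$ in the $i$-th window of $k$ bits, using the identity $\sum_{i=0}^{m-1} i x^i = \bigl((m-1)x^{m+1} - m x^m + x\bigr)/(x-1)^2$ with $m = x = 2^k$. The names `ipow`, `all_strings` and `window` are hypothetical, and the 64-bit width limits the sketch to $k \le 3$.

```cpp
#include <cstdint>

// base^exponent by repeated multiplication (a RAM would obtain these
// powers of two through shifts instead).
std::uint64_t ipow(std::uint64_t base, std::uint64_t exponent) {
    std::uint64_t result = 1;
    for (std::uint64_t i = 0; i < exponent; ++i) result *= base;
    return result;
}

// Returns the integer whose i-th window of k bits, counted from the
// least significant end, holds the value i, for every i below 2^k.
// Valid for k = 1, 2, 3 only, so that all intermediates fit in 64 bits.
std::uint64_t all_strings(unsigned k) {
    const std::uint64_t m = std::uint64_t{1} << k;  // number of windows
    const std::uint64_t x = std::uint64_t{1} << k;  // weight of one window
    const std::uint64_t numerator =
        (m - 1) * ipow(x, m + 1) - m * ipow(x, m) + x;
    return numerator / ((x - 1) * (x - 1));  // the division is exact
}

// Extracts the i-th window of k bits from value.
std::uint64_t window(std::uint64_t value, unsigned k, unsigned i) {
    return (value >> (k * i)) & ((std::uint64_t{1} << k) - 1);
}
```

For $k = 2$ the result is 228, which is 11 10 01 00 in binary: the four windows hold 3, 2, 1 and 0, that is, every 2-bit string exactly once.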
Ultimately, Theorem 3 proves that the power of a , where op is RAM-constructable and includes , is limited only by the maximal size of values that it can produce (relating to the maximal tableau size that it can generate and check). Considering this, the proof of Theorem 5 becomes a trivial corollary. The full details are given in Appendix D, but the basic idea is that any accepting computation by a TM is necessarily of some finite length . Consider an operation set , where is a function that has no parameters and returns a number that is at least as large as . This places the computation in -TM, so by Theorem 3 it is in , which is a subset of .
We have shown, therefore, that while arbitrary numbers have no effect on computational power without division, with division they provide Turing completeness in computational resources.
References
- [1] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley Publishing Co., Reading, Mass.-London-Amsterdam, 1975. Second printing, Addison-Wesley Series in Computer Science and Information Processing.
- [2] Amir M. Ben-Amram and Zvi Galil. On the power of the shift instruction. Inf. Comput., 117:19–36, February 1995.
- [3] Michael Brand. Does indirect addressing matter? Acta Inform., 49(7-8):485–491, 2012.
- [4] Nader H. Bshouty, Yishay Mansour, Baruch Schieber, and Prasoon Tiwari. Fast exponentiation using the truncation operation. Comput. Complexity, 2(3):244–255, 1992.
- [5] Juris Hartmanis and Janos Simon. On the power of multiplication in random access machines. In 15th Annual Symposium on Switching and Automata Theory (1974), pages 13–23. IEEE Comput. Soc., Long Beach, Calif., 1974.
- [6] Yishay Mansour, Baruch Schieber, and Prasoon Tiwari. Lower bounds for computations with the floor operation. SIAM J. Comput., 20(2):315–327, 1991.
- [7] Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. J. Assoc. Comput. Mach., 26(2):361–381, 1979.
- [8] Vaughan R. Pratt, Michael O. Rabin, and Larry J. Stockmeyer. A characterization of the power of vector machines. In Sixth Annual ACM Symposium on Theory of Computing (Seattle, Wash., 1974), pages 122–134. Assoc. Comput. Mach., New York, 1974.
- [9] T. Radó. On non-computable functions. Bell System Tech. J., 41:877–884, 1962.
- [10] Walter J. Savitch. Relationships between nondeterministic and deterministic tape complexities. J. Comput. System. Sci., 4:177–192, 1970.
- [11] Arnold Schönhage. On the power of random access machines. In Automata, Languages and Programming (Sixth Colloq., Graz, 1979), volume 71 of Lecture Notes in Comput. Sci., pages 520–529. Springer, Berlin, 1979.
- [12] Janos Simon. On feasible numbers (preliminary version). In Conference Record of the Ninth Annual ACM Symposium on Theory of Computing (Boulder, Colo., 1977), pages 195–207. Assoc. Comput. Mach., New York, 1977.
- [13] Janos Simon. Division in idealized unit cost RAMs. J. Comput. System Sci., 22(3):421–441, 1981. Special issue dedicated to Michael Machtey.
- [14] Janos Simon and Mario Szegedy. On the complexity of RAM with various operation sets. In Proceedings of the twenty-fourth Annual ACM Symposium on Theory of Computing, STOC ’92, pages 624–631, New York, NY, USA, 1992. ACM.
- [15] R. E. Stearns, J. Hartmanis, and P. M. Lewis. Hierarchies of memory limited computations. In Proceedings of the 6th Annual Symposium on Switching Circuit Theory and Logical Design (SWCT 1965), FOCS ’65, pages 179–190, 1965.
- [16] Jerry L. Trahan, Michael C. Loui, and Vijaya Ramachandran. Multiplication, division, and shift instructions in parallel random-access machines. Theor. Comp. Sci., 100(1):1–44, Jun 22 1992.
- [17] Alan M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc., 42:230–265, 1936.
Appendix A Code to simulate P-RAM in PSPACE
A.1 slp.h
/*
  Interface file to the SLP and related classes.
  This class simulates a PTIME-SLP[+,-,*,<<,>>,Bool] in PSPACE on a Turing machine.
*/
#include <algorithm>
#include <vector>

class SLP;
class Command;

// Description of a bit position in the value computed by a command.
class Index
{
public:
    Command* command;
    std::vector<Command*> values;
    std::vector<int> counters;
    std::vector<int> lines;
    std::vector<int> maxima;
public:
    Index begin() const;
    Index end() const;
    Index rbegin() const;
    Index zero() const;
    Index unity() const;
    Index& next();          // enumeration of po-bit descriptions (see Lemma 1)
public:
    Index() {}
    Index(Command* command, const Index& old);
    void make_index(int line, Command& value);
    Index& operator++();    // ordered enumeration (see Lemma 3)
    Index& operator--();
    Index normalize(const Index& ref) const;
};

// Comparison of two bit positions (see Lemma 4).
int cmp(const Index& index1, const Index& index2);
bool operator<(const Index& index1, const Index& index2);
bool operator>(const Index& index1, const Index& index2);
bool operator==(const Index& index1, const Index& index2);
bool operator<=(const Index& index1, const Index& index2);
bool operator>=(const Index& index1, const Index& index2);
bool operator!=(const Index& index1, const Index& index2);
Index operator+(const Index& index1, const Index& index2);
Index operator-(const Index& index1, const Index& index2);

class Command
{
private:
    int m_line;
    Index m_begin;
    Index m_end;
    Index m_rbegin;
protected:
    virtual int eval(const Index& index) const = 0;
    Index make_index(Command& exponent);
    SLP* slp;
public:
    const Index& begin() const { return m_begin; }
    const Index& end() const { return m_end; }
    const Index& rbegin() const { return m_rbegin; }
    virtual int cmp(const Index& index1, const Index& index2) const;
public:
    Command(SLP& slp);
    void set_slp(SLP* slp, int line);
    int operator[](const Index& index);
    int is_nonzero();
    int line() { return m_line; }
};
class Const : public Command
{
private:
  int val;
protected:
  virtual int eval(const Index& index) const;
public:
  virtual int cmp(const Index& index1, const Index& index2) const;
  Const(SLP& _slp, int _val) : Command(_slp), val(_val!=0) {}
};

class And : public Command
{
private:
  Command& arg1;
  Command& arg2;
protected:
  virtual int eval(const Index& index) const;
public:
  And(SLP& _slp, int _arg1, int _arg2);
};

class Or : public Command
{
private:
  Command& arg1;
  Command& arg2;
protected:
  virtual int eval(const Index& index) const;
public:
  Or(SLP& _slp, int _arg1, int _arg2);
};

class Not : public Command
{
private:
  Command& arg;
protected:
  virtual int eval(const Index& index) const;
public:
  Not(SLP& _slp, int _arg);
};

class Add : public Command
{
private:
  Command& arg1;
  Command& arg2;
protected:
  virtual int eval(const Index& index) const;
public:
  Add(SLP& _slp, int _arg1, int _arg2);
};

class Sub : public Command
{
private:
  Command& arg1;
  Command& arg2;
protected:
  virtual int eval(const Index& index) const;
public:
  Sub(SLP& _slp, int _arg1, int _arg2);
};

class Mult : public Command
{
private:
  Command& arg1;
  Command& arg2;
  int make_pn(Command& command, const Index& index) const;
protected:
  virtual int eval(const Index& index) const;
public:
  Mult(SLP& _slp, int _arg1, int _arg2);
};

class LShift : public Command
{
private:
  Command& mantissa;
  Index exponent;
protected:
  virtual int eval(const Index& index) const;
public:
  LShift(SLP& _slp, int _mantissa, int _exponent);
};

class RShift : public Command
{
private:
  Command& mantissa;
  Index exponent;
protected:
  virtual int eval(const Index& index) const;
public:
  RShift(SLP& _slp, int _mantissa, int _exponent);
};

class SLP
{
private:
  std::vector<Command*> commands;
public:
  SLP();
  void push_back(Command* command);
  Command& operator[](int line) { return *commands[line]; }
  int is_nonzero() { return (*commands.rbegin())->is_nonzero(); }
  ~SLP();
};
A.2 slp.cpp
/*
Implementation file of the SLP and related classes.
This class simulates a PTIME-SLP[+,-,*,<<,>>,Bool] in PSPACE on a Turing machine.
*/

#include "slp.h"
#include <numeric>

using namespace std;

// Smallest counter combination: every counter at minus its maximum.
Index Index::begin() const
{
  Index rc=*this;
  for(int i=0;i<maxima.size();++i) {
    rc.counters[i]=-maxima[i];
  }
  return rc;
}

// The end marker is an index with no counters at all.
Index Index::end() const
{
  Index rc=*this;
  rc.counters.clear();
  rc.values.clear();
  rc.maxima.clear();
  return rc;
}

// Largest counter combination: every counter at its maximum.
Index Index::rbegin() const
{
  Index rc=*this;
  for(int i=0;i<maxima.size();++i) {
    rc.counters[i]=maxima[i];
  }
  return rc;
}

Index Index::zero() const
{
  Index rc=*this;
  for(int i=0;i<maxima.size();++i) {
    rc.counters[i]=0;
  }
  return rc;
}

Index Index::unity() const
{
  Index rc=zero();
  rc.counters[counters.size()-1]=1;
  return rc;
}

// Odometer-style enumeration of all counter combinations (not in value order).
Index& Index::next()
{
  int i;
  for (i=0;i<maxima.size();++i) {
    ++counters[i];
    if (counters[i]==maxima[i]+1) {
      counters[i]=-maxima[i];
    } else {
      return *this;
    }
  }
  if (i==maxima.size()) {
    *this=end();
  }
  return *this;
}

Index::Index(Command* _command, const Index& old)
  : command(_command), values(old.values), counters(old.counters),
    lines(old.lines), maxima(values.size())
{
  for(int i=0;i<values.size();++i) {
    maxima[i]=1<<(command->line()-lines[i]);
  }
}

void Index::make_index(int line, Command& value)
{
  values.push_back(&value);
  counters.push_back(0);
  lines.push_back(line);
  maxima.push_back(1<<(command->line()-line));
}

// Advance to the smallest index that is strictly greater than *this.
Index& Index::operator++()
{
  if (counters.size()==0) {
    return *this;
  }
  Index rc=end();
  for(Index i=begin();i!=end();i.next()) {
    if ((i>*this)&&((rc==end())||(rc>i))) {
      rc=i;
    }
  }
  *this=rc;
  return *this;
}

// Step back to the largest index that is strictly smaller than *this.
Index& Index::operator--()
{
  if (counters.size()==0) {
    return *this;
  }
  Index rc=end();
  for(Index i=begin();i!=end();i.next()) {
    if ((i<*this)&&((rc==end())||(rc<i))) {
      rc=i;
    }
  }
  *this=rc;
  return *this;
}

// Express *this in the index space of ref: the smallest index of ref that is
// not below *this.
Index Index::normalize(const Index& ref) const
{
  if (counters.size()==0) {
    return *this;
  }
  Index rc=ref.end();
  for(Index i=ref.begin();i!=ref.end();i.next()) {
    if ((*this<=i)&&((rc==ref.end())||(rc>i))) {
      rc=i;
    }
  }
  return rc;
}

int cmp(const Index& index1, const Index& index2)
{
  if (index1.counters.size()==0) {
    if (index2.counters.size()==0) {
      return 0;
    } else {
      return 2;
    }
  } else if (index2.counters.size()==0) {
    return 2;
  }
  Index indexL,indexR;
  if (index1.counters.size()>=index2.counters.size()) {
    indexL=indexR=index1;
  } else {
    indexL=indexR=index2;
  }
  for(int i=0;i<indexL.counters.size();++i) {
    indexL.counters[i]=indexR.counters[i]=0;
  }
  for(int i=0;i<index1.counters.size();++i) {
    indexL.counters[i]=max(index1.counters[i],0);
    indexR.counters[i]=max(-index1.counters[i],0);
  }
  for(int i=0;i<index2.counters.size();++i) {
    indexL.counters[i]+=max(-index2.counters[i],0);
    indexR.counters[i]+=max(index2.counters[i],0);
  }
  // comparing indexL to indexR is equivalent to comparing index1 to index2,
  // but none of the coefficients are negative.
  return indexL.command->cmp(indexL,indexR);
}

Index Command::make_index(Command& exponent)
{
  m_begin.make_index(line(),exponent);
  m_begin=m_begin.zero();
  m_end=m_begin.end();
  m_rbegin=m_begin.rbegin();
  return m_begin.unity();
}

int Command::cmp(const Index& index1, const Index& index2) const
{
  Command* command=(*index1.values.rbegin());
  Index i=command->begin();
  int maximum=max(accumulate(index1.counters.begin(),index1.counters.end(),0),
                  accumulate(index2.counters.begin(),index2.counters.end(),0));
  int logmax=0;
  while ((1<<logmax)<maximum) ++logmax;
  i.maxima[0]+=logmax;
  int acc1=0;
  int acc2=0;
  int sign=0;
  // Scan from the least significant bit upwards; the last difference wins.
  for(;i!=command->end();++i) {
    acc1>>=1;
    acc2>>=1;
    for(int j=0;j<index1.counters.size();++j) {
      int bit=(*index1.values[j])[i];
      acc1+=index1.counters[j]*bit;
      acc2+=index2.counters[j]*bit;
    }
    if (acc1%2>acc2%2) {
      sign=1;
    }
    if (acc1%2<acc2%2) {
      sign=-1;
    }
  }
  return sign;
}

Command::Command(SLP& _slp)
{
  _slp.push_back(this);
}

void Command::set_slp(SLP* _slp, int _line)
{
  slp=_slp;
  m_line=_line;
  if (m_line>0) {
    m_begin=Index(this,(*slp)[m_line-1].begin());
    m_begin=m_begin.zero();
    m_end=m_begin.end();
    m_rbegin=m_begin.rbegin();
  }
  if (m_line==1) {
    make_index(*this);
  }
}

bool operator<(const Index& index1, const Index& index2)
{
  return cmp(index1,index2)<0;
}

bool operator>(const Index& index1, const Index& index2)
{
  return cmp(index1,index2)>0;
}

bool operator==(const Index& index1, const Index& index2)
{
  return cmp(index1,index2)==0;
}

bool operator<=(const Index& index1, const Index& index2)
{
  return cmp(index1,index2)<=0;
}

bool operator>=(const Index& index1, const Index& index2)
{
  return cmp(index1,index2)>=0;
}

bool operator!=(const Index& index1, const Index& index2)
{
  return cmp(index1,index2)!=0;
}

Index operator+(const Index& index1, const Index& index2)
{
  Index rc=index1;
  for(int i=0;i<rc.counters.size();++i) {
    rc.counters[i]=index1.counters[i]+index2.counters[i];
  }
  rc.normalize(index1);
  return rc;
}

Index operator-(const Index& index1, const Index& index2)
{
  Index rc=index1;
  for(int i=0;i<rc.counters.size();++i) {
    rc.counters[i]=index1.counters[i]-index2.counters[i];
  }
  rc.normalize(index1);
  return rc;
}

int Command::operator[](const Index& index)
{
  if (index<begin()) {
    return 0;
  }
  Index normalized=index.normalize(begin());
  return eval(normalized);
}

int Command::is_nonzero()
{
  for(Index index=begin();index!=end();++index) {
    if (eval(index)==1) {
      return 1;
    }
  }
  return 0;
}

int Const::eval(const Index& index) const
{
  return val&&(index==begin());
}

int Const::cmp(const Index& index1, const Index& index2) const
{
  return index1.counters[0]-index2.counters[0];
}

int And::eval(const Index& index) const
{
  return arg1[index]&&arg2[index];
}

And::And(SLP& _slp, int _arg1, int _arg2)
  : Command(_slp), arg1(_slp[_arg1]), arg2(_slp[_arg2])
{
}

int Or::eval(const Index& index) const
{
  return arg1[index]||arg2[index];
}

Or::Or(SLP& _slp, int _arg1, int _arg2)
  : Command(_slp), arg1(_slp[_arg1]), arg2(_slp[_arg2])
{
}

int Not::eval(const Index& index) const
{
  int rc=1;
  for(Index i=rbegin();(i>=index)&&(rc==1);--i) {
    rc=1-arg[i];
  }
  if (rc==1) {
    return 0; // The tweaked NOT operator
  }
  return 1-arg[index];
}

Not::Not(SLP& _slp, int _arg) : Command(_slp), arg(_slp[_arg])
{
}

int Add::eval(const Index& index) const
{
  int acc=0;
  for(Index i=begin();i<=index;++i) {
    acc>>=1; // carry into the next bit position
    acc+=arg1[i]+arg2[i];
  }
  return acc%2;
}

Add::Add(SLP& _slp, int _arg1, int _arg2)
  : Command(_slp), arg1(_slp[_arg1]), arg2(_slp[_arg2])
{
}

int Sub::eval(const Index& index) const
{
  int acc=0;
  Index i;
  for(i=begin();i<=index;++i) {
    acc>>=1; // borrow into the next bit position
    acc+=arg1[i]-arg2[i];
  }
  int rc=(acc%2)!=0;
  for(;i!=end();++i) {
    acc>>=1;
    acc+=arg1[i]-arg2[i];
  }
  if (acc==0) {
    return rc;
  } else {
    return 0; // arg2>arg1
  }
}

Sub::Sub(SLP& _slp, int _arg1, int _arg2)
  : Command(_slp), arg1(_slp[_arg1]), arg2(_slp[_arg2])
{
}

int Mult::make_pn(Command& command, const Index& index) const
{
  int curr_bit=command[index];
  if (index<begin()) {
    return 0;
  } else if (index==begin()) {
    return curr_bit;
  } else {
    Index prev=index;
    --prev;
    int prev_bit=command[prev];
    return prev_bit-curr_bit;
  }
}

int Mult::eval(const Index& index) const
{
  int acc=0;
  for(Index i=begin();i<=index;++i) {
    acc>>=1;
    for(Index p1=begin();p1!=end();++p1) {
      Index p2=i-p1;
      acc+=make_pn(arg1,p1)*make_pn(arg2,p2);
    }
  }
  return (acc%2)!=0;
}

Mult::Mult(SLP& _slp, int _arg1, int _arg2)
  : Command(_slp), arg1(_slp[_arg1]), arg2(_slp[_arg2])
{
}

int LShift::eval(const Index& index) const
{
  return mantissa[index-exponent];
}

LShift::LShift(SLP& _slp, int _mantissa, int _exponent)
  : Command(_slp),
    mantissa(_slp[_mantissa]), exponent(make_index(_slp[_exponent]))
{
}

int RShift::eval(const Index& index) const
{
  return mantissa[index+exponent];
}

RShift::RShift(SLP& _slp, int _mantissa, int _exponent)
  : Command(_slp),
    mantissa(_slp[_mantissa]), exponent(make_index(_slp[_exponent]))
{
}

SLP::SLP()
{
  new Const(*this,0);
  new Const(*this,1);
}

void SLP::push_back(Command* command)
{
  commands.push_back(command);
  (*commands.rbegin())->set_slp(this,commands.size()-1);
}

SLP::~SLP()
{
  for(vector<Command*>::iterator it=commands.begin();
      it!=commands.end();++it) {
    delete *it;
  }
}
A.3 main.cpp
/*
Example program, using the class SLP.
*/

#include <iostream>
#include "slp.h"

using namespace std;

int main(void)
{
  SLP sample_slp;              // The SLP being simulated is: s[0] := 0; s[1] := 1;
  new LShift(sample_slp,1,1);  // s[2] := s[1] << s[1];
  new Sub(sample_slp,2,1);     // s[3] := s[2] - s[1];
  new Sub(sample_slp,3,1);     // s[4] := s[3] - s[1];
  cout << sample_slp.is_nonzero() << endl; // (s[4] != 0) ?
  return 0;
}
Appendix B Code to simulate PTIME-ARAM in PSPACE
B.1 aln.h
/*
Interface file to the ALN class.
This class extends the capabilities of class SLP by adding support of
arbitrary large numbers.
*/

#include "slp.h"

class ALN : public Command
{
private:
  Index exponent;
protected:
  virtual int eval(const Index& index) const;
public:
  virtual int cmp(const Index& index1, const Index& index2) const;
  ALN(SLP& _slp) : Command(_slp), exponent(make_index(*this)) {}
};
B.2 aln.cpp
/*
Implementation file of the ALN class.
This class extends the capabilities of class SLP by adding support of
arbitrary large numbers.
*/

#include "aln.h"

int ALN::eval(const Index& index) const
{
  return index==exponent;
}

int ALN::cmp(const Index& index1, const Index& index2) const
{
  if (*index1.counters.rbegin()==*index2.counters.rbegin()) {
    // Equal coefficients of the ALN exponent: compare the remaining counters.
    Index indexL=index1;
    Index indexR=index2;
    indexL.counters.pop_back();
    indexL.values.pop_back();
    indexL.maxima.pop_back();
    indexR.counters.pop_back();
    indexR.values.pop_back();
    indexR.maxima.pop_back();
    return Command::cmp(indexL,indexR);
  } else {
    return (*index1.counters.rbegin())-(*index2.counters.rbegin());
  }
}
B.3 main_w_ALN.cpp
/*
Example program, using the class ALN.
*/

#include <iostream>
#include "aln.h"

using namespace std;

int main(void)
{
  SLP sample_slp;           // The SLP being simulated is: s[0] := 0; s[1] := 1;
  new ALN(sample_slp);      // s[2] := 2^omega;
  new Add(sample_slp,1,2);  // s[3] := s[1] + s[2];
  cout << sample_slp.is_nonzero() << endl; // (s[3] != 0) ?
  return 0;
}
Appendix C Proof for Theorem 3
We prove Theorem 3 by first establishing a sequence of lemmas.
Lemma 5.
A TM can be simulated by a using only bounded shifts. (Footnote 2: That is, the right operand to the shift operation, being the exponent, is bounded by a value independent of the input. Equivalently, shifts can be restricted to shift-by-. This is considered to be a weaker operation than general shifting.) A TM run requiring execution steps can be simulated in this way by RAM steps. In the simulation, advancing the TM by a single step is simulated by an .
Proof.
The simulation will store the state of the TM on three registers: tape, head and state. Register tape will store the current state of the tape. At start-up, this register is initialized by
where “” is the assignment operator (as opposed to “”, which signifies left shift).
Register head will store the position of the reading head. The format for doing so is that if the reading head is at position on the tape, the value of head will be , this being a “” at binary digit position and “” everywhere else. At start-up, the register is initialized by
This places the head at the first position on the tape. (The count of position numbers begins at .)
Register state will signify the instantaneous state of the finite control. We number the states of the TM arbitrarily from to , where is the total number of states. By convention, state will be the initial state of the machine. The format for storing the state number on the state variable is that the variable equals the state number times head. In other words, the number is stored starting at the bit position of the reading head. At start-up,
Let , this constant being the number of bits required to store the state number. The transition function of the Turing machine is a function from the current state and the value of the tape currently under the reading head to a new state, a new value and a new position of the reading head (which is at most one position from its previous position).
Consider the function only for the new value under the reading head. This is a function from bits to bit, and can therefore be described by a finite number of Boolean operations. Applying this Boolean function requires, however, that all operands be available as bits. Consider, now, the numbers . In each of these numbers a different bit of the original state description is aligned with the bit position of the reading head. Boolean algebra on these numbers along with the number tape will result in the output value having the correct new bit value for tape in the bit position that is under the reading head (though its other values may not signify anything meaningful). Let us refer to this numerical result as output.
To calculate the new value for tape, we note that the value under the reading head is the only one that requires updating. All other bit positions retain their original values. Hence,
is the correct update.
We remark that the above expression relates to standard Boolean algebra. In our case, the operation “” has been tweaked so as to ensure that the result is a nonnegative integer. (Non-tweaked negation would result in a Boolean string that has an infinite number of leading “” bits, which would not be a valid result.) That being the case, should be taken as a single (not tweaked) Boolean operation. To avoid confusion we denote it “ ” throughout, and continue to use “” to mean tweaked negation.
Computing the value of the ’th bit of the new state number is attained by the same means as computing the new tape bit value. Let denote the result of applying Boolean operations on tape and through to calculate this ’th bit. The update of state is
Lastly, we need to update the variable head. To do so, we calculate three functions from tape and through , namely IsRightMotion, IsStay and IsLeftMotion, calculating whether the head is to move to the right, stay in place, or move to the left, respectively. The update required is
This simulation, as described above, is already correct. However, we will change it slightly in order to make it more elegant. Specifically, we wish to avoid the scenario of the reading head dropping off the edge of the tape, which may happen if and the next motion is a motion to the right. The result will be that the new head value is . (Footnote 3: The TM being simulated may legitimately move the reading head beyond the edge of the tape as a method of rejecting the input.)
To avoid this, we introduce a new element into the simulation. This is a constant. Its name is boundary and its value is . Furthermore, we add one new state into the state machine, this being a rejecting halting state, signifying that the tape head has dropped off the tape. (Adding a new state may cause an increase in , requiring a change in the update functions.) The boundary condition is a new bit in the input of the update functions. Its value is , and it triggers a transition to the new rejecting state, and ultimately no head motion. ∎
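As an illustration of the idea behind this proof, the following minimal sketch advances a toy machine by one step using only Boolean operations and shifts by one. The toy transition function (invert the cell under the head, then move on), the single state bit, the choice that a head move is a left shift, and the names `Machine` and `step` are all assumptions made for this example; they are not the construction of the lemma itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register layout, loosely following the proof of Lemma 5.
struct Machine {
    std::uint64_t tape;      // bit i holds the symbol of tape cell i
    std::uint64_t head;      // one-hot: a single 1 at the head position
    std::uint64_t state;     // state number times head (1 = running, 0 = halted)
    std::uint64_t boundary;  // one-hot marker at the first bit past the tape
};

// One step of the toy machine, using only AND, OR, NOT and shifts by one.
void step(Machine& m) {
    assert(m.head != 0 && (m.head & (m.head - 1)) == 0);  // head must be one-hot

    // The state bit, aligned with the head position (zero if halted).
    const std::uint64_t running = m.state & m.head;
    // New symbol for the cell under the head; zero everywhere else.
    const std::uint64_t output = ~m.tape & running;
    // Only the bit under the head is updated; all others keep their values.
    m.tape = (m.tape & ~running) | output;

    // Move the head by one cell unless that would hit the boundary bit.
    const std::uint64_t moved = (running << 1) & ~m.boundary;
    // If no move took place, the head keeps its old position.
    const std::uint64_t stay = m.head & ~(moved >> 1);
    m.head = moved | stay;
    // State 1 travels with the head; a blocked or halted machine is in state 0.
    m.state = moved;
}
```

Starting from `tape = 0x5`, `head = 1`, `state = 1` and `boundary = 1 << 4`, repeated calls to `step` invert the four cells one at a time and then leave the machine halted with the head on the last cell.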
Lemma 6.
A TM working on a bounded tape of length can be simulated by a using only bounded shifts if the RAM is given as an additional input. The simulation is required to be uniform in this input. A TM run requiring execution steps can be simulated in this way by RAM steps. In the simulation, advancing the TM by a single step is simulated by an .
Note that the parameter is not given as an input to the RAM.
Proof.
We want to make the simulation of Lemma 5 into a tape-bounded one. In the new simulation, the value of boundary will be .
To the boundary condition discussed in Lemma 5 we now add a second, triggered by , which will cause the state machine to transition to another new halting state, namely one signifying that the simulation was stopped due to the reading head exceeding its tape allocation. ∎
An explanation is due as to why the second bit of the boundary is at position , and not at position . The reason is that boundary does not signify the end of the tape, but rather the end of all bits used in the simulation process. When the reading head is at its left-most position (position ), the state of the TM is written in parameter state between its ’th and ’th position. Therefore, the bit is the first not to be part of the simulation. Some of the bits before the bit, such as the last bits of tape, are not used in the simulation process.
This property is put to use in Lemma 7.
Lemma 7.
For any nonnegative , and any naturals , copies of the same TM, working on inputs on tapes of sizes and infinity, respectively, can be simulated by a using only bounded shifts, given that the RAM is given its input in the following format:
and that the output is described by
where are the halting states of the TMs at program termination. The simulation requires RAM steps, where is the number of steps required by the longest running of the executions. In the simulation, advancing all the TMs together by a single step is simulated by an .
Proof.
We simulate all TMs in parallel. In doing so, it is convenient to simulate an “advance one step” also on the TMs that have already halted. Let us therefore reconsider the term “halting state”, in order for us to allow continued execution of the simulation even after a halt. To allow this, transition functions need to be defined also for halting states. We define these as follows: if is a halting state, transitions from are all to , the tape contents are never changed, and the reading head will always move to the right, unless it is already at the end of the tape, in which case it will stop. Clearly, all this can be encapsulated by the Boolean functions already discussed.
We note that if a TM runs for steps, its reading head is no more than positions from the right end of the tape. Hence, in steps, the TM’s state, head-position and tape contents will have reached their final values.
The reason for the strange behavior we require for the halting states is that it allows us to easily query the TM simulator, at the end of the simulation, regarding its halting condition. Let us assume, without loss of generality, that the only halting states are
- signifying an accepting calculation,
- signifying a rejecting calculation, and
- signifying that the calculation was aborted due to exceeded tape length.
Because the ultimate position of the reading head is known to be , querying the simulator for the final state is simply a test for equality.
To initialize the simulator we set
boundary | |||
head | |||
state | |||
tape |
Because the positions of the “”s on boundary signify a separation of the bit positions into segments that cannot interact, simply using the described in Lemma 5 and then re-used in Lemma 6 results in all TMs advancing one step in parallel.
Lastly, we need to test whether all TMs have halted. This can be split into the following conditions. First, we verify that
and
This guarantees that all TMs are either in one of the three halting states or in the initial state, . To verify that none are in zero, we simply check that the set of TMs that have halted in one of the halting conditions covers all TMs. This is done by
When all of the above conditions are satisfied, execution of the RAM terminates, and the output can be read off state. ∎
We remark regarding Lemma 7 that even though parallelization over machines is possible, it is not necessary. By zeroing-out all bits of head in any given segment, the TM related to that segment ceases to advance. Indeed, that would have been a different method to approach the question of how to simulate the execution of machines that have already halted.
We now strengthen Lemmas 5, 6 and 7 further, by omitting from each the right-shift operator, which can be done by means of the following lemma.
Lemma 8.
For , if a does not use indirect addressing and is restricted to bounded shifts, it can be simulated by a without loss in time complexity. This result remains true also if the RAM can apply “” when is the (unbounded) contents of a register, provided that the calculation of does not involve use of the “” operator.
Proof.
We begin by considering the case of bounded shifts.
A RAM that does not use indirect addressing is inherently able to access only a finite set of registers. Without loss of generality, let us assume that these are . The simulating RAM will have satisfying the invariant
To do this, we initialize to be , and proceed with the simulation by translating any action by the simulated RAM on , for any , to the same action on . (Footnote 4: An action involving an explicit “” (except for shifting by 1) will have the “” replaced by in the simulation.) We do this for all actions except , which is an operation that is unavailable to the simulating RAM.
To simulate “”, we perform the following.
1. .
2. .
3. .
We note regarding the second step that this operation is performed also on . The fact that is bounded ensures that this step is performed in time.
Essentially, if “” is thought of as “”, Step 1 performs the assignment, Step 2 the division, and Step 3 the truncation.
In order to support “” also when is the product of a calculation, the simulating RAM also performs, in parallel to all of the above, a direct simulation that keeps track of the register’s native values. In this alternate simulation, right shifts are merely ignored. Any calculation performed by the simulated RAM that does not involve right shifts will, however, be calculated correctly, so the value of will always be correct. ∎
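The following sketch shows one possible reading of the three steps (assignment, division, truncation) for a bounded shift amount. It assumes that every simulated register is stored multiplied by a common power-of-two scale, so that a right shift of one register can be replaced by a left shift of all the others. The names `ScaledRegs`, `shift_right` and `value`, the 64-bit word size, the subtraction used to build the mask, and the division in `value` (used only to inspect results) are conveniences of this illustration and are not part of the RAM's operation set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simulating machine: r[k] holds (simulated register k) * scale.
struct ScaledRegs {
    std::vector<std::uint64_t> r;
    std::uint64_t scale = 1;  // always a power of two
};

// Simulate "dst := src >> x" without ever shifting to the right.
// Assumes the scaled values still fit in 64 bits.
void shift_right(ScaledRegs& s, std::size_t dst, std::size_t src, unsigned x) {
    assert(dst < s.r.size() && src < s.r.size() && x < 64);
    s.r[dst] = s.r[src];                         // Step 1: the assignment
    for (std::size_t k = 0; k < s.r.size(); ++k)
        if (k != dst) s.r[k] <<= x;              // Step 2: the division, done by
    s.scale <<= x;                               //         rescaling everything else
    s.r[dst] &= ~(s.scale - 1);                  // Step 3: the truncation
}

// Decode a register's simulated value (for checking only).
std::uint64_t value(const ScaledRegs& s, std::size_t k) {
    return s.r[k] / s.scale;
}
```

Step 2 touches every register, which is affordable only because the simulated RAM accesses a finite set of registers.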
Before proceeding further, we introduce three definitions.
Definition 4 (Instantaneous Description).
Let be a TM working on a bounded tape of size and having a state space that can be described (including up to additional halting states, as in Lemma 7) by bits.
An instantaneous description of at any point in its execution is the value of at that point in its execution, where tape, state and head are as in Lemma 6.
Definition 5 (Vectors).
A triplet of integers will be called an encoded vector. We refer to as the of the vector, as the contents of the vector and as the of the vector. If with , then will be called the vector (or, the decoded vector), and the will be termed the vector elements. Notably, vector elements belong to a finite set of size and are not general integers. It is well-defined to consider the most-significant bits (MSBs) of vector elements. Nevertheless, any integers can be encoded as a vector, by choosing a large enough .
Actions described as operating on the vector are mathematical operations on the encoded vector (typically, on the vector contents, ). However, many times we will be more interested in analyzing these mathematical operations in terms of the effects they have on the vector elements. Where this is not ambiguous, we will name vectors by their contents. For example, we can talk about the “decoded ” to denote the decoded vector corresponding to some encoded vector whose contents are .
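To make the distinction between an encoded vector and its decoded elements concrete, here is a minimal sketch that packs fixed-width elements into a single integer and unpacks them again. The struct `EncodedVector`, the functions `encode` and `decode`, the field order, the 64-bit limit, and the convention that element 0 occupies the least significant bits are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoded vector: `length` elements of `width` bits each,
// packed into the single integer `contents`.
struct EncodedVector {
    unsigned width;
    std::uint64_t contents;
    std::size_t length;
};

// Pack the elements, element 0 in the least significant bits (an assumption).
EncodedVector encode(const std::vector<std::uint64_t>& elems, unsigned width) {
    assert(width > 0 && width * elems.size() <= 64);
    EncodedVector v{width, 0, elems.size()};
    for (std::size_t i = 0; i < elems.size(); ++i) {
        // Each element must fit in `width` bits.
        assert(width == 64 || elems[i] < (std::uint64_t{1} << width));
        v.contents |= elems[i] << (width * i);
    }
    return v;
}

// Recover the decoded vector from the encoded one.
std::vector<std::uint64_t> decode(const EncodedVector& v) {
    const std::uint64_t mask =
        v.width == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << v.width) - 1;
    std::vector<std::uint64_t> elems(v.length);
    for (std::size_t i = 0; i < v.length; ++i)
        elems[i] = (v.contents >> (v.width * i)) & mask;
    return elems;
}
```

With width 4, the elements 3, 0, 5 are stored as the contents `0x503`; a single bitwise operation on the contents then acts on all three elements at once.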
Definition 6 (Tableau).
Let , and be as in Definition 4. A tableau is a vector of width , whose ’th element is the instantaneous description of after execution steps.
Furthermore, we define
For , this is the vector .
Lemma 9.
There exists a constant, , such that for every TM, , there exists a , , that takes inputs, such that for every number inp, accepts inp if and only if there exist witness numbers , for which accepts on
and executes in time on every set of inputs.
Proof.
The TM accepts if and only if there is an accepting computation path beginning at the initial instantaneous description at the start of execution with inp as its input. (The terminology “accepting computation path” is more commonly applied to nondeterministic computation. In deterministic computation, there is exactly one path. The question is only whether it is accepting or not.) If there exists an accepting computation path, it can be described by a tableau. (If it does not exist, no corresponding tableau exists.) Let be this tableau. We will choose witness integers so that the RAM is able to verify the existence and correctness of the tableau.
We use . The auxiliary inputs to the RAM will be the following five witness integers.
The triplet defines the tableau, by a reversible transformation. The first step of the verification process is not to test the validity of this tableau, but rather to ascertain that and correspond properly to this tableau definition.
Let , and note that it can be computed by . Ascertaining the validity of is done by verifying that
This follows directly from the definition of .
We do not actually have subtraction as part of our operation set, but can be simulated in this case by after having verified that .
Validating is done following exactly the same principles, only replacing for in the computation.
At first glance, it may seem that also needs to be verified in order to ascertain that it is a multiple of , but in fact this has already been covered by the previous checks: if is a multiple of then is a multiple of . That being the case, the fact that we have already established
directly implies that is a multiple of , and therefore also is a multiple of .
With these verified, the vector contents of the tableau, , given in , can be decomposed into its constituents.
tape | |||
state | |||
head |
Furthermore, a corresponding boundary parameter can be devised by
On the assumption that the tableau is legitimate, if we were to use these new values as inputs for the RAM described in Lemma 7 (which can be used here, because the right-shift operations can be circumvented, as per Lemma 8), then what these inputs describe is copies of the same TM, each performing execution on the same input, but at different stages in the execution. The first TM is at execution start, the next one is after one execution step, etc. (Note: the tableau is shifted left by , but, as stated before, this merely introduces an “unused” machine into the proof of Lemma 7.)
We can now use the RAM described in Lemma 7 to simulate, in total time, one execution step of each of these TMs. Let us denote the output variables after a single-step operation . The legitimacy of the tableau can now be verified. A true tableau will satisfy two conditions:
1. The state of the machine described in element of the original tableau is the correct initial state of the TM to be simulated.
2. For each , the instantaneous description of the machine described in element of the tableau, after advancing by a single step, equals the original instantaneous description in element of the tableau.
Verifying these conditions for tape, for example, is done by checking that
The idea is that should equal tape in all but the first and last elements. These are handled separately. The same method works for verifying the validity of state and head.
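The two conditions can be sketched as a plain check over a decoded tableau. Note that this sketch deliberately swaps in a sequential, element-by-element loop, whereas the proof performs all comparisons at once on the packed encoding. The type `Config`, the function `is_valid_tableau` and the use of a callback for the single-step function are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Config = unsigned;  // stand-in for an instantaneous description

// Checks that `tableau` starts at `initial` and that each element, advanced
// by one step, equals the element that follows it.
bool is_valid_tableau(const std::vector<Config>& tableau, Config initial,
                      const std::function<Config(Config)>& step) {
    // Condition 1: the first element is the correct initial description.
    if (tableau.empty() || tableau.front() != initial) return false;
    // Condition 2: every element leads to its successor in a single step.
    for (std::size_t i = 0; i + 1 < tableau.size(); ++i)
        if (step(tableau[i]) != tableau[i + 1]) return false;
    return true;
}
```

For a toy step function that halves its argument, the sequence 8, 4, 2, 1, 0 passes the check, while 8, 4, 3, 1 fails at the third element.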
Once the tableau is known to be correct, we finish by ascertaining that its final state is an accepting state:
and
∎
Though not required for the main proofs that these lemmas have all been building up towards, it is still illuminating to see that Lemma 9 implies a much stronger claim regarding nondeterministic computation. This is discussed in Appendix E.
We now turn to the main proof.
Proof of Theorem 3.
The statement of the theorem involves equality between four Turing machine related complexities and one RAM complexity. The equivalence between the deterministic time and deterministic space TM complexities stems from the well-known argument that
In execution steps, the reading head cannot reach more than elements into the tape, proving , whereas an execution bounded by tape elements cannot proceed more than execution steps without retracing its steps, proving . In expansion limit terms, is greater than but is bounded from above by . This argument shows equivalence between time-bounded TMs and space-bounded TMs, which is a stricter condition than the one actually needed for the theorem.
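The counting step behind the second inclusion can be spelled out as follows. The symbols used here ($Q$ for the state set, $\Gamma$ for the tape alphabet and $s$ for the tape bound) are notation introduced for this illustration only.

```latex
\[
  \#\{\text{instantaneous descriptions}\}
  \;\le\; \underbrace{|Q|}_{\text{state}}
  \cdot \underbrace{s}_{\text{head position}}
  \cdot \underbrace{|\Gamma|^{s}}_{\text{tape contents}}
  \;=\; 2^{O(s)} .
\]
```

A halting deterministic run never repeats an instantaneous description, so its length is bounded by this count.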
The addition of nondeterminism to a space-bounded TM requires only a limited amount of extra space to simulate on a deterministic TM, as is proved by Savitch’s Theorem [10], so here, too, an is all that is needed. Simulating an time nondeterministic TM by a deterministic TM requires at most nondeterministic bits, and can hence be simulated by runs of the deterministic algorithm, enumerating over the witness string. Exponentiation is, again, within .
We will, therefore, only need to prove the following equality.
This, in turn, can be thought of as two statements. We begin by demonstrating the simpler
(1) |
then continue to our main argument,
(2) |
To prove Equation (1) we show that an space-bounded TM can simulate an -RAM. We allow this simulation to use a larger set of tape symbols than the original , knowing that this costs at most a linear increase in the amount of space required [15], which cannot violate the bound. Furthermore, as per [3], we continue to consider only RAMs that make no use of indirect addressing (and are therefore known to only access a finite number of registers).
Consider the following memory layout which may be used by the TM. Each of the (finite) number of registers of the RAM is allocated a new tape-alphabet character. We refer to these as “control” characters. During the TM’s execution, each control character will appear on the tape exactly once. All characters to the right of it and to the left of the next control character are deemed to be the contents of the relevant register. Additionally, a further control character, appearing last on the tape, marks the start of a “scratchpad”. The TM is initialized by moving the input one element to the right in order to make room for the control character, then writing the control character before the input and the rest of the control characters after (signifying registers whose contents are zero).
At each execution step, the operands are copied to the scratchpad, the operation required by the RAM is performed, the tape elements are moved enough to allocate enough space on the target register, and then the scratchpad result is copied to the target register and the scratchpad is cleared.
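The memory layout can be pictured with a short sketch in which the tape is a string. The control characters `'A'`, `'B'`, ... for the registers, the `'#'` scratchpad marker, the binary digit alphabet, and the helper names `make_tape`, `read_register` and `write_register` are assumptions made for this illustration; the scratchpad arithmetic itself is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build the initial tape: a control character before the input, then one
// control character per remaining (zero-valued) register, then the scratchpad.
std::string make_tape(const std::string& input, std::size_t registers) {
    assert(registers >= 1 && registers <= 26);
    std::string tape = "A" + input;
    for (std::size_t k = 1; k < registers; ++k)
        tape += static_cast<char>('A' + k);
    tape += '#';  // start of the scratchpad
    return tape;
}

// Contents of register k: everything between its control character and the
// next control character.
std::string read_register(const std::string& tape, std::size_t k) {
    const std::size_t pos = tape.find(static_cast<char>('A' + k));
    assert(pos != std::string::npos);
    const std::size_t end = tape.find_first_not_of("01", pos + 1);
    return tape.substr(pos + 1, end - (pos + 1));
}

// Overwrite register k; the rest of the tape is moved to make room.
void write_register(std::string& tape, std::size_t k, const std::string& value) {
    const std::size_t pos = tape.find(static_cast<char>('A' + k));
    assert(pos != std::string::npos);
    const std::size_t end = tape.find_first_not_of("01", pos + 1);
    tape.replace(pos + 1, end - (pos + 1), value);
}
```

For input `101` and three registers the initial tape is `A101BC#`; writing `11` to the second register yields `A101B11C#`.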
By the definition of EL, each register’s contents require at most elements to store. To store all registers we require . By definition of RAM-constructability, the scratchpad requires no more than elements of memory. (Footnote 5: This follows from , where the maximum is taken over all that are computable from inp in calculation steps.) Together, the amount of tape required is bounded by , as required by the Theorem.
We now turn to the main problem of proving Equation (2). We do this by working from Lemma 9, noting that if there exists a tableau computation , then there also exists one with , where can be any number at least as large as , and is determined from so as to be at least as large as the number of possible instantaneous descriptions for the TM. (If the TM requires more steps, it will necessarily have retraced its steps and have entered an infinite loop.)
Given a chosen value for , the computed value of allows us to determine the number of bits in . It is . is, therefore, one of finitely many possibilities. If we were to enumerate over all of these possibilities, we would be able to answer whether a tableau for an accepting computation exists for a particular value of .
Consider now several tableau candidates, , each of which is a variable in the range . We wish to check simultaneously which of these (if any) is a correct tableau of bit-length , describing the execution of a TM running on a tape of length on input inp. To do this, we describe a new variable, .
Consider now Lemma 7. It allows the verification of a tableau by use of only bitwise operations and shifts by values that are functions only of . Because all candidates share the same , each of these operations is executed on all candidates simultaneously, by executing them on . There are only two places where this strategy fails.
First, the verification requires the use of several constants, such as and inp. In order to apply these simultaneously to all simulations, we need to replace the original constants by and , respectively. Second, the final step in verifying whether a tableau candidate is correct involves several comparisons of the type “”. These must now be executed in parallel, for which we need an operation EQ that takes two vectors and outputs a new vector of the same length and width as its two operands, whose elements are in positions where and otherwise. Appendix F describes how both of these can be calculated in time given the available set of operations, assuming .
A correct tableau is one that passes all equality checks outlined in the proof of Lemma 7. To simulate this in parallel for all candidates, we perform the necessary EQ per equality test, then take the bitwise AND over all results. The resulting vector, res, has a in its ’th position if and only if the candidate is correct.
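The pattern of running one equality test per check, ANDing the results, and then testing the combined vector against zero can be illustrated on a toy example. The sketch below is not the construction of Appendix F. It uses a common word-parallel zero test, ordinary Python multiplication for broadcasting, and made-up "checks" on 3-bit candidates. The names `ones`, `broadcast` and `eq` are assumptions.

```python
def ones(width, length):
    """A 1 in the lowest bit of each field, obtained by exact division."""
    return ((1 << (width * length)) - 1) // ((1 << width) - 1)

def broadcast(value, width, length):
    """Copy `value` into every field (Python multiplication, for brevity)."""
    return value * ones(width, length)

def eq(a, b, width, length):
    """Per-field equality: 1 in each field where a and b agree, else 0."""
    high = broadcast(1 << (width - 1), width, length)       # field MSBs
    low = broadcast((1 << (width - 1)) - 1, width, length)  # all other bits
    diff = a ^ b
    # The MSB of each field of `nonzero` is set iff that field of diff != 0;
    # adding `low` cannot carry out of a field.
    nonzero = ((diff & low) + low) | diff
    return (~nonzero & high) >> (width - 1)

width, length = 3, 8
# All 3-bit candidates 0..7 packed side by side in one integer.
candidates = sum(i << (i * width) for i in range(length))

# Two toy checks applied to every candidate at once:
# "low two bits equal 01" and "top bit equals 1".
test1 = eq(candidates & broadcast(3, width, length),
           broadcast(1, width, length), width, length)
test2 = eq(candidates & broadcast(4, width, length),
           broadcast(4, width, length), width, length)

res = test1 & test2       # 1 exactly in the fields passing every check
any_correct = res != 0    # "is any candidate correct?"
```

Only candidate 5 (binary 101) passes both checks, so `res` has a single 1, in field 5.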
For the purposes of the present proof, we are not interested in finding which of the tableau candidates is correct, but rather whether any of them are correct. This can be checked by a simple “”.
(Readers may wish to verify that the proof of Lemma 7 does not depend on the tableau candidate verified starting at bit-position , nor is the verification process disrupted by having any additional non-zero bits to the left or right of the tableau bits. The verification process does not use or disturb any bits in the integer, other than the being checked.)
The above argument, showing that the condition “the tableau is correct” can be verified in constant time, applies equally well with additional checks such as “the tableau is correct and the halting state is ”. Thus, if we check all tableau candidates, the program works not only as a verifier but also as a complete simulator: it takes an input and returns (in constant time) the final state of the TM finite control. In our case, the interesting final states are , and . The simulator should return which of these three the TM halted on, or report that none of them occurred.
The only remaining question is how to create all possible tableaus in a single integer, for them to be verified simultaneously. One way to do this is as follows. The function described produces the contents of a vector of width and length , whose elements are all unique.
It is computed solely by use of left-shifting, exact division and Boolean operations.
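To give a feel for how a vector of pairwise distinct elements can arise from a closed-form expression, the sketch below packs the values 0, 1, ..., n-1 using the textbook identity for the sum of i·x^i with x = 2^width. This is a substitute illustration and not the function referred to above: unlike that function, it also uses addition and subtraction, and it assumes the length n is a power of two, given by its logarithm `log_len`. All names are hypothetical.

```python
def distinct_vector(width, log_len):
    """Pack 0, 1, ..., n-1 (n = 2**log_len) into fields of `width` bits.

    Uses sum_{i<n} i*x^i = x*((n-1)*x^n - n*x^(n-1) + 1) / (x-1)^2, x = 2^width.
    Multiplications by powers of two are written as left shifts.
    """
    n = 1 << log_len
    assert n <= (1 << width), "values must fit in their fields"
    total = width << log_len                    # width * n
    term_a = (n - 1) << total                   # (n-1) * x^n
    term_b = 1 << (log_len + total - width)     # n * x^(n-1)
    numerator = (term_a - term_b + 1) << width  # times x
    x_minus_1 = (1 << width) - 1
    # Two exact divisions by (x - 1); both remainders must be zero.
    q1, r1 = divmod(numerator, x_minus_1)
    q2, r2 = divmod(q1, x_minus_1)
    assert r1 == 0 and r2 == 0
    return q2

def unpack(vec, width, length):
    mask = (1 << width) - 1
    return [(vec >> (i * width)) & mask for i in range(length)]

all_three_bit_values = distinct_vector(3, 3)
```

With `width = 3` and `log_len = 3`, the result holds every 3-bit pattern exactly once, one per field.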
We see, therefore, that all tableaus can be created and verified for correctness in constant time. Let us refer to the RAM program that verifies simultaneously and in constant time all possible tableaus pertaining to the execution of a TM on a tape of size as .
By the assumption of constructability, there also exists a RAM program, that calculates a value no smaller than in time. In order to complete the proof of the theorem, we therefore present Algorithm 1 that enables a RAM working in time to simulate any TM working in space.
We remark that Algorithm 1 accepts if the original TM accepts, and continues indefinitely if the original TM does not, but this is merely because we check only for a halting state of . It is also possible to terminate the run and reject if the halting state is (signifying that the TM halted and rejected) or if the halting state is not in - (signifying that the TM returned to a previous instantaneous description, and has therefore entered an infinite loop). However, if the TM continues indefinitely while consuming unbounded amounts of tape, the simulation cannot detect this, and will continue forever. This is, of course, inherent, because had the simulator been able to determine in finite time for every TM whether it halts or not, this would have contradicted Turing’s halting theorem [17]. ∎
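The control flow discussed in this remark can be sketched as below. This is not Algorithm 1 itself: the hypothetical `simulate` here is an ordinary step-by-step TM simulation standing in for the constant-time tableau verifier, and the doubling schedule for the tape length replaces the constructible bound. It uses the variant described above that also stops on a rejecting state or on a repeated instantaneous description.

```python
def simulate(delta, inp, tape_len, start, blank="_"):
    """Run a deterministic TM on a tape of `tape_len` cells.

    Returns "accept", "reject", "loop" (an instantaneous description
    repeated) or "overflow" (the tape was too short).
    """
    if len(inp) > tape_len:
        return "overflow"
    tape = list(inp) + [blank] * (tape_len - len(inp))
    state, head, seen = start, 0, set()
    while True:
        if state in ("accept", "reject"):
            return state
        config = (state, head, tuple(tape))
        if config in seen:
            return "loop"
        seen.add(config)
        state, tape[head], move = delta[(state, tape[head])]
        head = max(0, head + move)
        if head >= tape_len:
            return "overflow"

def recognise(delta, inp, start):
    """Retry with ever longer tapes until the outcome is decided.

    Runs forever if the TM consumes unbounded tape, as noted in the text.
    """
    tape_len = max(1, len(inp))
    while True:
        result = simulate(delta, inp, tape_len, start)
        if result != "overflow":
            return result
        tape_len *= 2

# Example TM (an assumption for illustration): accept iff the number of 1s is even.
parity = {
    ("even", "0"): ("even", "0", 1), ("even", "1"): ("odd", "1", 1),
    ("odd", "0"): ("odd", "0", 1),   ("odd", "1"): ("even", "1", 1),
    ("even", "_"): ("accept", "_", 0), ("odd", "_"): ("reject", "_", 0),
}
# A TM that never moves and never halts; detected as a loop on any bounded tape.
spin = {("s", "_"): ("s", "_", 0)}
```

On input `"1001"` the first attempt overflows (the machine needs one extra cell to read the trailing blank) and the second attempt, with a doubled tape, accepts.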
Appendix D Adding arbitrary numbers
In the main paper we present the general reasoning that leads to Theorem 5. The formal argument is as follows.
Proof of Theorem 5.
If is an r.e. set, then there is a TM that recognizes it, indicating that if and only if there is an accepting computation of the TM with inp as the TM’s input. This accepting computation by definition requires only a finite number of execution steps of the TM, and therefore only a finite number, , of tape cells. Consider Algorithm 2, using the RAM-implementable function simulate discussed in Appendix C.
By the arguments of the proof for Theorem 3, a sufficient condition for simulate to return the proper result is . This is clearly satisfied by making “large enough”, which is what is guaranteed by “”. Formally, almost all choices for return the correct result, as the number of values that return an incorrect result is finite and bounded by . ∎
Appendix E Results about nondeterministic computation
In Turing machines, nondeterministic computation is often described as computation that makes use of an extra tape, with Oracle-provided information. In Turing machine computation, a machine that runs for execution steps can necessarily access no more than bits of this extra tape, so the Oracle-provided information (sometimes referred to as a “witness” or a “certificate”) for a polynomial-time TM algorithm is effectively limited to only a polynomial number of bits.
It has been shown (e.g. in [5, 8]) that the availability of such a certificate does not increase computational power in sufficiently-equipped RAMs. However, in [13] an alternative is discussed. RAMs have the ability to access entire integers in -time. Simon suggests that for RAMs one can analyze a variant of nondeterministic computation where the information provided by the Oracle is in the form of an integer of arbitrary length. We refer to a RAM with access to such an integer as an NRAM.
Theorem 6.
All recursively enumerable (r.e.) sets can be recognized in by an , where op is a superset of any of the following: , , and , where inc is the increment function. (Footnote 6: This function is generally available to all RAMs and is often not listed in the operation set (cf. [11]).)
We prove the theorem separately for each of the instruction sets it lists.
Proof for .
For this part of the proof, the missing piece from Lemma 9 is the ability to extract all required integers, , from a single, Oracle-provided integer, , in time.
A critical observation is that there are some degrees of freedom in the choice of witnesses. For example, merely needs to be large enough. It signifies the length of the tape available to the TM, so choosing a larger value does not invalidate the simulation. As such, we can restrict to be a power of two at no cost.
Next, we note that is not actually needed from the Oracle. The number provides the length, , of the tableau, but choosing an overly long tableau is not a problem. All that needs to be ascertained is that the chosen length is not too small.
Consider how many simulation steps are needed to decide how a TM’s execution terminates. A TM on a bounded-length tape is essentially a finite state machine (FSM). It only has a constant number of instantaneous descriptions that it can be in. In our case, each instantaneous description is stored by bits. Hence, the size of the state space is trivially bounded by . There is no reason to simulate a deterministic FSM to more steps than the size of its state space, because if it has not terminated after steps this indicates that it has revisited a state more than once, and is therefore in an infinite loop.
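The pigeonhole argument above translates directly into a step budget. The following sketch is illustrative only. The function name, the representation of the machine as a Python step function, and the example machine are assumptions.

```python
def run_bounded(step, start, halting, num_states):
    """Run a deterministic finite-state system for at most `num_states` steps.

    Returns the halting state reached, or None if the budget is exhausted,
    in which case some state was visited twice and the run never halts.
    """
    state = start
    for _ in range(num_states):
        if state in halting:
            return state
        state = step(state)
    return state if state in halting else None

# Example: a counter modulo 8 (8 states in total).
halts = run_bounded(lambda s: (s + 1) % 8, 0, {5}, 8)   # reaches state 5
loops = run_bounded(lambda s: (s + 2) % 8, 0, {5}, 8)   # only ever visits even states
```

The second run never reaches the halting state, and the budget of 8 steps is enough to conclude that it never will.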
Assuming that is a power of two, multiplication by can be performed by
Using is then equivalent to choosing for , which is more than enough.
This leaves us with the necessity to extract , , and from .
We do so by first considering . If ’s lowest bits are ones, but the bit is , this operation yields the number . We will take this to be , noting that by construction it is a power of two, as desired.
The next observation to make is that , and all have at most bits. As such, consider as the contents of a vector of width and length . The vector element in the ’th position, of which bits were already specified, we ignore. The remaining elements we take to be , and , respectively.
Let . This value can be calculated by
Given a right-shift operator, the three witness values would have been extractable by
and similarly for and , from which point the rest of the construction could follow that of Lemma 9.
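The unpacking idea can be sketched in Python, under loudly stated assumptions. The expression `~x & (x + 1)` is one standard way to isolate the lowest zero bit of an integer and is used here as a stand-in, since the exact operation used in the proof is not reproduced in this text. The right-shift-and-mask extraction, the helper names and the example values are likewise illustrative.

```python
def lowest_zero_bit(x):
    """If the lowest k bits of x are ones and bit k is zero, return 2**k."""
    return ~x & (x + 1)

def unpack_witnesses(x, count):
    """Read the field width from field 0, then extract fields 1..count."""
    width = lowest_zero_bit(x)          # a power of two by construction
    mask = (1 << width) - 1
    # i * width is a product with a power of two, so a RAM could form it
    # with a left shift; Python multiplication is used for brevity.
    values = [(x >> (i * width)) & mask for i in range(1, count + 1)]
    return width, values

# Example: field 0 is 0b0111 (three ones, then a zero), so the width is 8.
x = 0b0111 | (200 << 8) | (17 << 16) | (5 << 24)
width, witnesses = unpack_witnesses(x, 3)
```

Here the lowest three bits of `x` encode a width of 8, and the next three fields carry the packed values.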
Because right-shifting is, in fact, not assumed to be part of op, we note, instead, that all shifts in both the construction above and that of Lemma 9 are by amounts that require no right-shifting to calculate. (Aside from shifts by a constant, they are by , and , which we show here how to calculate explicitly with no use of right-shifting.) That being the case, we can apply Lemma 8 to show that the right-shift operator is not necessary. ∎
Proof for .
When right-shifting is the only shift available, we employ a slightly different tactic. Given a right-shift operator, unpacking in time integers, from a given integer, , is fairly straightforward. We do this by storing in the contents of the vector , where is the width of the vector and is a power of two and greater than any of the . (Regardless of what values we wish to store in the , it is always possible to choose an appropriate width, . We note that unlike in the first part of the proof, here does not serve any role other than being the width of this vector. It does not participate in the actual simulation of the TM.)
Finding is done by
as in the first part of the proof. Extracting the elements is done by
In order to proceed from extracted integers to a proof regarding a TM, we use the same algorithm as that of Lemma 9 (not using any of the short-cuts introduced for ). However, we switch every left shift for a right shift by the following technique.
First, note that Lemma 9 guarantees a procedure that terminates in time. As such, it can only use a bounded number of left-shift operations.
We can therefore transform each operation in the original proof to a comparison step: . If all of , , and are given as part of , this can be performed simply by the following verification steps.
First, we verify that is a power of two by . Next, we verify that it is the correct power of two by . Lastly, we verify the actual operation by coupled with . ∎
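The idea of replacing a left shift by a verification of an Oracle-supplied result can be sketched as follows. The concrete comparisons below are one plausible instantiation and an assumption, since the proof's own expressions are not reproduced in this text. Here `z` is the claimed value of `x << y` and `p` the claimed value of `1 << y`, and the checks use only right shifts, Boolean operations, decrement and comparison.

```python
def is_power_of_two(p):
    return p > 0 and (p & (p - 1)) == 0

def verify_left_shift(x, y, z, p):
    """Check the claims p == 2**y and z == x << y without left-shifting."""
    return (
        is_power_of_two(p)        # p is some power of two
        and (p >> y) == 1         # ... and specifically 2**y
        and (z >> y) == x         # the high part of z is x
        and (z & (p - 1)) == 0    # the low y bits of z are zero
    )
```

For example, `verify_left_shift(5, 3, 40, 8)` holds, while changing `z` to 41 or `p` to 16 makes a check fail.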
Proof for .
As before, we work from Lemma 9 and only need to complete two details.
1. How does one unpack general integers from a single ?
2. How does one simulate left-shifting using op?
Extracting multiple integers from is done in this case by storing in the contents of the vector , where is the width of the vector. Extracting is done as before, and the rest of the elements can be retrieved by repeatedly dividing by , and then ultimately taking the bitwise intersection with .
Next, we wish to simulate , which, as before, is done by verifying . Specifically, note that is always either a constant or one of and , and that this is the only context in which either or are ever used. Instead of storing these values directly, we therefore store and , instead.
Ascertaining that and are valid can be performed by . Following the transformation from to , the left shifts become multiplications: . We verify them by and , where the first comparison ascertains that the second comparison involves an exact division (utilizing the fact that is a power of two). ∎
Proof for .
We use the same as in the proof for , noting that can be verified now simply by . This leaves us with the need to extract multiple integers, from a single , which is done by
Notably, the themselves are not extracted, but rather a shifted version of them. This partial extraction suffices because in all operations, will be offset by a known power of compared to the offset with which we calculate , and simply multiplying one side by the appropriate (and bounded) number of times will serve to prove equality. ∎
Appendix F Some useful basic operations
We first note that “” can always be implemented as an operation when “” and Boolean functions are available. This can be done as follows.
where . As such, “” should always appear as an optional operation wherever addition and Boolean functions are part of a RAM’s basic operation set.
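One well-known way to obtain subtraction from addition and bitwise complement is sketched below. It is given as an illustration and is not necessarily the identity used above. The all-ones `mask` parameter (assumed to cover every bit of `a`) and the function name are assumptions.

```python
def sub_via_add(a, b, mask):
    """Compute a - b for 0 <= b <= a <= mask using only +, XOR.

    `mask` is an all-ones value 2**k - 1, so XOR with it is a k-bit NOT:
    (a ^ mask) == mask - a, hence ((a ^ mask) + b) ^ mask == a - b.
    """
    assert 0 <= b <= a <= mask
    return ((a ^ mask) + b) ^ mask

difference = sub_via_add(13, 5, 0b1111)
```

The precondition `b <= a` keeps the intermediate sum within the mask, so no carry escapes the k bits.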
Our definition of was given originally as
With the optional “”, this can be implemented directly using the operations available in Theorem 3, except for the multiplication “”. However, if is known to equal , this product can be calculated by means of a left shift: .
We now construct a function, GT, that takes two encoded vectors, and , each of width and length , and returns another vector, of the same length and width, that has a in every position where the decoded is greater than the decoded , and otherwise. This is described as Algorithm 3.
The algorithm essentially calculates the overflow bit in each element during the subtraction operation , but uses special handling for the MSB so as to ensure that the overflow from no element affects the result in any other element.
The operations used are all available to Theorem 3 directly, except for the right shift on the last line, which is accomplished by means of Lemma 8. (Nowhere in the proof of Theorem 3 is division applied to operands that were calculated by use of right shifts. This being the case, Lemma 8 continues to be applicable despite the larger operation set available to Theorem 3.)
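The approach just described (compute a per-element overflow bit from a subtraction, with the MSB handled separately so that no borrow crosses an element boundary) can be sketched as below. This is a reconstruction in the spirit of the description and not a transcription of Algorithm 3. The helper names and the use of Python's native right shift are assumptions.

```python
def pack(values, width):
    out = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << width)
        out |= v << (i * width)
    return out

def unpack(vec, width, length):
    mask = (1 << width) - 1
    return [(vec >> (i * width)) & mask for i in range(length)]

def gt(a, b, width, length):
    """Per-element a > b: 1 in each field where a's element is greater, else 0."""
    high = pack([1 << (width - 1)] * length, width)        # element MSBs
    low = pack([(1 << (width - 1)) - 1] * length, width)   # all other bits
    # Compare the low parts. Forcing the MSB of the minuend to 1 keeps every
    # field of the difference positive, so no borrow leaks between fields.
    # The MSB of each field of `diff` is set iff b_low >= a_low.
    diff = ((b & low) | high) - (a & low)
    low_gt = ~diff & high              # a_low > b_low
    msb_gt = a & ~b & high             # a has the MSB set, b does not
    msb_same = ~(a ^ b) & high         # MSBs agree
    return (msb_gt | (msb_same & low_gt)) >> (width - 1)

a = pack([3, 7, 0, 5], 3)
b = pack([5, 7, 1, 2], 3)
result = unpack(gt(a, b, 3, 4), 3, 4)
```

In the example only the last element of `a` (5) exceeds its counterpart in `b` (2), so only the last field of the result is 1.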
The function GT now allows us to construct a test for equality as follows.