Click here for PDF version (which has better rendering of pictures, tables, etc.)
Click here for all blog posts on this.
The alphabet of Time
The details of the model of computation are not too important if you don’t care about power differences in running times, such as the difference between solving a problem on an input of length in time vs. time . But they matter if you do.
The fundamentals features of computation are two:
- Locality. Computation proceeds in small, local steps. Each step only depends on and affects a small amount of “data.” For example, in the grade-school algorithm for addition, each step only involves a constant number of digits.
- Generality. The computational process is general in that it applies to many different problems. At one extreme, we can think of a single algorithm which applies to an infinite number of inputs. This is called uniform computation. Or we can design algorithms that work on a finite set of inputs. This makes sense if the description of the algorithm is much smaller than the description of the inputs that can be processed by it. This setting is usually referred to as non-uniform computation.
Keep in mind these two principles when reading the next models.
2.1 Tape machines (TMs)
Tape machines are equipped with an infinite tape of cells with symbols from the tape alphabet , and a tape head lying on exactly one cell. The machine is in one of several states, which you can think of as lines in a programming language. In one step the machine writes a symbol where the head points, changes state, and moves the head one cell to the right or left. Alternatively, it can stop. Such action depends only on the state of the machine and the tape symbol under the head.
We are interested in studying the resources required for computing. Several resources are of interest, like time and space. In this chapter we begin with time.
Describing TMs by giving the transition function quickly becomes complicated and uninformative. Instead, we give a high-level description of how the TM works. The important points to address are how the head moves, and how information is moved across the tape.
Example 2.1. On input we wish to compute (i.e., we think of as an integer in binary, and increment by one). This can be accomplished by a TM with states as follows. Move the head to the least significant bit of . If you read a , write a , move the head to the beginning, and stop. If instead you read a , write a , move the head by one symbol, and repeat. If you reach the beginning of the input, shift the input by one symbol, append a , move the head to the beginning and stop.
The TM only does a constant number of passes over the input, so the running time is .
Exercise 2.1. Describe a TM that decides if a string is palindrome, and bound its running time.
Exercise 2.2. Describe a TM that on input computes in binary in time .
TMs can compute any function if they have sufficiently many states:
Exercise 2.3. Prove that every function can be computed by a TM in time using states.
2.1.1 You don’t need much to have it all
How powerful are tape machines? Perhaps surprisingly, they are all-powerful.
Power-time computability thesis.
For any “realistic” computational model there is such that: Anything that can be computed on in time can also be computed on a TM in time .
This is a thesis, not a theorem. The meaning of “realistic” is a matter of debate, and one challenge to the thesis is discussed in section º2.5.
However, the thesis can be proved for many standard computational models, which include all modern programming languages. The proofs aren’t hard. One just tediously goes through each instruction in the target model and gives a TM implementation. We prove a representative case below (Theorem 2.6) for rapid-access machines (RAMs), which are close to how computers operate, and from which the jump to a programming language is short.
Given the thesis, why bother with TMs? Why not just use RAMs or a programming language as our model? In fact, we will basically do that. Our default for complexity will be RAMs. However, some of the benefits of TMs remain
- TMs are easier to define – just imagine how more complicated Definition 2.1 would be were we to use a different model. Whereas for TMs we can give a short self-contained definition, for other models we have to resort to skipping details. There is also some arbitrariness in the definition of other models. What operations exactly are allowed?
- TMs allow us to pinpoint more precisely the limits of computation. Results such as Theorem ?? are easier to prove for TMs. A proof for RAM would first go by simulating RAM by a TM.
- Finally, TMs allow us to better pinpoint the limits of our knowledge about computation; we will see several examples of this.
In short, RAMs and programming languages are useful to carry computation, TMs to analyze it.
2.1.2 Time complexity, P, and EXP
We now define our first complexity classes. We are interested in solving a variety of computational tasks on TMs. So we make some remarks before the definition.
- We often need to compute structured objects, like tuples, graphs, matrices, etc. One can encode such objects in binary by using multiple bits. We will assume that such encodings are fixed and allow ourselves to speak of such structures objects. For example, we can encode a tuple where by repeating each bit in each twice, and separate elements with .
- We can view machines as computing functions, or solving problems, or deciding sets, or deciding languages. These are all equivalent notions. For example, for a set , the problem of deciding if an input belongs to , written , is equivalent to computing the boolean characteristic function which outputs if the input belongs to , and otherwise. We will use this terminology interchangeably. In general, “computing a function” is more appropriate terminology when the function is not boolean.
- We allow partial functions, i.e., functions with a domain that is a strict subset of . This is a natural choice for many problems, cf. discussion in section º??.
- We measure the running time of the machine in terms of the input length, usually denoted . Input length can be a coarse measure: it is often natural to express the running time in terms of other parameters (for example, the time to factor a number could be better expressed in terms of the number of factors of the input, rather than its bit length). However for most of the discussion this coarse measure suffices, and we will discuss explicitly when it does not.
- We allow non-boolean outputs. However the running time is still only measured in terms of the input. (Another option which sometimes makes sense, it to bound the time in terms of the output length as well, which allows us to speak meaningfully of computing functions with very large outputs, such as exponentiation.)
- More generally, we are interested in computing not just functions but relations. That is, given an input we wish to compute some that belongs to a set . For example, the problem at hand might have more than one solution, and we just want to compute any of them.
- We are only interested in sufficiently large , because one can always hard-wire solutions for inputs of fixed size, see Exercise 2.3. This allows us to speak of running times like without worrying that it is not suitable when is small (for example, , so the TM could not even get started). This is reflected in the in the definition.
With this in mind, we now give the definition.
We will not need to deal with relations and partial functions until later in this text.
Also, working with boolean functions, i.e., functions with range slightly simplifies the exposition of a number of results we will see later. To avoid an explosion of complexity classes, we adopt the following convention.
Convention about complexity classes:
Unless specified otherwise, inclusions and separations among complexity classes refer to boolean functions. For example, an expression like means that every boolean function in is in .
As hinted before, the definition of is robust. In the next few sections we discuss this robustness in more detail, and also introduce a number of other central computational models.
2.1.3 The universal TM
Universal machines can simulate any other machine on any input. These machines play a critical role in some results we will see later. They also have historical significance: before them machines were tailored to specific tasks. One can think of such machines as epitomizing the victory of software over hardware: A single machine (hardware) can be programmed (software) to simulate any other machine.
2.2 Multi-tape machines (MTMs)
Exercise 2.4. Prove that Palindromes is in . Compare this to the run-time from the the TM in Exercise 2.1.
The following result implies in particular that is unchanged if we define it in terms of TMs or .
Theorem 2.1. for any and .
Exercise 2.5. Prove this. Note: Use TMs as we defined them, this may make a step of the proof less obvious than it may seem.
A much less obvious simulation is given by the following fundamental result about MTMs. It shows how to reduce the number of tapes two two, at very little cost in time. Moreover, the head movements of the simulator are restricted in a sense that at first sight appears too strong.
Theorem 2.2. [3, 6], for every function . Moreover, the 2-TM is oblivious: the movement of each tape head depends only on the length of the input.
Using this results one can prove the existence of universal MTMs similar to the universal TMs in Lemma 2.1.
Figure 2.2: A circuit computing the Xor of two bits.
Figure 2.3: An alternating circuit.
We now define circuits. It may be helpful to refer to figure 2.2 and figure 2.3.
Definition 2.4. A circuit, abbreviated Ckt, is a directed acyclic graph where each node is one of the following types: an input variable (fan-in ), an output variable (fan-in ), a negation gate (fan-in ), an And gate (fan-in ), or an Or gate (fan-in ). The fan-in of a gate is the number of edges pointing to the gate, the fan-out is the number of edges pointing away from the gate.
An alternating circuit, abbreviated AltCkt, is a circuit with unbounded fan-in Or and And gates arranged in alternating layers (that is, the gates at a fixed distance from the input all have the same type). For each input variable the circuit has both and as input.
A DNF (resp. CNF) is an AltCkt whose output is Or (resp. And). The non-ouput gates are called terms (resp. clauses).
denotes the set of function that, for all sufficiently large , on inputs of length have circuits with gates; input and output gates are not counted.
The size of a circuit is the number of gates.
Often we will consider computing functions on small inputs. In such cases, we can often forget about details and simply appeal to the following result, which gives exponential-size circuits which are however good enough if the input is really small. In a way, the usefulness of the result goes back to the locality of computation. The result, which is a circuit analogue of Exercise 2.3, will be extensively used in this book.
Theorem 2.3. Every function can be computed by
(1)  circuits of size , and
(2) A DNF or CNF with gates (in particular, circuits of size ).
Exercise 2.7. Prove that the Or function on bits has circuits of size . Prove Item (2) in Theorem 2.3. Prove a weaker version of Item (1) in Theorem 2.3 with bound .
We now show that circuits can simulates TMs. We begin with a simple but instructive simulation which incurs a quadratic loss, then discuss a more complicated and sharper one.
For this proof it is convenient to represent a configuration of a TM in a slightly different way, as a string over the alphabet . String
with indicates that (1) the tape content is with blanks on either side, (2) the machine is in state , and (3) the head of the machine is on the tape symbol in the string.
Locality of computation here means that one symbol in a string only depends on the symbols corresponding to the same tape cell in the previous step and its two neighbors – three symbols total – because the head only moves in one step.
The simulation in Theorem 2.4. incurs a quadratic loss. However, a better simulation exists. In fact, this applies even to -TMs.
Exercise 2.9. Prove Theorem 2.5 assuming Theorem 2.2.
However, we won’t need it, as we will work with another model which is closer to how computers operate. And for this model we shall give a different and simpler “simulation” by circuits in Chapter ??.
In the other direction, TMs can simulate circuits if they have enough states. In general, allowing for the number of states to grow with the input length gives models with “hybrid uniformity.”
2.4 Rapid-access machines (RAMs)
“In some sense we are therefore merely making concrete intuitions that already pervade the literature. A related model has, indeed, been treated explicitly” […] 
The main feature that’s missing in all models considered so far is the ability to access a memory location in one time step. One can augment TMs with this capability by equipping them with an extra addressing tape and a special “jump” state which causes the head on a tape to move in one step to the address on the address tape. This model is simple enough to define, and could in a philosophical sense be the right model for how hardware can scale, since we charge for the time to write the address.
However, other models are closer to how computers seem to operate, at least over small inputs. We want to think of manipulating small integers and addresses as constant-time operations, as one typically has in mind when programming. There is a variety of such models, and some arbitrariness in their definition. Basically, we want to think of the memory as an array of cells of bits and allow for typical operations of them, including addressing arithmetic and indirect addressing: reading and writing the cell indexed by another cell.
One issue that arises is how much memory the machine should have and consequently how big should be. There are two main options here. For “typical programming,” we have a fixed memory size and time bound in mind, for example and . A good choice then is to set bits.
This however makes it harder to compare machines with different memory bounds. Also in some scenarios the memory size and the time bound are not fixed. This occurs for example when simulating another machine. To handle such scenarios we can start with a memory of cells,and a cell size of bits, enough to access the input. We then equip machines with the operation MAlloc which increases the memory (i.e., ) by one, and always sets . Note the latter operation may increase by . The MAlloc operation is akin to the TM’s tape head wandering into unknown cells.
There are also two options for how the input is given to the machine. The difference doesn’t matter if you don’t care about factors in time, but it matters if you do. For many problems, like sorting, etc. we think of the input and the output as coming in cells of bits. (Typically, , and one can simulate such cells with cells with bits.) In this case, the RAM is computing a function and the input to the RAM is given accordingly. This is what one often has in mind when writing programs that involve numbers. For other problems, it is natural to just give one bit of the input in each cell. That is, the RAM is computing and bit of the input is placed in the input cells. We will not be concerned too much with small factors and so we pick the second choice for simplicity.
Definition 2.5. A -bit -line rapid-access machine (RAM) with cells consists of a memory array of cells of bits, registers of bits, and a program of lines.
Each line of the program contains an instruction among the following:
- Standard arithmetical and logical operations, such as etc.
- , called a Read operation, which reads the memory cell and copies its content into ,
- , called a Write operation, which writes into memory cells , memory cell and copies its content into ,
- MAlloc which increases by and, if also increases by ,
Read and write operations out of boundary indices have no effect.
On an input , the RAM starts the computation with cells of memory. The input is written in cells , while contains the length of the input.
The output is written starting in cell .
We use RAMs as our main model for time inside .
Definition 2.6. is defined as but for RAMs instead of TMs.
Theorem 2.6. , for any .
What is the relationship between circuits and RAMs? If a “description” of the circuit is given, then a RAM can simulate the circuit efficiently. The other way around is not clear. It appears that circuits need a quadratic blow-up to simulate RAMs.
Exercise 2.12. Give a function in but which requires circuits of size .
There are universal RAMs that can simulate any other RAM with only a constant-factor overhead, unlike the logarithmic-factor overhead for tape machines.
2.5 A challenge to the computability thesis
Today, there’s a significant challenge to the computability thesis. This challenge comes from… I know what you are thinking: Quantum computing, superposition, factoring. Nope. Randomness.
The last century or so has seen an explosion of randomness affecting much of science, and computing has been a leader in the revolution. Today, randomness permeates computation. Except for basic “core” tasks, using randomness in algorithms is standard. So let us augment our model with randomness.
Today, one usually takes , not , for “feasible computation.” Thus it is natural to investigate how robust is.
2.5.1 Robustness of BPP: Error reduction and tail bounds for the sum of random variables
The error in the definition of BPTime is somewhat arbitrary because it can be reduced. The way you do this is natural. For boolean functions, you repeat the algorithm many times, and take a majority vote. To analyze this you need probability bounds for the sum of random variables (corresponding to the outcomes of the algorithm).
Theorem 2.7. Let be i.i.d. boolean random variables with . Then for we have , where
is the divergence.
Now one can get a variety of bounds by bounding divergence for different settings of parameter. We state one such bound which we use shortly.
Fact 2.1. .
Using this bound we can prove the error reduction stated earlier.
2.5.2 Does randomness buy time?
We can always brute-force the random choices in exponential time. If a randomized machine uses random bits then we simulate it deterministically by running it on each of the choices for the bits. A RRAM machine running in time has registers of bits. Each Rand operation gives a uniform register, so the machine uses bits. This gives the following inclusions.
Now, two surprises. First, is the fastest deterministic simulation we can prove for RAMs, or even 2-TMs. On the other hand, and that is perhaps the bigger surprise, it appears commonly believed that in fact ! Moreover, it appears commonly believed that the overhead to simulate randomized computation deterministically is very small. Here the mismatch between our ability and common belief is abysmal.
However, we can do better for TMs. A randomized TMs has two transition functions and , where each is as in Definition 2.1. At each step, the TM uses or with probability each, corresponding to tossing a coin. We can define as but with randomized TMs instead of RRAMS.
Theorem 2.10.  , for any .
2.5.3 Polynomial identity testing
We now discuss an important problem which is in BPP but not known to be in P. In fact, in a sense to be made precise later, this is the problem in BPP which is not known to be in P. To present this problem we introduce two key concepts which will be used many times: finite fields, and arithmetic circuits.
A finite field is a finite set with elements and that is equipped with operations and that behave “in the same way” as the corresponding operations over the reals or the rationals. One example are the integers modulo a prime . For this gives the field with two elements where is Xor and is And. For larger you add and multiply as over the integers but then you take the result modulo .
The following summarizes key facts about finite fields. The case of prime fields suffices for the main points of this section, but stating things for general finite fields actually simplifies the exposition overall (since otherwise we need to add qualifiers to the size of the field).
Fields of size are of natural interest in computer science. It is often desirable to have very explicit representations for such and other fields. Such representations are known and are given by simple formulas, and are in particular computable in linear time.
We now move to defining arithmetic circuits, which are a natural generalization of the circuits we encountered in section º2.3.
The PIT problem over large fields is in but it is not known to be in . The requirement that the field be large is critical, see Problem ??.
To prove this theorem we need the following fundamental fact.
- is the zero polynomial.
- for every .
The bound is a weak form of the prime number theorem. The weak form typically suffices in computer science, and has a simple and cute encoding proof.
2.5.4 Simulating BPP by circuits
While we don’t know if , we can prove that, like , BPP has power-size circuits.
Theorem 2.12. 
Proof. Let be in . By Theorem 2.8 we can assume that the error is , and let be the corresponding RRAM. Note
where the first inequality is a union bound.
Therefore, there is a fixed choice for that gives the correct answer for every input . This choice can be hardwired in the circuit, and the rest of the computation can be written as a circuit by Theorem 2.4. QED
2.5.5 Questions raised by randomness
The introduction of randomness in our model raises several fascinating questions. First, does “perfect” randomness exists “in nature?” Second, do we need “perfect” randomness for computation? A large body of research has been devoted to greatly generalize Problem 2.16 to show that, in fact, even imperfect sources of randomness suffices for computation. Third, do we need randomness at all? Is ?
One of the exciting developments of complexity theory has been the connection between the latter question and the “grand challenge” from the next chapter. At a high level, it has been shown that explicit functions that are hard for circuits can be used to de-randomize computation. In a nutshell, the idea is that if a function is hard to compute then its output is “random,” so can be used instead of true randomness. The harder the function the less randomness we need. At one extreme, we have the following striking connection:
In other words, either randomness is useless for power-time computation, or else circuits can speed up exponential-time uniform computation!
2.6 Inclusion extend “upwards,” separations downwards
To develop intuition about complexity, we now discuss a general technique known as padding. In short, the technique shows that if you can trade resource for , then you can also trade a lot of for a lot of . For a metaphor, if you have a magical device that can turn one pound of sill into gold, you can also use it to turn two pounds of sill into gold. The contrapositive is that if you can’t trade a lot of for a lot of , then you also can’t trade a little of for a little of .
We give a first example using the classes that we have encountered so far.
Example 2.3. Suppose that . Then .
Problem 2.2. Show that Palindromes can be solved in time on a randomized TM. (Yes, only one tape.)
Hint: View the input as coefficients of polynomials.
Problem 2.3. Give a function that is in
 Leonard Adleman. Two theorems on random polynomial time. In 19th IEEE Symp. on Foundations of Computer Science (FOCS), pages 75–83. 1978.
 Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. J. Comput. Syst. Sci., 18(2):155–193, 1979.
 Fred Hennie and Richard Stearns. Two-tape simulation of multitape turing machines. J. of the ACM, 13:533–546, October 1966.
 Russell Impagliazzo and Avi Wigderson. if requires exponential circuits: Derandomizing the XOR lemma. In 29th ACM Symp. on the Theory of Computing (STOC), pages 220–229. ACM, 1997.
 O. B. Lupanov. A method of circuit synthesis. Izv. VUZ Radiofiz., 1:120–140, 1958.
 Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. J. of the ACM, 26(2):361–381, 1979.
 Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Mathematics of Computation, 54(189):435–447, 1990.
 Alan M. Turing. On computable numbers, with an application to the entscheidungsproblem. Proc. London Math. Soc., s2-42(1):230–265, 1937.
 Emanuele Viola. Pseudorandom bits and lower bounds for randomized turing machines. Theory of Computing, 18(10):1–12, 2022.