Nathan Never strikes back

Amiga Germany Fan’zine – #5 features Nathan Never, 30 years after the game came out (the article is from some months ago, but I just found out). You can read the article here.

My first day of high school I was looking for a desk, and the one that was left was occupied by Marco Genovesi (also see previous post about Black Viper, our other game together, also featured in article). We had both programmed a little the mythical C64. I had written in basic a fighting game inspired by Hokuto No Ken. It was hand-coded pixellated sprites doing flying martial arts kicks. It was on a cassette — I didn’t have a floppy disk. I would love to see it again, but it seems lost forever. It was my first game, written sometime when I was between 10 and 12.

If I think back at my schools, from elementary to high, I scream in horror. There was nothing. No play-yard, no props, nothing. Even the walls! How much does it cost to hang a world map? Instead they were solid color. We were 8-1 or something, every day, including Saturdays, in a room with solid color walls, solid color desks, and a blackboard, and that’s it. Perhaps no wonder that I exercised my art and craft on what I had. I dismembered floppy disks, cassettes, speakers etc. and stuck them on the wall. I scribbled all over my desk.

At some point I got an Amiga 1000 (my trajectory was Spectrum 48K, Commodore 64, Amiga 1000). Naturally I wanted to program, but had zero leads. Unlike today, it was very hard to get started. Interestingly, people more experienced than me had the same complaint, like “when I was 13 myself good luck finding an assembler.” Actually, I think things will continue to move in this direction. Even today, learning something isn’t that easy, and one can imagine how in 20 years you’ll have tutorials precisely tailored to your background, instead of having to zap through youtube to find the small step you’re missing.

So for a while I was stuck fiddling with startup sequences and shell commands. I remember my parents puzzled at why I wasn’t doing anything fancy like on the C64.
“You’re still at popcli” (a type of shell, known back then as CLI, command line interface) they told me once — I am cursed with good memory.
My reply was “You need a program to write a program.” They looked puzzled. (And so was I, after all, I didn’t need one on the C64.)

My turning point occurred when I did one of my stupid things. I was balancing myself on the top beam of a swing. Naturally, I fell, didn’t land right, and injured my ankle. I was stuck in my room for a month, and one day Marco showed up and brought me the Amiga hardware reference manual, a book we had overheard was critical. I thanked him and I remember him saying: “I am doing it for me.” This book became my bible. I thrashed the stupid C compilers, got an assembler, and started “bashing metal” and having fun.

Then there was the store. I don’t remember how I ended up there. It sold games, so maybe I was just there to buy games. But I never bought a new game — only pirate — so the explanation doesn’t stand. Anyway, we were there and the person there was a techno-musician who knew people who wanted to make a game on Nathan Never, a pretty high-profile graphic novel series in Italy.

To this day I am shocked they didn’t find anyone else, since as I said it’s pretty high-profile. Maybe the programmer wanted too much money, or ditched them last moment and they needed the game fast. Who knows. I had just barely started programming on Amiga, but… I know blitter, I know copper; I am obviously ready for a major project.

So we signed the contract, and I insisted on some money upfront which I got. A few Ks. The development of the game was covered in games magazines, which other people in our high school were devouring. They were shocked to find out the team was us! Waking around, we would pass by a store and find the box of the game on display.

As part of my deal I managed to get an Amiga 3000 tower, a one-of-a-kind machine, which for a kid like me was a dream. Before that, I didn’t have enough memory to play the game from the assembler with the graphics loaded, so when I tested it I would see dots instead of sprites. Marco, who did the graphics, at the beginning didn’t have a color monitor, and would pick colors from the hex code, quite cool. I brought the 3000T to our vacation that summer, and instead of going to the sea (which I hated) I stayed indoor to program. Naturally, I spent 50% (easily) of that time programming a script that would compile an animation through a landscape of fractal mountains. The final animation was quite cool.

I wish the game was more fun to play. It isn’t really my fault: no experience, no time, and crucially the people who imposed the game design on us were not players but exclusively concerned with showing the main character, whose frames occupy most of the memory. More than the game they cared about the graphic novel that came with it in the box. Of all the people buzzing around the game, I was the only one who actually played computer games (I still do). Still, of course, I could have improved the gameplay. Instead, on an ultra-tight deadline, I decided to embark on a 3D “escape from the tunnel” last level, something they didn’t ask for (they were happy with an animation, which would have cost me zero effort) and that probably almost noone has seen, since to get there you need to beat two grueling platform levels a puzzle level, with one life and no save state. (The 3D level starts at 20:09 in this long play.) And this is what it means to be 14.

On the bright side, however, perhaps the game has its unique charm, and today it can be played on a hard disk through the emulator, even though the game was not designed to be installed on a hard disk. And this is great because playing the game from the disks is really painful, despite or in part because of a crazy color hack to make a 3D “change disk” message (another uncalled-for feature). And it’s a good feeling that after 30+ years some people are still watching and playing the game.

Mathematics of the impossible: Computational Complexity, Chapter 3: The grand challenge

All posts in this series.
A PDF version of this post will be published with a delay, but if you’d like to have it soon let me know.



As mentioned in Chapter ??, our ability to prove impossibility results related to efficient computation appears very limited. We can now express this situation more precisely with the models we’ve introduced since then.

It is consistent with our knowledge that any problem in a standard algorithm textbook can be solved

  1. in Time cn^{2} on a TM, and
  2. in Time cn on a 2-TM, and
  3. by circuits of size cn.

Note that 2. implies 1. by Theorem ??.

In this chapter we begin to present several impossibility results, covering a variety of techniques which will be used later as well. As hinted above, they appear somewhat weak. However, jumping ahead, there is a flip side to all of this:

  1. At times, contrary to our intuition, stronger impossibility results are actually false. One example appears in Chapter ??. A list will be given later.
  2. Many times, the impossibility results that we can prove turn out to be, surprisingly, just “short” of proving major results. Here by “major result” I mean a result that would be phenomenal and that was in focus long before the connection was established. We will see several examples of this (section º??, section º??).
  3. Yet other times, one can identify broad classes of proof techniques, and argue that impossibility results can’t be proved with them (section º??).

Given this situation, I don’t subscribe to the general belief that stronger impossibility results are true and we just can’t prove them.

1.1 Information bottleneck: Palindromes requires quadratic time on TMs

Intuitively, the weakness of TMs is the bottleneck of passing information from one end of the tape to the other. We now show how to formalize this and use it show that deciding if a string is a palindrome requires quadratic time on TMs, which is tight and likely matches the time in Exercise ??. The same bound can be shown for other functions; palindromes just happen to be convenient to obtain matching bounds.

Theorem 1.1. \text {Palindromes}\not \in \text {TM-Time}(t(n)) for any t(n)=o(n^{2}).

More precisely, for every n and s, an s-state TM that decides if an n-bit input is a palindrome requires time \ge cn^{2}/\log s.

The main concept that allows us to formalize the information bottleneck mentioned above is the following.

Definition 1.1. A crossing sequence of a TM M on input x and boundary i, abbreviated i-CS, is the sequence of states that M is transitioning to when crossing cell boundary i (i.e., going from Cell i to i+1 or vice versa) during the computation on x.

The idea in the proof is very interesting. If M accepts inputs x and y and those two inputs have the same i-CS for some i, then we can “stitch together” the computation of M on x and y at boundary i to create a new input z that is still accepted by M. The input z is formed by picking bits from x to the left of cell boundary i and bits from y to the right of i:

\begin{aligned} z:=x_{1}x_{2}\cdots x_{i}y_{i+1}y_{i+2}\cdots y_{n}. \end{aligned}

The proof that z is still accepted is left as an exercise.

Now, for many problems, input z should not be accepted by M, and this gives a contradiction. In particular this will be be the case for palindromes. We are going to find two palindromes x and y that have the same i-CS for some i, but the corresponding z is not a palindrome, yet it is still accepted by M. We can find these two palindromes if M takes too little time. The basic idea is that if M runs in time t, because i-CSs for different i correspond to different steps of the computation, for every input there is a value of i such that the i-CS is short, namely has length at most t(|x|)/n. If t(n) is much less than n^{2}, the length of this CS is much less than n, from which we can conclude that the number of CSs is much less than the number of inputs, and so we can find two inputs with the same CS.

ProofLet n be divisible by four, without loss of generality, and consider palindromes of the form

\begin{aligned} p(x):=x0^{n/2}x^{R} \end{aligned}

where x\in \{0,1\} ^{n/4} and x^{R} is the reverse of x.

Assume there are x\ne y in \{0,1\} ^{n/4} and i in the middle part, i.e., n/4\le i\le 3n/4-1, so that the i-CS of M on p(x) and p(y) is the same. Then we can define z:=x0^{n/2}y^{R} which is not a palindrome but is still accepted by M, concluding the proof.

There remains to prove that the assumption of Theorem 1.1 implies the assumption in the previous paragraph. Suppose M runs in time t. Since crossing sequences at different boundaries correspond to different steps of the computation, for every x\in \{0,1\} ^{n/4} there is a value of i in the middle part such that the i-CS of M on p(x) has length \le ct/n. This implies that there is an i in the middle s.t. there are \ge c2^{n/4}/n inputs x for which the i-CS of M on x has length \le ct/n.

For fixed i, the number of i-CS of length \le \ell is \le (s+1)^{\ell }.

Hence there are x\ne y for which p(x) and p(y) have the same i-CS whenever c2^{n/4}/n\ge (s+1)^{ct/n}. Taking logs one gets ct\log (s)/n\le cn. QED

Exercise 1.1. For every s and n describe an s-state TM deciding palindromes in time cn^{2}/\log s (matching Theorem 1.1).

Exercise 1.2. Let L:=\{xx:x\in \{0,1\} ^{*}\}. Show L\in \text {TM-Time\ensuremath {(cn^{2})}}, and prove this is tight up to constants.

One may be tempted to think that it is not hard to prove stronger bounds for similar functions. In fact as mentioned above this has resisted all attempts!

1.2 Counting: impossibility results for non-explicit functions

Proving the existence of hard functions is simple: Just count. If there are more functions than efficient machines, some function is not efficiently computable. This is applicable to any model; next we state it for TMs for concreteness. Later we will state it for circuits.

Theorem 1.2. There exists a function f:\{0,1\} ^{n}\to \{0,1\} that cannot be computed by a TM with s states unless cs\log s\ge 2^{n}, regardless of time.

ProofThe number of TMs with s states is \le s{}^{cs}, and each TM computes at most one function (it may compute none, if it does not stop). The number of functions on n bits is 2^{2^{n}}. Hence if 2^{n}>cs\log s some function cannot be computed. QED

Note this bound is not far from that in Exercise ??.

It is instructive to present this basic result as an application of the probabilistic method:

ProofLet us pick f uniformly at random. We want to show that

\begin{aligned} \mathbb {P}_{f}[\exists \text { an }s\text {-state TM }M\text { such that }M(x)=f(x)\text { for every }x\in \{0,1\} ^{n}]<1. \end{aligned}

Indeed, if the probability is less than 1 than some function exists that cannot be computed. By a union bound we can say that this probability is

\begin{aligned} \le \sum _{M}\mathbb {P}_{f}[M(x)=f(x)\text { for every }x\in \{0,1\} ^{n}], \end{aligned}

where the sum is over all s-state machines. Each probability in the sum is (1/2)^{2^{n}}, since M is fixed. The number of s-state machines is \le s^{cs}. So the sum is \le s^{cs}(1/2)^{2^{n}}, and we can conclude as before taking logs. QED

1.3 Diagonalization: Enumerating machines

Can you compute more if you have more time? For example, can you write a program that runs in time n^{2} and computes something that cannot be computed in time n^{1.5}? The answer is yes for trivial reasons if we allow for non-boolean functions.

Exercise 1.3. Give a function f:\{0,1\}^* \to \{0,1\}^* in \text {Time}(n^{2})\setminus \text {Time}(n^{1.5}).

The answer is more interesting if the functions are boolean. Such results are known as time hierarchies, and a generic technique for proving them is diagonalization, applicable to any model.

We first illustrate the result in the simpler case of partial functions, which contains the main ideas. Later we discuss total functions.

Theorem 1.3. There is a partial function in \text {TM-Time}(t(n)) such that any TM M computing it runs in time \ge c_{M}t(n), for any t(n)=\omega (1).

In other words, \text {Time}(t(n))\supsetneq \text {Time}(o(t(n)).

ProofConsider the TM H that on input x=(M,1^{n-|M|}) of length n runs M on x until it stops and then complements the answer. (We can use a simple encoding of these pairs, for example every even-position bit of the description of M is a 0.)

Now define X to be the subset of pairs s.t. M runs in time \le t(n)/|M|^{c} on inputs of length n, and |M|^{c}\le t(n)/2.

On these inputs, H runs in time |M|^{c}+|M|^{c}\cdot t(n)/|M|^{c}\le t(n), as desired. To accomplish this, H can begin by making a copy of M in time |M|^{c}\le t(n)/2. Then every step of the computation of M can be simulated by H with |M|^{c} steps, always keeping the description of M to the left of the head.

Now suppose N computes the same function as H in time t(n)/|N|^{c}. Note that x:=(N,1^{n-|N|}) falls in the domain X of the function, for n sufficiently large, using that t(n)=\omega (1). Now consider running N on x. We have N(x)=H(x) by supposition, but H(x) is the complement of N(x), contradiction. QED

This proof is somewhat unsatisfactory; in particular we have no control on the running time of H on inputs not in X. It is desirable to have a version of this fundamental result for total functions. Such a version is stated next. It requires additional assumptions on t and a larger gap between the running times. Perhaps surprisingly, as we shall discuss, both requirements are essential.

Theorem 1.4. Let t(n)\ge n be a function. Suppose that f(x):=t(|x|) is in \text {TM-Time}(t(n)/\log ^{c}n).

There is a total function in \text {TM-Time}(ct(n)\log t(n)) that cannot be computed by any TM M in time c_{M}t(n).

The assumption about t is satisfied by all standard functions, including all those in this book. (For example, take t(n):=n^{2}. The corresponding f is then |x|^{2}. To compute f on input x of n bits we can first compute |x|=n in time cn\log n (Exercise ??). This is a number of b:=\log n bits. We can then square this number in time b^{c}. Note that the time to compute f(x) is dominated by the cn\log n term coming from computing |x|, which does not depend on t and is much less than the n^{2}/\log ^{c}n in the assumption.) The assumption cannot be removed altogether because there exist pathological functions t for which the result is false.

The proof is similar to that of Theorem 1.3. However, to make the function total we need to deal with arbitrary machines, which may not run in the desired time or even stop at all. The solution is to clock H in a manner similar to the proof of the universal machine, Lemma ??.

Also, we define a slightly different language to give a stronger result – a unary language – and to avoid some minor technical details (the possibility that the computation of f erases the input).

We define a TM H that on input 1^{n} obtains a description of a TM M, computes t(n), and then simulates M on input 1^{n} for t(n) steps in a way similar to Lemma ??, and if M stops then H outputs the complement of the output of M; and if M does not stop then H stops and outputs anything. Now H computes a function in time about t(n). We argue that this function cannot be computed in much less time as follows. Suppose some TM M computes the function in time somewhat less than t(n). Then we can pick an 1^{n} for which H obtains the description of this M, and the simulation always stops. Hence, for that 1^{n} we would obtain M(1^{n})=H(1^{n})=1-M(1^{n}), which is a contradiction.

However, there are interesting differences with the simulation in Lemma ??. In that lemma the universal machine U was clocking the steps of the machine M being simulated. Now instead we need to clock the steps of U itself, even while U is parsing the description of M to compute its transition function. This is necessary to guarantee that H does not waste time on big TM descriptions.

Whereas in Lemma ?? the tape was arranged as

\begin{aligned} (x,M,\underline {i},t',y), \end{aligned}

it will now be arranged as

\begin{aligned} (x,M',\underline {i},t',M'',y) \end{aligned}

which is parsed as follows. The description of M is M'M'', M is in state \underline {i}, the tape of M contains xy and the head is on the left-most symbol of y. The integer t' is the counter decreased at every step

ProofDefine TM H that on input 1^{n}:

  1. Compute (n,t(n),1^{n}).
  2. Compute (M_{n},t(n),1^{n}). Here M_{n} is obtained from n by removing all left-most zeroes until the first 1. I.e., if n=0^{j}1x then M_{n}=x. This is similar to the fact that a program does not change if you add, say, empty lines at the bottom.
  3. Simulate M_{n} on 1^{n}, reducing the counter t(n) at every step, including those parsing M_{n}, as explained before.
  4. If M_{n} stops before the counter reaches 0, output the complement of its output. If the counter reaches 0 stop and output anything.

Running time of H.

  1. Computing n is similar to Exercise ??. By assumption t(n) is computable in time t(n)/\log ^{c}n. Our definition of computation allows for erasing the input, but we can keep n around spending at most another \log ^{c}n factor. Thus we can compute (n,t(n)) in time t(n). Finally, in case it was erased, we can re-compute 1^{n} in time cn\log n by keeping a counter (initialized to a copy of n).
  2. This takes time c(n+t(n)): simply scan the input and remove zeroes.
  3. Decreasing the counter takes c|t(n)| steps. Hence this simulation will take ct(n)\log t(n) time.

Overall the running time is ct(n)\log t(n).

Proof that the function computed by H requires much time. Suppose some TM M computes the same function as H. Consider inputs 1^{n} where n=0^{j}1M. Parsing the description of M to compute its transition function takes time c_{M}, a value that depends on M only and not on j. Hence H will simulate \lfloor t(n)/c_{M}\rfloor steps of M. If M stops within that time (which requires t(n) to be larger than b_{M}, and so n and j sufficiently large compared to M) then the simulation terminates and we reach a contradiction as explained before. QED

The extra \log t(n) factor cannot be reduced because of the surprising result presented in Theorem 1.5 showing that, on TMs, time o(n\log n) equals time n for computing total functions.

However, tighter time hierarchies hold for more powerful models, like RAMs. Also, a time hierarchy for total functions for \text {BPTime} is… an open problem! But a hierarchy is known for partial functions.

Exercise 1.4. (1) State and prove a tighter time hierarchy for Time (which recall corresponds to RAMs) for total functions. You don’t need to address simulation details, but you need to explain why a sharper separation is possible.

(2) Explain the difficulty in extending (1) to \text {BPTime}.

(3) State and prove a time hierarchy for \text {BPTime} for partial functions.

1.3.1 \text {TM-Time}(o(n\log n))=\text {TM-Time}(n)

In this subsection we prove the result in the title, which we also mentioned earlier. First let us state the result formally.

Theorem 1.5. [12] Let f:\{0,1\}^* \to \{0,1\} be in \text {TM-Time}(t(n)) for a t(n)=o(n\log n). Then f\in \text {TM-Time}(n).

Note that time n is barely enough to scan the input. Indeed, the corresponding machines in Theorem 1.5 will only move the head in one direction.

The rest of this section is devoted to proving the above theorem. Let M be a machine for f witnessing the assumption of the theorem. We can assume that M stops on every input (even though our definition of time only applies to large enough inputs), possibly by adding \le n to the time, which does not change the assumption on t(n). The theorem now follows from the combination of the next two lemmas.

Lemma 1.1. Let M be a TM running in time t(n)\le o(n\log n). Then on every input x\in \{0,1\}^* every i-CS with i\le |x| has length \le c_{M}.

ProofFor an integer b let x(b) be a shortest input such that there exists j\in \{0,1,\ldots ,n\} for which the j-CS in the computation of M on x(b) has length \ge b. Let n(b):=|x(b)|.

Exercise 1.5. Prove n(b)\to \infty for b\to \infty .

There are n(b)+1\ge n(b) tape boundaries within or bordering x(b). If we pick a boundary uniformly at random, the average length of a CS on x(b) is \le t(n(b))/n(b). Hence there are \ge n(b)/2 choices for i s.t. the i-CS on x(b) has length \le 2t(n(b))/n(b).

The number of such crossing sequences is

\begin{aligned} \le (s+1)^{2t(n(b))/n(b)}=(s+1)^{o(n(b)\log (n(b))/n(b)}=n(b)^{o(\log s)}. \end{aligned}

Hence, the same crossing sequence occurs at \ge (n(b)/2)/n(b)^{o(\log s)}\ge 4 positions i, using that n(b) is large enough.

Of these four, one could be the CS of length \ge b from the definition of x(b). Of the other three, two are on the same side of j. We can remove the corresponding interval of the input without removing the CS of length \ge b. Hence we obtained a shorter input with a CS of length \ge b. Contradiction. QED

Lemma 1.2. Suppose f:\{0,1\}^* \to \{0,1\} is computable by a TM such that on every input x, every i-CS with i\le |x| has length \le b. Then f is computable in time n by a TM with c_{b} states that only moves the head in one direction.

Exercise 1.6. Prove this.

1.4 Circuits

The situation for circuits is similar to that of 2-TMs, and by Theorem ?? we know that proving \omega (n\log n) bounds on circuits is harder than for 2-TMs. Even a bound of cn is unknown. The following is a circuit version of Theorem 1.2. Again, the bound is close to what we saw in Theorem ??.

Theorem 1.6. [3] There are functions f:\{0,1\} ^{n}\to \{0,1\} that require circuits of size \ge (1-o(1))2^{n}/n, for every n.

One can prove a hierarchy for circuit size, by combining Theorem 1.6 and Theorem ??. As stated, the results give that circuits of size cs compute more functions than those of size s. In fact, the “o(1)” in the theorems is small, so one can prove a sharper result. But a stronger and more enjoyable argument exists.

Theorem 1.7. [Hierarchy for circuit size] For every n and s\le c2^{n}/n there is a function f:\zonzo that can be computed by circuits of size s+cn but not by circuits of size s.

ProofConsider a path from the all-zero function to a function which requires circuits of size \ge s, guaranteed to exist by Theorem 1.6, changing the output of the function on one input at each step of the path. Let h be the first function that requires size >s, and let h' be the function right before that in the path. Note that h' has circuits of size \le s, and h can be computed from h' by changing the value on a single input. The latter can be implemented by circuits of size cn. QED

Exercise 1.7. Prove a stronger hierarchy result for alternating circuits, where the “cn” in Theorem 1.7 is replaced with “c.”

In fact, this improvement is possible even for (non aternating) circuits, see Problem 1.2.

1.4.1 The circuit won’t fit in the universe: Non-asymptotic, cosmological results

Most of the results in this book are asymptotic, i.e., we do not explicitly work out the constants because they become irrelevant for larger and larger input lengths. As stated, these results don’t say anything for inputs of a fixed length. For example, any function on 10^{100} bits is in Time(c).

However, it is important to note that all the proofs are constructive and one can work out the constants, and produce non-asymptotic results. We state next one representative example when this was done. It is about a problem in logic, an area which often produces very hard problems.

On an alphabet of size 63, the language used to write formulas includes first-order variables that range over \mathbb {N}, second-order variables that range over finite subsets of \mathbb {N}, the predicates “y=x+1” and “x\in S” where x and y denote first-order variables and S denotes a set variable, and standard quantifiers, connectives, constants, binary relation symbols on integers, and set equality. For example one can write things like “every finite set has a maximum:” \forall S\exists x\in S\forall y\in S,y\le x.

Theorem 1.8. [4] To decide the truth of logical formulas of length at most 610 in this language requires a circuit containing at least 10^{125} gates. So even if each gate were the size of a proton, the circuit would not fit in the known universe.

Their result applies even to randomized circuits with error 1/3, if 610 is replaced with 614. (We can define randomized circuits analogously to BPTime.)

1.5 Problems

Problem 1.1. Hierarchy Theorem 1.4 only gives a function f that cannot be computed fast on all large enough input lengths: it is consistent with the theorem that f can be computed fast on infinitely many input lengths.

Give a function f:\{0,1\}^* \to \{0,1\}^* mapping x to \{0,1\} ^{\log \log \log |x|} that is computable in time n^{c} but such that for any TM M running in time n^{2} the following holds. For every n\ge c_{M} and every x\in \{0,1\} ^{n} such that M(x)\ne f(x).

Hint: Note the range of f. Can this result hold as stated with range \{0,1\} ?

Problem 1.2. Replace “cn” in Theorem 1.7 with “c.”

Problem 1.3. Prove that \{0^{i}1^{i}:i\ge 0\}\in \text {TM-Time\ensuremath {(cn\log n)\setminus \text {TM-Time}(n)}.}

This shows Theorem 1.5 is tight, and gives an explicit problem witnessing the time-hierarchy separation in Theorem 1.4, for a specific time bound. For the negative result, don’t use pumping lemmas or other characterization results not covered in this book.

Problem 1.4. The following argument contradicts Theorem 1.4; what is wrong with it?

“By Theorem 1.5, \text {TM-Time}(n\log ^{0.9}n)=\text {TM-Time}(n). By padding (Theorem 1.5), \text {TM-Time}(n\log ^{1.1}n)=\text {TM-Time}(n\log ^{0.9}n). Hence \text {TM-Time}(n\log ^{1.1}n)=\text {TM-Time}(n).


[1]   F. C. Hennie. One-tape, off-line turing machine computations. Information and Control, 8(6):553–578, 1965.

[2]   Kojiro Kobayashi. On the structure of one-tape nondeterministic turing machine time hierarchy. Theor. Comput. Sci., 40:175–193, 1985.

[3]   Claude E. Shannon. The synthesis of two-terminal switching circuits. Bell System Tech. J., 28:59–98, 1949.

[4]   Larry Stockmeyer and Albert R. Meyer. Cosmological lower bound on the circuit complexity of a small problem in logic. J. ACM, 49(6):753–784, 2002.

Mathematics of the impossible: Computational Complexity, Chapter 2

Click here for PDF version (which has better rendering of pictures, tables, etc.)
Click here for all blog posts on this.






Chapter 2
The alphabet of Time



The details of the model of computation are not too important if you don’t care about power differences in running times, such as the difference between solving a problem on an input of length n in time cn vs. time cn^{2}. But they matter if you do.

The fundamentals features of computation are two:

  • Locality. Computation proceeds in small, local steps. Each step only depends on and affects a small amount of “data.” For example, in the grade-school algorithm for addition, each step only involves a constant number of digits.
  • Generality. The computational process is general in that it applies to many different problems. At one extreme, we can think of a single algorithm which applies to an infinite number of inputs. This is called uniform computation. Or we can design algorithms that work on a finite set of inputs. This makes sense if the description of the algorithm is much smaller than the description of the inputs that can be processed by it. This setting is usually referred to as non-uniform computation.

Keep in mind these two principles when reading the next models.


SVG-Viewer needed.


Figure 2.1: Computational models for Time. An arrow from A to B means that B can simulate A efficiently (from time t to t\log ^{c}t).

2.1 Tape machines (TMs)

Tape machines are equipped with an infinite tape of cells with symbols from the tape alphabet A, and a tape head lying on exactly one cell. The machine is in one of several states, which you can think of as lines in a programming language. In one step the machine writes a symbol where the head points, changes state, and moves the head one cell to the right or left. Alternatively, it can stop. Such action depends only on the state of the machine and the tape symbol under the head.

We are interested in studying the resources required for computing. Several resources are of interest, like time and space. In this chapter we begin with time.

Definition 2.1. [8]A tape machine (TM) with s states is a map (known as transition or step)

\begin{aligned} \sigma :\{\underline {1},\underline {2},\ldots ,\underline {s}\}\times A\to A\times \{\text {Left},\text {Right},\text {Stop}\}\times \{\underline {1},\underline {2},\ldots ,\underline {s}\}, \end{aligned}

where A:=\{0,1,\#,-,\text {\_}\} is the tape alphabet. The alphabet symbol _ is known as blank.

A configuration of a TM encodes its tape content, the position of the head on the tape, and the current state. It can be written as a triple (M,i,\underline {j}) where M maps the integers to A and specifies the tape contents, i is an integer indicating the position of the head on the tape, and \underline {j} is the state of the machine.

A configuration (\mu ,i,\underline {j}) yields (\mu ',i+1,\underline {j}') if \sigma (\underline {j},\mu [i])=(a,\text {Right},\underline {j}') and \mu '[i]=a and \mu '=\mu elsewhere, and similarly it yields (\mu ',i-1,\underline {j}') if \sigma (\underline {j},\mu [i])=(a,\text {Left},\underline {j}') and \mu '[i]=a and \mu '=\mu elsewhere, and finally it yields itself if \sigma (\underline {j},\mu [i])=(a,\text {Stop},\underline {j}').

We say that a TM computes y\in \{0,1\} ^{*} on input x\in \{0,1\} ^{*} in time t (or in t steps) if, starting in configuration (\mu ,0,\underline {1}) where x=\mu [0]\mu [1]\cdots \mu [|x|-1] and \mu is blank elsewhere, it yields a sequence of t configurations where the last one is (\mu ,i,\underline {j}) where \sigma (\mu [i],\underline {j}) has a Stop instruction, and y=\mu [i]\mu [i+1]\cdots \mu [i+|y|-1] and \mu is blank elsewhere.


Describing TMs by giving the transition function quickly becomes complicated and uninformative. Instead, we give a high-level description of how the TM works. The important points to address are how the head moves, and how information is moved across the tape.

Example 2.1. On input x\in \{0,1\}^* we wish to compute x+1 (i.e., we think of x as an integer in binary, and increment by one). This can be accomplished by a TM with c states as follows. Move the head to the least significant bit of x. If you read a 0, write a 1, move the head to the beginning, and stop. If instead you read a 1, write a 0, move the head by one symbol, and repeat. If you reach the beginning of the input, shift the input by one symbol, append a 1, move the head to the beginning and stop.

The TM only does a constant number of passes over the input, so the running time is c|x|.


Example 2.2. On an input x\in \{0,1\}^* we wish to decide if it has the the same number of zeros and ones. This can be done as follows. Do a pass on the input, and cross off one 0 and one 1 (by replacing them with tape symbol #). If you didn’t find any 0 or or 1, accept (that is, write 1 on the tape and stop). If only find a 0 but not a 1, or vice versa, reject.

Since every time we do a pass we cross at least two symbols, the running time is cn^{2}.


Exercise 2.1. Describe a TM that decides if a string x\in \{0,1\}^* is palindrome, and bound its running time.


Exercise 2.2. Describe a TM that on input x\in \{0,1\}^* computes n:=|x| in binary in time cn\log n.


TMs can compute any function if they have sufficiently many states:

Exercise 2.3. Prove that every function f:\{0,1\} ^{n}\to \{0,1\} can be computed by a TM in time n using 2^{n+1} states.



2.1.1 You don’t need much to have it all

How powerful are tape machines? Perhaps surprisingly, they are all-powerful.




Power-time computability thesis.

For any “realistic” computational model C there is d>0 such that: Anything that can be computed on C in time t can also be computed on a TM in time t^{d}.


This is a thesis, not a theorem. The meaning of “realistic” is a matter of debate, and one challenge to the thesis is discussed in section º2.5.

However, the thesis can be proved for many standard computational models, which include all modern programming languages. The proofs aren’t hard. One just tediously goes through each instruction in the target model and gives a TM implementation. We prove a representative case below (Theorem 2.6) for rapid-access machines (RAMs), which are close to how computers operate, and from which the jump to a programming language is short.

Given the thesis, why bother with TMs? Why not just use RAMs or a programming language as our model? In fact, we will basically do that. Our default for complexity will be RAMs. However, some of the benefits of TMs remain

  • TMs are easier to define – just imagine how more complicated Definition 2.1 would be were we to use a different model. Whereas for TMs we can give a short self-contained definition, for other models we have to resort to skipping details. There is also some arbitrariness in the definition of other models. What operations exactly are allowed?
  • TMs allow us to pinpoint more precisely the limits of computation. Results such as Theorem ?? are easier to prove for TMs. A proof for RAM would first go by simulating RAM by a TM.
  • Finally, TMs allow us to better pinpoint the limits of our knowledge about computation; we will see several examples of this.

In short, RAMs and programming languages are useful to carry computation, TMs to analyze it.


2.1.2 Time complexity, P, and EXP

We now define our first complexity classes. We are interested in solving a variety of computational tasks on TMs. So we make some remarks before the definition.

  • We often need to compute structured objects, like tuples, graphs, matrices, etc. One can encode such objects in binary by using multiple bits. We will assume that such encodings are fixed and allow ourselves to speak of such structures objects. For example, we can encode a tuple (x_{1},x_{2},\ldots ,x_{t}) where x_{i}\in \{0,1\} ^{*} by repeating each bit in each x_{i} twice, and separate elements with 01.
  • We can view machines as computing functions, or solving problems, or deciding sets, or deciding languages. These are all equivalent notions. For example, for a set A, the problem of deciding if an input x belongs to A, written x\in A, is equivalent to computing the boolean characteristic function f_{A} which outputs 1 if the input belongs to A, and 0 otherwise. We will use this terminology interchangeably. In general, “computing a function” is more appropriate terminology when the function is not boolean.
  • We allow partial functions, i.e., functions with a domain X that is a strict subset of \{0,1\} ^{*}. This is a natural choice for many problems, cf. discussion in section º??.
  • We measure the running time of the machine in terms of the input length, usually denoted n. Input length can be a coarse measure: it is often natural to express the running time in terms of other parameters (for example, the time to factor a number could be better expressed in terms of the number of factors of the input, rather than its bit length). However for most of the discussion this coarse measure suffices, and we will discuss explicitly when it does not.
  • We allow non-boolean outputs. However the running time is still only measured in terms of the input. (Another option which sometimes makes sense, it to bound the time in terms of the output length as well, which allows us to speak meaningfully of computing functions with very large outputs, such as exponentiation.)
  • More generally, we are interested in computing not just functions but relations. That is, given an input x we wish to compute some y that belongs to a set f(x). For example, the problem at hand might have more than one solution, and we just want to compute any of them.
  • We are only interested in sufficiently large n, because one can always hard-wire solutions for inputs of fixed size, see Exercise 2.3. This allows us to speak of running times like t(n)=n^{2}/1000 without worrying that it is not suitable when n is small (for example, t(10)=100/1000<1, so the TM could not even get started). This is reflected in the n\ge c_{M} in the definition.

With this in mind, we now give the definition.

Definition 2.2. [Time complexity classes – boolean] Let t:\mathbb {N}\to \mathbb {N} be a function. (\mathbb {N} denotes the natural numbers \{0,1,2,\ldots \}.) TM-Time(t) denotes the functions f that map bit strings x from a subset X\subseteq \{0,1\}^* to a set f(x) for which there exists a TM M such that, on any input x\in X of length \ge c_{M}, M computes y within t(|x|) steps and y\in f(x).

\begin{aligned} \text {P}:= & \bigcup _{d\ge 1}\text {TM-Time}(n^{d}),\\ \text {Exp}:= & \bigcup _{d\ge 1}\text {TM-Time}(2^{n^{d}}). \end{aligned}



We will not need to deal with relations and partial functions until later in this text.

Also, working with boolean functions, i.e., functions f with range \{0,1\} slightly simplifies the exposition of a number of results we will see later. To avoid an explosion of complexity classes, we adopt the following convention.


Convention about complexity classes:

  Unless specified otherwise, inclusions and separations among complexity classes refer to boolean functions. For example, an expression like \text {P}\subseteq \text {NP} means that every boolean function in \text {P} is in \text {NP}.


As hinted before, the definition of \text {P} is robust. In the next few sections we discuss this robustness in more detail, and also introduce a number of other central computational models.


2.1.3 The universal TM

Universal machines can simulate any other machine on any input. These machines play a critical role in some results we will see later. They also have historical significance: before them machines were tailored to specific tasks. One can think of such machines as epitomizing the victory of software over hardware: A single machine (hardware) can be programmed (software) to simulate any other machine.

Lemma 2.1. There is a TM U that on input (M,t,x) where M is a TM, t is an integer, and x is a string:

-Stops in time |M|^{c}\cdot t\cdot |t|,

-Outputs M(x) if the latter stops within t steps on input x.



ProofTo achieve the desired speed, the idea is to maintain the invariant that M and t are always next to the tape head. After the simulation of each step of M the tape of U will contain

\begin{aligned} (x,M,\underline {i},t',y) \end{aligned}

where M is in state \underline {i}, the tape of M contains xy and the head is on the left-most symbol of y. The integer t' is the counter decreased at every step. Computing the transition of M takes time |M|^{c}. Decreasing the counter takes time c|t|. To move M and t next to the tape head takes c|M||t| time. QED



2.2 Multi-tape machines (MTMs)

Definition 2.3. A k-TM is like a TM but with k tapes, where the heads on the tapes move independently. The input is placed on the first tape, and all other tapes are initialized to _. The output is on the first tape. k\text {-TM-Time} is defined analogously to \text {TM-Time}.


Exercise 2.4. Prove that Palindromes is in 2\text {-TM-Time}(cn). Compare this to the run-time from the the TM in Exercise 2.1.


The following result implies in particular that \text {P} is unchanged if we define it in terms of TMs or k\text {-TMs}.

Theorem 2.1. k\text {-TM-Time}(t(n))\subseteq \text {TM-Time}(c_{k}t^{2}(n)) for any t(n)\ge n and k.


Exercise 2.5. Prove this. Note: Use TMs as we defined them, this may make a step of the proof less obvious than it may seem.


A much less obvious simulation is given by the following fundamental result about MTMs. It shows how to reduce the number of tapes two two, at very little cost in time. Moreover, the head movements of the simulator are restricted in a sense that at first sight appears too strong.

Theorem 2.2. [36]k\text {-TM-Time}(t(n))\subseteq \text {2-TM-Time}(c_{k}t(n)\log t(n)), for every function t(n)\ge n. Moreover, the 2-TM is oblivious: the movement of each tape head depends only on the length of the input.


Using this results one can prove the existence of universal MTMs similar to the universal TMs in Lemma 2.1.


SVG-Viewer needed.


Figure 2.2: A circuit computing the Xor of two bits.



SVG-Viewer needed.


Figure 2.3: An alternating circuit.


2.3 Circuits

We now define circuits. It may be helpful to refer to figure 2.2 and figure 2.3.

Definition 2.4. A circuit, abbreviated Ckt, is a directed acyclic graph where each node is one of the following types: an input variable (fan-in 0), an output variable (fan-in 1), a negation gate \neg (fan-in 1), an And gate \wedge (fan-in 2), or an Or gate \vee (fan-in 2). The fan-in of a gate is the number of edges pointing to the gate, the fan-out is the number of edges pointing away from the gate.

An alternating circuit, abbreviated AltCkt, is a circuit with unbounded fan-in Or and And gates arranged in alternating layers (that is, the gates at a fixed distance from the input all have the same type). For each input variable x_{i} the circuit has both x_{i} and \neg x_{i} as input.

A DNF (resp. CNF) is an AltCkt whose output is Or (resp. And). The non-ouput gates are called terms (resp. clauses).

\text {CktGates}(g(n)) denotes the set of function f:\{0,1\}^* \to \{0,1\} ^{*} that, for all sufficiently large n, on inputs of length n have circuits with g(n) gates; input and output gates are not counted.

The size of a circuit is the number of gates.


Exercise 2.6. [Pushing negation gates at the input] Show that for any circuit C:\{0,1\} ^{n}\to \{0,1\} with g gates and depth d there is a monotone circuit C' (that is, a circuit without Not gates) with 2g gates and depth d such that for any x\in \{0,1\}^n : C(x_{1},x_{2},\ldots ,x_{n})=C'(x_{1},\neg x_{1},x_{2},\neg x_{2}\ldots ,x_{n},\neg x_{n}).


Often we will consider computing functions on small inputs. In such cases, we can often forget about details and simply appeal to the following result, which gives exponential-size circuits which are however good enough if the input is really small. In a way, the usefulness of the result goes back to the locality of computation. The result, which is a circuit analogue of Exercise 2.3, will be extensively used in this book.

Theorem 2.3. Every function f:\zonzo can be computed by

(1) [5] circuits of size \le (1+o(1))2^{n}/n, and

(2) A DNF or CNF with \le 2^{n}+1 gates (in particular, circuits of size \le n2^{n}).


Exercise 2.7. Prove that the Or function on n bits has circuits of size cn. Prove Item (2) in Theorem 2.3. Prove a weaker version of Item (1) in Theorem 2.3 with bound cn2^{n}.


Exercise 2.8. Prove that the sum of two n-bit integers can be computed by circuits with cn gates, and by alternating circuits of depth c and size n^{c}.


We now show that circuits can simulates TMs. We begin with a simple but instructive simulation which incurs a quadratic loss, then discuss a more complicated and sharper one.

Theorem 2.4. Suppose an s-state TM computes f:\{0,1\}^* \to \{0,1\} in time t\ge n. Then f\in \text {CktGates}(c_{s}t^{2}(n)).


For this proof it is convenient to represent a configuration of a TM in a slightly different way, as a string over the alphabet A\times \{0,1,\ldots ,s\}. String

\begin{aligned} (a_{1},0)(a_{2},0)\ldots (a_{i-1},0)(a_{i},j)(a_{i+1},0)\ldots (a_{m},0) \end{aligned}

with j>0 indicates that (1) the tape content is a_{1}a_{2}\ldots a_{m} with blanks on either side, (2) the machine is in state \underline {j}, and (3) the head of the machine is on the i tape symbol a_{i} in the string.

Locality of computation here means that one symbol in a string only depends on the symbols corresponding to the same tape cell i in the previous step and its two neighbors – three symbols total – because the head only moves in one step.


Proof of Theorem 2.4 Given a TM M with s states consider a t\times (2t+1) matrix T, a.k.a. the computation table, where row i is the configuration at time i. The starting configuration in Row 1 has the head in the middle cell. Note we don’t need more than t cells to the right or left because the head moves only by one cell in one step. Next we claim that Row i+1 can be computed from Row i by a Ckt with c_{s}t gates. This follows by locality of computation, where note each entry in Row i+1 can be computed by a Ckt of size c_{s}, by Theorem 2.3.

Stacking t such circuits we obtain a circuit of size c_{s}t^{2} which computes the end configuration of them TM.

There remains to output the value of the function. Had we assumed that the TM writes the output in a specific cell, we could just read it off by a circuit of size c. Without the assumption, we can have a circuit C:A\times \{0,1,\ldots ,s\}\to \{0,1\} which outputs 1 on (x,y) iff y\ne 0 and x=1 (i.e., if x is a 1 that is under the TM’s head). Taking an Or such circuits applied to every entry in the last row of T concludes the proof. QED


The simulation in Theorem 2.4. incurs a quadratic loss. However, a better simulation exists. In fact, this applies even to k-TMs.

Theorem 2.5. [6] Suppose an s-state k-TM computes f:\{0,1\} ^{*}\to \{0,1\} in time t(n)\ge n. Then f\in \text {CktGates}(c_{s,k}t(n)\log t(n)).


Exercise 2.9. Prove Theorem 2.5 assuming Theorem 2.2.


However, we won’t need it, as we will work with another model which is closer to how computers operate. And for this model we shall give a different and simpler “simulation” by circuits in Chapter ??.

In the other direction, TMs can simulate circuits if they have enough states. In general, allowing for the number of states to grow with the input length gives models with “hybrid uniformity.”

Exercise 2.10. Suppose that f:\zonzo has circuits with s gates. Show that f can be computed by a TM with s^{c} states in time s^{c}.



2.4 Rapid-access machines (RAMs)




“In some sense we are therefore merely making concrete intuitions that already pervade the literature. A related model has, indeed, been treated explicitly” […] [2]




The main feature that’s missing in all models considered so far is the ability to access a memory location in one time step. One can augment TMs with this capability by equipping them with an extra addressing tape and a special “jump” state which causes the head on a tape to move in one step to the address on the address tape. This model is simple enough to define, and could in a philosophical sense be the right model for how hardware can scale, since we charge for the time to write the address.


However, other models are closer to how computers seem to operate, at least over small inputs. We want to think of manipulating small integers and addresses as constant-time operations, as one typically has in mind when programming. There is a variety of such models, and some arbitrariness in their definition. Basically, we want to think of the memory as an array \mu of s cells of w bits and allow for typical operations of them, including addressing arithmetic and indirect addressing: reading and writing the cell indexed by another cell.

One issue that arises is how much memory the machine should have and consequently how big w should be. There are two main options here. For “typical programming,” we have a fixed memory size s and time bound t in mind, for example S=n^{3} and t=n^{2}. A good choice then is to set w:=\lceil \log _{2}s\rceil bits.

This however makes it harder to compare machines with different memory bounds. Also in some scenarios the memory size and the time bound are not fixed. This occurs for example when simulating another machine. To handle such scenarios we can start with a memory of s=n+c cells,and a cell size of w=\lceil \log _{2}s\rceil bits, enough to access the input. We then equip machines with the operation MAlloc which increases the memory (i.e., s) by one, and always sets w:=\lceil \log _{2}s\rceil . Note the latter operation may increase w by 1. The MAlloc operation is akin to the TM’s tape head wandering into unknown cells.

There are also two options for how the input is given to the machine. The difference doesn’t matter if you don’t care about w factors in time, but it matters if you do. For many problems, like sorting, etc. we think of the input and the output as coming in n cells of w bits. (Typically, w=c\log n, and one can simulate such cells with c cells with \log n bits.) In this case, the RAM is computing a function f:\left (\{0,1\} ^{w}\right )^{n}\to \left (\{0,1\} ^{w}\right )^{m} and the input to the RAM is given accordingly. This is what one often has in mind when writing programs that involve numbers. For other problems, it is natural to just give one bit of the input in each cell. That is, the RAM is computing f:\{0,1\} ^{n}\to \{0,1\} ^{m} and bit i of the input is placed in the i input cells. We will not be concerned too much with small factors and so we pick the second choice for simplicity.

Definition 2.5. A w-bit \ell -line rapid-access machine (RAM) with s cells consists of a memory array \mu [1..s] of s cells of w bits, c registers r_{1},r_{2},\ldots of w bits, and a program of \ell lines.

Each line of the program contains an instruction among the following:

  • Standard arithmetical and logical operations, such as r_{i}=r_{j}+r_{k} etc.
  • r_{i}:=\mu [r_{j}], called a Read operation, which reads the r_{j} memory cell and copies its content into r_{i},
  • \mu [r_{i}]:=r_{j}, called a Write operation, which writes r_{j} into memory cells r_{i}, memory cell and copies its content into r_{i},
  • MAlloc which increases s by 1 and, if s\ge 2^{w} also increases w by 1,
  • Stop.

Read and write operations out of boundary indices have no effect.

On an input x\in \{0,1\} ^{n}, the RAM starts the computation with s:=n+1 cells of memory. The input is written in cells 1..n, while \mu [0] contains the length n of the input.

The output is written starting in cell 1.


We use RAMs as our main model for time inside \text {P}.

Definition 2.6. \text {Time}(t(n)) is defined as \text {TM-Time}(t(n)) but for RAMs instead of TMs.


Theorem 2.6. \text {Time}(t(n))\subseteq \text {TM-Time}(t^{c}(n)), for any t(n)\ge n.


Exercise 2.11. Prove it.


What is the relationship between circuits and RAMs? If a “description” of the circuit is given, then a RAM can simulate the circuit efficiently. The other way around is not clear. It appears that circuits need a quadratic blow-up to simulate RAMs.

Exercise 2.12. Give a function f:\{0,1\}^* \to \{0,1\} in \text {Time}(c\log n) but which requires circuits of size \ge cn.


There are universal RAMs that can simulate any other RAM with only a constant-factor overhead, unlike the logarithmic-factor overhead for tape machines.

Lemma 2.2. There is a RAM U that on input (P,t,x) where P is a RAM, t is an integer, and x is an input

-Stops in time ct,

-Outputs P(x) if the latter stops within t steps on input x.



ProofThroughout the computation, U will keep track of the memory size s_{P} and cell-size w_{P} of P. These are initialized as in the initial configuration of P on input x, whereas U starts with bigger values, since its input also contains P and t. Let h be the first cell where the input x starts. Memory location i of P is mapped to i+h during the simulation. When P performs an operations among registers, U simulates that with its own registers, but discards the data that does not fit into w_{P} bits.

After each step, U decreases the counter. The counter can be stored in t cells, one bit per cell. The total number of operations to decrease such a counter from t to 0 is \le ct. Alternatively, we can think of the counter as being stored in a single register at the beginning of the simulation. Then decreasing the counter is a single operation. QED



2.5 A challenge to the computability thesis

Today, there’s a significant challenge to the computability thesis. This challenge comes from… I know what you are thinking: Quantum computing, superposition, factoring. Nope. Randomness.

The last century or so has seen an explosion of randomness affecting much of science, and computing has been a leader in the revolution. Today, randomness permeates computation. Except for basic “core” tasks, using randomness in algorithms is standard. So let us augment our model with randomness.

Definition 2.7. A randomized (or probabilistic) RAM, written RRAM, is a RAM equipped with the extra instruction

  • r_{i}:=\text {Rand}, which sets r_{i} to a uniform value, independent of all previous random choices.

For a RRAM and a sequence R=R_{1},R_{2},\ldots we write M(x,R) for the execution of M on input x where the j-th instruction r_{i}:=\text {Rand} is replaced with r_{i}:=R_{i}.

We refer to BPTime(t(n)) with error \epsilon (n) as the set of functions f that map bit strings x from a subset X\subseteq \{0,1\}^* to a set f(x) for which there exists a RRAM M such that, on any input x\in X of length \ge c_{M}, M stops within t(|x|) steps and \mathbb {P}_{R}[M(x,R)\in f(x)]\ge 1-\epsilon (|x|) .

If the error \epsilon is not specified then it is assumed to be 1/3. Finally, we define

\begin{aligned} \text {BPP}:=\bigcup _{a}\text {BPTime}(n^{a}). \end{aligned}



Exercise 2.13. Does the following algorithm show that deciding if a given integer x is prime is in \text {BPP}? “Pick a uniform integer y\le x. Check if y divides x.”


Today, one usually takes \text {BPP}, not \text {P}, for “feasible computation.” Thus it is natural to investigate how robust \text {BPP} is.


2.5.1 Robustness of BPP: Error reduction and tail bounds for the sum of random variables

The error in the definition of BPTime is somewhat arbitrary because it can be reduced. The way you do this is natural. For boolean functions, you repeat the algorithm many times, and take a majority vote. To analyze this you need probability bounds for the sum of random variables (corresponding to the outcomes of the algorithm).

Theorem 2.7. Let X_{1},X_{2},\ldots ,X_{t} be i.i.d. boolean random variables with p:=\mathbb {P}[X_{i}=1]. Then for q\ge p we have \mathbb {P}[\sum _{i=1}^{t}X_{i}\ge qt]\le 2^{-D(q|p)t}, where

\begin{aligned} D(q|p):=q\log \left (\frac {q}{p}\right )+(1-q)\log \left (\frac {1-q}{1-p}\right ) \end{aligned}

is the divergence.


Now one can get a variety of bounds by bounding divergence for different settings of parameter. We state one such bound which we use shortly.

Fact 2.1. D(1/2|1/2-\epsilon )\ge \epsilon ^{2}.


Exercise 2.14. Plot both sides of Fact 2.1 as a function of \epsilon . (Hint: I used


Using this bound we can prove the error reduction stated earlier.

Theorem 2.8. [Error reduction for BPP] For boolean functions, the definition of BPP (Definition 2.7) remains the same if 1/3 is replaced with 1/2-1/n^{a} or 1/2^{n^{a}}, for any constant a.



ProofSuppose that f is in \text {BPP} with error p:=1/2-1/n^{a} and let M be the corresponding RRAM. On an input x, let us run t:=n^{2a}\cdot n^{b} times M, each time with fresh randomness, and take a majority vote. The new algorithm is thus

\begin{aligned} \text {Maj}(M(x,R_{1}),M(x,R_{2}),\ldots ,M(x,R_{t})). \end{aligned}

This new algorithm makes a mistake iff at least t/2 runs of M make a mistake. To analyze this error probability we invoke Theorem 2.7 where X_{i}:=1 iff run i of the algorithm makes a mistake, i.e., M(x,R_{i})\ne f(x), and \epsilon :=1/n^{a}. By Fact 2.1 we obtain an error bound of

\begin{aligned} 2^{-D(1/2|1/2-\epsilon )t}\le 2^{-\epsilon ^{2}t}\le 2^{-n^{b}}, \end{aligned}

as desired. The new algorithm still runs in power time, for fixed a and b. QED


Exercise 2.15. Consider an alternative definition of \text {BPTime}, denoted \text {BPTime'}, which is analogous to \text {BPTime} except that the requirement that the machine always stops within t(|x|) steps is relaxed to “the expected running time of the machine is t(|x|).

Show that defining \text {BPP} with respect to \text {BPTime} or \text {BPTime'} is equivalent.


Exercise 2.16. Consider biased RRAMs which are like RRAMs except that the operation Rand returns one bit which, independently from all previous calls to Rand, is 1 with probability 1/3 and 0 with probability 2/3. Show that BPP does not change if we use biased RRAMs.



2.5.2 Does randomness buy time?

We can always brute-force the random choices in exponential time. If a randomized machine uses r random bits then we simulate it deterministically by running it on each of the 2^{r} choices for the bits. A RRAM machine running in time t\ge n has registers of \le c\log t bits. Each Rand operation gives a uniform register, so the machine uses \le ct\log t bits. This gives the following inclusions.

Theorem 2.9. Time(t) \subseteq BPTime(t) \subseteq Time(c^{t\log t}), for any function t=t(n). In particular, \text {P}\subseteq \text {BPP}\subseteq \text {EXP}.



ProofThe first inclusion is by definition. The idea for the second was discussed before, but we need to address the detail that we don’t know what t is. One way to carry through the simulation is as follows. The deterministic machine initializes a counter r to 0. For each value of r it enumerates over the 2^{r} choices R for the random bits, and runs the RRAM on each choice of R, keeping track of its output on each choice, and outputting the majority vote. If it ever runs out of random bits, it increases r by 1 and restarts the process.

To analyze the running time, recall we only need r\le ct\log t. So the simulation runs the RRAM at most ct\log t\cdot 2^{ct\log t}\le 2^{ct\log t} times, and each run takes time ct, where this last bound takes into account the overhead for incrementing the choice of r, and redirecting the calls to \text {Rand} to R. QED


Now, two surprises. First, \text {BPP}\subseteq \text {EXP} is the fastest deterministic simulation we can prove for RAMs, or even 2-TMs. On the other hand, and that is perhaps the bigger surprise, it appears commonly believed that in fact \text {P}=\text {BPP}! Moreover, it appears commonly believed that the overhead to simulate randomized computation deterministically is very small. Here the mismatch between our ability and common belief is abysmal.

However, we can do better for TMs. A randomized TMs has two transition functions \sigma _{0} and \sigma _{1}, where each is as in Definition 2.1. At each step, the TM uses \sigma _{0} or \sigma _{1} with probability 1/2 each, corresponding to tossing a coin. We can define \text {TM-BPTime} as \text {BPTime} but with randomized TMs instead of RRAMS.

Theorem 2.10. [9] \text {TM-BPTime}(t)\subseteq \text {Time}(2^{\sqrt {t}\log ^{c}t}), for any t=t(n)\ge n.



2.5.3 Polynomial identity testing

We now discuss an important problem which is in BPP but not known to be in P. In fact, in a sense to be made precise later, this is the problem in BPP which is not known to be in P. To present this problem we introduce two key concepts which will be used many times: finite fields, and arithmetic circuits. \medskip

Finite fields

A finite field \mathbb {F} is a finite set with elements 0 and 1 that is equipped with operations + and \cdot that behave “in the same way” as the corresponding operations over the reals or the rationals. One example are the integers modulo a prime p. For p=2 this gives the field with two elements where + is Xor and \cdot is And. For larger p you add and multiply as over the integers but then you take the result modulo p.

The following summarizes key facts about finite fields. The case of prime fields suffices for the main points of this section, but stating things for general finite fields actually simplifies the exposition overall (since otherwise we need to add qualifiers to the size of the field).

Fact 2.2. [Finite fields] A unique finite field of size q exists iff q=p^{t} where p is a prime and t\in \mathbb {N}. This field is denoted \mathbb {F}_{q}.

Elements in the field can be identified with \{0,1,\ldots ,p-1\}^{t}.

[7] Given q, one can compute a representation of a finite field of size q in time (tp)^{c}. This representation can be identified with p plus an element of \{0,1,\ldots ,p-1\}^{t}.

Given a representation r and field elements x,y computing x+y and x\cdot y is in \text {Time}(n\log ^{c}n).


Fields of size 2^{t} are of natural interest in computer science. It is often desirable to have very explicit representations for such and other fields. Such representations are known and are given by simple formulas, and are in particular computable in linear time. \medskip

Arithmetic circuits

We now move to defining arithmetic circuits, which are a natural generalization of the circuits we encountered in section º2.3.

Definition 2.8. An arithmetic circuit over a field \mathbb {F} is a circuit where the gates compute the operations + and \cdot over \mathbb {F}, or are constants, or are input variables. Such a circuit computes a polynomial mapping \mathbb {F}^{n}\to \mathbb {F}.

The PIT (polynomial identity testing) problem over \mathbb {F}: Given an arithmetic circuit C over \mathbb {F} with n input variables, does C(x)=0 for every x\in \mathbb {F}^{n}?


The PIT problem over large fields is in \text {BPP} but it is not known to be in \text {P}. The requirement that the field be large is critical, see Problem ??.

Theorem 2.11. [PIT over large fields in \text {BPP}] Given an arithmetic circuit C and a field of size \ge c2^{|C|} we can solve PIT in \text {BPP}.


To prove this theorem we need the following fundamental fact.

Lemma 2.3. Let p be a polynomial over a finite field \mathbb {F} with n variables and degree \le d. Let S be a subset of \mathbb {F}, and suppose d<|S|. The following are equivalent:



  1. p is the zero polynomial.
  2. p(x)=0 for every x\in \mathbb {F}^{n}.
  3. \mathbb {P}_{x_{1},x_{2},\ldots ,x_{n}\in S}[p(x)=0]>d/|S|.


Proof and discussion of Lemma 2.3 The implications 1.\Rightarrow 2.\Rightarrow 3. are trivial, but note that for the latter we need d<|S|. The implication 3.\Rightarrow 1. requires more work. It is a multi-variate generalization of the fundamental fact that a non-zero univariate polynomial of degree d has at most d roots. The fundamental fact can be recovered setting n:=1 and S:=\mathbb {F}.

To prove the multi-variate generalization we proceed by induction on n. The base case n=1 is the fundamental fact (which we will not prove). For larger n write

\begin{aligned} p(x_{1},x_{2},\ldots ,x_{n})=\sum _{i=0}^{d}x_{1}^{i}p_{i}(x_{2},x_{3},\ldots ,x_{n}). \end{aligned}


If p is not the zero polynomial then there is at least one i such that p_{i} is not the zero polynomial. Let j be the largest such i. Note that p_{j} has degree at most d-j. By induction hypothesis

\begin{aligned} \mathbb {P}_{x_{2},\ldots ,x_{n}\in S}[p_{j}(x)=0]\le (d-j)/|S|. \end{aligned}

For every choice of x_{2},x_{3},\ldots ,x_{n} s.t. p_{j}(x)\ne 0, the polynomial p is a non-zero polynomial q_{x_{2},x_{3},\ldots ,x_{n}}(x_{1}) only in the variable x_{1}. Moreover, its degree is at most j by our choice of j. Hence by the n=1 case the probability that q is 0 over the choice of x_{1} is \le j.


\begin{aligned} \mathbb {P}_{x_{1},x_{2},\ldots ,x_{n}\in S}[p(x)=0]\le (d-j)/|S|+j/|S|=d/|S|. \end{aligned}




Proof of Theorem 2.11 A circuit C contains at most |C| multiplication gates. Each multiplication gate at most squares the degree of its inputs. Hence C computes a polynomial of degree \le 2^{|C|}. Let S be a subset of size c\cdot 2^{|C|} of \mathbb {F}. Assign uniform values from S independently to each variables, and evaluate the circuit. If C evaluates to 0 everywhere then obviously the output will be 0. Otherwise, by Lemma 2.3, the probability we get a 0 is \le 2^{|C|}/c2^{|C|}\le 1/3. QED


Exercise 2.17. Show that the PIT problem over the integers is in BPP. (Hint: Use that the number of primes in \{1,2,\ldots ,t\} is \ge t/\log ^{c}t, for every t\ge c, and that checking if a number is prime is in \text {P}.)


The t/\log ^{c}t bound is a weak form of the prime number theorem. The weak form typically suffices in computer science, and has a simple and cute encoding proof.


2.5.4 Simulating BPP by circuits

While we don’t know if \text {P}=\text {BPP}, we can prove that, like \text {P}, BPP has power-size circuits.

Theorem 2.12. [1] \text {BPP\ensuremath {\subseteq \bigcup _{a}\text {CktGates}(n^{a})}.}



ProofLet f:X\subseteq \{0,1\}^* \to \{0,1\} be in \text {BPP}. By Theorem 2.8 we can assume that the error is \epsilon <2^{-n}, and let M be the corresponding RRAM. Note

\begin{aligned} \mathbb {P}_{R}[\exists x\in \{0,1\} ^{n}:M(x,R)\ne f(x)]\le \sum _{x\in \{0,1\}^n }\mathbb {P}_{R}[M(x,R)\ne f(x)]\le 2^{n}\cdot \epsilon <1, \end{aligned}

where the first inequality is a union bound.

Therefore, there is a fixed choice for R that gives the correct answer for every input x\in \{0,1\} ^{n}. This choice can be hardwired in the circuit, and the rest of the computation can be written as a circuit by Theorem 2.4. QED


Exercise 2.18. In this exercise you will practice the powerful technique of combining tail bounds with union bounds, which was used in the proof of Theorem 2.12.

An error-correcting code is a subset C\subseteq \{0,1\} ^{n} s.t. for any distinct x,y\in C, x and y differ in at least n/3 coordinates. Prove the existence of codes of size |C|\ge 2^{\epsilon n} for a constant \epsilon .



2.5.5 Questions raised by randomness

The introduction of randomness in our model raises several fascinating questions. First, does “perfect” randomness exists “in nature?” Second, do we need “perfect” randomness for computation? A large body of research has been devoted to greatly generalize Problem 2.16 to show that, in fact, even imperfect sources of randomness suffices for computation. Third, do we need randomness at all? Is \text {P}=\text {BPP}?

One of the exciting developments of complexity theory has been the connection between the latter question and the “grand challenge” from the next chapter. At a high level, it has been shown that explicit functions that are hard for circuits can be used to de-randomize computation. In a nutshell, the idea is that if a function is hard to compute then its output is “random,” so can be used instead of true randomness. The harder the function the less randomness we need. At one extreme, we have the following striking connection:

Theorem 2.13. [4] Suppose for some a>0 there is a function in \text {Time}(2^{an}) which on inputs of length n cannot be computed by circuits with 2^{n/a} gates, for all large enough n. Then \text {P}=\text {BPP}.


In other words, either randomness is useless for power-time computation, or else circuits can speed up exponential-time uniform computation!


2.6 Inclusion extend “upwards,” separations downwards

To develop intuition about complexity, we now discuss a general technique known as padding. In short, the technique shows that if you can trade resource X for Y, then you can also trade a lot of X for a lot of Y. For a metaphor, if you have a magical device that can turn one pound of sill into gold, you can also use it to turn two pounds of sill into gold. The contrapositive is that if you can’t trade a lot of X for a lot of Y, then you also can’t trade a little of X for a little of Y.

We give a first example using the classes that we have encountered so far.

Example 2.3. Suppose that \text {BPTime}(cn)\subseteq \text {Time}(n^{2}). Then \text {BPTime}(n^{2})\subseteq \text {Time}(cn^{4}).



ProofLet f:\{0,1\}^* \to \{0,1\} be a function in \text {BPTime}(n^{2}). Consider the function f' that on input x of length n equals f computed on the first \sqrt {n} bits of x. Thus, inputs to f' are padded with n-\sqrt {n} useless symbols.

Note that f'\in \text {BPTime}(cn), since in linear time we can erase the last n-\sqrt {n} symbols and then just run the algorithm for f which takes time quadratic in \sqrt {n} which is linear in n. (If computing square roots is not an available instruction, one can show that computing \sqrt {n} can be done in linear time, for example using binary search.)

By assumption, f'\in \text {Time}(n^{2}).

To compute f in time cn^{4} we can then do the following. Given input x of length n, pad x to an input of length n^{2} in time cn^{2}. Then run the algorithm for f'. This will take time c(n^{2})^{2}\le cn^{4}. QED



2.7 Problems

Problem 2.1. [Indexing] Describe a TM that on input (x,i)\in \{0,1\} ^{n}\times \{1,2,\ldots ,n\} outputs bit i of x in time cn\log n.


Problem 2.2. Show that Palindromes can be solved in time n\log ^{c}n on a randomized TM. (Yes, only one tape.)

Hint: View the input as coefficients of polynomials.


Problem 2.3. Give a function f:X\subseteq \{0,1\}^* \to \{0,1\} that is in \text {BPTime}\text {(c) but not in \ensuremath {\text {Time}(n/100)}.}





[1]   Leonard Adleman. Two theorems on random polynomial time. In 19th IEEE Symp. on Foundations of Computer Science (FOCS), pages 75–83. 1978.

[2]   Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. J. Comput. Syst. Sci., 18(2):155–193, 1979.

[3]   Fred Hennie and Richard Stearns. Two-tape simulation of multitape turing machines. J. of the ACM, 13:533–546, October 1966.

[4]   Russell Impagliazzo and Avi Wigderson. \mathit {P} = \mathit {BPP} if E requires exponential circuits: Derandomizing the XOR lemma. In 29th ACM Symp. on the Theory of Computing (STOC), pages 220–229. ACM, 1997.

[5]   O. B. Lupanov. A method of circuit synthesis. Izv. VUZ Radiofiz., 1:120–140, 1958.

[6]   Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. J. of the ACM, 26(2):361–381, 1979.

[7]   Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Mathematics of Computation, 54(189):435–447, 1990.

[8]   Alan M. Turing. On computable numbers, with an application to the entscheidungsproblem. Proc. London Math. Soc., s2-42(1):230–265, 1937.

[9]   Emanuele Viola. Pseudorandom bits and lower bounds for randomized turing machines. Theory of Computing, 18(10):1–12, 2022.


Mathematics of the impossible: Computational Complexity

I am teaching and writing some notes on complexity. I hope they will become a book, so they are organized as such. The notes will be serialized on this blog, and you can find the latest version of the book in pdf here, which has a better rendering of tables, pictures, comic strips, etc.

0.1 Conventions, choices, and caveats

I write this section before the work is complete, so some of it may change.

This book covers basic results in complexity theory and can be used for a course on the subject. At the same time, it is perhaps sui generis in that it also tells a story of the quest for impossibility results, includes some personal reflections, and makes some non-standard choices about topics and technical details. Some of this is discussed next. \medskip

To test your understanding of the material…

this book is interspersed with mistakes, some subtle, some blatant, some not even mistakes but worrying glimpses into the author’s mind. Please send all bug reports and comments to (my five-letter last name) to be included in the list of heroes. \medskip

The c notation.

The mathematical symbol c has a special meaning in this text. Every occurrence of c denotes a real number >0. There exist choices for these numbers such that the claims in this book are (or are meant to be) correct. This replaces, is more compact than, and is less prone to abuse than the big-Oh notation (sloppiness hides inside brackets).

Example 0.1. “For all sufficiently large n” can be written as n\ge c.

“For every \epsilon and all sufficiently large n” can be written as n\ge c_{\epsilon }.

The following are correct statements:

“It is an open problem to show that some function in NP requires circuits of size cn.” At the moment of this writing, one can replace this occurrence with 5. Note such a claim will remain true if someone proves a 6n lower bounds. One just needs to “recompile” the constants in this book.

c>1+c”, e.g. assign 2 to the first occurrence, 1 to the second.

100n^{15}<n^{c}”, for="" all="" large="" enough="" "" n.="" assign="" c="16$."

The following are not true:

c<1/n for every n”. No matter what we assign c to, we can pick a large enough n. Note the assignment to c is absolute, independent of n.

More generally, when subscripted this notation indicates a function of the subscript. There exist choices for these functions such that the claims in this book are (or are meant to be) correct. Again, each occurrence can indicate a different function.

For the reader who prefers the big-Oh notation a quick an dirty fix is to replace every occurrence of c in this book with O(1). \medskip

The alphabet of TMs.

I define TMs with a fixed alphabet. This choice slightly simplifies the exposition (one parameter vs. two), while being more in line with common experience (it is more common experience to increase the length of a program than its alphabet). This choice affects the proof of Theorem ??. But it isn’t clear that the details are any worse. \medskip

Partial vs. total functions (a.k.a. on promise problems).

Recall that promise problems offer the most direct way of formulating natural computational problems. […] In spite of the foregoing opinions, we adopt the convention of focusing on standard decision and search problems. [2]

I define complexity w.r.t. partial functions whereas most texts consider total functions, i.e. we consider computing functions with arbitrary domains rather than any possible string. This is sometimes called “promise problems.” This affects many things, for example the hierarchy for \text {BPTime} (Exercise ??). \medskip

References and names.

I decided to keep references in the main text to a minimum, just to avoid having a long list later with items “Result X is due to Y,” but relegate discussion to bibliographic notes. I have also decided to not spell out names of authors, which is increasingly awkward. Central results, such as the PCP theorem, are co-authored by five or more people. \medskip


It is customary in complexity theory to bound quantities by a polynomial, as in polynomial time, when in fact the only terms that matters is the leading time. This also lends itself to confusion since polynomials with many terms are useful for many other things. I use power instead of polynomial, as in power time. \medskip

Random-access machines.

“Random access” also leads to strange expressions like “randomized random-access” [1]. \medskip


Are presented as an implication. \medskip

Randomness and circuits.

While randomness and circuits are everywhere in current research, and seem to be on everyone’s mind, they are sometimes still relegated to later chapters, almost as an afterthought. This book starts with them right away, and attempts to weave them through the narrative. \medskip

Data structures

Their study, especially negative results, squarely belongs to complexity theory. Yet data structures are strangely omitted in common textbooks. Results on data structures even tend to miss main venues for complexity theory to land instead on more algorithmic venues! We hope this book helps to revert this trend. \medskip

Algorithms & Complexity

…are of course two sides of the same coin. The rule of thumb I follow is to present algorithms that are surprising and challenge our intuition of computation, and ideally match lower bounds, even though they may not be immediately deployed. \medskip

Exercises and problems.

Exercises are interspersed within the narrative and serve as “concept check.” They are not meant to be difficult or new, though some are. Problems are collected at the end and tend to be harder and more original, though some are not. \medskip

Summary of some terminological and not choices.

Here it is:

Some other sources this book acronym

O(1), \Omega (1) c

Turing machine tape machine TM

random-access machine rapid-access machine RAM

polynomial time power time P

mapping reduction (sometimes) A reduces to B in \text {P} means B\in \text {P}\Rightarrow A\in \text {P}

Extended Church-Turing thesis Power-time computability thesis

pairwise independent pairwise uniform

FP, promise-P P

TM with any alphabet TM with fixed alphabet

classes have total functions classes have partial functions


Copyright 2022-present by Emanuele Viola

Chapter 1
A teaser

Consider a computer with three bits of memory. There’s also a clock, beating 1,2,3,\ldots In one clock cycle the computer can read one bit of the input and update its memory arbitrarily based on the value of the bit and the current memory, or stop and return a value.

Let’s give a few examples of what such computer can do.

First, it can compute the \text {And} function on n bits:

\text {Computing And of (\ensuremath {x_{1}},\ensuremath {x_{2}},\ensuremath {\ldots },\ensuremath {x_{n}})}

\text {For }i=1,2,\ldots \text { until }n

\text {Read }x_{i}

If x_{i}=0 return 0

\text {Return }1

We didn’t really use the memory. Let’s consider a slightly more complicated example. A word is palindrome if it reads the same both ways, like racecar, non, anna, and so on. Similarly, example of palindrome bit strings are 11,0110, and so on.

Let’s show that the computer can decide if a given string is palindrome quickly, in n steps

\text {Deciding if \ensuremath {(x_{1},x_{2},\ldots ,x_{n})} is palindrome:}

\text {For }i=1,2,\ldots \text { until }i>n/2

\text {Read \ensuremath {x_{i}} }and write it in memory bit m

If m\ne x_{n-i} return 0

\text {Return }1

That was easy. Now consider the \text {Majority} function on n bits, which is 1 iff the sum of the input bits is >n/2 and 0 otherwise. Majority, like any other function on n bits, can be computed on such a computer in time exponential in n. To do that, you do a pass on the input and check if it’s all zero, using the program for And given above. If it is, return 0. If it is not, you do another pass now checking if it’s all zero except the last bit is 1. If it is, return 0. You continue this way until you exhausted all the 2^{n}/2 possible inputs with Majority equals to 0. If you never returned 0 you can now safely return 1.

As we said, this works for any function, but it’s terribly inefficient. Can we do better for Majority? Can we compute it in time which is just a power of n?

Exercise 1.1. Convince yourself that this is impossible. Hint: If you start counting bits, you’ll soon run out of memory.

If you solved the exercise, you are not alone.

And yet, we will see the following shocking result:

Shocking theorem:

Majority can be computed on such a computer in time n^{c}.

And this is not a trick tailored to majority. Many other problems, apparently much more complicated, can also be solved in the same time.

But, there’s something possibly even more shocking.

Shocking situation:

It is consistent with our state of knowledge that every “textbook algorithm” can be solved in time n^{c} on such a computer! Nobody can disprove that. (Textbook algorithms include sorting, maxflow, dynamic programming algorithms like longest common subsequence etc., graph problems, numerical problems, etc.)

The Shocking theorem gives some explanation for the Shocking situation. It will be hard to rule out efficient programs on this model, since they are so powerful and counterintuitive. In fact, we will see later that this can be formalized. Basically, we will show that the model is so strong that it can compute functions that provably escape the reach of current mathematics… if you believe certain things, like that it’s hard to factor numbers. This now enters some of the mysticism that surrounds complexity theory, where different beliefs and conjectures are pitted against each other in a battle for ground truth.


[1]   Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. J. Comput. Syst. Sci., 18(2):155–193, 1979.

[2]   Oded Goldreich. Computational Complexity: A Conceptual Perspective. Cambridge University Press, 2008.

|Northeastern – Simons Institute| < 10, EXP, and PhD in complexity theory

Northeastern recently merged with Mills College, in Oakland, California. The massive campus is less than 10 miles from the Simons Institute for the Theory of Computing. This is the n+1 growth in an amazing streak, including multiple campuses all over the world, naming gifts, and new buildings, including one named after a complexity class: EXP.

This expansion puts Northeastern at the center of the two main hubs for theory in the U.S., the Bay State and the Bay Area.

If you want to work on theory apply to join our theory group. I always have room for strong students.

Getting started with OpenGL and C++

Below is a template that shows a fractal plant that you can modify with the keys z,w,s. Getting this to work was not straightforward, but the idea is that now you can just have fun and start your videogame. The code is not meant to be optimized at all, but simple. It’s pretty much self-explanatory, basically you can just replace the call to drawFrac in display to show whatever you want, and you handle key strokes in the idle function. I only tested it on Linux.

This is a template to get started with OpenGL (which needs to be installed)
To compile: g++ GLtemplate.c++ -o GLtemplate -lglut -lGLU -lGL
To run: ./GLtemplate

It shows a fractal plant that you can modify with the keys z,w,s


#include <GL/glut.h>
#include <stdio.h>
#include <math.h>

GLfloat userAngle = M_PI/2, userLength = 0.5;

//Keyboard code.  keyStates[x] is true if x is pressed.

bool keyStates[256];     //Key state values

void keyPressed (unsigned char key, int x, int y) { keyStates[key] = true;  }
void keyUp      (unsigned char key, int x, int y) { keyStates[key] = false; }

void idle() {
  if (keyStates['z']) {userAngle += 0.01;}
  if (keyStates['w']) {userLength += 0.01;}
  if (keyStates['s']) {userLength -= 0.01; if (userLength < 0) userLength = 0;}

/* Draws a plant from x,y with first piece length l, and angle a
The window has coordinates from (-1,-1) to (1,1).  The center is 0,0.

void drawFrac(GLfloat x,GLfloat y,GLfloat l, GLfloat a) {
  if ( l < 0.001 )

  glColor3f(0, l*30, 0);
  glLineWidth(l*10);  //Must be before glBegin(GL_LINES)






//This is the function that draws everything on the screen
void display() {
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); //Clear screen

  /* Draw whatever you need here */


int main(int argc, char** argv)
    glutInit(&argc, argv);
    glutInitWindowSize(1000, 1000);

    return 0;

Myth creation: The switching lemma

The history of science is littered with anecdotes about misplaced credit. Because it does not matter if it was A or B who did it; it only matters if it was I or not I. In this spirit I am starting a series of posts about such misplaced credit, which I hesitated before calling more colorfully “myth creation.” Before starting, I want to make absolutely clear that I am in no way criticizing the works themselves or their authors. In fact, many are among my favorites. Moreover, at least in the examples I have in mind right now, the authors do place their work in the appropriate context with the use of citations etc. My only point is the credit that the work has received within and without our community (typically due to inertia and snowball effects rather than anything else).

Of course, at some level this doesn’t matter. You can call Chebichev’s polynomials rainbow sprinkles and the math doesn’t change. And yet at some other level maybe it does matter a little, for science isn’t yet a purely robotic activity. With these posts I will advertise unpopular points of views that might be useful, for example to researchers who are junior or from different communities.

The switching lemma

I must admit I had a good run — Johan Hastad (privately to the blogger)

Random restrictions have been used in complexity theory since at least the 60’s [Sub61]. The first dramatic use in the context of AC0 is due to [FSS84Ajt83]. These works proved a switching lemma the amazing fact that a DNF gets simplified by a random restriction to the point that it can be written as a CNF, so you can collapse layers and induct. (An exposition is given below.) Using it, they proved super-polynomial lower bounds for AC0. The proof in [FSS84] is very nice, and if I want to get a quick intuition of why switching is at all possible, I often go back to it. [Ajt83] is also a brilliant paper, and long, unavailable online for free, filled with a logical notation which makes some people twitch. The first symbol of the title says it all, and may be the most obscene ever chosen:

\begin{aligned} \Sigma _{1}^{1}. \end{aligned}

Subsequently, [Yao85] proved exponential lower bounds of the form 2^{n^{c}}, with a refined analysis of the switching lemma. The bounds are tight, except for the constant c which depends on the depth of the circuit. Finally, the star of this post [Has86Has87] obtained c=1/(depth-1).

Yao’s paper doesn’t quite state that a DNF can be written exactly as a CNF, but it states that it can be approximated. Hastad’s work is the first to prove that a DNF can be written as a CNF, and in this sense his statement is cleaner than Yao’s. However, Yao’s paper states explicitly that a small circuit, after being hit by a restriction, can be set to constant by fixing few more bits.

The modern formulation of the switching lemma says that a DNF can be written as a shallow decision tree (and hence a small CNF). This formulation in terms of decision trees is actually not explicit in Hastad’s work. Beame, in his primer [Bea94], credits Cai with this idea and mentions several researchers noted Hastad’s proof works in this way.

Another switching lemma trivia is that the proof in Hastad’s thesis is actually due to Boppana; Hastad’s original argument — of which apparently no written record exists — was closer to Razborov’s later proof.

So, let’s recap. Random restrictions are already in [Sub61]. The idea of switching is already in [FSS84Ajt83]. You already had three analyses of these ideas, two giving superpolynomial lower bounds and one [Yao85] giving exponential. The formulation in terms of decision trees isn’t in [Has87], and the proof that appears in [Has87] is due to Boppana.

Still, I would guess [Has87] is more well known than all the other works above combined. [Yao85] did have a following at the time — I think it appeared in the pop news. But hey — have you ever heard of Yao’s switching lemma?

The current citation counts offer mixed support for my thesis:

FSS: 1351

Y: 732

H – paper “Almost optimal…:” 867

H – thesis: 582

But it is very hard to use citation information. The two H citations overlap, and papers are cited for various reasons. For example FSS got a ton of citations for the connection to oracles (which has nothing to do with switching lemmas).

Instead it’s instructive to note the type of citations that you can find in the literature:

Hastad’s switching lemma is a cornerstone of circuit complexity [No mention of FSS, A, Y]

Hastad‘s Switching Lemma is one of the gems of computational complexity [Notes below in passing it builds on FSS, A, Y]

The wikipedia entry is also telling:

In computational complexity theory, Hastad’s switching lemma is a key tool for proving lower bounds on the size of constant-depth Boolean circuits. Using the switching lemma, Johan Hastad (1987) showed that... [No mention of FSS,A,Y]

I think that 99% of the contribution of this line of research is the amazing idea that random restrictions simplify a DNF so that you can write it as a CNF and collapse. 90% of the rest is analyzing this to get superpolynomial lower bounds. And 90% of whatever is left is analyzing this to get exponential lower bounds.

Going back to something I mentioned at the beginning, I want to emphasize that Hastad during talks makes a point of reminding the audience that the idea of random restrictions is due to Sipser, and of Boppana’s contribution. And I also would like to thank him for his help with this post.

OK — so maybe this is so, but it must then be the case that [Has87] is the final word on this stuff, like the ultimate tightest analysis that kills the problem. Actually, it is not tight in some regimes of interest, and several cool works of past and recent times address that. In the end, I can only think of one reason why [Has87] entered the mythology in ways that other works did not, the reason that I carefully sidestepped while composing this post: å.

Perhaps one reason behind the aura of the switching lemma is that it’s hard to find examples. It would be nice to read: If you have this extreme DNF here’s what happens, on the other hand for this other extreme DNF here’s what happens, and in general this always works and here’s the switching lemma. Examples are forever – Erdos. Instead the switching lemma is typically presented as blam!: an example-free encoding argument which feels deus ex machina, as in this crisp presentation by Thapen. For a little more discussion, I liked Bogdanov’s lecture notes. Next I give a slightly different exposition of the encoding argument.

The simplest case: Or of n bits.

Here the circuit C is simply the Or of n bits x_{1},x_{2},\ldots ,x_{n}. This and the next case can be analyzed in more familiar ways, but the benefit of the encoding argument presented next is that it will extend to the general case more easily… arguably. Anyway, it’s also just fun to learn a different argument.

So, let’s take a random restriction \rho with exactly s stars. Some of the bits may become 0, others 1, and others yet may remain unfixed, i.e., assigned to stars. Those that become 0 you can ignore, while if some become 1 then the whole circuit becomes 1.

We will show that the number of restrictions for which the restricted circuit C|_{\rho } requires decision trees of depth \ge d is small. To accomplish this, we are going to encode/map such restrictions using/to a restriction… with no stars (that is, just a 0/1 assignment to the variables). The gain is clear: just think of a restriction with zero stars versus a restriction with one star. The latter are more by a factor about the number n of variables.

A critical observation is that we only want to encode restrictions for which C|_{\rho } requires large depth. So \rho does not map any variable to 1, for else the Or is 1 which has decision trees of depth 0.

The way we are going to encode \rho is this: Simply replace the stars with ones. To go back, replace the ones with stars. We are using the ones in the encoding to “signal” where the stars are.

Hence, the number of bad restrictions is at most 2^{n}, which is tiny compared to the number \binom {n}{s}2^{n-s} of restrictions with s stars.

The medium case: Or of functions on disjoint inputs.

Instead of working with DNFs, I will consider a circuit C which is the Or of arbitrary functions f_{i} each on w bits. You can immediately get this formulation from the usual one for DNFs, but I still find it a little useful since otherwise you might think there is something special about DNFs. What is special is that you take the Or of the functions, and we will exploit this again shortly.

In this warm-up case, we start with functions on disjoint inputs. So, again, let’s take a random restriction \rho with exactly s stars. Some of the functions may become 0, others 1, and others yet may remain unfixed. Those that become 0 you can ignore, while if some become 1 then the whole circuit becomes 1.

As before, we will show that the number of restrictions for which the restricted circuit C|_{\rho } requires decision trees of depth \ge d is small. To accomplish this, we are going to encode/map such restrictions using/to a restriction with just s-d stars, plus a little more information. As we saw already, the gain in reducing the number of stars is clear. In particular, standard calculations show that saving d stars reduces the number of restrictions by a factor O(s/n)^{d}. The auxiliary information will give us a factor of w^{d}, leading to the familiar bound O(ws/n)^{d}.

As before, recall that we only want to encode restrictions for which C|_{\rho } requires large depth. So no function in C|_{\rho } is 1, for else the circuit is 1 and has decision trees of depth 0. Also, you have d stars among inputs to functions that are unfixed (i.e., not even fixed to 0), for else again you can compute the function reading less than d bits. Because the functions are unfixed, there is a setting for those d stars (and possibly a few more stars – that would only help the argument) that make the corresponding functions 1. We are going to pick precisely that setting in our restriction \rho ' with s-d stars. This allows us to “signal” which functions had inputs with the stars we are saving (namely, those that are the constant 1). To completely recover \rho , we simply add extra information to indicate where the stars were. The saving here is that we only have to say where the stars are among w symbols, not n.

The general case: Or of functions on any subset of w bits.

First, the number of functions does not play a role, so you can think you have functions on any possible subset of w bits, where some functions may be constant. The idea is the same, except we have to be slightly more careful because when we set values for the stars in one function we may also affect other functions. The idea is simply to fix one function at the time. Specifically, starting with \rho , consider the first function f that’s not made constant by \rho . So the inputs to f have some stars. As before, let us replace the stars with constants that make the function f equal to the constant 1, and append the extra information that allows us to recover where these stars were in \rho .

We’d like to repeat the argument. Note however we only have guarantees about C|_{\rho }, not C|_{\rho } with some stars replaced with constants that make f equal to 1. We also can’t just jump to the 2nd function that’s not constant in C|_{\rho }, since the “signal” fixing for that might clash with the fixing for the first – this is where the overlap in inputs makes things slightly more involved. Instead, because C|_{\rho } required decision tree depth at least d, we note there have to be some assignments to the m stars in the input to f so that the resulting, further restricted circuit still requires decision tree depth \ge d-m (else C|_{\rho } has decision trees of depth <d).  We append this assignment to the auxiliary information and we continue the argument using the further restricted circuit.


[Ajt83]    Mikl�s Ajtai. \Sigma \sp {1}\sb {1}-formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.

[Bea94]   Paul Beame. A switching lemma primer. Technical Report UW-CSE-95-07-01, Department of Computer Science and Engineering, University of Washington, November 1994. Available from

[FSS84]   Merrick L. Furst, James B. Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Mathematical Systems Theory, 17(1):13–27, 1984.

[Has86]   Johan H�stad. Almost optimal lower bounds for small depth circuits. In Juris Hartmanis, editor, Proceedings of the 18th Annual ACM Symposium on Theory of Computing, May 28-30, 1986, Berkeley, California, USA, pages 6–20. ACM, 1986.

[H�s87]   Johan H�stad. Computational limitations of small-depth circuits. MIT Press, 1987.

[Sub61]   B. A. Subbotovskaya. Realizations of linear functions by formulas using +, *, -. Soviet Mathematics-Doklady, 2:110–112, 1961.

[Yao85]   Andrew Yao. Separating the polynomial-time hierarchy by oracles. In 26th IEEE Symp. on Foundations of Computer Science (FOCS), pages 1–10, 1985.

Talk in two days on correlation bounds

In two days, this Thursday at 12PM EST, I am giving a talk on Correlation bounds and all that.  The talk is hosted by Igor Oliveira and is open to everybody.  Follow this link to join.

In the talk I will discuss correlation bounds for polynomials, and give full proofs of touch on three recent works of mine:

  1. Fourier conjectures, correlation bounds, and Majority (link)
  2. On correlation bounds against polynomials, with Peter Ivanov and Liam Pavlovic (link)
  3. Fooling polynomials using invariant theory, with Harm Derksen (upcoming)