Mathematics of the impossible, Chapter 7, Space

As mentioned in Chapter 2, time is only one of several resources we wish to study. Another important one is space, which is another name for the memory of the machine. Computing under space constraints may be less familiar to us than computing under time constraints, and many surprises lie ahead in this chapter that challenge our intuition of efficient computation.

If only space is under consideration, and one is OK with a constant-factor slackness, then TMs and RAMs are equivalent, much like \text {P} is invariant under power changes in time. In a sense, changing the space by a constant factor is like changing the time by a power; from this point of view the equivalence is not surprising.

We shall consider both space bounds larger than the input length and space bounds smaller than it. For the latter, we have to treat the input separately: the machine should be able to read the input, but not write on its cells. One way to formalize this is to consider 2TMs where one tape holds the input and is read-only, while the other is a standard read-write tape.

We also want to compute functions f whose output is longer than 1 bit. One option is to equip the machine with yet another tape, which is write-only. We prefer to stick to two tapes and instead require that, given (x,i), bit i of f(x) is computable efficiently.

Definition 7.1. A function f:X\subseteq \{0,1\}^* \to \{0,1\}^* is computable in \text {Space}(s(n)) if there is a 2TM which on input (x,i) on the first tape, where x\in X and i\le |f(x)|, outputs bit i of f(x), and the machine never writes on the first tape and never uses more than s(|x|) cells on the second.

We define:

\begin{aligned} \text {L}:= & \bigcup _{d}\text {Space}(d\log n),\\ \text {PSpace}:= & \bigcup _{d}\text {Space}(n^{d}). \end{aligned}

We investigate next some basic relationships between space and time.

Theorem 7.1. For all functions t and s:

  1. \text {\ensuremath {k}TM-Time}(t)\subseteq \text {Space}(c_{k}t),
  2. \text {Time}(t)\subseteq \text {Space}(ct\log (nt)),
  3. \text {Space}(s)\subseteq 2\text {TM-Time}(c^{s(n)}\cdot n^{c}).

Proof 1. A TM running in time t can move its heads over at most t tape cells. We can write all these contents on one tape. To simulate one step of the k\text {TM} we do a pass over all the contents and update them accordingly.

2. A RAM running in time t can only access \le t memory cells, each containing at most c\log nt bits; the factor n is to take into account that the machine starts with word length \ge \log n. We simulate this machine and for each Write operation we add a record on the tape with the memory cell index and the content, similar to Theorem ??. When the RAM reads from memory, we scan the records and read off the memory values from one of them. If the record isn’t found, then the simulation returns the value corresponding to the initial configuration.

We also allocate c\log nt bits for the registers of the RAM. It can be shown that the operations among them can be performed in the desired space, since they only involve a logarithmic number of bits. A stronger result is proved later in Theorem 7.5.

3. On an input x with |x|=n, a \text {Space}(s) machine can be in at most n^{c}\cdot c^{s(n)} configurations. The factor n^{c} accounts for the head position on the input; the factor c^{s(n)} accounts for the contents of the second tape (including the head position on it and the state). Since the machine eventually stops, configurations cannot repeat, hence the same machine (with no modification) stops within the desired time. QED

A non-trivial saving is given by the following theorem.

Theorem 7.2. [12] k\text {TM-Time}(t)\subseteq \text {Space}(c_{k}t/\log t), for every function t.

This result is not specific to kTMs; it can be proved for other models such as RAMs.

From Theorem 7.1 and the next exercise we have

\begin{aligned} \text {L}\subseteq \text {P}\subseteq \text {NP}\subseteq \text {PH}\subseteq \text {PSpace}\subseteq \text {Exp}. \end{aligned}

Exercise 7.1. Prove \text {PH}\subseteq \text {PSpace}.

Just like for Time, for space one has universal machines and a hierarchy theorem. The latter implies \text {L}\ne \text {PSpace}. Hence, analogously to the situation for Time and NTime (section 5.1), we know that at least one of the inclusions above between L and PSpace is strict. Most people seem to think that all of them are, but nobody can prove that any specific one is.

7.1 Branching programs

Branching programs are the non-uniform counterpart of \text {Space}, just like circuits are the non-uniform counterpart of \text {Time}.

Definition 7.2. A (branching) program is a directed acyclic graph. A node can be labeled with an input variable, in which case it has two outgoing edges labeled 0 and 1. Alternatively a node can be labeled with 0 or 1, in which case it has no outgoing edges. One special node is the start node.

The space of the program with S nodes is \log S. A program computes a function f:\{0,1\}^{n}\to \{0,1\} by following the path from the start node, taking the edge labels corresponding to the input bits, and outputting b\in \{0,1\} as soon as it reaches a node labeled b.
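To make the definition concrete, here is a minimal Python sketch of how a branching program is evaluated; the dictionary-based representation and the node names are a hypothetical choice made for illustration, not part of the definition.

```python
# Evaluate a branching program by walking from the start node. A node is
# either ("var", i, n0, n1) -- read bit i, follow the 0- or 1-edge -- or
# ("out", b) -- output b. The state kept is the name of the current node:
# log S bits for a program with S nodes.

def evaluate(nodes, start, x):
    node = nodes[start]
    while node[0] == "var":
        _, i, n0, n1 = node
        node = nodes[n1 if x[i] == 1 else n0]
    return node[1]

# A 4-node program computing And(x_0, x_1).
nodes = {
    "start": ("var", 0, "rej", "mid"),
    "mid":   ("var", 1, "rej", "acc"),
    "acc":   ("out", 1),
    "rej":   ("out", 0),
}
assert evaluate(nodes, "start", [1, 1]) == 1
assert evaluate(nodes, "start", [0, 1]) == 0
```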

Theorem 7.3. Suppose a 2TM M computes f:\{0,1\}^{n}\to \{0,1\} in space s. Then f has branching programs of space c_{M}(s+\log n). In particular, any f\in \text {L} has branching programs of size n^{c_{f}}.

Proof Each node in the program corresponds to a configuration of M: the state, the head positions, and the contents of the second tape, for a total of 2^{c_{M}(s+\log n)} possibilities. Edges correspond to steps of the computation. QED

Definition 7.3. The branching program given by Theorem 7.3 is called the configuration graph of M.

The strongest available impossibility result for branching programs is the following.

Theorem 7.4. [16] There are explicit functions (in \text {P}) that require branching programs of size cn^{2}/\log n on inputs of length n.

7.2 The power of L

Computing with severe space bounds, as in L, seems quite difficult. Also, it might be somewhat less familiar than, say, computing within a time bound. It turns out that L is a powerful class capable of amazing computational feats that challenge our intuition of efficient computation. Moreover, these computational feats hinge on deep mathematical techniques of wide applicability. We hinted at this in Chapter 1. We now give further examples. At the same time we develop our intuition of what is computable with little space.

To set the stage, we begin with a composition result. In the previous sections we used several times the simple fact that the composition of two maps in P is also in P. This is useful as it allows us to break a complicated algorithm into small steps to be analyzed separately – a version of the divide-and-conquer paradigm. A similar composition result holds and is useful for space, but the argument is somewhat less obvious.

Lemma 7.1. Let f_{1}:\{0,1\}^* \to \{0,1\}^* be in \text {Space}(s_{1}) and satisfy |f_{1}(x)|\le m(|x|) for a function m. Suppose f_{2}:\{0,1\}^* \to \{0,1\} is in \text {Space}(s_{2}).

Then the composed function g defined as g(x)=f_{2}(f_{1}(x)) is computable in space c(s_{2}(m(n))+s_{1}(n)+\log nm(n)).

In particular, if f_{1} and f_{2} are in L then g is in L, as long as m\le n^{d} for a constant d.

Exercise 7.2. Prove this.

7.2.1 Arithmetic

A first example of the power of L is given by its ability to perform basic arithmetic. Grade-school algorithms use a lot of space; for example, they employ space \ge n to multiply two n-bit integers.

Theorem 7.5. The following problems are in L:

  1. Addition of two input integers.
  2. Iterated addition: Addition of any number of input integers.
  3. Multiplication of two input integers.
  4. Iterated multiplication: Multiplication of any number of input integers.
  5. Division of two integers.

Iterated multiplication is of particular interest because it can be used to compute “pseudorandom functions.” Such objects shed light on our ability to prove impossibility results via the “Natural Proofs” connection which we will see in Chapter ??.

Proof of 1. in Theorem 7.5: We are given as input x,y\in \mathbb {N} and an index i, and we need to compute bit i of x+y. Starting from the least significant bits, we add the bits of x and y, storing the carry of 1 bit in memory. Output bits are discarded until we reach bit i, which is output. QED
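As an illustration of this proof, here is a minimal Python sketch of the streaming addition; the little-endian bit-list representation is our assumption. Note the only state carried across iterations is the carry bit and the loop counter.

```python
# Compute bit i of x + y while storing only a carry bit and the loop index.
def bit_of_sum(x_bits, y_bits, i):
    """Bits are little-endian lists; positions past the end count as 0."""
    carry = 0
    for j in range(i + 1):
        xj = x_bits[j] if j < len(x_bits) else 0
        yj = y_bits[j] if j < len(y_bits) else 0
        s = xj + yj + carry
        if j == i:
            return s % 2
        carry = s // 2

# 3 + 1 = 4 = 100 in binary: bits 0 and 1 are 0, bit 2 is 1.
assert [bit_of_sum([1, 1], [1, 0], i) for i in range(3)] == [0, 0, 1]
```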

Exercise 7.3. Prove 2. and 3. in Theorem 7.5.

Proving 4. and 5. is more involved and requires some of those deep mathematical tools of wide applicability we alluded to before. Division can be computed once we can compute iterated multiplication. In a nutshell, the idea is to use the expansion

\begin{aligned} \frac {1}{x}=\sum _{i\ge 0}(-1)^{i}(x-1)^{i}. \end{aligned}

We omit details about bounding the error. Instead, we point out that this requires computing powers (x-1)^{i} which is an example of iterated multiplication (and in fact is no easier).

So for the rest of this section we focus on iterated multiplication. Our main tool for this is the Chinese-remaindering representation of integers, abbreviated CRR.

Definition 7.4. We denote by \mathbb {Z}_{m} the integers modulo m equipped with addition and multiplication (modulo m).

Theorem 7.6. Let p_{1},...,p_{\ell } be distinct primes and m:=\prod _{i}p_{i}. Then \mathbb {Z}_{m} is isomorphic to \mathbb {Z}_{p_{1}}\times \ldots \times \mathbb {Z}_{p_{\ell }}.

The forward direction of the isomorphism is given by the map

x \in \mathbb {Z}_{m} \rightarrow (x \mod p_{1},x \mod p_{2},...,x \mod p_{\ell }).

For the converse direction, there exist integers e_{1},\ldots ,e_{\ell }\le m^{c}, depending only on the p_{i}, such that the converse direction is given by the map

(x\mod p_{1},x\mod p_{2},...,x\mod p_{\ell }) \to x=\sum _{i=1}^{\ell }e_{i}\cdot (x\mod p_{i})\mod m.

Each integer e_{i} is 0 mod p_{j} for j\not =i and is 1 mod p_{i}.

Example 7.1. \mathbb {Z}_{6} is isomorphic to \mathbb {Z}_{2}\times \mathbb {Z}_{3}. The equation 2+3=5 corresponds to (0,2)+(1,0)=(1,2). The equation 2\cdot 3=6 corresponds to (0,2)\cdot (1,0)=(0,0). Note how addition and multiplication in CRR are performed in each coordinate separately; how convenient.
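A quick Python sanity check of this example and of Theorem 7.6 for m=6 (the function names are ours):

```python
# Z_6 is isomorphic to Z_2 x Z_3 via x -> (x mod 2, x mod 3).
primes = [2, 3]
m = 6
to_crr = lambda x: tuple(x % p for p in primes)

# The map is a bijection on {0, ..., 5}:
assert len({to_crr(x) for x in range(m)}) == m

# Multiplication is coordinate-wise: 2 * 3 = 0 (mod 6) corresponds to
# (0, 2) * (1, 0) = (0, 0), multiplying each coordinate mod its prime.
x, y = 2, 3
assert to_crr(x * y % m) == tuple(a * b % p for a, b, p in
                                  zip(to_crr(x), to_crr(y), primes))
```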

To compute iterated multiplication the idea is to move to CRR, perform the multiplications there, and then move back to standard representation. A critical point is that each coordinate in the CRR has a representation of only c\log n bits, which makes it easy to perform iterated multiplication one multiplication at a time, since we can afford to write down intermediate products.

The algorithm is as follows:

Computing the product of input integers x_{1},\ldots ,x_{t}. [6]

  1. Let \ell :=n^{c} and compute the first \ell prime numbers p_{1},p_{2},\ldots ,p_{\ell }.
  2. Convert the input into CRR: Compute (x_{1}\mod p_{1},\ldots ,x_{1}\mod p_{\ell }),\ldots ,(x_{t}\mod p_{1},\ldots ,x_{t}\mod p_{\ell }).
  3. Compute the multiplications in CRR: (\Pi _{i=1}^{t}x_{i}\mod p_{1}),\ldots ,(\Pi _{i=1}^{t}x_{i}\mod p_{\ell }).
  4. Convert back to standard representation.

Exercise 7.4. Prove the correctness of this algorithm.
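As a sanity check of the four steps (not a proof of Exercise 7.4, and ignoring the space accounting that is the point of this section), here is a Python sketch; the choice e_{i}=(m/p_{i})\cdot ((m/p_{i})^{-1}\bmod p_{i}) is one standard way to obtain the integers of Theorem 7.6.

```python
# Step 1: the first ell primes (trial division is enough for a sketch).
def first_primes(ell):
    out, x = [], 2
    while len(out) < ell:
        if all(x % y != 0 for y in range(2, x)):
            out.append(x)
        x += 1
    return out

def iterated_product(xs, primes):
    m = 1
    for p in primes:
        m *= p                                        # need m > the product
    residues = [[x % p for x in xs] for p in primes]  # Step 2: to CRR
    prods = [1] * len(primes)
    for j, p in enumerate(primes):                    # Step 3: multiply mod p_j
        for r in residues[j]:
            prods[j] = prods[j] * r % p
    total = 0                                         # Step 4: back via the e_i
    for p, r in zip(primes, prods):
        q = m // p
        e = q * pow(q, -1, p)   # e is 1 mod p and 0 mod every other prime
        total += e * r
    return total % m

assert iterated_product([7, 11, 13], first_primes(8)) == 7 * 11 * 13
```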

Now we explain how steps 1, 2, and 3 can be implemented in L. Step 4 can be implemented in L too, but showing this is somewhat technical due to the computation of the numbers e_{i} in Theorem 7.6. However these numbers only depend on the input length, and so we will be able to give a self-contained proof that iterated multiplication has branching programs of size n^{c}.

Step 1

By Theorem ??, the primes p_{i} have magnitude \le n^{c} and so can be represented with c\log n bits. We can enumerate over integers with \le c\log n bits in L. For each integer x we can test if it is prime by again enumerating over all integers y and z with \le c\log n bits and checking whether x=yz with y,z>1, say using the space-efficient algorithm for multiplication in Theorem 7.5. (The space required for this step would in fact be c\log \log n.)

Step 2

We explain how given y\in \{0,1\} ^{n} we can compute (y\mod p_{1},\ldots ,y\mod p_{\ell }) in L. If y_{j} is bit j of y we have that

\begin{aligned} y\mod p_{i}=\left (\sum _{j<n}y_{j}\cdot (2^{j}\mod p_{i})\right )\mod p_{i}. \end{aligned}

Each power 2^{j}\mod p_{i} can be computed by repeated multiplication modulo p_{i}, and the sum by iterated addition (2. in Theorem 7.5). All the quantities involved have \le c\log n bits, so the final reduction modulo p_{i} can be done by writing everything down.

Step 3

This is a smaller version of the original problem: for each j\leq \ell , we want to compute (\Pi _{i=1}^{t}x_{i}\mod p_{j}) from x_{1}\mod p_{j},\ldots ,x_{t}\mod p_{j}. However, as mentioned earlier, each (x_{i}\mod p_{j}) is at most n^{c} in magnitude and thus has a representation of c\log n bits. Hence we can just perform one multiplication at a time in L.

Step 4

By Theorem 7.6, to convert back to standard representation from CRR we have to compute the map

\begin{aligned} (y\mod p_{1},\ldots ,y\mod p_{\ell })\to \sum _{i=1}^{\ell }e_{i}\cdot (y\mod p_{i}). \end{aligned}

Assuming we can compute the e_{i}, this is just multiplication and iterated addition, which are in L by Theorem 7.5.

Putting the steps together

Combining the steps together we can compute iterated multiplication in L as long as we are given the integers e_{i} in Theorem 7.6.

Theorem 7.7. Given integers x_{1},x_{2},\ldots ,x_{t}, and given the integers e_{1},e_{2},\ldots ,e_{\ell } as in Theorem 7.6, where \ell =n^{c}, we can compute \prod _{i}x_{i} in L.

In particular, because the e_{i} only depend on the input length, but not on the x_{i}, they can be hardwired in a branching program.

Corollary 7.1. Iterated multiplication has branching programs of size n^{c}.

Exercise 7.5. Show that given integers x_{1},x_{2},\ldots ,x_{t} and y_{1},y_{2},\ldots ,y_{t} one can decide if

\begin{aligned} \prod _{i=1}^{t}x_{i}=\prod _{i=1}^{t}y_{i} \end{aligned}

in L. You cannot use the fact that iterated multiplication is in L, a result which we stated but did not completely prove.

Exercise 7.6. Show that the iterated multiplication of d\times d matrices over the integers has branching programs of size n^{c_{d}}.

7.2.2 Graphs

We now give another example of the power of L.

Definition 7.5. The undirected reachability problem: Given an undirected graph G and two nodes s and t in G determine if there is a path from s to t.

Standard time-efficient algorithms to solve this problem mark nodes in the graph. In logarithmic space we can keep track of a constant number of nodes, but it is not clear how we can avoid repeating old steps forever.

Theorem 7.8. [20] Undirected reachability is in L.

The idea behind this result is that a random walk on the graph will visit every node, and it can be computed using small space, since we just need to keep track of the current node. One can then derandomize the random walk and obtain a deterministic walk, again computable in L.
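A minimal Python sketch of the random-walk idea (before derandomization); the graph and the step bound are our choices for illustration. The expected cover time of a connected n-node undirected graph is O(n^3), so polynomially many steps suffice with high probability.

```python
import random

# The walk stores only the current node: c log n bits of state.
def random_walk_reaches(adj, s, t, steps):
    v = s
    for _ in range(steps):
        if v == t:
            return True
        v = random.choice(adj[v])
    return v == t

# Path graph 0 - 1 - 2 - 3 (undirected: each edge listed in both directions).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(random_walk_reaches(adj, 0, 3, 10 ** 4))   # True with high probability
```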

7.2.3 Linear algebra

Our final example comes from linear algebra. Familiar methods for solving a linear system

\begin{aligned} Ax=b \end{aligned}

require a lot of space. For example, using elimination we need to rewrite the matrix A. Similarly, we cannot easily compute determinants using small space. However, a different method exists.

Theorem 7.9. [9] Solving a linear system is computable in \text {Space}(c\log ^{2}n).

7.3 Checkpoints

The checkpoint technique is a fundamental tool in the study of space-bounded computation. Let us illustrate it at a high level. Consider a graph G, and write u\leadsto ^{t}v if there is a path of length \le t from u to v. The technique allows us to trade the length of the path for quantifiers. Specifically, for any parameter b, we can break down a path from u to v into b smaller paths that go through b-1 checkpoints. The length of the smaller paths need only be t/b (assuming that b divides t). We can guess the checkpoints and verify each smaller path separately, at the price of introducing quantifiers but with the gain that the path length is reduced from t to t/b. The checkpoint technique is thus an instantiation of the general paradigm of guessing computation and verifying it locally, introduced in Chapter 5. One difference is that now we are only going to guess parts of the computation.

The checkpoint technique:   u\leadsto ^{t}v\Leftrightarrow \exists p_{1},p_{2},\ldots ,p_{b-1}:\forall i\in \{0,1,\ldots ,b-1\}:p_{i}\leadsto ^{t/b}p_{i+1},   where we denote p_{0}:=u and p_{b}:=v.

An important aspect of this technique is that it can be applied recursively: We can apply it again to the problems p_{i}\leadsto ^{t/b}p_{i+1}. We need to introduce more quantifiers, but we can reduce the path length to t/b^{2}, and so on. We will see several instantiations of this technique, for various settings of parameters, ranging from b=2 to b=n^{c}.

We now utilize the checkpoint technique to show a simulation of small-space computation by small-depth alternating circuits.

Theorem 7.10. A function computable by a branching program with S nodes is also computable by an alternating circuit of depth c\log _{b}S and size S^{b\log _{b}S+c}, for any b\le S.

To illustrate the parameters, suppose S=n^{a}, and let us pick b:=n^{\epsilon } where n\ge c_{a,\epsilon }. Then we have circuits of depth d:=ca/\epsilon and size \le S^{n^{\epsilon }d+c}=2^{n^{c\epsilon }}. In other words, we can have depth d and size 2^{n^{c_{a}/d}}, for every d. This in particular holds for every function in \text {L}. We will later give explicit functions (also in \text {P}) which cannot be computed by circuits of depth d and size 2^{n^{c/d}}, “just short” of ruling out \text {L}. This state of affairs is worth emphasis:

(1) Every f in \text {L} has alternating circuits of depth d and size 2^{n^{c_{f}/d}}.

(2) We can prove that there are explicit functions (also in L) which cannot be computed by circuits of depth d and size 2^{n^{c/d}}.

(3) Improving the constant in the double exponent for a function in P would yield \text {L}\ne \text {P}. In this sense, the result in (2) is the best possible short of proving a major separation.

Proof We apply the checkpoint technique to the branching program, recursively, with parameter b. For simplicity we first assume that S is a power of b. Each application of the technique reduces the path length by a factor of b. Hence with \log _{b}S applications we can reduce the path length to 1.

In one application, we have an \exists quantifier over b-1 nodes, corresponding to an Or gate with fan-in S^{b-1}, and then a \forall quantifier over b smaller paths, corresponding to an And gate with fan-in b. This gives a tree with S^{b-1}\cdot b\le S^{b} leaves. Iterating, the number of leaves will be

\begin{aligned} (S^{b})^{\log _{b}S}. \end{aligned}

Each leaf can be connected to the input bit on which it depends. The size of the circuit is at most c times the number of leaves.

If S is not a power of b we can view the branching program as having S'\le bS nodes, where S' is a power of b. QED

The following is a uniform version of Theorem 7.10, and the proof is similar. It shows that we can speed up space-bounded computation with alternations.

Theorem 7.11. [17] Any f\in L is also in \Sigma _{c_{f}/\epsilon }\text {Time}(n^{\epsilon }), for any \epsilon >0.

Proof Let G be the configuration graph of a machine computing f, on inputs of length n. Note this graph has n^{c_{f}} nodes. We need to decide if the start configuration reaches the accept configuration in this graph within t:=n^{c_{f}} steps.

We apply to this graph the checkpoint technique recursively, with parameter b:=n^{\epsilon /2}. Each application of the technique reduces the path length by a factor of b. Hence with 2c_{f}/\epsilon applications we can reduce the path length to

\begin{aligned} \frac {t}{b^{2c_{f}/\epsilon }}=\frac {n^{c_{f}}}{n^{c_{f}}}\le 1. \end{aligned}

Each quantifier ranges over b\log n^{c_{f}}=c_{f}n^{\epsilon /2}\log n\le n^{\epsilon } bits for large enough n.

It remains to check a path of length 1, i.e., an edge. The endpoints of this edge are two configurations u and v which depend on the quantified bits. The machine can compute the two endpoints in time \log ^{c}m, where m is the total number of quantified bits, using rapid access. Once it has u and v it can check if u leads to v in one step by reading one bit from the input. Note m\le n^{\epsilon }\cdot c_{f}/\epsilon , so \log ^{c}m\le c_{f}n^{\epsilon }. QED

We note that by the generic simulation of alternating time by small-depth circuits in Lemma 6.2, the above theorem also gives a result similar to Theorem 7.10.

7.4 Reductions: L vs. P

Again, we can use reductions to relate the space complexity of problems. In particular we can identify the problems in P which have space-efficient algorithms iff every problem in P does.

Definition 7.6. A problem f is P-complete if f\in \text {P} and f\in \text {L}\Rightarrow \text {L}=\text {P}.

Definition 7.7. The circuit value problem: Given a circuit C and an input x, compute C(x).

Theorem 7.12. Circuit value is P-complete.

Proof Circuit value is in P since we can evaluate one gate at a time. Now let g\in \text {P}. We can reduce computing g on input x to a circuit-value instance, as in the simulation of TMs by circuits in Theorem ??. The important point is that this reduction is computable in L. Indeed, given an index to a gate in the circuit, we can compute the type of the gate and indices to its children via simple arithmetic, which is in L by Theorem 7.5, plus some computation which only depends on the description of the P-time machine for g. QED

Definition 7.8. The monotone circuit value problem: Given a circuit C with no negations and an input x, compute C(x).

Exercise 7.7. Prove that monotone circuit value is P-complete.

Recall from section 7.2.3 that finding solutions to linear systems

\begin{aligned} Ax=b \end{aligned}

has space-efficient algorithms. Interestingly, if we generalize equalities to inequalities the problem becomes P-complete. This set of results illustrates a sense in which “linear algebra” is easier than “optimization.”

Definition 7.9. The linear inequalities problem: Given a d\times d matrix A of integers and a d-dimensional vector b, determine if the system Ax\le b has a solution over the reals.

Theorem 7.13. Linear inequalities is P-complete.

Proof The ellipsoid algorithm shows that Linear inequalities is in P, but we will not discuss this classic result.

Instead, we focus on showing how given a circuit C and an input x we can construct a set of inequalities that are satisfiable iff C(x)=1.

We shall have as many variables v_{i} as gates in the circuit, counting input gates as well.

For an input gate g_{i}=x_{i} add equation v_{i}=x_{i}.

For a Not gate g_{i}=\text {Not}(g_{j}) add equation v_{i}=1-v_{j}.

For an And gate g_{i}=\text {And}(g_{j},g_{k}) add equations 0\le v_{i}\le 1,v_{i}\le v_{j},v_{i}\le v_{k},v_{j}+v_{k}-1\le v_{i}.

The case of Or is similar, or can be dispensed with by writing an Or using Not and And.

Finally, if g_{i} is the output gate add equation v_{i}=1.

We claim that in every solution to Av\le b the value of v_{i} is the value in \{0,1\} of gate g_{i} on input x. This can be proved by induction. For input and Not gates this is immediate. For an And gate, note that if v_{j}=0 then v_{i}=0 as well because of the equations v_{i}\ge 0 and v_{i}\le v_{j}. The same holds if v_{k}=0. If both v_{j} and v_{k} are 1 then v_{i} is 1 as well because of the equations v_{i}\le 1 and v_{j}+v_{k}-1\le v_{i}. QED
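For illustration, here is a Python sketch that emits the inequalities of this reduction for a given circuit and input; the gate-list representation and the function names are ours.

```python
# A circuit is a list of gates in topological order: ("in", j), ("not", j), or
# ("and", j, k); the last gate is the output. Each inequality (coeffs, b)
# means sum over v of coeffs[v] * x_v <= b; an equality is encoded as two
# opposite inequalities.

def circuit_to_inequalities(gates, x):
    ineqs = []
    def eq(coeffs, b):
        ineqs.append((coeffs, b))
        ineqs.append(({v: -a for v, a in coeffs.items()}, -b))
    for i, g in enumerate(gates):
        if g[0] == "in":
            eq({i: 1}, x[g[1]])                     # v_i = x_j
        elif g[0] == "not":
            eq({i: 1, g[1]: 1}, 1)                  # v_i + v_j = 1, i.e. v_i = 1 - v_j
        else:
            _, j, k = g
            ineqs += [({i: -1}, 0), ({i: 1}, 1),    # 0 <= v_i <= 1
                      ({i: 1, j: -1}, 0),           # v_i <= v_j
                      ({i: 1, k: -1}, 0),           # v_i <= v_k
                      ({i: -1, j: 1, k: 1}, 1)]     # v_j + v_k - 1 <= v_i
    eq({len(gates) - 1: 1}, 1)                      # output gate equals 1
    return ineqs

# And(x_0, Not(x_1)) on x = (1, 0): satisfiable, with v = (1, 0, 1, 1).
gates = [("in", 0), ("in", 1), ("not", 1), ("and", 0, 2)]
print(len(circuit_to_inequalities(gates, (1, 0))))  # 13 inequalities
```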

7.4.1 Nondeterministic space

Because of the insight we gained from considering non-deterministic time-bounded computation in section 5.1, we are naturally interested in non-deterministic space-bounded computation. In fact, perhaps we will gain even more insight, because this notion will really challenge our understanding of computation.

For starters, let us define non-deterministic space-bounded computation. A naive approach is to define it using the quantifiers from section 6.4, leading to the class \exists \cdot \text {L}. This is an ill-fated choice:

Exercise 7.8. Prove \exists \cdot \text {L}=\exists \cdot \text {P}.

Instead, non-deterministic space is defined in terms of non-deterministic TMs.

Definition 7.10. A function f:\{0,1\}^* \to \{0,1\} is computable in \text {NSpace}(s(n)) if there is a two-tape TM which on input x never writes on the first tape and never uses more than s(n) cells on the second, and moreover:

1. The machine is equipped with a special “Guess” state. Upon entering this state, a guess bit is written on the work tape under the head.

2. f(x)=1 iff there exists a choice for the guess bits that causes the machine to output 1.

We define

\begin{aligned} \text {NL}:=\bigcup _{d}\text {NSpace}(d\log n). \end{aligned}

How can we exploit this non-determinism? Recall from section 7.2.2 that reachability in undirected graphs is in L. It is unknown if the same holds for directed graphs. However, we can solve it in NL.

Definition 7.11. The directed reachability problem: Given a directed graph G and two nodes s and t, decide if there is a path from s to t.

Theorem 7.14. Directed reachability is in NL.

Proof The proof simply amounts to guessing a path in the graph. The algorithm is as follows:

“On input G,s,t, let v:=s.

For i=0 to |G|:

      If v=t, accept.

      Guess a neighbor w of v. Let v:=w.

If you haven’t accepted, reject.”

The space needed is |v|+|i|=c\log |G|. QED
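One way to make the guessing concrete: view the non-deterministic machine as a deterministic verifier reading a string of guess bits. A Python sketch, with a representation of our own choosing:

```python
# Nondeterminism made concrete: a deterministic verifier that reads one guess
# per step, choosing which out-neighbor to move to.

def accepts(adj, s, t, guesses):
    """The state carried across steps is just (v, i): c log |G| bits."""
    v = s
    for i in range(len(adj) + 1):
        if v == t:
            return True
        if not adj[v]:               # dead end: this guess sequence fails
            return False
        v = adj[v][guesses[i] % len(adj[v])]
    return False

adj = {0: [1, 2], 1: [2], 2: []}            # edges 0->1, 0->2, 1->2
assert accepts(adj, 0, 2, [0, 0, 0, 0])     # these guesses select 0 -> 1 -> 2
```

There is a path from s to t iff some guess string leads to acceptance.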

We can define NL-completeness similarly to NP- and P-completeness, and we have the following result.

Theorem 7.15. Directed reachability is NL-complete. That is, it is in NL and it is in L iff \text {L}=\text {NL}.

Exercise 7.9. Prove this.

7.4.2 Nspace vs. time

Recall that by definition \text {Space}(s(n))\subseteq \text {NSpace}(s(n)). We showed \text {Space}(s(n))\subseteq \text {Time}(n^{c}c^{s(n)}) in Theorem 7.1. We can strengthen this inclusion to show that it holds even for non-deterministic space.

Theorem 7.16. \text {NSpace}(s(n))\subseteq \text {Time}(n^{c}c^{s(n)}).

Proof On input x, we compute the configuration graph G of the non-deterministic machine on input x. The nodes are the configurations, and there is an edge from u to v if the machine can go from u to v in one step. Then we solve reachability on this graph in power time, using say breadth-first search. QED

The next theorem shows that non-deterministic space is not much more powerful than deterministic space: it buys at most a square. Contrast this with the P vs. NP question! The best deterministic simulation of NP that we know is the trivial \text {NP}\subseteq \text {Exp}. Thus the situation for space is entirely different.

How can this be possible? The high-level idea, which was used already in Lemma 7.1, can be cast as follows:

Unlike time, space can be reused.

Theorem 7.17. [23] \text {NSpace}(s(n))\subseteq \text {Space}(cs^{2}(n)), for every s(n)\ge \log n. In particular, \text {NPSpace}=\text {PSpace}.

Proof We use the checkpoint technique with parameter b=2, and re-use the space to verify the smaller paths. Let N be a non-deterministic TM computing a function in \text {NSpace}(s(n)). We aim to construct a deterministic TM M that on input x returns

\begin{aligned} \text {Reach}(C_{\text {start}},C_{\text {accept}},c^{s(n)}), \end{aligned}

where \text {Reach}(u,v,t) decides if v is reachable from u in \le t steps in the configuration graph of N on input x, and C_{\text {start}} is the start configuration, C_{\text {accept}} is the accept configuration, and c^{s(n)} is the number of configurations of N.

The key point is how to implement Reach.

Computing \text {Reach}(u,v,t):

For all “middle” configurations m:

      If both \text {Reach}(u,m,t/2)=1 and \text {Reach}(m,v,t/2)=1 then Accept.

Reject.
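For illustration, a Python sketch of this recursion on an explicit graph (representation ours); the recursion stack stores one triple (u,v,t) per level, which is exactly the recurrence for S(t) below.

```python
# reach(u, v, t): is there a path of length <= t from u to v?
def reach(edge, nodes, u, v, t):
    if t <= 1:
        return u == v or edge(u, v)
    return any(reach(edge, nodes, u, m, t // 2) and
               reach(edge, nodes, m, v, t - t // 2)
               for m in nodes)

nodes = range(4)
edges = {(0, 1), (1, 2), (2, 3)}
is_edge = lambda a, b: (a, b) in edges
assert reach(is_edge, nodes, 0, 3, 4) and not reach(is_edge, nodes, 3, 0, 4)
```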

Let S(t) denote the space needed for computing \text {Reach}(u,v,t). We have

\begin{aligned} S(t)\le cs(n)+S(t/2). \end{aligned}

This is because we can re-use the space for the two calls to Reach. Since the recursion has depth \log c^{s(n)}\le cs(n), the space for \text {Reach}(C_{\text {start}},C_{\text {accept}},c^{s(n)}) is

\begin{aligned} \le \underbrace {cs(n)+cs(n)+\ldots +cs(n)}_{cs(n)\text { terms}}\le cs^{2}(n). \end{aligned}

QED

Recall that we do not know if \text {NTime}(t) is closed under complement. It is generally believed not to be the case, and we showed that if it is then the PH collapses (Exercise 6.3).

What about space? Is \text {NSpace}(s) closed under complement? Theorem 7.17 shows \text {NSpace}(s(n))\subseteq \text {Space}(cs^{2}(n)). So if f\in \text {NSpace}(s(n)) then the complement function 1-f is in \text {Space}(cs^{2}(n)) (and in particular in \text {NSpace}(cs^{2}(n))). Hence, up to a quadratic loss in space, non-deterministic space is closed under complement.

Can we avoid squaring the space?

Yes! This is weird!

Theorem 7.18. The complement of directed reachability is in NL.

Proof We want a non-deterministic 2TM that given G, s, and t accepts if there is no path from s to t in G.

For starters, suppose the machine has computed the number C of nodes reachable from s. The key idea is that there is no path from s to t iff there are C nodes different from t reachable from s. Thus, knowing C we can solve the problem as follows.

Algorithm for deciding if there is no path from s to t, given C:

Initialize \text {Count}:=0. Enumerate over all nodes v\ne t:

      Guess a path from s of length |G|. If the path reaches v, increase \text {Count} by 1.

If \text {Count}=C Accept, else Reject.

 

It remains to compute C.

Let A_{i} be the set of nodes at distance \le i from s, and let C_{i}:=|A_{i}|. Note A_{0}=\{s\} and C_{0}=1. We seek to compute C=C_{|G|}.

To compute C_{i+1} from C_{i}, enumerate over nodes v (the candidates for A_{i+1}). For each v, enumerate over all nodes w in A_{i}, and check if w=v or w\to v is an edge. If some w passes this check, increase C_{i+1} by 1.

The enumeration over A_{i} is done by guessing C_{i} nodes, in increasing order, together with a path of length \le i from s to each of them. If we don’t find C_{i} such nodes, we reject. QED
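For illustration, here is a Python sketch of the inductive-counting logic. To keep it deterministic and runnable, the guessed certificates ("w is in A_{i}") are replaced by an explicit search; the NL machine instead guesses paths as described above. All names are ours.

```python
def in_A(adj, s, v, i):
    """Is v at distance <= i from s?  (The NL machine certifies membership
    by guessing a path of length <= i; here we just do a layered search.)"""
    frontier, seen = {s}, {s}
    for _ in range(i):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return v in seen

def count_reachable(adj, s):
    """Compute C = C_{|G|} via the inductive counting C_0, C_1, ..."""
    C = 1                                   # C_0 = |A_0| = |{s}|
    for i in range(len(adj)):
        # v is in A_{i+1} iff some w in A_i has w == v or an edge w -> v.
        C = sum(1 for v in adj
                if any(in_A(adj, s, w, i) and (w == v or v in adj[w])
                       for w in adj))
    return C

adj = {0: [1], 1: [2], 2: [], 3: [0]}       # node 3 is not reachable from 0
assert count_reachable(adj, 0) == 3         # the reachable nodes are 0, 1, 2
# "No path from s to t" iff C nodes different from t are reachable from s.
```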

7.5 TiSp

So far in this chapter we have focused on bounding the space usage. For this, the TM model was sufficient, as remarked at the beginning. It is natural to consider algorithms that operate in little time and little space simultaneously. For this, of course, whether we use TMs or RAMs makes a difference.

Definition 7.12. Let \text {TiSp}(t,s) be the functions computable on a RAM that on every input x\in \{0,1\} ^{n} runs in time t(n) and only uses memory cells 0 to s(n)-1.

Exercise 7.10. Prove \text {L}=\bigcup _{d}\text {TiSp}(n^{d},d).

An alternative definition of TiSp would allow the RAM to access s(n) cells anywhere in memory. One can maintain a data structure to show that this alternative definition is equivalent to Definition 7.12.

To illustrate the relationship between TiSp, Time, and Space, consider undirected reachability. It is solvable in \text {Time}(n\log ^{c}n) by breadth-first search, and in logarithmic space by Theorem 7.8. But it is not known if it is in \text {TiSp}(n\log ^{a}n,a\log n) for some constant a.

Exercise 7.11. Prove that Theorem 7.11 holds even for any function f in \text {TiSp}(n^{a},n^{1-\epsilon }) for any a and \epsilon >0.

The following is a non-uniform version of TiSp.

Definition 7.13. A branching program of length t and width W is a branching program where the nodes are partitioned in t layers L_{1},L_{2},\ldots ,L_{t} where nodes in L_{i} only lead to nodes in L_{i+1}, and |L_{i}|\le W for every i.

Thus t represents the time of the computation, and \log W the space.

Recall that Theorem 7.4 gives bounds of the form \ge cn^{2}/\log n on the size of branching programs (without distinguishing between length and width). For branching programs of length t and width W this bound gives t\ge cn^{2}/W\log n. But it gives nothing for power width, like W=n^{2}. The state-of-the-art for power width is t\ge \Omega (n\sqrt {\log n/\log \log n}) [7] (in fact the bound holds even for subexponential width).

With these definitions in hand we can refine the connection between branching programs and small-depth circuits in Theorem 7.10 for circuits of depth 3.

Theorem 7.19. Let f:\{0,1\} ^{n}\to \{0,1\} be computable by a branching program with width W and time t. Then f is computable by an alternating depth-3 circuit with \le 2^{c\sqrt {t\log W}} wires.

We will later show explicit functions that require depth-3 circuits of size 2^{c\sqrt {n}}. Theorem 7.19 shows that improving this would also improve results for small-width branching programs, a refinement of the message emphasized after Theorem 7.10.

Exercise 7.12. Prove Theorem 7.19.

7.6 Notes

For a discussion of the complexity of division, see [3]. For a compendium of P-complete problems see [11].

References

[1]   Miklós Ajtai. \Sigma ^{1}_{1}-formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.

[2]   Miklós Ajtai. A non-linear time lower bound for boolean branching programs. Theory of Computing, 1(1):149–176, 2005.

[3]   Eric Allender. The division breakthroughs. Bulletin of the EATCS, 74:61–77, 2001.

[4]   Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple constructions of almost k-wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992.

[5]   Kenneth E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, volume 32, pages 307–314, 1968.

[6]   Paul Beame, Stephen A. Cook, and H. James Hoover. Log depth circuits for division and related problems. SIAM J. Comput., 15(4):994–1003, 1986.

[7]   Paul Beame, Michael Saks, Xiaodong Sun, and Erik Vee. Time-space trade-off lower bounds for randomized computation of decision problems. J. of the ACM, 50(2):154–195, 2003.

[8]   Stephen A. Cook. A hierarchy for nondeterministic time complexity. J. of Computer and System Sciences, 7(4):343–353, 1973.

[9]   L. Csanky. Fast parallel matrix inversion algorithms. SIAM J. Comput., 5(4):618–623, 1976.

[10]   Aviezri S. Fraenkel and David Lichtenstein. Computing a perfect strategy for n x n chess requires time exponential in n. J. Comb. Theory, Ser. A, 31(2):199–214, 1981.

[11]   Raymond Greenlaw, H. James Hoover, and Walter Ruzzo. Limits to Parallel Computation: P-Completeness Theory. Oxford University Press, 1995.

[12]   John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. On time versus space. J. ACM, 24(2):332–337, 1977.

[13]   Richard M. Karp and Richard J. Lipton. Turing machines that take advice. L’Enseignement Mathématique. Revue Internationale. IIe Série, 28(3-4):191–209, 1982.

[14]   Leonid A. Levin. Universal sequential search problems. Problemy Peredachi Informatsii, 9(3):115–116, 1973.

[15]   Joseph Naor and Moni Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. on Computing, 22(4):838–856, 1993.

[16]   E. I. Nechiporuk. A boolean function. Soviet Mathematics-Doklady, 169(4):765–766, 1966.

[17]   Valery A. Nepomnjaščiĭ. Rudimentary predicates and Turing calculations. Soviet Mathematics-Doklady, 11(6):1462–1465, 1970.

[18]   NEU. From RAM to SAT. Available at http://www.ccs.neu.edu/home/viola/, 2012.

[19]   Wolfgang J. Paul, Nicholas Pippenger, Endre Szemerédi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In IEEE Symp. on Foundations of Computer Science (FOCS), pages 429–438, 1983.

[20]   Omer Reingold. Undirected connectivity in log-space. J. of the ACM, 55(4), 2008.

[21]   J. M. Robson. N by N checkers is exptime complete. SIAM J. Comput., 13(2):252–267, 1984.

[22]   Rahul Santhanam. On separators, segregators and time versus space. In IEEE Conf. on Computational Complexity (CCC), pages 286–294, 2001.

[23]   Walter J. Savitch. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4(2):177–192, 1970.

[24]   Michael Sipser. A complexity theoretic approach to randomness. In ACM Symp. on the Theory of Computing (STOC), pages 330–335, 1983.

[25]   Seinosuke Toda. PP is as hard as the polynomial-time hierarchy. SIAM J. on Computing, 20(5):865–877, 1991.

[26]   Leslie G. Valiant and Vijay V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85–93, 1986.

[27]   N. V. Vinodchandran. A note on the circuit complexity of PP. Theor. Comput. Sci., 347(1-2):415–418, 2005.

[28]   Emanuele Viola. On approximate majority and probabilistic time. Computational Complexity, 18(3):337–375, 2009.

Mathematics of the impossible: Computational Complexity, Chapter 6, Alternation

We placed one quantifier “in front” of computation and got something interesting: NP. So let’s push the envelope.

Definition 6.1. \Sigma _{i}Time(t(n)) is the set of functions f:X\subseteq \{0,1\}^* \to \{0,1\} for which there is a RAM M such that on input (x,y_{1},y_{2},\ldots ,y_{i}) stops within t(|x|) steps and

\begin{aligned} f(x)=1\Leftrightarrow \exists y_{1}\forall y_{2}\exists y_{3}\forall y_{4}\ldots Q_{i}y_{i}\in \{0,1\} ^{t(|x|)}:M(x,y_{1},y_{2},\ldots ,y_{i})=1. \end{aligned}

\Pi _{i}Time(t(n)) is defined similarly except that we start with a \forall quantifier. We also define

\begin{aligned} \Sigma _{i}\text {P}:= & \bigcup _{d}\Sigma _{i}\text {Time}(n^{d}),\\ \Pi _{i}\text {P}:= & \bigcup _{d}\Pi _{i}\text {Time}(n^{d}),\text { and}\\ \text {the power hierarchy PH}:= & \bigcup _{i}\Sigma _{i}\text {P}=\bigcup _{i}\Pi _{i}\text {P}. \end{aligned}

We refer to such computation and the corresponding machines as alternating, since they involve alternation of quantifiers and we will soon see a connection with alternating circuits.

As for NP, Definition 5.1, note that the running time of M is a function of |x| only. Again, this difference is inconsequential for \Sigma _{i}\text {P}, since the composition of two powers is another power. But it is important for a more fine-grained analysis.

Exercise 6.1. Min-Ckt is the problem of deciding if an input circuit has an equivalent circuit which is smaller. It is not known to be in NP. In which of the above classes can you place it?

6.1 Does the PH collapse?

We refer to the event that \exists i:\Sigma _{i}\text {P}=\text {PH} as “the PH collapses.” It is unknown if the PH collapses. Most people appear to believe that it does not, and to consider statements of the type

\begin{aligned} X\Rightarrow \text {PH collapses} \end{aligned}

as evidence that X is false. Examples of such statements are discussed next.

Theorem 6.1. \text {P}=\text {NP}\Rightarrow \text {P}=\text {PH}.

The idea in the proof is simply that if you can remove a quantifier then you can remove more.

Proof We prove by induction on i that \Sigma _{i}\text {P}\cup \Pi _{i}\text {P}=\text {P}.

The base case i=1 follows by assumption and the fact that P is closed under complement.

Next we do the induction step. We assume the conclusion is true for i and prove it for i+1. We will show \Sigma _{i+1}\text {P}=\text {P}. The result about \Pi _{i+1}P follows again by complementing.

Let L\in \Sigma _{i+1}\text {P}, so there are a constant a and a power-time TM M such that for any x\in \{0,1\} ^{n},

\begin{aligned} x\in L\Leftrightarrow \exists y_{1}\forall y_{2}\exists y_{3}\forall y_{4}\ldots Q_{i+1}y_{i+1}\in \{0,1\} ^{n^{a}}:M(x,y_{1},y_{2},\ldots ,y_{i+1})=1. \end{aligned}

(As discussed after Definition 6.1 we don’t need to distinguish between time as a function of |x| or of |(x,y_{1},y_{2},\text {\dots },y_{i+1})| when considering power times as we are doing now.)

Now the creative step of the proof is to consider

\begin{aligned} L':=\left \{ (x,y_{1}):\forall y_{2}\in \{0,1\} ^{n^{a}}\ldots Q_{i+1}y_{i+1}\in \{0,1\} ^{n^{a}}:M(x,y_{1},y_{2},\ldots ,y_{i+1})=1\right \} . \end{aligned}

Note L'\in \Pi _{i}\text {P}. By the induction hypothesis L'\in \text {P}. So let TM M' solve L' in power time. Then x\in L\iff \exists y_{1}\in \{0,1\} ^{n^{a}}:M'(x,y_{1})=1. And so L\in \text {NP}=\text {P}, again using the hypothesis. QED

Exercise 6.2. Prove the following strengthening of Theorem 6.1:

\begin{aligned} \bigcup _{d}\text {NTime}(dn)\subseteq \text {Time}(n^{1+\epsilon })\Rightarrow \bigcup _{d}\Sigma _{i}\text {Time}(dn)\subseteq \text {Time}(n^{1+\epsilon ^{c_{i}}}). \end{aligned}

Exercise 6.3. Show that if \Sigma _{i}\text {P}=\Pi _{i}\text {P} for some i then the PH collapses to \Sigma _{i}\text {P}, that is, \text {PH}=\Sigma _{i}\text {P}.

Theorem 6.2. [21] \text {NP}\subseteq \text {PCkt}\Rightarrow \text {PH}=\Sigma _{2}\text {P}.

Proof We’ll show \Pi _{2}\text {P}\subseteq \Sigma _{2}\text {P} and then appeal to Exercise 6.3. Let f\in \Pi _{2}\text {Time}(n^{d}) and M be a corresponding machine s.t.

\begin{aligned} f(x)=1\Leftrightarrow \forall y_{1}\in \{0,1\} ^{n^{d}}\exists y_{2}\in \{0,1\} ^{n^{d}}:M(x,y_{1},y_{2})=1. \end{aligned}

We claim the following equivalent expression for the right-hand side above:

\begin{aligned} \forall y_{1}\in \{0,1\} ^{n^{d}}\exists y_{2}\in \{0,1\} ^{n^{d}}:M(x,y_{1},y_{2})=1\Leftrightarrow \exists C\forall y_{1}\in \{0,1\} ^{n^{d}}:M(x,y_{1},C(x,y_{1}))=1, \end{aligned}

where C ranges over circuits of size |x|^{d'} for some d'. If the equivalence is established the result follows, since evaluating a circuit can be done in power time.

To prove the equivalence, first note that the \Leftarrow direction is obvious, by setting y_{2}:=C(x,y_{1}). The interesting direction is \Rightarrow . We claim that under the assumption, there is a circuit that given x,y_{1} outputs a string y_{2} that makes M(x,y_{1},y_{2}) accept, if there is such a string.

To verify this, consider the problems CktSat and Search-CktSat, which are analogous to the 3Sat and Search-3Sat problems but for general circuits rather than 3CNFs. CktSat is in NP, and so by assumption has power-size circuits. By the reduction in Theorem 4.5, Search-CktSat has power-size circuits S as well. Hence, the desired circuit C may, on input x and y_{1}, produce a new circuit W mapping an input y_{2} to M(x,y_{1},y_{2}), and run S on W. QED

Exercise 6.4. Prove that \text {PH}\not \subseteq \text {CktGates}(n^{k}), for any k\in \mathbb {N}. (Hint: Existentially guess the truth table of a hard function.)

Improve this to \Sigma _{2}\text {P}\not \subseteq \text {CktGates}(n^{k}).

Exercise 6.5. Prove \text {Exp}\subseteq \text {PCkt}\Rightarrow \text {Exp}=\Sigma _{2}\text {P}.

6.2 PH vs. alternating circuits

As suggested by the word alternation in Definition 6.1, the power hierarchy PH and its subclasses can be seen as alternating circuits. Before presenting this connection it is useful to write problems in PH in a specific format. The next lemma shows that we can restrict the machine M in Definition 6.1 to read only one bit of the input x. The price for this is that we have to introduce an extra quantifier; however, this new quantifier only ranges over \log t bits.

Lemma 6.1. Let L\in \Sigma _{i}\text {P}. Then there exists a RAM M s.t.:

\begin{aligned} x\in L\Leftrightarrow & \exists y_{1}\in \{0,1\} ^{t(|x|)}\forall y_{2}\in \{0,1\} ^{t(|x|)}\ldots Q_{i-1}y_{i-1}\in \{0,1\} ^{t(|x|)}\\ & Q_{i}(y_{i},z)\in \{0,1\} ^{2t(|x|)}Q_{i+1}y_{i+1}\in \{0,1\} ^{\log t(|x|)}:M(x,y_{1},y_{2},\ldots ,y_{i+1})=1, \end{aligned}

and M on input (x,y_{1},y_{2},\ldots ,y_{i+1}) stops within ct(|x|) steps and only reads one bit of x.

Note the first i-1 quantifiers are over t bits and unchanged from Definition 6.1, the next one is over 2t bits, written as a pair (y_{i},z), and the last is over \log t bits. The idea is… you guessed it! We are going to guess the bits read from the input, and verify that each of them is correct.

Proof Let M' be the machine corresponding to L in Definition 6.1. We assume that Q_{i}=\exists , i.e., i is odd. The case Q_{i}=\forall is left as an exercise.

We enlarge Q_{i} to quantify over additional t bits z which correspond to the bits of x read by M'. The desired machine M simulates M' with the following change. At any time step s, if M' reads bit j of x:

(1) If s\ne y_{i+1} then M does not read x and instead uses bit z_{s},

(2) If s=y_{i+1} then M reads the corresponding bit of x, and it also checks that this bit equals z_{s}. If it doesn’t M rejects.

By inspection, M reads at most one bit of x.

Correctness is argued as follows. For any y_{1},y_{2},\ldots ,y_{i}, if M' accepts x then there are values z for the bits read from the input x that cause M to accept. Conversely, if M accepts for some z then this z matches the bits read from x by M', for else M would reject in (2). Hence M' accepts x as well. QED

We now show how to simulate computation in PH by small-depth circuits. The size of the circuit is exponential in the time; such a simulation would not be interesting if the circuit is not explicit and the time is more than the input length, since every function on n bits has circuits of size 2^{n} (Theorem 2.3). By contrast, the simulation will give explicit circuits and also apply to running times less than the input length. The setting of running time less than the input length makes the simulation interesting even for non-explicit circuits, and is soon to play a critical role.

Lemma 6.2. Any function in \Sigma _{d}\text {Time}(t)\bigcup \Pi _{d}\text {Time}(t) has on inputs of length n alternating circuits of depth d+1 and 2^{cdt(n)} gates. The fan-in of each gate is \le 2^{ct(n)} and the fan-in of the gates closest to the input is \le t(n). Moreover, the circuit is explicit in the following sense: Given an index to a gate g of fan-in h and a number i\le h we can compute the index of child i of g in \text {Time}(cn).

In fact, most of this circuit only depends on t. The only thing that depends on the actual function being computed are the connections between the gates closest to the input and the input.

Proof We apply Lemma 6.1. Then an \exists (resp. \forall ) quantifier on b bits corresponds to an Or (resp. And) gate with fan-in 2^{b}. A gate g closest to the input corresponds to the computation of the RAM for fixed quantified variables. This computation depends on at most one bit of x. If this computation is a constant independent of x, we simply replace this gate with the appropriate constant. Otherwise, if it depends on a bit x_{j} of x then the computation of the RAM is either x_{j} or \neg x_{j}. Thus we can connect gate g to either input gate x_{j} or input gate \neg x_{j}.

An index to a gate g next to the input is just an assignment to the quantified variables y_{i}. Given such an index and i\le t we can compute in linear time which input bit it depends on. This is done by simulating the machine until the i-th time it reads an input bit. Note this simulation runs in time ct which is linear in the length of an index to the gate. QED

6.3 BPP in PH

It is not known if BPP is contained in NP. However, we can show that BPP is in PH. More precisely, the following two simulations are known. The first optimizes the number of quantifiers, the second the time. This should be contrasted with various conditional results suggesting that in fact a quasilinear deterministic simulation (with no quantifiers) is possible.

Theorem 6.3. For every function t we have:

(1) [35] \text {BPTime}(t)\subseteq \Sigma _{2}\text {Time}(t^{2}\log ^{c}t), and

(2) [41] \text {BPTime}(t)\subseteq \Sigma _{3}\text {Time}(t\log ^{c}t).

A good way to think of these results is as follows. Fix a \text {BPTime}(t) machine M and an input x\in \{0,1\} ^{n}. Now the alternating machine is trying to decide if for most choices of the random bits y we have M(x,y)=1, or if for most choices we have M(x,y)=0. This can be seen as a version of the Majority problem, with two critical features. The first is that it is succinct in the sense of section 5.4.1, that is, the alternating machine does not have access to the exponentially-long majority instance, but rather has access to a small circuit M(x,\cdot ) s.t. M(x,y) is bit y of the majority instance. The second is that instances have a gap. We define this gap-majority problem next. We define the non-succinct version which is of independent interest, and so use the letter n to indicate input length. But recall that for the application later in this section the input is given succinctly, as just remarked, so the gap majority instance will actually be on a number of bits that is exponential in the input length to the alternating machine.

Definition 6.2. \text {Gap-Maj}_{\alpha ,\beta } is the problem of deciding if an input x\in \{0,1\} ^{n} has weight |x|\le \alpha n or |x|\ge \beta n.

As discussed in section 6.2, it is useful to think of alternating computation as alternating circuits. Indeed, the circuit result that is the starting point of all these simulations is the following somewhat surprising construction of small-depth alternating circuits for Gap-Maj. By contrast, (non-gap) Maj does not have small constant-depth alternating circuits, as we will prove in Chapter ??.

Lemma 6.3. [3] \text {Gap-Maj}_{1/3,2/3} has alternating circuits of depth 3 and size n^{c}. Moreover, the gates at distance 1 from the input have fan-in \le c\log n.

Proof This is a striking application of the probabilistic method. For a fixed pair of inputs (x,y) we say that a distribution C on circuits gives (\le p,\ge q) if \mathbb {P}_{C}[C(x)=1]\le p and \mathbb {P}_{C}[C(y)=1]\ge q; and we similarly define gives with reversed inequalities. Our goal is to have a distribution that gives

\begin{aligned} (\le 2^{-n},\ge 1-2^{-n})~~~~(6.1) \end{aligned}

for every pair (x,y)\in \{0,1\} ^{n}\times \{0,1\} ^{n} where |x|\le n/3 and |y|\ge 2n/3. Indeed, if we have that we can apply a union bound over the <2^{n} inputs to obtain a fixed circuit that solves Gap-Maj.

We construct the distribution C incrementally. Fix any pair (x,y) as above. Begin with the distribution C_{\wedge } obtained by picking 2\log n bits uniformly from the input and computing their And. This gives

\begin{aligned} \left ((1/3)^{2\log n},(2/3)^{2\log n}\right ). \end{aligned}

Let p:=(1/3)^{2\log n} and note (2/3)^{2\log n}=p\cdot n^{2}. So we can say that C_{\wedge } gives

\begin{aligned} \left (\le p,\ge p\cdot n^{2}\right ). \end{aligned}

Now consider the distribution C_{\vee } obtained by complementing the circuits in C_{\wedge }. This gives

\begin{aligned} \left (\ge 1-p,\le 1-p\cdot n^{2}\right ). \end{aligned}

Next consider the distribution C_{\wedge \vee } obtained by taking the And of m:=p^{-1}/n independent samples of C_{\vee }. This gives

\begin{aligned} \left (\ge (1-p)^{m},\le (1-p\cdot n^{2})^{m}\right ). \end{aligned}

To make sense of these quantities we use the basic approximations

\begin{aligned} 1+z \le e^{z }\le 1+2z \end{aligned}

valid for all z\in [0,1]. These imply (1-p)^{m}\ge e^{-2pm}=e^{-2/n}\ge 0.9 (using 1-p\ge 1/(1+2p)\ge e^{-2p}, valid for p\le 1/2) and (1-p\cdot n^{2})^{m}\le e^{-pn^{2}m}=e^{-n}. Summarizing, this gives

\begin{aligned} (\ge 0.9,\le e^{-n}). \end{aligned}

Next consider the distribution C_{\vee \wedge } obtained by complementing the circuits in C_{\wedge \vee }. This gives

\begin{aligned} (\le 0.1,\ge 1-e^{-n}). \end{aligned}

Finally, consider the distribution C_{\wedge \vee \wedge } obtained by taking the And of n independent samples of C_{\vee \wedge }. This gives

\begin{aligned} \left (\le 0.1^{n},\ge \left (1-e^{-n}\right )^{n}\right ). \end{aligned}

To make sense of the rightmost quantity we can use the approximation

\begin{aligned} (1+\alpha )^{r}\ge 1+r\alpha \end{aligned}

valid for all \alpha \ge -1 and r\ge 1. Thus this gives

\begin{aligned} \left (\le 0.1^{n},\ge 1-ne^{-n}\right ). \end{aligned}

We have ne^{-n}<2^{-n}. Thus this distribution in particular gives equation (6.1). The bounds on the number of gates and the fan-in hold by inspection. QED

Exercise 6.6. Prove \text { Gap-Maj}_{1/2-1/\sqrt {\log n},1/2+1/\sqrt {\log n}} has alternating circuits of depth c and size n^{c}.

Exercise 6.7. Assume the circuit in Lemma 6.3 is explicit in the sense of Lemma 6.2. Prove Theorem 6.3.

It remains to construct explicit circuits for Gap-Maj. We give a construction which has worse parameters than Lemma 6.3 but is simple and suffices for (1) in Theorem 6.3. The idea is that if the weight of x is large, then we can find a few shifts of the ones in x that cover each of the n bits; but if the weight of x is small we cannot. By “shift” by s we mean the string with bits x_{i\oplus s}, obtained from x by permuting the indices by xoring them with s. (Other permutations would work just as well.)

Lemma 6.4. Let r:=\log n. The following circuit solves \text {GapMaj}_{1/r^{2},1-1/r^{2}} on every x\in \{0,1\} ^{n}:

\begin{aligned} \bigvee s_{1},s_{2},\ldots ,s_{r}\in \{0,1\} ^{r}:\bigwedge i\in \{0,1\} ^{r}:\bigvee j\in \{1,2,\ldots ,r\}:x_{i\oplus s_{j}}. \end{aligned}

Note that

\begin{aligned} \bigwedge i\in \{0,1\} ^{r}:\bigvee j\in \{1,2,\ldots ,r\}:x_{i\oplus s_{j}} \end{aligned}

means that every bit i is covered by some shift s_{j} of the input x.

Proof Assume |x|\le n/r^{2}. Each shift s_{j} contributes at most n/r^{2} ones. Hence all the r shifts together contribute \le n/r ones, and so they do not cover every bit i.

Now assume |x|\ge n(1-1/r^{2}). We show the existence of shifts s_{j} that cover every bit by the probabilistic method. Specifically, for a fixed x we pick the shifts uniformly at random and aim to show that the probability that we do not cover every bit is <1. Indeed:

\begin{aligned} \begin{aligned} & \mathbb {P}_{s_{1},s_{2},\ldots ,s_{r}}[\exists i\in \{0,1\} ^{r}:\forall j\in \{1,2,\ldots ,r\}:x_{i\oplus s_{j}}=0]\\ \le & \sum _{i\in \{0,1\} ^{r}}\mathbb {P}_{s_{1},s_{2},\ldots ,s_{r}}[\forall j\in \{1,2,\ldots ,r\}:x_{i\oplus s_{j}}=0] & \text {(union bound)}\\ = & \sum _{i\in \{0,1\} ^{r}}\mathbb {P}_{s}[x_{i\oplus s}=0]^{r} & \text {(independence of the }s_{i}\text {)}\\ \le & \sum _{i\in \{0,1\} ^{r}}(1/r^{2})^{r} & \text {(by assumption on }|x|\text {)}\\ \le & (2/r^{2})^{r}\\ < & 1, \end{aligned} \end{aligned}

as desired. QED
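As an empirical sanity check of the two cases (not part of the argument), one can sample shifts in Python; the parameters below, r=4 and a single zero in the dense input, are our choices.

```python
import random

r = 4
n = 2 ** r                   # n = 2^r positions

def covered(x, shifts):
    return all(any(x[i ^ s] for s in shifts) for i in range(n))

dense = [1] * n
dense[5] = 0                 # weight n - 1 >= n(1 - 1/r^2)
shifts = [random.randrange(n) for _ in range(r)]
print(covered(dense, shifts))    # True unless all r shifts coincide

sparse = [0] * n
sparse[3] = 1                # weight 1 <= n / r^2
print(covered(sparse, shifts))   # always False: r shifts give <= r < n ones
```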

Exercise 6.8. Prove (1) in Theorem 6.3.

Lemma 6.4 is not sufficient for (2) in Theorem 6.3. One can prove (2) by derandomizing the shifts in Lemma 6.4. This means generating their r^{2} bits using a seed of only r\log ^{c}r bits (instead of the trivial r^{2} bits in Lemma 6.4).

Exercise 6.9. Prove:

(1) \text {P}=\text {NP}\Rightarrow \text {P}=\text {BPP}.

(2) \Sigma _{2}\text {P}\subseteq \text {BPP}\Rightarrow \text {PH collapses}.

6.4 The quantifier calculus

We have extended \text {P} with \exists and \forall quantifiers. We have also extended it with randomness to obtain \text {BPP}. As alluded to before, we can also think of BPP as a quantifier \text {BP} applied to \text {P}. The Unique-3Sat problem (Theorem 5.8) also points to a new quantifier, “exists unique.” We now develop a general calculus of quantifiers, and examine fundamental relationships between them. For simplicity, we only consider power-time computation.

Definition 6.3. Let C be a class of functions mapping \{0,1\}^* \to \{0,1\} . We define L\in \text {Op}\cdot C if there is L'\in C and d\in \mathbb {N} such that

  • \text {Op}=\text {Maj}
    \begin{aligned} x\in L & \Leftrightarrow \mathbb {P}_{y\in \{0,1\} ^{|x|^{d}}}[(x,y)\in L']\ge 1/2. \end{aligned}
  • \text {Op}=\text {BP}
    \begin{aligned} x\in L & \Rightarrow \mathbb {P}_{y\in \{0,1\} ^{|x|^{d}}}[(x,y)\in L']\ge 2/3,\\ x\not \in L & \Rightarrow \mathbb {P}_{y\in \{0,1\} ^{|x|^{d}}}[(x,y)\in L']\le 1/3. \end{aligned}
  • \text {Op}=\oplus (read: parity)
    \begin{aligned} x\in L\Leftrightarrow \text {there is an odd number of }y\in \{0,1\} ^{|x|^{d}}:(x,y)\in L'. \end{aligned}
  • \text {Op}=\exists
    \begin{aligned} x\in L\Leftrightarrow \exists y\in \{0,1\} ^{|x|^{d}}:(x,y)\in L'. \end{aligned}
  • \text {Op}=\forall
    \begin{aligned} x\in L\Leftrightarrow \forall y\in \{0,1\} ^{|x|^{d}}:(x,y)\in L'. \end{aligned}

With this notation we have: \text {NP}=\exists \cdot \text {P}, \text {BPP}=\text {BP}\cdot \text {P}, \Sigma _{2}\text {P}=\exists \cdot \forall \cdot \text {P}.

6.5 PH is a random low-degree polynomial

In this section we prove the following result.

Theorem 6.4. [37] \text {PH}\subseteq \text {BP}\cdot \oplus \cdot \text {P}.

This is saying that any constant number of \exists and \forall quantifiers can be replaced by a BP quantifier followed by a \oplus quantifier. Let’s see what this has to do with the title of this section. Where is the polynomial? Consider polynomials over the field \mathbb {F}_{2}=\{0,1\} . Recall that such a polynomial over n bits is an object like

\begin{aligned} p(x_{1},x_{2},\ldots ,x_{n})=x_{1}\cdot x_{2}+x_{3}+x_{7}\cdot x_{2}\cdot x_{1}+x_{2}+1. \end{aligned}

Because we are only interested in inputs in \{0,1\} we have x^{i}=x for any i\ge 1 and any variable x, so we don’t need to raise variables to powers bigger than one.

Example 6.1. The And function on n bits can be written as the polynomial

\begin{aligned} \text {And}(x_{1},x_{2},\ldots ,x_{n})=\prod _{i=1}^{n}x_{i}. \end{aligned}

The Or function on n bits can be written as the polynomial

\begin{aligned} \text {Or}(x_{1},x_{2},\ldots ,x_{n})=1+\text {And}(1+x_{1},1+x_{2},\ldots ,1+x_{n})=1+\prod _{i=1}^{n}(1+x_{i}). \end{aligned}

For n=2 we have

\begin{aligned} \text {Or}(x_{1},x_{2})=x_{1}+x_{2}+x_{1}\cdot x_{2}. \end{aligned}
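A quick Python check of these identities over \mathbb {F}_{2} on all inputs of a small length:

```python
from itertools import product

# Check And(x) = prod x_i and Or(x) = 1 + prod (1 + x_i) over F_2, for n = 3.
def and_poly(xs):
    out = 1
    for x in xs:
        out = (out * x) % 2
    return out

def or_poly(xs):
    out = 1
    for x in xs:
        out = (out * (1 + x)) % 2
    return (1 + out) % 2

for xs in product([0, 1], repeat=3):
    assert and_poly(xs) == min(xs) and or_poly(xs) == max(xs)
```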

The polynomial corresponding to a PH computation will have an exponential number of monomials, so we can’t write it down. The big sum over all its monomials corresponds to the \oplus in Theorem 6.4. The polynomial will be sufficiently explicit: we will be able to compute each of its monomials in P. Finally, there won’t be just one polynomial; rather, we will have a distribution on polynomials, and that’s the BP part.

Confusing? Like before, a good way to look at this result is in terms of alternating circuits. We state the basic circuit result behind Theorem 6.4 after a definition. The result is of independent interest and will be useful later.

Definition 6.4. A distribution P on polynomials computes a function f:\{0,1\} ^{n}\to \{0,1\} with error \epsilon if for every x we have

\begin{aligned} \mathbb {P}_{P}[P(x)=f(x)]\ge 1-\epsilon . \end{aligned}

Theorem 6.5. Let C:\{0,1\} ^{n}\to \{0,1\} be an alternating circuit of depth d and size s. Then there is a distribution P on polynomials over \mathbb {F}_{2} of degree \log ^{d}(s/\epsilon ) that computes C with error \epsilon .

Ultimately we only need constant error, but the construction requires small error. Jumping ahead, this is because we construct distributions for each gate separately, and we need the error to be small enough for a union bound over all gates in the circuit.

The important point in Theorem 6.5 is that if the depth d is small (e.g., constant) (and the size is not enormous and the error is not too small) then the degree is small as well. For example, for power-size alternating circuits of constant depth the degree is power logarithmic for constant error.

Let us slowly illustrate the ideas behind Theorem 6.5 starting with the simplest case: C is just a single Or gate on n bits.

Lemma 6.5. For every \epsilon and n there is a distribution P on polynomials of degree \log 1/\epsilon in n variables over \mathbb {F}_{2} that computes Or with error \epsilon .

ProofFor starters, pick the following distribution on linear polynomials: For a uniform A=(A_{1},A_{2},\ldots ,A_{n})\in \{0,1\} ^{n} output the polynomial

\begin{aligned} p_{A}(x_{1},x_{2},\ldots ,x_{n}):=\sum _{i}A_{i}\cdot x_{i}. \end{aligned}

Let us analyze how p_{A} behaves on a fixed input x\in \{0,1\} ^{n}:

  • If \text {Or}(x)=0 then p_{A}(x)=0;
  • If \text {Or}(x)=1 then \mathbb {P}_{A}[p_{A}(x)=1]\ge 1/2.

While the error is large in some cases, a useful feature of p_{A} is that it never makes mistakes if \text {Or}(x)=0. This allows us to easily reduce the error by taking t:=\log 1/\epsilon polynomials p_{A} and combining them with an Or:

\begin{aligned} p_{A_{1},A_{2},\ldots ,A_{t}}(x):=p_{A_{1}}(x)\vee p_{A_{2}}(x)\vee \cdots \vee p_{A_{t}}(x). \end{aligned}

The analysis is like before:

  • If \text {Or}(x)=0 then p_{A_{1},A_{2},\ldots ,A_{t}}(x)=0;
  • If \text {Or}(x)=1 then \mathbb {P}_{A_{1},A_{2},\ldots ,A_{t}}[p_{A_{1},A_{2},\ldots ,A_{t}}(x)=1]\ge 1-(1/2)^{t}\ge 1-\epsilon .

It remains to bound the degree. Each p_{A_{i}} has degree 1. The Or on t bits has degree t by Example 6.1. Hence the final degree is t=\log 1/\epsilon . QED
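
The proof translates directly into a small experiment. The following Python sketch (the names p_lin and p_comb are ours) samples the combined polynomial and estimates its error on a fixed input with Or(x)=1; on inputs with Or(x)=0 it always outputs 0, matching the first bullet:

import random

def p_lin(A, x):
    # the linear polynomial sum_i A_i x_i over F_2
    return sum(a * b for a, b in zip(A, x)) % 2

def p_comb(As, x):
    # Or of t independent copies p_{A_1}(x), ..., p_{A_t}(x)
    return 1 if any(p_lin(A, x) for A in As) else 0

n, t, trials = 20, 7, 10000     # t = log(1/eps) for eps = 2^{-7}
x = [1] + [0] * (n - 1)         # a fixed input with Or(x) = 1
errors = 0
for _ in range(trials):
    As = [[random.randrange(2) for _ in range(n)] for _ in range(t)]
    errors += (p_comb(As, x) != 1)
print(errors / trials)          # empirically about (1/2)^t = 1/128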

Exercise 6.10. Obtain the same result for C=\text {And}.

Now we would like to handle general circuits which have any number of And and Or gates. As mentioned earlier, we apply the construction above to every gate, and compose the polynomials. We pick the error at each gate small enough so that we can do a union bound over all gates.

Proof of Theorem 6.5 We apply Lemma 6.5 to every gate in the circuit with error \epsilon /s. By a union bound, the probability that any gate makes a mistake is at most \epsilon , as desired.

The final polynomial is obtained by composing the polynomials of each gate. The composition of a polynomial of degree d_{1} with another of degree d_{2} results in a polynomial of degree d_{1}\cdot d_{2}. Since each polynomial has degree \log (s/\epsilon ), the final degree is \log ^{d}(s/\epsilon ). QED

6.5.1 Back to PH

We have proved Theorem 6.5 which is a circuit analogue of Theorem 6.4. We now go back to the PH. First, we have to come up with a more explicit description of the polynomials, instead of picking them at random. This is similar to the way we proceeded in section 6.3: After a non-explicit construction (Lemma 6.3) we then obtained an explicit construction (Lemma 6.4).

Let us go back to the simplest case of Or. Recall that the basic building block in the proof of Lemma 6.5 was the construction of a distribution p_{A} on linear polynomials which are zero on the all-zero input (which just means that they do not have constant terms), and are often non-zero on any non-zero input. We introduce a definition, since now we will have several constructions with different parameters.

Definition 6.5. A distribution p_{A} on linear polynomials with no constant term has the Or property if \mathbb {P}_{A}[p_{A}(x)=1]\ge 1/3 for any x\ne 0. We identify p_{A} with the n bits A corresponding to its coefficients.

The next lemma shows that we can compute distributions on linear polynomials with the Or property from a seed of just 2\log n+c bits, as opposed to the n bits that were used for A in the proof of Lemma 6.5. Recall that for our application to Lemma 6.5 the polynomials have an exponential number of monomials and so we cannot afford to write them down. Instead we shall guarantee that given a seed r and an index to a monomial we can compute the monomial via a function f in P. In this linear case, for a polynomial in n variables we have \le n monomials x_{i}. So the function f takes as input r and a number i\le n and outputs the coefficient of x_{i}.

Lemma 6.6. [25] Given n, i\le n, and r\in \{0,1\} ^{2\log n+c} we can compute in P a function f(r,i) such that for uniform R\in \{0,1\} ^{2\log n+c} the distribution

\begin{aligned} (f(R,1),f(R,2),\ldots ,f(R,n)) \end{aligned}

has the Or property.

Proof[4] Let q:=2^{\log n+c} and identify the field \mathbb {F}_{q} with bit strings of length \log q. We view r as a pair (s,t)\in (\mathbb {F}_{q})^{2}. Then we define

\begin{aligned} f((s,t),i):=\langle s^{i},t\rangle \end{aligned}

where s^{i} is exponentiation in \mathbb {F}_{q} and \langle .,.\rangle :(\mathbb {F}_{q})^{2}\to \{0,1\} is defined as \langle u,v\rangle :=\sum u_{i}\cdot v_{i} over \mathbb {F}_{2}.

To show that this has the Or property, pick any non-zero x\in \{0,1\} ^{n}. We have to show that

\begin{aligned} p:=\mathbb {P}_{S,T}[\sum _{i}\langle S^{i},T\rangle x_{i}=1]\ge 1/3. \end{aligned}

The critical step is to note that

\begin{aligned} \sum _{i}\langle S^{i},T\rangle x_{i}=\sum _{i}\langle x_{i}\cdot S^{i},T\rangle =\langle \sum _{i}x_{i}\cdot S^{i},T\rangle . \end{aligned}

Now, if x\ne 0, then the probability over S that \sum _{i}x_{i}\cdot S^{i}=0 is \le n/q\le 1/6. This is because any S that gives a zero is a root of the non-zero univariate polynomial \sum _{i}x_{i}\cdot y^{i}, which has degree \le n over \mathbb {F}_{q} and hence at most n roots.

Whenever \sum _{i}x_{i}\cdot S^{i}\ne 0, the probability over T that \langle \sum _{i}x_{i}\cdot S^{i},T\rangle =0 is 1/2. Hence our target probability p above satisfies

\begin{aligned} p\ge (1-1/6)\cdot 1/2\ge 1/3 \end{aligned}

as desired. QED
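
For concreteness, here is a toy Python implementation of this construction (our code, with illustrative parameters): we fix q=16, i.e., \mathbb {F}_{16}=\mathbb {F}_{2}[y]/(y^{4}+y+1), whereas the lemma takes q=2^{\log n+c}. The field arithmetic is the usual carry-less multiplication followed by reduction modulo the irreducible polynomial.

M, IRR = 4, 0b10011        # F_16 = F_2[y] / (y^4 + y + 1)

def gf_mul(a, b):
    # multiplication in F_16: carry-less product, reduced mod IRR
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= IRR
    return r

def gf_pow(s, i):
    r = 1
    for _ in range(i):
        r = gf_mul(r, s)
    return r

def f(s, t, i):
    # f((s,t), i) = <s^i, t>: parity of the bitwise And of s^i and t
    return bin(gf_pow(s, i) & t).count('1') % 2

def coefficients(s, t, n):
    # the n coefficients of p_{(s,t)}; the lemma needs q >= c * n
    return [f(s, t, i) for i in range(1, n + 1)]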

Exercise 6.11. Give an alternative construction of a distribution with the Or property following this guideline.

(1) Satisfy the Or property for every input x with weight 1.

(2) For any j, satisfy the Or property for every input x with weight between 2^{j} and 2^{j+1}. Use Lemma 5.2.

(3) Combine (2) with various j to satisfy the Or property for every input.

(4) State the seed length for your distribution and compare it to that of Lemma 6.6.

With this in hand, we can now reduce the error in the same way we reduced it in the proof of Lemma 6.5.

Lemma 6.7. Given n, a seed r\in \{0,1\} ^{c\log (1/\epsilon )\log n}, and m\le n^{c\log 1/\epsilon } we can compute in P a monomial X_{r,m} of degree c\log 1/\epsilon such that the distribution

\begin{aligned} \sum _{m}X_{R,m} \end{aligned}

for uniform R computes Or with error \epsilon .

ProofWe use the construction

\begin{aligned} p_{A_{1},A_{2},\ldots ,A_{t}}(x):=p_{A_{1}}(x)\vee p_{A_{2}}(x)\vee \cdots \vee p_{A_{t}}(x) \end{aligned}

from the proof of Lemma 6.5, except that each A_{i} is now generated via Lemma 6.6, and that t=c\log 1/\epsilon (as opposed to t=\log 1/\epsilon before). The bound on the degree is the same as before, as is the proof that it computes Or: The error will be (2/3)^{c\log 1/\epsilon }\le \epsilon .

There remains to show that the monomials can be computed in P. For this we can go back to the polynomial for Or in Example 6.1. Plugging that gives

\begin{aligned} p_{A_{1},A_{2},\ldots ,A_{t}}(x)=\sum _{a\in \{0,1\} ^{t}:a\ne 0}\prod _{i\le t}(p_{A_{i}}(x)+a_{i}+1). \end{aligned}

We can use m to index a choice of a and then a choice for a monomial in each of the t linear factors p_{A_{i}}(x)+a_{i}+1. For each factor we can use Lemma 6.6 to compute the monomials. QED

We can now use Lemma 6.7 to prove Theorem 6.4. As in the proof of Theorem 6.5 we will convert each gate to a polynomial.

Proof of Theorem 6.4 Let L\in \text {PH}. Apply Lemma 6.2 to obtain a corresponding alternating circuit C of depth d and size s\le 2^{n^{d}} for a constant d. Replace each gate with the distribution on polynomials given by Lemma 6.7. Note that each of these polynomials is on 2^{n^{c_{d}}} variables and has degree n^{c_{d}}. We set the error \epsilon in Lemma 6.7 to be 1/(3s). This guarantees that the seed length in each application of Lemma 6.7 is \le n^{c_{d}}. Moreover, the seed used to sample the polynomials is re-used across all gates. We can afford this because we use a union bound in the analysis. Hence the seed length for all the polynomials is again just n^{c_{d}}. We can quantify over this many bits using the BP quantifier.

Finally, we have to compose the polynomials at each gate. We are going to show how we can compute the monomials of the composed polynomial in the same way as we computed monomials in Lemma 6.7. Once we have a monomial in the input variables x_{1},x_{2},\ldots ,x_{n} we simply evaluate it in P on the input bits.

Start at the output gate. We use n^{c_{d}} bits in the \oplus quantifier to choose a monomial in the corresponding polynomial. We write down this monomial, using Lemma 6.7. This monomial is over the 2^{n^{d}} variables z_{1},z_{2},\ldots corresponding to the children of the output gate, and only contains n^{c_{d}} variables. To each z_{i} there corresponds a polynomial p_{i}. Choose a monomial from p_{i} and replace z_{i} with that monomial; do this for every i. The choice of the monomials is done again using bits quantified by the \oplus quantifier. We continue in this way until we have monomials just in the input variables x_{1},x_{2},\ldots ,x_{n}. Because each monomial is over n^{c_{d}} variables, and the depth is constant, the total number of bits in the \oplus quantifier, which are needed to choose monomials, is n^{c_{d}}. QED

Exercise 6.12. A set C\subseteq \{0,1\} ^{n} of size 2^{k} is called linear if there exists an n\times k matrix M over \mathbb {F}_{2} such that C=\{Mx:x\in \{0,1\} ^{k}\}.

  1. Recall error-correcting codes from Exercise 2.18. Prove that a linear set C is an error-correcting code iff the weight of any non-zero string in C is at least n/3.
  2. Prove the existence of linear error-correcting codes matching the parameters in Exercise 2.18.
  3. Let S be a subset of \{0,1\} ^{k} s.t. the uniform distribution over S has the Or property. Define |S|\times k matrix M_{S} where the rows are the elements of S. Prove that \{M_{S}x:x\in \{0,1\} ^{k}\} is an error-correcting code.
  4. Give explicit error-correcting codes over ck^{2} bits of size 2^{k}.
  5. This motivates improving the parameters of distributions with the Or property. Improve the seed length in Lemma 6.6 to \log n+c\log \log n. Hint: What property do you need from T?
  6. Give explicit error-correcting codes over k\log ^{c}k bits of size 2^{k}.

6.6 The power of majority

Exercise 6.13. [The power of majority] Prove:

  1. \text {BPP}\cdot \text {P}\subseteq \text {Maj}\cdot \text {P}.
  2. \{(M,x,t):\text { the number of }y\in \{0,1\} ^{|t|}\text { such that }M(x,y)=1\text { is }\ge t\}\in \text {Maj}\cdot \text {P}.
  3. The same as 2. but with \ge replaced by \le .
  4. \text {NP}\subseteq \text {Maj}\cdot \text {P}.
  5. Maj \cdot \oplus \cdot \text {P}\subseteq \text {Maj}\cdot \text {Maj}\cdot \text {P}. (Hint: This is confusing if you don’t have the right angle, otherwise it is not too bad. For starters, forget everything and imagine you have an integer w in some range and you want to know if w is odd by just asking questions of the type w\ge t and w\le t', for various t,t'. You want that the number of questions with answer “yes” only depends on whether w is odd or even.)
  6. \text {PH}\subseteq \text {Maj}\cdot \text {Maj}\cdot \text {P}.

It is not known if \text {NP} has linear-size circuits. We saw in Exercise 6.4 that PH does not have circuits of size n^{k}, for any k. By Exercise 6.13 this holds for \text {Maj}\cdot \text {Maj}\cdot \text {P} as well. The following result improves this to \text {Maj}\cdot \text {P}. It is particularly interesting because it cannot be established using a well-studied class of techniques which includes all the results about PH we have encountered so far.

Theorem 6.6. [40] \text {Maj}\cdot \text {P}\not \subseteq \text {CktGates}(n^{k}), for any k\in \mathbb {N}.

References

[1]   Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59–78. IEEE Computer Society, 2015.

[2]   Leonard Adleman. Two theorems on random polynomial time. In 19th IEEE Symp. on Foundations of Computer Science (FOCS), pages 75–83. 1978.

[3]   Miklós Ajtai. \Sigma ^{1}_{1}-formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.

[4]   Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple constructions of almost k-wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992.

[5]   Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. J. Comput. Syst. Sci., 18(2):155–193, 1979.

[6]   Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. J. of the ACM, 45(3):501–555, May 1998.

[7]   Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J. Comput., 47(3):1087–1097, 2018.

[8]   Kenneth E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, volume 32, pages 307–314, 1968.

[9]   Stephen A. Cook. A hierarchy for nondeterministic time complexity. J. of Computer and System Sciences, 7(4):343–353, 1973.

[10]   Aviezri S. Fraenkel and David Lichtenstein. Computing a perfect strategy for n x n chess requires time exponential in n. J. Comb. Theory, Ser. A, 31(2):199–214, 1981.

[11]   Anka Gajentaan and Mark H. Overmars. On a class of {O}(n^2) problems in computational geometry. Comput. Geom., 5:165–185, 1995.

[12]   M. R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.

[13]   K. Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatsh. Math. Phys., 38, 1931.

[14]   Oded Goldreich. Computational Complexity: A Conceptual Perspective. Cambridge University Press, 2008.

[15]   David Harvey and Joris van der Hoeven. Integer multiplication in time O(n\mathrm {log}\, n). Annals of Mathematics, 193(2):563 – 617, 2021.

[16]   F. C. Hennie. One-tape, off-line turing machine computations. Information and Control, 8(6):553–578, 1965.

[17]   Fred Hennie and Richard Stearns. Two-tape simulation of multitape turing machines. J. of the ACM, 13:533–546, October 1966.

[18]   Russell Impagliazzo and Ramamohan Paturi. The complexity of k-sat. In IEEE Conf. on Computational Complexity (CCC), pages 237–, 1999.

[19]   Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Computer & Systems Sciences, 63(4):512–530, Dec 2001.

[20]   Russell Impagliazzo and Avi Wigderson. \mathit {P} = \mathit {BPP} if E requires exponential circuits: Derandomizing the XOR lemma. In 29th ACM Symp. on the Theory of Computing (STOC), pages 220–229. ACM, 1997.

[21]   Richard M. Karp and Richard J. Lipton. Turing machines that take advice. L’Enseignement Mathématique. Revue Internationale. IIe Série, 28(3-4):191–209, 1982.

[22]   Kojiro Kobayashi. On the structure of one-tape nondeterministic turing machine time hierarchy. Theor. Comput. Sci., 40:175–193, 1985.

[23]   Leonid A. Levin. Universal sequential search problems. Problemy Peredachi Informatsii, 9(3):115–116, 1973.

[24]   O. B. Lupanov. A method of circuit synthesis. Izv. VUZ Radiofiz., 1:120–140, 1958.

[25]   Joseph Naor and Moni Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. on Computing, 22(4):838–856, 1993.

[26]   NEU. From RAM to SAT. Available at http://www.ccs.neu.edu/home/viola/, 2012.

[27]   Christos H. Papadimitriou and Mihalis Yannakakis. Optimization, approximation, and complexity classes. J. Comput. Syst. Sci., 43(3):425–440, 1991.

[28]   Wolfgang J. Paul, Nicholas Pippenger, Endre Szemerédi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In IEEE Symp. on Foundations of Computer Science (FOCS), pages 429–438, 1983.

[29]   Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. J. of the ACM, 26(2):361–381, 1979.

[30]   J. M. Robson. N by N checkers is exptime complete. SIAM J. Comput., 13(2):252–267, 1984.

[31]   Rahul Santhanam. On separators, segregators and time versus space. In IEEE Conf. on Computational Complexity (CCC), pages 286–294, 2001.

[32]   Arnold Schönhage. Storage modification machines. SIAM J. Comput., 9(3):490–508, 1980.

[33]   Claude E. Shannon. The synthesis of two-terminal switching circuits. Bell System Tech. J., 28:59–98, 1949.

[34]   Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Mathematics of Computation, 54(189):435–447, 1990.

[35]   Michael Sipser. A complexity theoretic approach to randomness. In ACM Symp. on the Theory of Computing (STOC), pages 330–335, 1983.

[36]   Larry Stockmeyer and Albert R. Meyer. Cosmological lower bound on the circuit complexity of a small problem in logic. J. ACM, 49(6):753–784, 2002.

[37]   Seinosuke Toda. PP is as hard as the polynomial-time hierarchy. SIAM J. on Computing, 20(5):865–877, 1991.

[38]   Alan M. Turing. On computable numbers, with an application to the entscheidungsproblem. Proc. London Math. Soc., s2-42(1):230–265, 1937.

[39]   Leslie G. Valiant and Vijay V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85–93, 1986.

[40]   N. V. Vinodchandran. A note on the circuit complexity of PP. Theor. Comput. Sci., 347(1-2):415–418, 2005.

[41]   Emanuele Viola. On approximate majority and probabilistic time. Computational Complexity, 18(3):337–375, 2009.

[42]   Emanuele Viola. Reducing 3XOR to listing triangles, an exposition. Available at http://www.ccs.neu.edu/home/viola/, 2011.

[43]   Emanuele Viola. Pseudorandom bits and lower bounds for randomized turing machines. Theory of Computing, 18(10):1–12, 2022.

Mathematics of the impossible: Computational Complexity, Chapter 5, Completeness: Reducing arbitrary computation

In this chapter we show how to reduce arbitrary computation to 3Sat (and hence to the other problems in section 4.3). What powers everything is the following landmark and, in hindsight, simple result which reduces circuit computation to 3Sat.

Theorem 5.1. Given a circuit C:\{0,1\} ^{n}\to \{0,1\} with s gates we can compute in \text {P} a 3CNF formula f_{C} in n+s variables such that for every x\in \{0,1\} ^{n}:

\begin{aligned} C(x)=1\Leftrightarrow \exists y\in \{0,1\} ^{s}:f_{C}(x,y)=1. \end{aligned}

The key idea is to guess the computation and check it efficiently, using the fact that computation is local. The additional s variables one introduces contain the values of the gates during the computation of C on x. We simply have to check that they all correspond to a valid computation, and this can be written as a 3CNF because each gate depends on at most two other gates.

ProofIntroduce a variable y_{i} for each non-input gate g_{i} in C. The value of y_{i} is intended to be the value of gate g_{i} during the computation. Whether the value of a gate g_{i} is correct is a function of 3 variables: y_{i} and the values of the \le 2 gates that feed into g_{i}, some of which could be input variables. This can be written as a 3CNF by Theorem 2.3. Take an And of all these 3CNFs. Finally, add the clause y_{o} for the output gate g_{o}. QED
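
For readers who like code, here is a minimal Python sketch of this gate-by-gate translation (the encoding choices, such as DIMACS-style signed literals, are ours). Each constraint y_{i}\leftrightarrow g(a,b) becomes at most three clauses of width \le 3:

def circuit_to_3cnf(n, gates):
    # gates: list of (op, a, b); op in {'AND', 'OR', 'NOT'}.
    # a, b refer to input variables 1..n or to earlier gates,
    # numbered n+1, n+2, ... in order. The last gate is the output.
    clauses = []
    for k, (op, a, b) in enumerate(gates):
        y = n + 1 + k
        if op == 'AND':            # y <-> (a and b)
            clauses += [[-y, a], [-y, b], [y, -a, -b]]
        elif op == 'OR':           # y <-> (a or b)
            clauses += [[y, -a], [y, -b], [-y, a, b]]
        else:                      # 'NOT': y <-> not a
            clauses += [[y, a], [-y, -a]]
    clauses.append([n + len(gates)])   # the output gate must be 1
    return clauses

# (x1 and x2) or (not x3): variables 4, 5, 6 are the y's
print(circuit_to_3cnf(3, [('AND', 1, 2), ('NOT', 3, None), ('OR', 4, 5)]))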

Exercise 5.1. Write down the 3CNF for the circuit in figure 2.2, as given by the proof of Theorem 5.1.

Theorem 5.1 is a depth-reduction result. Indeed, note that a 3CNF can be written as a circuit of depth c\log s, whereas the original circuit may have any depth. This is helpful for example if you don’t have the depth to run the circuit yourself. You can let someone else produce the computation, and you can check it in small depth.

We can combine Theorem 5.1 with the simulations in Chapter 2 to reduce computation in other models to 3Sat. In particular, we can reduce MTMs running in time t to 3Sat instances of size t\log ^{c}t. To obtain such parameters we need the quasilinear simulation of MTMs by circuits, Theorem 2.5.

However, recall that a quasilinear simulation of RAMs by circuits is not known. Only a power simulation is (which is obtained by combining the power simulation of RAMs by MTMs, Theorem 2.6, with a simulation of MTMs by circuits). This would reduce RAM computation running in time t to 3CNFs of size t^{c}. We content ourselves with this power loss for the beginning of this chapter. Later in section 5.3 we will obtain a quasi-linear simulation using an enjoyable argument which also bypasses Theorem 2.5.

In fact, these simulations apply to a more general, non-deterministic, model of computation. We define this model next, and then present the simulation with power loss in section 5.2.

5.1 Nondeterministic computation

In the concluding equation in Theorem 5.1 there is an \exists quantifier on the right-hand side, but there isn’t one on the left, next to the circuit. However, because the simulation works for every input, we can “stick” a quantifier on the left and have the same result. The resulting circuit computation C(x,y) has two inputs, x and y. We can think of it as a non-deterministic circuit, which on input x outputs 1 iff \exists y:C(x,y). Following the discussion before, we could do the same for other models like TMs, MTMs, and RAMs. The message here is that – if we allow for an \exists quantifier, or in other words consider nondeterministic computation – efficient computation is equivalent to 3CNF! This is one motivation for formally introducing a nondeterministic computational model.

Definition 5.1. NTime(t(n)) is the set of functions f:X\subseteq \{0,1\}^* \to \{0,1\} for which there is a RAM M such that:

f(x)=1 iff \exists y\in \{0,1\} ^{t(|x|)} such that M(x,y)=1, and

M(x,y) stops within t(|x|) steps on every input (x,y).

We also define

\begin{aligned} \text {NP}:= & \bigcup _{d\ge 1}\text {NTime}(n^{d}),\\ \text {NExp}:= & \bigcup _{d\ge 1}\text {NTime}(2^{n^{d}}). \end{aligned}

Note that the running time of M is a function of |x|, not |(x,y)|. This difference is inconsequential for NP, since the composition of two powers is another power. But it is important for a more fine-grained analysis. We refer to a RAM machine as in Definition 5.1 as a nondeterministic machine, and to the y in M(x,y) as the nondeterministic choices, or guesses, of the machine on input x.

We can also define NTime in a way that is similar to BPTime, Definition 2.7. The two definitions are essentially equivalent. Our choice for BPTime is motivated by the identification of BPTime with computation that is actually run. For example, in a programming language one uses an instruction like Rand to obtain random values; one does not think of the randomness as being part of the input. By contrast, NTime is a more abstract model, and the definition with the nondeterministic guesses explicitly laid out is closer in spirit to a 3CNF.

All the problems we studied in section 4.3 are in \text {NP}.

Fact 5.1. 3Sat, Clique, Cover-by-vertexes, SubsetSum, and 3Color are in \text {NP}.

ProofFor a 3Sat instance f, the variables y correspond to an assignment. Checking if the assignment satisfies f is in P. This shows that 3Sat is in NP. QED

Exercise 5.2. Finish the proof by addressing the other problems in Fact 5.1.

How to think of NP

We can think of \text {NP} as the problems which admit a solution that can be verified efficiently, namely in \text {P}. For example for 3Sat it is easy to verify if an assignment satisfies the clauses, for 3Color it is easy to verify if a coloring is such that any edge has endpoints of different colors, for SubsetSum it is easy to verify if a subset has a sum equal to a target, and so on. However, as we saw above this verification step can be cast in a restricted model, namely a 3CNF. So we don’t have to think of the verification step as using the full power of \text {P} computation.

Here’s a vivid illustration of NP. Suppose I claim that the following matrix contains a 9:

56788565634705634705637480563476

70156137805167840132838202386421

85720582340570372307580234576423

80275880237505788075075802346518

78502378564067807582348057285428

05723748754543650350562378804337

52305723485008160234723884077764

86543234567865435674567836738063

45463788486754345743457483460040

73273873486574375464584895741832

85075783485634856237847287422112

83748874883753485745788788223201

How can you tell, without tediously examining the whole matrix? If instead I tell you that it’s in row 10, 8 digits from the right, you can quickly check that I am right. I won’t be able to cheat, since you can check my claims; on the other hand, I can provide a proof that’s easy to verify.

P vs. NP

The flagship question of complexity theory is whether \text {P}=\text {NP} or not. This is a young, prominent special case of the grand challenge we introduced in Chapter 3. Contrary to the analogous question for \text {BPP}, cf. section 2.5.2, the general belief seems to be that \text {P}\ne \text {NP}. Similarly to BPP, cf. Theorem 2.9, the best deterministic simulation of \text {NP} runs in exponential time by trying all nondeterministic guesses. This gives the middle inclusion in the following fact; the other two are by definition.

Fact 5.2. \text {P}\subseteq \text {NP}\subseteq \text {Exp}\subseteq \text {NExp}.

A consequence of the Time Hierarchy Theorem 3.4 is that \text {P}\ne \text {Exp}. From the inclusions above it follows that

\begin{aligned} \text {P}\ne \text {NP}\text { or NP}\ne \text {Exp, possibly both}. \end{aligned}

Thus, we are not completely clueless, and we know that at least one important separation is lurking somewhere. Most people appear to think that both separations hold, but we are unable to prove either.

For multi-tape machines, a separation between deterministic and non-deterministic linear time is in [24, 27].

5.2 NP-completeness

We now go back to the question at the beginning of this chapter about reducing arbitrary computation to 3Sat. We shall reduce all of NP to 3Sat in Theorem 5.2. Problems like 3Sat admitting such reductions deserve a definition.

Definition 5.2. We call a problem L:

NP-hard if every problem in \text {NP} reduces to L in \text {P};

NP-complete if it is NP-hard and in NP.

One can define \text {NP}-hard (and hence NP-complete) w.r.t. different reductions, cf. Chapter 4, and we will do so later. But the simple choice above suffices for now.

Complete problems are the “hardest problems” in the class, as formalized in the following fact.

Fact 5.3. Suppose \text {L} is NP-complete. Then L\in \text {P}\Leftrightarrow \text {P}=\text {NP}.

Proof(\Leftarrow ) This is because L\in \text {NP}.

(\Rightarrow ) Let L'\in \text {NP}. Because L is NP-hard we know that L\in \text {P}\Rightarrow L'\in \text {P}. QED

Fact 5.3 points to an important interplay between problems and complexity classes. We can study complexity classes by studying their complete problems, and vice versa.

The central result in the theory of NP completeness is the following.

Theorem 5.2. [7, 20] 3Sat is NP-complete.

Proof3Sat is in \text {NP} by Fact 5.1. Next we prove NP-hardness. The main idea is Theorem 5.1, while the rest of the proof mostly amounts to opening up definitions and using some previous simulations. Let L\in \text {NP} and let M be the corresponding TM which runs in time n^{d} on inputs (x,y) where |x|=n and |y|=n^{d}, for some constant d. We can work with TMs instead of RAMs since they are equivalent up to a power loss, as we saw in Theorem 2.6. We can construct in \text {P} a circuit C(x,y) of size c_{M}n^{c_{d}} such that for any x,y we have M(x,y)=1\Leftrightarrow C(x,y)=1 by Theorem 2.4.

Now, suppose we are given an input w for which we are trying to decide membership in L. This is equivalent to deciding if \exists y:C(w,y)=1 by what we just said. We can “hard-wire” w into C to obtain the circuit C_{w}(y):=C(w,y) only on the variables y, with no loss in size. Here by “hard-wiring” we mean replacing the input gates x with the bits of w. Now we can apply Theorem 5.1 to this new circuit to produce a 3CNF f_{w} on variables y and new variables z such that C_{w}(y)=1\Leftrightarrow \exists z:f_{w}(y,z)=1, for any y. The size of f_{w} and the number of variables z are power in the size of the circuit.

We have obtained:

\begin{aligned} w\in L\Leftrightarrow \exists y:M(w,y)=1\Leftrightarrow \exists y:C_{w}(y)=1\Leftrightarrow \exists y,z:f_{w}(y,z)=1\Leftrightarrow f_{w}\in \text {3Sat,} \end{aligned}

as desired. QED

In section 4.3 we reduced 3Sat to other problems which are also in NP by Fact 5.1. This implies that all these problems are NP-complete. Here we use that if problem A reduces to B in P, and B reduces to C, then also A reduces to C. This is because if C\in \text {P} then B\in \text {P}, and so A\in \text {P}.

Corollary 5.1. Clique, Cover-by-vertexes, Subset-sum, and 3Color are NP-complete.

It is important to note that there is nothing special about the existence of NP-complete problems. The following is a simple such problem that does not require any of the machinery in this section.

Exercise 5.3. Consider the problem, given a RAM M, an input x, and t\in \mathbb {N}, where t is written in unary, decide if there is y\in \{0,1\} ^{t} such that M(x,y)=1. Prove that this is NP-complete.

What if t is written in binary?

The interesting aspect of NP-complete problems such as 3Sat and those in Corollary 5.1 is that they are very simple and structured, and don’t refer to computational models. This makes them suitable for reductions, and for inferring properties of the complexity class which are not evident from a machine-based definition.

5.3 From RAM to 3SAT in quasi-linear time

The framework in the previous section is useful to relate membership in P of different problems in NP, but it is not suitable for a more fine-grained analysis. For example, under the assumption that 3Sat is in \text {Time}(cn) we cannot immediately conclude that other problems in NP are solvable in this time or in about this time. We can only conclude that they are in \text {P}. In particular, the complexity of 3Sat cannot be related to that of other central conjectures, such as whether 3Sum is in subquadratic time, Conjecture 4.1.

The culprit is the power loss in reducing RAM computation to circuits, mentioned at the beginning of the chapter. We now remedy this situation and present a quasi-linear reduction. As we did before, cf. Theorem 5.1 and Theorem 5.2, we first state a version of the simulation for (deterministic) computation which contains all the main ideas, and then we note that a completeness result follows.

Theorem 5.3. Given an input length n\in \mathbb {N}, a time bound t\in \mathbb {N}, and a RAM M that runs in time t on inputs of n bits, we can compute in time t':=c_{M}t(\log t)^{c} a 3CNF f on variables (x,y) where |y|\le t' such that for every x\in \{0,1\} ^{n}:

\begin{aligned} M(x)=1\iff \exists y:f(x,y)=1. \end{aligned}

We now present the proof of this amazing result; you may want to refer back to Definition 2.5 of a RAM. A key concept in the proof is the following “snapshot” of the RAM computation.

Definition 5.3. The internal configuration, abbreviated IC, of a RAM specifies:

  • its registers,
  • the program counter,
  • the word length w, and
  • if the current instruction is a Read r_{i}:=\mu [r_{j}] or Write \mu [r_{j}]:=r_{i} then the IC includes the content \mu [r_{j}] of the memory cell indexed by r_{j}.

Note that at most one memory cell is included in one IC. By contrast, the configuration of a TM (Definition 2.1) includes all its tape cells. Also note that an IC has length \le c_{M}+c\log t bits, where the c_{M} is for the program counter, and the c\log t is for the rest, using that the maximum word length of a machine running in time t\ge n is c\log t.

The key idea in the proof. At the high level, the approach is, like in Theorem 5.1, to guess computation and check it efficiently. We are going to guess the sequence of ICs, and we need additional ideas to check them efficiently by a circuit. This is not immediate, since, again, the RAM can use direct access to read and write in memory at arbitrary locations, something which is not easy to do with a circuit.

The key idea is to check operations involving memory independently from the operations involving registers but not memory. If both checks pass, then the computation is correct. More precisely, a sequence of internal configurations s_{1},s_{2},\ldots ,s_{t} corresponds to the computation of the RAM on input x iff for every i<t:

  1. If s_{i} does not access memory, then s_{i+1} has its registers, program counter, and word length updated according to the instruction executed in s_{i},
  2. If s_{i} is computing a read operation r_{i}:=\mu [r_{j}] then in s_{i+1} register r_{i} contains the most recent value written in memory cell r_{j}. In case this cell was never written, then r_{i} should contain x_{r_{j}} if r_{j}\in \{1,2,\ldots ,n\}, n if r_{j}=0, and 0 otherwise. The program counter in s_{i+1} also points to the next instruction.

Rather than directly constructing a 3CNF that implements these checks, we construct a circuit and then appeal to Theorem 5.1. It is easy to construct a circuit of quasi-linear size implementing Check 1, since the circuit only has to check adjacent pairs of ICs. As remarked before, these ICs have length \le c_{M}+c\log t. For fixed i, Check 1 can be implemented by a circuit which depends on the RAM and has size power in the length of an IC. Taking an And of these circuits over the choices of i gives a circuit of the desired size for Check 1.

The difficulty lies in Check 2, because the circuit needs to find “the most recent value written.” The solution is to sort the ICs by memory addresses. After sorting, we can implement Check (2) as easily as Check (1), since we just need to check adjacent pairs of ICs.

The emergence of sorting in the theory of NP-completeness cements the pivotal role this operation plays in computer science.

To implement this idea we need to be able to sort with a quasi-linear size circuit. Standard sorting algorithms like Mergesort, Heapsort, or Radixsort run in quasi-linear time on a RAM, but rely on direct addressing (cf. section 2.4) and for this reason cannot be easily implemented by a circuit of quasi-linear size. However, other algorithms have been developed that do have such an implementation, for example a variant of Mergesort called Odd-Even-Mergesort [6], see also [22]. This gives the following lemma.

Lemma 5.1. Given t and m we can compute in time t':=t\cdot (m\log t)^{c} a circuit (of size \le t') that sorts t integers of m bits.
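
To make the lemma concrete, here is a sketch in Python of Batcher's odd-even mergesort [6] as a generator of comparators, assuming for simplicity that the number of items is a power of two. The sequence of comparators is oblivious to the data, which is exactly what makes a circuit implementation possible; the network has about t\log ^{2}t comparators, within the bound of the lemma.

def batcher(n):
    # comparators (i, j), i < j, of the odd-even mergesort network
    # on n = 2^k wires; applying them in order sorts any input
    def merge(lo, hi, r):
        step = 2 * r
        if step < hi - lo:
            yield from merge(lo, hi, step)
            yield from merge(lo + r, hi, step)
            for i in range(lo + r, hi - r, step):
                yield (i, i + r)
        else:
            yield (lo, lo + r)
    def sort(lo, hi):              # sorts positions lo..hi inclusive
        if hi - lo >= 1:
            mid = lo + (hi - lo) // 2
            yield from sort(lo, mid)
            yield from sort(mid + 1, hi)
            yield from merge(lo, hi, 1)
    yield from sort(0, n - 1)

def run_network(xs):
    xs = list(xs)
    for i, j in batcher(len(xs)):
        if xs[i] > xs[j]:          # a comparator: order the pair
            xs[i], xs[j] = xs[j], xs[i]
    return xs

assert run_network([5, 3, 8, 1, 7, 2, 6, 4]) == [1, 2, 3, 4, 5, 6, 7, 8]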

We summarize the key steps in the proof.

Proof of Theorem 5.3 We construct a circuit C_{M} and then appeal to Theorem 5.1. The extra variables y correspond to t ICs s_{1},s_{2},\ldots ,s_{t}. An IC takes c_{M}+c\log t bits to specify, so we need \le c_{M}t\log t variables y. The circuit C_{M} first performs Check (1) above for each adjacent pair (s_{i},s_{i+1}) of ICs. This takes size c_{M}\log ^{c}t for each pair, and so size c_{M}t\log ^{c}t overall.

Then C_{M} sorts the ICs by memory addresses, producing sorted ICs s'_{1},s'_{2},\ldots ,s'_{t}. This takes size t\cdot \log ^{c}t by Lemma 5.1, using that the memory addresses have \le c\log t bits. Then the circuit performs Check (2) for each adjacent pair (s'_{i},s'_{i+1}) of ICs. The circuit size required for this is no more than for Check (1).

Finally, the circuit takes an And of the results of the two checks, and also checks that s_{t} is accepting. QED
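
To isolate the role of sorting, here is a hedged Python sketch of Check (2) alone (the data format is ours, not the circuit): each memory access extracted from the guessed ICs is a tuple (time, op, addr, val), and after sorting by (address, time) each read must repeat the value of the adjacent previous access to the same cell, or the initial memory contents if there is none.

def check_memory(accesses, x, n):
    # accesses: list of (time, op, addr, val) with op in {'R', 'W'}
    def initial(addr):             # initial memory, as in Check (2)
        if addr == 0:
            return n
        return x[addr - 1] if 1 <= addr <= n else 0
    s = sorted(accesses, key=lambda a: (a[2], a[0]))   # (addr, time)
    for k, (time, op, addr, val) in enumerate(s):
        if op == 'R':
            if k > 0 and s[k - 1][2] == addr:
                expected = s[k - 1][3]   # adjacent access, same cell
            else:
                expected = initial(addr)
            if val != expected:
                return False
    return True

# write 5 to cell 7 at time 1, read it back correctly at time 4
assert check_memory([(1, 'W', 7, 5), (4, 'R', 7, 5)], [0] * 8, 8)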

We can now prove completeness in a manner similar to Theorem 5.2, with a relatively simple extension of Theorem 5.3.

Theorem 5.4. Every problem L in \text {NTime}(t) map reduces to 3Sat in \text {Time}(c_{L,t}t\log ^{c}t), for every function t\ge n such that t(n) is computable in time t(n) given n.

The assumption on t is similar to that in the hierarchy Theorem 3.4, and is satisfied by all standard functions including all those in this book – cf. discussion after Theorem 3.4.

ProofLet M be a RAM computing L in the assumed time. Given an input w of length n we have to efficiently compute a 3CNF f such that

\begin{aligned} \exists y\in \{0,1\} ^{t(n)}:M(w,y)=1\iff \exists y\in \{0,1\} ^{c_{L,t}t(n)\log ^{c}t(n)}:f(y)=1. \end{aligned}

First we compute t(n), using the assumption. We now apply Theorem 5.3, but on a new input length n':=c(n+t)\le ct, to accommodate for inputs of the form (x,y). This produces a formula f of size c_{L,t}t(\log t)^{c} in variables (x,y) and new variables z. We can now set x to w and conclude the proof. QED

With these sharper results we can now study hardness and completeness within time bounds such as n^{2}, n\log ^{3}n etc. We work out an example in the next section.

5.3.1 Quasilinear-time completeness

In this section we use the machinery we just developed to study completeness in quasi-linear time, instead of power time.

Definition 5.4. We define the quasi-linear time complexity classes

\begin{aligned} \text {QLin-Time}:= & \bigcup _{d\in \mathbb {N}}\text {Time}(n\log ^{d}n)\text { and}\\ \text {QLin-NTime}:= & \bigcup _{d\in \mathbb {N}}\text {NTime}(n\log ^{d}n). \end{aligned}

Theorem 5.5. 3Sat is complete for QLin-NTime with respect to mapping reductions in QLin-Time. That is:

– 3Sat is in QLin-NTime, and

– every problem in QLin-NTime map reduces to 3Sat in QLin-Time.

ProofTo show that 3Sat is in QLin-NTime, consider a 3CNF instance f of length n. This instance has at most n variables, and we can guess an assignment y to them within our budget of non-deterministic guesses. There remains to verify that y satisfies f. For this, we can do one pass over the clauses. For each clause, we access the bits in y corresponding to the 3 variables in the clause, and check if the clause is satisfied. This takes constant time per clause, and so time cn overall.

The second part follows from Theorem 5.4, using the fact that the composition of two quasilinear functions is also quasilinear (similarly to the fact that the composition of two power functions is also a power). QED
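
The verification pass can be written in a few lines of Python (the representation is ours: a clause is a list of three signed literals, and y[i-1] is the guessed value of variable i; on a RAM each lookup into y is one direct-access step):

def satisfies(clauses, y):
    # one pass over the clauses, constant work per clause
    for clause in clauses:
        if not any((y[abs(l) - 1] == 1) == (l > 0) for l in clause):
            return False
    return True

# (x1 or not x2 or x3) and (not x1 or x2 or x3), under x = (1, 1, 0)
assert satisfies([[1, -2, 3], [-1, 2, 3]], [1, 1, 0])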

Note that the proof that 3Sat is in QLin-NTime relies on our computational model being a RAM, because we use direct access to fetch the values for the variables in a clause.

We can now give the following quasi-linear version of Fact 5.3. The only extra observation for the proof is again that the composition of two quasi-linear functions is quasi-linear.

Corollary 5.2. \text {3Sat}\in \text {QLin-Time}\Leftrightarrow \text {QLin-NTime}=\text {QLin-Time}.

Exercise 5.4. Prove that Theorem 5.5 holds with 3Color instead of 3Sat. What about Clique and Subset-sum?

Exercise 5.5. Prove that 3Sum reduces to 3Sat in subquadratic time. That is: \text {3Sat}\in \text {SubquadraticTime}\Rightarrow \text {3Sum}\in \text {SubquadraticTime} (i.e., Conjecture 4.1 is false).

5.4 Completeness in other classes

The completeness phenomenon is not special to NP but enjoyed by many other classes. In this section we begin to explore completeness for NExp and Exp. One needs to be careful how hardness (and hence completeness) is defined, since these classes are known to be different from \text {P} by the hierarchy Theorem 3.4. So defining a problem L to be NExp-hard if L\in \text {P}\Rightarrow \text {NExp}=\text {P} would mean simply that L\not \in \text {P}. To avoid this in this section hardness (hence completeness) is defined w.r.t. mapping reductions, cf. Chapter 4. (Another option would be to replace \text {P} with say \text {BPP}, since it is not known if \text {BPP}=\text {NExp}.)

5.4.1 NExp completeness

Complete problems for NExp include succinct versions of problems complete for NP. Here succinct means that rather than giving the input x to the problem in standard format, the input consists instead of a circuit C:\{0,1\} ^{m}\to \{0,1\} encoding x, for example C(i) equals bit i of x, for every i.

Definition 5.5. The Succinct-3Sat problem: Given a circuit C encoding a 3CNF f_{C}, does f_{C} have a satisfying assignment?

Theorem 5.6. Succinct-3Sat is NExp complete with respect to power-time mapping reductions.

Proof sketch. Let us first show that Succinct-3Sat is in NExp. Given a circuit C of length n, we can run it on every possible input (of length \le n) and write down the formula f_{C} encoded by C. This formula has size \le 2^{n}. We can then use the fact that 3Sat is in NP to decide satisfiability of this formula in non-deterministic power time in 2^{n}, that is \text {NTime}(2^{cn})\subseteq \text {NExp}.

To prove NExp hardness it is convenient to work with TMs rather than RAMs. The main observation is that in the simulation of a TM M on an input x by a circuit C_{M}, Theorem 2.4, the circuit is very regular, in the sense that we can construct another circuit S_{M} which is a succinct encoding of C_{M}. The circuit S_{M} takes as input indexes of gates in C_{M} and outputs the type of the gate and its wires. The size of S_{M} is power in the index length and M. Thus, if C_{M} has size t^{c}, S_{M} only needs size \log ^{c}t. If t=2^{n^{d}}, S_{M} has size power in n, as desired. The transformation from circuit to 3CNF in Theorem 5.1 is also regular and can be done succinctly. QED

As a consequence, we obtain the following “concrete” problem not in \text {P}.

Corollary 5.3. \text {Succinct-3Sat}\not \in \text {P}.

5.4.2 Exp-completeness

Exp-complete problems include several two-player games. The important feature for completeness is that the game may last for an exponential number of steps (otherwise it would belong to a class believed to be stricter which we will investigate in Chapter ??). These games include (generalized versions of) Chess [8] and Checkers [26].

5.5 Power from completeness

The realization that arbitrary computation can be reduced to 3Sat and other problems is powerful and liberating. In particular it allows us to significantly widen the net of reductions.

5.5.1 Optimization problems

As observed in section 4.6, 3Sat trivially reduces to Max-3Sat. The converse will be shown next.

Theorem 5.7. Max-3Sat reduces to 3Sat in \text {P}.

ProofConsider the problem Atleast-3Sat: Given a 3CNF formula and an integer t, is there an assignment that satisfies at least t clauses? This is in \text {NP} and so can be reduced to 3Sat in \text {P}. This is the step that’s not easy without “thinking completeness:” given an algorithm for 3Sat it isn’t clear how to use it directly to solve Atleast-3Sat.

Hence, if 3Sat is in P so is Atleast-3Sat. On input a 3CNF f, using binary search and the fact that Atleast-3Sat is in P, we can find in P the largest t s.t. (f,t)\in \text {Atleast-3Sat}. Having found this t, there remains to construct an assignment satisfying the clauses. This can be done fixing one variable at a time as in Theorem 4.5. QED
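
Schematically, the argument looks as follows (a Python sketch of ours; atleast_3sat stands for the assumed power-time procedure for Atleast-3Sat obtained from the 3Sat algorithm via completeness, and we return only the maximum number of satisfiable clauses, the assignment then being recovered as in Theorem 4.5):

def max_3sat(clauses, atleast_3sat):
    # binary search for the largest t with (clauses, t) in Atleast-3Sat;
    # about log(len(clauses)) calls to the oracle overall
    lo, hi = 0, len(clauses)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if atleast_3sat(clauses, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo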

5.5.2 NP is as easy as detecting unique solutions

A satisfiable 3CNF can have multiple satisfying assignments. On the other hand some problems and puzzles have unique solutions. In this section we relate these two scenarios.

Definition 5.6. Unique-CktSat is the problem: Given a circuit C s.t. there is at most one input x for which C(x)=1, decide if such an input exists.

Unique-3Sat is the Unique-CktSat problem restricted to 3CNF circuits.

Theorem 5.8. [33] 3Sat reduces to Unique-3Sat in BPP.

We in fact reduce 3Sat to Unique-CktSat. Then Unique-CktSat can be reduced to Unique-3Sat observing that the reduction in Theorem 5.1 preserves uniqueness.

The beautiful proof uses a powerful and general technique in randomized computation: pairwise uniformity, sometimes more generically referred to as hashing. We first define such functions and give efficient constructions. Then we show how to use them to “isolate” assignments.

Definition 5.7. A distribution H on functions mapping S\to T is called pairwise uniform if for every x,x'\in S and y,y'\in T one has

\begin{aligned} \mathbb {P}_{H}[H(x)=y\wedge H(x')=y']=1/|T|^{2}. \end{aligned}

This is saying that on every pair of inputs H is behaving as a completely uniform function. Yet unlike completely uniform functions, the next lemma shows that pairwise uniform functions can have a short description, which makes them suitable for use in algorithms.

Exercise 5.6. Let \mathbb {F}_{q} be a finite field. Define the random function H:\mathbb {F}_{q}\to \mathbb {F}_{q} as H(x):=Ax+B where A,B are uniform in \mathbb {F}_{q}.

Prove that H is pairwise uniform.

Explain how to use H to obtain a pairwise uniform function from \{0,1\} ^{n} to \{0,1\} ^{t} for any given t\le n.

Exercise 5.7. Define the random function H_{1}:\{0,1\} ^{n}\to \{0,1\} as H_{1}(x):=\sum _{i\le n}A_{i}x_{i}+B where A is uniform in \{0,1\} ^{n} and B is uniform in \{0,1\} .

Prove that H_{1} is pairwise uniform.

Explain how to use H_{1} to obtain a pairwise uniform function from \{0,1\} ^{n} to \{0,1\} ^{t} for any given t\le n.

We can now state the lemma that we use to isolate assignments.

Lemma 5.2. Let H be a pairwise uniform function mapping S\to T, and let 1\in T. The probability that there is a unique element s\in S such that H(s)=1 is

\begin{aligned} \ge \frac {|S|}{|T|}-\frac {|S|^{2}}{|T|^{2}}. \end{aligned}

In particular, if |T|/8\le |S|\le |T|/4 this prob. is \ge \frac {1}{8}-\frac {1}{16}=\frac {1}{16}.

ProofFor fixed s\in S, the probability s is the unique element mapped to 1 is at least the prob. that s is mapped to 1 minus the prob. that both s and some other s'\ne s are mapped to 1. This is

\begin{aligned} \ge \frac {1}{|T|}-\frac {|S|-1}{|T|^{2}}. \end{aligned}

These events for different s\in S are disjoint; so the target probability is at least the sum of the above over s\in S. QED
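
Before using the lemma it may help to see it in action. The following Python sketch (ours) instantiates the pairwise uniform hash H(x):=Ax+B over \mathbb {F}_{2} from Exercise 5.7 and empirically estimates the probability of a unique preimage of a fixed target, for a random set S with |S|=|T|/8:

import random

def sample_hash(n, t):
    # pairwise uniform H: {0,1}^n -> {0,1}^t, H(x) = Ax + B over F_2
    A = [[random.randrange(2) for _ in range(n)] for _ in range(t)]
    B = [random.randrange(2) for _ in range(t)]
    return lambda x: tuple(
        (sum(a * v for a, v in zip(row, x)) + b) % 2
        for row, b in zip(A, B))

n, t = 10, 6
points = random.sample(range(2 ** n), 2 ** t // 8)   # |S| = |T|/8
S = [tuple((p >> i) & 1 for i in range(n)) for p in points]
target = (0,) * t
hits = 0
for _ in range(2000):
    H = sample_hash(n, t)
    hits += sum(H(x) == target for x in S) == 1
print(hits / 2000)   # at least 1/16 by the lemma; about 0.11 here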

Proof of Theorem 5.8 Given a 3Sat instance \phi with \le n variables x, we pick a random i from 0 to n+c. We then pick a pairwise uniform function mapping \{0,1\} ^{n} to \{0,1\} ^{i}, and consider the circuit

\begin{aligned} C:=\phi (x)\wedge H(x)=0^{i}. \end{aligned}

This circuit has size n^{c}.

If \phi is not satisfiable, C is not satisfiable, for any random choices.

Now suppose that \phi has s\ge 1 satisfying assignments. With prob. \ge c/n we will have 2^{i-3}\le s\le 2^{i-2}, in which case Lemma 5.2 guarantees that C has a unique satisfying assignment with prob. \ge c.

Overall, C has a unique satisfying assignment with prob. \ge c/n. Hence the Unique-3Sat algorithm on C outputs 1 with prob. \ge c/n. If we repeat this process cn times, with independent random choices, the Or of the outcomes gives the correct answer with prob. \ge 2/3. QED

5.6 Problems

Problem 5.1. In Theorem 4.5 we reduced Search-3Sat to 3Sat.

– Suppose 3Sat is computable by circuits of depth c\log n. What would be the depth of the circuits for Search-3Sat given by the reduction?

– Reduce Search-3Sat to 3Sat in \bigcup _{a>0}\text {Depth}(a\log n).

Hint: First work with randomized circuits. Use ideas in proof of Theorem 4.5.

References

[1]   Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59–78. IEEE Computer Society, 2015.

[2]   Leonard Adleman. Two theorems on random polynomial time. In 19th IEEE Symp. on Foundations of Computer Science (FOCS), pages 75–83. 1978.

[3]   Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. J. Comput. Syst. Sci., 18(2):155–193, 1979.

[4]   Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. J. of the ACM, 45(3):501–555, May 1998.

[5]   Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J. Comput., 47(3):1087–1097, 2018.

[6]   Kenneth E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, volume 32, pages 307–314, 1968.

[7]   Stephen A. Cook. A hierarchy for nondeterministic time complexity. J. of Computer and System Sciences, 7(4):343–353, 1973.

[8]   Aviezri S. Fraenkel and David Lichtenstein. Computing a perfect strategy for n x n chess requires time exponential in n. J. Comb. Theory, Ser. A, 31(2):199–214, 1981.

[9]   Anka Gajentaan and Mark H. Overmars. On a class of {O}(n^2) problems in computational geometry. Comput. Geom., 5:165–185, 1995.

[10]   M. R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.

[11]   K. Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatsh. Math. Phys., 38, 1931.

[12]   Oded Goldreich. Computational Complexity: A Conceptual Perspective. Cambridge University Press, 2008.

[13]   David Harvey and Joris van der Hoeven. Integer multiplication in time O(n\mathrm {log}\, n). Annals of Mathematics, 193(2):563 – 617, 2021.

[14]   F. C. Hennie. One-tape, off-line turing machine computations. Information and Control, 8(6):553–578, 1965.

[15]   Fred Hennie and Richard Stearns. Two-tape simulation of multitape turing machines. J. of the ACM, 13:533–546, October 1966.

[16]   Russell Impagliazzo and Ramamohan Paturi. The complexity of k-sat. In IEEE Conf. on Computational Complexity (CCC), pages 237–, 1999.

[17]   Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Computer & Systems Sciences, 63(4):512–530, Dec 2001.

[18]   Russell Impagliazzo and Avi Wigderson. \mathit {P} = \mathit {BPP} if E requires exponential circuits: Derandomizing the XOR lemma. In 29th ACM Symp. on the Theory of Computing (STOC), pages 220–229. ACM, 1997.

[19]   Kojiro Kobayashi. On the structure of one-tape nondeterministic turing machine time hierarchy. Theor. Comput. Sci., 40:175–193, 1985.

[20]   Leonid A. Levin. Universal sequential search problems. Problemy Peredachi Informatsii, 9(3):115–116, 1973.

[21]   O. B. Lupanov. A method of circuit synthesis. Izv. VUZ Radiofiz., 1:120–140, 1958.

[22]   NEU. From RAM to SAT. Available at http://www.ccs.neu.edu/home/viola/, 2012.

[23]   Christos H. Papadimitriou and Mihalis Yannakakis. Optimization, approximation, and complexity classes. J. Comput. Syst. Sci., 43(3):425–440, 1991.

[24]   Wolfgang J. Paul, Nicholas Pippenger, Endre Szemerédi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In IEEE Symp. on Foundations of Computer Science (FOCS), pages 429–438, 1983.

[25]   Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. J. of the ACM, 26(2):361–381, 1979.

[26]   J. M. Robson. N by N checkers is exptime complete. SIAM J. Comput., 13(2):252–267, 1984.

[27]   Rahul Santhanam. On separators, segregators and time versus space. In IEEE Conf. on Computational Complexity (CCC), pages 286–294, 2001.

[28]   Arnold Schönhage. Storage modification machines. SIAM J. Comput., 9(3):490–508, 1980.

[29]   Claude E. Shannon. The synthesis of two-terminal switching circuits. Bell System Tech. J., 28:59–98, 1949.

[30]   Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Mathematics of Computation, 54(189):435–447, 1990.

[31]   Larry Stockmeyer and Albert R. Meyer. Cosmological lower bound on the circuit complexity of a small problem in logic. J. ACM, 49(6):753–784, 2002.

[32]   Alan M. Turing. On computable numbers, with an application to the entscheidungsproblem. Proc. London Math. Soc., s2-42(1):230–265, 1937.

[33]   Leslie G. Valiant and Vijay V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85–93, 1986.

[34]   Emanuele Viola. Reducing 3XOR to listing triangles, an exposition. Available at http://www.ccs.neu.edu/home/viola/, 2011.

[35]   Emanuele Viola. Pseudorandom bits and lower bounds for randomized turing machines. Theory of Computing, 18(10):1–12, 2022.

Eric Allender’s day

is unfolding at the Simons Institute (and tomorrow is Valentine’s day, joke by Rahul).  The speakers are praising Eric’s many contributions to the field, so I thought I’d add my praise, since over the years I interacted with Eric in many different capacities, excluding coauthor, but there’s time to fix that, Eric.  I met him for the first time 20 years ago in Denmark.  I had already read some of his surveys, and I remember being somewhat surprised that the mental image I had subconsciously created of him didn’t match the way he looked.  Turns out even he was expecting something different from the emails we had exchanged — pictures weren’t online back then.  Anyway, back to more scientific matters, I told him that his surveys were one of the first things I read, and I think he said it was good that they had had an effect.

Indeed, they have, his works and surveys have had a significant impact on my research.  Especially his surveys on low-level complexity classes, a topic dear to my heart.  Counting hierarchies, arithmetic circuits, and the division breakthroughs are some of the many things his surveys exposed me to.  Eric has a unique angle about these topics, and I often go back to his surveys and papers for knowledge and inspiration.  More in line with the topic of the workshop, people are emphasizing how Eric anticipated recent trends, such as “meta complexity,” before they were a thing.  Way to go.

Mathematics of the impossible: Computational Complexity, Chapter 4, Reductions


All posts in this series.
A PDF version of this post will be published with a delay, but if you’d like to have it soon let me know.


Chapter 4
Reductions


One can relate the complexity of functions via reductions. This concept is so ingrained in common reasoning that giving it a name may feel, at times, strange. For in some sense pretty much everything proceeds by reductions. In any algorithms textbook, the majority of algorithms can be cast as reductions to algorithms presented earlier in the book, and so on. And it is worthwhile to emphasize now that, as we shall see below, reductions, even in the context of computing, have been used for millennia. For about a century reductions have been used in the context of undecidability in a modern way, starting with the incompleteness theorem in logic [8], whose proof reduces questions in logic to questions in arithmetic.

Perhaps one reason for the more recent interest in complexity reductions is that we can use them to relate problems that are tantalizingly close to problems that today we solve routinely on somewhat large scale inputs with computers, and that therefore appear to be just out of reach. By contrast, reductions in the context of undecidability tend to apply to problems that are completely out of reach, and in this sense remote from our immediate worries.

4.1 Types of reductions

Informally, a reduction from a function f to a function g is a way to compute f given that we can compute g. One can define reductions in different ways, depending on the overhead required to compute f given that we can compute g. The most general type of reduction is simply an implication.

General form of reduction from f to g:

  If g can be computed with resources X then f can be computed with resources Y.

A common setting is when X=Y. In this case the reduction allows us to stay within the same complexity class.

Definition 4.1. We say that f reduces to g in X (or under X reductions) if

\begin{aligned} g\in X\Rightarrow f\in X. \end{aligned}

A further special and noteworthy case is when X=\text {P}, or X=\text {BPP}; in these cases the reduction can be interpreted as saying that if g is easy to compute then f is too. But in general X may not be equal to Y. We will see examples of such implications for various X and Y.

It is sometimes useful to be more specific about how the implication is proved. For example, this is useful when inferring various properties of f from properties of g, something which can be obscured by a stark implication. The following definition gives a specific way in which the implication can be proved.

Definition 4.2. We say that f map reduces to g in X (or via a map in X) if there is M\in X such that f(x)=g(M(x)) for every x.

Exercise 4.1. Suppose that f map reduces to g in X.

(1) Suppose X=\text {P}. Show f reduces to g in \text {X}.

(2) Suppose X=\bigcup _{d}\text {Time}(d\cdot n^{2}). Can you still show that f reduces to g in \text {X}?

Many reductions we shall see are not mapping reductions. In fact, our first example is not a mapping reduction.

4.2 Reductions

4.2.1 Multiplication

Summing two n-bit integers is in \text {CktGates}(cn) (Exercise 2.8). But the smallest circuit known for multiplication has \ge cn\log n gates [10]. (The same situation holds for MTMs; over RAMs and related models multiplication can be done in time cn [21].) It is a long-standing question whether we can multiply two n-bit integers with a linear-size circuit.

What about squaring integers? Is that harder or easier than multiplication? Obviously, if we can multiply two numbers we can also square a number: simply multiply it by itself. This is a trivial example of a reduction. What about the other way around? We can use a reduction established millennia ago by the Babylonians. They employed the equation

\begin{aligned} a\cdot b=\frac {(a+b)^{2}-(a-b)^{2}}{4}~~~~(4.1) \end{aligned}

to reduce multiplication to squaring, plus some easy operations like addition and division by four. In our terminology we have the following.

Definition 4.3. Multiplication is the problem of computing the product of two n-bit integers. Squaring is the problem of computing the square of an n-bit integer.

Theorem 4.1. If Squaring has linear-size circuits then Multiplication has linear-size circuits.

ProofSuppose C computes Squaring. Then we can multiply using equation (4.1). Specifically, given a and b we use Exercise 2.8 to compute a+b and a-b. (We haven’t seen subtraction or negative integers, but it’s similar to addition.) Then we run C on both of them. Finally, we again use Exercise 2.8 to compute their difference. It remains to divide by four. In binary, this is accomplished by ignoring the last two bits – which costs nothing on a circuit. QED
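As a sanity check, the reduction fits in a few lines of Python. This is a minimal sketch for nonnegative integers; the name multiply_via_squaring is mine, and square stands for whatever squaring algorithm is assumed (the default below is a placeholder):

def multiply_via_squaring(a, b, square=lambda x: x * x):
    # Babylonian identity: a*b = ((a+b)^2 - (a-b)^2) / 4.
    # Dividing by 4 in binary drops the last two bits: a right shift.
    return (square(a + b) - square(a - b)) >> 2

assert multiply_via_squaring(37, 41) == 37 * 41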

4.2.2 3Sum

Definition 4.4. The 3\text {Sum} problem: Given a list of integers, are there three integers that sum to 0?

It is easy to solve 3Sum in time cn^{2}\log n on a RAM. (We can first sort the integers, then for each pair (a,b) do a binary search to check if -(a+b) is also present.) The time can be improved to n^{2}/\log ^{c}n.
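For concreteness, here is a sketch of that algorithm in Python; on a RAM the sort and the binary searches give the stated cn^{2}\log n bound (this assumes the three entries occupy distinct positions in the list):

from bisect import bisect_left

def three_sum(nums):
    # Sort, then for each pair (i, j) binary-search for -(a[i]+a[j])
    # among the entries after position j, so each candidate triple
    # i < j < k is examined once.
    a = sorted(nums)
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            target = -(a[i] + a[j])
            k = bisect_left(a, target, j + 1)
            if k < n and a[k] == target:
                return True
    return False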

3Sum is believed to require quadratic time.

Definition 4.5. \text {SubquadraticTime}:=\bigcup _{\epsilon >0}\text {Time}(n^{2-\epsilon }).

Conjecture 4.1. \text {3Sum}\not \in \text {SubquadraticTime}.

One can reduce 3Sum to a number of other interesting problems to infer that, under Conjecture 4.1, those problems require quadratic time too.

Definition 4.6. The Collinearity problem: Given a list of points in the plane, are there three points on a line?

Theorem 4.2. [6] \text {Collinearity}\in \text {SubquadraticTime}\Rightarrow \text {3Sum}\in \text {SubquadraticTime} (i.e., Conjecture 4.1 is false).

ProofWe map instance a_{1},a_{2},\ldots ,a_{n} of 3Sum to the points

\begin{aligned} (a_{1},a_{1}^{3}),(a_{2},a_{2}^{3}),\ldots ,(a_{n},a_{n}^{3}), \end{aligned}

and solve Collinearity on those points.

To verify correctness, notice that points (x,x^{3}),(y,y^{3}), and (z,z^{3}) are on a line iff

\begin{aligned} \frac {y^{3}-x^{3}}{y-x}=\frac {z^{3}-x^{3}}{z-x}. \end{aligned}

Because y^{3}-x^{3}=(y-x)(y^{2}+yx+x^{2}), this condition is equivalent to

\begin{aligned} y^{2}+yx+x^{2}=z^{2}+zx+x^{2}\Leftrightarrow (y-z)(x+y+z)=0. \end{aligned}

Assuming y\ne z, i.e., that the 3Sum instance consists of distinct numbers, this is equivalent to x+y+z=0, as desired. (The case where there can be duplicates is left as an exercise.)

Note that the Collinearity instance has length linear in the 3Sum instance, and the result follows. QED
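The reduction map itself is a one-liner. A minimal sketch (the function name is mine):

def threesum_to_collinearity(nums):
    # Place each number a on the cubic curve y = x^3; by the computation
    # above, three of these points are collinear iff the corresponding
    # (distinct) numbers sum to 0.
    return [(a, a ** 3) for a in nums]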

Exercise 4.2. The Tripartite-3Sum problem: Given lists A_{1}, A_{2}, and A_{3} of numbers, are there a_{i}\in A_{i} s.t. a_{1}+a_{2}+a_{3}=0?

Prove that Tripartite-3Sum is in subquadratic time iff 3Sum is.

We now give a reduction in the other direction: We reduce a problem to 3Sum.

Definition 4.7. The 3Cycle-Detection problem: Given the adjacency list of a directed graph, is there a cycle of length 3?

This problem can be solved in time n^{2\omega /(\omega +1)+o(1)}, where \omega <2.373 is the exponent of matrix multiplication. If \omega =2 then the bound is n^{4/3+o(1)}. It is not known whether a subquadratic algorithm for 3Sum would improve these bounds. However, we can show that an improvement follows if \text {3Sum}\in \text {Time}(n^{1+\epsilon }) for a small enough \epsilon .

Theorem 4.3. [26] \text {3Sum}\in \text {Time}(t(n))\Rightarrow \text {3Cycle-Detection}\in \text {BPTime}(ct(n)), for any t(n)\ge n.

The reduction can be derandomized (that is, one can replace \text {BPTime} with \text {Time} in the conclusion) but the randomized case contains the main ideas.

ProofWe assign random numbers r_{x} with 4\log n bits to each node x in the graph. The 3Sum instance consists of the integers r_{x}-r_{y} for every edge x\to y in the graph.

To verify correctness, suppose that there is a cycle

\begin{aligned} x\to y\to z\to x \end{aligned}

in the graph. Then we have r_{x}-r_{y}+r_{y}-r_{z}+r_{z}-r_{x}=0, for any random choices.

Conversely, suppose there is no cycle, and consider any three numbers r_{x_{1}}-r_{y_{1}},r_{x_{2}}-r_{y_{2}},r_{x_{3}}-r_{y_{3}} from the reduction and their corresponding edges. Some node x_{i} has unequal in-degree and out-degree in those edges. This means that when summing the three numbers, the random variable r_{x_{i}} does not cancel out. Over the uniform choice of that variable, the probability that the sum equals 0 is at most 1/n^{4}.

By a union bound, the probability that there are three numbers summing to zero is \le n^{3}/n^{4}<1/3. QED
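A sketch of this randomized map in Python (the graph interface is my choice: nodes is a list and edges a list of directed pairs):

import random

def cycle3_to_3sum(nodes, edges, rng=random):
    # Assign each node a random number r_x of 4 log n bits. A 3-cycle
    # x -> y -> z -> x makes (r_x - r_y) + (r_y - r_z) + (r_z - r_x) = 0
    # for every choice of the r's; with no 3-cycle, each triple sums
    # to 0 with probability at most 1/n^4.
    n = len(nodes)
    r = {x: rng.randrange(n ** 4) for x in nodes}
    return [r[x] - r[y] for (x, y) in edges]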

Exercise 4.3. Prove an analogous result for undirected graphs. Note TBD: This exercise should be more interesting for 4-cycles, because you can’t just duplicate edges, I think.

Many other clusters of problems exist, for example based on matrix multiplication or all-pairs shortest path.

4.3 Reductions from 3Sat

In this section we begin to explore an important cluster of problems not known to be in \text {BPP}. What’s special about these problems is that in Chapter ?? we will show that we can reduce arbitrary computation to them, while this is unknown for the problems in the previous section.

Perhaps the most basic problem in the cluster is the following.

Definition 4.8. The 3Sat problem: Given a 3CNF \phi , is there an assignment x s.t. \phi (x)=1?

Conjecture 4.2. \text {3Sat}\not \in \text {P}.

Stronger conjectures have been made.

Conjecture 4.3. [13] [Exponential time hypothesis (ETH)] There is \epsilon>0 such that there is no algorithm that on input a 3CNF \phi with v variables and cv^{3} clauses decides if \phi is satisfiable in time 2^{(\epsilon +o(1))v}.

Conjecture 4.4. [14] [Strong exponential-time hypothesis (SETH)] For every \epsilon >0 there is k such that there is no algorithm that on input a kCNF \phi with v variables and cv^{k} clauses decides if \phi is satisfiable in time 2^{(1-\epsilon +o(1))v}.

It is known that \text {SETH}\Rightarrow \text {ETH}, but the proof is not immediate.

We now give reductions from 3Sat to several other problems. The reductions are in fact mapping reductions. Moreover, the reduction map can be extremely restricted, see Problem 4.4. In this sense, therefore, this reduction can be viewed as a direct translation of the problem, and maybe we shouldn’t really be thinking of the problems as different, even if they at first sight refer to different types of objects (formulas, graphs, numbers, etc.).

Watch videos 29, 30, 31, and 32 covering reductions: 3SAT to CLIQUE, CLIQUE to VERTEX-COVER, 3SAT to SUBSET-SUM, 3SAT to 3COLOR from https://www.ccs.neu.edu/home/viola/classes/algm-generic.html

Note: The videos use the terminology “polynomial time” instead of “power time” here.

Exercise 4.4. The problem System is defined as follows. A linear inequality is an inequality involving sums of variables and constants, such as x+y\ge z, x\le -17, and so on. A system of linear inequalities has an integer solution if it is possible to substitute integer values for the variables so that every inequality in the system becomes true. The language System consists of systems of linear inequalities that have an integer solution. For example,

\begin{aligned} (x+y\ge z,x\le 5,y\le 1,z\ge 5)\in \mbox {System }\\ (x+y\ge 2z,x\le 5,y\le 1,z\ge 5)\not \in \mbox {System } \end{aligned}

Reduce 3Sat to System in \text {P}.

Exercise 4.5. For an integer k, k-Color is the problem of deciding if the nodes of a given undirected graph G can be colored using k colors in such a way that no two adjacent vertices have the same color.

Reduce 3-Color to 4-Color in \text {P}.

Reductions in the opposite directions are possible, and so in fact the problems in this section are power-time equivalent in the sense that any of the problems is in \text {P} iff all the others are. We will see a generic reduction in the next chapter. For now, we illustrate this equivalence in a particular case.

Exercise 4.6. Reduce 3Color to 3Sat in \text {P}, following these steps:

1. Given a graph G, introduce variables x_{i,c} representing that node i has color c, where c ranges in the set of colors C=\{g,r,b\}. Describe a set of clauses that is satisfiable if and only if for every i there is exactly one c\in C such that x_{i,c} is true.

2. Introduce clauses representing that adjacent nodes do not have the same color.

3. Briefly conclude the proof.

Thus, we are identifying a cluster of problems which are all power-time equivalent. This cluster is so prominent that problems in it have been compiled into books [7]. More recently, it was shown that it contains (generalized versions of) several games including: Tetris, Lemmings, Sudoku, etc. For a list see e.g. the wikipedia page https://en.wikipedia.org/wiki/List_of_NP-complete_problems

4.4 Power hardness from SETH

In this section we show that a conjecture similar to Conjecture 4.1 can be proved assuming SETH. This is an interesting example of how we can connect different parameter regimes, since SETH is stated in terms of exponential running times. In general, “scaling” parameters is a powerful technique in the complexity toolkit.

Definition 4.9. The Or-Vector problem: Given two sets A and B of strings of the same length, determine if there is a\in A and b\in B such that the bit-wise Or a\vee b equals the all-one vector.

The Or-Vector problem is in \text {Time}(n^{2}). We can show that a substantial improvement would disprove SETH.

Theorem 4.4. \text {Or-Vector}\in \text {SubquadraticTime}\Rightarrow \text {SETH is false.}

ProofGiven a kCNF \phi with v variables and d clauses, divide the variables into two blocks of v/2 each. For each assignment to the variables in the first block, construct the vector in \{0,1\} ^{d} whose bit i is 1 iff clause i is satisfied by that partial assignment. Call A the resulting set of vectors. Let N:=2^{v/2} and note |A|\le N. Do the same for the other block and call the resulting set B.

Note that \phi is satisfiable iff \exists a\in A,b\in B such that a\vee b=1^{d}.

Constructing these sets takes time Nd^{c}. The Or-Vector instance has size \le cNd. Hence if \text {Or-Vector}\in \text {Time}(n^{2-\epsilon }) for some \epsilon >0, we can decide satisfiability in time \le (cNd)^{2-\epsilon }\le 2^{(1-\epsilon /2+o(1))v}. Taking k=c_{\epsilon } as in the statement of SETH (applied with \epsilon /2) yields a contradiction. QED
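A sketch of the construction in Python, under an assumed representation (my choice): clauses use DIMACS-style literals, +i for x_i and -i for its negation, over variables 1..v:

from itertools import product

def cnf_to_orvector(clauses, v):
    half = v // 2

    def vectors(var_ids):
        vecs = []
        for bits in product([False, True], repeat=len(var_ids)):
            assign = dict(zip(var_ids, bits))
            # bit i is 1 iff clause i is satisfied by this half-assignment
            vecs.append(tuple(
                any(abs(l) in assign and assign[abs(l)] == (l > 0) for l in cl)
                for cl in clauses))
        return vecs

    A = vectors(range(1, half + 1))
    B = vectors(range(half + 1, v + 1))
    return A, B  # satisfiable iff some a in A, b in B have a OR b all-ones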

Tight hardness results based on SETH have been established for several well-studied problems, including longest-common subsequence [1] and edit distance [5].

4.5 Search problems

Most of the problems in the previous sections ask about the existence of solutions. For example 3Sat asks about the existence of a satisfying assignment. It is natural to ask about computing such a solution, if it exists. Such non-boolean problems are known as search problems.

Next we show that in some cases we can reduce a search problem to the corresponding boolean problem.

Definition 4.10. Search-3Sat is the problem: Given a satisfiable 3CNF formula, output a satisfying assignment.

Theorem 4.5. Search-3Sat reduces to 3Sat in \text {P}. That is: \text {3Sat}\in \text {P}\Rightarrow \text {Search-3Sat}\in \text {P}.

ProofWe construct a satisfying assignment one variable at a time. Given a satisfiable 3CNF, set the first variable to 0, simplify, and check if the resulting formula is still satisfiable with the assumed algorithm for 3Sat. If it is, keep this value and move to the next variable. If it is not, set the first variable to 1 and move to the next variable. QED
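A sketch of this self-reduction in Python (again with DIMACS-style literals; sat stands for the assumed decision algorithm for 3Sat, with the convention that a formula with no clauses is satisfiable):

def search_3sat(phi, sat):
    assignment = {}
    for v in range(1, 1 + max(abs(l) for cl in phi for l in cl)):
        for value in (False, True):
            reduced = plug(phi, v, value)
            if sat(reduced):            # still satisfiable: keep this value
                assignment[v], phi = value, reduced
                break
    return assignment

def plug(phi, v, value):
    # Substitute x_v = value: drop satisfied clauses, shrink the others.
    lit = v if value else -v
    return [[l for l in cl if l != -lit] for cl in phi if lit not in cl]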

Exercise 4.7. Show \text {Clique}\in \text {P}\Rightarrow \text {Search-Clique}\in \text {P}.

4.5.1 Fastest algorithm for Search-3Sat

A curious fact about many search problems is that we know of an algorithm which is, in an asymptotic sense to be discussed now, essentially the fastest possible algorithm. This algorithm proceeds by simulating every possible program. When a program stops and outputs the answer, we can check it efficiently. Naturally, we can’t just take any program and simulate it until it ends, since it may never end. So we will clock programs, and stop them if they take too long. There is a particular simulation schedule which leads to efficient running times.

Theorem 4.6. [17] There is a RAM U such that on input any satisfiable formula x:

(1) U outputs a satisfying assignment, and

(2) If there is a RAM M that on input x outputs a satisfying assignment for x in t steps then U stops in c_{M}t+|x|^{c} steps.

We are taking advantage of the RAM model. On other models it is not known if the dependence on t can be linear.

ProofFor i=1,2,\ldots the RAM U simulates RAM i for 2^{i} steps. Theorem 2.2 guarantees that for each i the simulation takes time c2^{i}. If RAM i stops and outputs y, then U checks in time |x|^{c} if y is a satisfying assignment. If it is, then U outputs y and stops. Otherwise it continues.

Now let M be as in (2). As before, we work with an enumeration of programs where each program appears infinitely often. Hence we can assume that M has a description of length \ell :=c_{M}+\log t. Thus the simulation will terminate when i=\ell .

The time spent by U for a fixed i is \le c\cdot 2^{i}+|x|^{c}. Hence the total running time of U is

\begin{aligned} \le c\sum _{j=1}^{\ell }\left (c2^{j}+|x|^{c}\right )\le c_{M}2^{\ell }+c_{M}|x|^{c}\le c_{M}(t+|x|^{c}). \end{aligned}

QED
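A toy rendering of this schedule, under an interface that is my choice: programs is an enumeration in which each program appears infinitely often, programs[i](x, fuel) runs program i on x for at most fuel steps and returns its output or None, and check verifies a candidate in time |x|^{c}:

def universal_search(x, programs, check):
    i = 0
    while True:                      # loops forever if x has no solution,
        y = programs[i](x, 2 ** i)   # matching the promise in the theorem
        if y is not None and check(x, y):
            return y
        i += 1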

This result nicely illustrates how “constant factors” can lead to impractical results because, of course, the problem is that the constant in front of t is enormous. Specifically, it is exponential in the size of the program, see Problem 4.5.

4.6 Gap-SAT: The PCP theorem

“Furthermore, most problem reductions do not create or preserve such gaps. There would appear to be a last resort, namely to create such a gap in the generic reduction [C]. Unfortunately, this also seems doubtful. The intuitive reason is that computation is an inherently unstable, non-robust mathematical object, in the sense that it can be turned from non-accepting to accepting by changes that would be insignificant in any reasonable metric – say, by flipping a single state to accepting.” [19]

One of the most exciting, consequential, and technical developments in complexity theory of the last few decades has been the development of reductions that create gaps.

Definition 4.11. \gamma -Gap-3Sat is the 3Sat problem restricted to input formulas f that are either satisfiable or such that any assignment satisfies at most a \gamma fraction of clauses.

Note that 3Sat is equivalent to \gamma -Gap-3Sat for \gamma =1-1/n, since a formula of size n has at most n clauses. At first sight it is unclear how to connect the problems when \gamma is much smaller. But in fact it is possible to obtain a constant \gamma . This result is known as the PCP theorem, where PCP stands for probabilistically-checkable-proofs. The connection to proofs will be discussed in Chapter ??.

Theorem 4.7. [4] [PCP] There is \gamma <1 such that \gamma \text {-Gap-3Sat}\in \text {P}\Rightarrow \text {3Sat}\in \text {P}.

Similar results can be established for other problems such as 3Color, but the reductions in the previous section don’t preserve gaps and can’t be immediately applied.

A major application of the PCP theorem is in inapproximability results. A typical optimization problem is Max-3Sat.

Definition 4.12. The Max-3Sat problem: given a 3CNF formula, find a satisfying assignment that satisfies the maximum number of clauses.

Solving 3Sat reduces to Max-3Sat (in Chapter ?? we will give a reverse reduction as well). But we can ask for \beta -approximating Max-3Sat, that is, computing an assignment that satisfies a number of clauses that is at least a \beta fraction of the maximum number of clauses that can be satisfied.

The PCP Theorem 4.7 implies that 3Sat reduces to \beta -approximating Max-3Sat, for some constant \beta <1.

It has been a major line of research to obtain tight approximation factors \beta for a variety of problems. For example, 3Sat reduces to \beta -approximating Max-3Sat for any \beta >7/8. This constant is tight because a uniform random assignment to the variables satisfies each clause with probability 7/8 and hence in expectation satisfies a 7/8 fraction of the clauses.
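In symbols, for a 3CNF with m clauses, each containing three distinct variables, a uniform random assignment X satisfies in expectation

\begin{aligned} \mathbb {E}[\#\text {clauses satisfied by }X]=\sum _{j=1}^{m}\mathbb {P}[\text {clause }j\text { is satisfied}]=m(1-2^{-3})=\frac {7}{8}m. \end{aligned}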

Exercise 4.8. Turn this latter observation into an efficient randomized algorithm with approximation factor 7/8-o(1).

4.7 Problems

Problem 4.1. Reduce 3Sat to the PIT problem (Definition 2.8) over the field with two elements.

Problem 4.2. Prove that 3Sat is not in \text {TM-Time}(n^{1.99}). (Hint: Consider the Padded-Palindromes problem, which is like Palindromes except the input is divided into blocks of c\log n bits, and only the first bit of each block may be non-zero. (1) Prove a time lower bound for Padded-Palindromes by explaining what modifications are needed to the proof of Theorem 3.1. (2) Give a suitable reduction from Padded-Palindromes to 3Sat.)

Problem 4.3. Show that \text {3Color}\in \text {P}\Rightarrow \text {Search-3Color}\in \text {P}.

Problem 4.4. Give an encoding of 3Sat so that the reduction to 3Color in section 4.3 can be computed, for any input length, by a 1-local map (in particular, a circuit of constant depth).

Problem 4.5. Suppose there exists a constant a such that Theorem 4.6 holds with the running time of U replaced by (|M|\cdot t\cdot |x|)^{a}. (That is, the dependence on the program description is improved to polynomial, and we allow an even weaker dependence on t.) Prove that \text {3Sat}\in \text {P}.

References

[1]   Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59–78. IEEE Computer Society, 2015.

[2]   Leonard Adleman. Two theorems on random polynomial time. In 19th IEEE Symp. on Foundations of Computer Science (FOCS), pages 75–83. 1978.

[3]   Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. J. Comput. Syst. Sci., 18(2):155–193, 1979.

[4]   Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. J. of the ACM, 45(3):501–555, May 1998.

[5]   Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J. Comput., 47(3):1087–1097, 2018.

[6]   Anka Gajentaan and Mark H. Overmars. On a class of O(n^{2}) problems in computational geometry. Comput. Geom., 5:165–185, 1995.

[7]   M. R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.

[8]   K. Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatsh. Math. Phys., 38, 1931.

[9]   Oded Goldreich. Computational Complexity: A Conceptual Perspective. Cambridge University Press, 2008.

[10]   David Harvey and Joris van der Hoeven. Integer multiplication in time O(n\log n). Annals of Mathematics, 193(2):563–617, 2021.

[11]   F. C. Hennie. One-tape, off-line turing machine computations. Information and Control, 8(6):553–578, 1965.

[12]   Fred Hennie and Richard Stearns. Two-tape simulation of multitape turing machines. J. of the ACM, 13:533–546, October 1966.

[13]   Russell Impagliazzo and Ramamohan Paturi. The complexity of k-SAT. In IEEE Conf. on Computational Complexity (CCC), pages 237–, 1999.

[14]   Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Computer & Systems Sciences, 63(4):512–530, Dec 2001.

[15]   Russell Impagliazzo and Avi Wigderson. \mathit {P} = \mathit {BPP} if E requires exponential circuits: Derandomizing the XOR lemma. In 29th ACM Symp. on the Theory of Computing (STOC), pages 220–229. ACM, 1997.

[16]   Kojiro Kobayashi. On the structure of one-tape nondeterministic turing machine time hierarchy. Theor. Comput. Sci., 40:175–193, 1985.

[17]   Leonid A. Levin. Universal sequential search problems. Problemy Peredachi Informatsii, 9(3):115–116, 1973.

[18]   O. B. Lupanov. A method of circuit synthesis. Izv. VUZ Radiofiz., 1:120–140, 1958.

[19]   Christos H. Papadimitriou and Mihalis Yannakakis. Optimization, approximation, and complexity classes. J. Comput. Syst. Sci., 43(3):425–440, 1991.

[20]   Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. J. of the ACM, 26(2):361–381, 1979.

[21]   Arnold Schönhage. Storage modification machines. SIAM J. Comput., 9(3):490–508, 1980.

[22]   Claude E. Shannon. The synthesis of two-terminal switching circuits. Bell System Tech. J., 28:59–98, 1949.

[23]   Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Mathematics of Computation, 54(189):435–447, 1990.

[24]   Larry Stockmeyer and Albert R. Meyer. Cosmological lower bound on the circuit complexity of a small problem in logic. J. ACM, 49(6):753–784, 2002.

[25]   Alan M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc., s2-42(1):230–265, 1937.

[26]   Emanuele Viola. Reducing 3XOR to listing triangles, an exposition. Available at http://www.ccs.neu.edu/home/viola/, 2011.

[27]   Emanuele Viola. Pseudorandom bits and lower bounds for randomized turing machines. Theory of Computing, 18(10):1–12, 2022.

Nathan Never strikes back

Amiga Germany Fan’zine – #5 features Nathan Never, 30 years after the game came out (the article is from some months ago, but I just found out). You can read the article here.

My first day of high school I was looking for a desk, and the one that was left was occupied by Marco Genovesi (also see the previous post about Black Viper, our other game together, also featured in the article). We had both programmed a little on the mythical C64. I had written in BASIC a fighting game inspired by Hokuto no Ken. It was hand-coded pixellated sprites doing flying martial arts kicks. It was on a cassette — I didn’t have a floppy disk. I would love to see it again, but it seems lost forever. It was my first game, written sometime when I was between 10 and 12.

If I think back on my schools, from elementary to high, I scream in horror. There was nothing. No play-yard, no props, nothing. Even the walls! How much does it cost to hang a world map? Instead they were solid color. We were there 8-1 or something, every day, including Saturdays, in a room with solid color walls, solid color desks, and a blackboard, and that’s it. Perhaps it’s no wonder that I exercised my art and craft on what I had. I dismembered floppy disks, cassettes, speakers, etc. and stuck them on the wall. I scribbled all over my desk.

At some point I got an Amiga 1000 (my trajectory was Spectrum 48K, Commodore 64, Amiga 1000). Naturally I wanted to program, but had zero leads. Unlike today, it was very hard to get started. Interestingly, people more experienced than me had the same complaint, along the lines of “when I was 13 myself, good luck finding an assembler.” Actually, I think things will continue to move in this direction. Even today, learning something isn’t that easy, and one can imagine how in 20 years you’ll have tutorials precisely tailored to your background, instead of having to zap through YouTube to find the small step you’re missing.

So for a while I was stuck fiddling with startup sequences and shell commands. I remember my parents puzzled at why I wasn’t doing anything fancy like on the C64.
“You’re still at popcli” (a type of shell, known back then as CLI, command line interface) they told me once — I am cursed with good memory.
My reply was “You need a program to write a program.” They looked puzzled. (And so was I, after all, I didn’t need one on the C64.)

My turning point occurred when I did one of my stupid things. I was balancing myself on the top beam of a swing. Naturally, I fell, didn’t land right, and injured my ankle. I was stuck in my room for a month, and one day Marco showed up and brought me the Amiga hardware reference manual, a book we had overheard was critical. I thanked him and I remember him saying: “I am doing it for me.” This book became my bible. I thrashed the stupid C compilers, got an assembler, and started “bashing metal” and having fun.

Then there was the store. I don’t remember how I ended up there. It sold games, so maybe I was just there to buy games. But I never bought a new game — only pirated copies — so the explanation doesn’t stand. Anyway, we were there, and the person there was a techno-musician who knew people who wanted to make a game based on Nathan Never, a pretty high-profile graphic novel series in Italy.

To this day I am shocked they didn’t find anyone else, since as I said it’s pretty high-profile. Maybe the programmer wanted too much money, or ditched them at the last moment and they needed the game fast. Who knows. I had just barely started programming on Amiga, but… I know blitter, I know copper; I am obviously ready for a major project.

So we signed the contract, and I insisted on some money upfront, which I got. A few Ks. The development of the game was covered in games magazines, which other people in our high school were devouring. They were shocked to find out the team was us! Walking around, we would pass by a store and find the box of the game on display.

As part of my deal I managed to get an Amiga 3000 tower, a one-of-a-kind machine, which for a kid like me was a dream. Before that, I didn’t have enough memory to play the game from the assembler with the graphics loaded, so when I tested it I would see dots instead of sprites. Marco, who did the graphics, at the beginning didn’t have a color monitor, and would pick colors from the hex code, quite cool. I brought the 3000T to our vacation that summer, and instead of going to the sea (which I hated) I stayed indoors to program. Naturally, I spent easily 50% of that time programming a script that would compile an animation through a landscape of fractal mountains. The final animation was quite cool.

I wish the game was more fun to play. It isn’t really my fault: no experience, no time, and crucially the people who imposed the game design on us were not players but exclusively concerned with showing the main character, whose frames occupy most of the memory. More than the game they cared about the graphic novel that came with it in the box. Of all the people buzzing around the game, I was the only one who actually played computer games (I still do). Still, of course, I could have improved the gameplay. Instead, on an ultra-tight deadline, I decided to embark on a 3D “escape from the tunnel” last level, something they didn’t ask for (they were happy with an animation, which would have cost me zero effort) and that probably almost no one has seen, since to get there you need to beat two grueling platform levels and a puzzle level, with one life and no save state. (The 3D level starts at 20:09 in this long play.) And this is what it means to be 14.

On the bright side, however, perhaps the game has its unique charm, and today it can be played on a hard disk through the emulator, even though the game was not designed to be installed on a hard disk. And this is great because playing the game from the disks is really painful, despite or in part because of a crazy color hack to make a 3D “change disk” message (another uncalled-for feature). And it’s a good feeling that after 30+ years some people are still watching and playing the game.

Mathematics of the impossible: Computational Complexity, Chapter 3: The grand challenge

All posts in this series.
A PDF version of this post will be published with a delay, but if you’d like to have it soon let me know.




As mentioned in Chapter ??, our ability to prove impossibility results related to efficient computation appears very limited. We can now express this situation more precisely with the models we’ve introduced since then.

It is consistent with our knowledge that any problem in a standard algorithm textbook can be solved

  1. in Time cn^{2} on a TM, and
  2. in Time cn on a 2-TM, and
  3. by circuits of size cn.

Note that 2. implies 1. by Theorem ??.

In this chapter we begin to present several impossibility results, covering a variety of techniques which will be used later as well. As hinted above, they appear somewhat weak. However, jumping ahead, there is a flip side to all of this:

  1. At times, contrary to our intuition, stronger impossibility results are actually false. One example appears in Chapter ??. A list will be given later.
  2. Many times, the impossibility results that we can prove turn out to be, surprisingly, just “short” of proving major results. Here by “major result” I mean a result that would be phenomenal and that was in focus long before the connection was established. We will see several examples of this (section ??, section ??).
  3. Yet other times, one can identify broad classes of proof techniques, and argue that impossibility results can’t be proved with them (section ??).

Given this situation, I don’t subscribe to the general belief that stronger impossibility results are true and we just can’t prove them.

1.1 Information bottleneck: Palindromes requires quadratic time on TMs

Intuitively, the weakness of TMs is the bottleneck of passing information from one end of the tape to the other. We now show how to formalize this and use it to show that deciding if a string is a palindrome requires quadratic time on TMs, which is tight and likely matches the time in Exercise ??. The same bound can be shown for other functions; palindromes just happen to be convenient to obtain matching bounds.

Theorem 1.1. \text {Palindromes}\not \in \text {TM-Time}(t(n)) for any t(n)=o(n^{2}).

More precisely, for every n and s, an s-state TM that decides if an n-bit input is a palindrome requires time \ge cn^{2}/\log s.

The main concept that allows us to formalize the information bottleneck mentioned above is the following.

Definition 1.1. A crossing sequence of a TM M on input x and boundary i, abbreviated i-CS, is the sequence of states that M is transitioning to when crossing cell boundary i (i.e., going from Cell i to i+1 or vice versa) during the computation on x.

The idea in the proof is very interesting. If M accepts inputs x and y and those two inputs have the same i-CS for some i, then we can “stitch together” the computation of M on x and y at boundary i to create a new input z that is still accepted by M. The input z is formed by picking bits from x to the left of cell boundary i and bits from y to the right of i:

\begin{aligned} z:=x_{1}x_{2}\cdots x_{i}y_{i+1}y_{i+2}\cdots y_{n}. \end{aligned}

The proof that z is still accepted is left as an exercise.

Now, for many problems, input z should not be accepted by M, and this gives a contradiction. In particular this will be the case for palindromes. We are going to find two palindromes x and y that have the same i-CS for some i, but the corresponding z is not a palindrome, yet it is still accepted by M. We can find these two palindromes if M takes too little time. The basic idea is that if M runs in time t, because i-CSs for different i correspond to different steps of the computation, for every input there is a value of i such that the i-CS is short, namely has length at most t(|x|)/n. If t(n) is much less than n^{2}, the length of this CS is much less than n, from which we can conclude that the number of CSs is much less than the number of inputs, and so we can find two inputs with the same CS.

ProofLet n be divisible by four, without loss of generality, and consider palindromes of the form

\begin{aligned} p(x):=x0^{n/2}x^{R} \end{aligned}

where x\in \{0,1\} ^{n/4} and x^{R} is the reverse of x.

Assume there are x\ne y in \{0,1\} ^{n/4} and i in the middle part, i.e., n/4\le i\le 3n/4-1, so that the i-CS of M on p(x) and p(y) is the same. Then we can define z:=x0^{n/2}y^{R} which is not a palindrome but is still accepted by M, concluding the proof.

There remains to prove that the assumption of Theorem 1.1 implies the assumption in the previous paragraph. Suppose M runs in time t. Since crossing sequences at different boundaries correspond to different steps of the computation, for every x\in \{0,1\} ^{n/4} there is a value of i in the middle part such that the i-CS of M on p(x) has length \le ct/n. This implies that there is an i in the middle s.t. there are \ge c2^{n/4}/n inputs x for which the i-CS of M on p(x) has length \le ct/n.

For fixed i, the number of i-CS of length \le \ell is \le (s+1)^{\ell }.

Hence there are x\ne y for which p(x) and p(y) have the same i-CS whenever c2^{n/4}/n\ge (s+1)^{ct/n}. Taking logs one gets ct\log (s)/n\le cn. QED
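To spell out the last step: taking logs of c2^{n/4}/n\ge (s+1)^{ct/n} gives

\begin{aligned} \frac {n}{4}-\log n-c\ge \frac {ct}{n}\log (s+1), \end{aligned}

which holds whenever t\le cn^{2}/\log s. Hence a machine that correctly decides palindromes must run in time \ge cn^{2}/\log s, as claimed.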

Exercise 1.1. For every s and n describe an s-state TM deciding palindromes in time cn^{2}/\log s (matching Theorem 1.1).

Exercise 1.2. Let L:=\{xx:x\in \{0,1\} ^{*}\}. Show L\in \text {TM-Time}(cn^{2}), and prove this is tight up to constants.

One may be tempted to think that it is not hard to prove stronger bounds for similar functions. In fact as mentioned above this has resisted all attempts!

1.2 Counting: impossibility results for non-explicit functions

Proving the existence of hard functions is simple: Just count. If there are more functions than efficient machines, some function is not efficiently computable. This is applicable to any model; next we state it for TMs for concreteness. Later we will state it for circuits.

Theorem 1.2. There exists a function f:\{0,1\} ^{n}\to \{0,1\} that cannot be computed by a TM with s states unless cs\log s\ge 2^{n}, regardless of time.

ProofThe number of TMs with s states is \le s^{cs}, and each TM computes at most one function (it may compute none, if it does not stop). The number of functions on n bits is 2^{2^{n}}. Hence if 2^{n}>cs\log s some function cannot be computed. QED

Note this bound is not far from that in Exercise ??.

It is instructive to present this basic result as an application of the probabilistic method:

ProofLet us pick f uniformly at random. We want to show that

\begin{aligned} \mathbb {P}_{f}[\exists \text { an }s\text {-state TM }M\text { such that }M(x)=f(x)\text { for every }x\in \{0,1\} ^{n}]<1. \end{aligned}

Indeed, if the probability is less than 1 then some function exists that cannot be computed. By a union bound we can say that this probability is

\begin{aligned} \le \sum _{M}\mathbb {P}_{f}[M(x)=f(x)\text { for every }x\in \{0,1\} ^{n}], \end{aligned}

where the sum is over all s-state machines. Each probability in the sum is (1/2)^{2^{n}}, since M is fixed. The number of s-state machines is \le s^{cs}. So the sum is \le s^{cs}(1/2)^{2^{n}}, and we can conclude as before by taking logs. QED
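Explicitly, the sum is less than 1 as long as

\begin{aligned} s^{cs}(1/2)^{2^{n}}<1\Leftrightarrow cs\log s<2^{n}, \end{aligned}

recovering the bound in Theorem 1.2.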

1.3 Diagonalization: Enumerating machines

Can you compute more if you have more time? For example, can you write a program that runs in time n^{2} and computes something that cannot be computed in time n^{1.5}? The answer is yes for trivial reasons if we allow for non-boolean functions.

Exercise 1.3. Give a function f:\{0,1\}^* \to \{0,1\}^* in \text {Time}(n^{2})\setminus \text {Time}(n^{1.5}).

The answer is more interesting if the functions are boolean. Such results are known as time hierarchies, and a generic technique for proving them is diagonalization, applicable to any model.

We first illustrate the result in the simpler case of partial functions, which contains the main ideas. Later we discuss total functions.

Theorem 1.3. There is a partial function in \text {TM-Time}(t(n)) such that any TM M computing it runs in time \ge c_{M}t(n), for any t(n)=\omega (1).

In other words, \text {Time}(t(n))\supsetneq \text {Time}(o(t(n))).

ProofConsider the TM H that on input x=(M,1^{n-|M|}) of length n runs M on x until it stops and then complements the answer. (We can use a simple encoding of these pairs, for example every even-position bit of the description of M is a 0.)

Now define X to be the subset of pairs s.t. M runs in time \le t(n)/|M|^{c} on inputs of length n, and |M|^{c}\le t(n)/2.

On these inputs, H runs in time |M|^{c}+|M|^{c}\cdot t(n)/|M|^{c}\le t(n), as desired. To accomplish this, H can begin by making a copy of M in time |M|^{c}\le t(n)/2. Then every step of the computation of M can be simulated by H with |M|^{c} steps, always keeping the description of M to the left of the head.

Now suppose N computes the same function as H in time t(n)/|N|^{c}. Note that x:=(N,1^{n-|N|}) falls in the domain X of the function, for n sufficiently large, using that t(n)=\omega (1). Now consider running N on x. We have N(x)=H(x) by supposition, but H(x) is the complement of N(x), contradiction. QED
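The diagonal argument is short enough to render as a toy Python program, where “machines” are plain functions and the padded encoding is simplified to a prefix; this ignores running times and is only meant to illustrate the flip:

def make_diagonal(machines):
    def H(x):
        i = x.index('1')           # toy encoding: input 0^i 1 ... names machine i
        return 1 - machines[i](x)  # run machine i on the input and flip the answer
    return H

machines = [lambda x: 0, lambda x: 1, lambda x: len(x) % 2]
H = make_diagonal(machines)
for i, M in enumerate(machines):
    x = '0' * i + '1' + '00000'    # a padded input naming machine i
    assert H(x) != M(x)            # H disagrees with every machine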

This proof is somewhat unsatisfactory; in particular we have no control on the running time of H on inputs not in X. It is desirable to have a version of this fundamental result for total functions. Such a version is stated next. It requires additional assumptions on t and a larger gap between the running times. Perhaps surprisingly, as we shall discuss, both requirements are essential.

Theorem 1.4. Let t(n)\ge n be a function. Suppose that f(x):=t(|x|) is in \text {TM-Time}(t(n)/\log ^{c}n).

There is a total function in \text {TM-Time}(ct(n)\log t(n)) that cannot be computed by any TM M in time c_{M}t(n).

The assumption about t is satisfied by all standard functions, including all those in this book. (For example, take t(n):=n^{2}. The corresponding f is then |x|^{2}. To compute f on input x of n bits we can first compute |x|=n in time cn\log n (Exercise ??). This is a number with b:=\log n bits. We can then square this number in time b^{c}. Note that the time to compute f(x) is dominated by the cn\log n term coming from computing |x|, which does not depend on t and is much less than the n^{2}/\log ^{c}n in the assumption.) The assumption cannot be removed altogether because there exist pathological functions t for which the result is false.

The proof is similar to that of Theorem 1.3. However, to make the function total we need to deal with arbitrary machines, which may not run in the desired time or even stop at all. The solution is to clock H in a manner similar to the proof of the universal machine, Lemma ??.

Also, we define a slightly different language to give a stronger result – a unary language – and to avoid some minor technical details (the possibility that the computation of f erases the input).

We define a TM H that on input 1^{n} obtains a description of a TM M, computes t(n), and then simulates M on input 1^{n} for t(n) steps in a way similar to Lemma ??; if M stops then H outputs the complement of the output of M, and if M does not stop then H stops and outputs anything. Now H computes a function in time about t(n). We argue that this function cannot be computed in much less time as follows. Suppose some TM M computes the function in time somewhat less than t(n). Then we can pick an input 1^{n} for which H obtains the description of this M, and the simulation always stops. Hence, for that 1^{n} we would obtain M(1^{n})=H(1^{n})=1-M(1^{n}), which is a contradiction.

However, there are interesting differences with the simulation in Lemma ??. In that lemma the universal machine U was clocking the steps of the machine M being simulated. Now instead we need to clock the steps of U itself, even while U is parsing the description of M to compute its transition function. This is necessary to guarantee that H does not waste time on big TM descriptions.

Whereas in Lemma ?? the tape was arranged as

\begin{aligned} (x,M,\underline {i},t',y), \end{aligned}

it will now be arranged as

\begin{aligned} (x,M',\underline {i},t',M'',y) \end{aligned}

which is parsed as follows. The description of M is M'M'', M is in state \underline {i}, the tape of M contains xy and the head is on the left-most symbol of y. The integer t' is the counter, decreased at every step.

ProofDefine TM H that on input 1^{n}:

  1. Compute (n,t(n),1^{n}).
  2. Compute (M_{n},t(n),1^{n}). Here M_{n} is obtained from n by removing the left-most zeroes and the first 1; i.e., if n=0^{j}1x then M_{n}=x. This is similar to the fact that a program does not change if you add, say, empty lines at the bottom.
  3. Simulate