Mathematics of the impossible: Computational Complexity, Chapter 6, Alternation

We placed one quantifier “in front” of computation and got something interesting: NP. So let’s push the envelope.

Definition 6.1. \Sigma _{i}Time(t(n)) is the set of functions f:X\subseteq \{0,1\}^* \to \{0,1\} for which there is a RAM M such that on input (x,y_{1},y_{2},\ldots ,y_{i}) stops within t(|x|) steps and

\begin{aligned} f(x)=1\Leftrightarrow \exists y_{1}\forall y_{2}\exists y_{3}\forall y_{4}\ldots Q_{i}y_{i}\in \{0,1\} ^{t(|x|)}:M(x,y_{1},y_{2},\ldots ,y_{i})=1. \end{aligned}

\Pi _{i}Time(t(n)) is defined similarly except that we start with a \forall quantifier. We also define

\begin{aligned} \text {\ensuremath {\Sigma _{i}\text {P}}}:= & \bigcup _{d}\Sigma _{i}\text {Time}(n^{d}),\\ \text {\ensuremath {\Pi _{i}\text {P}}}:= & \bigcup _{d}\Pi _{i}\text {Time}(n^{d}),\text { and}\\ \text {the power hiearchy PH}:= & \bigcup _{i}\Sigma _{i}\text {P}=\bigcup _{i}\Pi _{i}\text {P}. \end{aligned}

We refer to such computation and the corresponding machines as alternating, since they involve alternation of quantifiers and we will soon see a connection with alternating circuits.

As for NP, Definition 5.1, note that the running time of M is a function of |x| only. Again, this difference is inconsequential for \Sigma _{i}\text {P}, since the composition of two powers is another power. But it is important for a more fine-grained analysis.

Exercise 6.1. Min-Ckt is the problem of deciding if an input circuit has an equivalent circuit which is smaller. It is not known to be in NP. In which of the above classes can you place it?

6.1 Does the PH collapse?

We refer to the event that \exists i:\Sigma _{i}\text {P}=\text {PH} as “the PH collapses.” It is unknown if the PH collapses. Most people appear to believe that it does not, and to consider statements of the type

\begin{aligned} X\Rightarrow \text {PH collapses} \end{aligned}

as evidence that X is false. Examples of such statements are discussed next.

Theorem 6.1. \text {P}=\text {NP}\Rightarrow \text {P}=\text {PH}.

The idea in the proof is simply that if you can remove a quantifier then you can remove more.

ProofWe prove by induction on i that \Sigma _{i}\text {P}\bigcup \Pi _{i}\text {P}=\text {P}.

The base case i=1 follows by assumption and the fact that P is closed under complement.

Next we do the induction step. We assume the conclusion is true for i and prove it for i+1. We will show \Sigma _{i+1}\text {P}=\text {P}. The result about \Pi _{i+1}P follows again by complementing.

Let L\in \sum _{i+1}\text {P}, so \exists a and a power-time TM M such that for any x\in \{0,1\} ^{n},

\begin{aligned} x\in L\Leftrightarrow \exists y_{1}\forall y_{2}\exists y_{3}\forall y_{4}\ldots Q_{i+1}y_{i+1}\in \{0,1\} ^{n^{a}}:M(x,y_{1},y_{2},\ldots ,y_{i+1})=1. \end{aligned}

(As discussed after Definition 6.1 we don’t need to distinguish between time as a function of |x| or of |(x,y_{1},y_{2},\text {\dots },y_{i+1})| when considering power times as we are doing now.)

Now the creative step of the proof is to consider

\begin{aligned} L':=\left \{ (x,y_{1}):\forall y_{2}\in \{0,1\} ^{n^{a}}\ldots Q_{i+1}y_{i+1}\in \{0,1\} ^{n^{a}}:M(x,y_{1},y_{2},\ldots ,y_{i+1})=1\right \} . \end{aligned}

Note L'\in \Pi _{i}\text {P}. By induction hypothesis L'\in \text {P}. So let TM M' solve L' in power time. So x\in L\iff \exists y_{1}\in \{0,1\} ^{n^{a}}:M'(x,y_{1})=1. And so L\in \text {NP\ensuremath {=\text {P}}}, again using the hypothesis. QED

Exercise 6.2. Prove the following strengthening of Theorem 6.1:

\begin{aligned} \bigcup _{d}\text {NTime}(dn)\subseteq \text {Time}(n^{1+\epsilon })\Rightarrow \bigcup _{d}\text {\ensuremath {\Sigma _{i}\text {Time}(dn)\subseteq \text {Time}(n^{1+\epsilon ^{c_{i}}})}.} \end{aligned}

Exercise 6.3. Show that if \Sigma _{i}\text {P}=\Pi _{i}\text {P} for some i then the PH collapses to \Sigma _{i}\text {P}, that is, \text {PH}=\Sigma _{i}\text {P}.

Theorem 6.2. [21] \text {NP}\subseteq \text {PCkt}\Rightarrow \text {PH}=\Sigma _{2}\text {P}.

ProofWe’ll show \Pi _{2}\text {P}\subseteq \Sigma _{2}\text {P} and then appeal to Exercise 6.3. Let f\in \Pi _{2}\text {Time}(n^{d}) and M be a corresponding machine s.t. 

\begin{aligned} f(x)=1\Leftrightarrow \forall y_{1}\in \{0,1\} ^{n^{d}}\exists y_{2}\in \{0,1\} ^{n^{d}}:M(x,y_{1},y_{2})=1. \end{aligned}

We claim the following equivalent expression for the right-hand side above:

\begin{aligned} \forall y_{1}\in \{0,1\} ^{n^{d}}\exists y_{2}\in \{0,1\} ^{n^{d}}:M(x,y_{1},y_{2})=1\Leftrightarrow \exists C\forall y_{1}\in \{0,1\} ^{n^{d}}:M(x,y_{1},C(x,y_{1}))=1, \end{aligned}

where C ranges over circuits of size |x|^{d'} for some d'. If the equivalence is established the result follows, since evaluating a circuit can be done in power time.

To prove the equivalence, first note that the the \Leftarrow direction is obvious, by setting y_{2}:=C(x,y_{1}). The interesting direction is the \Rightarrow . We claim that under the assumption, there is a circuit that given x,y_{1} outputs a string y_{2} that makes M(x,y_{1},y_{2}) accept, if there is such a string.

To verify this, consider the problems CktSat and Search-CktSat which are analogous to the 3Sat and Search-3Sat problems but for general circuits rather than 3CNF. CktSat is in NP, and so by assumption has power-size circuits. By the reduction in Theorem 4.5, Search-CktSat has power-size circuits S as well. Hence, the desired circuit C may, on input x and y_{1} produce a new circuit W mapping an input y_{2} to M(x,y_{1},y_{2}), and run S on W. QED

Exercise 6.4. Prove that \text {PH}\not \subseteq \text {CktGates}(n^{k}), for any k\in \mathbb {N}. (Hint: Existentially guess the truth table of a hard function.)

Improve this to \text {\ensuremath {\Sigma _{2}\text {P}}}\not \subseteq \text {CktGates}(n^{k}).

Exercise 6.5. Prove \text {Exp}\subseteq \text {PCkt}\Rightarrow \text {Exp}=\Sigma _{2}\text {P}.

6.2 PH vs. alternating circuits

As suggested by the word alternation in Definition 6.1, the power hierarchy PH and its subclasses can be seen as alternating circuits. Before presenting this connection it is useful to write problems in PH in a specific format. The next lemma shows that we can restrict the machine M in Definition 6.1 to read only one bit of the input x. The price for this is that we are going to have to introduce an extra quantifier, however this new quantifier will only range over \log t bits.

Lemma 6.1. Let L\in \Sigma _{i}\text {P}. Then there exists a RAM M s.t.:

\begin{aligned} x\in L\Leftrightarrow & \exists y_{1}\in \{0,1\} ^{t(|x|)}\forall y_{2}\in \{0,1\} ^{t(|x|)}\ldots Q_{i-1}y_{i-1}\in \{0,1\} ^{t(|x|)}\\ & Q_{i}(y_{i},z)\in \{0,1\} ^{2t(|x|)}Q_{i+1}y_{i+1}\in \{0,1\} ^{\log t(|x|)}:M(x,y_{1},y_{2},\ldots ,y_{i+1})=1, \end{aligned}

and M on input (x,y_{1},y_{2},\ldots ,y_{i+1}) stops within ct(|x|) steps and only reads one bit of x.

Note the first i-1 quantifiers are over t bits and unchanged from Definition 6.1, the next one is over 2t bits, written as a pair (y_{i},z), and the last is over \log t. The idea is… you guessed it! We are going to guess the bits read from the input, and verify that each of them is correct.

ProofLet M' be the machine corresponding to L in Definition 6.1. We assume that Q_{i}=\exists , i.e., i is odd. The case Q_{i}=\forall is left for exercise.

We enlarge Q_{i} to quantify over additional t bits z which correspond to the bits of x read by M'. The desired machine M simulates M' with the following change. At any time step s, if M' reads bit j of x:

(1) If s\ne y_{i+1} then M does not read x and instead uses bit z_{s},

(2) If s=y_{i+1} then M reads the corresponding bit of x, and it also checks that this bit equals z_{s}. If it doesn’t M rejects.

By inspection, M reads at most one bit of x.

Correctness is argued as follows. For any y_{1},y_{2},\ldots ,y_{i}, if M' accepts x then there are values z for the bits read from the input x that cause M to accept. Conversely, if M accepts for some z then this z matches the bits read from x by M', for else M would reject in (2). Hence M' accepts x as well. QED

We now show how to simulate computation in PH by small-depth circuits. The size of the circuit is exponential in the time; such a simulation would not be interesting if the circuit is not explicit and the time is more than the input length, since every function on n bits has circuits of size 2^{n} (Theorem 2.3). By contrast, the simulation will give explicit circuits and also apply to running times less than the input length. The setting of running time less than the input length makes the simulation interesting even for non-explicit circuits, and is soon to play a critical role.

Lemma 6.2. Any function in \Sigma _{d}\text {Time}(t)\bigcup \Pi _{d}\text {Time}(t) has on inputs of length n alternating circuits of depth d+1 and 2^{cdt(n)} gates. The fan-in of each gate is \le 2^{ct(n)} and the fan-in of the gates closest to the input is \le t(n). Moreover, the circuit is explicit in the following sense: Given an index to a gate g of fan-in h and a number i\le h we can compute the index of child i of g in \text {Time}(cn).

In fact, most of this circuit only depends on t. The only thing that depends on the actual function being computed are the connections between the gates closest to the input and the input.

ProofWe apply Lemma 6.1. Then an \exists (resp. \forall ) quantifier on b bits corresponds to an Or (resp. And) gate with fan-in 2^{b}. A gate g closest to the input correspond to the computation of the RAM for fixed quantified variables. This computation depends on at most one bit of x. If this computation is a constant independent of x, we simply replace this gate with the appropriate constant. Otherwise, if it depends on a bit x_{j} of x then the computation of the RAM is either x_{j} or \neg x_{j}. Thus we can connect gate g to either input gate x_{j} or input gate \neg x_{j}.

An index to a gate g next to the input is just an assignment to the quantified variables y_{i}. Given such an index and i\le t we can compute in linear time which input bit it depends on. This is done by simulating the machine until the i-th time it reads an input bit. Note this simulation runs in time ct which is linear in the length of an index to the gate. QED

6.3 BPP in PH

It is not known if BPP is contained in NP. However, we can show that BPP is in PH. More precisely, the following two simulations are known. The first optimizes the number of quantifiers, the second the time. This should be contrasted with various conditional results suggesting that in fact a quasilinear deterministic simulation (with no quantifiers) is possible.

Theorem 6.3. For every function t we have:

(1) [35] \text {BPTime}(t)\subseteq \Sigma _{2}\text {Time}(t^{2}\log ^{c}t), and

(2) [4
\text {BPTime}(t)\subseteq \Sigma _{3}\text {Time}(t\log ^{c}t).

A good way to think of these results is as follows. Fix a \text {BPTime}(t) machine M and an input x\in \{0,1\} ^{n}. Now the alternating machine is trying to decide if for most choices of the random bits y we have M(x,y)=1, or if for most choices we have M(x,y)=0. This can be seen as a version of the Majority problem, with two critical features. The first is that it is succinct in the sense of section 5.4.1, that is, the alternating machine does not have access to the exponentially-long majority instance, but rather has access to a small circuit M(x,\cdot ) s.t. M(x,y) is bit y of the majority instance. The second is that instances have a gap. We define this gap-majority problem next. We define the non-succinct version which is of independent interest, and so use the letter n to indicate input length. But recall that for the application later in this section the input is given succinctly, as just remarked, so the gap majority instance will actually be on a number of bits that is exponential in the input length to the alternating machine.

Definition 6.2. \text {Gap-Maj}_{\alpha ,\beta } is the problem of deciding if an input x\in \{0,1\} ^{n} has weight |x|\le \alpha n or |x|\ge \beta n.

As discussed in section º6.2, it is useful to think of alternating computation as alternating circuits. Indeed, the circuit result that is the starting point of all these simulations is the following somewhat surprising construction of small-depth alternating circuits for Gap-Maj. By contrast, (non-gap) Maj does not have small constant-depth alternating circuits, as we will prove in Chapter ??.

Lemma 6.3. [3]\text { Gap-Maj}_{1/3,2/3}(x) has alternating circuits of depth 3 and size n^{c}. Moreover, the gates at distance 1 from the input have fan-in \le c\log n.

ProofThis is a striking application of the probabilistic method. For a fixed pair of inputs (x,y) we say that a distribution C on circuits gives (\le p,\ge q) if \mathbb {P}_{C}[C(x)=1]\le p and \mathbb {P}_{C}[C(y)=1]\ge q; and we similarly define gives with reverse inequalities. Our goal is to have a distribution that gives

\begin{aligned} (\le 2^{-n},\ge 1-2^{-n})~~~~(6.1) \end{aligned}

for every pair (x,y)\in \{0,1\} ^{n}\times \{0,1\} ^{n} where |x|\le n/3 and |y|\ge 2n/3. Indeed, if we have that we can apply a union bound over the <2^{n} inputs to obtain a fixed circuit that solves Gap-Maj.

We construct the distribution C incrementally. Fix any pair (x,y) as above. Begin with the distribution C_{\wedge } obtained by picking 2\log n bits uniformly from the input and computing their And. This gives

\begin{aligned} \left ((1/3)^{2\log n},(2/3)^{2\log n}\right ). \end{aligned}

Let p:=(1/3)^{2\log n} and note (2/3)^{2\log n}=p\cdot n^{2}. So we can say that C_{\wedge } gives

\begin{aligned} \left (\le p,\ge p\cdot n^{2}\right ). \end{aligned}

Now consider the distribution C_{\vee } obtained by complementing the circuits in C_{\wedge }. This gives

\begin{aligned} \left (\ge 1-p,\le 1-p\cdot n^{2}\right ). \end{aligned}

Next consider the distribution C_{\wedge \vee } obtained by taking the And of m:=p^{-1}/n independent samples of C_{\vee }. This gives

\begin{aligned} \left (\ge (1-p)^{m},\le (1-p\cdot n^{2})^{m}\right ). \end{aligned}

To make sense of these quantities we use the basic approximations

\begin{aligned} 1+z \le e^{z }\le 1+2z \end{aligned}

valid for all z \in [0,1]. These imply (1-p)^{m}\ge e^{-pm/2}=e^{-1/(2n)}\ge 0.9 and (1-p\cdot n^{2})^{m}\le e^{-n}. Summarizing, this gives

\begin{aligned} (\ge 0.9,\le e^{-n}). \end{aligned}

Next consider the distribution C_{\vee \wedge } obtained by complementing the circuits in C_{\wedge \vee }. This gives

\begin{aligned} (\le 0.1,\ge 1-e^{-n}). \end{aligned}

Finally, consider the distribution C_{\wedge \vee \wedge } obtained by taking the And of n independent samples of C_{\vee \wedge }. This gives

\begin{aligned} \left (\le 0.1^{n},\ge \left (1-e^{-n}\right )^{n}\right ). \end{aligned}

To make sense of the rightmost quantity we can use the approximation

\begin{aligned} (1+\alpha )^{r}\ge 1+r\alpha \end{aligned}

valid for all \alpha \ge -1 and r\ge 1. Thus this gives

\begin{aligned} \left (\le 0.1^{n},\ge 1-ne^{-n}\right ). \end{aligned}

We have ne^{-n}<2^{-n}. Thus this distribution in particular gives equation (??). The bounds on the number of gates and the fan-in holds by inspection. QED

Exercise 6.6. Prove \text { Gap-Maj}_{1/2-1/\sqrt {\log n},1/2+1/\sqrt {\log n}} has alternating circuits of depth c and size n^{c}.

Exercise 6.7. Assume the circuit in Lemma 6.3 is explicit in the sense of Lemma 6.2. Prove Theorem 6.3.

There remains to construct explicit circuits for Gap-Maj. We give a construction which has worse parameters than Lemma 6.3 but is simple and suffices for (1) in Theorem 6.3. The idea is that if the input weight of x is large, then we can find a few shifts of the ones in x that cover each of the n bits. But if the weight of x is small we can’t. By “shift” by s we mean the string x_{i\oplus s}, obtained from x by permuting the indices by xoring them with s. (Other permutations would work just as well.)

Lemma 6.4. Let r:=\log n. The following circuit solves \text {GapMaj}_{1/r^{2},1-1/r^{2}} on every x\in \{0,1\} ^{n}:

\begin{aligned} \bigvee s_{1},s_{2},\ldots ,s_{r}\in \{0,1\} ^{r}:\bigwedge i\in \{0,1\} ^{r}:\bigvee j\in \{1,2,\ldots ,r\}:x_{i\oplus s_{j}}. \end{aligned}

Note that

\begin{aligned} \bigwedge i\in \{0,1\} ^{r}:\bigvee j\in \{1,2,\ldots ,r\}:x_{i\oplus s_{j}} \end{aligned}

means that every bit i is covered by some shift s_{j} of the input x.

ProofAssume |x|\le n/r^{2}. Each shift s_{i} contributes at most n/r^{2} ones. Hence all the r shifts contribute \le n/r ones, and we do not cover every bit i.

Now assume |x|\ge n(1-1/r^{2}). We show the existence of shifts s_{i} that cover every bit by the probabilistic method. Specifically, for a fixed x we pick the shifts uniformly at random and aim to show that the probability that we do not cover every bit is <1. Indeed:

\begin{aligned} \begin{aligned} & \mathbb {P}_{s_{1},s_{2},\ldots ,s_{r}}[\exists i\in \{0,1\} ^{r}:\forall j\in \{1,2,\ldots ,r\}:x_{i\oplus s_{j}}=0]\\ \le & \sum _{i\in \{0,1\} ^{r}}\mathbb {P}_{s_{1},s_{2},\ldots ,s_{r}}[\forall j\in \{1,2,\ldots ,r\}:x_{i\oplus s_{j}}=0] & \text {(union bound)}\\ = & \sum _{i\in \{0,1\} ^{r}}\mathbb {P}_{s}[x_{i\oplus s}=0]^{r} & \text {(independence of the }s_{i}\text {)}\\ \le & \sum _{i\in \{0,1\} ^{r}}(1/r^{2})^{r} & \text {(by assumption on }|x|\text {)}\\ \le & (2/r^{2})^{r}\\ < & 1, \end{aligned} \end{aligned}

as desired. QED

Exercise 6.8. Prove (1) in Theorem 6.3.

Lemma 6.4 is not sufficient for (2) in Theorem 6.3. One can prove (2) by derandomizing the shifts in Lemma 6.4. This means generating their r^{2} bits using a seed of only r\log ^{c}r bits (instead of the trivial r^{2} in Lemma 6.4.).

Exercise 6.9. Prove:

(1) \text {P}=\text {NP}\Rightarrow \text {P}=\text {BPP}.

(2) \Sigma _{2}\text {P}\subseteq \text {BPP}\Rightarrow \text {PH collapses}.

6.4 The quantifier calculus

We have extended \text {P} with \exists and \forall quantifiers. We have also extended it with randomness to obtain \text {BPP}. As alluded to before, we can also think of BPP as a quantifier \text {BP} applied to \text {P}. The Unique-3Sat problem (Theorem 5.8) also points to a new quantifier, “exists unique.” We now develop a general calculus of quantifiers, and examine fundamental relationships between then. For simplicity, we only consider power-time computation.

Definition 6.3. Let C be a class of functions mapping \{0,1\}^* \to \{0,1\} . We define L\in \text {Op}\cdot C if there is L'\in C and d\in \mathbb {N} such that

  • \text {Op}=\text {Maj}
    \begin{aligned} x\in L & \Leftrightarrow \mathbb {P}_{y\in \{0,1\} ^{|x|^{d}}}[(x,y)\in L']\ge 1/2. \end{aligned}
  • \text {Op}=\text {BP}
    \begin{aligned} x\in L & \Rightarrow \mathbb {P}_{y\in \{0,1\} ^{|x|^{d}}}[(x,y)\in L']\ge 2/3,\\ x\not \in L & \Rightarrow \mathbb {P}_{y\in \{0,1\} ^{|x|^{d}}}[(x,y)\in L']\le 1/3. \end{aligned}
  • \text {Op}=\oplus (read: parity)
    \begin{aligned} x\in L\Leftrightarrow \text {there is an odd number of }y\in \{0,1\} ^{|x|^{d}}:(x,y)\in L'. \end{aligned}
  • \text {Op}=\exists
    \begin{aligned} x\in L\Leftrightarrow \exists y\in \{0,1\} ^{|x|^{d}}:(x,y)\in L'. \end{aligned}
  • \text {Op}=\forall
    \begin{aligned} x\in L\Leftrightarrow \forall y\in \{0,1\} ^{|x|^{d}}:(x,y)\in L'. \end{aligned}

With this notation we have: \text {NP}=\exists \cdot \text {P}, \text {BPP}=\text {BP}\cdot \text {P}, \Sigma _{2}\text {P}=\exists \cdot \forall \cdot \text {P}.

6.5 PH is a random low-degree polynomial

In this section we prove the following result.

Theorem 6.4. [37] \text {PH}\subseteq \text {BP}\cdot \oplus \cdot \text {P}.

This is saying that any constant number of \exists and \forall quantifier can be replaced by a BP quantifier followed by a \oplus quantifier. Let’s see what this has to do with the title of this section. Where is the polynomial? Consider polynomials over \mathbb {F}_{2}\in \{0,1\} . Recall that such a polynomial over n bits is an object like

\begin{aligned} p(x_{1},x_{2},\ldots ,x_{n})=x_{1}\cdot x_{2}+x_{3}+x_{7}\cdot x_{2}\cdot x_{1}+x_{2}+1. \end{aligned}

Because we are only interested in inputs in \{0,1\} we have x^{i}=x for any i\ge 1 and any variable x, so we don’t need to raise variables to powers bigger than one.

Example 6.1. The And function on n bits can be written as the polynomial

\begin{aligned} \text {And}(x_{1},x_{2},\ldots ,x_{n})=\prod _{i=1}^{n}x_{i}. \end{aligned}

The Or function on n bits can be written as the polynomial

\begin{aligned} \text {Or}(x_{1},x_{2},\ldots ,x_{n})=1+\text {And}(1+x_{1},1+x_{2},\ldots ,1+x_{n})=1+\prod _{i=1}^{n}(1+x_{i}). \end{aligned}

For n=2 we have

\begin{aligned} \text {Or}(x_{1},x_{2})=x_{1}+x_{2}+x_{1}\cdot x_{2}. \end{aligned}

The polynomial corresponding to a PH computation will have an exponential number of terms, so we can’t write it down. The big sum over all its monomials corresponds to the \oplus in Theorem 6.4. The polynomial will be sufficiently explicit: we will be able to compute each of its monomials in P. Finally, there won’t be just one polynomial, but we will have a distribution on polynomials, and that’s the BP part.

Confusing? Like before, a good way to look at this result is in terms of alternating circuits. We state the basic circuit result behind Theorem 6.4 after a definition. The result is of independent interest and will be useful later.

Definition 6.4. A distribution P on polynomials computes a function f:\zonzo with error \epsilon if for every x we have

\begin{aligned} \mathbb {P}_{P}[P(x)=f(x)]\ge 1-\epsilon . \end{aligned}

Theorem 6.5. Let C:\{0,1\} ^{n}\to \{0,1\} be an alternating circuit of depth d and size s. Then there is a distribution P on polynomials over \mathbb {F}_{2} of degree \log ^{d}s/\epsilon that computes C with error \epsilon .

Ultimately we only need constant error, but the construction requires small error. Jumping ahead, this is because we construct distributions for each gate separately, and we need the error to be small enough for a union bound over all gates in the circuit.

The important point in Theorem 6.5 is that if the depth d is small (e.g., constant) (and the size is not enormous and the error is not too small) then the degree is small as well. For example, for power-size alternating circuits of constant depth the degree is power logarithmic for constant error.

Let us slowly illustrate the ideas behind Theorem 6.5 starting with the simplest case: C is just a single Or gate on n bits.

Lemma 6.5. For every \epsilon and n there is a distribution P on polynomials of degree \log 1/\epsilon in n variables over \mathbb {F}_{2} that computes Or with error \epsilon .

ProofFor starters, pick the following distribution on linear polynomials: For a uniform A=(A_{1},A_{2},\ldots ,A_{n})\in \{0,1\} ^{n} output the polynomial

\begin{aligned} p_{A}(x_{1},x_{2},\ldots ,x_{n}):=\sum _{i}A_{i}\cdot x_{i}. \end{aligned}

Let us analyze how p_{A} behaves on a fixed input x\in \{0,1\} ^{n}:

  • If \text {Or}(x)=0 then p_{A}(x)=0;
  • If \text {Or}(x)=1 then \mathbb {P}_{A}[p_{A}(x)=1]\ge 1/2.

While the error is large in some cases, a useful feature of p_{A} is that it never makes mistakes if \text {Or}(x)=0. This allows us to easily reduce the error by taking t:=\log 1/\epsilon polynomials p_{A} and combining them with an Or.

\begin{aligned} p_{A_{1},A_{2},\ldots ,A_{t}}(x):=p_{A_{1}}(x)\vee p_{A_{2}}(x)\vee \cdots \vee p_{A_{t}}(x). \end{aligned}

The analysis is like before:

  • If \text {Or}(x)=0 then p_{A_{1},A_{2},\ldots ,A_{t}}(x)=0;
  • If \text {Or}(x)=1 then \mathbb {P}_{A_{1},A_{2},\ldots ,A_{t}}[p{}_{A_{1},A_{2},\ldots ,A_{t}}(x)=1]\ge 1-(1/2)^{t}\ge 1-\epsilon .

It remains to bound the degree. Each p_{A_{i}} has degree 1. The Or on t bits has degree t by Example 6.1. Hence the final degree is t=\log 1/\epsilon . QED

Exercise 6.10. Obtain the same result for C=\text {And}.

Now we would like to handle general circuits which have any number of And and Or gates. As mentioned earlier, we apply the construction above to every gate, and compose the polynomials. We pick the error at each gate small enough so that we can do a union bound over all gates.

Proof of Theorem 6.5 We apply Lemma 6.5 to every gate in the circuit with error \epsilon /s. By a union bound, the probability that any gate makes a mistake is \epsilon , as desired.

The final polynomial is obtained by composing the polynomials of each gate. The composition of a polynomial of degree d_{1} with another of degree d_{2} results in a polynomial of degree d_{1}\cdot d_{2}. Since each polynomial has degree \log s/\epsilon , the final degree is \log ^{d}s/\epsilon . QED

6.5.1 Back to PH

We have proved Theorem 6.5 which is a circuit analogue of Theorem 6.4. We now go back to the PH. First, we have to come up with a more explicit description of the polynomials, instead of picking them at random. This is similar to the way we proceeded in section º6.3: After a non-explicit construction (Lemma 6.3) we then obtained an explicit construction (Lemma 6.4).

Let us go back to the simplest case of Or. Recall that the basic building block in the proof of Lemma 6.5 was the construction of a distribution p_{A} on linear polynomials which are zero on the all-zero input (which just means that they do not have constant terms), and are often non-zero on any non-zero input. We introduce a definition, since now we will have several constructions with different parameters.

Definition 6.5. A distribution p_{A} on linear polynomials with no constant term has the Or property if \mathbb {P}_{A}[p_{A}(x)=1]\ge 1/3 for any x\ne 0. We identify p_{A} with the n bits A corresponding to its coefficients.

The next lemma shows that we can compute distributions on linear polynomials with the Or property from a seed of just \log n+c bits, as opposed to the n bits that were used for A in the proof of Lemma 6.5. Recall that for our application to Lemma 6.5 the polynomials have an exponential number of monomials and so we cannot afford to write them down. Instead we shall guarantee that given a seed r and an index to a monomial we can compute the monomial via a function f in P. In this linear case, for a polynomial in n variables we have \le n monomials x_{i}. So the function f takes as input r and a number i\le n and outputs the coefficient to x_{i}.

Lemma 6.6. [25] Given n, i\le n, and r\in \{0,1\} ^{2\log n+c} we can compute in P a function f(r,i) such that for uniform R\in \{0,1\} ^{2\log n+c} the distribution

\begin{aligned} (f(R,1),f(R,2),\ldots ,f(R,n)) \end{aligned}

has the Or property.

Proof[4] Let q:=2^{\log n+c} and identify the field \mathbb {F}_{q} with bit strings of length \log q. We view r as a pair (s,t)\in (\mathbb {F}_{q})^{2}. Then we define

\begin{aligned} f((s,t),i):=\langle s^{i},t\rangle \end{aligned}

where s^{i} is exponentiation in \mathbb {F}_{q} and \langle .,.\rangle :(\mathbb {F}_{q})^{2}\to \{0,1\} is defined as \langle u,v\rangle :=\sum u_{i}\cdot v_{i} over \mathbb {F}_{2}.

To show that this has the Or property, pick any non-zero x\in \{0,1\} ^{n}. We have to show that

\begin{aligned} p:=\mathbb {P}_{S,T}[\sum _{i}\langle S^{i},T\rangle x_{i}=1]\ge 1/3. \end{aligned}

The critical step is to note that

\begin{aligned} \sum _{i}\langle S^{i},T\rangle x_{i}=\sum _{i}\langle x_{i}\cdot S^{i},T\rangle =\langle \sum _{i}x_{i}\cdot S^{i},T\rangle . \end{aligned}

Now, if x\ne 0, then the probability over S that \sum _{i}x_{i}\cdot S^{i}=0 is \le n/q\le 1/6. This is because any S that gives a zero is a root of the non-zero, univariate polynomial q(y):=\sum _{i}x_{i}\cdot y^{i} of degree \le n over \mathbb {F}_{q}, and so the bound follows by Theorem 5.8.

Whenever \sum _{i}x_{i}\cdot S^{i}\ne 0, the probability over T that \langle \sum _{i}x_{i}\cdot S^{i},T\rangle =0 is 1/2. Hence our target probability p above satisfies

\begin{aligned} p\ge 1/2-1/6 \end{aligned}

as desired. QED

Exercise 6.11. Give an alternative construction of a distribution with the Or property following this guideline.

(1) Satisfy the Or property for every input x with weight 1.

(2) For any j, satisfy the Or property for every input x with weight between 2^{j} and 2^{j+1}. Use Lemma 5.2.

(3) Combine (2) with various j to satisfy the Or property for every input.

(4) State the seed length for your distribution and compare it to that of Lemma 6.6.

With this in hand, we can now reduce the error in the same way we reduced it in the proof of Lemma 6.5.

Lemma 6.7. Given n, a seed r\in \{0,1\} ^{c\log (1/\epsilon )\log n}, and m\le n^{c\log 1/\epsilon } we can compute in P a monomial X_{r,m} of degree c\log 1/\epsilon such that the distribution

\begin{aligned} \sum _{m}X_{R,m} \end{aligned}

for uniform R computes Or with error \epsilon .

ProofWe use the construction

\begin{aligned} p_{A_{1},A_{2},\ldots ,A_{t}}(x):=p_{A_{1}}(x)\vee p_{A_{2}}(x)\vee \cdots \vee p_{A_{t}}(x) \end{aligned}

from the proof of Lemma 6.5, except that each A_{i} is now generated via Lemma 6.5, and that t=c\log 1/\epsilon (as opposed to t=\log 1/\epsilon before). The bound on the degree is the same as before, as is the proof that it computes Or: The error will be (1/3)^{c\log 1/\epsilon }\le \epsilon .

There remains to show that the monomials can be computed in P. For this we can go back to the polynomial for Or in Example 6.1. Plugging that gives

\begin{aligned} p_{A_{1},A_{2},\ldots ,A_{t}}(x)=\sum _{a\in \{0,1\} ^{t}:a\ne 0}\prod _{i\le t}(p_{A_{i}}(x)+a_{i}+1). \end{aligned}

We can use m to index a choice of a and then a choice for a monomial in each of the t linear factors p_{A_{i}}(x)+a_{i}+1. For each factor we can use Lemma 6.6 to compute the monomials. QED

We can now use Lemma 6.7 to prove Theorem 6.4. As in the proof Theorem 6.5 we will convert each gate to a polynomial.

Proof of Theorem 6.4 Let L\in \text {PH}. Apply Lemma 6.2 to obtain a corresponding alternating circuit C of depth d and size s\le 2^{n^{d}} for a constant d. Replace each gate with the distribution on polynomials given by Lemma 6.7. Note each of these polynomials is on 2^{n^{c_{d}}}variables and has degree n^{c_{d}}. We set the error \epsilon in Lemma 6.7 to be 1/(3s). This guarantees that the seed length in each application of Lemma 6.7 is \le n^{c_{d}}. Moreover, the seed used to sample the polynomials is re-used across all gates. We can afford this because we use a union bound in the analysis. Hence the seed length for all the polynomials is again just n^{c_{d}}. We can quantify over this many bits using the BP quantifier.

Finally, we have to compose the polynomials at each gate. We are going to show how we can compute the monomials of the composed polynomial in the same way as we computed monomials in Lemma 6.7. Once we have a monomial in the input variables x_{1},x_{2},\ldots ,x_{n} we simply evaluate it in P on the input bits.

Start at the output gate. We use n^{c_{d}} bits in the \oplus quantifier to choose a monomial in the corresponding polynomial. We write down this monomial, using Lemma 6.7. This monomial is over the 2^{n^{d}} variables z_{1},z_{2},\ldots corresponding to the children of the output gate, and only contains n^{c_{d}} variables. To each z_{i} there corresponds a polynomial p_{i}. Choose a monomial from p_{i} and replace z_{i} with that monomial; do this for every i. The choice of the monomials is done again using bits quantified by the \oplus quantifier. We continue in this way until we have monomials just in the input variables x_{1},x_{2},\ldots ,x_{n}. Because each monomial is over n^{c_{d}} variables, and the depth is constant, the total number of bits in the \oplus quantifier, which are needed to choose monomials, is n^{c_{d}}. QED

Exercise 6.12. A set C\subseteq \{0,1\} ^{n} of size 2^{k} is called linear if there exists an n\times k matrix M over \mathbb {F}_{2} such that C=\{Mx:x\in \{0,1\} ^{k}\}.

  1. Recall error-correcting codes from Exercise 2.18. Prove that a linear set C is an error-correcting code iff the weight of any non-zero string in C is at least n/3.
  2. Prove the existence of linear error-correcting codes matching the parameters in Exercise 2.18.
  3. Let S be a subset of \{0,1\} ^{k} s.t. the uniform distribution over S has the Or property. Define |S|\times k matrix M_{S} where the rows are the elements of S. Prove that \{M_{S}x:x\in \{0,1\} ^{k}\} is an error-correcting code.
  4. Give explicit error-correcting codes over ck^{2} bits of size 2^{k}.
  5. This motivates improving the parameters of distributions with the Or property. Improve the seed length in Lemma 6.6 to \log n+c\log \log n. Hint: What property you need from T?
  6. Give explicit error-correcting codes over k\log ^{c}k bits of size 2^{k}.

6.6 The power of majority

Exercise 6.13. [The power of majority] Prove:

  1. \text {BPP}\cdot \text {P}\subseteq \text {Maj}\cdot \text {P}.
  2. \{(M,x,t):\text { the number of }y\in \{0,1\} ^{|t|}\text { such that }M(x,y)=1\text { is }\ge t\}\in \text {Maj}\cdot \text {P}.
  3. The same as 2. but with \ge replaced by \le .
  4. \text {NP}\subseteq \text {Maj}\cdot \text {P}.
  5. Maj \cdot \oplus \cdot \text {P}\subseteq \text {Maj}\cdot \text {Maj}\cdot \text {P}. (Hint: This is confusing if you don’t have the right angle, otherwise is not too bad. For starters, forget everything and imagine you have an integer w in some range and you want to know if w is odd by just asking questions of the type w\ge t and w\le t', for various t,t'. You want that the number of questions with answer “yes” only depends on whether w is odd or even.)
  6. \text {PH}\subseteq \text {Maj}\cdot \text {Maj}\cdot \text {P}.

It is not known if \text {NP} has linear-size circuits. We saw in Exercise 6.4 that PH does not have circuits of size n^{k}, for any k. By Exercise 6.13 this holds for \text {Maj}\cdot \text {Maj}\cdot \text {P} as well. The following result improves this Maj \cdot P. It is particularly interesting because it cannot be established using a well-studied class of techniques which includes all the results about PH we have encountered so far.

Theorem 6.6. [40] \text {Maj}\cdot \text {P}\not \subseteq \text {CktGates}(n^{k}), for any k\in \mathbb {N}.


[1]   Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59–78. IEEE Computer Society, 2015.

[2]   Leonard Adleman. Two theorems on random polynomial time. In 19th IEEE Symp. on Foundations of Computer Science (FOCS), pages 75–83. 1978.

[3]   Mikl≤s Ajtai. \Sigma \sp {1}\sb {1}-formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.

[4]   Noga Alon, Oded Goldreich, Johan Hσstad, and RenΘ Peralta. Simple constructions of almost k-wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992.

[5]   Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. J. Comput. Syst. Sci., 18(2):155–193, 1979.

[6]   Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. J. of the ACM, 45(3):501–555, May 1998.

[7]   Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J. Comput., 47(3):1087–1097, 2018.

[8]   Kenneth E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, volume 32, pages 307–314, 1968.

[9]   Stephen A. Cook. A hierarchy for nondeterministic time complexity. J. of Computer and System Sciences, 7(4):343–353, 1973.

[10]   Aviezri S. Fraenkel and David Lichtenstein. Computing a perfect strategy for n x n chess requires time exponential in n. J. Comb. Theory, Ser. A, 31(2):199–214, 1981.

[11]   Anka Gajentaan and Mark H. Overmars. On a class of {O}(n^2) problems in computational geometry. Comput. Geom., 5:165–185, 1995.

[12]   M. R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.

[13]   K. G÷del. ▄ber formal unentscheidbare sΣtze der Principia Mathematica und verwandter systeme I. Monatsh. Math. Phys., 38, 1931.

[14]   Oded Goldreich. Computational Complexity: A Conceptual Perspective. Cambridge University Press, 2008.

[15]   David Harvey and Joris van der Hoeven. Integer multiplication in time O(n\mathrm {log}\, n). Annals of Mathematics, 193(2):563 – 617, 2021.

[16]   F. C. Hennie. One-tape, off-line turing machine computations. Information and Control, 8(6):553–578, 1965.

[17]   Fred Hennie and Richard Stearns. Two-tape simulation of multitape turing machines. J. of the ACM, 13:533–546, October 1966.

[18]   Russell Impagliazzo and Ramamohan Paturi. The complexity of k-sat. In IEEE Conf. on Computational Complexity (CCC), pages 237–, 1999.

[19]   Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Computer & Systems Sciences, 63(4):512–530, Dec 2001.

[20]   Russell Impagliazzo and Avi Wigderson. \mathit {P} = \mathit {BPP} if E requires exponential circuits: Derandomizing the XOR lemma. In 29th ACM Symp. on the Theory of Computing (STOC), pages 220–229. ACM, 1997.

[21]   Richard M. Karp and Richard J. Lipton. Turing machines that take advice. L’Enseignement MathΘmatique. Revue Internationale. IIe SΘrie, 28(3-4):191–209, 1982.

[22]   Kojiro Kobayashi. On the structure of one-tape nondeterministic turing machine time hierarchy. Theor. Comput. Sci., 40:175–193, 1985.

[23]   Leonid A. Levin. Universal sequential search problems. Problemy Peredachi Informatsii, 9(3):115–116, 1973.

[24]   O. B. Lupanov. A method of circuit synthesis. Izv. VUZ Radiofiz., 1:120–140, 1958.

[25]   Joseph Naor and Moni Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. on Computing, 22(4):838–856, 1993.

[26]   NEU. From RAM to SAT. Available at, 2012.

[27]   Christos H. Papadimitriou and Mihalis Yannakakis. Optimization, approximation, and complexity classes. J. Comput. Syst. Sci., 43(3):425–440, 1991.

[28]   Wolfgang J. Paul, Nicholas Pippenger, Endre SzemerΘdi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In IEEE Symp. on Foundations of Computer Science (FOCS), pages 429–438, 1983.

[29]   Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. J. of the ACM, 26(2):361–381, 1979.

[30]   J. M. Robson. N by N checkers is exptime complete. SIAM J. Comput., 13(2):252–267, 1984.

[31]   Rahul Santhanam. On separators, segregators and time versus space. In IEEE Conf. on Computational Complexity (CCC), pages 286–294, 2001.

[32]   Arnold Sch÷nhage. Storage modification machines. SIAM J. Comput., 9(3):490–508, 1980.

[33]   Claude E. Shannon. The synthesis of two-terminal switching circuits. Bell System Tech. J., 28:59–98, 1949.

[34]   Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Mathematics of Computation, 54(189):435–447, 1990.

[35]   Michael Sipser. A complexity theoretic approach to randomness. In ACM Symp. on the Theory of Computing (STOC), pages 330–335, 1983.

[36]   Larry Stockmeyer and Albert R. Meyer. Cosmological lower bound on the circuit complexity of a small problem in logic. J. ACM, 49(6):753–784, 2002.

[37]   Seinosuke Toda. PP is as hard as the polynomial-time hierarchy. SIAM J. on Computing, 20(5):865–877, 1991.

[38]   Alan M. Turing. On computable numbers, with an application to the entscheidungsproblem. Proc. London Math. Soc., s2-42(1):230–265, 1937.

[39]   Leslie G. Valiant and Vijay V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85–93, 1986.

[40]   N. V. Vinodchandran. A note on the circuit complexity of PP. Theor. Comput. Sci., 347(1-2):415–418, 2005.

[41]   Emanuele Viola. On approximate majority and probabilistic time. Computational Complexity, 18(3):337–375, 2009.

[42]   Emanuele Viola. Reducing 3XOR to listing triangles, an exposition. Available at, 2011.

[43]   Emanuele Viola. Pseudorandom bits and lower bounds for randomized turing machines. Theory of Computing, 18(10):1–12, 2022.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s