Special Topics in Complexity Theory, Lectures 8-9

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lecture 8-9, Scribe: Xuangui Huang

In these lectures, we finish the proof of the approximate degree lower bound for AND-OR function, then we move to the surjectivity function SURJ. Finally we discuss quasirandom groups.

1.1 Lower Bound of d_{1/3}(AND-OR)

Recall from the last lecture that AND-OR:\{0,1\}^{R\times N} \rightarrow \{0,1\} is the composition of the AND function on R bits and the OR function on N bits. We also proved the following lemma.

Lemma 1. Suppose that distributions A^0, A^1 over \{0,1\}^{n_A} are k_A-wise indistinguishable distributions; and distributions B^0, B^1 over \{0,1\}^{n_B} are k_B-wise indistinguishable distributions. Define C^0, C^1 over \{0,1\}^{n_A \cdot n_B} as follows:

C^b: draw a sample x \in \{0,1\}^{n_A} from A^b, and replace each bit x_i by a sample of B^{x_i} (independently).

Then C^0 and C^1 are k_A \cdot k_B-wise indistinguishable.

To finish the proof of the lower bound on the approximate degree of the AND-OR function, it remains to see that AND-OR can distinguish well the distributions C^0 and C^1. For this, we begin with observing that we can assume without loss of generality that the distributions have disjoint supports.

Claim 2. For any function f, and for any k-wise indistinguishable distributions A^0 and A^1, if f can distinguish A^0 and A^1 with probability \epsilon then there are distributions B^0 and B^1 with the same properties (k-wise indistinguishability yet distinguishable by f) and also with disjoint supports. (By disjoint support we mean for any x either \Pr [B^0 = x] = 0 or \Pr [B^1 = x] = 0.)

Proof. Let distribution C be the “common part” of A^0 and A^1. That is to say, we define C such that \Pr [C = x] := \min \{\Pr [A^0 = x], \Pr [A^1 = x]\} multiplied by some constant that normalize C into a distribution.

Then we can write A^0 and A^1 as

\begin{aligned} A^0 &= pC + (1-p) B^0 \,,\\ A^1 &= pC + (1-p) B^1 \,, \end{aligned}

where p \in [0,1], B^0 and B^1 are two distributions. Clearly B^0 and B^1 have disjoint supports.

Then we have

\begin{aligned} \mathbb{E} [f(A^0)] - \mathbb{E} [f(A^1)] =&~p \mathbb{E} [f(C)] + (1-p) \mathbb{E} [f(B^0)] \notag \\ &- p \mathbb{E} [f(C)] - (1-p) \mathbb{E} [f(B^1)] \\ =&~(1-p) \big ( \mathbb{E} [f(B^0)] - \mathbb{E} [f(B^1)] \big ) \\ \leq &~\mathbb{E} [f(B^0)] - \mathbb{E} [f(B^1)] \,. \end{aligned}

Therefore if f can distinguish A^0 and A^1 with probability \epsilon then it can also distinguish B^0 and B^1 with such probability.

Similarly, for all S \neq \varnothing such that |S| \leq k, we have

\begin{aligned} 0 = \mathbb{E} [\chi _S(A^0)] - \mathbb{E} [\chi _S(A^1)] = (1-p) \big ( \mathbb{E} [\chi _S(B^0)] - \mathbb{E} [\chi _S(B^1)] \big ) = 0 \,. \end{aligned}

Hence, B^0 and B^1 are k-wise indistinguishable. \square

Equipped with the above lemma and claim, we can finally prove the following lower bound on the approximate degree of AND-OR.

Theorem 3. d_{1/3}(AND-OR) = \Omega (\sqrt {RN}).

Proof. Let A^0, A^1 be \Omega (\sqrt {R})-wise indistinguishable distributions for AND with advantage 0.99, i.e. \Pr [\mathrm {AND}(A^1) = 1] > \Pr [\mathrm {AND}(A^0) = 1] + 0.99. Let B^0, B^1 be \Omega (\sqrt {N})-wise indistinguishable distributions for OR with advantage 0.99. By the above claim, we can assume that A^0, A^1 have disjoint supports, and the same for B^0, B^1. Compose them by the lemma, getting \Omega (\sqrt {RN})-wise indistinguishable distributions C^0,C^1. We now show that AND-OR can distinguish C^0, C^1:

  • C_0: First sample A^0. As there exists a unique x = 1^R such that \mathrm {AND}(x)= 1, \Pr [A^1 = 1^R] >0. Thus by disjointness of support \Pr [A^0 = 1^R] = 0. Therefore when sampling A^0 we always get a string with at least one “0”. But then “0” is replaced with sample from B^0. We have \Pr [B^0 = 0^N] \geq 0.99, and when B^0 = 0^N, AND-OR=0.
  • C_1: First sample A^1, and we know that A^1 = 1^R with probability at least 0.99. Each bit “1” is replaced by a sample from B^1, and we know that \Pr [B^1 = 0^N] = 0 by disjointness of support. Then AND-OR=1.

Therefore we have d_{1/3}(AND-OR)= \Omega (\sqrt {RN}). \square

1.2 Lower Bound of d_{1/3}(SURJ)

In this subsection we discuss the approximate degree of the surjectivity function. This function is defined as follows.

Definition 4. The surjectivity function SURJ\colon \left (\{0,1\}^{\log R}\right )^N \to \{0,1\}, which takes input (x_1, \dots , x_N) where x_i \in [R] for all i, has value 1 if and only if \forall j \in [R], \exists i\colon x_i = j.

First, some history. Aaronson first proved that the approximate degree of SURJ and other functions on n bits including “the collision problem” is n^{\Omega (1)}. This was motivated by an application in quantum computing. Before this result, even a lower bound of \omega (1) had not been known. Later Shi improved the lower bound to n^{2/3}, see [AS04]. The instructor believes that the quantum framework may have blocked some people from studying this problem, though it may have very well attracted others. Recently Bun and Thaler [BT17] reproved the n^{2/3} lower bound, but in a quantum-free paper, and introducing some different intuition. Soon after, together with Kothari, they proved [BKT17] that the approximate degree of SURJ is \Theta (n^{3/4}).

We shall now prove the \Omega (n^{3/4}) lower bound, though one piece is only sketched. Again we present some things in a different way from the papers.

For the proof, we consider the AND-OR function under the promise that the Hamming weight of the RN input bits is at most N. Call the approximate degree of AND-OR under this promise d_{1/3}^{\leq N}(AND-OR). Then we can prove the following theorems.

Theorem 5. d_{1/3}(SURJ) \geq d_{1/3}^{\leq N}(AND-OR).

Theorem 6. d_{1/3}^{\leq N}(AND-OR) \geq \Omega (N^{3/4}) for some suitable R = \Theta (N).

In our settings, we consider R = \Theta (N). Theorem 5 shows surprisingly that we can somehow “shrink” \Theta (N^2) bits of input into N\log N bits while maintaining the approximate degree of the function, under some promise. Without this promise, we just showed in the last subsection that the approximate degree of AND-OR is \Omega (N) instead of \Omega (N^{3/4}) as in Theorem 6.

Proof of Theorem 5. Define an N \times R matrix Y s.t. the 0/1 variable y_{ij} is the entry in the i-th row j-th column, and y_{ij} = 1 iff x_i = j. We can prove this theorem in following steps:

  1. d_{1/3}(SURJ(\overline {x})) \geq d_{1/3}(AND-OR(\overline {y})) under the promise that each row has weight 1;
  2. let z_j be the sum of the j-th column, then d_{1/3}(AND-OR(\overline {y})) under the promise that each row has weight 1, is at least d_{1/3}(AND-OR(\overline {z})) under the promise that \sum _j z_j = N;
  3. d_{1/3}(AND-OR(\overline {z})) under the promise that \sum _j z_j = N, is at least d_{1/3}^{=N}(AND-OR(\overline {y}));
  4. we can change “=N” into “\leq N”.

Now we prove this theorem step by step.

  1. Let P(x_1, \dots , x_N) be a polynomial for SURJ, where x_i = (x_i)_1, \dots , (x_i)_{\log R}. Then we have
    \begin{aligned} (x_i)_k = \sum _{j: k\text {-th bit of }j \text { is } 1} y_{ij}. \end{aligned}

    Then the polynomial P'(\overline {y}) for AND-OR(\overline {y}) is the polynomial P(\overline {x}) with (x_i)_k replaced as above, thus the degree won’t increase. Correctness follows by the promise.

  2. This is the most extraordinary step, due to Ambainis [Amb05]. In this notation, AND-OR becomes the indicator function of \forall j, z_j \neq 0. Define
    \begin{aligned} Q(z_1, \dots , z_R) := \mathop {\mathbb{E} }_{\substack {\overline {y}: \text { his rows have weight } 1\\ \text {and is consistent with }\overline {z}}} P(\overline {y}). \end{aligned}

    Clearly it is a good approximation of AND-OR(\overline {z}). It remains to show that it’s a polynomial of degree k in z’s if P is a polynomial of degree k in y’s.

    Let’s look at one monomial of degree k in P: y_{i_1j_1}y_{i_2j_2}\cdots y_{i_kj_k}. Observe that all i_\ell ’s are distinct by the promise, and by u^2 = u over \{0,1\}. By chain rule we have

    \begin{aligned} \mathbb{E} [y_{i_1j_1}\cdots y_{i_kj_k}] = \mathbb{E} [y_{i_1j_1}]\mathbb{E} [y_{i_2j_2}|y_{i_1j_1} = 1] \cdots \mathbb{E} [y_{i_kj_k}|y_{i_1j_1}=\cdots =y_{i_{k-1}j_{k-1}} = 1]. \end{aligned}

    By symmetry we have \mathbb{E} [y_{i_1j_1}] = \frac {z_{j_1}}{N}, which is linear in z’s. To get \mathbb{E} [y_{i_2j_2}|y_{i_1j_1} = 1], we know that every other entry in row i_1 is 0, so we give away row i_1, average over y’s such that \left \{\begin {array}{ll} y_{i_1j_1} = 1 &\\ y_{ij} = 0 & j\neq j_1 \end {array}\right . under the promise and consistent with z’s. Therefore

    \begin{aligned} \mathbb{E} [y_{i_2j_2}|y_{i_1j_1} = 1] = \left \{ \begin {array}{ll} \frac {z_{j_2}}{N-1} & j_1 \neq j_2,\\ \frac {z_{j_2}-1}{N-1} & j_1 = j_2. \end {array}\right . \end{aligned}

    In general we have

    \begin{aligned} \mathbb{E} [y_{i_kj_k}|y_{i_1j_1}=\cdots =y_{i_{k-1}j_{k-1}} = 1] = \frac {z_{j_k} - \#\ell < k \colon j_\ell = j_k}{N-k + 1}, \end{aligned}

    which has degree 1 in z’s. Therefore the degree of Q is not larger than that of P.

  3. Note that \forall j, z_j = \sum _i y_{ij}. Hence by replacing z’s by y’s, the degree won’t increase.
  4. We can add a “slack” variable z_0, or equivalently y_{01}, \dots , y_{0N}; then the condition \sum _{j=0}^R z_j = N actually means \sum _{j=1}^R z_j \leq N.


Proof idea for Theorem 6. First, by the duality argument we can verify that d_{1/3}^{\leq N}(f) \geq d if and only if there exists d-wise indistinguishable distributions A, B such that:

  • f can distinguish A, B;
  • A and B are supported on strings of weight \leq N.

Claim 7. d_{1/3}^{\leq \sqrt {N}}(OR_N) = \Omega (N^{1/4}).

The proof needs a little more information about the weight distribution of the indistinguishable distributions corresponding to this claim. Basically, their expected weight is very small.

Now we combine these distributions with the usual ones for And using the lemma mentioned at the beginning.

What remains to show is that the final distribution is supported on Hamming weight \le N. Because by construction the R copies of the distributions for Or are sampled independently, we can use concentration of measure to prove a tail bound. This gives that all but an exponentially small measure of the distribution is supported on strings of weight \le N. The final step of the proof consists of slightly tweaking the distributions to make that measure 0. \square

1.3 Groups

Groups have many applications in theoretical computer science. Barrington [Bar89] used the permutation group S_5 to prove a very surprising result, which states that the majority function can be computed efficiently using only constant bits of memory (something which was conjectured to be false). More recently, catalytic computation [BCK^{+}14] shows that if we have a lot of memory, but it’s full with junk that cannot be erased, we can still compute more than if we had little memory. We will see some interesting properties of groups in the following.

Some famous groups used in computer science are:

  • \{0,1\}^n with bit-wise addition;
  • \mathbb {Z}_m with addition mod m ;
  • S_n, which are permutations of n elements;
  • Wreath product G:= (\mathbb {Z}_m \times \mathbb {Z}_m) \wr \mathbb {Z}_2\,, whose elements are of the form (a,b)z where z is a “flip bit”, with the following multiplication rules:
    • (a, b) 1 = 1 (b, a) ;
    • z\cdot z' := z+z' in \mathbb {Z}_2 ;
    • (a,b) \cdot (a',b') := (a+a', b+b') is the \mathbb {Z}_m\times \mathbb {Z}_m operation;

    An example is (5,7)1 \cdot (2,1) 1 = (5,7) 1 \cdot 1 (1, 2) = (6,9)0 . Generally we have

    \begin{aligned} (a, b) z \cdot (a', b') z' = \left \{ \begin {array}{ll} (a + a', b+b') z+z' & z = 1\,,\\ (a+b', b + a') z+z' & z = 0\,; \end {array}\right . \end{aligned}

  • SL_2(q) := \{2\times 2 matrices over \mathbb {F}_q with determinant 1\}, in other words, group of matrices \begin {pmatrix} a & b\\ c & d \end {pmatrix} such that ad - bc = 1.

The group SL_2(q) was invented by Galois. (If you haven’t, read his biography on wikipedia.)

Quiz. Among these groups, which is the “least abelian”? The latter can be defined in several ways. We focus on this: If we have two high-entropy distributions X, Y over G, does X \cdot Y has more entropy? For example, if X and Y are uniform over some \Omega (|G|) elements, is X\cdot Y close to uniform over G? By “close to” we mean that the statistical distance is less that a small constant from the uniform distribution. For G=(\{0,1\}^n, +), if Y=X uniform over \{0\}\times \{0,1\}^{n-1}, then X\cdot Y is the same, so there is not entropy increase even though X and Y are uniform on half the elements.

Definition 8.[Measure of Entropy] For \lVert A\rVert _2 = \left (\sum _xA(x)^2\right )^{\frac {1}{2}}, we think of \lVert A\rVert ^2_2 = 100 \frac {1}{|G|} for “high entropy”.

Note that \lVert A\rVert ^2_2 is exactly the “collision probability”, i.e. \Pr [A = A']. We will consider the entropy of the uniform distribution U as very small, i.e. \lVert U\rVert ^2_2 = \frac {1}{|G|} \approx \lVert \overline {0}\rVert ^2_2. Then we have

\begin{aligned} \lVert A - U \rVert ^2_2 &= \sum _x \left (A(x) - \frac {1}{|G|}\right )^2\\ &= \sum _x A(x)^2 - 2A(x) \frac {1}{|G|} + \frac {1}{|G|^2} \\ &= \lVert A \rVert ^2_2 - \frac {1}{|G|} \\ &= \lVert A \rVert ^2_2 - \lVert U \rVert ^2_2\\ &\approx \lVert A \rVert ^2_2\,. \end{aligned}

Theorem 9.[[Gow08], [BNP08]] If X, Y are independent over G, then

\begin{aligned} \lVert X\cdot Y - U \rVert _2 \leq \lVert X \rVert _2 \lVert Y \rVert _2 \sqrt {\frac {|G|}{d}}, \end{aligned}

where d is the minimum dimension of irreducible representation of G.

By this theorem, for high entropy distributions X and Y, we get \lVert X\cdot Y - U \rVert _2 \leq \frac {O(1)}{\sqrt {|G|d}}, thus we have

\begin{aligned} ~~~~(1) \lVert X\cdot Y - U \rVert _1 \leq \sqrt {|G|} \lVert X\cdot Y - U \rVert _2 \leq \frac {O(1)}{\sqrt {d}}. \end{aligned}

If d is large, then X \cdot Y is very close to uniform. The following table shows the d’s for the groups we’ve introduced.

G \{0,1\}^n \mathbb {Z}_m (\mathbb {Z}_m \times \mathbb {Z}_m) \wr \mathbb {Z}_2 A_n SL_2(q)

d 1 1 should be very small \frac {\log |G|}{\log \log |G|} |G|^{1/3}

Here A_n is the alternating group of even permutations. We can see that for the first groups, Equation ((1)) doesn’t give non-trivial bounds.

But for A_n we get a non-trivial bound, and for SL_2(q) we get a strong bound: we have \lVert X\cdot Y - U \rVert _2 \leq \frac {1}{|G|^{\Omega (1)}}.


[Amb05]    Andris Ambainis. Polynomial degree and lower bounds in quantum complexity: Collision and element distinctness with small range. Theory of Computing, 1(1):37–46, 2005.

[AS04]    Scott Aaronson and Yaoyun Shi. Quantum lower bounds for the collision and the element distinctness problems. J. of the ACM, 51(4):595–605, 2004.

[Bar89]    David A. Mix Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC^1. J. of Computer and System Sciences, 38(1):150–164, 1989.

[BCK^{+}14]    Harry Buhrman, Richard Cleve, Michal Koucký, Bruno Loff, and Florian Speelman. Computing with a full memory: catalytic space. In ACM Symp. on the Theory of Computing (STOC), pages 857–866, 2014.

[BKT17]    Mark Bun, Robin Kothari, and Justin Thaler. The polynomial method strikes back: Tight quantum query bounds via dual polynomials. CoRR, arXiv:1710.09079, 2017.

[BNP08]    László Babai, Nikolay Nikolov, and László Pyber. Product growth and mixing in finite groups. In ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 248–257, 2008.

[BT17]    Mark Bun and Justin Thaler. A nearly optimal lower bound on the approximate degree of AC0. CoRR, abs/1703.05784, 2017.

[Gow08]    W. T. Gowers. Quasirandom groups. Combinatorics, Probability & Computing, 17(3):363–387, 2008.


The emission protection agency

Libby, Montana, the town that health-conscious dwellers have learned to fear. The vermiculite extracted from a mine next to the town was packaged as Zonolite insulation and ended up in millions of homes, possibly including yours.  Unfortunately, that vermiculite was contaminated with asbestos, a mineral once considered a miracle and now known to cause cancer.  The town population has been gruesomely decimated by the asbestos contamination, and new cases keep coming. That mine has produced the vast majority of the vermiculite insulation in the world.

Yesterday it was announced that the so-called Environment Protection Agency (EPA) won’t after all review much of the hazardous material in use, including asbestos. What this means exactly is of course murky, but the bottom line is crystal clear: The EPA will do less to protect your health.


Special Topics in Complexity Theory, Lectures 6-7

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lecture 6-7, Scribe: Willy Quach

In these lectures, we introduce k-wise indistinguishability and link this notion to the approximate degree of a function. Then, we study the approximate degree of some functions, namely, the AND function and the AND-OR function. For the latter function we begin to see a proof that is different (either in substance or language) from the proofs in the literature. We begin with some LATEXtips.

1.1 Some LATEX tips.

  • Mind the punctuation. Treat equations as part of the phrases; add commas, colons, etc accordingly.
  • In math mode, it is usually better to use \ell (\ell ) rather than regular l. The latter can be confused with 1.
  • Align equations with \begin{align} \cdots \end{align} with the alignment character &.
  • For set inclusion, use \subset (\subset ) only if you mean proper inclusion (which is uncommon). Otherwise use \subseteq (\subseteq ). (Not everybody agrees with this, but this seems the most natural convention by analogy with < and \le .)

1.2 Introducing k-wise indistinguishability.

We studied previously the following questions:

  • What is the minimum k such that any k-wise independent distribution P over \{0,1\}^n fools \mathrm {AC}^0 (i.e. \mathbb {E}C(P) \approx \mathbb {E}C(U) for all poly(n)-size circuits C with constant depth)?

    We saw that k = \log ^{\mathcal {O}(d)}(s/\epsilon ) is enough.

  • What is the minimum k such that P fools the AND function?

    Taking k=\mathcal {O}(1) for \epsilon =\mathcal {O}(1) suffices (more precisely we saw that k-wise independence fools the AND function with \epsilon = 2^{-\Omega (k)}).

Consider now P and Q two distributions over \{0,1\}^n that are k-wise indistinguishable, that is, any projection over k bits of P and Q have the same distribution. We can ask similar questions:

  • What is the minimum k such that \mathrm {AC}^0 cannot distinguish P and Q (i.e. \mathbb {E}C(P) \approx \mathbb {E}C(Q) for all poly(n)-size circuits C with constant depth)?

    It turns out this requires k \geq n^{1-o(1)}: there are some distributions that are almost always distinguishable in this regime. (Whether k=\Omega (n) is necessary or not is an open question.)

    Also, k = n\left (1- \frac {1}{polylog(n)}\right ) suffices to fool \mathrm {AC}^0 (in which case \epsilon is essentially exponentially small).

  • What is the minimum k such that the AND function (on n bits) cannot distinguish P and Q?

    It turns out that k=\Theta (\sqrt {n}) is necessary and sufficient. More precisely:

    • There exists some P,Q over \{0,1\}^n that are c\sqrt {n}-wise indistinguishable for some constant c, but such that:
      \begin{aligned} \left | \Pr _P [AND(P)=1] - \Pr _Q [AND(Q)=1] \right | \geq 0.99 \,;\end{aligned}
    • For all P, Q that are c'\sqrt {n}-wise indistinguishable for some bigger constant c', we have:
      \begin{aligned} \left | \Pr _P [AND(P)=1] - \Pr _Q [AND(Q)=1] \right | \leq 0.01 \,.\end{aligned}

1.3 Duality.

Those question are actually equivalent to ones related about approximation by real-valued polynomials:

Theorem 1. Let f:\{0,1\}^n \rightarrow \{0,1\} be a function, and k an integer. Then:

\begin{aligned} \max _{P,Q \, k\text {-wise indist.}} \left | \mathbb {E}f(P)-\mathbb {E}f(Q) \right | = \min \{ \, \epsilon \, | \, \exists g\in \mathbb {R}_k[X]: \forall x, \left |f(x)-g(x) \right | \leq \epsilon \}. \end{aligned}

Here \mathbb {R}_k[X] denotes degree-k real polynomials. We will denote the right-hand side \epsilon _k(f).

Some examples:

  • f=1: then \mathbb {E}f(P)=1 for all distribution P, so that both sides of the equality are 0.
  • f(x) = \sum _i x_i \bmod 2 the parity function on n bits.

    Then for k = n-1, the left-hand side is at least 1/2: take P to be uniform; and Q to be uniform on n-1 bits, defining the nth bit to be Q_n = \sum _{i<n} Q_i \bmod 2 to be the parity of the first n-1 bits. Then \mathbb {E}f(P)=1/2 but \mathbb {E}f(Q)=0.

    Furthermore, we have:

    Claim 2. \epsilon _{n-1}(\mathrm {Parity}) \geq 1/2.

    Proof. Suppose by contradiction that some polynomial g has degree k and approximates Parity by \epsilon < 1/2.

    The key ingredient is to symmetrize a polynomial p, by letting

    \begin{aligned} p^{sym}(x) := \frac {1}{n!} \sum _{\pi \in \mathfrak {S}_n} f(\pi x),\end{aligned}

    where \pi ranges over permutations. Note that p^{sym}(x) only depends on \|x\| = \sum _i x_i.

    Now we claim that there is a univariate polynomial p' also of degree k such that

    \begin{aligned} p'(\sum x_i) = p^{sym}(x_1, x_2, \ldots , x_n)\end{aligned}

    for every x.

    To illustrate, let M be a monomial of p. For instance if M = X_1, then p'(i) = i/n, where i is the Hamming weight of the input. (For this we think of the input as being \in \{0,1\}. Similar calculations can be done for \in \{-1,-1\}.)

    If M = X_1 X_2, then p'(i) = \frac {i}{n}\cdot \frac {i-1}{n} which is quadratic in i.

    And so on.

    More generally p^{sym}(X_1,\dots ,X_n) is a symmetric polynomial. As \{(\sum _j X_j)^\ell \}_{\ell \leq k} form a basis of symmetric polynomials of degree k, p^{sym} can be written as a linear combination in this basis. Now note that \{(\sum _j X_j)^{\ell } (x)\}_{\ell \leq k} only depends on \|x\|; substituting i = \sum _j X_j gives that p' is of degree \leq k in i.

    (Note that the degree of p' can be strictly less than the degree of p (e.g. for p(X_1,X_2) = X_1-X_2: we have p^{sym} = p' = 0).)

    Then, applying symmetrization on g, if g is a real polynomial \epsilon -close to Parity (in \ell _\infty norm), then g' is also \epsilon -close to Parity’ (as a convex combination of close values).

    Finally, remark that for every integer k \in \{0,\dots ,\lfloor n/2 \rfloor \}, we have: Parity'(2k)=0 and Parity'(2k+1)=1. In particular, as \epsilon < 1/2, g'-1/2 must have at least n zeroes, and must therefore be zero, which is a contradiction. \square

We will now focus on proving the theorem.

Note that one direction is easy: if a function fis closely approximated by a polynomial g of degree k, it cannot distinguish two k-wise indistinguishable distributions P and Q:

\begin{aligned} \mathbb {E}[f(P)]&= \mathbb {E}[g(P)] \pm \epsilon \\ &\stackrel {(*)}{=} \mathbb {E}[g(Q)] \pm \epsilon \\ &= \mathbb {E}[f(Q)] \pm 2\epsilon \, , \end{aligned}

where (*) comes from the fact that P and Q are k-wise indistinguishable.

The general proof goes by a Linear Programming Duality (aka finite-dimensional Hahn-Banach theorem, min-max, etc.). This states that:

If A \in \mathbb {R}^{n\times m}, x\in \mathbb {R}^m, b\in \mathbb {R}^n and c\in \mathbb {R}^m, then:

\left . \begin {array}{rrcl} &\min \langle c,x \rangle &=& \sum _{i \leq m} c_i x_i\\ &&\\ \text { subject to:} &Ax &=& b\\ &x &\geq & 0\\ \end {array} \right | \, = \, \left | \begin {array}{cc} &\max \langle b,y \rangle \\ &\\ \text { subject to:} &A^T y \leq c\\ &\\ \end {array} \right .

We can now prove the theorem:


The proof will consist in rewriting the sides of the equality in the theorem as outputs of a Linear Program. Let us focus on the left side of the equality: \max _{P,Q \, k\text {-wise indist.}} \left | \mathbb {E}f(P)-\mathbb {E}f(Q) \right |.

We will introduce 2^{n+1} variables, namely P_x and Q_x for every x\in \{0,1\}^n, which will represent \Pr [D=x] for D=P,Q.

We will also use the following, which can be proved similarly to the Vazirani XOR Lemma:

Claim 3. Two distributions P and Q are k-wise indistinguishable if and only if: \forall S\subseteq \{1,\dots ,n\} with |S|\leq k, \sum _x P_x \chi _S(x) - \sum _x Q_x \chi _S(x)=0, where \chi _S(X) = \prod _S X_i is the Fourier basis of boolean functions.

The quantity \max _{P,Q \, k\text {-wise indist.}} \left | \mathbb {E}f(P)-\mathbb {E}f(Q) \right | can then be rewritten:

\begin {array}{rrl} &-\min \sum _x P_xf(x) - \sum _x Q_xf(x)\\ &&\\ \text { subject to:} &\sum _x P_x &= 1\\ &\sum _x Q_x &= 1\\ &\forall S \subseteq \{1,\dots ,n\} \text { s.t. } |S|\leq k,\sum _x (P_x - Q_x) \chi _S(x) &= 0 \\ \end {array}

Following the syntax of LP Duality stated above, we have:

c^T = \overbrace {\cdots f(x) \cdots }^{2^n}\overbrace {\cdots -f(x) \cdots }^{2^n} \in \mathbb {R}^{2n}, (where x goes over \{0,1\}^n),

x^T = \overbrace {\cdots P_x \cdots }^{2^n}\overbrace {\cdots Q_x \cdots }^{2^n} \in \mathbb {R}^{2n},

b^T = 1 1 \overbrace {0\cdots 0}^{\# S},

A = \left ( \begin {array}{cc} \overbrace {1\cdots \cdots 1}^{2^n} & \overbrace {0\cdots \cdots 0}^{2^n} \\ 0 \cdots \cdots 0 & 1 \cdots \cdots 1 \\ \cdots \cdots & \cdots \cdots \\ \vdots \cdots \cdots \vdots & \vdots \cdots \cdots \vdots \\ \cdots \chi _S(x) \cdots & \cdots -\chi _S(x) \cdots \\ \vdots \cdots \cdots \vdots & \vdots \cdots \cdots \vdots \\ \cdots \cdots & \cdots \cdots \\ \end {array} \right ) ,

where the rows of A except the first two correspond to some S \subseteq \{1,\dots ,n\} such that |S|\leq k.

We apply LP duality. We shall denote the new set of variables by

y^T = d \, d'\, \overbrace {\cdots d_S\cdots }^{\#S}.

We have the following program:

\begin {array}{rrl} &-\max d+d'\\ &&\\ \text { subject to:} &\forall x, d + \sum _x d_S \chi _S(x) &\leq f(x)\\ &\forall x, d' - \sum _x d_S \chi _S(x) &\leq -f(x)\\ \end {array}

Writing d' = -d-\epsilon , the objective becomes to minimize \epsilon , while the second set of constraints can be rewritten:

\begin{aligned} \forall x, d+\epsilon + \sum _S d_S\chi _S(x) \geq f(x) \, . \end{aligned}

The expression d + \sum _S d_S \chi _S(X) is an arbitrary degree-k polynomial which we denote by g(X). So our constrains become

\begin{aligned} g(x) &\leq f(x)\\ g(x) + \epsilon &\geq f(x). \end{aligned}

Where g ranges over all degree-k polynomials, and we are trying to minimize \epsilon . Because g is always below f, but when you add \epsilon it becomes bigger, g is always within \epsilon of f. \square

1.4 Approximate Degree of AND.

Let us now study the AND function on n bits. Let us denote d_{\epsilon }(f) the minimal degree of a polynomial approximating f with error \epsilon .

We will show that d_{1/3}(AND) = \Theta (\sqrt {n}).

Let us first show the upper bound:

Claim 4. We have:

\begin{aligned}d_{1/3}(\text {AND}) = \mathcal {O}(\sqrt {n}).\end{aligned}

To prove this claim, we will consider a special family of polynomials:

Definition 5. (Chebychev polynomials of the first kind.)

The Chebychev polynomials (of the first kind) are a family \{T_k\}_{k\in \mathbb {N}} of polynomials defined inductively as:

  • T_0(X) := 1,
  • T_1(X) := X,
  • \forall k \geq 1, T_{k+1}(X) := 2X T_k - T_{k-1}.

Those polynomials satisfy some useful properties:

  1. \forall x \in [-1,1], T_k(x) = \cos (k \arccos (x))\, ,
  2. \forall x \in [-1,1], \forall k, |T_k(x)| \leq 1 \, ,
  3. \forall x such that |x| \geq 1, |T'_k(x)| \geq k^2 \, ,
  4. \forall k, T_k(1)=1 \, .

Property 2 follows from 1, and property 4 follows from a direct induction. For a nice picture of these polynomials you should have come to class (or I guess you can check wikipedia). We can now prove our upper bound:

Proof. Proof of Claim:

We construct a univariate polynomial p:\{0,1,\dots ,n\} \rightarrow \mathbb {R} such that:

  • \deg p = \mathcal {O}(\sqrt {n});
  • \forall i<n, |P(i)| \leq 1/3;
  • |P(n)-1| \leq 1/3.

In other words, p will be close to 0 on [0,n-1], and close to 1 on n. Then, we can naturally define the polynomial for the AND function on n bits to be q(X_1,\dots ,X_n) := p(\sum _i X_i), which also has degree \mathcal {O}(\sqrt {n}). Indeed, we want q to be close to 0 if X has Hamming weight less than n, while being close to 1 on X of Hamming weight n (by definition of AND). This will conclude the proof.

Let us define p as follows:

\begin{aligned} \forall i\leq n, \quad p(i):= \frac {T_k\left ( \frac {i}{n-1}\right )}{T_k\left ( \frac {n}{n-1}\right )}. \end{aligned}

Intuitively, this uses the fact that Chebychev polynomials are bounded in [-1,1] (Property 2.) and then increase very fast (Property 3.).

More precisely, we have:

  • p(n)=1 by construction;
  • for i<n, we have:

    T_k\left ( \frac {i}{n-1}\right ) \leq 1 by Property 2.;

    T_k\left ( \frac {n}{n-1}\right ) = T_k\left (1 + \frac {1}{n-1}\right ) \geq 1 + \frac {k^2}{n-1} by Property 3. and 4., and therefore for some k = \mathcal {O}(\sqrt {n}), we have: T_k\left ( \frac {n}{n-1}\right ) \geq 3.


Let us now prove the corresponding lower bound:

Claim 6. We have:

\begin{aligned}d_{1/3}(\text {AND}) = \Omega (\sqrt {n}).\end{aligned}

Proof. Let p be a polynomial that approximates the AND function with error 1/3. Consider the univariate symmetrization p' of p.

We have the following result from approximation theory:

Theorem 7. Let q be a real univariate polynomial such that:

  1. \forall i \in \{0,\dots ,n\}, |q(i)| \leq \mathcal {O}(1);
  2. q'(x) \geq \Omega (1) for some x \in [0,n].

    Then \deg q = \Omega (\sqrt {n}).

To prove our claim, it is therefore sufficient to check that p' satisfies conditions 1. and 2., as we saw that \deg p \geq \deg p':

  1. We have: \forall i \in \{0,\dots ,n\}, |p'(i)| \leq 1 + 1/3 by assumption on p;
  2. We have p'(n-1) \leq 1/3 and p'(n) \geq 2/3 (by assumption), so that the mean value theorem gives some x such that p'(x) \geq \Omega (1).

This concludes the proof. \square

1.5 Approximate Degree of AND-OR.

Consider the AND function on R bits and the OR function on N bits. Let AND-OR:\{0,1\}^{R\times N} \rightarrow \{0,1\} be their composition (which outputs the AND of the R outputs of the OR function on N-bits (disjoint) blocks).

It is known that d_{1/3}(AND-OR) = \Theta (\sqrt {RN}). To prove the upper bound, we will need a technique to compose approximating polynomials which we will discuss later.

Now we focus on the lower bound. This lower bound was recently proved independently by Sherstov and by Bun and Thaler. We present a proof that is different (either in substance or in language) and which we find more intuitive. Our proof replaces the “dual block method” with the following lemma.

Lemma 8. Suppose that

distributions A^0, A^1 over \{0,1\}^{n_A} are k_A-wise indistinguishable distributions; and

distributions B^0, B^1 over \{0,1\}^{n_B} are k_B-wise indistinguishable distributions.

Define C^0, C^1 over \{0,1\}^{n_A \cdot n_B} as follows: C^b: draw a sample x \in \{0,1\}^{n_A} from A^b, and replace each bit x_i by a sample of B^{x_i} (independently).

Then C^0 and C^1 are k_A \cdot k_B-wise indistinguishable.

Proof. Consider any set S \subseteq \{1,\dots , n_A\cdot n_B \} of k_A \cdot k_B bit positions; let us show that they have the same distribution in C^0 and C^1.

View the n_A \cdot n_B as n_A blocks of n_B bits. Call a block K of n_B bits heavy if |S\cap K| \geq k_B; call the other blocks light.

There are at most k_A heavy blocks by assumption, so that the distribution of the (entire) heavy blocks are the same in C^0 and C^1 by k_A-wise indistinguishability of A^0 and A^1.

Furthermore, conditioned on any outcome for the A^b samples in C^b, the light blocks have the same distribution in both C^0 and C^1 by k_B-wise indistinguishability of B^0 and B^1.

Therefore C^0 and C^1 are k_A \cdot k_B-wise indistinguishable. \square