Special Topics in Complexity Theory, Lectures 2-3

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lectures 2-3, Scribe: Tanay Mehta

In these lectures we conclude the proof that bounded independence fools the AND function, and look at the more recent result that bounded independence fools the circuit class $\mathrm {AC}^0$.

1.1 Bounded Independence Fools AND

We state again the theorem from last time.

Theorem 1. Let $(X_1, \dots , X_n)$ be a distribution over $\{0,1\}^n$ such that any $k$ $X_i$ are independent (but not necessarily uniform). Then, we have that

\begin{aligned} \left |\Pr \left [\bigwedge _{i=1}^n X_i = 1\right ] - \prod _{i=1}^n \Pr [X_i = 1]\right | \leq 2^{- \Omega (k)}\end{aligned}

Proof. Let $D$ be the distribution of $(X_1, \dots , X_n)$. Let $B$ be the $n$-wise independent distribution $(Y_1, \dots , Y_n)$ such that $\Pr [Y_i = 1] = \Pr [X_i = 1]$ for all $i \in [n]$ and the $Y_i$ are independent. The theorem is equivalent to the following statement.

\begin{aligned}|\Pr _{X \leftarrow D}\left [\bigwedge _{i=1}^n X_i = 1 \right ] - \Pr _{X \leftarrow B}\left [\bigwedge _{i=1}^n X_i = 1 \right ]| \leq 2^{-\Omega (k)}\end{aligned}

We will prove the above statement by the following version of the Inclusion-Exclusion principle.

1.1.1 Inclusion-Exclusion Principle

Let $V$ be any distribution over $\{0,1\}^n$. Note that by De Morgan’s laws, we have

\begin{aligned}\Pr \left [\bigwedge V_i = 1 \right ] = 1 - \Pr \left [\bigvee V_i = 0 \right ]\end{aligned}

Let $E_i$ be the event that $V_i = 0$. We want to bound the quantity $\Pr \left [\bigcup E_i\right ]$. By looking at the Venn diagram of the events $E_i$, we can see that

\begin{aligned} \Pr \left [\bigcup E_i\right ] & \leq \sum _i \Pr [E_i], \nonumber \\ \Pr \left [\bigcup E_i\right ] & \geq \sum _i \Pr [E_i] - \sum _{S \subseteq [n], |S| = 2} \Pr \left [\bigcap _{i \in S} E_i \right ], \nonumber \\ \Pr \left [\bigcup E_i\right ] & \leq \sum _i \Pr [E_i] - \sum _{S \subseteq [n], |S| = 2} \Pr \left [\bigcap _{i \in S} E_i \right ] + \sum _{S \subseteq [n], |S| = 3} \Pr \left [\bigcap _{i \in S} E_i\right ], \nonumber \end{aligned}

and so on. In general, we have the following. Define

\begin{aligned}T_j := \sum _{S \subseteq [n], |S| = j} \Pr \left [ \bigcap _{i \in S} E_i \right ]\end{aligned}
\begin{aligned}S_h := \sum _{i=1}^h (-1)^{i+1} T_i \end{aligned}

Then, we have the bounds $\Pr \left [\bigcup E_i\right ] \leq S_j$ for odd $j$, and $\Pr \left [\bigcup E_i\right ] \geq S_j$ for even $j$. This fact holds for any distribution.
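These alternating bounds (often called the Bonferroni inequalities) are easy to check numerically. A Python sketch, where the sample space, number of events, and event sizes are arbitrary choices for illustration:

```python
import itertools
import random

# Random events E_1, ..., E_m over a uniform 64-point sample space.
# Check that the truncated inclusion-exclusion sums S_j alternately
# upper-bound (odd j) and lower-bound (even j) Pr[union of the E_i].
random.seed(0)
space = list(range(64))
for trial in range(50):
    m = 4
    events = [set(random.sample(space, random.randint(1, 32)))
              for _ in range(m)]
    pr_union = len(set().union(*events)) / len(space)
    # T_j = sum over subsets S of size j of Pr[intersection over S]
    T = [sum(len(set.intersection(*(events[i] for i in S))) / len(space)
             for S in itertools.combinations(range(m), j))
         for j in range(1, m + 1)]
    # S_h = sum_{i=1}^h (-1)^{i+1} T_i
    S_h, partial = [], 0.0
    for i, t in enumerate(T):
        partial += (-1) ** i * t
        S_h.append(partial)
    for j, bound in enumerate(S_h, start=1):
        if j % 2 == 1:
            assert pr_union <= bound + 1e-9
        else:
            assert pr_union >= bound - 1e-9
```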

Let us return to the proof. Note that the $S_h$ are the same for $D$ and $B$ up to $h = k$ because they only involve sums of ANDs of at most $k$ events. Hence, we have that

\begin{aligned}\left |\Pr _D \left [\bigwedge X_i = 1\right ] - \Pr _B\left [ \bigwedge X_i = 1\right ]\right | \leq |S_k - S_{k-1}| = |T_k|\end{aligned}

where the last equality comes from the definition of $S_k$. Therefore, we are done if $|T_k| \leq 2^{-\Omega (k)}$. We have that

\begin{aligned}T_k = \sum _{S \subseteq [n], |S| = k} \Pr \left [\bigcap _{i \in S} E_i \right ] ={ n \choose k} ~~ \mathbb {E}_{S \subseteq [n], |S| = k} \left [ \prod _{i \in S} P_i \right ] \end{aligned}

where $P_i := \Pr [E_i] = 1 - \Pr [X_i = 1]$. To bound the expectation we recall a useful inequality.

1.1.2 A Useful Inequality

Let $Q_1, \dots , Q_n$ be non-negative real numbers. Then, by the AM-GM inequality, we have that

\begin{aligned}\frac {\sum _i Q_i}{n} \geq \left (\prod _i Q_i \right )^{1/n}.\end{aligned}

Consider the following more general statement,

\begin{aligned} \mathbb {E}_{S \subseteq [n], |S| = 1} \left [ \prod _{i \in S} Q_i \right ] \geq \mathbb {E}_{S \subseteq [n], |S| = 2} \left [ \prod _{i \in S} Q_i \right ]^{1/2} \geq \dots \hspace {5cm} \nonumber \\ \dots \geq \mathbb {E}_{S \subseteq [n], |S| = k} \left [ \prod _{i \in S}Q_i \right ]^{1/k} \geq \dots \geq \mathbb {E}_{S \subseteq [n], |S| = n} \left [ \prod _{i \in S} Q_i \right ]^{1/n} \nonumber \end{aligned}

and note that the leftmost term equals $\frac {\sum _i Q_i}{n}$, while the rightmost term equals $\left (\prod _i Q_i \right )^{1/n}$.
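This chain is Maclaurin's inequality, and it can be sanity-checked numerically. In the sketch below the list length and random values are arbitrary illustrations:

```python
import itertools
import math
import random

def sym_mean(Q, k):
    """E over k-subsets S of [n] of prod_{i in S} Q_i."""
    return (sum(math.prod(c) for c in itertools.combinations(Q, k))
            / math.comb(len(Q), k))

def maclaurin_chain(Q):
    """The sequence sym_mean(Q, k)**(1/k), non-increasing in k by
    Maclaurin's inequality (for non-negative Q_i)."""
    return [sym_mean(Q, k) ** (1.0 / k) for k in range(1, len(Q) + 1)]

random.seed(2)
Q = [random.uniform(0.0, 2.0) for _ in range(7)]
chain = maclaurin_chain(Q)
# The first term is the arithmetic mean, the last the geometric mean.
assert abs(chain[0] - sum(Q) / len(Q)) < 1e-12
assert abs(chain[-1] - math.prod(Q) ** (1.0 / len(Q))) < 1e-12
assert all(a >= b - 1e-12 for a, b in zip(chain, chain[1:]))
```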

Applying the above inequality to $T_k$ and a common approximation for the binomial coefficient, we have that

\begin{aligned}T_k = { n \choose k} ~~ \mathbb {E}_{S \subseteq [n], |S| = k} \left [ \prod _{i \in S} P_i \right ] \leq {n \choose k} \left (\frac {\sum _i P_i}{n}\right )^k \leq \left (\frac {e n}{k}\right )^k \left (\frac {\sum P_i}{n}\right )^k = \left (\frac {e \sum P_i}{k}\right )^k.\end{aligned}

Therefore, we are done if $\sum P_i \leq \frac {k}{2 e}$. Recall that $P_i = \Pr [E_i] = 1 - \Pr [X_i = 1]$. So if $P_i$ is small then $\Pr [X_i = 1]$ is close to $1$.

It remains to handle the case that $\sum P_i \geq \frac {k}{2 e}$. Pick $n'$ such that

\begin{aligned}\sum _{i=1}^{n'} P_i = \frac {k}{2e} \pm 1.\end{aligned}

By the previous argument, the probability of the AND of the first $n'$ bits is the same up to $2^{-\Omega (k)}$ for $D$ and $B$. Also, for every distribution, the probability that the AND of all $n$ bits is $1$ is at most the probability that the AND of the first $n'$ bits is $1$. Moreover, for the $n$-wise independent distribution $B$ we have

\begin{aligned} \Pr _B\left [\bigwedge _{i=1}^{n'} X_i = 1\right ] & = \prod _{i=1}^{n'} \Pr [X_i = 1] \\ & = \prod _{i=1}^{n'} (1 - P_i)\\ & \leq \left (\frac {\sum _{i=1}^{n'}(1-P_i)}{n'}\right )^{n'} \textrm { by the AM-GM inequality} \\ & \leq \left (\frac {n' - k/2e}{n'}\right )^{n'} \leq (1 - k/2en')^{n'} \leq e^{-\Omega (k)}. \end{aligned}

The combination of these facts concludes this case. To summarize, in this case we showed that

\begin{aligned}\Pr _D[\bigwedge _{i=1}^n X_i = 1] \le \Pr _D[\bigwedge _{i=1}^{n'} X_i = 1],\end{aligned}

as well as

\begin{aligned}\Pr _B[\bigwedge _{i=1}^n X_i = 1] \le \Pr _B[\bigwedge _{i=1}^{n'} X_i = 1] \leq 2^{-\Omega (k)}.\end{aligned}

By the choice of $n'$ and the previous argument, we also know that $|\Pr _D[\bigwedge _{i=1}^{n'} X_i = 1]- \Pr _B[\bigwedge _{i=1}^{n'} X_i = 1]| \le 2^{-\Omega (k)}$ and so we are done, as all quantities above are at most $2^{-\Omega (k)}$ (and at least $0$). $\square$

Remark 2. The bound is tight up to the constant in the $\Omega (\cdot )$.

Proof. Let $D$ be the distribution over $\{0,1\}^{k+1}$ as follows: $D_{1, \dots , k} = U_k$ and $D_{k+1} = D_1 + \dots + D_k \mod 2$. Then, $D$ is $k$-wise independent. However, if $k$ is even, then

\begin{aligned}\Pr [\bigwedge _{i=1}^{k+1} D_i = 1] = 0.\end{aligned}

Yet, we have that

\begin{aligned}\Pr [\bigwedge _{i=1}^{k+1} U_i = 1] = 2^{-(k+1)}.\end{aligned}

$\square$
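The construction in the remark is small enough to verify exhaustively. A Python sketch, with $k = 4$ (an even value) chosen for illustration:

```python
import itertools
from collections import Counter

# The distribution D from Remark 2: first k bits uniform, last bit
# their parity mod 2.
def parity_support(k):
    return [bits + (sum(bits) % 2,)
            for bits in itertools.product([0, 1], repeat=k)]

k = 4
support = parity_support(k)  # 2^k equally likely strings in {0,1}^(k+1)

# Any k of the k+1 coordinates are jointly uniform (k-wise independence):
# each pattern in {0,1}^k appears exactly once in the projection.
for coords in itertools.combinations(range(k + 1), k):
    counts = Counter(tuple(x[i] for i in coords) for x in support)
    assert len(counts) == 2 ** k and set(counts.values()) == {1}

# Yet the AND of all k+1 bits behaves differently: it is never 1 under D
# (k even forces the parity bit to 0 on the all-ones prefix), while it is
# 1 with probability 2^-(k+1) under uniform.
pr_D = sum(all(x) for x in support) / len(support)
assert pr_D == 0.0
assert 2 ** -(k + 1) == 1 / 32
```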

1.2 Bounded Independence Fools $\textrm {AC}^0$

Acknowledgement. This section is based on Amnon Ta-Shma’s notes for the class 0368.4159 (Expanders, Pseudorandomness and Derandomization), CS dept., Tel-Aviv University, Fall 2016.

Note that a DNF on $n$ bits can be modeled as a depth-two circuit where the top layer is an OR-gate whose inputs are AND-gates, which take as inputs $X_1, \dots , X_n$ and their negations. The circuit class $\textrm {AC}^0$ can be viewed as a generalization of this to higher (but constant) depth. That is, $\textrm {AC}^0$ consists of circuits using AND-gates, OR-gates, NOT-gates, and input registers. Each gate has unbounded fan-in (i.e., the number of input wires). The size of the circuit is defined to be the number of gates.

$\textrm {AC}^0$ is one of the most studied classes in complexity theory. $\textrm {AC}^0$ circuits of polynomial size can do many things, including adding and subtracting $n$-bit integers.

Conjecture 3 (Linial-Nisan [LN90]). $\log ^{O(d)} s$-wise independence fools $\textrm {AC}^0$ circuits of depth $d$ and size $s$.

The conjecture was open for a long time, even in the special case $d=2$. In 2007 a breakthrough work by Bazzi [Baz09] proved it for $d=2$. Shortly afterwards, Razborov presented a simpler proof of Bazzi’s result [Raz09], and Braverman proved the conjecture for any $d$ with $\log ^{d^2} s$-wise independence [Bra10]. Tal improved the result to $\log ^{O(d)} s$ [Tal17].

Interestingly, the progress on the conjecture uses only ideas that were already around at the time of its formulation. Bottom line: if a problem has been open for a long time, you should immediately attack it with existing tools.

The high-level intuition why such a result should be true is the following:

1. $\textrm {AC}^0$ is approximated by polylog degree polynomials.
2. $k$-wise independence fools degree-$k$ polynomials.

Proof of (2). Let $x = (x_1, \dots , x_n) \in \{0,1\}^n$. Let $p(x_1, \dots , x_n)$ be a degree $k$ polynomial over $\mathbb {R}$. Write $p$ as

\begin{aligned}p(x_1, \dots , x_n) = \sum _{M \subseteq [n], |M| \leq k} c_M \cdot x_M, \qquad \text {where } x_M := \prod _{i \in M} x_i.\end{aligned}

If $D$ is a $k$-wise independent distribution on $\{0,1\}^n$, then by linearity of expectation

\begin{aligned} \mathbb {E}_D[p] = \sum _{M \subseteq [n], |M| \leq k} c_M \mathbb {E}_D[x_M] = \sum _{M \subseteq [n], |M| \leq k} c_M \mathbb {E}_U[x_M] = \mathbb {E}_U[p]. \end{aligned}

$\square$
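Fact (2) can also be verified exactly on a small example. The sketch below reuses the parity construction from Remark 2 as the $k$-wise independent distribution $D$ (with $k = 3$ chosen for illustration), comparing monomial expectations under $D$ and under uniform in exact rational arithmetic; matching all monomials of degree $\leq k$ matches all degree-$k$ polynomials by linearity.

```python
import itertools
from fractions import Fraction

# D: first k bits uniform, last bit their parity (k-wise independent
# on n = k+1 bits). U: the uniform distribution on n bits.
k = 3
n = k + 1
D = [bits + (sum(bits) % 2,) for bits in itertools.product([0, 1], repeat=k)]
U = list(itertools.product([0, 1], repeat=n))

def E_monomial(points, M):
    """Exact expectation of x_M = prod_{i in M} x_i over uniform points."""
    return Fraction(sum(all(x[i] for i in M) for x in points), len(points))

# Every monomial of degree <= k has the same expectation under D and U...
for size in range(k + 1):
    for M in itertools.combinations(range(n), size):
        assert E_monomial(D, M) == E_monomial(U, M)

# ...while the single degree-(k+1) monomial x_1 * ... * x_n does not.
full = tuple(range(n))
assert E_monomial(D, full) != E_monomial(U, full)
```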

There are several notions of approximating AC$^0$ by low-degree polynomials. We now review two of them, explaining why neither of them is sufficient. Braverman showed how to cleverly combine the two methods to prove a version of (1) that’s strong enough.

1.2.1 Approximation 1

Theorem 4. For all $\textrm {AC}^0$ circuits $C(x_1, \dots , x_n)$ of size $s$ and depth $d$, for all distributions $D$ over $\{0,1\}^n$, and for all $\epsilon$, there exists a polynomial $p(x_1, \dots , x_n)$ of degree $\log ^{O(d)}(s/\epsilon )$ such that

\begin{aligned}\Pr _{x \leftarrow D}[p(x) = C(x)] \geq 1 - \epsilon .\end{aligned}

The important features of this approximation are that it works under any distribution, and when the polynomial is correct it outputs a boolean value.

Similar approximations appear in many papers, going back to Razborov’s paper [Raz87] (which considers polynomials modulo 2), which in turn uses ideas from still earlier work.

Note that the polynomial $p$ depends on the circuit $C$ chosen, and on the distribution. This theorem is not a good enough approximation because on the $\epsilon$ fraction of inputs where the polynomial and circuit are unequal, the value of the polynomial can (and does) explode to be much greater than $1/\epsilon$. This prevents us from bounding the average of the polynomial.

Nevertheless, let us prove the above theorem.

Proof. Consider one OR-gate of fan-in $s$. We construct a distribution on polynomials that computes any fixed input correctly with high probability. By an averaging argument, this implies that there is a fixed polynomial that computes the circuit correctly on a large fraction of the inputs.

For $i = 1, 2, \dots , \log s$, let $S_i$ be a random subset of $[s]$ where every element is included independently with probability $1/2^i$.

Suppose $x$ has Hamming weight $2^j$. Then $\mathbb {E}[\sum _{h \in S_j} x_h] = 1$, and the sum can be shown to equal $1$ with constant probability.

Define the approximation polynomial $p$ to be

\begin{aligned}p(x) := 1 - \prod _{i=1}^{\log s} (1 - \sum _{h \in S_i} x_h)\end{aligned}

Note that if $x$ has weight $w > 0$, then $p(x) = 1$ with constant probability, matching the OR. If $w = 0$, then $p(x) = 0$ with probability $1$. We can reduce the error probability to $\epsilon$ by repeating each term in the product $\log (1/\epsilon )$ times.

Thus, we can approximate one gate with the above polynomial of degree $O(\log (s) \cdot \log (1/\epsilon ))$. Construct polynomials as $p$ above for each gate, with error parameter $\epsilon /s$. The probability that any of them is wrong is at most $\epsilon$ by a union bound. To obtain the approximating polynomial for the whole circuit compose all the polynomials together. Since the circuit is of depth $d$, the final degree of the approximating polynomial is $(\log (s) \cdot \log (s/\epsilon ))^{d}$, as desired.

As mentioned at the beginning, this is a distribution on polynomials that computes correctly any input with probability at least $1-\epsilon$. By averaging, there exists a fixed polynomial that computes correctly a $1-\epsilon$ fraction of inputs. $\square$
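The single-gate construction can be sketched directly in Python. The fan-in $s = 64$, the weight-8 test input, the seeds, and the sample counts below are arbitrary choices for illustration:

```python
import math
import random

# Sample the sets S_1, ..., S_{log s} (element h kept in S_i with
# probability 2^{-i}) and return the randomized OR polynomial
# p(x) = 1 - prod_i (1 - sum_{h in S_i} x_h).
def or_poly_sampler(s, seed=None):
    rng = random.Random(seed)
    levels = max(1, math.ceil(math.log2(s)))
    sets = [[h for h in range(s) if rng.random() < 2.0 ** -(i + 1)]
            for i in range(levels)]
    def p(x):
        prod = 1
        for S in sets:
            prod *= 1 - sum(x[h] for h in S)
        return 1 - prod
    return p

s = 64
# On the all-zeros input every sum is 0, so p is always 0, matching the OR.
assert all(or_poly_sampler(s, seed=t)([0] * s) == 0 for t in range(100))
# On a positive-weight input, some S_i hits exactly one 1 with constant
# probability, in which case p evaluates to 1, matching the OR.
x = [1] * 8 + [0] * (s - 8)
hits = sum(or_poly_sampler(s, seed=t)(x) == 1 for t in range(1000))
assert hits > 100  # correct on a constant fraction of the samples
```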

It can be verified that the value of the polynomial can be much larger than $1/\epsilon$. The polynomial for the gates closest to the input can be as large as $s$. Then at the next level it can be as large as $s^{\log (s/\epsilon )}$, which is already much larger than $1/\epsilon$.

1.2.2 Approximation 2

Theorem 5. For all circuits $C$ of size $s$ and depth $d$, for all error values $\epsilon$, there exists a polynomial $p(x_1, \dots , x_n)$ of degree $O(\log (s)^{d-1} \log (1/\epsilon ))$ such that

\begin{aligned} \mathbb {E}_{x \leftarrow U_n}[(C(x) - p(x))^2] \leq \epsilon . \end{aligned}

The important feature of this approximation is that it bounds the average, but only under the uniform distribution. Because it does not provide any guarantee on other distributions, including $k$-wise independent distributions, it cannot be used directly for our aims.

Remark 6. Approximation 2 is proved via the switching lemma, an influential lemma first proved in the early 80’s by Ajtai [Ajt83] and by Furst, Saxe, and Sipser [FSS84]. The idea is to randomly set a subset of the variables to simplify the circuit. One can do this repeatedly to simplify the circuit even further, but it only works under the uniform distribution. Håstad [Hås87] gave a much tighter analysis of the switching lemma, and the paper [LMN93] used it to prove a version of Approximation 2 with a slightly worse dependence on the error. Recently, a refinement of the switching lemma was proved in [Hås14, IMP12]. Based on that, Tal [Tal17] obtained the corresponding refinement of Approximation 2 with the parameters stated above. (The polynomial is simply obtained from the Fourier expansion of the function computed by the circuit by removing all Fourier coefficients of degree larger than a certain threshold. The bound on the Fourier decay in [Tal17] implies the desired approximation.)

1.4 Bounded Independence Fools $\textrm {AC}^0$

Theorem 7. For all circuits $C$ with unbounded fan-in of size $s$ and depth $d$, for all error values $\epsilon$, for all $k$-wise independent distributions $D$ on $\{0,1\}^n$, we have that

\begin{aligned} |\mathbb {E}[C(D)] - \mathbb {E}[C(U_n)]| \leq \epsilon \end{aligned}

for $k = \log (s/\epsilon )^{O(d)}$.

Corollary 8. In particular, if $s = \textrm {poly}(n)$, $d = O(1)$, and $\epsilon = 1/\textrm {poly}(n)$, then $k = \log ^{O(1)}(n)$ suffices.

The next claim is the ultimate polynomial approximation used to prove the theorem.

Claim 9. For all circuits $C$ with unbounded fan-in of size $s$ and depth $d$, for all error values $\epsilon$, for all $k$-wise independent distributions $D$ on $\{0,1\}^n$, there is a set $E$ of inputs, and a degree-$k$ polynomial $p$ such that:

1. $E$ is ’rare’ under both $D$ and $U_n$:
$\Pr _{x \leftarrow U_n}[E(x) = 1] \leq \epsilon$, and $\Pr _{x \leftarrow D}[E(x) =1] \leq \epsilon$. Here we write $E(x)$ for the indicator function of the event $x \in E$.
2. For all $x$, $p(x) \leq C(x) \vee E(x)$. Here $\vee$ is the logical Or.
3. $\mathbb {E}[p(U_n)] = \mathbb {E}[C(U_n)] \pm \epsilon$.

We only need (1) under $D$, but (1) under $U$ is used to prove (3).

Proof of Theorem 7 from Claim 9.

\begin{aligned} \mathbb {E}[C(D)] & = \mathbb {E}[C(D) \vee E(D)] \pm \epsilon \text {, by Claim.(1)} \nonumber \\ & \geq \mathbb {E}[p(D)] \pm \epsilon \text {, by Claim.(2)} \nonumber \\ & = \mathbb {E}[p(U_n)] \pm \epsilon \text {, because } p \text { has degree } k \text { and } D \text { is } k \text {-wise independent} \nonumber \\ & = \mathbb {E}[C(U_n)] \pm \epsilon \text {, by Claim.(3)} \nonumber \end{aligned}

For the other direction, repeat the argument for ‘not $C$’. $\square$

We can construct the polynomial approximation from Claim 9 by using a combination of Approximation 1 and 2. First we need a little more information about Approximation 1.

Claim 10. Two properties of approximation 1:

1. For all $x$, $p(x) \leq 2^{\log (s/\epsilon )^{O(d)}}$.
2. The ’bad’ set $E$ is computable by a circuit of size $\textrm {poly}(s)$, and depth $d + O(1)$.

Proof of Claim 10 part 2. Consider a single OR gate with input gates $g_1, \dots , g_s$. This is represented in the approximating polynomial by the term

\begin{aligned}1 - \prod _{i=1}^{\textrm {polylog}(s/\epsilon )}(1 - \sum _{j \in S_i} g_j).\end{aligned}

Note that the term is incorrect exactly when the input $g_1, \dots , g_s$ has weight $> 0$ but every set $S_i$ contains either none or at least two of the ones, i.e., no sum $\sum _{j \in S_i} g_j$ equals exactly $1$. This can be checked in AC$^0$, in parallel for all gates in the circuit. $\square$
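The error condition for a single gate can be made concrete in a few lines of Python; the particular sets and inputs below are hypothetical illustrations:

```python
# One OR-gate term: 1 - prod_i (1 - sum_{j in S_i} g_j). It errs exactly
# when the input has weight > 0 yet no sum over an S_i equals exactly 1
# (so no factor of the product vanishes).
def term_value(sets, g):
    prod = 1
    for S in sets:
        prod *= 1 - sum(g[j] for j in S)
    return 1 - prod

def term_errs(sets, g):
    return term_value(sets, g) != (1 if any(g) else 0)

sets = [[0, 1, 2], [3]]
g_ok = [0, 1, 0, 0]   # the first set sums to 1, so the term equals the OR
g_bad = [1, 1, 0, 0]  # sums are 2 and 0: no factor vanishes, term is wrong
assert not term_errs(sets, g_ok)
assert term_errs(sets, g_bad)
```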

Proof of Claim 9. Run Approximation 1 on the distribution $\frac {D+U}{2}$ with error $\epsilon /2$, yielding the polynomial $p_c$ and the set $E$. This already proves the first part of the claim for both $D$ and $U$: if $E$ had probability more than $\epsilon$ under $D$, it would have probability more than $\epsilon /2$ under $(D+U)/2$, and the same holds for $U$. Use Claim 10 part 2, and run Approximation 2 on $E$ with error bound $\delta$, to be chosen later. Call the resulting polynomial $p_E$, which has degree $\log (s/\delta )^{O(d)}$.

The idea in the ultimate approximating polynomial is to “check if there is a mistake, and if so, output $0$. Otherwise, output $C$”. Formally:

\begin{aligned}p(x) := 1 - \left (1 - p_c(x)(1 - p_E(x))\right )^2.\end{aligned}

Claim 9 part 2 can be shown as follows. $p(x) \leq 1$ by definition. So, if $C(x) \vee E(x) = 1$, then we are done. Otherwise, $C(x) \vee E(x) = 0$. So there is no mistake, and $C=0$. Hence, by the properties of Approximation 1, $p_c(x) = 0$. This implies $p(x)=0$.
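A minimal numeric check of this case analysis, treating the values $p_c(x)$ and $p_E(x)$ as arbitrary reals rather than actual polynomials:

```python
# p = 1 - (1 - p_c * (1 - p_E))^2, evaluated at hypothetical real values.
def p_ultimate(p_c, p_E):
    return 1 - (1 - p_c * (1 - p_E)) ** 2

# Case C(x) OR E(x) = 0: Approximation 1 then guarantees p_c(x) = 0,
# so p(x) = 0 regardless of p_E(x).
for p_E in (-1.0, 0.0, 0.4, 7.0):
    assert p_ultimate(0.0, p_E) == 0.0

# Case C(x) OR E(x) = 1: we only need p(x) <= 1, which holds for all
# reals since p is 1 minus a square.
for p_c in (-3.0, 0.0, 0.5, 10.0):
    for p_E in (-1.0, 0.2, 4.0):
        assert p_ultimate(p_c, p_E) <= 1.0
```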

It only remains to show Claim 9 part 3:

\begin{aligned} \mathbb {E}_U[p(x)] = \mathbb {E}_U[C(x)] \pm \epsilon . \end{aligned}

By part 1 of Claim 9,

\begin{aligned} \mathbb {E}_U[C(x) - p(x)] = \mathbb {E}_U[C(x) \vee E(x) - p(x)] \pm \epsilon . \end{aligned}

We can show that this equals

\begin{aligned} \mathbb {E}_U\left [\left (C(x) \vee E(x) - p_c(x)(1-p_E(x))\right )^2\right ] \pm \epsilon \end{aligned}

by the following argument: If $C(x) \vee E(x) =1$ then $1-p(x) = (1-p_c(x)(1-p_E(x)) )^2$ by definition. If $C(x) \vee E(x) =0$, then there is no mistake, and $C(x) = 0$. This implies that $p_c(x)(1 - p_E(x)) = p(x) = 0$.

Let us rewrite the above expression in terms of the $\ell _2$ norm with respect to the uniform distribution, $\|f\|_2 := \mathbb {E}_{x \leftarrow U_n}[f(x)^2]^{1/2}$:

\begin{aligned} || C \vee E - p_c(1 - p_E)||_2^2. \end{aligned}

Recall the triangle inequality: $||u - v||_2 \leq ||u - w||_2 + ||w - v||_2$. Therefore, letting $w = p_c(1 - E)$, we have that the above quantity is

\begin{aligned} \leq (|| p_c(1 - E) - p_c(1 - p_E)||_2 ~~ + ~~ ||p_c(1 - E) - C \vee E ||_2)^2 \end{aligned}
\begin{aligned} \leq O(||p_c(1-E) - p_c(1 - p_E)||_2^2 + ||p_c (1 - E) - C \vee E ||_2^2). \end{aligned}

To conclude, we will show that each of the above terms is at most $\epsilon$. Note that

\begin{aligned} ||p_c(1-E) - p_c(1 - p_E)||_2^2 \leq \max _x |p_c(x)|^2 ||(1 - E) - (1 - p_E)||_2^2. \end{aligned}

By Claim 10 part 1 and Approximation 2, this is at most

\begin{aligned} 2^{\log (s/\epsilon )^{O(d)}} \cdot ||E - p_E||_2^2 \leq 2^{\log (s/\epsilon )^{O(d)}} \cdot \delta . \end{aligned}

For this quantity to be at most $\epsilon$ we set $\delta = \epsilon \cdot 2^{-\log (s/\epsilon )^{O(d)}}$. Here we critically set the error in Approximation 2 much lower, to cancel the large values arising from Approximation 1. By Theorem 5, the polynomial arising from approximation 2 has degree $O(\log (s)^{d-1} \log (1/\delta )) = \log (s/\epsilon )^{O(d)}$.

Finally, let us bound the other term, $||p_c (1 - E) - C \vee E ||_2^2$. If $E(x) = 0$, then the distance is $0$. If $E(x) =1$, then the distance is $\leq 1$. Therefore, this term is at most $\Pr _U[E(x) =1]$, which we know to be at most $\epsilon$. $\square$

References

[Ajt83]    Miklós Ajtai. $\Sigma \sp {1}\sb {1}$-formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.

[Baz09]    Louay M. J. Bazzi. Polylogarithmic independence can fool DNF formulas. SIAM J. Comput., 38(6):2220–2272, 2009.

[Bra10]    Mark Braverman. Polylogarithmic independence fools AC$^{\mbox {0}}$ circuits. J. of the ACM, 57(5), 2010.

[FSS84]    Merrick L. Furst, James B. Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Mathematical Systems Theory, 17(1):13–27, 1984.

[Hås87]    Johan Håstad. Computational limitations of small-depth circuits. MIT Press, 1987.

[Hås14]    Johan Håstad. On the correlation of parity and small-depth circuits. SIAM J. on Computing, 43(5):1699–1708, 2014.

[IMP12]    Russell Impagliazzo, William Matthews, and Ramamohan Paturi. A satisfiability algorithm for AC$^{\mbox {0}}$. In ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 961–972, 2012.

[LMN93]   Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability. J. of the ACM, 40(3):607–620, 1993.

[LN90]     Nathan Linial and Noam Nisan. Approximate inclusion-exclusion. Combinatorica, 10(4):349–365, 1990.

[Raz87]    Alexander Razborov. Lower bounds on the dimension of schemes of bounded depth in a complete basis containing the logical addition function. Akademiya Nauk SSSR. Matematicheskie Zametki, 41(4):598–607, 1987. English translation in Mathematical Notes of the Academy of Sci. of the USSR, 41(4):333-338, 1987.

[Raz09]    Alexander A. Razborov. A simple proof of Bazzi’s theorem. ACM Transactions on Computation Theory (TOCT), 1(1), 2009.

[Tal17]    Avishay Tal. Tight bounds on the Fourier spectrum of AC$^0$. In Conf. on Computational Complexity (CCC), pages 15:1–15:31, 2017.