Just coincidence?

Proving lower bounds is one of the greatest intellectual challenges of our time. Something that strikes me is when people reach the same bounds from seemingly different angles.  Two recent examples:

What does this mean?  Again, the only things that matter are those that you can prove.  Still, here are some options:

  • Lower bounds are true, and provable with the bag of tricks people are using.  The above is just coincidence. Given the above examples (and others) I find this possibility quite bizarre. To illustrate the bizarre in a bizarre way, imagine a graph where one edge is a trick from the bag, and each node is a bound. Why should different paths lead to the same sink, over and over again?
  • Lower bounds are true, but you need to use a different bag of tricks. My impression is that two types of results are available here.  The first is for “infinitary” proof systems, and includes famous results like the Paris-Harrington theorem. The second is for “finitary” proof systems, and includes results like Razborov’s proof that superpolynomial lower bounds cannot be proved in Res(k). What I really would like is a survey that explains what these and all other relevant proof systems are and can do, and what would it mean to either strengthen the proof system or make the unprovable statement closer to the state-of-the-art. (I don’t even have the excuse of not having a background in logic.  I took classes both in Italy and in the USA.  In Italy I went to a summer school in logic, and took the logic class in the math department.  It was a rather tough class, one of the last offerings before the teacher was forced to water it down.  If I remember correctly, it lasted an entire year (though now it seems a lot).  As in the European tradition, at least of the time, instruction was mostly one-way: you’d sit there for hours each week and just swallow this avalanche of material. At the very end, there was an oral exam where you sit with the instructor — face-to-face — and they mostly ask you to repeat random bits of the lectures.  But for the bright student some simple original problems are also asked — to be solved on the spot.  So there is substantial focus on memorization, a word which has acquired a negative connotation, some of which I sympathize with.  However a 30-minute oral exam does have its benefits, and on certain aspects I’d argue it can’t quite be replaced by written exams, let alone take-home.  But I digress.)
  • Lower bounds are false. That is, all “simple” functions have say n^3 formula size.  You can prove this using computational checkpoints, a notion which in hindsight isn’t too complicated, but alas has not yet been invented.  To me, this remains the more likely option.

What do you think?


Nonclassical polynomials and exact computation of Boolean functions

Guest post by Abhishek Bhrushundi.

I would like to thank Emanuele for giving me the opportunity to write a guest post here. I recently stumbled upon an old post on this blog which discussed two papers: Nonclassical polynomials as a barrier to polynomial lower bounds by Bhowmick and Lovett, and Anti-concentration for random polynomials by Nguyen and Vu. Towards the end of the post, Emanuele writes:

“Having discussed these two papers in a sequence, a natural question is whether non-classical polynomials help for exact computation as considered in the second paper. In fact, this question is asked in the paper by Bhowmick and Lovett, who conjecture that the answer is negative: for exact computation, non-classical polynomials should not do better than classical.”

In a joint work with Prahladh Harsha and Srikanth Srinivasan from last year, On polynomial approximations over \mathbb {Z}/2^k\mathbb {Z}, we study exact computation of Boolean functions by nonclassical polynomials. In particular, one of our results disproves the aforementioned conjecture of Bhowmick and Lovett by giving an example of a Boolean function for which low degree nonclassical polynomials end up doing better than classical polynomials of the same degree in the case of exact computation.

The counterexample we propose is the elementary symmetric polynomial of degree 16 in \mathbb {F}_2[x_1, \ldots , x_n]. (Such elementary symmetric polynomials also serve as counterexamples to the inverse conjecture for the Gowers norm [LMS11GT07], and this was indeed the reason why we picked these functions as candidate counterexamples),

\begin{aligned}S_{16}(x_1, \ldots , x_n) = \left (\sum _{S\subseteq [n],|S| = 16} \prod _{i \in S}x_i\right )\textrm { mod 2} = {|x| \choose 16} \textrm { mod 2},\end{aligned}

where |x| = \sum _{i=1}^n x_i is the Hamming weight of x. One can verify (using, for example, Lucas’s theorem) that S_{16}(x_1, \ldots , x_n) = 1 if and only if the 5^{th} least significant bit of |x| is 1.

We use that no polynomial of degree less than or equal to 15 can compute S_{16}(x) correctly on more than half of the points in \{0,1\}^n.

Theorem 1. Let P be a polynomial of degree at most 15 in \mathbb {F}_2[x_1, \ldots , x_n]. Then

\begin{aligned}\Pr _{x \sim \{0,1\}^n}[P(x) = S_{16}(x)] \le \frac {1}{2} + o(1).\end{aligned}

[Emanuele’s note. Let me take advantage of this for a historical remark. Green and Tao first claimed this fact and sent me and several others a complicated proof. Then I pointed out the paper by Alon and Beigel [AB01]. Soon after they and I independently discovered the short proof reported in [GT07].]

The constant functions (degree 0 polynomials) can compute any Boolean function on half of the points in \{0,1\}^n and this result shows that even polynomials of higher degree don’t do any better as far as S_{16}(x_1, \ldots , x_n) is concerned. What we prove is that there is a nonclassical polynomial of degree 14 that computes S_{16}(x_1, \ldots , x_n) on 9/16 \ge 1/2 + \Omega (1) of the points in \{0,1\}^n.

Theorem 2. There is a nonclassical polynomial P of degree 14 such that

\begin{aligned}\Pr _{x \sim \{0,1\}^n}[P(x) = S_{16}(x)] = \frac {9}{16} - o(1).\end{aligned}

A nonclassical polynomial takes values on the torus \mathbb {T} = \mathbb {R}/\mathbb {Z} and in order to compare the output of a Boolean function (i.e., a classical polynomial) to that of a nonclassical polynomial it is convenient to think of the range of Boolean functions to be \{0,1/2\} \subset \mathbb {T}. So, for example, S_{16}(x_1, \ldots , x_n) = \frac {1}{2} if |x|_4 = 1, and S_{16}(x_1, \ldots , x_n) = 0 otherwise. Here |x|_4 denotes the 5^{th} least significant bit of |x|.

We show that the nonclassical polynomial that computes S_{16}(x) on 9/16 of the points in \{0,1\}^n is

\begin{aligned}P(x_1, \ldots , x_n) = \frac {\sum _{S \subseteq [n], |S|=12} \prod _{i \in S}x_i}{8} \textrm { mod 1}= \frac {{|x| \choose 12}}{8} \textrm { mod 1} .\end{aligned}

The degree of this nonclassical polynomial is 14 but I wouldn’t get into much detail as to why this is case (See [BL15] for a primer on the notion of degree in the nonclassical world).

Understanding how P(x) behaves comes down to figuring out the largest power of two that divides |x| \choose 12 for a given x: if the largest power of two that divides |x| \choose 12 is 2 then P(x) = 1/2, otherwise if the largest power is at least 3 then P(x) = 0. Fortunately, there is a generalization of Lucas’s theorem, known as Kummer’s theorem, that helps characterize this:

Theorem 3.[Kummer’s theorem] The largest power of 2 dividing a \choose b for a,b \in \mathbb {N}, a \ge b, is equal to the number of borrows required when subtracting b from a in base 2.
Equipped with Kummer’s theorem, it doesn’t take much work to arrive at the following conclusion.

Lemma 4. P(x) = S_{16}(x) if either |x|_{2} = 0 or (|x|_2, |x|_3, |x|_4, |x|_5) = (1,0,0,0), where |x|_i denotes the (i+1)^{th} least significant bit of |x|.

If x = (x_1, \ldots , x_n) is uniformly distributed in \{0,1\}^n then it’s not hard to verify that the bits |x|_0, \ldots , |x|_5 are almost uniformly and independently distributed in \{0,1\}, and so the above lemma proves that P(x) computes S_{16}(x) on 9/16 of the points in \{0,1\}^n. It turns out that one can easily generalize the above argument to show that S_{2^\ell }(x) is a counterexample to Bhowmick and Lovett’s conjecture for every \ell \ge 4.

We also show in our paper that it is not the case that nonclassical polynomials always do better than classical polynomials in the case of exact computation — for the majority function, nonclassical polynomials do as badly as their classical counterparts (this was also conjectured by Bhowmick and Lovett in the same work), and the Razborov-Smolensky bound for classical polynomials extends to nonclassical polynomials.

We started out trying to prove that S_4(x_1, \ldots , x_n) is a counterexample but couldn’t. It would be interesting to check if it is one.


[AB01]    N. Alon and R. Beigel. Lower bounds for approximations by low degree polynomials over z m. In Proceedings 16th Annual IEEE Conference on Computational Complexity, pages 184–187, 2001.

[BL15]    Abhishek Bhowmick and Shachar Lovett. Nonclassical polynomials as a barrier to polynomial lower bounds. In Proceedings of the 30th Conference on Computational Complexity, pages 72–87, 2015.

[GT07]    B. Green and T. Tao. The distribution of polynomials over finite fields, with applications to the Gowers norms. ArXiv e-prints, November 2007.

[LMS11]   Shachar Lovett, Roy Meshulam, and Alex Samorodnitsky. Inverse conjecture for the gowers norm is false. Theory of Computing, 7(9):131–145, 2011.

Entropy polarization

Sometimes you see quantum popping up everywhere. I just did the opposite and gave a classical talk at a quantum workshop, part of an AMS meeting held at Northeastern University, which poured yet another avalanche of talks onto the Boston area. I spoke about the complexity of distributions, also featured in an earlier post, including a result I posted two weeks ago which gives a boolean function f:\{0,1\}^{n}\to \{0,1\} such that the output distribution of any AC^{0} circuit has statistical distance 1/2-1/n^{\omega (1)} from (Y,f(Y)) for uniform Y\in \{0,1\}^{n}. In particular, no AC^{0} circuit can compute f much better than guessing at random even if the circuit is allowed to sample the input itself. The slides for the talk are here.

The new technique that enables this result I’ve called entropy polarization. Basically, for every AC^{0} circuit mapping any number L of bits into n bits, there exists a small set S of restrictions such that:

(1) the restrictions preserve the output distribution, and

(2) for every restriction r\in S, the output distribution of the circuit restricted to r either has min-entropy 0 or n^{0.9}. Whence polarization: the entropy will become either very small or very large.

Such a result is useless and trivial to prove with |S|=2^{n}; the critical feature is that one can obtain a much smaller S of size 2^{n-n^{\Omega (1)}}.

Entropy polarization can be used in conjunction with a previous technique of mine that works for high min-entropy distributions to obtain the said sampling lower bound.

It would be interesting to see if any of this machinery can yield a separation between quantum and classical sampling for constant-depth circuits, which is probably a reason why I was invited to give this talk.

Hardness amplification proofs require majority… and 15 years

Aryeh Grinberg, Ronen Shaltiel, and myself have just posted a paper which proves conjectures I made 15 years ago (the historians want to consult the last paragraph of [2] and my Ph.D. thesis).

At that time, I was studying hardness amplification, a cool technique to take a function f:\{0,1\}^{k}\to \{0,1\} that is somewhat hard on average, and transform it into another function f':\{0,1\}^{n}\to \{0,1\} that is much harder on average. If you call a function \delta -hard if it cannot be computed on a \delta fraction of the inputs, you can start e.g. with f that is 0.1-hard and obtain f' that is 1/2-1/n^{100} hard, or more. This is very important because functions with the latter hardness imply pseudorandom generators with Nisan’s design technique, and also “additional” lower bounds using the “discriminator lemma.”

The simplest and most famous technique is Yao’s XOR lemma, where

\begin{aligned} f'(x_{1},x_{2},\ldots ,x_{t}):=f(x_{1})\oplus f(x_{2})\oplus \ldots \oplus f(x_{t}) \end{aligned}

and the hardness of f' decays exponentially with t. (So to achieve the parameters above it suffices to take t=O(\log k).)

At the same time I was also interested in circuit lower bounds, so it was natural to try to use this technique for classes for which we do have lower bounds. So I tried, and… oops, it does not work! In all known techniques, the reduction circuit cannot be implemented in a class smaller than TC^{0} – a class for which we don’t have lower bounds and for which we think it will be hard to get them, also because of the Natural proofs barrier.

Eventually, I conjectured that this is inherent, namely that you can take any hardness amplification reduction, or proof, and use it to compute majority. To be clear, this conjecture applied to black-box proofs: decoding arguments which take anything that computes f' too well and turn it into something which computes f too well. There were several partial results, but they all had to restrict the proof further, and did not capture all available techniques.

Should you have had any hope that black-box proofs might do the job, in this paper we prove the full conjecture (improving on a number of incomparable works in the literature, including a 10-year-anniversary work by Shaltiel and myself which proved the conjecture for non-adaptive proofs).


One thing that comes up in the proof is the following basic problem. You have a distribution X on n bits that has large entropy, very close to n. A classic result shows that most bits of X are close to uniform. We needed an adaptive version of this, showing that a decision tree making few queries cannot distinguish X from uniform, as long as the tree does not query a certain small forbidden set of variables. This also follows from recent and independent work of Or Meir and Avi Wigderson.

Turns out this natural extension is not enough for us. In a nutshell, it is difficult to understand what queries an arbitrary reduction is making, and so it is hard to guarantee that the reduction does not query the forbidden set. So we prove a variant, where the variables are not forbidden, but are fixed. Basically, you condition on some fixing X_{B}=v of few variables, and then the resulting distribution X|X_{B}=v is indistinguishable from the distribution U|U_{B}=v where U is uniform. Now the queries are not forbidden but have a fixed answer, and this makes things much easier. (Incidentally, you can’t get this simply by fixing the forbidden set.)

Fine, so what?

One great question remains. Can you think of a counter-example to the XOR lemma for a class such as constant-depth circuits with parity gates?

But there is something more why I am interested in this. Proving 1/2-1/n average-case hardness results for restricted classes “just” beyond AC^{0} is more than a long-standing open question in lower bounds: It is necessary even for worst-case lower bounds, both in circuit and communication complexity, as we discussed earlier. And here’s hardness amplification, which intuitively should provide such hardness results. It was given many different proofs, see e.g. [1]. However, none can be applied as we just saw. I don’t know, someone taking results at face value may even start thinking that such average-case hardness results are actually false.


[1]   Oded Goldreich, Noam Nisan, and Avi Wigderson. On Yao’s XOR lemma. Technical Report TR95–050, Electronic Colloquium on Computational Complexity, March 1995. http://www.eccc.uni-trier.de/.

[2]   Emanuele Viola. The complexity of constructing pseudorandom generators from hard functions. Computational Complexity, 13(3-4):147–188, 2004.

Matrix rigidity, and all that

The rigidity challenge asks to exhibit an n × n matrix M that cannot be written as M = A + B where A is “sparse” and B is “low-rank.” This challenge was raised by Valiant who showed in [Val77] that if it is met for any A with at most n1+ϵ non-zero entries and any B with rank O(n∕ log log n) then computing the linear transformation M requires either logarithmic depth or superlinear size for linear circuits. This connection relies on the following lemma.

Lemma 1. Let C : {0, 1}n →{0, 1}n be a circuit made of XOR gates. If you can remove e edges and reduce the depth to d then the linear transformation computed by C equals A + B where A has ≤ 2d non-zero entries per row (and so a total of ≤ n2d non-zero entries), and B has rank ≤ e.

Proof: After you remove the edges, each output bit is a linear combination of the removed edges and at most 2d input variables. The former can be done by B, the latter by A. QED

Valiant shows that in a log-depth, linear-size circuit one can remove O(n∕ log log n) edges to reduce the depth to nϵ – a proof can be found in [Vio09] – and this gives the above connection to lower bounds.

However, the best available tradeoff for explicit matrices give sparsity n2∕r log(n∕r) and rank r, for any parameter r; and this is not sufficient for application to lower bounds.

Error-correcting codes

It was asked whether generator matrixes of good linear codes are rigid. (A code is good if it has constant rate and constant relative distance. The dimensions of the corresponding matrixes are off by only a constant factor, and so we can treat them as identical.) Spielman [Spi95] shows that there exist good codes that can be encoded by linear-size logarithmic depth circuits. This immediately rules out the possibility of proving a lower bound, and it gives a non-trivial rigidity upper bound via the above connections.

Still, one can ask if these matrices at least are more rigid than the available tradeoffs. Goldreich reports a negative answer by Dvir, showing that there exist good codes whose generating matrix C equals A + B where A has at most O(n2∕d) non-zero entries and B has rank O(d log n∕d), for any d.

A similar negative answer follows by the paper [GHK+13]. There we show that there exist good linear codes whose generating matrix can be written as the product of few sparse matrixes. The corresponding circuits are very structured, and so perhaps it is not surprising that they give good rigidity upper bounds. More precisely, the paper shows that we can encode an n-bit message by a circuit made of XOR gates and with say n log *n wires and depth O(1) – with unbounded fan-in. Each gate in the circuit computes the XOR of some t gates, which can be written as a binary tree of depth log 2t + O(1). Such trees have poor rigidity:

Lemma 2.[Trees are not rigid] Let C be a binary tree of depth d. You can remove an O(1∕2b) fraction of edges to reduce the depth to b, for any b.

Proof: It suffices to remove all edges at depths d – b, d – 2b, …. The number of such edges is O(2d-b + 2d-2b + …) = O(2d-b). Note this includes the case d ≤ b, where we can remove 0 edges. QED

Applying Lemma 2 to a gate in our circuit, we reduce the depth of the binary tree computed at that gate to b. Applying this to every gate we obtain a circuit of depth O(b). In total we have removed an O(1∕2b) fraction of the n log *n edges.

Writing 2b = n∕d, by Lemma 1 we can write the generating matrixes of our code as C = A + B where A has at most O(n∕d) non-zero entries per row, and B has rank O(d log *n). These parameters are the same as in Dvir’s result, up to lower-order terms. The lower-order terms appear incomparable.

Walsh-Fourier transform

Another matrix that was considered is the n×n Inner Product matrix H, aka the Walsh-Hadamard matrix, where the x,y entry is the inner product of x and y modulo 2. Alman and Williams [AW16] recently give an interesting rigidity upper bound which prevents this machinery to establish a circuit lower bound. Specifically they show that H can be written as H = A + B where A has at most n1+ϵ non-zero entries, and B has rank n1-ϵ′, for any ϵ and an ϵ′ which goes to 0 when ϵ does.

Their upper bound works as follows. Let h = log 2n. Start with the univariate, real polynomial p(z1,z2,…,zh) which computes parity exactly on inputs of Hamming weight between 2ϵn and (1∕2 + ϵ)n. By interpolation such a polynomial exists with degree (1∕2 – ϵ)n. Replacing zi with xiyi you obtain a polynomial of degree n – ϵn which computes IP correctly on inputs x,y whose inner product is between 2ϵn and (1∕2 + ϵ)n.

This polynomial has 2(1-ϵ′)n monomials, where ϵ′ = Ω(ϵ2). The truth-table of a polynomial with m monomials is a matrix with rank m, and this gives a low-rank matrix B′.

The fact that sparse polynomials yield low-rank matrixes also appeared in the paper [SV12], which suggested to study the rigidity challenge for matrixes arising from polynomials.

Returning to the proof in [AW16], it remains to deal with inputs whose inner product does not lie in that range. The number of x whose weight is not between (1∕2 – ϵ)n and (1∕2 + ϵ)n is 2(1-ϵ′)n. For each such input x we modify a row of the matrix B′. Repeating the process for the y we obtain the matrix B, and the rank bound 2(1-ϵ′)n hasn’t changed.

Now a calculation shows that B differs from H in few entries. That is, there are few x and y with Hamming weight between (1∕2 – ϵ)n and (1∕2 + ϵ)n, but with inner product less than 2ϵn.

Boolean complexity

There exists a corresponding framework for boolean circuits (as opposed to circuits with XOR gates only). Rigid matrixes informally correspond to depth-3 Or-And-Or circuits. If this circuit has fan-in fo at the output gate and fan-in fi at each input gate, then the correspondence in parameters is

rank = log fo
sparsity = 2fi .

More precisely, we have the following lemma.

Lemma 3. Let C : {0, 1}n →{0, 1}n be a boolean circuit. If you can remove e edges and reduce the depth to d then you can write C as an Or-And-Or circuit with output fan-in 2e and input fan-in 2d.

Proof: After you remove the edges, each output bit and each removed edge depends on at most 2d input bits or removed edges. The output Or gate of the depth-3 circuit is a big Or over all 2e assignments of values for the removed edges. Then we need to check consistency. Each consistency check just depends on 2d inputs and so can be written as a depth-2 circuit with fan-in 2d. QED

The available bounds are of the form log fo = n∕fi. For example, for input fan-in fi = nα we have lower bounds exponential in n1-α but not more. Again it can be shown that breaking this tradeoff in certain regimes (namely, log 2fo = O(n∕ log log n)) yields lower bounds against linear-size log-depth circuits. (A proof appears in [Vio09].) It was also pointed out in [Vio13] that breaking this tradeoff in any regime yields lower bounds for branching programs. See also the previous post.

One may ask how pairwise independent hash functions relate to this challenge. Ishai, Kushilevitz, Ostrovsky, and Sahai showed [IKOS08] that they can be computed by linear-size log-depth circuits. Again this gives a non-trivial upper bound for depth-3 circuits via these connections, and one can ask for more. In [GHK+13] we give constructions of such circuits which in combination with Lemma 3 can again be used to almost match the available trade-offs.

The bottom line of this post is that we can’t prove lower bounds because they are false, and it is a puzzle to me why some people appear confident that P is different from NP.


[AW16]    Josh Alman and Ryan Williams. Probabilistic rank and matrix rigidity, 2016. https://arxiv.org/abs/1611.05558.

[GHK+13]   Anna Gál, Kristoffer Arnsfelt Hansen, Michal Koucký, Pavel Pudlák, and Emanuele Viola. Tight bounds on computing error-correcting codes by bounded-depth circuits with arbitrary gates. IEEE Transactions on Information Theory, 59(10):6611–6627, 2013.

[IKOS08]    Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography with constant computational overhead. In 40th ACM Symp. on the Theory of Computing (STOC), pages 433–442, 2008.

[Spi95]    Daniel Spielman. Computationally Efficient Error-Correcting Codes and Holographic Proofs. PhD thesis, Massachusetts Institute of Technology, 1995.

[SV12]    Rocco A. Servedio and Emanuele Viola. On a special case of rigidity. Available at http://www.ccs.neu.edu/home/viola/, 2012.

[Val77]    Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. In 6th Symposium on Mathematical Foundations of Computer Science, volume 53 of Lecture Notes in Computer Science, pages 162–176. Springer, 1977.

[Vio09]    Emanuele Viola. On the power of small-depth computation. Foundations and Trends in Theoretical Computer Science, 5(1):1–72, 2009.

[Vio13]    Emanuele Viola. Challenges in computational lower bounds. Available at http://www.ccs.neu.edu/home/viola/, 2013.

Restricted models


Map 1



Map 2


To understand Life, what should you study?

a. People’s dreams.

b. The AMPK gene of the fruit fly.

Studying restricted computational models corresponds to b. Just like microbes constitute a wealth of open problems whose solutions are sometimes far-reaching, so restricted computational models present a number of challenges whose study is significant. For one example, Valiant’s study of arithmetic lower bounds boosted the study of superconcentrators, an influential type of graphs closely related to expanders.

The maps above, taken from here, include a number of challenges together with their relationships. Arrows go towards special cases (which are presumably easier). As written in the manuscript, my main aim was to put these challenges in perspective, and to present some connections which do not seem widely known. Indeed, one specific reason why I drew the first map was the realization that an open problem that I spent some time working on can actually be solved immediately by combining known results. The problem was to show that multiparty (number-on-forehead) communication lower bounds imply correlation bounds for polynomials over GF(2). The classic work by Hastad and Goldman does show that k-party protocols can simulate polynomials of degree k-1, and so obviously that correlation bounds for k-party protocols imply the same bounds for polynomials of degree k-1. But what I wanted was a connection with worst-case communication lower bounds, to show that correlation bounds for polynomials (survey) are a prerequisite even for that.

As it turns out, and as the arrows from (1.5) to (1.2) in the first map show, this is indeed true when k is polylogarithmic. So, if you have been trying to prove multiparty lower bounds for polylogarithmic k, you may want to try correlation bounds first. (This connection is for proving correlation bounds under some distribution, not necessarily uniform.)

Another reason why I drew the first map was to highlight a certain type of correlation bound (1.3), discussed in this paper with Razborov. It is a favorite example of mine of a seemingly very basic open problem that is, again, a roadblock for much of what we’d like to know. The problem is to obtain correlation bounds against polynomials that are real valued, with the convention that whenever the polynomial does not output a boolean value we count that as a mistake, thus making the goal of proving a lower bound presumably easier. Amazingly, the following is still open:

Prove that the correlation of the parity function on n bits is at most 1/n with any real polynomial of degree log(n).

To be precise, correlation is defined as the probability that the polynomial correctly computes parity, minus 1/2. For example, the constant polynomial 1 has correlation zero with parity — it gets it right half the times. Whereas the polynomial x1+x2+…+xn does a lot worse, it has negative correlation with parity or in fact any boolean function, just because it is unlikely that its output is in {0,1}.

What we do in the paper, in my opinion, is to begin to formalize the intuition that these polynomials cannot do much. We show that the correlation with parity is zero (not very small, but actually zero) as long as the polynomial has degree 0.001 loglog(n). This is different from the more familiar models of polynomials modulo m or sign polynomials, because those can achieve non-zero correlation even with constant degree.

On the other hand, with a simple construction, we can obtain non-zero correlation with polynomials of degree O(sqrt(n)). Note the huge gap with the 0.001 loglog(n) lower bound.

Question: what is the largest degree for which the correlation is zero?

The second map gives another slice of open problems. It highlights how superlinear-length lower bounds for branching programs are necessary for several notorious circuit lower bounds.

A third map was scheduled to include Valiant’s long-standing rigidity question and algebraic lower bounds. In the end it was dropped because it required a lot of definitions while I knew of very few arrows. But one problem that was meant to be there is a special case of the rigidity question from this work with Servedio. The question is basically a variant of the above question of real polynomials, where instead of considering low-degree polynomials we consider sparse polynomials. What may not be immediately evident, although in hindsight it is technically immediate, is that this problem is indeed a special case of the rigidity question. The question is to improve on the rigidity bounds in this special case.

In the paper we prove some variant that does not seem to be known in the rigidity world, but what I want to focus on right now is an application that such bounds would have, if established for the Inner Product function modulo 2 (IP). They would imply that IP cannot be computed by polynomial-size AC0-Parity circuits, i.e., AC0 circuits which take as input a layer of parity gates that’s connected to the input. It seems ridiculous that IP can be computed by such circuits, of course. It is easy to handle Or-And-Parity circuits, but circuits of higher depth have resisted attacks.

The question was reasked by Akavia, Bogdanov, Guo, Kamath, and Rosen.

Cheraghchi, Grigorescu, Juba, Wimmer, and Xie have just obtained some lower bounds for this problem. For And-Or-And-Parity circuits they obtain almost quadratic; the bounds degrade for larger depth but stay polynomial. Their proof of the quadratic lower bound looks nice to me. Their first moves are relatively standard: first they reduce to an approximation question for Or-And-Parity circuits; then they fix half the variables of IP so that IP becomes a parity that is “far” from the parities that are input to the DNF. The more interesting step of the argument, in my opinion, comes at this point. They consider the random variable N that counts the number of And-parity gates that evaluate to one, and they observe that the distribution of several moments of this variable is the same in the case where the parity that comes from IP is zero or one. From this, they use approximation theory to argue about the probability that N will be zero in the two cases. They get that these probabilities are also quite close, as long as the circuit is not too large, which shows that the circuit is not correctly computing IP.

What is the P vs. NP problem? My two cents


I just had to convert a movie clip into a different format. The conversion took ten minutes. Given that the clip can be loaded into memory in one second, ten minutes is a long time. Could not this be done faster? In fact, why not one second?

Afterwards, I played video games. I have a playstation 3, but I heard that on the playstation 4 the games look better because with the faster processor the 3D scenes have more details. But do I really need a faster processor for those details? Could not my playstation 3 be programmed to run those games? In fact, could there be a way to have playstation 4 games run on a Commodore 64, or even… see the picture. There do exist hardware limitations of course, but these are not the bottleneck. A present-day 3D game engine on a Commodore 64 would be a stunning achievement, even if the resolution were a bit coarser and the cut scenes deleted.

These are two examples of a wide-open question that is central to theoretical computer science, and to a growing number of fields of science: Are there computational tasks that require a long time?

This may sound puzzling at first, since computers that take a long time are everyday experience, from the cases mentioned above to every time the mouse pointer turns into a sand clock for a ridiculous amount of time, including just now when I booted my computer so that I could continue this post. But the point is that nobody knows whether computers could be programmed to run much faster.

To be sure, there does exist one problem that is known to take a long time. I call this the program-enumeration problem, and is as follows: consider a computer program that goes through every possible program of length n, runs each for n steps, and does something different from each. For example the program could output the first number that is not output by any program of size n within n steps. By construction, this strange task cannot be performed in time n. On the other hand, it can be done in time exponential in n. (Think n = 10000000000000000, which is roughly how many instructions a modern computer can do in a year.)

A crude, pessimistic summary of our understanding of the limitations of computation is that this is it: all we know is that the program-enumeration problem cannot be solved in time n. This single result does have many applications, for example it can be used to show that there is no algorithm for checking the correctness of programs, somewhat justifying the immense industry devoted to that. But the result is very unsatisfactory because most problems that we face, including those mentioned above, have nothing to do with program enumeration. It also feels unsatisfactory because it does not give any information on computation beyond the simple fact that programs can run other programs.

The wide-open question can now be reformulated more precisely as follows.

Grand question: Is there any computational task, besides program enumeration, that requires a long time?

A negative answer, meaning that computers are all-powerful, would have dramatic consequences, well beyond the above examples. What would you give for a computer that executes any task with no perceptible delay? What would you give to play today the game engines of the next fifty years? Certain companies would see a dramatic increase in profits. And, what I find the most interesting application, scientists would be able to solve very complex models, pushing way beyond current knowledge. Let me elaborate on this last point. Interestingly, the situation in several branches of science is somewhat similar to theoretical computer science. Specifically, scientists have identified a number of features which are desirable in a model of whatever it is that they are studying. However, nobody is able to solve models with these feature, except in toy cases, because known programs are too slow.

That computers are all-powerful sounds too good to be true, and in fact the common belief is that there do exist many problems that take a long time. This also has many desirable applications: the security of many everyday electronic systems relies on this belief for specific problems, such as factoring numbers. But we cannot be completely confident of the security until the belief is proved.

The P vs. NP problem is a young, prominent special case of the grand challenge. I think it should be presented as such, which is not always done. The problem asks whether a specific class of problems requires at least a specific amount of time to solve.

P stands for the computational tasks that can be done somewhat efficiently. The letter P is short for “polynomial time:” the time required to solve the problem must scale with a polynomial in the size of the problem. For example, if you have a program that converts movies of size n in time quadratic in n, that would be efficient, and the conversion task would be in P. This is of course a theoretical approximation which does not necessarily guarantee efficiency in practice. But this approximation is convenient, and meaningful when compared to seemingly exponential-time tasks such as program enumeration.

I don’t like the terminology “polynomial time.” Calling something a polynomial emphasizes that that something is made of many monomials. However, the only monomial that is relevant in the definition of P is the one with the highest power. I thought of “power time.” I find it more to the point, and simpler, while preserving the initial.

NP is what you can solve in power time on a non-deterministic computer. This class captures many problems that people care about, but that are not known to be in P. To keep with the spirit of the post, I’ll mention that among these problems are appropriate generalizations of Tetris, Lemmings, and Super Mario.

The P vs. NP problem can also be presented as the difference between computing a solution (P) and checking it (NP). For example, if someone can solve a tough Tetris level, it is easy for you to check that. By contrast, if someone can solve the program-enumeration problem, it is not clear how you would check that. Indeed, that problem is not believed to be in NP.

I have mixed feelings about this presentation. I do find it catchy, but I don’t like that it suggests to me that to check solutions we will use the same variety of algorithms that we use in computing them. Actually, solutions can be encoded so that the verification is extremely simple, as is known since the original formulations. On the other hand, the length of the solution does increase for this encoding, and a variety of algorithms may be recovered from the encoding.

Do I think that P is different from NP? I feel that I have no idea. In the few years that I have been following this research I have already seen several efficient algorithms for tasks that at first sight looked impossible, and in some cases were conjectured to be. Here are two:

1. Suppose your computer has no memory (i.e., it only has a constant number of registers and the program counter). Can you determine in power time if a.b > c, for integers a,b, and c with many digits? This is actually possible, by Barrington’s theorem.

2. Is there a pseudorandom generator where each output bit depends on only five input bits? This is also believed to be possible, by a result of Applebaum, Ishai, and Kushilevitz.

Both relate to restricted computational models, which is not that surprising, since we do not understand general computation. However, by analogy, should there not be similar surprises for P? Or will the only surprises in computational classes be of the type that something in P is shown to be doable even in a more restricted computational model?