An interview

The Italian newspaper Il fatto quotidiano just published online an interview with me, part of a series about Italian expats. You can read it in English by pasting it into Google Translate. Please do not take every sentence, including the opening, literally. Besides what is lost in translation, some thoughts have been taken out of context, without objection from me, I think to make the narrative more gripping.

The main difference? “That in America you buy your degree. In Italy you have to earn it.” Emanuele Viola left Italy in 2001, during his doctorate at the Sapienza University of Rome. “I gave up a scholarship to do a PhD at Harvard,” he recalls. “Then I moved to Princeton, Columbia and Boston.” Today he is a professor of theoretical computer science at Northeastern University in Boston. Will he return? “Yes, I hope to come back one day.”

Emanuele was born in Rome in 1977. At 14, he programmed the video game Nathan Never, followed by Black Viper. At 24, he moved to the United States for a doctorate in computer science at Harvard University, followed by a postdoc at the Institute for Advanced Study in Princeton and one at Columbia University. “Then I became a professor at Northeastern University in Boston, where I received my professorship a few years ago.”

The typical day varies with the academic workload. “Personally, I work better if I spend a lot of time at home in almost complete isolation,” Emanuele explains. “If I do not have to teach, I am usually in front of a blank sheet trying to solve some problems, until finally it’s time for my walk in the woods, so at least in one thing I can feel close to Einstein and Darwin,” he smiles. “I go to the university a few days a week to teach or to attend various meetings, but I often connect via Skype.”

He misses Italy a lot, has less and less time to visit it, and the difference with the American academic world is drastic: “American universities are run like companies in competition with each other, constantly looking for more money, better teachers and better students. Here, once you have been admitted, it is almost as if you already had the degree in your pocket. It is not like that in Italy: of the 200 students in my course,” he recalls, “I was the only one who graduated in five years, that is, without falling behind schedule.”

For Emanuele, then, the problems of Italian academia and research are not just about funding. Quite the opposite. “A hundred years ago, it was typical for an American scholar to spend a period of training in Europe,” Emanuele continues. “In a few generations, the situation has exactly reversed.” In this sense Italy’s problem is also that of the rest of Europe and other parts of the world. “America has amassed so many brilliant minds from all over the world that it is very difficult for another nation to be competitive, regardless of funding. Indeed, European Community funds are substantial and competitive. Right now in America there are not many funds,” he specifies, “especially for theory.”

The situation is reversed for the doctorate. “Here it does not have a fixed duration: unless you are thrown out, you finish when you have competitive publications, so it can take you even six or seven years. In Italy, the pre-established duration is three years, a length of time absolutely insufficient to produce competitive publications.” This difference is also due to the fact that in the United States the student’s salary comes from the advisor, while in Italy it comes mainly from a government grant.

When it comes to education, in short, the story changes. “Personally, I consider the education I received almost free of charge at Sapienza much more solid than the typical American preparation. This, however, reverses completely for advanced studies. Here there are more opportunities for deserving students. In Italy there is very little research in my field.”

The most beautiful memories? The rare moments when the clear sensation of having solved a mathematical problem arrives. “It happened to me once while rolling on my ball and three times while walking through cemeteries,” he smiles. The goal for Emanuele is to return to Italy, even if with his family in America it is not easy. “For some time I have been planning a sabbatical year in Italy. I hope to establish contacts there, and perhaps one day not too far away they will translate into a return.”

The environment in a private university where tuition exceeds 50 thousand dollars a year is completely different from “what I remember from my student days.” Yet Emanuele is keen to point one thing out: “No, I do not want to give the impression that money makes the big difference. The fact is that America has succeeded in attracting the best minds from all over the world,” he concludes. “No other country has succeeded in doing that.”


Nonclassical polynomials and exact computation of Boolean functions

Guest post by Abhishek Bhrushundi.

I would like to thank Emanuele for giving me the opportunity to write a guest post here. I recently stumbled upon an old post on this blog which discussed two papers: Nonclassical polynomials as a barrier to polynomial lower bounds by Bhowmick and Lovett, and Anti-concentration for random polynomials by Nguyen and Vu. Towards the end of the post, Emanuele writes:

“Having discussed these two papers in a sequence, a natural question is whether non-classical polynomials help for exact computation as considered in the second paper. In fact, this question is asked in the paper by Bhowmick and Lovett, who conjecture that the answer is negative: for exact computation, non-classical polynomials should not do better than classical.”

In a joint work with Prahladh Harsha and Srikanth Srinivasan from last year, On polynomial approximations over \mathbb {Z}/2^k\mathbb {Z}, we study exact computation of Boolean functions by nonclassical polynomials. In particular, one of our results disproves the aforementioned conjecture of Bhowmick and Lovett by giving an example of a Boolean function for which low degree nonclassical polynomials end up doing better than classical polynomials of the same degree in the case of exact computation.

The counterexample we propose is the elementary symmetric polynomial of degree 16 in \mathbb {F}_2[x_1, \ldots , x_n] (such elementary symmetric polynomials also serve as counterexamples to the inverse conjecture for the Gowers norm [LMS11, GT07], and this was indeed the reason why we picked these functions as candidate counterexamples):

\begin{aligned}S_{16}(x_1, \ldots , x_n) = \left (\sum _{S\subseteq [n],|S| = 16} \prod _{i \in S}x_i\right )\textrm { mod 2} = {|x| \choose 16} \textrm { mod 2},\end{aligned}

where |x| = \sum _{i=1}^n x_i is the Hamming weight of x. One can verify (using, for example, Lucas’s theorem) that S_{16}(x_1, \ldots , x_n) = 1 if and only if the 5^{th} least significant bit of |x| is 1.
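
For concreteness, here is a quick numerical check of this claim, a minimal Python sketch (the variable names are mine): by Lucas’s theorem, {w \choose 16} is odd exactly when the binary representation of 16, a single 1 in position 4, is a submask of the binary representation of w = |x|.

from math import comb

# Check that C(w, 16) mod 2 equals bit 4 of w (the 5th least significant bit),
# as Lucas's theorem predicts: C(w, 16) is odd iff the single set bit of
# 16 = 10000 in binary also appears in w.
for w in range(1024):
    lucas_bit = 1 if (w & 16) == 16 else 0
    assert comb(w, 16) % 2 == lucas_bit == (w >> 4) & 1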

We use the fact that no polynomial of degree at most 15 can compute S_{16}(x) correctly on noticeably more than half of the points in \{0,1\}^n.

Theorem 1. Let P be a polynomial of degree at most 15 in \mathbb {F}_2[x_1, \ldots , x_n]. Then

\begin{aligned}\Pr _{x \sim \{0,1\}^n}[P(x) = S_{16}(x)] \le \frac {1}{2} + o(1).\end{aligned}

[Emanuele’s note. Let me take advantage of this for a historical remark. Green and Tao first claimed this fact and sent me and several others a complicated proof. Then I pointed out the paper by Alon and Beigel [AB01]. Soon after they and I independently discovered the short proof reported in [GT07].]

For any Boolean function, one of the two constant functions (degree-0 polynomials) computes it on at least half of the points in \{0,1\}^n, and this result shows that, as far as S_{16}(x_1, \ldots , x_n) is concerned, even polynomials of much higher degree do no better. What we prove is that there is a nonclassical polynomial of degree 14 that computes S_{16}(x_1, \ldots , x_n) on a 9/16 \ge 1/2 + \Omega (1) fraction of the points in \{0,1\}^n.

Theorem 2. There is a nonclassical polynomial P of degree 14 such that

\begin{aligned}\Pr _{x \sim \{0,1\}^n}[P(x) = S_{16}(x)] = \frac {9}{16} - o(1).\end{aligned}

A nonclassical polynomial takes values on the torus \mathbb {T} = \mathbb {R}/\mathbb {Z}, and in order to compare the output of a Boolean function (i.e., a classical polynomial) to that of a nonclassical polynomial it is convenient to take the range of Boolean functions to be \{0,1/2\} \subset \mathbb {T}. So, for example, S_{16}(x_1, \ldots , x_n) = \frac {1}{2} if |x|_4 = 1, and S_{16}(x_1, \ldots , x_n) = 0 otherwise. Here |x|_4 denotes the 5^{th} least significant bit of |x|.

The nonclassical polynomial we exhibit, which computes S_{16}(x) on 9/16 of the points in \{0,1\}^n, is

\begin{aligned}P(x_1, \ldots , x_n) = \frac {\sum _{S \subseteq [n], |S|=12} \prod _{i \in S}x_i}{8} \textrm { mod 1}= \frac {{|x| \choose 12}}{8} \textrm { mod 1} .\end{aligned}

The degree of this nonclassical polynomial is 14, but I won’t go into much detail as to why this is the case (see [BL15] for a primer on the notion of degree in the nonclassical world).

Understanding how P(x) behaves comes down to figuring out the largest power of two that divides {|x| \choose 12} for a given x: if the largest power of 2 dividing {|x| \choose 12} is 2^2 then P(x) = 1/2, while if it is 2^3 or higher then P(x) = 0 (in all other cases P(x) lies outside \{0, 1/2\} and so cannot equal S_{16}(x)). Fortunately, there is a generalization of Lucas’s theorem, known as Kummer’s theorem, that characterizes this:

Theorem 3 (Kummer’s theorem). The largest power of 2 dividing {a \choose b} for a,b \in \mathbb {N}, a \ge b, is equal to the number of borrows required when subtracting b from a in base 2.
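
As a sanity check, here is a small Python sketch (the helpers v2 and borrows are my names) verifying the theorem for 2-adic valuations; it uses the standard identity that the number of borrows when subtracting b from a in base 2 equals s(b) + s(a-b) - s(a), where s counts the ones in binary.

from math import comb

def v2(n):
    # 2-adic valuation of n > 0: exponent of the largest power of 2 dividing n.
    return (n & -n).bit_length() - 1

def borrows(a, b):
    # Borrows when subtracting b from a in base 2, via the digit-sum identity.
    return bin(b).count("1") + bin(a - b).count("1") - bin(a).count("1")

for a in range(1, 300):
    for b in range(a + 1):
        assert v2(comb(a, b)) == borrows(a, b)
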
Equipped with Kummer’s theorem, it doesn’t take much work to arrive at the following conclusion.

Lemma 4. P(x) = S_{16}(x) if either |x|_{2} = 0 or (|x|_2, |x|_3, |x|_4, |x|_5) = (1,0,0,0), where |x|_i denotes the (i+1)^{th} least significant bit of |x|.

If x = (x_1, \ldots , x_n) is uniformly distributed in \{0,1\}^n then it’s not hard to verify that the bits |x|_0, \ldots , |x|_5 are almost uniformly and independently distributed in \{0,1\}, and so the above lemma proves that P(x) computes S_{16}(x) on 9/16 of the points in \{0,1\}^n. It turns out that one can easily generalize the above argument to show that S_{2^\ell }(x) is a counterexample to Bhowmick and Lovett’s conjecture for every \ell \ge 4.
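
For the skeptical reader, here is a short Python computation, a sketch under the torus convention above (Boolean 1 encoded as 1/2): since both P and S_{16} depend only on the Hamming weight w = |x|, the exact agreement probability is a weighted sum over w, and already for n = 1000 it comes out close to 9/16 = 0.5625.

from math import comb

n = 1000
agree = 0
for w in range(n + 1):
    s16 = (w >> 4) & 1       # S_16(x) as a bit; on the torus its value is s16/2
    p = comb(w, 12) % 8      # numerator of P(x) = p/8 mod 1
    if p == 4 * s16:         # p/8 equals s16/2 exactly when p = 4*s16
        agree += comb(n, w)  # number of inputs x of Hamming weight w
print(agree / 2**n)          # close to 9/16 = 0.5625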

We also show in our paper that it is not the case that nonclassical polynomials always do better than classical polynomials in the case of exact computation — for the majority function, nonclassical polynomials do as badly as their classical counterparts (this was also conjectured by Bhowmick and Lovett in the same work), and the Razborov-Smolensky bound for classical polynomials extends to nonclassical polynomials.

We started out trying to prove that S_4(x_1, \ldots , x_n) is a counterexample but couldn’t. It would be interesting to check if it is one.

References

[AB01]    N. Alon and R. Beigel. Lower bounds for approximations by low degree polynomials over Z_m. In Proceedings 16th Annual IEEE Conference on Computational Complexity, pages 184–187, 2001.

[BL15]    Abhishek Bhowmick and Shachar Lovett. Nonclassical polynomials as a barrier to polynomial lower bounds. In Proceedings of the 30th Conference on Computational Complexity, pages 72–87, 2015.

[GT07]    B. Green and T. Tao. The distribution of polynomials over finite fields, with applications to the Gowers norms. ArXiv e-prints, November 2007.

[LMS11]   Shachar Lovett, Roy Meshulam, and Alex Samorodnitsky. Inverse conjecture for the Gowers norm is false. Theory of Computing, 7(9):131–145, 2011.

Entropy polarization

Sometimes you see quantum popping up everywhere. I just did the opposite and gave a classical talk at a quantum workshop, part of an AMS meeting held at Northeastern University, which poured yet another avalanche of talks onto the Boston area. I spoke about the complexity of distributions, also featured in an earlier post, including a result I posted two weeks ago which gives a Boolean function f:\{0,1\}^{n}\to \{0,1\} such that the output distribution of any AC^{0} circuit has statistical distance at least 1/2-1/n^{\omega (1)} from (Y,f(Y)) for uniform Y\in \{0,1\}^{n}. In particular, no AC^{0} circuit can compute f much better than guessing at random, even if the circuit is allowed to sample the input itself. The slides for the talk are here.

The new technique that enables this result is one I call entropy polarization. Basically, for every AC^{0} circuit mapping any number L of bits into n bits, there exists a small set S of restrictions such that:

(1) the restrictions preserve the output distribution, and

(2) for every restriction r\in S, the output distribution of the circuit restricted to r has min-entropy either 0 or at least n^{0.9}. Whence polarization: the entropy will become either very small or very large.

Such a result is useless and trivial to prove with |S|=2^{n}; the critical feature is that one can obtain a much smaller S of size 2^{n-n^{\Omega (1)}}.

Entropy polarization can be used in conjunction with a previous technique of mine that works for high min-entropy distributions to obtain the said sampling lower bound.

It would be interesting to see if any of this machinery can yield a separation between quantum and classical sampling for constant-depth circuits, which is probably a reason why I was invited to give this talk.

Child Care at STOC 2018

The organizers asked me to advertise this and I sympathize:

We are pleased to announce that we will provide pooled, subsidized child care at STOC 2018. The cost will be $40 per day per child for regular conference attendees, and $20 per day per child for students.
For more detailed information, including how to register for STOC 2018 childcare, see http://acm-stoc.org/stoc2018/childcare.html

Ilias Diakonikolas and David Kempe (local arrangements chairs)

Hardness amplification proofs require majority… and 15 years

Aryeh Grinberg, Ronen Shaltiel, and I have just posted a paper which proves conjectures I made 15 years ago (historians may want to consult the last paragraph of [2] and my Ph.D. thesis).

At that time, I was studying hardness amplification, a cool technique to take a function f:\{0,1\}^{k}\to \{0,1\} that is somewhat hard on average and transform it into another function f':\{0,1\}^{n}\to \{0,1\} that is much harder on average. If you call a function \delta -hard when every circuit in the class under consideration errs on at least a \delta fraction of the inputs, you can start e.g. with f that is 0.1-hard and obtain f' that is 1/2-1/n^{100}-hard, or more. This is very important because functions with the latter hardness imply pseudorandom generators via Nisan’s design technique, and also “additional” lower bounds using the “discriminator lemma.”

The simplest and most famous technique is Yao’s XOR lemma, where

\begin{aligned} f'(x_{1},x_{2},\ldots ,x_{t}):=f(x_{1})\oplus f(x_{2})\oplus \ldots \oplus f(x_{t}) \end{aligned}

and the advantage over random guessing in computing f' decays exponentially with t. (So to achieve the parameters above it suffices to take t=O(\log k).)
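
To see the flavor of this decay (though not, of course, the computational content of the lemma), here is a small Python sketch of the information-theoretic analogue, with toy parameters of my choosing: if a predictor errs on each copy f(x_i) independently with probability \delta, then by the piling-up lemma its guess for the XOR is correct with probability exactly 1/2+(1-2\delta)^{t}/2, an advantage that vanishes exponentially in t.

import random

delta, trials = 0.1, 100000
for t in (1, 2, 4, 8, 16):
    # The XOR guess is correct iff an even number of the t copies err.
    correct = sum(
        sum(random.random() < delta for _ in range(t)) % 2 == 0
        for _ in range(trials)
    )
    predicted = 0.5 + (1 - 2 * delta) ** t / 2
    print(t, correct / trials, predicted)  # empirical vs. exact agreement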

At the same time I was also interested in circuit lower bounds, so it was natural to try to use this technique for classes for which we do have lower bounds. So I tried, and… oops, it does not work! In all known techniques, the reduction circuit cannot be implemented in a class smaller than TC^{0}, a class for which we don’t have lower bounds and for which we think it will be hard to get them, in part because of the natural proofs barrier.

Eventually, I conjectured that this is inherent, namely that you can take any hardness amplification reduction, or proof, and use it to compute majority. To be clear, this conjecture applied to black-box proofs: decoding arguments which take anything that computes f' too well and turn it into something which computes f too well. There were several partial results, but they all had to restrict the proof further, and did not capture all available techniques.

If you had any hope that black-box proofs might do the job, this paper dashes it: we prove the full conjecture (improving on a number of incomparable works in the literature, including a 10-year-anniversary work by Shaltiel and myself which proved the conjecture for non-adaptive proofs).

Indistinguishability

One thing that comes up in the proof is the following basic problem. You have a distribution X on n bits that has large entropy, very close to n. A classic result shows that most bits of X are close to uniform. We needed an adaptive version of this, showing that a decision tree making few queries cannot distinguish X from uniform, as long as the tree does not query a certain small forbidden set of variables. This also follows from recent and independent work of Or Meir and Avi Wigderson.
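
As a toy, non-adaptive illustration of that classic result (the parameters and names are mine, and this does not capture adaptivity or the forbidden set): take X uniform on \{0,1\}^{n} minus a handful of points, so that its min-entropy is very close to n, and check that the marginal of X on every small set of coordinates is statistically close to uniform.

import random
from itertools import combinations, product

n, q = 12, 3
removed = set(random.sample(range(2 ** n), 8))    # drop 8 of the 4096 points
support = [x for x in range(2 ** n) if x not in removed]
worst = 0.0
for coords in combinations(range(n), q):
    counts = {}
    for x in support:
        key = tuple((x >> c) & 1 for c in coords)
        counts[key] = counts.get(key, 0) + 1
    # statistical distance between the q-bit marginal and uniform
    sd = 0.5 * sum(abs(counts.get(k, 0) / len(support) - 2 ** -q)
                   for k in product((0, 1), repeat=q))
    worst = max(worst, sd)
print(worst)    # small: every 3-bit marginal is close to uniform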

Turns out this natural extension is not enough for us. In a nutshell, it is difficult to understand what queries an arbitrary reduction is making, and so it is hard to guarantee that the reduction does not query the forbidden set. So we prove a variant, where the variables are not forbidden, but are fixed. Basically, you condition on some fixing X_{B}=v of few variables, and then the resulting distribution X|X_{B}=v is indistinguishable from the distribution U|U_{B}=v where U is uniform. Now the queries are not forbidden but have a fixed answer, and this makes things much easier. (Incidentally, you can’t get this simply by fixing the forbidden set.)

Fine, so what?

One great question remains. Can you think of a counter-example to the XOR lemma for a class such as constant-depth circuits with parity gates?

But there is a further reason why I am interested in this. Proving 1/2-1/n average-case hardness results for restricted classes “just” beyond AC^{0} is more than a long-standing open question in lower bounds: it is necessary even for worst-case lower bounds, both in circuit and communication complexity, as we discussed earlier. And here’s hardness amplification, which intuitively should provide exactly such hardness results. It has been given many different proofs, see e.g. [1]. However, as we just saw, none of them can be applied. I don’t know, someone taking results at face value may even start thinking that such average-case hardness results are actually false.

References

[1]   Oded Goldreich, Noam Nisan, and Avi Wigderson. On Yao’s XOR lemma. Technical Report TR95-050, Electronic Colloquium on Computational Complexity, March 1995. http://www.eccc.uni-trier.de/.

[2]   Emanuele Viola. The complexity of constructing pseudorandom generators from hard functions. Computational Complexity, 13(3-4):147–188, 2004.