NEWTON MUST NOT BECOME THE HUB OF MARIJUANA

If you are a resident of Newton, MA, sign this petition.

In 2016 Massachusetts voters voted to legalize marijuana. Except they didn’t know what they were voting for! In Colorado and Washington, the questions of legalization and commercialization were completely separate. The marijuana industry apparently learned from that and rigged the Massachusetts ballot question so that a voter legalizing marijuana would also be mandating communities to open marijuana stores. For Newton, MA, this means at least 8 stores. When voters were recently polled, it became clear that the vast majority did not know that this was at stake, and that the majority of them in fact do not want to open marijuana stores in their communities. For example, when I voted I didn’t know that this was at stake. Read the official Massachusetts document meant to inform voters, especially the summary on pages 12-13. There is no hint that a community would be mandated by state law to open marijuana stores unless it goes through an additional legislative crusade. Instead it says that communities can choose. I think I even read the summary back then.

Now, to avoid opening stores in Newton, MA, we need a new ballot question. The City Council could easily have put this question on the ballot, but a few days ago it decided, by a vote of 13 to 8, that it would not. You can find the list of names of councilors and how they voted here.

Note that the council was not deciding whether or not to open stores, it was just deciding whether or not we should have a question about this on the ballot.

Instead now we are stuck doing things the hard way. To put this question on the ballot, we need to collect 6000 signatures, or 9000 if the city is completely uncooperative, a possibility which now unfortunately cannot be dismissed.

However we must do it, for the alternative is too awful. Most of the surrounding towns (Wellesley, Weston, Needham, Dedham, etc.) have already opted out. So if Newton opens stores, it basically becomes the hub for west suburban marijuana users, at least some of whom would drive under the influence of marijuana (conveniently undetectable). Proposed store locations include sites on the way to elementary schools, and there is an amusing proposal to open a marijuana store in a prime Newton Center location, after Peet’s Coffee moves out (they lost the bid for renewal of the lease). The owners of the space admit that people have asked them for a small grocery store instead, but they think that a marijuana store would bring more traffic and business to Newton Center. I told them to open a gym instead. That too would bring traffic and business, but in addition it would have other benefits that cannabis does not have.


l2w

This is the post about l2w version 1.0, a LaTeX to WordPress converter painstakingly put together by me with big help from the LaTeX community. Click here to download it. Below is an example of what you can do, taken at random from my class notes which were compiled with this script. I also used this in conjunction with LyX for several posts such as I believe P=NP, so you can also call this a LyX to WordPress converter: I just export to LaTeX and then run l2w.

This might work out of the box. In more detail, it needs tex4ht (which is included e.g. in MiKTeX distributions) and Perl (the script only uses minimalistic, shell-level Perl commands). Simply unzip l2w.zip, which contains four files. The file post.tex is this document, which you can edit. To compile, run l2w.bat (which calls myConfig5.cfg). This will create the output post.html, which you can copy and paste into the WordPress HTML editor. I have tested it on an old Windows XP machine, and on a more recent Windows 7 machine with MiKTeX 2.9. I haven’t tested it on Linux, which might require some simple changes to l2w.bat. For LyX I add certain commands to the preamble, and as an example the .lyx source of the post I believe P=NP is included in the zip archive.

The non-math source is compiled using full-fledged LaTeX, which means you can use your own macros and bibliography. The math source is not compiled, but more or less left as is for wordpress, which has its own LaTeX interpreter. This means that you can’t use your own macros in math mode. For the same reason, label and ref of equations are a problem. To make them work, the script fetches their values from the .aux file and then crudely applies them. This is a hack with a rather unreadable script; however, it works for me. One catch: your labels should start with eq:.
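
For the curious, here is a minimal sketch of the label-fetching idea, in Python rather than the Perl actually used by l2w; it assumes the .aux file contains standard \newlabel{eq:...}{{number}{page}} entries and that references appear as \ref{eq:...} or \eqref{eq:...} in the text.

```python
import re

def fetch_labels(aux_path):
    """Collect equation numbers from \\newlabel{eq:...}{{<number>}{<page>}} lines of the .aux file."""
    labels = {}
    with open(aux_path) as aux:
        for line in aux:
            m = re.match(r'\\newlabel\{(eq:[^}]*)\}\{\{([^}]*)\}', line)
            if m:
                labels[m.group(1)] = m.group(2)
    return labels

def resolve_refs(text, labels):
    """Crudely replace \\eqref{eq:...} and \\ref{eq:...} by the numbers found in the .aux file."""
    text = re.sub(r'\\eqref\{(eq:[^}]*)\}', lambda m: '(' + labels.get(m.group(1), '??') + ')', text)
    text = re.sub(r'\\ref\{(eq:[^}]*)\}', lambda m: labels.get(m.group(1), '??'), text)
    return text
```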

I hope this will spare you the enormous amount of time it took me to arrive to this solution. Let me know if you use it!

1 Example of what you can do

First, some of the problematic math references:

\begin{aligned} x = 2 ~~~~(1) \end{aligned}

Equation (1).

Next, some weird font stuff: \mathbb {A}, \mathrm {A}, \text {A}.

Lemma 1. Suppose that distributions A^0, A^1 over \{0,1\}^{n_A} are k_A-wise indistinguishable distributions; and distributions B^0, B^1 over \{0,1\}^{n_B} are k_B-wise indistinguishable distributions. Define C^0, C^1 over \{0,1\}^{n_A \cdot n_B} as follows:

C^b: draw a sample x \in \{0,1\}^{n_A} from A^b, and replace each bit x_i by a sample of B^{x_i} (independently).

Then C^0 and C^1 are k_A \cdot k_B-wise indistinguishable.
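
As an illustration, here is a minimal Python sketch of the composition in the lemma; the samplers passed in are hypothetical stand-ins for whatever represents the four distributions.

```python
def sample_C(b, sample_A, sample_B0, sample_B1):
    """Sample from C^b: draw x from A^b, then replace each bit x_i
    by an independent sample from B^{x_i}, and concatenate the blocks."""
    x = sample_A(b)                                            # a string in {0,1}^{n_A}
    blocks = [sample_B1() if bit == 1 else sample_B0() for bit in x]
    return [bit for block in blocks for bit in block]          # a string in {0,1}^{n_A * n_B}

# Toy usage: A^b and B^b are point masses, just to exercise the plumbing.
sample_A = lambda b: [b, 1 - b]
sample_B0, sample_B1 = (lambda: [0, 0]), (lambda: [1, 1])
print(sample_C(0, sample_A, sample_B0, sample_B1))             # [0, 0, 1, 1]
```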

To finish the proof of the lower bound on the approximate degree of the AND-OR function, it remains to see that AND-OR can distinguish the distributions C^0 and C^1 well. For this, we begin by observing that we can assume without loss of generality that the distributions have disjoint supports.

Claim 2. For any function f, and for any k-wise indistinguishable distributions A^0 and A^1, if f can distinguish A^0 and A^1 with probability \epsilon then there are distributions B^0 and B^1 with the same properties (k-wise indistinguishability yet distinguishable by f) and also with disjoint supports. (By disjoint support we mean for any x either \Pr [B^0 = x] = 0 or \Pr [B^1 = x] = 0.)

Proof. Let distribution C be the “common part” of A^0 and A^1. That is to say, we define C such that \Pr [C = x] := \min \{\Pr [A^0 = x], \Pr [A^1 = x]\}, multiplied by the constant that normalizes C into a distribution.

Then we can write A^0 and A^1 as

\begin{aligned} A^0 &= pC + (1-p) B^0 \,,\\ A^1 &= pC + (1-p) B^1 \,, \end{aligned}

where p \in [0,1] and B^0, B^1 are two distributions. Clearly B^0 and B^1 have disjoint supports.

Then we have

\begin{aligned} \mathbb {E}[f(A^0)] - \mathbb {E}[f(A^1)] =&~p \mathbb {E}[f(C)] + (1-p) \mathbb {E}[f(B^0)] \notag \\ &- p \mathbb {E}[f(C)] - (1-p) \mathbb {E}[f(B^1)] \\ =&~(1-p) \big ( \mathbb {E}[f(B^0)] - \mathbb {E}[f(B^1)] \big ) \\ \leq &~\mathbb {E}[f(B^0)] - \mathbb {E}[f(B^1)] \,. \end{aligned}

Therefore if f can distinguish A^0 and A^1 with probability \epsilon then it can also distinguish B^0 and B^1 with probability at least \epsilon .

Similarly, for all S \neq \varnothing such that |S| \leq k, we have

\begin{aligned} 0 = \mathbb {E}[\chi _S(A^0)] - \mathbb {E}[\chi _S(A^1)] = (1-p) \big ( \mathbb {E}[\chi _S(B^0)] - \mathbb {E}[\chi _S(B^1)] \big ) = 0 \,. \end{aligned}

Hence, B^0 and B^1 are k-wise indistinguishable. \square
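
A small numerical sanity check of this decomposition, with the distributions represented as Python dictionaries mapping outcomes to probabilities (representation and function name are mine, not from the notes):

```python
def split_common_part(A0, A1):
    """Write A^b = p*C + (1-p)*B^b, where C is the 'common part' and B^0, B^1 have disjoint supports."""
    support = set(A0) | set(A1)
    common = {x: min(A0.get(x, 0), A1.get(x, 0)) for x in support}
    p = sum(common.values())
    if p == 1:                       # A^0 = A^1; the residual distributions are empty
        return p, dict(A0), {}, {}
    C = {x: v / p for x, v in common.items() if v > 0}
    B0 = {x: (A0.get(x, 0) - common[x]) / (1 - p) for x in support if A0.get(x, 0) > common[x]}
    B1 = {x: (A1.get(x, 0) - common[x]) / (1 - p) for x in support if A1.get(x, 0) > common[x]}
    return p, C, B0, B1

A0 = {'00': 0.5, '01': 0.5}
A1 = {'00': 0.25, '11': 0.75}
p, C, B0, B1 = split_common_part(A0, A1)
print(p, C, B0, B1)                  # 0.25 {'00': 1.0} {'00': 0.33.., '01': 0.66..} {'11': 1.0}
assert set(B0) & set(B1) == set()    # disjoint supports
```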

Equipped with the above lemma and claim, we can finally prove the following lower bound on the approximate degree of AND-OR.

Theorem 3. d_{1/3}(AND-OR) = \Omega (\sqrt {RN}).

Proof. Let A^0, A^1 be \Omega (\sqrt {R})-wise indistinguishable distributions for AND with advantage 0.99, i.e. \Pr [\mathrm {AND}(A^1) = 1] > \Pr [\mathrm {AND}(A^0) = 1] + 0.99. Let B^0, B^1 be \Omega (\sqrt {N})-wise indistinguishable distributions for OR with advantage 0.99. By the above claim, we can assume that A^0, A^1 have disjoint supports, and the same for B^0, B^1. Compose them by the lemma, getting \Omega (\sqrt {RN})-wise indistinguishable distributions C^0,C^1. We now show that AND-OR can distinguish C^0, C^1:

  • C^0: First sample from A^0. Since x = 1^R is the unique string with \mathrm {AND}(x)= 1 and \Pr [\mathrm {AND}(A^1) = 1] > 0.99, we have \Pr [A^1 = 1^R] > 0. Thus by disjointness of supports \Pr [A^0 = 1^R] = 0. Therefore a sample from A^0 always contains at least one bit “0”, and that bit is replaced with a sample from B^0. We have \Pr [B^0 = 0^N] \geq 0.99, and when B^0 = 0^N the corresponding OR is 0, hence AND-OR = 0.
  • C^1: First sample from A^1; we know that A^1 = 1^R with probability at least 0.99. Each bit “1” is replaced by a sample from B^1, and \Pr [B^1 = 0^N] = 0 by disjointness of supports, so every OR is 1 and AND-OR = 1.

Therefore we have d_{1/3}(AND-OR)= \Omega (\sqrt {RN}). \square

1.1 Lower Bound of d_{1/3}(SURJ)

In this subsection we discuss the approximate degree of the surjectivity function. This function is defined as follows.

Definition 4. The surjectivity function SURJ\colon \left (\{0,1\}^{\log R}\right )^N \to \{0,1\}, which takes input (x_1, \dots , x_N) where x_i \in [R] for all i, has value 1 if and only if \forall j \in [R], \exists i\colon x_i = j.
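
In code, the definition reads as follows (a direct transcription, with each x_i given as an integer in [R] = {1, ..., R}):

```python
def surj(xs, R):
    """SURJ(x_1, ..., x_N) = 1 iff every range element j in [R] is hit by some x_i."""
    return int(set(range(1, R + 1)) <= set(xs))

print(surj([1, 2, 2, 3], 3), surj([1, 3, 3, 3], 3))   # 1 0
```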

First, some history. Aaronson first proved that the approximate degree of SURJ and of other functions on n bits, including “the collision problem”, is n^{\Omega (1)}. This was motivated by an application in quantum computing. Before this result, even a lower bound of \omega (1) had not been known. Later Shi improved the lower bound to \Omega (n^{2/3}), see [AS04]. The instructor believes that the quantum framework may have blocked some people from studying this problem, though it may have very well attracted others. Recently Bun and Thaler [BT17] reproved the \Omega (n^{2/3}) lower bound in a quantum-free paper, introducing some different intuition. Soon after, together with Kothari, they proved [BKT17] that the approximate degree of SURJ is \Theta (n^{3/4}).

We shall now prove the \Omega (n^{3/4}) lower bound, though one piece is only sketched. Again we present some things in a different way from the papers.

For the proof, we consider the AND-OR function under the promise that the Hamming weight of the RN input bits is at most N. Call the approximate degree of AND-OR under this promise d_{1/3}^{\leq N}(AND-OR). Then we can prove the following theorems.

Theorem 5. d_{1/3}(SURJ) \geq d_{1/3}^{\leq N}(AND-OR).

Theorem 6. d_{1/3}^{\leq N}(AND-OR) \geq \Omega (N^{3/4}) for some suitable R = \Theta (N).

In our setting, we consider R = \Theta (N). Theorem 5 shows, surprisingly, that we can somehow “shrink” \Theta (N^2) bits of input into N\log N bits while maintaining the approximate degree of the function, under some promise. Without this promise, we just showed in the last subsection that the approximate degree of AND-OR is \Omega (N), instead of \Omega (N^{3/4}) as in Theorem 6.

Proof of Theorem 5. Define an N \times R matrix Y such that the 0/1 variable y_{ij} is the entry in the i-th row and j-th column, and y_{ij} = 1 iff x_i = j. We can prove this theorem in the following steps:

  1. d_{1/3}(SURJ(\overline {x})) \geq d_{1/3}(AND-OR(\overline {y})) under the promise that each row has weight 1;
  2. let z_j be the sum of the j-th column, then d_{1/3}(AND-OR(\overline {y})) under the promise that each row has weight 1, is at least d_{1/3}(AND-OR(\overline {z})) under the promise that \sum _j z_j = N;
  3. d_{1/3}(AND-OR(\overline {z})) under the promise that \sum _j z_j = N, is at least d_{1/3}^{=N}(AND-OR(\overline {y}));
  4. we can change “=N” into “\leq N”.

Now we prove this theorem step by step.

  1. Let P(x_1, \dots , x_N) be a polynomial for SURJ, where x_i = (x_i)_1, \dots , (x_i)_{\log R}. Then we have
    \begin{aligned} (x_i)_k = \sum _{j: k\text {-th bit of }j \text { is } 1} y_{ij}. \end{aligned}

    Then the polynomial P'(\overline {y}) for AND-OR(\overline {y}) is the polynomial P(\overline {x}) with (x_i)_k replaced as above, thus the degree won’t increase. Correctness follows by the promise.

  2. This is the most extraordinary step, due to Ambainis [Amb05]. In this notation, AND-OR becomes the indicator function of \forall j, z_j \neq 0. Define
    \begin{aligned} Q(z_1, \dots , z_R) := \mathop {\mathbb {E}}_{\substack {\overline {y}: \text { rows have weight } 1\\ \text {and consistent with }\overline {z}}} P(\overline {y}). \end{aligned}

    Clearly it is a good approximation of AND-OR(\overline {z}). It remains to show that it’s a polynomial of degree k in z’s if P is a polynomial of degree k in y’s.

    Let’s look at one monomial of degree k in P: y_{i_1j_1}y_{i_2j_2}\cdots y_{i_kj_k}. Observe that we may assume all the i_\ell ’s are distinct: repeated occurrences of the same variable can be removed using u^2 = u over \{0,1\}, and a monomial containing two variables from the same row but different columns vanishes under the promise that each row has weight 1. By the chain rule we have

    \begin{aligned} \mathbb {E}[y_{i_1j_1}\cdots y_{i_kj_k}] = \mathbb {E}[y_{i_1j_1}]\mathbb {E}[y_{i_2j_2}|y_{i_1j_1} = 1] \cdots \mathbb {E}[y_{i_kj_k}|y_{i_1j_1}=\cdots =y_{i_{k-1}j_{k-1}} = 1]. \end{aligned}

    By symmetry we have \mathbb {E}[y_{i_1j_1}] = \frac {z_{j_1}}{N}, which is linear in z’s. To get \mathbb {E}[y_{i_2j_2}|y_{i_1j_1} = 1], note that conditioning on y_{i_1j_1} = 1 fixes row i_1 (every other entry in that row is 0), so we can give away row i_1 and average over the remaining y’s under the promise and consistent with the z’s. Therefore

    \begin{aligned} \mathbb {E}[y_{i_2j_2}|y_{i_1j_1} = 1] = \left \{ \begin {array}{ll} \frac {z_{j_2}}{N-1} & j_1 \neq j_2,\\ \frac {z_{j_2}-1}{N-1} & j_1 = j_2. \end {array}\right . \end{aligned}

    In general we have

    \begin{aligned} \mathbb {E}[y_{i_kj_k}|y_{i_1j_1}=\cdots =y_{i_{k-1}j_{k-1}} = 1] = \frac {z_{j_k} - \#\{\ell < k \colon j_\ell = j_k\}}{N-k + 1}, \end{aligned}

    which has degree 1 in z’s. Therefore the degree of Q is not larger than that of P. (A worked k=2 instance of this computation is given right after the proof.)

  3. Note that \forall j, z_j = \sum _i y_{ij}. Hence by replacing z’s by y’s, the degree won’t increase.
  4. We can add a “slack” variable z_0, or equivalently y_{01}, \dots , y_{0N}; then the condition \sum _{j=0}^R z_j = N actually means \sum _{j=1}^R z_j \leq N.

\square
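
To make the averaging step concrete, here is the k=2 instance of the formula above (a worked example of my own, not from the notes). For distinct rows i_1 \neq i_2,

\begin{aligned} \mathbb {E}[y_{i_1j_1}y_{i_2j_2}] = \mathbb {E}[y_{i_1j_1}]\cdot \mathbb {E}[y_{i_2j_2} \mid y_{i_1j_1}=1] = \frac {z_{j_1}}{N}\cdot \frac {z_{j_2} - [j_1 = j_2]}{N-1}\,, \end{aligned}

where [j_1=j_2] is 1 if j_1=j_2 and 0 otherwise; this is a polynomial of degree at most 2 in the z’s, as claimed.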

Proof idea for Theorem 6. First, by the duality argument we can verify that d_{1/3}^{\leq N}(f) \geq d if and only if there exist d-wise indistinguishable distributions A, B such that:

  • f can distinguish A, B;
  • A and B are supported on strings of weight \leq N.

Claim 7. d_{1/3}^{\leq \sqrt {N}}(OR_N) = \Omega (N^{1/4}).

The proof needs a little more information about the weight distribution of the indistinguishable distributions corresponding to this claim. Basically, their expected weight is very small.

Now we combine these distributions with the usual ones for AND using the lemma mentioned at the beginning.

What remains to show is that the final distribution is supported on Hamming weight \le N. Because by construction the R copies of the distributions for OR are sampled independently, we can use concentration of measure to prove a tail bound. This gives that all but an exponentially small measure of the distribution is supported on strings of weight \le N. The final step of the proof consists of slightly tweaking the distributions to make that measure 0. \square

1.2 Groups

Groups have many applications in theoretical computer science. Barrington [Bar89] used the permutation group S_5 to prove a very surprising result, which states that the majority function can be computed efficiently using only a constant number of bits of memory (something which was conjectured to be false). More recently, catalytic computation [BCK^{+}14] shows that if we have a lot of memory, but it is full of junk that cannot be erased, we can still compute more than if we had little memory. We will see some interesting properties of groups in the following.

Some famous groups used in computer science are:

  • \{0,1\}^n with bit-wise addition;
  • \mathbb {Z}_m with addition mod m ;
  • S_n, which are permutations of n elements;
  • Wreath product G:= (\mathbb {Z}_m \times \mathbb {Z}_m) \wr \mathbb {Z}_2\,, whose elements are of the form (a,b)z where z is a “flip bit”, with the following multiplication rules:
    • (a, b) 1 = 1 (b, a) ;
    • z\cdot z' := z+z' in \mathbb {Z}_2 ;
    • (a,b) \cdot (a',b') := (a+a', b+b') is the \mathbb {Z}_m\times \mathbb {Z}_m operation;

    An example is (5,7)1 \cdot (2,1) 1 = (5,7) 1 \cdot 1 (1, 2) = (6,9)0. Generally we have (see the sketch after this list for a quick machine check):

    \begin{aligned} (a, b) z \cdot (a', b') z' = \left \{ \begin {array}{ll} (a + a', b+b')\, (z+z') & z = 0\,,\\ (a+b', b + a')\, (z+z') & z = 1\,; \end {array}\right . \end{aligned}

  • SL_2(q) := \{2\times 2 matrices over \mathbb {F}_q with determinant 1\}, in other words, group of matrices \begin {pmatrix} a & b\\ c & d \end {pmatrix} such that ad - bc = 1.
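
Here is the promised sketch of the wreath-product multiplication, checking the rule against the example above (the representation and function name are mine):

```python
def wreath_mul(g, h, m):
    """Multiply two elements ((a, b), z) of (Z_m x Z_m) wr Z_2.
    If the flip bit z of the left element is 1, the incoming pair is swapped before adding."""
    (a, b), z = g
    (a2, b2), z2 = h
    if z == 0:
        pair = ((a + a2) % m, (b + b2) % m)
    else:
        pair = ((a + b2) % m, (b + a2) % m)
    return pair, (z + z2) % 2

# The example from the text: (5,7)1 * (2,1)1 = (6,9)0, taking m large enough, say m = 100.
print(wreath_mul(((5, 7), 1), ((2, 1), 1), 100))   # ((6, 9), 0)
```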

The group SL_2(q) was invented by Galois. (If you haven’t already, read his biography on Wikipedia.)

Quiz. Among these groups, which is the “least abelian”? The latter can be defined in several ways. We focus on this: If we have two high-entropy distributions X, Y over G, does X \cdot Y have more entropy? For example, if X and Y are uniform over some \Omega (|G|) elements, is X\cdot Y close to uniform over G? By “close to” we mean that the statistical distance from the uniform distribution is less than a small constant. For G=(\{0,1\}^n, +), if Y=X is uniform over \{0\}\times \{0,1\}^{n-1}, then X\cdot Y is the same distribution, so there is no entropy increase even though X and Y are uniform on half the elements.

Definition 8.[Measure of Entropy] Define \lVert A\rVert _2 = \left (\sum _xA(x)^2\right )^{\frac {1}{2}}. We think of A as having “high entropy” if \lVert A\rVert ^2_2 \leq 100 \cdot \frac {1}{|G|}.

Note that \lVert A\rVert ^2_2 is exactly the “collision probability” \Pr [A = A'], where A' is an independent copy of A. The uniform distribution U has the smallest possible norm, \lVert U\rVert ^2_2 = \frac {1}{|G|}, which we regard as negligible (\approx \lVert \overline {0}\rVert ^2_2). Then we have

\begin{aligned} \lVert A - U \rVert ^2_2 &= \sum _x \left (A(x) - \frac {1}{|G|}\right )^2\\ &= \sum _x A(x)^2 - 2A(x) \frac {1}{|G|} + \frac {1}{|G|^2} \\ &= \lVert A \rVert ^2_2 - \frac {1}{|G|} \\ &= \lVert A \rVert ^2_2 - \lVert U \rVert ^2_2\\ &\approx \lVert A \rVert ^2_2\,. \end{aligned}

Theorem 9.[Gow08, BNP08] If X, Y are independent distributions over G, then

\begin{aligned} \lVert X\cdot Y - U \rVert _2 \leq \lVert X \rVert _2 \lVert Y \rVert _2 \sqrt {\frac {|G|}{d}}, \end{aligned}

where d is the minimum dimension of an irreducible representation of G.

By this theorem, for high entropy distributions X and Y, we get \lVert X\cdot Y - U \rVert _2 \leq \frac {O(1)}{\sqrt {|G|d}}, thus we have

\begin{aligned} \lVert X\cdot Y - U \rVert _1 \leq \sqrt {|G|} \lVert X\cdot Y - U \rVert _2 \leq \frac {O(1)}{\sqrt {d}}. ~~~~(2) \end{aligned}

If d is large, then X \cdot Y is very close to uniform. The following table shows the d’s for the groups we’ve introduced.

  • \{0,1\}^n: d = 1
  • \mathbb {Z}_m: d = 1
  • (\mathbb {Z}_m \times \mathbb {Z}_m) \wr \mathbb {Z}_2: d should be very small
  • A_n: d = \frac {\log |G|}{\log \log |G|}
  • SL_2(q): d = |G|^{1/3}
Here A_n is the alternating group of even permutations. We can see that for the first three groups, Equation (2) doesn’t give non-trivial bounds.

But for A_n we get a non-trivial bound, and for SL_2(q) we get a strong bound: we have \lVert X\cdot Y - U \rVert _1 \leq \frac {1}{|G|^{\Omega (1)}}.

References

[Amb05]    Andris Ambainis. Polynomial degree and lower bounds in quantum complexity: Collision and element distinctness with small range. Theory of Computing, 1(1):37–46, 2005.

[AS04]    Scott Aaronson and Yaoyun Shi. Quantum lower bounds for the collision and the element distinctness problems. J. of the ACM, 51(4):595–605, 2004.

[Bar89]    David A. Mix Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC^1. J. of Computer and System Sciences, 38(1):150–164, 1989.

[BCK^{+}14]    Harry Buhrman, Richard Cleve, Michal Koucký, Bruno Loff, and Florian Speelman. Computing with a full memory: catalytic space. In ACM Symp. on the Theory of Computing (STOC), pages 857–866, 2014.

[BKT17]    Mark Bun, Robin Kothari, and Justin Thaler. The polynomial method strikes back: Tight quantum query bounds via dual polynomials. CoRR, arXiv:1710.09079, 2017.

[BNP08]    László Babai, Nikolay Nikolov, and László Pyber. Product growth and mixing in finite groups. In ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 248–257, 2008.

[BT17]    Mark Bun and Justin Thaler. A nearly optimal lower bound on the approximate degree of AC0. CoRR, abs/1703.05784, 2017.

[Gow08]    W. T. Gowers. Quasirandom groups. Combinatorics, Probability & Computing, 17(3):363–387, 2008.

An interview

The Italian newspaper Il fatto quotidiano just published online an interview with me, part of a series about Italian expats. You can read it in English by pasting it into Google Translate. Please do not take every sentence, including the opening, as absolute. Besides what is lost in translation, some thoughts have been taken out of context, without objection from me, I think to make the narrative more gripping.

The main difference? “In America, you buy your degree. In Italy, you have to earn it.” Emanuele Viola left Italy in 2001, during his doctorate at the Sapienza University of Rome. “I gave up a scholarship there for a PhD at Harvard,” he recalls. “Then I moved to Princeton, Columbia and Boston.” Today he is a professor of theoretical computer science at Northeastern University in Boston. Will he return? “Yes, I hope to come back one day.”

Emanuele was born in Rome in 1977. At 14 he programmed the video game Nathan Never, followed by Black Viper. At 24 he moved to the United States for a doctorate in computer science at Harvard University, followed by a postdoc at the Institute for Advanced Study in Princeton and one at Columbia University. “Then I became a professor at Northeastern University in Boston, where I received my professorship a few years ago.”

His typical day varies with his academic duties. “Personally, I work better if I spend a lot of time at home in almost complete isolation,” Emanuele explains. “If I do not have to teach, I usually sit in front of a blank sheet trying to solve some problems, until finally it is time for my walk in the woods, so at least in one thing I can feel close to Einstein and Darwin,” he smiles. “I go to the university a few days a week to teach or to attend various meetings. But I often connect via Skype.”

He misses Italy a great deal, has less and less time to visit, and finds the difference with the American academic world drastic: “American universities are run like companies in competition with one another, constantly looking for more money, better faculty and better students. Here, once you have been admitted, it is almost as if you already had the degree in your pocket. It is not quite like that in Italy: of the 200 students in my program,” he recalls, “I was the only one who graduated in five years, that is, without falling behind the nominal schedule.”

For Emanuele, then, the problems of Italian academia and research are not only about funding. Quite the opposite. “A hundred years ago it was typical for an American scholar to spend a period of training in Europe,” Emanuele continues. “In a few generations the situation has exactly reversed.” In this sense Italy’s problem is also that of the rest of Europe and of other parts of the world. “America has amassed so many brilliant minds from all over the world that it is very difficult for another nation to be competitive, regardless of funding. Indeed, European Community funds are substantial and competitive. Right now in America there is not much funding,” he notes, “especially for theory.”

The situation is reversed for the doctorate. “Here it has no fixed duration: unless you drop out, you finish when you have competitive publications, so it can take even six or seven years. In Italy the prescribed duration is three years, at times absolutely insufficient to produce competitive publications.” This difference is also due to the fact that in the United States the student’s salary comes from the advisor, while in Italy it comes mainly from a government grant.

When it comes to education, in short, the story changes. “Personally, I consider the education I received almost for free at Sapienza much more solid than the typical American preparation. This reverses completely for advanced studies, however. Here there are more opportunities for deserving students. In Italy there is very little research in my field.”

The most beautiful memories? The rare moments when the clear sensation of having solved a mathematical problem arrives. “It happened to me once while rolling on my ball and three times while walking through cemeteries,” he smiles. The goal for Emanuele is to return to Italy, even if with his family in America it is not easy. “For some time I have been planning a sabbatical year in Italy. I hope to re-establish contacts there, and that maybe one day, not too far off, they will translate into a return.”

The environment at a private university where tuition exceeds 50 thousand dollars a year is completely different from “what I remember from my student days”. Yet Emanuele is keen to add one thing: “No, I do not want to give the impression that money makes all the difference. The fact is that America has succeeded in attracting the best minds from all over the world,” he concludes. “And no other country has.”

 

Nonclassical polynomials and exact computation of Boolean functions

Guest post by Abhishek Bhrushundi.

I would like to thank Emanuele for giving me the opportunity to write a guest post here. I recently stumbled upon an old post on this blog which discussed two papers: Nonclassical polynomials as a barrier to polynomial lower bounds by Bhowmick and Lovett, and Anti-concentration for random polynomials by Nguyen and Vu. Towards the end of the post, Emanuele writes:

“Having discussed these two papers in a sequence, a natural question is whether non-classical polynomials help for exact computation as considered in the second paper. In fact, this question is asked in the paper by Bhowmick and Lovett, who conjecture that the answer is negative: for exact computation, non-classical polynomials should not do better than classical.”

In a joint work with Prahladh Harsha and Srikanth Srinivasan from last year, On polynomial approximations over \mathbb {Z}/2^k\mathbb {Z}, we study exact computation of Boolean functions by nonclassical polynomials. In particular, one of our results disproves the aforementioned conjecture of Bhowmick and Lovett by giving an example of a Boolean function for which low degree nonclassical polynomials end up doing better than classical polynomials of the same degree in the case of exact computation.

The counterexample we propose is the elementary symmetric polynomial of degree 16 in \mathbb {F}_2[x_1, \ldots , x_n]. (Such elementary symmetric polynomials also serve as counterexamples to the inverse conjecture for the Gowers norm [LMS11, GT07], and this was indeed the reason why we picked these functions as candidate counterexamples):

\begin{aligned}S_{16}(x_1, \ldots , x_n) = \left (\sum _{S\subseteq [n],|S| = 16} \prod _{i \in S}x_i\right )\textrm { mod 2} = {|x| \choose 16} \textrm { mod 2},\end{aligned}

where |x| = \sum _{i=1}^n x_i is the Hamming weight of x. One can verify (using, for example, Lucas’s theorem) that S_{16}(x_1, \ldots , x_n) = 1 if and only if the 5^{th} least significant bit of |x| is 1.

We use the fact that no polynomial of degree at most 15 can compute S_{16}(x) correctly on more than roughly half of the points in \{0,1\}^n.

Theorem 1. Let P be a polynomial of degree at most 15 in \mathbb {F}_2[x_1, \ldots , x_n]. Then

\begin{aligned}\Pr _{x \sim \{0,1\}^n}[P(x) = S_{16}(x)] \le \frac {1}{2} + o(1).\end{aligned}

[Emanuele’s note. Let me take advantage of this for a historical remark. Green and Tao first claimed this fact and sent me and several others a complicated proof. Then I pointed out the paper by Alon and Beigel [AB01]. Soon after they and I independently discovered the short proof reported in [GT07].]

The constant functions (degree 0 polynomials) can compute any Boolean function on half of the points in \{0,1\}^n and this result shows that even polynomials of higher degree don’t do any better as far as S_{16}(x_1, \ldots , x_n) is concerned. What we prove is that there is a nonclassical polynomial of degree 14 that computes S_{16}(x_1, \ldots , x_n) on 9/16 \ge 1/2 + \Omega (1) of the points in \{0,1\}^n.

Theorem 2. There is a nonclassical polynomial P of degree 14 such that

\begin{aligned}\Pr _{x \sim \{0,1\}^n}[P(x) = S_{16}(x)] = \frac {9}{16} - o(1).\end{aligned}

A nonclassical polynomial takes values on the torus \mathbb {T} = \mathbb {R}/\mathbb {Z} and in order to compare the output of a Boolean function (i.e., a classical polynomial) to that of a nonclassical polynomial it is convenient to think of the range of Boolean functions to be \{0,1/2\} \subset \mathbb {T}. So, for example, S_{16}(x_1, \ldots , x_n) = \frac {1}{2} if |x|_4 = 1, and S_{16}(x_1, \ldots , x_n) = 0 otherwise. Here |x|_4 denotes the 5^{th} least significant bit of |x|.

We show that the nonclassical polynomial that computes S_{16}(x) on 9/16 of the points in \{0,1\}^n is

\begin{aligned}P(x_1, \ldots , x_n) = \frac {\sum _{S \subseteq [n], |S|=12} \prod _{i \in S}x_i}{8} \textrm { mod 1}= \frac {{|x| \choose 12}}{8} \textrm { mod 1} .\end{aligned}

The degree of this nonclassical polynomial is 14, but I won’t get into much detail as to why this is the case (see [BL15] for a primer on the notion of degree in the nonclassical world).

Understanding how P(x) behaves comes down to figuring out the largest power of two that divides {|x| \choose 12} for a given x: if the largest power of two dividing {|x| \choose 12} is 2^2 then P(x) = 1/2, and if it is 2^3 or larger then P(x) = 0. Fortunately, there is a generalization of Lucas’s theorem, known as Kummer’s theorem, that helps characterize this:

Theorem 3.[Kummer’s theorem] The largest power of 2 dividing a \choose b for a,b \in \mathbb {N}, a \ge b, is equal to the number of borrows required when subtracting b from a in base 2.
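
A quick machine check of Kummer’s theorem for small parameters (a sketch of mine, not from the paper):

```python
from math import comb

def borrows(a, b):
    """Number of borrows when subtracting b from a in base 2 (assumes a >= b >= 0)."""
    count = borrow = 0
    while a or b or borrow:
        d = (a & 1) - (b & 1) - borrow
        borrow = 1 if d < 0 else 0
        count += borrow
        a >>= 1
        b >>= 1
    return count

def val2(n):
    """Exponent of the largest power of 2 dividing n (n > 0)."""
    return (n & -n).bit_length() - 1

assert all(borrows(a, b) == val2(comb(a, b))
           for a in range(1, 128) for b in range(a + 1))
```
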
Equipped with Kummer’s theorem, it doesn’t take much work to arrive at the following conclusion.

Lemma 4. P(x) = S_{16}(x) if either |x|_{2} = 0 or (|x|_2, |x|_3, |x|_4, |x|_5) = (1,0,0,0), where |x|_i denotes the (i+1)^{th} least significant bit of |x|.

If x = (x_1, \ldots , x_n) is uniformly distributed in \{0,1\}^n then it’s not hard to verify that the bits |x|_0, \ldots , |x|_5 are almost uniformly and independently distributed in \{0,1\}, and so the above lemma proves that P(x) computes S_{16}(x) on 9/16 of the points in \{0,1\}^n. It turns out that one can easily generalize the above argument to show that S_{2^\ell }(x) is a counterexample to Bhowmick and Lovett’s conjecture for every \ell \ge 4.
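
Since both P and S_{16} depend only on the Hamming weight |x|, and the low bits of |x| are essentially uniform, one can check the 9/16 count (1/2 from the first case of Lemma 4 plus 1/16 from the second) by enumerating one full period of weights. A small sketch of mine, using the {0, 1/2} torus convention described above:

```python
from fractions import Fraction
from math import comb

def S16(w):
    """S_16 as an element of {0, 1/2}: 1/2 iff C(w, 16) is odd, i.e. iff bit 4 of w is set."""
    return Fraction(comb(w, 16) % 2, 2)

def P(w):
    """The nonclassical polynomial C(w, 12) / 8 mod 1."""
    return Fraction(comb(w, 12), 8) % 1

# Enumerate one full period of the low six bits of the weight, away from tiny weights.
agreements = sum(P(w) == S16(w) for w in range(64, 128))
print(agreements, "/ 64")   # 36 / 64 = 9/16
```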

We also show in our paper that it is not the case that nonclassical polynomials always do better than classical polynomials in the case of exact computation — for the majority function, nonclassical polynomials do as badly as their classical counterparts (this was also conjectured by Bhowmick and Lovett in the same work), and the Razborov-Smolensky bound for classical polynomials extends to nonclassical polynomials.

We started out trying to prove that S_4(x_1, \ldots , x_n) is a counterexample but couldn’t. It would be interesting to check if it is one.

References

[AB01]    N. Alon and R. Beigel. Lower bounds for approximations by low degree polynomials over Z_m. In Proceedings 16th Annual IEEE Conference on Computational Complexity, pages 184–187, 2001.

[BL15]    Abhishek Bhowmick and Shachar Lovett. Nonclassical polynomials as a barrier to polynomial lower bounds. In Proceedings of the 30th Conference on Computational Complexity, pages 72–87, 2015.

[GT07]    B. Green and T. Tao. The distribution of polynomials over finite fields, with applications to the Gowers norms. ArXiv e-prints, November 2007.

[LMS11]   Shachar Lovett, Roy Meshulam, and Alex Samorodnitsky. Inverse conjecture for the Gowers norm is false. Theory of Computing, 7(9):131–145, 2011.

Entropy polarization

Sometimes you see quantum popping up everywhere. I just did the opposite and gave a classical talk at a quantum workshop, part of an AMS meeting held at Northeastern University, which poured yet another avalanche of talks onto the Boston area. I spoke about the complexity of distributions, also featured in an earlier post, including a result I posted two weeks ago which gives a boolean function f:\{0,1\}^{n}\to \{0,1\} such that the output distribution of any AC^{0} circuit has statistical distance 1/2-1/n^{\omega (1)} from (Y,f(Y)) for uniform Y\in \{0,1\}^{n}. In particular, no AC^{0} circuit can compute f much better than guessing at random even if the circuit is allowed to sample the input itself. The slides for the talk are here.

The new technique that enables this result I’ve called entropy polarization. Basically, for every AC^{0} circuit mapping any number L of bits into n bits, there exists a small set S of restrictions such that:

(1) the restrictions preserve the output distribution, and

(2) for every restriction r\in S, the output distribution of the circuit restricted to r has min-entropy either 0 or at least n^{0.9}. Whence polarization: the entropy becomes either very small or very large.

Such a result is useless and trivial to prove with |S|=2^{n}; the critical feature is that one can obtain a much smaller S of size 2^{n-n^{\Omega (1)}}.

Entropy polarization can be used in conjunction with a previous technique of mine that works for high min-entropy distributions to obtain the said sampling lower bound.

It would be interesting to see if any of this machinery can yield a separation between quantum and classical sampling for constant-depth circuits, which is probably a reason why I was invited to give this talk.

Child Care at STOC 2018

The organizers asked me to advertise this and I sympathize:

We are pleased to announce that we will provide pooled, subsidized child care at STOC 2018. The cost will be $40 per day per child for regular conference attendees, and $20 per day per child for students.
For more detailed information, including how to register for STOC 2018 childcare, see http://acm-stoc.org/stoc2018/childcare.html

Ilias Diakonikolas and David Kempe (local arrangements chairs)

Hardness amplification proofs require majority… and 15 years

Aryeh Grinberg, Ronen Shaltiel, and I have just posted a paper which proves conjectures I made 15 years ago (historians may want to consult the last paragraph of [2] and my Ph.D. thesis).

At that time, I was studying hardness amplification, a cool technique to take a function f:\{0,1\}^{k}\to \{0,1\} that is somewhat hard on average, and transform it into another function f':\{0,1\}^{n}\to \{0,1\} that is much harder on average. If you call a function \delta -hard when it cannot be computed on a \delta fraction of the inputs, you can start e.g. with f that is 0.1-hard and obtain f' that is (1/2-1/n^{100})-hard, or more. This is very important because functions with the latter hardness imply pseudorandom generators via Nisan’s design technique, and also “additional” lower bounds using the “discriminator lemma.”

The simplest and most famous technique is Yao’s XOR lemma, where

\begin{aligned} f'(x_{1},x_{2},\ldots ,x_{t}):=f(x_{1})\oplus f(x_{2})\oplus \ldots \oplus f(x_{t}) \end{aligned}

and the advantage in computing f' over random guessing decays exponentially with t. (So to achieve the parameters above it suffices to take t=O(\log k).)
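
As a back-of-the-envelope check, using the common form of the XOR lemma in which the advantage decays like (1-2\delta )^{t} up to a small additive slack (the exact constants vary across formulations): starting from \delta = 0.1,

\begin{aligned} (1-2\delta )^{t} = 0.8^{t} \leq \frac {1}{k^{100}} \quad \text {as soon as} \quad t \geq \frac {100 \ln k}{\ln (1/0.8)} = O(\log k)\,, \end{aligned}

matching the parameters mentioned above.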

At the same time I was also interested in circuit lower bounds, so it was natural to try to use this technique for classes for which we do have lower bounds. So I tried, and… oops, it does not work! In all known techniques, the reduction circuit cannot be implemented in a class smaller than TC^{0} – a class for which we don’t have lower bounds and for which we think it will be hard to get them, also because of the Natural proofs barrier.

Eventually, I conjectured that this is inherent, namely that you can take any hardness amplification reduction, or proof, and use it to compute majority. To be clear, this conjecture applied to black-box proofs: decoding arguments which take anything that computes f' too well and turn it into something which computes f too well. There were several partial results, but they all had to restrict the proof further, and did not capture all available techniques.

Should you have had any hope that black-box proofs might do the job, in this paper we prove the full conjecture (improving on a number of incomparable works in the literature, including a 10-year-anniversary work by Shaltiel and myself which proved the conjecture for non-adaptive proofs).

Indistinguishability

One thing that comes up in the proof is the following basic problem. You have a distribution X on n bits that has large entropy, very close to n. A classic result shows that most bits of X are close to uniform. We needed an adaptive version of this, showing that a decision tree making few queries cannot distinguish X from uniform, as long as the tree does not query a certain small forbidden set of variables. This also follows from recent and independent work of Or Meir and Avi Wigderson.

Turns out this natural extension is not enough for us. In a nutshell, it is difficult to understand what queries an arbitrary reduction is making, and so it is hard to guarantee that the reduction does not query the forbidden set. So we prove a variant, where the variables are not forbidden, but are fixed. Basically, you condition on some fixing X_{B}=v of few variables, and then the resulting distribution X|X_{B}=v is indistinguishable from the distribution U|U_{B}=v where U is uniform. Now the queries are not forbidden but have a fixed answer, and this makes things much easier. (Incidentally, you can’t get this simply by fixing the forbidden set.)

Fine, so what?

One great question remains. Can you think of a counter-example to the XOR lemma for a class such as constant-depth circuits with parity gates?

But there is another reason why I am interested in this. Proving 1/2-1/n average-case hardness results for restricted classes “just” beyond AC^{0} is more than a long-standing open question in lower bounds: it is necessary even for worst-case lower bounds, both in circuit and communication complexity, as we discussed earlier. And here’s hardness amplification, which intuitively should provide such hardness results. It was given many different proofs, see e.g. [1]. However, none can be applied, as we just saw. I don’t know, someone taking results at face value may even start thinking that such average-case hardness results are actually false.

References

[1]   Oded Goldreich, Noam Nisan, and Avi Wigderson. On Yao’s XOR lemma. Technical Report TR95–050, Electronic Colloquium on Computational Complexity, March 1995. http://www.eccc.uni-trier.de/.

[2]   Emanuele Viola. The complexity of constructing pseudorandom generators from hard functions. Computational Complexity, 13(3-4):147–188, 2004.

I believe P=NP

The only things that matter in a theoretical study are those that you can prove, but it’s always fun to speculate. After worrying about P vs. NP for half my life, and having carefully reviewed the available “evidence”, I have decided I believe that P = NP.

A main justification for my belief is history:

  1. In the 1950s Kolmogorov conjectured that multiplication of n-bit integers requires time \Omega (n^{2}). That’s the time it takes to multiply using the method that mankind has used for at least six millennia. Presumably, if a better method existed it would have been found already. Kolmogorov subsequently started a seminar where he again presented this conjecture. Within one week of the start of the seminar, Karatsuba discovered his famous algorithm running in time O(n^{\log _{2}3})\approx n^{1.58}. He told Kolmogorov, who became agitated and terminated the seminar. Karatsuba’s algorithm unleashed a new age of fast algorithms, including the next one. I recommend Karatsuba’s own account [9] of this compelling story.
  2. In 1968 Strassen started working on proving that the standard O(n^{3}) algorithm for multiplying two n\times n matrices is optimal. The next year his landmark O(n^{\log _{2}7})\approx n^{2.81} algorithm appeared in his paper “Gaussian elimination is not optimal” [12].
  3. In the 1970s Valiant showed that the graphs of circuits computing certain linear transformations must be super-concentrators, graphs with certain strong connectivity properties. He conjectured that super-concentrators must have a super-linear number of wires, from which super-linear circuit lower bounds follow [13]. However, he later disproved the conjecture [14]: building on a result of Pinsker, he constructed super-concentrators using a linear number of edges.
  4. At the same time Valiant also defined rigid matrices and showed that an explicit construction of such matrices yields new circuit lower bounds. A specific matrix that was conjectured to be sufficiently rigid is the Hadamard matrix. Alman and Williams recently showed that, in fact, the Hadamard matrix is not rigid [1].
  5. After finite automata, a natural step in lower bounds was to study slightly more general programs with constant memory. Consider a program that only maintains O(1) bits of memory, and reads the input bits in a fixed order, where bits may be read several times. It seems quite obvious that such a program could not compute the majority function in polynomial time. This was explicitly conjectured by several people, including [5]. Barrington [4] famously disproved the conjecture by showing that those seemingly very restricted constant-memory programs are in fact equivalent to log-depth circuits, which can compute majority (and many other things).
  6. [Added 2/18] Mansour, Nisan, and Tiwari conjectured [10] in 1990 that computing hash functions on n bits requires circuit size \Omega (n\log n). Their conjecture was disproved in 2008 [8] where a circuit of size O(n) was given.

And these are just some of the more famous ones. The list goes on and on. In number-on-forehead communication complexity, the function Majority-of-Majorities was a candidate for being hard for more than logarithmically many parties. This was disproved in [3] and subsequent works, where many other counter-intuitive protocols are presented. In data structures, would you think it possible to switch between binary and ternary representation of a number using constant time per digit and zero space overhead? Turns out it is [11, 7]. Do you believe factoring is hard? Then you also believe there are pseudorandom generators where each output bit depends only on O(1) input bits [2]. Known algorithms for directed connectivity use either super-polynomial time or polynomial memory. But if you are given access to polynomial memory full of junk that you can’t delete, then you can solve directed connectivity using only logarithmic (clean) memory and polynomial time [6]. And I haven’t even touched on the many broken conjectures in cryptography, most recently related to obfuscation.

On the other hand, arguably the main thing that’s surprising in the lower bounds we have is that they can be proved at all. The bounds themselves are hardly surprising. Of course, the issue may be that we can prove so few lower bounds that we shouldn’t expect surprises. Some of the undecidability results I do consider surprising, for example Hilbert’s 10th problem. But what is actually surprising in those results are the algorithms, showing that even very restricted models can simulate more complicated ones (same for the theory of NP completeness). In terms of lower bounds they all build on diagonalization, that is, go through every program and flip the answer, which is boring.

The evidence is clear: we have grossly underestimated the reach of efficient computation, in a variety of contexts. All signs indicate that we will continue to see bigger and bigger surprises in upper bounds, and P=NP. Do I really believe the formal inclusion P=NP? Maybe, let me not pick parameters. What I believe is that the idea that lower bounds are obviously true and we just can’t prove them is not only baseless but even clashes with historical evidence. It’s the upper bounds that are missing.

References

[1]   Josh Alman and R. Ryan Williams. Probabilistic rank and matrix rigidity. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 641–652, 2017.

[2]   Benny Applebaum, Yuval Ishai, and Eyal Kushilevitz. Cryptography in NC^0. SIAM J. on Computing, 36(4):845–888, 2006.

[3]   László Babai, Anna Gál, Peter G. Kimmel, and Satyanarayana V. Lokam. Communication complexity of simultaneous messages. SIAM J. on Computing, 33(1):137–166, 2003.

[4]   David A. Mix Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC^1. J. of Computer and System Sciences, 38(1):150–164, 1989.

[5]   Allan Borodin, Danny Dolev, Faith E. Fich, and Wolfgang J. Paul. Bounds for width two branching programs. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 25-27 April, 1983, Boston, Massachusetts, USA, pages 87–93, 1983.

[6]   Harry Buhrman, Richard Cleve, Michal Koucký, Bruno Loff, and Florian Speelman. Computing with a full memory: catalytic space. In ACM Symp. on the Theory of Computing (STOC), pages 857–866, 2014.

[7]   Yevgeniy Dodis, Mihai Pǎtraşcu, and Mikkel Thorup. Changing base without losing space. In 42nd ACM Symp. on the Theory of Computing (STOC), pages 593–602. ACM, 2010.

[8]   Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography with constant computational overhead. In 40th ACM Symp. on the Theory of Computing (STOC), pages 433–442, 2008.

[9]   A. A. Karatsuba. The complexity of computations. Trudy Mat. Inst. Steklov., 211(Optim. Upr. i Differ. Uravn.):186–202, 1995.

[10]   Yishay Mansour, Noam Nisan, and Prasoon Tiwari. The computational complexity of universal hashing. Theoretical Computer Science, 107:121–133, 1993.

[11]   Mihai Pǎtraşcu. Succincter. In 49th IEEE Symp. on Foundations of Computer Science (FOCS). IEEE, 2008.

[12]   Volker Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354–356, 1969.

[13]   Leslie G. Valiant. On non-linear lower bounds in computational complexity. In ACM Symp. on the Theory of Computing (STOC), pages 45–53, 1975.

[14]   Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. In 6th Symposium on Mathematical Foundations of Computer Science, volume 53 of Lecture Notes in Computer Science, pages 162–176. Springer, 1977.