An interview

The Italian newspaper Il fatto quotidiano just published online an interview with me, part of a series about Italian expats. You can read it in English by pasting it into Google Translate. Please do not take every sentence, including the opening, at face value. Besides what is lost in translation, some thoughts have been taken out of context, without objection from me, I think to make the narrative more gripping.

The main difference? “In America, you buy your degree. In Italy, you have to earn it.” Emanuele Viola left Italy in 2001, during his doctorate at the Sapienza University of Rome. “I gave up a doctoral scholarship there for a PhD at Harvard,” he recalls. “Then I moved to Princeton, Columbia and Boston.” Today he is a professor of theoretical computer science at Northeastern University in Boston. Return? “Yes, I hope to come back one day.”

Emanuele was born in Rome in 1977. At 14, he programmed the video game Nathan Never, followed by Black Viper. At 24, he moved to the United States for a doctorate in computer science at Harvard University, followed by a postdoc at the Institute for Advanced Study in Princeton and one at Columbia University. “Then I became a professor at Northeastern University in Boston, where I received my professorship a few years ago.”

A typical day varies with his academic commitments. “Personally, I work better if I spend a lot of time at home in almost complete isolation,” Emanuele explains. “If I do not have to teach, I usually sit in front of a blank sheet trying to solve some problem, until finally it is time for my walk in the woods, so at least in one thing I can feel close to Einstein and Darwin,” he smiles. “I go to the university a few days a week to teach or to attend various meetings. But I often connect via Skype.”

He misses Italy a lot, has little time to visit, and finds the difference with the American academic world drastic: “American universities are run like companies in competition with each other, constantly looking for more money, better faculty and better students. Here, once you have been admitted, it is almost as if you already had the degree in your pocket. It is not like that in Italy: of the 200 students in my program,” he recalls, “I was the only one who graduated in five years, that is, without taking extra years.”

For Emanuele, then, Italian academia and research do not just have a funding problem. Quite the contrary. “A hundred years ago, it was typical for an American scholar to spend a period of training in Europe,” Emanuele continues. “In a few generations, the situation has exactly reversed.” In this sense Italy’s problem is also that of the rest of Europe and of other parts of the world. “America has amassed so many brilliant minds from all over the world that it is very difficult for another nation to be competitive, regardless of funding. In fact, European Union funds are substantial and competitive. Right now in America there is not much funding,” he points out, “especially for theory.”

The situation is reversed for the doctorate. “Here it has no fixed duration: unless they throw you out, you finish when you have competitive publications, so it can take even six or seven years. In Italy, the pre-established duration is three years, an amount of time absolutely insufficient to produce competitive publications.” This difference is also due to the fact that in the United States the student’s salary comes from the advisor, while in Italy it comes mainly from a government grant.

When it comes to education, in short, the story changes. “Personally, I consider the education I received almost for free at Sapienza much more solid than the typical American preparation. This, however, reverses completely for advanced studies. Here there are more opportunities for deserving students. In Italy there is very little research in my field.”

The best memories? The rare moments when the clear sensation of having solved a mathematical problem arrives. “It happened to me once while rolling on my ball and three times while walking through cemeteries,” he smiles. Emanuele’s goal is to return to Italy, even if with his family in America it is not easy. “For some time I have been planning a sabbatical year in Italy. I hope to establish contacts that maybe one day, not too far off, will translate into a return.”

The environment at a private university where tuition exceeds 50 thousand dollars a year is completely different from “what I remember from my student days.” Yet Emanuele is keen to clarify one thing: “No, I do not want to give the impression that money makes a big difference. The fact is that America has succeeded in attracting the best minds from all over the world,” he concludes. “And no other country has.”

 


Nonclassical polynomials and exact computation of Boolean functions

Guest post by Abhishek Bhrushundi.

I would like to thank Emanuele for giving me the opportunity to write a guest post here. I recently stumbled upon an old post on this blog which discussed two papers: Nonclassical polynomials as a barrier to polynomial lower bounds by Bhowmick and Lovett, and Anti-concentration for random polynomials by Nguyen and Vu. Towards the end of the post, Emanuele writes:

“Having discussed these two papers in a sequence, a natural question is whether non-classical polynomials help for exact computation as considered in the second paper. In fact, this question is asked in the paper by Bhowmick and Lovett, who conjecture that the answer is negative: for exact computation, non-classical polynomials should not do better than classical.”

In a joint work with Prahladh Harsha and Srikanth Srinivasan from last year, On polynomial approximations over \mathbb {Z}/2^k\mathbb {Z}, we study exact computation of Boolean functions by nonclassical polynomials. In particular, one of our results disproves the aforementioned conjecture of Bhowmick and Lovett by giving an example of a Boolean function for which low degree nonclassical polynomials end up doing better than classical polynomials of the same degree in the case of exact computation.

The counterexample we propose is the elementary symmetric polynomial of degree 16 in \mathbb {F}_2[x_1, \ldots , x_n]. (Such elementary symmetric polynomials also serve as counterexamples to the inverse conjecture for the Gowers norm [LMS11, GT07], and this was indeed the reason why we picked these functions as candidate counterexamples.) Explicitly,

\begin{aligned}S_{16}(x_1, \ldots , x_n) = \left (\sum _{S\subseteq [n],|S| = 16} \prod _{i \in S}x_i\right )\textrm { mod 2} = {|x| \choose 16} \textrm { mod 2},\end{aligned}

where |x| = \sum _{i=1}^n x_i is the Hamming weight of x. One can verify (using, for example, Lucas’s theorem) that S_{16}(x_1, \ldots , x_n) = 1 if and only if the 5^{th} least significant bit of |x| is 1.
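
For concreteness, here is a quick sanity check of this identity; it is my own illustration, not taken from the paper.

```python
# By Lucas's theorem, C(w, 16) mod 2 equals the bit of w at position 4 (the 5th least
# significant bit), since 16 = 10000 in binary. Here w plays the role of the Hamming
# weight |x|.
from math import comb

assert all(comb(w, 16) % 2 == (w >> 4) & 1 for w in range(1000))
print("C(w, 16) mod 2 equals bit 4 of w for all w < 1000")
```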

We use the fact that no polynomial of degree at most 15 can compute S_{16}(x) correctly on significantly more than half of the points in \{0,1\}^n.

Theorem 1. Let P be a polynomial of degree at most 15 in \mathbb {F}_2[x_1, \ldots , x_n]. Then

\begin{aligned}\Pr _{x \sim \{0,1\}^n}[P(x) = S_{16}(x)] \le \frac {1}{2} + o(1).\end{aligned}

[Emanuele’s note. Let me take advantage of this for a historical remark. Green and Tao first claimed this fact and sent me and several others a complicated proof. Then I pointed out the paper by Alon and Beigel [AB01]. Soon after they and I independently discovered the short proof reported in [GT07].]

The constant functions (degree 0 polynomials) can compute any Boolean function on half of the points in \{0,1\}^n and this result shows that even polynomials of higher degree don’t do any better as far as S_{16}(x_1, \ldots , x_n) is concerned. What we prove is that there is a nonclassical polynomial of degree 14 that computes S_{16}(x_1, \ldots , x_n) on 9/16 \ge 1/2 + \Omega (1) of the points in \{0,1\}^n.

Theorem 2. There is a nonclassical polynomial P of degree 14 such that

\begin{aligned}\Pr _{x \sim \{0,1\}^n}[P(x) = S_{16}(x)] = \frac {9}{16} - o(1).\end{aligned}

A nonclassical polynomial takes values on the torus \mathbb {T} = \mathbb {R}/\mathbb {Z}, and in order to compare the output of a Boolean function (i.e., a classical polynomial) to that of a nonclassical polynomial it is convenient to think of the range of Boolean functions as \{0,1/2\} \subset \mathbb {T}. So, for example, S_{16}(x_1, \ldots , x_n) = \frac {1}{2} if |x|_4 = 1, and S_{16}(x_1, \ldots , x_n) = 0 otherwise. Here |x|_4 denotes the 5^{th} least significant bit of |x|.

We show that the nonclassical polynomial that computes S_{16}(x) on 9/16 of the points in \{0,1\}^n is

\begin{aligned}P(x_1, \ldots , x_n) = \frac {\sum _{S \subseteq [n], |S|=12} \prod _{i \in S}x_i}{8} \textrm { mod 1}= \frac {{|x| \choose 12}}{8} \textrm { mod 1} .\end{aligned}

The degree of this nonclassical polynomial is 14, but I won’t go into much detail as to why this is the case (see [BL15] for a primer on the notion of degree in the nonclassical world).

Understanding how P(x) behaves comes down to figuring out the exponent of the largest power of two that divides {|x| \choose 12} for a given x: if this exponent is exactly 2 then P(x) = 1/2, if it is at least 3 then P(x) = 0, and if it is 0 or 1 then P(x) is neither 0 nor 1/2 and so cannot equal S_{16}(x). Fortunately, there is a generalization of Lucas’s theorem, known as Kummer’s theorem, that characterizes this exponent:

Theorem 3. [Kummer’s theorem] The exponent of the largest power of 2 dividing {a \choose b}, for a,b \in \mathbb {N} with a \ge b, is equal to the number of borrows required when subtracting b from a in base 2.
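
Here is a quick numerical check of this statement (again my own illustration, not from the paper).

```python
# Kummer's theorem for the prime 2: the exponent of the largest power of 2 dividing
# C(a, b) equals the number of borrows when subtracting b from a in binary.
from math import comb

def borrows(a, b):                 # borrows when computing a - b in base 2 (needs a >= b)
    count, borrow = 0, 0
    while a or b or borrow:
        borrow = 1 if (a & 1) - (b & 1) - borrow < 0 else 0
        count += borrow
        a >>= 1
        b >>= 1
    return count

def val2(m):                       # exponent of the largest power of 2 dividing m
    v = 0
    while m % 2 == 0:
        m //= 2
        v += 1
    return v

assert all(borrows(a, b) == val2(comb(a, b)) for a in range(1, 64) for b in range(a + 1))
print("Kummer's theorem (p = 2) checked for all 0 <= b <= a < 64")
```
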
Equipped with Kummer’s theorem, it doesn’t take much work to arrive at the following conclusion.

Lemma 4. P(x) = S_{16}(x) if either |x|_{2} = 0 or (|x|_2, |x|_3, |x|_4, |x|_5) = (1,0,0,0), where |x|_i denotes the (i+1)^{th} least significant bit of |x|.

If x = (x_1, \ldots , x_n) is uniformly distributed in \{0,1\}^n then it’s not hard to verify that the bits |x|_0, \ldots , |x|_5 are almost uniformly and independently distributed in \{0,1\}, and so the above lemma proves that P(x) computes S_{16}(x) on 9/16 of the points in \{0,1\}^n. It turns out that one can easily generalize the above argument to show that S_{2^\ell }(x) is a counterexample to Bhowmick and Lovett’s conjecture for every \ell \ge 4.
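
Since both P and S_{16} depend only on the Hamming weight, Theorem 2 can also be checked numerically by summing binomial weights. The sketch below is my own illustration; a fairly large n is used so that the low-order bits of |x| are nearly uniform.

```python
# Check that P(x) = C(|x|,12)/8 mod 1 agrees with S_16(x) (viewed in the torus as 0 or
# 1/2) on roughly 9/16 of {0,1}^n. Both depend only on the Hamming weight w = |x|, so
# we sum C(n, w)/2^n over the weights w where they agree.
from fractions import Fraction
from math import comb

n = 8192   # large enough that the low-order bits of a binomial weight are near-uniform

def S16(w):                        # S_16 in the torus: 1/2 if bit 4 of w is 1, else 0
    return Fraction(1, 2) if (w >> 4) & 1 else Fraction(0)

def P(w):                          # the nonclassical polynomial C(w, 12)/8 mod 1
    return Fraction(comb(w, 12) % 8, 8)

agree = sum(comb(n, w) for w in range(n + 1) if P(w) == S16(w))
print(float(Fraction(agree, 2 ** n)))   # prints a value close to 9/16 = 0.5625
```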

We also show in our paper that it is not the case that nonclassical polynomials always do better than classical polynomials in the case of exact computation — for the majority function, nonclassical polynomials do as badly as their classical counterparts (this was also conjectured by Bhowmick and Lovett in the same work), and the Razborov-Smolensky bound for classical polynomials extends to nonclassical polynomials.

We started out trying to prove that S_4(x_1, \ldots , x_n) is a counterexample but couldn’t. It would be interesting to check if it is one.

References

[AB01]    N. Alon and R. Beigel. Lower bounds for approximations by low degree polynomials over Z_m. In Proceedings of the 16th Annual IEEE Conference on Computational Complexity, pages 184–187, 2001.

[BL15]    Abhishek Bhowmick and Shachar Lovett. Nonclassical polynomials as a barrier to polynomial lower bounds. In Proceedings of the 30th Conference on Computational Complexity, pages 72–87, 2015.

[GT07]    B. Green and T. Tao. The distribution of polynomials over finite fields, with applications to the Gowers norms. ArXiv e-prints, November 2007.

[LMS11]   Shachar Lovett, Roy Meshulam, and Alex Samorodnitsky. Inverse conjecture for the Gowers norm is false. Theory of Computing, 7(9):131–145, 2011.

Entropy polarization

Sometimes you see quantum popping up everywhere. I just did the opposite and gave a classical talk at a quantum workshop, part of an AMS meeting held at Northeastern University, which poured yet another avalanche of talks onto the Boston area. I spoke about the complexity of distributions, also featured in an earlier post, including a result I posted two weeks ago which gives a Boolean function f:\{0,1\}^{n}\to \{0,1\} such that the output distribution of any AC^{0} circuit has statistical distance 1/2-1/n^{\omega (1)} from (Y,f(Y)) for uniform Y\in \{0,1\}^{n}. In particular, no AC^{0} circuit can compute f much better than guessing at random even if the circuit is allowed to sample the input itself. The slides for the talk are here.

The new technique that enables this result I’ve called entropy polarization. Basically, for every AC^{0} circuit mapping any number L of bits into n bits, there exists a small set S of restrictions such that:

(1) the restrictions preserve the output distribution, and

(2) for every restriction r\in S, the output distribution of the circuit restricted to r has min-entropy either 0 or n^{0.9}. Whence polarization: the entropy becomes either very small or very large.

Such a result is useless and trivial to prove with |S|=2^{n}; the critical feature is that one can obtain a much smaller S of size 2^{n-n^{\Omega (1)}}.

Entropy polarization can be used in conjunction with a previous technique of mine that works for high min-entropy distributions to obtain the said sampling lower bound.

It would be interesting to see if any of this machinery can yield a separation between quantum and classical sampling for constant-depth circuits, which is probably a reason why I was invited to give this talk.

Child Care at STOC 2018

The organizers asked me to advertise this and I sympathize:

We are pleased to announce that we will provide pooled, subsidized child care at STOC 2018. The cost will be $40 per day per child for regular conference attendees, and $20 per day per child for students.
For more detailed information, including how to register for STOC 2018 childcare, see http://acm-stoc.org/stoc2018/childcare.html

Ilias Diakonikolas and David Kempe (local arrangements chairs)

Hardness amplification proofs require majority… and 15 years

Aryeh Grinberg, Ronen Shaltiel, and I have just posted a paper which proves conjectures I made 15 years ago (historians will want to consult the last paragraph of [2] and my Ph.D. thesis).

At that time, I was studying hardness amplification, a cool technique to take a function f:\{0,1\}^{k}\to \{0,1\} that is somewhat hard on average, and transform it into another function f':\{0,1\}^{n}\to \{0,1\} that is much harder on average. Call a function \delta -hard if it cannot be computed correctly on more than a 1-\delta fraction of the inputs; then you can start e.g. with f that is 0.1-hard and obtain f' that is 1/2-1/n^{100} hard, or more. This is very important because functions with the latter hardness imply pseudorandom generators with Nisan’s design technique, and also “additional” lower bounds using the “discriminator lemma.”

The simplest and most famous technique is Yao’s XOR lemma, where

\begin{aligned} f'(x_{1},x_{2},\ldots ,x_{t}):=f(x_{1})\oplus f(x_{2})\oplus \ldots \oplus f(x_{t}) \end{aligned}

and the hardness of f' decays exponentially with t. (So to achieve the parameters above it suffices to take t=O(\log k).)
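
For intuition about this exponential decay, here is the elementary piling-up calculation; this is only a back-of-the-envelope illustration of mine, not the XOR lemma itself, since the lemma must handle a single circuit that attacks all copies jointly.

```python
# If each copy of f were predicted correctly and independently with probability
# 1/2 + eps, then XORing the t guesses predicts the XOR of the t copies correctly
# with probability 1/2 + (2*eps)^t / 2, which decays exponentially in t.
eps, t = 0.25, 10                       # per-copy advantage 0.25, i.e., 75% accuracy
advantage = (2 * eps) ** t / 2
print(f"advantage of the XOR guess after {t} copies: {advantage:.6f}")   # ~0.000488
```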

At the same time I was also interested in circuit lower bounds, so it was natural to try to use this technique for classes for which we do have lower bounds. So I tried, and… oops, it does not work! In all known techniques, the reduction circuit cannot be implemented in a class smaller than TC^{0} – a class for which we don’t have lower bounds and for which we think it will be hard to get them, also because of the Natural proofs barrier.

Eventually, I conjectured that this is inherent, namely that you can take any hardness amplification reduction, or proof, and use it to compute majority. To be clear, this conjecture applied to black-box proofs: decoding arguments which take anything that computes f' too well and turn it into something which computes f too well. There were several partial results, but they all had to restrict the proof further, and did not capture all available techniques.

Should you have had any hope that black-box proofs might do the job, in this paper we prove the full conjecture (improving on a number of incomparable works in the literature, including a 10-year-anniversary work by Shaltiel and myself which proved the conjecture for non-adaptive proofs).

Indistinguishability

One thing that comes up in the proof is the following basic problem. You have a distribution X on n bits that has large entropy, very close to n. A classic result shows that most bits of X are close to uniform. We needed an adaptive version of this, showing that a decision tree making few queries cannot distinguish X from uniform, as long as the tree does not query a certain small forbidden set of variables. This also follows from recent and independent work of Or Meir and Avi Wigderson.

Turns out this natural extension is not enough for us. In a nutshell, it is difficult to understand what queries an arbitrary reduction is making, and so it is hard to guarantee that the reduction does not query the forbidden set. So we prove a variant, where the variables are not forbidden, but are fixed. Basically, you condition on some fixing X_{B}=v of few variables, and then the resulting distribution X|X_{B}=v is indistinguishable from the distribution U|U_{B}=v where U is uniform. Now the queries are not forbidden but have a fixed answer, and this makes things much easier. (Incidentally, you can’t get this simply by fixing the forbidden set.)

Fine, so what?

One great question remains. Can you think of a counter-example to the XOR lemma for a class such as constant-depth circuits with parity gates?

But there is another reason why I am interested in this. Proving 1/2-1/n average-case hardness results for restricted classes “just” beyond AC^{0} is more than a long-standing open question in lower bounds: it is necessary even for worst-case lower bounds, both in circuit and communication complexity, as we discussed earlier. And here’s hardness amplification, which intuitively should provide such hardness results. It has been given many different proofs, see e.g. [1]. However, none of them can be applied, as we just saw. I don’t know, someone taking results at face value may even start thinking that such average-case hardness results are actually false.

References

[1]   Oded Goldreich, Noam Nisan, and Avi Wigderson. On Yao’s XOR lemma. Technical Report TR95–050, Electronic Colloquium on Computational Complexity, March 1995. http://www.eccc.uni-trier.de/.

[2]   Emanuele Viola. The complexity of constructing pseudorandom generators from hard functions. Computational Complexity, 13(3-4):147–188, 2004.

I believe P=NP

The only things that matter in a theoretical study are those that you can prove, but it’s always fun to speculate. After worrying about P vs. NP for half my life, and having carefully reviewed the available “evidence” I have decided I believe that P = NP.

A main justification for my belief is history:

  1. In the 1950’s Kolmogorov conjectured that multiplication of n-bit integers requires time \Omega (n^{2}). That’s the time it takes to multiply using the method that mankind has used for at least six millennia. Presumably, if a better method existed it would have been found already. Kolmogorov subsequently started a seminar where he presented this conjecture again. Within one week of the start of the seminar, Karatsuba discovered his famous algorithm running in time O(n^{\log _{2}3})\approx n^{1.58}. He told Kolmogorov about it, who became agitated and terminated the seminar. Karatsuba’s algorithm unleashed a new age of fast algorithms, including the next one. I recommend Karatsuba’s own account [9] of this compelling story. (A minimal sketch of Karatsuba’s recursion appears right after this list.)
  2. In 1968 Strassen started working on proving that the standard O(n^{3}) algorithm for multiplying two n\times n matrices is optimal. The next year, his landmark O(n^{\log _{2}7})\approx n^{2.81} algorithm appeared in his paper “Gaussian elimination is not optimal” [12].
  3. In the 1970s Valiant showed that the graphs of circuits computing certain linear transformations must be super-concentrators, graphs with certain strong connectivity properties. He conjectured that super-concentrators must have a super-linear number of wires, from which super-linear circuit lower bounds follow [13]. However, he later disproved the conjecture [14]: building on a result of Pinsker, he constructed super-concentrators with a linear number of edges.
  4. At the same time Valiant also defined rigid matrices and showed that an explicit construction of such matrices yields new circuit lower bounds. A specific matrix that was conjectured to be sufficiently rigid is the Hadamard matrix. Alman and Williams recently showed that, in fact, the Hadamard matrix is not rigid [1].
  5. After finite automata, a natural step in lower bounds was to study slightly more general programs with constant memory. Consider a program that only maintains O(1) bits of memory, and reads the input bits in a fixed order, where bits may be read several times. It seems quite obvious that such a program could not compute the majority function in polynomial time. This was explicitly conjectured by several people, including [5]. Barrington [4] famously disproved the conjecture by showing that those seemingly very restricted constant-memory programs are in fact equivalent to log-depth circuits, which can compute majority (and many other things).
  6. [Added 2/18] Mansour, Nisan, and Tiwari conjectured [10] in 1990 that computing hash functions on n bits requires circuit size \Omega (n\log n). Their conjecture was disproved in 2008 [8] where a circuit of size O(n) was given.
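
As promised in item 1, here is a minimal sketch of Karatsuba’s recursion for non-negative integers; it is a standard textbook rendering written by me for illustration, not Karatsuba’s original presentation.

```python
# Three recursive multiplications of half-size numbers instead of four give the
# O(n^{log_2 3}) running time mentioned in item 1.
def karatsuba(x, y):
    if x < 10 or y < 10:                      # base case: a single-digit operand
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    xh, xl = divmod(x, 10 ** m)               # split each number into high and low parts
    yh, yl = divmod(y, 10 ** m)
    a = karatsuba(xh, yh)                     # product of the high parts
    b = karatsuba(xl, yl)                     # product of the low parts
    c = karatsuba(xh + xl, yh + yl) - a - b   # cross terms via a single extra product
    return a * 10 ** (2 * m) + c * 10 ** m + b

print(karatsuba(123456789, 987654321) == 123456789 * 987654321)   # True
```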

And these are just some of the more famous ones. The list goes on and on. In number-on-forehead communication complexity, the function Majority-of-Majorities was a candidate for being hard for more than logarithmically many parties. This was disproved in [3] and subsequent works, where many other counter-intuitive protocols are presented. In data structures, would you think it possible to switch between binary and ternary representation of a number using constant time per digit and zero space overhead? Turns out it is [7, 11]. Do you believe factoring is hard? Then you also believe there are pseudorandom generators where each output bit depends only on O(1) input bits [2]. Known algorithms for directed connectivity use either super-polynomial time or polynomial memory. But if you are given access to polynomial memory full of junk that you can’t delete, then you can solve directed connectivity using only logarithmic (clean) memory and polynomial time [6]. And I haven’t even touched on the many broken conjectures in cryptography, most recently related to obfuscation.

On the other hand, arguably the main thing that’s surprising in the lower bounds we have is that they can be proved at all. The bounds themselves are hardly surprising. Of course, the issue may be that we can prove so few lower bounds that we shouldn’t expect surprises. Some of the undecidability results I do consider surprising, for example Hilbert’s 10th problem. But what is actually surprising in those results is the algorithms, showing that even very restricted models can simulate more complicated ones (the same holds for the theory of NP-completeness). In terms of lower bounds they all build on diagonalization, that is, going through every program and flipping the answer, which is boring.

The evidence is clear: we have grossly underestimated the reach of efficient computation, in a variety of contexts. All signs indicate that we will continue to see bigger and bigger surprises in upper bounds, and P=NP. Do I really believe the formal inclusion P=NP? Maybe, let me not pick parameters. What I believe is that the idea that lower bounds are obviously true and we just can’t prove them is not only baseless but even clashes with historical evidence. It’s the upper bounds that are missing.

References

[1]   Josh Alman and R. Ryan Williams. Probabilistic rank and matrix rigidity. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 641–652, 2017.

[2]   Benny Applebaum, Yuval Ishai, and Eyal Kushilevitz. Cryptography in NC^0. SIAM J. on Computing, 36(4):845–888, 2006.

[3]   László Babai, Anna Gál, Peter G. Kimmel, and Satyanarayana V. Lokam. Communication complexity of simultaneous messages. SIAM J. on Computing, 33(1):137–166, 2003.

[4]   David A. Mix Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC^1. J. of Computer and System Sciences, 38(1):150–164, 1989.

[5]   Allan Borodin, Danny Dolev, Faith E. Fich, and Wolfgang J. Paul. Bounds for width two branching programs. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 25-27 April, 1983, Boston, Massachusetts, USA, pages 87–93, 1983.

[6]   Harry Buhrman, Richard Cleve, Michal Koucký, Bruno Loff, and Florian Speelman. Computing with a full memory: catalytic space. In ACM Symp. on the Theory of Computing (STOC), pages 857–866, 2014.

[7]   Yevgeniy Dodis, Mihai Pǎtraşcu, and Mikkel Thorup. Changing base without losing space. In 42nd ACM Symp. on the Theory of Computing (STOC), pages 593–602. ACM, 2010.

[8]   Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography with constant computational overhead. In 40th ACM Symp. on the Theory of Computing (STOC), pages 433–442, 2008.

[9]   A. A. Karatsuba. The complexity of computations. Trudy Mat. Inst. Steklov., 211(Optim. Upr. i Differ. Uravn.):186–202, 1995.

[10]   Yishay Mansour, Noam Nisan, and Prasoon Tiwari. The computational complexity of universal hashing. Theoretical Computer Science, 107:121–133, 1993.

[11]   Mihai Pǎtraşcu. Succincter. In 49th IEEE Symp. on Foundations of Computer Science (FOCS). IEEE, 2008.

[12]   Volker Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354–356, 1969.

[13]   Leslie G. Valiant. On non-linear lower bounds in computational complexity. In ACM Symp. on the Theory of Computing (STOC), pages 45–53, 1975.

[14]   Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. In 6th Symposium on Mathematical Foundations of Computer Science, volume 53 of Lecture Notes in Computer Science, pages 162–176. Springer, 1977.

 

Et al.

The et al. citation style favors scholars whose last name comes early in the alphabet. For example, other things equal, a last name like Aaron would circulate a lot more than Zuck. This problem is compounded by the existence of highly-cited papers which deviate from alphabetical ordering of authors. They carry the message: order matters, and some of you can’t use this trick, vae victis!

My suggestion is to avoid et al. and instead spell out every name (as in Aaron and Zuck) or every initial (as in AZ). It isn’t perfect, but improvements like randomly permuting the order still aren’t easy to implement. The suggestion actually cannot be implemented in journals like computational complexity which force the authors to use an idiosyncratic style that mandates et al. But it doesn’t matter too much; nobody reads papers in those formats anyway, as we discussed several times.

Special Topics in Complexity Theory: class is over :-(

I put together in a single file all the lectures I gave. On the class webpage you can also find the scribes of the two guest lectures, and the students’ presentations. Many thanks to Matthew Dippel, Xuangui Huang, Chin Ho Lee, Biswaroop Maiti, Tanay Mehta, Willy Quach, and Giorgos Zirdelis for doing an excellent job scribing these lectures. (And for giving me perfect teaching evaluations. Though I am not sure if I biased the sample. It went like this. One day I said: “Please fill out the student evaluations, we need 100%.” A student said: “100% what?  Participation or score?” I meant participation but couldn’t resist replying jokingly “both.”) Finally, thanks also to all the other students, postdocs, and faculty who attended the class and created a great atmosphere.

Special Topics in Complexity Theory, Lecture 19

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lecture 19, Guest lecture by Huacheng Yu, Scribe: Matthew Dippel

Guest lecture by Huacheng Yu on dynamic data structure lower bounds, for the 2D range query and 2D range parity problems. Thanks to Huacheng for giving this lecture and for feedback on the write-up.

What is covered.

  • Overview of Larsen’s lower bound for 2D range counting.
  • Extending these techniques to obtain an \Omega (\log ^{1.5}n / (\log \log n)^3) lower bound for 2D range parity.

2 Problem definitions

Definition 1. 2D range counting

Give a data structure D that maintains a weighted set of 2-dimensional points with integer coordinates and supports the following operations:

  1. UPDATE: Add a (point, weight) tuple to the set.
  2. QUERY: Given a query point (x, y), return the sum of weights of points (x', y') in the set satisfying x' \leq x and y' \leq y.

Definition 2. 2D range parity

Give a data structure D that maintains an unweighted set of 2-dimensional points with integer coordinates and supports the following operations:

  1. UPDATE: Add a point to the set.
  2. QUERY: Given a query point (x, y), return the parity of the number of points (x', y') in the set satisfying x' \leq x and y' \leq y.

Both of these definitions extend easily to the d-dimensional case, but we state the 2D versions as we will mainly work with those.
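
To make the interface concrete, here is a naive reference implementation; it is my own, purely to pin down the semantics of UPDATE and QUERY, and its linear-time queries are of course far from the bounds discussed next.

```python
# Naive 2D range counting / range parity: store the points in a list and scan on each
# query. This only illustrates the problem definitions, not an efficient solution.
class NaiveRange2D:
    def __init__(self):
        self.points = []                        # list of (x, y, weight) tuples

    def update(self, x, y, w=1):                # UPDATE: add a (point, weight) tuple
        self.points.append((x, y, w))

    def query_count(self, qx, qy):              # 2D range counting QUERY
        return sum(w for x, y, w in self.points if x <= qx and y <= qy)

    def query_parity(self, qx, qy):             # 2D range parity QUERY (unit weights)
        return sum(1 for x, y, _ in self.points if x <= qx and y <= qy) % 2

d = NaiveRange2D()
d.update(1, 2); d.update(3, 1); d.update(2, 5)
print(d.query_count(3, 3), d.query_parity(3, 3))   # 2 0
```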

2.1 Known bounds

All upper bounds assume the RAM model with word size \Theta (\log n).

Upper bounds: Using range trees, we can create a data structure for d-dimensional range counting, with all update and query operations taking O(\log ^d n) time. With extra tricks, we can make this work for d-dimensional range parity with operations running in time O((\log n / \log \log n)^d).
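
For a flavor of where the O(\log n) factor per dimension comes from, here is a minimal 1D building block, a Fenwick (binary indexed) tree over coordinates 1..n. This is a standard structure included as my own illustration, not necessarily the exact construction behind the bounds above.

```python
# Fenwick tree: both update and prefix-sum query take O(log n) time.
class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, w):            # add weight w at coordinate i (1-indexed)
        while i <= self.n:
            self.tree[i] += w
            i += i & (-i)

    def query(self, i):                # total weight at coordinates <= i
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

f = Fenwick(16)
f.update(3, 5); f.update(7, 2)
print(f.query(6), f.query(10))         # 5 7
```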

Lower bounds. There are a series of works on lower bounds:

  • Fredman, Saks ’89 – 1D range parity requires \Omega (\log n / \log \log n).
  • Patrascu, Demaine ’04 – 1D range counting requires \Omega (\log n).
  • Larsen ’12 – 2D range counting requires \Omega ((\log n / \log \log n)^2).
  • Larsen, Weinstein, Yu ’17 – 2D range parity requires \Omega (\log ^{1.5} n / (\log \log n)^3).

This lecture presents the recent results of [Larsen ’12] and [Larsen, Weinstein, Yu ’17]. They both use the same general approach:

  1. Show that, for an efficient approach to exist, the problem must demonstrate some property.
  2. Show that the problem doesn’t have that property.

3 Larsen’s technique

All lower bounds are in the cell probe model with word size \Theta (\log n).

We consider a general data structure problem, where we require a structure D that supports updates and queries of an unspecified nature. We further assume that there exists an efficient solution with update and query times o((\log n / \log \log n)^2). We will restrict our attention to operation sequences of the form u_1, u_2, \cdots , u_n, q. That is, a sequence of n updates followed by a single query q. We fix a distribution over such sequences, and show that the problem is still hard.

3.1 Chronogram method [FS89]

We divide the updates into r epochs, so that our sequence becomes:

\begin{aligned}U_r, U_{r-1}, \cdots , U_1, q\end{aligned}

where |U_i| = \beta ^i and \beta = \log ^5 n. The epochs are multiplicatively shrinking. With this requirement, we have that r = \Theta (\log n / \log \log n).

Let M be the set of all memory cells used by the data structure when run on the sequence of updates. Further, let A_i be the set of memory cells which are accessed by the structure at least once in U_i, and never again in a later epoch.

Claim 1. The A_r, A_{r-1}, \cdots A_1 are disjoint.

Claim 2. There exists an epoch i such that D probes o(\log n / \log \log n) cells from A_i when answering the query at the end. Note that this is simply our query time divided by the number of epochs. In other words, D can’t afford to read \Omega (\log n / \log \log n) cells from each A_i set without breaking its promise on the query run time.

Claim 2 implies that there is an epoch i which has the smallest effect on the final answer. We will call this the ”easy” epoch.

Idea: The set A_i contains ”most” of the information about U_i among all memory cells in M. Also, A_r, A_{r-1}, \cdots , A_{i+1} are not updated past epoch i + 1, and hence should contain no information about the updates in U_i. The epochs U_{i-1}, U_{i-2}, \cdots , U_1 are progressively shrinking, and so the total number of cells in A_{i-1}, \cdots , A_1 is small:

\begin{aligned}\sum _{j < i}|A_j| \leq O(\beta ^{i - 1}) \cdot \log ^2 n\end{aligned}
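
To make the sets A_i concrete, here is a toy simulation of mine. The cell-access pattern is an arbitrary stand-in for a real data structure, but it shows that, by definition, each cell ends up in the A_i of the last epoch that touches it, which is why Claim 1 holds.

```python
# Split the updates into epochs U_r, ..., U_1 with |U_i| = beta^i, record every cell
# access, and let A_i be the cells accessed in U_i and never in a later epoch.
import random

beta, r = 4, 5                                       # tiny illustrative parameters
epoch_sizes = [beta ** i for i in range(r, 0, -1)]   # |U_r|, ..., |U_1|: shrinking

last_epoch = {}                                      # cell id -> last epoch accessing it
for idx, size in enumerate(epoch_sizes):             # epochs processed in order U_r, ..., U_1
    for _ in range(size):                            # each update touches a few cells
        for cell in random.sample(range(1000), 3):
            last_epoch[cell] = idx

A = [{c for c, e in last_epoch.items() if e == idx} for idx in range(r)]
assert all(A[i].isdisjoint(A[j]) for i in range(r) for j in range(i))   # Claim 1
print([len(s) for s in A])                           # |A_r|, ..., |A_1|
```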

3.2 Communication game

Having set up the framework for how to analyze the data structure, we now introduce a communication game where two parties attempt to solve an identical problem. We will show that, an efficient data structure implies an efficient solution to this communication game. If the message is smaller than the entropy of the updates of epoch i (conditioned on preceding epochs), this gives an information theoretic contradiction. The trick is to find a way for the encoder to exploit the small number of probed cells to send a short message.

The game. The game consists of two players, Alice and Bob, who must jointly compute a single query after a series of updates. The model is as follows:

  • Alice has all of the update epochs U_r, U_{r-1}, ... U_1. She also has an index i, which still corresponds to the ”easy” epoch as defined above.
  • Bob has all update epochs EXCEPT for U_i. He also has a random query q. He is aware of the index i.
  • Communication can only occur in a single direction, from Alice to Bob.
  • We assume some fixed input distribution \mathcal {D}.
  • They win this game if Bob successfully computes the correct answer for the query q.

Then we will show the following generic theorem, relating this communication game to data structures for the corresponding problem:

Theorem 3. If there is a data structure with update time t_u and probes t cells from A_i in expectation when answering the final query q, then the communication game has an efficient solution, with O(p|U_i|t_u\log n + \beta ^{i-1}t_u\log n ) communication cost, and success probability at least p^t. This holds for any choice of 0 < p < 1.

Before we prove the theorem, we consider specific parameters for our problem. If we pick

\begin{aligned} p &= 1 / \log ^5n, \\ t_u &= \log ^2 n, \\ t &= o(\log n / \log \log n), \end{aligned}

then, after plugging in the parameters, the communication cost is |U_i| / \log ^2 n. Note that, we could always trivially achieve |U_i| by having Alice send Bob all of U_i, so that he can compute the solution of the problem with no uncertainty. The success probability is (\log ^{-5} n)^{o(\log n / \log \log n)}, which simplifies to 2^{-o(\log n)} = 1 / n^{o(1)}. This is significantly better than 1 / n^{O(1)}, which could be achieved trivially by having Bob output a random answer to the query, independent of the updates.

Proof.

We assume we have a data structure D for the update / query problem. Then Alice and Bob will proceed as follows:

Alice’s steps.

  1. Simulate D on U_r, U_{r - 1}, ... U_1. While doing so, keep track of memory cell accesses and compute A_r, A_{r-1}, ... A_1.
  2. Sample a random subset C \subset A_i, such that |C| = p|A_i|.
  3. Send C \cup A_{i-1} \cup A_{i-2} \cup ... A_1.

We note that in Alice’s Step 3, to send a cell, she sends a tuple holding the cell ID and the cell state before the query was executed. Also note that she does not tell Bob which cells belong to which sets of the union.

Bob’s steps.

  1. Receive C' from Alice.
  2. Simulate D on epochs U_{r}, U_{r-1}, ... U_{i+1}. Snapshot the current memory state of the data structure as M.
  3. Simulate the query algorithm. Every time q attempts to probe cell c, Bob checks if c \in C'. If it is, he lets D probe from C'. Otherwise, he lets D probe from M.
  4. Bob returns the result from the query algorithm as his answer.

If the query algorithm does not probe any cell in A_i - C, then Bob succeeds, as he can exactly simulate the data structure’s query algorithm. Since the query probes t cells in A_i, and C is a random subset of A_i of size p|A_i|, the probability that every probed cell of A_i lands in C is at least p^t. The communication cost is the cost of Alice sending the cells to Bob, which is

\begin{aligned} \left (p|A_i| + \sum _{j < i}|A_j|\right ) \cdot O(\log n) \leq O\left ((p|U_i|t_u + \beta ^{i-1}t_u)\log n\right ),\end{aligned}

where we used that |A_i| \leq |U_i|t_u and \sum _{j < i}|A_j| = O(\beta ^{i-1}t_u).

\square

4 Extension to 2D Range Parity

The extension to 2D range parity proceeds in nearly identical fashion, with a similar theorem relating data structures to communication games.

Theorem 1. Consider an arbitrary data structure problem where queries have 1-bit outputs. If there exists a data structure having:

  • update time t_u
  • query time t_q
  • Probes t cells from A_i when answering the last query q

Then there exists a protocol for the communication game with O(p|U_i|t_u\log n + t_u\beta ^{i-1}\log n ) bits of communication and success probability at least 1/2 + 2^{-O(\sqrt {t_q t (\log (1 / p))^3})}, for any choice of 0 < p < 1. Again, we plug in the parameters from 2D range parity. If we set

\begin{aligned} t_u = t_q &= o(\log ^{1.5}n / (\log \log n)^2), \\ t = t_q / r &= o(\log ^{1/2} n / \log \log n), \\ p &= 1 / \log ^5 n, \end{aligned}

then the cost is |U_i| / \log ^2 n, and the probability simplifies to 1/2 + 1 / n^{o(1)}.

We note that, if we had Q = n^{O(1)} different queries, then randomly guessing on all of them, with constant probability we could be correct on as many as Q/2 \pm O(\sqrt {Q}). In this case, the probability of being correct on a single one, amortized, is 1/2 + 1/n^{\Theta (1)}.

Proof. The communication protocol will be slightly adjusted. We assume an a priori distribution on the updates and queries. Bob will then compute the posterior distribution, based on what he knows and what Alice sends him. He then computes the maximum likelihood answer to the query q. We thus need to figure out what Alice can send, so that the answer to q is often biased towards either 1 or 0.

We assume the existence of some public randomness available to both Alice and Bob. Then we adjust the communication protocol as follows:

Alice’s modified steps.

  • Alice samples, using the public randomness, a subset of ALL memory cells M_2, such that each cell is sampled with probability p. Alice sends M_2 \cap A_i to Bob. Since Bob can mimic the sampling, he gains additional information about which cells are and aren’t in A_i.

Bob’s modified steps.

  • Denote by S the set of memory cells probed by the data structure when Bob simulates the query algorithm. That is, S is what Bob ”thinks” D will probe during the query; the actual set of probed cells may be different, since the real data structure has full knowledge of the updates and may use that information to decide what to probe. Bob will use S to compute the posterior distribution.

Define the function f(z) : [2^w] \rightarrow \mathbb {R} to be the ”bias” when S takes on the value z. In particular, this function is conditioned on the message C' that Bob receives from Alice. We can then write the definition of f as

\begin{aligned} f_{C'}(z) &:= (\text {Pr}[\text {ans to q } = 1 | C', S \leftarrow z] - 1/2) * \text {Pr}[S \leftarrow z | C'] \end{aligned}

In particular, f has the following two properties:

  1. \sum _z |f(z)| \leq 1
  2. \mathbb {E}_{C'}[\max _z |f(z)|] \geq 1/2 \cdot p^t

In these statements, the expectation is over everything that Bob knows, and the probabilities are also conditioned on everything that Bob knows. The randomness comes from what he doesn’t know. We also note that when the query probes no cells in A_i - C', then the bias is always 1/2, since the posterior distribution will put all its weight on the correct answer to the query.

Finishing the proof requires the following lemma:

Lemma 2. For any f with the above two properties, there exists a Y \subseteq S such that |Y| \leq O(\sqrt {|S| \log 1/p^t}) and

\begin{aligned} \sum _{y \in Y} \left |\sum _{z | y} f(z) \right | &\geq 2^{-O(\sqrt {|S| \log 1 / p^t})}. \end{aligned}

Note that the sum inside the absolute values is the bias when Y \leftarrow y. \square

References

[FS89]   Michael L. Fredman and Michael E. Saks. The cell probe complexity of dynamic data structures. In ACM Symp. on the Theory of Computing (STOC), pages 345–354, 1989.