“It is often said that we live in a computational universe. But if Nature “computes” in a classical, input-output fashion then our current prospect to leverage this viewpoint to gain fundamental insights may be scarce. This is due to the combination of two facts. First, our current understanding of fundamental questions such as “P=NP?” is limited to restricted computational models, for example the class AC0 of bounded-depth circuits. Second, those restricted models are incapable of modeling many processes which appear to be present in nature. For example, a series of works in complexity theory culminating in [Hås87] shows that AC0 cannot count. But what if Nature, instead, “samples?” That is, what if Nature is better understood as a computational device that given some initial source of randomness, samples the observed distribution of the universe? Recent work by the Principal Investigator (PI) gives two key insights in this direction. First, the PI has highlighted that, when it comes to sampling, restricted models are capable of surprising behavior. For example, AC0 can count, in the sense that it can sample a uniform bit string together with its Hamming weight [Vio12a]. Second, despite the growth in power given by sampling, for these restricted models the PI was still able to answer fundamental questions of the type of “P=NP?” [Vio14].”
Thus begins my application for the Turing Centenary Research Fellowship. After reading it, perhaps you too, like me, are not surprised that it was declined. But I was unprepared for the strange emails that accompanied its rejection. Here’s an excerpt:
“[…] A reviewing process can be thought of as a kind of Turing Test for fundability. There is a built-in fallibility; and just as there is as yet no intelligent machine or effective algorithm for recognising one (otherwise why would we bother with a Turing Test), there is no algorithm for either writing the perfect proposal, or for recognising the worth of one. Of course, the feedback may well be useful, and will come. But we will be grateful for your understanding in the meantime.”
Well, I am still waiting for comments.
Even the rejection was sluggish: for months I and apparently others were told that our proposal didn’t make it, but was so good that they were looking for extra money to fund it anyway. After the money didn’t materialize, I was invited to try the regular call (of the sponsoring foundation). The first step was submitting a preliminary proposal, which I did: I re-sent them the abstract of my proposal. I was then invited to submit the full proposal. This is a rather painstaking process which requires you to address a seemingly endless series of minute questions referring to mysterious concepts such as the “Theory of Change.” Nevertheless, given that they had suggested I try the regular call, had seen what I was planning to submit, and had still invited me to submit the full proposal, I did answer all the questions and re-sent them what they already had: my Turing Research Fellowship application. Perhaps it only makes sense that the outcome was as it was.
The proposal was part of a research direction which started exactly five years ago, when the question of proving computational lower bounds for sampling was raised. Since then, there has been progress: [Vio12a, LV12, DW11, Vio14, Vio12b, BIL12, BCS14]. One thing I like about this area is that it is uncharted: wherever you point your finger, chances are you find an open problem. While this is true for much of Complexity Theory, questions regarding sampling haven’t been studied nearly as intensely. Here are three:
A most basic open question. Let D be the distribution on n-bit strings where each bit is independently 1 with probability 1∕4. Now suppose you want to sample D given some random bits x1,x2,…. You can easily sample D exactly with the map
(x1 ∧ x2,x3 ∧ x4,…,x2n–1 ∧ x2n).
This map is 2-local, i.e., each output bit depends on at most 2 input bits. However, we use 2n input bits, whereas the entropy of the distribution is H(1∕4)n ≈ 0.81n. Can you show that any 2-local map using a number of bits closer to H(1∕4)n will sample a distribution that is very far from D? Ideally, we want to show that the statistical distance between the two distributions is very high, exponentially close to 1.
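As a sanity check, here is a short Python sketch (illustrative only; the function names are mine) of the exact 2-local sampler and of the entropy figure H(1∕4) ≈ 0.81:

```python
import math
import random

def sample_D_2local(n):
    """Sample D exactly with the 2-local map (x1 AND x2, x3 AND x4, ...):
    each output bit is the AND of a fresh pair of uniform bits, hence is 1
    with probability 1/4.  Uses 2n input bits."""
    x = [random.getrandbits(1) for _ in range(2 * n)]
    return [x[2 * i] & x[2 * i + 1] for i in range(n)]

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 10**5
y = sample_D_2local(n)
print(sum(y) / n)            # empirical frequency of 1s, near 0.25
print(binary_entropy(0.25))  # ~0.811, so D has entropy about 0.81n
```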
Such strong statistical distance bounds also enable a connection to lower bounds for succinct dictionaries, a problem that Pǎtraşcu considered important. A result for d-local maps corresponds to a result for data structures which answer membership queries with d non-adaptive bit probes. Adaptive bit probes correspond to decision trees, while d cell probes correspond to samplers whose input is divided into blocks of O(log n) bits and each output bit depends on d cells, adaptively.
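To make the dictionary side concrete, here is a minimal sketch (Python; names are mine) of the trivial membership structure: store the characteristic vector of S and answer each query with a single non-adaptive bit probe. It uses n bits of space, whereas a succinct dictionary aims for roughly log2 C(n, |S|) bits:

```python
import math

def build_char_vector(n, S):
    """Trivial dictionary: bit i is 1 iff i is in S.  Space: n bits."""
    table = [0] * n
    for i in S:
        table[i] = 1
    return table

def member(table, i):
    """Membership query answered with a single non-adaptive bit probe."""
    return table[i] == 1

n, S = 16, {1, 4, 9}
table = build_char_vector(n, S)
print(member(table, 4), member(table, 5))  # True False
# Information-theoretic minimum space for sets of this size:
print(math.log2(math.comb(n, len(S))))     # ~9.13 bits, versus n = 16
```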
There are some results in [Vio12a] on a variant of the above question where you need to sample strings whose Hamming weight is exactly n∕4, but even there, there are large gaps in our knowledge. And I think the above case of 2-local maps is still open, even though it really looks like you cannot do anything unless you use 2n random bits.
Stretch. With Lovett we suggested [LV12] proving negative results for sampling (the uniform distribution over) a subset S ⊆ {0, 1}n by bounding from below the stretch of any map
f : {0, 1}r → S.
Stretch can be measured as the average Hamming distance between f(x) and f(y), where x and y are two uniform input strings at Hamming distance 1. If you prove a good lower bound on this quantity, then some complexity lower bounds for f follow, because local maps, AC0 maps, etc. have low stretch.
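As an illustration of the definition, here is a hedged Python sketch (names are mine) estimating the average stretch of the 2-local AND map from the first question; since each input bit feeds only one output bit, the average stretch is small:

```python
import random

def f(x):
    """The 2-local AND map from the first question: output bit i is
    x[2i] AND x[2i+1]."""
    return [x[2 * i] & x[2 * i + 1] for i in range(len(x) // 2)]

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def avg_stretch(g, r, trials=10000):
    """Estimate the average Hamming distance between g(x) and g(y),
    where x is uniform and y is x with one random bit flipped."""
    total = 0
    for _ in range(trials):
        x = [random.getrandbits(1) for _ in range(r)]
        y = list(x)
        y[random.randrange(r)] ^= 1
        total += hamming(g(x), g(y))
    return total / trials

est = avg_stretch(f, 128)
print(est)  # about 0.5: a flipped input changes its output AND only when its partner bit is 1
```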
We were able to apply this to prove that AC0 cannot sample good codes. Our bounds are only polynomially close to 1; but a nice follow-up by Beck, Impagliazzo, and Lovett, [BIL12], improves this to exponential. But can this method be applied to other sets that do not have error-correcting structure?
Consider in particular the distribution UP which is uniform over the upper-half of the hypercube, i.e., uniform over the n-bit strings whose majority is 1. What stretch is required to sample UP? At first sight, it seems the stretch must be quite high.
But a recent paper by Benjamini, Cohen, and Shinkar, [BCS14], shows that in fact it is possible with stretch 5. Moreover, the sampler has zero error, and uses the minimum possible number of input bits: n – 1!
I find their result quite surprising in light of the fact that constant-locality samplers cannot do the job: their output distribution has Ω(1) statistical distance from UP [Vio12a]. But local samplers looked very similar to low-stretch ones. Indeed, it is not hard to see that a local sampler has low average stretch, and the reverse direction follows from Friedgut’s theorem. However, the connections are only average-case. It is pretty cool that the picture changes completely when you go to worst-case computation.
What else can you sample with constant stretch?
AC0 vs. UP. Their result is also interesting in light of the fact that AC0 can sample UP with exponentially small error. This follows from a simple adaptation of the dart-throwing technique for parallel algorithms, known since the early ’90s [MV91, Hag91]; the details are in [Vio12a]. However, unlike their low-stretch map, this AC0 sampler uses superlinear randomness and has a non-zero probability of error.
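The error behavior can be illustrated with a simple rejection sketch in Python (illustrative only; the actual AC0 construction in [Vio12a] cannot use the exact majority test below, since majority is not in AC0): throw k independent darts and output the first one that lands in UP. Each dart misses with probability exactly 1/2 for odd n, so the error is 2^-k, at the cost of up to kn random bits:

```python
import random

def sample_UP_rejection(n, k):
    """Throw k darts: output the first uniform n-bit string whose
    majority is 1 (n odd, so each dart lands in UP with probability
    exactly 1/2).  The error event (no dart lands) has probability
    2^-k, and the sampler consumes up to k*n random bits.
    Returns None in the error case."""
    for _ in range(k):
        x = [random.getrandbits(1) for _ in range(n)]
        if sum(x) > n // 2:  # majority of the bits is 1
            return x
    return None  # error case, probability 2^-k

sample = sample_UP_rejection(101, 40)  # error probability ~2^-40
```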
Can AC0 sample UP with no error? Can AC0 sample UP using O(n) random bits?
Let’s see what the next five years bring.
References
[BCS14] Itai Benjamini, Gil Cohen, and Igor Shinkar. Bi-Lipschitz bijection between the Boolean cube and the Hamming ball. In IEEE Symp. on Foundations of Computer Science (FOCS), 2014.
[BIL12] Chris Beck, Russell Impagliazzo, and Shachar Lovett. Large deviation bounds for decision trees and sampling lower bounds for AC0-circuits. In IEEE Symp. on Foundations of Computer Science (FOCS), pages 101–110, 2012.
[DW11] Anindya De and Thomas Watson. Extractors and lower bounds for locally samplable sources. In Workshop on Randomization and Computation (RANDOM), 2011.
[Hag91] Torben Hagerup. Fast parallel generation of random permutations. In 18th Coll. on Automata, Languages and Programming (ICALP), pages 405–416. Springer, 1991.
[Hås87] Johan Håstad. Computational limitations of small-depth circuits. MIT Press, 1987.
[LV12] Shachar Lovett and Emanuele Viola. Bounded-depth circuits cannot sample good codes. Computational Complexity, 21(2):245–266, 2012.
[MV91] Yossi Matias and Uzi Vishkin. Converting high probability into nearly-constant time, with applications to parallel hashing. In 23rd ACM Symp. on the Theory of Computing (STOC), pages 307–316, 1991.
[Vio12a] Emanuele Viola. The complexity of distributions. SIAM J. on Computing, 41(1):191–218, 2012.
[Vio12b] Emanuele Viola. Extractors for Turing-machine sources. In Workshop on Randomization and Computation (RANDOM), 2012.
[Vio14] Emanuele Viola. Extractors for circuit sources. SIAM J. on Computing, 43(2):655–672, 2014.