Data-structure lower bounds without encoding arguments

I have recently posted the paper [Vio21] (download) which does something that I have been trying to do for a long time, more than ten years, on and off. Consider the basic data-structure problem of storing m bits of data x\in \{0,1\}^{m} into m+r bits so that the prefix-sum queries

\textsc{Rank}(i) := \sum_{j \le i} x_{j}

can be computed by probing q cells (or words) of w bits each. (You can think of w=\log m throughout this post.) The paper [PV10] with Pǎtraşcu shows that r\ge m/w^{O(q)}, and this was recently shown to be tight by Yu [Yu19] (building on the breakthrough data structure [Pǎt08], which motivated the lower bound and is not far from it).
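
To make the parameters m, r, q, w concrete, here is a minimal Python sketch of the folklore two-level rank structure. This is not the data structure of [Pǎt08], and the class name and parameter choices (blocks of w bits, superblocks of w^2 bits) are my own illustrative assumptions; it simply shows a scheme with O(1) probes and redundancy roughly m\log w/w.

import random

class TwoLevelRank:
    # A sketch only: stores the data verbatim plus two small tables.
    # Redundancy is roughly (m/w)*2*log2(w) + (m/w^2)*log2(m) bits,
    # i.e. about m*log(w)/w, and each query makes O(1) probes.
    def __init__(self, x, w=64):
        self.x = x            # the raw data, stored verbatim (m bits)
        self.w = w
        m, B = len(x), w * w
        self.superblock = []  # ones strictly before each superblock boundary
        self.block = []       # ones from the superblock start to each block boundary
        total = 0
        for j in range(0, m, B):
            self.superblock.append(total)
            local = 0
            for k in range(j, min(j + B, m), w):
                self.block.append(local)
                ones = sum(x[k:min(k + w, m)])
                local += ones
                total += ones

    def rank(self, i):
        # Rank(i) = sum_{j <= i} x_j via three probes: one superblock entry,
        # one block entry, and one word of the raw data.
        w, B = self.w, self.w * self.w
        return (self.superblock[i // B]               # probe 1
                + self.block[i // w]                  # probe 2
                + sum(self.x[(i // w) * w : i + 1]))  # probe 3

# Quick check against the definition.
x = [random.randint(0, 1) for _ in range(1000)]
ds = TwoLevelRank(x, w=8)
assert all(ds.rank(i) == sum(x[:i + 1]) for i in range(1000))

The point of the sketch is only to make the trade-off tangible: for constant q the lower bound says any scheme must pay redundancy at least m/poly(w), and [Pǎt08] and [Yu19] determine the true trade-off.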

As is common in data-structure lower bounds, the proof in [PV10] is an encoding argument. In the recently posted paper, an alternative proof is presented which avoids the encoding argument and is perhaps more in line with other proofs in complexity lower bounds. Of course, everything is an encoding argument, and nothing is an encoding argument, and this post won’t draw a line.

The new proof establishes an intrinsic property of efficient data structures, whereas typical proofs, including that in [PV10], are somewhat tailored to the problem at hand. The property is called the separator and is a main technical contribution of the work. At a high level, the separator shows that in any efficient data structure you can restrict the input space a little so that many queries become nearly pairwise independent.

Also, the new proof rules out a stronger object: a sampler (see previous post here on sampling lower bounds). Specifically, the distribution Rank(U) where U is the uniform distribution cannot be sampled, not even slightly close, by an efficient cell-probe algorithm. This implies the data-structure result, and it can be informally interpreted as saying that the “reason” why the lower bound holds is not that the data is compressed, but rather that one can’t generate the type of dependencies occurring in Rank via an efficient cell-probe algorithm, regardless of what the input is.
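
To make the sampled object concrete, here is a minimal sketch (the function name is mine, purely for illustration) of the straightforward way to sample Rank(U): draw a uniform input and output all of its prefix sums. This trivial sampler is of course not a low-probe cell-probe algorithm; producing anything close to this joint distribution with few probes is exactly what is being ruled out.

import random

def sample_rank_of_uniform(m):
    # Draw a uniform x in {0,1}^m and output (Rank(1), ..., Rank(m)),
    # i.e. the running prefix sums. Each coordinate depends on all earlier
    # bits; these are the dependencies that an efficient cell-probe sampler
    # cannot reproduce, even approximately.
    x = [random.randint(0, 1) for _ in range(m)]
    out, prefix = [], 0
    for bit in x:
        prefix += bit
        out.append(prefix)
    return out

print(sample_rank_of_uniform(10))  # a non-decreasing sequence with 0/1 increments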

Building on this machinery, one can prove several results about sampling, like showing that cell-probe samplers are strictly weaker than AC0 samplers. While doing this, it occurred to me that one gets a corollary for data structures which I had not seen in the literature. The corollary is a probe hierarchy, showing that some problem can be solved with zero redundancy (r=0) with O(q) probes, while it requires almost linear r for q probes. For example I don’t know of a result yielding this for small q such as q=O(1); I would appreciate a reference. (As mentioned in the paper, the sampling viewpoint is not essential and just like for Rank one can prove the data-structure corollaries directly. Personally, and obviously, I find the sampling viewpoint useful.)

One of my favorite open problems in the area still is: can a uniform distribution over [m] be approximately sampled by an efficient cell-probe algorithm? I can’t even rule out samplers making two probes!

References

[Pǎt08]   Mihai Pǎtraşcu. Succincter. In 49th IEEE Symp. on Foundations of Computer Science (FOCS). IEEE, 2008.

[PV10]   Mihai Pǎtraşcu and Emanuele Viola. Cell-probe lower bounds for succinct partial sums. In 21st ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 117–122, 2010.

[Vio21]   Emanuele Viola. Lower bounds for samplers and data structures via the cell-probe separator. Available at http://www.ccs.neu.edu/home/viola/, 2021.

[Yu19]    Huacheng Yu. Optimal succinct rank data structure via approximate nonnegative tensor decomposition. In Moses Charikar and Edith Cohen, editors, ACM Symp. on the Theory of Computing (STOC), pages 955–966. ACM, 2019.

Questions on the future of lower bounds

Will any of the yellow books be useful?

[Image: cover of a Springer yellow book, https://media.springernature.com/w153/springer-static/cover/book/9783030276447.jpg]

Book 576 (not pictured) was saved just in time from the paper mill. It was rumored that Lemma 76.7.(ii) could have applications to lower bounds. Upon closer inspection, that lemma has a one-line proof by linearity of expectation if you change the constant 17 to 19. This change does not affect the big-Oh.

Will the study of randomness lead to the answer to any of the questions that have been open since before randomness became popular? I think it's a coin-toss.

Will there be any substance to the belief that algebraic lower bounds must be proved first?

Will the people who were mocked for working on DLOGTIME uniformity, top fan-in k circuits, or ZFC independence have the last laugh?

Will someone flip the circuit breaker and light up CRYPT, DERAND, and PCPOT, or will they remain unplugged amusement parks where you sit in the roller coaster, buckle up, and pretend?

Will diagonalization be forgotten, or will it continue to frustrate combinatorialists with lower bounds they can’t match for functions they don’t care about?

Will decisive progress be made tonight, or will it take centuries?

Only Ketan Mulmuley knows for sure.

Submitting to ICALP 2021

I am taking advantage of the pandemic to participate in conferences where it’s usually hard for me to participate because they require intercontinental travel. I have a paper in CSR 2021, and now am submitting a paper to ICALP 2021 (I count submission as participation, even in case the paper gets rejected). The latter requires submissions to be formatted in a specific way, a topic discussed at length on this blog.

Begin 12:48

Download the LIPIcs package.

Try to compile their sample paper.

Get error message: ! LaTeX Error: File `l3backend-dvips.def’ not found.

Google solution. Says to install packages.

Unfortunately, I am using Windows but I only have LyX, and the solution expects MiKTeX.

Google how to install packages in LyX.

Can’t find anything simple.

Create a new document on overleaf.

Copy all the LIPIcs files there.

Try to compile their sample.

It works!

Paste my latex.

Usual avalanche of problems to be fixed at the speed of light.

Add dummy section “Introduction” which wasn’t in my paper, otherwise theorem numbers look weird.

Numbers still look weird. Something’s wrong with theorem statements.

Replace {thm} with {theorem}

Looks better. Still some wrong stuff all around, however.

No it wasn’t that. Remove the dummy section. It seems their “paragraph” environment puts strange numbers like 0.0.0.1

Replace \paragraph with \paragraph* everywhere

Actually, it looks weird the way they put the acknowledgments; put back the dummy Introduction section.

Check page limit: no more than 12 pages, excluding references

I’m a little over. Does this really matter? Apparently, it does! Move the last proof to the appendix. Actually, the last proof is kind of short; I should move the penultimate proof instead. Update the paper-organization paragraph (next time I shouldn’t include one).

Final look. Fix a few indentations.

OK, time to actually submit. Go to the easychair website. They want me to re-enter all the information!? Why, after forcing me to enter title, keywords, etc. in their format, are they asking me to do this again? Can’t we just send the .tex file and extract it from there?

Oh come on, it’s just a few seconds of copy-paste.

OK, done, paper submitted.

End: 3:05

Well, next time it will be easier. Perhaps easier, but not easy, because as the reader knows there will be another missing package, another incompatible system, etc. And of course, if the paper is rejected, then I won’t even save the time of converting it into camera-ready format. On the other hand, the benefit is non-existent. It would be better for everyone if in order to submit a paper you had to complete a random 1-hour task on Amazon Mechanical Turk and donate the profit to charity.

Et Al. II

From Thoughts:

The et al. citation style favors scholars whose last name comes early in the dictionary. For example, other things equal, a last name like Aaron would circulate a lot more than Zuck. This problem is compounded by the existence of highly-cited papers which deviate from alphabetical ordering of authors. They carry the message: order matters, and some of you can’t use this trick, vae victis (woe to the vanquished)!

My suggestion is to avoid et al. and instead spell out every name (as in Aaron and Zuck) or every initial (as in AZ). It isn’t perfect, but improvements like randomly permuting the order still aren’t easy to implement. The suggestion actually cannot be implemented in journals like computational complexity, which punish the authors by forcing them to use an idiosyncratic style that has et al. But it doesn’t matter too much; nobody reads papers in those formats anyway, as we discussed several times.

From the STOC 2021 call for papers:

Authors are asked to avoid “et al.” in citations in favor of an equal mention of all authors’ surnames (unless the number of authors is very large, and if it is large, consider just using \cite{} with no “et al.”). When not listing authors’ names, citations should preferably include the first letters of the authors’ surnames (or at least the first three followed by a +, and possibly the year of publication). If using BibTeX, this can be accomplished by using \bibliographystyle{alpha}.

The role of government

Is there anything in this world which works well, which is not a series of time-consuming, useless steps inherited from the before-Internet era, of endless waits to speak to someone who behaves like a robot in none of the good ways and all the bad?

I hear linear algebra.

Yes, I think that’s as good as it gets.

If I had one penny for every time I said “V like Victor” over the phone… A game-changing innovation of the unfolding millennium will be a button which, when pressed, instantly releases all your information to wh(o/at)ever needs to know it. Instead, they now do us the favor of “safeguarding our privacy” by forcing us to enter and re-enter and re-re-enter our information; us who don’t have someone who does it in our stead.

In yet another example of adding insult to injury, much of this is cloaked under “Consumer protection,” when in fact its effect is precisely the opposite: it cuts the consumer off from the market. For example, there’s a rule that you must wait three days to close on a loan. This is supposed to help you sleep on it and think about whether you really want to go for it and whether you can afford it. How nice of them to help us avoid rushed decisions! Obviously, sellers don’t want to deal with people subject to such delays. If one really wanted to help, the rule should be that everybody has to wait three days to complete a transaction, regardless of whether it’s cash or financed, so that those for whom the transaction is more significant than drinking a cup of tea can indeed sleep on it.

More absurdities. Person A currently has a loan with interest rate X. Now rates have dropped to Y < X. However, A cannot refinance their loan because A does not have enough cash reserves (or some other suspicious condition). So A will be forbidden from refinancing and will be forced to keep their loan at the old, unfavorable interest rate X > Y.

Question: Under which interest rate is A more likely to default?

The only possible deduction is that the lender wants A to default. You can’t say that by refinancing the lender will take a greater risk. They are just using this loophole to make more money. Wasn’t this supposed to help citizens? This situation has another name as well.

Special torture awaits serial debtors. Another loan already in effect can delay processing by… months. In the meanwhile, the documents for the new loan will expire, and re-expire, and re-re-expire yet again, while the hapless debtor will be forced to continuously work to keep them up to date — similar to what happens to Sisyphus or the Danaides, but the classics weren’t perverse enough to conceive the endless task of collecting documents from disconnected websites, each with its own distinctive and rapidly-mutating user interface, each requiring a new password or two-factor authentication at each download, each in turn disappearing and being replaced by yet another system hosted by yet another party which will require yet another type of authentication, and so on forever — do you want this money or not?

There is a (probably well-known) quote which I read in The Instinct for Cooperation: A Graphic Novel Conversation with Noam Chomsky (not a terrible read — I fittingly found it in a free exchange library and returned it to another) which stuck with me. I don’t recall it exactly (see, I shouldn’t have returned it), and I can’t find it online precisely, but it basically says that “The role of the government is to protect property from those who don’t have it.” (An online search seems to reveal that something along these lines was expressed by James Madison.) Regardless of whether it was said or not, and regardless of whether you agree with it or not, it is quite useful to keep this quote in mind, because it explains much of the world we live in.

Marijuana brings good to your community

Subsequence of official communication from Newton:


[…] recently reviewed applications from […] marijuana retailers who have submitted proposals […].

[…] has a diverse management team with experience in the cannabis industry, equity, community relations and public health. […] certified by the Cannabis Control Commission as an Economic Empowerment Applicant, signifying that the applicant demonstrates experience in or business practices that promote economic empowerment in disproportionately impacted communities. Their plan involves […] additional parking on-site.

[…] also signing […] is a family owned and operated business. The company is certified by the Massachusetts Supplier Diversity Office as both a Minority Business and a Women Business Enterprise. […] will renovate the building and create surface parking. It has indicated it is willing to make transportation infrastructure improvements to the intersection […].

I am also signing […]. […] brings experience in the cannabis industry along with commitments to equity, public health, and community relations. […] will be constructing a new building and parking on what is currently a vacant parcel. 

[…] with a revised plan to […] include more parking.

Et cetera.

My monitor setup

In the not-distant future there will be a single monitor that gets you the best of both worlds. For the time being, I maintain that the above is the best monitor setup available to us in 2020. I use the tiny E-ink monitor as much as possible, including now, for my blitz matches on chess.com, and of course for writing and sometimes reviewing papers. But as I mentioned earlier, unfortunately for certain bureaucratic tasks that not all of us can skip altogether you just need a bigger monitor with color. So I push a button on the HDMI switch, and the image blasts open on the 30-inch screen, m’illumino d’immenso (I flood myself with the light of the immense), and suddenly the mouse feels like the interface from Minority Report.

Previous vs. concurrent/independent work

Should Paper X cite Paper Y as previous or concurrent/independent work? This is sometimes tricky: maybe Paper Y circulated privately before Paper X, maybe the authors of Paper X knew about Paper Y, maybe not — nobody can know for sure. One can say that the authors of Paper Y should have posted Paper Y online earlier to prevent this issue, but that is not standard practice and might lead to other problems, including Paper Y never getting published!

I propose the following guiding principle:

“If a different accept/reject outcome would have forced paper X to cite paper Y as previous work, then paper X should cite paper Y as previous work.”

The reasons behind my principle seem to me especially valid in the fast-moving theoretical computer science community, where papers are typically sent to conferences and thus seen by the entire program committee plus around 3 external referees, who are typically experts — only to be rejected. Moreover, progress is extremely fast, with each conference cycle making obsolete a number of papers from just the previous cycle.

We don’t need no education

Around us, educational systems whose primary emphasis is on social rather than intellectual skills are simply disintegrating. Unequipped for content delivery, teachers spray parents with a mixture of links and credentials whose collection is a complete waste of everybody’s time. Suggestion: what about assigning homework and letting teachers provide feedback?

To all the school-age kids stuck at home doing some fun coding, this immortal song is for you.