Teaching special topics in complexity theory in Fall 2017

Next Fall I am teaching a special topics class in complexity theory. Below (and here) is the formal announcement, including a highly tentative list of topics. Some of the topics are somewhat ambitious, but that problem lies well in the future and concerns not me but my future self. He also appreciates suggestions and pointers to cool results to include.


Special topics in complexity theory

Instructor.

Emanuele Viola

Logistics.

Class: 1:35 pm – 3:15 pm, Tuesday and Friday, Ryder Hall 273.

First class Sep 08, 2017.

Running at the same time: the Program on Combinatorics and Complexity at Harvard.

Tentative syllabus

This class will present recent (amazing) progress in complexity theory and related areas. A highly tentative list of topics follows:

(1) Pseudorandom generators. Bounded independence, small-bias, sandwiching polynomials. Bounded independence fools And, bounded independence fools AC0 (Braverman’s result), the Gopalan-Kane-Meka inequality, the Gopalan-Meka-Reingold-Trevisan-Vadhan generator. References: Vadhan’s survey, Amnon Ta-Shma’s class.

(2) Approximate degree. Bounded indistinguishability. Bounded indistinguishability of Or, And-Or, and Surjectivity (the Bun-Thaler proof)

(3) Circuit complexity. Williams’ lower bounds from satisfiability. Succinct and explicit NP-completeness. ACC0-SAT algorithms. Exposition in web appendix of Arora and Barak’s book.

(4) Quasi-random groups. Austin’s corner theorem in SL(2,q).

(5) Communication complexity and quasi-random groups. Determinism vs. randomization in number-on-forehead communication complexity. Number-in-hand lower bounds for quasi-random groups. Various notes.

(6) Data structures. Overview: static, dynamic, bit-probe, cell-probe. Siegel’s lower bound for hashing. The Larsen-Weinstein-Yu superlogarithmic lower bound.

(7) Arithmetic circuits. Overview. The chasm at depth 3. (The Gupta-Kamath-Kayal-Saptharishi result.) Shpilka and Yehudayoff’s survey.

(8) Fine-grained complexity reductions (SETH, 3SUM)

Deliverables

Each student will scribe about #lectures/#students lectures, and present a lecture. The grade is based on the scribes, the presentation, and class participation.

Scribes: due 72 hours from the time of the lecture. Feel free to send me a draft and ask for suggestions before the cutoff. Scribe templates: lyx, tex. Optionally, the scribed lectures will be posted on my blog. Using these templates minimizes the risk that my wordpress compiler won’t work.

Presentations: should convey both a high-level overview of the proof of the result, as well as a self-contained exposition of at least one step. Talk to me for suggestions. Discuss with me your presentation plan at least 1 week before your presentation.

Presentation papers:

Note: always check whether there is an updated version. Check the authors’ webpages as well as standard repositories (arxiv, ECCC, iacr, etc.)

Pseudorandomness:

Amnon Ta-Shma. Explicit, almost optimal, epsilon-balanced codes.

(Perhaps Avraham Ben-Aroya, Dean Doron and Amnon Ta-Shma. An efficient reduction from non-malleable extractors to two-source extractors, and explicit two-source extractors with near-logarithmic min-entropy.)

Prahladh Harsha, Srikanth Srinivasan: On Polynomial Approximations to AC0

SAT algorithms for depth-2 threshold circuits:

Here and here,

Quasirandom groups:

Higher-dimensional corners

Data structures:

Parts of the Larsen-Weinstein-Yu superlogarithmic lower bound that we left out.

Any of the many lower bounds we didn’t see.

Arithmetic circuit complexity:

Parts of the reduction we did not see.

Survey

Tavenas

Regularity lemmas:

TTV, Skorski

Fine-grained reductions:

Dynamic problems, LCS, edit distance.

There are many other reductions in this area.

Classes

  1. 2017-09-08 Fri
  2. 2017-09-12 Tue
  3. 2017-09-15 Fri
  4. 2017-09-19 Tue
  5. 2017-09-22 Fri
  6. 2017-09-26 Tue
  7. 2017-09-29 Fri
  8. 2017-10-03 Tue. Additive combinatorics workshop. NO CLASS
  9. 2017-10-06 Fri. Additive combinatorics workshop. NO CLASS
  10. 2017-10-10 Tue
  11. W-R Lectures by Gowers
  12. 2017-10-13 Fri
  13. 2017-10-17 Tue
  14. 2017-10-20 Fri
  15. 2017-10-24 Tue
  16. 2017-10-27 Fri
  17. 2017-10-31 Tue
  18. 2017-11-03 Fri
  19. 2017-11-07 Tue
  20. 2017-11-10 Fri
  21. 2017-11-14 Tue. Workshop. Class planned as usual, but check later
  22. 2017-11-17 Fri. Workshop. Class planned as usual, but check later
  23. 2017-11-21 Tue
  24. 2017-11-24 Fri. Thanksgiving, no classes.
  25. 2017-11-28 Tue
  26. 2017-12-01 Fri
  27. 2017-12-05 Tue
  28. 2017-12-08 Fri
  29. 2017-12-12 Tue
  30. 2017-12-15 Fri

STOC/FOCS PC meetings: Does nature of decisions justify cost?

I am just back from the FOCS 2017 program committee (PC) meeting. This is my second time attending a physical PC meeting, both times for FOCS. In the past I was sorry that I had to decline some invitations for personal reasons. Just like the previous instance, I was impressed with the depth and thoroughness of the discussion. I also had a great time sitting in the same room with so many esteemed colleagues, chatting about what’s cool in the new research papers, and cracking nerdy jokes. It is also a privilege for me to hear what everyone has to say about what is going on. I even managed to have some quick research meetings here and there. During the flight I even proved a great theorem… I mean, watched Trainspotting 2. And I had never been to Caltech: it is a great place with a moving history as told by the book in the drawer next to my bed in the Athenaeum.

However in the rest of the post I want to address the title. Throughout, by “PC meetings”, I mean “physical” meetings as opposed to “virtual.” Before I start, I want to stress two things. First, what I am about to say has absolutely nothing to do with this or any specific STOC/FOCS PC meeting; rather it is aimed at PC meetings in general. Second, I am not saying that the decisions made are “bad.” Indeed I don’t think a substantially better way to make decisions exists (here and in many other cases).

First, the experiment. As mentioned in a previous post, this time around I ran a little experiment: Right before the meeting started, I saved the ranking of the papers, based on the reviews and the offline discussion, weighted by confidence. Then I compared it with the final decisions at the end of the meeting (with the knowledge of the total number of papers accepted). There are a handful of papers that I have a conflict with and so don’t know about. Those I know about number 88. The two lists of 88 papers have edit distance 19, meaning that you need to change 19 papers in one list to get the other list. This is a 0.216 fraction of the papers. I consider this number negligible. Note that it is based on information that was entered in expectation of resolving many things during the meeting: extra rounds of offline discussion would probably have calibrated this much better. Think also of how this compares with the NIPS experiment. In that case, the two lists of 37 papers have edit distance 21 (plus or minus one), which is a lot larger than the above. However STOC/FOCS decisions may be more “stable” than NIPS (it would be fun to have data on this).
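For concreteness, here is a minimal sketch in Python of the comparison just described. The function name and the data are hypothetical; this is not the actual PC software or data.

```python
# Hypothetical sketch of the comparison above (made-up interface and data).
def accept_set_difference(pre_meeting_ranking, final_accepts):
    """pre_meeting_ranking: paper ids sorted by confidence-weighted score, best first.
    final_accepts: paper ids accepted at the end of the meeting."""
    k = len(final_accepts)
    predicted = set(pre_meeting_ranking[:k])       # top-k papers before the meeting
    changed = len(set(final_accepts) - predicted)  # papers to change in one list to get the other
    return changed, changed / k

# With 88 decisions and 19 changes this returns (19, 0.2159...), the 0.216 fraction above.
```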

A common experience is also that a large fraction of papers is a clear reject, a small fraction is a clear accept, and the rest are more or less rated like novels or movies; and I am not aware of a better rating scheme.

Second, how the decisions are made. I am not sure that decisions made during the meeting are significantly better than decisions made offline (I will use the word “offline” to mean “not in a physical meeting”). A meeting can start at 8:30AM and go past 6PM, with total break time of about 1 hour. Many people are also jet-lagged or exhausted. Working offline may be better.

Sure, in a physical meeting you can literally grab someone by the arm and ask: Do you really think X should be accepted given Y? But in general there is very little time for this. On the other hand, in an offline discussion you can say: OK, we need a little extra expertise on papers x, y, z; reviewer w, please dig into them and report back (this happened to me). This should supposedly also take place before the physical meeting, but I think it’s done less than it should be, because the feeling is that we are going to have a meeting anyway so we might as well resolve things then. Instead what happens at times during the meeting is that expertise is missing, and there’s just no time to remedy that. Sometimes the in-person decision process can be rather chaotic. I want to stress again that I am not saying that *any* PC is not run well. My point is that this is the nature of the hard decisions that we need to make.

Of course, email conversations can get very tedious. And you can also say that if there’s no threat of a physical meeting people feel less pressure to write good reviews and make good decisions, though that’s not my experience from SODA/CCC program committees. A good model could be to have a virtual meeting, with all the participants present at the same time. This could be done early on, and in fact having two such meetings could be the best thing. At my institution we have many, many meetings virtually. During such meetings you can still stare at someone in the camera and ask: Do you really think X should be accepted given Y?

There are many venues where decisions are made offline, such as CCC/CRYPTO/SODA/JACM/SICOMP. One can say that STOC/FOCS are more important than CCC/CRYPTO/SODA, but it’s hard to say that they are more important than say JACM. An argument is then that journal decisions are different because they involve a lot more time and care. I tend to disagree with this last point. My impression is that journal and conference reviews are rather indistinguishable when it comes to computing the accept/reject bit. The difference is only in the second-order term: journals can spend more time on details and style.

Third, the location. This meeting was held at Caltech. The Theory festival at Montreal ended with a full day of amazing tutorials on Friday. The festival is promoted as a “must-attend” event. So what we are telling the program committee is this: “Look, you really gotta attend the theory festival. If you need to attend anything at all this year, that’s the meeting! Oh, and right after that (4PM) please catch a red-eye Montreal-LAX, and be ready to start at 8:30 AM on Saturday.” Personally, I wanted to attend the theory fest, but I just couldn’t do the above. So I skipped it in favor of the PC meeting.

If we really need to have physical meetings, I think we should combine them with some other popular event, or at least put them where the center of mass is. Typically, none of this happens, and the tradition seems to be that the meeting is held where the PC chair works, which may not even be where the conference is. I am not the only one who feels this way, by the way, about this point and the others. In particular a past program committee, in an act of rebellion, organized a conference for themselves and co-located it with the PC meeting. But that’s unusual, and it’s clearly better to co-locate with existing events, since attendance is already an issue. One argument is that the PC chair can offer a nice room with lots of support, which is hard to find elsewhere. I think this argument does not stand. The “support” consists of everybody bringing their laptops (an example of the “downfall of mankind” discussed earlier), and wireless. I don’t think it’s so hard to get a room with a table and wireless in a hotel or another university. Finally, I think these meetings favor the United States even more disproportionately than the conference itself. Indeed, STOC/FOCS are at least once in a while held elsewhere (one was in Rome, for example; I didn’t go). I’ve never heard of a PC meeting held abroad; I would be curious to know of one.

I am told that this year TCC has a physical meeting (not always the case) but that it is co-located with CRYPTO, a conference typically attended by most of the crypto community. This makes sense to me.

Look at some recent STOC/FOCS PC, and think of the cost of flying everybody to the meeting, and think of the result.

Fourth, the time. As discussed earlier, do we really have to hold such meetings during week-ends? Especially in the summer, I don’t know of a reason for this. Why not Tuesday to Thursday? People don’t teach, departments are empty, and flying is less chaotic. Plus maybe you want to spend your summer week-ends going to the beach instead of being shut in a windowless room with dozens of laptops? An argument is that international flights are sometimes cheaper if you stay during the week-end. I am not sure this really makes a difference, for a number of reasons (including: not so many people from abroad, and they typically stay extra days anyway).

Synthesis.

My impression is that things are done this way mostly because of inertia: This is how it’s been done, and now good luck changing it. Also, being a STOC/FOCS chair is (or is close to) a once-in-a-lifetime appointment, which clearly one doesn’t want to screw up. Another argument I heard is that physical meetings are a better way to preserve tradition. That is, a young member of the PC learns the trade better through physical interaction.

If you put all the above things together, my impression is that the answer to the title is “no.” The resources are probably enough for at least a fraction of a postdoc, or they could be used to allow more people to actually attend the conference. My concrete proposal is to have the next meeting offline, with a mixture of desynchronized discussion and virtual synchronized meetings. At least, we should try.

Bounded independence fools And

One of the earliest (and coolest) results in pseudorandomness is the following theorem that shows that k-wise independence “fools” the And function.


Theorem 1. [1] Let x_{1},x_{2},\ldots ,x_{n} be k-wise independent random variables over \{0,1\}. Then

\begin{aligned} |\mathbb {E}[\prod _{i\le n}x_{i}]-\prod _{i\le n}\mathbb{E} [x_{i}]|\le 2^{-\Omega (k)}. \end{aligned}

Proof. First recall that the inclusion-exclusion principle shows that for any random variables y_{1},y_{2},\ldots ,y_{n} jointly distributed over \{0,1\}^{n} according to a distribution D we have (writing Or_{i} for the Or function on i bits):

\begin{aligned} \mathbb{E} \prod _{i\le n}y_{i} & =1-\mathbb{E}\,\mathrm{Or}_{n}(1-y_{1},\ldots ,1-y_{n}) \\ & =1-\sum _{j=1}^{n}T_{j}~~~~(1) \end{aligned}

where

\begin{aligned} T_{j}=(-1)^{j+1}\sum _{S\subseteq [n],|S|=j}\mathbb{E} \prod _{i\in S}(1-y_{i}). \end{aligned}

Moreover, if we truncate the sum in Equation (1) to the first t terms, we get either a lower bound or an upper bound, depending on whether t is odd or even (these are the Bonferroni inequalities).

Because this holds under any distribution, if we show that the right-hand side of Equation (1) is approximately the same whether the y_{i} are n-wise independent or only k-wise independent, then the left-hand side of that equation will also be approximately the same in the two scenarios, giving the result. (Note that \prod _{i\le n}\mathbb{E} [x_{i}] equals the expectation of the product of the x_{i} if the latter are n-wise independent.)

Now, since the terms T_{j} only involve expectations of products of at most j variables, they are the same under n-wise and k-wise independence up to j=k. This would conclude the argument if we can show that |T_{k}| is 2^{-\Omega (k)} (since this is the quantity that you need to add to go from a lower/upper bound to an upper/lower bound). This is indeed the case if \sum _{i\le n}\mathbb{E} [1-x_{i}]\le k/2e because then by Maclaurin’s inequality we have

\begin{aligned} |T_{k}|=\sum _{S\subseteq [n],|S|=k}\prod _{i\in S}\mathbb{E} [1-x_{i}]\le (e/k)^{k}(\sum _{i\le n}\mathbb{E} [1-x_{i}])^{k}\le 2^{-k} \end{aligned}

where the first equality holds because the x_{i} are k-wise independent.

There remains to handle the case where \sum _{i\le n}\mathbb{E} [1-x_{i}]>k/2e. In this case, the expectations are so small that even running the above argument over a subset of the random variables is enough. Specifically, let n' be such that \sum _{i\le n'}\mathbb{E} [1-x_{i}]=k/2e (more formally you can find an n' such that this holds up to an additive 1, which is enough for the argument; but I will ignore this for simplicity). Then the above argument still applies to the first n' variables. Moreover, the expectation of the product of just these n' variables is already fairly small. Specifically, because the geometric mean is always at most the arithmetic mean (a fact which is closely related to Maclaurin’s inequality), we have:

\begin{aligned} \prod _{i\le n'}\mathbb{E} [x_{i}]\le (\frac {1}{n'}\sum _{i\le n'}\mathbb{E} [x_{i}])^{n'}\le (\frac {1}{n'}(n'-k/2e))^{n'}\le (1-k/(2en'))^{n'}\le 2^{-\Omega (k)}. \end{aligned}

Since under any distribution the expectation of the product of the first n variables is at most the expectation of the product of the first n', the result follows. \square
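As a quick sanity check on the truncation step (the Bonferroni bounds behind Equation (1)), here is a small brute-force Python computation on a random joint distribution over \{0,1\}^n for a tiny n. It is only an illustration, not part of the proof.

```python
import itertools, random

n = 5
points = list(itertools.product([0, 1], repeat=n))
weights = [random.random() for _ in points]
prob = {p: x / sum(weights) for p, x in zip(points, weights)}  # a random joint distribution

def E(f):
    return sum(prob[p] * f(p) for p in points)

exact = E(lambda p: all(p))  # E[prod_i y_i]
# T_j = (-1)^(j+1) * sum_{|S|=j} E[prod_{i in S} (1 - y_i)], as in the proof
T = [(-1) ** (j + 1) * sum(E(lambda p, S=S: all(1 - p[i] for i in S))
                           for S in itertools.combinations(range(n), j))
     for j in range(1, n + 1)]

for t in range(1, n + 1):
    truncated = 1 - sum(T[:t])
    # odd t gives a lower bound on E[prod_i y_i], even t an upper bound
    print(t, round(truncated, 4), round(exact, 4))
```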

An interesting fact is that this theorem is completely false if the variables are over \{-1,1\} instead of \{0,1\}, even if the independence is n-1. The counterexample is well-known: parity. Specifically, take x_{1},\ldots ,x_{n-1} uniform and independent, and let x_{n}=\prod _{i<n}x_{i}. What may be slightly less known is that this parity counterexample also shows that the error term in the theorem is tight up to the constant in the \Omega . This is because you can write any function on t bits as the disjoint union of 2^{t} rectangles.
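Here is the counterexample spelled out in a few lines of Python (only an illustration): with x_n equal to the product of the other coordinates, any n-1 of the coordinates are uniform and independent, yet the expectation of the product is 1 while the product of the expectations is 0.

```python
import itertools
from math import prod

n = 4
# x_1,...,x_{n-1} uniform and independent in {-1,1}, and x_n = their product
support = [bits + (prod(bits),) for bits in itertools.product([-1, 1], repeat=n - 1)]

lhs = sum(prod(x) for x in support) / len(support)                       # E[prod_i x_i] = 1
rhs = prod(sum(x[i] for x in support) / len(support) for i in range(n))  # prod_i E[x_i] = 0
print(lhs, rhs)  # 1.0 0.0: the gap is 1, so no analogue of the theorem holds over {-1,1}
```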

A side aim of this post is to try a new system I put together to put math on wordpress. I’ll test it some more and then post about it later, comparing it with the alternatives. Hopefully, my thinking that it was the lack of this system that prevented me from writing math-heavier posts was not just an excuse to procrastinate.

References

[1]   Guy Even, Oded Goldreich, Michael Luby, Noam Nisan, and Boban Velickovic. Approximations of general independent distributions. In ACM Symp. on the Theory of Computing (STOC), pages 10–16, 1992.

ECCC as a zero-formatting “publisher” for CCC proceedings?

Background: After going solo, the CCC conference is using LIPIcs as a “publisher” for the papers accepted to the conference. This involves a non-trivial amount of formatting (to put the papers in their format) and also some monetary costs.

I would like to use the opportunity that CCC is going solo to move to a model where the “publishing” involves *zero* effort from authors. This could be a selling point for the conference, and maybe set an example for others.

Specifically, in the vein of previous posts, I propose that authors of accepted papers simply send the .pdf of their paper in whatever format they like. The CCC people take care of placing a stamp “CCC 20xx camera-ready” and putting the paper on the ECCC. Papers with indecent formatting are treated exactly as papers with indecent introductions.

Disclaimer: although I am on the reviewing board of ECCC I had no discussions with the ECCC people about this.

The main benefits of ECCC are:

– Submission is painless: just send the .pdf! Again, authors can write their paper in whatever format they like.

– Indexed by DBLP

– It’s run by “us”, it’s about computational complexity and in fact “Under the auspices of the Computational Complexity Foundation (CCF)”

– It has an ISSN number (1433-8092). I am told this is important for some institutions, though I don’t know if some insist on ISBN over ISSN. If they do, perhaps there’s a way to get that too?

– They do various nice things already, like archiving papers on CDs, etc. In fact, going back to the ISBN issue, couldn’t we simply assign an ISBN to the reports from each year?

– It has no cost (given that ECCC already exists).

Another option is to use arxiv or an arxiv overlay. This would also be better than using LIPIcs, I think, but it does not enjoy many of the benefits above.

FOCS PC

I am on the FOCS program committee. The deadline for submission is in four days, and I look forward to reading in detail about the exciting progress that theory is seeing!  (To pick a random exciting paper: this.)  This time around I will also run a small private experiment and report on it after the decisions.  I was looking briefly for an online system that would allow me to commit to the experiment with the usual cryptographic guarantees, but couldn’t find one.  I thought maybe I could commit now to using a definite future value, say tomorrow’s closing value of the S&P500, as a seed for a commitment scheme.  It would be overkill in this case, because the experiment I have in mind is not so important, perhaps, but in other scenarios it may be useful to be able to do this without much effort.

The Day Care Slavery

There will be a time in my life when I won’t have to worry about day cares anymore. That time will be enormously sad, but I will still find a way to rejoice, because it will be the end of the Day Care Slavery. You hear about the surging cost of college, you hear about the gender gap. But you don’t hear as much about the Day Care Slavery.

If you are planning to start a family my best advice is to start looking at day cares before even trying, and to call your top picks the moment you find out you are expecting, before you call anyone else. A typical wait time for a spot in a day care is… two years! (This reminds me of a great scene in an Italian comedy film — Fantozzi, probably not well known outside of Italy — where Fantozzi bribes a doctor to get a spot in the hospital for his daughter’s abortion. The doctor agrees and says it will take two years for the spot. At which Fantozzi replies that gestation lasts only nine months. “But then your case is desperate!” says the doctor.)

Think of a family which is relocating. The hassle of moving pales compared to these wait lists.

Sure, there are other options with a faster turnaround. They are generally incredibly worse. There is also pretty substantive research showing that quality day care has a disproportionate effect on life-long “success”. (Success is measured according to typical indicators, whence the quotes.) A quick search pointed me to many interesting papers that back this up. The keyword to look for is “early childhood education”. I’ll just give a few meta-references (that is, popular-press articles that summarize the research findings):

Long-Term Effects of Early Childhood Programs on Cognitive and School Outcomes, Preschool education boosts children’s academic success, research finds, Preschool Education and Its Lasting Effects: Research and Policy Implications, and finally The Sooner The Better: Early Childhood Education, A Key To Life-Long Success.

The day cares must know all of this, because a full-time infant spot in a day care (good Boston-area location) costs more than… $3K/month! Higher than any rent I have ever paid (and I spent one year in a not bad place in Chelsea, Manhattan).

It is not uncommon that one year of day care will cost you more than a year of college. Plus there are many ways in which you can save for college. It is hard but not impossible to get a scholarship for attending a school, but try getting one for a preschool. And you can always work to pay for college. Infants can’t do that. Colleges themselves offer many options, like being a teaching or research assistant, to help pay for education.

What do you get for the hefty tuition of a day care, by the way? You get to shut the infant in a windowless basement for the entire day. There is the usual issue that they are liable if a child trips outside and scratches a knee, but no method has yet been FDA-approved to measure the brain damage ensuing from spending 10 hours in a basement during summertime. (Naturally, they promise that the child will go out multiple times *every* day.) You also have to bring lunch, and provide diapers and wipes.

Frankly, I find all of this ridiculous. Think about it. When your child hits 5 years, they are guaranteed a free spot in public schools. Until then, you’re on your own. You are not guaranteed a spot in a day care, and if you are lucky enough to get one it costs a fortune. The government’s support at this crucial juncture is this: you can save taxes on $5K spent towards day cares. That’s about $1500, less than half of one month of tuition. And naturally for this you have to fill out some extra paperwork, and have a little less cash for a year (you get it all back after that). Oh and by the way, if you end up not using the money, sorry, everything’s gone.

Providing quality, affordable day care is one of the most effective ways to achieve many desired goals, including life-long “success” and reducing the gender gap.

Stoner

The beginning of spring is a busy time in academia. Many committees reach their climax and require immediate action. At the same time, here in Massachusetts the snowstorms that survive climate change carefully destroy schedules that took, and will take, more weeks to prepare. But one thing that I have managed to do between filling out Doodle forms is finish Stoner. The book was written, and went more or less unnoticed, in the 60’s; it was then reprinted in the current millennium and became a success. (I owe my own discovery to my mom, who shipped me the Italian translation as a present.)

If you ever wonder about the meaning of life, especially academic life, I recommend this book. I almost put it away because I found the beginning a little boring, but I am happy I did not, even though the book isn’t exactly uplifting.

Black Viper

Over at the Amigospodcast, dreamkatcha posted a series of five posts on the defunct Italian software house Lightshock Software. This post is about Black Viper, a game which I coded up back in the 90’s. The last post is about what happened to us afterwards. I contributed the following to the posts (which you can also read there among nice pictures):


Black Viper is the second game I coded up, after Nathan Never. I vividly remember when I turned down an offer to work on Nathan Never 2. In hindsight, I think it was a mistake. True, my experience with Nathan Never had not been very pleasant. The producers did not pay half of what was promised in the contract (something for which I subsequently sued them, in vain). The person who most closely directed the work did not have much experience with games, but understandably was mostly interested in the publicity that would come from the title (Nathan Never is a popular comic book in Italy). I was also threatened with extortion if I didn’t finish the game in time.

But on the positive side, after some difficulty they had shipped me an A3000T to complete the game. And most importantly of all, the game got done in six months! Black Viper instead took three ominous years.

But when I made the decision I was 15 and I really had no idea. I had always wanted to work on a beat’em up, and at some point Marco Genovesi and I even had a small demo. My impression is that we ended up working on a bike game because neither of us liked it. We did like the post-apocalyptic atmosphere though.

So we started working on this project again during high school (the game finally hit the shelves during my first year of college). Our initial name was “Dark Blade” and the game had disproportionate depth, including bets, tournaments, and races against radioactive magma. It was again great to work with Marco, one of the most talented persons I have ever met. We were in the same classroom in high school every day (Italian high school has, or at least had, a rigid system thanks to which you spent all your time with the same people). Classrooms had desks for two people, and I think the first year Marco and I even shared a desk. If I remember correctly we were both somewhat timid and we naturally ended up together (but here I may be stretching my memory). We would exchange floppy disks in class and generally chat about the game and sketch ideas. Interaction wasn’t always easy, but it was always rewarding.

Our professors took note of what we were doing, and said nothing. I believe in almost any other environment they would have jumped on the students and pushed them hard to reach their full potential.

Soon, completing Dark Blade without a software house, however horrible, proved quite difficult. I don’t remember exactly how we got to Lightshock Software. I think it was through some personal connection. In any case, at some point this software house materialized, they liked the game, and they had some connection to sell it in the broader European market through NEO. So we decided to go with them. We had quite a few fun trips from Rome (where we were based) to Prato (where Lightshock Software was) to discuss the game and our subsequent expansion and world domination. It was also fun to connect with more game developers. They worked with us to make the game better, and to add some extras for the AGA and CD32 versions.

One fond memory I have is of when I needed help to hunt down a bug. Basically, the game would occasionally hang. I had never given it much thought, since it was quite rare. But, of course, the last version of the game systematically crashed exactly at the last level. It’s hard to imagine anything worse. First, reproducing the bug wasn’t the easiest thing. Second, the code was three years old, and it consisted of a single, colossal file in assembly, commented as thoroughly as only a 15-year-old can.

So it was arranged for me to go spend some days at their headquarters and work with Marco Biondi, another Amiga coder. He hosted me at his quaint house in Florence, and together we had a few days of intense debugging. He had no knowledge of the code whatsoever, of course, and was horrified to see certain parts of it which reflected my complete lack of training (my only source had been the Amiga Hardware Reference Manual, brought to me years earlier by Marco Genovesi when I had sprained my ankle jumping from the top of a swing). But Marco Biondi was quite helpful. Eventually, we were able to freeze the machine right before the crash, and we went through one assembly instruction at a time. Amazingly, at some point the instructions became complete gibberish. We exulted. Someone entered the room and proposed something, perhaps going out. I remember Marco Biondi saying no, now “lo teniamo per le palle” (we grab it by the balls).
We looked at how long the gibberish code was, and it turned out to be exactly the length of one of the rectangular elements that made up the road. This quickly led us to investigate the code that builds the road. And there I found an assembly subroutine where the register d0 was used instead of a0, which, if I remember correctly, would cause problems with the carry-around when you increment it. We fixed it and the game didn’t crash anymore. Amazingly, that subroutine was one of the very first I had written. Throughout the day, we had been continuously playing the same clip of about ten seconds of a heavy metal song.

I think my share of the sales amounted to the equivalent of $500 today, much less than what I had made with Nathan Never.

Towards the end of Black Viper, and after it, I also worked with other people on a 3D engine. We used things like binary-space-partition trees, and had a demo working on both PC and Amiga. But then it quickly became clear that the only way to even hope to produce anything competitive would be to work on the project full time, which was also difficult because we worked in different parts of Rome and there was no internet. It had been much more convenient to share floppy disks in class with Marco Genovesi!

At some point I also got an offer from Lightshock Software to move to their Belluno headquarters to work full time on programming games. They promised a hefty salary and considerable freedom. I thought about it, but in the end I did not go for it. I was in the middle of my college studies, which were going well, and I didn’t feel like abandoning them (my parents also advised me against it). Again I remember the phone call during which I turned them down, perhaps another mistake. I had always wondered what had happened to Lightshock Software!

Afterwards I was in touch with other wannabe software houses, but it was clear that they did not have the capacity that Lightshock Software had, and I think they did not end up producing any games.

So I completed my studies, I got interested in mathematics and theoretical computer science, and ended up doing a Ph.D. at Harvard University, and have stayed in the US ever since. I still program, and occasionally I toy with the idea of being involved again with a computer game (besides as a player of course, that has never stopped).

Matrix rigidity, and all that

The rigidity challenge asks to exhibit an n × n matrix M that cannot be written as M = A + B where A is “sparse” and B is “low-rank.” This challenge was raised by Valiant who showed in [Val77] that if it is met for any A with at most n^{1+ϵ} non-zero entries and any B with rank O(n/ log log n) then computing the linear transformation M requires either logarithmic depth or superlinear size for linear circuits. This connection relies on the following lemma.


Lemma 1. Let C : {0,1}^n → {0,1}^n be a circuit made of XOR gates. If you can remove e edges and reduce the depth to d then the linear transformation computed by C equals A + B where A has ≤ 2^d non-zero entries per row (and so a total of ≤ n·2^d non-zero entries), and B has rank ≤ e.


Proof: After you remove the edges, each output bit is a linear combination of the removed edges and at most 2^d input variables. The former can be done by B, the latter by A. QED

Valiant shows that in a log-depth, linear-size circuit one can remove O(n/ log log n) edges to reduce the depth to n^ϵ (a proof can be found in [Vio09]) and this gives the above connection to lower bounds.

However, the best available tradeoffs for explicit matrices give sparsity (n^2/r) log(n/r) and rank r, for any parameter r; and this is not sufficient for the application to lower bounds.

Error-correcting codes

It was asked whether generator matrices of good linear codes are rigid. (A code is good if it has constant rate and constant relative distance. The dimensions of the corresponding matrices are off by only a constant factor, and so we can treat them as identical.) Spielman [Spi95] shows that there exist good codes that can be encoded by linear-size logarithmic-depth circuits. This immediately rules out the possibility of proving a lower bound, and it gives a non-trivial rigidity upper bound via the above connections.

Still, one can ask if these matrices at least are more rigid than the available tradeoffs. Goldreich reports a negative answer by Dvir, showing that there exist good codes whose generating matrix C equals A + B where A has at most O(n^2/d) non-zero entries and B has rank O(d log n/d), for any d.

A similar negative answer follows from the paper [GHK+13]. There we show that there exist good linear codes whose generating matrix can be written as the product of few sparse matrices. The corresponding circuits are very structured, and so perhaps it is not surprising that they give good rigidity upper bounds. More precisely, the paper shows that we can encode an n-bit message by a circuit made of XOR gates with, say, n log*(n) wires and depth O(1), with unbounded fan-in. Each gate in the circuit computes the XOR of some t gates, which can be written as a binary tree of depth log_2(t) + O(1). Such trees have poor rigidity:


Lemma 2. [Trees are not rigid] Let C be a binary tree of depth d. You can remove an O(1/2^b) fraction of edges to reduce the depth to b, for any b.


Proof: It suffices to remove all edges at depths d - b, d - 2b, …. The number of such edges is O(2^{d-b} + 2^{d-2b} + …) = O(2^{d-b}). Note this includes the case d ≤ b, where we can remove 0 edges. QED
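Here is a quick numerical illustration of Lemma 2, a small Python sketch that just counts edges of a complete binary tree of depth d:

```python
def removed_fraction(d, b):
    # cut every edge entering a node at depth d-b, d-2b, ...; 2^t edges enter depth t
    removed = sum(2 ** t for t in range(d - b, 0, -b))
    total = 2 ** (d + 1) - 2  # number of edges of a complete binary tree of depth d
    return removed / total

for d, b in [(10, 3), (20, 5), (6, 10)]:
    print(d, b, removed_fraction(d, b), 2 ** -b)  # the fraction is O(1/2^b), as in the lemma
```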

Applying Lemma 2 to a gate in our circuit, we reduce the depth of the binary tree computed at that gate to b. Applying this to every gate we obtain a circuit of depth O(b). In total we have removed an O(1/2^b) fraction of the n log*(n) edges.

Writing 2^b = n/d, by Lemma 1 we can write the generating matrix of our code as C = A + B where A has at most O(n/d) non-zero entries per row, and B has rank O(d log*(n)). These parameters are the same as in Dvir’s result, up to lower-order terms. The lower-order terms appear incomparable.

Walsh-Fourier transform

Another matrix that was considered is the n×n Inner Product matrix H, aka the Walsh-Hadamard matrix, where the x,y entry is the inner product of x and y modulo 2. Alman and Williams [AW16] recently gave an interesting rigidity upper bound which prevents this machinery from establishing a circuit lower bound. Specifically they show that H can be written as H = A + B where A has at most n^{1+ϵ} non-zero entries, and B has rank n^{1-ϵ′}, for any ϵ and an ϵ′ which goes to 0 when ϵ does.

Their upper bound works as follows. Let h = log_2 n. Start with a real polynomial p(z_1, z_2, …, z_h) which computes parity exactly on inputs of Hamming weight between 2ϵn and (1/2 + ϵ)n. By interpolation such a polynomial exists with degree (1/2 - ϵ)n. Replacing z_i with x_i y_i you obtain a polynomial of degree n - ϵn which computes IP correctly on inputs x, y whose inner product is between 2ϵn and (1/2 + ϵ)n.

This polynomial has 2^{(1-ϵ′)n} monomials, where ϵ′ = Ω(ϵ^2). The truth-table of a polynomial with m monomials is a matrix with rank at most m, and this gives a low-rank matrix B′.
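The monomials-to-rank step is easy to check numerically. Below is a toy Python example (the monomial set is made up; this is not the actual [AW16] polynomial) confirming that the truth-table matrix of a polynomial in the products x_i y_i has rank at most the number of monomials.

```python
import numpy as np
from itertools import product

n = 6
monomials = [(0, 2), (1, 3, 4), (5,), (0, 1, 2, 3)]  # one subset S per monomial prod_{i in S} x_i y_i
inputs = list(product([0, 1], repeat=n))

# M[x][y] = p(x,y) with all coefficients equal to 1; each monomial contributes a rank-1 matrix
M = np.array([[sum(all(x[i] * y[i] for i in S) for S in monomials) for y in inputs]
              for x in inputs], dtype=float)
print(np.linalg.matrix_rank(M), "<=", len(monomials))  # prints 4 <= 4 for this 64 x 64 matrix
```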

The fact that sparse polynomials yield low-rank matrices also appeared in the paper [SV12], which suggested studying the rigidity challenge for matrices arising from polynomials.

Returning to the proof in [AW16], it remains to deal with inputs whose inner product does not lie in that range. The number of x whose weight is not between (1/2 - ϵ)n and (1/2 + ϵ)n is 2^{(1-ϵ′)n}. For each such input x we modify a row of the matrix B′. Repeating the process for the y we obtain the matrix B, and the rank bound 2^{(1-ϵ′)n} hasn’t changed.

Now a calculation shows that B differs from H in few entries. That is, there are few x and y with Hamming weight between (1/2 - ϵ)n and (1/2 + ϵ)n, but with inner product less than 2ϵn.

Boolean complexity

There exists a corresponding framework for boolean circuits (as opposed to circuits with XOR gates only). Rigid matrices informally correspond to depth-3 Or-And-Or circuits. If this circuit has fan-in f_o at the output gate and fan-in f_i at each input gate, then the correspondence in parameters is

rank = log f_o,
sparsity = 2^{f_i}.

More precisely, we have the following lemma.


Lemma 3. Let C : {0,1}^n → {0,1}^n be a boolean circuit. If you can remove e edges and reduce the depth to d then you can write C as an Or-And-Or circuit with output fan-in 2^e and input fan-in 2^d.


Proof: After you remove the edges, each output bit and each removed edge depends on at most 2^d input bits or removed edges. The output Or gate of the depth-3 circuit is a big Or over all 2^e assignments of values for the removed edges. Then we need to check consistency. Each consistency check just depends on 2^d inputs and so can be written as a depth-2 circuit with fan-in 2^d. QED

The available bounds are of the form log f_o = n/f_i. For example, for input fan-in f_i = n^α we have lower bounds exponential in n^{1-α} but not more. Again it can be shown that breaking this tradeoff in certain regimes (namely, log_2 f_o = O(n/ log log n)) yields lower bounds against linear-size log-depth circuits. (A proof appears in [Vio09].) It was also pointed out in [Vio13] that breaking this tradeoff in any regime yields lower bounds for branching programs. See also the previous post.

One may ask how pairwise independent hash functions relate to this challenge. Ishai, Kushilevitz, Ostrovsky, and Sahai showed [IKOS08] that they can be computed by linear-size log-depth circuits. Again this gives a non-trivial upper bound for depth-3 circuits via these connections, and one can ask for more. In [GHK+13] we give constructions of such circuits which in combination with Lemma 3 can again be used to almost match the available trade-offs.

The bottom line of this post is that we can’t prove lower bounds because they are false, and it is a puzzle to me why some people appear confident that P is different from NP.

References

[AW16]    Josh Alman and Ryan Williams. Probabilistic rank and matrix rigidity, 2016. https://arxiv.org/abs/1611.05558.

[GHK+13]   Anna Gál, Kristoffer Arnsfelt Hansen, Michal Koucký, Pavel Pudlák, and Emanuele Viola. Tight bounds on computing error-correcting codes by bounded-depth circuits with arbitrary gates. IEEE Transactions on Information Theory, 59(10):6611–6627, 2013.

[IKOS08]    Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography with constant computational overhead. In 40th ACM Symp. on the Theory of Computing (STOC), pages 433–442, 2008.

[Spi95]    Daniel Spielman. Computationally Efficient Error-Correcting Codes and Holographic Proofs. PhD thesis, Massachusetts Institute of Technology, 1995.

[SV12]    Rocco A. Servedio and Emanuele Viola. On a special case of rigidity. Available at http://www.ccs.neu.edu/home/viola/, 2012.

[Val77]    Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. In 6th Symposium on Mathematical Foundations of Computer Science, volume 53 of Lecture Notes in Computer Science, pages 162–176. Springer, 1977.

[Vio09]    Emanuele Viola. On the power of small-depth computation. Foundations and Trends in Theoretical Computer Science, 5(1):1–72, 2009.

[Vio13]    Emanuele Viola. Challenges in computational lower bounds. Available at http://www.ccs.neu.edu/home/viola/, 2013.

Mixing in groups, II

In the previous post we have reduced the “three-step mixing” over SL(2,q), the group of 2×2 matrices with determinant 1 over the field with q elements, to the following statement about mixing of conjugacy classes.


Theorem 1. [Mixing of conjugacy classes of SL(2,q)] Let G = SL(2,q). With probability ≥ 1 - |G|^{-Ω(1)} over uniform a, b in G, the distribution C(a)C(b) is |G|^{-Ω(1)} close in statistical distance to uniform.


Here and throughout this post, C(g) denotes a uniform element from the conjugacy class of g, and every occurrence of C corresponds to an independent draw.

In this post we sketch a proof of Theorem 1, following [GV15]. Similar theorems have been proved before. For example Shalev [Sha08] proves a version of Theorem 1 without a quantitative bound on the statistical distance. It is possible to plug some more representation-theoretic information into Shalev’s proof and get the same quantitative bound as in Theorem 1, though I don’t have a good reference for this extra information. However the proof in [Sha08] builds on a number of other things, which also means that if I had to prove a similar but not identical statement, as we had to do in [GV15], it would be hard for me.

Instead, here is how you can proceed to prove the theorem. First, we remark that the distribution of C(a)C(b) is the same as that of

C(C(a)C(b)),

because for uniform x, y, and z in G we have the following equalities of distributions:

C(C(a)C(b)) = x^{-1}(y^{-1}ay z^{-1}bz)x = (x^{-1}y^{-1}ayx)(x^{-1}z^{-1}bzx) = C(a)C(b)

where the last equality follows by replacing y with yx and z with zx.

That means that we get one conjugation “for free” and we just have to show that C(a)C(b) falls into various conjugacy classes with the right probability.

Now the great thing about SL(2,q) is that you can essentially think of it as made up of q conjugacy classes, each of size q^2 (the whole group has size q^3 - q). This is of course not exactly correct; in particular the identity element obviously gives a conjugacy class of size 1. But except for a constant number of conjugacy classes, every conjugacy class has size q^2 up to lower-order terms. This means that what we have to show is simply that the conjugacy class of C(a)C(b) is essentially uniform over conjugacy classes.
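This can be verified by brute force for small q. Here is a Python sketch (assuming q prime, so that arithmetic mod q is a field) that lists the conjugacy class sizes of SL(2,q):

```python
import itertools

def sl2(q):  # all 2x2 matrices (a, b, c, d) over Z_q with determinant 1; q is assumed prime
    return [m for m in itertools.product(range(q), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % q == 1]

def class_sizes(q):
    G = sl2(q)
    mul = lambda x, y: ((x[0]*y[0] + x[1]*y[2]) % q, (x[0]*y[1] + x[1]*y[3]) % q,
                        (x[2]*y[0] + x[3]*y[2]) % q, (x[2]*y[1] + x[3]*y[3]) % q)
    inv = lambda x: (x[3], -x[1] % q, -x[2] % q, x[0])  # valid because det = 1
    seen, sizes = set(), []
    for g in G:
        if g not in seen:
            cls = {mul(mul(u, g), inv(u)) for u in G}  # the conjugacy class of g
            seen |= cls
            sizes.append(len(cls))
    return sorted(sizes)

print(len(sl2(5)), class_sizes(5))  # |SL(2,5)| = 120 = 5^3 - 5; compare the class sizes with q^2 = 25
```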

Next, the trace map Tr : SL(2,q) → F_q is essentially a bijection between conjugacy classes and the field F_q. To see this recall that the trace map satisfies the cyclic property:

Tr xyz = Tr yzx.

This implies that

Tr u^{-1}au = Tr auu^{-1} = Tr a,

and so conjugate elements have the same trace. On the other hand, the q matrices

x  1

1  0

for x in F_q all have different traces, and by what we said above their conjugacy classes make up essentially all of the group.

Putting it all together, what we are trying to show is that

Tr C(a)C(b)

is |G|^{-Ω(1)} close to uniform over F_q in statistical distance.

Furthermore, again by the cyclic property we can consider without loss of generality

Tr aC(b)

instead, and moreover we can let a have the form

0  1

1  w

and b have the form

v  1

1  0

(there is no important reason why w is at the bottom rather than at the top).

Writing a generic g in SL(2,q) as the matrix

u_1   u_2

u_3   u_4

you can now with some patience work out the expression

Tr a u b u^{-1} = v u_3 u_4 - u_3^2 + u_4^2 - v u_1 u_2 + u_1^2 - v w u_2 u_3 + w u_1 u_3 - u_2^2 - w u_2 u_4.
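This can be double-checked symbolically. Here is a short sympy sketch (using the u b u^{-1} form of the conjugation, which gives the same distribution over the class of b):

```python
import sympy as sp

u1, u2, u3, u4, v, w = sp.symbols('u1 u2 u3 u4 v w')
a = sp.Matrix([[0, 1], [1, w]])
b = sp.Matrix([[v, 1], [1, 0]])
u = sp.Matrix([[u1, u2], [u3, u4]])
u_inv = sp.Matrix([[u4, -u2], [-u3, u1]])  # the inverse of u when det(u) = u1*u4 - u2*u3 = 1

print(sp.expand((a * u * b * u_inv).trace()))
# equals the displayed expression, up to the order of the terms
```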

What we want to show is that for typical choices of w and v, the value of this polynomial is q^{-Ω(1)} close to uniform over F_q for a uniform choice of u subject to the determinant of u being 1, i.e., u_1u_4 - u_2u_3 = 1.

Maybe there is some machinery that immediately does that. Lacking the machinery, you can use the equation u_1u_4 - u_2u_3 = 1 to remove u_4 by dividing by u_1 (the cases where u_1 = 0 are few and do not affect the final answer). Now you end up with a polynomial p in three variables, which we can rename x, y, and z. You want to show that p(x,y,z) is close to uniform, for uniform choices of x, y, and z. The benefit of this substitution is that we removed the annoying condition that the determinant is one.

To argue about p(x,y,z), the DeMillo-Lipton-Schwartz-Zippel lemma comes to mind, but it is not sufficient for our purposes. It is consistent with that lemma that the polynomial doesn’t take a constant fraction of the values of the field, which would give constant statistical distance. One has to use a more powerful result known as the Lang-Weil theorem. This theorem provides, under suitable conditions on p, a sharp bound on the probability that p(x,y,z) = a for a fixed a in F_q. The probability is 1/q plus lower-order terms, and by summing over all a in F_q one obtains the desired bound on the statistical distance.

I am curious if there is a way to get the statistical distance bound without first proving a point-wise bound.

To apply the Lang-Weil theorem you have to show that the polynomial is “absolutely irreducible,” i.e., irreducible over any algebraic extension of the field. This can be proven from first principles by a somewhat lengthy case analysis.

References

[GV15]   W. T. Gowers and Emanuele Viola. The communication complexity of interleaved group products. In ACM Symp. on the Theory of Computing (STOC), 2015.

[Sha08]   Aner Shalev. Mixing and generation in simple groups. J. Algebra, 319(7):3075–3086, 2008.