In the spirit of Reingold’s research-life stories, the series “behind the paper” collects snapshots of the generation of papers. For example, did you spend months proving an exciting bound, only to discover it was already known? Or what was the key insight which made everything fit together? Records of this baffling process are typically expunged from research publications. This is a place for them. The posts will have a technical component.

The classical Cook-Levin reduction of non-deterministic time to 3SAT can be optimized along two important axes.

Axis 1: The size of the 3SAT instance. The tableau proof reduces time T to a 3SAT instance of size O(T^{2}), but this has been improved to a quasilinear T x polylog(T) in the 70s and 80s, notably using the oblivious simulation by Pippenger and Fischer, and the non-deterministic, quasilinear equivalence between random-access and sequential machines by Gurevich and Shelah.

Axis 2: The complexity of computing the 3SAT instance. If you just want to write down the 3SAT instance in time poly(T), the complexity of doing so is almost trivial. The vast majority of the clauses are fixed once you fix the algorithm and its running time, while each of the rest depends on a constant number of bits of the input to the algorithm.

However, things get more tricky if we want our 3SAT instance to enjoy what I’ll call **clause-explicitness**. Given an index i to a clause, it should be possible to compute that clause very efficiently, say in time polynomial in |i| = O(log T), which is logarithmic in the size of the formula. Still, it is another classical result that it is indeed possible to do so, yielding for example the NEXP completeness of succinct-3SAT (where your input is a circuit describing a 3SAT instance). More uses of clause explicitness can be found in a 2009 paper by Arora, Steurer, and Wigderson, where they show that interesting graph problems remain hard even on exponential-size graphs that are described by poly-size AC^{0} circuits.

I got more interested in the efficiency of reductions after Williams’ paper Improving exhaustive search implies superpolynomial lower bounds, because the in-efficiency of available reductions was a bottleneck to the applicability of his connection to low-level circuit classes. Specifically, for a lower bound against a circuit class C, one needed a reduction to 3SAT that both has quasilinear blowup and is **C-clause-explicit**: computing the ith clause had to be done by a circuit from the class C on input i. For one thing, since previous reductions where at best NC^{1}-clause-explicit, the technique wouldn’t apply to constant-depth classes.

I had some ideas how to obtain an AC^{0}-clause-explicit reduction, when Williams’ sequel came out. This work did not employ more efficient reductions, instead it used the classical polynomial-size-clause-explicit reduction as a black-box together with an additional argument to more or less convert it to a constant-depth-clause-explicit one. This made my preliminary ideas a bit useless, since there was a bypass. However disappointing, a lot worse was to come.

I was then distracted by other things, then eventually returned to the topic. I still found it an interesting question whether a very clause-explicit reduction could be devised. First, it would remove Williams’ bypass, resulting in a possibly more direct proof. Second, the in-efficiency of the reduction was **still** a bottleneck to obtain further lower bounds (more on this later).

The first step for me was to gain a deeper understanding of the classical quasilinear-size reduction — ignoring clause explicitness — so I ran a mini-polymath project in a Ph.D. class at NEU. The result is this survey, which presents a proof using sorting networks that may be conceptually simpler than the one based on Pippenger and Fischer’s oblivious simulation. The idea to use sorting is from the paper by Gurevich and Shelah, but if you follow the reductions without thinking you will make the sorting oblivious using the general, complicated simulation. About one hour after posting the fruit of months of collaborative work on ECCC, we are notified that this is Dieter van Melkebeek’s proof from Section 2.3 in his survey, and that this is the way he’s been teaching it for over a decade. This was a harder blow, yet worse was to come.

On the positive side, I am happy I have been exposed to this proof, which is strangely little-known. Now I never miss an opportunity to teach my students

**THE TRUE REASON WHY SORTING IS IMPORTANT:**

**BECAUSE IT SHOWS UP IN THE COOK-LEVIN REDUCTION.**

To try to stay positive I’ll add that our survey has reason to exist, perhaps, because it proves some technicalities that I cannot find elsewhere, and for completeness covers the required sorting network which has disappeared from standard algorithms textbooks.

Armed with this understanding, we went back to our original aim, and managed to show that reductions can be made constant-locality clause-explicit: each bit of the ith clause depends only on a constant number of bits of the index i. Note with constant locality you can’t even add 1 to the input in binary. This is a joint work with two NEU students: Hamid Jahanjou and Eric Miles. Eric will start a postdoc at UCLA in September.

# The proof

Our first and natural attempt involved showing that the sorting network has the required level of explicitness, since that network is one of the things encoded in the SAT instance. We could make this network pretty explicit (in particular, DNF-clause-explicit). Kowalski and Van Melkebeek independently obtained similar results, leading to an AC^{0}-clause-explicit reduction.

But we could not get constant locality, no matter how hard we dug in the bottomless pit of different sorting algorithms… on the bright side, when I gave the talk at Stanford and someone whom I hadn’t recognized asked “why can’t you just use the sorting algorithm in *my thesis?”* I knew immediately who this person was and what he was referring to. Can you guess?

Then a conversation with Ben-Sasson made us realize that sorting was an overkill, and that we should instead switch to switching networks, as has long been done in the PCP literature, starting, to my knowledge, with the ’94 work of Polishchuk and Spielman. Both sorting and switching networks are made of nodes that take two inputs and output either the same two, or the two swapped. But whereas in sorting networks the node is a deterministic comparator, in switching networks there is an extra switch bit to select whether you should swap or not. Thanks to this relaxation the networks can be very simple. So this is the type of network that appears in our work.

Sorting isn’t all that there is to it. One more thing is that any log-space uniform circuit can be made constant-locality uniform, in the sense that given an index to a gate you can compute its children by a map where each output bit depends on a constant number of input bits. The techniques to achieve this are similar to those used in various equivalences between uniformity conditions established by Ruzzo in the 1979-1981 paper *On Uniform Circuit Complexity*, which does not seem to be online. Ruzzo’s goal probably was not constant locality, so that is not established in his paper. This requires some more work; for one thing, with constant locality you can’t check if your input is a valid index to a gate or a junk string, so you have to deal with that.

Of course, in the 3rd millennium we should not reduce merely to SAT, but to GAP-SAT. In a more recent paper with Ben-Sasson we gave a variant of the BGHSV PCP reduction where each query is just a projection of the input index (and the post-process is a 3CNF). Along the way we also get a reduction to 3SAT that is not constant-locality clause-explicit, but after you fix few bits it becomes locality-1 clause-explicit. In general, it is still an open problem to determine the minimum amount of locality, and it is not even clear to me how to rule out locality 1.

One thing that this line of works led to is the following. Let the complexity of 3SAT be c^{n}. The current (deterministic) record is

c < 1.34…

We obtain that if

c < 1.10…

then you get some circuit lower bounds that, however modest, we don’t know how to prove otherwise.