Matrix rigidity, and all that

The rigidity challenge asks one to exhibit an n × n matrix M that cannot be written as M = A + B where A is “sparse” and B is “low-rank.” This challenge was raised by Valiant, who showed in [Val77] that if it is met for any A with at most n^{1+ϵ} non-zero entries and any B with rank O(n/ log log n), then computing the linear transformation M requires either super-logarithmic depth or super-linear size for linear circuits. This connection relies on the following lemma.


Lemma 1. Let C : {0,1}^n → {0,1}^n be a circuit made of fan-in-2 XOR gates. If you can remove e edges and reduce the depth to d, then the linear transformation computed by C equals A + B, where A has ≤ 2^d non-zero entries per row (and so a total of ≤ n·2^d non-zero entries), and B has rank ≤ e.


Proof: After you remove the edges, each output bit is a linear combination of the removed edges and at most 2^d input variables. The former can be done by B, the latter by A. QED
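
For concreteness, here is a toy check of Lemma 1 in Python (a sketch, not part of the original argument): it builds a small random layered XOR circuit, removes a few edges at random, and verifies over GF(2) that the computed map M splits as A + B, with A obeying the per-row bound and B factoring through the e removed edges (hence rank at most e). The circuit shape and the removed edges are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n, L = 8, 4                      # inputs/outputs and number of gate layers

    # Random layered fan-in-2 XOR circuit: nodes 0..n-1 are inputs, gate k is node n+k.
    gates, prev_layer = [], list(range(n))
    for _ in range(L):
        layer = []
        for _ in range(n):
            p, q = rng.choice(prev_layer, size=2, replace=False)
            layer.append(n + len(gates))
            gates.append((int(p), int(q)))
        prev_layer = layer
    outputs = prev_layer             # the last layer computes the linear map M

    # Linear function (mod 2) computed at every node of the full circuit.
    full = [np.eye(n, dtype=int)[i] for i in range(n)]
    for p, q in gates:
        full.append((full[p] + full[q]) % 2)
    M = np.array([full[v] for v in outputs])

    # Remove e random edges; an edge is (gate index, slot in {0, 1}).
    e = 6
    all_edges = [(k, s) for k in range(len(gates)) for s in (0, 1)]
    removed = [all_edges[i] for i in rng.choice(len(all_edges), size=e, replace=False)]

    # In the cut circuit every node computes an a-part (over inputs) plus a b-part
    # (over the values carried by the removed edges).
    a = [np.eye(n, dtype=int)[i] for i in range(n)] + [None] * len(gates)
    b = [np.zeros(e, dtype=int) for _ in range(n)] + [None] * len(gates)
    cut_depth = [0] * (n + len(gates))
    for k, preds in enumerate(gates):
        va, vb, dep = np.zeros(n, dtype=int), np.zeros(e, dtype=int), 0
        for s, p in enumerate(preds):
            if (k, s) in removed:
                vb[removed.index((k, s))] += 1          # value arriving on a removed edge
            else:
                va, vb, dep = va + a[p], vb + b[p], max(dep, cut_depth[p] + 1)
        a[n + k], b[n + k], cut_depth[n + k] = va % 2, vb % 2, dep

    A = np.array([a[v] for v in outputs])                  # the sparse part
    C = np.array([b[v] for v in outputs])                  # coefficients over removed edges
    F = np.array([full[gates[k][s]] for k, s in removed])  # what each removed edge carries
    B = (C @ F) % 2                                        # low-rank part: factors through e rows

    d = max(cut_depth[v] for v in outputs)
    assert np.array_equal((A + B) % 2, M)                  # M = A + B over GF(2)
    assert all(row.sum() <= 2 ** d for row in A)           # at most 2^d non-zeros per row of A
    print("depth after removal:", d, "   rank(B) <= e =", e)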

Valiant shows that in a log-depth, linear-size circuit one can remove O(n/ log log n) edges to reduce the depth to ϵ log n, so that by Lemma 1 each output depends on at most n^ϵ inputs – a proof can be found in [Vio09] – and this gives the above connection to lower bounds.

However, the best available tradeoffs for explicit matrices give sparsity (n^2/r) log(n/r) for rank r, for any parameter r; and this is not sufficient for the application to lower bounds.

Error-correcting codes

It was asked whether generator matrixes of good linear codes are rigid. (A code is good if it has constant rate and constant relative distance. The dimensions of the corresponding matrixes are off by only a constant factor, and so we can treat them as identical.) Spielman [Spi95] shows that there exist good codes that can be encoded by linear-size, logarithmic-depth circuits. This immediately rules out the possibility of proving a lower bound for encoding such codes, and it gives a non-trivial rigidity upper bound via the above connection.

Still, one can ask if these matrices at least are more rigid than the available tradeoffs. Goldreich reports a negative answer by Dvir, showing that there exist good codes whose generating matrix C equals A + B, where A has at most O(n^2/d) non-zero entries and B has rank O(d log(n/d)), for any parameter d.

A similar negative answer follows from the paper [GHK+13]. There we show that there exist good linear codes whose generating matrix can be written as the product of few sparse matrixes. The corresponding circuits are very structured, and so perhaps it is not surprising that they give good rigidity upper bounds. More precisely, the paper shows that we can encode an n-bit message by a circuit made of XOR gates with, say, n log* n wires and depth O(1) – with unbounded fan-in. Each gate in the circuit computes the XOR of some t gates, which can be written as a binary tree of depth log_2 t + O(1). Such trees have poor rigidity:


Lemma 2 [Trees are not rigid]. Let C be a binary tree of depth d. You can remove an O(1/2^b) fraction of the edges to reduce the depth to b, for any b.


Proof: It suffices to remove all edges at depths d − b, d − 2b, …. The number of such edges is O(2^{d−b} + 2^{d−2b} + …) = O(2^{d−b}). Note this includes the case d ≤ b, where we can remove 0 edges. QED
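
A quick numerical sanity check of this proof for the complete binary tree (the values of d and b below are arbitrary):

    d, b = 16, 3                     # tree depth and target depth

    # Complete binary tree in heap indexing: root is node 1, node v (v >= 2) has parent v // 2.
    # level(v) = distance from the root = v.bit_length() - 1; the leaves sit at level d.
    N = 2 ** (d + 1)                 # node indices run over 1 .. N-1
    level = lambda v: v.bit_length() - 1

    # Remove every edge whose lower endpoint sits at level d - b, d - 2b, ... (as in the proof).
    cut_levels = {d - i * b for i in range(1, (d - 1) // b + 1)}
    removed = lambda v: level(v) in cut_levels          # is the edge (v // 2, v) removed?

    # height[v] = longest downward path from v using only kept edges.
    height = [0] * N
    for v in range(N - 1, 0, -1):
        kids = [c for c in (2 * v, 2 * v + 1) if c < N and not removed(c)]
        height[v] = 1 + max(height[c] for c in kids) if kids else 0

    # Depth of the resulting forest: look at node 1 and at every node whose parent edge was cut.
    forest_depth = max(height[v] for v in range(1, N) if v == 1 or removed(v))
    removed_count = sum(1 for v in range(2, N) if removed(v))

    assert forest_depth <= b
    print("forest depth:", forest_depth, "  removed fraction: %.4f  (1/2^b = %.4f)"
          % (removed_count / (N - 2), 1 / 2 ** b))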

Applying Lemma 2 to a gate in our circuit, we reduce the depth of the binary tree computed at that gate to b. Applying this to every gate we obtain a circuit of depth O(b). In total we have removed an O(1/2^b) fraction of the n log* n edges.

Writing 2^b = n/d, by Lemma 1 we can write the generating matrixes of our code as C = A + B, where A has at most O(n/d) non-zero entries per row, and B has rank O(d log* n). These parameters are the same as in Dvir’s result, up to lower-order terms. The lower-order terms appear incomparable.
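
For the record, the rank bound is just the count of removed edges:

e = O(1/2^b) · n log* n = O(d/n) · n log* n = O(d log* n).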

Walsh-Fourier transform

Another matrix that was considered is the n × n Inner Product matrix H, aka the Walsh-Hadamard matrix, where the (x, y) entry is the inner product of x and y modulo 2. Alman and Williams [AW16] recently gave an interesting rigidity upper bound which prevents this machinery from establishing a circuit lower bound. Specifically, they show that H can be written as H = A + B where A has at most n^{1+ϵ} non-zero entries, and B has rank n^{1−ϵ′}, for any ϵ and an ϵ′ which goes to 0 when ϵ does.

Their upper bound works as follows. Let h = log_2 n, so that the rows and columns of H are indexed by x, y ∈ {0,1}^h. Start with a real polynomial p(z_1, z_2, …, z_h) which computes parity exactly on inputs of Hamming weight between 2ϵh and (1/2 + ϵ)h. By interpolation such a polynomial exists with degree (1/2 − ϵ)h. Replacing z_i with x_i y_i you obtain a polynomial of degree (1 − 2ϵ)h which computes IP correctly on inputs x, y whose inner product over the integers is between 2ϵh and (1/2 + ϵ)h.
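
Here is a minimal check of just the interpolation step (a sketch with arbitrary small parameters; q below is the univariate polynomial in the Hamming weight, and the multivariate p is obtained by composing it with z_1 + … + z_h and multilinearizing):

    import numpy as np

    h, eps = 16, 1 / 8
    lo, hi = round(2 * eps * h), round((1 / 2 + eps) * h)    # weight window: 4 .. 10

    # Interpolate a univariate polynomial q of degree hi - lo = (1/2 - eps) h through
    # the points (w, w mod 2), one for every Hamming weight w in the window.
    ws = np.arange(lo, hi + 1)
    q = np.polyfit(ws, ws % 2, deg=hi - lo)

    print("degree of q:", hi - lo, "  which equals (1/2 - eps) h =", round((1 / 2 - eps) * h))
    for w in range(lo, hi + 1):
        assert abs(np.polyval(q, w) - (w % 2)) < 1e-6        # exact parity inside the window

    # p(z_1, ..., z_h) = q(z_1 + ... + z_h), after multilinearization, is the low-degree
    # polynomial used above: it computes parity whenever the weight of z lies in the window.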

This polynomial has 2^{(1−ϵ′)h} = n^{1−ϵ′} monomials, where ϵ′ = Ω(ϵ^2). The truth-table of a polynomial with m monomials is a matrix with rank at most m, and this gives a low-rank matrix B′.
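
And a numerical check of the rank claim, on a random toy polynomial: the 2^h × 2^h truth-table matrix of a real polynomial p(x, y) with m monomials has rank at most m, because each monomial contributes a rank-one matrix. The parameters below are arbitrary.

    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    h, m = 5, 7                                  # bits per side, number of monomials

    # A random real polynomial p(x, y) with m monomials  c * prod_{i in S} x_i * prod_{j in T} y_j.
    monomials = [(rng.random(),
                  rng.integers(0, 2, h).astype(bool),        # S, as an indicator vector
                  rng.integers(0, 2, h).astype(bool))        # T
                 for _ in range(m)]

    points = np.array(list(itertools.product([0, 1], repeat=h)))   # all 2^h boolean vectors

    # Truth table M[x, y] = p(x, y).  Each monomial contributes c * u v^T, a rank-one matrix,
    # where u[x] = prod_{i in S} x_i and v[y] = prod_{j in T} y_j.
    M = np.zeros((2 ** h, 2 ** h))
    for c, S, T in monomials:
        u, v = points[:, S].prod(axis=1), points[:, T].prod(axis=1)
        M += c * np.outer(u, v)

    print("monomials:", m, "  rank of the truth table:", np.linalg.matrix_rank(M))
    assert np.linalg.matrix_rank(M) <= m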

The fact that sparse polynomials yield low-rank matrixes also appeared in the paper [SV12], which suggested studying the rigidity challenge for matrixes arising from polynomials.

Returning to the proof in [AW16], it remains to deal with inputs whose inner product does not lie in that range. The number of x whose weight is not between (1/2 − ϵ)h and (1/2 + ϵ)h is at most 2^{(1−ϵ′)h}. For each such input x we modify the corresponding row of the matrix B′. Repeating the process for the y (the columns) we obtain the matrix B; each modification changes the rank by at most one, so the rank bound of O(2^{(1−ϵ′)h}) = O(n^{1−ϵ′}) still holds.

Now a calculation shows that B differs from H in few entries, and those few entries can be charged to the sparse matrix A. That is, there are few pairs x and y, both with Hamming weight between (1/2 − ϵ)h and (1/2 + ϵ)h, whose inner product over the integers is less than 2ϵh.
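
The following exact count illustrates that calculation (the parameters are arbitrary, and this is only a numerical illustration, not the calculation in [AW16]):

    from math import comb

    h, eps = 200, 0.05
    lo_w, hi_w = round((0.5 - eps) * h), round((0.5 + eps) * h)   # typical Hamming weights
    t = round(2 * eps * h)                                        # inner-product threshold

    # Pairs (x, y) with |x| = a, |y| = b and exactly j common 1-coordinates:
    # choose x, choose which j of its ones are shared, place the other b - j ones of y outside x.
    def pairs(a, b, j):
        return comb(h, a) * comb(a, j) * comb(h - a, b - j) if b >= j else 0

    typical = sum(comb(h, a) for a in range(lo_w, hi_w + 1))
    bad = sum(pairs(a, b, j)
              for a in range(lo_w, hi_w + 1)
              for b in range(lo_w, hi_w + 1)
              for j in range(t))                                  # inner product < 2 * eps * h

    print("fraction of typical pairs with small inner product: %.3e" % (bad / typical ** 2))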

Boolean complexity

There exists a corresponding framework for boolean circuits (as opposed to circuits with XOR gates only). Rigid matrixes informally correspond to depth-3 Or-And-Or circuits. If such a circuit has fan-in f_o at the output gate and fan-in f_i at each input gate, then the correspondence in parameters is

rank = log f_o
sparsity (per row) = f_i.

More precisely, we have the following lemma.


Lemma 3. Let C : {0,1}^n → {0,1}^n be a boolean circuit with fan-in-2 gates. If you can remove e edges and reduce the depth to d, then each output bit of C can be written as an Or-And-Or circuit with output fan-in 2^e and input fan-in 2^d.


Proof: After you remove the edges, each output bit and each removed edge depends on at most 2^d input bits or removed edges. The output Or gate of the depth-3 circuit is a big Or over all 2^e assignments of values to the removed edges. For each assignment we need to check consistency and evaluate the output bit. Each such check depends on at most 2^d inputs (the removed-edge values being fixed by the assignment) and so can be written as an And of Ors with bottom fan-in 2^d; these Ands merge with the middle And layer. QED
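
The depth-2 step at the end is just the brute-force CNF for a function of few bits. A minimal illustration (for an arbitrary function of k = 4 bits; in the lemma k would be 2^d):

    import itertools
    import random

    # Any boolean function of k bits can be written as an And of Ors (a CNF) with clauses
    # of width k: for every falsifying assignment, add the clause that rules it out.
    random.seed(0)
    k = 4
    f = {x: random.randint(0, 1) for x in itertools.product([0, 1], repeat=k)}

    clauses = [a for a, val in f.items() if val == 0]      # one clause per falsifying input

    def cnf(x):
        return int(all(any(x[i] != a[i] for i in range(k)) for a in clauses))

    assert all(cnf(x) == f[x] for x in f)
    print("a function of", k, "bits as an And of", len(clauses), "Ors, each of width", k)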

The available bounds are of the form log f_o = n/f_i. For example, for input fan-in f_i = n^α we have lower bounds exponential in n^{1−α}, but not more. Again it can be shown that breaking this tradeoff in certain regimes (namely, log_2 f_o = O(n/ log log n)) yields lower bounds against linear-size log-depth circuits. (A proof appears in [Vio09].) It was also pointed out in [Vio13] that breaking this tradeoff in any regime yields lower bounds for branching programs. See also the previous post.

One may ask how pairwise independent hash functions relate to this challenge. Ishai, Kushilevitz, Ostrovsky, and Sahai showed [IKOS08] that they can be computed by linear-size log-depth circuits. Again this gives a non-trivial upper bound for depth-3 circuits via these connections, and one can ask for more. In [GHK+13] we give constructions of such circuits which, in combination with Lemma 3, can again be used to almost match the available tradeoffs.

The bottom line of this post is that we can’t prove lower bounds because they are false, and it is a puzzle to me why some people appear confident that P is different from NP.

References

[AW16]    Josh Alman and Ryan Williams. Probabilistic rank and matrix rigidity, 2016. https://arxiv.org/abs/1611.05558.

[GHK+13]   Anna Gál, Kristoffer Arnsfelt Hansen, Michal Koucký, Pavel Pudlák, and Emanuele Viola. Tight bounds on computing error-correcting codes by bounded-depth circuits with arbitrary gates. IEEE Transactions on Information Theory, 59(10):6611–6627, 2013.

[IKOS08]    Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography with constant computational overhead. In 40th ACM Symp. on the Theory of Computing (STOC), pages 433–442, 2008.

[Spi95]    Daniel Spielman. Computationally Efficient Error-Correcting Codes and Holographic Proofs. PhD thesis, Massachusetts Institute of Technology, 1995.

[SV12]    Rocco A. Servedio and Emanuele Viola. On a special case of rigidity. Available at http://www.ccs.neu.edu/home/viola/, 2012.

[Val77]    Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. In 6th Symposium on Mathematical Foundations of Computer Science, volume 53 of Lecture Notes in Computer Science, pages 162–176. Springer, 1977.

[Vio09]    Emanuele Viola. On the power of small-depth computation. Foundations and Trends in Theoretical Computer Science, 5(1):1–72, 2009.

[Vio13]    Emanuele Viola. Challenges in computational lower bounds. Available at http://www.ccs.neu.edu/home/viola/, 2013.

What is the P vs. NP problem? My two cents

[Image: PvsNP2]

I just had to convert a movie clip into a different format. The conversion took ten minutes. Given that the clip can be loaded into memory in one second, ten minutes is a long time. Could not this be done faster? In fact, why not one second?

Afterwards, I played video games. I have a playstation 3, but I heard that on the playstation 4 the games look better because with the faster processor the 3D scenes have more details. But do I really need a faster processor for those details? Could not my playstation 3 be programmed to run those games? In fact, could there be a way to have playstation 4 games run on a Commodore 64, or even… see the picture. There do exist hardware limitations of course, but these are not the bottleneck. A present-day 3D game engine on a Commodore 64 would be a stunning achievement, even if the resolution were a bit coarser and the cut scenes deleted.

These are two examples of a wide-open question that is central to theoretical computer science, and to a growing number of fields of science: Are there computational tasks that require a long time?

This may sound puzzling at first, since computers that take a long time are everyday experience, from the cases mentioned above to every time the mouse pointer turns into an hourglass for a ridiculous amount of time, including just now when I booted my computer so that I could continue this post. But the point is that nobody knows whether computers could be programmed to run much faster.

To be sure, there does exist one problem that is known to take a long time. I call this the program-enumeration problem, and it is as follows: consider a computer program that goes through every possible program of length n, runs each for n steps, and does something different from each of them. For example, the program could output the first number that is not output by any program of length n within n steps. By construction, this strange task cannot be performed in time n by programs of length n. On the other hand, it can be done in time exponential in n. (Think n = 10000000000000000, which is roughly how many instructions a modern computer can do in a year.)
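
To make the diagonal argument concrete, here is a minimal sketch in Python. The step-bounded interpreter run_for_steps is a hypothetical placeholder, not a real library call: it stands for whatever machine model one fixes, running a given n-bit program for at most n steps and reporting its output, if any.

    def run_for_steps(program_bits, steps):
        """Hypothetical step-bounded interpreter (a stand-in for a fixed machine model):
        run the program encoded by `program_bits` for at most `steps` steps and return
        its numeric output, or None if it has not produced one."""
        raise NotImplementedError

    def diagonal(n):
        # The program-enumeration task: the first number not output by any
        # program of length n within n steps.
        produced = set()
        for code in range(2 ** n):                          # every n-bit program
            out = run_for_steps(format(code, f"0{n}b"), n)
            if out is not None:
                produced.add(out)
        k = 0
        while k in produced:
            k += 1
        return k

    # No program of length n can compute diagonal(n) within n steps: it would have to
    # differ from its own output.  The enumeration above, however, takes about 2^n * n steps.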

A crude, pessimistic summary of our understanding of the limitations of computation is that this is it: all we know is that the program-enumeration problem cannot be solved in time n. This single result does have many applications, for example it can be used to show that there is no algorithm for checking the correctness of programs, somewhat justifying the immense industry devoted to that. But the result is very unsatisfactory because most problems that we face, including those mentioned above, have nothing to do with program enumeration. It also feels unsatisfactory because it does not give any information on computation beyond the simple fact that programs can run other programs.

The wide-open question can now be reformulated more precisely as follows.

Grand question: Is there any computational task, besides program enumeration, that requires a long time?

A negative answer, meaning that computers are all-powerful, would have dramatic consequences, well beyond the above examples. What would you give for a computer that executes any task with no perceptible delay? What would you give to play today the game engines of the next fifty years? Certain companies would see a dramatic increase in profits. And, what I find the most interesting application, scientists would be able to solve very complex models, pushing way beyond current knowledge. Let me elaborate on this last point. Interestingly, the situation in several branches of science is somewhat similar to theoretical computer science. Specifically, scientists have identified a number of features which are desirable in a model of whatever it is that they are studying. However, nobody is able to solve models with these features, except in toy cases, because known programs are too slow.

That computers are all-powerful sounds too good to be true, and in fact the common belief is that there do exist many problems that take a long time. This also has many desirable applications: the security of many everyday electronic systems relies on this belief for specific problems, such as factoring numbers. But we cannot be completely confident of the security until the belief is proved.

The P vs. NP problem is a young, prominent special case of the grand question above. I think it should be presented as such, which is not always done. The problem asks whether a specific class of problems requires at least a specific amount of time to solve.

P stands for the computational tasks that can be done somewhat efficiently. The letter P is short for “polynomial time:” the time required to solve the problem must scale with a polynomial in the size of the problem. For example, if you have a program that converts movies of size n in time quadratic in n, that would be efficient, and the conversion task would be in P. This is of course a theoretical approximation which does not necessarily guarantee efficiency in practice. But this approximation is convenient, and meaningful when compared to seemingly exponential-time tasks such as program enumeration.

I don’t like the terminology “polynomial time.” Calling something a polynomial emphasizes that that something is made of many monomials. However, the only monomial that is relevant in the definition of P is the one with the highest power. I thought of “power time.” I find it more to the point, and simpler, while preserving the initial.

NP is what you can solve in power time on a non-deterministic computer. This class captures many problems that people care about, but that are not known to be in P. To keep with the spirit of the post, I’ll mention that among these problems are appropriate generalizations of Tetris, Lemmings, and Super Mario.

The P vs. NP problem can also be presented as the difference between computing a solution (P) and checking it (NP). For example, if someone can solve a tough Tetris level, it is easy for you to check that. By contrast, if someone can solve the program-enumeration problem, it is not clear how you would check that. Indeed, that problem is not believed to be in NP.

I have mixed feelings about this presentation. I do find it catchy, but I don’t like that it suggests to me that to check solutions we will use the same variety of algorithms that we use in computing them. Actually, solutions can be encoded so that the verification is extremely simple, as is known since the original formulations. On the other hand, the length of the solution does increase for this encoding, and a variety of algorithms may be recovered from the encoding.

Do I think that P is different from NP? I feel that I have no idea. In the few years that I have been following this research I have already seen several efficient algorithms for tasks that at first sight looked impossible, and that in some cases were conjectured to be impossible. Here are two:

1. Suppose your computer has no memory (i.e., it only has a constant number of registers and the program counter). Can you determine in power time if a·b > c, for integers a, b, and c with many digits? This is actually possible, by Barrington’s theorem.

2. Is there a pseudorandom generator where each output bit depends on only five input bits? This is also believed to be possible: Applebaum, Ishai, and Kushilevitz show that it follows from standard cryptographic assumptions. (A toy illustration of such local functions appears right below.)
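
For the record, here is a toy illustration of the format of such local functions. This is not the Applebaum-Ishai-Kushilevitz construction and comes with no security guarantee whatsoever; the choice of the five positions per output bit and of the simple nonlinear predicate below is arbitrary, and only the locality is the point.

    import random

    random.seed(0)
    n, m = 64, 128                                  # n input bits stretched to m output bits

    # For each output bit pick 5 input positions; apply one fixed, simple nonlinear predicate.
    taps = [random.sample(range(n), 5) for _ in range(m)]
    predicate = lambda a, b, c, d, e: a ^ b ^ c ^ (d & e)

    def local_map(x):                               # x is a list of n bits
        return [predicate(*(x[i] for i in S)) for S in taps]

    seed = [random.randint(0, 1) for _ in range(n)]
    output = local_map(seed)
    print("each of the", m, "output bits depends on exactly 5 of the", n, "input bits")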

Both relate to restricted computational models, which is not that surprising, since we do not understand general computation. However, by analogy, should there not be similar surprises for P? Or will the only surprises in computational classes be of the type that something in P is shown to be doable even in a more restricted computational model?