Moshe Vardi’s latest insight in the Communications of the ACM (whose title we adopt for this post) agrees with our previous post “Because of pollution, conferences should be virtual.” Vardi calls for “sweeping policy change […] requiring that authors of accepted papers that must fly to participate in a conference may opt out from in-person involvement and contribute instead by video.” Vardi gives some indication of the environmental impact of the travel-based system, and further suspects that in-person conference participation is “much less valuable than we would like to believe.”
These issues are closely related to the decades-old discussion of journals vs. conferences. Several older posts on this blog were devoted to that. Lance Fortnow, back in 2009, wrote that it’s Time for computer science to grow up. The full text of this article is premium content, but you can read a pre-publication version here. Basically, he argues in favor of a journal-based publication system.
Apparently in response, judging from the title, Boaz Barak wrote a piece titled Computer science should stay young. I can’t quickly find a link to the whole thing, but his bottom line is online: “I disagree with the conclusion that we should transition to a classical journal-based model similar to that of other fields. I believe conferences offer a number of unique advantages that have helped make computer science dynamic and successful, and can continue to do so in the future.”
I disagree that conferences are young. They belong to the BI (before internet) era, and so look rather anchored in the past to me. Historically, I also suppose in-person discussion predates writing, though this is irrelevant. What is young is the health impact of pollution (Fortnow and Barak’s pieces don’t touch on health issues). (By health impact I include climate change, but I prefer not to use that term for various reasons.)
And what is young and cool is arxiv overlay journals, TCS+ talks, videoconferences, ECCC, etc.
Instead, we impose on our community most inconvenient transoceanic flights. To end I’ll quote from Oded Goldreich’s my choices:
Phoenix in June: […] One must be out of their mind to hold a conference under such weather conditions. I guess humans can endure such weather conditions and even worse ones, but why choose to do so? Why call upon people from all over the world to travel to one of the least comfortable locations (per the timing)?
Historically, progressive people have been understandably quite skeptical of big business, including developers. (I hesitated before using the word “progressive” because the meaning is obscure, and there are several related words, like “liberal” and so on. But the meaning in this post should be clear.)
Recently, something shocking happened. Self-declared progressive people in Newton have come to believe that the way to solve the world’s problems is to slash regulations, rewrite zoning documents, chop down forests, and give a free hand to developers (not residents in Newton) to build whatever they want, no questions asked. (Wait, we are putting solar panels on the new roofs!)
As a consequence, there is now a heated battle in Newton, ward for ward, to try to protect our city against this well-funded and politically well-connected assault.
And we are not even discussing if we should build a mega complex as opposed to creating new green spaces and protected bike lanes, or improving public transportation, or finally having a gym and a swimming pool — all things that would improve our health and the quality of life. The discussion is just how big the mega complex should be.
I have prepared this talk which is a little unusual and is in part historical and speculative. You can view the slides here. I am scheduled to give it in about three hours at Boston University. And because it’s just another day in the greater Boston area, while I’ll be talking my ex office-mate Vitaly Feldman will be speaking at Harvard University. His talk looks quite interesting and attempts to explain why overfitting is actually necessary for good learning. As for mine, well you’ll have to come and see or take a peek at the slides.
We briefly interrupt the on-flight entertainment for an update on e-ink monitors (see previous posts here and here). I am happy to report that during the summer my e-ink monitor worked extremely well. Writing as I am doing now with the sun shining through my window is fantastic. My entire summer production — including this survey on non-abelian combinatorics — was written exclusively on the e-ink monitor. I don’t think I ever felt so good about a piece of electronics since relentless market pressure forced me to abandon Amiga.
Perhaps conferences made sense fifty years ago. We did not have internet, and the pollution was not as bad. Today, we can have effective virtual meetings, while the pollution has reached a level of crisis, see this moving talk by Greta Thunberg. Moving to a system of virtual conferences is, I believe, a duty of every scientist. Doing so will cut the significant air travel emissions that come from shipping scientists across the world. To attend a climate summit in the USofA, Greta will sail across the Atlantic ocean on a zero emission boat.
We can keep everything the way it is, but simply give the talks online. This change doesn’t involve anybody higher up in the political ladder. It only involves us, the program chairs, the steering committees.
While we wait for that, we can begin by virtualizing the physical STOC/FOCS PC meetings, whose added value over a virtual meeting, if any, does not justify the cost; and by holding conferences where the center of mass is, instead of exotic places where one can combine the trip with a vacation at the expense of taxpayers’ money and everybody’s health. And that is also why I put a bid to hold the 2021 Conference on Computational Complexity in Boston.
NSF panels, making decisions worth millions, routinely have virtual panelists (as I was the last few times). So why do we insist on shipping scientists across the globe multiple times a year to give 15-minute talks which to most people are less useful than spending 20 minutes reading the paper on the arxiv?
Below and here in pdf is a survey I am writing for SIGACT, due next week. Comments would be very helpful.
Finite groups provide an amazing wealth of problems of interest to complexity theory. And complexity theory also provides a useful viewpoint of group-theoretic notions, such as what it means for a group to be “far from abelian.” The general problem that we consider in this survey is that of computing a group product over a finite group . Several variants of this problem are considered in this survey and in the literature, including in [KMR66, Bar89, BC92, IL95, BGKL03, PRS97, Amb96, AL00, Raz00, MV13, Mil14, GVa].
Some specific, natural computational problems related to are, from hardest to easiest:
(1) Computing ,
(2) Deciding if , where is the identity element of , and
(3) Deciding if under the promise that either or for a fixed .
Problem (3) is from [MV13]. The focus of this survey is on (2) and (3).
We work in the model of communication complexity [Yao79], with which we assume familiarity. For background see [KN97, RY19]. Briefly, the terms in a product will be partitioned among collaborating parties – in several ways – and we shall bound the number of bits that the parties need to exchange to solve the problem.
We begin in Section 2 with two-party communication complexity. In Section 3 we give a streamlined proof, except for a step that is only sketched, of a result of Gowers and the author [GV15, GVb] about interleaved group products. In particular we present an alternative proof, communicated to us by Will Sawin, of a lemma from [GVa]. We then consider two models of three-party communication. In Section 4 we consider number-in-hand protocols, and we relate the communication complexity to so-called quasirandom groups [Gow08, BNP08]. In Section 6 we consider number-on-forehead protocols, and specifically the problem of separating deterministic and randomized communication. In Section 7 we give an exposition of a result by Austin [Aus16], and show that it implies a separation that matches the state-of-the-art [BDPW10] but applies to a different problem.
Some of the sections follow closely a set of lectures by the author [Vio17]; related material can also be found in the blog posts [Vioa, Viob]. One of the goals of this survey is to present this material in a more organized manner, in addition to including new material.
Let be a group and let us start by considering the following basic communication task. Alice gets an element and Bob gets an element and their goal is to check if . How much communication do they need? Well, is equivalent to . Because Bob can compute without communication, this problem is just a rephrasing of the equality problem, which has a randomized protocol with constant communication. This holds for any group.
The same is true if Alice gets two elements and and they need to check if . Indeed, it is just checking equality of and , and again Alice can compute the latter without communication.
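To make the reduction to equality concrete, here is a minimal Python sketch. It assumes the task is to decide whether the product of Alice's element and Bob's element is the identity, uses permutations of four points as a stand-in group, and uses the textbook public-coin test for equality (compare random inner products modulo 2). All function names are ours and purely illustrative.

```python
import random

def compose(p, q):
    """Composition of permutations given as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def encode(p, width=2):
    """Write each entry of the permutation in `width` bits (enough for 4 points)."""
    bits = []
    for v in p:
        bits.extend((v >> k) & 1 for k in range(width))
    return bits

def equal_whp(u_bits, v_bits, reps, rng):
    """Public-coin equality test: each round exchanges one bit per party.

    Equal inputs are always accepted; unequal inputs survive a round
    with probability exactly 1/2.
    """
    for _ in range(reps):
        r = [rng.randrange(2) for _ in u_bits]          # shared random string
        alice_bit = sum(a & b for a, b in zip(r, u_bits)) % 2
        bob_bit = sum(a & b for a, b in zip(r, v_bits)) % 2
        if alice_bit != bob_bit:
            return False
    return True

def product_is_identity(x, y, reps=30, seed=0):
    """Alice holds x, Bob holds y. Bob inverts y locally, then they test equality."""
    rng = random.Random(seed)
    return equal_whp(encode(x), encode(inverse(y)), reps, rng)
```

The number of exchanged bits depends only on `reps`, not on the size of the group, which is the point of the paragraph above.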
Things get more interesting if both Alice and Bob get two elements and they need to check if the interleaved product of the elements of Alice and Bob equals , that is, if
Now the previous transformations don’t help anymore. In fact, the complexity depends on the group. If it is abelian then the elements can be reordered and the problem is equivalent to checking if . Again, Alice can compute without communication, and Bob can compute without communication. So this is the same problem as before and it has a constant communication protocol.
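The role of commutativity can be checked by brute force on small groups. The sketch below compares the interleaved product of four elements with the regrouped product (Alice's two elements first, then Bob's) over the integers modulo 5 with addition, and over the permutations of three points. The helper names are ours and hypothetical.

```python
from itertools import permutations, product

def compose(p, q):
    """Composition of permutations given as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def interleaved(a1, b1, a2, b2, op):
    """The product a1 * b1 * a2 * b2, in this order."""
    return op(op(op(a1, b1), a2), b2)

def regrouped(a1, b1, a2, b2, op):
    """The product (a1 * a2) * (b1 * b2): each party multiplies locally first."""
    return op(op(a1, a2), op(b1, b2))

def reordering_always_valid(elements, op):
    """True if the two products agree on every quadruple of group elements."""
    return all(
        interleaved(*q, op) == regrouped(*q, op)
        for q in product(elements, repeat=4)
    )

def add_mod_5(a, b):
    return (a + b) % 5

Z5 = list(range(5))
S3 = list(permutations(range(3)))
```

Here `reordering_always_valid(Z5, add_mod_5)` holds, while `reordering_always_valid(S3, compose)` fails, matching the distinction drawn above.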
For non-abelian groups this reordering cannot be done, and the problem seems hard. This can be formalized for a class of groups that are “far from abelian” – or we can take this result as a definition of being far from abelian. One of the groups that works best in this sense is the following, first constructed by Galois in the 1830’s.
Theorem 1. Let and let . Suppose Alice receives and Bob receives . They are promised that either equals or . Deciding which case it is requires randomized communication .
This bound is tight as Alice can send her input, taking bits. We present the proof of this theorem in the next section.
If we work over instead of in Theorem 1 then the communication complexity is [Sha16]. The latter bound is tight [MV13]: with knowledge of , the parties can agree on an element such that . Hence they only need to keep track of the image . This takes communication because In more detail, the protocol is as follows. First Bob sends . Then Alice sends . Then Bob sends and finally Alice can check if .
Interestingly, to decide if without the promise a stronger lower bound can be proved for many groups, including , see Corollary 3 below.
Theorem 1 and the corresponding results for other groups also scale with the length of the product: for example deciding if over requires communication which is tight.
A strength of the above results is that they hold for any choice of in the promise. This makes them equivalent to certain results, discussed below in Section 5.0.1. Next we prove two other lower bounds that do not have this property and can be obtained by reduction from disjointness. First we show that for any non-abelian group there exists an element such that deciding if or requires communication linear in the length of the product. Interestingly, the proof works for any non-abelian group. The choice of is critical, as for some and the problem is easy. For example: take any group and consider where is the group of integers with addition modulo . Distinguishing between and amounts to computing the parity of (the components of) the input, which takes constant communication.
Theorem 2. Let be a non-abelian group. There exists such that the following holds. Suppose Alice receives and Bob receives . They are promised that either equals or . Deciding which case it is requires randomized communication .
Proof. We reduce from unique set-disjointness, defined below. For the reduction we encode the And of two bits as a group product. This encoding is similar to the famous puzzle that asks to hang a picture on a wall with two nails in such a way that the picture falls if either one of the nails is removed. Since is non-abelian, there exist such that , and in particular with . We can use this fact to encode the And of and as
In the disjointness problem Alice and Bob get inputs respectively, and they wish to check if there exists an such that . If you think of as characteristic vectors of sets, this problem is asking if the sets have a common element or not. The communication of this problem is [KS92, Raz92]. Moreover, in the “unique” variant of this problem where the number of such ’s is 0 or 1, the same lower bound still applies. This follows from [KS92, Raz92] – see also Proposition 3.3 in [AMS99]. For more on disjointness see the surveys [She14, CP10].
We will reduce unique disjointness to group products. For we produce inputs for the group problem as follows:
The group product becomes
If there isn’t an such that , then for each the term is , and thus the whole product is 1.
Otherwise, there exists a unique such that and thus the product will be , with being in the -th position. If Alice and Bob can check if the above product is equal to 1, they can also solve the unique set disjointness problem, and thus the lower bound applies for the former.
We required the uniqueness property, because otherwise we might get a product that could be equal to 1 in some groups.
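The encoding of And as a group product can be tried out on the smallest non-abelian group. The sketch below assumes the gadget is the commutator of a power of one element with a power of another, as in the picture-hanging puzzle, and picks two non-commuting transpositions of three points. The specific elements and the function names are our choices for illustration only.

```python
IDENTITY = (0, 1, 2)
G = (1, 0, 2)   # transposition swapping points 0 and 1
H = (0, 2, 1)   # transposition swapping points 1 and 2

def compose(p, q):
    """Composition of permutations given as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def and_gadget(x, y):
    """Commutator of G^x and H^y: non-trivial exactly when x = y = 1."""
    gx = G if x else IDENTITY      # Alice's element depends only on her bit
    hy = H if y else IDENTITY      # Bob's element depends only on his bit
    return compose(compose(gx, hy), compose(inverse(gx), inverse(hy)))

def disjointness_product(xs, ys):
    """Product of the gadgets over all coordinates of the two bit vectors."""
    result = IDENTITY
    for x, y in zip(xs, ys):
        result = compose(result, and_gadget(x, y))
    return result
```

With at most one intersecting coordinate, the product is the identity exactly when the sets are disjoint; with several intersections the gadgets could in principle cancel, which is why the unique variant is used.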
Theorem 3. Let be a non-abelian group and consider . Suppose Alice receives and Bob receives . Deciding if requires randomized communication .
Proof. The proof is similar to the proof of Theorem 2. We use coordinate of to encode bit of the disjointness instance. If there is no intersection in the latter, the product will be . Otherwise, at least some coordinate will be .
As a corollary we can prove a lower bound for .
Corollary 3. Theorem 3 holds for .
Proof. Note that contains and that is not abelian. Apply Theorem 3.
Several related proofs of this theorem exist, see [GV15, GVa, Sha16]. As in [GVa], the proof that we present can be broken down into three steps. First we reduce the problem to a statement about conjugacy classes. Second we reduce this to a statement about trace maps. Third we prove the latter. We present the first step in a way that is similar but slightly different from the presentation in [GVa]. The second step is only sketched, but relies on classical results about and can be found in [GVa]. For the third we present a proof that was communicated to us by Will Sawin. We thank him for his permission to include it here.
We would like to rule out randomized protocols, but it is hard to reason about them directly. Instead, we are going to rule out deterministic protocols on random inputs. First, for any group element we define the distribution on quadruples , where are uniformly random elements. Note the product of the elements in is always .
Towards a contradiction, suppose we have a randomized protocol such that
This implies a deterministic protocol with the same gap, by fixing the randomness.
We reach a contradiction by showing that for every deterministic protocol using little communication, we have
We start with the following standard lemma, which describes a protocol using product sets.
Lemma 4. (The set of accepted inputs of) A deterministic -bit protocol for a function can be written as a disjoint union of rectangles, where a rectangle is a set of the form with and and where is constant.
Proof. (sketch) For every communication transcript , let be the set of inputs giving transcript . The sets are disjoint since an input gives only one transcript, and their number is : one for each communication transcript of the protocol. The rectangle property can be proven by induction on the protocol tree.
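The rectangle property is easy to check by brute force on a toy example. The protocol below is entirely made up for illustration (three rounds on 3-bit inputs); the sketch groups all input pairs by transcript and verifies that each group is a product set.

```python
from collections import defaultdict
from itertools import product

BITS = 3  # each party holds a 3-bit input, encoded as an integer in range(8)

def transcript(x, y):
    """A toy 3-bit deterministic protocol (hypothetical, for illustration)."""
    a1 = x & 1                                       # Alice: depends on x only
    b1 = ((y >> 1) & 1) ^ a1                         # Bob: depends on y and a1
    a2 = ((x >> 2) & 1) if b1 else ((x >> 1) & 1)    # Alice: depends on x and b1
    return (a1, b1, a2)

def inputs_by_transcript():
    """Map each transcript to the set of input pairs that produce it."""
    groups = defaultdict(set)
    for x, y in product(range(1 << BITS), repeat=2):
        groups[transcript(x, y)].add((x, y))
    return groups

def is_rectangle(pairs):
    """True if the set of pairs equals A x B for its projections A and B."""
    xs = {x for x, _ in pairs}
    ys = {y for _, y in pairs}
    return pairs == set(product(xs, ys))
```

The groups are disjoint by construction, there are at most 2^3 of them, and each passes `is_rectangle`, in line with the lemma.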
Next, we show that no rectangle can distinguish . The way we achieve this is by showing that for every the probability that is roughly the same for every , and is roughly the density of the rectangle. (Here we write for the characteristic function of the set .) Without loss of generality we set . Let have density and have density . We aim to bound above
where note the distribution of is the same as .
Because the distribution of is uniform in , the above can be rewritten as
The inequality is Cauchy-Schwarz, and the step after that is obtained by expanding the square and noting that is uniform in , so that the expectation of the term is .
Now we do several transformations to rewrite the distribution in the last expectation in a convenient form. First, right-multiplying by we can rewrite the distribution as the uniform distribution on tuples such that
The last equation is equivalent to .
We can now do a transformation setting to be to rewrite the distribution of the four-tuple as
where we use to denote a uniform element from the conjugacy class of , that is for a uniform .
Hence it is sufficient to bound
where all the variables are uniform and independent.
With a similar derivation as above, this can be rewritten as
Here each occurrence of denotes a uniform and independent conjugate. Hence it is sufficient to bound
We can now replace with . Because has the same distribution as , it is sufficient to bound
For this, it is enough to show that with high probability over and , the distribution of , over the choice of the two independent conjugates, has statistical distance from uniform.
In this step we use information on the conjugacy classes of the group to reduce the latter task to one about the equidistribution of the trace map. Let be the Trace map:
We state the lemma that we want to show.
is close to uniform over in statistical distance.
To give some context, in the conjugacy class of an element is essentially determined by the trace. Moreover, we can think of and as generic elements in . So the lemma can be interpreted as saying that for typical , taking a uniform element from the conjugacy class of and multiplying it by yields an element whose conjugacy class is uniform among the classes of . Using that essentially all conjugacy classes are equal, and some of the properties of the trace map, one can show that the above lemma implies that for typical the distribution of is close to uniform. For more on how this fits we refer the reader to [GVa].
We now present a proof of Lemma 5. The high-level argument of the proof is the same as in [GVa] (Lemma 5.5), but the details may be more accessible and in particular the use of the Lang-Weil theorem [LW54] from algebraic geometry is replaced by a more elementary argument. For simplicity we shall only cover the case where is prime. We will show that for all but values of , the probability over that is within of , and for the others it is at most . Summing over gives the result.
We shall consider elements whose trace is unique to the conjugacy class of . (This holds for all but conjugacy classes – see for example [GVa] for details.) This means that the distribution of is that of a uniform element in conditioned on having trace . Hence, we can write the probability that as the number of solutions in to the following three equations (divided by the size of the group, which is ):
We use the second one to remove and the first one to remove from the last equation. This gives
This is an equation in two variables. Write and and use distributivity to rewrite the equation as
At least since Lagrange it has been known how to reduce this to a Pell equation . This is done by applying an invertible affine transformation, which does not change the number of solutions. First set . Then the equation becomes
Equivalently, the cross-term has disappeared and we have
Now one can add constants to and to remove the linear terms, changing the constant term. Specifically, let and set and . The equation becomes
The linear terms disappear, the coefficients of and do not change and the equation can be rewritten as
So this is now a Pell equation
For all but values of we have that is non-zero. Moreover, for all but values of the term is a non-zero polynomial in . (Specifically, for any and any such that .) So we only consider the values of that make it non-zero. Those where give solutions, which is fine. We conclude with the following lemma.
is within of .
This is a basic result from algebraic geometry that can be proved from first principles.
Proof. If for some , then we can replace with and we can count instead the solutions to the equation
Because we can set and , which preserves the number of solutions, and rewrite the equation as
Because , this has solutions: for every non-zero we have .
So now we can assume that for any . Because the number of squares is , the range of has size . Similarly, the range of also has size . Hence these two ranges intersect, and there is a solution .
We take a line passing through : for parameters we consider pairs . There is a bijection between such pairs with and the points with . Because the number of solutions with is , using that , it suffices to count the solutions with .
The intuition is that this line has two intersections with the curve . Because one of them, , lies in , the other has to lie there as well. Algebraically, we can plug the pair in the expression to obtain the equivalent equation
Using that is a solution this becomes
We can divide by , obtaining
We can now divide by which is non-zero by the assumption . This yields
Hence for every value of there is a unique giving a solution. This gives solutions.
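The count can be sanity-checked by brute force over small prime fields. The sketch below assumes the equation has the shape x² + d·y² = e with d and e non-zero modulo an odd prime p; this normalization and the variable names are our assumptions. In both cases of the proof the number of solutions should then differ from p by exactly one.

```python
def count_solutions(p, d, e):
    """Number of pairs (x, y) modulo p with x^2 + d*y^2 = e (mod p)."""
    return sum(
        1
        for x in range(p)
        for y in range(p)
        if (x * x + d * y * y - e) % p == 0
    )

def max_deviation(p):
    """Largest |count - p| over all non-zero d and e modulo the prime p."""
    return max(
        abs(count_solutions(p, d, e) - p)
        for d in range(1, p)
        for e in range(1, p)
    )
```

For example, `max_deviation(7)` exhausts all 36 choices of non-zero d and e modulo 7.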
In this section we consider the following three-party number-in-hand problem: Alice gets , Bob gets , Charlie gets , and they want to know if . The communication depends on the group . We present next two efficient protocols for abelian groups, and then a communication lower bound for other groups.
We begin with the simplest setting. Let , that is -bit strings with bit-wise addition modulo 2. The parties want to check if . They can do so as follows. First, they pick a hash function that is linear: . Specifically, for a uniformly random define . Then, the protocol is as follows.
- Alice sends ,
- Bob sends ,
- Charlie accepts if and only if .
The hash function outputs 1 bit, so the communication is constant. By linearity, the protocol accepts iff . If this is always the case, otherwise it happens with probability .
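The protocol above can be sketched in a few lines of Python. We assume the hash is the inner product modulo 2 with a shared uniformly random string, and that Charlie accepts when the three hashed values sum to zero; bit strings are stored as integers and the names are ours.

```python
def inner_product_mod2(a, x):
    """Inner product modulo 2 of two bit strings stored as integers."""
    return bin(a & x).count("1") % 2

def accepts(a, x, y, z):
    """One run with shared random string a.

    Alice and Bob each send one hashed bit; Charlie hashes his own input
    and accepts iff the three bits sum to zero modulo 2.
    """
    alice_bit = inner_product_mod2(a, x)
    bob_bit = inner_product_mod2(a, y)
    charlie_bit = inner_product_mod2(a, z)
    return (alice_bit ^ bob_bit ^ charlie_bit) == 0

def acceptance_count(x, y, z, n):
    """Number of n-bit random strings a on which the protocol accepts."""
    return sum(accepts(a, x, y, z) for a in range(1 << n))
```

Enumerating all strings `a` makes the error visible exactly: when the three inputs sum to zero every `a` accepts, and otherwise exactly half of them do.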
This protocol is from [Vio14]. For simplicity we only consider the case here – the protocol for general is in [Vio14]. Again, the parties want to check if . For this group, there is no 100% linear hash function but there are almost linear hash functions that satisfy the following properties. Note that the inputs to are interpreted modulo and the outputs modulo .
- for all there is such that ,
- for all we have ,
Assuming some random hash function that satisfies the above properties the protocol works similarly to the previous one:
- Alice sends ,
- Bob sends ,
- Charlie accepts if and only if .
We can set to achieve constant communication and constant error.
To prove correctness of the protocol, first note that for some . Then consider the following two cases:
- if then and the protocol is always correct.
- if then the probability that for some is at most the probability that which is ; so the protocol is correct with high probability.
The hash function.
For the hash function we can use a function analyzed in [DHKP97]. Let be a random odd number modulo . Define
where the product is integer multiplication, and is bit-shift. In other words we output the bits of the integer product .
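For concreteness, here is a minimal sketch of a multiply-shift hash of this kind. It assumes n-bit inputs, ℓ-bit outputs, and that the map is (a·x shifted right by n − ℓ bits) modulo 2^ℓ; the exact shift amount and the names `n` and `ell` are our assumptions. The second function measures how far the hash is from being linear.

```python
def multiply_shift(a, x, n, ell):
    """Assumed form of the hash: bits n-ell .. n-1 of the integer product a*x.

    The input is first reduced modulo 2^n and the output is taken modulo 2^ell.
    """
    x %= 1 << n
    return ((a * x) >> (n - ell)) % (1 << ell)

def linearity_defect(a, x, y, n, ell):
    """h(x+y) - h(x) - h(y) modulo 2^ell; almost-linearity says this is 0 or 1."""
    m = 1 << ell
    return (
        multiply_shift(a, x + y, n, ell)
        - multiply_shift(a, x, n, ell)
        - multiply_shift(a, y, n, ell)
    ) % m

def is_almost_linear(n, ell):
    """Exhaustively check the defect for every odd a and all n-bit x, y."""
    size = 1 << n
    return all(
        linearity_defect(a, x, y, n, ell) in (0, 1)
        for a in range(1, size, 2)
        for x in range(size)
        for y in range(size)
    )
```

The defect is exactly the carry out of the low n − ℓ bits when the two integer products are added, which is why it can only be 0 or 1.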
We now verify that the above hash function family satisfies the three properties we required above.
Property (3) is trivially satisfied.
For property (1) we have the following. Let and and . To recap, by definition we have:
Notice that if in the addition the carry into the bit is , then
which concludes the proof for property (1).
Finally, we prove property (2). We start by writing where is odd. So the binary representation of looks like
The binary representation of the product for a uniformly random looks like
We consider the two following cases for the product :
- If , or equivalently , the output never lands in the bad set ;
- Otherwise, the hash function output has uniform bits. For any set , the probability that the output lands in is at most .
What happens in other groups? The hash function used in the previous result was fairly non-trivial. Do we have an almost linear hash function for matrices? The answer is negative. For and the problem is hard, even under the promise. For a group the complexity can be expressed in terms of a parameter which comes from representation theory. We will not formally define this parameter here, but several qualitatively equivalent formulations can be found in [Gow08]. Instead the following table shows the ’s for the groups we’ve introduced.
Theorem 1. Let be a group, and let . Let be the minimum dimension of any irreducible representation of . Suppose Alice, Bob, and Charlie receive , y, and respectively. They are promised that either equals or . Deciding which case it is requires randomized communication complexity .
This result is tight for the groups we have discussed so far. The arguments are the same as before. Specifically, for the communication is . This is tight up to constants, because Alice and Bob can send their elements. For the communication is . This is tight as well, as the parties can again just communicate the images of an element such that , as discussed in Section 1. This also gives a computational proof that cannot be too large for , i.e., it is at most . For abelian groups we get nothing, matching the efficient protocols given above.
First we discuss several “mixing” lemmas for groups, then we come back to protocols and see how to apply one of them there.
We want to consider “high entropy” distributions over , and state a fact showing that the multiplication of two such distributions “mixes” or in other words increases the entropy. To define entropy we use the norms . Our notion of (non-)entropy will be . Note that is exactly the collision probability where is independent and identically distributed to . The smaller this quantity, the higher the entropy of . For the uniform distribution we have and so we can think of as maximum entropy. If is uniform over elements, we have and we think of as having “high” entropy.
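As a small illustration of this notion of entropy, the sketch below computes the collision probability of a distribution given as a dictionary from elements to probabilities. Exact fractions are used so that the values can be compared without rounding; the helper names are ours.

```python
from fractions import Fraction

def collision_probability(dist):
    """Sum of squared probabilities: the chance two independent samples agree."""
    return sum(p * p for p in dist.values())

def uniform_on(support):
    """Uniform distribution over the given elements."""
    support = list(support)
    return {g: Fraction(1, len(support)) for g in support}
```

Over a 12-element set, the uniform distribution has collision probability 1/12, the uniform distribution on half of the elements has 1/6 (still within a constant factor), and a point mass has 1.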
Because the entropy of is small, we can think of the distance between and in the 2-norm as being essentially the entropy of :
where is the minimum dimension of an irreducible representation of .
By this lemma, for high entropy distributions and , we get . The factor allows us to pass to statistical distance using Cauchy-Schwarz:
This is the way in which we will use the lemma.
Another useful consequence of this lemma, which however we will not use directly, is this. Suppose now you have independent, high-entropy variables . Then for every we have
To show this, set without loss of generality and rewrite the left-hand-side as
By Cauchy-Schwarz this is at most
and we can conclude by Lemma 7. Hence the product of three high-entropy distributions is close to uniform in a point-wise sense: each group element is obtained with roughly probability .
Theorem 1. Let . Let and be two distributions over . Suppose is independent from . Let . We have
For example, when and have high entropy over (that is, are uniform over pairs), we have , and so . In particular, is close to uniform over in statistical distance.
As in the beginning of Section 3, for any group element we define the distribution on triples , where are uniform and independent. Note the product of the elements in is always . Again as in Section 3, it suffices to show that for every deterministic protocol using little communication we have
Analogously to Lemma 4, the following lemma describes a protocol using rectangles. The proof is nearly identical and is omitted.
Next we show that these product sets cannot distinguish these two distributions , via a straightforward application of Lemma 7.
Proof. Pick any and let be the inputs of Alice, Bob, and Charlie respectively. Then
where is uniform in . If either or is small, that is or , then also and hence (??) is at most as well. This holds for every , so we also have We will choose later.
Otherwise, and are large: and . Let be the distribution of conditioned on . We have that and are independent and each is uniform over at least elements. By Lemma 7 this implies , where is the uniform distribution. As mentioned after the lemma, by Cauchy–Schwarz we obtain
where the last inequality follows from the fact that .
This implies that and , because taking inverses and multiplying by does not change the distance to uniform. These two last inequalities imply that
and thus we get that
Picking completes the proof.
Returning to arbitrary deterministic protocols (as opposed to rectangles), write as a union of disjoint rectangles by Lemma 8. Applying Lemma 9 and summing over all rectangles we get that the distinguishing advantage of is at most . For the advantage is at most , concluding the proof.
In number-on-forehead (NOF) communication complexity [CFL83] with parties, the input is a -tuple and each party sees all of it except . For background, it is not known how to prove negative results for parties.
We mention that Theorem 1 can be extended to the multiparty setting, see [GVa]. Several questions arise here, such as whether this problem remains hard for , and what is the minimum length of an interleaved product that is hard for parties (the proof in 1 gives a large constant).
However, in this survey we shall instead focus on the problem of separating deterministic and randomized communication. For , we know the optimal separation: The equality function requires communication for deterministic protocols, but can be solved using communication if we allow the protocols to use public coins. For , the best known separation between deterministic and randomized protocols is vs [BDPW10]. In the following we give a new proof of this result, for a different function: if and only if for . As is true for some functions in [BDPW10], a stronger separation could hold for . For context, let us state and prove the upper bound for randomized communication.
Proof. In the number-on-forehead model, computing reduces to two-party equality with no additional communication: Alice computes privately, then Alice and Bob check if .
To prove the lower bound for deterministic protocols we reduce the communication problem to a combinatorial problem.
For intuition, if is the abelian group of real numbers with addition, a corner becomes for , which are the coordinates of an isosceles triangle. We now state the theorem that connects corners and lower bounds.
It is known that implies a corner for certain abelian groups , see [LM07] for the best bound and pointers to the history of the problem. For a stronger result is known: implies a corner [Aus16]. This in turn implies communication .
Proof. We saw already twice that a number-in-hand -bit protocol can be written as a disjoint union of rectangles (Lemmas 4, 8). Likewise, a number-on-forehead -bit protocol can be written as a disjoint union of cylinder intersections for some :
The idea of the proof of the above fact is to consider the transcripts of : one can see that the inputs giving a fixed transcript form a cylinder intersection.
Let be a -bit protocol. Consider the inputs on which accepts. Note that at least fraction of them are accepted by some cylinder intersection . Let . Since the first two elements in the tuple determine the last, we have .
Now suppose contains a corner . Then
This implies , which is a contradiction because and so .
In this section we prove the corners theorem for quasirandom groups, following Austin [Aus16]. Our exposition has several minor differences with that in [Aus16], which may make it more computer-science friendly. Possibly a proof can also be obtained via certain local modifications and simplifications of Green’s exposition [Gre05b, Gre05a] of an earlier proof for the abelian case. We focus on the case $G = SL(2,q)$ for simplicity, but the proof immediately extends to other quasirandom groups (with corresponding parameters).
Theorem 1. Let $G = SL(2,q)$. Every subset $A \subseteq G^2$ of density $\mu(A) \ge 1/\log^a |G|$ contains a corner $\{(x,y), (xz,y), (x,zy)\}$ with $z \neq 1$.
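For intuition, suppose $A$ is a product set, i.e., $A = B \times C$ for $B, C \subseteq G$. Let’s look at the quantity

$$\mathbb{E}_{x,y,z \leftarrow G}\big[A(x,y)\, A(xz,y)\, A(x,zy)\big]$$

where $A(x,y) = 1$ iff $(x,y) \in A$. Note that the random variable in the expectation is equal to $1$ exactly when $x, y, z$ form a corner in $A$. We’ll show that this quantity is greater than $1/|G|$, which implies that $A$ contains a corner (where $z \neq 1$). Since we are taking $A = B \times C$, we can rewrite the above quantity as

$$\begin{aligned}
\mathbb{E}\big[B(x) C(y)\, B(xz) C(y)\, B(x) C(zy)\big] &= \mathbb{E}\big[B(x)\, C(y)\, B(xz)\, C(zy)\big]\\
&= \mathbb{E}\big[B(x)\, C(y)\, B(z)\, C(x^{-1} z y)\big]
\end{aligned}$$

where the last line follows by replacing $z$ with $x^{-1} z$ in the uniform distribution. If $\mu(A) \ge \delta$, then both $|B|/|G| \ge \delta$ and $|C|/|G| \ge \delta$. Condition on $x \in B$, $y \in C$, $z \in B$. Then the distribution $x^{-1} z y$ is a product of three independent distributions, each uniform on a set of density $\ge \delta$. (In fact, two distributions would suffice for this.) By Lemma 7, $x^{-1} z y$ is close to uniform in statistical distance. This implies that the above expectation equals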
for $\delta \ge 1/\log^a |G|$ for a small enough constant $a$. Hence, product sets of density polynomial in $1/\log |G|$ contain corners.
Given the above, it is natural to try to decompose an arbitrary set into product sets. We will make use of a more general result.
Let be some universe (we will take ) and let be a function (for us, ). Let be some set of functions, which can be thought of as “easy functions” or “distinguishers” (these will be rectangles or closely related to them). The next theorem shows how to decompose into a linear combination of the up to an error which is polynomial in the length of the combination. More specifically, will be indistinguishable from by the .
A different way to state the conclusion, which we will use, is to say that we can write so that is small.
The lemma is due to Frieze and Kannan [FK96]. It is called “weak” because it came after Szemerédi’s regularity lemma, which has a stronger distinguishing conclusion. However, the lemma is also “strong” in the sense that Szemerédi’s regularity lemma has as a tower of whereas here we have polynomial in . The weak regularity lemma is also simpler. There also exists a proof [Tao17] of Szemerédi’s theorem (on arithmetic progressions), which uses weak regularity as opposed to the full regularity lemma used initially.
Proof. We will construct the approximation through an iterative process producing functions . We will show that decreases by each iteration.
Start: Define (which can be realized setting ).
Iterate: If not done, there exists such that . Assume without loss of generality .
Update: where shall be picked later.
Let us analyze the progress made by the algorithm.
where the last line follows by taking . Therefore, there can only be iterations because .
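The iterative process in this proof can be sketched in Python over a finite universe. This is a sketch under stated assumptions rather than the lemma's exact formulation: the target `f` and the distinguishers take values in $[-1,1]$, the step size equals the error parameter `eps`, and the names `weak_regularity` and `correlation` are hypothetical. Under these assumptions each update lowers the mean squared error $\mathbb{E}[(f-g)^2]$ by at least `eps**2`, so the loop stops after at most `1/eps**2` updates.

```python
def correlation(f, g, d):
    """E[(f - g) * d] under the uniform distribution on the universe."""
    n = len(f)
    return sum((f[i] - g[i]) * d[i] for i in range(n)) / n


def weak_regularity(f, distinguishers, eps):
    """Greedy approximation of f by a signed combination of distinguishers.

    f: list of values in [-1, 1], one entry per point of the universe.
    distinguishers: list of lists of the same length, values in [-1, 1].
    Returns (coeffs, g, steps): coeffs[j] is the coefficient given to
    distinguishers[j], g is the resulting approximation, and steps is the
    number of updates performed. On return, |E[(f - g) d]| < eps for all d.
    """
    n = len(f)
    g = [0.0] * n                       # Start: the zero function
    coeffs = [0.0] * len(distinguishers)
    steps = 0
    while True:
        update = None
        for j, d in enumerate(distinguishers):
            corr = correlation(f, g, d)
            if abs(corr) >= eps:        # d still distinguishes f from g
                update = (j, 1.0 if corr > 0 else -1.0)
                break
        if update is None:
            return coeffs, g, steps     # Done: no distinguisher succeeds
        j, sign = update
        coeffs[j] += sign * eps         # Update: g <- g + sign * eps * d
        d = distinguishers[j]
        g = [g[i] + sign * eps * d[i] for i in range(n)]
        steps += 1
```

To match the application in these notes one would take the universe to be a grid and the distinguishers to be indicator vectors of rectangles; the same loop applies unchanged.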
Returning to the main proof, we will use the weak regularity lemma to approximate the indicator function for arbitrary by rectangles. That is, we take to be the collection of indicator functions for all sets of the form for . The weak regularity lemma shows how to decompose into a linear combination of rectangles. These rectangles may overlap. However, we ideally want to be a linear combination of non-overlapping rectangles. In other words, we want a partition of rectangles. It is possible to achieve this at the price of exponentiating the number of rectangles. Note that an exponential loss is necessary even if in every rectangle; or in other words in the uni-dimensional setting. This is one step where the terminology “rectangle” may be misleading – the set is not necessarily an interval. If it was, a polynomial rather than exponential blow-up would have sufficed to remove overlaps.
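The overlap-removal step can be illustrated with a short Python sketch. It refines each side of the rectangles into atoms, meaning classes of points that belong to exactly the same sets, and takes products of atoms. With $t$ rectangles there can be up to $2^t$ atoms per side, which is the exponential blow-up mentioned above. The helper names `atoms` and `disjoint_rectangles` are hypothetical.

```python
def atoms(universe, sets):
    """Partition `universe` into atoms with respect to `sets`.

    Two points are in the same atom iff they belong to exactly the same
    members of `sets`. With t sets there are at most 2**t atoms.
    """
    classes = {}
    for p in universe:
        signature = tuple(p in s for s in sets)
        classes.setdefault(signature, []).append(p)
    return list(classes.values())


def disjoint_rectangles(xs, ys, rectangles):
    """Refine possibly overlapping rectangles S x T into a partition.

    `rectangles` is a list of pairs (S, T) of sets. The result is a list
    of pairs (a, b) of atoms whose products a x b partition xs x ys, and
    every original rectangle is a union of some of these products.
    """
    x_atoms = atoms(xs, [S for S, _ in rectangles])
    y_atoms = atoms(ys, [T for _, T in rectangles])
    return [(a, b) for a in x_atoms for b in y_atoms]
```

Any linear combination of the original rectangles is constant on each product of atoms, so it can be rewritten as a combination of the non-overlapping pieces.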
In the above decomposition, note that it is natural to take the coefficients of rectangles to be the density of points in that are in the rectangle. This gives rise to the following claim.
Consequently, we have that , where is the sum of non-overlapping rectangles with coefficients .
Proof. Let be a partition decomposition with arbitrary weights. Let be a partition decomposition with weights being the average of . It is enough to show that for all rectangle distinguishers
By the triangle inequality, we have that
To bound , note that the error is maximized for a that respects the decomposition in non-overlapping rectangles, i.e., is the union of some non-overlapping rectangles from the decomposition. This can be argued using that, unlike , the value of and on a rectangle from the decomposition is fixed. But, from the point of “view” of such , ! More formally, . This gives
and concludes the proof.
We need to get still a little more from this decomposition. In our application of the weak regularity lemma above, we took the set of distinguishers to be characteristic functions of rectangles. That is, distinguishers that can be written as where and map . We will use that the same guarantee holds for and with range , up to a constant factor loss in the error. Indeed, let and have range . Write where and have range , and the same for . The error for distinguisher is at most the sum of the errors for distinguishers , , , and . So we can restrict our attention to distinguishers where and have range . In turn, a function with range can be written as an expectation for functions with range , and the same for . We conclude by observing that
Let us now finish the proof by showing a corner exists for sufficiently dense sets . We’ll use three types of decompositions for , with respect to the following three types of distinguishers, where and have range :
The first type is just rectangles, which is what we have been discussing until now. The distinguishers in the last two classes can be visualized over as parallelograms with a 45-degree angle. The same extra properties we discussed for rectangles can be verified to hold for them too.
Recall that we want to show
We’ll decompose the -th occurrence of via the -th decomposition listed above. We’ll write this decomposition as . We apply this in a certain order to produce sums of products of three functions. The inputs to the functions don’t change, so to avoid clutter we do not write them, and it is understood that in each product of three functions the inputs are, in order . The decomposition is:
We first show that the expectation of the first term is big. This takes the next two claims. Then we show that the expectations of the other terms are small.
Proof. We just need to get error for any product of three functions for the three decomposition types. We have:
This is similar to what we discussed in the overview, and is where we use mixing. Specifically, if or are at most for a small enough constant then we are done. Otherwise, conditioned on , the distribution on is uniform over a set of density , and the same holds for , and the result follows by Lemma 7.
Recall that we start with a set of density .
Proof. We will relate the expectation over to using the Hölder inequality: For random variables ,
To apply this inequality in our setting, write
By the Hölder inequality the expectation of the right-hand side is
The last three terms equal to because
where is the set in the partition that contains . Putting the above together we obtain
Finally, because the functions are positive, we have that . This concludes the proof.
It remains to show the other terms are small. Let be the error in the weak regularity lemma with respect to distinguishers with range . Recall that this implies error with respect to distinguishers with range . We give the proof for one of the terms and then we say little about the other two.
The proof involves changing names of variables and doing Cauchy-Schwarz to remove the terms with and bound the expectation above by , which is small by the regularity lemma.
Proof. Replace with in the uniform distribution to get
where the first inequality is by Cauchy-Schwarz.
Now replace and reason in the same way:
Replace to rewrite the expectation as
We want to view the last three terms as a distinguisher . First, note that has range . This is because and has range , where recall that is the set in the partition that contains . Fix . The last term in the expectation becomes a constant . The second term only depends on , and the third only on . Hence for appropriate functions and with range this expectation can be rewritten as
which concludes the proof.
There are similar proofs to show the remaining terms are small. For , we can perform simple manipulations and then reduce to the above case. For , we have a slightly easier proof than above.
Suppose our set has density , and the error in the regularity lemma is . By the above results we can bound
[AL00] Andris Ambainis and Satyanarayana V. Lokam. Improved upper bounds on the simultaneous messages complexity of the generalized addressing function. In Latin American Symposium on Theoretical Informatics (LATIN), pages 207–216, 2000.
[RY19] Anup Rao and Amir Yehudayoff. Communication complexity. 2019. https://homes.cs.washington.edu/~anuprao/pubs/book.pdf.
[Tao17] Terence Tao. Szemerédi’s proof of Szemerédi’s theorem, 2017. https://terrytao.files.wordpress.com/2017/09/szemeredi-proof1.pdf.
[Vioa] Emanuele Viola. Thoughts: Mixing in groups. https://emanueleviola.wordpress.com/2016/10/21/mixing-in-groups/.
[Viob] Emanuele Viola. Thoughts: Mixing in groups ii. https://emanueleviola.wordpress.com/2016/11/15/mixing-in-groups-ii/.
[Vio17] Emanuele Viola. Special topics in complexity theory. Lecture notes of the class taught at Northeastern University. Available at http://www.ccs.neu.edu/home/viola/classes/spepf17.html, 2017.
For more than 20 years we’ve had $n^{1+c^{-d}}$ lower bounds for threshold circuits of depth $d$ [IPS97], for a fixed $c$. There have been several “explanations” for the lack of progress [AK10]. Recently Chen and Tell have given a better explanation showing that you can’t even improve the result to a better $c$ without proving “the whole thing.”
Say you have a finite group and you want to compute the iterated product of elements.
Suppose you can compute this with circuits of size and depth . Now we show how you can trade size for depth. Put a complete tree with fan-in on top of the group product, where each node computes the product of its children (this is correct by associativity, in general this works for a monoid). This tree needs depth . If you stick your circuit of size and depth at each node, the depth of the overall circuit would be obviously and the overall size would be dominated by the input layer which is . If you are aiming for overall depth , you need . This gives size .
Hence we have shown that proving bounds for some depth suffices to prove lower bounds for depth .
Chen and Tell..
The above is not the most efficient way to build a tree! I am writing this post following their paper to understand what they do. As they say, the idea is quite simple. While above the size will be dominated by the input layer, we want to balance things so that every layer has roughly the same contribution.
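Let’s say we are aiming for size $n^{1+\epsilon}$ and let’s see what depth we can get. Let’s say now the size is $n^k$. Let us denote by $n_i$ the number of nodes at level $i$, with $i = 0$ being the root. The fan-in at level $i$ is $(n^{1+\epsilon}/n_i)^{1/k}$ so that the cost is $n_i \cdot (n^{1+\epsilon}/n_i) = n^{1+\epsilon}$ as desired. We have the recursion $n_{i+1} = n_i \cdot (n^{1+\epsilon}/n_i)^{1/k}$.
The solution to this recursion is $n_i = n^{(1+\epsilon)(1 - (1-1/k)^i)}$, see below.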
So that’s it. We need to get to nodes. So if you set you get say . Going back to , we have exhibited circuits of size and depth just . So proving stronger bounds than this would rule out circuits of size and depth .
Added later: About the recurrence.
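Letting $a_i := \log_n n_i$ we have the following recurrence for the exponents of $n_i$:

$$a_0 = 0, \qquad a_{i+1} = a_i (1 - 1/k) + (1+\epsilon)/k =: a_i b + c.$$

This gives $a_i = c \sum_{j < i} b^j = c\,\frac{1 - b^i}{1 - b} = (1+\epsilon)(1 - b^i)$.
If it was $a_0 = 1+\epsilon$, obviously $a_i$ would already be $1+\epsilon$. Instead for $a_0 = 0$ we need to get to $i = O(k \log(1/\epsilon))$.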
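As a numerical sanity check, the recurrence for the exponents can be iterated directly. The sketch below assumes the recurrence $a_0 = 0$, $a_{i+1} = a_i(1-1/k) + (1+\epsilon)/k$, where $a_i$ is the base-$n$ logarithm of the number of nodes at level $i$, the per-node circuit has size (fan-in)$^k$, and the target size is $n^{1+\epsilon}$. These formulas are reconstructed from the surrounding text, and the function names are hypothetical.

```python
import math


def exponent_sequence(k, eps, levels):
    """Iterate a_{i+1} = a_i * (1 - 1/k) + (1 + eps)/k starting from a_0 = 0."""
    a = [0.0]
    for _ in range(levels):
        a.append(a[-1] * (1 - 1 / k) + (1 + eps) / k)
    return a


def closed_form(k, eps, i):
    """Claimed solution a_i = (1 + eps) * (1 - (1 - 1/k)**i)."""
    return (1 + eps) * (1 - (1 - 1 / k) ** i)


def levels_needed(k, eps):
    """Smallest i with a_i >= 1, i.e. with at least n nodes at level i."""
    a, i = 0.0, 0
    while a < 1.0:
        a = a * (1 - 1 / k) + (1 + eps) / k
        i += 1
    return i
```

For instance, `levels_needed(2, 0.5)` is 2, since the exponents go 0, 0.75, 1.125. In general the number of levels is at most $\lceil k \ln((1+\epsilon)/\epsilon) \rceil$, because $-\ln(1-1/k) \ge 1/k$; this matches the $O(k \log(1/\epsilon))$ growth.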
My two cents..
I am not sure I need more evidence that making progress on long-standing bounds in complexity theory is hard, but I do find it interesting to prove these links; we have quite a few by now! The fact that we have been stuck forever just short of proving “the whole thing” makes me think that these long-sought bounds may in fact be false. Would love to be proved wrong, but it’s 2019, this connection is proved by balancing a tree better, and you feel confident that P $\neq$ NP?