Special Topics in Complexity Theory: class is over :-(

I put together in a single file all the lectures given by me. On the class webpage you can also find the scribes of the two guest lectures, and the students’ presentations. Many thanks to Matthew Dippel, Xuangui Huang, Chin Ho Lee, Biswaroop Maiti, Tanay Mehta, Willy Quach, and Giorgos Zirdelis for doing an excellent job scribing these lectures. (And for giving me perfect teaching evaluations. Though I am not sure if I biased the sample. It went like this. One day I said: “Please fill the student evaluations, we need 100%.” A student said: “100% what?  Participation or score?” I meant participation but couldn’t resist replying jokingly “both.”) Finally, thanks also to all the other students, postdocs, and faculty who attended the class and created a great atmosphere.

Special Topics in Complexity Theory, Lecture 19

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lecture 19, Guest lecture by Huacheng Yu, Scribe: Matthew Dippel

Guest lecture by Huacheng Yu on dynamic data structure lower bounds, for the 2D range query and 2D range parity problems. Thanks to Huacheng for giving this lecture and for feedback on the write-up.

What is covered.

  • Overview of Larsen’s lower bound for 2D range counting.
  • Extending these techniques for \Omega (\log ^{1.5}n / \log \log ^3 n) for 2D range parity.

2 Problem definitions

Definition 1. 2D range counting

Give a data structure D that maintains a weighted set of 2 dimensional points with integer coordinates, that supports the following operations:

  1. UPDATE: Add a (point, weight) tuple to the set.
  2. QUERY: Given a query point (x, y), return the sum of weights of points (x', y') in the set satisfying x' \leq x and y' \leq y.

Definition 2. 2D range parity

Give a data structure D that maintains an unweighted set of 2 dimensional points with integer coefficients, that supports the following operations:

  1. UPDATE: Add a point to the set.
  2. QUERY: Given a query point (x, y), return the parity of the number of points (x', y') in the set satisfying x' \leq x and y' \leq y.

Both of these definitions extend easily to the d-dimensional case, but we state the 2D versions as we will mainly work with those.

2.1 Known bounds

All upper bounds assume the RAM model with word size \Theta (\log n).

Upper bounds: Using range trees, we can create a data structure for 2D range counting, with all update and query operations taking time O(\log ^d n) time. With extra tricks, we can make this work for 2D range parity with operations running in time O((\log n / \log \log n)^d).

Lower bounds. There are a series of works on lower bounds:

  • Fredman, Saks ’89 – 1D range parity requires \Omega (\log n / \log \log n).
  • Patrascu, Demaine ’04 – 1D range counting requires \Omega (\log n).
  • Larsen ’12 – 2D range counting requires \Omega ((\log n / \log \log n)^2).
  • Larsen, Weinstein, Yu ’17 – 2D range parity requires \Omega (\log ^{1.5} n / \log \log ^3 n).

This lecture presents the recent result of [Larsen ’12] and [Larsen, Weinstein, Yu ’17]. They both use the same general approach:

  1. Show that, for an efficient approach to exist, the problem must demonstrate some property.
  2. Show that the problem doesn’t have that property.

3 Larsen’s technique

All lower bounds are in the cell probe model with word size \Theta (\log n).

We consider a general data structure problem, where we require a structure D that supports updates and queries of an unspecified nature. We further assume that there exists an efficient solution with update and query times o((\log n / \log \log n)^2). We will restrict our attention to operation sequences of the form u_1, u_2, \cdots , u_n, q. That is, a sequence of n updates followed by a single query q. We fix a distribution over such sequences, and show that the problem is still hard.

3.1 Chronogram method [FS89]

We divide the updates into r epochs, so that our sequence becomes:

\begin{aligned}U_r, U_{r-1}, \cdots , U_1, q\end{aligned}

where |U_i| = \beta ^i and \beta = \log ^5 n. The epochs are multiplicatively shrinking. With this requirement, we have that r = \Theta (\log n / \log \log n).

Let M be the set of all memory cells used by the data structure when run on the sequence of updates. Further, let A_i be the set of memory cells which are accessed by the structure at least once in U_i, and never again in a further epoch.

Claim 1. The A_r, A_{r-1}, \cdots A_1 are disjoint.

Claim 2. There exists an epoch i such that D probes o(\log n / \log \log n) cells from A_i when answering the query at the end. Note that this is simply our query time divided by the number of epochs. In other words, D can’t afford to read \Omega (\log n / \log \log n) cells from each A_i set without breaking its promise on the query run time.

Claim 2 implies that there is an epoch i which has the smallest effect on the final answer. We will call this the ”easy” epoch.

Idea. : The set A_i contains ”most” information about U_i among all memory cells in M. Also, A_r, A_{r-1}, \cdots , A_{i+1} are not updated past epoch i + 1, and hence should contain no information relative to the updates in U_i. Epochs A_{i-1}, A_{i-2}, \cdots A_1 are progressively shrinking, and so the total touched cells in A_i during the query operation should be small.

\begin{aligned}\sum _{j < i}|A_j| \leq O(\beta ^{i - 1}) \cdot \log ^2 n\end{aligned}

3.2 Communication game

Having set up the framework for how to analyze the data structure, we now introduce a communication game where two parties attempt to solve an identical problem. We will show that, an efficient data structure implies an efficient solution to this communication game. If the message is smaller than the entropy of the updates of epoch i (conditioned on preceding epochs), this gives an information theoretic contradiction. The trick is to find a way for the encoder to exploit the small number of probed cells to send a short message.

The game. The game consists of two players, Alice and Bob, who must jointly compute a single query after a series of updates. The model is as follows:

  • Alice has all of the update epochs U_r, U_{r-1}, ... U_1. She also has an index i, which still corresponds to the ”easy” epoch as defined above.
  • Bob has all update epochs EXCEPT for U_i. He also has a random query q. He is aware of the index i.
  • Communication can only occur in a single direction, from Alice to Bob.
  • We assume some fixed input distribution \mathcal {D}.
  • They win this game if Bob successfully computes the correct answer for the query q.

Then we will show the following generic theorem, relating this communication game to data structures for the corresponding problem:

Theorem 3. If there is a data structure with update time t_u and probes t cells from A_i in expectation when answering the final query q, then the communication game has an efficient solution, with O(p|U_i|t_u\log n + \beta ^{i-1}t_u\log n ) communication cost, and success probability at least p^t. This holds for any choice of 0 < p < 1.

Before we prove the theorem, we consider specific parameters for our problem. If we pick

\begin{aligned} p &= 1 / \log ^5n, \\ t_u &= \log ^2 n, \\ t &= o(\log n / \log \log n), \end{aligned}

then, after plugging in the parameters, the communication cost is |U_i| / \log ^2 n. Note that, we could always trivially achieve |U_i| by having Alice send Bob all of U_i, so that he can compute the solution of the problem with no uncertainty. The success probability is (\log ^{-5} n)^{o(\log n / \log \log n)}, which simplifies to 2^{-o(\log n)} = 1 / n^{o(1)}. This is significantly better than 1 / n^{O(1)}, which could be achieved trivially by having Bob output a random answer to the query, independent of the updates.


We assume we have a data structure D for the update / query problem. Then Alice and Bob will proceed as follows:

Alice’s steps.

  1. Simulate D on U_r, U_{r - 1}, ... U_1. While doing so, keep track of memory cell accesses and compute A_r, A_{r-1}, ... A_1.
  2. Sample a random subset C \subset A_i, such that |C| = p|A_i|.
  3. Send C \cup A_{i-1} \cup A_{i-2} \cup ... A_1.

We note that in Alice’s Step 3, to send a cell, she sends a tuple holding the cell ID and the cell state before the query was executed. Also note that, she doesn’t distinguish to Bob which cells are in which sets of the union.

Bob’s steps.

  1. Receive C' from Alice.
  2. Simulate D on epochs U_{r}, U_{r-1}, ... U_{i+1}. Snapshot the current memory state of the data structure as M.
  3. Simulate the query algorithm. Every time q attempts to probe cell c, Bob checks if c \in C'. If it is, he lets D probe from C'. Otherwise, he lets D probe from M.
  4. Bob returns the result from the query algorithm as his answer.

If the query algorithm does not query any cell in A_i - C, then Bob succeeds, as he can exactly simulate the data structure query. Since the query will check t cells in A_i, and Bob has a random subset of them of size p|A_i|, then the probability that he got a subset the data structure will not probe is at least p^t. The communication cost is the cost of Alice sending the cells to Bob, which is

\begin{aligned} (p|A_i| + \sum _{j < i}|A_i|) \leq (pt_u + |U_i| + \beta ^{i-1}t_u)\log n\end{aligned}


4 Extension to 2D Range Parity

The extension to 2D range parity proceeds in nearly identical fashion, with a similar theorem relating data structures to communication games.

Theorem 1. Consider an arbitrary data structure problem where queries have 1-bit outputs. If there exists a data structure having:

  • update time t_u
  • query time t_q
  • Probes t cells from A_i when answering the last query q

Then there exists a protocol for the communication game with O(p|U_i|t_i\log n + t_u\beta ^{i-1}\log n ) bits of communication and success probability at least 1/2 + 2^{-O(\sqrt {t_q t (\log (1 / p)^3})}, for any choice of 0 < p < 1. Again, we plug in the parameters from 2D range parity. If we set

\begin{aligned} t_u = t_q &= o(\log ^{1.5}n / (\log \log n)^2), \\ t = t_q / r &= o(\log ^ (1/2) n / \log \log n), \\ p &= 1 / \log ^5 n, \end{aligned}

then the cost is |U_i| / \log ^2 n, and the probability simplifies to 1/2 + 1 / n^{o(1)}.

We note that, if we had Q = n^{O(1)} different queries, then randomly guessing on all of them, with constant probability we could be correct on as many as Q/2 \pm O(\sqrt {Q}). In this case, the probability of being correct on a single one, amortized, is 1/2 + 1/n^{\Theta (1)}.

Proof. The communication protocol will be slightly adjusted. We assume an a priori distribution on the updates and queries. Bob will then compute the posterior distribution, based on what he knows and what Alice sends him. He then computes the maximum likelihood answer to the query q. We thus need to figure out what Alice can send, so that the answer to q is often biased towards either 1 or 0.

We assume the existence of some public randomness available to both Alice and Bob. Then we adjust the communication protocol as follows:

Alice’s modified steps.

  • Alice samples, using the public randomness, a subset of ALL memory cells M_2, such that each cell is sampled with probability p. Alice sends M_2 \cap A_i to Bob. Since Bob can mimic the sampling, he gains additional information about which cells are and aren’t in A_i.

Bob’s modified steps.

  • Denote by S the set of memory cells probed by the data structure when Bob simulates the query algorithm. That is, S is what Bob ”thinks” D will probe during the query, as the actual set of cells may be different if Bob had full knowledge of the updates, and the data structure may use that information to determine what to probe. Bob will use S to compute the posterior distribution.

Define the function f(z) : [2^w] \rightarrow \mathbb {R} to be the ”bias” when S takes on the value z. In particular, this function is conditioned on C' that Bob receives from Alice. We can then clarify the definition of f as

\begin{aligned} f_{C'}(z) &:= (\text {Pr}[\text {ans to q } = 1 | C', S \leftarrow z] - 1/2) * \text {Pr}[S \leftarrow z | C'] \end{aligned}

In particular, f has the following two properties:

  1. \sum _z |f(z)| \leq 1
  2. \mathbb {E}_{C'}[\max _z |f(z)|] \geq 1/2 \cdot p^t

In these statements, the expectation is over everything that Bob knows, and the probabilities are also conditioned on everything that Bob knows. The randomness comes from what he doesn’t know. We also note that when the query probes no cells in A_i - C', then the bias is always 1/2, since the a posterior distribution will put all its weight on the correct answer of the query.

Finishing the proof requires the following lemma:

Lemma 2. For any f with the above two properties, there exists a Y \subseteq S such that |Y| \leq O(\sqrt {|S| \log 1/p^t}) and

\begin{aligned} \sum _{y \in Y} \left |\sum _{z | y} f(z) \right | &\geq 2^{-O(\sqrt {|S| \log 1 / p^t})}. \end{aligned}

Note that the sum inside the absolute values is the bias when Y \leftarrow y. \square


[FS89]   Michael L. Fredman and Michael E. Saks. The cell probe complexity of dynamic data structures. In ACM Symp. on the Theory of Computing (STOC), pages 345–354, 1989.

How to buy a house II

After (reading?) my previous post, a bank agent suggested I get a “buyer-ready” mortgage commitment from the bank. This, they said, would make me “compete with cash buyers”. Naturally, I was suspicious, but they insisted that my offer would be “indistinguishable” from a cash offer, from the point of view of the seller. What can I tell you? I fell for the cs terminology.

I spent months fishing out, producing, and emailing back-and-forth documents. I found a little strange that my being tenured did not affect their evaluation of my financial stability the least. I thought I could provide a small but stable cash flow that they could reliably bleed white over the course of my remaining lifetime. The only logical explanation I have is that they benefit if I default. Instead, they were very curious about exactly why I wrote multiple checks for a few thousand dollars that were cashed in California?

The barrage of bureaucracy got to the point that I had to switch lender, in favor of someone who was less demanding in that department. At long last, I got back into the market, however only to find out that the document I had chased so hard was almost completely worthless. To explain in one word: appraisal.

This buyer-ready commitment is still contingent on appraisal. This means that after the offer is accepted, the bank still has to go there and see the property, and decide if it is valued right. Only in that case I get the mortgage. That means that the seller can’t be sure I have the dough, so why should they bother with me? Indeed, they don’t. The only slight advantage that this document provides is a little saving in time over someone who has to get a mortgage from scratch. But that has nothing to do with competing against cash buyers.

For the benefit of posterity, let me list the three main contingencies related to buying real-estate the old way.

MORTGAGE: This is whether the bank thinks that you (the buyer) are financially stable enough to be given a loan. This is the check that you can preprocess with the “buyer-ready commitment.”

APPRAISAL: As mentioned above, this is whether the bank thinks that the *property* is actually worth the money they put down. This can’t be done until after an offer is accepted, requires one-two appraisers, and guess who pays for them. In today’s crazy market when properties are sold way over asking price, you can’t be sure at all that the appraiser will say the house is worth what you pay for. At least, I can’t. And if they don’t, you are supposed to pay for the difference, which most likely you don’t have. For example, putting down all your savings of $200k, you can get a loan of $800k, for a purchase price of $1M. The house which you saw listed for $800k is sold for $1M, but the appraiser says the right price is $900k. Either you find another $100k quick, or you lose the 5% you gave at the purchase and sale (and the deal is over). Appraisal should not be confused with assessment, which is how much the town thinks the house is worth for tax purposes.

INSPECTION: OK, you can forget this. Moreover, from my experience a general inspection is nearly useless. If you are paying $1M for a house, why do you care if the boiler needs to be updated? Anything which interests me, like does this house have lead/asbestos/mold/structural damage/pest etc. the inspector can’t answer on the spot. For each of those things you need a different specialist, which you can’t get in time, and who can’t even do the job until the house is yours (because for example they can’t collect samples).

The running joke in the area where I am looking continues to be to list houses ridiculously below market price, and then have inexperienced families stress over their offers just to see them wiped out by yet another $1M cash. There are reasons slightly more subtle than my poverty why I think this is outrageous. Today’s house-buying protocol does nothing but force poor people into gambling desperate offers which could result in their financial ruin. Why don’t we also legalize Russian Roulette then? I think today’s protocol should be made illegal. That is, we should find a way so that someone with a mortgage has a fair shot at buying a house. There are several ways in which this could be realized. For example, the offers should not reveal the appraisal contingency. The fact that the buyer pays for the appraisal prevents them from making baseless offers. And the millionaires who offered less can wait one day for the appraisal to come back.

Nevertheless, after a 3-year ordeal, I am now a homeowner. Here’s how my offer went. First, it so happens that I was sick on that fateful Thursday. At around noon, a new listing pops up. The open house is scheduled for the week-end, so I might just wait for that, right? I instantly call and schedule a showing for the afternoon. At around 6PM, with effort I manage to get to the house. As usual, there are already 5 other interested parties, and the broker is busy scheduling more visits over the phone. At 9PM we put in an offer with a 16-hour deadline. The offer is completely “clean:” here’s the money, no contingencies, no questions asked. Moreover, it is over asking, though not by very much. My wife has not seen the property.

I then go to a pharmacy to buy medications. There I meet someone who was checking out the house at the same time as me! They say the house needs $.5M in works, which I later take as a move to kick me out of the competition. They also ask me if I’d be interested in putting in an offer.

To my astonishment, our offer is accepted on Friday morning. For once, I was the annoying person who took the property out of the market before the open house! There is however a small caveat: you wouldn’t think that the above gets you a house where you can actually live, would you?

Special Topics in Complexity Theory, Lecture 18

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lecture 18, Scribe: Giorgos Zirdelis

In this lecture we study lower bounds on data structures. First, we define the setting. We have n bits of data, stored in s bits of memory (the data structure) and want to answer m queries about the data. Each query is answered with d probes. There are two types of probes:

  • bit-probe which return one bit from the memory, and
  • cell-probe in which the memory is divided into cells of \log n bits, and each probe returns one cell.

The queries can be adaptive or non-adaptive. In the adaptive case, the data structure probes locations which may depend on the answer to previous probes. For bit-probes it means that we answer a query with depth-d decision trees.

Finally, there are two types of data structure problems:

  • The static case, in which we map the data to the memory arbitrarily and afterwards the memory remains unchanged.
  • The dynamic case, in which we have update queries that change the memory and also run in bounded time.

In this lecture we focus on the non-adaptive, bit-probe, and static setting. Some trivial extremes for this setting are the following. Any problem (i.e., collection of queries) admits data structures with the following parameters:

  • s=m and d=1, i.e. you write down all the answers, and
  • s=n and d=n, i.e. you can always answer a query about the data if you read the entire data.

Next, we review the best current lower bound, a bound proved in the 80’s by Siegel [Sie04] and rediscovered later. We state and prove the lower bound in a different way. The lower bound is for the problem of k-wise independence.

Problem 1. The data is a seed of size n=k \log m for a k-wise independent distribution over \{0,1\}^m. A query i is defined to be the i-th bit of the sample.

The question is: if we allow a little more space than seed length, can we compute such distributions fast?

Theorem 2. For the above problem with k=m^{1/3} it holds that

\begin{aligned} d \geq \Omega \left ( \frac {\lg m}{\lg (s/n)} \right ). \end{aligned}

It follows, that if s=O(n) then d is \Omega (\lg m). But if s=n^{1+\Omega (1)} then nothing is known.

Proof. Let p=1/m^{1/4d}. We have the memory of s bits and we are going to subsample it. Specifically, we will select a bit of s with probability p, independently.

The intuition is that we will shrink the memory but still answer a lot of queries, and derive a contradiction because of the seed length required to sample k-wise independence.

For the “shrinking” part we have the following. We expect to keep p\cdot s memory bits. By a Chernoff bound, it follows that we keep O(p\cdot s) bits except with probability 2^{-\Omega (p \cdot s)}.

For the “answer a lot of queries” part, recall that each query probes d bits from the memory. We keep one of the m queries if it so happens that we keep all the d bits that it probed in the memory. For a fixed query, the probability that we keep all its d probes is p^d = 1/m^{1/4}.

We claim that with probability at least 1/m^{O(1)}, we keep \sqrt {m} queries. This follows by Markov’s inequality. We expect to not keep m - m^{3/4} queries on average. We now apply Markov’s inequality to get that the probability that we don’t keep at least m - \sqrt {m} queries is at most (m - m^{3/4})/(m-\sqrt {m}).

Thus, if 2^{-\Omega (p\cdot s)} \leq 1/m^{O(1)}, then there exists a fixed choice of memory bits that we keep, to achieve both the “shrinking” part and the “answer a lot of queries” part as above. This inequality is true because s \geq n > m^{1/3} and so p \cdot s \ge m^{-1/4 + 1/3} = m^{\Omega (1)}. But now we have O(p \cdot s) bits of memory while still answering as many as \sqrt {m} queries.

The minimum seed length to answer that many queries while maintaining k-wise independence is k \log \sqrt {m} = \Omega (k \lg m) = \Omega (n). Therefore the memory has to be at least as big as the seed. This yields

\begin{aligned} O(ps) \ge \Omega (n) \end{aligned}

from which the result follows. \square

This lower bound holds even if the s memory bits are filled arbitrarily (rather than having entropy at most n). It can also be extended to adaptive cell probes.

We will now show a conceptually simple data structure which nearly matches the lower bound. Pick a random bipartite graph with s nodes on the left and m nodes on the right. Every node on the right side has degree d. We answer each probe with an XOR of its neighbor bits. By the Vazirani XOR lemma, it suffices to show that any subset S \subseteq [m] of at most k memory bits has an XOR which is unbiased. Hence it suffices that every subset S \subseteq [m] with |S| \leq k has a unique neighbor. For that, in turn, it suffices that S has a neighborhood of size greater than \frac {d |S|}{2} (because if every element in the neighborhood of S has two neighbors in S then S has a neighborhood of size < d|S|/2). We pick the graph at random and show by standard calculations that it has this property with non-zero probability.

\begin{aligned} & \Pr \left [ \exists S \subseteq [m], |S| \leq k, \textrm { s.t. } |\mathsf {neighborhood}(S)| \leq \frac {d |S|}{2} \right ] \\ & = \Pr \left [ \exists S \subseteq [m], |S| \leq k, \textrm { and } \exists T \subseteq [s], |T| \leq \frac {d|S|}{2} \textrm { s.t. all neighbors of S land in T} \right ] \\ & \leq \sum _{i=1}^k \binom {m}{i} \cdot \binom {s}{d \cdot i/2} \cdot \left (\frac {d \cdot i}{s}\right )^{d \cdot i} \\ & \leq \sum _{i=1}^k \left (\frac {e \cdot m}{i}\right )^i \cdot \left (\frac {e \cdot s} {d \cdot i/2}\right )^{d\cdot i/2} \cdot \left (\frac {d \cdot i}{s}\right )^{d \cdot i} \\ & = \sum _{i=1}^k \left (\frac {e \cdot m}{i}\right )^i \cdot \left (\frac {e \cdot d \cdot i/2}{s}\right )^{d \cdot i/2} \\ & = \sum _{i=1}^k \left [ \underbrace { \frac {e \cdot m}{i} \cdot \left (\frac {e \cdot d \cdot i/2}{s}\right )^{d/2} }_{C} \right ]^{i}. \end{aligned}

It suffices to have C \leq 1/2, so that the probability is strictly less than 1, because \sum _{i=1}^{k} 1/2^i = 1-2^{-k}. We can match the lower bound in two settings:

  • if s=m^{\epsilon } for some constant \epsilon , then d=O(1) suffices,
  • s=O(k \cdot \log m) and d=O(\lg m) suffices.

Remark 3. It is enough if the memory is (d\cdot k)-wise independent as opposed to completely uniform, so one can have n = d \cdot k \cdot \log s. An open question is if you can improve the seed length to optimal.

As remarked earlier the lower bound does not give anything when s is much larger than n. In particular it is not clear if it rules out d=2. Next we show a lower bound which applies to this case.

Problem 4. Take n bits to be a seed for 1/100-biased distribution over \{0,1\}^m. The queries, like before, are the bits of that distribution. Recall that n=O(\lg m).

Theorem 5. You need s = \Omega (m).

Proof. Every query is answered by looking at d=2 bits. But t = \Omega (m) queries are answered by the same 2-bit function f of probes (because there is a constant number of functions on 2-bits). There are two cases for f:

  1. f is linear (or affine). Suppose for the sake of contradiction that t>s. Then you have a linear dependence, because the space of linear functions on s bits is s. This implies that if you XOR those bits, you always get 0. This in turn contradicts the assumption that the distributions has small bias.
  2. f is AND (up to negating the input variables or the output). In this case, we keep collecting queries as long as they probe at least one new memory bit. If t > s when we stop we have a query left such that both their probes query bits that have already been queried. This means that there exist two queries q_1 and q_2 whose probes cover the probes of a third query q_3. This in turn implies that the queries are not close to uniform. That is because there exist answers to q_1 and q_2 that fix bits probed by them, and so also fix the bits probed by q_3. But this contradicts the small bias of the distribution.



[Sie04]   Alan Siegel. On universal classes of extremely random constant-time hash functions. SIAM J. on Computing, 33(3):505–543, 2004.

Special Topics in Complexity Theory, Lectures 16-17

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lectures 16-17, Scribe: Tanay Mehta

In these lectures we prove the corners theorem for pseudorandom groups, following Austin [Aus16]. Our exposition has several non-major differences with that in [Aus16], which may make it more computer-science friendly. The instructor suspects a proof can also be obtained via certain local modifications and simplifications of Green’s exposition [Gre05bGre05a] of an earlier proof for the abelian case. We focus on the case G = \textit {SL}_2(q) for simplicity, but the proof immediately extends to other pseudorandom groups.

Theorem 1. Let G = \textit {SL}_2(q). Every subset A \subseteq G^2 of density \mu (A) \geq 1/\log ^a |G| contains a corner, i.e., a set of the form \{(x, y), (xz, y), (x, zy) ~|~ z \neq 1\}.

1.1 Proof Overview

For intuition, suppose A is a product set, i.e., A = B \times C for B, C \subseteq G. Let’s look at the quantity

\begin{aligned}\mathbb {E}_{x, y, z \leftarrow G}[A(x, y) A(xz, y) A(x, zy)]\end{aligned}

where A(x, y) = 1 iff (x, y) \in A. Note that the random variable in the expectation is equal to 1 exactly when x, y, z form a corner in A. We’ll show that this quantity is greater than 1/|G|, which implies that A contains a corner (where z \neq 1). Since we are taking A = B \times C, we can rewrite the above quantity as

\begin{aligned} & \mathbb {E}_{x, y, z \leftarrow G}[B(x)C(y) B(xz)C(y) B(x)C(zy)] \\ & = \mathbb {E}_{x, y, z \leftarrow G}[B(x)C(y) B(xz)C(zy)] \\ & = \mathbb {E}_{x, y, z \leftarrow G}[B(x)C(y) B(z)C(x^{-1}zy)] \end{aligned}

where the last line follows by replacing z with x^{-1}z in the uniform distribution. If \mu (A) \ge \delta , then \mu (B) \ge \delta and \mu (C) \ge \delta . Condition on x \in B, y \in C, z \in B. Then the distribution x^{-1}zy is a product of three independent distributions, each uniform on a set of measure greater than \delta . By pseudorandomness x^{-1}zy is 1/|G|^{\Omega (1)} close to uniform in statistical distance. This implies that the above quantity equals

\begin{aligned} & \mu (A) \cdot \mu (C) \cdot \mu (B) \cdot \left (\mu (C) \pm \frac {1}{|G|^{\Omega (1)}}\right )\\ & \geq \delta ^3 \left ( \delta - \frac {1}{|G|^{\Omega (1)}} \right ) \\ & \geq \delta ^4 /2 \\ & > 1/|G|. \end{aligned}

Given this, it is natural to try to write an arbitrary A as a combination of product sets (with some error). We will make use of a more general result.

1.2 Weak Regularity Lemma

Let U be some universe (we will take U = G^2). Let f:~U \rightarrow [-1,1] be a function (for us, f = 1_A). Let D \subseteq \{d: U \rightarrow [-1,1]\} be some set of functions, which can be thought of as “easy functions” or “distinguishers.”

Theorem 2.[Weak Regularity Lemma] For all \epsilon > 0, there exists a function g := \sum _{i \le s} c_i \cdot d_i where d_i \in D, c_i \in \mathbb {R} and s = 1/\epsilon ^2 such that for all d \in D

\begin{aligned}\mathbb {E}_{x \leftarrow U}[f(x) \cdot d(x)] = \mathbb {E}_{x \leftarrow U}[g(x) \cdot d(x)] \pm \epsilon .\end{aligned}

The lemma is called ‘weak’ because it came after Szemerédi’s regularity lemma, which has a stronger distinguishing conclusion. However, the lemma is also ‘strong’ in the sense that Szemerédi’s regularity lemma has s as a tower of 1/\epsilon whereas here we have s polynomial in 1/\epsilon . The weak regularity lemma is also simpler. There also exists a proof of Szemerédi’s theorem (on arithmetic progressions), which uses weak regularity as opposed to the full regularity lemma used initially.

Proof. We will construct the approximation g through an iterative process producing functions g_0, g_1, \dots , g. We will show that ||f - g_i||_2^2 decreases by \ge \epsilon ^2 each iteration.

  1. Start: Define g_0 = 0 (which can be realized setting c_0 = 0).
  2. Iterate: If not done, there exists d \in D such that |\mathbb {E}[(f - g) \cdot d]| > \epsilon . Assume without loss of generality \mathbb {E}[(f - g) \cdot d] > \epsilon .
  3. Update: g' := g + \lambda d where \lambda \in \mathbb {R} shall be picked later.

Let us analyze the progress made by the algorithm.

\begin{aligned} ||f - g'||_2^2 &~ = \mathbb {E}_x[(f - g')^2(x)] \\ &~ = \mathbb {E}_x[(f - g - \lambda d)^2(x)] \\ &~ = \mathbb {E}_x[(f - g)^2] + \mathbb {E}_x[\lambda ^2 d^2 (x)] - 2\mathbb {E}_x[(f - g)\cdot \lambda d(x)] \\ &~ \leq ||f - g||_2^2 + \lambda ^2 - 2\lambda \mathbb {E}_x[(f-g)d(x)] \\ &~ \leq ||f - g||_2^2 + \lambda ^2 - 2\lambda \epsilon \\ &~ \leq ||f-g||_2^2 - \epsilon ^2 \end{aligned}

where the last line follows by taking \lambda = \epsilon . Therefore, there can only be 1/\epsilon ^2 iterations because ||f - g_0||_2^2 = ||f||_2^2 \leq 1. \square

1.3 Getting more for rectangles

Returning to the lower bound proof, we will use the weak regularity lemma to approximate the indicator function for arbitrary A by rectangles. That is, we take D to be the collection of indicator functions for all sets of the form S \times T for S, T \subseteq G. The weak regularity lemma gives us A as a linear combination of rectangles. These rectangles may overlap. However, we ideally want A to be a linear combination of non-overlapping rectangles.

Claim 3. Given a decomposition of A into rectangles from the weak regularity lemma with s functions, there exists a decomposition with 2^{O(s)} rectangles which don’t overlap.

Proof. Exercise. \square

In the above decomposition, note that it is natural to take the coefficients of rectangles to be the density of points in A that are in the rectangle. This gives rise to the following claim.

Claim 4. The weights of the rectangles in the above claim can be the average of f in the rectangle, at the cost of doubling the distinguisher error.

Consequently, we have that f = g + h, where g is the sum of 2^{O(s)} non-overlapping rectangles S \times T with coefficients \Pr _{(x, y) \in S \times T}[f(x, y) = 1].

Proof. Let g be a partition decomposition with arbitrary weights. Let g' be a partition decomposition with weights being the average of f. It is enough to show that for all rectangle distinguishers d \in D

\begin{aligned}|\mathbb {E}[(f-g')d]| \leq |\mathbb {E}[(f-g)d]|.\end{aligned}

By the triangle inequality, we have that

\begin{aligned}|\mathbb {E}[(f-g')d]| \leq |\mathbb {E}[(f-g)d]| + |\mathbb {E}[(g-g')d]|.\end{aligned}

To bound \mathbb {E}[(g-g')d]|, note that the error is maximized for a d that respects the decomposition in non-overlapping rectangles, i.e., d is the union of some non-overlapping rectangles from the decomposition. This can be argues using that, unlike f, the value of g and g' on a rectangle S\times T from the decomposition is fixed. But, for such d, g' = f! More formally, \mathbb {E}[(g-g')d] = \mathbb {E}[(g-f)d]. \square

We need to get a little more from this decomposition. The conclusion of the regularity lemma holds with respect to distinguishers that can be written as U(x) \cdot V(y) where U and V map G \to \{0,1\}. We need the same guarantee for U and V with range [-1,1]. This can be accomplished paying only a constant factor in the error, as follows. Let U and V have range [-1,1]. Write U = U_+ - U_- where U_+ and U_- have range [0,1], and the same for V. The error for distinguisher U \cdot V is at most the sum of the errors for distinguishers U_+ \cdot V_+, U_+ \cdot V_-, U_- \cdot V_+, and U_- \cdot V_-. So we can restrict our attention to distinguishers U(x) \cdot V(y) where U and V have range [0,1]. In turn, a function U(x) with range [0,1] can be written as an expectation \mathbb{E} _a U_a(x) for functions U_a with range \{0,1\}, and the same for V. We conclude by observing that

\begin{aligned} \mathbb{E} _{x,y}[ (f-g)(x,y) \mathbb{E} _a U_a(x) \cdot \mathbb{E} _b V_b(y)] \le \max _{a,b} \mathbb{E} _{x,y}[ (f-g)(x,y) U_a(x) \cdot V_b(y)].\end{aligned}

1.4 Proof

Let us now finish the proof by showing a corner exists for sufficiently dense sets A \subseteq G^2. We’ll use three types of decompositions for f: G^2 \rightarrow \{0,1\}, with respect to the following three types of distinguishers, where U_i and V_i have range \{0,1\}:

  1. U_1(x) \cdot V_1(y),
  2. U_2(xy) \cdot V_2(y),
  3. U_3(x) \cdot V_3(xy).

The last two distinguishers can be visualized as parallelograms with a 45-degree angle between two segments. The same extra properties we discussed for rectangles hold for them too.

Recall that we want to show

\begin{aligned}\mathbb {E}_{x, y, g}[f(x, y) f(xg, y) f(x, gy)] > \frac {1}{|G|}.\end{aligned}

We’ll decompose the i-th occurrence of f via the i-th decomposition listed above. We’ll write this decomposition as f = g_i + h_i. We do this in the following order:

\begin{aligned} & ~f(x, y) \cdot f(xg, y) \cdot f(x, gy) \\ = & ~f(x, y) f(xg, y) g_3(x, gy) + f(x, y) f(xg, y) h_3(x, gy) \\ &~ \vdots \\ =&~ g_1 g_2 g_3 + h_1 g_2 g_3 + f h_2 g_3 + f f h_3 \end{aligned}

We first show that \mathbb{E} [g_1 g_2 g_3] is big (i.e., inverse polylogarithmic in expectation) in the next two claims. Then we show that the expectations of the other terms are small.

Claim 5. For all g \in G, the values \mathbb {E}_{x, y}[g_1(x, y) g_2(xg, y) g_3(x, gy)] are the same (over g) up to an error of 2^{O(s)} \cdot 1/|G|^{\Omega (1)}.

Proof. We just need to get error 1/|G|^{\Omega (1)} for any product of three functions for the three decomposition types. By the standard pseudorandomness argument we saw in previous lectures,

\begin{aligned} \mathbb {E}_{x, y}[c_1 U_1(x)V_1(y) \cdot c_2 U_2(xgy)V_2(y) \cdot c_3 U_3(x)V_3(xgy)] \\ = c_1 c_2 c_3 \mathbb {E}_{x, y}[(U_1 \cdot U_3)(x) (V_1 \cdot V_2)(y) (U_2 \cdot V_3)(xgy)] \\ = c_1 c_2 c_3 \cdot \mu (U_1 \cdot U_3) \mu (V_1 \cdot V_2) \mu (U_2 \cdot V_3) \pm \frac {1}{|G|^{\Omega (1)}}. \end{aligned}


Recall that we start with a set of density \ge 1/\log ^{a} |G|.

Claim 6. \mathbb {E}_{g, x, y}[g_1 g_2 g_3] > \Omega (1/\log ^{4a} |G|).

Proof. By the previous claim, we can fix g = 1_G. We will relate the expectation over x, y to f by a trick using the Hölder inequality: For random variables X_1, X_2, \ldots , X_k,

\begin{aligned}\mathbb {E}[X_1 \dots X_k] \leq \prod _{i=1}^k \mathbb {E}[X_i^{c_i}]^{1/c_i} \text { such that } \sum 1/c_i = 1.\end{aligned}

To apply this inequality in our setting, write

\begin{aligned}\mathbb {E}[f] = \mathbb {E}\left [(f \cdot g_1 g_2 g_3)^{1/4} \cdot \left (\frac {f}{g_1}\right )^{1/4}\cdot \left (\frac {f}{g_2}\right )^{1/4}\cdot \left (\frac {f}{g_3}\right )^{1/4}\right ].\end{aligned}

By the Hölder inequality, we get that

\begin{aligned}\mathbb {E}[f] \leq \mathbb {E}[f \cdot g_1 g_2 g_3]^{1/4} \mathbb {E}\left [\frac {f}{g_1}\right ]^{1/4} \mathbb {E}\left [\frac {f}{g_2}\right ]^{1/4} \mathbb {E}\left [\frac {f}{g_3}\right ]^{1/4}.\end{aligned}

Note that

\begin{aligned} \mathbb {E}_{x, y} \frac {f(x,y)}{g_1(x, y)} & = \mathbb {E}_{x, y} \frac {f(x, y)}{\mathbb {E}_{x', y' \in \textit {Cell}(x,y)}[f(x', y')] } \\ & = \mathbb {E}_{x, y} \frac {\mathbb {E}_{x', y' \in \textit {Cell}(x, y)}[f(x',y')]}{\mathbb {E}_{x', y' \in \textit {Cell}(x,y)}[f(x', y')] }\\ & = 1 \end{aligned}

where \textit {Cell}(x, y) is the set in the partition that contains (x, y). Finally, by non-negativity of f, we have that \mathbb {E}[f \cdot g_1 g_2 g_3]^{1/4} \leq \mathbb {E}[g_1 g_2 g_3]. This concludes the proof. \square

We’ve shown that the g_1 g_2 g_3 term is big. It remains to show the other terms are small. Let \epsilon be the error in the weak regularity lemma with respect to distinguishers with range [-1,1].

Claim 7. |\mathbb {E}[f f h_3]| \leq \epsilon ^{1/4}.

Proof. Replace g with gy^{-1} in the uniform distribution to get

\begin{aligned} & \mathbb {E}^4_{x, y, g}[f(x,y) f(xg,y)h_3(x, gy)] \\ & = \mathbb {E}^4_{x, y, g}[f(x,y) f(xgy^{-1},y)h_3(x, g)] \\ & = \mathbb {E}^4_{x, y}[f(x,y) \mathbb {E}_g [f(xgy^{-1},y)h_3(x, g)]] \\ & \leq \mathbb {E}^2_{x, y} [f^2(x, y)] \mathbb {E}^2_{x, y} \mathbb {E}^2_g [f(xgy^{-1},y)h_3(x, g)]\\ & \leq \mathbb {E}^2_{x, y} \mathbb {E}^2_g [f(xgy^{-1},y)h_3(x, g)]\\ & = \mathbb {E}^2_{x, y, g, g'}[f(xgy^{-1}, y) h_3(x, g) f(xg'y^{-1}, y) h_3(x, g')], \end{aligned}

where the first inequality is by Cauchy-Schwarz.

Now replace g \rightarrow x^{-1}g, g' \rightarrow x^{-1}g and reason in the same way:

\begin{aligned} & = \mathbb {E}^2_{x, y, g, g'}[f(gy^{-1}, y) h_3(x, x^{-1}g) f(g'y^{-1}, y) h_3(x, x^{-1}g')] \\ & = \mathbb {E}^2_{g, g', y}[f(gy^{-1}, y) \cdot f(g'y^{-1}, y) \mathbb {E}_x [h_3(x, x^{-1}g) \cdot h_3(x, x^{-1}g')]] \\ & \leq \mathbb {E}_{x,x',g,g'}[h_3(x, x^{-1}g) h_3(x, x^{-1}g') h_3(x', x'^{-1}g) h_3(x', x'^{-1}g')]. \end{aligned}

Replace g \rightarrow xg to rewrite the expectation as

\begin{aligned} \mathbb {E}[h_3(x, g) h_3(x, x^{-1}g') h_3(x', x'^{-1}xg) h_3(x', x'^{-1}g')].\end{aligned}

We want to view the last three terms as a distinguisher U(x) \cdot V(xg). First, note that h_3 has range [-1,1]. This is because h_3(x,y) = f(x,y) - \mathbb{E} _{x', y' \in \textit {Cell}(x,y)} f(x',y') and f has range \{0,1\}.

Fix x', g'. The last term in the expectation becomes a constant c \in [-1,1]. The second term only depends on x, and the third only on xg. Hence for appropriate functions U and V with range [-1,1] this expectation can be rewritten as

\begin{aligned} \mathbb {E}[h_3(x, g) U(x) V(xg)], \end{aligned}

which concludes the proof. \square

There are similar proofs to show the remaining terms are small. For fh_2g_3, we can perform simple manipulations and then reduce to the above case. For h_1 g_2 g_3, we have a slightly easier proof than above.

1.4.1 Parameters

Suppose our set has density \delta \ge 1/\log ^a |G|. We apply the weak regularity lemma for error \epsilon = 1/\log ^c |G|. This yields the number of functions s = 2^{O(1/\epsilon ^2)} = 2^{O(\log ^{2c} |G|)}. For say c = 1/3, we can bound \mathbb{E} _{x,y,g}[g_1 g_2 g_3] from below by the same expectation with g fixed to 1, up to an error 1/|G|^{\Omega (1)}. Then, \mathbb {E}_{x,y,g=1}[g_1g_2g_3] \geq \mathbb {E}[f]^4 = 1/\log ^{4a}|G|. The expectation of terms with h is less than 1/\log ^{c/4} |G|. So the proof can be completed for all sufficiently small a.


[Aus16]    Tim Austin. Ajtai-Szemerédi theorems over quasirandom groups. In Recent trends in combinatorics, volume 159 of IMA Vol. Math. Appl., pages 453–484. Springer, [Cham], 2016.

[Gre05a]   Ben Green. An argument of shkredov in the finite field setting, 2005. Available at people.maths.ox.ac.uk/greenbj/papers/corners.pdf.

[Gre05b]   Ben Green. Finite field models in additive combinatorics. Surveys in Combinatorics, London Math. Soc. Lecture Notes 327, 1-27, 2005.

Special Topics in Complexity Theory, Lecture 15

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lecture 15, Scribe: Chin Ho Lee

In this lecture fragment we discuss multiparty communication complexity, especially the problem of separating deterministic and randomized communication, which we connect to a problem in combinatorics.

2 Number-on-forehead communication complexity

In number-on-forehead (NOH) communication complexity each party i sees all of the input (x_1, \dotsc , x_k) except its own input x_i. For background, it is not known how to prove negative results for k \ge \log n parties. We shall focus on the problem of separating deterministic and randomizes communication. For k = 2, we know the optimal separation: The equality function requires \Omega (n) communication for deterministic protocols, but can be solved using O(1) communication if we allow the protocols to use public coins. For k = 3, the best known separation between deterministic and randomized protocol is \Omega (\log n) vs O(1) [BDPW10]. In the following we give a new proof of this result, for a simpler function: f(x, y, z) = 1 if and only if x \cdot y \cdot z = 1 for x, y, z \in SL_2(q).

For context, let us state and prove the upper bound for randomized communication.

Claim 1. f has randomized communication complexity O(1).

Proof. In the NOH model, computing f reduces to 2-party equality with no additional communication: Alice computes y \cdot z =: w privately, then Alice and Bob check if x = w^{-1}. \square

To prove a \Omega (\log n) lower bound for deterministic protocols, where n = \log |G|, we reduce the communication problem to a combinatorial problem.

Definition 2. A corner in a group G is \{ (x,y), (xz, y), (x,zy) \} \subseteq G^2, where x, y are arbitrary group elements and z \neq 1_G.

For intuition, consider the case when G is Abelian, where one can replace multiplication by addition and a corner becomes \{ (x, y), (x + z, y), (x, y + z)\} for z \neq 0.

We now state the theorem that gives the lower bound.

Theorem 3. Suppose that every subset A \subseteq G^2 with \mu (A) := |A|/|G^2| \ge \delta contains a corner. Then the deterministic communication complexity of f(x, y, z) = 1 \iff x \cdot y \cdot z = 1_G is \Omega (\log (1/\delta )).

It is known that when G is Abelian, then \delta \ge 1/\mathrm {polyloglog}|G| implies a corner. We shall prove that when G = SL_2(q), then \delta \ge 1/\mathrm {polylog}|G| implies a corner. This in turn implies communication \Omega (\log \log |G|) = \Omega (\log n).

Proof. We saw that a number-in-hand (NIH) c-bit protocol can be written as a disjoint union of 2^c rectangles. Likewise, a number-on-forehead c-bit protocol P can be written as a disjoint union of 2^c cylinder intersections C_i := \{ (x, y, z) : f_i(y,z) g_i(x,z) h_i(x,y) = 1\} for some f_i, g_i, h_i\colon G^2 \to \{0, 1\}:

\begin{aligned} P(x,y,z) = \sum _{i=1}^{2^c} f_i(y,z) g_i(x,z) h_i(x,y). \end{aligned}

The proof idea of the above fact is to consider the 2^c transcripts of P, then one can see that the inputs giving a fixed transcript are a cylinder intersection.

Let P be a c-bit protocol. Consider the inputs \{(x, y, (xy)^{-1}) \} on which P accepts. Note that at least 2^{-c} fraction of them are accepted by some cylinder intersection C. Let A := \{ (x,y) : (x, y, (xy)^{-1}) \in C \} \subseteq G^2. Since the first two elements in the tuple determine the last, we have \mu (A) \ge 2^{-c}.

Now suppose A contains a corner \{ (x, y), (xz, y), (x, zy) \}. Then

\begin{aligned} (x,y) \in A &\implies (x, y, (xy)^{-1}) \in C &&\implies h(x, y) = 1 , \\ (xz,y) \in A &\implies (xz, y, (xzy)^{-1}) \in C &&\implies f(y,(xyz)^{-1}) = 1 , \\ (x,zy) \in A &\implies (x, zy, (xzy)^{-1}) \in C &&\implies g(x,(xyz)^{-1}) = 1 . \end{aligned}

This implies (x,y,(xzy)^{-1}) \in C, which is a contradiction because z \neq 1 and so x \cdot y \cdot (xzy)^{-1} \neq 1_G. \square


[BDPW10]   Paul Beame, Matei David, Toniann Pitassi, and Philipp Woelfel. Separating deterministic from randomized multiparty communication complexity. Theory of Computing, 6(1):201–225, 2010.