# shoving marijuana down the throats of Newton’s residents

Congratulations to the marijuana industry and the Newton MA administration for rigging the elections and pouring more than \$70K into a campaign strategist who lives in a neighboring city where recreational pot shops are banned, thereby snatching a narrow victory and shoving marijuana down the throats of Newton’s residents. When the pot shops open, owned by people who live in that same neighboring city which does not have them, I’ll raise a toast to you with a marijuana drink.

Well, I think I am taking a break from politics, at least until I have stronger financial backing. I have a bigger impact on society with my research.

# to opt out must vote no to sham 2-4 “limit” and vote yes to Opt Out

Even if you don’t live in Newton, MA, it may interest you to know how the marijuana industry is doing everything it can to win this ballot, including rigging the election twice (one and two), and even hiring a national, professional political consulting company. To learn more, see the Opt Out website.

# Just coincidence?

Proving lower bounds is one of the greatest intellectual challenges of our time. Something that strikes me is when people reach the same bounds from seemingly different angles.  Two recent examples:

• Static Data Structure Lower Bounds Imply Rigidity, by Golovnev, Dvir, Weinstein.  They show that improving static data-structure lower bounds, for linear data structures, implies new lower bounds for matrix rigidity.  My understanding (the paper isn’t out) is that the available weak but non-trivial data structure lower bounds imply the available weak but non-trivial rigidity lower bounds, and there is absolutely no room for improvement on the former without improving the latter.
• Toward the KRW Composition Conjecture: Cubic Formula Lower Bounds via Communication Complexity, by Dinur and Meir.  They reprove the $n^3$ bound on formula size via seemingly different techniques.

What does this mean?  Again, the only things that matter are those that you can prove.  Still, here are some options:

• Lower bounds are true, and provable with the bag of tricks people are using. The above is just coincidence. Given the above examples (and others) I find this possibility quite bizarre. To illustrate the bizarre in a bizarre way, imagine a graph where each edge is a trick from the bag, and each node is a bound. Why should different paths lead to the same sink, over and over again?
• Lower bounds are true, but you need to use a different bag of tricks. My impression is that two types of results are available here. The first is for “infinitary” proof systems, and includes famous results like the Paris-Harrington theorem. The second is for “finitary” proof systems, and includes results like Razborov’s proof that superpolynomial lower bounds cannot be proved in Res(k). What I really would like is a survey that explains what these and all other relevant proof systems are and can do, and what it would mean to either strengthen the proof system or make the unprovable statement closer to the state of the art. (I don’t even have the excuse of not having a background in logic. I took classes both in Italy and in the USA. In Italy I went to a summer school in logic, and took the logic class in the math department. It was a rather tough class, one of the last offerings before the teacher was forced to water it down. If I remember correctly, it lasted an entire year (though now that seems like a lot). As in the European tradition, at least of the time, instruction was mostly one-way: you’d sit there for hours each week and just swallow this avalanche of material. At the very end, there was an oral exam where you sit face-to-face with the instructor, who mostly asks you to repeat random bits of the lectures. But the bright students are also asked some simple original problems, to be solved on the spot. So there is substantial focus on memorization, a word which has acquired a negative connotation, some of which I sympathize with. However, a 30-minute oral exam does have its benefits, and in certain respects I’d argue it can’t quite be replaced by written exams, let alone take-home ones. But I digress.)
• Lower bounds are false. That is, all “simple” functions have say $n^3$ formula size.  You can prove this using computational checkpoints, a notion which in hindsight isn’t too complicated, but alas has not yet been invented.  To me, this remains the more likely option.

What do you think?

# How to rig an election

After the historic signature collection there was a pitched battle to decide which questions to put on the ballot.  Alas, the battle resulted in somewhat of a defeat for the residents of Newton.  The councilors of Newton saw it fit to put two conflicting questions on the ballot, and to resolve the conflict by stipulating that if both questions pass, the one with the highest number of yes votes will prevail. As explained below, this forces residents to strategize, take a risk, and in a way answer questions against their true preference — a well-known, and bad, situation in election theory.

The two questions are:

• Question 1:  Shall the City adopt the following general ordinance?
All recreational marijuana retail establishments shall be prohibited from operating in the City of Newton. Councilors unanimously approved the inclusion of this question on the ballot.
• Question 2:  Shall the City adopt the following zoning ordinance?
The number of recreational marijuana retail establishments shall be not fewer than two (2) nor more than four (4). Councilors approved the inclusion of this question on the ballot by a vote of 11 to 10.

Yes, the motion to put Question 2 on the ballot passed by 1 vote. Each of those 11 councilors can go home feeling satisfied that they bear full responsibility for ignoring the clear preference of their constituents. It doesn’t matter what the chief of the Newton police says, or what the former head of Newton-Wellesley Hospital says, or what any of the other dozens of high-profile people say, or that you collected thousands of signatures. Those 11 councilors know what’s best for Newton. (Oh, and by the way, the upper bound is meaningless and can be easily increased.)

Before they convened to deliberate I sent them this message:

• If you want to put another question on the ballot besides a simple YES/NO question, then you should first collect 7,000 signatures.

I doubt they could have even collected 70 for Question 2.

But the real problem is the rule I mentioned before, that if both questions have a majority of yes votes, the one with the highest number of yes votes will prevail. To illustrate, consider the following realistic scenario. Suppose that a resident of Newton loathes recreational marijuana establishments. When they go to the ballot, they obviously vote yes on Question 1. What should they do about Question 2? If Question 1 loses, they are better off if Question 2 wins. Suppose they also vote yes on 2, and that 99% of Newton residents behave this way. Then it’s enough that a merry 1% band of business(wo)men vote no on Question 1 and yes on Question 2, and they harness all the votes that people cast to their own advantage.
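To see the arithmetic of the rule concretely, here is a minimal Python sketch that tallies the outcome. The function name `outcome`, the 1,000-voter electorate, and the handling of ties (which the rule, as described, does not settle) are assumptions made only for this illustration.

```python
def outcome(q1_yes, q2_yes, voters):
    """Hypothetical tally of the rule described above (not official ballot logic).

    q1_yes: yes votes on Question 1 (ban); q2_yes: yes votes on Question 2 (2-4 stores).
    A question passes with a strict majority of yes votes; if both pass,
    the one with more yes votes prevails. Ties are reported separately,
    since the rule as described does not cover them.
    """
    q1_passes = 2 * q1_yes > voters
    q2_passes = 2 * q2_yes > voters
    if q1_passes and q2_passes:
        if q1_yes == q2_yes:
            return "tie"
        return "ban" if q1_yes > q2_yes else "limit"
    if q1_passes:
        return "ban"
    if q2_passes:
        return "limit"
    return "default"


# 1,000 voters: 990 oppose stores and vote yes on both; 10 vote no on 1 and yes on 2.
print(outcome(q1_yes=990, q2_yes=1000, voters=1000))  # limit
# If the 990 instead strategize and vote no on Question 2:
print(outcome(q1_yes=990, q2_yes=10, voters=1000))  # ban
# ...but if Question 1 then fails, nothing limits the stores:
print(outcome(q1_yes=400, q2_yes=10, voters=1000))  # default
```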

There do exist fair ways of having both questions on the ballot, but this isn’t one. The current setup forces people who really want to ban recreational marijuana to strategize by voting no on Question 2, and risk that if Question 1 loses, they end up with unlimited recreational stores.

Maybe it’s a little hard to understand this in terms of marijuana.  Consider the following scenario:

1. Question 1: Do you want to ban torture?
2. Question 2: Do you want to limit the amount of torture that can be inflicted upon you?
3. Default: Unlimited torture can be inflicted upon you.
4. If both Questions 1 and 2 have majority Yes, the one with the highest number of yes prevails.

It is not going to be easy, but it seems that in the upcoming campaign we will have to convince people to answer ‘NO’ to Question 2.

# The residents of Newton vs. the marijuana industry: 1-1

In a historic effort, the residents of Newton MA collected 6,000+ signatures in a very short time, thanks to which a forthcoming ballot will include a question on banning recreational marijuana sales in the city. (For background see the previous post and comments.)

# bounded independence plus noise fools space

There are many classes of functions on $n$ bits that we know are fooled by bounded independence, including small-depth circuits, halfspaces, etc. (See this previous post.)

On the other hand, the simple parity function is not fooled. It’s easy to see that you require independence $n$: the uniform distribution over strings of even Hamming weight is $(n-1)$-wise independent, yet parity is constant on it. However, if you just perturb the bits with a little noise $N$, then parity will be fooled. You can find other examples of functions that are not fooled by bounded independence alone, but are fooled if you just perturb the bits a little.
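A small exact computation makes the parity example concrete. The sketch below is an illustration with parameters of our own choosing (not taken from [2] or [3]): $n = 6$, $D$ uniform over even-weight strings, and noise $E$ made of i.i.d. bits equal to $1$ with probability $1/4$, the rate used in the proof below. It checks that $D$ is $5$-wise independent although parity is constant on it, and computes exactly the probability that parity$(D+E) = 1$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def parity(x):
    return sum(x) % 2


def even_weight_strings(n):
    """Support of D: the uniform distribution over n-bit strings of even Hamming weight."""
    return [x for x in product((0, 1), repeat=n) if parity(x) == 0]


def is_k_wise_independent(support, n, k):
    """True if, for x uniform over `support`, every k coordinates are uniform over {0,1}^k."""
    for S in combinations(range(n), k):
        counts = Counter(tuple(x[i] for i in S) for x in support)
        if len(counts) != 2 ** k or len(set(counts.values())) != 1:
            return False
    return True


def pr_parity_one(n, p):
    """Exact Pr[parity(D + E) = 1], with D as above and E i.i.d. bits equal to 1 w.p. p."""
    support = even_weight_strings(n)
    total = Fraction(0)
    for e in product((0, 1), repeat=n):
        weight = sum(e)
        pr_e = p ** weight * (1 - p) ** (n - weight)
        for d in support:
            noisy = tuple(a ^ b for a, b in zip(d, e))  # "+" is bit-wise XOR
            if parity(noisy) == 1:
                total += pr_e / len(support)
    return total


print(is_k_wise_independent(even_weight_strings(6), 6, 5))  # True
print(pr_parity_one(6, Fraction(0)))  # 0: without noise, parity is always 0
print(pr_parity_one(6, Fraction(1, 4)))  # 63/128
```

Since parity$(D+E)$ = parity$(E)$ here, the last value is $(1-(1-2\cdot\frac14)^6)/2 = 63/128$, within $1/128$ of $1/2$.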

In [3] we proved that any distribution with independence about $n^{2/3}$ fools space-bounded algorithms, if you perturb it with noise. We asked in the paper, and asked many people, whether the independence could be lowered. Forbes and Kelley have recently proved [2] that the independence can be lowered all the way to $O(\log n)$, which is tight [1]. Shockingly, their proof is nearly identical to the one in [3]!

This exciting result has several interesting consequences. First, we now have almost the same generators for space-bounded computation in a fixed order as we do for any order. Moreover, the proof greatly simplifies a number of works in the literature. And finally, an approach in [4] to prove limitations for the sum of small-bias generators won’t work for space (possibly justifying some optimism in the power of the sum of small-bias generators).

My understanding of all this area is inseparable from the collaboration I have had with Chin Ho Lee, with whom I co-authored all the papers I have on this topic.

### The proof

Let $f:\{0,1\}^{n}\to \{0,1\}$ be a function. We want to show that it is fooled by $D+E$, where $D$ has independence $k$, $E$ is the noise vector of i.i.d. bits coming up $1$ with probability say $1/4$, and $+$ is bit-wise XOR.

The approach in [3] is to decompose $f$ as the sum of a function $L$ with Fourier degree $k$, and a sum of $t$ functions $H_{i}=h_{i}\cdot g_{i}$ where $h_{i}$ has no Fourier coefficient of degree less than $k$, and $h_{i}$ and $g_{i}$ are bounded. The function $L$ is immediately fooled by $D$, and it is shown in [3] that each $H_{i}$ is fooled as well.

To explain the decomposition it is best to think of $f$ as the product of $\ell :=n/k$ functions $f_{i}$ on $k$ bits, on disjoint inputs. The decomposition in [3] is as follows: repeatedly decompose each $f_{i}$ into its low-degree part $f_{iL}$ and its high-degree part $f_{iH}$. To illustrate:

\begin{aligned} f_{1}f_{2}f_{3} & =f_{1}f_{2}(f_{3H}+f_{3L})=f_{1}f_{2}f_{3H}+f_{1}(f_{2H}+f_{2L})f_{3L}=\ldots \\ = & f_{1H}f_{2L}f_{3L}+f_{1}f_{2H}f_{3L}+f_{1}f_{2}f_{3H}+f_{1L}f_{2L}f_{3L}\\ = & H_{1}+H_{2}+H_{3}+L. \end{aligned}

This works, but the problem is that even if each $f_{iL}$ has degree $1$, the degree of $L$ grows by at least $1$ with each decomposition, so we can afford at most $k$ decompositions.
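To make the decomposition of [3] concrete, the following Python sketch (an illustration, not code from [2] or [3]) computes the Fourier coefficients of three random $\{0,1\}$-valued functions $f_1, f_2, f_3$ on $m = 3$ bits each, splits each into $f_{iL}$ (degree $\le d$, with the hypothetical threshold $d = 1$) and $f_{iH}$, and checks that $H_1 + H_2 + H_3 + L$ recovers $f_1 f_2 f_3$. It uses the fact that the Fourier coefficient of a product of functions on disjoint blocks at $(S_1, S_2, S_3)$ is the product of the blocks' coefficients. It also reports the degree of $L = f_{1L} f_{2L} f_{3L}$, which can reach $3d$: the growth that limits the number of decompositions.

```python
import random
from itertools import product


def popcount(mask):
    return bin(mask).count("1")


def fourier(table, m):
    """Fourier coefficients {S: E_x[f(x) * (-1)^{|x & S|}]} of f on m bits; S, x are bitmasks."""
    return {S: sum(table[x] * (-1) ** popcount(x & S) for x in range(2 ** m)) / 2 ** m
            for S in range(2 ** m)}


def split(coeffs, d):
    """Low part (degree <= d) and high part (degree > d) of a single block f_i."""
    low = {S: c for S, c in coeffs.items() if popcount(S) <= d}
    high = {S: c for S, c in coeffs.items() if popcount(S) > d}
    return low, high


def tensor(*blocks):
    """Coefficients of a product of functions on disjoint blocks, keyed by (S_1, ..., S_l)."""
    out = {}
    for items in product(*(b.items() for b in blocks)):
        value = 1.0
        for _, c in items:
            value *= c
        out[tuple(S for S, _ in items)] = value
    return out


def add(*functions):
    """Sum of functions given by their Fourier coefficients."""
    out = {}
    for f in functions:
        for key, c in f.items():
            out[key] = out.get(key, 0.0) + c
    return out


def degree(f):
    return max((sum(map(popcount, key)) for key, c in f.items() if c != 0), default=0)


def decompose(seed, m=3, d=1):
    """Random f_1, f_2, f_3 on m bits each; returns f_1 f_2 f_3, H_1 + H_2 + H_3 + L, and L."""
    rng = random.Random(seed)
    f1, f2, f3 = (fourier([rng.choice((0, 1)) for _ in range(2 ** m)], m) for _ in range(3))
    (f1L, f1H), (f2L, f2H), (f3L, f3H) = split(f1, d), split(f2, d), split(f3, d)
    H1 = tensor(f1H, f2L, f3L)
    H2 = tensor(f1, f2H, f3L)
    H3 = tensor(f1, f2, f3H)
    L = tensor(f1L, f2L, f3L)
    return tensor(f1, f2, f3), add(H1, H2, H3, L), L


whole, total, L = decompose(seed=1)
print(whole == total, degree(L) <= 3)  # True True
```

The four terms partition the Fourier coefficients: a coefficient lands in $H_3$ if $S_3$ is high, else in $H_2$ if $S_2$ is high, else in $H_1$ if $S_1$ is high, else in $L$.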

The decomposition in [2] is instead: pick $L$ to be the degree $k$ part of $f$, and $H_{i}$ are all the Fourier coefficients which are non-zero in the inputs to $f_{i}$ and whose degree in the inputs of $f_{1},\ldots ,f_{i}$ is $\ge k$. The functions $H_{i}$ can be written as $h_{i}\cdot g_{i}$, where $h_{i}$ is the high-degree part of $f_{1}\cdots f_{i}$ and $g_{i}$ is $f_{i+1}\cdots f_{\ell }$.

Once you have this decomposition you can apply the same lemmas in [3] to get improved bounds. To handle space-bounded computation they extend this argument to matrix-valued functions.

### What’s next

In [3] we asked for tight “bounded independence plus noise” results for any model, and the question remains open. In particular, what about high-degree polynomials modulo $2$?

### References

[1]   Ravi Boppana, Johan Håstad, Chin Ho Lee, and Emanuele Viola. Bounded independence vs. moduli. In Workshop on Randomization and Computation (RANDOM), 2016.

[2]   Michael A. Forbes and Zander Kelley. Pseudorandom generators for read-once branching programs, in any order. In IEEE Symp. on Foundations of Computer Science (FOCS), 2018.

[3]   Elad Haramaty, Chin Ho Lee, and Emanuele Viola. Bounded independence plus noise fools products. SIAM J. on Computing, 47(2):295–615, 2018.

[4]   Chin Ho Lee and Emanuele Viola. Some limitations of the sum of small-bias distributions. Theory of Computing, 13, 2017.

# Environmental obstacles

Former EPA chief’s resignation confession-of-faith letter according to Breitbart (a website I didn’t know but that I started consulting semi-regularly):


> My desire in service to you has always been to bless you as you make important decisions for the American people. I believe you are serving as President today because of God’s providence. I believe that same providence brought me into your service. I pray as I have served you that I have blessed you and enabled you to effectively lead the American people. Thank you again Mr. President for the honor of serving you and I wish you Godspeed in all that you put your hand to.

The letter also makes me think that I should have added “to worship God” to this list.

The EPA chief is confirmed by the Senate. So if you care about your health, get ready for November. If you’ll be traveling, start looking into absentee voting for your state.

# NEWTON MUST NOT BECOME THE HUB OF MARIJUANA

If you are a resident of Newton, MA, sign this petition.

In 2016 Massachusetts voters voted to legalize marijuana. Except they didn’t know what they were voting for! In Colorado and Washington, the questions of legalization and commercialization were completely separate. The marijuana industry apparently learned from that and rigged the Massachusetts ballot question so that a voter legalizing marijuana would also be mandating communities to open marijuana stores. For Newton, MA, this means at least 8 stores. When voters were recently polled, it became clear that the vast majority did not know that this was at stake, and that the majority of them in fact do not want marijuana stores in their communities. For example, when I voted I didn’t know that this was at stake. Read the official Massachusetts information for voters, especially the summary on pages 12-13. There is no hint that a community would be mandated by state law to open marijuana stores unless it goes through an additional legislative crusade. Instead it says that communities can choose. I think I even read the summary back then.

Now to avoid opening stores in Newton, MA, we need a new ballot question. The City Council could easily have put this question on the ballot, but a few days ago it decided not to, by a vote of 13 to 8. You can find the list of councilors and how they voted here.

Note that the council was not deciding whether or not to open stores, it was just deciding whether or not we should have a question about this on the ballot.

Instead now we are stuck doing things the hard way. To put this question on the ballot, we need to collect 6000 signatures, or 9000 if the city is completely uncooperative, a possibility which now unfortunately cannot be dismissed.

However we must do it, for the alternative is too awful. Most of the surrounding towns (Wellesley, Weston, Needham, Dedham, etc.) have already opted out. So if Newton opens stores, it basically becomes the hub for west suburban marijuana users, at least some of whom would drive under the influence of marijuana (conveniently undetectable). Proposed store locations include sites on the way to elementary schools, and there is an amusing proposal to open a marijuana store in a prime Newton Center location, after Peet’s Coffee moves out (they lost the bid for renewal of the lease). The owners of the space admit that people have asked them for a small grocery store instead, but they think that a marijuana store would bring more traffic and business to Newton Center. I told them to open a gym instead. That too would bring traffic and business, but in addition it would have other benefits that cannabis does not have.

# l2w

This is the post about l2w version 1.0, a LaTeX to WordPress converter painstakingly put together by me with big help from the LaTeX community. Click here to download it. Below is an example of what you can do, taken at random from my class notes, which were compiled with this script. I also used it in conjunction with LyX for several posts such as I believe P=NP, so you can also call this a LyX to WordPress converter. I just export to LaTeX and then run l2w.

This might work out of the box. In more detail, it needs tex4ht (which is included e.g. in MiKTeX distributions) and Perl (the script only uses minimalistic, shell Perl commands). Simply unzip l2w.zip, which contains four files. The file post.tex is this document, which you can edit. To compile, run l2w.bat (which calls myConfig5.cfg). This will create the output post.html, which you can copy and paste into the WordPress HTML editor. I have tested it on an old Windows XP machine, and on a more recent Windows 7 machine with MiKTeX 2.9. I haven’t tested it on Linux, which might require some simple changes to l2w.bat. For LyX I add certain commands in the preamble, and as an example the .lyx source of the post I believe P=NP is included in the zip archive.

The non-math source is compiled using full-fledged LaTeX, which means you can use your own macros and bibliography. The math source is not compiled, but more or less left as is for wordpress, which has its own LaTeX interpreter. This means that you can’t use your own macros in math mode. For the same reason, label and ref of equations are a problem. To make them work, the script fetches their values from the .aux file and then crudely applies them. This is a hack with a rather unreadable script; however, it works for me. One catch: your labels should start with eq:.
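For illustration, here is a minimal Python sketch of the label/ref hack just described. It is a hypothetical re-creation, not the l2w script itself (which is Perl driven by l2w.bat): it reads the `\newlabel` entries for labels starting with `eq:` from the .aux text, then hard-codes the numbers, writing `\label{eq:...}` as `~~~~(n)` as in the example below, and `\eqref`/`\ref` as `(n)`/`n`. The regular expressions assume the usual `\newlabel{label}{{number}...}` layout of .aux entries.

```python
import re

# Hypothetical patterns; only labels starting with "eq:" are handled, as l2w requires.
NEWLABEL = re.compile(r"\\newlabel\{(eq:[^}]*)\}\{\{([^}]*)\}")
LABEL = re.compile(r"\\label\{(eq:[^}]*)\}")
EQREF = re.compile(r"\\eqref\{(eq:[^}]*)\}")
REF = re.compile(r"\\ref\{(eq:[^}]*)\}")


def read_equation_labels(aux_text):
    """Map each 'eq:' label to the number LaTeX recorded for it in the .aux file."""
    return dict(NEWLABEL.findall(aux_text))


def apply_labels(text, labels):
    """Crudely hard-code equation numbers into text whose math WordPress will render."""
    def number(match):
        return labels.get(match.group(1), "??")  # unknown labels are flagged with ??

    text = LABEL.sub(lambda m: "~~~~(" + number(m) + ")", text)
    text = EQREF.sub(lambda m: "(" + number(m) + ")", text)
    return REF.sub(number, text)


aux = "\\newlabel{eq:x}{{1}{2}}\n\\newlabel{sec:intro}{{1}{1}}\n"
labels = read_equation_labels(aux)
print(labels)  # {'eq:x': '1'}
print(apply_labels(r"x = 2 \label{eq:x}", labels))  # x = 2 ~~~~(1)
print(apply_labels(r"Equation \eqref{eq:x}.", labels))  # Equation (1).
```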

I hope this will spare you the enormous amount of time it took me to arrive at this solution. Let me know if you use it!

### 1 Example of what you can do

First, some of the problematic math references:

\begin{aligned} x = 2 ~~~~(1) \end{aligned}

Equation (1).

Next, some weird font stuff: $\mathbb {A}$, $\mathrm {A}$, $\text {A}$.

Lemma 1. Suppose that distributions $A^0, A^1$ over $\{0,1\}^{n_A}$ are $k_A$-wise indistinguishable, and distributions $B^0, B^1$ over $\{0,1\}^{n_B}$ are $k_B$-wise indistinguishable. Define $C^0, C^1$ over $\{0,1\}^{n_A \cdot n_B}$ as follows:

$C^b$: draw a sample $x \in \{0,1\}^{n_A}$ from $A^b$, and replace each bit $x_i$ by a sample of $B^{x_i}$ (independently).

Then $C^0$ and $C^1$ are $k_A \cdot k_B$-wise indistinguishable.
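As a sanity check of Lemma 1 on one small instance (a check, not a proof), the Python sketch below takes $A^b$ uniform over $2$-bit strings of parity $b$ (so $k_A = 1$) and $B^b$ uniform over $3$-bit strings of parity $b$ (so $k_B = 2$), builds $C^0, C^1$ exactly as in the definition above, and computes the largest $k$ such that every character $\chi_S$ with $1 \le |S| \le k$ has the same expectation under both distributions. The example distributions and helper names are our own choices.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, product


def parity_dist(n, b):
    """Uniform distribution over n-bit strings of parity b, as {string: probability}."""
    support = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == b]
    return {x: Fraction(1, len(support)) for x in support}


def compose(A, B0, B1):
    """C: draw x from A, then replace each bit x_i by an independent sample of B^{x_i}."""
    C = defaultdict(Fraction)
    for x, px in A.items():
        for blocks in product(*((B1 if xi else B0).items() for xi in x)):
            y = tuple(bit for block, _ in blocks for bit in block)
            p = px
            for _, pb in blocks:
                p *= pb
            C[y] += p
    return dict(C)


def char_expectation(D, S):
    """E[chi_S(D)] = E[(-1)^{sum_{i in S} x_i}]."""
    return sum(p * (-1) ** sum(x[i] for i in S) for x, p in D.items())


def indistinguishability(D0, D1, n):
    """Largest k such that E[chi_S(D0)] == E[chi_S(D1)] for every S with 1 <= |S| <= k."""
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            if char_expectation(D0, S) != char_expectation(D1, S):
                return k - 1
    return n


A0, A1 = parity_dist(2, 0), parity_dist(2, 1)
B0, B1 = parity_dist(3, 0), parity_dist(3, 1)
C0, C1 = compose(A0, B0, B1), compose(A1, B0, B1)
print(indistinguishability(A0, A1, 2), indistinguishability(B0, B1, 3))  # 1 2
print(indistinguishability(C0, C1, 6))  # 5
```

Here $C^b$ turns out to be uniform over $6$-bit strings of parity $b$, so the computed level is $5$, comfortably above the guaranteed $k_A \cdot k_B = 2$.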

To finish the proof of the lower bound on the approximate degree of the AND-OR function, it remains to see that AND-OR can distinguish $C^0$ and $C^1$ well. For this, we begin by observing that we can assume without loss of generality that the distributions have disjoint supports.

Claim 2. For any function $f$, and for any $k$-wise indistinguishable distributions $A^0$ and $A^1$, if $f$ can distinguish $A^0$ and $A^1$ with probability $\epsilon$ then there are distributions $B^0$ and $B^1$ with the same properties ($k$-wise indistinguishability yet distinguishable by $f$) and also with disjoint supports. (By disjoint support we mean for any $x$ either $\Pr [B^0 = x] = 0$ or $\Pr [B^1 = x] = 0$.)

Proof. Let distribution $C$ be the “common part” of $A^0$ and $A^1$. That is to say, we define $C$ such that $\Pr [C = x] := \min \{\Pr [A^0 = x], \Pr [A^1 = x]\}$ multiplied by a constant that normalizes $C$ into a distribution.

Then we can write $A^0$ and $A^1$ as

\begin{aligned} A^0 &= pC + (1-p) B^0 \,,\\ A^1 &= pC + (1-p) B^1 \,, \end{aligned}

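where $p \in [0,1]$ and $B^0$, $B^1$ are two distributions. Clearly $B^0$ and $B^1$ have disjoint supports: for every $x$ the minimum defining $C$ equals $\Pr [A^0 = x]$ or $\Pr [A^1 = x]$, so $\Pr [B^0 = x] = 0$ or $\Pr [B^1 = x] = 0$.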

Then we have

\begin{aligned} \mathbb {E}[f(A^0)] - \mathbb {E}[f(A^1)] =&~p \mathbb {E}[f(C)] + (1-p) \mathbb {E}[f(B^0)] \notag \\ &- p \mathbb {E}[f(C)] - (1-p) \mathbb {E}[f(B^1)] \\ =&~(1-p) \big ( \mathbb {E}[f(B^0)] - \mathbb {E}[f(B^1)] \big ) \\ \leq &~\mathbb {E}[f(B^0)] - \mathbb {E}[f(B^1)] \,. \end{aligned}

Therefore if $f$ can distinguish $A^0$ and $A^1$ with probability $\epsilon$ then it can also distinguish $B^0$ and $B^1$ with such probability.

Similarly, for all $S \neq \varnothing$ such that $|S| \leq k$, we have

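\begin{aligned} 0 = \mathbb {E}[\chi _S(A^0)] - \mathbb {E}[\chi _S(A^1)] = (1-p) \big ( \mathbb {E}[\chi _S(B^0)] - \mathbb {E}[\chi _S(B^1)] \big ) \,. \end{aligned}

Since $p < 1$ (otherwise $A^0 = A^1 = C$ and $f$ could not distinguish them), this gives $\mathbb {E}[\chi _S(B^0)] = \mathbb {E}[\chi _S(B^1)]$. Hence, $B^0$ and $B^1$ are $k$-wise indistinguishable. $\square$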
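The construction in this proof is easy to run on a small example. In the Python sketch below (example distributions and helper names chosen for illustration), $A^b$ is an even mixture of the uniform distribution on $\{0,1\}^3$ and the uniform distribution on strings of parity $b$. Splitting off the common part gives $p = 1/2$, $C$ uniform, and $B^b$ uniform on strings of parity $b$: disjoint supports, still $2$-wise indistinguishable.

```python
from fractions import Fraction
from itertools import combinations, product


def uniform(support):
    return {x: Fraction(1, len(support)) for x in support}


def mix(p, D, E):
    """The mixture p*D + (1-p)*E of two distributions given as {string: probability}."""
    out = {}
    for x in set(D) | set(E):
        q = p * D.get(x, 0) + (1 - p) * E.get(x, 0)
        if q:
            out[x] = q
    return out


def split_common_part(A0, A1):
    """Write A^b = p*C + (1-p)*B^b as in the proof of Claim 2; assumes A0 != A1, so p < 1."""
    keys = set(A0) | set(A1)
    common = {x: min(A0.get(x, 0), A1.get(x, 0)) for x in keys}
    p = sum(common.values())
    C = {x: q / p for x, q in common.items() if q}
    B0 = {x: (A0.get(x, 0) - common[x]) / (1 - p) for x in keys if A0.get(x, 0) > common[x]}
    B1 = {x: (A1.get(x, 0) - common[x]) / (1 - p) for x in keys if A1.get(x, 0) > common[x]}
    return p, C, B0, B1


def char_expectation(D, S):
    """E[chi_S(D)] = E[(-1)^{sum_{i in S} x_i}]."""
    return sum(q * (-1) ** sum(x[i] for i in S) for x, q in D.items())


cube = list(product((0, 1), repeat=3))
A0 = mix(Fraction(1, 2), uniform(cube), uniform([x for x in cube if sum(x) % 2 == 0]))
A1 = mix(Fraction(1, 2), uniform(cube), uniform([x for x in cube if sum(x) % 2 == 1]))
p, C, B0, B1 = split_common_part(A0, A1)
print(p, set(B0).isdisjoint(B1))  # 1/2 True
print(all(char_expectation(B0, S) == char_expectation(B1, S)
          for k in (1, 2) for S in combinations(range(3), k)))  # True
```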

Equipped with the above lemma and claim, we can finally prove the following lower bound on the approximate degree of AND-OR.

Theorem 3. $d_{1/3}($AND-OR$) = \Omega (\sqrt {RN})$.

Proof. Let $A^0, A^1$ be $\Omega (\sqrt {R})$-wise indistinguishable distributions for AND with advantage $0.99$, i.e. $\Pr [\mathrm {AND}(A^1) = 1] > \Pr [\mathrm {AND}(A^0) = 1] + 0.99$. Let $B^0, B^1$ be $\Omega (\sqrt {N})$-wise indistinguishable distributions for OR with advantage $0.99$. By the above claim, we can assume that $A^0, A^1$ have disjoint supports, and the same for $B^0, B^1$. Compose them by the lemma, getting $\Omega (\sqrt {RN})$-wise indistinguishable distributions $C^0,C^1$. We now show that AND-OR can distinguish $C^0, C^1$:

• $C^0$: First sample $A^0$. Since $\Pr [\mathrm {AND}(A^1) = 1] > 0.99$ and $x = 1^R$ is the unique string with $\mathrm {AND}(x)= 1$, we have $\Pr [A^1 = 1^R] >0$. Thus by disjointness of supports $\Pr [A^0 = 1^R] = 0$. Therefore when sampling $A^0$ we always get a string with at least one “$0$”. Each “$0$” is then replaced with a sample from $B^0$. We have $\Pr [B^0 = 0^N] \geq 0.99$, and when $B^0 = 0^N$, AND-OR$=0$. Hence AND-OR$(C^0) = 0$ with probability at least $0.99$.
• $C^1$: First sample $A^1$, and we know that $A^1 = 1^R$ with probability at least $0.99$. Each bit “$1$” is replaced by a sample from $B^1$, and we know that $\Pr [B^1 = 0^N] = 0$ by disjointness of supports. Then AND-OR$=1$.

Therefore we have $d_{1/3}($AND-OR$)= \Omega (\sqrt {RN})$. $\square$

#### 1.1 Lower Bound of $d_{1/3}($SURJ$)$

In this subsection we discuss the approximate degree of the surjectivity function. This function is defined as follows.

Definition 4. The surjectivity function SURJ$\colon \left (\{0,1\}^{\log R}\right )^N \to \{0,1\}$, which takes input $(x_1, \dots , x_N)$ where $x_i \in [R]$ for all $i$, has value $1$ if and only if $\forall j \in [R], \exists i\colon x_i = j$.
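To make the definition and the encoding used in the proof of Theorem 5 below concrete, here is a small Python sketch. It indexes $[R]$ as $\{0, \dots, R-1\}$ and numbers bits from $0$ (least significant), both assumptions of this sketch. It builds the matrix $Y$ with $y_{ij} = 1$ iff $x_i = j$ and checks, for $N = R = 4$, that SURJ$(x)$ equals AND-OR$(Y)$ and that each bit of $x_i$ is a linear function of row $i$ of $Y$.

```python
from itertools import product


def surj(x, R):
    """SURJ(x_1, ..., x_N) = 1 iff every j in {0, ..., R-1} equals some x_i."""
    return int(set(x) == set(range(R)))


def indicator_matrix(x, R):
    """The N x R matrix Y with y_ij = 1 iff x_i = j; every row has weight 1."""
    return [[int(xi == j) for j in range(R)] for xi in x]


def and_or(Y):
    """AND over columns j of the OR over rows i of y_ij."""
    return int(all(any(row[j] for row in Y) for j in range(len(Y[0]))))


def bit_from_row(Y, i, k):
    """k-th bit of x_i, written linearly in row i: the sum of y_ij over j whose k-th bit is 1."""
    return sum(y for j, y in enumerate(Y[i]) if (j >> k) & 1)


N, R = 4, 4
ok = True
for x in product(range(R), repeat=N):
    Y = indicator_matrix(x, R)
    ok &= surj(x, R) == and_or(Y)
    ok &= all(bit_from_row(Y, i, k) == (x[i] >> k) & 1 for i in range(N) for k in range(2))
print(ok)  # True
```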

First, some history. Aaronson first proved that the approximate degree of SURJ and other functions on $n$ bits including “the collision problem” is $n^{\Omega (1)}$. This was motivated by an application in quantum computing. Before this result, even a lower bound of $\omega (1)$ had not been known. Later Shi improved the lower bound to $n^{2/3}$, see [AS04]. The instructor believes that the quantum framework may have blocked some people from studying this problem, though it may have very well attracted others. Recently Bun and Thaler [BT17] reproved the $n^{2/3}$ lower bound, but in a quantum-free paper, and introducing some different intuition. Soon after, together with Kothari, they proved [BKT17] that the approximate degree of SURJ is $\Theta (n^{3/4})$.

We shall now prove the $\Omega (n^{3/4})$ lower bound, though one piece is only sketched. Again we present some things in a different way from the papers.

For the proof, we consider the AND-OR function under the promise that the Hamming weight of the $RN$ input bits is at most $N$. Call the approximate degree of AND-OR under this promise $d_{1/3}^{\leq N}($AND-OR$)$. Then we can prove the following theorems.

Theorem 5. $d_{1/3}($SURJ$) \geq d_{1/3}^{\leq N}($AND-OR$)$.

Theorem 6. $d_{1/3}^{\leq N}($AND-OR$) \geq \Omega (N^{3/4})$ for some suitable $R = \Theta (N)$.

In our setting, we consider $R = \Theta (N)$. Surprisingly, Theorem 5 shows that we can somehow “shrink” $\Theta (N^2)$ bits of input into $N\log N$ bits while maintaining the approximate degree of the function, under some promise. Without this promise, Theorem 3 above shows that the approximate degree of AND-OR is $\Omega (N)$ instead of $\Omega (N^{3/4})$ as in Theorem 6.

Proof of Theorem 5. Define an $N \times R$ matrix $Y$ such that the 0/1 variable $y_{ij}$ is the entry in the $i$-th row and $j$-th column, and $y_{ij} = 1$ iff $x_i = j$. We prove the theorem in the following steps:

1. $d_{1/3}($SURJ$(\overline {x})) \geq d_{1/3}($AND-OR$(\overline {y}))$ under the promise that each row has weight $1$;
2. let $z_j$ be the sum of the $j$-th column, then $d_{1/3}($AND-OR$(\overline {y}))$ under the promise that each row has weight $1$, is at least $d_{1/3}($AND-OR$(\overline {z}))$ under the promise that $\sum _j z_j = N$;
3. $d_{1/3}($AND-OR$(\overline {z}))$ under the promise that $\sum _j z_j = N$, is at least $d_{1/3}^{=N}($AND-OR$(\overline {y}))$;
4. we can change “$=N$” into “$\leq N$”.

Now we prove this theorem step by step.

1. Let $P(x_1, \dots , x_N)$ be a polynomial for SURJ, where $x_i = (x_i)_1, \dots , (x_i)_{\log R}$. Then we have
\begin{aligned} (x_i)_k = \sum _{j: k\text {-th bit of }j \text { is } 1} y_{ij}. \end{aligned}

Then the polynomial $P'(\overline {y})$ for AND-OR$(\overline {y})$ is the polynomial $P(\overline {x})$ with $(x_i)_k$ replaced as above, thus the degree won’t increase. Correctness follows by the promise.

2. This is the most extraordinary step, due to Ambainis [Amb05]. In this notation, AND-OR becomes the indicator function of $\forall j, z_j \neq 0$. Define
\begin{aligned} Q(z_1, \dots , z_R) := \mathop {\mathbb {E}}_{\substack {\overline {y}: \text { its rows have weight } 1\\ \text {and it is consistent with }\overline {z}}} P(\overline {y}). \end{aligned}

Clearly it is a good approximation of AND-OR$(\overline {z})$. It remains to show that it’s a polynomial of degree $k$ in $z$’s if $P$ is a polynomial of degree $k$ in $y$’s.

Let’s look at one monomial of degree $k$ in $P$: $y_{i_1j_1}y_{i_2j_2}\cdots y_{i_kj_k}$. Observe that all $i_\ell$’s are distinct by the promise (two distinct entries in the same row cannot both be $1$) and by $u^2 = u$ over $\{0,1\}$. By the chain rule we have

\begin{aligned} \mathbb {E}[y_{i_1j_1}\cdots y_{i_kj_k}] = \mathbb {E}[y_{i_1j_1}]\mathbb {E}[y_{i_2j_2}|y_{i_1j_1} = 1] \cdots \mathbb {E}[y_{i_kj_k}|y_{i_1j_1}=\cdots =y_{i_{k-1}j_{k-1}} = 1]. \end{aligned}

By symmetry we have $\mathbb {E}[y_{i_1j_1}] = \frac {z_{j_1}}{N}$, which is linear in $z$’s. To get $\mathbb {E}[y_{i_2j_2}|y_{i_1j_1} = 1]$, we know that every other entry in row $i_1$ is $0$, so we set aside row $i_1$ and average over $y$’s such that $\left \{\begin {array}{ll} y_{i_1j_1} = 1 &\\ y_{i_1j} = 0 & j\neq j_1 \end {array}\right .$ that satisfy the promise and are consistent with the $z$’s. Therefore

\begin{aligned} \mathbb {E}[y_{i_2j_2}|y_{i_1j_1} = 1] = \left \{ \begin {array}{ll} \frac {z_{j_2}}{N-1} & j_1 \neq j_2,\\ \frac {z_{j_2}-1}{N-1} & j_1 = j_2. \end {array}\right . \end{aligned}

In general we have

\begin{aligned} \mathbb {E}[y_{i_kj_k}|y_{i_1j_1}=\cdots =y_{i_{k-1}j_{k-1}} = 1] = \frac {z_{j_k} - \#\{\ell < k \colon j_\ell = j_k\}}{N-k + 1}, \end{aligned}

which has degree $1$ in $z$’s. Therefore the degree of $Q$ is not larger than that of $P$.

3. Note that $\forall j$, $z_j = \sum _i y_{ij}$. Hence by replacing $z$’s by $y$’s, the degree won’t increase.
4. We can add a “slack” variable $z_0$, or equivalently $y_{10}, \dots , y_{N0}$; then the condition $\sum _{j=0}^R z_j = N$ actually means $\sum _{j=1}^R z_j \leq N$.

$\square$
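Step 2 rests on the chain-rule formula for $\mathbb {E}[y_{i_1j_1}\cdots y_{i_kj_k}]$. The Python sketch below is an illustrative check on a hypothetical small instance ($N = 4$, $R = 3$, $z = (2,1,1)$, columns indexed from $0$): it enumerates all row-weight-$1$ matrices consistent with $z$, averages a monomial with distinct rows, and compares the result with $\prod_{\ell} \frac{z_{j_\ell} - \#\{\ell' < \ell \colon j_{\ell'} = j_\ell\}}{N-\ell+1}$.

```python
from fractions import Fraction
from itertools import product


def consistent_inputs(z, N):
    """All x in {0,...,R-1}^N with column counts z, i.e. #{i : x_i = j} = z_j.
    Each such x is one N x R matrix y with rows of weight 1 (y_ij = 1 iff x_i = j)."""
    R = len(z)
    return [x for x in product(range(R), repeat=N)
            if all(x.count(j) == z[j] for j in range(R))]


def average_monomial(z, N, rows, cols):
    """E[y_{i_1 j_1} ... y_{i_k j_k}] for y uniform among the matrices consistent with z."""
    xs = consistent_inputs(z, N)
    hits = sum(all(x[i] == j for i, j in zip(rows, cols)) for x in xs)
    return Fraction(hits, len(xs))


def chain_rule_formula(z, N, cols):
    """Product over l = 1..k of (z_{j_l} - #{l' < l : j_{l'} = j_l}) / (N - l + 1)."""
    value = Fraction(1)
    for l, j in enumerate(cols):  # l is 0-based here, so N - l equals N - (l + 1) + 1
        value *= Fraction(z[j] - cols[:l].count(j), N - l)
    return value


z, N = (2, 1, 1), 4
print(average_monomial(z, N, (0, 1, 2), (0, 0, 1)))  # 1/12
print(chain_rule_formula(z, N, (0, 0, 1)))  # 1/12
```

Each factor is linear in the $z$’s, which is why $Q$ has degree at most that of $P$.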

Proof idea for Theorem 6. First, by the duality argument we can verify that $d_{1/3}^{\leq N}(f) \geq d$ if and only if there exist $d$-wise indistinguishable distributions $A, B$ such that:

• $f$ can distinguish $A, B$;
• $A$ and $B$ are supported on strings of weight $\leq N$.

Claim 7. $d_{1/3}^{\leq \sqrt {N}}($OR$_N) = \Omega (N^{1/4})$.

The proof needs a little more information about the weight distribution of the indistinguishable distributions corresponding to this claim. Basically, their expected weight is very small.

Now we combine these distributions with the usual ones for AND using Lemma 1.

What remains to show is that the final distribution is supported on Hamming weight $\le N$. Because the $R$ copies of the distributions for OR are sampled independently by construction, we can use concentration of measure to prove a tail bound. This gives that all but an exponentially small measure of the distribution is supported on strings of weight $\le N$. The final step of the proof consists of slightly tweaking the distributions to make that measure $0$. $\square$

#### 1.2 Groups

Groups have many applications in theoretical computer science. Barrington [Bar89] used the permutation group $S_5$ to prove a very surprising result, which states that the majority function can be computed efficiently using only a constant number of bits of memory (something which was conjectured to be false). More recently, catalytic computation [BCK$^{+}$14] shows that if we have a lot of memory, but it’s full of junk that cannot be erased, we can still compute more than if we had little memory. We will see some interesting properties of groups in the following.

Some famous groups used in computer science are:

• $\{0,1\}^n$ with bit-wise addition;
• $\mathbb {Z}_m$ with addition mod $m$ ;
• $S_n$, which are permutations of $n$ elements;
• Wreath product $G:= (\mathbb {Z}_m \times \mathbb {Z}_m) \wr \mathbb {Z}_2\,$, whose elements are of the form $(a,b)z$ where $z$ is a “flip bit”, with the following multiplication rules:
  • $(a, b) 1 = 1 (b, a)$ ;
  • $z\cdot z' := z+z'$ in $\mathbb {Z}_2$ ;
  • $(a,b) \cdot (a',b') := (a+a', b+b')$ is the $\mathbb {Z}_m\times \mathbb {Z}_m$ operation.

An example is $(5,7)1 \cdot (2,1) 1 = (5,7) 1 \cdot 1 (1, 2) = (6,9)0$. Generally we have

\begin{aligned} (a, b) z \cdot (a', b') z' = \left \{ \begin {array}{ll} (a + b', b+a') z+z' & z = 1\,,\\ (a+a', b + b') z+z' & z = 0\,; \end {array}\right . \end{aligned}

• $SL_2(q) := \{2\times 2$ matrices over $\mathbb {F}_q$ with determinant $1\},$ in other words, the group of matrices $\begin {pmatrix} a & b\\ c & d \end {pmatrix}$ such that $ad - bc = 1$.

The group $SL_2(q)$ was invented by Galois. (If you haven’t, read his biography on Wikipedia.)
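A short Python sketch can check the multiplication rules for the wreath product. It represents $(a,b)z$ as the tuple `((a, b), z)` (our own encoding) and derives the product from the rule $(a,b)1 = 1(b,a)$: moving the flip bit of the left factor past $(a',b')$ swaps that pair exactly when the flip bit is $1$. It reproduces the worked example $(5,7)1 \cdot (2,1)1 = (6,9)0$ (taking $m = 11$ so that no reduction happens) and checks associativity and non-commutativity for $m = 3$.

```python
from itertools import product


def wreath_mul(g, h, m):
    """Product of g = ((a, b), z) and h = ((a2, b2), z2) in (Z_m x Z_m) wr Z_2.

    Moving the flip bit z of g past (a2, b2) uses (a, b) 1 = 1 (b, a),
    so the pair (a2, b2) is swapped exactly when z = 1.
    """
    (a, b), z = g
    (a2, b2), z2 = h
    if z == 1:
        a2, b2 = b2, a2
    return ((a + a2) % m, (b + b2) % m), (z + z2) % 2


def elements(m):
    return [((a, b), z) for a, b, z in product(range(m), range(m), range(2))]


print(wreath_mul(((5, 7), 1), ((2, 1), 1), 11))  # ((6, 9), 0)

G = elements(3)
associative = all(wreath_mul(wreath_mul(g, h, 3), k, 3) == wreath_mul(g, wreath_mul(h, k, 3), 3)
                  for g in G for h in G for k in G)
commutative = all(wreath_mul(g, h, 3) == wreath_mul(h, g, 3) for g in G for h in G)
print(associative, commutative)  # True False
```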

Quiz. Among these groups, which is the “least abelian”? The latter can be defined in several ways. We focus on this: if we have two high-entropy distributions $X, Y$ over $G$, does $X \cdot Y$ have more entropy? For example, if $X$ and $Y$ are uniform over some $\Omega (|G|)$ elements, is $X\cdot Y$ close to uniform over $G$? By “close to” we mean that the statistical distance from the uniform distribution is less than a small constant. For $G=(\{0,1\}^n, +)$, if $X$ and $Y$ are independent and both uniform over $\{0\}\times \{0,1\}^{n-1}$, then $X\cdot Y$ is the same, so there is no entropy increase even though $X$ and $Y$ are uniform on half the elements.
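The $\{0,1\}^n$ example can be verified exactly. The Python sketch below (with $n = 4$, an arbitrary choice) convolves the uniform distribution on $\{0\}\times\{0,1\}^{n-1}$ with an independent copy of itself under bit-wise XOR, and measures the statistical distance from uniform, i.e. half the $\ell_1$ distance.

```python
from fractions import Fraction
from itertools import product


def xor_convolve(X, Y):
    """Distribution of x + y (bit-wise XOR) for independent x ~ X and y ~ Y."""
    Z = {}
    for x, px in X.items():
        for y, py in Y.items():
            s = tuple(a ^ b for a, b in zip(x, y))
            Z[s] = Z.get(s, 0) + px * py
    return Z


def distance_from_uniform(Z, n):
    """Statistical distance (half the L1 distance) between Z and uniform on {0,1}^n."""
    u = Fraction(1, 2 ** n)
    return sum(abs(Z.get(x, 0) - u) for x in product((0, 1), repeat=n)) / 2


n = 4
half = [x for x in product((0, 1), repeat=n) if x[0] == 0]  # {0} x {0,1}^{n-1}
X = {x: Fraction(1, len(half)) for x in half}
XY = xor_convolve(X, X)  # X and Y independent, both distributed like X
print(XY == X, distance_from_uniform(XY, n))  # True 1/2
```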

Definition 8.[Measure of Entropy] For $\lVert A\rVert _2 = \left (\sum _xA(x)^2\right )^{\frac {1}{2}}$, we think of $A$ as having “high entropy” if $\lVert A\rVert ^2_2 \le 100 \frac {1}{|G|}$.

Note that $\lVert A\rVert ^2_2$ is exactly the “collision probability”, i.e. $\Pr [A = A']$ where $A'$ is an independent copy of $A$. We will regard the collision probability of the uniform distribution $U$ as very small, i.e. $\lVert U\rVert ^2_2 = \frac {1}{|G|} \approx \lVert \overline {0}\rVert ^2_2$. Then we have

\begin{aligned} \lVert A - U \rVert ^2_2 &= \sum _x \left (A(x) - \frac {1}{|G|}\right )^2\\ &= \sum _x \left (A(x)^2 - 2A(x) \frac {1}{|G|} + \frac {1}{|G|^2}\right ) \\ &= \lVert A \rVert ^2_2 - \frac {2}{|G|} + \frac {1}{|G|} \\ &= \lVert A \rVert ^2_2 - \frac {1}{|G|} \\ &= \lVert A \rVert ^2_2 - \lVert U \rVert ^2_2\\ &\approx \lVert A \rVert ^2_2\,. \end{aligned}
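The identity above, and the Cauchy–Schwarz step $\lVert v\rVert_1 \le \sqrt{|G|}\,\lVert v\rVert_2$ used in Equation (2) below, can be checked with exact rational arithmetic. This sketch treats a distribution on $G$ as a list of $|G|$ probabilities; the random test distributions are arbitrary choices for illustration.

```python
import random
from fractions import Fraction


def random_distribution(size, seed):
    """A random distribution on `size` elements, with exact rational probabilities."""
    rng = random.Random(seed)
    weights = [rng.randint(0, 9) for _ in range(size)]
    weights[0] += 1  # ensure the total weight is positive
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def norm2_squared(v):
    return sum(x * x for x in v)


def check_identity(A):
    """Check ||A - U||_2^2 == ||A||_2^2 - 1/|G| and ||A - U||_1 <= sqrt(|G|) * ||A - U||_2."""
    G = len(A)
    diff = [a - Fraction(1, G) for a in A]
    identity = norm2_squared(diff) == norm2_squared(A) - Fraction(1, G)
    l1 = sum(abs(x) for x in diff)
    # Compare squares to stay exact: ||v||_1^2 <= |G| * ||v||_2^2.
    cauchy_schwarz = l1 * l1 <= G * norm2_squared(diff)
    return identity, cauchy_schwarz


print(check_identity(random_distribution(16, seed=0)))  # (True, True)
```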

Theorem 9.[[Gow08], [BNP08]] If $X, Y$ are independent over $G$, then

\begin{aligned} \lVert X\cdot Y - U \rVert _2 \leq \lVert X \rVert _2 \lVert Y \rVert _2 \sqrt {\frac {|G|}{d}}, \end{aligned}

where $d$ is the minimum dimension of a nontrivial irreducible representation of $G$.

By this theorem, for high entropy distributions $X$ and $Y$, we get $\lVert X\cdot Y - U \rVert _2 \leq \frac {O(1)}{\sqrt {|G|d}}$, thus we have

\begin{aligned} \lVert X\cdot Y - U \rVert _1 \leq \sqrt {|G|} \lVert X\cdot Y - U \rVert _2 \leq \frac {O(1)}{\sqrt {d}}. ~~~~(2) \end{aligned}

If $d$ is large, then $X \cdot Y$ is very close to uniform. The following table shows the $d$’s for the groups we’ve introduced.

| $G$ | $d$ |
|---|---|
| $\{0,1\}^n$ | $1$ |
| $\mathbb {Z}_m$ | $1$ |
| $(\mathbb {Z}_m \times \mathbb {Z}_m) \wr \mathbb {Z}_2$ | should be very small |
| $A_n$ | $\frac {\log \lvert G\rvert }{\log \log \lvert G\rvert }$ |
| $SL_2(q)$ | $\lvert G\rvert ^{1/3}$ |

Here $A_n$ is the alternating group of even permutations. We can see that for the first three groups, Equation (2) doesn’t give non-trivial bounds.

But for $A_n$ we get a non-trivial bound, and for $SL_2(q)$ we get a strong bound: we have $\lVert X\cdot Y - U \rVert _2 \leq \frac {1}{|G|^{\Omega (1)}}$.

### References

[Amb05]    Andris Ambainis. Polynomial degree and lower bounds in quantum complexity: Collision and element distinctness with small range. Theory of Computing, 1(1):37–46, 2005.

[AS04]    Scott Aaronson and Yaoyun Shi. Quantum lower bounds for the collision and the element distinctness problems. J. of the ACM, 51(4):595–605, 2004.

[Bar89]    David A. Mix Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC$^1$. J. of Computer and System Sciences, 38(1):150–164, 1989.

[BCK$^{+}$14]    Harry Buhrman, Richard Cleve, Michal Koucký, Bruno Loff, and Florian Speelman. Computing with a full memory: catalytic space. In ACM Symp. on the Theory of Computing (STOC), pages 857–866, 2014.

[BKT17]    Mark Bun, Robin Kothari, and Justin Thaler. The polynomial method strikes back: Tight quantum query bounds via dual polynomials. CoRR, abs/1710.09079, 2017.

[BNP08]    László Babai, Nikolay Nikolov, and László Pyber. Product growth and mixing in finite groups. In ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 248–257, 2008.

[BT17]    Mark Bun and Justin Thaler. A nearly optimal lower bound on the approximate degree of AC0. CoRR, abs/1703.05784, 2017.

[Gow08]    W. T. Gowers. Quasirandom groups. Combinatorics, Probability & Computing, 17(3):363–387, 2008.