Special Topics in Complexity Theory, Lecture 10

Added Dec 27 2017: An updated version of these notes exists on the class page.

 

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lecture 10, Guest lecture by Justin Thaler, Scribe: Biswaroop Maiti

This is a guest lecture by Justin Thaler regarding lower bounds on approximate degree [BKT17, BT15, BT17]. Thanks to Justin for giving this lecture and for his help with the write-up. We will sketch some details of the lower bounds on the approximate degree of \mathsf {AND} \circ \mathsf {OR} and \mathsf {SURJ}, along with some intuition about the techniques used. Recall the definition of \mathsf {SURJ} from the previous lecture:

Definition 1. The surjectivity function \mathsf {SURJ}\colon \left (\{-1,1\}^{\log R}\right )^N \to \{-1,1\}, takes input x=(x_1, \dots , x_N) where each x_i \in \{-1, 1\}^{\log R} is interpreted as an element of [R]. \mathsf {SURJ}(x) has value -1 if and only if \forall j \in [R], \exists i\colon x_i = j.

Recall from the last lecture that \mathsf {AND}_R \circ \mathsf {OR}_N \colon \{-1,1\}^{R\times N} \rightarrow \{-1,1\} is the block-wise composition of the \mathsf {AND} function on R bits and the \mathsf {OR} function on N bits. In general, we will denote the block-wise composition of two functions f and g, where f is defined on R bits and g is defined on N bits, by f_R \circ g_N. Here, the outputs of R copies of g are fed into f (with the inputs to each copy of g being pairwise disjoint). The total number of inputs to f_R \circ g_N is R \cdot N.

1.1 Lower Bound of d_{1/3}( \mathsf {SURJ} ) via lower bound of d_{1/3}(\mathsf {AND} \circ \mathsf {OR})

 

Claim 2. d_{1/3}( \mathsf {SURJ} ) = \widetilde {\Theta }(n^{3/4}) .

We will look at only the lower bound in the claim. We interpret the input as a list of N numbers from [R]:= \{1,2, \cdots R\}. As presented in [BKT17], the proof for the lower bound proceeds in the following steps.

  1. Show that to approximate \mathsf {SURJ}, it is necessary to approximate the block-composition \mathsf {AND}_R \circ \mathsf {OR}_N on inputs of Hamming weight at most N, i.e., show that d_{1/3}(\mathsf {SURJ}) \geq d_{1/3}^{\leq N}(\mathsf {AND}_R \circ \mathsf {OR}_N).

    Step 1 was covered in the previous lecture, but we briefly recall a bit of intuition for why the claim in this step is reasonable. The intuition comes from the fact that the converse of the claim is easy to establish, i.e., it is easy to show that in order to approximate \mathsf {SURJ}, it is sufficient to approximate \mathsf {AND}_R \circ \mathsf {OR}_N on inputs of Hamming weight exactly N.

    This is because \mathsf {SURJ} can be expressed as an \mathsf {AND}_R (over all range items r \in [R]) of the \mathsf {OR}_N (over all inputs i \in [N]) of “Is input x_i equal to r”? Each predicate of the form in quotes is computed exactly by a polynomial of degree \log R, since it depends on only \log R of the input bits, and exactly N of the predicates (one for each i \in [N]) evaluates to TRUE.

    Step 1 of the lower bound proof for \mathsf {SURJ} in [BKT17] shows a converse, namely that the only way to approximate \mathsf {SURJ} is to approximate \mathsf {AND}_R \circ \mathsf {OR}_N on inputs of Hamming weight at most N.

  2. Show that d_{1/3}^{\leq N}(\mathsf {AND}_R \circ \mathsf {OR}_N) = \widetilde {\Omega }(n^{3/4}), i.e., the degree required to approximate \mathsf {AND} _R \circ \mathsf {OR}_N on inputs of Hamming weight at most N is at least D=\widetilde {\Omega }(n^{3/4}).

    In the previous lecture we also sketched this Step 2. In this lecture we give additional details of this step. As in the papers, we use the concept of a “dual witness.” The latter can be shown to be equivalent to bounded indistinguishability.

    Step 2 itself proceeds via two substeps:

    a. Give a dual witness \Phi for \mathsf {AND}_R \circ \mathsf {OR}_N that places little mass (namely, total mass less than (R \cdot N \cdot D)^{-D}) on inputs of Hamming weight > N.
    b. By modifying \Phi , give a dual witness \Phi ' for \mathsf {AND}_R \circ \mathsf {OR}_N that places zero mass on inputs of Hamming weight > N.

In [BKT17], both Substeps 2a and 2b proceed entirely in the dual world (i.e., they explicitly manipulate dual witnesses \Phi and \Phi '). The main goal of this section of the lecture notes is to explain how to replace Step 2b of the argument of [BKT17] with a wholly “primal” argument.

The intuition of the primal version of Step 2b that we’ll cover is as follows. First, we will show that a polynomial p \colon \{-1, 1\}^{R \cdot N} \to \mathbb {R} of degree D that is bounded on the low Hamming weight inputs cannot be too big on the high Hamming weight inputs. In particular, we will prove the following claim.

Claim 3. If p \colon \{-1, 1\}^{M} \to \mathbb {R} is a degree D polynomial that satisfies |p(x)| \leq 4/3 on all inputs x of Hamming weight at most D, then |p(x)| \leq (4/3) \cdot D \cdot M^D for all inputs x.

Second, we will explain that the dual witness \Phi constructed in Step 2a has the following “primal” implication:

Claim 4. For D \approx N^{3/4}, any polynomial p of degree D satisfying |p(x) - \left (\mathsf {AND}_R \circ \mathsf {OR}_N\right )(x) | \leq 1/3 for all inputs x of Hamming weight at most N must satisfy |p(x)| > (4/3) \cdot D \cdot ( R \cdot N)^D for some input x \in \{-1, 1\}^{R \cdot N}.

Combining Claims 3 and 4, we conclude that no polynomial p of degree D \approx N^{3/4} can satisfy

\begin{aligned} ~~~~(1) |p(x) - (\mathsf {AND}_R \circ \mathsf {OR}_N)(x) | \leq 1/3 \text { for all inputs } x \text { of Hamming weight at most } N,\end{aligned}

which is exactly the desired conclusion of Step 2. This is because any polynomial p satisfying Equation (1) also satisfies |p(x)| \leq 4/3 for all x of Hamming weight at most N, and hence Claim 3 implies that

\begin{aligned} ~~~~(2) |p(x)| \leq \frac {4}{3} \cdot D \cdot (R \cdot N)^D \text { for \emph {all} inputs } x \in \{-1, 1\}^{R \cdot N}.\end{aligned}

But Claim 4 states that any polynomial satisfying both Equations (1) and (2) requires degree strictly larger than D.

In the remainder of this section, we prove Claims 3 and 4.

1.2 Proof of Claim 3

Proof of Claim 3. For notational simplicity, let us prove this claim for polynomials on domain \{0, 1\}^{M}, rather than \{-1, 1\}^M.

Proof in the case that p is symmetric. Let us assume first that p is symmetric, i.e., p is only a function of the Hamming weight |x| of its input x. Then p(x) = g(|x|) for some univariate polynomial g of degree at most D (this is a direct consequence of Minsky-Papert symmetrization, which we have seen in previous lectures). Since g has degree at most D, it is determined by its values at the D+1 points 0, 1, \dots , D, and Lagrange interpolation expresses g as

\begin{aligned}g(t)= \sum _{k=0}^{D} g(k) \cdot \prod _{i \in \{0, \dots , D\}, i \neq k} \frac {t-i}{k-i}. \end{aligned}

Here, the first factor, g(k), is bounded in magnitude by |g(k)| \leq 4/3, since k \leq D. For t \in \{0, 1, \dots , M\} and D \leq M, each of the D numerators satisfies |t-i| \leq M, while the denominators multiply to k! \cdot (D-k)! in magnitude, so |\prod _{i \neq k} \frac {t-i}{k-i}| \leq M^D/(k! \cdot (D-k)!). Since \sum _{k=0}^{D} 1/(k! \cdot (D-k)!) = 2^D/D! \leq D for D \geq 2, we get the final bound:

\begin{aligned}|g(t)| \leq (4/3) \cdot D \cdot M^D.\end{aligned}
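As a sanity check on the symmetric case, the following Python sketch interpolates a univariate polynomial g of degree at most D through its values at the D+1 points 0, 1, \dots , D, and compares the largest value of |g(t)| over t \in \{0, \dots , M\} with (4/3) \cdot D \cdot M^D. The parameters D = 4 and M = 12, and the alternating-sign values of magnitude 4/3, are our own illustrative choices and do not come from the lecture.

```python
from fractions import Fraction


def lagrange_eval(values, t):
    """Evaluate at t the unique polynomial g of degree <= D with g(k) = values[k]."""
    D = len(values) - 1
    total = Fraction(0)
    for k in range(D + 1):
        term = Fraction(values[k])
        for i in range(D + 1):
            if i != k:  # the product skips i = k
                term *= Fraction(t - i, k - i)
        total += term
    return total


def max_abs_on_range(values, M):
    """Largest |g(t)| over the integer points t = 0, ..., M."""
    return max(abs(lagrange_eval(values, t)) for t in range(M + 1))


D, M = 4, 12  # illustrative parameters (assumption)
# Values of magnitude 4/3 with alternating signs at the points 0..D.
values = [Fraction(4, 3) * (-1) ** k for k in range(D + 1)]
bound = Fraction(4, 3) * D * M ** D

worst = max_abs_on_range(values, M)
print(float(worst), float(bound))
assert worst <= bound
```

Exact rational arithmetic is used so that the comparison with the bound is not affected by rounding.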

Proof for general p. Let us now consider the case of general (not necessarily symmetric) polynomials p. Fix any input x \in \{0, 1\}^M. The goal is to show that |p(x)| \leq \frac 43 D \cdot M^D.

Let us consider a polynomial \hat {p}_x \colon \{0,1\}^{|x|} \rightarrow \mathbb {R} of degree at most D obtained from p by fixing each input i such that x_i=0 to the value 0. For example, if M=4 and x=(0, 1, 1, 0), then \hat {p}_x(y_2, y_3)=p(0, y_2, y_3, 0). We will exploit three properties of \hat {p}_x:

  • \deg (\hat {p}_x) \leq \deg (p) \leq D.
  • Since |p(x)| \leq 4/3 for all inputs with |x| \leq D, \hat {p}_x(y) satisfies the analogous property: |\hat {p}_x(y)| \leq 4/3 for all inputs with |y| \leq D.
  • If \mathbf {1}_{|x|} denotes the all-1s vector of length |x|, then \hat {p}_x(\mathbf {1}_{|x|}) = p(x).

Property 3 means that our goal is to show that |\hat {p}_x(\mathbf {1}_{|x|})| \leq \frac 43 \cdot D \cdot M^D.

Let p^{\text {symm}}_x \colon \{0, 1\}^{|x|} \to \mathbb {R} denote the symmetrized version of \hat {p}_x, i.e., p^{\text {symm}}_x(y) = \mathbb {E}_{\sigma }[\hat {p}_x(\sigma (y))], where the expectation is over a random permutation \sigma of \{1, \dots , |x|\}, and \sigma (y)=(y_{\sigma (1)}, \dots , y_{\sigma (|x|)}). Since \sigma (\mathbf {1}_{|x|}) = \mathbf {1}_{|x|} for all permutations \sigma , p^{\text {symm}}_x(\mathbf {1}_{|x|}) = \hat {p}_x(\mathbf {1}_{|x|}) = p(x). But p^{\text {symm}}_x is symmetric, and it inherits Properties 1 and 2 from \hat {p}_x, so the analysis from the first part of the proof (applied on |x| \leq M variables) implies that |p^{\text {symm}}_x(y)| \leq \frac 43 \cdot D \cdot M^D for all inputs y. In particular, letting y = \mathbf {1}_{|x|}, we conclude that |p(x)| \leq \frac 43 \cdot D \cdot M^D as desired. \square

Discussion. One may try to simplify the analysis of the general case in the proof of Claim 3 by considering the polynomial p^{\text {symm}} \colon \{0, 1\}^M \to \mathbb {R} defined via p^{\text {symm}}(x)=\mathbb {E}_{\sigma }[p(\sigma (x))], where the expectation is over permutations \sigma of \{1, \dots , M\}. p^{\text {symm}} is a symmetric polynomial, so the analysis for symmetric polynomials immediately implies that |p^{\text {symm}}(x)| \leq \frac 43 \cdot D \cdot M^D. Unfortunately, this does not mean that |p(x)| \leq \frac 43 \cdot D \cdot M^D.

This is because the symmetrized polynomial p^{\mathsf {symm}} is averaging the values of p over all those inputs of a given Hamming weight. So, a bound on this averaging polynomial does not preclude the case where p is massively positive on some inputs of a given Hamming weight, and massively negative on other inputs of the same Hamming weight, and these values cancel out to obtain a small average value. That is, it is not enough to conclude that on the average over inputs of any given Hamming weight, the magnitude of p is not too big.

Thus, we needed to make sure that when we symmetrize \hat {p}_x to p^{\text {symm}}_x, such large cancellations don’t happen, and a bound on the average value of \hat {p}_x on a given Hamming weight really gives us a bound on p on the input x itself. We defined \hat {p}_x so that \hat {p}_x(\mathbf {1}_{|x|}) = p(x). Since there is only one input in \{0, 1\}^{|x|} of Hamming weight |x|, p^{\text {symm}}_x(\mathbf {1}_{|x|}) does not average \hat {p}_x’s values on many inputs, meaning we don’t need to worry about massive cancellations.
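To make the cancellation issue concrete, here is a small Python sketch. The polynomial p(x) = C \cdot (x_1 - x_2) + x_1 x_2 on \{0, 1\}^2 and the constant C are hypothetical examples of ours, chosen so that p is huge with opposite signs on the two inputs of Hamming weight 1. Symmetrizing hides this on weight 1, but not on the all-1s input, which is the only input of its Hamming weight.

```python
from fractions import Fraction
from itertools import permutations


def symmetrize(p, m):
    """Return p_symm(x) = E_sigma[p(sigma(x))] over all permutations of m coordinates."""
    perms = list(permutations(range(m)))

    def p_symm(x):
        total = sum(Fraction(p(tuple(x[j] for j in sigma))) for sigma in perms)
        return total / len(perms)

    return p_symm


C = 10 ** 6  # hypothetical large constant


def p(x):
    # Degree-2 polynomial on {0,1}^2 (illustrative choice, not from the lecture).
    return C * (x[0] - x[1]) + x[0] * x[1]


p_symm = symmetrize(p, 2)

# On Hamming weight 1 the two huge values cancel in the average.
print(p((1, 0)), p((0, 1)), p_symm((1, 0)))
# On the all-1s input there is nothing to average over.
print(p((1, 1)), p_symm((1, 1)))
```

So a bound on p_symm at weight 1 says nothing about |p(1, 0)|, while at the all-1s input the two functions agree.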

A note on the history of Claim 3. Claim 3 was implicit in [RS10]. They explicitly showed a similar bound for symmetric polynomials using a primal view and (implicitly) gave a different (dual) proof of the case of general polynomials.

1.3 Proof of Claim 4

1.3.1 Interlude Part 1: Method of Dual Polynomials [BT17]

A dual polynomial is a dual solution to a certain linear program that captures the approximate degree of any given function f \colon \{-1, 1\}^n \to \{-1, 1\}. These polynomials act as certificates of the high approximate degree of f. Strong LP duality implies that the technique is lossless, in contrast to the symmetrization techniques we saw before: for any function f and any \varepsilon , there is always some dual polynomial \Psi that witnesses a tight \varepsilon -approximate degree lower bound for f. A dual polynomial that witnesses the fact that \mathsf {d}_\varepsilon (f) \geq d is a function \Psi \colon \{-1, 1\}^n \rightarrow \mathbb {R} satisfying three properties:

  • Correlation:
    \begin{aligned}\sum _{x \in \{-1,1\}^n }{\Psi (x) \cdot f(x)} > \varepsilon .\end{aligned}

    If \Psi satisfies this condition, it is said to be well-correlated with f.

  • Pure high degree: For all polynomials p \colon \{-1, 1\}^n \rightarrow \mathbb {R} of degree less than d, we have
    \begin{aligned}\sum _{x \in \{-1,1\}^n } { p(x) \cdot \Psi (x)} = 0.\end{aligned}

    If \Psi satisfies this condition, it is said to have pure high degree at least d.

  • \ell _1 norm:
    \begin{aligned}\sum _{x \in \{-1,1\}^n }|\Psi (x)| = 1.\end{aligned}
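
The three properties can be checked by brute force on small examples. The sketch below is our own illustration (the helper name is_dual_witness and the choice n = 3 are assumptions): it verifies that \Psi (x) = 2^{-n} \prod _i x_i is a dual polynomial for parity on n bits with pure high degree n, and that the same \Psi fails the correlation condition for \mathsf {AND}_n with \varepsilon = 1/3.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod


def is_dual_witness(psi, f, n, d, eps):
    """Brute-force check of correlation > eps, pure high degree >= d, and l1 norm 1."""
    cube = list(product([-1, 1], repeat=n))
    correlation = sum(psi(x) * f(x) for x in cube)
    l1_norm = sum(abs(psi(x)) for x in cube)
    # It suffices to test monomials: every polynomial of degree < d on the cube
    # is a linear combination of monomials prod_{i in S} x_i with |S| < d.
    pure_high_degree = all(
        sum(psi(x) * prod(x[i] for i in S) for x in cube) == 0
        for size in range(d)
        for S in combinations(range(n), size)
    )
    return correlation > eps and l1_norm == 1 and pure_high_degree


n = 3  # illustrative size (assumption)


def parity(x):
    return prod(x)


def and_n(x):
    # -1 encodes TRUE, as in the definition of SURJ.
    return -1 if all(b == -1 for b in x) else 1


def psi(x):
    return Fraction(parity(x), 2 ** n)


print(is_dual_witness(psi, parity, n, n, Fraction(1, 3)))
print(is_dual_witness(psi, and_n, n, n, Fraction(1, 3)))
```

For \mathsf {AND}_3 the correlation of this \Psi is only 1/4, so a different dual polynomial would be needed to witness a lower bound with \varepsilon = 1/3.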
1.3.2 Interlude Part 2: Applying The Method of Dual Polynomials To Block-Composed Functions

For any function f \colon \{-1, 1\}^n \to \{-1, 1\}, we can write an LP capturing the approximate degree of f. We can prove lower bounds on the approximate degree of f by proving lower bounds on the optimal value of this LP. One way to do this is by writing down the Dual of the LP and exhibiting a feasible solution to the Dual, thereby giving a lower bound on the optimal value of the Dual. By LP duality, a lower bound on the optimal value of the Dual is also a lower bound on the optimal value of the Primal. Therefore, exhibiting such a feasible solution, which we call a dual witness, suffices to prove an approximate degree lower bound for f.

However, for any given dual witness, some work will be required to verify that the witness indeed meets the criteria imposed by the Dual constraints.

When the function f is a block-wise composition of two functions, say h and g, then we can try to construct a good dual witness for f by looking at dual witnesses for each of h and g, and combining them carefully, to get the dual witness for h \circ g.

The dual witness \Phi constructed in Step 2a for \mathsf {AND} \circ \mathsf {OR} is expressed below in terms of the dual witness of the inner \mathsf {OR} function viz. \Psi _{\mathsf {OR}} and the dual witness of the outer \mathsf {AND}, viz. \Psi _{ \mathsf {AND} }.

\begin{aligned} ~~~~(3) \Phi (x_1 \dots x_R) = \Psi _{ \mathsf {AND} }\left ( \cdots , \mathsf {sgn}(\Psi _{\mathsf {OR}}(x_i)), \cdots \right ) \cdot \prod _{i=1}^R| \Psi _{\mathsf {OR}}(x_i)|. \end{aligned}

This method of combining dual witnesses \Psi _{\mathsf {AND}} for the “outer” function \mathsf {AND} and \Psi _{\mathsf {OR}} for the “inner” function \mathsf {OR} is referred to in [BKT17, BT17] as dual block composition.
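The following Python sketch implements dual block composition for tiny functions so that its effect can be inspected by brute force. Two caveats: the sketch includes a normalization factor 2^R, which is an assumption on our part (it is not displayed in Equation (3)) chosen so that the composed witness has \ell _1 norm 1 when the inner witness has pure high degree at least 1; and, to keep the example checkable by hand, it uses parity witnesses on 2 bits in both roles rather than the actual witnesses for \mathsf {AND} and \mathsf {OR}.

```python
from fractions import Fraction
from itertools import product
from math import prod


def sgn(v):
    # The inner witness used below is never zero, so sgn(0) does not arise.
    return 1 if v > 0 else -1


def dual_block_compose(psi_out, psi_in, R):
    """Phi(x_1..x_R) = 2^R * psi_out(sgn psi_in(x_1), ...) * prod_i |psi_in(x_i)|.

    The factor 2^R is a normalization assumed here; see the lead-in.
    """
    def phi(blocks):
        signs = tuple(sgn(psi_in(b)) for b in blocks)
        mass = prod(abs(psi_in(b)) for b in blocks)
        return 2 ** R * psi_out(signs) * mass

    return phi


def parity_witness(x):
    """Dual witness for parity: Psi(x) = 2^{-len(x)} * prod_i x_i."""
    return Fraction(prod(x), 2 ** len(x))


R, N = 2, 2  # illustrative sizes (assumption)
phi = dual_block_compose(parity_witness, parity_witness, R)

cube = list(product([-1, 1], repeat=N))
inputs = list(product(cube, repeat=R))  # each input is a tuple of R blocks

l1_norm = sum(abs(phi(x)) for x in inputs)
# Correlation with the block composition PARITY_R o PARITY_N.
correlation = sum(phi(x) * prod(prod(b) for b in x) for x in inputs)
print(l1_norm, correlation)
```

In this toy case the composed witness is exactly the parity witness on all R \cdot N = 4 bits, so both the \ell _1 norm and the correlation equal 1.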

1.3.3 Interlude Part 3: Hamming Weight Decay Conditions

Step 2a of the proof of the \mathsf {SURJ} lower bound from [BKT17] gives a dual witness \Phi for \mathsf {AND}_R \circ \mathsf {OR}_N (with R=\Theta (N)) that has pure high degree \tilde {\Omega }(N^{3/4}), and also satisfies Equations (4) and (5) below.

\begin{aligned} ~~~~(4) \sum _{|x|>N} {|\Phi (x)|} \ll (R \cdot N \cdot D)^{-D} \end{aligned}
\begin{aligned} ~~~~(5) \text {For all } t=0, \dots , N, \sum _{|x|=t} {|\Phi (x)|} \leq \frac {1}{15 \cdot (1+t)^2}. \end{aligned}

Equation (4) is a very strong “Hamming weight decay” condition: it shows that the total mass that \Phi places on inputs of high Hamming weight is very small. Hamming weight decay conditions play an essential role in the lower bound analysis for \mathsf {SURJ} from [BKT17]. In addition to Equations (4) and (5) themselves being Hamming weight decay conditions, [BKT17]’s proof that \Phi satisfies Equations (4) and (5) exploits the fact that the dual witness \Psi _{\mathsf {OR}} for \mathsf {OR} can be chosen to simultaneously have pure high degree N^{1/4}, and to satisfy the following weaker Hamming weight decay condition:

Claim 5. There exist constants c_1, c_2 such that for all t=0, \cdots N,

\begin{aligned} ~~~~(6) \sum _{|x|=t} { |\Psi _{\mathsf {OR}}(x)|} \leq c_1 \cdot \frac {1}{(1+t)^2} \cdot \exp (-c_2 \cdot t/N^{1/4}). \end{aligned}

(We will not prove Claim 5 in these notes; we simply state it to highlight the importance of dual decay to the analysis of \mathsf {SURJ}.)

Dual witnesses satisfying various notions of Hamming weight decay have a natural primal interpretation: they witness approximate degree lower bounds for the target function (\mathsf {AND}_R \circ \mathsf {OR}_N in the case of Equation (4), and \mathsf {OR}_N in the case of Equation (6)) even when the approximation is allowed to be exponentially large on inputs of high Hamming weight. This primal interpretation of dual decay is formalized in the following claim.

Claim 6. Let L(t) be any function mapping \{0, 1, \dots , N\} to \mathbb {R}_+. Suppose \Psi is a dual witness for f satisfying the following properties:

  • (Correlation): \sum _{x \in \{-1,1\}^N }{\Psi (x) \cdot f(x)} > 1/3.
  • (Pure high degree): \Psi has pure high degree D.
  • (Dual decay): \sum _{|x|=t} |\Psi (x)| \leq \frac {1}{5 \cdot (1+t)^2 \cdot L(t)} for all t = 0, 1, \dots , N.

Then there is no degree D polynomial p such that

\begin{aligned} ~~~~(7) |p(x)-f(x)| \leq L(t) \text { for all } t = 0, 1, \dots , N \text { and all inputs } x \text { with } |x| = t.\end{aligned}

Proof. Let p be any degree D polynomial. Since \Psi has pure high degree D, \sum _{x \in \{-1, 1\}^N} p(x) \cdot \Psi (x)=0.

We will now show that if p satisfies Equation (7), then the other two properties satisfied by \Psi (correlation and dual decay) together imply that \sum _{x \in \{-1, 1\}^N} p(x) \cdot \Psi (x) >0, a contradiction.

\begin{aligned} \sum _{x \in \{-1, 1\}^N} \Psi (x) \cdot p(x) = \sum _{x \in \{-1, 1\}^N} \Psi (x) \cdot f(x) + \sum _{x \in \{-1, 1\}^N} \Psi (x) \cdot (p(x) - f(x))\\ \geq 1/3 - \sum _{x \in \{-1, 1\}^N} |\Psi (x)| \cdot |p(x) - f(x)|\\ \geq 1/3 - \sum _{t=0}^N \sum _{|x|=t} |\Psi (x)| \cdot L(t)\\ \geq 1/3 - \sum _{t=0}^N \frac {1}{5 \cdot (1+t)^2 \cdot L(t)} \cdot L(t)\\ = 1/3 - \sum _{t=0}^N \frac {1}{5 \cdot (1+t)^2} > 0. \end{aligned}

Here, Line 2 exploited that \Psi has correlation at least 1/3 with f, Line 3 exploited the assumption that p satisfies Equation (7), and Line 4 exploited the dual decay condition that \Psi is assumed to satisfy. \square
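The final strict inequality in the chain relies on \sum _{t \geq 0} \frac {1}{5(1+t)^2} = \pi ^2/30 \approx 0.329 being less than 1/3. The short Python check below, which is our own addition, confirms this numerically for a few values of N and shows that the margin is small, which is why the constant 5 in the dual decay condition cannot be lowered much.

```python
from fractions import Fraction
from math import pi


def decay_sum(N):
    """Exact value of sum_{t=0}^{N} 1 / (5 * (1 + t)^2)."""
    return sum(Fraction(1, 5 * (1 + t) ** 2) for t in range(N + 1))


for N in (0, 10, 200):
    s = decay_sum(N)
    print(N, float(s), s < Fraction(1, 3))

# The partial sums increase to pi^2 / 30 as N grows.
limit = pi ** 2 / 30
print(limit, 1 / 3 - limit)
```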

1.3.4 Proof of Claim 4

Proof. Claim 4 follows from Equations (4) and (5), combined with Claim 6. Specifically, apply Claim 6 with f=\mathsf {AND}_R \circ \mathsf {OR}_N, and

\begin{aligned}L(t) = \begin {cases} 1/3 \text { if } t \leq N \\ (R \cdot N \cdot D)^{D} \text { if } t > N. \end {cases}\end{aligned}

\square

2 Generalizing the analysis for \mathsf {SURJ} to prove a nearly linear approximate degree lower bound for \mathsf {AC}^0

Now we take a look at how to extend this kind of analysis for \mathsf {SURJ} to obtain even stronger approximate degree lower bounds for other functions in \mathsf {AC}^0. Recall that \mathsf {SURJ} can be expressed as an \mathsf {AND}_R (over all range items r \in [R]) of the \mathsf {OR}_N (over all inputs i \in [N]) of “Is input x_i equal to r”? That is, \mathsf {SURJ} simply evaluates \mathsf {AND}_R \circ \mathsf {OR}_N on the inputs (\dots , y_{j, i}, \dots ) where y_{j, i} indicates whether or not input x_i is equal to range item j \in [R].
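The identity between \mathsf {SURJ} and \mathsf {AND}_R \circ \mathsf {OR}_N on the indicator variables y_{j, i} can be checked exhaustively for small parameters. The Python sketch below is our own illustration (function names and the choice R = 3, N = 4 are assumptions); it uses the convention that -1 encodes TRUE, and it also confirms that exactly N of the R \cdot N indicators are TRUE.

```python
from itertools import product


def surj(xs, R):
    """SURJ on a list xs of N numbers from [R] = {1, ..., R}; -1 means TRUE."""
    return -1 if set(xs) == set(range(1, R + 1)) else 1


def or_gate(bits):
    return -1 if any(b == -1 for b in bits) else 1


def and_gate(bits):
    return -1 if all(b == -1 for b in bits) else 1


def block_compose(f, g, blocks):
    """Evaluate f o g: apply g to each block, then feed the outputs into f."""
    return f([g(block) for block in blocks])


def indicators(xs, R):
    """Row j holds y_{j,i} = -1 if x_i == j and 1 otherwise, for i = 1..N."""
    return [[-1 if x == j else 1 for x in xs] for j in range(1, R + 1)]


def check_all(R, N):
    """Compare SURJ with AND_R o OR_N on every input list in [R]^N."""
    for xs in product(range(1, R + 1), repeat=N):
        y = indicators(xs, R)
        assert surj(xs, R) == block_compose(and_gate, or_gate, y)
        # Exactly N of the R * N indicator variables are TRUE.
        assert sum(b == -1 for row in y for b in row) == N
    return True


print(check_all(R=3, N=4))
```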

Our analysis for \mathsf {SURJ} can be viewed as follows: It is a way to turn the \mathsf {AND} function on R bits (which has approximate degree \Theta \left (\sqrt {R}\right )) into a function on close to R bits, with polynomially larger approximate degree. Indeed, \mathsf {SURJ} is defined on N \log R bits, so if, say, N = 100R, it is a function on 100 R \log R bits. So, this function is on not much more than R bits, but has approximate degree \tilde {\Omega }(R^{3/4}), polynomially larger than the approximate degree of \mathsf {AND}_R.

Hence, the lower bound for \mathsf {SURJ} can be seen as a hardness amplification result. We turn the \mathsf {AND} function on R bits to a function on slightly more bits, but the approximate degree of the new function is significantly larger.

From this perspective, the lower bound proof for \mathsf {SURJ} showed that in order to approximate \mathsf {SURJ}, we need to not only approximate the \mathsf {AND}_R function: instead of feeding the inputs directly to the \mathsf {AND} gate itself, we are further driving up the degree by feeding the inputs through \mathsf {OR}_N gates. The intuition is that we cannot do much better than approximating the \mathsf {AND} function and then approximating the block-composed \mathsf {OR}_N gates. This additional approximation of the \mathsf {OR} gates gives us the extra exponent in the approximate degree expression.

We will see two issues that come in the way of naive attempts at generalizing our hardness amplification technique from \mathsf {AND}_R to more general functions.

2.1 Interlude: Grover’s Algorithm

Grover’s algorithm [Gro96] is a quantum algorithm that finds with high probability the unique input to a black box function that produces a given output, using O({\sqrt {N}}) queries to the function, where N is the size of the domain of the function. It was originally devised as a database search algorithm that searches an unsorted database of size N and determines whether or not there is a record in the database that satisfies a given property in O(\sqrt {N}) queries. This is strictly better than deterministic and randomized query algorithms, which require \Omega (N) queries in the worst case and in expectation, respectively. Grover’s algorithm is optimal up to a constant factor among quantum algorithms.
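To see the O(\sqrt {N}) query count in action, here is a small classical simulation of Grover’s algorithm that tracks the full vector of N amplitudes. This is only an illustrative sketch: a classical simulation takes time proportional to N per iteration, and the values N = 1024 and the marked index are arbitrary choices of ours.

```python
import math


def grover_success_probability(N, marked, iterations):
    """Simulate Grover iterations and return the probability of measuring `marked`."""
    amp = [1 / math.sqrt(N)] * N  # uniform superposition
    for _ in range(iterations):
        amp[marked] = -amp[marked]          # oracle query: flip the marked sign
        mean = sum(amp) / N
        amp = [2 * mean - a for a in amp]   # diffusion: inversion about the mean
    return amp[marked] ** 2


N = 1024      # domain size (assumption)
marked = 123  # hypothetical marked item
k = math.floor(math.pi / 4 * math.sqrt(N))  # about (pi/4) * sqrt(N) queries

print(k, grover_success_probability(N, marked, k))
print(grover_success_probability(N, marked, 0))  # no queries: probability 1/N
```

Each loop iteration corresponds to one query, so the marked item is found with probability close to 1 after roughly (\pi /4)\sqrt {N} queries, far fewer than N.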

2.2 Issues: Why a dummy range item is necessary

In general, let us consider the problem of taking any function f that does not have maximal approximate degree (say, with approximate degree n^{1-\Omega (1)}), and turning it into a function on roughly the same number of bits, but with polynomially larger approximate degree.

In analogy with how \mathsf {SURJ}(x_1, \dots , x_N) equals \mathsf {AND}_R \circ \mathsf {OR}_N evaluated on inputs (\dots , y_{ji}, \dots ), where y_{ji} indicates whether or not x_i=j, we can consider the block composition f_R \circ \mathsf {OR}_N evaluated on (\dots , y_{ji}, \dots ), and hope that this function has polynomially larger approximate degree than f_R itself.

Unfortunately, this does not work. Consider for example the case f_R = \mathsf {OR}_R. The function \mathsf {OR}_R \circ \mathsf {OR}_N = \mathsf {OR}_{R \cdot N} evaluates to TRUE on all possible vectors (\dots , y_{ji}, \dots ), since all such vectors have Hamming weight exactly N > 0.

One way to try to address this is to introduce a dummy range item, all occurrences of which are simply ignored by the function. That is, we can consider the (hopefully harder) function G that interprets its input as a list of N numbers from the range [R]_0 := \{0, 1, \dots , R\}, rather than range [R], and define G=f_R \circ \mathsf {OR}_N(y_{1, 1}, \dots , y_{R, N}) (note that variables y_{0, 1}, \dots , y_{0, N}, which indicate whether or not each input x_i equals range item 0, are simply ignored).

In fact, in the previous lecture we already used this technique of introducing a “dummy” range item, to ease the lower bound analysis for \mathsf {SURJ} itself. Last lecture we covered Step 1 of the lower bound proof for \mathsf {SURJ}, and we let z_0= \sum _{i = 1}^N y_{0, i} denote the frequency of the dummy range item, 0. The introduction of this dummy range item let us replace the condition \sum _{j=0}^R z_j = N (i.e., the sum of the frequencies of all the range items is exactly N) by the condition \sum _{j=1}^R z_j \leq N (i.e., the sum of the frequencies of all the range items is at most N).

2.3 A dummy range item is not sufficient on its own

Unfortunately, introducing a dummy range item is not sufficient on its own. That is, even when the range is [R]_0 rather than [R], the function G=f_R \circ \mathsf {OR}_N(y_{1, 1}, \dots , y_{R, N}) may have approximate degree that is not polynomially larger than that of f_R itself. An example of this is (once again) f_R = \mathsf {OR}_R. With a dummy range item, \mathsf {OR}_R \circ \mathsf {OR}_N(y_{1, 1}, \dots , y_{R, N}) evaluates to TRUE if and only if at least one of the N inputs is not equal to the dummy range item 0. This problem has approximate degree O(N^{1/2}) (it can be solved using Grover search).

Therefore, the most naive approach at general hardness amplification, even with a dummy range item, does not work.

2.4 The approach that works

The approach that succeeds is to consider the block composition f \circ \mathsf {AND}_{\log R} \circ \mathsf {OR}_N (i.e., apply the naive approach with a dummy range item not to f_R itself, but to f_R \circ \mathsf {AND}_{\log R}). As pointed out in Section 2.3, the \mathsf {AND}_{\log R} gates are crucial here for the analysis to go through.

It is instructive to look at where exactly the lower bound proof for \mathsf {SURJ} breaks down if we try to adapt it to the function \mathsf {OR}_R \circ \mathsf {OR}_N = \mathsf {OR}_{R \cdot N} (rather than the function \mathsf {AND}_R \circ \mathsf {OR}_N which we analyzed to prove the lower bound for \mathsf {SURJ}). Then we can see why the introduction of the \mathsf {AND}_{\log R} gates fixes the issue.

When analyzing the more naively defined function G= \left (\mathsf {OR}_R \circ \mathsf {OR}_N\right )(y_{1, 1}, \dots , y_{R, N}) (with a dummy range item), Step 1 of the lower bound analysis for \mathsf {SURJ} does work unmodified to imply that in order to approximate G, it is necessary to approximate block composition of \mathsf {OR}_R \circ \mathsf {OR}_N on inputs of Hamming weight at most N. But Step 2 of the analysis breaks down: one can approximate \mathsf {OR}_R \circ \mathsf {OR}_N on inputs of Hamming weight at most N using degree just O(N^{1/2}).

Why does the Step 2 analysis break down for \mathsf {OR}_R \circ \mathsf {OR}_N? If one tries to construct a dual witness \Phi for \mathsf {OR}_R \circ \mathsf {OR}_N by applying dual block composition (cf. Equation (3), but with the dual witness \Psi _{\mathsf {AND}} for \mathsf {AND}_R replaced by a dual witness for \mathsf {OR}_R), \Phi will not be well-correlated with \mathsf {OR}_R \circ \mathsf {OR}_N.

Roughly speaking, the correlation analysis thinks of each copy of the inner dual witness \Psi _{\mathsf {OR}}(x_i) as consisting of a sign, \mathsf {sgn}(\Psi _{\mathsf {OR}}(x_i)), and a magnitude |\Psi _{\mathsf {OR}}(x_i)|, and the inner dual witness “makes an error” on x_i if it outputs the wrong sign, i.e., if \mathsf {sgn}(\Psi _{\mathsf {OR}}(x_i)) \neq \mathsf {OR}(x_i). The correlation analysis winds up performing a union bound over the probability (under the product distribution \prod _{i=1}^{R}|\Psi _{\mathsf {OR}}(x_i)|) that any of the R copies of the inner dual witness makes an error. Unfortunately, each copy of the inner dual witness makes an error with constant probability under the distribution |\Psi _{\mathsf {OR}}|. So at least one of them makes an error under the product distribution with probability very close to 1. This means that the correlation of the dual-block-composed dual witness \Phi with \mathsf {OR}_R \circ \mathsf {OR}_N is poor.

But if we look at \mathsf {OR}_R \circ \left (\mathsf {AND}_{\log R} \circ \mathsf {OR}_N\right ), the correlation analysis does go through. That is, we can give a dual witness \Psi _{\mathsf {in}} for \mathsf {AND}_{\log R} \circ \mathsf {OR}_N and a dual witness \Psi _{\mathsf {out}} for \mathsf {OR}_R such that the dual-block-composition of \Psi _{\mathsf {out}} and \Psi _{\mathsf {in}} is well-correlated with \mathsf {OR}_R \circ \left (\mathsf {AND}_{\log R} \circ \mathsf {OR}_N\right ).

This is because [BT15] showed that for \epsilon =1-1/(3R), d_{\epsilon }\left (\mathsf {AND}_{\log R} \circ \mathsf {OR}_N\right ) = \Omega (N^{1/2}). This means that \left (\mathsf {AND}_{\log R} \circ \mathsf {OR}_N\right ) has a dual witness \Psi _{\mathsf {in}} that “makes an error” with probability just 1/(3R). This probability of making an error is so low that a union bound over all R copies of \Psi _{\mathsf {in}} appearing in the dual-block-composition of \Psi _{\mathsf {out}} and \Psi _{\mathsf {in}} implies that with probability at least 1/3, none of the copies of \Psi _{\mathsf {in}} make an error.
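The contrast between a constant per-copy error and a per-copy error of 1/(3R) is easy to see numerically. In the Python sketch below, the constant error 0.1 is a hypothetical value of ours standing in for "constant probability", and the R copies are treated as independent, as they are under a product distribution.

```python
def prob_some_copy_errs(delta, R):
    """Probability that at least one of R independent copies errs,
    when each copy errs with probability delta."""
    return 1 - (1 - delta) ** R


for R in (10, 100, 1000):
    constant_error = prob_some_copy_errs(0.1, R)        # hypothetical constant error
    small_error = prob_some_copy_errs(1 / (3 * R), R)   # error 1/(3R) per copy
    print(R, round(constant_error, 4), round(small_error, 4))
```

With constant per-copy error the failure probability tends to 1 as R grows, whereas with per-copy error 1/(3R) it stays below the union bound R \cdot 1/(3R) = 1/3, so no copy errs with probability at least 2/3 (and in particular at least 1/3).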

In summary, the key difference between \mathsf {OR}_N and \mathsf {AND}_{\log R} \circ \mathsf {OR}_N that allows the lower bound analysis to go through for the latter but not the former is that the latter has \epsilon -approximate degree \Omega (N^{1/2}) for \epsilon = 1-1/(3R), while the former only has \epsilon -approximate degree \Omega (N^{1/2}) if \epsilon is a constant bounded away from 1.

To summarize, the \mathsf {SURJ} lower bound can be seen as a way to turn the function f_R = \mathsf {AND}_R into a harder function G=\mathsf {SURJ}, meaning that G has polynomially larger approximate degree than f_R. The right approach to generalize the technique for arbitrary f_R is to (a) introduce a dummy range item, all occurrences of which are effectively ignored by the harder function G, and (b) rather than considering the “inner” function \mathsf {OR}_N, consider the inner function \mathsf {AND}_{\log R} \circ \mathsf {OR}_N, i.e., let G=f_R \circ \mathsf {AND}_{\log R} \circ \mathsf {OR}_N(y_{1, 1} \dots , y_{R \log R, N}). The \mathsf {AND}_{\log R} gates are essential to make sure that the error in the correlation of the inner dual witness is very small, and hence the correlation analysis for the dual-block-composed dual witness goes through. Note that G can be interpreted as follows: it breaks the range [R \log R]_0 up into R blocks, each of length \log R, (the dummy range item is excluded from all of the blocks), and for each block it computes a bit indicating whether or not every range item in the block has frequency at least 1. It then feeds these bits into f_R.

By recursively applying this construction, starting with f_R = \mathsf {AND}_R, we get a function in \mathsf {AC}^0 with approximate degree \Omega (n^{1-\delta }) for any desired constant \delta > 0.

2.5 k-distinctness

The very same issue also arises in [BKT17]’s proof of a lower bound on the approximate degree of the k-distinctness function. Step 1 of the lower bound analysis for \mathsf {SURJ} also reduces analyzing k-distinctness to analyzing \mathsf {OR} \circ \mathsf {TH}^k_N (restricted to inputs of Hamming weight at most N), where \mathsf {TH}^k_N is the function that evaluates to TRUE if and only if its input has Hamming weight at least k. The lower bound proved in [BKT17] for k-distinctness is \Omega (n^{3/4-1/(2k)}). \mathsf {OR} is the \mathsf {TH}^1 function. So, \mathsf {OR}_R \circ \mathsf {TH}^k_N is “close” to \mathsf {OR}_R \circ \mathsf {OR}_N. And we’ve seen that the correlation analysis of the dual witness obtained via dual-block-composition breaks down for \mathsf {OR}_R \circ \mathsf {OR}_N.

To overcome this issue, we have to show that \mathsf {TH}^k_N is harder to approximate than \mathsf {OR}_N itself, but we have to give up some small factor in the process. We will lose some quantity compared to the \Omega (n^{3/4}) lower bound for \mathsf {SURJ}. It may seem that this loss factor is just a technical issue and not intrinsic, but this is not so. In fact, this bound is almost tight. There is an upper bound from a complicated quantum algorithm [BL11, Bel12] for k-distinctness that makes O(n^{3/4-1/(2^{k+2}-4)})= n^{3/4-\Omega (1)} queries, which we won’t elaborate on here.

References

[Bel12]    Aleksandrs Belovs. Learning-graph-based quantum algorithm for k-distinctness. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pages 207–216. IEEE, 2012.

[BKT17]   Mark Bun, Robin Kothari, and Justin Thaler. The polynomial method strikes back: Tight quantum query bounds via dual polynomials. arXiv preprint arXiv:1710.09079, 2017.

[BL11]    Aleksandrs Belovs and Troy Lee. Quantum algorithm for k-distinctness with prior knowledge on the input. arXiv preprint arXiv:1108.3022, 2011.

[BT15]    Mark Bun and Justin Thaler. Hardness amplification and the approximate degree of constant-depth circuits. In International Colloquium on Automata, Languages, and Programming, pages 268–280. Springer, 2015.

[BT17]    Mark Bun and Justin Thaler. A nearly optimal lower bound on the approximate degree of \mathsf {AC}^0. arXiv preprint arXiv:1703.05784, 2017.

[Gro96]    Lov K Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 212–219. ACM, 1996.

[RS10]    Alexander A Razborov and Alexander A Sherstov. The sign-rank of \mathsf {AC}^{0}. SIAM Journal on Computing, 39(5):1833–1855, 2010.

Special Topics in Complexity Theory, Lectures 12-13

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

1 Lectures 12-13, Scribe: Giorgos Zirdelis

In these lectures we study the communication complexity of some problems on groups. We give the definition of a protocol when two parties are involved and generalize later to more parties.

Definition 1. A 2-party c-bit deterministic communication protocol is a depth-c binary tree such that:

  • the leaves are the output of the protocol
  • each internal node is labeled with a party and a function from that party’s input space to \{0,1\}

Computation is done by following a path on edges, corresponding to outputs of functions at the nodes.

A public-coin randomized protocol is a distribution on deterministic protocols.

2 2-party communication protocols

We start with a simple protocol for the following problem.

Let G be a group. Alice gets x \in G and Bob gets y \in G and their goal is to check if x \cdot y = 1_G, or equivalently if x = y^{-1}.

There is a simple deterministic protocol in which Alice sends her input to Bob, who checks if x \cdot y = 1_G. This requires O(\log |G|) bits of communication.

We give a randomized protocol that does better in terms of communication complexity. Alice picks a random hash function h: G \rightarrow \{0,1\}^{\ell }. We can think of Alice and Bob as sharing some common randomness, so they can agree on a common hash function to use in the protocol. Next, Alice sends h(x) to Bob, who then checks if h(x)=h(y^{-1}).

For \ell = O(1) we get constant error and constant communication.
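As a rough illustration of the protocol above, the following Python sketch simulates it for the symmetric group S_5, with permutations stored as tuples. The shared random hash function is only modeled heuristically, by SHA-256 keyed with a public seed; that modeling choice and the names `shared_hash` and `protocol_accepts` are assumptions made for this illustration and do not come from the lecture.

```python
import hashlib

def compose(p, q):
    """Product p*q of permutations stored as tuples: apply p first, then q."""
    return tuple(q[i] for i in p)

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def shared_hash(seed, element, ell):
    """Heuristic stand-in (assumption) for a shared random function G -> {0,1}^ell."""
    data = repr((seed, element)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (1 << ell)

def protocol_accepts(x, y, seed, ell=8):
    """Alice sends h(x), which is ell bits; Bob compares it with h(y^{-1})."""
    message = shared_hash(seed, x, ell)
    return message == shared_hash(seed, inverse(y), ell)
```

If x \cdot y = 1_G the two hashes are computed on the same element, so the protocol always accepts. Otherwise it accepts only when two distinct elements collide under h, which for a truly random function happens with probability 2^{-\ell }.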

3 3-party communication protocols

There are two ways to extend 2-party communication protocols to more parties. We first focus on the Number-in-hand (NIH), where Alice gets x, Bob gets y, Charlie gets z, and they want to check if x \cdot y \cdot z = 1_G. In the NIH setting the communication depends on the group G.

3.1 A randomized protocol for the hypercube

Let G=\left ( \{0,1\}^n, + \right ) with addition modulo 2. We want to test if x+y+z=0^n. First, we pick a linear hash function h, i.e. satisfying h(x+y) = h(x) + h(y). For a uniformly random a \in \{0,1\}^n set h_a(x) = \sum a_i x_i \pmod 2. Then,

  • Alice sends h_a(x)
  • Bob sends h_a(y)
  • Charlie accepts if and only if \underbrace {h_a(x) + h_a(y)}_{h_a(x+y)} = h_a(z)

The hash function outputs 1 bit. The error probability is 1/2 and the communication is O(1). For a better error, we can repeat.
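The protocol above can be sketched in a few lines of Python. Bit strings in \{0,1\}^n are stored as integers, so addition modulo 2 becomes XOR; the helper `acceptance_probability`, which enumerates every a, is a hypothetical name added only to make the error probability visible.

```python
def h(a, x):
    """Linear hash h_a(x) = sum_i a_i x_i mod 2, with bit strings stored as ints."""
    return bin(a & x).count("1") % 2

def charlie_accepts(a, x, y, z):
    alice_bit = h(a, x)   # 1 bit sent by Alice
    bob_bit = h(a, y)     # 1 bit sent by Bob
    # By linearity, alice_bit XOR bob_bit equals h_a(x + y).
    return (alice_bit ^ bob_bit) == h(a, z)

def acceptance_probability(x, y, z, n):
    """Exact probability, over uniform a in {0,1}^n, that Charlie accepts."""
    accepts = sum(charlie_accepts(a, x, y, z) for a in range(1 << n))
    return accepts / (1 << n)
```

For example, with n = 4 the call `acceptance_probability(5, 3, 6, 4)` uses inputs that sum to 0^n, while `acceptance_probability(5, 3, 7, 4)` does not. Repeating the protocol with k independent choices of a, and accepting only if every run accepts, brings the error down to 2^{-k}.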

3.2 A randomized protocol for \mathbb {Z}_m

Let G=\left (\mathbb {Z}_m, + \right ) where m=2^n. Again, we want to test if x+y+z=0 \pmod m. For this group, there is no exactly linear hash function, but there are almost linear hash function families h: \mathbb {Z}_m \rightarrow \mathbb {Z}_{2^{\ell }} that satisfy the following properties:

  1. \forall a,x,y we have h_a(x) + h_a(y) = h_a(x+y) \pm 1
  2. \forall x \neq 0 we have \Pr _{a} [h_a(x) \in \{\pm 2, \pm 1, 0\}] \leq 2^{-\Omega (\ell )}
  3. h_a(0)=0

Assuming a random hash function h_a (from a family) that satisfies the above properties, the protocol works similarly to the previous one.

  • Alice sends h_a(x)
  • Bob sends h_a(y)
  • Charlie accepts if and only if h_a(x) + h_a(y) + h_a(z) \in \{\pm 2, \pm 1, 0\}

We can set \ell = O(1) to achieve constant communication and constant error.

Analysis

To prove correctness of the protocol, first note that h_a(x) + h_a(y) + h_a(z) = h_a(x+y+z) \pm 2, then consider the following two cases:

  • if x+y+z=0 then h_a(x+y+z) \pm 2 = h_a(0) \pm 2 = 0 \pm 2
  • if x+y+z \neq 0 then \Pr _{a} [h_a(x+y+z) \in \{\pm 2, \pm 1, 0\}] \leq 2^{-\Omega (\ell )}

It now remains to show that such hash function families exist.

Let a be a random odd number modulo 2^n. Define

\begin{aligned} h_a(x) := (a \cdot x \gg n-\ell ) \pmod {2^{\ell }} \end{aligned}

where the product a \cdot x is integer multiplication. In other words we output the bits n-\ell +1, n-\ell +2, \ldots , n of the integer product a\cdot x.
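A minimal Python sketch of this hash function, plugged into the three-party protocol for \mathbb {Z}_m, follows. One assumption is made explicit: the set \{\pm 2, \pm 1, 0\} is read modulo 2^{\ell }, which is one way to settle the representation of negative numbers. The function names are illustrative.

```python
def h(a, x, n, ell):
    """h_a(x) = ((a * x) >> (n - ell)) mod 2^ell, where a * x is an integer product."""
    return ((a * x) >> (n - ell)) % (1 << ell)

def charlie_accepts(a, x, y, z, n, ell):
    """Accept iff h_a(x) + h_a(y) + h_a(z) lies in {0, +-1, +-2} modulo 2^ell."""
    modulus = 1 << ell
    total = (h(a, x, n, ell) + h(a, y, n, ell) + h(a, z, n, ell)) % modulus
    near_zero = {0, 1, 2, modulus - 1, modulus - 2}  # assumption: set read mod 2^ell
    return total in near_zero
```

With this reading, inputs satisfying x+y+z=0 \pmod {2^n} are accepted for every odd a, which can be checked exhaustively for small n and \ell .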

We now verify that the above hash function family satisfies the three properties we required above.

Property (3) is trivially satisfied.

For property (1) we have the following. Let s = a\cdot x and t = a \cdot y and u=n-\ell . The bottom line is how (s \gg u) + (t \gg u) compares with (s+t) \gg u. In more detail we have that,

  • h_a(x+y) = ((s+t) \gg u) \pmod {2^{\ell }}
  • h_a(x) = (s \gg u) \pmod {2^{\ell }}
  • h_a(y) = (t \gg u) \pmod {2^{\ell }}

Notice, that if in the addition s+t the carry into the u+1 bit is 0, then

\begin{aligned} (s \gg u) + (t \gg u) = (s+t) \gg u \end{aligned}

otherwise

\begin{aligned} (s \gg u) + (t \gg u) + 1 = (s+t) \gg u \end{aligned}

which concludes the proof for property (1).
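The carry argument can be checked mechanically on nonnegative integers. In the sketch below, `carry_into_bit` is a hypothetical helper that computes the carry out of the u low-order bits of s and t.

```python
def carry_into_bit(s, t, u):
    """Carry (0 or 1) produced by adding the u low-order bits of s and t."""
    mask = (1 << u) - 1
    return ((s & mask) + (t & mask)) >> u

def shift_identity_holds(s, t, u):
    """Check (s >> u) + (t >> u) + carry == (s + t) >> u."""
    return (s >> u) + (t >> u) + carry_into_bit(s, t, u) == (s + t) >> u
```

Since the carry is 0 or 1, the two shifted summands differ from the shifted sum by at most 1, which is exactly what property (1) asserts.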

Finally, we prove property (2). We start by writing x=s \cdot 2^c where s is odd. Bitwise, this looks like (\cdots \cdots 1 \underbrace {0 \cdots 0}_{c~ \textrm {bits}}).

The product a \cdot x for a uniformly random a, bitwise looks like ( \textit {uniform} ~ 1 \underbrace {0 \cdots 0}_{c~\textrm {bits}}). We consider the two following cases for the product a \cdot x:

  1. If a \cdot x = (\underbrace {\textit {uniform} ~ 1 \overbrace {00}^{2~bits}}_{\ell ~bits} \cdots 0), or equivalently c \geq n-\ell + 2, the output never lands in the bad set \{\pm 2, \pm 1, 0\} (some thought should be given to the representation of negative numbers – we ignore that for simplicity).
  2. Otherwise, the hash function output has \ell - O(1) uniform bits. Again for simplicity, let B = \{0,1,2\}. Thus,
    \begin{aligned} \Pr [\textrm {output} \in B] \leq |B| \cdot 2^{-\ell + O(1)} \end{aligned}

    In other words, the probability of landing in any small set is small.
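For small parameters, property (2) can also be examined exactly by enumerating all odd multipliers a. The sketch below assumes, as a simplification, that the bad set \{\pm 2, \pm 1, 0\} is read modulo 2^{\ell }; the helper names are illustrative.

```python
def h(a, x, n, ell):
    """h_a(x) = ((a * x) >> (n - ell)) mod 2^ell."""
    return ((a * x) >> (n - ell)) % (1 << ell)

def bad_count(x, n, ell):
    """Return (bad, total): how many odd a mod 2^n send x into the bad set."""
    modulus = 1 << ell
    bad_set = {0, 1, 2, modulus - 1, modulus - 2}  # assumption: set read mod 2^ell
    odd_multipliers = range(1, 1 << n, 2)
    bad = sum(h(a, x, n, ell) in bad_set for a in odd_multipliers)
    return bad, len(odd_multipliers)

def worst_case(n, ell):
    """Largest bad count over all nonzero x, together with the number of a's."""
    return max((bad_count(x, n, ell) for x in range(1, 1 << n)),
               key=lambda pair: pair[0])
```

Calling `worst_case(n, ell)` for a few values of \ell gives a concrete feel for the 2^{-\Omega (\ell )} decay.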

4 Other groups

What happens in other groups? Do we have an almost linear hash function for 2 \times 2 matrices? The answer is negative. For SL_2(q) and A_n the problem of testing equality with 1_G is hard.

We would like to rule out randomized protocols, but it is hard to reason about them directly. Instead, we are going to rule out deterministic protocols on random inputs. For concreteness our main focus will be SL_2(q).

First, for any group element g \in G we define the distribution on triples, D_g := (x,y, (x \cdot y)^{-1} g), where x,y \in G are uniformly random elements. Note the product of the elements in D_g is always g.
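To make the distribution D_g concrete, here is a small sketch for SL_2(q). It assumes q = 5 only to keep the group small enough to enumerate, and it stores a matrix as the tuple (a, b, c, d) of its entries in row order. The names `SL2` and `sample_D` are illustrative.

```python
import itertools
import random

Q = 5  # small prime, chosen only for illustration

def mat_mul(x, y, q=Q):
    """Product of 2x2 matrices (a, b, c, d) over the integers modulo q."""
    (a, b, c, d), (e, f, g, h) = x, y
    return ((a * e + b * g) % q, (a * f + b * h) % q,
            (c * e + d * g) % q, (c * f + d * h) % q)

def mat_inv(x, q=Q):
    """Inverse of a determinant-1 matrix, which is its adjugate."""
    a, b, c, d = x
    return (d % q, -b % q, -c % q, a % q)

# All of SL_2(Q): matrices with determinant 1 modulo Q.
SL2 = [m for m in itertools.product(range(Q), repeat=4)
       if (m[0] * m[3] - m[1] * m[2]) % Q == 1]

def sample_D(g, rng):
    """Sample (x, y, (x*y)^{-1} * g) with x, y uniform in SL_2(Q)."""
    x, y = rng.choice(SL2), rng.choice(SL2)
    z = mat_mul(mat_inv(mat_mul(x, y)), g)
    return x, y, z
```

Multiplying the three sampled matrices in order always gives back g, whatever x and y were drawn.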

Towards a contradiction, suppose we have a randomized protocol P for the xyz=^? 1_G problem. In particular, for every h \neq 1_G we have

\begin{aligned} \Pr [P(D_1)=1] \geq \Pr [P(D_h)=1] + \frac {1}{10}. \end{aligned}

This implies a deterministic protocol with the same gap, by fixing the randomness.

We reach a contradiction by showing that for every deterministic protocol P using little communication (we will quantify this later), we have

\begin{aligned} | \Pr [P(D_1)=1] - \Pr [P(D_h)=1] | \leq \frac {1}{100}. \end{aligned}

We start with the following lemma, which describes a protocol using product sets.

Lemma 1. (The set of accepted inputs of) A deterministic c-bit protocol can be written as a disjoint union of 2^c “rectangles,” that is sets of the form A \times B \times C.

Proof. (sketch) For every communication transcript t, let S_t \subseteq G^3 be the set of inputs giving transcript t. The sets S_t are disjoint since an input gives only one transcript, and their number is 2^c, i.e. one for each communication transcript of the protocol. The rectangle property can be proven by induction on the protocol tree. \square
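The rectangle property can be observed on a toy protocol. The sketch below makes a simplifying assumption: the parties speak in a fixed order, and each bit is a function of the speaker's own input and the transcript so far. A general protocol tree also lets the next speaker depend on the transcript. The example protocol `EXAMPLE_ROUNDS` is made up for illustration.

```python
from itertools import product

def run_protocol(rounds, inputs):
    """rounds: list of (party, f), where f(own_input, transcript) returns 0 or 1."""
    transcript = ()
    for party, f in rounds:
        transcript += (f(inputs[party], transcript),)
    return transcript

def transcript_classes(rounds, domain):
    """Group all input triples by the transcript they produce."""
    classes = {}
    for inputs in product(domain, repeat=3):
        classes.setdefault(run_protocol(rounds, inputs), set()).add(inputs)
    return classes

def is_rectangle(S):
    """Check that S equals A x B x C for its three coordinate projections."""
    A = {x for x, _, _ in S}
    B = {y for _, y, _ in S}
    C = {z for _, _, z in S}
    return S == set(product(A, B, C))

# A made-up 4-bit protocol: party 0 is Alice, 1 is Bob, 2 is Charlie.
EXAMPLE_ROUNDS = [
    (0, lambda x, t: x % 2),
    (1, lambda y, t: (y + t[0]) % 2),
    (2, lambda z, t: int(z > 2) ^ t[1]),
    (0, lambda x, t: int(x >= 4) & t[2]),
]
```

Running `transcript_classes(EXAMPLE_ROUNDS, range(6))` partitions the 6^3 inputs into at most 2^4 classes, and `is_rectangle` holds for each of them.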

Next, we show that these product sets cannot distinguish these two distributions D_1,D_h, and for that we will use the pseudorandom properties of the group G.

Lemma 2. For all A,B,C \subseteq G and all h \in G we have

\begin{aligned} |\Pr [A \times B \times C(D_1)=1] - \Pr [A \times B \times C(D_h)=1]| \leq \frac {1}{d^{\Omega (1)}} .\end{aligned}

Recall the parameter d from the previous lectures and that when the group G is SL_2(q) then d=|G|^{\Omega (1)}.

Proof. Pick any h \in G and let x,y,z be the inputs of Alice, Bob, and Charlie respectively. Then

\begin{aligned} \Pr [A \times B \times C(D_h)=1] = \Pr [ (x,y) \in A \times B ] \cdot \Pr [(x \cdot y)^{-1} \cdot h \in C | (x,y) \in A \times B] \end{aligned}

If either A or B is small, that is \Pr [x \in A] \leq \epsilon or \Pr [y \in B] \leq \epsilon , then also \Pr [P(D_h)=1] \leq \epsilon because the term \Pr [ (x,y) \in A \times B ] will be small. We will choose \epsilon later.

Otherwise, A and B are large, which implies that x and y are uniform over at least \epsilon |G| elements. Recall from Lecture 9 that this implies \lVert x \cdot y - U \rVert _2 \leq \lVert x \rVert _2 \cdot \lVert y \rVert _2 \cdot \sqrt {\frac {|G|}{d}}, where U is the uniform distribution.

By Cauchy–Schwarz we obtain,

\begin{aligned} \lVert x \cdot y - U \rVert _1 \leq |G| \cdot \lVert x \rVert _2 \cdot \lVert y \rVert _2 \cdot \sqrt {\frac {1}{d}} \leq \frac {1}{\epsilon } \cdot \frac {1}{\sqrt {d}}. \end{aligned}

The last inequality follows from the fact that \lVert x \rVert _2, \lVert y \rVert _2 \leq \sqrt {\frac {1}{\epsilon |G|}}.
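For completeness, the chain of inequalities can be spelled out as follows. It uses only the standard comparison \lVert v \rVert _1 \leq \sqrt {|G|} \cdot \lVert v \rVert _2 for vectors v indexed by G (an instance of Cauchy–Schwarz), the bound recalled from Lecture 9, and the bound on \lVert x \rVert _2, \lVert y \rVert _2 just stated:

\begin{aligned} \lVert x \cdot y - U \rVert _1 &\leq \sqrt {|G|} \cdot \lVert x \cdot y - U \rVert _2 \leq \sqrt {|G|} \cdot \lVert x \rVert _2 \cdot \lVert y \rVert _2 \cdot \sqrt {\frac {|G|}{d}} = |G| \cdot \lVert x \rVert _2 \cdot \lVert y \rVert _2 \cdot \sqrt {\frac {1}{d}} \\ &\leq |G| \cdot \frac {1}{\epsilon |G|} \cdot \frac {1}{\sqrt {d}} = \frac {1}{\epsilon } \cdot \frac {1}{\sqrt {d}}. \end{aligned}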

This implies that \lVert (x \cdot y)^{-1} - U \rVert _1 \leq \frac {1}{\epsilon } \cdot \frac {1}{\sqrt {d}} and \lVert (x \cdot y)^{-1} \cdot h - U \rVert _1 \leq \frac {1}{\epsilon } \cdot \frac {1}{\sqrt {d}}, because taking inverses and multiplying by h does not change anything. These two last inequalities imply that,

\begin{aligned} \Pr [(x \cdot y)^{-1} \in C | (x,y) \in A \times B] = \Pr [(x \cdot y)^{-1} \cdot h \in C | (x,y) \in A \times B] \pm \frac {2}{\epsilon } \frac {1}{\sqrt {d}} \end{aligned}

and thus we get that,

\begin{aligned} \Pr [P(D_1)=1] = \Pr [P(D_h)=1] \pm \frac {2}{\epsilon } \frac {1}{\sqrt {d}}. \end{aligned}

To conclude, based on all the above we have that for all \epsilon and independent of the choice of h, it is either the case that

\begin{aligned} | \Pr [P(D_1)=1] - \Pr [P(D_h)=1] | \leq 2 \epsilon \end{aligned}

or

\begin{aligned} | \Pr [P(D_1)=1] - \Pr [P(D_h)=1] | \leq \frac {2}{\epsilon } \frac {1}{\sqrt {d}} \end{aligned}

and we will now choose the \epsilon to balance these two cases and finish the proof:

\begin{aligned} \frac {2}{\epsilon } \frac {1}{\sqrt {d}} = 2 \epsilon \Leftrightarrow \frac {1}{\sqrt {d}} = \epsilon ^2 \Leftrightarrow \epsilon = \frac {1}{d^{1/4}}. \end{aligned}

\square

The above proves that the distribution D_h behaves like the uniform distribution for product sets, for all h \in G.

Returning to arbitrary deterministic protocols P, write P as a union of 2^c disjoint rectangles by the first lemma. Applying the second lemma with \epsilon = d^{-1/4} and summing over all rectangles, we get that the distinguishing advantage of P is at most 2^c \cdot 2/d^{1/4}. For c \leq (1/100) \log d and d sufficiently large, the advantage is at most 1/100, which contradicts the existence of a correct protocol. This concludes the proof of the following theorem.

Theorem 3. Let G be a group, and d be the minimum dimension of an irreducible representation of G. Consider the 3-party, number-in-hand communication problem f : G^3 \to \{0,1\} where f(x,y,z) = 1 \Leftrightarrow x \cdot y \cdot z = 1_G. Its randomized communication complexity is \Omega (\log d).

For SL_2(q) the communication is \Omega (\log |G|). This is tight up to constants, because Alice can send her entire group element.

For the group A_n the known bounds on d yield communication \Omega ( \log \log |G|). This bound is tight for the problem of distinguishing D_1 from D_h for h\neq 1, as we show next. The identity element 1_G for the group A_n is the identity permutation. If h \neq 1_G then h is a permutation that maps some element a \in [n] to h(a)=b \neq a. The idea is that the parties just need to “follow” a, whose description is logarithmically smaller than that of an element of G. Specifically, let x,y,z be the permutations that Alice, Bob and Charlie get. Alice sends x(a) \in [n]. Bob gets x(a) and sends y(x(a)) \in [n] to Charlie, who checks if z(y(x(a))) = a. The communication is O(\log n). Because the size of the group is |G|= n!/2 = 2^{\Theta (n \log n)}, the communication is O(\log \log |G|).
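The “follow a” protocol is easy to simulate. The sketch below stores a permutation of \{0, \ldots , n-1\} as a tuple and takes the product x \cdot y to mean “apply x first, then y”; this convention, and all function names, are assumptions of the illustration. Even permutations are sampled by shuffling and, if the result is odd, swapping its first two entries.

```python
import random

def compose(p, q):
    """Product p*q: apply p first, then q."""
    return tuple(q[i] for i in p)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    """A permutation of n points is even iff n minus its number of cycles is even."""
    seen, cycles = set(), 0
    for start in range(len(p)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = p[i]
    return (len(p) - cycles) % 2 == 0

def random_even_permutation(n, rng):
    p = list(range(n))
    rng.shuffle(p)
    if not is_even(tuple(p)):
        p[0], p[1] = p[1], p[0]  # composing with a transposition flips the parity
    return tuple(p)

def sample_D(h, n, rng):
    """Sample (x, y, (x*y)^{-1} * h) with x, y uniform in A_n."""
    x = random_even_permutation(n, rng)
    y = random_even_permutation(n, rng)
    return x, y, compose(inverse(compose(x, y)), h)

def follow_point(x, y, z, a):
    alice_msg = x[a]           # about log n bits
    bob_msg = y[alice_msg]     # about log n bits
    return z[bob_msg] == a     # Charlie: True means the product fixes a
```

On samples from D_1 the check always succeeds, and on samples from D_h with h(a) \neq a it always fails, so this particular distinguishing task is solved without error.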

This is also a proof that d cannot be too large for A_n, i.e. is at most (\log |G|)^{O(1)}.

5 More on 2-party protocols

We move to another setting where a clean answer can be given. Here we only have two parties. Alice gets x_1,x_2,\ldots ,x_n, Bob gets y_1,y_2,\ldots ,y_n, and they want to know if x_1 \cdot y_1 \cdot x_2 \cdot y_2 \cdots x_n \cdot y_n = 1_G.

When G is abelian, the elements can be reordered so as to check whether (x_1 \cdot x_2 \cdots x_n) \cdot (y_1 \cdot y_2 \cdots y_n) = 1_G. This requires constant communication (using randomness) as we saw in Lecture 12, since it is equivalent to the check x \cdot y = 1_G where x=x_1 \cdot x_2 \cdots x_n and y=y_1 \cdot y_2 \cdots y_n.

We will prove the next theorem for non-abelian groups.

Theorem 1. For every non-abelian group G the communication of deciding if x_1 \cdot y_1 \cdot x_2 \cdot y_2 \cdots x_n \cdot y_n = 1_G is \Omega (n).

Proof. We reduce from unique disjointness, defined below. For the reduction we will need to encode the And of two bits x,y \in \{0,1\} as a group product. (This question is similar to a puzzle that asks how to hang a picture on the wall with two nails, such that if either one of the nails is removed, the picture will fall. This is like computing the And function on two bits, where both bits (nails) have to be 1 in order for the function to be 1.) Since G is non-abelian, there exist a,b \in G such that a \cdot b \neq b\cdot a, and in particular a \cdot b \cdot a^{-1} \cdot b^{-1} = h with h \neq 1. We can use this fact to encode And as

\begin{aligned} a^x \cdot b^y \cdot a^{-x} \cdot b^{-y}= \begin {cases} 1,~~\text {if And(x,y)=0}\\ h,~~\text {otherwise} \end {cases}. \end{aligned}
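As a concrete instance of this encoding, take G = S_3 with a and b two different transpositions. This choice of group and elements is an assumption for illustration; any non-commuting pair would do. Permutations are tuples, and the product applies the left factor first.

```python
IDENTITY = (0, 1, 2)
A = (1, 0, 2)  # transposition swapping 0 and 1
B = (0, 2, 1)  # transposition swapping 1 and 2

def compose(p, q):
    """Product p*q: apply p first, then q."""
    return tuple(q[i] for i in p)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def power(g, bit):
    """g^bit for bit in {0, 1}."""
    return g if bit else IDENTITY

def and_gadget(x, y):
    """Compute a^x * b^y * a^{-x} * b^{-y} in S_3."""
    result = IDENTITY
    for factor in (power(A, x), power(B, y),
                   power(inverse(A), x), power(inverse(B), y)):
        result = compose(result, factor)
    return result
```

The gadget returns the identity unless both bits are 1, in which case it returns the commutator h of a and b, here a 3-cycle.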

In the disjointness problem Alice and Bob get inputs x,y \in \{0,1\}^n respectively, and they wish to check if there exists an i \in [n] such that x_i \land y_i =1. If you think of them as characteristic vectors of sets, this problem is asking if the sets have a common element or not. The communication of this problem is \Omega (n). Moreover, in the variant of this problem where the number of such i’s is 0 or 1 (i.e. unique), the same lower bound \Omega (n) still applies. This is like giving Alice and Bob two sets that either are disjoint or intersect in exactly one element, and they need to distinguish these two cases.

Next, we will reduce the above variant of set disjointness to group products. For x,y \in \{0,1\}^n we produce inputs for the group problem as follows:

\begin{aligned} x & \rightarrow (a^{x_1} , a^{-x_1} , \ldots , a^{x_n}, a^{-x_n} ) \\ y & \rightarrow (b^{y_1} , b^{-y_1}, \ldots , b^{y_n}, b^{-y_n}). \end{aligned}

Now, the product x_1 \cdot y_1 \cdot x_2 \cdot y_2 \cdots x_n \cdot y_n we originally wanted to compute becomes

\begin{aligned} \underbrace {a^{x_1} \cdot b^{y_1} \cdot a^{-x_1} \cdot b^{-y_1}}_{\text {1 bit}} \cdots \cdots a^{x_n} \cdot b^{y_n} \cdot a^{-x_n} \cdot b^{-y_n}. \end{aligned}

If there isn’t an i \in [n] such that x_i \land y_i=1, then each product term a^{x_i} \cdot b^{y_i} \cdot a^{-x_i} \cdot b^{-y_i} is 1 for all i, and thus the whole product is 1.

Otherwise, there exists a unique i such that x_i \land y_i=1 and thus the product will be 1 \cdots 1 \cdot h \cdot 1 \cdots 1=h, with h being in the i-th position. If Alice and Bob can test if the above product is equal to 1, they can also solve the unique set disjointness problem, and thus the lower bound applies for the former. \square

We required the uniqueness property, because otherwise we might get a product h^c that could be equal to 1 in some groups.
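The whole reduction, and the reason uniqueness matters, can be seen in a short sketch over S_3. The group and the elements a, b are again an illustrative assumption. Here the commutator h is a 3-cycle, so h^3 = 1.

```python
IDENTITY = (0, 1, 2)
A = (1, 0, 2)  # transposition swapping 0 and 1
B = (0, 2, 1)  # transposition swapping 1 and 2

def compose(p, q):
    """Product p*q: apply p first, then q."""
    return tuple(q[i] for i in p)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def encode(bits, g):
    """Map bits (v_1, ..., v_n) to the list g^{v_1}, g^{-v_1}, ..., g^{v_n}, g^{-v_n}."""
    out = []
    for v in bits:
        out.append(g if v else IDENTITY)
        out.append(inverse(g) if v else IDENTITY)
    return out

def interleaved_product(xs, ys):
    """Alice's and Bob's encoded elements, multiplied in alternating order."""
    product = IDENTITY
    for s, t in zip(encode(xs, A), encode(ys, B)):
        product = compose(compose(product, s), t)
    return product
```

For disjoint inputs the product is the identity, and for inputs with exactly one common index it equals h. With three common indices it equals h^3, which is the identity again in this group, so without the uniqueness promise the test would wrongly report “disjoint”.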

Why I vote for women

Perhaps in my previous post I should have explained more precisely why I think many things would be better if women were in control. I tried to summarize many statistics, papers, and books that I read through the years, but some may have found my language too sweeping. Let me try to be a bit more precise now.

First, polling conducted during the past four decades has shown that typically, men favor the United States’ going to war to resolve disputes much more than women do.

Second, women are more concerned about climate change than men, and they are more willing to make major lifestyle changes to do something about it.

Finally, it’s also a fact that women live longer, see e.g. this. The advantage is quite noticeable: about 5 years. I won’t give a statistic for what the consequences of this are, instead I’ll conduct the following mental experiment. Suppose population X has expected lifespan 1000 years, while population Y has expected lifespan 100 years. I think population X would be more interested in renewable energies, sustainable practices, et cetera.

I vote for women

UPDATE: I voted!  I tried and gathered all last-minute available information. I found this website rather useful. In the end I defaulted twice. The comments also pushed me to look for some relevant statistics. I may do another post about that later. In the meanwhile look e.g. at the first chart here (Approval in mid-August of first year). Look at the last five presidents. What do you think of sex as a proxy?


Tomorrow is election day in my city, Newton MA. I am happy to participate in this election because I feel that at the local level my votes can make a difference. This is also because I am asked to cast an unusually large number of votes, something like 10 or maybe 20. This number is so high that for the majority of the candidates I will have absolutely zero information, except their names on the ballot. In such cases I have decided to vote for women.

I think many things would be a lot better if women were in control. Because women tend to have a greater appreciation for life, health, and the environment. Their actions are also generally less myopic.

This default vote is overturned by the slightest relevant information I have. In one such case, I won’t vote for a candidate who advocates a more vibrant Newton, and growth. I will vote for the opponent who instead advocates protecting the city. I want Newton to stay the way it is: dead.

Unfortunately, there isn’t much discussion of the issues I really care about, like heavily protected bike lanes.