Special Topics in Complexity Theory, Lectures 12–13
Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola
1 Lectures 12–13, Scribe: Giorgos Zirdelis
In these lectures we study the communication complexity of some problems on groups. We give the definition of a protocol when two parties are involved and generalize later to more parties.
Definition 1. A 2-party c-bit deterministic communication protocol is a depth-c binary tree such that:
- the leaves are the outputs of the protocol,
- each internal node is labeled with a party and a function from that party's input space to {0,1}, indicating which child to follow.
Computation is done by following a path from the root to a leaf, along the edges corresponding to the outputs of the functions at the nodes.
A public-coin randomized protocol is a distribution on deterministic protocols.
2 2-party communication protocols
We start with a simple protocol for the following problem.
Let G be a group. Alice gets x ∈ G and Bob gets y ∈ G, and their goal is to check whether x = y, or equivalently whether x · y^{-1} = 1.
There is a simple deterministic protocol in which Alice simply sends her input to Bob, who checks whether x = y. This requires O(log |G|) bits of communication.
We give a randomized protocol that does better in terms of communication. Alice picks a random hash function h : G → {0,1}^ℓ. We can think of Alice and Bob as sharing some common randomness, so they can agree on a common hash function to use in the protocol. Next, Alice sends h(x) to Bob, who then checks whether h(x) = h(y).
For ℓ = O(1) we get constant error and constant communication.
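As a sanity check, the protocol can be simulated in a few lines. This is a sketch, not the lecture's construction: the hash family is modeled as a truly random function over a toy universe, and `shared_seed` stands for the parties' common randomness; all names are ours.

```python
import random

def equality_protocol(x, y, universe_size, c, shared_seed):
    """One run of the randomized equality protocol.

    The shared seed lets Alice and Bob agree on the same random hash
    function h : universe -> {0,1}^c, modeled here as a lookup table.
    Alice sends the c-bit value h(x); Bob accepts iff it equals h(y).
    """
    rng = random.Random(shared_seed)
    h = [rng.randrange(2 ** c) for _ in range(universe_size)]
    return h[x] == h[y]  # Bob's check on Alice's c-bit message

# Equal inputs are always accepted.
assert all(equality_protocol(5, 5, 64, 3, s) for s in range(100))

# Distinct inputs are wrongly accepted with probability about 2^-c.
trials = 10_000
errors = sum(equality_protocol(5, 7, 64, 3, s) for s in range(trials))
print(errors / trials)  # close to 1/8
```

Repeating with independent hash functions drives the error down exponentially, at the cost of proportionally more communication.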
3 3-party communication protocols
There are two ways to extend 2-party communication protocols to more parties. We first focus on the number-in-hand (NIH) model, where Alice gets x, Bob gets y, Charlie gets z, and they want to check whether x · y · z = 1. In the NIH setting the communication depends on the group G.
3.1 A randomized protocol for the hypercube
Let G = {0,1}^n with bitwise addition modulo 2. We want to test whether x + y + z = 0. First, we pick a linear hash function h, i.e., one satisfying h(a + b) = h(a) + h(b): for a uniformly random set S ⊆ [n], let h(x) := Σ_{i ∈ S} x_i mod 2. Then:
- Alice sends h(x),
- Bob sends h(y),
- Charlie accepts if and only if h(x) + h(y) + h(z) = 0 mod 2.
The hash function outputs 1 bit. By linearity, h(x) + h(y) + h(z) = h(x + y + z), which is 0 whenever x + y + z = 0 and a uniformly random bit otherwise. So the error probability is 1/2 and the communication is O(1). For a better error, we can repeat.
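This protocol is short enough to simulate directly (a sketch; bit strings are Python ints and the random set S is derived from shared randomness; the names are ours):

```python
import random

def parity_hash(x, S):
    """Linear hash h(x) = sum of the bits of x indexed by S, mod 2."""
    return bin(x & S).count("1") % 2

def nih_test(x, y, z, n, shared_seed):
    """Number-in-hand test of x + y + z = 0 over {0,1}^n.

    All three parties derive the same random S from the shared seed.
    By linearity h(x) + h(y) + h(z) = h(x ^ y ^ z) mod 2, so the check
    always passes when x ^ y ^ z == 0 and fails with probability 1/2
    otherwise."""
    S = random.Random(shared_seed).randrange(2 ** n)
    a, b, c = parity_hash(x, S), parity_hash(y, S), parity_hash(z, S)
    return (a + b + c) % 2 == 0  # Charlie's check

# x ^ y ^ z == 0: always accepted.
assert all(nih_test(0b1010, 0b0110, 0b1100, 4, s) for s in range(100))

# Otherwise: accepted (an error) with probability 1/2 per repetition.
err = sum(nih_test(0b1010, 0b0110, 0b1101, 4, s) for s in range(10_000))
print(err / 10_000)  # close to 0.5
```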
3.2 A randomized protocol for Z_N
Let G = Z_N where N = 2^n. Again, we want to test whether x + y + z = 0 mod N. For this group there is no 100% linear hash function, but there are almost linear hash function families h : Z_N → Z_{2^c} that satisfy the following properties:
- (1) for all x, y we have h(x + y) = h(x) + h(y) + e for some carry e ∈ {0, 1};
- (2) for all x ≠ 0 we have Pr_h[h(x) ∈ {-2, -1, 0, 1, 2}] ≤ O(2^{-c});
- (3) h(0) = 0.
Assuming some random hash function (from such a family) that satisfies the above properties, the protocol works similarly to the previous one:
- Alice sends h(x),
- Bob sends h(y),
- Charlie accepts if and only if h(x) + h(y) + h(z) mod 2^c ∈ {-2, -1, 0}.
We can set c = O(1) to achieve constant communication and constant error.
Analysis
To prove correctness of the protocol, first note that by property (1), h(x) + h(y) + h(z) = h(x + y + z) - e for some e ∈ {0, 1, 2}. Then consider the following two cases:
- if x + y + z = 0 then h(x + y + z) = h(0) = 0 by property (3), so the sum lands in {-2, -1, 0} and Charlie always accepts;
- if x + y + z ≠ 0 then by property (2), h(x + y + z) lands in {-2, -1, 0, 1, 2} with probability at most O(2^{-c}), and outside this set h(x + y + z) - e cannot land in {-2, -1, 0}; so Charlie rejects with high probability.
It now remains to show that such hash function families exist.
Let a be a random odd number modulo 2^n. Define
h_a(x) := a window of c consecutive bits of a · x mod 2^n,
where the product is integer multiplication. In other words, we output c of the bits of the integer product a · x.
We now verify that the above hash function family satisfies the three properties we required above.
Property (3) is trivially satisfied.
For property (1) we have the following. Let X = a · x mod 2^n and Y = a · y mod 2^n, and note that a · (x + y) = X + Y mod 2^n. The bottom line is how the output bits of X + Y compare with the sum of the output bits of X and the output bits of Y. In more detail, notice that if in the addition X + Y the carry into the output window is 0, then
h(x + y) = h(x) + h(y),
otherwise
h(x + y) = h(x) + h(y) + 1,
which concludes the proof of property (1).
Finally, we prove property (2). We start by writing x = s · 2^k where s is odd. Bitwise, x looks like (the bits of s followed by 0^k). The product a · x equals (a · s) · 2^k, where a · s is a uniformly random odd number; bitwise it looks like (uniform bits)(1)(0^k). We consider the following two cases for the product a · x:
- If k is large, i.e., x is divisible by a large power of 2, the output never lands in the bad set (some thought should be given to the representation of negative numbers; we ignore that for simplicity).
- Otherwise, the hash function output has uniform bits. Again for simplicity, suppose the whole output window is uniform. Then the probability of landing in the bad set, which has 5 elements, is at most 5 · 2^{-c}.
In other words, the probability of landing in any small set is small.
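The almost-linearity is easy to check numerically. The sketch below uses one concrete variant of the family (multiply by a random odd a modulo 2^n and keep the top c bits); the exact output window used in the lecture may differ, but the carry behavior is the same.

```python
import random

def h(a, x, n, c):
    """One member of the family: the top c bits of a*x mod 2^n."""
    return ((a * x) % (1 << n)) >> (n - c)

# Property (1): h(x+y) equals h(x) + h(y) or h(x) + h(y) + 1 mod 2^c,
# depending on whether the low-order bits generate a carry into the
# output window.
rng = random.Random(0)
n, c = 16, 4
defects = set()
for _ in range(2000):
    a = rng.randrange(1 << n) | 1        # random odd multiplier
    x, y = rng.randrange(1 << n), rng.randrange(1 << n)
    lhs = (h(a, x, n, c) + h(a, y, n, c)) % (1 << c)
    rhs = h(a, (x + y) % (1 << n), n, c)
    defects.add((rhs - lhs) % (1 << c))
print(defects)  # a subset of {0, 1}: the defect is a single carry bit
```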
4 Other groups
What happens in other groups? Do we have an almost linear hash function for matrix groups? The answer is negative. For SL(2, q) and S_n, the problem of testing whether a product equals the identity is hard.
We would like to rule out randomized protocols, but it is hard to reason about them directly. Instead, we are going to rule out deterministic protocols on random inputs. For concreteness our main focus will be G = SL(2, q).
First, for any group element g we define the distribution D_g on triples: D_g := (x, y, (x · y)^{-1} · g), where x and y are uniformly random elements of G. Note that the product of the elements in D_g is always g.
Towards a contradiction, suppose we have a randomized protocol that distinguishes D_1 from D_g with constant advantage.
This implies a deterministic protocol with the same gap, by fixing the randomness.
We reach a contradiction by showing that every deterministic protocol using little communication (we will quantify this later) distinguishes D_1 from D_g with advantage less than that constant.
We start with the following lemma, which describes a protocol using product sets.
Lemma 1. (The set of accepted inputs of) a deterministic c-bit protocol can be written as a disjoint union of at most 2^c "rectangles," that is, sets of the form A × B × C.
Proof (sketch). For every communication transcript t, let S_t be the set of inputs giving transcript t. The sets S_t are disjoint, since an input gives only one transcript, and their number is at most 2^c, i.e., one for each communication transcript of the protocol. The rectangle property can be proven by induction on the protocol tree.
Next, we show that these product sets cannot distinguish the two distributions D_1 and D_g, and for that we will use the pseudorandom properties of the group G.
Lemma 2. For all g ∈ G and all rectangles A × B × C we have
| Pr[D_1 ∈ A × B × C] - Pr[D_g ∈ A × B × C] | ≤ d^{-Ω(1)}.
Recall the parameter d from the previous lectures: d is the minimum dimension of an irreducible representation of G, and when the group G is SL(2, q) we have d = |G|^{Ω(1)}.
Proof. Pick any g ∈ G and let A, B, C be the sets of inputs of Alice, Bob, and Charlie respectively. Then
Pr[D_g ∈ A × B × C] = Pr[x ∈ A] · Pr[y ∈ B] · Pr[(x · y)^{-1} · g ∈ C | x ∈ A, y ∈ B].
If either A or B is small, that is, |A|/|G| ≤ τ or |B|/|G| ≤ τ, then the difference in the lemma is also small, because each of the two probabilities is at most τ. We will choose τ later.
Otherwise, A and B are large: conditioned on x ∈ A and y ∈ B, the inputs x and y are uniform over at least τ|G| elements each. Recall from Lecture 9 that this implies that the product x · y is close to the uniform distribution U in the ℓ_2 norm.
By Cauchy–Schwarz we obtain that Pr[x · y ∈ T] is close to |T|/|G| for every set T; this follows from the fact that ||x · y - U||_2 is small.
This implies that Pr[x · y ∈ C^{-1}] and Pr[x · y ∈ g · C^{-1}] are both close to |C|/|G|, because taking inverses and multiplying by g does not change the size of the set C. These two last inequalities imply that the two probabilities are close to each other, and thus the difference in the lemma is small.
To conclude, based on all the above we have that for all A, B, C, and independent of the choice of g, it is either the case that both probabilities are at most τ, or their difference is small by the mixing argument, and we now choose τ to balance these two cases and finish the proof.
The above proves that the distribution D_g behaves like the uniform distribution with respect to product sets, for all g ∈ G.
Returning to arbitrary deterministic protocols, write the set of accepted inputs as a union of disjoint rectangles by the first lemma. Applying the second lemma and summing over all rectangles, we get that the distinguishing advantage of the protocol is at most 2^c · d^{-Ω(1)}. For c ≤ ε log d, for a small enough constant ε, the advantage is less than a constant, and thus we get a contradiction with the existence of a correct protocol. We have concluded the proof of this theorem.
Theorem 3. Let G be a group, and let d be the minimum dimension of an irreducible representation of G. Consider the 3-party, number-in-hand communication problem of deciding whether x · y · z = 1 for x, y, z ∈ G. Its randomized communication complexity is Ω(log d).
For G = SL(2, q) the communication is Ω(log |G|). This is tight up to constants, because Alice can send her entire group element.
For the group S_n the known bounds on d yield communication Ω(log n). This bound is tight for the problem of distinguishing D_1 from D_g for g ≠ 1, as we show next. The identity element 1 for the group S_n is the identity permutation. If g ≠ 1 then g is a permutation that maps some element a to b ≠ a. The idea is that the parties just need to "follow" a, whose description is logarithmically smaller than that of a whole permutation. Specifically, let x, y, z be the permutations that Alice, Bob, and Charlie get. Alice sends x(a). Bob gets x(a) and sends y(x(a)) to Charlie, who checks whether z(y(x(a))) = a. The communication is O(log n). Because the size of the group is 2^{Θ(n log n)}, the communication is O(log log |G|).
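The "follow one point" idea can be sketched concretely (permutations as tuples, with the convention that the product means "apply x, then y, then z"; the helper names are ours):

```python
def follow_protocol(x, y, z, i):
    """Check whether applying x, then y, then z maps the point i to itself.

    Alice sends a = x(i), Bob sends b = y(a), and Charlie accepts iff
    z(b) == i: two messages of log(n) bits each."""
    a = x[i]          # Alice's message
    b = y[a]          # Bob's message
    return z[b] == i  # Charlie's check

n = 4
x = (1, 2, 3, 0)
y = (2, 3, 0, 1)
xy = tuple(y[x[j]] for j in range(n))
z = tuple(xy.index(j) for j in range(n))   # z undoes x-then-y: product = identity

# Product equal to the identity: every point is followed back to itself.
assert all(follow_protocol(x, y, z, i) for i in range(n))

# Product equal to g with g(0) != 0: following the point 0 exposes it.
g = (1, 0, 2, 3)
z_bad = tuple(g[z[b]] for b in range(n))   # now the product equals g
assert not follow_protocol(x, y, z_bad, 0)
print("ok")
```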
This is also a proof that d cannot be too large for S_n: by the theorem, log d = O(log n), i.e., d is at most poly(n).
5 More on 2-party protocols
We move to another setting where a clean answer can be given. Here we only have two parties. Alice gets (x_1, x_2, ..., x_n), Bob gets (y_1, y_2, ..., y_n), where the x_i and y_i are elements of a group G, and they want to know whether x_1 · y_1 · x_2 · y_2 ⋯ x_n · y_n = 1.
When G is abelian, the elements can be reordered so as to check whether (x_1 · x_2 ⋯ x_n) · (y_1 · y_2 ⋯ y_n) = 1. This requires constant communication (using randomness), as we saw in Lecture 12, since it is equivalent to the check x = y^{-1}, where x = x_1 ⋯ x_n and y = y_1 ⋯ y_n.
We will prove the next theorem for nonabelian groups.
Theorem 1. For every nonabelian group G, the communication of deciding whether x_1 · y_1 ⋯ x_n · y_n = 1 is Ω(n).
Proof. We reduce from unique disjointness, defined below. For the reduction we will need to encode the And of two bits as a group product. (This question is similar to a puzzle that asks how to hang a picture on the wall with two nails, such that if either one of the nails is removed, the picture will fall. This is like computing the And function on two bits, where both bits (nails) have to be 1 in order for the function to be 1.) Since G is nonabelian, there exist a, b ∈ G such that a · b ≠ b · a, and in particular a · b · a^{-1} · b^{-1} = h for some h ≠ 1. We can use this fact to encode And(u, v) as a^u · b^v · a^{-u} · b^{-v}, which equals h if u = v = 1 and equals 1 otherwise.
In the disjointness problem Alice and Bob get inputs x, y ∈ {0,1}^n respectively, and they wish to check whether there exists an i such that x_i = y_i = 1. If you think of x and y as characteristic vectors of sets, this problem asks whether the sets have a common element or not. The communication of this problem is Ω(n). Moreover, in the variant of this problem where the number of such i's is 0 or 1 (i.e., unique), the same lower bound still applies. This is like giving Alice and Bob two sets that either are disjoint or intersect in exactly one element, and they need to distinguish these two cases.
Next, we reduce the above variant of set disjointness to group products. For x, y ∈ {0,1}^n we produce inputs for the group problem as follows: Alice maps her bits to the elements a^{x_1}, a^{-x_1}, ..., a^{x_n}, a^{-x_n}, and Bob maps his bits to b^{y_1}, b^{-y_1}, ..., b^{y_n}, b^{-y_n}, interleaved so that the group product below is formed.
Now, the product we originally wanted to compute becomes
∏_{i ≤ n} a^{x_i} · b^{y_i} · a^{-x_i} · b^{-y_i}.
If there isn't an i such that x_i = y_i = 1, then each factor is 1, and thus the whole product is 1.
Otherwise, there exists a unique i such that x_i = y_i = 1, and thus the product will be 1 ⋯ 1 · h · 1 ⋯ 1 = h ≠ 1, with h in the i-th position. If Alice and Bob can test whether the above product is equal to 1, they can also solve the unique set disjointness problem, and thus the lower bound applies to the former.
We required the uniqueness property because otherwise the product could be h^k for some k > 1, which could be equal to 1 in some groups.
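The commutator gadget and the reduction can be spelled out in the smallest nonabelian group, S_3 (a sketch; permutations are tuples and all helper names are ours):

```python
def mul(p, q):
    """Compose permutations: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

def inv(p):
    r = [0] * len(p)
    for i, v in enumerate(p):
        r[v] = i
    return tuple(r)

identity = (0, 1, 2)
a = (1, 0, 2)   # the transposition (0 1)
b = (0, 2, 1)   # the transposition (1 2); a and b do not commute

def and_gadget(u, v):
    """a^u b^v a^-u b^-v: a non-identity commutator iff u = v = 1."""
    au = a if u else identity
    bv = b if v else identity
    return mul(mul(au, bv), mul(inv(au), inv(bv)))

# Only u = v = 1 yields a non-identity element.
assert [and_gadget(u, v) != identity for u, v in
        [(0, 0), (0, 1), (1, 0), (1, 1)]] == [False, False, False, True]

def disjointness_product(xs, ys):
    """Product of the gadgets: the identity iff the sets are disjoint
    (assuming at most one common index, as in unique disjointness)."""
    p = identity
    for u, v in zip(xs, ys):
        p = mul(p, and_gadget(u, v))
    return p

assert disjointness_product([1, 0, 1], [0, 1, 0]) == identity   # disjoint
assert disjointness_product([1, 0, 1], [0, 0, 1]) != identity   # unique hit
```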
Why I vote for women
Perhaps in my previous post I should have explained more precisely why I think many things would be better if women were in control. I tried to summarize many statistics, papers, and books that I read through the years, but some may have found my language too sweeping. Let me try to be a bit more precise now.
Finally, it’s also a fact that women live longer, see e.g. this. The advantage is quite noticeable: about 5 years. I won’t give a statistic for what the consequences of this are, instead I’ll conduct the following mental experiment. Suppose population X has expected lifespan 1000 years, while population Y has expected lifespan 100 years. I think population X would be more interested in renewable energies, sustainable practices, et cetera.
I vote for women
UPDATE: I voted! I tried and gathered all last-minute available information. I found this website rather useful. In the end I defaulted twice. The comments also pushed me to look for some relevant statistics. I may do another post about that later. In the meanwhile look e.g. at the first chart here (approval in mid-August of first year). Look at the last five presidents. What do you think of sex as a proxy?
Tomorrow is election day in my city, Newton MA. I am happy to participate in this election because I feel that at the local level my votes can make a difference. This is also because I am asked to cast an unusually large number of votes, something like 10 or maybe 20. This number is so high that for the majority of the candidates I will have absolutely zero information, except their names on the ballot. In such cases I have decided to vote for women.
I think many things would be a lot better if women were in control. Because women tend to have a greater appreciation for life, health, and the environment. Their actions are also generally less myopic.
This default vote is overturned by the slightest relevant information I have. In one such case, I won’t vote for a candidate who advocates a more vibrant Newton, and growth. I will vote for the opponent who instead advocates protecting the city. I want Newton to stay the way it is: dead.
Unfortunately, there isn’t much discussion of the issues I really care about, like heavily protected bike lanes.
Special Topics in Complexity Theory, Lectures 8–9
Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola
1 Lectures 8–9, Scribe: Xuangui Huang
In these lectures, we finish the proof of the approximate degree lower bound for the AND-OR function, then we move to the surjectivity function SURJ. Finally, we discuss quasirandom groups.
1.1 Lower Bound for AND-OR
Recall from the last lecture that AND-OR is the composition of the AND function on n bits and the OR function on m bits. We also proved the following lemma.
Lemma 1. Suppose that distributions A_0, A_1 over {0,1}^n are k-wise indistinguishable, and distributions B_0, B_1 over {0,1}^m are ℓ-wise indistinguishable. Define C_0, C_1 over {0,1}^{n·m} as follows:
C_b: draw a sample x from A_b, and replace each bit x_i by a sample of B_{x_i} (independently).
Then C_0 and C_1 are kℓ-wise indistinguishable.
To finish the proof of the lower bound on the approximate degree of the AND-OR function, it remains to see that AND-OR can distinguish the distributions C_0 and C_1 well. For this, we begin by observing that we can assume without loss of generality that the distributions B_0, B_1 have disjoint supports.
Claim 2. For any function f, and for any k-wise indistinguishable distributions B_0 and B_1, if f can distinguish B_0 and B_1 with advantage ε, then there are distributions B'_0 and B'_1 with the same properties (k-wise indistinguishability yet distinguishability by f) and also with disjoint supports. (By disjoint supports we mean that for any x, either Pr[B'_0 = x] = 0 or Pr[B'_1 = x] = 0.)
Proof. Let distribution D be the "common part" of B_0 and B_1. That is to say, we define D such that D(x) := min{B_0(x), B_1(x)}, multiplied by some constant that normalizes D into a distribution.
Then we can write B_0 and B_1 as
B_0 = α · D + (1 - α) · B'_0,  B_1 = α · D + (1 - α) · B'_1,
where α ∈ [0, 1], and B'_0, B'_1 are two distributions. Clearly B'_0 and B'_1 have disjoint supports.
Then we have
E[f(B_0)] - E[f(B_1)] = (1 - α) · (E[f(B'_0)] - E[f(B'_1)]).
Therefore, if f can distinguish B_0 and B_1 with advantage ε, then it can also distinguish B'_0 and B'_1 with at least that advantage.
Similarly, for every test T that depends on at most k bits, we have
E[T(B_0)] - E[T(B_1)] = (1 - α) · (E[T(B'_0)] - E[T(B'_1)]) = 0.
Hence, B'_0 and B'_1 are k-wise indistinguishable.
Equipped with the above lemma and claim, we can finally prove the following lower bound on the approximate degree of AND-OR.
Theorem 3. The approximate degree of AND-OR is Ω(√(n · m)).
Proof. Let A_0, A_1 be Ω(√n)-wise indistinguishable distributions over {0,1}^n that AND distinguishes with constant advantage, and let B_0, B_1 be Ω(√m)-wise indistinguishable distributions over {0,1}^m that OR distinguishes with constant advantage; these exist by duality and the lower bounds for AND and OR. By the above claim, we can assume that B_0, B_1 have disjoint supports, and the same for A_0, A_1. Compose them by the lemma, getting Ω(√(nm))-wise indistinguishable distributions C_0, C_1. We now show that AND-OR can distinguish C_0 and C_1:
- C_0: First sample x from A_0. As there exists a unique string on which AND is 1, namely the all-ones string, by disjointness of supports we may assume it is not in the support of A_0. Therefore, when sampling from A_0 we always get a string with at least one "0". But then each "0" is replaced with a sample from B_0, and with constant probability the OR of such a block is 0; when the OR of a block is 0, AND-OR evaluates to 0.
- C_1: First sample x from A_1, and we know that x is the all-ones string with probability at least a constant. Each bit "1" is replaced by a sample from B_1, and we know that B_1 never samples the all-zeros block by disjointness of supports. Then every block has OR equal to 1 and AND-OR evaluates to 1.
Therefore AND-OR distinguishes C_0 from C_1 with constant advantage, which by duality implies the claimed lower bound.
1.2 Lower Bound for SURJ
In this subsection we discuss the approximate degree of the surjectivity function. This function is defined as follows.
Definition 4. The surjectivity function SURJ takes input x = (x_1, ..., x_n), where x_i ∈ [r] for all i, and has value 1 if and only if every element of [r] appears among the x_i, i.e., ∪_i {x_i} = [r].
First, some history. Aaronson first proved a polynomial lower bound on the approximate degree of SURJ and of other functions on n bits including "the collision problem." This was motivated by an application in quantum computing. Before this result, even much weaker lower bounds had not been known. Later Shi improved the lower bound, see [AS04]. The instructor believes that the quantum framework may have blocked some people from studying this problem, though it may have very well attracted others. Recently Bun and Thaler [BT17] reproved the lower bound, but in a quantum-free paper, and introducing some different intuition. Soon after, together with Kothari, they proved [BKT17] that the approximate degree of SURJ is Θ̃(n^{3/4}).
We shall now prove the lower bound, though one piece is only sketched. Again we present some things in a different way from the papers.
For the proof, we consider the AND-OR function under the promise that the Hamming weight of the input bits is at most n. Call the approximate degree of AND-OR under this promise its promised approximate degree. Then we can prove the following theorems.
Theorem 5. The approximate degree of SURJ is at least the promised approximate degree of the composition of the AND function on r bits with the OR function on n bits.
Theorem 6. This promised approximate degree is Ω(n^{3/4}) for a suitable choice of the parameters.
In our setting, we consider r = Θ(n). Theorem 5 shows, surprisingly, that we can somehow "shrink" the input while maintaining the approximate degree of the function, under some promise. Without this promise, we just showed in the last subsection that the approximate degree of AND-OR is Ω(n) instead of the Ω(n^{3/4}) of Theorem 6.
Proof of Theorem 5. Define an n × r matrix of 0/1 variables y_{j,i} such that y_{j,i} = 1 iff x_j = i; in this notation SURJ is the AND over the r columns of the OR of the n entries in each column. We can prove this theorem in the following steps:
- the approximate degree of SURJ is at least that of AND-OR in the variables y, under the promise that each row has weight exactly 1;
- letting n_i be the sum of the i-th column, the approximate degree of AND-OR under the promise that each row has weight 1 is at least the approximate degree of AND-OR under the promise that Σ_i n_i = n;
- the approximate degree of AND-OR under the promise that Σ_i n_i = n is at least the promised approximate degree of AND-OR defined above;
- we can change "= n" into "≤ n".
Now we prove this theorem step by step.
- Let p be a polynomial for SURJ, in the variables x. Under the promise, each variable (indeed, each bit of x_j) is a linear function of the y_{j,i}. Then the polynomial for AND-OR is the polynomial p with the x's replaced as above, and thus the degree does not increase. Correctness follows by the promise.
 This is the most extraordinary step, due to Ambainis [Amb05]. In this notation, ANDOR becomes the indicator function of . Define
Clearly it is a good approximation of ANDOR. It remains to show that it’s a polynomial of degree in ’s if is a polynomial of degree in ’s.
Let’s look at one monomial of degree in : . Observe that all ’s are distinct by the promise, and by over . By chain rule we have
By symmetry we have , which is linear in ’s. To get , we know that every other entry in row is , so we give away row , average over ’s such that under the promise and consistent with ’s. Therefore
In general we have
which has degree in ’s. Therefore the degree of is not larger than that of .
 Note that , . Hence by replacing ’s by ’s, the degree won’t increase.
 We can add a “slack” variable , or equivalently ; then the condition actually means .
Proof idea for Theorem 6. First, by the duality argument one can verify that the promised approximate degree is at least d if and only if there exist d-wise indistinguishable distributions such that:
- AND-OR can distinguish them;
- both are supported on strings of Hamming weight at most n.
Claim 7. A corresponding statement holds for OR, with indistinguishability Ω(√n).
The proof needs a little more information about the weight distribution of the indistinguishable distributions corresponding to this claim. Basically, their expected weight is very small.
Now we combine these distributions with the usual ones for AND using the lemma mentioned at the beginning.
What remains to show is that the final distribution is supported on Hamming weight at most n. Because by construction the copies of the distributions for OR are sampled independently, we can use concentration of measure to prove a tail bound. This gives that all but an exponentially small measure of the distribution is supported on strings of weight at most n. The final step of the proof consists of slightly tweaking the distributions to make that measure zero.
1.3 Groups
Groups have many applications in theoretical computer science. Barrington [Bar89] used the permutation group S_5 to prove a very surprising result: the majority function can be computed efficiently using only a constant number of bits of memory (something which was conjectured to be false). More recently, catalytic computation [BCK14] shows that if we have a lot of memory that is full of junk which cannot be erased, we can still compute more than if we had little memory. We will see some interesting properties of groups in what follows.
Some famous groups used in computer science are:
- {0,1}^n with bitwise addition;
- Z_N with addition mod N;
- S_n, the group of permutations of n elements;
- the wreath product Z_2 ≀ S_n, whose elements are of the form (b, σ) where b ∈ {0,1}^n is a vector of "flip bits" and σ ∈ S_n, with the multiplication rule (b, σ) · (b', σ') = (b + σ(b'), σ ∘ σ');
- SL(2, q): the 2 × 2 matrices over the field F_q with determinant 1, in other words matrices ((a, b), (c, d)) such that a · d - b · c = 1.
The group SL(2, q) was invented by Galois. (If you haven't, read his biography on Wikipedia.)
Quiz. Among these groups, which is the "least abelian"? The latter can be defined in several ways. We focus on this: if we have two high-entropy distributions X and Y over G, does X · Y have more entropy? For example, if X and Y are uniform over some subset of the elements, is X · Y close to uniform over G? By "close to" we mean that the statistical distance from the uniform distribution is less than a small constant. For Z_N with N even, if X = Y is uniform over the subgroup of even residues, then X · Y has the same distribution as X, so there is no entropy increase even though X and Y are uniform on half the elements.
Definition 8 (measure of entropy). For a distribution X over G, let ||X||_2 := (Σ_{g ∈ G} Pr[X = g]^2)^{1/2}; we think of ||X||_2^2 ≤ O(1)/|G| as "high entropy."
Note that ||X||_2^2 is exactly the "collision probability," i.e., Pr[X = X'] for X' an independent copy of X. For the uniform distribution this is as small as possible, i.e., 1/|G|. Then we have
Theorem 9 ([Gow08], [BNP08]). If X, Y are independent distributions over G, then
||X · Y - U||_2 ≤ ||X||_2 · ||Y||_2 · √(|G| / d),   (1)
where d is the minimum dimension of an irreducible representation of G and U is the uniform distribution over G.
By this theorem, for high-entropy distributions X and Y we get ||X · Y - U||_2 ≤ O(1)/√(|G| · d), and thus the statistical distance of X · Y from uniform is at most √|G| · ||X · Y - U||_2 ≤ O(1)/√d.
If d is large, then X · Y is very close to uniform. The following table shows the d's for the groups we have introduced:
- {0,1}^n, Z_N, S_n: d = 1;
- A_n: d = Ω(n);
- SL(2, q): d = Ω(q) = |G|^{Ω(1)}.
Here A_n is the alternating group of even permutations. We can see that for the first three groups, Equation (1) doesn't give nontrivial bounds.
But for A_n we get a nontrivial bound, and for SL(2, q) we get a strong bound: there we have d = |G|^{Ω(1)}.
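A small experiment illustrates the quiz. The sketch below (all code ours) contrasts the abelian worst case, uniform on a subgroup of Z_60, against a typical high-entropy pair in the nonabelian group A_5 (also of order 60, with d = 3); it is illustrative, not a worst-case analysis for A_5.

```python
import random
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(p)))

def is_even(p):
    invs = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return invs % 2 == 0

def collision_prob_of_product(support, mul):
    """Collision probability of X * Y, with X and Y independent and
    uniform over `support` (here, half of the group's elements)."""
    counts = {}
    for x in support:
        for y in support:
            g = mul(x, y)
            counts[g] = counts.get(g, 0) + 1
    total = len(support) ** 2
    return sum(c * c for c in counts.values()) / total ** 2

# Abelian: Z_60 with X uniform on the even residues (a subgroup).
# The product stays on the subgroup: collision probability stuck at 1/30.
cp_ab = collision_prob_of_product(list(range(0, 60, 2)), lambda x, y: (x + y) % 60)

# Nonabelian: A_5 with X uniform over a random half of the group.
a5 = [p for p in permutations(range(5)) if is_even(p)]
half = random.Random(0).sample(a5, 30)
cp_na = collision_prob_of_product(half, compose)

print(round(cp_ab, 4), round(cp_na, 4))  # ~0.0333 vs close to 1/60 ~ 0.0167
```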
References
[Amb05] Andris Ambainis. Polynomial degree and lower bounds in quantum complexity: Collision and element distinctness with small range. Theory of Computing, 1(1):37–46, 2005.
[AS04] Scott Aaronson and Yaoyun Shi. Quantum lower bounds for the collision and the element distinctness problems. J. of the ACM, 51(4):595–605, 2004.
[Bar89] David A. Mix Barrington. Boundedwidth polynomialsize branching programs recognize exactly those languages in NC. J. of Computer and System Sciences, 38(1):150–164, 1989.
[BCK14] Harry Buhrman, Richard Cleve, Michal Koucký, Bruno Loff, and Florian Speelman. Computing with a full memory: catalytic space. In ACM Symp. on the Theory of Computing (STOC), pages 857–866, 2014.
[BKT17] Mark Bun, Robin Kothari, and Justin Thaler. The polynomial method strikes back: Tight quantum query bounds via dual polynomials. CoRR, abs/1710.09079, 2017.
[BNP08] László Babai, Nikolay Nikolov, and László Pyber. Product growth and mixing in finite groups. In ACMSIAM Symp. on Discrete Algorithms (SODA), pages 248–257, 2008.
[BT17] Mark Bun and Justin Thaler. A nearly optimal lower bound on the approximate degree of AC0. CoRR, abs/1703.05784, 2017.
[Gow08] W. T. Gowers. Quasirandom groups. Combinatorics, Probability & Computing, 17(3):363–387, 2008.
The emission protection agency
Libby, Montana, is the town that health-conscious dwellers have learned to fear. The vermiculite extracted from a mine next to the town was packaged as Zonolite insulation and ended up in millions of homes, possibly including yours. Unfortunately, that vermiculite was contaminated with asbestos, a mineral once considered a miracle and now known to cause cancer. The town's population has been gruesomely decimated by the asbestos contamination, and new cases keep coming. That mine produced the vast majority of the vermiculite insulation in the world.
Yesterday it was announced that the so-called Environmental Protection Agency (EPA) won't after all review much of the hazardous material in use, including asbestos. What this means exactly is of course murky, but the bottom line is crystal clear: the EPA will do less to protect your health.
Special Topics in Complexity Theory, Lectures 6–7
Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola
1 Lectures 6–7, Scribe: Willy Quach
In these lectures, we introduce k-wise indistinguishability and link this notion to the approximate degree of a function. Then, we study the approximate degree of some functions, namely the AND function and the AND-OR function. For the latter function we begin to see a proof that is different (either in substance or language) from the proofs in the literature. We begin with some LaTeX tips.
1.1 Some LaTeX tips.
- Mind the punctuation. Treat equations as parts of the phrases; add commas, colons, etc. accordingly.
- In math mode, it is usually better to use \ell (ℓ) rather than regular l. The latter can be confused with 1.
- Align equations with \begin{align} ... \end{align} and the alignment character &.
- For set inclusion, use \subset (⊂) only if you mean proper inclusion (which is uncommon). Otherwise use \subseteq (⊆). (Not everybody agrees with this, but it seems the most natural convention, by analogy with < and ≤.)
1.2 Introducing k-wise indistinguishability.
We studied previously the following questions:
- What is the minimum k such that any k-wise independent distribution over {0,1}^n fools AC^0 (i.e., looks the same to all polynomial-size circuits of constant depth)?
We saw that k = polylog(n) is enough.
- What is the minimum k such that k-wise independence fools the AND function?
Taking k = O(log(1/ε)) suffices for error ε (more precisely, we saw that k-wise independence fools the AND function with error 2^{-Ω(k)}).
Consider now two distributions P and Q over {0,1}^n that are k-wise indistinguishable, that is, any projection on k bits of P and Q has the same distribution. We can ask similar questions:
- What is the minimum k such that AC^0 cannot distinguish P and Q?
It turns out this requires k ≥ n^{1-o(1)}: there are some n^{1-o(1)}-wise indistinguishable distributions that AC^0 can distinguish. (Whether k = Ω(n) is necessary or not is an open question.)
Also, k = n · (1 - o(1)) suffices to fool AC^0 (in which case the distinguishing advantage is essentially exponentially small).
- What is the minimum k such that the AND function (on n bits) cannot distinguish P and Q?
It turns out that k = Θ(√n) is necessary and sufficient. More precisely:
- There exist P, Q over {0,1}^n that are c√n-wise indistinguishable, for some small constant c, but such that AND distinguishes them with constant advantage.
- For all P, Q that are C√n-wise indistinguishable, for some bigger constant C, the AND function distinguishes them only with small advantage.
1.3 Duality.
Those questions are actually equivalent to questions about approximation by real-valued polynomials:
Theorem 1. Let f : {0,1}^n → {0,1} be a function, and k an integer. Then:
the maximum of |E[f(P)] - E[f(Q)]| over k-wise indistinguishable distributions P, Q equals 2 · min_p max_x |f(x) - p(x)|, where p ranges over real polynomials of degree at most k.
We will denote the minimum on the right-hand side, the error of the best degree-k approximation of f, by ε_k(f).
Some examples:
- f = 1 (a constant function): then E[f(P)] = E[f(Q)] for all distributions, so both sides of the equality are 0.
- f = the parity function on n bits.
Then for k = n - 1, the left-hand side is at least 1/2: take P to be uniform, and Q to be uniform on the first n - 1 bits, defining the n-th bit to be the parity of the first n - 1 bits. Then P and Q are (n - 1)-wise indistinguishable, but E[f(P)] = 1/2 while f(Q) is constant.
Claim 2. Any polynomial that approximates Parity pointwise with error less than 1/2 has degree n.
Proof. Suppose by contradiction that some polynomial p has degree at most n - 1 and approximates Parity pointwise with error less than 1/2.
The key ingredient is to symmetrize a polynomial p, by letting
p_sym(x) := E_π[p(π(x))],
where π ranges over the permutations of the n coordinates. Note that p_sym(x) only depends on the Hamming weight |x| of x.
Now we claim that there is a univariate polynomial q, also of degree at most n - 1, such that
q(|x|) = p_sym(x)
for every x.
To illustrate, let us symmetrize a monomial of p. For instance if p = x_1, then p_sym = (Σ_i x_i)/n = |x|/n, where |x| is the Hamming weight of the input. (For this we think of the input as being in {0,1}^n. Similar calculations can be done for inputs in {-1,1}^n.)
If p = x_1 · x_2, then p_sym = |x| · (|x| - 1)/(n · (n - 1)), which is quadratic in |x|.
And so on.
More generally, p_sym is a symmetric polynomial. As the elementary symmetric polynomials form a basis for the symmetric polynomials, p_sym of degree at most n - 1 can be written as a linear combination of e_0, e_1, ..., e_{n-1}. Now note that e_i(x) only depends on |x|: substituting gives e_i(x) = C(|x|, i), which is a polynomial of degree i in |x|.
(Note that the degree of q can be strictly less than the degree of p: e.g., for p = x_1 - x_2 we have p_sym = 0.)
Then, applying symmetrization to p: if p is a real polynomial close to Parity (in ℓ_∞ norm), then q is also close to Parity viewed as a function of the Hamming weight (as a convex combination of close values).
Finally, remark that for every integer w ∈ {0, 1, ..., n}, q(w) must be greater than 1/2 when w is odd and less than 1/2 when w is even. In particular, q - 1/2 changes sign at least n times, so it must have at least n zeroes, and must therefore be identically zero since its degree is at most n - 1, which is a contradiction.
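Symmetrization is concrete enough to verify by brute force. The sketch below (names ours) checks that the symmetrization of the monomial x_1 · x_2 depends only on the Hamming weight and matches the quadratic C(w, 2)/C(n, 2) computed above.

```python
from itertools import permutations, product
from math import comb

def symmetrize(poly, x):
    """p_sym(x): average of poly over all permutations of the coordinates."""
    perms = list(permutations(range(len(x))))
    return sum(poly(tuple(x[j] for j in perm)) for perm in perms) / len(perms)

monomial = lambda x: x[0] * x[1]  # the monomial x_1 * x_2

n = 4
for x in product((0, 1), repeat=n):
    w = sum(x)
    # The symmetrized monomial is the degree-2 univariate polynomial
    # q(w) = C(w, 2) / C(n, 2) evaluated at the Hamming weight w.
    assert abs(symmetrize(monomial, x) - comb(w, 2) / comb(n, 2)) < 1e-9
print("p_sym depends only on the Hamming weight")
```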
We will now focus on proving the theorem.
Note that one direction is easy: if a function f is approximated pointwise within ε by a polynomial p of degree k, then f cannot distinguish two k-wise indistinguishable distributions P and Q with advantage more than 2ε:
|E[f(P)] - E[f(Q)]| ≤ |E[f(P)] - E[p(P)]| + |E[p(P)] - E[p(Q)]| + |E[p(Q)] - E[f(Q)]| ≤ ε + 0 + ε,
where E[p(P)] = E[p(Q)] comes from the fact that P and Q are k-wise indistinguishable and every monomial of p has degree at most k.
The general proof goes by Linear Programming duality (a.k.a. the finite-dimensional Hahn–Banach theorem, min-max, etc.). This states that:
if A ∈ R^{m×n}, b ∈ R^m, and c ∈ R^n, then
max { c^T x : A x ≤ b, x ≥ 0 } = min { b^T y : A^T y ≥ c, y ≥ 0 },
provided both programs are feasible.
We can now prove the theorem:
Proof.
The proof consists in rewriting both sides of the equality in the theorem as the optimal values of a pair of dual linear programs. Let us focus on the left side of the equality, the maximum distinguishing advantage.
We introduce variables P_x and Q_x for every x ∈ {0,1}^n, representing the probabilities Pr[P = x] and Pr[Q = x].
We will also use the following claim, which can be proved similarly to the Vazirani XOR lemma:
Claim 3. Two distributions P and Q are k-wise indistinguishable if and only if E[χ_S(P)] = E[χ_S(Q)] for every set S with 0 < |S| ≤ k, where {χ_S} is the Fourier basis of boolean functions.
The quantity can then be rewritten:
Following the syntax of LP Duality stated above, we have:
, (where goes over ),
,
,
,
where the rows of except the first two correspond to some such that .
We apply LP duality. We shall denote the new set of variables by
.
We have the following program:
Writing , the objective becomes to minimize , while the second set of constraints can be rewritten:
The expression in the constraints is an arbitrary polynomial of degree at most k, which we denote by p. So our constraints say that p is pointwise below f while p plus the objective value is pointwise above f,
where p ranges over all degree-k polynomials, and we are trying to minimize 2δ. Because p is always below f, but adding 2δ takes it above f, the shifted polynomial p + δ is always within δ of f, matching the right-hand side of the theorem.
1.4 Approximate degree of AND.
Let us now study the AND function on n bits. Let d_ε(AND) denote the minimal degree of a polynomial approximating AND pointwise with error ε.
We will show that d_{1/3}(AND) = Θ(√n).
Let us first show the upper bound:
Claim 4. We have: d_{1/3}(AND) = O(√n).
To prove this claim, we will consider a special family of polynomials:
Definition 5 (Chebyshev polynomials of the first kind).
The Chebyshev polynomials (of the first kind) are a family of polynomials {T_k} defined inductively as:
- T_0(x) := 1,
- T_1(x) := x,
- T_{k+1}(x) := 2x · T_k(x) - T_{k-1}(x).
These polynomials satisfy some useful properties:
1. T_k(cos θ) = cos(k · θ);
2. |T_k(x)| ≤ 1 for all x ∈ [-1, 1];
3. T_k'(x) ≥ k^2 for all x ≥ 1;
4. T_k(1) = 1.
Property 2 follows from property 1, and property 3 follows by a direct induction. For a nice picture of these polynomials you should have come to class (or I guess you can check Wikipedia). We can now prove our upper bound:
Proof of Claim 4. We construct a univariate polynomial q of degree O(√n) such that:
1. q(i) ∈ [0, 1/3] for every integer i ∈ {0, 1, ..., n - 1};
2. q(n) = 1.
In other words, q will be close to 0 on {0, ..., n - 1} and equal to 1 at n. Then we can naturally define the polynomial for the AND function on n bits to be P(x) := q(Σ_i x_i), which has the same degree. Indeed, we want P to be close to 0 if x has Hamming weight less than n, while being close to 1 on the input of Hamming weight n (by the definition of AND). This will conclude the proof.
Let us define q using a rescaled Chebyshev polynomial:
q(t) := (T_k(t/(n - 1)) + 1) / (T_k(n/(n - 1)) + 1), with k := ⌈√(5(n - 1))⌉ = O(√n).
Intuitively, this uses the fact that Chebyshev polynomials are bounded in [-1, 1] (Property 2) and then increase very fast (Property 3).
More precisely, we have:
- deg q = k = O(√n) by construction;
- for t ∈ {0, ..., n - 1} we have |T_k(t/(n - 1))| ≤ 1 by Property 2, so q(t) ∈ [0, 2/(T_k(n/(n - 1)) + 1)];
- by Properties 3 and 4 we have T_k(n/(n - 1)) = T_k(1 + 1/(n - 1)) ≥ 1 + k^2/(n - 1) ≥ 6, and therefore q(t) ∈ [0, 2/7] ⊆ [0, 1/3] for t ∈ {0, ..., n - 1}, while q(n) = 1.
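The Chebyshev facts used above are easy to check numerically with the recurrence (a sketch; names ours):

```python
def chebyshev(k, x):
    """Evaluate T_k(x) via T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_cur = 1.0, x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur

# Property 2: bounded by 1 on [-1, 1].
xs = [i / 100 for i in range(-100, 101)]
assert all(abs(chebyshev(8, x)) <= 1 + 1e-9 for x in xs)

# Property 4: T_k(1) = 1.
assert abs(chebyshev(8, 1.0) - 1.0) < 1e-9

# Properties 3 and 4: just outside [-1, 1] the polynomial grows fast,
# T_k(1 + eps) >= 1 + k^2 * eps.
k, eps = 16, 1.0 / 16
assert chebyshev(k, 1 + eps) >= 1 + k * k * eps  # here: >= 17
print(chebyshev(k, 1 + eps))
```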
Let us now prove the corresponding lower bound:
Claim 6. We have: d_{1/3}(AND) = Ω(√n).
Proof. Let p be a polynomial of degree d that approximates the AND function with error 1/3, and consider its univariate symmetrization q, of degree at most d.
We have the following result from approximation theory (Markov's inequality for polynomials):
Theorem 7. Let q be a real univariate polynomial such that:
1. |q(t)| ≤ c_1 for every t ∈ [0, n];
2. q'(t) ≥ c_2 for some t ∈ [0, n].
Then deg(q) ≥ √(n · c_2 / (2 · c_1)).
To prove our claim, it is therefore sufficient to check that q satisfies conditions 1 and 2 with constant c_1 and c_2, as we saw that deg(p) ≥ deg(q):
1. We have |q(t)| ≤ 1 + 1/3 by the assumption on p;
2. We have q(n - 1) ≤ 1/3 and q(n) ≥ 2/3 (by assumption), so the mean value theorem gives some t ∈ [n - 1, n] such that q'(t) ≥ 1/3.
This concludes the proof.
1.5 Approximate degree of AND-OR.
Consider the AND function on n bits and the OR function on m bits. Let AND-OR : {0,1}^{n·m} → {0,1} be their composition (which outputs the AND of the outputs of the OR function applied to n disjoint blocks of m bits each).
It is known that d_{1/3}(AND-OR) = Θ(√(n · m)). To prove the upper bound, we would need a technique to compose approximating polynomials, which we will discuss later.
Now we focus on the lower bound. This lower bound was recently proved independently by Sherstov and by Bun and Thaler. We present a proof that is different (either in substance or in language) and which we find more intuitive. Our proof replaces the “dual block method” with the following lemma.
Lemma 8. Suppose that
distributions A_0, A_1 over {0,1}^n are k-wise indistinguishable; and
distributions B_0, B_1 over {0,1}^m are ℓ-wise indistinguishable.
Define C_0, C_1 over {0,1}^{n·m} as follows: C_b: draw a sample x from A_b, and replace each bit x_i by a sample of B_{x_i} (independently).
Then C_0 and C_1 are kℓ-wise indistinguishable.
Proof. Consider any set of bit positions; let us show that they have the same distribution in and .
View the as blocks of bits. Call a block of bits heavy if ; call the other blocks light.
There are at most heavy blocks by assumption, so that the distribution of the (entire) heavy blocks are the same in and by wise indistinguishability of and .
Furthermore, conditioned on any outcome for the samples in , the light blocks have the same distribution in both and by wise indistinguishability of and .
Therefore and are wise indistinguishable.
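As a sanity check of the lemma, here is an exact computation on tiny hand-picked distributions (these parameters are illustrative assumptions, not the distributions used in the lower bound): A and B below are 2-wise indistinguishable over 3 bits, C0 and C1 are 1-wise indistinguishable over 2 bits, and the composed 6-bit distributions then agree on every pair of coordinates.

```python
from itertools import combinations, product
from fractions import Fraction

# A: uniform over even-parity strings; B: uniform over odd-parity strings.
# Any 2 bits are uniform under both, so A and B are 2-wise indistinguishable.
A = [s for s in product([0, 1], repeat=3) if sum(s) % 2 == 0]
B = [s for s in product([0, 1], repeat=3) if sum(s) % 2 == 1]
# C0 and C1 each have uniform single-bit marginals: 1-wise indistinguishable.
C = {0: [(0, 0), (1, 1)], 1: [(0, 1), (1, 0)]}

def composed(top):
    """Exact distribution: replace each bit b of a top sample by a C[b] sample."""
    dist = {}
    for s in top:
        for blocks in product(*(C[b] for b in s)):
            out = sum(blocks, ())          # flatten the blocks to a 6-bit string
            w = Fraction(1, len(top) * 2 ** len(s))
            dist[out] = dist.get(out, Fraction(0)) + w
    return dist

def marginal(dist, positions):
    m = {}
    for s, p in dist.items():
        key = tuple(s[i] for i in positions)
        m[key] = m.get(key, Fraction(0)) + p
    return m

DA, DB = composed(A), composed(B)
ok = all(marginal(DA, pos) == marginal(DB, pos)
         for pos in combinations(range(6), 2))
print(ok)
```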
Special Topics in Complexity Theory, Lectures 45
Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola
1 Lectures 45, Scribe: Matthew Dippel
These lectures cover some basics of small-bias distributions, and then a more recent pseudorandom generator for read-once CNF [GMR12].
2 Small bias distributions
Definition 1.[Small bias distributions] A distribution over has bias if no parity function can distinguish it from uniformly random strings with probability greater than . More formally, we have:
In this definition, the is simply the probability of a parity test being or over the uniform distribution. Note also that it does not matter whether the definition uses the probability of the parity test being or : if a test has probability of being , then it has probability of being , so the bias is the same under either choice.
This can be viewed as a distribution which fools tests that are restricted to computing parity functions on a subset of bits.
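To make the definition concrete, here is a minimal Python sketch that computes the bias of a distribution by brute force (the 3-bit examples are illustrative choices): the uniform distribution has bias 0, while the uniform distribution over even-parity strings has maximal bias, even though every proper subset of its bits looks uniform.

```python
from itertools import product

def bias(support, n):
    """Max over nonempty S of |Pr[parity_S = 1] - 1/2|, for uniform over support."""
    worst = 0.0
    for S in product([0, 1], repeat=n):
        if not any(S):
            continue
        p1 = sum(1 for x in support
                 if sum(a & b for a, b in zip(S, x)) % 2 == 1) / len(support)
        worst = max(worst, abs(p1 - 0.5))
    return worst

full = list(product([0, 1], repeat=3))      # the uniform distribution: bias 0
even = [x for x in full if sum(x) % 2 == 0]  # even-parity strings: the full
print(bias(full, 3), bias(even, 3))          # parity is constant, so bias 1/2
```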
Before we answer the important question of how to construct and efficiently sample from such a distribution, we will provide one interesting application of small bias sets to expander graphs.
Theorem 2.[Expander construction from a small bias set] Let be a distribution over with bias . Define as the following graph:
Then, when we take the eigenvalues of the random walk matrix of in descending order , we have that:
Thus, small-bias sets yield expander graphs. Small-bias sets also turn out to be equivalent to constructing good linear codes. Although all these questions had been studied long before the definition of small-bias sets [NN90], the computational perspective has been quite useful, even in answering old questions. For example, Ta-Shma used this perspective to construct better codes [Ta17].
3 Constructions of small bias distributions
Just like our construction of bounded-independence distributions from the previous lecture, we will construct small-bias distributions using polynomials over finite fields.
Theorem 1.[Small bias construction] Let be a finite field of size , with elements represented as bit strings of length . We define the generator as the following:
In this notation, a subscript of indicates taking the th bit of the representation. Then the output of over uniform and has bias .
Proof. Consider some parity test induced by a subset . Then when applied to the output of , it simplifies as:
Note that is the evaluation of the polynomial at the point . We note that if , then the value of is equally likely to be or over the choice of a uniformly random . This follows from the fact that the inner product of any nonzero bit string with a uniformly random bit string is equally likely to be or . Hence in this case, our generator has no bias.
In the case where , then the inner product will always be , independent of the value of . In these situations, the bias is , but this is conditioned on the event that .
We claim that this event has probability . Indeed, for nonempty , is a polynomial of degree . Hence it has at most roots. But we are selecting from a field of size . Hence the probability of picking a root is .
Hence overall the bias is at most .
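The proof can be replayed by brute force for one concrete (assumed) choice of parameters: the field GF(2^4) with modulus x^4 + x + 1 and 8 output bits. Exhaustively over all 256 seed pairs, every nonempty parity test has bias at most n/(2·16) = 1/4, matching the root-counting bound above.

```python
K = 4
MOD = 0b10011  # the irreducible polynomial x^4 + x + 1 over GF(2)

def gf_mul(x, y):
    """Carry-less multiplication in GF(2^4), reducing mod x^4 + x + 1."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & (1 << K):
            x ^= MOD
    return r

def output(a, b, n):
    """The n output bits <a^1, b>, ..., <a^n, b> (inner products of bit strings)."""
    bits, p = [], 1
    for _ in range(n):
        p = gf_mul(p, a)
        bits.append(bin(p & b).count("1") % 2)
    return bits

n = 8
outs = [output(a, b, n) for a in range(1 << K) for b in range(1 << K)]

max_bias = 0.0
for S in range(1, 1 << n):               # every nonempty parity test
    ones = sum(sum(x[i] for i in range(n) if S >> i & 1) % 2 for x in outs)
    max_bias = max(max_bias, abs(ones / len(outs) - 0.5))
print(max_bias)  # at most n / (2 * field size) = 0.25
```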
To make use of the generator, we need to pick a specific . Note that the seed length will be . If we want to achieve bias , then we must have . All the logarithms in this lecture are in base . This gives us a seed length of .
Small-bias distributions are so important that a lot of attention has been devoted to optimizing the constant “2” above. A lower bound of on the seed length was known. Ta-Shma recently [Ta17] gave a nearly matching construction with seed length .
We next give a sense of how to obtain different tradeoffs between and in the seed length. We specifically focus on getting a nearly optimal dependence on , because the construction is a simple, interesting “derandomization” of the above one.
3.1 An improved small bias distribution via bootstrapping
We will show another construction of small bias distributions that achieves seed length . It will make use of the previous construction and proof.
The intuition is the following: the only time we used that was uniform was in asserting that if , then is uniform. But we don’t need to be uniform for that. What do we need from ? We need that it has small bias!
Our new generator is where and are as before but with different parameters. For , we pick of length , whereas just needs to be an biased generator on bits, which can be done as we just saw with bits. This gives a seed length of , as promised.
We can of course repeat the argument but the returns diminish.
4 Connecting small bias to k-wise independence
We will show that using our small bias generators, we can create distributions which are almost wise independent. That is, they are very close to a wise independent distribution in statistical distance, while having a substantially shorter seed length than what is required for wise independence. In particular, we will show two results:
 Small bias distributions are themselves close to wise independent.
 We can improve the parameters of the above by feeding a small bias distribution to the generator for wise independence from the previous lectures. This will improve the seed length of simply using a small bias distribution.
Before we can show these, we’ll have to take a quick aside into some fundamental theorems of Fourier analysis of boolean functions.
4.1 Fourier analysis of boolean functions 101
Let . Here the switch between and is common, but you can think of them as being isomorphic. One way to think of is as being a vector in . The th entry of indicates the value of . If we let be the indicator function returning iff , but once again written as a vector like is, then any function can be written over the basis of the vectors, as:
This is the “standard” basis.
Fourier analysis is simply a different basis in which to write functions, which is sometimes more useful. The basis functions are . Then any boolean function can be expressed as:
where the , called the “Fourier coefficients,” can be derived as:
where the expectation is over uniformly random .
Claim 1. For any function with range , its Fourier coefficients satisfy:
Proof. We know that , as squaring the function makes it . We can reexpress this expectation as:
We make use of the following fact: if , then . If they equal each other, then their difference is the empty set and this function is .
Overall, this implies that the above expectation can be simply rewritten as:
Since we already decided that the expectation is , the claim follows.
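The claim (Parseval's identity for boolean functions) can be checked directly on a small example, say majority on 3 bits (an arbitrary illustrative choice):

```python
from itertools import product, combinations

def maj(x):
    """Majority on 3 bits, with output in {-1, 1}."""
    return 1 if sum(x) >= 2 else -1

inputs = list(product([0, 1], repeat=3))

def chi(S, x):
    """The character (-1)^{sum of x_i over i in S}."""
    return (-1) ** sum(x[i] for i in S)

# Fourier coefficients: correlation of maj with each character under uniform.
coeffs = {}
for r in range(4):
    for S in combinations(range(3), r):
        coeffs[S] = sum(maj(x) * chi(S, x) for x in inputs) / 8

total = sum(c * c for c in coeffs.values())
print(total)  # sum of squared coefficients of a {-1,1} function is 1
```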
5 Small bias distributions are close to wise independent
Before we can prove our claim, we formally introduce what we mean for two distributions to be close. We use the most common definition of statistical difference, which we repeat here:
Definition 1. Let be two distributions over the same domain . Then we denote their statistical distance, sometimes also written , as
Note that the probabilities are with respect to the individual distributions and . We may also say that is close to if .
We can now show our result, which is known as Vazirani’s XOR Lemma:
Theorem 2. If a distribution over has bias , then is close to the uniform distribution.
Proof. Let be a test. To fit the above notation, we can think of as being defined as the set of inputs for which . Then we want to bound:
Expanding in the Fourier basis, we rewrite this as
We know that for all nonempty , and when is the empty set. We also know that for all nonempty , and is when is the empty set. So the above can be bounded as:
Lemma 3.
Using the above lemma completes the upper bound and the proof of the theorem.
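The theorem can be verified numerically on any small example; below, an arbitrary 6-point distribution over 4 bits (an illustrative choice), with statistical distance and bias computed exactly:

```python
from itertools import product

n = 4
support = [(0,0,0,0), (0,1,1,0), (1,0,1,1), (1,1,0,1), (0,0,1,1), (1,1,1,1)]
p = {x: 1 / len(support) for x in support}
cube = list(product([0, 1], repeat=n))

# Exact statistical distance from uniform.
sd = sum(abs(p.get(x, 0.0) - 2 ** -n) for x in cube) / 2

# Exact bias: worst advantage of any nonempty parity test.
bias = max(
    abs(sum(p.get(x, 0.0) for x in cube
            if sum(a & b for a, b in zip(S, x)) % 2 == 1) - 0.5)
    for S in cube if any(S)
)
print(sd, bias, bias * 2 ** (n / 2))  # the XOR lemma: sd <= bias * 2^{n/2}
```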
Corollary 4. Any bits of an biased distribution are close to uniform.
Using the corollary above, we see that we can get close to a wise independent distribution (in the sense of the corollary) by taking a small-bias distribution with . This requires seed length . Recall that for exact wise independence we required seed length .
5.1 An improved construction
Theorem 5. Let be the generator previously described that samples a wise independent distribution (or any linear ). If we replace the input to with a small bias distribution of , then the output of is close to being wise independent.
Proof. Consider any parity test on bits on the output of . It can be shown that is a linear map; that is, simply multiplies its seed by a matrix over GF(2), the field with two elements. Hence, corresponds to a test on the input of , on possibly many bits. The test is not empty because is wise independent. Since we fool with error , we also fool with error , and the theorem follows by Vazirani’s XOR lemma.
Using the seed lengths we saw we get the following.
Corollary 6. There is a generator for almost wise independent distributions with seed length .
6 Tribes Functions and the GMRTV Generator
We now move to a more recent result. Consider the Tribes function, a read-once CNF on bits, given by the And of terms, each on bits. You should think of where and .
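For concreteness, here is the Tribes function with toy parameters (w = 2 bits per term and m = 3 terms, an assumption for illustration rather than the w ≈ log n regime of interest), together with its exact acceptance probability under uniform inputs:

```python
from itertools import product

w, m = 2, 3  # toy parameters: And of m Ors on disjoint w-bit blocks

def tribes(x):
    return all(any(x[i * w:(i + 1) * w]) for i in range(m))

# Under uniform inputs, Pr[Tribes = 1] = (1 - 2^{-w})^m.
count = sum(tribes(x) for x in product([0, 1], repeat=w * m))
print(count, (1 - 2 ** -w) ** m * 2 ** (w * m))  # both 27 out of 64
```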
We’d like a generator for this class with seed length . This is still open! (This is just a single function, for which a generator is trivial, but one can make this challenge precise for example by asking to fool the Tribes function for any possible negation of the input variables. These are tests and a generator with seed length is unknown.)
The result we saw earlier about fooling And gives a generator with seed length ; however, the dependence on is poor. Achieving a good dependence on has proved to be a challenge. We now describe a recent generator [GMR12] which gives seed length . This is incomparable with the previous , and in particular the dependence on is always suboptimal. However, when , the generator [GMR12] gives seed length , which is better than previously available constructions.
The highlevel technique for doing this is based on iteratively restricting variables, and goes back about 30 years [AW89]. This technique seems to have been abandoned for a while, possibly due to the spectacular successes of Nisan [Nis91, Nis92]. It was revived in [GMR12] (see also [GLS12]) with an emphasis on a good dependence on .
A main tool is this claim, showing that small-bias distributions fool products of functions with small variance. Critically, we work with non-boolean functions (which later will be certain averages of boolean functions).
Claim 1. Let be a series of boolean functions. Further, let be an biased distribution over bits, where each is bits long. Then
where is variance of with respect to the uniform distribution.
This claim has emerged from a series of works, and this statement is from a work in progress with Chin Ho Lee. For intuition, note that constant functions have variance , in which case the claim gives good bounds (and indeed any distribution fools constant functions). By contrast, for balanced functions the variance is constant, the sum of the variances is about , and the claim gives nothing. Indeed, you can write Inner Product as a product of nearly balanced functions, and it is known that small bias does not fool it. For this claim to kick in, we need each variance to be at most .
In the tribes function, the And functions have variance , the sum of the variances is about , and the claim gives nothing. However, if you perturb the Ands with a little noise, the variance drops polynomially, and the claim is useful.
Claim 2. Let be the AND function on bits. Rewrite it as , where . That is, we partition the input into two sets. Define as:
where is uniform. Then
Proof.
We know that is simply the expected value of , and since is the AND function, this is , so the right term is .
We reexpress the left term as . But we note that this product is iff . The probability of this happening is .
Thus the final difference is .
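The variance drop in the claim can be computed exactly for a small assumed width, here w = 4: the And on w bits has variance about 2^{-w}, while its average over half of the inputs (the y half replaced by uniform noise) has variance about 2^{-3w/2}.

```python
from itertools import product

w = 4
h = w // 2  # half of the inputs get averaged out

def var(f, nbits):
    """Variance of f under the uniform distribution on nbits bits."""
    vals = [f(x) for x in product([0, 1], repeat=nbits)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

and_var = var(lambda x: int(all(x)), w)        # variance of And: 2^-w - 2^-2w

def avg_and(x):
    # E_y[ And(x, y) ] with y uniform on the other h bits: And(x) / 2^h.
    return int(all(x)) / 2 ** h

avg_var = var(avg_and, h)                      # drops to 2^-3w/2 - 2^-2w
print(and_var, avg_var)
```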
We’ll actually apply this claim to the Or function, which has the same variance as And by De Morgan’s laws.
We now present the main inductive step to fool tribes.
Claim 3. Let be the tribes function, where the first bits of each of the terms are fixed. Let be the free bits per term, and the number of terms that are nonconstant (some term may have become after fixing the bits).
Reexpress as , where each term’s input bits are split in half, so .
Let be a small bias distribution with bias (for a big enough to be set later). Then
That is, if we replace half of the free bits with a small bias distribution, then the resulting expectation of the function only changes by a small amount.
To get the generator from this claim, we repeatedly apply Claim 3, replacing half of the bits of the input with another small bias distribution. We repeat this until we have a small enough remaining amount of free bits that replacing all of them with a small bias distribution causes an insignificant change in the expectation of the output.
At each step, is cut in half, so the required number of repetitions to reduce to constant is . Actually, as explained below, we’ll stop when for a suitable constant (this arises from the error bound in the claim above).
After each replacement, we incur an error of , and then we incur the final error from replacing all bits with a small bias distribution. This final error is negligible by a result which we haven’t seen, but which is close in spirit to the proof we saw that bounded independence fools AND.
The total accumulated error is then . If we wish to achieve a specific error , we can run each small bias generator with .
At each iteration, our small bias distribution requires bits, so our final seed length is .
Proof of Claim 3. Define , and rewrite our target expression as:
This is in the form of Claim 1. We also note that from Claim 2 that .
We further assume that . For if this is not true, then the expectation over the first terms is , because of the calculation
Then we can reason as in the proof that bounded independence fools AND (i.e., we can run the argument just on the first terms to show that the products are close, and then use the fact that it is small under uniform, and the fact that adding terms only decreases the probability under any distribution).
Under the assumption, we can bound the sum of the variances of as:
If we assume that then this sum is .
We can then plug this into the bound from Claim 1 to get
Now we set so that , and the bound becomes:
By making large enough the claim is proved.
In the original paper, they apply these ideas to read-once CNF formulas. Interestingly, this extension is more complicated and uses additional ideas. Roughly, the progress measure is going to be the number of terms in the CNF (as opposed to the width). A CNF is broken up into a small number of Tribes functions, the above argument is applied to each Tribes function, and then they are put together using a general fact that they prove: if and are fooled by small bias, then on disjoint inputs is also fooled by small bias.
References
[AW89] Miklos Ajtai and Avi Wigderson. Deterministic simulation of probabilistic constantdepth circuits. Advances in Computing Research – Randomness and Computation, 5:199–223, 1989.
[GLS12] Dmitry Gavinsky, Shachar Lovett, and Srikanth Srinivasan. Pseudorandom generators for read-once ACC^0. In Proceedings of the 27th Conference on Computational Complexity, CCC 2012, Porto, Portugal, June 26-29, 2012, pages 287–297, 2012.
[GMR12] Parikshit Gopalan, Raghu Meka, Omer Reingold, Luca Trevisan, and Salil Vadhan. Better pseudorandom generators from milder pseudorandom restrictions. In IEEE Symp. on Foundations of Computer Science (FOCS), 2012.
[Nis91] Noam Nisan. Pseudorandom bits for constant depth circuits. Combinatorica, 11(1):63–70, 1991.
[Nis92] Noam Nisan. Pseudorandom generators for spacebounded computation. Combinatorica, 12(4):449–461, 1992.
[NN90] J. Naor and M. Naor. Smallbias probability spaces: efficient constructions and applications. In 22nd ACM Symp. on the Theory of Computing (STOC), pages 213–223. ACM, 1990.
[Ta17] Amnon Ta-Shma. Explicit, almost optimal, epsilon-balanced codes. In ACM Symp. on the Theory of Computing (STOC), pages 238–251, 2017.
Special Topics in Complexity Theory, Lectures 23
Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola
1 Lectures 23, Scribe: Tanay Mehta
In these lectures we conclude the proof that bounded independence fools the AND function, and look at the more recent result that bounded independence fools the circuit class .
1.1 Bounded Independence Fools AND
We state again the theorem from last time.
Theorem 1. Let be a distribution over such that any are independent (but not necessarily uniform). Then, we have that
Proof. Let be the distribution of . Let be the wise independent distribution such that for all and the are independent. The theorem is equivalent to the following statement.
We will prove the above statement by the following version of the Inclusion-Exclusion principle.
1.1.1 Inclusion-Exclusion Principle
Let be any distribution over . Note that by De Morgan’s laws, we have
Let be the event that . We want to bound the quantity . By looking at the Venn diagram of the events , we can see that
and so on. In general, we have the following. Define
Then, we have the bounds for odd , and for even . This fact holds for any distribution.
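These truncated inclusion-exclusion (Bonferroni) bounds are easy to check on a toy example (the four events below are arbitrary choices): truncating at an odd level over-counts the union, truncating at an even level under-counts it, and the full sum is exact.

```python
from itertools import combinations

universe = range(12)
events = [set(range(0, 6)), set(range(4, 9)), set(range(7, 12)), set(range(2, 10))]

union_pr = len(set().union(*events)) / len(universe)

def truncated(m):
    """Inclusion-exclusion truncated at intersections of at most m events."""
    total = 0.0
    for j in range(1, m + 1):
        for idx in combinations(range(len(events)), j):
            inter = set.intersection(*(events[i] for i in idx))
            total += (-1) ** (j + 1) * len(inter) / len(universe)
    return total

bounds = [truncated(m) for m in range(1, len(events) + 1)]
print(union_pr, bounds)
```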
Let us return to the proof. Note that the are the same for and up to because they only involve sums of ANDs of at most events. Hence, we have that
where the last equality comes from the definition of . Therefore, we are done if . We have that
where . To bound the expectation we recall a useful inequality.
1.1.2 A Useful Inequality
Let be nonnegative real numbers. Then, by the AM-GM inequality, we have that
Consider the following more general statement,
and note that the leftmost term is equal to , while the rightmost term is equal to
Applying the above inequality to and a common approximation for the binomial coefficient, we have that
Therefore, we are done if . Recall that . So if is small then is close to .
It remains to handle the case that . Pick such that
By the previous argument, the AND of the first is the same up to for and . Also, for every distribution, the probability that the And of bits is 1 is at most the probability that the And of bits is . And also, for the wise independent distribution we have
The combination of these facts concludes this case. To summarize, in this case we showed that
as well as
By the choice of and the previous argument, we also know that and so we are done, as all quantities above are at most (and at least ).
Remark 2. The bound is tight up to
Proof. Let be the distribution over as follows: and . Then, is wise independent. However, if is even, then
Yet, we have that
1.2 Bounded Independence Fools
Acknowledgement. This section is based on Amnon Ta-Shma’s notes for the class 0368.4159, Expanders, Pseudorandomness and Derandomization, CS dept, Tel-Aviv University, Fall 2016.
Note that a DNF on bits can be modeled as a depth-two circuit where the top layer is an OR gate whose inputs are AND gates, which take inputs and their negations. The circuit class can be viewed as a generalization of this to higher (but constant) depth circuits. That is, consists of circuits using AND gates, OR gates, NOT gates, and input registers. Each of the gates has unbounded fan-in (i.e., the number of input wires). The size of the circuit is defined to be the number of gates.
is one of the most studied classes in complexity theory. circuits of polynomial size can do many things, including adding and subtracting bit integers.
Conjecture 3.[Linial-Nisan [LN90]] wise independence fools circuits of depth and size .
The conjecture was open for a long time, even for in the special case . In 2007 a breakthrough work by Bazzi [Baz09] proved it for . Shortly afterwards, Razborov presented a simpler proof of Bazzi’s result [Raz09], and Braverman proved the conjecture for any with wise independence [Bra10]. Tal improved the result to [Tal17].
Interestingly, the progress on the conjecture uses only ideas that were already around at the time of its formulation. Bottom line: if a problem is open for a long time, you should immediately attack it with existing tools.
The highlevel intuition why such a result should be true is the following:
 is approximated by polylog degree polynomials.
 wise independence fools degree polynomials.
Proof of (2). Let . Let be a degree polynomial over . Write as
If is a wise independent distribution on , then by linearity of expectation
There are several notions of approximating AC by low-degree polynomials. We now review two of them, explaining why neither of them is sufficient. Braverman showed how to cleverly combine the two methods to prove a version of (1) that’s strong enough.
1.2.1 Approximation 1
Theorem 4. For all circuits of size and depth , for all distributions over , for all , there exists a polynomial of degree such that
The important features of this approximation are that it works under any distribution, and when the polynomial is correct it outputs a boolean value.
Similar approximations appear in many papers, going back to Razborov’s paper [Raz87] (who considers polynomials modulo 2) which uses ideas from earlier still work.
Note that the polynomial depends on the circuit chosen, and on the distribution. This theorem is not a good enough approximation because on the fraction of inputs where the polynomial and circuit are unequal, the value of the polynomial can (and does) explode to be much greater than . This prevents us from bounding the average of the polynomial.
Nevertheless, let us prove the above theorem.
Proof. Consider one OR gate of fan-in . We construct a distribution of polynomials such that, on any fixed input, a random polynomial from the distribution is correct with high probability. This implies, by an averaging argument, that there is a fixed polynomial that computes the circuit on a large fraction of the inputs.
For , let be a random subset of where every element is included with probability , independently.
Suppose has Hamming weight . Then, . And the sum can be shown to equal with constant probability.
Define the approximation polynomial to be
Note that if has weight , then with constant probability. If , then with probability . We can adjust the error probability to by repeating each term in the product times.
Thus, we can approximate one gate with the above polynomial of degree . Construct polynomials as above for each gate, with error parameter . The probability that any of them is wrong is at most by a union bound. To obtain the approximating polynomial for the whole circuit compose all the polynomials together. Since the circuit is of depth , the final degree of the approximating polynomial is , as desired.
As mentioned at the beginning, this is a distribution on polynomials that computes correctly any input with probability at least . By averaging, there exists a fixed polynomial that computes correctly a fraction of inputs.
It can be verified that the value of the polynomial can be larger than . The polynomial for the gates closest to the input can be as large as . Then at the next level it can be as large as , which is already much larger than .
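To see the idea in code, here is a simplified one-level variant of the approximation for a single Or gate (an assumption for illustration: each set contains every coordinate independently with probability 1/2, rather than the varying probabilities above; over GF(2) this already gives error 2^{-t} per fixed input).

```python
from itertools import product
import random

n = 4

# Key fact (exact): for every nonzero x, exactly half of all 2^n subsets S
# have odd intersection with the ones of x, so <S, x> is an unbiased bit.
halves_ok = all(
    sum(1 for S in product([0, 1], repeat=n)
        if sum(a & b for a, b in zip(S, x)) % 2) == 2 ** (n - 1)
    for x in product([0, 1], repeat=n) if any(x)
)

# The approximator: Or of t random parities. It is always correct on x = 0,
# and wrong on a fixed nonzero x only if all t parities miss, prob. 2^{-t}.
rng = random.Random(0)
t = 12
sets = [[rng.randrange(2) for _ in range(n)] for _ in range(t)]

def approx_or(x):
    return int(any(sum(a & b for a, b in zip(S, x)) % 2 for S in sets))

errors = sum(approx_or(x) != int(any(x)) for x in product([0, 1], repeat=n))
print(halves_ok, errors)  # errors is 0 with high probability over the sets
```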
1.3 Approximation 2
Theorem 5. For all circuits of size and depth , for all error values , there exists a polynomial of degree such that
The important feature of this approximation is that it bounds the average, but only under the uniform distribution. Because it does not provide any guarantee on other distributions, including wise independent distributions, it cannot be used directly for our aims.
Remark 6. Approximation 2 is proved via the switching lemma, an influential lemma first proved in the early 80s by Ajtai [Ajt83] and by Furst, Saxe, and Sipser [FSS84]. The idea is to randomly set a subset of the variables to simplify the circuit. You can do this repeatedly to simplify the circuit even further, but it only works under the uniform distribution. Håstad [Hås87] gave a much tighter analysis of the switching lemma, and the paper [LMN93] used it to prove a version of Approximation 2 with a slightly worse dependence on the error. Recently, a refinement of the switching lemma was proved in [Hås14, IMP12]. Based on that, Tal [Tal17] obtained the corresponding refinement of Approximation 2 where the parameters are as stated above. (The polynomial is simply obtained from the Fourier expansion of the function computed by the circuit by removing all Fourier coefficients larger than a certain threshold. The bound on the Fourier decay in [Tal17] implies the desired approximation.)
1.4 Bounded Independence Fools
Theorem 7. For all circuits with unbounded fanin of size and depth , for all error values , for all wise independent distributions on , we have that
Corollary 8. In particular, if , , , then suffices.
The next claim is the ultimate polynomial approximation used to prove the theorem.
Claim 9. For all circuits with unbounded fanin of size and depth , for all error values , for all wise independent distributions on , there is a set of inputs, and a degree polynomial such that:
 is ’rare’ under both and :
, and . Here we write for the indicator function of the event .
 For all , . Here is the logical Or.
 .
We only need (1) under , but (1) under is used to prove (3).
We can construct the polynomial approximation from Claim 9 by using a combination of Approximation 1 and 2. First we need a little more information about Approximation 1.
Claim 10. Two properties of approximation 1:
Proof of Claim 10 part 2. Consider a single OR gate with input gates . This is represented in the approximating polynomial by the term
Note that the term is incorrect exactly when the input has nonzero weight but every set intersects the ones of the input in an even number of positions. This can be checked in AC, in parallel for all gates in the circuit.
Proof of Claim 9. Run approximation 1 for the distribution , yielding the polynomial and the set . This already proves the first part of the claim for both and , because if has probability under it has probability under , and the same for . Use Claim 10 part 2, and run approximation 2 on . Call the resulting polynomial , which has degree with error bound .
The idea in the ultimate approximating polynomial is to “check if there is a mistake, and if so, output ; otherwise, output .” Formally:
Claim 9 part 2 can be shown as follows. by definition. So, if , then we are done. Otherwise, . So there is no mistake, and . Hence, by the properties of Approximation 1, . This implies .
It only remains to show Claim 9 part 3:
By part 1 of Claim 9,
We can show that this equals
by the following argument: If then by definition. If , then there is no mistake, and . This implies that .
Let us rewrite the above expression in terms of the expectation norm.
Recall the triangle inequality, which states: . Therefore, letting we have that the above quantity is
To conclude, we will show that each of the above terms are . Note that
By Claim 10 part 1 and Approximation 2, this is at most
For this quantity to be at most we set . Here we critically set the error in Approximation 2 much lower, to cancel the large values arising from Approximation 1. By Theorem 5, the polynomial arising from approximation 2 has degree .
Finally, let us bound the other term, . If , then the distance is . If , then the distance is . Therefore, this term is at most , which we know to be at most .
References
[Ajt83] Miklós Ajtai. formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.
[Baz09] Louay M. J. Bazzi. Polylogarithmic independence can fool DNF formulas. SIAM J. Comput., 38(6):2220–2272, 2009.
[Bra10] Mark Braverman. Polylogarithmic independence fools AC circuits. J. of the ACM, 57(5), 2010.
[FSS84] Merrick L. Furst, James B. Saxe, and Michael Sipser. Parity, circuits, and the polynomialtime hierarchy. Mathematical Systems Theory, 17(1):13–27, 1984.
[Hås87] Johan Håstad. Computational limitations of smalldepth circuits. MIT Press, 1987.
[Hås14] Johan Håstad. On the correlation of parity and smalldepth circuits. SIAM J. on Computing, 43(5):1699–1708, 2014.
[IMP12] Russell Impagliazzo, William Matthews, and Ramamohan Paturi. A satisfiability algorithm for AC. In ACMSIAM Symp. on Discrete Algorithms (SODA), pages 961–972, 2012.
[LMN93] Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability. J. of the ACM, 40(3):607–620, 1993.
[LN90] Nathan Linial and Noam Nisan. Approximate inclusionexclusion. Combinatorica, 10(4):349–365, 1990.
[Raz87] Alexander Razborov. Lower bounds on the dimension of schemes of bounded depth in a complete basis containing the logical addition function. Akademiya Nauk SSSR. Matematicheskie Zametki, 41(4):598–607, 1987. English translation in Mathematical Notes of the Academy of Sci. of the USSR, 41(4):333338, 1987.
[Raz09] Alexander A. Razborov. A simple proof of Bazzi’s theorem. ACM Transactions on Computation Theory (TOCT), 1(1), 2009.
[Tal17] Avishay Tal. Tight bounds on the Fourier spectrum of AC0. In Conf. on Computational Complexity (CCC), pages 15:1–15:31, 2017.
Special Topics in Complexity Theory: Lecture 1
Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola
1 Lecture 1, Scribe: Chin Ho Lee
In this first lecture we begin with some background on pseudorandomness and then we move on to the study of bounded independence, presenting in particular constructions and lower bounds.
1.1 Background
Let us first give some background on randomness. There are 3 different theories:
(1) Classical probability theory. For example, if we toss a coin times then the probability of each outcome is the same, i.e., . However, intuitively we feel that the first outcome is less random than the second.
(2) Kolmogorov complexity. Here randomness is measured by the length of the shortest program outputting a string. In the previous example, a program for the second outcome could be “print 011011100011”, whereas a program for the first outcome can be “print 01 six times”, which is shorter.
(3) Pseudorandomness. This is similar to resourcebounded Kolmogorov complexity. Here random means the distribution “looks random” to “efficient observers.”
Let us now make the above intuition precise.
Definition 1.[Pseudorandom generator (PRG)] A function is a pseudorandom generator (PRG) against a class of tests with error , if it satisfies the following 3 conditions:
(1) the output of the generator must be longer than its input, i.e., ;
(2) it should fool , that is, for every test , we have ;
(3) the generator must be efficient.
To get a sense of the definition, note that a PRG is easy to obtain if we drop any one of the above 3 conditions. Dropping condition (1), we can define our PRG as . Dropping condition (2), we can define our PRG as . Dropping condition (3), the PRG is not as obvious to obtain as in the previous two cases. We have the following claim.
Claim 2. For every class of tests , there exists an inefficient PRG with error and seed length .
Before proving the claim, consider the example where is the class of circuits of size over bit inputs. It is known that . Hence, applying our claim above, we see that there is an inefficient PRG that fools with error and seed length .
We now prove the claim using the probabilistic method.
Proof. Consider picking $G : \{0,1\}^s \to \{0,1\}^n$ at random. Then by the Chernoff bound, we have for every fixed test $t \in T$,
$$\Pr_G\Big[\big|\Pr[t(G(U_s)) = 1] - \Pr[t(U_n) = 1]\big| > \epsilon\Big] \le 2^{-\Omega(\epsilon^2 2^s)} < \frac{1}{|T|}$$
if $s = \log\log|T| + 2\log(1/\epsilon) + O(1)$. Therefore, by a union bound over $T$, there exists a fixed $G$ such that for every $t \in T$, the probabilities are within $\epsilon$.
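Spelling out the union-bound arithmetic (a sketch, using the additive Chernoff bound for the average of $2^s$ independent random output bits):

```latex
\Pr_G\Big[\big|\Pr[t(G(U_s))=1]-\Pr[t(U_n)=1]\big|>\epsilon\Big]
  \;\le\; 2e^{-2\epsilon^2 2^s}
  \;<\; \frac{1}{|T|}
  \quad\Longleftrightarrow\quad
  2^s \;>\; \frac{\ln(2|T|)}{2\epsilon^2},
```

and the right-hand side is achieved with $s = \log\log|T| + 2\log(1/\epsilon) + O(1)$.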
1.2 $k$-wise independent distributions
A major goal in research in pseudorandomness is to construct PRGs for (1) richer and richer classes $T$, (2) smaller and smaller seed length $s$, and (3) making the PRG explicit. For starters, let us consider a simple class of tests.
Definition 3.[$k$-local tests] The $k$-local tests are tests that depend only on $k$ of their input bits.
We will show that for this class of tests we can actually achieve error $\epsilon = 0$. To warm up, consider what happens when $k = 1$; then we can have a PRG with seed length $1$ by defining $G : \{0,1\} \to \{0,1\}^n$ and $G(x)_i := x$ for every $i$. Each output bit is individually uniform, so every $1$-local test is fooled with error $0$.
For $k = 2$, we have the following construction. Define
$$G(x)_S := \langle x, S \rangle := \sum_i x_i S_i \bmod 2.$$
Here the length of $x$ and $S$ is $\log n$, and we exclude $S = 0^{\log n}$. Note that the output has $2^{\log n} - 1 = n - 1$ bits, but we can append one uniform bit to the output of $G$. So the seed length would be $\log n + 1$.
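A minimal Python sketch of this inner-product construction with seed length $s = 3$ (so $n - 1 = 7$ output bits before the appended uniform bit), exhaustively checking pairwise uniformity:

```python
from itertools import product

def inner_product_prg(x, s):
    """Output one bit <x, S> mod 2 for every nonempty S in {0,1}^s."""
    return [sum(xi * Si for xi, Si in zip(x, S)) % 2
            for S in product([0, 1], repeat=s) if any(S)]

s = 3
n = 2 ** s - 1  # output length before appending the extra uniform bit
outputs = [inner_product_prg(x, s) for x in product([0, 1], repeat=s)]

# Pairwise uniformity: over a uniform seed x, every pair of output
# positions takes each of the 4 values in {0,1}^2 equally often.
for i in range(n):
    for j in range(i + 1, n):
        counts = {}
        for out in outputs:
            pair = (out[i], out[j])
            counts[pair] = counts.get(pair, 0) + 1
        assert sorted(counts.values()) == [2, 2, 2, 2]
```

The exhaustive check succeeds because any two distinct nonzero $S, T$ are linearly independent over $\mathbb{F}_2$, exactly as the proof of Claim 4 argues.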
Now we prove the correctness of this PRG.
Claim 4. The $G$ defined above is a PRG against $2$-local tests with error $0$.
Proof. We need to show that for every $S \ne T$, the random variable $(G(x)_S, G(x)_T)$ over the choice of $x$ is identical to $U_2$, the uniform distribution on two bits. Since $S \ne T$, suppose without loss of generality that there exists an $i$ such that $S_i = 1$ and $T_i = 0$. Now $G(x)_T$ is uniform (because $T \ne 0$), and conditioned on the value of $G(x)_T$, the bit $G(x)_S$ is also uniform, thanks to the index $i$: $x_i$ appears in $\langle x, S \rangle$ but not in $\langle x, T \rangle$.
The case $k \ge 3$ becomes much more complicated and involves the use of finite fields. One can think of a finite field as a finite domain that behaves like $\mathbb{Q}$ in the sense that it allows you to perform arithmetic operations, including division, on the elements. We will use the following fact about finite fields.
Lemma 5. There exist finite fields of size $p^k$, for every prime $p$ and integer $k \ge 1$. Moreover, they can be constructed and operated with in time $\mathrm{poly}(p, k)$.
Remark 6. Ideally one would like the dependence on $p$ to be $\mathrm{poly}(\log p)$. However, such a construction remains an open question, and there have been many attempts at constructing finite fields in time $\mathrm{poly}(\log p, k)$. Here we only work with finite fields with $p = 2$, and there are many explicit constructions for that case.
One simple example of a finite field is the integers modulo $p$, for a prime $p$.
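For instance, arithmetic in $\mathbb{Z}_7$, including division via Fermat's little theorem (a small illustrative sketch):

```python
p = 7  # any prime p makes the integers mod p a field
a, b = 3, 5
print((a + b) % p)        # -> 1
print((a * b) % p)        # -> 1
inv_b = pow(b, p - 2, p)  # Fermat: b^(p-2) is the multiplicative inverse of b
print((a * inv_b) % p)    # "a / b" in Z_7
assert (inv_b * b) % p == 1
```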
Theorem 7. Let $\mathbb{F}$ be a finite field of size $n$. For every $k \le n$, there exists an explicit distribution $(X_1, \ldots, X_n)$ over $\mathbb{F}^n$ such that
(1) it can be sampled with $k \log n$ uniform bits, and
(2) every $k$ symbols are uniform in $\mathbb{F}^k$.
For output bits, we can use the above theorem with a field of size $n = 2^\ell$, and the PRG can output the first bit of every symbol.
Remark 8. There exist other constructions that are similar to the inner product construction for the case $k = 2$, with the vectors $S$ carefully chosen, but the way to choose them involves the use of finite fields as well.
Note that we can also apply the theorem for larger $k$ to fool $k$-local tests with seed length $k \log n$.
We now prove the theorem.
Proof. Pick a finite field $\mathbb{F}$ of size $n$. Let $a_0, a_1, \ldots, a_{k-1}$ be uniform random elements in $\mathbb{F}$, which we think of as a polynomial $p(x) = a_0 + a_1 x + \cdots + a_{k-1} x^{k-1}$ of degree at most $k - 1$. We define the generator to be
$$G(a_0, \ldots, a_{k-1}) := \big(p(e)\big)_{e \in \mathbb{F}}.$$
(One should think of the outputs of $G$ as lines and curves in the real plane.)
The analysis of the PRG follows from the following useful fact: for every $k$ points $(x_1, y_1), \ldots, (x_k, y_k)$ with distinct $x_i$, there exists exactly one polynomial of degree at most $k - 1$ going through them. Hence, for any $k$ fixed positions, the map from coefficient vectors to the corresponding $k$ output symbols is a bijection, and so every $k$ output symbols are uniform.
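A minimal sketch of this polynomial construction in Python, over the prime field $\mathbb{Z}_5$ with $k = 3$ (small illustrative parameters; the notes use fields of characteristic 2 for bit output):

```python
from itertools import product

def poly_outputs(coeffs, p):
    """Evaluate the polynomial sum_i coeffs[i] * x^i at every point of Z_p."""
    return tuple(sum(c * pow(e, i, p) for i, c in enumerate(coeffs)) % p
                 for e in range(p))

p, k = 5, 3  # field Z_5, 3-wise uniformity, seed = k field elements
outputs = [poly_outputs(coeffs, p) for coeffs in product(range(p), repeat=k)]

# Any k fixed positions are jointly uniform over Z_p^k: interpolation
# makes the map coefficients -> values at k distinct points a bijection.
positions = (0, 2, 4)
counts = {}
for out in outputs:
    key = tuple(out[i] for i in positions)
    counts[key] = counts.get(key, 0) + 1
assert len(counts) == p ** k and all(c == 1 for c in counts.values())
```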
Let us now introduce some terminology for PRGs that fool $k$-local tests.
Definition 9. We call distributions that look uniform (with error $0$) to $k$-local tests $k$-wise independent (also known as $k$-wise uniform). The latter terminology is more precise, but the former is more widespread.
We will soon see an example of a distribution where every $k$ elements are independent but not necessarily uniform.
1.3 Lower bounds
We have just seen a construction of $k$-wise independent distributions with seed length $k \log n$. It is natural to ask: what is the minimum seed length needed to generate $k$-wise independent distributions?
Claim 10. For every $k$ and $n$, every PRG with error $0$ for $k$-local tests over $\{0,1\}^n$ has seed length $\Omega(k \log(n/k))$.
Proof. We use the linear-algebraic method. See the book by Babai–Frankl [1] for more applications of this method.
To begin, we will switch from $\{0,1\}$ to $\{-1,1\}$, and write the PRG as a $2^s \times n$ matrix $M$, where the rows are all the possible outputs of the PRG. Since the PRG fools $2$-local tests with error $0$, one can verify that every two columns $c_i, c_j$ of $M$ are orthogonal, i.e., $\langle c_i, c_j \rangle = 0$ for $i \ne j$. As shown below, this implies that the vectors are linearly independent. And by linear algebra this gives a lower bound on the number of rows, $2^s$.
However, so far we have not used $k$. Here is how to use it. Consider all the column vectors obtained by taking the entrywise products of any subset of at most $k/2$ of the columns of $M$. Because of $k$-wise independence, these product vectors are again pairwise orthogonal (each inner product involves at most $k$ of the original columns), and this again implies that they are linearly independent.
Claim 11. If nonzero vectors $v_1, \ldots, v_m$ are pairwise orthogonal, then they are linearly independent.
Proof. Suppose they are not, so that we can write $v_1 = \sum_{i \ge 2} c_i v_i$ for some coefficients $c_i$. Taking the inner product with $v_1$ on both sides, we have that the L.H.S., $\langle v_1, v_1 \rangle$, is nonzero, whereas the R.H.S. is zero because the vectors are orthogonal, a contradiction.
Therefore, the rank of $M$ must be at least the number of such products, and so
$$2^s \ge \sum_{i=0}^{k/2} \binom{n}{i} \ge \left(\frac{n}{k}\right)^{k/2}.$$
Rearranging gives $s \ge (k/2) \log(n/k) = \Omega(k \log(n/k))$.
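The orthogonality at the heart of the argument can be checked concretely. This sketch builds the $\pm 1$ matrix $M$ for the pairwise inner-product generator with $s = 3$, and verifies that its $n = 7$ columns, together with the all-ones vector (the empty product), are pairwise orthogonal, forcing $2^s \ge n + 1$:

```python
from itertools import product

# Rows of M: all 2^s outputs of the pairwise inner-product generator
# (one bit per nonempty S), mapped to {-1, 1} via b -> (-1)^b.
s = 3
rows = []
for x in product([0, 1], repeat=s):
    row = []
    for S in product([0, 1], repeat=s):
        if any(S):
            b = sum(xi * Si for xi, Si in zip(x, S)) % 2
            row.append((-1) ** b)
    rows.append(row)

n = len(rows[0])          # 2^s - 1 = 7 columns
cols = list(zip(*rows))
ones = (1,) * len(rows)   # the empty product of columns

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Pairwise uniformity makes distinct columns orthogonal, and 1-wise
# uniformity makes each column orthogonal to the all-ones vector; these
# n + 1 vectors are then linearly independent, so 2^s >= n + 1 = 8.
assert all(dot(cols[i], cols[j]) == 0
           for i in range(n) for j in range(n) if i != j)
assert all(dot(c, ones) == 0 for c in cols)
```

Here the bound is tight: the generator uses $s = 3 = \log(n + 1)$ seed bits.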
1.4 Who is fooled by $k$-wise independence?
In the coming lectures we will see that $k$-wise independence fools $\mathrm{AC}^0$, the class of constant-depth circuits with unbounded fan-in. Today, let us see what else is fooled by $k$-wise independence in addition to $k$-local tests.
(1) Suppose we have independent variables $X_1, \ldots, X_n$ and we want to understand the behavior of their sum $\sum_i X_i$. Then we can apply tools such as the Chernoff bound, other tail bounds, the Central Limit Theorem, and the Berry–Esseen theorem. The first two give bounds on large deviations from the mean. The latter two are somewhat more precise facts that show that the sum will approach a normal distribution (i.e., the probability of exceeding the mean by $t$ standard deviations, for any $t$, is about the same as for a normal). One can show that similar results hold when the $X_i$ are $k$-wise independent. The upshot is that the Chernoff bound gives error $2^{-\Omega(t)}$ for deviation $t$, while under $k$-wise independence we can only get an error of the form $t^{-\Omega(k)}$.
(2) We will see next time that $k$-wise independence fools DNFs and CNFs.
(3) $k$-wise independence is also used for hashing in load balancing.
1.4.1 $k$-wise independence fools AND
We now show that $k$-wise uniform distributions fool the AND function.
Claim 12. Every $k$-wise uniform distribution fools the AND function on $n$ bits with error $2^{-k}$.
Proof. If the AND function is on at most $k$ bits, then by definition the error is $0$. Otherwise the AND is over more than $k$ bits; without loss of generality we can assume the AND is on the first $n > k$ bits. Observe that for any distribution $X$, we have
$$\Pr[\mathrm{AND}(X_1, \ldots, X_n) = 1] \le \Pr[\mathrm{AND}(X_1, \ldots, X_k) = 1].$$
The right-hand side is the same under uniformity and $k$-wise uniformity, and is $2^{-k}$. Hence, under both distributions $\Pr[\mathrm{AND}(X) = 1]$ lies in $[0, 2^{-k}]$, so the two probabilities differ by at most $2^{-k}$.
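As a sanity check, here is a small Python experiment (a sketch, reusing the pairwise inner-product construction from earlier with small parameters) comparing $\Pr[\mathrm{AND} = 1]$ under a $2$-wise uniform distribution and under the uniform distribution:

```python
from itertools import product

s = 3
n = 2 ** s - 1  # AND over all n = 7 output bits

def pairwise_output(x):
    """Pairwise-uniform bits <x, S> mod 2 for all nonempty S."""
    return [sum(xi * Si for xi, Si in zip(x, S)) % 2
            for S in product([0, 1], repeat=s) if any(S)]

# Pr[AND of all n bits = 1] under the 2-wise uniform distribution...
seeds = list(product([0, 1], repeat=s))
p_kwise = sum(all(pairwise_output(x)) for x in seeds) / len(seeds)
# ...and under the truly uniform distribution.
p_unif = 2.0 ** -n

# Claim 12 with k = 2: the error is at most 2^-k = 1/4.
k = 2
assert abs(p_kwise - p_unif) <= 2.0 ** -k
```

For this generator no seed makes all seven bits equal $1$, so the two probabilities are $0$ and $2^{-7}$, comfortably within the $2^{-k}$ bound.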
Instead of working over bits, let us now consider what happens over a general domain $D$. We are given functions $f_1, \ldots, f_n : D \to \{0,1\}$. Suppose $X_1, \ldots, X_n$ are $k$-wise uniform over $D^n$. What can you say about the AND of the outputs of the $f_i$'s, $\bigwedge_i f_i(X_i)$?
This is similar to the previous example, except that now the variables $Y_i := f_i(X_i)$ are $k$-wise independent but not necessarily uniform. Nevertheless, we can show that a similar bound of $2^{-\Omega(k)}$ still holds.
Theorem 13.[[2]] Let $Y_1, \ldots, Y_n$ be random variables over $\{0,1\}$, which are $k$-wise independent, but not necessarily uniform. Then
$$\Big|\Pr\Big[\bigwedge_{i=1}^n Y_i = 1\Big] - \prod_{i=1}^n \Pr[Y_i = 1]\Big| \le 2^{-\Omega(k)}.$$
This fundamental theorem appeared in the conference version of [2], but was removed in the journal version; one of the few cases where the journal version contains fewer results than the conference version.
Proof. Since each $Y_i$ is in $\{0,1\}$, by De Morgan's law we can write
$$\Pr\Big[\bigwedge_i Y_i = 1\Big] = 1 - \Pr\Big[\bigvee_i (Y_i = 0)\Big].$$
If we define the event $E_i$ to be $\{Y_i = 0\}$, then $\bigvee_i (Y_i = 0)$ is the same as $\bigcup_i E_i$. Now we apply the inclusion-exclusion principle, which says
$$\Pr\Big[\bigcup_i E_i\Big] = \sum_i \Pr[E_i] - \sum_{i < j} \Pr[E_i \cap E_j] + \sum_{i < j < \ell} \Pr[E_i \cap E_j \cap E_\ell] - \cdots$$
We will finish the proof in the next lecture.
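Inclusion-exclusion itself is easy to verify mechanically. The following sketch checks the identity on a small, hypothetical sample space for three $0/1$ variables, with the events $E_i = \{Y_i = 0\}$ as in the proof:

```python
from itertools import combinations

# A small uniform sample space of outcomes (y1, y2, y3); E_i = {Y_i = 0}.
space = [(0, 0, 1), (0, 1, 0), (1, 1, 1), (1, 0, 0), (0, 0, 0), (1, 1, 0)]
E = [set(w for w in space if w[i] == 0) for i in range(3)]

union = len(set().union(*E)) / len(space)

# Inclusion-exclusion: Pr[U E_i] = sum Pr[E_i] - sum Pr[E_i ∩ E_j] + ...
ie = 0.0
for r in range(1, 4):
    sign = (-1) ** (r + 1)
    for idx in combinations(range(3), r):
        inter = set.intersection(*(E[i] for i in idx))
        ie += sign * len(inter) / len(space)

assert abs(union - ie) < 1e-9
# And, as in the proof, Pr[AND Y_i = 1] = 1 - Pr[U E_i].
p_and = len([w for w in space if all(w)]) / len(space)
assert abs(p_and - (1 - union)) < 1e-9
```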
References