Non-abelian groups behave in ways that are useful in computer science. Barrington’s famous result [Bar89] shows that we can write efficiently an arbitrary low-depth computation as a group product over any non-solvable group. (Being non-solvable is a certain strengthening of being non-abelian which is not important now.) His result, which is false for abelian groups, has found myriad applications in computer science. And it is amusing to note that actually results about representing computation as group products were obtained twenty years before Barrington, see [KMR66]; but the time was not yet ripe.

This post is about a different property that certain non-abelian groups have and that is also useful. Basically, these groups ”mix” in the sense that if you have several distributions over the group, and the distributions have high entropy, then the product distribution (i.e., sample from each distribution and output the product) is very close to uniform.

First, let us quickly remark that this is completely false for abelian groups. To make an example that is familiar to computer scientists, consider the group of n-bit strings with bit-wise xor. Now let A be the uniform distribution over this group where the first bit is always 0. Then no matter how many independent copies of A you multiply together, the product is always A.

Remarkably, over other groups it is possible to show that the product distribution will become closer and closer to uniform. A group that works very well in this respect is SL(2,q), the group of 2×2 matrices over the field with q elements with determinant 1. This is a group that in some sense is very far from abelian. In particular, one can prove the following result.

**Theorem 1.**[Three-step mixing [Gow08, BNP08]] Let G = SL(2,q), and let A, B, and C be three subsets of G of constant density. Let a, b, and c be picked independently and uniformly from A, B, and C respectively. Then for any g in G we have

| Pr[abc = g] – 1∕|G|| < 1∕|G|^{1+Ω(1)}.

Note that the conclusion of the theorem in particular implies that abc is supported over the entire group. This is remarkable, since the starting distributions are supported over only a small fraction of the group. Moreover, by summing over all elements g in the group we obtain that abc is polynomially close to uniform in statistical distance.

Theorem 1 can be proved using representation theory. This must be a great tool, but for some reason I always found it a little difficult to digest the barrage of definitions that usually anticipate the interesting stuff.

Luckily, there is another way to prove Theorem 1. I wouldn’t be surprised if this is in some sense the same way, and moreover this other way is not sometimes I would call elementary. But it is true that I will be able to sketch a proof of the theorem without using the word ”representation”. In this post we will prove some preliminary results that are valid for all groups, and the most complicated thing used is the Cauchy-Schwarz inequality. In the next post we will work specifically with the group SL(2,q), and use more machinery. This is all taken from this paper with Gowers [GV15] (whose main focus is the study of mixing in the presence of dependencies).

First, for convenience let us identify a set A with its characteristic function. So we write A(a) for a belongs to the set A. It is convenient to work with a slightly different statement:

**Theorem 2.** Let G = SL(2,q) and let A,B,C be three subsets of G of densities α,β,γ respectively. For any g in G,

|E_{abc=g}A(a)B(b)C(c) – αβγ|≤|G|^{-Ω(1)}

where the expectation is over uniform elements a, b, and c from the group G such that their product is equal to g.

This Theorem 2 is equivalent to Theorem 1, because

E_{abc=g}A(a)B(b)C(c) |
= Pr[A(a),B(b),C(c)|abc = g] | ||

= Pr[abc = g|A(a),B(b),C(c)]|G|αβγ |

by Bayes’ rule. So we can get Theorem 1 by dividing by |G|αβγ.

Now we observe that to prove this ”mixing in three steps” it actually suffices to prove mixing in *four* steps.

**Theorem 3.**[Mixing in four steps] Let G = SL(2,q) and let A,B,C,D be four subsets of G of densities α,β,γ,δ respectively. For any g in G,

E_{abcd=g}A(a)B(b)C(c)D(d) – αβγδ ≤|G|^{-Ω(1)},

where the expectation is over uniform elements a, b, c, and d from the group G such that their product is equal to g.

**Lemma 4.** Mixing in four steps implies mixing in three.

**Proof:** Rewrite

|E_{abc=g}A(a)B(b)C(c) – αβγ| = |E_{abc=g}f(a)B(b)C(c)|

where f(a) := A(a) – α.

In these proofs we will apply Cauchy-Schwarz several times. Each application ”loses a square,” but since we are aiming for an upper bound of the form 1∕|G|^{Ω(1)} we can afford any constant number of applications. Our first one is now:

(E_{abc=g}f(a)B(b)C(c))^{2} |
≤ (E_{ c}C(c)^{2})(E_{ c}(E_{ab=gc-1}f(a)B(b))^{2}) |
||

= γE_{c}E_{ab=a′b′=gc-1}f(a)B(b)f(a′)B(b′) |
|||

= γE_{ab=a′b′}(A(a) – α)B(b)B(b′)(A(a′) – α). |

There are four terms that make up the expectation. The terms that involve at least one α sum to -α^{2}β^{2}. The remaining term is the expectation of A(a)B(b)B(b′)A(a′). Note that ab = a′b′ is equivalent to ab(1∕b′)(1∕a′) = 1_{G}. Hence by Theorem 3 this expectation is at most |G|^{-Ω(1)}. QED

So what remains to see is how to prove mixing in four steps. We shall reduce the mixing problem to the following statement about the mixing of *conjugacy classes* of our group.

**Definition 5.** We denote by C(g) the conjugacy class {h^{-1}gh : h in G} of an element g in G. We also denote by C(g) the uniform distribution over C(g) for a uniformly selected g in G.

**Theorem 6.**[Mixing of conjugacy classes of SL(2,q)] Let G = SL(2,q). With probability ≥ 1 -|G|^{-Ω(1)} over uniform a,b in G, the distribution C(a)C(b) is |G|^{-Ω(1)} close in statistical distance to uniform.

Theorem 6 is proved in the next blog post. Here we just show that is suffices for our needs.

**Lemma 7.** Theorem 6 implies Theorem 3.

**Proof:** We rewrite the quantity to bound as

E_{a,c}(A(a)C(c)E_{b,d:abcd=g}f(b,d))

for f(b,d) = B(b)D(d) – βδ.

Now by Cauchy-Schwarz we bound this above by

E_{a,c,b,d,b′,d′}f(b,d)f(b′,d′)

where the expectation is over variables such that abcd = g and ab′cd′ = g. As in the proof that mixing in four steps implies mixing in three, we can rewrite the last two equations as the single equation bcd = b′cd′.

The fact that the same variable c occurs on both sides of the equation is what gives rise to conjugacy classes. Indeed, this equation can be rewritten as

c^{-1}(1∕b)b′c = d(1∕d′).

Performing the following substitutions: b = x,b′ = xh,d′ = y we can rewrite our equation as

d = c^{-1}hcy.

Hence we have reduced our task to that of bounding

Ef(x,C(h)y)f(xh,y)

for uniform x,y,h.

We can further replace y with C(h)^{-1}y, and rewrite the expression as

E_{x,y}(f(x,y)E_{h}f(xh,C(h^{-1})y)).

This is at most

(E_{x,y}f^{2}(x,y))E_{ x,y,h,h′}f(xh,C(h^{-1})y)f(xh′,C(h′^{-1})y).

Recalling that f(b,d) = B(b)D(d) – βδ, and that E[f] = βδ, the first factor is at most 1. The second can be rewritten as

E_{x,y,h,h′}f(x,y)f(xh^{-1}h′,C(h′^{-1})C(h)y)

replacing x with xh^{-1} and y with C(h^{-1})^{-1}y = C(h)y.

Again using the definition of f this equals

E_{x,y,h,h′}B(x)D(y)B(xh^{-1}h′)D(C(h′^{-1})C(h)y) – β^{2}δ^{2}.

Now Lemma 6 guarantees that the distribution (x,y,xh^{-1}h′,C(h′^{-1})C(h)y) is 1∕|G|^{Ω(1)}-close in statistical distance to the uniform distribution over G^{4}, and this concludes the proof. QED

### References

[Bar89] David A. Mix Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC^{1}. J. of Computer and System Sciences, 38(1):150–164, 1989.

[BNP08] László Babai, Nikolay Nikolov, and László Pyber. Product growth and mixing in finite groups. In ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 248–257, 2008.

[Gow08] W. T. Gowers. Quasirandom groups. Combinatorics, Probability & Computing, 17(3):363–387, 2008.

[GV15] W. T. Gowers and Emanuele Viola. The communication complexity of interleaved group products. In ACM Symp. on the Theory of Computing (STOC), 2015.

[KMR66] Kenneth Krohn, W. D. Maurer, and John Rhodes. Realizing complex Boolean functions with simple groups. Information and Control, 9:190–195, 1966.