In the previous post we reduced the “three-step mixing” over SL(2,q), the group of 2×2 matrices with determinant 1 over the field with q elements, to the following statement about mixing of conjugacy classes.

**Theorem 1.** [Mixing of conjugacy classes of SL(2,q)] Let G = SL(2,q). With probability ≥ 1 - |G|^{-Ω(1)} over uniform a, b in G, the distribution C(a)C(b) is |G|^{-Ω(1)}-close in statistical distance to uniform.

Here and throughout this post, C(g) denotes a uniform element from the conjugacy class of g, and every occurrence of C corresponds to an independent draw.

In this post we sketch a proof of Theorem 1, following [GV15]. Similar theorems had already been proved. For example, Shalev [Sha08] proves a version of Theorem 1 without a quantitative bound on the statistical distance. It is possible to plug some more representation-theoretic information into Shalev’s proof and get the same quantitative bound as in Theorem 1, though I don’t have a good reference for this extra information. However, the proof in [Sha08] builds on a number of other results, which means that adapting it to a similar but not identical statement, as we had to do in [GV15], would be hard for me.

Instead, here is how you can proceed to prove the theorem. First, we remark that the distribution of C(a)C(b) is the same as that of

C(C(a)C(b)),

because for uniform x, y, and z in G we have the following equalities of distributions:

C(C(a)C(b)) = x^{-1}(y^{-1}ayz^{-1}bz)x = x^{-1}(y^{-1}ayxx^{-1}z^{-1}bz)x = C(a)C(b)

where the last equality follows by replacing y with yx and z with zx.
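If you want to see this “free conjugation” on a computer, here is a minimal brute-force sketch. It assumes q is a small prime p, so that field arithmetic is just arithmetic mod p, and the helper names (`sl2`, `mul`, `inv`, `conj`, `product_counts`) are mine, chosen for illustration. It tabulates how often each group element arises as (y^{-1}ay)(z^{-1}bz) and checks that the table is unchanged when every element is conjugated by an arbitrary x.

```python
from collections import Counter
from itertools import product

P = 3  # a small prime, chosen only so that brute force is instant


def sl2(p):
    """All matrices (a, b, c, d) = [[a, b], [c, d]] over F_p with ad - bc = 1."""
    return [m for m in product(range(p), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % p == 1]


def mul(m, n, p):
    """Product of two 2x2 matrices mod p."""
    a, b, c, d = m
    e, f, g, h = n
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)


def inv(m, p):
    """Inverse of a determinant-1 matrix (its adjugate) mod p."""
    a, b, c, d = m
    return (d % p, -b % p, -c % p, a % p)


def conj(g, x, p):
    """Return x^{-1} g x."""
    return mul(mul(inv(x, p), g, p), x, p)


def product_counts(a, b, p):
    """How often each element arises as (y^{-1} a y)(z^{-1} b z) over all y, z."""
    G = sl2(p)
    return Counter(mul(conj(a, y, p), conj(b, z, p), p) for y in G for z in G)


def is_conjugation_invariant(counts, p):
    """True if g and x^{-1} g x always have the same count."""
    G = sl2(p)
    return all(counts[g] == counts[conj(g, x, p)] for g in G for x in G)


a = (1, 1, 0, 1)        # [[1, 1], [0, 1]]
b = (0, 1, P - 1, 0)    # [[0, 1], [-1, 0]]
counts = product_counts(a, b, P)
print(is_conjugation_invariant(counts, P))
```

The particular a and b are arbitrary; any two elements of the group should pass the same check.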

That means that we get one conjugation “for free” and we just have to show that C(a)C(b) falls into various conjugacy classes with the right probability.

Now the great thing about SL(2,q) is that you can essentially think of it as made up of q conjugacy classes each of size q^{2} (the whole group has size q^{3} – q). This is of course not exactly correct; in particular, the identity element obviously gives a conjugacy class of size 1. But except for a constant number of conjugacy classes, every conjugacy class has size q^{2} up to lower-order terms. This means that what we have to show is simply that the conjugacy class of C(a)C(b) is essentially uniform over conjugacy classes.
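To get a feel for these class sizes, here is a small brute-force sketch that lists the sizes of all conjugacy classes. It again assumes q is a small prime p, and the function names are illustrative only.

```python
from itertools import product


def sl2(p):
    """All matrices (a, b, c, d) = [[a, b], [c, d]] over F_p with ad - bc = 1."""
    return [m for m in product(range(p), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % p == 1]


def mul(m, n, p):
    """Product of two 2x2 matrices mod p."""
    a, b, c, d = m
    e, f, g, h = n
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)


def inv(m, p):
    """Inverse of a determinant-1 matrix (its adjugate) mod p."""
    a, b, c, d = m
    return (d % p, -b % p, -c % p, a % p)


def conjugacy_class_sizes(p):
    """Sorted list of the sizes of the conjugacy classes of SL(2,p)."""
    G = sl2(p)
    seen = set()
    sizes = []
    for g in G:
        if g in seen:
            continue
        # The class of g is the set of all x^{-1} g x.
        cls = {mul(mul(inv(x, p), g, p), x, p) for x in G}
        seen |= cls
        sizes.append(len(cls))
    return sorted(sizes)


print(conjugacy_class_sizes(5))
```

For p = 5 this prints [1, 1, 12, 12, 12, 12, 20, 20, 30], which adds up to 120 = q^{3} – q. At such a tiny q the lower-order terms are very visible: the large classes have size q(q – 1) or q(q + 1), which is q^{2} up to lower-order terms, and the handful of exceptional classes are the small ones.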

Next, the trace map Tr : SL(2,q) → F_{q} is essentially a bijection between conjugacy classes and the field F_{q}. To see this recall that the trace map satisfies the cyclic property:

Tr xyz = Tr yzx.

This implies that

Tr u^{-1}au = Tr auu^{-1} = Tr a,

and so conjugate elements have the same trace. On the other hand, the q matrices

\begin{pmatrix} x & 1 \\ 1 & 0 \end{pmatrix}

for x in F_{q} all have different traces, and by what we said above their conjugacy classes make up essentially all the group.
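The claim that the trace is essentially a bijection can also be checked by counting: if each trace value corresponds to about one class of size about q^{2}, then each value of F_{q} should be the trace of roughly q^{2} group elements. A minimal sketch, assuming q is a small prime p (function names are mine):

```python
from collections import Counter
from itertools import product


def sl2(p):
    """All matrices (a, b, c, d) = [[a, b], [c, d]] over F_p with ad - bc = 1."""
    return [m for m in product(range(p), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % p == 1]


def trace_counts(p):
    """Number of elements of SL(2,p) with each trace value."""
    return Counter((m[0] + m[3]) % p for m in sl2(p))


p = 5
counts = trace_counts(p)
for t in range(p):
    print(t, counts[t], "vs q^2 =", p * p)
```

For p = 5 the counts are 30, 20, 25, 25, 20 for traces 0 through 4, so every trace value is hit by q^{2} elements up to an additive error of at most q.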

Putting everything together, what we are trying to show is that

Tr C(a)C(b)

is |G|^{-Ω(1)}-close to uniform over F_{q} in statistical distance.

Furthermore, again by the cyclic property we can consider without loss of generality

Tr aC(b)

instead, and moreover we can let a have the form

\begin{pmatrix} 0 & 1 \\ 1 & w \end{pmatrix}

and b have the form

\begin{pmatrix} v & 1 \\ 1 & 0 \end{pmatrix}

(there is no important reason why w is at the bottom rather than at the top).

Writing a generic u in SL(2,q) as the matrix

\begin{pmatrix} u_{1} & u_{2} \\ u_{3} & u_{4} \end{pmatrix}

you can now with some patience work out the expression (here b is conjugated as ubu^{-1}, which for uniform u has the same distribution as u^{-1}bu)

Tr aubu^{-1} = vu_{3}u_{4} – u_{3}^{2} + u_{4}^{2} – vu_{1}u_{2} + u_{1}^{2} – vwu_{2}u_{3} + wu_{1}u_{3} – u_{2}^{2} – wu_{2}u_{4}.
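If you would rather not spend the patience, the expression can be checked mechanically. The sketch below assumes q is a small prime p and takes the matrices a and b exactly as written above. It conjugates b as ubu^{-1} (for uniform u this has the same distribution as u^{-1}bu), since with the entries of u labelled as above this is the orientation in which the expression matches term by term. Function names are mine.

```python
from itertools import product


def sl2(p):
    """All matrices (u1, u2, u3, u4) = [[u1, u2], [u3, u4]] over F_p with det 1."""
    return [m for m in product(range(p), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % p == 1]


def mul(m, n, p):
    """Product of two 2x2 matrices mod p."""
    a, b, c, d = m
    e, f, g, h = n
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)


def inv(m, p):
    """Inverse of a determinant-1 matrix (its adjugate) mod p."""
    a, b, c, d = m
    return (d % p, -b % p, -c % p, a % p)


def trace_poly(u, v, w, p):
    """The nine-term expression from the text, evaluated mod p."""
    u1, u2, u3, u4 = u
    return (v * u3 * u4 - u3 * u3 + u4 * u4 - v * u1 * u2 + u1 * u1
            - v * w * u2 * u3 + w * u1 * u3 - u2 * u2 - w * u2 * u4) % p


def trace_direct(u, v, w, p):
    """Trace of a * (u b u^{-1}) computed by matrix multiplication."""
    a = (0, 1, 1, w % p)   # [[0, 1], [1, w]]
    b = (v % p, 1, 1, 0)   # [[v, 1], [1, 0]]
    m = mul(a, mul(mul(u, b, p), inv(u, p), p), p)
    return (m[0] + m[3]) % p


def identity_holds(p):
    """Compare both computations for every u in SL(2,p) and every v, w in F_p."""
    return all(trace_poly(u, v, w, p) == trace_direct(u, v, w, p)
               for u in sl2(p) for v in range(p) for w in range(p))


print(identity_holds(5))
```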

What we want to show is that for typical choices of w and v, the value of this polynomial is q^{-Ω(1)}-close to uniform over F_{q} for a uniform choice of u subject to the determinant of u being 1, i.e., u_{1}u_{4} – u_{2}u_{3} = 1.

Maybe there is some machinery that immediately does that. Lacking the machinery, you can use the equation u_{1}u_{4} – u_{2}u_{3} = 1 to eliminate u_{4}, writing u_{4} = (1 + u_{2}u_{3})/u_{1} (the cases where u_{1} = 0 are few and do not affect the final answer). Now you end up with a polynomial p in three variables, which we can rename x, y, and z. You want to show that p(x,y,z) is close to uniform for uniform choices of x, y, z. The benefit of this substitution is that we removed the annoying condition that the determinant is one.
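Here is a hedged numerical sketch of this substitution, assuming q is a small prime p. It sets u_{1} = x ≠ 0, u_{2} = y, u_{3} = z, solves the determinant equation for u_{4}, evaluates the nine-term trace expression above, and measures the statistical distance of the result from uniform. The function names and the specific values of v, w, and p are mine, for illustration only.

```python
from collections import Counter


def trace_poly(u1, u2, u3, u4, v, w, p):
    """The nine-term trace expression from the text, evaluated mod p."""
    return (v * u3 * u4 - u3 * u3 + u4 * u4 - v * u1 * u2 + u1 * u1
            - v * w * u2 * u3 + w * u1 * u3 - u2 * u2 - w * u2 * u4) % p


def value_counts(v, w, p):
    """Histogram of the expression over all (x, y, z) with x != 0."""
    counts = Counter()
    for x in range(1, p):
        x_inv = pow(x, p - 2, p)  # inverse of x mod the prime p (Fermat)
        for y in range(p):
            for z in range(p):
                u4 = (1 + y * z) * x_inv % p  # forces x*u4 - y*z = 1
                counts[trace_poly(x, y, z, u4, v, w, p)] += 1
    return counts


def distance_from_uniform(v, w, p):
    """Statistical distance between the histogram and uniform over F_p."""
    counts = value_counts(v, w, p)
    total = (p - 1) * p * p
    return 0.5 * sum(abs(counts[t] / total - 1 / p) for t in range(p))


for p in (5, 11, 23):
    print(p, distance_from_uniform(2, 1, p))
```

Running it for a few primes gives a rough picture of how the distance behaves as q grows; it is an experiment, not a substitute for the argument that follows.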

To argue about p(x,y,z), the DeMillo–Lipton–Schwartz–Zippel lemma comes to mind, but it is not sufficient for our purposes. It is consistent with that lemma that the polynomial misses a constant fraction of the values of the field, which would give a constant statistical distance. One has to use a more powerful result, the Lang-Weil theorem. This theorem provides, under suitable conditions on p, a sharp bound on the probability that p(x,y,z) = a for a fixed a in F_{q}. The probability is 1∕q plus lower-order terms, and then by summing over all a in F_{q} one obtains the desired bound on the statistical distance.
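Spelled out, the summation step looks as follows. This is only a sketch: it assumes the point-wise bound has the form below for some constant c > 0, with an implied constant that does not depend on a. The symbols c and \Delta are introduced here for illustration.

```latex
\Pr_{x,y,z}\bigl[p(x,y,z) = a\bigr] = \frac{1}{q} + O\bigl(q^{-1-c}\bigr)
\quad \text{for every } a \in F_q
\;\Longrightarrow\;
\Delta = \frac{1}{2} \sum_{a \in F_q} \Bigl| \Pr\bigl[p(x,y,z) = a\bigr] - \frac{1}{q} \Bigr|
\le \frac{1}{2} \cdot q \cdot O\bigl(q^{-1-c}\bigr) = O\bigl(q^{-c}\bigr).
```

The sum has only q terms, so a point-wise error that beats 1∕q by a polynomial factor is exactly what is needed to get a q^{-Ω(1)} bound on the statistical distance.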

I am curious if there is a way to get the statistical distance bound without first proving a point-wise bound.

To apply the Lang-Weil theorem you have to show that the polynomial is “absolutely irreducible,” i.e., irreducible over any algebraic extension of the field. This can be proven from first principles by a somewhat lengthy case analysis.