In the previous post we have reduced the “three-step mixing” over SL(2,q), the group of 2×2 matrices over the field with q elements with determinant 1, to the following statement about mixing of conjugacy classes.
Theorem 1.[Mixing of conjugacy classes of SL(2,q)] Let G = SL(2,q). With probability ≥ 1 -|G|-Ω(1) over uniform a,b in G, the distribution C(a)C(b) is |G|-Ω(1) close in statistical distance to uniform.
Here and throughout this post, C(g) denotes a uniform element from the conjugacy class of g, and every occurrence of C corresponds to an independent draw.
In this post we sketch a proof of Theorem 1, following [GV15]. Similar theorems were proved already. For example Shalev [Sha08] proves a version of Theorem 1 without a quantitative bound on the statistical distance. It is possible to plug some more representation-theoretic information into Shalev’s proof and get the same quantitative bound as in Theorem 1, though I don’t have a good reference for this extra information. However the proof in [Sha08] builds on a number of other things, which also means that if I have to prove a similar but not identical statement, as we had to do in [GV15], it would be hard for me.
Instead, here is how you can proceed to prove the theorem. First, we remark that the distribution of C(a)C(b) is the same as that of
because for uniform x, y, and z in Fq we have the following equalities of distributions:
C(C(a)C(b)) = x-1(y-1ayz-1bz)x = x-1(y-1ayxx-1z-1bz)x = C(a)C(b)
where the last equality follows by replacing y with yx and z with zx.
That means that we get one conjugation “for free” and we just have to show that C(a)C(b) falls into various conjugacy classes with the right probability.
Now the great thing about SL(2,q) is that you can essentially think of it as made up of q conjugacy classes each of size q2 (the whole group has size q3 – q). This is of course not exactly correct, in particular the identity element obviously gives a conjugacy class of size 1. But except for a constant number of conjugacy classes, every conjugacy class has size q2 up to lower-order terms. This means that what we have to show is simply that the conjugacy class of C(a)C(b) is essentially uniform over conjugacy classes.
Next, the trace map Tr : SL(2,q) → Fq is essentially a bijection between conjugacy classes and the field Fq. To see this recall that the trace map satisfies the cyclic property:
Tr xyz = Tr yzx.
This implies that
Tr u-1au = Tr auu-1 = Tr a,
and so conjugate elements have the same trace. On the other hand, the q matrixes
for x in Fq all have different traces, and by what we said above their conjugacy classes make up essentially all the group.
Putting altogether, what we are trying to show is that
is |G|-Ω(1) close to uniform over F q in statistical distance.
Furthermore, again by the cyclic property we can consider without loss of generality
instead, and moreover we can let a have the form
and b have the form
(there is no important reason why w is at the bottom rather than at the top).
Writing a generic g in SL(2,q) as the matrix
you can now with some patience work out the expression
Tr au-1bu = vu 3u4 – u32 + u 42 – vu 1u2 + u12 – vwu 2u3 + wu1u3 – u22 – wu 2u4.
What we want to show is that for typical choices of w and v, the value of this polynomial is q-Ω(1) close to uniform over F q for a uniform choice of u subject to the determinant of u being 1, i.e, u1u4 – u2u3 = 1.
Maybe there is some machinery that immediately does that. Lacking the machinery, you can use the equation u1u4 – u2u3 = 1 to remove u4 by dividing by u1 (the cases where u1 = 0 are few and do not affect the final answer). Now you end up with a polynomial p in three variables, which we can rename x, y, and z. You want to show that p(x,y,z) is close to uniform, for uniform choices for x,y,z. The benefit of this substitution is that we removed the annoying condition that the determinant is one.
To argue about p(x,y,z), the DeMillo–Lipton–Schwartz-Zippel lemma comes to mind, but is not sufficient for our purposes. It is consistent with that lemma that the polynomial doesn’t take a constant fraction of the values of the field, which would give a constant statistical distance. One has to use more powerful results known as the Lang-Weil theorem. This theorem provides under suitable conditions on p a sharp bound on the probability that p(x,y,z) = a for a fixed a in Fq. The probability is 1∕q plus lower-order terms, and then by summing over all a in Fq one obtains the desired bound on the statistical distance.
I am curious if there is a way to get the statistical distance bound without first proving a point-wise bound.
To apply the Lang-Weil theorem you have to show that the polynomial is “absolutely irreducible,” i.e., irreducible over any algebraic extension of the field. This can be proven from first principles by a somewhat lengthy case analysis.