The birthday paradox is the fact that if you sample t independent variables each uniform in {1, 2,,n} then the probability that two will be equal is at least a constant independent from n when t n. the The word ”paradox” refers to the fact that t can be as small as n, as opposed to being closer to n. (Here I am not interested in the precise value of this constant as a function of t.)

The Wikipedia page lists several proofs of the birthday paradox where it is not hard to see why the n bound arises. Nevertheless, I find the following two-stage approach more intuitive.

Divide the random variables in two sets of 0.5 n each. If there are two in the first set that are equal, then we are done. So we can condition on this event not happening, which means that the variables in the first set are all distinct. Now take any variable in the second set. The probability that it is equal to any variable in the first set is 0.5 n∕n = 0.5 n. Hence, the probability that all the variables in the second set are different from those in the first is

(1 0.5 n)0.5 n e0.25 < 1.

## 3 thoughts on “The birthday paradox”

1. kodlu says:

Nice. I am a fan of birthday paradox arguments. This is how I needed to spell it out, to understand it.

Since you conditioned on the first half being distinct, call that event L, exp(-1/4) is of course a conditional probability. It is P(A|L).

Call the event that all are distinct A.

P(A)=P(A|L) P(L) + P(A|L’) P(L’)

where L’ is the complement of L. Note that A \subset L. So P(A|L’)=0.

So P(A)=P(A|L) P(L) < P(A|L) < 1, and we are done.

2. iart says:

The $\sqrt{n}$ symbol is not rendered on your post.

1. Manu says:

Thanks. I used the √ HTML command; not sure what wordpress is trying to do. I see it correctly on Firefox but indeed there is some strange behavior on Chrome.