If you are in a room containing 30 people, here is a good bet for you: Say that at least two will share the same birthday (day and month, not necessarily year). You will win the bet over 70% of the time. Even if there are only 23 people in the room you will win more often than you lose. This is called the birthday paradox, since with so many days in the year, we expect that we should need many more than 23 people to have a 50% chance of a match.
This is a well-known result, based on some fairly simple analysis, which assumes that the people are randomly selected and that each day of the year (except 29th February, which is omitted) is equally probable for a birthday. Under this uniform assumption, the probability of a match is 47.6% for 22 people, 50.7% for 23 people and 70.6% for 30 people.
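For anyone who wants to check those figures, the uniform calculation is a short product: the chance that n people all have different birthdays is (365/365) x (364/365) x ... x ((365 - n + 1)/365), and the chance of a match is one minus that. Here is a minimal Python sketch; the function name is just an illustrative choice.

```python
def uniform_match_probability(n_people, days=365):
    """Chance that at least two of n_people share a birthday,
    assuming every one of `days` days is equally likely."""
    p_all_different = 1.0
    for k in range(n_people):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


for n in (22, 23, 30):
    print(n, round(100 * uniform_match_probability(n), 1))
```

Run as written, this prints 47.6, 50.7 and 70.6 for 22, 23 and 30 people, the same percentages quoted above.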
But in reality, birthdays are not spread uniformly throughout the year. In England and Wales in the 20 years from 1995 to 2014, the number of babies born on Boxing Day (26th December) was only 74.7% of what it would have been if birthdays were uniform throughout the year. For Christmas Day (25th December) it was 78.2%, for 1st January 86.5%, for 27th December 92.0%, for Christmas Eve (24th December) 93.5% and for 1st April 93.6%. At the other end of the scale, on 26th September it was 108.5%, and for every day in September, except 1st September, it was over 100% compared to a uniform distribution.
Also, 29th February does come round once every four years, and it had 24.3% of the birthdays that would be expected under a uniform distribution, very close to the 25% it would have if it were a typical day.
The obvious question is how does this non-uniformity affect the birthday paradox? It is fairly easy to see that uniform birthdays give the largest number of random people you need in a room to have a 50% chance of a match. So non-uniform birthdays will mean that at most 23 people will be needed to have a 50% chance of a match. But does the actual non-uniformity we see in the birthday data mean the number drops to 22, 21 or even below? In other words, is the birthday paradox even more paradoxical in the real world than in the idealised world of uniform birthdays?
To tackle this algebraically requires some pretty advanced mathematics and, to the best of my knowledge, has not been attempted for England and Wales (please contact me if you know differently). But simulating it numerically, by sticking all the data into a spreadsheet and doing thousands of runs to see what happens, is reasonably simple, and made a fun New Year exercise. Using the actual England and Wales birthday data for 1995 to 2014, including 29th February, the probability of a match is 47.5% +/- 0.1% for 22 people, 50.7% +/- 0.1% for 23 people, and 70.7% +/- 0.2% for 30 people, essentially unchanged from the idealised result with uniform birthdays omitting 29th February. The error bounds are the 95% confidence intervals from 100,000 numerical simulations.
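The same kind of simulation is easy to sketch in code rather than a spreadsheet. The version below is only an assumption about how such a run might look, not the spreadsheet actually used: `weights` is a hypothetical list of relative birth frequencies, one per calendar day, and the uniform list in the usage lines is a stand-in because the England and Wales figures are not reproduced here.

```python
import random
from itertools import accumulate


def estimate_match_probability(weights, n_people, trials=100_000, seed=None):
    """Estimate the chance that at least two of n_people share a birthday.

    `weights` holds one relative frequency per calendar day (hypothetical
    input; they need not sum to 1).
    """
    rng = random.Random(seed)
    days = range(len(weights))
    cum_weights = list(accumulate(weights))  # computed once, reused every trial
    matches = 0
    for _ in range(trials):
        birthdays = rng.choices(days, cum_weights=cum_weights, k=n_people)
        # A repeated day means the set is smaller than the group.
        if len(set(birthdays)) < n_people:
            matches += 1
    return matches / trials


# Stand-in data: 365 equally likely days. Real daily birth counts
# (366 entries if 29th February is included) would go here instead.
uniform_weights = [1.0] * 365
print(estimate_match_probability(uniform_weights, 23, trials=20_000, seed=1))
```

With uniform weights the estimate should land close to the idealised 50.7% for 23 people, give or take the sampling noise of the run.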
How far from uniform would birthdays have to be to alter the result? If we assume that two days each month are felt to be ‘lucky’ and so have twice the usual number of birthdays, while six bank holidays a year are inconvenient for hospitals scheduling elective births and so have only half the usual number of birthdays, then the probability of a match is 49.5% +/- 0.6% for 22 people, 52.6% +/- 0.4% for 23 people and 72.5% +/- 0.3% for 30 people. That’s how extreme the non-uniformity has to be to push the break-even result down from 23 to within a whisker of 22.
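A made-up scenario like this one can also be checked without simulation noise. The chance that n people all have different birthdays is n! times the sum, over every set of n distinct days, of the product of those days' probabilities, and that sum can be built up one day at a time. The sketch below is not the method used for the figures above. It assumes a 365-day year with 24 double-weight days and 6 half-weight days (whether 29th February belongs in this scenario is my assumption, and I have left it out). Which dates the unusual days fall on makes no difference to the answer, and all names are illustrative.

```python
import math


def exact_match_probability(weights, n_people):
    """Exact chance of a shared birthday for arbitrary day weights.

    e[k] accumulates the sum, over all sets of k distinct days seen so far,
    of the product of their probabilities.
    """
    total = sum(weights)
    e = [1.0] + [0.0] * n_people
    for w in weights:
        p = w / total
        # Go downwards so each day is used at most once per set.
        for k in range(n_people, 0, -1):
            e[k] += p * e[k - 1]
    p_all_different = math.factorial(n_people) * e[n_people]
    return 1.0 - p_all_different


# Assumed scenario: 24 'lucky' days at double weight, 6 bank holidays
# at half weight, and the remaining 335 days at normal weight.
skewed_weights = [2.0] * 24 + [0.5] * 6 + [1.0] * 335
for n in (22, 23, 30):
    print(n, round(100 * exact_match_probability(skewed_weights, n), 1))
```

Passing 365 equal weights reproduces the idealised figures, which makes a handy sanity check before trying other distributions.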
In today’s real world, provided you have a random sample of people, you won’t go wrong using the idealised result.
Happy New Year.