It is winter, and the nights are getting colder. Actually the nights seem to be staying the same temperature, while the early hours of the morning seem to be getting freezing, which is what is causing the whole problem. After waking up too cold one morning, I decided that it was time for my winter duvet. So I dragged the massive down duvet (which weighs more than I do) from the top of the cupboard and replaced my flimsy summer duvet. The problem is that evenings are still warm, and my duvet ensures that I am kept uncomfortably hot, and unable to sleep.

Lying awake last night, my mind went wandering. Eventually it settled on something which I’d read, maybe six or seven years ago. Apparently, in a group of 50 people, the chance of two or more people having a birthday on the same day is around 97%. I found this slightly incredible, and my subconscious mind seemed to want verification.

(Note: Not a side note. Anyone not mathematically inclined may find themselves slightly lost in the rest of this post. You have been warned.)

So I set up a statistics problem in my head to calculate the probability of two or more people sharing a birthday in a sample size on n. Starting with the obvious, I set it clear in my mind that for n = 1, the probability is 0. For a sample size of 2, the probability would be 1 in 365.25 (to take leap years into account). For three people, the probability that the third person shares a birthday with either of the first two is added in. Eventually I came up with the nice simple equation of n*(n-1)/730.5. Unfortunately, this equation did not satisfy the obvious conditions from the other end. You’d expect the probability in a group of 366 people to be 100%. In fact, my formula gives 18000%. A slight overestimate. A little annoyed, I finally fell asleep.

The problem returned to me this morning. Some vague recollection of the statistics course I did last year made me realise that the focus should not be on the probability of two people sharing a birthday, but rather on the probability of two people not sharing a birthday. This would make the probabilities multiplicative rather than additive, which would solve the whole problem.

This meant that the probability of not sharing a birthday with anyone already in the group when the nth person is added is (1-(n-1)/365.25) multiplied by the probability for the rest of the group. That gave a nice iterative solution, but I don’t like iterative solutions (Actually I do. I love iterative solutions, but I also love it when something horribly complicated comes down to one simple formula). Wherever possible, I want a nice short equation that requires no understanding of advanced maths to solve. So I wrote down the iterative solution, and multiplied it out and got a simple expression involving factorials – the largest being 365!. So I pulled out my little scientific calculator, forgetting that it squeals when shown anything more than 70!. Since I happened to be sitting in front of a computer, I turned to that, and started Microsoft Excel.

(On a side note: If I’d been at home, I would have opened Python. If I’d been at university, I’d have opened Matlab. But I was at work, and Excel was the most powerful numerical computational tool I had available. Excel is limited, but has more power than most people give it credit for.)

Excel, however, was only mildly more powerful than my calculator, and pronounced its unintelligible death cry of “#NUM!” at 171!. With no tool more powerful, I had to think more broadly. I thought, “If I was some super genius, what would I do?”, and came up with an answer. I’d reprogram Excel. Unfortunately, I’m not a super genius and wouldn’t know where to begin. I’m only a moderately smarter than an average genius, and had to settle for creating a whole new technique for handling very, very large numbers. Luckily, I was saved the trouble by whoever it was that invented scientific notation. Using this principle, I set up a simple algorithm for finding the factorial of really large numbers.

I realise I only needed the algorithm to work up to 365!, but I do have a tendency to overdo this sort of thing. Limited only by the 65536 rows available in Excel I set out to prove that I could not be defeated. Upon reaching the end, and finding that 65535! is in fact 7.878034e287 188 (A number almost 300 000 digits long), I realised that there were 256 columns available, and I’d only used 7. Laughing, I started the algorithm with 65536 in column H, and double clicked the little black box to copy the cell contents to the bottom of the spreadsheet. The computer laughed back as its RAM was used up. It is incredible annoying how the computer just displays a white screen and refuses to do anything. The nice thing is that when it does finally give you a message, it allows you to drag the message over the screen, and doesn’t refresh anything behind it, which lets you get rather fascinating patterns, depending on how quickly you can drag the message box..

Eventually, I worked out a way past the RAM limitations of the computer and was driven by an overwhelming desire to find out when my algorithm would cease to work.. So I copied the algorithm all the way to column IR and created a spreadsheet that took several minutes to process a save request (and produced a 499 MB file). I finally found that 2 359 260! is 1.073258e14 010 425 (that’s a 14 million digit number). I hate it when I overdo things like this, but I’m always driven by curiosity.

So driven in fact, that I typed exactly 1000 words on it, and forgot completely about the probability of people sharing birthdays.

Lying awake last night, my mind went wandering. Eventually it settled on something which I’d read, maybe six or seven years ago. Apparently, in a group of 50 people, the chance of two or more people having a birthday on the same day is around 97%. I found this slightly incredible, and my subconscious mind seemed to want verification.

(Note: Not a side note. Anyone not mathematically inclined may find themselves slightly lost in the rest of this post. You have been warned.)

So I set up a statistics problem in my head to calculate the probability of two or more people sharing a birthday in a sample size on n. Starting with the obvious, I set it clear in my mind that for n = 1, the probability is 0. For a sample size of 2, the probability would be 1 in 365.25 (to take leap years into account). For three people, the probability that the third person shares a birthday with either of the first two is added in. Eventually I came up with the nice simple equation of n*(n-1)/730.5. Unfortunately, this equation did not satisfy the obvious conditions from the other end. You’d expect the probability in a group of 366 people to be 100%. In fact, my formula gives 18000%. A slight overestimate. A little annoyed, I finally fell asleep.

The problem returned to me this morning. Some vague recollection of the statistics course I did last year made me realise that the focus should not be on the probability of two people sharing a birthday, but rather on the probability of two people not sharing a birthday. This would make the probabilities multiplicative rather than additive, which would solve the whole problem.

This meant that the probability of not sharing a birthday with anyone already in the group when the nth person is added is (1-(n-1)/365.25) multiplied by the probability for the rest of the group. That gave a nice iterative solution, but I don’t like iterative solutions (Actually I do. I love iterative solutions, but I also love it when something horribly complicated comes down to one simple formula). Wherever possible, I want a nice short equation that requires no understanding of advanced maths to solve. So I wrote down the iterative solution, and multiplied it out and got a simple expression involving factorials – the largest being 365!. So I pulled out my little scientific calculator, forgetting that it squeals when shown anything more than 70!. Since I happened to be sitting in front of a computer, I turned to that, and started Microsoft Excel.

(On a side note: If I’d been at home, I would have opened Python. If I’d been at university, I’d have opened Matlab. But I was at work, and Excel was the most powerful numerical computational tool I had available. Excel is limited, but has more power than most people give it credit for.)

Excel, however, was only mildly more powerful than my calculator, and pronounced its unintelligible death cry of “#NUM!” at 171!. With no tool more powerful, I had to think more broadly. I thought, “If I was some super genius, what would I do?”, and came up with an answer. I’d reprogram Excel. Unfortunately, I’m not a super genius and wouldn’t know where to begin. I’m only a moderately smarter than an average genius, and had to settle for creating a whole new technique for handling very, very large numbers. Luckily, I was saved the trouble by whoever it was that invented scientific notation. Using this principle, I set up a simple algorithm for finding the factorial of really large numbers.

I realise I only needed the algorithm to work up to 365!, but I do have a tendency to overdo this sort of thing. Limited only by the 65536 rows available in Excel I set out to prove that I could not be defeated. Upon reaching the end, and finding that 65535! is in fact 7.878034e287 188 (A number almost 300 000 digits long), I realised that there were 256 columns available, and I’d only used 7. Laughing, I started the algorithm with 65536 in column H, and double clicked the little black box to copy the cell contents to the bottom of the spreadsheet. The computer laughed back as its RAM was used up. It is incredible annoying how the computer just displays a white screen and refuses to do anything. The nice thing is that when it does finally give you a message, it allows you to drag the message over the screen, and doesn’t refresh anything behind it, which lets you get rather fascinating patterns, depending on how quickly you can drag the message box..

Eventually, I worked out a way past the RAM limitations of the computer and was driven by an overwhelming desire to find out when my algorithm would cease to work.. So I copied the algorithm all the way to column IR and created a spreadsheet that took several minutes to process a save request (and produced a 499 MB file). I finally found that 2 359 260! is 1.073258e14 010 425 (that’s a 14 million digit number). I hate it when I overdo things like this, but I’m always driven by curiosity.

So driven in fact, that I typed exactly 1000 words on it, and forgot completely about the probability of people sharing birthdays.

*If you enjoyed this post, then don't forget to like, tweet, +1, or upvote on reddit. If you have any questions, comments or complaints, post them using the form below.*

. . . . . . . . . . . . . . . . . . . . . . . .

## 1 comment:

Do a word count. Its exactly 1000 words. My English teachers at school used to hate me.

Post a Comment