This lab introduces three probability questions that are famous for their ability to confuse people. We show how to calculate the answers and how to use S-Plus to verify your calculations.

THE BIRTHDAY PROBLEM

There are 26 people in a room. What is the chance that two of them have the same birthday?

Most people guess that the chance is quite small. In fact, it is greater than 50%. We calculate first with S-Plus, then mathematically.

S-Plus calculations

We'll ignore leap years and take 365 as the number of days in a year. Sampling a person is like drawing a number from 1 to 365. We want to draw 26 numbers (with replacement) and check whether any of them are equal. To draw the 26 numbers we say

bday _ sample ( 1:365, size=26, replace=T )

For me this gave
344 18 24 306 185 197 261 22 50 345 306 273 275 31 145 31 98 359 237 276 290 86 134 306 173 281

The S-Plus command duplicated(bday) tells whether there are any duplicated values in bday. For me,

duplicated(bday)

yields

F F F F F F F F F F T F F F F T F F F F F F F T F F
This is a vector of F(alse) and T(rue), the same length as bday. It's T(rue) is positions 11, 16 and 24 because the 11th and 16th and 24th birthdays are duplicates of earlier ones. The other positions are F(alse) because they are not duplicates of earlier ones. This particular set of 26 birthdays has 3 duplicates.

Of course we don't want to check each set of birthdays by eye; we want S-Plus to do it for us. We can use the any() function, which checks whether any of its arguments are True. In our case

any ( duplicated ( bday ))

yields T, because positions 11, 16 and 24 are True.

Now we want to repeat the procedure many times. In this example we'll repeat N.reps=1000 times. We'll let N.people=26 be the number of people in the room. Calling it N.people will make it easy to change later. We'll begin by making a 1000 by N.people matrix of birthdays. We'll treat each row as a replicate of the birthday problem and we'll count how many rows contain duplicates.

N.reps _ 1000
N.people _ 26
bday _ matrix ( sample ( 1:365, N.people*N.reps, replace=T), nrow=N.reps )
dup _ apply ( bday, 1, duplicated )
has.dup _ apply ( dup, 2, any )
sum ( has.dup ) / N.reps

For me, this yielded about 62%. Try it for yourself. Try it with different values of N.reps and N.people.

Mathematical Calculations

The easiest way to calculate the chance of a duplicated birthday is
Chance(duplicated birthday) = 100% - Chance(all the people have different birthdays)
For all the people to have different birthdays we need the second person to have a different birthday than the firts (the chance is 364/365), the third to have a different birthday than the first two (the chance is 363/365), etc. So the answer is
Chance(duplicated birthday) = 100% - (364/365) (363/365) (362/365) ... (365-25)/365
= 100% - 40% = 60%.
You can do this calculation in S-Plus as

1 - prod ( 365:340 / rep(365,26) )

365:340 means the sequence of numbers (365 364 363 ... 341 340)
rep(365,26) means (365 365 365 ... 365 365)
So 365:340 / rep(365,26) means the sequence of numbers ( 365/365 364/365 363/365 ... 341/365 340/365)
prod ( ... ) means take their product; and that's exactly what you want.

Let's dig a little deeper. Instead of 26 people in the room, take a different number. Call it N.people. Let's plot the chance of a duplicated birthday against N.people.

N.people _ 10:50 # I'm taking N.people between 10 and 50 but you can use different numbers
prob.dup _ rep(NA, length(N.people)) # make an empty vector to hold the answer
for ( i in 1:length(N.people) ) {
numerators _ 365:(365-N.people[i]+1)
denominators _ rep ( 365, N.people[i] )
prob.dup[i] _ 1 - prod ( numerators / denominators )
}
plot ( N.people, prob.dup )

Many people find it surprising that with only a bit more than 20 people you are more likely than not to find a duplicated birthday, and that with around 50 people you are almost sure to have a duplicate.

LET'S MAKE A DEAL

Let's Make a Deal is an old television show that always ended with the following scenario. There are three doors. Behind one of them is a fabulous prize. You choose a door. Now the host, Monte Hall, opens one of the two remaining doors and shows you that it is empty. (He knows which door has the prize, so he can be sure of choosing an empty door to show you.) At this point you have the choice of keeping your original door, or trading for the remaining unopened door. If you choose the right door you get the prize; otherwise you get nothing. What should you do --- switch or keep your first choice?

For many people the answer is obvious. There are two unopened doors, they are equally likely to contain the prize, so it doesn't matter whether you switch. Although obvious, this answer is wrong. When Marilyn Vos Savant gave the correct answer in a newspaper column she was deluged with mail telling her that she had made a mistake. Some of the letters were from professional mathematicians. But Marilyn was right. We'll see how to simulate a solution in S-Plus and how to calculate an exact solution mathematically.

S-Plus calculations

Let's simulated this game N times. I'll use N=5000; but you can use a different number. First I'll make a vector that represents where the prize is. There will be N draws, each draw will be a 1, 2 or 3. Then I'll make a vector that represents my initial choice of door. It will also be N random draws of 1, 2 or 3.

N _ 5000
prize _ sample ( 1:3, N, replace=T )
choice _ sample ( 1:3, N, replace=T )

Now we have to simulate which door Monte shows me. See whether you can follow my S-Plus commands.

shown _ rep(NA,N)
for ( i in 1:N ) {
if ( prize[i]==1 & choice[i]==1 ) shown[i] _ sample ( c(2,3), 1 ) else
if ( prize[i]==1 & choice[i]==2 ) shown[i] _ 3 else
if ( prize[i]==1 & choice[i]==3 ) shown[i] _ 2 else
if ( prize[i]==2 & choice[i]==1 ) shown[i] _ 3 else
if ( prize[i]==2 & choice[i]==2 ) shown[i] _ sample ( c(1,3), 1 ) else
if ( prize[i]==2 & choice[i]==3 ) shown[i] _ 1 else
if ( prize[i]==3 & choice[i]==1 ) shown[i] _ 2 else
if ( prize[i]==3 & choice[i]==2 ) shown[i] _ 1 else
if ( prize[i]==3 & choice[i]==3 ) shown[i] _ sample ( c(1,2), 1 )
}

Now, for each simulated game we have to record whether keeping the original choice wins and whether switching wins. Whenever prize[i] equals choice[i] switching loses and keeping wins. Whenever prize[i] does not equal choice[i] switching wins and keeping loses. (Do you follow the logic? If not, print out prize[1:20] and choice[1:20] and verify that my logic is correct.) So we can keep track of wins and losses with the following S-Plus statements.

keep.wins _ rep ( NA, N )
switch.wins _ rep ( NA, N )
for ( i in 1:N ) {
if ( prize[i]==choice[i] ) {
keep.wins[i] _ 1
switch.wins[i] _ 0
} else {
keep.wins[i] _ 0
switch.wins[i] _ 1
}
}
sum(keep.wins)/N
sum(switch.wins)/N

For me, sum(keep.wins)/N = 0.3324 and sum(switch.wins)/N = 0.6676, suggesting that the strategy of keeping your original choice wins about 1/3 of the time while the strategy of switching wins about 2/3 of the time.

Mathematical calculations

It's easy to see that the strategy of keeping wins whenever your original choice matched the door with the prize. That happens 1/3 of the time, so Ch(winning by keeping) = 1/3. It's a little bit harder, but still possible to see that the strategy of switching wins whenever your original choice did not match the door with the prize. (Think about what Monte will do and what you will do if you originally chose an empty door.) That happens 2/3 of the time, so Ch(winning by switching = 2/3. It's also possible to calculate these chances using Bayes' formula, but it's hard.

There are two lessons here. First, even mathematicians have bad intuition when it comes to probability. So don't be discouraged if things don't seem right to you. Keep practicing and you'll get the hang of it.

Second, just because there are two doors to choose from doesn't mean they are equally likely. Of course, when the game starts they are equally likely. But when Monte chooses a door it gives you some information that changes the chances. Sometimes Monte is choosing a door at random. But 2/3 of the time he is purposely avoiding the door with the prize, and it's those times you can take advantage of.

THE PROBLEM WITH NO NAME

A bureau has 3 drawers. One drawer has 2 gold coins, one has 2 silver coins, and one has a gold and a silver. You randomly choose a drawer and randomly choose a coin from that drawer. You look at it. It's gold. What's the chance that the other coin in the same drawer is gold?

Splus calculations

For specificity, let's say drawer 1 has two G's, drawer 2 has 1 G and 1 S, and drawer 3 has two S's. Try to figure out what my S-Plus code is doing.

N _ 5000
my.drawer _ sample ( 1:3, N, replace=T )
my.coin _ rep ( NA, N )
for ( i in 1:N ) {
if ( my.drawer[i] == 1 ) my.coin[i] _ "G" else
if ( my.drawer[i] == 2 ) my.coin[i] _ sample ( c("G","S"), 1 ) else
my.coin[i] _ "S"
}

sum ( my.drawer==1 ) / sum ( my.coin=="G" )

Mathematical calculation

Ch(2nd is G | 1st is G) = Ch(2nd is G and 1st is G) / Ch(1st is G)
= Ch(drawer #1) / Ch(1st is G)
= (1/3) / (1/2) = 2/3.