This lab introduces three probability questions that are famous for their ability to confuse people. We show how to calculate the answers and how to use S-Plus to verify your calculations.
bday _ sample ( 1:365, size=26, replace=T )
For me this gave
344 18 24 306 185 197 261 22 50 345 306 273 275 31 145 31 98 359 237
276 290 86 134 306 173 281
The S-Plus command duplicated(bday) tells whether there are any duplicated values in bday. For me,
duplicated(bday)
yields
F F F F F F F F F F T F F F F T F F F F F F F T F F
This is a vector of F(alse) and T(rue), the same length as bday. It's
T(rue) is positions 11, 16 and 24 because the 11th and 16th and 24th
birthdays are
duplicates of earlier ones. The other positions are F(alse) because they
are not duplicates of earlier ones. This particular set of 26 birthdays
has 3 duplicates.
Of course we don't want to check each set of birthdays by eye; we want S-Plus to do it for us. We can use the any() function, which checks whether any of its arguments are True. In our case
any ( duplicated ( bday ))
yields T, because positions 11, 16 and 24 are True.
Now we want to repeat the procedure many times. In this example we'll repeat N.reps=1000 times. We'll let N.people=26 be the number of people in the room. Calling it N.people will make it easy to change later. We'll begin by making a 1000 by N.people matrix of birthdays. We'll treat each row as a replicate of the birthday problem and we'll count how many rows contain duplicates.
N.reps _ 1000
N.people _ 26
bday _ matrix ( sample ( 1:365, N.people*N.reps, replace=T),
nrow=N.reps )
dup _ apply ( bday, 1, duplicated )
has.dup _ apply ( dup, 2, any )
sum ( has.dup ) / N.reps
For me, this yielded about 62%. Try it for yourself. Try it with different values of N.reps and N.people.
1 - prod ( 365:340 / rep(365,26) )
365:340 means the sequence of numbers (365 364 363 ... 341 340)
rep(365,26) means (365 365 365 ... 365 365)
So 365:340 / rep(365,26) means the sequence of numbers ( 365/365 364/365
363/365 ... 341/365 340/365)
prod ( ... ) means take their product; and that's exactly what you want.
Let's dig a little deeper. Instead of 26 people in the room, take a different number. Call it N.people. Let's plot the chance of a duplicated birthday against N.people.
N.people _ 10:50 # I'm taking N.people between 10 and 50 but you can use
different numbers
prob.dup _ rep(NA, length(N.people)) # make an empty vector to hold
the answer
for ( i in 1:length(N.people) ) {
numerators _ 365:(365-N.people[i]+1)
denominators _ rep ( 365, N.people[i] )
prob.dup[i] _ 1 - prod ( numerators / denominators )
}
plot ( N.people, prob.dup )
Many people find it surprising that with only a bit more than 20 people you are more likely than not to find a duplicated birthday, and that with around 50 people you are almost sure to have a duplicate.
For many people the answer is obvious. There are two unopened doors, they are equally likely to contain the prize, so it doesn't matter whether you switch. Although obvious, this answer is wrong. When Marilyn Vos Savant gave the correct answer in a newspaper column she was deluged with mail telling her that she had made a mistake. Some of the letters were from professional mathematicians. But Marilyn was right. We'll see how to simulate a solution in S-Plus and how to calculate an exact solution mathematically.
N _ 5000
prize _ sample ( 1:3, N, replace=T )
choice _ sample ( 1:3, N, replace=T )
Now we have to simulate which door Monte shows me. See whether you can follow my S-Plus commands.
shown _ rep(NA,N)
for ( i in 1:N ) {
if ( prize[i]==1 & choice[i]==1 ) shown[i] _ sample ( c(2,3), 1 ) else
if ( prize[i]==1 & choice[i]==2 ) shown[i] _ 3 else
if ( prize[i]==1 & choice[i]==3 ) shown[i] _ 2 else
if ( prize[i]==2 & choice[i]==1 ) shown[i] _ 3 else
if ( prize[i]==2 & choice[i]==2 ) shown[i] _ sample ( c(1,3), 1 ) else
if ( prize[i]==2 & choice[i]==3 ) shown[i] _ 1 else
if ( prize[i]==3 & choice[i]==1 ) shown[i] _ 2 else
if ( prize[i]==3 & choice[i]==2 ) shown[i] _ 1 else
if ( prize[i]==3 & choice[i]==3 ) shown[i] _ sample ( c(1,2), 1 )
}
Now, for each simulated game we have to record whether keeping the original choice wins and whether switching wins. Whenever prize[i] equals choice[i] switching loses and keeping wins. Whenever prize[i] does not equal choice[i] switching wins and keeping loses. (Do you follow the logic? If not, print out prize[1:20] and choice[1:20] and verify that my logic is correct.) So we can keep track of wins and losses with the following S-Plus statements.
keep.wins _ rep ( NA, N )
switch.wins _ rep ( NA, N )
for ( i in 1:N ) {
if ( prize[i]==choice[i] ) {
keep.wins[i] _ 1
switch.wins[i] _ 0
} else {
keep.wins[i] _ 0
switch.wins[i] _ 1
}
}
sum(keep.wins)/N
sum(switch.wins)/N
For me, sum(keep.wins)/N = 0.3324 and sum(switch.wins)/N = 0.6676, suggesting that the strategy of keeping your original choice wins about 1/3 of the time while the strategy of switching wins about 2/3 of the time.
There are two lessons here. First, even mathematicians have bad intuition when it comes to probability. So don't be discouraged if things don't seem right to you. Keep practicing and you'll get the hang of it.
Second, just because there are two doors to choose from doesn't mean they are equally likely. Of course, when the game starts they are equally likely. But when Monte chooses a door it gives you some information that changes the chances. Sometimes Monte is choosing a door at random. But 2/3 of the time he is purposely avoiding the door with the prize, and it's those times you can take advantage of.
N _ 5000
my.drawer _ sample ( 1:3, N, replace=T )
my.coin _ rep ( NA, N )
for ( i in 1:N ) {
if ( my.drawer[i] == 1 ) my.coin[i] _ "G" else
if ( my.drawer[i] == 2 ) my.coin[i] _ sample ( c("G","S"), 1 ) else
my.coin[i] _ "S"
}
sum ( my.drawer==1 ) / sum ( my.coin=="G" )