This lab reinforces Cahpter 18 of Freedman, Pisani and Purves. We will use the computer to duplicate experiments carried out in the text.

PROBABILITY HISTOGRAMS

Page 310, Section 2 talks about the number of spots shown on two dice. We know already (See Figure 1 on page 239.) that

We can make a histogram of these numbers.

numbers _ 2:12
probs _ c ( 1,2,3,4,5,6,5,4,3,2,1 ) * 100 / 36
barplot ( probs, names=as.character(numbers), space=0, density=-1 )

That's the bottom panel of Figure 1 on page 311. Now we want to simulate rolling a pair of dice and finding the total number of spots a large number of times. We'll do it 10,000 times, just like the book. We'll use the sample function twice, once for the first die and once for the second. Then we'll add the results together.

die.1 _ sample ( 1:6, 10000, replace=T )
die.2 _ sample ( 1:6, 10000, replace=T )
total _ die.1 + die.2

We want to make a histogram of the first 100 rolls. That would be total[1:100]. We want a histogram of the first 1000 rolls, total[1:1000]. And finally we want a histogram of all 10,000 rolls. We'll stack these histograms along with the probability histogram, and recreate Figure 1.

par ( mfrow=c(4,1) )
edges _ (1:12) + .5
hist ( total[1:100], probability=T, breaks=edges, density=-1 )
hist ( total[1:1000], probability=T, breaks=edges, density=-1 )
hist ( total, probability=T, breaks=edges , density=-1)
barplot ( probs, names=as.character(numbers), space=0, density=-1 )

That's like Figure 1 on page 311. Notice that your top panel is different than FPP's top panel? Why? It's chance variation. Your 100 rolls turned out differently than their 100 rolls. Your second and third panels might be different than FPP's second and third panels. That's chance variation again. But the differences in the second panels are probably smaller the differences in the first panels; and the differences in the third panels are even smaller. That's the empirical histogram converging to the probability histogram; and that's because of the law of averages.

Now we'll recreate Figure 2 on page 313.

numbers _ 0:40
probs _ c ( 1,2,2,3,2,4,0,2,1,2,0,4,0,0,2,1,0,3,0,2,0,0,0,2,1,0,0,0,0,2, 0,0,0,0,0,1,0,0,0,0 ) / 36
product _ die.1 * die.2
edges _ (0:39) + .5
hist ( product[1:100], probability=T, breaks=edges, density=-1 )
hist ( product[1:1000], probability=T, breaks=edges, density=-1 )
hist ( product, probability=T, breaks=edges , density=-1)
barplot ( probs, widths=edges, names=as.character(numbers), density=-1, histo=T )

PROBABILITY HISTOGRAMS AND THE NORMAL CURVE

This section talks about the number of heads in many tosses of a coin. We already know (Chapter 15, the binomial formula) how to calculate the probability histogram for the number of heads. For example, the chance of exactly 40 heads in 100 tosses is (100!/40!60!)(1/2)^100. S-Plus has a built-in function dbinom() to do these calculations. The chance of exactly 40 heads in 100 tosses is dbinom(40,100,.5) Try it.

dbinom ( 40, 100, .5 )

Now suppose we want to make a histogram like Figure 3 on page 315. On the x-axis are the numbers from 35 to 65. Instead of calculating the chance for each of them separately, dbinom will do them all at once.

numbers _ 35:64
probs _ dbinom(numbers,100,.5)
barplot ( probs, widths=numbers, histo=T, names=as.character(numbers) )

Let's recreate the histograms in Figure 4 on page 316. We won't go to the trouble of adding the Normal curves to the plots.

par ( mfrow=c(3,1) )

numbers _ 35:64
probs _ dbinom(numbers,100,.5)
barplot ( probs, widths=numbers, histo=T, names=as.character(numbers) )

numbers _ 170:230
probs _ dbinom(numbers,400,.5)
barplot ( probs, widths=numbers, histo=T, names=as.character(numbers) )

numbers _ 405:495
probs _ dbinom(numbers,900,.5)
barplot ( probs, widths=numbers, histo=T, names=as.character(numbers) )

We can do something similar for Figure 6 on page 320. The difference is that we want to draw from a box with nine 0's and one 1. That's like saying the chance of Heads is 0.1. Here's how to do it.

par ( mfrow=c(3,1) )

numbers _ 0:7
probs _ dbinom(numbers,25,.1)
barplot ( probs, widths=numbers, histo=T, names=as.character(numbers) )

numbers _ 1:19
probs _ dbinom(numbers,100,.1)
barplot ( probs, widths=numbers, histo=T, names=as.character(numbers) )

numbers _ 22:58
probs _ dbinom(numbers,400,.1)
barplot ( probs, widths=numbers, histo=T, names=as.character(numbers) )

Figure 9 is trickier because the box has more than just 0's and 1's in it. We want to draw many times from a box with one 1, one 2 and one 9 and keep track of the sum. We'll use the sample() function to draw from the box. For example:

tickets _ c(1,2,9)
sample ( tickets, 25, replace=T )

gives us 25 draws from the box. If we sum those 25 draws, and repeat the process 10000 times, we'll have the top panel of Figure 9.

total _ rep(NA,10000)
for ( i in 1:10000 )
total[i] _ sum ( sample ( tickets, 25, replace=T ) )
hist(total,breaks=seq(min(total)-.5, max(total)+.5, by=1))

A subtle point is that the top panel of Figure 9 is a probability histogram, but we have not calculated a histogram. Instead, we've plotted a histogram of 10,000 sums of 25 draws. Our histogram is an approximation to the probability histogram. Why? Because empirical histograms converge to probability histograms as the number of repetitions gets large. That's the law of averages.

Can you figure out how to mimic Figure 10 on your own?