Binomial probability distribution

Today we're going to use the computer to better understand the binomial distribution.

Imagine that you're trying to treat a terrible illness, from which only 30% of sufferers generally recover. You've developed a new medication to treat this illness, but you're not absolutely sure of its effectiveness. The medication is extremely expensive, so you don't want to give it to patients if it doesn't improve the recovery rate.

You randomly choose a group of 10 ill people to receive the experimental medication. For the purposes of this example, it is fair to assume that the probability of recovery is the same for each person, so that the outcome of the treatment can be considered a series of independent trials with the same probability of success for each trial.


1. Assuming that the medication actually isn't improving the recovery rate, what is the probability of seeing 30% of your test group recover? 50%? 70%?

To answer these questions, we can generate the distribution function for a binomial distribution with number of trials n=10 and probability of success (recovery) p=0.30. S-Plus can easily show us this information without a series of tedious calculations by hand.

  1. Create a new data set. Go to File, New, Data set.
  2. Fill a column with the number of recoveries possible: 0, 1, 2, ..., 10. Go to Data, Fill. Make sure to fill in the name you want this column to have (maybe "Successes"), its length of 11, the starting value of 0, and the increment of 1.
  3. To find the probabilities associated with varying degrees of success in your sample, go to Data, Distribution functions. Choose the previously generated column as the source column, and name the new (target) column something informative (maybe "Probability"). It is important that under "Statistic, Result Type" you choose "Density". (This is counter-intuitive, but in S-Plus, "probability" often implies "cumulative distribution".) The other blanks are fairly self-explanatory, but beware that the default distribution is "normal", rather than "binomial".
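
If you want to cross-check the "Probability" column outside S-Plus, the same numbers can be computed from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The following is a minimal Python sketch, not part of the S-Plus procedure; it assumes Python 3.8 or later (for math.comb), and the names binom_pmf, successes and probability are chosen here only for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.30  # values from the example: 10 patients, 30% recovery rate

# The same two columns as the S-Plus data set: Successes and Probability.
successes = list(range(n + 1))
probability = [binom_pmf(k, n, p) for k in successes]

print("Successes  Probability")
for k, prob in zip(successes, probability):
    print(f"{k:9d}  {prob:.6f}")
```

The rows for 3, 5 and 7 successes correspond to 30%, 50% and 70% of the test group recovering.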

2. Is the probability of seeing 10 out of 10 people recover, when the probability of recovery is only 30%, actually 0? Select the column and choose Properties, then increase the precision to get a better idea of what these probabilities really are. (The point is to realize that even if the medicine leaves the probability of recovery unchanged, there is some probability that all of the people in your sample could recover. However, if you saw this many recover, you would understandably think that your medication was beneficial. So there is always the possibility of making errors when trying to draw inferences from samples.)
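
The effect of display precision can be illustrated with a short Python sketch (an aside, not an S-Plus feature). With independent trials, all 10 patients recover only if every trial succeeds, so the probability is p raised to the power n. The variable name p_all_recover is illustrative.

```python
n, p = 10, 0.30

# All n patients recover only if every independent trial succeeds: p ** n.
p_all_recover = p ** n

# The same number shown at different precisions.
print(f"2 decimal places: {p_all_recover:.2f}")
print(f"8 decimal places: {p_all_recover:.8f}")
print(f"scientific:       {p_all_recover:.3e}")
```

At two decimal places the value is displayed as 0.00, even though the probability itself is small but positive.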

3. To get a graph of these probabilities, choose Graph, 2-D Plot, "Bar with Base at Y min...".
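
For a rough preview of the shape to expect, here is a sketch that draws the same probabilities as a text bar chart in Python. It is only an approximation of the S-Plus plot: the bars run horizontally, and the helper text_bar_chart and its width parameter are made up for this illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def text_bar_chart(n, p, width=50):
    """Return one line per outcome; bar length is proportional to probability."""
    probs = [binom_pmf(k, n, p) for k in range(n + 1)]
    tallest = max(probs)
    lines = []
    for k, prob in enumerate(probs):
        bar = "#" * round(width * prob / tallest)
        lines.append(f"{k:3d} | {bar} {prob:.4f}")
    return lines

chart = text_bar_chart(10, 0.30)
print("\n".join(chart))
```

Each row is one possible number of recoveries, and the longest bar marks the most likely outcome.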

4. How do the probabilities change if the probability of recovery using the new medication has actually increased to 90%?
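
One way to compare the two scenarios side by side is sketched below in Python, again as an aside to the S-Plus steps. The list names null_probs and alt_probs are illustrative labels for "medication has no effect" (p = 0.30) and "recovery rate raised to 90%" (p = 0.90).

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10
null_probs = [binom_pmf(k, n, 0.30) for k in range(n + 1)]  # no improvement
alt_probs = [binom_pmf(k, n, 0.90) for k in range(n + 1)]   # improved to 90%

print("Successes    p=0.30    p=0.90")
for k in range(n + 1):
    print(f"{k:9d}  {null_probs[k]:.6f}  {alt_probs[k]:.6f}")

# Most likely number of recoveries under each assumption.
mode_null = max(range(n + 1), key=lambda k: null_probs[k])
mode_alt = max(range(n + 1), key=lambda k: alt_probs[k])
print("Most likely number of recoveries:", mode_null, "versus", mode_alt)
```

Comparing the two columns shows where the bulk of the probability sits under each assumption.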

5. How do the graphs change if the sample size is increased to 100 randomly selected patients (a more realistic example)? For instance, do you notice the charts becoming more bell-shaped as the sample size increases?
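
One rough way to put a number on "more bell-shaped" is to compare each binomial probability with the height of a normal curve that has the same mean, np, and standard deviation, sqrt(np(1 - p)). The Python sketch below does this for n = 10 and n = 100. The gap measure and the function names are choices made for this illustration, not something S-Plus reports.

```python
from math import comb, exp, pi, sqrt

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_density(x, mean, sd):
    """Height of the normal (bell) curve at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def max_gap_from_bell_curve(n, p):
    """Largest absolute gap between the binomial probabilities and the
    normal curve with the same mean and standard deviation."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return max(
        abs(binom_pmf(k, n, p) - normal_density(k, mean, sd))
        for k in range(n + 1)
    )

gap_10 = max_gap_from_bell_curve(10, 0.30)
gap_100 = max_gap_from_bell_curve(100, 0.30)
print(f"n = 10:  largest gap from the bell curve = {gap_10:.4f}")
print(f"n = 100: largest gap from the bell curve = {gap_100:.4f}")
```

A smaller gap means the bar chart follows the bell curve more closely, which is the pattern to look for in the S-Plus graphs.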


Don't forget to log out of your PC when you are done!