This lab has three parts. In the first part we will look at the Poisson process, in the second we will look at the Poisson distribution, and in the third we will examine the relationship between the Poisson distribution and the binomial distribution. At the end of this lab, as last week, you will have an opportunity to practice your mastery of what you learned in the lab.
Poisson process
The Poisson distribution is often used to model the number of events that occur in a fixed time interval or fixed spatial region when they are occurring "at random", like shooting stars. Let's first see what a Poisson process might look like by using a simulation.
Suppose you spend five minutes watching shooting stars, and you know that on average they fall at 3 per minute. Thus, the rate parameter for our five-minute time period is lambda=15. We expect, on average, to see 15 shooting stars per five minutes.
To simulate this as a Poisson process, let's break that five-minute-long time interval into very small intervals of 1 second. (You might want to re-read pp. 137-8 to understand this better.)
t = [1:300]'; % This is a vector of seconds in time.
Now we need to simulate the shooting stars. Since the rate is, on average, 3 per minute, that's the same as 0.05 per second. In our simulation, each second will have a 0.05 chance of having a shooting star occur.
ss = rand(300,1)<0.05; % "ss" stands for "shooting stars" % rand(300,1) creates a column vector of 300 random numbers between 0 and 1, one for each second in the interval. % The Boolean operation <0.05 returns 1 for each value that is less than 0.05 and 0 for the rest. plot(t,ss,'o')
The row of o's along the bottom of the plot don't matter. They're the seconds in which no shooting star occurred. The row on the top is a simulation of a Poisson process. Let's now see how many shooting stars we got:
sum(ss)
Repeat the last three commands (copied again below) several times to see what Poisson process realizations typically look like and to see the typical number of shooting stars seen for this particular process.
ss = rand(300,1)<0.05; plot(t,ss,'o') sum(ss)
Poisson distribution
In the simulation above, one column represented a single simulation. We can do many simulations at once by using a matrix instead of a single column vector. The commands below will simulate 10,000 star-gazing five-minute periods all at once. It's too difficult to try to see them all. We'll only keep track of their sums: how many shooting stars occurred in each simulation. The command sum(ss) in Matlab returns a vector of column sums when ss is a matrix.
ss = rand(300,10000)<0.05; p = sum(ss); hist(p,0:50)
What you're simulating, a distribution of the count of how many shooting stars you'd see in a five-minute stretch, is a Poisson(15) distribution. Let's check out its mean, standard deviation, and variance:
mean(p) std(p) std(p)^2
Ideally, the mean and variance should both be close to 15, because the theoretical mean and variance of a Poisson(lambda) distribution are both lambda.
Now let's see how well the simulated data match the actual distribution, which is a built-in Matlab function.
plot(0:50,histc(p,0:50)/10000,'o') % the histc "histogram counts" function helps us rescale the frequencies to relative frequencies. hold on plot(0:50,poisspdf(0:50,15),'r')
Relationship between Poisson distribution and binomial distribution
In class we learned that Binomial(n,p) can be approximated by Poisson(lambda) with lambda = np , when n is very large. Let us check the approximation.
x = 0:12; y1 = poisspdf(x, 4); % calculate the Poisson probabilities with % parameter 4 at values 0 to 12 stem(x, y1, 'o', 'fill'); hold on; n=25; y2 = binopdf(x, n, 4/n); plot(x+0.3, y2,'+r'); % offset the x-coordinate by 0.3 to avoid overlapping your two distributions, % and then draw the Binomial probabilities in 'r'ed and marked by '+' hold off;Try different values of n, for example, try n=75 and 100. See how the Poisson approximates a binomial better when n (in the binomial) is large.
Summary of Commands for Discrete Distributions (Reminder)
Last week we learned three kinds of Matlab commands related with discrete distributions. They are commands for Probability Density Functions (ending with pdf), for Cumulative Distribution Functions (ending with cdf) and for Mean and Variance (ending with stat). The corresponding commands for each distribution are listed below:
binopdf, binocdf, binostat
geopdf, geocdf, geostat
poisspdf, poisscdf, poisstat
(The following two distributions will not be covered explicitly in lab. They are in your text, and the implementation of these commands in Matlab should be fairly easy after learning the ones above.)
hygepdf, hygecdf, hygestat
nbincdf, nbinpdf, nbinstat
Work out solutions to the following problems using Matlab, referring back to the earlier sections in this web page as little as possible. You do not need to turn your answers in. These problems are for you to check your own mastery and understanding. If you have difficulty answering the questions, ask one of the TAs for help.