Pitman MTH 135/ STA 104  Probability                              Week 3

                 Normal and Binomial Distributions

A new photolithography process is still in the experimental stage; about
70% of the chips made from silicon dies treated with this process work
properly.  Imagine that we are testing a series of these chips, and
let's assume independence.  In four independent trials, what is the
probability that exactly ONE of the four chips works?  Is it:

   (7/10) * (3/10) * (3/10) * (3/10) = 7*27/10^4 = 189/10000 = 0.0189,

about one in 53???  No--- it is four times bigger than that, because the
working chip could be any of chips # 1,2,3,4:

   4 * (0.70)^1 * (0.30)^3 = 0.0756.

What is the chance that exactly k chips will work in n independent
tries, for 0 <= k <= n, if a fraction p of all chips work?  The number X
of working chips has the Binomial distribution Bi(n,p):

   P[ X = k ] = (n:k) p^k q^(n-k),        where (as usual) q = 1-p.

Which k is most likely?  Successive probabilities have ratio

   P[ X = k+1 ] / P[ X = k ] = (n-k)p / (k+1)q,

which equals one exactly when (n+1) p = (k+1)  ==>  p = (k+1)/(n+1), so
e.g. if p = 7/10 and n=9 then k=6 and k=7 have the same probability,
about 0.2668.  Otherwise the unique maximum happens when

   np - q < k < np + p

--- roughly,  k \approx n*p.
===========
Even this maximum probability isn't very big....  P[ X = k ] maxes out
at a little below 1/sqrt(n) (more precise info is in the homework and
text!), so NO point has very high probability for large values of n.
More interesting is the SUM of the probabilities of extreme values---
say, for p = 0.70 and n=10,

   P[ X <= 5 ] = \sum _0 ^5 (10:k) (0.7)^k (0.3)^(10-k) = 0.1502683

Is "5 successes in 10 tries" an extreme result if 70% of subjects
improve??

 Normal Random Variables
*) Appendix 5, page 531:  Phi[z],  z = 0.00(.01)3.59,  P=0.5000...0.9998

1. The Normal Approximation to the Binomial Distribution

   0.30 |
   0.25 |                                     .
   0.20 |                                .    :    :
   0.15 |                                :    :    :
   0.10 |                           .    :    :    :    :
   0.05 |                      .    :    :    :    :    :    .
   0.00 |__,____,____;____;____;____;____;____;____;____;____;___
            0    1    2    3    4    5    6    7    8    9   10

These ALWAYS follow a "bell-shaped curve", approximately

   c * exp( -b*(z-a)^2 )

for some numbers a,b,c.  De Moivre and Laplace figured this out, and
figured out a,b,c, and figured out how to use it.
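The Binomial numbers above are easy to check by machine.  The notes quote R calls like pbinom; here is a standard-library Python sketch of the same calculations (binom_pmf is our own helper, not a library function):

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bi(n, p): (n:k) p^k q^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly one of four chips works (p = 0.70): 4 arrangements
print(4 * 0.70 * 0.30**3)                            # about 0.0756

# Tied modes when (n+1)p is an integer: p = 7/10, n = 9 gives k = 6, 7
print(binom_pmf(6, 9, 0.7), binom_pmf(7, 9, 0.7))    # both about 0.2668

# Lower-tail sum for p = 0.70, n = 10
print(sum(binom_pmf(k, 10, 0.7) for k in range(6)))  # about 0.1503
```

The same three numbers appear in the text above; in R the last line would be pbinom(5, 10, 0.7).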
It's called the Normal Distribution, and in their honor the Normal
approximation to the Binomial is called the "DeMoivre-Laplace Limit
Theorem".  It is the first example of a more general result called the
"Central Limit Theorem" (abbrv. CLT) that we'll encounter in a week or
two.

One can use the successive-terms ratio P[X=k+1]/P[X=k] = (n-k)p/(k+1)q
as a starting-point to prove the DeMoivre-Laplace result.
APPROXIMATELY, the Binomial pmf P(x) has a maximum at np, and satisfies

  log P(np + z) = log P(np) + \sum_{k=np}^{np+z-1} log (n-k)p / (k+1)q

                = log P(np) + \sum_{j=0}^{z-1} log (npq - jp)/(npq + (j+1)q)

                ~ log P(np) + \sum_{j=0}^{z-1} -j/npq

                ~ log P(np) - z^2/2npq

since log(1+s) = s + o(s); here s = (npq - jp)/(npq + (j+1)q) - 1.  Hence

   P(x) ~ c * exp( -(x-np)^2 / 2npq ).

This is the normal density function with mu=np and sigma^2 = npq.
Pitman's section 2.3 gives a more detailed derivation.
================================
Show how to get limits for Normal approx'n, with change-of-variables
================================
Roll a fair die 500 times---- what's the probability of at least 100 aces?

A1: \sum _{x=100}^{500} (500:x) (1/6)^x (5/6)^(500-x) = 0.0282871
      = 1-pbinom(99,500,1/6) = pbinom(400,500,5/6)

A2: mu = n*p = 500/6 = 83.33;   sig^2 = n*p*(1-p) = 8.333^2

      ( 99.5 - 83.33) / 8.333 =  1.94
      (500.5 - 83.33) / 8.333 = 50.06   (effectively infinity)

    P ~ Phi(50.06) - Phi(1.94) = 1 - Phi(1.94) = pnorm(-1.94) = 0.02618984

    nb: without the continuity correction, z = (100 - 83.33)/8.333 = 2.00
    gives pnorm(-2) = 0.02275013 --- that error is 2.6 times bigger!
================================
Normal approx has mean mu = 10*0.7 = 7 and sdev sqrt(10*0.7*0.3) = 1.44914

P[ <= 5 Successes ] = pbinom(5,10,0.7)
   = 0.0000059049 + 0.0001377810 + 0.0014467005
   + 0.0090016920 + 0.0367569090 + 0.1029193452  = 0.1502683326,

or approximately

   Phi( (5.5 - 7.0) / 1.44914 ) = pnorm(5.5, 7.0, 1.44914)
   = Phi( -1.0351 ) = 1 - Phi(1.0351)           [ <-- for tables ]
   = 1 - (.8485+.8508)/2 = 1 - 0.8496 = 0.1504  (not bad!)

EXAMPLE:  What is the chance of 50 Heads in 100 tosses of a fair coin?
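Both Normal-approximation answers can be reproduced without tables.  A standard-library Python sketch (the notes use R's pnorm; here Phi is built from math.erf):

```python
from math import erf, sqrt

def Phi(z):
    """Standard Normal cdf: Phi(z) = P[Z <= z]."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Die example: P[at least 100 aces in 500 rolls], continuity-corrected
mu, sig = 500 / 6, sqrt(500 * (1/6) * (5/6))
print(1 - Phi((99.5 - mu) / sig))     # about 0.0262 (exact: 0.0282871)

# Bi(10, 0.7): P[at most 5 successes], continuity-corrected
print(Phi((5.5 - 7.0) / sqrt(10 * 0.7 * 0.3)))   # about 0.1504
```

In R these are pnorm(-1.94) and pnorm(5.5, 7.0, 1.44914) respectively.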
   (100:50) (0.50)^50 (0.50)^50  =  12611418068195524166851562157
                                    ------------------------------  = 0.0795892...
                                    158456325028528675187087900672

OR, APPROXIMATELY,

   = P[ (49.5 - 50)/sqrt(100*.5*.5) < (X-mu)/sqrt(Var) < (50.5 - 50)/sqrt(100*.5*.5) ]

   = P[ -0.1 < Z < +0.1 ]  approx  =  2*(0.5398 - 0.5) = 2*(0.0398) = 0.0796

=============================================================================
Another way to see DeMoivre-Laplace:  Stirling's approximation to the
factorial function is

   n! ~ sqrt(2*pi*n) n^n exp(-n)

(the ratio of n! to Stirling's approximation is exp(theta/(12 n)) for
some n-dependent number 0 < theta < 1, so the approx'n is very good for
big n).  SO,

             sqrt(2*pi*n) n^n exp(-n)  p^k q^(n-k)
   P(k) ~ --------------------------------------------------------------------
          sqrt(2*pi*k) k^k exp(-k)  sqrt(2*pi*(n-k)) (n-k)^(n-k) exp(-n+k)

        = sqrt[ n / (2*pi*k*(n-k)) ] * [n*p/k]^k * [n*q/(n-k)]^(n-k)

Set x = k-np, so k = np+x and n-k = nq-x; then

   P(np+x) ~ 1/sqrt[2*pi*n*p*q] * [np/(np+x)]^(np+x) * [nq/(nq-x)]^(nq-x)

           = const * (1+x/np)^(-np) * (1-x/nq)^(-nq) * [(1+x/np)/(1-x/nq)]^(-x)

Taking logs of all three factors, and using the approximation
log(1+s) = s - s^2/2 + o(s^2), the last factor is NOT negligible---
it contributes -x^2/npq:

   log P(np+x) ~ c - np log(1+x/np) - nq log(1-x/nq)
                   - x [ log(1+x/np) - log(1-x/nq) ]

               ~ c - np [ x/np - x^2/2(np)^2 ] - nq [ -x/nq - x^2/2(nq)^2 ]
                   - x [ x/np + x/nq ]

               = c - x + x^2/2np + x + x^2/2nq - x^2/np - x^2/nq

               = c - x^2/2np - x^2/2nq  =  c - x^2/2npq

Thus, P(k) ~ 1/sqrt(2*pi*n*p*q) exp(- (k-np)^2 /2npq ), exactly
DeMoivre & Laplace's result.
================================================================
Notice that a Bi(n,p) random variable can be viewed as the sum

   S_n = I_1 + I_2 + ... + I_n

of n independent "Bernoulli" random variables, indicator variables equal
to one or zero with probabilities p or q=1-p, respectively.
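The fair-coin numbers and the quality of Stirling's formula are both easy to verify; a standard-library Python sketch (the n = 20 Stirling check is our own illustrative choice, not from the notes):

```python
from math import comb, exp, factorial, pi, sqrt

# Exact: P[50 heads in 100 tosses] = (100:50) / 2^100
exact = comb(100, 50) / 2**100
print(exact)                                             # about 0.0795892

# DeMoivre-Laplace at the center k = np:
# P(k) ~ exp(-(k-np)^2 / 2npq) / sqrt(2*pi*npq)
n, p = 100, 0.5
npq = n * p * (1 - p)
print(exp(-(50 - n*p)**2 / (2*npq)) / sqrt(2*pi*npq))    # about 0.0797885

# Stirling: ratio n! / [sqrt(2*pi*n) n^n exp(-n)] is exp(theta/12n)
n = 20
print(factorial(n) / (sqrt(2*pi*n) * n**n * exp(-n)))    # about 1.0042
```

The Normal value 0.0798 agrees with the exact 0.0796 to about 0.3%, and already at n = 20 the Stirling ratio is within half a percent of one.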
Each I_k has mean p and variance pq, so the sum S_n has mean np and
variance npq; if we standardize it,

                 S_n - np
        Z_n = -------------
              sqrt( n p q )

has (by the DeMoivre-Laplace limit theorem) approximately a standard
Normal No(0,1) distribution.  An amazing result we'll see more about
later is the "central limit theorem", which asserts that if
S_n = X_1 + ... + X_n for independent random variables X_k with ANY
probability distribution that has a finite mean mu and variance sigma^2,
the standardized quantity

                  S_n - n mu
        Z_n = -----------------
              sqrt( n sigma^2 )

has approximately a No(0,1) distribution for large n.  What does this
say for Poisson random variables X_n?  Geometric?  Cauchy?  Gamma?
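A quick Monte Carlo sketch of this standardization, in standard-library Python (the sample sizes, seed, and cutoff z = 1 are arbitrary illustrative choices, not from the notes):

```python
import random
from math import erf, sqrt

random.seed(1)

def Phi(z):
    """Standard Normal cdf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Simulate S_n = I_1 + ... + I_n for Bernoulli(p) indicators,
# standardize, and compare P[Z_n <= 1] with Phi(1) ~ 0.8413.
n, p, trials = 100, 0.7, 10000
q = 1 - p
hits = 0
for _ in range(trials):
    s = sum(random.random() < p for _ in range(n))   # one Bi(n,p) draw
    z = (s - n * p) / sqrt(n * p * q)                # standardized
    hits += (z <= 1)
print(hits / trials, Phi(1))     # simulated tail vs Normal value
```

The simulated fraction lands near 0.84 (the discreteness of S_n keeps it a bit below Phi(1) at this n), illustrating DeMoivre-Laplace as a special case of the CLT.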