STA 230 / MTH 230  Probability                                    Week 4
Read: Pitman, sections 2.4-2.5

MEAN
---------------

If we observe many replications of an experiment and so collect data
X1 X2 X3 ... Xn, what will happen to the LONG TERM AVERAGE
(X1 + ... + Xn)/n for huge n?

If X has finitely many values--- say, if P[ X = a_k ] = p_k for
k=1,2,...,K with \sum p_k = 1 and each p_k > 0, then for huge n,
S_n := (X_1 + ... + X_n) should be about

    S_n ~ n*p_1*a_1 + n*p_2*a_2 + ... + n*p_K*a_K     (why?)

and so, dividing by n, the average should be about

    S_n/n ~ p_1*a_1 + p_2*a_2 + ... + p_K*a_K = \sum p_k a_k.

This number is the MEAN (or EXPECTED VALUE) E[X]; conventional letter:
Greek mu.  For example: the average die roll should be about E[X] = 3.5.

The same formula works even if X takes countably many values, *except*
that we have to worry about whether the sum is finite and makes sense.
Later on we'll see something like this works for random variables with
uncountably many values, but we need to replace the sums with integrals.

(Consider St. Petersburg paradox here)

------------------------------------------------------------------------
Poisson & Hypergeometric Distributions

* Poisson Approximation to the Binomial
-----------

Shortcomings of DeMoivre/Laplace: if p is close to 0 or 1, then
sigma = \sqrt{n p q} is small.  The normal is symmetric, for example,
while the binomial for small p is packed up against 0.

Text looks at the binomial with n large and mu = n*p = 1, i.e., p = 1/n;
present fishing example here, and sneak in the exponential distribution:

    lam = average number of fish caught per hour
    X_t = number of fish caught by time t
        ~ # of successes in N independent tries (each t/N hours long),
          with prob. p_N of success
        ~ Bi(N, p_N)   =>   t*lam = N*p_N   =>   p_N = t*lam/N

    P[ X_t = k ] ~ (N:k) (t lam/N)^k (1 - t lam/N)^(N-k)
                -> (lam t)^k/k! exp(-lam t)
                 = mu^k/k! exp(-mu),   k=0,1,2,...   (with mu = lam*t)

(here "(n:k)" denotes the binomial coefficient "n choose k")

    T_k = time needed to catch the k'th fish

    P[ T_1 > t ] = P[ X_t = 0 ] = exp(-lam t),  t > 0

    => P[ a < T_1 <= b ] = exp(-lam a) - exp(-lam b)
                         = \int_a^b lam exp(-lam t) dt

Recall:  exponential sum,    \sum_{k=0}^oo a^k/k! = exp(a)
         exponential limit,  (1 + x/N)^N -> exp(x)

---------------

What's the Poisson mode?  [ Mention: f'=0 is NOT the way to find it ]
Is there ever a double mode?

                            mu^(k-1)     mu^k
    P(k-1) < P(k)   <==>    --------  <  ----
                             (k-1)!       k!

                    <==>    1  <  mu/k

                    <==>    k  <  mu

SO, the mode is the largest integer <= mu; if mu is an integer, then
(mu-1, mu) is a double mode.

----------------

What happens when mu gets big?  Looks like the normal, with mean mu
and variance sigma^2 = mu.

=====================================================================

* Random Sampling: WITH and WITHOUT Replacement

WITH replacement: Binomial with p = G/N, where the population has
G "good" and B "bad" members and N = G+B, so

    P[g good, b bad in n=b+g tries] = (n:g) G^g B^b / N^n
                                    = (n:g) p^g (1-p)^(n-g)

WITHOUT replacement: Two approaches.

    one-at-a-time:

          G   G-1       G+1-g      B    B-1        B+1-b
    (n:g) -   ---  ...  -----  x  ---  -----  ...  -----
          N   N-1       N+1-g     N-g  N-g-1       N+1-n

          n!     G!       B!     (N-n)!
       = ----  ------  ------  x ------
         g!b!  (G-g)!  (B-b)!      N!

    and the "number of subsets" approach:

       (G:g) (B:b)        G!         B!        (N-n)! n!
       -----------  =  --------  --------  x  ---------
          (N:n)        g!(G-g)!  b!(B-b)!         N!

=========================================================================
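(Aside, not from the notes: a quick numerical check of the last section.
This minimal Python sketch--- all function names here are ours, invented
for the example--- evaluates the "one-at-a-time" and "number of subsets"
formulas, which should agree exactly, and compares them with the
with-replacement binomial answer, which should get close once N >> n.)

    import math

    def hyper_one_at_a_time(N, G, n, g):
        """(n:g) * G(G-1)...(G+1-g) * B(B-1)...(B+1-b) / N(N-1)...(N+1-n)."""
        B, b = N - G, n - g
        num = math.comb(n, g)
        for i in range(g):              # the g "good" draws
            num *= G - i
        for i in range(b):              # the b "bad" draws
            num *= B - i
        den = 1
        for i in range(n):              # n draws from the population
            den *= N - i
        return num / den

    def hyper_subsets(N, G, n, g):
        """(G:g)(B:b) / (N:n)."""
        return math.comb(G, g) * math.comb(N - G, n - g) / math.comb(N, n)

    def binom_pmf(n, p, g):
        """(n:g) p^g (1-p)^(n-g), the with-replacement answer."""
        return math.comb(n, g) * p**g * (1 - p)**(n - g)

    n, g = 5, 2
    for N in (20, 200, 2000):           # population grows, G/N fixed at 0.3
        G = int(0.3 * N)
        print(N, round(hyper_one_at_a_time(N, G, n, g), 6),
                 round(hyper_subsets(N, G, n, g), 6),
                 round(binom_pmf(n, G / N, g), 6))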
GEOMETRIC DIST'N: Count the failures X before the first success, in
independent trials with the same probability p of success.  Find the
probability of exactly k failures before the first success; also find
the average number of failures before the first success.

To have k failures before the 1st succ, the first k+1 trials must be
EXACTLY:
                F F F ... F S

whose probability is  f(k) = p (1-p)^k,  k = 0,1,2,...

Average ("mean") value of this is  sum(k=0:oo) k p q^k = ???

One Trick:
    = p * q * sum(k=0:oo) k q^(k-1)
    = p * q * (d/dq) sum(k=0:oo) q^k
    = p * q * (d/dq) [ 1/(1-q) ]
    = p * q * [ 1/(1-q)^2 ]
    = p * q / p^2  =  q/p

Example:  E[ Tails before 1st Head ] = 1
          E[ Non-aces before 1st Ace ] = 5

Another Trick:  Think of what happens on the first trial.  Either
    p: you succeed    (=> NO failures)
    q: or not         (=> ONE failure already, plus whatever else happens)
so mu = E[X] satisfies  mu = p * 0 + q * (1 + mu).
Solution:  mu(1-q) = q,  or  mu = q/(1-q) = q/p.

==================================================================

NEGATIVE BINOMIAL DIST'N: Count the failures Y before the r'th success,
otherwise as above.  For k failures before the r'th succ, the first
k+r-1 trials must contain r-1 S's and k F's in some order, and trial
k+r must be an S:

        F S S F F S ... F    S      <-- r'th success on trial k+r
        \_ r-1 S's, k F's _/

whose prob is  (k+r-1 : k) p^r q^k

May view Y as a sum of r indep Geometrics--- failures before 1st succ
+ add'l failures before 2nd succ + ... + failures before r'th succ.
SO, E[ Y ] = r*(q/p).

Note the dist'n makes sense for ALL real r > 0, not only integers:

           r(r+1)(r+2)...(r+k-1)                 Gamma(r+k)
    f(k) = --------------------- p^r (1-p)^k  =  ---------- p^r q^k
                    k!                           Gamma(r) k!

where Gamma(z) := int{ t^(z-1) exp(-t), t=0:infinity } = (z-1)! for
positive integers z.  Mean will be mu = r q/p.

Trick 2 works here too:  mu(r) = p * mu(r-1) + q * (1 + mu(r)),
so mu(r) = mu(r-1) + q/p, and by induction mu(r) = r q/p.

========================================================================

Indicators:

    Binomial:  X = I1 + I2 + ... + In
               Each Ij = 1 w/prob p, 0 w/prob q ==> mean p; sum has mean n*p.

    Hypergeo:  X = I1 + I2 + ... + In
               Each Ij = 1 w/prob p = G/N, 0 w/prob q (marginally, even
               without replacement) ==> mean p; sum has mean n*p.

    How are they different?

VARIANCE:

    sig^2 = E| X - mu |^2 = E( I1 + ... + In - n*p )^2

          = E[ (I1-p)^2 + (I2-p)^2 + ... + (In-p)^2 ]
            + E[ (I1-p)(I2-p) + ... ]            (n(n-1) cross terms)

          = n * p * (1-p)  +  n*(n-1) * [ E Ii Ij - p^2 ]

What IS  E[ Ii Ij ] - p^2  ???

    Binomial:  p^2 - p^2 = 0, so var is n p q.

    HyperGeo:  (G/N)((G-1)/(N-1)) - (G/N)^2 = (G/N) [ (G-1)/(N-1) - G/N ]
                                            = - (G/N) [ (N-G) / N(N-1) ]
                                            = - (G/N)(B/N) / (N-1)

    so, with P = G/N and Q = B/N,

        var = n(G/N)(B/N) - n(n-1)(G/N)(B/N)/(N-1)
            = n P Q [ 1 - (n-1)/(N-1) ]
            = n P Q [ (N-n)/(N-1) ]

    which is pretty much the same as the Binomial IF N >> n.

=========================================================================

Poi Mean:  sum_{k=0:oo} k (lam^k)/k! exp(-lam)
         = sum_{k=1:oo} k (lam^k)/k! exp(-lam)
         = sum_{k=1:oo} lam * (lam^(k-1))/(k-1)! exp(-lam)
         = lam * sum_{j=0:oo} (lam^j)/j! exp(-lam)
         = lam * exp(lam) * exp(-lam)  =  lam.

Poi Var:  E[ (X-lam)^2 ] is hard; so is E[ X^2 ].  BUT, easy is:

    E[ X*(X-1) ] = sum_{k=2:oo} k(k-1) (lam^k)/k! exp(-lam) = lam^2,

    and E[ X*(X-1) ] = E[ X^2 ] - E[ X ], so

    Var(X) = E[ X^2 ] - (E X)^2 = (lam^2 + lam) - lam^2 = lam,

    the same as the mean.

======================================================================

St. Petersburg:  Start with $1 on the table and a fair coin.  At each
step: toss the coin; if it shows Heads, take the money.  If it shows
Tails, I'll double whatever money is on the table.  Let X be the amount
you win.

    What is the pmf for X?
    What is E[ X ] ???
    What does that MEAN in terms of averages?
    How much would you pay me to play this game?
    Can your opponent really pay off the bet?
    Note lg($68.7 B) = 36...   (lg = log base 2)

--------
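(Aside, not from the notes: a minimal simulation sketch of the game,
assuming a fair coin via Python's random.random; the seed and sample
sizes are arbitrary choices.  Since P[ X = 2^k ] = 2^-(k+1) for
k=0,1,2,..., E[X] = 1/2 + 1/2 + ... = oo, so the running average never
settles down--- it keeps drifting upward as the number of plays grows.)

    import random

    def play():
        """One St. Petersburg game: $1 on the table, doubled on each
        Tail, collected on the first Head."""
        pot = 1
        while random.random() < 0.5:    # Tails with probability 1/2
            pot *= 2                    # the money on the table doubles
        return pot

    random.seed(1)                      # reproducible runs
    for n in (10**2, 10**4, 10**6):
        avg = sum(play() for _ in range(n)) / n
        print(n, avg)                   # averages grow (slowly) with n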