SUMS OF INDEPENDENT RVs
=======================
In general it is hard to find the pdf or pmf of a sum Sn = X1 + ... + Xn of random variables. Soon we'll see that, IF the rv's are
 * (nearly) independent and have
 * (nearly) the same distribution, and if
 * n is very large,
then the distribution of Sn will be approximately normal.

First some preliminaries. If X is a random variable with (nonnegative) integer values, define the "generating function"

    G(z) = E[z^X] = sum over k>=0 of z^k * P[X=k].

We can recover the pmf p(k) = P[X=k] from the power-series expansion of G(z), for example as

    p(k) = G^(k)(0) / k!  =  (k'th derivative of G, evaluated at 0) / k!

This is helpful for sums of independent rv's: if Sn = X1 + ... + Xn, then its generating function is

    G(z) = E[z^Sn] = E[z^X1] * ... * E[z^Xn],

so we can recover the pmf of Sn from the power series for G. For example, the sum S2 of X1 ~ Bi(n=10, p=.2) and X2 ~ Bi(n=6, p=.5) has generating function

    G(z) = (.2 z + .8)^10 * (.5 z + .5)^6,

available from Mathematica as

    0.0016777216 + 0.0142606336 z + 0.05505024 z^2 + 0.127926272 z^3
    + 0.2000814080 z^4 + 0.22320906 z^5 + 0.183744102 z^6 + 0.1139384320 z^7
    + 0.05385946 z^8 + 0.019508032 z^9 + 0.0054071824 z^10 + 0.00113730 z^11
    + 0.000178328 z^12 + 0.0000201920 z^13 + 1.56*10^-6 z^14 + 7.36*10^-8 z^15
    + 1.60*10^-9 z^16,

from which we can read off the probabilities of the different possible sums.


LIMIT THEOREMS
==============

0. MARKOV & CHEBYCHEV INEQUALITIES:
Simple & useful tools used to prove these limit theorems are the inequalities

    P[ Y > a ] <= E[ Y ] / a                  for nonnegative random variables Y
    P[ |X - mu| > k sigma ] <= 1/k^2          (set Y = |X-mu|^2/sigma^2, a = k^2)

The cool thing about these is that they are NOT "approximations", they are guarantees; and they don't use anything about the distributions of X or Y beyond the mean and variance. Also they're easy to prove....

Two hallmarks of Probability Theory are the:

1. LAW of LARGE NUMBERS:
If X1, ..., Xn, ... are all independent with the same probability distribution with (finite) mean mu, then

    Xbar_n = (X1 + X2 + ... + Xn)/n  -->  mu,

i.e. Ybar_n = (Xbar_n - mu)/sigma satisfies Ybar_n --> 0 as n -> oo. The Ybar_n are *random variables*, not just numbers, so we have to be a little more careful about what this means:

    1. WEAK:    P[ |Ybar_n| > eps ] --> 0
    2. STRONG:  P[ Ybar_n -> 0 ] = 1

One can show that the WEAK formulation says the events En = { |Ybar_n| > eps } satisfy P[ En ] -> 0 as n -> oo, while the STRONG formulation makes the stronger statement P[ U {En : n >= m} ] -> 0 as m -> oo.

There is a range of possible variations, weakening "same distribution" or "independent"; but SOME sort of condition is needed to rule out examples like this: Toss a fair coin. If Heads, set X1 = X2 = X3 = ... = 1; if Tails, set X1 = X2 = X3 = ... = 0. Then E[Xi] = mu = 1/2 and all the Xi have the same distribution, but Xbar_n = X1 does not converge to mu as n -> oo in any sense.

For indicator random variables (Xi = 1 if event A occurs on the i'th trial, Xi = 0 otherwise), the Law of Large Numbers says

    Xbar_n = (# of occurrences of A in the first n trials)/n  -->  Prob(A),

consistent with the Frequency Interpretation of probability.

2. CENTRAL LIMIT THEOREM:
If X1, ..., Xn, ... are all independent with the same probability distribution with (finite) mean mu and finite variance sigma^2, then

    sqrt(n) * [ Xbar_n - mu ] / sigma  ==>  N(0,1)

or, for the normalized versions Yi = (Xi - mu)/sigma,

    sqrt(n) * Ybar_n = (Y1 + ... + Yn)/sqrt(n)  ==>  N(0,1),

i.e., the probability distribution of Xbar_n gets closer and closer to the normal distribution with mean mu and variance sigma^2/n.
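As a quick numerical illustration of the theorem, here is a minimal simulation sketch, assuming Python with NumPy; the choices of Exponential(1), n = 50, and 100,000 replications are arbitrary. It standardizes the sample mean of n i.i.d. draws and compares an empirical probability with Phi(1) = 0.8413:

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 50, 100_000
    mu, sigma = 1.0, 1.0        # Exponential(1) has mean 1 and variance 1

    # reps independent samples of size n; standardize each sample mean
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    z = np.sqrt(n) * (xbar - mu) / sigma

    # empirical P[ Z <= 1 ] should be close to Phi(1) = 0.8413
    print((z <= 1.0).mean())

Exponential(1) is convenient only because mu = sigma = 1; any other distribution with finite variance shows the same limiting behavior.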
The Central Limit Theorem lets us compute approximate probabilities for means & sums of any set of independent random variables. Poisson and Gamma random variables are *approximately* normal for large lambda or alpha, respectively; for example, let X be Poisson with mean 100; then

    P[ X <= 110 ] = Sum [ 100^k/k! * exp(-100) : k = 0...110 ]  =  0.8528627
                  = P[ (X - 100)/10 <= (110.5 - 100)/10 ]
           APPROX = Phi( (110.5 - 100)/10 = 1.05 )              =  0.8531409

Note that this is a better approximation than Phi((110-100)/10) = .84; the difference (the continuity correction) is more striking at smaller values of lambda.

--------
Simple Proof of the Weak Law of Large Numbers:
Set Yi = (Xi - mu)/sigma, with average Ybar_n = (Xbar_n - mu)/sigma; in this notation the three results say:

    WLLN:  P[ |Ybar_n| > eps ] -> 0
    SLLN:  P[ Ybar_n -> 0 ] = 1
    CLT:   P[ Ybar_n < a/sqrt(n) ] -> Phi(a)

For the WLLN: the *mean* of Ybar_n is zero and the *variance* of Ybar_n is 1/n, so by Chebychev's inequality

    P[ |Ybar_n| > eps ] <= (1/n) / eps^2 = 1/(n*eps^2)  -->  0

as n -> oo, for any eps > 0.

CLT: Any probability distribution (at least, any whose MGF is finite for t near 0) is characterized by its "moment generating function" (MGF, related to Fourier & Laplace transforms):

    M(t) = E[e^(tX)] = integral from -oo to oo of e^(tx) f(x) dx     (for continuous X with pdf f)

Interesting properties (and the genesis of the name) include

    M(0) = 1,   M'(0) = mu,   M''(0) = E[X^2]   =>   sigma^2 = M''(0) - {M'(0)}^2

or, a bit more elegantly,

    log M(0) = 0;   [log M]'(0) = mu;   [log M]''(0) = sigma^2,

so a Taylor series expansion gives

    log M(t) = 0 + mu*t + sigma^2 * t^2/2 + R(t),   or
    M(t) = exp( mu*t + sigma^2 * t^2/2 ) * exp( R(t) ),

with remainder R(t) small compared to t^2 as t -> 0.

Some common distributions' MGFs are:

    Binomial:     M(t) = [p e^t + q]^n                log M = n * log{1 + p(e^t - 1)}
    Poisson:      M(t) = exp(lambda*(e^t - 1))        log M = lambda * (e^t - 1)
    Geometric:    M(t) = p/(1 - q e^t)                log M = -log{1 + (q/p)(1 - e^t)}
    Exponential:  M(t) = (1 - t/lambda)^-1            log M = -log{1 - t/lambda}
    Gamma:        M(t) = (1 - t/lambda)^-alpha        log M = -alpha * log{1 - t/lambda}
    Normal:       M(t) = exp(mu*t + sigma^2*t^2/2)    log M = mu*t + sigma^2*t^2/2

In particular, the standard normal has log M(t) = t^2/2.

For independent rv's, the sum Sn has MGF equal to the product of the MGFs:

    M_Sn(t) = E[e^(t*(X1+...+Xn))] = E[e^(t*X1)] * E[e^(t*X2)] * ... * E[e^(t*Xn)] = M(t)^n.

After subtracting mu and dividing by sigma we might as well start with Xi's that have zero mean and unit variance; for mu = 0, sigma = 1,

    Zn = (Sn - n*mu)/(sigma * sqrt(n)) = Sn/sqrt(n)

satisfies

    log M_Zn(t) = n * log M(t/sqrt(n))
                = n * [ (t/sqrt(n))^2 / 2 + R(t/sqrt(n)) ]        (Taylor's Theorem)
                = t^2/2 + n * {something small compared to t^2/n}
                --> t^2/2

===> for large n, Zn has *approximately* the distribution with MGF exp(t^2/2), i.e. the standard normal distribution. (Using more advanced mathematics, including the "Fourier inversion formula", we could show that the CDF of Zn, and the expectation E[g(Zn)] for any bounded continuous function g(z), converge to those of a standard normal random variable.) Note that the PDF need not converge, and in fact will not even exist unless at least one of the Xj has a continuous distribution.

SO, any sum or average of a large number of independent random variables is APPROXIMATELY normally distributed.
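The limit n * log M(t/sqrt(n)) --> t^2/2 can be checked numerically; a minimal sketch, assuming Python with NumPy and using the centered Exponential(1), Y = X - 1 (so mu = 0, sigma = 1, and M(t) = e^(-t)/(1 - t) for t < 1):

    import numpy as np

    # log-MGF of Y = X - 1 with X ~ Exponential(1):  log M(t) = -t - log(1 - t),  t < 1
    def log_mgf(t):
        return -t - np.log1p(-t)

    t = 1.0
    for n in [10, 100, 10_000, 1_000_000]:
        print(n, n * log_mgf(t / np.sqrt(n)))   # approaches t**2/2 = 0.5

The printed values shrink toward 0.5 roughly like 1/sqrt(n); that gap is exactly the "something small compared to t^2/n" term, multiplied by n.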
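The Poisson example near the top of this section can be reproduced the same way; a minimal sketch, assuming Python with SciPy:

    from scipy.stats import norm, poisson

    exact  = poisson.cdf(110, 100)           # P[ X <= 110 ] for X ~ Poisson(mean 100): about 0.8529
    approx = norm.cdf((110.5 - 100) / 10)    # continuity-corrected normal approximation: about 0.8531
    crude  = norm.cdf((110.0 - 100) / 10)    # without the correction: about 0.8413
    print(exact, approx, crude)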
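Likewise, the generating-function expansion for S2 = Bi(n=10, p=.2) + Bi(n=6, p=.5) given at the start of these notes is just the convolution of the two pmfs; a minimal sketch, again assuming NumPy and SciPy:

    import numpy as np
    from scipy.stats import binom

    p1 = binom.pmf(np.arange(11), 10, 0.2)   # pmf of Bi(n=10, p=.2) on 0..10
    p2 = binom.pmf(np.arange(7), 6, 0.5)     # pmf of Bi(n=6,  p=.5) on 0..6

    # pmf of the sum = convolution of the pmfs = the coefficients of z^0 ... z^16 in G(z)
    print(np.convolve(p1, p2))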