MTH 135 / STA 104: Probability                              Week 6
Read: Pitman, Section 3.3
------------------------------------------------------------
Useful Facts about Expectations:

Definition:    E[ g(X) ] = SUM g(x) * P[X=x]

               mu      = E[ X ]
               sigma^2 = E[ (X-mu)^2 ] = E[ X^2 ] - mu^2

Linearity:     E[ a + b*X ] = SUM { (a + b*x) * P[X=x] }
                            = SUM a * P[X=x] + SUM b * x * P[X=x]
                            = a + b*mu

             VAR[ a + b*X ] = E[ (a + b*X - a - b*mu)^2 ]
                            = E[ ( b * (X-mu) )^2 ]
                            = b^2 * sigma^2

               E[ X + b*Y ] = mu_X + b * mu_Y

             VAR[ X + b*Y ] = E[ ( X + b*Y - mu_X - b*mu_Y )^2 ]
                            = E{ [ (X-mu_X) + b*(Y-mu_Y) ]^2 }
                            = E (X-mu_X)^2 + 2*b * E (X-mu_X)(Y-mu_Y)
                              + b^2 * E (Y-mu_Y)^2
                            = sigma_X^2 + 2*b*Cov(X,Y) + b^2 * sigma_Y^2

Markov's Inequality:  Let phi(x) be EVEN and INCREASING on [0,oo)
(like |x| or x^2).  Then for any number a>0 and any random variable X,

    P[ |X| > a ] <= P[ phi(X) >= phi(a) ] <= E[ phi(X) ] / phi(a)

Examples:   P[ |X| > a ]           <= E|X| / a
            P[ |X| > a ]           <= E X^2 / a^2
            P[ |X-mu| > a ]        <= sigma^2 / a^2      (Chebyshev)
            P[ |X-mu| > k*sigma ]  <= 1/k^2              (set a = k*sigma)
------------------------------------------------------------
Let's explore what happens for averages

    Xbar = ( X1 + X2 + ... + Xn ) / n

of a sequence of independent random variables, all with the same
distribution with

    MEAN      E[ X ]        = mu    and
    VARIANCE  E[ (X-mu)^2 ] = sigma^2.

Let Sn = X1 + X2 + ... + Xn be the "partial sum"; then

    E[ Sn ]   = E[ X1 + X2 + ... + Xn ]
              = (E X1) + (E X2) + ... + (E Xn)
              = mu + mu + ... + mu = n * mu    and

    VAR[ Sn ] = E[ { (X1-mu) + (X2-mu) + ... + (Xn-mu) }^2 ]
              = n * E[ (Xi-mu)^2 ] + n*(n-1) * E[ (Xi-mu)(Xj-mu) ]
              = n * sigma^2

(the n*(n-1) cross terms vanish, since independence gives
E[ (Xi-mu)(Xj-mu) ] = 0 for i != j), so

    E[ Xbar ]   = (1/n) * ( n * mu )        = mu
    VAR[ Xbar ] = (1/n^2) * ( n * sigma^2 ) = sigma^2 / n
------------
SO, for samples of size n, for any number eps > 0,

    P[ | Xbar - mu | > eps ] =  P[ | Sn - n*mu | > n*eps ]
                             <= E[ (Sn - n*mu)^2 ] / (n*eps)^2
                             =  n * sigma^2 / ( n^2 * eps^2 )
                             =  sigma^2 / ( n * eps^2 )  ->  0,

so the probability that Xbar isn't very close to the mean mu goes to
zero as n -> oo.  With more work we can show that P[ Xbar -> mu ] = 1.
-------------
Sketch if there's time:  Let {An} be events, and let A be the event

    A = { infinitely-many of the An occur }
      = (intersection over m>0) { at least one An occurs for n>=m }

If Sum{ P[An] : n=1,2,... } < oo, then fix any epsilon>0 and find M
such that Sum{ P[An] : n=M,M+1,... } < epsilon.  Then

    P[ A ] <= P[ at least one An occurs for n>=M ]
           <= Sum{ P[An] : n>=M }
           <  epsilon,

so P[ A ] = 0.  This is called the "Borel-Cantelli Lemma", or "B-C":
if the probabilities of some sequence of events are summable, then at
most finitely-many of them can occur.  The An do NOT have to be
independent.

Okay --- now let Xn be independent random variables with means mu and
variances sigma^2 < oo.  We saw that, for any eps>0,

    P[ | Xbar_n - mu | > eps ] <= sigma^2 / ( eps^2 * n )

The statement "Xbar_n -> mu" is the same as saying "for every eps>0,
only finitely-many An occur" for the events

    An = { | Xbar_n - mu | > eps },   Xbar_n = Sn / n.

Unfortunately, Sum[ 1/n ] = oo, so we can't apply B-C directly.  But
Sum P[ A_{n^2} ] <= Sum sigma^2 / ( eps^2 * n^2 ) < oo, so B-C DOES
tell us that only finitely-many of A1, A4, A9, A16, A25, ...,
A_{n^2}, ... occur; hence Xbar_k -> mu along the *subsequence*
k = n^2.  With a little more work we can fill in the gaps between the
squares to get the STRONG LAW OF LARGE NUMBERS:
-------------
Thus,
               Sn - n*mu
    LLN:       ---------  =  ( Xbar - mu )  ->  0
                   n

If we divide by something that grows slower.... namely, sqrt(n)....
then:
               Sn - n*mu
    CLT:       ---------  =  sqrt(n) * [ Xbar - mu ]  ==>  No( 0, sigma^2 )
                sqrt(n)

We'll see more of this later; two small simulation sketches follow.
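First, a minimal Python sketch (not from Pitman) of the Chebyshev
bound and the weak law in action.  It assumes Exponential(1) draws, so
mu = sigma^2 = 1; the names eps, n_reps, etc. are ours, chosen only
for illustration.

import random

# Sketch: compare the empirical P[ |Xbar - mu| > eps ] with Chebyshev's
# bound sigma^2 / (n * eps^2), using Exponential(1) draws so that
# mu = sigma^2 = 1.  All parameter choices here are illustrative.

random.seed(6)
mu, sigma2 = 1.0, 1.0      # mean and variance of Exponential(1)
eps        = 0.1
n_reps     = 5000          # Monte Carlo replications per sample size

for n in [10, 100, 1000]:
    misses = 0
    for _ in range(n_reps):
        xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            misses += 1
    bound = sigma2 / (n * eps * eps)
    print(f"n={n:5d}  P[|Xbar-mu|>eps] ~ {misses/n_reps:.4f}"
          f"   Chebyshev bound = {min(bound, 1.0):.4f}")

The bound should hold for every n but is typically quite conservative;
the point is that both columns head to zero as n grows.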
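Second, a sketch of the CLT statement above: standardize Sn as
(Sn - n*mu) / (sigma * sqrt(n)) and compare its empirical CDF with the
No(0,1) CDF.  Again the summands are assumed Exponential(1), and the
grid of z values is arbitrary.

import random
import statistics

# Sketch: the standardized partial sum (Sn - n*mu)/(sigma*sqrt(n))
# should be approximately No(0,1) for large n.  Exponential(1)
# summands give mu = sigma = 1.  Parameter choices are illustrative.

random.seed(6)
n, n_reps = 500, 4000
mu, sigma = 1.0, 1.0
phi = statistics.NormalDist()          # standard normal No(0,1)

zs = [(sum(random.expovariate(1.0) for _ in range(n)) - n * mu)
      / (sigma * n ** 0.5)
      for _ in range(n_reps)]

for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    emp = sum(v <= z for v in zs) / n_reps
    print(f"z={z:+.1f}   empirical P[Z<=z] = {emp:.3f}"
          f"   No(0,1) CDF = {phi.cdf(z):.3f}")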
But we've already seen the DeMoivre-Laplace theorem, where
X1, X2, ..., Xn are INDICATOR RVs with

    P[ Xj = 1 ] = p        P[ Xj = 0 ] = 1-p = q
    E[ Xj ]     = p        VAR[ Xj ]   = p*q

so Sn has a Binomial Bi(n, p) distribution, and DeM-Lap and CLT both
say:
           Sn - np
        --------------  ~~  No(0, 1)
        sqrt( n*p*q )
----------------------------------------------------------------------
Extremes:

Particularly in the aftermath of events like the 2008 housing market
crash, the 2010 Gulf oil spill, global warming trends, and such, we
are more and more interested not only in the *AVERAGE* behaviour of
random variables but also in their *EXTREMES*.  The LLNs and the CLT
talk only about averages.

If X1, X2, ..., Xn are independent random variables, set
X*n = max(X1, ..., Xn).  (We could do the minimum too.)

  - Does X*n have a limiting probability distribution?  (Probably not...)
  - If not, can we find constants a.n, b.n such that
    Zn = (X*n - a.n) / b.n has a limiting distribution?  (YES, usually)
  - What will that distribution be?

Examples:  (1) Xn uniform random variables
           (2) Xn exponential random variables
           (3) Xn Pareto random variables

It turns out the three limits below are the only possibilities!  They
can all be put into one family, the Generalized Extreme Value ("GEV")
family.  Ask me if you're interested in learning more.

Uniform(0,1):   P[ X*n < x ] = x^n,   0 < x < 1

    P[ Zn < z ] = P[ X*n < a.n + z*b.n ] = (a.n + z*b.n)^n

    Set a.n = 1, b.n = 1/n, and consider -n < z < 0:

                = (1 + z/n)^n  ->  exp(z),       -oo < z < 0
                                                 ("Reversed Weibull")

Exponential w/rate lam:   P[ X*n < x ] = [ 1 - exp(-lam*x) ]^n

    P[ Zn < z ] = P[ X*n < a.n + z*b.n ]
                = [ 1 - exp( -lam * (a.n + z*b.n) ) ]^n

    Set a.n = log(n)/lam and b.n = 1:

                = [ 1 - (1/n) * exp(-lam*z) ]^n
                ->  exp( -exp(-lam*z) ),         -oo < z < oo
                                                 ("Gumbel")

Pareto:   P[ Xn > x ] = (eps/x)^alp,  x > eps    (alp > 0, eps > 0)

    P[ X*n < x ] = [ 1 - (eps/x)^alp ]^n

    P[ Zn < z ] = P[ X*n < a.n + z*b.n ]
                = { 1 - [ eps/(a.n + z*b.n) ]^alp }^n

    Set a.n = 0 and b.n = eps * n^(1/alp); then

                = { 1 - [ eps/(z * eps * n^(1/alp)) ]^alp }^n
                = { 1 - z^(-alp)/n }^n
                ->  exp( -z^(-alp) ),            0 < z < oo
                                                 ("Frechet")
----------------------------------------------------------------
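A minimal simulation sketch (not from Pitman) of the Gumbel case
above: for Exponential(lam) draws, center the maximum by
a.n = log(n)/lam (with b.n = 1) and compare the empirical CDF of Zn
with exp(-exp(-lam*z)).  The choices lam = 1, n, n_reps, and the z
grid are ours, for illustration only.

import math
import random

# Sketch: the centered maximum Zn = X*n - log(n)/lam of n iid
# Exponential(lam) draws should be approximately Gumbel:
# P[ Zn <= z ] ~ exp(-exp(-lam*z)).  Parameter choices are illustrative.

random.seed(6)
lam, n, n_reps = 1.0, 1000, 5000
a_n = math.log(n) / lam

zn = [max(random.expovariate(lam) for _ in range(n)) - a_n
      for _ in range(n_reps)]

for z in [-1.0, 0.0, 1.0, 2.0, 3.0]:
    emp = sum(v <= z for v in zn) / n_reps
    gumbel = math.exp(-math.exp(-lam * z))
    print(f"z={z:+.1f}   empirical P[Zn<=z] = {emp:.3f}"
          f"   Gumbel limit = {gumbel:.3f}")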