Pitman STA 230 / MTH 230 Probability, Week 10
Pitman Sections 5.3-4:
   5.1 Uniform Distributions
   5.2 Densities
   5.3 Independent Normal Variables
   5.4 Operations (Optional)
   Distribution of Sums; Density of X+Y; MGF
   Expo and Gamma Distributions
   Beta Integral, Beta RV's
   Distribution of Ratios

-------------------------------- Indep Normal RV's -------------------------

phi(z) = c exp(-z^2/2),   c = 1/sqrt(2 pi)

X, Y indep std normal:  f(x,y) = c^2 exp( -(x^2 + y^2)/2 ),   c^2 = 1/(2 pi)

-- Calculate c^2 by changing to polar coordinates.

-- Rayleigh Dist'n: f(r) = r exp(-r^2/2), the radius of a bivariate std normal
   [ Note: T = R^2/2 has a std expo dist'n, so R = sqrt(2 T); a sketch that
     uses this fact to generate normal pairs appears at the end of this section ]

-- VARIANCE OF NORMAL:
   [ Consider introducing the gamma function and evaluating E[ |X|^p ] from it ]

-- Linear Combinations and Rotations:
   X_th = X cos(theta) + Y sin(theta) ~ No(0,1)
   Y_th = X sin(theta) - Y cos(theta) ~ No(0,1)
   For theta = pi/4, (X+Y)/sqrt(2) and (X-Y)/sqrt(2) ~ No(0,1).
   If Z ~ No(0,1) then c*Z ~ No(0, c^2)  ==>  X+Y ~ No(0, 2).

MGFs:

We will soon look at sums of indep RVs.  Expectations of PRODUCTS factor
nicely for indep RVs, while the distribution of a SUM takes work (a
convolution) --- so we exponentiate sums or, more generally, linear
combinations of RVs to study their distributions.

One useful tool: the Moment Generating Function, or MGF:  M(t) = E[ exp(t*X) ]

Examples:
   Bi:  [ p e^t + q ]^n
   Po:  exp[ lam ( e^t - 1 ) ]
   NB:  q^alp [ 1 - p e^t ]^(-alp)
   Ga:  ( 1 - t/lam )^(-alp)
   No:  exp( mu t + sig^2 t^2/2 )
   Pa:  oo  (infinite for every t > 0)

Properties:  M(0) = 1     M'(0) = mu     M"(0) = E[X^2]
Log MGF:     psi(0) = 0   psi'(0) = mu   psi"(0) = sig^2
Scale/Shift: *)  Y = a + b X  ==>  M_Y(t) = exp(a t) M_X(b t)
Evaluate the MGF for a standard normal, then use *) to get an arbitrary normal.

Joint MGF:  M(s,t) = E[ exp( s*X + t*Y ) ]
   M(s,0) = M_X(s)      M(0,t) = M_Y(t)
   M_st(0,0)   = E[ XY ]                              (mixed partial)
   psi_st(0,0) = E[ (X - mu_X)(Y - mu_Y) ] = Covariance

If   X = a*Z_1 + b*Z_2,   Y = c*Z_1 + d*Z_2   (Z_1, Z_2 indep std normal),
then M(s,t) = E exp[ (as+ct) Z_1 + (bs+dt) Z_2 ]
            = exp[ (as+ct)^2/2 + (bs+dt)^2/2 ]
            = exp[ (a^2+b^2) s^2/2 + (c^2+d^2) t^2/2 + (ac+bd) st ],
so Var(X) = a^2+b^2,  Var(Y) = c^2+d^2,  Cov(X,Y) = ac+bd.

----------------------------------------------------------------------------

For any desired variances and covariance, we can set
   a = sig_X    b = 0    c = Cov/sig_X    d = sqrt( sig_Y^2 - c^2 )
and reach X, Y with mean 0 and the desired variances and covariance
(add constants to get nonzero means).

PREDICTION: If we know X, what's E[ Y | X ]?
For Gaussians,  E[ c*Z_1 + d*Z_2 | a*Z_1 = x ] = x (c/a) = x * Cov/sig_X^2
(Linear Reg'n).

SUMS OF INDEP NORMAL RV'S:
   X ~ No(lam, sig^2)  ==>  E{ exp[tX] } = exp{ t lam + t^2 sig^2/2 }
   Y ~ No(mu,  tau^2)  ==>  E{ exp[tY] } = exp{ t mu  + t^2 tau^2/2 }
   ==>  E{ exp[t(X+Y)] } = exp{ t(lam+mu) + t^2 (sig^2 + tau^2)/2 }   (X, Y indep)
   ==>  X+Y ~ No( lam+mu, sig^2 + tau^2 )
More generally, indep X_i ~ No(mu_i, sig_i^2)  ==>  Sum X_i ~ No( Sum mu_i, Sum sig_i^2 ).

EXAMPLE (X, Y, Z iid std normal):
   Pr[ X+Y < Z+2 ] = Pr[ X+Y-Z < 2 ] = Phi( 2/sqrt(3) ) = Phi(1.1547) = 0.8759

Chi-Squared Distribution:
   R_n = sqrt( Z_1^2 + ... + Z_n^2 ) ~ c_n r^(n-1) exp(-r^2/2)           (*)
   Y = (R_n)^2 ~ const * y^(n/2 - 1) exp(-y/2)  ==>  Y ~ Ga(n/2, 1/2)
   ==> read off the constant c_n in (*) from the Gamma normalizing constant.

If X_i ~ No(mu, sig^2) then
   SX_n  = X_1 + X_2 + ... + X_n ~ No( n mu, n sig^2 )
   X-bar = SX_n / n ~ No( mu, sig^2/n );    Mean = mu,  Variance = sig^2/n -> 0
   (X_1-mu)^2 + ... + (X_n-mu)^2 ~ Ga( n/2, 1/(2 sig^2) )
   (1/n) * ("")                  ~ Ga( n/2, n/(2 sig^2) );
                                   Mean = sig^2,  Variance = 2 sig^4/n -> 0
Alas we don't know mu... but we can ESTIMATE it (by X-bar), and get
   (X_1 - Xbar)^2 + ... + (X_n - Xbar)^2 ~ Ga( (n-1)/2, 1/(2 sig^2) )
   1/(n-1) * ("")  has mean sig^2, variance 2 sig^4/(n-1) -> 0.  Whee!
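A minimal simulation sketch of these last facts (not in Pitman; it assumes
Python with numpy, and the values of mu, sig, n below are arbitrary
illustrations):

    # Simulation check (illustrative sketch): verify that
    #   Xbar ~ No(mu, sig^2/n)   and
    #   S = sum (X_i - Xbar)^2 ~ Ga((n-1)/2, 1/(2 sig^2)),
    # so S/(n-1) has mean sig^2 and variance 2 sig^4/(n-1).
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sig, n, reps = 3.0, 2.0, 10, 200_000      # arbitrary illustrative values

    X = rng.normal(mu, sig, size=(reps, n))       # reps independent samples of size n
    xbar = X.mean(axis=1)
    S = ((X - xbar[:, None]) ** 2).sum(axis=1)    # sum of squared deviations from Xbar

    print(xbar.mean(), mu)                            # ~ mu
    print(xbar.var(), sig**2 / n)                     # ~ sig^2/n
    print((S / (n - 1)).mean(), sig**2)               # ~ sig^2
    print((S / (n - 1)).var(), 2 * sig**4 / (n - 1))  # ~ 2 sig^4/(n-1)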
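And a sketch of the bracketed Rayleigh note above: pairing R = sqrt(2 T),
with T a standard exponential, with an independent uniform angle gives two
independent standard normal coordinates (again Python/numpy; the sample
size is arbitrary):

    # Generate No(0,1) pairs from the Rayleigh note:
    # T = R^2/2 ~ Expo(1), so R = sqrt(2 T); pair R with an independent
    # uniform angle and take the x- and y-coordinates.
    import numpy as np

    rng = np.random.default_rng(1)
    m = 100_000                                # arbitrary sample size
    T = rng.exponential(1.0, size=m)           # T ~ standard exponential
    R = np.sqrt(2 * T)                         # Rayleigh radius
    Theta = rng.uniform(0, 2 * np.pi, size=m)  # independent uniform angle

    X, Y = R * np.cos(Theta), R * np.sin(Theta)
    print(X.mean(), X.var(), Y.var(), np.cov(X, Y)[0, 1])   # ~ 0, 1, 1, 0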
----------------------------------------------------------------------------
Functions of Random Vectors

Okay, see the stuff below, but really this week should concentrate on the
idea of expressing two normal random variables (X.1, X.2) in the form
   X.1 = mu.1 + s1 * Z.1
   X.2 = mu.2 + s2 * ( rho Z.1 + a Z.2 ),   where a^2 = 1 - rho^2,
SO   X.2 | X.1 ~ No( mu.2 + s2*rho*(X.1 - mu.1)/s1, (s2*a)^2 ).
Probably should take mu.1 = mu.2 = 0 first.
Really, DO NOT jump to LLN + CLT too fast.  Students struggle in here.
============================================================================
Multivariate Normal Variables:

Last week we saw that the variances of the X.i and the covariances of X.i
and X.j are the diagonal and off-diagonal entries in the matrix
   E[ (X - mu) (X - mu)' ]        (mu = E[X]; ' denotes transpose).
This is especially interesting for normally distributed random variables.
If Z is a zero-mean unit-variance normal random variable, Z ~ N(0,1) (we
call such a thing a "standard normal" random variable), and if a, b are
real numbers, then X = a Z + b is also normally distributed, with mean
mu = b and variance sigma^2 = a^2.  If we take Z to be a p-dimensional
VECTOR of independent zero-mean unit-variance normal random variables, B a
p-dimensional VECTOR, and A a pxp MATRIX, then the same thing happens: the
random variables
   X.1 = B1 + A11*Z.1 + A12*Z.2 + ... + A1p*Z.p
   X.2 = B2 + A21*Z.1 + A22*Z.2 + ... + A2p*Z.p
   ...
   X.p = Bp + Ap1*Z.1 + Ap2*Z.2 + ... + App*Z.p
or, in vector notation, the components of the vector
   X = B + A Z,
are all normally distributed, with mean E[X] = B and covariance matrix
   E[ (X - B)(X - B)' ] = E[ A Z Z' A' ] = A A'.

MORALS:

1. If you'd like to generate normal random variables with means mu_i,
   variances sigma_i^2, and covariances Cij, set Cii = sigma_i^2 and find
   any matrix A with AA' = C (kind of a square root); generate p
   independent standard normals Z; and set X = mu + A Z.

2. For example, with p=2 and covariance r = Cov(X.1, X.2), we can take

          [ a  0 ]                     [ sig.1^2     r     ]
      A = |      |   and solve   AA' = |                   |
          [ b  c ]                     [    r     sig.2^2  ]

                     [ a^2      ab     ]
      to find  AA' = |                 |
                     [ ab    b^2+c^2   ]

   and hence a = sig.1, b = r/sig.1, c = sqrt( sig.2^2 - r^2/sig.1^2 ); so

      X.1 = mu.1 + Z.1 * sig.1
      X.2 = mu.2 + Z.1 * (r/sig.1) + Z.2 * sqrt( sig.2^2 - r^2/sig.1^2 )

   (a simulation sketch of this recipe appears at the end of these notes).

3. Want to PREDICT something?  Let's find:
      E[ X.2 | X.1 ] = mu.2 + (r/sig.1) * (X.1 - mu.1)/sig.1
                                   [ substituting Z.1 = (X.1 - mu.1)/sig.1 ]
                     = mu.2 + (r/sig.1^2) * (X.1 - mu.1)
   "Linear Regression"; note that sometimes we write the COVARIANCE r in
   terms of the CORRELATION COEFFICIENT rho = r/(sig.1*sig.2), whence the
   formula is
      E[ X.2 | X.1 ] = mu.2 + (rho * sig.2/sig.1) * (X.1 - mu.1)   (see p.349)

4. Want an even EASIER way?  For all random variables,
      INDEPENDENT ==> UNCORRELATED, but for most NOT the converse;
   for JOINTLY NORMAL RVs only, INDEP <==> UNCORR.
   Also, for normals, conditional expectations are always LINEAR, so
      E[ X.2 | X.1 ] = a + b * X.1   for SOME numbers a, b.
   To find them, just make sure that the prediction error
   Y = (X.2 - a - b X.1) is orthogonal to (hence independent of) X.1:
      0 = E[ (X.2 - a - b X.1) * (X.1 - mu.1) ] = r - a*0 - b*sig.1^2
        ==>  b = r/sig.1^2.
   ALSO, the conditional VARIANCE is just
      Var[ X.2 | X.1 ] = E[ Y^2 ] = sig.2^2 - r^2/sig.1^2.
   The "explained variation" fraction of X.2 is
      r^2/(sig.1^2 * sig.2^2) = rho^2,
   a number between 0 and 1 that tells how much of X.2's variation can be
   attributed to its relationship with X.1.

5. Want the joint density function for X.1 ... X.p?  Maybe not...
   but if so, change variables:

      f(Z.1 ... Z.p) = (2 pi)^(-p/2) * exp[ - Sum_i (Z.i)^2 / 2 ]
                     = (2 pi)^(-p/2) * exp[ - Z'Z / 2 ]

      f(X.1 ... X.p) = const * exp[ - (X - mu)' C^(-1) (X - mu) / 2 ],
                       const = (2 pi)^(-p/2) det(C)^(-1/2).

6. MGF:  M(t) = E[ exp( t . X ) ]
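A sketch checking item 6: for the multivariate normal X = mu + A Z with
AA' = C, the MGF is M(t) = exp( t'mu + t'Ct/2 ), and a Monte Carlo average
of exp(t . X) should agree with it (Python/numpy assumed; the particular
mu, C, and t below are arbitrary illustrations):

    # Check the multivariate normal MGF  M(t) = exp( t'mu + t'C t / 2 )
    # by simulating X = mu + A Z with A A' = C.
    import numpy as np

    rng = np.random.default_rng(2)
    mu = np.array([1.0, -1.0])                # arbitrary mean vector
    C = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                # arbitrary covariance matrix
    A = np.linalg.cholesky(C)                 # one "square root": A A' = C

    m = 500_000
    Z = rng.normal(size=(m, 2))               # rows are indep std normal vectors
    X = mu + Z @ A.T                          # rows are draws of X = mu + A Z

    t = np.array([0.3, -0.2])                 # arbitrary argument of the MGF
    print(np.exp(X @ t).mean())               # Monte Carlo estimate of E[exp(t . X)]
    print(np.exp(t @ mu + t @ C @ t / 2))     # exact MGF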
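Finally, the simulation sketch promised in Moral 2: generate a correlated
pair with the lower-triangular A, then check Moral 3's regression line by
averaging X2 over a thin slice of X1 values (Python/numpy; the parameter
values and the slice width are made up for illustration, so the last two
numbers agree only approximately):

    # Moral 2's recipe for a correlated pair, plus Moral 3's regression line
    #   E[X2 | X1 = x] = mu2 + (r/sig1^2) (x - mu1).
    import numpy as np

    rng = np.random.default_rng(3)
    mu1, mu2, sig1, sig2, r = 1.0, -2.0, 2.0, 3.0, 4.0   # arbitrary; need |r| <= sig1*sig2

    m = 400_000
    Z1, Z2 = rng.normal(size=m), rng.normal(size=m)
    X1 = mu1 + sig1 * Z1
    X2 = mu2 + (r / sig1) * Z1 + np.sqrt(sig2**2 - r**2 / sig1**2) * Z2

    print(np.cov(X1, X2))                     # ~ [[sig1^2, r], [r, sig2^2]]

    x0 = 2.0                                  # predict X2 when X1 is near x0
    near = np.abs(X1 - x0) < 0.05             # thin slice of X1 values
    print(X2[near].mean())                    # Monte Carlo conditional mean
    print(mu2 + (r / sig1**2) * (x0 - mu1))   # regression-line prediction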