STA250: Statistics, based loosely on DeGroot & Schervish
----------------------------------------
Week 6: Sampling Distributions of Estimators
====================================
We've seen a bit about interval estimation for normal distribution
means when the variance is known. But what about No(mu, sig^2)
with BOTH parameters UNknown--- what do we do when the variance must
be estimated too? And what about other distributions, besides the normal?
That's today's main topic.
8.1 Sampling Distribution of Statistics
Since estimation etc. is best done with sufficient statistics, it's enough
to know the distribution of sufficient statistics T for each of the common
distributions. For example:
1) X.i ~ Bern(p): T = \sum X.i ~ Bi(n,p)
2) X.i ~ Poi(lam): T = \sum X.i ~ Poi(n*lam)
3) X.i ~ No(mu, sig^2): T = \sum X.i ~ No(n*mu, n*sig^2)
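As a quick sanity check of, e.g., case (2), here is a minimal R sketch (the
sample size, rate, and replication count are arbitrary illustrative choices,
not from the notes):
    ## Simulate T = sum(X.i) for X.i ~ Poi(lam); compare with Poi(n*lam)
    set.seed(1)                          # reproducibility
    n <- 10; lam <- 2.5; reps <- 1e5     # arbitrary illustrative values
    T.sim <- replicate(reps, sum(rpois(n, lam)))
    mean(T.sim <= 25)                    # empirical Pr[T <= 25]
    ppois(25, n * lam)                   # exact Poi(n*lam) CDF at 25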
-----------------
Let X.1 ... X.n be n indep RV's from same dist'n with pdf/pmf f(x); let T.n
be any statistic. What is the probability distribution of T.n(X)???
The answer is useful for "sampling based" statistical inference--- for
example, if X.j has No(mu, 1) dist'n and T.n is X-bar, then
X-bar ~ No(mu, 1/n) so e.g.
0.95 = Pr[ - 1.96 < sqrt(n) (X-bar - mu) < 1.96 ]
= Pr[ mu - 1.96/sqrt(n) < X-bar < mu + 1.96/sqrt(n) ]
= Pr[ X-bar - 1.96/sqrt(n) < mu < X-bar + 1.96/sqrt(n) ]
and
(1/n) = R(mu, T.n) = E[ | X-bar - mu |^2 | mu ]
More generally---- if X.j have *KNOWN* variance sig^2, then
0.95 = Pr[ - 1.96 < (X-bar - mu)/{sig/sqrt(n)} < 1.96 ] (*)
= Pr[ X-bar - 1.96 sig/sqrt(n) < mu < X-bar + 1.96 sig/sqrt(n) ]
with similar expressions with "0.95" replaced by 0.90 (replace 1.96 with
1.645), 0.99 (with 2.576), or any other gamma (find z* so Phi(z*) =
(1+gamma)/2 ).
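In R (an illustrative sketch; the choices mu = 0, sig = 1, n = 25 are mine,
not from the notes):
    ## Critical values z* with Phi(z*) = (1+gamma)/2
    gamma <- c(0.90, 0.95, 0.99)
    qnorm((1 + gamma) / 2)                 # 1.645, 1.960, 2.576
    ## Coverage check for X-bar +/- 1.96 sig/sqrt(n), with known sig = 1
    set.seed(1); n <- 25; reps <- 1e4
    xbar <- replicate(reps, mean(rnorm(n)))
    half <- 1.96 / sqrt(n)
    mean(xbar - half < 0 & 0 < xbar + half)  # close to 0.95 (true mu is 0)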
But what if sig^2 is NOT known? Hmmm...
-----------------
Sneak Preview:
If sigma^2 is known we use the fact that
Z = sqrt(n) (X-bar - mu)/sig
has a standard No(0,1) distribution to turn a probability statement
0.95 = Pr[ -1.96 < Z < 1.96 ]
into a "confidence" statement about random intervals,
0.95 = Pr[ X-bar-1.96 sig/sqrt(n) < mu < X-bar+1.96 sig/sqrt(n) ]
-------
If sig^2 is UNknown, we can estimate it by (sig-hat)^2, but
xxx sqrt(n) (X-bar - mu) / sig-hat
won't have the same distribution--- in fact, it will have "fatter tails"
because sig-hat might be too small. To fix this, we need to discover just
WHAT IS the DISTRIBUTION of xxx ???
The problem was solved by William Sealy Gosset in 1908 while studying
barley yield in Dublin for the Arthur Guinness & Son brewery.
Set:
*** t = sqrt(n-1) (X-bar - mu) / sig-hat
= sqrt (n) (X-bar - mu) / s
where s^2 = sig-hat^2 * n/(n-1) = (1/(n-1)) sum [ (X.i - x-bar)^2 ]
(note "n-1"; we'll come back to that). This "pivotal quantity" has
a distribution that doesn't depend on mu or sigma; WHAT IS IT???????
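Before answering, we can at least check the pivotal claim by simulation.
A hedged R sketch (the sample size and the two (mu, sig) pairs are arbitrary):
the quantiles of *** should agree whatever mu and sigma are.
    ## The *** statistic: t = sqrt(n-1) (X-bar - mu) / sig-hat
    t.stat <- function(n, mu, sig) {
      x <- rnorm(n, mu, sig)
      sig.hat <- sqrt(mean((x - mean(x))^2))   # note: divides by n
      sqrt(n - 1) * (mean(x) - mu) / sig.hat
    }
    set.seed(1); n <- 8; reps <- 1e4
    t1 <- replicate(reps, t.stat(n, mu = 0,  sig = 1))
    t2 <- replicate(reps, t.stat(n, mu = 50, sig = 9))
    quantile(t1, c(0.05, 0.5, 0.95))   # nearly identical to...
    quantile(t2, c(0.05, 0.5, 0.95))   # ...these: mu and sig drop out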
----------------------------------------------------------------------
If X.i = mu + sig * Z.i (or Z.i = (X.i-mu)/sig ) for each i, then
X-bar = mu + sig * Z-bar
(sig-hat)^2 = sum (X.i-X-bar)^2 / n = sig^2 * sum (Z.i-Z-bar)^2 / n
sqrt(n-1) (X-bar - mu) / sig-hat
= sqrt(n-1) (Z-bar) / sqrt{ (1/n) sum (Z.i - Z-bar)^2 }
SO, we can write the dist'n of t in terms only of the Z's
(but keep *** up on the board for later).
----------------------------------------------------------------------
Some preliminaries:
1. Let Z ~ No(0,1); then MGF is M(t) = exp(t^2/2).
Let X ~ No(mu,sig^2); then X = mu + sig*Z has MGF exp(t*mu + t^2 sig^2/2).
2. Let Y ~ Ga(alp, bet); then MGF for Y is (1-t/bet)^(-alp).
3. Back to Z~No(0,1); MGF for Z^2 is 1/sqrt(1-2t), so Z^2 ~ Ga(1/2, 1/2).
4. And Z.1^2 + Z.2^2 + ... + Z.n^2 ~ Ga(n/2, 1/2)
5. If X and Y are indep, then joint MGF factors:
E exp(s X + t Y) = E exp(s X) * E exp(t Y)
Converse is true too--- if MGF factors, then X, Y are indep.
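Items 3-4 can be checked numerically; a minimal R sketch (n and the
evaluation point are arbitrary):
    ## Check item 4: sum of n squared No(0,1) draws ~ Ga(n/2, rate 1/2)
    set.seed(1); n <- 5; reps <- 1e5
    T.sim <- replicate(reps, sum(rnorm(n)^2))
    mean(T.sim <= 4)                      # empirical CDF at 4
    pgamma(4, shape = n/2, rate = 1/2)    # Ga(n/2, 1/2) CDF at 4
    pchisq(4, df = n)                     # same thing: chi-square(n)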
----------------------------------------------------------------------
8.3 Joint Distribution of X-Bar and S-Squared
First, let Z.1 Z.2 ... Z.n be iid No(0,1), and consider three RV's:
Z-bar = (Z.1 + Z.2 + ... + Z.n) / n
T = \sum (Z.j)^2
S^2 = \sum (Z.j - Z-bar)^2
What are their distributions? We'll also need something more---
the *joint* distribution of Z-bar and S^2. By Pythagoras,
T = S^2 + n |Z-bar|^2 (proj'n of Z onto 1_n is Z-bar 1_n)
and, since Z-bar is indep of (Z-Z-bar), it's also independent of S^2
and
E e^{t T} = (1-2t)^{-n/2}
= M_S(t) * E[ exp(t n |Z-bar|^2) ]
= M_S(t) * (1-2t)^{-1/2},
so M_S(t) = (1-2t)^{-(n-1)/2}
and S^2 ~ Ga((n-1)/2, 1/2), independent of Z-bar ~ No(0, 1/n).
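Both conclusions are easy to check by simulation; a sketch in R (sample size
arbitrary; note zero correlation is only a necessary condition for independence):
    ## Verify S^2 ~ chi-square(n-1) and its (lack of) correlation with Z-bar
    set.seed(1); n <- 6; reps <- 1e5
    sims <- replicate(reps, { z <- rnorm(n)
                              c(mean(z), sum((z - mean(z))^2)) })
    zbar <- sims[1, ]; S2 <- sims[2, ]
    mean(S2 <= 5)                  # empirical CDF of S^2 at 5 ...
    pchisq(5, df = n - 1)          # ... matches chi-square(n-1)
    cor(zbar, S2)                  # near 0, as independence requires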
----------------------------------------------------------------------
More generally--- if X.j are iid No(mu, sig^2) then subtract mu from
everything then divide by sig to find that
X-bar ~ No(mu, sig^2/n) is INDEPENDENT of:
S^2 ~ Ga( (n-1)/2, 1/(2 sig^2) )
SO, also,
sqrt(n) (X-bar - mu)/sig ~ No(0, 1) is independent of:
Y = S^2/sig^2 ~ Ga( nu/2, 1/2 ) = chi-square(nu), with nu = n-1
----------------------------------------------------------------------
If we try to replace "sig^2" with an estimate like S^2/(n-1) in (*), what
happens? Note Y = S^2/sig^2 ~ Ga(nu/2, 1/2) for nu = (n-1), while
Z = (X-bar-mu)/(sig/sqrt n) ~ No(0,1), and
sqrt(n) (X-bar - mu)          Z
--------------------- = -------------
  sqrt[ S^2/(n-1) ]       sqrt( Y/nu )
In honor of Arthur Guinness & Son Breweries, and of William Sealy Gosset
aka 'Student':
        Z
  --------------- = "t",    where Z ~ No(0,1) indep. of Y ~ Ga(nu/2, 1/2)
    sqrt( Y/nu )
With some algebra, one can show (as Gosset did) that "t" has pdf
f(t) = const / (1 + t^2/nu)^{ (1+nu)/2 },
const = Gamma((nu+1)/2) / [ sqrt(nu pi) Gamma(nu/2) ],
with nu = n-1 the "degrees of freedom".
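A sketch verifying this by construction in R (nu = 4 is an arbitrary choice):
build "t" as Z/sqrt(Y/nu) from its independent ingredients and compare with
R's built-in t distribution:
    ## Build t = Z / sqrt(Y/nu) and compare with the t(nu) CDF
    set.seed(1); nu <- 4; reps <- 1e5
    Z <- rnorm(reps)
    Y <- rgamma(reps, shape = nu/2, rate = 1/2)   # i.e. chi-square(nu)
    t.sim <- Z / sqrt(Y / nu)
    mean(t.sim <= 1.5)        # empirical CDF at 1.5 ...
    pt(1.5, df = nu)          # ... agrees with pt for df = nu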
-----------------------------------------------------------------------
Here's a sketch of the algebra: First, Z^2 has a chi^2(1) dist'n, with pdf
f(x) = exp(-x/2) / sqrt(2 pi x), x>0. Then, conditioning on Y (really a
product-convolution formula):
P[ | T | <= t ] = P[ Z^2 <= t^2 Y/nu ]
= int_0^oo { int_0^{t^2 y/nu} [ exp(-x/2) / sqrt(2 pi x) ] dx }
(1/2)^(nu/2) y^(nu/2-1) exp(-y/2)/Gamma(nu/2) dy
Now take d/dt to get twice the pdf for t (by the symmetry of T); change
variables from y to x = y(1+t^2/nu)/2 inside the "gamma" integral.
-----------------------------------------------------------------------
Unfortunately the distribution of t does depend on nu so, unlike with the
Normal distribution, we need a separate "table" for every sample size n.
Since no publisher will let someone put 30 pages of t tables in their book,
we instead get "critical values" of the t, i.e., the numbers b such that
P[ -b < t < b ] = 0.95, 0.975, 0.99, 0.995 (say)
for nu=1,2,3,...,30 or so; as nu -> oo these converge to the "critical
values" for the normal distribution, so the tables stop at about nu=30.
In R we can compute: pt( q, nu ) = Pr[ t <= q : df = nu ]
and qt( p, nu ) = q if p = Pr[ t <= q : df = nu ]
(also the density function dt( q, nu ), "n" iid random draws rt( n, nu))
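For instance, the 0.975 quantiles qt(0.975, nu) decrease toward
qnorm(0.975) = 1.96 as nu grows:
    ## t critical values for central 95% intervals, converging to 1.96
    nu <- c(1, 2, 5, 10, 30, 100)
    round(qt(0.975, df = nu), 3)   # 12.706 4.303 2.571 2.228 2.042 1.984
    qnorm(0.975)                   # 1.959964, the nu -> oo limit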
For df nu=1 (so we're estimating sigma^2 from just n=2 observations), the
t distribution is the same as the Cauchy, with pdf
            1/pi
f(t) = ------------------
           1 + t^2
and hence CDF
F(t) = [ arctan(t) / pi ] + (1/2)
so a central 95% interval would be 0.95 = Pr[ -t* < T < t* ]
for t* satisfying
F(t*) = [ arctan(t*) / pi ] + (1/2) = 0.975
i.e. arctan(t*) = pi * 0.475
i.e. t* = tan( pi * 0.475 ) = 12.70620,
*WAY* bigger than the Normal Distribution value (1.96) we get as nu -> oo.
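In R, a one-line check that the arctan calculation agrees with qt:
    tan(pi * 0.475)     # 12.70620
    qt(0.975, df = 1)   # same value: the Cauchy is t with df = 1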
-------------- Bayesian Analysis with Unknown Variance ------------------
The Normal distribution can be parametrized in terms of its mean and
PRECISION: tau = 1/sigma^2
so that the likelihood is
(tau/2 pi)^n/2 exp{- tau sum(X_i-mu)^2/2 }
= (tau/2 pi)^n/2 exp{- tau sum(X_i-x-bar)^2/2 - n tau (x-bar-mu)^2/2 }
As a function of mu and tau, this is of the form of a normal density in mu
with mean x-bar and precision PROPORTIONAL TO tau, times a gamma density
function for tau. SO, let's consider the conjugate family:
pi(mu, tau) ~ No(mu: mean mu.0, precision lam.0 * tau) * Ga(tau: alp.0, bet.0)
THEN, the posterior after observing x.1 ... x.n ~ No(mu, tau) will be of
the same form, but now with four new parameters:
mu.1 = [ lam.0 * mu.0 + n * x-bar.n ] / [ lam.0 + n ]
lam.1 = lam.0 + n
alp.1 = alp.0 + n/2
bet.1 = bet.0 + s.n^2/2 + n lam.0 (x-bar.n - mu.0)^2 / [ 2 (lam.0+n) ]
where x-bar.n = (1/n) sum ( x.j ) and s.n^2 = sum ( x.j - x-bar.n )^2
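A minimal R sketch of the update (the function name ng.update and the example
data below are mine, not from the notes):
    ## Normal-gamma posterior update: prior (mu0, lam0, alp0, bet0) -> posterior
    ng.update <- function(x, mu0, lam0, alp0, bet0) {
      n <- length(x); xbar <- mean(x); s2 <- sum((x - xbar)^2)
      list(mu1  = (lam0 * mu0 + n * xbar) / (lam0 + n),
           lam1 = lam0 + n,
           alp1 = alp0 + n / 2,
           bet1 = bet0 + s2/2 + n * lam0 * (xbar - mu0)^2 / (2 * (lam0 + n)))
    }
    set.seed(1)
    post <- ng.update(rnorm(20, mean = 3, sd = 2),
                      mu0 = 0, lam0 = 1, alp0 = 1, bet0 = 1)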
---
The marginal distribution of mu under the "normal gamma" family with params
(mu.0, lam.0, alp.0, bet.0) is a shifted, rescaled Student t with df = 2 alp.0:
the shift is mu.0 and the scale constant is sqrt( bet.0 / (lam.0 alp.0) ).
Applying this to the posterior: with t* from the t-table with 2*alp.1
degrees of freedom satisfying Pr[ -t* < t < t* ] = gamma,
gamma = Pr[ mu.1 - t* sqrt(bet.1/(lam.1 alp.1)) < mu
< mu.1 + t* sqrt(bet.1/(lam.1 alp.1)) | x.1 ... x.n ]
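Continuing the hypothetical ng.update sketch above, the interval in R:
    ## Central 95% credible interval for mu, using 2*alp.1 degrees of freedom
    gam <- 0.95
    t.star <- qt((1 + gam) / 2, df = 2 * post$alp1)
    half   <- t.star * sqrt(post$bet1 / (post$lam1 * post$alp1))
    c(post$mu1 - half, post$mu1 + half)   # Pr[lower < mu < upper | x] = gam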
----------------------------------------------------------------------
Geometric Picture:
The joint distribution of JOINTLY GAUSSIAN ("normal") RV's is completely
determined by the MEAN VECTOR and the COVARIANCE MATRIX.
Jointly normal random variables are UNCORRELATED if and only if
they are INDEPENDENT.
The joint pdf for {Z.j} is spherically symmetric; for any unit vector v
in R^n, the orthogonal proj'n of Z onto {c*v: c \in R} is of the form
Y*v for a standard normal RV Y. The residual [ Z - Y v ] has a
standard Gaussian distribution in the (n-1)-dimensional space of vectors
orthogonal to v, and is independent of Y.
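A simulation sketch of this picture in R (v = 1_n/sqrt(n) and n = 5 are
arbitrary choices; with this v, Y = sqrt(n) Z-bar and |Z - Y v|^2 = S^2,
tying back to the earlier result):
    ## Decompose Z = Y*v + R with v a unit vector; check the claimed dist'ns
    set.seed(1); n <- 5; reps <- 1e5
    v <- rep(1, n) / sqrt(n)                  # unit vector
    sims <- replicate(reps, { z <- rnorm(n); y <- sum(z * v)
                              c(y, sum((z - y * v)^2)) })
    mean(sims[1, ] <= 1); pnorm(1)            # Y is No(0,1)
    mean(sims[2, ] <= 3); pchisq(3, df = n-1) # |R|^2 is chi-square(n-1)
    cor(sims[1, ], sims[2, ])                 # near 0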
----------------------------------------------------------------------
A little about MV Normal Distributions:
If Z = (Z_1,...,Z_n) is a standard n-dimensional normal random vector, then
the joint pdf for the components of Z is merely
(2pi)^(-n/2) exp( - z'z/2 )
where by "z" we denote the column (n x 1) vector of components, by " z' "
its transpose, a (1 x n) row vector, and by "z'z" their matrix product, the
sum of squares.
For any n x n matrix A and any column vector mu , the random vector
X = mu + A Z (*)
is also normally-distributed, but now with mean E[ X_j ] = mu_j and
*covariance* matrix C = { C_ij } with components
C_ij = E[ ( X_i - mu_i ) ( X_j - mu_j ) ]
or, in Matrix form,
C = E[ (X-mu) (X-mu)' ]
(work out the coordinates to see that this is correct. Note that for a
column vector V, the quantity " V' V " is the 1x1 sum of squared elements,
but " V V' " is the n x n matrix of terms v_i v_j ).
Upon substituting our formula for X in (*) above, since E [ Z Z' ] is the
identity matrix,
C = E[ A Z (A Z)' ]
= E[ A Z Z' A' ]
= A E[ Z Z'] A'
= A A'.
If A is invertible we can recover Z from X as
Z = A^{-1} [ X - mu ]
and so we can write the sum of squares Z'Z as
Z'Z = [ X - mu ]' (A^{-1} )' A^{-1} [ X - mu ]
= [ X - mu ]' ( A A' ) ^{-1} [ X - mu ]
= [ X - mu ]' C^{-1} [ X - mu ]
The change-of-variables formula now gives the density function for X as:
         (2 pi)^{-n/2} exp( - z'z / 2 )
f(x) = ----------------------------------          (**)
                  | g'(z) |
where z = A^{-1} [ x - mu ] is g^{-1} ( x ) for the function
g(z) = mu + A z
whose Jacobian (derivative matrix) is
g'(z) = A
For n-dimensional change-of-variable (COV), " | g'(z) | " in (**)
MEANS the absolute value of the determinant of the Jacobian matrix g'(z).
Since det(C) = det(A A') = det(A) det(A') = det(A)^2, and since the
determinant of c B for any constant c and matrix B is c^n det(B), we can
evaluate | g'(z) | as sqrt( det( C ) ). Putting this together, we get:
        exp( - (x - mu)' C^{-1} (x - mu) / 2 )
f(x) = ----------------------------------------
               sqrt( det( 2 pi C ) )
for the joint density function of a multivariate normal X ~ No( mu, C ).
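Finally, a hedged R sketch of the construction (*): take A to be a matrix
square root of C (here via chol; the particular mu and C are arbitrary
examples) and check that the sampled X has covariance approximately C:
    ## Sample X = mu + A Z with A A' = C; check the empirical covariance
    set.seed(1)
    mu <- c(1, -2)
    C  <- matrix(c(2, 0.8, 0.8, 1), 2, 2)   # example covariance matrix
    A  <- t(chol(C))                        # lower-triangular, A %*% t(A) == C
    reps <- 1e5
    Z <- matrix(rnorm(2 * reps), nrow = 2)  # columns: iid standard normal vectors
    X <- mu + A %*% Z                       # mu recycles down each column
    cov(t(X))                               # approximately C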