STA250: Statistics, based loosely on DeGroot & Schervish
----------------------------------------
Week 6: Sampling Distributions of Estimators
====================================
We've seen a bit about interval estimation for normal distribution
means when the variance is known. But what about No(mu, sig^2)
with BOTH parameters UNknown--- what do we do when the variance must
be estimated too? And what about other distributions, besides the normal?
That's today's main topic.
8.1 Sampling Distribution of Statistics
Since estimation etc. is best done with sufficient statistics, it's enough
to know the distribution of sufficient statistics T for each of the common
distributions. For example:
1) X.i ~ Bern(p): T = \sum X.i ~ Bi(n,p)
2) X.i ~ Poi(lam): T = \sum X.i ~ Poi(n*lam)
3) X.i ~ No(mu, sig^2): T = \sum X.i ~ No(n*mu, n*sig^2)
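As a quick sanity check of, e.g., case (2), here is a minimal R sketch (the
sample size, rate, and replication count are arbitrary illustrative choices,
not from the notes):
    ## Simulate T = sum(X.i) for X.i ~ Poi(lam); compare with Poi(n*lam)
    set.seed(1)                          # reproducibility
    n <- 10; lam <- 2.5; reps <- 1e5     # arbitrary illustrative values
    T.sim <- replicate(reps, sum(rpois(n, lam)))
    mean(T.sim <= 25)                    # empirical Pr[T <= 25]
    ppois(25, n * lam)                   # exact Poi(n*lam) CDF at 25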
-----------------
Let X.1 ... X.n be n indep RV's from same dist'n with pdf/pmf f(x); let T.n
be any statistic. What is the probability distribution of T.n(X)???
The answer is useful for "sampling based" statistical inference--- for
example, if X.j has No(mu, 1) dist'n and T.n is X-bar, then
X-bar ~ No(mu, 1/n) so e.g.
0.95 = Pr[ - 1.96 < sqrt(n) (X-bar - mu) < 1.96 ]
= Pr[ mu - 1.96/sqrt(n) < X-bar < mu + 1.96/sqrt(n) ]
= Pr[ X-bar - 1.96/sqrt(n) < mu < X-bar + 1.96/sqrt(n) ]
and
(1/n) = R(mu, T.n) = E[ | X-bar - mu |^2 | mu ]
More generally---- if X.j have *KNOWN* variance sig^2, then
0.95 = Pr[ - 1.96 < (X-bar - mu)/{sig/sqrt(n)} < 1.96 ] (*)
= Pr[ X-bar - 1.96 sig/sqrt(n) < mu < X-bar + 1.96 sig/sqrt(n) ]
with similar expressions with "0.95" replaced by 0.90 (replace 1.96 with
1.645), 0.99 (with 2.576), or any other gamma (find z* so Phi(z*) =
(1+gamma)/2 ).
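In R (an illustrative sketch; the choices mu = 0, sig = 1, n = 25 are mine,
not from the notes):
    ## Critical values z* with Phi(z*) = (1+gamma)/2
    gamma <- c(0.90, 0.95, 0.99)
    qnorm((1 + gamma) / 2)                 # 1.645, 1.960, 2.576
    ## Coverage check for X-bar +/- 1.96 sig/sqrt(n), with known sig = 1
    set.seed(1); n <- 25; reps <- 1e4
    xbar <- replicate(reps, mean(rnorm(n)))
    half <- 1.96 / sqrt(n)
    mean(xbar - half < 0 & 0 < xbar + half)  # close to 0.95 (true mu is 0)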
But what if sig^2 is NOT known? Hmmm...
-----------------
Sneak Preview:
If sigma^2 is known we use the fact that
Z = sqrt(n) (X-bar - mu)/sig
has a standard No(0,1) distribution to turn a probability statement
0.95 = Pr[ -1.96 < Z < 1.96 ]
into a "confidence" statement about random intervals,
0.95 = Pr[ X-bar-1.96 sig/sqrt(n) < mu < X-bar+1.96 sig/sqrt(n) ]
-------
If sig^2 is UNknown, we can estimate it by (sig-hat)^2, but
xxx sqrt(n) (X-bar - mu) / sig-hat
won't have the same distribution--- in fact, it will have "fatter tails"
because sig-hat might be too small. To fix this, we need to discover just
WHAT IS the DISTRIBUTION of xxx ???
The problem was solved by William Sealy Gosset in 1908 while studying
barley yield in Dublin for the Arthur Guinness & Son brewery.
Set:
*** t = sqrt(n-1) (X-bar - mu) / sig-hat
= sqrt (n) (X-bar - mu) / s
where s^2 = sig-hat^2 * n/(n-1) = (1/(n-1)) sum [ (X.i - x-bar)^2 ]
(note "n-1"; we'll come back to that). This "pivotal quantity" has
a distribution that doesn't depend on mu or sigma; WHAT IS IT???????
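Before answering, we can at least check the pivotal claim by simulation.
A hedged R sketch (the sample size and the two (mu, sig) pairs are arbitrary):
the quantiles of *** should agree whatever mu and sigma are.
    ## The *** statistic: t = sqrt(n-1) (X-bar - mu) / sig-hat
    t.stat <- function(n, mu, sig) {
      x <- rnorm(n, mu, sig)
      sig.hat <- sqrt(mean((x - mean(x))^2))   # note: divides by n
      sqrt(n - 1) * (mean(x) - mu) / sig.hat
    }
    set.seed(1); n <- 8; reps <- 1e4
    t1 <- replicate(reps, t.stat(n, mu = 0,  sig = 1))
    t2 <- replicate(reps, t.stat(n, mu = 50, sig = 9))
    quantile(t1, c(0.05, 0.5, 0.95))   # nearly identical to...
    quantile(t2, c(0.05, 0.5, 0.95))   # ...these: mu and sig drop out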
----------------------------------------------------------------------
If X.i = mu + sig * Z.i (or Z.i = (X.i-mu)/sig ) for each i, then
X-bar = mu + sig * Z-bar
(sig-hat)^2 = sum (X.i-X-bar)^2 / n = sig^2 * sum (Z.i-Z-bar)^2 / n
sqrt(n-1) (X-bar - mu) / sig-hat
= sqrt(n-1) (Z-bar) / sqrt{ (1/n) sum (Z.i - Z-bar)^2 }
SO, we can write the dist'n of t in terms only of the Z's
(but keep *** up on the board for later).
----------------------------------------------------------------------
Some preliminaries:
1. Let Z ~ No(0,1); then MGF is M(t) = exp(t^2/2).
Let X ~ No(mu,sig^2); then X = mu + sig*Z has MGF exp(t*mu + t^2 sig^2/2).
2. Let Y ~ Ga(alp, bet); then MGF for Y is (1-t/bet)^(-alp).
3. Back to Z~No(0,1); MGF for Z^2 is 1/sqrt(1-2t), so Z^2 ~ Ga(1/2, 1/2).
4. And Z.1^2 + Z.2^2 + ... + Z.n^2 ~ Ga(n/2, 1/2)
5. If X and Y are indep, then joint MGF factors:
E exp(s X + t Y) = E exp(s X) * E exp(t Y)
Converse is true too--- if MGF factors, then X, Y are indep.
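Items 3-4 can be checked numerically; a minimal R sketch (n and the
evaluation point are arbitrary):
    ## Check item 4: sum of n squared No(0,1) draws ~ Ga(n/2, rate 1/2)
    set.seed(1); n <- 5; reps <- 1e5
    T.sim <- replicate(reps, sum(rnorm(n)^2))
    mean(T.sim <= 4)                      # empirical CDF at 4
    pgamma(4, shape = n/2, rate = 1/2)    # Ga(n/2, 1/2) CDF at 4
    pchisq(4, df = n)                     # same thing: chi-square(n)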
----------------------------------------------------------------------
8.3 Joint Distribution of X-Bar and S-Squared
First, let Z.1 Z.2 ... Z.n be iid No(0,1), and consider three RV's:
Z-bar = (Z.1 + Z.2 + ... + Z.n) / n
T = \sum (Z.j)^2
S^2 = \sum (Z.j - Z-bar)^2
What are their distributions? We'll also need something more---
the *joint* distribution of Z-bar and S^2. By Pythagoras,
T = S^2 + n |Z-bar|^2 (proj'n of Z onto 1_n is Z-bar 1_n)
and, since Z-bar is indep of (Z-Z-bar), it's also independent of S^2
and
E e^{t T} = (1-2t)^{-n/2}
= M_S(t) * E[ exp(t n |Z-bar|^2) ]
= M_S(t) * (1-2t)^{-1/2},
so M_S(t) = (1-2t)^{-(n-1)/2}
and S^2 ~ Ga((n-1)/2, 1/2), independent of Z-bar ~ No(0, 1/n).
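Both conclusions are easy to check by simulation; a sketch in R (sample size
arbitrary; note zero correlation is only a necessary condition for independence):
    ## Verify S^2 ~ chi-square(n-1) and its (lack of) correlation with Z-bar
    set.seed(1); n <- 6; reps <- 1e5
    sims <- replicate(reps, { z <- rnorm(n)
                              c(mean(z), sum((z - mean(z))^2)) })
    zbar <- sims[1, ]; S2 <- sims[2, ]
    mean(S2 <= 5)                  # empirical CDF of S^2 at 5 ...
    pchisq(5, df = n - 1)          # ... matches chi-square(n-1)
    cor(zbar, S2)                  # near 0, as independence requires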
----------------------------------------------------------------------
More generally--- if X.j are iid No(mu, sig^2) then subtract mu from
everything then divide by sig to find that
X-bar ~ No(mu, sig^2/n) is INDEPENDENT of:
S^2 ~ Ga( (n-1)/2, 1/(2 sig^2) )
SO, also,
sqrt(n) (X-bar - mu)/sig ~ No(0, 1) is independent of:
Y = S^2/sig^2 ~ Ga( nu/2, 1/2 ) = chi-square(nu), with nu = n-1
----------------------------------------------------------------------
If we try to replace "sig^2" with an estimate like S^2/(n-1) in (*), what
happens? Note Y = S^2/sig^2 ~ Ga(nu/2, 1/2) for nu = (n-1), while
Z = (X-bar-mu)/(sig/sqrt n) ~ No(0,1), and
sqrt(n) (X-bar - mu)          Z
--------------------- = -------------
  sqrt[ S^2/(n-1) ]       sqrt( Y/nu )
In honor of Arthur Guinness & Son Breweries, and of William Sealy Gosset
aka 'Student':
        Z
  --------------- = "t",    where Z ~ No(0,1) indep. of Y ~ Ga(nu/2, 1/2)
    sqrt( Y/nu )
With some algebra, one can show (as Gosset did) that "t" has pdf
f(t) = const / (1 + t^2/nu)^{ (1+nu)/2 },
const = Gamma((nu+1)/2) / [ sqrt(nu pi) Gamma(nu/2) ],
with nu = n-1 the "degrees of freedom".
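A sketch verifying this by construction in R (nu = 4 is an arbitrary choice):
build "t" as Z/sqrt(Y/nu) from its independent ingredients and compare with
R's built-in t distribution:
    ## Build t = Z / sqrt(Y/nu) and compare with the t(nu) CDF
    set.seed(1); nu <- 4; reps <- 1e5
    Z <- rnorm(reps)
    Y <- rgamma(reps, shape = nu/2, rate = 1/2)   # i.e. chi-square(nu)
    t.sim <- Z / sqrt(Y / nu)
    mean(t.sim <= 1.5)        # empirical CDF at 1.5 ...
    pt(1.5, df = nu)          # ... agrees with pt for df = nu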
-----------------------------------------------------------------------
Here's a sketch of the algebra: First, Z^2 has a chi^2(1) dist'n, with pdf
f(x) = exp(-x/2) / sqrt(2 pi x), x>0. Then, conditioning on Y (really a
product-convolution formula):
P[ | T | <= t ] = P[ Z^2 <= t^2 Y/nu ]
= int_0^oo { int_0^{t^2 y/nu} [ exp(-x/2) / sqrt(2 pi x) ] dx }
(1/2)^(nu/2) y^(nu/2-1) exp(-y/2)/Gamma(nu/2) dy
Now take d/dt to get twice the pdf for t (by the symmetry of T); change
variables from y to x = y(1+t^2/nu)/2 inside the "gamma" integral.
-----------------------------------------------------------------------
Unfortunately the distribution of t does depend on nu so, unlike with the
Normal distribution, we need a separate "table" for every sample size n.
Since no publisher will let someone put 30 pages of t tables in their book,
we instead get "critical values" of the t, i.e., the numbers b such that
P[ -b < t < b ] = 0.95, 0.975, 0.99, 0.995 (say)
for nu=1,2,3,...,30 or so; as nu -> oo these converge to the "critical
values" for the normal distribution, so the tables stop at about nu=30.
In R we can compute: pt( q, nu ) = Pr[ t <= q : df = nu ]
and qt( p, nu ) = q if p = Pr[ t <= q : df = nu ]
(also the density function dt( q, nu ), "n" iid random draws rt( n, nu))
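For instance, the 0.975 quantiles qt(0.975, nu) decrease toward
qnorm(0.975) = 1.96 as nu grows:
    ## t critical values for central 95% intervals, converging to 1.96
    nu <- c(1, 2, 5, 10, 30, 100)
    round(qt(0.975, df = nu), 3)   # 12.706 4.303 2.571 2.228 2.042 1.984
    qnorm(0.975)                   # 1.959964, the nu -> oo limit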
For df nu=1 (so we're estimating sigma^2 from just n=2 observations), the
t distribution is the same as the Cauchy, with pdf
            1/pi
f(t) = ------------------
           1 + t^2
and hence CDF
F(t) = [ arctan(t) / pi ] + (1/2)
so a central 95% interval would be 0.95 = Pr[ -t* < T < t* ]
for t* satisfying
F(t*) = [ arctan(t*) / pi ] + (1/2) = 0.975
i.e. arctan(t*) = pi * 0.475
i.e. t* = tan( pi * 0.475 ) = 12.70620,
*WAY* bigger than the Normal Distribution value (1.96) we get as nu -> oo.
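In R, a one-line check that the arctan calculation agrees with qt:
    tan(pi * 0.475)     # 12.70620
    qt(0.975, df = 1)   # same value: the Cauchy is t with df = 1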
-------------- Bayesian Analysis with Unknown Variance ------------------
The Normal distribution can be parametrized in terms of its mean and
PRECISION: tau = 1/sigma^2
so that the likelihood is
(tau/2 pi)^n/2 exp{- tau sum(X_i-mu)^2/2 }
= (tau/2 pi)^n/2 exp{- tau sum(X_i-x-bar)^2/2 - n tau (x-bar-mu)^2/2 }
As a function of mu and tau, this is of the form of a normal density in mu
with mean x-bar and precision PROPORTIONAL TO tau, times a gamma density
function for tau. SO, let's consider the conjugate family:
pi(mu, tau) ~ No(mu: mean mu.0, precision lam.0 * tau) * Ga(tau: alp.0, bet.0)
THEN, the posterior after observing x.1 ... x.n ~ No(mu, tau) will be of
the same form, but now with four new parameters:
mu.1 = [ lam.0 * mu.0 + n * x-bar.n ] / [ lam.0 + n ]
lam.1 = lam.0 + n
alp.1 = alp.0 + n/2
bet.1 = bet.0 + s.n^2/2 + n lam.0 (x-bar.n - mu.0)^2 / [ 2 (lam.0+n) ]
where x-bar.n = (1/n) sum ( x.j ) and s.n^2 = sum ( x.j - x-bar.n )^2
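A minimal R sketch of the update (the function name ng.update and the example
data below are mine, not from the notes):
    ## Normal-gamma posterior update: prior (mu0, lam0, alp0, bet0) -> posterior
    ng.update <- function(x, mu0, lam0, alp0, bet0) {
      n <- length(x); xbar <- mean(x); s2 <- sum((x - xbar)^2)
      list(mu1  = (lam0 * mu0 + n * xbar) / (lam0 + n),
           lam1 = lam0 + n,
           alp1 = alp0 + n / 2,
           bet1 = bet0 + s2/2 + n * lam0 * (xbar - mu0)^2 / (2 * (lam0 + n)))
    }
    set.seed(1)
    post <- ng.update(rnorm(20, mean = 3, sd = 2),
                      mu0 = 0, lam0 = 1, alp0 = 1, bet0 = 1)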
---
The marginal distribution of mu under the "normal gamma" family with params
(mu.0, lam.0, alp.0, bet.0) is a shifted, rescaled Student t with df = 2 alp.0:
the shift is mu.0 and the scale constant is sqrt( bet.0 / (lam.0 alp.0) ).
Applying this to the posterior: with t* from the t-table with 2*alp.1
degrees of freedom satisfying Pr[ -t* < t < t* ] = gamma,
gamma = Pr[ mu.1 - t* sqrt(bet.1/(lam.1 alp.1)) < mu
< mu.1 + t* sqrt(bet.1/(lam.1 alp.1)) | x.1 ... x.n ]
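Continuing the hypothetical ng.update sketch above, the interval in R:
    ## Central 95% credible interval for mu, using 2*alp.1 degrees of freedom
    gam <- 0.95
    t.star <- qt((1 + gam) / 2, df = 2 * post$alp1)
    half   <- t.star * sqrt(post$bet1 / (post$lam1 * post$alp1))
    c(post$mu1 - half, post$mu1 + half)   # Pr[lower < mu < upper | x] = gam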
----------------------------------------------------------------------
Geometric Picture:
The joint distribution of JOINTLY GAUSSIAN ("normal") RV's is completely
determined by the MEAN VECTOR and the COVARIANCE MATRIX.
Jointly normal random variables are UNCORRELATED if and only if
they are INDEPENDENT.
The joint pdf for {Z.j} is spherically symmetric; for any unit vector v
in R^n, the orthogonal proj'n of Z onto {c*v: c \in R} is of the form
Y*v for a standard normal RV Y. The residual [ Z - Y v ] has a
standard Gaussian distribution in the (n-1)-dimensional space of vectors
orthogonal to v, and is independent of Y.
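A simulation sketch of this picture in R (v = 1_n/sqrt(n) and n = 5 are
arbitrary choices; with this v, Y = sqrt(n) Z-bar and |Z - Y v|^2 = S^2,
tying back to the earlier result):
    ## Decompose Z = Y*v + R with v a unit vector; check the claimed dist'ns
    set.seed(1); n <- 5; reps <- 1e5
    v <- rep(1, n) / sqrt(n)                  # unit vector
    sims <- replicate(reps, { z <- rnorm(n); y <- sum(z * v)
                              c(y, sum((z - y * v)^2)) })
    mean(sims[1, ] <= 1); pnorm(1)            # Y is No(0,1)
    mean(sims[2, ] <= 3); pchisq(3, df = n-1) # |R|^2 is chi-square(n-1)
    cor(sims[1, ], sims[2, ])                 # near 0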
----------------------------------------------------------------------
A little about MV Normal Distributions:
If Z = (Z_1,...,Z_n) is a standard n-dimensional normal random vector, then
the joint pdf for the components of Z is merely
(2pi)^(-n/2) exp( - z'z/2 )
where by "z" we denote the column (n x 1) vector of components, by " z' "
its transpose, a (1 x n) row vector, and by "z'z" their matrix product, the
sum of squares.
For any n x n matrix A and any column vector mu , the random vector
X = mu + A Z (*)
is also normally-distributed, but now with mean E[ X_j ] = mu_j and
*covariance* matrix C = { C_ij } with components
C_ij = E[ ( X_i - mu_i ) ( X_j - mu_j ) ]
or, in Matrix form,
C = E[ (X-mu) (X-mu)' ]
(work out the coordinates to see that this is correct. Note that for a
column vector V, the quantity " V' V " is the 1x1 sum of squared elements,
but " V V' " is the n x n matrix of terms v_i v_j ).
Upon substituting our formula for X in (*) above, since E [ Z Z' ] is the
identity matrix,
C = E[ A Z (A Z)' ]
= E[ A Z Z' A' ]
= A E[ Z Z'] A'
= A A'.
If A is invertible we can recover Z from X as
Z = A^{-1} [ X - mu ]
and so we can write the sum of squares Z'Z as
Z'Z = [ X - mu ]' (A^{-1} )' A^{-1} [ X - mu ]
= [ X - mu ]' ( A A' ) ^{-1} [ X - mu ]
= [ X - mu ]' C^{-1} [ X - mu ]
The change-of-variables formula now gives the density function for X as:
         (2 pi)^{-n/2} exp( - z'z / 2 )
f(x) = ----------------------------------          (**)
                  | g'(z) |
where z = A^{-1} [ x - mu ] is g^{-1} ( x ) for the function
g(z) = mu + A z
whose Jacobian (derivative matrix) is
g'(z) = A
For n-dimensional change-of-variable (COV), " | g'(z) | " in (**)
MEANS the absolute value of the determinant of the Jacobian matrix g'(z).
Since det(C) = det(A A') = det(A) det(A') = det(A)^2, and since the
determinant of c B for any constant c and matrix B is c^n det(B), we can
evaluate | g'(z) | as sqrt( det( C ) ). Putting this together, we get:
        exp( - (x - mu)' C^{-1} (x - mu) / 2 )
f(x) = ----------------------------------------
               sqrt( det( 2 pi C ) )
for the joint density function of a multivariate normal X ~ No( mu, C ).
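Finally, a hedged R sketch of the construction (*): take A to be a matrix
square root of C (here via chol; the particular mu and C are arbitrary
examples) and check that the sampled X has covariance approximately C:
    ## Sample X = mu + A Z with A A' = C; check the empirical covariance
    set.seed(1)
    mu <- c(1, -2)
    C  <- matrix(c(2, 0.8, 0.8, 1), 2, 2)   # example covariance matrix
    A  <- t(chol(C))                        # lower-triangular, A %*% t(A) == C
    reps <- 1e5
    Z <- matrix(rnorm(2 * reps), nrow = 2)  # columns: iid standard normal vectors
    X <- mu + A %*% Z                       # mu recycles down each column
    cov(t(X))                               # approximately C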