Week 10: One and Two-sided Tests, DeGr & Sche 9.4-7:
9.4 Two-Sided Alternatives
9.5 The t Test
9.6 Comparing the Means of Two Normal Distributions
9.7 The F Distributions
--------------------------------------------------------------
Preliminary: Reminder, Hypotheses are statements about the
POPULATION, ***NOT*** the data. We *never*
test a hypothesis like "Xbar = 10"; we use Xbar
and other statistics to shed light on whether or
not the POPULATION parameter theta is th.0 (etc).
--------------------------------------------------------------
Example 1: Normal Mean
To test Ho: mu = mu.0 against the TWO-SIDED alternative
H1: mu != mu.0
for data X.j ~ No(mu, sig^2) with sig^2 known, it's obvious
(and also true!) that we should base our test only on the sample
mean from the data,
Xbar ~ No(mu, sig^2/n).
Since evidence against H.0 would include both values of Xbar
that are far above mu.0 and those that are far below mu.0,
evidently the rejection region should be something like
R = { x: Xbar <= c.1 OR Xbar >= c.2 }
and the size of the test will be
alpha = P[ X in R | H.0 ]
= Phi( sqrt(n) * (c.1-mu.0)/sig )
+ Phi( sqrt(n) * (mu.0-c.2)/sig ) (draw picture)
so we can achieve a test of whatever alpha we like. The symmetric
solution would start with z s.t. Phi( -z ) = alpha/2 and then
c.1 = mu.0 - sig * z / sqrt(n) c.2 = mu.0 + sig * z / sqrt(n)
so the rejection region becomes
R = { x: | Xbar - mu.0 | >= z * sig / sqrt(n) }
Unsurprisingly, this is also the Generalized Likelihood Ratio
(GLR) test for this composite alternative hypothesis.
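A quick sketch of this test in Python (the numbers in the comment are made up, purely for illustration):

```python
from scipy.stats import norm

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Reject H0: mu = mu0 when |Xbar - mu0| >= z * sigma / sqrt(n),
    where z satisfies Phi(-z) = alpha/2."""
    z = norm.ppf(1 - alpha / 2)            # z = 1.96 for alpha = 0.05
    return abs(xbar - mu0) >= z * sigma / n ** 0.5

# e.g. n=25, sigma=2, mu0=10: reject iff |Xbar - 10| >= 1.96 * 2/5 = 0.784
```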
---------
The Power function for this test will be
pi(mu) = P[ X in R | mu ]
= Phi( sqrt(n) * (c.1-mu)/sig )
+ Phi( sqrt(n) * (mu-c.2)/sig ) (plot it)
or, in the symmetric case with c.j = mu.0 +/- sig z/sqrt(n),
= Phi( (mu-mu.0) * sqrt(n)/sig - z)
+ Phi( (mu.0-mu) * sqrt(n)/sig - z)
Evidently pi(mu.0) = alpha and otherwise pi(mu) > alpha;
it rises quickly to one if sqrt(n)/sig is large, i.e., if
EITHER we have a big sample-size n OR sig^2 is tiny.
Also draw pi(mu) for the asymmetric case, where
(mu.0-c.1) != (c.2-mu.0) (or c-bar != mu.0)
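The symmetric power function can be plotted directly from the formula above; here is a minimal sketch (the choices n=25, sigma=2 in the comment are hypothetical):

```python
from scipy.stats import norm

def power(mu, mu0, sigma, n, alpha=0.05):
    """pi(mu) = Phi((mu-mu0)*sqrt(n)/sig - z) + Phi((mu0-mu)*sqrt(n)/sig - z)."""
    z = norm.ppf(1 - alpha / 2)
    shift = (mu - mu0) * n ** 0.5 / sigma
    return norm.cdf(shift - z) + norm.cdf(-shift - z)

# At mu = mu0 the power is exactly alpha; it rises toward 1 as
# |mu - mu0| * sqrt(n) / sigma grows, e.g. with mu0=10, sigma=2, n=25.
```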
-------------------------------------------------------------
Example 2: Poisson Mean
Now suppose X.1 ... X.n ~ Po( th ) and we'd like to test
H.0: th = th.0 vs H.1: th != th.0
How can we proceed? Again X.bar is a sufficient statistic
and again we would like to reject H.0 if it is either too
big (say, Xbar > c.2) or too small (say, Xbar < c.1). Now
since n * X.bar = X.1+X.2+...+X.n ~ Po(n*th), the *exact*
size of the test is
alpha = Pr[ Y <= n*c.1 ] + Pr[ Y >= n*c.2 ]
    for Y ~ Po( n*th.0 ).  (Remember--- ALWAYS compute the
    size using the H.0 dist'n.)  The power of the test is:
pi(th) = Pr[ Y <= n*c.1 ] + Pr[ Y >= n*c.2 ]
for Y ~ Po( n*th ). This can be computed exactly as:
       =  sum_{0 <= k <= n*c.1}  exp(-n*th) (n*th)^k / k!
       +  sum_{n*c.2 <= k < oo}  exp(-n*th) (n*th)^k / k!
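The exact size is easy to compute with scipy; a sketch (the values n=10, th.0=2 and the cutoffs in the test are invented for illustration):

```python
from scipy.stats import poisson

def exact_size(n, th0, k1, k2):
    """alpha = P[Y <= k1] + P[Y >= k2] for Y ~ Po(n*th0),
    where k1 = n*c.1 and k2 = n*c.2 are taken to be integers."""
    Y = poisson(n * th0)
    return Y.cdf(k1) + Y.sf(k2 - 1)        # sf(k-1) = P[Y >= k]
```

Replacing th0 by any other theta in the same expression gives the power pi(theta).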
--------------------------------
The two-sided P-value is:
Set S.n = n*Xbar = Sum { X.i }.
If Xbar > th.0, report:
P = 2 * [ Sum of Po(n*th.0) pmf from S.n to oo ]
If Xbar < th.0, report:
P = 2 * [ Sum of Po(n*th.0) pmf from 0 to S.n ]
If n is large enough, this is in both cases approximately:

                     [  1/2  -  n | Xbar - th.0 |  ]
     P  =  2 * Phi   [ ---------------------------- ]
                     [       sqrt( n * th.0 )       ]
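Both the exact two-sided P-value and its (continuity-corrected) normal approximation, as a sketch; the data in the test are invented:

```python
from math import sqrt
from scipy.stats import norm, poisson

def poisson_p_value(xs, th0):
    """Exact two-sided P-value for H0: theta = th0, and its
    normal approximation with continuity correction 1/2."""
    n, Sn = len(xs), sum(xs)               # S.n = n*Xbar
    Y = poisson(n * th0)
    if Sn > n * th0:
        p = 2 * Y.sf(Sn - 1)               # 2 * P[Y >= S.n]
    else:
        p = 2 * Y.cdf(Sn)                  # 2 * P[Y <= S.n]
    approx = 2 * norm.cdf((0.5 - abs(Sn - n * th0)) / sqrt(n * th0))
    return p, approx
```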
***************************** Optional *************************
*
* We can also use the connection with the Gamma distribution:
*
* Catch [ < k fish in t hours ]
* <==> k'th fish takes more than t hours
*
* If N.t = number of fish caught in t hrs
* T.k = time at which k'th fish is caught
* and we catch lambda fish/hr on average, then
*
* P[ N.t < k ] = Pr[ Poisson w/mean t*lam is < k ]
* = Pr[ T.k > t ]
* = Pr[ Gamma (k, lam) is > t ]
*
* If we choose lam=1/2 then this is a chi-square:
*
* = Pr[ chi-square w/ df=2k exceeds t ]
*
* SO, for an interval that's symmetric in the sense that there
* is probability alpha/2 error in each direction, we want
*
* alpha/2 = Pr[ Y <= n*c.1 ]
* = Pr[ Y < 1+n*c.1 ]
* = Pr[ T.(1+n*c.1) > 2*n*th.0 ]
*
* alpha/2 = Pr[ Y >= n*c.2 ]
* = Pr[ T.(n*c.2) < 2*n*th.0 ]
*
* for T.k ~ Ga(k, 1/2) = chi^2 w/ df = 2*k
***************************************************************
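The fishing identity in the optional box can be checked numerically; the values of t, lam, k below are arbitrary:

```python
from scipy.stats import chi2, gamma, poisson

t_hrs, lam, k = 7.0, 0.5, 6                    # arbitrary illustrative values
lhs = poisson.cdf(k - 1, t_hrs * lam)          # P[N.t < k],  N.t ~ Po(t*lam)
mid = gamma.sf(t_hrs, a=k, scale=1.0 / lam)    # P[T.k > t],  T.k ~ Ga(k, lam)
rhs = chi2.sf(2 * lam * t_hrs, df=2 * k)       # P[chi2(2k) > 2*lam*t]
```

All three probabilities agree (up to rounding), for any rate lam, not only lam = 1/2.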
9.5 The t Test
If sigma^2 isn't known, we estimate it from the data (duh) and
replace Phi() with the t distribution with nu = (n-1) degrees of
freedom.
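In scipy this is a one-liner; the mileage-like numbers here are invented:

```python
from scipy.stats import ttest_1samp

x = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5]   # invented data, n = 8
t_stat, p_value = ttest_1samp(x, popmean=10.0)      # two-sided H0: mu = 10
# t_stat uses s in place of sigma; p_value comes from the t dist'n, nu = 7
```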
--------------- PAIRED T TEST ----------------------------------
Let's start looking at COMPARING populations.
    Let  X.j  be the mileage Car j gets using Shell
         Y.j  be the mileage Car j gets using Mobil
for j=1,...,n. If Shell and Mobil are identical then we would
expect X.j and Y.j to have the same distribution--- perhaps they
are both normal with the same mean & variance. If Shell and Mobil
are DIFFERENT, then the easiest thing to discover would be for the
means to differ, with everything independent and normally distributed:
Model: X.j ~ No(mu.x, sig.x^2), Y.j ~ No(mu.y, sig.y^2)
H.0: mu.x = mu.y H.1: mu.x != mu.y
The sample averages are Xbar ~ No(mu.x, sig.x^2/n) and
Ybar ~ No(mu.y, sig.y^2/n),
and we can test the hypothesis by rejecting whenever
| Xbar - Ybar |
(**) ------------------------------- > 1.96
sqrt(sig.x^2/n + sig.y^2/n)
if we want alpha=0.05. Doesn't matter if sig.x=sig.y or not.
BUT-------- what if the cars are different? Then we expect
the X.j's and Y.j's to vary quite a bit, for two reasons:
a) Random variation from trial-to-trial
b) Variability of the cars
If the fuels are pretty much similar, we'd expect a plot of
points (X.j, Y.j) to be close to the line Y=X (draw on board)
How can we remove the car-to-car variability? This is our first
encounter with "BLOCKING":
Answer: Look at the DIFFERENCES instead of the values:
D.j = Y.j - X.j
and model THESE as No(mu, sig^2) with H.0: mu=0
Note that IF the X's and Y's are all iid No with means
mu.x and mu.y and the SAME VARIANCE sig.x^2 = sig.y^2
        then D.j will be iid No(mu, sig^2) with mean and variance
             mu = mu.y - mu.x         sig^2 = sig.x^2 + sig.y^2
so this "new" test will be identical with the one we looked
at above--- BUT the "paired t" does NOT ASSUME that the X's
    or Y's have individual normal dist'ns, only that the DIFFERENCES
do. If the cars differ we expect the individual X's to be quite
variable (have big variance sig.x^2) because of the car-to-car
variability, and similarly sig.y^2 will be huge because of the
car-to-car variability, so we expect sig^2 to be (maybe much)
    smaller than (sig.x^2+sig.y^2) above --- making the "paired t"
test more powerful. If the cars DON'T vary, then the tests are
identical.
If sig^2 isn't known, we just estimate it from the sample variance
1
s^2 = ----- Sum { (D.j - Dbar)^2 }
n-1
and apply the Student t distribution with nu = n-1 deg fdm.
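A sketch of the paired t for the fuel example, with invented mileage numbers; note it is exactly a one-sample t on the differences:

```python
from scipy.stats import ttest_1samp, ttest_rel

shell = [24.1, 31.0, 19.5, 27.2, 22.8]    # X.j: car j on Shell (invented)
mobil = [24.6, 31.5, 19.9, 27.1, 23.5]    # Y.j: same car j on Mobil
t_pair, p_pair = ttest_rel(mobil, shell)  # paired t on the n = 5 cars

diffs = [y - x for x, y in zip(shell, mobil)]     # D.j = Y.j - X.j
t_diff, p_diff = ttest_1samp(diffs, popmean=0.0)  # same test, nu = n-1
```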
---------------------------------------------------------------
End of Tue lec, start of Thu lec
---------------------------------------------------------------
If we have INDEPENDENT SAMPLES from the two populations, the
earlier test is the best we can do; no need for sample sizes to
be the same:
Model: X.j ~ No(mu.x, sig.x^2), Y.j ~ No(mu.y, sig.y^2)
H.0: mu.x = mu.y H.1: mu.x != mu.y
The sample averages are Xbar ~ No(mu.x, sig.x^2/m) and
Ybar ~ No(mu.y, sig.y^2/n),
and we can test the hypothesis by rejecting whenever
| Xbar - Ybar |
(**) ------------------------------- > 1.96
sqrt(sig.x^2/m + sig.y^2/n)
Notice that IF m=n then the differences "D.i = X.i - Y.i" have
mean E[ D.i ] = mu = mu.x - mu.y
variance V[ D.i ] = sig^2 = sig.x^2 + sig.y^2
SAMPLE mean Dbar.n = Xbar.n - Ybar.n
so a "Paired Z Test" of [ H.0: mu = 0 ] would be identical to
this "Independent Sample Z Test" of [ H.0: mu.x = mu.y ].
------------- UNKNOWN VARIANCE -----------------
If sig.x and sig.y are unknown, we're only okay
IF THE VARIANCES ARE KNOWN TO BE IDENTICAL
(or PROPORTIONAL, but that pretty much never happens). If they're
different, it's a hard problem ("Behrens-Fisher") and nobody has a
great answer. If they're identical, though, then the GLR test is:
1
Est. sigma^2 by s^2 = ----------- [ S2x + S2y ]
m + n - 2
where
S2x = Sum { (x.i - Xbar)^2 } && S2y = Sum { (y.j - Ybar)^2 }
Note S2x/sig^2 ~ Chi-square(m-1) and S2y/sig^2 ~ Chi-square(n-1)
so ( s^2 / sig^2 ) ~ chi-square(nu)/nu with nu = (n+m-2), so
(X.bar-Y.bar)
t = --------------------
s sqrt(1/m + 1/n)
has a t_nu dist'n and we can test as usual, one-sided or two-sided.
The "degrees of freedom" for this t are n+m-2... do you see why???
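The pooled computation above, done "by hand" and checked against scipy's equal-variance two-sample t; the two samples are invented:

```python
from math import sqrt
from scipy.stats import ttest_ind

x = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9]        # m = 6 (invented)
y = [4.4, 4.9, 4.6, 5.0, 4.2]             # n = 5 (invented)
m, n = len(x), len(y)
xbar, ybar = sum(x) / m, sum(y) / n
S2x = sum((xi - xbar) ** 2 for xi in x)
S2y = sum((yi - ybar) ** 2 for yi in y)
s2 = (S2x + S2y) / (m + n - 2)            # pooled estimate, nu = m+n-2 df
t_hand = (xbar - ybar) / sqrt(s2 * (1 / m + 1 / n))

t_scipy, p = ttest_ind(x, y, equal_var=True)   # same statistic and df
```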
------------ Comparison with Paired t test --------------
In the Paired T Test, if we don't know sig^2 we estimate it by
1
s^2 = ------- Sum { (D.i - Dbar)^2 }
n - 1
which is NOT the same as in the "Independent Sample t Test"... it
has n-1 degrees of freedom, just half of what the Independent Sample
test has ( m+n-2 ). Thus we have a choice:
Paired t:  Better if the variability of the { X.i } among themselves
is large compared to that of the differences { X.i - Y.i }
Indep Sample: Better if the variability of the { X.i } among themselves
is the same as that of the differences, and can help us
do a better job of estimating sigma^2
------------------------------------------------------------------
The F Distribution:
How can we TELL if two variances are the same? For example, suppose
(as above) we have independent samples
X.i ~ No(mu.x, sig.x^2)
Y.j ~ No(mu.y, sig.y^2)
and we'd like to know if sig.x = sig.y or not... how can we tell?
Since ( S2x / sig.x^2 ) ~ chi^2 (m-1) = Ga( (m-1)/2, 1/2 )
and ( S2y / sig.y^2 ) ~ chi^2 (n-1) = Ga( (n-1)/2, 1/2 ),
IF H.0 is true we have two independent estimates of sig^2:
            S2x / (m-1)  ~  Ga( (m-1)/2, (m-1)/(2 sig.x^2) )  ~~  sig.x^2
            S2y / (n-1)  ~  Ga( (n-1)/2, (n-1)/(2 sig.y^2) )  ~~  sig.y^2
whose ratio ought to be about one, if [H.0: sig.x=sig.y] is true:
S2x / (m-1) m-1 Ga( (m-1)/2, (m-1)/2 )
------------ ~ F = ----------------------
S2y / (n-1) n-1 Ga( (n-1)/2, (n-1)/2 )
the ratio of independent Gamma's, each with mean one.
Since the F distribution has *two* 'degrees of freedom' parameters
it's a bear to make tables for it.... but with computers it's no
problem. A symmetric test of
H.0: sig.x^2 = sig.y^2 vs. H.1: sig.x^2 != sig.y^2
S2x / (m-1)
would reject if ------------ is way bigger than 1 or way smaller...
S2y / (n-1)
i.e. if this ratio "F" satisfies [ F < a ] or [ F > b ] where
          (alpha/2)  =  Pr[ F(m-1, n-1) < a ]  =  Pr[ F(n-1, m-1) > 1/a ]

          (alpha/2)  =  Pr[ F(m-1, n-1) > b ]
(Do you see why???  Note most F tables give only RIGHT tail
probabilities, but by swapping degrees of freedom we can find left tails).
To find the P-value, just pick whichever estimate of sig^2 is bigger and
put IT on the top (numerator) of the variance ratio, then report TWICE
the probability that the appropriate F random variable would be bigger.
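That recipe as a sketch in Python (the sample values in the test are made up):

```python
from scipy.stats import f

def f_test_two_sided(x, y):
    """Put the larger variance estimate on top and report twice the
    right-tail F probability, as described above."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    vx = sum((xi - xbar) ** 2 for xi in x) / (m - 1)   # S2x/(m-1)
    vy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)   # S2y/(n-1)
    if vx >= vy:
        ratio, dfn, dfd = vx / vy, m - 1, n - 1
    else:
        ratio, dfn, dfd = vy / vx, n - 1, m - 1
    return 2 * f.sf(ratio, dfn, dfd)       # twice the right-tail probability
```

By construction the answer is the same whichever sample you call x and which y.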
------------------------------------------------------------------------
CONNECTIONS:
t: If t has a Student t dist'n with nu degrees of freedom, then
t^2 has an F distribution with 1 numerator and nu denominator
degrees of freedom.
Be: If X and Y have independent Gamma distributions with the same
(arbitrary) rate parameter and maybe different shapes a, b
then:
X / a 2a
F = --------- has an F
Y / b 2b
distribution, while (since X = F Y a/b)
X F a/b a F
Z = ------- = ---------- = ---------
X + Y F a/b + 1 a F + b
has a Beta(a, b) distribution. This fact can be used to help
find the F pdf or CDF by change-of-variables.
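Both connections can be verified numerically through the CDFs; nu, a, b and the evaluation points below are arbitrary choices:

```python
from scipy.stats import beta, f, t

nu, a, b = 7, 3.0, 5.0                     # arbitrary df / shape choices
pts = [0.5, 1.3, 2.1]

# t connection: P[|t_nu| <= x] = P[t^2 <= x^2] = F(1, nu) CDF at x^2
t_sq  = [t.cdf(x, nu) - t.cdf(-x, nu) for x in pts]
f_1nu = [f.cdf(x * x, 1, nu) for x in pts]

# Beta connection: P[F(2a,2b) <= w] = P[Beta(a,b) <= a*w/(a*w + b)]
f_ab      = [f.cdf(w, 2 * a, 2 * b) for w in pts]
beta_vals = [beta.cdf(a * w / (a * w + b), a, b) for w in pts]
```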
-------------------------------------------------------------------------
THREE or more POPULATION MEANS:
Now suppose we have several (k) "populations", all normally
distributed with the same variance,
X_{ij} ~ No(mu_i, sig^2), 1 <= j <= n_i
and we want to test the hypothesis
H.0: All means are equal vs. H.1: Not so.
Let Xbar.i be the average of the i'th sample, and set
S2i = Sum_j { ( X_{ij} - Xbar.i )^2 }
(the sum-of-squares for the i'th sample). Here are two
independent estimates of sigma^2:
1
Sig.W ("Within"): ------------- Sum { S2i } (df = sum (n.i-1) )
Sum (n.i-1)
If there are k >= 2 samples (maybe from different populations),
we can also get a variance estimate from how widely the Xbar.i's
vary.  Let k be the number of populations, and N = Sum { n_i };
the "grand mean" is
Sum { n.i * Xbar.i } Sum { X_{ij} }
XBAR = ------------------------ = ------------------
Sum { n.i } N
and the grand sum-of-squares can be decomposed as:
Sum { (X_ij - mu)^2 } = Sum { (X_ij - XBAR)^2 } + N (XBAR-mu)^2
= Sum_i [ Sum_j (X_ij - Xbar.i)^2 + n_i (Xbar.i - XBAR)^2 ] + ...
= { SSW = Sum_ij (X_ij - Xbar.i)^2 } (df = N-k)
+ { SSB = Sum_i n_i (Xbar.i-XBAR)^2 } (df = k-1)
+ { N (XBAR-mu)^2 } (df = 1)
If H.0 is true, then SSW/(N-k) and SSB/(k-1) will be independent
unbiased estimates of sigma^2, and their ratio
SSB / (k-1) k-1
F = --------------- ~ F
SSW / (N-k) N-k
will have an F distribution. On the other hand, if H.1 is true,
then the denominator will still be an unbiased estimator of sigma^2
with (N-k) degrees of freedom, but the numerator should be HUGE...
because it includes the variability of the mu_i's.
SO, we can test H.0 by rejecting when F is big, under the F
distribution.
When comparing just k=2 populations, this F is just t^2 for the
usual Student's t test.
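The SSW/SSB decomposition and F statistic, computed directly and checked against scipy's one-way ANOVA; the three samples are invented:

```python
from scipy.stats import f_oneway

groups = [[6.2, 5.9, 6.5, 6.1],            # k = 3 invented samples
          [5.1, 5.6, 5.3],
          [6.8, 7.0, 6.4, 6.9, 6.6]]
k = len(groups)
N = sum(len(g) for g in groups)
means = [sum(g) / len(g) for g in groups]          # the Xbar.i
grand = sum(sum(g) for g in groups) / N            # XBAR
SSW = sum((x - mi) ** 2 for g, mi in zip(groups, means) for x in g)
SSB = sum(len(g) * (mi - grand) ** 2 for g, mi in zip(groups, means))
F_stat = (SSB / (k - 1)) / (SSW / (N - k))         # ~ F(k-1, N-k) under H0

F_scipy, p_value = f_oneway(*groups)               # same F, and its P-value
```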