Week 10: One and Two-sided Tests, DeGr & Sche 9.4-7:
9.4 Two-Sided Alternatives
9.5 The t Test
9.6 Comparing the Means of Two Normal Distributions
9.7 The F Distributions
--------------------------------------------------------------
Preliminary: Reminder, Hypotheses are statements about the
POPULATION, ***NOT*** the data. We *never*
test a hypothesis like "Xbar = 10"; we use Xbar
and other statistics to shed light on whether or
not the POPULATION parameter theta is th.0 (etc).
--------------------------------------------------------------
Example 1: Normal Mean
To test Ho: mu = mu.0 against the TWO-SIDED alternative
H1: mu != mu.0
for data X.j ~ No(mu, sig^2) with sig^2 known, it's obvious
(and also true!) that we should base our test only on the sample
mean from the data,
Xbar ~ No(mu, sig^2/n).
Since evidence against H.0 would include both values of Xbar
that are far above mu.0 and those that are far below mu.0,
evidently the rejection region should be something like
R = { x: Xbar <= c.1 OR Xbar >= c.2 }
and the size of the test will be
alpha = P[ X in R | H.0 ]
= Phi( sqrt(n) * (c.1-mu.0)/sig )
+ Phi( sqrt(n) * (mu.0-c.2)/sig ) (draw picture)
so we can achieve a test of whatever alpha we like. The symmetric
solution would start with z s.t. Phi( -z ) = alpha/2 and then
c.1 = mu.0 - sig * z / sqrt(n) c.2 = mu.0 + sig * z / sqrt(n)
so the rejection region becomes
R = { x: | Xbar - mu.0 | >= z * sig / sqrt(n) }
Unsurprisingly, this is also the Generalized Likelihood Ratio
(GLR) test for this composite alternative hypothesis.
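A quick sketch of this test in Python (the numbers in the comment are made up, purely for illustration):

```python
from scipy.stats import norm

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Reject H0: mu = mu0 when |Xbar - mu0| >= z * sigma / sqrt(n),
    where z satisfies Phi(-z) = alpha/2."""
    z = norm.ppf(1 - alpha / 2)            # z = 1.96 for alpha = 0.05
    return abs(xbar - mu0) >= z * sigma / n ** 0.5

# e.g. n=25, sigma=2, mu0=10: reject iff |Xbar - 10| >= 1.96 * 2/5 = 0.784
```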
---------
The Power function for this test will be
pi(mu) = P[ X in R | mu ]
= Phi( sqrt(n) * (c.1-mu)/sig )
+ Phi( sqrt(n) * (mu-c.2)/sig ) (plot it)
or, in the symmetric case with c.j = mu.0 +/- sig z/sqrt(n),
= Phi( (mu-mu.0) * sqrt(n)/sig - z)
+ Phi( (mu.0-mu) * sqrt(n)/sig - z)
Evidently pi(mu.0) = alpha and otherwise pi(mu) > alpha;
it rises quickly to one if sqrt(n)/sig is large, i.e., if
EITHER we have a big sample-size n OR sig^2 is tiny.
Also draw pi(mu) for the asymmetric case, where
(mu.0-c.1) != (c.2-mu.0) (or c-bar != mu.0)
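The symmetric power function can be plotted directly from the formula above; here is a minimal sketch (the choices n=25, sigma=2 in the comment are hypothetical):

```python
from scipy.stats import norm

def power(mu, mu0, sigma, n, alpha=0.05):
    """pi(mu) = Phi((mu-mu0)*sqrt(n)/sig - z) + Phi((mu0-mu)*sqrt(n)/sig - z)."""
    z = norm.ppf(1 - alpha / 2)
    shift = (mu - mu0) * n ** 0.5 / sigma
    return norm.cdf(shift - z) + norm.cdf(-shift - z)

# At mu = mu0 the power is exactly alpha; it rises toward 1 as
# |mu - mu0| * sqrt(n) / sigma grows, e.g. with mu0=10, sigma=2, n=25.
```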
-------------------------------------------------------------
Example 2: Poisson Mean
Now suppose X.1 ... X.n ~ Po( th ) and we'd like to test
H.0: th = th.0 vs H.1: th != th.0
How can we proceed? Again X.bar is a sufficient statistic
and again we would like to reject H.0 if it is either too
big (say, Xbar > c.2) or too small (say, Xbar < c.1). Now
since n * X.bar = X.1+X.2+...+X.n ~ Po(n*th), the *exact*
size of the test is
alpha = Pr[ Y <= n*c.1 ] + Pr[ Y >= n*c.2 ]
    for Y ~ Po( n*th.0 ).  (Remember--- ALWAYS compute the
    size using the H.0 dist'n.)  The power of the test is:
pi(th) = Pr[ Y <= n*c.1 ] + Pr[ Y >= n*c.2 ]
for Y ~ Po( n*th ). This can be computed exactly as:
       =  sum_{0 <= k <= n*c.1}  exp(-n*th) (n*th)^k / k!
       +  sum_{n*c.2 <= k < oo}  exp(-n*th) (n*th)^k / k!
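The exact size is easy to compute with scipy; a sketch (the values n=10, th.0=2 and the cutoffs in the test are invented for illustration):

```python
from scipy.stats import poisson

def exact_size(n, th0, k1, k2):
    """alpha = P[Y <= k1] + P[Y >= k2] for Y ~ Po(n*th0),
    where k1 = n*c.1 and k2 = n*c.2 are taken to be integers."""
    Y = poisson(n * th0)
    return Y.cdf(k1) + Y.sf(k2 - 1)        # sf(k-1) = P[Y >= k]
```

Replacing th0 by any other theta in the same expression gives the power pi(theta).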
--------------------------------
The two-sided P-value is:
Set S.n = n*Xbar = Sum { X.i }.
If Xbar > th.0, report:
P = 2 * [ Sum of Po(n*th.0) pmf from S.n to oo ]
If Xbar < th.0, report:
P = 2 * [ Sum of Po(n*th.0) pmf from 0 to S.n ]
If n is large enough, this is in both cases approximately:

                     [  1/2  -  n | Xbar - th.0 |  ]
     P  =  2 * Phi   [ ---------------------------- ]
                     [       sqrt( n * th.0 )       ]
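Both the exact two-sided P-value and its (continuity-corrected) normal approximation, as a sketch; the data in the test are invented:

```python
from math import sqrt
from scipy.stats import norm, poisson

def poisson_p_value(xs, th0):
    """Exact two-sided P-value for H0: theta = th0, and its
    normal approximation with continuity correction 1/2."""
    n, Sn = len(xs), sum(xs)               # S.n = n*Xbar
    Y = poisson(n * th0)
    if Sn > n * th0:
        p = 2 * Y.sf(Sn - 1)               # 2 * P[Y >= S.n]
    else:
        p = 2 * Y.cdf(Sn)                  # 2 * P[Y <= S.n]
    approx = 2 * norm.cdf((0.5 - abs(Sn - n * th0)) / sqrt(n * th0))
    return p, approx
```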
***************************** Optional *************************
*
* We can also use the connection with the Gamma distribution:
*
* Catch [ < k fish in t hours ]
* <==> k'th fish takes more than t hours
*
* If N.t = number of fish caught in t hrs
* T.k = time at which k'th fish is caught
* and we catch lambda fish/hr on average, then
*
* P[ N.t < k ] = Pr[ Poisson w/mean t*lam is < k ]
* = Pr[ T.k > t ]
* = Pr[ Gamma (k, lam) is > t ]
*
* If we choose lam=1/2 then this is a chi-square:
*
* = Pr[ chi-square w/ df=2k exceeds t ]
*
* SO, for an interval that's symmetric in the sense that there
* is probability alpha/2 error in each direction, we want
*
* alpha/2 = Pr[ Y <= n*c.1 ]
* = Pr[ Y < 1+n*c.1 ]
* = Pr[ T.(1+n*c.1) > 2*n*th.0 ]
*
* alpha/2 = Pr[ Y >= n*c.2 ]
* = Pr[ T.(n*c.2) < 2*n*th.0 ]
*
* for T.k ~ Ga(k, 1/2) = chi^2 w/ df = 2*k
***************************************************************
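The fishing identity in the optional box can be checked numerically; the values of t, lam, k below are arbitrary:

```python
from scipy.stats import chi2, gamma, poisson

t_hrs, lam, k = 7.0, 0.5, 6                    # arbitrary illustrative values
lhs = poisson.cdf(k - 1, t_hrs * lam)          # P[N.t < k],  N.t ~ Po(t*lam)
mid = gamma.sf(t_hrs, a=k, scale=1.0 / lam)    # P[T.k > t],  T.k ~ Ga(k, lam)
rhs = chi2.sf(2 * lam * t_hrs, df=2 * k)       # P[chi2(2k) > 2*lam*t]
```

All three probabilities agree (up to rounding), for any rate lam, not only lam = 1/2.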
9.5 The t Test
If sigma^2 isn't known, we estimate it from the data (duh) and
replace Phi() with the t distribution with nu = (n-1) degrees of
freedom.
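In scipy this is a one-liner; the mileage-like numbers here are invented:

```python
from scipy.stats import ttest_1samp

x = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5]   # invented data, n = 8
t_stat, p_value = ttest_1samp(x, popmean=10.0)      # two-sided H0: mu = 10
# t_stat uses s in place of sigma; p_value comes from the t dist'n, nu = 7
```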
--------------- PAIRED T TEST ----------------------------------
Let's start looking at COMPARING populations.
    Let  X.j  be the mileage Car j gets using Shell
         Y.j  be the mileage Car j gets using Mobil
for j=1,...,n. If Shell and Mobil are identical then we would
expect X.j and Y.j to have the same distribution--- perhaps they
are both normal with the same mean & variance. If Shell and Mobil
are DIFFERENT, then the easiest thing to discover would be for the
means to differ, with everything independent and normally distributed:
Model: X.j ~ No(mu.x, sig.x^2), Y.j ~ No(mu.y, sig.y^2)
H.0: mu.x = mu.y H.1: mu.x != mu.y
The sample averages are Xbar ~ No(mu.x, sig.x^2/n) and
Ybar ~ No(mu.y, sig.y^2/n),
and we can test the hypothesis by rejecting whenever
| Xbar - Ybar |
(**) ------------------------------- > 1.96
sqrt(sig.x^2/n + sig.y^2/n)
if we want alpha=0.05. Doesn't matter if sig.x=sig.y or not.
BUT-------- what if the cars are different? Then we expect
the X.j's and Y.j's to vary quite a bit, for two reasons:
a) Random variation from trial-to-trial
b) Variability of the cars
If the fuels are pretty much similar, we'd expect a plot of
points (X.j, Y.j) to be close to the line Y=X (draw on board)
How can we remove the car-to-car variability? This is our first
encounter with "BLOCKING":
Answer: Look at the DIFFERENCES instead of the values:
D.j = Y.j - X.j
and model THESE as No(mu, sig^2) with H.0: mu=0
Note that IF the X's and Y's are all iid No with means
mu.x and mu.y and the SAME VARIANCE sig.x^2 = sig.y^2
        then D.j will be iid No(mu, sig^2) with mean and variance
             mu = mu.y - mu.x         sig^2 = sig.x^2 + sig.y^2
so this "new" test will be identical with the one we looked
at above--- BUT the "paired t" does NOT ASSUME that the X's
    or Y's have individual normal dist'ns, only that the DIFFERENCES
do. If the cars differ we expect the individual X's to be quite
variable (have big variance sig.x^2) because of the car-to-car
variability, and similarly sig.y^2 will be huge because of the
car-to-car variability, so we expect sig^2 to be (maybe much)
    smaller than (sig.x^2+sig.y^2) above --- making the "paired t"
test more powerful. If the cars DON'T vary, then the tests are
identical.
If sig^2 isn't known, we just estimate it from the sample variance
1
s^2 = ----- Sum { (D.j - Dbar)^2 }
n-1
and apply the Student t distribution with nu = n-1 deg fdm.
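A sketch of the paired t for the fuel example, with invented mileage numbers; note it is exactly a one-sample t on the differences:

```python
from scipy.stats import ttest_1samp, ttest_rel

shell = [24.1, 31.0, 19.5, 27.2, 22.8]    # X.j: car j on Shell (invented)
mobil = [24.6, 31.5, 19.9, 27.1, 23.5]    # Y.j: same car j on Mobil
t_pair, p_pair = ttest_rel(mobil, shell)  # paired t on the n = 5 cars

diffs = [y - x for x, y in zip(shell, mobil)]     # D.j = Y.j - X.j
t_diff, p_diff = ttest_1samp(diffs, popmean=0.0)  # same test, nu = n-1
```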
---------------------------------------------------------------
End of Tue lec, start of Thu lec
---------------------------------------------------------------
If we have INDEPENDENT SAMPLES from the two populations, the
earlier test is the best we can do; no need for sample sizes to
be the same:
Model: X.j ~ No(mu.x, sig.x^2), Y.j ~ No(mu.y, sig.y^2)
H.0: mu.x = mu.y H.1: mu.x != mu.y
The sample averages are Xbar ~ No(mu.x, sig.x^2/m) and
Ybar ~ No(mu.y, sig.y^2/n),
and we can test the hypothesis by rejecting whenever
| Xbar - Ybar |
(**) ------------------------------- > 1.96
sqrt(sig.x^2/m + sig.y^2/n)
Notice that IF m=n then the differences "D.i = X.i - Y.i" have
mean E[ D.i ] = mu = mu.x - mu.y
variance V[ D.i ] = sig^2 = sig.x^2 + sig.y^2
SAMPLE mean Dbar.n = Xbar.n - Ybar.n
so a "Paired Z Test" of [ H.0: mu = 0 ] would be identical to
this "Independent Sample Z Test" of [ H.0: mu.x = mu.y ].
------------- UNKNOWN VARIANCE -----------------
If sig.x and sig.y are unknown, we're only okay
IF THE VARIANCES ARE KNOWN TO BE IDENTICAL
(or PROPORTIONAL, but that pretty much never happens). If they're
different, it's a hard problem ("Behrens-Fisher") and nobody has a
great answer. If they're identical, though, then the GLR test is:
1
Est. sigma^2 by s^2 = ----------- [ S2x + S2y ]
m + n - 2
where
S2x = Sum { (x.i - Xbar)^2 } && S2y = Sum { (y.j - Ybar)^2 }
Note S2x/sig^2 ~ Chi-square(m-1) and S2y/sig^2 ~ Chi-square(n-1)
so ( s^2 / sig^2 ) ~ chi-square(nu)/nu with nu = (n+m-2), so
(X.bar-Y.bar)
t = --------------------
s sqrt(1/m + 1/n)
has a t_nu dist'n and we can test as usual, one-sided or two-sided.
The "degrees of freedom" for this t are n+m-2... do you see why???
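The pooled computation above, done "by hand" and checked against scipy's equal-variance two-sample t; the two samples are invented:

```python
from math import sqrt
from scipy.stats import ttest_ind

x = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9]        # m = 6 (invented)
y = [4.4, 4.9, 4.6, 5.0, 4.2]             # n = 5 (invented)
m, n = len(x), len(y)
xbar, ybar = sum(x) / m, sum(y) / n
S2x = sum((xi - xbar) ** 2 for xi in x)
S2y = sum((yi - ybar) ** 2 for yi in y)
s2 = (S2x + S2y) / (m + n - 2)            # pooled estimate, nu = m+n-2 df
t_hand = (xbar - ybar) / sqrt(s2 * (1 / m + 1 / n))

t_scipy, p = ttest_ind(x, y, equal_var=True)   # same statistic and df
```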
------------ Comparison with Paired t test --------------
In the Paired T Test, if we don't know sig^2 we estimate it by
1
s^2 = ------- Sum { (D.i - Dbar)^2 }
n - 1
which is NOT the same as in the "Independent Sample t Test"... it
has n-1 degrees of freedom, just half of what the Independent Sample
test has ( m+n-2 ). Thus we have a choice:
Paired t:  Better if the variability of the { X.i } among themselves
is large compared to that of the differences { X.i - Y.i }
Indep Sample: Better if the variability of the { X.i } among themselves
is the same as that of the differences, and can help us
do a better job of estimating sigma^2
------------------------------------------------------------------
The F Distribution:
How can we TELL if two variances are the same? For example, suppose
(as above) we have independent samples
X.i ~ No(mu.x, sig.x^2)
Y.j ~ No(mu.y, sig.y^2)
and we'd like to know if sig.x = sig.y or not... how can we tell?
Since ( S2x / sig.x^2 ) ~ chi^2 (m-1) = Ga( (m-1)/2, 1/2 )
and ( S2y / sig.y^2 ) ~ chi^2 (n-1) = Ga( (n-1)/2, 1/2 ),
IF H.0 is true we have two independent estimates of sig^2:
            S2x / (m-1)  ~  Ga( (m-1)/2, (m-1)/(2 sig.x^2) )  ~~  sig.x^2
            S2y / (n-1)  ~  Ga( (n-1)/2, (n-1)/(2 sig.y^2) )  ~~  sig.y^2
whose ratio ought to be about one, if [H.0: sig.x=sig.y] is true:
S2x / (m-1) m-1 Ga( (m-1)/2, (m-1)/2 )
------------ ~ F = ----------------------
S2y / (n-1) n-1 Ga( (n-1)/2, (n-1)/2 )
the ratio of independent Gamma's, each with mean one.
Since the F distribution has *two* 'degrees of freedom' parameters
it's a bear to make tables for it.... but with computers it's no
problem. A symmetric test of
H.0: sig.x^2 = sig.y^2 vs. H.1: sig.x^2 != sig.y^2
S2x / (m-1)
would reject if ------------ is way bigger than 1 or way smaller...
S2y / (n-1)
i.e. if this ratio "F" satisfies [ F < a ] or [ F > b ] where
          (alpha/2)  =  Pr[ F(m-1, n-1) < a ]  =  Pr[ F(n-1, m-1) > 1/a ]

          (alpha/2)  =  Pr[ F(m-1, n-1) > b ]
(Do you see why???  Note most F tables give only RIGHT tail
probabilities, but by swapping degrees of freedom we can find left tails).
To find the P-value, just pick whichever estimate of sig^2 is bigger and
put IT on the top (numerator) of the variance ratio, then report TWICE
the probability that the appropriate F random variable would be bigger.
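That recipe as a sketch in Python (the sample values in the test are made up):

```python
from scipy.stats import f

def f_test_two_sided(x, y):
    """Put the larger variance estimate on top and report twice the
    right-tail F probability, as described above."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    vx = sum((xi - xbar) ** 2 for xi in x) / (m - 1)   # S2x/(m-1)
    vy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)   # S2y/(n-1)
    if vx >= vy:
        ratio, dfn, dfd = vx / vy, m - 1, n - 1
    else:
        ratio, dfn, dfd = vy / vx, n - 1, m - 1
    return 2 * f.sf(ratio, dfn, dfd)       # twice the right-tail probability
```

By construction the answer is the same whichever sample you call x and which y.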
------------------------------------------------------------------------
CONNECTIONS:
t: If t has a Student t dist'n with nu degrees of freedom, then
t^2 has an F distribution with 1 numerator and nu denominator
degrees of freedom.
Be: If X and Y have independent Gamma distributions with the same
(arbitrary) rate parameter and maybe different shapes a, b
then:
X / a 2a
F = --------- has an F
Y / b 2b
distribution, while (since X = F Y a/b)
X F a/b a F
Z = ------- = ---------- = ---------
X + Y F a/b + 1 a F + b
has a Beta(a, b) distribution. This fact can be used to help
find the F pdf or CDF by change-of-variables.
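Both connections can be verified numerically through the CDFs; nu, a, b and the evaluation points below are arbitrary choices:

```python
from scipy.stats import beta, f, t

nu, a, b = 7, 3.0, 5.0                     # arbitrary df / shape choices
pts = [0.5, 1.3, 2.1]

# t connection: P[|t_nu| <= x] = P[t^2 <= x^2] = F(1, nu) CDF at x^2
t_sq  = [t.cdf(x, nu) - t.cdf(-x, nu) for x in pts]
f_1nu = [f.cdf(x * x, 1, nu) for x in pts]

# Beta connection: P[F(2a,2b) <= w] = P[Beta(a,b) <= a*w/(a*w + b)]
f_ab      = [f.cdf(w, 2 * a, 2 * b) for w in pts]
beta_vals = [beta.cdf(a * w / (a * w + b), a, b) for w in pts]
```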
-------------------------------------------------------------------------
THREE or more POPULATION MEANS:
Now suppose we have several (k) "populations", all normally
distributed with the same variance,
X_{ij} ~ No(mu_i, sig^2), 1 <= j <= n_i
and we want to test the hypothesis
H.0: All means are equal vs. H.1: Not so.
Let Xbar.i be the average of the i'th sample, and set
S2i = Sum_j { ( X_{ij} - Xbar.i )^2 }
(the sum-of-squares for the i'th sample). Here are two
independent estimates of sigma^2:
1
Sig.W ("Within"): ------------- Sum { S2i } (df = sum (n.i-1) )
Sum (n.i-1)
If there are k >= 2 samples (maybe from different populations),
we can also get a variance estimate from how widely the Xbar.i's
vary.  Let k be the number of populations, and N = Sum { n_i };
the "grand mean" is
Sum { n.i * Xbar.i } Sum { X_{ij} }
XBAR = ------------------------ = ------------------
Sum { n.i } N
and the grand sum-of-squares can be decomposed as:
Sum { (X_ij - mu)^2 } = Sum { (X_ij - XBAR)^2 } + N (XBAR-mu)^2
= Sum_i [ Sum_j (X_ij - Xbar.i)^2 + n_i (Xbar.i - XBAR)^2 ] + ...
= { SSW = Sum_ij (X_ij - Xbar.i)^2 } (df = N-k)
+ { SSB = Sum_i n_i (Xbar.i-XBAR)^2 } (df = k-1)
+ { N (XBAR-mu)^2 } (df = 1)
If H.0 is true, then SSW/(N-k) and SSB/(k-1) will be independent
unbiased estimates of sigma^2, and their ratio
SSB / (k-1) k-1
F = --------------- ~ F
SSW / (N-k) N-k
will have an F distribution. On the other hand, if H.1 is true,
then the denominator will still be an unbiased estimator of sigma^2
with (N-k) degrees of freedom, but the numerator should be HUGE...
because it includes the variability of the mu_i's.
SO, we can test H.0 by rejecting when F is big, under the F
distribution.
When comparing just k=2 populations, this F is just t^2 for the
usual Student's t test.
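The SSW/SSB decomposition and F statistic, computed directly and checked against scipy's one-way ANOVA; the three samples are invented:

```python
from scipy.stats import f_oneway

groups = [[6.2, 5.9, 6.5, 6.1],            # k = 3 invented samples
          [5.1, 5.6, 5.3],
          [6.8, 7.0, 6.4, 6.9, 6.6]]
k = len(groups)
N = sum(len(g) for g in groups)
means = [sum(g) / len(g) for g in groups]          # the Xbar.i
grand = sum(sum(g) for g in groups) / N            # XBAR
SSW = sum((x - mi) ** 2 for g, mi in zip(groups, means) for x in g)
SSB = sum(len(g) * (mi - grand) ** 2 for g, mi in zip(groups, means))
F_stat = (SSB / (k - 1)) / (SSW / (N - k))         # ~ F(k-1, N-k) under H0

F_scipy, p_value = f_oneway(*groups)               # same F, and its P-value
```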