Week 10: One and Two-sided Tests, DeGr & Sche 9.4-7:
9.4 Two-Sided Alternatives
9.5 The t Test
9.6 Comparing the Means of Two Normal Distributions
9.7 The F Distributions
--------------------------------------------------------------
Preliminary: Reminder, Hypotheses are statements about the
POPULATION, ***NOT*** the data. We *never*
test a hypothesis like "Xbar = 10"; we use Xbar
and other statistics to shed light on whether or
not the POPULATION parameter theta is th.0 (etc).
--------------------------------------------------------------
Example 1: Normal Mean
To test Ho: mu = mu.0 against the TWO-SIDED alternative
H1: mu != mu.0
for data X.j ~ No(mu, sig^2) with sig^2 known, it's obvious
(and also true!) that we should base our test only on the sample
mean from the data,
Xbar ~ No(mu, sig^2/n).
Since evidence against H.0 would include both values of Xbar
that are far above mu.0 and those that are far below mu.0,
evidently the rejection region should be something like
R = { x: Xbar <= c.1 OR Xbar >= c.2 }
and the size of the test will be
alpha = P[ X in R | H.0 ]
= Phi( sqrt(n) * (c.1-mu.0)/sig )
+ Phi( sqrt(n) * (mu.0-c.2)/sig ) (draw picture)
so we can achieve a test of whatever alpha we like. The symmetric
solution would start with z s.t. Phi( -z ) = alpha/2 and then
c.1 = mu.0 - sig * z / sqrt(n) c.2 = mu.0 + sig * z / sqrt(n)
so the rejection region becomes
R = { x: | Xbar - mu.0 | >= z * sig / sqrt(n) }
Unsurprisingly, this is also the Generalized Likelihood Ratio
(GLR) test for this composite alternative hypothesis.
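A quick sketch of this test in Python (the numbers in the comment are made up, purely for illustration):

```python
from scipy.stats import norm

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Reject H0: mu = mu0 when |Xbar - mu0| >= z * sigma / sqrt(n),
    where z satisfies Phi(-z) = alpha/2."""
    z = norm.ppf(1 - alpha / 2)            # z = 1.96 for alpha = 0.05
    return abs(xbar - mu0) >= z * sigma / n ** 0.5

# e.g. n=25, sigma=2, mu0=10: reject iff |Xbar - 10| >= 1.96 * 2/5 = 0.784
```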
---------
The Power function for this test will be
pi(mu) = P[ X in R | mu ]
= Phi( sqrt(n) * (c.1-mu)/sig )
+ Phi( sqrt(n) * (mu-c.2)/sig ) (plot it)
or, in the symmetric case with c.j = mu.0 +/- sig z/sqrt(n),
= Phi( (mu-mu.0) * sqrt(n)/sig - z)
+ Phi( (mu.0-mu) * sqrt(n)/sig - z)
Evidently pi(mu.0) = alpha and otherwise pi(mu) > alpha;
it rises quickly to one if sqrt(n)/sig is large, i.e., if
EITHER we have a big sample-size n OR sig^2 is tiny.
Also draw pi(mu) for the asymmetric case, where
(mu.0-c.1) != (c.2-mu.0) (or c-bar != mu.0)
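The symmetric power function can be plotted directly from the formula above; here is a minimal sketch (the choices n=25, sigma=2 in the comment are hypothetical):

```python
from scipy.stats import norm

def power(mu, mu0, sigma, n, alpha=0.05):
    """pi(mu) = Phi((mu-mu0)*sqrt(n)/sig - z) + Phi((mu0-mu)*sqrt(n)/sig - z)."""
    z = norm.ppf(1 - alpha / 2)
    shift = (mu - mu0) * n ** 0.5 / sigma
    return norm.cdf(shift - z) + norm.cdf(-shift - z)

# At mu = mu0 the power is exactly alpha; it rises toward 1 as
# |mu - mu0| * sqrt(n) / sigma grows, e.g. with mu0=10, sigma=2, n=25.
```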
-------------------------------------------------------------
Example 2: Poisson Mean
Now suppose X.1 ... X.n ~ Po( th ) and we'd like to test
H.0: th = th.0 vs H.1: th != th.0
How can we proceed? Again X.bar is a sufficient statistic
and again we would like to reject H.0 if it is either too
big (say, Xbar > c.2) or too small (say, Xbar < c.1). Now
since n * X.bar = X.1+X.2+...+X.n ~ Po(n*th), the *exact*
size of the test is
alpha = Pr[ Y <= n*c.1 ] + Pr[ Y >= n*c.2 ]
    for Y ~ Po( n*th.0 ).  (Remember--- ALWAYS compute the
    size using the H.0 dist'n.)  The power of the test is:
pi(th) = Pr[ Y <= n*c.1 ] + Pr[ Y >= n*c.2 ]
for Y ~ Po( n*th ). This can be computed exactly as:
       =  sum_{0 <= k <= n*c.1}  exp(-n*th) (n*th)^k / k!
       +  sum_{n*c.2 <= k < oo}  exp(-n*th) (n*th)^k / k!
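The exact size is easy to compute with scipy; a sketch (the values n=10, th.0=2 and the cutoffs in the test are invented for illustration):

```python
from scipy.stats import poisson

def exact_size(n, th0, k1, k2):
    """alpha = P[Y <= k1] + P[Y >= k2] for Y ~ Po(n*th0),
    where k1 = n*c.1 and k2 = n*c.2 are taken to be integers."""
    Y = poisson(n * th0)
    return Y.cdf(k1) + Y.sf(k2 - 1)        # sf(k-1) = P[Y >= k]
```

Replacing th0 by any other theta in the same expression gives the power pi(theta).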
--------------------------------
The two-sided P-value is:
Set S.n = n*Xbar = Sum { X.i }.
If Xbar > th.0, report:
P = 2 * [ Sum of Po(n*th.0) pmf from S.n to oo ]
If Xbar < th.0, report:
P = 2 * [ Sum of Po(n*th.0) pmf from 0 to S.n ]
If n is large enough, this is in both cases approximately:

                     [  1/2  -  n | Xbar - th.0 |  ]
     P  =  2 * Phi   [ ---------------------------- ]
                     [       sqrt( n * th.0 )       ]
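Both the exact two-sided P-value and its (continuity-corrected) normal approximation, as a sketch; the data in the test are invented:

```python
from math import sqrt
from scipy.stats import norm, poisson

def poisson_p_value(xs, th0):
    """Exact two-sided P-value for H0: theta = th0, and its
    normal approximation with continuity correction 1/2."""
    n, Sn = len(xs), sum(xs)               # S.n = n*Xbar
    Y = poisson(n * th0)
    if Sn > n * th0:
        p = 2 * Y.sf(Sn - 1)               # 2 * P[Y >= S.n]
    else:
        p = 2 * Y.cdf(Sn)                  # 2 * P[Y <= S.n]
    approx = 2 * norm.cdf((0.5 - abs(Sn - n * th0)) / sqrt(n * th0))
    return p, approx
```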
***************************** Optional *************************
*
* We can also use the connection with the Gamma distribution:
*
* Catch [ < k fish in t hours ]
* <==> k'th fish takes more than t hours
*
* If N.t = number of fish caught in t hrs
* T.k = time at which k'th fish is caught
* and we catch lambda fish/hr on average, then
*
* P[ N.t < k ] = Pr[ Poisson w/mean t*lam is < k ]
* = Pr[ T.k > t ]
* = Pr[ Gamma (k, lam) is > t ]
*
* If we choose lam=1/2 then this is a chi-square:
*
* = Pr[ chi-square w/ df=2k exceeds t ]
*
* SO, for an interval that's symmetric in the sense that there
* is probability alpha/2 error in each direction, we want
*
* alpha/2 = Pr[ Y <= n*c.1 ]
* = Pr[ Y < 1+n*c.1 ]
* = Pr[ T.(1+n*c.1) > 2*n*th.0 ]
*
* alpha/2 = Pr[ Y >= n*c.2 ]
* = Pr[ T.(n*c.2) < 2*n*th.0 ]
*
* for T.k ~ Ga(k, 1/2) = chi^2 w/ df = 2*k
***************************************************************
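The fishing identity in the optional box can be checked numerically; the values of t, lam, k below are arbitrary:

```python
from scipy.stats import chi2, gamma, poisson

t_hrs, lam, k = 7.0, 0.5, 6                    # arbitrary illustrative values
lhs = poisson.cdf(k - 1, t_hrs * lam)          # P[N.t < k],  N.t ~ Po(t*lam)
mid = gamma.sf(t_hrs, a=k, scale=1.0 / lam)    # P[T.k > t],  T.k ~ Ga(k, lam)
rhs = chi2.sf(2 * lam * t_hrs, df=2 * k)       # P[chi2(2k) > 2*lam*t]
```

All three probabilities agree (up to rounding), for any rate lam, not only lam = 1/2.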
9.5 The t Test
If sigma^2 isn't known, we estimate it from the data (duh) and
replace Phi() with the t distribution with nu = (n-1) degrees of
freedom.
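In scipy this is a one-liner; the mileage-like numbers here are invented:

```python
from scipy.stats import ttest_1samp

x = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5]   # invented data, n = 8
t_stat, p_value = ttest_1samp(x, popmean=10.0)      # two-sided H0: mu = 10
# t_stat uses s in place of sigma; p_value comes from the t dist'n, nu = 7
```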
--------------- PAIRED T TEST ----------------------------------
Let's start looking at COMPARING populations.
    Let  X.j  be the mileage Car j gets using Shell
         Y.j  be the mileage Car j gets using Mobil
for j=1,...,n. If Shell and Mobil are identical then we would
expect X.j and Y.j to have the same distribution--- perhaps they
are both normal with the same mean & variance. If Shell and Mobil
are DIFFERENT, then the easiest thing to discover would be for the
means to differ, with everything independent and normally distributed:
Model: X.j ~ No(mu.x, sig.x^2), Y.j ~ No(mu.y, sig.y^2)
H.0: mu.x = mu.y H.1: mu.x != mu.y
The sample averages are Xbar ~ No(mu.x, sig.x^2/n) and
Ybar ~ No(mu.y, sig.y^2/n),
and we can test the hypothesis by rejecting whenever
| Xbar - Ybar |
(**) ------------------------------- > 1.96
sqrt(sig.x^2/n + sig.y^2/n)
if we want alpha=0.05. Doesn't matter if sig.x=sig.y or not.
BUT-------- what if the cars are different? Then we expect
the X.j's and Y.j's to vary quite a bit, for two reasons:
a) Random variation from trial-to-trial
b) Variability of the cars
If the fuels are pretty much similar, we'd expect a plot of
points (X.j, Y.j) to be close to the line Y=X (draw on board)
How can we remove the car-to-car variability? This is our first
encounter with "BLOCKING":
Answer: Look at the DIFFERENCES instead of the values:
D.j = Y.j - X.j
and model THESE as No(mu, sig^2) with H.0: mu=0
Note that IF the X's and Y's are all iid No with means
mu.x and mu.y and the SAME VARIANCE sig.x^2 = sig.y^2
        then D.j will be iid No(mu, sig^2) with mean and variance
             mu = mu.y - mu.x         sig^2 = sig.x^2 + sig.y^2
so this "new" test will be identical with the one we looked
at above--- BUT the "paired t" does NOT ASSUME that the X's
    or Y's have individual normal dist'ns, only that the DIFFERENCES
do. If the cars differ we expect the individual X's to be quite
variable (have big variance sig.x^2) because of the car-to-car
variability, and similarly sig.y^2 will be huge because of the
car-to-car variability, so we expect sig^2 to be (maybe much)
    smaller than (sig.x^2+sig.y^2) above --- making the "paired t"
test more powerful. If the cars DON'T vary, then the tests are
identical.
If sig^2 isn't known, we just estimate it from the sample variance
1
s^2 = ----- Sum { (D.j - Dbar)^2 }
n-1
and apply the Student t distribution with nu = n-1 deg fdm.
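A sketch of the paired t for the fuel example, with invented mileage numbers; note it is exactly a one-sample t on the differences:

```python
from scipy.stats import ttest_1samp, ttest_rel

shell = [24.1, 31.0, 19.5, 27.2, 22.8]    # X.j: car j on Shell (invented)
mobil = [24.6, 31.5, 19.9, 27.1, 23.5]    # Y.j: same car j on Mobil
t_pair, p_pair = ttest_rel(mobil, shell)  # paired t on the n = 5 cars

diffs = [y - x for x, y in zip(shell, mobil)]     # D.j = Y.j - X.j
t_diff, p_diff = ttest_1samp(diffs, popmean=0.0)  # same test, nu = n-1
```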
---------------------------------------------------------------
End of Tue lec, start of Thu lec
---------------------------------------------------------------
If we have INDEPENDENT SAMPLES from the two populations, the
earlier test is the best we can do; no need for sample sizes to
be the same:
Model: X.j ~ No(mu.x, sig.x^2), Y.j ~ No(mu.y, sig.y^2)
H.0: mu.x = mu.y H.1: mu.x != mu.y
The sample averages are Xbar ~ No(mu.x, sig.x^2/m) and
Ybar ~ No(mu.y, sig.y^2/n),
and we can test the hypothesis by rejecting whenever
| Xbar - Ybar |
(**) ------------------------------- > 1.96
sqrt(sig.x^2/m + sig.y^2/n)
Notice that IF m=n then the differences "D.i = X.i - Y.i" have
mean E[ D.i ] = mu = mu.x - mu.y
variance V[ D.i ] = sig^2 = sig.x^2 + sig.y^2
SAMPLE mean Dbar.n = Xbar.n - Ybar.n
so a "Paired Z Test" of [ H.0: mu = 0 ] would be identical to
this "Independent Sample Z Test" of [ H.0: mu.x = mu.y ].
------------- UNKNOWN VARIANCE -----------------
If sig.x and sig.y are unknown, we're only okay
IF THE VARIANCES ARE KNOWN TO BE IDENTICAL
(or PROPORTIONAL, but that pretty much never happens). If they're
different, it's a hard problem ("Behrens-Fisher") and nobody has a
great answer. If they're identical, though, then the GLR test is:
1
Est. sigma^2 by s^2 = ----------- [ S2x + S2y ]
m + n - 2
where
S2x = Sum { (x.i - Xbar)^2 } && S2y = Sum { (y.j - Ybar)^2 }
Note S2x/sig^2 ~ Chi-square(m-1) and S2y/sig^2 ~ Chi-square(n-1)
so ( s^2 / sig^2 ) ~ chi-square(nu)/nu with nu = (n+m-2), so
(X.bar-Y.bar)
t = --------------------
s sqrt(1/m + 1/n)
has a t_nu dist'n and we can test as usual, one-sided or two-sided.
The "degrees of freedom" for this t are n+m-2... do you see why???
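The pooled computation above, done "by hand" and checked against scipy's equal-variance two-sample t; the two samples are invented:

```python
from math import sqrt
from scipy.stats import ttest_ind

x = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9]        # m = 6 (invented)
y = [4.4, 4.9, 4.6, 5.0, 4.2]             # n = 5 (invented)
m, n = len(x), len(y)
xbar, ybar = sum(x) / m, sum(y) / n
S2x = sum((xi - xbar) ** 2 for xi in x)
S2y = sum((yi - ybar) ** 2 for yi in y)
s2 = (S2x + S2y) / (m + n - 2)            # pooled estimate, nu = m+n-2 df
t_hand = (xbar - ybar) / sqrt(s2 * (1 / m + 1 / n))

t_scipy, p = ttest_ind(x, y, equal_var=True)   # same statistic and df
```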
------------ Comparison with Paired t test --------------
In the Paired T Test, if we don't know sig^2 we estimate it by
1
s^2 = ------- Sum { (D.i - Dbar)^2 }
n - 1
which is NOT the same as in the "Independent Sample t Test"... it
has n-1 degrees of freedom, just half of what the Independent Sample
test has ( m+n-2 ). Thus we have a choice:
Paired t:  Better if the variability of the { X.i } among themselves
is large compared to that of the differences { X.i - Y.i }
Indep Sample: Better if the variability of the { X.i } among themselves
is the same as that of the differences, and can help us
do a better job of estimating sigma^2
------------------------------------------------------------------
The F Distribution:
How can we TELL if two variances are the same? For example, suppose
(as above) we have independent samples
X.i ~ No(mu.x, sig.x^2)
Y.j ~ No(mu.y, sig.y^2)
and we'd like to know if sig.x = sig.y or not... how can we tell?
Since ( S2x / sig.x^2 ) ~ chi^2 (m-1) = Ga( (m-1)/2, 1/2 )
and ( S2y / sig.y^2 ) ~ chi^2 (n-1) = Ga( (n-1)/2, 1/2 ),
IF H.0 is true we have two independent estimates of sig^2:
            S2x / (m-1)  ~  Ga( (m-1)/2, (m-1)/(2 sig.x^2) )  ~~  sig.x^2
            S2y / (n-1)  ~  Ga( (n-1)/2, (n-1)/(2 sig.y^2) )  ~~  sig.y^2
whose ratio ought to be about one, if [H.0: sig.x=sig.y] is true:
S2x / (m-1) m-1 Ga( (m-1)/2, (m-1)/2 )
------------ ~ F = ----------------------
S2y / (n-1) n-1 Ga( (n-1)/2, (n-1)/2 )
the ratio of independent Gamma's, each with mean one.
Since the F distribution has *two* 'degrees of freedom' parameters
it's a bear to make tables for it.... but with computers it's no
problem. A symmetric test of
H.0: sig.x^2 = sig.y^2 vs. H.1: sig.x^2 != sig.y^2
S2x / (m-1)
would reject if ------------ is way bigger than 1 or way smaller...
S2y / (n-1)
i.e. if this ratio "F" satisfies [ F < a ] or [ F > b ] where
          (alpha/2)  =  Pr[ F(m-1, n-1) < a ]  =  Pr[ F(n-1, m-1) > 1/a ]

          (alpha/2)  =  Pr[ F(m-1, n-1) > b ]
(Do you see why???  Note most F tables give only RIGHT tail
probabilities, but by swapping degrees of freedom we can find left tails).
To find the P-value, just pick whichever estimate of sig^2 is bigger and
put IT on the top (numerator) of the variance ratio, then report TWICE
the probability that the appropriate F random variable would be bigger.
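That recipe as a sketch in Python (the sample values in the test are made up):

```python
from scipy.stats import f

def f_test_two_sided(x, y):
    """Put the larger variance estimate on top and report twice the
    right-tail F probability, as described above."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    vx = sum((xi - xbar) ** 2 for xi in x) / (m - 1)   # S2x/(m-1)
    vy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)   # S2y/(n-1)
    if vx >= vy:
        ratio, dfn, dfd = vx / vy, m - 1, n - 1
    else:
        ratio, dfn, dfd = vy / vx, n - 1, m - 1
    return 2 * f.sf(ratio, dfn, dfd)       # twice the right-tail probability
```

By construction the answer is the same whichever sample you call x and which y.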
------------------------------------------------------------------------
CONNECTIONS:
t: If t has a Student t dist'n with nu degrees of freedom, then
t^2 has an F distribution with 1 numerator and nu denominator
degrees of freedom.
Be: If X and Y have independent Gamma distributions with the same
(arbitrary) rate parameter and maybe different shapes a, b
then:
X / a 2a
F = --------- has an F
Y / b 2b
distribution, while (since X = F Y a/b)
X F a/b a F
Z = ------- = ---------- = ---------
X + Y F a/b + 1 a F + b
has a Beta(a, b) distribution. This fact can be used to help
find the F pdf or CDF by change-of-variables.
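Both connections can be verified numerically through the CDFs; nu, a, b and the evaluation points below are arbitrary choices:

```python
from scipy.stats import beta, f, t

nu, a, b = 7, 3.0, 5.0                     # arbitrary df / shape choices
pts = [0.5, 1.3, 2.1]

# t connection: P[|t_nu| <= x] = P[t^2 <= x^2] = F(1, nu) CDF at x^2
t_sq  = [t.cdf(x, nu) - t.cdf(-x, nu) for x in pts]
f_1nu = [f.cdf(x * x, 1, nu) for x in pts]

# Beta connection: P[F(2a,2b) <= w] = P[Beta(a,b) <= a*w/(a*w + b)]
f_ab      = [f.cdf(w, 2 * a, 2 * b) for w in pts]
beta_vals = [beta.cdf(a * w / (a * w + b), a, b) for w in pts]
```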
-------------------------------------------------------------------------
THREE or more POPULATION MEANS:
Now suppose we have several (k) "populations", all normally
distributed with the same variance,
X_{ij} ~ No(mu_i, sig^2), 1 <= j <= n_i
and we want to test the hypothesis
H.0: All means are equal vs. H.1: Not so.
Let Xbar.i be the average of the i'th sample, and set
S2i = Sum_j { ( X_{ij} - Xbar.i )^2 }
(the sum-of-squares for the i'th sample). Here are two
independent estimates of sigma^2:
1
Sig.W ("Within"): ------------- Sum { S2i } (df = sum (n.i-1) )
Sum (n.i-1)
If there are k >= 2 samples (maybe from different populations),
we can also get a variance estimate from how widely the Xbar.i's
vary.  Let k be the number of populations, and N = Sum { n_i };
the "grand mean" is
Sum { n.i * Xbar.i } Sum { X_{ij} }
XBAR = ------------------------ = ------------------
Sum { n.i } N
and the grand sum-of-squares can be decomposed as:
Sum { (X_ij - mu)^2 } = Sum { (X_ij - XBAR)^2 } + N (XBAR-mu)^2
= Sum_i [ Sum_j (X_ij - Xbar.i)^2 + n_i (Xbar.i - XBAR)^2 ] + ...
= { SSW = Sum_ij (X_ij - Xbar.i)^2 } (df = N-k)
+ { SSB = Sum_i n_i (Xbar.i-XBAR)^2 } (df = k-1)
+ { N (XBAR-mu)^2 } (df = 1)
If H.0 is true, then SSW/(N-k) and SSB/(k-1) will be independent
unbiased estimates of sigma^2, and their ratio
SSB / (k-1) k-1
F = --------------- ~ F
SSW / (N-k) N-k
will have an F distribution. On the other hand, if H.1 is true,
then the denominator will still be an unbiased estimator of sigma^2
with (N-k) degrees of freedom, but the numerator should be HUGE...
because it includes the variability of the mu_i's.
SO, we can test H.0 by rejecting when F is big, under the F
distribution.
When comparing just k=2 populations, this F is just t^2 for the
usual Student's t test.
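The SSW/SSB decomposition and F statistic, computed directly and checked against scipy's one-way ANOVA; the three samples are invented:

```python
from scipy.stats import f_oneway

groups = [[6.2, 5.9, 6.5, 6.1],            # k = 3 invented samples
          [5.1, 5.6, 5.3],
          [6.8, 7.0, 6.4, 6.9, 6.6]]
k = len(groups)
N = sum(len(g) for g in groups)
means = [sum(g) / len(g) for g in groups]          # the Xbar.i
grand = sum(sum(g) for g in groups) / N            # XBAR
SSW = sum((x - mi) ** 2 for g, mi in zip(groups, means) for x in g)
SSB = sum(len(g) * (mi - grand) ** 2 for g, mi in zip(groups, means))
F_stat = (SSB / (k - 1)) / (SSW / (N - k))         # ~ F(k-1, N-k) under H0

F_scipy, p_value = f_oneway(*groups)               # same F, and its P-value
```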