FAQs ABOUT THE STUDENT-T DISTRIBUTION

Q: When is it appropriate to use a Student-t distribution?
A: Whenever you are working with a model based on the normal distribution and the population standard deviation (or standard deviations, if there is more than one) is unknown. This includes: inference on m when h is unknown, inference on mA-mB when hA and hB are unknown, and inference on b in the regression model when h is unknown.

Q: When can I approximate the Student-t with a Normal?
A: When you have more than 30 degrees of freedom, the results you get from the normal are pretty close to those from the t. When you have fewer than 30 degrees of freedom, you can still get reasonable results using the small-sample approximation on page 343 of Berry's book. So the answer is "always, but the quality of the approximation will vary".
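
To get a feel for how the quality of the approximation varies, the following Python sketch compares the two-sided tail area beyond 1.96 under a standard t with the 0.05 that the normal gives. The helper names (t_pdf, t_cdf, two_sided_tail) are made up for this illustration, and the t area is computed by plain numerical integration (Simpson's rule) rather than by a statistics package.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of a standard Student-t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Area to the left of x: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sided_tail(cut, df):
    """Probability that a standard t falls outside (-cut, cut)."""
    return 2 * (1 - t_cdf(cut, df))

normal_tail = 2 * (1 - NormalDist().cdf(1.96))
print(f"normal    : {normal_tail:.4f}")
for df in (5, 10, 30, 100):
    print(f"t, df={df:3d} : {two_sided_tail(1.96, df):.4f}")
```

The t tail areas shrink toward the normal value as the degrees of freedom grow, which is the sense in which the approximation improves.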

Q: On the exam, should we use the t or the normal for small samples?
A: I'll accept either answer, as long as it is executed correctly.

Q: Then why are you bugging us with the Student-t?
A: Because 95% of the world prefers the t to the normal, and you will almost certainly encounter it at some point in your careers.

Q: How do I compute the area under a Student-t curve using Minitab?
A: If you want to calculate the area under the curve to the left of, say, 1.34, for a standard Student-t density with 6 degrees of freedom, do

MTB > cdf 1.34;
SUBC> t 6. 
     1.3400    0.8856

The semicolon after the cdf command gets you to the "subcommand" prompt. Don't forget the period at the end of the subcommand. Here cdf stands for cumulative distribution function; its value is the area under the density curve to the left of a given point.
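
If Minitab is not at hand, the same area can be checked numerically. The sketch below is an illustration in Python, not part of the Minitab session: t_pdf and t_cdf are hypothetical helper names, and the area is obtained by integrating the t density with Simpson's rule.

```python
import math

def t_pdf(x, df):
    """Density of a standard Student-t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Area under the density to the left of x."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |x|
    # Half the total area lies on each side of 0.
    return 0.5 + area if x >= 0 else 0.5 - area

area_left = t_cdf(1.34, 6)
print(f"{area_left:.4f}")
```

The printed value should agree with the 0.8856 in the Minitab output to four decimals.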

Q: How do I compute the quantiles of a Student-t using Minitab?
A: If you want to calculate the .975 quantile for a standard Student-t density with 6 degrees of freedom, do

 MTB > invcdf .975;
 SUBC> t 6.
     0.9750    2.4469

Here invcdf stands for the inverse of the cumulative distribution function; its value is the quantile. 2.4469 is the appropriate t-perc for a 95% confidence interval when you have 6 degrees of freedom.
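
Since a quantile is just the point where the cdf reaches a given probability, it can be found by searching for that point. The Python sketch below does this by bisection on a numerically integrated cdf. It illustrates the idea only and makes no claim about how Minitab computes invcdf; the names t_pdf, t_cdf and t_quantile are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of a standard Student-t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Area to the left of x: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df, tol=1e-8):
    """Point q such that the area to the left of q equals p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p < 0.5:
        return -t_quantile(1.0 - p, df, tol)  # use symmetry around 0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains q
        hi *= 2.0
    while hi - lo > tol:      # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_perc = t_quantile(0.975, 6)
print(f"{t_perc:.4f}")
```

The result should match the 2.4469 reported by Minitab to about four decimals.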

Q: How do we calculate a t-score?
A: Exactly as you would calculate a z-score, except that you replace h (or the h's) with their estimates. Which estimate is the right one depends on the problem.

Inference on single mean m: the estimate of h is the sample standard deviation s.

Inference on difference of means mA-mB: the estimates of the h's are the sample standard deviations sA and sB in the two samples. The standard deviation of the difference (which goes in the denominator of the t-score) is given by: square root( sA^2/nA + sB^2/nB )

Inference on regression coefficient b: the estimate of h is: square root( sum of the squares of the residuals / (n-2) ).
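
The two formulas above can be written out as a short Python sketch. The function names are hypothetical. The two samples are the ones used in the worked example at the end of this FAQ, while the residuals are invented numbers included only to show the arithmetic.

```python
import math
from statistics import mean, variance

def sd_of_difference(sample_a, sample_b):
    """square root( sA^2/nA + sB^2/nB ), the denominator of the t-score."""
    return math.sqrt(variance(sample_a) / len(sample_a)
                     + variance(sample_b) / len(sample_b))

def residual_sd(residuals):
    """Estimate of h in regression: sqrt(sum of squared residuals / (n-2))."""
    n = len(residuals)
    return math.sqrt(sum(r * r for r in residuals) / (n - 2))

def t_score(value, center, sd):
    """Same form as a z-score, but with an estimated standard deviation."""
    return (value - center) / sd

a = [4, 5, 7, 8]
b = [3, 7, 9, 0, 6]
sd_diff = sd_of_difference(a, b)
t_at_zero = t_score(0, mean(a) - mean(b), sd_diff)
print(f"sd of difference = {sd_diff:.3f}, t-score at 0 = {t_at_zero:.3f}")

# Invented residuals (they sum to zero, as least-squares residuals do).
made_up_residuals = [0.5, -1.0, 0.25, 0.75, -0.5]
print(f"residual sd = {residual_sd(made_up_residuals):.3f}")
```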

Q: How do we calculate the degrees of freedom?
A: Practical rules:

Inference on single mean m: df=n-1

Inference on difference of means mA-mB: df = nA+nB-2. (This is only approximately correct. Most computer packages use a slightly more elaborate approximation, but you don't need to worry about that.)

Inference on regression coefficient b: df = n-2.
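
For the curious, the practical rules can be written as one-line functions. The sketch below also includes the Welch-Satterthwaite formula, a widely used refinement for the difference of two means. That this is the "more elaborate approximation" referred to above is an assumption, and all function names are hypothetical.

```python
from statistics import variance

def df_single_mean(n):
    return n - 1

def df_difference_simple(n_a, n_b):
    return n_a + n_b - 2

def df_regression(n):
    return n - 2

def df_welch(sample_a, sample_b):
    """Welch-Satterthwaite degrees of freedom (assumed refinement)."""
    n_a, n_b = len(sample_a), len(sample_b)
    v_a = variance(sample_a) / n_a  # sA^2 / nA
    v_b = variance(sample_b) / n_b  # sB^2 / nB
    return (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))

# Samples from the worked example at the end of this FAQ.
a = [4, 5, 7, 8]
b = [3, 7, 9, 0, 6]
print("simple rule:", df_difference_simple(len(a), len(b)))
print("Welch-Satterthwaite:", round(df_welch(a, b), 2))
```

The refined value is generally not a whole number, and it never exceeds nA+nB-2.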

Q: A worked example:
A: Let's look at the distribution of the difference of the two means of two normal populations A and B.

Data

	A: 4 5 7 8 
	B: 3 7 9 0 6

Summary statistics

	     mean   variance 
	A:      6   ( (-2)^2 + (-1)^2 + 1^2 + 2^2 ) / 3 = 3.33
	B:      5   ( (-2)^2 + 2^2 + 4^2 + (-5)^2 + 1^2 ) / 4 = 12.5

The posterior distribution of mA-mB is well approximated by a Student-t with mean 6-5=1, standard deviation

 sqrt( 3.33 / 4 + 12.5 / 5 ) = 1.82

and degrees of freedom 4 + 5 - 2 = 7.

To compute the probability that the difference is positive (negative), first determine the t-score (0-1)/1.82 = -0.549. Then determine the probability that a standard t with 7 degrees of freedom is greater (smaller) than the t-score. Using Minitab:

 MTB > cdf -0.549;
 SUBC> t 7.
    -0.5490    0.30

So the probability that the difference is greater than 0 is 1-0.30=0.70.

To compute the 95% probability interval, first determine t-perc. In Minitab, evaluate the inverse cdf at .975 with 7 degrees of freedom.

 MTB > invcdf .975;
 SUBC> t 7.
     0.9750    2.3646

The interval is

(1 - 2.3646 * 1.82, 1 + 2.3646 * 1.82) = (-3.304, 5.304)
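
The whole example can be re-run from the raw data as a check. The Python sketch below uses 7 degrees of freedom (4 + 5 - 2) and hypothetical helper functions that integrate the t density numerically. It illustrates the steps and is not a substitute for the Minitab commands. Because it carries the standard deviation unrounded (about 1.826 rather than 1.82), the last digits differ slightly from the hand calculation.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of a standard Student-t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Area to the left of x: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df, tol=1e-8):
    """Quantile for 0.5 <= p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2

sample_a = [4, 5, 7, 8]
sample_b = [3, 7, 9, 0, 6]

center = mean(sample_a) - mean(sample_b)             # 6 - 5 = 1
sd = math.sqrt(variance(sample_a) / len(sample_a)
               + variance(sample_b) / len(sample_b))
df = len(sample_a) + len(sample_b) - 2               # 4 + 5 - 2 = 7

t_at_zero = (0 - center) / sd
prob_positive = 1 - t_cdf(t_at_zero, df)

t_perc = t_quantile(0.975, df)
lower, upper = center - t_perc * sd, center + t_perc * sd

print(f"sd = {sd:.3f}, t-score = {t_at_zero:.3f}")
print(f"P(difference > 0) = {prob_positive:.2f}")
print(f"95% interval = ({lower:.3f}, {upper:.3f})")
```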