In this lab, we will use MATLAB to do the following:
Again use the data from EXAMPLE 12.13 on page 677 in the text. The dependent variable is the carbon monoxide (CM) content of different cigarette brands. We want to investigate the relationship between CM and three predictor (independent) variables: Tar (T), Nicotine (N), and Weight (W). (See page 677 for more details.)
Read in the data and call the first column T, the second N, third W, and the fourth column CM.
smoke = load('lab9.dat');
T = smoke(:,1); N = smoke(:,2); W = smoke(:,3);
X = [ones(length(smoke),1) T N W];   % X needs to be a matrix with a leading column of ones
CM = smoke(:,4); Y = CM;
g = 1; k = 3; n = length(Y);
Let's investigate the items above:
Xc = X;               % n x (k+1) design matrix for the "complete" model
Xr = Xc(:,1:(g+1));   % first g+1 columns: the "reduced" model
[br, bintr, rr] = regress(Y,Xr);
[bc, bintc, rc] = regress(Y,Xc);
SSEr = rr'*rr; dfr = n - (g+1);
SSEc = rc'*rc; dfc = n - (k+1);
F = ((SSEr - SSEc)/(k-g)) / (SSEc/(n-(k+1)));
Pvalue = 1 - fcdf(F, k-g, n-(k+1));

What hypothesis are we testing here? What can we conclude from the results of the test?
Sometimes variables need to be transformed to ensure that the underlying assumptions are met. For example, if we had a Poisson model and the variance was nonconstant, we could take square roots to transform it into a conforming dataset. Specifically, we replace the naive model

Y_i = beta_0 + X_i beta_1 + eps_i

where, e.g., Y_i is the Poisson number of potholes in X_i miles of road (which should be heteroskedastic, with eps_i having a variance that depends linearly on X_i), with

sqrt(Y_i) = beta_0 + sqrt(X_i) beta_1 + eps_i

which should be okay... and in both cases we expect beta_0 = 0.
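As an illustration (with simulated data, not the lab data; the mean rate 5 per mile is our own choice), the transform can be compared to the naive fit like this. poissrnd requires the Statistics Toolbox.

X = (1:50)';                          % miles of road
Y = poissrnd(5*X);                    % simulated pothole counts; Var(Y_i) = 5*X_i grows with X_i
% Naive fit: the residual spread in r1 grows with X (heteroskedastic).
[b1, bint1, r1] = regress(Y, [ones(50,1) X]);
% Square-root transform: Var(sqrt(Y)) is roughly constant (about 1/4 for Poisson),
% and sqrt(Y) is approximately linear in sqrt(X).
[b2, bint2, r2] = regress(sqrt(Y), [ones(50,1) sqrt(X)]);
% In both fits the estimated intercept b(1) should be near 0.

Plotting r1 and r2 against X shows the funnel shape in the naive residuals disappearing after the transform.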
Plot the residuals against:
- fitted values
- included variables
- omitted variables
looking for patterns in these plots as discussed in lab and in the book.
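One way to sketch these residual plots for the complete model, reusing the variables defined earlier (rc holds the residuals from regress(Y,Xc)):

Yhat = Xc*bc;                        % fitted values
subplot(2,2,1); plot(Yhat, rc, 'o'); xlabel('fitted values'); ylabel('residual');
subplot(2,2,2); plot(T, rc, 'o');    xlabel('Tar');           ylabel('residual');
subplot(2,2,3); plot(N, rc, 'o');    xlabel('Nicotine');      ylabel('residual');
subplot(2,2,4); plot(W, rc, 'o');    xlabel('Weight');        ylabel('residual');
% A structureless horizontal band suggests the assumptions are met;
% curvature or a funnel shape suggests a transformation may be needed.

To check an omitted variable, plot rc against that variable in the same way.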