In this lab, we will use MATLAB to do the following:
Again use the data from EXAMPLE 12.13 on page 677 in the text. The dependent variable is the carbon monoxide (CM) content of different cigarette brands. We want to investigate the relationship between CM and three predictor (independent) variables: Tar (T), Nicotine (N), and Weight (W). (See page 677 for more details.)
Read in the data and call the first column T, the second N, third W, and the fourth column CM.
smoke = load('lab9.dat');
T = smoke(:,1); N = smoke(:,2); W = smoke(:,3);
X = [ones(length(smoke),1) T N W];   % X needs to be a matrix with a leading column of ones
CM = smoke(:,4); Y = CM;
g = 1; k = 3; n = length(Y);
Let's investigate the items above:
Xc = X;               % n x (k+1) design matrix for the "complete" model
Xr = Xc(:,1:(g+1));   % first g+1 columns: the "reduced" model
[br, bintr, rr] = regress(Y,Xr);
[bc, bintc, rc] = regress(Y,Xc);
SSEr = rr'*rr; dfr = n - (g+1);
SSEc = rc'*rc; dfc = n - (k+1);
F = ((SSEr - SSEc)/(k-g)) / (SSEc/(n-(k+1)));
Pvalue = 1 - fcdf(F, k-g, n-(k+1));

What hypothesis are we testing here? What can we conclude from the results of the test?
Sometimes variables need to be transformed to ensure that the underlying assumptions are met. For example, if we had a Poisson model and the variance was nonconstant, we could take square roots to transform it into a conforming dataset. Specifically, we replace the naive model

Y_i = beta_0 + X_i beta_1 + eps_i

where, e.g., Y_i is the Poisson number of potholes in X_i miles of road (which should be heteroskedastic, with eps_i having a variance that depends linearly on X_i), with

sqrt(Y_i) = beta_0 + sqrt(X_i) beta_1 + eps_i

which should be okay... and in both cases we expect beta_0 = 0.
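As an illustration (with simulated data, not the lab data; the mean rate 5 per mile is our own choice), the transform can be compared to the naive fit like this. poissrnd requires the Statistics Toolbox.

X = (1:50)';                          % miles of road
Y = poissrnd(5*X);                    % simulated pothole counts; Var(Y_i) = 5*X_i grows with X_i
% Naive fit: the residual spread in r1 grows with X (heteroskedastic).
[b1, bint1, r1] = regress(Y, [ones(50,1) X]);
% Square-root transform: Var(sqrt(Y)) is roughly constant (about 1/4 for Poisson),
% and sqrt(Y) is approximately linear in sqrt(X).
[b2, bint2, r2] = regress(sqrt(Y), [ones(50,1) sqrt(X)]);
% In both fits the estimated intercept b(1) should be near 0.

Plotting r1 and r2 against X shows the funnel shape in the naive residuals disappearing after the transform.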
Plot the residuals against:
- fitted values
- included variables
- omitted variables
looking for patterns in these plots as discussed in lab and in the book.
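One way to sketch these residual plots for the complete model, reusing the variables defined earlier (rc holds the residuals from regress(Y,Xc)):

Yhat = Xc*bc;                        % fitted values
subplot(2,2,1); plot(Yhat, rc, 'o'); xlabel('fitted values'); ylabel('residual');
subplot(2,2,2); plot(T, rc, 'o');    xlabel('Tar');           ylabel('residual');
subplot(2,2,3); plot(N, rc, 'o');    xlabel('Nicotine');      ylabel('residual');
subplot(2,2,4); plot(W, rc, 'o');    xlabel('Weight');        ylabel('residual');
% A structureless horizontal band suggests the assumptions are met;
% curvature or a funnel shape suggests a transformation may be needed.

To check an omitted variable, plot rc against that variable in the same way.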