STA 242/ENV 255: April 8, 1998

Nonlinear Regression

Assignment: Due April 14

  1. Problem 16 in Chapter 5 plus additional questions listed below.
Suggested Reading:

Read in the dataset for the problem, t.180.

The variables are "case", "initial" and "eaten". We will explore linear regression and nonlinear regression to get models to fit the data.


Linear Regression

  1. Fit a linear model by regressing Eaten on Initial. What problems for linear regression do you encounter?
  2. Find a linear regression model with transformed variables (Y or the X's: polynomial terms in X, sqrt(X), log(X), etc.) so that all assumptions are met (or come as close as possible to being met). What problems remain?
  3. Calculate the squared correlation between Eaten (Y) and the fitted values transformed back to the original scale, and compare it to the squared correlation from the linear regression in 1. Remember that if you transform Y, you cannot compare R^2 for models on different scales, so you must transform back to the original units first. If you only transformed the X's, then you can compare the R^2's directly. You can get the correlation between Eaten and the fitted values using Analyze:Multivariate (Y's).
  4. To see the curvilinear fit on the X-Y scatterplot, do the following. First, save your dataset with the variables you just made. Then submit the following code in the program editor. (Be sure to change PRED to the name of your variable with the fitted observations.)
    
    symbol1 c=black i=none value=star  ;
    symbol2 c=black i=join value=none  ;
    axis1 label=('Initial Egg Density');
    axis2 label=('Eggs Eaten');
    proc gplot data=sasuser.rwg180;
    plot  eaten*initial PRED*initial /overlay haxis=axis1 vaxis=axis2;
    run;
    
The first symbol statement says to plot points with the symbol "star" and no connecting line; it is used for the first plot request, eaten*initial. The second symbol statement draws a joined line (curve) with no point markers and is used for the second plot request, PRED*initial, which plots the fitted values versus Initial. The /overlay option then tells SAS to overlay the fitted curve on the scatterplot. The haxis and vaxis options assign axis1 and axis2, which set the labels on the horizontal and vertical axes.
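For step 3, the correlation can also be computed in the program editor rather than through the menus. The following is a minimal sketch, assuming (purely for illustration) that Y was transformed with a square root and that PRED holds the fitted values on that scale. The name pred_orig is hypothetical, and the back-transformation should be replaced with the inverse of whatever transformation you actually used.

```sas
/* Assumption: PRED holds fitted values from a model for sqrt(eaten).
   Replace pred**2 with the inverse of your own transformation. */
data sasuser.rwg180;
    set sasuser.rwg180;
    pred_orig = pred**2;   /* fitted values back on the original scale */
run;

/* Correlation between eaten and the back-transformed fitted values;
   square the reported correlation to compare with R^2 from step 1. */
proc corr data=sasuser.rwg180;
    var eaten pred_orig;
run;
```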

Nonlinear Regression

As suggested in Problem 16, fit a nonlinear negative exponential model (eq. [5.18]) to the data.

Starting Values

We will need some initial guesses for the parameters for the nonlinear regression procedure. The parameter alpha is the upper asymptote: the theoretical limiting value of Eaten, which we can guesstimate from the scatterplot of Initial versus Eaten. The maximum value of Eaten may be a good guess, or something larger.
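One way to read off the largest observed value of Eaten, as a floor for the guess of alpha, is a short sketch like the following. It assumes the dataset is stored as sasuser.rwg180, as in the plotting code above.

```sas
/* Report the largest observed value of eaten. The guess for alpha
   should be at least this large, or somewhat larger. */
proc means data=sasuser.rwg180 max;
    var eaten;
run;
```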

The parameter beta is probably trickier to guess. Here is one approach. The key idea is to take the mean function of the nonlinear regression and transform the data so that the mean is a linear function in beta, similar to what we did in class (remember that logs get rid of exps). Take your guess of alpha, a0; it should be larger than all of the Eaten values, otherwise this won't work. Divide both sides of equation 5.18 by a0 and subtract both sides from 1. Ignoring the error term, we have:

1 - Eaten/a0 = exp(-beta*Initial)
If we take the natural logs of both sides we get:
log(1 - Eaten/a0) = -beta*Initial
which suggests that a good estimate of beta can be obtained from a linear regression of log(1 - Eaten/a0) on "Initial". To do this, you must add the transformed Y variable to the data set. Although you could do this with the INSIGHT transformations, it is easier to just Submit the following lines in the PROGRAM EDITOR window:

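	/* Add the transformed response; replace a0 with your guess for alpha. */
	data sasuser.rwg180;
	set sasuser.rwg180;
	y = log(1 - eaten/a0);
	run;
	/* Regress the transformed response on initial. */
	proc reg data=sasuser.rwg180;
	model y = initial;
	run;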
Remember that the dataset can't be open when you are Submitting these commands! Substitute your number for a0!

A starting value for beta is -1*(slope coefficient) from the above regression. (Look in the OUTPUT window.) The starting value for alpha equals a0.
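Because the linearized equation has no intercept, a regression through the origin is a natural variant of the fit above. The sketch below assumes the variable y has already been created as described, and uses the NOINT option of the MODEL statement. It is offered as an alternative to compare against, not as the required method.

```sas
/* Regression through the origin: log(1 - eaten/a0) = -beta*initial
   has no intercept term, so the slope alone estimates -beta. */
proc reg data=sasuser.rwg180;
    model y = initial / noint;
run;
```

Either slope, multiplied by -1, can serve as a starting value for beta.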

  • Make a new scatterplot of Initial and Eaten, and add the curve using your starting values (substitute your numbers for a0 and b0 below).
    
    	data sasuser.rwg180;
    	set sasuser.rwg180;
    	yhat = a0*(1 - exp(-b0*initial));
    	run;
    	symbol1 c=black i=none value=star  ;
    	symbol2 c=black i=join value=none  ;
    	axis1 label=('Initial Egg Density');
    	axis2 label=('Eggs Eaten');
    	proc gplot data=sasuser.rwg180;
    	plot  eaten*initial yhat*initial /overlay haxis=axis1 vaxis=axis2;
    	run;
    
    How well does this fit the data? We can use "trial and error" to get a better fit -- or use nonlinear least squares to find better estimates that minimize the residual sum of squares.

    Fitting the Nonlinear Regression

    Now that you have starting values, you can find the nonlinear least squares estimates. Once again, you must Submit some code in the PROGRAM EDITOR window, substituting your starting values for a0 and b0 below:
    
    	proc nlin data=sasuser.rwg180;
    	parms a= a0    b= b0    ;
    	model eaten = a*(1 - exp(-b*initial));
    	run;
    
    The output will be in the SAS:OUTPUT window, which reports the parameter estimates. You will want to make a plot of the residuals of this regression. The easiest way to do this is to make another new variable in your data set (isn't SAS programming fun?). Make sure that the dataset is closed, and Submit the following lines in the PROGRAM EDITOR window, replacing a and b with their parameter estimates:
    
    	data sasuser.rwg180;
    	set sasuser.rwg180;
    	fitted = a*(1-exp(-b*initial));
    	residual = eaten - fitted;
    	run;
    
    Repeat the earlier plotting commands to plot Eaten versus Initial and add the fitted curve.
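    As an alternative to typing the estimates in by hand, PROC NLIN can write the fitted values and residuals to a dataset through its OUTPUT statement. The following is a sketch only: the dataset name nlinfit and the variable names nl_fit and nl_resid are hypothetical, and a0 and b0 still stand for your own starting values.

    ```sas
    /* a0 and b0 are placeholders: substitute your starting values. */
    proc nlin data=sasuser.rwg180;
        parms a=a0 b=b0;
        model eaten = a*(1 - exp(-b*initial));
        /* Save fitted values and residuals (hypothetical names). */
        output out=nlinfit predicted=nl_fit residual=nl_resid;
    run;

    /* Residuals versus fitted values, with a reference line at zero. */
    symbol1 c=black i=none value=star;
    proc gplot data=nlinfit;
        plot nl_resid*nl_fit / vref=0;
    run;
    ```

    Writing to a separate dataset leaves sasuser.rwg180 unchanged, so the hand-computed fitted and residual variables can be checked against these.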

  • Construct a 95% confidence interval for alpha, and interpret what it means in the context of the problem.

  • Calculate the correlation between Eaten and the fitted values, using Analyze:Multivariate (Y's). How does this compare to your "best" linear regression model?

    Look at the residual plots (Residual vs Fitted) and comment. Is there any evidence that the model from the nonlinear regression is inadequate (lack of fit)? All assumptions for the linear model must also hold for the nonlinear regression model -- do you see any problems? (Explain.)
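    For the confidence interval for alpha, the usual large-sample (approximate) interval has the form below, where the estimate and its standard error are read from the PROC NLIN output, n is the number of observations, and 2 is the number of parameters in the model. This is a sketch of the standard approach, not necessarily the exact form intended by the text:

    ```latex
    \hat{\alpha} \;\pm\; t_{n-2,\,0.975}\,\mathrm{SE}(\hat{\alpha})
    ```

    The PROC NLIN parameter table also lists approximate (asymptotic) 95% confidence limits, which can be used as a check.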

    Comparison

    Briefly summarize your two models and the interpretation based on them. Discuss the advantages and disadvantages of the two approaches (e.g., what are the meanings of the parameters, which model seems more appropriate for extrapolation to high or low values of Initial, ...). How well does this model fit compared to the model we discussed in class on Tuesday?