STA 242/ENV 255

LAB1: Introduction to UNIX workstations, SAS, and OLS

WARNING: THIS ASSIGNMENT IS DUE BY 5PM TUESDAY JANUARY 27, 1998. Late assignments will receive 0 credit.

Lab Goals:

With successful completion of this assignment, you should be able to
  1. Create a SAS data set
  2. Carry out a simple linear regression and interpret results and residual plots

ASSIGNMENT: Problems 1-6 in Chapter 2, Regression with Graphics (RWG)


References

We will outline the basic steps for completing this assignment; for more information on the procedures, please read the SAS/INSIGHT User's Guide:
  1. Chapter 1: Starting SAS/INSIGHT
  2. Chapter 2: Entering Data
  3. Chapter 4: Exploring Data in One Dimension
  4. Chapter 12: Examining Distributions
  5. Chapter 13: Fitting Curves

Topics

This lab session will go over the following topics:
  1. Starting SAS from UNIX
  2. Reading in a SAS data set
  3. Starting SAS/INSIGHT
  4. Fitting Regression Models
  5. Confidence Intervals
  6. Hints for problem 3
  7. Hints for problem 4
  8. Hints for problem 5
  9. Hints for problem 6: Excluding Points
  10. Printing your results

Starting SAS

To start up SAS enter

	sas &
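from the UNIX window. The & runs SAS in the background, so the UNIX window stays free for other commands. Three windows will appear: the Program Editor, the Output window, and the Log window. The Program Editor is where you type SAS statements for non-interactive analyses. The Output window shows their printed results, and the Log window shows notes and error messages. We will be using SAS/INSIGHT for interactive and exploratory statistics.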


Creating a SAS Data Set via the Program Editor

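We will describe how to create a permanent SAS data set using the SAS Program Editor. The data set is permanent because it is stored in the SASUSER library, which, unlike the WORK library, is not deleted when your SAS session ends. Chapter 2 of the SAS/INSIGHT User's Guide describes an alternative: entering the data directly into the SAS/INSIGHT spreadsheet.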

To enter a data set using the Program Editor, type the statements in the following sample into the PROGRAM EDITOR window. Don't forget the semicolons: every SAS statement ends with one! The data set name (rwg59) and the variable names (stream, summerph, fish) are specific to this example, and you can change them as you like.

data sasuser.rwg59;
input stream $ 1-13 summerph fish ;
datalines;
Moss                6.3            6
Orcutt              6.3            9
Ellinwood           6.3            6
Jacks               6.2            3
Riceville           6.2            5
Lyons               6.1            3
Osgood              5.8            5
Whetstone           5.7            4
Upper Keyup         5.7            1
West                5.7            7
Boyce               5.6            4
Mormon Hollow       5.5            4
Lawrence            5.4            5
Wilder              4.7            0
Templeton           4.5            0
;
/* the lone semicolon above tells SAS that this is the end of the data */
run;
The $ 1-13 after stream in the INPUT statement tells SAS three things. The first variable is called stream, it is a character (categorical or nominal) variable, and it occupies columns 1-13 of each data line. Specifying columns 1-13 is necessary in this problem because some stream names contain spaces, e.g. observation 9, "Upper Keyup", and observation 12, "Mormon Hollow". Without the column range, SAS would treat the space as the end of the value. For variables whose values contain no spaces, you can omit the column numbers.
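If you prefer not to count columns, SAS list input also accepts the & format modifier. It lets a character value contain single embedded blanks and ends the value at the first run of two or more blanks. The following is a minimal sketch. It assumes that at least two spaces separate each stream name from its pH value, as in the layout above. The data set name work.demo is hypothetical, and only three of the rows are repeated.

```sas
/* List input with the & modifier: stream may contain single blanks,  */
/* and its value ends at the first run of two or more blanks.         */
/* The $13. informat gives stream a length of 13 characters.          */
data work.demo;
input stream & $13. summerph fish;
datalines;
Moss                6.3            6
Upper Keyup         5.7            1
Mormon Hollow       5.5            4
;
run;
```

Column input, as in the lab's INPUT statement, remains the safer choice if a name could itself contain two consecutive blanks.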

Note that all data sets from RWG are available from the Data Sets link on the course web page. You can download these to your home directory. File names are of the form "rwg.##", where ## is the page number in the text. If you have saved the data as an ordinary text file, say "rwg.59", then the commands to create the SAS data set for the assignment are:

data sasuser.rwg59;
infile 'rwg.59';   /* path is relative to the directory in which you started SAS */
input stream $ 1-13 summerph fish;
run;

To get SAS to process these commands, go to the Locals pull-down menu and choose Submit. You should see a note that the lines were submitted. The Log window should also contain "NOTE: The data set SASUSER.RWG59 has 15 observations and 3 variables." If you see ERROR messages instead, there is probably a typo in the commands, often a missing semicolon. If you were successful, you have created a SAS data set called rwg59, stored in your SAS library called SASUSER. The data set contains three variables, stream, summerph and fish, which correspond to the three columns of the data.
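To double-check the new data set from the Program Editor, you can list its variables and print its observations. This is a minimal sketch using the standard PROC CONTENTS and PROC PRINT procedures.

```sas
/* Describe the data set: number of observations, variable names and types */
proc contents data=sasuser.rwg59;
run;

/* Print all observations to the Output window */
proc print data=sasuser.rwg59;
run;
```

stream should be listed as a character variable, and summerph and fish as numeric variables.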


Starting SAS/INSIGHT

See Chapter 2, page 24, of the SAS/INSIGHT User's Guide.

Now, to start up INSIGHT, use the mouse to make the following menu selections from the PROGRAM EDITOR window:

  1. Globals
  2. Analyze ->
  3. Interactive data analysis
As a shortcut, we will write this menu selection as "Globals:Analyze:Interactive data analysis". At this point the SAS/INSIGHT window opens and you can choose your data set. Choose the library SASUSER, then click on RWG59 (or on whatever name you gave the data set you created in the PROGRAM EDITOR window). The data window will now appear in spreadsheet format. See Chapter 3 of the SAS/INSIGHT User's Guide for details on how to manipulate this window.
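SAS/INSIGHT can also be opened by submitting the INSIGHT procedure from the Program Editor, as an alternative to the menus. This is a sketch. It assumes your SAS installation includes SAS/INSIGHT, which the menu route requires anyway.

```sas
/* Open SAS/INSIGHT directly on the lab data set */
proc insight data=sasuser.rwg59;
run;
```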

Fitting Regression Models (Ch 13 INSIGHT)

The homework problems require that you fit a regression line to the data. To do that, choose Fit (Y X) from the Analyze menu.

Select the variable FISH, then press the Y button to specify that FISH is the dependent (response) variable. Next, select SUMMERPH and press the X button to make it the explanatory variable. Finally, click on STREAM and press the Label button, so that the stream names are used to label points in graphs.

A new window will appear with a scatterplot of the data and regression line, summary statistics of the regression, and residual plots. Refer to this output to answer Problems 1 and 2.
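You can check the INSIGHT output against a batch fit by running the same least-squares regression with PROC REG from the Program Editor. The parameter estimates, Root MSE and ANOVA table should match the Fit window. This is a minimal sketch.

```sas
/* Simple linear regression of fish on summer pH (same model as Fit (Y X)) */
proc reg data=sasuser.rwg59;
model fish = summerph;
run;
quit;   /* PROC REG is interactive; QUIT ends it */
```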


Confidence Intervals

INSIGHT will give you confidence intervals for the parameters. To get them, go to the Tables menu of the FIT window (the window with the plot), select C.I. (Wald) for Parameters ->, and choose the desired confidence level. A table containing the intervals will appear in the FIT window. Verify the results by hand.
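Each Wald interval is the parameter estimate plus or minus t times its standard error, where t comes from the t distribution on the error degrees of freedom. PROC REG prints the same limits with the CLB option on the MODEL statement. This is a sketch; the ALPHA=0.05 setting (95% limits) is an assumed choice, so set it to match the level you pick in INSIGHT.

```sas
/* ALPHA=0.05 gives 95% limits; use e.g. 0.10 for 90% limits */
proc reg data=sasuser.rwg59 alpha=0.05;
/* CLB adds lower and upper confidence limits for each parameter */
model fish = summerph / clb;
run;
quit;
```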

You can add the CI curves shown in fig 2.7 of RWG to the regression line by:


Hints for Problem 3

For parts (b) and (c), refer to pages 47-48 of RWG, specifically equations [2.28], [2.29], and [2.30]. The numbers you need are available in INSIGHT. Use the ANALYZE:DISTRIBUTION menu to get the summary statistics of SUMMERPH.

TSSx
This is the corrected sum of squares (CSS): multiply the variance of X by (n-1).
X bar
This is the mean, found in the distribution window.
Xi
This is the value of X at which you are making predictions: 6.
se
This is the Root MSE, found in the FIT window.
t
This value comes from the t table on page 350 in the back of RWG. It is NOT the t-statistic from the output! Take the degrees of freedom from the ERROR line of the ANOVA table (n - 2 = 13 for these 15 streams).
Y hat
This is the predicted value of Y when X=6. Use your regression equation to calculate the value.
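The quantities above combine into the standard simple-regression intervals at X = Xi. One interval is for the mean response, and a wider one is for a single new observation. These should agree with RWG's [2.28]-[2.30] up to notation, but check them against the text:

```latex
\mathrm{TSS}_x = (n-1)\,s_X^2 = \sum_{j=1}^{n} (X_j - \bar{X})^2

\text{mean response at } X_i:\quad \hat{Y} \pm t\, s_e \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\mathrm{TSS}_x}}

\text{new observation at } X_i:\quad \hat{Y} \pm t\, s_e \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\mathrm{TSS}_x}}
```

To check your hand calculations, one common SAS approach is to add an observation with summerph = 6 and a missing fish value. PROC REG leaves that row out of the fit but still computes its predicted value and limits. The data set names work.new, work.both and work.pred, the label 'X = 6', and the output variable names below are hypothetical.

```sas
/* N, mean, variance and CSS (= TSSx) of summer pH */
proc means data=sasuser.rwg59 n mean var css;
var summerph;
run;

/* One extra row at X = 6 with a missing response */
data work.new;
length stream $ 13;
stream = 'X = 6';
summerph = 6;
fish = .;
run;

data work.both;
set sasuser.rwg59 work.new;
run;

/* 95% limits by default.                               */
/* STDP/LCLM/UCLM: standard error and limits for the mean;  */
/* STDI/LCL/UCL: standard error and limits for prediction  */
proc reg data=work.both;
model fish = summerph;
output out=work.pred p=yhat stdp=se_mean stdi=se_pred
       lclm=lo_mean uclm=hi_mean lcl=lo_pred ucl=hi_pred;
run;
quit;

/* Show only the added row */
proc print data=work.pred;
where fish = .;
run;
```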

Hints for Problem 4

In Problem 4, you are asked for the predicted values and the residuals. SAS adds these variables to your spreadsheet when you fit the regression with Fit (Y X). Your data window should now have two new columns: R_FISH, the residuals (e), and P_FISH, the predicted values (yhat).
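You can also list the streams with the largest absolute residuals in batch form: save the residuals with PROC REG, then sort on their absolute values. This is a sketch. The names work.resid and abs_r are hypothetical, and p_fish and r_fish are chosen to match the INSIGHT columns.

```sas
/* Save predicted values and residuals without printing the fit */
proc reg data=sasuser.rwg59 noprint;
model fish = summerph;
output out=work.resid p=p_fish r=r_fish;
run;
quit;

/* Absolute residuals */
data work.resid;
set work.resid;
abs_r = abs(r_fish);
run;

/* Largest absolute residuals first */
proc sort data=work.resid;
by descending abs_r;
run;

/* Show the three largest */
proc print data=work.resid(obs=3);
var stream summerph fish p_fish r_fish;
run;
```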

One way to find the points with the largest absolute residuals is to


Hints for Problem 5

To get the summary statistics and plots for Problem 5, again use the ANALYZE:DISTRIBUTION menu, this time selecting R_FISH as the Y variable (see Chapter 12 of the SAS/INSIGHT User's Guide). Use the graphs in the distribution window, including the box plot, together with the summary statistics to answer Problem 5.
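PROC UNIVARIATE gives a batch counterpart to the distribution window for the residuals. Its PLOT option adds line-printer stem-and-leaf, box and normal probability plots, and its NORMAL option adds tests of normality. This is a sketch; the data set name work.resid is hypothetical.

```sas
/* Refit to save the residuals in a work data set */
proc reg data=sasuser.rwg59 noprint;
model fish = summerph;
output out=work.resid r=r_fish;
run;
quit;

/* Summary statistics, plots and normality tests for the residuals */
proc univariate data=work.resid plot normal;
var r_fish;
run;
```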

Problem 6: Excluding Points from the Regression

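Problem 6 asks you to look at the results when you exclude the points at the lower left of the scatterplot. This is easily done in INSIGHT. Select those points in the plot, then use the Edit:Observations menu to exclude them from calculations, and INSIGHT updates the fit.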
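A batch version of the same comparison refits the model with a WHERE statement. This sketch assumes that the "lower left" points are Wilder and Templeton, the two streams with the lowest summer pH and no fish. Check that assumption against your own scatterplot before relying on it.

```sas
/* Refit without the two lowest-pH, zero-fish streams.              */
/* ASSUMPTION: these are the lower-left points meant in Problem 6.  */
proc reg data=sasuser.rwg59;
where stream not in ('Wilder', 'Templeton');
model fish = summerph;
run;
quit;
```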

Printing and Saving a Graph

PRELIMINARIES: in each SAS session you must do the following:
  1. In the PROGRAM EDITOR window, go to the View menu and select Preferences..., then Display Manager, and finally click on "Use Host Printing". Click on OK.
  2. Next, in the PROGRAM EDITOR window, select the File menu and click on "Print Setup...". Click on "Properties..." in the popup window and then on OK. The printer icon should now be cyan and blue. Click on OK. Once you have done these two steps at the beginning of the session, your output will be saved to a file called "prn.ps" by default.
a) In the SAS window you want to print, choose Edit-Windows-Select All; this will highlight the entire window. If you want to save only a single plot or figure, just click on its edge so that an outline is highlighted.

b) Now choose File:Print. Click on the box "Print File to Disk", and then on PRINT. A window will pop up. In the box under "Enter directory name, filename or filter", specify a filename for the output, such as HW1reg.ps. Click on "OK", and select "Fill Page" in the next window that appears. Note: if you do not change the filename, you will write over the old file.

c) Go over to the UNIX window and type

        ls
That lists the files in your directory. One of them should be HW1reg.ps. To send it to the printer, type
        print HW1reg.ps
at the UNIX prompt. To make sure that it is OK before sending it to the printer, you may preview it with the command
        ghostview HW1reg.ps
As there are quotas on the number of pages you may print for free, make sure your plot is correct before printing it. Since you do not need to turn in every output file, print only the ones that are important.


For printing the numerical summaries, you may find it useful to save the output as a text file. You can then transfer the file to a PC and edit and format it with a word processor. In INSIGHT, select the menus FILE-SAVE-Tables; the output is then sent to the SAS Output window. From the Output window, you may either print it directly or save it as a text file.
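For procedures submitted from the Program Editor (for example the PROC REG listings), PROC PRINTTO can route the Output window text straight to a file. This is a sketch. The file name hw1reg.lst is hypothetical, and the NEW option replaces any existing file of that name.

```sas
/* Send subsequent procedure output to a text file instead of the Output window */
proc printto print='hw1reg.lst' new;
run;

proc reg data=sasuser.rwg59;
model fish = summerph;
run;
quit;

/* Restore output to the Output window */
proc printto;
run;
```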