Lab 2 Program and Questions (Week of 1/25/99)

Step 0: Download SAS/Insight program with last week's data

We will continue working with the real estate data this week. If you erased last week's lab program, here is how to get it again. Click here and the program will appear in your browser window. In Netscape, choose "File>Save As...", select "Format for Saved Document: Text", then click "OK". The program is now saved in your account (in your home directory, by default) under the name "lab1.sas". Return to this page by choosing "Go>Back" from the Netscape menu bar.

Step 1: Start the SAS/Insight program lab1.sas

To get started, type "sas lab1 &" in one of the terminals open on your screen. You should be in the directory where you saved the file "lab1.sas". A spreadsheet will appear. It will have 9 columns, one for each variable in the dataset, and 339 rows, one for each observation. Observations are numbered i = 1 to i = 339 in the leftmost column of row labels.

Step 2: Questions

1) Generate a histogram of 'Sqr Feet', the amount of floor space in the home, measured in square feet. Describe the distribution of 'Sqr Feet'. Where is the modal category of the histogram located? The degree of "smoothing" accomplished by the histogram is a function of the width of the histogram's bars: a histogram with one bar tells us next to nothing (very smooth), while a histogram with very narrow bars picks out each data point (very rough) and tells us, perhaps, too much. Try altering the bar width (click the right mouse button over the histogram's surface, then choose Ticks...) by setting the "Tick Increment" to each of the values 10, 300, 600, 1800, and 4800. What happens to the modal category? Which value best conveys the shape of the distribution to you?
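
For anyone who wants to see the bar-width effect outside SAS/Insight, here is a minimal Python sketch. It is not part of the lab program: the 18 floor areas are made up for illustration, the names `histogram` and `modal_bin` are hypothetical helpers, and the bins are assumed to start at multiples of the width, which may differ from where SAS/Insight places its ticks.

```python
from collections import Counter

# Hypothetical floor areas in square feet (NOT the lab's real estate data).
sqr_feet = [850, 920, 1100, 1150, 1200, 1250, 1300, 1320, 1400, 1450,
            1500, 1580, 1700, 1850, 2100, 2400, 2900, 4700]

def histogram(values, width):
    """Count values per bin; each bin is [k*width, (k+1)*width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

def modal_bin(values, width):
    """Return (left edge, count) of the fullest bin; ties go to the leftmost."""
    counts = histogram(values, width)
    left = max(counts, key=lambda k: (counts[k], -k))
    return left, counts[left]

# Same increments as the "Tick Increment" values suggested in the question.
for width in (10, 300, 600, 1800, 4800):
    left, n = modal_bin(sqr_feet, width)
    bars = len(histogram(sqr_feet, width))
    print(f"width {width:>4}: {bars:>2} non-empty bars, "
          f"modal bar [{left}, {left + width}) holds {n} homes")
```

With very narrow bars nearly every home sits in its own bar, so the "mode" is arbitrary; with very wide bars the modal bar swallows almost everything.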

2) Descriptive statistics for 'Sqr Feet' should have been calculated and displayed with the histogram. What are the mean, median, and mode (from question 1) of 'Sqr Feet'? How are they related? Comment on why this ordering holds. Use the calculated quantiles (we referred to them as percentiles in class) to identify three intervals that each contain 1/2 of the data points. Which interval is shortest, which is longest, and why?
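
The sketch below shows one way to read "three intervals that contain 1/2 of the data": minimum to median, first quartile to third quartile, and median to maximum. It is an illustration only. The data are made up, and Python's `statistics.quantiles` with `method="inclusive"` is an assumed quantile definition that need not match the one SAS/Insight uses, so the numbers will not reproduce the lab's output.

```python
import statistics

# Hypothetical floor areas in square feet (NOT the lab's real estate data).
sqr_feet = [850, 920, 1100, 1150, 1200, 1250, 1300, 1320, 1400, 1450,
            1500, 1580, 1700, 1850, 2100, 2400, 2900, 4700]

mean = statistics.mean(sqr_feet)
median = statistics.median(sqr_feet)
# Quartiles by linear interpolation; q2 is the median again.
q1, q2, q3 = statistics.quantiles(sqr_feet, n=4, method="inclusive")

# Three intervals that each cover about half of the observations.
intervals = {
    "min to median": (min(sqr_feet), median),
    "Q1 to Q3": (q1, q3),
    "median to max": (median, max(sqr_feet)),
}

print(f"mean = {mean:.1f}, median = {median}")
for name, (low, high) in intervals.items():
    print(f"{name:>14}: [{low}, {high}], length {high - low}")
```

In a right-skewed sample like this made-up one, the interval that reaches into the long upper tail is the longest, and the mean sits above the median.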

3) Look at the box plot. "Outliers" are indicated. Click on them: what is displayed? What happens to the histogram? (This is called brushing.) Use this feature to highlight the first quartile, the middle 50%, and the last quartile of the data on the histogram. Let's look at the effect the largest outlier has on our summary statistics. What is its observation number? Do you think this value is a typo? Why or why not? (Look at the entire observation using Edit > Observations > Examine....) We can exclude this point from calculations. First, make note of the mean and median. Then, in the window displaying the plots and summary statistics, use the Edit > Observations > Find... command to tag the observation, and use Edit > Observations > Exclude in Calculations to exclude it. Which has changed more, the mean or the median? Which measure of location seems more sensitive to large values?
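
The exclude-and-compare step can be mimicked with a short Python sketch. This is only an illustration on made-up numbers, not the lab's data, and `location_shift` is a hypothetical helper name; it drops the single largest value and reports how far the mean and the median move.

```python
import statistics

# Hypothetical floor areas in square feet (NOT the lab's real estate data).
sqr_feet = [850, 920, 1100, 1150, 1200, 1250, 1300, 1320, 1400, 1450,
            1500, 1580, 1700, 1850, 2100, 2400, 2900, 4700]

def location_shift(values):
    """Return (mean shift, median shift) caused by dropping the largest value."""
    kept = sorted(values)[:-1]  # everything except the largest observation
    mean_shift = statistics.mean(kept) - statistics.mean(values)
    median_shift = statistics.median(kept) - statistics.median(values)
    return mean_shift, median_shift

mean_shift, median_shift = location_shift(sqr_feet)
print(f"mean moves by {mean_shift:.2f}, median moves by {median_shift}")
```

The mean uses the size of every value, so one very large home pulls it noticeably; the median depends only on the ordering near the middle, so it moves much less.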

4) Bring up a summary of the distribution of 'Sale Price'. Use the brushing facility to determine if any homes in the sample were simultaneously in the upper 50% of homes in terms of square footage, but in the lower 50% in sale price. Are any homes in the upper quartile in terms of size, but the lowest quartile in terms of price? How about the reverse (upper quartile of price, lowest in size)?
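
Brushing two linked plots amounts to selecting observations by a condition on one variable and checking where they fall on the other. A rough Python equivalent, on made-up (square feet, sale price) pairs rather than the lab's data, might look like this; `brush` is a hypothetical helper, and the quartile cutoffs again assume Python's "inclusive" quantile method.

```python
import statistics

# Hypothetical (square feet, sale price) pairs, NOT the lab's data.
homes = [(850, 62000), (1100, 98000), (1250, 81000), (1400, 120000),
         (1500, 90000), (1700, 135000), (2100, 88000), (2900, 210000)]

sizes = [size for size, _ in homes]
prices = [price for _, price in homes]

def brush(homes, size_above, price_below):
    """Return 1-based observation numbers with size > size_above and price < price_below."""
    return [i for i, (size, price) in enumerate(homes, start=1)
            if size > size_above and price < price_below]

# Upper half in size, lower half in price.
big_but_cheap = brush(homes, statistics.median(sizes), statistics.median(prices))

# Upper quartile in size, lowest quartile in price.
size_q3 = statistics.quantiles(sizes, n=4, method="inclusive")[2]
price_q1 = statistics.quantiles(prices, n=4, method="inclusive")[0]
very_big_very_cheap = brush(homes, size_q3, price_q1)

print("upper half size, lower half price:", big_but_cheap)
print("upper quartile size, lowest quartile price:", very_big_very_cheap)
```

Swapping the roles of the two variables in `brush` gives the reverse comparison (high price, small size).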

Step 3: Stop the SAS/Insight program

Click on "File>End" on the SAS/Insight menu bar to quit the program.
iversen@stat.duke.edu
last updated 27 January 1999