Lab 7
Weeks of 3/9/98 and 3/23/98

Download this week's SAS/Insight program

Click here and this week's program will appear in your browser window. Click on "File>Save As..." in Netscape and choose "Format for Saved Document: Text" then click "OK". The program is now saved in your account (in your home directory, by default). The file's name is "lab7.sas". Return to this page by choosing "GO>Back" from the Netscape menu bar. To get started type "sas lab7 &" in one of the terminals open on your screen.

Questions

In this lab exercise we will use a sample of 1980 and 1990 sale prices of homes in Dade County, Florida, to assess the rate at which residential real estate there appreciated from 1980 to 1990. We will compare two simple approaches to measuring appreciation.

The first approach is to analyze two "independent" samples: one of 1980 sales and one of 1990 sales. (Some homes sold in both years and may appear in both samples, so the samples are not strictly independent, but they are probably nearly so.) The data set displayed when you start SAS is called "UNPAIRED". Its second column, threetwo, marks three-bedroom, two-bath homes with a "1" and all other homes with a "0". Column 3, price, gives the sale price in dollars, and column 4, year, gives the year of sale (1980 or 1990).

1) Make box plots of sale price by year (use the group subcommand on the distribution or box plot menu and group by year). Compare the two, and notice that both distributions are strongly skewed to the right. Recall that confidence intervals that use the normal or t-distribution to determine the allowance for error assume that the sample comes from a normal population, which is clearly not the case here. One way to cope with this problem is to transform the data so that they look more nearly normal. The logarithm is a useful transformation for positive data with a long right tail. Create a new variable containing the logarithm of price (use the Edit > Variables menu) and make box plots of the transformed data. Is one distribution noticeably more variable than the other? Are their medians different? Do the distributions appear more nearly normal?
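As an illustration outside SAS, the Python sketch below computes the five numbers a box plot is drawn from and a simple right-tail ratio, for raw and log-transformed prices. The prices are made up (they are not the UNPAIRED data), the helper names are hypothetical, and `statistics.quantiles` may define quartiles slightly differently from SAS/INSIGHT, so its quartiles need not match SAS output exactly.

```python
import math
import statistics

# Hypothetical (year, price in dollars) pairs; NOT the lab's UNPAIRED data set.
sales = [
    (1980, 45000), (1980, 52000), (1980, 58000), (1980, 61000), (1980, 240000),
    (1990, 70000), (1990, 82000), (1990, 88000), (1990, 95000), (1990, 410000),
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot is drawn from."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def tail_ratio(values):
    """(max - Q3) / (Q1 - min): much greater than 1 signals a long right tail."""
    lo, q1, _, q3, hi = five_number_summary(values)
    return (hi - q3) / (q1 - lo)


for year in (1980, 1990):
    prices = [p for y, p in sales if y == year]
    logs = [math.log(p) for p in prices]  # natural log; defined only for p > 0
    print(year, "price tail ratio:", round(tail_ratio(prices), 1))
    print(year, "log(price) tail ratio:", round(tail_ratio(logs), 1))
```

With these made-up prices the tail ratio drops sharply after taking logs, which is the kind of change the transformed box plots should show.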

2) Use the Distribution menu to calculate the group means, variances and sample sizes needed to form a 95% confidence interval for the difference in mean log sale price between 1990 and 1980 (1990 mean - 1980 mean), using formula 8-20 in the book. (There is a direct way to do this calculation in SAS, but it requires a topic we haven't covered yet: analysis of variance.) Do homes seem to be appreciating or depreciating in value on the log scale? That is, does the confidence interval contain zero, and if not, on which side of zero does it lie?
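The question 2 calculation can be sketched in Python from the summary statistics read off the Distribution menu. The sketch assumes formula 8-20 is the large-sample interval (mean90 - mean80) ± z·sqrt(var80/n80 + var90/n90), with unpooled variances and a normal critical value; if the book's formula pools the variances or uses a t critical value, the standard error or the multiplier changes accordingly. The function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def diff_means_ci(mean80, var80, n80, mean90, var90, n90, level=0.95):
    """CI for (1990 mean - 1980 mean) of log price from two independent samples.

    Assumes large samples: unpooled standard error, normal critical value.
    """
    diff = mean90 - mean80
    se = math.sqrt(var80 / n80 + var90 / n90)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return diff - z * se, diff + z * se


# Hypothetical summary statistics for log(price); not the lab's results.
low, high = diff_means_ci(11.0, 0.25, 100, 11.3, 0.30, 100)
print(f"95% CI for difference in mean log price: ({low:.3f}, {high:.3f})")
```

An interval lying entirely above zero, as with these made-up numbers, would point to higher typical log prices in 1990.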

3) In the same way (no need to transform!), calculate a 95% confidence interval for the difference between the proportion of 1990 sales and the proportion of 1980 sales that were three-bedroom, two-bath homes (1990 proportion - 1980 proportion), using formula 8-29. Is there evidence that the fraction of "three-two" homes sold differed between 1990 and 1980? Going back to problem 2, one reason homes might seem more valuable in 1990 than in 1980 is that different types of homes might have been more likely to sell in one year than in the other. How does the confidence interval calculated here help us address this possibility?
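Because threetwo is coded 0/1, its group mean is the sample proportion of three-two homes, which is why no transformation is needed in question 3. The sketch below assumes formula 8-29 is the usual large-sample interval (p90 - p80) ± z·sqrt(p80(1 - p80)/n80 + p90(1 - p90)/n90). Note that the variance SAS reports for a 0/1 column divides by n - 1, so it differs slightly from p(1 - p). The function name and counts are hypothetical.

```python
import math
from statistics import NormalDist


def diff_props_ci(count80, n80, count90, n90, level=0.95):
    """Large-sample CI for (1990 proportion - 1980 proportion) of three-two homes."""
    p80, p90 = count80 / n80, count90 / n90
    se = math.sqrt(p80 * (1 - p80) / n80 + p90 * (1 - p90) / n90)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p90 - p80
    return diff - z * se, diff + z * se


# Hypothetical counts: 40 of 100 sales in 1980 and 50 of 100 in 1990 were three-two.
low, high = diff_props_ci(40, 100, 50, 100)
print(f"95% CI for difference in proportions: ({low:.3f}, {high:.3f})")
```

Here the made-up interval straddles zero, so these counts alone would not show a shift in the mix of homes sold.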

Our second approach to measuring appreciation is to look at a paired sample of homes: those sold in 1980 and then again in 1990. Open the data set "PAIRED". As before, the second column, threetwo, marks three-bedroom, two-bath homes with a "1" and all other homes with a "0". Column 3, price80, gives the 1980 sale price in dollars, and column 4, price90, gives the 1990 sale price in dollars.

4) Create a new variable equal to the logarithm of the 1990 price minus the logarithm of the 1980 price. (Use the Edit > Variables menu: first use "log(Y)" to create columns containing the logarithms of the two sale prices, then choose the "Y-X" transformation from Edit > Variables > Other...) Note that, for natural logarithms, a small difference on the log scale approximately equals the proportional change on the data's original scale, so a difference in logs of 0.2 corresponds to roughly a 20% increase (exactly, e^0.2 - 1, or about 22%). Make a histogram of this variable and describe its shape.
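The log-scale approximation in question 4 can be checked numerically. In the sketch below (function names are hypothetical, and natural logarithms are assumed), log(price90) - log(price80) equals log(price90/price80), and exp(d) - 1 converts a log difference d back into an exact proportional change.

```python
import math


def log_diff(price80, price90):
    """Natural-log difference; identical to math.log(price90 / price80)."""
    return math.log(price90) - math.log(price80)


def exact_pct_change(d):
    """Percent change on the original scale implied by a log difference d."""
    return 100 * (math.exp(d) - 1)


# Compare the approximation 100*d with the exact percent change.
for d in (0.05, 0.2, 0.5):
    print(f"log diff {d:.2f}: approx {100 * d:.0f}%, exact {exact_pct_change(d):.1f}%")

# A hypothetical home bought for $50,000 in 1980 and sold for $61,000 in 1990.
d = log_diff(50000, 61000)
print(f"log diff {d:.3f}, exact change {exact_pct_change(d):.1f}%")
```

The gap between the approximate and exact columns grows with d, which is why the approximation is reserved for small changes.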

5) Calculate a 95% one-sample t-interval for the mean difference in log price (use the Distribution menu, click on the "Output" option and choose "95% C.I. for the Mean"). Does the interval include zero? What does this mean?
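SAS/INSIGHT produces the question 5 interval directly; the sketch below only shows what is being computed: mean difference ± t*·s/sqrt(n), where t* is the 97.5th percentile of the t-distribution with n - 1 degrees of freedom. To stay within the Python standard library it finds t* by integrating the t density numerically (Simpson's rule) and bisecting, an illustrative choice rather than SAS's method; with SciPy installed, scipy.stats.t.ppf(0.975, n - 1) gives the same critical value. Function names and the differences in the usage line are hypothetical.

```python
import math
import statistics


def t_area_from_zero(q, df, steps=2000):
    """P(0 < T < q) for Student's t with df degrees of freedom, via Simpson's rule."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = q / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(q)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3


def t_critical(df, level=0.95):
    """t* with P(-t* < T < t*) = level, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_area_from_zero(mid, df) < level / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t_interval(diffs, level=0.95):
    """One-sample t-interval for the mean of paired differences."""
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    t_star = t_critical(n - 1, level)
    return mean_d - t_star * se, mean_d + t_star * se


# Hypothetical log(price90) - log(price80) values for five homes.
low, high = paired_t_interval([0.1, 0.2, 0.3, 0.4, 0.5])
print(f"95% CI for mean log difference: ({low:.3f}, {high:.3f})")
```

Working with the per-home differences is what makes this a paired analysis: the variation between homes cancels out of each difference before the interval is formed.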

6) Compare the paired and unpaired intervals: which interval is wider? Are the intervals centered near the same value? Which interval do you believe gives a more accurate picture of the appreciation in market value of homes in Dade County from 1980 to 1990? Why?



iversen@stat.duke.edu
last updated 6 March 1998