This week's data set is the same one we've been looking at in class: data on homes sold in two zip codes (33134 and 33146) located in Dade County, Florida. There are 339 observations and 9 variables. The variables are: 'Zip Code', 'Year Built', 'Sqr Feet', 'Bedrooms', 'Bathroom', 'Floors', 'Lot Size', 'Sale Amt', and 'Yr Sold'. Let's have a look at it...
1) What fraction of the observations are homes from zip code 33134? Click "Analyze > Distribution" (Analyze is the menu option, Distribution is a sub-menu option); a menu will appear. Click on "ZIPCODE" and then click on "Y"; "ZIPCODE" will appear in the box below the "Y". Click "OK". A window with plots (a rectangular pie chart and a bar chart) will appear. To display values on the plots, click the little arrow in the bottom left of the plot and choose "values". This will answer your question. Another way of answering the question is to produce a table of frequencies by choosing "Tables > Frequency Table" from the top of the plot window. When you are done, click "File > End" on the plot window to close it.
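If you want to check the frequency table by hand, the calculation is just a count per category divided by the total number of observations. The sketch below illustrates this in Python; the function name `category_fractions` and the toy zip codes are made up for illustration and are not the lab data.

```python
from collections import Counter

def category_fractions(values):
    """Return {category: fraction of observations} for a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

# Made-up zip codes for illustration only; NOT the lab data.
toy_zip_codes = ["33134", "33134", "33146", "33134", "33146"]
fractions = category_fractions(toy_zip_codes)
print(fractions)
```

The fractions across all categories always sum to 1, which is a quick check on any frequency table.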
2) Produce the same plots for the variable "BATHS" (using the "Analyze > Distribution" menu). Look at the bar chart: what is the modal category? What fraction of observations are homes with 2 bathrooms?
3) Are there more small homes in zip code 33134 than in 33146? Use the number of bathrooms to judge size. Produce the plots from Question 2), but this time for the two zip codes separately, using the "Analyze > Distribution" menu. You need to do only one thing differently from what you did for Question 2): after you click on "BATHS" and put it under "Y", click on "ZIPCODE" and then click on "GROUP". Then click "OK" to produce separate analyses by zip code. Resize the plot window to see all 4 plots (ask your TA how). Which zip code in the sample has more homes with 2 or fewer bathrooms?
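The "GROUP" option amounts to splitting the observations by zip code and summarizing each part separately. As a rough sketch of that idea, the Python below computes, for each group, the fraction of homes with at most a given number of bathrooms. The function name `fraction_at_most` and the toy records are hypothetical and are not the lab data.

```python
from collections import defaultdict

def fraction_at_most(records, limit):
    """For each group, return the fraction of records with value <= limit.

    records is an iterable of (group, value) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, value in records:
        totals[group] += 1
        if value <= limit:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}

# Made-up (zip code, bathrooms) pairs for illustration only; NOT the lab data.
toy_homes = [
    ("33134", 1), ("33134", 2), ("33134", 3), ("33134", 2),
    ("33146", 2), ("33146", 3), ("33146", 4), ("33146", 3),
]
small_share = fraction_at_most(toy_homes, 2)
print(small_share)
```

Comparing fractions rather than raw counts matters when the two zip codes contribute different numbers of homes to the sample.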
4) Is it likely that the difference we observe between the two zip codes is due to sampling error? On paper, use equation 1-2 to construct 95% confidence intervals for the proportion of homes in each zip code with 2 or fewer bathrooms. What do you conclude?
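To check your hand calculation, here is a minimal Python sketch. It assumes that equation 1-2 is the usual large-sample interval for a proportion, p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n); if your text defines equation 1-2 differently, use that definition instead. The function name `proportion_ci_95` and the counts (60 out of 100) are made up for illustration and are not the lab data.

```python
import math

def proportion_ci_95(successes, n):
    """Large-sample 95% confidence interval for a proportion.

    ASSUMPTION: this is the normal-approximation interval
    p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n), which may or may
    not match equation 1-2 in the text.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = 1.96 * standard_error
    return p_hat - margin, p_hat + margin

# Made-up counts for illustration only; NOT the lab data.
low, high = proportion_ci_95(60, 100)
print(round(low, 3), round(high, 3))
```

Compute one interval per zip code using your own counts from Question 3). If the two intervals do not overlap, sampling error alone is an unlikely explanation for the difference; if they overlap, the comparison is less clear-cut.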
5) The data are a sample of homes that sold in 1994 and 1995. Is this necessarily a representative sample of homes? Is it possible that estimates (sample proportions or confidence intervals) derived from this sample are biased? Can you think of any confounding factors?