Statistics 110E -- Statistical and Data Analysis-Psychology/Biological Sciences

Statistics 110 Lab 2

Topics

  1. Discussion of Mini-Project - Percentage of SUVs
  2. Introduction to JMP
  3. Creating a RANDOM SAMPLE in JMP
  4. If you did not complete the CLASS SURVEY last week, please do so during the lab time.

SUV Mini-Project

This project is to be conducted outside of lab with your group. You may use JMP to generate random numbers for a random sample. Lab time should be used to discuss some of the issues below that you should think about when you design your actual group survey.

Sport Utility Vehicles (SUVs) have shown tremendous growth in market share over the last 10 years. For this Mini-project, your group will design and carry out a study to estimate the percentage of vehicles that are SUVs in a large parking lot. For example, you might use the W lot on Wannamaker, The Ocean.

In your groups, design a sampling scheme you can use to estimate the percentage of vehicles that are SUVs. Think about:

Now go through the entire area, actually taking a census, and compute the population percentage of SUVs in the lot.

As a group, turn in a typed report (1 page MAX) describing your study. In your report, be sure to:

  1. Explain your sampling method and discuss any problems or biases you encountered using it. What time of day or day of the week did you collect data?

  2. Construct an interval from your sample that almost surely covers the true population percentage of SUVs for that lot. Does your interval cover the true population percentage you found when you took the census?

  3. Use your experience with taking the census to give one practical difficulty with taking a census. (Hint: did all the vehicles stay put while you counted?) Now aren't you glad you did not have to take a survey to estimate the percentage of grey whales! :-)

  4. Discuss the limitations of this survey. Can these results be used to estimate the percentage of DUKE Students that own SUVs? The percentage of SUVs at DUKE? Explain. Does it matter which lot you use - the ocean, the Duke Garden lot, an RT lot, a lot on East Campus, Trent?
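
The interval in item 2 can be sketched in a few lines of Python. The handout does not say which formula to use, so this is only a sketch under a stated assumption: it uses the common conservative rule of thumb for an approximate 95% interval, the sample proportion plus or minus 1/sqrt(n). It also ignores the fact that you are sampling without replacement from a finite lot. The function names and the counts are hypothetical and for illustration only; they are not real data.

```python
import math

def conservative_interval(suv_count, sample_size):
    """Approximate 95% interval (in percent) for the share of SUVs.

    Assumption: uses the rule of thumb p_hat +/- 1/sqrt(n),
    which is not necessarily the formula taught in this course.
    """
    if sample_size <= 0 or not 0 <= suv_count <= sample_size:
        raise ValueError("need 0 <= suv_count <= sample_size and sample_size > 0")
    p_hat = suv_count / sample_size          # sample proportion of SUVs
    margin = 1 / math.sqrt(sample_size)      # conservative margin of error
    low = max(0.0, p_hat - margin)           # a proportion cannot go below 0
    high = min(1.0, p_hat + margin)          # ... or above 1
    return 100 * low, 100 * high

def covers(interval, census_percentage):
    """True if the interval contains the percentage found in the census."""
    low, high = interval
    return low <= census_percentage <= high

# Hypothetical counts for illustration only: 20 SUVs among 100 sampled vehicles.
interval = conservative_interval(suv_count=20, sample_size=100)
print(f"{interval[0]:.1f}% to {interval[1]:.1f}%")
print(covers(interval, census_percentage=24.0))
```

With a larger sample the margin 1/sqrt(n) shrinks, so the interval becomes narrower; this is one thing to weigh when your group chooses its sample size.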

Due DATE: Turn in one copy of the report per group (include all names) during LAB (9/23 or 9/24) or, at the LATEST, by 5pm FRIDAY 9/24 in 219A OLD CHEM.


INTRODUCTION TO JMP IN

For this class we will use SAS JMP IN to analyze data. You may use any other program, if you prefer, but we may not be able to help you with questions on its use. Part of lab time will be periodically devoted to using JMP on the PC.

Start up JMP

  1. FIND a PC in the lab or on campus that has JMP
  2. Go to the START Menu at the bottom left side of the screen
  3. Hold the mouse button down and then drag to select Programs ->, then Statistics and Mathematics Programs ->, then JMP IN
  4. The program should start with a blank spreadsheet "Untitled 1"

Creating a Random Sample

For the parking lot survey, we will need to have an idea about the population size. You could take a preliminary stroll through the lot counting the number of spaces; this would give you an upper bound on the number of vehicles (ignoring any vehicles parked on the grass or in illegal spots). Let's say that there are 500 parking spaces. We can take a random sample (without replacement) of the numbers 1 to 500 corresponding to the 500 potential vehicles. When we implement the sampling plan, we just go to the cars in order, skipping the empty spaces. This will provide a valid random sample of vehicles, assuming your guess at the population size is not too far off.

Using JMP, we will create a random order of the numbers 1 to 500 using the "SHUFFLE" command. If we want to take a 10% sample, then we use all vehicles that have a SHUFFLED number less than or equal to 50. Let's see how to do it!

The first step is to create an initial table with 500 rows.

  1. Go to the Rows Menu at the top of the JMP IN Menu bar.
  2. Select Add Rows...
  3. In the pop-up window, enter the number of rows you want to add (500 or whatever you will use as the population size - for lab you might try initially with just 20)
  4. Click Add

You should now have a table with 500 rows. As we have not specified values for Column 1, you should see "?"s.

The second step is to create a variable that provides an index of the population (1 to 500) in Column 1. We can do this using JMP IN's Spreadsheet Calculator.

  1. Click in the space at the top of Column 1 (if selected, the square region at the top should be black). This selects a column.
  2. Go to the Cols (column) Menu at the top Menu bar and select Column Info
  3. In the pop-up window, go to the Data Source: field. Click on the down arrow, and then select Formula. Click the OK box.
  4. This brings up the Calculator. In the middle scroll box at the top of the calculator, click on Terms (this allows us to specify a "term" in our formula for defining Column 1). Then in the right menu, select i[row #]. Click Evaluate. You have just evaluated a formula that sets each row in Column 1 to be its row number, i.
  5. Go to the "Jumping Man" icon at the upper left corner of the "Column 1" calculator window, and select Close.
  6. Back in your spreadsheet table, you should now have Column 1 filled in with the values 1 to 500.
  7. To rename Column 1 something more meaningful, just highlight the text "Column 1" with the mouse, and then type in the new name, say "Vehicle #".

The third step is to create a random sequence of vehicle numbers that tells us which ones to sample.

  1. Double click the mouse in the "triangular region" in the upper left corner of the spreadsheet with the text "1 Cols". This should bring up a menu to create a new column. (You can alternatively add a column with the Cols Menu.)
  2. In the Col Name field, enter a name, say "Sample #".
  3. In the Data Source field, select Formula
  4. Click on OK to bring up the Calculator.
  5. In the Calculator window, go to the middle scroll box, and select Random
  6. In the right scroll box window, select Shuffle (you should see ?shuffle in the formula box window at the bottom of the calculator window).
  7. Click Evaluate
  8. Close the calculator window (go to the "Jumping Man" menu next to "Sample #", and select Close).

You should now have a column with the numbers 1 to 500 in random order!
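
If you want to check the idea outside JMP, the shuffle step can be sketched in Python. This is a minimal sketch and not part of the JMP IN workflow: it assumes a population size of 500 and uses the standard library's random.shuffle to play the role of the Shuffle function. The function name shuffled_sample_numbers and the seed value are hypothetical choices made for illustration.

```python
import random

def shuffled_sample_numbers(population_size, seed=None):
    """Return a list whose entry i-1 is the sample number for vehicle i.

    The result is the numbers 1..population_size in random order,
    mirroring the "Sample #" column built with Shuffle in JMP.
    """
    rng = random.Random(seed)                      # seed only makes the run repeatable
    sample_numbers = list(range(1, population_size + 1))
    rng.shuffle(sample_numbers)                    # random order, each number used once
    return sample_numbers

sample_numbers = shuffled_sample_numbers(500, seed=110)

# Print the first few rows as "Vehicle #" and "Sample #" pairs.
for vehicle, sample_no in enumerate(sample_numbers[:5], start=1):
    print(vehicle, sample_no)
```

Because every number from 1 to 500 appears exactly once, picking the vehicles with the smallest sample numbers gives a sample without replacement.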

To print your table, go to the File menu and select Print. You can also cut and paste info to other programs or use the Journal under the Edit menu.

To use this to create a 20% sample (the sample size is then 100), you would sample the vehicles that have a sample number less than or equal to 100. We keep the list sorted by vehicle number, since it may be easier to go through the vehicles in order 1, 2, ... and check whether each one is in the sample. Going through in order of the sample numbers would require that we first go to vehicle 50, then to vehicle 11, etc., which may be more difficult to implement in our context.
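
The selection rule can also be sketched in Python. This is an illustrative sketch rather than the JMP procedure: it assumes 500 vehicles and a sample of 100, and the function name vehicles_to_sample and the seed are hypothetical. It walks through the vehicles in order and keeps each one whose shuffled sample number is at most the sample size.

```python
import random

def vehicles_to_sample(population_size, sample_size, seed=None):
    """Return, in vehicle order, the vehicle numbers that fall in the sample."""
    rng = random.Random(seed)
    sample_numbers = list(range(1, population_size + 1))
    rng.shuffle(sample_numbers)                    # sample number for vehicle 1, 2, ...
    # Keep vehicle i when its sample number is <= sample_size.
    return [
        vehicle
        for vehicle, sample_no in enumerate(sample_numbers, start=1)
        if sample_no <= sample_size
    ]

chosen = vehicles_to_sample(population_size=500, sample_size=100, seed=110)
print(len(chosen))      # 100 vehicles, since sample numbers 1..100 each occur once
print(chosen[:10])      # first few vehicles to visit, already in walking order
```

The returned list is already sorted by vehicle number, so it can be used directly as a checklist while walking through the lot.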

For example, here is output for a table based on 10 vehicles. To take a sample of size 4, we would sample only the vehicles that have a sample number less than or equal to 4; we would sample vehicles 1, 4, 6, and 7. As you sample each vehicle keep a running tally of the number of SUVs.