Statistics 110E -- Statistical and Data Analysis-Psychology/Biological Sciences
Statistics 110 Lab 2
Topics
- Discussion of Mini-Project - Percentage of SUVs
- Introduction to JMP
- Creating a RANDOM SAMPLE in JMP
- If you did not complete the CLASS SURVEY
last week, please do so during the lab time.
SUV Mini-Project
This project is to be conducted outside of lab with your group. You may
use JMP to generate random numbers for a random sample. Lab time should be
used to discuss some of the issues below that you should think about
when you design your actual group survey.
Sport Utility Vehicles (SUVs) have shown tremendous growth in market
share over the last 10 years. For this Mini-project, your group will
design and carry out a study to estimate the percentage of vehicles that
are SUVs in a large parking lot. For example, you might use the W lot on
Wannamaker, The Ocean.
In your groups, design a sampling scheme you can use to estimate the
percentage of vehicles that are SUVs. Think about:
- What is the Sampling Frame? Is this the list of vehicles or the
list of Parking Spaces? What should you do if this information is not
known in advance or could fluctuate over time?
- Would a systematic sample be appropriate? (A systematic sample
would involve sampling say every 10th vehicle.)
Could vehicle size/parking space size create a systematic bias in how
vehicles are parked? Are there restrictions for handicap parking?
Compact car spaces?
- In a simple random sample, every vehicle (in your sampling frame)
is equally likely to be included in the sample. How would you carry out
a simple random sample? Do you need to know the
total number of vehicles in the lot to do this? Or the total number of parking
places? (Below, we describe how to use JMP IN to generate a series
of random numbers that can be used to create a simple random sample.)
- How do you handle cars parked on the grass/off the pavement? Can
this bias results?
- In choosing the number to sample, consider the margin of error
that will accompany your sample result. How large a sample do you need
if you want your margin of error to be within +/- 5 percentage points?
+/- 1 percentage point? +/- 0.01 percentage points?
- For review: is this an observational study or an experiment? What
data are being collected? (categorical, discrete, continuous?) Do you
have valid measurements? reliable measurements? How do the 7 Critical
Components affect your study design?
- Here is a list of SUVs that
you may find useful in defining whether a vehicle is an SUV.
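For the margin-of-error question in the list above, one common rule of thumb is that the conservative 95% margin of error for a sample proportion is about 1/sqrt(n). The handout does not state a formula, so treat that rule as an assumption here. The Python sketch below inverts it to get a sample size; the function name `sample_size_for_margin` is hypothetical, and no finite-population correction is applied.

```python
import math

def sample_size_for_margin(margin_points):
    """Smallest n with 100/sqrt(n) <= margin_points (margin in percentage points).

    Assumes the conservative rule of thumb: margin of error ~ 1/sqrt(n).
    """
    n_exact = (100.0 / margin_points) ** 2
    # Subtract a tiny tolerance so floating-point noise cannot push n up by one.
    return math.ceil(n_exact - 1e-9)

# The three targets mentioned in the handout, in percentage points.
for target in (5, 1, 0.01):
    print(f"+/- {target} points needs about {sample_size_for_margin(target)} vehicles")
```

Compare each result with the number of vehicles actually in your lot before settling on a sample size.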
Now go through the entire area, actually taking a census, and compute
the population percentage of SUVs in the lot.
As a group, turn in a typed report (1 page MAX) describing your
study.
In your report, be sure to
- Explain your sampling method and discuss any problems or biases you
encountered using it. What time of day or day of the week did you
collect data?
- Construct an interval from your sample that almost surely covers the
true population percentage of SUVs for that lot. Does your interval
cover the true population percentage you found when you took the census?
- Use your experience with taking the census to give one practical
difficulty with taking a census. (Hint: did all the vehicles stay
put while you counted?) Now aren't you glad you did not have to take
a survey to estimate the percentage of grey whales! :-)
- Discuss the limitations of this survey. Can these results be used
to estimate the percentage of DUKE Students that own SUVs? The
percentage of SUVs at DUKE? Explain.
Does it matter which lot you use - the ocean, the Duke Garden lot, an RT
lot, a lot on East Campus, Trent?
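One way to construct the interval asked for in the report is the sample percentage plus or minus a conservative margin of error of about 1/sqrt(n), which gives roughly 95% coverage. That rule is an assumption here, not something the handout specifies. The sketch below is in Python; the function name `conservative_interval` and the tally of 23 SUVs out of 100 vehicles are hypothetical.

```python
import math

def conservative_interval(suv_count, n):
    """Return (low, high) in percent: sample percentage +/- 100/sqrt(n).

    Assumes the rule-of-thumb margin of error 1/sqrt(n); the ends are
    clipped so the interval stays between 0% and 100%.
    """
    p_hat = 100.0 * suv_count / n      # sample percentage of SUVs
    margin = 100.0 / math.sqrt(n)      # conservative margin, in percentage points
    return max(0.0, p_hat - margin), min(100.0, p_hat + margin)

# Hypothetical tally: 23 SUVs among 100 sampled vehicles.
low, high = conservative_interval(23, 100)
print(f"Interval: {low:.1f}% to {high:.1f}%")
```

You can then check whether the census percentage for the same lot falls between the two ends.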
Due DATE: Turn in one copy of the report per group (include all
names) during LAB (9/23 or 9/24) or at the LATEST by 5pm
FRIDAY 9/24 in 219A OLD CHEM.
INTRODUCTION TO JMP IN
For this class we will use SAS JMP IN to analyze data. You may use any
other program, if you prefer, but we may not be able to help you with
questions on its use. Part of lab time will be periodically devoted to
using JMP on the PC.
Start up JMP
- FIND a PC in the lab or on campus that has JMP
- Go to the START Menu at the bottom left side of the screen
- Hold the mouse button down and then drag to select
Programs ->, then Statistics and Mathematics Programs ->,
then select JMP IN
- The program should start with a blank spreadsheet "Untitled 1"
Creating a Random Sample
For the parking lot survey, we will need to have an idea about the
population size. You could take a preliminary stroll through the
lot counting the number of spaces; this would give you an upper bound on
the number of vehicles (ignoring any vehicles parked on the grass
or in illegal spots). Let's say that there are 500 parking spaces. We
can take a random sample (without replacement) of the numbers 1 to
500 corresponding to the 500 potential vehicles. When we implement the
sampling plan, we just go through the cars in order, skipping the empty
spaces. This will provide a valid random sample of vehicles, assuming
your guess at the population size is not too far off.
Using JMP, we will create a random order of the numbers 1 to 500 using
the "SHUFFLE" command. If we want to take a 10% sample, then we use all
vehicles that have a SHUFFLED number less than or equal to 50. Let's see
how to do it!
The first step is to create an initial table with 500 rows.
- Go to the Rows Menu at the top of the JMP IN Menu bar.
- Select Add Rows...
- In the pop-up window, enter the number of rows you want to add (500
or whatever you will use as the population size - for lab you might try
initially with just 20)
- Click Add
You should now have a table with 500 rows. As we have not specified values
for Column 1, you should see "?"s.
The second step is to create a variable that provides an index of the
population (1 to 500) in Column 1. We can do this using JMP IN's Spreadsheet
Calculator.
- Click in the space at the top of Column 1 (if selected the square region
at the top should be black). This selects a column.
- Go to the Cols (column) Menu at the top Menu bar and select
Column Info
- In the pop-up window, go to the Data Source: field. Click on
the down arrow, and then select Formula. Click the OK
box.
- This brings up the Calculator. In the middle scroll box at the
top of the calculator, click on Terms. (This allows us to
specify a "term" in our formula for defining Column 1.) Then in the
right menu, select i[row #]. Click Evaluate. You have just evaluated a
formula that sets each row in Column 1 to be its row number, i.
- Go to the "Jumping Man" icon at the upper left corner of the
"Column 1" calculator window, and select Close.
- Back in your spreadsheet table, you should now have Column 1
filled in with the values 1 to 500.
- To rename Column 1 something more meaningful, just highlight the
text "Column 1" with the mouse, and then type in the new name, say
"Vehicle #".
The third step is to create a random sequence of vehicle numbers that
tell us which ones to sample.
- Double click the mouse in the "triangular region" in the upper left
corner of the spreadsheet with the text "1 Cols". This should bring up
a menu to create a new column. (Alternatively, you can add a column with the
Cols Menu.)
- In the Col Name field, enter a name, say "Sample #".
- In the Data Source field, select Formula
- Click on OK to bring up the Calculator.
- In the Calculator window, go to the middle scroll box, and select
Random
- In the right scroll box window, select Shuffle (you should
see ?shuffle in the formula box window at the bottom of the calculator window).
- Click Evaluate
- Close the calculator window (go to the "Running Man" menu next to
"Sample #", and select Close.)
You should now have a column with the numbers 1 to 500 in random
order!
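If you want to check your understanding of the shuffle outside JMP, the same idea can be sketched in Python. This is not the JMP IN procedure, just an equivalent illustration; the function name `shuffled_sample_numbers` and the optional `seed` argument are hypothetical.

```python
import random

def shuffled_sample_numbers(population_size, seed=None):
    """Return the numbers 1..population_size in random order.

    Entry i of the result plays the role of the "Sample #" for vehicle i + 1.
    """
    rng = random.Random(seed)                      # seed only for repeatability
    numbers = list(range(1, population_size + 1))  # the "Vehicle #" index column
    rng.shuffle(numbers)                           # random order, no repeats
    return numbers

sample_numbers = shuffled_sample_numbers(500)
print(sample_numbers[:10])  # sample numbers assigned to vehicles 1 through 10
```

Every number from 1 to 500 appears exactly once, so each vehicle gets a distinct sample number.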
To print your table, go to the File menu and select Print. You can
also cut and paste info to other programs or use the Journal under the
Edit menu.
To use this to create a 20% sample (sample size is then 100), you would
sample vehicles that have a sample number less than or equal to 100. We
are keeping the lists sorted by vehicles, since it may be easier to go
through the vehicles in order 1, 2,... and find out if they are in the
sample or not. Going through in order of the sample numbers requires
that we first go to, say, vehicle 50, then to vehicle 11, etc., which may be
more difficult to implement in our context.
For example, here is output for a table based
on 10 vehicles. To take a sample of size 4, we would sample only the
vehicles that have a sample number less than or equal to 4; we would sample
vehicles 1, 4, 6, and 7. As you sample each vehicle keep a running
tally of the number of SUVs.
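The selection rule just described (keep the vehicles whose sample number is at most the sample size, listed in vehicle order) can be sketched in Python as follows. The function name `vehicles_to_sample` is hypothetical, and because the shuffle is random, the vehicles chosen will generally differ from the ones in the example above.

```python
import random

def vehicles_to_sample(population_size, sample_size, seed=None):
    """Return the vehicle numbers to visit, in increasing vehicle order.

    Each vehicle 1..population_size gets a shuffled sample number; a vehicle
    is selected when its sample number is <= sample_size.
    """
    rng = random.Random(seed)
    sample_numbers = list(range(1, population_size + 1))
    rng.shuffle(sample_numbers)
    return [
        vehicle
        for vehicle, number in enumerate(sample_numbers, start=1)
        if number <= sample_size
    ]

# A table of 10 vehicles and a sample of size 4, as in the example above.
print(vehicles_to_sample(10, 4))
```

Because the list comes back in vehicle order, you can walk through the lot once from vehicle 1 upward and tally SUVs as you go.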