Instructions for lab 1
Lab Objective
To become familiar with the software package JMP.
Lab Procedures
JMP gives us an enormous advantage over people who learned about and performed statistical analyses back in the pre-computer days. It allows us to avoid the drudgery of long arithmetical calculations in favor of understanding concepts and analyzing data. You may find JMP a little annoying at times (all computer software is), but I suspect that you will be thankful for its existence once we start analyzing data.
There should be a shortcut to JMP on the Desktop that you can click to start JMP. If not, click on the Windows Start button in the lower left corner of the screen, then select Program-Math and Statistics-JMPIN-JMPIN. The software should open with a title page showing. Click on the title page and it will disappear, leaving you ready to work.
Like most Windows-based programs, JMP has menu choices, including File, Edit, Tables, Rows, Cols, Analyze, Graph, Tools, Window, and Help. As the semester progresses, we will learn about the options under these menu choices. For now, we focus on creating and downloading JMP data sets.
Unit 1: Creating new data sets
Sometimes you need to create your own data sets from scratch (e.g., when analyzing data you collected for your final project). This tutorial familiarizes you with creating JMP data sets.
Click on the New Data Table button on the JMP work window. This brings up a blank spreadsheet that will become the data set. You can also create a new data file by navigating through the menus to File - New Data Table. The first step is to tell JMP the number of rows in the data set. Click on the red arrow next to Rows, or select Rows from the menu choices, and select Add Rows. Enter the number of rows you want. To keep things simple, we'll use 9 rows for this part of the lab.
Here are the data for nine people:
Sex Number
m 1
m 2
f 1
m 3
f 2
f 2
m 3
f 1
f 2
We're going to input the nine numbers in the first column. "Column 1" is a meaningless name for a variable, so let's rename it. Double click on the box containing "Column 1" and change the variable name to "Number".
Data analysis tip: When you create data sets for your own research, give your variables descriptive labels. It is easier to interpret analyses when the output has descriptive labels than when the output has labels like "Column 1", "Column 2", "Column 3", etc. Descriptive labels also make your data set comprehensible to others who may need or want to use it. Finally, if you use the data set in future analyses, you won't have to spend lots of time trying to decipher uninformative variable names.
Let's add another column to record the sexes. Click on Cols in the menu, and select New Column. Change the name of the column to "Sex" by writing over the "Column 2". You see a button for Data Type, which allows you to specify whether the column contains numbers (numeric), labels or names (character), or row states (we won't use this). Choose character. Next is Modeling Type, which helps JMP decide what graphs to show you. The two modeling types we use are continuous and nominal. We'll learn about these in more detail later, but the basic idea is to select continuous for numbers and nominal for variables that are labels. JMP displays variables that have numbers as data with a blue "C" and variables that have labels (or names) as data with a red "N".
After you input all the data, answer the following questions. Write your answers on a blank piece of paper to be turned in at the end of lab as your lab report. You're permitted and encouraged to talk about questions with your classmates, but write up your lab report with your own words. Feel free to ask for help from the TAs or classmates if you get stuck.
Questions:
1) How many people picked each number?
With nine people it's straightforward to look at the data and get an accurate count. But in the entire class of 120 people, counting the occurrences of each number "by hand" would be cumbersome. In such settings, you can make life easier by sorting the numbers in increasing order and then counting the occurrences. Let's do this in JMP just to get familiar with this handy command.
Select the Tables menu option and click on Sort. Select the variable "Number" and place it in the By box. Hit Sort. You get a sorted data set in a new table. Sorting is useful for many data analyses. In fact, you may want to use it again later in the lab.
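If you are curious how the sort-then-count idea looks outside JMP, here is a minimal sketch in Python. Python is not part of this lab and you do not need it for your report. The variable names (records, sorted_records, counts) are made up for illustration; the nine records are the ones from the table in Unit 1.

```python
from collections import Counter

# The nine (sex, number) records from the table in Unit 1.
records = [
    ("m", 1), ("m", 2), ("f", 1), ("m", 3), ("f", 2),
    ("f", 2), ("m", 3), ("f", 1), ("f", 2),
]

# Sort the rows by the number each person picked (increasing order),
# similar in spirit to putting "Number" in the By box of Tables > Sort.
sorted_records = sorted(records, key=lambda row: row[1])

# Count how many times each number occurs.
counts = Counter(number for _, number in sorted_records)

# Print one line per distinct number: the number, then how many people picked it.
for number in sorted(counts):
    print(number, counts[number])
```

Strictly speaking, the counting step does not need the data to be sorted first; sorting just makes the count easy to check by eye, which is the point of the exercise above.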
2) If you want to sort the data first by sex and then by number (i.e., have all the females first with numbers in increasing order and all the males second with numbers in increasing order), which sequence of commands would you use? Select (a) or (b), and write just the letter of your answer on your report.
(a) Select the Tables menu option and click on Sort. Select the variable "Number" and place it in the By box. Then select the variable "Sex" and place it in the By box. Hit Sort.
(b) Select the Tables menu option and click on Sort. Select the variable "Sex" and place it in the By box. Then select the variable "Number" and place it in the By box. Hit Sort.
Okay, that's enough of the basics of creating your own spreadsheet.
Now for some real data that someone else has collected.
Unit 2: Downloading data sets
Load in the data set forbes94, which contains the 1994 compensation information for Chief Executive Officers (CEOs) of several large companies. To open this data set, click here. Take a look at those total compensation figures.... Yikes!! Why did I decide to go into academia?
When you get a data set, the first thing to do is figure out how many variables and how many units of observation you have to play with. This is pretty easy in JMP. Each column represents a variable, and each row represents a unit of observation. Hence, there are 800 CEOs in this data set. There is also a mix of numeric (blue "C") and character (red "N") variables in the data set.
Let's get into some analyses. Ask for help if you get stuck.
Questions:
3) JMP displays missing values with dots. True or false: There are more than five CEOs whose values of total compensation are missing in the data file.
Data analysis tip: It is common for some data to be missing from a file. Unfortunately, there is no universally accepted way of representing missing values. Some software packages, like JMP, use a dot or period. Other packages use "NA" (not available). Some data producers, like federal agencies, use extreme values of a variable (e.g., -99) to indicate missing values. Using extreme values is bad practice: how does the user know whether the value is an actual value or a dummy for missingness? When you get a data set from someone, learn how they code missing data before doing any further analyses.
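To make the tip concrete, here is a minimal Python sketch of what can go wrong when missing-value codes are not handled before analysis. Everything in it is hypothetical: the set of codes in MISSING_CODES, the function parse_value, and the five raw values are invented for illustration and do not come from forbes94. In real work, the list of codes must come from whoever produced the data.

```python
# Assumed missing-value codes; always confirm these with the data producer.
MISSING_CODES = {".", "NA", "-99", ""}

def parse_value(raw):
    """Return the value as a float, or None if the text is a missing-value code."""
    text = raw.strip()
    if text in MISSING_CODES:
        return None
    return float(text)

# Hypothetical raw column mixing real values with three styles of missing code.
raw_column = ["1200.5", ".", "NA", "-99", "830"]

parsed = [parse_value(value) for value in raw_column]
n_missing = sum(value is None for value in parsed)

# Average only the observed values; a -99 left in would drag the mean down.
observed = [value for value in parsed if value is not None]
mean_observed = sum(observed) / len(observed)

print(n_missing, mean_observed)
```

Note the weakness the tip warns about: if -99 could ever be a legitimate value of the variable, this sketch would silently throw it away.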
4) What is the salary (not total compensation) of the CEO of Blockbuster Entertainment?
We need to search through the database for Blockbuster, then read off the salary of its CEO. One approach is to look at the company names row by row. For those who find joy only in tedium, this is the preferred approach. All others should go to the Edit menu option, and select Search and then Find. Type in "Blockbuster" and select nothing else. I was mildly surprised that Blockbuster is considered a retail industry rather than an entertainment industry. I also didn't expect Disney to be a "travel" industry. Who knew....
5) Which CEO has the highest total compensation? Who has
the lowest total compensation?
6) Which industry type has the highest average CEO total compensation? Be careful not to read the decimals incorrectly when you answer the question.
There are way too many CEOs to figure this out by hand. Let JMP do all the work. Select the Tables menu option and click on Summary. Put the variable "Wide Industry" in the Group box, then highlight "Total Compensation". Next, click on Statistics to pull down a menu of summary statistics. Select the Mean (and just for kicks, one other summary that interests you). Hit Okay. You should see a table of the statistics you selected for the industries, ordered alphabetically by industry. You may need to scroll down to see all industries.
Each row in the table reports the statistics for the CEOs in one industry. For example, there are 62 CEOs in "Food" industries, and their average total compensation equals $2,740,661.31. That's a lot of Twinkies.
In general, the summary command is useful for comparing means and other statistics for several groups.
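For the curious, the idea behind summarizing by group can be sketched in a few lines of Python. This is only an illustration of "mean by group", not a description of how JMP computes its Summary table. The rows are made-up numbers (they are NOT values from forbes94), the function name group_means is hypothetical, and skipping missing values is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical (industry, total compensation) rows; None marks a missing value.
rows = [
    ("Food", 2.0),
    ("Food", 4.0),
    ("Retail", 1.0),
    ("Retail", None),
    ("Energy", 3.0),
]

def group_means(rows):
    """Return {group: mean of its non-missing values}, with groups in alphabetical order."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in rows:
        if value is None:  # assumption: missing values are left out of the mean
            continue
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in sorted(counts)}

means = group_means(rows)
for group, mean in means.items():
    print(group, mean)
```

The grouping variable plays the role of the Group box, and the value being averaged plays the role of the highlighted column.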
7) How many of these CEOs got their undergraduate degree from
Duke?
8) Let's assume all the CEOs from UNC schools graduated from UNC Chapel Hill. Assuming this, there are more CEOs with undergraduate degrees from Carolina than there are from Duke. Your friends at Carolina use this to argue that their graduates are more likely to be CEOs than Duke graduates. Defend our school! Use the CEO counts to make a statistical argument that Duke does not lag behind Carolina in producing CEOs. Write two or fewer sentences to justify your answer. (You need some information that is not in the data but is easily found on the web.)
9) Highest attained educational degree is in the variable "Grad degree". If your only interest were making good cash (total comp.), which path should you pursue: MBA (business), JD (law), MD (physician), PhD, or no graduate degree? Use highest average total compensation as your criterion, and choose only from these categories. Justify your answer in one sentence.
10) Explore the data to answer at least one question that
interests you. Report your findings to one of the TAs or the
instructor; you don't have to write anything on your lab sheet for this
question. Ask your TAs for help with JMP if needed.
You may want to begin your list of JMP commands by adding instructions
for the methods you used in Lab 1. We'll use sorting and
summarizing by groups for Lab 2 (and for later labs), so it will be
helpful to have commands for those data analysis tools handy.
(Obviously, don't turn in this list; it's yours!)
This ends the lab.
DON'T FORGET TO LOG OFF FROM YOUR MACHINE.