This lab exercise is mainly a continuation of the first lab, since not everyone was able to finish last time. The instructions from the first lab are included below, so that you can finish it without having to go back to another link. I have also added some new exercises at the bottom.

If you were able to successfully read in the data set during the last lab section, it should still be saved in your file system. In this case, you will not need to re-submit the code that began with the line "data sta110b.ex1", but you will need to re-submit the line

libname sta110b '~/sta110b';
This tells SAS which directory (sta110b) to search for data sets. Then skip to the part of last week's lab that gives directions for starting the INSIGHT module. (Type insight in the toolbox's command line.)

If you were not able to successfully read in the data set during the last lab section, just refer to the instructions from the last lab below.



Instructions from Lab 1

Exploring SAS/INSIGHT

As a sample data set, we will use a portion of the data collected as described below:

In order to better understand the interactions that take place between salespeople and customers, Ronald P. Willett and Allan L. Pennington (1966) monitored the interactions of appliance salespeople and customers on the floor of a large department store. Part of their research involved observing the length of time customers and salespeople interacted prior to the close of the sale or the departure of the customer. The data below, adapted from the article, are the lengths of time (in minutes) from the first customer-salesperson contact to the close of the sale or the customer's departure for 132 customers who completed their appliance purchase either at the time they were observed or within the following 2 weeks. Instances where a purchase was made by the customer at the time he or she was observed are denoted with an asterisk.

Prepare a directory to hold your SAS datasets

At the UNIX prompt (generally this prompt is a %), type

mkdir sta110b
The command mkdir stands for "make directory" and creates a directory called sta110b one level down from your home directory (which is where you are immediately after you log in).

Start SAS

At the UNIX prompt, type
sas &
The SAS program will open about four windows (toolbox, program editor, log, output). You will probably want to "iconify" the output window, which we won't need in this exercise. To do this, click on the dot in the upper right-hand corner of the window's title bar. The window will be reduced to a small pictorial representation on the right-hand side of the screen. To get it back, double-click on it.

Read the dataset into SAS

Now we have to tell SAS about the data. This requires some code to be entered in the SAS program editor window. You can copy and paste this code from Netscape directly into the program editor. Highlight the text to be copied using the mouse's left button. Then, position the cursor at the spot in the program editor where you want the text to go. In this case, you want the text to begin 2 spaces to the right of 00001. Click the middle button to paste the text. (The numbers 00001, 00002, etc. are there primarily for mainframe SAS users; you can just ignore them.)

Now we tell SAS in which SAS library (for us, this amounts to which directory) datasets can be placed.

 libname sta110b '~/sta110b';
This tells SAS that it can store/read datasets from the directory we have just created. After entering this in the program editor, you need to submit this line to SAS. Either choose the running man icon in the toolbox window or choose submit under Locals in the program editor. The log window should contain a message to the effect that the "libref" (the library reference - "sta110b") was assigned successfully. If you don't see this message, an error has been made, and the following steps won't work correctly.

To read the data set into SAS, submit the following code. As you can see from the first line, the data set will be named ex1 and will be stored in the directory referred to by the libref sta110b (which is your newly created sta110b directory).

data sta110b.ex1;
  infile '~jls11/public/lab1.dat';
  input when_buy $ minutes;
run;
So, the dataset is called ex1 and is stored in the library sta110b (which we have set to the directory sta110b). It has two variables called when_buy and minutes. The log window should tell us that there are 77 observations and 2 variables.
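As an optional check that the data step above worked, the following sketch describes the data set and prints its first few rows. It assumes the libref sta110b has been assigned and that sta110b.ex1 was created as described above; the choice of 5 rows is arbitrary. The results go to the output window, so restore it if you have iconified it.

```sas
/* Optional check: list the variables and number of observations. */
proc contents data=sta110b.ex1;
run;

/* Print the first 5 observations (5 is an arbitrary choice). */
proc print data=sta110b.ex1(obs=5);
run;
```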

Start INSIGHT

Go to the command line in the toolbox window and type insight, followed by return. In the next window, choose the dataset you created; the data window containing your data set should appear.

Create a histogram of all the data

Under the Analyze menu, choose the option Histogram. Click on MINUTES, then click on the button labelled Y. This means that the variable MINUTES will be treated as the response variable. Another window will appear with the histogram. Experiment with the gray arrow button at the bottom left of the histogram to see what aspects of the plot you can change. You can close the histogram window by selecting End under the File menu.

Group the data

Divide the data into two groups depending on when the purchases were made. An entry of N (as in "now") denotes that the purchase was made while the observations of time were taken; an entry of L means the customer came back "later" (within 2 weeks) to buy the item.

Click on the arrow at the top left of your spreadsheet. Choose Define Variables, then the variable WHEN_BUY, and then click GROUP. The data will be grouped according to this variable's values when graphs are drawn. To see this, repeat the histogram-drawing procedure above. Now, you should obtain two histograms for the two groups, placed side by side. Notice the strange x-axis for one of the histograms (negative tick marks). Fix this by using the relevant options with the histogram.

Mark a particular observation in a plot

Click on the observation number at the left of a particular row in the dataset window (the one that looks like a spreadsheet). Then look at the histograms, and notice that the highlighted observation's position in the histogram is marked.

Leave certain observations out of the plot

Under Edit, look on the cascading menu corresponding to Observations. Click on Find. Select the expression that represents "minutes greater than or equal to 98.2". The observations that meet this criterion will be highlighted in the data window. Go to any highlighted observation number, and click on the black square to its left. From the popup menu, select Show in Graphs. Notice that the black squares have disappeared from the highlighted rows. Now, redraw the histogram and note that these observations are no longer there. While these rows are highlighted, you can use the option Show in Graphs on the same path as the Find option to re-include these observations.
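The same exclusion can also be written as code in the program editor instead of being done through the menus. This is only a sketch, assuming sta110b.ex1 exists as created earlier; the data set name ex1_short is hypothetical, and it is written to the temporary WORK library so that the original data set is left untouched.

```sas
/* Keep only observations with minutes below 98.2, i.e. drop
   the ones that the Find step above would highlight. */
data work.ex1_short;        /* ex1_short is a hypothetical name */
  set sta110b.ex1;
  where minutes < 98.2;
run;
```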

Adding more data

Notice that INSIGHT works similarly to a spreadsheet. You can click on a square to enter more data. Try adding some new rows. How can you save your altered data set with a new name?

To end the SAS program

To end SAS, type bye in the toolbox's command field, or choose Exit under the File menu in the log, program editor, or output window. Your dataset will be saved in the sta110b directory you created at the beginning of this exercise. If you have made any changes to the dataset that you'd like to preserve, you must save the changes before exiting.

Explore the Course Info page for STA110B

Through the new Course Info system, a new course page has been set up at http://cinfo.aas.duke.edu/courses/STA110B.01-04-S2000. We'll try this software out this semester by posting assignments, lecture notes, etc. there. (We will not use all the features, so don't be surprised if some pages are empty.) Depending on student feedback and administrative ease, we may make a complete transition to Course Info as the semester goes on. Until then, all pertinent information will be on both web pages.

Every student registered for this course has his/her own username and password to log in to the Course Info system. You will need to follow these instructions (from Arts & Sciences Computing):


When you go to the course you will be prompted to login. Log in for the first time using your ACPUB ID as BOTH the login ID AND password.
For example, if your acpub ID is "fakeid" you would type:
User ID = fakeid
and
Password = fakeid
As soon as you are in the course you need to change this temporary password to a more secure one of your choice. To do this, click on the "Student Tools" button on the left. Then click on the "Change your Information" icon. Scroll down to the bottom of the page and change your password by typing in the one you want to use. (It can be the same one you use for your email if you wish). Finally, click on the "Update Password" button and you're done. I hope things go well.

Make sure that you know how to find assignments, lecture notes, and general course info on these pages. Also, explore the Student Tools section, which contains some cool resources for students. You can make your own page (viewable by all of us associated with STA110B), set up your own calendar, and more.

To avoid compromising your security, you must log out of the Course Info pages and exit from your browser (e.g. Netscape, Internet Explorer, etc.) completely when you are done!



NEW: Instructions for Lab 2

Drawing a boxplot

Choose Box Plot/Mosaic Plot from the Analyze menu. Highlight the MINUTES variable, then click the Y box, then click OK. Another window will appear with a box plot. Notice that this boxplot is drawn in a different style than the ones in your text; there are some dots (representing observations) outside the "whiskers" of the plot. To find out the rules that SAS/INSIGHT uses for drawing box plots, we need to use the SAS/INSIGHT help window.

Accessing the SAS/INSIGHT help menu

From the window containing the boxplot, follow the series of "cascading" menus Help->Reference->Box Plot/Mosaic Plot. A help window about box plots should appear; choose Method at the bottom. The third paragraph from the end explains the rationale used by SAS/INSIGHT to determine the "whisker length". If necessary, the length of the whiskers can be controlled using the Methods option, which is present in the menu that appears after you choose Analyze->Box Plot/Mosaic Plot. Note that you also had the option of choosing a master index of help topics from the Help menu; this may be useful in the future.

Producing summary statistics and descriptive plots at the same time

When doing an exploratory data analysis, you will generally want to combine the previously mentioned graphs with some summary statistics (mean, median, quartiles, etc.). This can be done in one step, using the Distribution option under the Analyze menu. As before, choose MINUTES as the "Y" variable, and click OK. A new window will appear, containing a histogram, boxplot, and tables of summary statistics.

To produce the same graphs and statistics for the grouped data (two groups are established according to the values of the variable WHEN_BUY), again choose the Distribution option under the Analyze menu. Choose MINUTES as the "Y" variable (as before), and in addition, choose WHEN_BUY as the "GROUP" variable. Then, click OK. (In the resulting window, you may need to scroll to the right to see the complete results.)
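For reference, roughly the same grouped summary can be requested from the program editor rather than through INSIGHT. This is a sketch and not a required part of the lab: it assumes sta110b.ex1 exists as created earlier, and the sorted copy work.ex1_sorted is a hypothetical name. The data are sorted first because BY-group processing expects sorted input.

```sas
/* Sort a copy by the grouping variable; the original is unchanged. */
proc sort data=sta110b.ex1 out=work.ex1_sorted;
  by when_buy;
run;

/* Summary statistics (mean, median, quartiles, ...) for each group.
   The PLOT option adds simple text plots, including a box plot. */
proc univariate data=work.ex1_sorted plot;
  by when_buy;
  var minutes;
run;
```

The results appear in the output window, with one block of statistics for each value of when_buy.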


Using statistical functions on your calculator

This section is optional. It is intended for those students who have calculators with built-in statistical functions, such as mean and standard deviation. This functionality is NOT necessary, but if you have it and want to use it, you need to make sure you are using it correctly. It is common to make mistakes when using a built-in function to calculate the standard deviation of a stored data set. Some calculators provide a function that calculates "sample standard deviation"; some provide a function that calculates the "population standard deviation". (Some provide both.) So far in this course, we have been interested in sample standard deviation, which is defined according to the formula

s = sqrt( (1/(n-1))*sum((x-avg(x))^2) )
Population standard deviation (which we haven't dealt with yet in this course) divides by n instead of n-1:
s = sqrt( (1/n)* sum((x-avg(x))^2) )
You can refer to the instructions/manual for your calculator to determine which your calculator is giving you. Or, try entering this simple data set of two numbers, and use the built-in function of the calculator in question to calculate the standard deviation.
0, 2
If the answer is 1.414214... (square root of 2), then you are indeed calculating the sample standard deviation, which is what we have been interested in so far in this course. If the answer is 1, then you are calculating the population standard deviation. In order to obtain the sample standard deviation from this, you need to multiply this result by the correction factor (as mentioned in the book when going from MSD to variance), which is
sqrt(n/(n-1))
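As a check on the arithmetic, the two-number test above can be worked by hand. For the data 0, 2 we have n = 2 and a mean of 1. The subscripts "sample" and "population" are labels added here for clarity and are not notation from the text.

```latex
\begin{align*}
\sum_i (x_i - \bar{x})^2 &= (0-1)^2 + (2-1)^2 = 2, \\
s_{\text{sample}} &= \sqrt{\frac{2}{n-1}} = \sqrt{\frac{2}{1}} \approx 1.414214, \\
s_{\text{population}} &= \sqrt{\frac{2}{n}} = \sqrt{\frac{2}{2}} = 1, \\
s_{\text{population}} \cdot \sqrt{\frac{n}{n-1}} &= 1 \cdot \sqrt{2} \approx 1.414214 .
\end{align*}
```

The last line shows the correction factor turning the population value back into the sample value.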


To log out from ACPUB

While the cursor is on the background (not in the xterm windows, netscape, or the windows corresponding to any other program), click the left button. From the menu that appears, select Logout.

To preserve the security of your account (including your files and password), you must log out!