Lab 1

Objectives: Introduce S-Plus 2000, import data, and produce graphical and
tabular summaries as in Section 2.4, Further Applications, and in
the exercises of Chapter 2 of the
text, Principles of Biostatistics (POB). You will not be graded,
but it is in your best interest to do the labs, as this sort of
material may show up on graded assignments and exams. Besides, this
is fun! The lab may be a bit long; don't worry if you can't finish all
of it during the lab hour.
The following instructions assume that you are using S-Plus 2000 in one of the computer labs on campus. If you have any questions about how to do the assignment at home, do not hesitate to ask your TA or post a question on the Blackboard (Course Info) Discussion Board!

Preliminaries: Data sets in Excel format for all the exercises are in the folder Excel/exercise, and most of the data used in the chapters of the text book can be accessed from subdirectories of Excel/ (for Excel format), both on the CD-ROM that came with the text book. Alternatively, they can be downloaded from the POBdata folder on the web site. In this lab, we will cover how to import Excel files into S-Plus. First save the file birth weights.xls onto your computer. Do the same for france.xls and health care expend by gdp.xls. Note: you may want to save these files to your own floppy disk so that you can continue working on another computer later (if need be). Remember where you saved them!

Starting S-Plus: Start up S-Plus using the Start menu. The program may be listed under Programs > Statistics & Mathematics > S-PLUS 2000. Alternatively, you may be able to use the Run... dialog box to start S-Plus 2000 by typing "Splus" and pressing the OK button. If you have questions, please check with the TA.

Creating your Personal Workspace (optional): You can create your own workspace directory for saving your class work; this will help if you need to move files to different computers. See Chapter 7 of the S-Plus 2000 User's Guide via the S-Plus online help menu (Help > Online Manuals > User's Guide) or follow the instructions here for creating a workspace.

Importing Data: The next thing we need to do is read in the data. S-Plus stores data in objects called "dataframes" (in your workspace). A dataframe is like a table of numbers, with columns corresponding to variables and rows corresponding to observations.
Data in dataframes can be continuous, discrete or categorical (we'll discuss types of data in lecture, or ask your TA). When you start S-Plus, a "Select Data" window may be open. Select "Import data", and click "OK". If the window did not open at startup (or you need to create a new dataframe later in the session), go to the File menu and select Import Data > From file....

In the "Import Data" window, select "Microsoft Excel Files (*.xl*)" in the Files of Type scrolling menu. Enter or browse to select the file birth weights.xls that you previously downloaded from the course web site (or from the CD-ROM included with the text book). The default name for the dataframe will be birth.weights; you may change it if you prefer (just be sure to use your new name in place of birth.weights in any commands below). Click on Open (or hit Enter). You should see a "spreadsheet" that represents the dataframe birth.weights with 2 columns (weight and number) and 11 rows.

We need to reformat the data before we proceed. Select the weight column with the mouse. Select Format and then Format Selected Object(s).... In the "Factor Column" dialog box, change the order of the Factor Levels: so that they are in increasing numerical order (not alphabetical order, as S-Plus has arranged them). Note that you merely have to move the "500-999" factor level from the second-to-last position to the second position. Make sure you get the quotes and commas right! Click OK. Don't worry about the term "factor"; factor levels are simply categorical data labels. You won't see any change on the spreadsheet, but the table we create below will now be in the right order.

Introduction to Tabular Summaries:

Crosstabulations: Here, we'll try to reproduce Tables 2.10 and 2.11 in Section 2.4 of the text book. First, go to Options > General Settings... and click the Computations tab on the General Settings dialog box. Set Print Digits: 9 and press OK.
Now, to create a frequency distribution (since we're creating a table, we'll also call it a frequency table) of the number of infants in each of the categories of weight in the birth.weights dataframe, go to the Statistics menu and select Data Summaries and then select Crosstabulations.... In the Crosstabulations dialog box select Data Set: birth.weights. Also, select Variables: All, and Counts Variable: number. Click on the Options tab and uncheck the Show Marginal Totals and Run Chi-Square Test options (we may discuss this test later in the course). Also, set Decimal Places: 1. Then click OK.

You should see a table in the Report Window that contains the same information as Tables 2.10 and 2.11 in your text book, except that we see proportions whereas Table 2.11 shows percentages. Don't worry about the "Call:" formula; it will be familiar to those who run S-Plus without menus (i.e., from the command line). Can you decipher the output? What kind of data are we using here (nominal, ordinal, discrete, continuous, ...)? In 1986, what proportion of infants born in the US had birth weights between 500 and 1499 grams? If you were to "randomly" select one infant from the same group of infants, what is the "chance" that the infant weighs 4500 grams or more?

If you were to use this table in a formal report (as in a class assignment), you would include an informative title and column labels as in Tables 2.10 and 2.11. Note that Figure 2.15 in your text book shows the same information in graphical form (using percentages instead of proportions).

Introduction to Graphical Summaries:

Stacked Bar Chart: Next, we'll create a stacked bar chart like that in Figure 2.14 of your text book using the data in france.xls. As with the birth weights.xls file, we need to import the data into S-Plus. Go to the File menu and select Import Data > From file.... In the "Import Data" window, select "Microsoft Excel Files (*.xl*)" in the Files of Type scrolling menu.
Enter or browse to select the file france.xls that you previously downloaded from the course web site (or from the CD-ROM included with the text book). Click the Options tab and set Name Row: 1 to indicate that the first row of the Excel file contains names instead of data. Press Open. The default name for the dataframe will be france. Notice that some of the column names may not have imported correctly. They should read something like "year", "stillborn", "0-6 days", "7-27 days", and "28-365 days", but S-Plus has naming conventions that do not allow some of these names to be used. We won't worry about this now. Leave the names as they are.

Now let's make a stacked bar chart. Choose Graph and then 2D Plot.... In the "Insert Graph" dialog box, choose Plot Type: Bar - Stacked (x,y1..yn) and Axes Type: Linear. Then click OK. In the "Bar Plot" dialog box choose Data Set: france and x Columns: year, and for y Columns: include all remaining columns. Then click OK. You should see a plot similar to Figure 2.14 in your text book. Add a legend by going to the Insert menu and selecting Legend.... Press OK on the Legend dialog box. A legend appears on the plot. You may change the legend names by double-clicking on them to make the "Legend Item" dialog box pop up. On the Text tab change Text: accordingly. You can change axis labels in a similar manner.

Box Plot: Let's reproduce the box plot in Figure 2.16. First import the file health care expend by gdp.xls into S-Plus. Go to the File menu and select Import Data > From file.... In the "Import Data" window, select "Microsoft Excel Files (*.xl*)" in the Files of Type scrolling menu. Enter or browse to select the file health care expend by gdp.xls. Click the Options tab and set Name Row: 1 and Name Col: 1. Press Open. We need to change the percent column to numerical data (S-Plus imports it as "factor" data, i.e., categorical data). Right-click on the percent column of the spreadsheet with your mouse and select Change Data Type....
First, change New Type: to character (yes, character). Press OK. Now, change the data type to double in the same way. (This is a circuitous way of getting S-Plus to treat percent as numbers (double precision).) Note the "NA" in row 22. This means "Not Available".

Now let's make the box plot. Go to the Graph menu and select 2D Plot.... In the "Insert Graph" dialog box, choose Plot Type: Box Plot (x, grouping-optional) and Axes Type: Linear. Then click OK. In the Box Plot dialog box, set y Columns: percent. Then press OK. You should see a plot similar to Figure 2.16 in your text. What type of data are these?

Titles: To add a title to any graph, go to the Insert menu and select Titles > Main, then add an appropriate informative title in the box that appears with "@auto". To include information on the source of the data, repeat using Subtitle in place of Main. Reposition titles, subtitles, and axis labels as needed by dragging with the mouse.

Importing Graphs to a Word Processor: To paste the graph into a word processor such as Word, make sure the Graph Sheet is the active window, then click on the icon of the clipboard with a graph (at the far right of the second row of toolbar buttons at the top of the S-Plus window; if you hold the mouse over it you should see "Send Graph to Other App"). A dialog box should appear indicating that the graph has been sent to the clipboard. Go to your word processor and click on the clipboard to paste the graph into your document. Resize as needed.

Quitting: Go to the File menu and select Exit. Save any data sets or graphs as needed.

Remarks: As you can see, sometimes things don't work out exactly the way you'd expect the first time you try to do something in S-Plus (e.g., Did the data type import correctly? Are the names correct?). This is almost the rule rather than the exception, at least until you become proficient at S-Plus. Even then, your first attempt is not always successful. Keep trying and ask questions!
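If you would like to see what the box plot is summarising, the short Python sketch below (Python, not S-Plus) computes a five-number summary while skipping a missing value, much as the "NA" in the percent column is left out of the plot. The percentages are made up for illustration and are not the values in health care expend by gdp.xls; the function name five_number_summary is also just an illustrative choice. Quartile conventions differ between packages, so these quartiles need not match the box edges S-Plus draws.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), ignoring missing values (None)."""
    # Drop missing entries, the way an "NA" is left out of a box plot.
    observed = sorted(v for v in values if v is not None)
    if len(observed) < 2:
        raise ValueError("need at least two non-missing observations")
    # The "inclusive" method interpolates between the observed data points;
    # other packages may use a slightly different quartile rule.
    q1, median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return observed[0], q1, median, q3, observed[-1]


# Hypothetical percentages (NOT the data from the lab file); None marks "NA".
percent = [6.0, 7.5, None, 8.0, 9.0, 10.5, 12.0]
print(five_number_summary(percent))
```

The box in a box plot spans Q1 to Q3 with a line at the median; how far the whiskers extend depends on the software's settings.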
For the Energetic Student

As you can see from the menus, there are lots of options in S-Plus beyond frequency tables, stacked bar charts and box plots. The other types of plots discussed in Chapter 2, such as line graphs and two-way scatter plots, can also be created easily in S-Plus. Try creating the plots below.

Two-Way Scatter Plot: Read in the data for exercise 19 from the file cigarett. For exercise 19 part (c), create a scatter plot of tar versus nicotine concentration from the Graph menu using 2D Plot... > Scatter Plot (x, y1, y2...). In the dialog box, select the variables to go on the y and x axes. (Does it matter here which way they are arranged?) Click OK. Replace labels and add a title.

Line Graph: Read in the data for exercise 20 from the file brate. Select Graph > 2D Plot... > Line Graph, then click on OK. Select year for the x Column and birthrt for the y Column; click OK. Relabel the axes in English and add a title. Why is there a slight drop in the 1970s?

Bar Chart: For exercise 10, go to File > New > dataset to create a new dataframe; enter the two columns for Year and Cases. By default S-Plus will store variables as double precision; to change any to integers, go to the Data menu and select Change Data Type.... In the pull-down menu for New Type, select integer. Select Year in the Column field, then click Apply; repeat for Cases. To make the bar graph, go to Graph > 2D Plot... > Bar with Base at Zero (x, y). Click on OK. Select Year for the x Column and Cases for the y Column; click on OK. Edit as needed.

Another Stacked Bar Chart: For exercise 12, create a dataframe with 5 columns: Age, whitemen, blackmen, whitewomen, blackwomen. Select Graph > 2D Plot... > Bar - Stacked (x, y1, y2...), then click on OK. Select Age for the x Column, then select whitemen, blackmen, whitewomen, blackwomen for the y Columns, using Control-click to select multiple columns to stack. Click on OK and then edit as needed.
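Checking a frequency table by hand: the energetic student may also want to verify the proportions from the Crosstabulations section. The Python sketch below (Python, not S-Plus) turns counts into proportions and percentages and adds up two adjacent categories. The category labels and counts are hypothetical and are not the values in Table 2.10; relative_frequencies is an illustrative name only.

```python
def relative_frequencies(counts):
    """Map each category to its share of the total count (a proportion)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total count is zero")
    return {category: n / total for category, n in counts.items()}


# Hypothetical counts per weight class in grams (NOT the values in Table 2.10).
counts = {"0-499": 1, "500-999": 4, "1000-1499": 5, "1500-1999": 10, "2000+": 80}

props = relative_frequencies(counts)

# Proportion of infants between 500 and 1499 grams: add the two categories.
low = props["500-999"] + props["1000-1499"]

# A percentage is simply the proportion multiplied by 100.
percents = {category: 100 * p for category, p in props.items()}

print(round(low, 3))
print({category: round(p, 1) for category, p in percents.items()})
```

The proportions always sum to 1 (and the percentages to 100), which is a quick check that no category has been left out.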