A Summary of S-Plus Commands for Windows and Command Line
This page will be updated periodically, so be sure to "Reload" or "Refresh".
Please email sandra at stat.duke.edu with any corrections/updates.
Last modified: Sun Feb 22 13:48:48 EST 2004
For each assignment/project, create a directory in your Z: drive where you will store all Splus work for that assignment. This is critical for those who use cluster computers. When you start Splus, Splus will give you a window to specify a "Path" to the directory you just created. If Splus does not give you that option at startup, go to "File" - "Chapters" - "Attach/Create Chapter" and under "Chapter Folder" browse to the new directory you created for your assignment. Make sure that "Position" is set to "1". (As the semester wears on, those who don't carefully set up separate directories for homework problems may find their files corrupted or Splus may not function properly.)
Go to "Statistics" - "Data Summaries" - "Summary Statistics". Choose the variable name you would like to summarize, for example, "concentration". Click on the tab called "Statistics" and check these boxes, "mean", "standard error", "confidence interval for mean", "1st quartile", "median", "3rd quartile".
We'll look at summary statistics where "code" is used as a grouping variable. Go to "Statistics" - "Data Summaries" - "Summary Statistics", and under "Group Variables" choose "code". Summary stats will be printed out for each group.
Go to "Graph" - "2D Plot" - "Histogram". Under "X column", select the variable of interest.
Y axis of counts: Go to the "Options" tab and choose "Output type" "Counts".
Y axis of frequencies: Go to the "Options" tab and choose "Output type" "Freq". Make sure you understand what these two presentation methods say.
Add a title to your histogram by going to Insert Text.
For the boxplot, go to "Graph" - "2D Plot" - "Boxplot". Under "Y column", select the variable of interest.
Click on the y-axis label to add units of measurement.
Go to Graph - 2D plot. On the left, "Axes type" is "linear". On the right, "Plot type" is "boxplot (x, grouping optional)". Click "OK". For a variable called "choles", under "Y Column" select "choles" and click "OK" to produce a boxplot of all the cholesterol measurements. To boxplot the cholesterol data grouped by "urban" or "rural", select "choles" under "Y column" and select "code" under "X column". Refine the axis labels in your plots to include a descriptive variable name and the units of measurement.
Finding a confidence interval for the mean in Splus:
Go to Statistics - Data Summaries - Summary Statistics. Select your variable of interest under the section of the window Data and Variables. Now select the Statistics tab at the top of the pop-up window. Click the box next to Conf. Limits for the Mean and specify the Conf Level. Now click Apply or OK and the answer will be appended to the summary statistics.
Normal and t quantiles and probabilities
Use the command line in Splus by going to Window - Commands Window. A blank window will open with a prompt like this >.
To find the area under a standard normal curve to the left of a number c, type into the command window:
> pnorm(c)
Example:
> pnorm(1) (area or probability under normal curve to the left of 1)
[1] 0.8413447
To find the value of z on a standard normal curve that corresponds to a given area p under a normal curve (a given probability p) type into the command window:
> qnorm(p)
Example: What's the value of z that cuts off an area of .975 to the left?
> qnorm(.975)
[1] 1.959964
Similarly, for the t-distribution, suppose 19 degrees of freedom, then the commands are:
for quantiles:
> qt(.975,19)
[1] 2.093024 (useful as a multiplier in a 95% CI)
for the area under the t-distribution (probability) on 19 d.f. to the left of 1.67:
> pt(1.67,19)
[1] 0.944344
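The t multiplier above can be combined with a sample mean and standard error to build a 95% confidence interval by hand. The following is a minimal sketch, assuming a made-up sample of 20 measurements called "x" (not from any course dataset); it uses only standard commands and is written so that it should also run in R.

```r
# Hypothetical sample of 20 measurements (made-up numbers, for illustration only)
x <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0,
       4.6, 5.5, 5.1, 4.8, 5.2, 5.0, 4.9, 5.3, 5.1, 4.7)

n    <- length(x)             # 20 observations, so 19 degrees of freedom
se   <- sqrt(var(x) / n)      # standard error of the mean
mult <- qt(0.975, n - 1)      # t multiplier for a 95% CI, as in the qt example above

# Lower and upper confidence limits: mean minus/plus multiplier times standard error
ci <- mean(x) + c(-1, 1) * mult * se
ci
```

The result should agree with the "Conf. Limits for the Mean" output from the Summary Statistics dialog for the same data.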
Format data for 2-sample t-test and fixed effects ANOVA
If your data are grouped by a variable called "code", change the "code" variable to a factor by going to Data and Change Data Type. Select "code" variable and under "New Type" choose "factor".
2-sample t-test and Confidence Interval for Difference of two means
You need to have your data arranged with measurements in one column and a "code" variable in the other column that will determine group membership. For the Guatemalan dataset, the variable names are "choles" and "code" (indicating urban or rural).
STATISTICS - COMPARE SAMPLES - TWO SAMPLE - T-TEST
Variable 1: choles
Variable 2: code
Check the box "variable is a grouping variable"
Assume equal variances
Mean under the null: 0
Alternative: two-sided (or one-sided)
A confidence interval and the two means are printed out.
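The same test can also be run from the Commands Window. This is a sketch under stated assumptions: the numbers below are invented stand-ins for the "choles" and "code" columns (they are not the Guatemalan data), and the options mirror the dialog choices above (equal variances, null difference of 0, two-sided alternative).

```r
# Hypothetical data in the layout described above:
# one column of measurements and one grouping column (made-up values)
choles <- c(190, 210, 205, 180, 220, 160, 150, 175, 165, 155)
code   <- factor(c(rep("urban", 5), rep("rural", 5)))

# Split the measurements by group
urban <- choles[code == "urban"]
rural <- choles[code == "rural"]

# Pooled-variance two-sample t-test of "difference in means = 0"
fit <- t.test(urban, rural, alternative = "two.sided", mu = 0, var.equal = TRUE)
fit   # prints the t statistic, p-value, confidence interval and the two means
```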
One-way and Two-way Fixed Effects ANOVA
Go to Statistics and ANOVA and Fixed Effects. Dependent variable is your response or Y variable; and Independent variable is a code for treatment. Make sure you have changed this code to a FACTOR using "Data" "Change Data Type" "Factor". This will create the following formula in the formula window: "Y~code".
Two-way ANOVA is similar except the formula is something like "Y~code1+code2".
For both types of models you may wish to print out group means by clicking on "means" under the "RESULTS" tab in the ANOVA dialog box. Residual plots such as "Residuals vs. Fit" and "Normal QQ" can be found under "Plot" in the ANOVA dialog box.
Formula statements are as follows:
Interaction plots for 2-way ANOVA. These plots show the mean response at each level of Factor A, separately for each level of Factor B, and are very useful graphical depictions of interaction effects. Go to "Statistics" and "Design" and "Interaction Plot". Choose the dependent and independent variables. Click "Both orderings for each pair" if you want to see 2 plots, one with Factor A on the x-axis, and one with Factor B on the x-axis.
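For those who prefer the Commands Window, the ANOVA fits and the interaction plot can be sketched as follows. The dataframe "dat" and its values are invented for illustration; "Y", "code1" and "code2" follow the formula names used above. This is one possible command-line route, not necessarily what the menus run internally.

```r
# Hypothetical balanced two-factor data set (made-up numbers)
dat <- data.frame(
  Y     = c(10, 12, 11, 13, 20, 22, 15, 17, 16, 18, 30, 32),
  code1 = factor(rep(c("a", "b", "c"), c(4, 4, 4))),
  code2 = factor(rep(c("low", "low", "high", "high"), 3))
)

# One-way and two-way fixed effects ANOVA, using the same formulas as the dialog
one.way <- aov(Y ~ code1, data = dat)
two.way <- aov(Y ~ code1 + code2, data = dat)
summary(two.way)

# Mean response in each combination of the two factors
cell <- tapply(dat$Y, list(dat$code1, dat$code2), mean)
cell

# Interaction plot: code1 on the x-axis, one trace per level of code2
interaction.plot(dat$code1, dat$code2, dat$Y)
```

Roughly parallel traces suggest little interaction; traces that cross or diverge, as they do for these made-up numbers, suggest an interaction effect.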
Suppose you have a dataset called "autofilter", with variables measuring emissions and car size (small=1, medium=2, large=3). Create a subset of the emissions data for small car size:
DATA - SUBSET
Dataset: autofilter
Columns in subset: <ALL>
Subset Rows with: SIZE==1
Result type: Dataset
Save in: smallcars
Now a new spreadsheet window should pop up with the dataset called smallcars. You won't need the middle column, which is called SIZE, and has entries all equal to 1, so you can delete it:
How to remove a column/row of data:
Highlight the column (or row)
DATA - REMOVE
Remove: column or you can choose "row"
Calculating Power and Sample Size for a 2-sample t-test
Transformations of Data
In Splus, you will create a new column for a log transformed version of a variable. Call the variable (on its original scale) X. Go to "Data" and "Transform". Give the log transformed X a name in "Target Column". We'll call it log.X. In the box next to "Expression", type the desired transformation formula, in this case "log(X)". A new column called log.X is created in your dataframe with the log transformed values in it.
Create logit transformed data by creating a new column in Splus. Do this by going to "Data" and "Transform". Give your new variable a name in "Target Column". I'll refer to it as "newvar". In the box next to "Expression", type "log(p/(1-p))", where "p" stands for the name of the existing column of proportions (each strictly between 0 and 1). A new column is created in your dataframe.
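Both transformations can also be done from the Commands Window. The sketch below assumes a made-up dataframe "mydata" with a positive column "X" and a proportion column "p"; these names are placeholders, not columns from any course dataset.

```r
# Hypothetical columns: X is a positive measurement,
# p is a proportion strictly between 0 and 1 (made-up values)
mydata <- data.frame(X = c(1, 10, 100), p = c(0.1, 0.5, 0.9))

# Log transformation (natural log), stored in a new column
mydata$log.X <- log(mydata$X)

# Logit transformation: log of the odds p/(1-p), stored in a new column
mydata$logit.p <- log(mydata$p / (1 - mydata$p))

mydata
```

A proportion of 0.5 has logit 0, and proportions equally far above and below 0.5 have logits of equal size and opposite sign.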
Thanks to Susannah King for providing this command line code:
First save the residuals and fitted values from your regression model.
If your fitted and residual columns are fit1 and residuals1, and your indicator variable is SIZECODE:
plot(fit1,residuals1,type="n",xlab="Fitted",ylab="Residuals")
points(fit1[SIZECODE==0],residuals1[SIZECODE==0],pch="0")
points(fit1[SIZECODE==1],residuals1[SIZECODE==1],pch="1")
title("Residuals vs. Fitted")
abline(0,0)
Normal Probability Plots
Another plot of interest in residual analysis is a normal probability plot, or qq-normal plot.
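From the Commands Window, a normal probability plot of residuals can be sketched with "qqnorm" and "qqline". The data below are invented only so that there are residuals to plot; substitute your own saved residuals.

```r
# Hypothetical simple regression, used only to have some residuals to plot
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9)
fit <- lm(y ~ x)
res <- residuals(fit)

# Sample quantiles of the residuals against standard normal quantiles
qqnorm(res, ylab = "Residuals")

# Reference line through the first and third quartiles
qqline(res)
```

If the residuals are roughly normal, the points should fall close to the reference line.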
Multiple comparison procedures in ANOVA
First, fit your ANOVA model, and enter a model name under "Save Model Object".
Go to "Statistics" - "ANOVA" - "Multiple Comparisons".
Coded Scatter Plot
Coded Scatter Plot Directions
Fit a linear regression model
Once your data is loaded and in front of you on the Splus spreadsheet, go to "Statistics" - "Regression" - "Linear". Make sure the name of your dataset is entered and specify the dependent (Y) variable. Shift click to choose multiple explanatory (X) variables. Or you can enter the formula directly, "Y~X", for example.
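The same fit can be done from the Commands Window with "lm", using the formula exactly as it would be typed in the dialog. This is a minimal sketch with a made-up dataframe "mydata" containing columns "X" and "Y".

```r
# Hypothetical dataset with one explanatory and one response variable (made-up values)
mydata <- data.frame(X = c(1, 2, 3, 4, 5),
                     Y = c(3.1, 4.9, 7.2, 8.8, 11.0))

# Fit the linear regression of Y on X
fit <- lm(Y ~ X, data = mydata)

summary(fit)   # coefficient table, residual standard error, R-squared
coef(fit)      # intercept and slope only
```

For several explanatory variables, extend the formula, for example "Y~X1+X2".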
Regression Line Plot
Regression Line Plot Directions
Linear Correlation Coefficient
Linear correlation Coefficient directions
Polynomial Regression Models
Directions for fitting a second order polynomial regression model.
Plotting a fitted regression for log-transformed data on their original scale in Splus
Directions.
Creating a script file in Splus
You can create a script file in Splus (like a macro in Excel) as follows:
Go to "File" "New" "Script". Type each command on separate lines in the file. To run the file, press "F10", and the output will be produced in the lower window.
Coding an indicator variable in Splus
There are 2 ways to code an indicator variable in Splus.
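As an illustration from the Commands Window, here are two common ways to build a 0/1 indicator. This is a sketch only: the "group" column and its labels are made up, and these are not necessarily the two methods this page originally had in mind.

```r
# Hypothetical grouping column with two labels (made-up values)
group <- c("urban", "rural", "rural", "urban", "rural")

# Approach A: a logical comparison converted to numbers (TRUE -> 1, FALSE -> 0)
ind1 <- as.numeric(group == "urban")

# Approach B: ifelse gives 1 where the condition holds and 0 elsewhere
ind2 <- ifelse(group == "urban", 1, 0)

ind1
ind2
```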
Coded Scatterplot with Fitted Regression Lines Superimposed: Command line instructions
The following commands will produce a plot for a dataset, "data.asc", in which the columns are Y, X1 and code. Let "X1" and "Y" be continuous variables and let "code" be an indicator variable. We assume that "code" is coded as a "0" or a "1". The first command, "attach", allows you to refer to the variable names in the dataframe directly (the dataframe is called "data" in the commands below). Without it, the Splus command line only knows about the dataframe itself but not the names of the variables in it.
attach(data)
plot(X1,Y,type="n",xlab="XLABEL",ylab="YLABEL")
points(X1[code==0],Y[code==0],pch="0")
points(X1[code==1],Y[code==1],pch="1")
title("Whatever title you want")
Now you want to plot the regression lines for each level of the indicator variable. First, you'll need to calculate the slope and intercept of the lines you will add. To add a line to your plot, use the command "abline(intercept,slope)". So let's add a line to our plot with intercept 1.5 and slope 2.
abline(1.5,2)
You can add each line in a similar way.
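One way to obtain the intercept and slope for each level of the indicator is to fit a separate regression within each group and pass the coefficients to "abline". The sketch below assumes a made-up dataframe "mydata" with the columns Y, X1 and code described above; fitting separate lines per group is an assumption about the model you want, so adjust it if your model forces equal slopes.

```r
# Hypothetical data in the layout described above (columns Y, X1, code; made-up values)
mydata <- data.frame(
  X1   = c(1, 2, 3, 4, 1, 2, 3, 4),
  Y    = c(2, 4, 6, 8, 5, 6, 7, 8),
  code = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Separate fits for each level of the indicator
fit0 <- lm(Y ~ X1, data = mydata, subset = code == 0)
fit1 <- lm(Y ~ X1, data = mydata, subset = code == 1)
b0 <- coef(fit0)   # intercept and slope for code 0
b1 <- coef(fit1)   # intercept and slope for code 1

# Coded scatterplot, then one line per group
plot(mydata$X1, mydata$Y, type = "n", xlab = "X1", ylab = "Y")
points(mydata$X1[mydata$code == 0], mydata$Y[mydata$code == 0], pch = "0")
points(mydata$X1[mydata$code == 1], mydata$Y[mydata$code == 1], pch = "1")
abline(b0[1], b0[2])            # solid line for code 0
abline(b1[1], b1[2], lty = 2)   # dotted line for code 1
```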
Add a legend:
legend(xcoord,ycoord,legend=c("Code=0","Code=1"),pch="01")
Note that you need to enter the xcoord and ycoord yourself. This is the point that marks the upper left hand corner of the legend box.
You can always change the range of the X1 and Y axes by typing, for example:
xlim=c(100,200),ylim=c(300,400)
as extra arguments, separated by commas, in the plot statement.
Getting Fancy: You can make a dotted line by adding "lty=2" to the abline command, so that it reads abline(1.5,2,lty=2).
Labeling points on a graph with text
Here is how to make a scatterplot where the points are the observation numbers:
plot(x, y, type="n")
text(x, y)
### this will type the observation number (instead of a dot) at each point.
Now plot the points with the observation number above them:
plot(x,y)
text(x,y+.01)
### the "+0.01" will add a tiny bit to the y coordinate so that the number appears above the point. You have to decide what this tiny bit is -- it will depend on the scale of the y-axis.
You can add text at any place on a graph you want. If I want to add "function 1" at x=.3 and y=.7, I would type:
text(.3, .7, "function 1")
Matrix of scatterplots for multiple variables.
Go to "Graph" - "2D Plot". The menu is divided into the left "Axis Type" side and the right "Plot Type" side. Under "Axis Type" choose "Matrix". Under "Plot Type" choose "Scatter plot Matrix". You will control-click on each variable you would like included in the matrix.
Plotting a fitted model for a nonlinear function on the original scale of the data (command line).
Let's say you have just fit the model log(p/(1-p))~log(d).
Write this equation in terms of p as follows, with X standing for the explanatory variable d: p = ( exp(beta0) * X^beta1 ) / ( 1 + exp(beta0) * X^beta1 ).
Create a new column in Splus using a transformation of data. In the "Expression" box, type: ( exp(beta0) * X^beta1 ) / ( 1 + exp(beta0) * X^beta1 ). Of course you will substitute estimates of beta0 and beta1 directly into the expression. We will name this new column Yhat for this example.
Follow the command line directions in Coded Scatterplot with Fitted Regression Lines Superimposed to create a scatterplot of Y against X on their original scale. Perhaps it will be a scatterplot coded according to some grouping variable. Now, to add the line (instead of an "abline"), type
lines(X,Yhat)
Line types can be specified using "lty=" a number from 1 to 10 (1 is a solid line).
lines(X,Yhat,lty=2)
If you have added multiple lines, a legend can be produced as follows:
legend(xcoord,ycoord,legend=c("Code=1","Code=0"),lty=1:2)
This will produce a legend with a solid line (lty=1) representing Code=1 and a dotted line (lty=2) representing Code=0. Note that you need to enter the xcoord and ycoord yourself. This is the point that marks the upper left hand corner of the legend box.
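The whole back-transformation can be sketched in the Commands Window as follows. The vectors "d" and "p" are invented for illustration. Note one detail that is easy to miss: "lines" joins the points in the order given, so the values are sorted by the x variable first.

```r
# Hypothetical data: d is the explanatory variable, p an observed proportion (made-up values)
d <- c(1, 2, 4, 8, 16, 32)
p <- c(0.08, 0.15, 0.30, 0.48, 0.70, 0.84)

# Fit the model on the transformed scale: log(p/(1-p)) ~ log(d)
fit   <- lm(log(p / (1 - p)) ~ log(d))
beta0 <- coef(fit)[1]
beta1 <- coef(fit)[2]

# Back-transform the fitted values to the original scale of p
Yhat <- exp(beta0) * d^beta1 / (1 + exp(beta0) * d^beta1)

# Scatterplot on the original scale with the fitted curve superimposed
ord <- order(d)                    # sort so the curve is drawn left to right
plot(d, p, xlab = "d", ylab = "p")
lines(d[ord], Yhat[ord], lty = 2)
```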
Saving residuals and/or fitted values from a regression model.
Under "Statistics" "Regression" "Linear", select the "Results" tab in the "Linear Regression" pop-up box. Under "Saved Residuals", check the box labeled "residuals" and under "Save In", choose your dataset (or some other dataset if necessary.) See also Saving and referring to components of a fitted model object, noted below.
Saving and referring to components of a fitted model object.
Fit the model of interest, say, "flowers~time + intensity". Also in the same window you will see "Save Model Object" "Save As". Save the model object as "mymodel".
Once you have saved your model you can go to the command line and type "summary(mymodel)". Compare this result to that produced by the point-and-click regression fitting method. Another useful command for an anova table is "summary(aov(mymodel))" Other options in the command line include: "mymodel$coefficients", "mymodel$fitted.values", "summary(mymodel)$r.squared", "summary(mymodel)$sigma", "sqrt(sum(mymodel$residuals^2)/mymodel$df.residual)".
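To see how these pieces fit together, the sketch below fits a model from the Commands Window and checks that the last expression above reproduces the residual standard error reported by "summary". The dataframe and its values are made up; only the formula "flowers~time + intensity" comes from the text.

```r
# Hypothetical data with the variable names used above (made-up values)
mydata <- data.frame(
  flowers   = c(62, 77, 55, 54, 49, 62, 39, 45, 31, 44),
  time      = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  intensity = c(150, 150, 300, 300, 450, 450, 600, 600, 750, 750)
)

mymodel <- lm(flowers ~ time + intensity, data = mymodel.data <- mydata)

mymodel$coefficients           # estimated intercept and slopes
summary(mymodel)$r.squared     # R-squared

# Residual standard error two ways: from the summary, and by hand
s1 <- summary(mymodel)$sigma
s2 <- sqrt(sum(mymodel$residuals^2) / mymodel$df.residual)
c(s1, s2)                      # the two values should agree
```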
ESS F-tests
Let's say that you have 2 nested models that you wish to compare. To use the bee pollen data, consider these models:
Save the fitted regression models. (see above) Call the first "equal.lines.lm", and the second "sep.lines.lm". Go to "Statistics" and "Compare Models". Select "Model Class" as "lm", which means you are looking among linear models. Go to the "Model Objects" drop down menu, and shift-click on the 2 models. Make sure that "F" is checked and hit OK. You'll see the F-statistic of interest.
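From the Commands Window, two nested "lm" fits can be compared with "anova". The sketch below is an illustration only: the data are made up (not the bee pollen data), and the two formulas are assumptions inferred from the model names, namely a single line for both groups versus separate lines for each group.

```r
# Hypothetical data with a continuous X and a 0/1 indicator (made-up values)
mydata <- data.frame(
  Y    = c(2.1, 3.9, 6.2, 8.1, 3.0, 6.1, 8.9, 12.2),
  X    = c(1, 2, 3, 4, 1, 2, 3, 4),
  code = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Reduced model (assumed): one line for both groups
equal.lines.lm <- lm(Y ~ X, data = mydata)
# Full model (assumed): separate intercept and slope for each group
sep.lines.lm <- lm(Y ~ X + code + X:code, data = mydata)

# Extra-sum-of-squares F-test comparing the two fits
tab <- anova(equal.lines.lm, sep.lines.lm)
tab

# The same F statistic by hand: (drop in residual SS / extra parameters) / (full-model residual MS)
rss0  <- sum(residuals(equal.lines.lm)^2)
rss1  <- sum(residuals(sep.lines.lm)^2)
Fstat <- ((rss0 - rss1) / 2) / (rss1 / sep.lines.lm$df.residual)
Fstat
```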