## STA215 Statistics: Computing

Book: Phil Spector, An Introduction to S and S-Plus (optional). Notes: Venables et al., S-Plus Documentation.

Quick links and FAQs on S-Plus, graphics, printing, etc:

S-Plus is an interactive environment for graphics and scientific computation with a range of statistical modelling and analysis tools. We'll use S-Plus on the Acpub unix machines. Go to OIT's acpub unix pages if you're not already familiar with the basics of unix workstations and the public clusters.

To get started, here are a few S-Plus commands to read in the VA data set. First download the commands and the data file from the Data Sets link; see the walk-through information in Lab 1. Learn S-Plus by doing. Once you get beyond the basics, S-Plus has a comprehensive on-line, window-based help system; see the cheat sheet.

The best way to operate with S-Plus is from within the emacs editor. This allows you to easily re-use and edit S-Plus commands you used earlier, and does not require much knowledge of emacs. We'll support this in the course. Click on the "S-Plus within emacs" link on the list above to get going, and then follow the info on Lab 1.

SOME S-PLUS BASICS AND COMMONLY USED COMMANDS

```
A.  STARTUP AND ENVIRONMENT

Splus               to start up the Splus system
motif()             to open a graphics window for plots on screen
help.start()        to open the help window system
help.start(gui="motif")   if plain help.start() fails, you may need this form

B.  INPUT AND OUTPUT

x <- scan("filename")              to read numbers from a file
source("sfile.s")                  to read Splus code from a file
sink("file")                       to write everything that follows to a file
sink()                                      ... and back to the screen
write(..)                          write to file -- check help
write.table(..)                    write to file -- check help

C.   ORGANIZING DATA

x <- matrix(x,ncol=5,byrow=T)    reshapes x to a 5-column matrix, by row
x <- matrix(x,12,5)              reshapes x to a 12x5 matrix, by column
x <- matrix(scan("filename"),12,5)   etc
x <- c(x,c(1,10,y),1:10)         c() means "catenate" to create vectors

D.   RANDOM VARIATE GENERATION AND DISTRIBUTIONS

sample()                  to sample with or without replacement from a set

dnorm()                   normal pdf
pnorm()                   normal cdf
qnorm()                   normal quantile function, inverse of pnorm()
rnorm()                   generate normal random variates
?pnorm or help(pnorm)     Give syntax help for pnorm() etc.

Others: ?dist, where ? is one of d, p, q, r (e.g. pchisq(), qt(), runif()),
and dist and its parameters are:

  dist     Distribution    Parameters        Defaults
  ------   ------------    --------------    --------
  beta     beta            shape1, shape2    -, -
  cauchy   Cauchy          loc, scale        0, 1
  chisq    chi-square      df                -
  exp      exponential     rate              1
  f        F               df1, df2          -, -
  gamma    gamma           shape             -
  norm     normal          mean, sd          0, 1
  t        Student's t     df                -
  unif     uniform         min, max          0, 1

E.   PLOTTING

par(mfrow=c(3,2))               Next 6 plots are laid out in 3 rows of 2
par(....)                       Many arguments to control display, such as
    mfrow=c(1,1)                  plot layout on screen/page
    bty="n"                       no frame drawn around graph
    ...                           see "help(par)" for others

hist(x,nclass=25,prob=T)        histogram with about 25 "classes" (bins),
                                  normalized as a pdf (i.e. with area 1)
plot(x,y)                       scatter plot (points with coord x,y)
plot(x,y,type="l")              line plot    (connect-the-dots)
plot(x,y, ...)                  many arguments to control display, such as
    type="l"                      for lines
    xlim=c(0,10)                  horizontal range of plot
    ylim=c(0,1)                   vertical     "    "   "
    xlab="here is a label on the x-axis"
    ylab=...                      label on the y-axis
    col=2                         use second color (could be 1-15)
    lty=3                         use third (broken) line type (many available)
    lwd=2                         twice the usual line width
lines(x, y)                     add lines  to an existing plot
points(x, y)                    add points to an existing plot

tsplot(x)                       "Time Series" Plot a vector vs 1,2,...
qqnorm(x)                       normal quantile-quantile plot
boxplot(x, ..)                  boxplot

mtext(side=3, line=0, cex=2, outer=T,
      "This is an Overall Title For the Page")
                                overall page title; it goes in the outer
                                  margin, so first make room with e.g.
                                  par(oma=c(0,0,3,0))

F.   ASSIGNMENT AND BASIC ARITHMETIC OPERATORS

<-    Assignment (note: S-Plus does NOT use "=" for assignment)
+    Add
*    Multiply
-    Subtract
/    Divide
^    Exponentiate

G.  SEQUENCE AND REPETITION

x <- 1:50                              1,2,3,...,49,50
x <- seq(1,50,by=10)                   1,11,21,31,41
x <- seq(1,50,length=50)               1,2,3,...
x <- seq(0,1,length=101)               0,0.01,0.02,0.03,...,0.99,1.00
x <- rep(y,10)                         y,y,y,y,y,y,y,y,y,y

H.  SUBSCRIPTS

[ ]    Vector subscript                x[3]<-101;  y <- x[20:1]+1
[,]    Matrix subscript                x[1,5]<-0;  y[1:10,]

I.   RELATIONAL OPERATORS

==   Equal-to                          (4==7) is F (0); (1+2==3) is T (1)
!=   Not-equal-to                      (4!=7) is T (1); (1+2!=3) is F (0)
<    Less-than
<=   Less-than-or-equal-to
>    Greater-than
>=   Greater-than-or-equal-to

J.   CONDITIONALS

if (i==1) x<-10
if (i>0) { x<-10; y<-20 } else x<-y<-0     "else" must be on the same line as "}"

K.   ITERATION
for (i in 1:10)  { x[i]<-i; y<-c(i,y); ... }

L.  ARITHMETIC OPERATORS AND FUNCTIONS

abs(x)                                 Absolute value
cos(x), sin(x), acos(x), etc           Trig functions
exp(x)                                 Exponential function
gamma(x)                               Gamma function; (x-1)! for integer x>0
log(x)                                 Default base is e, not 10
log(x, base=exp(1))                    Optional argument sets base
max(...), min(...)                     Maximum, minimum
mean(x)                                Sample average
median(x)                                     and median
mode(x)                                Storage mode of x ("numeric", etc.),
                                         NOT the most frequent value
var(x)                                        and variance
summary(x)                             Gives summary statistics
quantile(x, probs=c(0,.25,.5,.75,1))   Your choice of quantiles
var(x,y)                               Covariance, with 2 arguments
cor(x,y)                               Correlation coefficient

```
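
The card lists one command per line. The short session below strings several of its sections together (B, C, D, G, H, I, J and K) so you can try them in one go. It is a sketch, not part of the course materials: it sticks to basic S commands from the card, so it should run in S-Plus, and it also runs unchanged in R, a later and largely compatible implementation of the S language. The file name nums.txt and all variable names are made up for illustration, and the simulated values will differ from run to run.

```r
# Input and reshaping (sections B and C); nums.txt is a hypothetical file
write(1:12, file="nums.txt")        # write 12 numbers to a file
v <- scan("nums.txt")               # read them back as a vector
vm <- matrix(v, 4, 3)               # 4x3 matrix, filled by column

# Sequences and subscripts (sections G and H)
x <- seq(1, 50, by=10)              # 1 11 21 31 41
m <- matrix(1:6, ncol=3, byrow=T)   # filled by row: rows 1 2 3 and 4 5 6
first.row <- m[1, ]                 # 1 2 3
rev.x <- x[5:1]                     # 41 31 21 11 1

# Relational operators give logical values (section I)
big <- x > 20                       # F F T T T (R prints FALSE/TRUE)
n.big <- sum(big)                   # logicals count as 0/1 in arithmetic: 3

# Iteration and conditionals (sections K and J)
total <- 0
for (i in 1:length(x)) {
    total <- total + x[i]           # running total; same as sum(x)
}
if (total > 100) { flag <- 1 } else flag <- 0   # total is 105, so flag is 1

# Distributions (section D)
p <- pnorm(1.96)                    # normal cdf at 1.96, about 0.975
z <- qnorm(p)                       # quantile function undoes pnorm: 1.96
h <- dnorm(0)                       # pdf at 0, 1/sqrt(2*pi), about 0.399
y <- rnorm(1000, mean=10, sd=2)     # 1000 simulated normal variates
ybar <- mean(y)                     # should be close to 10
ysd <- sqrt(var(y))                 # should be close to 2
pu <- punif(0.3)                    # uniform on (0,1): P(U <= 0.3) = 0.3
pe <- pexp(1, 1)                    # exponential with rate 1: 1 - exp(-1)
die <- sample(1:6, 10, replace=T)   # 10 rolls of a fair die
```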

In your home directory on acpub you may have a file called .emacs already; if not create one with an editor and simply add the line

If the file is already there, add the above line at the end. This will set you up to run S-Plus inside emacs. Try it:

• Type emacs & at the unix prompt to fire up emacs
• When it is up and running, type Esc x followed by Shift S (a capital S), then Return. Emacs will probably ask you for the name of a directory to run S-Plus in; just hit Return and off you go. (On some keyboards Esc x can be replaced by Alt x, which is faster and easier to type.)

First you should specify a default printer in the graphics window in S-Plus, as follows.

In the Soc 133 cluster, the two printers are imaginatively named soclp1 and soclp2.

In order to directly print the displayed graph in a motif() graphics window in S-Plus, go to the Options menu on the motif window and select "Printing...". In the window that comes up there is a command line: type "lpr -Psoclp2" to select soclp2 as the default printer for all future graphs. Then click on the "Apply" button, and then the "Save" button, and then close the window.

From here on, clicking the "Print" selection on the motif() window will print the displayed graph to that printer.

You can save graphs in postscript files for later printing. For example,

```
postscript(file="somefilename.ps")
plot(prior,post)
# ... more plot commands here
dev.off()
```

creates a file called somefilename.ps in your directory. All graphs drawn between postscript() and dev.off() go into that file instead of onto the motif() display.

Then, from a unix shell (for example in an xterm window), you can print with either of

```
lpr -Psoclp1 somefilename.ps
lpr -Psoclp2 somefilename.ps
```

Working on your own PC you can ftp postscript files for viewing and printing in your room.
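
Putting the postscript() route together with the plotting commands from section E of the card, a sketch like the following should write a one-page file with two panels and an overall title. The file name example.ps and the title are hypothetical. Besides commands listed on this page it uses par(oma=...), which reserves an outer margin at the top so the mtext() title has room; the code also runs in R.

```r
y <- rnorm(100)                         # some data to plot

postscript(file="example.ps")           # open the file: graphs now go there
par(mfrow=c(2,1), oma=c(0,0,3,0))       # 2 rows of 1 plot; 3 lines of outer margin on top
hist(y)                                 # top panel
qqnorm(y)                               # bottom panel
mtext(side=3, line=0, cex=2, outer=T,
      "Simulated normal sample")        # overall title in the outer margin
dev.off()                               # close the file
```

You can then print example.ps with lpr exactly as above.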

You may have two or more motif() windows and postscript() files open at the same time. S-Plus calls these graphics devices and manages them with the dev commands. Device number 1 is reserved for the "null device" (meaning no graphics device is open), so the first motif() window you open is device 2. Open another motif() window and that is device 3; open a postscript() file next and that is device 4, and so on. You can then switch between devices with the dev.set() command, whose argument is just the device number, to draw graphs on any one motif() screen, save them to a file, etc. Graphs always go to the "current" device, which is the one most recently opened or selected. Here is an example; as usual, everything after a # sign is a comment ignored by S-Plus.

```
motif()                      # opens motif device, number 2 by default
plot(...)                    #   and plots on motif device 2
postscript(file="a.ps")      # open postscript file a.ps, now device 3
plot(...)                    #   and draw something there
dev.set(2)                   # switch back to the motif screen
plot(...)                    #   and a new plot there
dev.set(3)                   # switch back to the a.ps file
plot(...)                    #   add a plot there
motif()                      # open another motif screen, now device 4
plot(...)                    #   and draw there
dev.set(2)                   # back to the first motif screen
dev.off(3)                   # closes device 3 -- here the postscript file
#    ...
#    etc
```

As usual, explore the on-line help file (search by keyword dev) for more information.
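
To check which device is current, S has dev.cur(), and dev.list() gives the numbers of all open devices. The sketch below uses two postscript() files instead of motif() windows so it needs no screen; it assumes no other graphics device is open when you start, and the file names a.ps and b.ps are just examples. It should work in S-Plus (check help(dev.cur) there) and it also runs in R.

```r
postscript(file="a.ps")        # first device opened: number 2
postscript(file="b.ps")        # second: number 3, and now current
open.devs <- dev.list()        # numbers of all open devices: 2 3
dev.set(2)                     # make a.ps the current device again
now <- dev.cur()               # 2
plot(1:10)                     # drawn into a.ps, not b.ps
dev.off(3)                     # close b.ps
dev.off()                      # close the current device, a.ps
```

After the last line no graphics device is open, so dev.cur() reports device 1, the null device.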

In S-Plus with a graph displayed, enter the command

```
locator(1)
```

then click the left mouse button somewhere on the graph. You'll see the x and y coordinates of that point returned.

You can also save the result under a name, say myvec, via

```
myvec <- locator(1)
```

Then click, and look at myvec by typing its name. It is a list with components x and y, so myvec$x and myvec$y give the two coordinates.

Every object (variable, vector, etc.) you create in S-Plus is saved in a directory named MySwork, so your objects are still there if you log out, log back in later and start S-Plus again. MySwork is a subdirectory of your home directory or, if you created an "sta114" (or other name) directory for work on this course, it will probably be in there. To look for it, use the unix command ls -a, which lists all files and directories, including "hidden" ones whose names start with "." that the basic ls does not show.

Problem: Every time you run an S-Plus session, more stuff is dumped there, for possible use in future S-Plus sessions. This grows and clogs up your disk space, and acpub allocates a limited amount to each user. Clean up periodically by simply erasing everything in there: in unix: rm MySwork/* removes everything in there, but leaves the MySwork directory for further use. Another strategy is to have a different MySwork subdirectory for each project you're working on.
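
You can also clean up from inside S-Plus: objects() lists the objects in your working directory and rm() removes them. A minimal sketch (the object names are hypothetical); the last line removes everything, so it should have the same effect as rm MySwork/* above. It also runs in R, where it clears your workspace.

```r
junk1 <- 1:10                  # two throwaway objects...
junk2 <- rnorm(5)
keep.me <- "results"           # ...and one you might want to keep
objects()                      # list the objects in your working directory
rm(junk1, junk2)               # remove just the named objects
rm(list=objects())             # CAREFUL: removes every object in the working directory
```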

X-Win32 is a computer program, available from OIT for free, that allows your home computer to mimic that of a cluster computer. You run it on your PC, log into acpub and then can open x-windows, run emacs, S-Plus graphics, etc etc as if you were sitting in an acpub cluster. Telecommute to school.

You can download my slides and notes, most of them in postscript and pdf formats. This is easy directly on the acpub unix machines via Netscape. Clicking on a link to a postscript document will launch the Ghostview viewer under Netscape; clicking on a link to a pdf document will launch the Adobe Acrobat viewer under Netscape. On your own PC or Mac, use an existing postscript or pdf viewer, set up as a plug-in or helper application for your browser. For postscript (which is preferred) you can easily install the Ghostview previewer for PCs and Macs; here's the info:

Ghostview and GSview under Netscape and I.E. on PCs:

• Downloading Ghostview and GSview for PCs and Macs. This links to the Aladdin Ghostscript pages; you'll need both AFPL Ghostscript (current version is 7.03) and the user interface portion "GSview" (current version is 4.1). Versions are available for Windows, Mac, Linux, and Unix users.
• You need to tell Netscape or Internet Explorer where to find the viewer when reading a postscript document. Instructions are available on-line for these web browsers at Netscape and IE.
• Now when you click on a postscript file on the web page your browser will launch the viewer and display the file. The first time it is used it will walk you through a configuration of GStools -- again, just hit "Next" or "Yes" or "OK" until completed.
• Ghostview is much nicer, and a lot faster, than Acrobat so you should try this first.

The more adventurous (computing-wise) among you might, in later stages of the course, get interested in other software, including a package called BUGS that is being developed as a general-purpose statistical modelling package. Like S-Plus, it is likely to be of interest as an application and research tool for students continuing with statistics at a more advanced level.