Computation: getting started

Install R & RStudio

RStudio

Layout

  • Console (where the action happens):
    • Every time you launch RStudio, it will have the same text at the top of the console telling you the version of R that you’re running.
    • Below that information is the prompt. As its name suggests, this prompt is really a request, a request for a command.
  • The panel in the upper right contains your workspace as well as a history of the commands that you’ve previously entered.

  • Any plots that you generate will show up in the Plots tab of the panel in the lower right corner.

  • This is also where you can access your files, view and install packages, and view help.

Setting a directory

  • If you haven’t yet done so, create a folder for this course, and within this folder create another folder called application_exercises.

  • In the Files pane of your RStudio window, browse to this directory.

  • Click on More, and then Set as working directory.

  • This action will run a line of code in your Console that uses the setwd function.

setwd("[some_path]/Sta112FS/application_exercises")

RStudio allows you to complete certain routine tasks using point-and-click, but will often also show you the R code associated with that action.
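The same step can also be done entirely in code. The lines below are a minimal sketch, not the course's required setup: they build the folder under a temporary location (tempdir() is an assumption made here so the example runs anywhere), whereas in practice you would use your own course path.

```r
# Hypothetical location: a temporary folder stands in for your real course path.
course_dir <- file.path(tempdir(), "Sta112FS", "application_exercises")

# Create the course folder and the nested folder in one step.
dir.create(course_dir, recursive = TRUE, showWarnings = FALSE)

# setwd() changes the working directory and returns the previous one,
# which we keep so that we can switch back afterwards.
old_wd <- setwd(course_dir)

# getwd() reports the current working directory.
getwd()

# Restore the original working directory.
setwd(old_wd)
```

Note that the path passed to setwd is a character string, so it must be wrapped in quotes.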

Application exercise: Birth rates - boys vs. girls

Birth rates - boys vs. girls

  • Say someone in your family is pregnant, and it’s too early to find out the sex of the baby. What is the probability she is pregnant with a girl?

  • What type of data would you use to answer the question “What percent of births are girls?”

Historic exploration

Dr. John Arbuthnot was an 18th-century physician, writer, and mathematician. He was interested in the ratio of newborn boys to newborn girls, so he gathered the baptism records for children born in London for every year from 1629 to 1710.

Load the data frame:

source("http://www.openintro.org/stat/data/arbuthnot.R")

Dimensions and names

View the dimensions of this data frame:

dim(arbuthnot)
## [1] 82  3

View the variable names of this data frame:

names(arbuthnot)
## [1] "year"  "boys"  "girls"
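To see how these functions behave without downloading anything, here is a small sketch on a toy data frame. The data frame toy and its counts are invented for illustration and are not Arbuthnot's records; only the column names mirror the real data. The last step shows one possible way to turn counts into the proportion of girls.

```r
# Toy data frame with the same column names as arbuthnot.
# The values are made up for illustration only.
toy <- data.frame(
  year  = c(1, 2, 3),
  boys  = c(60, 55, 50),
  girls = c(40, 45, 50)
)

dim(toy)    # number of rows, then number of columns
nrow(toy)   # number of rows only
ncol(toy)   # number of columns only
names(toy)  # variable (column) names

# Proportion of girls in each row: girls / (boys + girls).
# The $ operator extracts a single column from the data frame.
toy$prop_girls <- toy$girls / (toy$boys + toy$girls)
toy$prop_girls
```

The same commands should work on arbuthnot once it has been loaded, since it has columns with these names.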

Reproducible data analysis

Creating a reproducible data analysis

  • We will be using a markdown language, R Markdown, to document your data analysis.
    • Complete all components of the analysis (code + output + narrative) entirely in RStudio
    • Ensure reproducibility of your analysis and results
  • To open a new R Markdown document, click on the green + button on the top left corner of your RStudio window, and then choose R Markdown. Choose Document, and then fill in the Title and Author information. Choose HTML as the output format.

  • In this document R code goes in “chunks”. A quick reference guide for the markdown language can be accessed via the ? button.

  • An R Markdown document runs in an environment that is independent of the console, so all steps of the data analysis must be included in the document for it to compile properly (starting with loading the dataset).
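The idea of an independent environment can be illustrated with a rough analogy in plain R. This is only a sketch: the names console_data and fresh are hypothetical, and a separate environment merely stands in for the clean session used when the document is compiled.

```r
# An object created at the console lives in the global workspace.
# The values are placeholders.
console_data <- data.frame(x = 1:3)

# A separate, empty environment stands in for the document's clean session.
fresh <- new.env()

# The console object was never created inside the fresh environment,
# so a lookup restricted to that environment does not find it.
exists("console_data", envir = fresh, inherits = FALSE)

# Once the object is created inside the fresh environment, it is found.
# This mirrors loading the dataset in the document's first code chunk.
assign("console_data", data.frame(x = 1:3), envir = fresh)
exists("console_data", envir = fresh, inherits = FALSE)
```

The first call to exists returns FALSE and the second returns TRUE, which is why the first chunk of the document should load the data itself.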

… back to the application exercise

Look back in history

Mysterious data point