The authors informed the journal that the merge of lab results and other survey data used in the paper resulted in an error regarding the identification codes. Results of the analyses were based on the data set in which this error occurred. Further analyses established the results reported in this manuscript and interpretation of the data are not correct.
Original conclusion: Lower levels of CSF IL-6 were associated with current depression and with future depression […].
Revised conclusion: Higher levels of CSF IL-6 and IL-8 were associated with current depression […].
#1 Convince researchers to adopt a reproducible research workflow
#2 Train new researchers who don’t have any other workflow
Scriptability \(\rightarrow\) R
Literate programming \(\rightarrow\) R Markdown
Version control \(\rightarrow\) Git / GitHub
“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.”
Go to gort.stat.duke.edu:8787
Log on with your Net ID and password
2 + 2
## [1] 4
factorial(20)
## [1] 2.432902e+18
x <- 2
x * 3
## [1] 6
intro_demo
Go to RStudio
Note for the future: Each course component you work on (an application exercise, a homework assignment, project, exam, etc.) should be its own repository, and should be fully contained in a folder inside the folder sta112.
On GitHub (on the web), edit the README document and commit it with a message describing what you did.
As you work in teams you will run into merge conflicts; learning how to resolve them properly will be very important.
Fully reproducible reports
Simple markdown syntax for text
Code goes in chunks
Tip: Keep the Markdown cheat sheet handy; we’ll refer to it often as the course progresses.
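To make "markdown text plus code chunks" concrete, here is a minimal sketch that writes a tiny R Markdown file from R. The file name, title, and contents are hypothetical placeholders, not course material, and the rendering step is left commented out because it assumes the rmarkdown package and pandoc are installed.

```r
# Minimal sketch: build a small R Markdown document as plain text.
# File name and contents are hypothetical, for illustration only.
fence <- strrep("`", 3)  # three backticks, the code chunk delimiter

rmd_lines <- c(
  "---",                          # YAML header: document metadata
  "title: \"Minimal example\"",
  "output: html_document",
  "---",
  "",
  "Some *markdown* text explaining the analysis.",  # plain markdown prose
  "",
  paste0(fence, "{r}"),           # opening line of an R code chunk
  "2 + 2",                        # R code that runs when the file is knit
  fence                           # closing line of the chunk
)

rmd_file <- file.path(tempdir(), "minimal_example.Rmd")
writeLines(rmd_lines, rmd_file)

# Rendering assumes rmarkdown and pandoc are available; uncomment to knit:
# rmarkdown::render(rmd_file)
```

In practice you would type this text directly into an .Rmd file in RStudio and click Knit; building it from a character vector here only keeps the example self-contained.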
[Live demo – follow along]
Visualize the relationship between life expectancy and GDP per capita across countries in 2007. Also make a plot
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data.
In the following exercises we’ll use the dplyr (for data wrangling) and ggplot2 (for visualization) packages.
Load these packages in your markdown file
library(dplyr)
library(ggplot2)
gapminder <- read.csv("https://stat.duke.edu/~mc301/data/gapminder.csv")
Start with the gapminder dataset
Filter for cases (rows) where year is equal to 2007
Save this new subsetted dataset as gap07
gap07 <- gapminder %>%
  filter(year == 2007)
Task: Visualize the relationship between gdpPercap and lifeExp.
qplot(x = gdpPercap, y = lifeExp, data = gap07)
Task: Color the points by continent.
qplot(x = gdpPercap, y = lifeExp, color = continent, data = gap07)
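Recent ggplot2 releases (3.4.0 onward) mark qplot() as deprecated, so it may help to see the same colored scatterplot written with ggplot() and geom_point(). This is a sketch under stated assumptions: gap07_toy is a hypothetical stand-in with made-up values, used only so the example runs without downloading the gapminder file.

```r
library(ggplot2)

# Hypothetical stand-in for gap07: made-up values, NOT real gapminder data
gap07_toy <- data.frame(
  continent = c("Africa", "Asia", "Europe", "Americas"),
  gdpPercap = c(1500, 5000, 30000, 9000),
  lifeExp   = c(55, 68, 79, 73)
)

# Same mappings as the qplot() call: x, y, and color by continent
p <- ggplot(gap07_toy, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()

p  # printing the object draws the plot
```

Replacing gap07_toy with gap07 should give the plot for the real 2007 subset.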
Stage
Commit (with a message)
Push
What if you wanted to now change your analysis to:
subset for 1952
plot life expectancy (lifeExp) vs. population (pop)
size the points by GDP per capita (gdpPercap), by adding size = gdpPercap to your plotting code
Once you’re done, commit and push all your changes with a meaningful message.
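One possible shape for the modified analysis above is sketched here. It is not the official solution. gapminder_toy is a hypothetical data frame with made-up values that mirrors the gapminder column names, and the plot uses ggplot() rather than qplot(); the size mapping plays the same role as a size argument in qplot().

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for gapminder: made-up values, NOT real data
gapminder_toy <- data.frame(
  country   = c("A", "B", "C", "A", "B", "C"),
  year      = c(1952, 1952, 1952, 2007, 2007, 2007),
  lifeExp   = c(40, 55, 68, 52, 70, 80),
  pop       = c(2e6, 5e7, 8e6, 4e6, 9e7, 1e7),
  gdpPercap = c(800, 2500, 9000, 1500, 7000, 35000)
)

# Keep only the rows for 1952
gap52 <- gapminder_toy %>%
  filter(year == 1952)

# Life expectancy vs. population, with point size mapped to GDP per capita
p52 <- ggplot(gap52, aes(x = pop, y = lifeExp, size = gdpPercap)) +
  geom_point()

p52  # printing the object draws the plot
```

With the real data, swapping gapminder_toy for gapminder should be the only change needed.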