Class: Tuesdays and Thursdays, 11:45 am - 1:00 pm, Link 065 (Classroom 2)
Class attendance is a firm expectation; frequent absences or tardiness will be considered a legitimate cause for grade reduction.
Letter grades will be curved; the exact cutoffs will be determined after the final exam.
The more evidence there is that the class has mastered the material, the more generous the curve will be.
Interactive
Learn-by-doing
Bring your laptop to class every day
Short survey to gauge your previous exposure to material relevant to the course.
Teams of 3-5 students for in-class activities, homework assignments, and the project.
Larger computational tasks towards the end of the semester
Present results / work product to the class
Collaborative / fully reproducible work
A synthesis of what you’ve been taught, focused on a specific area
Two take home midterm exams that you are expected to complete individually.
Complete a number of computational / analysis tasks that cover the breadth of the material presented in the class.
Duke Community Standard:
I will not lie, cheat, or steal in my academic endeavors;
I will conduct myself honorably in all my endeavors; and
I will act if the Standard is compromised.
A huge amount of code is available on the web with solutions to any number of problems.
Unless I explicitly tell you not to, you may use these resources. In general, the course’s policy is that you may make use of these resources (e.g., StackOverflow), but you must explicitly cite where any outside code was obtained.
Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism.
The one exception to this rule is that you may not directly share code with another team or student in this class. You are welcome to discuss the problems together and ask for advice (unless explicitly told not to), but you may not send or make use of code from anyone else in this class.
Students who miss graded work due to a scheduled varsity trip, religious holiday or short-term illness should fill out an online NOVAP, RHoliday or short-term illness form respectively.
If you cannot complete an assignment on the due date due to a short-term illness, you have until noon the following day to complete it at no penalty; after that, the regular late work policy applies.
If you are faced with a personal or family emergency or a long-range or chronic health condition that interferes with your ability to attend or complete classes, you should contact your academic dean’s office. See more information on policies surrounding these conditions at https://trinity.duke.edu/undergraduate/academic-policies/personal-emergencies. Your academic dean can also provide more information.
late, but same day: -10%
late, next day: -20%
late, 2 or more days: no credit
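The late-work deductions above amount to multiplying a score by a factor that depends on how late the work is. A minimal sketch, assuming the percentages are taken off the score earned (not off the maximum possible score) and using a hypothetical late_score() helper with made-up category labels:

```r
# Multiplier applied to the earned score for each lateness category.
# Category labels are hypothetical; the factors follow the policy above.
late_multiplier <- c(on_time = 1, same_day = 0.9, next_day = 0.8, two_plus_days = 0)

late_score <- function(score, lateness) {
  # Reject any category that the policy does not define
  stopifnot(all(lateness %in% names(late_multiplier)))
  # Look up one multiplier per submission and scale the score
  score * unname(late_multiplier[lateness])
}

late_score(c(90, 90, 90, 90), c("on_time", "same_day", "next_day", "two_plus_days"))
# returns 90 81 72 0
```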
Please refrain from texting or using your computer for anything other than coursework during class
You must be in class on a day when you’re scheduled to present; there are no make-ups for presentations
Regrade requests must be made within 3 days of when the assignment is returned, and must be submitted in writing
Use of disallowed materials during the take-home exams will not be tolerated
The authors informed the journal that the merge of lab results and other survey data used in the paper resulted in an error regarding the identification codes. Results of the analyses were based on the data set in which this error occurred. Further analyses established the results reported in this manuscript and interpretation of the data are not correct.
Original conclusion: Lower levels of CSF IL-6 were associated with current depression and with future depression […].
Revised conclusion: Higher levels of CSF IL-6 and IL-8 were associated with current depression […].
#1 Convince researchers to adopt a reproducible research workflow
#2 Train new researchers who don’t have any other workflow
Scriptability \(\rightarrow\) R
Literate programming \(\rightarrow\) R Markdown
Version control \(\rightarrow\) Git / GitHub
“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.”
Log on with your Net ID and password
2 + 2
## [1] 4
factorial(20)
## [1] 2.432902e+18
x = 2
x * 3
## [1] 6
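R prints numbers to 7 significant digits by default (getOption("digits") is 7), which is why factorial(20) above is shown in scientific notation as 2.432902e+18. The stored value has more precision than that; a short sketch of two standard ways to display more of it:

```r
x <- factorial(20)

# Default number of significant digits used when printing
getOption("digits")

# Print this one value with more significant digits than the default
print(x, digits = 15)

# Format as a whole number, with no scientific notation
sprintf("%.0f", x)

# Same quantity written as an explicit product 1 * 2 * ... * 20
all.equal(x, prod(1:20))
```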
Fully reproducible reports
Simple markdown syntax for text
Code goes in chunks
Tip: Keep the Markdown cheat sheet handy; we’ll refer to it often as the course progresses.
[Live demo – follow along]
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data.
In the following exercises we’ll use the dplyr (for data wrangling) and ggplot2 (for visualization) packages.
To use these packages, we must first load them in our R Markdown file:
library(dplyr)
library(ggplot2)
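A package has to be installed once per machine (with install.packages()) but loaded with library() in every R session and every R Markdown document that uses it. If you are working on your own computer, a minimal sketch of checking which packages still need installing, using a hypothetical helper missing_pkgs() built on base R's requireNamespace():

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Which of the course packages are missing on this machine?
missing_pkgs(c("dplyr", "ggplot2"))
```

If it returns any names, pass them to install.packages() once from the Console. Keeping the install step out of the R Markdown file means knitting the document never reinstalls packages.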
gapminder = read.csv("https://stat.duke.edu/~mc301/data/gapminder.csv")
Start with the gapminder dataset
Filter for cases (rows) where year is equal to 2007
Save this new subsetted dataset as gap07
gap07 = filter(gapminder, year == 2007)
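The condition year == 2007 produces one TRUE or FALSE per row, and filter() keeps the rows where it is TRUE. As a rough illustration that needs neither dplyr nor the gapminder download, the sketch below applies the same idea with base R logical indexing to the built-in airquality data set (chosen only because it ships with R):

```r
# One TRUE/FALSE per row: is the measurement from May (Month == 5)?
is_may <- airquality$Month == 5

# Keep only the rows where the condition is TRUE
# (dplyr's filter(airquality, Month == 5) selects the same rows)
may <- airquality[is_may, ]

nrow(airquality)  # 153 rows in total
nrow(may)         # 31 rows, one per day in May
```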
Task: Visualize the relationship between gdpPercap and lifeExp.
ggplot(data = gap07, aes(x = gdpPercap, y = lifeExp)) + geom_point()
Task: Color the points by continent.
ggplot(data = gap07, aes(x = gdpPercap, y = lifeExp, color = continent)) + geom_point()
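The code above hard-codes the year inside filter(), so rerunning the analysis for a different year means editing it in several places. A minimal sketch, assuming you want the year to be a single argument, that wraps the filter and plot steps in a hypothetical helper plot_gap_year():

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: subset a gapminder-style data frame to one year and
# plot GDP per capita vs. life expectancy, colored by continent.
plot_gap_year <- function(data, target_year) {
  # Keep only the rows for the requested year
  data_year <- filter(data, year == target_year)
  ggplot(data = data_year, aes(x = gdpPercap, y = lifeExp, color = continent)) +
    geom_point()
}
```

With gapminder read in as above, plot_gap_year(gapminder, 2007) should draw the same plot as the previous code, and trying another year becomes a one-argument change.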
What if you now wanted to change your analysis to
subset for 1952,
plot life expectancy (lifeExp) vs. population (pop), and
size the points by GDP per capita (gdpPercap)?
(add size = gdpPercap to your plotting code)
Sign up for a GitHub account