A practical introduction to statistical programming focusing on the R programming language. Students will engage with the programming challenges inherent in the various stages of modern statistical analyses including everything from data collection/aggregation/cleaning to visualization and exploratory analysis to statistical model building and evaluation. This course places an emphasis on modern approaches / best practices for programming including: source control, collaborative coding, literate and reproducible programming, and distributed and multicore computing.
There is no set meeting time for lectures. Instead, pre-recorded video lectures will be published on Warpwire; these videos narrate the slides and code examples.
There will be a live lab session offered. A TA will be available to answer questions as you work through the lab. Use this opportunity to work on your lab assignment with your teammates and to ask questions about the homework or the course material.
- Wednesday, May 13 - Term 1 semester classes begin, Drop/Add continues
- Friday, May 15 - Drop/Add ends
- Monday, May 25 - Memorial Day holiday. No classes are held.
- Wednesday, June 10 - Last day to withdraw with W
- Monday, June 22 - Term 1 classes end
- Thursday, June 25 - Term 1 final examinations end
This class is about you doing as opposed to you just watching or listening. Video lectures and labs will be interactive. My role as instructor is to introduce you to new tools and techniques, but it is up to you to take them and make use of them. If you only read the code and never run it or experiment with it, then you will not get much out of this course. Most slides will include supplemental resources for you to delve deeper into the topic of discussion. Occasionally, there will be readings assigned.
To be successful in this course as an undergraduate student, you will need to commit up to 20 hours per week of your time. If you are a graduate student, you need to commit up to 25 hours per week of your time. In this online summer version, we will cover the same topics at the same depth as in the typical 15-week semester. I have detailed a course schedule that you should follow in order to be successful in this course.
This course will involve a lot of group work. Functional and diverse teams will be constructed based on the first-day class survey; these teams will not change throughout the semester (barring extraordinary circumstances). You will work in teams for some labs and on some homework assignments.
There will be four homework assignments. Some assignments will be done individually and some will be done in groups. For team-based assignments, all team members are expected to contribute equally to the completion of each assignment. It is also imperative that each team member has read, run, and understood all code in the final submission. An intragroup peer evaluation will be conducted to ensure equal effort and commitment.
Students are expected to make use of the provided private git repository on the course's GitHub page as their central collaborative platform. Commits to this repository will be used as one metric of each team member's relative contribution for each homework assignment.
There will be a single exam that is to be completed individually. Details on what is and what is not permitted for the exam will be provided.
There will be an individual project. It will include a written reproducible report along with your code. You must complete the project in order to pass this course. Details of the final project will be provided as the course progresses.
There will be five lab assignments. Some assignments will be done individually and some will be done in groups. You will have 24 hours after your scheduled lab to submit your work.
Your final grade will be computed according to the following weights.
- Homework: 40%
- Exam: 25%
- Project: 25%
- Labs: 10%
The exact ranges for letter grades may be curved and cutoffs will be determined at the end of the semester. However, if you have a cumulative numerical average of 90 - 100, you are guaranteed at least an A-, 80 - 89 at least a B-, 70 - 79 at least a C-, and so on.
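As an illustration of how these weights combine, here is a minimal R sketch. The scores are hypothetical, the helper `guaranteed_minimum()` is not part of the course materials, and treating 90, 80, and 70 as lower bounds is an assumption about how the stated ranges are applied; actual cutoffs may be curved as described above.

```r
# Component weights as listed in the syllabus; they should sum to 1.
weights <- c(homework = 0.40, exam = 0.25, project = 0.25, labs = 0.10)
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Hypothetical component scores on a 0-100 scale (illustration only).
scores <- c(homework = 90, exam = 80, project = 84, labs = 100)

# Weighted average, matching components by name rather than by position.
final_average <- sum(weights[names(scores)] * scores)

# Hypothetical helper: the minimum letter grade guaranteed by the syllabus.
# Assumes 90, 80, and 70 act as lower bounds of the stated ranges.
# Ranges below 70 are not listed explicitly, so NA is returned there.
guaranteed_minimum <- function(avg) {
  cutoffs <- c(70, 80, 90)
  grades <- c(NA_character_, "C-", "B-", "A-")
  grades[findInterval(avg, cutoffs) + 1]
}

final_average
guaranteed_minimum(final_average)
```

Matching by name keeps the calculation correct even if the scores are entered in a different order than the weights.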
There are no required textbooks for this course; the following are recommended textbooks for this course and your future self.
- Advanced R
Wickham, H. (2019). Chapman and Hall/CRC.
- R for Data Science
Wickham, H., & Grolemund, G. (2017). O'Reilly.
- R Packages
Wickham, H. (2015). O'Reilly.
R / RStudio
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R we will primarily be using RStudio, an integrated development environment (IDE), via a browser-based interface or on your own computer. You may use RStudio on the Department servers.
To access the DSS RStudio servers, use AnyConnect to connect to Duke's VPN.
To get R / RStudio locally:
When you're writing code, it is nice to have a text editor that is optimized for writing code. There is a huge variety of options out there; if you do not already have a preferred editor, try a few and see which one works best for you.
- Vim / Emacs - old-school Unix console-based editors; they have a steep learning curve but are incredibly powerful
- Nano - another Unix console editor, easier learning curve but with much less power
- Sublime Text - cross-platform GUI text editor with a robust plug-in ecosystem
Git is a state-of-the-art version control system. It lets you track who made changes to what and when, and it has options for easily updating a shared or public version of your code on GitHub.
- OSX - install Git for Mac by downloading and running the installer, or install Homebrew and use it to install git via brew install git.
- Unix / Linux - you should be able to install git via your preferred package manager (if it is not already installed).
- Windows - install Git for Windows by downloading and running the Git for Windows installer. This will provide you with git, the bash shell, and ssh on Windows.
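Once git is installed, one way to confirm that R can see it is sketched below. This is only an illustrative check using base R functions, not a required course step; the variable names are arbitrary.

```r
# Look up the git executable on the PATH.
# Sys.which() returns an empty string when the program is not found.
git_path <- unname(Sys.which("git"))
git_found <- nzchar(git_path)

if (git_found) {
  # Ask git to report its version, capturing the output as a character vector.
  git_version <- system2("git", "--version", stdout = TRUE)
  message("Found git at ", git_path, ": ", git_version)
} else {
  message("git was not found on the PATH; revisit the installation steps for your OS.")
}
```

If git is reported as missing right after installation, restarting R or RStudio so that it picks up the updated PATH is often enough.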
Unix shell(s) / ssh
We will be doing much of the work in the class on remote Linux systems, and we will primarily interact with these machines through a remote terminal and a shell. Using a shell gives you more power to do more tasks more efficiently with your computer.
- OSX / Unix / Linux - these tools should already be installed and you should be able to access your shell through the Terminal application (name may vary slightly depending on your OS).
- Windows - there are several ways to install bash or a bash-like shell; the preferred method is to install the Git for Windows package as detailed above.
We will be using Slack to facilitate communication and group work. Slack provides a single location for messaging, tools, and files - allowing for efficient collaboration.
Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and nonacademic endeavors, and to protect and promote a culture of integrity.
To uphold the Duke Community Standard:
- I will not lie, cheat, or steal in my academic endeavors;
- I will conduct myself honorably in all my endeavors; and
- I will act if the Standard is compromised.
Cheating on exams and quizzes, plagiarism on homework assignments and projects, lying about an illness or absence, and other forms of academic dishonesty are a breach of trust with classmates and faculty, violate the Duke Community Standard, and will not be tolerated. Such incidents will result in a grade of 0 for all parties involved and will be reported to the University Judicial Board. Additionally, there may be penalties to your final class grade. Please review Duke’s Standards of Conduct.
Students with disabilities who believe they may need accommodations in this class are encouraged to contact the Student Disability Access Office at (919) 668-1267 as soon as possible to better ensure that such accommodations can be made.
In an emergency, there are several ways that the University will contact you. Two are detailed below. Campus emergency procedures are described here: http://emergency.duke.edu
Text Messaging: An alert message may be sent to the mobile devices of Duke community members who register for a new text messaging system. Sign up for DukeALERT text messages or learn more about text messaging at Duke.
LiveSafe Mobile App: Notifications may be sent through the LiveSafe Mobile app to notify members of the Duke community of emergency situations. The free mobile app, available through the Apple App Store and Android App Store, offers real-time, two-way communication between Duke community members and the Duke University Police Department.