A practical introduction to statistical programming focusing on the R programming language. Students will engage with the programming challenges inherent in the various stages of modern statistical analyses including everything from data collection/aggregation/cleaning to visualization and exploratory analysis to statistical model building and evaluation. This course places an emphasis on modern approaches/best practices for programming including: source control, collaborative coding, literate and reproducible programming, and distributed and multicore computing.
Zoom links for all meetings and their recordings can be found in Sakai.
In-person: Old Chemistry 116, Wed and Fri 10:15am - 11:30am
Virtual: Wed and Fri 10:15am - 11:30am
- Virtual: Mon 10:15am - 11:30am
|Wednesday, January 20|Spring semester begins; Drop/Add continues|
|Tuesday, February 2|Drop/Add ends|
|Wednesday, March 10|No class held|
|Wednesday, March 24|Last day to withdraw with W from class (undergraduates only)|
|Monday, April 12|No class held|
|Friday, April 23|Graduate and undergraduate classes end|
This class is about you doing as opposed to you just watching or listening. In-person and Zoom lectures and labs will be interactive. My role as an instructor is to introduce you to new tools and techniques, but it is up to you to take them and make use of them. If you only read the code and never run it or experiment with it, then you will not get much out of this course. Most lectures will include supplemental resources for you to delve deeper into the topic of discussion. Occasionally, there will be pre-class readings and activities to enrich our lecture and lab experiences.
This course will involve a lot of group work. Functional and diverse teams will initially be constructed based on an introductory survey; these teams will change throughout the semester. You will work in teams for some labs, some homework assignments, and the final project.
Assessments and grades
There will be six homework assignments. Some assignments will be individual, others team-based. For team-based assignments, all team members are expected to contribute equally. It is imperative that each team member has read, run, and understood all code in the files being submitted. An intragroup peer evaluation will be conducted to ensure equal effort and commitment.
Students are expected to make use of the provided private git repository as their central collaborative platform. Commits to this repository will be used as one metric of each team member's contribution to a given assignment.
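To get a rough sense of how commits are distributed across a team, git can summarize them per author. The sketch below builds a throwaway repository with two hypothetical teammates (the names, emails, and file are placeholders) and is only one possible view of contribution, not necessarily the one used for grading.

```shell
# Throwaway repository so nothing touches a real project.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# First commit, made by hypothetical Teammate A.
echo "draft" > report.Rmd
git add report.Rmd
git -c user.name="Teammate A" -c user.email="a@example.com" \
    commit --quiet -m "Start report"

# Two further commits, made by hypothetical Teammate B.
echo "more analysis" >> report.Rmd
git -c user.name="Teammate B" -c user.email="b@example.com" \
    commit --quiet -am "Extend analysis"

echo "final edits" >> report.Rmd
git -c user.name="Teammate B" -c user.email="b@example.com" \
    commit --quiet -am "Polish report"

# Count commits per author, most active first.
summary="$(git shortlog -sn HEAD)"
echo "$summary"
```

Commit counts say nothing about the size or quality of each change, so treat this summary as a starting point rather than a full picture.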
There will be two take-home exams that are to be completed individually. Details on what is and what is not permitted for each exam will be provided at the time of the exam.
There will be a final team-based project. It will include a written reproducible report that includes your code. You must complete the project in order to pass this course. The project is due April 28, 2021.
Your final grade will be computed based on the following weights.
- Homework: 45%
- Exam 1: 20%
- Exam 2: 20%
- Final Project: 15%
Letter-grade cutoffs may be curved and will be determined at the end of the semester. However, a cumulative numerical average of 90 - 100% guarantees at least an A-, 80 - 89% at least a B-, 70 - 79% at least a C-, and so on.
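As a rough illustration of how the weights above combine into a single number, here is a minimal shell sketch using awk. The four scores are hypothetical and the variable names are only for illustration; the actual grade computation, including any curve, is at the instructor's discretion.

```shell
# Hypothetical component scores (percentages); not real course data.
homework=90
exam1=80
exam2=85
project=95

# Weighted average using the syllabus weights: 45%, 20%, 20%, 15%.
final="$(awk -v h="$homework" -v e1="$exam1" -v e2="$exam2" -v p="$project" \
  'BEGIN { printf "%.2f", (45*h + 20*e1 + 20*e2 + 15*p) / 100 }')"

echo "Final numerical average: $final"
```

With these made-up scores the average is 87.75, which falls in the 80 - 89% band and so guarantees at least a B-.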
There are no required textbooks for this course. However, the following are recommended for this course and your future self.
- Advanced R
Wickham, H. (2021). Chapman and Hall/CRC.
- R for Data Science
Grolemund, G., & Wickham, H. (2021). O'Reilly.
- R packages
Wickham, H. (2021). O'Reilly.
R / RStudio
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we will primarily use RStudio, an integrated development environment (IDE), either through a browser-based interface or on your own computer. You will have access to RStudio Pro on the Statistical Science Department's servers.
To access the department RStudio Pro servers, first, connect to Duke's network via VPN. Then, navigate to one of:
If you would like to get R and RStudio on your own computer,
- first, download and install the most up-to-date version of R;
- second, download and install the most up-to-date version of RStudio. You must have R installed before installing RStudio.
It is nice to have a text editor that is optimized for writing code. There is a huge variety of options out there; if you do not already have a preferred editor, try a few and see which one works best for you.
- Vim / Emacs - old-school Unix console-based editors; they have a steep learning curve but are incredibly powerful
- Nano - another Unix console editor, easier learning curve but with much less power
- Sublime Text - cross-platform GUI text editor with a robust plug-in ecosystem
- Atom - similar to Sublime Text
Git is a state-of-the-art version control system. It lets you track who made changes to what and when, and it has options for easily updating a shared or public version of your code on GitHub. Chapter 6 in Happy Git and GitHub for the useR has instructions to install git for your specific operating system. If you plan to use the RStudio Pro servers offered by the department, git is already installed.
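To make "who made changes to what and when" concrete, the following minimal sketch records one commit in a throwaway repository and then asks git for the author, date, message, and changed file. The name, email, and file name are placeholders chosen for illustration.

```shell
# Work in a temporary directory so no real repository is affected.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Identify yourself for this repository only (placeholder values).
git config user.name "Student Name"
git config user.email "student@example.com"

# Create a file, stage it, and record it in a commit.
echo 'summary(cars)' > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

# Who and when: author, date, and message of the latest commit.
last_commit="$(git log -1 --format='%an | %ad | %s' --date=short)"
echo "$last_commit"

# What: the files touched by that commit.
changed_files="$(git diff-tree --no-commit-id --name-only -r --root HEAD)"
echo "$changed_files"
```

The same `git log` and `git diff-tree` calls work in any repository, including the course repositories, once you `cd` into them.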
Unix shell(s) / ssh
We will be doing a lot of work in the class on remote Linux systems; primarily we will be interacting with these machines through a remote terminal and a shell. Using a shell gives you more power to do more tasks more efficiently with your computer.
- OSX / Unix / Linux - these tools should already be installed and you should be able to access your shell through the Terminal application (name may vary slightly depending on your OS).
- Windows - there are several ways to install bash or a bash-like shell. The preferred method is to install the Git for Windows package as detailed in Chapter 6 of Happy Git and GitHub for the useR.
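Once you have a shell, it can help to confirm that the tools mentioned in this section are available. The sketch below is one way to do that; `check_tool` is a hypothetical helper name, and the host in the commented `ssh` line is a placeholder, not a course server.

```shell
# Report whether a command is available on the current PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
  fi
}

# Check the tools this course relies on.
for tool in bash ssh git; do
  check_tool "$tool"
done

# Connecting to a remote Linux machine generally looks like this
# (placeholder user and host; substitute the details given in class):
#   ssh your_netid@remote.host.example
```

If any tool is reported as not found, revisit the installation notes above for your operating system.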
We will be using Slack to facilitate communication and group work. Slack optimizes collaboration by providing a single location for messaging, tools, and files. The link to join the course's Slack Workspace can be found in Sakai. In addition to Slack, announcements may also be sent to the class by email, so please check your email regularly.
Where to find help
Many questions are most effectively answered in person, so office hours are a valuable resource. Please make use of them. A list of instructor and TA office hours can be found on the course website. Office hours are accessed through Zoom links available on Sakai.
Outside of class and office hours, any general questions about course content or assignments should be posted on Slack. The instructors and TAs will monitor Slack and reply quickly to your public or private messages. Also, feel free to answer posted questions and ask follow-up questions.
Academic resource center
Sometimes you may need help with the class that is beyond what can be provided by the teaching team. In that instance, I encourage you to visit the Academic Resource Center.
The Academic Resource Center (ARC) offers free services to all students during their undergraduate careers at Duke. Services include Learning Consultations, Peer Tutoring and Study Groups, ADHD/LD Coaching, Outreach Workshops, and more. Because learning is a process unique to every individual, they work with each student to discover and develop their own academic strategy for success at Duke. Contact the ARC to schedule an appointment. Undergraduates in any year, studying any discipline can benefit! Contact ARC@duke.edu, 919-684-5917, 211 Academic Advising Center Building, East Campus – behind Marketplace.
Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and nonacademic endeavors, and to protect and promote a culture of integrity.
To uphold the Duke Community Standard:
- I will not lie, cheat, or steal in my academic endeavors;
- I will conduct myself honorably in all my endeavors; and
- I will act if the Standard is compromised.
Cheating on exams and quizzes, plagiarism on homework assignments and projects, lying about an illness or absence, and other forms of academic dishonesty are a breach of trust with classmates and faculty, violate the Duke Community Standard, and will not be tolerated. Such incidents will result in a grade of 0 for all parties involved and will be reported to the University Judicial Board. Additionally, there will be a penalty to your final course grade.
It is your responsibility to carefully read each assignment so you know what is permitted and what is not. If you are ever unsure what is allowed, please ask the instructor or one of the TAs.
Similar reproducible examples exist online that will help you answer many of the questions posed on notes, labs, homework assignments, and exams. Use of these resources is allowed unless the assignment explicitly states otherwise.
You must always cite any code you copy or use as inspiration. Copied code without citation is plagiarism.
Discussion with other students and groups is allowed unless the assignment explicitly states otherwise. However, you may not directly share code or write-ups with other students.
In this course, we will strive to create a learning environment that is welcoming to all students and that is in alignment with Duke’s Commitment to Diversity and Inclusion. If there is any aspect of the class that is not welcoming or accessible to you, please let me know immediately. Additionally, if you are experiencing something outside of class that is affecting your performance in the course, please feel free to talk with me and/or your academic dean.
Students with disabilities who believe they may need accommodations in this class are encouraged to contact the Student Disability Access Office at (919) 668-1267 as soon as possible to better ensure that such accommodations can be made.