Course learning objectives
- Learn to explore, visualize, and analyze data in a reproducible and shareable manner
- Gain experience in data wrangling and munging, exploratory data analysis, data visualization, statistical inference, and predictive modeling
- Work on problems and case studies inspired by and based on real-world questions and data
- Learn to effectively communicate results through written assignments and a final project
This class is about you doing as opposed to you just watching or listening. Lectures and labs will be interactive. My role as an instructor is to introduce you to new tools and techniques, but it is up to you to take them and make use of them.
Most lectures will include a notes document to facilitate active learning. A GitHub repository will be created for you containing an R Markdown file with an outline of the day's topics, code examples, questions, and space for you to add your own notes. These course notes are graded based on a good-faith effort towards completion of all parts and contribute to your participation grade.
This course will involve a lot of group work. Functional and diverse teams will be constructed based on an introductory survey; these teams will not change throughout the semester. You will work in teams for most labs and the final project.
Course materials and software
All assignments and course materials may be found on this course website and our GitHub organization. An up-to-date course schedule will provide access to lecture and lab notes, assessments, and reading assignments to help you prepare for each class.
All books below are available for free online.
- Introductory Statistics with Randomization and Simulation
- OpenIntro Statistics, 4th edition
- R for Data Science
R / RStudio
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we will primarily use RStudio, an integrated development environment (IDE), through a browser-based interface. You will have access to RStudio through Docker containers provided by Duke at https://vm-manage.oit.duke.edu/containers/rstudio.
If you would like to get R and RStudio on your own computer,
- first, download and install the most up-to-date version of R;
- second, download and install the most up-to-date version of RStudio. You must have R installed before installing RStudio.
We will be using Slack to facilitate communication and group work. Slack optimizes collaboration by providing a single location for messaging, tools, and files. The link to join the course's Slack Workspace can be found in Sakai. In addition to Slack, announcements may also be sent to the class by email, so please check your email regularly.
The following activities and assessments will help you successfully achieve the course learning objectives.
In homework, you will apply what you've learned during lecture and lab to complete data science tasks. You may discuss homework assignments with other students; however, homework should be completed and submitted individually.
The lowest homework grade will be dropped at the end of the semester. However, you may not drop both the last homework assignment and the last lab assignment.
In labs, you will apply the concepts discussed in lectures to various data analysis scenarios, with a focus on computation. You will work on lab assignments individually and in teams; all team members are expected to contribute equally to the completion of each assignment. You are expected to use the assigned repository on the course's GitHub page as the central platform for collaboration. Commits to this repository will be used as a metric of each team member's relative contribution for each lab. You will also be asked to evaluate your team members' performance periodically during the semester.
The lowest lab grade will be dropped at the end of the semester. However, you may not drop both the last homework assignment and the last lab assignment.
Exams are an opportunity to assess the knowledge and skills you’ve learned. They are to be completed individually. Each exam will include analysis and computational tasks related to the content discussed in lectures, application exercises, homework assignments, labs, and assigned readings. Details about the content and structure of the exams will be discussed later in the semester.
The purpose of the project is to apply what you’ve learned throughout the semester to analyze an interesting data-based research question. The project will be completed in teams and is due Wednesday, April 28 at 12:00pm ET.
You must complete all components of the final project to pass the course.
Participation and teamwork
Your participation grade is based primarily on the completion of course notes. Notes for each lecture period will be made available in a GitHub repository. Course notes for each lecture will be graded based on a good-faith effort towards completion of all parts and are due one week after the lecture date (for example, notes for a Wednesday lecture are due the following Wednesday at 11:59pm ET).
Periodic team feedback and small discussion assignments will also contribute to your participation grade.
The statistical experience is an opportunity for you to relate data science principles to your life and society. This assignment is a chance for you to be creative and reflect on your experiences in class. The exact format will be announced later but may include course discussions, outside readings, or writing short blog posts.
Regrade requests should be submitted through the regrade request form on Gradescope. Requests for a regrade must be made within one week of when the assignment is returned; requests submitted later will not be considered. You should only submit a regrade request if there is an error in the grade calculation or a correct answer was mistakenly marked as incorrect. You should not submit a regrade request to dispute the number of points deducted for an incorrect response. If you submit a regrade request, your entire assignment may be regraded, and you may lose points.
Due to the time-consuming nature of responding to regrade requests, you must attend office hours and ask a member of the teaching team about the feedback before submitting the request. When you submit a request, indicate which member of the teaching team you spoke with. Grades can only be changed by the instructor. Teaching Assistants cannot change grades on returned assignments.
Late work policy
There is a 24-hour grace period after the due date for homework and lab assignments, during which they may be submitted without penalty. Please use this grace period as little as possible. After the grace period, there is a 20% penalty for each day the assignment is late.
Late work will not be accepted for any other assessment (exams, final project, participation, and statistical experience).
The following table presents the contribution of each component to a student's final grade:
| Component | Weight |
|---|---|
| Participation and teamwork | 2.5% |
The following table presents the mapping from course percentage to traditional letter grade:
| Minimum cutoff (%) | Grade |
|---|---|
Students who miss a class due to a scheduled varsity trip, religious holiday, or short-term illness should fill out an online NOVAP, Religious Observance Notification, or Incapacitation Form, respectively. These excused absences do not excuse you from assigned homework; you are still responsible for submitting the relevant assignments by their deadlines.
If you have a personal or family emergency or health condition that affects your ability to participate in class, you should contact your academic dean's office. More information about this procedure may be found on the Personal Emergencies page or provided by your academic dean.
Exam dates cannot be changed, and no make-up exams will be given. If you must miss an exam, your absence must be officially excused before the exam date. If your absence is excused, the missing exam grade will be imputed at the end of the semester based on your performance on other relevant course assignments.
Where to find help
Many questions are most effectively answered in-person, so office hours are a valuable resource. Please make use of them! A list of instructor and TA office hours can be found on the course website. Office hours are accessed through Zoom links available on Sakai.
Outside of class and office hours, any general questions about course content or assignments should be posted on Slack since there are likely other students with the same questions. The instructor and TAs will monitor Slack and reply quickly to your public or private message. Feel free also to answer posted questions and ask follow-up questions.
Academic resource center
Sometimes you may need help with the class that is beyond what can be provided by the teaching team. In that instance, I encourage you to visit the Academic Resource Center.
The Academic Resource Center (ARC) offers free services to all students during their undergraduate careers at Duke. Services include Learning Consultations, Peer Tutoring and Study Groups, ADHD/LD Coaching, Outreach Workshops, and more. Because learning is a process unique to every individual, ARC staff work with each student to discover and develop their own academic strategy for success at Duke. Undergraduates in any year, studying any discipline, can benefit! To schedule an appointment, contact the ARC at ARC@duke.edu or 919-684-5917, or visit 211 Academic Advising Center Building, East Campus (behind Marketplace).
Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and nonacademic endeavors, and to protect and promote a culture of integrity.
To uphold the Duke Community Standard:
- I will not lie, cheat, or steal in my academic endeavors;
- I will conduct myself honorably in all my endeavors; and
- I will act if the Standard is compromised.
Please review the Duke Community Standard.
Cheating on exams and quizzes, plagiarism on homework assignments and projects, lying about an illness or absence, and other forms of academic dishonesty are a breach of trust with classmates and faculty, violate the Duke Community Standard, and will not be tolerated. Such incidents will result in a grade of 0 for all parties involved and will be reported to the University Judicial Board. Additionally, there will be a penalty to your final course grade.
It is your responsibility to carefully read each assignment so you know what is permitted and what is not. If you are ever unsure what is allowed, please ask the instructor or one of the TAs.
Similar reproducible examples exist online that will help you answer many of the questions posed in notes, labs, homework assignments, and exams. Use of these resources is allowed unless the assignment explicitly states otherwise.
You must always cite any code you copy or use as inspiration from outside sources. Copied code without citation is plagiarism.
Discussion with other students and groups is allowed unless the assignment explicitly states otherwise. However, you may not directly share code or write-ups with other students.
In this course, we will strive to create a learning environment that is welcoming to all students and that is in alignment with Duke’s Commitment to Diversity and Inclusion. If there is any aspect of the class that is not welcoming or accessible to you, please let me know immediately. Additionally, if you are experiencing something outside of class that is affecting your performance in the course, please feel free to talk with me and/or your academic dean.
Students with disabilities who believe they may need accommodations in this class are encouraged to contact the Student Disability Access Office at (919) 668-1267 as soon as possible to better ensure that such accommodations can be made.