Statistical programming, computation using selected languages and environments (Python, R, Matlab, and/or C/C++) and interfaces with custom code development for statistical models. Best practices and software development for reproducible results, selecting topics from: use of markup languages, understanding data structures, design of graphics, object oriented programming, vectorized code, scoping, documenting code, profiling and debugging, building modular code, and version control - all in contexts of specific applied statistical analyses.
In-person: Old Chemistry 116, Wed and Fri 10:15am - 11:30am
Virtual: Zoom link in Sakai, Wed and Fri 10:15am - 11:30am
- Virtual: Zoom link in Sakai, Mon 8:30am - 9:45am
- Monday, August 17 - Fall semester classes begin; Drop/Add continues
- Friday, August 28 - Drop/Add ends
- Monday, September 7 - Labor Day. Classes in session
- Monday, November 16 - Graduate and undergraduate classes end
This class is about you doing as opposed to you just watching or listening. Whether in person or on Zoom, lectures and labs will be interactive. My role as instructor is to introduce you to new tools and techniques, but it is up to you to take them and make use of them. If you only read the code and never run it or experiment with it, then you will not get much out of this course. Most slides will include supplemental resources for you to delve deeper into the topic of discussion. Occasionally, there will be pre-class readings to enrich our lecture and lab experiences.
This course will involve a lot of group work. Functional and diverse teams will initially be constructed based on the first-day class survey; these teams will change throughout the semester. You will work in teams for some labs, some homework assignments, and the final project.
There will be six homework assignments. Some assignments will be done individually and some will be done in groups. For team-based assignments, all team members are expected to contribute equally to the completion of each assignment. It is also imperative that each team member has read, run, and understood all code in the final submission. An intragroup peer evaluation will be conducted to ensure equal effort and commitment.
Students are expected to make use of the provided private git repository on the course's GitHub page as their central collaborative platform. Commits to this repository will be used as one metric of each team member's relative contribution for each homework assignment.
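To get a rough sense of how commits are distributed within your own team, something like the following Python sketch can help. It is only an illustration: how commits are actually weighed for grading is not specified here, and a raw commit count is a crude measure. The function names and the sample authors are made up; the input is assumed to be the text printed by `git shortlog -s -n HEAD`, which lists a commit count and an author name on each line.

```python
def parse_shortlog(text):
    """Parse `git shortlog -s -n HEAD` output into {author: commit_count}."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form "<count>\t<author name>".
        number, _, author = line.partition("\t")
        counts[author.strip()] = int(number)
    return counts


def commit_shares(counts):
    """Return each author's fraction of all commits (empty if there are none)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {author: n / total for author, n in counts.items()}


# Made-up sample output; in practice, paste what git prints for your repository.
sample = "    14\tAda Example\n     9\tBen Example\n     7\tCal Example\n"
counts = parse_shortlog(sample)
shares = commit_shares(counts)
for author, share in shares.items():
    print(f"{author}: {counts[author]} commits ({share:.0%})")
```

Committing early and often, from your own account, keeps this kind of summary representative of the work you actually did.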
There will be two take-home exams that are to be completed individually. Details on what is and what is not permitted for each exam will be provided at the time of the exam.
There will be a final project that is to be completed in a team of your choice. It will include a written reproducible report along with your code. You must complete the project in order to pass this course. The project is due November 23, 2020.
Your final grade will be computed based on the following weights.
- Homework: 45%
- Exam 1: 20%
- Exam 2: 20%
- Final Project: 15%
The exact ranges for letter grades may be curved and cutoffs will be determined at the end of the semester. However, if you have a cumulative numerical average of 90 - 100, you are guaranteed at least an A-, 80 - 89 at least a B-, 70 - 79 at least a C-, and so on.
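As a worked example of the arithmetic, the Python sketch below computes a cumulative average from the weights listed above and looks up the guaranteed minimum letter grade. The scores are invented, the function names are illustrative, and the 60 floor for a D- is an assumed reading of "and so on". A curve can only raise the result above this minimum.

```python
# Weights taken from the grading breakdown above (they sum to 100%).
WEIGHTS = {"homework": 0.45, "exam1": 0.20, "exam2": 0.20, "project": 0.15}

# Guaranteed minimum letter grades. The 90/80/70 floors come from the text;
# the 60 floor is an assumed reading of "and so on".
GUARANTEED_FLOORS = [(90, "A-"), (80, "B-"), (70, "C-"), (60, "D-")]


def weighted_average(scores):
    """Return the cumulative average for component scores on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def guaranteed_minimum(average):
    """Return the lowest letter grade guaranteed for this average, or None."""
    for floor, letter in GUARANTEED_FLOORS:
        if average >= floor:
            return letter
    return None


# Made-up scores for illustration only.
scores = {"homework": 92, "exam1": 85, "exam2": 78, "project": 95}
average = weighted_average(scores)
print(round(average, 2), guaranteed_minimum(average))
```

With these scores the average is 0.45(92) + 0.20(85) + 0.20(78) + 0.15(95) = 88.25, which guarantees at least a B-.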
There are no required textbooks for this course. The following are recommended, both for this course and for your future self.
- Advanced R. Wickham, H. (2019). Chapman and Hall/CRC.
- R for Data Science. Wickham, H., & Grolemund, G. (2017). O'Reilly.
- R Packages. Wickham, H. (2015). O'Reilly.
R / RStudio
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R we will primarily be using RStudio, an integrated development environment (IDE), via a browser-based interface or on your own computer. You may use RStudio Pro on the Department's servers.
To access the DSS RStudio servers, first connect to Duke's network via VPN. Navigate to one of:
To get R / RStudio locally:
When you're writing code, it is nice to have a text editor that is optimized for writing code. There is a huge variety of options out there; if you do not already have a preferred editor, try a few and see which one works best for you.
- Vim / Emacs - old school Unix console-based editors; they have a steep learning curve but are incredibly powerful
- Nano - another Unix console editor, easier learning curve but with much less power
- Sublime Text - cross-platform GUI text editor with a robust plug-in ecosystem
- Atom - similar to Sublime Text but better (in my opinion)
Git is a state-of-the-art version control system. It lets you track who made changes to what and when, and it has options for easily updating a shared or public version of your code on GitHub.
- OSX - install Git for Mac by downloading and running the installer, or install Homebrew and use it to install git via brew install git.
- Unix / Linux - you should be able to install git via your preferred package manager (if it is not already installed).
- Windows - install Git for Windows by downloading and running the Git for Windows installer. This will provide you with git, the bash shell, and ssh on Windows.
Unix shell(s) / ssh
We will be doing much of the work in the class on remote Linux systems. We will primarily interact with these machines through a remote terminal and a shell. Using a shell gives you more power to do more tasks more efficiently with your computer.
- OSX / Unix / Linux - these tools should already be installed and you should be able to access your shell through the Terminal application (name may vary slightly depending on your OS).
- Windows - there are several ways to install bash or a bash-like shell; the preferred method is to install the Git for Windows package as detailed above.
We will be using Slack to facilitate communication and group work. Slack provides a single location for messaging, tools, and files - allowing for efficient collaboration.
Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and non-academic endeavors, and to protect and promote a culture of integrity. Cheating on exams and quizzes, plagiarism on homework assignments and projects, lying about an illness or absence, and other forms of academic dishonesty are a breach of trust with classmates and faculty, violate the Duke Community Standard, and will not be tolerated. Such incidents will result in a 0 grade for all parties involved as well as being reported to the University Judicial Board. Additionally, there may be penalties to your final class grade. Please review Duke’s Standards of Conduct.
Students with disabilities who believe they may need accommodations in this class are encouraged to contact the Student Disability Access Office at (919) 668-1267 as soon as possible to better ensure that such accommodations can be made.
In an emergency, there are several ways that the University will contact you. Two are detailed below. Campus emergency procedures are described here: http://emergency.duke.edu
Text Messaging: An alert message may be sent to the mobile devices of Duke community members who register for a new text messaging system. Sign up for DukeALERT text messages or learn more about text messaging at Duke.
LiveSafe Mobile App: Notifications may be sent through the LiveSafe Mobile app to notify members of the Duke community of emergency situations. The free mobile app, available through the Apple App Store and Android App Store, offers real-time, two-way communication between Duke community members and the Duke University Police Department.