Schedule
| Week | Date | Lecture | Readings | Notes |
|---|---|---|---|---|
| 0 | Thu, Jan 10 | Introduction | | |
| 1 | Tue, Jan 15 | Logic in R | OIT Container | |
| | Thu, Jan 17 | Data types in R | | |
| | Fri, Jan 18 | Using git & github | HW1 out - due 1/24 by 11:59 pm | |
| 2 | Tue, Jan 22 | Data Structures & S3 | | |
| | Thu, Jan 24 | Subsetting | | |
| | Thu, Jan 24 | Branches & pull requests | | |
| 3 | Tue, Jan 29 | dplyr | | |
| | Thu, Jan 31 | Tidy data | | |
| 4 | Tue, Feb 5 | purrr | | |
| | Thu, Feb 7 | More purrr | | |
| 5 | Tue, Feb 12 | ggplot2 | | |
| | Thu, Feb 14 | Visual design | Angela and Eric's slides | |
| | Fri, Feb 15 | HW4 out - due 2/22 by 11:59 pm, Article PDF | | |
| 6 | Tue, Feb 19 | ggplot addons | | |
| | Thu, Feb 21 | Regular expressions | | |
| 7 | Tue, Feb 26 | Web scraping | | |
| | Thu, Feb 28 | Web APIs | | |
| 8 | Tue, Mar 5 | Make | | |
| | Thu, Mar 7 | Class Canceled - Scraping LQ | | |
| 9 | Tue, Mar 12 | No class - Spring recess | | |
| | Thu, Mar 14 | No class - Spring recess | | |
| 10 | Tue, Mar 19 | Shiny | | |
| | Thu, Mar 21 | Reactive Data | | |
| 11 | Tue, Mar 26 | Profiling & Parallelization | | |
| | Thu, Mar 28 | Databases & sql | | |
| 12 | Tue, Apr 2 | Bigish data | | |
| | Thu, Apr 4 | SQL and dplyr | | |
| 13 | Tue, Apr 9 | Spatial data | | |
| | Thu, Apr 11 | Spatial data + Modeling | | |
| | Fri, Apr 12 | Homework 7 - Scoring | | |
| 14 | Tue, Apr 16 | Class canceled | | |
| | Thu, Apr 18 | Spark & SparklyR | | |
| | Fri, Apr 19 | Midterm 2 out | due May 3rd, by 11:59 pm | |
| 15 | Tue, Apr 23 | Midterm 2 / Project Session | | |
| 16 | Fri, May 3 | Midterm 2 & Project due by 11:59 pm | | |
Syllabus
Lectures & Lab:
The goal of both the lectures and the labs is for them to be as interactive as possible. My role as instructor is to introduce you to new tools and techniques, but it is up to you to take them and make use of them. Programming is a skill that is best learned by doing, so as much as possible you will be working on a variety of tasks and activities throughout each lecture / lab. Attendance will not be taken during class, but you are expected to attend all lecture and lab sessions and meaningfully contribute to in-class exercises and homework assignments.
Classroom:
Perkins Link 071 (Classroom 5)
- Lecture - Tuesdays & Thursdays 01:25 pm - 02:40 pm
- Lab - Friday 01:25 pm - 02:40 pm
Important Dates:
- Monday, January 21 - Martin Luther King, Jr. Day holiday
- Wednesday, January 23 - Drop/Add ends
- Monday, March 11 to Friday, March 15 - Spring recess
- Wednesday, March 27 - Last day to withdraw
- Friday, May 3 - Final Exam period, 7:00 - 10:00 pm
Teams:
For all of the team-based assignments in this class you will be randomly assigned to teams of 3 or 4 students. These teams will change after each assignment and will otherwise only be changed due to extraordinary circumstances. You will work in these teams during class and on the homework assignment. For team-based assignments, all team members are expected to contribute equally to the completion of each assignment and you will be asked to evaluate your team members after each assignment is due. Failure to adequately contribute to an assignment will result in a penalty to your score relative to the team's overall score.
Homework:
Beyond the in-class activities, you will be assigned larger programming tasks throughout the semester (roughly every other week). These assignments will be completed either individually or collaboratively in a team.
Students are expected to make use of the provided git repository on the course's github page as their central collaborative platform. Commits to this repository will be used as a metric (one of several) of each team member's relative contribution for each homework.
Final Project:
You will form your own team of 3-5 students and will be responsible for the completion of an open-ended final project for this course, the goal of which is to tackle an "interesting" problem using the tools and techniques covered in this class. Additional details on the project will be provided as the course progresses. You will give a 15-minute presentation on your final project in class. You must turn in a final project in order to pass this course.
Exams:
There will be two take-home midterms that you are expected to complete individually. Each exam will ask you to complete a number of small programming tasks related to the material presented in the class. The exams will be written to take between 2 and 5 hours of work. The exact structure and content of the exams will be discussed in more detail before they are assigned. You must complete *both* exams in order to pass this class. In the extremely unlikely case that you are excused from one of the exams, your score on the other exam will be used as the basis for your overall exam score.
Course Announcements:
I will regularly send course announcements by email, so make sure to check your email daily. Email is the easiest way to reach me outside of class, but note that it is much more efficient to answer most questions in person.
Academic integrity:
Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and non-academic endeavors, and to protect and promote a culture of integrity. Cheating on exams, plagiarism on homework assignments, lying about an illness or absence, and other forms of academic dishonesty are a breach of trust with classmates and faculty, violate the Duke Community Standard, and will not be tolerated. Such incidents will result in a 0 grade for all parties involved. Additionally, there may be penalties to your final class grade along with being reported to the Undergraduate Conduct Board.
Please review the Academic Dishonesty policies here.
A note on sharing / reusing code - I am well aware that a huge volume of code is available on the web to solve any number of problems. Unless I explicitly tell you not to use something, the course's policy is that you may make use of any online resources (e.g. StackOverflow), but you must explicitly cite where you obtained any code you directly use (or use as inspiration). Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism. The one exception to this rule is that you may not directly share code with another team in this class. You are welcome to discuss the problems together and ask for advice, but you may not send code to or make use of code from another team.
Excused Absences:
Students who miss a class due to a scheduled varsity trip, religious holiday or short-term illness should fill out an online NOVAP, RHoliday or short-term illness form respectively. Note that these excused absences do not excuse you from assigned homework; it is your responsibility to make alternative arrangements to turn in any assignments in a timely fashion.
Those with a personal emergency or bereavement should speak with their director of graduate studies or their academic dean.
Late work policy:
- late, but same day: -10%
- late, next day: -20%
- 2 days or later: no credit
Grading:
Your final grade will be composed of the following:
- Homework: 50%
- Midterms: 40%
- Final Project: 10%
The exact ranges for letter grades will be curved and cutoffs will be determined at the end of the semester. The more evidence there is that the class has mastered the material, the more generous the curve will be.
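As a rough illustration of how the weights above combine, the sketch below computes a raw weighted score before any curve is applied. The function name `weighted_score` and the sample percentages are hypothetical, and the sketch assumes each component is already expressed as a percentage from 0 to 100. Actual letter grade cutoffs are determined at the end of the semester.

```shell
# Raw weighted score before any curve; inputs are percentages (0-100).
# weighted_score is a hypothetical helper, not part of any course tooling.
weighted_score() {
  # $1 = homework average, $2 = midterm average, $3 = final project score
  awk -v hw="$1" -v mt="$2" -v fp="$3" \
    'BEGIN { printf "%.1f\n", 0.50 * hw + 0.40 * mt + 0.10 * fp }'
}

# Example with made-up scores: 90 on homework, 80 on midterms, 100 on the project.
weighted_score 90 80 100   # prints 87.0
```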
Textbooks
There are no required textbooks for this course. The following textbooks are recommended for supplementary and reference purposes.
- Advanced R - Wickham - Chapman and Hall/CRC, 2014 (978-1466586963)
- R Packages - Wickham - O'Reilly, 2015 (978-1491910597)
- R for Data Science - Grolemund, Wickham - O'Reilly, 2016 (978-1491910399)
Contact Information
Office Hours:
- Prof. Rundel - 204 Old Chemistry - Wednesdays 1-3 pm
- Lisa Lebovici - TBD - TBD
- Jingyi Zhang - TBD - TBD
Recommended Software
Text Editor
When you're writing code, it is nice to have a text editor that is optimized for writing code. There is a huge variety of options out there. If you do not already have a preferred editor, try a few and see which one works best for you.
- vim / emacs - old-school Unix console-based editors; they have a steep learning curve but are incredibly powerful.
- nano - another Unix console editor, with an easier learning curve but much less power.
- SublimeText - cross-platform GUI text editor with a robust plugin ecosystem.
git
Git is a state-of-the-art version control system. It lets you track who made changes to what and when, and it has options for easily updating a shared or public version of your code on github.
- OSX - install Git for Mac by downloading and running the installer or install homebrew and use it to install git via brew install git.
- Unix / Linux - you should be able to install git via your preferred package manager (if it is not already installed).
- Windows - install Git for Windows by downloading and running the Git for Windows installer. This will provide you with git, the bash shell, and ssh on Windows.
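Once git is installed, a quick way to check that it works is to run through a small local workflow. The following is a minimal sketch, not the course's required workflow: it uses a throwaway directory, and the identity, file name, and branch name are hypothetical placeholders.

```shell
set -eu

# Confirm git is installed and on the PATH.
git --version

# Work in a throwaway directory so no real project is touched.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Hypothetical identity, set for this repository only.
git config user.name "Example Student"
git config user.email "student@example.com"

# Record a first change.
echo 'x <- 1:10' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

# Start a branch for further work (hypothetical branch name).
git checkout -q -b feature/summary
echo 'summary(x)' >> analysis.R
git commit -q -am "Summarise x"

# Show who changed what, and when.
git log --format='%an | %ad | %s' --date=short
```

The final `git log` call lists each commit with its author, date, and message, which is the "who made changes to what and when" view described above. Pushing to github is left out because it needs a remote repository and credentials.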
Unix shell(s) / ssh
We will be doing much of the work in the class on remote Linux systems; primarily, we will be interacting with these machines through a remote terminal and a shell. Using a shell gives you more power to do more tasks more efficiently with your computer.
- OSX / Unix / Linux - these tools should already be installed and you should be able to access your shell through the Terminal application (name may vary slightly depending on your OS).
- Windows - there are several ways to install bash or a bash-like shell; the preferred method is to install the Git for Windows package as detailed above.
R / RStudio
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R we will primarily be using RStudio, an integrated development environment (IDE), via a browser-based interface. There is no need to install R or RStudio on your own laptop, but doing so is recommended before the end of the semester.
About this website
This site is built with Hugo and Blogdown. The theme is based on Blackburn and Hugo Conference.