class: center, middle, inverse, title-slide

# Meet the toolkit
⚒

---

layout: true

<div class="my-footer">
<span>
Dr. Mine Çetinkaya-Rundel - <a href="http://www2.stat.duke.edu/courses/Fall18/sta112.01/schedule" target="_blank">stat.duke.edu/courses/Fall18/sta112.01 </a>
</span>
</div>

---

class: center, middle

# Course structure and policies

---

## Class meetings

- Interactive
- Some lectures, lots of learn-by-doing
- Bring your laptop to class every day

---

## Diversity & Inclusiveness:

.midi[
**Intent:** Students from all diverse backgrounds and perspectives be well-served by this course, that students' learning needs be addressed both in and out of class, and that the diversity that the students bring to this class be viewed as a resource, strength and benefit. It is my intent to present materials and activities that are respectful of diversity: gender identity, sexuality, disability, age, socioeconomic status, ethnicity, race, nationality, religion, and culture. Let me know ways to improve the effectiveness of the course for you personally, or for other students or student groups.
]

--

- If you have a name and/or set of pronouns that differ from those that appear in your official Duke records, please let me know.
- If you feel your performance is being impacted by your experiences outside of class, please don't hesitate to come and talk with me. If you prefer to speak with someone outside of the course, your academic dean is an excellent resource.
- I (like many people) am still in the process of learning about diverse perspectives/identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to me about it.

---

## How to get help

- Course content, logistics, etc. discussion on course Slack <i class="fab fa-slack-hash"></i>.
- Please post on the #questions channel instead of direct messaging.
- Use proper formatting: When asking questions involving code, please make sure to use inline code formatting for short bits of code or code snippets for longer, multi-line chunks.
  - Formatting messages: https://get.slack.help/hc/en-us/articles/202288908-Format-your-messages
  - Code snippets: https://get.slack.help/hc/en-us/articles/204145658-Creating-a-Snippet
- Getting your questions answered in person is often a much more pleasant experience. Make use of the teaching team's office hours; we're here to help!
- For personal and grade-related questions, direct message me on Slack or use email.

---

## Tips for asking questions

- First search existing discussion for answers. If the question has already been answered, you're done! If it has already been asked but you're not satisfied with the answer, add to the thread.
- Give your question context from course concepts, not course assignments.
  - Good context: "I have a question on filtering"
  - Bad context: "I have a question on HW 1 question 4"
- Be precise in your description:
  - Good description: "I am getting the following error and I'm not sure how to resolve it - `Error: could not find function "ggplot"`"
  - Bad description: "R giving errors, help me! Aaaarrrrrgh!"
- You can edit a question after posting it.

---

## Tips for asking questions

- Format your questions nicely using markdown and code formatting.
- Where appropriate, provide links to specific files, or even lines within them, in the body of your issue. This will help your helper understand your question. Note that only the teaching team will have access to private repos.
- (Optional) Tag someone or some group of people. Start by typing the @ symbol and Slack will generate some suggestions.

---

## Academic integrity

- Only work that is clearly assigned as team work can be completed collaboratively.
- Use of disallowed materials during the take home exam will not be tolerated.

---

## Sharing/reusing code

- I am well aware that a huge volume of code is available on the web to solve any number of problems.
- Unless I explicitly tell you not to use something, the course's policy is that you may make use of any online resources (e.g.
StackOverflow) but you must explicitly cite where you obtained any code you directly use (or use as inspiration).
- Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism.
- On individual assignments you may not directly share code with another student in this class, and on team assignments you may not directly share code with another team in this class.
- Except for the take home exams, you are welcome to discuss the problems together and ask for advice, but you may not send or make use of code from another team.
- On the take home exams all communication with classmates is explicitly forbidden.

---

## Course components:

- Teams: 3-4 person teams, initially based on survey and pretest results, will change throughout the semester
- Application exercises: Usually start in class and finish in teams by the next class period, check/no check
- Homework: Individual, lowest score dropped
- Exams: Individual, two take home midterms
- Final project: Team, presentations during scheduled final exam time, you must participate in the project and be in class to present to pass this class
- Self paced tutorials: Individual, check/no check

---

## Grading

- Weights of each component are given in the syllabus.
- Class attendance is a firm expectation; frequent absences or tardiness will be considered a legitimate cause for grade reduction.
- Cumulative numerical averages of 90 - 100 are guaranteed at least an A-, 80 - 89 at least a B-, and 70 - 79 at least a C-; however, the exact ranges for letter grades will be determined after the final exam.
- The more evidence there is that the class has mastered the material, the more generous the curve will be.

---

## Other policies

- Please refrain from texting or using your computer for anything other than coursework during class.
- You must be in class on a day when you're scheduled to present; there are no make ups for presentations.

---

class: center, middle

# Reproducible data analysis

---

## Reproducibility checklist

.question[
What does it mean for a data analysis to be "reproducible"?
]

--

Near-term goals:

- Are the tables and figures reproducible from the code and data?
- Does the code actually do what you think it does?
- In addition to what was done, is it clear **why** it was done? (e.g., how were parameter settings chosen?)

Long-term goals:

- Can the code be used for other data?
- Can you extend the code to do other things?

---

## Toolkit

- Scriptability `\(\rightarrow\)` R
- Literate programming (code, narrative, output in one place) `\(\rightarrow\)` R Markdown
- Version control `\(\rightarrow\)` Git / GitHub

---

class: center, middle

# R and RStudio

---

## What is R/RStudio?

- R is a statistical programming language
- RStudio is a convenient interface for R (an integrated development environment, IDE)
- At its simplest:<sup>➥</sup>
  - R is like a car’s engine
  - RStudio is like a car’s dashboard

<img src="img/engine-dashboard.png" width="420" style="display: block; margin: auto;" />

.footnote[
➥ Source: [Modern Dive](https://moderndive.com/)
]

---

## Let's take a tour - R / RStudio

<br><br>
<center>
[DEMO]
</center>
<br><br>

Concepts introduced:

- Console
- Using R as a calculator
- Environment
- Loading and viewing a data frame
- Accessing a variable in a data frame
- R functions

---

## R essentials

A short list (for now):

- Functions are (most often) verbs, followed by what they will be applied to in parentheses:

```r
do_this(to_this)
do_that(to_this, to_that, with_those)
```

--

- Columns (variables) in data frames are accessed with `$`:

```r
dataframe$var_name
```

--

- Packages are installed with the `install.packages` function and loaded with the `library` function, once per session:

```r
install.packages("package_name")
library(package_name)
```

---

## tidyverse

.pull-left[

]

.pull-right[
.center[
[tidyverse.org](https://www.tidyverse.org/)
]

- The tidyverse is an opinionated
collection of R packages designed for data science.
- All packages share an underlying philosophy and a common grammar.
]

---

class: center, middle

# R Markdown

---

## R Markdown

- Fully reproducible reports -- each time you knit, the analysis is run from the beginning
- Simple markdown syntax for text
- Code goes in chunks, defined by three backticks; narrative goes outside of chunks

---

## Let's take a tour -- R Markdown

Before we do that...

.question[
What is the Bechdel test?
]

--

The Bechdel test asks whether a work of fiction features at least two women, both named characters, who talk to each other about something other than a man.

--

<br>
<center>
[DEMO]
</center>
<br>

--

Concepts introduced:

- Copying a project of mine
- Knitting documents
- R Markdown and (some) R syntax

---

## R Markdown tips

- Keep the [R Markdown cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) and Markdown Quick Reference (Help -> Markdown Quick Reference) handy; we'll refer to them often as the course progresses
- The workspace of your R Markdown document is separate from the Console

<br><br>
<center>
[DEMO]
</center>
<br><br>

---

## How will we use R Markdown?

- Every assignment / report / project / etc. is an R Markdown document
- You'll always have a template R Markdown document to start with
- The amount of scaffolding in the template will decrease over the semester
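
As a rough sketch of the kind of code that might sit inside one chunk of such a document: the example below assumes the built-in `mtcars` data frame rather than any course dataset, and the object names `cars`, `n_cars`, and `avg_mpg` are made up for illustration.

```r
# mtcars ships with base R (datasets package), so nothing needs installing.
# This is an illustrative stand-in, not a dataset used in the course.
cars <- mtcars

# Functions are verbs applied to whatever is in the parentheses.
n_cars <- nrow(cars)

# Columns (variables) in a data frame are accessed with `$`.
avg_mpg <- mean(cars$mpg)

# Round the result for display in the knitted report.
round(avg_mpg, 1)
```

Because knitting runs the document in a fresh workspace, objects such as `cars` need to be created inside the document itself, not only in the Console.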