class: center, middle, inverse, title-slide

# Welcome to Regression Analysis
### Dr. Maria Tackett
### 01.09.19

---

class: center, middle

# Welcome!

---

## What is Regression Analysis?

"In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when <font class="vocab">**the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors')**</font>. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed."

.pull-right[
[- Wikipedia](https://en.wikipedia.org/wiki/Regression_analysis)
]

---

class: regular

## Instructor

[Prof. Maria Tackett](https://www2.stat.duke.edu/~mt324/)

<i class="material-icons">mail_outline</i> [maria.tackett@duke.edu](mailto:maria.tackett@duke.edu)<br>
<i class="material-icons">work_outline</i> Old Chem 118B<br>
<i class="material-icons">calendar_today</i> Tues 10:30a - 12p

--

.pull-left[
<img src="img/00/capital-one-logo.jpg" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="img/00/fbi-fingerprint.jpg" width="100%" style="display: block; margin: auto;" />
]

---

## Teaching Assistants

.pull-left[
[Anna Darwish](https://www.linkedin.com/in/anna-darwish)

<i class="material-icons">mail_outline</i> [anna.darwish@duke.edu](mailto:anna.darwish@duke.edu)<br>
<i class="material-icons">work_outline</i> Old Chem 203B<br>
<i class="material-icons">calendar_today</i> Mon 3p - 5p

<br><br>

[Matty Pahren](https://www.linkedin.com/in/matty-pahren-53345316a)

<i class="material-icons">mail_outline</i> [martha.pahren@duke.edu](mailto:martha.pahren@duke.edu)<br>
<i class="material-icons">work_outline</i> Old Chem 203B<br>
<i class="material-icons">calendar_today</i> Wed 12:30p - 2:30p
]

.pull-right[
[Ethan Shen](https://www.linkedin.com/in/ethan-shen-931010134/)

<i class="material-icons">mail_outline</i> [ethan.shen@duke.edu](mailto:ethan.shen@duke.edu)<br>
<i class="material-icons">work_outline</i> Old Chem 203B<br>
<i class="material-icons">calendar_today</i> Thur 4:30p - 6:30p

<br><br>

[Abbas Zaidi](https://sites.google.com/site/amzaidistatistics/)

<i class="material-icons">mail_outline</i> [abbas.zaidi@duke.edu](mailto:abbas.zaidi@duke.edu)<br>
<i class="material-icons">work_outline</i> Old Chem 203B<br>
<i class="material-icons">calendar_today</i> Mon 5p - 7p
]

---

## Where to find information

- Course website: [http://bit.ly/sta210-sp19](http://bit.ly/sta210-sp19)

- GitHub Site: [https://github.com/STA210-Sp19](https://github.com/STA210-Sp19)

---

## Course Objectives

- Learn and apply methods for analyzing multivariate data sets

- Learn to check whether a proposed statistical model is appropriate for the given data

- Develop proficiency in addressing complex research questions using regression analysis

- Learn the process of data-based research by applying the methods from this course to a final project

---

class: middle, center

## Examples of Regression Analysis

---

## Fingerprint Analysis

.pull-left[
<img src="img/00/fingerprints.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
*We use <font class="vocab">**Analysis of Variance (ANOVA) decomposition**</font> to help determine whether differences between fingerprints are due to circumstance or because the prints were produced by different sources.*
]

<br><br>

<small>Tackett, M. (2018). *Creating Fingerprint Databases and a Bayesian Approach to Quantify Dependencies in Evidence*. PhD dissertation, University of Virginia.</small>

---

### Impact on Educational Achievement

*"Our objectives were to ...
determine whether there are differences in the impact of lead across the EOG [End of Grade] distribution, and elucidate the impact of cumulative childhood social and environmental stress on educational outcomes. <font color=#9B02BD>**Multivariate and quantile regression techniques were employed**.</font>...The effects of environmental and social stressors (especially as they stretch out the lower tail of the EOG distribution) demonstrate the particular vulnerabilities of socioeconomically and environmentally disadvantaged children."*

<br><br>

<small>Miranda, M., Kim, D., Reiter, J., Galeano, M., & Maxson, P. (2009). Environmental contributors to the achievement gap. *NeuroToxicology*, 30, 1019-1024.</small>

---

### FiveThirtyEight March Madness Predictions

*Live Win Probabilities are "derived using <font color=#9B02BD>**logistic regression analysis**</font>, which lets us plug the current state of a game into a model to produce the probability that either team will win the game."*

<br>

.pull-right[
[-"How Our March Madness Predictions Work"](https://fivethirtyeight.com/features/how-our-march-madness-predictions-work/)
]

<br><br>

[2018 March Madness Live Predictions](https://projects.fivethirtyeight.com/2018-march-madness-predictions/)

---

class: middle, center

## Your Turn!

---

## Create a GitHub Account

<small>

.instructions[
Go to https://github.com/, and create an account (unless you already have one).
]

Tips for creating a username from [Happy Git with R](http://happygitwithr.com/github-acct.html#username-advice):

- Incorporate your actual name!
- Reuse your username from other contexts if you can, e.g., Twitter or Slack.
- Pick a username you will be comfortable revealing to your future boss.
- Shorter is better than longer.
- Be as unique as possible in as few characters as possible.
- Make it timeless. Don’t highlight your current university, employer, or place of residence.
- Avoid words laden with special meaning in programming, like `NA`.

.instructions[
Once done, place a blue sticky note on your laptop. If you have questions, place a yellow sticky note.
]

</small>

---

## Join RStudio.cloud

- Go to [http://bit.ly/sta210-sp19-rstudio](http://bit.ly/sta210-sp19-rstudio), and log in with your GitHub credentials.

- You should see a project called *Movie Budgets and Revenues*. Click "Copy"; this will create your copy of the project and launch it.

<br><br>

.instructions[
Once done, place a blue sticky note on your laptop. If you have questions, place a yellow sticky note.
]

---

## Movie Data Analysis

1. Put your name in the author field at the top of the file (in the `yaml`, which we will discuss at a later date). Knit again.

2. Change the genre names in Parts 1 and 2 to genres that interest you. The spelling and capitalization must match what's in the data; use the Appendix to check the correct spelling and capitalization. Knit again. You have made your first data visualization!

<br><br>

.instructions[
Once done, place a blue sticky note on your laptop. If you have questions, place a yellow sticky note.
]

---

## Discussion

Discuss the following with a partner.

1. Consider the plot in Part 1.
    - Describe how movie revenue has changed over time.
    - Suppose we use revenue as a measure of popularity. How has the popularity of each genre changed over time? In other words, are the genres that were most popular in 1986 still the most popular today?

2. Consider the plot in Part 2.
    - Which genre(s) tend to have the highest budgets?
    - In general, what is the relationship between a movie's budget and its total revenue? Are there any genres that show a different relationship between budget and revenue?
---

class: middle, center

## Course Policies

---

## Class Meetings

--

<font class="vocab">Lecture</font>

- Focus on concepts of regression analysis
- Interactive lecture that includes examples and hands-on exercises
- Bring a fully charged laptop to every lecture

--

<font class="vocab">Lab</font>

- Focus on computing using R `tidyverse` syntax
- Apply concepts from lecture to case study scenarios
- Work on labs in teams of 3-4
- Bring a fully charged laptop to every lab

---

## Textbooks

- [An Introduction to Statistical Learning](https://www-bcf.usc.edu/~gareth/ISL/)
    - Free PDF on author's website. Hard copy available for purchase.
    - Assigned readings about statistical concepts
    - **NOT** used for coding

- [R for Data Science](http://r4ds.had.co.nz/)
    - Free online version. Hard copy available for purchase.
    - Assigned readings and resource for R coding using `tidyverse` syntax

---

## Activities & Assessments

- <font class="vocab">Homework</font>: Individual assignments combining conceptual and computational skills. *Lowest score will be dropped.*

--

- <font class="vocab">Labs</font>: Team assignments focusing on computational skills. *Lowest score will be dropped.*

--

- <font class="vocab">Exams</font>: Two in-class exams.

--

- <font class="vocab">Final Project</font>: Team project presented during the final exam period, **May 1, 2p - 5p**. You must complete the project and present it in class to pass the course.

--

- <font class="vocab">Teamwork</font>: Teams of 3-4 formed based on survey and pretest results. Teams stay the same throughout the semester, with periodic peer evaluations.

---

## Grade Calculation

| Component     | Weight |
|---------------|--------|
| Homework      | 25%    |
| Labs          | 15%    |
| Exam I        | 20%    |
| Exam II       | 20%    |
| Final Project | 15%    |
| Teamwork      | 5%     |

--

- You are expected to attend lectures and labs. Excessive absences or tardiness can impact your final course grade.

- Exact grade cut-offs will be determined after Exam II.
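--

A sketch of how the weights combine into a course grade, assuming each component is recorded as a percentage from 0 to 100 and the homework and lab components are averages taken after the lowest score is dropped. The symbols below are hypothetical shorthand: $H$ (homework), $L$ (labs), $E_1$ and $E_2$ (exams), $P$ (final project), and $T$ (teamwork).

$$G = 0.25H + 0.15L + 0.20E_1 + 0.20E_2 + 0.15P + 0.05T$$

For example, hypothetical scores of $H = 90$, $L = 85$, $E_1 = 80$, $E_2 = 88$, $P = 92$, and $T = 100$ give $G = 22.5 + 12.75 + 16 + 17.6 + 13.8 + 5 = 87.65$. The weights sum to 100%, so $G$ stays on the same 0 to 100 scale as the components.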
---

## Excused Absences

- Students who miss a class due to a scheduled varsity trip, religious holiday, or short-term illness should fill out the respective form.

- These excused absences do not excuse you from assigned work.

--

- If you have a personal or family emergency or chronic health condition that affects your ability to participate in class, please contact your academic dean’s office.

--

- Exam dates cannot be changed, and no make-up exams will be given.

---

## Late Work & Regrade Requests

- Homework assignments:
    - Late but within 24 hours of deadline: 20% penalty
    - Not accepted if submitted any later

- Late work will not be accepted for the final project.

- Regrade requests must be submitted within three days of when the assignment is returned, using the link posted in the course syllabus.

---

## Academic Honesty

All work for this class should be done in accordance with the Duke Community Standard.

> To uphold the Duke Community Standard:
>
> - I will not lie, cheat, or steal in my academic endeavors;
> - I will conduct myself honorably in all my endeavors; and
> - I will act if the Standard is compromised.

Any violation will automatically result in a grade of 0 on the assignment and will be reported to the [Office of Student Conduct](https://studentaffairs.duke.edu/conduct) for further action.

---

## Reusing Code

- Unless explicitly stated otherwise, you may make use of online resources (e.g., StackOverflow) for coding examples on assignments. If you directly use code from an outside source (or use it as inspiration), you must explicitly cite where you obtained the code. Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism.

- On individual assignments, you may discuss the assignment with one another; however, you may not directly share code or write-ups with other students.

- On team assignments, you may not directly share code or write-ups with another team. Unauthorized sharing of code or write-ups will be considered a violation for all students involved.

---

## Where to find help

- I encourage you to attend office hours! It is often easier to discuss the course content in person than online.

- Use Piazza for general questions about course content and/or assignments, since other students may benefit from the response.

- Use email for questions regarding personal matters and/or grades.

---

## Technology

- You should bring a laptop to every lecture and lab session. Outlets are limited, so make sure it is fully charged.

- Ensure the volume on all devices is set to mute.

- Refrain from engaging in activities not related to the class discussion. Browsing the web and social media, excessive messaging, playing games, etc. are a distraction not only for you but also for everyone around you.

---

## Inclusion

This course is designed to be welcoming and accessible to all students. If there is some aspect of class that is not welcoming or accessible to you, please let me know immediately. Additionally, if you are experiencing something outside of class that is affecting your performance in the course, please feel free to talk with me or your academic dean.

---

class: center, middle

## Questions?

---

## Announcements

- Office hours start next week.
    - See me after class or schedule an appointment if you need to meet this week.

- Labs start on Friday. Bring a fully charged laptop to lab.