The goal of the final project is to apply what you’ve learned in this course to conduct a statistical analysis. It should be an in-depth regression analysis of a question that interests your group. This question may come from one of your other courses, your research interests, your future career interests, etc.
The project will consist of
It is best to start with the question of interest and find the data second. As you’re looking for data, keep in mind that your regression analysis must be done using R. Once you find a dataset, make sure you are able to load it into RStudio, especially if it is in a format we haven’t used in class. If you’re having trouble loading your dataset into RStudio, ask for help as soon as possible so you can make any necessary adjustments before the project proposal is due.
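As a quick check that a dataset loads, a minimal sketch using readr (part of the tidyverse) might look like the following. The built-in iris data is written to a temporary CSV only so the snippet runs on its own; it is a stand-in, not a suggested project dataset. For other formats, readxl::read_excel() (Excel) or haven::read_dta() and haven::read_sav() (Stata, SPSS) are common alternatives.

```r
library(readr)

# For illustration only: write R's built-in iris data to a temporary CSV so
# this sketch is self-contained. In your project, replace `path` with the
# location of your own file, e.g. "data/your-data.csv" (hypothetical name).
path <- tempfile(fileext = ".csv")
write.csv(iris, path, row.names = FALSE)

# Read the CSV; show_col_types = FALSE suppresses the column-spec message.
df <- read_csv(path, show_col_types = FALSE)

# Quick checks that the data loaded as expected.
dim(df)            # number of rows and columns
sapply(df, class)  # how each column was parsed
```

If the column types look wrong (for example, numbers parsed as text), that is usually the moment to ask for help.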
In order for you to have the greatest chance of success with this project, it is important that you choose a manageable dataset. This means that the data should be readily accessible and large enough that multiple main effects and interactions can be explored in your model. Your dataset must have at least 100 observations and at least 10 variables (exceptions can be made, but you must speak with me first). The dataset should include both quantitative and categorical variables.
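The size and variable-type requirements above can be checked in a few lines once the data is loaded. The helper check_project_data() below is hypothetical (not part of any package) and is only a sketch; it treats numeric columns as quantitative and character or factor columns as categorical, which may not match how every variable in your data is coded.

```r
# Hypothetical helper: checks the minimum project requirements for a data frame.
check_project_data <- function(df, min_obs = 100, min_vars = 10) {
  is_quant <- vapply(df, is.numeric, logical(1))
  is_categ <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  c(
    enough_observations = nrow(df) >= min_obs,
    enough_variables    = ncol(df) >= min_vars,
    has_quantitative    = any(is_quant),
    has_categorical     = any(is_categ)
  )
}

# Example with built-in iris: 150 rows passes, but only 5 columns fails
# the 10-variable requirement.
check_project_data(iris)
```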
You are not permitted to reuse datasets used in examples/homework/labs in class.
The Data Visualization Services team (located in Bostock Library) has written a guide for finding data for a regression analysis. Please visit R Data Resources for Regression Analysis for guidelines to consider as you search for data, along with suggestions for potential data sources.
Other sources that may be helpful:
There are two main purposes of the project proposal:
You will address the following in your proposal:
Use the `glimpse` function to print a summary of the data frame at the end of your proposal.
Put your dataset in the `/data` folder, and add its dimensions (number of observations and variables) and the data dictionary (a description of every variable in the dataset) to the README in that folder. Make sure the data dictionary is neatly formatted and easy to read. If your dataset has a lot of variables (more than 20), you can include 10 of the key variables in the data dictionary in the README, with a link to the original data dictionary for all of the variables.
To give you some experience with the type of workflow you’ll use in practice, you will submit the proposal and all other aspects of the project in your GitHub repo. You do not need to submit anything for the project on Gradescope. We will provide comments and feedback as an “issue” in your GitHub repo.
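A minimal sketch of the proposal’s data steps, using the built-in iris data purely as a stand-in for your own data frame: glimpse() prints the column-by-column summary, nrow() and ncol() give the dimensions for the README, and knitr::kable() turns a skeleton data dictionary into a pipe table you could paste into the README. The "TODO" descriptions are placeholders you would replace with real descriptions of each variable.

```r
library(dplyr)

# Stand-in data; replace with your own data frame.
df <- iris

# Column-by-column summary for the end of the proposal.
glimpse(df)

# Dimensions to report in the data folder README.
n_obs  <- nrow(df)
n_vars <- ncol(df)
cat("Observations:", n_obs, "| Variables:", n_vars, "\n")

# Skeleton data dictionary: variable names and types are filled in
# automatically; the descriptions must be written by hand.
dict <- data.frame(
  variable    = names(df),
  type        = unname(vapply(df, function(x) class(x)[1], character(1))),
  description = "TODO: describe this variable"
)
knitr::kable(dict, format = "pipe")
```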
Component | Points |
---|---|
Narrative | 5 pt |
Data (glimpse & dataset in data folder) | 1 pt |
Data dictionary | 2 pt |
Organization and formatting | 2 pt |
Total | 10 pt |
For this portion of the project, you will conduct the regression analysis and begin to derive your final conclusions. This will become the main portion of your final write up. The regression analysis should include the following:
The regression analysis should be written as a report. Write your analysis in a .Rmd file called “analysis.Rmd” in your project folder on GitHub, and knit it to PDF. See proposal.Rmd for an example YAML header.
In lieu of in-class presentations, we will share presentations on Sakai. A few of your classmates, your TAs, and Professor Tackett will watch your presentation and post questions.
The presentation will consist of two components: (1) Slide deck and (2) Video presentation OR write up.
Create a slide deck presenting the main findings of your analysis. The slide deck should have no more than 6 content slides + 1 title slide. Here is a suggested outline as you think through the slides; you do not have to use this exact format for the 6 slides.
You can use the software of your choice to create your slide deck. Save your slide deck as a PDF or provide a link to view your slides online (e.g., in Google Slides). Be sure you grant the correct access permissions so Professor Tackett, the TAs, and your classmates have access to your slides.
You will create a video presentation OR write-up to accompany your slide deck. One objective of the project is to give you experience presenting a statistical analysis through writing and in an oral presentation. Therefore, I strongly encourage your group to create a video presentation, if possible.
The video should be no more than 10 minutes (most presentations are about 8 to 10 minutes). For the presentation, you can speak over your slide deck, similar to the lecture content videos. I recommend using Zoom to record your presentation; however, you can use whatever platform works best for your group. Below are a few resources to help you record video presentations:
You will post the presentation video in Warpwire, which is accessible from the course Sakai site (bottom of the left-hand toolbar). To post your video on Warpwire:
If it is not feasible for your group to make a video, you can make a write up (up to two pages, single-spaced) to accompany your slide deck. Think of the write up as a transcript of what you would say if you were presenting your results in class or in a video. Save your write up as a PDF. You will upload this write up with your slide deck to the discussion forum in Sakai.
Each team will post their slide deck + video or write up in the discussion forum in Sakai. To post your presentation:
The presentation must be posted on Sakai by Monday, April 27 at 11:59p EDT.
The presentation will be graded based on the following:
Each student will be assigned 3 presentations to watch. After watching the group’s video or reading their slide deck and write up, click “Reply” to post a question for the group. You may not post a question that’s already been asked on the discussion thread. Additionally, the question should (i) be substantive (i.e., it can’t be “Why did you use a histogram instead of a box plot?”), (ii) demonstrate your understanding of the content from the course, and (iii) be relevant to that group’s specific presentation, i.e., demonstrate that you’ve watched the presentation.
You may start posting questions at Tue, April 28 at 12a EDT. All questions must be posted by Wed, April 29 at 11:59p EDT.
This portion of the project will be assessed individually.
The goal of the final write up is to demonstrate your ability to use regression analysis to answer meaningful questions, your proficiency in R, and your proficiency interpreting and presenting results.
The final write up should be no more than 15 pages. This does not include the Additional Work section.
The final write up should be written as if it will be read by a business or research colleague interested in understanding the main results from your analysis. This means your write up should focus on the main conclusions and interesting findings that you derive from your analysis. It should not just be a list of every model you tried and an interpretation of every model coefficient.
You can use the following sections to help organize your write up:
Put the final write up in a .Rmd file called “final-writeup.Rmd” in your project folder on GitHub, and output it as a PDF. The document should be neatly organized, and all code and warning messages should be suppressed, i.e., not visible in the PDF. See Formatting Guidelines & Tips for code and tips to help you format your write up.
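One common way to hide code, warnings, and messages in a knit R Markdown document is to set knitr’s global chunk options once, in a setup chunk near the top of the .Rmd file. This is a minimal sketch assuming the usual knitr workflow; the code in every chunk still runs when you knit, so the results appear in the PDF even though the code does not.

```r
# Place in the first ("setup") chunk of the .Rmd file so it applies to
# every later chunk in the document.
knitr::opts_chunk$set(
  echo    = FALSE,  # do not show source code in the output
  warning = FALSE,  # do not show warnings
  message = FALSE   # do not show messages (e.g., from library())
)
```

An individual chunk can still override these defaults, for example with echo = TRUE in its chunk header, if you want to show one specific piece of code.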
The final write up will be graded based on the following:
Analysis (15 pt): The analysis steps are appropriate for the data and research question. The group used a thoughtful approach to select the final model that took into account potential interaction effects and addressed violations of model assumptions. The model assumptions and diagnostics are thoroughly and accurately assessed. If violations of model assumptions still exist, there was a reasonable attempt to address them, i.e., based on what we’ve learned this semester.
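As one illustration of weighing an interaction effect and checking assumptions, the sketch below compares a main-effects model with an interaction model using a nested F-test, then draws two standard diagnostic plots. It uses the built-in mtcars data only as a stand-in (your project must use its own dataset), and it is one possible approach in base R rather than the required workflow for this course.

```r
# Stand-in data: mtcars ships with base R.
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# Main effects only vs. a model adding a weight-by-transmission interaction.
m_main <- lm(mpg ~ wt + am, data = cars)
m_int  <- lm(mpg ~ wt * am, data = cars)

# Nested F-test: does adding the interaction term improve the fit?
anova(m_main, m_int)

# Diagnostics for the interaction model:
# which = 1 is residuals vs. fitted (linearity, constant variance),
# which = 2 is the normal Q-Q plot of residuals (approximate normality).
par(mfrow = c(1, 2))
plot(m_int, which = 1:2)
```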
Discussion (10 pt): The model fit is clearly assessed, and interesting findings from the model are clearly described. Interpretations of model coefficients are used to support the key findings and conclusions. If the primary modeling objective is prediction, the model’s predictive power is assessed.
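When prediction is the primary objective, one common way to assess predictive power is to fit the model on part of the data and compute the root mean squared error (RMSE) on the held-out part. This is a minimal sketch using the built-in mtcars data as a stand-in; the 75/25 split, the seed, and the predictors are arbitrary choices for illustration, and other measures covered in class may be equally appropriate.

```r
set.seed(210)  # arbitrary seed so the split is reproducible

# Randomly split the rows into a training set (75%) and a test set (25%).
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = round(0.75 * n))
train_set <- mtcars[train_idx, ]
test_set  <- mtcars[-train_idx, ]

# Fit on the training data only, then predict the held-out rows.
fit  <- lm(mpg ~ wt + hp, data = train_set)
pred <- predict(fit, newdata = test_set)

# RMSE on the test set, in the units of the response (mpg).
rmse <- sqrt(mean((test_set$mpg - pred)^2))
rmse
```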
Limitations & Conclusion (5 pt): Overall conclusions from the analysis are clearly described. The group has thoughtfully considered potential limitations of their data or analysis and presented potential ideas to address them.
Organization & Formatting (5 pt): The final write up is neatly organized with clear section headers and appropriately sized figures with informative labels. All code, warnings, and messages are suppressed. Overall, the document would be presentable in a business or research setting. See Formatting Guidelines & Tips for code and tips to format your document.
You will be asked to fill out a survey where you rate the contribution and teamwork of each team member out of 5 points. You will additionally report a contribution percentage for each team member. Filling out the survey is a prerequisite for getting credit on the team member evaluation. If you are suggesting that an individual did less than 20% of the work, please provide some explanation. If any individual gets an average peer score indicating that they did less than 10% of the work, this person’s project grade will be assessed accordingly.
Component | Points |
---|---|
Proposal | 10 pt |
Analysis | 20 pt |
Final Write up | 35 pt |
Presentation | 20 pt |
Watch + comment on presentations | 10 pt |
Team peer evaluation | 5 pt |
Total | 100 pt |
The project will be graded based on the following criteria:
A general breakdown of scoring is as follows:
Late penalty:
Go to the course organization on GitHub: https://www.github.com/sta210-sp20
Find the repo whose name starts with `project` and ends with your team name (this should be the only `project` repo available to you).
Follow the usual instructions for cloning a new project in RStudio.
You’re working in the same repo as your teammates now, so merge conflicts will happen, issues will arise, and that’s fine! Commit and push often, and ask questions when stuck.
Review the grading guidelines and ask questions if any of the expectations are unclear.
Make sure each team member is contributing, both in terms of quality and quantity of contribution (we will be reviewing commits from different team members).
Set aside time to work together both in the same location and remotely.
When you’re done, review the .md document on GitHub to make sure you’re happy with the final state of your work.
Code: In your write up, your code should be hidden (`echo = FALSE`) so that your document is neat and easy to read. However, your document should include all your code such that if I re-knit your Rmd file, I should be able to obtain the results you presented. Exception: If you want to highlight something specific about a piece of code, you’re welcome to show that portion.
Teamwork: You are to complete the assignment as a team. All team members are expected to contribute equally to the completion of this assignment, and group assessments will be given at its completion; anyone judged to not have sufficiently contributed to the final product will have their grade penalized. While different team members may have different backgrounds and abilities, it is the responsibility of every team member to understand how and why all code and approaches in the assignment work.