The goal of the final project is to apply what you’ve learned in this course to conduct a statistical analysis. It should be an in-depth regression analysis of a question that interests your group. This question may come from one of your other courses, your research interests, your future career interests, etc.
The project will consist of two components:
It is best to start with the question of interest and find the data second. As you’re looking for data, keep in mind your regression analysis must be done in RStudio. Once you find a data set, you should make sure you are able to load it into RStudio, especially if it is in a format we haven’t used in class before. If you’re having trouble loading your data set into RStudio, ask for help as soon as possible, so you can make any necessary adjustments before the project proposal is due.
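A quick load check can be sketched as follows. This is only an illustration: the file here is a tiny temporary CSV created so the example runs on its own, and the column names are made up. In practice you would point `path` at your own file.

```r
# Hypothetical example: write a tiny CSV to a temporary file so the sketch
# runs on its own; replace `path` with the location of your own file.
path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score", "1,a,3.5", "2,b,4.1", "3,a,2.8"), path)

# Base R reader for comma-separated files.
my_data <- read.csv(path, stringsAsFactors = FALSE)

# Quick checks that the load worked as expected.
dim(my_data)    # number of rows and columns
head(my_data)   # first few rows
```

For other formats, packages such as readxl (`read_excel()`) or haven (`read_sav()`, `read_dta()`) may help, assuming they are installed in your workspace.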
In order for you to have the greatest chance of success with this project it is important that you choose a manageable dataset. This means that the data should be readily accessible and large enough that multiple main effects and interactions can be explored for your model. As such, your dataset must have at least 100 observations and at least 10 variables (exceptions can be made but you must speak with me first). The data set should include both quantitative and categorical variables.
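One way to check a candidate data set against these requirements is sketched below. The helper `check_project_data()` is hypothetical (it is not part of the course materials), and the data are simulated purely for illustration.

```r
# Hypothetical helper (not part of the course materials): checks a data frame
# against the size and variable-type requirements described above.
check_project_data <- function(df, min_obs = 100, min_vars = 10) {
  is_quant <- vapply(df, is.numeric, logical(1))
  is_cat <- vapply(
    df,
    function(x) is.factor(x) || is.character(x) || is.logical(x),
    logical(1)
  )
  c(
    enough_observations = nrow(df) >= min_obs,
    enough_variables    = ncol(df) >= min_vars,
    has_quantitative    = any(is_quant),
    has_categorical     = any(is_cat)
  )
}

# Simulated data purely for illustration: 9 numeric columns plus 1 grouping column.
set.seed(1)
example_df <- data.frame(
  matrix(rnorm(120 * 9), nrow = 120),
  group = sample(c("a", "b", "c"), 120, replace = TRUE)
)
check_project_data(example_df)
```

Note that a categorical variable stored as numeric codes would be counted as quantitative by this check, so it is a first pass rather than a substitute for reading the data dictionary.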
Do not reuse datasets used in examples/homework/labs in class.
The Data Visualization Services team (located in Bostock library) has written a guide for finding data for a regression analysis. Please visit R Data Resources for Regression Analysis for guidelines to consider as you search for data along with suggestions for potential data sources.
This is a draft of the introduction section of your project as well as a regression analysis plan and your dataset. Each section should be no more than 1 page (excluding figures). You can check a print preview to confirm length. Your write up and all typesetting must be done using R Markdown.
There are two main purposes of the project proposal:
The proposal should include the following:
In the introduction, you will introduce the research question you wish to explore. This includes the motivation for the question (citing any relevant literature), and your hypothesis/hypotheses regarding your question of interest.
In this section, you will describe your data set, including the data source. This section will include
Place your data in the /data folder, and add the dimensions (number of observations and variables) and the data dictionary (a description of every variable in the dataset) to the README in that folder. Then use the glimpse function to print a summary of the data frame at the end of your proposal.
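A minimal sketch of getting the dimensions and printing the summary is shown here. The data frame `project_data` is a made-up stand-in for your own data, and the fallback to `str()` is only there so the example runs even if dplyr is not installed.

```r
# Simulated stand-in for a project data set; replace with your own data frame.
project_data <- data.frame(
  age = c(23, 35, 41),
  region = c("north", "south", "east"),
  stringsAsFactors = FALSE
)

# Dimensions to report in the README: number of observations and variables.
n_obs <- nrow(project_data)
n_vars <- ncol(project_data)

# glimpse() comes from dplyr; fall back to base R's str() if it is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(project_data)
} else {
  str(project_data)
}
```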
Include the appropriate references for any outside literature.
Total | 20 pts |
---|---|
Introduction | 5 pts |
Data analysis plan | 10 pts |
Data | 2 pts |
Document organization and writing | 3 pts |
The goal of the write up is to demonstrate that you can ask meaningful questions and answer them with the results of a regression analysis, that you are proficient in using R, and that you can interpret and present the results clearly. Focus on methods that help you begin to answer your research questions. You do not have to apply every statistical procedure we learned. Also pay attention to your presentation. Neatness, coherence, and clarity will count.
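As a rough illustration of the kind of analysis the write up might report, the sketch below fits a linear model with two main effects and their interaction. Everything here is an assumption for illustration: the data are simulated, and the variable names `hours`, `program`, and `score` are made up.

```r
# Simulated data purely for illustration; variable names are hypothetical.
set.seed(210)
n <- 150
sim <- data.frame(
  hours = runif(n, 0, 10),
  program = factor(sample(c("A", "B"), n, replace = TRUE))
)
sim$score <- 50 + 3 * sim$hours + 5 * (sim$program == "B") +
  1.5 * sim$hours * (sim$program == "B") + rnorm(n, sd = 4)

# Main effects for hours and program, plus their interaction.
fit <- lm(score ~ hours * program, data = sim)

# Coefficient estimates with standard errors, and 95% confidence intervals.
summary(fit)$coefficients
confint(fit)
```

The interaction term (`hours:programB`) estimates how the slope for `hours` differs between the two programs, which is the sort of result you would then interpret in context.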
You can add sections as you see fit to the template in your project repo. At a minimum, your write up should have the following sections:
Before you finalize your write up, make sure the code in your chunks is hidden by including echo = FALSE in the header of each code chunk. This will hide the R code in the .md file of your final write up.
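As an alternative to editing every chunk header, the option can be set once for the whole document. This is a sketch of a common knitr pattern, not something the project template is known to require; it would go in a setup chunk near the top of the .Rmd file.

```r
# Place in a setup chunk near the top of the .Rmd file.
# Assumes the knitr package is installed (it is used whenever an Rmd is knit).
if (requireNamespace("knitr", quietly = TRUE)) {
  # Hide R code in all later chunks unless a chunk sets echo = TRUE itself.
  knitr::opts_chunk$set(echo = FALSE)
}
```

An individual chunk can still override the global setting with echo = TRUE in its own header if you want to show a specific piece of code.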
The main part of the write up (sections 1 - 4) should be no more than 3 pages. The Additional Work section may be up to 5 pages.
Your presentation should be no longer than 8 minutes, and each team member should say something substantial.
You can use any software you like for your final presentation, including R Markdown to create your slides. There isn’t a limit to how many slides you can use, just a time limit (8 minutes total). Each team member should get a chance to speak during the presentation. Your presentation should not just be an account of everything you tried (“then we did this, then we did this, etc.”), instead it should convey what choices you made, and why, and what you found.
Presentation schedule: All teams will present during the university scheduled time of the final exam for this course, on Wed, May 1 2p - 5p. The presentation schedule is below; however, the write up and presentations are due at 2p for all teams.
Your submission should include your data (in the /data folder) and your presentation (in the /presentation folder).
Style and format do count for this assignment, so please take the time to make sure everything looks good and your data and code are properly formatted.
Total | 100 pts |
---|---|
Proposal | 20 pts |
Presentation | 25 pts |
Write up | 35 pts |
Classmates’ presentation scores | 5 pts |
Team peer evaluation | 10 pts |
Repo and document organization | 5 pts |
Team peer evaluation: You will be asked to fill out a survey where you rate the contribution and teamwork of each team member out of 10 points. You will additionally report a contribution percentage for each team member. Filling out the survey is a prerequisite for getting credit on the team member evaluation. If you are suggesting that an individual did less than 20% of the work, please provide some explanation. If any individual gets an average peer score indicating that they did less than 10% of the work, this person will receive half the grade of the rest of the group.
The project will be graded based on the following criteria:
A general breakdown of scoring is as follows:
Late penalty:
Go to the course organization on GitHub: https://github.com/Sta210-Sp19.
Find the repo starting with project and that has your team name at the end (this should be the only project repo available to you).
In the repo, click on the green Clone or download button, select Use HTTPS (this might already be selected by default, and if it is, you’ll see the text Clone with HTTPS). Click on the clipboard icon to copy the repo URL.
Go to RStudio Cloud and into the course workspace. Create a New Project from Git Repo. You will need to click on the down arrow next to the New Project button to see this option.
Copy and paste the URL of your assignment repo in the dialog box and click OK.
You’re working in the same repo as your teammates now, so merge conflicts will happen, issues will arise, and that’s fine! Commit and push often, and ask questions when stuck.
Review the grading guidelines and ask questions if any of the expectations are unclear.
Make sure each team member is contributing, both in terms of quality and quantity of contribution (we will be reviewing commits from different team members).
Set aside time to work together both in the same location and remotely.
When you’re done, review the .md document on GitHub to make sure you’re happy with the final state of your work.
Code: In your write up, your code should be hidden (echo = FALSE) so that your document is neat and easy to read. However, your document should include all your code, so that if I re-knit your Rmd file I can obtain the results you presented. Exception: If you want to highlight something specific about a piece of code, you’re welcome to show that portion.
Teamwork: You are to complete the assignment as a team. All team members are expected to contribute equally to the completion of this assignment, and group assessments will be given at its completion. Anyone judged not to have contributed sufficiently to the final product will have their grade penalized. While different team members may have different backgrounds and abilities, it is the responsibility of every team member to understand how and why all code and approaches in the assignment work.