Markdown document, final report, and videos due Monday, Nov 23, at 11:59 PM.

Note: start working on this project early! Exam 03 is the day after the project due date. Allow yourself enough time to study for the exam and complete the project.

No late work is accepted on the final project.

Introduction

TLDR: Pick (or create) a dataset and do something with it. That is your final project.

This project will consist of a written analysis of a health-related dataset of your own choosing or creation and an accompanying video presentation. This dataset may already exist, or you may collect your own data by scraping the web or by other means. The goal is for you to demonstrate proficiency in the techniques we have covered in this class (and beyond, if you like) and apply them to a dataset in a meaningful way.

You don’t have to demonstrate proficiency in every topic we’ve covered (that’d certainly take a herculean effort!). Focus on one aspect of statistical inference or modeling to answer your research question of interest, and back up your report with well-crafted, professional-quality visualizations.

Brief project logistics

The final project will be done in your lab groups. The three hard deliverables for the final project are

  • A written, reproducible report detailing your analysis
  • The R Markdown document corresponding to your report (when knit, should reproduce your report exactly)
  • A pre-recorded video presentation (maximum 10 minutes, strictly enforced) to be uploaded to the class Warpwire page (this will be viewable by other students as well! Instructions are found below)

Due Monday, Nov 23, at 11:59 PM.

The grade breakdown is as follows:

Total 100 pts
Written report 80 pts
Video presentation 20 pts

This grade may be modified depending on team evaluation of lab members’ relative contributions.

Data sources

To give yourself the greatest chance of success with this project, it is important that you choose a manageable dataset. This means that the data should be readily accessible and large enough that multiple relationships can be explored. As such, your dataset:

  • Must have at least 50 observations
  • Must have at least ten variables
  • Must have both categorical and numeric variables

Exceptions can be made but you must speak with me first.
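
A quick way to confirm that a candidate dataset meets the three minimums above is to check it in R before committing to it. The following is only a sketch: check_dataset is a hypothetical helper (not something provided in class), and toy is simulated data used purely for illustration. It assumes categorical variables are stored as character, factor, or logical columns, so a categorical variable coded as 0/1 would be counted as numeric.

```r
# Hypothetical helper: checks a data frame against the listed minimums.
check_dataset <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  is_cat <- vapply(
    df,
    function(x) is.factor(x) || is.character(x) || is.logical(x),
    logical(1)
  )
  c(
    enough_observations = nrow(df) >= 50,
    enough_variables    = ncol(df) >= 10,
    has_numeric         = any(is_num),
    has_categorical     = any(is_cat)
  )
}

# Simulated toy data (illustration only): 60 rows, 9 numeric columns, 1 categorical
set.seed(1)
toy <- as.data.frame(matrix(rnorm(60 * 9), ncol = 9))
toy$group <- sample(c("a", "b"), 60, replace = TRUE)

check_dataset(toy)
```

Any FALSE in the result points to a requirement worth raising before you settle on the dataset.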

All analyses must be done in RStudio, and your final written report and analysis must be reproducible.

If you are using a dataset that comes in a format that we haven’t encountered in class (for instance, a .DAT file), make sure that you are able to load it into RStudio as this can be tricky depending on the source. If you are having trouble, ask for help before it is too late.
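
As a rough illustration of the loading step, the sketch below reads a small .DAT file with base R. The .DAT extension does not imply any particular layout, so this assumes a whitespace-delimited text file with a header row; the file contents and column names are made up for the example. A fixed-width file would instead call for something like read.fwf, and a binary file would need a different approach entirely.

```r
# Write a tiny whitespace-delimited file to a temporary path so the
# example is self-contained (contents are hypothetical).
dat_path <- tempfile(fileext = ".DAT")
writeLines(
  c("id age smoker",
    "1 34 yes",
    "2 51 no",
    "3 47 no"),
  dat_path
)

# Look at the first raw lines to see the delimiter and whether a header exists.
readLines(dat_path, n = 2)

# Read the file, treating the first line as column names.
dat <- read.table(dat_path, header = TRUE, stringsAsFactors = FALSE)
str(dat)
```

Inspecting the raw lines before choosing a reader function is usually the fastest way to find out what kind of file you actually have.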

Do not reuse datasets used in examples/homework/lab from class.

Some resources that may be helpful:

If you are working on an external project with another professor or lab, you are welcome to use that data, provided you obtain permission from your professor or lab.

Project components

Written report

Your written report must be done using R Markdown. All team members must meaningfully contribute to its completion.

The written report is worth 80 points, broken down as

Total 80 pts
Introduction/data 20 pts
Methodology 20 pts
Results 20 pts
Discussion 20 pts

Introduction and data

The introduction should introduce your general research question and your data (where it came from, how it was collected, what are the cases, what are the variables, etc.).

Methodology

The methodology section should include the variables used to address your research question, as well as any useful visualizations or summary statistics. As well, you should introduce and justify the statistical method(s) that you believe will be useful in answering your research question.

Results

Showcase how you arrived at answers to your question using any techniques we have learned in this class (and some beyond, if you’re feeling adventurous). Provide the main results from your analysis. The goal is not to do an exhaustive data analysis (i.e., do not calculate every statistic and procedure you have learned for every variable), but rather let me know that you are proficient at asking meaningful questions and answering them with results of data analysis, that you are proficient in using R, and that you are proficient at interpreting and presenting the results. Focus on methods that help you begin to answer your research questions.

Discussion

This section is a conclusion and discussion. This will require a summary of what you have learned about your research question along with statistical arguments supporting your conclusions. Also, critique your own methods and provide suggestions for improving your analysis. Issues pertaining to the reliability and validity of your data and appropriateness of the statistical analysis should also be discussed here. A paragraph on what you would do differently if you were able to start over with the project or what you would do next if you were going to continue work on the project should also be included.

Style and format do count for this assignment, so please take the time to make sure everything looks good and your data and code are properly formatted.

Video presentation

Sometime before Monday, November 23 at 11:59 PM, you/your group will upload a video presentation of your project to Warpwire. Note that all members must present, and that a ten-minute time limit is strictly enforced. As well, note that this video will be viewable by other students.

For the presentation, you can speak over your slide deck, similar to the lecture content videos. I recommend using Zoom to record your presentation; however, you can use whatever platform works best for your group. Below are a few resources to help you record video presentations:

You will post the presentation video in Warpwire, which is accessible from the course Sakai site (bottom of the left-hand tool bar). To post your video on Warpwire:

  • Click the Warpwire tab in the course Sakai site.
  • Click the “+” and select “Upload files”.
  • Locate the video on your computer and click to upload.
  • Once you’ve uploaded the video to Warpwire, click to share the video and make a copy of the video’s URL. You will need this when you post the video in the discussion forum.

Peer teamwork evaluation

You will be asked to fill out a survey where you rate the contribution and teamwork of each team member by assigning a contribution percentage for each team member. Filling out the survey is a prerequisite for getting credit on the team member evaluation. If you are suggesting that an individual did less than half the expected contribution given your team size (e.g., for a team of four students, if a student contributed less than 12.5% of the total effort), please provide some explanation. If any individual gets an average peer score indicating that this was the case, their grade will be assessed accordingly.
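
The 12.5% figure in the example above comes from halving an equal share of the work (100% / 4 = 25%, and half of that is 12.5%). A small sketch of the same arithmetic for any team size follows; flag_threshold and the contribution percentages are hypothetical and only illustrate the calculation, not how the survey is actually scored.

```r
# Half of an equal share, in percent, for a team of a given size.
flag_threshold <- function(team_size) {
  expected_share <- 100 / team_size
  expected_share / 2
}

flag_threshold(4)
#> [1] 12.5

# Hypothetical contribution percentages for a team of four
contributions <- c(member_a = 30, member_b = 30, member_c = 30, member_d = 10)

# Members whose share falls below the threshold and would need an explanation
names(contributions)[contributions < flag_threshold(length(contributions))]
#> [1] "member_d"
```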

Overall notes

The project is very open-ended. For instance, in creating compelling visualizations of your data in R, there is no limit on what tools or packages you may use. You do not need to visualize all of the data at once. A single high-quality visualization will receive a much higher grade than a large number of poor-quality visualizations.

Before you finalize your write-up, make sure messages and warnings are turned off by using message = FALSE, warning = FALSE in each R code chunk. If there is code that you do not want displayed in your final report, you can suppress its printing by using echo = FALSE in the R code chunk.
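
Rather than repeating these options in every chunk, one common approach is to set them once as defaults near the top of the R Markdown document. The sketch below assumes the knitr package (which R Markdown uses to knit documents) and is meant to be placed inside a setup chunk; it writes FALSE in full because F is only a shorthand that can be reassigned.

```r
# Place inside a setup chunk near the top of the .Rmd file.
# Sets document-wide defaults so messages and warnings stay out of the report.
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE
)

# echo = FALSE can still be set on individual chunks whose code
# should be hidden while their output is kept.
```

Options set on an individual chunk override these defaults, so a single chunk can still show its warnings while you debug.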

Finally, pay attention to details in your write-up and presentation. Neatness, coherency, and clarity will count.

Tips

  • Ask questions if any of the expectations are unclear.

  • Make sure each team member is contributing, both in terms of quality and quantity of contribution.

  • All team members are expected to contribute equally to the completion of this assignment, and group assessments will be given at its completion. Anyone judged not to have contributed sufficiently to the final product will have their grade penalized. While different team members may have different backgrounds and abilities, it is the responsibility of every team member to understand how and why all code and approaches in the assignment work.

  • You will be asked to fill out an anonymous survey where you rate the contribution and teamwork of each team member. This survey may modify the individual grade received by each group member.

Grading

Grading of the project will take into account the following:

  • Content - What is the quality of the research and/or policy question, and how relevant are the data to that question?
  • Correctness - Are statistical procedures carried out and explained correctly?
  • Writing and Presentation - What is the quality of the statistical presentation, writing, and explanations?
  • Creativity and Critical Thought - Is the project carefully thought out? Are the limitations carefully considered? Does it appear that time and effort went into the planning and implementation of the project?

A general breakdown of scoring is as follows:

  • 90%-100%: Outstanding effort. Student understands how to apply all statistical concepts, can put the results into a cogent argument, can identify weaknesses in the argument, and can clearly communicate the results to others.
  • 80%-89%: Good effort. Student understands most of the concepts, puts together an adequate argument, identifies some weaknesses of their argument, and communicates most results clearly to others.
  • 70%-79%: Passing effort. Student misunderstands concepts in several areas, has some trouble putting results together in a cogent argument, and communication of results is sometimes unclear.
  • 60%-69%: Struggling effort. Student is making some effort, but misunderstands many concepts and is unable to put together a cogent argument. Communication of results is unclear.
  • Below 60%: Student is not making a sufficient effort.

Late work policy

There is no late work accepted on this project. Be sure to turn in your work early to avoid any technological mishaps.

If you do not turn in your final project on time, you will not pass the course.