Introduction and Data Exploration (40 points)


Stages of the project

You will complete this project in two stages:

  1. Stage 1: Introduction and Data Exploration (25%)
  2. Stage 2: Project and Final Presentation (75%)

The purpose of Stage 1 is for your group to explore the data set you’ll be working with and to propose research questions that you may investigate in Stage 2 of the project. At this point in the semester we haven’t yet covered all of the statistical inference methods needed to complete Stage 2, but you should still read the requirements for both stages before you begin.


Data

For your project, pick one of the following data sets to use for your analyses.

To load your data set, first download the ‘project.zip’ file from the Resources section of Sakai. Next, in RStudio, click ‘Upload’ in the Files pane and upload your data and R Markdown files. Use the Files pane to navigate to the folder where your files are saved, then select ‘Session’ -> ‘Set Working Directory’ -> ‘To Files Pane Location’. Finally, use the load() function to load your data in RStudio. For example:

load(file="ames.RData")

Content

The remainder of this document outlines what your Stage 1 report should contain.

  1. Title: (2 points) Choose an appropriate working title for your project.

  2. Data: (5 points) Describe your data set and discuss your motivation for choosing it.

  3. Research questions: (15 points) Come up with three interesting research questions that you would like to explore with your data. Don’t simply ask, “Is there a relationship between x and y?” Instead, explain why a relationship between x and y is worth studying. Use outside resources to investigate this relationship, and explain what you expect to observe in the data. Finally, expand your question to consider other variables in your data set: how might the relationship between x and y change when we also consider z? Your questions can be based on existing variables in your data set, but you are also free to create new variables from the data. You will have the option to revise or change these questions in Stage 2 of the project.

  4. Resources: (6 points) List the references for at least three resources you used to inform your research questions. These could be news articles, scholarly publications, additional data, etc. Briefly explain the significance of each resource.

  5. EDA: (9 points) Perform an exploratory data analysis that addresses each of your three research questions. Your EDA should contain numerical summaries and visualizations. Your R output and plots should be accompanied by a brief explanation and interpretation of what you observed.

  6. Timeline and Teamwork: (3 points) Much of the work on Stage 2 of the project is typically done after the second midterm. Identify times between then and the project deadline when your group will be able to meet and work on the project and presentation. How will you divide the work for this project? Do you want to do everything together, or assign separate responsibilities? Are there team members who prefer doing the analysis, writing the report, or making the presentation?
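
As a rough illustration of the numerical summaries and visualizations the EDA item asks for, here is a minimal R sketch. It uses R's built-in `mtcars` data purely as a stand-in: the variables `mpg`, `wt`, and `cyl` belong to `mtcars`, not to any project data set, so substitute your own data frame and variable names. It summarizes one variable overall and by group, measures the association between two quantitative variables, and then plots that relationship with a third variable shown by color, in the spirit of asking how the relationship between x and y changes when z is considered.

```r
# Stand-in data: R's built-in mtcars data frame (replace with your own data)
df <- mtcars

# Numerical summary of a single quantitative variable
summary(df$mpg)

# Mean and standard deviation of mpg within each number of cylinders
group_means <- tapply(df$mpg, df$cyl, mean)
group_sds <- tapply(df$mpg, df$cyl, sd)
print(round(cbind(mean = group_means, sd = group_sds), 2))

# Correlation between two quantitative variables (x = weight, y = mpg)
r_xy <- cor(df$wt, df$mpg)
print(r_xy)

# Visualization 1: distribution of mpg for each cylinder group
boxplot(mpg ~ cyl, data = df,
        xlab = "Number of cylinders", ylab = "Miles per gallon")

# Visualization 2: x vs. y, colored by a third variable z (cylinders)
cyl_group <- factor(df$cyl)
plot(df$wt, df$mpg, col = as.integer(cyl_group), pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
legend("topright", legend = levels(cyl_group),
       col = seq_along(levels(cyl_group)), pch = 19, title = "Cylinders")
```

In an R Markdown report, each of these steps would typically sit in its own code chunk, followed by a short written interpretation of the output or plot.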

Format & length

  • Your Stage 1 Project should be written using the R Markdown template, so that all R code, output, and plots will be automatically included in your write up.

  • Download the template for the Stage 1 Project from the Resources folder on Sakai.

  • Your Stage 1 Project should not exceed 10 pages (use a print preview to determine its length). If it is too long, adjust your plot sizes.

Grading

Your Stage 1 Project will be graded out of 40 points (as outlined above), and will make up 25% of your overall project score.

The following will result in deductions:

  • Late submission: -1 point for each day late
  • Reproducibility issues (the R Markdown file must be changed before it will knit): -3 points