The goal of this lab is to use Analysis of Variance (ANOVA) to understand the variation in the price of diamonds that weigh 0.5 carats. Additionally, you will be introduced to new R functions used for wrangling and summarizing data.
Go to the sta210-sp20 organization on GitHub (https://github.com/sta210-sp20). Click on the repo with the prefix lab-04-anova-. It contains the starter documents you need to complete the warmup exercise.
Clone the repo and create a new project in RStudio Cloud.
See Lab 01 for full instructions on getting started.
We will use the following packages in today’s lab.
In today’s lab, we will analyze the `diamonds` dataset from the ggplot2 package. Type `?diamonds` in the console to see a dictionary of the variables in the data set. The primary focus of this analysis will be examining the relationship between a diamond’s cut and price.
Before starting the exercises, take a moment to read more about the diamond attributes on the Gemological Institute of America webpage: https://www.gia.edu/diamond-quality-factor.
The `diamonds` dataset contains the price and other characteristics for over 50,000 diamonds. For this analysis, we will only consider diamonds that have a carat weight of 0.5.
You will use this subset for the remainder of the lab.
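A minimal sketch of one way to build this subset, assuming the dplyr and ggplot2 packages are installed. The name `diamonds_half` is a placeholder chosen here for illustration, not a name the lab requires.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides filter() and the pipe

# Keep only the diamonds whose carat weight is exactly 0.5
diamonds_half <- diamonds %>%
  filter(carat == 0.5)

# Quick checks: how many rows remain, and which carat values are present
nrow(diamonds_half)
unique(diamonds_half$carat)
```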
When using Analysis of Variance (ANOVA) to compare group means, it is ideal to have approximately the same number of observations for each group. Which two levels of `cut` have the fewest observations? Show the code and output used to support your answer.
Recode `cut` so that the two levels with the fewest observations are combined into one level. Be sure to give the new level an informative name and save the results to the data frame. See the forcats reference page for ideas on recoding factor variables. You will use the recoded version of `cut` for the remainder of the lab.
Confirm that the variable `cut` was recoded as expected. Show the code and output used to check the recoding.
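The counting, recoding, and checking steps can be sketched on a small made-up factor. The tibble `toy`, its `size` column, and the level names below are invented for illustration and are not part of the lab data. `fct_collapse()` from forcats is one of several functions that can merge levels, and it is used here only as an example.

```r
library(dplyr)
library(forcats)

# Made-up data: "large" and "huge" are the two rarest levels
toy <- tibble(
  size = factor(c("small", "small", "small",
                  "medium", "medium", "medium",
                  "large", "huge"))
)

# Step 1: count observations per level, most frequent first
toy %>% count(size, sort = TRUE)

# Step 2: merge the two rarest levels into one with an informative name
toy <- toy %>%
  mutate(size = fct_collapse(size, "large or huge" = c("large", "huge")))

# Step 3: confirm the recoding by counting again
size_counts <- toy %>% count(size)
size_counts
```

The same three steps (count, collapse, count again) apply to `cut` once you know which of its levels are rarest in your subset.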
Create a plot to display the relationship between `cut` and `price`. Be sure to include informative axes labels and an informative title.
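One possible plot is a set of side-by-side boxplots. The sketch below assumes the subset is stored in a data frame called `diamonds_half` (a placeholder name) and uses the original, unrecoded levels of `cut`. The labels are examples only, and other plot types may serve equally well.

```r
library(ggplot2)
library(dplyr)

diamonds_half <- diamonds %>% filter(carat == 0.5)

# Boxplots of price for each level of cut
price_by_cut <- ggplot(diamonds_half, aes(x = cut, y = price)) +
  geom_boxplot() +
  labs(
    x = "Cut",
    y = "Price (US dollars)",
    title = "Price by cut for 0.5-carat diamonds"
  )

price_by_cut
```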
Calculate the number of observations along with the mean and standard deviation of `price` for each level of `cut`.
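A grouped summary is one way to get these statistics. This sketch assumes dplyr is available and again uses the placeholder name `diamonds_half` with the unrecoded `cut`. The column names `n`, `mean_price`, and `sd_price` are choices made here.

```r
library(ggplot2)
library(dplyr)

diamonds_half <- diamonds %>% filter(carat == 0.5)

# One row per level of cut: group size, mean price, and standard deviation
price_summary <- diamonds_half %>%
  group_by(cut) %>%
  summarise(
    n = n(),
    mean_price = mean(price),
    sd_price = sd(price)
  )

price_summary
```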
Based on the plots and summary statistics from the previous exercises, does there appear to be a relationship between the cut and price for diamonds that are 0.5 carats? Briefly explain your reasoning.
Are the assumptions for ANOVA satisfied? Comment on each assumption, including an explanation for your reasoning and any summary statistics and/or plots used to make the conclusion.
Display the ANOVA table used to examine the relationship between `cut` and `price` for diamonds that are 0.5 carats.
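In base R, one way to produce an ANOVA table is to fit a linear model with `lm()` and pass it to `anova()`. The sketch below is an illustration under the same assumptions as before: the placeholder name `diamonds_half` and the unrecoded `cut`. The lab may expect a different function or a formatted table.

```r
library(ggplot2)
library(dplyr)

diamonds_half <- diamonds %>% filter(carat == 0.5)

# Fit price as a function of cut, then display the ANOVA table
price_model <- lm(price ~ cut, data = diamonds_half)
anova_table <- anova(price_model)

anova_table
```

The table has one row for `cut` (between-group variation) and one for `Residuals` (within-group variation).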
Use the ANOVA table from the previous question to calculate the sample variance of `price`. Show the code / formula used to calculate the sample variance.
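The sample variance is the total sum of squares divided by the total degrees of freedom, and both totals can be recovered by summing the corresponding columns of the ANOVA table. The sketch below illustrates this under the same assumptions (placeholder name `diamonds_half`, unrecoded `cut`) and compares the result with `var()` as a check.

```r
library(ggplot2)
library(dplyr)

diamonds_half <- diamonds %>% filter(carat == 0.5)
anova_table <- anova(lm(price ~ cut, data = diamonds_half))

# Total sum of squares = between-group SS + within-group SS
ss_total <- sum(anova_table$`Sum Sq`)
# Total degrees of freedom = (K - 1) + (n - K) = n - 1
df_total <- sum(anova_table$Df)

var_from_table <- ss_total / df_total
var_direct <- var(diamonds_half$price)

# The two values should agree up to floating-point error
all.equal(var_from_table, var_direct)
```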
What is \(\hat{\sigma}^2\), the estimated variance of `price` within each level of `cut`?
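In a one-way ANOVA, the within-group variance estimate is the residual mean square, which is the `Mean Sq` entry in the `Residuals` row of the table. The sketch below reads that entry and checks it against `sigma()`, which returns the residual standard error of a fitted model. It uses the placeholder `diamonds_half` and the unrecoded `cut`, so the value will differ from the one you get after recoding.

```r
library(ggplot2)
library(dplyr)

diamonds_half <- diamonds %>% filter(carat == 0.5)
price_model <- lm(price ~ cut, data = diamonds_half)
anova_table <- anova(price_model)

# Residual (within-group) mean square from the ANOVA table
ms_within <- anova_table["Residuals", "Mean Sq"]

# Same quantity via the residual standard error of the model
all.equal(ms_within, sigma(price_model)^2)
```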
State the null and alternative hypotheses for the test conducted using the ANOVA table in Exercise 8. State the hypotheses using both statistical notation and words in the context of the data.
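As a reminder of the notation, the general form of the one-way ANOVA hypotheses for \(K\) groups is sketched below. The symbols \(\mu_1, \dots, \mu_K\) for the group means are a notational choice made here. You still need to identify \(K\) for your recoded `cut` and state the hypotheses in the context of the data.

```latex
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \dots = \mu_K \\
H_a &: \text{at least one } \mu_i \text{ differs from the others}
\end{aligned}
```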
What is your conclusion for the test specified in the previous question? State the conclusion in the context of the data.
You’re done and ready to submit your work! Knit, commit, and push all remaining changes. You can use the commit message “Done with Lab 4!”, and make sure you have pushed all the files to GitHub (your Git pane in RStudio should be empty) and that all documents are updated in your repo on GitHub. Then submit the assignment on Gradescope following the instructions below.
Once your work is finalized in your GitHub repo, you will submit it to Gradescope. Your assignment must be submitted on Gradescope by the deadline to be considered “on time”.
To submit your assignment:
Go to http://www.gradescope.com and click Log in in the top right corner.
Click School Credentials ➡️ Duke NetID and log in using your NetID credentials.
Click on the STA 210 Regression Analysis course.
Click on the assignment, and you’ll be prompted to submit it.
Select your assignment repo and choose “master” for the branch.
Make sure to include the names of all group members who participated in the assignment. Click here for help on adding group members to an assignment.
Click Upload. You should receive an email to confirm that the assignment has been submitted.
Notes:
| Component | Points |
|---|---|
| Exploratory Data Analysis | 15 |
| Analysis of Variance | 21 |
| Additional Analysis | 4 |
| Merge conflict exercise | 3 |
| Lab attendance & participation | 3 |
| Narrative in full sentences & document neatly organized | 2 |
| Commit messages from every member | 2 |
| Total | 50 |