You must turn in a knitted file to Gradescope from a Quarto Markdown file in order to receive credit. Be sure to “associate” questions appropriately on Gradescope. As a reminder, late work is not accepted outside of the 24-hour grace period for homework assignments.

The Quarto template for this assignment may be found in the repository at the following link: https://classroom.github.com/a/1VA3G83W

We will use a subset of the diamonds dataset; it is available in your directory. Here we will formally describe some of the variables which you will be using:

Important: Please continue to make regular commits and follow good coding practices (e.g., with not having code run off the page). As well, suppress warnings and messages in your R code chunks.

Note: “log(x)” refers to natural log (base \(e\)). “log2(x)” refers to log base 2. Be careful in the following exercises!

  1. Create a linear model with log(price) (natural log) as the response variable and log2(carat) (base 2), color, and clarity as predictors. Describe the relationship between carat and price (while holding color and clarity constant) as estimated by this model. Make sure your explanation is on the original scales for carat and price (i.e., un-transformed).
  2. Evaluate whether the linear model assumptions for your model are satisfied. You may assume independence is reasonable.
  3. Fit another linear model with log(price) (natural log) as the response variable and log2(carat) (base 2) as the only predictor. Provide and compare the RMSEs of these two models. Which model seems to do better in terms of RMSE? What is the unit of the RMSE of the model?
  4. Compare these two models - is there evidence that additionally including color and clarity in the model is “helpful” somehow? Carry out a formal test of the following hypotheses at the \(\alpha = 0.001\) significance level. If you cannot carry out this test, explain why.:
  1. Consider the model with log(price) as the response variable and only log2(carat) as the predictor. Suppose you wanted to compare it to a model with log(price) as the response and only color/clarity as the predictors. Describe how you might carry out a formal hypothesis test to compare these two models. If you cannot carry out this test, explain why.
  2. Consider the model on page 31 of the slides from Feb. 14. What do you predict happens to price given a one unit change in carat size?