HW 05

You must turn in a knitted file to Gradescope from a Quarto Markdown file in order to receive credit. Be sure to “associate” questions appropriately on Gradescope. As a reminder, late work is not accepted outside of the 24-hour grace period for homework assignments.

The Quarto template for this assignment may be found in the repository at the following link: https://classroom.github.com/a/1VA3G83W

We will use a subset of the diamonds dataset; it is available in your directory. Here we will formally describe some of the variables which you will be using:

price: price in dollars
carat: the carat weight of the diamond
cut: the quality of the cut (in increasing order of desirability, the categories are Fair, Good, Very Good, Premium, Ideal)
color the color of the diamond, graded in letters from J through D (lower letters are better color, so we would prefer a D diamond vs. a J diamond)
clarity: the clarity of the diamond (in increasing order of desirability, the categories are I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF)

Important: Please continue to make regular commits and follow good coding practices (e.g., with not having code run off the page). As well, suppress warnings and messages in your R code chunks.

Note: “log(x)” refers to natural log (base \(e\)). “log2(x)” refers to log base 2. Be careful in the following exercises!

Create a linear model with log(price) (natural log) as the response variable and log2(carat) (base 2), color, and clarity as predictors. Describe the relationship between carat and price (while holding color and clarity constant) as estimated by this model. Make sure your explanation is on the original scales for carat and price (i.e., un-transformed).
Evaluate whether the linear model assumptions for your model are satisfied. You may assume independence is reasonable.
Fit another linear model with log(price) (natural log) as the response variable and log2(carat) (base 2) as the only predictor. Provide and compare the RMSEs of these two models. Which model seems to do better in terms of RMSE? What is the unit of the RMSE of the model?
Compare these two models - is there evidence that additionally including color and clarity in the model is “helpful” somehow? Carry out a formal test of the following hypotheses at the \(\alpha = 0.001\) significance level. If you cannot carry out this test, explain why.:

\(H_0\): All of the slopes corresponding to the color and clarity terms are zero (while adjusting for log2(carat))
\(H_1\): At least one of the slopes corresponding to the color or clarity terms are non-zero (while adjusting for log2(carat))

Consider the model with log(price) as the response variable and only log2(carat) as the predictor. Suppose you wanted to compare it to a model with log(price) as the response and only color/clarity as the predictors. Describe how you might carry out a formal hypothesis test to compare these two models. If you cannot carry out this test, explain why.
Consider the model on page 31 of the slides from Feb. 14. What do you predict happens to price given a one unit change in carat size?

HW 05

STA 210 Spring 2023 (Jiang)

Due: Thursday, February 23, 2023 at 11:59p