use_git_config()
function.
Here are some tips as you complete HW 02:
We will use the following packages in this assignment:
library(tidyverse)
library(broom)
library(knitr)
The Computations & Concepts section of homework contains short answer questions about the concepts discussed in class. Some of these questions may also require short chunks of code to produce the output needed to answer the question. Answers should be written in complete sentences.
dfw <- 727 # degrees of freedom within (residuals)
dft <- 730 # total degrees of freedom
ssw <- 6.486 # sum of squares within (residuals)
sst <- 19.386 # total sum of squares
Fill in the missing values in the ANOVA table, and use the code to display the completed table. Once you have filled in the values for the table, change the eval= chunk option to TRUE so the code will be evaluated and the results will be displayed.
dfb <- # degrees of freedom between (model)
ssb <- # sum of squares between (model)
msb <- # mean square between
msw <- # mean square within (residuals)
f_stat <- # F-statistic
p_val <- 1-pf(____, _____, _____) # p-value
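As a reminder of how the entries of a one-way ANOVA table relate to one another, here is a minimal sketch using made-up summary values (these are not the HW 02 numbers, and every variable name below is hypothetical). It assumes the usual identities: the between and within parts add up to the total, each mean square is a sum of squares divided by its degrees of freedom, and the p-value is the upper tail of the F distribution.

```r
# Hypothetical summary values, NOT the ones given in this assignment
df_within <- 20    # residual (within-group) degrees of freedom
df_total  <- 23    # total degrees of freedom
ss_within <- 40    # residual (within-group) sum of squares
ss_total  <- 100   # total sum of squares

# Between-group pieces follow by subtraction
df_between <- df_total - df_within
ss_between <- ss_total - ss_within

# Mean squares: sum of squares divided by the matching degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F-statistic: ratio of the two mean squares
f_value <- ms_between / ms_within

# p-value: upper-tail probability, equivalent to 1 - pf(f_value, df_between, df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# The number of groups is the between-group degrees of freedom plus one
n_groups <- df_between + 1
```

Using lower.tail = FALSE is only a stylistic alternative to subtracting from 1; both give the same probability.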
Use the code below to combine all of the values and print an ANOVA table.
source <- c("Between Groups", "Within Groups", "Total")
df <- c(dfb, dfw, dft)
ss <- c(ssb, ssw, sst)
ms <- c(msb, msw, NA)
f.statistic <- c(f_stat, NA, NA)
p.value <- c(p_val, NA, NA)
# combine the columns to make a table called "anova"
anova <- bind_cols("Source" = source, "df" = df, "Sum of squares" = ss,
                   "Mean square" = ms, "F-statistic" = f.statistic, "p-value" = p.value)
# print the table
kable(anova)
Use the table you created in the previous question. How many groups are there? Is there sufficient evidence against the claim that the group population means are equal? Briefly explain.
A linear regression model was used to describe the relationship between a quantitative predictor variable \(x\) and a response variable \(y\). The output from the Analysis of Variance is shown below. Use the table to answer Questions 3 - 7. You can assume all assumptions for linear regression (and thus ANOVA) are sufficiently met.
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
x | 1 | 0.3569372 | 0.3569372 | 3.187972 | 0.0757126 |
Residuals | 198 | 22.1688128 | 0.1119637 | NA | NA |
How many observations are in the dataset used to conduct this analysis?
What is the estimate of \(\sigma^2\)?
Use the ANOVA table to calculate the proportion of the variation in \(y\) that is explained by \(x\). Show how you calculated this value in your R code, i.e. use R as a “calculator”.
What does the F-statistic mean? In other words, how is the F-statistic calculated?
We can use this table to test the following hypotheses: \[H_0: \beta_1 = 0 \hspace{5mm} \text{ vs }\hspace{5mm} H_a: \beta_1 \neq 0\] State the conclusion of the test in terms of the relationship between \(x\) and \(y\). Briefly describe how you came to this conclusion.
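The quantities asked about in Questions 3 - 7 can all be read off or computed from a regression ANOVA table. The sketch below illustrates this on simulated data rather than on the table above; the seed, sample size, coefficients, and variable names are all assumptions made for illustration. It uses base R's anova() on an lm() fit and treats R as a calculator.

```r
set.seed(210)                        # arbitrary seed so the sketch is reproducible
n <- 50                              # hypothetical sample size
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)  # simulated response

fit <- lm(y ~ x)
tab <- anova(fit)                    # first row: x, second row: Residuals

df_model <- tab$Df[1]
df_resid <- tab$Df[2]
ss_model <- tab$`Sum Sq`[1]
ss_resid <- tab$`Sum Sq`[2]

# Sample size: the total degrees of freedom are n - 1
n_obs <- df_model + df_resid + 1

# Estimate of sigma^2: the residual mean square
sigma2_hat <- ss_resid / df_resid

# Proportion of variation in y explained by x: model SS over total SS
r_squared <- ss_model / (ss_model + ss_resid)

# F-statistic: model mean square divided by residual mean square
f_value <- (ss_model / df_model) / sigma2_hat
```

Each hand-computed value can be checked against summary(fit), which reports the residual standard error, R-squared, and F-statistic for the same model.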
The Data Analysis section of homework contains open-ended data analysis questions. Your response should be neatly organized and read as a complete narrative. This means that in addition to addressing the question there should also be exploratory data analysis and an analysis of the model assumptions. In short, these questions should be treated as “mini-projects”.
We will use the diamonds dataset for this analysis. Use linear regression to describe the relationship between the price and carat of diamonds that cost $1000 or less and have a “Good” cut. Your analysis should address the following:
Whether carat sufficiently explains the variation in price.
As usual, be sure to include exploratory data analysis and an analysis of the model assumptions.
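One possible starting point for this analysis is sketched below. It assumes the diamonds data that ships with ggplot2 (loaded by tidyverse) and uses hypothetical object names (good_cheap, price_fit, resid_data); the choice of plots is only a suggestion, not a required workflow.

```r
library(tidyverse)  # dplyr for filtering, ggplot2 for plots and the diamonds data
library(broom)      # tidy(), glance(), augment()

# Keep diamonds that cost $1000 or less and have a "Good" cut
good_cheap <- diamonds %>%
  filter(price <= 1000, cut == "Good")

# Exploratory plot of the relationship between carat and price
ggplot(good_cheap, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3) +
  labs(x = "Carat", y = "Price (USD)")

# Simple linear regression of price on carat
price_fit <- lm(price ~ carat, data = good_cheap)
tidy(price_fit)    # coefficient estimates, standard errors, p-values
glance(price_fit)  # model-level summaries such as R-squared

# Residuals versus fitted values, one way to start checking model assumptions
resid_data <- augment(price_fit)
ggplot(resid_data, aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted price", y = "Residual")
```

The written narrative, the interpretation of the output, and any further assumption checks still need to be supplied around code like this.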
Component | Points |
---|---|
Questions 1 - 7 | 30 |
Question 8 | 30 |
Documents complete and neatly organized (Markdown and knitted documents) | 5 |
Answers written in complete sentences | 3 |
Regular and informative commit messages | 2 |
Total | 70 |