knitr::opts_chunk$set(warning=FALSE,
message=FALSE)
library(tidyverse)
library(readr)
library(broom)
beer <- read_csv("data/beer.csv")
In this analysis, we will analyze the relationship between the amount of alcohol (PercentAlcohol
) and the caloric content (CaloriesPer12Oz
) in domestic beers. Let PercentAlcohol
be the predictor variable and CaloriesPer12Oz
the response variable.
Due to limited class time, we will not do the exploratory data analysis in this example. In practice, however, you should always start with the exploratory data analysis.
You can add your answers to this R Markdown document.
**1. Calculate a regression model to describe the relationship between PercentAlcohol
and CaloriesPer12Oz
. Display the model output.*
model <- lm(CaloriesPer12Oz ~ PercentAlcohol, data=beer)
model %>%
tidy(conf.int=TRUE)
## # A tibble: 2 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 5.75 5.56 1.03 3.03e- 1 -5.23 16.7
## 2 PercentAlcohol 28.6 1.03 27.8 2.76e-65 26.6 30.6
2. Does it make sense to interpret the intercept? Why or why not?
There are non-alocoholic beers, so it is possible to have a meaningful interpretation of the intercept. In our data, however, there are very few beers with less than 3% alcoholic content, so it would not be wise to interpret the intercept. It is not safe to assume the same relationship between PercentAlcohol
and CaloriesPer12Oz
hold for beers with 0% alcohol; this would be extrapolation.
3. Interpret the 95% confidence interval for the slope in the context of the data.
We are 95% confident that the interval (26.557, 30.620) contains the true population slope for PercentAlcohol
. This means we are 95% confident that for every 1% increase in alcohol content, the number of calories (per 12 oz) is expected to increase between 26.557 and 30.620 calories.
4. Find the critical value, \(t^*\), used to calculate the 95% confidence interval. The code below is a guide; uncomment and complete the lines of code to calculate and display the critical value.
n <- nrow(beer)
df <- n-2
crit_val <- qt(0.975,df)
The critical value used to calculate the 95% confident interval for the slope is 1.973934.
5. Interpret the test statistic in the context of the data
The estimated slope of 28.577 is 27.78 standard errors above the hypothesized mean of 0, assuming there is no linear relationship between percent alcohol and calories in domestic beers.
6. How was the p-value calculated? Fill in the code below to calculate the p-value. The code below is a guide; uncomment and complete the lines of code to calculate and display the p-value.
test_statistic <- 27.778990
prob <- 1 - pt(abs(test_statistic),df)
p_value <- 2 * prob
The p-value is 0. Given there is no linear relationship between PercentAlcohol
and CaloriesPer12Oz
, the probability of obtaining a test statistic with magnitude abs(test_statistic)
or more extreme is p_value
.
7. Fill in the code below to calculate the predicted calories and corresponding 90% interval for a single beer with alcohol content of 4.3%.
x0 <- data.frame(PercentAlcohol=4.3)
predict.lm(model,x0,interval="prediction",conf.level=0.9)
## fit lwr upr
## 1 128.6786 91.9253 165.4319
8. Fill in the code below to calculate the predicted calories and corresponding 90% interval for the subset of beers with alcohol content of 4.3%.
x0 <- data.frame(PercentAlcohol=4.3)
predict.lm(model,x0,interval="confidence",conf.level= 0.9)
## fit lwr upr
## 1 128.6786 125.3153 132.0419