Packages and data

library(tidyverse)
library(infer)

manhattan <- read_csv("data/manhattan.csv")
mb_yawn <- read_csv("data/mb-yawn.csv")
set.seed(45618)

Exercises

  1. Analyze the Manhattan data. Is there enough evidence to suggest that the mean monthly rent of a one-bedroom apartment is greater than $2,400? Why or why not?

  2. Analyze the Manhattan data. Is there enough evidence to suggest that the median monthly rent of a one-bedroom apartment is greater than $2,600? Why or why not?

  3. Reproduce the analysis with the yawning data. Is there enough evidence to suggest that yawning and observing someone yawn are not independent? Why or why not?

Exercise 1

Consider the hypothesis test:

\[H_0: \mu = 2400\]
\[H_A: \mu > 2400\]
where \(\mu\) is the population mean monthly rent of a one-bedroom apartment in Manhattan. Let \(\alpha = 0.05\).

xbar_rent <- manhattan %>% 
   specify(response = rent) %>% 
   calculate(stat = "mean") %>% 
   pull(stat)

xbar_rent
## [1] 2625.8
null_dist_xbar <- manhattan %>% 
   specify(response = rent) %>% 
   hypothesize(null = "point", mu = 2400) %>% 
   generate(reps = 10000, type = "bootstrap") %>% 
   calculate(stat = "mean")
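As we understand infer's point-null bootstrap, `hypothesize(null = "point", mu = 2400)` followed by `generate(type = "bootstrap")` shifts the rents so that their mean equals 2400 before resampling, which makes the bootstrap distribution behave as if \(H_0\) were true. The base-R sketch below illustrates that idea; `null_boot_means()` is a hypothetical helper of ours, not infer's actual implementation.

```r
# Hypothetical helper (not part of infer): simulate a null distribution of
# sample means by shifting the data to the null value, then bootstrapping.
null_boot_means <- function(x, mu0, reps = 10000) {
  # Shift the sample so its mean equals the hypothesized mean mu0
  shifted <- x - mean(x) + mu0
  # Resample n values with replacement and record each resample's mean
  replicate(reps, mean(sample(shifted, size = length(x), replace = TRUE)))
}

# Usage with the data loaded above:
# null_means <- null_boot_means(manhattan$rent, mu0 = 2400)
# hist(null_means)
```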
null_dist_xbar %>% 
   visualise(alpha = .5) +
   geom_vline(xintercept = 2400, color = "purple", lty = 2, linewidth = 1) +
   theme_minimal(base_size = 16) +
   shade_p_value(obs_stat = xbar_rent, direction = "greater")

null_dist_xbar %>% 
   get_p_value(obs_stat = xbar_rent, direction = "greater")
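For a right-tailed test, the simulation p-value is the proportion of null statistics at least as large as the observed statistic; we believe this is what `get_p_value()` reports for `direction = "greater"`. A base-R equivalent, using the hypothetical helper name `p_value_greater()`:

```r
# Hypothetical helper: right-tailed simulation p-value,
# i.e. the share of null statistics >= the observed statistic
p_value_greater <- function(null_stats, obs_stat) {
  mean(null_stats >= obs_stat)
}

# Usage with the objects created above:
# p_value_greater(null_dist_xbar$stat, xbar_rent)
```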

Since the p-value is greater than \(\alpha\), we fail to reject the null hypothesis at the 0.05 significance level. Hence, we do not have enough evidence to suggest that the mean rent exceeds $2,400 per month.
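As a cross-check outside the simulation framework used here, the classical one-sample t-test addresses the same one-sided hypotheses. It is a swapped-in parametric method, not part of the original analysis, and it assumes the rents are roughly normal or the sample is large enough (an assumption not checked here), so treat its p-value as a comparison rather than a replacement.

```r
# Classical one-sample t-test of H0: mu = mu0 vs HA: mu > mu0
# (parametric alternative to the infer workflow above)
t_test_greater <- function(x, mu0) {
  t.test(x, mu = mu0, alternative = "greater")$p.value
}

# Usage:
# t_test_greater(manhattan$rent, mu0 = 2400)
```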

Confidence interval comparison

manhattan %>% 
   specify(response = rent) %>% 
   generate(reps = 10000, type = "bootstrap") %>% 
   calculate(stat = "mean") %>% 
   # conf_int() is deprecated; get_confidence_interval() replaces it
   get_confidence_interval(level = 0.90, type = "percentile")

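A one-sided test at the 0.05 level pairs with a two-sided 90% interval, because each tail of that interval holds 5%. If the null value of $2,400 lies at or above the lower bound of the 90% interval, the null hypothesis cannot be rejected at the 0.05 level in the test above; if it lies below the lower bound, the null hypothesis would be rejected. The two approaches agree closely, though not always exactly, when the bootstrap distribution is roughly symmetric.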
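A base-R sketch of the interval check, with hypothetical helper names `percentile_ci()` and `not_rejected()`. Like the infer pipeline above, it resamples the original (unshifted) rents and takes the 5th and 95th percentiles of the bootstrap means.

```r
# Hypothetical helper: percentile bootstrap interval for the mean
percentile_ci <- function(x, level = 0.90, reps = 10000) {
  boot_means <- replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
  tail_prob <- (1 - level) / 2   # 5% in each tail when level = 0.90
  quantile(boot_means, probs = c(tail_prob, 1 - tail_prob), names = FALSE)
}

# One-sided "greater" test: fail to reject H0 when mu0 is at or above the lower bound
not_rejected <- function(ci, mu0) mu0 >= ci[1]

# Usage:
# ci <- percentile_ci(manhattan$rent)
# not_rejected(ci, mu0 = 2400)
```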

Exercise 2

Consider the hypothesis test:

\[H_0: M = 2600\]
\[H_A: M > 2600\]
where \(M\) is the population median monthly rent of a one-bedroom apartment in Manhattan. Let \(\alpha = 0.05\).

med_rent <- manhattan %>% 
   specify(response = rent) %>% 
   calculate(stat = "median") %>% 
   pull(stat)

med_rent
## [1] 2350
null_dist_med <- manhattan %>% 
   specify(response = rent) %>% 
   hypothesize(null = "point", med = 2600) %>% 
   generate(reps = 10000, type = "bootstrap") %>% 
   calculate(stat = "median")
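For the median, the analogous step (again, as we understand infer's point-null bootstrap) shifts the data so that its median equals 2600 before resampling. A minimal base-R sketch with the hypothetical helper `null_boot_medians()`. Because each bootstrap median is built from the sample's own values, the simulated null distribution takes relatively few distinct values and tends to look lumpy.

```r
# Hypothetical helper (not part of infer): null distribution of sample
# medians under H0: M = med0, via shift-then-bootstrap
null_boot_medians <- function(x, med0, reps = 10000) {
  # Shift the sample so its median equals the hypothesized median
  shifted <- x - median(x) + med0
  # Record the median of each resample drawn with replacement
  replicate(reps, median(sample(shifted, size = length(x), replace = TRUE)))
}

# Usage:
# null_medians <- null_boot_medians(manhattan$rent, med0 = 2600)
```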
null_dist_med %>% 
   visualise(alpha = .5) +
   geom_vline(xintercept = 2600, color = "purple", lty = 2, linewidth = 1) +
   theme_minimal(base_size = 16) +
   shade_p_value(obs_stat = med_rent, direction = "greater")
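Here the observed median ($2,350) is already below the hypothesized $2,600, so for `direction = "greater"` the shaded region should cover most of the null distribution and we would expect a large p-value; the exact value comes from `get_p_value()` as in Exercise 1. As an independent, non-simulation cross-check (a swapped-in method, not part of the original analysis), the sign test counts rents above 2600 and compares that count to a Binomial(n, 0.5) distribution, dropping values exactly equal to 2600. `sign_test_greater()` is a hypothetical helper name.

```r
# Sign test for H0: M = med0 vs HA: M > med0 (exact binomial test)
sign_test_greater <- function(x, med0) {
  x <- x[x != med0]          # ties with med0 carry no sign information
  n_above <- sum(x > med0)   # number of observations above med0
  binom.test(n_above, n = length(x), p = 0.5, alternative = "greater")$p.value
}

# Usage:
# sign_test_greater(manhattan$rent, med0 = 2600)
```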