October 27, 2015

## Today's agenda

• Wrap up one-variable inference

• Testing for independence

• Get started on App Ex 8

## Rent in Durham

Data from a random sample of 20 apartments with 1+ bedrooms in Durham in 2012.

durham_apts <- read.csv("https://stat.duke.edu/~mc301/data/durham_apts.csv")

## Exploratory analysis

library(ggplot2)
ggplot(data = durham_apts, aes(x = rent)) +
  geom_dotplot()

library(dplyr)
durham_apts %>%
  summarise(xbar = mean(rent), med = median(rent))
##    xbar med
## 1 920.1 887

source("https://stat.duke.edu/courses/Fall15/sta112.01/code/one_num_boot.R")
source("https://stat.duke.edu/courses/Fall15/sta112.01/code/one_num_test.R")

## Bootstrap CI for mean rent in Durham

Estimate the average rent in Durham for 1+ bedroom apartments using a 95% confidence interval.

one_num_boot(durham_apts$rent, statistic = mean, seed = 195729)
## Summary stats: n = 20, sample mean = 920.1
## 95% CI: (795.968, 1044.232)

## Bootstrap CI for median rent in Durham

Estimate the median rent in Durham for 1+ bedroom apartments using a 95% confidence interval.

one_num_boot(durham_apts$rent, statistic = median, seed = 571035)
## Summary stats: n = 20, sample median = 887
## 95% CI: (712.8174, 1061.1826)
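For readers curious about what a helper like `one_num_boot()` does, here is a minimal base-R sketch of a bootstrap confidence interval: resample the data with replacement, recompute the statistic each time, and keep the middle 95% of the results. The function name `boot_ci_sketch()`, the default of 15,000 resamples, and the choice of the percentile method are assumptions for illustration; the course helper may compute its interval differently, so the numbers will not match exactly.

```r
# Hypothetical sketch: percentile bootstrap CI for a one-variable statistic.
# boot_ci_sketch() is NOT the course's one_num_boot(); name and defaults are assumptions.
boot_ci_sketch <- function(x, statistic = mean, conf_level = 0.95,
                           reps = 15000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  # Resample the data with replacement and recompute the statistic each time
  boot_stats <- replicate(reps, statistic(sample(x, size = n, replace = TRUE)))
  # Percentile method: keep the middle conf_level share of the bootstrap distribution
  alpha <- 1 - conf_level
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}
```

For example, `boot_ci_sketch(durham_apts$rent, statistic = median, seed = 571035)` would build an interval for the median rent.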

## Bootstrap testing for a mean

• Construct the bootstrap distribution

• Shift it to be centered at the null value

• Calculate the p-value as usual: the probability of the observed or a more extreme outcome (more extreme in the direction of the alternative hypothesis), given that the null value is true
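The three steps above can be sketched in base R as follows. `boot_test_sketch()` is a hypothetical helper, not the course's `one_num_test()`; it shifts the bootstrap distribution by subtracting the mean of the bootstrap statistics and adding the null value, and the resample count is an arbitrary choice.

```r
# Hypothetical sketch of a shift-based bootstrap test for one numerical variable.
# boot_test_sketch() is an illustration, NOT the course's one_num_test().
boot_test_sketch <- function(x, statistic = mean, null,
                             alt = c("greater", "less", "two.sided"),
                             reps = 15000, seed = NULL) {
  alt <- match.arg(alt)
  if (!is.null(seed)) set.seed(seed)
  obs <- statistic(x)
  # Step 1: construct the bootstrap distribution by resampling with replacement
  boot_stats <- replicate(reps, statistic(sample(x, size = length(x), replace = TRUE)))
  # Step 2: shift it so that it is centered at the null value
  null_dist <- boot_stats - mean(boot_stats) + null
  # Step 3: p-value = share of shifted values at least as extreme as the
  # observed statistic, in the direction of the alternative hypothesis
  switch(alt,
         greater   = mean(null_dist >= obs),
         less      = mean(null_dist <= obs),
         two.sided = mean(abs(null_dist - null) >= abs(obs - null)))
}
```

Calling `boot_test_sketch(durham_apts$rent, null = 800, alt = "greater", seed = 28732)` runs the same kind of test as the next slide; its p-value will likely be close to, but not identical to, the one from `one_num_test()`.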

## Bootstrap test for average rent in Durham

Do these data provide convincing evidence that the average rent in Durham for 1+ bedroom apartments is greater than $800?

one_num_test(durham_apts$rent, statistic = mean, null = 800, alt = "greater", seed = 28732)
## H0: mu = 800
## HA: mu > 800
## Summary stats: n = 20, sample mean = 920.1
## p-value =  0.0251

## Other helper functions

For future use…

source("https://stat.duke.edu/courses/Fall15/sta112.01/code/one_cat_boot.R")
source("https://stat.duke.edu/courses/Fall15/sta112.01/code/one_cat_test.R")

## Is yawning contagious?

Do you think yawning is contagious?

## Study description

In this study, 50 people were randomly assigned to two groups: 34 to a treatment group, where a person near them yawned, and 16 to a control group, where they did not see anyone yawn.

table(mb_yawn$group, mb_yawn$outcome)
##
##             not yawn yawn
##   control         12    4
##   treatment       24   10

## Proportion of yawners

addmargins(table(mb_yawn$group, mb_yawn$outcome))
##
##             not yawn yawn Sum
##   control         12    4  16
##   treatment       24   10  34
##   Sum             36   14  50
• Proportion of yawners in the treatment group: $$\frac{10}{34} = 0.2941$$

• Proportion of yawners in the control group: $$\frac{4}{16} = 0.25$$

• Our results match the ones calculated on the MythBusters episode.
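The same proportions can be computed directly in R. This sketch rebuilds the counts from the table above as a matrix so that it runs without `mb_yawn`; with that data frame loaded, `prop.table(table(mb_yawn$group, mb_yawn$outcome), margin = 1)` should give the same row proportions.

```r
# Counts copied from the table above (rows: group, columns: outcome)
yawn_counts <- matrix(c(12, 4,
                        24, 10),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(group = c("control", "treatment"),
                                      outcome = c("not yawn", "yawn")))
# Row proportions: within each group, the share who did / did not yawn
row_props <- prop.table(yawn_counts, margin = 1)
# Difference in proportions of yawners (treatment - control)
p_hat_diff <- row_props["treatment", "yawn"] - row_props["control", "yawn"]
round(row_props[, "yawn"], 4)  # control 0.25, treatment 0.2941
round(p_hat_diff, 4)           # 0.0441
```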

Based on the proportions we calculated, do you think yawning is really contagious, i.e., are seeing someone yawn and yawning dependent?

## Dependence, or another possible explanation?

• The observed difference might suggest that yawning is contagious, i.e., that seeing someone yawn and yawning are dependent.

• But the difference is small enough that we might wonder whether it is simply due to chance.

• Perhaps if we were to repeat the experiment, we would see slightly different results.

• So we will do just that (well, somewhat) and see what happens.

• Instead of actually conducting the experiment many times, we will simulate our results.

## Two competing claims

• "There is nothing going on." Seeing someone yawn and yawning are independent: yawning is not contagious, and the observed difference in proportions is simply due to chance. $$\rightarrow$$ Null hypothesis

• "There is something going on." Seeing someone yawn and yawning are dependent: yawning is contagious, and the observed difference in proportions is not due to chance. $$\rightarrow$$ Alternative hypothesis
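In symbols, writing $$p_T$$ and $$p_C$$ for the true proportions of yawners in the treatment and control groups, the two claims can be stated as below. This assumes a one-sided alternative, since "contagious" means that seeing a yawn makes yawning more likely.

$$H_0: p_T - p_C = 0 \qquad H_A: p_T - p_C > 0$$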

## Simulation setup

1. A regular deck of cards consists of 52 cards: 4 aces, 4 each of the numbers 2-10, 4 jacks, 4 queens, and 4 kings.

2. Take out two aces from the deck of cards and set them aside.

3. The remaining 50 playing cards represent the participants in the study:
• 14 face cards (the 12 jacks, queens, and kings, plus the 2 remaining aces) represent the people who yawn.
• 36 non-face cards represent the people who don't yawn.

## Running the simulation

1. Shuffle the 50 cards at least 7 times to ensure that the cards counted out come from a random process.

2. Count out the top 16 cards and set them aside. These cards represent the people in the control group.

3. Out of the remaining 34 cards (treatment group), count the face cards (the number of people who yawned in the treatment group).

4. Calculate the difference in proportions of yawners (treatment - control), and submit this value using your clicker.

5. Mark the difference you find on the dot plot.
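One run of the card simulation above can be mimicked in R with `sample()`, which shuffles a vector when called without a `size`. The vector `cards`, the `"yawn"`/`"not yawn"` labels, and the seed are illustrative choices, not part of the original activity.

```r
# Illustrative sketch of ONE run of the card simulation (seed chosen arbitrarily)
set.seed(2015)
# 50 cards: 14 "yawn" cards (face cards + 2 aces) and 36 "not yawn" cards
cards <- c(rep("yawn", 14), rep("not yawn", 36))
shuffled  <- sample(cards)        # shuffle the whole deck
control   <- shuffled[1:16]       # top 16 cards: control group
treatment <- shuffled[17:50]      # remaining 34 cards: treatment group
# Difference in proportions of yawners (treatment - control)
sim_diff <- mean(treatment == "yawn") - mean(control == "yawn")
sim_diff
```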

## Checking for independence

Do the simulation results suggest that yawning is contagious, i.e., do seeing someone yawn and yawning appear to be dependent?

## $p_hat_diff
## [1] 0.0441
##
## $p_value
## [1] 0.513
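Repeating the shuffle many times gives a simulated null distribution, and the p-value is the share of simulated differences at least as large as the observed 0.0441. The sketch below is one possible approach, not the course solution. It assumes a one-sided alternative (treatment minus control greater than zero), 10,000 repetitions, and an arbitrary seed, so its p-value should land near, but not exactly on, the value shown above.

```r
# Hypothetical sketch: repeat the card simulation and compute a one-sided p-value.
set.seed(2015)  # arbitrary seed
cards <- c(rep("yawn", 14), rep("not yawn", 36))
n_control <- 16
n_treatment <- 34

# Observed difference in proportions of yawners (treatment - control)
obs_diff <- 10 / n_treatment - 4 / n_control

sim_diffs <- replicate(10000, {
  shuffled <- sample(cards)
  # same arithmetic as obs_diff, so ties compare exactly
  sum(shuffled[(n_control + 1):50] == "yawn") / n_treatment -
    sum(shuffled[1:n_control] == "yawn") / n_control
})

# p-value: proportion of simulated differences >= the observed difference
p_value <- mean(sim_diffs >= obs_diff)
list(p_hat_diff = round(obs_diff, 4), p_value = p_value)
```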

Application exercise 10:

Write a function that conducts a randomization test, as described above, for two categorical variables. Run the function on the yawning dataset. Communicate with other teams to check that your answers match for a given seed.