Testing for independence

Is yawning contagious?

Do you think yawning is contagious?

Study description

In this study, 50 people were randomly assigned to two groups: 34 to a treatment group, where a person near them yawned, and 16 to a control group, where they did not see anyone yawn.

# Load the MythBusters yawning data and cross-tabulate group by outcome
mb_yawn <- read.csv("https://stat.duke.edu/~mc301/data/mb_yawn.csv")
table(mb_yawn$group, mb_yawn$outcome)
##            
##             not yawn yawn
##   control         12    4
##   treatment       24   10

Proportion of yawners

library(magrittr)  # provides the %>% pipe
table(mb_yawn$group, mb_yawn$outcome) %>% addmargins()
##            
##             not yawn yawn Sum
##   control         12    4  16
##   treatment       24   10  34
##   Sum             36   14  50
  • Proportion of yawners in the treatment group: \(\hat{p}_{yawn|trt} = \frac{10}{34} = 0.2941\)

  • Proportion of yawners in the control group: \(\hat{p}_{yawn|ctl} = \frac{4}{16} = 0.25\)

  • Our results match the ones calculated on the MythBusters episode.
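
The two proportions can also be computed directly from the counts in the table above. This is a minimal sketch that enters the four counts by hand instead of reading the CSV; the object names `counts`, `props`, `p_hat_trt` and `p_hat_ctl` are introduced here for illustration only.

```r
# Counts copied from the two-way table above (rows: group, columns: outcome)
counts <- matrix(c(12, 4,
                   24, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("control", "treatment"),
                                 outcome = c("not yawn", "yawn")))

# Row-wise proportions: each row sums to 1
props <- prop.table(counts, margin = 1)

p_hat_trt <- props["treatment", "yawn"]  # 10 / 34
p_hat_ctl <- props["control", "yawn"]    # 4 / 16

round(c(treatment = p_hat_trt, control = p_hat_ctl), 4)
```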

Based on the proportions we calculated, do you think yawning is really contagious, i.e. are seeing someone yawn and yawning dependent?

Dependence, or another possible explanation?

  • The observed differences might suggest that yawning is contagious, i.e. seeing someone yawn and yawning are dependent.

  • But the differences are small enough that we might wonder whether they are simply due to chance.

  • Perhaps if we were to repeat the experiment, we would see slightly different results.

  • So we will do just that - well, somewhat - and see what happens.

  • Instead of actually conducting the experiment many times, we will simulate data to generate a null distribution.

Two competing claims

  • "There is nothing going on." Yawning and seeing someone yawn are independent, yawning is not contagious, and the observed difference in proportions is simply due to chance. \(\rightarrow\) Null hypothesis

  • "There is something going on." Yawning and seeing someone yawn are dependent, yawning is contagious, and the observed difference in proportions is not due to chance. \(\rightarrow\) Alternative hypothesis (one sided)

Formalizing the test

First we'd like to write out the null and alternative hypotheses,

\[ \begin{aligned} H_0:&~p_{yawn|trt} = p_{yawn|ctl} \\ H_A:&~p_{yawn|trt} > p_{yawn|ctl} \\ \end{aligned} \]

which is equivalent to

\[ \begin{aligned} H_0:&~p_{yawn|trt} - p_{yawn|ctl} = 0 \\ H_A:&~p_{yawn|trt} - p_{yawn|ctl} > 0 \\ \end{aligned} \]

Quantities of interest

\[ \begin{aligned} \text{Parameter of interest:}& \qquad p_{yawn|trt} - p_{yawn|ctl} \\ \\ \text{Sample statistic:}& \qquad \hat{p}_{yawn|trt} - \hat{p}_{yawn|ctl} = \frac{10}{34} - \frac{4}{16} = 0.0441 \\ \\ \text{Null value:}& \qquad p_{yawn|trt} - p_{yawn|ctl} = 0 \\ \end{aligned} \]

Simulation setup

  1. A regular deck consists of 52 cards: 4 aces, 4 jacks, 4 queens, 4 kings, and 4 each of the numbers 2-10.

  2. Take out two aces from the deck of cards and set them aside.

  3. Each of the remaining 50 playing cards represents one participant in the study:
    • 14 cards (the 12 face cards plus the 2 remaining aces) represent the people who yawn.
    • 36 number cards (2-10) represent the people who don't yawn.

Running the simulation

  1. Shuffle the 50 cards at least 7 times* so that the order of the cards is well randomized.

  2. Deal out the top 16 cards (control group) and count the number of face cards; this is the simulated number of people who yawned in the control group.

  3. Deal out the remaining 34 cards (treatment group) and count the number of face cards; this is the simulated number of people who yawned in the treatment group.

  4. Calculate the difference in proportions of yawners (treatment - control).

  5. Mark the difference you find on the dot plot on the board.

*http://www.dartmouth.edu/~chance/course/topics/winning_number.html*
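
The dealing procedure above can be sketched in R for a single shuffle. This is an illustration under stated assumptions, not the code used in class: the names `deck`, `shuffled` and `sim_diff` are made up here, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the shuffle is reproducible

# 50 "cards": 14 yawners (face cards plus the 2 aces), 36 non-yawners
deck <- c(rep("yawn", 14), rep("not yawn", 36))

shuffled  <- sample(deck)     # shuffle the deck
control   <- shuffled[1:16]   # top 16 cards
treatment <- shuffled[17:50]  # remaining 34 cards

# Simulated difference in proportions of yawners (treatment - control)
sim_diff <- mean(treatment == "yawn") - mean(control == "yawn")
sim_diff
```

Because `sample()` without a `size` argument returns a random permutation, a single call plays the role of the repeated physical shuffles.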

Interactive Activity - HT Simulation

Let's write code that will implement this simulation.
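
One possible implementation is sketched below. It repeats the shuffle-and-deal step many times and reports the share of simulated differences at least as large as the observed one. The function name `one_shuffle`, the number of replicates and the seed are assumptions made for this sketch, so the resulting p-value will vary a little from the one reported on the slides.

```r
set.seed(2)  # arbitrary seed for reproducibility

# 14 yawners and 36 non-yawners, as in the card deck
deck <- c(rep("yawn", 14), rep("not yawn", 36))
obs_diff <- 10 / 34 - 4 / 16  # observed difference, about 0.0441

# One replicate: shuffle, deal 16 to control and 34 to treatment
one_shuffle <- function() {
  shuffled <- sample(deck)
  mean(shuffled[17:50] == "yawn") - mean(shuffled[1:16] == "yawn")
}

n_sim <- 10000
null_diffs <- replicate(n_sim, one_shuffle())

# One-sided p-value: share of simulated differences at least as large as
# the observed one (small tolerance guards against floating-point ties)
p_value <- mean(null_diffs >= obs_diff - 1e-8)
p_value
```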

Checking for independence

Do the simulation results suggest that yawning is contagious, i.e. do seeing someone yawn and yawning appear to be dependent?

## Response variable: categorical (2 levels, success: yawn)
## Explanatory variable: categorical (2 levels) 
## n_treatment = 34, p_hat_treatment = 0.2941
## n_control = 16, p_hat_control = 0.25
## H0: p_treatment =  p_control
## HA: p_treatment > p_control
## p_value = 0.5152

Interactive Activity - Bootstrap Simulation

Let's write code that will implement this simulation.
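
A minimal bootstrap sketch follows. It rebuilds the observed outcomes from the table counts, resamples each group with replacement, and takes the middle 95% of the resampled differences. The percentile method, the names `one_boot` and `boot_diffs`, the number of replicates and the seed are all assumptions of this sketch; the slides do not state which interval type was used.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Rebuild the observed outcomes from the table (1 = yawn, 0 = not yawn)
treatment <- c(rep(1, 10), rep(0, 24))
control   <- c(rep(1, 4),  rep(0, 12))

# One bootstrap replicate: resample each group with replacement
one_boot <- function() {
  mean(sample(treatment, replace = TRUE)) - mean(sample(control, replace = TRUE))
}

n_boot <- 10000
boot_diffs <- replicate(n_boot, one_boot())

# Percentile interval: middle 95% of the bootstrap distribution
ci <- quantile(boot_diffs, probs = c(0.025, 0.975))
ci
```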

Bootstrap confidence interval results

## Response variable: categorical (2 levels, success: yawn)
## Explanatory variable: categorical (2 levels) 
## n_treatment = 34, p_hat_treatment = 0.2941
## n_control = 16, p_hat_control = 0.25
## 95% CI (treatment - control): (-0.2279 , 0.2904)

Inference for difference of two means

General Social Survey

  • Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing complexity of American society.

  • The GSS aims to gather data on contemporary American society in order to
    • monitor and explain trends and constants in attitudes, behaviors, and attributes;
    • examine the structure and functioning of society in general as well as the role played by relevant subgroups;
    • compare the US to other societies to place American society in comparative perspective and develop cross-national models of human society;
    • make high-quality data easily accessible to scholars, students, policy makers, and others, with minimal cost and waiting.
  • GSS questions cover a diverse range of issues including national spending priorities, marijuana use, crime and punishment, race relations, quality of life, confidence in institutions, and sexual behavior.

GSS Data

Hypothesis testing for a difference of two means

Is there a difference in the average number of hours spent relaxing after work between males and females? What are the hypotheses?

\[H_0: \mu_{M} = \mu_{F}\] \[H_A: \mu_{M} \ne \mu_{F}\]

Note that the variable identifying males and females in the dataset is sex.

Exploratory analysis

What type of visualization would be appropriate for evaluating this research question?

Summary statistics

## # A tibble: 2 × 4
##      sex x_bar    sd     n
##   <fctr> <dbl> <dbl> <int>
## 1   MALE  3.94  2.85   544
## 2 FEMALE  3.45  2.40   610
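
The observed difference in sample means can be read off the summary table. The sketch below also computes a rough standard error using the usual independent-samples formula; that formula is not part of the simulation approach used in these slides and is shown only as a quick sense of scale. All object names are made up for this example.

```r
# Summary statistics copied from the table above
x_bar <- c(MALE = 3.94, FEMALE = 3.45)
s     <- c(MALE = 2.85, FEMALE = 2.40)
n     <- c(MALE = 544,  FEMALE = 610)

# Observed difference in sample means (MALE - FEMALE)
obs_diff <- unname(x_bar["MALE"] - x_bar["FEMALE"])

# Rough standard error of the difference, treating the samples as independent
se_diff <- unname(sqrt(s["MALE"]^2 / n["MALE"] + s["FEMALE"]^2 / n["FEMALE"]))

round(c(obs_diff = obs_diff, se_diff = se_diff), 3)
```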

Interactive Activity - HT Simulation

Let's write code that will implement this simulation.
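
The GSS responses are not reproduced in these notes, so the sketch below runs on simulated stand-in data and only illustrates the mechanics of a randomization test for two means: shuffle the group labels, recompute the difference in means, and repeat. The vectors `sex` and `hours`, the helper `diff_means`, the exponential distribution and the seed are all assumptions made for illustration; the p-value it prints says nothing about the real survey.

```r
set.seed(4)  # arbitrary seed for reproducibility

# STAND-IN DATA (assumption): simulated hours, NOT the GSS responses.
# Group sizes match the summary table; all values come from one common
# distribution, so the null hypothesis is true by construction.
sex   <- rep(c("MALE", "FEMALE"), times = c(544, 610))
hours <- rexp(length(sex), rate = 1 / 3.7)

# Difference in group means (MALE - FEMALE)
diff_means <- function(y, g) mean(y[g == "MALE"]) - mean(y[g == "FEMALE"])

obs_diff <- diff_means(hours, sex)

# Null distribution: shuffle the group labels, recompute the difference
n_sim <- 2000
null_diffs <- replicate(n_sim, diff_means(hours, sample(sex)))

# Two-sided p-value: simulated differences at least as extreme as observed
p_value <- mean(abs(null_diffs) >= abs(obs_diff))
p_value
```

With the real data, `hours` and `sex` would be replaced by the corresponding columns of the GSS data frame.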

Testing for difference of means

Do the simulation results suggest that the average number of hours spent relaxing after work differs between males and females?

## Response variable: numerical
## Explanatory variable: categorical (2 levels) 
## n_MALE = 544, y_bar_MALE = 3.9393, s_MALE = 2.8482
## n_FEMALE = 610, y_bar_FEMALE = 3.4492, s_FEMALE = 2.3969
## H0: mu_MALE =  mu_FEMALE
## HA: mu_MALE != mu_FEMALE
## p_value = 0.0016

Interactive Activity - Bootstrap Simulation

Let's write code that will implement this simulation.
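
As with the hypothesis test, the raw GSS responses are not included here, so this bootstrap sketch uses simulated stand-in data whose sizes, means and standard deviations loosely follow the summary table. The vectors `male` and `female`, the normal distribution, the percentile method and the seed are assumptions for illustration only; the interval it prints is not the one reported on the slides.

```r
set.seed(5)  # arbitrary seed for reproducibility

# STAND-IN DATA (assumption): simulated hours, NOT the GSS responses.
# Normal draws can be negative, which real hours cannot be.
male   <- rnorm(544, mean = 3.94, sd = 2.85)
female <- rnorm(610, mean = 3.45, sd = 2.40)

# One bootstrap replicate: resample each group with replacement
one_boot <- function() {
  mean(sample(male, replace = TRUE)) - mean(sample(female, replace = TRUE))
}

n_boot <- 2000
boot_diffs <- replicate(n_boot, one_boot())

# Percentile interval for mu_MALE - mu_FEMALE
ci <- quantile(boot_diffs, probs = c(0.025, 0.975))
ci
```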

Bootstrap confidence interval results

## Response variable: numerical, Explanatory variable: categorical (2 levels)
## n_MALE = 544, y_bar_MALE = 3.9393, s_MALE = 2.8482
## n_FEMALE = 610, y_bar_FEMALE = 3.4492, s_FEMALE = 2.3969
## 95% CI (MALE - FEMALE): (0.1926 , 0.8083)