Before you begin the homework assignment, take some time to familiarize yourself with the t.test function in R, as we will be using it to calculate p-values from one- and two-sample t-tests. A brief tutorial is given in the first part of this HW, and you will need this function for Exercises 8 - 12.

Creating R Markdown files

There is no template for this homework. Instead, create your own R Markdown file in R by going to File > New File > R Markdown. In the spaces provided, give the title (HW 05) and author (your name), and select “HTML document”. You should see a new file appear that has the following header:

---
title: "HW 05"
author: "Yue Jiang"
date: "10/1/2020"
output: html_document
---

There’s also some example Markdown code that appears under this header. You may safely delete it. Use previous templates as guides for creating your own Markdown document.

Data

We’ll be revisiting the licorice example presented in lecture. As a review, postoperative sore throat is an annoying and painful complication of intubation after surgery, particularly with wider gauge double-lumen tubes. Ruetzler et al. (2013) performed an experimental study among patients having elective surgery who required intubation with a double-lumen tube. Prior to anesthesia, patients were randomly assigned to gargle either a licorice-based solution or sugar water (as placebo). Sore throat was evaluated 30 minutes, 90 minutes, and 4 hours after the conclusion of surgery using an 11-point Likert scale (0 = no pain, 10 = worst pain).

Let’s assume that we can treat this pain score as a continuous numeric variable.

The data are available with the following code:

library(tidyverse)
licorice <- read.csv("https://www2.stat.duke.edu/courses/Fall20/sta102/hw/data/licorice.csv")

Some relevant variables of interest are:

preOp_gender: Gender (0 = Male; 1 = Female)
preOp_calcBMI: Body mass index in kg/m$^2$
treat: Treatment given (0 = Sugar placebo; 1 = Licorice solution)
pacu30min_throatPain: Sore throat pain score 30 minutes after arrival in PACU (11-point scale: 0 = No pain; 10 = Worst pain)
postOp4hour_throatPain: Sore throat pain score 4 hours after surgery (11-point scale: 0 = No pain; 10 = Worst pain)

R chunk options

To suppress printed messages and warnings in your R Markdown document, you may use the options message = F and warning = F in your R chunk. That is, we may start the R chunk with ```{r chunk-name, message = F, warning = F}.

t-tests using R

One sample t-tests

If we have a dataset of interest, we can use the syntax t.test(x, mu = _, alternative = _, conf.level = _) to perform t-tests in R and construct confidence intervals for the mean (or difference in means). For further details, you may type ?t.test or help(t.test). The arguments for this function are as follows:

x, a numeric vector of data values
mu, a number giving the value of the mean (or difference in means) under the null hypothesis
alternative, specifying the alternative hypothesis. It must be either "two.sided", "greater", or "less".
conf.level, specifying the confidence level of the confidence interval.

Example: Suppose we want to test the null hypothesis that the mean BMI among all patients was 25 vs. the alternative hypothesis that the mean BMI among all patients was not equal to 25:

t.test(licorice$preOp_calcBMI, mu = 25, 
       alternative = "two.sided", 
       conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  licorice$preOp_calcBMI
## t = 2.1224, df = 234, p-value = 0.03486
## alternative hypothesis: true mean is not equal to 25
## 95 percent confidence interval:
##  25.04243 26.14047
## sample estimates:
## mean of x 
##  25.59145

The dollar sign $ tells R which variable to use. For instance, licorice$preOp_calcBMI tells R that you are interested in the preOp_calcBMI variable from the licorice dataset.

Note that the output displays the t-statistic, the degrees of freedom, the p-value, the alternative hypothesis tested, the 95% confidence interval, and the sample mean (wow, that’s a lot!).
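Each of these pieces can also be pulled out of the result individually, which is handy when reporting them in your write-up. The sketch below is only an illustration: it uses simulated values (the vector bmi is a made-up stand-in, not the licorice data) so that it runs without downloading anything, and it recomputes the t-statistic and p-value by hand to show where the numbers come from.

```r
# Simulated stand-in data (NOT the licorice data), so the sketch runs offline
set.seed(102)
bmi <- rnorm(50, mean = 26, sd = 4)

res <- t.test(bmi, mu = 25, alternative = "two.sided", conf.level = 0.95)

# Each piece of the printed output is stored as a named element
res$statistic   # t-statistic
res$parameter   # degrees of freedom (n - 1)
res$p.value     # p-value
res$conf.int    # confidence interval
res$estimate    # sample mean

# Recompute the t-statistic and two-sided p-value by hand
n <- length(bmi)
t_by_hand <- (mean(bmi) - 25) / (sd(bmi) / sqrt(n))
p_by_hand <- 2 * pt(abs(t_by_hand), df = n - 1, lower.tail = FALSE)

all.equal(unname(res$statistic), t_by_hand)  # TRUE
all.equal(res$p.value, p_by_hand)            # TRUE
```

The same element names (statistic, parameter, p.value, conf.int, estimate) are available for two-sample tests as well.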

Two sample t-tests

If you are performing a two-sample t-test, there are some other additional arguments:

y, a numeric vector of data values (placed after x)
paired, a logical being either TRUE or FALSE indicating whether you want a paired t-test, and
var.equal, a logical being either TRUE or FALSE indicating whether you assume the variance is the same in both groups (this affects the degrees of freedom used in the test)
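To make the paired argument concrete, here is a small sketch with simulated measurements (the vectors before and after are hypothetical and unrelated to the licorice data). It illustrates that a paired t-test gives the same result as a one-sample t-test on the within-pair differences.

```r
# Simulated paired measurements (hypothetical, for illustration only)
set.seed(102)
before <- rnorm(20, mean = 5, sd = 2)
after  <- before - rnorm(20, mean = 1, sd = 1)

# Paired two-sample test
paired_res <- t.test(before, after, paired = TRUE)

# One-sample test on the within-pair differences
diff_res <- t.test(before - after, mu = 0)

all.equal(paired_res$p.value, diff_res$p.value)                      # TRUE
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))  # TRUE
```

Setting paired = FALSE instead treats the two vectors as independent samples.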

Example: Suppose we want to test the null hypothesis that the mean BMI among men was less than or equal to the mean BMI among women vs. the alternative hypothesis that the mean BMI among men was greater than the mean BMI among women:

# paired is left at its default (FALSE); newer R versions may reject it in a formula call
t.test(licorice$preOp_calcBMI ~ licorice$preOp_gender, mu = 0,
       alternative = "greater",
       var.equal = FALSE,
       conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  licorice$preOp_calcBMI by licorice$preOp_gender
## t = 2.9044, df = 182.62, p-value = 0.002067
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.716381      Inf
## sample estimates:
## mean in group 0 mean in group 1 
##        26.24958        24.58656

Here, since there were two variables, we used formula syntax, given by a tilde ~. This syntax takes the form variable of interest ~ grouping variable. So when we type licorice$preOp_calcBMI ~ licorice$preOp_gender, we’re asking R to perform an independent two-sample t-test for the preOp_calcBMI variable from licorice, with the groups defined by the preOp_gender variable. The first group (here 0 = Male) plays the role of x and the second group (1 = Female) plays the role of y, so the difference being tested is the mean in group 0 minus the mean in group 1.
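The same test can be run by passing the two groups as separate vectors x and y, using logical subsetting to pick out each group. The sketch below uses a simulated data frame (toy, with made-up columns group and bmi; it is not the licorice data) and checks that both interfaces agree. The vector call passes paired = FALSE explicitly, while the formula call leaves it at its default, since to our knowledge recent R releases (4.4.0 and later) reject paired in the formula interface.

```r
# Simulated stand-in data frame (NOT the licorice data)
set.seed(102)
toy <- data.frame(
  group = rep(c(0, 1), times = c(30, 25)),
  bmi   = c(rnorm(30, mean = 26, sd = 4), rnorm(25, mean = 25, sd = 3))
)

# Formula interface: variable of interest ~ grouping variable
by_formula <- t.test(toy$bmi ~ toy$group, alternative = "greater",
                     var.equal = FALSE, conf.level = 0.95)

# Vector interface: subset the variable of interest by group
x <- toy$bmi[toy$group == 0]
y <- toy$bmi[toy$group == 1]
by_vectors <- t.test(x, y, alternative = "greater",
                     paired = FALSE, var.equal = FALSE, conf.level = 0.95)

all.equal(by_formula$p.value, by_vectors$p.value)  # TRUE

# With var.equal = TRUE, the pooled test uses n1 + n2 - 2 degrees of freedom
pooled <- t.test(x, y, var.equal = TRUE)
pooled$parameter  # df = 53
```

The subsetting pattern toy$bmi[toy$group == 0] is also how you can restrict a one-sample test to a single group.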

Exercises

In Exercises 1 - 7, you will be asked questions about a hypothesis test that you have not seen before (and likely won’t ever see). These questions are intended to assess whether you are familiar with the hypothesis testing process itself and underlying philosophical details. Only use the information provided below, and assume that required assumptions are satisfied.

The Anderson-Darling test

The one-sample Anderson-Darling (AD) test can be used to test whether a sample comes from a specific target probability distribution, based on a function of the empirical distribution function (EDF) of the observed data. The null and alternative hypotheses are given by

$H_0$: the sample is consistent with a population that has the target distribution of interest
$H_1$: the sample is NOT consistent with a population that has the target distribution of interest

If, in our observed data, the difference between the EDF and the target distribution is “large enough,” then we reject the null hypothesis in favor of the alternative.

Suppose we conducted a one-sample AD test against some target distribution at the $\alpha$ = 0.10 level and obtained a p-value of 0.24.

1. (6 points) TRUE/FALSE: Finding a very large p-value from the Anderson-Darling test would give us strong evidence that the sample is consistent with a population with the target distribution of interest. If false, explain why.
2. (6 points) TRUE/FALSE: If we instead found a p-value smaller than our pre-specified $\alpha$, then there would be enough evidence for us to conclude that the observed EDF is NOT consistent with a population with the target distribution of interest. If false, explain why.
3. (6 points) TRUE/FALSE: There is more than a 10% chance that a difference between the EDF and the target distribution this large or larger could occur by chance alone, if the sample were in fact consistent with the target distribution. If false, explain why.
4. (6 points) TRUE/FALSE: The probability that we have made the correct decision is greater than 0.90. If false, explain why.
5. (6 points) TRUE/FALSE: The probability of finding a significant difference between the EDF and the target distribution if we were to independently repeat the study would be less than 0.10. If false, explain why.
6. (6 points) TRUE/FALSE: The probability that the sample is consistent with a population that has the target distribution of interest is 0.24. If false, explain why.
7. (6 points) TRUE/FALSE: Assuming that the EDF is consistent with a population with the target distribution of interest, then there is a 24% chance that we have made a type I error. If false, explain why.

Implementing t-tests in R

Exercises 8 - 12 use the licorice dataset introduced in the brief tutorial above and in class. You may assume that the pain scores are continuous numeric variables for the purposes of this assignment.

When conducting a hypothesis test, you must always formally specify the significance level, the hypotheses of interest, the reference distribution of the test statistic under the null hypothesis, the test statistic itself, the p-value, your decision, and a conclusion in context of the research problem.

8. (15 points) Among patients who received the placebo sugar solution, test the hypothesis that the mean throat pain score 30 minutes after surgery is not equal to 1.
9. (15 points) Now compare the two groups; assess the hypothesis that the mean throat pain score 30 minutes after surgery was different between them.
10. (10 points) Repeat Exercise 9, this time comparing mean throat pain score 4 hours after surgery.
11. (8 points) Calculate a 99% confidence interval for the difference in mean pain scores 4 hours after surgery between the two treatment groups.
12. (10 points) Based on your analyses, do you think that licorice gargle prior to surgery is effective in reducing post-intubation sore throat? Explain your answer in detail.

Acknowledgements

Today’s dataset was made available by the Lerner Research Institute and Dr. Amy S. Nowacki of the Cleveland Clinic. These data are representative of a study by Ruetzler et al. (2013).

HW 05

STA 102 Fall 2020 (Jiang)

Due: Friday, October 9 2020 at 11:59p