Before you begin the homework assignment, take some time to familiarize yourself with the t.test function in R, which we will use to calculate p-values from one- and two-sample t-tests. A brief tutorial is given in the first part of this HW, and you will need this function for Exercises 8 - 12.
There is no template for this homework. Instead, create your own R Markdown file in R by going to File > New File > R Markdown. In the spaces provided, give the title (HW 05) and author (your name), and select “HTML document”. You should see a new file appear that has the following header:
---
title: "HW 05"
author: "Yue Jiang"
date: "10/1/2020"
output: html_document
---
There’s also some example Markdown code that appears under this header. You may safely delete it. Use previous templates as guides for creating your own Markdown document.
We’ll be revisiting the licorice example presented in lecture. As a review, postoperative sore throat is an annoying and painful complication of intubation after surgery, particularly with wider-gauge double-lumen tubes. Ruetzler et al. (2013) performed an experimental study among patients having elective surgery who required intubation with a double-lumen tube. Prior to anesthesia, patients were randomly assigned to gargle either a licorice-based solution or sugar water (as placebo). Sore throat was evaluated 30 minutes, 90 minutes, and 4 hours after the conclusion of the surgery using an 11-point Likert scale (0 = no pain, 10 = worst pain).
Let’s assume that we can treat this pain score as a continuous numeric variable.
The data are available with the following code:
library(tidyverse)
licorice <- read.csv("https://www2.stat.duke.edu/courses/Fall20/sta102/hw/data/licorice.csv")
Some relevant variables of interest are:
To suppress printed messages and warnings in your R Markdown document, you may use the options message = F and warning = F in your R chunk. That is, we may start the R chunk with ```{r chunk-name, message = F, warning = F}.
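If you would rather not repeat these options in every chunk, they can also be set once for the whole document. The line below is a minimal sketch, assuming you place it in a setup chunk near the top of your file and that the knitr package (which R Markdown uses to run chunks) is installed:

```r
# Set default chunk options for every later chunk in this document.
# FALSE is spelled out because F is an ordinary variable that can be
# reassigned, while FALSE is a reserved word.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

Options given in an individual chunk header still override these defaults for that chunk.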
If we have a dataset of interest, we can use the syntax t.test(x, mu = _, alternative = _, conf.level = _) to perform t-tests in R and construct confidence intervals for the mean (or difference in means). For further details, you may type ?t.test or help(t.test). The arguments for this function are as follows:
x, a numeric vector of data values
mu, a number giving the hypothesized value of the mean under the null hypothesis
alternative, specifying the alternative hypothesis. It must be either "two.sided", "greater", or "less".
conf.level, specifying the confidence level of the confidence interval.
Example: Suppose we want to test the null hypothesis that the mean BMI among all patients was 25 vs. the alternative hypothesis that the mean BMI among all patients was not equal to 25:
t.test(licorice$preOp_calcBMI, mu = 25,
alternative = "two.sided",
conf.level = 0.95)
##
## One Sample t-test
##
## data: licorice$preOp_calcBMI
## t = 2.1224, df = 234, p-value = 0.03486
## alternative hypothesis: true mean is not equal to 25
## 95 percent confidence interval:
## 25.04243 26.14047
## sample estimates:
## mean of x
## 25.59145
The dollar sign $ tells R which variable to use. For instance, licorice$preOp_calcBMI tells R that you are interested in the preOp_calcBMI variable from the licorice dataset.
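As a small illustration of how $ behaves, here is a sketch using a made-up data frame (toy and its values are hypothetical and are not part of the licorice data):

```r
# Hypothetical toy data frame standing in for licorice (values are made up).
toy <- data.frame(id = 1:3, bmi = c(22.5, 27.1, 30.4))

# $ pulls out one column as a plain vector.
toy$bmi

# The same column can also be extracted by name with [[ ]].
identical(toy$bmi, toy[["bmi"]])

# Because the result is a numeric vector, it can be passed straight to
# functions such as mean() or t.test().
mean(toy$bmi)
```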
Note that the output displays the t-statistic, the degrees of freedom, the p-value, the alternative hypothesis tested, the 95% confidence interval, and the sample mean (wow, that’s a lot!).
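Each of these pieces can also be pulled out of the object that t.test returns, which is handy when you want to report a single number. The sketch below uses simulated data rather than the licorice data (the vector bmi and the seed are assumptions for illustration) and also recomputes the t-statistic and p-value by hand to show where they come from:

```r
# Simulated stand-in data (not the licorice data); the seed makes it reproducible.
set.seed(102)
bmi <- rnorm(50, mean = 26, sd = 4)

res <- t.test(bmi, mu = 25, alternative = "two.sided", conf.level = 0.95)

# Each piece of the printed output is stored in the returned object.
res$statistic   # t-statistic
res$parameter   # degrees of freedom
res$p.value     # p-value
res$conf.int    # confidence interval
res$estimate    # sample mean

# Recompute the t-statistic and the two-sided p-value by hand.
n      <- length(bmi)
t_stat <- (mean(bmi) - 25) / (sd(bmi) / sqrt(n))
p_val  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Both comparisons should return TRUE.
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
```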
If you are performing a two-sample t-test, there are some additional arguments:
y, a numeric vector of data values (placed after x)
paired, a logical (either TRUE or FALSE) indicating whether you want a paired t-test, and
var.equal, a logical (either TRUE or FALSE) indicating whether you assume the variance is the same in both groups (this affects the degrees of freedom used in the test)
Example: Suppose we want to test the null hypothesis that the mean BMI among men was less than or equal to the mean BMI among women vs. the alternative hypothesis that the mean BMI among men was greater than the mean BMI among women:
t.test(licorice$preOp_calcBMI ~ licorice$preOp_gender, mu = 0,
alternative = "greater",
paired = FALSE,
var.equal = FALSE,
conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: licorice$preOp_calcBMI by licorice$preOp_gender
## t = 2.9044, df = 182.62, p-value = 0.002067
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.716381 Inf
## sample estimates:
## mean in group 0 mean in group 1
## 26.24958 24.58656
Here, since there were two variables, we used formula syntax, which is written with a tilde ~ in the form variable of interest ~ grouping variable. So when we type licorice$preOp_calcBMI ~ licorice$preOp_gender, we’re telling R to perform an independent two-sample t-test for the preOp_calcBMI variable from licorice, with the groups defined by the preOp_gender variable.
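To see how the formula syntax relates to the x and y arguments, and what var.equal changes, here is a sketch on simulated data (the data frame toy, its column names, and the group sizes are assumptions for illustration, not the licorice data). With a formula, R takes the first level of the grouping variable as the first group, so the alternative refers to the mean of the first group minus the mean of the second.

```r
# Simulated stand-in data (not the licorice data).
set.seed(102)
toy <- data.frame(
  bmi   = c(rnorm(40, mean = 26, sd = 4), rnorm(30, mean = 25, sd = 5)),
  group = rep(c(0, 1), times = c(40, 30))
)

# Formula interface: outcome ~ grouping variable.
by_formula <- t.test(toy$bmi ~ toy$group, alternative = "greater",
                     var.equal = FALSE)

# Equivalent two-vector interface: x is group 0, y is group 1.
by_vectors <- t.test(x = toy$bmi[toy$group == 0],
                     y = toy$bmi[toy$group == 1],
                     alternative = "greater", var.equal = FALSE)

# The two calls give the same p-value.
all.equal(by_formula$p.value, by_vectors$p.value)

# var.equal = TRUE pools the variances and uses n1 + n2 - 2 degrees of
# freedom; the Welch version (var.equal = FALSE) usually gives a
# non-integer value instead.
pooled <- t.test(toy$bmi ~ toy$group, alternative = "greater",
                 var.equal = TRUE)
pooled$parameter      # 68, that is 40 + 30 - 2
by_formula$parameter  # Welch degrees of freedom
```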
In Exercises 1 - 7, you will be asked questions about a hypothesis test that you have not seen before (and likely won’t ever see). These questions are intended to assess whether you are familiar with the hypothesis testing process itself and underlying philosophical details. Only use the information provided below, and assume that required assumptions are satisfied.
The one-sample Anderson-Darling (AD) test can be used to test whether a sample comes from a specific target probability distribution, based on a function of the empirical distribution function (EDF) of the observed data. The null and alternative hypotheses are given by
\(H_0\): the sample comes from the target distribution
\(H_1\): the sample does not come from the target distribution
If, in our observed data, the difference between the EDF and the target distribution is “large enough,” then we reject the null hypothesis in favor of the alternative.
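If the EDF is unfamiliar, the sketch below shows what it is and how it can be compared with a target distribution. It uses a simulated sample and a standard normal target, both of which are assumptions for illustration. The summary it computes is only the largest gap between the two curves, not the AD test statistic, and you do not need it for the exercises.

```r
# Simulated sample (an assumption for illustration).
set.seed(102)
x <- rnorm(30)

# ecdf() returns the EDF as a function: the proportion of observations <= q.
edf <- ecdf(x)

# Compare the EDF with the target CDF (standard normal) at the sorted data points.
x_sorted <- sort(x)
gap <- edf(x_sorted) - pnorm(x_sorted)

# One simple summary of the discrepancy: the largest absolute gap.
# (This is NOT the Anderson-Darling statistic, which is based on a
# weighted function of the squared gaps.)
max(abs(gap))
```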
Suppose we conducted a one-sample AD test against some target distribution at the \(\alpha = 0.10\) level and obtained a p-value of 0.24.
Exercises 8 - 12 use the licorice dataset referred to in the brief tutorial above and in class. You may assume that the pain scores are continuous numeric variables for the purposes of this assignment.
When conducting a hypothesis test, you must always formally specify the significance level, the hypotheses of interest, the reference distribution of the test statistic under the null hypothesis, the test statistic itself, the p-value, your decision, and a conclusion in context of the research problem.
Today’s dataset was made available by the Lerner Research Institute and Dr. Amy S. Nowacki of the Cleveland Clinic. These data are representative of a study by Ruetzler et al. (2013).