Today’s lab will serve as a refresher for many of the functions you’ve used in R. It will not cover dplyr
or ggplot
functions (though they may still be useful, for instance for manipulating or plotting data in order to check assumptions), but will focus on the statistical tests themselves.
We’ll begin with a short refresher on the assignment operator <-
and $
, and then review the functions used for statistical tests introduced on homeworks 4, 5, and 6.
This lab is a compilation of the pre-homework activities. The lab TAs will be available to help in case you have any questions. There are no exercises to turn in.
We’ll be revisiting the licorice example presented in lecture and HW 05. As a review, postoperative sore throat is an annoying and painful complication of intubation after surgery, particularly with wider-gauge double-lumen tubes. Ruetzler et al. (2013) performed an experimental study among patients having elective surgery who required intubation with a double-lumen tube. Prior to anesthesia, patients were randomly assigned to gargle either a licorice-based solution or sugar water (as placebo). Sore throat was evaluated 30 minutes, 90 minutes, and 4 hours after conclusion of the surgery, on a scale from 0 to 10.
The data are available with the following code:
library(tidyverse)
licorice <- read.csv("https://www2.stat.duke.edu/courses/Spring22/sta102.001/hw/data/licorice.csv")
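Before running any tests, it can help to confirm the data loaded correctly. A minimal sketch (none of this is required; glimpse() comes from dplyr, which tidyverse loads):
# Quick checks on the imported data
dim(licorice)       # number of rows (patients) and columns (variables)
glimpse(licorice)   # variable names and types at a glance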
If we have a dataset of interest, we can use the syntax t.test(x, mu = _, alternative = _, conf.level = _)
to perform t-tests in R
and construct confidence intervals for the mean (or difference in means). For further details, you may type ?t.test
or help(t.test)
. The arguments for this function are as follows:
- x, a numeric vector of data values
- mu, a number indicating the true value of the mean under the null hypothesis
- alternative, specifying the alternative hypothesis. It must be either "two.sided", "greater", or "less".
- conf.level, specifying the confidence level of the confidence interval.

Example: Suppose we want to test the null hypothesis that the mean BMI among all patients was 25 vs. the alternative hypothesis that the mean BMI among all patients was not equal to 25:
t.test(licorice$preOp_calcBMI, mu = 25,
alternative = "two.sided",
conf.level = 0.95)
##
## One Sample t-test
##
## data: licorice$preOp_calcBMI
## t = 2.1224, df = 234, p-value = 0.03486
## alternative hypothesis: true mean is not equal to 25
## 95 percent confidence interval:
## 25.04243 26.14047
## sample estimates:
## mean of x
## 25.59145
The dollar sign $
tells R which variable to use. For instance, licorice$preOp_calcBMI
tells R that you are interested in the preOp_calcBMI
variable from the licorice
dataset.
Note that the output displays the t-statistic, the degrees of freedom, the p-value, the alternative hypothesis tested, the 95% confidence interval, and the sample mean (wow, that’s a lot!).
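As an aside on the $ syntax above: it works anywhere a function expects a single vector. For instance, a quick sketch of descriptive statistics for BMI (output not shown):
# Descriptive statistics for pre-operative BMI using the $ syntax
mean(licorice$preOp_calcBMI)
sd(licorice$preOp_calcBMI)
summary(licorice$preOp_calcBMI)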
If you are performing a two-sample t-test, there are some additional arguments:
- y, a numeric vector of data values (placed after x)
- paired, a logical (TRUE or FALSE) indicating whether you want a paired t-test, and
- var.equal, a logical (TRUE or FALSE) indicating whether you assume the variance is the same in both groups (this affects the degrees of freedom used in the test)

Example: Suppose we want to test the null hypothesis that the mean BMI among men was less than or equal to the mean BMI among women vs. the alternative hypothesis that the mean BMI among men was greater than the mean BMI among women:
t.test(licorice$preOp_calcBMI ~ licorice$preOp_gender, mu = 0,
alternative = "greater",
paired = FALSE,
var.equal = FALSE,
conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: licorice$preOp_calcBMI by licorice$preOp_gender
## t = 2.9044, df = 182.62, p-value = 0.002067
## alternative hypothesis: true difference in means between group 0 and group 1 is greater than 0
## 95 percent confidence interval:
## 0.716381 Inf
## sample estimates:
## mean in group 0 mean in group 1
## 26.24958 24.58656
Here, since there were two variables, we used formula syntax, given by a tilde ~. In this syntax, we have variable of interest ~ grouping variable. So when we type licorice$preOp_calcBMI ~ licorice$preOp_gender, we are asking R to perform an independent two-sample t-test on the preOp_calcBMI variable from licorice, with the groups defined by the preOp_gender variable.
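An equivalent way to run the same test is to pass two separate numeric vectors as x and y rather than using formula syntax. A sketch, assuming preOp_gender is coded 0/1 as in the output above:
# Split BMI into the two gender groups "by hand" and pass them as x and y
bmi_group0 <- licorice$preOp_calcBMI[licorice$preOp_gender == 0]
bmi_group1 <- licorice$preOp_calcBMI[licorice$preOp_gender == 1]
t.test(x = bmi_group0, y = bmi_group1,
       alternative = "greater",
       paired = FALSE,
       var.equal = FALSE,
       conf.level = 0.95)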
To conduct an ANOVA in R, we wrap the aov function in summary to create the ANOVA table. The general syntax is summary(aov(outcome_var ~ grouping_var, data = dataset)), where outcome_var is the numerical variable whose means you are interested in comparing (e.g., the pulse rate in the experiment discussed in class) and grouping_var is the grouping variable (e.g., pet vs. friend vs. neither).
Example: Suppose you’re interested in testing whether there is a difference in mean BMI between patients of the three ASA categories:
summary(aov(preOp_calcBMI ~ preOp_asa, data = licorice))
## Df Sum Sq Mean Sq F value Pr(>F)
## preOp_asa 2 262 131.21 7.595 0.000638 ***
## Residuals 232 4008 17.28
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that the ANOVA table provides all output talked about in the lecture slides. The between-group variance component is given in the top line and the within-group variance component is given in the bottom line.
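Since dplyr can still be useful for checking assumptions, here is a sketch of how you might look at the group sizes, means, and standard deviations alongside the ANOVA table (output not shown):
# Sample size, mean, and SD of BMI within each ASA category (dplyr)
licorice %>%
  group_by(preOp_asa) %>%
  summarize(n = n(),
            mean_bmi = mean(preOp_calcBMI, na.rm = TRUE),
            sd_bmi   = sd(preOp_calcBMI, na.rm = TRUE))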
In R, we can use the wilcox.test() function to perform both the signed-rank test (for paired observations) and the rank sum test (for unpaired observations). In general, the format is as follows:
wilcox.test(outcome ~ groups, data = dataset, paired = ___)
where outcome is the outcome variable, groups is the group variable, dataset is the dataset, and paired is either TRUE or FALSE depending on whether you want a signed-rank test or a rank sum test, respectively. You may also specify the alternative to be "two.sided", "less", or "greater".
Example: Suppose you wanted to perform a rank sum test, examining differences in median throat pain score between patients receiving placebo and licorice:
wilcox.test(pacu90min_throatPain ~ treat, data = licorice,
paired = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: pacu90min_throatPain by treat
## W = 8620, p-value = 1.17e-06
## alternative hypothesis: true location shift is not equal to 0
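For paired observations (the signed-rank test), you can instead pass two numeric vectors and set paired = TRUE. A sketch comparing throat pain at 30 and 90 minutes within the same patients; note that the column name pacu30min_throatPain is assumed here and may differ in the actual dataset:
# Signed-rank test for paired measurements on the same patients
# (pacu30min_throatPain is an assumed column name)
wilcox.test(licorice$pacu30min_throatPain,
            licorice$pacu90min_throatPain,
            paired = TRUE)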
For the Kruskal-Wallis test, use the kruskal.test()
function. The general format is:
kruskal.test(outcome ~ groups, data = dataset)
where again, outcome
is the outcome variable, groups
is the group variable, and dataset
is the dataset.
Example: Suppose you wanted to perform a Kruskal-Wallis test, examining differences in median throat pain score between male and female patients (the preOp_gender variable):
kruskal.test(pacu90min_throatPain ~ preOp_gender, data = licorice)
##
## Kruskal-Wallis rank sum test
##
## data: pacu90min_throatPain by preOp_gender
## Kruskal-Wallis chi-squared = 4.2705, df = 1, p-value = 0.03878
Tables in R can be created straightforwardly with the table() function. For instance, suppose we wanted to create a table looking at the ASA score prior to surgery:
table(licorice$preOp_asa)
## 
##          healthy     mild_disease systemic_disease 
##               41              134               60
We can also make a 2-way table by specifying another variable. For instance, we could look at the relationship between ASA score and treatment (perhaps to see whether the treatment groups were “balanced” in terms of ASA):
table(licorice$preOp_asa, licorice$treat)
##
## 0 1
## healthy 19 22
## mild_disease 67 67
## systemic_disease 31 29
Notice that the first variable specified is displayed along the rows, and the second variable specified is displayed along the columns. Although R will let you create tables with any variables, they only really make sense for categorical data.
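If you want proportions rather than counts, one option is a short sketch with prop.table(); margin = 2 gives proportions within each column (here, within each treatment group):
# Convert the two-way table of counts to column proportions
asa_by_treat <- table(licorice$preOp_asa, licorice$treat)
prop.table(asa_by_treat, margin = 2)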
Suppose you are interested in testing a single proportion. You may use the prop.test() function to do so, and it contains the following arguments:
- x, the number of “successes”
- n, the total sample size
- p, the hypothesized proportion
- alternative, specifying the alternative hypothesis. It must be either "two.sided", "greater", or "less".
- conf.level, specifying the confidence level of the confidence interval.

Example: We saw earlier in the table that there were 117 patients receiving placebo and 118 receiving licorice. If we wanted to test whether this proportion is different from the null hypothesized proportion \(p_0 = 0.5\) at the \(\alpha = 0.05\) significance level, we could use the following code (looking at placebo):
prop.test(x = 117, n = 235, p = 0.5,
alternative = "two.sided", conf.level = 0.95)
##
## 1-sample proportions test with continuity correction
##
## data: 117 out of 235, null probability 0.5
## X-squared = 0, df = 1, p-value = 1
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.4324046 0.5634108
## sample estimates:
## p
## 0.4978723
Note that the output displays a \(\chi^2\) statistic and degrees of freedom. This is similar to the test using the normal distribution covered in the lecture slides. You may safely use R to evaluate any hypothesis tests for proportions, and thus may use this \(\chi^2\) statistic for a test of proportions. The output also includes the p-value, the alternative hypothesis tested, a confidence interval, and the data itself.
Chi-square tests are performed in R by passing a table to the chisq.test() function; to conduct the test, simply wrap the table in chisq.test().
Example: Is there an association between ASA and gender among patients receiving surgeries requiring intubation?
chisq.test(table(licorice$preOp_asa, licorice$preOp_gender))
##
## Pearson's Chi-squared test
##
## data: table(licorice$preOp_asa, licorice$preOp_gender)
## X-squared = 0.18711, df = 2, p-value = 0.9107
Note that the output provides the \(\chi^2\) statistic, the degrees of freedom, and the p-value. We created a table looking at ASA and gender, and then directly used the chisq.test()
function on this table.
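If you also want to check that the expected cell counts are large enough for the chi-square approximation (commonly at least 5 per cell), the object returned by chisq.test() stores them; a short sketch:
# Save the test object and inspect the expected counts under independence
asa_gender_test <- chisq.test(table(licorice$preOp_asa, licorice$preOp_gender))
asa_gender_test$expected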
Fisher’s exact test works similarly to the chi-square test: simply apply the fisher.test function to a table.
Example: Same as above, but using Fisher’s exact test:
fisher.test(table(licorice$preOp_asa, licorice$preOp_gender))
##
## Fisher's Exact Test for Count Data
##
## data: table(licorice$preOp_asa, licorice$preOp_gender)
## p-value = 0.9159
## alternative hypothesis: two.sided
Today’s dataset was made available by the Lerner Research Institute and Dr. Amy S. Nowacki of the Cleveland Clinic. These data are representative of a study by Ruetzler et al. (2013).