Use previous R Markdown examples to serve as a template for this assignment. Be especially careful about the YAML (the header) and spacing between questions.

Non-parametric tests in R

In R, we can use the wilcox.test() function to perform either the signed-rank test (for paired observations) or the rank sum test (for unpaired observations). In general, the format is as follows:

wilcox.test(outcome ~ groups, data = dataset, paired = ___)

where outcome is the outcome variable, groups is the group variable, dataset is the dataset, and paired is either TRUE or FALSE depending on whether you are looking for a signed-rank test or a rank sum test, respectively. You may also specify the alternative argument as "two.sided", "less", or "greater".
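As a minimal sketch of both tests, the code below uses R's built-in sleep dataset (not part of this assignment) and passes the two samples as separate vectors instead of a formula. Some recent R releases may not accept paired in the formula interface, so the two-vector form is the more portable choice. The names drug1, drug2, signed_rank, and rank_sum are illustrative.

```r
# Extra hours of sleep for the same 10 patients under two drugs
# (built-in sleep data; rows are ordered by patient ID within each group).
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Signed-rank test: the observations are paired by patient.
# exact = FALSE uses the normal approximation, which avoids warnings
# about ties and zero differences in these data.
signed_rank <- wilcox.test(drug1, drug2, paired = TRUE, exact = FALSE)

# Rank sum test: treats the two samples as unpaired (shown only to
# illustrate the syntax), here with a one-sided alternative.
rank_sum <- wilcox.test(drug1, drug2, paired = FALSE, exact = FALSE,
                        alternative = "less")

signed_rank$p.value
rank_sum$p.value
```

With alternative = "less", the alternative hypothesis is that the first sample (drug1) tends to take smaller values than the second.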

For the Kruskal-Wallis test, use the kruskal.test() function. The general format is:

kruskal.test(outcome ~ groups, data = dataset)

where again, outcome is the outcome variable, groups is the group variable, and dataset is the dataset.
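For instance, a short sketch using the built-in airquality dataset (an illustration only, not the assignment data) compares ozone readings across five months. The object name kw is an arbitrary choice.

```r
# Kruskal-Wallis test of Ozone across the five months in airquality.
# Rows with a missing Ozone value are dropped by the default na.action.
kw <- kruskal.test(Ozone ~ Month, data = airquality)

kw$statistic   # Kruskal-Wallis chi-squared statistic
kw$parameter   # degrees of freedom (number of groups minus 1)
kw$p.value
```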

To perform a Fisher’s exact test, use the fisher.test() function on a table object. For instance:

fisher.test(table(licorice$preOp_asa, licorice$treat))

would perform a Fisher’s exact test comparing ASA status and treatment group for the licorice dataset.
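Since the licorice data are not loaded here, the following sketch shows the same pattern on the built-in mtcars dataset, cross-tabulating transmission type (am) against engine shape (vs). The object names tab and ft are illustrative.

```r
# Build the 2x2 contingency table first so the counts can be inspected
tab <- table(transmission = mtcars$am, engine = mtcars$vs)
tab

# Fisher's exact test on the table object
ft <- fisher.test(tab)
ft$p.value
ft$estimate   # conditional MLE of the odds ratio (reported for 2x2 tables)
```

Storing the table in its own object makes it easy to check for small cell counts before deciding between an exact test and a chi-square test.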

Power and sample size calculations

Power and sample size calculations for t-tests are given by the power.t.test() function. In general, the format is:

power.t.test(n = ___, 
             delta = ___, 
             sd = ___, 
             sig.level = ___,
             power = ___,
             type = ___,
             alternative = ___)

Remember that to solve for power, you must specify the following quantities:

- n, the sample size,
- delta, the mean difference between the groups (minimum detectable difference),
- sd, the standard deviation, and
- sig.level, the type 1 error rate.

You must also say whether the type is "two.sample", "one.sample", or "paired", and whether the alternative is "two.sided" or "one.sided". For a two-sample test, n is the sample size per group; for a paired test, it is the number of pairs.

Example: Suppose we are conducting a two-sample (independent samples) t-test, have 28 observations per group, and want to calculate the power to detect a difference of 1.5 units between the groups, when the standard deviation within each group is 2.3, at the \(\alpha\) = 0.05 significance level:

power.t.test(n = 28,
             delta = 1.5, 
             sd = 2.3,
             sig.level = 0.05,
             type = "two.sample",
             alternative = "two.sided")
## 
##      Two-sample t test power calculation 
## 
##               n = 28
##           delta = 1.5
##              sd = 2.3
##       sig.level = 0.05
##           power = 0.6688238
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

We can obtain approximately 67% power.
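To see how power changes with sample size, one option is to repeat the calculation over several values of n. The sketch below keeps the inputs from the example above and varies only n; the candidate sample sizes and the names n_per_group and power_by_n are arbitrary choices for illustration.

```r
# Candidate per-group sample sizes (28 matches the example above)
n_per_group <- c(10, 20, 28, 40, 60)

# Extract the power element from each power.t.test() result
power_by_n <- sapply(n_per_group, function(n) {
  power.t.test(n = n,
               delta = 1.5,
               sd = 2.3,
               sig.level = 0.05,
               type = "two.sample",
               alternative = "two.sided")$power
})

data.frame(n_per_group, power = round(power_by_n, 3))
```

Power increases with n when everything else is held fixed, so a table like this gives a quick sense of how many observations a target power would require.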

To solve for the sample size needed at a given delta, sd, sig.level, and power, supply those arguments to power.t.test() and leave n unspecified.
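The same function can solve for other quantities as well: leaving delta unspecified (rather than n) returns the minimum detectable difference for a given sample size and power. The numbers below (50 per group, sd of 10) are hypothetical values chosen only to illustrate the call, and mdd is an illustrative name.

```r
# Solve for delta by supplying n, sd, sig.level, and power, and omitting delta
mdd <- power.t.test(n = 50,
                    sd = 10,
                    sig.level = 0.05,
                    power = 0.8,
                    type = "two.sample",
                    alternative = "two.sided")

mdd$delta   # smallest mean difference detectable with 80% power
```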

Example: Suppose we are conducting a one-sample t-test and want to calculate how many observations we need to ensure 90% power at the \(\alpha\) = 0.05 level in order to detect a difference of 2.5 units away from the null hypothesis, when the standard deviation is 3:

power.t.test(delta = 2.5,
             sd = 3,
             sig.level = 0.05,
             power = 0.9,
             type = "one.sample",
             alternative = "two.sided")
## 
##      One-sample t test power calculation 
## 
##               n = 17.16708
##           delta = 2.5
##              sd = 3
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided

Because the sample size must be a whole number, we round up: we would need 18 observations in order to do so.
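The rounding can also be done in code. As a minimal sketch, the result is stored in a variable, n is rounded up with ceiling(), and the power actually achieved at that sample size is checked. The names res, n_needed, and achieved are illustrative.

```r
# Same one-sample calculation as above, stored so n can be extracted
res <- power.t.test(delta = 2.5,
                    sd = 3,
                    sig.level = 0.05,
                    power = 0.9,
                    type = "one.sample",
                    alternative = "two.sided")

# Always round up: rounding down would fall short of the target power
n_needed <- ceiling(res$n)
n_needed

# Power achieved with the rounded-up sample size
achieved <- power.t.test(n = n_needed,
                         delta = 2.5,
                         sd = 3,
                         sig.level = 0.05,
                         type = "one.sample",
                         alternative = "two.sided")$power
achieved
```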

Exercises

The file cereals.csv contains data for 77 breakfast cereals in terms of their nutritional content per serving and other characteristics. The variables are:

- name: name of the cereal
- mfr: manufacturer (A: American Home Food Products, G: General Mills, K: Kellogg’s, N: Nabisco, P: Post, Q: Quaker Oats, and R: Ralston Purina)
- type: cold vs. hot
- calories, protein, fat, sodium, fiber, carbo, sugars, potass: the amount of each of these nutrients, per gram, per serving (sodium and potassium are measured in milligrams)
- shelf: where the cereal was physically located (cereals on low aisles may be targeted toward children, for instance).

For the purposes of this homework, you may assume that this is a random sample of cereal types. You may load the dataset with the code below:

library(tidyverse)
cereal <- read.csv("https://www2.stat.duke.edu/courses/Fall20/sta102/hw/data/cereals.csv")

  1. (15 points) Is there evidence that there is a relationship between cereal manufacturer and where on the shelf their cereals are placed? Formally evaluate any assumptions needed for a parametric test, and perform either a parametric test or non-parametric test as appropriate. State your null and alternative hypotheses, significance level, p-value, decision, and conclusion in context of this question.
  2. (20 points) You are worried that cereals on the lowest shelf level (often targeted to children) might have lower fiber than cereals on the highest shelf level. Formally evaluate any assumptions needed for a parametric test to evaluate this claim, and perform either a parametric test or non-parametric test as appropriate, stating your null and alternative hypotheses, significance level, p-value, decision, and conclusion in context of this question.
  3. (25 points) You are worried that there may be differential amounts of sugar in cereals depending on their location on the shelf. Formally evaluate any assumptions needed for a parametric test to evaluate this claim, and perform either a parametric test or non-parametric test as appropriate, stating your null and alternative hypotheses, significance level, p-value, decision, and conclusion in context. Conduct formal step-down tests as needed, or explain why it is inappropriate to do so in this context.
  4. (10 points) Suppose you are interested in testing whether name brand cereals (Cheerios, Frosted Mini-Wheats, Lucky Charms, etc.) and their generic counterparts (Oat-ee-os, Frosted Wheat Squares, Charmed Marshmallows, etc.) have similar amounts of Vitamin D. How many total cereals would you need to test in order to detect a mean difference of 2 units (Vitamin D is literally measured in “international units”) between name brand and generic cereals with 80% power at a 5% type 1 error rate if the standard deviation of this difference is 4 units?
  5. (10 points) Suppose you have 20 “high-shelf” cereals and 20 “low-shelf” cereals. What is the minimum difference in mean protein levels between the two groups you can detect with 80% power at the 5% type 1 error rate if the standard deviation of the difference is 6 grams?

Read pages 453 through 455 (i.e., up to and including the section titled “WHAT TYPE OF STUDY SHOULD HAVE A POWER CALCULATION PERFORMED?”) of the paper An introduction to power and sample size estimation by Jones, Carley, and Harrison (Emergency Medicine Journal, 2004). This article is available for free here, and has been cited 400 times.

  1. (7 points) Explain why a test that can attain a very high power and low type 1 error rate using a small sample size may still not be very useful scientifically.
  2. (7 points) Explain the importance of performing a power / sample size calculation during the design process of a study.
  3. (6 points) There is a subtle error on the first page of their article (in the section “WHAT IS POWER AND WHY DOES IT MATTER”)! Identify what this error is (hint: it’s not actually addressed in their correction of the original research paper) and correct it.