Use previous R Markdown examples as a template for this assignment. Be especially careful about the YAML header and the spacing between questions.
In R, we can use the wilcox.test() function to perform either the signed-rank test (for paired observations) or the rank sum test (for unpaired observations). In general, the format is as follows:
wilcox.test(outcome ~ groups, data = dataset, paired = ___)
where outcome is the outcome variable, groups is the group variable, dataset is the dataset, and paired is either TRUE or FALSE depending on whether you want a signed-rank test or a rank sum test, respectively. You may also specify the alternative as "two.sided", "less", or "greater".
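As a minimal sketch of both tests, the code below uses the mtcars and sleep datasets that ship with R (they are stand-ins chosen for illustration, not part of this assignment). The paired test is written in the two-vector form, which does not depend on how rows are ordered within groups; newer R releases may also reject the paired argument in the formula interface, so it is left out of the formula call here.

```r
# Rank sum test (unpaired): compare mpg between automatic (am = 0)
# and manual (am = 1) cars. Ties in mpg may produce a warning about
# exact p-values; the test still runs.
rank_sum <- wilcox.test(mpg ~ am, data = mtcars, alternative = "two.sided")

# Signed-rank test (paired): each of the 10 subjects in sleep was
# measured under both drugs, so the two vectors line up by subject.
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
signed_rank <- wilcox.test(drug1, drug2, paired = TRUE,
                           alternative = "two.sided")

rank_sum$p.value
signed_rank$p.value
```

Printing rank_sum or signed_rank directly shows the full test summary, including the test statistic.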
For the Kruskal-Wallis test, use the kruskal.test() function. The general format is:
kruskal.test(outcome ~ groups, data = dataset)
where again, outcome is the outcome variable, groups is the group variable, and dataset is the dataset.
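For illustration only, here is a short sketch using the built-in PlantGrowth dataset (an assumption made for the example, not a dataset used in this assignment), which records plant weight under three treatment groups.

```r
# Kruskal-Wallis test: does the distribution of weight differ
# across the three levels of group (ctrl, trt1, trt2)?
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

kw$statistic  # Kruskal-Wallis chi-squared statistic
kw$parameter  # degrees of freedom (number of groups minus 1)
kw$p.value
```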
To perform a Fisher’s exact test, use the fisher.test() function on a table object. For instance:
fisher.test(table(licorice$preOp_asa, licorice$treat))
would perform a Fisher’s exact test comparing ASA status and treatment group for the licorice dataset.
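The same pattern can be tried on a small self-contained example. The group and outcome vectors below are made up purely for illustration (they are not from the licorice data); the sketch builds the table with table() and passes it to fisher.test().

```r
# Hypothetical data: 4 observations in each of two groups.
group   <- rep(c("A", "B"), each = 4)
outcome <- c("yes", "yes", "yes", "no",   # group A: 3 yes, 1 no
             "yes", "no",  "no",  "no")   # group B: 1 yes, 3 no

# Cross-tabulate the two categorical variables, then run the test.
tab <- table(group, outcome)
ft  <- fisher.test(tab)

tab
ft$p.value  # two-sided p-value
```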
Power and sample size calculations for t-tests are given by the power.t.test() function. In general, the format is:
power.t.test(n = ___,
             delta = ___,
             sd = ___,
             sig.level = ___,
             power = ___,
             type = ___,
             alternative = ___)
Remember that to solve for power, you must specify the following quantities:
- n, the sample size,
- delta, the mean difference between the groups (minimum detectable difference),
- sd, the standard deviation, and
- sig.level, the type 1 error rate.

You must also say whether the type is "two.sample", "one.sample", or "paired", and whether the alternative is "two.sided" or "one.sided". The sample size is the number of observations per group for a two-sample test and the number of pairs for a paired test.
Example: Suppose we are conducting a two-sample t-test with independent samples, have 28 observations per group, and want to calculate the power to detect a difference of 1.5 units between the groups, when the standard deviation within each group is 2.3, at the \(\alpha\) = 0.05 significance level:
power.t.test(n = 28,
             delta = 1.5,
             sd = 2.3,
             sig.level = 0.05,
             type = "two.sample",
             alternative = "two.sided")
##
## Two-sample t test power calculation
##
## n = 28
## delta = 1.5
## sd = 2.3
## sig.level = 0.05
## power = 0.6688238
## alternative = two.sided
##
## NOTE: n is number in *each* group
We can obtain approximately 67% power.
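To see where this number comes from, the sketch below pulls the power out of the returned object and compares it with a rough normal approximation. The approximation is added here only as a sanity check and is not the method power.t.test() uses, which is based on the noncentral t distribution; the variable names are arbitrary.

```r
# Same call as above, storing the result so its pieces can be extracted.
pw <- power.t.test(n = 28, delta = 1.5, sd = 2.3, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
pw$power

# Rough check with a normal approximation:
# standard error of the difference in two group means with n = 28 each.
se <- 2.3 * sqrt(2 / 28)
approx_power <- pnorm(1.5 / se - qnorm(1 - 0.05 / 2))
approx_power  # close to, but not identical to, pw$power
```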
To solve for the sample size needed at a given delta, sd, sig.level, and power, supply those arguments to the power.t.test() function and leave n unspecified.
Example: Suppose we are conducting a one-sample t-test and want to calculate how many observations we need to ensure 90% power at the \(\alpha\) = 0.05 level in order to detect a difference of 2.5 units away from the null hypothesis, when the standard deviation is 3:
power.t.test(delta = 2.5,
             sd = 3,
             sig.level = 0.05,
             power = 0.9,
             type = "one.sample",
             alternative = "two.sided")
##
## One-sample t test power calculation
##
## n = 17.16708
## delta = 2.5
## sd = 3
## sig.level = 0.05
## power = 0.9
## alternative = two.sided
Because the number of observations must be a whole number, we round 17.17 up: we would need 18 observations in order to do so.
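The rounding step can also be done in code. The following is a small sketch (variable names are arbitrary) that extracts n from the result, rounds it up with ceiling(), and then checks the power actually achieved at that sample size.

```r
# Solve for n, then round up to the next whole observation.
ss <- power.t.test(delta = 2.5, sd = 3, sig.level = 0.05, power = 0.9,
                   type = "one.sample", alternative = "two.sided")
n_needed <- ceiling(ss$n)
n_needed

# Plug the rounded-up n back in to confirm that power is at least 90%.
achieved <- power.t.test(n = n_needed, delta = 2.5, sd = 3,
                         sig.level = 0.05, type = "one.sample",
                         alternative = "two.sided")$power
achieved
```

Always round up rather than to the nearest integer, since rounding down would leave the study below the target power.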
The file cereals.csv contains data for 77 breakfast cereals in terms of their nutritional content per serving and other characteristics. The variables are:
- name: name of the cereal
- mfr: manufacturer (A: American Home Food Products, G: General Mills, K: Kellogg’s, N: Nabisco, P: Post, Q: Quaker Oats, and R: Ralston Purina)
- type: cold vs. hot
- calories, protein, fat, sodium, fiber, carbo, sugars, potass: the amount of each per serving (nutrients are measured in grams, except sodium and potassium, which are measured in milligrams)
- shelf: where the cereal was physically located (cereals on low shelves may be targeted toward children, for instance).
For the purposes of this homework, you may assume that this is a random sample of cereal types. You may load the dataset with the code below:
library(tidyverse)
cereal <- read.csv("https://www2.stat.duke.edu/courses/Fall20/sta102/hw/data/cereals.csv")
Read pages 453 through 455 (i.e., up to and including the section titled “WHAT TYPE OF STUDY SHOULD HAVE A POWER CALCULATION PERFORMED?”) of the paper “An introduction to power and sample size estimation” by Jones, Carley, and Harrison (Emergency Medicine Journal, 2004). This article is available for free here, and has been cited 400 times.