This homework covers two-sample confidence intervals and hypothesis tests (Ch. 1-3 in the textbook). The exercise and page numbers refer to The Statistical Sleuth (3rd Edition) by Ramsey and Schafer. You may discuss the assignment with others; however, you must write and submit your own answers.
Please use the STA 210: HW 1 template to write up your assignment and submit your work as a PDF under the Assignments tab in Sakai. To ensure you have the updated template, run the R code below to update the STA210 package in RStudio.
devtools::install_github("matackett/sta210/STA210") # installs package from GitHub
You will need the following libraries to complete the assignment:
library("tibble")
library("dplyr")
library("ggplot2")
library("STA210")
library("Sleuth3") #data sets from the book
t.test Function
You can use the t.test function in R to conduct one- and two-sample inference using the t methods. Type ?t.test in the console to learn more about the syntax of this function. You may also refer to your notes for example code.
When you conduct two-sample inference, the data for each group must be stored in separate data frames. For example, to conduct two-sample inference for the oscar_winners data (in the STA210 package), you can put the age data for the Best Actor and Best Actress winners in separate data frames using the following code:
best_actor <- as.data.frame(oscar_winners %>%
  filter(category == "Best Actor") %>%
  select(age))
best_actress <- as.data.frame(oscar_winners %>%
  filter(category == "Best Actress") %>%
  select(age))
The filter() function tells R to include only the observations in the specified category. The select() function tells R which columns to include in the new data frame. The as.data.frame() function ensures the data are stored as data frames so they can be used in the t.test function.
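As a rough sketch of how two separated groups feed into t.test, the example below uses R's built-in sleep data rather than oscar_winners, so that it runs without the STA210 package. The object names group1, group2, and result are illustrative assumptions, and the sketch passes numeric vectors, which t.test also accepts. By default t.test uses the unequal-variance (Welch) procedure; setting var.equal = TRUE requests the pooled version. Which version suits a given problem is a judgment this sketch does not make.

```r
# Illustrative only: built-in `sleep` data, not the homework data.
data(sleep)

# Split the response (extra hours of sleep) by treatment group.
group1 <- sleep$extra[sleep$group == 1]
group2 <- sleep$extra[sleep$group == 2]

# Two-sided, two-sample t test with a 95% confidence interval
# for the difference in means (group1 minus group2).
result <- t.test(group1, group2,
                 alternative = "two.sided",
                 conf.level = 0.95,
                 var.equal = FALSE)  # Welch; use TRUE for the pooled test

result$statistic  # t statistic
result$p.value    # p-value
result$conf.int   # confidence interval for the difference in means
```

Printing result on its own shows the full summary, including the degrees of freedom and the two sample means.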
Sometimes it is beneficial to create new variables to use in a data analysis. This often occurs when you need to transform a variable so that it satisfies the assumptions of our statistical methods. You can use the mutate function to create a new variable. For example, to create a new variable equal to sqrt(age) and add it to the oscar_winners data set, you can use the code below:
oscar_winners <- oscar_winners %>% mutate(new_var = sqrt(age))
Now, the oscar_winners data set contains the variable new_var.
glimpse(oscar_winners)
## Observations: 180
## Variables: 6
## $ award.year <int> 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 193...
## $ age <int> 44, 38, 62, 53, 41, 34, 33, 49, 41, 37, 38, 34, 32,...
## $ name <chr> "Emil Jannings", "Warner Baxter", "George Arliss", ...
## $ movie <chr> "The Last Command", "In Old Arizona", "Disraeli", "...
## $ category <chr> "Best Actor", "Best Actor", "Best Actor", "Best Act...
## $ new_var <dbl> 6.633250, 6.164414, 7.874008, 7.280110, 6.403124, 5...
Sometimes you will need to include mathematical notation in a document to help the reader understand how you obtained your results. You can include mathematical notation in any R Markdown file using LaTeX syntax. There are two ways to display mathematics in your document:
Inline: Your mathematics will display within the line of text. Use $ to start and end your LaTeX syntax. For example, typing
The null hypothesis is $H_0:\mu = 0$
produces a sentence with the hypothesis rendered within the line of text.
Display: Your mathematics will display on its own line, outside the line of text. Use $$ to start and end your LaTeX syntax. For example, typing
The null hypothesis is $$H_0:\mu = 0$$
produces a sentence with the hypothesis rendered on its own centered line.
See Mathematics in R Markdown for an overview of the syntax for commonly used equations and symbols.
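As an illustration of display math that may come up in two-sample problems, the sketch below writes a generic pair of hypotheses for a difference in means, followed by the pooled two-sample t statistic. The symbols (\mu_1, \mu_2, \bar{Y}_1, \bar{Y}_2, s_p, n_1, n_2) are assumptions chosen for this example; match them to the notation in your notes and the textbook, and choose the alternative hypothesis that fits the question being asked.

```latex
$$H_0: \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_a: \mu_1 - \mu_2 \neq 0$$

$$t = \frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
```

Here s_p stands for the pooled standard deviation and n_1, n_2 for the two sample sizes.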
Question 1. Ex. 2.16 (pg. 53)
Use the case0101 data set from the Sleuth3 package.
Question 2. In Lab 1, you used exploratory data analysis to compare the distribution of ages for Best Actor and Best Actress winners at the Academy Awards. We would like to use these data to determine whether movie actors are older, on average, than movie actresses.
Examine the distribution of age for the Best Actor and Best Actress winners. Based on the distributions of age, is it appropriate to use the two-sample t inference methods? Explain. Be sure to include the code for any graphs and/or summary statistics you used to make your assessment. (See sections 3.2 and 3.3 for details about the robustness of the two-sample t methods.)
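One possible way to gather group-level summaries and a comparison plot is sketched below. It uses R's built-in sleep data and base R functions so that it runs on its own; it is not the homework data, and the object names (n_by_group, mean_by_group, sd_by_group, sd_ratio) are illustrative assumptions. The same ideas can be carried over to other data sets with the tools from your notes.

```r
# Illustrative only: built-in `sleep` data, not the homework data.
data(sleep)

# Sample size, mean, and standard deviation of the response in each group.
n_by_group    <- tapply(sleep$extra, sleep$group, length)
mean_by_group <- tapply(sleep$extra, sleep$group, mean)
sd_by_group   <- tapply(sleep$extra, sleep$group, sd)

# Ratio of the larger to the smaller group SD, a rough gauge of
# whether the two spreads are similar.
sd_ratio <- max(sd_by_group) / min(sd_by_group)

# Side-by-side boxplots to compare shape, spread, and outliers.
boxplot(extra ~ group, data = sleep,
        xlab = "Group", ylab = "Extra hours of sleep")
```

Sample sizes, spreads, skewness, and outliers are the features to weigh against the robustness discussion in the textbook.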
Question 3. Ex. 3.33 (pg. 83)
Use the ex0333 data set from the Sleuth3 package.
Once you click Knit, your PDF will appear in a new window. If you don’t see a PDF, check the pop-up blocker settings in your web browser.
Once you knit the file, you should see the written text along with any R code and the resulting output and/or plots. If you want to change anything in your write-up, you can make changes in the R Markdown file and knit the document to generate the updated PDF.
Once you have created the PDF file, you can export it from the Docker container to your local machine. To export the file, click the Download button in the upper right-hand corner.
You can now submit the downloaded PDF under the Assignments tab on Sakai.