STA 210: HW 1

Due Monday, 9/17 at 11:59 p.m.

Introduction

This homework covers two-sample confidence intervals and hypothesis tests (Ch. 1-3 in the textbook). The exercise and page numbers refer to The Statistical Sleuth (3rd Edition) by Ramsey and Schafer. You may discuss the assignment with others; however, you must write and submit your own answers.

Please use the STA 210: HW 1 template to write up your assignment, and submit your work as a PDF under the Assignments tab in Sakai. To make sure you have the latest template, run the R code below in RStudio to update the STA210 package.

devtools::install_github("matackett/sta210/STA210") # installs package from GitHub

You will need the following libraries to complete the assignment:

library("tibble")
library("dplyr")
library("ggplot2")
library("STA210")
library("Sleuth3") #data sets from the book

Homework Tips

Coding Tip: Preparing Data for the `t.test` Function

You can use the t.test() function in R to conduct one- and two-sample inference using the t inference methods. Type ?t.test in the console to learn more about the syntax of this function. You may also refer to your notes for example code.

When you conduct two-sample inference, store the data for each group in a separate data frame. For example, to conduct two-sample inference for the oscar_winners data (in the STA210 package), you can put the age data for the Best Actor and Best Actress winners in separate data frames using the following code:

best_actor <- as.data.frame(oscar_winners %>% 
  filter(category=="Best Actor") %>% 
  select(age))
best_actress <- as.data.frame(oscar_winners %>% 
  filter(category=="Best Actress") %>% 
  select(age))

The filter() function tells R to include only the observations in the specified category. The select() function tells R which columns to include in the new data frame. The as.data.frame() function ensures the data are stored as data frames so they can be used in the t.test() function.
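
Once the two groups are in separate data frames, the age column of each can be passed to t.test(). The sketch below is only an illustration: toy_winners is a small, made-up data frame (hypothetical, not the real oscar_winners data), so the code runs without the STA210 package and the numbers it produces mean nothing for the homework.

```r
library(dplyr)

# Hypothetical toy data standing in for oscar_winners (ages are made up)
toy_winners <- data.frame(
  category = rep(c("Best Actor", "Best Actress"), each = 5),
  age = c(44, 38, 62, 53, 41, 22, 37, 28, 63, 32)
)

# Split the ages by category, mirroring the tip above
toy_actor <- as.data.frame(toy_winners %>%
  filter(category == "Best Actor") %>%
  select(age))
toy_actress <- as.data.frame(toy_winners %>%
  filter(category == "Best Actress") %>%
  select(age))

# Two-sample t-test on the two age columns.
# By default t.test() does not assume equal variances (Welch test);
# var.equal = TRUE requests the pooled-variance version.
result <- t.test(toy_actor$age, toy_actress$age)
result
```

The alternative and conf.level arguments of t.test() control the direction of the test and the confidence level of the interval; see ?t.test for the accepted values.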

Coding Tip: Creating New Variables

Sometimes it is beneficial to create new variables to use in a data analysis. This often occurs when you need to transform a variable so that it satisfies the assumptions of our statistical methods. You can use the mutate() function to create a new variable. For example, to create a new variable that contains the square root of age and add it to the oscar_winners data set, you can use the code below:

oscar_winners <- oscar_winners %>% mutate(new_var = sqrt(age))

Now, the oscar_winners data set contains the variable new_var.

glimpse(oscar_winners)

## Observations: 180
## Variables: 6
## $ award.year <int> 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 193...
## $ age        <int> 44, 38, 62, 53, 41, 34, 33, 49, 41, 37, 38, 34, 32,...
## $ name       <chr> "Emil Jannings", "Warner Baxter", "George Arliss", ...
## $ movie      <chr> "The Last Command", "In Old Arizona", "Disraeli", "...
## $ category   <chr> "Best Actor", "Best Actor", "Best Actor", "Best Act...
## $ new_var    <dbl> 6.633250, 6.164414, 7.874008, 7.280110, 6.403124, 5...
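
mutate() can also add more than one variable in a single call, which can be convenient when comparing candidate transformations. The following is a minimal sketch on a made-up data frame; toy_ages, sqrt_age, and log_age are hypothetical names chosen for illustration and are not part of the STA210 package.

```r
library(dplyr)

# Hypothetical toy data (ages are made up for illustration)
toy_ages <- data.frame(age = c(44, 38, 62, 53, 41))

# Add two transformed versions of age in one mutate() call
toy_ages <- toy_ages %>%
  mutate(
    sqrt_age = sqrt(age),  # square root of age
    log_age  = log(age)    # natural log of age
  )

toy_ages
```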

R Markdown Tip: Mathematical Notation

Sometimes you will need to include mathematical notation in a document to help the reader understand how you obtained your results. You can include mathematical notation in any R Markdown file using LaTeX syntax. There are two ways to display mathematics in your document:

Inline: Your mathematics will display within the line of text.

Use $ to start and end your LaTeX syntax.
Example: The R Markdown text `The null hypothesis is $H_0:\mu = 0$` produces

The null hypothesis is $H_0:\mu = 0$

Displayed: Your mathematics will display on its own line, outside the line of text.

Use $$ to start and end your LaTeX syntax.
Example: The R Markdown text `The null hypothesis is $$H_0:\mu = 0$$` produces

The null hypothesis is \[H_0:\mu = 0\]

See Mathematics in R Markdown for an overview of the syntax for commonly used equations and symbols.
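
As one more illustration, a displayed equation for the pooled two-sample t-statistic could be typed as follows. The symbols used here (\bar{Y}_1 and \bar{Y}_2 for the sample means, s_p for the pooled standard deviation, n_1 and n_2 for the sample sizes) are assumptions made for this sketch; match the notation used in class and in the textbook.

```latex
$$t = \frac{(\bar{Y}_2 - \bar{Y}_1) - (\mu_2 - \mu_1)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
```

Here \frac{}{} builds a fraction, \bar{} places a bar over a symbol, \sqrt{} draws a square root, and _ starts a subscript.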

Questions

Question 1. Ex. 2.16 (pg. 53)

Use the case0101 data set from the Sleuth3 package.

Question 2. In Lab 1, you used exploratory data analysis to compare the distribution of ages for Best Actor and Best Actress winners at the Academy Awards. We would like to use these data to determine whether movie actors are older, on average, than movie actresses.

1. Consider the distributions of age for the Best Actor and Best Actress winners. Based on the distributions of age, is it appropriate to use the two-sample $t$ inference methods? Explain. Be sure to include the code for any graphs and/or summary statistics you used to make your assessment. (See sections 3.2 and 3.3 for details about the robustness of the two-sample t methods.)
2. Conduct the appropriate hypothesis test to determine if movie actors are older, on average, than movie actresses. Be sure to include:
   - Your hypotheses written in statistical notation.
   - The R code and resulting output from conducting the test.
   - Your conclusion in the context of the problem.
3. Suppose your friend reads the output from your test and says, “the probability that actors and actresses have the same average age is 5.242e-07.” Is your friend correct? Explain.
4. Calculate a 90% confidence interval to estimate the mean difference in age between movie actors and actresses. Interpret your interval in the context of the problem.
5. Is it reasonable to use the results of this analysis to make conclusions about the differences in average age between movie actors and actresses? Explain.

Question 3. Ex. 3.33 (pg. 83)

Use the ex0333 data set from the Sleuth3 package.

Submitting Your Assignment

Once you complete the assignment, you’re ready to Knit the file to create the PDF document. Click the Knit button in the menu bar.

Once you click Knit, your PDF will appear in a new window. If you don’t see a PDF, check whether your web browser’s pop-up blocker is preventing the window from opening.

Once you knit the file, you should see the written text along with any R code and the resulting output and/or plots. If you want to change anything in your write-up, you can make changes in the R Markdown file and knit the document to generate the updated PDF.

Once you have created the PDF file, you can export it from the Docker container to your local machine. To export the file, click the Download button in the upper right-hand corner.

You can now submit the downloaded PDF under the Assignments tab on Sakai.