A brief outline of getting started is shown below. See the Lab 01 Instructions for more details about the steps.
Here are some tips as you complete HW 02:
We will use the following packages in this assignment:
```r
library(tidyverse)
library(broom)
library(knitr)
```
If you need to install any of the packages, type `install.packages("package_name")` in the console, where `package_name` is the package you need to install.
The Conceptual section of the homework contains short answer questions about the concepts discussed in class. Some of these questions may also require short chunks of code to produce the output needed to answer the question. Answers should be written in complete sentences.
The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can compare mean hours worked across all five educational attainment levels at once, using all 1,172 respondents. Below are summary statistics of hours worked by educational attainment that will be helpful in carrying out this analysis.
| | Less than High School | High School | Jr. College | Bachelor’s | Graduate | Total |
|---|---|---|---|---|---|---|
| Mean (hours) | 38.67 | 39.6 | 41.39 | 42.55 | 40.85 | 40.45 |
| Standard deviation (hours) | 15.81 | 14.97 | 18.1 | 13.62 | 15.51 | 15.17 |
| n | 121 | 546 | 97 | 253 | 155 | 1172 |
We wish to test the following hypotheses:
\[\begin{align} &H_0: \mu_{<HS} = \mu_{HS} = \mu_{JC} = \mu_{B} = \mu_{G} \\ &H_a: \text{ at least one }\mu_i \text{ is not equal to the others} \end{align}\]
| | Df | Sum Sq | Mean Sq | F-Stat | Pr(F > F-Stat) |
|---|---|---|---|---|---|
| degree | * | * | 501.54 | * | 0.0682 |
| Residuals | * | 267382 | * | | |
| Total | * | * | | | |
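For reference, the entries of a one-way ANOVA table are linked by the standard identities below, where \(k\) is the number of groups, \(n\) is the total sample size (here \(k = 5\) and \(n = 1{,}172\)), and \(n_i\), \(\bar{y}_i\), and \(s_i\) are the size, mean, and standard deviation of group \(i\), with \(\bar{y}\) the overall mean. The form of SSW in terms of the \(s_i\) is what allows the table to be worked out from group summary statistics alone.

\[\begin{aligned}
df_B &= k - 1, \qquad df_W = n - k, \qquad df_T = n - 1 = df_B + df_W \\
SSB &= \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2, \qquad SSW = \sum_{i=1}^{k} (n_i - 1)\, s_i^2, \qquad SST = SSB + SSW \\
MSB &= \frac{SSB}{df_B}, \qquad MSW = \frac{SSW}{df_W} \\
F &= \frac{MSB}{MSW}, \qquad \text{p-value} = P\left(F_{df_B,\, df_W} > F\right)
\end{aligned}\]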
```r
ssw <- 267382 # sum of squares within (residuals)
msb <- 501.54 # mean square between (model)
p_val <- 0.0682 # p-value

dfb <- _____ # degrees of freedom between (model)
dfw <- _____ # degrees of freedom within (residuals)
dft <- _____ # total degrees of freedom
ssb <- _____ # sum of squares between (model)
sst <- _____ # total sum of squares
msw <- _____ # mean square within (residuals)
f_stat <- _____ # F-statistic

source <- c("Degree (model)", "Residuals", "Total")
df <- c(dfb, dfw, dft)
ss <- c(ssb, ssw, sst)
ms <- c(msb, msw, NA)
f.statistic <- c(f_stat, NA, NA)
p.value <- c(p_val, NA, NA)

# combine the columns to make a table called "anova"
anova <- bind_cols("Source" = source, "df" = df, "Sum of squares" = ss,
                   "Mean square" = ms, "F-statistic" = f.statistic, "p-value" = p.value)

# print the table
kable(anova)
```
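The p-value in an ANOVA table is the upper-tail area of an F distribution with \(df_B\) and \(df_W\) degrees of freedom beyond the observed F-statistic. As an illustrative sketch, the code below uses R's built-in `PlantGrowth` data (a stand-in chosen for demonstration only; it is not part of this assignment), fits a one-way ANOVA with `aov()`, and checks that the ratio of mean squares and `pf(..., lower.tail = FALSE)` reproduce the F-statistic and p-value that R reports.

```r
# Illustrative stand-in: PlantGrowth is a built-in R dataset with three groups,
# used here only to demonstrate how the table's columns relate to each other.
fit <- aov(weight ~ group, data = PlantGrowth)

# Base R ANOVA table as a data frame with columns "Df", "Sum Sq", "Mean Sq",
# "F value", and "Pr(>F)"; row 1 is the model, row 2 is the residuals
tab <- summary(fit)[[1]]

df_model <- tab[["Df"]][1]
df_resid <- tab[["Df"]][2]

# The F-statistic is the ratio of the model and residual mean squares
f_manual <- tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]

# The p-value is the upper-tail probability of the F distribution
p_manual <- pf(f_manual, df1 = df_model, df2 = df_resid, lower.tail = FALSE)

c(F = f_manual, p = p_manual)
```

This also gives a way to sanity-check a completed table: the degrees of freedom and F-statistic you fill in should reproduce the reported p-value when passed to `pf()`.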
Check the assumptions for this test. For each assumption, include your conclusion about whether it is satisfied and a brief explanation supporting your conclusion.
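One informal check of the constant-variance assumption compares the largest and smallest group standard deviations. A ratio below about 2 is a commonly quoted rule of thumb, but that threshold is an assumption here, and your course may use a different guideline. The sketch below again uses the built-in `PlantGrowth` data purely as a stand-in; for the GSS question only summary statistics are given, so the same ratio would be computed directly from the standard deviations in the summary table.

```r
# Illustrative stand-in data: built-in PlantGrowth (weight measured in three groups)
group_sd <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
group_n  <- tapply(PlantGrowth$weight, PlantGrowth$group, length)

# Ratio of the largest to the smallest group standard deviation
sd_ratio <- max(group_sd) / min(group_sd)

# Assumed rule of thumb: a ratio below 2 suggests roughly constant variance
constant_var_ok <- sd_ratio < 2

group_sd
sd_ratio
constant_var_ok
```

Independence and normality are judged differently: independence from how the data were collected, and normality from the group distributions (or residuals) when the raw data are available.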
State the conclusion of the test in the context of the data. You can use a significance level of \(\alpha = 0.05\) to make your conclusion.
The Data Analysis section of the homework contains open-ended data analysis questions. Your response should be neatly organized and read as a complete narrative. This means that in addition to addressing the question(s) stated below, you should include exploratory data analysis and check the appropriate model assumptions. In short, these questions should be treated as “mini-projects”.
In a 1991 study, Allen et al. investigated whether the presence of a close friend or pet affected women’s stress levels as they completed challenging tasks. To test this, they conducted an experiment in which 45 women were tasked with counting backwards by 13s or 17s under one of three test conditions (`group`):

- `C`: Control group, alone
- `F`: Close friend present
- `P`: Pet present

To quantify stress level, they measured each woman’s heart rate and blood pressure after she completed the task. For this analysis, we will focus on heart rate (`heart_rate`).
Use analysis of variance (ANOVA) to test whether there is an association between the presence of a friend or pet and stress level when completing challenging tasks. Your analysis should use `group` as the predictor to compare the predicted mean `heart_rate` for each group, and it should discuss which groups (if any) have significantly different means from one another. The data are available in `stress-experiment.csv` in the `data` folder.
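A possible starting point for the workflow is sketched below. Because the stress data are not reproduced here, the sketch uses R's built-in `PlantGrowth` data (three groups, like `C`, `F`, and `P`) as a stand-in; for the assignment you would read `data/stress-experiment.csv` and model `heart_rate ~ group` instead. The pairwise comparison method shown, Bonferroni-adjusted t-tests via `pairwise.t.test()`, is one option and an assumption here; use whichever multiple-comparison approach was covered in class.

```r
# Stand-in data (assumption for illustration): built-in PlantGrowth, which
# records plant weight for three groups (ctrl, trt1, trt2).
# For the assignment, the real data could be loaded with something like
#   stress <- readr::read_csv("data/stress-experiment.csv")
# and the formulas below changed to heart_rate ~ group.
dat <- PlantGrowth

# Exploratory data analysis: size, mean, and SD of the response in each group
eda <- t(sapply(split(dat$weight, dat$group),
                function(x) c(n = length(x), mean = mean(x), sd = sd(x))))
eda

# Side-by-side boxplots of the response by group
boxplot(weight ~ group, data = dat, xlab = "group", ylab = "weight")

# One-way ANOVA: does the mean response differ across groups?
fit <- aov(weight ~ group, data = dat)
summary(fit)

# A normal QQ plot of the residuals helps assess the normality assumption
qqnorm(residuals(fit))
qqline(residuals(fit))

# Which pairs of groups differ? Bonferroni-adjusted pairwise t-tests
# (one option among several multiple-comparison procedures)
pw <- pairwise.t.test(dat$weight, dat$group, p.adjust.method = "bonferroni")
pw$p.value
```

Since the assignment loads broom and knitr, `kable(tidy(fit))` is a tidier way to display the ANOVA table in the knitted report.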
Once your work is finalized in your GitHub repo, you will submit it to Gradescope. Your assignment must be submitted on Gradescope by the deadline to be considered “on time”.
To submit your assignment:

1. Go to http://www.gradescope.com and click Log in in the top right corner.
2. Click School Credentials ➡️ Duke NetID and log in using your NetID credentials.
3. Click on the STA 210 Regression Analysis course.
4. Click on the assignment, and you’ll be prompted to submit it.
5. Select your assignment repo and choose “master” for the branch.
6. Click Upload. You should receive an email to confirm that the assignment has been submitted.
| Component | Points |
|---|---|
| Part 1: Conceptual problems | 18 |
| Part 2: Data analysis | 20 |
| rstudio::conf assignment | 10 |
| At least 3 informative commit messages | 2 |
| Total | 50 |
Note that there is a 20% penalty if the .pdf file is incomplete and the .Rmd file has to be knitted for grading.
Questions in Part 1 are from Ex. 7.42 in OpenIntro Statistics, 4th ed. Some parts have been reworded.
National Opinion Research Center, General Social Survey, 2018.