A brief outline of getting started is shown below. See the Lab 01 Instructions for more details about the steps.
Here are some tips as you complete HW 01:
We will use the following packages in this assignment:
library(tidyverse)
library(broom)
library(knitr)
library(openintro)
library(MASS)
If you need to install any of the packages, type install.packages("package_name") in the console, where package_name is the package you need to install.
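As a minimal sketch of the installation step above, the snippet below checks which of the listed packages are not yet available. The object names pkgs and missing_pkgs are illustrative assumptions, not part of the assignment.

```r
# Packages used in this assignment
pkgs <- c("tidyverse", "broom", "knitr", "openintro", "MASS")

# TRUE for each package that can be loaded, FALSE otherwise
is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!is_available]
missing_pkgs

# Run this line in the console (not in the .Rmd file) to install them:
# install.packages(missing_pkgs)
```

The install line is left commented out so that knitting the document never triggers an installation.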
The Conceptual section of homework contains short answer questions about the concepts discussed in class. Some of these questions may also require short chunks of code to produce the output needed to answer the question. Answers should be written in complete sentences.
For Questions 1-2, we will be using the smoking dataset in the openintro R package. The data was originally provided by https://www.stem.org.uk/.
This dataset contains smoking habits and other demographic information for 1,691 randomly selected survey respondents in the United Kingdom (UK). A full list of the variables is available in the dataset's documentation. We will use the following variables:
smoke: Yes: respondent smokes regularly, No: respondent does not smoke regularly
age: Age in years
According to a 2018 article by the BBC, the average age of a resident in the UK is 40. Is the average age of smokers in the UK significantly different from the average age of all UK residents?
Hint: Before you begin, create a subset that only contains smokers.
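One way to follow the hint is sketched below, assuming the smoke and age columns described above. The object names smokers and smokers_test are illustrative, and the two-sided alternative is simply the default of t.test, so adjust it if your hypotheses differ.

```r
library(dplyr)
library(openintro)

# Keep only respondents who report smoking regularly
smokers <- smoking %>%
  filter(smoke == "Yes")

# One-sample t-test of H0: the mean age of smokers equals 40
# (two-sided alternative and 95% confidence level are the defaults)
smokers_test <- t.test(smokers$age, mu = 40)
smokers_test
```

The printed output includes the test statistic, degrees of freedom, p-value, and a confidence interval for the mean age of smokers. The interpretation in context still needs to be written in complete sentences.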
Is there a significant difference in the average age of people in the UK who smoke versus those who don’t? To answer this question, we will use a t-test to compare the mean age of those who smoke versus those who don’t.
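A comparison of two group means can be set up with the formula interface of t.test, as in the sketch below. The object name age_by_smoke is an illustrative assumption. By default t.test does not assume equal variances (Welch's test), so check whether that matches the version of the test covered in class.

```r
library(openintro)

# Two-sample t-test comparing mean age across the two levels of smoke;
# var.equal = FALSE is the default, which gives Welch's t-test
age_by_smoke <- t.test(age ~ smoke, data = smoking)
age_by_smoke
```

The formula age ~ smoke reads as "age grouped by smoke", so the full dataset is used here rather than the smokers-only subset.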
Use the t.test function to test the hypotheses from part (a). Write the definition of the test statistic in the context of this problem.
The Data Analysis section of homework contains open-ended data analysis questions. Your response should be neatly organized and read as a complete narrative. This means that in addition to addressing the question(s) stated below, you should include exploratory data analysis and check the appropriate model assumptions. In short, these questions should be treated as “mini-projects”.
When veterinarians prescribe heart medicine for cats, the dosage often needs to be calibrated to the weight of the cat’s heart. It is very difficult to measure the heart’s weight, so veterinarians need a way to estimate it. One way to estimate it is to use the cat’s body weight, which is more feasible to obtain (though still difficult depending on the cat!).
We would like to fit a regression model that can be used by veterinarians to describe the relationship between a cat’s body weight and heart weight. You will use the cats dataset in the MASS package to complete the analysis. The cats dataset includes the following variables:
Sex: F: Female, M: Male
Bwt: Body weight in kilograms (kg)
Hwt: Heart weight in grams (g)
Be sure to include the following in your analysis:
Tips
Be sure to include all relevant code and resulting output.
All plots should have proper labels for the axes and an informative title.
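The tips above can be illustrated with a short sketch: a labeled scatterplot, a simple linear regression of heart weight on body weight, and the residuals used to check model assumptions. The object names (cats_plot, cats_fit, cats_aug) and the plot title are illustrative assumptions, and this is a starting point rather than a complete analysis.

```r
library(ggplot2)
library(broom)

# Load the cats data without attaching MASS, which avoids masking dplyr::select
data(cats, package = "MASS")

# Exploratory scatterplot with labeled axes and an informative title
cats_plot <- ggplot(cats, aes(x = Bwt, y = Hwt)) +
  geom_point() +
  labs(
    x = "Body weight (kg)",
    y = "Heart weight (g)",
    title = "Heart weight versus body weight in the cats dataset"
  )
cats_plot

# Simple linear regression of heart weight on body weight
cats_fit <- lm(Hwt ~ Bwt, data = cats)

# Coefficient table with 95% confidence intervals
tidy(cats_fit, conf.int = TRUE)

# Fitted values (.fitted) and residuals (.resid) for assumption checks
cats_aug <- augment(cats_fit)
head(cats_aug)
```

Plotting .resid against .fitted from cats_aug is one common way to start checking the linearity and constant variance assumptions.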
Once your work is finalized in your GitHub repo, you will submit it to Gradescope. Your assignment must be submitted on Gradescope by the deadline to be considered “on time”.
To submit your assignment:
Go to http://www.gradescope.com and click Log in in the top right corner.
Click School Credentials ➡️ Duke NetID and log in using your NetID credentials.
Click on the STA 210 Regression Analysis course.
Click on the assignment, and you’ll be prompted to submit it.
Select your assignment repo and choose “master” for the branch.
Click Upload. You should receive an email to confirm that the assignment has been submitted.
Component | Points |
---|---|
Part 1: Conceptual Problems | 25 |
Part 2: Data Analysis | 20 |
Document has clear question headers and narrative written in complete sentences | 3 |
At least 3 informative commit messages | 2 |
Total | 50 |
Note there is a 10% penalty if the .pdf file is incomplete and the .Rmd file has to be knitted for grading.
*See Getting Started with LaTeX and LaTeX Symbols Guide for more information about typing statistical notation using LaTeX.
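As a small illustration of typing statistical notation, the lines below show generic one-sample hypotheses and a t statistic as they could be written in an R Markdown document. The symbols are illustrative and should be replaced by the quantities that fit your problem.

```latex
$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \neq \mu_0$$

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
```

Here $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $\mu_0$ is the hypothesized mean. Single dollar signs typeset the same notation inline within a sentence.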