To receive credit, you must submit to Gradescope a file knitted from a Quarto
Markdown file. Be sure to “associate”
questions appropriately on Gradescope. As a reminder, late work
is not accepted outside of the 24-hour grace period for homework
assignments.
The Quarto template for this assignment may be found in the
repository at the following link: https://classroom.github.com/a/Ji1s1ulg
Today’s data come from Dr. Songxi
Chen’s group at Peking University and consist of hourly air-quality and weather
measurements taken at the National Olympics Sports Center in Beijing. They
contain the time at which each measurement was taken (year,
month, day, and hour), the levels of six pollutants (PM2.5, PM10,
SO2, NO2, CO, and
O3, in micrograms per cubic meter), and the
weather-related variables TEMP (temperature in degrees Celsius),
pres (barometric pressure in hectopascals),
DEWP (dew point in degrees Celsius), RAIN
(precipitation in millimeters from all sources, such as rain and
snow), wd (wind direction), and WSPM
(wind speed in meters per second).
This week’s homework focuses on statistical inference and linear
regression models with a single predictor.
Important: Some of your grade on this assignment
will also be based on meaningful commit descriptions. For the purposes
of this assignment, you must make at least two meaningful commits/pushes,
each with an appropriate description. Also, don’t forget to change the
name in the Quarto template.
- We’ll begin by examining how certain weather-related phenomena might
be related. Fit a linear model that uses temperature to predict the
barometric pressure. Interpret the slope and intercept estimates in
context, and then conduct a formal hypothesis test assessing whether
there is evidence for a linear relationship between these two variables.
In this hypothesis test, include the significance level you are using,
your two hypotheses of interest, the value of the test statistic, the
distribution of the test statistic under the null hypothesis, the
p-value, and your conclusion in the context of the original data.
- Fit a linear model that uses temperature to predict the carbon
monoxide level (CO). Interpret the slope and intercept estimates in
context, and then conduct a formal hypothesis test assessing whether
there is evidence for a linear relationship between these two
variables.
- Use your model from Exercise 2 to predict the mean carbon monoxide
level at temperatures of 0, 10, 20, and 30 degrees
Celsius.
- Now let’s think about the model in Exercise 2 in terms of causal
explanation. Does your model do an adequate job of providing a
causal explanation for the observed relationship? Explain why
or why not. If you feel that your model does not do an
adequate job of explaining differences in carbon monoxide levels due to
temperature, propose a reasonable causal pathway that would explain the
observed relationship from Exercise 2. You might have to do some
external research; cite your sources!
- Fit a linear model that uses the month to predict the
carbon monoxide level (CO). Interpret the slope and intercept estimates
in context. Is either of these estimates meaningful? Explain why or why
not, supporting your answer with a high-quality visualization.
- For this last question, consider only observations from
February (hint: there should be 2,712 of them). Fit a linear model that
uses the amount of precipitation (RAIN) to predict the
carbon monoxide level (CO). Interpret the slope and intercept estimates
in context, and then conduct a formal hypothesis test assessing whether
there is evidence for a linear relationship between these two
variables.
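
As a rough illustration of the mechanics behind the exercises above (not a solution), the sketch below fits a single-predictor linear model, computes the test statistic for the slope, and predicts the mean response at a few predictor values. It makes several assumptions: it is written in Python using only the standard library rather than in the Quarto template's own workflow, the data are synthetic stand-ins generated inside the script (the variable names `x` and `y` and the numbers used to generate them are made up), and the two-sided p-value uses a normal approximation to the t distribution, which is only reasonable when the degrees of freedom are large.

```python
import math
import random
from statistics import NormalDist, mean

# Synthetic stand-in data (NOT the Beijing measurements): x plays the role
# of a predictor such as temperature, y the role of the response.
random.seed(1)
x = [random.uniform(-10, 35) for _ in range(500)]
y = [1020 - 0.8 * xi + random.gauss(0, 5) for xi in x]

# Least-squares estimates for the model y = intercept + slope * x.
n = len(x)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Standard error of the slope, based on the residual sum of squares.
rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
df = n - 2  # degrees of freedom for a model with one predictor
se_slope = math.sqrt(rss / df / sxx)

# Test statistic for H0: slope = 0 versus HA: slope != 0.
# Under H0 it follows a t distribution with df degrees of freedom.
t_stat = slope / se_slope

# Two-sided p-value via a normal approximation to that t distribution
# (an assumption made here to stay within the standard library).
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# Predicted mean response at selected predictor values.
predictions = {value: intercept + slope * value for value in (0, 10, 20, 30)}

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"t = {t_stat:.2f} on {df} df, approximate p-value = {p_value:.4g}")
for value, fitted in predictions.items():
    print(f"predicted mean at x = {value}: {fitted:.2f}")
```

In your own write-up, the statistical software you use for the assignment will report the exact t-based p-value; the point of the sketch is only to show where the estimate, standard error, test statistic, and degrees of freedom come from.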