To receive credit, you must submit to Gradescope a file knitted from a Quarto Markdown file. Be sure to “associate” questions appropriately on Gradescope. As a reminder, late work is not accepted outside of the 24-hour grace period for homework assignments.

The Quarto template for this assignment may be found in the repository at the following link: https://classroom.github.com/a/Ji1s1ulg

Today’s data come from Dr. Songxi Chen’s group at Peking University and represent hourly weather measurements at the National Olympics Sports Center in Beijing. The data include the time at which each measurement was taken (year, month, day, and hour), the levels of six pollutants (PM2.5, PM10, SO2, NO2, CO, and O3, in micrograms per cubic meter), and the weather-related variables TEMP (temperature in Celsius), pres (barometric pressure in hectopascals), DEWP (dew point in Celsius), RAIN (precipitation in millimeters from all sources, such as rain and snow), wd (wind direction), and WSPM (wind speed in meters per second).

This week’s homework focuses on statistical inference and linear regression models with a single predictor.

Important: Some of your grade on this assignment will also be based on meaningful commit descriptions. For the purposes of this assignment, you must make at least two meaningful commits/pushes, each with an appropriate description. Also, don’t forget to change the name in the Quarto template.

  1. We’ll begin by examining how certain weather-related phenomena might be related. Fit a linear model that uses temperature to predict the barometric pressure. Interpret the slope and intercept estimates in context, and then conduct a formal hypothesis test assessing whether there is evidence for a linear relationship between these two variables. In this hypothesis test, include the significance level you are using, your two hypotheses of interest, the value of the test statistic, the distribution of the test statistic under the null hypothesis, the p-value, and your conclusion in the context of the original data.
  2. Fit a linear model that uses temperature to predict the carbon monoxide level (CO). Interpret the slope and intercept estimates in context, and then conduct a formal hypothesis test assessing whether there is evidence for a linear relationship between these two variables.
  3. Use your model in Exercise 2 to predict the mean carbon monoxide level on days when the temperature is 0, 10, 20, and 30 degrees Celsius.
  4. Now let’s think about the model in Exercise 2 in terms of causal explanation. Does your model do an adequate job of providing a causal explanation for the observed relationship? Explain why or why not. If you feel that your model does not adequately explain differences in carbon monoxide levels due to temperature, propose a reasonable causal pathway that would explain the observed relationship from Exercise 2. You might have to do some external research. Cite your sources!
  5. Fit a linear model that uses the month to predict the carbon monoxide level (CO). Interpret the slope and intercept estimates in context. Is either of these estimates meaningful? Explain why or why not, supporting your answer with a high-quality visualization.
  6. For this last question, consider only observations in February (hint: there should be 2,712). Fit a linear model that uses the amount of precipitation (RAIN) to predict the carbon monoxide level (CO). Interpret the slope and intercept estimates in context, and then conduct a formal hypothesis test assessing whether there is evidence for a linear relationship between these two variables.
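
The exercises above all rest on the same mechanics: estimate a slope and an intercept, form a test statistic for the slope, and predict a mean response at chosen predictor values. The following is a minimal sketch of those mechanics in plain Python, not a required approach and not a solution. The function names (`fit_simple_linear_regression`, `predict`) and the four toy data points are invented for illustration and do not come from the Beijing data. Under the null hypothesis of no linear relationship, the test statistic follows a t distribution with n - 2 degrees of freedom; computing the p-value from that distribution is left to a statistics library and is not shown.

```python
from math import sqrt


def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x with a slope t-statistic."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    df = n - 2
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2_hat = sum(r ** 2 for r in residuals) / df

    # Standard error of the slope and the t-statistic for H0: slope = 0.
    se_slope = sqrt(sigma2_hat / sxx)
    t_stat = slope / se_slope

    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t_stat": t_stat,
        "df": df,
    }


def predict(fit, x_new):
    """Predicted mean response at x_new under the fitted line."""
    return fit["intercept"] + fit["slope"] * x_new


# Toy values invented for illustration only (NOT taken from the assignment data).
toy_temp = [0, 10, 20, 30]          # hypothetical temperatures in Celsius
toy_pres = [1030, 1020, 1012, 1000]  # hypothetical pressures in hectopascals

fit = fit_simple_linear_regression(toy_temp, toy_pres)
predictions = {t: predict(fit, t) for t in (0, 10, 20, 30)}

print(fit)
print(predictions)
```

With real data the same quantities would normally come from a model-fitting function in your statistics software; the point of writing them out is to make clear where the slope, its standard error, and the degrees of freedom in the hypothesis test come from.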