You must turn in a knitted file to Gradescope from a Quarto Markdown file in order to receive credit. Be sure to “associate” questions appropriately on Gradescope. As a reminder, late work is not accepted outside of the 24-hour grace period for homework assignments.

The Quarto template for this assignment may be found in the repository at the following link: https://classroom.github.com/a/4yjSeHXP

Baseball is one of the most popular sports in the US. In baseball, two teams take turns on offense and defense: the offensive team aims to earn points, primarily by running bases, and the defensive team aims to stop them. When a team is on offense, its players take turns at bat, aiming to hit the baseball in order to give their team’s players the opportunity to score points.

One (not great, but it's something) way in which teams might measure the strength of a player is the batting average (AVG), which is given by the number of successful hits divided by the number of times a player had an opportunity at bat. In recent years, the league-wide average has been around 0.250 (i.e., a successful hit in 25% of at-bat opportunities) [3]. A batting average greater than 0.300 is considered excellent. Note that the batting averages in the data are reported as percentages out of 100, so a one-unit increase is an absolute increase of one percentage point in batting average.
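As a quick illustration of the definition above, here is a minimal Python sketch that computes a batting average and converts it to the percentage scale described for the data. The function name `batting_average` and the hit and at-bat counts are made up for illustration; they do not come from the data set.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Return hits divided by at-bats (a proportion between 0 and 1)."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


# Hypothetical player: 150 hits in 600 at-bats (made-up numbers).
avg = batting_average(150, 600)
avg_percent = 100 * avg  # percentage scale (out of 100), as described for the data

print(f"AVG = {avg:.3f}, which is {avg_percent:.1f} on the percentage scale")
```

On the percentage scale, a one-unit difference between two players corresponds to one percentage point of batting average, which is the unit a regression slope for AVG would be expressed in.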

Today's data are pre-COVID baseball statistics that contain salary information (salary), the number of games the player has ever played (G), batting average (AVG), whether the player is an All-Star (allstar), whether the player bats left-handed, right-handed, or with both hands (bats), and the age at which the player debuted in the Major Leagues (ageDebut), among other variables.

This week’s homework focuses on statistical inference and linear regression models with multiple predictors, including categorical ones.
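For orientation, a generic form of such a model can be sketched as follows. The symbols here are illustrative assumptions rather than the assignment's variables: $x$ is a numeric predictor, and the categorical predictor has three hypothetical levels A, B, and C, with A taken as the baseline and indicator variables for the other two.

```latex
y = \beta_0 + \beta_1 x + \beta_2 \,\mathbf{1}(\text{level} = \mathrm{B}) + \beta_3 \,\mathbf{1}(\text{level} = \mathrm{C}) + \varepsilon

% Test for the slope of the numeric predictor, holding the other predictors constant:
H_0 : \beta_1 = 0 \qquad \text{vs.} \qquad H_A : \beta_1 \neq 0,
\qquad
t = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)}
```

Under the usual linear model assumptions, this statistic is compared with a $t$ distribution with $n - 4$ degrees of freedom, since the sketch has four coefficients. Each indicator coefficient is interpreted as the expected difference in $y$ between that level and the baseline, with the other predictors held fixed.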

Important: Some of your grade on this assignment will also be based on meaningful commit descriptions. For the purposes of this assignment, you must make at least two meaningful commits/pushes, each with an appropriate description. Also, don’t forget to change the name in the Quarto template.

  1. Fit a linear model with a single predictor that uses batting average to predict a player's salary. Interpret the slope and intercept estimates in context. Is there evidence for a linear relationship between these two variables at the 0.05 significance level? Explain, in the context of the original data.
  2. Fit a linear model that uses batting average, the number of games the player has appeared in, All-Star status, batting hand, and age at debut to predict a player's salary. Interpret the slope and intercept estimates for all predictors in your model in context.
  3. Using your model in Exercise 2, conduct a hypothesis test at the 0.05 level for the slope parameter corresponding to batting average. In doing so, explain in words what your null and alternative hypotheses are, and state the conclusion of your hypothesis test in the context of the data.
  4. Compare the hypothesis tests conducted in Exercise 1 and Exercise 3. What might be the reason for any differences? Are the results from Exercise 3 what you expect? Explain.
  5. Using your model in Exercise 2, what would the predicted salary be for a player with a batting average of 0.25 who is an All-Star, has played in 300 games, bats with both hands, and who was 22 at debut?
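
The mechanics of predicting from a model with a categorical predictor can be sketched in Python as below. Everything in this sketch is hypothetical: the coefficient values, the predictor `x`, and the levels "A", "B", and "C" are invented for illustration and are not estimates from, or variables in, the assignment data.

```python
# Hypothetical coefficients for illustration only; they are NOT estimates
# from the assignment data. The baseline level of the categorical predictor is "A".
coefs = {
    "intercept": 10.0,
    "x": 2.0,         # slope for the numeric predictor
    "group_B": 5.0,   # shift for level "B" relative to baseline "A"
    "group_C": -3.0,  # shift for level "C" relative to baseline "A"
}


def predict(x: float, group: str) -> float:
    """Plug predictor values into the fitted equation using indicator coding."""
    if group not in {"A", "B", "C"}:
        raise ValueError(f"unknown group: {group!r}")
    return (
        coefs["intercept"]
        + coefs["x"] * x
        + coefs["group_B"] * (group == "B")  # indicator is 1 only for level "B"
        + coefs["group_C"] * (group == "C")  # indicator is 1 only for level "C"
    )


print(predict(4.0, "A"))  # baseline level: both indicators are 0
print(predict(4.0, "C"))  # level "C": adds the group_C shift
```

When plugging in values by hand, each predictor should be entered on the same scale that was used to fit the model.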