A brief outline of getting started is shown below. See the Lab 01 Instructions for more details about the steps.
Here are some tips as you complete HW 04:
We will use the following packages in this assignment:
library(tidyverse)
library(broom)
library(knitr)
library(pROC)
library(plotROC)
#include other packages as needed
The Conceptual section of homework contains short answer questions about the concepts discussed in class. Some of these questions may also require short chunks of code to produce the output needed to answer the question. Answers should be written in complete sentences. (The questions in this section are from Broadening Your Statistical Horizons.)
Roskes et al. (2011) report on the success rate of penalty kicks that were on-target, so that either the keeper saved the shot or the shot scored, for FIFA World Cup shootouts between 1982 and 2010. They found that 18 out of 20 shots were scored when the goalkeeper’s team was behind, 71 out of 90 shots were scored when the game was tied, and 55 out of 75 shots were scored when the goalkeeper’s team was ahead.
Calculate the odds of a successful penalty kick for games in which the goalkeeper’s team was (i) behind, (ii) tied, or (iii) ahead.
Calculate the odds ratios for successful penalty kicks for (i) behind versus tied, and (ii) tied versus ahead. Note: The odds ratio between events A and B can be calculated as \(\phi = \frac{\omega_A}{\omega_B}\), where \(\omega_i\) is the odds of event \(i\).
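The odds and odds ratio formulas above can be sketched in R as follows. This is only an illustration: the helper names `odds()` and `odds_ratio()` and the counts used (30 out of 40, 45 out of 75) are hypothetical and are not the penalty kick data.

```r
# Hypothetical helper (not part of the assignment): odds of success from counts
odds <- function(successes, total) {
  p <- successes / total   # proportion of successes
  p / (1 - p)              # odds = p / (1 - p), i.e. successes / failures
}

# Hypothetical helper: odds ratio of group A versus group B, phi = omega_A / omega_B
odds_ratio <- function(odds_a, odds_b) {
  odds_a / odds_b
}

# Made-up counts, for illustration only
odds_a <- odds(30, 40)   # 30 successes, 10 failures
odds_b <- odds(45, 75)   # 45 successes, 30 failures

odds_a
odds_b
odds_ratio(odds_a, odds_b)
```

With these made-up counts the odds are 3 and 1.5, so the odds of success in group A are twice the odds in group B.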
Use the following scenario and model for questions 2 - 7.
An article in the Journal of Animal Ecology by Bishop (1972) investigated whether moths provide evidence of “survival of the fittest” with their camouflage traits. Researchers glued equal numbers of light and dark morph moths in lifelike positions on tree trunks at 7 locations from 0 to 51.2 km from Liverpool. They then recorded the numbers of moths removed after 24 hours, presumably by predators. The hypothesis was that, since tree trunks near Liverpool were blackened by pollution, light morph moths would be more likely to be removed near Liverpool. The following variables are used in this analysis:
morph = light or dark
distance = kilometers from Liverpool
placed = number of moths of a specific morph glued to trees at that location
removed = number of moths of a specific morph removed after 24 hours
log_odds_removed = log odds of being removed
The model with log_odds_removed as the response and distance, morph, and the interaction distance*morph as the predictor variables is shown below.
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | -1.123 | 0.240 | -4.687 | 0.001 | -1.657 | -0.589 |
distance | 0.018 | 0.007 | 2.437 | 0.035 | 0.002 | 0.035 |
morphlight | 0.374 | 0.339 | 1.103 | 0.296 | -0.381 | 1.129 |
distance:morphlight | -0.028 | 0.011 | -2.612 | 0.026 | -0.051 | -0.004 |
Use the model to interpret the following in terms of the log odds of a moth being removed:
distance for a dark moth
morphlight given the tree is in Liverpool (distance = 0)
Use the model to interpret the following in terms of the odds of a moth being removed:
distance for a dark moth
morphlight given the tree is in Liverpool (distance = 0)
Use the model to interpret the following in terms of the odds of a moth being removed:
distance for a dark moth
morphlight given the tree is in Liverpool (distance = 0)
Use the model to interpret the intercept in terms of the odds of a moth being removed.
Use the model to calculate the predicted odds of being removed for a light moth that is glued to the trunk of a tree that is 6.2 km from Liverpool.
Calculate the predicted probability that the moth described in the previous question is removed.
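The step from a fitted log odds to odds and then to a probability can be sketched in R as follows. The coefficient values are copied from the model output table; the helper name `log_odds_removed_hat()` and the example moth (a dark moth 10 km from Liverpool) are hypothetical and are not one of the questions.

```r
# Coefficient estimates copied from the model output table
b0      <- -1.123   # (Intercept)
b_dist  <-  0.018   # distance
b_light <-  0.374   # morphlight
b_int   <- -0.028   # distance:morphlight

# Hypothetical helper: predicted log odds of removal.
# `light` is 1 for a light morph moth and 0 for a dark morph moth.
log_odds_removed_hat <- function(distance, light) {
  b0 + b_dist * distance + b_light * light + b_int * distance * light
}

# Illustration only: a dark moth 10 km from Liverpool
eta      <- log_odds_removed_hat(distance = 10, light = 0)
odds_hat <- exp(eta)                  # odds = exp(log odds)
prob_hat <- odds_hat / (1 + odds_hat) # probability = odds / (1 + odds)

c(log_odds = eta, odds = odds_hat, probability = prob_hat)
```

For a dark moth the `light` terms drop out, so only the intercept and the distance slope contribute. Base R's `plogis(eta)` gives the same probability in one step.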
The Data Analysis section of homework contains open-ended data analysis questions. Your response should be neatly organized and read as a complete narrative. This means that in addition to addressing the question there should also be exploratory data analysis and an analysis of the model assumptions. In short, these questions should be treated as “mini-projects”.
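As a rough sketch of the workflow a mini-project like this might involve, the R code below runs a small exploratory summary, fits two nested logistic regression models, and compares them. Everything in it is hypothetical: the data frame `toy`, its columns, the seed, and the simulation settings are made up, and the drop-in-deviance comparison is an assumption about one model comparison that may be useful, not a statement of the required method.

```r
set.seed(210)  # arbitrary seed so the simulated data are reproducible

# Simulated toy data (NOT the assignment's data set)
n <- 500
toy <- data.frame(
  x_num = rnorm(n),                                            # numeric predictor
  x_cat = factor(sample(c("a", "b", "c"), n, replace = TRUE))  # categorical predictor
)
# Binary response generated from a logistic model that uses x_num only
toy$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * toy$x_num))

# Exploratory step: proportion of y = 1 within each level of x_cat
prop_by_cat <- tapply(toy$y, toy$x_cat, mean)
prop_by_cat

# Nested logistic regression models: reduced (x_num) and full (x_num + x_cat)
m_reduced <- glm(y ~ x_num, data = toy, family = binomial)
m_full    <- glm(y ~ x_num + x_cat, data = toy, family = binomial)

# Drop-in-deviance (likelihood ratio) test comparing the nested models
dev_test <- anova(m_reduced, m_full, test = "Chisq")
dev_test
```

The simulation uses only base R so that it runs without extra packages; in a real analysis the same models could be tidied with `broom::tidy()` and displayed with `knitr::kable()`.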
For this portion of the homework, we will look at the email.csv file located in the data folder. Click here to read more about the data set and the variable definitions. The goal of this analysis is to create a simple spam filter that uses characteristics of an email to determine if an email is considered spam. We will use the following variables in the analysis:
spam: Indicator for whether the email was spam.
to_multiple: Indicator for whether the email was addressed to more than one recipient.
num_char: The number of characters in the email, in thousands.
number: Factor variable saying whether there was no number, a small number (under 1 million), or a big number.
Be sure each variable is the correct type in R. If needed, recode the variables so they are the correct type. Then, include the following in your analysis:
… spam and each of the predictor variables.
… to_multiple and num_char as the predictor variables and display the model output.
… number should be included in the model. Display the output from the test and write your conclusion in the context of the problem.
… (n.cut = 5) on the curve.
Once your work is finalized in your GitHub repo, you will submit it to Gradescope. Your assignment must be submitted on Gradescope by the deadline to be considered on time.
See Submitting the Assignment for more details on how to submit the assignment on Gradescope.
Total | 50 |
---|---|
Part 1: Conceptual | 20 |
Part 2: Data analysis | 25 |
Document neatly organized with clear headers | 3 |
At least 3 informative commit messages | 2 |
The questions in Part 1 are modified from exercises in Chapter 6 of Broadening Your Statistical Horizons.
The questions and data in Part 2 are modified from exercises in Chapter 9 of OpenIntro Statistics, 4th ed.
Bishop, J. A. 1972. “An Experimental Study of the Cline of Industrial Melanism in Biston Betularia (L.) (Lepidoptera) Between Urban Liverpool and Rural North Wales.” Journal of Animal Ecology 41 (1): 209–43.
Roskes, Marieke, Daniel Sligte, Shaul Shalvi, and Carsten K. W. De Dreu. 2011. “The Right Side? Under Time Pressure, Approach Motivation Leads to Right-Oriented Bias.” Psychological Science 22 (11): 1403–7.