Getting Started

  • Go to the STA210-Sp19 organization on GitHub (https://github.com/STA210-Sp19). Click on the repo with the prefix hw-06. It contains the starter documents you need to complete the lab.
  • Clone the repo in RStudio Cloud
  • Configure git using the use_git_config() function. You can also cache your password.

Packages

Fill in the packages you need to complete the assignment in your R Markdown document.

library(tidyverse)
library(knitr)
library(broom)
library(nnet)

# Include other libraries as needed

The Data

For this assignment, we will analyze data from the eye witness identification experiment in Carlson and Carlson (2014). In this experiment, participants were asked to watch a video of a mock crime (from the first person perspective), spend a few minutes completing a random task, and then identify the perpetrator of the mock crime from a line up shown on the screen. Every lineup in this analysis included the true perpetrator from the video. After viewing the line-up , each participant could make one of the following decisions (id):

  • correct: correctly identified the true perpetrator
  • foil: incorrectly identified the “foil”, i.e. a person who looks very similar to the perpetrator
  • reject: incorrectly concluded the true perpetrator is not in the lineup

The main objective of the analysis is to understand how different conditions of the mock crime and suspect lineup affect the decision made by the participant. We will consider the following conditions to describe the decisions:

  • lineup: How potential suspects are shown to the participants
    • Simultaneous Lineup: Participants were shown photos of all 6 potential suspects at the same time and were required to make a single decision (identify someone from the lineup or reject the lineup).
    • Sequential 5 Lineup: Photos of the 6 suspects were shown one at a time. The participant was required to make a decision (choose or don’t choose) as each photo was shown. Once a decision was made, participants were not allowed to reexamine a photo. If the participant made an identification, the remaining photos were not shown. In each of these lineups the true perpetrator was always the 5th photo in the lineup.
  • weapon: Whether or not a weapon was present in the video of the mock crime.
  • feature: Whether or not the perpetrator had a distinctive marking on his face. In this experiment, the distinctive feature was a large “N” sticker on one cheek. (The letter “N” was chosen to represent the first author’s alma mater - University of Nebraska.)

Click here to download the data. Upload the data into the data folder in your RStudio Cloud project.

Questions

  1. Conduct exploratory data analysis to describe the relationship between the response variable (id) and each of the explanatory variables (lineup, weapon, and feature). Include any appropriate plots and/or summary statistics.
    • Based on the exploratory data analysis, which conditions (if any) do you think affect the id made by a participant?
  2. Briefly explain why you should use a multinomial logistic regression model to predict id using lineup, weapon and feature.

  3. Fit the multinomial logistic model that only includes main effects. Display the model output.
    • What is the baseline category?
    • Interpret the intercepts for each part of the model in terms of the odds.
    • Interpret the coefficients of lineup for each part of the model in terms of the odds.
  4. You want to consider interaction effects for the model. Use the appropriate test to determine if there are significant interactions.
    • Based on your test, is there evidence of any significant interaction effects?

Regardless of your answer to Question 4, use the model that includes the interaction terms for the remainder of the assignment.

  1. According to the model,

    • If there was no weapon but the perpetrator had a distinctive feature in the mock crime, how do the log-odds of reject vs. a correct ID change when there is a simultaneous lineup vs. a sequential lineup?
    • If there was no weapon but the perpetrator had a distinctive feature in the mock crime, how do the odds of reject vs. a correct ID change when there is a simultaneous lineup vs. a sequential lineup?
    • Which group of participants (i.e. which set of experimental conditions) is described by the intercept?
  2. The plots of the residuals versus the predicted probabilities, and the average residuals across categories of each predictor variable are shown below. Based on these plots and tables, are there any concerns with the model fit? Briefly explain.

We now want to calculate the average residuals for each category of ID for each categorical variable. The code to calculate the residuals by category of lineup is shown below.

## # A tibble: 2 x 4
##   lineup         correct.avg foil.avg    reject.avg
##   <chr>                <dbl>    <dbl>         <dbl>
## 1 Sequential 5 -0.0000000518 -7.27e-9  0.0000000591
## 2 Simultaneous  0.000000118  -5.38e-8 -0.0000000641
## # A tibble: 2 x 4
##   weapon   correct.avg      foil.avg    reject.avg
##   <chr>          <dbl>         <dbl>         <dbl>
## 1 no      0.000000111  -0.0000000722 -0.0000000384
## 2 yes    -0.0000000451  0.0000000114  0.0000000338
## # A tibble: 2 x 4
##   feature   correct.avg      foil.avg    reject.avg
##   <chr>           <dbl>         <dbl>         <dbl>
## 1 no      0.0000000555  -0.0000000410 -0.0000000146
## 2 yes     0.00000000916 -0.0000000195  0.0000000104
  1. Use the model to predict the decision made by each participant. Make a table of the predicted vs.the actual decisions.

    • Briefly describe how the predicted decision is determined for each participant.
    • What is the misclassification rate?