class: center, middle, inverse, title-slide

# Exam II Review
### Dr. Maria Tackett
### 04.10.19

---

## Announcements

- Lab 09 due today

- Exam II on Monday

- Lab this week - Exam help

- Extra exam help - Sunday afternoon (time TBD)

---

## Exam II Outline

- Short answer questions

- Permitted to bring one sheet of **<u>handwritten</u>** notes (front and back)
    + Must turn in notes with exam

- Calculator **not** permitted on exam

- Please use pencil!

---

## Topics

- **Review:** Multiple Linear Regression

- Model Selection

- Logistic Regression

- Multinomial Logistic Regression

- Model validation - main ideas, i.e. why is it important?

- Dealing with missing data - main ideas, i.e. why is it important?

---

## Model Selection

- Consider the main objective:
    + Prediction
    + Adjusting for many variables
    + Explanation

- Forward, backward, stepwise selection
    + Optimize a criterion at each step
        - *Example*: Minimize **AIC** = `\(n\log(SSE) - n\log(n) + 2(p+1)\)`

---

## Logistic Regression

- Use for response variable `\(Y\)` that is categorical with 2 levels

$$\log\Big(\frac{\hat{p}}{1-\hat{p}}\Big) = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \dots + \hat{\beta}_p X_p$$

- <font class="vocab">Slope:</font> As `\(X_j\)` increases by 1 unit, the odds of `\(Y\)` are expected to multiply by a factor of `\(\exp\{\hat{\beta}_j\}\)`, holding all else constant

- <font class="vocab">Intercept: </font> When `\(X_1 = \dots = X_p = 0\)`, the odds of `\(Y\)` are expected to be `\(\exp\{\hat{\beta}_0\}\)`

---

## Multinomial Logistic Regression

- Use for response variable that is categorical with more than 2 levels

- Suppose we have a categorical variable with `\(k > 2\)` levels.
Let `\(Y=1\)` be the baseline category

`$$\log\bigg(\frac{\hat{p}_{2}}{\hat{p}_{1}}\bigg) = \hat{\beta}_{02} + \hat{\beta}_{12} X_1 + \dots + \hat{\beta}_{p2} X_p \\ \vdots \\ \log\bigg(\frac{\hat{p}_{k}}{\hat{p}_{1}}\bigg) = \hat{\beta}_{0k} + \hat{\beta}_{1k} X_1 + \dots + \hat{\beta}_{pk} X_p$$`

- <font class="vocab">Slope</font>: When `\(X_1\)` increases by one unit, the odds of `\(Y=k\)` versus `\(Y=1\)` are expected to multiply by a factor of `\(\exp\{\hat{\beta}_{1k}\}\)`, holding all else constant.

- <font class="vocab">Intercept</font>: When `\(X_1 = \dots = X_p = 0\)`, the odds of `\(Y=k\)` versus `\(Y=1\)` are expected to be `\(\exp\{\hat{\beta}_{0k}\}\)`

---

class: middle, center

## Questions?

---

## Data Description

- We would like to identify crab species based on the closing force and propodus height of claws
    + `ex0722` data set in the `Sleuth3` R package

- **Predictors:**
    + <font class="vocab">Force: </font> Closing force of claw (newtons)
    + <font class="vocab">Height: </font> Propodus height (mm)

- **Response:**
    + <font class="vocab">Species: </font> Hemigrapsus nudus (Hn), Lophopanopeus bellus (Lb), Cancer productus (Cp)

---

## Data Description

.center[
<img src="img/21/claws.png" width="80%" style="display: block; margin: auto;" />
]

Source: Yamada, S. and Boulding, E., 1998, Claw morphology, prey size selection and foraging efficiency in generalist and specialist shell-breaking crabs, *Journal of Experimental Marine Biology and Ecology*, 220: 191-211.

---

## Lb species?

- Suppose we want to use `Force` and `Height` to determine whether or not a crab is from the Lophopanopeus bellus (Lb) species.

.question[
- What type of model should we use?
- What should we do for exploratory data analysis?
]

---

## Lb species?

- We will use the mean-centered variables for force and height.
<table>
 <thead>
  <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> -1.130 </td> <td style="text-align:right;"> 0.463 </td> <td style="text-align:right;"> -2.443 </td> <td style="text-align:right;"> 0.015 </td> </tr>
  <tr> <td style="text-align:left;"> forceCent </td> <td style="text-align:right;"> 0.211 </td> <td style="text-align:right;"> 0.092 </td> <td style="text-align:right;"> 2.279 </td> <td style="text-align:right;"> 0.023 </td> </tr>
  <tr> <td style="text-align:left;"> heightCent </td> <td style="text-align:right;"> -0.895 </td> <td style="text-align:right;"> 0.398 </td> <td style="text-align:right;"> -2.249 </td> <td style="text-align:right;"> 0.025 </td> </tr>
</tbody>
</table>

.question[
- Write the equation for the odds of a crab being from the Lb species.
- Interpret the intercept in the context of the problem.
- Interpret `forceCent` in the context of the problem.
]

---

## Lb species?

- What does **sensitivity** mean in the context of this data?

- What does **specificity** mean in the context of this data?

---

## Lb species?

<img src="21-exam-02-review_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

```
## Area under the curve: 0.7756
```

---

## Which species?

- Suppose we want to use force and height to determine a crab's species.
<table>
 <thead>
  <tr> <th style="text-align:left;"> y.level </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> Hn </td> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> -1.193 </td> <td style="text-align:right;"> 1.106 </td> <td style="text-align:right;"> -1.079 </td> <td style="text-align:right;"> 0.281 </td> </tr>
  <tr> <td style="text-align:left;"> Hn </td> <td style="text-align:left;"> forceCent </td> <td style="text-align:right;"> -0.494 </td> <td style="text-align:right;"> 0.196 </td> <td style="text-align:right;"> -2.514 </td> <td style="text-align:right;"> 0.012 </td> </tr>
  <tr> <td style="text-align:left;"> Hn </td> <td style="text-align:left;"> heightCent </td> <td style="text-align:right;"> 0.179 </td> <td style="text-align:right;"> 0.474 </td> <td style="text-align:right;"> 0.378 </td> <td style="text-align:right;"> 0.705 </td> </tr>
  <tr> <td style="text-align:left;"> Lb </td> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 0.021 </td> <td style="text-align:right;"> 0.602 </td> <td style="text-align:right;"> 0.034 </td> <td style="text-align:right;"> 0.973 </td> </tr>
  <tr> <td style="text-align:left;"> Lb </td> <td style="text-align:left;"> forceCent </td> <td style="text-align:right;"> 0.095 </td> <td style="text-align:right;"> 0.101 </td> <td style="text-align:right;"> 0.941 </td> <td style="text-align:right;"> 0.347 </td> </tr>
  <tr> <td style="text-align:left;"> Lb </td> <td style="text-align:left;"> heightCent </td> <td style="text-align:right;"> -0.902 </td> <td style="text-align:right;"> 0.429 </td> <td style="text-align:right;"> -2.103 </td> <td style="text-align:right;"> 0.035 </td> </tr>
</tbody>
</table>

.question[
1.
Write the equation for the part of the model describing the odds of Hn vs. Cp species.
2. Interpret the intercept for this part of the model.
3. Interpret the coefficient of `forceCent` for this part of the model.
]
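---

## Which species?

The multinomial estimates can be turned into odds factors and predicted probabilities with a few lines of arithmetic. The block below is a minimal Python sketch, not part of the original analysis. It copies the estimates from the table and assumes Cp is the baseline category, as implied by the Hn vs. Cp comparison. The names `COEFS`, `odds_factor`, and `predicted_probabilities` are hypothetical helpers introduced only for illustration.

```python
import math

# Estimates copied from the multinomial table; Cp is assumed to be the baseline.
COEFS = {
    "Hn": {"intercept": -1.193, "forceCent": -0.494, "heightCent": 0.179},
    "Lb": {"intercept": 0.021, "forceCent": 0.095, "heightCent": -0.902},
}


def odds_factor(species, term):
    """Factor by which the odds of `species` vs. Cp multiply per one-unit increase in `term`."""
    return math.exp(COEFS[species][term])


def predicted_probabilities(force_cent, height_cent):
    """Predicted probability of each species given mean-centered force and height."""
    odds = {
        species: math.exp(
            b["intercept"]
            + b["forceCent"] * force_cent
            + b["heightCent"] * height_cent
        )
        for species, b in COEFS.items()
    }
    total = 1.0 + sum(odds.values())  # the baseline has odds 1 against itself
    probs = {"Cp": 1.0 / total}
    probs.update({species: o / total for species, o in odds.items()})
    return probs


if __name__ == "__main__":
    # Odds factor for Lb vs. Cp per 1 mm increase in propodus height
    print(round(odds_factor("Lb", "heightCent"), 3))
    # Predicted probabilities for a crab with average force and height
    print(predicted_probabilities(0.0, 0.0))
```

For a crab with average force and height (both centered values equal to 0), the ratio of the predicted Lb and Cp probabilities equals `\(\exp\{0.021\}\)`, which matches the intercept interpretation for the Lb vs. Cp part of the model.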