Categorical Data (4)

class: center, middle, inverse, title-slide

# Categorical Data (4)
### Yue Jiang
### Duke University

---

### Roadmap for today

It's often the case that in the real world, you may be making predictions about 
some sort of binary/multinomial outcome, for instance modeled using binary/
multinomial regression or some other machine learning categorization approach.
Today we'll be talking about some methods of evaluating "how well we do" in 
terms of prediction.

---

### Licorice and post-op sore throat

---

### Medical diagnostics

Suppose we're interested in the performance of a diagnostic test. Let `$D+$` be the event that someone has a condition of interest, and let `$T+$` be the event that a test for that condition is positive

- .copper[Prevalence]: `$P(D+)$` The proportion of individuals with the condition
- .copper[Sensitivity]: `$P(T+|D+)$`, or the true positive rate
- .copper[Specificity]: `$P(T-|D-)$`, or 1 minus the false positive rate 
- .copper[Positive Predictive Value]: `$P(D+|T+)$` 
- .copper[Negative Predictive Value]: `$P(D-|T-)$`

Of course, we can generalize this to other situations with binary classifiers, but it's often most intuitive to think of a diagnostic test.

---

### Rapid self-administered HIV tests
		
.question[
Suppose I told you I had an HIV test that was >99.98% accurate. Would you think
this is a very good test? Why or why not?
]

---

### Rapid self-administered HIV tests

.pull-left[
From the FDA package insert for the OraQuick ADVANCE Rapid HIV-1/2 Antibody Test:

- Sensitivity, `$P(Test+|HIV+)$` 99.3% 
- Specificity, `$P(Test-|HIV-)$` 99.8%

From CDC statistics for 2016, 14.3/100,000 Americans aged 13 or older are HIV+.
]

.pull-right[
<img src="img/oraquickstick.jpg" width="20%" style="display: block; margin: auto;" />
]
		
.question[
Suppose a randomly selected American aged 13 or older has a positive test on this test. What do you think is the probability that they have HIV?
]

---

### Rapid self-administered HIV tests

.center[
.eno[6.6%]
]

- Is this calculation surprising? 
- What is the explanation?
- Was this calculation reasonable to perform?

.question[ 
What if a randomly selected adult in Botswana tested positive (HIV prevalence `$\approx$` 25%)?
]

---

### Licorice and post-op sore throat

```r
dat$anypain = ifelse(dat$postOp4hour_throatPain == 0, 0, 1)
table(dat$anypain)
```

```
## 
##   0   1 
## 157  76
```

---

### Predicting throat pain

```r
m1 <- glm(anypain ~ preOp_gender + preOp_asa + intraOp_surgerySize + treat,
          data = dat)
round(summary(m1)$coef, 3)
```

```
##                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  0.172      0.124   1.387    0.167
## preOp_gender                -0.027      0.061  -0.441    0.660
## preOp_asamild_disease        0.155      0.082   1.878    0.062
## preOp_asasystemic_disease    0.131      0.094   1.387    0.167
## intraOp_surgerySize          0.086      0.056   1.552    0.122
## treat                       -0.232      0.059  -3.918    0.000
```

---

### Predicting throat pain

```r
round(predict(m1), 3)
```

```
##     1     2     3     4     5     6     7     8     9    10    11    12    13 
## 0.244 0.181 0.268 0.354 0.113 0.354 0.330 0.181 0.157 0.244 0.268 0.181 0.181 
##    14    15    16    17    18    19    20    21    22    23    24    25    26 
## 0.268 0.244 0.268 0.244 0.181 0.268 0.026 0.026 0.113 0.113 0.113 0.244 0.268 
##    27    28    29    30    31    32    33    34    35    36    37    38    39 
## 0.026 0.026 0.181 0.026 0.268 0.157 0.113 0.244 0.268 0.244 0.268 0.268 0.268 
##    40    41    42    43    44    45    46    47    48    49    50    51    52 
## 0.181 0.268 0.268 0.268 0.157 0.113 0.268 0.244 0.181 0.268 0.354 0.330 0.154 
##    53    54    55    56    57    58    59    60    61    62    63    64    65 
## 0.241 0.241 0.086 0.154 0.217 0.000 0.217 0.130 0.154 0.241 0.217 0.217 0.086 
##    66    67    68    69    70    71    72    73    74    75    76    77    78 
## 0.086 0.241 0.241 0.086 0.241 0.217 0.217 0.217 0.130 0.241 0.241 0.217 0.241 
##    79    80    81    82    83    84    85    86    87    88    89    90    91 
## 0.327 0.000 0.241 0.241 0.086 0.241 0.241 0.086 0.303 0.000 0.241 0.154 0.086 
##    92    93    94    95    96    97    98    99   100   101   102   103   104 
## 0.154 0.241 0.154 0.241 0.268 0.330 0.268 0.181 0.181 0.113 0.244 0.244 0.181 
##   105   106   107   108   109   110   111   112   113   114   115   116   117 
## 0.268 0.354 0.268 0.330 0.268 0.244 0.268 0.268 0.154 0.241 0.154 0.241 0.241 
##   118   119   120   121   122   123   124   125   126   127   128   129   130 
## 0.414 0.345 0.345 0.500 0.259 0.389 0.500 0.259 0.500 0.345 0.259 0.476 0.500 
##   131   132   133   134   135   136   137   138   139   140   141   142   143 
## 0.500 0.500 0.259 0.500 0.389 0.500 0.500 0.500 0.500 0.500 0.345 0.414 0.476 
##   144   145   146   147   148   149   150   151   152   153   154   155   156 
## 0.500 0.500 0.500 0.500 0.500 0.476 0.476 0.500 0.476 0.562 0.414 0.500 0.500 
##   157   158   159   160   161   162   163   164   165   166   167   168   169 
## 0.500 0.449 0.449 0.473 0.232 0.449 0.387 0.319 0.473 0.473 0.319 0.473 0.473 
##   170   171   172   173   174   175   176   177   178   179   180   181   182 
## 0.319 0.387 0.473 0.473 0.449 0.449 0.473 0.473 0.387 0.449 0.387 0.473 0.473 
##   183   184   185   186   187   188   189   190   191   192   193   194   195 
## 0.560 0.473 0.387 0.363 0.363 0.449 0.473 0.449 0.473 0.345 0.500 0.389 0.562 
##   196   197   198   199   200   201   202   203   204   205   206   207   208 
## 0.500 0.476 0.500 0.476 0.500 0.500 0.345 0.500 0.587 0.345 0.500 0.345 0.587 
##   209   210   211   212   213   214   215   216   217   218   219   220   221 
## 0.476 0.476 0.414 0.500 0.500 0.476 0.500 0.500 0.389 0.259 0.562 0.587 0.587 
##   222   223   224   225   226   227   228   229   230   231   232   233 
## 0.500 0.476 0.473 0.473 0.387 0.449 0.473 0.473 0.536 0.387 0.232 0.449
```

---

### Predicting throat pain

```r
dat$pred50 <- ifelse(predict(m1) > 0.5, "Pain", "No pain")
table(dat$pred50, dat$anypain == TRUE)
```

```
##          
##           FALSE TRUE
##   No pain   136   55
##   Pain       21   21
```

.question[
What is the accuracy of our procedure here? What are the sensitivity and
specificity? The positive/negative predictive values?
]

---

### Predicting throat pain

```r
dat$pred25 <- ifelse(predict(m1) > 0.25, "Pain", "No pain")
table(dat$pred25, dat$anypain == TRUE)
```

```
##          
##           FALSE TRUE
##   No pain    72   14
##   Pain       85   62
```

.question[
What is the accuracy of our procedure here? What are the sensitivity and
specificity? The positive/negative predictive values?
]

---

### Predicting throat pain

```r
dat$pred10 <- ifelse(predict(m1) > 0.1, "Pain", "No pain")
table(dat$pred10, dat$anypain == TRUE)
```

```
##          
##           FALSE TRUE
##   No pain    14    1
##   Pain      143   75
```
.question[
What is the accuracy of our procedure here? What are the sensitivity and
specificity? The positive/negative predictive values? What are you noticing?
]

---

### Discrimination thresholds

Oral HIV tests give positive or negative results depending on levels of HIV antibodies detected in saliva

- If antibody levels are above a certain threshold, it is classified as a positive test
- Varying the threshold for a positive vs. negative test will result in a test in different characteristics
- But what is this threshold of "high" antibody level that classifies them as positive?
- We are looking for a binary classifier that classifies a patient as being positive or negative based on a threshold value of a continuous variable.
- At each threshold value, there is a trade-off between sensitivity and specificity

---

### ROC curves

.copper[Receiver Operating Characteristic] curves show how specificity and specificity change as the discrimination threshold changes; that is, for each false positive rate (1 - specificity), what the true positive rate (sensitivity) is corresponding to that threshold value.

---

### ROC curves

```r
library(PRROC)
dat$preds <- predict(m1)
roc1 <- roc.curve(dat$preds[dat$anypain == 1], 
                  dat$preds[dat$anypain == 0],
                  curve = TRUE)
plot(roc1)
```

---

### ROC curves

---

### PR curves

.copper[Precision-Recall] curves are also used to evaluate binary classification models. Instead of modeling the true positive rate (sensitivity) as a function of the false positive rate (1 - specificity), precision-recall curves model the positive predictive value as a function of the true positive rate.

.question[
Why might we care about a precision-recall curve vs. an ROC curve?
]

---

### PR curves

Think back to the HIV example: a test that has >99.98% accuracy could simply be a sheet of paper with the word "no" on it given to random Americans aged 13 or older.

It's often the case that there's an imbalance between the two statuses "in truth," and we are less interested in the more common status.

---

### PR curves

```r
pr1 <- pr.curve(dat$preds[dat$anypain == 1], 
                dat$preds[dat$anypain == 0],
                curve = TRUE)
plot(pr1)
```

---

### PR curves

---

### PR curves

```r
pr.vals <- pr1$curve
head(pr.vals)
```

```
##           [,1]      [,2]          [,3]
## [1,] 1.0000000 0.3261803 -0.0003507909
## [2,] 1.0000000 0.3261803 -0.0003507909
## [3,] 1.0000000 0.3304348  0.0264218380
## [4,] 1.0000000 0.3377778  0.0861142977
## [5,] 0.9868421 0.3440367  0.1128869265
## [6,] 0.9736842 0.3507109  0.1302833652
```

---

### PR curves

```r
ggplot(data.frame(pr1$curve), aes(x = X1, y = X2, color = X3)) + 
  geom_line() + 
  labs(x = "Sensitivity",
       y = "Positive Predictive Value",
       color = "p-hat") + 
  scale_color_gradient(low = "darkblue", high = "deeppink") +
  theme_bw()
```

---

### PR curves

<img src="cat-4_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />
---

### Comparing ROC Curves

```r
m2 <- glm(anypain ~ preOp_gender + preOp_asa + intraOp_surgerySize,
          data = dat)
round(summary(m2)$coef, 3)
```

```
##                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  0.033      0.123   0.267    0.790
## preOp_gender                -0.036      0.063  -0.570    0.569
## preOp_asamild_disease        0.165      0.085   1.948    0.053
## preOp_asasystemic_disease    0.143      0.097   1.475    0.141
## intraOp_surgerySize          0.096      0.057   1.669    0.097
```

---

### Comparing ROC Curves

```r
dat$preds2 <- predict(m2)

library(pROC)
roc1 <- roc(dat$anypain ~ dat$preds)
roc2 <- roc(dat$anypain ~ dat$preds2)
```
---

### Comparing ROC Curves

```r
plot(roc1, main = "Using Treatment in Model")
```

<img src="cat-4_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />
---

### Comparing ROC Curves

```r
plot(roc2, main = "No Treatment in Model")
```

---

### Comparing ROC Curves

```r
roc.test(roc1, roc2, method = "bootstrap")
```

```
## 
## 	Bootstrap test for two correlated ROC curves
## 
## data:  roc1 and roc2
## D = 2.0736, boot.n = 2000, boot.stratified = 1, p-value = 0.03811
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.6886943   0.6141468
```

---

### Brier score

For binary outcomes, the Brier score is simply the MSE of the prediction; a 0
is perfectly predicted and a 1 corresponds to being "perfectly incorrect."

`\begin{align*}
B = \frac{1}{n}\sum_{i = 1}^n(\hat{p}_i - Y_i)^2
\end{align*}`

```r
mean( (dat$preds - dat$anypain)^2 )
```

```
## [1] 0.1985282
```