Bayesian GLMs

class: center, middle, inverse, title-slide

# Bayesian GLMs
### Yue Jiang
### Duke University

---

### Estimating bike crashes in NC counties

---

### A familiar model

```
## # A tibble: 100 x 6
##    county       pop med_hh_income traffic_vol pct_rural crashes
##    <chr>      <dbl>         <dbl>       <dbl>     <dbl>   <dbl>
##  1 Alamance  166436          50.5         182        29      77
##  2 Alexander  37353          49.1          13        73       1
##  3 Alleghany  11161          39.7          28       100       1
##  4 Anson      24877          38            79        79       7
##  5 Ashe       27109          41.9          18        85       4
##  6 Avery      17505          41.7          35        89       5
##  7 Beaufort   47079          46.4          53        66      37
##  8 Bertie     19026          35.4          24        83      10
##  9 Bladen     33190          37            19        91       9
## 10 Brunswick 136744          60.2          43        43      88
## # ... with 90 more rows
```

- `pop`: county population
- `med_hh_income`: median household income, in thousands of dollars
- `traffic_vol`: mean traffic volume per meter of major roadways
- `pct_rural`: percentage of county population living in rural areas
- `crashes`: number of bike crashes in the county
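
As a reminder of the familiar (frequentist) starting point, here is a minimal sketch of a Poisson GLM fit with `glm()`. It assumes only the ten rows printed above, typed into a hypothetical data frame `bike_head`, so the estimates are purely illustrative and will not match a fit to all 100 counties.

```r
# Ten rows copied from the printed tibble (not the full 100-county data)
bike_head <- data.frame(
  county = c("Alamance", "Alexander", "Alleghany", "Anson", "Ashe",
             "Avery", "Beaufort", "Bertie", "Bladen", "Brunswick"),
  pop = c(166436, 37353, 11161, 24877, 27109,
          17505, 47079, 19026, 33190, 136744),
  pct_rural = c(29, 73, 100, 79, 85, 89, 66, 83, 91, 43),
  crashes = c(77, 1, 1, 7, 4, 5, 37, 10, 9, 88)
)

# Poisson regression with a log link; population rescaled to millions
fit_freq <- glm(crashes ~ I(pop / 1e6) + pct_rural,
                data = bike_head, family = poisson)

# Coefficients are on the log scale; exponentiate for rate ratios
coef(fit_freq)
exp(coef(fit_freq))
```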

---

### A familiar model

.question[
How might we formulate an analogous *Bayesian* Poisson model using population 
and rurality (let's ignore any offset for now)?
]

`\begin{align*}
Y_i \mid \lambda_i &\stackrel{ind}{\sim} \text{Pois}(\lambda_i),\\
\log(\lambda_i) &= \beta_0 + \beta_1 \text{pop}_i + \beta_2 \text{rural}_i,\\
\beta_0 &\sim \cdots\\
\beta_1 &\sim \cdots\\
\beta_2 &\sim \cdots
\end{align*}`

.question[
What sorts of priors might make sense here?
]

---

### Stan

- .vocab[Stan] is a statistical programming language that allows users to 
perform Bayesian inference using a variant of .vocab[Hamiltonian Monte Carlo] (HMC)
known as the No-U-Turn Sampler
- Whereas the Gibbs samplers you have programmed previously require deriving
full conditionals, HMC requires gradients of the log-density
(which Stan computes for you by automatic differentiation)
- HMC often produces chains with less correlated samples, resulting in
larger effective sample sizes for chains of the same length
- Because HMC relies on gradients, it requires parameters to be continuous 
(well, there are "ways around this," but that's beyond the scope of STA 440)
- Tuning certain HMC parameters may be tricky at times, particularly for
multi-modal situations or log-densities with very steep gradient changes
(again, you probably won't need to worry about this too much in STA 440!)
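
To make the role of the gradient concrete, here is a minimal sketch of plain HMC in base R for a one-dimensional standard normal target. This is not Stan's implementation (Stan adapts the step size and trajectory length and differentiates automatically). The function names, step size `eps`, and number of leapfrog steps are assumptions chosen for illustration.

```r
# Target: standard normal. Log-density (up to a constant) and its gradient.
log_dens <- function(q) -0.5 * q^2
grad_log_dens <- function(q) -q

# One HMC transition: leapfrog integration followed by accept/reject
hmc_step <- function(q, eps = 0.2, n_leapfrog = 20) {
  p <- rnorm(1)                                     # fresh momentum draw
  q_new <- q
  p_new <- p + 0.5 * eps * grad_log_dens(q_new)     # half step for momentum
  for (l in seq_len(n_leapfrog)) {
    q_new <- q_new + eps * p_new                    # full step for position
    if (l < n_leapfrog) {
      p_new <- p_new + eps * grad_log_dens(q_new)   # full step for momentum
    }
  }
  p_new <- p_new + 0.5 * eps * grad_log_dens(q_new) # final half step
  # Hamiltonian = potential energy (-log density) + kinetic energy
  h_old <- -log_dens(q) + 0.5 * p^2
  h_new <- -log_dens(q_new) + 0.5 * p_new^2
  if (log(runif(1)) < h_old - h_new) q_new else q   # Metropolis correction
}

set.seed(440)
draws <- numeric(5000)
q <- 0
for (s in seq_along(draws)) {
  q <- hmc_step(q)
  draws[s] <- q
}

# Should be close to the target's mean 0 and sd 1
c(mean = mean(draws), sd = sd(draws))
```

Note that the sampler only ever touches the target through `log_dens()` and `grad_log_dens()`, with no full conditionals required.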

---

### RStan

.vocab[RStan] is an interface to call Stan code from within R. There's a bit of 
a learning curve, but it allows for full flexibility using the Stan language.

```stan
// From RStan vignette - simple normal model
data {
  int<lower=0> J;          // number of schools 
  real y[J];               // estimated treatment effects
  real<lower=0> sigma[J];  // s.e. of effect estimates 
}
parameters {
  real mu; 
  real<lower=0> tau;
  vector[J] eta;
}
transformed parameters {
  vector[J] theta;
  theta = mu + tau * eta;
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}
```
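
A hedged sketch of how a Stan program like this might be run from R. The data values are the eight schools numbers from the RStan vignette; the object names, `chains`, `iter`, and `seed` settings are assumptions for illustration. The program is repeated as a string so the block is self-contained, and the call is skipped if `rstan` is not installed.

```r
# Eight schools data as given in the RStan vignette
schools_dat <- list(
  J = 8,
  y = c(28, 8, -3, 7, -1, 1, 18, 12),
  sigma = c(15, 10, 16, 11, 9, 11, 10, 18)
)

# Same program as on the slide. Note: Stan 2.33+ requires the newer
# declaration syntax, e.g. `array[J] real y;` instead of `real y[J];`
schools_code <- "
data {
  int<lower=0> J;
  real y[J];
  real<lower=0> sigma[J];
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}
transformed parameters {
  vector[J] theta;
  theta = mu + tau * eta;
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}
"

# Compile and sample only if rstan is available (compilation takes a while)
if (requireNamespace("rstan", quietly = TRUE)) {
  fit <- rstan::stan(model_code = schools_code, data = schools_dat,
                     chains = 2, iter = 2000, seed = 440)
  print(fit, pars = c("mu", "tau"))
}
```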

---

### rstanarm

.vocab[rstanarm] is an R package that allows users to harness the power of Stan 
while specifying commonly seen models using familiar R model syntax.

```r
# From rstanarm vignette - logistic model
library(rstanarm)
# The vignette loads this data set from the HSAUR3 package
data("womensrole", package = "HSAUR3")

test <- stan_glm(cbind(agree, disagree) ~ education + gender,
                 data = womensrole,
                 family = binomial(link = "logit"),
                 prior = student_t(df = 7, 0, 5),
                 prior_intercept = student_t(df = 7, 0, 5),
                 cores = 2, seed = 12345)
```

https://mc-stan.org/rstanarm/articles/rstanarm.html
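
To get a feel for what a `student_t(df = 7, 0, 5)` prior says about a logistic regression coefficient, the sketch below computes its central 95% prior interval in base R and compares it with a normal prior of the same scale. The comparison with `normal(0, 5)` is an addition for illustration and is not part of the vignette.

```r
# Central 95% prior interval for Student-t(df = 7, location = 0, scale = 5)
t_interval <- 0 + 5 * qt(c(0.025, 0.975), df = 7)

# Same interval for a normal(0, 5) prior, for comparison
n_interval <- qnorm(c(0.025, 0.975), mean = 0, sd = 5)

round(rbind(student_t = t_interval, normal = n_interval), 2)

# Odds ratio implied by a coefficient at the upper end of the t interval
exp(t_interval[2])
```

The t interval is roughly ±11.8 on the log-odds scale versus ±9.8 for the normal, so the heavier tails leave a bit more room for large effects.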

---

### Back to bike crashes

.question[
What are the dangers of using .vocab[flat priors]? Why might 
.vocab[weakly informative] priors be preferred?
]
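
One way to explore this question is a quick prior simulation in base R. This sketch assumes an intercept-only Poisson model on the log scale and compares a very diffuse normal prior (a stand-in for a "flat" prior) with a hypothetical weakly informative one. Both prior choices are assumptions for illustration. The threshold is the largest county population in the data (1,093,901).

```r
set.seed(440)
n_draws <- 10000
max_pop <- 1093901   # largest county population in the bike data

# Very diffuse prior on the log-rate intercept (stand-in for a flat prior)
b0_diffuse <- rnorm(n_draws, mean = 0, sd = 100)

# Hypothetical weakly informative prior on the same intercept
b0_weak <- rnorm(n_draws, mean = 4, sd = 1.5)

# Implied expected crash counts under each prior
lambda_diffuse <- exp(b0_diffuse)
lambda_weak <- exp(b0_weak)

# Share of prior draws implying more crashes than people in any county
c(diffuse = mean(lambda_diffuse > max_pop),
  weak    = mean(lambda_weak > max_pop))
```

A prior that looks "uninformative" on the log scale can put a large share of its mass on expected counts that are physically impossible.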

---

### Back to bike crashes

```r
summary(bike$crashes)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   24.00   74.46   81.25 1045.00
```

```r
summary(bike$pop)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4131   24573   55800  103836  118373 1093901
```

```r
summary(bike$pct_rural)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   42.50   62.50   61.24   85.00  100.00
```

.question[
What priors for `$\beta$` might make sense? What about .vocab[hyperpriors]?
]
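
A rough way to think about prior scale is to ask how large a coefficient would have to be to double the expected crash count across the interquartile range of a predictor. The sketch below does that arithmetic with the quartiles printed above. The "doubling" benchmark is an assumption chosen for illustration, not a recommendation.

```r
# Quartiles copied from the summary() output above
pop_q   <- c(q1 = 24573, q3 = 118373)
rural_q <- c(q1 = 42.50, q3 = 85.00)

# Coefficient that would double the expected count across each IQR
beta_pop_per_person  <- log(2) / (pop_q[["q3"]] - pop_q[["q1"]])
beta_pop_per_million <- beta_pop_per_person * 1e6
beta_rural           <- log(2) / (rural_q[["q3"]] - rural_q[["q1"]])

c(pop_per_person  = beta_pop_per_person,
  pop_per_million = beta_pop_per_million,
  pct_rural       = beta_rural)
```

The same effect corresponds to a coefficient of very different magnitude depending on the units of the predictor, so a sensible prior scale depends on how each predictor is measured.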

---

### Back to bike crashes

```r
library(rstanarm)
m1 <- stan_glm(crashes ~ I(pop/1000000) + pct_rural, 
               data = bike, 
               family = poisson,
               prior_intercept = normal(5, 10),
               prior = normal(0, 2.5, autoscale = TRUE), 
               chains = 2, iter = 10000, seed = 123, 
               prior_PD = FALSE)
```

.question[
What does each of these function arguments mean?
]
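
As a partial hint for `chains` and `iter`: when no `warmup` is supplied, Stan discards the first half of each chain as warmup. A small sketch of the resulting number of retained posterior draws for the call above:

```r
chains <- 2
iter <- 10000
warmup <- floor(iter / 2)   # default when warmup is not supplied

# Draws kept for inference across all chains
post_warmup_draws <- chains * (iter - warmup)
post_warmup_draws
```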

---

### Back to bike crashes

```r
prior_summary(m1)
```

```
## Priors for model 'm1' 
## ------
## Intercept (after predictors centered)
##  ~ normal(location = 5, scale = 10)
## 
## Coefficients
##   Specified prior:
##     ~ normal(location = [0,0], scale = [2.5,2.5])
##   Adjusted prior:
##     ~ normal(location = [0,0], scale = [14.928, 0.089])
## ------
## See help('prior_summary.stanreg') for more details
```
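
For a Poisson model, `autoscale = TRUE` divides the specified prior scale by the sample standard deviation of each predictor. A small sketch, using only the numbers printed above, that backs out the standard deviations implied by the adjusted scales (the names in the vectors are labels added for illustration):

```r
specified_scale <- 2.5

# Adjusted scales copied from the prior_summary() output
adjusted_scale <- c(pop_millions = 14.928, pct_rural = 0.089)

# adjusted = specified / sd(x), so sd(x) = specified / adjusted
implied_sd <- specified_scale / adjusted_scale
round(implied_sd, 3)
```

The implied standard deviations are about 0.167 (population in millions) and about 28.1 (percent rural), which is why the adjusted prior is much wider for the rescaled population term.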

---