---
title: "MCMC in Bayesian Variable Selection/Model Averaging"
output: beamer_presentation
author: "Merlise Clyde (Duke University)"
date: 11/14/2017
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Example with Diabetes data

First let's load the data (from Hoff) and coerce them to data frames.

```{r}
set.seed(8675309)
source("yX.diabetes.train.txt")
diabetes.train = as.data.frame(diabetes.train)
dim(diabetes.train)

source("yX.diabetes.test.txt")
diabetes.test = as.data.frame(diabetes.test)
colnames(diabetes.test)[1] = "y"
```

With 64 candidate predictors there are too many models ($2^{64} = `r 2^64`$) to enumerate!

## BMA using BAS

```{r runbas}
library(BAS)
diabetes.bas = bas.lm(y ~ ., data = diabetes.train,
                      prior = "hyper-g-n",
                      method = "MCMC",
                      n.models = 10000,
                      MCMC.iterations = 500000,
                      thin = 10,
                      initprobs = "eplogp")
```

* Runs an MCMC sampler that combines a random-walk Metropolis algorithm with a random swap step.
* The algorithm stops when the number of iterations exceeds `MCMC.iterations` or when `n.models` unique models have been visited.
* `thin = 10` saves every 10th model.
* `initprobs = "eplogp"` uses the p-values from OLS to compute an approximate Bayes factor for each variable; these are used internally to sort the variables, which improves memory usage.

## Checking Convergence

The function `diagnostics` compares

* estimates of posterior probabilities based on Monte Carlo frequencies (asymptotically unbiased) to
* estimates based on renormalizing the marginal likelihoods times prior probabilities over just the sampled models (Fisher consistent; recovers the exact probabilities under enumeration).

The two sets of estimates should be equal if the chain has converged.

## Convergence of Posterior Inclusion Probabilities

```{r}
diagnostics(diabetes.bas, type = "pip")
```

Close, but we could run longer!

## Convergence of model probabilities

```{r}
diagnostics(diabetes.bas, type = "model")
```

Clearly, not long enough.

## Rerun longer

```{r runlongbas, cache=TRUE}
diabetes.bas = bas.lm(y ~ ., data = diabetes.train,
                      prior = "hyper-g-n",
                      method = "MCMC",
                      n.models = 150000,
                      MCMC.iterations = 10^7,
                      thin = 64,
                      initprobs = "eplogp")
```

## Diagnostics

```{r}
diagnostics(diabetes.bas, type = "model")
```

## Highest Probability Model

One problem with MCMC and model selection is that the "best" model may be visited very few times.

```{r}
summary(diabetes.bas$freq)
```

## Comments

* I routinely use much larger numbers of MCMC iterations than I see in papers when looking at BMA or model selection (i.e., millions rather than, say, 10,000).
* Explore different numbers of models: does it make a difference for estimating marginal inclusion probabilities or posterior model probabilities? What about predictions under BMA? (A sketch of such a check follows on the next slide.)
* Check memory! `bas` objects can be large in R.
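## Sensitivity check: number of models (sketch)

A minimal sketch of the check suggested above, assuming the `probne0` component of a `bas` object holds the estimated marginal inclusion probabilities (as documented in `BAS`); not evaluated here to keep the deck fast.

```{r sensitivity, eval=FALSE}
# Rerun with a smaller model cache but the same MCMC effort
diabetes.bas2 = bas.lm(y ~ ., data = diabetes.train,
                       prior = "hyper-g-n", method = "MCMC",
                       n.models = 5000, MCMC.iterations = 10^7,
                       thin = 64, initprobs = "eplogp")

# Largest change in the estimated marginal inclusion probabilities
max(abs(diabetes.bas$probne0 - diabetes.bas2$probne0))

# Memory used by each fit
format(object.size(diabetes.bas), units = "Mb")
format(object.size(diabetes.bas2), units = "Mb")
```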
## What variables are included?

```{r}
image(diabetes.bas)
```

## Prediction

Use BMA for prediction and compute the MSE between the predicted and actual responses for the test data:

```{r}
pred.bas = predict(diabetes.bas, newdata = diabetes.test,
                   estimator = "BMA", se.fit = TRUE)
cv.summary.bas(pred.bas$fit, diabetes.test$y)
```

## Scoring

```{r}
BAS::cv.summary.bas
```

## Fit the Lasso

```{r lasso}
suppressMessages(library(lars))
diabetes.lasso = lars(as.matrix(diabetes.train[, -1]),
                      diabetes.train[, 1], type = "lasso")
```

## Plot of Lasso

```{r}
plot(diabetes.lasso)
```

## Best Cp Model for Lasso

```{r}
best = which.min(diabetes.lasso$Cp)
best
out.lasso = predict(diabetes.lasso, s = best[1],
                    as.matrix(diabetes.test[, -1]))
cv.summary.bas(out.lasso$fit, diabetes.test$y)
```

Close!

## Choice of estimator

* BMA: optimal for prediction under squared error loss (the Bayes estimator).
* Highest Probability Model (HPM): not optimal for prediction, but optimal for selection.
* Median Probability Model (MPM): includes the variables with PIP > 0.5; optimal for prediction under certain conditions (e.g., nested models).
* Best Predictive Model (BPM): the model whose predictions are closest to BMA under squared error loss.

## HPM

```{r}
pred.HPM = predict(diabetes.bas, newdata = diabetes.test,
                   estimator = "HPM", se.fit = TRUE)
cv.summary.bas(pred.HPM$fit, diabetes.test$y)
```

## MPM

```{r}
pred.MPM = predict(diabetes.bas, newdata = diabetes.test,
                   estimator = "MPM")
cv.summary.bas(pred.MPM$fit, diabetes.test$y)
```

## BPM

```{r}
pred.BPM = predict(diabetes.bas, newdata = diabetes.test,
                   estimator = "BPM")
cv.summary.bas(pred.BPM$fit, diabetes.test$y)
```

## Coverage

```{r CI-BAS, echo=TRUE}
CI.bas = confint(pred.bas)
mean(CI.bas[, 1] < diabetes.test$y & diabetes.test$y <= CI.bas[, 2])
```

## Credible Intervals

```{r}
plot(CI.bas)
points(1:nrow(diabetes.test), diabetes.test$y, col = "red", pch = 20)
```

## Coverage with HPM

```{r CI-HPM, echo=TRUE}
CI.HPM = confint(pred.HPM)
mean(CI.HPM[, 1] < diabetes.test$y & diabetes.test$y <= CI.HPM[, 2])
```

## Credible Intervals HPM

```{r, echo=TRUE}
plot(CI.HPM)
points(1:nrow(diabetes.test), diabetes.test$y, col = "red", pch = 20)
```

## Calculation of CI

Under BMA the predictive distribution is a mixture of the model-specific predictive distributions, weighted by the posterior model probabilities, so the interval cannot be read off a single t distribution; a simulation sketch follows the summary.

## Summary

* Prediction intervals under BMA account for model uncertainty.
* BMA provides multiple shrinkage.
* The choice of estimator depends on the loss function.
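## Appendix: simulating a BMA prediction interval (sketch)

A hedged illustration of the mixture calculation described under "Calculation of CI", not the `BAS` implementation: sample a model with probability equal to its posterior probability, then draw a response from that model's predictive distribution (approximated here by a normal; the exact predictive forms are heavier-tailed). The inputs `postprob`, `fit.m`, and `se.m` are hypothetical names for quantities a sampler would provide.

```{r bma-ci-sketch, eval=FALSE}
# postprob: posterior probabilities of the sampled models (length M)
# fit.m:    n x M matrix of model-specific predicted means
# se.m:     n x M matrix of model-specific predictive standard errors
sim.bma.interval = function(postprob, fit.m, se.m,
                            nsim = 10000, level = 0.95) {
  n = nrow(fit.m)
  draws = matrix(NA_real_, nsim, n)
  for (s in 1:nsim) {
    m = sample(length(postprob), 1, prob = postprob)          # draw a model
    draws[s, ] = rnorm(n, mean = fit.m[, m], sd = se.m[, m])  # draw y* given model
  }
  alpha = 1 - level
  # pointwise equal-tail interval from the mixture draws
  t(apply(draws, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
}
```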