# Teaching Bayes: A Binomial-Beta Lab with Solutions

Rebecca C. Steorts
June 11, 2016

### Beta-Bernoulli model

Recall the Beta-Bernoulli model: $X\mid \theta \sim \text{Bernoulli}(\theta)$ $\theta \sim \text{Beta}(a,b)$ where $$a,b$$ are fixed parameters.

Goals:

• Let's determine whether the probability that a worker will fake an illness is truly 1\%. Your task is to assist me!
• Let's outline our tasks and then solve them.

• Simulate some data using the \textsf{rbinom} function of size $$n = 100$$ and probability equal to 1\%.
• In order to replicate results, let's set our seed using \textsf{set.seed(123)}.

• Write a function that takes as its inputs the data you simulated above and a sequence of $$\theta$$ values of length 1000 and produces Likelihood values based on the Binomial Likelihood.
• Plot your sequence and its corresponding Likelihood function.

• Write a function that takes as its inputs prior parameters \textsf{a} and \textsf{b} for the Beta-Bernoulli model and the observed data, and produces the posterior parameters you need for the model.
• Generate the posterior parameters for a non-informative prior i.e. \textsf{(a,b) = (1,1)} and for an informative case \textsf{(a,b) = (3,1)}.

• Create two plots, one for the informative and one for the non-informative case to show the posterior distribution and superimpose the prior distributions on each along with the likelihood.
• What do you see? (Remember to turn the y-axis ticks off since superimposing may make the scale non-sense).

• Based on the informative case, generate a 95\% credible interval with 1000 posterior draws and a 95\% confidence interval for your parameter of interest, and use \textsf{xtable} to output these. What is the problem?
• Based on the data you simulated, do you conclude that the true value higher or lower than 1\%?

### Recap

Let's review the tasks and break them down into the most important information in each task.

• Simulate input data using the Beta-Bernoulli model
• $$n=100$$,
• $$p=0.01$$,
• Set a seed.

### Task 2: Plot Binomial LH

• Write a function called \textrf{myBernLH}.
• Inputs = simualted data and sequence of $$\theta$$ values with length 1000.
• Output: Likelihood values based upon the Binomial likelihood.
• Plot the Binomial likelihood as a function of $$\theta.$$

• Write a function called \textrf{myPosteriorParam}.
• Let the input parameters be \textrf{a,b} and the simulated input data.
• The output should be the posterior parameter for the Beta-Bernoulli model.
• Generate the posterior parameters for $$(a,b) = (1,1)$$ and $$(3,1).$$

• Create two plots based upon task 3 showing the posterior distributions.
• Show these distributions on the same plot with the likelihood.
• What do you observe?

• Based on the informative case for $$a,b$$ generate a 95 percent credible interval with 1000 posterior draws and a 95 percent confidence interval for your parameter of interest.
• What is the problem?
• Based on your simulated data, do you conclude that the true value is higher or lower thatn 1 percent?

### Solutions

We now provide solutions to the five tasks.

The data can be simulated as follows:

set.seed(123)
obs.data <- rbinom(n = 100, size = 1, prob = 0.01)


### Task 2: Plot Binomial LH

• We write a function called myBernLH to produce likelihood Bernoulli likelihood values.
### Bernoulli LH Function ###
# Input: the observed data, theta grid #
# Output: Produces likelihood values #
myBernLH <- function(obs.data, theta){
N <- length(obs.data)
x <- sum(obs.data)
LH <- ((theta)^x)*((1 - theta)^(N-x))
return(LH)
}


### Task 2: Plot Binomial LH (continued)

We now plot the likelihood as a function of $$\theta.$$

### Plot LH for a grid of theta values ###
# Create the grid #
theta.sim <- seq(from = 0, to = 1, length.out = 1000)
# Store the LH Values #
sim.LH <- myBernLH(obs.data = obs.data, theta = theta.sim)
# Create the Plot #


### Task 2: Plot Binomial LH (continued)

### Function to determine posterior parameters based on ###
### observed data and prior assumptions ###
# Inputs - Prior Paramters, observed data #
myPosteriorParam <- function(pri.a, pri.b, obs.data){
N <- length(obs.data)
x <- sum(obs.data)
post.a <- pri.a + x
post.b <- pri.b + N - x
post.param <- list('post.a' = post.a,
'post.b' = post.b)
return(post.param)
}


### Task 3: Generate posterior (continued)

# Find posterior parameters for two different priors #
# a = 1, b = 1 #
non.inform <- myPosteriorParam(pri.a = 1, pri.b = 1, obs.data = obs.data)
inform <- myPosteriorParam(pri.a = 3, pri.b = 1, obs.data = obs.data)
# a = 3, b = 1 #


### Task 3: Generate posterior (continued)

# print the output values
print(c(non.inform,inform))

$post.a [1] 2$post.b
[1] 100

$post.a [1] 4$post.b
[1] 100


We first create non-informative and informative priors and posteriors

### Create a plot of LH, Pri, Posterior using the simulated seq ###
non.inform.den <- dbeta(x = theta.sim, shape1 = non.inform$post.a, shape2 = non.inform$post.b)
inform.den <- dbeta(x = theta.sim, shape1 = inform$post.a, shape2 = inform$post.b)
pri.inform <- dbeta(x = theta.sim, shape1 = 3,
shape2 = 1)
pri.non.inform <- dbeta(x = theta.sim, shape1 = 1,
shape2 = 1)


What do you observe?

require(xtable)
## Create confidence/ credible intervals - informative ##
# Credible Interval #
my.sim <- rbeta(n = 1000, shape1 = inform$post.a, shape2 = inform$post.b)
my.credI <- quantile(my.sim, prob = c(0.025, 0.975))


# Confidence Interval #
p.hat <- sum(obs.data) / length(obs.data)
my.confI <- p.hat + qnorm(p = c(0.025, 0.975)) *
sqrt(p.hat*(1-p.hat)/length(obs.data))
# Store the results #
results <- rbind(my.credI, my.confI)
rownames(results) <- c('Credible Interval', 'Confidence Interval')
# present these results in a nice table #


### Task 5: Credible Interval Compared with Confidence Interval

print(xtable(results, digits = 4), comment = FALSE)


\begin{table}[ht] \centering \begin{tabular}{rrr} \hline & 2.5\% & 97.5\% \ \hline Credible Interval & 0.0107 & 0.0799 \ Confidence Interval & -0.0095 & 0.0295 \ \hline \end{tabular} \end{table}

### Food for Thought

• Did you find the lab too long?
• What do you anticipate students might have trouble with?
• How could you use this material to have students complete a homework
• How could you use the lab to incorporate real data?
• I'm finding that two-three tasks is reasonable and the last two are better as a follow up for homework.
• This also keeps the students engaged in the course and makes it flow well.