The aim of this study is to understand the implications of multiple testing in the context of HIV data. The analysis will determine whether certain biological attributes of an individual affect the rate of HIV infection. The complete data set comes from the STEP trial of the MRKAd5 vaccine, but the portion under consideration consists of six biomarker covariates measured on the vaccinated subjects. A biomarker is a quantitative measure of the presence of a certain biological trait in an individual. The independent variables are the biomarkers, which may or may not affect the rate of infection. The outcome is an indicator variable recording whether or not the individual is infected with HIV after receiving the treatment.
The original researchers published a paper that used these data to analyze vaccine efficacy with frequentist methods. Last semester, I replicated their analyses and fit a logistic regression. Like the researchers, I found that two of the six biomarkers were statistically significant predictors of HIV infection. This semester, I am exploring Bayesian methods to analyze the biomarker data.
For the Bayesian analysis, I am using a multivariate normal prior centered at zero. I would prefer an improper uniform prior, but marginal likelihoods cannot be computed in R under such a prior, because an improper prior determines the marginal likelihood only up to an arbitrary constant. Therefore, I will make my prior as vague as possible, with a large variance (equivalently, a small precision). Using my prior and likelihood, I will be able to calculate the marginal likelihood of each model. Because I have six biomarkers, and each one can be either included or excluded, there are 2^6 = 64 possible subsets of the variables and therefore 64 candidate models. The Bayesian model selection component consists of computing the marginal likelihood for each of these models and identifying the model best supported by the data.
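The bookkeeping behind this model comparison can be sketched as follows. This is only an illustration, not the analysis itself: the biomarker names are placeholders, the equal prior probability on every model is an assumption, and the log marginal likelihoods are taken as given (they would come from a separate fitting step that is not shown). With p candidate covariates, the enumeration produces 2^p subsets, counting the intercept-only model.

```python
from itertools import combinations
from math import exp


def enumerate_models(covariates):
    """Return every subset of the covariates as a tuple,
    from the empty (intercept-only) model up to the full model."""
    models = []
    for size in range(len(covariates) + 1):
        models.extend(combinations(covariates, size))
    return models


def posterior_model_probs(log_marginals):
    """Convert log marginal likelihoods into posterior model probabilities.

    Assumes every model has the same prior probability, so the posterior
    probability of a model is proportional to its marginal likelihood.
    """
    # Subtract the largest value before exponentiating so that very
    # negative log marginal likelihoods do not all underflow to zero.
    top = max(log_marginals.values())
    weights = {model: exp(value - top) for model, value in log_marginals.items()}
    total = sum(weights.values())
    return {model: weight / total for model, weight in weights.items()}


# Placeholder names; the real biomarker labels are not given in the text.
biomarkers = ["b1", "b2", "b3", "b4", "b5", "b6"]
models = enumerate_models(biomarkers)
print(len(models))  # 2**6 = 64 subsets, including the intercept-only model
```

Given a dictionary that maps each model to its log marginal likelihood, `posterior_model_probs` returns probabilities that sum to one, and the model with the largest value is the one best supported under the equal-prior assumption.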