Data Analysis Project

Nov 24, 2019

Repo Data

Photo by rawpixel on Unsplash

(Please look for the invitation from Github Classroom to create your team repo)

What do Barbie dolls, food wrap, edamame, and spermicides have in common? And what do they have to do with low sperm counts, precocious puberty, and breast cancer? “Everything” say those who support the notion that hormone mimics are disrupting everything from fish gender to human fertility. “Nothing” counter others who regard the connection as trumped up, alarmist chemophobia. The controversy swirls around the significance of a number of substances that behave like estrogens and appear to be practically everywhere–from plastic toys to topical sunscreens.

Estrogens are a group of hormones produced in both the female ovaries and male testes, with larger amounts made in females than in males. They are particularly influential during puberty, menstruation, and pregnancy, but they also help regulate the growth of bones, skin, and other organs and tissues. In general, they have a strong effect of endocrine function by disrupting these functions.

Over the past 10 years, many synthetic compounds and plant products present in the environment have been found to affect hormonal functions in various ways. Those that have estrogenic activity have been labeled as environmental estrogens. There is increasing concern that chemicals in the environment referred to as environmental estrogens may be causing adverse effects through endocrine disruption.

Hence, there is a need for new approaches for screening chemicals for endocrine disrupting effects. The rat uterotrophic bioassay provides one approach for identifying agonists or antag- onists of estrogen. An estrogen antagonist is a compound that blocks the binding of estrogen and so blocks the action of estrogen. An estrogen agonist is a compound that enhances the action of estrogen.

Rats in this study are either immature or have their ovaries removed and therefore do not produce estrogen. The point of the study is to use the rats as an assay to test the effect of estrogen agonists and antagonists on a particular hormonal response, the weight of the uterus. This is done by varying the amount of the agonist or antagonist give to the rat. The response is the weight of the uterus, with uterus weight expected to exhibit an increasing dose response trend for chemicals acting as estrogen agonists and with estrogen antagonists (ZM) acting to block such estrogen effects. It is expected that the uterus gets heavier with the increase of estrogen agonist (EE) dose.

The basic design randomizes female rats to treatment groups, with groups consisting of a control group and several groups having increasing doses of the test agent. An international multilaboratory study was conducted to compare the results of the rat uterotrophic bioassay using a known estrogen agonist (EE) and a known estrogen antagonist (ZM). The main goal of the study was to assess whether the results were consistent across the laboratories.

Variables

Protocol

immature female rats dosed by oral gavage (3 days)
immature female rats dosed by injection (3 days)
adult ovariectomized female rats dosed by injection (3 days)
adult ovariectomized female rats dosed by injection (7 days)

Uterus Uterus weight (mg)

Weight Body weight of rat (g)

EE Dose of estrogen agonist, EE in mg/kg/day

ZM Dose of estrogen antagonist, ZM in mg/kg/day

Lab Laboratory at which assay was conducted

Group Lab replicate group (6 rats were used per group)

EDA

Provide graphs that illustrate how the uterus weight depends on body weight, Protocol, EE, ZM, and Lab, using scatterplots, boxplots, conditioning plots with/without colors or different symbol types to illustrate the relationships. Where possible try to show three (or 4 variables) in a plot as long the points are clear. Can you find one plot that addresses the question of interest? Provide 1 page of figures with informative captions. All axes should be labeled with approriate units and legends included if there are colors, symbols, etc used for full credit.

Models

Part I: Develop an appropriate model for the data which includes an effect for labs and other variables with possible interactions and fit the model using OLS/Maximum Likelihood estimation, transforming variables if necessary with the goal of being able to answer all questions below with this model. If there are outliers or influential points summarize and describe how you handle them. Once satisfied, call this your full model.

Use the full model to address the following points, providing both appropriate graphical displays and quantitative summaries (with uncertainty measures of course) and interpretations:

Is the uterotrophic bioassay successful overall at identifying estrogenic effects of EE and anti-estrogenic effects of ZM? Do some labs fail to detect such effects? At what dose level of EE is there a change relative to the control and does this level vary across labs?
Does the dose response vary across labs? If so, are there certain labs that stand out as being different?
Do the protocols differ in their sensitivity to detecting estrogenic and anti-estrogenic effects? If so, is there one protocol that can be recommended?

You should have one “full” model that can be used to test each of the hypotheses by comparing the full model to simpler models. Be sure to clearly label the models and how you are testing the hypotheses. Where possible also include estimates of effects with confidence intervals to ascertain the magnitude of effects and their uncertainty in a table. For each question, construct an appropriate graph to illustrate the effect with uncertainty estimates included if possible.

Part II: Repeat the above but use an appropriate Bayesian model. Describe the Bayesian model and priors adopted - you do not have to carry out Bayesian model selection, as long as you have appropriate models to quantify the effects related to the questions above. Provide estimates of effects and credible intervals (posterior probabilities if appropriate) in a table with one graph to illustrate/support your findings for each question.

Part III Use JAGS (or your own MCMC) to fit your full model, but use a hierarchical prior for the laboratory effects to determine if interlaboratory variation is greater than variation due to measurement error: $$\beta_l \mid \sigma^2, \sigma^2_L, \lambda_l \sim N(0, \sigma^2 \sigma^2_L/\lambda_l)$$ where $\sigma^2$ is the variance of the response, $$\sigma_L \sim t_1$$ for $\sigma_L > 0$ is a half-Cauchy prior on $\sigma_L$, and $$\lambda_l \sim G(a/2, a/2)$$ allows for heavier tails than the normal for the laboratory effects in case any labs are “outliers”. Taking $a$ to be infinity is the normal prior, while $a = 1$ is a Cauchy prior conditional on $\sigma^2 \sigma^2_L$. Think about what might values of $a$ might be appropriate in this context and discuss how you select $a$. Include a graph that illustrates your findings for each question as well as tables with estimates of effects of interest and the $\sigma$ and $\sigma_L$ with credible intervals.

Submission to Github:

Choosing one of the methods from either Part I, Part II or Part III, prepare a pdf report using Sweave/knitr with 2 pages max of text and up to 2 additional pages with tables and graphs with an executive summary, an introduction (assume that the researcher does not have access to this document so highlight main questions and descriptions of variables so that your document can be read on its own), description of your methods and model with variables and parameters, results, and conclusions. You do not need to describe all of the exploratory work that you do or wrong turns, but include the most important points in the body. For each question you should summarize the hypotheses, how you tested the hypothesis, any statistics and outcomes or for non-hypothesis based methods, provide estimates of the effect size and uncertainty and interpretation. Interpretations should be done in the context of the units of the original variables if you used any transformations.

The code and output for the EDA, and Parts I-III should be included in the Appendix.

The data file looks like this (first 6 lines)

bioassay = read.table("http://stat.duke.edu/sites/stat.duke.edu/files/bioassay.txt", header=T)
head(bioassay)

Linear Models

Professor Merlise Clyde

Publications

An example preprint / working paper

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis posuere tellus ac convallis placerat. Proin tincidunt magna sed ex …

Linear Models

PDF Code Dataset Project Poster Slides Video Source Document Custom Link

An example conference paper

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis posuere tellus ac convallis placerat. Proin tincidunt magna sed ex …

Linear Models, Robert Ford

PDF Code Dataset Project Poster Slides Video Source Document Custom Link

Talks

Example Talk

An example talk using Academic’s Markdown slides feature.

Jun 1, 2030 1:00 PM — 3:00 PM London, United Kingdom

Project Slides Follow