STA114 / MTH136 - Statistics -Lab 6, Feb 15+29 2001

STA114/MTH136: Statistics Lab 6, Feb 15+29 2001

A medical study reported in early 1998 concerned HIV risks for babies born of HIV positive women. A group of 400 HIV positive pregnant women were randomly separated into two groups of 200. In one group, the women receiving no special treatment. The women in a second group received intense AZT therapy prior to and during childbirth. The women were followed to childbirth. In the first group, 19 out of 200 babies were HIV positive; in the second group, only 9 out of 200 babies were HIV positive. The key issue is whether there a significant effect of the treatment here, and what it suggests for treatment of HIV positive pregnant women in future. This is an archetype problem of comparison of proportions. Interest lies in assessing the significance, statistically and practically in terms of its relevance for future women, of the observed difference in outcome proportions: 19/200 versus 9/200.

	HIV +	Total
Treated:	9	200
Control:	19	200

Here is an S-Plus session to explore this data; try to cut-and-paste:

  motif();                       #  Under Windows, use "graph.sheet();"
  help.start();                  #  Nicer, windowed help system
  p0 <- rbeta(10000,20,182);     # Posterior w/uniform prior
  p1 <- rbeta(10000,10,192);
  p  <- seq(0,1,0.005);          # Just points for plotting
  del <- p0-p1;                  # Improvement in non-HIV-pos rate
  rho <- p1/p0;                  # Relative risk, under treatment

  hist(p0, prob=T, ncl=20, xlim=c(0,1));   # Quick look at the data...
  lines(p, dbeta(p,20,182));     # Overlay with beta pdf
  
  hist(del, prob=T);             # The differences
  abline(v=0);                   # A reference line at "p0=p1"
  summary(del);                  # Mean, range, etc
  mean(del);                     # Just the mean
  mean(del>0);                   # Estimate of Pr[ p0 > p1 ]
  mu <- mean(del);
  sd <- sqrt(var(del));          # Standard deviation (=sqrt var)
  x  <- seq(min(del), max(del), , 200);  # What does this do?
  y  <- dnorm(x,mu,sd);          # Normal approximation to differences
  lines(x,y,col=3);              # Does it fit the histogram very well?

  hist(rho, prob=T);             # The ratios
  abline(v=1);                   # A reference line at "p0=p1"
  summary(rho);                  # Mean, range, etc
  mean(rho);                     # Just the mean
  mean(rho<1);                   # Another estimate of Pr[ p0 > p1 ]
  mu <- mean(log(rho));          # Mean of LOG rho
  sd <- sqrt(var(log(rho)));     # Std dev of LOG rho
  x  <- seq(min(rho), max(rho), , 200);
  y  <- dlnorm(x,mu,sd);         # LOGnormal approximation to ratios
  lines(x,y,col=3);              # Does it fit the histogram very well?

What did you learn about the effectiveness of this treatment? About how many HIV infections would you expect to be prevented if 1000 HIV-positive women were given this treatment during pregnancy? If treatment costs $1500 per subject, what is the cost per infection prevented? How does treatment affect the infection probability for an individual's child?

The exact probability Pr[p0 > p1] is available from Mathematica using the command

NIntegrate[x^19 (1-x)^181 y^9 (1-y)^191, {x,0,1}, {y,0,x}] / NIntegrate[x^19 (1-x)^181 y^9 (1-y)^191, {x,0,1}, {y,0,1}] (the ratio of two double integrals; the denominator could have been replaced by its known value, Gamma[20]Gamma[182]Gamma[10]Gamma[192] /Gamma[202]^2). The answer for Pr[p0 > p1] turns out to be 0.973878. How close did you come using simulation above? How close would you expect to come? Why?

Here are the VA hospital follow-up failure data; what can you say about

Changes from 1992 to 1993?
Variations among the three hospitals?
Which is best? Which improved the most?

	A	B	C
1992:	236/520	76/186	55/134
1993:	190/451	74/191	74/137

  motif();                       # Under Windows, use "graph.sheet();"
  help.start();                  # Nicer, windowed help system
  a92 <- rbeta(1000,237,521-236);
  a93 <- rbeta(1000,191,452-190);
  b92 <- rbeta(1000, 77,187- 76);
  b93 <- rbeta(1000, 75,192- 74);
  c92 <- rbeta(1000, 56,135- 55);
  c93 <- rbeta(1000, 75,138- 74);
  
  lohi <- range(c(a93-a92, b93-b92, c93-c92));  # Overall range of diffs

  par(mfrow=c(3,1));             # Three plots, one above another

  hist(a93-a92, prob=T, xlim=lohi);    abline(v=0);
  hist(b93-b92, prob=T, xlim=lohi);    abline(v=0);
  hist(c93-c92, prob=T, xlim=lohi);    abline(v=0);

  # Note setting of "xlim" to common value, so x axes will all line up.

  mean(a92-a93);                 # Mean improvement for Hospital A
  mean(a93 < a92);               # Prob of improvement for hospital A
  mean(b93 < b92);               # Prob of improvement for hospital B
  mean(c93 < c92);               # Prob of improvement for hospital C

  mean( (a93 < b93) & (a93 < c93) );  # Prob that A is best in 1993
  mean( (b93 < a93) & (b93 < c93) );  # Prob that B is best in 1993
  mean( (c93 < a93) & (c93 < b93) );  # Prob that C is best in 1993

There is much more you can do in S-Plus with these data... have fun with them!