Goto Lab: | #0 | #1 | #2 | #3 | #4 |
This assignment contains homework problems out of the text as well as a lab portion.
ok1_bumpus$code==1 ok2_bumpus$code==2 ybar1_mean(bumpus$humerus[ok1]) ybar2_mean(bumpus$humerus[ok2]) sd1_sqrt(var(bumpus$humerus[ok1])) sd2_sqrt(var(bumpus$humerus[ok2])) n1_length(bumpus$humerus[ok1]) n2_length(bumpus$humerus[ok2])
Group | n | Average (in/1000) | SD (in/1000) |
1 Died | 24 | 727.92 | 23.54 |
2 Survived | 35 | 738.00 | 19.84 |
# calculate the pooled sd estimate sdpool_sqrt(((n1-1)*sd1^2 + (n2-1)*sd2^2)/(n1+n2-2)) # calculate se for ybar2-ybar1 sediff_sdpool*sqrt(1/n1 + 1/n2) # compute the t-statistic tval_(ybar2 - ybar1)/sediff # now get the area to the left under the t curve P_pt(tval,df=n1+n2-2) # now get the one-sided p-value 1-P # the two sided p-value is twice that above (1-P)*2
# get the required t_df(1-alpha/2).. tstar_qt(1-.05/2,df=n1+n2-2) # lower end of the ci lower_ybar2 - ybar1 - tstar*sediff # upper end of the ci upper_ybar2 - ybar1 + tstar*sediff # print out the ci: print(c(lower,upper))
> round(c(sdpool,sediff,tval,P,1-P,(1-P)/2,tstar,lower,upper),3) [1] 21.411 5.674 1.777 0.960 0.040 0.020 2.002 -1.279 21.446
ok1_bumpus$code==1 ok2_bumpus$code==2 y1_bumpus$humerus[ok1] y2_bumpus$humerus[ok2]Now y1 and y2 hold the data from the two groups. To compute a two sample t-test to compare mu2 to mu1:
> t.test(y2,y1) Standard Two-Sample t-Test data: y2 and y1 t = 1.777, df = 57, p-value = 0.0809 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -1.279386 21.446053 sample estimates: mean of x mean of y 738 727.9167Compare these results to those above.
> a_t.test(y2,y1)a is a list of values. To see what is in a type:
> names(a) [1] "statistic" "parameters" "p.value" "conf.int" "estimate" [6] "null.value" "alternative" "method" "data.name"So you can grab the p-value, confidence interval, or t-statistic for example.
> a$statistic t 1.776998 > a$p.value [1] 0.0809045 > a$conf.int [1] -1.279386 21.446053 attr(, "conf.level"): [1] 0.95This feature can be very convinient if you're doing many t-tests or running a simulation.
> sd_function(x) sqrt(var(x))This creates a function called sd(). It requires a vector x as its argument. This function takes a vector of numbers x and computes the standard deviation. To try it out type:
> sd(c(1,2,3)) [1] 1 > sd(rnorm(5,mean=69,sd=3)) [1] 5.682016 > sd(rnorm(500,mean=69,sd=3)) [1] 2.952136
findci_function(x,y){ a_t.test(x,y) ci_as.vector(a$conf.int) return(ci) }No try it out on some data:
> findci(rnorm(10,mean=0,sd=2),rnorm(15,mean=1,sd=2)) [1] -2.124404 1.049983 > findci(y2,y1) [1] -1.279386 21.446053
power_function(mudiff,n,sd,level=.05){ tstat_mudiff/sqrt(sd^2/n + sd^2/n) df_n-2 tcrit_qt(1-level,df=df) 1-pt(tcrit-tstat,df) }The function requires the arguments:
> power(1,20,3) [1] 0.2525874For a sample size of 50 each it's
> power(1,50,3) [1] 0.4958101
trt_c(36.23,30.86,38.25,39.61,39.16,40.49,35.08,39.2) cont_c(30.9,32.62,28.96,28.71,29.68,33.26)From this data we can get an idea of what the mean difference might be and what SD could be expected from a larger study.
> mean(trt) [1] 37.36 > mean(cont) [1] 30.68833 > mean(trt) - mean(cont) [1] 6.671667 > ntrt_length(trt) [1] 8 > ncont_length(cont) [1] 6 > sdpool_sqrt((var(trt)*(ntrt-1) + var(cont)*(ncont-1))/(ntrt+ncont-2)) [1] 2.72809So if there were 10 trees in each group, a good guess at the chance of detecting a difference in mean Vcmax is:
> power(6.67,10,2.73) [1] 0.9965274Which is very likely.
> power(c(1.1,2.2,3.3),6,sdpool) [1] 0.1125099 0.2515317 0.4862440Here's the Splus commands to make a pretty graph:
# consider 5 different sample sizes: n=3,4,5,6 and 10 n_c(3,4,5,6,10) # create a vector mdiff which ranges from 0 to 10, and # whose length is 40. mdiff_seq(0,10,length=40) # now make the plotting region, but use type="n" so that nothing # is yet plotted. I'll make sure all the labels and stuff # are ok. plot(mdiff,power(mdiff,10,sdpool),type="n", ylab="power - chance of rejecting Ho", xlab="mu(treatment) - mu(control)", main="Power curve for Vcmax, using t-test with pooled se") # Now go ahead and put in lines that correspond to the # power as a function of mdiff for the various sample sizes - # mdiff on the x-axis and the power on the y-axis. # The argument lwd=3 tells Splus to make the # line width 3 times larger than normal. lines(mdiff,power(mdiff,3,sdpool),lwd=3) lines(mdiff,power(mdiff,4,sdpool),lwd=3) lines(mdiff,power(mdiff,5,sdpool),lwd=3) lines(mdiff,power(mdiff,6,sdpool),lwd=3) lines(mdiff,power(mdiff,10,sdpool),lwd=3) # finally, put on labels so we can tell which line goes with # which sample size. The line that increases the slowest is # the one with the smallest sample size. After you enter # the text command below, you need to click on the plot at 5 different # locations so that Splus knows where to put the labels. # At the first click, "n=3" will be added to the plot; # at the 5th click, "n=10" will be added to the plot. # Remember hit the escape key if you get stuck. text(locator(5),c("n=3","n=4","n=5","n=6","n=10"))
> mdiff_seq(0,10,length=40)That's not a good choice for this data. mdiff should range from 0 to an amount that is at the high end of what's plausible. In addition to the homework, turn in this plot and the commands you used to make it. Below are the leaf number measurements taken from the initial dataset.
trt_c(1.121,1.29,1.183,1.145,1.168,1.316,.998,1.174) cont_c(1.012,1.111,1.014,1.091,1.098,1.179)