Homework 6
Solutions:
Exercise 26 Part A
Assuming that you have your data stored in two variables, let's call
them x1 and x2, you should obtain a histogram that looks like PLOT1 .
You could have used the following code:
# volumes for the two groups (paired observations)
x1 <- c(1.94,1.44,1.56,1.58,2.06,1.66,1.75,1.77,1.78,1.92,
        1.25,1.93,2.04,1.62,2.08)
x2 <- c(1.27,1.63,1.47,1.39,1.93,1.26,1.71,1.67,1.28,1.85,
        1.02,1.34,2.02,1.59,1.97)
# paired differences and their histogram
diff <- x1 - x2
hist(diff,nclass=5,main="Differences in volumes",xlab="differences")
Exercise 26 Part B
Now you need to transform your data and, AFTER doing this, take the
difference and plot the histogram. You want the histogram of the
differences of the logs, not the histogram of the logs of the differences.
You should obtain a histogram like PLOT2 .
You could use the following code:
# take logs first, then difference the logs
l1 <- log(x1)
l2 <- log(x2)
difl <- l1 - l2
hist(difl,nclass=5,
     main="Differences in log(volumes)",xlab="differences")
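To see why the order of the two operations matters, here is a minimal sketch in R syntax using the data from Part A. The name logdif is only illustrative, and suppressWarnings is used just to keep the output quiet:

```r
x1 <- c(1.94,1.44,1.56,1.58,2.06,1.66,1.75,1.77,1.78,1.92,
        1.25,1.93,2.04,1.62,2.08)
x2 <- c(1.27,1.63,1.47,1.39,1.93,1.26,1.71,1.67,1.28,1.85,
        1.02,1.34,2.02,1.59,1.97)

# Differences of logs: defined for every pair, because all values are positive.
difl <- log(x1) - log(x2)

# Logs of differences: not defined whenever x1 < x2 (the log of a negative number).
logdif <- suppressWarnings(log(x1 - x2))

sum(is.nan(difl))    # number of undefined values among the differences of logs
sum(is.nan(logdif))  # number of undefined values among the logs of differences
```

Any pair with x1 smaller than x2 is lost when the log is taken after the difference, which is one reason to transform first.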
Exercise 26 Parts C and D
With the logarithms stored in the variables l1 and l2 and the original
values in x1 and x2, the p-value you obtain should be around 0.006 in
both cases (the two differ only slightly, in the fourth decimal).
                           | 95% L. bound   | 95% U. bound   |
Transformed (tr)           | 0.04           | 0.21           |
Untransformed (untr)       | 0.07           | 0.33           |
Back-transformed (back-tr) | exp(0.04)=1.04 | exp(0.21)=1.23 |
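The p-values and bounds above are consistent with a paired t-test on each scale. A minimal sketch in R syntax, assuming that is the intended test (the solution does not name it); untr and tr are illustrative names:

```r
x1 <- c(1.94,1.44,1.56,1.58,2.06,1.66,1.75,1.77,1.78,1.92,
        1.25,1.93,2.04,1.62,2.08)
x2 <- c(1.27,1.63,1.47,1.39,1.93,1.26,1.71,1.67,1.28,1.85,
        1.02,1.34,2.02,1.59,1.97)

# Paired t-test on the original scale (assumed to be the intended test).
untr <- t.test(x1, x2, paired = TRUE)

# The same test on the log scale.
tr <- t.test(log(x1), log(x2), paired = TRUE)

untr$p.value
untr$conf.int      # 95% bounds for the mean difference

tr$p.value
tr$conf.int        # 95% bounds for the mean difference of logs

# Back-transforming the log-scale bounds gives bounds on the ratio scale.
exp(tr$conf.int)
```

The back-transformed interval is read as a multiplicative effect (a ratio of x1 to x2), not as an additive difference.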
Exercise 27 Part A
The first step is to sort the data, which takes a single command in
Splus. Assuming you have your data stored in the variable
data=log(x1)-log(x2):
aligned <- data[order(data)]
You can then easily compute your test.
Notice, however, that the p-value is going to be the same as the one
for the untransformed values, p=0.002, given the ranks you should have
obtained and the characteristics of the problem.
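The reference to ranks suggests a rank-based test on the paired differences. A minimal sketch in R syntax, assuming the intended test is the Wilcoxon signed-rank test and using a one-sided normal approximation without continuity correction (both are assumptions, not stated in the solution); w_plus, w_minus and p_one_sided are illustrative names:

```r
x1 <- c(1.94,1.44,1.56,1.58,2.06,1.66,1.75,1.77,1.78,1.92,
        1.25,1.93,2.04,1.62,2.08)
x2 <- c(1.27,1.63,1.47,1.39,1.93,1.26,1.71,1.67,1.28,1.85,
        1.02,1.34,2.02,1.59,1.97)
data <- log(x1) - log(x2)

# Rank the absolute differences, then sum the ranks by sign.
r <- rank(abs(data))
w_plus  <- sum(r[data > 0])
w_minus <- sum(r[data < 0])

# Normal approximation to the null distribution of w_plus.
n <- length(data)
mu <- n * (n + 1) / 4
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24)
z <- (w_plus - mu) / sigma
p_one_sided <- 1 - pnorm(z)

w_plus
w_minus
p_one_sided
```

Running the same lines with data <- x1 - x2 lets you compare the two sets of ranks directly.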
Exercise 30 Part A
Assuming you have stored your data in two variables v1 and v2, for
treatment and pretreatment respectively, the first step is to compute
their ratio v1/v2, which is the ratio of tolerance to the sun between
treated and untreated samples.
Then we need to resample from those ratios: draw, say, 1000 samples
with replacement and compute the statistic of interest on each one. By
plotting the results and taking the quantiles of the resampled values
we obtain a good approximation to the real values and a good
confidence interval.
A quick link to the histograms for the MEAN and for the MEDIAN .
You should obtain results close to the following for the MEAN:
Lower bound | Upper bound |
6.1614      | 12.4038     |
And for the MEDIAN:
Lower bound | Upper bound |
4           | 13.5        |
The code would be the following:
For the MEAN (Assuming you stored the ratio of the variables in v3)
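n <- length(v3)
bootmean <- rep(NA,1000)
# resample the ratios with replacement and store the mean of each resample
for (i in 1:1000){
  sa <- sample(1:n,n,replace=T)
  bootmean[i] <- mean(v3[sa])}
motif()   # open a graphics window
par(mfrow=c(1,1))
hist(bootmean,xlab="bootstrapped means",prob=T,nclass=40)
points(mean(v3),0,cex=2,pch=16)   # observed mean
# 95% interval from the 2.5% and 97.5% quantiles, marked with brackets
ci95 <- quantile(bootmean,c(0.025,0.975))
text(ci95,c(0,0),c("[","]"))
ci95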
For the MEDIAN
# n is the sample size defined above, n <- length(v3)
bootmedian <- rep(NA,1000)
for(i in 1:1000){
  sa <- sample(1:n,n,replace=T)
  bootmedian[i] <- median(v3[sa])}
motif()   # open a graphics window
par(mfrow=c(1,1))
hist(bootmedian,xlab="bootstrapped medians",prob=T,nclass=10)
points(median(v3),0,cex=2,pch=16)   # observed median
ci95 <- quantile(bootmedian,c(0.025,0.975))
text(ci95,c(0,0),c("[","]"))
ci95
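The two loops above differ only in the statistic they compute, so the resampling step can be written once. A minimal sketch in R syntax; boot_ci is a hypothetical helper, and the values in v3 are made up for illustration (they are NOT the exercise data):

```r
# Hypothetical ratios standing in for v3 = v1/v2 (not the exercise data).
v3 <- c(2.5, 4, 5, 7.5, 8, 10, 12, 13.5, 20)

# Percentile interval for an arbitrary statistic, by resampling with replacement.
boot_ci <- function(x, stat, B = 1000, level = 0.95) {
  n <- length(x)
  # Sample indices rather than values, as in the loops above.
  boot <- replicate(B, stat(x[sample(1:n, n, replace = TRUE)]))
  alpha <- (1 - level) / 2
  quantile(boot, c(alpha, 1 - alpha))
}

set.seed(1)  # arbitrary seed, only so that repeated runs agree
ci_mean   <- boot_ci(v3, mean)
ci_median <- boot_ci(v3, median)
ci_mean
ci_median
```

Because the resamples are random, the bounds change slightly from run to run unless a seed is fixed, which is why your own results will only be close to the tabulated ones.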
Extra Assignment
Assuming you've stored your data in two variables, city and rural,
the code should resemble the following:
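do <- city - rural
# drop the pairs with missing values
a <- is.na(do)
diff <- do[!a]
dif0 <- mean(diff)   # observed mean difference
dif <- rep(NA,2000)
# randomly flip the sign of each difference and recompute the mean
for(i in 1:2000){
  vble <- sample(c(-1,1),length(diff),replace=T)
  dif[i] <- mean(vble*abs(diff))}
# proportion of sign-flipped means that exceed the observed one
extreme <- ifelse(dif > dif0,1,0)
mean(extreme)
hist(dif,xlab="permutation values of mean2-mean1")
points(dif0,0,pch=17,cex=3)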
Notice that you need to get rid of all the NA values in your data
before you can do any of the computations. If you don't, you'll
receive an error message.
The variable dif0 holds the value we observed (the mean of the
differences, of course). The test then draws a random sample of -1 and
1 values and uses it to assign random signs to the absolute values of
the differences we got in the sample.
The histogram should resemble THIS, and the
p-value should be around 0.002 .
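For experimenting with the sign-flipping idea without the assignment data, here is a minimal self-contained sketch in R syntax. sign_flip_test is a hypothetical helper, and the city and rural values below are made up (they are NOT the assignment data); NA marks a missing measurement:

```r
# Hypothetical paired measurements (not the assignment data).
city  <- c(12.1, 15.3, NA, 14.8, 13.9, 16.2, 15.0, 14.1)
rural <- c(10.4, 15.9, 11.2, 13.1, NA, 14.0, 13.7, 13.6)

sign_flip_test <- function(x, y, B = 2000) {
  d <- x - y
  d <- d[!is.na(d)]          # drop pairs with a missing value
  obs <- mean(d)             # observed mean difference
  # Assign random signs to the absolute differences and recompute the mean.
  flipped <- replicate(B, mean(sample(c(-1, 1), length(d), replace = TRUE) * abs(d)))
  # One-sided p-value: share of sign-flipped means above the observed one.
  list(n = length(d), observed = obs, p.value = mean(flipped > obs))
}

set.seed(1)  # arbitrary seed, only so that repeated runs agree
res <- sign_flip_test(city, rural)
res$n
res$observed
res$p.value
```

As in the solution code, the comparison is one-sided; a two-sided version would compare absolute values instead.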