Homework 4
Solutions:
You can solve almost all of these problems with the graphical
interface, so go ahead and do it that way. The command-line
instructions needed to solve the problems are shown below, but you
don't have to do all the work shown here! All you have to do is
read the data into a data frame and then, for a two-sample
test:
1) statistics>compare samples>two samples>t test
2) put the variables in the x variable and y variable boxes
3) check the option saying that the data have a grouping variable
4) ok
For one sample tests:
1) statistics>compare samples>one sample>t test
2) put the variable in the x variable box
3) ok
For basic summary statistics:
1) statistics>data summaries>summary statistics
2) check the statistics you want to compute
3) choose the variables (or subset of variables) to summarize
4) choose whether to summarize by a grouping variable
5) ok
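For reference, here is roughly what those menu steps compute, written in R syntax (a sketch; S-PLUS also has a `t.test` function, but check its help page for the exact arguments). The vectors `x` and `y` are made-up illustration data, not homework data, and `var.equal = TRUE` requests the pooled-s.d. test used throughout these solutions.

```r
# Made-up illustration vectors (not homework data)
x <- c(4.1, 5.3, 6.0, 5.5, 4.8)
y <- c(3.2, 4.0, 3.9, 4.4)

# Two-sample t test with pooled s.d.; tests H0: mu_x - mu_y = 0
two <- t.test(x, y, var.equal = TRUE)

# One-sample t test of H0: mu_x = 0
one <- t.test(x)

# Basic summary statistics
summary(x)
sqrt(var(x))

two$conf.int   # 95% interval for mu_x - mu_y
two$p.value    # two-sided p-value
```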
Check that your answers are correct, but don't worry if you
got them in a different way from the one shown here.
Exercise 13
These are the solutions:
Mean 1                      | 6.57
Mean 2                      | -1.14
S.d. 1                      | 5.85
S.d. 2                      | 3.18
Pooled s.d.                 | 4.713
s.e.                        | 2.519
Degrees of freedom          | 12
t(0.975; 12 d.f.)           | 2.179
95% CI for Mean 1 - Mean 2  | [2.22,13.20]
t statistic                 | 3.06
p-value (one-sided)         | 0.0049
# assume the data are in two vectors, bp and code. The following
# commands put the bp data into two vectors, one for each group.
ok1 <- code==1
ok2 <- code==2
bp1 <- bp[ok1]
bp2 <- bp[ok2]
# this puts the data into two vectors: the group 1 bps in bp1 and
# the group 2 bps in bp2
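The same split can also be done with `tapply`, which applies a function to the response within each group. A small sketch in R, with made-up vectors `resp` and `grp` standing in for `bp` and `code` (renamed so that running it does not overwrite the homework data):

```r
# Hypothetical data in the homework's layout: a response vector and a
# group-code vector (not the Exercise 13 data)
resp <- c(10, 12, 9, 2, 0, 1)
grp  <- c(1, 1, 1, 2, 2, 2)

# Logical subsetting, as in the solution
resp1 <- resp[grp == 1]
resp2 <- resp[grp == 2]

# Per-group means and s.d.'s, one call each
group_means <- tapply(resp, grp, mean)
group_sds   <- tapply(resp, grp, function(v) sqrt(var(v)))
group_means
group_sds
```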
#this is problem 13a
m1 <- mean(bp1)
m2 <- mean(bp2)
s1 <- sqrt(var(bp1))
s2 <- sqrt(var(bp2))
#this is problem 13b. The length command gives the number of
#observations in each group, which I store in n1 and n2. Then I get
#the pooled estimate of the standard deviation as the square root of
#the pooled variance, using the formula from the book.
n1 <- length(bp1)
n2 <- length(bp2)
sp <- sqrt((((n1-1)*var(bp1))+((n2-1)*var(bp2)))/(n1+n2-2))
#this is problem 13c. The same procedure gives SE(Y2-Y1)
sey2y1 <- sp*sqrt((1/n1)+(1/n2))
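Written out, the formulas used in 13b and 13c are as follows. As a worked check: the Exercise 13 table gives s.e./pooled = 2.519/4.713 ≈ 0.5345 = sqrt(2/7) and 12 degrees of freedom, so n1 = n2 = 7, and plugging in the rounded table values reproduces the pooled s.d. and s.e. up to rounding.

```latex
s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}, \qquad
\mathrm{SE}(\bar{Y}_2-\bar{Y}_1) = s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}

% Exercise 13 check, n_1 = n_2 = 7:
s_p = \sqrt{\frac{6(5.85)^2 + 6(3.18)^2}{12}} = \sqrt{22.17} \approx 4.71, \qquad
\mathrm{SE} \approx 4.71\sqrt{\tfrac{2}{7}} \approx 2.52
```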
#the degrees of freedom are n1+n2-2 = 12.
#The qt function has arguments (p, df): qt(p, df) returns the
#p-th quantile of the t distribution with df degrees of freedom
#(0 < p < 1, df > 0).
qt(0.975,n1+n2-2)
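A quick sketch in R of how `qt` and `pt` relate: `pt` is the cumulative probability, so it undoes `qt`, and the upper-tail area beyond a quantile is one minus `pt`.

```r
# qt(p, df): the p-th quantile of the t distribution with df d.f.
q <- qt(0.975, 12)     # multiplier for a 95% interval with 12 d.f.

# pt(q, df): cumulative probability up to q, so pt undoes qt
back <- pt(q, 12)

# Area in the upper tail beyond q
upper <- 1 - pt(q, 12)
c(q, back, upper)
```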
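#this is problem 13e: the 95% confidence interval for the difference
#in means, mu2-mu1. (If group 2 has the smaller mean the interval is
#negative; the interval for mu1-mu2 has the same endpoints with the
#signs reversed.)
ica <- m2-m1-qt(0.975,n1+n2-2)*sey2y1
icb <- m2-m1+qt(0.975,n1+n2-2)*sey2y1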
#this is problem 13f: the t statistic for testing equality of the
#means, i.e. H0: mu2-mu1 = 0
t <- ((m2-m1)-0)/sey2y1
#13g follows directly. I use abs so that the sign of t does not
#matter: the p-value is always computed as the area in the right
#tail of the distribution, beyond |t|. This is the one-sided p-value;
#double it for a two-sided test.
pvalue <- 1-pt(abs(t),n1+n2-2)
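A small R sketch of the one-sided versus two-sided p-value, using the t statistic and degrees of freedom reported in the Exercise 13 table. The `lower.tail` argument is R's; whether your S-PLUS version accepts it is an assumption to check.

```r
# t statistic and degrees of freedom from the Exercise 13 table
t_obs <- 3.06
df <- 12

p_one <- 1 - pt(abs(t_obs), df)        # one-sided: upper tail beyond |t|
p_two <- 2 * (1 - pt(abs(t_obs), df))  # two-sided: both tails

# Same one-sided area taken directly from the upper tail; this avoids
# the subtraction 1 - pt(), which loses precision for very small p
p_one_upper <- pt(abs(t_obs), df, lower.tail = FALSE)
c(p_one, p_two)
```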
Exercise 15
These are the solutions:
95% CI for Mean 1 - Mean 2  | [8.67,13.33]
p-value                     | < 0.0001
#exercise 15
#we have already written the formulas for this kind of problem, so we
#just need to enter the new data and mechanically obtain the results.
#Only summary statistics are given here.
m1 <- 29.2
m2 <- 18.2
s1 <- 7.5
s2 <- 5.8
n1 <- 126
n2 <- 50
#from here on we only need to apply the same commands we used
#before. Because only summary statistics are given, the pooled s.d.
#uses s1 and s2 in place of var(bp1) and var(bp2).
sp <- sqrt((((n1-1)*s1^2)+((n2-1)*s2^2))/(n1+n2-2))
sey2y1 <- sp*sqrt((1/n1)+(1/n2))
qt(0.975,n1+n2-2)
ica <- m2-m1-qt(0.975,n1+n2-2)*sey2y1
icb <- m2-m1+qt(0.975,n1+n2-2)*sey2y1
t <- ((m2-m1)-0)/sey2y1
pvalue <- 1-pt(abs(t),n1+n2-2)
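Because the pooled analysis only needs means, s.d.'s and sample sizes, it can be wrapped in a function of summary statistics. The helper `pooled_from_summary` below is hypothetical (not part of the original solutions), is written in R, and reports mean 1 minus mean 2, the sign used in the tables for Exercises 13, 15 and 16.

```r
# Hypothetical helper: pooled two-sample t analysis from summary
# statistics. Reports d = m1 - m2.
pooled_from_summary <- function(m1, s1, n1, m2, s2, n2, level = 0.95) {
  df <- n1 + n2 - 2
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)  # pooled s.d.
  se <- sp * sqrt(1/n1 + 1/n2)                          # s.e. of d
  d  <- m1 - m2
  q  <- qt(1 - (1 - level) / 2, df)                     # t multiplier
  t  <- d / se
  list(diff = d, df = df, sp = sp, se = se,
       ci = c(d - q * se, d + q * se),
       t = t,
       p_one = 1 - pt(abs(t), df),
       p_two = 2 * (1 - pt(abs(t), df)))
}

# Exercise 15 inputs
ex15 <- pooled_from_summary(29.2, 7.5, 126, 18.2, 5.8, 50)
ex15$ci
```

Calling it with the rounded Exercise 13 table values, `pooled_from_summary(6.57, 5.85, 7, -1.14, 3.18, 7)`, reproduces that table's interval to within rounding.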
Exercise 16
These are the solutions:
95% CI for mean(ic) - mean(ec)  | [1.29,6.99]
p-value (one-sided)             | 0.00269
Degrees of freedom              | 45
#Exercise 16
#ic holds the intrinsic group
#ec holds the extrinsic group
ic <- c(12,12,12.9,13.6,16.6,17.2,17.5,18.2,19.1,19.3,19.8,20.3,
20.5,20.6,21.3,21.6,22.1,22.2,22.6,23.1,24.0,24.3,26.7,29.7)
ec <- c(5,5.4,6.1,10.9,11.8,12.0,12.3,14.8,15.0,16.8,17.2,17.2,
17.4,17.5,18.5,18.7,18.7,19.2,19.5,20.7,21.2,22.1,24.0)
#again, the easiest approach is to reuse the standard formulas.
bp1 <- ic
bp2 <- ec
m1 <- mean(bp1)
m2 <- mean(bp2)
s1 <- sqrt(var(bp1))
s2 <- sqrt(var(bp2))
n1 <- length(bp1)
n2 <- length(bp2)
sp <- sqrt((((n1-1)*var(bp1))+((n2-1)*var(bp2)))/(n1+n2-2))
sey2y1 <- sp*sqrt((1/n1)+(1/n2))
ica <- m2-m1-qt(0.975,n1+n2-2)*sey2y1
icb <- m2-m1+qt(0.975,n1+n2-2)*sey2y1
t <- ((m2-m1)-0)/sey2y1
P <- pt(abs(t),n1+n2-2)
onesidedpvalue <- 1-P
twosidedpvalue <- 2*onesidedpvalue
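As a cross-check, R's built-in `t.test` with `var.equal = TRUE` performs the same pooled analysis. `t.test(x, y)` gives an interval for mean(x) - mean(y), so the argument order sets the sign; `alternative = "greater"` gives the one-sided p-value.

```r
ic <- c(12,12,12.9,13.6,16.6,17.2,17.5,18.2,19.1,19.3,19.8,20.3,
        20.5,20.6,21.3,21.6,22.1,22.2,22.6,23.1,24.0,24.3,26.7,29.7)
ec <- c(5,5.4,6.1,10.9,11.8,12.0,12.3,14.8,15.0,16.8,17.2,17.2,
        17.4,17.5,18.5,18.7,18.7,19.2,19.5,20.7,21.2,22.1,24.0)

# Pooled two-sample test; the interval is for mean(ic) - mean(ec)
fit <- t.test(ic, ec, var.equal = TRUE)

# One-sided version, H1: mu(ic) > mu(ec)
fit1 <- t.test(ic, ec, var.equal = TRUE, alternative = "greater")
fit$conf.int
fit1$p.value
```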
Exercise 20
These are the solutions:
Mean                  | 6.57
s.d.                  | 5.85
Degrees of freedom    | 6
s.e.                  | 2.213
95% CI for the mean   | [1.156,11.986]
One-sided p           | 0.0125
Two-sided p           | 0.0250
#For exercise 20 we use the group 1 data from exercise 13. (bp1 was
#overwritten in exercise 16, so we extract the group again.)
x <- bp[code==1]
m1 <- mean(x)
s1 <- sqrt(var(x))
n1 <- length(x)
#the standard error of the average is
serr <- s1/sqrt(n1)
#the 95% confidence interval is then easy to construct
ic1 <- m1-(qt(0.975,n1-1)*serr)
ic2 <- m1+(qt(0.975,n1-1)*serr)
#The interval is very wide because we have only n1-1 = 6
#degrees of freedom.
#t statistic for H0: mean = 0 (divide by the standard error,
#not just by sqrt(n1))
t <- m1/serr
pvalue <- 1-pt(abs(t),n1-1)
twosidedpvalue <- 2*pvalue
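The same one-sample calculation as a self-contained R sketch, starting from the rounded group 1 summaries in the Exercise 13 table (mean 6.57, s.d. 5.85) and n = 7, which matches the 6 degrees of freedom above, rather than from the raw data:

```r
# Rounded group 1 summaries from the Exercise 13 table
m <- 6.57
s <- 5.85
n <- 7

se <- s / sqrt(n)                  # standard error of the mean
q  <- qt(0.975, n - 1)             # 95% multiplier, n - 1 d.f.
ci <- c(m - q * se, m + q * se)    # 95% interval for the mean

t_stat <- m / se                   # H0: mean = 0; divide by se, not sqrt(n)
p_one <- 1 - pt(abs(t_stat), n - 1)
p_two <- 2 * p_one
c(se, t_stat, p_one, p_two)
```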
Exercise 21
These are the solutions:
95% CI for mean(wp) - mean(ws)  | [0.0961,1.5281]
Two-sided p-value               | 0.0269
t statistic                     | 2.2714
#Exercise 21
ws <- c(24.5,26.9,26.9,24.3,24.1,26.5,24.6,24.2,23.6,26.2,26.2,
24.8,25.4,23.7,25.7,25.7,26.3,26.7,23.9,24.7,28.0,27.9,25.9,
25.7,26.6,23.2,25.7,26.3,24.3,26.7,24.9,23.8,25.6,27.0,24.7)
wp <- c(26.5,26.1,25.6,25.9,25.5,27.6,25.8,24.9,26.0,26.5,26.0,
27.1,25.1,26.0,25.6,25.0,24.6,25,26,28.3,24.6,27.5,31.1,28.3)
#again, it is easier to reuse the commands we already have than to
#retype them for each new data set
bp1 <- ws
bp2 <- wp
m1 <- mean(bp1)
m2 <- mean(bp2)
s1 <- sqrt(var(bp1))
s2 <- sqrt(var(bp2))
n1 <- length(bp1)
n2 <- length(bp2)
sp <- sqrt((((n1-1)*var(bp1))+((n2-1)*var(bp2)))/(n1+n2-2))
sey2y1 <- sp*sqrt((1/n1)+(1/n2))
ica <- m2-m1-qt(0.975,n1+n2-2)*sey2y1
icb <- m2-m1+qt(0.975,n1+n2-2)*sey2y1
t <- ((m2-m1)-0)/sey2y1
twosidedp <- 2*(1-pt(abs(t),n1+n2-2))
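A cross-check with R's `t.test`, putting `wp` first so that the interval is for mean(wp) - mean(ws), the same direction as `m2-m1` in the commands above:

```r
ws <- c(24.5,26.9,26.9,24.3,24.1,26.5,24.6,24.2,23.6,26.2,26.2,
        24.8,25.4,23.7,25.7,25.7,26.3,26.7,23.9,24.7,28.0,27.9,25.9,
        25.7,26.6,23.2,25.7,26.3,24.3,26.7,24.9,23.8,25.6,27.0,24.7)
wp <- c(26.5,26.1,25.6,25.9,25.5,27.6,25.8,24.9,26.0,26.5,26.0,
        27.1,25.1,26.0,25.6,25.0,24.6,25,26,28.3,24.6,27.5,31.1,28.3)

# Pooled two-sample test; interval for mean(wp) - mean(ws)
fit <- t.test(wp, ws, var.equal = TRUE)
fit
```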
The summary should resemble the one on pages 28 and 29.
Extra lab
These are the solutions:
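#extra assignment: power curves
#basis is the hypothesized difference mu(trt)-mu(cont), n is the
#sample size in each group (assumed equal) and sd is the common
#standard deviation. The power of the one-sided 5% test is
#approximated by shifting the central t distribution by basis/s.e.
powr <- function(basis,n,sd){
t <- basis/sqrt(2*(sd*sd)/n)
tbound <- qt(0.95,2*n-2)
1-pt(tbound-t,2*n-2)}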
#Now the function is stored in memory and you can use it whenever
#you want for different data.
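trt <- c(1.121,1.29,1.183,1.145,1.168,1.316,.998,1.174)
cont <- c(1.012,1.111,1.014,1.091,1.098,1.179)
#observed difference in means
m <- mean(trt)-mean(cont)
n1 <- length(trt)
n2 <- length(cont)
#pooled s.d.: the whole numerator is divided by n1+n2-2
sp <- sqrt(((var(trt)*(n1-1))+(var(cont)*(n2-1)))/(n1+n2-2))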
siz <- c(3,4,5,6,10)
basis <- seq(0,max(mean(trt),mean(cont)),length=40)
plot(basis,powr(basis,10,sp),type="n",
ylab="power - chance of rejecting NH",
xlab="mu(trt)-mu(cont)",
main="Power curve using t-test with pooled s.d.")
for (aa in 1:5){
lines(basis,powr(basis,siz[aa],sp),lwd=3)}
#click five points on the plot to place the sample-size labels
text(locator(5),c("n=3","n=4","n=5","n=6","n=10"))
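The `powr` function approximates power by shifting a central t distribution. For comparison, here is a sketch in R of the exact power from the noncentral t distribution, assuming `n` observations in each group and a one-sided level-0.05 test; R's `power.t.test` makes the same calculation and serves as a check. The inputs `delta = 0.1` and `sd = 0.08` are arbitrary illustration values, and `exact_power` is a hypothetical helper name.

```r
# Exact power of the one-sided pooled two-sample t test with n per group
exact_power <- function(delta, n, sd, alpha = 0.05) {
  df   <- 2 * n - 2                    # pooled d.f. for two groups of size n
  ncp  <- delta / (sd * sqrt(2 / n))   # true difference in s.e. units
  crit <- qt(1 - alpha, df)            # one-sided critical value
  1 - pt(crit, df, ncp = ncp)          # P(reject H0) when the difference is delta
}

p_exact <- exact_power(delta = 0.1, n = 5, sd = 0.08)
p_check <- power.t.test(n = 5, delta = 0.1, sd = 0.08,
                        alternative = "one.sided")$power
c(p_exact, p_check)
```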
#Notice that we solved almost all of these exercises with just a
#few lines, re-running a standard set of commands on new data.
#An even easier way is to write a function that takes the data and
#computes everything in one call. The following function answers
#all of the two-sample problems. The general form is
#name <- function(argument1,argument2,...){
#commands
#commands...
#return(list(variable1,variable2,variable3,...))
#}
doeverything <- function(bp1,bp2){
m1 <- mean(bp1)
m2 <- mean(bp2)
s1 <- sqrt(var(bp1))
s2 <- sqrt(var(bp2))
n1 <- length(bp1)
n2 <- length(bp2)
sp <- sqrt((((n1-1)*var(bp1))+((n2-1)*var(bp2)))/(n1+n2-2))
sey2y1 <- sp*sqrt((1/n1)+(1/n2))
ica <- m2-m1-qt(0.975,n1+n2-2)*sey2y1
icb <- m2-m1+qt(0.975,n1+n2-2)*sey2y1
t <- ((m2-m1)-0)/sey2y1
P <- pt(abs(t),n1+n2-2)
twosidedp <- 2*(1-P)
#return several results at once as a named list
return(list(m1=m1, m2=m2, s1=s1, s2=s2, sp=sp, se=sey2y1,
  interval95=c(ica,icb), onesidedp=1-P, twosidedp=twosidedp))
}
#Calling doeverything(x, y) gives the same statistics as asking for
#them through the menus for any stored data. You already know the path.