STA 240 Take Home Midterm Questions and Answers

Due: in class on 23 November.

Question: Can we work in groups or discuss the midterm?
Answer: Ha! Work by yourself. You can discuss general statistical concepts and ideas, but not any details involving the midterm.

Question: #1.) do you mean in general? As far as not necessarily what trap or level on the trap.
Answer: For question 1 of dataset 1, I'm looking for the total number of bugs trapped on a typical day. You could answer this by looking at counts separately for each height or not; it's up to you. You may like to report this in bugs per trap, since this can give an idea of what additional traps would add to the experiment.

Question: I'm having trouble changing number/numeric data from "double" to "factor" on the exam's data sets. After highlighting the appropriate column of data imported into splus, I choose 'data', then 'change data type', then I choose 'factor'...but when I double-check the column, splus tells me that it's still 'double'. *Argh!* This method worked fine for me for homework 7 (as usual, I'm using the student version of splus on a home computer).
Answer: Here's a way to get around that. Let's assume we have read the bug.asc data set into a data frame called bug. To make a factor version of trap type:

> bug$trap2_as.character(bug$trap)

If that is no good, do this:

> bug$trap2_as.factor(as.vector(t(matrix(1:30,ncol=2,nrow=30))))

The column trap2 will for sure be a factor.
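
To see what that second command builds, here is a minimal sketch in Python (used only for illustration; the exam work itself is in Splus). It mimics filling a 30-by-2 matrix column by column with 1:30, transposing it, and flattening it, which gives each trap number twice in a row. That the data has 60 rows arranged as two consecutive rows per trap is inferred from the command, not stated above.

```python
# Sketch of what as.vector(t(matrix(1:30, ncol=2, nrow=30))) produces.
n_traps, n_cols = 30, 2
values = list(range(1, n_traps + 1))

# Column-by-column fill with recycling: each of the 2 columns holds 1..30.
matrix = [
    [values[(c * n_traps + r) % len(values)] for c in range(n_cols)]
    for r in range(n_traps)
]

# Flattening the transpose column by column is the same as reading the
# original matrix row by row, so each trap number appears twice in a row.
trap_labels = [str(x) for row in matrix for x in row]

print(trap_labels[:6])
print(len(trap_labels))
```

The labels are stored as strings here to stand in for factor levels: they name the traps rather than measure anything.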

Question: Hi. I was wondering if solutions for HW 7 were going to be posted on the web. I missed a few things, and know they are important for the exam, and would like to be able to see where I went wrong.
Answer: They're up.

Question: For the test, I do have a question. When I try to do a multiple comparison, the plot that results is all off kilter, often with no scale along the horizontal axis. I think I have followed the directions as given in class, so I don't know what's wrong. A bit vague, to be sure, but perhaps you have some idea...
Answer: Hmm... When I tried out the analysis, things looked fine. Remember that comparing all 8 varieties to each other requires 8*7/2=28 pairwise comparisons, so the plot requires 2 pages when I do it. You can also read off the comparison CI's from the report page.
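
If you want to check how many comparisons to expect, here is a small Python sketch that lists them. The variety names are placeholders, not the names in the data set. Counting each pair once gives 8*7/2 comparisons; counting "A vs B" and "B vs A" separately gives 8*7.

```python
from itertools import combinations, permutations

# Placeholder labels for the 8 varieties (not the real names).
varieties = [f"variety_{i}" for i in range(1, 9)]

# Each pair counted once: this is what a pairwise comparison plot shows.
unordered_pairs = list(combinations(varieties, 2))

# Each pair counted in both directions.
ordered_pairs = list(permutations(varieties, 2))

print(len(unordered_pairs), len(ordered_pairs))
```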

Question: Ok, I added a size column to the lfarea data, giving the small island a value of 1 and the mid sized island a value of 2 (adding values at the command line like we should). When I run a two way anova on this data (colony effect first), I get a really weird report and a warning message at the bottom of the splus window. It keeps saying "effects may be unbalanced", and some effects aren't "estimable". I don't get refitted means, or SS for the island data, even though I ask for it (under island 1 and 2 it says NA--yes, I made the values into factors). What should I do differently to get this 2 way to run???? (it worked when I did 3). Thanks!
Answer: If your model formula is:

area ~ colony+size

then colony represents a 9 mean model and size represents a 2 mean model. If you put the colony term in your model before the size term, the anova table looks at how much improvement you get by adding the second term after fitting the first. Each colony sits on just one island, so once every colony has its own mean, the island sizes are already accounted for. Splus is saying that once you adjust for colony, size doesn't improve the model at all. The SS for size after adjusting for colony is 0, which means there is no improvement in the RSS.
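
Here is a rough Python sketch of that sequential idea. The numbers are made up for illustration and are NOT the lfarea data: 4 hypothetical colonies, 2 on each island size. Each SS is the drop in RSS when a term is added. Because size is constant within a colony, the colony+size fit is the same as one mean per colony, so size adds nothing after colony.

```python
from collections import defaultdict

def rss_by_group(y, groups):
    """Residual sum of squares after fitting one mean per group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, g in zip(y, groups):
        sums[g] += value
        counts[g] += 1
    return sum((value - sums[g] / counts[g]) ** 2 for value, g in zip(y, groups))

# Made-up toy data (hypothetical, not the exam data).
area = [10.0, 12.0, 11.0, 15.0, 20.0, 22.0, 25.0, 27.0]
colony = ["a", "a", "b", "b", "c", "c", "d", "d"]
size_of = {"a": 1, "b": 1, "c": 2, "d": 2}  # each colony is on one island
size = [size_of[c] for c in colony]

rss_null = rss_by_group(area, ["all"] * len(area))  # one overall mean
rss_size = rss_by_group(area, size)                 # 2 mean model
rss_colony = rss_by_group(area, colony)             # one mean per colony
# size is constant within colony, so grouping by the (colony, size) pair
# gives the same fitted values as colony alone.
rss_both = rss_by_group(area, list(zip(colony, size)))

# Order colony, then size: size has nothing left to explain.
ss_size_after_colony = rss_colony - rss_both
# Order size, then colony: both terms get credit.
ss_size_first = rss_null - rss_size
ss_colony_after_size = rss_size - rss_both

print(ss_size_after_colony, ss_size_first, ss_colony_after_size)
```

In this toy example, putting size first gives it a nonzero SS, and colony then picks up whatever variation remains among colonies of the same size.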

Question: I know I asked you this before but could you tell me again how to calculate a 95% CI from an ANOVA table? You said to use the adjusted means but I can't remember what else. Is it (adjusted mean)+/- t* (residual standard error²)(square root 1/n)????????????
Answer: A 95% CI for the adjusted mean is just the adjusted mean +/- t* times the se it gives. It has already accounted for the sqrt(n) stuff.
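
As a minimal sketch of that arithmetic in Python: the function name and all the numbers below are hypothetical, and t* must still be looked up for your own residual degrees of freedom.

```python
def confidence_interval(adjusted_mean, std_error, t_star):
    """Return (lower, upper) for adjusted_mean +/- t_star * std_error.

    std_error is the standard error reported next to the adjusted mean,
    so no extra division by sqrt(n) is applied here.
    """
    half_width = t_star * std_error
    return adjusted_mean - half_width, adjusted_mean + half_width

# Hypothetical values, not taken from the exam data sets.
lower, upper = confidence_interval(adjusted_mean=50.0, std_error=2.0, t_star=2.08)
print(lower, upper)
```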

Question: What should we do if we find we are dealing with unbalanced data? Is there a button we have to depress in Splus to make the ANOVA adjust appropriately?
Answer: For us, we just want to make sure we specify our model in the right order. If you do that, you'll be fine on the midterm questions. Even if you don't, it shouldn't change answers much at all for the data we have.

Question: does shedding light mean to show a graph that you can look at it and say "yep, the low trap values look higher than the high trap values"
Answer: Yep, that's it exactly!

Question: For the 95% confidence intervals for each berry variety, I don't know how many degrees of freedom are needed to derive t*.
Answer: It's the giving season. Use the df in the estimate for s pooled which is n - (# of mean parameters).
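
A small Python sketch of that df count follows. It assumes an additive model with two factors (say variety and block), which is an assumption on my part and not stated above; the block count and sample size in the example are hypothetical too.

```python
def additive_two_factor_params(levels_a, levels_b):
    """Mean parameters in an additive two-factor model (assumed form):
    1 overall mean + (levels_a - 1) + (levels_b - 1)."""
    return 1 + (levels_a - 1) + (levels_b - 1)

def residual_df(n_obs, n_mean_params):
    """df for s pooled: n - (# of mean parameters)."""
    if n_mean_params >= n_obs:
        raise ValueError("need more observations than mean parameters")
    return n_obs - n_mean_params

# Hypothetical layout: 8 varieties, 4 blocks, one plot per combination.
n_params = additive_two_factor_params(levels_a=8, levels_b=4)
df = residual_df(n_obs=32, n_mean_params=n_params)
print(n_params, df)
```

Use the resulting df to look up t* in a t table.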

Question: For question 3, when you say compute confidence intervals, do you mean by hand, or can we give you the CI's we get from multiple comparisons analysis?
Answer: I want CI's for what I expect my strawberry yield to be on an average block. So, 1 CI per variety, along with the estimate itself. You may have to do part of this "by hand".

Question: Also for question 3, when you say "how would you modify your estimates in 2", which estimates are you referring to? Is it the CI's?
Answer: I'm referring to the estimated yield for each variety on an average block.

Question: What are the experimental units in the leafarea problem, individual samples or colonies? It seems that using a model that compares island effects for all the samples is inappropriate and that it really should compare only the means of the samples for each colony. Any thoughts?
Answer: Good question. It depends on whether you want to use these colonies to tell you about ant colonies in general, or you just want to know whether there is evidence that these particular colonies harvest different leaf sizes on average. Here the question is about these particular colonies; in fact, the islands were exhaustively surveyed for colonies. So here the experimental unit is the individual sample.