Union Carbide's questions are not going to yield accurate descriptions of people's opinions.
The sentences before the tax credits question frame tax credits as purely positive; they suggest no downside to tax credits. Of course people will respond favorably to policies that have only benefits! A fair question would inform respondents about both the benefits and the drawbacks of tax credits before asking their opinions.
The question about environmental regulations is phrased unrealistically. Almost everyone would favor regulations that increase productivity and protect the environment. It is more likely that changing some regulations would adversely affect either the environment or productivity. A direct question on the policy of interest, without propaganda, would provide more accurate information.
The probable objective of Union Carbide is to
publish results from their "survey" that support tax credits, which will
boost their profits, and support changing certain environmental regulations
that they label as beneficial for the environment and business.
These are classic examples of leading questions whose purpose is to advance
the sponsor's agenda. The insidious part of all this is that
sponsors of leading questions are likely to portray their results as "scientific"
by reporting only a smoothed-over summary rather than the actual
results. For example, Union Carbide might claim something like,
"Our survey results showed overwhelming support for tax credits and for
changing regulations on the use of mercury."
2. Problematic questions?
i) How many soft drinks did you consume in the past week?
This question might confuse people. "Soft drinks" means different things to different people: some might include sodas only, while others might also include fruit and vegetable juices, seltzer water, and other non-alcoholic beverages. Additionally, the phrase "past week" is somewhat vague. When exactly should the person start counting days to get the "past week"?
These problems can be fixed by clarifying the objective of the question. If the question is asked by a soda company, it may want to specify soda only. If the question is asked by a nutritionist, she may want to ask about each type of beverage in a separate question.
ii) What kind of health insurance plan do you have: a staff model health maintenance organization, an IPA, a PPO, or an unrestricted fee-for-service health plan?
Most people do not know the definitions of these types of health care plans, so that they do not have the information needed to answer the question. One way to solve this problem is to preface this question with a description of each type of plan, thereby giving respondents the information they need. Another way is to ask a series of questions, each of which describes some aspect particular to each plan. Then, the researcher can figure out what plan the respondent has based on his answers.
iii) Have you ever cheated on an exam during
college?
Clearly, few people are going to admit to this,
especially if they think someone with authority over their employment or
grades can figure out who completed the questionnaire! When
asking sensitive questions, researchers need to take many steps to ensure
confidentiality. Even then, it still may be difficult to get accurate
information.
iv) Do you favor or oppose gun control laws?
"Gun control laws" is too broad and needs to
be refined. Does this mean favoring bans on automatic assault rifles?
Does it mean favoring mandatory waiting periods for people seeking to buy
guns? Does it mean banning the sale of all guns? People will
interpret this question differently, so that results may not represent
much of anything.
v) Would you like to be rich and famous?
You'd better believe it!!! Well, some people want to be rich but not famous, and some want to be famous but not rich. How do these people answer the question? They cannot, because there is no option for them in this double-barreled question.
(P.S. If you answered yes, don't be
a professor....)
vi) How often do you feel tired during
the day--always, usually, sometimes, rarely, or never?
This is a hard question to answer. First, people have different definitions of tired. Some might interpret it as falling asleep, and others might interpret it as yawning. Effectively, such people will be answering different questions. Second, most people's level of tiredness varies from day to day: a day with three straight classes is a lot tougher than a day with zero classes! The question doesn't tell respondents which day to choose.
This question probably can be improved by refining the objective. If the objective is to learn about people's sleeping habits on weekdays, then a direct question to that effect can be asked. If the objective is to find out about people's level of activity during the week, then a series of questions about various activities can be asked.
3. Survey, randomized experiment, or observational study?
i) It is a survey. The biologist seeks to describe the target population of all fish in the river by sampling a proportion of them. It is hard to conceive of a sampling frame for this study, since it is not possible to inventory all fish in the river. One way to pick a random sample is to pick sections of the river randomly, then throw a net into the river to collect fish in the section.
ii) This is an example of a randomized experiment. It is a causal study, because eating a high- or low-fiber diet can be considered a treatment: one can either follow or not follow a high-fiber diet. The response is occurrence of heart disease. Treatments are randomized, i.e., the patients are randomly assigned to the low- or high-fiber treatment.
iii) This is a survey. We seek to describe three types of mutual funds, and use these descriptions to compare them. This is not a causal study, because there are no treatments. Each fund's type is a characteristic of the fund, like each person's race is a characteristic of the person. Fund type, like race, is not potentially manipulable. These characteristics of the funds are inherent and cannot be changed.
The target population in this survey is all mutual funds in the small company growth fund, mid-size company growth fund, and large company growth fund categories. To get an appropriate sampling frame, one can compile a list of all possible mutual funds' names from a company like Morningstar Investment Magazine or from looking at the Mutual Fund section of the Wall Street Journal. To take a random sample, give each fund a number and then randomly pick a set of numbers using Minitab or some other software capable of giving random numbers. The random sample contains the mutual funds whose numbers were selected by the random selector.
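The numbering-and-drawing step described above can be sketched in Python as a stand-in for Minitab. This is a minimal sketch: the 200 placeholder fund names and the sample size of 20 are made up for illustration, and a real frame would hold the compiled list of fund names.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Number the units 1..N, draw n distinct numbers at random, and return those units."""
    rng = random.Random(seed)  # seeding makes the draw reproducible
    numbers = rng.sample(range(1, len(frame) + 1), n)  # sampling without replacement
    return [frame[k - 1] for k in sorted(numbers)]

# Hypothetical sampling frame: placeholder names standing in for real fund names.
frame = [f"Fund {i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 20, seed=1)
print(sample)
```

Because every set of 20 funds is equally likely to be drawn, this gives a simple random sample from the frame; it cannot, however, make up for funds that are missing from the frame itself.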
iv) This is an observational study. First of all, it can be seen that it is a causal study, where high weight gain in pregnancy and low weight gain in pregnancy are the treatments. That is, we can conceive of manipulating the amount of weight gain, so it is a treatment. The response is the weight of the newborn baby.
However, in this study we did not (and ethically cannot) randomly assign the women to high or low weight gain. Since there is no random assignment, but there are treatments we want to compare, this is an observational study.
To get valid estimates of the causal effect of high/low weight gain on babies' birth weight, we would try to construct a group of high weight gain mothers that looks as similar as possible to the group of low weight gain mothers on all background characteristics, then we would compare the average birth weights of the babies in the two groups. If we cannot get adequate balance of the background characteristics in the two groups, we have to throw up our hands and admit that we cannot assess this causal question.
(Note: Causal studies where treatments are randomized are called "randomized experiments," and causal studies where we do not randomly assign subjects to treatments are observational studies.)
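One common way to check whether the two groups of mothers in iv) "look similar" on a background characteristic is the standardized difference in means: the difference in group means divided by a pooled standard deviation. The sketch below assumes a single covariate (maternal age) and uses made-up ages purely for illustration; values near zero indicate good balance on that covariate.

```python
from math import sqrt
from statistics import mean, variance

def standardized_difference(group_a, group_b):
    """Difference in means, divided by the pooled standard deviation of the two groups."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Made-up maternal ages (years), for illustration only.
high_gain_ages = [24, 27, 31, 22, 29, 26, 33, 28]
low_gain_ages = [25, 30, 28, 23, 27, 32, 26, 29]

smd = standardized_difference(high_gain_ages, low_gain_ages)
print(f"standardized difference in age: {smd:.2f}")
```

Balance on age alone is not enough: the same check would be repeated for each background characteristic before comparing average birth weights.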
v) This is a survey, because there is no treatment in the study. Fitness level cannot be considered a treatment because, as discussed earlier in part iii), it is an inherent characteristic of the subject.
On the other hand, if we wanted to compare people who had participated in some high-energy fitness program with people who had not participated in the program, then there are treatments: attending versus not attending the fitness program. This would be a causal study. If the program attendance was randomly assigned, it would be a randomized experiment. If the program attendance was not randomly assigned (e.g., people voluntarily entered the programs), then it would be an observational study.
4. Keep me single and kid-free!!
We don't believe that roughly 70% of people in America in 1976 wished they hadn't had children. The sample is not a random sample from the entire U.S. population and likely fails to reflect the characteristics of the U.S. population. Evidence of this includes:
(1) The percentage of women in the respondents (80%) is very high compared to the percentage of women in the U.S. population (around 50%). Men are likely to have different opinions than women on this issue, since men do not give birth and are in many families not primarily responsible for raising the children.
(2) People responded voluntarily. Perhaps they did so because they are passionate about this issue. Such people are likely to feel the need to express opinions that run counter to prevailing wisdom, which in this case is that "children are a blessing." This is a classic example of the problems with voluntary response sampling.
(3) Only people who read Ann Landers's column
could possibly answer the survey. Ann Landers's readers are not necessarily
representative of the U.S. population. The opinions of people who
do not read her column are just as important as those who do. This
is an example of frame coverage bias.
5. Give me a new computer.
We should not rely on the results of the survey.
This is again a case of voluntary response sampling, in which readers respond because they feel strongly about the issue. People with computer problems are more likely to complain about these problems, and the PC World survey provides a perfect outlet to complain. People who don't have any problems are less likely to feel the need to tell anyone so. This potential bias is compounded by the incentive offered by PC World: people who respond are entered into a contest to win a new PC. If you're having problems with your PC, you want a new one and so will respond to the survey!
There is also frame coverage bias. The study
has a target population of all PC owners. However, the study is based
only on the opinions of PC owners who subscribe to PC World.
This may not be the same population as all PC owners.
6. Potential problems with surveys.
1) When collecting responses, the interviewer should not dramatically affect the responses. Keeping this in mind, local police are not the best people to interview others about local policing. Respondents are unlikely to tell a police officer that they don't like the police!! Honest opinions about this issue are hard to obtain with this design.
2) This is a type of convenience sampling, because the forester is sampling from the part of the forest that is near (hence convenient) and not from the entire forest. Areas close to the research station may be systematically different than areas far from the station. For example, the presence of humans near the research station may impact the diversity of species in the forest areas close to the station, in which case data from these convenient areas are not representative of the entire forest.
3) This is a mess!! There is convenience sampling and judgment sampling by the student government, mixed in with voluntary response by potential respondents. People who go to the Bryan Center are likely to be different than people who do not go there. For example, they may be more active in extracurriculars than those who do not go there. Thus, sampling only people who go to the Bryan Center may lead to unrepresentative samples. Further, interviewers sitting at tables may try to convince people who look "friendly" to stop by and complete the questionnaire. Such people may hold similar opinions, and these opinions may differ from those of people who don't look as friendly. Finally, only people who have strong opinions on this issue are likely to stop and fill out the questionnaire, thus leading to further unrepresentativeness.
4) This design suffers from frame coverage bias, because the company wants to estimate the proportion of potential buyers in the entire American population but is sampling from its current customers. Given that current customers have already decided to buy an older version of the product, these people may be more likely to buy the new version than those who had never bought the product before.
5) This is a type of voluntary response sampling, where people respond to the survey because they have strong opinions on Social Security. Also, the nature of the TV program on Social Security could influence responses. For example, if the program portrays Bush's plans in a very positive light, more respondents are likely to support the plan.
7. Identifying study flaws
i). The implied treatments in this problem are "use the new screening procedure" and "use the old screening procedure". The response is the number of patients' deaths in the emergency room.
The main problem with this design is that no group of patients is assigned to the old screening procedure during the year the new procedure is introduced. Instead, the comparison group is patients from a previous year. We have no assurance that these two groups of people look similar on background characteristics. For example, perhaps last year the patients were sicker than the patients this year (e.g., a bad economy last year caused intense stress which led to more life-ending heart attacks). Or, perhaps some other change in the emergency room (e.g., new doctors or new medical technology) other than the screening procedure is reducing death rates. Unless we randomly assign some people (or emergency rooms) to use the new procedure and others to use the old procedure, we cannot be sure that the reduction in death rates is caused mainly by the new procedure.
ii). The treatments are high and low error rates in passages. The response is the number of errors recognized by a person when proofreading these passages.
This is a flawed design because the ordering of the passages is confounded with the effects of the different error rates. For example, say readers get tired of proofreading by the time they get to the second passage, so that they are prone to miss more errors when reading the second passage. Since every person reads the high error rate passage first and the low error rate passage second, this tiredness effect would make people's performance with the low error rate look worse than it really is.
To fix this design, two random groups can be created, so that one group reads the high error rate passage and one reads the low error rate passage. Then, the average scores in the two groups are compared. Or, one could randomize the order that people read the passages, so that each treatment has an equal chance of going first.
Another potential issue with this design is that results on this particular passage may not be generalizable to other reading materials. Whether this is a problem would depend on the nature of the passage and the specific causal question of interest.
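The second fix in ii), randomizing which passage each person reads first, can be sketched as below. This is an illustrative sketch with hypothetical reader IDs; it uses a balanced version of the idea, sending a randomly chosen half of the readers to each order so that the two orders are used equally often.

```python
import random

def assign_passage_orders(readers, seed=None):
    """Randomly send half the readers to high-error-first and the rest to low-error-first."""
    rng = random.Random(seed)
    shuffled = list(readers)
    rng.shuffle(shuffled)  # random permutation decides who gets which order
    half = len(shuffled) // 2
    orders = {}
    for i, reader in enumerate(shuffled):
        if i < half:
            orders[reader] = ("high-error", "low-error")
        else:
            orders[reader] = ("low-error", "high-error")
    return orders

readers = [f"reader{i}" for i in range(1, 11)]  # hypothetical participant IDs
orders = assign_passage_orders(readers, seed=7)
for reader in readers:
    print(reader, orders[reader])
```

With the order randomized, any tiredness effect falls on both error rates alike instead of only on the low-error passage.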
iii). These data come from two completely different target populations. Those who went to see the neurologist are probably wealthier and actively sought out the doctor. Such people tend to have more education, which explains why the studies using neurology data find a strong link between education and AD. Those people in the community survey are likely to have a wider range of wealth and education. There may be an entirely different relationship between education and AD in the community population. Thus, it is not surprising that the studies seem to contradict each other with regard to the link between education and AD.
iv) The target population is all old people. By using only the people who responded to all four interviews, the sampled population in 1990 no longer represents this target population. Specifically, the sample does not represent people who would not answer all four interviews! Such people may have nursing home admission rates that differ from those of people who did respond to all four interviews. For example, the people who survived all four years are likely to represent the healthiest segment of old people, and this segment does not enter nursing homes as much as the less healthy segment does. Thus, by using only people who answered all four years, we likely are underestimating the true percentage of people who enter nursing homes.
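A small calculation shows how keeping only people who answered all four interviews can pull the estimate down. The group sizes, admission rates, and completion rates below are hypothetical, chosen only so that frailer people both enter nursing homes more often and complete the interviews less often, as argued above.

```python
def admission_rate(groups, completers_only=False):
    """groups: list of (size, admission_rate, completion_rate) for each health group."""
    total = 0.0
    admitted = 0.0
    for size, p_admit, p_complete in groups:
        # Keep everyone, or only the expected number who complete all interviews.
        n = size * p_complete if completers_only else size
        total += n
        admitted += n * p_admit
    return admitted / total

# Hypothetical numbers: 800 healthier people and 200 frailer people.
groups = [(800, 0.05, 0.90), (200, 0.40, 0.50)]
print(f"everyone:        {admission_rate(groups):.3f}")  # 0.120
print(f"completers only: {admission_rate(groups, completers_only=True):.3f}")  # 0.093
```

The completers-only rate is lower because the frailer group, which enters nursing homes most often, is underrepresented among completers.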
8. Clean Experimental Designs
Recall the essential principles of design of experiments:
(i) The units should be randomly assigned to the
treatments, so that the groups to be compared look similar in characteristics
that may affect the response, apart from the difference in treatment.
(ii) The study should have realistic conditions
and be generalizable to populations other than the study population.
(iii) When possible, studies should be
double blind.
(iv) There should be no noncompliance,
no interference between units, and no order effects.
Now the solutions:
(a) The conditions used to obtain the data on standard detergent are likely to be different than the conditions used in the current washes. For example, perhaps the laboratory results were obtained using a machine that is not as efficient as the new machine. Following Squeeky Pete's design, we would have no clue whether the washes with Sparkle Clean are better (or worse) due to the detergent or due to the difference in machines. We should strive to compare the brands under identical conditions. Hence this is not a good design.
(b) This design is almost OK, but it has one problem. By washing all pieces for each detergent together, we only have one washing for each detergent type. Effectively, the response variable is the amount of stain removed during one wash on all eight pieces. Having only one observation per treatment group does not give as much information as running eight separate washes under each detergent. In other words, this design wastes resources.
(c) The experiment is carried out under completely unrealistic conditions. The effects of the detergents may change completely under the normal, moderate conditions of regular washing. Hence we should not accept this design.
(d) This is even worse than the first suggestion. The detergents may not be compared under identical conditions. Plus, it would be unwise to use data from the company that makes Sparkle Clean when we want to evaluate Sparkle Clean independently!!
(e) At first sight this may look like a reasonable
design; however, the consecutive washes could introduce an order effect
into the comparisons of the treatments. For example, perhaps the
machine works less efficiently after several washes. This would make
Sparkle Clean look less effective, which is unfair since it is the machine
that is failing and not Sparkle Clean. Hence this design is also
not good!
A reasonable design is to line up all 16 pieces
of cloth, then randomly assign eight to get Sparkle Clean and eight to
get the standard detergent. Then, wash them under normal conditions
separately in the order that they were lined up originally. This
randomizes the order of applying the detergent, which eliminates the order
effect.
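The randomization step in this design can be sketched as follows. This is a minimal sketch under the assumption that the 16 pieces are identified only by their position in the line-up; the seed is arbitrary and fixed only so the schedule can be reproduced.

```python
import random

def assign_detergents(n_pieces=16, seed=None):
    """Randomly assign half the lined-up pieces to each detergent; wash in line-up order."""
    rng = random.Random(seed)
    half = n_pieces // 2
    labels = ["Sparkle Clean"] * half + ["standard"] * (n_pieces - half)
    rng.shuffle(labels)  # every 8/8 split is equally likely
    # Position i in the line-up is washed i-th, so the detergent order is random too.
    return list(enumerate(labels, start=1))

schedule = assign_detergents(seed=3)
for position, detergent in schedule:
    print(position, detergent)
```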
9. Does Preventing Artery Clog Prevent Memory Loss?
We shouldn't be confident that there is a causal relationship between atherosclerosis and memory loss. The group of people with low atherosclerosis had less memory loss than the people with high atherosclerosis, but they also are likely to have different background characteristics. For example, the people with high atherosclerosis may have been older and hence more prone to memory loss, or may have a gene or live in an environment that predisposes them to having several health problems, or may lead an inactive life that results in both atherosclerosis and memory loss.
The point is that there may be many background characteristics that differ in the group that has high atherosclerosis versus the group that has low atherosclerosis. These background differences, rather than atherosclerosis levels, could explain memory loss. Hence controlling atherosclerosis alone may not control the memory loss.
To assess this question, we would need to do an observational study. We would try to construct a group of high atherosclerosis people and a group of low atherosclerosis people who had similar background characteristics on as many variables as we can collect that might be related to memory loss. Then, if we can achieve balance on these variables in the groups, we would be able to assess the effect of high versus low atherosclerosis on memory loss.
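A short simulation makes the confounding argument concrete. In this made-up model, age raises both the chance of high atherosclerosis and the amount of memory loss, while atherosclerosis itself has no effect on memory loss; the high-atherosclerosis group still shows more memory loss on average. The age range, effect sizes, and noise level are assumptions chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(2024)  # fixed seed so the simulation is reproducible

high_group, low_group = [], []
for _ in range(10_000):
    age = rng.uniform(60, 90)
    # Older people are more likely to have high atherosclerosis...
    high_athero = rng.random() < (age - 60) / 30
    # ...and memory loss depends on age only, not on atherosclerosis.
    memory_loss = 0.5 * (age - 60) + rng.gauss(0, 3)
    (high_group if high_athero else low_group).append(memory_loss)

print(f"mean memory loss, high atherosclerosis: {mean(high_group):.2f}")
print(f"mean memory loss, low atherosclerosis:  {mean(low_group):.2f}")
```

The gap between the two groups comes entirely from age, which is why the groups would need similar background characteristics before any difference could be attributed to atherosclerosis.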
10. More questions on study design
a) There are two potential problems with this survey. First, the sampling frame is the 66,000 dentists who subscribe to the magazines. These dentists may not be representative of all dentists. For example, by subscribing to the magazines, they may be more interested in using dental products than dentists who do not subscribe. More importantly, not everyone responded to the survey. It is likely that only dentists with strong feelings about Ipana responded. For example, perhaps those who really like Ipana and have some brand loyalty to it were more likely to respond.
b) The residents' increased happiness could be caused by the visits rather than the tea. There is no control group here; only visits with tea are made. A better design would randomly assign visits with tea and visits without tea.
c) Assuming compliance with the study design, this appears to be a valid study. Random assignment of two treatments and blinding were used.
d) This survey could give misleading results because the samples may not be representative. For example, the students at the International House social event might be more up to speed with international affairs than those students who do not wish to interact with other international students at such parties. Or, the students exiting large classes are more likely to be first-year students or sophomores, who may be more or less knowledgeable about international affairs than others.
e) The problem with this study is that only businesses that survived are examined. To determine the differences between businesses that survive and those that fail, one needs to examine the businesses that fail as well. For example, if family-owned businesses make up as large a share of the businesses that fail (say, 80%) as they do of the businesses that survive, then there is no relationship between family ownership and survival.
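The point in e) can be checked with a few hypothetical counts in which family-owned businesses make up 80% of the survivors and also 80% of the failures. Looking only at survivors, family ownership seems dominant, but the survival rates turn out identical once failures are included. The counts are made up for illustration.

```python
def survival_rates(counts):
    """Survival rate for each ownership type, from counts keyed by (owner_type, outcome)."""
    rates = {}
    for owner in sorted({owner for owner, _ in counts}):
        survived = counts.get((owner, "survived"), 0)
        failed = counts.get((owner, "failed"), 0)
        rates[owner] = survived / (survived + failed)
    return rates

# Hypothetical counts: 80% of survivors and 80% of failures are family owned.
counts = {
    ("family", "survived"): 80,
    ("other", "survived"): 20,
    ("family", "failed"): 320,
    ("other", "failed"): 80,
}
print(survival_rates(counts))  # {'family': 0.2, 'other': 0.2}
```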
f) Only people with strong opinions are likely to send a letter to their Congressperson. Hence, these people are not representative of all constituents.
g) By visiting homes only during the week, the surveyors miss households in which no one is home at those times. These households may have less time to bake bread than households where someone is home, so the estimated percentage is likely too high.
h) Eating lobster is associated with having high income, and high income is associated with better health outcomes. Hence, eating lobster may not cause better birth outcomes, rather it is a signal for families with higher income.
i) This study is well-designed for the population of volunteers in the study. Conclusions drawn from the experiment should be valid for them because of the randomization and blinding. However, the researchers should be wary of extending these conclusions to broader populations. Addicts may differ in health from non-addicts, so the vaccine may have a different effect for non-addicts.
j) Just because one is asked about bad hair and
then gets in a bad mood, it does not mean that bad hair causes low self-esteem.
The study conditions do not really address the question of bad hair versus
bad mood.