Resources
CONTACT INFO FOR JENISE
Email: jenise@stat.duke.edu
Phone: 684-3437 (voice mail is not functional)
Office: 219 Old Chemistry Building
Office hours: Tuesday and Thursday, 4:00-5:00 (or by appointment)
Course web page: Course materials (e.g. suggested problems,
course overview, office hours, etc.) will be available from the course
web page at
http://www.stat.duke.edu/courses/Spring00/sta110b/
Required text: Introductory Statistics for Business
and Economics, by Thomas H. Wonnacott and Ronald J. Wonnacott, 4th
edition
Topics
Topics: We will follow the text closely, although some
ideas may be included/left out/emphasized during lecture.
- Basic sampling ideas, common statistical terminology,
descriptive statistics
- Basic notions of probability, including common probability distributions
- Sampling and point estimators
- Confidence intervals
- Hypothesis testing
- ANOVA
- Linear and multiple regression
- Chi-square tests
Sections, homework
Sections:
- participation in computer labs is part of your grade
- you will not be penalized for missing two or fewer section meetings
- short quizzes will be administered weekly
- John Kern will be supervising all sections.
Homework:
- suggested readings/problems posted as we go along
- intended to help you gauge your progress and review the material
- will not be graded
Projects, exams
Projects:
- perform an analysis of the assigned data set and submit a write-up summarizing your findings
- assigned in the second half of the semester
- can be done in groups or individually
- guidelines for the analysis and write-up will be posted on the web
Exams:
- midterm will be given in lecture (date to be determined)
- final will be cumulative
- you will be allowed to use an 8.5'' x 11'' sheet of notes at these exams
Sampling
population: a whole set of things, people, etc. that we want to
describe or discover something about
The sample is a subset of the population that we are able to
observe, experiment on, etc.; can use it to infer facts about
the population. Use a sample when it would be difficult or impossible
to make observations about each member of the population.
A simple random sample is a special kind of sample. Each member
of the population has an equal chance to be part of the sample; whether
or not a member becomes a part of the sample is determined randomly.
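
A minimal sketch of drawing a simple random sample, using Python's standard random module. The population of ID numbers 1 through 500, the sample size of 25, and the helper name simple_random_sample are assumptions made for illustration only; they do not come from the course materials.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: ID numbers 1 through 500
population = list(range(1, 501))
sample = simple_random_sample(population, 25, seed=110)
print(sorted(sample))
```

Because the selection is left entirely to chance, no subset of the population is systematically favored.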
We may obtain a biased sample if certain subset(s) of the
population are overrepresented in the sample. Then the views of these
subset(s) may predominate and cause us to portray the actual
population incorrectly.
Descriptive vs. inferential statistics
deduction: With information about the population, we make
statements about what the sample is likely to look like
induction: Methods to allow us to use what we know about the
sample and try to generalize this information for the whole
population. We use the rules of probability to help us make sound
inferences. Also known as inference/inferential statistics.
descriptive statistics: Methods to summarize the information
that we know about a population or sample
Typical methodology for comparisons
- Determine the question of interest
- Identify the response to be measured or observed
- Identify the factor that you think may be causing the
difference, the treatment factor
- Define the treatment group and the control group
- Compare the groups to see if your hypothesis is consistent with
the results
Experimental design basics
Conduct a study to determine whether a relationship exists between variables.
- Compare the treatment and control groups to see if an
association exists
- If the treatment has no effect the two groups should have
the same response
- The two groups should be similar in all respects except for the
treatment, so that differences are due to the treatment
Treatment vs. control
We want to make sure that treatment and control groups are as similar as possible.
- Other factors could be contributing to the difference in
responses of the groups
- Placebos: ``fake'' treatment to make it impossible for
subjects to know they are controls
- Double-blind studies: designed so that neither the
subjects nor the experimenters who interact with them
know which are the controls
- Confounding variables: the difference in response is due to
some factor other than the treatment; this other factor is also
responsible for the fact that the subject is in the treatment group.
Controlled experiment
In a controlled experiment, the experimenter chooses who will receive treatment and who will not.
- For best results, choose people eligible for either, and then
randomly assign to treatment or control: a randomized controlled experiment
- Chances are the two groups will be similar in all respects except for the
treatment
- Avoid bias of the experimenters in assigning groups and self-selection of experimental subjects.
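
The random assignment described above can be sketched as follows. This is only an illustration: the subject labels, the even split into two groups, and the helper name randomize are assumptions, not part of the course materials.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the eligible subjects and split them into two groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the original list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # First half receives the treatment, the rest serve as controls
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of 20 eligible subjects
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
treatment, control = randomize(subjects, seed=110)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Neither the experimenter nor the subject decides who lands in which group, which is what guards against assignment bias and self-selection.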
What if we cannot choose whether someone will be treated or not?
Observational study
In an observational study, the experimenter cannot choose who will
receive treatment and who will not.
- Possible that the ``treatment'' variable to be studied is a
condition or a behavior that cannot be changed or forced
- More difficult to establish similar treatment and control groups
- Can only establish association, but not causation
Historical controls
It's sometimes necessary to compare a treatment group in the present
with a control group from the past
- Conditions in the past are often not the same as those in the
present (e.g. improved medical procedures, changes in
environment and social structure)
- Unable to observe those patients from the past to re-evaluate
- If possible, it's best to compare contemporaneous groups
A few more mistakes that can hamper your analysis
- Failure to identify specifically what variable you're interested
in (ex: true interest is a proportion, but you're presented with raw numbers)
- Reviewing only a portion of a much longer trend (ex: seasonal
temperature changes)
- Other assorted factors (ex: increase in salaries in the US over
the period from 1960-1980, measurement/survey methods have changed
over time, etc.)
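
As a small illustration of the first mistake above, the sketch below converts raw counts into rates per 100,000 people. All numbers are made up for a hypothetical town, and the helper name rate_per_100k is an assumption; the point is only that a count can rise while the rate falls.

```python
def rate_per_100k(count, population):
    """Express a raw count as a rate per 100,000 members of the population."""
    return count * 100_000 / population

# Hypothetical town: event counts and population in two different years
early_count, early_pop = 120, 1_000_000
late_count, late_pop = 138, 1_200_000

early_rate = rate_per_100k(early_count, early_pop)
late_rate = rate_per_100k(late_count, late_pop)

count_change = (late_count - early_count) / early_count * 100
print(f"count change: {count_change:+.1f}%")
print(f"rate per 100,000: {early_rate:.1f} then {late_rate:.1f}")
```

Here the count grows by 15%, yet the rate per 100,000 drops because the population grew faster than the count.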
Example
In the U.S. in 1985, 19,893 people were murdered, compared
to 16,848 in 1970 - nearly a 20% increase. ``These figures show
that the U.S. became a more violent society over the period
1970-1985.'' True or false, and explain briefly.
Example
One of the leading causes of death in the U.S. is coronary artery
disease, in which the main arteries to the heart break down. This
disease can be treated with coronary bypass surgery. In one of the
first trials of the operation, Dr. Daniel Ullyot and associates
performed coronary bypass surgery on a test group of patients; 98%
survived 3 years or more. Previous studies showed that only 68% of
the patients getting conventional treatment survived 3 years or more.
(The conventional treatment used drugs and special diets to reduce
blood pressure and eliminate fatty deposits in the arteries.) A
newspaper article described Ullyot's results as ``spectacular''.
Comment briefly.