Home Page | Syllabus | Computing | Data | Proj | ACES (STA114, MTH136) |

Students who wish to may replace the final exam by a Final Project in which they apply statistical methods from this course to a real data set in order to solve a scientific problem. Each project must include:

- a description of a **scientific question** being addressed;
- a description of some **data set** taken in the hope of answering that question;
- a description of the **statistical methods** you used to help illuminate the evidence offered by the data;
- some **critical analysis** of the statistical model you used. Graphical methods are especially useful here: scatter plots, histograms, residual plots, etc. will be helpful. If your analysis involved regression, did you transform one or more of the variables? How? Why? Did you have to include a quadratic term? How did you handle your **variable selection** problem? Are you satisfied that the assumptions of linearity, equality of variance, and approximate normality are satisfied? Why?
- and the **conclusions** that your analysis helps you to draw, *in the context of the original scientific question*.
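As one illustration of the kind of critical analysis asked for above, here is a minimal sketch that fits a straight line to some data and then checks the residuals for leftover curvature, which would suggest a quadratic term or a transformation. The data are synthetic and made up for this example, and the helper names `fit_line` and `correlation` are hypothetical; in a real project you would also draw the residual plot itself.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def correlation(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    ubar, vbar = statistics.fmean(u), statistics.fmean(v)
    suv = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    suu = sum((ui - ubar) ** 2 for ui in u)
    svv = sum((vi - vbar) ** 2 for vi in v)
    return suv / (suu * svv) ** 0.5

# Synthetic data (made up for illustration): a curved trend plus noise.
rng = random.Random(114)
x = [i / 2 for i in range(21)]  # 0.0, 0.5, ..., 10.0
y = [1.0 + 0.5 * xi + 0.3 * xi ** 2 + rng.gauss(0, 0.5) for xi in x]

a, b = fit_line(x, y)
res = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# If a straight line were adequate, the residuals would show no pattern.
# Correlating them with the squared, centered predictor is one crude
# numerical stand-in for eyeballing a residual plot.
xbar = statistics.fmean(x)
curvature = [(xi - xbar) ** 2 for xi in x]
r_curv = correlation(res, curvature)

print(f"fitted line: y = {a:.2f} + {b:.2f} x")
print(f"correlation of residuals with (x - xbar)^2: {r_curv:.2f}")
```

A correlation far from zero here is a hint, not a proof, that the linear model is missing curvature; the paper should still show the plot and explain what was done about it.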

Projects:

- Must include full references for the data set used and for any statistical techniques used that were not taught in this class;
- Will be judged on the **statistical** sophistication and insight they show (no bonus for time spent collecting your own data). This assignment replaces a cumulative final exam, so it should show your mastery of this entire course;
- Must be between 5 and 10 pages long. Computers should be used, but the project should be a **paper** and not just computer output: include only the relevant plots or tables, and describe in your own words what light they shed on the scientific problem at hand;
- Are due any time up to the beginning of the scheduled final exam (7pm Wednesday May 2).

Projects must demonstrate mastery of a range of statistical ideas; routine
binomial analysis of survey data would *not* be appropriate. The
project takes the place of a 3-hour comprehensive final exam and is
**not any easier** than studying for and taking the final; you must
show just as much depth and breadth of statistical knowledge in a
project as you would have in an exam. Most projects will involve model
building and model elaboration, computing and displaying posterior
distributions for quantities of interest, often using regression and
linear models (you may want to read ahead a bit).

Substantial use of methods from outside this course (for example, from
econometrics or environmetrics courses) is discouraged, since your goal is to
show mastery of ideas from *this* course. Statistical methods
*not* covered in the class syllabus must be carefully referenced and
explained to demonstrate that the analysis is original.

Team projects are possible, with a maximum of three team members, but will have to be substantially deeper (and a bit longer) than individual projects and must show each participant's specific contribution in detail.

One source of data sets is the book *A Handbook of Small Data Sets*
by Hand *et al.* While the book's 510 data sets are only
**described** in the book itself (you can borrow my copy in my
office, and xerox a copy of whatever data sets you like), the data sets
(just numbers, no stories) are on-line. You can get to them by
following the Data link from the Home or Syllabus
pages, then taking the Hand *et al.*
link from there. A similar collection of 100 data sets appears in the
book *Data* by Andrews and Herzberg; this one (also on-line) includes some famous datasets, like
the Stanford Heart Transplant data we've looked at in class and the
1875-1894 Deaths by Horsekicks in the Prussian Army data (they follow
the Poisson distribution, probably *too* well, making some people
suspect that outliers were altered or removed). There are lots of
other data sets available on-line too; start with the class web page, or
use a search engine and have fun. CMU's StatLib and its Data & Story archive are
especially good places to start.
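To make the remark about the horsekick data concrete, here is a rough sketch comparing observed counts with those expected under a fitted Poisson distribution. It assumes the frequency table as it is commonly reproduced in textbooks (ten corps over twenty years, 200 corps-years); the copy in the on-line collection may be organized differently, so check the numbers against the source before using them. The function name `poisson_pmf` is just a helper for this example.

```python
import math

# Assumption: the widely quoted summary table of deaths per corps-year
# (ten corps x twenty years = 200 corps-years). Verify against the
# on-line data before relying on it.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(observed.values())
total_deaths = sum(k * count for k, count in observed.items())
lam = total_deaths / n  # maximum-likelihood estimate of the Poisson mean

def poisson_pmf(k, lam):
    """Probability of exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

expected = {k: n * poisson_pmf(k, lam) for k in observed}

# Crude Pearson chi-square over the listed cells only. The tail cell
# (5 or more deaths) is ignored and the small cells are not pooled,
# so treat this as a rough indication rather than a formal test.
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

for k in observed:
    print(f"{k} deaths: observed {observed[k]:3d}, expected {expected[k]:6.1f}")
print(f"estimated mean = {lam:.2f}, chi-square = {chi2:.2f}")
```

A chi-square statistic that is small relative to its degrees of freedom is what a suspiciously good fit looks like; whether it is *too* good is exactly the kind of question a project could examine more carefully.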

While you're welcome to use your own data, it's probably not worthwhile to go collecting data only for this project (it takes too much time to do well). On the other hand, if you already have data from your ongoing research, coursework in another class, hobbies, etc., especially something you know and care about, feel free to use that dataset for this project.

Ask by e-mail (*wolpert@stat.duke.edu*) or in person if you have additional
questions. You can find me before or after class, during my office hours, or at
other times when I'm not teaching or away. I'm also happy to look over outlines or
drafts and give you some feedback and suggestions UNTIL THE LAST WEEK OF
CLASS. Sorry, but I will have little if any time during reading and exam
weeks; please start your projects early if you would like some feedback or
help on them.