STA114: Statistics

Optional Term Project

Due: 7pm Wednesday, May 2, 2001

Students who wish may replace the final exam with a Final Project in which they apply statistical methods from this course to a real data set in order to solve a scientific problem. Each project must include:

Projects must demonstrate mastery of a range of statistical ideas; a routine binomial analysis of survey data would not be appropriate. The project takes the place of a 3-hour comprehensive final exam and is not any easier than studying for and taking the final: you must show just as much depth and breadth of statistical knowledge in a project as you would have in an exam. Most projects will involve model building and model elaboration, and computing and displaying posterior distributions for quantities of interest, often using regression and linear models (you may want to read ahead a bit).

Substantial use of methods from outside this course (for example, from econometrics or environmetrics courses) is discouraged, since your goal is to show mastery of ideas from this course. Statistical methods not covered in the class syllabus must be carefully referenced and explained to demonstrate that the analysis is original.

Team projects are possible, with a maximum of three team members, but will have to be substantially deeper (and a bit longer) than individual projects and must show each participant's specific contribution in detail.

One source of data sets is the book A Handbook of Small Data Sets by Hand et al. The book's 510 data sets are described only in the book itself (you can borrow my copy in my office and xerox whatever data sets you like), but the data sets themselves (just numbers, no stories) are on-line. You can get to them by following the Data link from the Home or Syllabus pages and then taking the Hand et al. link from there. A similar collection of 100 data sets appears in the book Data by Andrews and Herzberg; this one (also on-line) includes some famous data sets, like the Stanford Heart Transplant data we've looked at in class and the 1875-1894 Deaths by Horsekicks in the Prussian Army data (they follow the Poisson distribution, probably too well, making some people suspect that outliers were altered or removed). There are lots of other data sets available on-line too; start with the class web page, or use a search engine and have fun. CMU's StatLib and its Data & Story archive are especially good places to start.
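As a rough illustration of what "follows the Poisson distribution" means for the Horsekicks data, here is a minimal Python sketch that compares observed counts with fitted Poisson counts. The counts in it are an assumption: they are the widely quoted 200 corps-year summary of the horsekick deaths, not numbers taken from this page or checked against the Andrews and Herzberg file, so verify them against the on-line data before relying on them. The function name poisson_fit is made up for this example.

```python
import math

# ASSUMED data: the commonly quoted summary of the Prussian horsekick deaths
# (number of deaths per corps-year -> number of corps-years, 200 in total).
# Check these against the on-line data set before using them in a project.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_fit(counts):
    """Fit a Poisson mean by the sample mean and compare with the counts.

    Returns the fitted mean, the expected counts (with the largest
    category treated as 'that many or more'), and Pearson's chi-square.
    """
    n = sum(counts.values())
    lam = sum(k * c for k, c in counts.items()) / n
    kmax = max(counts)
    expected = {k: n * poisson_pmf(k, lam) for k in range(kmax)}
    expected[kmax] = n - sum(expected.values())  # lump the upper tail together
    chi2 = sum((counts.get(k, 0) - e) ** 2 / e for k, e in expected.items())
    return lam, expected, chi2


lam, expected, chi2 = poisson_fit(observed)
print("fitted mean deaths per corps-year:", round(lam, 3))
for k in sorted(expected):
    print(k, observed.get(k, 0), round(expected[k], 1))
print("Pearson chi-square:", round(chi2, 2))
```

A small chi-square relative to its degrees of freedom is what "fits probably too well" would look like. A project would of course go well beyond this, for example by putting a prior on the Poisson mean and displaying its posterior distribution.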

While you're welcome to use your own data, it's probably not worthwhile to go collecting data only for this project (it takes too much time to do well). On the other hand, if you already have data from your ongoing research, coursework in another class, hobbies, etc., especially something you know and care about, feel free to use that data set for this project.

Ask by e-mail or in person if you have additional questions. You can find me before or after class, during my office hours, or at other times when I'm not teaching or away. I'm also happy to look over outlines or drafts and give you some feedback and suggestions UNTIL THE LAST WEEK OF CLASS. Sorry, but I will have little if any time during reading and exam weeks, so please start your projects early if you would like some feedback or help on them.