Art: From Kawarazaki Shodo, Take.

Project objectives

The purpose of the open-ended project is for you to use your statistical toolkit to address a real-world research question. Using a dataset of your choosing, identify an interesting hypothesis, identify appropriate statistical methods, carry out your analysis, and present your results in a reproducible report in a meaningful and accessible way. The goal of this project is for you to demonstrate proficiency in the techniques we have covered in this class (and beyond, if you like!).

Alternatively, you may instead create a slide deck that presents a statistical methodology related to survival analysis that we did not get to cover. In this case, your final deliverable will consist of a slide deck appropriate for a 30-minute lecture that introduces a motivating example/dataset, gives details of the methodology (no derivations needed), and presents results/R code from your motivating example. Additionally, provide a script/outline for presenting the lecture.

Logistics

The final project is due Friday, Dec. 13th.

Note that there is no grace period for the final manuscript, because the project serves as this course’s final exam and is on the university final exam schedule. Please use your time wisely!

Detailed report instructions

If you decide to take on the data analysis project, your report should contain the following components:

Introduction and data

This section includes an introduction to the project motivation, data, and research question. Describe the data and definitions of key variables.

Grading criteria

The research question and motivation are clearly stated in the introduction, including citations for the data source and any external research. The data are clearly described, including a description of how the data were originally collected and a concise definition of the variables relevant to understanding the report. The data cleaning process is clearly described, including any decisions made in the process (e.g., creating new variables, removing observations, etc.). If included, the exploratory data analysis helps the reader better understand the observations in the data along with interesting and relevant relationships between the variables.

Methodology

This section includes a brief description of your analysis process. Explain the reasoning for your approach. Additionally, show how you arrived at the final model by describing the model selection process, variable selection process, assessment of conditions and diagnostics if relevant, and any other relevant considerations that were part of the model fitting process.

Grading criteria

The analysis steps are appropriate for the data and research question. The student is thorough and reasonable in selecting the final approach, which is clearly described in the report. The model selection process was reasonable, and any violations in model conditions were discussed and/or fixed if appropriate. The model conditions and diagnostics are thoroughly and accurately assessed for the final model. If violations of model conditions are still present, there was a reasonable attempt to address the violations based on the course content.

Results

This is where you will output your results. Describe the key results from the analysis and show that you are proficient in using the model output to address the research questions, using the interpretations to support your conclusions. Focus on the variables that help you answer the research question and that provide relevant context for the reader.

Grading criteria

Results are clearly presented, and interesting findings from the analysis are clearly described. If relevant, interpretations of model coefficients are used to support the key findings and conclusions, rather than merely listing the interpretation of every model coefficient. If the primary modeling objective is prediction, the model’s predictive power is thoroughly assessed. If the primary objective is descriptive, the description of the results is strong and comprehensive.

Discussion

In this section you’ll include a summary of what you have learned about your research question along with statistical arguments supporting your conclusions. In addition, discuss the limitations of your analysis and provide suggestions on ways the analysis could be improved. Any potential issues pertaining to the reliability and validity of your data and appropriateness of the statistical analysis should also be discussed here. Lastly, this section will include ideas for future work.

Grading criteria

Overall conclusions from analysis are clearly described, and the model results are put into the larger context of the subject matter and original research question. There is thoughtful consideration of potential limitations of the data and/or analysis, and ideas for future work are clearly described.

Organization + formatting

This is an assessment of the overall presentation and formatting of the written report.

Grading criteria

The report is neatly written and organized with clear section headers and appropriately sized figures with informative labels. Numerical results are displayed with a reasonable number of digits, and all visualizations are neatly formatted. All citations and links are properly formatted. If there is an appendix, it is reasonably organized and easy for the reader to find relevant information. All code, warnings, and messages are suppressed. The main body of the written report (not including the appendix) is no longer than 8 pages.
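
As one possible way to meet the suppression and digits criteria, here is a minimal sketch of a setup chunk. It assumes the report is written in R Markdown or Quarto and rendered with knitr, which this page does not specify; adapt it to whatever tooling you actually use.

```r
# Hypothetical setup code for a report rendered with knitr (assumed tooling).
# Hide code, warnings, and messages in every chunk of the rendered report.
knitr::opts_chunk$set(
  echo = FALSE,
  warning = FALSE,
  message = FALSE
)

# Print numeric output with about 3 significant digits by default.
options(digits = 3)
```

Setting these options once near the top of the document avoids repeating them in every chunk; individual tables may still need explicit rounding if the default is not suitable.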

If you elect to use any AI-based tools, provide the entire transcript of your interactions. Failure to do so is a violation of the Duke Community Standard.