Project


Background

The project in this class represents an opportunity for you to tackle an open-ended statistical analysis that addresses a specific research question. The goal of this project is for you to demonstrate proficiency in the techniques we have covered in this class and to apply them to a complex dataset in a meaningful and appropriate way. All analyses must be done in RStudio and written up using R Markdown.

You should write as if you are explaining your results to someone who would be interested in your research question, whether this is another scholar in your field or a peer who shares your interest in the topic. Keep in mind that this audience may or may not have taken statistics: you must be statistically accurate and use correct statistical terminology, but you must also explain your conclusions in a way that a layperson can understand.



Template Files

To download the template files for the project, run the following code inside RStudio:

# Saves both templates to the current working directory
download.file("http://stat.duke.edu/~cr173/Sta102_Fa15/Proj/proposal.Rmd", destfile="proposal.Rmd")
download.file("http://stat.duke.edu/~cr173/Sta102_Fa15/Proj/project.Rmd", destfile="project.Rmd")



Data set

To give yourself the best chance of success with this project, it is important that you choose a manageable dataset. This means that the data should be readily accessible and large enough that multiple relationships can be explored. As such, your dataset should have at least 30 observations and between 5 and 20 variables (exceptions can be made, but you must speak with me first). Additionally, your data must represent a sample, not a population, as it is not possible to perform inference with population data. The dataset's variables should include both categorical variables (e.g. political party affiliation, gender) and numerical variables (e.g. years of education, number of foreign languages spoken fluently, height, weight).

All analyses must be done in RStudio using the template files provided. Make sure that you are able to load your data into RStudio, as this can be tricky depending on the source. If you are having trouble, ask for help before it is too late. Also remember that you must include the code to load your data in the Rmd document, as well as any supplementary code or tools (e.g. the inference function).
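Before writing the proposal, it can help to confirm in R that your data meet the size and variable-type requirements above. The following is a minimal sketch, not a provided course tool: `check_dataset()` is a hypothetical helper, the file name in the commented `read.csv()` line is a placeholder, and the built-in `iris` data stand in for your own dataset.

```r
# Hypothetical helper: checks a data frame against the project's stated
# requirements (at least 30 observations, 5 to 20 variables, and at least
# one categorical and one numerical variable).
check_dataset <- function(df) {
  n_obs  <- nrow(df)
  n_vars <- ncol(df)
  # Treat factors, character vectors and logicals as categorical
  is_cat <- vapply(df, function(x) is.factor(x) || is.character(x) || is.logical(x),
                   logical(1))
  is_num <- vapply(df, is.numeric, logical(1))
  cat("Observations:", n_obs, "\n")
  cat("Variables:   ", n_vars, "\n")
  cat("Categorical: ", sum(is_cat), "\n")
  cat("Numerical:   ", sum(is_num), "\n")
  # TRUE only if every requirement is met
  n_obs >= 30 && n_vars >= 5 && n_vars <= 20 && any(is_cat) && any(is_num)
}

# Replace with the code that loads your own data, e.g.
# my_data <- read.csv("my_data.csv")   # placeholder file name
my_data <- iris   # built-in example: 150 observations, 4 numeric + 1 factor
check_dataset(my_data)   # TRUE for iris
```

This check only covers the counts described above; it cannot tell you whether your data are a sample rather than a population, which you still need to establish from how the data were collected.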



Proposal - due Friday, November 13th by 5 pm

On November 13th you will hand in a project proposal. This consists of completing the provided template (proposal.Rmd) and answering the included questions. The proposal should introduce your general research question (including your hypothesized answer) and your data (where they came from, how they were collected, what the cases are, what the variables are, etc.). You will also include some preliminary exploratory data analysis (univariate descriptions of the variables relevant to your research question are sufficient) to show that the data have been imported into RStudio and are correctly formatted. You will receive feedback on the quality of your research question and data so that you can address any issues before completing the final project.



Project - due Wednesday, December 9th by 5 pm

As stated above, the goal of the project is to use the skills you have acquired in this class to undertake a novel statistical analysis of a research question of your choosing. You must use RStudio for your analysis and write up all results in the provided template using knitr. Specifically, your write-up should provide a narrative of your analysis, including all necessary background information. In general, the write-up should have three primary components: an introduction, the analysis, and a conclusion. The introduction should describe the dataset and research question, and should also address the significance of the research question as well as the relevance of the data to answering that question.

The bulk of the assignment will consist of a detailed analysis of the data using the methodologies we have discussed in class. While some questions can be addressed directly by a single univariate or bivariate inference test, this likely indicates that the research question is too specific and should be broadened. Conversely, if you find yourself performing more than a handful of tests, your question is either too broad or the tests are redundant. The goal is not an exhaustive data analysis (i.e., do not calculate every statistic and procedure you have learned for every variable), but rather to show me that you are proficient at picking the correct tool for the job at hand and that you are able to correctly interpret and present the results of those tools. Focus on methods that help you answer your specific research questions.

The write-up should also include a one- to two-page conclusion and discussion. This should summarize what you have learned about your research question, along with the statistical arguments supporting your conclusions. It is also a good idea to critique your own methods by discussing any potential limitations and suggesting improvements. Issues pertaining to the reliability and validity of your data, and to the appropriateness of the statistical analysis, should be discussed here. You can also include a paragraph on what you would do differently if you were able to start the project over, or what you would do next if you were going to continue working on it.

Some other general guidelines:



Submission

For each assignment, you must turn in your write-up using Sakai's Assignments tool. You will be allowed to upload the assignment(s) multiple times without penalty until the deadline.

Your submission must include:

You do not need to include your data.

The late work policy applies (-10% per day) until all files are submitted in working format. It is your responsibility to confirm that any files uploaded to Sakai work properly (i.e. corrupted files are not an excuse for late work).



Grading

Grading of the project by the professor and TAs will take into account the following:


A general breakdown of grading is as follows:


Please note that if you score less than 30% on the project you cannot pass this course, and that late projects are assessed a 10% per day penalty; as such, your project must be turned in within one week of the deadline in order to pass this class.