class: center, middle, inverse, title-slide # Writing SAPs ### Yue Jiang ### Duke University --- ### A disclaimer The following material was used during a live lecture. Without the accompanying oral comments and discussion, the text is incomplete as a record of the presentation. A full recording may be found via Zoom on the course Sakai site. --- ### What is an SAP? A .vocab[statistical analysis plan] is a tool often used in clinical trials that establishes the research aims, study design and variables, statistical methods and models, and rationale for choosing such methods. It often includes analyses regarding sample size and power, as well as logistical details regarding any randomization procedure, data entry, quality assurance, and database management. Importantly, SAPs are written *prior* to initiating of the clinical trial in question. .question[ Why is having an SAP important? ] --- ### What is an SAP? > We will use t-tests and chi-square tests to assess continuous and categorical variables, respectively. -- is basically saying > We will use beakers to hold solutions and burettes to perform titration. SAPs should provide *strategies* and *rationales* instead of simply listing procedures. Explain *why* each statistical tool is being used and how it specifically addresses one of the research aims (don't be afraid to restate them!). Discuss rationale, appropriateness, advantages, and limitations, including mention of competing reasonable methods that were not selected (and why). --- ### What is an SAP? Strong SAPs will discuss main analyses (used to drive storyline of paper), but also include auxiliary analyses to support the main results: - .vocab[Sensitivity analyses] help evaluate robustness of the main results to assumptions or reasonable differences in choice of methods, inclusion of outliers, etc. - .vocab[Goodness of fit analyses] help place main results in context of the overall data. -- Be *comprehensive* in anticipating data issues. Mention strategies for dealing with incomplete data or missing data, questionable data values, violation of distributional or modeling assumptions, and multiple comparisons. --- ### What is an SAP? Provide detailed "template" mock-ups of all figures and tables to be included in the main manuscript. You may consider including a detailed .vocab[data dictionary] providing variables you intend to collect, the timing of variable collection (if multiple visits are specified), and the expected format and units of such variables. You may also refer to existing guidance for certain types of trials: - [CONSORT](http://www.consort-statement.org/): Consolidated Standards of Reporting Trials - [STROBE](https://www.strobe-statement.org/index.php?id=strobe-home): Strengthening the Reporting of Observational Studies in Epidemiology - [TREND](https://www.cdc.gov/trendstatement/): Transparent Reporting of Evaluations wiht Nonrandomized Designs --- ### What to include in an SAP? <img src="img/sap1.png" width="100%" style="display: block; margin: auto;" /> --- ### What to include in an SAP? <img src="img/sap2.png" width="100%" style="display: block; margin: auto;" /> --- ### What to include in an SAP? <img src="img/sap3.png" width="100%" style="display: block; margin: auto;" /> --- ### What to include in an SAP? <img src="img/sap4.png" width="100%" style="display: block; margin: auto;" /> --- ### What to include in an SAP? <img src="img/sap5.png" width="100%" style="display: block; margin: auto;" /> --- ### What to include in an SAP? <img src="img/sap6.png" width="100%" style="display: block; margin: auto;" /> --- ### What to include in an SAP? <img src="img/sap7.png" width="100%" style="display: block; margin: auto;" /> --- ### Power/sample size analysis .question[ What is the definition of statistical .vocab[power] and why should we care? ] -- .vocab[Power] is the probability of rejecting the null hypothesis when it is false: P(reject `\(H_0\)` | `\(H_0\)` is false) - often calculated for *specific* alternatives. Check out an [interactive visualization](https://rpsychologist.com/d3/nhst/) of some factors that are related to power. --- ### Power/sample size analysis .question[ Why care about choosing a sample size/power? ] -- - To show that under certain required conditions, a hypothesis test has a good chance of showing the anticipated difference, if it really exists - To be more confident that a null result is not simply a sample of excessive variability - To show a funding agency that the study has a reasonable chance of reaching a useful result - To show that necessary resources (human, animal, financial, time, etc.) will be minimized -- Note that for multiple specific hypotheses of interest, each with their own tests and estimates of interest, you may come to different conclusions when evaluating each! --- ### Calculating power Suppose `\(X \sim N(0, 3)\)` and `\(Y \sim N(\_\_, \_\_)\)`, and that you are interested in testing `\begin{align*} H_0: \mu_X &= \mu_Y\\ H_1: \mu_X &\neq \mu_Y \end{align*}` What is the anticipated power at `\(\alpha = 0.05\)` if you have 20 subjects from population `\(X\)` and 20 from `\(Y\)` if `\(Y \sim N(1, 3)\)`? How about `\(Y \sim N(5, 12)\)`? What if we had 40 subjects from `\(Y\)` or if we specified `\(\alpha = 0.01\)` instead? -- - Built-in functions in software - Formulas do exist for certain types of (simple) analyses! - In real-life, **simulation** is often used to simulate power across a wide range of potential alternatives and across a wide range of potential data patterns (demonstration given in class) --- ### Power/sample size analysis **However**: there is *no* place for power when analyzing results -- it is irrelevant for doing so! - Power is only useful in the planning stages; observed confidence intervals and point estimates are all that's needed for (frequentist) analysis. - No additional information can be obtained by performing any kind of power calculation. - After data are collected, these are just previous conjectures about expected behavior - they provide no assistance in interpreting the study's data. - Inclusion of pre-study power analyses may lead to misinterpretations regarding study results. Post hoc power analyses are similarly worthless (similarly for sample size analyses). --- ### Power/sample size analysis .question[ Does this in fact mean that power/sample size calculations are not important? ] --- ### A cautionary tale <img src="img/reinhart.png" width="40%" style="display: block; margin: auto;" /> Example adapted from [Reinhart, 2015: Statistics Done Wrong](https://www.statisticsdonewrong.com/) --- ### A cautionary tale Suppose 100 independent researchers conducted 100 pilot studies in which the true magnitude of the effect was small. Because these were pilot studies, each investigator is only able to study a small number of patients. -- Because of the small sample size, estimators have high variability -- there is low .vocab[precision] in these estimates. Each investigator plans to test the null hypothesis of zero effect (even though we can probably suspect that any tests will have low power). --- ### A cautionary tale Suppose the true unknown power of each test is 7%. We would expect 7 of the investigators to obtain a statistically significant p-value and conclude that the effect is not-zero. These 7 investigators have made the *correct* decision - they have not made false discoveries! Remember, the effect truly exists. .question[ What are some potential consequences of this sequence of events? ] --- ### A cautionary tale Because of the small sample size, the estimate of the *magnitude* of any effect will be gigantic if the p-value is statistically significant (given the high variability, how else would we have rejected the null hypothesis with such few patients?). **The p-value will be significant here only when the estimate happens to be large.** .question[ Under this understanding, what are some other potential consequences? What is a simple way we can mitigate some of these consequences? ]