This lab will not be graded - there is nothing to turn in. There is no repository for today’s lab; it will focus on statistical literacy and understanding published work.

Introduction

Dyrbye, Satele, and West (2021) published an analysis examining associations between burnout among medical students and characteristics of their medical school learning environment. In particular, the authors examine whether “perceived mistreatment during medical school [is] associated with burnout, empathy, and career regret at graduation.” We will focus on burnout, which was measured on a numerical scale. The authors analyzed 14,126 students at 140 medical schools, using data from the Graduation Questionnaire from 2016-2018 together with the same students’ answers to the Year 2 Questionnaire (Y2Q) two years earlier.

In the statistical analysis, the authors state that they: “performed multiple linear regression analysis to evaluate associations of the independent variables, measured at the beginning of year 2 of medical school, with exhaustion [and] disengagement…measured during year 4 of medical school. All models included mistreatment, MSLES subscale scores, OBI exhaustion and/or disengagement scores, IRI score, QOL score, Perceived Stress Scale score, and demographics (sex, age, marital status, relationship status, and number of dependents) as measured at the beginning of year 2 of medical school.”
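
If it helps to have a concrete picture of a multiple linear regression that includes a categorical predictor, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the records, the coefficient values, and the helper names (`solve_linear_system`, `ols`, `dummy_code`) are made up, and this is not the authors' data, code, or model. The outcome is generated without noise from known coefficients, so you can check that least squares recovers them and see how each dummy coefficient relates to the reference level.

```python
# Illustrative only: a tiny made-up data set, not the authors' data or code.

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def dummy_code(level):
    """Two indicators for a three-level factor; 'none' is the reference level."""
    return [1.0 if level == "once" else 0.0,
            1.0 if level == "more_than_once" else 0.0]


# Hypothetical records: (mistreatment level, baseline score at year 2)
records = [
    ("none", 8.0), ("none", 11.0), ("none", 14.0),
    ("once", 9.0), ("once", 12.0), ("once", 7.0),
    ("more_than_once", 10.0), ("more_than_once", 13.0),
]

# Made-up coefficients: intercept, once, more than once, baseline score
true_beta = [4.0, 0.5, 1.5, 0.6]

# Each design row is [1, once indicator, more-than-once indicator, baseline]
design = [[1.0] + dummy_code(level) + [baseline] for level, baseline in records]

# Noise-free outcome, so the fitted coefficients can be checked exactly
y = [sum(b * x for b, x in zip(true_beta, row)) for row in design]

beta_hat = ols(design, y)
names = ["intercept", "once", "more_than_once", "baseline"]
for name, value in zip(names, beta_hat):
    print(f"{name}: {value:.3f}")
```

In this sketch the "once" coefficient is the difference in expected outcome between the "once" group and the reference ("none") group at the same baseline score. The intercept is the expected outcome for the reference group when the baseline score is zero.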

The authors’ full paper is available here. We will focus on their models for burnout (exhaustion and disengagement at year 4 as measured by the graduation questionnaire).

Exercises

The authors’ results are found in Table 3, which we will be referencing throughout the rest of this lab. Read the footnotes to Table 3 carefully.

Note: Personally, I have some problems with the way the results and model are presented in the paper. Don’t be too alarmed if you have some questions!

  1. Write out the full model formulation for the linear regression model that aims to predict exhaustion at year 4.
  2. Suppose we wanted to conduct a formal hypothesis test evaluating whether female sex is associated with greater expected exhaustion (while adjusting for the other variables in the model). What would be the distribution of the test statistic associated with this hypothesis test?
  3. Is there anything strange about the way the authors deal with the reference categories for each of the dummy variables (think about what their “intercept” term is…or isn’t)? Explain.
  4. Interpret the coefficient estimates corresponding to 1 and >1 mistreatment events in the context of the data (use the notion of conditional expectation).
  5. Returning to “No. of mistreatments,” consider the reported p-values: what are the null and alternative hypotheses corresponding to each “P value” and to the “Overall P value”?
  6. Consider the authors’ claim from their linear regression model for burnout, which is reproduced below. Do you think this is an appropriate way to interpret the results? Is there anything missing from their interpretation? “After adjusting for Y2Q measures, mistreatment reported on the Y2Q was associated with a higher exhaustion score (once, 0.66 [95% CI, 0.51-0.81]; more than once, 1.74 [95% CI, 1.59-1.90]; overall P < .001)”