The Quarto template for this assignment is available in the following
repository: https://classroom.github.com/a/uPlwAu29

A 2008 paper by Cortez and Silva examined predictors of student
performance in two Portuguese secondary schools. We will be using part
of their dataset to examine factors that are associated with academic
performance. The variables in our abridged dataset are as follows:
- school: The school the student attended (GP = Gabriel Pereira;
MS = Mousinho da Silveira)
- sex: The sex of the student (F = female; M = male)
- age: The age of the student in years
- address: Whether the student lived in an urban (U) or rural (R)
location
- famsize: The family size of the student (LE3 = 3 or fewer;
GT3 = more than 3)
- studytime: The categorical weekly study time of the student
(1 = <2 hours; 2 = 2 - 5 hours; 3 = 5 - 10 hours; 4 = >10 hours)
- paid: Whether the student’s family had extra paid classes for the
subject in question (in this case, Portuguese)
- romantic: Whether the student was in a romantic relationship
- absences: The number of school absences for the student
- final: The student’s final Portuguese grade (this is a numeric
variable ranging from 0 through 20, with higher scores being better)

Suppose our primary question of interest is whether there is an
association between whether a student attends extra paid classes and
their final grade.
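
As a notational aid for the exercises that follow, the three models they ask for can be sketched as below. The symbols are our own assumptions and do not come from the assignment: i indexes students, j indexes schools, x collects the remaining covariates (including school in the first line, excluding it in the other two), and u_j is a school-level random intercept. The coefficients in each line are separate parameters; the symbols are reused only for brevity.

```latex
\begin{aligned}
\text{Linear model:}\quad
  & \text{final}_{i} = \beta_0 + \beta_{\text{paid}}\,\text{paid}_{i}
    + \mathbf{x}_{i}^{\top}\boldsymbol{\gamma} + \varepsilon_{i},
  \qquad \varepsilon_{i} \sim N(0, \sigma^2) \text{ independently} \\[4pt]
\text{Random intercept:}\quad
  & \text{final}_{ij} = \beta_0 + u_j + \beta_{\text{paid}}\,\text{paid}_{ij}
    + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \varepsilon_{ij},
  \qquad u_j \sim N(0, \sigma_u^2),\ \varepsilon_{ij} \sim N(0, \sigma^2) \\[4pt]
\text{Mixed logistic:}\quad
  & \log\frac{\Pr(\text{final}_{ij} \ge 10 \mid u_j)}
             {1 - \Pr(\text{final}_{ij} \ge 10 \mid u_j)}
    = \beta_0 + u_j + \beta_{\text{paid}}\,\text{paid}_{ij}
    + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma},
  \qquad u_j \sim N(0, \sigma_u^2)
\end{aligned}
```

In each case the primary question concerns the coefficient on paid; in the two models with u_j, that coefficient is defined conditional on the school's random intercept.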
- Fit a linear model that predicts the student’s final Portuguese
grade based on all other variables in the dataset (no need for
interactions or transformations). Interpret the parameter estimate
corresponding to paid.
- Create a residual plot that corresponds to your model (no need to
display it); you should notice a group of students for whom the model
is not predicting very well. What is special about these students in
particular (why might your model not be so great for them)?
- Describe what it would mean for the independence assumption in this
model to be violated, even after adjusting for all the variables in the
model.
- Suppose you were worried about potential violation of independence
by school. Fit the same model as in Ex. 1, but with a random intercept
for school. Interpret the parameter estimate corresponding to
paid, specifically in the context of a random effects model
(hint: it shouldn’t be the same interpretation as in Ex. 1).
- Describe the difference in how the school variable is being treated
in the two models.
- Compare the model coefficients from your model in Ex. 1 to the fixed
effect estimates from your model in Ex. 4. What do you notice?
- The minimum passing grade in Portuguese secondary schools is a 10.
Create a mixed-effects logistic regression model that predicts whether a
student passes based on the other variables in the dataset; use a random
intercept for school, and then interpret the odds ratio
corresponding to whether the student gets extra paid classes (be careful
about the interpretation in light of the random effect!).
- What was your reaction to the panelists’ experiences and insights?
How has today’s conversation shaped your idea of what you may want to do
in terms of a career or with life in general? (If you weren’t in class,
you won’t get credit for this question, but it isn’t worth many points;
please just leave it blank.)
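
For the mixed-effects logistic regression exercise above, two small mechanical steps are needed whatever modelling software is used: turning the final grade into a pass indicator, and turning a log-odds coefficient into an odds ratio. The following is a minimal Python sketch of those two steps using only the standard library. The function names and example values are hypothetical illustrations, not results from the dataset, and the sketch does not fit any model.

```python
import math

# Minimum passing grade stated in the exercise.
PASSING_GRADE = 10


def passed(final_grade: int) -> int:
    """Return 1 if the final grade is a pass (>= 10), otherwise 0."""
    # The final grade is described as ranging from 0 through 20.
    if not 0 <= final_grade <= 20:
        raise ValueError(f"final grade out of range: {final_grade}")
    return int(final_grade >= PASSING_GRADE)


def odds_ratio(log_odds_coefficient: float) -> float:
    """Convert a logistic-regression coefficient (log-odds scale) to an odds ratio."""
    return math.exp(log_odds_coefficient)


# Hypothetical grades, chosen only to show the threshold behaviour.
example_grades = [0, 9, 10, 15]
example_pass = [passed(g) for g in example_grades]

# A hypothetical coefficient of 0 on the log-odds scale corresponds to an
# odds ratio of 1, i.e. no difference in the odds of passing.
example_odds_ratio = odds_ratio(0.0)
```

Because the model includes a random intercept for school, an odds ratio obtained this way compares students with the same school-level intercept, which is what the exercise's caution about interpretation refers to.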