Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of characteristics unrelated to teaching. The article “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity” (Hamermesh and Parker, 2005) found that instructors who are viewed as better looking receive higher instructional ratings. (Daniel S. Hamermesh and Amy Parker, “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity,” Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-376, ISSN 0272-7757, doi:10.1016/j.econedurev.2004.07.013, http://www.sciencedirect.com/science/article/pii/S0272775704001165.)
For this assignment you will analyze data from this study.
Clone your assignment repo into RStudio Cloud and open the R Markdown file. Don’t forget to load in the necessary packages and configure git:
If you would like your git password cached for a week for this project, type the following in the Terminal:
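The Terminal command itself is not reproduced here; a likely candidate, assuming the standard git credential cache helper (the exact command in the original handout may differ), is:

```shell
# Cache git credentials in memory for one week (604800 seconds).
# Run inside the project repo so the setting applies to this project only.
git config credential.helper 'cache --timeout=604800'
```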
You will need to enter your GitHub username and password one more time after caching the password. After that you won’t need to enter your credentials for 604,800 seconds (that is, 7 days). Note that this caching applies only to this single RStudio Cloud project – you will need to cache your credentials for each project you create.
The data were gathered from end of semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance. (This is a slightly modified version of the original data set that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill, 2007).) The result is a data frame where each row contains a different course and columns represent variables about the courses and professors.
To get started, read in the data and save it as an object named `evals`.
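A minimal sketch of the read-in step, assuming the data ship in the repo as a CSV; the path and filename below are assumptions and may differ in your repo:

```r
library(tidyverse)  # provides read_csv() and glimpse()

# Hypothetical path; adjust to match the data file in your assignment repo
evals <- read_csv("data/evals.csv")
glimpse(evals)  # quick check of the variables against the codebook below
```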
Codebook
| Variable name | Description |
|---|---|
| `score` | Average professor evaluation score: (1) very unsatisfactory - (5) excellent |
| `rank` | Rank of professor: teaching, tenure track, tenure |
| `ethnicity` | Ethnicity of professor: not minority, minority |
| `gender` | Gender of professor: female, male |
| `language` | Language of school where professor received education: English or non-English |
| `age` | Age of professor |
| `cls_perc_eval` | Percent of students in class who completed evaluation |
| `cls_did_eval` | Number of students in class who completed evaluation |
| `cls_students` | Total number of students in class |
| `cls_level` | Class level: lower, upper |
| `cls_profs` | Number of professors teaching sections in course in sample: single, multiple |
| `cls_credits` | Number of credits of class: one credit (lab, PE, etc.), multi credit |
| `bty_f1lower` | Beauty rating of professor from lower level female: (1) lowest - (10) highest |
| `bty_f1upper` | Beauty rating of professor from upper level female: (1) lowest - (10) highest |
| `bty_f2upper` | Beauty rating of professor from upper level female: (1) lowest - (10) highest |
| `bty_m1lower` | Beauty rating of professor from lower level male: (1) lowest - (10) highest |
| `bty_m1upper` | Beauty rating of professor from upper level male: (1) lowest - (10) highest |
| `bty_m2upper` | Beauty rating of professor from upper level male: (1) lowest - (10) highest |
Write all R code according to the style guidelines discussed in class. Be especially careful about staying within the 80 character limit.
All team members must commit and push to receive full credit.
In addition to `lm()`, `factor()`, and `c()`, your code should only contain functions from the loaded R packages, unless an exercise states otherwise.
Create a new variable called `bty_avg` that is the average attractiveness score given by the six students for each professor (`bty_f1lower` through `bty_m2upper`). Add this new variable to the `evals` data frame. Do this in one pipe, using the `rowwise()` function. Template code is given below to guide you in the right direction; however, you will need to fill in the blanks.
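Since the template chunk itself is not reproduced here, a hedged sketch of the intended pipe (assuming dplyr is loaded) might look like the following; the column list comes straight from the codebook above:

```r
library(dplyr)

evals <- evals %>%
  rowwise() %>%
  mutate(bty_avg = mean(c(bty_f1lower, bty_f1upper, bty_f2upper,
                          bty_m1lower, bty_m1upper, bty_m2upper))) %>%
  ungroup()  # drop the rowwise grouping once the average is computed
```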
Fit a linear model with the goal of predicting average professor evaluation `score` based on average beauty rating (`bty_avg`) only. Write out the linear model, and note \(R^2\) and adjusted \(R^2\).
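One way to fit and inspect this model (the object name `m_bty` is illustrative, not prescribed by the assignment):

```r
m_bty <- lm(score ~ bty_avg, data = evals)
summary(m_bty)  # coefficients, R-squared, and adjusted R-squared
```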
Fit a linear model with the goal of predicting average professor evaluation `score` based on average beauty rating (`bty_avg`) and `gender`. Write out the linear model, and note \(R^2\) and adjusted \(R^2\).
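A sketch of the two-predictor fit (again, the model name is an illustrative assumption):

```r
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)  # compare R-squared values with the one-predictor model
```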
Interpret the slope and intercept of the model in Exercise 3 in context of the data.
What is the equation of the line corresponding to male professors for the model in Exercise 3?
For two professors who received the same beauty rating, which gender tends to have the higher course evaluation score?
How does the relationship between beauty and evaluation score vary between male and female professors?
How do the adjusted \(R^2\) values of the models from Exercises 2 and 3 compare? What does this tell us about how useful `gender` is in explaining the variability in evaluation scores when we already have information on the beauty score of the professor?
Compare the slopes of `bty_avg` under the two models. Has the addition of `gender` to the model changed the parameter estimate (slope) for `bty_avg`?
Create a new model called `m_bty_rank` using `rank` and `bty_avg` to predict `score`. Write the equation of the linear model and interpret the slopes and intercept in context of the data.
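The assignment names this model explicitly, so the fit could look like:

```r
m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)  # one slope per non-baseline level of rank
```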
Going forward, only consider the following variables as potential predictors: `rank`, `ethnicity`, `language`, `age`, `cls_perc_eval`, `cls_did_eval`, `cls_students`, `cls_level`, `cls_profs`, `cls_credits`, `bty_avg`.
Which variable on its own would you expect to be the worst predictor of evaluation scores? Why?
Check your suspicion from the previous exercise by fitting a linear model with that variable as the single predictor. Explain whether your suspicion is warranted based on a relevant result from the model.
Suppose you wanted to fit a full model with the variables listed above. If you are already going to include `cls_perc_eval` and `cls_students`, which variable should you not include as an additional predictor? Why?
Fit a full model with all predictors listed above (except for the one you decided to exclude in Exercise 13).
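A hedged sketch of the full-model fit; the placeholder name for the dropped variable is hypothetical, so substitute whichever predictor you excluded in Exercise 13:

```r
# All candidate predictors from the list above
predictors <- c("rank", "ethnicity", "language", "age", "cls_perc_eval",
                "cls_did_eval", "cls_students", "cls_level", "cls_profs",
                "cls_credits", "bty_avg")

# Drop your Exercise 13 exclusion (placeholder name below is hypothetical)
predictors <- setdiff(predictors, "your_excluded_variable")

# reformulate() builds the formula score ~ <remaining predictors>
m_full <- lm(reformulate(predictors, response = "score"), data = evals)
summary(m_full)
```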
Use the `step()` function. You’ll need to look at the help page to see how to set it up for backward elimination. In your code chunk, set `results = "hide"` to suppress the traced steps in the backward elimination process.
Use backward elimination with AIC as the criterion to determine the best model. You do not need to show all steps in your answer, just the output for the final model. What are the \(R^2\) and adjusted \(R^2\) values?
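A sketch of the elimination step, assuming a full model object named `m_full` from the previous exercise:

```r
# Backward elimination using AIC (step() minimizes AIC by default)
m_final <- step(m_full, direction = "backward")
summary(m_final)  # R-squared and adjusted R-squared of the final model
```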
Interpret the slopes of one continuous and one categorical predictor based on your final model.
Explain how you would assess if the linearity assumption is satisfied for this final model. Would it still make sense to use this model if the linearity assumption was severely violated?
Would you be comfortable making predictions with this model for professors at any university? Why or why not?
Knit your R Markdown file to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Please only upload your PDF document to Gradescope. Associate the “Overall” graded section with the first page of your PDF, and mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages.
Only one team member needs to submit for the group. After you hit submit, go to View or edit group and select all your team members from the drop-down menu.