The data required for this assignment can be found at

This data frame contains the following variables (columns):

The dataset has been modified from the reference below for use in this course. The data are therefore based on real observations, but missing values and other structures have been added to alter them.

Instructions for loading the dataset:

Note that these steps generate, in your console, the code needed to import a csv file. Copy this code into your R Markdown file so that the dataset is also loaded when the document is knit.

Now that you know how to load a csv file with the read.csv function, you can use either this function or the point-and-click steps described above to load datasets in R going forward.
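As a minimal sketch of the read.csv route: in your own .Rmd, a single line such as `evals <- read.csv("evals.csv")` is what loads the course data. So that the chunk runs on its own, it first writes a tiny stand-in csv whose columns (`score`, `rank`) are hypothetical and not the actual contents of evals.csv.

```r
# In the assignment .Rmd you would load the real file directly, e.g.
#   evals <- read.csv("evals.csv")
# Here a hypothetical two-row stand-in csv is written first so this
# chunk is self-contained.
stand_in <- tempfile(fileext = ".csv")
writeLines(c("score,rank", "4.7,1", "4.1,3"), stand_in)

# read.csv uses the first line as column names (header = TRUE by default)
# and returns a data.frame with one column per csv column.
evals_demo <- read.csv(stand_in)
str(evals_demo)
```

By default, knitr evaluates chunks with the .Rmd file's folder as the working directory, so keeping evals.csv in the same folder as your .Rmd lets the relative path `"evals.csv"` work when the document is knit.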

  1. Some of the values in the data frame are missing. They have been coded as -999; make sure they are properly treated as NAs.
  1. Use subsetting to replace the values of the categorical variables with the appropriate character strings and convert the variables to factors.
  1. Create a new variable called prof_val_cat in which professors with evaluation scores less than or equal to the average professor rating are labeled low and those with scores above the average are labeled high.
  1. Using subsetting, create three datasets: one for teaching faculty, one for tenure-track faculty, and one for tenured faculty. Give them short but informative names so that you don’t have to type a long name whenever you refer to them later.
  1. Create a visualization using at least three variables from this dataset in one plot. Hint: Refer back to what we learned about creating plots using ggplot2, as well as what you read about informative visualizations in the Tufte book.
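The first four steps share one idea: a logical condition inside `[ ]` selects the elements to change or keep. The sketch below illustrates that pattern on a made-up five-row data frame. The column names and the numeric codes for `rank` (1 = teaching, 2 = tenure track, 3 = tenured) are assumptions for illustration only, so check the actual variables and codes in evals.csv before adapting it. The plot in the last step is not covered here.

```r
# Hypothetical toy data: column names and rank codes are assumptions,
# not the real coding used in evals.csv.
toy <- data.frame(
  score = c(4.7, -999, 3.9, 4.1, 4.4),
  rank  = c(1, 2, 3, -999, 3)
)

# Step 1: toy == -999 gives a logical matrix; assigning through it
# replaces every -999 sentinel with NA.
toy[toy == -999] <- NA

# Step 2: build labels with logical subsetting, then convert to a factor.
# Rows where rank is NA keep NA (NA in a logical index selects nothing).
rank_chr <- rep(NA_character_, nrow(toy))
rank_chr[toy$rank == 1] <- "teaching"
rank_chr[toy$rank == 2] <- "tenure track"
rank_chr[toy$rank == 3] <- "tenured"
toy$rank <- factor(rank_chr, levels = c("teaching", "tenure track", "tenured"))

# Step 3: compare each score to the mean; na.rm = TRUE keeps the
# missing score from turning the mean itself into NA.
avg_score <- mean(toy$score, na.rm = TRUE)
toy$prof_val_cat <- factor(ifelse(toy$score <= avg_score, "low", "high"),
                           levels = c("low", "high"))

# Step 4: which() drops rows whose rank is NA; indexing rows with a
# plain logical vector containing NA would add rows filled with NA.
teach <- toy[which(toy$rank == "teaching"), ]
tt    <- toy[which(toy$rank == "tenure track"), ]
ten   <- toy[which(toy$rank == "tenured"), ]
```

`subset(toy, rank == "teaching")` is an alternative for step 4 that also drops rows where the condition is NA.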


Upload the R Markdown (.Rmd) and HTML files for this assignment to Sakai. I should be able to start from the evals.csv file (which you don’t need to upload, since I already have it) and fully reproduce your results.

Honor code:

This is an individual assignment (not team-based). You are welcome to talk with your peers (feel free to ask each other questions, share ideas, or discuss concepts), but all calculations, R code, and writing must be your own and cannot be shared with other students. Failure to abide by these policies will result in a 0 for everyone involved. If you borrow code from an online source, cite it using a comment in your code. The comment should be visible in the HTML output (even if it looks messy).


Besides sharing ideas with each other, you can ask questions on Piazza or come to office hours. If your question is about a code error, post a minimal working example (MWE) on Piazza so that others can reproduce your issue.