movies.RData

Get the data

There are two ways to get the data:

  1. Click here to download the dataset.

  2. Alternatively, load the data directly in R with the following line of code:

load(url("https://stat.duke.edu/~mc301/data/movies.Rdata"))
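Because load() creates objects in the workspace without reporting their names, it can help to load the file into a separate environment first and look at what it contains. The following is a minimal sketch; the helper name inspect_rdata is hypothetical and not part of the original instructions.

```r
# Hypothetical helper: load an .RData file (local path or http/https URL)
# into its own environment and print the name, class, and dimensions of
# every object it contains.
inspect_rdata <- function(path_or_url) {
  env <- new.env()
  # load() accepts either a file name or a connection such as url(...)
  src <- if (grepl("^https?://", path_or_url)) url(path_or_url) else path_or_url
  obj_names <- load(src, envir = env)
  for (nm in obj_names) {
    obj <- get(nm, envir = env)
    cat(nm, ": ", class(obj)[1], sep = "")
    if (!is.null(dim(obj))) {
      cat(" [", paste(dim(obj), collapse = " x "), "]", sep = "")
    }
    cat("\n")
  }
  invisible(env)
}

# Usage (requires network access):
# env <- inspect_rdata("https://stat.duke.edu/~mc301/data/movies.Rdata")
```

The returned environment holds the loaded objects, so the data frame can be pulled out with get() once its name is known.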

Data description

The data were obtained from IMDB and Rotten Tomatoes. They represent 456 randomly sampled movies released between 1972 and 2014 in the United States.

Codebook

This data frame contains 456 observations (rows), each representing a movie, and 27 variables (columns):

  1. title: Title of movie

  2. audience_score: Audience score on Rotten Tomatoes (response variable)

  3. type: Type of movie (Documentary, Feature Film, TV Movie)

  4. genre: Genre of movie (Action & Adventure, Comedy, Documentary, Drama, Horror, Mystery & Suspense, Other)

  5. runtime: Runtime of movie (in minutes)

  6. year: Year the movie was released

  7. mpaa_rating: MPAA rating of the movie (G, PG, PG-13, R, Unrated)

  8. studio: Studio that produced the movie

  9. imdb_num_votes: Number of votes on IMDB

  10. critics_score: Critics score on Rotten Tomatoes

  11. critics_rating: Categorical variable for critics rating on Rotten Tomatoes (Certified Fresh, Fresh, Rotten)

  12. best_pic_nom: Whether or not the movie was nominated for a best picture Oscar (no, yes)

  13. best_pic_win: Whether or not the movie won a best picture Oscar (no, yes)

  14. best_actor_win: Whether or not one of the main actors in the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the actor won an Oscar for their role in the given movie

  15. best_actress_win: Whether or not one of the main actresses in the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the actress won an Oscar for her role in the given movie

  16. best_dir_win: Whether or not the director of the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the director won an Oscar for the given movie

  17. top200_box: Whether or not the movie is in the Top 200 Box Office list on BoxOfficeMojo (no, yes)

  18. audience_rating: Categorical variable for audience rating on Rotten Tomatoes (Spilled, Upright)

  19. director: Director of the movie

  20. actor1: First main actor/actress from the abridged cast. This information was used to determine whether the movie casts an actor or actress who won a best actor or actress Oscar.

  21. actor2: Second main actor/actress from the abridged cast. This information was used to determine whether the movie casts an actor or actress who won a best actor or actress Oscar.

  22. actor3: Third main actor/actress from the abridged cast. This information was used to determine whether the movie casts an actor or actress who won a best actor or actress Oscar.

  23. actor4: Fourth main actor/actress from the abridged cast. This information was used to determine whether the movie casts an actor or actress who won a best actor or actress Oscar.

  24. actor5: Fifth main actor/actress from the abridged cast. This information was used to determine whether the movie casts an actor or actress who won a best actor or actress Oscar.

  25. imdb_url: Link to IMDB page for the movie

  26. rt_url: Link to Rotten Tomatoes page for the movie

  27. imdb_id: IMDB ID of the movie
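Once the data are loaded, the codebook can be checked against the data frame. The sketch below makes a few assumptions: the variable names are copied from the list above, with variable 15 assumed to be spelled best_actress_win (following the pattern of best_actor_win), and the helper name check_codebook is hypothetical. It also assumes the loaded data frame is called movies, which the text above does not confirm.

```r
# Variable names as listed in the codebook (27 in total).
# "best_actress_win" is an assumed spelling, following "best_actor_win".
codebook_vars <- c(
  "title", "audience_score", "type", "genre", "runtime", "year",
  "mpaa_rating", "studio", "imdb_num_votes", "critics_score",
  "critics_rating", "best_pic_nom", "best_pic_win", "best_actor_win",
  "best_actress_win", "best_dir_win", "top200_box", "audience_rating",
  "director", "actor1", "actor2", "actor3", "actor4", "actor5",
  "imdb_url", "rt_url", "imdb_id"
)

# Hypothetical helper: compare a data frame against the codebook.
# Returns the variables missing from the data, any unexpected extra
# columns, and whether the dimensions match the stated 456 x 27.
check_codebook <- function(df, expected = codebook_vars, n_rows = 456) {
  list(
    missing = setdiff(expected, names(df)),
    extra   = setdiff(names(df), expected),
    dims_ok = nrow(df) == n_rows && ncol(df) == length(expected)
  )
}

# Usage, assuming the loaded data frame is named `movies`:
# check_codebook(movies)
```

If missing or extra is non-empty, the names in the file differ from those listed here, and the data should be treated as authoritative.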