Project: Rotten or fresh?


You and your teammates work for Paramount Pictures.

Your bosses have just paid a large amount of money to acquire a data set of 651 randomly sampled movies produced and released before 2016. These data include a number of variables on everything from audience and critic scores from IMDB and Rotten Tomatoes to runtime and whether or not the cast and/or director have won an Oscar.

After spending all this money your bosses are interested in learning what attributes make a movie popular as well as any other interesting insights into what makes a movie successful (either critically or in terms of box office gross). They want you to justify their expenditure by putting together a flashy tool that they can show off at the next board meeting.

They don’t care what exactly you do with the data: exploratory data analysis (EDA), visualization, inference, modeling, and/or prediction are all valid approaches. They just want something useful and/or interesting to come out of these data.


Data

The data are provided in your project repo and can be loaded using:

load("movies.Rdata")
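
The name of the object stored in movies.Rdata is not stated here, so one cautious way to load it is to capture the object names that load() returns. The following is a minimal sketch; load_first_object is a hypothetical helper, not something provided in the project repo.

```r
# Hypothetical helper: load an .Rdata file into a private environment and
# return the first object it contains, whatever that object is called.
load_first_object <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the restored object names
  get(loaded[1], envir = env)
}

# Only runs when the data file is present in the working directory.
if (file.exists("movies.Rdata")) {
  movies_df <- load_first_object("movies.Rdata")
  str(movies_df)  # compare the columns against the codebook
}
```

Loading into a separate environment avoids overwriting anything already in your workspace.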

The codebook for these data is as follows:

  1. title: Title of movie
  2. title_type: Type of movie (Documentary, Feature Film, TV Movie)
  3. genre: Genre of movie (Action & Adventure, Comedy, Documentary, Drama, Horror, Mystery & Suspense, Other)
  4. runtime: Runtime of movie (in minutes)
  5. mpaa_rating: MPAA rating of the movie (G, PG, PG-13, R, Unrated)
  6. studio: Studio that produced the movie
  7. thtr_rel_year: Year the movie is released in theaters
  8. thtr_rel_month: Month the movie is released in theaters
  9. thtr_rel_day: Day of the month the movie is released in theaters
  10. dvd_rel_year: Year the movie is released on DVD
  11. dvd_rel_month: Month the movie is released on DVD
  12. dvd_rel_day: Day of the month the movie is released on DVD
  13. imdb_rating: Rating on IMDB
  14. imdb_num_votes: Number of votes on IMDB
  15. critics_rating: Categorical variable for critics rating on Rotten Tomatoes (Certified Fresh, Fresh, Rotten)
  16. critics_score: Critics score on Rotten Tomatoes
  17. audience_rating: Categorical variable for audience rating on Rotten Tomatoes (Spilled, Upright)
  18. audience_score: Audience score on Rotten Tomatoes (response variable)
  19. best_pic_nom: Whether or not the movie was nominated for a best picture Oscar (no, yes)
  20. best_pic_win: Whether or not the movie won a best picture Oscar (no, yes)
  21. best_actor_win: Whether or not one of the main actors in the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the actor won an Oscar for their role in the given movie
  22. best_actress_win: Whether or not one of the main actresses in the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the actress won an Oscar for their role in the given movie
  23. best_dir_win: Whether or not the director of the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the director won an Oscar for the given movie
  24. top200_box: Whether or not the movie is in the Top 200 Box Office list on BoxOfficeMojo (no, yes)
  25. director: Director of the movie
  26. actor1: First main actor/actress in the abridged cast of the movie
  27. actor2: Second main actor/actress in the abridged cast of the movie
  28. actor3: Third main actor/actress in the abridged cast of the movie
  29. actor4: Fourth main actor/actress in the abridged cast of the movie
  30. actor5: Fifth main actor/actress in the abridged cast of the movie
  31. imdb_url: Link to IMDB page for the movie
  32. rt_url: Link to Rotten Tomatoes page for the movie

Analysis and Interactivity

The project is purposefully open ended: you should create some kind of compelling interactive tool that provides insight into these data. There is no limit on what tools or packages you may use, but we strongly recommend using Shiny as an easy way of introducing interactivity into whatever you produce. Your ultimate goal is to provide insight into the data that someone who works for a motion picture studio would be able to use. The data provide plenty of opportunities, but you should not feel constrained by them. If there is additional information / data you think is relevant, you are welcome to scrape / collect it and add it on. Conversely, if you are interested in exploring only a single genre, you are more than welcome to subset the data in any way you would like.
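
As one possible starting point for the interactivity, here is a minimal Shiny sketch that lets the user pick a genre and plots critics score against audience score for it. It assumes the shiny package is installed and that the data frame has the codebook columns genre, critics_score, and audience_score. make_movie_app is a hypothetical helper, and the choice of plot is only an illustration.

```r
library(shiny)

# Hypothetical helper: build a small app around any data frame that has the
# codebook columns genre, critics_score, and audience_score.
make_movie_app <- function(df) {
  ui <- fluidPage(
    titlePanel("Critics vs. audience score"),
    sidebarLayout(
      sidebarPanel(
        selectInput("genre", "Genre",
                    choices = sort(unique(as.character(df$genre))))
      ),
      mainPanel(plotOutput("scatter"))
    )
  )

  server <- function(input, output, session) {
    output$scatter <- renderPlot({
      # which() drops rows where genre is missing
      sub <- df[which(df$genre == input$genre), ]
      plot(sub$critics_score, sub$audience_score,
           xlab = "Critics score", ylab = "Audience score",
           main = input$genre)
    })
  }

  shinyApp(ui, server)
}

# Launch only in an interactive session with the data file available.
if (interactive() && file.exists("movies.Rdata")) {
  env <- new.env()
  loaded <- load("movies.Rdata", envir = env)
  runApp(make_movie_app(get(loaded[1], envir = env)))
}
```

Wrapping the app in a function keeps the data loading separate from the interface, which makes it easier to reuse the same app on a subset such as a single genre.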

Visualization and EDA are a very good place to start, as they will give you deeper insight into the data and its internal relationships, but modeling, inference, and/or prediction are also useful tools to base your conclusions around. Whatever results you produce, they must be statistically sound and fully justified.
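
To illustrate the modeling route, here is a minimal sketch of a linear regression with audience_score (marked as the response variable in the codebook) as the outcome. The predictors critics_score and runtime are an arbitrary assumption chosen for illustration, and fit_audience_model is a hypothetical helper. A real analysis would still need to check model assumptions and justify the variable selection.

```r
# Hypothetical helper: fit a simple linear model for audience_score.
# lm() drops rows with missing values in the model variables by default.
fit_audience_model <- function(df) {
  lm(audience_score ~ critics_score + runtime, data = df)
}

# Only runs when the data file is present in the working directory.
if (file.exists("movies.Rdata")) {
  env <- new.env()
  loaded <- load("movies.Rdata", envir = env)
  fit <- fit_audience_model(get(loaded[1], envir = env))
  print(summary(fit))                    # coefficients, R-squared, p-values
  par(mfrow = c(2, 2)); plot(fit)        # standard residual diagnostics
}
```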

Your final product should contain the following pieces:


Presentation format & length

You will give a ten-minute presentation / demo of your work and answer questions from me and your classmates for up to 5 minutes. Each team member should speak during this presentation and participate in answering questions. The time limit is firm: you will be asked to stop at the end of 10 minutes. This is not a lot of time, so decide carefully what you will highlight and practice to make sure everything you want to say fits within the limit. You are welcome to use slides, a live demo of your Shiny app, or a combination of both. Any slides used should also be included in your GitHub repository.

Presentations will occur during the scheduled final period for the class, Thursday, December 15, 2:00 PM - 5:00 PM. Any group member who does not attend the presentation will receive a 0 for the entire project.


Grading

Your writeup (and accompanying code) and presentation will be graded out of 100 points.

Grading of the project will take into account:


Submission

All work will be submitted via the GitHub project repository. This should include your writeup and all code, as well as any slides or other materials used during the presentation.


Teamwork and grading

Team scores for both the proposal and the poster will be adjusted based on team peer evaluation data to determine each student’s individual grade. You will be asked to fill out a survey where you rate the contribution of each team member. Filling out the survey is a prerequisite for receiving a project score.