HW 01 - Data wrangling and visualization

Due: Thursday, Jan 30 at 11:59pm

This homework is based on the article The Ultimate Halloween Candy Power Ranking, which analyzed over a quarter million head-to-head matchups pitting 86 candies against each other.

The exercises in this homework are deliberately more open-ended than ones you’ve seen previously in the course. Please refer to previous labs and lecture slides for examples of code you may have to use in completing this assignment.

Make at least three (3) commits for this homework assignment, and don’t forget to give your code chunks meaningful names.

Getting Started

Familiar steps

Creating a README file in GitHub

A README file is often the first thing visitors see when they go to your GitHub repository. As such, it is a good place to put information such as what the repository contains and other administrative details. README files are written in plain Markdown and have a .md extension.

This homework assignment repository contains a template README file that you can edit. There are two ways that you can edit the README.

Packages and data

In this assignment we will work with the tidyverse and fivethirtyeight packages. Load the packages into the Console (they have already been installed for you).

library(tidyverse)
library(fivethirtyeight)

The dataset we are using is named candy_rankings.
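As a minimal sketch of a first look at the data, assuming both packages load without error, the calls below print a compact overview and open the help file. Run them in the Console; neither call changes the dataset.

```r
library(tidyverse)
library(fivethirtyeight)

# Compact overview: dimensions, then one line per variable with its type
glimpse(candy_rankings)

# Open the help file for the dataset, which contains the codebook
?candy_rankings
```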

Exercises

  1. How many observations and variables are in this dataset?

  2. Identify two categorical variables and two numerical variables in this dataset. What do they correspond to? (Hint: the help file contains the codebook)

  3. Create a visualization that examines the relationship between win percentage of a candy in its head-to-head matchups and whether the candy contains chocolate. Describe any relationship you see. The plot must include x and y labels along with a title.

  4. A single dime won 32.26% of its head-to-head matchups. How many candies had lower win percentages than a dime? Create a table that lists these candies in descending order of win percentage along with their win percentage. Only display two decimal places in your table, and only include their names and win percentages.

  5. Create a summary table of mean win percentages comparing candies that contain chocolate vs. candies that do not contain chocolate.

  6. Create a table that displays the top three fruit-flavored candies in terms of head-to-head win percentages. Only include their names and win percents.

  7. Create a new variable called price_sugar_index that is the sum of the price percentile and sugar percentile in the dataset, then plot it against the head-to-head win percentage. Set the color to orange. The plot must include x and y labels along with a title. Describe any relationships you see in this visualization.

  8. Modify your plot in Exercise 7 to facet by whether a candy is bar-shaped. Have the color reflect whether the candy is fruity or not. The plot must include x and y labels along with a title. How do these plots compare to each other?

  9. Modify the README file associated with your GitHub repository to describe what this homework assignment is about. In the README file, include your name in bold, your lab section, and your favorite candy in the dataset in italics. (If you’d rather have a quarter than any of those candies, you can also choose a quarter!)
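The exercises above draw on a handful of tidyverse patterns from earlier labs. The sketch below illustrates those patterns on a made-up toy data frame: `snacks` and all of its columns and values are hypothetical and are not taken from candy_rankings. It is one possible way to combine these functions, not a template for any particular exercise.

```r
library(tidyverse)

# Made-up toy data for illustration only (not from candy_rankings)
snacks <- tibble(
  name   = c("apple chips", "trail mix", "pretzels", "granola bar"),
  salty  = c(FALSE, TRUE, TRUE, FALSE),
  price  = c(0.35, 0.80, 0.20, 0.65),
  rating = c(41.237, 68.912, 55.004, 72.451)
)

# Keep rows below a cutoff, sort from highest to lowest rating,
# keep two columns, and round the rating to two decimal places
low_rated <- snacks %>%
  filter(rating < 60) %>%
  arrange(desc(rating)) %>%
  select(name, rating) %>%
  mutate(rating = round(rating, 2))

# Summary table: mean rating within each level of a logical variable
mean_by_salty <- snacks %>%
  group_by(salty) %>%
  summarise(mean_rating = mean(rating))

# Scatterplot with a fixed point color, axis labels, and a title
p <- ggplot(snacks, aes(x = price, y = rating)) +
  geom_point(color = "orange") +
  labs(
    x = "Price",
    y = "Rating",
    title = "Rating vs. price for four made-up snacks"
  )

low_rated
mean_by_salty
p
```

Setting `color = "orange"` outside `aes()` gives every point the same color. Mapping a variable to color inside `aes()` instead colors points by that variable's values.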

Submission

Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.

Please upload only your PDF document to Gradescope. Associate the “Overall” graded section with the first page of your PDF, and mark the page on which the answer to each exercise appears. If an answer spans multiple pages, mark all of them.