To receive credit, you must submit to Gradescope a .pdf file knitted from
a Quarto Markdown file. Be sure to “associate”
questions appropriately on Gradescope. As a reminder, late work
is not accepted outside of the 24-hour grace period for homework
assignments.
The Quarto template for this assignment may be found in the
repository at the following link: https://classroom.github.com/a/vaPwqU8b
These data contain pocket measurements for 80 pairs of jeans from
popular US brands, as mentioned in the Pudding article available here. Please read the
article before starting this assignment (it’s short and pretty
interesting!). For a description of the variables, see the data
dictionary here.
Important: Part of your grade on this assignment
will be based on meaningful commit messages. For the purposes
of this assignment, you must commit and push your changes after Exercise
2 and again after Exercise 4 (of course, you’re welcome to commit/push
more often than that!). Also, don’t forget to change your name in the
Quarto template.
- In this exercise we will create three new variables. First, create a
binary variable that indicates whether the style of a pair of jeans
is skinny/slim or boot cut/regular/straight. Next, create a new
variable corresponding to the "maximum rectangle" of the front pocket.
This variable is defined as the max height of the front pocket
multiplied by the max width of the front pocket. Finally, create
a similar variable corresponding to the "maximum rectangle" of the
back pocket. This variable is defined as the max height of the back
pocket multiplied by the max width of the back pocket. Give all three
variables meaningful names.
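
  As a starting point, the three variables could be built in a single `mutate()` call. The sketch below is not a full solution: it runs on a tiny made-up data frame, and the column names (`style`, `maxHeightFront`, and so on) and style labels are assumptions that should be checked against the data dictionary.

  ```r
  library(dplyr)

  # Toy stand-in for the real data; values and column names are hypothetical.
  jeans_toy <- tibble(
    style          = c("skinny", "slim", "straight", "boot-cut", "regular"),
    maxHeightFront = c(14, 15, 22, 23, 21),
    maxWidthFront  = c(16, 16, 17, 17, 16),
    maxHeightBack  = c(15, 14, 16, 16, 15),
    maxWidthBack   = c(13, 13, 14, 14, 14)
  )

  jeans_toy <- jeans_toy |>
    mutate(
      # Binary style indicator: skinny/slim vs. every other style
      style_group = if_else(
        style %in% c("skinny", "slim"),
        "Skinny/slim",
        "Boot cut/regular/straight"
      ),
      # "Maximum rectangle" = max height multiplied by max width
      front_max_rect = maxHeightFront * maxWidthFront,
      back_max_rect  = maxHeightBack * maxWidthBack
    )
  ```

  Storing the style indicator as readable text (rather than 0/1) means the values can later double as facet titles.
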
- Create a visualization that summarizes the relationship between the
"maximum rectangle" of the front pocket and the "maximum rectangle"
of the back pocket. In this visualization, color code the
observations by whether the pair of jeans is marketed toward men or
women, and facet your plot by the binary style variable you created.
The facets should be placed side by side (i.e., one row, two panels). Make
sure your plot has clear, informative labels throughout, including axes,
legends, and facet titles (i.e., don’t use the defaults). Provide a
meaningful title and subtitle that convey an interesting insight from the
data; do not simply describe which variables you are plotting
(e.g., a title along the lines of "x vs. y vs. z").
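
  The overall plot structure described above might look like the following sketch. It uses a made-up data frame, and every column name (`menWomen`, `style_group`, `front_max_rect`, `back_max_rect`) is an assumption for illustration; the title and subtitle are deliberately left as placeholders, since they should come from what your own plot shows.

  ```r
  library(ggplot2)

  # Toy stand-in for the data after the new variables are added;
  # all names and values here are hypothetical.
  jeans_toy <- data.frame(
    menWomen       = c("men", "women", "men", "women"),
    style_group    = c("Skinny/slim", "Skinny/slim",
                       "Boot cut/regular/straight", "Boot cut/regular/straight"),
    front_max_rect = c(360, 230, 390, 250),
    back_max_rect  = c(220, 200, 230, 210)
  )

  p <- ggplot(
    jeans_toy,
    aes(x = front_max_rect, y = back_max_rect, color = menWomen)
  ) +
    geom_point() +
    # One row, two panels; facet titles come from the style_group values
    facet_wrap(~ style_group, nrow = 1) +
    # Replace the raw category values in the legend with readable labels
    scale_color_discrete(labels = c(men = "Men", women = "Women")) +
    labs(
      x = "Front pocket maximum rectangle (max height x max width)",
      y = "Back pocket maximum rectangle (max height x max width)",
      color = "Marketed toward",
      title = "Placeholder: state the main insight here",
      subtitle = "Placeholder: add supporting detail here"
    )

  p
  ```
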
- Given the variables created in Exercise 1 and the visualization
constructed in Exercise 2, what can you say about pockets in jeans
marketed to men vs. to women? What about differences by style? Do such
differences themselves appear to vary between male-coded and female-coded
jeans? Does your visualization support the storyline from the Pudding
article?
- Create a visualization that simply plots the maximum width of the back
pocket against the maximum height of the back pocket (no need to
distinguish jeans marketed to men from those marketed to women, but do label and title the
plot meaningfully). How many points appear to be plotted? How many
observations are there in the dataset? With these two things in mind,
what are the potential dangers of displaying this plot? Suggest a
strategy that might mitigate these issues.
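
  One way to compare the number of visible points with the number of observations is to count how many rows share each (height, width) pair. The sketch below does this on a made-up data frame; the column names `maxHeightBack` and `maxWidthBack` are assumptions to be checked against the data dictionary.

  ```r
  library(dplyr)

  # Toy stand-in for the real data; values and column names are hypothetical.
  back_toy <- tibble(
    maxHeightBack = c(15, 15, 16, 16, 16, 14),
    maxWidthBack  = c(13, 13, 14, 14, 14, 13)
  )

  # Number of observations (rows) in the data
  n_obs <- nrow(back_toy)

  # Rows with identical coordinates are drawn on top of one another,
  # so each distinct pair shows up as a single visible point
  overlap <- back_toy |>
    count(maxHeightBack, maxWidthBack, name = "n_jeans")

  n_points <- nrow(overlap)
  ```

  In this toy example, `n_obs` is 6 but `n_points` is only 3, and the `n_jeans` column records how many pairs of jeans sit behind each visible point.
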