Each team should have a new private repo called Team#_hw1 on GitHub. The repo should contain README.md, .gitignore, and hw1.Rmd. The latter contains the R Markdown template for this assignment.
Your first task will be to clone this repo from GitHub into RStudio by creating a new HW1 project. Once you have done that, you can edit README.md to include some relevant information about the assignment, your team members, etc. Make sure that you can commit and push these changes, and that they show up on GitHub.
To complete the assignment, read the rest of the description below and answer the included questions. All of your answers (both code and write-up) should be added to hw1.Rmd, which should then be pushed to GitHub.
The diamonds dataset that we will use in this application exercise consists of prices and quality information for about 54,000 diamonds, and is included in the ggplot2 package.
We installed ggplot2 last time, so we don’t need to install it again. However, each time we launch R we need to explicitly load any necessary packages:
library(ggplot2)
To familiarize yourself with the dataset you can view the help file associated with it, or open up the dataset in RStudio’s data viewer. To do so, run either of the following commands in the Console.
?diamonds
View(diamonds)
Another function that you’ll find very useful for quickly taking a peek at a dataset is str. This function compactly displays the internal structure of an R object.
str(diamonds)
## Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
The output above tells us that there are 53,940 observations and 10 variables in the dataset. The variable names are listed, along with their types and the first few observations of each variable.
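The same facts can be checked programmatically instead of being read off the str output. This is a minimal sketch, assuming ggplot2 is installed; the name var_types is only an illustrative choice.

```r
library(ggplot2)  # provides the diamonds dataset

# Number of observations (rows) and variables (columns)
dim(diamonds)

# Variable names
names(diamonds)

# First class of each variable; ordered factors report "ordered"
var_types <- vapply(diamonds, function(col) class(col)[1], character(1))
var_types

# First few rows, a console alternative to View()
head(diamonds)
```

Unlike View, which opens RStudio's data viewer, head prints to the console, so it also works when the code is knitted from hw1.Rmd.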
The dataset contains the price of each diamond (in 2008 US dollars), as well as various attributes, some of which are known to influence price: the 4 Cs (carat, cut, color, and clarity) and some physical measurements (depth, table, x, y, and z). The figure below shows what these measurements represent.
[Figure: diamond measurements]
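According to the ggplot2 help file for diamonds, x, y, and z are the length, width, and depth in mm, and the depth variable is the total depth percentage, z / mean(x, y). As a rough sketch, this relationship can be checked against the first diamond in the str output above. The helper name depth_pct is hypothetical and is not part of ggplot2.

```r
# Total depth percentage from the three dimensions (all in mm).
# depth_pct is an illustrative helper, not a ggplot2 function.
depth_pct <- function(x, y, z) {
  100 * 2 * z / (x + y)
}

# Values for the first diamond, copied from the str() output
first_depth <- depth_pct(x = 3.95, y = 3.98, z = 2.43)
first_depth
```

The result is close to, though not identical to, the recorded depth of 61.5 for that diamond, since the recorded measurements are rounded.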
Carat is a unit of mass equal to 200 mg and is used for measuring gemstones and pearls. Cut grade is an objective measure of a diamond’s light performance, or what we generally think of as sparkle.
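The carat definition translates directly into a unit conversion, which can help when judging how large the diamonds in the dataset really are. A small sketch follows; the function names carat_to_mg and carat_to_g are made up for this example.

```r
# Convert carat weight to mass, using 1 carat = 200 mg.
# Both helper names are illustrative, not from any package.
carat_to_mg <- function(carat) carat * 200
carat_to_g <- function(carat) carat_to_mg(carat) / 1000

carat_to_mg(0.23)  # the first diamond in the dataset, in milligrams
carat_to_g(5)      # a 5-carat stone weighs 1 gram
```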
The figure below shows the color grading of diamonds:
Lastly, the figure below shows clarity grading of diamonds:
As a team, browse the data and select three variables that you think are interesting and that you think may have an interesting relationship.
Data properties - For each of the variables you selected answer the following question: What are some of the properties of these variables that will be relevant (useful? problematic?) for data visualization?
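One way to start on the data-properties question is to summarize each candidate variable. The sketch below assumes, purely for illustration, that the team picked carat, cut, and price; substitute your own three variables. The names vars, n_missing, and n_distinct are likewise hypothetical.

```r
library(ggplot2)

# Hypothetical selection; replace with the variables your team chose
vars <- c("carat", "cut", "price")

# Quartile summaries for numeric variables, counts for factors
summary(diamonds[vars])

# Ordered factors have a natural ordering that charts should respect
levels(diamonds$cut)
is.ordered(diamonds$cut)

# Missing values and number of distinct values per variable
n_missing <- colSums(is.na(diamonds[vars]))
n_distinct <- vapply(diamonds[vars],
                     function(col) length(unique(col)),
                     integer(1))
n_missing
n_distinct
```

Properties such as variable type, ordering, skew, and the number of distinct values tend to matter for encoding choices, for example whether a variable suits a position, color, or facet mapping.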
Select and design two different charts that visualize the relationship between these three variables. Each chart should have a different “purpose” that guides your choices and justifies the differences between the charts. Answer the following questions about your charts:
Chart properties - What two chart types have you selected? What are some of the properties of these charts, and how do those properties match with your variables of interest?
Tasks - What task(s) should users of each chart be able to undertake? Why are these tasks important for these variables?
Design context - What design choices did you make to try to help users accomplish the intended tasks?
User skills - What type of audience did you intend the chart for? What types of skills does this audience bring to the understanding of your charts? What elements of your charts might be difficult for the audience to understand, and how do you justify the choices you have made about those elements?
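As a starting point only, here is a sketch of two charts with different purposes for a hypothetical trio of carat, cut, and price. The object names p_scatter and p_box, the alpha value, and the choice of three carat bins are assumptions made for illustration, not a recommended answer.

```r
library(ggplot2)

# Chart 1: scatterplot for seeing the overall price-carat trend,
# with cut mapped to color
p_scatter <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.2) +
  labs(x = "Carat", y = "Price (2008 US dollars)", color = "Cut")

# Chart 2: boxplots for comparing price distributions across cuts,
# within three carat bins holding roughly equal numbers of diamonds
p_box <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  facet_wrap(~ cut_number(carat, 3)) +
  labs(x = "Cut", y = "Price (2008 US dollars)")

p_scatter
p_box
```

The scatterplot favors spotting trends and outliers across individual diamonds, while the faceted boxplots favor comparing typical prices between cut grades at similar sizes. Your own charts should follow from the tasks and audience you identify in the questions above.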