September 7, 2017
Is everyone sitting with their team?
Any questions on material from last time?
Any questions on homework?
A few of you still don't have photos on your GitHub profiles
openintro package
For the following we'll be using the email dataset from the openintro package:
library(tidyverse)
library(openintro)
How would you describe the shape of this distribution?
ggplot(email, aes(x = num_char)) + geom_histogram(binwidth = 5) + labs(x = "Number of characters (in thousands)")
Which of the following seems like a reasonable binwidth?
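One way to judge a binwidth is to draw the same histogram several times and compare. The sketch below assumes the tidyverse and openintro packages are installed; the binwidths 1, 5, and 20 are illustrative choices, not necessarily the options shown on the slide. Note that num_char is recorded in thousands of characters, so binwidth = 5 groups emails into bins 5,000 characters wide.

```r
# Sketch: compare histograms of num_char at a few illustrative binwidths
library(tidyverse)
library(openintro)

binwidths <- c(1, 5, 20)

# Build one histogram per binwidth, labelling each with its binwidth
plots <- lapply(binwidths, function(bw) {
  ggplot(email, aes(x = num_char)) +
    geom_histogram(binwidth = bw) +
    labs(x = "Number of characters (in thousands)",
         title = paste("binwidth =", bw))
})

# Draw the plots one after another
for (p in plots) print(p)
```

Too small a binwidth makes the shape noisy; too large a binwidth hides the detail in the bulk of the data near zero.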
TEAM: There are 3921 emails in this dataset. What is roughly the median number of line breaks in emails in this dataset? Is the average (mean) expected to be higher or lower than that value, and why?
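After estimating the median from the plot, you can check your answer numerically. This is a minimal sketch assuming the tidyverse and openintro packages are installed and that the line-break counts are stored in the line_breaks column of email. For a right-skewed distribution, the long right tail pulls the mean above the median.

```r
# Sketch: compare the median and mean number of line breaks
library(tidyverse)
library(openintro)

line_break_summary <- email %>%
  summarise(
    n = n(),                                   # number of emails
    median_line_breaks = median(line_breaks),  # middle value
    mean_line_breaks = mean(line_breaks)       # sensitive to the long tail
  )

line_break_summary
```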
How do the distributions of number of characters vary between emails that contain no numbers, small numbers, or big numbers?
ggplot(data = email, aes(x = number, y = num_char)) + geom_boxplot() + labs(x = "No number, small number (<1 million), or big number", y = "Number of characters (in thousands)")
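A boxplot summarizes each group by its median and quartiles, so the same comparison can be made with a grouped summary. This sketch assumes the tidyverse and openintro packages are installed; the median and interquartile range (IQR) correspond to the middle line and the height of each box.

```r
# Sketch: numeric counterpart of the boxplot, one row per level of `number`
library(tidyverse)
library(openintro)

num_char_by_number <- email %>%
  group_by(number) %>%
  summarise(
    n = n(),                               # emails in each group
    median_num_char = median(num_char),    # middle line of the box
    iqr_num_char = IQR(num_char)           # height of the box
  )

num_char_by_number
```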
Which of the following is a more useful representation for evaluating whether emails with subjects that start with "re:" are more likely to be categorized as spam or not?
ggplot(email, aes(x = spam, fill = re_subj)) + geom_bar(position = "dodge") + labs(x = "", fill = "re:", title = 'position: "dodge"')
ggplot(email, aes(x = spam, fill = re_subj)) + geom_bar(position = "fill") + labs(x = "", fill = "re:", title = 'position: "fill"')
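With position = "fill", each bar is scaled to height 1, so the segments show proportions within each spam category rather than raw counts. The sketch below (assuming the tidyverse and openintro packages are installed) computes those same conditional proportions directly.

```r
# Sketch: the proportions that a position = "fill" bar chart displays
library(tidyverse)
library(openintro)

re_props <- email %>%
  count(spam, re_subj) %>%       # counts for each spam / re: combination
  group_by(spam) %>%
  mutate(prop = n / sum(n)) %>%  # proportions within each spam category
  ungroup()

re_props
```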
starwars <- starwars %>% filter(mass < 500)
ggplot(data = starwars, aes(x = height, y = mass, color = gender)) + geom_point()
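Before filtering, it is worth checking which rows a filter drops. The sketch below assumes the tidyverse is installed and refers to dplyr::starwars explicitly, so it uses the unfiltered data even after starwars has been overwritten. Besides the very heavy characters, filter(mass < 500) also drops characters with a missing mass, because the comparison evaluates to NA and filter() keeps only rows where the condition is TRUE.

```r
# Sketch: inspect the rows removed by filter(mass < 500)
library(tidyverse)

# Characters at or above the 500 kg cutoff
heavy <- dplyr::starwars %>%
  filter(mass >= 500) %>%
  select(name, height, mass)

# Characters with no recorded mass; these are dropped too
no_mass <- dplyr::starwars %>%
  filter(is.na(mass)) %>%
  select(name, height, mass)

heavy
nrow(no_mass)
```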
ggplot(data = starwars, aes(x = height, y = mass, color = gender)) + geom_point() + labs(title = "Mass vs. height of Star Wars characters")
ggplot(data = starwars, aes(x = height, y = mass, color = gender)) + geom_point() + labs(title = "Mass vs. height of Star Wars characters", x = "Height (in cm)", y = "Mass (in kg)")
ggplot(data = starwars, aes(x = height, y = mass, color = gender)) + geom_point() + labs(title = "Mass vs. height of Star Wars characters", x = "Height (in cm)", y = "Mass (in kg)", color = "Gender")
ggplot(data = starwars, aes(x = height, y = mass, color = gender)) + geom_point() + labs(title = "Mass vs. height of Star Wars characters", x = "Height (in cm)", y = "Mass (in kg)", color = "Gender") + xlim(c(100, 200)) + ylim(c(40, 120))
## Warning: Removed 14 rows containing missing values (geom_point).
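The warning appears because xlim() and ylim() set the scale limits: points outside the limits are treated as missing and removed before plotting. To zoom in without removing any data, you can set the limits on the coordinate system instead. This sketch assumes the tidyverse is installed and re-creates the filtered data under the illustrative name starwars_small. For a scatterplot the picture looks the same, but for layers that compute statistics (e.g. a smoother) the two approaches can give different results.

```r
# Sketch: zoom with coord_cartesian() instead of xlim()/ylim()
library(tidyverse)

starwars_small <- dplyr::starwars %>% filter(mass < 500)

p <- ggplot(data = starwars_small, aes(x = height, y = mass, color = gender)) +
  geom_point() +
  labs(title = "Mass vs. height of Star Wars characters",
       x = "Height (in cm)", y = "Mass (in kg)", color = "Gender") +
  # Only the visible window changes; out-of-range points are not
  # converted to NA the way scale limits convert them
  coord_cartesian(xlim = c(100, 200), ylim = c(40, 120))

p
```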
ggplot(data = starwars, aes(x = height, y = mass, color = gender)) + geom_point(alpha = 0.3) + labs(title = "Mass vs. height of Star Wars characters", x = "Height (in cm)", y = "Mass (in kg)", color = "Gender")
ggplot(data = starwars, aes(x = height, y = mass, color = gender)) + geom_point() + labs(title = "Mass vs. height of Star Wars characters", x = "Height (in cm)", y = "Mass (in kg)", color = "Gender") + theme_dark()
# install.packages("ggthemes")
library(ggthemes)
ggplot(data = starwars, aes(x = height, y = mass, color = gender)) + geom_point() + labs(title = "Mass vs. height of Star Wars characters", x = "Height (in cm)", y = "Mass (in kg)", color = "Gender") + theme_fivethirtyeight()
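Adding a theme to each plot works, but if you want the same look for every plot in a session you can set a default instead. This is a sketch assuming the tidyverse is installed; theme_minimal() is just an example choice. theme_set() returns the previous default theme, so it can be restored afterwards.

```r
# Sketch: set a session-wide default theme, then restore the old one
library(tidyverse)

old_theme <- theme_set(theme_minimal())  # keep the previous default

p <- ggplot(data = filter(dplyr::starwars, mass < 500),
            aes(x = height, y = mass, color = gender)) +
  geom_point()

p  # drawn with theme_minimal() without adding it to the plot

theme_set(old_theme)  # restore the previous default
```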
Spatial data, also known as geospatial data or geographic information, is data that identifies the geographic location of features and boundaries on Earth, such as natural or constructed features, oceans, and more.
Spatial data is usually stored as coordinates and topology, and is data that can be mapped.
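As a small illustration of coordinates that can be mapped, ggplot2's map_data() turns a map from the maps package into a data frame with one row per boundary point (longitude, latitude, and a group identifying each polygon). This sketch assumes the tidyverse and maps packages are installed; the colors are arbitrary.

```r
# Sketch: draw US state boundaries from a data frame of coordinates
library(tidyverse)  # map_data() also requires the maps package

us_states <- map_data("state")  # columns include long, lat, group, region

p <- ggplot(us_states, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "gray40") +  # one polygon per group
  coord_quickmap() +                                 # approximate map aspect ratio
  labs(x = "Longitude", y = "Latitude")

p
```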
TEAM: Sketch a tidy data frame that could be used to generate the following map of where Hurricane Irma is headed. What goes in the rows, and what goes in the columns? (See the original interactive map at https://nyti.ms/2x7nKLD.)
Describe the spatial distribution of the preferred sweetened carbonated beverage.
What is missing in this visualization?
I will post instructions on caching your password for GitHub. Follow along, ask on Slack if you get stuck, or come to office hours on Monday and we can do it together.
Mini Homework 04