Packages and Data

library(tidyverse)
library(infer)

The songs data set contains the length, in minutes, of each of the 3,000 songs on one person’s phone (treat these songs as the entire population).

songs <- read_csv("data/songs.csv")

Exercises

Exercise 1

What are the population mean and standard deviation of song length?

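songs <- songs %>% 
  # The first column (X1 under readr < 2.0, ...1 under readr >= 2.0) is the
  # song id; selecting it by position works with either naming scheme
  rename(id = 1,
         length_minutes = length) %>% 
  mutate(length_seconds = length_minutes * 60)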
songs_plot <- songs %>% 
  ggplot(aes(x = length_minutes)) +
  geom_histogram(binwidth = .25, alpha = .5, color = "violet") +
  labs(x = "Song length (minutes)", y = "Count") +
  theme_minimal(base_size = 16)

songs_plot

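# sd() divides by N - 1 (sample SD); for the whole population divide by N
songs %>% 
  summarise(mean_length = mean(length_minutes),
            sd_length   = sqrt(mean((length_minutes - mean(length_minutes))^2)))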
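R's sd() returns the sample standard deviation, which divides the sum of squared deviations by N - 1. Because these 3,000 songs are treated as the whole population, the population standard deviation (divide by N) is the matching quantity, although the two differ only by a factor of sqrt(2999/3000) ≈ 0.99983 here. The sketch below shows the difference on a small vector; x and pop_sd are hypothetical names introduced for illustration, not part of the lab data.

```r
# Population vs. sample standard deviation (illustrative sketch).
# x is a small hypothetical vector, not the songs data.
x <- c(3, 4, 5, 4)

pop_sd <- function(v) {
  # Average the squared deviations over N (population), not N - 1 as sd() does
  sqrt(mean((v - mean(v))^2))
}

pop_sd(x)  # 0.7071068
sd(x)      # 0.8164966

# The two are related by a factor of sqrt((N - 1) / N)
n <- length(x)
sd(x) * sqrt((n - 1) / n)  # 0.7071068
```

Applied to the lab data, pop_sd(songs$length_minutes) gives the population value.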

Exercise 2

What is the probability that a randomly selected song is longer than 5 minutes?

songs_plot +
  # ggplot2 >= 3.4.0 uses linewidth (not size) for line thickness
  geom_vline(xintercept = 5, lty = 2, linewidth = 1, color = "darkblue")
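The plot shows where the 5-minute cutoff falls, but the probability itself is a count. If every song in the population is equally likely to be selected, the probability that a random song is longer than 5 minutes equals the proportion of songs longer than 5 minutes. The sketch below assumes that equal-probability draw; prob_longer_than and the lengths vector are hypothetical names used only for illustration.

```r
# Probability that a randomly selected song exceeds a cutoff (sketch).
# Assumes each song in the population is equally likely to be drawn,
# so the probability is the proportion of songs above the cutoff.
prob_longer_than <- function(lengths, cutoff) {
  mean(lengths > cutoff)  # TRUE counts as 1 and FALSE as 0
}

# Hypothetical song lengths in minutes, for illustration only
lengths <- c(3.2, 4.8, 5.1, 6.0, 2.9)
prob_longer_than(lengths, 5)  # 0.4 (two of the five songs exceed 5 minutes)
```

With the lab data, the same idea is mean(songs$length_minutes > 5), using the length_minutes column created in Exercise 1.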