Today’s agenda

  • Review App Ex from last time
    • Recap modeling with logged response variable
  • Multiple linear regression
    • So you can explain more of the variability in the response variable
  • Interaction variables
    • So you can start building models that better reflect reality
  • Model selection
    • So you can decide on the best model(s)
  • Due Tuesday:
    • App Ex 5: Requires consultation with Sandra

Initial setup

Load packages & data + data fixes

Load packages:

library(ggplot2)
library(dplyr)
library(stringr)

Load data:

pp <- read.csv("paris_paintings.csv", stringsAsFactors = FALSE) %>%
  as_tibble()  # tbl_df() is deprecated in current dplyr

Fix prices:

pp <- pp %>%
  mutate(price = as.numeric(str_replace_all(price, ",", "")))  # strip every thousands separator
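
A small sketch of why the comma handling matters when converting prices to numbers. The price strings below are made up for illustration and are not taken from the data. str_replace() removes only the first match in each string, whereas str_replace_all() removes every match:

```r
library(stringr)

# Made-up price strings (an assumption, not values from paris_paintings.csv)
raw <- c("360", "1,200", "1,250,000")

# Only the first comma is removed, so "1,250,000" becomes "1250,000",
# which as.numeric() cannot parse (it returns NA with a warning)
cleaned_first <- suppressWarnings(as.numeric(str_replace(raw, ",", "")))

# Every comma is removed, so all three strings parse
cleaned_all <- as.numeric(str_replace_all(raw, ",", ""))

cleaned_first  # 360 1200 NA
cleaned_all    # 360 1200 1250000
```

Counting missing values before and after the conversion, for example with sum(is.na(pp$price)), is a quick way to check whether any prices were lost.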

More data fixes

Fix shape coding:

pp <- pp %>%
  mutate(shape_recode = ifelse(Shape == "", NA,
                               ifelse(Shape == "ovale", "oval",
                                      ifelse(Shape == "ronde", "round",
                                             ifelse(Shape == "octogon", "octagon", Shape)))))

Fix material coding:

pp <- pp %>%
  mutate(mat_recode = ifelse(mat %in% c("a", "bc", "c"), "metal",
                             ifelse(mat %in% c("al", "ar", "m"), "stone",
                                    ifelse(mat %in% c("co", "bt", "t"), "canvas",
                                           ifelse(mat %in% c("p", "ca"), "paper",
                                                  ifelse(mat %in% c("b"), "wood",
                                                         ifelse(mat %in% c("o", "e", "v"), "other",
                                                                ifelse(mat %in% c("n/a", ""), NA,
                                                                       "uncertain"))))))))
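
Deeply nested ifelse() calls can be hard to read. One possible alternative, shown here as a sketch on a toy tibble rather than on pp, is dplyr's case_when(), which checks the conditions in order and uses the first one that matches. The toy data and the name toy are made up for illustration; the code-to-material mapping is copied from the recode above.

```r
library(dplyr)

# Toy data (hypothetical), one code per branch we want to illustrate
toy <- tibble(mat = c("a", "t", "b", "", "xyz"))

toy <- toy %>%
  mutate(mat_recode = case_when(
    mat %in% c("a", "bc", "c")   ~ "metal",
    mat %in% c("al", "ar", "m")  ~ "stone",
    mat %in% c("co", "bt", "t")  ~ "canvas",
    mat %in% c("p", "ca")        ~ "paper",
    mat %in% c("b")              ~ "wood",
    mat %in% c("o", "e", "v")    ~ "other",
    mat %in% c("n/a", "")        ~ NA_character_,  # typed NA keeps the column character
    TRUE                         ~ "uncertain"     # anything not listed above
  ))

toy$mat_recode  # "metal" "canvas" "wood" NA "uncertain"
```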

Multiple linear regression

From last time…

The linear model with multiple predictors

  • Population model: \[ \hat{y} = \beta_0 + \beta_1~x_1 + \beta_2~x_2 + \cdots + \beta_k~x_k \]

  • Sample model that we use to estimate the population model: \[ \hat{y} = b_0 + b_1~x_1 + b_2~x_2 + \cdots + b_k~x_k \]
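
To make the notation concrete, here is a minimal sketch that fits a two-predictor model with lm(). It uses simulated data, not the paintings data, so the variable names (x1, x2, y, sim, fit) and the coefficient values 2, 0.5, and -1.5 are assumptions chosen for illustration. Because the data are simulated with known coefficients, the estimates b_0, b_1, b_2 can be compared against them.

```r
set.seed(1)

# Simulate data from a known linear relationship (all values are made up)
n  <- 200
x1 <- runif(n, 0, 10)        # a numerical predictor
x2 <- rbinom(n, 1, 0.5)      # a 0/1 indicator predictor
y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(n, sd = 1)
sim <- data.frame(y, x1, x2)

# Fit the sample model: y-hat = b0 + b1 * x1 + b2 * x2
fit <- lm(y ~ x1 + x2, data = sim)

# The estimates should land close to 2, 0.5, and -1.5, but not exactly on them
coef(fit)
```

summary(fit) additionally reports standard errors and R-squared for the fitted model.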

Main effects

Price, surface, and living artist

Very few paintings with Surface >= 5000:

Price, surface, and living artist

For simplicity, let’s focus on the paintings with Surface < 5000:
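
One way to make this restriction is with dplyr's filter(). The sketch below runs on a toy tibble with made-up values so that it is self-contained; the names toy and toy_small are hypothetical, and the same call would apply to pp.

```r
library(dplyr)

# Toy data (made-up surfaces and prices, not from paris_paintings.csv)
toy <- tibble(
  Surface = c(120, 4999, 5000, 12000, NA),
  price   = c(360, 50, 1200, 800, 75)
)

# Keep only rows with Surface strictly below 5000.
# filter() also drops rows where the condition is NA (missing Surface).
toy_small <- toy %>%
  filter(Surface < 5000)

nrow(toy_small)  # 2

# Applied to the paintings data this might look like (object name is an assumption):
# pp_small <- pp %>% filter(Surface < 5000)
```

Note that paintings with a missing Surface are excluded along with the very large ones, so it is worth comparing row counts before and after filtering.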