# Main effects

## Price, surface, and living artist

Very few paintings have `Surface >= 5000`.

For simplicity, let’s focus on the paintings with `Surface < 5000`.
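One way to restrict the data to paintings with `Surface < 5000` is `dplyr::filter()`. The sketch below is only an illustration: `toy` is a made-up data frame standing in for the real data, and its values are assumptions, not actual paintings.

```r
library(dplyr)

# Hypothetical toy data standing in for the paintings data; values are made up
toy <- data.frame(
  Surface = c(120, 800, 4999, 5000, 12000, NA),
  price   = c(30, 150, 400, 900, 2500, 60)
)

# Keep only rows with Surface strictly below 5000.
# filter() also drops rows where the condition is NA (missing Surface).
toy_small <- toy %>%
  filter(Surface < 5000)

nrow(toy_small)
```

Note that rows with a missing `Surface` are dropped as well, which may or may not be what you want in a given analysis.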

9. Multiple regression, interaction effects, and model selection

- Review App Ex from last time
- Recap modeling with logged response variable

- Multiple linear regression
  - So you can explain more of the variability in the response variable

- Interaction variables
  - So you can start building models that better reflect reality

- Model selection
  - So you can decide on the best model(s)

**Due Tuesday:**

- App Ex 5: Requires consultation with Sandra

Load packages:

```
library(ggplot2)
library(dplyr)
library(stringr)
```

Load data:

```
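# Read the data, keeping strings as character vectors, then convert to a tibble
# (as_tibble() replaces the deprecated tbl_df())
pp <- read.csv("paris_paintings.csv", stringsAsFactors = FALSE) %>%
  as_tibble()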
```

Fix prices:

```
# Remove thousands separators (every comma, not just the first), then convert to numeric
pp <- pp %>%
  mutate(price = as.numeric(str_replace_all(price, ",", "")))
```
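A quick check of why the choice of replacement function can matter when cleaning prices: `str_replace()` replaces only the first match in each string, while `str_replace_all()` replaces every match. The price strings below are made up for illustration and are not taken from the data.

```r
library(stringr)

# Hypothetical price strings, not taken from the data
raw <- c("360", "1,250", "1,000,000")

# First-match-only replacement leaves a comma in the last value
first_only <- str_replace(raw, ",", "")
first_only

# Replacing all commas gives strings that convert cleanly to numbers
clean <- str_replace_all(raw, ",", "")
as.numeric(clean)
```

If no price in the data has more than one comma, the two functions give the same result.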

Fix shape coding:

```
# Treat empty strings as missing and translate or correct the shape labels
pp <- pp %>%
  mutate(shape_recode = ifelse(Shape == "", NA,
                        ifelse(Shape == "ovale", "oval",
                        ifelse(Shape == "ronde", "round",
                        ifelse(Shape == "octogon", "octagon", Shape)))))
```

Fix material coding:

```
# Collapse the material codes into broader categories
pp <- pp %>%
  mutate(mat_recode = ifelse(mat %in% c("a", "bc", "c"), "metal",
                      ifelse(mat %in% c("al", "ar", "m"), "stone",
                      ifelse(mat %in% c("co", "bt", "t"), "canvas",
                      ifelse(mat %in% c("p", "ca"), "paper",
                      ifelse(mat %in% c("b"), "wood",
                      ifelse(mat %in% c("o", "e", "v"), "other",
                      ifelse(mat %in% c("n/a", ""), NA,
                      "uncertain"))))))))
```

## The linear model with multiple predictors

Population model: \[ \hat{y} = \beta_0 + \beta_1~x_1 + \beta_2~x_2 + \cdots + \beta_k~x_k \]

Sample model that we use to estimate the population model: \[ \hat{y} = b_0 + b_1~x_1 + b_2~x_2 + \cdots + b_k~x_k \]
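To make the notation concrete, here is a minimal sketch of fitting a sample model with two predictors using `lm()`. The data frame `toy` and its columns `x1`, `x2`, and `y` are made up for illustration. Because `y` is constructed as exactly 1 + 2 x1 + 3 x2 with no noise, the estimates b0, b1, and b2 should recover those values.

```r
# Hypothetical toy data: y is built exactly as 1 + 2*x1 + 3*x2 (no noise)
toy <- data.frame(
  x1 = c(0, 1, 2, 3, 4, 5),
  x2 = c(1, 0, 2, 1, 3, 0)
)
toy$y <- 1 + 2 * toy$x1 + 3 * toy$x2

# Fit the sample model: y-hat = b0 + b1*x1 + b2*x2
m <- lm(y ~ x1 + x2, data = toy)

# Estimated coefficients b0, b1, b2
b <- coef(m)
round(b, 6)
```

With real data the response is noisy, so the estimates only approximate the population parameters. For the paintings data, an analogous call would presumably use a logged price as the response and `Surface` plus a living-artist indicator as predictors. The name of that indicator column is not shown in this section.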
