From last time

Application Exercise 16

Explore interactions between variables, and determine whether the coefficient estimates, and the variables selected for the best model for predicting price, vary across levels of the variables under consideration.

Interactions between explanatory variables

Interaction variables

  • Including an interaction effect in the model allows for different slopes, i.e. nonparallel lines.

  • This implies that the regression coefficient for an explanatory variable would change as another explanatory variable changes.

  • This can be accomplished by adding an interaction variable: the product of two explanatory variables.
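To see that an interaction variable really is just the product of two explanatory variables, here is a minimal sketch on hypothetical simulated data. The names `x`, `g`, `d` and the coefficients used to generate `y` are made up for illustration. The sketch fits the same model twice: once with R's `x * g` formula shorthand and once with the product column built by hand.

```r
# Hypothetical simulated data: numerical x and a 0/1 indicator g
set.seed(1)
n = 200
x = runif(n, 10, 50)
g = rbinom(n, 1, 0.5)
y = 5 + 0.01 * x - 0.3 * g + 0.03 * x * g + rnorm(n, sd = 0.5)
d = data.frame(y, x, g)

# Formula shorthand: x * g expands to x + g + x:g
m_formula = lm(y ~ x * g, data = d)

# Same model, with the interaction variable built by hand as the product x * g
d$x_times_g = d$x * d$g
m_manual = lm(y ~ x + g + x_times_g, data = d)

# Identical estimates; only the label of the interaction coefficient differs
cbind(formula = coef(m_formula), manual = coef(m_manual))
```

With the 0/1 indicator `g`, the slope of `x` is the `x` coefficient when `g = 0` and the sum of the `x` and `x:g` coefficients when `g = 1`. That difference in slopes is what produces nonparallel lines.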

Height and landscape

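library(ggplot2)
# log(price) vs. height, coloured by landscape indicator, with a separate OLS line per group
p1 = ggplot(data = pp, aes(x = height_in, y = log(price), colour = as.factor(landsALL)))
p2 = p1 + geom_point()
p3 = p2 + stat_smooth(method = "lm")
p3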

Model output

# height_in * landsALL alone already expands to height_in + landsALL + height_in:landsALL
m1 = lm(log(price) ~ height_in + landsALL + height_in * landsALL, data = pp)
print(summary(m1), digits = 2)
## 
## Call:
## lm(formula = log(price) ~ height_in + landsALL + height_in * 
##     landsALL, data = pp)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -5.23  -1.30  -0.08   1.30   5.30 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4.8748     0.0718    67.9   <2e-16 ***
## height_in            0.0041     0.0026     1.6     0.11    
## landsALL            -0.2822     0.1189    -2.4     0.02 *  
## height_in:landsALL   0.0356     0.0053     6.7    2e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.8 on 3137 degrees of freedom
##   (252 observations deleted due to missingness)
## Multiple R-squared:  0.03,   Adjusted R-squared:  0.029 
## F-statistic:  32 on 3 and 3137 DF,  p-value: <2e-16

Linear model

\[ \begin{align*} \widehat{log(price)}~&=~4.8748 \\ &+~0.0041~height\_in \\ &-~0.2822~landsALL \\ &+~0.0356~height\_in \times landsALL \end{align*} \]

Regression lines

What is the equation of the regression model for paintings with no landscape features?

\[ \widehat{log(price)} = 4.8748 + 0.0041~height\_in - 0.2822 (0) + 0.0356~height\_in \times (0) \] \[ \widehat{log(price)} = 4.8748 + 0.0041~height\_in \]

What is the equation of the regression model for paintings with landscape features?

\[ \widehat{log(price)} = 4.8748 + 0.0041~height\_in - 0.2822 (1) + 0.0356~height\_in \times (1) \] \[ \widehat{log(price)} = 4.8748 + 0.0041~height\_in - 0.2822 + 0.0356~height\_in \] \[ \widehat{log(price)} = 4.5926 + 0.0397~height\_in \]
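The same substitution can be done numerically. This minimal sketch hard-codes the rounded estimates from the output above; the vector `b` and its element names are made up for illustration. With the fitted `m1` in the session, `coef(m1)` would supply the unrounded values under the names `(Intercept)`, `height_in`, `landsALL` and `height_in:landsALL`.

```r
# Estimates copied from the summary(m1) output above
b = c(intercept = 4.8748, height_in = 0.0041,
      landsALL = -0.2822, interaction = 0.0356)

# landsALL = 0: the landsALL and interaction terms vanish
no_lands = c(intercept = b[["intercept"]],
             slope     = b[["height_in"]])

# landsALL = 1: landsALL shifts the intercept, the interaction shifts the slope
lands = c(intercept = b[["intercept"]] + b[["landsALL"]],
          slope     = b[["height_in"]] + b[["interaction"]])

rbind(no_lands, lands)
```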

Model interpretation

no landscape: \(\widehat{log(price)} = 4.8748 + 0.0041~height\_in\)

Paintings with no landscape features that are 0 inches tall are expected on average to cost \(e^{4.8748} \approx\) 131 livres.

For each additional inch on the height of a painting with no landscape features, the price is expected to be higher on average by a factor of \(e^{0.0041} \approx\) 1.0041.

landscape: \(\widehat{log(price)} = 4.5926 + 0.0397~height\_in\)

Paintings with landscape features that are 0 inches tall are expected on average to cost \(e^{4.5926} \approx 99\) livres.

For each additional inch on the height of a painting with landscape features, the price is expected to be higher on average by a factor of \(e^{0.0397} \approx\) 1.04.
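The back-transformations above are easy to check numerically. A minimal sketch using the rounded intercepts and slopes of the two lines; the data frame `fits` and its column names are made up for illustration.

```r
# Intercepts and slopes (log scale) of the two lines derived above
fits = data.frame(
  group     = c("no landscape", "landscape"),
  intercept = c(4.8748, 4.5926),
  slope     = c(0.0041, 0.0397)
)

# exp(intercept): predicted price (livres) at height_in = 0
# exp(slope): multiplicative change in price per additional inch of height
fits$price_at_0in    = exp(fits$intercept)
fits$factor_per_inch = exp(fits$slope)
fits
```

Because the response is on the log scale, the slopes act multiplicatively: roughly 4% higher expected price per additional inch for paintings with landscape features, compared with roughly 0.4% for paintings without them.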

Do these values make sense?

\(\widehat{log(price)} = 4.8748 + 0.0041 height\_in~\) vs. \(~\widehat{log(price)} = 4.5926 + 0.0397 height\_in\)

library(dplyr)
# Mean of log(price) within each landscape group
pp %>%
  group_by(landsALL) %>%
  summarise(mean = mean(log(price)))
## Source: local data frame [2 x 2]
## 
##   landsALL  mean
## 1        0 4.826
## 2        1 5.204


Interactions between numerical explanatory variables

Hold one of the variables constant while changing the other: plug a chosen value (usually the mean) of one variable into the fitted model, which leaves a line in the other variable whose slope depends on the value chosen.
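A minimal sketch of this for two numerical variables, on hypothetical simulated data (`x1`, `x2`, and the generating coefficients are made up). With an `x1:x2` term, the slope of `x1` is the `x1` coefficient plus the `x1:x2` coefficient times the value at which `x2` is held. The sketch evaluates that slope at the mean of `x2` and one standard deviation either side, then checks the mean case against `predict()`.

```r
# Hypothetical simulated data with a numerical-by-numerical interaction
set.seed(2)
n  = 300
x1 = rnorm(n, mean = 20, sd = 5)
x2 = rnorm(n, mean = 10, sd = 2)
y  = 1 + 0.5 * x1 + 0.2 * x2 + 0.05 * x1 * x2 + rnorm(n)
m  = lm(y ~ x1 * x2)
b  = coef(m)

# Slope of x1 at a fixed x2: b["x1"] + b["x1:x2"] * x2
x2_levels = c(low = mean(x2) - sd(x2), mean = mean(x2), high = mean(x2) + sd(x2))
slope_x1  = b[["x1"]] + b[["x1:x2"]] * x2_levels
slope_x1

# Check via predictions: change in fitted y when x1 rises by 1, x2 held at its mean
nd = data.frame(x1 = c(20, 21), x2 = mean(x2))
diff(predict(m, newdata = nd))
```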

Third order interactions

  • Can you? Yes
  • Should you? Probably not…

Interaction variables and model selection

  • If main effects are dropped from the model, you should also drop the interaction effects that involve them.
  • Do not include an interaction effect without also including the main effects of the explanatory variables involved.
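In R this rule is built into the base `stats` functions for dropping terms. A small sketch with made-up variable names and simulated data: `drop.scope()` lists only the terms whose removal leaves every remaining interaction with its main effects, and `drop1()`, which `step()` uses for backward moves, follows the same rule.

```r
# Terms that can be dropped without orphaning an interaction
drop.scope(y ~ x1 * x2)   # "x1:x2" only
drop.scope(y ~ x1 + x2)   # "x1" "x2" once the interaction is gone

# drop1() applies the same rule: with x1:x2 in the model it only offers x1:x2
set.seed(3)
d = data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y = 1 + d$x1 - d$x2 + 0.5 * d$x1 * d$x2 + rnorm(100)
drop1(lm(y ~ x1 * x2, data = d))
```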

Don’t get carried away

… by stepwise regression. It is not a panacea, it cannot turn junk into gold, and it is definitely not a substitute for choosing predictors carefully and wisely. You might think: “Oh boy! I can generate every possible interaction term for my model, then let step choose the best ones! What a model I’ll get!”

Application Exercise 17

Fit models that include interactions and do model selection. Recommendation: use backwards selection, and do it by hand (as opposed to automated). Talk to Hilary and Sandra about what your focus should be (don’t try a full model that has everything; pick a focus for your set of explanatory variables). You can also get creative with composite variables. Once you settle on a model, interpret the coefficients and create at least one visualization that supports your narrative.
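Backward selection by hand can be organized as a sequence of single-term removals. The sketch below shows one step on hypothetical simulated data, using adjusted \(R^2\) as the criterion (an assumption; p-values or AIC are common alternatives). The helper `adj_r2()` is made up for illustration, and `drop.scope()` restricts the candidates so that main effects stay in while their interaction does.

```r
# Hypothetical simulated data: x1:x2 matters, x3 is pure noise
set.seed(4)
n = 200
d = data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y = 2 + d$x1 + 0.5 * d$x2 + 0.8 * d$x1 * d$x2 + rnorm(n)

full = lm(y ~ x1 * x2 + x3, data = d)
adj_r2 = function(model) summary(model)$adj.r.squared

# One backward step: refit without each term that may be dropped
candidates = drop.scope(full)
step_results = sapply(candidates, function(term) {
  adj_r2(update(full, as.formula(paste(". ~ . -", term))))
})
c(full = adj_r2(full), step_results)

# Next: drop the term whose removal gives the highest adjusted R^2, but only if
# that beats the current model; repeat until no removal improves adjusted R^2
```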

Reminder: interactions of possible interest

Some variables might behave differently for various levels of another variable. For example:

  • School of painting & landscape variables: school_pntg & landsALL / lands_figs / lands_ment
  • Landscapes & paired paintings: landsALL / lands_figs / lands_ment & paired
  • Other artists & paired paintings: othartist & paired
  • Size & paired paintings: surface & paired
  • Size & figures: surface & figures / nfigures
  • Dealer & previous owner: dealer & prevcoll
  • Winning bidder & previous owner: endbuyer & prevcoll
  • Winning bidder & artist living: winningbiddertype & artistliving