November 11, 2014

## Application Exercise 16

Explore interactions between variables, and determine whether the coefficient estimates and the variables selected for the best model for predicting price vary across the levels of the variables under consideration.

## Interaction variables

• Including an interaction effect in the model allows for different slopes, i.e. nonparallel lines.

• This implies that the regression coefficient (slope) for one explanatory variable changes with the value of another explanatory variable.

• This can be accomplished by adding an interaction variable: the product of two explanatory variables.
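A general sketch of why the product term produces nonparallel lines (the symbols $x_1$, $x_2$ and $\beta_0, \dots, \beta_3$ are generic placeholders, not variables from this data set): collecting the terms that involve $x_1$ shows that its slope depends on $x_2$.

```latex
\begin{align*}
\hat{y} &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
        &= (\beta_0 + \beta_2 x_2) + (\beta_1 + \beta_3 x_2)\, x_1
\end{align*}
```

For a 0/1 indicator $x_2$, this gives two lines: intercept $\beta_0$ and slope $\beta_1$ when $x_2 = 0$, and intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$ when $x_2 = 1$.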

## Height and landscape

library(ggplot2)
p1 = ggplot(data = pp, aes(x = height_in, y = log(price), colour = as.factor(landsALL)))
p2 = p1 + geom_point()
p3 = p2 + stat_smooth(method = "lm")  # separate least-squares line for each landsALL level
p3

## Model output

m1 = lm(log(price) ~ height_in + landsALL + height_in * landsALL, data = pp)
print(summary(m1), digits = 2)
##
## Call:
## lm(formula = log(price) ~ height_in + landsALL + height_in *
##     landsALL, data = pp)
##
## Residuals:
##    Min     1Q Median     3Q    Max
##  -5.23  -1.30  -0.08   1.30   5.30
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)          4.8748     0.0718    67.9   <2e-16 ***
## height_in            0.0041     0.0026     1.6     0.11
## landsALL            -0.2822     0.1189    -2.4     0.02 *
## height_in:landsALL   0.0356     0.0053     6.7    2e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.8 on 3137 degrees of freedom
##   (252 observations deleted due to missingness)
## Multiple R-squared:  0.03,   Adjusted R-squared:  0.029
## F-statistic:  32 on 3 and 3137 DF,  p-value: <2e-16
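A side note on the formula in `m1`: in R's model formulas, `a * b` expands to `a + b + a:b`, so `height_in + landsALL + height_in * landsALL` fits the same three terms as `height_in * landsALL` alone. The sketch below checks this with base R's `terms()`; no data are needed because only the formulas are inspected.

```r
# Two ways of writing the interaction model used for m1
f_star <- y ~ height_in * landsALL
f_long <- y ~ height_in + landsALL + height_in * landsALL

# term.labels lists the terms R will actually fit (duplicates are removed)
labels_star <- attr(terms(f_star), "term.labels")
labels_long <- attr(terms(f_long), "term.labels")

print(labels_star)                   # "height_in" "landsALL" "height_in:landsALL"
print(identical(labels_star, labels_long))  # TRUE: the two formulas are equivalent
```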

## Linear model

\begin{align*} \widehat{log(price)}~&=~4.8748 \\ &+~0.0041~height\_in \\ &-~0.2822~landsALL \\ &+~0.0356~height\_in \times landsALL \end{align*}

## Regression lines

What is the equation of the regression model for paintings with no landscape features?

\begin{align*} \widehat{log(price)}~&=~4.8748 + 0.0041~height\_in - 0.2822 (0) + 0.0356~height\_in \times (0) \\ &=~4.8748 + 0.0041~height\_in \end{align*}

What is the equation of the regression model for paintings with landscape features?

\begin{align*} \widehat{log(price)}~&=~4.8748 + 0.0041~height\_in - 0.2822 (1) + 0.0356~height\_in \times (1) \\ &=~4.8748 + 0.0041~height\_in - 0.2822 + 0.0356~height\_in \\ &=~4.5926 + 0.0397~height\_in \end{align*}
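The same bookkeeping can be sketched in R. The estimates are hard-coded from the `summary(m1)` output so the snippet runs without `pp`; the element names (`intercept`, `height_in_landsALL`, ...) are hypothetical labels chosen here. With the fitted model at hand, `coef(m1)` returns the same estimates at full precision, named `"(Intercept)"`, `"height_in"`, `"landsALL"` and `"height_in:landsALL"`.

```r
# Estimates copied from summary(m1) above (hypothetical names, rounded values)
b <- c(intercept = 4.8748, height_in = 0.0041,
       landsALL = -0.2822, height_in_landsALL = 0.0356)

# landsALL = 0: the indicator and interaction terms drop out
no_lands <- c(intercept = unname(b["intercept"]),
              slope     = unname(b["height_in"]))

# landsALL = 1: the indicator shifts the intercept, the interaction shifts the slope
lands <- c(intercept = unname(b["intercept"] + b["landsALL"]),
           slope     = unname(b["height_in"] + b["height_in_landsALL"]))

print(rbind(no_lands, lands))
```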

## Model interpretation

no landscape: $$\widehat{log(price)} = 4.8748 + 0.0041~height\_in$$

Paintings with no landscape features and a height of 0 inches are expected on average to cost $e^{4.8748} \approx 131$ livres.

For each additional inch in height of a painting with no landscape features, the price is expected to be higher on average by a factor of $e^{0.0041} \approx 1.0041$.

landscape: $$\widehat{log(price)} = 4.5926 + 0.0397~height\_in$$

Paintings with landscape features and a height of 0 inches are expected on average to cost $e^{4.5926} \approx 99$ livres.

For each additional inch in height of a painting with landscape features, the price is expected to be higher on average by a factor of $e^{0.0397} \approx 1.04$.
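Because the response is `log(price)`, each slope becomes a multiplicative factor `exp(slope)` on the price scale. A small sketch of that conversion, using the two slopes from the regression lines above (the vector names are illustrative):

```r
# Slopes on the log(price) scale for the two groups
slopes <- c(no_landscape = 0.0041, landscape = 0.0397)

# One extra inch multiplies the expected price by exp(slope) ...
factors <- exp(slopes)
# ... which is the same as a percentage change of 100 * (exp(slope) - 1)
pct_change <- 100 * (factors - 1)

print(round(factors, 4))
print(round(pct_change, 2))
```

For slopes this small, `exp(slope) - 1` is close to the slope itself, which is why 0.0041 and 1.0041 look alike.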

## Do these values make sense?

$\widehat{log(price)} = 4.8748 + 0.0041~height\_in$ vs. $\widehat{log(price)} = 4.5926 + 0.0397~height\_in$

library(dplyr)
pp %>%
  group_by(landsALL) %>%
  summarise(mean = mean(log(price)))
## Source: local data frame [2 x 2]
##
##   landsALL  mean
## 1        0 4.826
## 2        1 5.204
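One way to reconcile the lower intercept for landscapes with their higher mean log price is to find the height $h$ at which the two fitted lines cross. This is a worked calculation from the estimates above, not additional model output:

```latex
\begin{align*}
4.8748 + 0.0041\,h &= 4.5926 + 0.0397\,h \\
4.8748 - 4.5926 &= (0.0397 - 0.0041)\,h \\
h &= \frac{0.2822}{0.0356} \approx 7.93 \text{ inches}
\end{align*}
```

For paintings taller than about 7.9 inches, the landscape line predicts the higher log price, so the intercepts (which describe a height of 0 inches) are not a sensible basis for comparing the two groups.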

## Interactions between numerical explanatory variables

Hold one of the variables constant while changing the other, i.e. evaluate the fitted model at a given (usually the mean) value of one variable and vary the other.
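A minimal sketch of this recipe on R's built-in `mtcars` data (chosen only because it ships with R; it is not the paintings data), with an interaction between two numerical variables, `wt` and `hp`. Holding `hp` at its mean, the predicted `mpg` changes linearly in `wt` with slope `b["wt"] + b["wt:hp"] * mean(hp)`.

```r
# Interaction between two numerical explanatory variables
fit <- lm(mpg ~ wt * hp, data = mtcars)
b <- coef(fit)

# Hold hp fixed at its mean and vary wt over a grid of values
hp_mean <- mean(mtcars$hp)
grid <- data.frame(wt = seq(2, 5, by = 1), hp = hp_mean)
grid$pred <- unname(predict(fit, newdata = grid))

# Slope for wt at this fixed hp: main effect plus interaction times hp
slope_wt_at_mean_hp <- unname(b["wt"] + b["wt:hp"] * hp_mean)

print(grid)
print(slope_wt_at_mean_hp)
```

Repeating this at a low and a high value of `hp` (e.g. its quartiles) shows how the `wt` slope shifts as `hp` changes.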

## Third order interactions

• Can you? Yes
• Should you? Probably not…
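One reason for caution is how quickly the model grows: in R's formula syntax, a three-way product expands to every main effect plus all two- and three-way interactions. A quick check with placeholder variable names `a`, `b`, `c`:

```r
# a * b * c expands to 3 main effects, 3 two-way and 1 three-way interaction
labs <- attr(terms(y ~ a * b * c), "term.labels")
print(labs)
print(length(labs))
```

Each of these terms has to be estimated and, if kept, interpreted; the three-way coefficient in particular describes how a two-way interaction changes with a third variable.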

## Interaction variables and model selection

• If a main effect is dropped from the model, any interaction involving that variable should be dropped too.
• Don't keep an interaction effect in the model without the main effects of the explanatory variables it involves.
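Base R's model-updating helpers follow this rule. A small sketch on the built-in `mtcars` data (used only for illustration): while `wt:hp` is in the model, `drop.scope()` and `drop1()` offer only the interaction as a candidate for removal, not the main effects `wt` or `hp`.

```r
# Model with two main effects and their interaction
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Terms that can be dropped without breaking the hierarchy
print(drop.scope(fit))     # only "wt:hp"

# drop1() applies the same rule when reporting single-term deletions
d <- drop1(fit, test = "F")
print(d)
```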

## Don't get carried away

… by stepwise regression. It is not a panacea, it cannot turn junk into gold, and it is definitely not a substitute for choosing predictors carefully and wisely. You might think: "Oh boy! I can generate every possible interaction term for my model, then let step choose the best ones! What a model I'll get!"

## Application Exercise 17

Fit models that include interactions and do model selection. Recommendation: use backwards selection, and do it by hand (as opposed to automated). Talk to Hilary and Sandra about what your focus should be: don't try a full model that has everything, but pick a focus for your set of explanatory variables. You can also get creative with composite variables. Once you settle on a model, interpret the coefficients and create at least one visualization that supports your narrative.
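A sketch of what one backward step "by hand" might look like, shown on the built-in `mtcars` data rather than `pp`. Adjusted $R^2$ is used as the comparison criterion here as an assumption; substitute whichever criterion the course uses. Note that `wt` and `hp` are not removal candidates while `wt:hp` is in the model.

```r
# Starting model: two main effects, their interaction, and one more predictor
full <- lm(mpg ~ wt * hp + qsec, data = mtcars)

# Candidate models, each removing one term that respects the hierarchy
no_qsec <- update(full, . ~ . - qsec)
no_int  <- update(full, . ~ . - wt:hp)

# Compare adjusted R^2; keep the removal that helps most, then repeat
adj_r2 <- sapply(list(full = full, no_qsec = no_qsec, no_int = no_int),
                 function(m) summary(m)$adj.r.squared)
print(round(adj_r2, 3))
```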

## Reminder: interactions of possible interest

Some variables might behave differently for various levels of another variable. For example:

• School of painting & landscape variables: school_pntg & landsALL / lands_figs / lands_ment
• Landscapes & paired paintings: landsALL / lands_figs / lands_ment & paired
• Other artists & paired paintings: othartist & paired
• Size & paired paintings: surface & paired
• Size & figures: surface & figures / nfigures
• Dealer & previous owner: dealer & prevcoll
• Winning bidder & prevcoll: endbuyer & prevcoll
• Winning bidder & artist living: winningbiddertype & artistliving