November 11, 2014

Explore interactions between variables, and determine whether the coefficient estimates and the variables selected for the best model for predicting price vary across levels of the variables under consideration.

Including an interaction effect in the model allows for different slopes, i.e. nonparallel lines.

This implies that the regression coefficient for an explanatory variable would change as another explanatory variable changes.

This can be accomplished by adding an interaction variable: the product of two explanatory variables.
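As a minimal sketch of this idea on simulated data (the data frame `sim` and its columns `x`, `g`, `y` are hypothetical, not the paintings data), the following checks that R's `x * g` formula shorthand gives the same fit as adding the product column by hand:

```r
# Simulated data with hypothetical names; not the paintings data
set.seed(1)
n   <- 200
sim <- data.frame(x = runif(n, 0, 10), g = rbinom(n, 1, 0.5))
sim$y <- 1 + 0.5 * sim$x - 2 * sim$g + 1.5 * sim$x * sim$g + rnorm(n)

# Formula shorthand: x * g expands to x + g + x:g
m_formula <- lm(y ~ x * g, data = sim)

# Same model with the interaction variable built explicitly as a product
sim$xg   <- sim$x * sim$g
m_manual <- lm(y ~ x + g + xg, data = sim)

# Compare the coefficient estimates (only the names differ: "x:g" vs "xg")
same_fit <- all.equal(unname(coef(m_formula)), unname(coef(m_manual)))
same_fit
```

Writing `x + g + x * g` in a formula is also allowed; R drops the duplicated main effects, so it fits the same model as `x * g`.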

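    library(ggplot2)

    # log(price) against height, coloured by presence of landscape features
    p1 = ggplot(data = pp, aes(x = height_in, y = log(price), colour = as.factor(landsALL)))
    p2 = p1 + geom_point()
    # one fitted regression line per landsALL group
    p3 = p2 + stat_smooth(method = lm)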

    m1 = lm(log(price) ~ height_in + landsALL + height_in * landsALL, data = pp)
    print(summary(m1), digits = 2)

    ## 
    ## Call:
    ## lm(formula = log(price) ~ height_in + landsALL + height_in * 
    ##     landsALL, data = pp)
    ## 
    ## Residuals:
    ##    Min     1Q Median     3Q    Max 
    ##  -5.23  -1.30  -0.08   1.30   5.30 
    ## 
    ## Coefficients:
    ##                    Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)          4.8748     0.0718    67.9   <2e-16 ***
    ## height_in            0.0041     0.0026     1.6     0.11    
    ## landsALL            -0.2822     0.1189    -2.4     0.02 *  
    ## height_in:landsALL   0.0356     0.0053     6.7    2e-11 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 1.8 on 3137 degrees of freedom
    ##   (252 observations deleted due to missingness)
    ## Multiple R-squared:  0.03,  Adjusted R-squared:  0.029 
    ## F-statistic: 32 on 3 and 3137 DF,  p-value: <2e-16

\[ \begin{align*} \widehat{log(price)}~&=~4.8748 \\ &+~0.0041~height\_in \\ &-~0.2822~landsALL \\ &+~0.0356~height\_in \times landsALL \end{align*} \]

What is the equation of the regression model for paintings with no landscape features?

\[ \widehat{log(price)} = 4.8748 + 0.0041~height\_in - 0.2822 (0) + 0.0356~height\_in \times (0) \] \[ \widehat{log(price)} = 4.8748 + 0.0041~height\_in \]

What is the equation of the regression model for paintings with landscape features?

\[ \widehat{log(price)} = 4.8748 + 0.0041~height\_in - 0.2822 (1) + 0.0356~height\_in \times (1) \] \[ \widehat{log(price)} = 4.8748 + 0.0041~height\_in - 0.2822 + 0.0356~height\_in \] \[ \widehat{log(price)} = 4.5926 + 0.0397~height\_in \]
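The same substitution can be sketched in R. The coefficient values are copied from the summary output above; the helper `line_for()` is a hypothetical name introduced only for illustration:

```r
# Coefficient estimates copied from the summary output above
b <- c(intercept = 4.8748, height_in = 0.0041,
       landsALL = -0.2822, interaction = 0.0356)

# Intercept and slope of the fitted line for a given landsALL value (0 or 1)
# (line_for is a hypothetical helper, not part of any package)
line_for <- function(lands) {
  c(intercept = unname(b["intercept"] + b["landsALL"] * lands),
    slope     = unname(b["height_in"] + b["interaction"] * lands))
}

no_landscape <- line_for(0)  # intercept 4.8748, slope 0.0041
landscape    <- line_for(1)  # intercept 4.5926, slope 0.0397
```

The `landsALL` coefficient shifts the intercept and the interaction coefficient shifts the slope, which is why the two fitted lines are not parallel.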

**no landscape**: \(\widehat{log(price)} = 4.8748 + 0.0041~height\_in\)

For paintings with no landscape features, the intercept corresponds to a height of 0 inches: such paintings are expected on average to cost \(e^{4.8748} \approx\) 131 livres.

For each additional inch on the height of a painting with no landscape features, the price is expected to be higher on average by a factor of \(e^{0.0041} \approx\) 1.0041.

**landscape**: \(\widehat{log(price)} = 4.5926 + 0.0397~height\_in\)

For paintings with landscape features, the intercept again corresponds to a height of 0 inches: such paintings are expected on average to cost \(e^{4.5926} \approx 99\) livres.

For each additional inch on the height of a painting with landscape features, the price is expected to be higher on average by a factor of \(e^{0.0397} \approx\) 1.04.
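Because the response is `log(price)`, each estimate has to be exponentiated before it can be read on the livres scale. A short sketch of that arithmetic, using the intercepts and slopes of the two fitted lines (the variable names are illustrative only):

```r
# Back-transform log-scale estimates to the livres scale
base_no_lands   <- exp(4.8748)  # intercept, no landscape features
factor_no_lands <- exp(0.0041)  # multiplicative change per inch, no landscape
base_lands      <- exp(4.5926)  # intercept, landscape features
factor_lands    <- exp(0.0397)  # multiplicative change per inch, landscape

round(base_no_lands)        # 131
round(factor_no_lands, 4)   # 1.0041
round(base_lands)           # 99
round(factor_lands, 2)      # 1.04
```

A slope on the log scale becomes a multiplicative factor on the original scale, so a factor of 1.04 reads as roughly a 4% higher price per additional inch.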

\(\widehat{log(price)} = 4.8748 + 0.0041 height\_in~\) vs. \(~\widehat{log(price)} = 4.5926 + 0.0397 height\_in\)

pp %>% group_by(landsALL) %>% summarise(mean = mean(log(price)))

    ## Source: local data frame [2 x 2]
    ## 
    ##   landsALL  mean
    ## 1        0 4.826
    ## 2        1 5.204

Keep one of the variables constant while changing the other, i.e. solve the linear model for a given (usually mean) level of one of the variables and vary the other.
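A minimal sketch of this approach, assuming two numerical explanatory variables in simulated data (`sim`, `x1`, `x2`, and the helper `slope_x1_at()` are hypothetical names, not from the paintings data): the slope of `x1` is evaluated with `x2` held at its mean.

```r
# Simulated data with two numerical predictors that interact
set.seed(2)
n   <- 300
sim <- data.frame(x1 = rnorm(n, 10, 2), x2 = rnorm(n, 5, 1))
sim$y <- 2 + 0.3 * sim$x1 + 0.5 * sim$x2 + 0.1 * sim$x1 * sim$x2 + rnorm(n)

m <- lm(y ~ x1 * x2, data = sim)
b <- coef(m)

# Slope of x1 when x2 is held at a given value:
# (coefficient of x1) + (coefficient of x1:x2) * x2
slope_x1_at <- function(x2_value) unname(b["x1"] + b["x1:x2"] * x2_value)

slope_at_mean <- slope_x1_at(mean(sim$x2))
slope_at_mean
```

Calling `slope_x1_at()` with other values of `x2` (for example its quartiles) shows how the effect of `x1` changes across levels of `x2`.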

- Can you? Yes

- Should you? Probably not…

- If main effects are dropped from the model, you should also drop the interaction effects.

- Your model shouldn't include an interaction effect without also including the main effects of the explanatory variables involved.

… by stepwise regression. It is not a panacea, it cannot turn junk into gold, and it is definitely not a substitute for choosing predictors carefully and wisely. You might think: "Oh boy! I can generate every possible interaction term for my model, then let step choose the best ones! What a model I'll get!"

Fit models that include interactions and do model selection. Recommendation: use backwards selection, and do it by hand (as opposed to automated). Talk to Hilary and Sandra about what your focus should be (don't try a full model that has everything; pick a focus for your set of explanatory variables). You can also get creative with composite variables. Once you settle on a model, interpret the coefficients and create at least one visualization that supports your narrative.
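One manual backward-selection step might look like the sketch below. It uses simulated data with hypothetical names (`sim`, `x`, `g`, `z`), and the comparison criteria shown (a nested F-test and adjusted R-squared) are assumptions, since no criterion is prescribed here:

```r
# Simulated data: y depends on x, g, and their interaction; z is pure noise
set.seed(3)
n   <- 200
sim <- data.frame(x = runif(n, 0, 10),
                  g = rbinom(n, 1, 0.5),
                  z = rnorm(n))
sim$y <- 1 + 0.5 * sim$x - 2 * sim$g + 1.5 * sim$x * sim$g + rnorm(n)

# Start from a focused model, not every possible interaction
full <- lm(y ~ x * g + z + x:z, data = sim)

# One backward step by hand: drop the x:z interaction, keep its main effects
step1 <- update(full, . ~ . - x:z)

# Compare the two nested models
anova(step1, full)               # F-test for the dropped term
summary(full)$adj.r.squared
summary(step1)$adj.r.squared
```

Interaction terms are candidates for removal first; a main effect is considered for removal only once no remaining interaction involves it.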

Some variables might behave differently for various levels of another variable. For example:

- School of painting & landscape variables: `school_pntg` & `landsALL` / `lands_figs` / `lands_ment`

- Landscapes & paired paintings: `landsALL` / `lands_figs` / `lands_ment` & `paired`

- Other artists & paired paintings: `othartist` & `paired`

- Size & paired paintings: `surface` & `paired`

- Size & figures: `surface` & `figures` / `nfigures`

- Dealer & previous owner: `dealer` & `prevcoll`

- Winning bidder & prevcoll: `endbuyer` & `prevcoll`

- Winning bidder & artist living: `winningbiddertype` & `artistliving`