7. The language of models

Review App Ex from last time

- Modeling the relationship between variables
  - Focus on *linear* models (but remember there are other types of models too!)
- Application Exercise: model prices of Paris Paintings

**Due Tuesday:** Finish App Ex + Reading (you’ll receive an email with a link after class)

```
library(ggplot2)
library(dplyr)
library(stringr)
```

```
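# Read the data, keeping text columns as character rather than factor
pp <- read.csv("paris_paintings.csv", stringsAsFactors = FALSE) %>%
  as_tibble()  # tbl_df() is deprecated in current dplyr; as_tibble() replaces it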
```

`class(pp$price)`

`## [1] "character"`

`table(pp$price)[1:30]`

```
##
## 1,000.0 1,000.4 1,001.0 1,002.0 1,004.0 1,005.0 1,006.0 1,008.0 1,011.0 1,035.5 1,050.0 1,051.0
## 20 1 4 4 1 1 2 1 1 2 4 1
## 1,055.0 1,060.0 1,077.0 1,079.0 1,080.0 1,086.0 1,099.0 1,100.0 1,100.5 1,105.0 1,110.0 1,140.0
## 2 1 1 1 1 1 1 6 2 1 2 1
## 1,150.0 1,155.0 1,161.0 1,180.0 1,200.0 1,201.0
## 4 2 1 3 14 5
```

Replace `,` with `""` (blank), and save the variable as numeric:

```
pp <- pp %>%
  mutate(price = as.numeric(str_replace_all(price, ",", "")))  # remove every comma, not just the first
```
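One caveat worth checking when stripping commas: `str_replace()` removes only the first match in each string, whereas `str_replace_all()` removes every match. A minimal sketch on made-up price strings (these values are illustrative, not taken from the dataset):

```r
library(stringr)

# Made-up price strings for illustration (not from paris_paintings.csv)
prices <- c("1,000.0", "950.5", "1,234,567.0")

# str_replace() drops only the first comma in each string
str_replace(prices, ",", "")      # "1000.0" "950.5" "1234,567.0"

# str_replace_all() drops every comma
str_replace_all(prices, ",", "")  # "1000.0" "950.5" "1234567.0"

# Convert the fully cleaned strings to numeric
cleaned <- as.numeric(str_replace_all(prices, ",", ""))
```

If any price has more than one comma, converting the output of `str_replace()` with `as.numeric()` would give `NA` for that value.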

Much better…

`class(pp$price)`

`## [1] "numeric"`

Describe the distribution of prices of paintings.

```
ggplot(data = pp, aes(x = price)) +
  geom_histogram(binwidth = 1000)
```

We can represent relationships between variables using **functions**

- A function is a mathematical concept: the relationship between an output and one or more inputs.
  - Plug in the inputs and receive back the output
  - Example: the formula \(y = 3x + 7\) is a function with input \(x\) and output \(y\); when \(x\) is \(5\), the output \(y\) is \(22\)
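The same idea can be written as an R function. This is only a sketch of the formula \(y = 3x + 7\); the name `f` is chosen here for illustration:

```r
# A function takes an input and returns an output: y = 3x + 7
f <- function(x) 3 * x + 7

f(5)  # returns 22
```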

```
ggplot(data = pp, aes(x = Width_in, y = Height_in)) +
  geom_point() +
  stat_smooth(method = "lm") # lm for linear model
```

- **Response variable:** Variable whose behavior or variation you are trying to understand, on the y-axis (dependent variable)
- **Explanatory variables:** Other variables that you want to use to explain the variation in the response, on the x-axis (independent variables)
- **Model value:** Output of the **model function**
  - The model function gives the typical value of the response variable *conditioning* on the explanatory variables
  - Also called the **predicted value**
- **Residuals:** Show how far each case is from its model value
  - \(residual = actual~value - model~value\)
  - Tells how far above the model function each case is
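These terms can be made concrete with a small fitted model. The sketch below uses a made-up data frame `toy` (with hypothetical columns `width` and `height`, not the Paris Paintings data) and base R's `lm()`, `fitted()`, and `residuals()`:

```r
# Toy data, made up for illustration (not the Paris Paintings data)
toy <- data.frame(
  width  = c(10, 20, 30, 40, 50),
  height = c(12, 18, 33, 38, 52)
)

# Fit a linear model: height is the response, width is the explanatory variable
fit <- lm(height ~ width, data = toy)

# Model (predicted) value for each case
toy$model_value <- fitted(fit)

# residual = actual value - model value
toy$residual <- toy$height - toy$model_value

# residuals(fit) gives the same quantity directly
all.equal(unname(residuals(fit)), unname(toy$residual))

# Cases with a negative residual lie below the fitted line
toy[toy$residual < 0, ]
```

A positive residual means the case sits above the model function, and a negative residual means it sits below.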

What does a negative residual mean? Which paintings on the plot have negative residuals?

How, if at all, does the relationship between width and height of paintings vary by whether or not they have any landscape elements?