Paris Paintings

  • Dataset compiled by Hilary Coe Cronheim & Sandra van Ginhoven (grad students with DALMI)

  • Source: printed catalogs of 28 auctions held in Paris between 1764 and 1780

  • 3,393 paintings - 60 variables (includes prices & descriptive details from sales catalogs)

Art Auctions

The Departure of a Hunting Party

Auction Details

Two paintings very rich in composition, of a beautiful execution, and whose merit is very remarkable, each 17 inches 3 lines high, 23 inches wide; the first, painted on wood, comes from the Cabinet of Madame the Countess of Verrue; it represents a departure for the hunt: it shows in the front a child on a white horse, a man who sounds the horn to gather the dogs, a falconer and other figures nicely distributed across the width of the painting; two horses drinking from a fountain; on the right in the corner a lovely country house topped by a terrace, on which people are at the table, others who play instruments; trees and buildings pleasantly enrich the background.

Data entry, part 1


Two paintings very rich in composition, of a beautiful execution, and whose merit is very remarkable, each 17 inches 3 lines high, 23 inches wide; the first, painted on wood, comes from the Cabinet of Madame the Countess of Verrue; it represents a departure for the hunt: it shows in the front a child on a white horse, a man who sounds the horn to gather the dogs, a falconer and other figures nicely distributed across the width of the painting; two horses drinking from a fountain; on the right in the corner a lovely country house topped by a terrace, on which people are at the table, others who play instruments; trees and buildings pleasantly enrich the background.

Data entry, part 2


  • prevcoll - was the previous owner mentioned
  • paired - was the painting sold or suggested as a pairing with another
  • landsALL - was the painting described as a landscape
  • lands_figs - does the description mention figures in a landscape
  • arch - are architectural constructions mentioned in the description

Research Questions

  • What drove the prices of paintings sold?

  • How can we characterize the marketing strategy of dealers who wrote the sales catalogues?

  • Can we identify the most preferred sets of characteristics in the paintings that buyers sought? Are there any substitutability criteria at work?

Paintings dataset

This dataset is made available for class use only. Do not post the raw dataset anywhere else. You can share your results/findings/plots etc. but not the dataset.

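library(readr)
library(ggplot2)  # needed for the ggplot() calls in the plotting slides
# Adjust the path to wherever your copy of the file is stored
pp <- read_csv("~/paris_paintings.csv")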

Univariate EDA

Categorical data - Counts / Proportions

Summarize the data with counts or proportions.

table(pp$origin_cat)
## 
## D/FL    F    I    O    S 
## 1339 1101  361  588    4
table(pp$origin_cat) / nrow(pp)
## 
##        D/FL           F           I           O           S 
## 0.394636015 0.324491600 0.106395520 0.173297966 0.001178898
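
The same proportions can also be obtained with prop.table(), which divides each count by the table total. A minimal sketch, assuming a small made-up vector in place of pp$origin_cat (the values below are illustrative, not the class data):

```r
# Hypothetical stand-in for pp$origin_cat (made-up values, not the real data)
origin_cat <- c("D/FL", "F", "F", "I", "O", "D/FL", "F", "S")

counts <- table(origin_cat)   # counts per category
props  <- prop.table(counts)  # same as counts / sum(counts)

print(counts)
print(round(props, 3))
```

Dividing by nrow() gives the same answer only when the variable has no missing values, because table() drops NA by default while nrow() counts every row.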

Numeric data - Distribution

We would like a standardized way of describing distributions. We focus on several critical features:

  • Center
  • Spread
  • Modality (peaks)
  • Skewness (asymmetry)
  • Kurtosis (peakedness / tail thickness)
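
Most of these features have simple numeric summaries. A rough sketch on simulated right-skewed values (hypothetical data, not the paintings prices); the skewness and kurtosis formulas are the plain moment-based versions, one common definition among several:

```r
set.seed(1)
# Hypothetical right-skewed "prices" (simulated, not the real data)
x <- rlnorm(1000, meanlog = 4, sdlog = 1)

center <- c(mean = mean(x), median = median(x))
spread <- c(sd = sd(x), iqr = IQR(x))

# Standardize using the population-style standard deviation
z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
skewness <- mean(z^3)  # > 0 indicates a longer right tail
kurtosis <- mean(z^4)  # equals 3 for a normal distribution

print(center)
print(spread)
print(c(skewness = skewness, kurtosis = kurtosis))
```

Modality has no equally simple summary and is usually judged from a plot.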

Histogram

ggplot(data = pp, aes(x = price)) +
  geom_histogram()

Histogram (log transform)

ggplot(data = pp, aes(x = price)) +
  geom_histogram() +
  scale_x_log10()

Histogram bins

ggplot(data = pp, aes(x = price)) +
  geom_histogram(bins=10) +
  scale_x_log10()

ggplot(data = pp, aes(x = price)) +
  geom_histogram(bins=100) +
  scale_x_log10()

Histogram details

  • Added via geom_histogram

  • Expects a numeric variable as x in aes

  • The number of bins matters; try several values

  • Good for describing the general shape (and features) of data distribution
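
Instead of a bin count, the bin size can be set directly with the binwidth argument. A minimal sketch on simulated prices (hypothetical data); note that with scale_x_log10() the width is measured on the log10 axis:

```r
library(ggplot2)

set.seed(1)
# Hypothetical prices (simulated, not the real data)
d <- data.frame(price = rlnorm(500, meanlog = 4, sdlog = 1.5))

# binwidth is in transformed (log10) units:
# 0.25 means each bin spans a quarter of a factor of 10
p <- ggplot(data = d, aes(x = price)) +
  geom_histogram(binwidth = 0.25) +
  scale_x_log10()

print(p)
```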

Boxplot

ggplot(data = pp, aes(y = price, x = 1)) +
  geom_boxplot()

ggplot(data = pp, aes(y = price, x = "")) +
  geom_boxplot()

Boxplot (log transform)

ggplot(data = pp, aes(y = price, x = "")) +
  geom_boxplot() + 
  scale_y_log10()

ggplot(data = pp, aes(y = log(price), x = "")) +
  geom_boxplot()

Boxplot details

  • Added via geom_boxplot

  • Expects a numeric variable as y in aes and also requires an x value (use 1 or "" for a single boxplot)

  • Good for highlighting specific features: center, spread, skewness, outliers
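
The numbers behind a boxplot can be computed by hand. A small sketch on made-up values, assuming the common 1.5 × IQR rule for flagging outliers (the default coef = 1.5 in geom_boxplot):

```r
# Hypothetical values with one extreme observation
x <- c(1, 2, 3, 4, 5, 100)

q <- quantile(x, c(0.25, 0.5, 0.75))  # box edges and median line
iqr <- q[["75%"]] - q[["25%"]]

# Points beyond these fences are drawn individually as outliers
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]

print(q)
print(outliers)
```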

Bivariate EDA

Categorical vs. Categorical - Contingency tables

table(pp$landsALL, pp$arch)
##    
##        0    1
##   0 2036   83
##   1 1197   77
table(pp$landsALL)
## 
##    0    1 
## 2119 1274
table(pp$arch)
## 
##    0    1 
## 3233  160
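
Raw counts are hard to compare when the groups differ in size, so conditional proportions are often more useful. A short sketch that re-enters the counts from the table above by hand (so it runs without the dataset) and computes row proportions with prop.table():

```r
# Counts copied from the contingency table (rows: landsALL, columns: arch)
tab <- as.table(matrix(c(2036, 1197, 83, 77), nrow = 2,
                       dimnames = list(landsALL = c("0", "1"),
                                       arch = c("0", "1"))))

print(addmargins(tab))                    # adds row and column totals

row_props <- prop.table(tab, margin = 1)  # each row sums to 1
print(round(row_props, 3))
```

Read by row, about 3.9% of non-landscapes mention architecture, compared with about 6.0% of landscapes.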

(Badly) Stacked histograms

ggplot(data = pp, aes(x=price, col=materialCat, fill=materialCat)) +
  geom_histogram(bins=20) +
  scale_x_log10()

Stacked histograms

ggplot(data = pp, aes(x=price, col=materialCat, fill=materialCat)) +
  geom_histogram(bins=20) +
  scale_x_log10() + 
  facet_grid(materialCat~.)
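
Another option, sketched here on simulated data with two made-up categories, is to overlay the groups rather than stack or facet them. This is an illustrative alternative, not part of the original slides:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: two made-up material categories with different price levels
d <- data.frame(
  price = c(rlnorm(300, meanlog = 4), rlnorm(300, meanlog = 5)),
  materialCat = rep(c("canvas", "wood"), each = 300)
)

# position = "identity" overlays the groups instead of stacking them;
# alpha makes the overlapping bars visible
p <- ggplot(data = d, aes(x = price, fill = materialCat)) +
  geom_histogram(bins = 20, position = "identity", alpha = 0.5) +
  scale_x_log10()

print(p)
```

Overlaying works for two or three groups; with more categories, facets are usually easier to read.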

Side-by-side boxplots

ggplot(data = pp, aes(y=price, x=materialCat, col=materialCat)) +
  geom_boxplot() + 
  scale_y_log10()

Side-by-side (by-side) boxplots

ggplot(data = pp, aes(y=price, x=materialCat, col=factor(prevcoll))) +
    geom_boxplot() + 
    scale_y_log10()
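
A numeric companion to the grouped boxplots is the median price in each combination of the two grouping variables. A minimal sketch on simulated data (the categories and prices are made up; only the column names mirror the dataset):

```r
set.seed(1)
# Hypothetical data (made-up categories and simulated prices)
d <- data.frame(
  price = rlnorm(400, meanlog = 4),
  materialCat = sample(c("canvas", "wood"), 400, replace = TRUE),
  prevcoll = sample(0:1, 400, replace = TRUE)
)

# Median price for every materialCat x prevcoll combination
med <- aggregate(price ~ materialCat + prevcoll, data = d, FUN = median)
print(med)
```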

Application exercise

See course website