Dataset compiled by Hilary Coe Cronheim & Sandra van Ginhoven (grad students with DALMI)
Source: printed catalogs of 28 auctions in Paris between 1764 and 1780
3,393 paintings - 60 variables (includes prices & descriptive details from sales catalogs)
Two paintings very rich in composition, of a beautiful execution, and whose merit is very remarkable, each 17 inches 3 lines high, 23 inches wide; the first, painted on wood, comes from the Cabinet of Madame the Countess of Verrue; it represents a departure for the hunt: it shows in the front a child on a white horse, a man who sounds the horn to gather the dogs, a falconer and other figures nicely distributed across the width of the painting; two horses drinking from a fountain; on the right in the corner a lovely country house topped by a terrace, on which people are at the table, others who play instruments; trees and buildings pleasantly enrich the background.
What drove the prices of paintings sold?
How can we characterize the marketing strategy of dealers who wrote the sales catalogues?
Can we identify the most preferred sets of characteristics in the paintings that buyers sought? Are there any substitutability criteria at work?
This dataset is made available for class use only. Do not post the raw dataset anywhere else. You can share your results/findings/plots etc. but not the dataset.
Codebook: https://stat.duke.edu/~cr173/Sta112_Fa16/data/paris_paintings.html
Go to the Resources on Sakai and download paris_paintings.csv
Upload this file to RStudio Server in your home directory
Load using the following (if you get an error make sure the data file is in the correct directory):
library(readr)    # provides read_csv()
library(ggplot2)  # needed for the plots below
pp = read_csv("~/paris_paintings.csv")
Summarize the data with counts or proportions.
table(pp$origin_cat)
##
## D/FL F I O S
## 1339 1101 361 588 4
table(pp$origin_cat) / nrow(pp)
##
## D/FL F I O S
## 0.394636015 0.324491600 0.106395520 0.173297966 0.001178898
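The proportions above come from dividing each count by nrow(pp). As a minimal sketch, the same figures can be reproduced with prop.table(), starting here from the counts printed above rather than from the data file. The object names counts and props are illustrative only.

```r
# Counts copied from the table(pp$origin_cat) output above
counts <- c("D/FL" = 1339, "F" = 1101, "I" = 361, "O" = 588, "S" = 4)

# prop.table() divides each entry by the total of all entries
props <- prop.table(counts)
round(props, 3)

# The total matches the 3,393 paintings in the dataset
sum(counts)
```

On the full data this would be prop.table(table(pp$origin_cat)). Note that table() drops missing values by default, so dividing by nrow(pp) and using prop.table() agree only when the variable has no NAs, which appears to be the case here since the counts sum to 3,393.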
We would like a standardized way of describing distributions. The critical features we focus on include shape, center, spread, and unusual observations such as outliers:
ggplot(data = pp, aes(x = price)) +
geom_histogram()
ggplot(data = pp, aes(x = price)) +
geom_histogram() +
scale_x_log10()
ggplot(data = pp, aes(x = price)) +
geom_histogram(bins=10) +
scale_x_log10()
ggplot(data = pp, aes(x = price)) +
geom_histogram(bins=100) +
scale_x_log10()
Added via geom_histogram
Expects a numeric variable as x in aes
Choice of # of bins matters, try different values
Good for describing the general shape (and features) of data distribution
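To see why the number of bins matters, here is a rough sketch that counts observations per bin for 10 and 100 equal-width bins. It uses simulated right-skewed values, not the Paris data. The helper count_bins() is hypothetical and is a simplified analogue of histogram binning, not ggplot2's exact algorithm.

```r
set.seed(112)

# Simulated right-skewed "prices" (NOT the Paris data): log-normal draws
price_sim <- 10^rnorm(500, mean = 2, sd = 0.7)
log_price <- log10(price_sim)  # same idea as scale_x_log10()

# Hypothetical helper: count observations in `bins` equal-width intervals
count_bins <- function(x, bins) {
  breaks <- seq(min(x), max(x), length.out = bins + 1)
  idx <- findInterval(x, breaks, all.inside = TRUE)  # bin index, 1..bins
  tabulate(idx, nbins = bins)
}

coarse <- count_bins(log_price, 10)
fine <- count_bins(log_price, 100)

max(coarse)  # height of the tallest bar with 10 bins
max(fine)    # height of the tallest bar with 100 bins
```

With few bins each bar pools many observations and the shape looks smooth. With many bins each bar holds only a handful, so the histogram looks noisier even though the data are the same.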
ggplot(data = pp, aes(y = price, x = 1)) +
geom_boxplot()
ggplot(data = pp, aes(y = price, x = "")) +
geom_boxplot()
ggplot(data = pp, aes(y = price, x = "")) +
geom_boxplot() +
scale_y_log10()
ggplot(data = pp, aes(y = log(price), x = "")) +
geom_boxplot()
Added via geom_boxplot
Expects a numeric variable as y in aes, also requires an x value (can use 1 or "" for a single boxplot)
Good for highlighting specific features: center, spread, skewness, outliers
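The features a boxplot highlights can also be read off numerically. The sketch below uses base R's boxplot.stats() on a small made-up vector (not the dataset) to show the five box values and the points flagged as outliers. Note that geom_boxplot() computes its hinges from quartiles, which may differ slightly from the hinges used here.

```r
# Made-up values for illustration only (not taken from the dataset)
toy_price <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 1000)

bs <- boxplot.stats(toy_price)  # default coef = 1.5

bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points more than 1.5 * box width beyond the hinges

# Spread of the middle half of the data, i.e. the height of the box
box_width <- bs$stats[4] - bs$stats[2]
```

Here the box runs from 30 to 80 with the median at 55, the whiskers stop at 10 and 90, and the single value 1000 is flagged as an outlier. A long upper tail like this is the pattern that motivates the log scale in the plots above.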
table(pp$landsALL, pp$arch)
##
## 0 1
## 0 2036 83
## 1 1197 77
table(pp$landsALL)
##
## 0 1
## 2119 1274
table(pp$arch)
##
## 0 1
## 3233 160
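The two one-way tables above are the row and column totals of the two-way table. As a small sketch, the block below rebuilds the two-way table from the printed counts and uses addmargins() and prop.table() to get the totals and the within-row proportions in one step. The object names are illustrative.

```r
# Cell counts copied from the table(pp$landsALL, pp$arch) output above
# (matrix() fills column by column: arch = 0 first, then arch = 1)
tab <- as.table(matrix(
  c(2036, 1197, 83, 77),
  nrow = 2,
  dimnames = list(landsALL = c("0", "1"), arch = c("0", "1"))
))

# Append row and column totals; these match the one-way tables
with_margins <- addmargins(tab)
with_margins

# Proportions within each landsALL row (each row sums to 1)
row_props <- prop.table(tab, margin = 1)
round(row_props, 3)
```

On the full data, the equivalent calls would be addmargins(table(pp$landsALL, pp$arch)) and prop.table(table(pp$landsALL, pp$arch), margin = 1).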
ggplot(data = pp, aes(x=price, col=materialCat, fill=materialCat)) +
geom_histogram(bins=20) +
scale_x_log10()
ggplot(data = pp, aes(x=price, col=materialCat, fill=materialCat)) +
geom_histogram(bins=20) +
scale_x_log10() +
facet_grid(materialCat~.)
ggplot(data = pp, aes(y=price, x=materialCat, col=materialCat)) +
geom_boxplot() +
scale_y_log10()
ggplot(data = pp, aes(y=price, x=materialCat, col=factor(prevcoll))) +
geom_boxplot() +
scale_y_log10()
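The side-by-side boxplots compare groups visually. A numeric counterpart is a per-group summary. The sketch below uses tapply() on a tiny made-up data frame. The values and the category labels "a" and "b" are hypothetical and do not come from the dataset.

```r
# Made-up data for illustration; labels "a" and "b" are hypothetical
toy <- data.frame(
  price = c(10, 100, 1000, 20, 200, 2000),
  materialCat = c("a", "a", "a", "b", "b", "b")
)

# Median price within each category (the line inside each box)
med_by_group <- tapply(toy$price, toy$materialCat, median)
med_by_group

# Same comparison on the log10 scale used in the plots
tapply(log10(toy$price), toy$materialCat, median)
```

With the real data, the same pattern would be tapply(pp$price, pp$materialCat, median), adding na.rm = TRUE if price contains missing values.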
See course website