Why should we care about data visualization?

  • Help the researcher: Visualization is "hypothesis generating". It is a tool for discovery and research.
  • Communicate better:
    • Comparatively few people look at tables
    • But graphs catch attention easily
  • Why is communication important?
    • Academia is partly about marketing results.
    • Industry is definitely about marketing results.
    • Marketing is about the effective use of rhetorical devices

Why should we care about data visualization? (continued)

  • Beautiful graphs are "eye-catching"
  • Ugly graphs are boring: people stop listening or reading
  • Several authors have done substantive work on this:
    • The Visual Display of Quantitative Information, Edward Tufte
    • Visualizing Data and The Elements of Graphing Data, William S. Cleveland
    • The Grammar of Graphics, Leland Wilkinson
  • ggplot2 is an R package, based on Wilkinson's grammar of graphics, that generates beautiful graphs for you.

Example: Thinking about gender gaps

  • Gender heterogeneity (i.e., inequality) in labor market outcomes can be measured in terms of either wage gaps or employment gaps
    • Wage gap: difference between the average wage of men and the average wage of women
    • Employment gap: difference between the employment rates of men and women
  • We will consider data from Spain

Data without graphs

#See what the data looks like
genderGaps
##   dates wageGap employmentGap
## 1  2002    20.2          26.9
## 2  2006    17.9          21.8
## 3  2007    18.1          20.7
## 4  2008    16.1          19.1
## 5  2009    16.7          16.7
## 6  2010    16.2          15.2
## 7  2011    17.9          13.7
## 8  2012    19.3          12.6
## 9  2013    19.3          11.3
  • It's hard to see any pattern!
  • That's why we resort to visualization

Plot twin gaps with the base package

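There seems to be a pattern worth exploring: the employment gap narrows steadily over the period, while the wage gap falls until 2008 and then broadly rises again.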
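The plot itself did not survive in this version of the slides. A minimal base-graphics sketch of such a plot, assuming the `genderGaps` data frame has exactly the columns printed earlier (re-entered by hand here so the block runs on its own), could look like this:

```r
# Hand-entered copy of the printed genderGaps table (assumption: the
# original object has these three columns and no others)
genderGaps <- data.frame(
  dates         = c(2002, 2006:2013),
  wageGap       = c(20.2, 17.9, 18.1, 16.1, 16.7, 16.2, 17.9, 19.3, 19.3),
  employmentGap = c(26.9, 21.8, 20.7, 19.1, 16.7, 15.2, 13.7, 12.6, 11.3)
)

# Use a common y-range so both series fit on the same axes
yRange <- range(genderGaps$wageGap, genderGaps$employmentGap)

# Draw the wage gap first, then add the employment gap as a second line
plot(genderGaps$dates, genderGaps$wageGap, type = "b", pch = 16,
     col = "blue", ylim = yRange, xlab = "Year", ylab = "Gap",
     main = "Gender gaps in Spain")
lines(genderGaps$dates, genderGaps$employmentGap, type = "b", pch = 17,
      col = "red")

# Base graphics does not build a legend automatically; add one by hand
legend("topright", legend = c("Wage gap", "Employment gap"),
       col = c("blue", "red"), pch = c(16, 17), lty = 1)
```

Setting `ylim` matters here: without it, `plot()` scales the axis to the first series only and part of the employment-gap line would fall outside the plotting region.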

Plot twin gaps with ggplot2

ggplot2 generates beautiful graphs for us.
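A ggplot2 version of the same plot could look like the sketch below. It assumes the same hand-entered `genderGaps` values; the long-format helper `gapsLong` and its `gap` and `type` columns are names introduced here for illustration.

```r
library(ggplot2)

# Hand-entered copy of the printed genderGaps table (assumed columns)
genderGaps <- data.frame(
  dates         = c(2002, 2006:2013),
  wageGap       = c(20.2, 17.9, 18.1, 16.1, 16.7, 16.2, 17.9, 19.3, 19.3),
  employmentGap = c(26.9, 21.8, 20.7, 19.1, 16.7, 15.2, 13.7, 12.6, 11.3)
)

# ggplot2 works best with "long" data: one row per (year, gap type) pair.
# Stack the two gap columns with base R so no extra package is needed.
gapsLong <- data.frame(
  dates = rep(genderGaps$dates, times = 2),
  gap   = c(genderGaps$wageGap, genderGaps$employmentGap),
  type  = rep(c("Wage gap", "Employment gap"), each = nrow(genderGaps))
)

# Map year to x, gap size to y and gap type to colour; ggplot2 draws one
# line per type and builds the legend from the colour mapping
p <- ggplot(gapsLong, aes(x = dates, y = gap, colour = type)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Gap", colour = NULL,
       title = "Gender gaps in Spain")
print(p)
```

Stacking the data into long format lets ggplot2 choose the colours and build the legend from the `type` column, instead of drawing and labelling each series by hand as base graphics requires.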

Other examples: Histogram with the base package
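The data behind the original histogram is not shown, so this sketch assumes R's built-in `faithful` dataset (waiting times between Old Faithful eruptions) as a stand-in:

```r
# Assumption: R's built-in `faithful` data replaces the original slide's data
waiting <- faithful$waiting

# hist() bins the values and draws the bars in one call; `breaks = 20` is
# only a suggestion, and R may adjust it to "pretty" cut points
h <- hist(waiting, breaks = 20, col = "grey", border = "white",
          main = "Waiting time between eruptions",
          xlab = "Waiting time (minutes)")
```

`hist()` also returns the bin edges (`h$breaks`) and counts (`h$counts`) invisibly, which is handy when you want the numbers behind the bars.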

The same histogram with ggplot2
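A ggplot2 counterpart, again assuming the built-in `faithful` data as a stand-in for the original data:

```r
library(ggplot2)

# Assumption: R's built-in `faithful` data stands in for the original data
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(bins = 20, fill = "grey", colour = "white") +
  labs(title = "Waiting time between eruptions",
       x = "Waiting time (minutes)", y = "Count")
print(p)
```

`bins = 20` in `geom_histogram()` fixes the number of bins exactly, whereas `breaks = 20` in base `hist()` is only a suggestion, so a base and a ggplot2 histogram of the same data need not line up bar for bar.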

Other examples: Plot more than two dimensions
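One common way to show more than two variables in a single panel is to map the extra variables to aesthetics such as colour and size. A sketch, assuming R's built-in `mtcars` data (the original slide's data is not shown):

```r
library(ggplot2)

# Assumption: built-in `mtcars` data. Four variables share one 2-D panel:
# weight (x), fuel economy (y), number of cylinders (colour), horsepower (size)
p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        colour = factor(cyl), size = hp)) +
  geom_point(alpha = 0.7) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       colour = "Cylinders", size = "Horsepower")
print(p)
```

Wrapping `cyl` in `factor()` makes ggplot2 treat it as a discrete variable, so it gets three distinct colours instead of a continuous colour gradient.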

Other examples: Plot a model fit
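A sketch of plotting a model fit, assuming a simple linear regression of `mpg` on `wt` in the built-in `mtcars` data (the original model is not shown):

```r
library(ggplot2)

# Assumption: a simple linear regression of fuel economy on weight
fit <- lm(mpg ~ wt, data = mtcars)

# geom_smooth(method = "lm") fits the same model internally and draws the
# fitted line with a 95% confidence band (se = TRUE is the default)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "Linear fit of fuel economy on weight")
print(p)

# Coefficients of the explicitly fitted model
coef(fit)
```

`geom_smooth()` fits its own model from its `formula` argument, so the separate `lm()` call is only needed if you also want the coefficients or other model output.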