Background

Analysis of geospatial data in R

R has a rich package ecosystem for read/writing, manipulating, and analyzing geospatial data.


Some core packages:

  • sp - core classes for handling spatial data, additional utility functions.

  • rgdal - R interface to gdal (Geospatial Data Abstraction Library) for reading and writing spatial data.

  • maptools - Additional tools for reading and writing spatial data.

  • rgeos - R interface to geos (Geometry Engine Open Source) library for querying and manipulating spatial data. Reading and writing WKT.

  • raster - classes and tools for handling spatial raster data.


See more - Spatial task view

Taxonomy of geospatial objects (Simple Features)

Geometry Collection

R and Simple Feature Access

The R package sp provides classes that implement these geospatial objects.


Point(s) Linestring(s) Polygon(s) Geometry Collection
Geometry SpatialPoints SpatialLines SpatialPolygons SpatialCollection*
Geometry + Data SpatialPointsDataFrame SpatialLinesDataFrame SpatialPolygonsDataFrame


They also handle additional tasks like tracking the projection of the spatial coordinates.

Reading and writing geospatial data via sp

  • maptools:
    • readShapePoints / writeShapePoints - Shapefile w/ points
    • readShapeLines / writeShapeLines - Shapefile w/ lines
    • readShapePoly / writeShapePoly - Shapefile w/ polygons
    • readShapeSpatial / writeShapeSpatial - Shapefile
  • rgdal:
    • readOGR / writeOGR - Shapefile, GeoJSON, KML, …
  • rgeos:
    • readWKT / writeWKT - Well Known Text

Geospatial stuff is hard

Projections

Dateline

Want to fly from the Western most point in the US to the Eastern most point?

inter = gcIntermediate(c(179.776,51.952), c(-179.146,51.273), n=50, addStartEnd=TRUE)

plot(wrld_simpl, col="black", ylim=c(15,90))
lines(inter,col='red',lwd=2,lty=3)

Relationships

Distance

How do we define the distance between A and B, A and C, or B and C?

Well Known Text vs geoJSON

Points & MultiPoint

WKT:
POINT (30 10)
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
GeoJSON:
{
  "type": "Point", 
  "coordinates": [30, 10]
}
{
  "type": "MultiPoint", 
  "coordinates": 
    [ [10, 40], [40, 30], [20, 20], [30, 10] ]
}

LineString & MultiLineString

WKT:
LINESTRING (30 10, 10 30, 40 40) {.smaller}


MULTILINESTRING ((10 10, 20 20, 10 40), 
                 (40 40, 30 30, 40 20, 30 10))
GeoJSON:
{
   "type": "LineString", 
   "coordinates": 
      [ [30, 10], [10, 30], [40, 40] ]
}


{
   "type": "MultiLineString", 
   "coordinates": 
      [ [[10, 10], [20, 20], [10, 40]], 
        [[40, 40], [30, 30], [40, 20], [30, 10]]
      ]
}

Polygon & MultiPolygon

WKT:
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))


MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)),
              ((15 5, 40 10, 10 20, 5 10, 15 5)))
GeoJSON:
{
  "type": "Polygon", 
  "coordinates": [ [[30, 10], [40, 40], [20, 40], 
                    [10, 20], [30, 10]] ]
}


{
  "type": "MultiPolygon", 
  "coordinates": 
    [ [ [[30, 20], [45, 40], [10, 40], [30, 20]] ], 
      [ [[15, 5], [40, 10], [10, 20], 
         [5, 10], [15, 5]] ]
    ]
}

Polygon & MultiPolygon with Hole(s)

WKT:
POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10),
         (20 30, 35 35, 30 20, 20 30))


MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)),
              ((20 35, 10 30, 10 10, 
                30 5, 45 20, 20 35),
               (30 20, 20 15, 20 25, 30 20)))
GeoJSON:
{
  "type": "Polygon", 
  "coordinates": 
    [ [[35, 10], [45, 45], [15, 40], 
       [10, 20], [35, 10]], 
      [[20, 30], [35, 35], [30, 20], [20, 30]]
    ]
}
{
  "type": "MultiPolygon", 
  "coordinates": 
    [  [ [[40, 40], [20, 45], [45, 30], [40, 40]] ], 
       [ [[20, 35], [10, 30], [10, 10], [30, 5], 
          [45, 20], [20, 35]], 
         [[30, 20], [20, 15], [20, 25], [30, 20]]
       ]
    ]
}

GeometryCollection

WKT:

GEOMETRYCOLLECTION (POINT (4 8),
                     LINESTRING (4 6,7 10),
                     POLYGON ((6 6, 8 6, 8 8, 6 6)))'

GeoJSON:

{
  "type": "GeometryCollection",
  "geometries": [
                  { 
                    "type": "Point",
                    "coordinates": [30, 10]
                  },
                  { 
                    "type": "LineString",
                    "coordinates": [ [30, 10], [10, 30], [40, 40] ]
                  }
                ]
}

Shapefiles

Data

The following data is all from Mike Bostock's us-atlas git repository - https://github.com/mbostock/us-atlas.

Available on saxon in ~cr173/Sta523/data/us-atlas/shp/us.

dir("~cr173/Sta523/data/us-atlas/shp/us/","*.shp")

##  [1] "amtrak.shp"              "cbsa.shp"                "coast.shp"               "congress-unfiltered.shp"
##  [5] "counties-unfiltered.shp" "counties.shp"            "ferries.shp"             "nation-unmerged.shp"    
##  [9] "ports.shp"               "railroads.shp"           "roads-unmerged.shp"      "states-unfiltered.shp"  
## [13] "streams-unmerged.shp"    "waterbodies.shp"         "zipcodes-unmerged.shp"  

Shapefile components

system("ls -lh ~cr173/Sta523/data/us-atlas/shp/us/states-unfiltered*")

## -rw-r--r--+ 1 cr173 visitor 8.8K Sep 30 14:36 ~cr173/Sta523/data/us-atlas/shp/us/states-unfiltered.dbf
## -rw-r--r--+ 1 cr173 visitor  167 Sep 30 14:36 ~cr173/Sta523/data/us-atlas/shp/us/states-unfiltered.prj
## -rw-r--r--+ 1 cr173 visitor  772 Sep 30 14:36 ~cr173/Sta523/data/us-atlas/shp/us/states-unfiltered.sbn
## -rw-r--r--+ 1 cr173 visitor  156 Sep 30 14:36 ~cr173/Sta523/data/us-atlas/shp/us/states-unfiltered.sbx
## -rw-r--r--+ 1 cr173 visitor  11M Sep 30 14:36 ~cr173/Sta523/data/us-atlas/shp/us/states-unfiltered.shp
## -rw-r--r--+ 1 cr173 visitor  596 Sep 30 14:36 ~cr173/Sta523/data/us-atlas/shp/us/states-unfiltered.shx
## -rw-r--r--+ 1 cr173 visitor  19K Sep 30 14:36 ~cr173/Sta523/data/us-atlas/shp/us/states-unfiltered.txt
## -rw-r--r--+ 1 cr173 visitor 7.4K Sep 30 14:36 ~cr173/Sta523/data/us-atlas/shp/us/states-unfiltered.xml


Required files :

  • .shp - the feature geometry data
  • .shx - shape index
  • .dbf - columnar attributes for each shape, in dBase IV format

Optional files :

  • .prj - coordinate system and projection information
  • .sbn and .sbx - spatial indexes of the features
  • .xml - geospatial metadata

Shapefile info

suppressMessages(library(rgdal))
ogrInfo(path.expand("~cr173/Sta523/data/us-atlas/shp/us/"),"states-unfiltered")

## Source: "/home/vis/cr173/Sta523/data/us-atlas/shp/us/", layer: "states-unfiltered"
## Driver: ESRI Shapefile; number of rows: 62 
## Feature type: wkbPolygon with 2 dimensions
## Extent: (-179.1473 17.6744) - (179.7785 71.38921)
## CRS: +proj=longlat +datum=NAD83 +no_defs  
## LDID: 87 
## Number of fields: 9 
##         name type length typeName
## 1       AREA    2     11     Real
## 2  PERIMETER    2     11     Real
## 3 STATESP010    2     19     Real
## 4      STATE    4     20   String
## 5 STATE_FIPS    4      2   String
## 6  ORDER_ADM    2     19     Real
## 7  MONTH_ADM    4     18   String
## 8    DAY_ADM    2     19     Real
## 9   YEAR_ADM    2     19     Real

Reading a Shapefile

states = readOGR(path.expand("~cr173/Sta523/data/us-atlas/shp/us/"),
                 "states-unfiltered", stringsAsFactors=FALSE)

## OGR data source with driver: ESRI Shapefile 
## Source: "/home/vis/cr173/Sta523/data/us-atlas/shp/us/", layer: "states-unfiltered"
## with 62 features
## It has 9 fields

class(states)

## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

Shapefile contents (data)

states@data
##       AREA PERIMETER STATESP010                STATE STATE_FIPS ORDER_ADM
## 0  279.462   802.665          2               Alaska         02        49
## 1    1.440    14.685         15               Hawaii         15        50
## 2    0.764     8.126         72          Puerto Rico         72         0
## 3   12.879    21.642          1              Alabama         01        22
## 4   13.586    21.737          5             Arkansas         05        25
## 5   28.919    23.890          4              Arizona         04        48
## 6   41.613    55.323          6           California         06        31
## 7   28.039    22.017          8             Colorado         08        38
## 8    1.391     7.986          9          Connecticut         09         5
## 9    0.017     0.866         11 District of Columbia         11         0
## 10   0.533     5.469         10             Delaware         10         1
## 11  13.477   112.896         12              Florida         12        27
## 12  14.625    35.561         13              Georgia         13         4
## 13  15.856    20.132         19                 Iowa         19        29
## 14  24.459    30.909         16                Idaho         16        43
## 15  15.408    21.660         17             Illinois         17        21
## 16   0.443     3.098         17             Illinois         17        21
## 17   9.873    17.293         18              Indiana         18        19
## 18   0.065     1.694         18              Indiana         18        19
## 19  22.005    21.410         20               Kansas         20        34
## 20  10.664    22.510         21             Kentucky         21        15
## 21  11.424    59.940         22            Louisiana         22        18
## 22   2.293    23.275         25        Massachusetts         25         6
## 23   2.664    50.893         24             Maryland         24         7
## 24   9.666    51.869         23                Maine         23        23
## 25  16.991    63.131         26             Michigan         26        26
## 26  11.527    75.243         26             Michigan         26        26
## 27  25.537    34.873         27            Minnesota         27        32
## 28   0.790     7.392         27            Minnesota         27        32
## 29  18.615    24.682         29             Missouri         29        24
## 30  11.881    26.186         28          Mississippi         28        20
## 31  45.081    36.427         30              Montana         30        41
## 32  12.705    64.572         37       North Carolina         37        12
## 33  21.840    24.663         38         North Dakota         38        39
## 34  21.614    24.233         31             Nebraska         31        37
## 35   2.677    10.088         33        New Hampshire         33         9
## 36   2.073    17.481         34           New Jersey         34         3
## 37  30.892    23.674         35           New Mexico         35        47
## 38  29.943    23.854         32               Nevada         32        36
## 39  13.909    42.221         36             New York         36        11
## 40   1.166    13.524         36             New York         36        11
## 41  11.325    19.294         39                 Ohio         39        17
## 42   0.994    10.146         39                 Ohio         39        17
## 43  18.003    26.979         40             Oklahoma         40        46
## 44  28.147    33.718         41               Oregon         41        33
## 45  12.531    17.686         42         Pennsylvania         42         2
## 46   0.211     2.567         42         Pennsylvania         42         2
## 47   0.304     6.629         44         Rhode Island         44        13
## 48   7.795    34.720         45       South Carolina         45         8
## 49  22.580    23.292         46         South Dakota         46        40
## 50  10.890    22.359         47            Tennessee         47        16
## 51  65.141    87.128         48                Texas         48        28
## 52  22.975    20.001         49                 Utah         49        45
## 53  10.559    57.087         51             Virginia         51        10
## 54   2.799     9.455         50              Vermont         50        14
## 55  20.844    58.094         53           Washington         53        42
## 56   0.874    36.906         53           Washington         53        42
## 57  16.482    29.492         55            Wisconsin         55        30
## 58   2.739    23.805         55            Wisconsin         55        30
## 59   6.492    20.039         54        West Virginia         54        35
## 60  27.972    21.996         56              Wyoming         56        44
## 61   0.030     2.761         78  U.S. Virgin Islands         78         0
##    MONTH_ADM DAY_ADM YEAR_ADM
## 0    January       3     1959
## 1     August      21     1959
## 2       <NA>       0        0
## 3   December      14     1819
## 4       June      15     1836
## 5   February      14     1912
## 6  September       9     1850
## 7     August       1     1876
## 8    January       9     1788
## 9       <NA>       0        0
## 10  December       7     1787
## 11     March       3     1845
## 12   January       2     1788
## 13  December      28     1846
## 14      July       3     1890
## 15  December       3     1818
## 16  December       3     1818
## 17  December      11     1816
## 18  December      11     1816
## 19   January      29     1861
## 20      June       1     1792
## 21     April      30     1812
## 22  February       6     1788
## 23     April      28     1788
## 24     March      15     1820
## 25   January      26     1837
## 26   January      26     1837
## 27       May      11     1858
## 28       May      11     1858
## 29    August      10     1821
## 30  December      10     1817
## 31  November       8     1889
## 32  November      21     1789
## 33  November       2     1889
## 34     March       1     1867
## 35      June      21     1788
## 36  December      18     1787
## 37   January       6     1912
## 38   October      31     1864
## 39      July      26     1788
## 40      July      26     1788
## 41     March       1     1803
## 42     March       1     1803
## 43  November      16     1907
## 44  February      14     1859
## 45  December      12     1787
## 46  December      12     1787
## 47       May      29     1790
## 48       May      23     1788
## 49  November       2     1889
## 50      June       1     1796
## 51  December      29     1845
## 52   January       4     1896
## 53      June      25     1788
## 54     March       4     1791
## 55  November      11     1889
## 56  November      11     1889
## 57       May      29     1848
## 58       May      29     1848
## 59      June      20     1863
## 60      July      10     1890
## 61      <NA>       0        0

Shapefile contents (shapes)

plot(states[states$ORDER_ADM %in% 1:48,], col="lightgrey", axes=TRUE)

Why Illinois twice?

plot(states[states$STATE == "Illinois",], col=c("lightgrey","lightblue"), axes=TRUE)

Why Michigan twice?

plot(states[states$STATE == "Michigan",], col=c("lightgrey","lightblue"), axes=TRUE)

Removing duplicates / the Great Lakes

states = states[!duplicated(states$STATE),]
plot(states[states$ORDER_ADM %in% 1:48,], col="lightgrey", axes=TRUE)