class: center, middle, inverse, title-slide # Spatial data ### Colin Rundel ### 2018-11-07 --- exclude: true --- class: middle count: false # Geospatial stuff is hard --- ## Projections <img src="Lec16_files/figure-html/unnamed-chunk-2-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Dateline How long is the flight between the Western most and the Eastern most points in the US? -- <img src="Lec16_files/figure-html/unnamed-chunk-3-1.png" width="70%" style="display: block; margin: auto;" /> <img src="Lec16_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" /> --- ## Great circle distance <img src="Lec16_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- ## Even plotting is hard <img src="Lec16_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Relationships <img src="imgs/taal_photo.jpg" width="250" height="250" style="display: block; margin: auto;"/> <img src="imgs/taal_seq.png" width="1000" style="display: block; margin: auto;"/> --- class: middle count: false # Geospatial Data and R --- ## Simple Features <img src="Lec16_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Analysis of geospatial data in R R has a rich package ecosystem for read/writing, manipulating, and analyzing geospatial data. <br/> Some core packages: * ~~`sp` - core classes for handling spatial data, additional utility functions.~~ * ~~`rgdal` - R interface to `gdal` (Geospatial Data Abstraction Library) for reading and writing spatial data.~~ * ~~`rgeos` - R interface to `geos` (Geometry Engine Open Source) library for querying and manipulating spatial data. Reading and writing WKT.~~ * `sf` - Combines the functionality of `sp`, `rgdal`, and `rgeos` into a single package based on tidy simple features. * `raster` - classes and tools for handling spatial raster data. <br/> See more - [Spatial task view](http://cran.r-project.org/web/views/Spatial.html) --- ## Installing `sf` This is the hardest part of using the `sf` package, difficulty comes from is dependence on several external libraries (`geos`, `gdal`, and `proj`). * *Windows* - installing from source works when Rtools is installed (system requirements are downloaded from rwinlib) * *MacOS* - install dependencies via homebrew: `gdal`, `geos`, `proj`, `udunits`. * *Linux* - Install development pacakages for GDAL (>= 2.0.0), GEOS (>= 3.3.0), Proj4 (>= 4.8.0), udunits2 from your package manager of choice. More specific details are included in the [repo readme on github](https://github.com/r-spatial/sf). --- ## Reading, writing, and converting simple features - `sf` * `st_read` / `st_write` - Shapefile, GeoJSON, KML, ... * `st_as_sfc` / `st_as_wkt` - WKT * `st_as_sfc` / `st_as_binary` - WKB * `st_as_sfc` / `as(x, "Spatial")` - sp See [sf vignette #2](https://cran.r-project.org/web/packages/sf/vignettes/sf2.html). --- ## Shapefiles ```r fs::dir_info("data/gis/nc_counties/") %>% select(path:size) ``` ``` ## # A tibble: 4 x 3 ## path type size ## <fs::path> <fct> <fs::bytes> ## 1 data/gis/nc_counties/nc_counties.dbf file 41K ## 2 data/gis/nc_counties/nc_counties.prj file 165 ## 3 data/gis/nc_counties/nc_counties.shp file 1.41M ## 4 data/gis/nc_counties/nc_counties.shx file 900 ``` --- ## NC Counties ```r nc = st_read("data/gis/nc_counties/", quiet=TRUE, stringsAsFactors=FALSE) nc ``` ``` ## Simple feature collection with 100 features and 8 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -84.32186 ymin: 33.84175 xmax: -75.46003 ymax: 36.58815 ## epsg (SRID): 4269 ## proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs ## First 10 features: ## AREA PERIMETER COUNTYP010 STATE COUNTY FIPS STATE_FIPS SQUARE_MIL ## 1 0.11175964 1.610396 1994 NC Ashe County 37009 37 429.350 ## 2 0.06159483 1.354829 1996 NC Alleghany County 37005 37 236.459 ## 3 0.14023009 1.769388 1998 NC Surry County 37171 37 538.863 ## 4 0.08912401 1.425249 1999 NC Gates County 37073 37 342.340 ## 5 0.06865730 4.428217 2000 NC Currituck County 37053 37 263.871 ## 6 0.11859434 1.404309 2001 NC Stokes County 37169 37 455.793 ## 7 0.06259671 2.106357 2002 NC Camden County 37029 37 240.615 ## 8 0.11542955 1.462524 2003 NC Warren County 37185 37 443.659 ## 9 0.14328609 2.397293 2004 NC Northampton County 37131 37 550.581 ## 10 0.09245561 1.810778 2005 NC Hertford County 37091 37 355.525 ## geometry ## 1 MULTIPOLYGON (((-81.65649 3... ## 2 MULTIPOLYGON (((-81.30999 3... ## 3 MULTIPOLYGON (((-80.71416 3... ## 4 MULTIPOLYGON (((-76.91183 3... ## 5 MULTIPOLYGON (((-75.82778 3... ## 6 MULTIPOLYGON (((-80.43315 3... ## 7 MULTIPOLYGON (((-76.54193 3... ## 8 MULTIPOLYGON (((-77.91907 3... ## 9 MULTIPOLYGON (((-77.16403 3... ## 10 MULTIPOLYGON (((-77.15428 3... ``` --- ## sf tibbles ```r nc = nc %>% as_tibble() %>% st_sf() nc ``` ``` ## Simple feature collection with 100 features and 8 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -84.32186 ymin: 33.84175 xmax: -75.46003 ymax: 36.58815 ## epsg (SRID): 4269 ## proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs ## # A tibble: 100 x 9 ## AREA PERIMETER COUNTYP010 STATE COUNTY FIPS STATE_FIPS SQUARE_MIL ## <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl> ## 1 0.112 1.61 1994 NC Ashe … 37009 37 429. ## 2 0.0616 1.35 1996 NC Alleg… 37005 37 236. ## 3 0.140 1.77 1998 NC Surry… 37171 37 539. ## 4 0.0891 1.43 1999 NC Gates… 37073 37 342. ## 5 0.0687 4.43 2000 NC Curri… 37053 37 264. ## 6 0.119 1.40 2001 NC Stoke… 37169 37 456. ## 7 0.0626 2.11 2002 NC Camde… 37029 37 241. ## 8 0.115 1.46 2003 NC Warre… 37185 37 444. ## 9 0.143 2.40 2004 NC North… 37131 37 551. ## 10 0.0925 1.81 2005 NC Hertf… 37091 37 356. ## # ... with 90 more rows, and 1 more variable: geometry <MULTIPOLYGON [°]> ``` --- ## `sf` classes .small[ ```r str(nc, max.level=1) ``` ``` ## Classes 'sf', 'tbl_df', 'tbl' and 'data.frame': 100 obs. of 9 variables: ## $ AREA : num 0.1118 0.0616 0.1402 0.0891 0.0687 ... ## $ PERIMETER : num 1.61 1.35 1.77 1.43 4.43 ... ## $ COUNTYP010: num 1994 1996 1998 1999 2000 ... ## $ STATE : chr "NC" "NC" "NC" "NC" ... ## $ COUNTY : chr "Ashe County" "Alleghany County" "Surry County" "Gates County" ... ## $ FIPS : chr "37009" "37005" "37171" "37073" ... ## $ STATE_FIPS: chr "37" "37" "37" "37" ... ## $ SQUARE_MIL: num 429 236 539 342 264 ... ## $ geometry :sfc_MULTIPOLYGON of length 100; first list element: List of 1 ## ..- attr(*, "class")= chr "XY" "MULTIPOLYGON" "sfg" ## - attr(*, "sf_column")= chr "geometry" ## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA ## ..- attr(*, "names")= chr "AREA" "PERIMETER" "COUNTYP010" "STATE" ... ``` ```r class(nc) ``` ``` ## [1] "sf" "tbl_df" "tbl" "data.frame" ``` ```r class(nc$geometry) ``` ``` ## [1] "sfc_MULTIPOLYGON" "sfc" ``` ```r class(nc$geometry[[1]]) ``` ``` ## [1] "XY" "MULTIPOLYGON" "sfg" ``` ] --- ## Plotting ```r plot(nc) ``` <img src="Lec16_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> --- ## More Plotting ```r plot(nc["AREA"]) ``` <img src="Lec16_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Graticules ```r par(oma=c(0,2,0,0)) plot(nc["AREA"], graticule=TRUE, axes=TRUE, las=1) ``` <img src="Lec16_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Geometries ```r par(oma=c(0,2,0,0)) plot(st_geometry(nc), graticule=TRUE, axes=TRUE, las=1) ``` <img src="Lec16_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" /> --- ## ggplot2 ```r ggplot(nc, aes(fill=AREA)) + geom_sf() ``` <img src="Lec16_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" /> --- ## ggplot2 + palettes ```r ggplot(nc, aes(fill=AREA)) + geom_sf() + scale_fill_viridis_c() ``` <img src="Lec16_files/figure-html/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Some other data .small[ ```r air = st_read("data/gis/airports/", quiet=TRUE, stringsAsFactors=FALSE) %>% tbl_df() %>% st_sf() hwy = st_read("data/gis/us_interstates/", quiet=TRUE, stringsAsFactors=FALSE) %>% tbl_df() %>% st_sf() ``` <img src="Lec16_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Overlays? <img src="Lec16_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Overlays? (ggplot) <img src="Lec16_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Projections ```r st_crs(nc) ``` ``` ## Coordinate Reference System: ## EPSG: 4269 ## proj4string: "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs" ``` ```r st_crs(air) ``` ``` ## Coordinate Reference System: ## EPSG: 4269 ## proj4string: "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs" ``` ```r st_crs(hwy) ``` ``` ## Coordinate Reference System: ## EPSG: 26915 ## proj4string: "+proj=utm +zone=15 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs" ``` --- ## Aside - UTM Zones <img src="imgs/UTM_Zones.png" width="700" style="display: block; margin: auto;"/> --- ## Lat/Long ```r hwy = st_transform(hwy, st_crs(nc)) st_crs(hwy) ``` ``` ## Coordinate Reference System: ## EPSG: 4269 ## proj4string: "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs" ``` <img src="Lec16_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" /> --- class: middle count: false # Airport Example --- ## Sparse vs Full Results ```r st_intersects(nc[20:30,], air) %>% str() ``` ``` ## although coordinates are longitude/latitude, st_intersects assumes that they are planar ``` ``` ## List of 11 ## $ : int(0) ## $ : int(0) ## $ : int(0) ## $ : int(0) ## $ : int(0) ## $ : int 268 ## $ : int 717 ## $ : int(0) ## $ : int(0) ## $ : int(0) ## $ : int(0) ## - attr(*, "predicate")= chr "intersects" ## - attr(*, "region.id")= chr [1:11] "1" "2" "3" "4" ... ## - attr(*, "ncol")= int 940 ## - attr(*, "class")= chr "sgbp" ``` --- ```r st_intersects(nc, air, sparse=FALSE) %>% str() ``` ``` ## although coordinates are longitude/latitude, st_intersects assumes that they are planar ``` ``` ## logi [1:100, 1:940] FALSE FALSE FALSE FALSE FALSE FALSE ... ``` ```r st_intersects(nc, air, sparse=FALSE) %>% .[20:30, 260:270] ``` ``` ## although coordinates are longitude/latitude, st_intersects assumes that they are planar ``` ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE ## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ``` --- ## Which counties have airports? ```r nc_air = st_intersects(nc, air) ``` ``` ## although coordinates are longitude/latitude, st_intersects assumes that they are planar ``` ```r n_air = map_int(nc_air, length) nc %>% slice(which(n_air > 0)) %>% pull(COUNTY) ``` ``` ## [1] "Forsyth County" "Guilford County" "Dare County" "Wake County" ## [5] "Pitt County" "Catawba County" "Buncombe County" "Wayne County" ## [9] "Mecklenburg County" "Moore County" "Cabarrus County" "Lenoir County" ## [13] "Craven County" "Cumberland County" "Onslow County" "New Hanover County" ``` ```r air_in_nc = nc_air %>% unlist() %>% unique() air %>% slice(air_in_nc) %>% pull(AIRPT_NAME) ``` ``` ## [1] "SMITH REYNOLDS AIRPORT" ## [2] "PIEDMONT TRIAD INTERNATIONAL AIRPORT" ## [3] "DARE COUNTY REGIONAL AIRPORT" ## [4] "RALEIGH-DURHAM INTERNATIONAL AIRPORT" ## [5] "PITT-GREENVILLE AIRPORT" ## [6] "HICKORY REGIONAL AIRPORT" ## [7] "ASHEVILLE REGIONAL AIRPORT" ## [8] "SEYMOUR JOHNSON AIR FORCE BASE" ## [9] "CHARLOTTE/DOUGLAS INTERNATIONAL AIRPORT" ## [10] "MOORE COUNTY AIRPORT" ## [11] "CONCORD REGIONAL AIRPORT" ## [12] "KINSTON REGIONAL JETPORT AT STALLINGS FIELD" ## [13] "CHERRY POINT MARINE CORPS AIR STATION /CUNNINGHAM FIELD/" ## [14] "COASTAL CAROLINA REGIONAL AIRPORT" ## [15] "POPE AIR FORCE BASE" ## [16] "FAYETTEVILLE REGIONAL/GRANNIS FIELD" ## [17] "ALBERT J ELLIS AIRPORT" ## [18] "WILMINGTON INTERNATIONAL AIRPORT" ``` --- ## Results <img src="Lec16_files/figure-html/unnamed-chunk-30-1.png" width="100%" style="display: block; margin: auto;" /> --- class: middle count: false # Gerrymandering Example --- ## (Fake) Data from Annearundel County, Maryland ```r (ac = st_read("data/gis/Annearundel/AnneArundelN.shp", quiet=TRUE) %>% tbl_df() %>% st_sf()) ``` ``` ## Simple feature collection with 56 features and 7 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 413828 ymin: 146220 xmax: 441950.6 ymax: 174397.2 ## epsg (SRID): NA ## proj4string: +proj=lcc +lat_1=38.3 +lat_2=39.45 +lat_0=37.66666666666666 +lon_0=-77 +x_0=400000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ## # A tibble: 56 x 8 ## E2014_R E2014_D E2016_R E2016_D Population id DISTRICT ## <int> <int> <int> <int> <int> <dbl> <int> ## 1 48 52 64 36 6122 3.70e6 3 ## 2 65 35 57 43 9154 3.70e6 2 ## 3 44 56 58 42 15458 3.70e6 3 ## 4 55 45 65 35 2424 3.70e6 3 ## 5 48 52 44 56 430 3.70e6 3 ## 6 42 58 49 51 4918 3.73e6 5 ## 7 62 38 52 48 5486 3.73e6 5 ## 8 48 52 46 54 3093 3.73e6 5 ## 9 40 60 39 61 4137 3.73e6 6 ## 10 35 65 56 44 5843 3.73e6 6 ## # ... with 46 more rows, and 1 more variable: geometry <MULTIPOLYGON [m]> ``` --- ```r ac %>% select(DISTRICT) %>% plot() ``` <img src="Lec16_files/figure-html/unnamed-chunk-32-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Adjacency matrix of precincts ```r adj = st_touches(ac, sparse=FALSE) corrplot::corrplot(adj,method="color",type="full",tl.col="black",cl.pos = "n") ``` <img src="Lec16_files/figure-html/unnamed-chunk-33-1.png" width="60%" style="display: block; margin: auto;" /> --- ## Creating districts ```r ac_dists = ac %>% mutate(DISTRICT = as.factor(DISTRICT)) %>% group_by(DISTRICT) %>% summarize() ggplot(ac_dists, aes(fill=as.factor(DISTRICT))) + geom_sf() ``` <img src="Lec16_files/figure-html/unnamed-chunk-34-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Polsby-Popper This is a measure of district compactness $$ PP(d) = 4\pi \frac{ A(d)}{P(d)^2} $$ where `\(A(d)\)` and `\(P(d)\)` are the area and perimeter of the district respectively. -- ```r circ = st_length(st_cast(ac_dists,"MULTILINESTRING")) area = st_area(ac_dists) 4 * pi * area / circ^2 ``` ``` ## Units: [1] ## [1] 0.1578816 0.1465205 0.2948130 0.1177227 0.1422566 0.3782004 0.1953110 0.2524293 ``` --- class: middle count: false # Highway Example --- ## Highways .top[ <img src="Lec16_files/figure-html/unnamed-chunk-36-1.png" width="90%" style="display: block; margin: auto;" /> ] --- ## NC Interstate Highways ```r nc_hwy = st_intersection(hwy, nc) ``` ``` ## although coordinates are longitude/latitude, st_intersection assumes that they are planar ``` ``` ## Warning: attribute variables are assumed to be spatially constant throughout all geometries ``` <img src="Lec16_files/figure-html/unnamed-chunk-38-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Reprojecting .small[ ```r nc_lcc = st_transform(nc, 32119) nc_hwy_lcc = st_transform(nc_hwy, 32119) st_crs(nc_lcc) ``` ``` ## Coordinate Reference System: ## EPSG: 32119 ## proj4string: "+proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.22 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs" ``` http://spatialreference.org/ref/epsg/32119/ <img src="Lec16_files/figure-html/unnamed-chunk-40-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Base R plot <img src="Lec16_files/figure-html/unnamed-chunk-41-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Base R plot <img src="Lec16_files/figure-html/unnamed-chunk-42-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Counties near the interstate (Buffering) ```r buffer = nc_hwy_lcc %>% st_buffer(10000) ``` <img src="Lec16_files/figure-html/unnamed-chunk-44-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Counties near the interstate (Buffering + Union) ```r buffer = nc_hwy_lcc %>% st_buffer(10000) %>% st_union() %>% st_sf() ``` <img src="Lec16_files/figure-html/unnamed-chunk-46-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Which counties? ```r buffer = nc_hwy_lcc %>% st_buffer(10000) %>% st_union() %>% st_sf() near_counties = st_intersects(nc_lcc, buffer) %>% map_lgl(~length(.x) != 0) ``` <img src="Lec16_files/figure-html/unnamed-chunk-48-1.png" width="100%" style="display: block; margin: auto;" />