class: center, middle, inverse, title-slide # Spatial data ### Colin Rundel ### 2019-03-09 --- exclude: true --- class: middle count: false # Geospatial stuff is hard --- ## Projections <img src="Lec19_files/figure-html/unnamed-chunk-2-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Dateline How long is the flight between the Western most and the Eastern most points in the US? -- <img src="Lec19_files/figure-html/unnamed-chunk-3-1.png" width="70%" style="display: block; margin: auto;" /> ``` ## Warning in st_is_longlat(x): bounding box has potentially an invalid value range for longlat data ``` <img src="Lec19_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" /> --- ## Great circle distance .small[ ```r par(mar=c(0,0,0,0)) ak1 = c(179.776,51.952) ak2 = c(-179.146,51.273) inter = geosphere::gcIntermediate(ak1, ak2, n=50, addStartEnd=TRUE) plot(st_geometry(world), col="black", ylim=c(-90,90), axes=TRUE) lines(inter,col='red',lwd=2,lty=3) ``` <img src="Lec19_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> ] --- ## Even plotting is hard ```r plot(st_geometry(NAm), graticule=st_crs(NAm), col="black", border=NA, axes=TRUE) ``` <img src="Lec19_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Relationships <img src="imgs/taal_photo.jpg" width="250" height="250" style="display: block; margin: auto;"/> <img src="imgs/taal_seq.png" width="1000" style="display: block; margin: auto;"/> --- class: middle count: false # Geospatial Data and R --- ## Simple Features <img src="Lec19_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Analysis of geospatial data in R R has a rich package ecosystem for read/writing, manipulating, and analyzing geospatial data. <br/> Some core packages: * ~~`sp` - core classes for handling spatial data, additional utility functions.~~ * ~~`rgdal` - R interface to `gdal` (Geospatial Data Abstraction Library) for reading and writing spatial data.~~ * ~~`rgeos` - R interface to `geos` (Geometry Engine Open Source) library for querying and manipulating spatial data. Reading and writing WKT.~~ * `sf` - Combines the functionality of `sp`, `rgdal`, and `rgeos` into a single package based on tidy simple features. * `raster` - classes and tools for handling spatial raster data. * `stars` - Reading, manipulating, writing and plotting spatiotemporal arrays (rasters) See more - [Spatial task view](http://cran.r-project.org/web/views/Spatial.html) --- ## Installing `sf` This is the hardest part of using the `sf` package, difficulty comes from is dependence on several external libraries (`geos`, `gdal`, and `proj`). * *Windows* - installing from source works when Rtools is installed (system requirements are downloaded from rwinlib) * *MacOS* - install dependencies via homebrew: `gdal`, `geos`, `proj`, `udunits`. * *Linux* - Install development pacakages for GDAL (>= 2.0.0), GEOS (>= 3.3.0), Proj4 (>= 4.8.0), udunits2 from your package manager of choice. More specific details are included in the repo [README](https://github.com/r-spatial/sf) on github. --- ## Reading, writing, and converting simple features - `sf` * `st_read` / `st_write` - Shapefile, GeoJSON, KML, ... * `read_sf` / `write_sf` - Same, suports tibbles ... * `st_as_sfc` / `st_as_wkt` - WKT * `st_as_sfc` / `st_as_binary` - WKB * `st_as_sfc` / `as(x, "Spatial")` - sp See [sf vignette #2](https://cran.r-project.org/web/packages/sf/vignettes/sf2.html). --- ## Shapefiles ```r fs::dir_info("data/gis/nc_counties/") %>% select(path:size) ``` ``` ## # A tibble: 4 x 3 ## path type size ## <fs::path> <fct> <fs::bytes> ## 1 data/gis/nc_counties/nc_counties.dbf file 41K ## 2 data/gis/nc_counties/nc_counties.prj file 165 ## 3 data/gis/nc_counties/nc_counties.shp file 1.41M ## 4 data/gis/nc_counties/nc_counties.shx file 900 ``` --- ## NC Counties ```r nc = st_read("data/gis/nc_counties/", quiet=TRUE, stringsAsFactors=FALSE) nc ``` ``` ## Simple feature collection with 100 features and 8 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -84.32186 ymin: 33.84175 xmax: -75.46003 ymax: 36.58815 ## epsg (SRID): 4269 ## proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs ## First 10 features: ## AREA PERIMETER COUNTYP010 STATE COUNTY FIPS STATE_FIPS SQUARE_MIL ## 1 0.11175964 1.610396 1994 NC Ashe County 37009 37 429.350 ## 2 0.06159483 1.354829 1996 NC Alleghany County 37005 37 236.459 ## 3 0.14023009 1.769388 1998 NC Surry County 37171 37 538.863 ## 4 0.08912401 1.425249 1999 NC Gates County 37073 37 342.340 ## 5 0.06865730 4.428217 2000 NC Currituck County 37053 37 263.871 ## 6 0.11859434 1.404309 2001 NC Stokes County 37169 37 455.793 ## 7 0.06259671 2.106357 2002 NC Camden County 37029 37 240.615 ## 8 0.11542955 1.462524 2003 NC Warren County 37185 37 443.659 ## 9 0.14328609 2.397293 2004 NC Northampton County 37131 37 550.581 ## 10 0.09245561 1.810778 2005 NC Hertford County 37091 37 355.525 ## geometry ## 1 MULTIPOLYGON (((-81.65649 3... ## 2 MULTIPOLYGON (((-81.30999 3... ## 3 MULTIPOLYGON (((-80.71416 3... ## 4 MULTIPOLYGON (((-76.91183 3... ## 5 MULTIPOLYGON (((-75.82778 3... ## 6 MULTIPOLYGON (((-80.43315 3... ## 7 MULTIPOLYGON (((-76.54193 3... ## 8 MULTIPOLYGON (((-77.91907 3... ## 9 MULTIPOLYGON (((-77.16403 3... ## 10 MULTIPOLYGON (((-77.15428 3... ``` --- ## sf tibbles ```r nc = read_sf("data/gis/nc_counties/") nc ``` ``` ## Simple feature collection with 100 features and 8 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -84.32186 ymin: 33.84175 xmax: -75.46003 ymax: 36.58815 ## epsg (SRID): 4269 ## proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs ## # A tibble: 100 x 9 ## AREA PERIMETER COUNTYP010 STATE COUNTY FIPS STATE_FIPS SQUARE_MIL ## <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl> ## 1 0.112 1.61 1994 NC Ashe … 37009 37 429. ## 2 0.0616 1.35 1996 NC Alleg… 37005 37 236. ## 3 0.140 1.77 1998 NC Surry… 37171 37 539. ## 4 0.0891 1.43 1999 NC Gates… 37073 37 342. ## 5 0.0687 4.43 2000 NC Curri… 37053 37 264. ## 6 0.119 1.40 2001 NC Stoke… 37169 37 456. ## 7 0.0626 2.11 2002 NC Camde… 37029 37 241. ## 8 0.115 1.46 2003 NC Warre… 37185 37 444. ## 9 0.143 2.40 2004 NC North… 37131 37 551. ## 10 0.0925 1.81 2005 NC Hertf… 37091 37 356. ## # … with 90 more rows, and 1 more variable: geometry <MULTIPOLYGON [°]> ``` --- ## `sf` classes .small[ ```r str(nc, max.level=1) ``` ``` ## Classes 'sf', 'tbl_df', 'tbl' and 'data.frame': 100 obs. of 9 variables: ## $ AREA : num 0.1118 0.0616 0.1402 0.0891 0.0687 ... ## $ PERIMETER : num 1.61 1.35 1.77 1.43 4.43 ... ## $ COUNTYP010: num 1994 1996 1998 1999 2000 ... ## $ STATE : chr "NC" "NC" "NC" "NC" ... ## $ COUNTY : chr "Ashe County" "Alleghany County" "Surry County" "Gates County" ... ## $ FIPS : chr "37009" "37005" "37171" "37073" ... ## $ STATE_FIPS: chr "37" "37" "37" "37" ... ## $ SQUARE_MIL: num 429 236 539 342 264 ... ## $ geometry :sfc_MULTIPOLYGON of length 100; first list element: List of 1 ## ..- attr(*, "class")= chr "XY" "MULTIPOLYGON" "sfg" ## - attr(*, "sf_column")= chr "geometry" ## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA ## ..- attr(*, "names")= chr "AREA" "PERIMETER" "COUNTYP010" "STATE" ... ``` ```r class(nc) ``` ``` ## [1] "sf" "tbl_df" "tbl" "data.frame" ``` ```r class(nc$geometry) ``` ``` ## [1] "sfc_MULTIPOLYGON" "sfc" ``` ```r class(nc$geometry[[1]]) ``` ``` ## [1] "XY" "MULTIPOLYGON" "sfg" ``` ] --- ## Plotting ```r plot(nc) ``` <img src="Lec19_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> --- ## More Plotting ```r plot(nc["AREA"]) ``` <img src="Lec19_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Graticules ```r par(oma=c(0,2,0,0)) plot(nc["AREA"], graticule=TRUE, axes=TRUE, las=1) ``` <img src="Lec19_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Geometries ```r par(oma=c(0,2,0,0)) plot(st_geometry(nc), graticule=TRUE, axes=TRUE, las=1) ``` <img src="Lec19_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" /> --- ## ggplot2 ```r ggplot(nc, aes(fill=AREA)) + geom_sf() ``` <img src="Lec19_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" /> --- ## ggplot2 + palettes ```r ggplot(nc, aes(fill=AREA)) + geom_sf() + scale_fill_viridis_c() ``` <img src="Lec19_files/figure-html/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Some other data .small[ ```r air = st_read("data/gis/airports/", quiet=TRUE, stringsAsFactors=FALSE) %>% tbl_df() %>% st_sf() hwy = st_read("data/gis/us_interstates/", quiet=TRUE, stringsAsFactors=FALSE) %>% tbl_df() %>% st_sf() ``` <img src="Lec19_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Overlays? <img src="Lec19_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Overlays? (ggplot) <img src="Lec19_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Projections ```r st_crs(nc) ``` ``` ## Coordinate Reference System: ## EPSG: 4269 ## proj4string: "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs" ``` ```r st_crs(air) ``` ``` ## Coordinate Reference System: ## EPSG: 4269 ## proj4string: "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs" ``` ```r st_crs(hwy) ``` ``` ## Coordinate Reference System: ## EPSG: 26915 ## proj4string: "+proj=utm +zone=15 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs" ``` --- ## Aside - UTM Zones <img src="imgs/UTM_Zones.png" width="700" style="display: block; margin: auto;"/> --- ## Lat/Long ```r hwy = st_transform(hwy, st_crs(nc)) st_crs(hwy) ``` ``` ## Coordinate Reference System: ## EPSG: 4269 ## proj4string: "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs" ``` <img src="Lec19_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" /> --- class: middle count: false # Airport Example --- ## Sparse vs Full Results ```r st_intersects(nc[20:30,], air) %>% str() ``` ``` ## although coordinates are longitude/latitude, st_intersects assumes that they are planar ``` ``` ## List of 11 ## $ : int(0) ## $ : int(0) ## $ : int(0) ## $ : int(0) ## $ : int(0) ## $ : int 268 ## $ : int 717 ## $ : int(0) ## $ : int(0) ## $ : int(0) ## $ : int(0) ## - attr(*, "predicate")= chr "intersects" ## - attr(*, "region.id")= chr [1:11] "1" "2" "3" "4" ... ## - attr(*, "ncol")= int 940 ## - attr(*, "class")= chr "sgbp" ``` --- ```r st_intersects(nc, air, sparse=FALSE) %>% str() ``` ``` ## although coordinates are longitude/latitude, st_intersects assumes that they are planar ``` ``` ## logi [1:100, 1:940] FALSE FALSE FALSE FALSE FALSE FALSE ... ``` ```r st_intersects(nc, air, sparse=FALSE) %>% .[20:30, 260:270] ``` ``` ## although coordinates are longitude/latitude, st_intersects assumes that they are planar ``` ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE ## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ``` --- ## Which counties have airports? ```r nc_air = st_intersects(nc, air) ``` ``` ## although coordinates are longitude/latitude, st_intersects assumes that they are planar ``` ```r n_air = map_int(nc_air, length) nc %>% slice(which(n_air > 0)) %>% pull(COUNTY) ``` ``` ## [1] "Forsyth County" "Guilford County" "Dare County" "Wake County" ## [5] "Pitt County" "Catawba County" "Buncombe County" "Wayne County" ## [9] "Mecklenburg County" "Moore County" "Cabarrus County" "Lenoir County" ## [13] "Craven County" "Cumberland County" "Onslow County" "New Hanover County" ``` ```r air_in_nc = nc_air %>% unlist() %>% unique() air %>% slice(air_in_nc) %>% pull(AIRPT_NAME) ``` ``` ## [1] "SMITH REYNOLDS AIRPORT" ## [2] "PIEDMONT TRIAD INTERNATIONAL AIRPORT" ## [3] "DARE COUNTY REGIONAL AIRPORT" ## [4] "RALEIGH-DURHAM INTERNATIONAL AIRPORT" ## [5] "PITT-GREENVILLE AIRPORT" ## [6] "HICKORY REGIONAL AIRPORT" ## [7] "ASHEVILLE REGIONAL AIRPORT" ## [8] "SEYMOUR JOHNSON AIR FORCE BASE" ## [9] "CHARLOTTE/DOUGLAS INTERNATIONAL AIRPORT" ## [10] "MOORE COUNTY AIRPORT" ## [11] "CONCORD REGIONAL AIRPORT" ## [12] "KINSTON REGIONAL JETPORT AT STALLINGS FIELD" ## [13] "CHERRY POINT MARINE CORPS AIR STATION /CUNNINGHAM FIELD/" ## [14] "COASTAL CAROLINA REGIONAL AIRPORT" ## [15] "POPE AIR FORCE BASE" ## [16] "FAYETTEVILLE REGIONAL/GRANNIS FIELD" ## [17] "ALBERT J ELLIS AIRPORT" ## [18] "WILMINGTON INTERNATIONAL AIRPORT" ``` --- ## Results <img src="Lec19_files/figure-html/unnamed-chunk-30-1.png" width="100%" style="display: block; margin: auto;" />