class: center, middle, inverse, title-slide

# Spatial data and visualization
## Intro to Data Science
### Shawn Santo
### 01-28-20

---

## Announcements

- Homework 1 due Jan 30 at 11:59pm via Gradescope. Remember to associate pages
with answers and the "Overall" section with the first page.

- Points will be taken off if pages aren't associated on Gradescope. See me or
the TAs if you need any help.

???

## Installing package `sf` locally

From https://r-spatial.github.io/sf/index.html

**Windows**

Installing `sf` from source works under Windows when Rtools is installed. This
downloads the system requirements from rwinlib.

**MacOS**

```bash
brew install pkg-config
brew install gdal
```

Once gdal is installed, you will be able to install sf package from source in R.

**Linux**

For Unix-alikes, GDAL (>= 2.0.1), GEOS (>= 3.4.0) and Proj.4 (>= 4.8.0) are required.

---

## Get today's application exercise

- Navigate to https://classroom.github.com/a/wuuyTAiA

- Clone your application exercise repo, appex04-[github_name]

- Change your project's name in RStudio Cloud to appex04-[your_name]

- Configure git in RStudio Cloud's console pane

```r
library(usethis)
use_git_config(user.name = "name", user.email = "email")
```

---

class: inverse, center, middle

# Introduction

---

## Spatial data is different

Our typical tidy data frame:

```
#> # A tibble: 336,776 x 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1  2013     1     1      517            515         2      830
#>  2  2013     1     1      533            529         4      850
#>  3  2013     1     1      542            540         2      923
#>  4  2013     1     1      544            545        -1     1004
#>  5  2013     1     1      554            600        -6      812
#>  6  2013     1     1      554            558        -4      740
#>  7  2013     1     1      555            600        -5      913
#>  8  2013     1     1      557            600        -3      709
#>  9  2013     1     1      557            600        -3      838
#> 10  2013     1     1      558            600        -2      753
#> # … with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>
```

---

## A simple feature object

```
#> Simple feature collection with 100 features and 2 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> First 10 features:
#>     AREA        NAME                       geometry
#> 1  0.114        Ashe MULTIPOLYGON (((-81.47276 3...
#> 2  0.061   Alleghany MULTIPOLYGON (((-81.23989 3...
#> 3  0.143       Surry MULTIPOLYGON (((-80.45634 3...
#> 4  0.070   Currituck MULTIPOLYGON (((-76.00897 3...
#> 5  0.153 Northampton MULTIPOLYGON (((-77.21767 3...
#> 6  0.097    Hertford MULTIPOLYGON (((-76.74506 3...
#> 7  0.062      Camden MULTIPOLYGON (((-76.00897 3...
#> 8  0.091       Gates MULTIPOLYGON (((-76.56251 3...
#> 9  0.118      Warren MULTIPOLYGON (((-78.30876 3...
#> 10 0.124      Stokes MULTIPOLYGON (((-80.02567 3...
```

--

<br/>

**What differences do you observe?**

---

## Analysis of spatial data in R

.pull-left[

<br/>

- Package `raster` contains classes and tools for handling spatial raster data.

<br/><br/>

- Package `sf` combines the functionality of `sp`, `rgdal`, and `rgeos` into a
single package based on tidy simple features.

]

.pull-right[

]

<br/>

Whether you use vector or raster data depends on the type of problem and
the data source. Our focus will be on vector data and package `sf`.

*Source:* https://commons.wikimedia.org/wiki/File:Raster_vector_tikz.png

---

## Features and simple features

- A **feature** is a thing or object in the real world: a house, a city,
a park, a forest, etc.

<br/><br/>

- A **simple feature**, as defined by the OpenGIS Abstract specification, has
both spatial and non-spatial attributes. Spatial attributes are geometry valued,
and simple features are based on 2D geometry with linear interpolation between
vertices.
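
As a rough sketch of that definition, a simple feature can be assembled by hand: one geometry (the spatial attribute) plus ordinary columns (the non-spatial attributes). The object names, the approximate coordinates for Durham, and the attribute values are made up for illustration and are not part of the lecture data.

```r
library(sf)

# Spatial attribute: a single point given as (longitude, latitude).
# The coordinates are an illustrative approximation of Durham, NC.
durham_pt <- st_point(c(-78.90, 35.99))

# Wrap the geometry in a geometry column and attach a CRS
# (EPSG:4326 is assumed here purely for the example).
geom <- st_sfc(durham_pt, crs = 4326)

# Non-spatial attributes sit in regular columns next to the geometry.
durham <- st_sf(name = "Durham", type = "city", geometry = geom)

durham
class(durham)
```

Printing `durham` shows the same kind of header (geometry type, dimension, bbox, CRS) as the county data in these slides, followed by the attribute columns and a `geometry` column.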
<br/><br/>

```r
Simple feature collection with 100 features and 1 field
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
# A tibble: 100 x 2
   NAME                                             geometry
   <chr>                                  <MULTIPOLYGON [°]>
*1 Ashe      (((-81.47276 36.23436, -81.54084 36.27251, -...
 2 Alleghany (((-81.23989 36.36536, -81.24069 36.37942, -...
```

---

## Geometry examples

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />

---

class: inverse, center, middle

# Visualizing spatial data

---

## Getting `sf` objects

To read simple features from a file or database use function `st_read()`.

.tiny[
```r
library(sf)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc
```

```
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> First 10 features:
#>     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
#> 1  0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
#> 2  0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
#> 3  0.143     1.630  1828    1828       Surry 37171  37171       86  3188
#> 4  0.070     2.968  1831    1831   Currituck 37053  37053       27   508
#> 5  0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
#> 6  0.097     1.670  1833    1833    Hertford 37091  37091       46  1452
#> 7  0.062     1.547  1834    1834      Camden 37029  37029       15   286
#> 8  0.091     1.284  1835    1835       Gates 37073  37073       37   420
#> 9  0.118     1.421  1836    1836      Warren 37185  37185       93   968
#> 10 0.124     1.428  1837    1837      Stokes 37169  37169       85  1612
#>    SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
#> 1      1      10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
#> 2      0      10   542     3      12 MULTIPOLYGON (((-81.23989 3...
#> 3      5     208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
#> 4      1     123   830     2     145 MULTIPOLYGON (((-76.00897 3...
#> 5      9    1066  1606     3    1197 MULTIPOLYGON (((-77.21767 3...
#> 6      7     954  1838     5    1237 MULTIPOLYGON (((-76.74506 3...
#> 7      0     115   350     2     139 MULTIPOLYGON (((-76.00897 3...
#> 8      0     254   594     2     371 MULTIPOLYGON (((-76.56251 3...
#> 9      4     748  1190     2     844 MULTIPOLYGON (((-78.30876 3...
#> 10     1     160  2038     5     176 MULTIPOLYGON (((-80.02567 3...
```
]

???

## Data details

This data set was presented first in Symons, Grimson, and Yuan (1983),
analysed with reference to the spatial nature of the data in Cressie and
Read (1985), expanded in Cressie and Chan (1989), and used in detail in
Cressie (1991). It is for the 100 counties of North Carolina, and includes
counts of numbers of live births (also non-white live births) and numbers of
sudden infant deaths, for the July 1, 1974 to June 30, 1978 and July 1, 1979
to June 30, 1984 periods.

In Cressie and Read (1985), a listing of county neighbours based on shared
boundaries (contiguity) is given, and in Cressie and Chan (1989), and in
Cressie (1991, 386–89), a different listing based on the criterion of distance
between county seats, with a cutoff at 30 miles. The county seat location
coordinates are given in miles in a local (unknown) coordinate reference system.

---

## A closer look

```
#> Simple feature collection with 100 features and 5 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> First 10 features:
#>           NAME BIR74 BIR79 SID74 SID79                       geometry
#> 1         Ashe  1091  1364     1     0 MULTIPOLYGON (((-81.47276 3...
#> 2    Alleghany   487   542     0     3 MULTIPOLYGON (((-81.23989 3...
#> 3        Surry  3188  3616     5     6 MULTIPOLYGON (((-80.45634 3...
#> 4    Currituck   508   830     1     2 MULTIPOLYGON (((-76.00897 3...
#> 5  Northampton  1421  1606     9     3 MULTIPOLYGON (((-77.21767 3...
#> 6     Hertford  1452  1838     7     5 MULTIPOLYGON (((-76.74506 3...
#> 7       Camden   286   350     0     2 MULTIPOLYGON (((-76.00897 3...
#> 8        Gates   420   594     0     2 MULTIPOLYGON (((-76.56251 3...
#> 9       Warren   968  1190     4     2 MULTIPOLYGON (((-78.30876 3...
#> 10      Stokes  1612  2038     1     5 MULTIPOLYGON (((-80.02567 3...
```

<br/>

Data is for the 100 counties of North Carolina, and includes counts of numbers
of live births (also non-white live births) and numbers of sudden infant deaths,
for the **July 1, 1974 to June 30, 1978** and **July 1, 1979 to June 30, 1984**
periods.

---

## Plotting with `ggplot()`

```r
library(ggplot2)

ggplot(nc) +
  geom_sf()
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

**What is different here in terms of how we used `ggplot()`?**

---

## Add a theme

```r
ggplot(nc) +
  geom_sf() +
  theme_bw(base_size = 16)
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

## A look at some aesthetics

```r
ggplot(nc) +
  geom_sf(color = "purple") +
  theme_bw(base_size = 16)
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

## A look at some aesthetics

```r
ggplot(nc) +
  geom_sf(color = "purple", size = 3) +
  theme_bw(base_size = 16)
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---

## A look at some aesthetics

```r
ggplot(nc) +
  geom_sf(color = "purple", size = 3, alpha = .4) +
  theme_bw(base_size = 16)
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

## A look at some aesthetics

```r
ggplot(nc) +
  geom_sf(color = "purple", size = 3, alpha = .4, fill = "lightblue") +
  theme_bw(base_size = 16)
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

## A look back at some of our data

```
#> Simple feature collection with 100 features and 5 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> First 10 features:
#>           NAME BIR74 BIR79 SID74 SID79                       geometry
#> 1         Ashe  1091  1364     1     0 MULTIPOLYGON (((-81.47276 3...
#> 2    Alleghany   487   542     0     3 MULTIPOLYGON (((-81.23989 3...
#> 3        Surry  3188  3616     5     6 MULTIPOLYGON (((-80.45634 3...
#> 4    Currituck   508   830     1     2 MULTIPOLYGON (((-76.00897 3...
#> 5  Northampton  1421  1606     9     3 MULTIPOLYGON (((-77.21767 3...
#> 6     Hertford  1452  1838     7     5 MULTIPOLYGON (((-76.74506 3...
#> 7       Camden   286   350     0     2 MULTIPOLYGON (((-76.00897 3...
#> 8        Gates   420   594     0     2 MULTIPOLYGON (((-76.56251 3...
#> 9       Warren   968  1190     4     2 MULTIPOLYGON (((-78.30876 3...
#> 10      Stokes  1612  2038     1     5 MULTIPOLYGON (((-80.02567 3...
```

<br/>

How can we incorporate these variables in our plot using `ggplot()`?

---

## Choropleth map

```r
ggplot(nc) +
* geom_sf(aes(fill = BIR74)) +
  theme_bw(base_size = 16)
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

It is sometimes helpful to pick diverging colors;
[COLOR BREWER 2](http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3) can help.

---

## Choropleth map

One way to set fill colors is with `scale_fill_gradient()`.

```r
ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
* scale_fill_gradient(low = "#fee8c8", high = "#7f0000") +
  theme_bw(base_size = 16)
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

---

## ...it's just a population map!

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---

## ...it's just a population map!

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

---

## Multiple plots - code

```r
library(patchwork)

p1 <- ggplot(nc) +
  geom_sf(aes(fill = SID74)) +
  scale_fill_gradient(low = "#fff7f3", high = "#49006a") +
  theme_bw(base_size = 16) +
  labs(title = "Sudden Infant Death", caption = "July 1, 1974 to June 30, 1978",
       fill = "Count")

p2 <- ggplot(nc) +
  geom_sf(aes(fill = SID79)) +
  scale_fill_gradient(low = "#fff7f3", high = "#49006a") +
  theme_bw(base_size = 16) +
  labs(caption = "July 1, 1979 to June 30, 1984", fill = "Count")

*p1 / p2
```

---

## Application exercise

Work on Tasks 1 and 2.

???

## Task 2

#### Part 1

Use object `world` to create a world map of the countries. You'll want to use
functions `ggplot()` and `geom_sf()`.

```r
ggplot(data = world) +
  geom_sf()
```

#### Part 2

Build on your map from Part 1 so that the countries have a fill color
associated with the population estimate. Variable `pop_est` is in millions.
Be sure to label your map.

```r
ggplot(data = world, aes(fill = pop_est)) +
  geom_sf() +
  labs(fill = "Population\n(in millions)", title = "Population by country") +
  theme_minimal()
```

---

class: inverse, center, middle

# Map layers

---

## Game Lands data

The North Carolina Department of Environment and Natural Resources, Wildlife
Resources Commission and the NC Center for Geographic Information and Analysis
have a shapefile data set available on all public Game Lands in NC.

https://hub.arcgis.com/datasets/faaad7fcca8d4f67abdbb1bd4697f055_0

```r
nc_game <- st_read("data/gamelands.shp", quiet = TRUE)
```

---

## A closer look

.tiny[
```r
nc_game
```

```
#> Simple feature collection with 94 features and 6 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.29534 ymin: 33.98542 xmax: -75.54947 ymax: 36.58814
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> First 10 features:
#>    OBJECTID                   GML_HAB  SUM_ACRES GameLandID Shape__Are
#> 1         1                     Alcoa 11395.9471          1   69931121
#> 2         2           Alligator River 24439.0891          2  151120825
#> 3         3                Angola Bay 34063.4468          3  204400526
#> 4         4              Bachelor Bay  2786.2577          4   17219484
#> 5         5             Bertie County  3883.7683          5   24044312
#> 6         6 Bladen Lakes State Forest 33671.8426          6  202085696
#> 7         7             Brinkleyville  1843.8439         92   11511489
#> 8         8                  Buckhorn   491.3477         81    3046371
#> 9         9                 Buckridge 17965.7187         10  110580903
#> 10       10              Buffalo Cove  6630.9453         11   41161465
#>    Shape__Len                       geometry
#> 1   549030.42 MULTIPOLYGON (((-80.07347 3...
#> 2   186792.83 MULTIPOLYGON (((-76.11832 3...
#> 3   105421.80 MULTIPOLYGON (((-77.86947 3...
#> 4    32891.84 MULTIPOLYGON (((-76.73896 3...
#> 5    83468.94 MULTIPOLYGON (((-76.9209 35...
#> 6   255198.44 MULTIPOLYGON (((-78.46171 3...
#> 7    46838.19 MULTIPOLYGON (((-77.90555 3...
#> 8    13445.00 MULTIPOLYGON (((-79.22056 3...
#> 9   142923.83 MULTIPOLYGON (((-76.10961 3...
#> 10   98754.34 MULTIPOLYGON (((-81.53307 3...
```
]

---

## Visualize `nc_game`

```r
ggplot(nc_game) +
  geom_sf() +
  theme_bw() +
  labs(title = "NC gamelands")
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

---

## Visualize `nc_game`

```r
ggplot(nc_game) +
  geom_sf(fill = "#ff6700") +
  theme_bw() +
  labs(title = "NC gamelands")
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />

---

## Add layers

```r
ggplot(nc) +
  geom_sf() +
  geom_sf(data = nc_game, fill = "#ff6700", alpha = .5) +
  theme_bw() +
  labs(title = "NC gamelands and counties")
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />

---

## Add layers and aesthetics

```r
ggplot(nc) +
  geom_sf() +
  geom_sf(data = nc_game, aes(alpha = SUM_ACRES), fill = "#ff6700") +
  theme_bw() +
  labs(title = "NC gamelands and counties", alpha = "Acres")
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />

---

class: inverse, center, middle

# Spatial challenges

---

## Key spatial data challenges

1. Different data types exist.
    - Raster versus vector

2. Special attention must be given to the coordinate reference system (CRS).

```r
Simple feature collection with 100 features and 1 field
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
*epsg (SRID):    4326
*proj4string:    +proj=longlat +datum=WGS84 +no_defs
# A tibble: 100 x 2
   NAME                                             geometry
   <chr>                                  <MULTIPOLYGON [°]>
 1 Ashe      (((-81.47276 36.23436, -81.54084 36.27251, -...
 2 Alleghany (((-81.23989 36.36536, -81.24069 36.37942, -...
```

3. Manipulating spatial data objects is similar but not identical to
manipulating data frame objects.
    - Core `dplyr` functions do work

---

## Spatial data plotting needs care

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-29-1.png" width="100%" style="display: block; margin: auto;" />

---

## Some core `dplyr` functions and `sf` objects

```r
library(dplyr)

nc %>%
  select(BIR74, SID74)
```

```
#> Simple feature collection with 100 features and 2 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> First 10 features:
#>    BIR74 SID74                       geometry
#> 1   1091     1 MULTIPOLYGON (((-81.47276 3...
#> 2    487     0 MULTIPOLYGON (((-81.23989 3...
#> 3   3188     5 MULTIPOLYGON (((-80.45634 3...
#> 4    508     1 MULTIPOLYGON (((-76.00897 3...
#> 5   1421     9 MULTIPOLYGON (((-77.21767 3...
#> 6   1452     7 MULTIPOLYGON (((-76.74506 3...
#> 7    286     0 MULTIPOLYGON (((-76.00897 3...
#> 8    420     0 MULTIPOLYGON (((-76.56251 3...
#> 9    968     4 MULTIPOLYGON (((-78.30876 3...
#> 10  1612     1 MULTIPOLYGON (((-80.02567 3...
```

---

```r
nc %>%
  filter(AREA > .23)
```

```
#> Simple feature collection with 3 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -79.45597 ymin: 33.94867 xmax: -78.11374 ymax: 35.31512
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#>    AREA PERIMETER CNTY_ CNTY_ID     NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
#> 1 0.241     2.214  2083    2083  Sampson 37163  37163       82  3025     4
#> 2 0.240     2.004  2150    2150  Robeson 37155  37155       78  7889    31
#> 3 0.240     2.365  2232    2232 Columbus 37047  37047       24  3350    15
#>   NWBIR74 BIR79 SID79 NWBIR79                       geometry
#> 1    1396  3447     4    1524 MULTIPOLYGON (((-78.11377 3...
#> 2    5904  9087    26    6899 MULTIPOLYGON (((-78.86451 3...
#> 3    1431  4144    17    1832 MULTIPOLYGON (((-78.65572 3...
```

---

```r
nc %>%
  summarise(med_area  = median(AREA),
            med_sid74 = median(SID74),
            med_sid79 = median(SID79))
```

```
#> Simple feature collection with 1 feature and 3 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#>   med_area med_sid74 med_sid79                       geometry
#> 1   0.1205         4         5 MULTIPOLYGON (((-76.54427 3...
```

--

<br/>

**What do you notice in the last three examples?**

---

## Geometries are sticky with `dplyr`

To remove the geometry, include `st_drop_geometry()`.

--

```r
nc %>%
  select(AREA, NAME) %>%
  filter(AREA > .23) %>%
* st_drop_geometry()
```

```
#>    AREA     NAME
#> 1 0.241  Sampson
#> 2 0.240  Robeson
#> 3 0.240 Columbus
```

--

```r
nc %>%
* st_drop_geometry() %>%
  select(AREA, NAME) %>%
  filter(AREA > .23)
```

```
#>    AREA     NAME
#> 1 0.241  Sampson
#> 2 0.240  Robeson
#> 3 0.240 Columbus
```

---

## Application exercise

Work on Tasks 3 and 4. Try to recreate the plot below.

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" />

???

## Task 3

```r
china <- world %>%
  filter(name == "China")
```

```r
china_cv <- virus %>%
  filter(cntry_r == "Mainland China", !is.na(confrmd))
```

## Task 4

```r
ggplot(data = china) +
  geom_sf(fill = "#DE2910") +
  geom_sf(data = china_cv, aes(size = confrmd), color = "#FFDE00",
          show.legend = "point") +
  scale_size_continuous(range = c(0, 10)) +
  labs(size = "Count", title = "Coronavirus cases in China",
       subtitle = "As of Jan 25, 2020") +
  theme_bw()
```

<img src="lec-04a-spatial_files/figure-html/unnamed-chunk-39-1.png" style="display: block; margin: auto;" />

---

## Looking ahead

- Thursday, Dr. Eric Monson will guest lecture. Eric is a data visualization
specialist with the Duke University Libraries’ Center for Data and Visualization
Sciences.

- Lab 03 will be the first team lab. A component of that lab will be based on
Eric's talk.
You should sit with your lab team for Thursday's lecture.

---

## References

- Simple Features for R vignette, https://r-spatial.github.io/sf/

- `mapview` vignette, https://r-spatial.github.io/mapview/index.html

- Coordinate Reference Systems in R by Melanie Frazier,
https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/OverviewCoordinateReferenceSystems.pdf
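
???

The CRS point from the "Key spatial data challenges" slide can be sketched with the `nc` shapefile that ships with `sf`. This is only an illustration under assumptions: the object name `nc_wgs84` is hypothetical, and EPSG:4326 is picked as a target simply because it appears in the printed examples.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Inspect the CRS the file came with (NAD27 for this data set).
st_crs(nc)

# Re-project to WGS84 (EPSG:4326). The attribute columns are unchanged;
# the coordinates and the CRS metadata are what get updated.
nc_wgs84 <- st_transform(nc, crs = 4326)
st_crs(nc_wgs84)

# Layers generally need to share a CRS before they are overlaid or compared,
# so a check like this is a reasonable first step.
st_crs(nc) == st_crs(nc_wgs84)
```

If two layers report different CRS values, transforming one of them with `st_transform()` before plotting them together is one common remedy.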