- Link to course pages
- Link to this document
- Link to GitHub repo
Example data structure: a vector, array, or list
Example data concept: an object, an event, a field or coverage, a trajectory, an administrative division, outcome of an election, a hurricane. We will come back to this.
See https://journal.r-project.org/archive/2018/RJ-2018-009/
Simple features are (a data model for) things that have a geometry and properties.
Simple feature geometry refers to the geometry of a simple feature.
The simple refers to lines (and polygon boundaries) being represented by points connected with straight lines.
For some things, this "thing data model" either works
POINT (0 1)MULTIPOINT (0 1,1 1,-1 2)LINESTRING (0 0,1 1)MULTILINESTRING ((0 0,1 1),(2 2,0 2))POLYGON ((0 0,3 0,3 3,0 3,0 0),(1 1,2 1,2 2,1 2,1 1))MULTIPOLYGON (((0 0,1 0,1 1,0 1,0 0)),((2 2,3 2,3 3,2 3,2 2)))GEOMETRYCOLLECTION(POINT(0 1), LINESTRING(0 0,1 1))POINT EMPTYThese encodings are called Well-known text (WKT)
Well-known binary (WKB):
library(sf)
## Linking to GEOS 3.7.0, GDAL 2.4.0, PROJ 5.2.0
st_as_binary(st_as_sfc("POINT (0 1)"))[[1]]
## [1] 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 3f
st_as_binary(st_as_sfc("LINESTRING (0 0,1 1)"))[[1]]
## [1] 01 02 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ## [24] 00 00 00 00 00 00 00 00 f0 3f 00 00 00 00 00 00 f0 3f
WKB:
sf):str(st_as_sfc("POINT (0 1)")[[1]])
## 'XY' num [1:2] 0 1
str(st_as_sfc("LINESTRING (0 0,1 1,2 2)")[[1]])
## 'XY' num [1:3, 1:2] 0 1 2 0 1 2
POINT is a numeric vectormatrix (1 point per row)listLINESTRING and MULTIPOINT, or POLYGON and MULTILINESTRING)sf class systemsfcclass(st_linestring(rbind(c(0,0), c(1,1))))
## [1] "XY" "LINESTRING" "sfg"
sfc (keeps bbox, CRS)class(st_sfc(st_point(c(0,1)), st_point(c(5,5)), crs = 4326))
## [1] "sfc_POINT" "sfc"
sfclass(st_sf(attr = "a", geom = st_sfc(st_point(c(0,1)))))
## [1] "sf" "data.frame"
Empty geometries reflect a missing geometry. E.g.
st_intersection(st_point(c(0,1)), st_point(c(2,2)))
## GEOMETRYCOLLECTION EMPTY
gc = st_union(st_point(c(0,1)), st_linestring(rbind(c(0,0), c(1,1)))) gc
## GEOMETRYCOLLECTION (POINT (0 1), LINESTRING (0 0, 1 1))
st_sf(attr = "a", geom = st_sfc(gc))
## Simple feature collection with 1 feature and 1 field ## geometry type: GEOMETRYCOLLECTION ## dimension: XY ## bbox: xmin: 0 ymin: 0 xmax: 1 ymax: 1 ## epsg (SRID): NA ## proj4string: NA ## attr geom ## 1 a GEOMETRYCOLLECTION (POINT (...
g = st_sfc(st_point(c(0,1)), st_linestring(rbind(c(0,0), c(1,1))))
st_sf(attr = c("a", "b"), geom = g)
## Simple feature collection with 2 features and 1 field ## geometry type: GEOMETRY ## dimension: XY ## bbox: xmin: 0 ymin: 0 xmax: 1 ymax: 1 ## epsg (SRID): NA ## proj4string: NA ## attr geom ## 1 a POINT (0 1) ## 2 b LINESTRING (0 0, 1 1)
Geometry operations can be:
TRUE or FALSE, e.g. st_intersectsst_areast_intersectionThey have an arity:
st_areast_overlapsst_intersectionChapter 5 of Spatial Data Science details all the combinations.
If this all looks complicated, do compare it to how a GIS works.
The "sp stack" (2005-…) : sp, rgdal, rgeos, raster, … (sp: 550 revdeps)
The sf stack (2016-…): sf, stars, mapview, tmap, gstat, spdep, … (sf: 150 revdeps)
comparison:
GEOMETRYGEOMETRYCOLLECTIONPOLYGON and MULTIPOLYGON etc.SpatialXxxDataFrame objects are not data.frames, and as a consequence, sp objects do not work well with tidyverse commandsgeom_sf: sf has full ggplot2 supportgeom = st_sfc(st_point(0:1), st_point(1:2), crs = 4326)
x = st_sf(attr = c("a", "b"), geom = geom)
library(sp)
x.sp = as(x, "Spatial") # sf -> sp
x2 = st_as_sf(x.sp) # and back: sp -> sf
all.equal(x, x2, check.attributes = FALSE)
## [1] TRUE
LINESTRING and POLYGON return as MULTILINESTRING and MULTIPOLYGON respectively.system.file("shape/nc.shp", package="sf")sf::st_read (or sf::read_sf)nc dataset, compute the area of the polygons with st_areaAREA field in the dataset; is there a discrepancy?POINT (-81.49826 36.4314)Typical workflow:
Recommended:
Avoid side effects (anything not in the script/R Markdown file):
R --vanilla)R --vanilla)setwd(), at all timesrm(ls()) at the start of the script, but assume a clean workspacelibrary(xxx) statements at the startSo your colleague (or later self) can:
knitsessionInfo()R and R package versions matter. Use
sessionInfo()
or
devtools::session_info()
at the end of every .Rmd file so that it is documented which version of everything was used. This may help diagnosing issues later on.
suppressPackageStartupMessages(library(tidyverse))
nc = read_sf(system.file("gpkg/nc.gpkg", package="sf")) %>%
select(CNTY_, BIR74)
st_agr(nc) = c(CNTY_ = "identity", BIR74 = "aggregate")
pt = st_as_sfc("POINT (-81.49826 36.4314)", crs = st_crs(nc))
i = st_intersection(st_sf(geom = pt), nc)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant ## throughout all geometries
We don't get this warning if none of the attributes is an aggregate; we're faking this by
st_agr(nc) = c(CNTY_ = "identity", BIR74 = "constant") i = st_intersection(st_sf(geom = pt), nc)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
ggplot2::geom_sf, raster/stars, mapview, tmapst_crs() shows, or sets, the coordinate reference systemst_transform() takes care of these, using PROJ strings or EPSG codes (numbers), e.g.nc = read_sf(system.file("gpkg/nc.gpkg", package="sf"))
nc.tr = st_transform(nc, 2264)
st_crs(nc.tr)
## Coordinate Reference System: ## EPSG: 2264 ## proj4string: "+proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.2192024384 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs"
st_crs(nc.tr)$units
## [1] "us-ft"
sp has some elementary infrastructure for this (in-memory)raster is scalable, solid, and has a large functionalityraster data model: single raster layers, or a stack of raster layers, possibly time-referenced (max: 3-D).stars tries to break out of this limitation, and covers raster and vector data cubesstars: a package for raster and vector data cubes, was developed to handle this second case (time instances/periods are identical for all spatial locations/areas)Data are composed of discrete entities ("numbers, bits")
"Real life" is
S. Scheider, B. Graeler, E. Pebesma, C. Stasch, 2016. Modelling spatio-temporal information generation. Int J of Geographic Information Science, 30 (10), 1980-2008 https://dx.doi.org/10.1080/13658816.2016.1151520 (open access)
Higher-level concepts:
The concept of a function (\(\Rightarrow\)) is largely absent in GIS or data base representations, and neither is that of aggregations as a type.
system.file("tif/L7_ETMs.tif", package = "stars") with stars::read_starsst_apply)st_crop to crop the dataset to the area 300 m around POINT (293749.5 9115745)