- Link to course pages
- Link to this document
- Link to GitHub repo
Example data structure: a vector, array, or list
Example data concept: an object, an event, a field or coverage, a trajectory, an administrative division, outcome of an election, a hurricane. We will come back to this.
See https://journal.r-project.org/archive/2018/RJ-2018-009/
Simple features are (a data model for) things that have a geometry and properties.
Simple feature geometry refers to the geometry of a simple feature.
The simple refers to lines (and polygon boundaries) being represented by points connected with straight lines.
For some things, this "thing data model" either works
POINT (0 1)
MULTIPOINT (0 1,1 1,-1 2)
LINESTRING (0 0,1 1)
MULTILINESTRING ((0 0,1 1),(2 2,0 2))
POLYGON ((0 0,3 0,3 3,0 3,0 0),(1 1,2 1,2 2,1 2,1 1))
MULTIPOLYGON (((0 0,1 0,1 1,0 1,0 0)),((2 2,3 2,3 3,2 3,2 2)))
GEOMETRYCOLLECTION(POINT(0 1), LINESTRING(0 0,1 1))
POINT EMPTY
These encodings are called Well-known text (WKT)
Well-known binary (WKB):
library(sf)
## Linking to GEOS 3.7.0, GDAL 2.4.0, PROJ 5.2.0
st_as_binary(st_as_sfc("POINT (0 1)"))[[1]]
## [1] 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 3f
st_as_binary(st_as_sfc("LINESTRING (0 0,1 1)"))[[1]]
## [1] 01 02 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ## [24] 00 00 00 00 00 00 00 00 f0 3f 00 00 00 00 00 00 f0 3f
WKB:
sf
):str(st_as_sfc("POINT (0 1)")[[1]])
## 'XY' num [1:2] 0 1
str(st_as_sfc("LINESTRING (0 0,1 1,2 2)")[[1]])
## 'XY' num [1:3, 1:2] 0 1 2 0 1 2
POINT
is a numeric vectormatrix
(1 point per row)list
LINESTRING
and MULTIPOINT
, or POLYGON
and MULTILINESTRING
)sf
class systemsfc
class(st_linestring(rbind(c(0,0), c(1,1))))
## [1] "XY" "LINESTRING" "sfg"
sfc
(keeps bbox, CRS)class(st_sfc(st_point(c(0,1)), st_point(c(5,5)), crs = 4326))
## [1] "sfc_POINT" "sfc"
sf
class(st_sf(attr = "a", geom = st_sfc(st_point(c(0,1)))))
## [1] "sf" "data.frame"
Empty geometries reflect a missing geometry. E.g.
st_intersection(st_point(c(0,1)), st_point(c(2,2)))
## GEOMETRYCOLLECTION EMPTY
gc = st_union(st_point(c(0,1)), st_linestring(rbind(c(0,0), c(1,1)))) gc
## GEOMETRYCOLLECTION (POINT (0 1), LINESTRING (0 0, 1 1))
st_sf(attr = "a", geom = st_sfc(gc))
## Simple feature collection with 1 feature and 1 field ## geometry type: GEOMETRYCOLLECTION ## dimension: XY ## bbox: xmin: 0 ymin: 0 xmax: 1 ymax: 1 ## epsg (SRID): NA ## proj4string: NA ## attr geom ## 1 a GEOMETRYCOLLECTION (POINT (...
g = st_sfc(st_point(c(0,1)), st_linestring(rbind(c(0,0), c(1,1)))) st_sf(attr = c("a", "b"), geom = g)
## Simple feature collection with 2 features and 1 field ## geometry type: GEOMETRY ## dimension: XY ## bbox: xmin: 0 ymin: 0 xmax: 1 ymax: 1 ## epsg (SRID): NA ## proj4string: NA ## attr geom ## 1 a POINT (0 1) ## 2 b LINESTRING (0 0, 1 1)
Geometry operations can be:
TRUE
or FALSE
, e.g. st_intersects
st_area
st_intersection
They have an arity:
st_area
st_overlaps
st_intersection
Chapter 5 of Spatial Data Science details all the combinations.
If this all looks complicated, do compare it to how a GIS works.
The "sp stack" (2005-…) : sp, rgdal, rgeos, raster, … (sp: 550 revdeps)
The sf stack (2016-…): sf, stars, mapview, tmap, gstat, spdep, … (sf: 150 revdeps)
comparison:
GEOMETRY
GEOMETRYCOLLECTION
POLYGON
and MULTIPOLYGON
etc.SpatialXxxDataFrame
objects are not data.frames
, and as a consequence, sp objects do not work well with tidyverse
commandsgeom_sf
: sf has full ggplot2
supportgeom = st_sfc(st_point(0:1), st_point(1:2), crs = 4326) x = st_sf(attr = c("a", "b"), geom = geom) library(sp) x.sp = as(x, "Spatial") # sf -> sp x2 = st_as_sf(x.sp) # and back: sp -> sf all.equal(x, x2, check.attributes = FALSE)
## [1] TRUE
LINESTRING
and POLYGON
return as MULTILINESTRING
and MULTIPOLYGON
respectively.system.file("shape/nc.shp", package="sf")
sf::st_read
(or sf::read_sf
)nc
dataset, compute the area of the polygons with st_area
AREA
field in the dataset; is there a discrepancy?POINT (-81.49826 36.4314)
Typical workflow:
Recommended:
Avoid side effects (anything not in the script/R Markdown file):
R --vanilla
)R --vanilla
)setwd()
, at all timesrm(ls())
at the start of the script, but assume a clean workspacelibrary(xxx)
statements at the startSo your colleague (or later self) can:
knit
sessionInfo()
R and R package versions matter. Use
sessionInfo()
or
devtools::session_info()
at the end of every .Rmd file so that it is documented which version of everything was used. This may help diagnosing issues later on.
suppressPackageStartupMessages(library(tidyverse)) nc = read_sf(system.file("gpkg/nc.gpkg", package="sf")) %>% select(CNTY_, BIR74) st_agr(nc) = c(CNTY_ = "identity", BIR74 = "aggregate") pt = st_as_sfc("POINT (-81.49826 36.4314)", crs = st_crs(nc)) i = st_intersection(st_sf(geom = pt), nc)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant ## throughout all geometries
We don't get this warning if none of the attributes is an aggregate
; we're faking this by
st_agr(nc) = c(CNTY_ = "identity", BIR74 = "constant") i = st_intersection(st_sf(geom = pt), nc)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
ggplot2::geom_sf
, raster/stars, mapview, tmapst_crs()
shows, or sets, the coordinate reference systemst_transform()
takes care of these, using PROJ strings or EPSG codes (numbers), e.g.nc = read_sf(system.file("gpkg/nc.gpkg", package="sf")) nc.tr = st_transform(nc, 2264) st_crs(nc.tr)
## Coordinate Reference System: ## EPSG: 2264 ## proj4string: "+proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.2192024384 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs"
st_crs(nc.tr)$units
## [1] "us-ft"
sp
has some elementary infrastructure for this (in-memory)raster
is scalable, solid, and has a large functionalityraster
data model: single raster layers, or a stack of raster layers, possibly time-referenced (max: 3-D).stars
tries to break out of this limitation, and covers raster and vector data cubesstars
: a package for raster and vector data cubes, was developed to handle this second case (time instances/periods are identical for all spatial locations/areas)Data are composed of discrete entities ("numbers, bits")
"Real life" is
S. Scheider, B. Graeler, E. Pebesma, C. Stasch, 2016. Modelling spatio-temporal information generation. Int J of Geographic Information Science, 30 (10), 1980-2008 https://dx.doi.org/10.1080/13658816.2016.1151520 (open access)
Higher-level concepts:
The concept of a function (\(\Rightarrow\)) is largely absent in GIS or data base representations, and neither is that of aggregations as a type.
system.file("tif/L7_ETMs.tif", package = "stars")
with stars::read_stars
st_apply
)st_crop
to crop the dataset to the area 300 m around POINT (293749.5 9115745)