A. Spatial data and simple features

Since data collections can only be done somewhere, and at some time, all data is, in principle spatial and temporal. In many cases location are implicit, and for instance encoded as ID of an object. In other cases, locations are encoded with coordinates.

Coordinates can be three-dimensional \((x, y, z)\), but are in many cases two-dimensional \((x, y)\). For geo spatial data, that is, data that are Earth-bound, two-dimensional coordinates are either

  • geographic: angles (degrees longitude and latitude), pointing out locations on an ellipsoid or sphere
  • projected: measured in a two-dimensional flat space, e.g. in metres, related to an ellipsoid by projection

A coordinate reference sytem (CRS) describes how coordinates are to be interpreted: which location on Earth do they refer to. In case of projected coordinates the CRS contains the projection type and parameters, in all cases which reference ellipsoid the original geographic coordinates are associated with (e.g., WGS84, or NAD27).

Objects and fields, features

It makes in many cases sense to distinguish to “types” of spatial information, depending on whether variation is discrete or continuous over space:

  • objects are discrete things: houses, cars, rivers, persons, which have a location and other properties
  • fields are continuous phenomena: temperature and pressure of air, elevation of the Earth surface; fields are essentially functions mapping from two- or three-dimensional space (and possibly time) to a value.

OGC, a standardisation body for geospatial information, defines in its abstract specification a feature as an “abstraction of real world phenomena” that has a geometry and attributes (other characteristics). Abstractions of both objects and fields are considered features.

Simple features are features with all geometric attributes described piecewise by straight line or planar interpolation between sets of points (Herring 2011); simple feature geometries are of type

  • point, multipoint,
  • linestring, multilinestring,
  • polygon, multipolygon, or
  • combinations of these (geometrycollection).

Coverages is the term OGC uses to represent field variables; the are typically (but not necessarily) associated with raster data.

Support

In the following figure:

what is the value at the point location pointed to with the arrow?

  • For land use, we have an infinite set of point observations; for every point in the polygon we know the land use; although this would maybe be only one record in our tables, this is a field variable (or “coverage”) representing a large set of points.
  • for population density we have a single observation, which is an aggregation for the entire polygon (the sum of the inhabitants, divided by the polygon area). This is a feature attribute.

Support is the physical size (length, area, volume) a single quantity, or measurement, is associated with. For the land use variable, we have point support. For the population density variable we have support the size of the area aggreated over.

intro to sf

Although not needed for working with sf, we’ll be using tidyverse.

library(tidyverse) # although not needed for working with sf!
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

sf uses the OSGEO stack of software:

library(sf)
## Linking to GEOS 3.7.1, GDAL 2.4.2, PROJ 5.2.0

Internal (R) and system dependencies look like this:

Package sf provides handling of feature data, where each feature geometry is a point, linestring, or polygon, or a sets of those, or some combination of those. It implements the full set of geometric functions described in the simple feature access standard, and some. The basic storage is very simple, and uses only base R types (list, matrix).

  • feature sets are held as records (rows) in sf objects, inheriting from data.frame or tbl_df
  • sf objects have at least one simple feature geometry list-column of class sfc
  • sfc geometry list-columns have a bounding box and a coordinate reference system as attribute, and a class attribute pointing out the common type (or GEOMETRY in case of a mix)
  • a single simple feature geometry is of class sfg, and further classes pointing out dimension and type

Storage of simple feature geometry:

  • POINT is a numeric vector
  • LINESTRING and MULTIPOINT are numeric matrix, points/vertices in rows
  • POLYGON and MULTILINESTRING are lists of matrices
  • MULTIPOLYGON is a lists of those
  • GEOMETRYCOLLECTION is a list of typed geometries

To build from scratch:

p1 = st_point(c(3,5))
class(p1)
## [1] "XY"    "POINT" "sfg"
p2 = st_point(c(4,6))
p3 = st_point(c(4,4))
pts = st_sfc(p1, p2, p3)
class(pts)
## [1] "sfc_POINT" "sfc"
sf = st_sf(a = c(3,2.5,4), b = c(1,2,4), geom = pts)
class(sf)
## [1] "sf"         "data.frame"
sf = st_as_sf(tibble::tibble(a = c(3,2.5,4), b = c(1,2,4), geom = pts))
class(sf)
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
sf
## Simple feature collection with 3 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 3 ymin: 4 xmax: 4 ymax: 6
## epsg (SRID):    NA
## proj4string:    NA
## # A tibble: 3 x 3
##       a     b    geom
##   <dbl> <dbl> <POINT>
## 1   3       1   (3 5)
## 2   2.5     2   (4 6)
## 3   4       4   (4 4)
plot(sf, cex = 3, pch = 16, key.pos = 1, breaks = seq(.5,4.5,1), pal = sf.colors(4))

We can read some real data:

nc = read_sf(system.file("gpkg/nc.gpkg", package="sf")) # read as sf-tibble
agr = c(AREA = "aggregate", PERIMETER = "aggregate", CNTY_ = "identity",
  CNTY_ID = "identity", NAME = "identity", FIPS = "identity", FIPSNO = "identity",
  CRESS_ID = "identity", BIR74 = "aggregate", SID74 = "aggregate", NWBIR74 = "aggregate",
  BIR79 = "aggregate", SID79 = "aggregate", NWBIR79  = "aggregate")
st_agr(nc) = agr 
nc[c(9:11,15)]
## Simple feature collection with 100 features and 3 fields
## Attribute-geometry relationship: 0 constant, 3 aggregate, 0 identity
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
## # A tibble: 100 x 4
##    BIR74 SID74 NWBIR74                                                 geom
##    <dbl> <dbl>   <dbl>                                   <MULTIPOLYGON [°]>
##  1  1091     1      10 (((-81.47276 36.23436, -81.54084 36.27251, -81.5619…
##  2   487     0      10 (((-81.23989 36.36536, -81.24069 36.37942, -81.2628…
##  3  3188     5     208 (((-80.45634 36.24256, -80.47639 36.25473, -80.5368…
##  4   508     1     123 (((-76.00897 36.3196, -76.01735 36.33773, -76.03288…
##  5  1421     9    1066 (((-77.21767 36.24098, -77.23461 36.2146, -77.29861…
##  6  1452     7     954 (((-76.74506 36.23392, -76.98069 36.23024, -76.9947…
##  7   286     0     115 (((-76.00897 36.3196, -75.95718 36.19377, -75.98134…
##  8   420     0     254 (((-76.56251 36.34057, -76.60424 36.31498, -76.6482…
##  9   968     4     748 (((-78.30876 36.26004, -78.28293 36.29188, -78.3212…
## 10  1612     1     160 (((-80.02567 36.25023, -80.45301 36.25709, -80.4353…
## # … with 90 more rows

where st_agr specifies the attribute geometry relationship, indicating for each attribute whether we have polygon support (aggregate) or point support (constant or identity) for the attributes.

Suppose we want to query the nc dataset at POINT (-78.25073 34.07663) (so-called well known text encoding of points with these x and y coordinate), then we can do this by

pt = st_as_sfc("POINT (-78.25073 34.07663)")
st_intersection(nc, st_sfc(pt, crs = st_crs(nc)))
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
## Simple feature collection with 1 feature and 14 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -78.25073 ymin: 34.07663 xmax: -78.25073 ymax: 34.07663
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
## # A tibble: 1 x 15
##    AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS  FIPSNO CRESS_ID BIR74 SID74
##   <dbl>     <dbl> <dbl>   <dbl> <chr> <chr>  <dbl>    <int> <dbl> <dbl>
## 1 0.212      2.02  2241    2241 Brun… 37019  37019       10  2181     5
## # … with 5 more variables: NWBIR74 <dbl>, BIR79 <dbl>, SID79 <dbl>,
## #   NWBIR79 <dbl>, geom <POINT [°]>

This generates a warning because some of the variables have non-point support, meaning that probably a meaningless value is now attached to a point geometry. The warning is not emitted if the attribute variable returned has point support, as in

i = st_intersection(nc["CNTY_"], st_sfc(pt, crs = st_crs(nc)))
## although coordinates are longitude/latitude, st_intersection assumes that they are planar

The remaining warning is justified, as we are doing 2D flat geometry on spherical (ellipsoidal), geographical coordinates. This is often works out, but not always. It goes away when we work with projected (planar) coordinates, e.g. in

nc1 = st_transform(nc, 2264) # NC state plain, US feet
pt1 = st_transform(st_sfc(pt, crs = st_crs(nc)), 2264)
i1 = st_intersection(nc1["CNTY_"], pt1)

We can plot two variable, for instance using base plot

plot(nc[c("SID74", "SID79")], key.pos = 4)

or ggplot:

nc %>% select(SID74, SID79) %>% gather(VAR, SID, -geom) -> nc2 # stack two columns
ggplot() + geom_sf(data = nc2, aes(fill = SID)) + facet_wrap(~VAR, ncol = 1) +
  scale_y_continuous(breaks = 34:36) +
  scale_fill_gradientn(colors = sf.colors(12)) 

Exercises

  1. start R, install package sf if not already present, and load it
  2. load the nc sample dataset as done above
  3. use st_intersects to discover which counties in nc intersect with county Rowan; experiment with the order of arguments to st_intersects
  4. try to understand the object returned by st_intersects
  5. look up which methods are available for objects of class sgbp (methods(class = "sgbp")), and try some of them
  6. does Rowan intersect with itself?
  7. Find all counties that touch Rowan by using st_touches; does Rowan touch itself?
  8. How can we find what intersects and touches mean, more formally?

B. Temporal data, raster data (20’ + 10’ exercises)

Time in R

Base R has two time classes:

(dt = Sys.Date())
## [1] "2019-09-02"
class(dt)
## [1] "Date"
(tm = Sys.time())
## [1] "2019-09-02 14:05:50 CEST"
class(tm)
## [1] "POSIXct" "POSIXt"

where Date has (integer) numbers representing the day number, and POSIXct has (double, decimal) seconds, both counting since Jan 1, 1970, 00:00 UTC.

Time series data

Packages zoo and xts provide classes for working with matrices of data, with a time index for each row. The more recent package tsibble does something similar, extending tibbles rather than matrix.

We will use a small piece of the air dataset in spacetime:

data(air, package = "spacetime") # requires spacetime to be installed
sel = 1800:2300
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
a.xts = xts(t(air)[sel,], order.by = dates[sel])

xts has the ability to intrepret ISO-8601 strings as time intervals, ### time

xts::.parseISO8601("2002-03")
## $first.time
## [1] "2002-03-01 CET"
## 
## $last.time
## [1] "2002-03-31 23:59:59 CEST"

and they can be used for time selection; we can select the first six months of 2003 (rows) and the first four stations (columns), using

plot(a.xts["2003-01/2003-06",1:4], main = "PM10, four rural background stations, DE")

Neither xts, nor (to my knowledge) other R packages for time series data, keep an explicit record of the time periods for which individual records are valid. The PM10 data examined here e.g. are daily averages, but a single time stamp (for daily data the start of the day, by convention) is registered for each observation.

The data considered here form a kind of a space-time raster, as shown by

image(dates[sel], 1:70, as.matrix(a.xts), ylab = "station")

Raster data

Raster data represent a 2-D matrix, where dimensions (rows/cols) map uniquely into coordinates.

In the simplest case of a

  • regular raster
    • \(c(i) = o + (i-1) \times \delta, \ \ i = 1,2,...,n\)
    • \(o\) is the border or the raster, \(\delta\) the cell size
    • for x, \(\delta\) is usually positive, for y, negative (and \(o\) the top edge)
    • index space is continuous: integer \(i\) implies \([i,i+1)\)
    • every coordinate value maps to a unique index (unlike polygon coverages, where boundaries are ambiguous)

More complex rasters are

  • rotated raster
    • column and row dimensions are not alined with x and y axis
  • sheared raster
    • column and row dimensions are no longer perpendicular
  • rectilinear raster
    • increments along column and row dimension are not constant
  • curvilinear raster
    • every col/row combination has its own x and y value (or corner values)

Analysing raster data

Package raster has been around for a long time, is robust, well tested, versatile, and works for large rasters. We will not discuss it here.

The more recent stars package has a different take on raster data, and will be used here. The following example loads 6 bands from a (section of a) Landsat 7 image, and plots it:

library(stars) # will load sf
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
plot(x)

The irregular color breaks come from the default of using “type = quantile” to classInt::classIntervals(), which causes histogram stretching; this is done over all 6 bands.

stars: out-of-memory (if time permits)

Copernicus’ Sentinel 2 provides the state-of-the art multispectral satellite images of today: 10 m resolution, 5 days intervals. A sample image is provided in the (off-CRAN) 1 Gb starsdata package:

install.packages("starsdata", repos = "http://pebesma.staff.ifgi.de/" , type = "source")
library(starsdata)
have_starsdata = require("starsdata") # TRUE to do the big data example with pkg starsdata; instructions for installing below
## Loading required package: starsdata

We will read in the four 10-m resolution bands:

granule = system.file("sentinel/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip", package = "starsdata")
base_name = strsplit(basename(granule), ".zip")[[1]]
s2 = paste0("SENTINEL2_L1C:/vsizip/", granule, "/", base_name, ".SAFE/MTD_MSIL1C.xml:10m:EPSG_32632")
(p = read_stars(s2, proxy = TRUE))
## stars_proxy object with 1 attribute in file:
## $`MTD_MSIL1C.xml:10m:EPSG_32632`
## [1] "SENTINEL2_L1C:/vsizip//home/edzer/R/x86_64-pc-linux-gnu-library/3.6/starsdata/sentinel/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.SAFE/MTD_MSIL1C.xml:10m:EPSG_32632"
## 
## dimension(s):
##      from    to offset delta                       refsys point values    
## x       1 10980  3e+05    10 +proj=utm +zone=32 +datum...    NA   NULL [x]
## y       1 10980  6e+06   -10 +proj=utm +zone=32 +datum...    NA   NULL [y]
## band    1     4     NA    NA                           NA    NA   NULL
object.size(p)
## 7336 bytes

As we can see, this object contains no data, but only a pointer to the data. Plotting it,

system.time(plot(p))

##    user  system elapsed 
##   0.784   0.157   0.446

takes less than a second, because only pixels that can be seen are read; plotting the full 10000 x 10000 images would have taken more than 10 minutes.

We can compute an index like ndvi, and plot it:

ndvi = function(x) (x[4] - x[1])/(x[4] + x[1])
rm(x)
(s2.ndvi = st_apply(p, c("x", "y"), ndvi))
## stars_proxy object with 1 attribute in file:
## $`MTD_MSIL1C.xml:10m:EPSG_32632`
## [1] "SENTINEL2_L1C:/vsizip//home/edzer/R/x86_64-pc-linux-gnu-library/3.6/starsdata/sentinel/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.SAFE/MTD_MSIL1C.xml:10m:EPSG_32632"
## 
## dimension(s):
##      from    to offset delta                       refsys point values    
## x       1 10980  3e+05    10 +proj=utm +zone=32 +datum...    NA   NULL [x]
## y       1 10980  6e+06   -10 +proj=utm +zone=32 +datum...    NA   NULL [y]
## band    1     4     NA    NA                           NA    NA   NULL    
## call list:
## [[1]]
## st_apply(X = X, MARGIN = c("x", "y"), FUN = ndvi)
system.time(plot(s2.ndvi)) # read - compute ndvi - plot

##    user  system elapsed 
##   0.950   0.173   0.644

where lazy evaluation is used: s2.ndvi still contains no pixels, but only plot calls for them, and instructs st_apply to only work on the resolution called for. If we would save s2.ndvi to disk, the full resolution would be computed: the command

write_stars(s2.ndvi, "s2.tif")

takes 5 minutes and writes a 500 Mb GeoTIFF.

Exercises

  1. Start R, install package stars (if not already present), and load it
  2. load the Landsat image test set, as above
  3. Create an RGB composite plot using argument rgb = c(3,2,1) for RGB, and rgb = c(4,3,2) for false color.
  4. Use x6 = split(x, "band") to create a 2-dimensional raster with 6 attributes
  5. Plot x6. What has changed?
  6. Compute the mean of all six bands by adding all six attributes and dividing by 6, and assigning the resulting matrix as a new attribute in x6
  7. As an alternative, compute the mean of all six bands by applying function mean over the x and y dimensions (essentially reducing “band”), by using st_apply; compare the results of the two approaches

C. Raster/vector data cubes, large rasters (20’ + 10’ exercises)

This section tries to bring everything together: simple features, vector data, time series, and rasters.

spatial time series

We will create a stars object from the space/time matrix in air:

a = air[,sel]
dim(a)
## space  time 
##    70   501
library(units)
## udunits system database from /usr/share/xml/udunits
(a.st = st_as_stars(list(PM10 = set_units(air[,sel], ppm))))
## stars object with 2 dimensions and 1 attribute
## attribute(s):
##    PM10 [ppm]     
##  Min.   :  1.125  
##  1st Qu.: 11.350  
##  Median : 17.542  
##  Mean   : 21.323  
##  3rd Qu.: 26.583  
##  Max.   :274.333  
##  NA's   :10901    
## dimension(s):
##       from  to offset delta refsys point values
## space    1  70     NA    NA     NA    NA   NULL
## time     1 501     NA    NA     NA    NA   NULL

however, this has still no knowledge of space and time dimensions, and reference systems. We can add this:

crs = 32632 # UTM zone 32N
a.st %>% 
  st_set_dimensions(1, values = st_as_sfc(stations)) %>% 
  st_set_dimensions(2, values = dates[sel]) %>% 
  st_transform(crs) -> a.st2
## Loading required package: sp
a.st2
## stars object with 2 dimensions and 1 attribute
## attribute(s):
##    PM10 [ppm]     
##  Min.   :  1.125  
##  1st Qu.: 11.350  
##  Median : 17.542  
##  Mean   : 21.323  
##  3rd Qu.: 26.583  
##  Max.   :274.333  
##  NA's   :10901    
## dimension(s):
##      from  to     offset  delta                       refsys point
## sfc     1  70         NA     NA +proj=utm +zone=32 +datum...  TRUE
## time    1 501 2002-12-05 1 days                         Date    NA
##                                                     values
## sfc  POINT (538708.6 5947030),...,POINT (532512.2 5454307)
## time                                                  NULL
st_bbox(a.st2)
##      xmin      ymin      xmax      ymax 
##  307809.3 5295751.9  907374.8 6086661.1
st_crs(a.st2)
## Coordinate Reference System:
##   EPSG: 32632 
##   proj4string: "+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs"

vector and raster data cubes: what are they?

  • They are annotated array data.
  • Arrays map from \(n\) dimensions to \(p\) attributes: \[(D_1,D_2,...,D_n)\rightarrow (A_1,A_2,...,A_p)\]
  • array dimensions are enumerated sets, and may
    • represent a set of entities (cities, persons, species, spectral bands)
    • discretize a continuous variable, such as
      • spatial dimension,
      • time,
      • time-of-day,
      • frequency or wavelength,
      • duration (e.g., time-to-forecast)

“Cube”:

We use the word cube short for

  • “real” cube: three dimensions
  • more than three dimensions: hypercube
  • two dimensions: matrix, raster

a special case is:

  • one dimensional cube: table with records (data.frame, tibble)

“Spatial” data cube

Cube dimensions can refer to spatial dimensions in several ways:

  • each continuous spatial dimension (x, y, z) maps to a single cube dimension (raster data cubes), e.g. regular grids
  • \(c(i) = o + (i-1) \times \delta, \ \ i = 1,2,...,n\)
  • index space is continuous: integer \(i\) implies \([i,i+1)\)
  • this means that every coordinate value maps to a unique index (unlike polygons)

  • all spatial dimensions map to a single cube dimension (vector data cubes)
    • cube dimension is a set of points/lines/polygons
  • combinations of these: e.g. origin-destination matrices


knitr::include_graphics('cube1.png')


knitr::include_graphics('cube2.png')


knitr::include_graphics('cube3.png')


knitr::include_graphics('cube4.png')

geostatistics with sf and stars (10’)

For the following to run well, we need stars (>= 0.3-2) installed from github:

remotes::install_github("r-spatial/stars")

We can interpolate these spatial time series, when we have a target grid, e.g. created by

DE_NUTS1 %>% st_as_sfc() %>% st_transform(crs) -> de # DE_NUTS1 is part of the "air" datasets
grd = st_as_stars(de)
grd[[1]][grd[[1]] == 0] = NA
plot(grd, axes = TRUE)

We will work with temporal means, rather than full space/time variability, and fit a variogram to the temporal mean PM10 values:

library(gstat)
st_apply(a.st2, "sfc", mean, na.rm = TRUE) %>% 
    st_as_sf() %>%
    na.omit()  -> a.means
v = variogram(mean ~ 1, a.means)
v.fit = fit.variogram(v, vgm(10, "Exp", 1e5, 10))
plot(v, v.fit)

Then, we can use this for kriging:

int <- krige(mean~1, a.means, grd, v.fit)
## [using ordinary kriging]
plot(int, reset = FALSE, key.pos = 4, breaks = "pretty")
plot(de, col = NA, border = 'red', add = TRUE)
plot(st_geometry(a.means), col = 'green', add = TRUE, pch = 16)

A ggplot2 plot of the same is obtained by

library(viridis)
## Loading required package: viridisLite
g = ggplot() + coord_equal() +
    scale_fill_viridis() +
    theme_void() +
    scale_x_discrete(expand=c(0,0)) +
    scale_y_discrete(expand=c(0,0))
g + geom_stars(data = int) + geom_sf(data = de, fill = NA) + geom_sf(data = a.means)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

It is likely that geom_stars() will be developed further, to take some of the defaults now being set for creating object g.

A further worked out space/time case study on this type of data is found in the geostatistics/interpolation chapter of (Pebesma and Bivand forthcoming ).

time series rasters

x = c(
"avhrr-only-v2.19810901.nc",
"avhrr-only-v2.19810902.nc",
"avhrr-only-v2.19810903.nc",
"avhrr-only-v2.19810904.nc",
"avhrr-only-v2.19810905.nc",
"avhrr-only-v2.19810906.nc",
"avhrr-only-v2.19810907.nc",
"avhrr-only-v2.19810908.nc",
"avhrr-only-v2.19810909.nc"
)
# see the second vignette:
# install.packages("starsdata", repos = "http://pebesma.staff.ifgi.de", type = "source")
file_list = system.file(paste0("netcdf/", x), package = "starsdata")
(y = read_stars(file_list, quiet = TRUE))
## stars object with 4 dimensions and 4 attributes
## attribute(s), summary of first 1e+05 cells:
##    sst [°*C]       anom [°*C]      err [°*C]     ice [percent]  
##  Min.   :-1.80   Min.   :-4.69   Min.   :0.110   Min.   :0.010  
##  1st Qu.:-1.19   1st Qu.:-0.06   1st Qu.:0.300   1st Qu.:0.730  
##  Median :-1.05   Median : 0.52   Median :0.300   Median :0.830  
##  Mean   :-0.32   Mean   : 0.23   Mean   :0.295   Mean   :0.766  
##  3rd Qu.:-0.20   3rd Qu.: 0.71   3rd Qu.:0.300   3rd Qu.:0.870  
##  Max.   : 9.36   Max.   : 3.70   Max.   :0.480   Max.   :1.000  
##  NA's   :13360   NA's   :13360   NA's   :13360   NA's   :27377  
## dimension(s):
##      from   to         offset  delta  refsys point values    
## x       1 1440              0   0.25      NA    NA   NULL [x]
## y       1  720             90  -0.25      NA    NA   NULL [y]
## zlev    1    1          0 [m]     NA      NA    NA   NULL    
## time    1    9 1981-09-01 UTC 1 days POSIXct    NA   NULL
library(abind)
z <- y %>% select(sst) %>% adrop()
# g + geom_stars(data = z[1], alpha = 0.8) + facet_wrap("time")
z$sst = units::drop_units(z$sst) # otherwise, titles are messed up (issue)
plot(z)

Transforming rasters

With such a grid, and any other grid, st_transform simply creates a new curvilinear grid:

We can warp a regular grid to be a new regular grid in the existing, or to a default grid in a new target CRS, e.g. by

tif = system.file("tif/L7_ETMs.tif", package = "stars")
read_stars(tif) %>%
  slice(prec, index = 1, along = "band") %>%
  st_warp(crs = "+proj=lcc") %>%
  plot()

As an option, it can use gdalwarp, using its library interface (see sf::gdal_utils).

Curvilinear grids

Consider the following, curvilinear, grid:

prec_file = system.file("nc/test_stageiv_xyt.nc", package = "stars")
(prec = read_ncdf(prec_file, curvilinear = c("lon", "lat")))
## no 'var' specified, using Total_precipitation_surface_1_Hour_Accumulation
## other available variables:
##  time_bounds, lon, lat, time
## No projection information found in nc file. 
##  Coordinate variable units found to be degrees, 
##  assuming WGS84 Lat/Lon.
## Warning in .get_nc_dimensions(dimensions, coord_var = all_coord_var, coords
## = coords, : bounds for time seem to be reversed; reverting them
## stars object with 3 dimensions and 1 attribute
## attribute(s):
##  Total_precipitation_surface_1_Hour_Accumulation [kg/m^2]
##  Min.   :  0.000                                         
##  1st Qu.:  0.000                                         
##  Median :  0.750                                         
##  Mean   :  4.143                                         
##  3rd Qu.:  4.630                                         
##  Max.   :163.750                                         
## dimension(s):
##      from  to                  offset  delta                       refsys
## x       1  87                      NA     NA +proj=longlat +datum=WGS8...
## y       1 118                      NA     NA +proj=longlat +datum=WGS8...
## time    1  23 2001-12-31 23:00:00 UTC 0 secs                      POSIXct
##      point                         values    
## x       NA [87x118] -80.6113,...,-74.8822 [x]
## y       NA   [87x118] 32.4413,...,37.6193 [y]
## time    NA                           NULL    
## curvilinear grid
## plot(prec) ## gives error about unique breaks:
## remove NAs, zeros, and give a large number
## of breaks (used for validating in detail)
qu_0_omit = function(x, ..., n = 22) {
  x = units::drop_units(na.omit(x))
  c(0, quantile(x[x > 0], seq(0, 1, length.out = n)))
}
library(dplyr) # loads slice generic
prec_slice = slice(prec, index = 17, along = "time")
breaks = qu_0_omit(prec_slice[[1]])
plot(prec_slice, border = NA, breaks = breaks, reset = FALSE)
nc = sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"), "nc.gpkg")
plot(st_geometry(nc), add = TRUE, reset = FALSE, col = NA, border = 'red')

Essentially, such grids are plotted as simple features (small polygons) by extrapolating grid cell centeres. This will not (yet) work with grid crossing dateline or poles.

Exercises

For the exercises, it may be good to use a version of stars not yet on CRAN, installed by:

remotes::install_github("r-spatial/stars")
  1. rasterize the nc dataset above, using st_rasterize. Which attribute is retained? How could you do this for another attribute?
  2. Modify the rasterization such that you get 30 rows and 60 columns (hint: look at the exercises help of ?st_rasterize)
  3. Look up what st_crop does, and run its examples.
  4. Using the precipitation grid data shown above, find the time series with for each county in North Carolina the maximum precipitation, and plot this. Note that
    • nc has a different datum from prec, use st_transform for datum transformation
    • the time values should be hourly, but may come out differently
  5. read in the file system.file("nc/tos_O1_2001-2002.nc", package = "stars") as r
    • before plotting it, replace the array remove the units using r[[1]] = units::drop_units(r[[1]])
  6. what is the bounding box of this data set? What is the CRS, and the temporal reference system?
  7. obtain the temperature time series of this dataset at point POINT(200 10), using pt = st_sfc(st_point(c(200, 10)))

References

Herring, John R. 2011. “OpenGIS Implementation Standard for Geographic Information-Simple Feature Access-Part 1: Common Architecture.” Open Geospatial Consortium Inc, 111. http://portal.opengeospatial.org/files/?artifact_id=25355.

Pebesma, Edzer, and Roger S. Bivand. forthcoming. Spatial Data Science; Uses Cases in R. CRC. https://r-spatial.org/book/.