In the following,
# install.packages("starsdata", repos = "http://pebesma.staff.ifgi.de", type = "source")
library(stars)
## Loading required package: abind
## Loading required package: sf
## Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
granule = system.file("sentinel/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip",
package = "starsdata")
s2 = paste0("SENTINEL2_L1C:/vsizip/", granule,
"/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.SAFE/MTD_MSIL1C.xml:10m:EPSG_32632")
(p = read_stars(s2, proxy = TRUE))
## stars_proxy object with 1 attribute in file:
## $`MTD_MSIL1C.xml:10m:EPSG_32632`
## [1] "SENTINEL2_L1C:/vsizip//home/edzer/R/x86_64-pc-linux-gnu-library/3.6/starsdata/sentinel/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.SAFE/MTD_MSIL1C.xml:10m:EPSG_32632"
##
## dimension(s):
## from to offset delta refsys point values
## x 1 10980 3e+05 10 +proj=utm +zone=32 +datum... NA NULL [x]
## y 1 10980 6e+06 -10 +proj=utm +zone=32 +datum... NA NULL [y]
## band 1 4 NA NA NA NA NULL
plot(p)
we see a large triangle being black. These are zero values, (wrongly!) not encoded as NODATA (or NA). Specifying them blanks them out:
(p = read_stars(s2, proxy = TRUE, NA_value = 0))
## stars_proxy object with 1 attribute in file:
## $`MTD_MSIL1C.xml:10m:EPSG_32632`
## [1] "SENTINEL2_L1C:/vsizip//home/edzer/R/x86_64-pc-linux-gnu-library/3.6/starsdata/sentinel/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.SAFE/MTD_MSIL1C.xml:10m:EPSG_32632"
##
## NA_value: 0
## dimension(s):
## from to offset delta refsys point values
## x 1 10980 3e+05 10 +proj=utm +zone=32 +datum... NA NULL [x]
## y 1 10980 6e+06 -10 +proj=utm +zone=32 +datum... NA NULL [y]
## band 1 4 NA NA NA NA NULL
plot(p)
Where do these come from? The Sentinel2 data is composed of 290 km wide swaths, bands of constant width that are not alinged to North or East. These swath observations are then regridded to a predefined 100 km x 100 km grids (10000 x 10000 pixels on the four 10 m reslution bands) in UTM zones. The tiles we get for Germany are e.g.
if (!file.exists("tiles.rda")) {
# read tile geometries from:
# https://sentinel.esa.int/web/sentinel/missions/sentinel-2/data-products
tiles = read_sf("https://sentinel.esa.int/documents/247904/1955685/S2A_OPER_GIP_TILPAR_MPC__20151209T095117_V20150622T000000_21000101T000000_B00.kml")
tiles = st_zm(st_collection_extract(tiles, "POLYGON"))
tiles$Description = NULL
tiles = aggregate(tiles, list(Name = tiles$Name), function(x) x[1])
save(tiles, file = "tiles.rda")
} else
load("tiles.rda")
DE = st_as_sf(raster::getData("GADM", country = "DE", level = 1))
# select scenes:
tiles = tiles[DE,]
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
plot(tiles[,1], reset = FALSE, col = sf.colors(categorical=TRUE, alpha=.5))
## Warning in plot.sf(tiles[, 1], reset = FALSE, col = sf.colors(categorical =
## TRUE, : col is not of length 1 or nrow(x): colors will be recycled; use pal
## to specify a color palette
plot(DE, add = TRUE, col = NA, border = 'red')
## Warning in plot.sf(DE, add = TRUE, col = NA, border = "red"): ignoring all
## but the first attribute
(Note that the tiles have 10 km overlap, everywhere)
In the case of Sentinel2, for each of these tiles we get an image roughly every 5 days. Each image is roughly 1 Gb, and contains 13 bands:
Why not exactly 5 days?
For combining tiles that do not line up (mosaic, warp) into a regular raster, package gdalcubes can be used.
See also this chapter.
Data cubes are data structures where values are arranged at the junction of two (one) or more independent dimensions. These dimensions can reflect (discretized version) of continuous variables (e.g. three space dimension, one time dimensions), but also reflect discrete variables (e.g. species, variable measured or modelled in a weather model).
Vector data (points, lines, polygons) can be encoded along a single dimension, as an enumerated set of geographical locations, linestrings, or polygons.
Examples include:
Data cubes can be sparse or dense: if they are dense, every combination of dimension values has an actual value (in the database, or the memory object); if they are sparse, only combinations for which there is a data value are stored. Systems for sparse data cubes are relatively rare, an example is SciDB.
The stars R package implements dense vector and raster data cubes. See the data model vignette (article) for an explanation of the different forms it accepts.
Satellite imagery like Sentinel2 can be seen as sparse data cubes: since the satellites follow a path, a particular observed tile has its own time stamp, and all other tiles will be empty for that time stamp. Also, collections of tiles over different UTM systems do not form a (single) regular grid in x and y.
Nevertheless, it is often easier to (be able to) treat the data as if it were a data cube. We can do this by a data cube view: for a particular x/y grid (in a particular coordinate reference system) and a sequence of time steps (e.g. monthly) we consider (in some way, by resampling) the underlying imagery, and visualise it or build models for it. This approach is taken by Google Earth Engine, and is also followed by the openEO API. The stars R package follows this design pattern but in a much simplified (pure R) approach.
Key features of this approach:
See the stars proxy vignette for examples of this pattern; the images above only read and show the pixels present in the html / on screen.
For larger areas and time periods, downloading satellite data to a local storage is a lost game; even if local storage is sufficient, network bandwidth is likely to be the limiting factor.
In terms of spatial statistics, remote sensing data analysis is used e.g.
Otherwise, typical RS analysis challenges involve