Meaningful Spatial Statistics: The mss Package

This vignette introduces an R package for meaningful spatial statistics,

library(mss)

The package introduces three classes for spatial data: SField for representing geostatistical data (spatial fields), SObjects for point patterns (spatial objects), and SLattice for lattice (areal) data (spatial lattices). The abbreviations follow this paper (which is under review).

SExtent: extent (window) of field or point pattern

SField and SObjects objects contain an SExtent, representing the domain or extent of the field, or the window of observation for a point pattern.

The window class defines an area:

showClass("SExtent")
## Class "SExtent" [package "mss"]
## 
## Slots:
##               
## Name:     area
## Class: Spatial
## 
## Extends: "SExtentOrNULL"

The area can be represented either by a grid, as in

library(sp)
demo(meuse, ask = FALSE, echo = FALSE)
w2 = SExtent(meuse.grid) # note the grid is passed, not the polygons

or by a polygon or set of polygons, as in

w1 = SExtent(meuse.area)

SField: geostatistical data

SFields represent continuous functions, which over a domain \(D\) have a value at every point \(s \in D\).

showClass("SField")
## Class "SField" [package "mss"]
## 
## Slots:
##                                                    
## Name:    observations         domain cellsArePoints
## Class:        Spatial  SExtentOrNULL        logical

Fields from points

Observations on fields variables are often done on irregular point locations. Such a field variable is created from a SpatialPointsDataFrame object:

sf = SField(meuse, meuse.grid)

where, here, meuse.grid defines the domain for the field. Alternatively, point observations can be gridded, and defined as a field by

sf = SField(meuse.grid, meuse.grid, cellsArePoints = TRUE)

Note that the first argument defines the point observations, and the second argument the domain: the first argument is treated as a (gridded) set of points (grid cell centres), the second as a gridded set of areas (cells).

Fields from grids, grid cells representing constant areas

We can alternatively assume that each grid cell represents an area with constant point values. Such a field is defined by:

sf = SField(meuse.grid, meuse.grid, cellsArePoints = FALSE)

This field is completely known, for every point location. Note that grid cell values are constant throughout the cell, and are not conceived as cell aggregate (e.g. average, or otherwise convoluted) values. For that case, SpatialAggregates are used.

Fields from lines and polygons

Sets of points can not only be defined by grid cells, but also through lines, or polygons. For instance, points on a contour line could form a sample of a field:

library(maptools)
## Checking rgeos availability: TRUE
library(rgeos)
## rgeos version: 0.3-15, (SVN revision (unknown))
##  GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921 
##  Linking to sp version: 1.2-1 
##  Polygon checking: TRUE
cl = ContourLines2SLDF(contourLines(as.image.SpatialGridDataFrame(
  meuse.grid["dist"])))
proj4string(cl) = CRS(proj4string(meuse.grid))
sf.lines = SField(cl, meuse.grid)

or, alternatively, as a set of polygons; in this example, a single polygon with a constant value:

pol = addAttrToGeom(meuse.area, data.frame(value = 3), FALSE)
sf.pol = SField(pol, meuse.area)

SObjects: point patterns or objects

The SObjects class can be used to represent objects in a space, where the window of observation reflects the area for which all points or objects are available:

showClass("SObjects")
## Class "SObjects" [package "mss"]
## 
## Slots:
##                                   
## Name:   observations        window
## Class:       Spatial SExtentOrNULL

We can create a point pattern e.g. for the sample locations of the meuse data set:

pts = geometry(meuse)
se.pts = SObjects(pts, meuse.area)

but also areas can indicate entities:

se.area = SObjects(meuse.area, meuse.area)

SLattice: areal data

Aggregations refer to values, measured or computed as the aggregate over a set of (point) values, e.g. the mean, maximum or minimum value. As such, aggregations need to be defined for sets of points, i.e. for lines, polygons, or grid cells:

showClass("SLattice")
## Class "SLattice" [package "mss"]
## 
## Slots:
##                    
## Name:  observations
## Class:      Spatial

Sets of points can be represented by anything but points: lines, grids, and polygons:

sa.1 = SLattice(cl)
sa.3 = SLattice(meuse.grid)
sa.2 = SLattice(pol)

Three interpretations of a spatial grid

Spatial grid data can be conceived as having values for 1. points, at grid cell centres 2. constant values throughout each grid cell 3. aggregated values over the grid cell region The difference is illustrated by querying each at two points, one coinciding with a grid cell centre, one not coinciding. The two points are indicated by red circles:

xy = rbind(coordinates(meuse)[1,], coordinates(meuse.grid)[10,])
pts = SpatialPoints(xy, CRS(proj4string(meuse.grid)))
plot(as(meuse.grid[1:21,], "SpatialPolygons"))
plot(pts, col = 'red', pch = 16, add = TRUE)
## Error in plot.xy(xy.coords(x, y), type = type, ...): plot.new has not been called yet
plot(meuse.grid[1:21,], add = TRUE)
## Error in segments(p[, 1], p[, 2], p[, 3], p[, 4], col = col, lty = lty, : plot.new has not been called yet
text(coordinates(pts), c("1", "2"), pos = 2)
## Error in text.default(coordinates(pts), c("1", "2"), pos = 2): plot.new has not been called yet

Querying a spatial field where cells are points gives non-missing values only when the point coincides with the grid cell centre:

f1 = SField(meuse.grid, meuse.area, cellsArePoints = TRUE)
pts = SField(pts, meuse.grid)
over(pts, f1)
##   part.a part.b      dist soil ffreq
## 1     NA     NA        NA <NA>  <NA>
## 2      1      0 0.0122243    1     1

Querying a spatial field where cells are constant valued areas gives the grid cell values of the cells where the point is in:

f2 = SField(meuse.grid, meuse.area, cellsArePoints = FALSE)
over(pts, f2) # matches pts to the complete grid cell
##   part.a part.b       dist soil ffreq
## 1      1      0 0.00135803    1     1
## 2      1      0 0.01222430    1     1

Querying a grid where cell values are aggregations does not answer and generates a warning:

a = SLattice(meuse.grid)
over(pts, a)
## Warning: deriving field values from aggregations is not considered
## meaningful
## NULL

Methods for fields: interpolation

point-to-point interpolation

We can predict to points, laid out regularly, inside the polygon of the field domain:

m = SField(meuse, meuse.area)
v = gstat::vgm(.6, "Sph", 900, .05) # variogram model
i1 = interpolate(log(zinc)~1, m, model = v)
## [using ordinary kriging]
spplot(i1, "var1.pred")

plot of chunk unnamed-chunk-20

To show that these really concern points, plotting for a smaller region gives actual point symbols:

spplot(i1[1:300,], "var1.pred")

plot of chunk unnamed-chunk-21

Alternatively, we can predict values on a pre-defined grid, taken from the domain:

m = SField(meuse, meuse.grid)
i2 = interpolate(log(zinc)~1, m, model = v)
## [using ordinary kriging]
print(as(i2[1:3,], "data.frame"), digits = 10) # point support
##     var1.pred     var1.var      x      y
## 1 6.501248333 0.3222480995 181180 333740
## 2 6.624055036 0.2534788321 181140 333700
## 3 6.506542742 0.2747544736 181180 333700

point-to-area interpolation (block kriging)

From point values, we can predict the polygon area mean value:

library(rgeos)
m3 = SLattice(meuse.area)
i3 = interpolate(log(zinc)~1, m, m3, model = v) # interpolate from point TO polygon support
## [using ordinary kriging]
i3$var1.pred # a single value, estimate of mean for the whole area
## [1] 5.716157
mean(log(m$zinc)) # sample mean
## [1] 5.885776
mean(i1$var1.pred) # mean of predictions, similar to i2$var1.pred
## [1] 5.706189

or predict values for grid cell averages (block kriging):

m4 = SLattice(meuse.grid)
i4 = interpolate(log(zinc)~1, m, m4, model = v) # interpolate from point TO feature (cell) support
## [using ordinary kriging]
print(as(i4[1:3,], "data.frame"), digits = 10) # grid cells support
##     var1.pred     var1.var      x      y
## 1 6.500767419 0.2521655376 181180 333740
## 2 6.623255277 0.1837112271 181140 333700
## 3 6.505998695 0.2048321051 181180 333700
spplot(i4, "var1.pred")

plot of chunk unnamed-chunk-24

Support: area-to-point and area-to-area kriging

Suppose we have grid data on a 3 x 3 grid, and want to estimate on a new location, indicated by a red circle:

set.seed(131)
gt = GridTopology(cellcentre.offset = c(0,0), cellsize = c(1,1), cells.dim = c(3,3))
sgdf = SpatialGridDataFrame(SpatialGrid(gt), 
    data.frame(r = round(runif(9, max = 10), 1)))
plot(as(sgdf, "SpatialPolygons"), axes = TRUE)
text(coordinates(sgdf), labels = sgdf[[1]])
## Error in text.default(coordinates(sgdf), labels = sgdf[[1]]): plot.new has not been called yet
pt.red = SpatialPoints(cbind(0.25, 0.25))
points(pt.red, col = 'red', pch = 1)
## Error in plot.xy(xy.coords(x, y), type = type, ...): plot.new has not been called yet

The first case is when the grid cell data represent point support values at the grid cell centre, which is identical to case i1 above:

sf = SField(as(sgdf, "SpatialPointsDataFrame"), sgdf)
vm = gstat::vgm(1, "Exp", 1)
i5 = interpolate(r~1, sf, SField(pt.red, sgdf), model = vm)
## [using ordinary kriging]
as(i5[1,], "data.frame")
##           coords.x1 coords.x2 var1.pred  var1.var
## coords.x1      0.25      0.25  4.741938 0.4233456

The second case, grid cells represent constant area with point support value, meaning that every point location within a grid cell has the value of the grid cell assigned:

sf = SField(sgdf, sgdf, cellsArePoints = FALSE)
i6 = interpolate(r~1, sf, SField(pt.red, sgdf), model = vm)
as(i6[1,], "data.frame") # does not interpolate, but queries grid cell
##     r coords.x1 coords.x2
## 1 5.2      0.25      0.25

as case i4 above, from point values we can predict a mean value for a grid cell by

sf = SField(sgdf, sgdf, cellsArePoints = TRUE)
gt = GridTopology(cellcentre.offset = c(.25,.25), cellsize = c(1,1), cells.dim = c(1,1))
grd.red = SpatialGrid(gt)
i7 = interpolate(r~1, sf, SLattice(grd.red), model = vm)
## [using ordinary kriging]
i7@observations@data
##   var1.pred  var1.var
## 1  4.609252 0.1198707

Area-to-point kriging

sp = as(sgdf, "SpatialPolygonsDataFrame")
gf = SLattice(sp)
kr = interpolate(r~1, gf, SField(pt.red, sp), model = vm)
kr@observations[[1]]
## [1] 5.356593
library(gstat)
krige0(r ~ 1, sp, pt.red, vgmArea, vgm = vm) # check
##       [,1]
## 1 5.356593

Area-to-area (grid-to-grid) kriging

sp = as(sgdf, "SpatialPolygonsDataFrame")
gr = as(grd.red, "SpatialPolygons")
grd = addAttrToGeom(gr, data.frame(r=0), FALSE)
plot(as(sgdf, "SpatialPolygons"), axes = TRUE)
text(coordinates(sgdf), labels = sgdf[[1]])
## Error in text.default(coordinates(sgdf), labels = sgdf[[1]]): plot.new has not been called yet
plot(grd, add = TRUE, border = 'red')
## Error in polypath(x = mcrds[, 1], y = mcrds[, 2], border = border, col = col, : plot.new has not been called yet
kr = interpolate(r~1, gf, SLattice(grd), model = vm)
kr@observations[[1]]
## [1] 5.029828
krige0(r ~ 1, sp, grd, vgmArea, vgm = vm) # check
##       [,1]
## 1 5.029828

Estimating density from entitities

For SpatialEntity objects, we can estimate the density of objects; this method reuses MASS:kde2d:

e = SObjects(meuse, meuse.area)
d = density(e, newdata = meuse.grid)
lt = list(list("sp.polygons", meuse.area, border = 'black',
    first=FALSE), list("sp.points", meuse, col = grey(.5), pch = 16))
class(d)
## [1] "SLattice"
## attr(,"package")
## [1] "mss"
spplot(d, sp.layout = lt)

plot of chunk unnamed-chunk-31

The reason why this returns a SLattice is that the values computed (densities) do not correspond to quantities that are measurable at point locations where they are registered (grid cells). In fact, they are aggregated values for areas even much larger than the grid cells.

Spatial Aggregation

for SField data

m = SField(meuse["zinc"], meuse.area)
a = aggregate(m, SLattice(meuse.area), mean) # no warning
a[[1]]
## [1] 469.7161
a = aggregate(m, SLattice(meuse.area), sum) # warns:
## Warning: for SField objects, aggregation using a sum function is not
## considered meaningful
a[[1]]
## [1] 72806

for SpatialEntity data

m = SObjects(meuse["zinc"], meuse.area)
a = aggregate(m, SLattice(meuse.area), mean) # warns:
## Warning: for SObjects objects, aggregation using a non-sum function may not
## be meaningful
a[[1]]
## [1] 469.7161
a = aggregate(m, SLattice(meuse.area), sum) # no warning
a[[1]]
## [1] 72806

Meaningful warnings

Pebesma et al. (2014) make assertions A1-A7.

A1.

A prediction is meaningful, if it provides an estimate for a potential observation Prediction on a field variable generates no warning.

Where prediction of a SField varialbe generates no warning,

m = SField(meuse, meuse.area) # data are point support
res = interpolate(log(zinc)~1, m) # interpolate from point TO point support
## [inverse distance weighted interpolation]

prediction of a point pattern variable generates a warning:

m = SObjects(meuse, meuse.area)
sa = SLattice(meuse.grid)
res = interpolate(log(zinc)~1, m, sa) # interpolate point pattern variable: warns
## Warning: interpolation of SObjects is not considered meaningful

A2.

Summing up values over an area is meaningful, if the observed window corresponds to the target geometry (grouping predicate) of the aggregation.

For SpatialEntity data, spatial aggregation over an area larger than the observation window generates a warning:

x = bbox(meuse)[1,]
y = bbox(meuse)[2,]
bb = Polygon(cbind(c(x[1],x[1],x[2],x[2],x[1]), c(y[1],y[2],y[2],y[1],y[1])))
bbx = SpatialPolygons(list(Polygons(list(bb),"bbox_meuse")),
    proj4string = CRS(proj4string(meuse)))
m = SObjects(meuse["zinc"], meuse.area)
a = aggregate(m, SLattice(bbx), sum) # warns:
## Warning: aggregation over an area larger than the observation window is not
## considered meaningful

Stasch et al. 2014 prove A1 and A2 more formally.

Further assertions in the poster:

A3.

Polygon data may reflect constant values over areas (geology) or means (population density), this matters for (further) aggregation or downsampling.

Package mss distinguishes between SField and SLattice to represent constant values, and aggregations, respectively. Subsampling SField variabls does not not warn,

area = addAttrToGeom(meuse.area, data.frame(v=1), match.ID = FALSE)
f = SField(area, meuse.area)
pts = SField(meuse[1:5,], meuse.area)
over(pts, f) # retrieve f at the points of pts:
##   v
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1

but subsampling SLattice variables triggers a warning, and no useful value

a = SLattice(area)
over(pts, a)
## Warning: deriving field values from aggregations is not considered
## meaningful
## NULL

as it should really disaggregate first, from area to point.

A4.

The same is true for grid cells.

SField and SLattice both generalize polygon and grid data.

f1 = SField(meuse.grid, meuse.area, cellsArePoints = TRUE)
pts = SField(meuse[1:5,], meuse.area)
over(pts, f1) # matches pts to grid cell center points:
##   part.a part.b dist soil ffreq
## 1     NA     NA   NA <NA>  <NA>
## 2     NA     NA   NA <NA>  <NA>
## 3     NA     NA   NA <NA>  <NA>
## 4     NA     NA   NA <NA>  <NA>
## 5     NA     NA   NA <NA>  <NA>
f2 = SField(meuse.grid, meuse.area, cellsArePoints = FALSE)
over(pts, f2) # matches pts to the complete grid cell
##   part.a part.b       dist soil ffreq
## 1      1      0 0.00135803    1     1
## 2      1      0 0.01222430    1     1
## 3      1      0 0.10302900    1     1
## 4      1      0 0.19009400    2     1
## 5      1      0 0.27709000    2     1
a = SLattice(meuse.grid)
over(pts, a)
## Warning: deriving field values from aggregations is not considered
## meaningful
## NULL

A5.

Point pattern data often come without observation window, over which aggregation is meaningful

Attempting to create a SObjects object without an observation window specified gives a warning message:

pts <- SObjects(meuse)
## Warning: window set to the observation data features

Providing a window that does not contain all the points gives an error:

tr = try(pts <- SObjects(meuse, meuse[1:10,]))
attr(tr, "condition")
## <simpleError in validityMethod(object): one or more features are outside the observation window>

A6.

Field measurements often come without a notion for which locations interpolations based on them make sense.

m = SField(meuse["zinc"], meuse.area)
a = interpolate(log(zinc)~1, m, SLattice(bbx)) # warns:
## Warning: interpolating over an area larger than the domain is not
## considered meaningful
## [inverse distance weighted interpolation]

A7.

File types do not inform on meaningfulness

SpatialPoints, SpatialPixels, SpatialGrid, SpatialLines, SpatialPolygons and their *DataFrame counterparts do not inform whether data concern fields (geostatistical data), entities (point patterns), or aggregations. The classes in mss: SField, SObjects, and SLattice, do.

References

  1. E. Pebesma, C. Stasch, B. Graeler and S. Scheider, 2014. Meaningfully Integrating Big Earth Science Data; AGU fall meeting, poster IN33A-3757 pdf
  2. Stasch, C., S. Scheider, E. Pebesma, W. Kuhn, 2014. Meaningful Spatial Prediction and Aggregation. Environmental Modelling & Software, 51, 149-165 open access.