mss
PackageThis vignette introduces an R package for meaningful spatial statistics,
library(mss)
The package introduces three classes for spatial data: SField
for representing geostatistical data (spatial fields), SObjects
for point patterns (spatial objects), and SLattice
for lattice
(areal) data (spatial lattices). The abbreviations follow this
paper
(which is under review).
SExtent
: extent (window) of field or point patternSField
and SObjects
objects contain an SExtent
, representing
the domain or extent of the field, or the window of observation
for a point pattern.
The window class defines an area:
showClass("SExtent")
## Class "SExtent" [package "mss"]
##
## Slots:
##
## Name: area
## Class: Spatial
##
## Extends: "SExtentOrNULL"
The area can be represented either by a grid, as in
library(sp)
demo(meuse, ask = FALSE, echo = FALSE)
w2 = SExtent(meuse.grid) # note the grid is passed, not the polygons
or by a polygon or set of polygons, as in
w1 = SExtent(meuse.area)
SField
: geostatistical dataSFields represent continuous functions, which over a domain \(D\) have a value at every point \(s \in D\).
showClass("SField")
## Class "SField" [package "mss"]
##
## Slots:
##
## Name: observations domain cellsArePoints
## Class: Spatial SExtentOrNULL logical
Observations on fields variables are often done on irregular
point locations. Such a field variable is created from a
SpatialPointsDataFrame
object:
sf = SField(meuse, meuse.grid)
where, here, meuse.grid
defines the domain for the
field. Alternatively, point observations can be gridded, and
defined as a field by
sf = SField(meuse.grid, meuse.grid, cellsArePoints = TRUE)
Note that the first argument defines the point observations, and the second argument the domain: the first argument is treated as a (gridded) set of points (grid cell centres), the second as a gridded set of areas (cells).
We can alternatively assume that each grid cell represents an area with constant point values. Such a field is defined by:
sf = SField(meuse.grid, meuse.grid, cellsArePoints = FALSE)
This field is completely known, for every point location. Note
that grid cell values are constant throughout the cell, and are
not conceived as cell aggregate (e.g. average, or otherwise
convoluted) values. For that case, SpatialAggregates
are used.
Sets of points can not only be defined by grid cells, but also through lines, or polygons. For instance, points on a contour line could form a sample of a field:
library(maptools)
## Checking rgeos availability: TRUE
library(rgeos)
## rgeos version: 0.3-15, (SVN revision (unknown))
## GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921
## Linking to sp version: 1.2-1
## Polygon checking: TRUE
cl = ContourLines2SLDF(contourLines(as.image.SpatialGridDataFrame(
meuse.grid["dist"])))
proj4string(cl) = CRS(proj4string(meuse.grid))
sf.lines = SField(cl, meuse.grid)
or, alternatively, as a set of polygons; in this example, a single polygon with a constant value:
pol = addAttrToGeom(meuse.area, data.frame(value = 3), FALSE)
sf.pol = SField(pol, meuse.area)
SObjects
: point patterns or objectsThe SObjects
class can be used to represent objects in a space,
where the window of observation reflects the area for which all
points or objects are available:
showClass("SObjects")
## Class "SObjects" [package "mss"]
##
## Slots:
##
## Name: observations window
## Class: Spatial SExtentOrNULL
We can create a point pattern e.g. for the sample locations of the meuse
data set:
pts = geometry(meuse)
se.pts = SObjects(pts, meuse.area)
but also areas can indicate entities:
se.area = SObjects(meuse.area, meuse.area)
SLattice
: areal dataAggregations refer to values, measured or computed as the aggregate over a set of (point) values, e.g. the mean, maximum or minimum value. As such, aggregations need to be defined for sets of points, i.e. for lines, polygons, or grid cells:
showClass("SLattice")
## Class "SLattice" [package "mss"]
##
## Slots:
##
## Name: observations
## Class: Spatial
Sets of points can be represented by anything but points: lines, grids, and polygons:
sa.1 = SLattice(cl)
sa.3 = SLattice(meuse.grid)
sa.2 = SLattice(pol)
Spatial grid data can be conceived as having values for 1. points, at grid cell centres 2. constant values throughout each grid cell 3. aggregated values over the grid cell region The difference is illustrated by querying each at two points, one coinciding with a grid cell centre, one not coinciding. The two points are indicated by red circles:
xy = rbind(coordinates(meuse)[1,], coordinates(meuse.grid)[10,])
pts = SpatialPoints(xy, CRS(proj4string(meuse.grid)))
plot(as(meuse.grid[1:21,], "SpatialPolygons"))
plot(pts, col = 'red', pch = 16, add = TRUE)
## Error in plot.xy(xy.coords(x, y), type = type, ...): plot.new has not been called yet
plot(meuse.grid[1:21,], add = TRUE)
## Error in segments(p[, 1], p[, 2], p[, 3], p[, 4], col = col, lty = lty, : plot.new has not been called yet
text(coordinates(pts), c("1", "2"), pos = 2)
## Error in text.default(coordinates(pts), c("1", "2"), pos = 2): plot.new has not been called yet
Querying a spatial field where cells are points gives non-missing values only when the point coincides with the grid cell centre:
f1 = SField(meuse.grid, meuse.area, cellsArePoints = TRUE)
pts = SField(pts, meuse.grid)
over(pts, f1)
## part.a part.b dist soil ffreq
## 1 NA NA NA <NA> <NA>
## 2 1 0 0.0122243 1 1
Querying a spatial field where cells are constant valued areas gives the grid cell values of the cells where the point is in:
f2 = SField(meuse.grid, meuse.area, cellsArePoints = FALSE)
over(pts, f2) # matches pts to the complete grid cell
## part.a part.b dist soil ffreq
## 1 1 0 0.00135803 1 1
## 2 1 0 0.01222430 1 1
Querying a grid where cell values are aggregations does not answer and generates a warning:
a = SLattice(meuse.grid)
over(pts, a)
## Warning: deriving field values from aggregations is not considered
## meaningful
## NULL
We can predict to points, laid out regularly, inside the polygon of the field domain:
m = SField(meuse, meuse.area)
v = gstat::vgm(.6, "Sph", 900, .05) # variogram model
i1 = interpolate(log(zinc)~1, m, model = v)
## [using ordinary kriging]
spplot(i1, "var1.pred")
To show that these really concern points, plotting for a smaller region gives actual point symbols:
spplot(i1[1:300,], "var1.pred")
Alternatively, we can predict values on a pre-defined grid, taken from the domain:
m = SField(meuse, meuse.grid)
i2 = interpolate(log(zinc)~1, m, model = v)
## [using ordinary kriging]
print(as(i2[1:3,], "data.frame"), digits = 10) # point support
## var1.pred var1.var x y
## 1 6.501248333 0.3222480995 181180 333740
## 2 6.624055036 0.2534788321 181140 333700
## 3 6.506542742 0.2747544736 181180 333700
From point values, we can predict the polygon area mean value:
library(rgeos)
m3 = SLattice(meuse.area)
i3 = interpolate(log(zinc)~1, m, m3, model = v) # interpolate from point TO polygon support
## [using ordinary kriging]
i3$var1.pred # a single value, estimate of mean for the whole area
## [1] 5.716157
mean(log(m$zinc)) # sample mean
## [1] 5.885776
mean(i1$var1.pred) # mean of predictions, similar to i2$var1.pred
## [1] 5.706189
or predict values for grid cell averages (block kriging):
m4 = SLattice(meuse.grid)
i4 = interpolate(log(zinc)~1, m, m4, model = v) # interpolate from point TO feature (cell) support
## [using ordinary kriging]
print(as(i4[1:3,], "data.frame"), digits = 10) # grid cells support
## var1.pred var1.var x y
## 1 6.500767419 0.2521655376 181180 333740
## 2 6.623255277 0.1837112271 181140 333700
## 3 6.505998695 0.2048321051 181180 333700
spplot(i4, "var1.pred")
Suppose we have grid data on a 3 x 3 grid, and want to estimate on a new location, indicated by a red circle:
set.seed(131)
gt = GridTopology(cellcentre.offset = c(0,0), cellsize = c(1,1), cells.dim = c(3,3))
sgdf = SpatialGridDataFrame(SpatialGrid(gt),
data.frame(r = round(runif(9, max = 10), 1)))
plot(as(sgdf, "SpatialPolygons"), axes = TRUE)
text(coordinates(sgdf), labels = sgdf[[1]])
## Error in text.default(coordinates(sgdf), labels = sgdf[[1]]): plot.new has not been called yet
pt.red = SpatialPoints(cbind(0.25, 0.25))
points(pt.red, col = 'red', pch = 1)
## Error in plot.xy(xy.coords(x, y), type = type, ...): plot.new has not been called yet
The first case is when the grid cell data represent
point support values at the grid cell centre, which
is identical to case i1
above:
sf = SField(as(sgdf, "SpatialPointsDataFrame"), sgdf)
vm = gstat::vgm(1, "Exp", 1)
i5 = interpolate(r~1, sf, SField(pt.red, sgdf), model = vm)
## [using ordinary kriging]
as(i5[1,], "data.frame")
## coords.x1 coords.x2 var1.pred var1.var
## coords.x1 0.25 0.25 4.741938 0.4233456
The second case, grid cells represent constant area with point support value, meaning that every point location within a grid cell has the value of the grid cell assigned:
sf = SField(sgdf, sgdf, cellsArePoints = FALSE)
i6 = interpolate(r~1, sf, SField(pt.red, sgdf), model = vm)
as(i6[1,], "data.frame") # does not interpolate, but queries grid cell
## r coords.x1 coords.x2
## 1 5.2 0.25 0.25
as case i4
above, from point values we can predict a mean value for a grid cell by
sf = SField(sgdf, sgdf, cellsArePoints = TRUE)
gt = GridTopology(cellcentre.offset = c(.25,.25), cellsize = c(1,1), cells.dim = c(1,1))
grd.red = SpatialGrid(gt)
i7 = interpolate(r~1, sf, SLattice(grd.red), model = vm)
## [using ordinary kriging]
i7@observations@data
## var1.pred var1.var
## 1 4.609252 0.1198707
sp = as(sgdf, "SpatialPolygonsDataFrame")
gf = SLattice(sp)
kr = interpolate(r~1, gf, SField(pt.red, sp), model = vm)
kr@observations[[1]]
## [1] 5.356593
library(gstat)
krige0(r ~ 1, sp, pt.red, vgmArea, vgm = vm) # check
## [,1]
## 1 5.356593
sp = as(sgdf, "SpatialPolygonsDataFrame")
gr = as(grd.red, "SpatialPolygons")
grd = addAttrToGeom(gr, data.frame(r=0), FALSE)
plot(as(sgdf, "SpatialPolygons"), axes = TRUE)
text(coordinates(sgdf), labels = sgdf[[1]])
## Error in text.default(coordinates(sgdf), labels = sgdf[[1]]): plot.new has not been called yet
plot(grd, add = TRUE, border = 'red')
## Error in polypath(x = mcrds[, 1], y = mcrds[, 2], border = border, col = col, : plot.new has not been called yet
kr = interpolate(r~1, gf, SLattice(grd), model = vm)
kr@observations[[1]]
## [1] 5.029828
krige0(r ~ 1, sp, grd, vgmArea, vgm = vm) # check
## [,1]
## 1 5.029828
For SpatialEntity
objects, we can estimate the density of objects; this
method reuses MASS:kde2d
:
e = SObjects(meuse, meuse.area)
d = density(e, newdata = meuse.grid)
lt = list(list("sp.polygons", meuse.area, border = 'black',
first=FALSE), list("sp.points", meuse, col = grey(.5), pch = 16))
class(d)
## [1] "SLattice"
## attr(,"package")
## [1] "mss"
spplot(d, sp.layout = lt)
The reason why this returns a SLattice
is that the
values computed (densities) do not correspond to quantities that are
measurable at point locations where they are registered (grid cells).
In fact, they are aggregated values for areas even much larger than
the grid cells.
SField
datam = SField(meuse["zinc"], meuse.area)
a = aggregate(m, SLattice(meuse.area), mean) # no warning
a[[1]]
## [1] 469.7161
a = aggregate(m, SLattice(meuse.area), sum) # warns:
## Warning: for SField objects, aggregation using a sum function is not
## considered meaningful
a[[1]]
## [1] 72806
SpatialEntity
datam = SObjects(meuse["zinc"], meuse.area)
a = aggregate(m, SLattice(meuse.area), mean) # warns:
## Warning: for SObjects objects, aggregation using a non-sum function may not
## be meaningful
a[[1]]
## [1] 469.7161
a = aggregate(m, SLattice(meuse.area), sum) # no warning
a[[1]]
## [1] 72806
Pebesma et al. (2014) make assertions A1-A7.
A prediction is meaningful, if it provides an estimate for a potential observation Prediction on a field variable generates no warning.
Where prediction of a SField
varialbe generates no warning,
m = SField(meuse, meuse.area) # data are point support
res = interpolate(log(zinc)~1, m) # interpolate from point TO point support
## [inverse distance weighted interpolation]
prediction of a point pattern variable generates a warning:
m = SObjects(meuse, meuse.area)
sa = SLattice(meuse.grid)
res = interpolate(log(zinc)~1, m, sa) # interpolate point pattern variable: warns
## Warning: interpolation of SObjects is not considered meaningful
Summing up values over an area is meaningful, if the observed window corresponds to the target geometry (grouping predicate) of the aggregation.
For SpatialEntity
data, spatial aggregation over an area larger than the
observation window generates a warning:
x = bbox(meuse)[1,]
y = bbox(meuse)[2,]
bb = Polygon(cbind(c(x[1],x[1],x[2],x[2],x[1]), c(y[1],y[2],y[2],y[1],y[1])))
bbx = SpatialPolygons(list(Polygons(list(bb),"bbox_meuse")),
proj4string = CRS(proj4string(meuse)))
m = SObjects(meuse["zinc"], meuse.area)
a = aggregate(m, SLattice(bbx), sum) # warns:
## Warning: aggregation over an area larger than the observation window is not
## considered meaningful
Stasch et al. 2014 prove A1 and A2 more formally.
Further assertions in the poster:
Polygon data may reflect constant values over areas (geology) or means (population density), this matters for (further) aggregation or downsampling.
Package mss
distinguishes between SField
and
SLattice
to represent constant values, and aggregations,
respectively. Subsampling SField
variabls does not not warn,
area = addAttrToGeom(meuse.area, data.frame(v=1), match.ID = FALSE)
f = SField(area, meuse.area)
pts = SField(meuse[1:5,], meuse.area)
over(pts, f) # retrieve f at the points of pts:
## v
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1
but subsampling SLattice
variables triggers a warning, and no
useful value
a = SLattice(area)
over(pts, a)
## Warning: deriving field values from aggregations is not considered
## meaningful
## NULL
as it should really disaggregate first, from area to point.
The same is true for grid cells.
SField
and SLattice
both generalize polygon and grid data.
f1 = SField(meuse.grid, meuse.area, cellsArePoints = TRUE)
pts = SField(meuse[1:5,], meuse.area)
over(pts, f1) # matches pts to grid cell center points:
## part.a part.b dist soil ffreq
## 1 NA NA NA <NA> <NA>
## 2 NA NA NA <NA> <NA>
## 3 NA NA NA <NA> <NA>
## 4 NA NA NA <NA> <NA>
## 5 NA NA NA <NA> <NA>
f2 = SField(meuse.grid, meuse.area, cellsArePoints = FALSE)
over(pts, f2) # matches pts to the complete grid cell
## part.a part.b dist soil ffreq
## 1 1 0 0.00135803 1 1
## 2 1 0 0.01222430 1 1
## 3 1 0 0.10302900 1 1
## 4 1 0 0.19009400 2 1
## 5 1 0 0.27709000 2 1
a = SLattice(meuse.grid)
over(pts, a)
## Warning: deriving field values from aggregations is not considered
## meaningful
## NULL
Point pattern data often come without observation window, over which aggregation is meaningful
Attempting to create a SObjects
object without an observation window specified gives
a warning message:
pts <- SObjects(meuse)
## Warning: window set to the observation data features
Providing a window that does not contain all the points gives an error:
tr = try(pts <- SObjects(meuse, meuse[1:10,]))
attr(tr, "condition")
## <simpleError in validityMethod(object): one or more features are outside the observation window>
Field measurements often come without a notion for which locations interpolations based on them make sense.
m = SField(meuse["zinc"], meuse.area)
a = interpolate(log(zinc)~1, m, SLattice(bbx)) # warns:
## Warning: interpolating over an area larger than the domain is not
## considered meaningful
## [inverse distance weighted interpolation]
File types do not inform on meaningfulness
SpatialPoints
, SpatialPixels
, SpatialGrid
, SpatialLines
,
SpatialPolygons
and their *DataFrame
counterparts do not inform
whether data concern fields (geostatistical data), entities (point
patterns), or aggregations. The classes in mss
: SField
,
SObjects
, and SLattice
, do.