2  Point Pattern data

Learning goals

  • Understand spatial data structures used in R
  • Understand what a point pattern and a point process is
  • Understand what an observation window is
  • Get familiar with the tools used in point pattern data analysis
  • See the alignment with MaxEnt

Reading materials

From Spatial Data Science: with applications in R:

  • Chapter 11: Point Patterns
  • Chapter 7: sf and stars
Summary
  • Intro to sf and stars
  • Intro to spatstat
  • Point patterns, density functions
  • Interactions in point processes
  • Simulating point process
  • Modelling density as a function of external variables

2.1 Intro to sf and stars

  • Briefly: sf provides classes and methods for simple features
    • a feature is a “thing”, with geometrical properties (point(s), line(s), polygon(s)) and attributes
    • sf stores data in data.frames with a list-column (of class sfc) that holds the geometries
the Simple Feature standard

“Simple Feature Access” is an open standard for data with vector geometries. It defines a set of classes for geometries and operations on them.

  • “simple” refers to curves that are “simply” represented by points connected by straight lines
  • connecting lines are not allowed to self-intersect
  • polygons can have holes, and have validity constraints: holes cannot extrude the outer ring etc.
  • All spatial software uses this: ArcGIS, QGIS, PostGIS, other spatial databases, …

Why do all functions in sf start with st_?

The larger geospatial open source ecosystem

R and beyond:

Figure 2.1: sf and its dependencies; arrows indicate strong dependency, dashed arrows weak dependency
Figure 2.2: upstream dependencies on OSGEO and other C/C++ libraries in R and Python spatial data sciece packages

sf operators, how to understand?

sf has objects at three nested “levels”:

  • sfg: a single geometry (without coordinate reference system)

  • sfc: a set of sfg geometries, with a coordinate reference system and bounding box

  • sf: a data.frame or tibble with at least one geometry (sfc) column

  • Operations not involving geometry (data.frame; base R; tidyverse)

    • geometry column + sf class is sticky!
    • this can be convenient, and sometimes annoying
    • use as.data.frame or as_tibble to strip the sf class label
  • Operations involving only geometry

    • predicates (resulting TRUE/FALSE)
      • unary
      • binary: DE9-IM; work on two sets, result sgbp, which is a sparse logical matrix representation
        • is_within_distance
    • measures
      • unary: length, area
      • binary: distance, by_element = FALSE
    • transformers
      • unary: buffer, centroid
      • binary: intersection, union, difference, symdifference
      • n-ary: intersection, difference
  • Operations involving geometry and attributes

    • many of the above!
    • st_join
    • aggregate
    • st_interpolate_aw: requires expression whether variable is spatially extensive or intensive

2.2 sf and spatstat

We can try to convert an sf object to a ppp (point pattern object in spatstat):

library(sf)
# Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
library(spatstat)
# Loading required package: spatstat.data
# Loading required package: spatstat.univar
# spatstat.univar 3.0-1
# Loading required package: spatstat.geom
# spatstat.geom 3.3-3
# Loading required package: spatstat.random
# spatstat.random 3.3-2
# Loading required package: spatstat.explore
# Loading required package: nlme
# spatstat.explore 3.3-2
# Loading required package: spatstat.model
# Loading required package: rpart
# spatstat.model 3.3-2
# Loading required package: spatstat.linnet
# spatstat.linnet 3.2-2
# 
# spatstat 3.2-1 
# For an introduction to spatstat, type 'beginner'
demo(nc, echo = FALSE, ask = FALSE)
pts = st_centroid(st_geometry(nc))
as.ppp(pts) # ???
# Error: Only projected coordinates may be converted to spatstat
# class objects

Note that sf interprets a NA CRS as: flat, projected (Cartesian) space.

Why is this important?

(p1 = st_point(c(0, 0)))
# POINT (0 0)
(p2 = st_point(c(1, 0)))
# POINT (1 0)
st_distance(p1, p2)
#      [,1]
# [1,]    1
st_sfc(p1, crs = 'OGC:CRS84')
# Geometry set for 1 feature 
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: 0 ymin: 0 xmax: 0 ymax: 0
# Geodetic CRS:  WGS 84 (CRS84)
# POINT (0 0)
st_distance(st_sfc(p1, crs = 'OGC:CRS84'), st_sfc(p2, crs = 'OGC:CRS84'))
# Units: [m]
#        [,1]
# [1,] 111195
(p1 = st_point(c(0, 80)))
# POINT (0 80)
(p2 = st_point(c(1, 80)))
# POINT (1 80)
st_distance(p1, p2)
#      [,1]
# [1,]    1
st_distance(st_sfc(p1, crs = 'OGC:CRS84'), st_sfc(p2, crs = 'OGC:CRS84'))
# Units: [m]
#       [,1]
# [1,] 19309

Also areas:

p = st_as_sfc("POLYGON((0 80, 120 80, 240 80, 0 80))")
st_area(p)
# [1] 0
st_area(st_sfc(p, crs = 'OGC:CRS84')) |> units::set_units(km^2)
# 1620544 [km^2]
pole = st_as_sfc("POINT(0 90)")
st_intersects(pole, p)
# Sparse geometry binary predicate list of length 1, where the
# predicate was `intersects'
#  1: (empty)
st_intersects(st_sfc(pole, crs = 'OGC:CRS84'), st_sfc(p, crs = 'OGC:CRS84'))
# Sparse geometry binary predicate list of length 1, where the
# predicate was `intersects'
#  1: 1

What to do with nc? Project to \(R^2\) (flat space):

nc |> st_transform('EPSG:32119') |> st_centroid() -> pts
# Warning: st_centroid assumes attributes are constant over
# geometries
pts
# Simple feature collection with 100 features and 14 fields
# Attribute-geometry relationships: constant (6), aggregate (8)
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: 149000 ymin: 36500 xmax: 898000 ymax: 306000
# Projected CRS: NAD83 / North Carolina
# First 10 features:
#     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID
# 1  0.114      1.44  1825    1825        Ashe 37009  37009        5
# 2  0.061      1.23  1827    1827   Alleghany 37005  37005        3
# 3  0.143      1.63  1828    1828       Surry 37171  37171       86
# 4  0.070      2.97  1831    1831   Currituck 37053  37053       27
# 5  0.153      2.21  1832    1832 Northampton 37131  37131       66
# 6  0.097      1.67  1833    1833    Hertford 37091  37091       46
# 7  0.062      1.55  1834    1834      Camden 37029  37029       15
# 8  0.091      1.28  1835    1835       Gates 37073  37073       37
# 9  0.118      1.42  1836    1836      Warren 37185  37185       93
# 10 0.124      1.43  1837    1837      Stokes 37169  37169       85
#    BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79                  geom
# 1   1091     1      10  1364     0      19  POINT (385605 3e+05)
# 2    487     0      10   542     3      12 POINT (419198 306144)
# 3   3188     5     208  3616     6     260 POINT (458418 296669)
# 4    508     1     123   830     2     145 POINT (876266 298782)
# 5   1421     9    1066  1606     3    1197 POINT (752184 297618)
# 6   1452     7     954  1838     5    1237 POINT (789602 291533)
# 7    286     0     115   350     2     139 POINT (857738 297588)
# 8    420     0     254   594     2     371 POINT (815437 301289)
# 9    968     4     748  1190     2     844 POINT (689435 294013)
# 10  1612     1     160  2038     5     176 POINT (498892 294730)
(pp = as.ppp(pts))
# Marked planar point pattern: 100 points
# Mark variables: 
#    AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 
# SID74 NWBIR74 BIR79 SID79 NWBIR79
# window: rectangle = [148701, 898181] x [36519, 306144] units
st_as_sf(pp)
# Simple feature collection with 101 features and 15 fields
# Geometry type: GEOMETRY
# Dimension:     XY
# Bounding box:  xmin: 149000 ymin: 36500 xmax: 898000 ymax: 306000
# CRS:           NA
# First 10 features:
#     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID
# NA    NA        NA    NA      NA        <NA>  <NA>     NA       NA
# 1  0.114      1.44  1825    1825        Ashe 37009  37009        5
# 2  0.061      1.23  1827    1827   Alleghany 37005  37005        3
# 3  0.143      1.63  1828    1828       Surry 37171  37171       86
# 4  0.070      2.97  1831    1831   Currituck 37053  37053       27
# 5  0.153      2.21  1832    1832 Northampton 37131  37131       66
# 6  0.097      1.67  1833    1833    Hertford 37091  37091       46
# 7  0.062      1.55  1834    1834      Camden 37029  37029       15
# 8  0.091      1.28  1835    1835       Gates 37073  37073       37
# 9  0.118      1.42  1836    1836      Warren 37185  37185       93
#    BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79  label
# NA    NA    NA      NA    NA    NA      NA window
# 1   1091     1      10  1364     0      19  point
# 2    487     0      10   542     3      12  point
# 3   3188     5     208  3616     6     260  point
# 4    508     1     123   830     2     145  point
# 5   1421     9    1066  1606     3    1197  point
# 6   1452     7     954  1838     5    1237  point
# 7    286     0     115   350     2     139  point
# 8    420     0     254   594     2     371  point
# 9    968     4     748  1190     2     844  point
#                              geom
# NA POLYGON ((148701 36519, 898...
# 1            POINT (385605 3e+05)
# 2           POINT (419198 306144)
# 3           POINT (458418 296669)
# 4           POINT (876266 298782)
# 5           POINT (752184 297618)
# 6           POINT (789602 291533)
# 7           POINT (857738 297588)
# 8           POINT (815437 301289)
# 9           POINT (689435 294013)

2.3 Exercises

  1. Compute the distance between POINT(10 -90) and POINT(50 -90), assuming (i) these are coordinates in a Cartesian space, and (ii) these are geodetic coordinates. What are the units of the result?
  2. Load the nc dataset into your session (e.g. using library(sf); demo(nc)) and convert it into a stars object using (i) st_as_stars(), (ii) st_rasterize(), (iii) st_interpolate_aw()
  3. Load the L7_ETMs dataset into your session (e.g. using library(stars); L7_ETMs = st_as_stars(L7_ETMs)), and convert the object to an sf object (i) using st_as_sf(), (ii) using st_as_sf(..., as_points = TRUE), and explain the differences (also plot the resulting sf objects). Randomly sample 100 points from the bounding box of L7_ETMs, and extract the image values at these points using st_extract(), and convert the result into an sf object.

2.4 Intro to spatstat

Consider a point pattern that consist of

  • a set of known coordinates
  • an observation window

We can ask ourselves: our point pattern be a realisation of a completely spatially random (CSR) process? A CSR process has

  1. a spatially constant intensity (mean: first order property)
  2. completely independent locations (interactions: second order property)

e.g.

library(spatstat)
set.seed(13431)
CSR = rpoispp(100)
plot(CSR)

Or does it have a non-constant intensity, but otherwise independent points?

ppi = rpoispp(function(x,y,...) 500 * x)
plot(ppi, main = "inhomogeneous")

Or does it have constant intensity, but dependent points:

cl <- rThomas(100, .02, 5)
plot(cl, main = "clustered")

hc <- rHardcore(0.05,1.5,square(50)) 
plot(hc, main = "inhibition")

or a combination:

#ff <- function(x,y) { 4 * exp(2 * abs(x) - 1) }
ff <- function(x,y) 10 * x
Z <- as.im(ff, owin())
Y <- rMatClust(10, 0.05, Z)
plot(Y)

2.5 Checking homogeneity

(q = quadrat.test(CSR))
# Warning: Some expected counts are small; chi^2 approximation may be
# inaccurate
# 
#   Chi-squared test of CSR using quadrat counts
# 
# data:  CSR
# X2 = 25, df = 24, p-value = 0.9
# alternative hypothesis: two.sided
# 
# Quadrats: 5 by 5 grid of tiles
plot(q)

(q = quadrat.test(ppi))
# 
#   Chi-squared test of CSR using quadrat counts
# 
# data:  ppi
# X2 = 81, df = 24, p-value = 8e-08
# alternative hypothesis: two.sided
# 
# Quadrats: 5 by 5 grid of tiles
plot(q)

2.6 Estimating density

  • main parameter: bandwidth (sigma): determines the amound of smoothing.
  • if sigma is not specified: uses bw.diggle, an automatically tuned bandwidth

Correction for edge effect?

density(CSR) |> plot()
plot(CSR, add = TRUE, col = 'green')

density(ppi) |> plot()
plot(ppi, add = TRUE, col = 'green')

density(ppi, sigma = .05) |> plot()
plot(ppi, add = TRUE, col = 'green')

2.7 Assessing interactions: clustering/inhibition

The K-function (“Ripley’s K”) is the expected number of additional random (CSR) points within a distance r of a typical random point in the observation window.

The G-function (nearest neighbour distance distribution) is the cumulative distribution function G of the distance from a typical random point of X to the nearest other point of X.

envelope(CSR, Lest) |> plot()
# Generating 99 simulations of CSR  ...
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
# 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
# 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
# 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
# 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
# 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
# 99.
# 
# Done.

envelope(cl, Lest) |> plot()
# Generating 99 simulations of CSR  ...
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
# 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
# 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
# 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
# 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
# 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
# 99.
# 
# Done.

envelope(hc, Lest) |> plot()
# Generating 99 simulations of CSR  ...
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
# 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
# 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
# 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
# 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
# 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
# 99.
# 
# Done.

envelope(ppi, Lest) |> plot()
# Generating 99 simulations of CSR  ...
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
# 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
# 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
# 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
# 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
# 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
# 99.
# 
# Done.

envelope(ppi, Linhom) |> plot()
# Generating 99 simulations of CSR  ...
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
# 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
# 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
# 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
# 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
# 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
# 99.
# 
# Done.

envelope(Y , Lest) |> plot()
# Generating 99 simulations of CSR  ...
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
# 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
# 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
# 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
# 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
# 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
# 99.
# 
# Done.

envelope(Y , Linhom) |> plot()
# Generating 99 simulations of CSR  ...
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
# 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
# 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
# 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
# 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
# 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
# 99.
# 
# Done.

2.8 Fitting models to clustered data

# assuming Inhomogeneous Poisson:
ppm(ppi, ~x)
# Nonstationary Poisson process
# Fitted to point pattern dataset 'ppi'
# 
# Log intensity:  ~x
# 
# Fitted trend coefficients:
# (Intercept)           x 
#        4.33        1.96 
# 
#             Estimate  S.E. CI95.lo CI95.hi Ztest  Zval
# (Intercept)     4.33 0.174    3.99    4.67   *** 24.91
# x               1.96 0.247    1.48    2.45   ***  7.96
# assuming Inhomogeneous clustered:
kppm(Y, ~x)
# Inhomogeneous cluster point process model
# Fitted to point pattern dataset 'Y'
# Fitted by minimum contrast
#   Summary statistic: inhomogeneous K-function
# 
# Log intensity:  ~x
# 
# Fitted trend coefficients:
# (Intercept)           x 
#        3.69        1.47 
# 
# Cluster model: Thomas process
# Fitted cluster parameters:
# kappa scale 
# 7.731 0.038 
# Mean cluster size:  [pixel image]
# 
# Cluster strength: phi =  7.122
# Sibling probability: psib =  0.8769

2.9 MaxEnt and species distribution modelling

It seems that MaxEnt fits an inhomogeneous Poisson process

Starting from presence (only) observations, it

  • adds background (absence) points, uniformly in space
  • fits logistic regression models to the 0/1 data, using environmental covariates
  • ignores spatial interactions, spatial distances

R package maxnet does that using glmnet (lasso or elasticnet regularization on)

A maxnet example using stars is available in the development version, which can be installed directly from github by remotes::install_github("mrmaxent/maxnet") ; and the same maxnet example using terra (thanks to Ben Tupper).

Relevant papers:

2.10 Exercises

  1. From the point pattern shown in section 1.3, download the data as GeoPackage, and read into R
  2. Read the boundary of Germany using rnaturalearth::ne_countries(scale = "large", country = "Germany")
  3. Create a plot showing both the observation window and the point pattern
  4. Do all observations fall inside the observation window?
  5. Create a ppp object from the points and the window
  6. Create a density map of the wind turbines, with the turbines added
  7. Test whether the point pattern is homogeneous
  8. Create a plot with the (estimated) density of the wind turbines, with the turbine points added
  9. Verify that the mean density multiplied by the area of the window approximates the number of turbines
  10. Test for interaction: create diagnostic plots to verify whether the point pattern is clustered, or exhibits repulsion

2.11 Further reading

  • E. Pebesma, 2018. Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal 10:1, 439-446.
  • A. Baddeley, E. Rubak and R Turner, 2016. Spatial Point Patterns: methodology and Applications in R; Chapman and Hall/CRC 810 pages.
  • J. Illian, A. Penttinen, H. Stoyan and D. Stoyan, 2008. Statistical Analysis and Modelling of Spatial Point Patterns; Wiley, 534 pages.