Geospatial Data Science in R

Published

January 22, 2025

Introduction

Required packages

The following packages may be used during the course; it is assumed that you know how to install packages, and have permission to do so on your computer.

R version

We require R 4.1.0 (as we’ll be using the native pipe |>)

CRAN packages:

install.packages(c(
"sf",
"stars",
"terra",
"s2",
"gstat",
"spatstat",
"spdep",
"spatialreg",
"mgcv"
))

non-CRAN packages:

r-inla: see https://www.r-inla.org/download-install

starsdata and spDataLarge:

options(timeout = 3600) # 1 hr instead of 1 min
install.packages("spDataLarge", repos = "https://nowosad.github.io/drat/", 
                 type = "source") # 23 Mb
install.packages("starsdata", repos = "http://cran.uni-muenster.de/pebesma/", 
                 type = "source") # 1 Gb

Introduction to the course

  • introduction of the tutor
  • introduction of course participants, please state
    • name,
    • where you’re from,
    • which program you are in

How we work

Sessions are from 9:00-14:00 CET daily schedule:

  • 9:00 - 9:45 lecture block I
  • 10:00 - 10:45 lecture block II
  • 11:00 - 14:00 exercise block

Further:

  • please raise hands or speak up whenever something comes up
  • please share questions you run into in your actual research, preferably with (example) data and R code

Syllabus

  • day 1: the new spatial stack in R: simple features, DE-9IM, vector and raster data, data cubes; the spatial statistics data types
  • day 2 operations, raster-vector, vector-raster: geometry measures, predicates, and transformers spherical geometry raster-vector: polygonizing, extracting vector-raster: rasterize, interpolate, density up- and down-scaling: aggregation, sampling, area-weighted interpolation, dasymetric mapping
  • day 3. inference: spatial correlation, fitting models: spatial correlation for point patterns, geostatistical data, and lattice data fitting regression models under spatial correlation
  • day 4 prediction and simulation; point patterns: fitting densities, simulating point patterns geostatistics: kriging interpolation, conditional simulation, simulating GMRFs

Resources

Reading as preparation: students may want to read from this book chapters: 3-7, 10-12, 14-16

Why R for spatial statistics / geospatial data science?

  • R is old! Think of the advantages!
  • R is as good as any data science language, but is more in focus with the statistical community
  • Most researchers in spatial statistics who share code have used or use R
  • R has a strong ecosystem of users and developers, who communicate and collaborate (and compete, mostly in a good way)
  • R spatial packages have gone full cycle:
    • the first generation has been deprecated (rgdal, rgeos, maptools),
    • then removed from CRAN, and
    • superseded by modern versions (sf and stars replaced sp, terra replaced raster)
  • R is a data science language that allows you to work reproducibly
  • Because we have CRAN and CRAN Taskviews: Spatial, SpatioTemporal, Tracking

Reproducing or recreating the current course

  • Go to https://github.com/edzer/madrid/
  • Go to “Code”, then “copy URL to clipboard”
  • Clone this repo to your hard drive
  • Start RStudio by double clickign the madrid.Rproj file in the source directory
  • Reproduce these course materials by installing quarto and
    • in RStudio: run build - render book, or
    • on the command line: run quarto render in the course directory
  • Run individual code sections in RStudio, and modify them!

Exercises

  1. Install the spDataLarge package (see instructions above)
  2. Copy (or clone) the course material from GitHub to your local machine
  3. Open it in RStudio
  4. Open the day1.qmd file. Try to identify a code chunk.
  5. Run the first code chunk.
  6. Skip to the last code chunk; run all code chunks above it (by a single click), and then run this last code chunk.
  7. Render the entire course “book”, view the result by opening _book/index.html in a web browser (from Rstudio)