UseR! 2021: Jul 5, 2021, 15:30 UTC

Overview

How do we plot maps?

any two-dimensional plot of Earth-bound surface data involves projection

Data Scientist runs into spatial data…

So what do you do?

plot(map)

Where did that come from?

maps::map("world")
maps::map("usa")

?map
projection: character string that names a map projection to use.  See
          ‘mapproject’ (in the ‘mapproj’ library).  The default is to
          use a rectangular projection with the aspect ratio chosen so
          that longitude and latitude scales are equivalent at the
          center of the picture.

Projection 101

  • For small areas, projection is usually no big issue (Earth is nearly flat for small areas)
  • Projections cannot preserve distances; only distances to a single location
  • Projections preserve area or shape or direction or some compromise, or
  • There are many of them!
  • there is no need for
    • North to be up
    • Europe to be in the middle
  • but a projection of a randomly rotated Earth is often harder to read!

What is a straight line, anyway?

For simple features, some definitions:

  • feature: abstraction of real world phenomena, with geometrical properties
  • simple feature: feature with all geometric attributes described piecewise by straight line or planar interpolation between sets of points

Why is this such a big deal?

Straight lines, after (re)projection, are usually no longer straight. So, when are they straight?

GeoJSON IETF standard:

  • A line between two positions is a straight Cartesian line, the shortest line between those two points in the coordinate reference system. […]
  • The coordinate reference system for all GeoJSON coordinates is a geographic coordinate reference system, using the World Geodetic System 1984 (WGS 84) [WGS84] datum, with longitude and latitude units of decimal degrees.
  • but do GeoJSON users realise that?

This example is contrived, but relevant for every line except sections of meridians or the equator.

We can add nodes (st_segmentise), or remove them (st_simplify).

Spherical geometry

For data with ellipsoidal (long/lat) coordinates, sf 1.0-0 switched to using spherical geometry (\(S^2\)) rather than Cartesian geometry (\(R^2\)). Straight lines are now great circle segments on a sphere.

As the Earth’s shape is closer to a sphere than to a flat plane, this is a good thing, but surprises are going to show up for a while due to Cartesian habits for some 50+ years.

To get the “old” (pre-sf 1.0) behaviour, use

  • sf_use_s2(FALSE),
  • st_set_crs(NA), or
  • project to to +proj=eqc.

More discussion on this in the (upcoming) Spatial Data Science book by Roger Bivand and me; also do follow Dewey Dunnington.

Handling large spatial datasets with R

  • too large to
    • hold in memory (raster, terra, stars, sf, …)
    • keep in local storage
    • download (weather: ERA5, climate: CMIP6, earth observation: Landsat/Copernicus)
  • Google Earth Engine (and similar) don’t allow you to reproduce analysis, independently
  • openEO and openEO Platform are part of a larger initiative for allowing reproducible (open source, vendor independent) computing on large, cloud-based data archives, using R and other platforms
  • In that same context, we contributed to the STAC (spatio-temporal asset catalogue) specification, a modern, light-weight approach for for discovery of images and image collections

Lifecycle of R Spatial packages

R Spatial: an open ecosystem

Conclusions

  • Many data scientists will run some day into challenges with spatial data
  • R Spatial is an open and friendly community of people using the R package ecosystem for handling and analysing spatial data
  • It uses and interfaces a lot of software used by a much larger community (e.g. OSGEO Foundation)
  • sf now uses spherical geometry:
    • straight lines may need noding in the projection where they are (supposed to be) straight,
    • simplifying, if needed, should be done after projecting
    • automated noding may be needed at some stage
  • we should reconsider projections used in default plots (after all, stringsAsFactors is also no longer TRUE)
  • analysing large spatial data sets is, and will remain a challenge
  • rgdal and rgeos will retire Jan 1, 2024:
    • sf, terra, stars, … provide alternatives:
    • UseR!s and developers will have to migrate