Preamble

The source R-Markdown (.Rmd) file for this html document is found here; here is the zip file with included figures.

A list of package is found at the end of this page. The packages this script attaches can be installed by running

pkg = c("sp", "xts", "zoo", "spacetime", "trajectories",
"maptools", "maps", "mapview", "leaflet", "rglwidget",
"rgl", "RColorBrewer", "ggmap", "ggplot2", "dplyr",
"rmarkdown", "units")
install.packages(pkg)

To re-create (reproduce) this html, using knit html in Rstudio, have all packages installed, and then:

Introduction

Two quotes from Cobb and Moore, 1997,

Data are not just numbers, they are numbers with a context

and

in data analysis, context provides meaning

illustrate that data analysis without context is essentially meaningless. What is context? We can think of

We can think of the who, how and why question referring to pragmatics of data collection, where when, where and what refer to data semantics. Reference systems relate numbers (or words, or symbols) to the real world; much of this tutorial is about reference systems for space, time and attributes.

At a higher level, reference systems can describe phenomena in terms of how attributes and identity relate to space and time; in particular whether they are continuous or discrete matters for what we can meaningfully do with data obtained from these phenomena (Scheider et al., 2016).

Time in R

Base R has some infrastructure to annotate measurement units, in particular for time and date information:

now = Sys.time() + c(0, 3600)
today = Sys.Date() + 0:1

which represent numeric values pointing to the number of seconds or days elapsed since Jan 1, 1970, 00:00 UTC:

as.numeric(now)
## [1] 1474359906 1474363506
as.numeric(today)
## [1] 17064 17065

The class attribute of these objects

class(now)
## [1] "POSIXct" "POSIXt"
class(today)
## [1] "Date"

make it behave meaningfully like time:

now * now
## Error in Ops.POSIXt(now, now): '*' not defined for "POSIXt" objects
now + now
## Error in `+.POSIXt`(now, now): binary '+' is not defined for "POSIXt" objects
(dt = now[2] - now[1])
## Time difference of 1 hours

Time differences

This last quantity, dt has a class

class(dt)
## [1] "difftime"

but also a units attribute, which can be retrieved by the units method:

units(dt)
## [1] "hours"

but can also be set to other values, e.g.

units(dt) = "days"
dt
## Time difference of 0.04166667 days
units(dt) = "secs"
dt
## Time difference of 3600 secs

Measurement units

Beyond time and time differences, more general support for SI units and derived units is provided by CRAN package units, which builds upon CRAN package udunits2 and the external udunits library, from UNIDATA. It checks compatibility of units, does unit conversion on the fly

units_file = system.file("share/udunits2.xml", package="udunits2")
if (units_file == "") stop("install package udunits2 first")
Sys.setenv(UDUNITS2_XML_PATH = units_file) # cope with bug in udunits2 0.8.1 on Win & Mac
library(units)
m = with(ud_units,  m)
s = with(ud_units,  s)
km = with(ud_units, km)
h = with(ud_units,  h)
x = 1:3 * m/s
xkmh = x
units(xkmh) = km/h        # explicit conversion
xkmh
## Units: km/h
## [1]  3.6  7.2 10.8
(y = 1:3 * km/h)          # something different
## Units: km/h
## [1] 1 2 3
x + y                     # implicit conversions
## Units: m/s
## [1] 1.277778 2.555556 3.833333
y + x
## Units: km/h
## [1]  4.6  9.2 13.8
c(x, y)
## Units: m/s
## [1] 1.0000000 2.0000000 3.0000000 0.2777778 0.5555556 0.8333333

it creates derived units where appropriate

x * y
## Units: km*m/h/s
## [1] 1 4 9
x / y
## Units: 1
## [1] 3.6 3.6 3.6
log(x)
## Units: ln(m/s)
## [1] 0.0000000 0.6931472 1.0986123

and also gives meaningful warnings when units are not compatible:

x + x * y
## Error in `units<-.units`(`*tmp*`, value = structure(list(numerator = "m", : cannot convert km*m/h/s into m/s

as udunits parses units, they can become arbitrarily complex:

u1 = make_unit(paste0(rep("m", 100), collapse = "*"))
u2 = m^100
1:3 * u2 + 1:3 * u1
## Units: m^100
## [1] 2 4 6

Summarizing, measurement units

  • convert units explicitly, or implicitly/on the fly
  • catch operations that are algebraically correct but physically meaningless
  • help carry out dimensional analysis

and provide, in that sense, part of the context of data.

Why don’t we treat space and time as special cases of SI units?

Measures of physical quantities such as length, mass, duration have a natural, absolute zero value. When measuring absolute time (when) and location (where), we need a reference.

For time, we have Coordinated Universal Time (UTC); different time zones and DST rules are defined with respect to UTC.

For space, different geodetic datums exist; the best known being the WGS84 ellipsoid (``world geodetic system-1984’’)

WGS84 is a global reference system, and has larger errors (deviations from the geoid, the `mean sea level’ surface) than ellipsoids that are fitted locally. For that reason, local ellipsoids might better fit the Earth locally, and are prefered for particular surveying projects. Coordinate transformations move from one geodetic datum (ellispoid) to another; these transformation may be non-unique, non-invertible, and approximate.

Projecting longitude/latitude degrees to a flat representation (such as UTM, or Web Mercator) is called coodinate conversion; this process is usually accurate and invertible.

When are space and time not relevant?

For a doctor, the identity and age of the patient are more relevant than the location of the patient and date of the examination. An MRI scan of a patient’s brain is understood with reference to the skull and other reference points, not with reference to the geographical location f the MRI scanner.

Many data sets come without information about space and time; this usually indicates that these aspects were not considered relevant. For the co2 dataset,

plot(co2)

the location of measurements (Mauna Loa) can be derived from the URL mentioned in ?co2, or even better, from the information the URL points to. For the mtcars dataset, ?mtcars reveals when the car models described were current (1973-4).

Time series data

We have seen so far mostly time information, but not information that is associated with time, or time series data.

Classes for time series data

The co2 data set is an example of this, as a time series:

class(co2)
## [1] "ts"
summary(co2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   313.2   323.5   335.2   337.1   350.3   366.8

The ts class, part of base R, accomodates for regular time series, and does not use POSIXt for time:

attributes(co2)
## $tsp
## [1] 1959.000 1997.917   12.000
## 
## $class
## [1] "ts"

Many of the methods for ts do not deal well with semi-regular time series, which are regular but contain missing values.

Approaches to deal with irregular time series are found in packages its, zoo and xts. Package xts builds on POSIXt as the time index, and extends zoo. The combination has nice features for

  • selection, supporting ISO 8601 strings for time periods
  • aggregation, using functions that cut time in particular intervals

What follows is an example where the string “2000:2007” selects all data for these three years, and the aggregate method (from zoo) uses the function as.yearmon to cut a time period into months, convert these into years, and to compute yearly averages:

data(air, package = "spacetime")
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(xts)
pm10 = xts(t(air), dates)
as.year <- function(x) as.numeric(floor(as.yearmon(x)))
plot(aggregate(pm10["2000::2007", 1], as.year, mean, na.rm = TRUE),
    ylab = "PM10 (ppm)")
title("Yearly averaged PM10 values, 2000-2007")

Extensive or intensive properties?

Physical properties can be extensive or intensive, depending on whether the property changes when the size (extent) of the system changes:

  • exptensive properties are for instance mass and size: if we cut a rock into pieces, mass and size of a piece change
  • intensive properties are e.g. temperature or density; these do not necessarily change with size

In the aggregation example above, we computed the yearly mean values, but when examining ?aggregate.zoo, we’d find out that the default aggregation function is sum, so

a = aggregate(pm10["2000::2007", 1], as.year, na.rm = TRUE)
mean(a)
## [1] 7176.552

computes yearly total pm10, from monthly averages, which is not physically meaningful for an intensive property like pm10.

Events, continuous series, aggregations?

Time series data, as represented by the packages mentioned above, do not distinguish events from continuous processes: ``values’’ of something are simply associated with a time stamp. Yet, for meaningful analysis, their difference could not be larger:

  • for events, we can compute counts, or densities over time, but we cannot interpolate; for instance, the temporally interpolated value of earthquake magnitudes for a moment with no earthquake does not make sense
  • for continuous processes, counts or densities only reveal properties of the sampling procedure, and not of the observed phenomenon (which could have been observed any moment)

Many time series data (co2 and pm10 above being two examples) concern temporally aggregated values. Yet, their value is not explicitly registered with the aggregation period, but rather with the starting time of this interval. If, however, instantaneous measurements of the same variables would be available (co2 or pm10 measured at the start of the month), they would be stored identically. If time periods would be registered explicitly, we would in addition have to specify how the associated (co2, pm10) value relates to this interval, possibilities are:

  • the value is constant throughout this interval
  • the value was aggregated over this interval (in which case the aggregation function needs to be specified)

Spatial data

Classes for spatial data

A large majority of spatial data constitute, roughly, of points, lines, polygons or grids. As described in Bivand, Pebesma and Gomez-Rubio (2013), package sp provides basic infrastructure for this, and supports the following classes:

The geometry-only classes, SpatialXxx, derive from Spatial and share a bounding box and coordinate reference system (proj4string); all the SpatialXxxDataFrame classes extend this with a data.frame carrying attributes associated with the geometries.

Important reasons why many think that using sp is a good idea include:

  • reinventing the wheel too often creates duplicate work, and calls for many-to-many conversions
  • through rgdal, sp supports reading and writing to and from all 142 raster and 84 vector formats supported by GDAL
  • through rgeos, an interface to GEOS, sp objects can be used for all DE-9IM intersections, including topological predicates like touches, overlaps, intersects, contains, etc., and also create buffers, unite polygons etc.

An example: why polygons are complex

`Single’ geometry entries for polygons and lines can consist of multiple polygons or lines, to accomodate real-word data. In the following example we see a state consisting of multiple polygons, one of them containing a hole:

data(air, package = "spacetime")
library(sp)
nds = DE_NUTS1["Niedersachsen",]
library(ggmap)
## Loading required package: ggplot2
bgMap = get_map(as.vector(bbox(nds)), source = "google", zoom = 7)
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=52.632009,9.116023&zoom=7&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
par(mar = rep(0,4))
merc = CRS("+init=epsg:3857")
plot(spTransform(nds, merc), bgMap = bgMap, col = grey(.5, alpha = .5))

We can examine the data interactively, thanks to the excellent mapview package:

library(mapview)
## Loading required package: leaflet
mapview(nds)
## Warning in spCheckObject(x): Columns NL_NAME_1, CC_1, REMARKS_1 in
## attribute table contain only NA values and are dropped.

When properly plotting data, we need to know which hole polygon belongs to which enclosing polygon, and this makes it difficult (if not impossible) to encode polygons in simple tables such as propagated by the tidy data framework. The fortify method in ggplot2 destroys this structure, and the ggplot2 wiki explains why ggplot2 cannot properly plot polygons with holes.

In the plot above, we can see that the border of nds does not correspond very well with that of openstreetmap. This may be due to an outdated version of DE_NUTS1 in the spacetime package.

Exercise

  1. visit gadm
  2. download the German administrative boundaries as a shapefile from gadm.org
  3. unzip the files starting with DEU_adm1, and register in which directory they are
  4. read them in with library(rgdal); x = readOGR(".", "DEU_adm1"), where you replace . with the right directory
  5. plot Lower Saxony with mapview (mapview(x[9,]))
  6. check whether the Dutch/German boundaries are better, now.

Selecting features

As we’ve seen above, we can use the [ subset operator on Spatial objects to select geometries/features (rows), or particular attributes, similar to how this is done for data.frame objects. Further functionality is obtained when we use a Spatial object as selector, as in

par(mar = c(4,4,4,1))
plot(DE_NUTS1[nds,], col = "#FFDDDD", axes = TRUE) # all states touching Lower Saxony;
plot(nds, col = grey(.5), add = TRUE) # .. which is grey.

this carries out an intersection, and returns intersecting geometries, for any Spatial object type.

Aggregation

We can compute yearly mean pm10 values by station, by

yearmeans = aggregate(pm10, as.year, mean, na.rm = TRUE)
# merge mean matrix (rows: year; cols: station ) with station data:
s = SpatialPointsDataFrame(stations, data.frame(t(yearmeans)))
spplot(s, names.attr = index(yearmeans), sp.layout = list(DE_NUTS1, col = grey(.8)), 
    as.table = TRUE, colorkey = TRUE)

Suppose that we want state mean pm10 concentrations by year; we can do this by spatial aggregation:

a = aggregate(s, DE_NUTS1, mean, na.rm = TRUE) 
spplot(a, names.attr = index(yearmeans), as.table = TRUE, 
    main = "state mean pm10 of rural background stations")

Here, the by argument (2nd arg) of aggregate is a Spatial object; spatial intersection yields the aggregation predicate.

Spatial statistics, further packages

Methods for statistically analyzing spatial data are well described in the spatial statistics literature. Other R packages that deal with handling or analyzing spatial data are categorized and described in the CRAN Task View on Spatial Data.

Spatiotemporal data, movement data

We’ve seen above how temporal data can be stored in Spatial objects – by storing variables associated with different times as different attributes. This is not optimal, because

There is also a CRAN Task View on SpatialTemporal Data that categorizes and describes R packages for handling and analyzing spatiotemporal data, including movement data.

spacetime: Class structure

Building on xts and sp, R package spacetime provides classes for spatiotemporal data that solve both issues. The class diagram is given here:

STF and STS objects cater spatial panel data (or longitudinal data), where each spatial entity has an identical set of time referenced obvservations associated with it; examples are

  • employment data per year, per state
  • fixed monitoring stations, measuring air quality every hour
  • time sequences of satellite imagery

Note that these classes only require a space time lattice layout:

  • spatial properties can be anything (points, lines, polygons, grid cells),
  • time sequences may be irregular

Package spacetime comes with routines to read data from space-wide tables (where columns reflect different spatial features), time-wide tables (where columns reflect different times), and long tables (where each space/time combination has a row entry); examples of all three are given in Pebesma, 2012. What follows is an example reading from a long table.

Reading from tables

A long table with panel data is found in package plm:

data("Produc", package = "plm")
Produc[1:5,1:9]
##     state year region     pcap     hwy   water    util       pc   gsp
## 1 ALABAMA 1970      6 15032.67 7325.80 1655.68 6051.20 35793.80 28418
## 2 ALABAMA 1971      6 15501.94 7525.94 1721.02 6254.98 37299.91 29375
## 3 ALABAMA 1972      6 15972.41 7765.42 1764.75 6442.23 38670.30 31303
## 4 ALABAMA 1973      6 16406.26 7907.66 1742.41 6756.19 40084.01 33430
## 5 ALABAMA 1974      6 16762.67 8025.52 1734.85 7002.29 42057.31 33749

Since the data contains no spatial information (states) or temporal information as POSIXct, we need to construct these – states:

library(maps)
states.m = map('state', plot=FALSE, fill=TRUE)
IDs <- sapply(strsplit(states.m$names, ":"), function(x) x[1])
library(maptools)
## Checking rgeos availability: TRUE
states = map2SpatialPolygons(states.m, IDs=IDs)

years:

yrs = 1970:1986
time = as.POSIXct(paste(yrs, "-01-01", sep=""), tz = "GMT")

When combining all this information, we do not need to reorder states because states and Produc order states alphabetically. We need to de-select District of Columbia, which is not present in Produc:

# deselect District of Columbia, polygon 8, which is not present in Produc:
library(spacetime)
Produc.st = STFDF(states[-8], time, Produc[order(Produc[2], Produc[1]),])
library(RColorBrewer)
stplot(Produc.st[,,"unemp"], yrs, col.regions = brewer.pal(9, "YlOrRd"), cuts = 9)

Movement data

As can be seen from the CRAN Task View on SpatialTemporal Data, there are quite a few R packages dealing with movement data, many of which originate in the ecology domain.

The trajectories tries to take a more general approach, but uses many of the ideas e.g. from adehabitatLT. In particular, it acknowleges that, when we observe movement, we often end up having

  • a period of movement registration for a particular item, which has a begin and end time that may not correspond to the life time of the item (Track, burst, trip),
  • potentially multiple tracks collected on a single item, with time gaps inbetween during which the item was not tracked (a set of tracks: Tracks)
  • multiple items for which Tracks are collected (TracksCollection)

Think of registering a person’s movement with a mobile phone or GPS receiver:

  • a sequence of location registrations, e.g. once per minute, is collected
  • multiple sequences are obtained when a sequence gets interrupted, e.g. when the phone’s battery is empty, or the registration is stopped for some reason (no GPS reception? privacy?)
  • the analysis may concern multiple persons

For this reason, trajectories offers Track, Tracks and TracksCollection objects to organize such data:

Examples of open trajectory data of various kinds are

  • The Argo buoys,
  • The Envirocar car tracks (locations + ODB-II in-car sensor data); REST api, uses GeoJSON
  • Geolife: 182 persons in China tracked for some months, some of whom kept activity (transportation mode) diaries (1.7 Gb when unzipped)
  • Google location history (open to owner only): gives fixes with errors, but also activity classification probabilities
  • Citibike NYC (only origing and destination fixes)
  • Atlantic hurricane tracks since 1851

Except for the Argo buoys, trajectories contains demo scripts for all these examples.

Exercise

  1. check out which demo scripts package trajectories has, using command demo (and look into its help page)
  2. run some of these demos,
  3. try to understand what is going on, and what the demo tries to do
  4. for a particular data set, which question could you formulate that could be answered by analyzing these data?

plotting methods

trajectory methods

Methods that Track, Tracks or TracksCollection share with the spatial family include

  • [[, $, [ retrieve or replace attributes, or select Track or Tracks
  • stplot create space-time plot
  • aggregate spatially aggregate track properties (coercing fixes to points)
  • bbox retrieve spatial bounding box
  • coordinates retrieve coordinates of fixes
  • coordnames retrieve coordinate names of fixes
  • over intersect spatially (coercing tracks to SpatialLines)
  • plot simple plot methods
  • proj4string retrieve coordinate reference system
  • spTransform (re)project coordinates to new coordinate reference system, or unproject
  • summary print simple summary

Methods that are unique for the Track family are

  • compare compares two Track objects: for the common time period, a difftrack object is created with all distances between them (approximating linearly when times of fixes don’t match)
  • dists compares two Tracks (two sets of Track elements) using mean distance, or Frechet distance
  • downsample remove fixes from a Track, starting with the most densely sampled ones
  • frechetDist compute Frechet distance between two Track objects
  • stcube plot space-time cube for Track objects, possibly with openstreetmap base
  • generalize resample track to lower freqency or minimal distance
  • stbox print the space-time bounding box
r = read.csv("http://pebesma.staff.ifgi.de/pm10/3_2a.txt")
time = as.POSIXct(strptime(paste(r$Date, r$Time), "%Y/%m/%d %H:%M:%S"))
require(sp)
pts = SpatialPoints(r[c("Long", "Lat")], CRS("+init=epsg:4326")) # WGS84 long/lat
library(spacetime)
library(trajectories)
tr = Track(STIDF(pts, time, r["pm10"]))
stbox(tr)
##         Long      Lat                time
## min 7.594638 51.94583 2010-03-23 10:02:24
## max 7.636554 51.98394 2010-03-23 12:52:05

Some basic plots:

plot(as(tr, "STIDF"))

gives a space-time plot, where space indicats the index of each spatial item; the plot indicates that there were a few short breaks during the data collection.

Plotting the Track, as

plot(tr)

shows the spatial path followed, without anything else. With stplot

stplot(tr)

we get the plot of the STIDF object; it cuts time in 6 equal pieces, and plots those.

library(rgl)
library(rglwidget) 
stcube(tr)
rglwidget(elementId="plot3d")

shows the 3-D, interactive space-time track, which even looks nicer when you try

stcube(tr, showMap = TRUE)

The mapview of the complete trajectory shows a wealth of points:

require(mapview)
mapview(as(tr, "SpatialPointsDataFrame"), zcol = "pm10", legend = TRUE)

When we generalize this, we get a SpatialLines object, with (by default) averaged pm10 values per line segment:

tr0 = generalize(tr, distance = 100)
class(tr@sp)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
mapview(as(tr0, "Spatial"), zcol = "pm10", lwd = 5, legend = TRUE)

Multiple tracks

The following little program will read in all the 16 PM10 tracks, and plot them with mapview (this last command only works if sp devel is installed from github):

names = c("3_1a.txt", "3_2a.txt", "3_2b.txt", "3_3a.txt", "3_3b.txt", 
"3_3c.txt", "3_4a.txt", "3_5a.txt", "4_1a.txt", "4_2a.txt", "4_3a.txt", 
"5_1a.txt", "5_3a.txt", "5_4a.txt", "6_1a.txt", "6_2a.txt")
read.tr = function(f) {
    r = read.csv(paste0("http://pebesma.staff.ifgi.de/pm10/", f))
    time = as.POSIXct(strptime(paste(r$Date, r$Time), "%Y/%m/%d %H:%M:%S"))
    pts = SpatialPoints(r[c("Long", "Lat")], CRS("+init=epsg:4326")) # WGS84 long/lat
    Track(STIDF(pts, time, r["pm10"]))
}
trs = Tracks(lapply(names, function(f) read.tr(f)))
## Warning in Track(STIDF(pts, time, r["pm10"])): deselecting 1 secondary
## point(s) of zero duration interval(s)

## Warning in Track(STIDF(pts, time, r["pm10"])): deselecting 1 secondary
## point(s) of zero duration interval(s)
dim(trs)
##     tracks geometries 
##         16     104667
trs0 = generalize(trs, distance = 100)
dim(trs0)
##     tracks geometries 
##         16       1773
mapview(as(trs0, "Spatial"), zcol = "pm10", lwd = 5, legend = TRUE)

Distances between trajectories

Distances between pairs of tracks can be computed; this is often done as a first step before, or an ingredient of clustering tracks. The naive

dists(trs0,trs0)

computes spatial distances in synchronous time; since the tracks in this data sets are all taken on different days by a single person, all distances will be NA. Frechet distances, obtained by

d = dists(trs, trs, frechetDist)

ignores time but preserve ordering (direction); it takes a long time to compute.

Aggregation; densities

Aggregating trajectories and their attributes is not trivial:

  • if we’d coerce them to points, differences in sampling density/frequency bias the results
  • if we’d resample them to regular frequency, attributes need to be interpolated
  • we can think of different weighting scheme, depending on distance covered or time spent
  • as shown above, generalize aggregates points to lines, and associates aggregated point attributes to line segments

A few examples from a web site that aggregates google location history from “somebody” are here:

Smoothing: storms data

In the following example,

data(storms)
plot(storms, type = 'p')
## Warning in axis(side, at = at, labels = labels, ...): graphical parameter
## "type" is obsolete

## Warning in axis(side, at = at, labels = labels, ...): graphical parameter
## "type" is obsolete
## Warning in title(...): graphical parameter "type" is obsolete
smth = function(x, y, xout,...) {
  predict(smooth.spline(as.numeric(x), y), as.numeric(xout))
}
storms.smooth = approxTracksCollection(storms, FUN = smth, n = 200)
## Warning in validityMethod(object): tracks with overlapping time intervals:
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
## Warning in validityMethod(object): tracks with overlapping time intervals:
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
## Warning in validityMethod(object): tracks with overlapping time intervals:
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
## Warning in validityMethod(object): tracks with overlapping time intervals:
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
plot(storms.smooth, add = TRUE, col = 'red')

Simulating random tracks

Random tracks (in free space) can be generated using a bivariate autoregressive process, an example is given here, along with a smooth:

opar = par(mfrow = c(1,2))
x = rTrack(ar = .4)
plot(x); title("rough")
x.smooth = approxTrack(x, FUN = smth, n = 800)
plot(x.smooth, add=T, col = 'red')
x = rTrack(ar = .8) # more smooth
plot(x); title("smooth")
x.smooth = approxTrack(x, FUN = smth, n = 800)
plot(x.smooth, add=T, col = 'red')

par(opar)

Real-world problems, work in progress

At the engineering level

  • fitting GPS fixes to a road network (map matching); initial work found here
  • properly representing road networks in R
  • simulate movement through a transportation network, e.g. from origin-destination matrices (using sumo?)
  • more support for non-point (i.e., lines, polygons, grids) movement data (e.g. oil spills, fires)
  • deal with out of memory data sets
  • develop proper spatial statistical inference methods for trajectory data (… answering which question?)
  • represent spatial data in an easier (list column in data.frame) and interoperable way, simple features for R (come and see my Wednesday talk in the Spatial session)

A practical theory for meaningful spatial statistics

When are spatial aggregation and prediction meaningful?

What do polygon values refer to? What do raster cell values refer to?

  • We think too much in data morphology: points, lines, polygons, grids, trajectories, …
  • We need to think in underlying phenomena, as functions composed of rererence systems
    • Basic reference systems: \(S\) space, \(T\) time, \(Q\) quality, \(D\) entity
    • Derived reference systems: \(R\) for regions/point sets, \(I\) for time intervals/instance sets
    • Fields: \(S \times T \Rightarrow Q\)
    • Lattices: \(R \times I \Rightarrow Q\)
    • Events: \(D \Rightarrow S \times T \times Q\)
    • (Trajectories: \(T \Rightarrow S\))
    • Objects: \(D \Rightarrow T \Rightarrow S \times Q\)
  • Read more in the paper on Modelling spatiotemporal information generation

sessionInfo()

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
##  [1] rglwidget_0.1.1434 rgl_0.95.1441      trajectories_0.1-5
##  [4] RColorBrewer_1.1-2 spacetime_1.2-0    maptools_0.8-39   
##  [7] maps_3.1.1         mapview_1.1.0      leaflet_1.0.1.9004
## [10] ggmap_2.6.1        ggplot2_2.1.0      sp_1.2-4          
## [13] xts_0.9-7          zoo_1.7-13         units_0.3         
## [16] rmarkdown_1.0     
## 
## loaded via a namespace (and not attached):
##  [1] reshape2_1.4.1      lattice_0.20-34     colorspace_1.2-6   
##  [4] htmltools_0.3.5     stats4_3.3.1        viridisLite_0.1.3  
##  [7] yaml_2.1.13         R.oo_1.20.0         foreign_0.8-66     
## [10] R.utils_2.3.0       jpeg_0.1-8          foreach_1.4.3      
## [13] plyr_1.8.4          stringr_1.1.0       rgeos_0.3-20       
## [16] munsell_0.4.3       gtable_0.2.0        raster_2.5-8       
## [19] R.methodsS3_1.7.1   htmlwidgets_0.7     RgoogleMaps_1.2.0.7
## [22] mapproj_1.2-4       codetools_0.2-14    evaluate_0.9       
## [25] latticeExtra_0.6-28 knitr_1.14          httpuv_1.3.3       
## [28] gdalUtils_2.0.1.7   proto_0.3-10        Rcpp_0.12.7        
## [31] xtable_1.8-2        geosphere_1.5-5     udunits2_0.11      
## [34] scales_0.4.0        satellite_0.2.0     formatR_1.4        
## [37] webshot_0.3.2       jsonlite_1.0        mime_0.5           
## [40] rjson_0.2.15        png_0.1-7           digest_0.6.10      
## [43] stringi_1.1.1       shiny_0.13.2        RJSONIO_1.3-0      
## [46] grid_3.3.1          rgdal_1.1-10        tools_3.3.1        
## [49] magrittr_1.5        iterators_1.0.8     R6_2.1.2           
## [52] intervals_0.15.1