```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, collapse = TRUE, error = TRUE)
```
## Overview
- Motivation
- What is BIPM, GUM, VIM
- review of quantities, units, and values (according to VIM)
- software
- spatial data science
---
---
General frustrations I have:
- we can describe (simple) features geometries pretty well, and we can describe feature attributes pretty well, but not how the two relate (exceptions maybe: coverage, and CF conventions)
General fears I have:
- data science and citizen data science imply that everyone now tries to do anything, without spatial experts involved, and with varying motivations
- data scientists like to think longitude and latitude are just two more variables
---
Lack of unit checking in practice:
```{r,echo=TRUE}
(apples = c(5,8,12,3))
(oranges = c(4,2,8,11))
apples + oranges # meaningless?
(speed1 = 55) # mile/hr
(speed2 = 34.5) # km/hr
speed1 + speed2 # wrong:
with(mtcars[1:3,], mpg + cyl) # wrong and meaningless:
```
## BIPM, VIM, GUM
- BIPM: _Bureau Internationale des Poids et Mesures_
- manages _SI_, the International System of Units (briefly: what is a meter, what is a kg, and so on)
The _Joint Committee for Guides in Metrology_ (JCGM) has responsibility for the following two publications:
- Guide to the Expression of Uncertainty in Measurement (known as the **GUM**); and
- International Vocabulary of Metrology – Basic and General Concepts and Associated Terms (known as the **VIM**).
(The following 7 slides are copied from the VIM)
## What is a quantity?
- **quantity**: property of a phenomenon, body, or substance, where the property has a magnitude that can be expressed as a number and a reference
- NOTE2: A reference can be a measurement unit, a measurement procedure, a reference material, or a combination of such.
- **system of quantities**: set of quantities together with a set of noncontradictory equations relating those quantities
- **base quantity** quantity in a conventionally chosen subset of a
given system of quantities, where no subset quantity can be expressed in terms of the others
----
| Base quantity | Symbol | SI base unit | Symbol |
| -----------|--------|----------|------|
| length | $l,x,r,$ etc.| meter | m |
| mass | $m$ | kilogram | kg |
| time, duration | $t$ | second | s |
| electric current| $I, i$ | ampere | A |
| thermodynamic temperature | $T$ | kelvin | K |
| amount of substance | $n$ | mole | mol |
| luminous intensity | $I_v$ | candela | cd |
## quantities
- **derived quantity**: quantity, in a system of quantities, defined in
terms of the base quantities of that system
- **(quantity) dimension**:
expression of the dependence of a quantity on the
base quantities of a system of quantities as a
product of powers of factors corresponding to the
base quantities, omitting any numerical factor
- **quantity of dimension one**
(dimensionless quantity):
quantity for which all the exponents of the factors
corresponding to the base quantities in its
quantity dimension are zero
## units
- **measurement unit**:
real scalar quantity, defined and adopted by
convention, with which any other quantity of the
same kind can be compared to express the ratio of
the two quantities as a number
- **base unit**: measurement unit that is adopted by convention
for a base quantity (e.g., m, kg)
NOTE 3: For number of entities, the number one,
symbol 1, can be regarded as a base unit in any system
of units.
## values
- **quantity value** (or value):
number and reference together expressing magnitude
of a quantity, e.g. $15 ~ m^2$ which is short for $15 \times 1 m^2$.
## measurement, measurement error
- **measured quantity value** (measured value):
quantity value representing a measurement result
- **random measurement error**
component of measurement error that in replicate
measurements varies in an unpredictable manner
- ... and so on
## computing with units
the dimension of a quantity Q is denoted by
$$ \mbox{dim}~ Q = L ^\alpha M^\beta T^\gamma I^\delta Θ^\epsilon N^\zeta J^\eta
$$
where the exponents $\alpha,...,\eta$, named
dimensional exponents, are positive, negative, or zero.
- two _values_ can be compared if and only if their dimensional exponents are identical
- for adding/subtraction, units may need to be converted (e.g., km/h to m/s)
- the dimension of a product (ratio) of two values is the sum (difference) of their exponents
## software for units
shall
- contain a database of units, derived units, prefixes and so on
- provide functions to verify two values are compatible
- provide functions to convert values, if they are compatible
- help users avoid making mistakes, related to units and dimensions
Examples:
- udunits (UNIDATA): C, actively supported
- Unified Code for Units of Measure (UCUM), preferred by OGC (but why?)
- R package `units` (which uses udunits)
- python, C++ (boost), Julia, many more...
## Examples
```{r echo=TRUE}
suppressPackageStartupMessages(library(units))
(a = set_units(1, m/s))
(b = set_units(1, km/h))
a + b
b + a
a * b
(c = set_units(10, kg))
a + c # You can't be serious...
```
## However,
```{r echo = TRUE}
a = set_units(15, g/g)
(b = set_units(33)) # unitless
a + b
c = set_units(12, m/m) # or rad
a + c
```
----
[David Flater](https://doi.org/10.1016/j.csi.2017.10.002) proposes to extend unitless units to keep track which units were cancelled out, or of what it is a count, to catch such cases:
---
## and we can do this
```{r echo=TRUE}
install_symbolic_unit("apples")
install_symbolic_unit("oranges")
install_conversion_constant("apples", "oranges", 1.5)
set_units(5, oranges) + set_units(5, apples)
```
(but it is not trivial to get right).
---
```{r echo=TRUE}
library(sf)
demo(nc, echo = FALSE, ask = FALSE)
nc[1:2,] %>% st_transform(2264) %>% st_area # NC state plane, us_ft
nc[1:2,] %>% st_transform(2264) %>% st_area %>% set_units(m^2)
st_area(nc[1:2,]) # ellipsoidal surface, NAD27
```
---
```{r echo=TRUE}
nc <- nc %>% st_transform(2264)
g = st_make_grid(nc, n = c(20,10))
plot(st_geometry(nc), border = "#ff5555", lwd = 2)
plot(g, add = TRUE, border = "#0000bb")
```
---
```{r echo=TRUE}
st_agr(nc) = c("BIR74" = "constant")
a1 = st_interpolate_aw(nc["BIR74"], g, extensive = FALSE)
sum(a1$BIR74) / sum(nc$BIR74) # not close to one: spatially intensive
a2 = st_interpolate_aw(nc["BIR74"], g, extensive = TRUE)
sum(a2$BIR74) / sum(nc$BIR74)
```
----
```{r echo=FALSE}
a1$BIR74_int = a1$BIR74
a1$BIR74_ext = a2$BIR74
suppressPackageStartupMessages(library(tidyverse))
a <- a1 %>% select(BIR74_int, BIR74_ext) %>% gather(VAR, BIR74, -geometry)
ggplot() + geom_sf(data = a, aes(fill = BIR74)) + facet_wrap(~VAR, ncol = 1) +
scale_fill_gradientn(colors = sf.colors(20)) +
theme(panel.grid.major = element_line(color = "white"))
```
## Can measurement units help discriminate intensive from extensive?
This is only relevant when distributing, so for 1- or 2-dimensional, flat geometries.
Speculating:
- always intensive: temperature, color, density, ...
- always extensive: volume, mass, energy, heat capacity, ...
- dimensionless, but not a ratio $\Rightarrow$ count: extensive
- anything extensive divided by area: intensive
## However:
- length: total lenght of roads in a polygon: extensive
- length: altitude: intensive
Height is again extensive when measured along vertical geometries.
---
```{r echo=TRUE, collapse=FALSE}
st_agr(nc)
```
`agr`: attribute-geometry-relationship:
- _constant_: attribute is constant throughout geometry
- _identity_: attribute identifies geometry (and hence is constant)
- _aggregate_: attribute is an aggregation over the geometry
> - function `st_aggregate` and the `sf` method of `summarise` set the `agr` field to _aggregate_
---
```{r echo=TRUE, collapse=FALSE}
st_agr(nc)
pt = st_sfc(st_point(c(1260982, 994957)), crs = st_crs(nc))
x = st_intersection(nc["BIR79"], pt)
y = st_intersection(nc["BIR74"], pt) # forged
z = st_intersection(nc["NAME"], pt)
```
## Concluding
- units of measurement deserve (more) attention in data science
- Flater's paper proposes a better way to deal with unitless units, which might be an opportunity to absorb (parts of) ontologies
- the relationship between attribute and geometry is neglected in most spatial information systems (See also [Scheider et al.](https://dx.doi.org/10.1080/13658816.2016.1151520))
- for subsampling, resampling, downscaling, upscaling etc. one needs to know whether the variable in question is extensive or intensive
- units of measure may help in this regard