- Motivation
- What are BIPM, GUM, VIM
- review of quantities, units, and values (according to VIM)
- software
- spatial data science
UCSB Thinkspatial Brown Bag, February 6, 2018
General frustrations I have:
General fears I have:
- data science and citizen data science imply that everyone now tries to do anything, without spatial experts involved, and with varying motivations
- data scientists like to think longitude and latitude are just two more variables
Lack of unit checking in practice:
```r
(apples = c(5, 8, 12, 3))
## [1] 5 8 12 3
(oranges = c(4, 2, 8, 11))
## [1] 4 2 8 11
apples + oranges # meaningless?
## [1] 9 10 20 14
(speed1 = 55) # mile/hr
## [1] 55
(speed2 = 34.5) # km/hr
## [1] 34.5
speed1 + speed2 # wrong:
## [1] 89.5
with(mtcars[1:3,], mpg + cyl) # wrong and meaningless:
## [1] 27.0 27.0 26.8
```
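The Joint Committee for Guides in Metrology (JCGM) has responsibility for the following two publications:

- the Guide to the Expression of Uncertainty in Measurement (GUM)
- the International Vocabulary of Metrology (VIM)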
(The following 7 slides are copied from the VIM)
quantity: property of a phenomenon, body, or substance, where the property has a magnitude that can be expressed as a number and a reference
NOTE 2: A reference can be a measurement unit, a measurement procedure, a reference material, or a combination of such.
system of quantities: set of quantities together with a set of noncontradictory equations relating those quantities
base quantity: quantity in a conventionally chosen subset of a given system of quantities, where no quantity in the subset can be expressed in terms of the others
Base quantity | Quantity symbol | SI base unit | Unit symbol |
---|---|---|---|
length | \(l, x, r\), etc. | meter | m |
mass | \(m\) | kilogram | kg |
time, duration | \(t\) | second | s |
electric current | \(I, i\) | ampere | A |
thermodynamic temperature | \(T\) | kelvin | K |
amount of substance | \(n\) | mole | mol |
luminous intensity | \(I_v\) | candela | cd |
derived quantity: quantity, in a system of quantities, defined in terms of the base quantities of that system
(quantity) dimension: expression of the dependence of a quantity on the base quantities of a system of quantities as a product of powers of factors corresponding to the base quantities, omitting any numerical factor
quantity of dimension one (dimensionless quantity): quantity for which all the exponents of the factors corresponding to the base quantities in its quantity dimension are zero
measurement unit: real scalar quantity, defined and adopted by convention, with which any other quantity of the same kind can be compared to express the ratio of the two quantities as a number
base unit: measurement unit that is adopted by convention for a base quantity (e.g., m, kg)
NOTE 3: For number of entities, the number one, symbol 1, can be regarded as a base unit in any system of units.
measured quantity value (measured value): quantity value representing a measurement result
random measurement error: component of measurement error that in replicate measurements varies in an unpredictable manner
… and so on
The dimension of a quantity \(Q\) is denoted by
\[ \dim Q = L^{\alpha} M^{\beta} T^{\gamma} I^{\delta} \Theta^{\epsilon} N^{\zeta} J^{\eta} \]
where the exponents \(\alpha, \ldots, \eta\), named dimensional exponents, are positive, negative, or zero.
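The exponent notation lends itself to a small illustration. Below is a minimal base-R sketch (the helper names `dimension`, `format_dim` and `is_dimension_one` are made up for this example, not taken from any package) that stores a dimension as its seven exponents: multiplying quantities adds exponents, dividing subtracts them, and a quantity of dimension one has all exponents equal to zero.

```r
# The seven SI base quantities, in the order of the formula above
base_quantities <- c("L", "M", "T", "I", "Theta", "N", "J")

# Hypothetical helper: build a dimension from named exponents;
# base quantities that are not mentioned get exponent 0
dimension <- function(...) {
  d <- setNames(numeric(length(base_quantities)), base_quantities)
  e <- c(...)
  d[names(e)] <- e
  d
}

length_dim <- dimension(L = 1)
mass_dim   <- dimension(M = 1)
time_dim   <- dimension(T = 1)

# Multiplying quantities adds exponents, dividing subtracts them
velocity     <- length_dim - time_dim    # L T^-1
acceleration <- velocity - time_dim      # L T^-2
force        <- mass_dim + acceleration  # L M T^-2

# Dimension one: all exponents are zero (e.g. m/m, g/g)
is_dimension_one <- function(d) all(d == 0)

# Render a dimension in the L^alpha M^beta ... notation, omitting zeros
format_dim <- function(d) {
  d <- d[d != 0]
  if (length(d) == 0) return("1")
  paste0(names(d), ifelse(d == 1, "", paste0("^", d)), collapse = " ")
}

format_dim(force)                          # "L M T^-2"
is_dimension_one(length_dim - length_dim)  # TRUE
is_dimension_one(velocity)                 # FALSE
```

The sketch tracks dimensions only; unit libraries such as udunits work with units, and so also keep track of scale factors (km versus m) that this toy ignores.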
shall
Examples:
R package units (which uses udunits):

```r
suppressPackageStartupMessages(library(units))
(a = set_units(1, m/s))
## 1 m/s
(b = set_units(1, km/h))
## 1 km/h
a + b
## 1.277778 m/s
b + a
## 4.6 km/h
a * b
## 1 km*m/h/s
(c = set_units(10, kg))
## 10 kg
a + c # You can't be serious...
## Error: cannot convert kg into m/s
```
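Applied to the speed example from the motivation slide, the same mechanism gives a meaningful sum. This is a sketch that assumes udunits resolves the symbol `mile` to the international mile (1609.344 m); the variable names are illustrative.

```r
library(units)

# The speed example from the motivation slide, now with units attached
# (assumes "mile" is the international mile, 1609.344 m, in udunits)
speed1 <- set_units(55, mile/h)
speed2 <- set_units(34.5, km/h)

# speed2 is converted to mile/h before the addition: about 76.44 mile/h
(s <- speed1 + speed2)

# Adding a mass to a speed raises an error instead of returning a number
msg <- tryCatch(speed1 + set_units(10, kg), error = conditionMessage)
msg
```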
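Quantities of dimension one are a weak spot: a ratio of masses, a bare number, and a ratio of lengths can all be added without complaint:

```r
a = set_units(15, g/g)
(b = set_units(33)) # unitless
## 33 1
a + b
## 48 1
c = set_units(12, m/m) # or rad
a + c
## 27 1
```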
David Flater proposes to extend unitless units so that they keep track of which units were cancelled out, or of what is being counted, so that such cases can be caught:
```r
install_symbolic_unit("apples")
install_symbolic_unit("oranges")
install_conversion_constant("apples", "oranges", 1.5)
set_units(5, oranges) + set_units(5, apples)
## 12.5 oranges
```
(but it is not trivial to get right).
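To make the idea concrete, here is a toy base-R sketch. It is neither Flater's actual proposal nor part of the units package; the class name `tagged` and its tag strings are hypothetical. Each dimensionless value carries a tag recording what cancelled out (or what it counts), and addition refuses values whose tags differ.

```r
# Hypothetical toy class: a dimensionless value tagged with the unit that
# cancelled out ("g/g", "m/m") or with what it counts ("apples")
tagged <- function(x, tag) structure(x, tag = tag, class = "tagged")

# Addition is only allowed between values carrying the same tag;
# a bare number has no tag and is rejected as well
"+.tagged" <- function(e1, e2) {
  if (missing(e2)) return(e1)  # unary plus
  t1 <- attr(e1, "tag")
  t2 <- attr(e2, "tag")
  if (!identical(t1, t2))
    stop("cannot add quantities tagged '", t1, "' and '", t2, "'")
  tagged(as.numeric(e1) + as.numeric(e2), t1)
}

# Print the number followed by its tag in brackets
print.tagged <- function(x, ...) {
  cat(as.numeric(x), paste0("[", attr(x, "tag"), "]"), "\n")
  invisible(x)
}

a <- tagged(15, "g/g")
a + tagged(33, "g/g")        # 48 [g/g]
try(a + tagged(12, "m/m"))   # error: a mass ratio plus a length ratio
```

The sketch only covers addition; multiplication and division, where tags would have to be combined rather than matched, are left out.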
```r
library(sf)
## Linking to GEOS 3.5.1, GDAL 2.2.1, proj.4 4.9.3
demo(nc, echo = FALSE, ask = FALSE)
## Reading layer `nc.gpkg' from data source `/home/edzer/R/x86_64-pc-linux-gnu-library/3.4/sf/gpkg/nc.gpkg' using driver `GPKG'
## Simple feature collection with 100 features and 14 fields
## Attribute-geometry relationship: 0 constant, 8 aggregate, 6 identity
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID): 4267
## proj4string: +proj=longlat +datum=NAD27 +no_defs
nc[1:2,] %>% st_transform(2264) %>% st_area # NC state plane, us_ft
## Units: US_survey_foot^2
## [1] 12244955726 6578960164
nc[1:2,] %>% st_transform(2264) %>% st_area %>% set_units(m^2)
## Units: m^2
## [1] 1137598162 611207844
st_area(nc[1:2,]) # ellipsoidal surface, NAD27
## Units: m^2
## [1] 1137388604 611077263
```
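The unit conversion of the two planar areas can be checked by hand: the US survey foot is defined as exactly 1200/3937 m, so converting square feet to square metres multiplies by the square of that factor. A base-R sketch using the rounded values printed above:

```r
# Planar areas in US survey square feet, as printed above (rounded)
area_ft2 <- c(12244955726, 6578960164)

# The US survey foot is defined as exactly 1200/3937 m
us_survey_foot_m <- 1200 / 3937

# An area conversion uses the square of the length conversion factor
area_m2 <- area_ft2 * us_survey_foot_m^2
round(area_m2)   # 1137598162 611207844
```

Compared with the ellipsoidal NAD27 areas, the state plane areas are larger by 209558 m² and 130581 m², or roughly 0.02% in both cases.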
```r
nc <- nc %>% st_transform(2264)
g = st_make_grid(nc, n = c(20,10))
plot(st_geometry(nc), border = "#ff5555", lwd = 2)
plot(g, add = TRUE, border = "#0000bb")
```
```r
st_agr(nc) = c("BIR74" = "constant")
a1 = st_interpolate_aw(nc["BIR74"], g, extensive = FALSE)
sum(a1$BIR74) / sum(nc$BIR74) # not close to one: treated as spatially intensive
## [1] 1.191945
a2 = st_interpolate_aw(nc["BIR74"], g, extensive = TRUE)
sum(a2$BIR74) / sum(nc$BIR74) # treated as spatially extensive: total preserved
## [1] 1
```
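Why the two ratios differ can be seen in a toy base-R sketch of area-weighted interpolation. It follows the usual textbook definitions rather than sf's implementation, uses intervals on a line in place of polygons (so "area" is length), and the data are made up. Extensive values are split over the targets in proportion to the share of each source falling inside them, which preserves the total; intensive values become overlap-weighted means, which need not sum to the source total, as with `extensive = FALSE` above.

```r
# Hypothetical toy data: two source intervals carrying a value each,
# and three target intervals covering the same range [0, 4]
src <- data.frame(lo = c(0, 2), hi = c(2, 4), value = c(10, 30))
tgt <- data.frame(lo = c(0, 1, 3), hi = c(1, 3, 4))

# overlap[i, j]: length of the intersection of source i and target j
overlap <- outer(seq_len(nrow(src)), seq_len(nrow(tgt)), function(i, j)
  pmax(0, pmin(src$hi[i], tgt$hi[j]) - pmax(src$lo[i], tgt$lo[j])))

src_size <- src$hi - src$lo

# Extensive (e.g. counts): each source value is split in proportion to the
# fraction of the source inside each target, then summed per target
extensive <- colSums(src$value * overlap / src_size)

# Intensive (e.g. densities): overlap-weighted mean of the source values
# covering each target
intensive <- colSums(src$value * overlap) / colSums(overlap)

extensive                        # 5 20 15
intensive                        # 10 20 30
sum(extensive) / sum(src$value)  # 1
sum(intensive) / sum(src$value)  # 1.5
```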
This is only relevant when values are distributed over geometries, so for 1- or 2-dimensional, flat geometries.
Speculating:
Height is again extensive when measured along vertical geometries.
```r
st_agr(nc)
## AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO
## aggregate aggregate identity identity identity identity identity
## CRESS_ID BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79
## identity constant aggregate aggregate aggregate aggregate aggregate
## Levels: constant aggregate identity
```
`agr`: attribute-geometry relationship, one of `constant`, `aggregate`, or `identity`. `st_aggregate` and the sf method of `summarise` set the `agr` field to `aggregate`.
```r
pt = st_sfc(st_point(c(1260982, 994957)), crs = st_crs(nc))
x = st_intersection(nc["BIR79"], pt)
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
y = st_intersection(nc["BIR74"], pt) # forged: BIR74 was declared "constant" above, so no warning
z = st_intersection(nc["NAME"], pt)
```