f = "sentinel/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip"
granule = system.file(file = f, package = "starsdata")
file.size(granule)
# [1] 769462647
base_name = strsplit(basename(granule), ".zip")[[1]]
s2 = paste0("SENTINEL2_L1C:/vsizip/", granule, "/", base_name,
".SAFE/MTD_MSIL1C.xml:10m:EPSG_32632")
library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
(p = read_stars(s2, proxy = TRUE))
# stars_proxy object with 1 attribute in 1 file(s):
# $EPSG_32632
# [1] "[...]/MTD_MSIL1C.xml:10m:EPSG_32632"
#
# dimension(s):
# from to offset delta refsys values x/y
# x 1 10980 3e+05 10 WGS 84 / UTM z... NULL [x]
# y 1 10980 6e+06 -10 WGS 84 / UTM z... NULL [y]
# band 1 4 NA NA NA B4,...,B8
st_get_dimension_values(p, "band")
# [1] "B4" "B3" "B2" "B8"
9 Large datasets
Exercise 9.1
For the S2 image (above), find out in which order the bands are using st_get_dimension_values()
, and try to find out (e.g. by internet search) which spectral bands / colors they correspond to.
Exercise 9.2
Compute NDVI for the S2 image, using st_apply
and an an appropriate ndvi
function. Plot the result to screen, and then write the result to a GeoTIFF. Explain the difference in runtime between plotting and writing.
Alternatively, one could use
ndvi_slow = function(x) (x[4]-x[1])/(x[4]+x[1])
but that is much less efficient (called for every pixel, rather than chunked).
Write to a tiff:
system.time(write_stars(ndvi, "ndvi.tif"))
# ====================================================================
# user system elapsed
# 114.312 3.273 39.206
The runtime difference is caused by the fact that plot
downsamples, so computes a very small fraction of the available pixels, where write_stars
computes all pixels, and then writes them, at full (original) resolution.
Exercise 9.3
Plot an RGB composite of the S2 image, using the rgb
argument to plot()
, and then by using st_rgb()
first.
plot(p, rgb = 1:3)
# downsample set to 18
# plot(st_rgb(p[,,,1:3], maxColorValue=13600)) # FIXME: fails
Exercise 9.4
select five random points from the bounding box of S2
, and extract the band values at these points. What is the class of the object returned? Convert the object returned to an sf
object.
pts = p %>% st_bbox() %>% st_as_sfc() %>% st_sample(5)
(p5 = st_extract(p, pts))
# stars object with 2 dimensions and 1 attribute
# attribute(s):
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# EPSG_32632 1705 2341.5 2916 3064.15 3838 4571
# dimension(s):
# from to refsys point
# geometry 1 5 WGS 84 / UTM z... TRUE
# band 1 4 NA NA
# values
# geometry POINT (307965....,...,POINT (310020 ...
# band B4,...,B8
class(p5)
# [1] "stars"
st_as_sf(p5)
# Simple feature collection with 5 features and 4 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 307965.9 ymin: 5905282 xmax: 363141.2 ymax: 5980653
# Projected CRS: WGS 84 / UTM zone 32N
# B4 B3 B2 B8 geometry
# 1 2850 2902 3383 2749 POINT (307965.9 5971317)
# 2 4194 3834 4177 4571 POINT (356964.1 5917001)
# 3 1705 1942 2402 1796 POINT (363141.2 5980653)
# 4 3850 3616 4036 3735 POINT (344725.6 5972851)
# 5 2014 2160 2437 2930 POINT (310020 5905282)
Exercise 9.5
For the 10 km radius circle around POINT(390000 5940000)
, compute the mean pixel values of the S2 image when downsampling the images with factor 30, and on the original resolution. Compute the relative difference between the results.
For the full resolution, this takes a while:
system.time(a2 <- aggregate(p, b, mean))
# user system elapsed
# 42.510 1.499 39.449
Relative differences: we will work on the array of the stars objects:
(a1[[1]] - a2[[1]])/((a1[[1]]+a2[[1]])/2)
# [,1] [,2] [,3] [,4]
# [1,] 0.001055567 0.0009431265 0.001103254 7.781836e-05
Alternatively one could convert a1
and a2
to a data.frame
, using as.data.frame
, and work on the third column of the data frames.
Exercise 9.6
Use hist
to compute the histogram on the downsampled S2 image. Also do this for each of the bands. Use ggplot2
to compute a single plot with all four histograms in facets.
hist(p1)
hist(p1[,,,1])
hist(p1[,,,2])
hist(p1[,,,3])
hist(p1[,,,4])
library(ggplot2)
ggplot(as.data.frame(p1), aes(x = EPSG_32632)) +
geom_histogram() + facet_wrap(~band)
# `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
Exercise 9.7
Use st_crop
to crop the S2 image to the area covered by the 10 km circle. Plot the results. Explore the effect of setting argument crop = FALSE
Exercise 9.8
With the downsampled image, compute the logical layer where all four bands have pixel values higher than 1000. Use a raster algebra expression on the four bands (use split
first), or use st_apply
for this.
alternative, using st_apply
on the band dimension