In-class Exercise 7b: Geospatial Analysis using R

Author

Vanessa Heng

Published

March 9, 2024

Modified

March 16, 2024

1 Overview

This exercise is to give us more practices on handling geospatial data.

2 Getting started

2.1 Installing and loading the packages

For this exercise, the following R packages will be used:

  • sf for handling geospatial data.
  • viridis: colour library
  • terra is a replacement for the raster package. It has a very similar, but simpler, interface, and it is faster than raster. In this hands-on exercise, it will be used to create grid (also known as raster) objects as the input and output of spatial interpolation.
  • gstat, an R package for spatial and spatio-temporal geostatistical modelling, prediction and simulation. In this in-class exercise, it will be used to perform spatial interpolation.
  • automap, an r package for performing automatic variogram modelling and kriging interpolation.
pacman::p_load(sf, terra, gstat, tmap, viridis, tidyverse, automap)

2.2 Load data

The datasets used are:

  • RainfallStation.csv provides location information on existing rainfall stations in Singapore. The data is downloaded from Meteorological Service Singapore.

  • DAILYDATA_202402.csv provides weather data are rainfall stations for February 2024. The data is also downloaded from Meteorological Service Singapore.

  • MPSZ-2019 contains the planning subzone boundary of URA Master Plan 2019. It is downloaded from data.gov.sg. The original data is in kml format.

Importing rainfall station data

rfstations <- read_csv("data/aspatial/RainfallStation.csv")

Importing rainfall record data

We sum the total rainfall by month.

rfdata <- read_csv("data/aspatial/DAILYDATA_202402.csv") %>% 
   select (c(1,5)) %>%  #select column 1 and 5 which is station and total rainfall
   group_by(Station) %>% 
   summarise(MONTHSUM = sum(`Daily Rainfall Total (mm)`)) %>% 
   ungroup()

Converting aspatial data into geospatial data

Join these 2 datasets using left-join so that we only join the latitude and longitude of the stations that have rainfall values.

rfdata <- rfdata %>% 
   left_join(rfstations)

Convert it to the projected coordinate system using crs = 3414.

rfdata_sf <- st_as_sf(rfdata, 
                      coords = c("Longitude",
                                 "Latitude"),
                      crs = 4326) %>% 
   st_transform(crs = 3414)
Note
  • For coords argument, it is important to map the X (i.e. Longitude) first, then follow by the Y (i.e. Latitude).

  • crs = 4326 indicates that the source data is in the wgs84 coordinates system.

  • st_transform() of sf package is then used to transform the source data from wgs84 to svy21 projected coordinates system.

  • svy21 is the official projected coordinates of Singapore. 3414 is the EPSG code of svy21.

Importing planning subzone boundary data

mpsz2019 <-st_read(dsn = "data/geospatial/", 
                  layer = "MPSZ-2019") %>% 
   st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `C:\Vanessa\SMU\Term 4 - Visual Analytics & Applications\mvheng\ISSS608-VAA\In-class_Ex\In-class_Ex07\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

Visualising the data

tmap_options(check.and.fix = TRUE)
tmap_mode("view")
tm_shape(mpsz2019) +
   tm_borders() +
   tm_shape(rfdata_sf) +
   tm_dots(col = 'MONTHSUM')
tmap_mode("plot")

3 Spatial Interpolation: gstat method

In this section, we perform spatial interpolation by using gstat package. We need to create an object of class called gstat, using a function of the same name: gstat. A gstat object contains all the necessary information to conduct spatial interpolation, namely:

  • The model definition

  • The calibration data

Based on its arguments, the gstat function “understands” what type of interpolation model we want to use:

  • No variogram model → IDW

  • Variogram model, no covariates → Ordinary Kriging

  • Variogram model, with covariates → Universal Kriging

3.1 Data Preparation

We need to create a grid data object by using rast() of terra package. Note that the nrows and ncols are calculated depending on the resolution we want. (using bounding box (xmax - xmin)/ resolution))

grid <- terra::rast(mpsz2019,
                    nrows = 690,
                    ncols = 1075)

xy <- terra::xyFromCell(grid, 
                        1:ncell(grid))
Note

xyFromCell() gets coordinates of the centre of raster cells for a row, column, or cell number of a SpatRaster. Or get row, column, or cell numbers from coordinates or each other.

Lastly, we will create a data frame called coop with prediction/simulation locations.

coop <- st_as_sf(as.data.frame(xy), 
                 coords = c("x", "y"),
                 crs = st_crs(mpsz2019))
coop <- st_filter(coop, mpsz2019)
head(coop)
Simple feature collection with 6 features and 0 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 25883.42 ymin: 50231.33 xmax: 26133.32 ymax: 50231.33
Projected CRS: SVY21 / Singapore TM
                   geometry
1 POINT (25883.42 50231.33)
2  POINT (25933.4 50231.33)
3 POINT (25983.38 50231.33)
4 POINT (26033.36 50231.33)
5 POINT (26083.34 50231.33)
6 POINT (26133.32 50231.33)

3.2 Inverse Distance Weighted (IDW)

In the IDW interpolation method, the sample points are weighted during interpolation such that the influence of one point relative to another declines with distance from the unknown point you want to create.

Weighting is assigned to sample points through the use of a weighting coefficient that controls how the weighting influence will drop off as the distance from new point increases. The greater the weighting coefficient, the less the effect points will have if they are far from the unknown point during the interpolation process. As the coefficient increases, the value of the unknown point approaches the value of the nearest observational point.

It is important to notice that the IDW interpolation method also has some disadvantages: the quality of the interpolation result can decrease, if the distribution of sample data points is uneven. Furthermore, maximum and minimum values in the interpolated surface can only occur at sample data points. This often results in small peaks and pits around the sample data points.

res <- gstat(formula = MONTHSUM ~ 1, 
             locations = rfdata_sf,
             nmax = 5,
             set = list(idp = 0))

Now that our model is defined, we can use predict() to interpolate, i.e., to calculate predicted values. The predict function accepts:

  • A raster—stars object, such as dem

  • A model—gstat object, such as g

The raster serves two purposes:

  • Specifying the locations where we want to make predictions (in all methods), and

  • Specifying covariate values (in Universal Kriging only).

resp <- predict(res, coop)
[inverse distance weighted interpolation]
# inverse distance weighted interpolation
resp$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred

pred <- terra::rasterize(resp, grid, 
                         field = "pred", 
                         fun = "mean")

Now, we will map the interpolated surface by using tmap functions.

tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(pred) + 
  tm_raster(alpha = 0.6, 
            palette = "viridis")

4 Kriging

Kriging is one of several methods that use a limited set of sampled data points to estimate the value of a variable over a continuous spatial field. An example of a value that varies across a random spatial field might be total monthly rainfall over Singapore. It differs from Inverse Distance Weighted Interpolation discussed earlier in that it uses the spatial correlation between sampled points to interpolate the values in the spatial field: the interpolation is based on the spatial arrangement of the empirical observations, rather than on a presumed model of spatial distribution. Kriging also generates estimates of the uncertainty surrounding each interpolated value.

In a general sense, the kriging weights are calculated such that points nearby to the location of interest are given more weight than those farther away. Clustering of points is also taken into account, so that clusters of points are weighted less heavily (in effect, they contain less information than single points). This helps to reduce bias in the predictions.

The kriging predictor is an “optimal linear predictor” and an exact interpolator, meaning that each interpolated value is calculated to minimize the prediction error for that point. The value that is generated from the kriging process for any sampled location will be equal to the observed value at this point, and all the interpolated values will be the Best Linear Unbiased Predictors (BLUPs).

Kriging will in general not be more effective than simpler methods of interpolation if there is little spatial autocorrelation among the sampled data points (that is, if the values do not co-vary in space). If there is at least moderate spatial autocorrelation, however, kriging can be a helpful method to preserve spatial variability that would be lost using a simpler method (for an example, see Auchincloss 2007, below).

Kriging can be understood as a two-step process:

  • The spatial covariance structure of the sampled points is determined by fitting a variogram

  • The weights derived from this covariance structure are used to interpolate values for unsampled points or blocks across the spatial field.

Kriging methods require a variogram model. A variogram (sometimes called a “semivariogram”) is a visual depiction of the covariance exhibited between each pair of points in the sampled data. For each pair of points in the sampled data, the gamma-value or “semi-variance” (a measure of the half mean-squared difference between their values) is plotted against the distance, or “lag”, between them. The “experimental” variogram is the plot of observed values, while the “theoretical” or “model” variogram is the distributional model that best fits the data.

Firstly, we will calculate and examine the empirical variogram by using variogram() of gstat package. The function requires two arguments:

  • formula, the dependent variable and the covariates (same as in gstat, see Section 12.2.1)

  • data, a point layer with the dependent variable and covariates as attributes

v <- variogram(MONTHSUM ~ 1,
               data = rfdata_sf) 
plot(v)

With reference to the comparison above, am empirical variogram model will be fitted by using fit.variogram() of gstat package.

fv <- fit.variogram(object = v,
                    model = vgm(
                      psill = 0.5, 
                      model = "Sph",
                      range = 5000, 
                      nugget = 0.1))
fv
  model     psill    range
1   Nug 0.1129190    0.000
2   Sph 0.5292397 5213.396

We can visualise how well the observed data fit the model by plotting fv using the code chunk below.

plot(v, fv)

The plot above reveals that the empirical model fits rather well. In view of this, we will go ahead to perform spatial interpolation by using the newly derived model as shown in the code chunk below.

k <- gstat(formula = MONTHSUM ~ 1, 
           data = rfdata_sf, 
           model = fv)
k
data:
var1 : formula = MONTHSUM`~`1 ; data dim = 43 x 2
variograms:
        model     psill    range
var1[1]   Nug 0.1129190    0.000
var1[2]   Sph 0.5292397 5213.396

Now we use predict() of gstat package will be used to estimate the unknown grids by using the code chunk below.

resp <- predict(k, coop)
[using ordinary kriging]
#using ordinary kriging
resp$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
resp$pred <- resp$pred
resp
Simple feature collection with 314019 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2692.528 ymin: 15773.73 xmax: 56371.45 ymax: 50231.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
   var1.pred  var1.var                  geometry        x        y     pred
1   131.0667 0.6608399 POINT (25883.42 50231.33) 25883.42 50231.33 131.0667
2   130.9986 0.6610337  POINT (25933.4 50231.33) 25933.40 50231.33 130.9986
3   130.9330 0.6612129 POINT (25983.38 50231.33) 25983.38 50231.33 130.9330
4   130.8698 0.6613782 POINT (26033.36 50231.33) 26033.36 50231.33 130.8698
5   130.8092 0.6615303 POINT (26083.34 50231.33) 26083.34 50231.33 130.8092
6   130.7514 0.6616697 POINT (26133.32 50231.33) 26133.32 50231.33 130.7514
7   130.6965 0.6617971  POINT (26183.3 50231.33) 26183.30 50231.33 130.6965
8   130.6446 0.6619131 POINT (26233.28 50231.33) 26233.28 50231.33 130.6446
9   130.5958 0.6620184 POINT (26283.26 50231.33) 26283.26 50231.33 130.5958
10  132.5484 0.6542154 POINT (25033.76 50181.32) 25033.76 50181.32 132.5484

In order to create a raster surface data object, rasterize() of terra is used as shown in the code chunk below.

kpred <- terra::rasterize(resp, grid, 
                         field = "pred")
kpred
class       : SpatRaster 
dimensions  : 690, 1075, 1  (nrow, ncol, nlyr)
resolution  : 49.98037, 50.01103  (x, y)
extent      : 2667.538, 56396.44, 15748.72, 50256.33  (xmin, xmax, ymin, ymax)
coord. ref. : SVY21 / Singapore TM (EPSG:3414) 
source(s)   : memory
name        :      last 
min value   :  72.77826 
max value   : 195.53284 

4.1 Mapping the interpolated rainfall raster

Finally, tmap functions are used to map the interpolated rainfall raster (i.e. kpred) by using the code chunk below.

tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(kpred) + 
  tm_raster(alpha = 0.6, 
            palette = "viridis",
            title = "Total monthly rainfall (mm)") +
  tm_layout(main.title = "Distribution of monthly rainfall, Feb 2024",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha =0.2)

4.2 Automatic variogram modelling

Beside using gstat to perform variogram modelling manually, autofirVariogram() of automap package can be used to perform varigram modelling as shown in the code chunk below.

v_auto <- autofitVariogram(MONTHSUM ~ 1, 
                           rfdata_sf)
plot(v_auto)

v_auto
$exp_var
   np      dist     gamma dir.hor dir.ver   id
1  15  1957.436  311.9613       0       0 var1
2  33  3307.349  707.7685       0       0 var1
3  54  4861.368  848.1314       0       0 var1
4 116  6716.531  730.3969       0       0 var1
5 111  9235.708 1006.5381       0       0 var1
6 120 11730.199 1167.5988       0       0 var1
7 135 14384.636 1533.5903       0       0 var1

$var_model
  model    psill   range kappa
1   Nug     0.00       0   0.0
2   Ste 24100.71 1647955   0.3

$sserr
[1] 0.2178294

attr(,"class")
[1] "autofitVariogram" "list"            
k <- gstat(formula = MONTHSUM ~ 1, 
           model = v_auto$var_model,
           data = rfdata_sf)
k
data:
var1 : formula = MONTHSUM`~`1 ; data dim = 43 x 2
variograms:
        model    psill   range kappa
var1[1]   Nug     0.00       0   0.0
var1[2]   Ste 24100.71 1647955   0.3
resp <- predict(k, coop)
[using ordinary kriging]
#using ordinary kriging
resp$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
resp$pred <- resp$pred

kpred <- terra::rasterize(resp, grid, 
                         field = "pred")

tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(kpred) + 
  tm_raster(alpha = 0.6, 
            palette = "viridis",
            title = "Total monthly rainfall (mm)") +
  tm_layout(main.title = "Distribution of monthly rainfall, Feb 2024",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha =0.2)