pacman::p_load(sf, terra, gstat, tmap, viridis, tidyverse, automap)In-class Exercise 7b: Geospatial Analysis using R
1 Overview
This exercise is to give us more practices on handling geospatial data.
2 Getting started
2.1 Installing and loading the packages
For this exercise, the following R packages will be used:
- sf for handling geospatial data.
- viridis: colour library
- terra is a replacement for the raster package. It has a very similar, but simpler, interface, and it is faster than raster. In this hands-on exercise, it will be used to create grid (also known as raster) objects as the input and output of spatial interpolation.
- gstat, an R package for spatial and spatio-temporal geostatistical modelling, prediction and simulation. In this in-class exercise, it will be used to perform spatial interpolation.
- automap, an r package for performing automatic variogram modelling and kriging interpolation.
2.2 Load data
The datasets used are:
RainfallStation.csv provides location information on existing rainfall stations in Singapore. The data is downloaded from Meteorological Service Singapore.
DAILYDATA_202402.csv provides weather data are rainfall stations for February 2024. The data is also downloaded from Meteorological Service Singapore.
MPSZ-2019 contains the planning subzone boundary of URA Master Plan 2019. It is downloaded from data.gov.sg. The original data is in kml format.
Importing rainfall station data
rfstations <- read_csv("data/aspatial/RainfallStation.csv")Importing rainfall record data
We sum the total rainfall by month.
rfdata <- read_csv("data/aspatial/DAILYDATA_202402.csv") %>%
select (c(1,5)) %>% #select column 1 and 5 which is station and total rainfall
group_by(Station) %>%
summarise(MONTHSUM = sum(`Daily Rainfall Total (mm)`)) %>%
ungroup()Converting aspatial data into geospatial data
Join these 2 datasets using left-join so that we only join the latitude and longitude of the stations that have rainfall values.
rfdata <- rfdata %>%
left_join(rfstations)Convert it to the projected coordinate system using crs = 3414.
rfdata_sf <- st_as_sf(rfdata,
coords = c("Longitude",
"Latitude"),
crs = 4326) %>%
st_transform(crs = 3414)For
coordsargument, it is important to map the X (i.e. Longitude) first, then follow by the Y (i.e. Latitude).crs = 4326indicates that the source data is in the wgs84 coordinates system.st_transform()of sf package is then used to transform the source data from wgs84 to svy21 projected coordinates system.svy21 is the official projected coordinates of Singapore. 3414 is the EPSG code of svy21.
Importing planning subzone boundary data
mpsz2019 <-st_read(dsn = "data/geospatial/",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`C:\Vanessa\SMU\Term 4 - Visual Analytics & Applications\mvheng\ISSS608-VAA\In-class_Ex\In-class_Ex07\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
Visualising the data
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
tm_shape(mpsz2019) +
tm_borders() +
tm_shape(rfdata_sf) +
tm_dots(col = 'MONTHSUM')tmap_mode("plot")3 Spatial Interpolation: gstat method
In this section, we perform spatial interpolation by using gstat package. We need to create an object of class called gstat, using a function of the same name: gstat. A gstat object contains all the necessary information to conduct spatial interpolation, namely:
The model definition
The calibration data
Based on its arguments, the gstat function “understands” what type of interpolation model we want to use:
No variogram model → IDW
Variogram model, no covariates → Ordinary Kriging
Variogram model, with covariates → Universal Kriging
3.1 Data Preparation
We need to create a grid data object by using rast() of terra package. Note that the nrows and ncols are calculated depending on the resolution we want. (using bounding box (xmax - xmin)/ resolution))
grid <- terra::rast(mpsz2019,
nrows = 690,
ncols = 1075)
xy <- terra::xyFromCell(grid,
1:ncell(grid))xyFromCell() gets coordinates of the centre of raster cells for a row, column, or cell number of a SpatRaster. Or get row, column, or cell numbers from coordinates or each other.
Lastly, we will create a data frame called coop with prediction/simulation locations.
coop <- st_as_sf(as.data.frame(xy),
coords = c("x", "y"),
crs = st_crs(mpsz2019))
coop <- st_filter(coop, mpsz2019)
head(coop)Simple feature collection with 6 features and 0 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 25883.42 ymin: 50231.33 xmax: 26133.32 ymax: 50231.33
Projected CRS: SVY21 / Singapore TM
geometry
1 POINT (25883.42 50231.33)
2 POINT (25933.4 50231.33)
3 POINT (25983.38 50231.33)
4 POINT (26033.36 50231.33)
5 POINT (26083.34 50231.33)
6 POINT (26133.32 50231.33)
3.2 Inverse Distance Weighted (IDW)
In the IDW interpolation method, the sample points are weighted during interpolation such that the influence of one point relative to another declines with distance from the unknown point you want to create.

Weighting is assigned to sample points through the use of a weighting coefficient that controls how the weighting influence will drop off as the distance from new point increases. The greater the weighting coefficient, the less the effect points will have if they are far from the unknown point during the interpolation process. As the coefficient increases, the value of the unknown point approaches the value of the nearest observational point.
It is important to notice that the IDW interpolation method also has some disadvantages: the quality of the interpolation result can decrease, if the distribution of sample data points is uneven. Furthermore, maximum and minimum values in the interpolated surface can only occur at sample data points. This often results in small peaks and pits around the sample data points.
res <- gstat(formula = MONTHSUM ~ 1,
locations = rfdata_sf,
nmax = 5,
set = list(idp = 0))Now that our model is defined, we can use predict() to interpolate, i.e., to calculate predicted values. The predict function accepts:
A raster—stars object, such as dem
A model—gstat object, such as g
The raster serves two purposes:
Specifying the locations where we want to make predictions (in all methods), and
Specifying covariate values (in Universal Kriging only).
resp <- predict(res, coop)[inverse distance weighted interpolation]
# inverse distance weighted interpolation
resp$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
pred <- terra::rasterize(resp, grid,
field = "pred",
fun = "mean")Now, we will map the interpolated surface by using tmap functions.
tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(pred) +
tm_raster(alpha = 0.6,
palette = "viridis")
4 Kriging
Kriging is one of several methods that use a limited set of sampled data points to estimate the value of a variable over a continuous spatial field. An example of a value that varies across a random spatial field might be total monthly rainfall over Singapore. It differs from Inverse Distance Weighted Interpolation discussed earlier in that it uses the spatial correlation between sampled points to interpolate the values in the spatial field: the interpolation is based on the spatial arrangement of the empirical observations, rather than on a presumed model of spatial distribution. Kriging also generates estimates of the uncertainty surrounding each interpolated value.
In a general sense, the kriging weights are calculated such that points nearby to the location of interest are given more weight than those farther away. Clustering of points is also taken into account, so that clusters of points are weighted less heavily (in effect, they contain less information than single points). This helps to reduce bias in the predictions.
The kriging predictor is an “optimal linear predictor” and an exact interpolator, meaning that each interpolated value is calculated to minimize the prediction error for that point. The value that is generated from the kriging process for any sampled location will be equal to the observed value at this point, and all the interpolated values will be the Best Linear Unbiased Predictors (BLUPs).
Kriging will in general not be more effective than simpler methods of interpolation if there is little spatial autocorrelation among the sampled data points (that is, if the values do not co-vary in space). If there is at least moderate spatial autocorrelation, however, kriging can be a helpful method to preserve spatial variability that would be lost using a simpler method (for an example, see Auchincloss 2007, below).
Kriging can be understood as a two-step process:
The spatial covariance structure of the sampled points is determined by fitting a variogram
The weights derived from this covariance structure are used to interpolate values for unsampled points or blocks across the spatial field.
Kriging methods require a variogram model. A variogram (sometimes called a “semivariogram”) is a visual depiction of the covariance exhibited between each pair of points in the sampled data. For each pair of points in the sampled data, the gamma-value or “semi-variance” (a measure of the half mean-squared difference between their values) is plotted against the distance, or “lag”, between them. The “experimental” variogram is the plot of observed values, while the “theoretical” or “model” variogram is the distributional model that best fits the data.

Firstly, we will calculate and examine the empirical variogram by using variogram() of gstat package. The function requires two arguments:
formula, the dependent variable and the covariates (same as in gstat, see Section 12.2.1)
data, a point layer with the dependent variable and covariates as attributes
v <- variogram(MONTHSUM ~ 1,
data = rfdata_sf)
plot(v)
With reference to the comparison above, am empirical variogram model will be fitted by using fit.variogram() of gstat package.
fv <- fit.variogram(object = v,
model = vgm(
psill = 0.5,
model = "Sph",
range = 5000,
nugget = 0.1))
fv model psill range
1 Nug 0.1129190 0.000
2 Sph 0.5292397 5213.396
We can visualise how well the observed data fit the model by plotting fv using the code chunk below.
plot(v, fv)
The plot above reveals that the empirical model fits rather well. In view of this, we will go ahead to perform spatial interpolation by using the newly derived model as shown in the code chunk below.
k <- gstat(formula = MONTHSUM ~ 1,
data = rfdata_sf,
model = fv)
kdata:
var1 : formula = MONTHSUM`~`1 ; data dim = 43 x 2
variograms:
model psill range
var1[1] Nug 0.1129190 0.000
var1[2] Sph 0.5292397 5213.396
Now we use predict() of gstat package will be used to estimate the unknown grids by using the code chunk below.
resp <- predict(k, coop)[using ordinary kriging]
#using ordinary kriging
resp$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
resp$pred <- resp$pred
respSimple feature collection with 314019 features and 5 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2692.528 ymin: 15773.73 xmax: 56371.45 ymax: 50231.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
var1.pred var1.var geometry x y pred
1 131.0667 0.6608399 POINT (25883.42 50231.33) 25883.42 50231.33 131.0667
2 130.9986 0.6610337 POINT (25933.4 50231.33) 25933.40 50231.33 130.9986
3 130.9330 0.6612129 POINT (25983.38 50231.33) 25983.38 50231.33 130.9330
4 130.8698 0.6613782 POINT (26033.36 50231.33) 26033.36 50231.33 130.8698
5 130.8092 0.6615303 POINT (26083.34 50231.33) 26083.34 50231.33 130.8092
6 130.7514 0.6616697 POINT (26133.32 50231.33) 26133.32 50231.33 130.7514
7 130.6965 0.6617971 POINT (26183.3 50231.33) 26183.30 50231.33 130.6965
8 130.6446 0.6619131 POINT (26233.28 50231.33) 26233.28 50231.33 130.6446
9 130.5958 0.6620184 POINT (26283.26 50231.33) 26283.26 50231.33 130.5958
10 132.5484 0.6542154 POINT (25033.76 50181.32) 25033.76 50181.32 132.5484
In order to create a raster surface data object, rasterize() of terra is used as shown in the code chunk below.
kpred <- terra::rasterize(resp, grid,
field = "pred")
kpredclass : SpatRaster
dimensions : 690, 1075, 1 (nrow, ncol, nlyr)
resolution : 49.98037, 50.01103 (x, y)
extent : 2667.538, 56396.44, 15748.72, 50256.33 (xmin, xmax, ymin, ymax)
coord. ref. : SVY21 / Singapore TM (EPSG:3414)
source(s) : memory
name : last
min value : 72.77826
max value : 195.53284
4.1 Mapping the interpolated rainfall raster
Finally, tmap functions are used to map the interpolated rainfall raster (i.e. kpred) by using the code chunk below.
tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(kpred) +
tm_raster(alpha = 0.6,
palette = "viridis",
title = "Total monthly rainfall (mm)") +
tm_layout(main.title = "Distribution of monthly rainfall, Feb 2024",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2)
4.2 Automatic variogram modelling
Beside using gstat to perform variogram modelling manually, autofirVariogram() of automap package can be used to perform varigram modelling as shown in the code chunk below.
v_auto <- autofitVariogram(MONTHSUM ~ 1,
rfdata_sf)
plot(v_auto)
v_auto$exp_var
np dist gamma dir.hor dir.ver id
1 15 1957.436 311.9613 0 0 var1
2 33 3307.349 707.7685 0 0 var1
3 54 4861.368 848.1314 0 0 var1
4 116 6716.531 730.3969 0 0 var1
5 111 9235.708 1006.5381 0 0 var1
6 120 11730.199 1167.5988 0 0 var1
7 135 14384.636 1533.5903 0 0 var1
$var_model
model psill range kappa
1 Nug 0.00 0 0.0
2 Ste 24100.71 1647955 0.3
$sserr
[1] 0.2178294
attr(,"class")
[1] "autofitVariogram" "list"
k <- gstat(formula = MONTHSUM ~ 1,
model = v_auto$var_model,
data = rfdata_sf)
kdata:
var1 : formula = MONTHSUM`~`1 ; data dim = 43 x 2
variograms:
model psill range kappa
var1[1] Nug 0.00 0 0.0
var1[2] Ste 24100.71 1647955 0.3
resp <- predict(k, coop)[using ordinary kriging]
#using ordinary kriging
resp$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
resp$pred <- resp$pred
kpred <- terra::rasterize(resp, grid,
field = "pred")
tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(kpred) +
tm_raster(alpha = 0.6,
palette = "viridis",
title = "Total monthly rainfall (mm)") +
tm_layout(main.title = "Distribution of monthly rainfall, Feb 2024",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2)