Take-home Exercise 3: Be Weatherwise Or Otherwise

Author

Vanessa Heng

Published

February 8, 2024

Modified

March 1, 2024

1. Overview

According to a report by the Ministry of Sustainability and the Environment (MSE), the infographic below indicates that:

  • From 1948 to 2016, the annual mean temperatures rose at an average rate of 0.25 °C per decade. The daily mean temperatures are projected to increase by 1.4 °C to 4.6 °C.

  • From 1980 to 2016, annual total rainfall rose at an average rate of 101 mm per decade. The contrast between the wet months (November to January) and dry months (February and June to September) is likely to be more pronounced.

The following figure, taken from the Meteorological Service Singapore (MSS) website, shows the mean monthly temperature variation (°C) from 1991 to 2020 at Changi Climate Station.

Compared to countries in temperate regions, Singapore's temperatures vary little from month to month. Night-time minimum temperatures usually do not fall below 23-25°C, and daytime maximum temperatures usually do not rise above 31-33°C. May has the highest average monthly temperature (24-hour mean of 28.6°C), while December and January are the coolest (24-hour mean of 26.8°C).

As visual analytics greenhorns, we will apply our newly acquired interactive visualisation and uncertainty visualisation techniques to validate the claims above.

1.1 The Task

In this exercise, we are required to:

  • Select a weather station and download historical daily temperature or rainfall data from the Meteorological Service Singapore (MSS) website,

  • Select either the daily temperature or rainfall records for a month in the years 1983, 1993, 2003, 2013 and 2023, and create an analytics-driven data visualisation,

  • Apply appropriate interactive techniques to enhance the user experience in data discovery and/or visual story-telling.

2. Data Preparation

2.1 Installing R packages

The code below uses p_load() from the pacman package to load all the required packages into the R session, first installing any that are not yet installed.

pacman::p_load(tidyverse, ggstatsplot, plotly, ggplot2, ggdist)
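For readers who prefer not to depend on pacman, the same check, install and load pattern can be sketched in base R. This is an approximation written for illustration, not pacman's own code, and `load_packages()` is a hypothetical helper name.

```r
# Hypothetical helper approximating pacman::p_load():
# install any packages that are missing, then attach all of them.
load_packages <- function(pkgs) {
  # requireNamespace() returns FALSE when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  # Attach every package; character.only = TRUE lets us pass names as strings
  invisible(vapply(pkgs, function(p) {
    suppressPackageStartupMessages(library(p, character.only = TRUE))
    TRUE
  }, logical(1)))
}

# Usage with the same package list as above:
# load_packages(c("tidyverse", "ggstatsplot", "plotly", "ggplot2", "ggdist"))
```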

2.2 Importing data

The MSS website only allows one month of data for a selected climate station to be downloaded at a time. I therefore wrote a robotic process automation (RPA) bot in UiPath to download all the monthly data recorded at Changi climate station, the oldest climate station near Changi Airport, and then combine the monthly CSV files into a single CSV file.

weather <- read_csv("data/Changi.csv")
glimpse(weather)
Rows: 16,071
Columns: 13
$ Station                     <chr> "Changi", "Changi", "Changi", "Changi", "C…
$ Year                        <dbl> 1980, 1980, 1980, 1980, 1980, 1980, 1980, …
$ Month                       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ Day                         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
$ `Daily Rainfall Total (mm)` <dbl> 0.0, 0.0, 0.0, 0.0, 8.0, 9.1, 7.9, 0.0, 0.…
$ `Daily Rainfall Total`      <chr> "�", "�", "�", "�", "�", "�", "�", "�", "�…
$ `Highest 30 Min Rainfall`   <chr> "�", "�", "�", "�", "�", "�", "�", "�", "�…
$ `Highest 120 Min Rainfall`  <chr> "�", "�", "�", "�", "�", "�", "�", "�", "�…
$ `Mean Temperature`          <chr> "�", "�", "�", "�", "�", "�", "�", "�", "�…
$ `Maximum Temperature`       <chr> "�", "�", "�", "�", "�", "�", "�", "�", "�…
$ `Minimum Temperature`       <chr> "�", "�", "�", "�", "�", "�", "�", "�", "�…
$ `Mean Wind Speed (km/h)`    <chr> "�", "�", "�", "�", "�", "�", "�", "�", "�…
$ `Max Wind Speed (km/h)`     <chr> "�", "�", "�", "�", "�", "�", "�", "�", "�…
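The UiPath bot itself is not shown. If the monthly CSV files have already been downloaded into one folder, the combining step could be sketched in base R as below. The folder layout and the helper name `combine_monthly_csvs()` are assumptions, and the sketch assumes every monthly file has identical column headers, since `rbind()` stops with an error when they differ.

```r
# Hypothetical sketch: stack monthly CSV files from one folder and
# write them out as a single CSV. Folder and file names are assumptions.
combine_monthly_csvs <- function(in_dir, out_file) {
  files <- sort(list.files(in_dir, pattern = "\\.csv$", full.names = TRUE))
  # check.names = FALSE keeps headers such as "Daily Rainfall Total (mm)"
  monthly <- lapply(files, read.csv, check.names = FALSE)
  combined <- do.call(rbind, monthly)   # requires identical column names
  write.csv(combined, out_file, row.names = FALSE)
  invisible(combined)
}

# Usage (hypothetical paths):
# combine_monthly_csvs("data/monthly", "data/Changi.csv")
```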

2.3 Change data type

This exercise focuses on how temperature has changed over the years, so we keep only the date and temperature columns and shorten the temperature column names for easy reference.

“Year”, “Month” and “Day” are stored as doubles, whereas “Mean Temperature”, “Maximum Temperature” and “Minimum Temperature” are stored as characters, because missing values are recorded with a non-numeric placeholder (which appears garbled in the output above). Neither type suits these variables.

Let’s convert “Year” and “Day” to integers and the temperature variables to numeric values; the non-numeric placeholders become NA.

As for the values in the column “Month”, they will be replaced by the abbreviation of the respective months.

We remove the data records for years 1980 and 1981 as there are no temperature records.

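# Keep the date columns and the three temperature columns,
# renaming the temperature columns for easier reference
weather <- weather %>% 
  select(Year, Month, Day,
         MeanTemp = `Mean Temperature`,
         MaxTemp  = `Maximum Temperature`,
         MinTemp  = `Minimum Temperature`)

# Fix the data types; non-numeric placeholders for missing
# temperatures become NA (with a coercion warning)
weather <- weather %>% 
  mutate(Year     = as.integer(Year),
         Month    = month.abb[Month],   # 1 -> "Jan", ..., 12 -> "Dec"
         Day      = as.integer(Day),
         MeanTemp = as.numeric(MeanTemp),
         MaxTemp  = as.numeric(MaxTemp),
         MinTemp  = as.numeric(MinTemp))

# Drop 1980 and 1981, which have no temperature records
weather <- weather %>% 
  filter(!Year %in% c(1980, 1981))
glimpse(weather)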
Rows: 15,340
Columns: 6
$ Year     <int> 1982, 1982, 1982, 1982, 1982, 1982, 1982, 1982, 1982, 1982, 1…
$ Month    <chr> "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan"…
$ Day      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ MeanTemp <dbl> 25.3, 24.7, 25.7, 26.3, 25.8, 23.7, 23.7, 24.4, 25.4, 25.6, 2…
$ MaxTemp  <dbl> 29.4, 26.2, 27.2, 29.8, 28.8, 24.9, 25.2, 27.6, 28.3, 29.3, 2…
$ MinTemp  <dbl> 23.0, 23.5, 24.0, 24.1, 23.5, 21.9, 22.4, 22.8, 23.5, 23.2, 2…
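Two details of the conversion above can be checked in isolation: indexing the built-in `month.abb` vector with month numbers yields English month abbreviations, and `as.numeric()` turns a non-numeric missing-value placeholder into `NA`. The `"\uFFFD"` string below is only a stand-in for the garbled marker in the `glimpse()` output, not the exact character MSS uses.

```r
# month.abb is a built-in vector of English month abbreviations,
# so indexing it with month numbers maps 1..12 to "Jan".."Dec"
month_codes <- c(1, 6, 12)
month.abb[month_codes]
#> [1] "Jan" "Jun" "Dec"

# as.numeric() converts any non-numeric placeholder to NA (normally with a
# warning, suppressed here); "\uFFFD" stands in for the garbled marker
temps <- suppressWarnings(as.numeric(c("25.3", "\uFFFD", "24.7")))
temps   # 25.3, then NA, then 24.7
```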

2.4 Filter data

For this exercise, we will focus on the daily temperature records in the years 1983, 1993, 2003, 2013 and 2023. Hence, we will filter the data rows by the years.

weather_data <- weather %>% 
          filter(Year %in% c(1983, 1993, 2003, 2013, 2023))
colSums(is.na(weather_data))
    Year    Month      Day MeanTemp  MaxTemp  MinTemp 
       0        0        0        0        0        0 
summary(weather_data)
      Year         Month                Day           MeanTemp    
 Min.   :1983   Length:1825        Min.   : 1.00   Min.   :23.00  
 1st Qu.:1993   Class :character   1st Qu.: 8.00   1st Qu.:26.90  
 Median :2003   Mode  :character   Median :16.00   Median :27.70  
 Mean   :2003                      Mean   :15.72   Mean   :27.73  
 3rd Qu.:2013                      3rd Qu.:23.00   3rd Qu.:28.70  
 Max.   :2023                      Max.   :31.00   Max.   :30.70  
    MaxTemp         MinTemp     
 Min.   :23.80   Min.   :20.90  
 1st Qu.:30.70   1st Qu.:24.10  
 Median :31.80   Median :25.00  
 Mean   :31.53   Mean   :25.04  
 3rd Qu.:32.60   3rd Qu.:26.00  
 Max.   :35.80   Max.   :29.00  
Observations
  • There are a total of 1825 observations with 6 variables.

  • There are no missing data in these 6 variables.

  • The average daily mean, maximum and minimum temperatures for these selected years are 27.73°C, 31.53°C and 25.04°C respectively.
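The row count is what a complete daily record should give: none of the five selected years is a leap year, so 5 × 365 = 1825 rows. A small base-R sanity check using the Gregorian leap-year rule (the helper `is_leap()` is introduced here for illustration):

```r
# Gregorian leap-year rule: divisible by 4, except centuries
# that are not divisible by 400
is_leap <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

years <- c(1983, 1993, 2003, 2013, 2023)
days_per_year <- 365 + is_leap(years)   # TRUE counts as 1 extra day
sum(days_per_year)
#> [1] 1825
```

On the real data, `table(weather_data$Year)` should then show 365 rows for each selected year.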

2.5 Create useful columns and summarise dataset

It might be useful to look at the trend of daily temperatures by month in each year. Hence we group the data by year and month, then average the daily mean temperatures and take the highest maximum and lowest minimum temperatures in each month.

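To plot a line chart, it helps to have a date column, so we create two columns: DDate, the calendar date, and DayOfYear, a running day index that places the five selected years end to end (1983 covers days 1 to 365, 1993 covers days 366 to 730, and so on).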
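`DayOfYear` shifts each selected year by a multiple of 365 days so that the five years sit end to end on one axis. The offset arithmetic can be sketched in base R as follows; `concat_day_index()` is a hypothetical helper, and it relies on none of the selected years being a leap year.

```r
# Hypothetical helper reproducing the DayOfYear offset in base R:
# calendar day of year plus 365 days for every decade after 1983
concat_day_index <- function(dates) {
  yr  <- as.integer(format(dates, "%Y"))
  doy <- as.POSIXlt(dates)$yday + 1     # POSIXlt yday is 0-based
  doy + (yr - 1983) / 10 * 365
}

idx <- concat_day_index(as.Date(c("1983-12-31", "1993-01-01", "2023-12-31")))
idx
#> [1]  365  366 1825
```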

Lastly, to visualise the 99% confidence intervals of the annual mean temperatures, we create another dataset that summarises, for each year, the number of days, the mean and standard deviation of the daily mean temperatures, and the standard error of the mean.

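# Build a Date from Year, Month and Day. Note: "%b" parses month
# abbreviations in the current locale, so this assumes an English locale.
weather_data$DDate <- as.Date(paste(weather_data$Year, 
                                    weather_data$Month, 
                                    weather_data$Day, sep = "-"), 
                              format = "%Y-%b-%d")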

# join 1993 Jan 1 to 1983 Dec 31 by setting it to Day 365+1 = 366. 
#Do the same for the other years
weather_data <- weather_data %>% 
  mutate(DayOfYear = yday(DDate) + (Year - 1983)/10 * 365)

weather_month <- weather_data %>% 
                group_by(Year, Month) %>% 
                summarise(AveMeanTemp = mean(MeanTemp),
                          MaxMaxTemp = max(MaxTemp),
                          MinMinTemp = min(MinTemp))

Ave_temp <- weather_month %>% 
  mutate(MonthOfYear = match(Month, month.abb) + (Year - 1983)/10 * 12 ) 

mean_error <- weather_data %>%
  group_by(Year) %>%
  summarise(n = n(), Temp = mean(MeanTemp), sd = sd(MeanTemp)) %>%
  # standard error of the mean: sd() already uses the n - 1 denominator,
  # so divide by sqrt(n)
  mutate(se = sd/sqrt(n))
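With the per-year mean, standard deviation and count available, a 99% confidence interval for each annual mean is mean ± t × se, where se = sd/√n and t is the 0.995 quantile of the t distribution with n - 1 degrees of freedom. The sketch below assumes this t-based interval (the plotting code may use a different multiplier), and `mean_ci()` is a hypothetical helper. With n = 365 days per year, the t quantile is very close to the normal value `qnorm(0.995)`, about 2.576.

```r
# Hypothetical helper: two-sided confidence interval for a mean,
# using the t distribution with n - 1 degrees of freedom.
# Vectorised, so it works on one row per year.
mean_ci <- function(mean, sd, n, level = 0.99) {
  se <- sd / sqrt(n)                              # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)   # 0.995 quantile for 99%
  data.frame(lower = mean - t_crit * se,
             upper = mean + t_crit * se)
}

# Usage on the summary table above, one interval per year:
# cbind(mean_error, mean_ci(mean_error$Temp, mean_error$sd, mean_error$n))
```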

4. Visual Statistical Analysis

4.1 One-sample test on Daily Mean Temperature

According to the MSS annual report for 2023, the annual mean temperature in 2023 was 28.2°C. Let us conduct a one-sample test comparing the daily mean temperatures of the five selected years against this value.

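# Bayesian one-sample test: histogram of daily mean temperatures
# compared against the 2023 annual mean of 28.2°C
gghistostats(data = weather_data, x = MeanTemp,
            type = "bayes",
            test.value = 28.2) +
  labs(x = "Daily mean temperatures") +
  theme_minimal()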

Interpretation of results

log_e(BF01) = -115.31 is a large negative value, so BF01 is vanishingly small: the data provide extreme evidence against the null hypothesis that the mean daily temperature across the five selected years (27.73°C) equals the test value of 28.2°C.
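gghistostats() reports the natural logarithm of BF01, the Bayes factor in favour of the null hypothesis. Flipping the sign and exponentiating gives BF10, the evidence in favour of the alternative; the short calculation below shows its order of magnitude. Under the commonly used classification of Bayes factors, BF10 above 100 already counts as extreme evidence.

```r
log_bf01 <- -115.31        # natural-log Bayes factor reported by gghistostats()
bf10 <- exp(-log_bf01)     # BF10 = 1 / BF01
log10(bf10)                # roughly 50.08, i.e. BF10 is about 10^50
```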

4.2 One-way ANOVA test on daily mean temperatures by year

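# Parametric (type = "p") one-way test of daily mean temperature across
# years; with the default var.equal = FALSE this is Welch's ANOVA, and the
# pairwise comparisons use Games-Howell tests
ggbetweenstats(data = weather_data, x = Year, y = MeanTemp,
              type = "p",
              mean.ci = TRUE, 
              pairwise.comparisons = TRUE, 
              pairwise.display = "s",   # show significant pairs only
              p.adjust.method = "fdr",  # Benjamini-Hochberg adjustment
              messages = FALSE) +
  labs(y = "Daily mean temperatures",
       title = "One-way ANOVA Test on Daily Mean Temperatures by Year")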

Interpretation of results

From the ANOVA test and its FDR-adjusted pairwise comparisons, the mean daily temperature in 2023 differs significantly from that in each of 1983, 1993, 2003 and 2013. The difference between 1993 and 2003 is also significant.
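For a cross-check outside ggstatsplot, a similar analysis can be sketched in base R. This swaps in a different pairwise technique: base R has no Games-Howell test, so the sketch uses Welch's one-way ANOVA (`oneway.test()`) followed by pairwise Welch t-tests with FDR adjustment (`pairwise.t.test()` with `pool.sd = FALSE`). Its p-values will therefore not match the plot exactly. `welch_anova_by_year()` is a hypothetical helper name.

```r
# Hypothetical helper: Welch one-way ANOVA of MeanTemp across years,
# plus FDR-adjusted pairwise Welch t-tests (base R only)
welch_anova_by_year <- function(df) {
  df$Year <- factor(df$Year)
  overall <- oneway.test(MeanTemp ~ Year, data = df, var.equal = FALSE)
  pairs <- pairwise.t.test(df$MeanTemp, df$Year,
                           pool.sd = FALSE,         # separate variances (Welch)
                           p.adjust.method = "fdr") # Benjamini-Hochberg
  list(overall = overall, pairwise = pairs)
}

# Usage on the filtered data:
# res <- welch_anova_by_year(weather_data)
# res$overall; res$pairwise
```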

5. Conclusion

In this exercise, we explored the historical trends and patterns of daily temperature in Singapore using data from the MSS website. December and January are usually the coolest months of the year, whereas May and June are the hottest. The average daily temperature has increased significantly over the years, especially in 2023. The MSE infographic's claim that 'the annual mean temperatures rose at an average rate of 0.25 °C per decade' appears broadly consistent with our data: using the 1982 to 2023 records, we found an average rise of 0.22°C per decade.
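The post does not state how the 0.22°C per decade figure was computed. One plausible approach, assumed here, is an ordinary least-squares regression of annual mean temperature on year, with the slope (°C per year) multiplied by 10. `rate_per_decade()` is a hypothetical helper, and years with many missing days would bias their annual means.

```r
# Hypothetical helper: warming rate in degrees C per decade from an OLS fit
# of annual mean temperature on year (an assumed method, not the author's)
rate_per_decade <- function(df) {
  # aggregate() with a formula drops rows with NA temperatures by default
  annual <- aggregate(MeanTemp ~ Year, data = df, FUN = mean)
  fit <- lm(MeanTemp ~ Year, data = annual)
  unname(coef(fit)["Year"]) * 10   # slope is per year; x10 gives per decade
}

# Usage on the full 1982 to 2023 data:
# rate_per_decade(weather)
```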