source("R/data_helpers.R")
source("R/model_helpers.R")
source("R/plotting_helpers.R")
To reproduce/run this notebook yourself, you can clone the following repository:
https://github.com/pim-n/sardal-radioactivity-analysis
1. Introduction
This is the Quarto markdown document that will go through the entire data analysis process done for this project. Most code is called in from helper files, to ensure the document remains readable. As functions are called in from these helper files, their workings will be explained. In case you are interested in seeing the source code, refer to the helper files in the R/ folder at the GitHub repository.
We will now load these helper files:
Section 2 of this document will deal with the data wrangling process. Section 3 will cover data exploration and visualization. Section 4 will give an example of how a model is specified, and Section 5 covers results and analysis of all isotopes.
For mathematical and theoretical details, refer to the project report.
Acknowledgment
The work here is centered around the mvgam [1] package, an R package for Bayesian fitting of Dynamic Generalized Additive Models. Some code is adapted and modified from various sources. In case code has been adapted from somewhere, this is specified in the corresponding section of the document.
[1] Clark N, Wells K (2023). “Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series.” Methods in Ecology and Evolution, 14, 771-784. doi:10.1111/2041-210X.14075.
Running option
If the variable below is TRUE, the entire document is run, including fitting the models. Otherwise, the models are loaded from results/models/.
FULL_RUN = FALSE
MIN_LAG = 1
MAX_LAG = 8
if (!FULL_RUN) {
model_files <- paste0("results/models/", list.files("results/models"))
lapply(model_files, load, environment())
}
Importing libraries
The following packages were used in this notebook:
# data wrangling
library(tidyverse)
library(glue)
library(lubridate)
library(readxl)
# interpolation of time series
library(zoo)
# for fitting the distributed lag GAMs
library(mgcv)
library(mvgam)
# plotting and formatting
library(gratia)
library(viridis)
2. Data Wrangling
Summary: An explanation of the pre-processing procedure, followed by import of the CSVs into R dataframes compatible with tidy principles.
Pre-processing
The data was provided as a set of Excel .xlsx files with sheets for the various isotopes. The data was manually converted to CSV format, an open standard suitable for import into R. The data was separated by type (discharge or measurement at Särdal) and by isotope. For measurements, a further distinction was made between species of Fucus spp. The structure of the pre-processed data is shown below:
.
├── discharges
│ ├── discharges_C14.csv
│ ├── discharges_Co60.csv
│ ├── ...
└── sardal
├── C14
│ └── sardal_activity_C14_F_Various_Mattsson_2025.csv
├── Co60
│ ├── sardal_activity_Co60_F_Serratus_Mattsson_2025.csv
│ └── sardal_activity_Co60_F_Vesiculosus_Mattsson_2025.csv
├── Cs134
│ ├── ...
├── ...
└── ...
Data loading
Now it’s time to load the CSVs into a usable format for data processing. The process is as follows:
- List all isotope CSV files in the Särdal path
- For each CSV, map to a dataframe, with one column per species of Fucus
- Find the earliest and latest observation date across all dataframes, and create an obs_date sequence from the earliest to the latest sample date
- Join all the isotope dataframes to this unified obs_date column, giving one Särdal dataframe df_sardal
sardal_path <- "data/sardal"
sardal_files <- list.files(sardal_path, full.names = TRUE, pattern = "*.csv", recursive = TRUE) %>% set_names()
sardal_df_list <- purrr::map(sardal_files, read_csv,
col_types = cols(date = col_date(format = "%Y-%m-%d")))
sardal_df_list <- map2(sardal_df_list, names(sardal_df_list),
function(.x, .y){dplyr::select(.x, 1:2) %>%
set_names(nm=c("obs_date", get_species_name(.y)))}) %>%
map(~drop_na(.x))
df_sardal <- data.frame(obs_date = get_full_time_range_by_day(sardal_df_list))
df_sardal <- map(sardal_df_list, ~ full_join(df_sardal, .x, by = "obs_date"))
df_sardal <- reduce(df_sardal, full_join, by = "obs_date")
df_long_sardal <- df_sardal %>%
pivot_longer(
cols = -obs_date,
names_to = "variable",
values_to = "value"
)
The same steps are applied for the discharges, except there are no species and the time column is year instead of obs_date.
discharges_path <- "data/discharges"
discharges_files <- list.files(discharges_path, full.names = TRUE, pattern = "*.csv", recursive = TRUE) %>% set_names()
discharges_df_list <- purrr::map(discharges_files, read_csv)
discharges_df_list <- map2(discharges_df_list, names(discharges_df_list),
function(.x, .y) {
isotope_name <- get_isotope_name(.y)
column_names <- colnames(.x)[-1] # exclude year col.
new_column_names <- c(colnames(.x)[1], paste(isotope_name, column_names, sep = "_")) # concatenate isotope name to cols.
.x %>%
set_names(nm = new_column_names) # update col. names
}) %>%
map(~drop_na(.x))
df_discharges <- data.frame(year = get_full_time_range_by_year(discharges_df_list))
df_discharges <- map(discharges_df_list, ~ full_join(df_discharges, .x, by = "year"))
df_discharges <- reduce(df_discharges, full_join, by = "year")
df_long_discharges <- df_discharges %>%
pivot_longer(
cols = -year,
names_to = "variable",
values_to = "value"
)
3. Data exploration & visualization
Summary: An exploration of the data through visualization. Data quality is discussed by looking at the amount of data available and at missing values. Finally, an autocorrelation test is performed, which shows high autocorrelation.
Plotting data
data_Cs137 <- combine_data("Cs137", df_long_sardal, df_long_discharges)
p <- plot_isotope(data_Cs137)
For Co60:
data_Co60 <- combine_data("Co60", df_long_sardal, df_long_discharges)
p <- plot_isotope(data_Co60)
p
Number of observations
For statistical analysis it is important to know how much data one has, how it is structured, and whether there are any missing values or irregularities. A first step is to count how many unique daily and yearly measurements we have:
sum_daily <- df_sardal %>%
group_by(obs_date) %>%
summarize(across(where(is.numeric), ~ any(!is.na(.x)))) %>%
summarize(across(-obs_date, sum)) %>%
pivot_longer(everything())
sum_yearly <- df_sardal %>%
mutate(year = year(obs_date)) %>%
group_by(year) %>%
summarize(across(where(is.numeric), ~ any(!is.na(.x)))) %>%
summarize(across(-year, sum)) %>%
pivot_longer(everything())
sums_table <- merge(sum_daily, sum_yearly, by = 'name') %>%
rename(days = value.x, years = value.y)
sums_table
                  name days years
1 C14_F_Various 72 54
2 Co60_F_Serratus 127 29
3 Co60_F_Vesiculosus 101 29
4 Cs134_F_Serratus 81 19
5 Cs134_F_Vesiculosus 28 11
6 Cs137_F_Serratus 223 50
7 Cs137_F_Vesiculosus 183 50
8 I129_F_Serratus 81 37
9 I129_F_Serratus_Summer 34 30
10 I129_F_Serratus_Winter 47 26
11 Tc99_F_Serratus 18 9
12 Tc99_F_Vesiculosus 24 14
Here is a visualization of the above information, showing the gaps in measured activity concentration (on a yearly scale):
df_long_sardal %>%
filter(!(variable %in% c("I129_F_Serratus_Winter", "I129_F_Serratus_Summer"))) %>%
plot_na_values()
Autocorrelation
Autocorrelation is a measure of correlation of a time series with a lagged (delayed) version of itself. This means it can quantify if, and how much, past values affect the present value.
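As a small illustration (not part of the project code), the lag-1 autocorrelation of a simulated AR(1) series recovers the coefficient used to generate it:

```r
# Toy illustration: the lag-1 sample autocorrelation of a simulated AR(1)
# series is close to the AR coefficient used to generate it.
set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.8), n = 500))
acf_x <- acf(x, plot = FALSE)  # acf_x$acf[1] is lag 0, acf_x$acf[2] is lag 1
r1 <- acf_x$acf[2]
```

With 500 points the estimate lands close to 0.8, which is the kind of signal the ACF plots below are screening for.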
Output is suppressed for the autocorrelation plots because there are many. To see them, consult the written report or clone the repo and run this notebook.
lapply(names(df_discharges)[-1], function(col) {
col_data <- na.omit(df_discharges[[col]])
acf(col_data, plot = TRUE, main = paste("ACF Plot for", format_site_label(col)))
})
lapply(names(df_sardal)[-1], function(col) {
col_data <- na.omit(df_sardal[[col]])
acf(col_data, plot = TRUE, main = paste("ACF Plot for", format_species_label(col)))
})
We can see that there is some autocorrelation for almost all data. This means we have to somehow capture this “memory” of past values in the model. One way is to add an autoregressive component; this will be done in the model specification. Note, however, that explicitly adding an AR component is not always useful. The best case for it is carbon-14, since there is a known external effect (the bomb pulse).
Addressing missing values in predictors
Now, we will combine the discharge and activity concentration measurements for a given isotope, so that we can use that dataframe in the model.
Due to limitations in the mvgam package, predictors cannot contain NaN values. Therefore, missing years in the discharge series must be addressed. Let’s see if we have this issue:
plot_na_values(df_long_discharges)
We can see that for the caesium isotopes, there are some years without any discharge data available. We will therefore interpolate the data in these gaps to make fitting a model possible for caesium as well. The interpolate_data function takes care of this.
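As a sketch of the idea behind interpolate_data (the actual helper lives in the R/ folder; this is just base-R linear interpolation, analogous to zoo::na.approx):

```r
# Hypothetical sketch of gap filling by linear interpolation, analogous to
# what interpolate_data is assumed to do internally (e.g. via zoo::na.approx).
x <- c(1, NA, NA, 4, 5)
obs <- !is.na(x)
x_interp <- approx(which(obs), x[obs], xout = seq_along(x))$y
# x_interp is now c(1, 2, 3, 4, 5)
```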
Now, let’s plot an example and see if the interpolation looks reasonable:
vars <- df_long_discharges %>%
filter(str_detect(variable, "Cs137")) %>%
distinct(variable)
# Uninterpolated data
uninterp <- df_long_discharges %>%
filter(str_detect(variable, "Cs137"),
year > 2010) %>%
mutate(type = "Uninterpolated")
# Interpolated ONCE using fuzzy isotope string
interp <- interpolate_data("Cs137", df_long_discharges) %>%
filter(year > 2010) %>%
mutate(type = "Interpolated")
# Combine
plot_df <- bind_rows(interp, uninterp)
# Plot
ggplot(plot_df, aes(x = year, y = value, color = type)) +
geom_point() +
geom_line() +
facet_wrap(~ variable, scales = "free_y") +
labs(
x = "Year",
y = "Discharge [TBq/y]",
color = "Data type"
) +
  theme_minimal()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
Looks fine!
4. Example modelling process: Carbon-14
Summary: This chapter goes through creating an isotope dataset, using carbon-14 as the example. Discharge and Särdal measurements are combined into one isotope-specific dataset. Tensor smooths are introduced as the foundational component of the distributed lag model.
Combining measurements and discharge
Now it is time to combine the data. The combine_data function will, for a given isotope string (e.g. “C14”), combine the Särdal measurement data as well as the discharge data for said isotope, returning one comprehensive dataframe.
data_C14 <- combine_data("C14", df_long_sardal, df_long_discharges, scale_data = FALSE)
plot_isotope(data_C14)
One thing that is relevant here is the correlation between the Sellafield and La Hague series, which we check to avoid collinearity issues in the model: if the time series are highly correlated, adding them as separate predictors in a regression model can lead to inaccurate results. To check this, we first make the series quasi-stationary by taking the first difference, before computing the correlation.
cor(diff(data_C14$sellafield), diff(data_C14$la_hague))
[1] 0.1693971
This value is not large enough to worry about, so we take no special steps to avoid collinearity in the predictors.
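To see why differencing matters before checking correlation, consider two independent noisy series that share only a linear trend (an illustrative simulation, not project data):

```r
# Two independent series sharing only a deterministic trend: the correlation
# in levels is inflated by the trend, while the first differences are nearly
# uncorrelated.
set.seed(42)
t <- 1:100
a <- t + rnorm(100)
b <- t + rnorm(100)
cor_levels <- cor(a, b)              # close to 1, driven by the shared trend
cor_diffs  <- cor(diff(a), diff(b))  # close to 0
```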
Lag matrix
See Wood (2017) - Generalized Additive Models: An Introduction with R, section 5.6 for a detailed explanation on tensor smooths.
The lag matrix is a matrix of the form
0 1 2 3 4 ...
0 1 2 3 4 ...
0 1 2 3 4 ...
0 1 2 3 4 ...
...
with the dimensions of \(N \times L\), where \(N\) is the number of data points and \(L\) is the maximum lag. This construction is needed in order to create a tensor smooth. A tensor product smooth is a smooth surface that is built up from several smooth basis functions. In this case, we create a smooth surface varying with discharge amount from source \(x\) and the lag \(l\), where the height of the surface represents the effect on \(y_t^{sp}\) at a given time \(t\)
\[ f(x_{t-l}, l) = \sum_{k=0}^K \sum_{m=0}^M \beta_{km} b_k(x_{t-l}) c_m(l) \]
here,
\(b_k(x_{t-l})\) are the basis functions in discharge amount \(x_{t-l}\)
\(c_m(l)\) are the basis functions in lag \(l\)
and \(K\) and \(M\) specify the allowed complexity (number of terms) that are allowed to represent those basis functions. For more details, see Wood (2017) and the written report.
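The lag matrix construction above can be sketched in a few lines of base R. This is a hypothetical helper for illustration only; the project’s actual implementation is add_lagged_matrices in the R/ folder:

```r
# Build the N x (L+1) lag-index matrix and the matching matrix of lagged
# predictor values (hypothetical sketch of what add_lagged_matrices does).
build_lag_matrices <- function(x, max_lag) {
  n <- length(x)
  # every row repeats 0, 1, ..., L
  lag_mat <- matrix(0:max_lag, nrow = n, ncol = max_lag + 1, byrow = TRUE)
  # column l holds x lagged by l steps (NA-padded at the start)
  x_mat <- sapply(0:max_lag, function(l) c(rep(NA, l), x[seq_len(n - l)]))
  list(lag = lag_mat, x = x_mat)
}

m <- build_lag_matrices(1:5, max_lag = 2)
```

Column \(l\) of m$x holds \(x_{t-l}\), which is exactly the pairing te() needs to evaluate \(f(x_{t-l}, l)\).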
data_C14 <- data_C14 %>%
  add_lagged_matrices(n_max = MAX_LAG, n_min = MIN_LAG)
Hierarchical structure
A hierarchical model is useful when a dataset consists of distinct groups that may have varying effects. Let’s explain using our Fucus spp. as an example.
One could create two separate models, for Fucus Serratus and Fucus Vesiculosus respectively, modelling the effect of the discharges on each of them individually. But this would ignore the fact that there are effects common to both, namely the transport mechanisms that bring radionuclides to the Swedish coast. Instead of modelling them separately,
\[ \text{Activity in F. Serratus} \sim \text{Discharge}\\ \text{Activity in F. Vesiculosus} \sim \text{Discharge} \]
One creates a group intercept
\[ \text{Activity} \sim (1 \vert \text{species}) + \text{Discharge} \]
We essentially add a dummy variable that allows the intercept to vary by species. Rather than fitting separate models to the two species’ measured activity concentrations, or doing something arbitrary like taking the average, a hierarchical modelling structure combines the information by including both datasets in the regression while allowing some statistical variation between them.
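As a hedged, self-contained illustration of a species-level random intercept in mgcv syntax (simulated data, not the project’s model; s(species, bs = "re") plays the role of (1 | species)):

```r
library(mgcv)

# Simulated example (not project data): two species share a common discharge
# effect but have different baseline activity levels.
set.seed(7)
df <- data.frame(
  species   = factor(rep(c("Serratus", "Vesiculosus"), each = 50)),
  discharge = runif(100)
)
df$activity <- 1 + 0.5 * (df$species == "Vesiculosus") +
  2 * df$discharge + rnorm(100, sd = 0.1)

# s(species, bs = "re") is mgcv's random-intercept smooth, analogous to
# (1 | species) in lme4 notation.
fit <- gam(activity ~ s(species, bs = "re") + discharge, data = df)
coef_discharge <- coef(fit)["discharge"]  # recovers the shared slope (~2)
```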
To achieve this, we apply weight or ‘mask’ matrices of the same size as the lag matrices, consisting of zeroes and ones. Simply put, the weights_s matrix is 1 when the species is F. Serratus and 0 otherwise, and likewise for the F. Vesiculosus weight matrix weights_v.
Note: We will not actually use these for the carbon-14 model; the procedure is simply explained here using C-14 as the example.
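A tiny sketch of the masking idea (hypothetical values; the real matrices are built by the apply_weights helper):

```r
# Hypothetical sketch of the 0/1 mask matrices behind the hierarchical
# structure: weights_s is 1 on rows belonging to F. Serratus, 0 otherwise.
species <- c("Serratus", "Serratus", "Vesiculosus")
max_lag <- 2
weights_s <- matrix(as.numeric(species == "Serratus"),
                    nrow = length(species), ncol = max_lag + 1)
weights_v <- 1 - weights_s  # complementary mask for F. Vesiculosus
```

Multiplying a lag matrix elementwise by such a mask switches a smooth term on only for the matching species.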
data_C14 <- data_C14 %>%
  apply_weights()
Mathematical formulation
With the functions defined and explained, we now specify the model:
\[ g(y_t^*) = \underbrace{\alpha}_{\text{intercept}} + \underbrace{\sum_{l=0}^L f(\text{SF}_{t-l},l)}_{\text{Sellafield contribution}} + \underbrace{\sum_{l=0}^L f(\text{LH}_{t-l},l)}_{\text{La Hague contribution}} + \underbrace{u_t}_{\text{AR(1)}} + \underbrace{\varepsilon_t}_{\text{error term}} \]
where \(u_t = \rho u_{t-1} + e_t\) is the latent AR(1) trend, and
- \(y_t^{*}\) is bomb-pulse corrected time series \(y_t - 0.68 \cdot y_t^{bomb}\)
- \(l \in \{0, \dots, L\}\) is the lag in years.
- \(g(\cdot)\) is the link function, that links the linear combination of predictors to the expected value of \(y_t^{sp}\).
- \(f(x_{t-l},l)\) are the tensor product smooths as defined before.
- \(\varepsilon_t \sim \mathcal{N}(0, \sigma^2)\) is the error term.
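To build intuition for the latent trend, here is a toy simulation of an AR(1) process \(u_t = \rho u_{t-1} + e_t\) (illustrative values, not posterior draws):

```r
# Toy simulation of a latent AR(1) trend u_t = rho * u_{t-1} + e_t.
set.seed(2)
rho <- 0.7   # illustrative value
n <- 40
e <- rnorm(n, sd = 0.04)
u <- numeric(n)
u[1] <- e[1]
for (t in 2:n) u[t] <- rho * u[t - 1] + e[t]
```

The closer \(\rho\) is to 1, the longer past shocks persist, which is how the model can absorb slow residual trends such as the bomb pulse.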
Fitting the model
We can now fit the model. Note the log-link relation between the predictors and the response (for details on why, see the report).
The fitting is done by first formulating the model in mgcv/mvgam convention, then calling the fitting functions. Note that the response family is now assumed to be \(y_t \sim \text{LogNormal}(\mu, \sigma^2)\), which is equivalent to \(\log(y_t) \sim \text{Normal}(\mu, \sigma^2)\), with the small benefit that we do not have to transform the data back to the original scale after fitting.
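The lognormal/normal equivalence can be checked numerically (a quick sanity check, not project code):

```r
# Sanity check: if y ~ LogNormal(mu, sigma^2), then log(y) ~ Normal(mu, sigma^2).
set.seed(1)
y <- rlnorm(1e5, meanlog = 0.5, sdlog = 0.2)
mean_log <- mean(log(y))  # close to 0.5
sd_log   <- sd(log(y))    # close to 0.2
```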
C14 Model - AR(1)
if(FULL_RUN) {
formula <- sardal ~
te(sellafield, lag, k=4) +
te(la_hague, lag, k=4)
priors <- get_mvgam_priors(
formula = formula,
family = lognormal(),
data = data_C14,
trend_model = AR(1)
)
priors$prior[3] <- 'ar1 ~ beta(2, 2);'
model_C14_AR_mvgam <- mvgam(formula = formula,
family = lognormal(),
data = data_C14,
trend_model = AR(1),
burnin = 2000,
samples = 4000,
parallel = TRUE,
threads = 6,
noncentered = TRUE,
priors = priors
)
save(model_C14_AR_mvgam, file = "results/models/C14_AR.mvgam.Rdata")
}
And that’s it! The saved C14_AR.mvgam.Rdata file is now a complete fitted model object containing estimated parameters, the original data, predicted (fitted) values, Stan code, posterior information, and more.
5. All results
Carbon-14
Before we fit the other models, let us look at the summary of the model object created above:
summary(model_C14_AR_mvgam)
Loading required namespace: rstan
GAM formula:
sardal ~ te(sellafield, lag, k = 4) + te(la_hague, lag, k = 4)
Family:
lognormal
Link function:
identity
Trend model:
AR(p = 1)
N series:
1
N timepoints:
40
Status:
Fitted using Stan
4 chains, each with iter = 6000; warmup = 2000; thin = 1
Total post-warmup draws = 16000
log(observation error) parameter estimates:
2.5% 50% 97.5% Rhat n_eff
sigma_obs[1] 0.023 0.035 0.055 1 2873
GAM coefficient (beta) estimates:
2.5% 50% 97.5% Rhat n_eff
(Intercept) 0.046 1.2e-01 0.180 1 2968
te(sellafield,lag).1 -0.087 4.1e-03 0.098 1 7280
te(sellafield,lag).2 -0.100 4.0e-03 0.110 1 5504
te(sellafield,lag).3 -0.170 3.4e-03 0.180 1 4767
te(sellafield,lag).4 -0.150 -4.7e-03 0.140 1 4137
te(sellafield,lag).5 -0.076 -1.0e-03 0.077 1 5747
te(sellafield,lag).6 -0.091 -2.2e-03 0.087 1 4729
te(sellafield,lag).7 -0.160 -3.2e-03 0.160 1 4507
te(sellafield,lag).8 -0.120 5.6e-05 0.120 1 4685
te(sellafield,lag).9 -0.110 1.7e-03 0.110 1 5448
te(sellafield,lag).10 -0.150 2.8e-03 0.160 1 4350
te(sellafield,lag).11 -0.190 -4.3e-03 0.180 1 4304
te(sellafield,lag).12 -0.170 1.1e-02 0.190 1 6335
te(sellafield,lag).13 -0.090 4.1e-03 0.097 1 7232
te(sellafield,lag).14 -0.097 5.5e-03 0.110 1 5798
te(sellafield,lag).15 -0.170 7.9e-03 0.190 1 5388
te(la_hague,lag).1 -0.210 1.1e-02 0.240 1 4085
te(la_hague,lag).2 -0.220 1.6e-02 0.260 1 2822
te(la_hague,lag).3 -0.290 1.1e-02 0.320 1 4795
te(la_hague,lag).4 -0.230 -1.2e-02 0.210 1 2501
te(la_hague,lag).5 -0.190 4.8e-03 0.200 1 2436
te(la_hague,lag).6 -0.210 -4.0e-03 0.200 1 1909
te(la_hague,lag).7 -0.230 5.5e-03 0.240 1 3091
te(la_hague,lag).8 -0.250 -1.8e-03 0.230 1 4458
te(la_hague,lag).9 -0.260 -1.8e-02 0.230 1 2453
te(la_hague,lag).10 -0.250 -1.2e-02 0.230 1 1848
te(la_hague,lag).11 -0.260 -5.5e-03 0.240 1 3684
te(la_hague,lag).12 -0.260 8.1e-03 0.260 1 3136
te(la_hague,lag).13 -0.170 4.9e-03 0.190 1 2696
te(la_hague,lag).14 -0.180 2.5e-03 0.190 1 1907
te(la_hague,lag).15 -0.260 -6.5e-03 0.250 1 2874
Approximate significance of GAM smooths:
edf Ref.df Chi.sq p-value
te(sellafield,lag) 10.674 15 0.047 1
te(la_hague,lag) 9.299 15 0.178 1
standard deviation:
2.5% 50% 97.5% Rhat n_eff
sigma[1] 0.027 0.041 0.065 1 2529
precision parameter:
2.5% 50% 97.5% Rhat n_eff
tau[1] 240 590 1400 1 2937
autoregressive coef 1:
2.5% 50% 97.5% Rhat n_eff
ar1[1] 0.21 0.67 0.94 1 3111
Stan MCMC diagnostics:
✔ No issues with effective samples per iteration
✔ Rhat looks good for all parameters
✔ No issues with divergences
✔ No issues with maximum tree depth
Samples were drawn using sampling(hmc). For each parameter, n_eff is a
crude measure of effective sample size, and Rhat is the potential scale
reduction factor on split MCMC chains (at convergence, Rhat = 1)
Use how_to_cite() to get started describing this model
mcmc_plot(model_C14_AR_mvgam, variable='ar1', regex=TRUE, type="areas")
We see that the AR(1) component is very strong, indicating a strong temporal trend. This is the residual of the carbon-14 bomb pulse being captured by the AR(1) trend component. The tensor product smooth posteriors are difficult to interpret directly. Instead, we can look at the estimated interaction-effect surfaces from the fitted tensor product smooths:
plot_mvgam_smooth(model_C14_AR_mvgam, series = 1, smooth = 1)
plot_mvgam_smooth(model_C14_AR_mvgam, series = 1, smooth = 2)
Again, these surfaces represent \(f(x_{t-l}, l)\), essentially the ‘joint effect’ of lag and discharge on the measured activity concentration in Fucus at Särdal. Sellafield shows a pattern in line with expectations: a higher discharge from Sellafield results in a higher (more positive) effect on Särdal activity concentrations. Regarding lag, the highest values are centered around 3-5 years, with a broad delayed effect up to 8 years, and the effect approaching 0 near 9-10 years. This agrees with estimates of the Sellafield transport time being approximately 3-4 years, corresponding to the initial ‘hit’; the prolonged effect across many lags could be due to delayed uptake and other environmental processes. La Hague does not really match expectation.
We can also inspect what the predictions are from this fitted model (note the log scale!):
plot(hindcast(model_C14_AR_mvgam))
No non-missing values in test_observations; cannot calculate forecast score
Some model diagnostics that will inform us about the fit are the residual plots and the prior-posterior check:
plot_mvgam_resids(model_C14_AR_mvgam)
ppc(model_C14_AR_mvgam)
The residual plots, such as the Q-Q plot, indicate a good model fit, suggesting that the lognormal family appropriately describes the response. There is some slight residual autocorrelation not captured by the AR(1) process; overall, though, the AR(1) has proved effective at capturing both the effect of the bomb pulse and the ‘memory’ effect of radiocarbon being stored in the Fucus over time.
The posterior predictive check (PPC) shows how well the estimated posterior \(\hat{y}\) represents the distribution of the observed data \(y\). We can see a rather good fit. However, because the actual surfaces look rather complicated it’s possible that the model is overfitting to noise here.
Carbon-14 - Bomb-pulse corrected
Above, we compensated for the atmospheric carbon-14 from nuclear weapons testing by applying an AR(1) component. Another approach is to modify the measurement time series instead, detrending it. We load atmospheric C-14 measurements in the same unit and subtract a scaled version, attempting to produce an approximately flat time series (save for any peaks due to e.g. marine discharges).
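The detrending idea can be sketched with toy numbers (illustrative values only; the real correction uses the atmospheric F14C series loaded below):

```r
# Toy numbers illustrating the detrending step: subtracting a scaled known
# background series leaves only the local signal of interest.
background <- c(1.0, 1.5, 1.3, 1.1, 1.0)       # stand-in for atmospheric F14C
local_sig  <- c(0.20, 0.20, 0.25, 0.20, 0.20)  # stand-in for a discharge signal
measured   <- 0.68 * background + local_sig
corrected  <- measured - 0.68 * background     # recovers local_sig
```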
data_C14 <- combine_data("C14", df_long_sardal, df_long_discharges, scale_data = FALSE)
bomb_pulse <- read_excel("~/RadiologyProject/sardal-radioactivity-analysis/S0033822221000953sup002.xls", col_names = FALSE, skip = 314)
New names:
• `` -> `...1`
• `` -> `...2`
• `` -> `...3`
• `` -> `...4`
• `` -> `...5`
bomb_pulse <- bomb_pulse %>%
mutate(year = floor(.data[[names(bomb_pulse)[1]]])) %>%
group_by(year) %>%
summarise(F14C = mean(.data[[names(bomb_pulse)[4]]], na.rm = TRUE))
bomb_pulse <- bomb_pulse %>%
bind_rows(
bomb_pulse %>%
slice_tail(n = 2) %>%
mutate(slope = (F14C - lag(F14C)) / (year - lag(year))) %>%
slice_tail(n = 1) %>%
transmute(
year = 2020,
F14C = F14C + slope
)
)
bomb_pulse <- bomb_pulse %>%
filter(year %in% data_C14$year)
data_C14$sardal <- data_C14$sardal - 0.68*bomb_pulse$F14C
plot(data_C14$sardal)
data_C14 <- data_C14 %>%
add_lagged_matrices(n_max = MAX_LAG, n_min = MIN_LAG) %>%
  apply_weights()
We can now fit a model again:
if(FULL_RUN) {
formula <- sardal ~
te(sellafield, lag, k=4) +
te(la_hague, lag, k=4)
model_C14_Corrected_mvgam <- mvgam(formula = formula,
family = lognormal(),
data = data_C14,
trend_model = 'None',
burnin = 2000,
samples = 4000,
parallel = TRUE,
threads = 6,
noncentered = TRUE
)
save(model_C14_Corrected_mvgam, file = "results/models/C14_Corrected.mvgam.Rdata")
}
summary(model_C14_Corrected_mvgam)
GAM formula:
sardal ~ te(sellafield, lag, k = 4) + te(la_hague, lag, k = 4)
Family:
lognormal
Link function:
identity
Trend model:
None
N series:
1
N timepoints:
40
Status:
Fitted using Stan
4 chains, each with iter = 6000; warmup = 2000; thin = 1
Total post-warmup draws = 16000
log(observation error) parameter estimates:
2.5% 50% 97.5% Rhat n_eff
sigma_obs[1] 0.019 0.027 0.041 1 4697
GAM coefficient (beta) estimates:
2.5% 50% 97.5% Rhat n_eff
(Intercept) -0.990 -0.9800 -0.980 1 18117
te(sellafield,lag).1 -0.077 0.0024 0.081 1 5764
te(sellafield,lag).2 -0.100 -0.0014 0.097 1 3518
te(sellafield,lag).3 -0.160 0.0075 0.170 1 2768
te(sellafield,lag).4 -0.160 -0.0120 0.130 1 2629
te(sellafield,lag).5 -0.063 0.0097 0.080 1 4958
te(sellafield,lag).6 -0.082 0.0054 0.090 1 3251
te(sellafield,lag).7 -0.160 -0.0052 0.150 1 2654
te(sellafield,lag).8 -0.110 -0.0044 0.110 1 2719
te(sellafield,lag).9 -0.094 0.0060 0.110 1 5107
te(sellafield,lag).10 -0.140 0.0120 0.160 1 2800
te(sellafield,lag).11 -0.190 -0.0076 0.180 1 2483
te(sellafield,lag).12 -0.100 0.0580 0.220 1 3079
te(sellafield,lag).13 -0.072 0.0100 0.087 1 5795
te(sellafield,lag).14 -0.078 0.0110 0.100 1 3600
te(sellafield,lag).15 -0.140 0.0260 0.190 1 2941
te(la_hague,lag).1 -0.220 -0.0230 0.180 1 5650
te(la_hague,lag).2 -0.210 -0.0073 0.200 1 5700
te(la_hague,lag).3 -0.260 0.0047 0.270 1 5698
te(la_hague,lag).4 -0.220 -0.0250 0.160 1 4198
te(la_hague,lag).5 -0.160 0.0099 0.190 1 3860
te(la_hague,lag).6 -0.210 -0.0280 0.160 1 4111
te(la_hague,lag).7 -0.210 0.0170 0.240 1 4092
te(la_hague,lag).8 -0.210 -0.0110 0.190 1 5755
te(la_hague,lag).9 -0.250 -0.0290 0.210 1 4139
te(la_hague,lag).10 -0.210 0.0051 0.230 1 4094
te(la_hague,lag).11 -0.240 -0.0048 0.220 1 4447
te(la_hague,lag).12 -0.230 -0.0048 0.220 1 4640
te(la_hague,lag).13 -0.140 0.0170 0.180 1 3937
te(la_hague,lag).14 -0.150 0.0075 0.170 1 4232
te(la_hague,lag).15 -0.260 -0.0320 0.190 1 4024
Approximate significance of GAM smooths:
edf Ref.df Chi.sq p-value
te(sellafield,lag) 1.423 15 0.525 1.000
te(la_hague,lag) 1.793 15 0.867 0.999
Stan MCMC diagnostics:
✔ No issues with effective samples per iteration
✔ Rhat looks good for all parameters
✔ No issues with divergences
✔ No issues with maximum tree depth
Samples were drawn using sampling(hmc). For each parameter, n_eff is a
crude measure of effective sample size, and Rhat is the potential scale
reduction factor on split MCMC chains (at convergence, Rhat = 1)
Use how_to_cite() to get started describing this model
plot_mvgam_smooth(model_C14_Corrected_mvgam, series = 1, smooth = 1)
plot_mvgam_smooth(model_C14_Corrected_mvgam, series = 1, smooth = 2)
plot(hindcast(model_C14_Corrected_mvgam))
No non-missing values in test_observations; cannot calculate forecast score
plot_mvgam_resids(model_C14_Corrected_mvgam)
ppc(model_C14_Corrected_mvgam)
Caesium-137
Now, caesium is a unique case, because the influence of the 1986 Chernobyl accident on caesium concentrations in the Nordics has been well studied. In particular, outflow from the Baltic Sea is predicted to be the primary contributor to caesium activity concentrations on the Swedish west coast. Thus, we must include it. We include here a Baltic Sea caesium-137 simulation based on information from HELCOM (2009), Radioactivity in the Baltic Sea, 1999-2006: HELCOM thematic assessment. Balt. Sea Environ. Proc. No. 117, pages 21 and 22.
simulate_baltic_cs137 <- function(data_Cs137) {
# take a vector of years from the original dataset
years <- sort(unique(data_Cs137$year))
total_years <- length(years)
# number of data points after the incident
n_after <- length(years[years >= 1986])
t <- seq_len(n_after) - 1
# effective half lives used from HELCOM 2009 pp. 21-22
k_1 <- log(2)/2.5 # 1986-1988
k_2 <- log(2)/9 # 1990 - end
y_0 <- 650 # approximate peak value in Bq m^-3
# prefill empty array
x <- array(5, total_years)
# exponential decay with piecewise half-lives
data <- y_0 * exp(- (k_1 * pmin(t, 3) + k_2 * pmax(0, t - 3)))
# replace the tail (from 1986 to end) by sim data
x[(total_years - n_after + 1):total_years] <- data
# add noise
set.seed(123)
x <- x + rnorm(total_years, sd = 0.1 * mean(x))
return(as.vector(x))
}
Adding the Baltic Sea simulation to the data:
data_Cs137 <- combine_data("Cs137", df_long_sardal, df_long_discharges, scale_data = FALSE)
baltic_Cs137 <- simulate_baltic_cs137(data_Cs137)
data_Cs137 <- data_Cs137 %>%
group_by(series) %>%
mutate(baltic = baltic_Cs137) %>%
ungroup()
plot_isotope(data_Cs137)
Let’s check correlation once more:
cor(diff(data_Cs137$sellafield), diff(data_Cs137$baltic))
[1] -0.06012205
We prepare the data as before
data_Cs137 <- data_Cs137 %>%
add_lagged_matrices(n_max = MAX_LAG, n_min = MIN_LAG) %>%
  apply_weights()
Note the extra terms below in the formula. It looks like a much bigger model, but this is the hierarchical structure at play: the terms apply only where the weights matrix contains a 1. Anyway, let’s fit.
Model
if(FULL_RUN) {
formula <- sardal ~
te(sellafield, lag,
k = 4) +
te(baltic, lag,
k = 4)
model_Cs137_mvgam <- mvgam(formula = formula,
family = lognormal(),
data = data_Cs137,
burnin = 2000,
samples = 4000,
parallel = TRUE,
threads = 6,
noncentered = TRUE
)
save(model_Cs137_mvgam, file = "results/models/Cs137.mvgam.Rdata")
}
We can make the same plots and summaries as before:
summary(model_Cs137_mvgam)
GAM formula:
sardal ~ te(sellafield, lag, k = 4) + te(baltic, lag, k = 4)
Family:
lognormal
Link function:
identity
Trend model:
None
N series:
2
N timepoints:
46
Status:
Fitted using Stan
4 chains, each with iter = 6000; warmup = 2000; thin = 1
Total post-warmup draws = 16000
log(observation error) parameter estimates:
2.5% 50% 97.5% Rhat n_eff
sigma_obs[1] 0.14 0.18 0.23 1.00 783
sigma_obs[2] 0.14 0.16 0.22 1.17 9
GAM coefficient (beta) estimates:
2.5% 50% 97.5% Rhat n_eff
(Intercept) 2.000 2.0e+00 2.1000 1.06 39
te(sellafield,lag).1 -0.075 -4.3e-02 0.0051 1.09 44
te(sellafield,lag).2 -0.210 -4.3e-02 0.0580 1.05 153
te(sellafield,lag).3 -0.390 -4.9e-02 0.1700 1.06 124
te(sellafield,lag).4 -0.290 -6.0e-02 0.3000 1.07 100
te(sellafield,lag).5 -0.200 -6.9e-02 0.1400 1.08 82
te(sellafield,lag).6 -0.110 -7.7e-02 -0.0350 1.11 22
te(sellafield,lag).7 -0.270 -7.1e-02 0.0520 1.05 147
te(sellafield,lag).8 -0.210 -4.1e-02 0.2100 1.06 107
te(sellafield,lag).9 -0.065 -3.9e-02 -0.0071 1.15 13
te(sellafield,lag).10 -0.220 -3.7e-02 0.0760 1.06 151
te(sellafield,lag).11 -0.400 -4.2e-02 0.1800 1.06 116
te(sellafield,lag).12 -0.230 2.9e-05 0.2900 1.24 7
te(sellafield,lag).13 0.044 1.4e-01 0.3100 1.08 34
te(sellafield,lag).14 0.092 3.5e-01 0.5500 1.09 28
te(sellafield,lag).15 0.070 5.0e-01 0.9000 1.17 10
te(baltic,lag).1 -0.270 -1.4e-01 -0.0660 1.16 9
te(baltic,lag).2 -0.240 -1.0e-01 0.1200 1.14 13
te(baltic,lag).3 -0.290 -3.3e-02 0.3700 1.24 7
te(baltic,lag).4 -0.490 -1.1e-01 0.0960 1.05 101
te(baltic,lag).5 -0.240 -9.9e-02 -0.0059 1.48 4
te(baltic,lag).6 -0.170 -9.9e-02 0.0240 1.18 10
te(baltic,lag).7 -0.260 -6.4e-02 0.2900 1.22 8
te(baltic,lag).8 -0.230 4.2e-02 0.3000 1.02 286
te(baltic,lag).9 0.041 2.1e-01 0.3400 1.21 7
te(baltic,lag).10 -0.041 1.4e-01 0.4600 1.20 9
te(baltic,lag).11 -0.230 7.5e-03 0.5000 1.09 35
te(baltic,lag).12 0.045 5.2e-01 0.8800 1.03 329
te(baltic,lag).13 0.014 3.6e-01 0.6000 1.11 17
te(baltic,lag).14 -0.081 1.4e-01 0.4300 1.01 453
te(baltic,lag).15 -0.270 -3.5e-02 0.4000 1.06 60
Approximate significance of GAM smooths:
edf Ref.df Chi.sq p-value
te(sellafield,lag) 2.955 15 48.38 0.00401 **
te(baltic,lag) 2.418 15 33.18 0.04618 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Stan MCMC diagnostics:
✖ n_eff / iter below 0.001 found for 38 parameters
Effective sample size is inaccurate for these parameters
✖ Rhats above 1.05 found for some parameters
Use pairs() and mcmc_plot() to investigate
✖ 3075 of 16000 iterations ended with a divergence (19.2188%)
Try a larger adapt_delta to remove divergences
✖ 12897 of 16000 iterations saturated the maximum tree depth of 10
(80.6062%)
Try a larger max_treedepth to avoid saturation
Samples were drawn using sampling(hmc). For each parameter, n_eff is a
crude measure of effective sample size, and Rhat is the potential scale
reduction factor on split MCMC chains (at convergence, Rhat = 1)
Use how_to_cite() to get started describing this model
mcmc_plot(model_Cs137_mvgam)
Judging by the p-values, the model appears to find predictive power for both Sellafield discharges and Baltic Sea activity concentrations.
plot_mvgam_smooth(model_Cs137_mvgam, smooth = 1)
plot_mvgam_smooth(model_Cs137_mvgam, smooth = 2)
The effect seems to be mainly at a 1-2 year delay, with a long tail of influence towards higher lags. The hypothesis that the majority of caesium-137 after 1986 originates from Baltic Sea outflow into the Kattegat has been described and supported in Mattsson et al. (2025) by effective half-life calculations. Our simulated series of activity concentrations, based on reported effective half-lives, seems to corroborate this hypothesis.
That Sellafield has predictive power in the model could indicate that the peak in activity concentrations in the early 1980s, before the Chernobyl disaster, was caused by the significant discharges from Sellafield in the late 1970s. The smooth shows a peak effect considerably later than the initial arrival time, closer to a 6-7 year delay.
plot(hindcast(model_Cs137_mvgam))
No non-missing values in test_observations; cannot calculate forecast score
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
plot(hindcast(model_Cs137_mvgam), series = 2)
No non-missing values in test_observations; cannot calculate forecast score
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
The trends look slightly different. We can see what the difference is:
plot(model_Cs137_mvgam, type = "re")
No random effect smooths (bs = "re") in model formula
It seems the uncertainties are rather large, and from this model we cannot say conclusively whether F. Serratus and F. Vesiculosus differ in their uptake of Caesium-137 in the Swedish west coast marine environment.
And the diagnostics:
plot_mvgam_resids(model_Cs137_mvgam)
ppc(model_Cs137_mvgam)
Cobalt-60
data_Co60 <- combine_data("Co60", df_long_sardal, df_long_discharges, scale_data = FALSE)
plot_isotope(data_Co60)
cor(diff(data_Co60$barseback), diff(data_Co60$la_hague))
[1] -0.04478878
cor(diff(data_Co60$barseback), diff(data_Co60$winfrith))
[1] -0.1521441
cor(diff(data_Co60$winfrith), diff(data_Co60$la_hague))
[1] 0.2544252
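These correlations are computed on first differences rather than on the raw series: two series that merely share a long-term trend can show a high raw correlation even when their year-to-year changes are unrelated. A minimal illustration on hypothetical data:

```r
# Toy data (not project data): two unrelated noisy series that share
# only a linear trend.
set.seed(1)
t <- 1:50
a <- t + rnorm(50)
b <- 2 * t + rnorm(50)

cor(a, b)             # close to 1, driven entirely by the shared trend
cor(diff(a), diff(b)) # much smaller once the trend is differenced away
```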
For Cobalt-60 we have rather limited data available, and a useful model cannot have too many parameters relative to the number of data points. Since the Sellafield signal is rather weak compared to the others, we therefore ignore it. We prepare the data as before:
data_Co60 <- data_Co60 %>%
add_lagged_matrices(n_max = MAX_LAG, n_min = MIN_LAG) %>%
apply_weights()
Let’s fit as before.
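For intuition, the matrices that a distributed-lag term `te(x, lag)` expects can be built as in the following sketch (`build_lag_matrix` is a hypothetical stand-in for illustration, not the project's `add_lagged_matrices()` helper):

```r
# Sketch: row i of X holds the covariate values at lags min_lag..max_lag
# before observation i; `lag` is a matching matrix of lag indices.
# Rows with incomplete history are padded with NA.
build_lag_matrix <- function(x, min_lag = 1, max_lag = 8) {
  n <- length(x)
  lags <- min_lag:max_lag
  X <- sapply(lags, function(l) c(rep(NA, l), head(x, n - l)))
  list(X = X, lag = matrix(lags, nrow = n, ncol = length(lags), byrow = TRUE))
}

out <- build_lag_matrix(1:10, min_lag = 1, max_lag = 3)
out$X[4, ]  # covariate values at times 3, 2, 1 (lags 1, 2, 3)
```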
Model
if(FULL_RUN) {
formula <- sardal ~
te(la_hague, lag,
k = 4) +
te(barseback, lag,
k = 4)
model_Co60_mvgam <- mvgam(formula = formula,
family = lognormal(),
trend_model = 'None',
data = data_Co60,
burnin = 2000,
samples = 4000,
parallel = TRUE,
threads = 6,
noncentered = TRUE
)
save(model_Co60_mvgam, file = "results/models/Co60.mvgam.Rdata")
}
We can make the same plots and summaries as before:
summary(model_Co60_mvgam)
GAM formula:
sardal ~ te(la_hague, lag, k = 4) + te(barseback, lag, k = 4)
Family:
lognormal
Link function:
identity
Trend model:
None
N series:
2
N timepoints:
20
Status:
Fitted using Stan
4 chains, each with iter = 6000; warmup = 2000; thin = 1
Total post-warmup draws = 16000
log(observation error) parameter estimates:
2.5% 50% 97.5% Rhat n_eff
sigma_obs[1] 0.16 0.25 0.40 1 10761
sigma_obs[2] 0.25 0.35 0.52 1 13834
GAM coefficient (beta) estimates:
2.5% 50% 97.5% Rhat n_eff
(Intercept) 0.1300 0.2600 0.360 1 11248
te(la_hague,lag).1 -0.1100 -0.0490 0.015 1 7373
te(la_hague,lag).2 -0.2300 -0.0680 0.089 1 4419
te(la_hague,lag).3 -0.3500 -0.0960 0.150 1 4333
te(la_hague,lag).4 -0.2600 -0.0380 0.180 1 4102
te(la_hague,lag).5 -0.1700 -0.0560 0.067 1 4412
te(la_hague,lag).6 -0.1200 -0.0810 -0.036 1 8025
te(la_hague,lag).7 -0.2600 -0.0930 0.075 1 4489
te(la_hague,lag).8 -0.3600 -0.0600 0.240 1 6331
te(la_hague,lag).9 -0.1200 0.0160 0.150 1 7462
te(la_hague,lag).10 -0.0450 0.0830 0.230 1 4140
te(la_hague,lag).11 -0.1400 0.1200 0.370 1 5220
te(la_hague,lag).12 -0.3600 0.0070 0.400 1 7996
te(la_hague,lag).13 -0.2900 -0.0240 0.240 1 7559
te(la_hague,lag).14 -0.3100 -0.0480 0.190 1 6713
te(la_hague,lag).15 -0.4100 -0.0520 0.290 1 6727
te(barseback,lag).1 -0.4800 -0.2200 0.055 1 10097
te(barseback,lag).2 -0.5900 -0.2500 0.065 1 9667
te(barseback,lag).3 -0.7400 -0.3200 0.086 1 9045
te(barseback,lag).4 -0.2400 -0.0039 0.220 1 7101
te(barseback,lag).5 -0.1500 0.0520 0.270 1 5968
te(barseback,lag).6 -0.2100 0.0038 0.220 1 6818
te(barseback,lag).7 -0.3000 -0.0270 0.250 1 7720
te(barseback,lag).8 -0.1900 0.1300 0.450 1 7362
te(barseback,lag).9 0.0064 0.2400 0.490 1 6144
te(barseback,lag).10 0.0052 0.3000 0.600 1 6200
te(barseback,lag).11 0.0084 0.3300 0.650 1 7904
te(barseback,lag).12 -0.4100 -0.0052 0.500 1 7929
te(barseback,lag).13 -0.5000 -0.1400 0.280 1 7552
te(barseback,lag).14 -0.4700 -0.1200 0.270 1 9181
te(barseback,lag).15 -0.5600 -0.1700 0.220 1 10409
Approximate significance of GAM smooths:
edf Ref.df Chi.sq p-value
te(la_hague,lag) 2.163 15 5.318 0.626
te(barseback,lag) 2.189 15 6.593 0.953
Stan MCMC diagnostics:
✔ No issues with effective samples per iteration
✔ Rhat looks good for all parameters
✔ No issues with divergences
✔ No issues with maximum tree depth
Samples were drawn using sampling(hmc). For each parameter, n_eff is a
crude measure of effective sample size, and Rhat is the potential scale
reduction factor on split MCMC chains (at convergence, Rhat = 1)
Use how_to_cite() to get started describing this model
mcmc_plot(model_Co60_mvgam)
plot_mvgam_smooth(model_Co60_mvgam, smooth = 1)
plot_mvgam_smooth(model_Co60_mvgam, smooth = 2)
plot(hindcast(model_Co60_mvgam), series = 1)
No non-missing values in test_observations; cannot calculate forecast score
plot(hindcast(model_Co60_mvgam), series = 2)
No non-missing values in test_observations; cannot calculate forecast score
plot(model_Co60_mvgam, type = "re")
No random effect smooths (bs = "re") in model formula
And the diagnostics:
plot_mvgam_resids(model_Co60_mvgam)
ppc(model_Co60_mvgam)
Iodine-129
For Iodine-129, Mattsson et al. (2025) report significant seasonal differences between summer and winter data. Because this effect is significant, and the collection season is available in our data, we chose to treat F. Serratus in summer and F. Serratus in winter as two separate ‘species’. We drop the mean measurement, which is denoted simply as F. Serratus:
data_I129 <- combine_data("I129", df_long_sardal, df_long_discharges, scale_data = FALSE)
plot_isotope(data_I129)
cor(diff(data_I129$sellafield), diff(data_I129$la_hague))
[1] -0.04429647
data_I129 <- droplevels(data_I129[!data_I129$series == 'I129_F_Serratus',])
data_I129 <- data_I129 %>%
add_lagged_matrices(n_max = MAX_LAG, n_min = MIN_LAG) %>%
apply_weights()
Model
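The formula below interacts each `te()` smooth with numeric indicator columns (`weights_ss`, `weights_sw`) via the `by` argument, so that each seasonal series gets its own distributed-lag surface while contributing zero to the other. A minimal sketch of how such indicator columns could be built (the series level names here are assumptions, and this is not the project's `apply_weights()` helper):

```r
# Sketch: 0/1 numeric columns that switch a smooth on only for the
# rows belonging to one series. Level names below are assumed.
add_series_indicators <- function(df) {
  df$weights_ss <- as.numeric(df$series == "I129_F_Serratus_summer")
  df$weights_sw <- as.numeric(df$series == "I129_F_Serratus_winter")
  df
}
```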
Let’s fit as before
if(FULL_RUN) {
formula <- sardal ~
s(series, bs = "re") +
te(sellafield, lag, k = 4, by=weights_ss) +
te(sellafield, lag, k = 4, by=weights_sw) +
te(la_hague, lag, k = 4, by=weights_ss) +
te(la_hague, lag, k = 4, by=weights_sw)
model_I129_mvgam <- mvgam(formula = formula,
family = lognormal(),
data = data_I129,
trend_model = 'None',
burnin = 2000,
samples = 4000,
parallel = TRUE,
threads = 6,
noncentered = TRUE,
priors = priors
)
save(model_I129_mvgam, file = "results/models/I129.mvgam.Rdata")
}
We can make the same plots and summaries as before:
summary(model_I129_mvgam)
GAM formula:
sardal ~ te(sellafield, lag, k = 4, by = weights_ss) + te(sellafield,
lag, k = 4, by = weights_sw) + te(la_hague, lag, k = 4, by = weights_ss) +
te(la_hague, lag, k = 4, by = weights_sw) + s(series, bs = "re")
Family:
lognormal
Link function:
identity
Trend model:
None
N series:
2
N timepoints:
27
Status:
Fitted using Stan
4 chains, each with iter = 6000; warmup = 2000; thin = 1
Total post-warmup draws = 16000
log(observation error) parameter estimates:
2.5% 50% 97.5% Rhat n_eff
sigma_obs[1] 0.50 0.71 1.00 1 10876
sigma_obs[2] 0.45 0.62 0.92 1 13362
GAM coefficient (beta) estimates:
2.5% 50% 97.5% Rhat n_eff
(Intercept) -2.30 -0.2000 2.20 1.01 654
te(sellafield,lag):weights_ss.1 -0.59 -0.0830 0.39 1.00 9934
te(sellafield,lag):weights_ss.2 -0.52 -0.0880 0.33 1.00 7692
te(sellafield,lag):weights_ss.3 -0.63 -0.1600 0.29 1.00 4931
te(sellafield,lag):weights_ss.4 -0.84 -0.2500 0.30 1.00 7858
te(sellafield,lag):weights_ss.5 -0.50 -0.0650 0.33 1.00 6516
te(sellafield,lag):weights_ss.6 -0.25 0.0730 0.40 1.00 6123
te(sellafield,lag):weights_ss.7 -0.17 0.1500 0.48 1.00 5630
te(sellafield,lag):weights_ss.8 -0.30 0.1200 0.52 1.00 7406
te(sellafield,lag):weights_ss.9 -0.42 0.0480 0.54 1.00 7035
te(sellafield,lag):weights_ss.10 -0.25 0.1200 0.49 1.00 6137
te(sellafield,lag):weights_ss.11 -0.15 0.2300 0.64 1.00 5503
te(sellafield,lag):weights_ss.12 -0.23 0.2700 0.85 1.00 6259
te(sellafield,lag):weights_ss.13 -0.40 0.0900 0.67 1.00 6490
te(sellafield,lag):weights_ss.14 -0.62 -0.1700 0.21 1.00 5636
te(sellafield,lag):weights_ss.15 -0.76 -0.2800 0.10 1.00 2658
te(sellafield,lag):weights_ss.16 -0.68 -0.1600 0.40 1.00 6406
te(sellafield,lag):weights_sw.1 -0.52 -0.0580 0.38 1.00 9113
te(sellafield,lag):weights_sw.2 -0.41 -0.0420 0.35 1.00 7492
te(sellafield,lag):weights_sw.3 -0.48 -0.0700 0.34 1.00 7402
te(sellafield,lag):weights_sw.4 -0.68 -0.1500 0.35 1.00 7436
te(sellafield,lag):weights_sw.5 -0.44 -0.0500 0.32 1.00 6211
te(sellafield,lag):weights_sw.6 -0.20 0.0840 0.38 1.00 4840
te(sellafield,lag):weights_sw.7 -0.15 0.1200 0.42 1.00 5306
te(sellafield,lag):weights_sw.8 -0.27 0.0980 0.48 1.00 6529
te(sellafield,lag):weights_sw.9 -0.36 0.0610 0.51 1.00 6741
te(sellafield,lag):weights_sw.10 -0.21 0.1200 0.46 1.00 6030
te(sellafield,lag):weights_sw.11 -0.17 0.1600 0.51 1.00 5742
te(sellafield,lag):weights_sw.12 -0.26 0.1900 0.67 1.00 6809
te(sellafield,lag):weights_sw.13 -0.42 0.0570 0.58 1.00 8426
te(sellafield,lag):weights_sw.14 -0.48 -0.0870 0.28 1.00 7450
te(sellafield,lag):weights_sw.15 -0.52 -0.1500 0.21 1.00 7014
te(sellafield,lag):weights_sw.16 -0.60 -0.1300 0.34 1.00 3830
te(la_hague,lag):weights_ss.1 -0.57 0.0082 0.55 1.00 8723
te(la_hague,lag):weights_ss.2 -0.47 -0.0210 0.41 1.00 6200
te(la_hague,lag):weights_ss.3 -0.57 -0.0830 0.29 1.00 4778
te(la_hague,lag):weights_ss.4 -0.85 -0.1600 0.30 1.00 4600
te(la_hague,lag):weights_ss.5 -0.48 -0.0390 0.37 1.00 5619
te(la_hague,lag):weights_ss.6 -0.31 -0.0160 0.27 1.00 4805
te(la_hague,lag):weights_ss.7 -0.27 0.0250 0.31 1.00 4561
te(la_hague,lag):weights_ss.8 -0.33 0.0720 0.50 1.00 4392
te(la_hague,lag):weights_ss.9 -0.41 -0.0180 0.34 1.00 4810
te(la_hague,lag):weights_ss.10 -0.30 -0.0190 0.24 1.00 4659
te(la_hague,lag):weights_ss.11 -0.27 0.0110 0.27 1.00 4168
te(la_hague,lag):weights_ss.12 -0.27 0.0740 0.47 1.00 4603
te(la_hague,lag):weights_ss.13 -0.47 0.0099 0.49 1.00 6458
te(la_hague,lag):weights_ss.14 -0.36 -0.0180 0.32 1.00 5808
te(la_hague,lag):weights_ss.15 -0.37 -0.0180 0.33 1.00 5829
te(la_hague,lag):weights_ss.16 -0.49 -0.0120 0.50 1.00 6616
te(la_hague,lag):weights_sw.1 -0.61 -0.0180 0.54 1.00 1188
te(la_hague,lag):weights_sw.2 -0.47 -0.0430 0.38 1.00 2017
te(la_hague,lag):weights_sw.3 -0.45 -0.0910 0.25 1.00 4502
te(la_hague,lag):weights_sw.4 -0.65 -0.1400 0.29 1.00 9352
te(la_hague,lag):weights_sw.5 -0.34 0.0460 0.45 1.00 7144
te(la_hague,lag):weights_sw.6 -0.18 0.0960 0.39 1.00 5600
te(la_hague,lag):weights_sw.7 -0.14 0.1300 0.41 1.00 4613
te(la_hague,lag):weights_sw.8 -0.22 0.1500 0.55 1.00 6994
te(la_hague,lag):weights_sw.9 -0.32 0.0320 0.39 1.00 6737
te(la_hague,lag):weights_sw.10 -0.21 0.0430 0.30 1.00 4987
te(la_hague,lag):weights_sw.11 -0.18 0.0620 0.31 1.00 4041
te(la_hague,lag):weights_sw.12 -0.22 0.1100 0.47 1.00 6193
te(la_hague,lag):weights_sw.13 -0.47 -0.0410 0.43 1.00 3398
te(la_hague,lag):weights_sw.14 -0.37 -0.0470 0.30 1.00 5712
te(la_hague,lag):weights_sw.15 -0.37 -0.0560 0.30 1.00 5963
te(la_hague,lag):weights_sw.16 -0.56 -0.0850 0.39 1.00 8213
s(series).1 -2.50 -0.1400 1.80 1.01 561
s(series).2 -2.20 0.0140 2.00 1.01 497
GAM group-level estimates:
2.5% 50% 97.5% Rhat n_eff
mean(s(series)) -1.800 -0.058 1.7 1.00 9811
sd(s(series)) 0.095 0.340 2.2 1.02 233
Approximate significance of GAM smooths:
edf Ref.df Chi.sq p-value
te(sellafield,lag):weights_ss 3.5137 16 2.376 0.438
te(sellafield,lag):weights_sw 3.1936 16 1.540 0.599
te(la_hague,lag):weights_ss 1.5923 16 0.099 0.930
te(la_hague,lag):weights_sw 1.5375 16 0.926 0.616
s(series) 0.4746 2 0.003 0.955
Stan MCMC diagnostics:
✔ No issues with effective samples per iteration
✔ Rhat looks good for all parameters
✔ No issues with divergences
✔ No issues with maximum tree depth
Samples were drawn using sampling(hmc). For each parameter, n_eff is a
crude measure of effective sample size, and Rhat is the potential scale
reduction factor on split MCMC chains (at convergence, Rhat = 1)
Use how_to_cite() to get started describing this model
mcmc_plot(model_I129_mvgam)
plot_mvgam_smooth(model_I129_mvgam, smooth = 1)
plot_mvgam_smooth(model_I129_mvgam, smooth = 2)
plot_mvgam_smooth(model_I129_mvgam, smooth = 3)
plot_mvgam_smooth(model_I129_mvgam, smooth = 4)
plot(hindcast(model_I129_mvgam), series = 1)
No non-missing values in test_observations; cannot calculate forecast score
Warning: Removed 6 rows containing missing values or values outside the scale range
(`geom_point()`).
plot(hindcast(model_I129_mvgam), series = 2)
No non-missing values in test_observations; cannot calculate forecast score
Warning: Removed 6 rows containing missing values or values outside the scale range
(`geom_point()`).
plot_mvgam_randomeffects(model_I129_mvgam)
And the diagnostics:
plot_mvgam_resids(model_I129_mvgam)
ppc(model_I129_mvgam)
Generative AI statement
Generative AI (GAI) was not used for text generation anywhere in this document or the final report.
GAI (Mistral 3, GPT-4, GPT-5) has been used for debugging and for creating and modifying visualization code (e.g. code for plot titles, labels, fonts, colour schemes).
GAI was not used for “mission-critical” code, such as the loading/wrangling of data and the specification of the model.
GAI was not used for generation or interpretation of the results.
All GAI output that was used has been reviewed and edited to ensure it is correct and achieves its intended purpose. The author takes full responsibility for the contents of this document.