In this tutorial, we will use the lynx
dataset as an
example, which can be accessed directly from R. Let’s load the dataset
and visualize it:
data <- data.frame(year = seq(1821, 1934, by = 1), y = as.numeric(lynx))
data$x <- data$year - min(data$year)
plot(lynx)
Based on a visual examination of the dataset, we can observe an obvious 10-year quasi-periodic behavior in the lynx count with evolving amplitudes over time. Therefore, we will consider fitting the following sGP model:
\[ \begin{aligned} y_i|\lambda_i &\sim \text{Poisson}(\lambda_i) ,\\ \log(\lambda_i) &= \eta_i = \beta_0 + g(x_i) + \xi_i,\\ g &\sim \text{sGP} \bigg(a = \frac{2\pi}{10}, \sigma\bigg),\\ \xi_i &\sim N(0,\sigma_\xi). \end{aligned} \]
Here, each \(y_i\) represents the lynx count, \(x_i\) represents the number of years since 1821, and \(\xi_i\) is an observation-level random intercept to account for overdispersion effect.
To specify the priors for the sGP boundary conditions and the intercept parameter, we assume independent normal priors with mean 0 and variance 1000. For the overdispersion parameter \(\sigma_\xi\), we assign an exponential prior with \(P(\sigma_\xi > 1) = 0.01\).
To determine the prior for the standard deviation parameter \(\sigma\) of the sGP, we employ the concept
of predictive standard deviation (PSD). We start with an exponential
prior on the 50 years PSD: \[P(\sigma(50)>1) = 0.01.\] To convert
this prior to the original standard deviation parameter \(\sigma\), we use the
compute_d_step_sGPsd
function from the sGPfit
package:
To fit the sGP model with BayesGP, specify
model = "sGP"
, and then specify the frequency parameter
a
and the number of sB spline basis used to approximate
k
.
mod <- model_fit(formula = y ~ f(x = year,
model = "sGP",
a = 2*pi/10, k = 30,
sd.prior = list(prior = "exp", param = prior_SD, h = 2)) +
f(x = x,
model = "IID",
sd.prior = list(prior = "exp", param = 0.5)),
family = "Poisson",
data = data,
method = "aghq")
The fitted sGP model is presented below:
summary(mod)
#> Here are some posterior/prior summaries for the parameters:
#> name median q0.025 q0.975 prior prior:P1 prior:P2
#> 1 intercept 6.685 6.522 6.852 Normal 0.000 1e+03
#> 2 year (PSD) 0.461 0.364 0.581 Exponential 0.126 1e-02
#> 3 x (SD) 0.262 0.203 0.342 Exponential 0.500 5e-01
#> For Normal prior, P1 is its mean and P2 is its variance.
#> For Exponential prior, prior is specified as P(theta > P1) = P2.
We can similarly use predict
to evaluate the effect at
different locations:
pred_sGP <- predict(mod, variable = "year", newdata = data.frame(year = seq(min(data$year), max(data$year), by = 0.1)))
plot(mean ~ year, data = pred_sGP, type = 'l', col = 'black')
lines(q0.025 ~ year, data = pred_sGP, lty = 'dashed', col = 'grey')
lines(q0.975 ~ year, data = pred_sGP, lty = 'dashed', col = 'grey')
mod <- model_fit(formula = y ~ f(x = year,
model = "sGP",
a = 2*pi/10, k = 30,
sd.prior = list(prior = "exp", param = prior_SD, h = 2),
boundary.prior = list(prec = c(Inf, Inf))) +
f(x = x,
model = "IID",
sd.prior = list(prior = "exp", param = 0.5)),
family = "Poisson",
data = data,
method = "aghq")
par(oldpar)