In this vignette we show how to simulate the individual claims data used in the simulation study of Hiabu, Hofman, and Pittarello (2023). The simulations are based on the SynthETIC package and can be used to replicate our results. In the manuscript, we named the \(5\) scenarios Alpha, Beta, Gamma, Delta, and Epsilon. All \(5\) scenarios share the data features described in the following table. The characteristics specific to each scenario are described in the coming sections.
Covariates | Description |
---|---|
claim_number | Policy identifier. |
claim_type \(\in \left\{0, 1 \right\}\) | Type of claim. |
AP | Accident month. |
RP | Reporting month. |

For each scenario we will show whether it satisfies the chain ladder assumptions (CL) and the proportionality assumption in Cox (1972) (PROP), and whether interactions are present (INT). Details on the simulation mechanism and the simulation parameters can be found in the manuscript.
Scenario Alpha is a mix of claim_type 0 and claim_type 1 with the same number of claims (i.e. the same claims volume) in each accident month.
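# Packages used by the code in this vignette. data_generator() is assumed to
# come from the ReSurv package; dplyr supplies %>% and mutate(), ggplot2 the plots.
library(ReSurv)
library(dplyr)
library(ggplot2)
# Input data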
input_data_0 <- data_generator(
random_seed = 1964,
scenario = "alpha",
time_unit = 1 / 360,
years = 4,
period_exposure = 200
)
input_data_0 %>%
as.data.frame() %>%
mutate(claim_type = as.factor(claim_type)) %>%
ggplot(aes(x = RT - AT, color = claim_type)) +
stat_ecdf(size = 1) +
labs(title = "Empirical distribution of simulated notification delays", x =
"Notification delay (in days)", y = "Cumulative Density") +
xlim(0, 1500) +
scale_color_manual(
values = c("royalblue", "#a71429"),
labels = c("Claim type 0", "Claim type 1")
) +
scale_linetype_manual(values = c(1, 3),
labels = c("Claim type 0", "Claim type 1")) +
guides(
color = guide_legend(title = "Claim type", override.aes = list(
color = c("royalblue", "#a71429"), size = 2
)),
linetype = guide_legend(
title = "Claim type",
override.aes = list(linetype = c(1, 3), size = 0.7)
)
) +
theme_bw()
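The empirical distribution curves can also be summarised numerically. The following is a minimal sketch in base R, assuming a data frame with the columns `AT`, `RT` and `claim_type` used in the plotting code. The helper `delay_quantiles()` and the data frame `toy_claims` are hypothetical names introduced for illustration, and the toy values are not simulated output.

```r
# Hypothetical helper: empirical quantiles of the notification delay (RT - AT)
# for each claim type.
delay_quantiles <- function(data, probs = c(0.25, 0.5, 0.75)) {
  delay <- data$RT - data$AT
  # One row per claim type, one column per requested quantile.
  t(sapply(split(delay, data$claim_type), quantile, probs = probs))
}

# Toy data (illustrative values only).
toy_claims <- data.frame(
  claim_type = c(0, 0, 0, 1, 1, 1),
  AT = c(10, 20, 30, 10, 20, 30),
  RT = c(15, 30, 45, 40, 70, 100)
)

q <- delay_quantiles(toy_claims)
q
```

Applied to a simulated data set, the same helper could be called as `delay_quantiles(as.data.frame(input_data_0))`, under the assumption that the generated object converts cleanly to a data frame as in the plotting code.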
Scenario Beta is similar to scenario Alpha, but the volume of claim_type 1 decreases in the most recent accident dates. When the longer-tailed bodily injuries have a decreasing claim volume, aggregated chain ladder methods will overestimate reserves; see Ajne (1994).
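The overestimation can be reproduced with a small deterministic example. The sketch below is not the simulation mechanism of the manuscript: the reporting patterns, the volumes and all object names are toy assumptions chosen to make the arithmetic easy to follow. It builds an aggregated cumulative triangle from a short-tailed and a long-tailed claim type, applies volume-weighted chain ladder factors, and compares the resulting reserve with the true one.

```r
# Cumulative reporting patterns by development period (toy values).
pattern <- list(
  type0 = c(0.8, 1.0, 1.0),  # short-tailed
  type1 = c(0.2, 0.6, 1.0)   # long-tailed
)
# Claim volume per accident period: constant for type 0, decreasing for type 1.
volume <- list(type0 = c(100, 100, 100), type1 = c(100, 50, 10))

n <- 3
# Aggregated cumulative counts: rows are accident periods,
# columns are development periods.
full <- outer(volume$type0, pattern$type0) + outer(volume$type1, pattern$type1)
# Keep only the observed upper-left triangle.
tri <- full
tri[row(tri) + col(tri) > n + 1] <- NA

# Volume-weighted chain ladder development factors.
f <- sapply(1:(n - 1), function(j) {
  obs <- !is.na(tri[, j + 1])
  sum(tri[obs, j + 1]) / sum(tri[obs, j])
})

# Latest observed diagonal and the factor needed to reach the ultimate.
latest <- sapply(1:n, function(i) tri[i, n + 1 - i])
to_ultimate <- sapply(1:n, function(i) prod(f[seq_len(n - 1) >= n + 1 - i]))

cl_reserve <- sum(latest * to_ultimate - latest)
true_reserve <- sum(full[, n] - latest)
c(chain_ladder = cl_reserve, true = true_reserve)
```

In this toy setting the chain ladder reserve is roughly 106.9 while the true outstanding count is 48. The development factors are estimated on older accident periods, where the long-tailed type has more weight than it does in the most recent periods.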
input_data_1 <- data_generator(
random_seed = 1964,
scenario = 1,
time_unit = 1 / 360,
years = 4,
period_exposure = 200
)
input_data_1 %>%
as.data.frame() %>%
mutate(claim_type = as.factor(claim_type)) %>%
ggplot(aes(x = RT - AT, color = claim_type)) +
stat_ecdf(size = 1) +
labs(title = "Empirical distribution of simulated notification delays", x =
"Notification delay (in days)", y = "Cumulative Density") +
xlim(0, 1500) +
scale_color_manual(
values = c("royalblue", "#a71429"),
labels = c("Claim type 0", "Claim type 1")
) +
scale_linetype_manual(values = c(1, 3),
labels = c("Claim type 0", "Claim type 1")) +
guides(
color = guide_legend(title = "Claim type", override.aes = list(
color = c("royalblue", "#a71429"), size = 2
)),
linetype = guide_legend(
title = "Claim type",
override.aes = list(linetype = c(1, 3), size = 0.7)
)
) +
theme_bw()
In scenario Gamma, an interaction between claim_type 1 and the accident period affects the claims occurrence. One could imagine a scenario where a change in consumer behavior or company policies results in different reporting patterns over time. For the last simulated accident month, the two reporting delay distributions are identical.
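To make the idea of such an interaction concrete, here is a toy sketch in base R. It is an assumption-laden illustration, not the data generating process of the manuscript: delays are drawn from exponential distributions, the effect size `beta` is arbitrary, and the effect of claim type 1 is assumed to shrink linearly with the accident period until it vanishes in the last one.

```r
set.seed(1)
n_per_cell <- 5000
max_ap <- 4

# One block of rows per combination of accident period and claim type.
grid <- expand.grid(AP = 1:max_ap, claim_type = 0:1)
toy <- grid[rep(seq_len(nrow(grid)), each = n_per_cell), ]

# Assumed interaction: claim type 1 reports more slowly in early accident
# periods, and the difference disappears at the last accident period.
beta <- -1
log_rate <- beta * toy$claim_type * (1 - toy$AP / max_ap)
toy$delay <- rexp(nrow(toy), rate = exp(log_rate))

# Mean delay per accident period (rows) and claim type (columns).
mean_delay <- tapply(
  toy$delay,
  list(AP = toy$AP, claim_type = toy$claim_type),
  mean
)
round(mean_delay, 2)
```

The column for claim type 0 stays roughly constant, while the column for claim type 1 decreases towards it and the two approximately coincide in the last row.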
# Input data
input_data_2 <- data_generator(
random_seed = 1964,
scenario = 2,
time_unit = 1 / 360,
years = 4,
period_exposure = 200
)
input_data_2 %>%
as.data.frame() %>%
mutate(claim_type = as.factor(claim_type)) %>%
ggplot(aes(x = RT - AT, color = claim_type)) +
stat_ecdf(size = 1) +
labs(title = "Empirical distribution of simulated notification delays", x =
"Notification delay (in days)", y = "Cumulative Density") +
xlim(0, 1500) +
scale_color_manual(
values = c("royalblue", "#a71429"),
labels = c("Claim type 0", "Claim type 1")
) +
scale_linetype_manual(values = c(1, 3),
labels = c("Claim type 0", "Claim type 1")) +
guides(
color = guide_legend(title = "Claim type", override.aes = list(
color = c("royalblue", "#a71429"), size = 2
)),
linetype = guide_legend(
title = "Claim type",
override.aes = list(linetype = c(1, 3), size = 0.7)
)
) +
theme_bw()
In scenario Delta, a seasonality effect that depends on the accident month is present for both claim_type 0 and claim_type 1. This could occur in a real-world setting with an increased workload during winter for certain claim types, or a reduced workforce during the summer holidays.
input_data_3 <- data_generator(
random_seed = 1964,
scenario = 3,
time_unit = 1 / 360,
years = 4,
period_exposure = 200
)
input_data_3 %>%
as.data.frame() %>%
mutate(claim_type = as.factor(claim_type)) %>%
ggplot(aes(x = RT - AT, color = claim_type)) +
stat_ecdf(size = 1) +
labs(title = "Empirical distribution of simulated notification delays", x =
"Notification delay (in days)", y = "Cumulative Density") +
xlim(0, 1500) +
scale_color_manual(
values = c("royalblue", "#a71429"),
labels = c("Claim type 0", "Claim type 1")
) +
scale_linetype_manual(values = c(1, 3),
labels = c("Claim type 0", "Claim type 1")) +
guides(
color = guide_legend(title = "Claim type", override.aes = list(
color = c("royalblue", "#a71429"), size = 2
)),
linetype = guide_legend(
title = "Claim type",
override.aes = list(linetype = c(1, 3), size = 0.7)
)
) +
theme_bw()
In scenario Epsilon, the data generating process violates the proportionality assumption in Cox (1972). We generate the data assuming that a) the covariates have an effect on the baseline and b) the proportionality assumption does not hold.
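A violation of proportionality can be detected on data with a standard diagnostic. The sketch below uses `coxph()` and `cox.zph()` from the survival package on toy data; it is one common check, not the procedure used in the manuscript. The Weibull shapes and the object names are assumptions made for illustration.

```r
library(survival)  # recommended package shipped with standard R installations

set.seed(1)
n <- 2000
toy <- data.frame(
  claim_type = rep(0:1, each = n),
  # Different Weibull shapes give hazards whose ratio changes over time.
  delay = c(rweibull(n, shape = 0.5, scale = 1),
            rweibull(n, shape = 2, scale = 1))
)

# All toy claims are reported, so every observation is an event.
fit <- coxph(Surv(delay) ~ claim_type, data = toy)

# Test based on scaled Schoenfeld residuals: a small p-value is evidence
# against proportional hazards.
zph <- cox.zph(fit)
p_value <- zph$table[nrow(zph$table), "p"]
p_value
```

With hazards this far from proportional the test rejects clearly. On data where proportionality holds, the p-value would typically not be small.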
# Input data
input_data_4 <- data_generator(
random_seed = 1964,
scenario = 4,
time_unit = 1 / 360,
years = 4,
period_exposure = 200
)
input_data_4 %>%
as.data.frame() %>%
mutate(claim_type = as.factor(claim_type)) %>%
ggplot(aes(x = RT - AT, color = claim_type)) +
stat_ecdf(size = 1) +
labs(title = "Empirical distribution of simulated notification delays", x =
"Notification delay (in days)", y = "Cumulative Density") +
xlim(0, 1500) +
scale_color_manual(
values = c("royalblue", "#a71429"),
labels = c("Claim type 0", "Claim type 1")
) +
scale_linetype_manual(values = c(1, 3),
labels = c("Claim type 0", "Claim type 1")) +
guides(
color = guide_legend(title = "Claim type", override.aes = list(
color = c("royalblue", "#a71429"), size = 2
)),
linetype = guide_legend(
title = "Claim type",
override.aes = list(linetype = c(1, 3), size = 0.7)
)
) +
theme_bw()