The rplanes
package currently has a set of seven components that each perform plausibility analysis of epidemiological signals (PLANES). Each of the individual components assesses different characteristics of the evaluated signal, resolving to a binary indicator (i.e., TRUE
or FALSE
as to whether or not the signal is implausible). All components are wrapped via the plane_score()
function to generate an overall score based on the number of components for which a flag is raised.
Here we provide a narrative walk through of each of the individual PLANES components. The full list of components, along with the corresponding functions and parameters (if applicable), is provided in the table below. It is important to emphasize that rplanes
is designed to work for both forecasted and observed epidemiological signals. However, not all of the components will work on both types of signals. Several components (including coverage, taper, trend, and shape) can only be used to assess plausibility of forecasts.
Component | Description | Function | Forecast | Observed | Parameters |
---|---|---|---|---|---|
Difference | Point-to-point difference | plane_diff() | YES | YES | None |
Coverage | Prediction interval covers the most recent observation | plane_cover() | YES | NO | None |
Taper | Prediction interval narrows as horizon increases | plane_taper() | YES | NO | None |
Repeat | Values repeat more than expected | plane_repeat() | YES | YES | Tolerated number of repeats; Number of observations to prepend |
Trend | Signal exhibits change in trend compared to recent observations | plane_trend() | YES | NO | Significance level for trend change |
Shape | Shape of signal trajectory has not been observed in seed data | plane_shape() | YES | NO | Method used to identify unique shapes (sdiff or dtw) |
Zero | Zeros found in signal when not in seed | plane_zero() | YES | YES | None |
All components are designed to take a location (i.e., string specifying location code), the signal prepared with to_signal()
that will be evaluated, and the seed prepared with plane_seed()
that holds baseline characteristics. To further customize behavior, some of the functions accept additional arguments, each of which is detailed in the examples that follow. Each component function returns a list
that is standardized to at minimum return an “indicator” element, which returns TRUE
if the component flag is raised (i.e., data is implausible) and FALSE
if the component flag is not raised (i.e., data is not implausible).
It is important to emphasize that all of the individual components are wrapped in the plane_score()
function for ease-of-use, and in practice most users will likely use the wrapper instead of accessing the functions directly. However, in the examples below we use the functions outside of plane_score()
to more clearly demonstrate how they operate. Likewise, we have created mock forecast data in each of the examples to demonstrate usage. Users should refer to the “Basic Usage” vignette for practical guidance on how to prepare data for rplanes
analyses.
As mentioned above, the data evaluated in the examples that follow will be mock forecasts. We will compare these forecasted values to the HHS Protect incident flu hospitalization data that has been aggregated from daily to weekly resolution, and is provided as internal rplanes
package data. For more details on the data preparation for rplanes
see the “Basic Usage” vignette.
The code below will load the required packages, prepare the observed data as a signal with to_signal()
, and convert the observed signal to a seed with plane_seed()
. Note that for the examples below we use a cut date of “2022-06-04” for the seed data:
## load packages
library(rplanes)
library(dplyr)
library(ggplot2)
## read in observed data
<-
hosp_all read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) %>%
select(date, location, flu.admits) %>%
mutate(date = as.Date(date))
## prepare observed signal
<- to_signal(input = hosp_all, outcome = "flu.admits", type = "observed", resolution = "weeks", horizon = NULL)
observed_signal
## create seed with cut date
<- plane_seed(observed_signal, cut_date = "2022-06-04") prepped_seed
plane_diff()
The difference component checks the point-to-point differences for evaluated signal. This component can be used on either forecasts or observed signals. The function internally computes the maximum observed difference (using absolute value) and checks to see if any of the point-to-point differences for the evaluated data exceed that threshold. If so, the flag is raised.
The example below uses a forecasted signal in which the 1 week-ahead forecast dramatically jumps from the most recent observed data:
<- c(100, 120, 140, 160)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - c(10, 20, 30, 40),
## make a large jump in hospitalizations to trigger diff component
point = point_est,
upper = point_est + c(10, 20, 30, 40),
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 90 | 100 | 110 |
01 | 2022-06-18 | 2 | 100 | 120 | 140 |
01 | 2022-06-25 | 3 | 110 | 140 | 170 |
01 | 2022-07-02 | 4 | 120 | 160 | 200 |
The seed stores the last observed value and the maximum difference for the given location:
$`01`$last_value
prepped_seed#> [1] 19
$`01`$diff$max
prepped_seed#> [1] 21
We would expect the implausibility flag to be raised in this case:
plane_diff(location = "01", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] TRUE
#>
#> $values
#> [1] 19 100 120 140 160
#>
#> $evaluated_differences
#> [1] 81 20 20 20
#>
#> $maximum_difference
#> [1] 21
We can visualize the point-to-point differences and see where the data jumps with the forecast:
<-
diff_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
<-
diff_flags %>%
diff_dat filter(type == "forecast") %>%
filter(date == min(date))
%>%
diff_dat ggplot(mapping = aes(x = date, y = flu.admits)) +
geom_line(lty = "dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_point(data = diff_flags, mapping = aes(x = date, y = flu.admits), shape=23, size=4, color = "black") +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle("Difference component\nFlagged")
The next example will include data that does not “jump” in absolute difference beyond what has been observed in the time series previously:
<- c(28, 31, 34, 37)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - c(5, 10, 15, 20),
point = point_est,
upper = point_est + c(5, 10, 15, 20),
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 23 | 28 | 33 |
01 | 2022-06-18 | 2 | 21 | 31 | 41 |
01 | 2022-06-25 | 3 | 19 | 34 | 49 |
01 | 2022-07-02 | 4 | 17 | 37 | 57 |
Again, we can see the last value and maximum observed difference for the given location that will be used internally in the seed:
$`01`$last_value
prepped_seed#> [1] 19
$`01`$diff$max
prepped_seed#> [1] 21
Given this max difference, we would not expect the implausibility flag to be raised in this case:
plane_diff(location = "01", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] FALSE
#>
#> $values
#> [1] 19 28 31 34 37
#>
#> $evaluated_differences
#> [1] 9 3 3 3
#>
#> $maximum_difference
#> [1] 21
The plot below shows the forecasted data that would not raise the difference flag:
<-
diff_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
%>%
diff_dat ggplot(mapping = aes(x = date, y = flu.admits)) +
geom_line(lty = "dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle("Difference component\nNot flagged")
plane_cover()
The coverage component compares the prediction interval for the first horizon of the evaluated signal to the most recent value in the seed. If the interval does not cover the most recent data point, then the flag is raised as implausible. Because this component requires a prediction interval, it can only be used to assess plausibility of forecast signals.
We can create forecast data that includes a prediction interval that does not cover the most recent value in seed:
## make sure the 1 week-ahead point estimate and PI do not cover the last reported obs
<- c(60, 62, 64, 66)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - c(2, 4, 6, 8),
point = point_est,
upper = point_est + c(2, 4, 6, 8)
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 58 | 60 | 62 |
01 | 2022-06-18 | 2 | 58 | 62 | 66 |
01 | 2022-06-25 | 3 | 58 | 64 | 70 |
01 | 2022-07-02 | 4 | 58 | 66 | 74 |
The prediction interval is quite narrow and departs from the last observed value in the seed:
$`01`$last_value
prepped_seed#> [1] 19
We would expect the coverage flag to be raised:
plane_cover(location = "01", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] TRUE
#>
#> $last_value
#> [1] 19
#>
#> $bounds
#> $bounds$lower
#> [1] 58
#>
#> $bounds$upper
#> [1] 62
The plot below shows the coverage of the forecast prediction intervals in relation to the seed data:
<-
cover_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
<-
cov_flags %>%
cover_dat filter(type == "observed") %>%
filter(date == max(date))
ggplot(data = cover_dat, mapping = aes(x = date, y = flu.admits)) +
geom_line(lty="dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_point(data = cov_flags, mapping = aes(x = date, y = flu.admits), shape=23, size=4, color = "black") +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle(paste("Coverage component\nFlagged"))
We can put together an example where the prediction interval for the first horizon covers the most recent value in the seed data:
## make sure the 1 week-ahead point estimate and PI cover the last reported obs
<- c(28, 31, 34, 37)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - 28,
point = point_est,
upper = point_est + 28
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 0 | 28 | 56 |
01 | 2022-06-18 | 2 | 3 | 31 | 59 |
01 | 2022-06-25 | 3 | 6 | 34 | 62 |
01 | 2022-07-02 | 4 | 9 | 37 | 65 |
$`01`$last_value
prepped_seed#> [1] 19
Given the coverage, we would not expect the signal to be flagged as implausible:
plane_cover(location = "01", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] FALSE
#>
#> $last_value
#> [1] 19
#>
#> $bounds
#> $bounds$lower
#> [1] 0
#>
#> $bounds$upper
#> [1] 56
Again, we can visualize the coverage of the forecast relative to the seed data:
<-
cover_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
ggplot(data = cover_dat, mapping = aes(x = date, y = flu.admits)) +
geom_line(lty="dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle(paste("Coverage component\nNot flagged"))
plane_taper()
The taper component checks whether or not the prediction interval for the evaluated signal decreases in width (i.e., certainty increases) as horizons progress. Because this component requires a prediction interval, it can only be used to assess plausibility of forecast signals.
Here we create a mock forecast that will have a narrowing prediction interval:
<- c(30, 33, 36, 39)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
## make the lower and upper bounds get narrower as horizon increases
lower = point_est - c(20, 15, 10, 5),
point = point_est,
upper = point_est + c(20, 15, 10, 5)
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 10 | 30 | 50 |
01 | 2022-06-18 | 2 | 18 | 33 | 48 |
01 | 2022-06-25 | 3 | 26 | 36 | 46 |
01 | 2022-07-02 | 4 | 34 | 39 | 44 |
The width of the prediction interval narrows from 40 to 30 to 20 to 10 over the forecasted horizons. We would expect the taper flag to be raised:
plane_taper(location = "01", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] TRUE
#>
#> $widths
#> [1] 40 30 20 10
The plot below visually demonstrates the tapering effect:
<-
taper_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
<-
taper_flags %>%
taper_dat filter(type == "forecast")
%>%
taper_dat ggplot(data = taper_dat, mapping = aes(x = date, y = flu.admits)) +
geom_line(lty="dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_point(data = taper_flags, mapping = aes(x = date, y = flu.admits), shape=23, size=4, color = "black") +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu Hospital Admissions") +
ggtitle(paste("Taper component\nFlagged"))
Now we can look at an example where the forecasted prediction interval increases in width as horizons progress:
<- c(30, 33, 36, 39)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
## make the lower and upper bounds get wider as horizon increases
lower = point_est - c(5, 10, 15, 20),
point = point_est,
upper = point_est + c(5, 10, 15, 20)
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 25 | 30 | 35 |
01 | 2022-06-18 | 2 | 23 | 33 | 43 |
01 | 2022-06-25 | 3 | 21 | 36 | 51 |
01 | 2022-07-02 | 4 | 19 | 39 | 59 |
We would not expect the implausibility flag to be raised in this case:
plane_taper(location = "01", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] FALSE
#>
#> $widths
#> [1] 10 20 30 40
In the visualization below we see that the forecast prediction interval does not taper:
<-
taper_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
%>%
taper_dat ggplot(data = taper_dat, mapping = aes(x = date, y = flu.admits)) +
geom_line(lty="dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu Hospital Admissions") +
ggtitle(paste("Taper component\nNot flagged"))
plane_repeat()
The repeat component checks whether consecutive values in an observed or forecasted signal are repeated k times. When the seed is created, it stores the maximum number of consecutive repeats for each location and uses this as the default value for k. If the evaluated data exceeds k then the signal is considered implausible and a flag is raised.
The k threshold for repeats can be customized using the “tolerance” parameter. The function also allows users to customize the “prepend” length (i.e., the number of most recent values from seed to be concatenated with the evaluated signal while checking for repeats).
To illustrate the repeat parameters, we can contrive a simple example. Consider seed values of 11, 12, 13, 13, 13
and an evaluated forecast with point estimates 13, 13, 15, 16
. If the tolerance threshold is set at 4
and prepend length is 2
then the sequence 13, 13, 13, 13, 15, 16
would be checked for any set of more than four values repeated consecutively. In that case, no flag would be raised. The value 13
is repeated four times but we tolerate at most four repeats. However, if we keep the tolerance at 4
and change the prepend length to 3
, the evaluated sequence would be 13, 13, 13, 13, 13, 15, 16
, and a flag would be raised, because there were more repeats than the tolerance threshold.
For more on these parameters see ?plane_repeat()
.
We can mock up some example data that repeats the same point estimate:
## make sure the point estimates repeat
<- c(55, 55, 55, 55)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - c(5, 10, 15, 20),
point = point_est,
upper = point_est + c(5, 10, 15, 20)
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 50 | 55 | 60 |
01 | 2022-06-18 | 2 | 45 | 55 | 65 |
01 | 2022-06-25 | 3 | 40 | 55 | 70 |
01 | 2022-07-02 | 4 | 35 | 55 | 75 |
We can check the maximum number of repeats that have been seen in the seed data:
$`01`$max_repeats
prepped_seed#> [1] 2
Because the number of repeated point estimates we have defined above exceeds the maximum repeats we expect a flag to be raised:
plane_repeat(location = "01", input = prepped_forecast, seed = prepped_seed, tolerance = NULL, prepend = NULL)
#> $indicator
#> [1] TRUE
#>
#> $repeats
#> # A tibble: 4 × 6
#> point location date horizon lower upper
#> <dbl> <chr> <date> <int> <dbl> <dbl>
#> 1 55 01 2022-06-11 1 50 60
#> 2 55 01 2022-06-18 2 45 65
#> 3 55 01 2022-06-25 3 40 70
#> 4 55 01 2022-07-02 4 35 75
We can visualize the repeats in the forecast data:
<-
repeat_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
<-
repeat_flags %>%
repeat_dat filter(type == "forecast")
%>%
repeat_dat ggplot(data = repeat_dat, mapping = aes(x = date, y = flu.admits)) +
geom_line(lty="dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_point(data = repeat_flags, mapping = aes(x = date, y = flu.admits), shape=23, size=4, color = "black") +
geom_ribbon(aes(ymin = lower, ymax = upper,fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle(paste("Repeat component\nFlagged"))
As described above, the “tolerance” parameter allows the user to override the default behavior that sets the maximum number of repeats via the seed. Setting a higher tolerance will decrease the sensitivity of the repeat assessment. In this example, if we increase the tolerance to 4
then we would not expect the flag to be raised:
plane_repeat(location = "01", input = prepped_forecast, seed = prepped_seed, tolerance = 4, prepend = NULL)
#> $indicator
#> [1] FALSE
#>
#> $repeats
#> # A tibble: 0 × 6
#> # ℹ 6 variables: point <dbl>, location <chr>, date <date>, horizon <int>,
#> # lower <dbl>, upper <dbl>
Here we prepare mock forecast data that does not have repeating point estimates:
## make sure the point estimates do not repeat
<- c(55, 57, 59, 61)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - c(5, 10, 15, 20),
point = point_est,
upper = point_est + c(5, 10, 15, 20)
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 50 | 55 | 60 |
01 | 2022-06-18 | 2 | 47 | 57 | 67 |
01 | 2022-06-25 | 3 | 44 | 59 | 74 |
01 | 2022-07-02 | 4 | 41 | 61 | 81 |
We can see the maximum number of repeats in the seed:
$`01`$max_repeats
prepped_seed#> [1] 2
Based on this threshold, we would not expect the implausibility flag for repeats to be raised:
plane_repeat(location = "01", input = prepped_forecast, seed = prepped_seed, tolerance = NULL, prepend = NULL)
#> $indicator
#> [1] FALSE
#>
#> $repeats
#> # A tibble: 0 × 6
#> # ℹ 6 variables: point <dbl>, location <chr>, date <date>, horizon <int>,
#> # lower <dbl>, upper <dbl>
Lastly, we can visualize the repeats for the signal:
<-
repeat_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
%>%
repeat_dat ggplot(data = repeat_dat, mapping = aes(x = date, y = flu.admits)) +
geom_line(lty="dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle(paste("Repeat component\nNot flagged"))
plane_trend()
The trend component assesses whether or not there is a significant change in the magnitude or direction of the slope for the evaluated signal compared to the most recent data in the seed. If a “change point” is identified in any of the forecasted horizons and/or the most recent seed value, then the flag is raised for implausibility. The trend component requires at least four times as many seed values as there are evaluated values. Furthermore, the component currently can only be used with forecasted signals.
One of the parameters for the trend function is “sig_lvl”, which defines the significance level for the internal permutation test used to detect change points. By default this value is set to 0.1
. The significance level determines the sensitivity of the trend plausibility assessment, with a lower value corresponding to a less sensitive evaluation.
For more on the trend algorithm methods see ?plane_trend()
.
Here we create some example data that doubles with each forecasted horizon:
<- c(25, 50, 100, 200)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - c(5, 10, 20, 40),
point = point_est,
upper = point_est + c(5, 10, 20, 40),
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 20 | 25 | 30 |
01 | 2022-06-18 | 2 | 40 | 50 | 60 |
01 | 2022-06-25 | 3 | 80 | 100 | 120 |
01 | 2022-07-02 | 4 | 160 | 200 | 240 |
We expect the dramatic increase in slope will be detected as a change point and a flag raised:
plane_trend(location = "01", input = prepped_forecast, seed = prepped_seed, sig_lvl = 0.1)
#> $indicator
#> [1] TRUE
#>
#> $output
#> # A tibble: 20 × 7
#> Location Index Date Value Type Changepoint Flagged
#> <chr> <int> <date> <dbl> <chr> <lgl> <lgl>
#> 1 01 1 2022-02-19 20 Observed FALSE FALSE
#> 2 01 2 2022-02-26 14 Observed FALSE FALSE
#> 3 01 3 2022-03-05 35 Observed FALSE FALSE
#> 4 01 4 2022-03-12 30 Observed FALSE FALSE
#> 5 01 5 2022-03-19 30 Observed FALSE FALSE
#> 6 01 6 2022-03-26 25 Observed FALSE FALSE
#> 7 01 7 2022-04-02 32 Observed FALSE FALSE
#> 8 01 8 2022-04-09 29 Observed FALSE FALSE
#> 9 01 9 2022-04-16 19 Observed FALSE FALSE
#> 10 01 10 2022-04-23 30 Observed FALSE FALSE
#> 11 01 11 2022-04-30 13 Observed FALSE FALSE
#> 12 01 12 2022-05-07 25 Observed FALSE FALSE
#> 13 01 13 2022-05-14 15 Observed FALSE FALSE
#> 14 01 14 2022-05-21 12 Observed FALSE FALSE
#> 15 01 15 2022-05-28 22 Observed FALSE FALSE
#> 16 01 16 2022-06-04 19 Observed FALSE FALSE
#> 17 01 17 2022-06-11 25 Forecast FALSE FALSE
#> 18 01 18 2022-06-18 50 Forecast TRUE TRUE
#> 19 01 19 2022-06-25 100 Forecast FALSE FALSE
#> 20 01 20 2022-07-02 200 Forecast FALSE FALSE
#>
#> $flagged_dates
#> [1] "2022-06-18"
The plot below shows the forecasted value identified as a change point:
<-
trend_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
<-
trend_flags plane_trend(location = "01", input = prepped_forecast, seed = prepped_seed, sig_lvl = 0.1)$output %>%
filter(Changepoint == TRUE)
ggplot(data = trend_dat, mapping = aes(x = date, y = flu.admits)) +
geom_line(lty="dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
geom_point(data = trend_flags, mapping = aes(x = Date, y = Value), shape=23, size=4) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle(paste("Trend component\nFlagged"))
By toggling the significance level, we can control the sensitivity of the trend assessment. Here we lower the significance level to 0.001
and see that the flag is no longer raised using the same data as above:
plane_trend(location = "01", input = prepped_forecast, seed = prepped_seed, sig_lvl = 0.001)
#> $indicator
#> [1] FALSE
#>
#> $output
#> # A tibble: 20 × 7
#> Location Index Date Value Type Changepoint Flagged
#> <chr> <int> <date> <dbl> <chr> <lgl> <lgl>
#> 1 01 1 2022-02-19 20 Observed FALSE FALSE
#> 2 01 2 2022-02-26 14 Observed FALSE FALSE
#> 3 01 3 2022-03-05 35 Observed FALSE FALSE
#> 4 01 4 2022-03-12 30 Observed FALSE FALSE
#> 5 01 5 2022-03-19 30 Observed FALSE FALSE
#> 6 01 6 2022-03-26 25 Observed FALSE FALSE
#> 7 01 7 2022-04-02 32 Observed FALSE FALSE
#> 8 01 8 2022-04-09 29 Observed FALSE FALSE
#> 9 01 9 2022-04-16 19 Observed FALSE FALSE
#> 10 01 10 2022-04-23 30 Observed FALSE FALSE
#> 11 01 11 2022-04-30 13 Observed FALSE FALSE
#> 12 01 12 2022-05-07 25 Observed FALSE FALSE
#> 13 01 13 2022-05-14 15 Observed FALSE FALSE
#> 14 01 14 2022-05-21 12 Observed FALSE FALSE
#> 15 01 15 2022-05-28 22 Observed FALSE FALSE
#> 16 01 16 2022-06-04 19 Observed FALSE FALSE
#> 17 01 17 2022-06-11 25 Forecast FALSE FALSE
#> 18 01 18 2022-06-18 50 Forecast FALSE FALSE
#> 19 01 19 2022-06-25 100 Forecast FALSE FALSE
#> 20 01 20 2022-07-02 200 Forecast FALSE FALSE
#>
#> $flagged_dates
#> [1] NA
We can also make example data that reflects a consistent trend with the seed:
<- c(40, 41, 40, 43)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - c(5, 10, 15, 20),
point = point_est,
upper = point_est + c(5, 10, 15, 20),
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 35 | 40 | 45 |
01 | 2022-06-18 | 2 | 31 | 41 | 51 |
01 | 2022-06-25 | 3 | 25 | 40 | 55 |
01 | 2022-07-02 | 4 | 23 | 43 | 63 |
In this case we would not expect an implausibility flag to be raised:
plane_trend(location = "01", input = prepped_forecast, seed = prepped_seed, sig_lvl = 0.1)
#> $indicator
#> [1] FALSE
#>
#> $output
#> # A tibble: 20 × 7
#> Location Index Date Value Type Changepoint Flagged
#> <chr> <int> <date> <dbl> <chr> <lgl> <lgl>
#> 1 01 1 2022-02-19 20 Observed FALSE FALSE
#> 2 01 2 2022-02-26 14 Observed FALSE FALSE
#> 3 01 3 2022-03-05 35 Observed FALSE FALSE
#> 4 01 4 2022-03-12 30 Observed FALSE FALSE
#> 5 01 5 2022-03-19 30 Observed FALSE FALSE
#> 6 01 6 2022-03-26 25 Observed FALSE FALSE
#> 7 01 7 2022-04-02 32 Observed FALSE FALSE
#> 8 01 8 2022-04-09 29 Observed FALSE FALSE
#> 9 01 9 2022-04-16 19 Observed FALSE FALSE
#> 10 01 10 2022-04-23 30 Observed FALSE FALSE
#> 11 01 11 2022-04-30 13 Observed FALSE FALSE
#> 12 01 12 2022-05-07 25 Observed FALSE FALSE
#> 13 01 13 2022-05-14 15 Observed FALSE FALSE
#> 14 01 14 2022-05-21 12 Observed FALSE FALSE
#> 15 01 15 2022-05-28 22 Observed FALSE FALSE
#> 16 01 16 2022-06-04 19 Observed FALSE FALSE
#> 17 01 17 2022-06-11 40 Forecast FALSE FALSE
#> 18 01 18 2022-06-18 41 Forecast FALSE FALSE
#> 19 01 19 2022-06-25 40 Forecast FALSE FALSE
#> 20 01 20 2022-07-02 43 Forecast FALSE FALSE
#>
#> $flagged_dates
#> [1] NA
The visualization below shows the consistency of the forecasted trend with the comparison data in the seed:
<-
trend_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
<-
trend_flags plane_trend(location = "01", input = prepped_forecast, seed = prepped_seed, sig_lvl = 0.1)$output %>%
filter(Changepoint == TRUE)
ggplot(data = trend_dat, mapping = aes(x = date, y = flu.admits)) +
geom_line(lty="dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
geom_point(data = trend_flags, mapping = aes(x = Date, y = Value), shape=23, size=4) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle(paste("Trend component\nNot flagged"))
Note that there are some cases where plane_trend()
will identify a change point in the seed data but will not raise an implausibility flag for the forecast. The trend component checks for change points in all forecasted horizons and the most recent value in the seed. If there is a significant change point found elsewhere in the seed time series, the function will not raise a flag. However, the output includes any change points detected regardless of whether or not they raised on an implausibility flag.
To demonstrate this, we can look at location “06”. For this example, we need to define a new prepped_seed
object because this change point occurs after our previously defined cut date:
## create seed with cut date
<- plane_seed(observed_signal, cut_date = "2022-10-29")
prepped_seed2 <- c(40, 41, 40, 43)
point_est
<-
prepped_forecast tibble(
location = "06",
date = seq(as.Date("2022-11-05"), as.Date("2022-11-26"), by = 7),
horizon = 1:4,
lower = point_est - c(5, 10, 15, 20),
point = point_est,
upper = point_est + c(5, 10, 15, 20),
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
06 | 2022-11-05 | 1 | 35 | 40 | 45 |
06 | 2022-11-12 | 2 | 31 | 41 | 51 |
06 | 2022-11-19 | 3 | 25 | 40 | 55 |
06 | 2022-11-26 | 4 | 23 | 43 | 63 |
In this case we would not expect an implausibility flag to be raised, but we do see a change point:
plane_trend(location = "06", input = prepped_forecast, seed = prepped_seed2, sig_lvl = 0.1)
#> $indicator
#> [1] FALSE
#>
#> $output
#> # A tibble: 20 × 7
#> Location Index Date Value Type Changepoint Flagged
#> <chr> <int> <date> <dbl> <chr> <lgl> <lgl>
#> 1 06 1 2022-07-16 102 Observed FALSE FALSE
#> 2 06 2 2022-07-23 64 Observed FALSE FALSE
#> 3 06 3 2022-07-30 46 Observed FALSE FALSE
#> 4 06 4 2022-08-06 39 Observed FALSE FALSE
#> 5 06 5 2022-08-13 38 Observed FALSE FALSE
#> 6 06 6 2022-08-20 39 Observed FALSE FALSE
#> 7 06 7 2022-08-27 33 Observed FALSE FALSE
#> 8 06 8 2022-09-03 35 Observed FALSE FALSE
#> 9 06 9 2022-09-10 29 Observed FALSE FALSE
#> 10 06 10 2022-09-17 24 Observed TRUE FALSE
#> 11 06 11 2022-09-24 35 Observed FALSE FALSE
#> 12 06 12 2022-10-01 72 Observed FALSE FALSE
#> 13 06 13 2022-10-08 93 Observed FALSE FALSE
#> 14 06 14 2022-10-15 97 Observed FALSE FALSE
#> 15 06 15 2022-10-22 124 Observed FALSE FALSE
#> 16 06 16 2022-10-29 211 Observed FALSE FALSE
#> 17 06 17 2022-11-05 40 Forecast FALSE FALSE
#> 18 06 18 2022-11-12 41 Forecast FALSE FALSE
#> 19 06 19 2022-11-19 40 Forecast FALSE FALSE
#> 20 06 20 2022-11-26 43 Forecast FALSE FALSE
#>
#> $flagged_dates
#> [1] NA
Note that on 2022-09-17, a change point was detected, but because it was not in our forecast, nor was it the last observed data point there was no flag raised.
The visualization below shows the change point in the observed data:
<-
trend_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "06") %>% filter(date <= "2022-10-29") %>% mutate(type="observed"), . )
<-
trend_flags plane_trend(location = "06", input = prepped_forecast, seed = prepped_seed2, sig_lvl = 0.1)$output %>%
filter(Changepoint == TRUE)
ggplot(data = trend_dat, mapping = aes(x = date, y = flu.admits)) +
geom_line(lty="dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
geom_point(data = trend_flags, mapping = aes(x = Date, y = Value), shape=23, size=4) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle(paste("Trend component\nNot flagged but change point in seed"))
plane_shape()
The shape component evaluates the shape of the trajectory of the forecast signal and compares that shape to existing shapes in the observed seed data. If the shape is identified as novel, a flag is raised, and the signal is considered implausible.
This component has one additional argument that defines the method used to identify shapes - one of “sdiff” (scaled difference; set as default) or “dtw” (Dynamic Time Warping). Based on preliminary analyses, the “dtw” method has a higher sensitivity and a slightly lower specificity than the “sdiff” method but is much more computationally expensive.
For more information on the shape algorithm, see ?plane_shape()
.
In the example below, we set the point estimates and prediction intervals of the forecast signal to a shape that is novel compared to the seed data (see plot below).
<- c(60, 60, 60, 10)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - 10,
## make an unusual shape in hospitalizations to trigger shape component
point = point_est,
upper = point_est + 10,
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 50 | 60 | 70 |
01 | 2022-06-18 | 2 | 50 | 60 | 70 |
01 | 2022-06-25 | 3 | 50 | 60 | 70 |
01 | 2022-07-02 | 4 | 0 | 10 | 20 |
We would expect an implausibility flag to be raised for this example:
plane_shape(location = "01", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] TRUE
The indicator is TRUE
, meaning that the forecast is implausible, because the shape is novel relative to the seed data.
We can visualize the shape differences and see why this shape is flagged. The forecast in the plot below (red line) clearly looks different than any shape in the observed seed data (blue line):
<-
shape_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
<-
shape_flags %>%
shape_dat filter(type == "forecast")
%>%
shape_dat ggplot(mapping = aes(x = date, y = flu.admits)) +
geom_line(lty = "dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
geom_point(data = shape_flags, mapping = aes(x = date, y = flu.admits), shape=23, size=4, color = "black") +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle("Shape component\nFlagged")
Next we’ll look at an example of a forecast with a familiar shape that shouldn’t trigger a flag:
<- c(28, 18, 30, 20)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = point_est - 10,
## make a familiar shape in hospitalizations to not trigger shape component
point = point_est,
upper = point_est + 10,
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 18 | 28 | 38 |
01 | 2022-06-18 | 2 | 8 | 18 | 28 |
01 | 2022-06-25 | 3 | 20 | 30 | 40 |
01 | 2022-07-02 | 4 | 10 | 20 | 30 |
We would not expect an implausibility flag to be raised for this example:
plane_shape(location = "01", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] FALSE
The indicator is FALSE
, meaning that the forecast is considered plausible, because the shape is familiar relative to the seed data.
We can visualize the shape similarities/differences and see why this shape is not flagged. The forecast in the plot below (red line) looks very similar to the shape that we see in the observed seed data (blue line) between mid April and mid June:
<-
shape_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
%>%
shape_dat ggplot(mapping = aes(x = date, y = flu.admits)) +
geom_line(lty = "dotted") +
geom_line(aes(colour = type)) +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle("Shape component\nNot Flagged")
plane_zero()
This function checks for the presence of any value(s) equal to zero in the evaluated signal. If there are any zeros found, then the function will look in the seed to see if there are zeros anywhere else in the time series. If so, the function will consider the evaluated zero plausible and no flags will be raised (i.e., indicator returned as FALSE
). If not, the function will consider the evaluated zero implausible and a flag will be raised (i.e., indicator returned as TRUE
). This function can be used on either forecast or observed signals.
In the example below, we add a zero to the signal point estimate to a location for which the seed has no zeros:
<- c(31, 30, 31, 0)
point_est
<-
prepped_forecast tibble(
location = "01",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = c(26,24,24,0),
## add zeros in hospitalizations to trigger zero component
point = point_est,
upper = c(36,36,38,15)
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
01 | 2022-06-11 | 1 | 26 | 31 | 36 |
01 | 2022-06-18 | 2 | 24 | 30 | 36 |
01 | 2022-06-25 | 3 | 24 | 31 | 38 |
01 | 2022-07-02 | 4 | 0 | 0 | 15 |
The seed stores a logical indicating whether any zeros are present in the seed data, and there are none in this example:
$`01`$any_zeros
prepped_seed#> [1] FALSE
We would then expect an implausibility flag to be raised for this location:
plane_zero(location = "01", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] TRUE
The indicator is TRUE
, meaning that the forecast signal is implausible because there are zeros in the forecast signal but not in the observed seed.
We can visualize this below. The signal in red (a forecast in this example) has a zero in early July, but there were no zeros found in the seed data in blue.
<-
zero_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "01") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
<- zero_dat %>%
zero_flags filter(flu.admits == 0)
%>%
zero_dat ggplot(mapping = aes(x = date, y = flu.admits)) +
geom_line(lty = "dotted") +
geom_line(aes(colour = type)) +
geom_point(data = zero_flags, mapping = aes(x = date, y = flu.admits), shape=23, size=4, color = "black") +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle("Zero component\nFlagged")
This function will only trigger an implausibility flag if there is a zero in the signal (either observed or forecast) and there is not one in the seed. In other words, no flag will be raised when: (1) there are no zeros in the signal or (2) there are any zeros in the seed.
Let’s look at an example where there are zeros in the signal (location “02”) and in the seed. No flag should be triggered:
<- c(0, 6, 2, 3)
point_est
<-
prepped_forecast tibble(
location = "02",
date = seq(as.Date("2022-06-11"), as.Date("2022-07-02"), by = 7),
horizon = 1:4,
lower = c(0,5,0,1),
## add zeros in hospitalizations
point = point_est,
upper = c(1,7,4,5),
%>%
) to_signal(outcome = "flu.admits", type = "forecast", horizon = 4)
$data prepped_forecast
location | date | horizon | lower | point | upper |
---|---|---|---|---|---|
02 | 2022-06-11 | 1 | 0 | 0 | 1 |
02 | 2022-06-18 | 2 | 5 | 6 | 7 |
02 | 2022-06-25 | 3 | 0 | 2 | 4 |
02 | 2022-07-02 | 4 | 1 | 3 | 5 |
The seed stores a logical indicating whether any zeros are present in the seed data, and there are zeros in this example:
$`02`$any_zeros
prepped_seed#> [1] TRUE
We then would not expect an implausibility flag to be raised for this location:
plane_zero(location = "02", input = prepped_forecast, seed = prepped_seed)
#> $indicator
#> [1] FALSE
The indicator is FALSE
, meaning that the forecast signal is plausible, because there are zeros in the forecast signal and in the observed seed.
We can visualize this below. The signal in red (a forecast in this example) has a zero around early-mid June, but there were also zeros found in the seed data in blue.
<-
zero_dat $data %>%
prepped_forecastmutate(type = "forecast") %>%
rename(flu.admits = point) %>%
bind_rows(observed_signal$data %>% filter(location == "02") %>% filter(date <= "2022-06-04") %>% mutate(type="observed"), . )
<- zero_dat %>%
zero_flags filter(flu.admits == 0)
%>%
zero_dat ggplot(mapping = aes(x = date, y = flu.admits)) +
geom_line(lty = "dotted") +
geom_line(aes(colour = type)) +
geom_point(data = zero_flags, mapping = aes(x = date, y = flu.admits), shape=23, size=4, color = "black") +
geom_point(aes(colour = type)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = type), alpha = 0.2) +
xlab("Date") +
ylab("Flu hospitalizations") +
ggtitle("Zero component\nNot Flagged")