validation_tests

Notation. For symbol definitions, see the notation vignette.

Overview

The aim of this vignette is show how to perform correct validation tests using closest-not-yet-treated control groups using childpen.

Simulate data

The package has a built in simulation function to draw data resembling child penalty studies.

library(childpen)

data <- simulate_data(n_individuals = 5000, treatment_groups = 24:28)
data |> tibble()
#> # A tibble: 40,000 × 5
#>       id female   age     D      Y
#>    <int>  <int> <int> <int>  <dbl>
#>  1     1      1    20    24 36267.
#>  2     1      1    21    24 46761.
#>  3     1      1    22    24 40172.
#>  4     1      1    23    24 61655.
#>  5     1      1    24    24 39286.
#>  6     1      1    25    24 52229.
#>  7     1      1    26    24 39057.
#>  8     1      1    27    24 59584.
#>  9     2      1    20    28 50250.
#> 10     2      1    21    28 52739.
#> # ℹ 39,990 more rows

The correct validation tests

See the estimation vignette for an explainer on the \(2\times2\) comparisons in childpen. Recall that \(d\) is the treatment group, \(a\) is the target age, and \(d^\prime=a+1\) is the closest not-yet-treated control group. Recall that the control group is \(d' = a + 1\), the cohort whose first birth is one year after the target age.

Assume that when presenting results, post-treatment, you report estimates for event times \(e=0,...,2\). Then, for each treatment group \(d\) you use 3 different control groups in post-treatment estimation. As the identification assumptions (e.g., parallel trends for DID) must hold for each point-estimate separately, this implies that it must hold within each treatment-control pair.

The above argument means that the validation tests should be done separately by treatment-control combinations. Returning to the above example, if you want to show results for \(e=0,...,2\) then you need to conduct pre-trend analysis for 3 different control groups. This is done automatically in the childpen package, as we show below.

For completeness, the validation tests are:

  1. Difference-in-differences (DID) estimates the average treatment effect (ATE) in pre-periods
  2. Triple differences (TD) estimates the gender gap in the ATE in pre-periods
  3. Normalized triple differences (NTD) estimates the gender gap in normalized effects in pre-periods

Multiple treatment group analysis

We will now do the main heavy lifting. We run the main estimation function, multiple_treatment_group_analysis(). Set periods_pre to the number of pre-treatment periods for which you want to conduct validation tests. As an example, we will examine two periods pre-treatment. Since we set the number of periods in the post-treatment to 2 using periods_post, this will report validation tests separately for 3 control groups, as discussed above.

res = multiple_treatment_group_analysis(data = data,
                                  treatment_groups = 24:25, # which treatment groups to run in the analysis
                                  periods_post = 2, # estimate results for post periods 0:2
                                  periods_pre = 2, # estimate pre-trend diagnostics, set to NULL to omit from estimation
                                  max_age = 40, # dont estimate results if age is above 40
                                  min_age = 20, # dont estimate results if age is below 20
                                  pre = 1, # use 1 period before treatment, can make further away if anticipation is concern
                                  verbose = FALSE # set to TRUE to output progress (i like to time loops) set to FALSE to omit messages
                                  )

Examining results of validation tests

As a first pass, lets see the results.

res |> tibble()
#> # A tibble: 270 × 16
#>        d    dp     a event_time estimand  method            est      se     ci_l
#>    <int> <dbl> <int>      <int> <chr>     <chr>           <dbl>   <dbl>    <dbl>
#>  1    24    25    24          0 APO       DID_Female    5.42e+4 7.95e+2  5.27e+4
#>  2    24    25    24          0 APO       DID_Male      6.23e+4 8.82e+2  6.06e+4
#>  3    24    25    24          0 ATE       DID_Female   -1.30e+4 7.02e+2 -1.43e+4
#>  4    24    25    24          0 ATE       DID_Male     -3.71e+3 8.27e+2 -5.34e+3
#>  5    24    25    24          0 theta     DID_Female   -2.39e-1 1.03e-2 -2.59e-1
#>  6    24    25    24          0 theta     DID_Male     -5.96e-2 1.27e-2 -8.45e-2
#>  7    24    25    24          0 ATE       TD           -9.25e+3 1.08e+3 -1.14e+4
#>  8    24    25    24          0 theta     NTD_Conv     -1.79e-1 1.64e-2 -2.12e-1
#>  9    24    25    24          0 Delta_rho NTD_New      -1.66e-1 1.59e-2 -1.97e-1
#> 10    24    25    24          0 APO       TD_Null       5.05e+4 1.15e+3  4.83e+4
#> # ℹ 260 more rows
#> # ℹ 7 more variables: ci_h <dbl>, t <dbl>, p <dbl>, n_female_treat <int>,
#> #   n_female_control <int>, n_male_treat <int>, n_male_control <int>

Focusing on \(d=24\), lets examine pre-trends. We will start with DID of females. Generally, valid pre-trend validation tests would behave such that the confidence intervals include 0, and there is no obvious trend in the pre-period, and there is no systematic difference between control groups. A valid pre-trend test shows point estimates near zero with confidence intervals covering zero and no systematic trend across pre-treatment event times.

Note that in the plot below I define control_offset as the difference between the control group \(d^\prime\) and the treatment group \(d\). E.g., for \(d=24\) and \(d^\prime=25\), i.e., the closest not-yet-treated control group at event time \(e=0\), I set control offset to 1.

Ribbons present 95% CI based on standard errors clustered at the individual level.

res |>
  filter(d == 24,
         a < d,
         estimand == "ATE",
         method == "DID_Female") |>
  mutate(control_offset = dp - d,
         control_offset = factor(control_offset)) |>
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) +
  geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() +
  scale_x_continuous(breaks = -3:-2) +
  facet_grid(cols = vars(control_offset))

Although this would be hard to look at, we can put all control offsets on same plot.

res |>
  filter(d == 24,
         a < d,
         estimand == "ATE",
         method == "DID_Female") |>
  mutate(control_offset = dp - d, 
         control_offset = factor(control_offset)) |> 
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) +
  geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + 
  scale_x_continuous(breaks = -3:-2)

Can do this for multiple treatment groups at same time

res |> 
  filter(a < d, 
         estimand == "ATE",
         method == "DID_Female") |> 
  mutate(control_offset = dp - d, 
         control_offset = factor(control_offset)) |> 
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) +
  geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + 
  scale_x_continuous(breaks = -3:-2) + 
  facet_grid(cols = vars(d))

Finally, can do this for all methods.

res |> 
  filter(a < d, 
         estimand == "ATE" & (method == "DID_Female" | method == "DID_Male" | method == "TD") |
           estimand == "theta" & method == "NTD_Conv") |>
  mutate(control_offset = dp - d, 
         control_offset = factor(control_offset)) |> 
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) +
  geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + 
  scale_x_continuous(breaks = -3:-2) + 
  facet_grid(cols = vars(d), rows = vars(method), scales = "free")