| Title: | Difference-in-Differences with Unpoolable Data |
| Version: | 3.0.0 |
| Maintainer: | Eric Jamieson <ericbrucejamieson@gmail.com> |
| Description: | A framework for estimating difference-in-differences with unpoolable data, based on Karim, Webb, Austin, and Strumpf (2025) <doi:10.48550/arXiv.2403.15910>. Supports common or staggered adoption, multiple groups, and the inclusion of covariates. Also computes p-values for the aggregate average treatment effect on the treated via the randomization inference procedure described in MacKinnon and Webb (2020) <doi:10.1016/j.jeconom.2020.04.024>. |
| License: | MIT + file LICENSE |
| Depends: | R (≥ 4.0) |
| Imports: | graphics, grDevices, stats, utils |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| URL: | https://github.com/ebjamieson97/undidR, https://ebjamieson97.github.io/undidR/ |
| BugReports: | https://github.com/ebjamieson97/undidR/issues |
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| LazyData: | true |
| NeedsCompilation: | no |
| Packaged: | 2026-01-26 01:09:35 UTC; Eric Bruce Jamieson |
| Author: | Eric Jamieson [aut, cre, cph] |
| Repository: | CRAN |
| Date/Publication: | 2026-01-26 03:20:02 UTC |
undidR: Difference-in-Differences with Unpoolable Data
Description
Implements difference-in-differences with unpoolable data.
Details
A framework for estimating difference-in-differences with unpoolable data, based on Karim, Webb, Austin, and Strumpf (2025). Supports common or staggered adoption, multiple groups, and the inclusion of covariates. Also computes p-values for the aggregate average treatment effect on the treated via the randomization inference procedure described in MacKinnon and Webb (2020).
Stage One - Initialize
-
create_init_csv- Create initial CSV files for setup. -
create_diff_df- Creates the "diff matrix".
Stage Two - Silos
-
undid_stage_two- Calculates trends and differences.
Stage Three - Analysis
-
undid_stage_three- Computes ATTs and p-values.
Author(s)
Eric Jamieson
See Also
Useful links:
Report bugs at https://github.com/ebjamieson97/undidR/issues
Extract coefficients from UnDiDObj
Description
Extract coefficients from UnDiDObj
Usage
## S3 method for class 'UnDiDObj'
coef(object, level = c("agg", "sub"), ...)
Arguments
object |
A |
level |
Specify either |
... |
other arguments |
Value
A data frame of coefficient estimates
Creates the empty_diff_df.csv
Description
Creates the empty_diff_df.csv which lists all of the differences that
need to calculated at each silo in order to compute the aggregate ATT.
The empty_diff_df.csv is then to be sent out to each silo to be filled out.
Usage
create_diff_df(
init_filepath,
date_format,
freq,
covariates = FALSE,
freq_multiplier = FALSE,
weights = "both",
filename = "empty_diff_df.csv",
filepath = tempdir()
)
Arguments
init_filepath |
A character filepath to the |
date_format |
A character specifying the date format used in the
|
freq |
A character indicating the length of the time periods to be used
when computing the differences in mean outcomes between periods at each
silo. Options are: |
covariates |
A character vector specifying covariates to be considered
at each silo. If |
freq_multiplier |
A numeric value or |
weights |
A character indicating the weighting to use. The options are
|
filename |
A character filename for the created CSV file. Defaults to
|
filepath |
Filepath to save the CSV file. Defaults to |
Details
Ensure that dates in the init.csv are entered consistently
in the same date format. Call undid_date_formats() to see a list of valid
date formats. Covariates specified when calling create_diff_df() will
override any covariates specified in the init.csv.
Value
A data frame detailing the silo and time combinations for which differences must be calculated in order to compute the aggregate ATT. A CSV copy is saved to the specified directory which is then to be sent out to each silo.
Examples
file_path <- system.file("extdata/staggered", "init.csv",
package = "undidR")
create_diff_df(
init_filepath = file_path,
date_format = "yyyy",
freq = "yearly"
)
unlink(file.path(tempdir(), "empty_diff_df.csv"))
Creates the init.csv
Description
The create_init_csv() function generates a CSV file with information
on each silo's start times, end times, and treatment times.
If parameters are left empty, generates a blank CSV with only the headers.
Usage
create_init_csv(
silo_names = character(),
start_times = character(),
end_times = character(),
treatment_times = character(),
covariates = character(),
filename = "init.csv",
filepath = tempdir()
)
Arguments
silo_names |
A character vector of silo names. |
start_times |
A character vector of start times. |
end_times |
A character vector of end times. |
treatment_times |
A character vector of treatment times. |
covariates |
A character vector of covariates, or, |
filename |
A character filename for the created initializing CSV file.
Defaults to |
filepath |
Filepath to save the CSV file. Defaults to |
Details
Ensure dates are entered consistently in the same date format.
Call undid_date_formats() to view valid date formats. Control silos
should be marked as "control" in the treatment_times vector. If
covariates is FALSE, no covariate column will be included in the CSV.
Value
A data frame containing the contents written to the CSV file.
The CSV file is saved in the specified directory (or in a temporary
directory by default) with the default filename init.csv.
Examples
create_init_csv(
silo_names = c("73", "46", "54", "23", "86", "32",
"71", "58", "64", "59", "85", "57"),
start_times = "1989",
end_times = "2000",
treatment_times = c(rep("control", 6),
"1991", "1993", "1996", "1997", "1997", "1998"),
covariates = c("asian", "black", "male")
)
unlink(file.path(tempdir(), "init.csv"))
Plot method for UnDiDObj
Description
Plot method for UnDiDObj
Usage
## S3 method for class 'UnDiDObj'
plot(x, event = FALSE, event_window = NULL, ci = 0.95, ...)
Arguments
x |
A |
event |
Logical. If |
event_window |
Numeric vector of length 2 specifying the event window
as c(start, end). Default is |
ci |
Numeric between 0 and 1 specifying confidence level. Default is 0.95. |
... |
other arguments passed to plot |
Print method for UnDiDObj
Description
Print method for UnDiDObj
Usage
## S3 method for class 'UnDiDObj'
print(x, level = c("agg", "sub"), ...)
Arguments
x |
A |
level |
Specify either |
... |
other arguments |
Example merit data
Description
A dataset containing college enrollment and demographic data for analyzing the effects of merit programs in state 71.
Usage
silo71
Format
A tibble with 569 rows and 7 variables:
- coll
Binary indicator for college enrollment (outcome variable)
- merit
Binary indicator for merit program (treatment variable)
- male
Binary indicator for male students
- black
Binary indicator for Black students
- asian
Binary indicator for Asian students
- year
Year of observation
- state
State identifier
Source
https://economics.uwo.ca/people/conley_docs/code_to_download.html
Summary method for UnDiDObj
Description
Summary method for UnDiDObj
Usage
## S3 method for class 'UnDiDObj'
summary(object, level = c("all", "agg", "sub"), ...)
Arguments
object |
A |
level |
Specify either |
... |
other arguments |
Shows valid date formats
Description
The undid_date_formats() function returns a list of all valid date formats
that can be used within the undidR package.
Usage
undid_date_formats()
Details
The date formats returned by this function are used to ensure
consistency in date processing within the undidR package.
Value
A named list containing valid date formats:
-
General_Formats: General date formats compatible with the package. -
R_Specific_Formats: Date formats specific to R. -
Other_Formats: Formats seen sometimes in Stata.
Examples
undid_date_formats()
Computes UNDID results
Description
Takes in all of the filled diff df CSV files and uses them to compute group level ATTs as well as the aggregate ATT and its standard errors and p-values.
Usage
undid_stage_three(
dir_path,
agg = "g",
weights = "both",
covariates = FALSE,
notyet = FALSE,
nperm = 999,
verbose = 100,
check_anon_size = FALSE,
hc = "hc3",
only = NULL,
omit = NULL,
max_attempts = 100
)
Arguments
dir_path |
A character specifying the filepath to the folder containing all of the filled diff df CSV files. |
agg |
A character which specifies the aggregation methodology for
computing the aggregate ATT in the case of staggered adoption.
Options are: |
weights |
A string, determines which of the weighting methodologies
should be used. Options are: |
covariates |
A logical value (either |
notyet |
A logical value which declares if the not-yet-treated
differences from treated silos should be used as controls when computing
relevant sub-aggregate ATTs. Defaults to |
nperm |
Number of random permutations of treatment assignment to use
when calculating the randomization inference p-value. Defaults to |
verbose |
A numeric value (or |
check_anon_size |
A logical value, which if |
hc |
Specify which heteroskedasticity-consistent covariance matrix
estimator (HCCME) should be used. Options are |
only |
A character vector of silos to include. Defaults to |
omit |
A character vector of silos to omit. Defaults to |
max_attempts |
A numeric value. Sets the maximum number of attempts
to find a new unique random permutations during the randomization
inference procedure. Defaults to |
Value
An UnDiDObj with S3 methods of summary(), citation(),
print(), and coef().
Examples
# Execute `undid_stage_three()`
dir <- system.file("extdata/staggered", package = "undidR")
# Recommended: nperm >= 399 for reasonable precision
# (~15 seconds on typical hardware)
undid_stage_three(dir, agg = "g", nperm = 399, verbose = NULL)
Runs UNDID stage two procedures
Description
Based on the information given in the received empty_diff_df.csv,
computes the appropriate differences in mean outcomes at the local silo
and saves as filled_diff_df_$silo_name.csv. Also stores trends data
as trends_data_$silo_name.csv.
Usage
undid_stage_two(
empty_diff_filepath,
silo_name,
silo_df,
time_column,
outcome_column,
silo_date_format = NULL,
consider_covariates = TRUE,
filepath = tempdir(),
anonymize_weights = FALSE,
anonymize_size = 5
)
Arguments
empty_diff_filepath |
A character filepath to the |
silo_name |
A character indicating the name of the local silo. Ensure
spelling is the same as it is written in the |
silo_df |
A data frame of the local silo's data. Ensure any covariates
are spelled the same in this data frame as they are in the
|
time_column |
A character which indicates the name of the column in
the |
outcome_column |
A character which indicates the name of the column in
the |
silo_date_format |
A character which indicates the date format which
the date strings in the |
consider_covariates |
An optional logical parameter which if set to
|
filepath |
Character value indicating the filepath to
save the CSV files. Defaults to |
anonymize_weights |
A logical value (defaults |
anonymize_size |
A numeric value. Counts will be rounded to the nearest
multiple of this value if |
Details
Covariates at the local silo should be renamed to match the
spelling used in the empty_diff_df.csv.
Value
A list of data frames. The first being the filled differences data frame, and the second being the trends data data frame. Use the suffix $diff_df to access the filled differences data frame, and use $trends_data to access the trends data data frame.
Examples
# Load data
silo_data <- silo71
empty_diff_path <- system.file("extdata/staggered", "empty_diff_df.csv",
package = "undidR")
# Run `undid_stage_two()`
results <- undid_stage_two(
empty_diff_filepath = empty_diff_path,
silo_name = "71",
silo_df = silo_data,
time_column = "year",
outcome_column = "coll",
silo_date_format = "yyyy"
)
# View results
head(results$diff_df)
head(results$trends_data)
# Clean up temporary files
unlink(file.path(tempdir(), c("diff_df_71.csv",
"trends_data_71.csv")))