| Type: | Package |
| Title: | Fast and Memory Efficient Fitting of Linear Models with High-Dimensional Fixed Effects |
| Version: | 2.0.0 |
| Description: | Fast and user-friendly estimation of generalized linear models with multiple fixed effects and cluster the standard errors. The method to obtain the estimated fixed-effects coefficients is based on Stammann (2018) <doi:10.48550/arXiv.1707.01815>, Gaure (2013) <doi:10.1016/j.csda.2013.03.024>, Berge (2018) https://ideas.repec.org/p/luc/wpaper/18-13.html, and Correia et al. (2020) <doi:10.1177/1536867X20909691>. This implementation is described in Vargas Sepulveda (2025) <doi:10.1371/journal.pone.0331178>. |
| License: | Apache License (≥ 2) |
| URL: | https://pacha.dev/capybara/, https://github.com/pachadotdev/capybara |
| BugReports: | https://github.com/pachadotdev/capybara/issues |
| Depends: | R (≥ 4.0.0) |
| Imports: | Formula, generics, ggplot2, MASS, rlang, stats |
| Suggests: | broom, knitr, rmarkdown, tinytest, units |
| LinkingTo: | armadillo4r, cpp4r |
| VignetteBuilder: | knitr |
| Config/roxygen2/version: | 8.0.0 |
| Encoding: | UTF-8 |
| LazyData: | true |
| NeedsCompilation: | yes |
| Packaged: | 2026-06-15 07:34:10 UTC; pacha |
| Author: | Mauricio Vargas Sepulveda
|
| Maintainer: | Mauricio Vargas Sepulveda <m.vargas.sepulveda@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-15 08:50:02 UTC |
Generalized Linear Models (GLMs) with high-dimensional k-way fixed effects
Description
Provides a routine to partial out factors with many levels during the optimization of the log-likelihood
function of the corresponding GLM. The package is based on the algorithm described in Stammann (2018). It also offers
an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and
multi-way clustered standard errors. This package is a ground up rewrite with multiple refactors, optimizations, and
new features compared to the original package alpaca. In its current state, the package is stable and future changes
will be limited to bug fixes and improvements, but not to altering the functions' arguments or outputs.
Author(s)
Maintainer: Mauricio Vargas Sepulveda m.vargas.sepulveda@gmail.com (ORCID)
Authors:
Mauricio Vargas Sepulveda m.vargas.sepulveda@gmail.com (ORCID)
Other contributors:
Joao Santos Silva [thesis advisor]
Yoto Yotov [contributor]
See Also
Useful links:
Report bugs at https://github.com/pachadotdev/capybara/issues
Broom Integration
Description
The provided broom methods do the following:
-
augment: Takes the input data and adds additional columns with the fitted values and residuals. -
glance: Extracts the deviance, null deviance, and the number of observations.' -
tidy: Extracts the estimated coefficients and their standard errors.
Usage
## S3 method for class 'feglm'
augment(x, newdata = NULL, ...)
## S3 method for class 'felm'
augment(x, newdata = NULL, ...)
## S3 method for class 'feglm'
glance(x, ...)
## S3 method for class 'felm'
glance(x, ...)
## S3 method for class 'feglm'
tidy(x, conf_int = FALSE, conf_level = 0.95, ...)
## S3 method for class 'felm'
tidy(x, conf_int = FALSE, conf_level = 0.95, ...)
Arguments
x |
A fitted model object. |
newdata |
Optional argument to use data different from the data used to fit the model. |
... |
Additional arguments passed to the method. |
conf_int |
Logical indicating whether to include the confidence interval. |
conf_level |
The confidence level for the confidence interval. |
Value
A tibble with the respective information for the augment, glance, and tidy methods.
Examples
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
fit <- fepoisson(ltrade ~ ldist, ross2004_subset,
control = fit_control(keep_data = TRUE)
)
broom::augment(fit)
broom::glance(fit)
broom::tidy(fit)
Autoplot method for feglm objects
Description
Extracts the estimated coefficients and their confidence intervals.
Extracts the estimated coefficients and their confidence intervals.
Usage
## S3 method for class 'feglm'
autoplot(object, ...)
## S3 method for class 'felm'
autoplot(object, ...)
Arguments
object |
A fitted model object. |
... |
Additional arguments passed to the method. In this case, the additional argument is |
Value
A ggplot object with the estimated coefficients and their confidence intervals.
A ggplot object with the estimated coefficients and their confidence intervals.
Examples
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
fit <- fepoisson(ltrade ~ ldist | ctry1, ross2004_subset)
autoplot(fit, conf_level = 0.99)
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
fit <- felm(ltrade ~ ldist | ctry1, ross2004_subset)
autoplot(fit, conf_level = 0.99)
Separation Example Datasets
Description
Nonexistence of estimates of Poisson models across different statistical packages.
Usage
correia2019
Format
correia2019
A list of data frames with three elements:
- example1
Data frame used to show lack of convergence
- example2
Data frame used to show lack of convergence with step-halving
- fe1
Data frame with fixed effects used with the 'alpaca' package
Source
'correia2019' GitHub repository (https://github.com/sergiocorreia/correia2019/blob/master/guides/nonexistence_benchmarks.md#r-packages)
Generate fixed effects tables
Description
Generate fixed effects tables
Usage
fe_table(
...,
n = 5,
coef_digits = 3,
latex = FALSE,
model_names = NULL,
caption = NULL,
label = NULL,
position = "htbp"
)
Arguments
... |
One or more model objects of |
n |
Number of levels to show per fixed effect dimension. Use
|
coef_digits |
Number of digits for fixed effect coefficients. The default is 3. |
latex |
Whether to output as LaTeX code. The default is |
model_names |
Optional vector of custom model names. |
caption |
Optional caption for the table (LaTeX only). |
label |
Optional label for cross-referencing (LaTeX only). |
position |
LaTeX float position specifier (LaTeX only). The default
is |
Value
A formatted fixed effects table of class summary_table.
Examples
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
m1 <- felm(ltrade ~ ldist | ctry1, ross2004_subset)
m2 <- fepoisson(ltrade ~ ldist | ctry1, ross2004_subset)
fe_table(m1, m2, model_names = c("Linear", "Poisson"))
GLM fitting with high-dimensional k-way fixed effects
Description
feglm can be used to fit generalized linear models with many high-dimensional fixed effects. The term fixed effect means having one intercept for each level in each category.
Usage
feglm(
formula = NULL,
data = NULL,
family = gaussian(),
weights = NULL,
vcov = NULL,
beta_start = NULL,
eta_start = NULL,
offset = NULL,
control = NULL
)
Arguments
formula |
an object of class |
data |
an object of class |
family |
the link function to be used in the model. Similar to glm.fit this has to be the result
of a call to a family function. Default is |
weights |
an optional string with the name of the prior weights variable in |
vcov |
an optional character string specifying the type of variance-covariance estimator.
One of |
beta_start |
an optional vector of starting values for the structural parameters in the linear predictor.
Default is |
eta_start |
an optional vector of starting values for the linear predictor. |
offset |
an optional formula or numeric vector specifying an a priori known component to be included in the
linear predictor. If a formula, it should be of the form |
control |
a named list of parameters for controlling the fitting process. See fit_control for details. |
Details
If feglm does not converge this is often a sign of linear dependence between one or more regressors and a fixed effects category. In this case, you should carefully inspect your model specification.
Value
A named list of class "feglm". The list contains the following fifteen elements:
coefficients |
a named vector of the estimated coefficients |
eta |
a vector of the linear predictor |
weights |
a vector of the weights used in the estimation |
hessian |
a matrix with the numerical second derivatives |
deviance |
the deviance of the model |
null_deviance |
the null deviance of the model |
conv |
a logical indicating whether the model converged |
iter |
the number of iterations needed to converge |
nobs |
a named vector with the number of observations used in the estimation indicating the dropped and perfectly predicted observations |
fe_levels |
a named vector with the number of levels in each fixed effects |
nms_fe |
a list with the names of the fixed effects variables |
formula |
the formula used in the model |
data |
the data used in the model after dropping non-contributing observations |
family |
the family used in the model |
control |
the control list used in the model |
vcov_type |
a character string indicating the variance-covariance type used: |
References
Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.
Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).
Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.
Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.
Examples
# check the felm examples for the details about clustered standard errors
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
fit <- feglm(ltrade ~ ldist | ctry1, ross2004_subset, family = poisson())
summary(fit)
LM fitting with high-dimensional k-way fixed effects
Description
feglm can be used to fit linear models with many high-dimensional fixed effects. The estimation procedure is based on unconditional maximum likelihood and can be interpreted as a “weighted demeaning” approach.
Usage
felm(formula = NULL, data = NULL, weights = NULL, vcov = NULL, control = NULL)
Arguments
formula |
an object of class |
data |
an object of class |
weights |
an optional string with the name of the prior weights variable in |
vcov |
an optional character string specifying the type of variance-covariance estimator.
One of |
control |
a named list of parameters for controlling the fitting process. See fit_control for details. |
Value
A named list of class "felm". The list contains the following
eleven elements:
coefficients |
a named vector of the estimated coefficients |
fitted_values |
a vector of the estimated dependent variable |
weights |
a vector of the weights used in the estimation |
hessian |
a matrix with the numerical second derivatives |
null_deviance |
the null deviance of the model |
nobs |
a named vector with the number of observations used in the estimation indicating the dropped and perfectly predicted observations |
fe_levels |
a named vector with the number of levels in each fixed effect |
nms_fe |
a list with the names of the fixed effects variables |
formula |
the formula used in the model |
data |
the data used in the model after dropping non-contributing observations |
control |
the control list used in the model |
References
Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.
Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).
Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.
Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.
Examples
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
# Model with fixed effects
fit <- felm(ltrade ~ ldist | ctry1, ross2004_subset)
summary(fit)
ross2004_subset <- ross2004[ross2004$year %in% c(1994, 1999), ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
# Model without fixed effects but with clustered standard errors
# Note: Use 0 to indicate no fixed effects when specifying clusters
fit <- felm(ltrade ~ ldist | 0 | year, ross2004_subset)
summary(fit)
Negative Binomial model fitting with high-dimensional k-way fixed effects
Description
A routine that uses the same internals as feglm.
Usage
fenegbin(
formula = NULL,
data = NULL,
weights = NULL,
beta_start = NULL,
eta_start = NULL,
init_theta = NULL,
link = c("log", "identity", "sqrt"),
offset = NULL,
control = NULL
)
Arguments
formula |
an object of class |
data |
an object of class |
weights |
an optional string with the name of the prior weights variable in |
beta_start |
an optional vector of starting values for the structural parameters in the linear predictor.
Default is |
eta_start |
an optional vector of starting values for the linear predictor. |
init_theta |
an optional initial value for the theta parameter (see glm.nb). |
link |
the link function. Must be one of |
offset |
an optional formula or numeric vector specifying an a priori known component to be included in the
linear predictor. If a formula, it should be of the form |
control |
a named list of parameters for controlling the fitting process. See fit_control for details. |
Value
A named list of class "feglm". The list contains the following
eighteen elements:
coefficients |
a named vector of the estimated coefficients |
eta |
a vector of the linear predictor |
weights |
a vector of the weights used in the estimation |
hessian |
a matrix with the numerical second derivatives |
deviance |
the deviance of the model |
null_deviance |
the null deviance of the model |
conv |
a logical indicating whether the model converged |
iter |
the number of iterations needed to converge |
theta |
the estimated theta parameter |
iter_outer |
the number of outer iterations |
conv_outer |
a logical indicating whether the outer loop converged |
nobs |
a named vector with the number of observations used in the estimation indicating the dropped and perfectly predicted observations |
fe_levels |
a named vector with the number of levels in each fixed effects |
nms_fe |
a list with the names of the fixed effects variables |
formula |
the formula used in the model |
data |
the data used in the model after dropping non-contributing observations |
family |
the family used in the model |
control |
the control list used in the model |
Examples
# check the felm examples for the details about clustered standard errors
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
fit <- fenegbin(ltrade ~ ldist | ctry1, ross2004_subset)
summary(fit)
Poisson model fitting high-dimensional with k-way fixed effects
Description
A wrapper for feglm with
family = poisson().
Usage
fepoisson(
formula = NULL,
data = NULL,
weights = NULL,
vcov = NULL,
beta_start = NULL,
eta_start = NULL,
offset = NULL,
control = NULL
)
Arguments
formula |
an object of class |
data |
an object of class |
weights |
an optional string with the name of the prior weights variable in |
vcov |
an optional character string specifying the type of variance-covariance estimator.
One of |
beta_start |
an optional vector of starting values for the structural parameters in the linear predictor.
Default is |
eta_start |
an optional vector of starting values for the linear predictor. |
offset |
an optional formula or numeric vector specifying an a priori known component to be included in the
linear predictor. If a formula, it should be of the form |
control |
a named list of parameters for controlling the fitting process. See fit_control for details. |
Value
A named list of class "feglm".
Examples
# check the feglm examples for the details about clustered standard errors
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
fit <- fepoisson(ltrade ~ ldist, ross2004_subset)
summary(fit)
Asymmetric Poisson Pseudo-Maximum Likelihood (APPML) Estimation
Description
Fits an asymmetric Poisson pseudo-maximum likelihood model with high-dimensional fixed effects using expectile regression. This approach extends standard PPML by allowing different weights for positive and negative residuals, enabling estimation of conditional expectiles rather than the conditional mean.
Usage
fepoisson_asymmetric(
formula = NULL,
data = NULL,
weights = NULL,
beta_start = NULL,
eta_start = NULL,
offset = NULL,
control = NULL
)
Arguments
formula |
an object of class |
data |
an object of class |
weights |
an optional string with the name of the prior weights variable in |
beta_start |
an optional vector of starting values for the structural parameters in the linear predictor.
Default is |
eta_start |
an optional vector of starting values for the linear predictor. |
offset |
an optional formula or numeric vector specifying an a priori known component to be included in the
linear predictor. If a formula, it should be of the form |
control |
a named list of parameters for controlling the fitting process. See fit_control for details. |
Details
The APPML estimator minimizes an asymmetric loss function based on expectiles. For a given expectile \tau,
observations with negative residuals receive weight \tau while observations with positive residuals
receive weight 1 - \tau. The algorithm iteratively:
Computes residuals from the current fit
Updates weights as
w_i = |\tau - \mathbf{1}(r_i < 0)|Re-fits the weighted Poisson model
Checks convergence using
(b - b_{old})' V^{-1} (b - b_{old}) < \epsilon
The expectile parameter is specified via control = fit_control(expectile = ...). When
expectile = 0.5, the estimator is equivalent to standard PPML. Values below 0.5 estimate
lower conditional expectiles (more sensitive to small values), while values above 0.5 estimate
upper conditional expectiles (more sensitive to large values).
Value
A named list of class "feglm" containing:
coefficients |
named vector of estimated coefficients |
vcov |
variance-covariance matrix of coefficients |
eta |
linear predictor |
fitted_values |
fitted values from the final iteration |
residuals |
residuals from the final fit |
weights |
observation weights used in final fit |
appml_weights |
asymmetric weights used in APPML algorithm |
deviance |
the deviance of the model |
null_deviance |
the null deviance of the model |
conv |
logical indicating whether inner GLM converged |
conv_outer |
logical indicating whether APPML outer loop converged |
iter |
number of inner iterations |
iter_outer |
number of outer APPML iterations |
expectile |
the expectile value used |
objective_function |
final value of the convergence criterion |
negative_residuals_share |
proportion of negative residuals in final fit |
nobs |
a named vector with the number of observations |
fe_levels |
a named vector with the number of levels in each fixed effect |
nms_fe |
a list with the names of the fixed effects variables |
formula |
the formula used in the model |
family |
the family used in the model (Poisson) |
control |
the control list used in the model |
References
Newey, W. K., & Powell, J. L. (1987). Asymmetric least squares estimation and testing. Econometrica, 55(4), 819-847.
See Also
Examples
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
# Lower expectile (10th) - more weight on negative residuals
fit10 <- fepoisson_asymmetric(
ltrade ~ ldist | ctry1, ross2004_subset,
control = fit_control(expectile = 0.1)
)
summary(fit10)
Set feglm Control Parameters
Description
Set and change parameters used for fitting feglm, felm, and fenegbin. Termination conditions are similar to glm.
Usage
fit_control(
dev_tol = 1e-06,
center_tol = 1e-06,
collin_tol = 1e-08,
step_halving_factor = 0.5,
alpha_tol = 1e-05,
iter_max = 50L,
iter_center_max = 10000L,
iter_inner_max = 50L,
iter_alpha_max = 10000L,
step_halving_memory = 0.9,
max_step_halving = 2L,
start_inner_tol = 1e-05,
grand_acc_period = 4L,
centering = "berge",
sep_tol = 1e-08,
sep_zero_tol = 1e-08,
sep_mu_tol = 1e-06,
sep_max_iter = 200L,
sep_simplex_max_iter = 2000L,
sep_use_relu = TRUE,
sep_use_simplex = TRUE,
sep_use_mu = TRUE,
return_fe = TRUE,
keep_tx = FALSE,
keep_data = FALSE,
return_hessian = FALSE,
check_separation = TRUE,
init_theta = 0,
vcov_type = NULL,
expectile = NULL,
expectile_tol = 1e-12,
expectile_iter_max = 50L,
expectile_glm_iter_max = NULL,
expectile_trace = FALSE
)
Arguments
dev_tol |
tolerance level for the first stopping condition of the maximization routine. The stopping condition
is based on the relative change of the deviance in iteration |
center_tol |
tolerance level for the stopping condition of the centering algorithm. The stopping condition is
based on the relative change of the centered variable similar to the |
collin_tol |
tolerance level for detecting collinearity. The default is |
step_halving_factor |
numeric indicating the factor by which the step size is halved to iterate towards
convergence. This is used to control the step size during optimization. The default is |
alpha_tol |
tolerance for fixed effects (alpha) convergence. The default is |
iter_max |
integer indicating the maximum number of iterations in the maximization routine. The default is
|
iter_center_max |
integer indicating the maximum number of iterations in the centering algorithm. The default is
|
iter_inner_max |
integer indicating the maximum number of iterations in the inner loop of the centering
algorithm. The default is |
iter_alpha_max |
maximum iterations for fixed effects computation. The default is |
step_halving_memory |
numeric memory factor for step-halving algorithm. Controls how much of the previous
iteration is retained. The default is |
max_step_halving |
maximum number of post-convergence step-halving attempts. The default is |
start_inner_tol |
starting tolerance for inner solver iterations. The default is |
grand_acc_period |
integer indicating the period (in iterations) for grand acceleration in the centering
algorithm. Grand acceleration applies a second-level Irons-Tuck extrapolation on the overall convergence
trajectory. Lower values (e.g., 4-10) may speed up convergence for difficult problems. Set to a very large
value (e.g., 10000) to effectively disable. The default is |
centering |
character string indicating the centering algorithm to use for demeaning fixed effects.
|
sep_tol |
tolerance for separation detection. The default is |
sep_zero_tol |
tolerance for treating values as zero in separation detection. The default is |
sep_mu_tol |
tolerance for mu-based separation detection during IRLS iterations. Observations with
|
sep_max_iter |
maximum iterations for ReLU separation detection algorithm. The default is |
sep_simplex_max_iter |
maximum iterations for simplex separation detection algorithm. The default is |
sep_use_relu |
logical indicating whether to use the ReLU algorithm for separation detection. The default is |
sep_use_simplex |
logical indicating whether to use the simplex algorithm for separation detection. The default is |
sep_use_mu |
logical indicating whether to use mu-based separation detection during IRLS iterations.
This catches observations where predicted values become extremely small (suggesting perfect prediction
of zeros). Following ppmlhdfe methodology. The default is |
return_fe |
logical indicating if the fixed effects should be returned. This can be useful when fitting general
equilibrium models where skipping the fixed effects for intermediate steps speeds up computation. Set it to
|
keep_tx |
logical indicating if the centered regressor matrix should be stored. The default is |
keep_data |
logical indicating if the filtered data should be stored in the result object. Required for
|
return_hessian |
logical indicating if the Hessian matrix should be returned. The Hessian is a P*P
matrix used to compute the variance-covariance matrix. The default is |
check_separation |
logical indicating whether to perform separation detection for Poisson models. When |
init_theta |
Initial value for the negative binomial dispersion parameter (theta). The default is |
vcov_type |
Optional character string specifying the type of variance-covariance estimator to be used.
When |
expectile |
numeric value between 0 and 1 (exclusive) specifying the expectile for asymmetric
Poisson pseudo-maximum likelihood (APPML) estimation. When |
expectile_tol |
tolerance level for the stopping condition of the expectile iteration algorithm. The convergence criterion uses a hybrid approach: the algorithm converges when the squared norm
of coefficient changes |
expectile_iter_max |
integer indicating the maximum number of iterations for the expectile
reweighting algorithm. The default is |
expectile_glm_iter_max |
integer indicating the maximum number of inner GLM (IRLS) iterations
per APPML outer iteration. When |
expectile_trace |
logical indicating whether to print iteration information during expectile
estimation. The default is |
Value
A named list of control parameters.
Memory-Efficient vs Full Models
By default, fit_control() returns "thin" model objects with minimal memory footprint:
-
keep_data = FALSE- data not stored -
return_fe = FALSE- fixed effects not stored -
return_hessian = FALSE- Hessian not stored -
keep_tx = FALSE- centered matrix not stored
See Also
Examples
ross2004_subset <- ross2004[ross2004$year == 1999, ]
ross2004_subset <- ross2004_subset[ross2004_subset$ltrade >
quantile(ross2004_subset$ltrade, 0.75), ]
felm(ltrade ~ ldist | ctry1, ross2004_subset,
control = fit_control(dev_tol = 1e-10, center_tol = 1e-10)
)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- generics
- ggplot2
Subset of WTO data from Ross (2004)
Description
Data from Ross (2004) used to show the different variance-covariance estimators from Cameron and Miller (2014).
Usage
ross2004
Format
ross2004
A data frame with 234,597 rows and 22 columns:
- ltrade
Log of bilateral trade between i and j at time t
- bothin
Binary variable which is unity if both i and j are GATT/WTO members at t
- onein
Binary variable which is unity if either i or j is a GATT/WTO member at t
- gsp
Binary variable which is unity if i was a GSP beneficiary of j or vice versa at t
- ldist
Log of distance between i and j
- lrgdp
Log of real GDP
- lrgdppc
Log of real GDP per capita
- regional
Binary variable which is unity if i and j are in the same region
- custrict
Binary variable which is unity if i and j are in the same customs union
- comlang
Binary variable which is unity if i and j share a common language
- border
Binary variable which is unity if i and j share a border
- landl
Number of landlocked countries in the country-pair (0, 1, or 2)
- island
Number of island nations in the pair (0, 1, or 2)
- lareap
Log of the area of the country (in square kilometers)
- comcol
Binary variable which is unity if i and j were ever colonies after 1945 with the same colonizer
- curcol
Binary variable which is unity if i and j are colonies at time t
- colony
Binary variable which is unity if i ever colonized j or vice versa
- comctry
Binary variable which is unity if i and j remained part of the same nation during the sample (e.g., France and Guadeloupe)
- ctry1
Country 1 ISO-3 code
- ctry2
Country 2 ISO-3 code
- pair
Country pair undirected dyads
- year
Year of observation
Source
Do We Really Know That the WTO Increases Trade? (DOI: 10.1257/000282804322970724)
Recompute Sandwich Variance-Covariance Matrix
Description
Recompute the variance-covariance matrix for a fitted model using a different
clustering structure or covariance type. This allows changing the vcov estimator without
re-fitting the model, provided the model was fit with keep_tx = TRUE and
return_hessian = TRUE in the control parameters.
Usage
sandwich_vcov(
object,
cluster1 = NULL,
cluster2 = NULL,
type = c("hetero", "clustered", "m-estimator", "dyadic", "m-estimator-dyadic"),
...
)
Arguments
object |
a fitted model object of class |
cluster1 |
a vector or factor for the first clustering variable. Required for
|
cluster2 |
a vector or factor for the second clustering variable. Required for
|
type |
character string specifying the covariance type. One of:
|
... |
additional arguments (currently ignored). |
Details
The model must be fit with fit_control(keep_tx = TRUE, return_hessian = TRUE) to store
the centered design matrix, Hessian, and working residuals needed for vcov recomputation.
Note that keep_data = TRUE is NOT required - residuals are stored automatically when
keep_tx = TRUE.
For dyadic clustering (used in gravity/trade models), cluster1 and cluster2
represent the two entity dimensions (e.g., exporter and importer). The function handles
the full dyadic correlation structure including cross-entity correlations.
Value
A named matrix of covariance estimates.
References
Cameron, C., J. Gelbach, and D. Miller (2011). "Robust Inference With Multiway Clustering". Journal of Business & Economic Statistics 29(2).
Cameron, C. and D. Miller (2014). "Robust Inference for Dyadic Data". Unpublished manuscript.
See Also
Examples
# Refitting models
ross2004_s1 <- ross2004[ross2004$year == 1994, ]
ross2004_s1 <- ross2004_s1[ross2004_s1$ltrade >
quantile(ross2004_s1$ltrade, 0.75), ]
ross2004_s2 <- ross2004[ross2004$year == 1999, ]
ross2004_s2 <- ross2004_s2[ross2004_s2$ltrade >
quantile(ross2004_s2$ltrade, 0.75), ]
ross2004_subset <- rbind(ross2004_s1, ross2004_s2)
fepoisson(
ltrade ~ ldist | ctry1, ross2004_subset,
control = fit_control(vcov_type = "hetero")
)
fepoisson(
ltrade ~ ldist | ctry1, ross2004_subset,
control = fit_control(vcov_type = "m-estimator")
)
# Reusing models
# Store required components
fit <- fepoisson(
ltrade ~ ldist | ctry1, ross2004_subset,
control = fit_control(keep_tx = TRUE, return_hessian = TRUE)
)
# Heteroskedastic-robust HC0 sandwich (no cluster variable needed)
sandwich_vcov(fit, type = "hetero")
#' One-way cluster
sandwich_vcov(fit, cluster1 = ross2004_subset$year, type = "clustered")
Generate formatted regression tables
Description
Generate formatted regression tables
Usage
summary_table(
...,
coef_digits = 3,
se_digits = 3,
stars = TRUE,
latex = FALSE,
model_names = NULL,
caption = NULL,
label = NULL,
position = "htbp"
)
Arguments
... |
One or more model objects of |
coef_digits |
Number of digits for coefficients. The default is 3. |
se_digits |
Number of digits for standard errors. The default is 3. |
stars |
Whether to include significance stars. The default is |
latex |
Whether to output as LaTeX code. The default is |
model_names |
Optional vector of custom model names |
caption |
Optional caption for the table (LaTeX only) |
label |
Optional label for cross-referencing (LaTeX only) |
position |
LaTeX float position specifier (LaTeX only). The default is
|
Value
A formatted table
Examples
ross2004_subset <- ross2004[ross2004$year == 1999 &
ross2004$ltrade > quantile(ross2004$ltrade, 0.75), ]
m1 <- felm(ltrade ~ ldist | ctry1, ross2004_subset)
m2 <- fepoisson(ltrade ~ ldist | ctry1, ross2004_subset)
summary_table(m1, m2, model_names = c("Linear", "Poisson"))
Update a fitted feglm model
Description
S3 method for update() that understands the |-separated formula syntax
used by feglm(). Uses Formula::update.Formula() for proper handling of
multi-part formulas. Identical semantics to update.felm().
Usage
## S3 method for class 'feglm'
update(object, formula. = . ~ ., vcov = NULL, family = NULL, ...)
Arguments
object |
A fitted |
formula. |
Update formula; only the segments you want to change need to
differ from
|
vcov |
Optional new |
family |
Optional new family (e.g. |
... |
Additional arguments forwarded to |
Value
A refitted feglm object.
Update a fitted felm model
Description
S3 method for update() that understands the |-separated formula syntax
used by felm(). R's built-in stats::update.formula() breaks on these
formulas because the | parts look like factor arithmetic. This method uses
the Formula::update.Formula() method which correctly handles multi-part
formulas.
The . placeholder works as usual:
-
. ~ .- keep the current response and RHS regressors. The second
|segment replaces (or keeps, if.) the fixed-effects.The third
|segment replaces (or keeps, if.) the cluster variables.
Usage
## S3 method for class 'felm'
update(object, formula. = . ~ ., vcov = NULL, ...)
Arguments
object |
A fitted |
formula. |
Update formula; only the segments you want to change need to
differ from
|
vcov |
Optional new |
... |
Additional arguments forwarded to |
Value
A refitted felm object.
Update a felm_formula object
Description
S3 method for update() on felm_formula objects. Uses
Formula::update.Formula() to properly handle multi-part formulas with |.
The . placeholder behaves as in stats::update.formula():
-
. ~ .- keep current response and RHS unchanged. Second
|segment - replaces (or keeps with.) the fixed effects.Third
|segment - replaces (or keeps with.) the cluster variables.
Usage
## S3 method for class 'felm_formula'
update(object, formula., ...)
Arguments
object |
A |
formula. |
Update specification, e.g. |
... |
Ignored. |
Value
A new felm_formula object.
Covariance matrix for GLMs
Description
Covariance matrix for the estimator of the structural parameters from objects returned by feglm. The covariance is computed during model fitting - either the inverse Hessian (default) or the sandwich estimator if cluster variables are specified in the formula.
Usage
## S3 method for class 'feglm'
vcov(object, ...)
Arguments
object |
an object of class |
... |
additional arguments (currently ignored). |
Value
A named matrix of covariance estimates.
References
Cameron, C., J. Gelbach, and D. Miller (2011). "Robust Inference With Multiway Clustering". Journal of Business & Economic Statistics 29(2).
See Also
Examples
# Model with clustering - returns sandwich covariance
ross2004_s1 <- ross2004[ross2004$year == 1994, ]
ross2004_s1 <- ross2004_s1[ross2004_s1$ltrade >
quantile(ross2004_s1$ltrade, 0.75), ]
ross2004_s2 <- ross2004[ross2004$year == 1999, ]
ross2004_s2 <- ross2004_s2[ross2004_s2$ltrade >
quantile(ross2004_s2$ltrade, 0.75), ]
ross2004_subset <- rbind(ross2004_s1, ross2004_s2)
fit <- fepoisson(ltrade ~ ldist + border | ctry1 | year, ross2004_subset)
vcov(fit)
Covariance matrix for LMs
Description
Covariance matrix for the estimator of the structural parameters from objects returned by felm. The covariance is computed during model fitting - either the inverse Hessian (default) or the sandwich estimator if cluster variables are specified in the formula.
Usage
## S3 method for class 'felm'
vcov(object, ...)
Arguments
object |
an object of class |
... |
additional arguments (currently ignored). |
Value
A named matrix of covariance estimates.
References
Cameron, C., J. Gelbach, and D. Miller (2011). "Robust Inference With Multiway Clustering". Journal of Business & Economic Statistics 29(2).
See Also
Examples
# Model with clustering - returns sandwich covariance
ross2004_s1 <- ross2004[ross2004$year == 1994, ]
ross2004_s1 <- ross2004_s1[ross2004_s1$ltrade >
quantile(ross2004_s1$ltrade, 0.75), ]
ross2004_s2 <- ross2004[ross2004$year == 1999, ]
ross2004_s2 <- ross2004_s2[ross2004_s2$ltrade >
quantile(ross2004_s2$ltrade, 0.75), ]
ross2004_subset <- rbind(ross2004_s1, ross2004_s2)
fit <- felm(ltrade ~ ldist | ctry1 | year, ross2004_subset)
vcov(fit)