--- title: "Relational Event Modeling Using `remverse`" subtitle: "A comprehensive tutorial" author: "Joris Mulder" date: "05/15/2026" output: rmarkdown::html_document: toc: true toc_depth: 3 theme: spacelab highlight: pygments code_folding: show bibliography: references.bib header-includes: - \usepackage{amsmath,amssymb} vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{The Relational Event Modeling Pipeline} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(dpi = 72, collapse = TRUE, comment = "#>") ``` --- The `remverse` suite provides a three-step pipeline for relational event modeling: 1. **`remify()`** [@remify] — process a raw edgelist into a standardized event history, specifying the model type ("tie" or "actor"), risk set, timing, and event characteristics (e.g., types, directionality). 2. **`remstats()`** [@meijerink2024remstats] — compute network statistics (endogenous, exogenous, and interactions), with options for memory decay and case-control sampling. 3. **`remstimate()`** [@remstimate] — estimate model parameters via MLE or HMC. Model fit can then be assessed with `diagnostics()`. This vignette walks through the full pipeline for a range of modeling choices: tie-oriented vs. actor-oriented models, interval vs. ordinal timing, different risk set definitions, typed events, and a variety of endogenous and exogenous statistics. Each section builds on the same dataset so the results can be compared directly. ```{r load} library(remverse) ``` --- # 1 Data We use the `edgelist0` dataset (115 directed events among 10 actors) and the `edgelist0_actors` dataset (time-varying actor attributes). ```{r data} data(edgelist0) data(edgelist0_actors) head(edgelist0) head(edgelist0_actors) ``` Events record a `time`, a `sender` (`actor1`), a `receiver` (`actor2`), a `setting` (event type: `"work"` or `"social"`), and a `weight`. The `edgelist0_actors` table provides `job` scores measured from time point `0` for each actor. --- # 2 Quick start: tie-oriented model The tie-oriented model (`model = "tie"`) models the event rate at the dyadic (tie) level [@butts20084]. The three-step pipeline processes the event history, computes network statistics, and estimates the model. ```{r tie-basic} # Step 1: process the event history reh <- remify(edgelist = edgelist0, model = "tie", directed = TRUE) summary(reh) # Step 2: define effects and compute statistics effects <- ~ inertia() + reciprocity() + indegreeSender() + outdegreeReceiver() + outdegreeSender() stats <- remstats(reh = reh, tie_effects = effects) stats # Step 3: estimate the model fit <- remstimate(reh = reh, stats = stats, method = "MLE") summary(fit) ``` The baseline intercept is automatically included for interval timing (`ordinal = FALSE`). It captures the average event rate across all dyads. The endogenous effects quantify how the accumulated event history shapes future interaction patterns: `inertia` measures the tendency of dyads to repeat past interactions, `reciprocity` captures the tendency to reciprocate received events, and the degree effects capture sender and receiver activity, all as function of the volume of past events. These endogenous statistics can be scaled in different manners (Section 4.5) which also affects the interpretation of the corresponding effects. The column `Pr(>|z|)` is the 2-sided p value for testing whether a parameter equals zero. The column `Pr(=0)` is the posterior probability that a parameter is zero based on the default Bayes factor of the `BFpack` package [@mulder2021bfpack]. --- # 3 Quick start: actor-oriented model Actor-oriented models decompose the event process into two steps: a sender rate model (which actor initiates next?) and a receiver choice model (whom does the sender choose?) [@Stadtfeld:2017]. Currently the actor-oriented model is only supported for directed events. ```{r actor-basic} # Step 1: process the event history reh_ao <- remify(edgelist = edgelist0, model = "actor", directed = TRUE, riskset = "active") summary(reh_ao) # Step 2: specify statistics for sender and receiver choice models separately sender_effects <- ~ outdegreeSender() + indegreeSender() choice_effects <- ~ inertia() + reciprocity() stats_ao <- remstats( reh = reh_ao, sender_effects = sender_effects, receiver_effects = choice_effects ) stats_ao # Step 3: estimate the model fit_ao <- remstimate(reh = reh_ao, stats = stats_ao, method = "MLE") summary(fit_ao) ``` The sender model estimates a baseline rate and the effects of out-degree and in-degree on sender activity. The receiver model estimates how inertia and reciprocity determine the choice of receiver, conditional on the sender. --- # 4 Modeling choices {.tabset .tabset-fade .tabset-pills} The sections below describe the modeling choices available at each step of the pipeline. Most choices apply to both tie-oriented and actor-oriented models; exceptions are noted. ## 4.1 Timing: interval vs. ordinal By default, the model uses exact event times (`ordinal = FALSE`), resulting in a rate model with a baseline intercept. When only the order of events is known or only the order is of interest, set `ordinal = TRUE`. The model then reduces to a stratified Cox proportional-hazard model — no baseline intercept can be estimated. This choice applies to both tie and actor models. ```{r ordinal-tie} # Tie model with ordinal timing reh_ord <- remify(edgelist = edgelist0, model = "tie", ordinal = TRUE) stats_ord <- remstats(reh = reh_ord, tie_effects = ~ inertia() + reciprocity()) fit_ord <- remstimate(reh = reh_ord, stats = stats_ord, method = "MLE") summary(fit_ord) ``` ```{r ordinal-actor} # Actor model with ordinal timing reh_ao_ord <- remify(edgelist = edgelist0, model = "actor", ordinal = TRUE, riskset = "active") stats_ao_ord <- remstats( reh = reh_ao_ord, sender_effects = ~ outdegreeSender(), receiver_effects = ~ inertia() + reciprocity() ) fit_ao_ord <- remstimate(reh = reh_ao_ord, stats = stats_ao_ord, method = "MLE") summary(fit_ao_ord) ``` --- ## 4.2 Directionality When events have no inherent directionality, set `directed = FALSE` in `remify()`. The risk set then contains unordered actor pairs rather than ordered dyads, and only statistics defined for undirected networks are available. For example, `sp()` (shared partners) counts the number of third actors through which the two actors have interacted with in the past. Undirected events are only supported for the tie-oriented model. ```{r undirected} reh_undir <- remify(edgelist = edgelist0, model = "tie", directed = FALSE) cat("Directed dyads:", reh$D, "\n") cat("Undirected pairs:", reh_undir$D, "\n") stats_undir <- remstats( reh = reh_undir, tie_effects = ~ inertia() + sp() ) fit_undir <- remstimate(reh = reh_undir, stats = stats_undir, method = "MLE") summary(fit_undir) ``` The `sp()` statistic also supports scaling: `"none"` gives the raw shared partner count, while `"std"` standardizes across all pairs at each time point. Setting `unique = TRUE` counts only third actors that create a *new* shared partner (i.e., not previously connected to both actors in the pair). --- ## 4.3 Risk set variations The risk set defines which dyads (or actors) are at risk of producing an event at each time point. This choice applies to both tie and actor models. ### Full risk set (default) All possible dyads are at risk at every time point. For a directed network with $N$ actors, this gives $N(N-1)$ dyads. ### Active risk set In certain situations, it may be unrealistic that all dyads in the network are at risk to initiate events (e.g., in the case of very large networks). In this case, the user could approximate the actual risk set by taking the active risk set which includes only dyads that are observed in the event history data. This may severely reduce the computation of endogenous statistics as only statistics are computed that are in this (typically much) smaller risk set. ```{r active} reh_active <- remify(edgelist = edgelist0, model = "tie", riskset = "active") cat("Full risk set:", reh$D, "dyads\n") cat("Active risk set:", reh_active$activeD, "dyads\n") stats_active <- remstats(reh = reh_active, tie_effects = ~ inertia() + reciprocity()) fit_active <- remstimate(reh = reh_active, stats = stats_active, method = "MLE") summary(fit_active) ``` In addition to the `active` risk set, the `active_saturated` risk set extends the observed dyads by including their reverse direction (e.g., if only A $\rightarrow$ B is observed and B $\rightarrow$ A is not observed, B $\rightarrow$ A is also included) and, for typed events, all event types for each observed actor pair. ### Manual risk set A manual risk set allows you to restrict the set of dyads at risk to a predefined set. Observed dyads not included in the manual specification are added automatically. ```{r manual} # Define a manual risk set: a subset of dyads my_riskset <- data.frame( actor1 = c(1, 1, 2, 3, 5, 6, 5), actor2 = c(2, 3, 1, 4, 4, 7, 7) ) reh_manual <- remify( edgelist = edgelist0, model = "tie", riskset = "manual", manual.riskset = my_riskset ) cat("Manual risk set:", reh_manual$activeD, "dyads\n") ``` --- ## 4.4 Typed events When events carry a type label (here, `"work"` vs. `"social"` in the column `setting` of the edgelist), different choices can be made how this can be modeled. When a column of the edgelist contains a column `type`, the event types are automatically taken into account when processing (remifying) the data. Alternatively, one can use the argument `event_type` to specify which column of the edgelist contains the event types. The risk set can optionally be extended by type (`extend_riskset_by_type = TRUE`; default is `FALSE`), so that each dyad-type combination is a separate unit at risk. This is only possible in the tie-oriented model. ```{r typed-setup} reh_typed <- remify( edgelist = edgelist0, model = "tie", event_type = "setting" ) ``` When computing the endogenous statistics, the `consider_type` argument controls how past event types enter the computation. ### `consider_type = "ignore"` Event types are not distinguished — statistics aggregate over all types. ```{r typed-ignore} stats_ign <- remstats( reh = reh_typed, tie_effects = ~ inertia(consider_type = "ignore") ) dimnames(stats_ign)[[3]] ``` ### `consider_type = "separate"` One statistic per type: separate counts of past `"work"` events and past `"social"` events. This allows one to model separate effects for the separate event types. ```{r typed-separate} stats_sep <- remstats( reh = reh_typed, tie_effects = ~ inertia(consider_type = "separate") ) dimnames(stats_sep)[[3]] ``` ### `consider_type = "interact"` With `consider_type = "interact"` (requires `extend_riskset_by_type = TRUE`), each base statistic is expanded into $C^2$ statistics, one for each combination of past event type and riskset element type. For example, with types *work* and *social*, `inertia(consider_type = "interact")` produces four statistics: `inertia.work.work`, `inertia.work.social`, `inertia.social.work`, and `inertia.social.social`. This allows the model to estimate a separate effect for each past-type $\times$ future-type combination. ```{r typed-interact} reh_typed_ext <- remify( edgelist = edgelist0, model = "tie", event_type = "setting", extend_riskset_by_type = TRUE ) stats_int <- remstats( reh = reh_typed_ext, tie_effects = ~ inertia(consider_type = "interact") ) dimnames(stats_int)[[3]] ``` ### Estimation with typed events Different `consider_type` settings can be freely mixed within the same model. ```{r typed-fit} stats_typed <- remstats( reh = reh_typed_ext, tie_effects = ~ inertia(consider_type = "interact") + reciprocity(consider_type = "ignore") ) fit_typed <- remstimate(reh = reh_typed_ext, stats = stats_typed, method = "MLE") summary(fit_typed) ``` The type-separated coefficients of inertia tell us whether inertia operates differently across event types. For instance, a strong negative effect of `inertia.work.social` suggests that past work interactions tend to result in fewer social events. Typed events and `consider_type` work the same way in the actor-oriented model: ```{r typed-actor} reh_ao_typed <- remify( edgelist = edgelist0, model = "actor", event_type = "setting" ) stats_ao_typed <- remstats( reh = reh_ao_typed, sender_effects = ~ outdegreeSender(), receiver_effects = ~ inertia(consider_type = "separate") + reciprocity() ) fit_ao_typed <- remstimate(reh = reh_ao_typed, stats = stats_ao_typed, method = "MLE") summary(fit_ao_typed) ``` --- ## 4.5 Endogenous effects, exogenous effects, and scaling ### Endogenous effects The full list of available statistics is obtained with `tie_effects()` for the tie model or `actor_effects()` for the actor model. Many endogenous statistics accept a `scaling` argument that controls how raw counts are transformed: - `"none"` (default): raw counts are used as-is. For `inertia()`, this is the number of past events on the dyad. - `"prop"`: raw counts are divided by a normalizing quantity. For `inertia()`, the count is divided by the sender's outdegree at time $t$, yielding the proportion of the sender's past events directed at this receiver. - `"std"`: raw counts are standardized (zero mean, unit variance) across all dyads at each time point. ```{r scaling} reh <- remify(edgelist = edgelist0, model = "tie", directed = TRUE) stats_none <- remstats(reh = reh, tie_effects = ~ inertia(scaling = "none")) stats_prop <- remstats(reh = reh, tie_effects = ~ inertia(scaling = "prop")) stats_std <- remstats(reh = reh, tie_effects = ~ inertia(scaling = "std")) # Compare values at the last time point for the first dyad cat("Raw count: ", stats_none[114, 1, "inertia"], "\n") cat("Proportional: ", stats_prop[114, 1, "inertia"], "\n") cat("Standardized: ", stats_std[114, 1, "inertia"], "\n") remst_none <- remstimate(reh, stats = stats_none) coef(remst_none) remst_prop <- remstimate(reh, stats = stats_prop) coef(remst_prop) remst_std <- remstimate(reh, stats = stats_std) coef(remst_std) ``` The different types of scaling change the interpretation of the corresponding effects. Depending on the application, different types of scaling may be preferred. Caution is advised when using raw counts (`scaling = "none"`) because the statistics will then be unbounded, potentially resulting in process explosion. ### Exogenous effects Exogenous actor attributes and dyadic covariates can be included in the model alongside endogenous effects. The `edgelist0_actors` dataset provides time-varying attributes. ```{r exo-tie} exo_effects <- ~ inertia(scaling = "std") + reciprocity(scaling = "std") + send("job", edgelist0_actors) + same("job", edgelist0_actors) stats_exo <- remstats( reh = reh, tie_effects = exo_effects ) fit_exo <- remstimate(reh = reh, stats = stats_exo, method = "MLE") summary(fit_exo) ``` - `send("job")`: quantifies whether actors who have a job tend to initiate more events. - `same("job")`: quantifies whether actors having the same job status tend to send more events to each other. Exogenous effects are also available in the actor model, where they can enter the sender rate model, the receiver choice model, or both: ```{r exo-actor} rate_effects_exo <- ~ outdegreeSender(scaling = "std") + send("job", edgelist0_actors) choice_effects_exo <- ~ inertia(scaling = "std") + reciprocity(scaling = "std") + same("job", edgelist0_actors) stats_ao_exo <- remstats( reh = reh_ao, sender_effects = rate_effects_exo, receiver_effects = choice_effects_exo ) fit_ao_exo <- remstimate(reh = reh_ao, stats = stats_ao_exo, method = "MLE") summary(fit_ao_exo) ``` ### Interactions between endogenous and exogenous effects Interaction effects can be used to examine how endogenous tendencies (e.g., inertia) depend on actors' attributes: ```{r exo-interaction} stats_ix <- remstats( reh = reh, tie_effects = ~ inertia() + send("job", edgelist0_actors) + inertia():send("job", edgelist0_actors) ) fit_ix <- remstimate(reh = reh, stats = stats_ix, method = "MLE") summary(fit_ix) ``` The interaction between inertia and send_job quantifies whether the tendency to repeat past interactions differs depending on the sender's job. --- ## 4.6 Memory The `memory` argument of `remstats` controls how the transpired time of past events contributes to endogenous statistics at each time point [@arena2023fast]. This applies to both tie and actor models. ```{r memory} # Full memory (default): entire event history stats_full <- remstats(reh = reh, tie_effects = ~ inertia(), memory = "full") # Window memory: only events within the last 500 time units stats_win <- remstats(reh = reh, tie_effects = ~ inertia(), memory = "window", memory_value = 500) # Decay memory: exponential decay with half-life 200 stats_dec <- remstats(reh = reh, tie_effects = ~ inertia(), memory = "decay", memory_value = 200) # Compare the inertia statistic at the last time point for a single dyad cat("Full memory: ", stats_full[900, 2, "inertia"], "\n") cat("Window (500): ", stats_win[900, 2, "inertia"], "\n") cat("Decay (200): ", stats_dec[900, 2, "inertia"], "\n") ``` `full` memory accumulates all past events equally; `window` memory discards events older than the threshold (specified in `memory_value`); `decay` memory down-weights older events exponentially via a half-life parameter (specified in `memory_value`). --- ## 4.7 Case-control sampling For large networks, computing statistics over the full risk set at every time point is expensive. Case-control sampling draws a random subset of non-event dyads as the comparison set, substantially reducing computation [@lerner2020reliability]. This can be done by setting `sampling = TRUE` and setting `samp_num` to be the total number of dyads in the risk set per event (which includes the observed event(s)). Case-control sampling is currently only available for the tie-oriented model. ```{r sampling} reh_samp <- remify(edgelist = edgelist0, model = "tie", directed = TRUE, event_type = "setting") stats_sampled <- remstats( reh = reh_samp, tie_effects = exo_effects, sampling = TRUE, samp_num = 20, seed = 42 ) fit_sampled <- remstimate(reh = reh_samp, stats = stats_sampled, method = "MLE") summary(fit_sampled) ``` The estimates are consistent as the sample size grows but will differ slightly from the full risk set estimates due to sampling variability. --- # 5 Estimation {.tabset .tabset-fade .tabset-pills} ## 5.1 Maximum likelihood estimation MLE is the default estimation method (`method = "MLE"`) and has been used throughout this vignette. For interval timing, the model uses the full likelihood; for ordinal timing, a stratified partial likelihood (equivalent to a Cox proportional-hazard model). With case-control sampling, the case-control likelihood is used. ## 5.2 Bayesian estimation with HMC Both tie-oriented and actor-oriented models can be estimated with Hamiltonian Monte Carlo via `method = "HMC"`. By default, independent normal priors with mean 0 and variance 100 are placed on all coefficients. Custom priors can be specified via the `prior` argument. When using va: ```{r hmc, message=FALSE} P <- length(dimnames(stats_exo)[[3]]) my_prior <- list( mean = rep(0, P), vcov = diag(c(100, rep(1, P - 1))) # wide prior on intercept, N(0,1) on effects ) fit_exo_hmc <- remstimate( reh = reh, stats = stats_exo, method = "HMC", prior = my_prior, burnin = 200, nsim = 500, thin = 2, L = 100, seed = 42 ) summary(fit_exo_hmc) ``` The HMC output includes posterior means, standard deviations, and trace plots for convergence assessment. Note that the algorithm takes considerably longer than `MLE`. --- # 6 Diagnostics {.tabset .tabset-fade .tabset-pills} The `diagnostics()` function computes several goodness-of-fit measures. The `plot()` function produces three diagnostic plots by default (`which = 1:3`). ## 6.1 Tie model diagnostics ```{r diag-tie, out.width="80%", fig.width=8, fig.height=4, dev=c("jpeg"), dev.args = list(bg = "white")} diag <- diagnostics(object = fit_exo, reh = reh, stats = stats_exo) diag plot(x = fit_exo, reh = reh, diagnostics = diag) ``` ### Waiting-time diagnostics (plot 1) Under a correctly specified model, the rescaled inter-event times $\Delta t_m \cdot \sum_{(i,j) \in \mathcal{R}} \hat{\lambda}_{ij}(t_m)$ should follow an Exp(1) distribution. The Q-Q plot (left) compares the empirical quantiles against the theoretical Exp(1) quantiles: points falling on the diagonal indicate good fit. Systematic departures reveal specific types of misfit — a convex curve (points above the line) suggests the model underestimates waiting times for large gaps, while a concave curve suggests the opposite. The density histogram (right) overlays the empirical distribution with the Exp(1) density (dashed line) for a visual comparison. These plots are only produced for interval timing models (`ordinal = FALSE`). ### Schoenfeld residuals (plot 2) For each effect included in the model, a standardized Schoenfeld residual is computed at every time point [@schoenfeld1982partial]. The residual measures the difference between the observed covariate value and its expected value under the model at that time point. Under a correctly specified model, the residuals should fluctuate randomly around zero with no systematic trend. A smoothing spline (red line) is fitted through the residuals. If the spline substantially deviates from the zero line (dashed), the effect of that statistic may be changing over time — violating the proportional-hazards assumption that the coefficient is constant throughout the observation period. Regarding static covariates (e.g., sender attributes such as `send_job` and `same_job`), the Schoenfeld residuals do not reveal a trend in the effect. However, these residuals are still informative: if residuals for a static covariate are systematically shifted away from zero, this suggests the estimated coefficient may be unreliable — possibly due to confounding with other effects or because the variable does not contribute meaningfully to the model. In contrast, residuals centered around zero indicate that the model's predictions are consistent with the observed covariate values across events. In the current example, the residuals for same_job are centered around zero, while those for send_job are systematically shifted — suggesting the latter may be redundant or confounded with other effects in the model. ### Recall (plot 3) For each time point, the observed event is ranked among all competing dyads based on the estimated rate from the fitted model. The relative rank is plotted over time, where 1 = top-ranked (perfectly predicted) and 0 = bottom-ranked. A well-fitting model concentrates points near the top. Three reference lines aid interpretation: - **Null model (dotted grey at 0.5)**: the expected rank under an intercept-only model where all dyads are equally likely. Points above this line indicate the model does better than chance. - **Median rank (dashed black)**: summarizes overall model performance. A median rank well above 0.5 indicates the model captures meaningful structure in the event process. - **Top % threshold (solid blue at 0.95)**: points above this line are events the model ranked in the top 5% of all competing dyads. The title reports the median rank and the proportion of events in the top 5%. In this example, we see that the median recall is 0.85 implying that half of the observed events belong to the 15% most likely events according to the fitted model, suggesting a fairly good in-sample predictive performance. ## 6.2 Actor model diagnostics The same diagnostic tools are available for actor-oriented models: ```{r diag-actor, out.width="50%", dev=c("jpeg"), dev.args = list(bg = "white")} diag_ao <- diagnostics(object = fit_ao_exo, reh = reh_ao, stats = stats_ao_exo) plot(x = fit_ao_exo, reh = reh_ao, diagnostics = diag_ao) ``` ## 6.3 HMC diagnostics ```{r diag-hmc, out.width="50%", dev=c("jpeg"), dev.args = list(bg = "white")} diag_hmc <- diagnostics(object = fit_exo_hmc, reh = reh, stats = stats_exo) plot(x = fit_exo_hmc, reh = reh, diagnostics = diag_hmc) ``` --- # 7 Summary The table below summarizes the key modeling choices and how they map to function arguments: | Choice | Argument | Options | |--------|----------|---------| | Model type | `remify(model = ...)` | `"tie"`, `"actor"` | | Timing | `remify(ordinal = ...)` | `FALSE` (interval), `TRUE` (ordinal) | | Directionality | `remify(directed = ...)` | `TRUE`, `FALSE` (tie model only) | | Risk set | `remify(riskset = ...)` | `"full"`, `"active"`, `"active_saturated"`, `"manual"` | | Event types | `remify(event_type = ...)` | column name or `NULL` | | Type handling | `inertia(consider_type = ...)` | `"ignore"`, `"separate"`, `"interact"` | | Scaling | `inertia(scaling = ...)` | `"none"`, `"prop"`, `"std"` | | Memory | `remstats(memory = ...)` | `"full"`, `"window"`, `"interval"`, `"decay"` | | Sampling | `remstats(sampling = ...)` | `TRUE`/`FALSE` (tie model only) | | Estimation | `remstimate(method = ...)` | `"MLE"`, `"HMC"` | These choices can be freely combined — for example, an ordinal actor-oriented model with an active risk set, typed events using `consider_type = "separate"`, and Bayesian estimation via HMC.