tidyaudit


Pipeline audit trails and data diagnostics for tidyverse workflows.

tidyaudit captures metadata-only snapshots at each step of a dplyr pipeline and builds a structured audit report from them without storing the data itself. Operation-aware taps enrich these snapshots with join match rates, filter drop statistics, and more. The package pairs diagnostic tools for interactive development with production-oriented tools for data quality.
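To make "metadata-only snapshot" concrete, here is a minimal base-R sketch of the kind of summary such a snapshot could hold: row and column counts, the total number of missing cells, and column classes, with no copy of the values. The helper `snapshot_meta()` is hypothetical and is not part of the tidyaudit API; the package's real snapshots may record more or different fields.

```r
# Hypothetical helper (NOT part of tidyaudit): summarise a data frame
# while keeping none of its values.
snapshot_meta <- function(df, label) {
  list(
    label = label,
    rows  = nrow(df),
    cols  = ncol(df),
    nas   = sum(is.na(df)),                               # total missing cells
    types = vapply(df, function(x) class(x)[1], character(1))  # class per column
  )
}

df <- data.frame(id = 1:3, x = c(1.5, NA, 2.0), g = c("a", "b", NA),
                 stringsAsFactors = FALSE)
snap <- snapshot_meta(df, "raw")
str(snap)
```

Because only counts and types are kept, the cost of a snapshot does not grow with the size of the data it describes.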

Installation

# Install the CRAN version
install.packages("tidyaudit")

# Install the development version using `pak`
pak::pak("fpcordeiro/tidyaudit")

Quick Example

library(tidyaudit)
library(dplyr)
set.seed(123)

orders  <- data.frame(id = 1:100, amount = runif(100, 10, 500), region_id = sample(1:5, 100, TRUE))
regions <- data.frame(region_id = 1:4, name = c("North", "South", "East", "West"))

trail <- audit_trail("order_pipeline")

result <- orders |>
  audit_tap(trail, "raw") |>
  left_join_tap(regions, by = "region_id", .trail = trail, .label = "with_region") |>
  filter_tap(amount > 100, .trail = trail, .label = "high_value", .stat = amount)
#> ℹ filter_tap: amount > 100
#> Dropped 18 of 100 rows (18.0%)
#> Stat amount: dropped 1,062.191 of 25,429.39

print(trail)
#> ── Audit Trail: "order_pipeline" ─────────────────────────────────────────────────────────────────────
#> Created: 2026-02-21 14:36:35
#> Snapshots: 3
#> 
#>   #  Label        Rows  Cols  NAs  Type                                
#>   ─  ───────────  ────  ────  ───  ────────────────────────────────────
#>   1  raw           100     3    0  tap                                 
#>   2  with_region   100     4   23  left_join (many-to-one, 77% matched)
#>   3  high_value     82     4   20  filter (dropped 18 rows, 18%)       
#> 
#> Changes:
#>   raw → with_region: = rows, +1 cols, +23 NAs
#>   with_region → high_value: -18 rows, = cols, -3 NAs

audit_diff(trail, "raw", "high_value")
#> ── Audit Diff: "raw" → "high_value" ──
#> 
#>   Metric  Before  After  Delta
#>   ──────  ──────  ─────  ─────
#>   Rows       100     82    -18
#>   Cols         3      4     +1
#>   NAs          0     20    +20
#> 
#> ✔ Columns added: name
#> 
#> Numeric shifts (common columns):
#>     Column     Mean before  Mean after   Shift
#>     ─────────  ───────────  ──────────  ──────
#>     id               50.50       49.66   -0.84
#>     amount          254.29      297.16  +42.87
#>     region_id         3.08        3.05   -0.03
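As a cross-check on what the taps report, the sketch below recomputes the same kinds of statistics by hand in base R: the left-join match rate, the rows and `amount` dropped by the filter, and the mean shift of `amount` between the first and last snapshots. It is an illustration under the assumption that the join is many-to-one (as the trail reports), using `match()` as a stand-in for `left_join()`; it is not how tidyaudit computes these figures internally. Run with the same seed, its numbers should line up with the trail and diff printed above.

```r
# Base-R cross-check of the tap statistics (illustrative, not tidyaudit internals).
set.seed(123)
orders  <- data.frame(id = 1:100, amount = runif(100, 10, 500),
                      region_id = sample(1:5, 100, TRUE))
regions <- data.frame(region_id = 1:4, name = c("North", "South", "East", "West"))

# Many-to-one left join: look up each order's region; unmatched keys give NA
idx <- match(orders$region_id, regions$region_id)
with_region <- orders
with_region$name <- regions$name[idx]
match_rate <- mean(!is.na(idx))            # share of orders with a matching region

# Filter: keep high-value orders and measure what was dropped
keep <- with_region$amount > 100
high_value <- with_region[keep, ]
rows_dropped   <- sum(!keep)
amount_dropped <- sum(with_region$amount[!keep])

# Mean shift of a common numeric column between first and last snapshots
amount_shift <- mean(high_value$amount) - mean(orders$amount)

cat(sprintf("match rate %.0f%%, dropped %d rows (%.1f of %.1f amount), amount shift %+.2f\n",
            100 * match_rate, rows_dropped, amount_dropped,
            sum(orders$amount), amount_shift))
```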

Features

Audit trail system, the core of the package:

Diagnostic functions, tidyverse ports of the dtaudit diagnostics:

See vignette("tidyaudit") for the audit trail walkthrough and vignette("diagnostics") for the diagnostic functions guide.

Relationship to dtaudit

tidyaudit is a tidyverse-native sibling to dtaudit (a data.table-based package on CRAN). The two packages share design vocabulary and S3 class naming conventions but no code or dependencies.

License

LGPL (>= 3)