Not all sensitive data is recorded as strings: features such as age, date of birth, or income can also make records in a data set personally identifiable. To address this, the package includes the ‘numeric blur’ method, the numeric counterpart to ‘blur’ for categorical data. Just as the ‘blur’ transform aggregates categorical features according to a new taxonomy, ‘numeric blur’ aggregates numeric features into intervals.
At present the method requires pre-defined cut points at which to divide the data. The example below uses the quartiles of Daily Pay.
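library(deident)

# Cut points: the 25th, 50th and 75th percentiles of Daily Pay
quantile_cuts <- quantile(ShiftsWorked$`Daily Pay`, c(0.25, 0.5, 0.75))

# Build a pipeline that blurs Daily Pay into the intervals defined by the cuts
numeric_blur_pipe <- ShiftsWorked |>
  add_numeric_blur(`Daily Pay`, cuts = quantile_cuts)

# Apply the pipeline to the data
apply_deident(ShiftsWorked, numeric_blur_pipe)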
#> # A tibble: 3,100 × 7
#>    `Record ID` Employee   Date       Shift `Shift Start` `Shift End` `Daily Pay`
#>          <int> <chr>      <date>     <chr> <chr>         <chr>       <fct>
#>  1           1 Maria Cook 2015-01-01 Night 17:01         00:01       (70.9,144]
#>  2           2 Stephen C… 2015-01-01 Day   08:01         16:01       (144,208]
#>  3           3 Kimberly … 2015-01-01 Day   08:01         16:01       (70.9,144]
#>  4           4 Nathan Al… 2015-01-01 Day   08:01         15:01       (144,208]
#>  5           5 Samuel Pa… 2015-01-01 Night 16:01         23:01       (208, Inf]
#>  6           6 Scott Mor… 2015-01-01 Night 17:01         00:01       (70.9,144]
#>  7           7 Nathan Sa… 2015-01-01 Rest  <NA>          <NA>        (-Inf,70.9]
#>  8           8 Jose Lopez 2015-01-01 Night 17:01         00:01       (208, Inf]
#>  9           9 Donna Bro… 2015-01-01 Night 16:01         00:01       (208, Inf]
#> 10          10 George Ki… 2015-01-01 Night 16:01         00:01       (208, Inf]
#> # ℹ 3,090 more rows
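The intervals in the Daily Pay column above look like the output of base R's cut() when the cut points are padded with -Inf and Inf. The following is a minimal sketch of that idea, not the package's confirmed implementation: the pay vector is hypothetical, and the cut points are simply copied from the interval labels shown above.

```r
# Hypothetical daily pay values standing in for ShiftsWorked$`Daily Pay`
pay <- c(55, 90, 150, 220, 310)

# Pre-defined cut points (taken from the interval labels in the output above)
cuts <- c(70.9, 144, 208)

# Pad the cuts with -Inf and Inf so every value falls into some interval,
# then replace each value with the interval it belongs to
bands <- cut(pay, breaks = c(-Inf, cuts, Inf))

# Count how many values land in each interval
table(bands)
```

Three cut points yield four intervals, so each exact pay value is replaced by one of four bands. Wider bands give stronger aggregation at the cost of detail.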