Not all sensitive data is recorded as strings: features such as age, date of birth, or income can also make records in a data set personally identifiable. To help with this, we include methods for ‘perturbing’ numeric data, that is, adding random noise.
Three types of random noise are included:
adaptive_noise [default] - random noise which scales with the standard deviation of the variable transformed.
white_noise - random noise at a set spread.
lognorm_noise - random multiplicative noise at a set spread.
NB: we set a random seed using set.seed here for reproducibility. We recommend users avoid this step when using the package in production code.
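The difference between the three noise types can be sketched in base R. This is an illustration only and not the package's implementation: the `pay` values are made up, the noise levels `sd_ratio` and `fixed_sd` are hypothetical, and the use of `rnorm` and `rlnorm` is an assumption about what "random noise" and "random multiplicative noise" might look like.

```r
# Base-R sketch only; this is NOT the deident implementation.
set.seed(101)  # seeded so the illustration is reproducible

pay <- c(75, 160, 72, 205, 213, 153, 0, 212)  # made-up values

# Hypothetical noise levels chosen for illustration
sd_ratio <- 0.1
fixed_sd <- 0.3

# "Adaptive" style: additive noise whose spread scales with sd(pay)
adaptive <- pay + rnorm(length(pay), mean = 0, sd = sd_ratio * sd(pay))

# "White" style: additive noise at a fixed spread
white <- pay + rnorm(length(pay), mean = 0, sd = fixed_sd)

# "Log-normal" style: multiplicative noise, fixed spread on the log scale
lognorm <- pay * rlnorm(length(pay), meanlog = 0, sdlog = fixed_sd)

# Compare the original and perturbed values side by side
round(cbind(pay, adaptive, white, lognorm), 1)
```

In this sketch a zero value stays zero under multiplicative noise, whereas additive noise can push it below zero.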
library(deident)
set.seed(101)
perturb_pipe <- ShiftsWorked |>
  add_perturb(`Daily Pay`)
apply_deident(ShiftsWorked, perturb_pipe)
#> # A tibble: 3,100 × 7
#> `Record ID` Employee Date Shift `Shift Start` `Shift End` `Daily Pay`
#> <int> <chr> <date> <chr> <chr> <chr> <dbl>
#> 1 1 Maria Cook 2015-01-01 Night 17:01 00:01 75.1
#> 2 2 Stephen C… 2015-01-01 Day 08:01 16:01 160.
#> 3 3 Kimberly … 2015-01-01 Day 08:01 16:01 71.6
#> 4 4 Nathan Al… 2015-01-01 Day 08:01 15:01 205.
#> 5 5 Samuel Pa… 2015-01-01 Night 16:01 23:01 213.
#> 6 6 Scott Mor… 2015-01-01 Night 17:01 00:01 153.
#> 7 7 Nathan Sa… 2015-01-01 Rest <NA> <NA> 5.66
#> 8 8 Jose Lopez 2015-01-01 Night 17:01 00:01 212.
#> 9 9 Donna Bro… 2015-01-01 Night 16:01 00:01 228.
#> 10 10 George Ki… 2015-01-01 Night 16:01 00:01 240.
#> # ℹ 3,090 more rows
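To change the noise, pass one of the noise functions to the noise argument of add_perturb, setting the desired level of noise in the function call (sd for white_noise, sd.ratio for adaptive_noise).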
perturb_pipe_white_noise <- ShiftsWorked |>
  add_perturb(`Daily Pay`, noise = white_noise(sd = 0.3))
apply_deident(ShiftsWorked, perturb_pipe_white_noise)
#> # A tibble: 3,100 × 7
#> `Record ID` Employee Date Shift `Shift Start` `Shift End` `Daily Pay`
#> <int> <chr> <date> <chr> <chr> <chr> <dbl>
#> 1 1 Maria Cook 2015-01-01 Night 17:01 00:01 78.6
#> 2 2 Stephen C… 2015-01-01 Day 08:01 16:01 156.
#> 3 3 Kimberly … 2015-01-01 Day 08:01 16:01 77.8
#> 4 4 Nathan Al… 2015-01-01 Day 08:01 15:01 203.
#> 5 5 Samuel Pa… 2015-01-01 Night 16:01 23:01 210.
#> 6 6 Scott Mor… 2015-01-01 Night 17:01 00:01 142.
#> 7 7 Nathan Sa… 2015-01-01 Rest <NA> <NA> -0.460
#> 8 8 Jose Lopez 2015-01-01 Night 17:01 00:01 213.
#> 9 9 Donna Bro… 2015-01-01 Night 16:01 00:01 219.
#> 10 10 George Ki… 2015-01-01 Night 16:01 00:01 242.
#> # ℹ 3,090 more rows
perturb_pipe_heavy_adaptive_noise <- ShiftsWorked |>
  add_perturb(`Daily Pay`, noise = adaptive_noise(sd.ratio = 0.4))
apply_deident(ShiftsWorked, perturb_pipe_heavy_adaptive_noise)
#> # A tibble: 3,100 × 7
#> `Record ID` Employee Date Shift `Shift Start` `Shift End` `Daily Pay`
#> <int> <chr> <date> <chr> <chr> <chr> <dbl>
#> 1 1 Maria Cook 2015-01-01 Night 17:01 00:01 60.0
#> 2 2 Stephen C… 2015-01-01 Day 08:01 16:01 108.
#> 3 3 Kimberly … 2015-01-01 Day 08:01 16:01 60.1
#> 4 4 Nathan Al… 2015-01-01 Day 08:01 15:01 195.
#> 5 5 Samuel Pa… 2015-01-01 Night 16:01 23:01 229.
#> 6 6 Scott Mor… 2015-01-01 Night 17:01 00:01 118.
#> 7 7 Nathan Sa… 2015-01-01 Rest <NA> <NA> -51.7
#> 8 8 Jose Lopez 2015-01-01 Night 17:01 00:01 197.
#> 9 9 Donna Bro… 2015-01-01 Night 16:01 00:01 229.
#> 10 10 George Ki… 2015-01-01 Night 16:01 00:01 230.
#> # ℹ 3,090 more rows
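To get a feel for how the noise level trades privacy against fidelity, the effect of a small and a large ratio can be compared on simulated data. The following is a rough base-R sketch: `perturb_adaptive` is a hypothetical helper written for this illustration (not a deident function), the data are simulated, and the ratios 0.1 and 0.4 are arbitrary example values.

```r
# Base-R sketch only; perturb_adaptive() is a hypothetical helper,
# not part of deident.
set.seed(101)  # seeded so the illustration is reproducible

x <- rnorm(1000, mean = 150, sd = 60)  # simulated numeric column

# Additive noise with spread proportional to the spread of x
perturb_adaptive <- function(x, sd_ratio) {
  x + rnorm(length(x), mean = 0, sd = sd_ratio * sd(x))
}

light <- perturb_adaptive(x, sd_ratio = 0.1)
heavy <- perturb_adaptive(x, sd_ratio = 0.4)

# Spread of the change introduced by each noise level
spread_light <- sd(light - x)
spread_heavy <- sd(heavy - x)
c(light = spread_light, heavy = spread_heavy)

# Correlation with the original values: higher means more signal retained
c(light = cor(x, light), heavy = cor(x, heavy))
```

A larger ratio moves individual values further from the originals, which makes records harder to re-identify but also weakens the relationship between the perturbed and original data.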