Perturb Example

Not all sensitive data is recorded as strings: numeric features such as age, date of birth, or income can also make records in a data set personally identifiable. To help with this, the package includes methods for ‘perturbing’ numeric data, that is, adding random noise to it.

Three types of random noise are included:

  1. adaptive_noise [default] - random noise which scales with the standard deviation of the variable transformed.
  2. white_noise - random noise at a set spread.
  3. lognorm_noise - random multiplicative noise at a set spread.
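
As a rough illustration of how these three kinds of noise differ, the base R sketch below applies each to a small made-up vector. It is not the package's implementation: the vector `pay`, the ratio and spread of 0.1, and the use of normal draws are all assumptions chosen for the example.

```r
# Illustrative only: base R stand-ins, not the deident internals.
set.seed(101)  # fixed seed so the sketch is repeatable

pay <- c(75, 160, 72, 205, 0)  # hypothetical daily pay values
n <- length(pay)

# Adaptive-style noise: the spread is a fraction (assumed 0.1) of sd(pay)
adaptive <- pay + rnorm(n, mean = 0, sd = 0.1 * sd(pay))

# White-style noise: the spread is fixed (assumed 0.1), whatever the data
white <- pay + rnorm(n, mean = 0, sd = 0.1)

# Log-normal-style noise: each value is multiplied by exp(normal draw)
lognorm <- pay * exp(rnorm(n, mean = 0, sd = 0.1))

data.frame(pay, adaptive, white, lognorm)
```

In this sketch the multiplicative variant leaves a zero at zero and never changes sign, whereas the two additive variants can push small values below zero.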

NB: we set a random seed using set.seed here for reproducibility. We recommend users avoid this step when using the package in production code, since a fixed seed makes the added noise repeatable.

library(deident)
set.seed(101)

perturb_pipe <- ShiftsWorked |>
  add_perturb(`Daily Pay`)

apply_deident(ShiftsWorked, perturb_pipe)
#> # A tibble: 3,100 × 7
#>    `Record ID` Employee   Date       Shift `Shift Start` `Shift End` `Daily Pay`
#>          <int> <chr>      <date>     <chr> <chr>         <chr>             <dbl>
#>  1           1 Maria Cook 2015-01-01 Night 17:01         00:01             75.1 
#>  2           2 Stephen C… 2015-01-01 Day   08:01         16:01            160.  
#>  3           3 Kimberly … 2015-01-01 Day   08:01         16:01             71.6 
#>  4           4 Nathan Al… 2015-01-01 Day   08:01         15:01            205.  
#>  5           5 Samuel Pa… 2015-01-01 Night 16:01         23:01            213.  
#>  6           6 Scott Mor… 2015-01-01 Night 17:01         00:01            153.  
#>  7           7 Nathan Sa… 2015-01-01 Rest  <NA>          <NA>               5.66
#>  8           8 Jose Lopez 2015-01-01 Night 17:01         00:01            212.  
#>  9           9 Donna Bro… 2015-01-01 Night 16:01         00:01            228.  
#> 10          10 George Ki… 2015-01-01 Night 16:01         00:01            240.  
#> # ℹ 3,090 more rows

To change the noise, pass one of these noise functions to the noise argument of add_perturb, calling it with the desired level of noise.

perturb_pipe_white_noise <- ShiftsWorked |>
  add_perturb(`Daily Pay`, noise = white_noise(sd=0.3))

apply_deident(ShiftsWorked, perturb_pipe_white_noise)
#> # A tibble: 3,100 × 7
#>    `Record ID` Employee   Date       Shift `Shift Start` `Shift End` `Daily Pay`
#>          <int> <chr>      <date>     <chr> <chr>         <chr>             <dbl>
#>  1           1 Maria Cook 2015-01-01 Night 17:01         00:01            78.6  
#>  2           2 Stephen C… 2015-01-01 Day   08:01         16:01           156.   
#>  3           3 Kimberly … 2015-01-01 Day   08:01         16:01            77.8  
#>  4           4 Nathan Al… 2015-01-01 Day   08:01         15:01           203.   
#>  5           5 Samuel Pa… 2015-01-01 Night 16:01         23:01           210.   
#>  6           6 Scott Mor… 2015-01-01 Night 17:01         00:01           142.   
#>  7           7 Nathan Sa… 2015-01-01 Rest  <NA>          <NA>             -0.460
#>  8           8 Jose Lopez 2015-01-01 Night 17:01         00:01           213.   
#>  9           9 Donna Bro… 2015-01-01 Night 16:01         00:01           219.   
#> 10          10 George Ki… 2015-01-01 Night 16:01         00:01           242.   
#> # ℹ 3,090 more rows

The level of adaptive noise is set in the same way, here through sd.ratio. Note that perturbed values can end up negative, as row 7 shows in both this output and the white noise output.

perturb_pipe_heavy_adaptive_noise <- ShiftsWorked |>
  add_perturb(`Daily Pay`, noise = adaptive_noise(sd.ratio=0.4))

apply_deident(ShiftsWorked, perturb_pipe_heavy_adaptive_noise)
#> # A tibble: 3,100 × 7
#>    `Record ID` Employee   Date       Shift `Shift Start` `Shift End` `Daily Pay`
#>          <int> <chr>      <date>     <chr> <chr>         <chr>             <dbl>
#>  1           1 Maria Cook 2015-01-01 Night 17:01         00:01              60.0
#>  2           2 Stephen C… 2015-01-01 Day   08:01         16:01             108. 
#>  3           3 Kimberly … 2015-01-01 Day   08:01         16:01              60.1
#>  4           4 Nathan Al… 2015-01-01 Day   08:01         15:01             195. 
#>  5           5 Samuel Pa… 2015-01-01 Night 16:01         23:01             229. 
#>  6           6 Scott Mor… 2015-01-01 Night 17:01         00:01             118. 
#>  7           7 Nathan Sa… 2015-01-01 Rest  <NA>          <NA>              -51.7
#>  8           8 Jose Lopez 2015-01-01 Night 17:01         00:01             197. 
#>  9           9 Donna Bro… 2015-01-01 Night 16:01         00:01             229. 
#> 10          10 George Ki… 2015-01-01 Night 16:01         00:01             230. 
#> # ℹ 3,090 more rows
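
To give a feel for what a larger ratio does, the base R sketch below compares a light and a heavy level of noise on simulated data. It assumes that adaptive noise behaves like zero-mean normal noise whose standard deviation is the ratio times the standard deviation of the column; the helper `perturb_sketch` and the simulated `pay` vector are hypothetical and not part of the package.

```r
# Illustrative only: a hypothetical stand-in for adaptive noise.
set.seed(101)

pay <- runif(1000, min = 0, max = 250)  # simulated stand-in for a pay column

# Add zero-mean normal noise with spread ratio * sd(x)
perturb_sketch <- function(x, ratio) {
  x + rnorm(length(x), mean = 0, sd = ratio * sd(x))
}

light <- perturb_sketch(pay, ratio = 0.1)
heavy <- perturb_sketch(pay, ratio = 0.4)

# Average absolute change per value at each noise level
c(light = mean(abs(light - pay)), heavy = mean(abs(heavy - pay)))

# Spread of the column before and after perturbation
c(original = sd(pay), light = sd(light), heavy = sd(heavy))
```

Under these assumptions, heavier noise moves each value further from the original and widens the spread of the column, which is the trade-off to weigh when choosing a noise level.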