Out of the box, deident features a set of transformations to aid in the de-identification of data sets. Each transformation is implemented via R6Class and extends BaseDeident. User-defined transformations can be implemented in a similar manner.
To demonstrate the different transformations we supply a toy data set, df, comprising 26 observations of three variables (A, B and C), one of which takes the value X if B <= 13 and Y if B > 13.
Apply a cached random replacement cipher. Every re-occurrence of the same key receives the same replacement value.
Implemented deident options:
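# By short name or class name (strings)
deident(df, "psudonymize", A)
deident(df, "Pseudonymizer", A)
# By class generator or a freshly created instance
deident(df, Pseudonymizer, A)
deident(df, Pseudonymizer$new(), A)
# By a stored instance, which can be configured before use
psu <- Pseudonymizer$new()
deident(df, psu, A)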
By default, Pseudonymizer replaces values in variables with a random alphanumeric string of 5 characters. This behaviour can be changed by calling set_method on an instantiated Pseudonymizer with the desired function:
psu <- Pseudonymizer$new()
new_method <- function(key, ...) {
  # Replacement: 12 random lower-case letters; the key itself is not used here
  paste(sample(letters, 12, replace = TRUE), collapse = "")
}
psu$set_method(new_method)
deident(df, psu, A)
#> DeidentList
#> 1 step(s) implemented
#> Step 1 : 'Pseudonymizer' on variable(s) A
#> For data:
#> columns: A, B, C
The first argument to the method receives the key to be transformed.
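The idea of a cached random replacement cipher can be illustrated without the package. The following is a minimal base-R sketch, not the internals of Pseudonymizer: the helper make_pseudonymizer and its n_chars argument are hypothetical names introduced for illustration, and the sketch assumes non-empty, non-missing keys.

```r
# Base-R sketch of a cached random replacement cipher.
# make_pseudonymizer() and n_chars are hypothetical, not part of deident.
make_pseudonymizer <- function(n_chars = 5) {
  cache <- new.env()                      # maps each key seen so far to its replacement
  alphabet <- c(letters, LETTERS, 0:9)    # alphanumeric characters to draw from
  function(keys) {
    vapply(as.character(keys), function(key) {
      if (!exists(key, envir = cache, inherits = FALSE)) {
        # First sighting of this key: draw a random string and remember it
        replacement <- paste(sample(alphabet, n_chars, replace = TRUE), collapse = "")
        assign(key, replacement, envir = cache)
      }
      # Any later sighting returns the cached replacement
      get(key, envir = cache, inherits = FALSE)
    }, character(1), USE.NAMES = FALSE)
  }
}

pseudonymize <- make_pseudonymizer()
out <- pseudonymize(c("a", "b", "a"))
out[1] == out[3]  # TRUE: the repeated key "a" maps to the same replacement
```

The sketch does not guard against two different keys drawing the same random string; a real implementation would probably want to check for such collisions.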
Implemented deident options:
Apply cryptographic hashing to a variable.
Implemented deident options:
deident(df, "encrypt", A)
deident(df, "Encrypter", A)
deident(df, Encrypter, A)
deident(df, Encrypter$new(), A)
encrypt <- Encrypter$new()
deident(df, encrypt, A)
At initialization, Encrypter can be given hash_key and seed values to control the cryptographic hashing. It is recommended that users set these values and do not disclose them.
Apply Gaussian white noise to a numeric variable.
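As an illustration of the technique only, adding Gaussian white noise can be sketched in base R as below. The function perturb_gaussian, its sd argument and the vector b are hypothetical and are not the package's API.

```r
# Base-R sketch: add zero-mean Gaussian white noise to a numeric vector.
# perturb_gaussian() and its sd argument are hypothetical, not part of deident.
perturb_gaussian <- function(x, sd = 0.1) {
  stopifnot(is.numeric(x))
  # One independent draw per observation
  x + rnorm(length(x), mean = 0, sd = sd)
}

set.seed(1)                      # only to make the illustration reproducible
b <- 1:26                        # hypothetical numeric variable
noisy <- perturb_gaussian(b, sd = 0.5)
head(round(noisy - b, 2))        # the noise added to the first few values
```

A larger sd hides individual values more effectively but distorts downstream statistics more, so the choice depends on the scale of the variable.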
Implemented deident options:
Aggregate categorical values according to a user-supplied list. The list must be supplied to Blur at initialization.
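The kind of aggregation described here can be sketched in base R as a lookup from original categories to coarser groups. The function blur_categories and the example lookup are hypothetical, and leaving unmatched values unchanged is an assumption of this sketch rather than documented package behaviour.

```r
# Base-R sketch: aggregate categories via a user-supplied lookup list.
# blur_categories() is hypothetical, not part of deident.
blur_categories <- function(x, lookup) {
  x <- as.character(x)
  map <- unlist(lookup)            # named character vector: original -> group
  hit <- x %in% names(map)         # values that have an entry in the lookup
  x[hit] <- unname(map[x[hit]])    # replace matched values with their group
  x                                # unmatched values are left as they are (assumption)
}

lookup <- list(apple = "fruit", pear = "fruit", carrot = "vegetable")
blur_categories(c("apple", "carrot", "pear", "stone"), lookup)
# "fruit" "vegetable" "fruit" "stone"
```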
Implemented deident options:
Aggregate numeric values according to a user-supplied vector of breaks (cuts). If no vector is supplied, NumericBlurer defaults to a binary classification about 0.
Implemented deident options:
deident(df, "numeric_blur", B)
deident(df, "NumericBlurer", B)
deident(df, NumericBlurer, B)
deident(df, NumericBlurer$new(), B)
numeric_blur <- NumericBlurer$new()
deident(df, numeric_blur, B)
At initialization, NumericBlurer takes an argument cuts to define the limits of each interval.
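The effect of interval limits can be sketched with base R's cut(). The helper blur_numeric below is hypothetical, and the way it closes intervals (on the right, the default of cut()) is an assumption that may differ from NumericBlurer.

```r
# Base-R sketch: bin numeric values into intervals defined by cut points.
# blur_numeric() is hypothetical; interval closure follows cut()'s defaults.
blur_numeric <- function(x, cuts = 0) {
  # Pad with -Inf and Inf so every value falls into some interval
  cut(x, breaks = c(-Inf, sort(cuts), Inf))
}

# Default: a binary classification about 0
binary <- blur_numeric(c(-2, 0.5, 3))
nlevels(binary)  # 2

# Two cut points give three intervals
banded <- blur_numeric(c(1, 7, 13, 20, 26), cuts = c(10, 20))
as.integer(banded)  # 1 1 2 2 3
```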
Apply Shuffler to a data set after first grouping the data on one or more columns. The grouping must be defined at initialization.
Implemented deident options:
At initialization, GroupedShuffler takes an argument limit such that if any aggregated subgroup has fewer than limit observations, all of its values are dropped.
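Shuffling within groups, combined with a minimum group size, can be sketched in base R as follows. The function grouped_shuffle and the toy data frame are hypothetical, and representing dropped values as NA is an assumption of this sketch; the package may drop them differently.

```r
# Base-R sketch: shuffle a column within groups, dropping values of small groups.
# grouped_shuffle() is hypothetical, not part of deident.
grouped_shuffle <- function(data, column, group, limit = 0) {
  idx_by_group <- split(seq_len(nrow(data)), data[[group]])  # row indices per group
  out <- data
  for (idx in idx_by_group) {
    if (length(idx) < limit) {
      # Too few observations in this group: drop its values (NA here, by assumption)
      out[[column]][idx] <- NA
    } else {
      # Permute the values among the rows of this group only
      out[[column]][idx] <- data[[column]][idx[sample.int(length(idx))]]
    }
  }
  out
}

set.seed(42)  # only to make the illustration reproducible
toy <- data.frame(
  value = 1:7,
  group = c("X", "X", "X", "Y", "Y", "Y", "Z")
)
shuffled <- grouped_shuffle(toy, "value", "group", limit = 2)
is.na(shuffled$value[7])  # TRUE: group Z has a single observation
```

Values never cross group boundaries, so group-level summaries of the shuffled column match those of the original for every group that meets the limit.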