This article describes how to create a Findings SDTM domain using the {sdtm.oak} package. Examples are currently presented and tested in the context of the VS domain.
Before reading this article, it is recommended that users review the
“Creating an Interventions Domain” article, which provides a detailed
explanation of various concepts in {sdtm.oak}, such as oak_id_vars,
condition_add, etc. It also offers guidance on which mapping algorithms
or functions to use for different mappings and provides a more detailed
explanation of how these mapping algorithms or functions work.
In this article, we will dive directly into programming and provide further explanation only where it is required.
In {sdtm.oak} we process one raw dataset at a time. Similar raw datasets (for example, Vital Signs - Screening (OID - vs_raw) and Vital Signs - Treatment (OID - vs_t_raw)) can be stacked together before processing.
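As a minimal sketch of stacking, assume two raw extracts with identical columns. The data frames `vs_scr` and `vs_trt`, their values, and the `FORM_OID` column are hypothetical and only serve the illustration; they are not package data.

```r
# Two hypothetical raw vital signs extracts with the same columns.
vs_scr <- data.frame(
  PATNUM = c(375L, 376L),
  SYS_BP = c(158, 85),
  stringsAsFactors = FALSE
)
vs_trt <- data.frame(
  PATNUM = 375L,
  SYS_BP = 94,
  stringsAsFactors = FALSE
)

# Tag each extract with its (hypothetical) form OID so that the origin of a
# row is still known after stacking.
vs_scr$FORM_OID <- "vs_raw"
vs_trt$FORM_OID <- "vs_t_raw"

# Stack the extracts into the single raw dataset that is processed downstream.
vs_stacked <- rbind(vs_scr, vs_trt)
vs_stacked
```

If the extracts do not share exactly the same columns, dplyr::bind_rows() can be used instead; it fills columns missing from one input with NA.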
Repeat the steps from reading in the raw data through mapping the topic variables and their qualifiers for each raw dataset before proceeding with the derived variables.
Read all the raw datasets into the environment. In this example, the
raw dataset name is vs_raw. Users can read it from the package; its
first rows are shown below:
PATNUM | FORML | ASMNTDN | TMPTC | VTLD | VTLTM | SUBPOS | SYS_BP | DIA_BP | PULSE | RESPRT | TEMP | TEMPLOC | OXY_SAT | LAT | LOC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
375 | Vital Signs | 0 | Pre-dose | 16-May-15 | 7:25 | PRONE | 158 | 92 | 63 | 17 | 40.48 | SKIN | 98 | RIGHT | FINGER |
375 | Vital Signs | 0 | Post-dose | 16-May-15 | 10:25 | SEMI-RECUMBENT | 94 | 78 | 76 | 20 | 36.75 | TYMPANIC MEMBRANE | 99 | LEFT | FINGER |
375 | Vital Signs | 0 | | 6-May-18 | 2:01 | PRONE | 117 | 62 | 66 | 15 | 29.45 | ORAL CAVITY | 96 | LEFT | FINGER |
376 | Vital Signs | 1 | | | | | NA | NA | NA | NA | NA | | NA | | |
376 | Vital Signs | 0 | Pre-dose | 23-Oct-08 | 1:19 | PRONE | 85 | 68 | 73 | 21 | 38.25 | AXILLA | 93 | RIGHT | FINGER |
376 | Vital Signs | 0 | Post-dose | 23-Oct-08 | 3:19 | PRONE | 126 | 81 | 56 | 18 | 38.08 | TYMPANIC MEMBRANE | 93 | LEFT | FINGER |
oak_id | raw_source | patient_number | PATNUM | FORML | SYS_BP | DIA_BP |
---|---|---|---|---|---|---|
1 | vitals | 375 | 375 | Vital Signs | 158 | 92 |
2 | vitals | 375 | 375 | Vital Signs | 94 | 78 |
3 | vitals | 375 | 375 | Vital Signs | 117 | 62 |
4 | vitals | 376 | 376 | Vital Signs | NA | NA |
5 | vitals | 376 | 376 | Vital Signs | 85 | 68 |
6 | vitals | 376 | 376 | Vital Signs | 126 | 81 |
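The table directly above shows the three identifier columns (oak_id, raw_source and patient_number) that link every mapped record back to its raw record. In {sdtm.oak} these are generated with generate_oak_id_vars(). The base R sketch below only mimics the resulting columns on a hand-typed subset of the raw data, so that it runs without the package; it is an approximation, not the package's implementation, and the package call appears as a comment.

```r
# Hand-typed subset of the raw vital signs data shown above.
vs_raw <- data.frame(
  PATNUM = c(375L, 375L, 375L, 376L, 376L, 376L),
  FORML = "Vital Signs",
  SYS_BP = c(158, 94, 117, NA, 85, 126),
  DIA_BP = c(92, 78, 62, NA, 68, 81),
  stringsAsFactors = FALSE
)

# With {sdtm.oak} attached, the call would be:
#   vs_raw <- generate_oak_id_vars(vs_raw, pat_var = "PATNUM", raw_src = "vitals")
# The lines below mimic the columns it adds.
id_cols <- data.frame(
  oak_id = seq_len(nrow(vs_raw)),  # one running number per raw record
  raw_source = "vitals",           # name of the raw dataset
  patient_number = vs_raw$PATNUM,  # subject identifier from the raw data
  stringsAsFactors = FALSE
)
vs_raw <- cbind(id_cols, vs_raw)
head(vs_raw, 3)
```

Every mapping function later in this article receives these columns through id_vars = oak_id_vars(), which is how a mapped value finds its way back to the right raw record.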
Read in the DM domain, which is used later to derive the study day (VSDY).
Controlled Terminology is part of the SDTM specification and it is
prepared by the user. In this example, the study controlled terminology
name is sdtm_ct.csv. Users can read it from the package; the code in
this article refers to it as study_ct. Its first rows are shown below:
codelist_code | term_code | term_value | collected_value | term_preferred_term | term_synonyms |
---|---|---|---|---|---|
C66726 | C25158 | CAPSULE | Capsule | Capsule Dosage Form | cap |
C66726 | C25394 | PILL | Pill | Pill Dosage Form | |
C66726 | C29167 | LOTION | Lotion | Lotion Dosage Form | |
C66726 | C42887 | AEROSOL | Aerosol | Aerosol Dosage Form | aer |
C66726 | C42944 | INHALANT | Inhalant | Inhalant Dosage Form | |
C66726 | C42946 | INJECTION | Injection | Injectable Dosage Form | |
C66726 | C42953 | LIQUID | Liquid | Liquid Dosage Form | |
C66726 | C42998 | TABLET | Tablet | Tablet Dosage Form | tab |
C66728 | C25629 | BEFORE | Prior | Prior | |
C66728 | C53279 | ONGOING | Continue | Continue | Continuous |
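To make the role of these columns concrete: a collected value is looked up within one codelist and replaced by its term_value. The base R sketch below illustrates that idea on two rows of the table above. `lookup_ct()` is a hypothetical helper written for this illustration, not a {sdtm.oak} function; it ignores term_synonyms for brevity, and returning unmatched values unchanged is a choice made for the sketch.

```r
# Two rows of the controlled terminology shown above (codelist C66728).
study_ct <- data.frame(
  codelist_code = c("C66728", "C66728"),
  term_value = c("BEFORE", "ONGOING"),
  collected_value = c("Prior", "Continue"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate collected values to term values within one
# codelist. Values without a match are returned unchanged.
lookup_ct <- function(x, ct_spec, ct_clst) {
  ct <- ct_spec[ct_spec$codelist_code == ct_clst, ]
  idx <- match(x, ct$collected_value)
  ifelse(is.na(idx), x, ct$term_value[idx])
}

mapped <- lookup_ct(c("Prior", "Continue", "Unknown"), study_ct, "C66728")
mapped
```

In the mappings that follow, the ct_clst argument plays the role of the codelist filter and ct_spec the role of the lookup table.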
This raw dataset has multiple topic variables. Let's start with the first one: map the topic variable SYSBP from the raw variable SYS_BP.
# Map topic variable SYSBP and its qualifiers.
vs_sysbp <-
hardcode_ct(
raw_dat = vs_raw,
raw_var = "SYS_BP",
tgt_var = "VSTESTCD",
tgt_val = "SYSBP",
ct_spec = study_ct,
ct_clst = "C66741"
) %>%
# Filter for records where VSTESTCD is not empty.
# Only these records need qualifier mappings.
dplyr::filter(!is.na(.data$VSTESTCD))
oak_id | raw_source | patient_number | VSTESTCD |
---|---|---|---|
1 | vitals | 375 | SYSBP |
2 | vitals | 375 | SYSBP |
3 | vitals | 375 | SYSBP |
5 | vitals | 376 | SYSBP |
6 | vitals | 376 | SYSBP |
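Record 4 is absent from the table above because its SYS_BP is empty: the topic value is hardcoded only where the raw variable is populated, and the filter then drops the remaining records. The base R sketch below reproduces that logic with the SYS_BP values from the raw data. It illustrates the behaviour seen in the output and is not the package's implementation; the data frame `raw` is introduced here only for the example.

```r
# oak_id and SYS_BP for the six raw records shown earlier.
raw <- data.frame(
  oak_id = 1:6,
  SYS_BP = c(158, 94, 117, NA, 85, 126)
)

# Hardcode the test code only where the raw variable is populated.
raw$VSTESTCD <- ifelse(is.na(raw$SYS_BP), NA_character_, "SYSBP")

# Keep only the records that received a topic value.
vs_sysbp_sketch <- raw[!is.na(raw$VSTESTCD), c("oak_id", "VSTESTCD")]
vs_sysbp_sketch$oak_id
```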
Map the rest of the variables applicable to the topic variable SYSBP. These can include qualifier, identifier and timing variables.
# Map the qualifiers of topic variable SYSBP.
vs_sysbp <- vs_sysbp %>%
# Map VSTEST using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "SYS_BP",
tgt_var = "VSTEST",
tgt_val = "Systolic Blood Pressure",
ct_spec = study_ct,
ct_clst = "C67153",
id_vars = oak_id_vars()
) %>%
# Map VSORRES using assign_no_ct algorithm
assign_no_ct(
raw_dat = vs_raw,
raw_var = "SYS_BP",
tgt_var = "VSORRES",
id_vars = oak_id_vars()
) %>%
# Map VSORRESU using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "SYS_BP",
tgt_var = "VSORRESU",
tgt_val = "mmHg",
ct_spec = study_ct,
ct_clst = "C66770",
id_vars = oak_id_vars()
) %>%
# Map VSPOS using assign_ct algorithm
assign_ct(
raw_dat = vs_raw,
raw_var = "SUBPOS",
tgt_var = "VSPOS",
ct_spec = study_ct,
ct_clst = "C71148",
id_vars = oak_id_vars()
)
oak_id | raw_source | patient_number | VSTESTCD | VSTEST | VSORRES | VSORRESU | VSPOS |
---|---|---|---|---|---|---|---|
1 | vitals | 375 | SYSBP | Systolic Blood Pressure | 158 | mmHg | PRONE |
2 | vitals | 375 | SYSBP | Systolic Blood Pressure | 94 | mmHg | SEMI-RECUMBENT |
3 | vitals | 375 | SYSBP | Systolic Blood Pressure | 117 | mmHg | PRONE |
5 | vitals | 376 | SYSBP | Systolic Blood Pressure | 85 | mmHg | PRONE |
6 | vitals | 376 | SYSBP | Systolic Blood Pressure | 126 | mmHg | PRONE |
This raw data source has other topic variables (DIABP, PULSE, RESP, TEMP, OXYSAT and VSALL) and their corresponding qualifiers. Repeat the topic and qualifier mappings for each of them.
# Map topic variable DIABP and its qualifiers.
vs_diabp <-
hardcode_ct(
raw_dat = vs_raw,
raw_var = "DIA_BP",
tgt_var = "VSTESTCD",
tgt_val = "DIABP",
ct_spec = study_ct,
ct_clst = "C66741"
) %>%
dplyr::filter(!is.na(.data$VSTESTCD)) %>%
# Map VSTEST using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "DIA_BP",
tgt_var = "VSTEST",
tgt_val = "Diastolic Blood Pressure",
ct_spec = study_ct,
ct_clst = "C67153",
id_vars = oak_id_vars()
) %>%
# Map VSORRES using assign_no_ct algorithm
assign_no_ct(
raw_dat = vs_raw,
raw_var = "DIA_BP",
tgt_var = "VSORRES",
id_vars = oak_id_vars()
) %>%
# Map VSORRESU using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "DIA_BP",
tgt_var = "VSORRESU",
tgt_val = "mmHg",
ct_spec = study_ct,
ct_clst = "C66770",
id_vars = oak_id_vars()
) %>%
# Map VSPOS using assign_ct algorithm
assign_ct(
raw_dat = vs_raw,
raw_var = "SUBPOS",
tgt_var = "VSPOS",
ct_spec = study_ct,
ct_clst = "C71148",
id_vars = oak_id_vars()
)
# Map topic variable PULSE and its qualifiers.
vs_pulse <-
hardcode_ct(
raw_dat = vs_raw,
raw_var = "PULSE",
tgt_var = "VSTESTCD",
tgt_val = "PULSE",
ct_spec = study_ct,
ct_clst = "C66741"
) %>%
dplyr::filter(!is.na(.data$VSTESTCD)) %>%
# Map VSTEST using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "PULSE",
tgt_var = "VSTEST",
tgt_val = "Pulse Rate",
ct_spec = study_ct,
ct_clst = "C67153",
id_vars = oak_id_vars()
) %>%
# Map VSORRES using assign_no_ct algorithm
assign_no_ct(
raw_dat = vs_raw,
raw_var = "PULSE",
tgt_var = "VSORRES",
id_vars = oak_id_vars()
) %>%
# Map VSORRESU using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "PULSE",
tgt_var = "VSORRESU",
tgt_val = "beats/min",
ct_spec = study_ct,
ct_clst = "C66770",
id_vars = oak_id_vars()
)
# Map topic variable RESP from the raw variable RESPRT and its qualifiers.
vs_resp <-
hardcode_ct(
raw_dat = vs_raw,
raw_var = "RESPRT",
tgt_var = "VSTESTCD",
tgt_val = "RESP",
ct_spec = study_ct,
ct_clst = "C66741"
) %>%
dplyr::filter(!is.na(.data$VSTESTCD)) %>%
# Map VSTEST using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "RESPRT",
tgt_var = "VSTEST",
tgt_val = "Respiratory Rate",
ct_spec = study_ct,
ct_clst = "C67153",
id_vars = oak_id_vars()
) %>%
# Map VSORRES using assign_no_ct algorithm
assign_no_ct(
raw_dat = vs_raw,
raw_var = "RESPRT",
tgt_var = "VSORRES",
id_vars = oak_id_vars()
) %>%
# Map VSORRESU using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "RESPRT",
tgt_var = "VSORRESU",
tgt_val = "breaths/min",
ct_spec = study_ct,
ct_clst = "C66770",
id_vars = oak_id_vars()
)
# Map topic variable TEMP from raw variable TEMP and its qualifiers.
vs_temp <-
hardcode_ct(
raw_dat = vs_raw,
raw_var = "TEMP",
tgt_var = "VSTESTCD",
tgt_val = "TEMP",
ct_spec = study_ct,
ct_clst = "C66741"
) %>%
dplyr::filter(!is.na(.data$VSTESTCD)) %>%
# Map VSTEST using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "TEMP",
tgt_var = "VSTEST",
tgt_val = "Temperature",
ct_spec = study_ct,
ct_clst = "C67153",
id_vars = oak_id_vars()
) %>%
# Map VSORRES using assign_no_ct algorithm
assign_no_ct(
raw_dat = vs_raw,
raw_var = "TEMP",
tgt_var = "VSORRES",
id_vars = oak_id_vars()
) %>%
# Map VSORRESU using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "TEMP",
tgt_var = "VSORRESU",
tgt_val = "C",
ct_spec = study_ct,
ct_clst = "C66770",
id_vars = oak_id_vars()
) %>%
# Map VSLOC from TEMPLOC using assign_ct
assign_ct(
raw_dat = vs_raw,
raw_var = "TEMPLOC",
tgt_var = "VSLOC",
ct_spec = study_ct,
ct_clst = "C74456",
id_vars = oak_id_vars()
)
# Map topic variable OXYSAT from raw variable OXY_SAT and its qualifiers.
vs_oxysat <-
hardcode_ct(
raw_dat = vs_raw,
raw_var = "OXY_SAT",
tgt_var = "VSTESTCD",
tgt_val = "OXYSAT",
ct_spec = study_ct,
ct_clst = "C66741"
) %>%
dplyr::filter(!is.na(.data$VSTESTCD)) %>%
# Map VSTEST using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "OXY_SAT",
tgt_var = "VSTEST",
tgt_val = "Oxygen Saturation",
ct_spec = study_ct,
ct_clst = "C67153",
id_vars = oak_id_vars()
) %>%
# Map VSORRES using assign_no_ct algorithm
assign_no_ct(
raw_dat = vs_raw,
raw_var = "OXY_SAT",
tgt_var = "VSORRES",
id_vars = oak_id_vars()
) %>%
# Map VSORRESU using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "OXY_SAT",
tgt_var = "VSORRESU",
tgt_val = "%",
ct_spec = study_ct,
ct_clst = "C66770",
id_vars = oak_id_vars()
) %>%
# Map VSLAT using assign_ct from raw variable LAT
assign_ct(
raw_dat = vs_raw,
raw_var = "LAT",
tgt_var = "VSLAT",
ct_spec = study_ct,
ct_clst = "C99073",
id_vars = oak_id_vars()
) %>%
# Map VSLOC using assign_ct from raw variable LOC
assign_ct(
raw_dat = vs_raw,
raw_var = "LOC",
tgt_var = "VSLOC",
ct_spec = study_ct,
ct_clst = "C74456",
id_vars = oak_id_vars()
)
# Map topic variable VSALL from raw variable ASMNTDN with the logic:
# if ASMNTDN == 1 then VSTESTCD = VSALL
vs_vsall <-
hardcode_ct(
raw_dat = condition_add(vs_raw, ASMNTDN == 1L),
raw_var = "ASMNTDN",
tgt_var = "VSTESTCD",
tgt_val = "VSALL",
ct_spec = study_ct,
ct_clst = "C66741"
) %>%
dplyr::filter(!is.na(.data$VSTESTCD)) %>%
# Map VSTEST using hardcode_ct algorithm
hardcode_ct(
raw_dat = vs_raw,
raw_var = "ASMNTDN",
tgt_var = "VSTEST",
tgt_val = "Vital Signs",
ct_spec = study_ct,
ct_clst = "C67153",
id_vars = oak_id_vars()
)
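The VSALL block differs from the other topic variables in one respect: the value is hardcoded only where a condition on the raw data holds (ASMNTDN == 1), not wherever the raw variable is populated. A base R sketch of that conditional logic follows, using the ASMNTDN values from the raw data shown earlier. It is an illustration only; in the article's code the condition is attached with condition_add(), and the data frame `raw_cond` is a name made up for the example.

```r
# oak_id and ASMNTDN for the six raw records shown earlier.
raw_cond <- data.frame(
  oak_id = 1:6,
  ASMNTDN = c(0L, 0L, 0L, 1L, 0L, 0L)
)

# Hardcode VSALL only where the condition ASMNTDN == 1 holds.
raw_cond$VSTESTCD <- ifelse(raw_cond$ASMNTDN == 1L, "VSALL", NA_character_)

# Keep only the records that met the condition.
vs_vsall_sketch <- raw_cond[!is.na(raw_cond$VSTESTCD), c("oak_id", "VSTESTCD")]
vs_vsall_sketch
```

Only record 4 qualifies, which is the same record that carries no results for the individual tests.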
Now that the mappings for all topic variables and their qualifiers are complete, combine the datasets and proceed with mapping the qualifier, identifier and timing variables applicable to all topic variables.
# Combine all the topic variables into a single data frame and map qualifiers
# applicable to all topic variables
vs <- dplyr::bind_rows(
vs_vsall, vs_sysbp, vs_diabp, vs_pulse, vs_resp,
vs_temp, vs_oxysat
) %>%
# Map qualifiers common to all topic variables
# Map VSDTC using assign_datetime algorithm
assign_datetime(
raw_dat = vs_raw,
raw_var = c("VTLD", "VTLTM"),
tgt_var = "VSDTC",
raw_fmt = c(list(c("d-m-y", "dd-mmm-yyyy")), "H:M")
) %>%
# Map VSTPT from TMPTC using assign_ct
assign_ct(
raw_dat = vs_raw,
raw_var = "TMPTC",
tgt_var = "VSTPT",
ct_spec = study_ct,
ct_clst = "TPT",
id_vars = oak_id_vars()
) %>%
# Map VSTPTNUM from TMPTC using assign_ct
assign_ct(
raw_dat = vs_raw,
raw_var = "TMPTC",
tgt_var = "VSTPTNUM",
ct_spec = study_ct,
ct_clst = "TPTNUM",
id_vars = oak_id_vars()
) %>%
# Map VISIT from INSTANCE using assign_ct
assign_ct(
raw_dat = vs_raw,
raw_var = "INSTANCE",
tgt_var = "VISIT",
ct_spec = study_ct,
ct_clst = "VISIT",
id_vars = oak_id_vars()
) %>%
# Map VISITNUM from INSTANCE using assign_ct
assign_ct(
raw_dat = vs_raw,
raw_var = "INSTANCE",
tgt_var = "VISITNUM",
ct_spec = study_ct,
ct_clst = "VISITNUM",
id_vars = oak_id_vars()
)
oak_id | raw_source | patient_number | VSTESTCD | VSTEST | VSORRES | VSORRESU | VSPOS | VSLAT | VSDTC | VSTPT | VSTPTNUM | VISIT | VISITNUM |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | vitals | 375 | SYSBP | Systolic Blood Pressure | 158.00 | mmHg | PRONE | NA | 2015-05-16T07:25 | PREDOSE | 1 | VISIT1 | VISIT1 |
1 | vitals | 375 | DIABP | Diastolic Blood Pressure | 92.00 | mmHg | PRONE | NA | 2015-05-16T07:25 | PREDOSE | 1 | VISIT1 | VISIT1 |
1 | vitals | 375 | PULSE | Pulse Rate | 63.00 | beats/min | NA | NA | 2015-05-16T07:25 | PREDOSE | 1 | VISIT1 | VISIT1 |
1 | vitals | 375 | RESP | Respiratory Rate | 17.00 | breaths/min | NA | NA | 2015-05-16T07:25 | PREDOSE | 1 | VISIT1 | VISIT1 |
1 | vitals | 375 | TEMP | Temperature | 40.48 | C | NA | NA | 2015-05-16T07:25 | PREDOSE | 1 | VISIT1 | VISIT1 |
1 | vitals | 375 | OXYSAT | Oxygen Saturation | 98.00 | % | NA | RIGHT | 2015-05-16T07:25 | PREDOSE | 1 | VISIT1 | VISIT1 |
2 | vitals | 375 | SYSBP | Systolic Blood Pressure | 94.00 | mmHg | SEMI-RECUMBENT | NA | 2015-05-16T10:25 | POSTDOSE | 2 | VISIT1 | VISIT1 |
2 | vitals | 375 | DIABP | Diastolic Blood Pressure | 78.00 | mmHg | SEMI-RECUMBENT | NA | 2015-05-16T10:25 | POSTDOSE | 2 | VISIT1 | VISIT1 |
2 | vitals | 375 | PULSE | Pulse Rate | 76.00 | beats/min | NA | NA | 2015-05-16T10:25 | POSTDOSE | 2 | VISIT1 | VISIT1 |
2 | vitals | 375 | RESP | Respiratory Rate | 20.00 | breaths/min | NA | NA | 2015-05-16T10:25 | POSTDOSE | 2 | VISIT1 | VISIT1 |
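The VSDTC values above combine the raw date (VTLD, for example 16-May-15) and the raw time (VTLTM, for example 7:25) into a single ISO 8601 string. The base R sketch below reproduces that conversion for this specific layout only. `to_iso8601()` is a hypothetical helper, it handles one value at a time, and treating two-digit years as 20xx is an assumption of the sketch; in the article's code the conversion is done by assign_datetime().

```r
to_iso8601 <- function(date_chr, time_chr) {
  # Split "16-May-15" into day, month abbreviation and two-digit year.
  d <- strsplit(date_chr, "-", fixed = TRUE)[[1]]
  # Split "7:25" into hour and minute.
  tm <- strsplit(time_chr, ":", fixed = TRUE)[[1]]
  sprintf(
    "%04d-%02d-%02dT%02d:%02d",
    2000L + as.integer(d[3]),  # assumption: all years are 20xx
    match(d[2], month.abb),    # "May" -> 5 (English abbreviations)
    as.integer(d[1]),
    as.integer(tm[1]),
    as.integer(tm[2])
  )
}

to_iso8601("16-May-15", "7:25")
```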
Create derived variables applicable to all topic variables.
vs <- vs %>%
dplyr::mutate(
STUDYID = "test_study",
DOMAIN = "VS",
VSCAT = "VITAL SIGNS",
USUBJID = paste0("test_study", "-", .data$patient_number)
) %>%
# derive_seq(tgt_var = "VSSEQ",
# rec_vars= c("USUBJID", "VSTRT")) %>%
derive_study_day(
sdtm_in = .,
dm_domain = dm,
tgdt = "VSDTC",
refdt = "RFXSTDTC",
study_day_var = "VSDY"
) %>%
dplyr::select("STUDYID", "DOMAIN", "USUBJID", everything())
STUDYID | DOMAIN | USUBJID | VSTESTCD | VSTEST | VSORRES | VSORRESU | VSPOS | VSLAT | VSTPT | VSTPTNUM | VISIT | VISITNUM | VSDTC | VSDY |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
test_study | VS | test_study-375 | SYSBP | Systolic Blood Pressure | 158.00 | mmHg | PRONE | NA | PREDOSE | 1 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
test_study | VS | test_study-375 | DIABP | Diastolic Blood Pressure | 92.00 | mmHg | PRONE | NA | PREDOSE | 1 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
test_study | VS | test_study-375 | PULSE | Pulse Rate | 63.00 | beats/min | NA | NA | PREDOSE | 1 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
test_study | VS | test_study-375 | RESP | Respiratory Rate | 17.00 | breaths/min | NA | NA | PREDOSE | 1 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
test_study | VS | test_study-375 | TEMP | Temperature | 40.48 | C | NA | NA | PREDOSE | 1 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
test_study | VS | test_study-375 | OXYSAT | Oxygen Saturation | 98.00 | % | NA | RIGHT | PREDOSE | 1 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
test_study | VS | test_study-375 | SYSBP | Systolic Blood Pressure | 94.00 | mmHg | SEMI-RECUMBENT | NA | POSTDOSE | 2 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
test_study | VS | test_study-375 | DIABP | Diastolic Blood Pressure | 78.00 | mmHg | SEMI-RECUMBENT | NA | POSTDOSE | 2 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
test_study | VS | test_study-375 | PULSE | Pulse Rate | 76.00 | beats/min | NA | NA | POSTDOSE | 2 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
test_study | VS | test_study-375 | RESP | Respiratory Rate | 20.00 | breaths/min | NA | NA | POSTDOSE | 2 | VISIT1 | VISIT1 | 2015-05-16 | -2890 |
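VSDY in the table above is the study day of VSDTC relative to the reference date RFXSTDTC taken from DM. Under the usual SDTM convention there is no day 0: dates on or after the reference date are counted from 1, and earlier dates are negative. The base R sketch below shows that rule with made-up dates; `study_day()` is a hypothetical helper, not the package function.

```r
study_day <- function(dtc, ref) {
  # Use only the date part (first 10 characters) of the ISO 8601 strings.
  diff_days <- as.integer(
    as.Date(substr(dtc, 1, 10)) - as.Date(substr(ref, 1, 10))
  )
  # No day 0: add 1 on or after the reference date.
  ifelse(diff_days >= 0L, diff_days + 1L, diff_days)
}

# One date after and one date before a made-up reference date.
days <- study_day(c("2015-05-16T07:25", "2015-05-14"), "2015-05-15")
days
```

A large negative value such as the one in the table therefore means that the assessment date lies well before the subject's reference date in the example DM data.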
Yet to be developed.