A cdm reference is a single R object that represents OMOP CDM data. The tables in the cdm reference may be in a database, but a cdm reference may also contain OMOP CDM tables that are in data frames, tibbles, or Arrow. In the latter cases, the cdm reference would typically be a subset of an original cdm reference that has been derived as part of a particular analysis.
omopgenerics provides a general class definition for a cdm reference and a data frame/tibble implementation. For creating a cdm reference using a database, see the CDMConnector package (https://darwin-eu.github.io/CDMConnector/).
A cdm reference is a list of tables. These tables come in three types: standard OMOP CDM tables, cohort tables, and other auxiliary tables.
There are multiple versions of the OMOP CDM. The list of tables included in version 5.3 is as follows.
library(omopgenerics)
#>
#> Attaching package: 'omopgenerics'
#> The following object is masked from 'package:stats':
#>
#> filter
omopTables()
#> [1] "person" "observation_period" "visit_occurrence"
#> [4] "visit_detail" "condition_occurrence" "drug_exposure"
#> [7] "procedure_occurrence" "device_exposure" "measurement"
#> [10] "observation" "death" "note"
#> [13] "note_nlp" "specimen" "fact_relationship"
#> [16] "location" "care_site" "provider"
#> [19] "payer_plan_period" "cost" "drug_era"
#> [22] "dose_era" "condition_era" "metadata"
#> [25] "cdm_source" "concept" "vocabulary"
#> [28] "domain" "concept_class" "concept_relationship"
#> [31] "relationship" "concept_synonym" "concept_ancestor"
#> [34] "source_to_concept_map" "drug_strength" "cohort_definition"
#> [37] "attribute_definition" "concept_recommended"The standard OMOP tables have required fields. We can check the required columns of the person table, for example, like so:
omopColumns(table = "person", version = "5.3")
#> [1] "person_id" "gender_concept_id"
#> [3] "year_of_birth" "month_of_birth"
#> [5] "day_of_birth" "birth_datetime"
#> [7] "race_concept_id" "ethnicity_concept_id"
#> [9] "location_id" "provider_id"
#> [11] "care_site_id" "person_source_value"
#> [13] "gender_source_value" "gender_source_concept_id"
#> [15] "race_source_value" "race_source_concept_id"
#> [17] "ethnicity_source_value" "ethnicity_source_concept_id"Studies using the OMOP CDM often create study-specific cohort tables.
We also consider these part of the cdm reference once created. Each
cohort table is associated with a specific class of its own, a
generatedCohortSet, which is described more in a subsequent
vignette. As with the standard OMOP CDM tables, cohort tables are
expected to contain a specific set of fields, and they may include
additional fields.
cohortColumns(table = "cohort", version = "5.3")
#> [1] "cohort_definition_id" "subject_id" "cohort_start_date"
#> [4] "cohort_end_date"
cohortColumns(table = "cohort_set", version = "5.3")
#> [1] "cohort_definition_id" "cohort_name"
cohortColumns(table = "cohort_attrition", version = "5.3")
#> [1] "cohort_definition_id" "number_records" "number_subjects"
#> [4] "reason_id" "reason" "excluded_records"
#> [7] "excluded_subjects"The Achilles R package provides descriptive statistics on an OMOP CDM database. The results from Achilles are stored in tables in the database. The following tables are created with the given columns.
achillesTables()
#> [1] "achilles_analysis" "achilles_results" "achilles_results_dist"
achillesColumns("achilles_analysis")
#> [1] "analysis_id" "analysis_name" "stratum_1_name" "stratum_2_name"
#> [5] "stratum_3_name" "stratum_4_name" "stratum_5_name" "is_default"
#> [9] "category"
achillesColumns("achilles_results")
#> [1] "analysis_id" "stratum_1" "stratum_2" "stratum_3" "stratum_4"
#> [6] "stratum_5" "count_value"
achillesColumns("achilles_results_dist")
#> [1] "analysis_id" "stratum_1" "stratum_2" "stratum_3" "stratum_4"
#> [6] "stratum_5" "count_value" "min_value" "max_value" "avg_value"
#> [11] "stdev_value" "median_value" "p10_value" "p25_value" "p75_value"
#> [16] "p90_value"Beyond the standard OMOP CDM tables and cohort tables, additional tables can be added to the cdm reference. These tables could, for example, be OMOP extension or expansion tables, or extra tables containing data required to perform a study but not normally included as part of the OMOP CDM. These tables could contain any set of fields.
Any table that is part of a cdm object has to satisfy the following conditions:
All tables must share a common source (that is, a mix of tables in the database and in-memory is not permitted).
Table names must be lowercase snake_case.
Column names in each table must be lowercase snake_case.
The person and observation_period
tables must be present.
The cdm reference must have a “cdmName” attribute that gives the name associated with the data contained within it.
When the export method is applied to a cdm reference, metadata about that cdm will be written to a CSV file. The CSV file contains the following columns:
| Variable | Description | Datatype | Required |
|---|---|---|---|
| result_type | Always “Snapshot”. Identifies this result as a summary of a cdm reference. | Character | Yes |
| cdm_name | The name of the data source. | Character | Yes |
| cdm_source_name | Value of cdm source name taken from the cdm source table (if present in the cdm reference). | Character | No |
| cdm_description | Value of cdm description taken from the cdm source table (if present in the cdm reference). | Character | No |
| cdm_documentation_reference | Value of cdm documentation reference taken from the cdm source table (if present in the cdm reference). | Character | No |
| cdm_version | The cdm version associated with the cdm reference. | Character | Yes |
| cdm_holder | Value of cdm holder reference taken from the cdm source table (if present in the cdm reference). | Character | No |
| cdm_release_date | Value of cdm release date taken from the cdm source table (if present in the cdm reference). | Date | No |
| vocabulary_version | Version of the vocabulary being used taken from the concept table (if present in the cdm reference). | Character | No |
| person_count | Number of records in the person table. | Integer | Yes |
| observation_period_count | Number of records in the observation period table. | Integer | Yes |
| earliest_observation_period_start_date | Earliest date in the observation period start date field from the observation period table. | Date | Yes |
| latest_observation_period_end_date | Latest date in the observation period start date field from the observation period table. | Date | Yes |
| snapshot_date | Date at which this snapshot was created. | Date | Yes |