A summarised result is a table that contains aggregated summary statistics (a result set with no patient-level data). The summarised result object consists of two objects: a results table and a settings table.
This table has 13 columns:
result_id (1), used to identify a group of results with
common settings (see settings below).cdm_name (2), used to identify the name of the cdm
object used to obtain those results.group_name (3) - group_level (4), these
columns work together as a name-level pair. A
name-level pair is two columns that work together to summarise
information from multiple other columns. The name column
contains the column names separated by &&& and
the level column contains the column values separated by
&&&. Elements in the name column must
be snake_case. Usually, group aggregation is used to show high-level
aggregations: e.g. cohort name or codelist name.strata_name (5) - strata_level (6), these
columns work together as a name-level pair. Usually strata
aggregation is used to show stratifications of the results: e.g. age
groups or sex.variable_name (7), name of the variable of
interest.variable_level (8), level of the variable of interest,
it is usually a subclass of the variable_name.estimate_name (9), name of the estimate.estimate_type (10), type of the value displayed, the
supported types are: numeric, integer, date, character, proportion,
percentage, logical.estimate_value (11), value of interest.additional_name (12) - additional_level
(13), these columns work together as a name-level pair. Usually
additional aggregation is used to include the aggregations that did not
fit in the group/strata definition.The following table summarises the requirements of each column in the summarised_result format:
| Column name | Column type | is NA allowed? | Requirements |
|---|---|---|---|
| result_id | integer | No | NA |
| cdm_name | character | No | NA |
| group_name | character | No | name1 |
| group_level | character | No | level1 |
| strata_name | character | No | name2 |
| strata_level | character | No | level2 |
| variable_name | character | No | NA |
| variable_level | character | Yes | NA |
| estimate_name | character | No | snake_case |
| estimate_type | character | No | estimateTypeChoices() |
| estimate_value | character | No | NA |
| additional_name | character | No | name3 |
| additional_level | character | No | level3 |
The settings table provides one row per result_id with
the settings used to generate those results. There is no limit on the
number of columns or parameters that can be provided per result_id, but
at least three values should be provided:
result_type (1): identifies the type of result
provided. We would usually use the name of the function that generated
that set of results in snake_case. For example, if the function that
generates the summarised result is named summariseMyCustomData,
then the result_type would be: summarise_my_custom_data.package_name (2): name of the package that generated
the result type.package_version (3): version of the package that
generated the result type.These columns must be character vectors, but this restriction does not apply to other extra columns.
The newSummarisedResult() function can be used to create
library(omopgenerics)
library(dplyr)
x <- tibble(
result_id = 1L,
cdm_name = "my_cdm",
group_name = "cohort_name",
group_level = "cohort1",
strata_name = "sex",
strata_level = "male",
variable_name = "Age group",
variable_level = "10 to 50",
estimate_name = "count",
estimate_type = "numeric",
estimate_value = "5",
additional_name = "overall",
additional_level = "overall"
)
result <- newSummarisedResult(x)
result |>
glimpse()
#> Rows: 1
#> Columns: 13
#> $ result_id <int> 1
#> $ cdm_name <chr> "my_cdm"
#> $ group_name <chr> "cohort_name"
#> $ group_level <chr> "cohort1"
#> $ strata_name <chr> "sex"
#> $ strata_level <chr> "male"
#> $ variable_name <chr> "Age group"
#> $ variable_level <chr> "10 to 50"
#> $ estimate_name <chr> "count"
#> $ estimate_type <chr> "numeric"
#> $ estimate_value <chr> "5"
#> $ additional_name <chr> "overall"
#> $ additional_level <chr> "overall"
settings(result)
#> # A tibble: 1 × 8
#> result_id result_type package_name package_version group strata additional
#> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 "" "" "" cohort_n… sex ""
#> # ℹ 1 more variable: min_cell_count <chr>We can also associate settings with our results. These will typically be used to explain how the result was created.
result <- newSummarisedResult(
x = x,
settings = tibble(
result_id = 1L,
package_name = "PatientProfiles",
study = "my_characterisation_study"
)
)
result |> glimpse()
#> Rows: 1
#> Columns: 13
#> $ result_id <int> 1
#> $ cdm_name <chr> "my_cdm"
#> $ group_name <chr> "cohort_name"
#> $ group_level <chr> "cohort1"
#> $ strata_name <chr> "sex"
#> $ strata_level <chr> "male"
#> $ variable_name <chr> "Age group"
#> $ variable_level <chr> "10 to 50"
#> $ estimate_name <chr> "count"
#> $ estimate_type <chr> "numeric"
#> $ estimate_value <chr> "5"
#> $ additional_name <chr> "overall"
#> $ additional_level <chr> "overall"
settings(result)
#> # A tibble: 1 × 9
#> result_id result_type package_name package_version group strata additional
#> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 "" PatientProfiles "" cohor… sex ""
#> # ℹ 2 more variables: min_cell_count <chr>, study <chr>Multiple summarised result objects can be combined using the bind function. Result IDs will be assigned for each set of results with the same settings. If two groups of results have the same settings, although they are in different objects, they will be merged into a single one.
result1 <- newSummarisedResult(
x = tibble(
result_id = 1L,
cdm_name = "my_cdm",
group_name = "cohort_name",
group_level = "cohort1",
strata_name = "sex",
strata_level = "male",
variable_name = "Age group",
variable_level = "10 to 50",
estimate_name = "count",
estimate_type = "numeric",
estimate_value = "5",
additional_name = "overall",
additional_level = "overall"
),
settings = tibble(
result_id = 1L,
package_name = "PatientProfiles",
package_version = "1.0.0",
study = "my_characterisation_study",
result_type = "stratified_by_age_group"
)
)
result2 <- newSummarisedResult(
x = tibble(
result_id = 1L,
cdm_name = "my_cdm",
group_name = "overall",
group_level = "overall",
strata_name = "overall",
strata_level = "overall",
variable_name = "overall",
variable_level = "overall",
estimate_name = "count",
estimate_type = "numeric",
estimate_value = "55",
additional_name = "overall",
additional_level = "overall"
),
settings = tibble(
result_id = 1L,
package_name = "PatientProfiles",
package_version = "1.0.0",
study = "my_characterisation_study",
result_type = "overall_analysis"
)
)Now that we have our results, we can combine them using bind. Because the two sets of results contain the same result ID, this will be automatically updated when the results are combined.
result <- bind(result1, result2)
result |>
dplyr::glimpse()
#> Rows: 2
#> Columns: 13
#> $ result_id <int> 1, 2
#> $ cdm_name <chr> "my_cdm", "my_cdm"
#> $ group_name <chr> "cohort_name", "overall"
#> $ group_level <chr> "cohort1", "overall"
#> $ strata_name <chr> "sex", "overall"
#> $ strata_level <chr> "male", "overall"
#> $ variable_name <chr> "Age group", "overall"
#> $ variable_level <chr> "10 to 50", "overall"
#> $ estimate_name <chr> "count", "count"
#> $ estimate_type <chr> "numeric", "numeric"
#> $ estimate_value <chr> "5", "55"
#> $ additional_name <chr> "overall", "overall"
#> $ additional_level <chr> "overall", "overall"
settings(result)
#> # A tibble: 2 × 9
#> result_id result_type package_name package_version group strata additional
#> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 stratified_by_… PatientProf… 1.0.0 "coh… "sex" ""
#> 2 2 overall_analys… PatientProf… 1.0.0 "" "" ""
#> # ℹ 2 more variables: min_cell_count <chr>, study <chr>We have an entire vignette explaining how the summarised_result
object is suppressed: vignette("suppression", "omopgenerics").
The summarised_result object can be exported and imported as a CSV file with the following functions:
importSummarisedResult()
exportSummarisedResult()
Note that exportSummarisedResult() also suppresses the
results.
x <- tempdir()
files <- list.files(x)
exportSummarisedResult(result, path = x, fileName = "result.csv")
setdiff(list.files(x), files)
#> [1] "result.csv"Note that the settings are included in the CSV file:
#> "result_id","cdm_name","group_name","group_level","strata_name","strata_level","variable_name","variable_level","estimate_name","estimate_type","estimate_value","additional_name","additional_level" "1","my_cdm","cohort_name","cohort1","sex","male","Age group","10 to 50","count","numeric","5","overall","overall" "2","my_cdm","overall","overall","overall","overall","overall","overall","count","numeric","55","overall","overall" "3","my_cdm","overall","overall","log_id","1","Log file created",NA,"date_time","character","2026-06-17 14:08:11","overall","overall" "3","my_cdm","overall","overall","log_id","1","Log file created",NA,"elapsed_time","integer","0","overall","overall" "3","my_cdm","overall","overall","log_id","2","Defining toy data",NA,"date_time","character","2026-06-17 14:08:11","overall","overall" "3","my_cdm","overall","overall","log_id","2","Defining toy data",NA,"elapsed_time","integer","0","overall","overall" "3","my_cdm","overall","overall","log_id","3","Summarise toy data",NA,"date_time","character","2026-06-17 14:08:11","overall","overall" "3","my_cdm","overall","overall","log_id","3","Summarise toy data",NA,"elapsed_time","integer","0","overall","overall" "3","my_cdm","overall","overall","log_id","4","Exporting log file",NA,"date_time","character","2026-06-17 14:08:11","overall","overall" "3","my_cdm","overall","overall","log_id","4","Exporting log file",NA,"elapsed_time","integer","2","overall","overall" "3","my_cdm","overall","overall","log_id","5","Exporting log file",NA,"date_time","character","2026-06-17 14:08:13","overall","overall" "1",NA,"overall","overall","overall","overall","settings",NA,"result_type","character","stratified_by_age_group","overall","overall" "1",NA,"overall","overall","overall","overall","settings",NA,"package_name","character","PatientProfiles","overall","overall" "1",NA,"overall","overall","overall","overall","settings",NA,"package_version","character","1.0.0","overall","overall" "1",NA,"overall","overall","overall","overall","settings",NA,"group","character","cohort_name","overall","overall" "1",NA,"overall","overall","overall","overall","settings",NA,"strata","character","sex","overall","overall" "1",NA,"overall","overall","overall","overall","settings",NA,"additional","character","","overall","overall" "1",NA,"overall","overall","overall","overall","settings",NA,"min_cell_count","character","5","overall","overall" "1",NA,"overall","overall","overall","overall","settings",NA,"study","character","my_characterisation_study","overall","overall" "2",NA,"overall","overall","overall","overall","settings",NA,"result_type","character","overall_analysis","overall","overall" "2",NA,"overall","overall","overall","overall","settings",NA,"package_name","character","PatientProfiles","overall","overall" "2",NA,"overall","overall","overall","overall","settings",NA,"package_version","character","1.0.0","overall","overall" "2",NA,"overall","overall","overall","overall","settings",NA,"group","character","","overall","overall" "2",NA,"overall","overall","overall","overall","settings",NA,"strata","character","","overall","overall" "2",NA,"overall","overall","overall","overall","settings",NA,"additional","character","","overall","overall" "2",NA,"overall","overall","overall","overall","settings",NA,"min_cell_count","character","5","overall","overall" "2",NA,"overall","overall","overall","overall","settings",NA,"study","character","my_characterisation_study","overall","overall" "3",NA,"overall","overall","overall","overall","settings",NA,"result_type","character","summarise_log_file","overall","overall" "3",NA,"overall","overall","overall","overall","settings",NA,"package_name","character","omopgenerics","overall","overall" "3",NA,"overall","overall","overall","overall","settings",NA,"package_version","character","1.4.0","overall","overall" "3",NA,"overall","overall","overall","overall","settings",NA,"group","character","","overall","overall" "3",NA,"overall","overall","overall","overall","settings",NA,"strata","character","log_id","overall","overall" "3",NA,"overall","overall","overall","overall","settings",NA,"additional","character","","overall","overall" "3",NA,"overall","overall","overall","overall","settings",NA,"min_cell_count","character","5","overall","overall"
You can later import the results back with
importSummarisedResult():
res <- importSummarisedResult(path = file.path(x, "result.csv"))
class(res)
#> [1] "summarised_result" "omop_result" "tbl_df"
#> [4] "tbl" "data.frame"
res |>
glimpse()
#> Rows: 11
#> Columns: 13
#> $ result_id <int> 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3
#> $ cdm_name <chr> "my_cdm", "my_cdm", "my_cdm", "my_cdm", "my_cdm", "my…
#> $ group_name <chr> "cohort_name", "overall", "overall", "overall", "over…
#> $ group_level <chr> "cohort1", "overall", "overall", "overall", "overall"…
#> $ strata_name <chr> "sex", "overall", "log_id", "log_id", "log_id", "log_…
#> $ strata_level <chr> "male", "overall", "1", "1", "2", "2", "3", "3", "4",…
#> $ variable_name <chr> "Age group", "overall", "Log file created", "Log file…
#> $ variable_level <chr> "10 to 50", "overall", NA, NA, NA, NA, NA, NA, NA, NA…
#> $ estimate_name <chr> "count", "count", "date_time", "elapsed_time", "date_…
#> $ estimate_type <chr> "numeric", "numeric", "character", "integer", "charac…
#> $ estimate_value <chr> "5", "55", "2026-06-17 14:08:11", "0", "2026-06-17 14…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
res |>
settings()
#> # A tibble: 3 × 9
#> result_id result_type package_name package_version group strata additional
#> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 stratified_by_… PatientProf… 1.0.0 "coh… "sex" ""
#> 2 2 overall_analys… PatientProf… 1.0.0 "" "" ""
#> 3 3 summarise_log_… omopgenerics 1.4.0 "" "log_… ""
#> # ℹ 2 more variables: min_cell_count <chr>, study <chr><summarised_result>omopgenerics defines a tidy() method for
<summarised_result> objects. This function:
The <summarised_result> object has the following
pair columns: group_name-group_level, strata_name-strata_level, and
additional_name-additional_level. These pairs use the
&&& separator to combine multiple fields. For
example, if you want to combine cohort_name and age_group in the
group_name-group_level pair:
group_name = "cohort_name &&& age_group" and
group_level = "my_cohort &&& <40". By
default, if no aggregation is produced in the group_name-group_level
pair: group_name = "overall" and
group_level = "overall".
ORIGINAL FORMAT:
| group_name | group_level |
|---|---|
| cohort_name | acetaminophen |
| cohort_name &&& sex | acetaminophen &&& Female |
| sex &&& age_group | Male &&& <40 |
The tidy format puts each value into its own column. This makes it
easier to manipulate, but the output is no longer standardised, as each
<summarised_result> object will have a different
number and set of column names. Missing values will be filled with the
“overall” label.
TIDY FORMAT:
| cohort_name | sex | age_group |
|---|---|---|
| acetaminophen | overall | overall |
| acetaminophen | Female | overall |
| overall | Male | <40 |
<summarised_result> object
as columns:Each <summarised_result> object has a settings
attribute that relates the ‘result_id’ column to each different set of
settings. The columns ‘result_type’, ‘package_name’ and
‘package_version’ are always present in settings, but we may also have
extra parameters depending on how the object was created. In the
<summarised_result> format, we need to use the
settings() function to see those variables:
ORIGINAL FORMAT:
settings:
| result_id | my_setting | package_name |
|---|---|---|
| 1 | TRUE | omopgenerics |
| 2 | FALSE | omopgenerics |
<summarised_result>:
| result_id | cdm_name | additional_name | |
|---|---|---|---|
| 1 | omop | ... | overall |
| ... | ... | ... | ... |
| 2 | omop | ... | overall |
| ... | ... | ... | ... |
In the tidy format, we add the settings as columns, so their values
are repeated multiple times (there is only one row per result_id in
settings, whereas there can be multiple rows in the
<summarised_result> object). The column ‘result_id’
is removed because it no longer provides information. Again, we lose
standardisation (multiple different settings), but we gain
flexibility:
TIDY FORMAT:
| cdm_name | additional_name | my_setting | package_name | |
|---|---|---|---|---|
| omop | ... | overall | TRUE | omopgenerics |
| ... | ... | ... | ... | ... |
| omop | ... | overall | FALSE | omopgenerics |
| ... | ... | ... | ... | ... |
In the <summarised_result> format estimates are
displayed in 3 columns:
<character>.ORIGINAL FORMAT:
| variable_name | estimate_name | estimate_type | estimate_value |
|---|---|---|---|
| number individuals | count | integer | 100 |
| age | mean | numeric | 50.3 |
| age | sd | numeric | 20.7 |
In the tidy format, we pivot the estimates, creating a new column for each ‘estimate_name’ value. The columns will be cast to ‘estimate_type’. If there are multiple estimate_type values for the same estimate_name, they will not be cast and will be displayed as character values (a warning will be thrown). Missing data are populated with NAs.
TIDY FORMAT:
| variable_name | count | mean | sd |
|---|---|---|---|
| number individuals | 100 | NA | NA |
| age | NA | 50.3 | 20.7 |
Let’s see a simple example with some toy data:
The split functions are provided independently:
splitGroup() only splits the pair
group_name-group_level columns.splitStrata() only splits the pair
strata_name-strata_level columns.splitAdditional() only splits the pair
additional_name-additional_level columns.There is also the function: - splitAll() that splits any
x_name-x_level pair found in the data.
splitAll(result)
#> # A tibble: 2 × 9
#> result_id cdm_name cohort_name sex variable_name variable_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 my_cdm cohort1 male Age group 10 to 50
#> 2 2 my_cdm overall overall overall overall
#> # ℹ 3 more variables: estimate_name <chr>, estimate_type <chr>,
#> # estimate_value <chr>pivotEstimates() can be used to pivot the variables that
we are interested in.
The argument pivotEstimatesBy specifies which variables
we want to use to pivot by. There are four options:
NULL/character() to not pivot anything.c("estimate_name") to pivot only estimate_name.c("variable_level", "estimate_name") to pivot
estimate_name and variable_level.c("variable_name", "variable_level", "estimate_name")
to pivot estimate_name, variable_level and variable_name.Note that variable_level can contain NA values, these
will be ignored on the naming part.
pivotEstimates(
result,
pivotEstimatesBy = c("variable_name", "variable_level", "estimate_name")
)
#> # A tibble: 2 × 10
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 my_cdm cohort_name cohort1 sex male
#> 2 2 my_cdm overall overall overall overall
#> # ℹ 4 more variables: additional_name <chr>, additional_level <chr>,
#> # `Age group_10 to 50_count` <dbl>, overall_overall_count <dbl>addSettings() is used to add the settings that we want
as new columns to our <summarised_result> object.
The settingsColumn argument is used to choose which
settings we want to add.
addSettings(
result,
settingsColumn = "result_type"
)
#> # A tibble: 2 × 14
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 my_cdm cohort_name cohort1 sex male
#> 2 2 my_cdm overall overall overall overall
#> # ℹ 8 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>, result_type <chr>Dealing with an <summarised_result> object can be
difficult, especially when we are trying to filter. For example, it can
be difficult to filter to a certain result_type or, when many strata are
joined together, to filter only one of the variables. The
tidy format makes filtering easier, but using it means
losing the <summarised_result> object.
The omopgenerics package contains functions that help with this process:
filterSettings to filter the
<summarised_result> object using the
settings() attribute.filterGroup to filter the
<summarised_result> object using the
group_name-group_level tidy columns.filterStrata to filter the
<summarised_result> object using the
strata_name-strata_level tidy columns.filterAdditional to filter the
<summarised_result> object using the
additional_name-additional_level tidy columns.For instance, let’s filter result so it only has results
for males:
result |>
filterStrata(sex == "male")
#> # A tibble: 1 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 my_cdm cohort_name cohort1 sex male
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>Now let’s see an example using the information in settings to filter the result. In this case, we only want results from the “overall_analysis”. Since this information is in the result_type column in settings, we proceed as follows:
result |>
filterSettings(result_type == "overall_analysis")
#> # A tibble: 1 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 2 my_cdm overall overall overall overall
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr><summarised_result>Working with <summarised_result> objects often
involves managing columns for settings,
grouping, strata, and
additional levels. These retrieval functions help you
identify and manage columns:
settingsColumns() gives you the setting names that are
available in a <summarised_result> object.groupColumns() gives you the new columns that will be
generated when splitting group_name-group_level pair into different
columns.strataColumns() gives you the new columns that will be
generated when splitting strata_name-strata_level pair into different
columns.additionalColumns() gives you the new columns that will
be generated when splitting additional_name-additional_level pair into
different columns.tidyColumns() gives you the columns that the object
will have if you tidy it (tidy(result)). This function is
very useful for knowing which columns can be included in
plot and table functions.Let’s see the different values with our example result data:
The unite functions serve as the complementary tools to the split
functions, allowing you to generate name-level pair columns from
targeted columns within a <dataframe>.
There are three unite functions that allow you to create
group, strata, and additional name-level columns from specified sets of
columns:
uniteAdditional()
uniteGroup()
uniteStrata()
For example, to create group_name and group_level columns from a tibble, you can use:
# Create and show mock data
data <- tibble(
denominator_cohort_name = c("general_population", "older_than_60", "younger_than_60"),
outcome_cohort_name = c("stroke", "stroke", "stroke")
)
head(data)
#> # A tibble: 3 × 2
#> denominator_cohort_name outcome_cohort_name
#> <chr> <chr>
#> 1 general_population stroke
#> 2 older_than_60 stroke
#> 3 younger_than_60 stroke
# Unite into group name-level columns
data |>
uniteGroup(cols = c("denominator_cohort_name", "outcome_cohort_name"))
#> # A tibble: 3 × 2
#> group_name group_level
#> <chr> <chr>
#> 1 denominator_cohort_name &&& outcome_cohort_name general_population &&& stroke
#> 2 denominator_cohort_name &&& outcome_cohort_name older_than_60 &&& stroke
#> 3 denominator_cohort_name &&& outcome_cohort_name younger_than_60 &&& strokeThese functions can be helpful when creating your own
<summarised_result>.