When developing search strategies for evidence synthesis, it is standard practice to test different versions of a search against a set of known relevant studies, called benchmark studies. In this way, the right balance between precision and sensitivity can be achieved prior to screening.
Until now, this within-database testing has been the primary method of pre-screening search validation. With CiteSource, we can test search strategies across databases to assess the usefulness of certain databases before finalizing our database set. This vignette provides a workflow for testing a search strategy across multiple databases and against a set of benchmark studies.
In this example, we are running a search about loneliness and gambling addiction. We developed a search strategy for PsycInfo, our main database, and want to see if searching Web of Science and PubMed adds useful records and helps us find more of our benchmark studies.
Here we import three database searches and a set of benchmark
studies. The benchmark file is assigned cite_source = NA
since it does not represent a database search, and
cite_label = "benchmark" to identify it as the reference
set.
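Positional vectors such as cite_sources and cite_labels have to follow the order in which list.files() returns the paths, and that order depends on sorting rules and locale. A minimal base R sketch of one way to avoid counting positions by hand is to derive both vectors from the file names. It assumes each file name starts with the database name and that the benchmark file is called benchmark.ris; the `stems` variable is purely illustrative.

```r
# File paths as returned by list.files() in this example
files <- c("valid_data/WoS_79.ris", "valid_data/benchmark.ris",
           "valid_data/psycinfo_64.ris", "valid_data/pubmed_46.ris")

# Lower-cased file stem before the first "_" or ".",
# e.g. "WoS_79.ris" becomes "wos" (assumes this naming pattern)
stems <- tolower(sub("[_.].*$", "", basename(files)))

# The benchmark file gets no source and its own label; all others are searches
cite_sources <- ifelse(stems == "benchmark", NA_character_, stems)
cite_labels  <- ifelse(stems == "benchmark", "benchmark", "search")

# Inspect the mapping before passing the vectors to the import step
data.frame(file = basename(files),
           cite_source = cite_sources,
           cite_label = cite_labels)
```

Checking the printed mapping once is cheaper than discovering mislabelled sources after deduplication.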
citation_files <- list.files(path = "valid_data", pattern = "\\.ris", full.names = TRUE)
citation_files
#> [1] "valid_data/WoS_79.ris" "valid_data/benchmark.ris"
#> [3] "valid_data/psycinfo_64.ris" "valid_data/pubmed_46.ris"
# cite_sources and cite_labels are matched to the files by position,
# so they must follow the order printed above
citations <- read_citations(citation_files,
                            cite_sources = c("wos", NA, "psycinfo", "pubmed"),
                            cite_labels = c("search", "benchmark", "search", "search"),
                            tag_naming = "best_guess")
#> Note: the following cite_label value(s) are not in the standard vocabulary (search / screened / final): benchmark. Phase-analysis functions expect these exact labels.
#> Import completed - with the following details:
#>              file cite_source cite_string cite_label citations
#> 1      WoS_79.ris         wos        <NA>     search        79
#> 2   benchmark.ris        <NA>        <NA>  benchmark        13
#> 3 psycinfo_64.ris    psycinfo        <NA>     search        64
#> 4   pubmed_46.ris      pubmed        <NA>     search        46

CiteSource merges duplicate records while preserving the
cite_source and cite_label metadata fields, so
the origin of each record is retained through deduplication.
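The idea of merging duplicates while keeping their provenance can be illustrated with a toy example in base R. This is only a sketch on made-up data that matches records on a single `doi` column; it is not CiteSource's deduplication routine, which presumably compares more fields than one identifier.

```r
# Made-up records: the same DOI retrieved from several sources
records <- data.frame(
  doi         = c("10.1/a", "10.1/a", "10.1/b", "10.1/c", "10.1/c"),
  cite_source = c("psycinfo", "wos", "pubmed", "wos", "psycinfo"),
  stringsAsFactors = FALSE
)

# Collapse duplicates to one row per DOI and keep every source that found it
merged <- aggregate(
  cite_source ~ doi,
  data = records,
  FUN = function(x) paste(sort(unique(x)), collapse = ", ")
)

merged
```

Each merged row still lists all databases that returned the record, which is what makes the overlap analyses in the following steps possible.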
A heatmap shows the total number of records from each database and the count of overlapping records for each pair. Web of Science yielded the highest number of records on gambling addiction and loneliness; PubMed the least.
The percentage heatmap shows what share of each row’s records were also found in each column. Here, 55% of Web of Science records were also found in PsycInfo, while 44% of PsycInfo records were found in Web of Science.
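The numbers behind both heatmaps can be reproduced by hand from a record-by-source membership matrix. The sketch below uses made-up data in base R and is not the package's plotting code; `found`, `overlap_n`, and `overlap_pct` are illustrative names.

```r
# Made-up membership matrix: one row per deduplicated record,
# one column per source, 1 = the source retrieved the record
found <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    1, 1, 1,
    0, 1, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("psycinfo", "wos", "pubmed"))
)

# Pairwise overlap counts; the diagonal holds each source's total
overlap_n <- crossprod(found)

# Row-wise percentages: share of the row source's records
# that were also found in the column source
overlap_pct <- round(100 * overlap_n / diag(overlap_n), 1)

overlap_n
overlap_pct
```

The percentage matrix is not symmetric because each row is divided by its own source total, which is why the two directions of a pair can differ, as in the PsycInfo and Web of Science figures above.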
An upset plot provides more detail about shared and unique records across all source combinations. Web of Science had the most unique records not found in any other database (n=29); PubMed had only four unique records. Twenty-four records were found in every database.
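An upset plot counts records per exact combination of sources rather than per pair. A minimal base R sketch of that tally, again on made-up data and with illustrative names (`found`, `combo`, `combo_counts`), might look like this:

```r
# Made-up membership matrix: 1 = the source retrieved the record
found <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    1, 1, 1,
    0, 1, 1,
    0, 1, 0,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("psycinfo", "wos", "pubmed"))
)

# Label each record with the exact set of sources that retrieved it
combo <- apply(found, 1, function(row) {
  paste(colnames(found)[row == 1], collapse = " & ")
})

# One count per combination: these are the bar heights of an upset plot
combo_counts <- table(combo)
combo_counts
```

Combinations consisting of a single source correspond to the unique records of that source.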
To examine which records are exclusive to each database, filter
n_unique for unique == TRUE and rejoin with
unique_citations to recover full bibliographic data.
unique_psycinfo <- n_unique |>
  dplyr::filter(cite_source == "psycinfo", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")
unique_pubmed <- n_unique |>
  dplyr::filter(cite_source == "pubmed", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")
unique_wos <- n_unique |>
  dplyr::filter(cite_source == "wos", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")
# To export for manual review:
# export_csv(unique_pubmed, "pubmed_unique.csv")

Filtering unique_citations to only the benchmark records
and passing to record_level_table() shows which databases
contained each benchmark study.
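What such a record-level view contains can be sketched on toy data in base R. The example assumes that, after deduplication, cite_source and cite_label hold comma-separated values for merged records; the data frame and the names `unique_toy`, `benchmark_toy`, and `coverage` are made up for illustration.

```r
# Made-up deduplicated records (assumed comma-separated metadata fields)
unique_toy <- data.frame(
  title       = c("Study A", "Study B", "Study C", "Study D"),
  cite_source = c("psycinfo, wos", "wos", "pubmed, psycinfo", ""),
  cite_label  = c("benchmark, search", "search", "benchmark, search", "benchmark"),
  stringsAsFactors = FALSE
)

# Keep only records that belong to the benchmark set
benchmark_toy <- unique_toy[grepl("benchmark", unique_toy$cite_label), ]

# For each benchmark record, flag which database searches retrieved it
sources <- c("psycinfo", "pubmed", "wos")
coverage <- sapply(sources, function(s) {
  grepl(s, benchmark_toy$cite_source, fixed = TRUE)
})
rownames(coverage) <- benchmark_toy$title

coverage
```

A benchmark record with no TRUE in its row, like Study D here, was missed by every search and points to a gap in the search strategy.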
citation_summary_table() calculates sensitivity and
precision scores for each database against the benchmark set, providing
a concise overview of each source’s performance before screening
begins.
| Sources | Records (total) | Records (unique) | Contribution (unique) | Sensitivity | Precision |
|---|---|---|---|---|---|
| search | | | | | |
| pubmed | 64 | 33 | 41.77% | 71.11% | — |
| wos | 46 | 18 | 22.78% | 51.11% | — |
| psycinfo | 13 | 7 | 8.86% | 14.44% | — |
| Total¹ | 90 | 58 | 64.44% | — | — |
| benchmark | | | | | |
| wos | 39 | 14 | 17.72% | 49.37% | 84.78% |
| pubmed | 35 | 9 | 11.39% | 44.30% | 54.69% |
| NA | 27 | 27 | 34.18% | 34.18% | — |
| psycinfo | 6 | 2 | 2.53% | 7.59% | 46.15% |
| Total¹ | 79 | 52 | 65.82% | — | 87.78% |

¹ After deduplication
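For orientation, sensitivity and precision against a benchmark set can be computed by hand. The sketch below uses the common definitions (sensitivity as the share of benchmark studies a source retrieved, precision as the share of a source's records that are benchmark studies) and made-up counts for two hypothetical databases; the exact denominators used by citation_summary_table() may differ.

```r
# Made-up counts for illustration only
n_benchmark <- 20                       # size of the benchmark set
retrieved   <- c(db_a = 80, db_b = 50)  # records retrieved per hypothetical database
hits        <- c(db_a = 16, db_b = 5)   # benchmark studies among those records

# Sensitivity: share of all benchmark studies that the source retrieved
sensitivity <- hits / n_benchmark

# Precision: share of the source's records that are benchmark studies
precision <- hits / retrieved

# Percentages per database
round(100 * cbind(sensitivity, precision), 1)
```

A source with high sensitivity but low precision finds most benchmark studies at the cost of many extra records to screen, which is the trade-off the summary table is meant to expose.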
CiteSource can export deduplicated results as CSV, RIS, or BibTeX files, and reimport them to resume analysis later.
#export_csv(unique_citations, filename = "unique-by-source.csv", separate = "cite_source")
#export_ris(unique_citations, filename = "unique_citations.ris", source_field = "DB", label_field = "N1")
#export_bib(unique_citations, filename = "unique_citations.bib", include = c("sources", "labels", "strings"))
#reimport_csv("unique-by-source.csv")

CiteSource can evaluate the usefulness of different databases against a set of benchmark studies before screening begins. In this example, both PsycInfo and Web of Science made unique contributions to the benchmark set and had a significant proportion of unique records. PubMed did not contribute any unique benchmark records and mostly overlapped with the other two databases, providing evidence that it may not be an effective addition for this topic.