| Title: | Prepare Data and Trees for Phylogenetic Comparative Methods |
| Version: | 1.0.0 |
| Description: | Reconcile species names across datasets and phylogenetic trees for comparative biology workflows. Identifies mismatches due to formatting differences, taxonomic synonymy, and spelling errors. Produces detailed reports documenting how each name was resolved, which taxonomic authority was used, and what remains unresolved. Supports exact matching, name normalisation, synonym resolution via local taxonomic databases, and fuzzy matching for likely typos. Detects taxonomic splits and lumps. For methodological context, see Nakagawa et al. (2026) <doi:10.32942/X2468Z>. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/itchyshin/prepR4pcm, https://itchyshin.github.io/prepR4pcm/ |
| BugReports: | https://github.com/itchyshin/prepR4pcm/issues |
| Date: | 2026-06-16 |
| Language: | en-GB |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.1.0) |
| Imports: | ape, cli, rlang, tibble |
| Suggests: | caper, clootl, digest, dplyr, fishtree, httr2, knitr, MCMCglmm, phytools, piggyback, pkgdown, readr, rgnparser, rmarkdown, rotl, rtrees, spelling, stringr, taxadb, testthat (≥ 3.0.0) |
| LazyData: | true |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| Config/roxygen2/version: | 8.0.0 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-20 18:45:02 UTC; z3437171 |
| Author: | Shinichi Nakagawa |
| Maintainer: | Shinichi Nakagawa <itchyshin@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-25 11:00:13 UTC |
prepR4pcm: Reconcile species names for phylogenetic comparative methods
Description
Species names in your dataset rarely match the tip labels of your
phylogenetic tree. Formatting differences (Homo_sapiens vs
Homo sapiens), taxonomic synonymy, splits and lumps (e.g. revisions
affecting Corvus brachyrhynchos), and simple spelling mistakes silently drop species from PGLS,
phylogenetic mixed models, and other phylogenetic comparative methods
(PCMs). prepR4pcm is a toolkit for ecologists and evolutionary
biologists to detect and resolve these mismatches, audit every decision,
and produce aligned data-tree pairs ready for downstream analysis.
Typical workflow
A minimal end-to-end pipeline looks like this:
library(prepR4pcm)  # example data (avonet_subset, tree_jetz) are lazy-loaded
# 1. Match your data frame to a tree
rec <- reconcile_tree(
  avonet_subset, tree_jetz,
  x_species = "Species1",
  fuzzy = TRUE  # enable typo correction
)
# 2. Review what matched, what is flagged, and what is unresolved
reconcile_summary(rec)
reconcile_plot(rec)
reconcile_suggest(rec)  # suggest near-misses for unresolved names
# 3. Correct any unresolved or flagged cases by hand
rec <- reconcile_override(rec,
  name_x = "Corvus brachyrhnchos",  # typo in data
  name_y = "Corvus_brachyrhynchos"
)
# 4. Produce an aligned dataset and pruned tree
aligned <- reconcile_apply(rec,
  data = avonet_subset, tree = tree_jetz,
  species_col = "Species1",
  drop_unresolved = TRUE
)
# 5. aligned$data and aligned$tree are ready for downstream PCM tools
Key concepts
- Reconciliation object
The central data structure. Contains a mapping tibble (one row per source name, with match type and score), a meta list (reproducibility provenance), a counts summary, an overrides log of applied manual corrections, and an unused_overrides audit trail of overrides that could not be applied (e.g. when name_y is missing from the target). Returned by all reconcile_* matching functions. Inspect with reconcile_summary(), extract the table with reconcile_mapping(), and act on it with reconcile_apply(), reconcile_merge(), or reconcile_export().
- Four-stage matching cascade
Names are resolved in this order, and the first stage that produces a match is recorded as match_type:
  - exact: verbatim string equality.
  - normalized: a match after replacing underscores with spaces, fixing case, stripping authority strings (e.g. the "Linnaeus 1758" in Corvus corax Linnaeus 1758), and folding diacritics.
  - synonym: a match via a local taxonomic database (see taxadb) such as Catalogue of Life or GBIF.
  - fuzzy: character-level similarity on the remaining unmatched names (opt-in via fuzzy = TRUE).
Any additional overrides or manual edits are applied on top as match_type = "manual".
- Provenance
Every decision is logged in the mapping table (match_type, match_score, match_source) and in meta (package version, timestamp, taxonomic authority, fuzzy threshold, etc.). Use reconcile_report() to produce a shareable HTML audit trail for supplementary materials or collaborators.
- Splits and lumps
Taxonomic revisions often split one species into several, or lump several into one.
reconcile_splits_lumps() flags these cases so you can decide how to handle them before analysis.
- Tree augmentation
When unresolved species have congeners in the tree,
reconcile_augment() can graft them in as sister taxa at genus level. This is an exploratory aid: always run sensitivity analyses with and without augmented tips.
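The first-match-wins cascade can be sketched in a few lines of base R. This is an illustrative sketch, not prepR4pcm's implementation: `normalise()` and `match_cascade()` are hypothetical helpers, the synonym stage is left out because it needs a taxonomic database, and the fuzzy stage here uses a plain edit-distance cut-off rather than the package's scored matcher.

```r
# Hypothetical helper: underscores -> spaces, collapse whitespace,
# keep the first two words, lower-case them, then capitalise the genus.
normalise <- function(x) {
  x <- trimws(gsub("\\s+", " ", gsub("_", " ", x, fixed = TRUE)))
  vapply(strsplit(x, " ", fixed = TRUE), function(w) {
    w <- tolower(head(w, 2))
    w[1] <- paste0(toupper(substring(w[1], 1, 1)), substring(w[1], 2))
    paste(w, collapse = " ")
  }, character(1))
}

# Hypothetical first-match-wins cascade (synonym stage omitted).
match_cascade <- function(names_x, names_y, fuzzy = FALSE, max_dist = 2) {
  norm_y <- normalise(names_y)
  out <- data.frame(name_x = names_x, name_y = NA_character_,
                    match_type = "unresolved")
  for (i in seq_along(names_x)) {
    x <- names_x[i]
    if (x %in% names_y) {                     # stage 1: exact
      out$name_y[i] <- x
      out$match_type[i] <- "exact"
      next
    }
    hit <- which(norm_y == normalise(x))       # stage 2: normalized
    if (length(hit) > 0) {
      out$name_y[i] <- names_y[hit[1]]
      out$match_type[i] <- "normalized"
      next
    }
    if (fuzzy) {                               # stage 4: fuzzy (opt-in)
      d <- utils::adist(normalise(x), norm_y)[1, ]
      if (min(d) <= max_dist) {
        out$name_y[i] <- names_y[which.min(d)]
        out$match_type[i] <- "fuzzy"
      }
    }
  }
  out
}

res <- match_cascade(
  names_x = c("Corvus corax", "corvus_corone", "Corvus brachyrhnchos", "Pica pica"),
  names_y = c("Corvus corax", "Corvus corone", "Corvus brachyrhynchos"),
  fuzzy = TRUE
)
res
```

Each name stops at the first stage that succeeds, which is why the recorded match_type is always the cheapest, most conservative stage that worked.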
Function families
- Match names
reconcile_tree(), reconcile_data(), reconcile_to_trees(), reconcile_trees(), reconcile_multi()
- Apply and export
- Inspect and audit
reconcile_summary(), reconcile_mapping(), reconcile_plot(), reconcile_suggest(), reconcile_diff(), reconcile_report(), reconcile_review()
- Corrections and crosswalks
reconcile_override(), reconcile_override_batch(), reconcile_crosswalk()
- Advanced
- Name utilities
Getting started
- vignette("getting-started", package = "prepR4pcm"): core concepts with a minimal worked example.
- vignette("bird-workflow", package = "prepR4pcm"): a realistic multi-dataset bird pipeline ending in PGLS and phylogenetic GLMM fits.
- vignette("db-assembly-workflow_mammals", package = "prepR4pcm"): assembling a mammal trait database from three sources (Amniote, PanTHERIA, TetrapodTraits), reconciling the unique species names against a mammal phylogeny, and producing a model-ready species-level data frame.
Author(s)
Maintainer: Shinichi Nakagawa itchyshin@gmail.com (ORCID) [copyright holder]
Authors:
Shinichi Nakagawa itchyshin@gmail.com (ORCID) [copyright holder]
Santiago Ortega
Ayumi Mizuno
Eduardo S.A. Santos
Malgorzata Lagisz (ORCID)
Bhavya Jain
Jimuel Jr Celeste
Sergio Poo Hernandez pooherna@ualberta.ca
References
Mizuno, A., Drobniak, S.M., Williams, C., Lagisz, M. & Nakagawa, S. (2025) Promoting the use of phylogenetic multinomial generalised mixed-effects model to understand the evolution of discrete traits. Journal of Evolutionary Biology 38:1699–1715. doi:10.1093/jeb/voaf116
Norman, K.E., Chamberlain, S. & Boettiger, C. (2020) taxadb: A high-performance local taxonomic database interface. Methods in Ecology and Evolution 11:1153–1159. doi:10.1111/2041-210X.13440
Paradis, E. & Schliep, K. (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35:526–528. doi:10.1093/bioinformatics/bty633
See Also
Useful links:
Report bugs at https://github.com/itchyshin/prepR4pcm/issues
Internal: delegate grafting to rtrees::get_tree(tree_by_user = TRUE)
Description
Internal: delegate grafting to rtrees::get_tree(tree_by_user = TRUE)
Usage
.pr_augment_rtrees(species_to_add, tree, taxon = NULL, quiet = FALSE, ...)
Arguments
| species_to_add | Character vector of binomials present in data but absent from the tree. |
| tree | The user's backbone phylo. |
| taxon | One of rtrees' supported taxa. |
| quiet | Logical. |
| ... | Forwarded to |
Value
A list with tree, augmented, skipped, backend_meta.
Internal: delegate grafting to U.PhyloMaker::phylo.maker()
Description
Universal (plants + animals) variant of the V.PhyloMaker grafting
strategy. Wraps U.PhyloMaker::phylo.maker() so the user can pick
a specific scenario.
Usage
.pr_augment_uphylomaker(
species_to_add,
tree,
gen.list = NULL,
scenario = "S3",
quiet = FALSE,
...
)
Arguments
| species_to_add | Character vector of binomials to graft. |
| tree | The user's backbone phylo. |
| gen.list | A data.frame mapping genus -> family. Required by U.PhyloMaker. If |
| scenario | Character. One of "S1", "S2", "S3". Default "S3". |
| quiet | Logical. |
| ... | Forwarded to |
Value
A list with tree, augmented, skipped, backend_meta.
References
Jin, Y. & Qian, H. (2023). U.PhyloMaker: an R package that can generate large phylogenetic trees for plants and animals. Plant Diversity 45(3): 347–352. doi:10.1016/j.pld.2022.12.007
Internal: delegate grafting to V.PhyloMaker2::phylo.maker()
Description
Plant-only alternative to the rtrees backend. Wraps
V.PhyloMaker2::phylo.maker() so the user can pick a specific
V.PhyloMaker scenario (S1 / S2 / S3, see Jin & Qian 2019).
Usage
.pr_augment_vphylomaker(
species_to_add,
tree,
scenarios = "S3",
quiet = FALSE,
...
)
Arguments
| species_to_add | Character vector of binomials to graft. |
| tree | The user's backbone phylo. |
| scenarios | Character. One of "S1", "S2", "S3" (default "S3"). Forwarded to |
| quiet | Logical. |
| ... | Forwarded to |
Value
A list with tree, augmented, skipped, backend_meta.
References
Jin, Y. & Qian, H. (2019). V.PhyloMaker: an R package that can generate very large phylogenies for vascular plants. Ecography 42(8): 1353–1359. doi:10.1111/ecog.04434
Jin, Y. & Qian, H. (2022). V.PhyloMaker2: an updated and enlarged R package that can generate very large phylogenies for vascular plants. Plant Diversity 44(4): 335–339. doi:10.1016/j.pld.2022.05.005
Look up names via the Global Names verifier (HTTP)
Description
Internal helper for pr_lookup_authority(authority = "gnverifier").
POSTs the input vector to the
Global Names verifier and maps
each bestResult back to the 5-column tibble contract used by the
taxadb path. On network failure it emits a single warning and returns
a tibble with every row marked not_found, mirroring the taxadb
branch's degradation behaviour so the calling cascade keeps running.
Usage
.pr_lookup_gnverifier(names, db_version = NULL)
Arguments
| names | Character vector of names to verify. |
| db_version | Ignored; emits a single warning if non-NULL. |
Value
A tibble with the same 5 columns as pr_lookup_authority().
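The degrade-rather-than-fail behaviour described above follows a common R pattern: wrap the network call in `tryCatch()`, warn once, and return a placeholder table with the same shape. The sketch is hypothetical throughout: `lookup_with_fallback()` and the fake `offline()` fetcher are invented for illustration, and the five column names are placeholders rather than the package's documented contract.

```r
# Placeholder 5-column result: every input marked not_found.
# Column names are illustrative only.
not_found_result <- function(names) {
  data.frame(input = names, accepted_name = NA_character_,
             status = "not_found", authority = "gnverifier",
             id = NA_character_)
}

lookup_with_fallback <- function(names, fetch) {
  tryCatch(
    fetch(names),
    error = function(e) {
      # One warning for the whole batch, not one per name
      warning("Name lookup failed (", conditionMessage(e), "); marking all ",
              length(names), " names as not_found.", call. = FALSE)
      not_found_result(names)
    }
  )
}

# A fake fetcher that always fails, standing in for an offline HTTP call
offline <- function(names) stop("could not resolve host")

# Count (and silence) warnings so the single-warning behaviour is visible
n_warnings <- 0L
res <- withCallingHandlers(
  lookup_with_fallback(c("Corvus corax", "Pica pica"), offline),
  warning = function(w) {
    n_warnings <<- n_warnings + 1L
    invokeRestart("muffleWarning")
  }
)
res
```

Returning a correctly shaped table instead of stopping lets the later matching stages (such as fuzzy matching) still run on the unresolved names.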
Look up names in a taxadb-backed authority
Description
Internal helper extracted from pr_lookup_authority() so the
taxadb path can sit alongside the gnverifier path without
duplicating the cache machinery in pr_lookup_authority().
Usage
.pr_lookup_taxadb(to_lookup, authority, db_version = NULL)
Arguments
| to_lookup | Character vector of names to look up. |
| authority | A length-1 character vector. Authority code. |
| db_version | A length-1 character vector or NULL. |
Value
A tibble with the same 5 columns as pr_lookup_authority().
Normalise scientific names via the gnparser backend
Description
Internal helper for pr_normalize_names(parser = "gnparser").
Routes parsing through rgnparser::gn_parse_tidy() (which wraps
the gnparser Go binary, part of the Global Names Architecture),
then applies the same rank and case-standardisation contract as
the internal cascade so the return value is interchangeable.
Usage
.pr_normalize_gnparser(names, rank)
Arguments
| names | Character vector of raw scientific names. |
| rank | One of |
Value
Character vector with normalisation_log attribute.
AVONET morphological trait data (subset)
Description
A subset of ~920 bird species from the AVONET database (BirdLife taxonomy), covering 12 passerine families within the Corvoidea and allied clades. Contains morphological measurements and ecological traits.
Usage
avonet_subset
Format
A data frame with ~920 rows and 16 columns:
- Species1
Scientific name (BirdLife taxonomy)
- Family1
Family
- Order1
Order
- Beak.Length_Culmen
Beak length from culmen (mm)
- Beak.Length_Nares
Beak length from nares (mm)
- Beak.Width
Beak width (mm)
- Beak.Depth
Beak depth (mm)
- Tarsus.Length
Tarsus length (mm)
- Wing.Length
Wing length (mm)
- Mass
Body mass (g)
- Habitat
Primary habitat code
- Habitat.Density
Habitat density code
- Migration
Migration status
- Trophic.Level
Trophic level
- Trophic.Niche
Trophic niche
- Primary.Lifestyle
Primary lifestyle
Source
Tobias et al. (2022) AVONET: morphological, ecological and geographical data for all birds. Ecology Letters 25:581–597. doi:10.1111/ele.13898
BirdLife-BirdTree taxonomy crosswalk
Description
A crosswalk mapping species names between BirdLife International taxonomy
and the BirdTree (Jetz et al. 2012) taxonomy. This is useful as a
pre-built override table for reconciling datasets that use BirdLife names
against phylogenies that use BirdTree names. See
reconcile_crosswalk() to convert this into an overrides table.
Usage
crosswalk_birdlife_birdtree
Format
A data frame with ~11,000 rows and 4 columns:
- Species1
Species name in BirdLife taxonomy
- Species3
Species name in BirdTree taxonomy
- Match.type
Type of match: "1BL to 1BT" (one-to-one), "Many BL to 1BT" (lump), "1BL to many BT" (split), "Extinct", "Newly described species", "Invalid taxon".
- Match.notes
Additional notes on the match
Source
The crosswalk is distributed as supporting information with the AVONET database release (Tobias et al. 2022). It maps two underlying taxonomies, both of which should be cited if you use the crosswalk in published work — see the references below.
References
Tobias, J.A. et al. (2022) AVONET: morphological, ecological and geographical data for all birds. Ecology Letters 25:581–597. doi:10.1111/ele.13898
Jetz, W., Thomas, G.H., Joy, J.B., Hartmann, K. & Mooers, A.O. (2012) The global diversity of birds in space and time. Nature 491:444–448. doi:10.1038/nature11631
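reconcile_crosswalk() performs this conversion for you. As a rough illustration of the idea only, the hypothetical base-R function below keeps one-to-one rows whose names actually differ as candidate overrides (columns `name_x` and `name_y`, echoing the argument names of reconcile_override()) and sets splits, lumps and other categories aside for manual review. The toy rows use made-up names, not entries from the real crosswalk, and the real function's filtering rules may differ.

```r
# Toy rows shaped like crosswalk_birdlife_birdtree (made-up names)
cw <- data.frame(
  Species1   = c("Aus bus", "Aus cus", "Aus dus", "Eus fus", "Gus hus"),
  Species3   = c("Aus bus", "Aus kus", "Aus dus", "Eus fus", NA),
  Match.type = c("1BL to 1BT", "1BL to 1BT", "Many BL to 1BT",
                 "1BL to many BT", "Extinct")
)

# Hypothetical conversion: one-to-one rows whose names differ become
# overrides; identical names need no override.
crosswalk_to_overrides <- function(cw) {
  keep <- cw$Match.type == "1BL to 1BT" & !is.na(cw$Species3) &
    cw$Species1 != cw$Species3
  data.frame(name_x = cw$Species1[keep], name_y = cw$Species3[keep])
}

# Everything that is not one-to-one is left for a human to decide
needs_review <- function(cw) cw[cw$Match.type != "1BL to 1BT", ]

ov <- crosswalk_to_overrides(cw)
ov                           # one row: "Aus cus" -> "Aus kus"
needs_review(cw)$Match.type
```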
Plumage lightness data (subset)
Description
A subset of ~650 passerine species from Delhey et al. (2019), with
plumage lightness measurements and climate variables. Covers species
from the same families as avonet_subset that have plumage data.
Note that species names use underscores (e.g.,
"Corvus_corax"), making this useful for demonstrating name
normalisation.
Usage
delhey_subset
Format
A data frame with columns:
- TipLabel
Species name with underscores (tree tip label format)
- family
Family name
- annual_mean_temperature
Annual mean temperature at range centroid
- annual_precipitation
Annual precipitation at range centroid
- lightness_male
Mean plumage lightness, males
- lightness_female
Mean plumage lightness, females
Source
Delhey et al. (2019) Reconciling ecogeographical rules: rainfall and temperature predict global colour variation in the largest bird radiation. Ecology Letters 22:726–736. doi:10.1111/ele.13233
Amniote-style mammal life-history sample
Description
A ~5,000-species sample of mammal life-history records, prepared to
mirror the structure of the Amniote Life-History Database. Used by
the db-assembly-workflow_mammals vignette to demonstrate
assembling trait data from multiple sources before reconciling
against a phylogenetic tree.
Usage
mammal_amniote_example
Format
A tibble with ~4,953 rows and 5 columns:
- name
Character. Scientific name (genus species), space-separated. Some rows carry trinomials.
- female_body_mass_g
Numeric. Female adult body mass (g); NA when unknown.
- adult_body_mass_g
Numeric. Sex-pooled adult body mass (g); NA when unknown.
- litter_or_clutch_size_n
Numeric. Mean offspring per reproductive event; NA when unknown.
- litters_or_clutches_per_y
Numeric. Number of reproductive events per year; NA when unknown.
Source
Myhrvold et al. (2015) An amniote life-history database to perform comparative analyses with birds, mammals, and reptiles. Ecology 96:3109. doi:10.1890/15-0846R.1
PanTHERIA-style mammal life-history sample
Description
A ~5,400-species sample of mammal life-history records, prepared to
mirror the structure of the PanTHERIA database. Used by the
db-assembly-workflow_mammals vignette.
Usage
mammal_pantheria_example
Format
A tibble with ~5,416 rows and 4 columns:
- MSW05_Binomial
Character. Scientific name under MSW3 (Mammal Species of the World, 3rd edition) taxonomy.
- 5-1_AdultBodyMass_g
Numeric. Adult body mass (g); NA when unknown.
- 15-1_LitterSize
Numeric. Mean litter size; NA when unknown.
- 16-1_LittersPerYear
Numeric. Litters per year; NA when unknown.
Source
Jones et al. (2009) PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology 90:2648. doi:10.1890/08-1494.1
TetrapodTraits-style mammal sample
Description
A ~5,900-species sample of mammal trait records, prepared to
mirror the structure of the TetrapodTraits 1.0.0 database. Used by
the db-assembly-workflow_mammals vignette.
Usage
mammal_tetrapodtraits_example
Format
A tibble with ~5,911 rows and 3 columns:
- Scientific.Name
Character. Scientific name (genus species); the column name keeps the period-separated form used in the source release.
- BodyMass_g
Numeric. Body mass (g); NA when unknown.
- LitterSize
Numeric. Mean litter size; NA when unknown.
Source
Moura et al. (2024) A phylogeny-informed characterisation of global tetrapod traits addresses data gaps and biases. PLOS Biology 22:e3002658. doi:10.1371/journal.pbio.3002658
Mammal phylogenetic tree (example)
Description
A 5,987-tip subset of the Upham, Esselstyn & Jetz (2019) VertLife
mammal phylogeny, used by the db-assembly-workflow_mammals
vignette to demonstrate reconciling species names from multiple
trait sources against a tree. Tip labels use underscores
(Genus_species); 76 tips carry an X_ prefix, denoting Mesozoic
stem-mammal fossils grafted onto the molecular backbone via the
Upham et al. "backbone-and-patch" framework.
Usage
mammal_tree_example
Format
An object of class phylo (from the ape package), with
5,987 tips and 5,986 internal nodes.
Details
Source confirmed by Santiago Ortega, who contributed the data, on issue #11.
If you use this tree in published work, please cite Upham et al. (2019) directly. The bundled object is a subset used for examples only — for analysis-grade trees, download the full credible set from https://vertlife.org/phylosubsets/.
Source
Upham, N.S., Esselstyn, J.A. & Jetz, W. (2019) Inferring the mammal tree: Species-level sets of phylogenies for questions in ecology, evolution, and conservation. PLOS Biology 17(12):e3000494. doi:10.1371/journal.pbio.3000494. Full credible sets at https://vertlife.org/phylosubsets/.
References
Other published mammal phylogenies suitable for comparative analysis (alternatives to Upham et al. 2019):
Faurby, S. & Svenning, J.-C. (2015) A species-level phylogeny of all extant and late Quaternary extinct mammals using a novel heuristic-hierarchical Bayesian approach. Molecular Phylogenetics and Evolution 84:14–26. doi:10.1016/j.ympev.2014.11.001
Bininda-Emonds, O.R.P. et al. (2007) The delayed rise of present-day mammals. Nature 446:507–512. doi:10.1038/nature05634
Nest trait data (subset)
Description
A subset of ~920 bird species from the global nest trait database (v2), covering the same Corvoidea and allied families as avonet_subset. Contains nest site and structure information.
Usage
nesttrait_subset
Format
A data frame with columns:
- Scientific_name
Scientific name (HBW/BirdLife v5 taxonomy)
- Order
Order
- Family
Family
- Common_name
English common name
- NestSite_ground
Ground nesting (0/1)
- NestSite_tree
Tree nesting (0/1)
- NestSite_nontree
Non-tree elevated nesting (0/1)
- NestSite_cliff_bank
Cliff/bank nesting (0/1)
- NestStr_scrape
Scrape nest (0/1)
- NestStr_platform
Platform nest (0/1)
- NestStr_cup
Cup nest (0/1)
- NestStr_dome
Dome nest (0/1)
- NestStr_primary_cavity
Primary cavity nester (0/1)
- NestStr_second_cavity
Secondary cavity nester (0/1)
Source
Chia et al. (2023) A global database of bird nest traits. Scientific Data 10:923. doi:10.1038/s41597-023-02837-1
Align a tree to a reconciliation mapping
Description
Renames and/or prunes tip labels according to the reconciliation mapping.
Usage
pr_align_tree(tree, mapping, drop_unresolved = FALSE)
Arguments
| tree | An |
| mapping | A mapping tibble from a reconciliation object. |
| drop_unresolved | Logical. Drop tips with no match? Default |
Value
A modified ape::phylo object.
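The renaming and pruning decision can be sketched on the tip-label vector alone. Everything here is assumed for illustration: the hypothetical `align_labels()` helper, the mapping columns `name_x`, `name_y` and `match_type`, and the choice to rename matched tips to the data-side name. With ape, the result would then be applied with `tree$tip.label <- res$labels` followed by `ape::drop.tip(tree, res$drop)`. Because unmatched tips keep their original labels, the drop list still refers to valid tips after renaming.

```r
# Hypothetical helper: rename matched tips, list unmatched tips to drop.
align_labels <- function(tips, mapping, drop_unresolved = FALSE) {
  idx <- match(tips, mapping$name_y)   # row of each tip in the mapping
  # FALSE & NA is FALSE, so tips absent from the mapping count as unmatched
  matched <- !is.na(idx) & mapping$match_type[idx] != "unresolved"
  labels <- tips
  labels[matched] <- mapping$name_x[idx[matched]]
  drop <- if (drop_unresolved) tips[!matched] else character(0)
  list(labels = labels, drop = drop)
}

tips <- c("Corvus_corax", "Pica_pica", "Garrulus_glandarius")
mapping <- data.frame(
  name_x     = c("Corvus corax", NA),
  name_y     = c("Corvus_corax", "Pica_pica"),
  match_type = c("normalized", "unresolved")
)
res <- align_labels(tips, mapping, drop_unresolved = TRUE)
res$labels  # "Corvus corax" "Pica_pica" "Garrulus_glandarius"
res$drop    # "Pica_pica" "Garrulus_glandarius"
```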
Bind a species to a tree as sister to a congener
Description
Uses phytools::bind.tip() if available, otherwise falls back to a
pure-ape implementation using ape::bind.tree().
Usage
pr_bind_species(tree, sp_label, congener_tips, where, bl)
Arguments
| tree | phylo object. |
| sp_label | A length-1 character vector. Tip label to add (underscore format). |
| congener_tips | Character vector of congener tip labels. |
| where | Placement strategy. |
| bl | Branch length for the new tip. |
Value
List with tree and placed_near.
Bind a tip to a tree
Description
Wrapper that uses phytools::bind.tip() if available, otherwise
uses a pure-ape implementation via ape::bind.tree().
Usage
pr_bind_tip(tree, tip_label, where, position = 0, edge.length = 0)
Arguments
| tree | phylo object. |
| tip_label | A length-1 character vector. Label for the new tip. |
| where | Integer. Node or tip index to bind near. |
| position | Numeric. How far back from the node to attach. |
| edge.length | Numeric. Branch length of the new tip. |
Value
A modified phylo object.
Calculate branch length for an augmented tip
Description
Calculate branch length for an augmented tip
Usage
pr_calc_augment_bl(tree, congener_tips, method)
Arguments
| tree | phylo object. |
| congener_tips | Character vector of congener tip labels. |
| method | Branch length strategy. |
Value
Numeric(1) branch length.
Format the citations for a tree result
Description
Given a pr_tree_result produced by pr_get_tree() or
pr_date_tree(), emit a formatted citation block listing the
backend used, the underlying paper(s), and per-tree source
citations when the result is a multi-tree posterior. Useful when
writing the methods section of a paper or when adding a tree
provenance footnote to a figure.
Usage
pr_cite_tree(result, format = c("text", "markdown", "bibtex"))
Arguments
| result | A |
| format | A length-1 character vector. One of: |
Value
A length-1 character vector containing the formatted
citation block. The block is also printed, and the string is returned
invisibly, so calling pr_cite_tree(res) on its own at the console shows
the block.
See Also
pr_get_tree() / pr_date_tree() for producing the
pr_tree_result that this function formats.
Examples
# Build a minimal `pr_tree_result` by hand so the three citation
# formats are visible without a network call. In real use this
# object is returned by `pr_get_tree()` or `pr_date_tree()`.
fake_res <- structure(
list(
source = "fishtree",
tree = ape::read.tree(text = "(Salmo_salar,Esox_lucius);"),
backend_meta = list(tree_provenance = list())
),
class = "pr_tree_result"
)
cat(pr_cite_tree(fake_res, format = "text")) # human-readable
cat(pr_cite_tree(fake_res, format = "markdown")) # for a README
cat(pr_cite_tree(fake_res, format = "bibtex")) # for a .bib file
# Realistic use after actually retrieving a tree from a backend:
if (requireNamespace("fishtree", quietly = TRUE)) {
res <- pr_get_tree(c("Salmo salar", "Esox lucius"),
source = "fishtree")
cat(pr_cite_tree(res, format = "markdown"))
}
Compute summary counts from a mapping table
Description
Compute summary counts from a mapping table
Usage
pr_compute_counts(mapping)
Arguments
| mapping | A mapping tibble. |
Value
A named list of counts.
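A summary of this kind is essentially a tally of `match_type`. The hypothetical `compute_counts()` below assumes the stage names used elsewhere on this page plus an "unresolved" category; the real function's list element names may differ.

```r
# Hypothetical tally of a mapping's match_type column. Converting to a
# factor first means stages with no matches still report zero.
compute_counts <- function(mapping,
                           stages = c("exact", "normalized", "synonym",
                                      "fuzzy", "manual", "unresolved")) {
  tab <- table(factor(mapping$match_type, levels = stages))
  counts <- as.list(as.integer(tab))
  names(counts) <- stages
  c(list(n_total = nrow(mapping)), counts)
}

mapping <- data.frame(match_type = c("exact", "exact", "fuzzy", "unresolved"))
str(compute_counts(mapping))
```

Fixing the factor levels keeps the output shape stable across datasets, which makes counts from different reconciliations easy to compare side by side.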
Time-calibrate a topology using the DateLife chronogram database
Description
Wraps datelife::datelife_use() to add divergence-time
calibrations to an existing phylo (or multiPhylo) using
DateLife's database of pre-computed chronograms (Sanchez Reyes et
al. 2024, Systematic Biology 73:470). Returns a result with
the same shape as pr_get_tree() so downstream PCM workflows —
including pigauto's
posterior-tree imputation — can consume it without further glue
code.
Usage
pr_date_tree(
tree,
n_dated = 1L,
dating_method = "bladj",
check_ultrametric = TRUE,
...
)
Arguments
| tree | An |
| n_dated | A length-1 positive integer. How many calibrated trees to return per input topology. |
| dating_method | A length-1 character vector. Forwarded to |
| check_ultrametric | Logical. After dating, check that the result is ultrametric and warn if not. Default |
| ... | Additional arguments forwarded to |
Value
A list with class pr_tree_result and components:
- tree
The dated topology: a phylo when n_dated = 1 or a multiPhylo when n_dated > 1.
- matched
Tip labels of the input that DateLife was able to date.
- unmatched
Tip labels of the input absent from DateLife's database (returned with no calibration applied).
- mapping
A tibble with one row per input tip label, mirroring pr_get_tree()'s audit table: input_name, normalized_name, query_name, tree_name, in_tree, match_type, placement_status, and the four tnrs_* columns. placement_status and the tnrs_* columns are NA for DateLife dating, which applies no TNRS step.
- source
Always "datelife" (paired with pr_get_tree()'s dispatch).
- backend_meta
Includes dating_method, calibrations (per-node calibration table from DateLife), and the standard tree_provenance list (one entry per returned tree).
When to use this
Use pr_date_tree() when you already have a topology (e.g. from a
published phylogeny or your own analysis) and want to attach
divergence times. Use pr_get_tree() with source = "datelife" if
you have only species names. Both end up calling the GitHub-only
datelife package, but the starting point is different.
Install datelife before calling this function. prepR4pcm does not list it
in Suggests, because its transitive dependency tree cannot be auto-resolved
by pak on a clean CI image, so it is kept as an opt-in install:
pak::pak("phylotastic/datelife").
What "n_dated > 1" actually returns
This is a common point of confusion. With n_dated = 50,
pr_date_tree() does NOT change the input topology — it returns
up to 50 chronograms that all share the input topology but
differ in their branch lengths, because each variant is dated
using a different source paper in DateLife's chronogram database
(think: variant 1 uses Hedges et al. 2015, variant 2 uses
Bininda-Emonds et al. 2007, etc.). So you get one topology and
N versions of branch lengths, not N different topologies.
If you want both axes of variation (topology uncertainty +
dating uncertainty), feed a multiPhylo of N topologies in.
DateLife's each = TRUE mode is then applied per input tree, so
the output reflects the cross-product of input topology and
DateLife source. Example pipeline:
# species: a character vector of mammal binomials
trees <- pr_get_tree(species, source = "rtrees",
                     taxon = "mammal", n_tree = 100) # ~100 topologies
dated <- pr_date_tree(trees$tree, n_dated = 5)
By contrast, pr_get_tree(species, source = "datelife", n_tree = 50)
returns up to 50 chronograms where each variant comes from a
different DateLife source — i.e. a different topology AND
different branch lengths per variant, because DateLife's source
chronograms aren't constrained to share a topology.
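The bookkeeping for the multiPhylo case is a simple cross-product: with N input topologies and up to k DateLife sources each, expect at most N * k chronograms (fewer if some sources cannot date a topology). The source labels below are placeholders, and nothing in this sketch calls DateLife.

```r
# Placeholder bookkeeping for pr_date_tree(<multiPhylo>, n_dated = k):
# each input topology is paired with each DateLife source chronogram.
n_topologies <- 3
sources <- c("source_A", "source_B")   # placeholder source labels
grid <- expand.grid(topology = seq_len(n_topologies), source = sources,
                    stringsAsFactors = FALSE)
grid$label <- sprintf("topology%d_%s", grid$topology, grid$source)
nrow(grid)   # 6 = 3 topologies x 2 sources (an upper bound)
```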
References
Sanchez Reyes, L. L., McTavish, E. J., & O'Meara, B. (2024). DateLife: Leveraging databases and analytical tools to reveal the dated Tree of Life. Systematic Biology, 73(2), 470–485. doi:10.1093/sysbio/syae015
See Also
pr_get_tree() for retrieval (species -> tree);
pr_cite_tree() for formatting the citations of the result;
reconcile_augment() for filling tip-level gaps in an existing
tree (a complementary operation to dating).
Examples
if (rlang::is_installed("datelife")) {
# Example 1: one chronogram from a topology
library(ape)
tr <- read.tree(text =
"(Rhea_americana,(Pterocnemia_pennata,Struthio_camelus));")
res <- pr_date_tree(tr)
res$tree # phylo (chronogram)
res$backend_meta$dating_method # "bladj"
# Example 2: per-source chronograms for posterior-tree PCMs
res <- pr_date_tree(tr, n_dated = 5)
class(res$tree) # "multiPhylo"
length(res$backend_meta$tree_provenance) # one entry per tree
}
Detect the species name column in a data frame
Description
Uses a two-stage heuristic: first checks for common column names, then falls back to content-based detection (binomial name pattern).
Usage
pr_detect_species_column(df, arg_name = "x_species")
Arguments
| df | A data frame. |
| arg_name | Character. Name of the argument, used in error messages (e.g., |
Details
Stage 1 — Name matching. Checks column names (case-insensitive) against
a priority list: species, species_name, binomial, taxon,
scientificName, Scientific_name, canonical_name, tip.label,
PhyloName, Binomial, latin_name, sci_name.
Stage 2 — Content heuristic. If no name match, checks which character
columns have >50% of non-NA values matching the binomial pattern
^[A-Z][a-z]+ [a-z]+.
If zero or multiple candidates are found, the function stops with an informative error.
Value
A length-1 character vector: the detected column name.
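A base-R sketch of the two-stage heuristic, using the priority list and binomial pattern given above. The function name is hypothetical, and details such as tie-breaking when several columns match in stage 1 are assumptions.

```r
detect_species_column <- function(df, arg_name = "x_species") {
  priority <- c("species", "species_name", "binomial", "taxon",
                "scientificName", "Scientific_name", "canonical_name",
                "tip.label", "PhyloName", "Binomial", "latin_name", "sci_name")
  # Stage 1: case-insensitive name match, in priority order
  lower_names <- tolower(names(df))
  for (p in unique(tolower(priority))) {
    hit <- which(lower_names == p)
    if (length(hit) > 0) return(names(df)[hit[1]])   # assumed: first wins
  }
  # Stage 2: character columns where >50% of non-NA values look binomial
  looks_binomial <- vapply(df, function(col) {
    if (!is.character(col)) return(FALSE)
    vals <- col[!is.na(col)]
    length(vals) > 0 && mean(grepl("^[A-Z][a-z]+ [a-z]+", vals)) > 0.5
  }, logical(1))
  candidates <- names(df)[looks_binomial]
  if (length(candidates) != 1) {
    stop("Found ", length(candidates), " candidate species columns; set `",
         arg_name, "` explicitly.", call. = FALSE)
  }
  candidates
}

df1 <- data.frame(Species = "Corvus corax", mass = 1200)
df2 <- data.frame(id = 1:3, label = c("Corvus corax", "Pica pica", NA))
df3 <- data.frame(a = "Corvus corax", b = "Pica pica")
detect_species_column(df1)   # "Species" (stage 1)
detect_species_column(df2)   # "label"   (stage 2)
```

df3 has two equally plausible columns, so the sketch stops with an error rather than guessing, matching the documented behaviour.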
Detect taxonomic splits and lumps in a reconciliation mapping
Description
Examines a reconciliation mapping for cases where:
- Splits: one name in x maps to multiple names in y via synonym resolution (or one resolved name covers multiple y names), indicating a taxonomic split.
- Lumps: multiple names in x map to a single name in y via synonym resolution (or multiple x names share one resolved name), indicating a taxonomic lump.
Usage
pr_detect_splits_lumps(mapping)
Arguments
| mapping | A mapping tibble from a |
Details
Detection uses the name_resolved column: when multiple rows share
the same accepted name but differ in the original names, the accepted
name has been split or lumped between the two sources.
Value
A list with two tibbles:
- splits
Cases where one name in x maps to multiple names in y (or one resolved name covers multiple y names).
- lumps
Cases where multiple names in x map to one name in y (or multiple x names share one resolved name).
Each tibble has columns: name_resolved, names_x, names_y,
n_x, n_y, type.
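The grouping idea in Details can be sketched in base R. The hypothetical `detect_splits_lumps()` assumes the mapping has `name_x`, `name_y` and `name_resolved` columns with one row per x-y pair, and returns tables using the column names listed under Value; how the real function handles missing names and other edge cases is not specified here. The toy names are made up.

```r
# Hypothetical sketch: group by the accepted (resolved) name and count the
# distinct source names on each side.
detect_splits_lumps <- function(mapping) {
  groups <- split(mapping, mapping$name_resolved)
  rows <- lapply(names(groups), function(nm) {
    g <- groups[[nm]]
    nx <- unique(g$name_x[!is.na(g$name_x)])
    ny <- unique(g$name_y[!is.na(g$name_y)])
    type <- NA_character_
    if (length(nx) == 1 && length(ny) > 1) type <- "split"
    if (length(nx) > 1 && length(ny) == 1) type <- "lump"
    data.frame(name_resolved = nm,
               names_x = paste(nx, collapse = "; "),
               names_y = paste(ny, collapse = "; "),
               n_x = length(nx), n_y = length(ny), type = type)
  })
  res <- do.call(rbind, rows)
  list(splits = res[res$type %in% "split", ],
       lumps  = res[res$type %in% "lump", ])
}

# Toy mapping with made-up names
mapping <- data.frame(
  name_x        = c("Aus bus", "Aus bus", "Cus dus", "Cus eus", "Gus hus"),
  name_y        = c("Aus bus", "Aus fus", "Cus dus", "Cus dus", "Gus hus"),
  name_resolved = c("Aus bus", "Aus bus", "Cus dus", "Cus dus", "Gus hus")
)
out <- detect_splits_lumps(mapping)
out$splits   # Aus bus: 1 x name, 2 y names
out$lumps    # Cus dus: 2 x names, 1 y name
```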
Ensure the taxadb local database is available
Description
Downloads the database for the specified authority if not already cached.
Usage
pr_ensure_db(authority, db_version = NULL)
Arguments
| authority | A length-1 character vector. Taxonomic authority code. |
| db_version | A length-1 character vector or NULL. Database version. |
Value
Invisibly returns the authority string.
Extract genus from binomial species names
Description
Takes the first word of each name as the genus.
Usage
pr_extract_genus(names)
Arguments
| names | Character vector of species names. |
Value
Character vector of genus names.
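In base R this comes down to a single regular expression. The sketch below is hypothetical and additionally treats underscores as word separators, an assumption made so that tip-label-style names such as Pica_pica also work.

```r
# Hypothetical helper: first word of each name; underscores are treated as
# spaces (an assumption), and NA stays NA.
extract_genus <- function(names) {
  names <- trimws(gsub("_", " ", names, fixed = TRUE))
  sub("[[:space:]].*$", "", names)
}

extract_genus(c("Corvus corax", "Pica_pica", "Garrulus", NA))
# "Corvus" "Pica" "Garrulus" NA
```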
Extract tip labels from a phylogenetic tree
Description
Return the tip labels of a tree as a character vector, whether the
tree is already an ape::phylo object in memory or lives in a
Newick or Nexus file on disk. Convenience wrapper around
tree$tip.label that also handles file input and multi-tree files
(returns the tips of the first tree).
Usage
pr_extract_tips(tree)
Arguments
| tree | An |
Value
A character vector of tip labels (one element per tip).
See Also
pr_normalize_names() for cleaning tip labels before
joining against a data frame.
Other name utilities:
pr_normalize_names()
Examples
data(tree_jetz)
head(pr_extract_tips(tree_jetz))
Fuzzy-match two sets of species names
Description
Uses component-based similarity: the genus and epithet are matched
separately, then combined with weights (genus 0.6, epithet 0.4) to
reflect that genus-level errors are more informative. Uses base R
utils::adist() for Levenshtein distance — no extra dependencies.
Usage
pr_fuzzy_match(names_x, names_y, threshold = 0.9, rank = "species")
Arguments
| names_x | Character vector. |
| names_y | Character vector. |
| threshold | Numeric (0–1). Minimum similarity score. Default 0.9. |
| rank | Character. |
Details
Genus pre-filtering is applied: only names whose genus is within 2 edits of each other are compared. This reduces the number of pairwise comparisons dramatically for large datasets.
Value
A tibble with columns: name_x, name_y, score, notes.
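A base-R sketch of the weighted, genus-pre-filtered comparison described above. The similarity formula (1 minus edit distance divided by the longer string's length) and the helper names are assumptions; the real function may score differently and also returns a `notes` column, which is omitted here.

```r
# Assumed similarity: 1 - Levenshtein distance / length of the longer string
sim <- function(a, b) {
  if (nchar(a) == 0 && nchar(b) == 0) return(1)
  1 - utils::adist(a, b)[1, 1] / max(nchar(a), nchar(b))
}

fuzzy_match <- function(names_x, names_y, threshold = 0.9,
                        w_genus = 0.6, w_epithet = 0.4, max_genus_dist = 2) {
  genus   <- function(x) sub(" .*$", "", x)
  epithet <- function(x) ifelse(grepl(" ", x), sub("^[^ ]+ ", "", x), "")
  rows <- list()
  for (x in names_x) {
    # Genus pre-filter: skip candidates whose genus is > 2 edits away
    gd <- utils::adist(genus(x), genus(names_y))[1, ]
    cand <- names_y[gd <= max_genus_dist]
    if (length(cand) == 0) next
    scores <- vapply(cand, function(y) {
      w_genus * sim(genus(x), genus(y)) +
        w_epithet * sim(epithet(x), epithet(y))
    }, numeric(1))
    best <- which.max(scores)
    if (scores[best] >= threshold) {
      rows[[length(rows) + 1]] <- data.frame(
        name_x = x, name_y = cand[best], score = round(unname(scores[best]), 3))
    }
  }
  if (length(rows) == 0) {
    return(data.frame(name_x = character(0), name_y = character(0),
                      score = numeric(0)))
  }
  do.call(rbind, rows)
}

fm <- fuzzy_match(
  names_x = c("Corvus brachyrhnchos", "Corvus corx", "Pica pica"),
  names_y = c("Corvus brachyrhynchos", "Corvus corax", "Garrulus glandarius")
)
fm   # two matches (scores 0.971 and 0.92); Pica pica is filtered out by genus
```

Because the genus filter runs first, "Pica pica" never reaches the scoring step, which is where the large savings on big name lists come from.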
Retrieve a candidate phylogeny for a species list
Description
Connects reconciled species names to an external phylogenetic resource
and returns a pruned candidate tree plus a report of which species
were matched and which were dropped. Intended as the bridge between
the package's reconciliation cascade and any downstream comparative
analysis: feed the result of reconcile_data() / reconcile_tree()
(or any character vector of cleaned names) into pr_get_tree() and
get back a phylo ready for reconcile_apply().
Usage
pr_get_tree(
x,
source = c("rotl", "rtrees", "clootl", "fishtree", "datelife", "auto"),
species_col = NULL,
taxon = NULL,
n_tree = 1L,
cache = FALSE,
tnrs = c("auto", "always", "never"),
min_match = 0.8,
check_ultrametric = TRUE,
resolve_polytomies = FALSE,
branch_lengths = NULL,
...
)
Arguments
x |
One of:
|
source |
A length-1 character vector. Which external backend to use. One of:
|
species_col |
A length-1 character vector. Required when |
taxon |
A length-1 character vector. Required when
|
n_tree |
A length-1 positive integer. How many trees to
request from the backend. Default
When the request returns a multiPhylo, the result's |
cache |
Logical. Cache the result on disk and reuse it on
subsequent identical calls? Default |
tnrs |
A length-1 character vector. Run a TNRS preflight
(Open Tree of Life name resolution via
When |
min_match |
A length-1 numeric in |
check_ultrametric |
Logical. After producing the tree, check
that it's ultrametric (all tips equidistant from the root) and
warn if not. Default |
resolve_polytomies |
Logical. After retrieval, resolve
any polytomies via |
branch_lengths |
A length-1 character vector or
|
... |
Backend-specific arguments forwarded to the underlying
call. See the help page of the underlying function in the
relevant backend package ( |
Details
Each backend is provided by an external R package that we list in
Suggests rather than Imports, so installing prepR4pcm does
not pull them in automatically. The error message tells you what
to install if you ask for a backend you don't have.
Name handling. Input names are run through
pr_normalize_names() before the backend is queried — underscores
become spaces, leading/trailing whitespace is trimmed, OTT-id
suffixes (e.g. ott770315) and authority strings (e.g.
(Linnaeus, 1758)) are stripped, and hybrid signs are
standardised. The matched and unmatched slots in the result use
the original input format (as you typed it), not the normalised
form.
When TNRS substitutes a name (only when tnrs = "always", or for
the fishtree backend under tnrs = "auto"), the replacement is
recorded in result$backend_meta$tnrs_replacements as a named
character vector (original = resolved). A one-shot cli warning
lists the first three substitutions on the call itself.
TNRS also returns structured match metadata. pr_get_tree() records
it per name in the mapping tibble: tnrs_number_matches,
tnrs_is_synonym, tnrs_approximate_match, and tnrs_flags. When a
name resolves to more than one taxon (tnrs_number_matches > 1, a
homonym), a one-shot cli warning names the affected species, since
the resolved name is then only one of several candidates.
Value
A list with class pr_tree_result and components:
tree
A phylo (single) or multiPhylo (posterior) object from the chosen backend, pruned to the matched species.
matched
Character vector of names from the user's original input (preserving the input format, including any underscores) that resolved to a tip in tree. The dispatcher enforces that matched names are a subset of unique(input): TNRS substitution, normalisation, and backend-internal name juggling cannot leak intermediate names into this slot.
unmatched
Character vector of names from the original input that did not resolve. Disjoint from matched; length(matched) + length(unmatched) == length(unique(input)) always holds. Inspect these and consider running them back through reconcile_suggest() or a manual override.
mapping
A tibble with one row per unique input species. Core columns: input_name, normalized_name, query_name, tree_name, in_tree, match_type, and placement_status. This is the audit trail for name handling: input_name is what the user supplied, normalized_name is the result of pr_normalize_names(), query_name is the backend query after optional TNRS, tree_name is the actual returned tip label, and match_type is one of "exact", "normalized", "tnrs", or "unmatched". For source = "rtrees", placement_status carries the grafting status from backend_meta$placement; otherwise it is NA. Four further columns record what rotl's TNRS resolver reported for each name: tnrs_number_matches, tnrs_is_synonym, tnrs_approximate_match, and tnrs_flags. These are NA for backends or tnrs settings where TNRS did not run. tnrs_number_matches > 1 flags a homonym, meaning the resolved name is only one of several candidate taxa.
source
The backend that produced the tree.
backend_meta
A named list of diagnostic information. Standard fields populated by the dispatcher:
  n_queried
  Unique input species count.
  n_requested
  The n_tree argument the user passed.
  n_returned
  Number of trees in tree (1 for phylo).
  n_matched
  Equal to length(matched).
  tnrs_replacements
  When TNRS ran (tnrs = "always", or tnrs = "auto" for fishtree) and rotl is installed: a named character vector mapping original input to the TNRS-resolved name, for names that TNRS changed. NULL when no TNRS or no replacements occurred. A one-shot cli warning lists the first three substitutions on the call, so silent name correction is impossible.
  tip_set_consistent
  Logical. For multiPhylo returns: TRUE if every tree shares the same tip set.
  dropped_per_tree
  For multiPhylo returns where tip_set_consistent = FALSE: a list of character vectors, one per tree, listing the species missing from that tree relative to the union of all trees. NULL otherwise.
  tree_provenance
  A list with one entry per returned tree (so tree[[i]] pairs with backend_meta$tree_provenance[[i]] when tree is a multiPhylo).
Backend-specific fields (e.g. taxon, n_grafted, grafted_tips, placement for rtrees; backend, type, tnrs_table for fishtree/rotl; summary_format, source_citations, reference for datelife) are merged in at the top level by the wrapper that called the backend. The rtrees-specific placement slot is a tibble with one row per unique input species and columns input_name, tree_name, and placement_status ("exact", "genus_added", "family_added", "skipped", or "unmatched").
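The matched/unmatched guarantee above can be checked on any result. The following is a minimal base-R sketch. check_partition() is a hypothetical helper, and the short name vectors are made-up stand-ins for your input names, res$matched and res$unmatched.

```r
# Hypothetical helper: verify the documented partition of input names.
check_partition <- function(input, matched, unmatched) {
  u <- unique(input)
  stopifnot(
    all(matched %in% u),                              # matched is a subset of unique(input)
    all(unmatched %in% u),                            # so is unmatched
    length(intersect(matched, unmatched)) == 0,       # the two are disjoint
    length(matched) + length(unmatched) == length(u)  # together they cover the input
  )
  invisible(TRUE)
}

# Made-up example: duplicated input names are counted once
input <- c("Salmo_salar", "Esox lucius", "Salmo_salar", "Gadus morhua")
check_partition(input,
                matched = c("Salmo_salar", "Esox lucius"),
                unmatched = "Gadus morhua")
```

With a real result you would call check_partition(input, res$matched, res$unmatched). Note that matched keeps the input spelling ("Salmo_salar"), not the normalised form.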
References
Backend reference trees:
Jetz, W., Thomas, G. H., Joy, J. B., Hartmann, K., & Mooers, A. O.
(2012). The global diversity of birds in space and time.
Nature 491: 444–448. doi:10.1038/nature11631
(Used by rtrees for taxon = "bird" and by BirdTree.)
Rabosky, D. L., Chang, J., Title, P. O., Cowman, P. F., Sallan, L.,
Friedman, M., Kaschner, K., Garilao, C., Near, T. J., Coll, M., &
Alfaro, M. E. (2018). An inverse latitudinal gradient in speciation
rate for marine fishes. Nature 559: 392–395.
doi:10.1038/s41586-018-0273-1
(Fish Tree of Life; used by source = "fishtree" and by rtrees
for taxon = "fish".)
Upham, N. S., Esselstyn, J. A., & Jetz, W. (2019). Inferring the
mammal tree: Species-level sets of phylogenies for questions in
ecology, evolution, and conservation. PLOS Biology 17(12):
e3000494. doi:10.1371/journal.pbio.3000494
(VertLife mammal posterior; used by rtrees for taxon = "mammal"
with mammal_tree = "vertlife".)
Jin, Y. & Qian, H. (2019). V.PhyloMaker: an R package that can
generate very large phylogenies for vascular plants.
Ecography 42(8): 1353–1359. doi:10.1111/ecog.04434
(Vascular-plant mega-tree used by rtrees for taxon = "plant";
also the basis for the source = "vphylomaker" augmentation
backend in reconcile_augment().)
Sanchez Reyes, L. L., O'Meara, B. C., Brown, J. W., & McTavish, E.
J. (2024). DateLife: Leveraging databases and analytical tools to
reveal the dated Tree of Life. Systematic Biology 73(2):
470–485. doi:10.1093/sysbio/syae015
(Used by source = "datelife" and by pr_date_tree().)
Methodology:
Chang, J., Rabosky, D. L., & Alfaro, M. E. (2019). Estimating
diversification rates on incompletely sampled phylogenies:
Theoretical concerns and practical solutions. Systematic Biology
69(3): 602–611. doi:10.1093/sysbio/syz081
(Stochastic polytomy resolution behind fishtree_complete_phylogeny()
for n_tree > 1.)
Michonneau, F., Brown, J. W., & Winter, D. J. (2016). rotl: an R
package to interact with the Open Tree of Life data. Methods in
Ecology and Evolution 7(12): 1476–1481.
doi:10.1111/2041-210X.12593
(TNRS preflight and source = "rotl".)
See Also
reconcile_tree() / reconcile_data() for producing the
reconciled species list that feeds this function;
reconcile_apply() for combining the returned phylo with the
data frame ready for analysis;
reconcile_augment() for filling gaps in an existing tree
(a tree-aware alternative to retrieving a fresh tree);
pr_date_tree() for time-calibrating an existing topology;
pr_cite_tree() for formatting citations for a tree result;
pr_tree_compare() for comparing two or more retrieved trees;
pr_get_tree_status() for checking which backends are installed
and reachable;
pr_tree_cache_dir() / pr_tree_cache_status() /
pr_tree_cache_clear() for managing the on-disk cache.
The companion package
pigauto consumes a
multiPhylo directly via multi_impute_trees() for posterior-
tree PCMs — request a posterior sample with n_tree > 1.
Examples
if (interactive()) {
# Example 1: birds via clootl (Clements taxonomy). Uses the
# bundled AVONET subset (657 species placed in the Clements tree).
data(avonet_subset)
if (requireNamespace("clootl", quietly = TRUE)) {
res <- pr_get_tree(avonet_subset, species_col = "Species1",
source = "clootl")
ape::Ntip(res$tree) # species placed in the tree
head(res$unmatched) # names clootl could not resolve
}
# Example 2: fish via fishtree (Rabosky et al. 2018, time-calibrated)
if (requireNamespace("fishtree", quietly = TRUE)) {
res <- pr_get_tree(c("Salmo salar", "Esox lucius", "Gadus morhua"),
source = "fishtree")
res$tree
}
# Example 3: anything via rotl (universal, network)
if (requireNamespace("rotl", quietly = TRUE)) {
res <- pr_get_tree(c("Homo sapiens", "Pan troglodytes",
"Mus musculus"),
source = "rotl")
res$tree
}
# Example 4: posterior of fish trees (50 trees, for multi-tree PCMs)
if (requireNamespace("fishtree", quietly = TRUE)) {
res <- pr_get_tree(c("Salmo salar", "Esox lucius"),
source = "fishtree", n_tree = 50)
class(res$tree) # "multiPhylo"
}
}
Report the install status of every pr_get_tree() backend
Description
Walks every backend supported by pr_get_tree() and reports
whether the underlying package is installed (and at what version),
whether it requires network, and what to do if it's missing. Useful
for first-time users figuring out which backends are available, and
for CI sanity checks.
Usage
pr_get_tree_status(check_network = FALSE)
Arguments
check_network |
Logical. Should the probe attempt a tiny
network call to test that backends needing the network are
actually reachable? Default |
Value
A data.frame with one row per backend and columns:
source
Backend name, as passed to pr_get_tree().
installed
Logical: is the package available?
version
Character: installed version, or NA.
needs_network
Logical: does the backend hit a remote server at runtime?
reachable
Logical or NA: result of the network check (only populated when check_network = TRUE).
install_hint
Character: the install command to run when installed = FALSE.
source_repo
Character: "CRAN" or a GitHub repo for non-CRAN backends.
See Also
pr_get_tree() / pr_date_tree() for the consumers.
Examples
# Local-only probe (fast, no network)
pr_get_tree_status()
# Also test reachability of remote backends
pr_get_tree_status(check_network = TRUE)
Load overrides from a data frame or file path
Description
Load overrides from a data frame or file path
Usage
pr_load_overrides(overrides)
Arguments
overrides |
A data frame, the path to a CSV file, or NULL. |
Value
A data frame with columns name_x, name_y, and optionally
user_note, or NULL.
Load a phylogenetic tree
Description
If tree is already a phylo object, returns it. If it is a file path,
attempts to read it as Newick first, then Nexus.
Usage
pr_load_tree(tree)
Arguments
tree |
An |
Value
An ape::phylo object.
Look up names in a taxonomic authority
Description
For each name, queries the configured authority and returns the accepted
name, taxonomic status, and taxon ID. Most authorities are backed by a
local taxadb database; authority = "gnverifier" calls the
Global Names HTTP verifier instead.
Usage
pr_lookup_authority(names, authority = "col", db_version = NULL)
Arguments
names |
Character vector of scientific names. |
authority |
A length-1 character vector. Authority code (e.g.,
|
db_version |
A length-1 character vector or NULL. Ignored when
|
Value
A tibble with columns: input, accepted_name, status,
taxon_id, authority.
Normalise scientific names to a canonical form
Description
Apply a sequence of deterministic text transformations so that
scientific names which differ only in formatting compare equal.
This is the same routine used by stage 2 of the matching cascade in
reconcile_data() and reconcile_tree(). Use it directly when you
want to clean a column of names without running a full
reconciliation — for example, when building a crosswalk by hand.
Usage
pr_normalize_names(
names,
rank = c("species", "subspecies"),
parser = c("internal", "gnparser")
)
Arguments
names |
A character vector of scientific names (any length;
each element is a single name). |
rank |
A length-1 character vector. Taxonomic rank to normalise to:
|
parser |
A length-1 character vector. Which parsing engine to use:
|
Details
The transformations, applied in order, are:
Replace underscores and multiple whitespace with a single space (Homo_sapiens -> Homo sapiens).
Strip authority strings and year, including multi-author and parenthetical forms (Corvus corax (Linnaeus, 1758) -> Corvus corax).
Strip any other trailing parenthetical qualifier, such as the Open Tree of Life homonym / rank flags that rotl returns (Prunella (genus in kingdom Archaeplastida) -> Prunella).
Fold diacritics to ASCII: accented characters are reduced to their unaccented forms, and names without diacritics, such as Passer domesticus, are unchanged.
Standardise case: genus capitalised, epithet lowercase.
Strip infraspecific epithets if rank = "species".
Trim whitespace and collapse leftover empty tokens.
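The ordered transformations listed above can be approximated in base R. This is a hypothetical sketch, not the package's routine, and its assumptions are deliberately narrow. The authority patterns cover only common forms such as "(Linnaeus, 1758)" and "Linnaeus, 1758". Diacritic folding uses iconv() transliteration, whose output for accented characters varies by platform.

```r
# Hypothetical sketch of the normalisation steps; not prepR4pcm's own code.
normalise_sketch <- function(names, rank = c("species", "subspecies")) {
  rank <- match.arg(rank)
  x <- names
  # 1. Underscores and runs of whitespace -> single space
  x <- gsub("[_[:space:]]+", " ", x)
  # 2. Authority strings with a year: "(Linnaeus, 1758)" or "Linnaeus, 1758"
  x <- gsub("\\s*\\([^)]*[0-9]{4}\\)", "", x)
  x <- sub("\\s+[A-Z][^,]*,\\s*[0-9]{4}\\s*$", "", x)
  # 3. Any other trailing parenthetical qualifier
  x <- sub("\\s*\\([^)]*\\)\\s*$", "", x)
  # 4. Fold diacritics (iconv transliteration is platform-dependent)
  x <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
  # 5. Genus capitalised, everything else lower case
  x <- tolower(trimws(x))
  substr(x, 1, 1) <- toupper(substr(x, 1, 1))
  # 6. Keep only genus + epithet when rank = "species"
  words <- strsplit(x, " +")
  if (rank == "species") {
    words <- lapply(words, function(w) w[seq_len(min(2, length(w)))])
  }
  # 7. Drop empty tokens and rejoin
  vapply(words, function(w) paste(w[nzchar(w)], collapse = " "), character(1))
}

normalise_sketch(c("Homo_sapiens", "homo  sapiens", "Parus major major",
                   "Corvus corax (Linnaeus, 1758)",
                   "Prunella (genus in kingdom Archaeplastida)"))
normalise_sketch("Parus major major", rank = "subspecies")
```

Order matters here, as in the list above. Authorities are stripped before case is standardised, because step 2 relies on the capital letter that starts an author name.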
Value
A character vector of normalised names, the same length as
names, with an attribute "normalisation_log" — a tibble
recording every non-trivial change, for auditing.
Note
On the spelling: the title and prose use British English
normalise, consistent with the package's
Language: en-GB declaration. The function identifier
pr_normalize_names() keeps the American-English z because
R-package function names conventionally use ASCII identifiers
in the form most R users expect. The two spellings are
equivalent and intentional.
See Also
reconcile_data() and reconcile_tree() for the full
four-stage matching cascade; pr_extract_tips() for pulling tip
labels out of a tree prior to normalising them.
Other name utilities:
pr_extract_tips()
Examples
pr_normalize_names(c("Homo_sapiens",
"homo sapiens",
"Parus major major",
"Corvus corax (Linnaeus, 1758)"))
# Keep trinomials
pr_normalize_names("Parus major major", rank = "subspecies")
Phylogenetic correlation matrix from a tree
Description
Convert a phylogeny into the correlation matrix used as a random-
effect structure in phylogenetic meta-analysis (metafor::rma.mv)
or phylogenetic mixed models (MCMCglmm, brms, etc.).
Usage
pr_phylo_cor(x, corr = TRUE, ...)
Arguments
x |
A |
corr |
Logical. Pass through to |
... |
Additional arguments forwarded to |
Details
Wraps ape::vcv() with corr = TRUE. Designed to slot in after
pr_get_tree() when the goal is meta-analysis, where typically:
Topology comes from Open Tree of Life (source = "rotl") because the species span many higher taxa.
Polytomies are resolved at random (resolve_polytomies = TRUE).
Branch lengths are computed via Grafen's method (branch_lengths = "grafen") because rotl's edge lengths are unit-length placeholders.
The correlation matrix is computed once and reused, supplied to metafor::rma.mv() as R = list(species = phy_cor) together with random = ~ 1 | species (MCMCglmm instead takes the sparse inverse of the matrix through its ginverse argument, with random = ~ species).
For a Brownian-motion model on an ultrametric tree with branch lengths in time units, the off-diagonal entry for two species equals the time from the root to their MRCA divided by the root-to-tip time. Every diagonal entry is 1 by construction, as in any correlation matrix. On a non-ultrametric tree, the off-diagonal entry is instead the shared root-to-MRCA path length divided by the geometric mean of the two species' root-to-tip distances.
For meta-analysis with rotl topology + Grafen's method, the
resulting matrix is the standard Pagel's lambda = 1 phylogenetic
correlation that metafor::rma.mv() accepts directly.
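A three-tip tree makes these ratios concrete. The sketch builds by hand the Brownian-motion covariance (shared root-to-node path lengths) that ape::vcv() would give for the tree ((A:1, B:1):1, C:2). It then rescales the matrix with stats::cov2cor(), so it runs without ape. The trees are toy examples chosen for illustration.

```r
# Shared path lengths for ((A:1, B:1):1, C:2):
# diagonal = root-to-tip distance, off-diagonal = root-to-MRCA distance.
V <- matrix(c(2, 1, 0,
              1, 2, 0,
              0, 0, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
R <- stats::cov2cor(V)
R["A", "B"]   # 0.5: MRCA at depth 1 of a total depth 2
diag(R)       # all 1

# Non-ultrametric variant ((A:1, B:3):1, C:2): B now sits 4 units from the root
V2 <- V
V2["B", "B"] <- 4
stats::cov2cor(V2)["A", "B"]   # 1 / sqrt(2 * 4), the geometric-mean rule
```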
Value
A square symmetric matrix with row/column names equal to
the tip labels. For multiPhylo input, a list of such matrices.
References
Paradis, E., & Schliep, K. (2019). ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics, 35(3), 526–528. doi:10.1093/bioinformatics/bty633
Cinar, O., Nakagawa, S., & Viechtbauer, W. (2022). Phylogenetic multilevel meta-analysis: a simulation study on the importance of modelling the phylogeny. Methods in Ecology and Evolution, 13(2), 383–395. doi:10.1111/2041-210X.13760
See Also
pr_get_tree() (use with branch_lengths = "grafen"
and resolve_polytomies = TRUE for the meta-analysis path);
ape::vcv() for the underlying computation.
Examples
set.seed(1)
tr <- ape::rcoal(5) # ultrametric, bifurcating
phy_cor <- pr_phylo_cor(tr)
dim(phy_cor)
all(diag(phy_cor) == 1)
# End-to-end meta-analysis prep
if (requireNamespace("rotl", quietly = TRUE)) {
res <- try(pr_get_tree(c("Homo sapiens", "Pan troglodytes",
"Mus musculus", "Rattus norvegicus"),
source = "rotl",
resolve_polytomies = TRUE,
branch_lengths = "grafen"),
silent = TRUE)
if (!inherits(res, "try-error")) {
phy_cor <- pr_phylo_cor(res)
# phy_cor can now be supplied to downstream meta-analysis models.
}
}
Resolve synonyms between two name sets
Description
For names that remain unmatched after exact and normalised matching, queries a taxonomic authority to find cases where both names resolve to the same accepted name, or where one name is a synonym of the other.
Usage
pr_resolve_synonyms(
unmatched_x,
unmatched_y,
authority = "col",
db_version = NULL,
quiet = FALSE
)
Arguments
unmatched_x |
A character vector. Unmatched names from source x. |
unmatched_y |
A character vector. Unmatched names from source y. |
authority |
A length-1 character vector. Authority code, one
of |
db_version |
A length-1 character vector, or |
quiet |
Logical. Suppresses progress messages when |
Value
A tibble with columns: name_x, name_y, name_resolved,
match_source, notes.
Run the matching cascade
Description
The central engine behind all reconcile_* functions. Applies matching
stages in strict order of decreasing confidence: exact -> normalised ->
synonym -> fuzzy. Each stage only operates on names not yet matched.
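The ordering rule can be sketched in base R. The function run_cascade_sketch() below is hypothetical and much simpler than pr_run_cascade(). It covers only an exact stage and a crude normalised stage, which just swaps underscores for spaces and lower-cases. The synonym and fuzzy stages would be appended to the stages list in that order.

```r
# Hypothetical sketch of a confidence-ordered matching cascade.
# Each stage maps still-unmatched x names to y names (or NA).
stage_exact <- function(x, y) ifelse(x %in% y, x, NA_character_)

stage_normalised <- function(x, y) {
  key <- function(n) tolower(trimws(gsub("[_[:space:]]+", " ", n)))
  y[match(key(x), key(y))]
}

run_cascade_sketch <- function(names_x, names_y,
                               stages = list(exact = stage_exact,
                                             normalised = stage_normalised)) {
  result <- data.frame(name_x = names_x, name_y = NA_character_,
                       match_type = "unresolved")
  for (stage in names(stages)) {
    todo <- is.na(result$name_y)          # only names not yet matched
    if (!any(todo)) break
    hit <- stages[[stage]](result$name_x[todo], names_y)
    idx <- which(todo)[!is.na(hit)]
    result$name_y[idx] <- hit[!is.na(hit)]
    result$match_type[idx] <- stage       # record which stage matched
  }
  result
}

run_cascade_sketch(c("Parus major", "Homo_sapiens", "Corvus corax"),
                   c("Parus major", "Homo sapiens", "Pica pica"))
```

Because each stage sees only the rows still unresolved, a name matched exactly is never re-examined by a lower-confidence stage.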
Usage
pr_run_cascade(
names_x,
names_y,
authority = "col",
db_version = NULL,
rank = "species",
overrides = NULL,
fuzzy = FALSE,
fuzzy_threshold = 0.9,
flag_threshold = 0.95,
resolve = "flag",
multi_x = FALSE,
quiet = FALSE
)
Arguments
names_x |
Character vector. Names from source x. |
names_y |
Character vector. Names from source y. |
authority |
A length-1 character vector, or |
db_version |
A length-1 character vector or NULL. |
rank |
A length-1 character vector. |
overrides |
A data.frame with columns |
fuzzy |
Logical. Enables the fuzzy-matching stage when |
fuzzy_threshold |
Numeric. Minimum similarity (0–1) for fuzzy
matches. Default |
resolve |
A length-1 character vector. How to handle low-confidence matches:
|
multi_x |
Logical. Allow multiple x names to resolve to the
same y? Default |
quiet |
Logical. |
Value
A tibble with the full mapping table.
Standardise case of scientific names
Description
Capitalises the genus (first word), lowercases everything else.
Usage
pr_standardise_case(names)
Arguments
names |
Character vector. |
Value
Character vector with standardised case.
Strip authority strings from scientific names
Description
Removes trailing author citations and year from binomials or trinomials.
Usage
pr_strip_authority(names)
Arguments
names |
Character vector. |
Value
Character vector with authority strings removed.
Strip infraspecific epithets to produce binomials
Description
Reduces trinomials and names with rank indicators to genus + species.
Usage
pr_strip_infraspecific(names)
Arguments
names |
Character vector. |
Value
Character vector of binomials.
Clear the local tree-retrieval cache
Description
Removes all cached pr_get_tree() / pr_date_tree() results. By
default asks for confirmation before deleting; pass confirm = FALSE
to skip the prompt (useful in scripts).
Usage
pr_tree_cache_clear(confirm = TRUE, source = NULL)
Arguments
confirm |
Logical. Ask interactively before deleting? Default
|
source |
A length-1 character vector or |
Value
Invisibly, the number of files removed.
See Also
pr_tree_cache_dir() / pr_tree_cache_status().
Examples
# Demo against a throwaway cache so the user's real cache is untouched
old_opt <- getOption("prepR4pcm.cache_dir")
tmp_cache <- file.path(tempdir(), "prepR4pcm-cache-demo")
pr_tree_cache_dir(tmp_cache)
# Drop two dummy entries so there is something to clear:
dir.create(file.path(tmp_cache, "fishtree"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(tmp_cache, "rotl"), recursive = TRUE, showWarnings = FALSE)
saveRDS(NULL, file.path(tmp_cache, "fishtree", "abc.rds"))
saveRDS(NULL, file.path(tmp_cache, "rotl", "def.rds"))
pr_tree_cache_status() # 2 entries
pr_tree_cache_clear(confirm = FALSE) # removes both
pr_tree_cache_status() # empty
# Restore the previous cache directory and remove the demo cache
options(prepR4pcm.cache_dir = old_opt)
unlink(tmp_cache, recursive = TRUE)
Get or set the local tree-retrieval cache directory
Description
Returns the path to the cache directory used by pr_get_tree() and
pr_date_tree() when called with cache = TRUE. Pass a path to
override the default.
Usage
pr_tree_cache_dir(path = NULL)
Arguments
path |
A length-1 character vector or |
Details
The default cache directory is tools::R_user_dir("prepR4pcm",
which = "cache"), which on Linux is
typically ~/.cache/R/prepR4pcm/, on macOS
~/Library/Caches/org.R-project.R/R/prepR4pcm/, and on Windows
something under %LOCALAPPDATA%\R\cache\R\prepR4pcm\.
To use a cache directory you control, pass its path explicitly with
pr_tree_cache_dir(path).
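The lookup order can be sketched as follows. The sketch assumes, as the examples on this page suggest, that a path set with pr_tree_cache_dir(path) is stored in the prepR4pcm.cache_dir option. resolve_cache_dir() is a hypothetical helper, not the package's implementation.

```r
# Hypothetical sketch: a user-set option wins; otherwise fall back to the
# platform cache directory reported by tools::R_user_dir().
resolve_cache_dir <- function() {
  user_path <- getOption("prepR4pcm.cache_dir")
  if (!is.null(user_path)) {
    return(normalizePath(user_path, mustWork = FALSE))
  }
  tools::R_user_dir("prepR4pcm", which = "cache")
}

old <- options(prepR4pcm.cache_dir = NULL)
resolve_cache_dir()                                  # platform default
options(prepR4pcm.cache_dir = file.path(tempdir(), "my-cache"))
resolve_cache_dir()                                  # the user-supplied directory
options(old)                                         # restore previous setting
```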
Value
A length-1 character vector — the absolute path of the cache directory.
See Also
pr_tree_cache_status() / pr_tree_cache_clear();
pr_get_tree() for the consumer.
Examples
# Default location
pr_tree_cache_dir()
old_cache <- getOption("prepR4pcm.cache_dir", NULL)
tmp_cache <- tempfile("prepR4pcm-cache-")
pr_tree_cache_dir(tmp_cache)
options(prepR4pcm.cache_dir = old_cache)
unlink(tmp_cache, recursive = TRUE)
Show the contents of the local tree-retrieval cache
Description
Lists every cache entry by source, with file size and modification
timestamp. Useful for figuring out where the disk space went or
for confirming a fresh run hit the cache.
Usage
pr_tree_cache_status()
Value
A data.frame (sorted by most recent first) with columns
source, hash, size_kb, modified. Returns an empty data
frame with the same columns when the cache is empty.
See Also
pr_tree_cache_dir() / pr_tree_cache_clear().
Examples
pr_tree_cache_status()
Compare two or more phylogenetic trees
Description
Computes a small set of standard metrics for comparing trees that come from different backends (or different runs of the same backend). Designed for the common case of "I retrieved a tree from rotl and another from fishtree — do they agree?"
Usage
pr_tree_compare(..., prune_to_common = TRUE)
Arguments
... |
Two or more |
prune_to_common |
Logical. Restrict each tree to the shared
tip set before computing topology metrics? Default |
Details
RF distance is computed via ape::dist.topo() with the default
method. Branch-length correlation matches edges by their tip-set
bipartition: for each edge in tree A, the corresponding edge in
tree B (if any) is the one that splits the same set of tips. The
Pearson correlation is taken over the matched edge-length pairs;
edges whose bipartition is absent in the other tree are dropped.
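Matching edges by the set of tips they split off is the idea behind
the branch-score distance of Kuhner & Felsenstein (1994); here it is
used to pair edge lengths for a correlation rather than to sum their
squared differences.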
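The edge-matching step can be sketched with edges keyed by the tips they subtend. In this hypothetical representation, each rooted tree is a named numeric vector of edge lengths. Each name is the sorted set of tip labels below that edge, joined by "|". In practice, these keys would have to be derived from the phylo objects. The Jaccard index behind pairwise_jaccard is included for comparison. All edge lengths are toy values.

```r
# Pearson correlation over edges present in both trees; unmatched edges dropped.
bipartition_cor <- function(len_a, len_b) {
  shared <- intersect(names(len_a), names(len_b))
  if (length(shared) < 2) return(NA_real_)   # cor() needs at least two pairs
  stats::cor(len_a[shared], len_b[shared])
}

jaccard <- function(tips_a, tips_b) {
  length(intersect(tips_a, tips_b)) / length(union(tips_a, tips_b))
}

# Toy trees: (((A,B),C),D) and ((A,B),(C,D))
tree_a <- c("A" = 0.5, "B" = 0.7, "C" = 1.2, "D" = 2.0, "A|B" = 0.7, "A|B|C" = 0.8)
tree_b <- c("A" = 0.6, "B" = 0.6, "C" = 1.0, "D" = 1.1, "A|B" = 0.9, "C|D" = 0.4)

bipartition_cor(tree_a, tree_b)   # "A|B|C" and "C|D" have no partner and are dropped
jaccard(c("A", "B", "C", "D"), c("A", "B", "C", "E"))   # 3 shared of 5 in the union
```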
Value
A list with class pr_tree_compare and components:
n_trees
Number of input trees.
tip_sets
Named list of character vectors, one per tree.
shared_tips
Tips present in every input tree.
unique_to
Named list, one per tree, of tips present in that tree but not in every other tree.
n_shared
Length-1 integer: the number of shared tips.
pairwise_jaccard
Square matrix; entry (i, j) is the Jaccard index of tip_sets[[i]] vs tip_sets[[j]].
pairwise_rf
Square matrix of Robinson-Foulds distances between pairs of trees pruned to shared_tips; NA when the pair has fewer than 4 shared tips.
pairwise_branch_cor
Square matrix of Pearson correlations between matching edge lengths in each pair, or NA when one or both trees have no branch lengths.
References
Kuhner, M. K., & Felsenstein, J. (1994). A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution 11(3): 459–468. doi:10.1093/oxfordjournals.molbev.a040126
Robinson, D. F., & Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences 53(1–2): 131–147. doi:10.1016/0025-5564(81)90043-2
See Also
pr_get_tree() for retrieval; reconcile_apply() for
combining a chosen tree with a dataset.
Examples
# Two trees with identical tip sets
set.seed(1)
t1 <- ape::rtree(10)
t2 <- ape::rtree(10, tip.label = t1$tip.label)
cmp <- pr_tree_compare(t1, t2)
cmp$n_shared
cmp$pairwise_rf
# Two trees with overlapping but not identical tips
t3 <- ape::rtree(8, tip.label = t1$tip.label[1:8])
cmp <- pr_tree_compare(t1, t3)
cmp$pairwise_jaccard
Valid taxonomic authorities
Description
Returns the set of authority codes that the package accepts when
resolving species-name synonyms. Most are served by taxadb
(a local database mirroring the providers documented in
?taxadb::td_create); "gnverifier" is the one HTTP-backed
authority, calling the Global Names Architecture
verifier service instead of a
local database.
Usage
pr_valid_authorities()
Details
"col"
Catalogue of Life. The default and a sensible starting point for most taxa.
"itis"
Integrated Taxonomic Information System. Strong coverage for North American vertebrates and plants.
"gbif"
GBIF backbone. Wider coverage; captures more recent synonymy.
"ncbi"
NCBI Taxonomy. Best when you are working with sequence data.
"ott"
Open Tree of Life synthetic taxonomy. Useful when your downstream phylogeny is from the Open Tree synthesis. We restrict the schema to "dwc" (Darwin Core) when calling taxadb::td_create() because the "common" schema does not ship for OTT under taxadb v22.12.
"itis_test"
A small bundled subset of ITIS, cached locally with taxadb for testing. Intended for examples and unit tests; not for analysis.
"gnverifier"
Global Names verifier: HTTP-backed verification against ~100 authoritative sources (Catalogue of Life, ITIS, GBIF, NCBI, Open Tree, ...). No local database is downloaded; requires network access and the httr2 package. Useful when you want broader source coverage than any single taxadb provider, or want to avoid the ~100 MB taxadb download.
Five authority codes that previous versions of the package
advertised — iucn, tpl, fb, slb, wd — are not on this
list. Empirical testing against taxadb v22.12 showed that
iucn errors with a schema mismatch and the other four are not
taxadb providers at all. Anyone passing one of those values
previously got an opaque hard error; passing them now still
errors, but with a migration message explaining that the code is
no longer supported.
Validate a user-supplied authority string
Description
Used by every entry-point function that accepts authority.
Lower-cases the input, returns it unchanged if NULL (synonym
resolution skipped), errors with a helpful message if the value
was previously listed but is no longer supported, or with a
standard "unknown authority" message otherwise.
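The validation rules above can be sketched in base R. The code lists are taken from pr_valid_authorities() and its note on retired codes. validate_authority_sketch() is hypothetical, and it uses base stop() rather than the package's rlang-style error signalling. The message wording is illustrative only.

```r
# Hypothetical sketch of the validation rules described above.
valid_codes   <- c("col", "itis", "gbif", "ncbi", "ott", "itis_test", "gnverifier")
retired_codes <- c("iucn", "tpl", "fb", "slb", "wd")

validate_authority_sketch <- function(authority) {
  if (is.null(authority)) return(NULL)       # NULL: synonym resolution skipped
  authority <- tolower(authority)
  if (authority %in% retired_codes) {        # previously listed, now removed
    stop(sprintf("Authority '%s' is no longer supported. Use one of: %s.",
                 authority, paste(valid_codes, collapse = ", ")), call. = FALSE)
  }
  if (!authority %in% valid_codes) {         # never supported
    stop(sprintf("Unknown authority '%s'.", authority), call. = FALSE)
  }
  authority
}

validate_authority_sketch("COL")   # returns "col"
```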
Usage
pr_validate_authority(authority, call = caller_env())
Arguments
authority |
A length-1 character vector or NULL. The user-supplied value. |
call |
Calling environment, for |
Value
The lower-cased, validated authority (or NULL).
Validate a phylo object
Description
Checks for 0 tips and duplicate tip labels.
Usage
pr_validate_tree(tree)
Arguments
tree |
An |
Value
The tree (unchanged) if valid.
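Both checks can be sketched on a tree's tip-label vector (tree$tip.label). check_tip_labels() is a hypothetical helper, and its error wording is illustrative.

```r
# Hypothetical sketch of the two validity checks, applied to tip labels.
check_tip_labels <- function(tip_labels) {
  if (length(tip_labels) == 0) stop("Tree has 0 tips.", call. = FALSE)
  dups <- unique(tip_labels[duplicated(tip_labels)])
  if (length(dups) > 0) {
    stop("Duplicate tip labels: ", paste(dups, collapse = ", "), call. = FALSE)
  }
  invisible(tip_labels)   # unchanged if valid
}

check_tip_labels(c("Parus_major", "Pica_pica"))
```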
Warn the user that some overrides could not be applied
Description
Emits a cli_alert_warning summarising why each rejected override
was skipped, and points the user to the full table of rejected
overrides stored on the result object.
Usage
pr_warn_unused_overrides(unused)
Arguments
unused |
A tibble produced by |
Value
Invisibly NULL.
Print a reconciliation summary
Description
Renders the formatted report attached to the object. Triggered
automatically by R's REPL when the object is auto-printed (i.e.
when reconcile_summary(rec) is called without assignment).
Usage
## S3 method for class 'reconciliation_summary'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (currently unused). |
Value
The object, invisibly.
Apply a reconciliation to produce an aligned data-tree pair
Description
Turn a reconciliation object into an analysis-ready data frame and
pruned phylogenetic tree whose species labels agree. This is the step
that feeds directly into caper::pgls(), MCMCglmm::MCMCglmm(),
phytools::fastAnc(), or any other PCM that expects matching names
in data and tree.
Usage
reconcile_apply(
reconciliation,
data = NULL,
tree = NULL,
species_col = NULL,
drop_unresolved = FALSE
)
Arguments
reconciliation |
A reconciliation object returned by
|
data |
A data frame to align. If |
tree |
An |
species_col |
A length-1 character vector. Column in |
drop_unresolved |
Logical. Drops unmatched rows and tips when |
Details
Rows in data whose species have no match in the tree (and tips in
tree whose species have no match in the data) are handled according
to drop_unresolved. Matched data rows are kept as-is. Matched tree
tips are renamed to the source-x (data-side) name when the tree-side
label differs, so downstream PCM software can look up tips by the
species names in your data frame.
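The renaming and dropping rules can be sketched on plain vectors. The sketch assumes a hypothetical mapping data frame with columns name_x (data side) and name_y (tree side, NA when unmatched). It shows the logic only. On a real phylo object, unmatched tips would be removed with a pruning function such as ape::drop.tip().

```r
# Toy inputs for illustration only
mapping <- data.frame(name_x = c("Parus major", "Corvus corax", "Pica pica"),
                      name_y = c("Parus_major", "Corvus_corax", NA))
tip_labels <- c("Parus_major", "Corvus_corax", "Sturnus_vulgaris")
dat <- data.frame(species = c("Parus major", "Corvus corax", "Pica pica"),
                  trait = 1:3)
drop_unresolved <- TRUE

# Pairs that actually resolve to a tip in the tree
matched <- mapping[!is.na(mapping$name_y) & mapping$name_y %in% tip_labels, ]

# Rename matched tips to the data-side name so PCM software can look them up
new_tips <- tip_labels
hit <- match(new_tips, matched$name_y)
new_tips[!is.na(hit)] <- matched$name_x[hit[!is.na(hit)]]

if (drop_unresolved) {
  dat <- dat[dat$species %in% matched$name_x, , drop = FALSE]  # unmatched rows out
  new_tips <- new_tips[!is.na(hit)]                            # unmatched tips out
}
dat$species
new_tips
```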
Value
A list with two elements:
data
The aligned data frame (or NULL if data was not supplied).
tree
The aligned phylo object (or NULL if tree was not supplied).
See Also
reconcile_tree() to build the reconciliation;
reconcile_merge() when you want a single merged data frame
instead of aligned data + tree; reconcile_export() to write
everything to disk.
Other reconciliation functions:
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL)
aligned <- reconcile_apply(rec,
data = avonet_subset,
tree = tree_jetz,
species_col = "Species1",
drop_unresolved = TRUE)
nrow(aligned$data)
ape::Ntip(aligned$tree)
# aligned$data and aligned$tree are ready for downstream PCM tools
Graft missing species onto a phylogenetic tree (genus-level placement)
Description
When a reconciliation identifies species that are present in your data
but missing from the tree, reconcile_augment() attaches each missing
species as sister to a congener — i.e., a species in the same genus
already present in the tree. The result is a tree that contains every
species in your dataset, at the cost of making a strong assumption
about where the new tips sit.
Usage
reconcile_augment(
reconciliation,
tree,
where = c("genus", "near"),
branch_length = c("congener_median", "half_terminal", "zero"),
seed = NULL,
quiet = FALSE,
source = c("internal", "rtrees", "vphylomaker", "uphylomaker"),
taxon = NULL,
check_ultrametric = TRUE,
...
)
Arguments
reconciliation |
A reconciliation object, typically from
|
tree |
An |
where |
A length-1 character vector. Where to attach each new tip
(only used when
|
branch_length |
A length-1 character vector. How to set the terminal branch
length of each newly added tip (only used when
When the input tree is ultrametric, each grafted tip's terminal
edge is adjusted after placement so the augmented tree stays
ultrametric — a requirement of phylogenetic comparative
methods. |
seed |
A length-1 integer or |
quiet |
Logical. Suppress progress messages? Default |
source |
A length-1 character vector. Which grafting backend
to use. One of |
taxon |
A length-1 character vector. Required when
|
check_ultrametric |
Logical. After grafting, check that the
result is ultrametric and warn if not. Default |
... |
Additional arguments forwarded to the chosen backend:
|
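You can check the ultrametric adjustment described under branch_length and check_ultrametric by hand. The sketch below is a hypothetical base-R illustration, not the package's code. It stores the tree from Example 1 as an ape-style edge matrix and grafts A_missing as sister to A_a by splitting A_a's terminal edge. It then shows that only one terminal length for the new tip keeps every root-to-tip distance equal. `root_to_tip()` and `is_ultrametric_sketch()` are illustrative names. With ape installed, ape::is.ultrametric() performs the real check.

```r
# Root-to-tip depths for a tree stored the way ape stores a "phylo" object:
# an edge matrix (parent, child), tips numbered 1..n_tips, root n_tips + 1.
# Assumes edges are listed parents-before-children (ape's "cladewise" order),
# so a single forward pass fills in every node depth.
root_to_tip <- function(edge, edge_length, n_tips) {
  depth <- numeric(max(edge))
  for (i in seq_len(nrow(edge))) {
    depth[edge[i, 2]] <- depth[edge[i, 1]] + edge_length[i]
  }
  depth[seq_len(n_tips)]
}

is_ultrametric_sketch <- function(depths, tol = 1e-8) {
  diff(range(depths)) <= tol
}

# ((A_a:1,A_b:1):1,B_c:2); with A_missing grafted as sister to A_a.
# Tips: 1 = A_a, 2 = A_b, 3 = B_c, 4 = A_missing; root = 5; A clade = 6;
# new node = 7, splitting A_a's terminal edge (length 1) into 0.5 + 0.5.
edge <- rbind(c(5, 6), c(6, 7), c(7, 1), c(7, 4), c(6, 2), c(5, 3))
len  <- c(1, 0.5, 0.5, 0.5, 1, 2)   # 4th entry: the grafted tip's edge
root_to_tip(edge, len, n_tips = 4)                # 2 2 2 2
is_ultrametric_sketch(root_to_tip(edge, len, 4))  # TRUE

# Any other terminal length for the grafted tip breaks ultrametricity
len_bad <- replace(len, 4, 1)
is_ultrametric_sketch(root_to_tip(edge, len_bad, 4))  # FALSE
```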
Value
A list with:
- tree
The augmented phylo object (or multiPhylo when source = "rtrees" returns a posterior sample).
- original
The original (unmodified) phylo object, for easy comparison.
- augmented
A tibble documenting each added species: species, genus, placed_near (sister tip / MRCA node / rtrees placement note), branch_length, method, n_congeners. For source = "rtrees", branch_length and n_congeners are NA because the backend chooses them.
- skipped
A tibble of species that could not be placed, with the reason (e.g. "No congener in tree", "rtrees did not place this species").
- meta
Provenance metadata: source, placement strategy, branch length rule, counts. For source = "rtrees" it includes a backend_meta sub-list with the taxon and the number of grafted tips.
When to use this
Tip-grafting is an exploratory convenience, not a substitute for a properly inferred phylogeny. Every source mode (see below) makes strong placement assumptions that are often wrong in detail. Use it to keep exploratory PCMs running while you decide how to handle orphan species, and always:
- Report exactly which species were augmented (see $augmented in the return value).
- Run sensitivity analyses with and without the augmented tips.
- Prefer a published imputed phylogeny (e.g. the PhyloMaker or TACT approaches) when grafting many species.
Choosing a source
- "internal" (default): Genus-level placement using only your tree (no external dependencies). Each missing species is attached as sister to a congener (or at the congeneric MRCA). This is fast and reproducible. It only works when the genus is already represented in the tree, and it assumes the new tip diverged in roughly the same way as its congeners.
- "rtrees": Delegates the grafting to the rtrees mega-tree machinery via rtrees::get_tree(tree_by_user = TRUE). Your tree serves as the backbone, and rtrees places each missing species using genus/family information from a taxon-specific reference tree. Requires taxon and the GitHub-only rtrees package (https://daijiang.github.io/rtrees/). Helpful when the genus is absent from your tree but present in the rtrees reference tree, a case the internal mode would skip.
- "vphylomaker": Plant-only alternative to "rtrees". It runs through either of two GitHub packages:
  - V.PhyloMaker2 (https://github.com/jinyizju/V.PhyloMaker2): updated and enlarged version, preferred when installed.
  - V.PhyloMaker (https://github.com/jinyizju/V.PhyloMaker): original 2019 version, used as a fallback.
  It calls phylo.maker(sp.list, tree, scenarios = ...) with your tree as the backbone. Use this when you want explicit control over the V.PhyloMaker placement scenario ("S1", "S2", or "S3"; see Jin & Qian 2019, 2022). Otherwise "rtrees" with taxon = "plant" is simpler.
- "uphylomaker": Universal (plants + animals) variant of V.PhyloMaker, via the GitHub package U.PhyloMaker (https://github.com/jinyizju/U.PhyloMaker). It follows the same phylo.maker convention but takes a gen.list (a genus-family lookup), so it can graft non-plant taxa as well as plants. Use this when your tree spans multiple kingdoms and you want the V.PhyloMaker placement strategy.
Use pr_get_tree() when you have only a species list and need a
candidate tree from scratch (rotl, clootl, or rtrees). Use
reconcile_augment() when you already have a tree and want to fill
the gaps.
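Before choosing a source, it can help to see how many missing species the "internal" mode could actually place. The hypothetical base-R triage below assumes the genus is the first token of each name. It splits the missing species into those with a congener in the tree and those that would need an external backend. `triage_missing()` is an illustrative name, not a package function.

```r
# Hypothetical triage helper (not part of prepR4pcm). Assumes the genus is
# the first token of each name, separated by a space or an underscore.
triage_missing <- function(missing, tips) {
  genus <- function(x) sub("[ _].*$", "", x)
  has_congener <- genus(missing) %in% genus(tips)
  list(
    internal = missing[has_congener],   # placeable with source = "internal"
    external = missing[!has_congener]   # would need "rtrees" or similar
  )
}

triage <- triage_missing(c("A missing", "C absent"),
                         tips = c("A_a", "A_b", "B_c"))
triage$internal  # "A missing"
triage$external  # "C absent"
```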
References
Paradis, E. & Schliep, K. (2019). ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35: 526–528. doi:10.1093/bioinformatics/bty633
Augmentation backends:
Jin, Y. & Qian, H. (2019). V.PhyloMaker: an R package that can
generate very large phylogenies for vascular plants.
Ecography 42(8): 1353–1359. doi:10.1111/ecog.04434
(source = "vphylomaker", fallback path.)
Jin, Y. & Qian, H. (2022). V.PhyloMaker2: an updated and enlarged R
package that can generate very large phylogenies for vascular plants.
Plant Diversity 44(4): 335–339.
doi:10.1016/j.pld.2022.05.005
(source = "vphylomaker", preferred path.)
Jin, Y. & Qian, H. (2023). U.PhyloMaker: an R package that can
generate large phylogenetic trees for plants and animals.
Plant Diversity 45(3): 347–352.
doi:10.1016/j.pld.2022.12.007
(source = "uphylomaker".)
See Also
reconcile_tree() for the reconciliation step;
reconcile_apply() for the non-augmenting alternative (prune data
and tree to the intersection); pr_get_tree() for retrieving a
candidate tree from external resources when you don't have a tree
yet; pr_date_tree() for time-calibrating an existing topology;
pr_cite_tree() for formatting tree provenance citations. The
companion package
pigauto consumes the
resulting tree (or multiPhylo) directly via
multi_impute_trees() for posterior-tree PCMs.
Other reconciliation functions:
reconcile_apply(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
# --- Example 1: genus-level placement with congener_median branch lengths ---
x <- data.frame(species = c("A a", "A missing", "B c", "C absent"))
tree <- ape::read.tree(text = "((A_a:1,A_b:1):1,B_c:2);")
result <- reconcile_tree(x, tree, x_species = "species",
authority = NULL, quiet = TRUE)
aug <- reconcile_augment(result, tree, seed = 42, quiet = TRUE)
# Compare original vs augmented tree
cat("Original tips:", ape::Ntip(tree), "\n")
cat("Augmented tips:", ape::Ntip(aug$tree), "\n")
cat("Added:", nrow(aug$augmented), "| Skipped:", nrow(aug$skipped), "\n")
# Inspect which species were added and where they were placed
head(aug$augmented[, c("species", "genus", "placed_near",
"branch_length", "n_congeners")])
# Species skipped (no congener in tree)
head(aug$skipped)
# --- Example 2: MRCA placement with zero-length branches ---
aug_near <- reconcile_augment(result, tree,
where = "near",
branch_length = "zero",
seed = 42, quiet = TRUE)
cat("\nMRCA placement (zero branches):\n")
cat(" Added:", nrow(aug_near$augmented), "\n")
# Compare: MRCA placement shows genus-level context
head(aug_near$augmented[, c("species", "placed_near", "method")])
# --- Example 3: delegate grafting to rtrees ---
# Useful when the genus is missing from your tree but present in
# the rtrees taxon-specific reference tree.
if (requireNamespace("rtrees", quietly = TRUE)) {
aug_rt <- try(
reconcile_augment(result, tree,
source = "rtrees",
taxon = "bird",
quiet = TRUE),
silent = TRUE
)
if (!inherits(aug_rt, "try-error")) {
print(nrow(aug_rt$augmented)) # how many species were placed
print(aug_rt$meta$backend_meta$n_grafted) # number of tips grafted by rtrees
}
}
Convert a published taxonomy crosswalk into an overrides table
Description
Turn a curated species-name crosswalk (e.g. the BirdLife–BirdTree
crosswalk bundled as crosswalk_birdlife_birdtree, or Clements
updates released each year) into a data frame that can be passed
straight to the overrides argument of reconcile_tree(),
reconcile_data() and friends.
Usage
reconcile_crosswalk(
crosswalk,
from_col,
to_col,
match_type_col = NULL,
notes_col = NULL,
one_to_one_only = FALSE
)
Arguments
crosswalk |
A data frame, or a file path. File format is
inferred from the extension: |
from_col |
A length-1 character vector. Column name for source names (e.g.,
|
to_col |
A length-1 character vector. Column name for target names (e.g.,
|
match_type_col |
A length-1 character vector or |
notes_col |
A length-1 character vector or NULL. Column containing additional notes. |
one_to_one_only |
Logical. If |
Details
Using a crosswalk is preferable to automated synonym resolution when an authoritative mapping exists — it is reproducible, does not depend on taxadb being available, and you can point to the published source in the methods section of your paper.
Value
A data frame with columns name_x, name_y, and
user_note, ready to be passed as the overrides argument.
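The conversion itself amounts to renaming columns and, optionally, dropping ambiguous rows. The base-R sketch below is a hypothetical stand-in, not reconcile_crosswalk() itself. Several details are assumptions: the placeholder columns `from`, `to` and `type`, the wording of `user_note`, and the rule that `one_to_one_only` keeps a pair only when neither name appears in another pair.

```r
# Hypothetical sketch (not reconcile_crosswalk() itself). The column names
# "from", "to" and "type" stand in for from_col, to_col and match_type_col.
crosswalk_to_overrides <- function(cw, from_col, to_col,
                                   match_type_col = NULL,
                                   one_to_one_only = FALSE) {
  note <- if (is.null(match_type_col)) {
    rep(NA_character_, nrow(cw))
  } else {
    paste("crosswalk match type:", cw[[match_type_col]])
  }
  out <- data.frame(name_x = cw[[from_col]], name_y = cw[[to_col]],
                    user_note = note, stringsAsFactors = FALSE)
  # Drop rows without a usable source or target name
  out <- out[!is.na(out$name_x) & !is.na(out$name_y), , drop = FALSE]
  if (one_to_one_only) {
    # Assumed rule: keep a pair only if neither name occurs in another pair
    dup_x <- out$name_x %in% out$name_x[duplicated(out$name_x)]
    dup_y <- out$name_y %in% out$name_y[duplicated(out$name_y)]
    out <- out[!dup_x & !dup_y, , drop = FALSE]
  }
  rownames(out) <- NULL
  out
}

cw <- data.frame(
  from = c("Sp a", "Sp b", "Sp c", "Sp d"),
  to   = c("Sp a", "Sp bb", "Sp e", "Sp e"),  # Sp c and Sp d lumped into Sp e
  type = c("1:1", "1:1", "many:1", "many:1")
)
ov <- crosswalk_to_overrides(cw, "from", "to", "type", one_to_one_only = TRUE)
ov  # only the two 1:1 rows remain (Sp a -> Sp a, Sp b -> Sp bb)
```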
See Also
reconcile_override_batch() for applying this table
directly to an existing reconciliation; crosswalk_birdlife_birdtree
for the bundled example.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(crosswalk_birdlife_birdtree)
overrides <- reconcile_crosswalk(
crosswalk_birdlife_birdtree,
from_col = "Species1",
to_col = "Species3",
match_type_col = "Match.type"
)
head(overrides)
Reconcile species names between two datasets
Description
Match the species column of one data frame (x) to the species column
of another (y), returning a reconciliation object that records how
every name was resolved. Use this when combining trait datasets, range
datasets, or any other species-level tables that may use slightly
different taxonomies or spellings.
Usage
reconcile_data(
x,
y,
x_species = NULL,
y_species = NULL,
authority = "col",
rank = c("species", "subspecies"),
overrides = NULL,
db_version = NULL,
fuzzy = FALSE,
fuzzy_threshold = 0.9,
flag_threshold = 0.95,
resolve = c("flag", "first"),
quiet = FALSE,
x_label = NULL,
y_label = NULL
)
Arguments
x |
A data frame whose species will be matched from. |
y |
A data frame whose species will be matched to (typically the "reference" taxonomy or the dataset you want to merge with). |
x_species |
A length-1 character vector. Name of the column in
|
y_species |
A length-1 character vector. Name of the column in
|
authority |
A length-1 character vector, or
Five authority codes that earlier versions of the package
advertised — |
rank |
A length-1 character vector. Controls how trinomials are handled during normalisation:
|
overrides |
Optional pre-built corrections. Either a data
frame with at least columns |
db_version |
A length-1 character vector. taxadb
database snapshot to use (e.g. |
fuzzy |
Logical. Enables the fuzzy-matching stage when
|
fuzzy_threshold |
Numeric in [0, 1]. Minimum genus-weighted
similarity score for a fuzzy match to be accepted. Default |
flag_threshold |
Numeric in [0, 1]. When |
resolve |
A length-1 character vector. What to do with borderline matches:
|
quiet |
Logical. Suppresses progress messages when |
x_label |
A length-1 character vector or |
y_label |
A length-1 character vector or |
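The fuzzy_threshold cut-off (0.9 in Usage) applies to a genus-weighted score: 60% genus, 40% specific epithet (see Details). The minimal base-R sketch below assumes each part scores one minus its Levenshtein distance (from utils::adist()) divided by the longer part's length. It shows how a single typo in the epithet still clears the default threshold. The package's exact formula and its genus pre-filter may differ.

```r
# Hypothetical similarity for one name part: 1 minus the Levenshtein
# distance divided by the longer string's length (an assumed scaling).
part_similarity <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]
  1 - d / max(nchar(a), nchar(b))
}

# Genus-weighted score: 60% genus, 40% specific epithet, as in Details.
# Assumes "Genus epithet" binomials separated by a single space.
fuzzy_score <- function(x, y, w_genus = 0.6) {
  px <- strsplit(x, " ", fixed = TRUE)[[1]]
  py <- strsplit(y, " ", fixed = TRUE)[[1]]
  w_genus * part_similarity(px[1], py[1]) +
    (1 - w_genus) * part_similarity(px[2], py[2])
}

s <- fuzzy_score("Corvus corax", "Corvus corix")
s         # 0.92: identical genus, one substitution in a five-letter epithet
s >= 0.9  # TRUE, so it would clear the default fuzzy_threshold
```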
Details
Names are passed through a four-stage matching cascade, and the first
stage that returns a match is recorded in match_type:
- exact: verbatim string equality.
- normalized: match after stripping underscores, authority strings ("Corvus corax Linnaeus, 1758"), diacritics, and case/whitespace differences.
- synonym: lookup in a local taxonomic database via taxadb (Catalogue of Life, GBIF, ITIS, NCBI, ...). Skipped if authority = NULL.
- fuzzy: character-level similarity (opt-in via fuzzy = TRUE). Uses a genus-weighted Levenshtein score (60% genus, 40% specific epithet) with a genus pre-filter, so that only plausibly similar genera are compared.
Names that find no match at any of the four stages are labelled unresolved. Any
entries supplied through overrides take precedence over the cascade.
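The first two stages of the cascade fit in a few lines of base R. This is a hypothetical illustration, not the package's code. The normalisation rules are assumptions: underscores become spaces, whitespace is squeezed, case is lowered, and only the first two tokens are kept, which drops a trailing authority string. Diacritics are not handled, and the synonym and fuzzy stages are omitted.

```r
# Hypothetical normalisation: underscores/whitespace to single spaces,
# lower case, and only the first two tokens kept (drops authorities).
normalise_name <- function(x) {
  x <- tolower(trimws(gsub("[_[:space:]]+", " ", x)))
  vapply(strsplit(x, " ", fixed = TRUE),
         function(p) paste(head(p, 2), collapse = " "),
         character(1))
}

# Two-stage cascade: the first stage that matches wins, so "exact" is
# assigned last and overwrites any "normalized" label.
match_cascade <- function(x, y) {
  type <- rep("unresolved", length(x))
  type[normalise_name(x) %in% normalise_name(y)] <- "normalized"
  type[x %in% y] <- "exact"
  type
}

res <- match_cascade(
  x = c("Homo sapiens", "Homo_sapiens", "Corvus corax Linnaeus, 1758", "Foo bar"),
  y = c("Homo sapiens", "Corvus_corax")
)
res  # "exact" "normalized" "normalized" "unresolved"
```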
After the call. A reconciliation object is the input to most other functions in the package. Common next steps:
- reconcile_summary(): human-readable breakdown of matches.
- reconcile_plot(): one-glance bar/pie chart of match composition.
- reconcile_mapping(): extract the full per-name tibble.
- reconcile_suggest(): near-miss candidates for unresolved names.
- reconcile_merge(): join the two datasets using the reconciliation as the species key.
- reconcile_report(): shareable HTML audit trail.
Value
A reconciliation object. The accompanying mapping tibble, match-type counts, provenance metadata, and applied / unused override slots are documented in reconciliation. See the "After the call" section above for the most common next steps.
References
Norman, K.E., Chamberlain, S. & Boettiger, C. (2020) taxadb: A high-performance local taxonomic database interface. Methods in Ecology and Evolution 11:1153–1159. doi:10.1111/2041-210X.13440
See Also
reconcile_tree() for matching against a phylogenetic tree;
reconcile_to_trees() / reconcile_trees() / reconcile_multi()
for multi-input workflows.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
# Merge AVONET morphology with nest-site data. Both datasets use
# slightly different taxonomies; authority = NULL keeps the example
# offline (no taxadb download).
data(avonet_subset)
data(nesttrait_subset)
rec <- reconcile_data(avonet_subset, nesttrait_subset,
x_species = "Species1",
y_species = "Scientific_name",
authority = NULL)
rec # concise print method
reconcile_summary(rec) # full breakdown
# Join the two datasets on the reconciled species key
merged <- reconcile_merge(rec, avonet_subset, nesttrait_subset,
species_col_x = "Species1",
species_col_y = "Scientific_name")
head(merged[, c("species_resolved", "Family1", "Common_name")])
Diff two reconciliations to see what changed
Description
Compare a "before" and "after" reconciliation and list every species whose outcome differs: newly matched, newly unresolved, promoted to a higher-confidence match type, or linked to a different target. Useful for:
- checking the effect of adding a taxonomy crosswalk or a batch of manual overrides,
- comparing two taxonomic authorities (e.g. Catalogue of Life vs GBIF),
- auditing changes between runs before and after tightening the fuzzy threshold.
Usage
reconcile_diff(x, y, quiet = FALSE)
Arguments
x |
A reconciliation object — the "before" state. |
y |
A reconciliation object — the "after" state. Must be
reconciled against the same |
quiet |
Logical. Suppresses the console summary when |
Value
A list with the following components:
- gained: Tibble of species matched in y but unresolved in x.
- lost: Tibble of species matched in x but unresolved in y.
- type_changed: Tibble of species whose match_type differs between the two runs.
- target_changed: Tibble of species whose name_y differs.
- unused_overrides_diff: Tibble of overrides that are in the unused_overrides slot of one reconciliation but not the other; columns name_x, name_y, reason, side ("x" or "y").
- summary: A one-row tibble with counts: n_gained, n_lost, n_type_changed, n_target_changed, n_shared, n_unused_override_diff.
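Two mapping tables keyed by name_x are enough to reproduce the gained, lost, type-changed and target-changed sets. The base-R sketch below is hypothetical: `diff_mappings()` is not a package function, and it ignores the unused-override and summary components. The toy mappings mirror the Examples section, where A old resolves only after a manual override.

```r
# Hypothetical sketch of the core comparison in reconcile_diff(); not the
# package's code. Each mapping has name_x, name_y and match_type columns.
diff_mappings <- function(before, after) {
  m <- merge(before, after, by = "name_x", suffixes = c("_x", "_y"))
  ok_x <- m$match_type_x != "unresolved"   # matched in the "before" run
  ok_y <- m$match_type_y != "unresolved"   # matched in the "after" run
  list(
    gained = m$name_x[!ok_x & ok_y],
    lost = m$name_x[ok_x & !ok_y],
    type_changed = m$name_x[m$match_type_x != m$match_type_y],
    target_changed = m$name_x[ok_x & ok_y & m$name_y_x != m$name_y_y]
  )
}

before <- data.frame(
  name_x = c("A a", "A old", "B c"),
  name_y = c("A_a", NA, "B_c"),
  match_type = c("normalized", "unresolved", "normalized")
)
after <- before
after$name_y[2] <- "A_new"        # the manual override from the Examples
after$match_type[2] <- "manual"

d <- diff_mappings(before, after)
d$gained        # "A old"
d$type_changed  # "A old"
```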
See Also
reconcile_crosswalk() for building an override table from
a published taxonomy crosswalk; reconcile_override_batch() for
applying many hand edits.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
x <- data.frame(species = c("A a", "A old", "B c"))
tree <- ape::read.tree(text = "((A_a:1,A_new:1):1,B_c:2);")
# Without manual overrides
r1 <- reconcile_tree(x, tree, x_species = "species",
authority = NULL, quiet = TRUE)
# With one manual override
overrides <- data.frame(name_x = "A old", name_y = "A new",
match_type = "manual")
r2 <- reconcile_tree(x, tree, x_species = "species",
authority = NULL, overrides = overrides,
quiet = TRUE)
d <- reconcile_diff(r1, r2, quiet = TRUE)
cat("Gained:", nrow(d$gained), "| Lost:", nrow(d$lost), "\n")
Write an aligned dataset, tree, and mapping table to disk
Description
Apply a reconciliation and save three files: the aligned CSV, the
pruned tree, and the full mapping tibble. Intended for producing
analysis-ready, archivable outputs — drop the three files into a
Zenodo deposit or a project's data-output/ folder alongside the
reconciliation report and you have a fully documented provenance
trail.
Usage
reconcile_export(
reconciliation,
data = NULL,
tree = NULL,
species_col = NULL,
dir = tempfile("prepR4pcm-export-"),
prefix = "reconciled",
tree_format = c("nexus", "newick"),
drop_unresolved = TRUE
)
Arguments
reconciliation |
A reconciliation object returned by
|
data |
A data frame to align. If |
tree |
An |
species_col |
A length-1 character vector. Column name in |
dir |
A length-1 character vector. Path to the output directory
that will receive the exported files (e.g. a project's
|
prefix |
A length-1 character vector. File name prefix. Default
|
tree_format |
A length-1 character vector. Tree output format:
|
drop_unresolved |
Logical. Drops unresolved species when |
Value
A named list of file paths (invisibly):
$data (CSV), $tree (Nexus or Newick), $mapping (CSV), and
$unused_overrides (CSV; NULL when there are no rejected
overrides on the reconciliation).
See Also
reconcile_apply() for in-memory alignment without writing
to disk; reconcile_report() for a self-contained HTML audit
trail.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
result <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL)
out_dir <- tempfile("export_")
files <- reconcile_export(result,
data = avonet_subset, tree = tree_jetz,
species_col = "Species1",
dir = out_dir, prefix = "avonet_jetz")
files$data # path to CSV
files$tree # path to Nexus tree
files$mapping # path to mapping CSV
unlink(out_dir, recursive = TRUE) # clean up
Extract the per-name mapping table from a reconciliation
Description
Returns the mapping tibble inside a reconciliation object. Use this when you want to filter matches programmatically (e.g. pull all unresolved species, all fuzzy matches above a given score, or join the mapping back to the original data frame).
Usage
reconcile_mapping(reconciliation, include_unused_overrides = FALSE)
Arguments
reconciliation |
A reconciliation object returned by
|
include_unused_overrides |
Logical. Appends the rejected
override rows to the returned tibble when |
Value
A tibble with one row per unique name seen in either source and the following columns:
- name_x: The original name as it appeared in source x (your data). NA for rows that exist only in source y (e.g. tree tips not in your data).
- name_y: The original name as it appeared in source y (the reference dataset or tree). NA for rows that exist only in source x.
- name_resolved: The accepted/canonical name returned by the taxonomic authority, when synonym resolution was used. NA when authority = NULL or no synonym was found.
- match_type: One of "exact", "normalized", "synonym", "fuzzy", "manual" (set via reconcile_override()), "flagged" (low-confidence, needs review) or "unresolved". When include_unused_overrides = TRUE it can also be "override_unused": an override row not applied because of missing names or prior matches.
- match_score: Numeric in [0, 1]. 1 for exact/normalized/synonym/manual matches; a genus-weighted Levenshtein score for fuzzy matches; NA for unresolved and for unused-override rows.
- match_source: Where the match came from: "exact", "normalisation", the taxadb authority code (e.g. "col"), "fuzzy", or "user_override".
- in_x: Logical. Whether the name was present in source x.
- in_y: Logical. Whether the name was present in source y.
- notes: Free-text notes, populated e.g. when a name is flagged for review or when an override carries a user comment. For match_type = "override_unused" rows, this column carries the rejection reason.
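Because the mapping is an ordinary table, the filters mentioned in the Description need nothing beyond base R. The toy mapping below uses the documented column names, but its rows and the `traits` data frame are invented for illustration.

```r
# Toy mapping with the documented columns; every row is invented.
mapping <- data.frame(
  name_x = c("Pica pica", "Corvus corix", "Turdus merula", NA),
  name_y = c("Pica_pica", "Corvus_corax", NA, "Garrulus_glandarius"),
  match_type = c("exact", "fuzzy", "unresolved", "unresolved"),
  match_score = c(1, 0.92, NA, NA),
  in_x = c(TRUE, TRUE, TRUE, FALSE),
  in_y = c(TRUE, TRUE, FALSE, TRUE)
)

# Unresolved names from your data (tree-only rows have in_x = FALSE)
unresolved <- mapping$name_x[mapping$match_type == "unresolved" & mapping$in_x]

# Fuzzy matches scoring at least 0.9 (subset() drops rows where the test is NA)
strong_fuzzy <- subset(mapping, match_type == "fuzzy" & match_score >= 0.9)

# Join the mapping back onto the original data frame
traits <- data.frame(species = c("Pica pica", "Corvus corix", "Turdus merula"),
                     trait = c(1.2, 3.4, 5.6))
joined <- merge(traits, mapping[, c("name_x", "name_y", "match_type")],
                by.x = "species", by.y = "name_x", all.x = TRUE)
```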
See Also
reconcile_summary() for a printed breakdown;
reconcile_suggest() for near-miss candidates for unresolved
names; reconcile_apply() to turn the mapping into an aligned
data-tree pair. The unused-override rows surfaced by
include_unused_overrides = TRUE mirror the unused_overrides
slot on the reconciliation object.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL)
mapping <- reconcile_mapping(rec)
# How many species matched?
sum(mapping$in_x & mapping$in_y)
# Which species are in the data but missing from the tree?
head(mapping[mapping$in_x & !mapping$in_y, c("name_x", "match_type")])
# Append rejected overrides for audit
mapping_full <- reconcile_mapping(rec, include_unused_overrides = TRUE)
Merge two reconciled datasets
Description
After reconciling two datasets with reconcile_data(), use this function
to join them into a single analysis-ready data frame. The reconciliation
mapping table provides the species-level join key, so names that differ
between the two datasets (due to formatting, synonyms, or typos) are
correctly linked.
Usage
reconcile_merge(
reconciliation,
data_x,
data_y,
species_col_x = NULL,
species_col_y = NULL,
how = c("inner", "left", "full"),
suffix = c("_x", "_y"),
drop_unresolved = FALSE
)
Arguments
reconciliation |
A reconciliation object (typically from
|
data_x |
The first data frame (source x in the reconciliation). |
data_y |
The second data frame (source y in the reconciliation). |
species_col_x |
A length-1 character vector. Species column in |
species_col_y |
A length-1 character vector. Species column in |
how |
A length-1 character vector. Join type:
|
suffix |
A length-2 character vector. Suffixes to disambiguate columns with the
same name in both datasets. Default |
drop_unresolved |
Logical. If |
Details
One row per species. reconcile_merge() works best when each dataset
has exactly one row per species. If a species appears in multiple rows
(e.g., sex-specific measurements, repeated populations), the merge
produces all pairwise combinations for that species—the same behaviour
as base merge(). To avoid unexpected row expansion, aggregate to one
row per species before merging, or be aware that the output will contain
more rows than either input.
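Base merge() on invented toy data reproduces the row expansion. Two rows per species on each side yield four joined rows. Aggregating one side first yields two.

```r
# Two rows per species on each side (e.g. sex-specific and site-specific data)
x <- data.frame(species = c("Sp a", "Sp a"), sex = c("F", "M"), wing = c(10, 11))
y <- data.frame(species = c("Sp a", "Sp a"), site = c("north", "south"))
nrow(merge(x, y, by = "species"))   # 4: every x row paired with every y row

# Aggregating x to one row per species first avoids the expansion
x1 <- aggregate(wing ~ species, data = x, FUN = mean)
nrow(merge(x1, y, by = "species"))  # 2
```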
Asymmetric datasets. When data_y contains many more species than
data_x (common when merging against a large reference database), use
how = "inner" or how = "left". Inner joins keep only the species
present in both datasets; left joins keep all data_x rows and fill
data_y columns with NA for unmatched species. Use how = "full"
only when you need to retain species unique to either side.
Recommended workflow for multi-row data. Reconcile using a
species-level summary (one row per species), inspect the mapping with
reconcile_mapping(), then join the mapping back to your full dataset
using the species column as key.
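The last step of that workflow is an ordinary left join. In the base-R sketch below, `mapping` is a toy stand-in for the name_x and name_y columns of a reconcile_mapping() result computed on the species-level summary. The species and values are invented.

```r
# Full data with several rows per species (invented values)
full <- data.frame(
  species = c("Homo_sapiens", "Homo_sapiens", "Corvus corax"),
  value = c(1, 2, 3)
)
# Stand-in for reconcile_mapping() output on unique(full$species)
mapping <- data.frame(
  name_x = c("Homo_sapiens", "Corvus corax"),
  name_y = c("Homo sapiens", "Corvus_corax")
)
# Left join: every original row keeps its measurements and gains name_y
full_keyed <- merge(full, mapping, by.x = "species", by.y = "name_x",
                    all.x = TRUE)
full_keyed
```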
Value
A data frame with a species_resolved column as the join
key, plus all columns from both datasets (with suffixes added when
column names collide).
See Also
reconcile_data() to build the reconciliation;
reconcile_apply() when you want aligned data + tree instead of a
single merged data frame.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(nesttrait_subset)
rec <- reconcile_data(avonet_subset, nesttrait_subset,
x_species = "Species1",
y_species = "Scientific_name",
authority = NULL, quiet = TRUE)
merged <- reconcile_merge(rec, avonet_subset, nesttrait_subset,
species_col_x = "Species1",
species_col_y = "Scientific_name")
cat(sprintf("Merged: %d rows, %d cols\n", nrow(merged), ncol(merged)))
head(merged[, c("species_resolved", "Family1", "Common_name")])
Reconcile several datasets against one phylogenetic tree
Description
Match several trait or occurrence datasets against a single phylogenetic tree in one call. Species that appear in more than one dataset are reconciled once; the combined mapping records which dataset(s) each species belongs to, making it easy to identify the set of species with complete trait coverage.
Usage
reconcile_multi(
datasets,
tree,
species_cols = NULL,
authority = "col",
rank = c("species", "subspecies"),
overrides = NULL,
db_version = NULL,
fuzzy = FALSE,
fuzzy_threshold = 0.9,
resolve = c("flag", "first"),
quiet = FALSE
)
Arguments
datasets |
A named list of data frames. The names are used as
dataset labels (e.g. |
tree |
An |
species_cols |
Character vector. Species column name in each
dataset. If length 1, the same column name is used for every
dataset. Auto-detected from each data frame if |
authority |
A length-1 character vector, or
Five authority codes that earlier versions of the package
advertised — |
rank |
A length-1 character vector. Controls how trinomials are handled during normalisation:
|
overrides |
Optional pre-built corrections. Either a data
frame with at least columns |
db_version |
A length-1 character vector. taxadb
database snapshot to use (e.g. |
fuzzy |
Logical. Enables the fuzzy-matching stage when
|
fuzzy_threshold |
Numeric in [0, 1]. Minimum genus-weighted
similarity score for a fuzzy match to be accepted. Default |
resolve |
A length-1 character vector. What to do with borderline matches:
|
quiet |
Logical. Suppresses progress messages when |
Value
A reconciliation object. The mapping tibble gains one
logical column per input dataset (e.g. in_morpho, in_nests)
indicating which datasets contained each species.
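Finding the species with complete coverage then reduces to combining the per-dataset columns. The sketch below uses a toy stand-in for the mapping, with in_morpho and in_nests columns as for the datasets named morpho and nests in the Examples. The name_x column and the rows are assumptions made for illustration.

```r
# Toy stand-in for the mapping returned by reconcile_multi() with
# datasets = list(morpho = ..., nests = ...); rows are invented.
mapping <- data.frame(
  name_x = c("Sp a", "Sp b", "Sp c"),
  in_morpho = c(TRUE, TRUE, FALSE),
  in_nests = c(TRUE, FALSE, TRUE)
)
dataset_cols <- paste0("in_", c("morpho", "nests"))
# AND the per-dataset columns together: TRUE only where every dataset has it
in_all <- Reduce(`&`, mapping[dataset_cols])
mapping$name_x[in_all]  # "Sp a"
```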
See Also
reconcile_tree() for the single-dataset case;
reconcile_merge() to join two datasets after reconciliation.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(nesttrait_subset)
data(tree_jetz)
datasets <- list(
morpho = avonet_subset,
nests = nesttrait_subset
)
result <- reconcile_multi(datasets, tree_jetz,
species_cols = c("Species1", "Scientific_name"),
authority = NULL)
print(result)
Manually override a single name in a reconciliation
Description
Apply a single hand-curated decision to a reconciliation object.
Use this to accept a match the matching cascade rejected (typically
a flagged fuzzy hit), remove a spurious match, or force a new
mapping that the cascade missed. ("Cascade" here means the
four-stage matching pipeline run by reconcile_tree() and
reconcile_data() — exact, normalised, synonym, fuzzy — as
described in ?prepR4pcm.) The override is recorded in the
provenance log so that you and your reviewers can audit every
manual decision.
Usage
reconcile_override(
reconciliation,
name_x,
name_y = NULL,
action = c("accept", "reject", "replace"),
note = ""
)
Arguments
reconciliation |
A reconciliation object. |
name_x |
A length-1 character vector. The name as it appears in source |
name_y |
A length-1 character vector or |
action |
A length-1 character vector. What the override does:
|
note |
A length-1 character vector. A short justification for the override,
stored in the provenance log and in |
Details
For applying many overrides at once (e.g. from a curated CSV), see
reconcile_override_batch(); for interactive decisions in the
console, see reconcile_review(); for published taxonomy crosswalks,
see reconcile_crosswalk().
Value
An updated reconciliation object. The existing row for
name_x is replaced with one whose match_type is "manual" and
match_source is "user_override".
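The row-level effect described above can be sketched as follows. `apply_override()` is a hypothetical base-R helper that covers only the accept/replace case. It sets match_score to 1 and stores the note in notes, following the reconcile_mapping() column descriptions. The toy rows are invented.

```r
# Hypothetical sketch of the accept/replace case; not the package's code.
apply_override <- function(mapping, name_x, name_y, note = "") {
  i <- which(mapping$name_x == name_x)   # which() skips NA comparisons
  if (length(i) != 1) stop("name_x must identify exactly one row")
  mapping$name_y[i] <- name_y
  mapping$match_type[i] <- "manual"
  mapping$match_source[i] <- "user_override"
  mapping$match_score[i] <- 1            # manual matches score 1
  mapping$notes[i] <- note
  mapping
}

mapping <- data.frame(
  name_x = c("A old", "B c"),
  name_y = c(NA, "B_c"),
  match_type = c("unresolved", "normalized"),
  match_score = c(NA, 1),
  match_source = c(NA, "normalisation"),
  notes = NA_character_
)
updated <- apply_override(mapping, "A old", "A_new",
                          note = "Demo: manual assignment")
updated[1, c("name_y", "match_type", "match_source")]
```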
See Also
reconcile_override_batch() for bulk overrides;
reconcile_suggest() for near-miss candidates;
reconcile_crosswalk() for published taxonomy crosswalks.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL)
# Pick an unresolved species and hand-assign it for illustration
unresolved <- reconcile_mapping(rec)
unresolved <- unresolved[unresolved$match_type == "unresolved" &
unresolved$in_x, ]
if (nrow(unresolved) > 0) {
rec <- reconcile_override(
rec,
name_x = unresolved$name_x[1],
name_y = tree_jetz$tip.label[1],
note = "Demo: manual assignment"
)
}
Apply many manual corrections to a reconciliation at once
Description
A convenience wrapper around reconcile_override() for curated
batches of manual decisions.
Usage
reconcile_override_batch(reconciliation, overrides, quiet = FALSE)
Arguments
reconciliation |
A reconciliation object returned by
|
overrides |
A data frame, or a length-1 character vector giving the path to a CSV file with the same columns:
|
quiet |
Logical. Suppresses per-override success messages when
|
Details
Typical workflow: generate a CSV of corrections (by hand, or with
the help of reconcile_suggest()), check it into version control,
and apply it on every run so the corrections are reproducible and
reviewable.
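Here is a minimal base-R sketch of the file side of that workflow. It writes the corrections with write.csv(), reads them back with read.csv(), and compares the two. The column names follow the batch example below (name_x, name_y, action, note). The rows and the temporary file location are illustrative.

```r
# A small corrections table (invented rows) using the batch-example columns
corrections <- data.frame(
  name_x = c("A old", "B typo"),
  name_y = c("A_new", "B_correct"),
  action = c("replace", "accept"),
  note = c("Taxonomic revision", "Spelling fix")
)
# In a real project this path would be a version-controlled file
path <- file.path(tempdir(), "overrides.csv")
write.csv(corrections, path, row.names = FALSE)

# Read it back on the next run and confirm nothing changed in transit
from_disk <- read.csv(path)
all.equal(from_disk, corrections)
```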
Value
An updated reconciliation object with all overrides applied.
See Also
reconcile_override() for the single-override case;
reconcile_crosswalk() for building an override table from a
published taxonomy crosswalk.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
result <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL)
# Create a batch of overrides from the first two unresolved data names
map <- reconcile_mapping(result)
unresolved_x <- map$name_x[map$match_type == "unresolved" & map$in_x]
batch <- data.frame(
name_x = unresolved_x[1:2],
name_y = tree_jetz$tip.label[1:2],
action = "accept",
note = "Batch demo",
stringsAsFactors = FALSE
)
batch <- batch[!is.na(batch$name_x), ]
if (nrow(batch) > 0) {
result2 <- reconcile_override_batch(result, batch)
}
Plot the match composition of a reconciliation
Description
Draw a one-glance bar or pie chart of how species names were resolved (exact, normalised, synonym, fuzzy, flagged, manual, unresolved). Uses base R graphics only, so no additional packages are required.
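The base-graphics sketch below shows what the chart summarises. It counts a vector of toy match types in a fixed category order and draws them with barplot(). It is an illustration, not the package's plotting code, so labels and colours will differ.

```r
# Toy match types; categories follow the list in the Description
match_type <- c("exact", "exact", "normalized", "fuzzy", "unresolved", "exact")
levels_order <- c("exact", "normalized", "synonym", "fuzzy",
                  "flagged", "manual", "unresolved")
# factor() keeps empty categories, so they appear as zero-height bars
counts <- table(factor(match_type, levels = levels_order))
barplot(counts, las = 2, ylab = "Number of names",
        main = "Match composition")
counts[["exact"]]  # 3
```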
Usage
reconcile_plot(reconciliation, type = c("bar", "pie"), ...)
Arguments
| reconciliation | A reconciliation object returned by |
| type | A length-1 character vector. Plot style: |
| ... | Additional arguments passed on to |
Value
The input reconciliation, invisibly, so you can use the
function in a pipe.
See Also
reconcile_summary() for a textual breakdown;
reconcile_report() for a full HTML audit trail.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL)
reconcile_plot(rec)
reconcile_plot(rec, type = "pie")
Write a self-contained HTML reconciliation report
Description
Produce an HTML file summarising a reconciliation object: provenance metadata, match-type breakdown, full mapping table, and a list of unresolved / flagged species. The file has no external dependencies (CSS is inlined), so it is suitable for sharing with collaborators, pasting into supplementary materials, or archiving next to analysis outputs.
Usage
reconcile_report(
reconciliation,
file,
title = "Reconciliation Report",
open = interactive()
)
Arguments
| reconciliation | A reconciliation object returned by |
| file | A length-1 character vector. Output file path. Must end in |
| title | A length-1 character vector. Report title shown at the top of the page. Default "Reconciliation Report". |
| open | Logical. Open the finished report in the default browser? Defaults to interactive(). |
Value
The file path, invisibly.
Layout
The report opens with a run header (the originating
reconcile_tree() / reconcile_data() call, timestamp, package
version), the match-coverage summary, and a compact bar chart of
match composition. Below those, per-match-type detail tables
(normalised, synonym, fuzzy, flagged) and the unresolved-species
list make each decision auditable. The bird-workflow vignette
includes annotated screenshots of both sections.
See Also
reconcile_summary() for a console equivalent;
reconcile_export() to additionally save aligned data and tree
files.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL)
f <- tempfile(fileext = ".html")
reconcile_report(rec, file = f, open = FALSE)
cat("Report written to:", f, "\n")
Interactively review reconciliation matches
Description
Presents matches one at a time for manual accept/reject decisions in
an interactive R session. Each accepted or rejected match is applied
via reconcile_override(), and the updated reconciliation object is
returned. Useful for auditing fuzzy or flagged matches in the console
or RStudio.
Usage
reconcile_review(
reconciliation,
type = c("flagged", "fuzzy", "all_unresolved"),
suggest = TRUE,
quiet = FALSE
)
Arguments
| reconciliation | A reconciliation object returned by |
| type | A length-1 character vector. Which matches to review: |
| suggest | Logical. If |
| quiet | Logical. If |
Details
This function requires an interactive session. In non-interactive
contexts (e.g., scripts, CI), it warns and returns reconciliation
unchanged.
At each prompt the user may enter:
- a: Accept the proposed match (calls reconcile_override() with action = "accept").
- r: Reject the match (calls reconcile_override() with action = "reject").
- s: Skip; move to the next item without changes.
- q: Quit; return the current state immediately.
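To make the four responses concrete, here is a non-interactive sketch of such a decision loop. review_sketch() is a hypothetical helper, not part of prepR4pcm. The responses vector stands in for what a user would type at each prompt, so the sketch runs without a console. In the real function each "a" or "r" is passed on to reconcile_override() rather than collected in a table.

```r
# Hypothetical sketch of the a / r / s / q loop; not prepR4pcm code.
review_sketch <- function(candidates, responses) {
  action <- rep(NA_character_, nrow(candidates))
  for (i in seq_len(nrow(candidates))) {
    # Replay the recorded response; running out of responses means "quit"
    key <- if (i <= length(responses)) responses[[i]] else "q"
    if (key == "q") break            # quit: keep the decisions made so far
    if (key == "s") next             # skip: leave this match undecided
    action[i] <- switch(key,
                        a = "accept",  # real function: reconcile_override(action = "accept")
                        r = "reject",  # real function: reconcile_override(action = "reject")
                        stop("Unknown response: ", key))
  }
  decided <- !is.na(action)
  data.frame(name_x = candidates$name_x[decided],
             name_y = candidates$name_y[decided],
             action = action[decided],
             stringsAsFactors = FALSE)
}

# Illustrative proposed matches
candidates <- data.frame(
  name_x = c("Corvus corone", "Pica pica", "Garrulus glandarius", "Corvus frugilegus"),
  name_y = c("Corvus_cornix", "Pica_pica", "Garrulus_glandarius", "Corvus_frugilegus"),
  stringsAsFactors = FALSE
)
decisions <- review_sketch(candidates, responses = c("r", "a", "s", "q"))
decisions
```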
Value
An updated reconciliation object reflecting accepted and rejected decisions.
See Also
reconcile_override() and reconcile_override_batch() for
non-interactive corrections; reconcile_suggest() for shortlisting
unresolved species before review.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
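if (interactive()) {
data(avonet_subset)
data(tree_jetz)
# Fuzzy matching is switched on so there are proposed matches to review
result <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL, fuzzy = TRUE)
# Interactive review in the RStudio console:
result <- reconcile_review(result, type = "flagged")
}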
Flag taxonomic splits and lumps in a reconciliation
Description
Taxonomic revisions often split a single species into several or
lump several into one. When your data and your reference taxonomy
disagree on such cases, the reconciliation mapping will show one
name in one source linked to multiple accepted names in the other.
reconcile_splits_lumps() scans a reconciliation for these cases
and returns them as two tibbles, one for splits and one for lumps,
so you can decide how to handle each before running your PCM
(e.g. keep only one of the split taxa, pool traits across a lumped
set, or exclude them entirely).
Usage
reconcile_splits_lumps(reconciliation, quiet = FALSE)
Arguments
| reconciliation | A reconciliation object built with a non-NULL authority. |
| quiet | Logical. Suppresses the console summary when TRUE. |
Details
Detection relies on the name_resolved column populated by
synonym resolution, so authority must have been set (i.e. not
NULL) when building the reconciliation.
Value
Invisibly, a list with two tibbles:
- splits: cases where one name in source x corresponds to multiple accepted names in source y.
- lumps: cases where several names in source x share a single accepted name in source y.
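The counting rule behind the two tibbles can be sketched in base R on a bare mapping table. This is a simplified illustration, not the package's code. It assumes a split is an x-name linked to more than one distinct y-name, and a lump is a y-name linked to more than one distinct x-name. The real function also relies on name_resolved and on synonym rows. The data reuse the hand-built demo from the Examples below.

```r
# Simplified sketch of split / lump detection; not prepR4pcm's implementation.
mapping <- data.frame(
  name_x = c("Acanthiza pusilla", "Acanthiza pusilla",
             "Parus caeruleus", "Cyanistes caeruleus"),
  name_y = c("Acanthiza pusilla", "Acanthiza apicalis",
             "Cyanistes caeruleus", "Cyanistes caeruleus"),
  stringsAsFactors = FALSE
)

# Number of distinct partners each name has in the other source
n_y_per_x <- tapply(mapping$name_y, mapping$name_x, function(v) length(unique(v)))
n_x_per_y <- tapply(mapping$name_x, mapping$name_y, function(v) length(unique(v)))

splits <- names(n_y_per_x)[n_y_per_x > 1]  # one x-name -> several y-names
lumps  <- names(n_x_per_y)[n_x_per_y > 1]  # several x-names -> one y-name
splits
lumps
```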
See Also
reconcile_diff() for comparing two reconciliations,
which surfaces the same splits/lumps across taxonomy versions.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
# `reconcile_splits_lumps()` only surfaces rows that synonym lookup
# resolved (`match_type == "synonym"`), which requires `authority`
# to be non-NULL when building the reconciliation. The bundled-data
# call below uses `authority = NULL` for speed, so the output is
# empty:
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL,
quiet = TRUE)
sl <- reconcile_splits_lumps(rec, quiet = TRUE)
nrow(sl$splits); nrow(sl$lumps) # 0 and 0
# To show what the output looks like when splits and lumps DO turn
# up, we hand-build a tiny reconciliation. In practice you would
# obtain this by calling reconcile_tree(..., authority = "col").
#
# * Acanthiza pusilla (data) was split in CoL into A. pusilla and
# A. apicalis (1 x-name -> 2 y-names ==> split).
# * Parus caeruleus and Cyanistes caeruleus (data: old + new names)
# both map to Cyanistes caeruleus in CoL
# (2 x-names -> 1 y-name ==> lump).
demo_mapping <- tibble::tibble(
name_x = c("Acanthiza pusilla", "Acanthiza pusilla",
"Parus caeruleus", "Cyanistes caeruleus"),
name_y = c("Acanthiza pusilla", "Acanthiza apicalis",
"Cyanistes caeruleus", "Cyanistes caeruleus"),
name_resolved = c("Acanthiza pusilla", "Acanthiza pusilla",
"Cyanistes caeruleus", "Cyanistes caeruleus"),
match_type = "synonym",
match_score = 1,
match_source = "col",
in_x = TRUE,
in_y = TRUE,
notes = NA_character_
)
rec_demo <- structure(
list(mapping = demo_mapping,
meta = list(type = "data_tree", authority = "col"),
counts = list(),
overrides = tibble::tibble()),
class = "reconciliation"
)
sl <- reconcile_splits_lumps(rec_demo, quiet = TRUE)
sl$splits # 1 row: Acanthiza pusilla split into 2 taxa
sl$lumps # 1 row: Parus + Cyanistes lumped into 1 taxon
Suggest near-miss matches for unresolved species
Description
For every species that the four-stage cascade failed to resolve,
reconcile_suggest() returns the top-n candidate matches in the
reference source (y). The cascade is the exact -> normalised ->
synonym -> fuzzy matching process run by reconcile_tree() and
reconcile_data() (see ?prepR4pcm). This is the most efficient
way to audit orphan species: a typo or a species epithet that
drifted by one letter will usually appear near the top of the list,
and you can then feed the fix to reconcile_override() or
reconcile_override_batch().
Usage
reconcile_suggest(reconciliation, n = 3, threshold = 0.7, quiet = FALSE)
Arguments
| reconciliation | A reconciliation object returned by |
| n | Integer. Maximum number of suggestions to return per unresolved species. Default 3. |
| threshold | Numeric in [0, 1]. Minimum weighted similarity score for a candidate to be listed. Default 0.7. |
| quiet | Logical. Suppresses informational messages when TRUE. |
Details
Similarity is computed from the Levenshtein edit distance between normalised names, i.e. the minimum number of single-character insertions, deletions and substitutions needed to turn one name into the other, divided by the length of the longer name and subtracted from 1. The final score weights the genus at 60% and the specific epithet at 40%, which heavily penalises genus-level disagreement while tolerating small epithet differences.
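Here is a minimal base R sketch of that score, using utils::adist() for the Levenshtein distance. It assumes names are already normalised to "Genus epithet" and that genus and epithet are scored separately before weighting. name_similarity() and lev_sim() are hypothetical helpers, not exported prepR4pcm functions.

```r
# Hypothetical re-implementation of the weighted similarity described above.
# Assumes both names are normalised binomials ("Genus epithet").
lev_sim <- function(a, b) {
  # 1 - (Levenshtein distance / length of the longer string)
  d <- utils::adist(a, b)[1, 1]
  1 - d / max(nchar(a), nchar(b))
}

name_similarity <- function(x, y) {
  px <- strsplit(x, " ", fixed = TRUE)[[1]]
  py <- strsplit(y, " ", fixed = TRUE)[[1]]
  0.6 * lev_sim(px[1], py[1]) +   # genus carries 60% of the weight
    0.4 * lev_sim(px[2], py[2])   # specific epithet carries 40%
}

# Identical genus, epithet off by 1 edit out of 14 letters
name_similarity("Corvus brachyrhyncos", "Corvus brachyrhynchos")  # about 0.971
# Identical epithet, genus off by 1 edit out of 6 letters
name_similarity("Corvis corax", "Corvus corax")                   # 0.9
```

A single-letter error costs more in the genus than in the epithet, which is the intended asymmetry.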
For computational efficiency on large trees, reconcile_suggest()
only compares a query name against reference names whose genus is
within 2 character edits of the query genus. This can very
occasionally miss a match where both the genus and the epithet are
badly misspelled simultaneously; if you suspect that, lower the
threshold and inspect manually.
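The genus pre-filter works as a cheap first pass before any full scoring. Below is a base R sketch. It assumes the reference names are normalised binomials. genus_candidates() is a hypothetical helper, and the 2-edit cut-off is taken from the text above.

```r
# Hypothetical sketch of the genus pre-filter; not prepR4pcm code.
genus_of <- function(names) sub(" .*$", "", names)

genus_candidates <- function(query, reference, max_edits = 2) {
  # Levenshtein distance from the query genus to every reference genus
  d <- utils::adist(genus_of(query), genus_of(reference))[1, ]
  reference[d <= max_edits]  # only these go on to full similarity scoring
}

reference <- c("Corvus corax", "Corvus corone", "Corcorax melanorhamphos",
               "Pica pica", "Garrulus glandarius")
genus_candidates("Corvis corax", reference)
```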
Value
A tibble with one row per (unresolved, suggestion) pair:
- unresolved: the unresolved name from source x.
- suggestion: a candidate name from source y.
- score: weighted similarity in [threshold, 1].
Rows are sorted by unresolved then descending score, so the
first suggestion for each name is the best candidate.
See Also
reconcile_override() / reconcile_override_batch() to
act on suggestions; reconcile_review() for an interactive
alternative.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL)
suggestions <- reconcile_suggest(rec, n = 2, threshold = 0.85)
head(suggestions, 10)
Print a reconciliation summary to the console
Description
Produce a human-readable breakdown of a reconciliation object:
how many names matched exactly, how many were rescued by
normalisation, synonymy, or fuzzy matching, and which names remain
unresolved. Usually the second function you call after
reconcile_tree() or reconcile_data().
Usage
reconcile_summary(
reconciliation,
detail = c("full", "brief", "mismatches_only"),
format = c("console", "data.frame"),
file = NULL,
...
)
Arguments
| reconciliation | A reconciliation object returned by |
| detail | A length-1 character vector. How much to show: |
| format | A length-1 character vector. Where the summary goes: |
| file | A length-1 character vector or |
| ... | Additional arguments (currently unused). |
Value
A reconciliation_summary object. The formatted report
is attached to the object and rendered by
print.reconciliation_summary(). R's REPL auto-printing means
that calling the function at the prompt without assignment shows
the full report; assigning the result to a variable shows
nothing until you print(x) (or auto-print x). Use
invisible(reconcile_summary(rec)) to suppress display at the
prompt entirely.
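This printing behaviour follows the usual R pattern for an S3 object that stores a pre-formatted report. The minimal sketch below uses a hypothetical class, toy_summary, with a report field. It only mirrors the idea behind reconciliation_summary and is not its implementation.

```r
# Hypothetical S3 class: the report is built once and rendered on print.
new_toy_summary <- function(counts) {
  report <- sprintf("%-10s %d", names(counts), counts)
  structure(list(counts = counts, report = report), class = "toy_summary")
}

print.toy_summary <- function(x, ...) {
  cat(x$report, sep = "\n")
  invisible(x)  # print methods conventionally return their input invisibly
}

s <- new_toy_summary(c(exact = 3L, normalized = 2L, unresolved = 1L))  # shows nothing
s                                           # auto-prints via print.toy_summary()
invisible(new_toy_summary(c(exact = 1L)))   # suppressed at the prompt
```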
See Also
reconcile_plot() for a visual summary;
reconcile_report() for a shareable HTML audit trail;
reconcile_mapping() for the full per-name tibble.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
x_species = "Species1", authority = NULL)
reconcile_summary(rec, detail = "brief")
reconcile_summary(rec, detail = "mismatches_only")
Reconcile one dataset against multiple phylogenetic trees
Description
Takes a single data frame and matches it against each tree in a named
list, returning one reconciliation object per tree. This is the
standard workflow for generating separate tree-compatible datasets
aligned to different phylogenies (e.g., Clements 2023, 2024, 2025,
Jetz 2012).
Usage
reconcile_to_trees(
x,
trees,
x_species = NULL,
authority = "col",
rank = c("species", "subspecies"),
overrides = NULL,
db_version = NULL,
fuzzy = FALSE,
fuzzy_threshold = 0.9,
resolve = c("flag", "first"),
quiet = FALSE,
x_label = NULL
)
Arguments
| x | A data frame. |
| trees | A named list of |
| x_species | A length-1 character vector. Column name in |
| authority | A length-1 character vector, or |
| | Five authority codes that earlier versions of the package advertised: |
| rank | A length-1 character vector. Controls how trinomials are handled during normalisation: |
| overrides | Optional pre-built corrections. Either a data frame with at least columns |
| db_version | A length-1 character vector. taxadb database snapshot to use (e.g. |
| fuzzy | Logical. Enables the fuzzy-matching stage when TRUE. |
| fuzzy_threshold | Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9. |
| resolve | A length-1 character vector. What to do with borderline matches: |
| quiet | Logical. Suppresses progress messages when TRUE. |
| x_label | A length-1 character vector or |
Details
Species names in x are normalised once and reused across all trees,
so synonym lookups are not repeated.
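Here is a base R sketch of the "normalise once, reuse for every tree" idea. normalise_name() is a hypothetical, deliberately simple normaliser that converts underscores to spaces, collapses whitespace and capitalises the genus. The trees are represented by plain character vectors of tip labels rather than phylo objects. In the same way, a cached synonym lookup would be computed once and reused.

```r
# Hypothetical, simplified normaliser; prepR4pcm's own rules are richer.
normalise_name <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)     # Homo_sapiens -> Homo sapiens
  x <- trimws(gsub("\\s+", " ", x))        # collapse repeated whitespace
  x <- tolower(x)
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))  # "Genus epithet"
}

data_species <- c("Corvus corax", "pica pica", "Garrulus  glandarius")
tips <- list(                               # stand-ins for tree$tip.label
  tree_a = c("Corvus_corax", "Pica_pica"),
  tree_b = c("Corvus_corax", "Garrulus_glandarius", "Cyanopica_cyanus")
)

x_norm <- normalise_name(data_species)     # computed once ...
matched <- lapply(tips, function(tl) {     # ... and reused for every tree
  data_species[x_norm %in% normalise_name(tl)]
})
matched
```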
Value
A named list of reconciliation objects, one per tree, with
the same names as trees.
See Also
reconcile_tree() for the single-tree case;
reconcile_diff() to compare two reconciliations (e.g. to quantify
how many species are gained or lost by switching taxonomies).
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
data(tree_clements25)
results <- reconcile_to_trees(
avonet_subset,
trees = list(jetz = tree_jetz, clements = tree_clements25),
x_species = "Species1",
authority = NULL
)
# Number of exact matches against each tree
sapply(results, function(r) r$counts$n_exact)
Reconcile species names between a dataset and a phylogenetic tree
Description
Match the species in a trait data frame (x) to the tip labels of a
phylogenetic tree (tree), producing a reconciliation object ready
to feed into reconcile_apply(), PGLS, phylogenetic GLMMs, ancestral
state reconstruction, or any other phylogenetic comparative method
(PCM). This is typically the first function you call in a prepR4pcm
workflow.
Usage
reconcile_tree(
x,
tree,
x_species = NULL,
authority = "col",
rank = c("species", "subspecies"),
overrides = NULL,
db_version = NULL,
fuzzy = FALSE,
fuzzy_threshold = 0.9,
flag_threshold = 0.95,
resolve = c("flag", "first"),
quiet = FALSE,
x_label = NULL
)
Arguments
| x | A data frame containing the trait data. Must have one column of scientific names. |
| tree | An |
| x_species | A length-1 character vector. Name of the column in |
| authority | A length-1 character vector, or |
| | Five authority codes that earlier versions of the package advertised: |
| rank | A length-1 character vector. Controls how trinomials are handled during normalisation: |
| overrides | Optional pre-built corrections. Either a data frame with at least columns |
| db_version | A length-1 character vector. taxadb database snapshot to use (e.g. |
| fuzzy | Logical. Enables the fuzzy-matching stage when TRUE. |
| fuzzy_threshold | Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9. |
| flag_threshold | Numeric in [0, 1]. When |
| resolve | A length-1 character vector. What to do with borderline matches: |
| quiet | Logical. Suppresses progress messages when TRUE. |
| x_label | A length-1 character vector or |
Details
Internally, reconcile_tree() treats the tree's tip labels as the
y argument of reconcile_data() and runs the same four-stage
matching cascade (exact -> normalized -> synonym -> fuzzy). Tip labels
typically differ from data names only in formatting (underscores,
capitalisation, authority strings), so even with authority = NULL
you usually recover most matches at the normalized stage. Turn on
fuzzy = TRUE to also catch spelling mistakes.
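The order of the cascade can be illustrated with a small base R sketch that assigns each data name the first stage that matches it. This is a simplification under stated assumptions, not prepR4pcm's implementation. The synonym stage is omitted, as with authority = NULL. The fuzzy stage uses whole-name Levenshtein similarity instead of the genus-weighted score. normalise_name(), similarity() and match_stage() are hypothetical helpers.

```r
# Hypothetical sketch of the cascade order with the synonym stage skipped.
normalise_name <- function(x) {
  x <- tolower(trimws(gsub("_", " ", x, fixed = TRUE)))
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}
# Whole-name similarity: 1 - Levenshtein distance / length of longer name
similarity <- function(a, b) 1 - utils::adist(a, b)[1, 1] / max(nchar(a), nchar(b))

match_stage <- function(name, tips, fuzzy = TRUE, fuzzy_threshold = 0.9) {
  if (name %in% tips) return("exact")                                # stage 1
  if (normalise_name(name) %in% normalise_name(tips)) return("normalized")  # stage 2
  if (fuzzy) {                                                       # stage 4
    scores <- vapply(normalise_name(tips), similarity, numeric(1),
                     b = normalise_name(name))
    if (max(scores) >= fuzzy_threshold) return("fuzzy")
  }
  "unresolved"
}

tips <- c("Corvus_corax", "Pica_pica", "Garrulus_glandarius")
data_names <- c("Corvus_corax", "Pica pica", "Garrulus glandarias", "Cyanopica cyanus")
stages <- vapply(data_names, match_stage, character(1), tips = tips)
stages
```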
After reconciliation, the typical workflow is:
1. Inspect with reconcile_summary() or reconcile_plot().
2. Investigate unresolved names with reconcile_suggest() and fix them with reconcile_override() or reconcile_override_batch().
3. Produce an aligned data frame and pruned tree via reconcile_apply().
4. Optionally, graft orphan species onto the tree with reconcile_augment() (exploratory only; always run sensitivity analyses).
Value
A reconciliation object with meta$type == "data_tree".
The mapping tibble has one row per unique name: matched species
(in_x & in_y), data-only orphans (in_x & !in_y, candidates for
reconcile_augment()), and tree-only orphans (!in_x & in_y,
candidates for reconcile_apply() to prune).
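The three kinds of row can be told apart from the in_x and in_y flags alone. Below is a base R sketch on a toy table. The species column and its values are illustrative; with a real object you would start from reconcile_mapping(rec).

```r
# Toy stand-in for the in_x / in_y columns of a mapping tibble.
mapping <- data.frame(
  species = c("Corvus corax", "Pica pica", "Podoces hendersoni", "Cyanopica cyanus"),
  in_x    = c(TRUE, TRUE, TRUE, FALSE),
  in_y    = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

matched   <- mapping$species[ mapping$in_x &  mapping$in_y]  # in data and tree
data_only <- mapping$species[ mapping$in_x & !mapping$in_y]  # reconcile_augment() candidates
tree_only <- mapping$species[!mapping$in_x &  mapping$in_y]  # pruned by reconcile_apply()
```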
References
Paradis, E. & Schliep, K. (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35:526–528. doi:10.1093/bioinformatics/bty633
See Also
reconcile_apply() to produce an aligned data-tree pair;
reconcile_augment() to add orphan species back to the tree;
reconcile_to_trees() to reconcile against several trees at once;
reconcile_data() for the data-only counterpart.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_trees()
Examples
# Reconcile the bundled AVONET subset against the Jetz et al. (2012)
# bird tree. `authority = NULL` keeps the example offline; in a real
# analysis you would usually set `authority = "col"` (Catalogue of
# Life) to pick up taxonomic synonyms.
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(
avonet_subset, tree_jetz,
x_species = "Species1",
authority = NULL,
fuzzy = TRUE # also catch typos
)
rec # one-line status
reconcile_summary(rec) # full breakdown by match type
# Produce aligned data + pruned tree ready for PGLS / PGLMM
aligned <- reconcile_apply(rec,
data = avonet_subset,
tree = tree_jetz,
species_col = "Species1",
drop_unresolved = TRUE)
nrow(aligned$data)
ape::Ntip(aligned$tree)
Reconcile tip labels between two phylogenetic trees
Description
Compare the tip labels of two phylogenetic trees and report which species are shared, which differ only in formatting or synonymy, and which appear in only one of the two trees. Use this when assessing the impact of switching phylogenies (e.g., Jetz et al. 2012 vs Clements 2025) before deciding which tree to use in a downstream PCM.
Usage
reconcile_trees(
tree1,
tree2,
authority = "col",
rank = c("species", "subspecies"),
overrides = NULL,
db_version = NULL,
fuzzy = FALSE,
fuzzy_threshold = 0.9,
resolve = c("flag", "first"),
quiet = FALSE
)
Arguments
| tree1 | An |
| tree2 | An |
| authority | A length-1 character vector, or |
| | Five authority codes that earlier versions of the package advertised: |
| rank | A length-1 character vector. Controls how trinomials are handled during normalisation: |
| overrides | Optional pre-built corrections. Either a data frame with at least columns |
| db_version | A length-1 character vector. taxadb database snapshot to use (e.g. |
| fuzzy | Logical. Enables the fuzzy-matching stage when TRUE. |
| fuzzy_threshold | Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9. |
| resolve | A length-1 character vector. What to do with borderline matches: |
| quiet | Logical. Suppresses progress messages when TRUE. |
Value
A reconciliation object with meta$type == "tree_tree".
See Also
reconcile_diff() to quantify gains/losses between two
reconciliations; reconcile_to_trees() when you want to match a
single dataset against many trees at once.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree()
Examples
data(tree_jetz)
data(tree_clements25)
rec <- reconcile_trees(tree_jetz, tree_clements25, authority = NULL)
rec
# How many tips are shared across both trees?
sum(reconcile_mapping(rec)$in_x & reconcile_mapping(rec)$in_y)
The reconciliation S3 class
Description
A reconciliation object is the shared data structure that every
matching function in prepR4pcm returns and that every downstream
function consumes. You will rarely need to build one by hand;
call reconcile_tree(), reconcile_data(), reconcile_trees(),
reconcile_to_trees(), or reconcile_multi() instead. This page
documents the structure so you can poke at the internals when
debugging or writing custom helpers.
Usage
new_reconciliation(
mapping,
meta,
counts = NULL,
overrides = NULL,
unused_overrides = NULL
)
Arguments
| mapping | A tibble with the mapping table (see Structure below). |
| meta | A named list of provenance metadata. |
| counts | A named list of summary counts. Computed from |
| overrides | A tibble of manual overrides (empty by default). |
| unused_overrides | A tibble of overrides that could not be applied, with columns |
Value
An object of class reconciliation.
Structure
A reconciliation is an S3 list with five components:
- mapping: a tibble with one row per unique name seen in either source. Columns are documented in reconcile_mapping(): name_x, name_y, name_resolved, match_type (one of "exact", "normalized", "synonym", "fuzzy", "manual", "flagged" or "unresolved", plus "override_unused" when surfaced via reconcile_mapping(include_unused_overrides = TRUE)), match_score, match_source, in_x, in_y and notes.
- meta: a named list of provenance metadata: call signature, timestamp, source labels, taxonomic authority, fuzzy settings, resolve mode, rank and prepR4pcm version.
- counts: a named list of match-type counts, used by the print method and by reconcile_summary().
- overrides: a tibble logging manual corrections applied via reconcile_override() or reconcile_override_batch().
- unused_overrides: a tibble of overrides that the cascade could not apply, with columns name_x, name_y and reason (one of name_x_not_in_data, name_y_not_in_target or already_matched). It is empty when no overrides were supplied or when every override applied successfully. It is surfaced in reconcile_summary(), reconcile_report() (HTML), reconcile_export() (as <prefix>_unused_overrides.csv) and reconcile_mapping(include_unused_overrides = TRUE).
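The three reason codes can be illustrated with a base R sketch that sorts each override into one of them. classify_override() is hypothetical, and the order of the checks is an assumption rather than documented behaviour. An NA reason means the override could be applied.

```r
# Hypothetical sketch mapping overrides onto the documented reason codes.
classify_override <- function(name_x, name_y, data_names, target_names, matched_x) {
  if (!name_x %in% data_names)   return("name_x_not_in_data")
  if (!name_y %in% target_names) return("name_y_not_in_target")
  if (name_x %in% matched_x)     return("already_matched")
  NA_character_                  # NA: the override can be applied
}

data_names   <- c("Corvus corax", "Pica pica", "Garrulus glandarias")
target_names <- c("Corvus_corax", "Pica_pica", "Garrulus_glandarius")
matched_x    <- c("Corvus corax", "Pica pica")   # already resolved by the cascade

overrides <- data.frame(
  name_x = c("Garrulus glandarias", "Corvus corone", "Pica pica", "Pica pica"),
  name_y = c("Garrulus_glandarius", "Corvus_corone", "Pica_pica", "Pica_serica"),
  stringsAsFactors = FALSE
)
overrides$reason <- mapply(classify_override, overrides$name_x, overrides$name_y,
                           MoreArgs = list(data_names = data_names,
                                           target_names = target_names,
                                           matched_x = matched_x),
                           USE.NAMES = FALSE)
unused <- overrides[!is.na(overrides$reason), ]  # analogue of unused_overrides
unused
```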
Methods
Standard S3 methods are defined for print(), summary() (which
dispatches to reconcile_summary()), and format().
Accessing the object
- reconcile_mapping(): extract the per-name tibble.
- reconcile_summary(): human-readable breakdown.
- reconcile_apply(): align data and tree.
- reconcile_merge(): join two datasets.
- reconcile_override() / reconcile_override_batch(): manual corrections.
Clements 2025 phylogenetic tree (subset)
Description
A pruned version of the Clements 2025 taxonomy phylogenetic tree, containing ~850 species from the same families as tree_jetz. It is larger than tree_jetz because the Clements taxonomy recognises more species in these clades. Tip labels use underscores.
Usage
tree_clements25
Format
An object of class phylo (from the ape package).
Source
Clements et al. (2025) eBird/Clements Checklist of Birds of the World, v2025.
Jetz (2012) phylogenetic tree (subset)
Description
A pruned version of the BirdTree Stage 2 maximum clade credibility tree
(Hackett backbone), containing ~660 species from the Corvoidea and
allied passerine families. Deliberately smaller than avonet_subset
(~920 species) so that reconciliation produces unresolved species
suitable for reconcile_augment(). Tip labels use underscores.
Usage
tree_jetz
Format
An object of class phylo (from the ape package).
Source
Jetz et al. (2012) The global diversity of birds in space and time. Nature 491:444–448. doi:10.1038/nature11631
Validate a reconciliation object
Description
Checks that all required components are present and correctly typed.
Usage
validate_reconciliation(reconciliation)
Arguments
| reconciliation | A |
Value
reconciliation, invisibly, if valid. Throws an error otherwise.
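Here is a sketch of the kind of checks such a validator performs, written against the Structure section of the reconciliation class page. validate_sketch() is hypothetical and is not the package source. It treats unused_overrides as optional, an assumption based on the hand-built object in the reconcile_splits_lumps() Examples omitting it.

```r
# Hypothetical validator in the spirit of validate_reconciliation().
validate_sketch <- function(x) {
  required_cols <- c("name_x", "name_y", "name_resolved", "match_type",
                     "match_score", "match_source", "in_x", "in_y", "notes")
  if (!inherits(x, "reconciliation"))
    stop("`x` must inherit from class 'reconciliation'.", call. = FALSE)
  missing_parts <- setdiff(c("mapping", "meta", "counts", "overrides"), names(x))
  if (length(missing_parts) > 0)
    stop("Missing component(s): ", paste(missing_parts, collapse = ", "), call. = FALSE)
  if (!is.data.frame(x$mapping))
    stop("`mapping` must be a data frame.", call. = FALSE)
  missing_cols <- setdiff(required_cols, names(x$mapping))
  if (length(missing_cols) > 0)
    stop("`mapping` lacks column(s): ", paste(missing_cols, collapse = ", "), call. = FALSE)
  if (!is.logical(x$mapping$in_x) || !is.logical(x$mapping$in_y))
    stop("`in_x` and `in_y` must be logical.", call. = FALSE)
  if (!is.list(x$meta))
    stop("`meta` must be a list.", call. = FALSE)
  invisible(x)  # mirrors the documented return value: the input, invisibly
}

ok <- structure(
  list(mapping = data.frame(name_x = "Pica pica", name_y = "Pica_pica",
                            name_resolved = NA_character_, match_type = "normalized",
                            match_score = 1, match_source = NA_character_,
                            in_x = TRUE, in_y = TRUE, notes = NA_character_,
                            stringsAsFactors = FALSE),
       meta = list(type = "data_tree"), counts = list(), overrides = data.frame()),
  class = "reconciliation"
)
validate_sketch(ok)                      # passes silently
bad <- ok
bad$mapping$in_x <- "yes"                # wrong type for a logical flag
tryCatch(validate_sketch(bad), error = conditionMessage)
```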