Forcing non-endemic singletons: solutions and limitations

library(DAISIEprep)
library(ape)
library(phylobase)
#> 
#> Attaching package: 'phylobase'
#> The following object is masked from 'package:ape':
#> 
#>     edges

Problem statement

When using the “asr” extraction method, non-endemic species that are closely phylogenetically related can be potentially erroneously lumped into a single clade. This clade is labelled endemic as the DAISIE model cannot handle non-endemic clades. For example, see the plot below which shows 3 plant species that are grouped into a single clade even though they are non-endemic island species.

set.seed(
  1,
  kind = "Mersenne-Twister",
  normal.kind = "Inversion",
  sample.kind = "Rejection"
)
phylo <- ape::rcoal(10)
phylo$tip.label <- c("Plant_a", "Plant_b", "Plant_c", "Plant_d", "Plant_e",
                     "Plant_f", "Plant_g", "Plant_h", "Plant_i", "Plant_j")
phylo <- phylobase::phylo4(phylo)
endemicity_status <- c("not_present", "not_present", "not_present", 
                       "not_present", "not_present", "not_present",
                       "nonendemic", "nonendemic", "nonendemic",
                       "not_present")
phylod <- phylobase::phylo4d(phylo, as.data.frame(endemicity_status))
phylod <- add_asr_node_states(phylod = phylod, asr_method = "mk", rate_model = "ER", tie_preference = "mainland")
plot_phylod(phylod = phylod)

If you have a priori knowledge that the non-endemic species should be separate colonisations then this can be taken into account when extracting the phylogenetic island community data using extract_island_species(). For example, if you have a group of closely related geographically widespread species, then it is reasonable to assume they did not speciate on the island.

First solution: forcing non-endemic island species to be singletons

The extract_island_species() function has an argument force_nonendemic_singleton, which by default is FALSE which fully trusts the ancestral state reconstruction (see add_asr_node_states()) allowing non-endemics to be grouped into an endemic clade. The force_nonendemic_singleton argument is only active when using the extraction_method = "asr", if using extraction_method = "min" then force_nonendemic_singleton will be ignored. For example:

island_tbl <- extract_island_species(
  phylod = phylod,
  extraction_method = "asr", 
  force_nonendemic_singleton = FALSE
)
island_tbl
#> Class:  Island_tbl 
#>   clade_name  status missing_species  col_time col_max_age branching_times
#> 1    Plant_g endemic               0 0.7648553       FALSE    0.380034....
#>   min_age      species clade_type
#> 1      NA Plant_g,....          1

As you can see the code above extracts only a single colonisation event resulting in an endemic clade.

However, if force_nonendemic_singleton is set to TRUE then closely related non-endemic island species will be split into single-species island colonists (see Limitations section below for cases when this does not work). For example:

island_tbl <- extract_island_species(
  phylod = phylod,
  extraction_method = "asr", 
  force_nonendemic_singleton = TRUE
)
island_tbl
#> Class:  Island_tbl 
#>   clade_name     status missing_species   col_time col_max_age branching_times
#> 1    Plant_g nonendemic               0 0.38003405       FALSE              NA
#> 2    Plant_h nonendemic               0 0.04960523       FALSE              NA
#> 3    Plant_i nonendemic               0 0.04960523       FALSE              NA
#>   min_age species clade_type
#> 1      NA Plant_g          1
#> 2      NA Plant_h          1
#> 3      NA Plant_i          1

In this case, the force_nonendemic_singleton = TRUE now allows non-endemics to be extracted as separate colonisation events to the island.

Limitation

The first limitation of the approach of setting force_nonendemic_singleton to TRUE is it does not work when the non-endemic island species are reconstructed to be embedded within an endemic clade. For example, take a look at the tree below and see how the ancestral state reconstruction has a node on the island that is the ancestor of both the endemic and non-endemic species:

endemicity_status <- c("endemic", "endemic", "endemic", 
                       "endemic", "endemic", "not_present",
                       "nonendemic", "nonendemic", "nonendemic",
                       "not_present")
phylod <- phylobase::phylo4d(phylo, as.data.frame(endemicity_status))
phylod <- add_asr_node_states(phylod = phylod, asr_method = "mk", rate_model = "ER", tie_preference = "mainland")
plot_phylod(phylod = phylod)

In this case even when setting force_nonendemic_singleton = TRUE the clade is still extracted as a single endemic clade.

island_tbl <- extract_island_species(
  phylod = phylod,
  extraction_method = "asr", 
  force_nonendemic_singleton = TRUE
)
#> Warning in rm_nonendemic_in_clade(phylod = phylod, island_tbl = island_tbl): Non-endemic species may be grouped within an endemic clade.
#> force_nonendemic_singleton cannot remove non-endemic species from endemic clades in these cases.
island_tbl
#> Class:  Island_tbl 
#>   clade_name  status missing_species col_time col_max_age branching_times
#> 1    Plant_a endemic               0 1.721423       FALSE    0.764855....
#>   min_age      species clade_type
#> 1      NA Plant_a,....          1

As shown above a warning is printed to make you aware that this issue is known and to alert you of possible erroneous results downstream if using this data, for example fitting the DAISIE model.

A second limitation of forcing the non-endemic island species to be separate colonisation events, and in effect overwriting the ancestral state reconstruction, is that the order of the extraction matters. In other words, the order in which extract_island_species() goes through the phylogeny extracting the species will influence the extracted island community data. Below we show two examples on the same tree where the order of the endemic and non-endemic species is reversed and how this affects the extraction, the extraction starts from the species labelled “Plant_a”.

newick <- "(Plant_e:0.9,((Plant_a:0.3,Plant_b:0.3):0.3,(Plant_c:0.3,Plant_d:0.3):0.3):0.3);"
phylo <- ape::read.tree(text = newick)
phylo <- phylobase::phylo4(phylo)
endemicity_status <- c("not_present", "endemic", "endemic", "nonendemic", 
                       "nonendemic")
phylod <- phylobase::phylo4d(phylo, as.data.frame(endemicity_status))
phylod <- add_asr_node_states(phylod = phylod, asr_method = "mk", rate_model = "ER", tie_preference = "mainland")
plot_phylod(phylod = phylod)

island_tbl <- extract_island_species(
  phylod = phylod,
  extraction_method = "asr", 
  force_nonendemic_singleton = TRUE
)
#> Warning in extract_species_asr(phylod = phylod, species_label = as.character(phylod@label[i]), : Root of the phylogeny is on the island so the colonisation
#>               time from the stem age cannot be collected, colonisation time
#>               will be set to infinite.
#> Warning in rm_nonendemic_in_clade(phylod = phylod, island_tbl = island_tbl): Non-endemic species may be grouped within an endemic clade.
#> force_nonendemic_singleton cannot remove non-endemic species from endemic clades in these cases.
island_tbl
#> Class:  Island_tbl 
#>   clade_name  status missing_species col_time col_max_age branching_times
#> 1    Plant_a endemic               0      Inf       FALSE    0.6, 0.3....
#>   min_age      species clade_type
#> 1      NA Plant_a,....          1

When we reverse the order of the endemic and non-endemic clades within the tree, the extraction changes when force_nonendemic_singleton = TRUE, even though the ancestral state reconstruction is exactly the same.

endemicity_status <- c("not_present", "nonendemic", "nonendemic", "endemic", 
                       "endemic")
phylod <- phylobase::phylo4d(phylo, as.data.frame(endemicity_status))
phylod <- add_asr_node_states(phylod = phylod, asr_method = "mk", rate_model = "ER", tie_preference = "mainland")
plot_phylod(phylod = phylod)

island_tbl <- extract_island_species(
  phylod = phylod,
  extraction_method = "asr", 
  force_nonendemic_singleton = TRUE
)
#> Warning in extract_species_asr(phylod = phylod, species_label = as.character(phylod@label[i]), : Root of the phylogeny is on the island so the colonisation
#>               time from the stem age cannot be collected, colonisation time
#>               will be set to infinite.
#> Warning in rm_nonendemic_in_clade(phylod = phylod, island_tbl = island_tbl): Non-endemic species may be grouped within an endemic clade.
#> force_nonendemic_singleton cannot remove non-endemic species from endemic clades in these cases.
island_tbl
#> Class:  Island_tbl 
#>   clade_name     status missing_species col_time col_max_age branching_times
#> 1    Plant_a nonendemic               0      0.3       FALSE              NA
#> 2    Plant_b nonendemic               0      0.3       FALSE              NA
#> 3    Plant_c    endemic               0      Inf       FALSE             0.3
#>   min_age      species clade_type
#> 1      NA      Plant_a          1
#> 2      NA      Plant_b          1
#> 3      NA Plant_c,....          1

This is a known and unwanted issue that the developers of DAISIEprep are working to fix it. For now we recommend to plot the phylogeny with the tip and node states and check if the non-endemic island species need to be separate colonisations (i.e. force_nonendemic_singleton = TRUE), if so please be cautious of the influence of extraction order.

This second issue of extraction order is only applicable when extraction_method = TRUE and force_nonendemic_singleton = TRUE, when using the min extraction algorithm or force_nonendemic_singleton = FALSE (the default behaviour) then there is no issue with the order of extraction.

Second solution: assigning “DAISIE endemics”

In some cases where one or more species from an endemic island radiation have colonised the mainland you may want to reclassify the non-endemic island species (i.e. on the island and the mainland) as an island endemic. We term these “DAISIE endemics” as they are not true island endemics, but from the perspective of the DAISIE framework they are the result of speciation on the island and later expanded their range to the mainland. If scoring these non-endemic species as non-endemic it can lead to complex biogeographical reconstructions that may end up breaking up an endemic island radiation into multiple lineages (i.e. multiple island colonisations); and also because this may lead to an underestimation of the speciation rates on the island if the DAISIE model is fitted to the data. By classifying those species as “DAISIE endemics” we can avoid these problems.

set.seed(
  1,
  kind = "Mersenne-Twister",
  normal.kind = "Inversion",
  sample.kind = "Rejection"
)
phylo <- ape::rcoal(10)
phylo$tip.label <- c("Plant_a", "Plant_b", "Plant_c", "Plant_d", "Plant_e",
                     "Plant_f", "Plant_g", "Plant_h", "Plant_i", "Plant_j")
phylo <- phylobase::phylo4(phylo)
endemicity_status <- c("endemic", "endemic", "endemic", 
                       "endemic", "nonendemic", "endemic",
                       "not_present", "not_present", "not_present",
                       "not_present")
phylod <- phylobase::phylo4d(phylo, as.data.frame(endemicity_status))
phylod <- add_asr_node_states(phylod = phylod, asr_method = "mk", rate_model = "ER", tie_preference = "mainland")
plot_phylod(phylod = phylod)

By assinging species that were likely endemic before recolonising the mainland or expanding their range to another region as endemic it resolves issues of incorrectly breaking apart island radiations when using the min extraction method, or creating overly complex reconstructions when using add_asr_node_states() and the asr extraction method (although this latter asr approach is less vulnerable to erroneous reconstructions in many of these back-colonisation scenarios).

The example below is the same phylogeny but the “Plant_e” species is now reclassified as endemic (“DAISIE endemic”) for the purposes of reconstruction and extraction as it is judged by the empiricist to be the result of speciation on the island, and thus from the perspective of evolutionary processes on the island it is endemic.

endemicity_status <- c("endemic", "endemic", "endemic", 
                       "endemic", "endemic", "endemic",
                       "not_present", "not_present", "not_present",
                       "not_present")
phylod <- phylobase::phylo4d(phylo, as.data.frame(endemicity_status))
phylod <- add_asr_node_states(phylod = phylod, asr_method = "mk", rate_model = "ER", tie_preference = "mainland")
plot_phylod(phylod = phylod)

In the two cases shown above using the asr extraction would have resulted in the same island community data, but when using larger, more complex phylogenetic and biogeographic data (e.g. birds of Madagascar) using the “DAISIE endemics” approach can help simplify data processing.

Limitations

This approach is only valid if you know the island system well and are sure that the non-endemic species within an island radiation are most likely the result of island speciation. By augmenting the endemicity status of the island species in the way outlined above it introduce biological inaccuracies if not done with care.

In the case of the issue of having non-endemics that you believe to be true non-endemmic island species, and these are embedded within an island radiation of island endemic species, then this issue outlined at the start of this vignette remains and neither reclassifying nor using force_nonendemic_singleton = TRUE will resolve the issue.

Solution are not mutually exclusive

The two solutions covered in this article can be used jointly. For example if you have an endemic radiation with a single non-endemic and a group of closely related non-endemic species in a different part of the phylogeny then the endemicity status can be updated and force_nonendemic_singleton = TRUE when calling extract_island_species(). For example:

set.seed(
  1,
  kind = "Mersenne-Twister",
  normal.kind = "Inversion",
  sample.kind = "Rejection"
)
phylo <- ape::rcoal(10)
phylo$tip.label <- c("Plant_a", "Plant_b", "Plant_c", "Plant_d", "Plant_e",
                     "Plant_f", "Plant_g", "Plant_h", "Plant_i", "Plant_j")
phylo <- phylobase::phylo4(phylo)
endemicity_status <- c("not_present", "nonendemic", "endemic", 
                       "endemic", "endemic", "not_present",
                       "not_present", "nonendemic", "nonendemic",
                       "not_present")
phylod <- phylobase::phylo4d(phylo, as.data.frame(endemicity_status))
phylod <- add_asr_node_states(phylod = phylod, asr_method = "mk", rate_model = "ER", tie_preference = "mainland")
plot_phylod(phylod = phylod)

island_tbl <- extract_island_species(
  phylod = phylod,
  extraction_method = "asr"
)
island_tbl
#> Class:  Island_tbl 
#>   clade_name  status missing_species  col_time col_max_age branching_times
#> 1    Plant_b endemic               0 0.7648553       FALSE    0.061465....
#> 2    Plant_h endemic               0 0.7648553       FALSE    0.049605....
#>   min_age      species clade_type
#> 1      NA Plant_d,....          1
#> 2      NA Plant_h,....          1

In this example the island community data extracted is two endemic clades (by default in extract_island_species() the force_nonendemic_singleton = FALSE).

If we set the non-endemic species in the island radiation to endemic using the “DAISIE endemic” reasoning stated above, and by manually setting force_nonendemic_singleton = TRUE we can acheive, for particular scenarios, be better extraction.

endemicity_status <- c("not_present", "endemic", "endemic", 
                       "endemic", "endemic", "not_present",
                       "not_present", "nonendemic", "nonendemic",
                       "not_present")
phylod <- phylobase::phylo4d(phylo, as.data.frame(endemicity_status))
phylod <- add_asr_node_states(phylod = phylod, asr_method = "mk", rate_model = "ER", tie_preference = "mainland")
plot_phylod(phylod = phylod)

island_tbl <- extract_island_species(
  phylod = phylod,
  extraction_method = "asr",
  force_nonendemic_singleton = TRUE
)
#> Warning in rm_nonendemic_in_clade(phylod = phylod, island_tbl = island_tbl): Non-endemic species may be grouped within an endemic clade.
#> force_nonendemic_singleton cannot remove non-endemic species from endemic clades in these cases.
island_tbl
#> Class:  Island_tbl 
#>   clade_name     status missing_species   col_time col_max_age branching_times
#> 1    Plant_b    endemic               0 0.58496106       FALSE    0.061465....
#> 2    Plant_h nonendemic               0 0.04960523       FALSE              NA
#> 3    Plant_i nonendemic               0 0.04960523       FALSE              NA
#>   min_age      species clade_type
#> 1      NA Plant_d,....          1
#> 2      NA      Plant_h          1
#> 3      NA      Plant_i          1

DAISIEprep development

The DAISIEprep team are actively working on the issues outlined in this vignette and hope to make improvements over the next few months. Please keep an eye on new versions of the DAISIEprep package being released in case improvements become available.