Analysis of Disparity: Estimating and Comparing How Variable Phenotype Is

Introduction

This vignette demonstrates how to use the disparity_resample() and disparity_test() functions to estimate and compare morphospace occupation (disparity) between groups. These functions provide complementary approaches for disparity analysis:

disparity_resample(): Provides resampling-based estimates (bootstrap or rarefaction) with confidence intervals
disparity_test(): Performs permutation tests to assess statistical differences between two groups

Disparity analysis is fundamental in geometric morphometrics and evolutionary biology for quantifying how much morphological variation exists within groups. Imagine two populations of the same species. The question here is not whether the average shape of one population differs from the average shape of the other, but whether the amount of variability in shape differs between the two populations.

Basic Concepts

Disparity (also called “morphospace occupation”) can be quantified using several statistics. The ones implemented in GeometricMorphometricsMix are:

Multivariate variance: Sum of variances across all variables (trace of covariance matrix)
Mean pairwise Euclidean distance: Average distance between all pairs of observations
Convex hull volume: Volume of the smallest convex hull containing all observations
Claramunt proper variance: Variance based on linear shrinkage covariance estimates
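
The first two statistics are simple enough to compute by hand, which can help build intuition. The sketch below uses only base R on a small simulated matrix; the object names (X, mv, mpd and so on) are illustrative and are not part of GeometricMorphometricsMix, and this is not meant to reproduce the package's internal code.

```r
set.seed(1)
# Illustrative data: 20 observations, 4 variables
X = matrix(rnorm(20 * 4), nrow = 20, ncol = 4)

# Multivariate variance: sum of the variances of each variable,
# which equals the trace of the covariance matrix
mv = sum(apply(X, 2, var))
mv_trace = sum(diag(cov(X)))

# Mean pairwise Euclidean distance between observations
mpd = mean(dist(X))

# The multivariate variance is also half the mean squared pairwise distance
mv_from_dist = mean(dist(X)^2) / 2

all.equal(mv, mv_trace)
all.equal(mv, mv_from_dist)
```

The last identity shows how the two statistics relate: multivariate variance averages squared pairwise distances, whereas the mean pairwise Euclidean distance averages the distances themselves.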

Statistical approaches

GeometricMorphometricsMix provides three main statistical approaches, all based on resampling, for comparing disparity across groups:

Bootstrapping (resampling with replacement) of the statistic of choice to derive confidence intervals (this is the most sensible choice in most cases, particularly with more than two groups)
Rarefaction (resampling without replacement to a common sample size) to account for differences in sample sizes between groups (this is useful when groups have very different sample sizes and we use statistics sensitive to sample size)
Permutation tests between two groups to assess whether they differ significantly in disparity (or, more formally, to test the null hypothesis that they have the same amount of variation)

Bootstrapping and rarefaction are implemented in the disparity_resample() function, permutation tests in the disparity_test() function.

Simulate Example Data

set.seed(123)

if (requireNamespace("MASS", quietly = TRUE)) {
  # Group A: smaller, more compact group
  grpA = MASS::mvrnorm(25, mu = rep(0, 8), Sigma = diag(8) * 0.5)
  # Group A: 25 observations, centered at origin, low variance
  
  # Group B: larger, more dispersed group  
  grpB = MASS::mvrnorm(40, mu = rep(2, 8), Sigma = diag(8) * 1.5)
  # Group B: 40 observations, shifted mean, higher variance
  
  # Group C: intermediate size and dispersion
  grpC = MASS::mvrnorm(30, mu = rep(-1, 8), Sigma = diag(8) * 1.0)
  # Group C: 30 observations, negative mean, intermediate variance
  
  # Combine data
  Data = rbind(grpA, grpB, grpC)
  groups = factor(c(rep("A", nrow(grpA)), rep("B", nrow(grpB)), rep("C", nrow(grpC))))
  # Combined dataset with group labels
  
  cat("Sample sizes:\n")
  table(groups)
  # Display sample sizes for each group
}
#> Sample sizes:
#> groups
#>  A  B  C 
#> 25 40 30

Bootstrap Analysis

Bootstrap resampling provides confidence intervals for disparity estimates by resampling with replacement from the original data.

if (requireNamespace("MASS", quietly = TRUE)) {
  # Bootstrap multivariate variance
  boot_mv = disparity_resample(Data, 
                               group = groups, 
                               n_resamples = 1000,
                               statistic = "multivariate_variance",
                               bootstrap_rarefaction = "bootstrap",
                               CI = 0.95)
  # Bootstrap analysis of multivariate variance with 95% CI
  
  print(boot_mv)
  # Display formatted results with CI overlap assessment
  
  # Bootstrap mean pairwise Euclidean distance
  boot_ed = disparity_resample(Data,
                               group = groups,
                               n_resamples = 1000, 
                               statistic = "mean_pairwise_euclidean_distance",
                               bootstrap_rarefaction = "bootstrap")
  # Bootstrap analysis of mean pairwise Euclidean distances
  
  cat("\nMean pairwise Euclidean distance results:\n")
  boot_ed$results
  # Direct access to results table
}
#> Disparity resampling results
#> ===========================
#> 
#> Statistic: Multivariate variance 
#> Confidence level: 95% 
#> 
#>  group   average   CI_min    CI_max
#>      A  3.408176 2.764179  4.099788
#>      B 11.507280 9.648819 13.493110
#>      C  7.817262 6.375453  9.238869
#> 
#> Confidence interval overlap assessment:
#> At least one pair of confidence intervals does not overlap.
#> 
#> 
#> Mean pairwise Euclidean distance results:
#>   group  average   CI_min   CI_max
#> A     A 2.470406 2.206519 2.703814
#> B     B 4.585098 4.198769 4.956907
#> C     C 3.767456 3.380240 4.105780
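
For readers who want to see the idea behind these numbers, the bootstrap can be sketched in a few lines of base R. This is a minimal illustration for a single group, assuming percentile intervals; the helper multivar() and the other object names are hypothetical, and disparity_resample() may compute its intervals differently.

```r
set.seed(1)
# Illustrative data: 30 observations, 5 variables
X = matrix(rnorm(30 * 5), nrow = 30, ncol = 5)

# Statistic of interest: multivariate variance (hypothetical helper)
multivar = function(M) sum(apply(M, 2, var))

# Resample rows with replacement and recompute the statistic each time
n_resamples = 1000
boot_values = replicate(n_resamples, {
  idx = sample(nrow(X), replace = TRUE)
  multivar(X[idx, , drop = FALSE])
})

# Percentile interval (an assumption of this sketch)
ci = quantile(boot_values, probs = c(0.025, 0.975))
c(average = mean(boot_values), ci)
```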

Rarefaction Analysis

Rarefaction resampling accounts for differences in sample sizes by resampling without replacement to a common sample size.

if (requireNamespace("MASS", quietly = TRUE) && requireNamespace("geometry", quietly = TRUE)) {
  # Rarefaction of convex hull volume
  rare_hull = disparity_resample(prcomp(Data)$x[,seq(3)],
                                 group = groups,
                                 n_resamples = 200,
                                 statistic = "convex_hull_volume",
                                 bootstrap_rarefaction = "rarefaction",
                                 sample_size = "smallest")
  # Rarefaction analysis of convex hull volume
  # Note: fewer resamples are used because the computation is intensive, and only
  # the scores on the first three principal components are analysed because
  # convex hulls can be problematic with high-dimensional data

  print(rare_hull)
  # Convex hull results - Group B should have largest volume
}
#> Disparity resampling results
#> ===========================
#> 
#> Statistic: Convex hull volume 
#> Confidence level: 95% 
#> 
#>  group   average    CI_min    CI_max
#>      A  5.202352  5.202352  5.202352
#>      B 51.499119 36.647855 63.410444
#>      C 19.587730 14.935754 22.268621
#> 
#> Confidence interval overlap assessment:
#> At least one pair of confidence intervals does not overlap.
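
Note that the interval for group A in the output above collapses to a single value. This is expected: with sample_size = "smallest", every resample draws 25 observations without replacement, and group A has exactly 25, so each resample is the whole group. The base R sketch below reproduces this behaviour. It uses multivariate variance instead of convex hull volume to avoid extra dependencies, and the helpers multivar() and rarefy() are hypothetical names, not package functions.

```r
set.seed(1)
# Illustrative groups of unequal size (3 variables each)
small = matrix(rnorm(25 * 3), ncol = 3)
large = matrix(rnorm(40 * 3, sd = 2), ncol = 3)

multivar = function(M) sum(apply(M, 2, var))

# Draw 'size' rows without replacement and recompute the statistic
rarefy = function(M, size, n_resamples = 200) {
  replicate(n_resamples, {
    idx = sample(nrow(M), size = size, replace = FALSE)
    multivar(M[idx, , drop = FALSE])
  })
}

target = min(nrow(small), nrow(large))  # common sample size: 25
rare_small = rarefy(small, target)
rare_large = rarefy(large, target)

# The smallest group is redrawn in full every time, so its values
# do not vary (up to floating-point rounding)
diff(range(rare_small))
# The larger group is subsampled, so its values do vary
diff(range(rare_large))
```

In practice this means that, under rarefaction to the smallest sample size, the smallest group contributes a point estimate rather than an interval.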

Visualization

The plot method creates confidence interval plots for visual comparison of disparity estimates.

# Plot bootstrap multivariate variance results
plot(boot_mv)

# Plot rarefaction results for convex hull volume
plot(rare_hull)

cat("Plotting methods create ggplot2 confidence interval plots\n")
#> Plotting methods create ggplot2 confidence interval plots
cat("showing average values and confidence intervals for each group.\n")
#> showing average values and confidence intervals for each group.

Permutation Tests Between Two Groups

The disparity_test() function performs permutation tests to assess whether two groups differ significantly in disparity.

if (requireNamespace("MASS", quietly = TRUE)) {
  # Test Groups A vs B (different variances expected)
  test_AB = disparity_test(grpA, grpB, perm = 999)
  # Permutation test between groups A and B
  
  cat("Groups A vs B comparison:\n")
  print(test_AB)
  # Results show observed values, differences, and p-values
  
  # Test Groups A vs C (more similar variances expected)  
  test_AC = disparity_test(grpA, grpC, perm = 999)
  # Permutation test between groups A and C
  
  cat("\nGroups A vs C comparison:\n")
  print(test_AC)
  # Compare groups with more similar dispersions
}
#> Groups A vs B comparison:
#>                                  Observed_grp1 Observed_grp2 difference p_value
#> Multivariate variance                 3.538886     11.795823   8.256936   0.001
#> Mean pairwise Euclidean distance      2.578045      4.699841   2.121796   0.001
#> 
#> Groups A vs C comparison:
#>                                  Observed_grp1 Observed_grp2 difference p_value
#> Multivariate variance                 3.538886      8.053470   4.514584   0.001
#> Mean pairwise Euclidean distance      2.578045      3.891326   1.313282   0.001
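
The logic of a permutation test for disparity can be sketched in base R as follows. This is only an illustration under stated assumptions: groups are centred on their own means before labels are shuffled, the test statistic is the absolute difference in multivariate variance, and the p-value counts the observed value as one of the permutations. Whether disparity_test() makes the same choices is not confirmed here, and all object names are hypothetical.

```r
set.seed(1)
# Illustrative groups with different dispersion and different means
g1 = matrix(rnorm(25 * 4, mean = 0, sd = 1), ncol = 4)
g2 = matrix(rnorm(40 * 4, mean = 2, sd = 2), ncol = 4)

multivar = function(M) sum(apply(M, 2, var))

# Centre each group on its own mean so that shuffling labels does not
# mix differences in mean into the variance (assumption of this sketch)
g1c = scale(g1, center = TRUE, scale = FALSE)
g2c = scale(g2, center = TRUE, scale = FALSE)
pooled = rbind(g1c, g2c)
n1 = nrow(g1c)

observed = abs(multivar(g1c) - multivar(g2c))

# Shuffle group membership and recompute the difference each time
perm = 999
perm_diffs = replicate(perm, {
  idx = sample(nrow(pooled))
  abs(multivar(pooled[idx[1:n1], , drop = FALSE]) -
      multivar(pooled[idx[-(1:n1)], , drop = FALSE]))
})

# The observed value counts as one of the permutations
p_value = (sum(perm_diffs >= observed) + 1) / (perm + 1)
p_value
```

Under this convention, the smallest p-value attainable with 999 permutations is 1/1000 = 0.001, so a reported value of 0.001 means that no permuted difference was as large as the observed one.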

Univariate Data Analysis

disparity_resample() also works with univariate data (vectors), in which case the variance is used as the statistic.

# Simulate univariate data
set.seed(456)
uni_A = rnorm(30, mean = 0, sd = 1)
# Group A: normal distribution, sd=1

uni_B = rnorm(35, mean = 0, sd = 2) 
# Group B: normal distribution, sd=2 (higher variance)

uni_data = c(uni_A, uni_B)
uni_groups = factor(c(rep("A", length(uni_A)), rep("B", length(uni_B))))
# Combined univariate dataset

# Bootstrap analysis of univariate variance
uni_boot = disparity_resample(uni_data,
                              group = uni_groups,
                              n_resamples = 1000,
                              bootstrap_rarefaction = "bootstrap")
# Bootstrap for univariate data (statistic argument ignored)

cat("Univariate variance analysis:\n")
#> Univariate variance analysis:
print(uni_boot)
#> Disparity resampling results
#> ===========================
#> 
#> Statistic: Variance 
#> Confidence level: 95% 
#> 
#>  group  average    CI_min   CI_max
#>      A 1.292491 0.8856887 1.756390
#>      B 3.187117 1.5397112 5.149367
#> 
#> Confidence interval overlap assessment:
#> All confidence intervals overlap.
# Group B should show higher variance

# Plot univariate bootstrap results
plot(uni_boot)

Advanced: Single Group Analysis

disparity_resample() can analyze single groups without group comparisons. This might be useful to obtain confidence intervals or estimates in a single group to compare it to a known value or interval.

if (requireNamespace("MASS", quietly = TRUE)) {
  # Single group bootstrap analysis
  single_boot = disparity_resample(grpB,
                                   n_resamples = 500,
                                   statistic = "multivariate_variance",
                                   bootstrap_rarefaction = "bootstrap")
  # Analysis of Group B alone
  
  cat("Single group analysis (Group B):\n")
  print(single_boot)
  # Confidence interval for single group disparity
}
#> Single group analysis (Group B):
#> Disparity resampling results
#> ===========================
#> 
#> Statistic: Multivariate variance 
#> Confidence level: 95% 
#> 
#>  group  average   CI_min   CI_max
#>    All 11.50473 9.543089 13.35472
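
As a worked example of comparing a single-group interval with a known value: Group B was simulated with Sigma = diag(8) * 1.5, so the multivariate variance of the generating distribution is 8 * 1.5 = 12. The short check below, with interval limits copied by hand from the printed output above, asks whether that value falls inside the bootstrap interval. The object names are illustrative.

```r
# Expected multivariate variance for Group B: trace of diag(8) * 1.5
expected_mv = sum(diag(diag(8) * 1.5))   # 8 * 1.5 = 12

# Interval limits copied from the printed output above
ci = c(CI_min = 9.543089, CI_max = 13.35472)

# Does the interval contain the value used in the simulation?
inside = expected_mv >= ci[["CI_min"]] && expected_mv <= ci[["CI_max"]]
inside
```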

Practical Considerations

Sample Size Effects

Bootstrap: Appropriate when groups have sufficient sample sizes
Rarefaction: Useful when comparing groups with different sample sizes and statistics sensitive to outliers/sample size
Convex hull: Requires substantially more observations than variables. One often needs to restrict the analysis to scores along a subset of principal components.
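
To illustrate what "sensitive to sample size" means, the sketch below uses a one-dimensional stand-in: in one dimension the convex hull "volume" is simply the range. It compares how the average range and the average variance behave when samples of different sizes are drawn from the same distribution. The helpers hull_1d() and summarise_n() are hypothetical names for this illustration only.

```r
set.seed(1)
# In one dimension the convex hull "volume" is simply the range
hull_1d = function(x) diff(range(x))

# Average range and variance over many samples of a given size,
# all drawn from the same standard normal distribution
summarise_n = function(n, n_sim = 500) {
  sims = replicate(n_sim, {
    x = rnorm(n)
    c(range = hull_1d(x), variance = var(x))
  })
  rowMeans(sims)
}

res = rbind(n10 = summarise_n(10), n100 = summarise_n(100))
res
```

The range grows with sample size even though the underlying distribution is unchanged, whereas the variance stays close to 1. This is the kind of situation in which rarefaction to a common sample size is helpful.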

Statistic Selection

Multivariate variance: Most commonly used, less sensitive to outliers. Also called “sum of univariate variances” and, in geometric morphometrics, “Procrustes variance”
Mean pairwise distance: Alternative measure based on unsquared distances between observations; can be more robust to extreme observations
Convex hull volume: Sensitive to outliers but captures occupied space
Claramunt proper variance: Accounts for covariance structure and for how “spread out” variation is across orthogonal dimensions (principal components)
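
One way to see how the first two statistics differ in their response to extreme observations is to add a single outlier to a simulated sample and compare the proportional change in each statistic. The base R sketch below does this; the data, the position of the outlier and the helper names are all illustrative assumptions.

```r
set.seed(1)
# Illustrative data: 30 observations in 2 dimensions, plus one extreme point
X = matrix(rnorm(30 * 2), ncol = 2)
X_out = rbind(X, c(20, 20))

multivar = function(M) sum(apply(M, 2, var))
mean_pairwise = function(M) mean(dist(M))

# Proportional increase caused by the single outlier
ratio_mv  = multivar(X_out) / multivar(X)
ratio_mpd = mean_pairwise(X_out) / mean_pairwise(X)
c(multivariate_variance = ratio_mv, mean_pairwise_distance = ratio_mpd)
```

Because multivariate variance is built on squared deviations, the outlier inflates it proportionally more than it inflates the mean pairwise distance.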

Interpretation Guidelines

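Non-overlapping confidence intervals suggest different disparity levels; overlapping intervals do not by themselves demonstrate equal disparity
Permutation test p-values < 0.05 indicate significant differences at the conventional 5% level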
Consider biological relevance alongside statistical significance
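
The printed overlap assessment only states whether at least one pair of intervals fails to overlap. To see which pairs are involved, the intervals can be compared directly. The sketch below does this in base R for the bootstrap multivariate variance intervals, with values copied by hand from the output in the Bootstrap Analysis section; the helper overlaps() is a hypothetical name, not a package function.

```r
# Bootstrap intervals for multivariate variance, copied from the output above
ci = data.frame(group  = c("A", "B", "C"),
                CI_min = c(2.764179, 9.648819, 6.375453),
                CI_max = c(4.099788, 13.493110, 9.238869))

# Two intervals overlap when each starts before the other ends
overlaps = function(i, j) {
  ci$CI_min[i] <= ci$CI_max[j] && ci$CI_min[j] <= ci$CI_max[i]
}

# All pairs of groups, one pair per column
idx = combn(nrow(ci), 2)
result = data.frame(
  pair = paste(ci$group[idx[1, ]], ci$group[idx[2, ]], sep = "-"),
  overlap = mapply(overlaps, idx[1, ], idx[2, ])
)
result
```

For these values none of the three pairs overlaps, which is consistent with the increasing variances used to simulate groups A, C and B.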

Summary

The disparity_resample() and disparity_test() functions provide comprehensive tools for morphospace disparity analysis:

disparity_resample() offers flexible resampling approaches with confidence intervals
disparity_test() provides formal statistical tests between two groups
Both functions support multiple disparity statistics and handle various data types
S3 methods enable convenient printing and plotting of results

These tools support robust comparative analysis of morphological variation across groups, time periods, or experimental conditions in evolutionary and morphometric studies.