compositional.mle is part of an ecosystem of R packages
that share a single design principle from Structure and
Interpretation of Computer Programs (SICP):
closure—the result of combining things is itself the
same kind of thing, ready to be combined again.
This principle appears at three levels:
| Level | Package | Closure property |
|---|---|---|
| Solvers | compositional.mle | Composing solvers yields a solver |
| Estimates | algebraic.mle | Combining MLEs yields an MLE |
| Tests | hypothesize | Combining hypothesis tests yields a test |
This vignette walks through that story. The summary table at the end lists each package and its CRAN availability.
The core idea of compositional.mle is that solvers are
first-class functions that compose via operators:

- %>>% (sequential chaining): pipe the result of one solver into the next
- %|% (parallel racing): run solvers simultaneously, keep the best
- with_restarts(): retry a solver from multiple starting points

Because each operator returns a solver, you can compose freely:
```r
library(compositional.mle)

# Generate data
set.seed(42)
y <- rnorm(200, mean = 5, sd = 2)

# Define the log-likelihood (numDeriv handles derivatives by default)
ll <- function(theta) {
  mu <- theta[1]; sigma <- theta[2]
  -length(y) * log(sigma) - 0.5 * sum((y - mu)^2) / sigma^2
}

problem <- mle_problem(loglike = ll)

# Coarse-to-fine: grid search for a starting region, then L-BFGS-B
strategy <- grid_search(lower = c(0, 0.5), upper = c(10, 5), n = 5) %>>%
  lbfgsb(lower = c(-Inf, 1e-4), upper = c(Inf, Inf))

result <- strategy(problem, theta0 = c(0, 1))
result
#> MLE Solver Result
#> -----------------
#> Method: grid -> lbfgsb
#> Converged: TRUE
#> Iterations: 10
#> Log-likelihood: -424.84
#>
#> Estimate:
#> [1] 5.0609 1.9498
```
You can race different approaches and let the best log-likelihood win:
```r
race <- bfgs() %|% nelder_mead() %|% gradient_ascent()
result_race <- race(problem, theta0 = c(0, 1))
```

Or combine restarts with chaining:

```r
robust <- with_restarts(
  gradient_ascent() %>>% bfgs(),
  n = 10,
  sampler = uniform_sampler(lower = c(-10, 0.1), upper = c(10, 10))
)
result_robust <- robust(problem, theta0 = c(0, 1))
```

See vignette("strategy-design") for a thorough treatment of solver composition.
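To make the closure property concrete, here is a minimal sketch of how a sequential-chaining operator could be written. This is not compositional.mle's actual implementation of %>>%; it only assumes that a solver is a function of a problem and a starting point, and that params() reads the estimate out of a result:

```r
# Hypothetical sketch (not the package's internals): a "solver" is assumed to
# be function(problem, theta0) returning an mle-like result.
chain_solvers <- function(solver_a, solver_b) {
  # The combination of two solvers is itself a solver -- closure.
  function(problem, theta0) {
    first <- solver_a(problem, theta0)
    solver_b(problem, theta0 = params(first))  # warm-start the second solver
  }
}

# Usage mirrors the %>>% pipeline above:
# coarse_to_fine <- chain_solvers(grid_search(lower = c(0, 0.5), upper = c(10, 5), n = 5),
#                                 lbfgsb(lower = c(-Inf, 1e-4), upper = c(Inf, Inf)))
# result <- coarse_to_fine(problem, theta0 = c(0, 1))
```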
## mle Result Object

Every solver returns an mle object (defined by algebraic.mle). This is the
shared currency of the ecosystem: any package that produces or consumes mle
objects works with every other package.

The mle class provides a uniform interface:
```r
library(algebraic.mle)

# Extract estimates and uncertainty
params(result)      # parameter estimates (theta-hat)
se(result)          # standard errors
vcov(result)        # variance-covariance matrix
loglik_val(result)  # log-likelihood at the MLE
confint(result)     # 95% confidence intervals
#> params(result)
#> [1] 5.0609 1.9498
#>
#> se(result)
#> [1] 0.1379 0.0975
#>
#> vcov(result)
#>           [,1]      [,2]
#> [1,]  0.019009 -0.000007
#> [2,] -0.000007  0.009506
#>
#> loglik_val(result)
#> [1] -424.84
#>
#> confint(result)
#>       2.5 % 97.5 %
#> [1,] 4.7906 5.3312
#> [2,] 1.7587 2.1410
```
The key point: results from any solver have the same
interface. Whether you used Newton-Raphson, a grid search, or a
composed pipeline, params(), se(), and
confint() work the same way.
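As a quick illustration, refit the same problem with a different solver and query it with the same accessors. This assumes, as in the racing example above, that nelder_mead() returns a solver callable as solver(problem, theta0):

```r
# Same accessors, different solver: the interface does not change.
fit_nm <- nelder_mead()(problem, theta0 = c(0, 1))

params(fit_nm)   # parameter estimates, same accessor as before
se(fit_nm)       # standard errors
confint(fit_nm)  # confidence intervals
```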
Suppose three independent labs each estimate the mean of a
Normal\((\mu, \sigma)\) population from
their own data. Each produces an mle object.
algebraic.mle provides mle_weighted() to
combine them via inverse-variance weighting using
Fisher information:
```r
# Three independent datasets from the same population
set.seed(123)
y1 <- rnorm(50, mean = 10, sd = 3)
y2 <- rnorm(100, mean = 10, sd = 3)
y3 <- rnorm(75, mean = 10, sd = 3)

# Fit each independently
make_problem <- function(data) {
  mle_problem(loglike = function(theta) {
    mu <- theta[1]; sigma <- theta[2]
    -length(data) * log(sigma) - 0.5 * sum((data - mu)^2) / sigma^2
  })
}

solver <- lbfgsb(lower = c(-Inf, 1e-4), upper = c(Inf, Inf))
fit1 <- solver(make_problem(y1), theta0 = c(0, 1))
fit2 <- solver(make_problem(y2), theta0 = c(0, 1))
fit3 <- solver(make_problem(y3), theta0 = c(0, 1))

# Combine: inverse-variance weighting via Fisher information
combined <- mle_weighted(list(fit1, fit2, fit3))

# Individual SEs vs combined
data.frame(
  Source   = c("Lab 1 (n=50)", "Lab 2 (n=100)", "Lab 3 (n=75)", "Combined"),
  Estimate = c(params(fit1)[1], params(fit2)[1],
               params(fit3)[1], params(combined)[1]),
  SE       = c(se(fit1)[1], se(fit2)[1], se(fit3)[1], se(combined)[1])
)
#>          Source Estimate     SE
#> 1  Lab 1 (n=50)   10.207 0.4243
#> 2 Lab 2 (n=100)    9.869 0.2998
#> 3  Lab 3 (n=75)    9.753 0.3464
#> 4      Combined    9.934 0.1995
```
The combined estimate has a smaller standard error than any
individual estimate—this is the power of information pooling. And
because mle_weighted() returns an mle object,
you can pass it to any function that accepts MLEs:
confint(), se(), further weighting, or
hypothesis testing. Combining MLEs yields an MLE
(closure).
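If you want to see the inverse-variance arithmetic behind the pooled standard error, you can reproduce it by hand from the per-lab numbers printed above. This is a back-of-the-envelope check, not a call into mle_weighted(), which pools the full parameter vector via Fisher information matrices and so may give a slightly different point estimate than this marginal calculation:

```r
# Inverse-variance pooling for mu by hand: weights are 1 / SE^2,
# and the pooled SE is 1 / sqrt(sum of weights).
ses  <- c(0.4243, 0.2998, 0.3464)   # per-lab SEs for mu from the table above
ests <- c(10.207, 9.869, 9.753)     # per-lab estimates of mu
w    <- 1 / ses^2

sum(w * ests) / sum(w)   # pooled point estimate for mu
1 / sqrt(sum(w))         # pooled SE, roughly 0.20 as in the "Combined" row
```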
## hypothesize

The hypothesize package provides composable hypothesis tests. Like solvers
and estimates, combining tests yields a test.

wald_test() tests whether a single parameter equals a hypothesised value.
lrt() compares nested models via the likelihood ratio:
```r
library(hypothesize)

# Wald test: is mu = 10?
w <- wald_test(estimate = params(combined)[1],
               se = se(combined)[1],
               null_value = 10)
w
#> Hypothesis test ( wald_test )
#> -----------------------------
#> Test statistic: 0.1094
#> P-value: 0.7409
#> Degrees of freedom: 1
#> Significant at 5% level: FALSE
```
```r
# Likelihood ratio test: does sigma differ from 3?
# Compare full model (mu, sigma free) vs restricted (mu free, sigma = 3)
ll_full <- loglik_val(fit2)  # full model log-likelihood

# Restricted log-likelihood, written in the same form as the fitted ll
# (additive constants dropped) so that it is comparable with ll_full
ll_null <- -length(y2) * log(3) - 0.5 * sum((y2 - params(fit2)[1])^2) / 3^2

lr <- lrt(null_loglik = ll_null, alt_loglik = ll_full, dof = 1)
lr
#> Hypothesis test ( likelihood_ratio_test )
#> ------------------------------------------
#> Test statistic: 0.2851
#> P-value: 0.5934
#> Degrees of freedom: 1
#> Significant at 5% level: FALSE
```
When you have independent tests (e.g., each lab tests \(H_0: \mu = 10\)),
fisher_combine() merges them using Fisher’s method:
```r
# Each lab tests H0: mu = 10
w1 <- wald_test(params(fit1)[1], se(fit1)[1], null_value = 10)
w2 <- wald_test(params(fit2)[1], se(fit2)[1], null_value = 10)
w3 <- wald_test(params(fit3)[1], se(fit3)[1], null_value = 10)

# Combine independent tests
combined_test <- fisher_combine(w1, w2, w3)
combined_test
#> Hypothesis test ( fisher_combined_test )
#> -----------------------------------------
#> Test statistic: 3.4287
#> P-value: 0.7534
#> Degrees of freedom: 6
#> Significant at 5% level: FALSE
```
The combined test is itself a hypothesis_test object—you
can extract pval(), check is_significant_at(),
or combine it further. This is the closure property at work.
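Fisher's method itself is simple: for \(k\) independent tests with p-values \(p_1, \dots, p_k\), the statistic \(X = -2 \sum_i \log p_i\) is \(\chi^2\)-distributed with \(2k\) degrees of freedom under the null. A hand check, assuming pval() extracts the p-value from a single test as described above:

```r
# Fisher's method by hand: X = -2 * sum(log(p_i)), chi-squared with 2k df.
p_vals <- c(pval(w1), pval(w2), pval(w3))

X <- -2 * sum(log(p_vals))
c(statistic = X,
  p_value   = pchisq(X, df = 2 * length(p_vals), lower.tail = FALSE))
```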
All hypothesis_test objects share a uniform interface,
mirroring how all mle objects share params() and se().
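For example, the accessors named above apply uniformly across tests. Only the function names come from the package description; the exact signature of is_significant_at() is an assumption:

```r
# Uniform accessors, whatever test produced the object.
pval(w)               # Wald test
pval(lr)              # likelihood ratio test
pval(combined_test)   # Fisher-combined test

# Assumed signature: is_significant_at(test, level)
is_significant_at(combined_test, 0.05)   # FALSE, as in the printed output
```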
Derivatives are pluggable in mle_problem(). The solver
never knows (or cares) where they come from:
```r
# Strategy 1: numDeriv (default) --- zero effort
p1 <- mle_problem(loglike = ll)

# Strategy 2: hand-coded analytic derivatives
p2 <- mle_problem(
  loglike = ll,
  score = function(theta) {
    mu <- theta[1]; sigma <- theta[2]; n <- length(y)
    c(sum(y - mu) / sigma^2,
      -n / sigma + sum((y - mu)^2) / sigma^3)
  },
  fisher = function(theta) {
    n <- length(y); sigma <- theta[2]
    matrix(c(n / sigma^2, 0, 0, 2 * n / sigma^2), 2, 2)
  }
)

# Strategy 3: automatic differentiation
# Any AD package that provides score(f, theta) and hessian(f, theta)
# can be plugged in here, e.g.:
# p3 <- mle_problem(
#   loglike = ll,
#   score = function(theta) ad_pkg::score(ll, theta),
#   fisher = function(theta) ad_pkg::hessian(ll, theta)
# )
```

Each strategy has trade-offs:
| Approach | Accuracy | Effort | Speed |
|---|---|---|---|
| Numerical (numDeriv) | \(O(h^2)\) truncation error | None (default) | Slow (extra evaluations) |
| Hand-coded analytic | Exact | High (error-prone) | Fast |
| Automatic differentiation (AD package) | Exact (machine precision) | Low | Moderate |
The general recommendation: start with the default
(no explicit derivatives) during model development, then switch
to an AD package or hand-coded derivatives when you need
accuracy or performance. For derivative-free problems, use
nelder_mead().
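As an illustration of the derivative-free route, consider a Laplace (double exponential) model, whose log-likelihood has a kink in the location parameter at the sample median and is therefore not differentiable there. This sketch reuses only the mle_problem() and solver interface shown above; the Laplace model is not part of the vignette's running example:

```r
# Laplace (double exponential) model: the log-likelihood is not
# differentiable in mu at the sample median, so use a simplex method.
ll_laplace <- function(theta) {
  mu <- theta[1]; b <- theta[2]
  -length(y) * log(b) - sum(abs(y - mu)) / b
}

p_laplace <- mle_problem(loglike = ll_laplace)
fit_lap   <- nelder_mead()(p_laplace, theta0 = c(0, 1))

params(fit_lap)   # location estimate should land near median(y)
```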
| Situation | Recommended | Reason |
|---|---|---|
| Quick prototyping | numDeriv (default) | Zero effort, just write the log-likelihood |
| Production / accuracy-critical | AD package or hand-coded | Exact derivatives |
| Complex models (mixtures, hierarchical) | AD package | Hand-coding is error-prone and tedious |
| Non-differentiable objective | nelder_mead() | Derivative-free simplex method |
The packages form a pipeline: define a model, solve for the MLE, combine estimates, and test hypotheses—with closure at every level.
| Package | CRAN | Purpose |
|---|---|---|
| algebraic.mle | Yes | MLE algebra: params(), se(), confint(), mle_weighted() |
| algebraic.dist | Yes | Distribution algebra: convolution, mixture, truncation |
| compositional.mle | — | Solver composition: %>>%, %\|%, with_restarts() |
| hypothesize | Yes | Test composition: wald_test(), lrt(), fisher_combine() |
| dfr.dist | — | Distribution-free reliability methods |
The typical workflow:

1. Define the model with mle_problem().
2. Solve it with compositional.mle to get an mle object.
3. Pool estimates with mle_weighted() from algebraic.mle to get a (better) mle object.
4. Test hypotheses with hypothesize to get a hypothesis_test object.
5. Combine tests with fisher_combine() to get a (stronger) hypothesis_test object.

At every step, the result is the same type as the input. That is the SICP closure property, and it is what makes the ecosystem composable.
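Putting the pieces together, the entire pipeline fits in a few lines. This recap reuses only objects and functions already defined in this vignette; nothing here is new API:

```r
# Estimates -> pooled estimate -> per-lab tests -> pooled test.
pooled  <- mle_weighted(list(fit1, fit2, fit3))                # an mle object
tests   <- list(wald_test(params(fit1)[1], se(fit1)[1], null_value = 10),
                wald_test(params(fit2)[1], se(fit2)[1], null_value = 10),
                wald_test(params(fit3)[1], se(fit3)[1], null_value = 10))
overall <- fisher_combine(tests[[1]], tests[[2]], tests[[3]])  # a hypothesis_test object

confint(pooled)   # uncertainty for the pooled MLE
pval(overall)     # combined evidence against H0: mu = 10
```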