Suppose we observe \(n\) positive continuous responses \(y_1, \dots, y_n\) — waiting times, survival times, monetary amounts — drawn from a Gamma distribution with known shape \(k > 0\) and unknown rate \(\beta > 0\):
\[ Y_i \mid \beta \;\sim\; \mathrm{Gamma}(k,\, \beta), \qquad i = 1,\dots,n. \]
The mean response is \(\mu = k/\beta\) and the variance is \(k/\beta^2\). The log-likelihood (up to constants) is:
\[ \ell(\beta) \;\propto\; n k \log\beta - \beta\,\textstyle\sum_i y_i. \]
Why fix the shape? Conjugacy requires \(k\) to be treated as known. In practice you
either know it from domain knowledge (e.g., exponential responses have
\(k = 1\)) or estimate it by
method-of-moments: \(\hat k = \bar y^2 /
s^2\). When both \(k\) and \(\beta\) are unknown, conjugacy breaks and a
rejection sampler is needed — that case is handled by
Gamma(log) with glmbdisp().
The special case \(k = 1\) gives the Exponential distribution, the natural model for inter-event waiting times under a constant-hazard assumption.
Because \(\beta > 0\), the Gamma distribution is again the natural conjugate prior:
\[ \beta \;\sim\; \mathrm{Gamma}(\alpha_0,\, b_0), \]
with prior mean \(\alpha_0 / b_0\) and variance \(\alpha_0 / b_0^2\).
Relationship to the dispersion. In GLM terms the
Gamma dispersion is \(\phi = 1/k\), so
the shape \(k\) is the precision \(1/\phi\). Specifying
lik_shape = k in
dGamma(Inv_Dispersion = FALSE) is equivalent to specifying
a known dispersion.
The “identity” link. When you pass
family = Gamma(link = "identity") to
glmb(), the single regression coefficient
is the rate \(\beta\) directly. This
matches the conjugate parameterization.
Multiply the likelihood kernel \(\beta^{n k}\,e^{-\beta \sum y_i}\) by the prior kernel \(\beta^{\alpha_0 - 1}\,e^{-b_0 \beta}\):
\[ p(\beta \mid y) \;\propto\; \beta^{(\alpha_0 + n k) - 1}\,e^{-(b_0 + \sum_i y_i)\,\beta}. \]
This is a Gamma kernel, giving the conjugate posterior:
\[ \boxed{\beta \mid y \;\sim\; \mathrm{Gamma}\!\left(\alpha_0 + n k,\;\; b_0 + \textstyle\sum_i y_i\right).} \]
Update rule:
The posterior mean rate is \((\alpha_0 + n k)/(b_0 + \sum y_i)\), and the posterior mean response is:
\[ \mathbb{E}[\mu \mid y] \;\approx\; \frac{k}{\mathbb{E}[\beta \mid y]} \;=\; \frac{k(b_0 + \sum_i y_i)}{\alpha_0 + n k}. \]
The bayesrules package does not
currently include a plot_gamma_gamma() helper, so we plot
the prior, the (normalized) likelihood, and the posterior directly with
curve().
Scenario. You record the waiting time between calls to a customer service hotline. Exponential waiting times (\(k = 1\)) are a standard model. You place a \(\Gamma(2, 4)\) prior on the call rate \(\beta\) (prior mean = 0.5 calls/minute). You then observe \(n = 20\) inter-arrival times with sum \(\sum y = 35\) minutes.
alpha0 <- 2; b0 <- 4 ## Gamma(2, 4) prior: mean = 0.5
k <- 1 ## known shape (exponential)
n_obs <- 20L; sum_y <- 35 ## data: 20 observations, sum = 35
post_shape <- alpha0 + n_obs * k
post_rate <- b0 + sum_y
ggamma_analytic <- data.frame(
Example = "Call inter-arrivals (illustration)",
n = n_obs,
`sum(y)` = sum_y,
Posterior = sprintf("Gamma(%d, %d)", post_shape, post_rate),
`Mean (rate)` = post_shape / post_rate,
`SD (rate)` = sqrt(post_shape) / post_rate,
check.names = FALSE
)
knitr::kable(ggamma_analytic, digits = 4,
caption = "Conjugate Gamma--Gamma posterior for rate beta")| Example | n | sum(y) | Posterior | Mean (rate) | SD (rate) |
|---|---|---|---|---|---|
| Call inter-arrivals (illustration) | 20 | 35 | Gamma(22, 39) | 0.5641 | 0.1203 |
beta_grid <- seq(0.01, 1.5, length.out = 500)
## Normalized likelihood: Gamma(n*k + 1, sum_y) shape, treated as density
lik_unnorm <- dgamma(beta_grid, shape = n_obs * k + 1, rate = sum_y)
## Scale likelihood to have same peak height as the prior (visual only)
prior_vals <- dgamma(beta_grid, alpha0, b0)
post_vals <- dgamma(beta_grid, post_shape, post_rate)
lik_scaled <- lik_unnorm * (max(prior_vals) / max(lik_unnorm))
plot(beta_grid, prior_vals, type = "l", lwd = 2, col = "steelblue",
xlab = expression(beta), ylab = "Density",
main = "Gamma–Gamma update: prior, likelihood, posterior",
ylim = c(0, max(post_vals) * 1.1))
lines(beta_grid, lik_scaled, lwd = 2, col = "darkgreen", lty = 2)
lines(beta_grid, post_vals, lwd = 2, col = "tomato")
legend("topright",
legend = c("Prior Gamma(2, 4)",
"Likelihood (scaled)",
sprintf("Posterior Gamma(%d, %d)", post_shape, post_rate)),
col = c("steelblue", "darkgreen", "tomato"),
lty = c(1, 2, 1), lwd = 2, bty = "n")
abline(v = alpha0 / b0, lty = 3, col = "steelblue")
abline(v = post_shape / post_rate, lty = 3, col = "tomato")The posterior is pulled from the prior mean (0.5) toward the data-based MLE (\(n/\sum y = 20/35 \approx 0.57\)) and is considerably tighter than either.
The bayesrules::cherry_blossom_sample dataset records
net finishing times (minutes) for runners in a 10-mile cherry blossom
race (Johnson et al. 2022). Finishing
times are continuous and positive; we model them as Gamma-distributed
with a method-of-moments estimate of the shape \(k\).
library(bayesrules)
## Use net finishing times (minutes); drop missing values
y_cb <- cherry_blossom_sample$net
y_cb <- y_cb[is.finite(y_cb)]
n_cb <- length(y_cb)
ybar_cb <- mean(y_cb)
s2_cb <- var(y_cb)
## Method-of-moments estimate of shape k: k_hat = ybar^2 / s^2
k_hat_cb <- ybar_cb^2 / s2_cb
cat(sprintf("n = %d, mean = %.2f min, var = %.2f, k_hat = %.3f\n",
n_cb, ybar_cb, s2_cb, k_hat_cb))
#> n = 185, mean = 89.95 min, var = 195.44, k_hat = 41.395The estimated shape \(\hat k \approx\) 41.4 is treated as fixed for the conjugate analysis.
Prior. Based on general knowledge that competitive 10-mile race times cluster around 85–100 minutes, we set a \(\Gamma(5, 0.06)\) prior on the rate \(\beta\) (prior mean rate \(\approx 0.083\), corresponding to a prior mean time \(k/\beta \approx 85\) minutes for the estimated \(k\)).
## Set prior centered at rate corresponding to ~85 min mean finish time
## E[mu] = k / beta_prior => beta_prior = k_hat / 85
alpha0_cb <- 5
b0_cb <- alpha0_cb / (k_hat_cb / ybar_cb) ## prior mean rate = k_hat / ybar
cat(sprintf("Prior: Gamma(%.2f, %.4f), prior mean rate = %.4f, implied mean time = %.1f min\n",
alpha0_cb, b0_cb,
alpha0_cb / b0_cb,
k_hat_cb / (alpha0_cb / b0_cb)))
#> Prior: Gamma(5.00, 10.8644), prior mean rate = 0.4602, implied mean time = 89.9 minpost_shape_cb <- alpha0_cb + n_cb * k_hat_cb
post_rate_cb <- b0_cb + sum(y_cb)
post_mean_rate_cb <- post_shape_cb / post_rate_cb
post_sd_rate_cb <- sqrt(post_shape_cb) / post_rate_cb
post_mean_time_cb <- k_hat_cb / post_mean_rate_cb
cb_analytic <- data.frame(
Dataset = "Cherry blossom finish times",
n = n_cb,
`k (fixed)` = k_hat_cb,
Posterior = sprintf("Gamma(%.2f, %.4f)", post_shape_cb, post_rate_cb),
`Mean (rate beta)` = post_mean_rate_cb,
`SD (rate beta)` = post_sd_rate_cb,
`Mean time (min)` = post_mean_time_cb,
check.names = FALSE
)
knitr::kable(cb_analytic, digits = 4,
caption = "Conjugate Gamma--Gamma posterior for rate beta (shape--rate)")| Dataset | n | k (fixed) | Posterior | Mean (rate beta) | SD (rate beta) | Mean time (min) |
|---|---|---|---|---|---|---|
| Cherry blossom finish times | 185 | 41.3954 | Gamma(7663.15, 16651.0310) | 0.4602 | 0.0053 | 89.9468 |
glmb()df_cb <- data.frame(y = y_cb)
cb_beta <- matrix(alpha0_cb / b0_cb, nrow = 1, ncol = 1,
dimnames = list(NULL, "(Intercept)"))
cb_pf <- dGamma(
shape = alpha0_cb,
rate = b0_cb,
beta = cb_beta,
Inv_Dispersion = FALSE,
lik_shape = k_hat_cb
)
set.seed(2026)
fit_cb <- glmb(
n = 20000,
y ~ 1,
data = df_cb,
family = Gamma(link = "identity"),
pfamily = cb_pf
)
print(fit_cb)
#>
#> Call: glmb(formula = y ~ 1, family = Gamma(link = "identity"), pfamily = cb_pf,
#> n = 20000, data = df_cb)
#>
#> Posterior Mean Coefficients:
#> (Intercept)
#> 0.4603
#>
#> Effective Number of Parameters: 40.76645
#> Expected Residual Deviance: 7790.162
#> DIC: 62148.07cb_compare <- data.frame(
Dataset = "Cherry blossom finish times",
Posterior = cb_analytic$Posterior,
`Analytic Mean (rate)` = cb_analytic$`Mean (rate beta)`,
`Analytic SD (rate)` = cb_analytic$`SD (rate beta)`,
`glmb Post.Mean` = fit_cb$coef.means["(Intercept)"],
`glmb Post.Sd` = sd(fit_cb$coefficients[, "(Intercept)", drop = TRUE]),
check.names = FALSE
)
knitr::kable(cb_compare, digits = 4,
caption = "Analytic vs. glmb() posterior mean and SD for rate beta")| Dataset | Posterior | Analytic Mean (rate) | Analytic SD (rate) | glmb Post.Mean | glmb Post.Sd | |
|---|---|---|---|---|---|---|
| (Intercept) | Cherry blossom finish times | Gamma(7663.15, 16651.0310) | 0.4602 | 0.0053 | 0.4603 | 0.0052 |
The glmb Post.Mean and
glmb Post.Sd columns refer to the
rate \(\beta\). Mean
finish time is \(\hat k\) divided by
posterior mean rate (see Mean time in the analytic
table).
The scalar Gamma–Gamma model is the conjugate prototype for
Gamma glmb() with
family = Gamma(link = "identity"). In that setting
the single coefficient is the rate \(\beta\), the prior is
dGamma(shape, rate, beta, Inv_Dispersion = FALSE, lik_shape = k),
and the posterior update is the closed form above.
When the shape \(k\) is
unknown, the Gamma prior on \(k\) is no longer conjugate. That case is
handled by Gamma(link = "log") with
dNormal() priors on the regression coefficients plus
glmbdisp() for the dispersion \(\phi = 1/k\) (Chapter 10).
Gamma(log).