Chapter 02-S04: Gamma–Poisson Conjugacy for One Count Rate

Kjell Nygren

2026-06-21

library(glmbayes)
if (requireNamespace("bayesrules", quietly = TRUE)) library(bayesrules)

1. The model

1.1 Likelihood

Suppose we observe \(n\) independent counts \(y_1, \dots, y_n\) drawn from a Poisson distribution with unknown rate \(\lambda > 0\):

\[ Y_i \mid \lambda \;\sim\; \mathrm{Poisson}(\lambda), \qquad i = 1,\dots,n. \]

The joint likelihood is

\[ p(y_1,\dots,y_n \mid \lambda) \;\propto\; e^{-n\lambda}\,\lambda^{\sum_i y_i}. \]

Everything relevant for updating beliefs about \(\lambda\) is captured by two sufficient statistics: \(n\) (number of observations) and \(\sum_i y_i\) (total observed count).

With exposure. In many applications the counts are observed over different exposure periods \(e_i\) (time, area, number of opportunities). The model becomes \(Y_i \mid \lambda \sim \mathrm{Poisson}(e_i \lambda)\), and the sufficient statistics become \(\sum_i e_i\) and \(\sum_i y_i\). An exposure-weighted example appears in Appendix A ((Albert 2009)).

1.2 Prior

The rate \(\lambda > 0\) calls for a positive prior. The Gamma distribution in shape–rate form is the natural conjugate choice:

\[ p(\lambda) \;=\; \frac{\beta^\alpha}{\Gamma(\alpha)}\,\lambda^{\alpha-1}\,e^{-\beta\lambda}, \qquad \lambda > 0, \]

with shape \(\alpha > 0\) and rate \(\beta > 0\). Its mean is \(\alpha/\beta\) and its variance is \(\alpha/\beta^2\).

Interpreting the parameters. Think of \(\alpha - 1\) as a “prior total count” and \(\beta\) as a “prior number of observations”. The prior mean \(\alpha/\beta\) is your pre-data estimate of the rate, and larger \(\beta\) encodes a stronger commitment to that estimate.

Note: many textbooks use shape–scale form (\(\theta = 1/\beta\)); glmbayes uses the shape–rate convention throughout.

1.3 Posterior derivation

Multiply the likelihood kernel \(e^{-n\lambda}\lambda^{\sum y_i}\) by the prior kernel \(\lambda^{\alpha-1}e^{-\beta\lambda}\):

\[ p(\lambda \mid y) \;\propto\; \lambda^{(\alpha + \sum y_i) - 1}\,e^{-(\beta + n)\lambda}. \]

This is a Gamma kernel, so the conjugate posterior is:

\[ \boxed{\lambda \mid y \;\sim\; \mathrm{Gamma}\!\left(\alpha + \textstyle\sum_i y_i,\;\; \beta + n\right).} \]

Update rule: add the total count to the shape, add the sample size (or total exposure) to the rate.

The posterior mean is

\[ \mathbb{E}[\lambda \mid y] \;=\; \frac{\alpha + \sum y_i}{\beta + n} \;=\; \frac{\beta}{\beta + n}\cdot\frac{\alpha}{\beta} \;+\; \frac{n}{\beta + n}\cdot\bar{y}, \]

again a weighted average of the prior mean and the data mean.


2. bayesrules illustration

The bayesrules package (Johnson et al. 2022) provides plot_gamma_poisson() and summarize_gamma_poisson() for visualizing the Gamma–Poisson update. The same scenario appears in the Bayes Rules! conjugate vignette and in Chapter 12 of the book.

Scenario. You observe daily bicycle rental counts \(y_1, \dots, y_9\) from the first nine days of a year. The data are \(y = (1, 1, 1, 0, 0, 0, 0, 0, 0)\), giving \(\sum y = 3\) across \(n = 9\) days. Your prior belief is a \(\Gamma(3, 4)\) distribution on the daily rate \(\lambda\) (prior mean = 0.75 rentals/day).

br_y_df  <- data.frame(y = c(1, 1, 1, rep(0, 6)))  ## n = 9, sum(y) = 3
br_n     <- nrow(br_y_df)
br_shape <- 3
br_rate  <- 4

## Analytic posterior
post_shape_br <- br_shape + sum(br_y_df$y)   ## 3 + 3 = 6
post_rate_br  <- br_rate  + br_n              ## 4 + 9 = 13
library(bayesrules)
plot_gamma_poisson(
  shape      = br_shape,
  rate       = br_rate,
  sum_y      = sum(br_y_df$y),
  n          = br_n,
  prior      = TRUE,
  likelihood = TRUE,
  posterior  = TRUE
)

summarize_gamma_poisson(shape = br_shape, rate = br_rate,
                        sum_y = sum(br_y_df$y), n = br_n)
#>       model shape rate      mean      mode        var        sd
#> 1     prior     3    4 0.7500000 0.5000000 0.18750000 0.4330127
#> 2 posterior     6   13 0.4615385 0.3846154 0.03550296 0.1884223

Reading the output. The posterior is \(\Gamma(6, 13)\) with mean \(6/13 \approx 0.46\). The data (mean \(\bar y = 0.33\)) have pulled the posterior mean down from the prior mean (0.75), and the posterior is noticeably tighter.

gp_analytic <- data.frame(
  Example = "Bayes Rules! daily counts",
  n = br_n,
  `sum(y)` = sum(br_y_df$y),
  Posterior = sprintf("Gamma(%d, %d)", post_shape_br, post_rate_br),
  Mean = post_shape_br / post_rate_br,
  SD = sqrt(post_shape_br) / post_rate_br,
  check.names = FALSE
)
knitr::kable(gp_analytic, digits = 4,
  caption = "Conjugate Gamma--Poisson posterior (Bayes Rules! scenario)")
Conjugate Gamma–Poisson posterior (Bayes Rules! scenario)
Example n sum(y) Posterior Mean SD
Bayes Rules! daily counts 9 3 Gamma(6, 13) 0.4615 0.1884

3. Fitting with glmb()

The scalar Gamma–Poisson model maps to glmb() with family = poisson(link = "identity") and dGamma(shape, rate, beta, Inv_Dispersion = FALSE). The single coefficient is the rate \(\lambda\).

gp_beta <- matrix(br_shape / br_rate, 1L, 1L, dimnames = list(NULL, "(Intercept)"))
gp_pf   <- dGamma(shape = br_shape, rate = br_rate,
                  beta = gp_beta, Inv_Dispersion = FALSE)

set.seed(2026)
fit_gp <- glmb(
  n       = 20000,
  y ~ 1,
  data    = br_y_df,
  weights = rep(1L, br_n),
  family  = poisson(link = "identity"),
  pfamily = gp_pf
)
print(fit_gp)
#> 
#> Call:  glmb(formula = y ~ 1, family = poisson(link = "identity"), pfamily = gp_pf, 
#>     n = 20000, data = br_y_df, weights = rep(1L, br_n))
#> 
#> Posterior Mean Coefficients:
#> (Intercept)  
#>      0.4641  
#> 
#> Effective Number of Parameters: 0.5025621 
#> Expected Residual Deviance: 7.462233 
#> DIC: 13.96479
gp_compare <- data.frame(
  Example = "Bayes Rules! daily counts",
  Posterior = gp_analytic$Posterior,
  `Analytic Mean` = gp_analytic$Mean,
  `Analytic SD`   = gp_analytic$SD,
  `glmb Post.Mean` = fit_gp$coef.means["(Intercept)"],
  `glmb Post.Sd`   = sd(fit_gp$coefficients[, "(Intercept)", drop = TRUE]),
  check.names = FALSE
)
knitr::kable(gp_compare, digits = 4,
  caption = "Analytic vs. glmb() posterior mean and SD")
Analytic vs. glmb() posterior mean and SD
Example Posterior Analytic Mean Analytic SD glmb Post.Mean glmb Post.Sd
(Intercept) Bayes Rules! daily counts Gamma(6, 13) 0.4615 0.1884 0.4641 0.1875

The glmb Post.Mean and glmb Post.Sd columns should match the analytic Mean and SD to Monte Carlo error because dGamma(Inv_Dispersion = FALSE) implements the exact conjugate update.


4. Connection to glmbayes

The scalar Gamma–Poisson model is the conjugate prototype for Poisson glmb() with family = poisson(link = "identity"). In that setting the single coefficient is the rate \(\lambda\), the prior is dGamma(shape, rate, beta, Inv_Dispersion = FALSE), and the posterior update is the closed form above.

For general Poisson regression with covariates (log link), conjugacy is lost and glmb() uses envelope-based accept–reject sampling with a dNormal() prior on the regression coefficients (Chapter 10). The Bayes Rules! equality_index Poisson regression example is worked in Chapter 10, Appendix A.


Appendix A. Heart transplant mortality (Albert (2009))

The hearttransplants dataset (Albert 2020; Christiansen and Morris 1995) records, for each of 94 U.S. hospitals, the exposure e (expected deaths under the national baseline rate) and observed deaths y within 30 days of surgery. The model is \(y_i \mid \lambda_i \sim \mathrm{Poisson}(e_i \lambda_i)\) where \(\lambda_i\) is the hospital’s standardized mortality ratio.

library(LearnBayes)
data("hearttransplants")
cat(sprintf("%d hospitals  |  sum(y) = %d  |  sum(e) = %.0f\n",
            nrow(hearttransplants), sum(hearttransplants$y), sum(hearttransplants$e)))
#> 94 hospitals  |  sum(y) = 277  |  sum(e) = 294681

Albert’s prior. Albert (Ch. 3.2) uses a \(\Gamma(16, 15174)\) prior on \(\lambda\), derived from historical national data (prior mean \(\approx 0.00105\)). He focuses on two specific hospitals:

Hospital Exposure \(e\) Deaths \(y\) Analytic posterior Posterior mean Posterior SD
A 66 1 \(\Gamma(17,\; 15240)\) \(17/15240 \approx 0.001115\) \(\sqrt{17}/15240 \approx 0.000271\)
B 1767 4 \(\Gamma(20,\; 16941)\) \(20/16941 \approx 0.001181\) \(\sqrt{20}/16941 \approx 0.000264\)
alpha0_ht <- 16L;  beta0_ht <- 15174L
ex_A <- 66L;   yobs_A <- 1L
ex_B <- 1767L; yobs_B <- 4L
ht_albert <- data.frame(
  Hospital = c("A", "B"),
  e = c(ex_A, ex_B),
  y = c(yobs_A, yobs_B),
  Posterior = c("Gamma(17, 15240)", "Gamma(20, 16941)"),
  Mean = c(
    (alpha0_ht + yobs_A) / (beta0_ht + ex_A),
    (alpha0_ht + yobs_B) / (beta0_ht + ex_B)
  ),
  SD = c(
    sqrt(alpha0_ht + yobs_A) / (beta0_ht + ex_A),
    sqrt(alpha0_ht + yobs_B) / (beta0_ht + ex_B)
  ),
  check.names = FALSE
)
knitr::kable(ht_albert, digits = 6,
  caption = "Albert Ch. 3.2: analytic posterior mean and SD (shape--rate Gamma)")
Albert Ch. 3.2: analytic posterior mean and SD (shape–rate Gamma)
Hospital e y Posterior Mean SD
A 66 1 Gamma(17, 15240) 0.001115 0.000271
B 1767 4 Gamma(20, 16941) 0.001181 0.000264

A.1 Fitting with glmb()

The exposure enters as weights. The conjugate sampler updates \(\Gamma(\alpha_0 + y,\; \beta_0 + e)\) exactly.

ht_beta <- matrix(alpha0_ht / beta0_ht, 1L, 1L, dimnames = list(NULL, "(Intercept)"))
ht_pf   <- dGamma(shape = alpha0_ht, rate = beta0_ht,
                  beta = ht_beta, Inv_Dispersion = FALSE)

set.seed(2026)
fit_A <- glmb(n = 20000, y ~ 1, data = data.frame(y = yobs_A),
              weights = ex_A, family = poisson(link = "identity"), pfamily = ht_pf)
print(fit_A)
#> 
#> Call:  glmb(formula = y ~ 1, family = poisson(link = "identity"), pfamily = ht_pf, 
#>     n = 20000, data = data.frame(y = yobs_A), weights = ex_A)
#> 
#> Posterior Mean Coefficients:
#> (Intercept)  
#>    0.001121  
#> 
#> Effective Number of Parameters: 3.864282 
#> Expected Residual Deviance: 768.8068 
#> DIC: 904.6711
set.seed(2026)
fit_B <- glmb(n = 20000, y ~ 1, data = data.frame(y = yobs_B),
              weights = ex_B, family = poisson(link = "identity"), pfamily = ht_pf)
print(fit_B)
#> 
#> Call:  glmb(formula = y ~ 1, family = poisson(link = "identity"), pfamily = ht_pf, 
#>     n = 20000, data = data.frame(y = yobs_B), weights = ex_B)
#> 
#> Posterior Mean Coefficients:
#> (Intercept)  
#>    0.001185  
#> 
#> Effective Number of Parameters: 351.3413 
#> Expected Residual Deviance: 101062.1 
#> DIC: 107184
ht_compare <- data.frame(
  Hospital = c("A", "B"),
  e = c(ex_A, ex_B),
  y = c(yobs_A, yobs_B),
  Posterior = c("Gamma(17, 15240)", "Gamma(20, 16941)"),
  `Albert Mean` = ht_albert$Mean,
  `Albert SD`   = ht_albert$SD,
  `glmb Post.Mean` = c(fit_A$coef.means["(Intercept)"],
                       fit_B$coef.means["(Intercept)"]),
  `glmb Post.Sd`   = c(sd(fit_A$coefficients[, "(Intercept)", drop = TRUE]),
                       sd(fit_B$coefficients[, "(Intercept)", drop = TRUE])),
  check.names = FALSE
)
knitr::kable(ht_compare, digits = 6,
  caption = "Albert analytic vs. glmb() posterior mean and SD")
Albert analytic vs. glmb() posterior mean and SD
Hospital e y Posterior Albert Mean Albert SD glmb Post.Mean glmb Post.Sd
A 66 1 Gamma(17, 15240) 0.001115 0.000271 0.001121 0.000270
B 1767 4 Gamma(20, 16941) 0.001181 0.000264 0.001185 0.000263

Each fit is for one hospital: (Intercept) is the posterior draw for that hospital’s rate \(\lambda\). The glmb Post.Mean and glmb Post.Sd columns should match Albert’s Mean and SD to Monte Carlo error.


See also

References

Albert, Jim. 2009. Bayesian Computation with R. 2nd ed. Use r! Springer.
Albert, Jim. 2020. LearnBayes: Functions for Learning Bayesian Inference. https://CRAN.R-project.org/package=LearnBayes.
Christiansen, Cindy L., and Carl N. Morris. 1995. Fitting and Checking a Two-Level Poisson Model: Modeling Patient Mortality Rates in Heart Transplant Patients. (New York).
Johnson, Alicia A., Miles Q. Ott, and Mine Dogucu. 2022. Bayes Rules! An Introduction to Applied Bayesian Modeling. CRC Press. https://www.bayesrulesbook.com.