Chapter 02-S03: Beta–Binomial Conjugacy for One Proportion

Kjell Nygren

2026-06-21

library(glmbayes)
if (requireNamespace("bayesrules", quietly = TRUE)) library(bayesrules)

1. The model

1.1 Likelihood

Suppose we observe \(n\) independent binary trials — each a success (1) or failure (0) — with a common success probability \(\theta \in (0,1)\). The total number of successes \(y = \sum_i y_i\) follows:

\[ y \mid \theta \;\sim\; \mathrm{Binomial}(n,\, \theta). \]

The likelihood is

\[ p(y \mid \theta) \;=\; \binom{n}{y}\,\theta^y\,(1-\theta)^{n-y}, \]

and everything relevant for updating our beliefs about \(\theta\) is captured by the two sufficient statistics \(y\) (successes) and \(n - y\) (failures).

1.2 Prior

The parameter \(\theta\) is a probability, so its support is \([0,1]\). The Beta distribution is the natural conjugate prior:

\[ p(\theta) \;=\; \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \qquad \theta \in [0,1], \]

with shape parameters \(a > 0\) and \(b > 0\). Its mean is \(a/(a+b)\) and its variance is \(ab/[(a+b)^2(a+b+1)]\).

Interpreting the parameters. Think of \(a - 1\) as a “prior number of successes” and \(b - 1\) as a “prior number of failures” accumulated before the current study. The total “prior sample size” is \(a + b - 2\) and the prior mean is \(a/(a+b)\).

Prior belief Parameter choice
No information — completely flat \(a = b = 1\) (uniform on \([0,1]\))
Mild belief near 0.5 \(a = b = 3\)
Moderate belief near 0.4 \(a = 4, b = 6\) (mean = 0.4)
Strong belief near 0.2 \(a = 4, b = 16\) (mean = 0.2)

1.3 Posterior derivation

Multiply the likelihood kernel \(\theta^y(1-\theta)^{n-y}\) by the prior kernel \(\theta^{a-1}(1-\theta)^{b-1}\):

\[ p(\theta \mid y) \;\propto\; \theta^{(a+y)-1}(1-\theta)^{(b+n-y)-1}. \]

This is the kernel of a Beta distribution, so the conjugate posterior is:

\[ \boxed{\theta \mid y \;\sim\; \mathrm{Beta}(a + y,\; b + n - y).} \]

The update rule is simple:

1.4 The posterior mean as a compromise

\[ \mathbb{E}[\theta \mid y] \;=\; \frac{a + y}{a + b + n} \;=\; \frac{a+b}{a+b+n}\cdot\frac{a}{a+b} \;+\; \frac{n}{a+b+n}\cdot\frac{y}{n}. \]

This is a weighted average of the prior mean \(a/(a+b)\) and the data frequency \(y/n\), with weights proportional to the “prior sample size” \(a+b\) and the actual sample size \(n\). When \(n\) is much larger than \(a + b\) the data dominate; when \(n\) is small the prior pulls the estimate toward its own mean.


2. bayesrules illustration

The bayesrules package provides plot_beta_binomial() and summarize_beta_binomial() for quick visualization and tabular summaries of the Beta–Binomial update (Johnson et al. 2022).

Scenario. A clinical researcher believes, based on earlier trials, that about 40% of patients respond to a new analgesic. They encode this as a Beta(4, 6) prior (mean = 0.4, effective prior sample size = 10). In a new pilot study they observe \(y = 14\) responders out of \(n = 30\) patients.

a0 <- 4;  b0 <- 6          ## Beta(4, 6) prior: mean = 0.4
y  <- 14; n  <- 30          ## data: 14/30 successes

## Analytic posterior
post_a <- a0 + y            ## 4 + 14 = 18
post_b <- b0 + (n - y)      ## 6 + 16 = 22
library(bayesrules)
## Overlay prior, likelihood (scaled), and posterior densities
plot_beta_binomial(
  alpha      = a0,
  beta       = b0,
  y          = y,
  n          = n,
  prior      = TRUE,
  likelihood = TRUE,
  posterior  = TRUE
)

## Tabular summary of prior, likelihood, and posterior
summarize_beta_binomial(alpha = a0, beta = b0, y = y, n = n)
#>       model alpha beta mean      mode         var         sd
#> 1     prior     4    6 0.40 0.3750000 0.021818182 0.14770979
#> 2 posterior    18   22 0.45 0.4473684 0.006036585 0.07769547

Reading the output. The posterior parameters are Beta(18, 22), giving a posterior mean of \(18/40 = 0.45\) — pulled from the prior mean (0.40) toward the data frequency (\(14/30 \approx 0.47\)) in proportion to their relative weights. The posterior is noticeably narrower than the prior, reflecting the information gained from 30 observations.

Analytic posterior.

bb_analytic <- data.frame(
  Example = "Analgesic trial (BayesRules)",
  n = n,
  y = y,
  Posterior = sprintf("Beta(%d, %d)", post_a, post_b),
  Mean = post_a / (post_a + post_b),
  SD = sqrt(post_a * post_b / ((post_a + post_b)^2 * (post_a + post_b + 1))),
  check.names = FALSE
)
knitr::kable(bb_analytic, digits = 4,
  caption = "Conjugate Beta--Binomial posterior")
Conjugate Beta–Binomial posterior
Example n y Posterior Mean SD
Analgesic trial (BayesRules) 30 14 Beta(18, 22) 0.45 0.0777

3. Real data: the Bechdel test

The Bechdel test asks whether a film features at least two women who talk to each other about something other than a man. The bayesrules::bechdel dataset records the test result for 1794 films (Johnson et al. 2022).

Research question. What fraction of films pass the Bechdel test?

Prior. Based on earlier analyses suggesting roughly 45% of films pass, we specify a Beta(9, 11) prior (mean = 0.45, effective sample size = 20 — moderately informative).

library(bayesrules)

## Binary outcome: 1 = PASS, 0 = FAIL
pass   <- as.integer(bechdel[["binary"]] == "PASS")
n_bech <- length(pass)
y_bech <- sum(pass)

cat(sprintf("n = %d,  y (PASS) = %d,  proportion = %.3f\n",
            n_bech, y_bech, y_bech / n_bech))
#> n = 1794,  y (PASS) = 803,  proportion = 0.448
a0_b <- 9;  b0_b <- 11    ## Beta(9, 11) prior: mean = 0.45

post_a_b <- a0_b + y_bech
post_b_b <- b0_b + (n_bech - y_bech)

c(prior_mean  = a0_b / (a0_b + b0_b),
  data_freq   = round(y_bech / n_bech, 4),
  post_mean   = round(post_a_b / (post_a_b + post_b_b), 4))
#> prior_mean  data_freq  post_mean 
#>     0.4500     0.4476     0.4476

## 95% credible interval
round(qbeta(c(0.025, 0.975), post_a_b, post_b_b), 4)
#> [1] 0.4248 0.4706

With over 1700 films the data dominate the prior; the posterior mean is nearly identical to the observed proportion.

analytic_mean_b <- post_a_b / (post_a_b + post_b_b)
analytic_sd_b   <- sqrt(post_a_b * post_b_b /
                        ((post_a_b + post_b_b)^2 * (post_a_b + post_b_b + 1)))

bech_analytic <- data.frame(
  Dataset = "Bechdel test",
  n = n_bech,
  y = y_bech,
  Posterior = sprintf("Beta(%d, %d)", post_a_b, post_b_b),
  Mean = analytic_mean_b,
  SD = analytic_sd_b,
  check.names = FALSE
)
knitr::kable(bech_analytic, digits = 4,
  caption = "Conjugate Beta--Binomial posterior")
Conjugate Beta–Binomial posterior
Dataset n y Posterior Mean SD
Bechdel test 1794 803 Beta(812, 1002) 0.4476 0.0117

3.1 Fitting with glmb() using dBeta()

The scalar Beta–Binomial model generalizes to glmb() with family = binomial(link = "identity") and the dBeta() prior. In this intercept-only setting the single regression coefficient is \(\theta\) directly on the probability scale.

df_bech <- data.frame(y = pass)

bech_beta <- matrix(a0_b / (a0_b + b0_b), nrow = 1, ncol = 1,
                    dimnames = list(NULL, "(Intercept)"))

bech_pf <- dBeta(shape1 = a0_b, shape2 = b0_b, beta = bech_beta)

set.seed(2026)
fit_bech <- glmb(
  n       = 20000,
  y ~ 1,
  data    = df_bech,
  weights = rep(1L, n_bech),
  family  = binomial(link = "identity"),
  pfamily = bech_pf
)
print(fit_bech)
#> 
#> Call:  glmb(formula = y ~ 1, family = binomial(link = "identity"), pfamily = bech_pf, 
#>     n = 20000, data = df_bech, weights = rep(1L, n_bech))
#> 
#> Posterior Mean Coefficients:
#> (Intercept)  
#>      0.4477  
#> 
#> Effective Number of Parameters: 0.9969965 
#> Expected Residual Deviance: 2468.272 
#> DIC: 2469.269
bech_compare <- data.frame(
  Dataset = "Bechdel test",
  Posterior = bech_analytic$Posterior,
  `Analytic Mean` = bech_analytic$Mean,
  `Analytic SD`   = bech_analytic$SD,
  `glmb Post.Mean` = fit_bech$coef.means["(Intercept)"],
  `glmb Post.Sd`   = sd(fit_bech$coefficients[, "(Intercept)", drop = TRUE]),
  check.names = FALSE
)
knitr::kable(bech_compare, digits = 4,
  caption = "Analytic vs. glmb() posterior mean and SD")
Analytic vs. glmb() posterior mean and SD
Dataset Posterior Analytic Mean Analytic SD glmb Post.Mean glmb Post.Sd
(Intercept) Bechdel test Beta(812, 1002) 0.4476 0.0117 0.4477 0.0117

The glmb Post.Mean and glmb Post.Sd columns should match the analytic Mean and SD to Monte Carlo error because dBeta() implements the exact conjugate update.


4. Connection to glmbayes

This scalar prototype is the intuition behind Binomial glmb() (Chapters 7–9), where instead of a single \(\theta\) you model \(\mathrm{logit}(\theta_i) = x_i^\top \beta\) and place a multivariate Normal prior on the coefficient vector \(\beta\). The conjugate dBeta() path is specific to intercept-only models with the identity link; once covariates enter, the conjugacy is lost and the general envelope sampler takes over.


See also

References

Johnson, Alicia A., Miles Q. Ott, and Mine Dogucu. 2022. Bayes Rules! An Introduction to Applied Bayesian Modeling. CRC Press. https://www.bayesrulesbook.com.