Ordinary Least Squares (OLS) is optimal for Gaussian errors. PMM2 improves upon OLS when errors are asymmetric (skewed). But what about errors that are symmetric yet non-Gaussian?
Many real-world processes produce errors with lighter tails than the normal distribution (platykurtic errors, \(\gamma_4 < 0\)). In these cases, OLS remains unbiased and consistent, but it is no longer the minimum-variance estimator: methods that exploit higher-order moments can achieve lower variance.
PMM3 constructs a cubic stochastic polynomial using the fourth and sixth central moments. The key theoretical quantity is the efficiency factor \(g_3\), defined as the ratio \(\text{Var}(\hat{\beta}_{\text{PMM3}}) / \text{Var}(\hat{\beta}_{\text{OLS}})\). When \(g_3 < 1\), PMM3 achieves lower variance than OLS.
PMM3 iteratively refines OLS estimates using the score equation:
\[ Z_{1\nu} = \varepsilon_\nu (\kappa - \varepsilon_\nu^2), \quad \kappa = \frac{m_6 - 3 m_2 m_4}{m_4 - 3 m_2^2} \]
with Jacobian \(J = X^\top \text{diag}(3\varepsilon^2 - \kappa) X\). The solver includes step-size limiting and a divergence guard for numerical stability.
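As a concrete illustration, one Newton-Raphson pass can be sketched in a few lines of base R. Here `pmm3_step` is a hypothetical helper written for this sketch, not a package function, and the closed form for `kappa` is inferred from the reported summary values; `lm_pmm3()` additionally applies the step-size limiting and divergence guard mentioned above.

```r
# One Newton-Raphson step for the cubic score (illustrative sketch only)
pmm3_step <- function(beta, X, y, kappa) {
  eps <- as.numeric(y - X %*% beta)              # current residuals
  Z   <- eps * (kappa - eps^2)                   # cubic score Z_1nu
  S   <- crossprod(X, Z)                         # score vector X'Z
  J   <- crossprod(X, (3 * eps^2 - kappa) * X)   # Jacobian X' diag(3e^2 - kappa) X
  as.numeric(beta - solve(J, S))                 # Newton-Raphson update
}

set.seed(1)
n <- 200
X <- cbind(1, rnorm(n))
y <- X %*% c(10, 2.5) + runif(n, -sqrt(3), sqrt(3))   # platykurtic errors
beta <- as.numeric(solve(crossprod(X), crossprod(X, y)))  # OLS start
eps0 <- as.numeric(y - X %*% beta)
m2 <- mean(eps0^2); m4 <- mean(eps0^4); m6 <- mean(eps0^6)
kappa <- (m6 - 3 * m2 * m4) / (m4 - 3 * m2^2)
for (i in 1:5) beta <- pmm3_step(beta, X, y, kappa)
```

With well-behaved bounded errors a handful of full Newton steps is typically enough; the production solver guards against the rare divergent step.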
Reference: Based on the theoretical framework of the Polynomial Maximization Method (Zabolotnii et al., 2018).
Uniform errors are the canonical platykurtic case: \(\gamma_4 \approx -1.2\), symmetric, bounded.
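The stated kurtosis is easy to verify numerically: for \(\text{Uniform}(-\sqrt{3}, \sqrt{3})\), which has unit variance, \(m_4/m_2^2 = 9/5\), so \(\gamma_4 = 9/5 - 3 = -6/5\).

```r
# Check gamma4 of unit-variance uniform errors by simulation
set.seed(123)
u <- runif(1e5, -sqrt(3), sqrt(3))
c(variance = var(u),                                   # should be near 1
  gamma4   = mean((u - mean(u))^4) / var(u)^2 - 3)     # should be near -1.2
```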
n <- 300
# Generate predictor
x <- rnorm(n, mean = 5, sd = 2)
# True parameters
beta_0 <- 10
beta_1 <- 2.5
# Generate uniform errors (platykurtic, gamma4 ≈ -1.2)
errors <- runif(n, -sqrt(3), sqrt(3)) # variance = 1
# Response variable
y <- beta_0 + beta_1 * x + errors
dat <- data.frame(y = y, x = x)
fit_ols_temp <- lm(y ~ x, data = dat)
res_ols <- residuals(fit_ols_temp)
par(mfrow = c(1, 2))
hist(res_ols, breaks = 30, main = "Residual Distribution (Uniform Errors)",
xlab = "Residuals", col = "lightblue", border = "white")
qqnorm(res_ols, main = "Q-Q Plot")
qqline(res_ols, col = "red", lwd = 2)
par(mfrow = c(1, 1))
cat("Skewness (gamma3):", round(pmm_skewness(res_ols), 3), "\n")
#> Skewness (gamma3): 0.026
cat("Excess kurtosis (gamma4):", round(pmm_kurtosis(res_ols), 3), "\n")
#> Excess kurtosis (gamma4): -1.183
The Q-Q plot shows lighter tails than normal (an S-shaped curve with compressed ends). The skewness is near zero but the excess kurtosis is clearly negative.
fit_pmm3 <- lm_pmm3(y ~ x, data = dat)
fit_ols <- lm(y ~ x, data = dat)
# Display PMM3 summary
summary(fit_pmm3)
#> PMM3 estimation results (S = 3, symmetric errors)
#> Call:
#> lm_pmm3(formula = y ~ x, data = dat)
#>
#> Coefficients:
#> (Intercept) x
#> 9.916678 2.518456
#>
#> Central moments of initial residuals:
#> m2 = 0.9555941
#> m4 = 1.658808
#> m6 = 3.424153
#>
#> Theoretical characteristics of PMM3 (S = 3):
#> gamma4 = -1.183442
#> gamma6 = 6.675667
#> g3 = 0.3082707 (expected ratio Var[PMM3]/Var[OLS])
#> kappa = 1.231908
#>
#> Algorithm information:
#> Convergence status: TRUE
#> Iterations: 3
comparison <- data.frame(
Parameter = c("Intercept", "Slope"),
True = c(beta_0, beta_1),
OLS = coef(fit_ols),
PMM3 = coef(fit_pmm3),
Diff = coef(fit_pmm3) - coef(fit_ols)
)
print(comparison, row.names = FALSE)
#> Parameter True OLS PMM3 Diff
#> Intercept 10.0 9.804314 9.916678 0.11236314
#> Slope 2.5 2.537942 2.518456 -0.01948666
cat("\nResidual sum of squares:\n")
#>
#> Residual sum of squares:
cat(" OLS: ", sum(residuals(fit_ols)^2), "\n")
#> OLS: 286.6782
cat(" PMM3: ", sum(residuals(fit_pmm3)^2), "\n")
#> PMM3: 287.1956
cat("\nTheoretical variance ratio g3 =",
fit_pmm3@g_coefficient, "\n")
#>
#> Theoretical variance ratio g3 = 0.3082707
cat("Expected variance reduction:",
round((1 - fit_pmm3@g_coefficient) * 100, 1), "%\n")
#> Expected variance reduction: 69.2 %
For symmetric platykurtic errors, PMM3 should outperform both OLS and PMM2. PMM2 should offer no improvement since \(\gamma_3 \approx 0\).
# Fit all three methods
fit_ols <- lm(y ~ x, data = dat)
fit_pmm2 <- lm_pmm2(y ~ x, data = dat)
fit_pmm3 <- lm_pmm3(y ~ x, data = dat)
comparison3 <- data.frame(
Method = c("OLS", "PMM2", "PMM3"),
Intercept = c(coef(fit_ols)[1], coef(fit_pmm2)[1], coef(fit_pmm3)[1]),
Slope = c(coef(fit_ols)[2], coef(fit_pmm2)[2], coef(fit_pmm3)[2])
)
print(comparison3, row.names = FALSE, digits = 5)
#> Method Intercept Slope
#> OLS 9.8043 2.5379
#> PMM2 9.8058 2.5376
#> PMM3 9.9167 2.5185
cat("\nTrue values: Intercept =", beta_0, ", Slope =", beta_1, "\n")
#>
#> True values: Intercept = 10 , Slope = 2.5
cat("\nEfficiency factors:\n")
#>
#> Efficiency factors:
vf2 <- pmm2_variance_factor(fit_pmm2@m2, fit_pmm2@m3, fit_pmm2@m4)
cat(" PMM2 g2 =", vf2$g2,
" (improvement:", round((1 - vf2$g2) * 100, 1), "%)\n")
#> PMM2 g2 = (improvement: %)
cat(" PMM3 g3 =", fit_pmm3@g_coefficient,
" (improvement:", round((1 - fit_pmm3@g_coefficient) * 100, 1), "%)\n")
#> PMM3 g3 = 0.3082707 (improvement: 69.2 %)
With symmetric platykurtic errors, PMM2's \(g_2 \approx 1\) (no gain), while PMM3's \(g_3 \ll 1\) (substantial gain).
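The efficiency gain can also be seen directly in a small Monte Carlo experiment. The sketch below uses a bare-bones inline version of the cubic-score iteration (illustrative only; it omits the safeguards that `lm_pmm3()` applies) and compares empirical slope variances; the ratio should fall well below 1, in line with \(g_3 < 1\).

```r
# Monte Carlo sketch: empirical Var[PMM3 slope] / Var[OLS slope]
set.seed(99)
R <- 300; n <- 200
sl_ols <- sl_pmm3 <- numeric(R)
for (r in 1:R) {
  X <- cbind(1, rnorm(n, 5, 2))
  y <- X %*% c(10, 2.5) + runif(n, -sqrt(3), sqrt(3))   # uniform errors
  b <- as.numeric(solve(crossprod(X), crossprod(X, y))) # OLS fit
  sl_ols[r] <- b[2]
  e <- as.numeric(y - X %*% b)
  m2 <- mean(e^2); m4 <- mean(e^4); m6 <- mean(e^6)
  kap <- (m6 - 3 * m2 * m4) / (m4 - 3 * m2^2)           # fixed kappa
  for (i in 1:5) {                                      # Newton-Raphson
    e <- as.numeric(y - X %*% b)
    Z <- e * (kap - e^2)
    J <- crossprod(X, (3 * e^2 - kap) * X)
    b <- b - as.numeric(solve(J, crossprod(X, Z)))
  }
  sl_pmm3[r] <- b[2]
}
var(sl_pmm3) / var(sl_ols)   # empirical analogue of g3; expected < 1
```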
The auto_mpg dataset contains fuel consumption data for 398 vehicles. A quadratic regression of MPG on Horsepower produces residuals with \(|\gamma_3| \approx 0.2\) and \(\gamma_4 \approx 1.3\): roughly symmetric but non-Gaussian, a case where PMM3 can still offer a modest gain.
data(auto_mpg)
# Remove missing horsepower values
auto_complete <- na.omit(auto_mpg[, c("mpg", "horsepower")])
# Visualize the relationship (clearly nonlinear)
plot(auto_complete$horsepower, auto_complete$mpg,
main = "MPG vs Horsepower",
xlab = "Horsepower", ylab = "Miles per Gallon",
col = "steelblue", pch = 16, cex = 0.8)
# Quadratic OLS
fit_auto_ols <- lm(mpg ~ horsepower + I(horsepower^2), data = auto_complete)
# Check residual properties
res_auto <- residuals(fit_auto_ols)
cat("Residual diagnostics:\n")
#> Residual diagnostics:
cat(" Skewness (gamma3):", round(pmm_skewness(res_auto), 3), "\n")
#> Skewness (gamma3): 0.218
cat(" Excess kurtosis (gamma4):", round(pmm_kurtosis(res_auto), 3), "\n")
#> Excess kurtosis (gamma4): 1.299
sym_test <- test_symmetry(res_auto)
cat(" Symmetric:", sym_test$is_symmetric, "\n")
#> Symmetric: TRUE
# Dispatch recommendation
dispatch_auto <- pmm_dispatch(res_auto)
#> n = 392 | gamma3 = +0.218 | gamma4 = +1.299
#> g2(PMM2) = 0.9856 | g3(PMM3) = 0.8951
#> >>> gamma3 = 0.218, gamma4 = 1.299: near-Gaussian residuals. No PMM advantage expected. Use OLS.
# Fit PMM3 (quadratic model)
fit_auto_pmm3 <- lm_pmm3(mpg ~ horsepower + I(horsepower^2),
data = auto_complete)
# Fit PMM2 for comparison
fit_auto_pmm2 <- lm_pmm2(mpg ~ horsepower + I(horsepower^2),
data = auto_complete, na.action = na.omit)
# Compare all three methods
comparison_auto <- data.frame(
Method = c("OLS", "PMM2", "PMM3"),
Intercept = c(coef(fit_auto_ols)[1], coef(fit_auto_pmm2)[1],
coef(fit_auto_pmm3)[1]),
HP = c(coef(fit_auto_ols)[2], coef(fit_auto_pmm2)[2],
coef(fit_auto_pmm3)[2]),
HP_sq = c(coef(fit_auto_ols)[3], coef(fit_auto_pmm2)[3],
coef(fit_auto_pmm3)[3])
)
print(comparison_auto, row.names = FALSE, digits = 6)
#> Method Intercept HP HP_sq
#> OLS 56.9001 -0.466190 0.00123054
#> PMM2 56.0466 -0.454033 0.00119689
#> PMM3 58.1803 -0.488974 0.00131442
cat("\nPMM3 g3 =", fit_auto_pmm3@g_coefficient,
" (improvement:", round((1 - fit_auto_pmm3@g_coefficient) * 100, 1), "%)\n")
#>
#> PMM3 g3 = 0.895111 (improvement: 10.5 %)
hp_seq <- seq(min(auto_complete$horsepower),
max(auto_complete$horsepower), length.out = 200)
newdata <- data.frame(horsepower = hp_seq)
pred_ols <- predict(fit_auto_ols, newdata = newdata)
pred_pmm3 <- predict(fit_auto_pmm3,
newdata = data.frame(horsepower = hp_seq,
`I(horsepower^2)` = hp_seq^2))
plot(auto_complete$horsepower, auto_complete$mpg,
main = "MPG vs Horsepower: OLS and PMM3 Quadratic Fits",
xlab = "Horsepower", ylab = "Miles per Gallon",
col = "gray70", pch = 16, cex = 0.7)
lines(hp_seq, pred_ols, col = "blue", lwd = 2)
lines(hp_seq, as.numeric(pred_pmm3), col = "red", lwd = 2, lty = 2)
legend("topright", legend = c("OLS", "PMM3"),
col = c("blue", "red"), lty = c(1, 2), lwd = 2)
For contrast, the regression of MPG vs Acceleration produces asymmetric residuals (\(\gamma_3 \approx 0.5\)), where PMM2 is appropriate instead.
# Linear model: MPG vs Acceleration
fit_accel_ols <- lm(mpg ~ acceleration, data = auto_mpg)
res_accel <- residuals(fit_accel_ols)
cat("Acceleration residual diagnostics:\n")
#> Acceleration residual diagnostics:
cat(" Skewness (gamma3):", round(pmm_skewness(res_accel), 3), "\n")
#> Skewness (gamma3): 0.497
cat(" Excess kurtosis (gamma4):", round(pmm_kurtosis(res_accel), 3), "\n")
#> Excess kurtosis (gamma4): -0.33
# Dispatch
dispatch_accel <- pmm_dispatch(res_accel)
#> n = 398 | gamma3 = +0.497 | gamma4 = -0.330
#> g2(PMM2) = 0.8519 | g3(PMM3) = 0.9779
#> >>> |gamma3| = 0.497 > 0.3 and g2 = 0.852 < 0.95: moderate asymmetry, PMM2 worthwhile (14.8% reduction).
# Compare
fit_accel_pmm2 <- lm_pmm2(mpg ~ acceleration, data = auto_mpg, na.action = na.omit)
vf2_accel <- pmm2_variance_factor(fit_accel_pmm2@m2, fit_accel_pmm2@m3, fit_accel_pmm2@m4)
cat("\nPMM2 g2 =", vf2_accel$g2,
" (improvement:", round((1 - vf2_accel$g2) * 100, 1), "%)\n")
#>
#> PMM2 g2 = 0.8519 (improvement: 14.8 %)
This demonstrates how pmm_dispatch() guides users to the correct method: PMM3 for symmetric non-Gaussian residuals, PMM2 for asymmetric residuals.
PMM3 extends naturally to multiple predictors.
n <- 300
x1 <- rnorm(n)
x2 <- runif(n, -2, 2)
x3 <- rnorm(n, 3, 1)
# True parameters
beta <- c(5, 1.5, -0.8, 2.0)
# Beta(3,3)-based symmetric errors (platykurtic, gamma4 ≈ -0.67)
errors_beta <- (rbeta(n, 3, 3) - 0.5) * sqrt(12 * 3)
y_multi <- beta[1] + beta[2]*x1 + beta[3]*x2 + beta[4]*x3 + errors_beta
dat_multi <- data.frame(y = y_multi, x1 = x1, x2 = x2, x3 = x3)
# Fit
fit_multi_ols <- lm(y ~ x1 + x2 + x3, data = dat_multi)
fit_multi_pmm3 <- lm_pmm3(y ~ x1 + x2 + x3, data = dat_multi)
comparison_multi <- data.frame(
Parameter = c("Intercept", "x1", "x2", "x3"),
True = beta,
OLS = coef(fit_multi_ols),
PMM3 = coef(fit_multi_pmm3),
Diff = coef(fit_multi_pmm3) - coef(fit_multi_ols)
)
print(comparison_multi, row.names = FALSE, digits = 4)
#> Parameter True OLS PMM3 Diff
#> Intercept 5.0 4.9943 4.9668 -0.027433
#> x1 1.5 1.6223 1.6114 -0.010928
#> x2 -0.8 -0.7644 -0.7735 -0.009073
#> x3 2.0 1.9923 1.9992 0.006979
cat("\ng3 =", fit_multi_pmm3@g_coefficient,
" (improvement:", round((1 - fit_multi_pmm3@g_coefficient) * 100, 1), "%)\n")
#>
#> g3 = 0.9000364 (improvement: 10 %)
PMM3 supports models without an intercept term.
# Model through the origin
y_noint <- 3 * x1 + errors_beta
dat_noint <- data.frame(y = y_noint, x1 = x1)
fit_noint <- lm_pmm3(y ~ x1 - 1, data = dat_noint)
cat("True slope: 3\n")
#> True slope: 3
cat("PMM3 estimate:", coef(fit_noint), "\n")
#> PMM3 estimate: 3.11516
cat("Converged:", fit_noint@convergence, "\n")
#> Converged: TRUEThe summary() method reports several PMM3-specific
quantities:
summary(fit_pmm3)
#> PMM3 estimation results (S = 3, symmetric errors)
#> Call:
#> lm_pmm3(formula = y ~ x, data = dat)
#>
#> Coefficients:
#> (Intercept) x
#> 9.916678 2.518456
#>
#> Central moments of initial residuals:
#> m2 = 0.9555941
#> m4 = 1.658808
#> m6 = 3.424153
#>
#> Theoretical characteristics of PMM3 (S = 3):
#> gamma4 = -1.183442
#> gamma6 = 6.675667
#> g3 = 0.3082707 (expected ratio Var[PMM3]/Var[OLS])
#> kappa = 1.231908
#>
#> Algorithm information:
#> Convergence status: TRUE
#> Iterations: 3
| Quantity | Meaning |
|---|---|
| m2, m4, m6 | Central moments of the initial (OLS) residuals |
| gamma4 | Excess kurtosis: negative = platykurtic, zero = Gaussian |
| gamma6 | Sixth-order cumulant coefficient |
| g3 | Theoretical efficiency factor: \(\text{Var}(\hat\beta_{\text{PMM3}}) / \text{Var}(\hat\beta_{\text{OLS}})\) |
| kappa | Optimal score coefficient \((m_6 - 3 m_2 m_4)/(m_4 - 3 m_2^2)\) used in the NR score equations |
| Convergence | Whether the Newton-Raphson algorithm converged |
| Iterations | Number of NR iterations performed |
Rule of thumb: If \(g_3 < 0.90\), PMM3 provides a meaningful improvement over OLS (\(> 10\%\) variance reduction).
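The reported \(\kappa\) and \(g_3\) can be reproduced by hand from the central moments shown in the summary. The closed forms below are inferred from the printed values and the score definition; treat them as a sketch of the theory, not as the package's internal code.

```r
# Reproduce kappa and g3 from the central moments of the initial residuals
m2 <- 0.9555941; m4 <- 1.658808; m6 <- 3.424153
kappa <- (m6 - 3 * m2 * m4) / (m4 - 3 * m2^2)   # optimal cubic-score coefficient
g3 <- (kappa^2 * m2 - 2 * kappa * m4 + m6) /
  (m2 * (kappa - 3 * m2)^2)                      # E[Z^2] / (m2 * E[Z']^2)
c(kappa = kappa, g3 = g3)   # kappa ≈ 1.2319, g3 ≈ 0.3083
```

Both values match the summary output above, which is a useful sanity check when porting the method or debugging a fit.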
By default, PMM3 computes \(\kappa\) from OLS residuals and holds it fixed throughout the Newton-Raphson iterations. In adaptive mode, \(\kappa\) is re-estimated from the current residuals at each iteration:
fit_fixed <- lm_pmm3(y ~ x, data = dat, adaptive = FALSE)
fit_adaptive <- lm_pmm3(y ~ x, data = dat, adaptive = TRUE)
comparison_adapt <- data.frame(
Mode = c("Fixed kappa", "Adaptive kappa"),
Intercept = c(coef(fit_fixed)[1], coef(fit_adaptive)[1]),
Slope = c(coef(fit_fixed)[2], coef(fit_adaptive)[2]),
Iterations = c(fit_fixed@iterations, fit_adaptive@iterations)
)
print(comparison_adapt, row.names = FALSE, digits = 5)
#> Mode Intercept Slope Iterations
#> Fixed kappa 9.9167 2.5185 3
#> Adaptive kappa 9.9177 2.5183 4
Adaptive mode can improve estimates when the error distribution departs significantly from what the initial OLS residuals suggest, at the cost of additional iterations.
PMM3 provides four diagnostic plots. The pmm_dispatch() helper recommends a method directly from a residual vector:
# Symmetric platykurtic errors → PMM3
pmm_dispatch(runif(500, -1, 1))
#> n = 500 | gamma3 = -0.026 | gamma4 = -1.272
#> g2(PMM2) = 0.9990 | g3(PMM3) = 0.2438
#> >>> |gamma3| = 0.026 < 0.3 (symmetric) and gamma4 = -1.272 < -0.7 (platykurtic). PMM3 expected to reduce variance by 75.6%.
# Asymmetric errors → PMM2
pmm_dispatch(rexp(500) - 1)
#> n = 500 | gamma3 = +2.351 | gamma4 = +8.946
#> g2(PMM2) = 0.4953 | g3(PMM3) = 0.7901
#> >>> |gamma3| = 2.351 > 1.0: strong asymmetry detected. PMM2 expected to reduce variance by 50.5%.
# Gaussian errors → OLS
pmm_dispatch(rnorm(500))
#> n = 500 | gamma3 = -0.001 | gamma4 = +0.007
#> g2(PMM2) = 1.0000 | g3(PMM3) = 1.0000
#> >>> gamma3 = -0.001, gamma4 = 0.007: near-Gaussian residuals. No PMM advantage expected. Use OLS.
mom <- compute_moments_pmm3(residuals(fit_ols))
cat("m2 =", mom$m2, "\n")
#> m2 = 0.9555941
cat("m4 =", mom$m4, "\n")
#> m4 = 1.658808
cat("m6 =", mom$m6, "\n")
#> m6 = 3.424153
cat("gamma4 =", mom$gamma4, "\n")
#> gamma4 = -1.183442
cat("kappa =", mom$kappa, "\n")
#> kappa = 1.231908
cat("g3 =", mom$g3, "\n")
#> g3 = 0.3082707
Use PMM3 when:

- residuals are approximately symmetric (\(\gamma_3 \approx 0\), test_symmetry() passes)
- residuals are clearly non-Gaussian, especially platykurtic (\(\gamma_4 < 0\))
- the estimated \(g_3 < 0.90\)

Use PMM2 instead when:

- residuals are clearly asymmetric (\(|\gamma_3| > 0.3\)) and \(g_2\) is well below 1

Stick with OLS when:

- residuals are near-Gaussian (\(\gamma_3 \approx 0\), \(\gamma_4 \approx 0\)), so \(g_2 \approx g_3 \approx 1\)
Use pmm_skewness(), pmm_kurtosis(), and test_symmetry() for residual diagnostics, and pmm_dispatch() for an automatic recommendation.

Based on Monte Carlo simulations (R = 1000 replications):
| Error Distribution | \(\gamma_4\) | \(g_3\) | Variance Reduction |
|---|---|---|---|
| Uniform | \(-1.20\) | \(0.64\) | 36% |
| Beta(2,2) symmetric | \(-0.86\) | \(0.78\) | 22% |
| Beta(3,3) symmetric | \(-0.57\) | \(0.88\) | 12% |
| Truncated Normal (\(|x| \leq 2\)) | \(-0.47\) | \(0.91\) | 9% |
| Gaussian (control) | \(0.00\) | \(1.00\) | 0% |
Key insight: The more platykurtic the errors (\(\gamma_4\) further from 0), the larger the efficiency gain from PMM3.
- vignette("pmm3_time_series") for AR, MA, ARMA, ARIMA models with PMM3
- vignette("pmm2_introduction") for asymmetric error estimation
- vignette("pmm2_time_series") for PMM2 time series models
- vignette("bootstrap_inference")

Zabolotnii S., Warsza Z.L., Tkachenko O. (2018). Polynomial Estimation of Linear Regression Parameters for the Asymmetric PDF of Errors. Springer AISC, vol 743. doi:10.1007/978-3-319-77179-3_75
Package functions: ?lm_pmm3, ?pmm_dispatch, ?compute_moments_pmm3, ?test_symmetry, ?pmm_gamma6