spicy 0.11.0

New features

`table_continuous_lm()`

Cluster-robust SEs via cluster and four vcov choices ("CR0"–"CR3"), dispatched to clubSandwich with Satterthwaite df (clubSandwich in Suggests).
vcov = "bootstrap" (nonparametric or cluster) and vcov = "jackknife" (leave-one-out / leave-one-cluster-out) variance estimators in pure base R, controlled by boot_n.
Three new effect_size choices alongside "f2": Cohen’s "d", Hedges’ "g" (two-group only), Hays’ "omega2". New effect_size_ci adds noncentral t / F CIs rendered inline as 0.18 [0.07, 0.30].
HC* estimators delegate to sandwich::vcovHC(); rank-deficient fits return a clean rank-by-rank covariance.

Harmonisation across the table family

Shared reporting vocabulary (decimal_mark, p_digits, align, named-labels) now spans cross_tab(), freq() and the three table_*() helpers, including APA-style p-value notation (<.001 / .045, no leading zero).
table_categorical()’s assoc_measure accepts a per-variable spec. When measures differ across rows the column collapses to "Effect size" and an APA-style Note. line documents the per-variable measure; phi on a non-2x2 errors.
All three table_*() functions gain as.data.frame(), tibble::as_tibble(), broom::tidy() and broom::glance() methods (broom in Suggests).

Quality and robustness

Classed conditions. Errors and warnings now carry stable classes (spicy_error / spicy_warning plus 11 leaf classes documented in ?spicy), so downstream code can dispatch via tryCatch() / withCallingHandlers() instead of matching message strings. rlang (>= 1.1.0) required.
Structured cli messages. Multi-line errors and warnings (vcov fallbacks, bootstrap/jackknife failures, padding migration, labels length mismatch) render as cli bullets.
Locale-deterministic ordering. Sorts in varlist(), freq(), cross_tab() and table_*() use method = "radix". Output is byte-stable across locales and platforms, matching Stata / SPSS guarantees.
Edge-case hardening. varlist() / code_book() / cross_tab() / freq() no longer crash on zero-length or all-NA Date / POSIXct / character columns or factors with no observed levels (R 4.6.0 sort() segfaults on these inputs).
Snapshot-locked rendering. tests/testthat/test-snapshots.R pins the exact console output of every spicy print method, so any unintended formatting drift surfaces as a PR diff.
API stability contract. ?spicy documents which exports are stable, stabilising or internal. pkgdown reference groups exports via four @family tags.
Cross-software validation. All 13 association measures agree with PSPP 2.0 (CROSSTABS /STATISTICS=ALL, 65 / 65 statistics on four datasets); Cohen’s d and Hedges’ g noncentral CIs are tested numerically against effectsize::cohens_d() / effectsize::hedges_g() (tolerance = 1e-6); point-estimate formulas and asymptotic standard errors follow DescTools (Signorell et al.).

Improvements

cross_tab() warns when correct = TRUE is ignored on a non-2x2 sub-table, when weights contains NA, and notes statistics computed on a sub-table after empty rows / columns are pruned.
cross_tab() validates decimal_mark, p_digits and simulate_B up front; freq() validates decimal_mark and tightens digits to a non-negative integer.
A user category literally named "N" or "Total" is no longer mis-rendered as the totals row in cross_tab().
table_continuous_lm(output = "long") returns n, df1, df2 as integer columns; predictor_label preserved on the degenerate-model fallback path.
cramer_v() / phi() doc states the CI uses the Fisher z-transformation (point estimate and p-value identical to DescTools / SPSS).
uncertainty_coef() doc states entropy uses 0 log 0 = 0 (matching SPSS, PSPP, Stata, Cover & Thomas).

Bug fixes

label_from_names() raises actionable errors on duplicate or empty new column names; trims whitespace and preserves the input class.
table_continuous_lm(output = "data.frame") names contrast CI columns from ci_level (was hardcoded to 95 %).
The categorical-predictor global Wald F degrades to NA on a singular coefficient covariance submatrix.
The degenerate-table branch of cramer_v(), yule_q(), gamma_gk(), kendall_tau_b() and somers_d() respects detail: scalar NA_real_ by default, fully shaped spicy_assoc_detail when detail = TRUE.
uncertainty_coef() returns a finite estimate (was NaN) when a marginal is zero.
somers_d(direction = "symmetric") returns the harmonic mean of the two asymmetric values, matching SPSS / PSPP CROSSTABS.
print.spicy_assoc_detail() / print.spicy_assoc_table() use APA-strict <.001 / .045 notation, matching the rest of the package.
varlist() / code_book() honour factor_levels = "all" for haven_labelled columns: declared-but-unobserved labels appear in the Values summary.
copy_clipboard() rejects row.names.as.col vectors of length ≠ 1 and empty strings; accumulates all messages from clipr::write_clip() instead of overwriting.
mean_n() / sum_n() reject non-integer min_valid >= 1 and min_valid > ncol; their digits requires a non-negative integer.

Breaking changes

table_continuous_lm() and table_categorical() default to decimal-point alignment for numeric columns (align = "decimal"). Pass align = "auto" for the previous behaviour.
build_ascii_table() / spicy_print_table(): padding switches from a string enum to a non-negative integer. Default 2L (was +5L); printed tables are roughly 40 % narrower. Migration: "compact" -> 0L, "normal" -> 2L, "wide" -> 4L.
table_categorical(assoc_measure = "auto") on a 2x2 table picks phi instead of cramer_v. Numeric value unchanged (|phi| = V on 2x2); only the column label changes.
freq() drops observations with NA weights (with a warning) instead of recoding them to zero. Aligns with cross_tab().
table_continuous_lm(output = "long") returns NA in es_type / es_value when effect_size = "none" (was "f2"), and renames sum_w to weighted_n.

spicy 0.10.0

New features

code_book() now accepts tidyselect-style variable selectors through ..., matching varlist() and vl().
code_book() gains a filename argument for the base name of CSV, Excel, and PDF exports. When NULL (the default), the filename is derived from title and falls back to "Codebook" when needed. Filenames are sanitized to portable ASCII consistently across platforms.
varlist() now summarizes matrix and array columns by their dimensions, and counts valid, missing, and distinct observations by rows.
freq() gains a factor_levels argument that mirrors varlist() and code_book(). With factor_levels = "all", declared-but-unobserved factor and labelled levels appear in the output with n = 0, matching SPSS FREQUENCIES; the default "observed" preserves the previous Stata tab-style behavior.

Improvements

varlist() now displays missing values as <NA> and <NaN> in the Values summary when include_na = TRUE, and quotes literal "NA", "NaN", and empty-string values so they cannot be confused with the missing markers.
varlist() now emits a column-named warning and marks the failing cell as <error: ...> when a column cannot be summarized, instead of silently writing "Invalid or unsupported format". Remaining columns are unaffected.
varlist() produces more precise Viewer titles for extraction, pipe, and literal get("name") expressions, while keeping ambiguous dynamic calls anonymous (vl: <data>).
code_book() now rejects partial-match names in ... (e.g. val = TRUE, tit = "x") that would otherwise be silently treated as tidyselect expressions, and surfaces varlist() selection errors directly.
freq() now resolves the weights argument via tidy-eval, so column references nested in compound expressions (e.g. weights = if (use_w) col else NULL) work as expected. Qualified expressions like weights = df2$w continue to take precedence over column lookup.
freq() validates digits, sort, weights, and the logical scalar arguments (valid, cum, rescale, styled) more strictly at the public boundary, with clearer error messages for non-finite values, NA, multi-element inputs, and non-numeric weight vectors.
freq() now documents the interaction of weights containing NA with rescale = TRUE (Stata pweight semantics) and the dropping of unused factor / labelled levels (Stata tab semantics, with code_book(factor_levels = "all") as the schema-style alternative).

Bug fixes

varlist() now displays labelled values in the same prefixed-label order for compact and values = TRUE summaries; previously the compact summary used data order.
varlist(values = TRUE) now deduplicates element types when summarizing list-columns. Previously list(1L, 2L, "a") produced "List(3): character, integer, integer"; now produces "List(3): character, integer".
include_na = TRUE now correctly appends <NA> markers for list-columns in both varlist() modes; previously it had no effect on this column type.
varlist() now validates column names up front and gives clearer errors for missing, empty, NA, or duplicate names.
varlist() now errors clearly when tidyselect expressions try to rename columns; ... is for selecting variables, not renaming.
freq(data, x, weights = NULL) now correctly treats the explicit NULL as “no weighting” instead of emitting a misleading "variable 'NULL' not found" error. Parameterized patterns like weights = if (use_w) wts else NULL are now supported.
print() for spicy_freq_table no longer crashes when the var_label attribute is NA_character_, numeric, or multi-element; the Label: line is silently skipped for any value that is not a single non-empty string.
freq() no longer surfaces the name of the ignored data vector in the printed footer when both data and x are passed as vectors. The footer now consistently shows the analyzed vector’s name.

spicy 0.9.0

Breaking changes

table_continuous() now enables inferential output by default when by is supplied. With a grouping variable, the p column from test is shown automatically (previous default hid it). This aligns the two table helpers: table_continuous() stays descriptive when by is absent, and reports the test p-value when by is supplied, matching table_continuous_lm()’s inferential default. To preserve the previous behavior, pass p_value = FALSE explicitly. statistic and effect_size remain FALSE by default and must still be enabled consciously.
varlist() now displays observed factor levels by default in Values, matching its role as a quick inspection of the current data. Use factor_levels = "all" to display unused factor levels as well, which was the previous default behavior and remains the default in code_book().

Minor improvements

code_book() gains a factor_levels argument. It defaults to "all" so exported codebooks continue to document all declared factor levels, including unused levels; use "observed" to mirror varlist() output.
freq() now prints the Freq. column as integers regardless of digits, which continues to control percentage precision. This matches the convention of SPSS, Stata, and SAS PROC FREQ for weighted counts and keeps the two numeric concepts (discrete counts vs. continuous percentages) visually distinct.
freq(..., styled = FALSE) now returns a genuinely plain data.frame with no spicy_freq_table rendering metadata clinging to it, so str(), dput(), and downstream programmatic use see only the tabulation columns. The metadata attributes (digits, data_name, var_name, var_label, class_name, n_total, n_valid, weighted, rescaled, weight_var) are now documented in @return and remain available on the invisibly returned spicy_freq_table object when styled = TRUE (the default).
table_continuous_lm() documentation now clarifies why p_value = TRUE and r2 = "r2" are the defaults, and robust-variance fallback warnings are now more explicit when a model matrix is singular.

Bug fixes

freq() now correctly resolves qualified weight expressions such as weights = other$w or weights = other[["w"]] even when the referenced column name also exists in data. Previously the bare-name fallback could silently pull the weight vector from the wrong data frame when column names collided.
freq() with sort and missing values now keeps the NA row at the end of the tabulation so the printed Cum. Percent and Cum. Valid Percent columns stay monotonic and match the Valid → Missing → Total display layout. Sorting previously could push the NA row between valid rows and make cumulative percentages appear to jump.
varlist() now preserves literal "NA" and empty-string values in the Values summary instead of removing them as if they were missing values.
varlist() now distinguishes actual NA values from NaN in the Values summary when include_na = TRUE.
varlist(values = TRUE) now preserves factor level order in the Values summary, matching the default compact factor display.
varlist() now validates values, tbl, and include_na up front and gives a clear error when one of them is not TRUE or FALSE.

spicy 0.8.0

New features

table_continuous_lm() adds APA-style bivariate linear-model tables for continuous outcomes. It acts as the model-based companion to table_continuous() for reporting fitted mean comparisons or slopes in an lm framework, with one predictor per model, model-based means for categorical predictors, optional case weights, classical or HC0-HC5 variance estimators, multiple output formats (ASCII, tinytable, gt, flextable, Excel, clipboard, and Word), output = "data.frame" for the wide raw table, output = "long" for the analytic long table, and configurable display of tests, confidence intervals, fit statistics, and effect sizes.

Minor improvements

Installed package vignettes now avoid embedding heavy HTML table and codebook widgets during CRAN builds, reducing package size while preserving rich pkgdown article rendering.
Website and vignette coverage now includes table_continuous_lm(), using the bundled sochealth data throughout and adding a dedicated article for model-based continuous summary tables.
table_continuous() and table_continuous_lm() now support dedicated display precision for effect-size columns, and table_continuous_lm() also supports separate precision for R² columns, so model fit and effect sizes can be formatted independently from descriptive values and test statistics.
table_continuous_lm() now keeps n as the unweighted analytic sample size in wide and rendered outputs, and can optionally add a separate Weighted n column reporting the sum of case weights.

spicy 0.7.0

New features

table_continuous() is a new helper for continuous summary tables. It computes descriptive statistics (mean, SD, min, max, confidence interval of the mean, and n) for numeric variables, with tidyselect column selection, optional grouping via by, and multiple output formats (ASCII, tinytable, gt, flextable, Excel, clipboard, and Word).
table_continuous() gains effect_size and effect_size_ci arguments. When by is used, effect_size = TRUE adds an “ES” column with the appropriate measure (Hedges’ g, eta-squared, rank-biserial r_rb, or epsilon-squared) chosen automatically based on the test method and number of groups, and effect_size_ci = TRUE appends the confidence interval in brackets.
table_continuous() gains a test argument ("welch", "student", or "nonparametric") to choose the group-comparison method, along with independent p_value and statistic display toggles so users can request either or both outputs when by is used.
ASCII console tables now split oversized outputs into stacked horizontal panels, repeating the left-most identifier columns so wide freq(), cross_tab(), table_categorical(), and table_continuous() prints stay readable in narrow consoles.

Breaking changes

table_categorical() replaces table_apa() as the public helper for categorical summary tables. It uses select and by, supports grouped cross-tabulation or one-way frequency-style tables when by = NULL, and consolidates output formats under a single output argument. Migrate existing table_apa() calls to table_categorical(), use output = "default" for ASCII tables and output = "data.frame" for plain data frames, and replace former output = "wide" / style = "report" paths with the formatted output engines.
Excel export now uses openxlsx2 instead of openxlsx for a lighter dependency footprint (no Rcpp compilation required).

Minor improvements

Package citation metadata now uses the current package title and CRAN DOI, so citation("spicy") matches DESCRIPTION and points to the package DOI.
table_categorical() and table_continuous() now print shorter ASCII titles without appending the input data frame name, and no longer require officer for output = "flextable" alone; officer is now required only for Word export paths that actually write .docx files.
table_continuous() now accepts tidyselect syntax in exclude in addition to character vectors, and no longer warns that test is ignored when it is still needed to compute effect sizes.

spicy 0.6.0

New features

New family of association measure functions for contingency tables: assoc_measures(), contingency_coef(), gamma_gk(), goodman_kruskal_tau(), kendall_tau_b(), kendall_tau_c(), lambda_gk(), phi(), somers_d(), uncertainty_coef(), and yule_q(). Each returns a numeric scalar by default; pass detail = TRUE for a named vector with estimate, confidence interval, and p-value.
cross_tab() gains assoc_measure and assoc_ci arguments. When both variables are ordered factors, it automatically selects Kendall’s Tau-b instead of Cramer’s V. The note format changes from Chi-2: 18.0 (df = 4) to Chi-2(4) = 18.0. Numeric attributes (chi2, df, p_value, assoc_measure, assoc_value, assoc_result) are now attached to the output data frame.
table_apa() now dynamically labels the association measure column based on the measure used, instead of always showing “Cramer’s V”. New assoc_measure and assoc_ci arguments are passed through to cross_tab().
table_apa() gains output = "gt" to produce a gt_tbl object with APA-style formatting, column spanners, and alignment.
table_apa() now correctly centers spanner labels over their column pairs in tinytable and flextable output.
All association measure functions and assoc_measures() gain a digits argument (default 3) that controls the number of decimal places when printed. The p-value always uses 3 decimal places or < 0.001.
detail = TRUE results now print with formatted output (aligned columns, fixed decimal places) via a new print.spicy_assoc_detail() method. assoc_measures() output uses a new print.spicy_assoc_table() method with the same formatting.
New bundled dataset sochealth: a simulated social-health survey (n = 1200, 24 variables) with variable labels, ordered factors, survey weights, and missing values. Includes four Likert-scaled life satisfaction items (life_sat_health, life_sat_work, life_sat_relationships, life_sat_standard) for demonstrating mean_n(), sum_n(), and count_n().

Bug fixes

count_n() now correctly counts NA values when count = NA and strict = TRUE are both used. List columns are now reported in verbose mode instead of causing silent errors.
cross_tab() rescale logic now operates on complete cases only, so the weighted total N matches the unweighted N when missing values are present (consistent with Stata behavior).
freq() now uses true NA consistently (instead of the "<NA>" string) in both weighted and unweighted paths. cum_valid_prop is now correctly NA for missing rows. Invalid digits and sort values are rejected with clear error messages.
mean_n() and sum_n() now validate min_valid and digits arguments, rejecting non-numeric, negative, or multi-element values.
mean_n(), sum_n(), and count_n() no longer trigger a tidyselect deprecation warning when select receives a character vector. Character vectors are now automatically wrapped with all_of().
table_apa() now preserves the original factor level order in row variables instead of sorting alphabetically. When drop_na = FALSE, the (Missing) category is placed at the bottom of each variable’s levels. percent_digits, p_digits, and v_digits are now validated.
table_apa() p-values no longer wrap across lines in tinytable HTML output.

Breaking changes

cramer_v() now accepts a detail argument. By default it returns a numeric scalar (as before). Pass detail = TRUE to get a 4-element named vector (estimate, ci_lower, ci_upper, p_value), or detail = TRUE, conf_level = NULL for a 2-element vector (estimate, p_value) without CI.

spicy 0.5.0

New features

New table_apa() helper to build APA-ready cross-tab reports with multiple output formats (wide, long, tinytable, flextable, excel, clipboard, word).
table_apa() exposes key cross_tab() controls for weighting and inference (weights, rescale, correct, simulate_p, simulate_B) and now handles missing values explicitly when drop_na = FALSE.

Bug fixes

count_n() no longer crashes when special = "NaN" is used with non-numeric columns. Passing count = NA now errors with a message directing to special = "NA".
cross_tab() fixes a spurious rescale warning for explicit all-ones weights and aligns the Cramer’s V formula with cramer_v().
table_apa() no longer leaks global options on error. The simulate_p default is aligned to FALSE.
varlist() title generation no longer crashes on unrecognizable expressions.

Minor improvements

copy_clipboard() parameter message renamed to show_message.
freq() now dispatches printing correctly via S3.
Removed unused collapse and stringi from Imports.

spicy 0.4.2

cross_tab() hardening: improved vector-mode detection (including labelled vectors), stricter weight validation, safer rescaling, and clearer early errors (e.g., explicit y = NULL).
cross_tab() statistics are now computed on non-empty margins in grouped tables, avoiding spurious NA results; internal core path refactored to remove dplyr/tibble from computation while preserving user-facing behavior.
freq() now errors clearly when x is missing for data.frame input and validates rescaling when weight sums are zero/non-finite.
count_n(), mean_n(), and sum_n() regex mode is hardened (regex = TRUE now validates/defaults select safely).
mean_n() and sum_n() now return NA (with warning) when no numeric columns are selected.
label_from_names() now validates input type (data.frame/tibble required).
cramer_v() now returns NA with warning for degenerate tables.
Dependency optimization: DT and clipr moved to Suggests; optional runtime checks added in code_book() and copy_clipboard().
Tests expanded with regression coverage for all the above edge cases.

spicy 0.4.1

Fixed CRAN incoming check notes by removing non-standard top-level files.

spicy 0.4.0

Print methods have been fully redesigned to produce clean, aligned ASCII tables inspired by Stata’s layout. The new implementation improves formatting, adds optional color support, and provides more consistent handling of totals and column spacing.
Output from freq() and cross_tab() now benefits from the enhanced print.spicy() formatting, offering clearer, more readable summary tables.
Documentation and internal tests were updated for clarity and consistency.
cross_tab() gains an explicit correct argument to control the use of Yates’ continuity correction for Chi-squared tests in 2x2 tables. The default behavior remains unchanged.
The documentation of cross_tab() was refined and harmonized, with a clearer high-level description, improved parameter wording, and expanded examples.
Minor cosmetic improvements were made to varlist() output: the title prefix now uses vl: instead of VARLIST, and the column name Ndist_val was renamed to N_distinct for improved readability and consistency.
Minor cosmetic improvement: ASCII table output no longer includes a closing bottom rule by default.

spicy 0.3.0

New function code_book(), which generates a comprehensive variable codebook that can be viewed interactively and exported to multiple formats (copy, print, CSV, Excel, PDF).

spicy 0.2.1

label_from_names() now correctly handles edge cases when the separator appears in the label or is missing.

spicy 0.2.0

New function label_from_names() to derive and assign variable labels from headers of the form "name<sep>label" (e.g. "name. label"). Especially useful for LimeSurvey CSV exports (Export results -> CSV -> Headings: Question code & question text), where the default separator is ". ".

spicy 0.1.0

Initial release

Introduces a collection of tools for variable inspection, descriptive summaries, and data exploration.
Provides functions to:
- Extract variable metadata and display compact summaries (varlist()).
- Compute frequency tables (freq()), cross-tabulations (cross_tab()), and Cramer’s V for categorical associations (cramer_v()).
- Generate descriptive statistics such as means (mean_n()), sums (sum_n()), and counts (count_n()) with automatic handling of missing data.
- Copy data (copy_clipboard()) directly to the clipboard for quick export.