freq() and cross_tab() are the core
tabulation functions in spicy. They handle factors, labelled variables
(from haven or labelled), weights, and missing values out of the box.
This vignette covers the main options using the bundled
sochealth dataset.
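The columns that freq() prints rest on simple arithmetic. The base-R sketch below illustrates that arithmetic only and is not spicy's implementation. It rebuilds a hypothetical smoking factor from the counts reported for sochealth later in this vignette (926 "No", 249 "Yes", 25 missing). It also takes the raw weighted education counts from the rescale = FALSE example and assumes rescaling is a simple proportional adjustment. All object names are illustrative.

```r
# Illustration data rebuilt from counts printed in this vignette (not sochealth itself)
smoking <- factor(c(rep("No", 926), rep("Yes", 249), rep(NA, 25)),
                  levels = c("No", "Yes"))

# Percent: share of all cases, missing values included
counts <- table(smoking, useNA = "ifany")
percent <- 100 * as.numeric(counts) / sum(counts)

# Valid Percent: share of non-missing cases (table() drops NA by default)
valid_counts <- table(smoking)
valid_percent <- 100 * as.numeric(valid_counts) / sum(valid_counts)

# Cumulative valid percent, the kind of column cum = TRUE adds
cum_valid_percent <- cumsum(valid_percent)

round(percent, 1)
round(valid_percent, 1)
round(cum_valid_percent, 1)

# Assumed proportional rescaling: multiply the weighted counts by
# n / sum(weighted counts) so the total equals the unweighted number of
# cases; the percentages do not change
raw_weighted <- c(257.86, 544.79, 393.82)   # from the rescale = FALSE output
n_cases <- 1200
rescaled <- raw_weighted * n_cases / sum(raw_weighted)
round(rescaled, 2)
```

The third rescaled value comes out as 394.98 rather than the printed 394.99 because the raw weighted counts used as input are themselves rounded to two decimals.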
Pass a data frame and a variable name to get counts and percentages:
freq(sochealth, education)
#> Frequency table: education
#>
#> Category │ Values Freq. Percent
#> ──────────┼─────────────────────────────────
#> Valid │ Lower secondary 261 21.8
#> │ Upper secondary 539 44.9
#> │ Tertiary 400 33.3
#> ──────────┼─────────────────────────────────
#> Total │ 1200 100.0
#>
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
Sort by frequency with sort = "-" (decreasing) or
sort = "+" (increasing). Sort alphabetically with
sort = "name+" or sort = "name-":
freq(sochealth, education, sort = "-")
#> Frequency table: education
#>
#> Category │ Values Freq. Percent
#> ──────────┼─────────────────────────────────
#> Valid │ Upper secondary 539 44.9
#> │ Tertiary 400 33.3
#> │ Lower secondary 261 21.8
#> ──────────┼─────────────────────────────────
#> Total │ 1200 100.0
#>
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
Sort alphabetically:
freq(sochealth, education, sort = "name+")
#> Frequency table: education
#>
#> Category │ Values Freq. Percent
#> ──────────┼─────────────────────────────────
#> Valid │ Lower secondary 261 21.8
#> │ Tertiary 400 33.3
#> │ Upper secondary 539 44.9
#> ──────────┼─────────────────────────────────
#> Total │ 1200 100.0
#>
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
Add cumulative percentage columns with cum = TRUE:
freq(sochealth, smoking, cum = TRUE)
#> Frequency table: smoking
#>
#> Category │ Values Freq. Percent Valid Percent Cum. Percent
#> ──────────┼─────────────────────────────────────────────────────
#> Valid │ No 926 77.2 78.8 77.2
#> │ Yes 249 20.8 21.2 97.9
#> Missing │ NA 25 2.1 100.0
#> ──────────┼─────────────────────────────────────────────────────
#> Total │ 1200 100.0 100.0 100.0
#>
#> Category │ Values Cum. Valid Percent
#> ──────────┼────────────────────────────
#> Valid │ No 78.8
#> │ Yes 100.0
#> Missing │ NA
#> ──────────┼────────────────────────────
#> Total │ 100.0
#>
#> Label: Current smoker
#> Class: factor
#> Data: sochealth
Supply a weight variable with weights. By default, freq() uses
rescale = TRUE, which scales the weighted counts so that their total
matches the unweighted number of cases (1200 here):
freq(sochealth, education, weights = weight)
#> Frequency table: education
#>
#> Category │ Values Freq. Percent
#> ──────────┼──────────────────────────────────
#> Valid │ Lower secondary 258.62 21.6
#> │ Upper secondary 546.40 45.5
#> │ Tertiary 394.99 32.9
#> ──────────┼──────────────────────────────────
#> Total │ 1200 100.0
#>
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
#> Weight: weight (rescaled)
Set rescale = FALSE to keep the raw weighted counts:
freq(sochealth, education, weights = weight, rescale = FALSE)
#> Frequency table: education
#>
#> Category │ Values Freq. Percent
#> ──────────┼───────────────────────────────────
#> Valid │ Lower secondary 257.86 21.6
#> │ Upper secondary 544.79 45.5
#> │ Tertiary 393.82 32.9
#> ──────────┼───────────────────────────────────
#> Total │ 1196.47 100.0
#>
#> Label: Highest education level
#> Class: ordered, factor
#> Data: sochealth
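#> Weight: weight
When a variable has value labels (e.g., imported from SPSS or Stata
with haven), freq() shows them by default in "[code] label" format.
Use labelled_levels = "labels" to show labels only, or
labelled_levels = "values" to show codes only. The example below
requires the labelled package: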
# Create a labelled version of the smoking variable
sh <- sochealth
sh$smoking_lbl <- labelled::labelled(
ifelse(sh$smoking == "Yes", 1L, 0L),
labels = c("Non-smoker" = 0L, "Current smoker" = 1L)
)
# Default: [code] label
freq(sh, smoking_lbl)
#> Frequency table: smoking_lbl
#>
#> Category │ Values Freq. Percent Valid Percent
#> ──────────┼───────────────────────────────────────────────────
#> Valid │ [0] Non-smoker 926 77.2 78.8
#> │ [1] Current smoker 249 20.8 21.2
#> Missing │ NA 25 2.1
#> ──────────┼───────────────────────────────────────────────────
#> Total │ 1200 100.0 100.0
#>
#> Class: haven_labelled, vctrs_vctr, integer
#> Data: sh
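The default "[code] label" display resembles what haven::as_factor() produces with levels = "both". The sketch below calls haven directly; it is installed as a dependency of labelled. It shows the three level styles on a small hypothetical vector x and illustrates the formats only. It does not describe how freq() works internally.

```r
library(haven)

# Small hypothetical vector with the same coding as sh$smoking_lbl
x <- labelled(c(0L, 1L, 1L, NA),
              labels = c("Non-smoker" = 0L, "Current smoker" = 1L))

f_both   <- as_factor(x, levels = "both")    # "[code] label"
f_labels <- as_factor(x, levels = "labels")  # labels only
f_values <- as_factor(x, levels = "values")  # codes only

levels(f_both)
levels(f_labels)
levels(f_values)
```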
# Labels only (no codes)
freq(sh, smoking_lbl, labelled_levels = "labels")
#> Frequency table: smoking_lbl
#>
#> Category │ Values Freq. Percent Valid Percent
#> ──────────┼───────────────────────────────────────────────
#> Valid │ Non-smoker 926 77.2 78.8
#> │ Current smoker 249 20.8 21.2
#> Missing │ NA 25 2.1
#> ──────────┼───────────────────────────────────────────────
#> Total │ 1200 100.0 100.0
#>
#> Class: haven_labelled, vctrs_vctr, integer
#> Data: sh
# Codes only (no labels)
freq(sh, smoking_lbl, labelled_levels = "values")
#> Frequency table: smoking_lbl
#>
#> Category │ Values Freq. Percent Valid Percent
#> ──────────┼───────────────────────────────────────
#> Valid │ 0 926 77.2 78.8
#> │ 1 249 20.8 21.2
#> Missing │ NA 25 2.1
#> ──────────┼───────────────────────────────────────
#> Total │ 1200 100.0 100.0
#>
#> Class: haven_labelled, vctrs_vctr, integer
#> Data: sh
Treat specific values as missing with na_val. Here the "High" income
group is removed from the valid categories and counted as missing:
freq(sochealth, income_group, na_val = "High")
#> Frequency table: income_group
#>
#> Category │ Values Freq. Percent Valid Percent
#> ──────────┼─────────────────────────────────────────────
#> Valid │ Low 247 20.6 25.6
#> │ Lower middle 388 32.3 40.3
#> │ Upper middle 328 27.3 34.1
#> Missing │ NA 237 19.8
#> ──────────┼─────────────────────────────────────────────
#> Total │ 1200 100.0 100.0
#>
#> Label: Household income group
#> Class: ordered, factor
#> Data: sochealth
Cross two variables to get a contingency table with a chi-squared test and an effect size:
cross_tab(sochealth, smoking, education)
#> Crosstable: smoking x education (N)
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 179 415 332
#> Yes │ 78 112 59
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 257 527 391
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 926
#> Yes │ 249
#> ─────────────┼────────────
#> Total │ 1175
#>
#> Chi-2(2) = 21.6, p < 0.001
#> Cramer's V = 0.14
Use percent = "row" or percent = "col" to
display percentages instead of raw counts:
cross_tab(sochealth, smoking, education, percent = "col")
#> Crosstable: smoking x education (Column %)
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 69.6 78.7 84.9
#> Yes │ 30.4 21.3 15.1
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 100.0 100.0 100.0
#> N │ 257 527 391
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 78.8
#> Yes │ 21.2
#> ─────────────┼────────────
#> Total │ 100.0
#> N │ 1175
#>
#> Chi-2(2) = 21.6, p < 0.001
#> Cramer's V = 0.14
cross_tab(sochealth, smoking, education, percent = "row")
#> Crosstable: smoking x education (Row %)
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 19.3 44.8 35.9
#> Yes │ 31.3 45.0 23.7
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 21.9 44.9 33.3
#>
#> Values │ Total N
#> ─────────────┼───────────────────────
#> No │ 100.0 926
#> Yes │ 100.0 249
#> ─────────────┼───────────────────────
#> Total │ 100.0 1175
#>
#> Chi-2(2) = 21.6, p < 0.001
#> Cramer's V = 0.14
Stratify the table by a third variable with by. cross_tab() prints one
table, with its own test and effect size, for each level:
cross_tab(sochealth, smoking, education, by = sex)
#> Crosstable: smoking x education (N) | sex = Female
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 95 220 160
#> Yes │ 38 62 31
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 133 282 191
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 475
#> Yes │ 131
#> ─────────────┼────────────
#> Total │ 606
#>
#> Chi-2(2) = 7.1, p = 0.029
#> Cramer's V = 0.11
#>
#> Crosstable: smoking x education (N) | sex = Male
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 84 195 172
#> Yes │ 40 50 28
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 124 245 200
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 451
#> Yes │ 118
#> ─────────────┼────────────
#> Total │ 569
#>
#> Chi-2(2) = 15.6, p < 0.001
#> Cramer's V = 0.17
To stratify by more than one variable, combine them with
interaction():
cross_tab(sochealth, smoking, education,
by = interaction(sex, age_group))
#> Crosstable: smoking x education (N) | sex x age_group = Female.25-34
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 23 49 29
#> Yes │ 9 9 7
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 32 58 36
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 101
#> Yes │ 25
#> ─────────────┼────────────
#> Total │ 126
#>
#> Chi-2(2) = 2.1, p = 0.356
#> Cramer's V = 0.13
#>
#> Crosstable: smoking x education (N) | sex x age_group = Male.25-34
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 9 42 32
#> Yes │ 11 11 4
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 20 53 36
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 83
#> Yes │ 26
#> ─────────────┼────────────
#> Total │ 109
#>
#> Chi-2(2) = 14.2, p < 0.001
#> Cramer's V = 0.36
#>
#> Crosstable: smoking x education (N) | sex x age_group = Female.35-49
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 24 73 48
#> Yes │ 10 20 8
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 34 93 56
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 145
#> Yes │ 38
#> ─────────────┼────────────
#> Total │ 183
#>
#> Chi-2(2) = 3.0, p = 0.223
#> Cramer's V = 0.13
#>
#> Crosstable: smoking x education (N) | sex x age_group = Male.35-49
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 33 59 60
#> Yes │ 14 17 7
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 47 76 67
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 152
#> Yes │ 38
#> ─────────────┼────────────
#> Total │ 190
#>
#> Chi-2(2) = 6.9, p = 0.032
#> Cramer's V = 0.19
#>
#> Crosstable: smoking x education (N) | sex x age_group = Female.50-64
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 28 63 45
#> Yes │ 8 16 6
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 36 79 51
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 136
#> Yes │ 30
#> ─────────────┼────────────
#> Total │ 166
#>
#> Chi-2(2) = 2.0, p = 0.360
#> Cramer's V = 0.11
#>
#> Crosstable: smoking x education (N) | sex x age_group = Male.50-64
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 28 58 42
#> Yes │ 8 13 5
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 36 71 47
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 128
#> Yes │ 26
#> ─────────────┼────────────
#> Total │ 154
#>
#> Chi-2(2) = 2.1, p = 0.343
#> Cramer's V = 0.12
#>
#> Crosstable: smoking x education (N) | sex x age_group = Female.65-75
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 20 35 38
#> Yes │ 11 17 10
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 31 52 48
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 93
#> Yes │ 38
#> ─────────────┼────────────
#> Total │ 131
#>
#> Chi-2(2) = 2.5, p = 0.282
#> Cramer's V = 0.14
#>
#> Crosstable: smoking x education (N) | sex x age_group = Male.65-75
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 14 36 38
#> Yes │ 7 9 12
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 21 45 50
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 88
#> Yes │ 28
#> ─────────────┼────────────
#> Total │ 116
#>
#> Chi-2(2) = 1.4, p = 0.499
#> Cramer's V = 0.11
When both variables are ordered factors, cross_tab()
automatically switches from Cramer’s V to Kendall’s Tau-b:
cross_tab(sochealth, self_rated_health, education)
#> Crosstable: self_rated_health x education (N)
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ────────────────┼───────────────────────────────────────────────────────────
#> Poor │ 28 28 5
#> Fair │ 86 118 62
#> Good │ 102 263 193
#> Very good │ 44 118 133
#> ────────────────┼───────────────────────────────────────────────────────────
#> Total │ 260 527 393
#>
#> Values │ Total
#> ────────────────┼────────────
#> Poor │ 61
#> Fair │ 266
#> Good │ 558
#> Very good │ 295
#> ────────────────┼────────────
#> Total │ 1180
#>
#> Chi-2(6) = 73.2, p < 0.001
#> Kendall's Tau-b = 0.20
You can override the automatic selection with
assoc_measure:
cross_tab(sochealth, self_rated_health, education, assoc_measure = "gamma")
#> Crosstable: self_rated_health x education (N)
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ────────────────┼───────────────────────────────────────────────────────────
#> Poor │ 28 28 5
#> Fair │ 86 118 62
#> Good │ 102 263 193
#> Very good │ 44 118 133
#> ────────────────┼───────────────────────────────────────────────────────────
#> Total │ 260 527 393
#>
#> Values │ Total
#> ────────────────┼────────────
#> Poor │ 61
#> Fair │ 266
#> Good │ 558
#> Very good │ 295
#> ────────────────┼────────────
#> Total │ 1180
#>
#> Chi-2(6) = 73.2, p < 0.001
#> Goodman-Kruskal Gamma = 0.31
Add a 95% confidence interval for the association measure with
assoc_ci = TRUE:
cross_tab(sochealth, smoking, education, assoc_ci = TRUE)
#> Crosstable: smoking x education (N)
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 179 415 332
#> Yes │ 78 112 59
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 257 527 391
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 926
#> Yes │ 249
#> ─────────────┼────────────
#> Total │ 1175
#>
#> Chi-2(2) = 21.6, p < 0.001
#> Cramer's V = 0.14, 95% CI [0.08, 0.19]
Weights work as in freq(), except that cross_tab() does not rescale
by default. Without rescale = TRUE, the table shows raw weighted counts:
cross_tab(sochealth, smoking, education, weights = weight)
#> Crosstable: smoking x education (N)
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 176 417 324
#> Yes │ 79 114 60
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 255 531 384
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 917
#> Yes │ 253
#> ─────────────┼────────────
#> Total │ 1170
#>
#> Chi-2(2) = 21.3, p < 0.001
#> Cramer's V = 0.13
#> Weight: weight
With rescale = TRUE, the weighted total matches the unweighted
number of cases in the table (1175):
cross_tab(sochealth, smoking, education, weights = weight, rescale = TRUE)
#> Crosstable: smoking x education (N)
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 176 419 325
#> Yes │ 79 115 60
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 255 534 385
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 921
#> Yes │ 254
#> ─────────────┼────────────
#> Total │ 1175
#>
#> Chi-2(2) = 21.4, p < 0.001
#> Cramer's V = 0.13
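#> Weight: weight (rescaled)
When expected cell counts are small, the chi-squared approximation can be
unreliable. Set simulate_p = TRUE to compute a Monte Carlo p-value instead,
with simulate_B setting the number of replicates: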
cross_tab(sochealth, smoking, education,
simulate_p = TRUE, simulate_B = 5000)
#> Crosstable: smoking x education (N)
#>
#> Values │ Lower secondary Upper secondary Tertiary
#> ─────────────┼───────────────────────────────────────────────────────────
#> No │ 179 415 332
#> Yes │ 78 112 59
#> ─────────────┼───────────────────────────────────────────────────────────
#> Total │ 257 527 391
#>
#> Values │ Total
#> ─────────────┼────────────
#> No │ 926
#> Yes │ 249
#> ─────────────┼────────────
#> Total │ 1175
#>
#> Chi-2(NA) = 21.6, p < 0.001 (simulated)
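#> Cramer's V = 0.14
Because the p-value is simulated, no degrees of freedom are reported (Chi-2(NA)).
You can set package-wide defaults with options() so you
don’t have to repeat the same arguments in every call.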
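The option names that spicy reads are not listed in this section, so the sketch below shows only the general base-R pattern, using the base option digits as a stand-in. options() returns the previous values, and passing them back restores the old settings. Inside a function, on.exit() does the same even if an error occurs. The with_digits() helper is hypothetical.

```r
# Stand-in example with the base R option "digits" (not a spicy option)
before <- getOption("digits")

old <- options(digits = 4)   # options() returns the previous values in a list
print(pi)                    # prints 3.142
options(old)                 # restore the previous value

# Hypothetical helper: set an option for one call and always restore it
with_digits <- function(n, expr) {
  old <- options(digits = n)
  on.exit(options(old), add = TRUE)
  format(expr)               # format() uses getOption("digits") by default
}
with_digits(3, pi)
```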
Further documentation:
- vignette("association-measures") - choosing the right effect size for your contingency table.
- vignette("table-categorical") - building publication-ready categorical tables.
- ?freq and ?cross_tab for the full argument reference.
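The association statistics printed by cross_tab() can be cross-checked in base R from the cell counts shown in the smoking x education and self_rated_health x education tables. This is a hand-rolled illustration, not spicy's implementation. It uses chisq.test() for the chi-squared statistic and the simulated p-value, and the standard formula for Cramér's V. For Kendall's tau-b and Goodman-Kruskal gamma it counts concordant and discordant pairs. The helpers pair_counts(), gk_gamma() and kendall_tau_b() are hypothetical names.

```r
# Cell counts copied from the cross_tab() outputs in this vignette
# (illustration data, not recomputed from sochealth)
smk_edu <- matrix(c(179, 415, 332,
                     78, 112,  59),
                  nrow = 2, byrow = TRUE)      # smoking (No, Yes) x education
srh_edu <- matrix(c( 28,  28,   5,
                     86, 118,  62,
                    102, 263, 193,
                     44, 118, 133),
                  nrow = 4, byrow = TRUE)      # self_rated_health x education

# Pearson chi-squared test; chisq.test() applies no continuity
# correction to tables larger than 2 x 2
chi <- chisq.test(smk_edu)
n <- sum(smk_edu)
cramer_v <- sqrt(unname(chi$statistic) / (n * (min(dim(smk_edu)) - 1)))

# Monte Carlo p-value from 5000 tables with the same margins;
# chisq.test() reports the degrees of freedom as NA in this case
set.seed(42)
sim <- chisq.test(smk_edu, simulate.p.value = TRUE, B = 5000)

# Concordant (C) and discordant (D) pairs; rows and columns must both
# be in increasing order of the underlying ordinal scale
pair_counts <- function(tab) {
  C <- 0
  D <- 0
  nr <- nrow(tab)
  nc <- ncol(tab)
  for (i in seq_len(nr - 1)) {
    below <- tab[(i + 1):nr, , drop = FALSE]   # rows ranked higher than row i
    for (j in seq_len(nc)) {
      if (j < nc) C <- C + tab[i, j] * sum(below[, (j + 1):nc])
      if (j > 1)  D <- D + tab[i, j] * sum(below[, 1:(j - 1)])
    }
  }
  c(C = C, D = D)
}

# Goodman-Kruskal gamma ignores tied pairs
gk_gamma <- function(tab) {
  p <- pair_counts(tab)
  unname((p["C"] - p["D"]) / (p["C"] + p["D"]))
}

# Kendall's tau-b corrects the denominator for pairs tied on rows or columns
kendall_tau_b <- function(tab) {
  p <- pair_counts(tab)
  n <- sum(tab)
  n0 <- n * (n - 1) / 2
  n1 <- sum(rowSums(tab) * (rowSums(tab) - 1) / 2)
  n2 <- sum(colSums(tab) * (colSums(tab) - 1) / 2)
  unname((p["C"] - p["D"]) / sqrt((n0 - n1) * (n0 - n2)))
}

round(c(chi2 = unname(chi$statistic), cramer_v = cramer_v), 2)
sim$parameter
round(c(tau_b = kendall_tau_b(srh_edu), gamma = gk_gamma(srh_edu)), 2)
```

With these counts the sketch gives a chi-squared statistic of about 21.6, V = 0.14, tau-b = 0.20 and gamma = 0.31, which match the printed values to the displayed precision. It does not reproduce the confidence intervals that assoc_ci = TRUE adds.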