tidyna

Tired of littering your code with na.rm = TRUE?

tidyna masks common R functions so that they remove NAs by default and warn you when they do. It also handles some special cases, such as all-NA input, and sets the table() default to useNA = "ifany".

Installation

Install from CRAN:

install.packages("tidyna")

Or install the development version from GitHub:

# install.packages("pak")
pak::pak("statzhero/tidyna")

Usage

library(tidyna)

x <- c(1, 2, NA)
mean(x)
#> ⚠️ 1 missing value removed.
#> [1] 1.5

Suppress warnings with options(tidyna.warn = FALSE).
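To make the masking idea concrete, here is a minimal base R sketch of an NA-aware wrapper. The name mean_sketch is hypothetical and this is not tidyna's actual implementation; it only assumes that the tidyna.warn option is read with getOption() and treated as TRUE when unset.

```r
# Hypothetical wrapper: count NAs, optionally warn, then call base R.
mean_sketch <- function(x) {
  n_missing <- sum(is.na(x))
  # Assumption: an unset option behaves like TRUE.
  if (n_missing > 0 && isTRUE(getOption("tidyna.warn", TRUE))) {
    msg <- sprintf("%d missing value%s removed.",
                   n_missing, if (n_missing == 1) "" else "s")
    warning(msg, call. = FALSE)
  }
  base::mean(x, na.rm = TRUE)
}

x <- c(1, 2, NA)
mean_sketch(x)                # warns, then returns 1.5
options(tidyna.warn = FALSE)
mean_sketch(x)                # silent, returns 1.5
options(tidyna.warn = NULL)   # restore the unset state
```

Setting the option once at the top of a script silences every wrapper that checks it, which is why a global option is a natural fit here.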

Functions

Summary: mean, sum, prod, sd, var, median, quantile
Extrema: min, max, pmin, pmax, range
Logical: any, all
Row-wise: rowSums, rowMeans
Correlation: cor
Table: table

Special cases

All-NA input is configurable. By default, tidyna throws an error when all values are NA, to avoid misleading results such as Inf, NaN, or 0:

base::sum(c(NA, NA), na.rm = TRUE)
#> [1] 0

sum(c(NA, NA))
#> Error in `sum()`:
#> ! All values are NA; check if something went wrong.
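For context, these are the values base R returns for all-NA input once na.rm = TRUE is set. The snippet uses base functions only and does not require tidyna.

```r
x <- c(NA_real_, NA_real_)

base::sum(x, na.rm = TRUE)    # 0
base::prod(x, na.rm = TRUE)   # 1
base::mean(x, na.rm = TRUE)   # NaN

# max() and min() also warn that there are no non-missing arguments.
suppressWarnings(base::max(x, na.rm = TRUE))   # -Inf
suppressWarnings(base::min(x, na.rm = TRUE))   # Inf
```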

You can change this behavior with the all_na argument or the tidyna.all_na option:

# Return base R behavior (NaN, Inf, 0, etc.)
sum(c(NA, NA), all_na = "base")
#> [1] 0

# Always return NA
sum(c(NA, NA), all_na = "na")
#> [1] NA
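The three behaviors could be dispatched roughly as follows. This is a sketch, not tidyna's source: sum_sketch is a hypothetical name, and the label "error" for the default mode is an assumption, since only "base" and "na" appear above.

```r
# Hypothetical helper illustrating an all_na switch.
# Assumption: the default mode is called "error" and can be set globally
# through the tidyna.all_na option.
sum_sketch <- function(x, all_na = getOption("tidyna.all_na", "error")) {
  all_na <- match.arg(all_na, c("error", "base", "na"))
  if (length(x) > 0 && all(is.na(x))) {
    if (all_na == "error") {
      stop("All values are NA; check if something went wrong.", call. = FALSE)
    }
    if (all_na == "na") {
      return(NA)
    }
    # all_na == "base": fall through to the base R result
  }
  base::sum(x, na.rm = TRUE)
}

sum_sketch(c(NA, NA), all_na = "base")   # 0
sum_sketch(c(NA, NA), all_na = "na")     # NA
sum_sketch(c(1, NA))                     # 1
```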

rowSums/rowMeans return NA for all-NA rows, but error if the entire matrix is NA. Also configurable via all_na.
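To see why all-NA rows need special treatment, compare with base R, which returns 0 or NaN for such rows. The detection step at the end is one possible approach in plain base R, not necessarily what tidyna does internally.

```r
m <- rbind(c(1, NA, 3),
           c(NA, NA, NA))

base::rowSums(m, na.rm = TRUE)    # 4 0
base::rowMeans(m, na.rm = TRUE)   # 2 NaN

# One way to flag all-NA rows: count the non-missing entries per row.
all_na_row <- base::rowSums(!is.na(m)) == 0
res <- base::rowSums(m, na.rm = TRUE)
res[all_na_row] <- NA
res                               # 4 NA
```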

pmax/pmin return NA for positions where all inputs are NA (with a warning), but error if every position is all-NA. Also configurable via all_na.
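The position-wise logic can be illustrated with base R alone. The all_na_pos vector below is a hypothetical way to find positions where every input is missing; it is shown only to make the rule concrete.

```r
a <- c(1, NA, NA)
b <- c(2, 5, NA)

base::pmax(a, b)                 # 2 NA NA
base::pmax(a, b, na.rm = TRUE)   # 2 5 NA

# Positions where every input is missing.
all_na_pos <- is.na(a) & is.na(b)
all_na_pos                       # FALSE FALSE TRUE
```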

cor defaults to use = "pairwise.complete.obs" instead of base R's use = "everything", which returns NA when NAs are present.
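A quick base R illustration of what the use argument changes. It relies only on stats::cor, with made-up data in which the four complete pairs are perfectly linear.

```r
x <- c(1, 2, 3, NA, 5)
y <- c(2, 4, 6, 8, 10)

stats::cor(x, y)                                  # NA
stats::cor(x, y, use = "pairwise.complete.obs")   # 1, up to floating-point error
```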

table defaults to useNA = "ifany", showing NA counts when present rather than silently dropping them.
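The difference is easy to see in base R, where useNA defaults to "no". The vector below is illustrative data.

```r
x <- c("a", "b", NA, "a")

base::table(x)                    # counts for a and b only
base::table(x, useNA = "ifany")   # adds an <NA> entry with count 1
```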

Performance

There is no free lunch. The tidyna package adds some overhead.

For most functions, such as mean(), the overhead is negligible (about 1.1x). rowMeans() and rowSums(), however, require an extra pass to detect all-NA rows, so the slowdown is substantial (3-4x).

I am still investigating whether memory allocation needs to be addressed.
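To get a feel for the cost of an extra pass on your own machine, a rough timing sketch in base R follows. row_means_sketch is a hypothetical stand-in, not tidyna's implementation, and the matrix size and repetition count are arbitrary; timings will vary by machine.

```r
set.seed(1)
m <- matrix(rnorm(1e6), nrow = 1e4)
m[sample(length(m), 1e4)] <- NA

# Hypothetical NA-aware rowMeans with an extra detection pass.
row_means_sketch <- function(m) {
  all_na_row <- base::rowSums(!is.na(m)) == 0   # extra pass over the matrix
  res <- base::rowMeans(m, na.rm = TRUE)
  res[all_na_row] <- NA
  res
}

# Compare elapsed times; no particular ratio is guaranteed.
system.time(for (i in 1:20) base::rowMeans(m, na.rm = TRUE))
system.time(for (i in 1:20) row_means_sketch(m))
```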

Roadmap

Add explicit _aware suffixed versions (mean_aware, sum_aware, etc.) for users who prefer not to mask base functions.

Related packages

naflex: Conditional NA removal based on thresholds
na.tools: Utilities for working with missing values