New functions `cheapr_if_else`, `case` and `val_match` to make vectorised if-else operations much cheaper.
New function `with_local_seed` to help run reproducible expressions with a local seed, removing the need to set a seed globally. This is especially helpful for small expressions and comparisons that should not affect the global RNG state.
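The idea of a local seed can be sketched in base R. This is only an illustration of the concept, not the package's implementation; `local_seed_sketch` is a hypothetical helper name introduced here.

```r
# Hypothetical helper: evaluate `expr` under a temporary seed and then
# restore whatever global RNG state existed beforehand.
local_seed_sketch <- function(expr, seed) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (has_seed) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
    # Put the previous RNG state back when the function exits
    on.exit(assign(".Random.seed", old_seed, envir = globalenv()), add = TRUE)
  } else {
    # No RNG state existed before, so remove the one created by set.seed()
    on.exit(rm(".Random.seed", envir = globalenv()), add = TRUE)
  }
  set.seed(seed)
  expr  # the promise is forced here, after the seed has been set
}

set.seed(1)
a <- runif(1)

set.seed(1)
x <- local_seed_sketch(rnorm(3), seed = 42)  # does not disturb the global stream
b <- runif(1)

y <- local_seed_sketch(rnorm(3), seed = 42)  # same seed, same draws as x
```

Here `a` and `b` are the same draw from the global stream, and `x` and `y` are identical because both were generated under the same local seed.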
Various internal bug fixes related to the scalar functions.
Fixed a regression where `NULL` elements were not being correctly dropped in `new_df()`.
New factor functions `levels_rename`, `levels_add`, `levels_rm`, `levels_lump` and `levels_count`.
`overview` columns are abbreviated to save visual space, and histograms are printed by default.
`levels_drop` was not working correctly and has been fixed.
New functions `cheapr_var` and `cheapr_rev`.
`get_breaks` has been improved and a few small bugs have been fixed.
`as_discrete` gains a new argument `inf_label`.
Safety improvements to `as_discrete`.
Removed internal C++ functions as package installation was failing for some machines.
New scalar functions have been added and some renamed. Most are now prefixed with ‘val_’, or with ‘na_’ in the case of `NA`-specific scalar functions.
New cheap functions for binning continuous data into discrete bins. These include `get_breaks`, `as_discrete` and `bin`. `get_breaks` finds ‘pretty’ break-points of numeric data very quickly. `as_discrete` converts numeric data to discrete categories as a factor. `bin` is a low-level function for binning numeric data into the correct bins. It can also efficiently return the corresponding break values instead of the break indices through `codes = FALSE`.
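The three steps described above (find breaks, locate bins, label them) have rough base R analogues, sketched below. The base functions `pretty()`, `findInterval()` and `cut()` stand in for the package functions purely for illustration; they are not claimed to match the package's arguments or output exactly.

```r
x <- c(0.5, 2.3, 4.8, 7.1, 9.9)

# 1. 'Pretty' break-points that cover the range of x
breaks <- pretty(x, n = 5)

# 2. Low-level binning: index of the bin each value falls into
bin_index <- findInterval(x, breaks)

# 3. The break value at the start of each bin, instead of the index
bin_start <- breaks[bin_index]

# 4. Discrete categories as a factor with left-closed intervals
x_discrete <- cut(x, breaks, right = FALSE, include.lowest = TRUE)
```

Returning break values rather than indices, as in step 3, is useful when the bins feed straight into a grouped summary or a plot axis.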
New function `na_insert` to randomly insert `NA` values into a vector.
New function `vector_length` as a hybrid between `length` and `nrow`.
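A minimal sketch of what a hybrid between `length` and `nrow` could look like, assuming the only special case is data frames; `vector_length_sketch` is a hypothetical name and the package function may handle more object types.

```r
# Hypothetical helper: number of rows for data frames, length otherwise
vector_length_sketch <- function(x) {
  if (is.data.frame(x)) nrow(x) else length(x)
}

df <- data.frame(a = 1:4, b = letters[1:4])

n_vec <- vector_length_sketch(1:3)  # length of a plain vector
n_df <- vector_length_sketch(df)    # rows, whereas length(df) counts columns
```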
`gcd` and `scm` now make use of 64-bit integers internally and can accept ‘integer64’ objects. `scm` used to return `NA` once the 32-bit integer limit of `2^31 - 1` was reached if the input was an integer vector. This limit has now been raised to the 64-bit integer limit, which is approximately 9.223372e+18, and `scm` errors if that limit is exceeded.
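The 32-bit limit is easy to hit when computing a least common multiple. The sketch below uses plain base R with hypothetical helpers `gcd2` and `lcm2` (a textbook Euclidean algorithm on doubles), not the package's 64-bit implementation, to show where integer arithmetic gives up.

```r
# Hypothetical helpers: Euclidean GCD and the LCM derived from it
gcd2 <- function(a, b) {
  while (b != 0) {
    tmp <- b
    b <- a %% b
    a <- tmp
  }
  a
}
lcm2 <- function(a, b) (a / gcd2(a, b)) * b

int_max <- .Machine$integer.max  # 2^31 - 1

# 65536 and 65537 are coprime, so their LCM is their product,
# which is larger than the 32-bit integer limit
int_result <- suppressWarnings(65536L * 65537L)  # overflows to NA
dbl_result <- lcm2(65536, 65537)                 # exact in double precision
```

Doubles are only exact up to 2^53, which is why a 64-bit integer representation is needed to reach the larger limit quoted above.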
‘integer64’ objects are now lightly supported. They are not supported in any sequence functions or in the ‘set_math’ functions.
New functions `new_df` and `named_list`.
All factor levels utilities now begin with the prefix ‘levels_’.
New cheap factor functions `as_factor`, `levels_add_na`, `levels_drop_na`, `levels_drop` and `levels_reorder`.
`lag_` now uses `memmove` where possible.
Fixed an issue where `lag_(x)` was materialising `x` twice if `x` was an ALTREP integer sequence.
Range based subsetting, e.g. `sset(x, 1:10)` should now be faster as `memmove` is used where possible.
New functions `val_count` and `which_val` for common scalar operations.
Some functions gain a ‘names’ argument.
Replaced calls to `STRING_PTR` with `STRING_PTR_RO` to satisfy R package check results.
`lag_` should now be somewhat faster.
Fixed a small bug in `lag2_` that would produce incorrect results when supplying a vector of lags and an order vector.
A signed integer overflow bug in `lag2_` has been fixed. This occurred when supplying `NA` lags.
`lag2_` no longer fills the names of named vectors when the `fill` value is supplied.
New function `recycle` to help recycle R objects to a common size.
The `set` functions that update by reference are now ALTREP aware and take a copy when the input is an ALTREP object.
New function `lag2_` as a generalised solution for complex lags. It supports dynamic lag vectors, lags using an order vector, and custom run lengths. It doesn’t support updating by reference or long vectors.
New function `lag_` for very fast lags and leads on vectors and data frames. It includes a `set` argument allowing users to create a lagged vector by reference without copies.
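For readers unfamiliar with lags and leads, the operation itself can be sketched in base R. `lag_sketch` is a hypothetical helper, it always copies (so it says nothing about the by-reference behaviour), and treating a negative `n` as a lead is a convention assumed here.

```r
# Hypothetical helper: shift x by n positions, padding with `fill`
lag_sketch <- function(x, n = 1L, fill = NA) {
  len <- length(x)
  if (n >= 0) {
    k <- min(n, len)
    # Lag: pad the front, drop the last k values
    c(rep(fill, k), x[seq_len(len - k)])
  } else {
    k <- min(-n, len)
    # Lead: drop the first k values, pad the end
    c(x[seq_len(len - k) + k], rep(fill, k))
  }
}

lagged <- lag_sketch(1:5, 2)  # NA NA 1 2 3
led <- lag_sketch(1:5, -1)    # 2 3 4 5 NA
```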
`set_round` has been amended to improve floating point accuracy.
New ‘set’ Math operations inspired by ‘data.table’ and ‘collapse’ that transform data by reference.
Fixed an inconsistency in when `sequence_()` would error if supplied with a zero-length size argument.
Fixed a protection stack imbalance in `count_val(x)` when `x` is `NULL`.
`sset` has been optimised for wide data frames with many variables. It is also faster when applied to a data frame with dates, date-times and factors.
In `sset`, when `i` is a logical vector it must match the length of `x`.
`sset` can now handle ‘ALTREP’ compact real sequences as well.
`sset` is now parallelised when `i` is an ‘ALTREP’ compact integer sequence, e.g. `sset(x, 1:10)`.
`sset` now has an internal range-based subset method for ‘ALTREP’ integer sequences, for example those made using `:`.
New function `count_val` as a cheaper alternative to e.g. `sum(x == val)`.
Negative indexing in `sset` has been improved. It is also now partially parallelised.
Setting `recursive` to false should now be faster.
‘overview’ objects gain an additional list element “print_digits” which is passed to the print method in order to correctly round the summary statistics without affecting the ‘cheapr.digits’ option globally.
`factor_` and `na_rm` now handle data frames.
A bug in `sset.data.table` that caused further `set` calculations to produce warnings has been fixed.
`is_na.POSIXlt` and `sset.POSIXlt` have been rewritten to handle unbalanced ‘POSIXlt’ objects.
New function `sset` to consistently subset data frame rows and vectors in general.
`overview` now always returns an object of class “overview”. It also returns the number of observations instead of rows so that it makes sense for vector summaries as well as data frame summaries.
`sequence_` has been optimised and rewritten in C++. It now only checks for integer overflow when both `from` and `by` are integer vectors.
The internal function `list_as_df` has been rewritten in C++.
New function `overview` as a cheaper alternative to `summary`.
All of the `NA` handling functions now fall back to using `is.na` if an appropriate method cannot be found.
More support has been added for all objects with an `is.na` method.
`is_na` has been added as an S3 generic function which is parallelised and internally falls back on `is.na` if there are no suitable methods.
Additional list utility functions have been added.
Limited support for `vctrs_rcrd` objects has been added again.
`num_na` and similar functions no longer treat empty data frame rows as single observations but instead return the total number of `NA` values in the data frame.
Fixed a bug in `row_na_counts` and `col_na_counts` that would cause the session to crash when a column variable was a list.
For the time being, vctrs ‘vctrs_rcrd’ objects are no longer supported though this support may be re-added in the future.