- Fixing warning on Debian systems:

```
Result: WARN
Found the following significant warnings:
  RcppExports.cpp:865:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
  RcppExports.cpp:899:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
  RcppExports.cpp:933:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
  RcppExports.cpp:967:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
See ‘/home/hornik/tmp/R.check/r-devel-clang/Work/PKGS/philentropy.Rcheck/00install.out’ for details.
* used C++ compiler: ‘Debian clang version 17.0.5 (1)’
```
The warning was fixed by installing Rcpp v1.0.11.6 via `devtools::install_github("https://github.com/RcppCore/Rcpp")` and rerunning `Rcpp::compileAttributes()`, which regenerates `RcppExports.cpp`.
- `.../src/correlation.h`: use logical operators rather than bitwise operators (`|` -> `or`, the C++ alternative token for `||`), which otherwise raise `-Wbitwise-instead-of-logical` warnings in clang 14
- Use `R_xlen_t` instead of `int` during indexing
- `distance()` and all other individual information theory functions receive a new argument `epsilon` with default value `epsilon = 0.00001` to treat cases where individual distance or similarity computations yield `x / 0` or `0 / 0`. Instead of a hard-coded epsilon, users can now set `epsilon` according to their input vectors. (Many thanks to Joshua McNeill #26 for this great question)
- `dist_one_one()`, `dist_one_many()`, and `dist_many_many()` are added. They are fairly flexible intermediaries between `distance()` and the single distance functions. `dist_one_one()` expects two vectors (probability density functions) and returns a single value. `dist_one_many()` expects one vector (a probability density function) and one matrix (a set of probability density functions) and returns a vector of values. `dist_many_many()` expects two matrices (two sets of probability density functions) and returns a matrix of values. (Many thanks to Jakub Nowosad, see #27, #28, and the new vignette Many_Distance)
- The `dplyr` package dependency was removed and replaced by `poorman` due to the heavy dependency burden of `dplyr`, since `philentropy` only used `dplyr::between()`, which is now `poorman::between()` (Many thanks to Patrice Kiener for this suggestion)
- `distance(..., as.dist.obj = TRUE)` now returns the same values as `stats::dist()` when working with 2-dimensional input matrices (2 vector inputs) (see #29) (Many thanks to Jakub Nowosad (@Nowosad))

Example:

```r
library(philentropy)
m1 <- matrix(c(1, 2), ncol = 1)
dist(m1)
#>   1
#> 2 1
distance(m1, as.dist.obj = TRUE)
#> Metric: 'euclidean'; comparing: 2 vectors.
#>   1
#> 2 1
```
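The `epsilon` argument and the new `dist_one_one()`, `dist_one_many()`, and `dist_many_many()` functions described above can be tried out as follows. This is a minimal sketch rather than an excerpt from the package documentation: the example vectors are illustrative, and it assumes that each distribution is stored in a row of the input matrices.

```r
library(philentropy)

# Two probability vectors (each sums to 1)
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)

# One vector vs. one vector -> a single value
d_pq <- dist_one_one(P, Q, method = "euclidean")

# Kullback-Leibler divergence with an explicit epsilon, the value used to
# treat x / 0 and 0 / 0 cases (0.00001 is the default)
kl_pq <- dist_one_one(P, Q, method = "kullback-leibler", unit = "log2",
                      epsilon = 0.00001)

# Three distributions, one per row (assumption: rows hold the distributions)
M <- rbind(1:10 / sum(1:10), 20:29 / sum(20:29), 30:39 / sum(30:39))

# One vector vs. each row of M -> numeric vector of length nrow(M)
d_one_many <- dist_one_many(P, M, method = "euclidean")

# Each row of M vs. each row of M2 -> nrow(M) x nrow(M2) matrix
M2 <- M[1:2, ]
d_many_many <- dist_many_many(M, M2, method = "euclidean")

d_pq
d_one_many
d_many_many
```

Because the Euclidean distance is simply `sqrt(sum((P - Q)^2))`, the first result can be checked by hand, and the first entry of `d_one_many` is 0 since `P` equals the first row of `M`.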
- `distance()` receives a new argument `mute.message` allowing users to mute message printing when running large-scale distance computations. Example:

```r
distance(rbind(1:10/sum(1:10), 20:29/sum(20:29)),
         method = "euclidean",
         mute.message = TRUE)
```
- Add `markdown` dependency to DESCRIPTION (find details here)
- The `distance()` function receives a new argument `use.row.names` to enable passing the row names from the input probability or count matrix to the output distance matrix
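A short sketch of `use.row.names`, assuming a small probability matrix whose rows carry the hypothetical names `a`, `b`, and `c`:

```r
library(philentropy)

# Probability vectors, one per row, with illustrative row names
prob <- rbind(a = 1:10 / sum(1:10),
              b = 20:29 / sum(20:29),
              c = 30:39 / sum(30:39))

# Pass the row names of the input through to the distance matrix
d <- distance(prob, method = "euclidean",
              use.row.names = TRUE, mute.message = TRUE)
d
```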
- The `distance()` function can now handle `data.table` and `tibble` input (#16)
- Adding new functionality and arguments `as.dist.obj`, `diag`, and `upper` to `philentropy::distance()` to allow users to retrieve a `stats::dist()` object when working with `philentropy::distance()` (Many thanks to Hugo Tavares #18 - see also #13). When using `philentropy::distance(..., as.dist.obj = TRUE)`, users can now directly pass the `distance()` output into `hclust()`:

Before:

```r
library(philentropy)
ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39))
dist.mat <- distance(ProbMatrix, method = "jaccard")
true.dist.mat <- as.dist(dist.mat)
clust.res <- hclust(true.dist.mat, method = "complete")
clust.res
#> Call:
#> hclust(d = true.dist.mat, method = "complete")
#> Cluster method   : complete
#> Number of objects: 3
```

Now:

```r
ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39))
dist.mat <- distance(ProbMatrix, method = "jaccard", as.dist.obj = TRUE)
clust.res <- hclust(dist.mat, method = "complete")
clust.res
#> Call:
#> hclust(d = dist.mat, method = "complete")
#> Cluster method   : complete
#> Number of objects: 3
```
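The `diag` and `upper` arguments mentioned above can be combined with `as.dist.obj = TRUE`. The sketch below assumes they control how the returned `dist` object is printed, in the same way as the arguments of the same name in `stats::as.dist()`:

```r
library(philentropy)

ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39))

# Request a stats::dist object; diag/upper only affect how it is printed
d <- distance(ProbMatrix, method = "jaccard", as.dist.obj = TRUE,
              diag = TRUE, upper = TRUE, mute.message = TRUE)
d

# A dist object on 3 vectors stores the 3 pairwise distances
length(d)
```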
- Fixing a bug in `gJSD()` which tested transposed matrix rows rather than transposed matrix columns for sum > 1 (see issue #17; many thanks to @wkc1986)
- Fixing a bug which caused the KL distance to return `NaN` when `P == 0` (see issue #10; Many thanks to @KaiserDominici)
- Fixing a bug which caused a stack overflow when computing distance matrices with many rows (see issue #7; Many thanks to @wkc1986 and @elbamos)
- Fixing a bug in `gJSD()` where an `rbind()` input matrix was not properly transposed (Many thanks to @vrodriguezf; see issue #14)
- `gJSD()` receives a new argument `est.prob` to enable empirical estimation of probability vectors from input count vectors (non-probabilistic vectors)
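A minimal sketch of `est.prob`, assuming count vectors stored in the rows of the input matrix. Passing `est.prob = "empirical"` should give the same result as normalizing each row to relative frequencies by hand:

```r
library(philentropy)

# Count vectors (not probabilities), one per row
Count <- rbind(1:10, 20:29, 30:39)

# Let gJSD() estimate the probability vectors from the counts
js <- gJSD(Count, est.prob = "empirical")

# Same computation on rows normalized manually (each row divided by its sum)
js_prob <- gJSD(Count / rowSums(Count))

js
js_prob
```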
- Jaccard and Tanimoto similarity measures now return `0` instead of `NaN` when probability vectors contain zeros (Many thanks to @JonasMandel; see issue #15)
- Fixing a bug that caused `jensen-shannon` computations to return wrong values when 0 values were present in the input vectors (see issue #4; Many thanks to @wkc1986)
- Fixing a bug that caused `jensen-difference` computations to return wrong values when 0 values were present in the input vectors
- Fixing a bug where `JSD()` returned `NaN` when any probability is 0 (see https://github.com/HajkD/philentropy/issues/1; Thanks to William Kurtis Chang)
- Fixing a memory leak in `dist.diversity()` and `distance()` when the check for `colSums(x) > 1.001` was performed (the leak was found with `rhub::check_with_valgrind()`)

Initial submission version.