algebraic.distAn algebra over distributions (random elements).
Tags: multivariate distributions, multivariate normal distribution, multivariate empirical distribution, data generating process, R, data-science, statistics, inference, likelihood-models, probability-theory
The GitHub documentation can be viewed here.
See the vignette algebraic.dist: Example for a quick introduction to the package.
You can install the development version of
algebraic.dist from GitHub with:
# install.packages("devtools")
devtools::install_github("queelius/algebraic.dist")The R package algebraic.dist provides an algebra over
distributions. It’s not fully-formed yet, but I plan on using it for a
lot of my future work. For instance, I’ll move a lot of the code in
algebraic.mle and likelihood.model to this
package.
After that, I want to experiment with using the
algebraic.dist to do the following:
Compose distributions such that operations over distributions generate other known distributions.
There are a lot of well-known compositions, such as the exponential distribution being the minimum of independent exponential distributions, or the sum of independent normal distributions being a normal distribution, but there is a very large space of possible compositions that are not as well-known or well-studied that I want to explore.
Let people use an R expression to lazily compose functions of distributions. Simplifying a distribution expression will generate a most simple R expression that represents the same distribution.
Sometimes, this may result in a simple close-form distribution, like a multivariate normal distribution, but in other cases it may result in a (hopefully simpler) expression that composes multiple distributions and operations over them.
With these R expressions that represent distributions, we can define more operations, like taking the limiting distribution of a sequence of distributions, say \(\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i\), which is of normal by the central limit theorem.
Deduce various properties of these distributions, such as their moments, variances, etc. Sometimes, this may require numerical integration or Monte Carlo methods, but if the expression simplifies to a known distribution, then we can use the known properties of that distribution.
I have a lot of this code in place in C++, but I want to re-implement it in R so that it’s more accessible to others. I may also implement some of the more interesting compositions in C++ and expose them to R via Rcpp, but I’m not sure yet. I use a lot of templates and metaprogramming in C++, and I’m not sure how well that will translate to Rcpp.