Introduction to ‘cpp11armadillo’

Motivations

The development of cpp11armadillo emerges from the desire to follow a simplified approach towards R and C++ integration by building on top of cpp11, a ground up rewrite of C++ bindings to R with different design trade-offs and features. cpp11armadillo aims at providing an additional layer to put the end-user focus on the computation instead of configuration (Vaughan, Hester, and François 2023).

Armadillo is a linear algebra library for the C++ language, aiming towards a good balance between speed and ease of use. It is justified in the fact that C++, in its current form, is very valuable to address bottlenecks that we find with interpreted languages such as R and Python but it does not provide data structures nor functions for linear algebra (Sanderson and Curtin 2016).

RcppArmadillo was first published to CRAN in 2010, and it allows to use Armadillo via Rcpp, a widely extended R package to call C++ functions from R (Eddelbuettel and Sanderson 2014).

Design choices

The design choices in cpp11armadillo are:

These ideas reflect a comprehensive effort to provide an efficient interface for integrating C++ and R that aligns with the Tidy philosophy (Wickham et al. 2019), addressing both the technical and community-driven aspects that influence software evolution.

These choices have advantages and disadvantages. A disadvantage is that cpp11armadillo will not convert data types automatically, the user must be explicit about data types, especially when passing data from R to C++ and then exporting the final computation back to R. An advantage is that cpp11armadillo codes, including its internal templates, can be adapted to work with Python via pybind11 (Jakob, Rhinelander, and Moldovan 2016).

cpp11armadillo uses Hansen (2022) notation, meaning that matrices are column-major and vectors are expressed as column vectors (i.e., \(N\times1\) matrices).

Examples

Convention: input R matrices are denoted by x, y, z, and output or intermediate C++ matrices are denoted by X, Y, Z. The example functions can be called from R scripts and should have proper headers as in the following code:

#include <armadillo.hpp>
#include <cpp11.hpp>
#include <cpp11armadillo.hpp>

using namespace arma;
using namespace cpp11;

[[cpp11::register]] // allows using the function in R
doubles_matrix<> solve_mat(doubles_matrix<> x) {
  Mat<double> Y = as_Mat(x); // convert from R to C++
  Mat<double> Yinv = inv(Y); // Y^(-1)
  return as_doubles_matrix(Yinv); // convert from C++ to R
}

This example includes the Armadillo, cpp11 and cpp11armadillo libraries, and allows interfacing C++ with R (i.e., the #include <cpp11.hpp>). It also loads the corresponding namespaces (i.e., the using namespace cpp11) in order to simplify the notation (i.e., using Mat instead of arma::Mat).

The as_Mat() function is provided by cpp11armadillo to pass a matrix object from R to C++ and that Armadillo can read.

The as_doubles_matrix() function is also provided by cpp11armadillo to pass a Mat<double> or Mat<int> object from C++ to R.

Ordinary Least Squares

Given a design matrix \(X\) and and outcome vector \(y\), one function to obtain the OLS estimator \(\hat{\beta} = (X^tX)^{-1}(X^tY)\) as a matrix (i.e., column vector) is:

doubles_matrix<> ols_mat(const doubles_matrix<>& y, const doubles_matrix<>& x) {
  Mat<double> Y = as_Mat(y);
  Mat<double> X = as_Mat(x);

  Mat<double> XtX = X.t() * X; // X'X
  Mat<double> XtX_inv = inv(XtX); // (X'X)^(-1)
  Mat<double> beta = XtX_inv * X.t() * Y; // (X'X)^(-1)(X'Y)

  return as_doubles_matrix(beta);
}

// or
doubles ols_dbl(const doubles& y, const doubles_matrix<>& x) {
  // Y either as Mat or Col
  Mat<double> Y = as_Mat(y);
  // Col<double> Y = as_Col(y);

  Mat<double> X = as_Mat(x);

  Mat<double> XtX = X.t() * X;
  Mat<double> XtX_inv = inv(XtX);
  Mat<double> beta = XtX_inv * X.t() * Y;

  return as_doubles(beta);
}

The ols_mat() function receives inputs from R, does the computation on C++ side. The use of const and & are specific to the C++ language and allow to pass data from R to C++ while avoiding copying the data, therefore saving time and memory.

Another option is to express the same computation as a vector:

doubles ols_dbl(const doubles_matrix<>& y, const doubles_matrix<>& x) {
  Mat<double> Y = as_Mat(y);
  Mat<double> X = as_Mat(x);

  Mat<double> XtX = X.t() * X;
  Mat<double> XtX_inv = inv(XtX);
  Mat<double> beta = XtX_inv * X.t() * Y;

  return as_doubles(beta);
}

Eigenvalues

Given a matrix \(X\), an eigenvalue of \(X\) is any number \(t\) such that \(Xv = tv\), where \(v\) is a non-zero vector.

Two functions to obtain the eigenvalues of a symmetric matrix are:

doubles_matrix<> eigen_sym_mat(const doubles_matrix<>& x) {
  Mat<double> X = as_Mat(x);
  Mat<double> y = eig_sym(X);
  return as_doubles_matrix(y);
}

doubles eigen_sym_dbl(const doubles_matrix<>& x) {
  Mat<double> X = as_Mat(x);
  Mat<double> y = eig_sym(X);
  return as_doubles(y);
}

When the input matrix is not symmetric, the eigenvalues are complex numbers. cpp11armadillo provides a wrapper to return the real and imaginary parts of the eigenvalues as a list:

list eigen_gen_mat(const doubles_matrix<>& x) {
  Mat<double> X = as_Mat(x);
  Mat<complex<double>> y = eig_gen(X);
  list out = as_complex_matrix(y);
  return out;
}

The idea of cpp11armadillo is to integrate with cpp11 to provide flexibility, and the previous result can be obtained without using the as_complex_matrix() wrapper:

list eigen_gen_mat(const doubles_matrix<>& x) {
  Mat<double> X = as_Mat(x);

  Mat<complex<double>> y = eig_gen(X);

  Mat<double> y_real = real(y);
  Mat<double> y_imag = imag(y);

  writable::list out;
  out.push_back({"real"_nm = as_doubles_matrix(y_real)});
  out.push_back({"imag"_nm = as_doubles_matrix(y_imag)});

  return out;
}

As with OLS, in this case it is possible to return a list of vectors instead of a matrices:

[[cpp11::register]]
list eigen_gen_dbl(const doubles_matrix<>& x) {
  Mat<double> X = as_Mat(x);
  Col<complex<double>> y = eig_gen(X);
  list out = as_complex_doubles(y);
  return out;
}

Cholesky decomposition

One function to decompose a matrix \(X\) into an upper (lower) triangular matrix \(R\) such that \(X = R^tR\) (\(X = RR^t\)) is:

doubles_matrix<> chol_mat(const doubles_matrix<>& x, std::string type) {
  Mat<double> X = as_Mat(x);

  Mat<double> res;

  if (type == "upper") {
    res = chol(X);
  } else if (type == "lower") {
    res = chol(X, "lower");
  } else {
    stop("Invalid type");
  }

  return as_doubles_matrix(res);
}

This function uses the second argument to avoid creating a new function for the lower triangular decomposition.

QR decomposition

One function to decompose a matrix \(X\) into a product \(X=QR\) where \(Q\) is an orthogonal matrix and \(R\) is an upper triangular matrix is:

list qr_mat(const doubles_matrix<>& x, bool econ) {
  Mat<double> X = as_Mat(x);

  Mat<double> Q;
  Mat<double> R;

  bool computable;

  if (!econ) {
    computable = qr(Q, R, X);
  } else {
    computable = qr_econ(Q, R, X);
  }

  if (!computable) {
    stop("QR decomposition failed");
  } else {
    writable::list out;
    out.push_back({"Q"_nm = as_doubles_matrix(Q)});
    out.push_back({"R"_nm = as_doubles_matrix(R)});
    return out;
  }
}

In this case, the function returns a list of matrices after using the second argument to use two different Armadillo implementations of the QR decomposition.

Capital Asset Pricing Model (CAPM)

Suppose there are two risky stocks, Pepsi and Coca-Cola, with returns for the last three days given by a matrix \(R\) in a market with a risk-free bond with a return \(f=2\%\) and a risky market portfolio with returns for the last three days given by a vector \(m\). The CAPM equation is given by \(R_i = f + \beta_i(E(m) - f)\), where \(\beta_i = \frac{\text{cov}(R_i, m)}{\text{var}(m)}\) (Bodie, Kane, and Marcus 2010).

The simulated returns are given by:

set.seed(200100)
f <- 0.02
m <- matrix(rnorm(3, 0, 0.1), nrow = 3, ncol = 1)
r <- matrix(rnorm(6, 0, 0.1), nrow = 3, ncol = 2)

One function to obtain the CAPM betas for the two stocks is:

doubles_matrix<> capm(const doubles_matrix<>& r, const doubles_matrix<>& m,
                      double f) {
  Mat<double> R = as_Mat(r);
  Mat<double> M = as_Mat(m);
  Mat<double> F = ones<Mat<double>>(R.n_cols, 1) * f;

  // Market average return and CAPM betas
  Mat<double> M_avg = ones<Mat<double>>(R.n_cols, 1) * as_scalar(mean(M, 0));
  Mat<double> beta = cov(R, M) / as_scalar(var(M));

  // Expected returns (% = pairwise multiplication)
  Mat<double> out = F + beta % (M_avg - F);

  return as_doubles_matrix(out);
}

Leontief inverse

The Leontief Input-Output model posits that the equilibrium production in an economy (i.e., country) with \(N\) industries (i.e., sectors) is given by \((I-A)^{-1}\), where \(I\) is the identity matrix and \(A\) contains the input requirement for each industry \(1\leq i \leq N\) from any other \(1 \leq j \leq N-1\) industry. \(A\) is determined by a transaction matrix \(X\) that reflects the final transactions between industries and an aggregate demand vector \(d\) (Leontief 1986).

We can use the data from the leontief package as data inputs for the example.

library(leontief)
X <- transaction_matrix
d <- wage_demand_matrix[, "final_total_demand"]

One function to obtain the Leontief inverse is:

doubles_matrix<> leontief_inverse(const doubles_matrix<>& x, 
                                  const doubles_matrix<>& d) {
  Mat<double> X = as_Mat(x);
  Mat<double> D = as_Mat(d);

  // Check dimensions
  bool X_square = (X.n_rows == X.n_cols);
  bool compatible_dimensions = (X.n_rows == D.n_rows);

  if (X_square == false) {
    stop("x must be square.");
  }
  if (compatible_dimensions == false) {
    stop("d must have the same number of elements as the number of rows in x.");
  }

  // Input requirement matrix
  Mat<double> Y = zeros(D.n_rows, D.n_rows);
  Y.each_col() += D;
  Mat<double> A = X / Y.t();

  Mat<double> I = eye(X.n_rows, X.n_cols); // Identity matrix

  Mat<double> out = inv(I - A); // Leontief inverse

  return as_doubles_matrix(out);
}

Benchmarks

A proper benchmark is to compute eigenvalues for large matrices. Both cpp11armadillo and RcppArmadillo use Armadillo as a backend, and the marginal observed differences are because of how cpp11 and Rcpp pass data from R to C++ and viceversa. The computation times are identical.

Input Median time cpp11armadillo Median time RcppArmadillo
500x500 35.07ms 36.4ms
1000x1000 260.28ms 263.21ms
1500x1500 874.62ms 857.31ms
2000x2000 2.21s 2.21s
Input Memory allocation cpp11armadillo Memory allocation RcppArmadillo
500x500 17.1KB 4.62MB
1000x1000 21KB 4.62MB
1500x1500 24.9KB 4.63MB
2000x2000 28.8KB 4.63MB

The cpp11armadillo computation was obtained with the eigen_sym_mat() function already shown.

The RcppArmadillo computation was obtained with the following function:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;

// [[Rcpp::export]]
arma::mat eigen_sym_mat(const arma::mat& x) {
  arma::mat y = eig_sym(x);
  return y;
}

In order to get the RcppArmadillo function to work, we had to dedicate time to search online about the error function 'enterRNGScope' not provided by package 'Rcpp', which required to load additional headers in our code on R side.

Conclusion

RcppArmadillo has been and will continue to be widely successful. cpp11armadillo is a alternative templated implementation with different design trade-offs and features. Both packages can co-exist and continue to enrich the R community.

References

Bodie, Z., A. Kane, and A. Marcus. 2010. Investments. McGraw-Hill Education.

Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Computational Statistics & Data Analysis 71 (March): 1054–63. https://doi.org/10.1016/j.csda.2013.02.005.

Hansen, Bruce. 2022. Econometrics. Princeton University Press.

Jakob, Wenzel, Jason Rhinelander, and Dean Moldovan. 2016. Pybind11 — Seamless Operability Between C++11 and Python. https://github.com/pybind/pybind11.

Leontief, Wassily. 1986. Input-Output Economics. Oxford University Press.

Sanderson, Conrad, and Ryan Curtin. 2016. “Armadillo: A Template-Based C++ Library for Linear Algebra.” Journal of Open Source Software 1 (2): 26. https://doi.org/10.21105/joss.00026.

Vaughan, Davis, Jim Hester, and Romain François. 2023. Cpp11: A C++11 Interface for R’s c Interface. https://CRAN.R-project.org/package=cpp11.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.