Introduction to cppcontainers

Christian Düben

November 09, 2024

1 Motivation

This package incorporates C++ Standard Template Library containers into R. In doing so, it meets various objectives and targets different groups of R users.

First, it speeds up development workflows. If you previously wanted to make use of C++ containers, you had to write C++ code and wait for compilation every time you changed it. To evaluate the data held in C++, you also had to write print or export functions for each setup. cppcontainers does these steps for you. It wraps C++ containers, their methods, and helper functions and compiles them when the package is installed. There is no need to write C++ code or wait for compilation after the package has been installed. You can use the containers as interactively and without waiting time as R’s native data frames.

Second, it allows user who do not administer their machines without an installed compiler (e.g., without Rtools on Windows) to use C++ containers, through the binary version of the package.

Third, it makes C++ easily accessible to R users. Much of the R community does not understand C++ and does not want to undertake the effort of learning it. This can impede collaboration between team members who are familiar with C++ and those who are not. cppcontainers circumvents this problem by abstracting away much of the underlying C++ code, framing everything in an R-style syntax (incl. 1-based indices), and explaining functions in a language intuitive to R users.

Fourth, it pushes R towards a general purpose programming language. One of the big differences between R and general purpose languages is the availability of data structures. In R, almost everything is some sort of vector; a vector of primitives, a vector of pointers, etc. Even scalars are vectors of length one. This simplicity makes R very accessible and easy to learn for its audience, that is largely comprised of people with a background in quantitative, empirical methods but without training in software engineering. However, users who come from a general purpose programming language, such as C++ or Python, likely miss a larger collection of data structures. cppcontainers fills this gap.

Fifth, this package brings the performance of C++ Standard Template Library containers to R. These data structures are tailored towards certain use cases and can speed up execution times. Of course, there is some overhead involved compared to entirely writing the code in C++. Yet, by providing a thin wrapper around Rcpp::XPtr, cppcontainers attempts to minimize this overhead.

It is important to keep this motivation in mind. There is a lot more what you can do with C++ and consequently Rcpp, than what this package implements. This, e.g., includes nested containers. The aim of this package is not to incorporate all features of an entire other language into R. It adds functionality in one specific domain.


2 Container Types

cppcontainers wraps many of the Standard Template Library’s container types. The following list briefly introduces them in terms that are easy to grasp for users with little technical expertise. It avoids a discussion of performance characteristics or thread-safety implications.

Associative containers

Container adapters

Sequence containers

There is a lot to know on the efficiency of element access at different positions, element insertion and removal, space efficiency, etc. It is too much to cover in this vignette. The list above primarily elaborates on uniqueness and sorting. It grants a first idea of what container types the package wraps and on which container types to read up on in other sources of documentation.

In this package, containers can hold primitives of four different types: integers, doubles, strings, and booleans. These correspond to the integer, numeric, character, and logical types in R. Native C++ strings are different from R strings, which is why the string type requires comparatively more conversions and can be a little less efficient to work with than the alternatives. Booleans are also differently implemented, but are converted more implicitly than strings are.

Much of STL containers’ power lies in nested structures. In C++, you can, e.g., assemble a vector of sets or a queue of multimaps. This increases the universe of potential setups exponentially. Since an R wrapper package needs to hardcode all options, cppcontainers does not implement nested compositions and only targets primitive elements of the four mentioned types.


3 Methods

This package covers many of the container types’ methods and maintains their original names. To make them more intuitive for R users, it replaces the object-oriented notation (x.size()), with the in R more commonly used functional notation (size(x)). cppcontainers omits methods that return values such as pointers or iterators, which are not directly utilizable in R. One example is the equal_range method for maps.

A few methods are modified to make them more convenient for R users. This particularly includes indexing and vector inputs. Methods like count and contains only accept a single value in C++, but allow for a vector in this package. cppcontainers solves it with iterative procedures on the C++ side.

An exception from the naming convention are the functions creating the objects. To avoid clashes with, e.g., base::vector and utils::stack, this package prefixes the respective methods with cpp_. cpp_set creates a set, cpp_vector creates a vector, etc. Another outlier is remove, which the package implements as remove. (with a dot suffix) to avoid compatibility problems with base::remove.

The following table illustrates which methods apply to which container type. Italicized names do not refer to original C++ methods, but to functions added by this package. They aid in accessing the objects’ data from R.

3.1 Associative Containers

method set unordered set multiset unordered multiset map unordered map multimap unordered multimap
==
[
assign
at
back
bucket_count
capacity
clear
contains
count
emplace
emplace_after
emplace_back
emplace_front
empty
erase
erase_after
flip
front
insert
insert_after
insert_or_assign
load_factor
max_bucket_count
max_load_factor
max_size
merge
pop
pop_back
pop_front
print
push
push_back
push_front
rehash
remove.
reserve
resize
reverse
shrink_to_fit
size
sort
sorting
splice
splice_after
to_r
top
try_emplace
type
unique

3.2 Container Adapters

method stack queue priority queue
==
[
assign
at
back
bucket_count
capacity
clear
contains
count
emplace
emplace_after
emplace_back
emplace_front
empty
erase
erase_after
flip
front
insert
insert_after
insert_or_assign
load_factor
max_bucket_count
max_load_factor
max_size
merge
pop
pop_back
pop_front
print
push
push_back
push_front
rehash
remove.
reserve
resize
reverse
shrink_to_fit
size
sort
sorting
splice
splice_after
to_r
top
try_emplace
type
unique

3.3 Sequence Containers

method vector deque forward list list
==
[
assign
at
back
bucket_count
capacity
clear
contains
count
emplace
emplace_after
emplace_back
emplace_front
empty
erase
erase_after
flip
front
insert
insert_after
insert_or_assign
load_factor
max_bucket_count
max_load_factor
max_size
merge
pop
pop_back
pop_front
print
push
push_back
push_front
rehash
remove.
reserve
resize
reverse
shrink_to_fit
size
sort
sorting
splice
splice_after
to_r
top
try_emplace
type
unique

4 Examples

The manual pages provide ample examples on the classes and methods in this package. The subsequent code just showcases a few of the functionalities outside of any meaningful workflow.

## set
s <- cpp_set(c(4, 2, 3))
s
#> 2 3 4
insert(s, c(4, 6))
s
#> 2 3 4 6
print(s, n = -2)
#> 6 4

## unordered set
s <- cpp_unordered_set(3:5)
s
#> 5 4 3
emplace(s, 2)
s
#> 2 5 4 3
type(s)
#> [1] "integer"

## multiset
s <- cpp_multiset(c(3, 3, 6, 2))
s
#> 2 3 3 6
contains(s, 2)
#> [1] TRUE
to_r(s)
#> [1] 2 3 3 6

## unordered multiset
s <- cpp_unordered_multiset(c(3, 3, 6, 2))
s
#> 6 2 3 3
count(s, 3)
#> [1] 2
rehash(s)

## map
m <- cpp_map(c("Alice", "Bob"), c(TRUE, FALSE))
m
#> ["Alice",TRUE] ["Bob",FALSE]
m["Bob"]
#> [1] FALSE
insert_or_assign(m, FALSE, "Alice")
m
#> ["Alice",FALSE] ["Bob",FALSE]

## unordered map
m1 <- cpp_unordered_map(c("Alice", "Bob"), 3:4)
m1
#> ["Bob",4] ["Alice",3]
m2 <- cpp_unordered_map(c("Bob", "Jane"), 6:7)
m2
#> ["Jane",7] ["Bob",6]
merge(m1, m2)
m1
#> ["Jane",7] ["Bob",4] ["Alice",3]
m2
#> ["Bob",6]
at(m1, "Jane")
#> [1] 7

## multimap
m <- cpp_multimap(c("Alice", "Bob", "Bob"), 3:5)
m
#> ["Alice",3] ["Bob",4] ["Bob",5]
clear(m)
empty(m)
#> [1] TRUE

## unordered multimap
m <- cpp_multimap(c("Alice", "Bob", "Bob"), 3:5)
m
#> ["Alice",3] ["Bob",4] ["Bob",5]
size(m)
#> [1] 3
erase(m, "Bob")
m
#> ["Alice",3]

## stack
s1 <- cpp_stack(3:4)
s1
#> Top element: 4
s2 <- cpp_stack(3:4)
s2
#> Top element: 4
s1 == s2
#> [1] TRUE
pop(s1)
s1
#> Top element: 3

## queue
q <- cpp_queue(c(2.1, 3.3))
q
#> First element: 2.1
push(q, 2.7)
q
#> First element: 2.1
back(q)
#> [1] 2.7

## priority queue
p <- cpp_priority_queue(c(2.4, 1.3, 4.2, 1.5))
p
#> First element: 4.2
top(p)
#> [1] 4.2
sorting(p)
#> [1] "descending"

## vector
v <- cpp_vector(2:4)
v
#> 2 3 4
reserve(v, 10)
capacity(v)
#> [1] 10

## deque
d <- cpp_deque(c("Alice", "Bob", "Bob"))
d
#> "Alice" "Bob" "Bob"
push_front(d, "Jane")
d
#> "Jane" "Alice" "Bob" "Bob"
pop_front(d)
d
#> "Alice" "Bob" "Bob"

## forward list
l <- cpp_forward_list(c(4, 2, 6))
l
#> 4 2 6
sort(l)
l
#> 2 4 6
resize(l, 7)
l
#> 2 4 6 0 0 0 0

## list
l <- cpp_list(c("Alice", "Bob", "Bob", "Alice"))
unique(l)
#> [1] 1
l
#> "Alice" "Bob" "Alice"
print(l, from = 3)
#> "Alice" "Bob" "Alice"