Generating caches on a cluster

Nathan Sheffield

2026-02-28

Generating caches in parallel using batchtools

By default, simpleCache creates caches in the R session you use to call it. If you need to make lots of caches, or very large caches, you may instead want to submit them as jobs to a cluster resource manager (like SLURM). simpleCache can do this using functionality from the batchtools package.

This vignette is unevaluated because it relies on the batchtools package and a cluster environment.

To do this, first, create a batchtools registry. You can follow more detailed documentation in the batchtools package, but here’s some code to get you started:

library(simpleCache)
setCacheDir(tempdir())

registry = batchtools::makeRegistry(NA)
templateFile = system.file("templates/slurm-advanced.tmpl", package = "simpleCache")
registry$cluster.functions = batchtools::makeClusterFunctionsSlurm(
  template = templateFile)
registry

Notice that I’m using a custom SLURM template here, one that is installed with simpleCache. With a registry in hand, we next need to define the resources this cache job will require:

resources = list(ncpus=1, memory=1000, walltime=60, partition="serial")

Then, we simply add these as arguments to simpleCache() like so:

simpleCache("testBatch", {
  rnorm(1e7, 0, 1)
  }, batchRegistry=registry, batchResources=resources)

This will now create and submit a job script to the cluster. That job script will have R code to create your testBatch cache by running the code in your simpleCache call, rnorm(1e7, 0, 1). The next time you run this call, it will just load the cache without recreating it, as you would expect simpleCache to do. You can also use batchtools to inspect and manage these jobs:

# Summary of job states (submitted, running, done, error, ...)
batchtools::getStatus(reg=registry)
# One row per job, with timings and resources
batchtools::getJobTable(reg=registry)
# Parameters and result of job 1
batchtools::getJobPars(1, reg=registry)
batchtools::loadResult(1, reg=registry)
# batchtools::testJob(1, reg=registry)
# batchtools::killJobs(1, reg=registry)

When you’re done, you may want to remove your temporary registry:

batchtools::removeRegistry(reg=registry)

See batchtools documentation for more details on using registries.
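
If you do not have a cluster at hand, the batchtools calls above can still be tried locally. The following is a minimal sketch that does not involve simpleCache at all: it assumes batchtools is installed, and it passes conf.file = NA so that no site configuration is read and batchtools falls back to running jobs interactively in the current session. The job function here is just a stand-in for the code you would put in a cache.

```r
library(batchtools)

# Temporary registry; conf.file = NA skips any batchtools configuration file,
# so the default (interactive) cluster functions are used.
reg = makeRegistry(file.dir = NA, conf.file = NA)

# Define one job that draws 10 random numbers (a stand-in for cache code)
ids = batchMap(function(n) rnorm(n), n = 10, reg = reg)

submitJobs(ids, reg = reg)

# TRUE once all jobs have terminated successfully
done = waitForJobs(reg = reg)

getStatus(reg = reg)
result = loadResult(1, reg = reg)

# Clean up the temporary registry without the default grace period
removeRegistry(wait = 0, reg = reg)
```

The same inspection functions work unchanged on a registry that submits to SLURM; only the cluster functions differ.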

Lock files

When a cache is submitted to the cluster, simpleCache creates a .lock file. If you later call simpleCache for the same cache, whether because you forgot the job was running or because you are working locally, it will see the lock and skip rebuilding. The lock file is removed automatically when the cluster job completes. This mechanism is a simple convenience guard, not a robust concurrency lock: two processes can both check for the lock file before either creates it, resulting in duplicate builds.
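
To make the race concrete, here is a small sketch contrasting a check-then-create lock with a variant that tests and creates in one step. Both helpers (naiveLock and atomicLock) are hypothetical names written for this illustration; they are not simpleCache functions and do not reflect how simpleCache implements its lock. The second variant assumes that directory creation is atomic on your filesystem, which is generally true locally but may not hold on every network filesystem.

```r
# Hypothetical helpers for illustration only; not part of simpleCache.

# Check-then-create: two processes can both pass the file.exists() test
# before either one calls file.create(), so both think they hold the lock.
naiveLock = function(lockFile) {
  if (file.exists(lockFile)) return(FALSE)
  file.create(lockFile)
}

# Test-and-create in a single call: dir.create() returns FALSE when the
# directory already exists, so only the first caller gets TRUE.
atomicLock = function(lockDir) {
  dir.create(lockDir, showWarnings = FALSE)
}

lockDir = file.path(tempdir(), "testBatch.lockdir")
unlink(lockDir, recursive = TRUE)   # start from a clean state

first = atomicLock(lockDir)    # acquires the lock
second = atomicLock(lockDir)   # lock already held

unlink(lockDir, recursive = TRUE)   # release the lock
```

If duplicate builds would be costly for you, a guard along these lines around your own job submission is one option; for most uses the built-in lock file is enough.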