Installing Dependencies

library(greta)

Why we need to install dependencies

The greta package uses Google’s TensorFlow (TF) and TensorFlow Probability (TFP) under the hood to do efficient, fast, and scalable linear algebra and MCMC. TF and TFP are Python packages, so they must be installed separately. This is different from how dependencies normally work with R packages, where they are automagically built and managed by CRAN.

Unfortunately, there isn’t an automatic, reliable way to ensure that these are provided when you install greta, so installing them takes an additional step. We have tried very hard to make the process as easy as possible by providing a helper function, install_greta_deps().

How to install python dependencies using install_greta_deps()

We recommend running:

install_greta_deps()

Then follow any prompts to install dependencies. You will then need to restart R and run library(greta) to start using greta.

How install_greta_deps() works

The install_greta_deps() function installs the Python dependencies TF and TFP. By default it installs TF version 2.15.0 and TFP version 0.23.0. It places these inside a conda environment, “greta-env-tf2”. With the default settings, this environment uses Python 3.10. Using a conda environment isolates these exact Python modules from other Python installations, so only greta will see them.

We do this because it helps avoid installation issues: previously, updating TF on your computer could overwrite the version needed by greta. Using this “greta-env-tf2” conda environment means that installing other Python packages should not impact the Python packages needed by greta. It is part of the approach to managing Python dependencies in an R package recommended by the team at Posit.
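To see whether this environment already exists on your machine, something like the following sketch may help. The helper name greta_env_exists() is hypothetical and not part of greta; it relies only on reticulate::conda_list(), and treats a missing conda or reticulate installation as "no environment".

```r
# Hypothetical helper (not part of greta): TRUE if a conda environment
# with the given name is known to reticulate, FALSE otherwise.
greta_env_exists <- function(envname = "greta-env-tf2") {
  envs <- tryCatch(
    reticulate::conda_list(),
    # conda (or reticulate itself) may not be installed yet
    error = function(e) NULL
  )
  !is.null(envs) && envname %in% envs$name
}

greta_env_exists()
```

If this returns FALSE, the environment has not been created yet, which is the expected state before install_greta_deps() has been run.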

Using different versions of TF, TFP, and Python

The install_greta_deps() function takes three arguments:

  1. deps: the dependency versions to install, specified with greta_deps_spec()
  2. timeout: time in minutes to wait for installation before failing/exiting
  3. restart: whether to restart R after installation: “force” restarts R, “no” does not restart, and “ask” (the default) asks the user

You specify the version of TF, TFP, or Python that you want to use with greta_deps_spec(), which has an argument for each of these three versions.

If you specify versions of TF/TFP/Python that are not compatible with each other, it will error before starting installation. We determined the appropriate versions of Python, TF, and TFP from https://www.tensorflow.org/install/source#tested_build_configurations and https://www.tensorflow.org/install/source_windows#tested_build_configurations, and by inspecting TFP release notes. We put this information together into a dataset, greta_deps_tf_tfp. You can inspect this with View(greta_deps_tf_tfp).

If you provide invalid installation versions, it will error and suggest some alternative installation versions.
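As a rough sketch, a call that spells out every option might look like the one below. The argument names tf_version, tfp_version, and python_version are assumptions about greta_deps_spec(), so check ?greta_deps_spec before relying on them. The version numbers are the defaults quoted earlier, and the timeout value is only illustrative.

```r
library(greta)

# Look at the tested combinations before choosing versions.
head(greta_deps_tf_tfp)

# Assumed argument names; confirm with ?greta_deps_spec.
deps <- greta_deps_spec(
  tf_version = "2.15.0",
  tfp_version = "0.23.0",
  python_version = "3.10"
)

install_greta_deps(
  deps = deps,
  timeout = 5,     # minutes to wait before giving up (illustrative value)
  restart = "ask"  # or "force" / "no"
)
```

Building the specification first and passing it in as deps keeps the version choice separate from the installation itself, so an incompatible combination should surface before anything is downloaded.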

How we install dependencies

This section is for users who want to know more about how greta installs its dependencies.

We create a separate R instance using callr to install the Python dependencies. Within it, we use reticulate to talk to Python, and the R package tensorflow to install the tensorflow Python module. We use callr so that the installation happens in a clean R session that doesn’t have Python or reticulate already loaded. It also means that we can hide the large amount of text that installation prints to the console. This output is instead written to a logfile during installation, which you can read with open_greta_install_log().
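The general pattern of running work in a fresh R session and sending its output to a logfile can be sketched with callr::r(). This is only an illustration of the idea, not greta's actual installation code: the function body is a stand-in, and the logfile here is a temporary file.

```r
# Sketch only: run a function in a clean child R session and capture its
# console output in a file instead of printing it in the current session.
log_file <- tempfile(fileext = ".log")

result <- callr::r(
  func = function() {
    # Stand-in for the real installation work.
    message("installing... (lots of output would appear here)")
    "done"
  },
  stdout = log_file,
  stderr = "2>&1"  # send stderr to the same file as stdout
)

result               # value returned from the child session
readLines(log_file)  # the output that was kept off the console
```

Because the child session starts clean, nothing loaded in your current session (such as an already initialised Python) can interfere with the work done inside it.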

If miniconda isn’t installed, we install it. You can think of miniconda as a lightweight Python distribution that ships with the conda package manager and minimal dependencies.

If “greta-env-tf2” isn’t found, then we create a new conda environment named “greta-env-tf2”, using a version of Python that works with the specified versions of TF and TFP.

Then we install the TF and TFP python modules, using the versions specified in greta_deps_spec().

After installation, we ask users if they want to restart R. This only happens in interactive sessions, and only if the user is in RStudio. This is to avoid potential issues where the function might be called from a batch script.
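The restart rule described above can be sketched as a small decision function. Everything here is hypothetical and is not greta's internal code: the helper name should_restart() and its arguments are made up for illustration, and the sketch assumes that “force” is also limited to interactive RStudio sessions, which the text does not state.

```r
# Hypothetical helper (not greta's implementation): decide whether R
# should be restarted after installation.
should_restart <- function(restart = c("ask", "force", "no"),
                           is_interactive = interactive(),
                           in_rstudio = FALSE,
                           ask = function() FALSE) {
  restart <- match.arg(restart)
  # Assumption: never restart outside an interactive RStudio session.
  if (!is_interactive || !in_rstudio) {
    return(FALSE)
  }
  switch(restart,
    force = TRUE,
    no = FALSE,
    ask = isTRUE(ask())  # a cancelled prompt (NA) counts as "no"
  )
}

# Example use: prompt the user, and restart only if they agree.
if (should_restart(
  "ask",
  in_rstudio = rstudioapi::isAvailable(),
  ask = function() utils::askYesNo("Restart R now?")
)) {
  rstudioapi::restartSession()
}
```

Passing the environment checks in as arguments keeps the decision easy to reason about, and means a batch script never reaches the prompt.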

Troubleshooting installation

Installation doesn’t always go to plan. Here are some approaches to getting your dependencies working.

If install_greta_deps() did not work, you can try installing the dependencies manually:

reticulate::install_miniconda()
reticulate::conda_create(
  envname = "greta-env-tf2",
  python_version = "3.10"
)
reticulate::conda_install(
  envname = "greta-env-tf2",
  packages = c(
    "tensorflow-probability==0.23.0",
    "tensorflow==2.15.0"
  )
)

This installs the Python modules into a conda environment named “greta-env-tf2”.
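After a manual install, it can be useful to confirm that the modules are visible from R. The following is a minimal sketch using reticulate directly. The helper name check_greta_python() is hypothetical, and it is best run in a fresh R session, since reticulate can only bind to one Python per session.

```r
# Hypothetical helper (not part of greta): report whether the TF and TFP
# Python modules can be found in the given conda environment.
check_greta_python <- function(envname = "greta-env-tf2") {
  modules <- c("tensorflow", "tensorflow_probability")
  tryCatch(
    {
      # Point reticulate at the environment before Python is initialised.
      reticulate::use_condaenv(envname, required = TRUE)
      vapply(modules, reticulate::py_module_available, logical(1))
    },
    error = function(e) {
      message("Could not use environment '", envname, "': ", conditionMessage(e))
      stats::setNames(rep(FALSE, length(modules)), modules)
    }
  )
}

check_greta_python()
```

A FALSE for either module suggests that the corresponding conda_install() step did not complete, or that a different Python was already initialised in the session.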

You can also install these without a dedicated conda environment, like so:

reticulate::install_miniconda()
reticulate::conda_install(
  packages = c(
    "tensorflow-probability==0.23.0",
    "tensorflow==2.15.0"
  )
)