This vignette introduces rxp_pipeline(), a function for
organising large projects into logical sub-pipelines. This feature is
particularly useful when working on complex projects with multiple
phases (e.g., ETL, Modelling, Reporting) or when collaborating in teams
where different members work on different parts of the pipeline.
As pipelines grow, a single gen-pipeline.R file can
become difficult to manage. Consider a data science project with:
- Data extraction and cleaning (ETL)
- Feature engineering
- Model training
- Model evaluation
- Report generation
Putting all derivations in one file makes it hard to:
To solve this issue, you can define your project using sub-pipelines
and join them into a master pipeline using
rxp_pipeline().
This allows you to:
A project with sub-pipelines would look something like this:
my-project/
├── default.nix          # Nix environment (generated by rix)
├── gen-env.R            # Script to generate default.nix
├── gen-pipeline.R       # MASTER SCRIPT: combines all sub-pipelines
└── pipelines/
    ├── 01_data_prep.R   # Data preparation sub-pipeline
    ├── 02_analysis.R    # Analysis sub-pipeline
    └── 03_reporting.R   # Reporting sub-pipeline
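
The numeric prefixes keep the sub-pipeline files in a predictable order. As a minimal base-R sketch (not part of rixpress; the temporary directory and the `display_names` variable are illustrative assumptions), the files under pipelines/ can be listed in that order and turned into readable labels:

```r
# Illustrative only: recreate the pipelines/ folder in a temporary directory
project_dir <- file.path(tempdir(), "my-project")
pipelines_dir <- file.path(project_dir, "pipelines")
dir.create(pipelines_dir, recursive = TRUE, showWarnings = FALSE)

# Create empty placeholder files, deliberately out of order
files <- c("02_analysis.R", "01_data_prep.R", "03_reporting.R")
invisible(file.create(file.path(pipelines_dir, files)))

# The numeric prefixes determine the order once the names are sorted
sub_pipeline_files <- sort(list.files(pipelines_dir, pattern = "\\.R$"))

# Derive a label from each file name: drop the prefix and the extension
display_names <- gsub(
  "_", " ",
  sub("^[0-9]+_(.*)\\.R$", "\\1", sub_pipeline_files)
)

print(sub_pipeline_files)
print(display_names)
```

Labels derived this way are only a convenience; the names used in the rest of this vignette are written out by hand.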
Each sub-pipeline file returns a list of derivations:
# Data Preparation Sub-Pipeline
# pipelines/01_data_prep.R
library(rixpress)
list(
  rxp_r(name = raw_mtcars, expr = mtcars),
  rxp_r(name = clean_mtcars, expr = dplyr::filter(raw_mtcars, am == 1)),
  rxp_r(name = selected_mtcars, expr = dplyr::select(clean_mtcars, mpg, cyl, hp, wt))
)

The rxp_pipeline() function takes a name, the path to the sub-pipeline file, and a colour, as the master script further down shows.
The second sub-pipeline:
# Analysis Sub-Pipeline
# pipelines/02_analysis.R
library(rixpress)
list(
  rxp_r(name = summary_stats, expr = summary(selected_mtcars)),
  rxp_r(name = mpg_model, expr = lm(mpg ~ hp + wt, data = selected_mtcars)),
  rxp_r(name = model_coefs, expr = coef(mpg_model))
)

The master script becomes very clean, as rxp_pipeline()
handles sourcing the files:
# gen-pipeline.R
library(rixpress)
# Create named pipelines with colours by pointing to the files
pipe_data_prep <- rxp_pipeline(
  name = "Data Preparation",
  path = "pipelines/01_data_prep.R",
  color = "#E69F00"
)
pipe_analysis <- rxp_pipeline(
  name = "Statistical Analysis",
  path = "pipelines/02_analysis.R",
  color = "#56B4E9"
)
# Build combined pipeline
rxp_populate(list(pipe_data_prep, pipe_analysis), project_path = ".", build = TRUE)

When sub-pipelines are defined, visualisation tools use pipeline colours:
- The interactive network (rxp_visnetwork()) and the static DAG (rxp_ggdag()) both use a dual-encoding approach.
- rxp_trace() output in the console is coloured by pipeline (using the cli package).

When you call rxp_populate() with rxp_pipeline objects:
- Each derivation is tagged with pipeline_group and pipeline_color
- dag.json includes pipeline metadata
- rxp_visnetwork() and rxp_ggdag() read this metadata

rxp_pipeline() provides a simple yet powerful way to
organise complex pipelines. By grouping derivations into logical units,
you can:
For a working example, see the subpipelines demo in the
rixpress_demos
repository.
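
The convention used throughout this vignette, where each sub-pipeline file ends in a list(...) expression that the master script picks up, can be illustrated with base R alone. The sketch below is not how rxp_pipeline() is necessarily implemented; it only shows that source() returns the value of the last expression in a file. The temporary file and the placeholder strings standing in for derivations are assumptions made for illustration.

```r
# Write a tiny stand-in for a sub-pipeline file: its last expression is a list.
# Plain strings are used here instead of real rxp_r() derivations.
sub_pipeline_file <- tempfile(fileext = ".R")
writeLines(
  c(
    "list(",
    "  raw = 'step one',",
    "  clean = 'step two'",
    ")"
  ),
  sub_pipeline_file
)

# source() invisibly returns a list whose $value element holds the value of
# the last evaluated expression; a new environment keeps the file's own
# variables out of the calling workspace.
steps <- source(sub_pipeline_file, local = new.env())$value

stopifnot(is.list(steps))
print(names(steps))
```

Because the file evaluates to a plain list, it can be kept free of side effects and edited independently of the master script.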