sd2R is an R package that provides a native, GPU-accelerated Stable Diffusion pipeline by wrapping the C++ implementation from stable-diffusion.cpp and using ggmlR as the tensor backend.
sd2R exposes a high-level R interface for text-to-image and image-to-image generation, while all heavy computation (tokenization, encoders, denoiser, sampler, VAE, model loading) is implemented in C++. It supports the SD 1.x, SD 2.x, SDXL, Flux, and FLUX.2 (Klein) model families. It targets local inference on Linux with Vulkan-enabled AMD GPUs (with automatic CPU fallback via ggml), without relying on external Python or web APIs.
Flux without Python:
R → sd2R → ggmlR → Vulkan → GPU
- C++ core (src/sd/): tokenizers, text
encoders (CLIP, Mistral, Qwen, UMT5), diffusion UNet/MMDiT denoiser,
samplers, VAE encoder/decoder, and model loading for
.safetensors and .gguf weights.
- Built on ggmlR (LinkingTo) and libggml.a, reusing the same
GGML/Vulkan stack that also powers llamaR and other ggmlR-based
packages.
- sd_generate(): single entry
point for all generation modes. Automatically selects the optimal
strategy (direct, tiled sampling, or highres fix) based on output
resolution and available VRAM (vram_gb parameter in
sd_ctx()). Users don’t need to think about tiling at
all.
- verbose = FALSE
by default: no console output unless explicitly enabled. Cross-platform
build system with configure/configure.win
generating Makevars from templates.
- Set vram_gb in
sd_ctx() to override VRAM auto-detection.
- sd_generate_multi_gpu() distributes prompts across Vulkan
GPUs via callr, one process per GPU, with progress
reporting.
- The device_layout parameter in sd_ctx()
distributes sub-models across multiple Vulkan GPUs within a single
process. Presets: "mono" (all on one GPU),
"split_encoders" (CLIP/T5 on GPU 1, diffusion + VAE on GPU
0), "split_vae" (CLIP/T5 + VAE on GPU 1, diffusion on GPU
0), "encoders_cpu" (text encoders on CPU). Manual override
via diffusion_gpu, clip_gpu,
vae_gpu.
- meta_backend = TRUE in sd_ctx() shards a
single diffusion model across all available GPUs via the ggml meta
backend (for models too large for one GPU). Requires ggmlR >= 0.7.8;
falls back to the normal single-backend path otherwise.
- Profiling: sd_profile_start() / sd_profile_stop() /
sd_profile_summary(). Tracks model loading, text encoding
(with CLIP/T5 breakdown), sampling, and VAE decode/encode stages.
- vae_decode_only = FALSE in context.
- The mask argument of
sd_img2img() regenerates only the masked region while
preserving the rest. Accepts a PNG path, a numeric matrix, or an SD
image (white = generate, black = keep); sd_load_mask()
loads a mask file. Works on plain SD/SDXL/FLUX 1/2 weights; no
dedicated inpaint model is required.
- vae_mode = "auto"
(default) queries free GPU memory before VAE decode and enables tiling
only when estimated peak usage exceeds available VRAM (with a 50 MB
safety reserve). Falls back to a pixel-area threshold
(vae_auto_threshold) when Vulkan memory query is
unavailable (CPU backend, no GPU). Supports per-axis relative tile
sizing (vae_tile_rel_x, vae_tile_rel_y) for
non-square aspect ratios.
- sd_system_info() reports GGML/Vulkan capabilities as
detected by ggmlR at build time.
- sd_pipeline() +
sd_node() for composable, sequential multi-step workflows
(txt2img → upscale → img2img → save). Pipelines are serializable to JSON
via sd_save_pipeline() /
sd_load_pipeline().
- sd_app() launches an
interactive web interface with auto-detection of model architecture,
non-blocking async generation (C++ std::thread), live
progress bar with ETA, and automatic role assignment for multi-file
models (Flux, FLUX.2, SD3).

Launch an interactive web interface for image generation:
# From an R session
sd_app() # random port, opens browser
sd_app(model_dir = "/path/to/models") # pre-scan a model folder
sd_app(port = 3838, host = "127.0.0.1") # fixed port/host

From the terminal (one-liners):
# Simplest
Rscript -e 'sd2R::sd_app()'
# Fixed port + local host, open browser
Rscript -e 'sd2R::sd_app(port = 3838, host = "127.0.0.1", launch.browser = TRUE)'
# Equivalent low-level call (no sd2R helpers)
Rscript -e "shiny::runApp(system.file('shiny/sd2R_app', package = 'sd2R'), port = 3838, host = '127.0.0.1', launch.browser = TRUE)"

Features:
- Auto-detects model architecture (Flux, FLUX.2, SD3, SDXL, SD1/2) and assigns component roles (diffusion, VAE, CLIP, T5)
- Non-blocking generation with live progress bar and ETA
- Shares sd_generate()’s auto-routing: guidance-distilled CFG (Flux/FLUX.2), VRAM-aware VAE tiling, and multi-step highres-fix all run through the async engine
- Prevents incompatible model combinations
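
The VRAM-aware VAE tiling mentioned above follows the rule described for vae_mode = "auto": tile only when the estimated peak does not fit into free memory, and fall back to a pixel-area threshold when free memory cannot be queried. As a rough illustration only, that decision could be sketched in plain R as follows. The function should_tile_vae(), its arguments, the default area threshold, and the way the 50 MB reserve is subtracted are all assumptions made for this sketch; sd2R takes this decision internally and its real estimate may differ.

```r
# Hypothetical sketch of the vae_mode = "auto" rule; not sd2R's actual code.
# est_peak_mb    : estimated peak VAE decode memory in MB (assumed input)
# free_mb        : free GPU memory in MB, or NA when no memory query is available
# reserve_mb     : safety reserve (the README states 50 MB)
# area_threshold : stand-in for vae_auto_threshold; the default here is made up
should_tile_vae <- function(est_peak_mb, free_mb, width, height,
                            reserve_mb = 50, area_threshold = 1024 * 1024) {
  if (is.na(free_mb)) {
    # Fallback path (CPU backend, no GPU): decide from the pixel area alone
    return(width * height > area_threshold)
  }
  # Tile only when the estimate exceeds free memory minus the reserve
  est_peak_mb > (free_mb - reserve_mb)
}

should_tile_vae(est_peak_mb = 6000, free_mb = 4000, width = 1024, height = 1024)  # TRUE
should_tile_vae(est_peak_mb = 1500, free_mb = 4000, width = 768,  height = 768)   # FALSE
should_tile_vae(est_peak_mb = 1500, free_mb = NA,   width = 2048, height = 1024)  # TRUE
```

The third call shows the fallback: with no memory figure, only the output size matters.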
pipe <- sd_pipeline(
sd_node("txt2img", prompt = "a cat in space", width = 512, height = 512),
sd_node("upscale", factor = 2),
sd_node("img2img", strength = 0.3),
sd_node("save", path = "output.png")
)
# Save / load as JSON
sd_save_pipeline(pipe, "my_pipeline.json")
pipe <- sd_load_pipeline("my_pipeline.json")
# Run
ctx <- sd_ctx("model.safetensors")
# `upscaler` must be an existing upscaler context, created before this call
sd_run_pipeline(pipe, ctx, upscaler_ctx = upscaler)

New to sd2R? Grab a ready-made FLUX 2 model in one line: no Kaggle
account, no Python, no manual file juggling.
sd_download_model() downloads the bundle from a public Kaggle dataset and unpacks it
for you:
# Download FLUX 2 (GGUF) into ./models/flux2
sd_download_model(dest = "models/flux2", verbose = TRUE)
# Then launch the GUI pointed at that folder
sd_app(model_dir = "models/flux2")

That’s it: the app auto-detects the model and you can start
generating. Re-running sd_download_model() is safe: it
skips the download if the folder is already populated.
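
The "skip if already populated" behaviour can be pictured with a small base-R check. This is only a sketch of the idea: the helper folder_is_populated() is hypothetical, and sd_download_model() may well use a different test internally.

```r
# Hypothetical helper: TRUE when `dest` exists and contains at least one file.
folder_is_populated <- function(dest) {
  dir.exists(dest) && length(list.files(dest, recursive = TRUE)) > 0
}

dest <- file.path(tempdir(), "flux2-demo")   # throwaway demo folder
unlink(dest, recursive = TRUE)               # start from a clean state
folder_is_populated(dest)                    # FALSE: folder does not exist yet

dir.create(dest, recursive = TRUE)
folder_is_populated(dest)                    # FALSE: folder exists but is empty

writeLines("placeholder", file.path(dest, "model.gguf"))
folder_is_populated(dest)                    # TRUE: a second download could be skipped
```

A check of this kind is what makes it safe to keep the download call at the top of a script that is run repeatedly.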
- src/sd2R_interface.cpp
defines a thin bridge between R and the C API in
stable-diffusion.h, returning XPtr objects
with custom finalizers for correct lifetime management of
sd_ctx_t and upscaler_ctx_t.
- configure /
configure.win generate Makevars from
.in templates, resolving ggmlR paths, OpenMP, and Vulkan at
configure time. A per-target -include r_ggml_compat.h is applied
only to sd/*.cpp sources to avoid macro conflicts with
system headers.
- DESCRIPTION declares
Rcpp and ggmlR in LinkingTo, and NAMESPACE is
generated via roxygen2 with useDynLib and Rcpp
imports.
- .onLoad() initializes logging
and registers constant values that mirror the underlying C++ enums using
0-based indices.
- verbose = FALSE by default: no output unless
requested.
- -Winconsistent-missing-override, deprecated
codecvt).

# Install ggmlR first (if not already installed)
install.packages("ggmlR", configure.args = "--with-simd")
# Install sd2R
install.packages("sd2R")

Launch the GUI from a terminal:
Rscript -e "sd2R::sd_app()"
During installation, the configure script automatically
downloads tokenizer vocabulary files (~128 MB total) from GitHub
Releases. This requires curl or wget.
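
Since the vocabulary download needs one of these tools, it can be worth confirming that at least one is on the PATH before installing. A minimal base-R check, offered as a convenience sketch and not as part of sd2R, might look like this:

```r
# Sys.which() returns "" for tools that are not found on the PATH.
tools <- Sys.which(c("curl", "wget"))
have_downloader <- any(nzchar(tools))

if (have_downloader) {
  message("Found: ", paste(names(tools)[nzchar(tools)], collapse = ", "))
} else {
  message("Neither curl nor wget was found; install one before installing sd2R.")
}
```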
Tested configuration:
Install R, Rtools45, and the Vulkan SDK (use the default install paths).
From CRAN — from source with SIMD (recommended; required for GPU):
Requires Rtools45. Build from source if you want Vulkan GPU
acceleration: the build enables Vulkan only when the Vulkan SDK
is present at compile time (configure.win auto-detects
VULKAN_SDK), so install the Vulkan SDK before
running the commands below.
SIMD is a ggmlR build option, enabled via the
GGML_USE_SIMD environment variable. There is
no --with-simd /
--configure-args="..." flag — configure.win
does not parse those, so set the environment variable instead.
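
Before starting a source build, a short pre-flight check in R can confirm that both prerequisites described above are in place. This is a sketch under the assumptions stated in the text (VULKAN_SDK is what configure.win looks for, GGML_USE_SIMD is what enables SIMD); the variable names vulkan_sdk and vulkan_ready are only illustrative.

```r
# Pre-flight check before a source build with GPU support.
# VULKAN_SDK is the variable the text above says configure.win auto-detects.
vulkan_sdk   <- Sys.getenv("VULKAN_SDK", unset = "")
vulkan_ready <- nzchar(vulkan_sdk) && dir.exists(vulkan_sdk)

if (!vulkan_ready) {
  message("VULKAN_SDK is not set or does not point to a folder; ",
          "Vulkan would not be enabled in the build.")
}

# SIMD is requested through the environment, not through configure arguments.
Sys.setenv(GGML_USE_SIMD = "1")
Sys.getenv("GGML_USE_SIMD")
```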
# --- ggmlR (tensor/Vulkan backend) with SIMD ---
unlink("C:/Program Files/R/R-4.6.0/library/00LOCK-ggmlR", recursive = TRUE)
Sys.setenv(GGML_USE_SIMD = "1")
install.packages("ggmlR", type = "source")
# --- sd2R ---
unlink("C:/Program Files/R/R-4.6.0/library/00LOCK-sd2R", recursive = TRUE)
Sys.setenv(MAKEFLAGS = "-j8") # parallel compile; lower on fewer cores
install.packages("sd2R", type = "source")

Launch the GUI from a terminal:
"C:\Program Files\R\R-4.6.0\bin\Rscript.exe" -e "library(sd2R); sd_app()"
- curl or wget (for downloading vocabulary
files during installation)
- libvulkan-dev +
glslc (Linux) or Vulkan SDK (Windows)

CLIP-L + T5-XXL text encoders, VAE.
sample_steps = 10.
| Test | AMD RX 9070 (16 GB) | Tesla P100 (16 GB) | 2x Tesla T4 (16 GB) |
|---|---|---|---|
| 1. 768x768 direct | 13.72 s | 94.0 s | 62.0 s |
| 2. 1024x1024 tiled VAE | 24.84 s | 151.4 s | 105.6 s |
| 3. 2048x1024 highres fix | 42.70 s | 312.5 s | 222.0 s |
| 4. img2img 768x768 direct | 8.16 s | 51.0 s | 32.8 s |
| 5. 1024x1024 direct | 24.90 s | 152.2 s | 112.1 s |
| 6. Multi-GPU 4 prompts | – | – | 141.7 s (4 img) |
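
To read the table as relative speed instead of raw seconds, the timings can be divided column by column. The snippet below copies the numbers from the table and computes how many times faster the RX 9070 finished each test; the data frame and column names are chosen for this sketch only.

```r
# Timings in seconds, copied from the table above. Test 6 is left out of the
# data frame because only the 2x T4 column has a value for it.
bench <- data.frame(
  test   = c("768x768 direct", "1024x1024 tiled VAE", "2048x1024 highres fix",
             "img2img 768x768 direct", "1024x1024 direct"),
  rx9070 = c(13.72, 24.84, 42.70, 8.16, 24.90),
  p100   = c(94.0, 151.4, 312.5, 51.0, 152.2),
  t4_x2  = c(62.0, 105.6, 222.0, 32.8, 112.1)
)

# Ratio > 1 means the RX 9070 was faster by that factor
bench$vs_p100  <- round(bench$p100  / bench$rx9070, 2)
bench$vs_t4_x2 <- round(bench$t4_x2 / bench$rx9070, 2)
print(bench[, c("test", "vs_p100", "vs_t4_x2")])

# Multi-GPU run on 2x T4: total wall time divided by the 4 images produced
per_image_s <- 141.7 / 4
per_image_s
```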
Qwen3 LLM text encoder + FLUX.2 VAE.
sample_steps = 4.
RTX 3090 system: CPU Xeon E5-2666 v3, 32 GB RAM (Windows).
| Test | AMD RX 9070 (16 GB) | RTX 3090 (24 GB) |
|---|---|---|
| 1. 768x768 direct | 13.58 s | 5.10 s |
| 2. 1024x1024 tiled VAE | 32.51 s | 8.59 s |
| 3. 2048x1024 highres fix | 45.01 s | 23.54 s |
| 4. img2img 768x768 direct | 8.08 s | 4.34 s |
| 5. 1024x1024 direct | 33.31 s | 8.74 s |
| | SD 1.5 | Flux Q4_K_S |
|---|---|---|
| Diffusion params | ~860 MB | ~6.5 GB |
| Text encoders | CLIP ~240 MB | CLIP-L + T5-XXL ~3.9 GB |
| Sampling per step (768x768) | ~0.1–0.3 s | ~3.9 s |
| Architecture | UNet | MMDiT (57 blocks) |
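
The sizes in the table give a rough lower bound on memory use when every sub-model sits on one GPU. The arithmetic below is a back-of-the-envelope sketch only: it uses the approximate figures from the table, treats 860 MB as 0.86 GB, and leaves out VAE weights and activations, for which the table gives no numbers. The 16 GB card size is taken from the benchmark tables.

```r
# Approximate weight sizes in GB, taken from the table above.
weights_gb <- list(
  sd15 = c(diffusion = 0.86, text_encoders = 0.24),
  flux = c(diffusion = 6.5,  text_encoders = 3.9)
)

# Combined weight footprint per model family
totals <- sapply(weights_gb, sum)
totals

# Remaining headroom on a 16 GB card (VAE weights and activations NOT included)
card_gb <- 16
card_gb - totals
```

Under these assumptions Flux leaves far less headroom than SD 1.5, which is the situation where moving text encoders to another device or to the CPU becomes attractive.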
For a live, runnable demo see the Kaggle notebook: Stable Diffusion in R (ggmlR + Vulkan GPU).
MIT