---
title: "Chapter 08: Kernel Loading --- load_kernel_source and load_kernel_library"
author: "Kjell Nygren"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Chapter 08: Kernel Loading --- load_kernel_source and load_kernel_library}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## The assembly problem

OpenCL compiles a **single string** of source code for each program.
The string must be self-contained: all functions used by the kernel must be
defined in the same string, in order, before they are called. There are no
separate compilation units and no linker.

When a kernel needs `dnorm4`, and `dnorm4` itself needs `R_D__0` and
`R_finite`, those definitions must precede `dnorm4` in the string, which
must precede the kernel body. **opencltools** provides two R functions
that assemble and sort source components; use `package = "nmathopencl"`
for this package's `inst/cl/` tree.

## `opencltools::load_kernel_source()`

```{r, eval = FALSE}
opencltools::load_kernel_source(relative_path, package = "nmathopencl")
```

Reads and returns the contents of a single `.cl` file from `inst/cl/`
inside the named package. The `relative_path` is relative to `inst/cl/`:

```{r, eval = FALSE}
# Load the dnorm kernel entry-point file
kernel_src <- opencltools::load_kernel_source("src/dnorm_kernel.cl",
                                              package = "nmathopencl")

# Load a custom kernel from another package's inst/cl/
my_kernel   <- opencltools::load_kernel_source("ex_glmbayes_src/f2_f3_binomial_logit.cl",
                                   package = "nmathopencl")
```

The returned character string can be concatenated
with a library string and passed to the C++ runner.

## `opencltools::load_kernel_library()`

```{r, eval = FALSE}
opencltools::load_kernel_library(subdir, package = "nmathopencl", verbose = FALSE)
```

Reads **all** `.cl` files from `inst/cl/<subdir>/`, parses the
`@provides` / `@depends` annotations in each file, performs a topological
sort, and returns a single concatenated source string with files in
dependency-correct order.

```{r, eval = FALSE}
# Load the complete nmath library (all ~180 files)
nmath_src <- opencltools::load_kernel_library("nmath", package = "nmathopencl")

# Load a smaller subset pre-curated for glmbayes kernels
nmath_glm <- opencltools::load_kernel_library("ex_glmbayes_nmath",
                                              package = "nmathopencl")

# Load with diagnostic output to trace the sort order
nmath_src <- opencltools::load_kernel_library("nmath",
                                              package = "nmathopencl",
                                              verbose = TRUE)
```

## The annotation format

For `opencltools::load_kernel_library()` to sort correctly, every `.cl` file in the
subdirectory must begin with an annotation block:

```c
// @provides: symbol1, symbol2
// @depends:  fileA, fileB
// @includes: library_name
```

| Tag | Required? | Meaning |
|-----|-----------|---------|
| `@provides` | Yes | Comma-separated list of function (or symbol) names defined in this file |
| `@depends` | Yes (may be empty) | Comma-separated list of `.cl` filenames (without extension) that must appear earlier in the output |
| `@includes` | Optional | Library grouping label; used for diagnostics only |

**Rules for `@depends`:**

- List only **direct** dependencies, not the full transitive set. The
  resolver computes the transitive closure automatically.
- Every non-header file must list the library header file (the file whose
  basename matches the subdirectory name) in `@depends`, so the header is
  always loaded first.
- If a file has no dependencies at all (i.e., it is the library header),
  leave `@depends` empty or omit the tag.

Example for `dgamma.cl`:

```c
// @provides: dgamma
// @depends:  nmath, dpois, lgamma
// @includes: nmath
```

This tells the resolver: emit `nmath.cl`, `dpois.cl`, and `lgamma.cl`
before `dgamma.cl`. The resolver handles those files' own dependencies
recursively.

## The topological sort algorithm

`opencltools::load_kernel_library()` uses a straightforward iterative topological sort:

1. Parse all `.cl` files and extract their `@provides` and `@depends` tags.
2. Build a set of "unsorted" files. Initialize the "sorted" set to empty.
3. In each pass, find every unsorted file whose `@depends` entries are all
   already in the sorted set. Emit those files and add them to the sorted
   set.
4. Repeat until all files are sorted or no progress is made.
5. If no progress is made but unsorted files remain, throw:
   `"Dependency sort failed: unresolved dependencies remain."`

Files with an empty `@depends` list are emitted first (pass 1). These are
always the library header files.

### Verbose output

When `verbose = TRUE`, the function prints each pass of the sort:

```
Pass 1: emitting nmath.cl, Rmath.cl, dpq.cl
Pass 2: emitting chebyshev.cl, cospi.cl, fmax2.cl, gammalims.cl
Pass 3: emitting lgammacor.cl
Pass 4: emitting lgamma.cl, gamma.cl
...
```

This is invaluable for debugging missing or circular `@depends` tags.

## Cycle detection and breaking

If file A lists file B in its `@depends` and file B lists file A, the
resolver will detect this on the first pass where neither can be promoted,
and throw an error. If `verbose = TRUE`, it prints the stuck files before
throwing.

The standard fix is to split one of the mutually dependent files into two:

- A "cycle-free" fragment containing functions that make no forward call.
- A "cycle-dependent" fragment containing functions that call the
  cycle-free fragment.
- A `refactored.cl` that provides forward declarations for any symbols
  referenced before they are defined.

Chapter 04 describes the three specific cycles in the nmath library and how
they are resolved.

## Assembling a complete program

A complete OpenCL program string is assembled by concatenating library
and kernel sources in order:

```{r, eval = FALSE}
library(nmathopencl)

# 1. Load the nmath library (sorted automatically)
nmath_src  <- opencltools::load_kernel_library("nmath", package = "nmathopencl")

# 2. Load the kernel entry-point
kernel_src <- opencltools::load_kernel_source("src/dnorm_kernel.cl",
                                              package = "nmathopencl")

# 3. Concatenate into a single program string
program    <- paste(nmath_src, kernel_src, sep = "\n")
```

This `program` string is then passed to the C++ runner (see Chapter 09).

## Using libraries from another package

The `package` argument tells `opencltools::load_kernel_source()` and
`opencltools::load_kernel_library()` which package's `inst/cl/` to look in:

```{r, eval = FALSE}
# Load the curated nmath subset shipped inside nmathopencl
# for use by a downstream package's kernels
nmath_sub <- opencltools::load_kernel_library("ex_glmbayes_nmath",
                                  package = "nmathopencl")

# Load a custom kernel from the same downstream package
my_kernel <- opencltools::load_kernel_source("ex_glmbayes_src/f2_f3_binomial_logit.cl",
                                 package = "nmathopencl")

program <- paste(nmath_sub, my_kernel, sep = "\n")
```

This is exactly the pattern used by `f2_f3_opencl()` in the `ex_glmbayes`
example (see Chapter 10).

## Dependency index --- faster, kernel-specific loading

`opencltools::load_kernel_library()` always sorts and loads **all** files in the
subdirectory. For a large library like `nmath` (137 files) this is
thorough but wasteful when only a small subset is needed for a specific
kernel.

`nmathopencl` therefore ships a pre-built dependency index alongside every
curated library subdirectory. The index encodes the full transitive dependency
map and the global load order, enabling two important optimizations:

1. **`load_library_for_kernel()` (R)** --- loads only the files required by one
   specific kernel, with no sort at runtime.
2. **`load_library_for_kernel()` (C++)** --- the same function implemented
   entirely in C++, suitable for use inside Rcpp kernel wrappers.

### Building the index --- `write_kernel_dependency_index()`

```{r, eval = FALSE}
write_kernel_dependency_index(library_dir, verbose = TRUE)
```

This function analyzes a library directory, computes the full transitive
dependency closure for every stem, and writes **two companion files** next to
the `.cl` sources:

| File | Format | Consumer |
|------|--------|----------|
| `kernel_dependency_index.rds` | Binary R object | R functions |
| `kernel_dependency_index.tsv` | Tab-separated text | C++ code |

The `.tsv` has a header row and one row per stem in global load order:

```
stem	all_depends
refactored	
Rmath	refactored
nmath	refactored, Rmath
chebyshev	refactored, Rmath, nmath
dbinom	dpq, refactored, Rmath, nmath, stirlerr_cycle_free, ...
```

Both files are shipped with the package so end users never need to
regenerate them. Run `write_kernel_dependency_index()` only after adding or
removing `.cl` files from a library directory.

```{r, eval = FALSE}
# Regenerate after changing the nmath library
nmath_dir <- system.file("cl", "nmath", package = "nmathopencl")
write_kernel_dependency_index(nmath_dir, verbose = TRUE)
```

### Loading with the index --- `load_library_for_kernel()` (R)

```{r, eval = FALSE}
load_library_for_kernel(kernel_path, library_dir,
                        depends_tag = "all_depends",
                        index = NULL)
```

Reads the annotation tag `@{depends_tag}` from the kernel file to get the
list of needed stems, looks them up in the pre-loaded index, and returns
their source concatenated in correct dependency order.  Returns `""` when
the kernel has no `@{depends_tag}` annotation (e.g. a kernel using only
OpenCL built-ins).

The `index` argument should always be supplied. Load it once and reuse it
across all calls to avoid reading from disk on every invocation:

```{r, eval = FALSE}
nmath_dir <- system.file("cl", "nmath", package = "nmathopencl")

# Load the index once
idx <- readRDS(file.path(nmath_dir, "kernel_dependency_index.rds"))

# Re-use across multiple kernels --- only the subset for each kernel is read
src_gaussian <- load_library_for_kernel(
    kernel_path = system.file("cl", "ex_glmbayes_src",
                              "f2_f3_gaussian.cl", package = "nmathopencl"),
    library_dir = nmath_dir,
    depends_tag = "depends_nmath",
    index       = idx
)
src_gamma <- load_library_for_kernel(
    kernel_path = system.file("cl", "ex_glmbayes_src",
                              "f2_f3_gamma.cl", package = "nmathopencl"),
    library_dir = nmath_dir,
    depends_tag = "depends_nmath",
    index       = idx
)
```

The annotation tag in the kernel file must follow the same format used
throughout the library:

```c
// @depends_nmath: dbinom, pnorm, dnorm
```

### Extracting a subset --- `extract_library_subset()`

```{r, eval = FALSE}
extract_library_subset(kernel_paths, library_dir, dest_dir,
                       depends_tag = "all_depends",
                       index = NULL,
                       overwrite = FALSE)
```

Given a set of kernel files, reads their `@{depends_tag}` annotations,
takes the union of all transitive dependencies, and copies exactly those
`.cl` files (plus both index files) from `library_dir` to `dest_dir` in
dependency order. The destination directory must already exist.

```{r, eval = FALSE}
nmath_dir    <- system.file("cl", "nmath",           package = "nmathopencl")
src_dir      <- system.file("cl", "ex_glmbayes_src", package = "nmathopencl")
dest_dir     <- "inst/cl/mypkg_nmath"   # must exist

idx <- readRDS(file.path(nmath_dir, "kernel_dependency_index.rds"))

result <- extract_library_subset(
    kernel_paths = list.files(src_dir, pattern = "\\.cl$", full.names = TRUE),
    library_dir  = nmath_dir,
    dest_dir     = dest_dir,
    depends_tag  = "depends_nmath",
    index        = idx
)
print(result)  # data frame: stem, source, dest, copied
```

The result includes a row for `kernel_dependency_index.rds` and
`kernel_dependency_index.tsv` so the extracted subset is immediately usable
by both R and C++.

### C++ equivalent --- `openclPort::load_library_for_kernel()`

For use inside Rcpp kernel wrappers (where calling back into R is not
permitted), `kernel_loader.cpp` provides a C++ implementation with the
same semantics:

```cpp
#include "openclPort.h"   // declares load_library_for_kernel

// Inside a kernel wrapper, after the family/link dispatch has set kernel_file:
std::string nmath_source = openclPort::load_library_for_kernel(
    kernel_file,          // e.g. "ex_glmbayes_src/f2_f3_binomial_logit.cl"
    "ex_glmbayes_nmath",  // library subdirectory
    "nmathopencl",        // package
    "depends_nmath"       // annotation tag
);
```

The C++ function reads `kernel_dependency_index.tsv` from the library
directory once per call, parses the annotation tag from the kernel file, and
loads only the required subset in the pre-computed order.  No topological
sort is performed at runtime.

The `@depends_nmath` annotation must list the **direct** nmath entry-point
stems called by the kernel (not the full transitive list):

```c
// @depends_nmath: dbinom, pnorm, dnorm
```

The transitive expansion is performed automatically using the TSV index.

## Error messages

| Error | Cause |
|-------|-------|
| `"OpenCL support is not available"` | Package built without `USE_OPENCL` |
| `"Dependency sort failed: unresolved dependencies remain."` | A `@depends` entry names a file that does not exist in the directory, or there is a genuine circular dependency |
| `"File not found: inst/cl/<path>"` | `relative_path` does not match any file in the package |
