Converting between single-cell formats with lstar

lstar is a lightweight interchange layer for single-cell and spatial omics. A dataset is a set of axes (labelled sets you index by — cells, genes, pca) and fields (typed data over a tuple of axes — counts, embeddings, graphs, labels), serialized to a portable Zarr store that R, Python, and C++ all read and write. Format conversion is then just write_Y(read_X(obj)) with the L* store as the universal intermediate, and what a target cannot hold is recorded in ds$dropped rather than silently lost.

The model in R, end to end

The example in this section runs with only the base dependency (Matrix); neither Seurat nor SingleCellExperiment is needed.

library(lstar)

cells <- paste0("c", 1:6); genes <- paste0("g", 1:4)
m <- as(matrix(as.numeric(1:24), 6, 4, dimnames = list(cells, genes)), "CsparseMatrix")  # cells x genes

ds <- list(
  kind = "sample",
  axes = list(
    cells = list(labels = cells, origin = "observed", role = "observation"),
    genes = list(labels = genes, origin = "observed", role = "feature")),
  fields = list(
    counts = list(role = "measure", span = c("cells", "genes"), state = "raw", values = m),
    cluster = list(role = "label", span = "cells", values = factor(c("a", "a", "b", "b", "a", "b")))))
class(ds) <- "lstar_dataset"

p <- tempfile(fileext = ".lstar.zarr")
lstar_write(ds, p)            # -> a portable Zarr store (also readable from Python and C++)
ds2 <- lstar_read(p)
ds2
#> lstar_dataset (sample): 2 axes, 2 fields
#>   axis  cells      6
#>   axis  genes      4
#>   field counts         measure    [cells x genes]
#>   field cluster        label      [cells]
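
Because every field declares the axes it spans, a dataset can be sanity-checked before writing by comparing each field's shape with the lengths of the axes named in its span. The sketch below does this in base R on a toy list shaped like the example above. `check_spans` is a hypothetical helper written for illustration, not an lstar function, and a dense matrix stands in for the sparse one.

```r
# Toy dataset mirroring the structure used above (base R only).
cells <- paste0("c", 1:6); genes <- paste0("g", 1:4)
toy <- list(
  axes = list(
    cells = list(labels = cells),
    genes = list(labels = genes)),
  fields = list(
    counts  = list(span = c("cells", "genes"), values = matrix(0, 6, 4)),
    cluster = list(span = "cells", values = factor(c("a", "a", "b", "b", "a", "b")))))

# Hypothetical helper (not part of lstar): TRUE for each field whose shape
# matches the lengths of the axes named in its span.
check_spans <- function(ds) {
  vapply(ds$fields, function(f) {
    expected <- vapply(f$span, function(a) length(ds$axes[[a]]$labels), integer(1))
    actual <- if (is.null(dim(f$values))) length(f$values) else dim(f$values)
    identical(unname(expected), as.integer(actual))
  }, logical(1))
}

span_ok <- check_spans(toy)
```

`span_ok` is a named logical vector with one entry per field, so a failing field can be identified by name.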

A categorical label over cells induces a factor axis whose labels are its categories, so independent per-group results align on one axis.
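
As a rough base-R illustration of that idea (not lstar's internal representation), the categories of a factor can serve as the labels of an axis, and separately computed per-group results can then be lined up by indexing with those labels. The `cluster_axis` list and the toy per-cell values are assumptions made for this sketch.

```r
cluster <- factor(c("a", "a", "b", "b", "a", "b"))

# Hypothetical axis record: its labels are the categories of the label field.
cluster_axis <- list(labels = levels(cluster))

# Two independent per-group results, each keyed by category name ...
n_cells <- table(cluster)
mean_g1 <- tapply(c(1, 2, 3, 4, 5, 6), cluster, mean)  # toy per-cell values

# ... align on the same axis by indexing with its labels.
per_group <- data.frame(
  n_cells = as.integer(n_cells[cluster_axis$labels]),
  mean_g1 = as.numeric(mean_g1[cluster_axis$labels]),
  row.names = cluster_axis$labels)
```

Indexing by label rather than by position keeps the results aligned even if they were computed in a different category order.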

Converting to and from Seurat / SingleCellExperiment

The profiles map the shared-vocabulary core — counts, normalized/scaled expression, PCA (scores and gene loadings), UMAP/t-SNE, clusterings, cell/gene metadata — between formats. (Not evaluated here, to keep the vignette dependency-free.)

so  <- write_seurat(ds)          # L* dataset  -> Seurat object
ds3 <- read_seurat(so)           # Seurat       -> L* dataset
sce <- write_sce(read_seurat(so))   # Seurat -> SingleCellExperiment, in one line
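
To make the "recorded rather than silently lost" behaviour concrete, here is a minimal base-R sketch of how a writer could partition fields by what a target supports. It is not lstar's implementation: `split_for_target`, the `"graph"` role name, and the layout of the `dropped` table are all assumptions made for illustration, and the real structure of ds$dropped may differ.

```r
# Hypothetical partition step, NOT lstar's implementation: split fields into
# those a target can hold and those it cannot, keeping a record of the latter.
split_for_target <- function(ds, supported_roles) {
  roles <- vapply(ds$fields, function(f) f$role, character(1))
  keep <- roles %in% supported_roles
  list(
    fields = ds$fields[keep],
    dropped = data.frame(
      field = names(ds$fields)[!keep],
      role = unname(roles[!keep]),
      reason = rep("role not supported by target", sum(!keep)),
      stringsAsFactors = FALSE))
}

toy <- list(fields = list(
  counts  = list(role = "measure"),
  cluster = list(role = "label"),
  knn     = list(role = "graph")))   # "graph" is an assumed role name

res <- split_for_target(toy, supported_roles = c("measure", "label"))
res$dropped$field
```

Returning the dropped record alongside the converted fields lets a caller inspect or report what did not cross, instead of discovering the loss later.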

Cross-language conversions go through the on-disk store — write it on one side, read it on the other, no shared memory and no format re-implementation:

# Python:  lstar.write(read_anndata(ad.read_h5ad("pbmc.h5ad")), "pbmc.lstar.zarr")
ds_from_h5ad <- lstar_read("pbmc.lstar.zarr")
saveRDS(write_seurat(ds_from_h5ad), "pbmc.rds")

The lstar convert command line

The Python package ships a one-command CLI that detects formats by path, bridges R and Python through the store automatically, and reports what crossed (and what was dropped):

lstar convert pbmc.h5ad pbmc.rds --report        # AnnData -> Seurat, with a fidelity report
lstar convert pbmc.rds  pbmc.h5ad --check        # + open the result in its native library and smoke-test it
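
The same commands can be driven from an R session. The sketch below builds the argument vector from the two flags shown above and calls the CLI only if it is found on the PATH and the input file exists. `lstar_convert_args` is a hypothetical wrapper, not part of either package.

```r
# Hypothetical wrapper (not part of lstar): build the argument vector for
# `lstar convert` from the flags shown above.
lstar_convert_args <- function(input, output, report = FALSE, check = FALSE) {
  c("convert", input, output,
    if (report) "--report",
    if (check) "--check")
}

args <- lstar_convert_args("pbmc.h5ad", "pbmc.rds", report = TRUE)

# Run only when the CLI is on PATH and the input exists; otherwise just show
# the command that would be executed.
if (nzchar(Sys.which("lstar")) && file.exists("pbmc.h5ad")) {
  status <- system2("lstar", shQuote(args))
} else {
  message("would run: lstar ", paste(args, collapse = " "))
}
```

Keeping argument construction separate from execution makes the command easy to log or inspect before anything is run.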

The --backend auto|native|direct option adds a package-free fallback: an .h5ad file converts with only h5py (no anndata), a Seurat .rds reads and writes with base R plus this package (no SeuratObject), and an SCE .rds can be read package-free. See vignette topics and the package website for the full conversion matrix.