stt.api is a minimal, backend-agnostic
R client for OpenAI-compatible speech-to-text (STT)
APIs, with optional local fallbacks.
It lets you transcribe audio in R without caring which backend actually performs the transcription.
- A unified interface for speech-to-text in R
- A way to switch easily between:
  - {whisper} (native R torch, local GPU/CPU)
  - OpenAI-compatible servers exposing /v1/audio/transcriptions (cloud or local servers)
- Designed for scripting, Shiny apps, containers, and reproducible pipelines
remotes::install_github("cornball-ai/stt.api")

Required dependencies are minimal:
- curl
- jsonlite

Optional backends:
- {whisper} (recommended, on CRAN)

install.packages("whisper")
remotes::install_github("cornball-ai/stt.api")
library(stt.api)
res <- stt("speech.wav")
res$text

That’s it. With {whisper} installed, stt()
transcribes locally on GPU or CPU with no configuration needed.
stt.api also supports OpenAI-compatible APIs for cloud or container-based transcription:
set_stt_base("http://localhost:4123")
# Optional, for hosted services like OpenAI
set_stt_key(Sys.getenv("OPENAI_API_KEY"))
res <- stt("speech.wav", backend = "openai")

This works with OpenAI, Whisper containers, LM Studio, OpenWebUI,
AnythingLLM, or any server implementing
/v1/audio/transcriptions.
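For orientation, the request such a server expects can be sketched directly with {curl} and {jsonlite}. This only illustrates the wire format and is not stt.api's internal code: the helper names `transcription_url()` and `transcribe_raw()` are hypothetical, and the `model` value is an assumption that depends on the server.

```r
# Hypothetical helper: build the endpoint URL from a base such as
# "http://localhost:4123", tolerating a trailing slash.
transcription_url <- function(base) {
  paste0(sub("/+$", "", base), "/v1/audio/transcriptions")
}

# Hypothetical helper: send one audio file as multipart/form-data.
# `model` is an assumption; servers differ in which names they accept.
transcribe_raw <- function(path, base, key = NULL, model = "whisper-1",
                           timeout = 60) {
  h <- curl::new_handle()
  curl::handle_setopt(h, timeout = timeout)
  if (!is.null(key) && nzchar(key)) {
    curl::handle_setheaders(h, Authorization = paste("Bearer", key))
  }
  # Setting form fields turns the request into a multipart POST.
  curl::handle_setform(h, file = curl::form_file(path), model = model)

  res <- curl::curl_fetch_memory(transcription_url(base), handle = h)
  if (res$status_code >= 400) {
    stop("Transcription request failed with HTTP status ", res$status_code)
  }
  # Parse the JSON body returned by the server.
  jsonlite::fromJSON(rawToChar(res$content))
}
```

In normal use stt() handles all of this; the sketch is mainly useful for checking by hand whether a given server accepts such a request.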
When you call stt() without specifying a backend, it
picks the first available:
1. {whisper} (native R torch, if installed)
2. An OpenAI-compatible API (if stt.api_base is set)

Regardless of backend, stt() always returns the same
structure:
list(
text = "Transcribed text",
segments = NULL | data.frame(...),
language = "en",
backend = "api" | "whisper",
raw = <raw backend response>
)

This makes it easy to switch backends without changing downstream code.
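As a rough illustration of backend-independent downstream code, the sketch below works on mock results shaped like the structure above. The helper `as_transcript_lines()` is hypothetical, and the segment columns (`start`, `end`, `text`) are an assumption, since the column layout of `segments` is not specified here.

```r
# Mock results shaped like the documented return value (not real output).
res_api <- list(
  text = "hello world",
  segments = NULL,
  language = "en",
  backend = "api",
  raw = list()
)
res_whisper <- list(
  text = "hello world",
  # Assumed column names; the real segment layout may differ.
  segments = data.frame(start = c(0, 0.6), end = c(0.6, 1.2),
                        text = c("hello", "world")),
  language = "en",
  backend = "whisper",
  raw = list()
)

# Hypothetical helper: one row per segment, falling back to a single
# row holding the full text when `segments` is NULL.
as_transcript_lines <- function(res) {
  if (is.null(res$segments)) {
    return(data.frame(text = res$text, stringsAsFactors = FALSE))
  }
  data.frame(text = as.character(res$segments$text), stringsAsFactors = FALSE)
}

lines_api <- as_transcript_lines(res_api)
lines_whisper <- as_transcript_lines(res_whisper)
```

Because `segments` may be NULL, code that needs timestamps should check for it explicitly rather than assume a data frame.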
stt_health()

Returns:
list(
ok = TRUE,
backend = "api",
message = "OK"
)

Useful for Shiny apps and deployment checks.
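A deployment check built on this result might look like the sketch below. The helper `require_healthy()` is hypothetical, and the lists are mock values shaped like the documented return; the failure message in particular is invented for illustration.

```r
# Hypothetical helper: stop early when a health result reports a problem.
# `health` is a list shaped like the documented stt_health() return value.
require_healthy <- function(health) {
  if (!isTRUE(health$ok)) {
    stop("STT backend unavailable: ", health$message, call. = FALSE)
  }
  invisible(health$backend)
}

# Mock values for illustration; in real code pass stt_health() instead.
healthy <- list(ok = TRUE, backend = "api", message = "OK")
unhealthy <- list(ok = FALSE, backend = "api", message = "connection refused")

backend <- require_healthy(healthy)
outcome <- tryCatch(require_healthy(unhealthy),
                    error = function(e) conditionMessage(e))
```

In a Shiny app the same check could run once at startup, so that a missing backend surfaces before a user uploads audio.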
Explicit backend choice:
stt("speech.wav", backend = "openai")
stt("speech.wav", backend = "whisper")

Automatic selection (default):

stt("speech.wav")

stt.api targets the OpenAI-compatible STT
spec:
POST /v1/audio/transcriptions
This endpoint is chosen intentionally. Configuration uses R options:
options(
stt.api_base = NULL,
stt.api_key = NULL,
stt.timeout = 60
)

Setters:

set_stt_base()
set_stt_key()

Example error when no backend is available:
Error in stt():
No transcription backend available.
Install whisper or set stt.api_base.
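The automatic selection order and the error above can be sketched in a few lines of base R. This is a hypothetical reconstruction of the documented behavior, not the package's actual code; `pick_backend()` and its arguments are illustrative names.

```r
# Hypothetical sketch of the documented selection order.
pick_backend <- function(
    whisper_installed = requireNamespace("whisper", quietly = TRUE),
    api_base = getOption("stt.api_base")) {
  # 1. Prefer the local {whisper} backend when it is installed.
  if (isTRUE(whisper_installed)) {
    return("whisper")
  }
  # 2. Otherwise use an OpenAI-compatible API if a base URL is configured.
  if (!is.null(api_base) && nzchar(api_base)) {
    return("openai")
  }
  # 3. Nothing usable: fail with the message shown above.
  stop("No transcription backend available.\n",
       "Install whisper or set stt.api_base.", call. = FALSE)
}

chosen <- pick_backend(whisper_installed = FALSE,
                       api_base = "http://localhost:4123")
```

The returned values mirror the `backend` argument accepted by stt(); passing the inputs as arguments rather than reading them inside the function keeps the logic easy to test.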
stt.api is designed to pair cleanly with
tts.api:
| Task | Package |
|---|---|
| Speech → Text | stt.api |
| Text → Speech | tts.api |
Both share:
Installing and maintaining local Whisper backends can be difficult.
stt.api lets you decouple your R code from those
concerns.
Your transcription code stays the same whether the backend is local {whisper} or a local or hosted OpenAI-compatible server.
License: MIT