Feature · Cassettes · Record & Replay

Record once. Replay forever.

A cassette saves the answers from real LLM calls to a file the first time you run, then plays them back on every run after — no API key, no network, the same output every time. Deterministic tests without secrets. Offline demos that always work.

$ cargo install sema-lang

NDJSON tapes · safe to commit · works with agents, streaming, embeddings

weather-agent.jsonl · RECORDING

01  complete  a1b2c3…  "Hello"                        12 tok · gpt-5-mini
02  complete  d4e5f6…  "It's sunny, 22°C."            28 tok · gpt-5-mini
03  embed     g7h8i9…  [0.01, -0.02, 0.03, …]         1536 dim · text-embedding-3-small
04  complete  j0k1l2…  cassette miss in :replay mode  not recorded

Three modes

Record, replay, or both.

:mode decides what happens on each call. :auto is the friendly default — records what's missing, replays what it has. :replay is for CI — never touches the network, and a call that isn't on the tape is a hard error. That error is a feature: it tells you exactly which call drifted.

  • :auto (default). Record new calls, replay recorded ones. Great for writing tapes.
  • :replay. Only play back. A missing call is a hard error that names the request. CI mode.
  • :record. Always call the model and record. Use when you want a fresh tape.

Mode      On tape                 New call             Typical use
:auto     play it back            record it            default · writing tapes
:replay   play it back            error                CI · no network · no key
:record   call model, re-record   call model, record   fresh tape · re-record
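
The three modes above reduce to one small decision per call. Here is a minimal Python sketch of that logic, assuming the tape is an in-memory dict keyed by request fingerprint. The names `Mode`, `CassetteMiss`, and `resolve` are illustrative only and are not Sema's API.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "auto"
    REPLAY = "replay"
    RECORD = "record"


class CassetteMiss(Exception):
    """Raised in replay mode when a call is not on the tape."""


def resolve(mode, key, tape, call_model):
    """Return the answer for `key` according to the cassette mode.

    `tape` is a dict of fingerprint -> saved answer (an assumption);
    `call_model` is a zero-argument stand-in for the real provider call.
    """
    if mode is Mode.RECORD:
        # :record always calls the model and overwrites the tape entry.
        tape[key] = call_model()
        return tape[key]
    if key in tape:
        # :auto and :replay both play back a recorded call.
        return tape[key]
    if mode is Mode.REPLAY:
        # A miss in :replay is a hard error that names the request.
        raise CassetteMiss(f"cassette miss in :replay mode: {key}")
    # :auto records whatever is missing.
    tape[key] = call_model()
    return tape[key]
```

Only the :replay branch can never reach `call_model`, which is what makes it suitable for a CI job with no key.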

CI pattern

Record locally. Replay in CI.

Run your suite once with an API key to capture tapes. Commit them. Then run CI with SEMA_LLM_CASSETTE_MODE=replay — no secrets, no network, no flakiness. Any call that isn't on the tape fails loudly, so a prompt change can't silently hit a live model.

  • No API key in CI. The tape has the answers. Run sema test/agents.sema with zero secrets.
  • Deterministic by design. Same input, same output, every run. LLM tests that pass reliably.
  • Drift detection. Change a prompt or model and the old tape stops matching — the failure names the call.

CI setup guide →

terminal — record & replay
$ # Record: run once with a key
$ SEMA_LLM_CASSETTE=tapes/suite.jsonl \
  sema test/agents.sema
→ recording 12 LLM calls…
✓ wrote tapes/suite.jsonl (4.8 KB)
 
$ git add tapes/suite.jsonl && git commit
→ committed tape
 
$ # CI: replay with no key
$ SEMA_LLM_CASSETTE=tapes/suite.jsonl \
  SEMA_LLM_CASSETTE_MODE=replay \
  sema test/agents.sema
→ replaying 12 calls from tape…
✓ all tests passed (0 API calls)
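
Drift detection follows from looking answers up by a fingerprint of the request: edit the prompt or the model and the lookup key changes. The Python sketch below illustrates the idea. The hashing scheme (SHA-256 over canonical JSON) and the request fields are assumptions made for the example, since the source does not describe how Sema computes its fingerprint.

```python
import hashlib
import json


def fingerprint(request):
    """Hash a request dict into a stable lookup key (assumed scheme)."""
    # Sorting keys makes the hash independent of dict ordering.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical request that was recorded earlier.
recorded = {"kind": "complete", "model": "gpt-5-mini", "prompt": "Say hello"}
tape = {fingerprint(recorded): "Hello"}

# Same request: the key matches and the tape answers.
hit = tape.get(fingerprint(dict(recorded)))

# Edited prompt: the key changes, so the old entry no longer matches.
edited = dict(recorded, prompt="Say hello politely")
miss = tape.get(fingerprint(edited))
```

In replay mode that miss becomes the hard error, so the edited prompt surfaces as a failing test rather than a live call.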

What's in the file

Plain text. Safe to commit.

A tape is NDJSON — one JSON object per line, one line per saved call. It's diffable, appendable, and reviewable in a pull request. Only the answer is saved, looked up by a fingerprint of the request. Your prompt text, API key, and headers are never written to the file.

  • Only answers. The model name, token counts, and response are saved. Prompts and keys are not.
  • NDJSON format. One line per call. git diff shows exactly what changed when you re-record.
  • Versioned. A "v" field supports future format changes.
tapes/suite.jsonl · 3 lines · 1.2 KB
{"v":1,"kind":"complete","key":"a1b2…",
 "content":"Hello",
 "model":"gpt-5-mini",
 "prompt_tokens":12,
 "completion_tokens":1}

{"v":1,"kind":"stream","key":"c3d4…",
 "content":"Hi there",
 "chunks":["Hi"," there"],
 "completion_tokens":2}

{"v":1,"kind":"embed","key":"e5f6…",
 "model":"text-embedding-3-small",
 "embeddings":[[0.01,-0.02,0.03]]}

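Because a tape is plain NDJSON, any JSON library can read it. The Python sketch below parses the three sample records into a lookup table, with each record on a single line as it would be on disk. `load_tape` is an illustrative helper, not part of Sema, and rejecting versions other than 1 is an assumption about how a reader might use the "v" field.

```python
import json

# The three sample records, one JSON object per line (NDJSON).
# Keys are left elided exactly as they appear in the sample.
TAPE_TEXT = "\n".join([
    '{"v":1,"kind":"complete","key":"a1b2…","content":"Hello",'
    '"model":"gpt-5-mini","prompt_tokens":12,"completion_tokens":1}',
    '{"v":1,"kind":"stream","key":"c3d4…","content":"Hi there",'
    '"chunks":["Hi"," there"],"completion_tokens":2}',
    '{"v":1,"kind":"embed","key":"e5f6…",'
    '"model":"text-embedding-3-small","embeddings":[[0.01,-0.02,0.03]]}',
])


def load_tape(text):
    """Parse NDJSON into a dict of key -> record, skipping blank lines."""
    tape = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("v") != 1:
            # Assumed policy: refuse formats this reader does not know.
            raise ValueError(f"unsupported tape version: {record.get('v')}")
        tape[record["key"]] = record
    return tape


tape = load_tape(TAPE_TEXT)
stream = tape["c3d4…"]
# Replaying the chunks in order rebuilds the full streamed text.
rebuilt = "".join(stream["chunks"])
```

None of the records carries a prompt field: the only link back to the request is the key.
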
Agents & streaming

Every call type, covered.

Cassettes work with llm/complete, llm/chat, llm/extract, agent/run, streaming, and embeddings. For agents, each turn is saved separately — so a full multi-turn run (model → tool call → result → final answer) replays exactly. Your tool handlers still run on replay; the cassette only stands in for the model's responses.

  • Agent tool loops. Each model turn is recorded independently. Tools execute normally on replay — deterministic model output, real tool logic.
  • Streaming. The exact sequence of chunks is saved and replayed in order. A streaming UI behaves identically offline.
  • Embeddings. Vectors are saved byte-for-byte. Similarity scores and vector-store contents are exactly reproducible.

llm/complete   answer, model, tokens, finish reason
llm/chat       multi-turn conversations
llm/extract    structured results rebuilt from saved answer
agent/run      each turn saved separately · tools still run
llm/stream     chunks saved and replayed in order
llm/embed      vectors saved byte-for-byte
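
The agent behaviour described above, with model turns coming from the tape while tools run for real, can be sketched in a few lines of Python. The turn shape, the `get_weather` tool, and the city are hypothetical and chosen only for illustration. Sema's actual record layout for agent turns is not shown in the source.

```python
# Hypothetical recorded model turns for one agent run, in order.
RECORDED_TURNS = [
    {"tool_call": {"name": "get_weather", "args": {"city": "Oslo"}}},
    {"content": "It's sunny, 22°C."},
]


def get_weather(city):
    """A real tool handler; it executes on replay too."""
    return {"city": city, "sky": "sunny", "temp_c": 22}


TOOLS = {"get_weather": get_weather}


def replay_agent(turns, tools):
    """Drive the tool loop, taking each model turn from the tape."""
    tool_log = []
    for turn in turns:
        call = turn.get("tool_call")
        if call is None:
            # A turn without a tool call is the final answer.
            return turn["content"], tool_log
        # The tool itself is not replayed: it runs with the saved arguments.
        result = tools[call["name"]](**call["args"])
        tool_log.append((call["name"], result))
    raise RuntimeError("tape ended before a final answer")


answer, tool_log = replay_agent(RECORDED_TURNS, TOOLS)
```

Because only the model side is canned, a bug in a tool handler still shows up when the run is replayed.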

How it composes

Slots in, doesn't conflict.

A cassette sits just above the real model and below everything else. Cost tracking keeps working on replay — the saved answer carries its real token counts. Tracing still produces OpenTelemetry spans. The response cache is turned off inside a cassette block so the tape always answers first.

  • Cost & budgets. A replayed call reports real token usage — so budget limits and llm/session-usage behave as if the call happened.
  • Tracing. Replayed calls still produce OTel spans with recorded model and token counts.
  • Retries & fallback. While recording, the normal retry logic wraps the real call. On replay there's nothing to retry.
Your code
cassette record / replay
cost · budgets · tracing · retry · fallback
LLM provider (API)
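
Since each saved answer carries its token counts, usage accounting can run on replayed records exactly as it would on live responses. The Python sketch below shows the idea using the token fields from the sample tape. `session_usage`, `check_budget`, and `BudgetExceeded` are illustrative names and do not reflect how Sema implements budgets.

```python
class BudgetExceeded(Exception):
    """Raised when accumulated tokens pass the configured limit."""


def session_usage(records):
    """Sum token counts across records; missing fields count as zero."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for record in records:
        for field in totals:
            totals[field] += record.get(field, 0)
    return totals


def check_budget(usage, max_tokens):
    """Return the total token count, or raise if it exceeds the limit."""
    total = usage["prompt_tokens"] + usage["completion_tokens"]
    if total > max_tokens:
        raise BudgetExceeded(f"{total} tokens used, limit is {max_tokens}")
    return total


# Token fields as they appear in the sample tape records.
replayed = [
    {"kind": "complete", "prompt_tokens": 12, "completion_tokens": 1},
    {"kind": "stream", "completion_tokens": 2},
]
usage = session_usage(replayed)
```

A budget check in a test therefore trips at the same point offline as it did during recording.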

Notebook integration

Tapes for notebooks.

Record LLM cells once with a key, commit the tape next to the .sema-nb file, and the notebook re-runs the same way forever — offline, for anyone, in CI. One setup cell turns it on; every LLM cell after it records or replays automatically.

  • One setup cell. (llm/cassette-load "tapes/nb.jsonl" {:mode :auto}) — every cell after it is recorded or replayed.
  • Headless replay. SEMA_LLM_CASSETTE_MODE=replay sema notebook run nb.sema-nb — no key, no network.
  • Shareable demos. Ship the tape with the notebook. Anyone can run it and see the exact same output.

Notebook feature page →

[1]
(llm/cassette-load
  "tapes/nb.jsonl"
  {:mode :auto})
cassette loaded · recording
[2]
(llm/complete "Summarize Sema"
  {:model "gpt-5-mini"})
"A Lisp with LLM primitives in Rust."
12 tok · recorded · $0.0002
[3]
(llm/cassette-save)
wrote tapes/nb.jsonl (0.8 KB)

Record your first tape in seconds.

Wrap your LLM calls in llm/with-cassette, run once, commit the tape.

curl:   $ curl -fsSL https://sema-lang.com/install.sh | sh
cargo:  $ cargo install sema-lang