Feature·Cassettes·Record & Replay
Record once. Replay forever.
A cassette saves the answers from real LLM calls to a file the first time you run, then plays them back on every run after — no API key, no network, the same output every time. Deterministic tests without secrets. Offline demos that always work.
NDJSON tapes · safe to commit · works with agents, streaming, embeddings
Three modes
Record, replay, or both.
:mode decides what happens on each call. :auto is the friendly default — records what's missing, replays what it has. :replay is for CI — never touches the network, and a call that isn't on the tape is a hard error. That error is a feature: it tells you exactly which call drifted.
- :auto (default). Record new calls, replay recorded ones. Great for writing tapes.
- :replay. Only play back. A missing call is a hard error that names the request. CI mode.
- :record. Always call the model and record. Use when you want a fresh tape.
CI pattern
Record locally. Replay in CI.
Run your suite once with an API key to capture tapes. Commit them. Then run CI with SEMA_LLM_CASSETTE_MODE=replay — no secrets, no network, no flakiness. Any call that isn't on the tape fails loudly, so a prompt change can't silently hit a live model.
- No API key in CI. The tape has the answers. Run
sema test/agents.semawith zero secrets. - Deterministic by design. Same input, same output, every run. LLM tests that pass reliably.
- Drift detection. Change a prompt or model and the old tape stops matching — the failure names the call.
What's in the file
Plain text. Safe to commit.
A tape is NDJSON — one JSON object per line, one line per saved call. It's diffable, appendable, and reviewable in a pull request. Only the answer is saved, looked up by a fingerprint of the request. Your prompt text, API key, and headers are never written to the file.
- Only answers. The model, tokens, and response are saved. Prompts and keys are not.
- NDJSON format. One line per call.
git diffshows exactly what changed when you re-record. - Versioned. A
"v"field supports future format changes.
{"v":1,"kind":"complete","key":"a1b2…", "content":"Hello", "model":"gpt-5-mini", "prompt_tokens":12, "completion_tokens":1} {"v":1,"kind":"stream","key":"c3d4…", "content":"Hi there", "chunks":["Hi"," there"], "completion_tokens":2} {"v":1,"kind":"embed","key":"e5f6…", "model":"text-embedding-3-small", "embeddings":[[0.01,-0.02,0.03]]}
Agents & streaming
Every call type, covered.
Cassettes work with llm/complete, llm/chat, llm/extract, agent/run, streaming, and embeddings. For agents, each turn is saved separately — so a full multi-turn run (model → tool call → result → final answer) replays exactly. Your tool handlers still run on replay; the cassette only stands in for the model's responses.
- Agent tool loops. Each model turn is recorded independently. Tools execute normally on replay — deterministic model output, real tool logic.
- Streaming. The exact sequence of chunks is saved and replayed in order. A streaming UI behaves identically offline.
- Embeddings. Vectors are saved byte-for-byte. Similarity scores and vector-store contents are exactly reproducible.
How it composes
Slots in, doesn't conflict.
A cassette sits just above the real model and below everything else. Cost tracking keeps working on replay — the saved answer carries its real token counts. Tracing still produces OpenTelemetry spans. The response cache is turned off inside a cassette block so the tape always answers first.
- Cost & budgets. A replayed call reports real token usage — so budget limits and
llm/session-usagebehave as if the call happened. - Tracing. Replayed calls still produce OTel spans with recorded model and token counts.
- Retries & fallback. While recording, the normal retry logic wraps the real call. On replay there's nothing to retry.
Notebook integration
Tapes for notebooks.
Record LLM cells once with a key, commit the tape next to the .sema-nb file, and the notebook re-runs the same way forever — offline, for anyone, in CI. One setup cell turns it on; every LLM cell after it records or replays automatically.
- One setup cell.
(llm/cassette-load "tapes/nb.jsonl" {:mode :auto})— every cell after it is recorded or replayed. - Headless replay.
SEMA_LLM_CASSETTE_MODE=replay sema notebook run nb.sema-nb— no key, no network. - Shareable demos. Ship the tape with the notebook. Anyone can run it and see the exact same output.
"tapes/nb.jsonl"
{:mode :auto})
{:model "gpt-5-mini"})
Record your first tape in seconds.
Wrap your LLM calls in llm/with-cassette, run once, commit the tape.