Feature · Cassettes · Record & Replay

Record once. Replay forever.

A cassette saves the answers from real LLM calls to a file the first time you run, then plays them back on every run after — no API key, no network, the same output every time. Deterministic tests without secrets. Offline demos that always work.


$ cargo install sema-lang

NDJSON tapes · safe to commit · works with agents, streaming, embeddings

weather-agent.jsonl · RECORDING

complete · a1b2c3… · "Hello" · 12 tok · gpt-5-mini
complete · d4e5f6… · "It's sunny, 22°C." · 28 tok · gpt-5-mini
embed · g7h8i9… · [0.01, -0.02, 0.03, …] · 1536 dim · text-embedding-3-small
complete · j0k1l2… · not recorded · ⚠ cassette miss in :replay mode

Three modes

Record, replay, or both.

:mode decides what happens on each call. :auto is the friendly default — records what's missing, replays what it has. :replay is for CI — never touches the network, and a call that isn't on the tape is a hard error. That error is a feature: it tells you exactly which call drifted.

:auto (default). Record new calls, replay recorded ones. Great for writing tapes.
:replay. Only play back. A missing call is a hard error that names the request. CI mode.
:record. Always call the model and record. Use when you want a fresh tape.

:auto · default · writing tapes
  On tape: play it back
  New call: record it

:replay · CI · no network · no key
  On tape: play it back
  New call: error

:record · fresh tape · re-record
  On tape: call model, re-record
  New call: call model, record
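The three modes come down to one small decision per call. The sketch below models that decision in Python. It is not Sema's implementation: the function name `handle_call`, the `CassetteMiss` exception, and the plain dict used as a tape are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict


class CassetteMiss(Exception):
    """Raised in replay mode when a call is not on the tape (hypothetical name)."""


def handle_call(mode: str, tape: Dict[str, str], key: str,
                call_model: Callable[[], str]) -> str:
    """Resolve one LLM call against the tape according to the mode."""
    if mode not in ("auto", "replay", "record"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "record":
        # Always call the model and overwrite whatever the tape holds.
        tape[key] = call_model()
        return tape[key]
    if key in tape:
        # Both auto and replay play back a recorded answer.
        return tape[key]
    if mode == "replay":
        # Never touch the network: a missing call is a hard error
        # that names the request.
        raise CassetteMiss(f"cassette miss in :replay mode for request {key}")
    # mode == "auto": record what is missing.
    tape[key] = call_model()
    return tape[key]
```

In this model, `replay` is the only mode in which `call_model` can never run, which is why it suits CI.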

CI pattern

Record locally. Replay in CI.

Run your suite once with an API key to capture tapes. Commit them. Then run CI with SEMA_LLM_CASSETTE_MODE=replay — no secrets, no network, no flakiness. Any call that isn't on the tape fails loudly, so a prompt change can't silently hit a live model.

No API key in CI. The tape has the answers. Run sema test/agents.sema with zero secrets.
Deterministic by design. Same input, same output, every run. LLM tests that pass reliably.
Drift detection. Change a prompt or model and the old tape stops matching — the failure names the call.
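Drift detection follows from looking answers up by a fingerprint of the request. The Python sketch below shows the idea under a loud assumption: the source does not say how Sema computes its fingerprint, so SHA-256 over canonical JSON, the `fingerprint` helper, and the request fields used here are hypothetical.

```python
import hashlib
import json


def fingerprint(request: dict) -> str:
    """Hash a canonical JSON form of the request (assumed scheme, not Sema's)."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical request as it looked when the tape was recorded.
recorded = {"kind": "complete", "model": "gpt-5-mini",
            "prompt": "What's the weather?"}
# The same request after someone edits the prompt.
edited = dict(recorded, prompt="What is the weather today?")

tape_keys = {fingerprint(recorded)}

unchanged_hits = fingerprint(dict(recorded)) in tape_keys  # True
edited_hits = fingerprint(edited) in tape_keys             # False: drift
```

Because any change to the hashed fields produces a different key, an edited prompt cannot match an old answer by accident. In replay mode it surfaces as a miss.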


terminal — record & replay

$ # Record: run once with a key
$ SEMA_LLM_CASSETTE=tapes/suite.jsonl \
    sema test/agents.sema
→ recording 12 LLM calls…
✓ wrote tapes/suite.jsonl (4.8 KB)
$ git add tapes/suite.jsonl && git commit
→ committed tape

$ # CI: replay with no key
$ SEMA_LLM_CASSETTE=tapes/suite.jsonl \
    SEMA_LLM_CASSETTE_MODE=replay \
    sema test/agents.sema
→ replaying 12 calls from tape…
✓ all tests passed (0 API calls)

What's in the file

Plain text. Safe to commit.

A tape is NDJSON — one JSON object per line, one line per saved call. It's diffable, appendable, and reviewable in a pull request. Only the answer is saved, looked up by a fingerprint of the request. Your prompt text, API key, and headers are never written to the file.

Only answers. The model, tokens, and response are saved. Prompts and keys are not.
NDJSON format. One line per call. git diff shows exactly what changed when you re-record.
Versioned. A "v" field supports future format changes.
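Since a tape is plain NDJSON, any tool that reads JSON line by line can inspect it. As a rough illustration, this Python sketch indexes a tape by its "key" field and checks the "v" field. The `load_tape` helper and the short keys `k1` and `k2` are hypothetical; the field names follow the sample records on this page.

```python
import json

SUPPORTED_VERSION = 1


def load_tape(text: str) -> dict:
    """Parse NDJSON text into a {key: record} index, skipping blank lines."""
    index = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("v") != SUPPORTED_VERSION:
            raise ValueError(
                f"line {line_no}: unsupported tape version {record.get('v')!r}")
        index[record["key"]] = record
    return index


# Two hypothetical records, one JSON object per line.
sample = "\n".join([
    '{"v":1,"kind":"complete","key":"k1","content":"Hello",'
    '"model":"gpt-5-mini","prompt_tokens":12,"completion_tokens":1}',
    '{"v":1,"kind":"stream","key":"k2","content":"Hi there",'
    '"chunks":["Hi"," there"],"completion_tokens":2}',
])
tape = load_tape(sample)
```

Note that the records carry the answer, model, and token counts, but no prompt text.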

tapes/suite.jsonl · 3 lines · 1.2 KB

{"v":1,"kind":"complete","key":"a1b2…","content":"Hello","model":"gpt-5-mini","prompt_tokens":12,"completion_tokens":1}
{"v":1,"kind":"stream","key":"c3d4…","content":"Hi there","chunks":["Hi"," there"],"completion_tokens":2}
{"v":1,"kind":"embed","key":"e5f6…","model":"text-embedding-3-small","embeddings":[[0.01,-0.02,0.03]]}

Agents & streaming

Every call type, covered.

Cassettes work with llm/complete, llm/chat, llm/extract, agent/run, streaming, and embeddings. For agents, each turn is saved separately — so a full multi-turn run (model → tool call → result → final answer) replays exactly. Your tool handlers still run on replay; the cassette only stands in for the model's responses.

Agent tool loops. Each model turn is recorded independently. Tools execute normally on replay — deterministic model output, real tool logic.
Streaming. The exact sequence of chunks is saved and replayed in order. A streaming UI behaves identically offline.
Embeddings. Vectors are saved byte-for-byte. Similarity scores and vector-store contents are exactly reproducible.
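The split between replayed model turns and live tool handlers can be pictured with a small Python model. Everything here is a hypothetical shape for illustration: the source does not show how agent turns are stored on the tape, so the `type`, `name`, and `args` fields, the `replay_agent` function, and the `get_weather` tool are assumptions.

```python
def replay_agent(turns, tools):
    """Walk recorded model turns; run the real tool for each tool call."""
    transcript = []
    for turn in turns:
        if turn["type"] == "tool_call":
            # The model's decision to call a tool comes from the tape...
            handler = tools[turn["name"]]
            # ...but the tool handler itself executes for real.
            result = handler(**turn["args"])
            transcript.append(("tool", turn["name"], result))
        else:
            # A final answer is played back as recorded.
            transcript.append(("answer", turn["content"]))
    return transcript


def get_weather(city):
    # Stand-in for real tool logic (hypothetical).
    return {"city": city, "temp_c": 22}


# Hypothetical recorded turns for one agent run.
recorded_turns = [
    {"type": "tool_call", "name": "get_weather", "args": {"city": "Oslo"}},
    {"type": "answer", "content": "It's sunny, 22°C."},
]
transcript = replay_agent(recorded_turns, {"get_weather": get_weather})
```

The model side is fixed by the tape, so a test built this way exercises only the tool code and the glue around it.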

✓ llm/complete · answer, model, tokens, finish reason
✓ llm/chat · multi-turn conversations
✓ llm/extract · structured results rebuilt from saved answer
✓ agent/run · each turn saved separately · tools still run
✓ llm/stream · chunks saved and replayed in order
✓ llm/embed · vectors saved byte-for-byte
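For streaming, "replayed in order" means a consumer sees the same chunk boundaries it saw during recording. A minimal Python sketch, assuming a record shaped like the "stream" entry in the sample tape (the `replay_stream` name is hypothetical):

```python
from typing import Iterator


def replay_stream(record: dict) -> Iterator[str]:
    """Yield the saved chunks in their recorded order."""
    for chunk in record["chunks"]:
        yield chunk


record = {"v": 1, "kind": "stream", "content": "Hi there",
          "chunks": ["Hi", " there"], "completion_tokens": 2}

# A UI would render each chunk as it arrives; here we just collect them.
received = list(replay_stream(record))
full_text = "".join(received)
```

Joining the chunks reproduces the saved "content" field, so streaming and non-streaming consumers agree on the final text.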

How it composes

Slots in, doesn't conflict.

A cassette sits just above the real model and below everything else. Cost tracking keeps working on replay — the saved answer carries its real token counts. Tracing still produces OpenTelemetry spans. The response cache is turned off inside a cassette block so the tape always answers first.

Cost & budgets. A replayed call reports real token usage — so budget limits and llm/session-usage behave as if the call happened.
Tracing. Replayed calls still produce OTel spans with recorded model and token counts.
Retries & fallback. While recording, the normal retry logic wraps the real call. On replay there's nothing to retry.
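Because each saved answer carries its token counts, usage accounting can run off the tape alone. The Python sketch below models that idea. The `SessionUsage` class and `BudgetExceeded` exception are hypothetical stand-ins, not Sema's budget API; only the `prompt_tokens` and `completion_tokens` field names come from the sample tape.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push usage past the limit (hypothetical)."""


class SessionUsage:
    """Accumulates token usage from live or replayed records."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.total_tokens = 0

    def charge(self, record: dict) -> int:
        # A replayed record reports the counts captured at record time.
        used = record.get("prompt_tokens", 0) + record.get("completion_tokens", 0)
        if self.total_tokens + used > self.max_tokens:
            raise BudgetExceeded(
                f"{self.total_tokens + used} tokens exceeds limit {self.max_tokens}")
        self.total_tokens += used
        return used


usage = SessionUsage(max_tokens=20)
# Token counts taken from the "Hello" record in the sample tape.
usage.charge({"prompt_tokens": 12, "completion_tokens": 1})
```

A second identical call would exceed the 20-token limit in this model, just as it would against a live provider.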

Your code

↓

cassette record / replay

↓

cost · budgets · tracing · retry · fallback

↓

LLM provider (API)

Notebook integration

Tapes for notebooks.

Record LLM cells once with a key, commit the tape next to the .sema-nb file, and the notebook re-runs the same way forever — offline, for anyone, in CI. One setup cell turns it on; every LLM cell after it records or replays automatically.

One setup cell. (llm/cassette-load "tapes/nb.jsonl" {:mode :auto}) — every cell after it is recorded or replayed.
Headless replay. SEMA_LLM_CASSETTE_MODE=replay sema notebook run nb.sema-nb — no key, no network.
Shareable demos. Ship the tape with the notebook. Anyone can run it and see the exact same output.


[1] (llm/cassette-load "tapes/nb.jsonl" {:mode :auto})
    cassette loaded · recording

[2] (llm/complete "Summarize Sema" {:model "gpt-5-mini"})
    "A Lisp with LLM primitives in Rust."
    12 tok · recorded · $0.0002

[3] (llm/cassette-save)
    wrote tapes/nb.jsonl (0.8 KB)

Record your first tape in seconds.

Wrap your LLM calls in llm/with-cassette, run once, commit the tape.

curl: $ curl -fsSL https://sema-lang.com/install.sh | sh

cargo: $ cargo install sema-lang
