
Every call, traced.

Set one environment variable and every llm/complete, agent/run, tool dispatch, and retry is recorded as OpenTelemetry spans, automatically, with no instrumentation code. Spans follow the OpenTelemetry GenAI semantic conventions, so backends understand them out of the box. Tracing is off by default and never blocks your run.

$ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 sema agent.sema

Jaeger · Grafana · Langfuse · Datadog · Honeycomb · SigNoz · and more

What a trace looks like

An agent run, as a span tree.

One agent/run produces a tree of nested spans. The agent span contains LLM call spans and tool execution spans. Retries nest under the LLM call that triggered them. Every span carries GenAI attributes — model, token counts, cost, finish reason.

Example trace: sema, 847ms total, 6 spans

invoke_agent coder                 847ms
  chat claude-sonnet-4-6           524ms
  execute_tool read-file           152ms
  chat claude-sonnet-4-6           167ms
    llm.retry_attempt              103ms
  execute_tool run-command          64ms

Example span attributes:
  gen_ai.request.model             claude-sonnet-4-6
  gen_ai.usage.input_tokens        1,247
  gen_ai.usage.output_tokens       382
  gen_ai.usage.cost                $0.012
  gen_ai.response.finish_reasons   ["stop"]
  sema.gen_ai.cache.hit            true
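A backend builds a tree like the one above from flat span records. Each span stores the ID of its parent, and children are grouped under that parent. The Python sketch below shows the idea using the example durations. The record fields (`id`, `parent`, `name`, `ms`) are simplified placeholders, not Sema's export format. Placing the retry under the second chat call is an assumption based on the rule that retries nest under the LLM call that triggered them.

```python
from collections import defaultdict

# Flat span records as a backend might receive them. Each span names its
# parent; the root has parent None. Field names are illustrative only.
SPANS = [
    {"id": "a", "parent": None, "name": "invoke_agent coder", "ms": 847},
    {"id": "b", "parent": "a", "name": "chat claude-sonnet-4-6", "ms": 524},
    {"id": "c", "parent": "a", "name": "execute_tool read-file", "ms": 152},
    {"id": "d", "parent": "a", "name": "chat claude-sonnet-4-6", "ms": 167},
    {"id": "e", "parent": "d", "name": "llm.retry_attempt", "ms": 103},
    {"id": "f", "parent": "a", "name": "execute_tool run-command", "ms": 64},
]


def render_tree(spans):
    """Return one indented line per span, with children listed under their parent."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Siblings keep the order in which they appear in the input list.
        for span in children[parent_id]:
            lines.append(f"{'  ' * depth}{span['name']}  {span['ms']}ms")
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


for line in render_tree(SPANS):
    print(line)
```

Keying children by parent ID means the spans can arrive in any order relative to their parents. This matters because exporters often send a child span before its parent has finished.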

Setup

One variable. That's it.

Point Sema at your tracing backend with a single environment variable. Every LLM call, tool dispatch, agent run, and retry is instrumented automatically — no code changes, no SDK imports, no wrapper functions. Tracing is off by default; set neither variable and nothing is recorded.

  • Network backend. OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 sends spans to Jaeger, Grafana, Langfuse, or any other OTLP receiver. Telemetry is exported in the background, so a slow or unreachable backend can't delay or crash your script.
  • File backend. SEMA_OTEL_FILE=/tmp/trace.jsonl — writes spans to a local file, one JSON object per line. No network needed.
terminal — one-minute setup
# Start Jaeger (free, local)
$ docker run --rm -d -p 4318:4318 \
  -p 16686:16686 jaegertracing/all-in-one
 
# Point Sema at it and run
$ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  sema -e '(llm/complete "hi" {:max-tokens 16})'
→ "Hello! How can I help?"
 
# Open http://localhost:16686 — trace is there
✓ 1 trace · 1 span · 42 tok · $0.0003
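The file backend writes one JSON object per line, so you can total token usage with a few lines of Python and no tracing backend at all. This sketch assumes each object has an `attributes` map keyed by the gen_ai.usage.* names shown earlier. That layout is an assumption, so check a real trace file before relying on it.

```python
import json


def summarize_trace(lines):
    """Total span count and GenAI token usage from JSONL trace lines.

    Assumption (not confirmed by Sema docs): each span object carries an
    "attributes" dict using the gen_ai.usage.* keys.
    """
    totals = {"spans": 0, "input_tokens": 0, "output_tokens": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        span = json.loads(line)
        totals["spans"] += 1
        attrs = span.get("attributes", {})
        # Non-LLM spans (tools, agents) simply contribute zero tokens.
        totals["input_tokens"] += attrs.get("gen_ai.usage.input_tokens", 0)
        totals["output_tokens"] += attrs.get("gen_ai.usage.output_tokens", 0)
    return totals
```

Usage: `with open("/tmp/trace.jsonl") as f: print(summarize_trace(f))`. The function accepts any iterable of strings, so an open file works directly.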

Automatic instrumentation

What gets traced — without you writing anything.

Kind      Span name              Recorded for
CLIENT    chat {model}           Every llm/complete and llm/chat call, including cache hits
CLIENT    embeddings {model}     Every llm/embed call
INTERNAL  execute_tool {name}    Every tool dispatch in an agent loop
INTERNAL  invoke_agent {name}    Every agent/run and tools-enabled completion
INTERNAL  notebook.run_all       A notebook "Run All", with one child span per cell
INTERNAL  llm.retry_attempt      Each HTTP retry (429 / 5xx / network error), nested under the LLM span
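The names `chat {model}`, `embeddings {model}`, `execute_tool {name}`, and `invoke_agent {name}` follow the GenAI convention of operation name plus target. The Python sketch below encodes that rule together with the span kinds from the table. The `span_name` helper is illustrative and not part of Sema.

```python
# Span kind per operation, taken from the table above.
SPAN_KINDS = {
    "chat": "CLIENT",
    "embeddings": "CLIENT",
    "execute_tool": "INTERNAL",
    "invoke_agent": "INTERNAL",
    "notebook.run_all": "INTERNAL",
    "llm.retry_attempt": "INTERNAL",
}


def span_name(operation, target=None):
    """Build "<operation> <target>" (e.g. model or tool name), or just the operation."""
    if operation not in SPAN_KINDS:
        raise ValueError(f"unknown operation: {operation}")
    return f"{operation} {target}" if target else operation
```

CLIENT marks calls that leave the process (the provider's HTTP API). INTERNAL marks work done inside the run, which is why tool and agent spans use it.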

Backend compatibility

Works with your tools.

Sema follows the OpenTelemetry GenAI semantic conventions, so any OTLP-compatible backend reads the traces natively. A handful of LLM-specific tools need a compat flag — one env var, no code changes.

  • No compat mode needed. Jaeger, Grafana/Tempo, SigNoz, OpenObserve, Honeycomb, Datadog, Dynatrace, Logfire.
  • One env var for the rest. SEMA_OTEL_COMPAT=langfuse (or openinference, arize, etc.) adds extra attribute names alongside the standard gen_ai.* ones.
  • Auth via headers. OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer ..." — standard OTLP auth, works with any hosted backend.
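OTEL_EXPORTER_OTLP_HEADERS is a standard OpenTelemetry setting. It holds a comma-separated list of key=value pairs, and the values may be percent-encoded. The Python sketch below is a simplified reading of that format, not Sema's own parser. Writing the space in a Bearer token as %20 is generally the most portable form.

```python
from urllib.parse import unquote


def parse_otlp_headers(value):
    """Parse an OTEL_EXPORTER_OTLP_HEADERS value into a dict.

    Pairs are comma-separated; each splits on its first "=", so values
    may themselves contain "=" (common in base64 tokens). Values are
    percent-decoded. Entries without "=" are skipped.
    """
    headers = {}
    for pair in value.split(","):
        if "=" not in pair:
            continue
        key, _, raw = pair.partition("=")
        key = key.strip()
        if key:
            headers[key] = unquote(raw.strip())
    return headers
```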

Backend compatibility guide →

Jaeger · Grafana · Langfuse · Datadog · Honeycomb · SigNoz · OpenObserve · Logfire · Phoenix · Elastic · Dynatrace · New Relic · Coralogix · MLflow · LangSmith · and more

Custom spans

Add your own. Or don't.

The built-in llm/* and agent/* calls are traced for you. When you build your own abstractions — a RAG loop, a batch job, a custom provider — typed span helpers let them emit first-class spans too. Every one is a no-op when tracing is off, so they're safe to leave in.

  • Generic spans. (with-span "ingest-batch" {:batch.size 100} ...) — name, attributes, body.
  • Typed spans. otel/llm-span, otel/tool-span, otel/retrieval-span — render like the built-ins in backends.
  • Annotate the current span. otel/set-attribute, otel/event, otel/set-status — typed values, not strings.
  • Session grouping. (with-session "chat-42" {:user "alice"} ...) — groups spans for Langfuse sessions.
pipeline.sema (custom spans)
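; search, index, query, my-llm-call, and prompt are placeholders for your own code.
(with-span "ingest-batch"
  {:batch.size 100}
  (otel/event "started" {})
  ; Retrieval step, rendered like a built-in retrieval span in backends
  (otel/retrieval-span
    "vector-search"
    (lambda ()
      (search index query))
    {:top-k 5}))

; LLM span for a provider that Sema doesn't wrap itself
(otel/llm-span
  {:model "custom-model"
   :provider "myco"}
  (lambda ()
    (define resp (my-llm-call prompt))
    ; Record token usage and cost on the LLM span
    (otel/llm-usage
      {:input-tokens 120
       :output-tokens 30
       :cost-usd 0.001})
    resp))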
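The helpers are "safe to leave in" because each one checks whether tracing is enabled before it does any work. Below is a Python sketch of that pattern. It assumes tracing counts as enabled when either OTEL_EXPORTER_OTLP_ENDPOINT or SEMA_OTEL_FILE is set, which mirrors the setup rules above. `with_span` and `RECORDED` are hypothetical stand-ins, not Sema internals.

```python
import os
import time
from contextlib import contextmanager

RECORDED = []  # hypothetical stand-in for a real span exporter


def tracing_enabled(env=os.environ):
    """Tracing is on if either the OTLP endpoint or the trace file is configured."""
    return bool(env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or env.get("SEMA_OTEL_FILE"))


@contextmanager
def with_span(name, attributes=None, env=os.environ):
    """Time the body as a span, or do nothing at all when tracing is off."""
    if not tracing_enabled(env):
        yield None  # no-op path: no timing, no allocation, nothing exported
        return
    span = {"name": name, "attributes": dict(attributes or {})}
    start = time.perf_counter()
    try:
        yield span  # the body may add attributes to the span
    finally:
        # Record the span even if the body raised, then let the error propagate.
        span["duration_ms"] = (time.perf_counter() - start) * 1000
        RECORDED.append(span)
```

The disabled path costs little more than an environment lookup. That cost profile is what makes it reasonable to keep instrumentation in production code.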

Metrics & privacy

Counts without content.

When exporting over a network endpoint, Sema also records two GenAI metric histograms: token usage and operation duration. Prompt and response text is never recorded unless you explicitly opt in. Token counts, model names, cost, and timing contain no message text and are always exported when tracing is on.

  • Token usage metric. gen_ai.client.token.usage — input/output token counts per call.
  • Duration metric. gen_ai.client.operation.duration — call latency in seconds.
  • Content capture is opt-in. Set OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true to record prompt/response text. Off by default; long messages are truncated.
gen_ai.client.token.usage          input 1,247 · output 382
gen_ai.client.operation.duration   p50 340ms · p95 720ms · p99 890ms
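Backends derive the p50/p95/p99 view from the duration histogram. They usually estimate these values from bucket counts. With raw samples, the nearest-rank method is the simplest exact definition. Below is a Python sketch that converts from the metric's seconds to milliseconds for display. It illustrates the percentile math only and is not how any particular backend computes it.

```python
import math


def nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


def latency_summary(durations_s):
    """Summarize operation durations (recorded in seconds) as p50/p95/p99 in ms."""
    return {f"p{p}": round(nearest_rank(durations_s, p) * 1000) for p in (50, 95, 99)}
```

With few samples, p95 and p99 often land on the same maximum value. This is one reason tail percentiles need a reasonable number of calls before they mean much.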

Turn it on. See everything.

One environment variable between you and a full trace tree.

# OTLP endpoint
$ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 sema agent.sema
# Local file
$ SEMA_OTEL_FILE=/tmp/trace.jsonl sema agent.sema