Feature · Observability · OpenTelemetry

Every call, traced.

Set one environment variable and every llm/complete, agent/run, tool dispatch, and retry is recorded as an OpenTelemetry trace automatically, with no instrumentation code. Spans follow the GenAI semantic conventions, so backends understand them out of the box. Tracing is off by default and never blocks your run.


$ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 sema agent.sema

Jaeger · Grafana · Langfuse · Datadog · Honeycomb · SigNoz · and more

What a trace looks like

An agent run, as a span tree.

One agent/run produces a tree of nested spans. The agent span contains LLM call spans and tool execution spans. Retries nest under the LLM call that triggered them. Every span carries GenAI attributes — model, token counts, cost, finish reason.

sema · 847ms · 6 spans

invoke_agent coder  847ms
  chat claude-sonnet-4-6  524ms
  execute_tool read-file  152ms
  chat claude-sonnet-4-6  167ms
    llm.retry_attempt  103ms
  execute_tool run-command  64ms

gen_ai.request.model: claude-sonnet-4-6
gen_ai.usage.input_tokens: 1,247
gen_ai.usage.output_tokens: 382
gen_ai.usage.cost: $0.012
gen_ai.response.finish_reasons: ["stop"]
sema.gen_ai.cache.hit: true
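To make the nesting concrete, here is a minimal Python sketch of how a flat list of spans becomes the tree above by following parent links. The record shape (`id`, `parent`, `name`, `ms`) is an assumption chosen for illustration, not Sema's export format; the names and durations are the ones from the example trace.

```python
from collections import defaultdict

# Hypothetical flat span records. The keys "id", "parent", "name", and "ms"
# are assumptions for this sketch, not Sema's actual export schema.
SPANS = [
    {"id": "a", "parent": None, "name": "invoke_agent coder", "ms": 847},
    {"id": "b", "parent": "a", "name": "chat claude-sonnet-4-6", "ms": 524},
    {"id": "c", "parent": "a", "name": "execute_tool read-file", "ms": 152},
    {"id": "d", "parent": "a", "name": "chat claude-sonnet-4-6", "ms": 167},
    {"id": "e", "parent": "d", "name": "llm.retry_attempt", "ms": 103},
    {"id": "f", "parent": "a", "name": "execute_tool run-command", "ms": 64},
]


def render_tree(spans):
    """Return indented lines, one per span, nested by parent id."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Visit children in their original order, depth-first.
        for span in children[parent_id]:
            lines.append(f"{'  ' * depth}{span['name']} {span['ms']}ms")
            walk(span["id"], depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(SPANS)))
```

The retry lands one level deeper than the chat span that triggered it, which is the same shape a tracing backend draws.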

Setup

One variable. That's it.

Point Sema at your tracing backend with a single environment variable. Every LLM call, tool dispatch, agent run, and retry is instrumented automatically — no code changes, no SDK imports, no wrapper functions. Tracing is off by default; set neither variable and nothing is recorded.

Network backend. OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 — sends spans to Jaeger, Grafana, Langfuse, any OTLP receiver. Telemetry is sent in the background — a slow or dead backend can't delay or crash your script.
File backend. SEMA_OTEL_FILE=/tmp/trace.jsonl — writes spans to a local file, one JSON object per line. No network needed.
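Because the file backend writes one JSON object per line, any JSON Lines reader can inspect a trace. The sketch below is in Python and assumes each object has a `name` key; the real field names in Sema's output are not specified here, so treat `name` and the sample records as placeholders.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def read_spans(path):
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    spans = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                spans.append(json.loads(line))
    return spans


def count_by_name(spans):
    # "name" is an assumed key; adjust it to the actual span schema.
    return Counter(span.get("name", "<unnamed>") for span in spans)


def write_sample(path):
    """Write hypothetical span records so the sketch runs without Sema."""
    sample = [
        {"name": "chat claude-sonnet-4-6"},
        {"name": "execute_tool read-file"},
        {"name": "chat claude-sonnet-4-6"},
    ]
    text = "\n".join(json.dumps(record) for record in sample) + "\n"
    Path(path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        trace_path = Path(tmp) / "trace.jsonl"
        write_sample(trace_path)
        for name, count in count_by_name(read_spans(trace_path)).items():
            print(f"{count} x {name}")
```

To use it on a real run, point `read_spans` at the path given in SEMA_OTEL_FILE.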

terminal — one-minute setup

# Start Jaeger (free, local)
$ docker run --rm -d -p 4318:4318 \
    -p 16686:16686 jaegertracing/all-in-one

# Point Sema at it and run
$ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
    sema -e '(llm/complete "hi" {:max-tokens 16})'
→ "Hello! How can I help?"

# Open http://localhost:16686 (the trace is there)
✓ 1 trace · 1 span · 42 tok · $0.0003

Automatic instrumentation

What gets traced — without you writing anything.

chat {model} (CLIENT). Every llm/complete and llm/chat, including cache hits.
embeddings {model} (CLIENT). Every llm/embed call.
execute_tool {name} (INTERNAL). Every tool dispatch in an agent loop.
invoke_agent {name} (INTERNAL). Every agent/run and tools-enabled completion.
notebook.run_all (INTERNAL). A notebook "Run All", with one child span per cell.
llm.retry_attempt (INTERNAL). Each HTTP retry (429 / 5xx / network), nested under the LLM span.

Backend compatibility

Works with your tools.

Sema follows the OpenTelemetry GenAI semantic conventions, so any OTLP-compatible backend reads the traces natively. A handful of LLM-specific tools need a compat flag — one env var, no code changes.

No compat mode needed. Jaeger, Grafana/Tempo, SigNoz, OpenObserve, Honeycomb, Datadog, Dynatrace, Logfire.
One env var for the rest. SEMA_OTEL_COMPAT=langfuse (or openinference, arize, etc.) adds extra attribute names alongside the standard gen_ai.* ones.
Auth via headers. OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer ..." — standard OTLP auth, works with any hosted backend.
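OTEL_EXPORTER_OTLP_HEADERS is the standard OpenTelemetry variable: a comma-separated list of key=value pairs, with values optionally percent-encoded. The Python sketch below shows how such a string breaks down into headers. It illustrates the format only and is not Sema's parser; the function name and the sample token are hypothetical.

```python
from urllib.parse import unquote


def parse_otlp_headers(raw):
    """Split 'k1=v1,k2=v2' into a dict, decoding percent-escapes in values."""
    headers = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as a trailing comma
        key, sep, value = pair.partition("=")  # split on the first '=' only
        if not sep or not key.strip():
            continue  # ignore malformed entries in this sketch
        headers[key.strip()] = unquote(value.strip())
    return headers


if __name__ == "__main__":
    # Hypothetical token; %20 is the percent-encoded space after "Bearer".
    print(parse_otlp_headers("Authorization=Bearer%20abc123,x-team=ml"))
```

Splitting on the first `=` only matters for tokens that themselves contain `=`, such as base64 padding.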


Jaeger · Grafana · Langfuse · Datadog · Honeycomb · SigNoz · OpenObserve · Logfire · Phoenix · Elastic · Dynatrace · New Relic · Coralogix · MLflow · LangSmith · and more

Custom spans

Add your own. Or don't.

The built-in llm/* and agent/* calls are traced for you. When you build your own abstractions — a RAG loop, a batch job, a custom provider — typed span helpers let them emit first-class spans too. Every one is a no-op when tracing is off, so they're safe to leave in.

Generic spans. (with-span "ingest-batch" {:batch.size 100} ...) — name, attributes, body.
Typed spans. otel/llm-span, otel/tool-span, otel/retrieval-span — render like the built-ins in backends.
Annotate the current span. otel/set-attribute, otel/event, otel/set-status — typed values, not strings.
Session grouping. (with-session "chat-42" {:user "alice"} ...) — groups spans for Langfuse sessions.

pipeline.sema · custom spans

(with-span "ingest-batch"
  {:batch.size 100}
  (otel/event "started" {})
  (otel/retrieval-span
    "vector-search"
    (lambda ()
      (search index query))
    {:top-k 5}))

(otel/llm-span
  {:model "custom-model"
   :provider "myco"}
  (lambda ()
    (define resp (my-llm-call prompt))
    (otel/llm-usage
      {:input-tokens 120
       :output-tokens 30
       :cost-usd 0.001})
    resp))

Metrics & privacy

Counts without content.

When exporting to a network endpoint, Sema also records two GenAI metric histograms: token usage and operation duration. Prompt and response text is never recorded unless you explicitly opt in. Token counts, model names, cost, and timing carry no message text and are exported whenever tracing is enabled.

Token usage metric. gen_ai.client.token.usage — input/output token counts per call.
Duration metric. gen_ai.client.operation.duration — call latency in seconds.
Content capture is opt-in. Set OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true to record prompt/response text. Off by default; long messages are truncated.

gen_ai.client.token.usage: input 1,247 · output 382
gen_ai.client.operation.duration: p50 340ms · p95 720ms · p99 890ms
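Percentiles such as p50, p95, and p99 are summaries a backend derives from the duration histogram. As a rough illustration of what those numbers mean, this Python sketch applies the nearest-rank method to a list of made-up per-call latencies. Real backends estimate percentiles from histogram buckets, so their results are approximate and may differ.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical call latencies in milliseconds.
    durations_ms = [210, 280, 310, 330, 340, 360, 420, 510, 720, 890]
    for pct in (50, 95, 99):
        print(f"p{pct}: {percentile(durations_ms, pct)}ms")
```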

Turn it on. See everything.

One environment variable between you and a full trace tree.

otel: $ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 sema agent.sema

file: $ SEMA_OTEL_FILE=/tmp/trace.jsonl sema agent.sema
