Feature · Observability · OpenTelemetry
Every call, traced.
Set one environment variable and every llm/complete, agent/run, tool dispatch, and retry is recorded as an OpenTelemetry trace, automatically, with no instrumentation code. Spans follow the GenAI semantic conventions, so backends understand them out of the box. Tracing is off by default and never blocks your run.
Jaeger · Grafana · Langfuse · Datadog · Honeycomb · SigNoz · and more
What a trace looks like
An agent run, as a span tree.
One agent/run produces a tree of nested spans. The agent span contains LLM call spans and tool execution spans. Retries nest under the LLM call that triggered them. Every span carries GenAI attributes — model, token counts, cost, finish reason.
Setup
One variable. That's it.
Point Sema at your tracing backend with a single environment variable. Every LLM call, tool dispatch, agent run, and retry is instrumented automatically: no code changes, no SDK imports, no wrapper functions. Tracing is off by default; if neither of the two variables below is set, nothing is recorded.
- Network backend. Set OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 to send spans to Jaeger, Grafana, Langfuse, or any OTLP receiver. Telemetry is sent in the background, so a slow or dead backend can't delay or crash your script.
- File backend. Set SEMA_OTEL_FILE=/tmp/trace.jsonl to write spans to a local file, one JSON object per line. No network needed.
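As a minimal sketch, the setup above might look like this in a POSIX shell. The variable names and values come from the list above. The script name `agent.sema`, the `sema` invocation shown in a comment, and the `count_trace_lines` helper are assumptions made for illustration only.

```shell
# Pick one backend.
# Network: send spans to an OTLP receiver on localhost.
#   export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
# File: write spans to a local file, one JSON object per line.
export SEMA_OTEL_FILE=/tmp/trace.jsonl

# Run your script as usual (hypothetical invocation; adjust to your setup):
#   sema agent.sema

# Hypothetical helper: the file backend writes one JSON object per line,
# so the line count is the number of objects recorded so far.
count_trace_lines() {
  # $1: path to the JSONL trace file
  wc -l < "$1" | tr -d ' '
}
```

After a run, `count_trace_lines "$SEMA_OTEL_FILE"` gives a quick check that something was written before you open the file in a viewer.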
Automatic instrumentation
What gets traced — without you writing anything.
- llm/complete and llm/chat, including cache hits
- llm/embed calls
- agent/run and tools-enabled completions
Backend compatibility
Works with your tools.
Sema follows the OpenTelemetry GenAI semantic conventions, so any OTLP-compatible backend reads the traces natively. A handful of LLM-specific tools need a compat flag — one env var, no code changes.
- No compat mode needed. Jaeger, Grafana/Tempo, SigNoz, OpenObserve, Honeycomb, Datadog, Dynatrace, Logfire.
- One env var for the rest. SEMA_OTEL_COMPAT=langfuse (or openinference, arize, etc.) adds extra attribute names alongside the standard gen_ai.* ones.
- Auth via headers. OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer ..." is standard OTLP auth and works with any hosted backend.
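Put together, a hosted-backend configuration could be sketched as follows. The endpoint URL and the `OTLP_TOKEN` variable are placeholders invented for this example, and the auth scheme your backend expects may differ from the Bearer form shown here.

```shell
# Hypothetical hosted endpoint; replace with your backend's OTLP URL.
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.example.com

# Standard OTLP auth header. OTLP_TOKEN is a made-up variable name used
# here so the secret stays out of the script itself.
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer ${OTLP_TOKEN:-replace-me}"

# Only needed for LLM-specific tools that expect extra attribute names.
export SEMA_OTEL_COMPAT=langfuse
```

Backends in the "no compat mode needed" group only need the first two lines.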
Custom spans
Add your own. Or don't.
The built-in llm/* and agent/* calls are traced for you. When you build your own abstractions — a RAG loop, a batch job, a custom provider — typed span helpers let them emit first-class spans too. Every one is a no-op when tracing is off, so they're safe to leave in.
- Generic spans. (with-span "ingest-batch" {:batch.size 100} ...) takes a name, attributes, and a body.
- Typed spans. otel/llm-span, otel/tool-span, and otel/retrieval-span render like the built-ins in backends.
- Annotate the current span. otel/set-attribute, otel/event, and otel/set-status take typed values, not strings.
- Session grouping. (with-session "chat-42" {:user "alice"} ...) groups spans for Langfuse sessions.
    (with-span "ingest-batch" {:batch.size 100}
      (otel/event "started" {})
      (otel/retrieval-span "vector-search"
        (lambda () (search index query))
        {:top-k 5}))

    (otel/llm-span {:model "custom-model" :provider "myco"}
      (lambda ()
        (define resp (my-llm-call prompt))
        (otel/llm-usage {:input-tokens 120 :output-tokens 30 :cost-usd 0.001})
        resp))
Metrics & privacy
Counts without content.
When exporting over a network endpoint, Sema also records two GenAI metric histograms: token usage and operation duration. Prompt and response text is never recorded unless you explicitly opt in — token counts, model names, cost, and timing carry no message text and are always exported.
- Token usage metric. gen_ai.client.token.usage records input/output token counts per call.
- Duration metric. gen_ai.client.operation.duration records call latency in seconds.
- Content capture is opt-in. Set OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true to record prompt/response text. Off by default; long messages are truncated.
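A short sketch of the opt-in, assuming a POSIX shell. Only the variable name and value come from the text above; the `unset` step is just ordinary shell usage for returning to the default.

```shell
# Opt in to recording prompt/response text (off by default).
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true

# To return to counts-only telemetry in this shell session:
#   unset OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
```

Exporting it in a single terminal session rather than a shared profile keeps message text out of traces everywhere else.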
Turn it on. See everything.
One environment variable between you and a full trace tree.