Cost Tracking & Budgets
Usage Tracking
llm/last-usage
Get token usage from the most recent LLM call.
(llm/last-usage)
; => {:prompt-tokens 42 :completion-tokens 15 :total-tokens 57
; :cache-read-tokens 0 :cache-creation-tokens 0
; :model "..." :cost-usd 0.0003}llm/session-usage
Get cumulative usage across all LLM calls in the current session.
(llm/session-usage)
; => {:prompt-tokens 1280 :completion-tokens 410 :total-tokens 1690
; :cache-read-tokens 1024 :cache-creation-tokens 0 :cost-usd 0.012}Prompt-cache tokens
:cache-read-tokens and :cache-creation-tokens report how many input tokens were served from (or written to) the provider's prompt cache — large savings when you repeat a stable prefix across calls.
- OpenAI and Gemini (2.5+) cache implicitly: send the same long prefix twice and the second call reports
:cache-read-tokensautomatically. Reads are a subset of:prompt-tokens. - Anthropic reports
:cache-read-tokensand:cache-creation-tokensseparately from:prompt-tokens(caching there is opt-in viacache_control). - Providers that don't report cache counts leave these at
0.
Cost is currently priced at the standard input rate; cached reads are reported for visibility but not yet discounted in
:cost-usd.
llm/reset-usage
Reset session usage counters.
(llm/reset-usage)Pricing Sources
Sema tracks LLM costs using pricing data from these sources, checked in this order:
- Custom pricing — set via
(llm/set-pricing "model" input output), always wins - Bundled price list — a models.dev snapshot (2,400+ models) that ships with Sema, so cost tracking works fully offline with no network calls
- Unknown — if no source matches, cost tracking returns
niland budget enforcement is best-effort
The embedded snapshot is refreshed by maintainers with make update-pricing and shipped in patch releases. Prices are matched by model id, preferring the canonical first-party listing; when the serving provider is known (e.g. inside an llm/with-fallback chain), a reseller/gateway that lists the same model at a different rate is priced correctly.
llm/pricing-status
Check the pricing source and the snapshot date.
(llm/pricing-status)
; => {:source "embedded" :updated-at "2026-06-18"}Budget Enforcement
Note: If pricing is unknown for a model (not in any source), budget enforcement operates in best-effort mode — the call proceeds with a one-time warning. Use
(llm/set-pricing)to set pricing for unlisted models.
llm/set-budget
Set a spending limit (in dollars) for the session. LLM calls that would exceed the budget will fail.
(llm/set-budget 1.00) ; set $1.00 spending limitllm/budget-remaining
Check current budget status.
(llm/budget-remaining) ; => {:limit 1.0 :spent 0.05 :remaining 0.95}llm/with-budget
Scoped budget — sets spending limits for the duration of a thunk, then restores the previous budget when done. At least one of :max-cost-usd or :max-tokens is required. When both are provided, whichever limit is hit first triggers the error.
;; Cost-based budget
(llm/with-budget {:max-cost-usd 0.50} (lambda ()
(llm/complete "Expensive operation")))
;; Token-based budget (useful when pricing is unknown or stale)
(llm/with-budget {:max-tokens 10000} (lambda ()
(llm/complete "Limited tokens")))
;; Both limits — whichever is reached first stops execution
(llm/with-budget {:max-cost-usd 1.00 :max-tokens 50000} (lambda ()
(llm/complete "Double-capped")
(println (format "Budget: ~a" (llm/budget-remaining)))))When a token budget is active, llm/budget-remaining includes :token-limit, :tokens-spent, and :tokens-remaining in addition to the cost fields.
Streaming and the budget
By default, budgets enforce on non-streaming calls (the spend is known after each call completes). A stream's cost isn't known until it ends, so streams aren't budget-gated unless you opt in with :on-stream :pre-gate — which refuses to open a stream once the scope's spend is already at the cap:
(llm/with-budget {:max-cost-usd 0.50 :on-stream :pre-gate} (lambda ()
(llm/stream "..." on-token))) ; blocked at open once $0.50 is spentA single in-flight stream can still push past the cap (you only learn its cost when it finishes), but the next call is blocked. Usage is tracked either way.
llm/clear-budget
Remove the spending limit.
(llm/clear-budget)llm/set-pricing
Set custom pricing for a model (overrides both dynamic and built-in pricing). Costs are per million tokens.
(llm/set-pricing "my-model" 1.0 3.0) ; $1.00/M input, $3.00/M outputBatch & Parallel
llm/batch
Send multiple prompts concurrently and collect all results.
(llm/batch (list "Translate 'hello' to French"
"Translate 'hello' to Spanish"
"Translate 'hello' to German"))llm/pmap
Map a function over items, sending all resulting prompts in parallel.
(llm/pmap
(fn (word) (format "Define: ~a" word))
'("serendipity" "ephemeral" "ubiquitous")
{:max-tokens 50})