Resilience & Retry
Fallback Provider Chains
llm/with-fallback
Wraps a thunk with a fallback chain of providers. If the LLM call fails with one provider, automatically tries the next provider in the list.
(llm/with-fallback [:anthropic :openai :groq]
  (lambda () (llm/complete "Hello")))
Model selection across the chain
Model ids are provider-specific (a Claude id is meaningless to OpenAI), so each chain entry resolves its own model:
- A bare provider keyword (e.g. :anthropic) uses that provider's default model, or whatever you set via (llm/configure :anthropic {:default-model "..."}). This is the recommended form: leave the body's (llm/complete ...) unpinned so every provider gets a model id valid for itself.
- If the body pins a :model, that exact string is sent to every provider in the chain. That's fine for a homogeneous chain, but pinning a provider-specific id (e.g. a Claude model) will fail on any other provider it falls back to.
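The resolution rules can be modeled as a small lookup. The Python below is only an illustrative sketch, not the library's implementation: `resolve_model`, `DEFAULT_MODELS`, and every model id in it are hypothetical placeholders. It also includes the per-provider override described in the next section, which takes precedence over a pinned :model.

```python
from typing import Optional

# Hypothetical per-provider defaults. In the real library these come from the
# provider itself or from llm/configure; the ids here are placeholders.
DEFAULT_MODELS = {
    "anthropic": "anthropic-default",
    "openai": "openai-default",
    "groq": "groq-default",
}


def resolve_model(provider: str,
                  pinned: Optional[str] = None,
                  override: Optional[str] = None) -> str:
    """Return the model id that would be sent to `provider` for one attempt."""
    if override is not None:
        # A per-provider override in the chain entry wins over everything else.
        return override
    if pinned is not None:
        # A :model pinned in the body is sent verbatim to every provider.
        return pinned
    # Otherwise each provider uses its own default model.
    return DEFAULT_MODELS[provider]


chain = ["anthropic", "openai", "groq"]
unpinned = [resolve_model(p) for p in chain]
pinned = [resolve_model(p, pinned="some-claude-id") for p in chain]
```

With nothing pinned, each provider receives an id that is valid for itself. With a pinned id, all three receive the same string, which is why a provider-specific id breaks a mixed chain.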
Per-provider model overrides
To target a different model per provider within a single chain, give chain entries as [provider model] pairs or {:provider :model} maps. A per-provider override wins over any :model pinned in the body:
;; Anthropic uses Opus, OpenAI uses GPT-5.5, Groq uses its default
(llm/with-fallback [[:anthropic "claude-opus-4-8"]
                    [:openai "gpt-5.5"]
                    :groq]
  (lambda () (llm/complete "Hello")))
;; Map form is equivalent and lets you omit :model to use the provider default
(llm/with-fallback [{:provider :anthropic :model "claude-opus-4-8"}
                    {:provider :openai}]
  (lambda () (llm/complete "Hello")))
Automatic Retry on Transient Errors
LLM calls (llm/complete, llm/chat, agent/run, and the fallback-chain path) automatically retry transient failures — no configuration needed:
- Retried: HTTP 429 (rate limited), 5xx server errors, and network/timeout errors.
- Not retried: 4xx client errors other than 429 (e.g. 400 bad request), and parse errors — these won't succeed on a retry, so they fail fast.
- Backoff: capped exponential backoff with full jitter (base 500ms, doubling per attempt, capped at 30s), up to 3 retries. A 429 honors the provider's retry-after hint when present.
This is distinct from llm/with-fallback (which switches providers on failure) and the generic retry (which wraps any thunk). They compose: each provider in a fallback chain does its own transient-error retry before the chain moves on.
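The following Python sketch shows how the retry rules and the fallback chain could compose. It is a model of the behaviour described above, not the library's code: `LLMError`, `is_transient`, `backoff_delay`, `call_with_retry`, and `with_fallback` are hypothetical names. It also makes two assumptions the text does not state: a retry-after hint replaces the computed delay, and any error left after retries moves the chain on to the next provider.

```python
import random
import time
from typing import Callable, Optional, Sequence


class LLMError(Exception):
    """Hypothetical error type. kind is 'http', 'network', or 'parse'."""

    def __init__(self, kind: str, status: Optional[int] = None,
                 retry_after: Optional[float] = None):
        super().__init__(f"{kind} error (status={status})")
        self.kind = kind
        self.status = status
        self.retry_after = retry_after


def is_transient(err: LLMError) -> bool:
    # Retried: network/timeout errors, HTTP 429, and HTTP 5xx.
    if err.kind == "network":
        return True
    if err.kind == "http" and err.status is not None:
        return err.status == 429 or 500 <= err.status < 600
    return False  # other 4xx and parse errors fail fast


def backoff_delay(attempt: int, err: LLMError,
                  base: float = 0.5, cap: float = 30.0) -> float:
    # Assumption: a retry-after hint on a 429 replaces the computed delay.
    if err.status == 429 and err.retry_after is not None:
        return err.retry_after
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retry(call: Callable[[], str], max_retries: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    attempt = 0
    while True:
        try:
            return call()
        except LLMError as err:
            if not is_transient(err) or attempt >= max_retries:
                raise
            sleep(backoff_delay(attempt, err))
            attempt += 1


def with_fallback(providers: Sequence[str], call: Callable[[str], str],
                  sleep: Callable[[float], None] = time.sleep) -> str:
    last_error: Optional[LLMError] = None
    for provider in providers:
        try:
            # Each provider exhausts its own retries before the chain moves on.
            return call_with_retry(lambda: call(provider), sleep=sleep)
        except LLMError as err:
            last_error = err
    if last_error is None:
        raise ValueError("empty provider chain")
    raise last_error
```

Under these assumptions, a provider that keeps returning 503 is called four times (one attempt plus three retries) before the next provider is tried, while a 400 moves on after a single call.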
Streaming and resilience
llm/stream applies these guarantees at stream-open — before the first token:
- Fallback: if a provider fails to open the stream, the chain fails over to the next, just like non-streaming. Once the first token has been delivered, a mid-stream failure is not failed over (switching providers mid-answer would re-emit the partial you already received); the error surfaces and the partial text is kept.
- Rate-limiting: llm/with-rate-limit gates the stream-open call the same as a non-streaming one.
- Budget: opt in with llm/with-budget {... :on-stream :pre-gate}, and the stream is refused at open if the scope's spend is already at the cap. By default streams are not budget-gated (a stream's cost is unknown until it ends), though usage is still tracked afterward.
Two things still don't apply to streams: the response cache (a live stream isn't cached — for deterministic replay use cassettes) and mid-stream retry (a retry would duplicate already-emitted output — see above).
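The stream-open boundary can be illustrated with a Python generator. This is a sketch of the described semantics, not the library's implementation: `stream_with_fallback` and `open_stream` are hypothetical names. It also assumes that a failure between opening the stream and the first token counts as an open failure.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence

_END = object()  # sentinel marking a stream that produced no tokens


def stream_with_fallback(providers: Sequence[str],
                         open_stream: Callable[[str], Iterable[str]]
                         ) -> Iterator[str]:
    """Yield tokens from the first provider whose stream opens successfully."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            tokens = iter(open_stream(provider))
            # Assumption: anything that fails before the first token arrives
            # is treated as a stream-open failure and fails over.
            first = next(tokens, _END)
        except Exception as err:
            last_error = err
            continue
        if first is _END:
            return
        yield first
        # Past the first token there is no failover: an error raised here
        # propagates to the caller, who keeps the tokens already received.
        yield from tokens
        return
    if last_error is not None:
        raise last_error
    raise ValueError("empty provider chain")
```

A caller that accumulates tokens in a list keeps the partial text when a mid-stream error propagates, which matches the "partial text is kept" behaviour.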
Rate Limiting
llm/with-rate-limit
Wraps a thunk with token-bucket rate limiting. Takes a rate (requests per second) and a thunk. Useful to avoid hitting API rate limits.
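For readers unfamiliar with token buckets, here is a minimal Python sketch of the technique. It is not the library's implementation: `TokenBucket` and `with_rate_limit` are hypothetical names, and the burst `capacity` parameter is an assumption, since the text above specifies only a rate.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Refills at `rate` tokens per second; each call spends one token."""

    def __init__(self, rate: float, capacity: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.rate = rate
        self.capacity = capacity  # assumed burst size, not from the docs
        self.tokens = capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def acquire(self) -> None:
        self._refill()
        if self.tokens < 1.0:
            # Wait just long enough for the missing fraction of a token.
            self.sleep((1.0 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1.0


def with_rate_limit(bucket: TokenBucket, thunk: Callable[[], T]) -> T:
    bucket.acquire()  # block until a token is available
    return thunk()
```

With a rate of 5 and a capacity of 1, back-to-back calls in this sketch are spaced roughly 0.2 seconds apart.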
(llm/with-rate-limit 5 (lambda () (llm/complete "Hello")))
Generic Retry
retry
Retries a thunk on failure with exponential backoff. Takes a thunk and an optional options map.
;; Default: 3 attempts, 100ms base delay, 2.0 backoff
(retry (lambda () (http/get "https://example.com")))
;; Custom options
(retry (lambda () (http/get "https://example.com"))
       {:max-attempts 5 :base-delay-ms 200 :backoff 1.5})
Options:
| Key | Type | Default | Description |
|---|---|---|---|
| :max-attempts | integer | 3 | Maximum number of attempts |
| :base-delay-ms | integer | 100 | Initial delay between retries (ms) |
| :backoff | float | 2.0 | Backoff multiplier |
Note: retry is in the stdlib (not LLM-specific), so it works with any function.
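To see how the three options interact, the Python sketch below computes the delays between attempts. It assumes the delay before the n-th retry is base-delay-ms multiplied by backoff to the power n-1, with no jitter; the documentation above does not spell out the exact formula, and `retry_delays_ms` is a hypothetical helper.

```python
from typing import List


def retry_delays_ms(max_attempts: int = 3, base_delay_ms: float = 100,
                    backoff: float = 2.0) -> List[float]:
    """Delays (ms) slept between consecutive attempts under the assumed formula."""
    # There is one delay between each pair of attempts: max_attempts - 1 in total.
    return [base_delay_ms * backoff ** i for i in range(max_attempts - 1)]


default_delays = retry_delays_ms()            # [100.0, 200.0]
custom_delays = retry_delays_ms(5, 200, 1.5)  # [200.0, 300.0, 450.0, 675.0]
```

Under this assumption the defaults give two waits totalling 300 ms across three attempts, and the custom options above give four waits totalling 1625 ms across five attempts.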
LLM Convenience Functions
llm/summarize
Summarize text using an LLM. Takes text and an optional options map.
(llm/summarize "Long article text here...")
(llm/summarize "Long text" {:model "claude-haiku-4-5-20251001" :max-tokens 200})
llm/compare
Compare two texts using an LLM. Takes two strings and an optional options map.
(llm/compare "Text A" "Text B")
(llm/compare "Text A" "Text B" {:model "claude-haiku-4-5-20251001"})