
# Response Caching

Sema provides an in-memory response cache for LLM calls, keyed on the prompt, model, and temperature. The cache is per-session and is aimed at iterative development: re-running a script with the same prompts avoids duplicate API calls.
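
Because the key is a hash of those inputs, changing any one of them (prompt, model, or temperature) produces a different key and therefore a cache miss. A minimal sketch of this, using llm/cache-key (documented below) and assuming equal? behaves as in standard Scheme:

```scheme
;; Identical prompt and options produce the same key...
(equal? (llm/cache-key "hello" {:model "gpt-4" :temperature 0.0})
        (llm/cache-key "hello" {:model "gpt-4" :temperature 0.0}))  ; => #t

;; ...while changing only the temperature produces a different one.
(equal? (llm/cache-key "hello" {:model "gpt-4" :temperature 0.0})
        (llm/cache-key "hello" {:model "gpt-4" :temperature 0.7}))  ; => #f
```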

## Cache Scope

### llm/with-cache

Takes a thunk (a zero-argument lambda) and calls it with the response cache enabled for all LLM calls made inside it. An optional second argument is an options map with :ttl (time-to-live in seconds, default 3600). Returns the thunk's result.

```scheme
(llm/with-cache (lambda () (llm/complete "hello")))

(llm/with-cache (lambda () (llm/complete "hello")) {:ttl 7200})
```
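
A sketch of the intended workflow: in a fresh session, identical calls inside the wrapped thunk are served from the cache after the first one. This assumes Sema's lambda accepts a multi-expression body, as in standard Scheme, and uses llm/cache-stats (documented below):

```scheme
(llm/with-cache
  (lambda ()
    (llm/complete "Summarize this doc")    ; miss: calls the API
    (llm/complete "Summarize this doc")))  ; hit: served from the cache

(llm/cache-stats)  ; => {:hits 1 :misses 1 :size 1}
```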

## Inspection & Debugging

### llm/cache-key

Generates the SHA-256 cache key for a given prompt and options; useful for debugging cache behavior. Takes one or two arguments: a prompt string and an optional options map.

```scheme
(llm/cache-key "hello" {:model "gpt-4" :temperature 0.5})
```
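
One debugging use is checking why two apparently identical calls don't share a cache entry. Since the key hashes the raw prompt string, any byte-level difference, such as trailing whitespace, yields a different key. A sketch, again assuming a standard Scheme equal?:

```scheme
;; The trailing space in the second prompt changes the hash.
(equal? (llm/cache-key "hello" {:model "gpt-4"})
        (llm/cache-key "hello " {:model "gpt-4"}))  ; => #f
```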

### llm/cache-stats

Returns a map with :hits, :misses, and :size (number of cached entries).

```scheme
(llm/cache-stats)  ; => {:hits 0 :misses 0 :size 0}
```
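
A sketch of how the counters relate to :size, in a fresh session and under the same standard-Scheme assumptions as above: a repeated prompt counts as a hit but does not add a new entry.

```scheme
(llm/with-cache
  (lambda ()
    (llm/complete "alpha")     ; miss: new entry
    (llm/complete "beta")      ; miss: new entry
    (llm/complete "alpha")))   ; hit: no new entry

(llm/cache-stats)  ; => {:hits 1 :misses 2 :size 2}
```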

## Cache Management

### llm/cache-clear

Clears all cached responses. Returns the number of entries cleared.

```scheme
(llm/cache-clear)  ; => 0
```
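
Clearing is useful mid-session when the next run should hit the API again, for example after changing something the cache key does not cover. A sketch in a fresh session:

```scheme
(llm/with-cache (lambda () (llm/complete "hello")))  ; first call: cached
(llm/cache-clear)                                    ; => 1 (the entry above)
(llm/with-cache (lambda () (llm/complete "hello")))  ; re-queries the API
```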