Skip to content

Architecture Overview

Sema is a Lisp with first-class LLM primitives, implemented in Rust. All code runs on a single evaluator: a bytecode VM. The runtime is single-threaded (Rc, not Arc), with deterministic destruction via reference counting instead of a garbage collector.

The entire implementation is ~116k lines of Rust across 15 crates, each with a clear responsibility and strict dependency ordering.

Crate Map

                ┌──────────────────────────────────────┐
                │              sema                    │
                │  (binary: CLI, REPL, embedding API)  │
                └──┬─────────────────┬─────────────────┘
                   │                 │
     ┌─────────────▼──┐         ┌────▼─────┐
     │ sema-notebook  │         │ sema-eval│
     │ notebook UI +  ├────────►│ macros + │
     │ server         │         │ modules  │
     └────────────────┘         └─┬───┬──┬─┘
                                  │   │  │
                 ┌────────────────▼┐  │ ┌▼──────────────┐
                 │  sema-stdlib    │  │ │   sema-llm    │
                 │  native fns     │  │ │ LLM providers │
                 └────────┬────────┘  │ │ + embeddings  │
                          │           │ └───────┬───────┘
                          │      ┌────▼─────┐   │
                          │      │ sema-vm  │   │
                          │      │ bytecode │   │
                          │      │ VM       │   │
                          │      │(evaluator)│  │
                          │      └────┬─────┘   │
                          │           │         │
                     ┌────▼───────────▼──┐      │
                     │   sema-reader     │      │
                     │   lexer/parser    │      │
                     └────────┬──────────┘      │
                              │                 │
                     ┌────────▼───────┐         │
                     │   sema-core    │◄────────┘
                     │  Value, Env,   │
                     │  SemaError     │
                     └────────────────┘

Dependency flow: sema-core ← sema-reader ← sema-vm ← sema-eval ← sema — with sema-eval also pulling in sema-stdlib (to register builtins) and sema-llm, both of which depend only on sema-core (plus sema-reader for stdlib).

The critical constraint: sema-stdlib and sema-llm depend on sema-core, not on sema-eval. This avoids circular dependencies but creates a problem — both crates sometimes need to evaluate user code. They solve it via dependency inversion:

  • sema-stdlib invokes the real evaluator via callbacks (call_callback/eval_callback) registered by sema-eval at startup — stored on the EvalContext and a shared thread-local stdlib context
  • sema-llm mostly uses the same core callbacks, but still carries a redundant second eval callback of its own (tech debt — see Solution 2)

This is discussed in detail in The Circular Dependency Problem.

Crate Responsibilities

CrateRoleKey types
sema-coreShared typesValue (NaN-boxed 8-byte), Env, SemaError, string interner, NativeFn, Lambda, Macro, Record, LLM types
sema-readerParsingLexer (24 token types) + recursive descent ParserValue AST + SpanMap
sema-vmBytecode VM (the evaluator)CoreExpr, ResolvedExpr, Op, Chunk, Emitter — lowering, resolution, compilation, VM dispatch
sema-evalMacro expansion + module loadingMacro expander (VM-native), module system (import/load), prelude, eval/call callback wiring; drives the VM. Not a standalone evaluator — the VM is the sole evaluator
sema-stdlibStandard libraryNative functions across a comprehensive standard library
sema-llmLLM integrationLlmProvider trait, native providers (Anthropic, OpenAI, Gemini, Ollama), OpenAI-compatible shim, embedding providers, cost tracking
sema-otelObservabilityOpenTelemetry facade (spans/metrics for LLM, agent, tool, and notebook runs); depends only on sema-core; native-only (no-op on wasm32)
sema-lspLanguage ServerLSP via tower-lsp: completions, hover, go-to-definition, references, rename, semantic tokens, diagnostics
sema-dapDebug AdapterDAP server: breakpoints, stepping, stack traces, variable inspection via VM debug hooks
sema-fmtFormatterCode formatter for .sema files (sema fmt)
sema-notebookNotebook interface.sema-nb JSON format, evaluation engine, HTTP server with REST API, embedded browser UI, Markdown export
sema-wasmWASM bindingsBrowser playground bindings, JS interop via wasm-bindgen
sema-mcpMCP serverModel Context Protocol server exposing Sema eval/build/notebook tools (sema mcp)
sema-docsDoc generation (internal)Builtin-docs index generator (make docs); not shipped as a binary
semaBinaryclap CLI, reedline REPL (highlighter / hinter / inspector live in crates/sema/src/repl/), InterpreterBuilder embedding API

The Value Type

All Sema data is represented by a single NaN-boxed Value — an 8-byte struct Value(u64) that encodes every type in IEEE 754 quiet NaN payload space:

rust
// crates/sema-core/src/value.rs
#[repr(transparent)]
pub struct Value(u64);

// Encoding: floats stored as raw f64 bits.
// All other types packed into quiet NaN payloads:
//   sign=1 | exponent=0x7FF | quiet=1 | TAG(6 bits) | PAYLOAD(45 bits)
//
// Immediate types (no heap allocation):
//   Nil, Bool, Char, Symbol(Spur), Keyword(Spur), IntSmall(±2^44)
//
// Heap types (Rc pointer in 45-bit payload):
//   IntBig, String, List, Vector, Map, HashMap, Lambda, Macro, NativeFn,
//   Prompt, Message, Conversation, ToolDef, Agent, Thunk, Record, Bytevector,
//   MultiMethod, Stream, F64Array, I64Array, AsyncPromise, Channel
//
// Pattern matching via val.view() → ValueView enum
The IBM 704 connection (1955)

The idea of packing type information and data into a single machine word goes back to the IBM 704 — the machine Lisp was born on. The 704's 36-bit word was divided into sub-fields: a 3-bit prefix (opcode), a 15-bit decrement, a 3-bit tag (register selector), and a 15-bit address. The same word could be an instruction, a fixed-point number, a floating-point number, or six BCD characters — depending entirely on context. Sema's NaN-boxing is the same fundamental idea scaled to 64 bits: 6 tag bits + 45 payload bits, where the tag determines how to interpret the payload. The 704 also pioneered the biased-exponent floating-point format (sign + 8-bit characteristic biased by +128 + 27-bit fraction) that would eventually become IEEE 754 thirty years later — the very standard whose NaN space we now exploit for type tagging.

Several design choices here are worth examining.

Why Rc, Not Arc

Sema is single-threaded. Arc adds an atomic increment/decrement on every clone/drop — unnecessary overhead when there's no cross-thread sharing. Rc uses ordinary (non-atomic) reference counting, which is cheaper and also means the compiler can catch accidental Send/Sync usage at compile time.

The trade-off versus a tracing garbage collector: reference counting gives deterministic destruction (values are freed the instant their last reference drops), but cannot collect cycles. In practice this is rarely a problem — Lisp closures tend to create tree-shaped reference graphs, not cycles. A lambda captures its enclosing environment, which may capture its own enclosing environment, forming a chain. Cycles are theoretically possible (e.g., named lambdas bind themselves in their own environment, and Thunk uses RefCell which could close over itself), but they don't arise in typical Sema programs. If they did, the leaked memory would be bounded by the closure's captured environment — not a growing leak.

Sema uses NaN-boxing — encoding values in the unused bits of IEEE 754 NaN representations to fit a tagged value in 8 bytes, the same technique used by Janet. This makes Value the same size as a f64 or a pointer, meaning the value stack and constant pool have excellent cache locality. Heap types like List, Map, and Lambda add one level of Rc pointer indirection, with the pointer stored in the 45-bit payload field (using the 8-byte alignment guarantee to shift the pointer right by 3 bits). Small integers (±17.5 trillion), symbols, keywords, characters, booleans, and nil are all stored entirely within the 8-byte NaN-box with zero heap allocation.

Why Vector-Backed Lists

Value::List(Rc<Vec<Value>>) stores list elements in a contiguous Vec, not a linked list of cons cells. This is a deliberate departure from traditional Lisp.

Why car and cdr have those names

McCarthy's original Lisp (1958) ran on the IBM 704, which packed cons cells into a single 36-bit machine word. The address field (bits 21-35) held a pointer to the first element; the decrement field (bits 3-17) held a pointer to the rest of the list. The 704 had hardware instructions to extract these fields directly — car is literally "Contents of the Address Register" and cdr is "Contents of the Decrement Register." They were single machine instructions, not function calls. Sema keeps the names for Scheme compatibility but the implementation is completely different — car is a Vec index (v[0]) and cdr is a slice copy (v[1..]).

The performance trade-offs:

OperationVec-backedCons cells
Random access (nth)O(1)O(n)
lengthO(1)O(n)
Cache localityContiguousPointer-chasing
cons (prepend)O(n) copyO(1)
appendO(n) copyO(n)
Pattern matching (car/cdr)Slice indexingNatural

The performance win comes from cache locality — modern CPUs prefetch sequential memory, so iterating a Vec is dramatically faster than chasing pointers through a cons list. Random access and length are constant-time bonuses.

The cost is O(n) cons and append. Sema mitigates this with copy-on-write optimization (see Performance Internals): when the Rc refcount is 1, mutations happen in place instead of copying. In practice, most list construction uses list, map, filter, or fold — which build a new Vec directly — rather than repeated cons.

Clojure takes a third approach: persistent vectors backed by wide (32-way branching) array-mapped tries, giving effectively O(1) indexed access (O(log₃₂ n), which is ≤ 7 for any practical size) with structural sharing. Sema's approach is simpler and faster for small to medium lists, at the cost of no structural sharing.

Why BTreeMap for Maps, hashbrown Opt-In

Value::Map uses BTreeMap (sorted, deterministic iteration order) rather than HashMap. This matters for:

  • Deterministic equality: Two maps with the same entries compare identically via PartialEq, and iteration order is independent of insertion order — important for consistent hashing and display
  • Printing: {:a 1 :b 2} always prints in the same order, making test assertions reliable
  • Usable as keys: Maps can be keys in other BTreeMaps because Value implements Ord. Since Map variants compare by sorted content, two maps with the same entries are always equal under Ord, regardless of construction order

For performance-critical code, Value::HashMap wraps hashbrown::HashMap (the SwissTable implementation used inside Rust's standard library). It's opt-in via (hashmap/new) — see the Performance Internals for benchmarks.

Why Spur for Symbols and Keywords

Symbol(Spur) and Keyword(Spur) store interned u32 handles rather than strings. A thread-local lasso::Rodeo interner maps strings to Spur values and back:

rust
thread_local! {
    static INTERNER: RefCell<Rodeo> = RefCell::new(Rodeo::default());
}

pub fn intern(s: &str) -> Spur {
    INTERNER.with(|r| r.borrow_mut().get_or_intern(s))
}

pub fn with_resolved<F, R>(spur: Spur, f: F) -> R
where
    F: FnOnce(&str) -> R,
{
    INTERNER.with(|r| {
        let interner = r.borrow();
        f(interner.resolve(&spur))
    })
}

This makes symbol equality O(1) (integer comparison instead of string comparison) and environment lookup faster (integer keys in the env's hash map). It also means special form dispatch — the hottest path in the evaluator — compares u32 values against pre-cached constants rather than resolving strings.

String interning is as old as Lisp itself. McCarthy's original LISP 1.5 (1962) interned atoms in the "object list" (oblist). The key difference: Sema uses a separate interner rather than pointer identity, so interning is explicit via intern() rather than implicit.

LLM Types as First-Class Values

Prompt, Message, Conversation, ToolDef, and Agent sit in the Value type at the same level as List and Map. They're not encoded as maps-with-conventions — they're distinct types with their own constructors, pattern matching, and display representations:

sema
;; These are values, not strings or maps
(define msg (message :user "Hello"))    ; => <message user "Hello">
(define p (prompt msg))                 ; => <prompt 1 messages>
(define conv (conversation p :model "claude-sonnet-4-6")) ; => <conversation 1 messages>

This means the type system catches errors like passing a string where a message is expected, and tools like complete can dispatch on the actual type rather than checking for the presence of magic keys in a map.

Environment Model

The environment is a linked list of scopes, each holding a SpurMap<Spur, Value> (a hashbrown::HashMap):

rust
pub struct Env {
    pub bindings: Rc<RefCell<SpurMap<Spur, Value>>>,
    pub parent: Option<Rc<Env>>,
    pub version: Cell<u64>,
}

The version counter is bumped on every mutation; the bytecode VM's per-instruction inline caches use it to detect stale global lookups. Variable lookup walks the parent chain until it finds a binding or reaches the root. This is the standard lexical scoping model — a closure captures a reference to its defining environment, and lookups resolve outward through enclosing scopes.

Operations

OperationBehaviorUsed by
get(spur)Walk parent chain, return first matchVariable lookup
set(spur, val)Insert in current scopedefine, parameter binding
set_existing(spur, val)Walk chain, update where foundset! (mutation)
update(spur, val)Overwrite in current scopeHot-path env reuse
take(spur)Remove from current scope, return valueCOW optimization
take_anywhere(spur)Remove from any scope in chainCOW optimization

take and take_anywhere exist for the copy-on-write optimization: by removing a value from the environment before passing it to a function, the Rc refcount drops to 1, enabling in-place mutation. See Performance Internals.

update exists for the lambda environment reuse optimization: when reusing an environment across iterations of a hot loop, update overwrites an existing binding in place instead of going through the full insert path. See Performance Internals.

Error Handling

SemaError is a thiserror-derived enum with 12 variants including WithTrace and WithContext wrappers:

rust
#[derive(Debug, Clone, thiserror::Error)]
pub enum SemaError {
    Reader { message: String, span: Span },
    Eval(String),
    Type { expected: String, got: String, got_value: Option<String> },
    Arity { name: String, expected: String, got: usize },
    Unbound(String),
    Llm(String),
    Io(String),
    PermissionDenied { function: String, capability: String },
    PathDenied { function: String, path: String },
    UserException(Value),

    WithTrace { inner: Box<SemaError>, trace: StackTrace },
    WithContext { inner: Box<SemaError>, ... },
}

Constructor Helpers

Errors are created via constructor methods, never raw enum variants:

rust
SemaError::eval("division by zero")
SemaError::type_error("int", val.type_name())
SemaError::arity("map", "2", args.len())

This keeps error construction concise across all native functions and special forms.

Lazy Stack Traces

Stack traces are not captured at error creation time. Instead, the WithTrace wrapper is attached during error propagation — as an error unwinds out through a function call, it is wrapped with the current call stack:

rust
pub fn with_stack_trace(self, trace: StackTrace) -> Self {
    if trace.0.is_empty() {
        return self;
    }
    match self {
        SemaError::WithTrace { .. } => self,  // already wrapped, don't double-wrap
        SemaError::WithContext { inner, hint, note } => SemaError::WithContext {
            inner: Box::new(inner.with_stack_trace(trace)),  // wrap inside the context
            hint,
            note,
        },
        other => SemaError::WithTrace {
            inner: Box::new(other),
            trace,
        },
    }
}

This avoids the cost of capturing a stack trace for errors that are caught by try/catch — only errors that propagate to the top level pay the trace cost. The idempotence check (WithTrace { .. } => self) prevents double-wrapping when an error passes through multiple call frames.

Interpreter State

Sema's evaluator state is held in an explicit EvalContext struct, defined in sema-core/src/context.rs and threaded through the evaluator as ctx: &EvalContext. Each Interpreter instance owns its own EvalContext, enabling multiple independent interpreters per thread with fully isolated state.

EvalContext Fields

FieldTypePurpose
module_cacheRefCell<BTreeMap<PathBuf, ...>>Loaded modules (path → exports)
current_fileRefCell<Vec<PathBuf>>Stack of file paths being executed
module_exportsRefCell<Vec<Option<Vec<String>>>>Exports declared by currently-loading module
module_load_stackRefCell<Vec<PathBuf>>Cycle detection during module loading
call_stackRefCell<Vec<CallFrame>>Call frames for error traces
span_tableRefCell<HashMap<usize, Span>>Rc pointer address → source span
eval_depthCell<usize>Recursion depth counter
max_eval_depthCell<usize>High-water mark of eval depth
eval_step_limitCell<usize>Step limit for fuzz targets
eval_stepsCell<usize>Current step counter
eval_deadlineCell<Option<Instant>>Wall-clock budget (used by the notebook)
sandboxSandboxCapability sandbox
user_context / hidden_contextRefCell<Vec<BTreeMap<Value, Value>>>Dynamic context frames
context_stacksRefCell<BTreeMap<Value, Vec<Value>>>Named context stacks
eval_fn / call_fnCell<Option<fn(...)>>Registered evaluator callbacks
interactiveCell<bool>REPL/interactive mode flag

Remaining Thread-Locals

Some state remains in thread-local storage — either because it's a pure performance cache or because it belongs to a subsystem that hasn't been refactored yet:

LocationThread-localPurpose
sema-core/value.rsINTERNERString interner (lasso::Rodeo)
sema-core/context.rsSTDLIB_CTXShared EvalContext for stdlib callbacks
sema-eval/special_forms.rsSFCached SpecialFormSpurs (performance cache)
sema-llm/builtins.rsPROVIDER_REGISTRYRegistered LLM providers
sema-llm/builtins.rsSESSION_USAGECumulative token usage
sema-llm/builtins.rsLAST_USAGEMost recent completion's usage
sema-llm/builtins.rsEVAL_FNFull evaluator callback
sema-llm/builtins.rsSESSION_COSTCumulative dollar cost
sema-llm/builtins.rsBUDGET_LIMITSpending cap
sema-llm/builtins.rsBUDGET_SPENTSpending against cap
sema-llm/pricing.rsCUSTOM_PRICINGUser-defined model pricing

Implications for Embedding

Multiple Interpreter instances can coexist on the same thread with fully isolated evaluator state — each has its own module cache, call stack, span table, and depth counters. The string interner (INTERNER) remains shared per-thread, which is correct since Spur handles must be consistent within a thread. LLM state (provider registry, usage tracking, budgets) is also per-thread, meaning all interpreters on the same thread share provider configuration and cost tracking.

Value instances are not Send or Sync (they use Rc, not Arc), so interpreters cannot be moved across threads.

WASM Support

Sema compiles to WebAssembly with conditional compilation gates. The #[cfg(not(target_arch = "wasm32"))] attribute excludes modules that depend on OS-level capabilities:

From sema-stdlib:

  • io — file system access (file/read, file/write, file/fold-lines, etc.)
  • system — process execution, environment variables, exit
  • http — HTTP client (http/get, http/post, etc.)
  • terminal — terminal control (colors, cursor, raw mode)
  • kv, pdf, serial, server, sqlite — other OS-dependent modules

From sema-eval:

  • Module import/load (depends on file system)

sema-llm is excluded entirely — LLM providers require network access.

sema-otel compiles to no-op spans on wasm32 — the OTLP exporter and its async runtime are gated out, so tracing calls become zero-cost stubs in the browser.

The pure-computation core (arithmetic, strings, lists, maps, JSON, regex, crypto, datetime, CSV, bytevectors, predicates, math, comparison, bitwise, meta) remains available in WASM, making Sema usable as an embedded scripting language in browser-based applications.

The LLM Subsystem

Provider Trait

All LLM providers implement a single trait:

rust
pub trait LlmProvider: Send + Sync {
    fn name(&self) -> &str;
    fn complete(&self, request: ChatRequest) -> Result<ChatResponse, LlmError>;
    fn default_model(&self) -> &str;

    // Optional — defaults provided
    fn stream_complete(&self, request: ChatRequest,
        on_chunk: &mut dyn FnMut(&str) -> Result<(), LlmError>,
    ) -> Result<ChatResponse, LlmError> { /* non-streaming fallback */ }

    fn batch_complete(&self, requests: Vec<ChatRequest>)
        -> Vec<Result<ChatResponse, LlmError>> { /* sequential fallback */ }

    fn embed(&self, request: EmbedRequest)
        -> Result<EmbedResponse, LlmError> { /* unsupported error */ }
}

Note the Send + Sync bound — despite the single-threaded runtime, provider implementations use tokio::runtime::Runtime::block_on internally to run async HTTP clients. The trait itself is synchronous; async is hidden behind the provider boundary.

Provider Registry

The ProviderRegistry holds registered providers by name with a default provider slot and a separate embedding provider slot:

rust
pub struct ProviderRegistry {
    providers: HashMap<String, Box<dyn LlmProvider>>,
    default: Option<String>,
    embedding_provider: Option<String>,
}

At startup, the binary crate detects available API keys and registers providers:

  • ANTHROPIC_API_KEY → Anthropic (Claude)
  • OPENAI_API_KEY → OpenAI (GPT)
  • GOOGLE_API_KEY → Gemini
  • GROQ_API_KEY, XAI_API_KEY, MISTRAL_API_KEY, MOONSHOT_API_KEY → OpenAI-compatible shim
  • Ollama (local) is always registered — no auth needed; OLLAMA_HOST overrides the default http://localhost:11434

Embedding providers (Jina, Voyage, Cohere) are registered separately and selected via (llm/set-embedding-provider).

Cost Tracking

Every completion records token usage in SESSION_USAGE and computes dollar cost via a built-in pricing table (pricing.rs). The llm/with-budget function sets a scoped spending cap:

sema
(llm/with-budget {:max-cost-usd 0.50 :max-tokens 10000} (lambda ()
  (llm/complete "Summarize this document...")))
;; Raises an error if cumulative cost exceeds $0.50 or 10000 tokens

Observability

Tracing and metrics live in sema-otel, a thin facade over OpenTelemetry. It sits below the subsystems it instruments — sema-llm, sema-stdlib, and sema-notebook all depend on it — but it itself depends only on sema-core, so the OpenTelemetry stack never leaks into the core types. On wasm32 the whole crate compiles to no-op stubs (see WASM Support).

Instrumentation is automatic and follows the OpenTelemetry GenAI semantic conventions (gen_ai.* attributes): every llm/complete and llm/embed emits a CLIENT span; each agent run, tool dispatch, and notebook cell emits an INTERNAL span (invoke_agentchatexecute_tool); HTTP retries nest beneath the LLM span. Token counts, model, cost, and finish reason ride along as attributes. Tracing is off by default and exports over OTLP (HTTP by default, gRPC optional) or to a JSONL file.

When Sema is embedded as a library it never installs a global tracer provider on its own — that is the host's job. The host chooses the wiring through InterpreterBuilder::with_telemetry(TelemetryMode::…): Off, UseHostGlobal (emit against the provider the app already installed), OwnProvider(p) (a provider handed to Sema), or FromEnv (self-install from the OTEL_* variables, owned by the built Interpreter). Sema's spans nest under whatever span is current, so a host request span becomes the parent of the invoke_agent tree. An optional SEMA_OTEL_COMPAT setting also writes vendor-specific attribute names for backends that don't read gen_ai.*. See Tracing & Metrics and Backend Compatibility for the user-facing guide.

The Circular Dependency Problem

One layering constraint shapes how the library crates reach the evaluator. It's a textbook case of dependency inversion, called out here mainly because the same pattern recurs in sema-stdlib and sema-llm.

The Problem

Both sema-stdlib and sema-llm sometimes need to evaluate user code:

  • sema-stdlib: file/fold-lines invokes a user-provided lambda on each line. map, filter, fold, for-each, sort all take lambda arguments.
  • sema-llm: Tool handlers defined via deftool are Sema expressions that must be evaluated when an LLM invokes the tool.

But the dependency already runs the other way: sema-eval depends on sema-stdlib and sema-llm so it can register their builtins at startup. If either of them depended on sema-eval to reach eval_value(), that would close a cycle — which Cargo forbids.

sema-eval   ──depends-on──► sema-stdlib / sema-llm   (to register their builtins)
sema-stdlib ──CANNOT depend on──► sema-eval          (would close the cycle)
            └── so it reaches the evaluator through a callback instead

The cycle is a hard Cargo rule, but its existence is a deliberate trade: keeping the standard library and LLM layers as separate crates (for wasm gating, compile times, and isolated testing) while letting sema-eval assemble a batteries-included interpreter. Merging them into one crate would remove the cycle and the callback — at the cost of that modularity. The inversion below is the cheaper trade.

Solution 1: Callback Architecture (sema-core + sema-stdlib)

sema-core defines callback storage in context.rs that bridges the dependency gap using dependency inversion — function-pointer slots on EvalContext, plus a shared thread-local context for stdlib functions that don't receive a ctx parameter:

rust
pub type EvalCallbackFn = fn(&EvalContext, &Value, &Env) -> Result<Value, SemaError>;
pub type CallCallbackFn = fn(&EvalContext, &Value, &[Value]) -> Result<Value, SemaError>;

pub struct EvalContext {
    // ...
    pub eval_fn: Cell<Option<EvalCallbackFn>>,
    pub call_fn: Cell<Option<CallCallbackFn>>,
}

thread_local! {
    static STDLIB_CTX: EvalContext = EvalContext::new();
}

At startup, sema-eval registers the real evaluator and call dispatch functions (into both the interpreter's context and the shared STDLIB_CTX):

rust
sema_core::set_eval_callback(&ctx, eval_value);
sema_core::set_call_callback(&ctx, call_value);

All stdlib higher-order functions (map, filter, fold, sort-by, for-each, file/fold-lines, etc.) invoke user-provided lambdas through sema_core::call_callback, which dispatches to the real evaluator:

rust
// In sema-stdlib, e.g. map implementation
let result = sema_core::call_callback(ctx, &func, &[elem])?;

The with_stdlib_ctx function provides a shared EvalContext for stdlib callbacks, avoiding per-call allocation of a new context.

This is a clean dependency inversion — sema-stdlib depends only on the callback signature defined in sema-core, not on sema-eval. The runtime cost is one Cell::get() + function pointer dispatch per call, which is negligible. Unlike the previous mini-evaluator approach, this architecture uses the same evaluator everywhere — all special forms, builtins, and features are available inside higher-order functions like map and file/fold-lines.

Solution 2: Eval Callback (sema-llm) — redundant, slated for removal

sema-llm predates Solution 1 and still carries its own parallel callback — a Box<dyn Fn> in a thread-local (EVAL_FN), plus a hand-rolled function-application routine (call_value_fn) and a degraded mini-evaluator fallback (simple_eval). It bridges the same gap, redundantly:

rust
pub type EvalCallback = Box<dyn Fn(&EvalContext, &Value, &Env) -> Result<Value, SemaError>>;

thread_local! {
    static EVAL_FN: RefCell<Option<EvalCallback>> = RefCell::new(None);
}

pub fn set_eval_callback(f: impl Fn(&EvalContext, &Value, &Env) -> Result<Value, SemaError> + 'static) {
    EVAL_FN.with(|eval| {
        *eval.borrow_mut() = Some(Box::new(f));
    });
}

At startup, the binary crate registers the full evaluator:

rust
sema_llm::builtins::set_eval_callback(sema_eval::eval_value);

When a tool handler needs to evaluate Sema code, it calls through this indirection:

rust
fn full_eval(ctx: &EvalContext, expr: &Value, env: &Env) -> Result<Value, SemaError> {
    EVAL_FN.with(|eval_fn| {
        let eval_fn = eval_fn.borrow();
        match &*eval_fn {
            Some(f) => f(ctx, expr, env),
            None => simple_eval(expr, env),  // fallback if no callback registered
        }
    })
}

This is the same dependency-inversion idea as Solution 1, but it should not be a second mechanism. sema-llm already uses the core sema_core::call_callback in a few places; the bespoke path duplicates it at ~15 call sites and, worse, call_value_fn re-implements function application by binding params into a plain Env and evaluating directly — bypassing the VM closure machinery (run_nested_closure) that the canonical call_value routes through. That means set!, captured upvalues, and async/yield inside a tool handler or streaming callback can behave differently than the same code inside a stdlib HOF. Consolidating sema-llm onto the core callback (and deleting EVAL_FN/call_value_fn/simple_eval) is tracked tech debt — see docs/plans/2026-06-22-unify-sema-llm-eval-callback.md.

Why Not a Trait?

An alternative would be to define an Evaluator trait in sema-core and have sema-eval implement it. This would work but adds complexity for little benefit — the callback is simpler, there's only one implementation, and it avoids threading a trait object through every function that might need evaluation. The callback approach also makes it easy to test sema-llm in isolation (register a mock evaluator).

Architectural Lesson

The circular dependency constraint forced a callback architecture that turned out to be a better design than having direct access to the evaluator would have been. The dependency inversion through sema-core callbacks gives a single, canonical evaluator used everywhere — stdlib HOFs, LLM tool handlers, and the main interpreter all run the same code paths with full feature support. This also provides a clean seam for future work: when the bytecode VM became the default backend, only the callback registrations needed to change — all call sites in stdlib and llm remained untouched, validating this design. Sometimes constraints lead to better designs than unconstrained freedom would have.