
What is Sema?

Sema is a Scheme-like Lisp with a Clojure-flavored surface and first-class LLM/agent primitives, compiled to a NaN-boxed bytecode VM. Single-threaded, reference-counted, embeddable. Implemented in Rust.

$ curl -fsSL https://sema-lang.com/install.sh | sh

v1.27.1 · MIT · Rust 2021 · 16 crates · ~125k lines

The language

A Lisp you can hold in your head.

One syntax rule: everything is an s-expression. The surface borrows from Clojure — keywords, maps, vectors, short lambdas, f-strings — while the semantics stay Scheme at the core: tail-call optimization, quasiquote macros, define/set!, lexical scope.

surface.sema · syntax you write
(define greet
  (fn (name)
    f"Hello, ${name}!"))

(define person
  {:name "Ada"
   :age 36})

(:name person)              ; keyword as getter
(map #(* % %) (range 1 6))  ; short lambda
(match (:status res)
  :ok    (:data res)
  :error (throw (:message res)))
Clojure surface

:keywords, {:k v} maps, [1 2 3] vectors, #(* % %) lambdas, f"..." strings, #"regex" literals.

Scheme core

define, set!, lambda/fn, let/let*/letrec, if/cond/case, begin, and/or, tail-call optimization.

Modern conveniences

Threading macros (->, ->>), pattern matching (match), destructuring in let/define/fn params, when-let/if-let.

Error handling

try/catch/throw — caught errors are structured maps with :type, :message, :value, and :stack-trace.

Async

async/await + channels — a deterministic cooperative scheduler, not OS threads.

Data types

Classic types, plus first-class LLM values.

Prompts, messages, conversations, tools, and agents are values — the same as integers and strings. They can be bound, passed, inspected, and stored. That's the defining difference.

Scalars
Integer · Float · String · Boolean · Nil · Symbol · Keyword · Character
Collections
List (vector-backed) · Vector · Map (sorted BTreeMap) · HashMap (unordered) · Bytevector
LLM primitives (first-class)
Prompt · Message · Conversation · Tool · Agent
Special
Promise (lazy) · Record · Async Promise · Channel (bounded FIFO)

The toolchain

What's in the box.

One binary, sema, gives you the REPL, script runner, bytecode compiler, standalone executable builder, formatter, LSP, DAP debugger, notebook server, and MCP server.

REPL
Interactive prompt with history, auto-completion, and syntax highlighting. sema with no args.
Script runner
Run .sema files, inline expressions with --eval, and shebang scripts.
Bytecode compiler
Lowering → optimization → resolution → compilation. Inspect with sema compile / sema disasm.
Standalone executables
sema build traces imports, bundles assets, emits a self-contained binary. No toolchain needed at runtime.
Formatter
sema fmt — opinionated code formatter for .sema files.
LSP server
Completions, hover, go-to-definition, references, rename, semantic tokens, folding, inlay hints. sema lsp.
DAP debugger
Breakpoints, step in/over/out, stack traces, variable inspection via VM debug hooks. sema dap.
Notebook server
Jupyter-inspired .sema-nb format, shared-env cells, REST API, browser UI. sema notebook serve.
MCP server
Model Context Protocol server exposing Sema eval/build/notebook tools to AI agents. sema mcp.
Editor plugins
VS Code, Vim/Neovim, Emacs, Helix, Zed, IntelliJ — syntax highlighting, formatting, LSP integration.
WASM playground
Runs in the browser at sema.run and embeddable in web apps via wasm-bindgen.
Embedding
Rust crate sema-lang with a builder API. Interpreter::new().eval_str("(+ 1 2)").

The LLM layer

Not an SDK. A language.

LLM operations are forms and values, not library calls wrapped in boilerplate. The runtime handles retries, caching, cost tracking, provider fallback, and rate limiting — so your code stays the size of its idea.

  • Eight chat providers. Anthropic, OpenAI, Gemini, Groq, xAI, Mistral, Moonshot, Ollama — auto-configured from environment variables. Plus Jina, Voyage, and Cohere for embeddings.
  • Tools & agents. deftool defines a function with a schema. defagent defines a system prompt + tools + turn limit. agent/run handles the loop.
  • Conversations as data. Immutable, forkable, inspectable. conversation/say returns a new value — the old one is untouched.
  • Cassettes. Record LLM calls to a file, replay them forever. Deterministic tests without API keys.
  • Cost & budgets. llm/with-budget caps spend for a scope. Token usage tracked per call and per session.
  • Observability. Built-in OpenTelemetry tracing with GenAI conventions. Off by default.
llm.sema · primitives, not boilerplate
(deftool get-weather
  "Get weather for a city"
  {:city {:type :string}}
  (lambda (city)
    (format "~a: 22°C" city)))

(defagent bot
  {:system "Weather assistant."
   :tools [get-weather]
   :max-turns 3})

(llm/with-budget
  {:max-cost-usd 0.10}
  (lambda ()
    (agent/run bot
      "Weather in Oslo?")))
;; => "It's 22°C in Oslo."
;;    $0.003 · 1 tool call · 2 turns

How it differs from other Lisps

The things that surprise people.

Common Lisp / Scheme → Sema

Only #f and nil are falsy. 0, "", and () are all truthy. In CL, the empty list is false.
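
The truthiness rule is small enough to state as code. The Rust sketch below uses a hypothetical `Value` enum and `is_truthy` function (neither is taken from the Sema source) to show that everything except `#f` and `nil` counts as true:

```rust
/// A tiny stand-in for a Lisp value. Hypothetical: not Sema's real `Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Nil,
    Bool(bool),
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

/// Only `#f` and `nil` are falsy; every other value is truthy,
/// including 0, the empty string, and the empty list.
pub fn is_truthy(v: &Value) -> bool {
    !matches!(v, Value::Nil | Value::Bool(false))
}
```

Under this rule `is_truthy(&Value::List(vec![]))` is true, which is the case that differs from Common Lisp.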

Cons cells → Vector-backed lists

Lists are Rc<Vec<Value>> — O(1) nth, O(n) cons. Prefer map/filter/fold and vector for hot paths.
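
To see where those costs come from, here is a minimal Rust sketch of a vector-backed list. The `List` type and its methods are hypothetical, and it stores `i64` instead of a full `Value` to stay self-contained; only the `Rc<Vec<...>>` shape comes from the text above.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a vector-backed list of integers.
#[derive(Debug, Clone, PartialEq)]
pub struct List(Rc<Vec<i64>>);

impl List {
    pub fn from_vec(items: Vec<i64>) -> Self {
        List(Rc::new(items))
    }

    /// O(1): index straight into the backing vector.
    pub fn nth(&self, i: usize) -> Option<i64> {
        self.0.get(i).copied()
    }

    /// O(n): prepending allocates a new vector and copies every element.
    /// A cons cell would allocate a single pair instead.
    pub fn cons(&self, head: i64) -> List {
        let mut items = Vec::with_capacity(self.0.len() + 1);
        items.push(head);
        items.extend_from_slice(&self.0);
        List(Rc::new(items))
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}
```

Building a list by repeated `cons` in a loop is therefore quadratic in this model, which is why bulk operations are the better fit for hot paths.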

Clojure atoms → define + set!

Mutable state is (define x 0) + (set! x 1). No atom/swap!/reset!.

One map type → Two map types

{:k v} literals are sorted BTreeMaps — deterministic order, usable as keys. (hashmap/new) is faster and unordered.
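
The same trade-off shows up in Rust's standard library, which is a reasonable model for the two map types. The function names below are illustrative only. The sketch shows sorted iteration, a whole map used as a key, and a plain unordered lookup:

```rust
use std::collections::{BTreeMap, HashMap};

/// Keys come back in sorted order regardless of insertion order.
pub fn sorted_keys() -> Vec<&'static str> {
    let mut m = BTreeMap::new();
    m.insert("name", 1);
    m.insert("age", 2);
    m.insert("email", 3);
    m.keys().copied().collect()
}

/// A BTreeMap implements `Ord`, so a whole map can be a key in another map.
/// `HashMap` implements neither `Ord` nor `Hash`, so it cannot.
pub fn map_as_key() -> Option<&'static str> {
    let mut point = BTreeMap::new();
    point.insert("x", 1);
    point.insert("y", 2);

    let mut labels: BTreeMap<BTreeMap<&str, i32>, &str> = BTreeMap::new();
    labels.insert(point.clone(), "corner");
    labels.get(&point).copied()
}

/// Unordered map: lookups work, but iteration order is unspecified.
pub fn unordered_lookup() -> Option<i32> {
    let mut m = HashMap::new();
    m.insert("name", 1);
    m.insert("age", 2);
    m.get("age").copied()
}
```

Here `sorted_keys()` always yields `age`, `email`, `name` in that order, while a `HashMap` makes no such promise.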

syntax-rules → auto-gensym

Macros use defmacro with quasiquote. Symbols ending in # inside quasiquote are auto-unique — no variable capture.
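
One way to picture auto-gensym is as a renaming pass over the symbols of a template. The Rust sketch below is an illustration under assumptions, not Sema's expander: the `auto_gensym` name, the flat symbol list, and the `__N` suffix scheme are all hypothetical.

```rust
use std::collections::HashMap;

/// Rename every symbol ending in `#` to a fresh name. The same `x#` maps to
/// the same fresh name within one expansion, and to a new one in the next.
/// The `__N` suffix scheme is hypothetical.
pub fn auto_gensym(template: &[&str], counter: &mut u64) -> Vec<String> {
    let mut fresh: HashMap<&str, String> = HashMap::new();
    template
        .iter()
        .map(|&sym| match sym.strip_suffix('#') {
            Some(base) => fresh
                .entry(sym)
                .or_insert_with(|| {
                    *counter += 1;
                    format!("{}__{}", base, *counter)
                })
                .clone(),
            // Ordinary symbols pass through unchanged.
            None => sym.to_string(),
        })
        .collect()
}
```

Because the counter outlives a single expansion, two uses of the same macro get different names for `tmp#`, so neither can capture a user variable called `tmp`.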

No LLM types → LLM as first-class

Prompt, Message, Conversation, Tool, Agent are values — alongside integers and strings. This is why Sema exists.

Under the hood

NaN-boxed values, bytecode VM.

Every value is a single 8-byte struct Value(u64) — encoded in IEEE 754 quiet-NaN payload space. The sole evaluator is a stack-based bytecode VM with intrinsic opcodes and NaN-boxed fast paths. No tree-walking interpreter.

  • Single-threaded. Rc-based values, no cross-thread sharing. Parallelism is at the LLM-call level, not the compute level.
  • No tracing GC. Deterministic destruction via reference counting. Memory is freed the moment the last reference drops.
  • 16 crates. Strict dependency ordering: sema-core ← sema-reader ← sema-vm ← sema-eval ← sema. Stdlib and LLM depend on core, not eval — dependency inversion via callbacks.
  • Bytecode format. .semac files with a 24-byte header, string table, function table, main chunk. sema build embeds the runtime + bytecode into a standalone binary.
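
As a rough illustration of the bytecode format bullet, a loader might validate the start of a .semac header like this. The sketch assumes the magic bytes 00 53 45 4D, that the format version and flags follow as little-endian u16 values at offsets 4 and 6, and a 24-byte header. The endianness, the `HeaderPrefix` struct, and the `parse_header_prefix` function are assumptions made for this example, and the remaining header fields are left undecoded.

```rust
/// Magic bytes and header size as described for `.semac` files.
const MAGIC: [u8; 4] = [0x00, 0x53, 0x45, 0x4D];
const HEADER_LEN: usize = 24;

/// The first two fields after the magic. Hypothetical struct for this sketch.
#[derive(Debug, PartialEq)]
pub struct HeaderPrefix {
    pub format_version: u16,
    pub flags: u16,
}

/// Validate length and magic, then read the two fields that follow.
/// Assumes little-endian u16s at offsets 4 and 6; the rest of the
/// header is not decoded here.
pub fn parse_header_prefix(bytes: &[u8]) -> Result<HeaderPrefix, String> {
    if bytes.len() < HEADER_LEN {
        return Err(format!("header too short: {} bytes", bytes.len()));
    }
    if bytes[..4] != MAGIC {
        return Err("bad magic".to_string());
    }
    Ok(HeaderPrefix {
        format_version: u16::from_le_bytes([bytes[4], bytes[5]]),
        flags: u16::from_le_bytes([bytes[6], bytes[7]]),
    })
}
```

Rejecting short input and a wrong magic before reading any field keeps the later section parsing from running on a file that is not bytecode at all.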


Pipeline: .sema → reader → lower → VM

.semac file layout

Header (24 bytes)
  00 53 45 4D   magic
  04 00         v4
  00 00         flags
  ········      version
  03 00         3 sections
  ········      reserved

Sections
  0x01  String Table (required): interned strings · Spur remapping
  0x02  Function Table (required): compiled function templates
  0x03  Main Chunk (bytecode): Const, LoadLocal0, Call, Pop, Const, CallGlobal, Return

NaN-boxed Value
  tag       6 bits
  payload   45 bits
  struct Value(u64): every type in 8 bytes
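
For readers new to NaN-boxing, the following Rust sketch packs a 6-bit tag and a 45-bit payload into the unused bits of a quiet NaN. It is a simplified model, not Sema's actual encoding: the bit positions, the use of the sign bit to mark boxed values, and all method names are assumptions.

```rust
/// Bits 63..51 set: sign, all 11 exponent bits, and the quiet bit.
/// Using the sign bit to mark boxed values is an assumption of this sketch.
const BOX_MASK: u64 = 0xFFF8_0000_0000_0000;
const TAG_SHIFT: u32 = 45;
const TAG_MASK: u64 = (1 << 6) - 1; // 6-bit tag
const PAYLOAD_MASK: u64 = (1 << 45) - 1; // 45-bit payload
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Value(u64);

impl Value {
    /// Floats are stored as their raw bits. NaNs are canonicalized so a
    /// real float can never be mistaken for a boxed value.
    pub fn from_f64(x: f64) -> Value {
        if x.is_nan() {
            Value(CANONICAL_NAN)
        } else {
            Value(x.to_bits())
        }
    }

    /// Pack a tag and payload into the NaN space; extra bits are masked off.
    pub fn boxed(tag: u8, payload: u64) -> Value {
        Value(BOX_MASK | (((tag as u64) & TAG_MASK) << TAG_SHIFT) | (payload & PAYLOAD_MASK))
    }

    pub fn is_boxed(self) -> bool {
        self.0 & BOX_MASK == BOX_MASK
    }

    pub fn as_f64(self) -> Option<f64> {
        if self.is_boxed() { None } else { Some(f64::from_bits(self.0)) }
    }

    pub fn tag(self) -> Option<u8> {
        if self.is_boxed() { Some(((self.0 >> TAG_SHIFT) & TAG_MASK) as u8) } else { None }
    }

    pub fn payload(self) -> Option<u64> {
        if self.is_boxed() { Some(self.0 & PAYLOAD_MASK) } else { None }
    }
}
```

The bit budget works out exactly: 13 bits for sign, exponent, and quiet bit, plus 6 for the tag and 45 for the payload, gives 64. Type checks reduce to a mask and compare on a single machine word.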

Naming conventions

Slash-namespaced. Predicates with ?. Arrows for conversions.

The conventions are the API contract — get these right and the stdlib falls into place.

file/read
Slash-namespaced functions. string/split, http/get, regex/match?, json/encode. Never read-file or split-string.
empty?
Predicates end in ?. null?, list?, file/exists?, equal?.
string->symbol
Conversions use ->. keyword->string, list->vector, string->number.
string-append
Legacy Scheme names kept for a few string ops. string-length, string-ref, substring — no string/ prefix on these.

Now go build something with it.

Install it — or skip the tutorial and hand the docs to your agent.

curl:  $ curl -fsSL https://sema-lang.com/install.sh | sh
cargo: $ cargo install sema-lang
agent: $ curl -fsSL https://sema-lang.com/docs/for-agents.md >> AGENTS.md