
What is Sema?

Sema is a Scheme-like Lisp with a Clojure-flavored surface and first-class LLM/agent primitives, compiled to bytecode for a VM with NaN-boxed values. Single-threaded, reference-counted, embeddable. Implemented in Rust.


$ curl -fsSL https://sema-lang.com/install.sh | sh

v1.27.1 · MIT · Rust 2021 · 16 crates · ~125k lines

The language

A Lisp you can hold in your head.

One syntax rule: everything is an s-expression. The surface borrows from Clojure — keywords, maps, vectors, short lambdas, f-strings — while the semantics stay Scheme at the core: tail-call optimization, quasiquote macros, define/set!, lexical scope.

surface.sema · syntax you write

(define greet
  (fn (name)
    f"Hello, ${name}!"))

(define person
  {:name "Ada"
   :age 36})

(:name person)              ; keyword as getter
(map #(* % %) (range 1 6))  ; short lambda
(match (:status res)
  :ok    (:data res)
  :error (throw (:message res)))

Clojure surface

:keywords, {:k v} maps, [1 2 3] vectors, #(* % %) lambdas, f"..." strings, #"regex" literals.

Scheme core

define, set!, lambda/fn, let/let*/letrec, if/cond/case, begin, and/or, tail-call optimization.

Modern conveniences

Threading macros (->, ->>), pattern matching (match), destructuring in let/define/fn params, when-let/if-let.

Error handling

try/catch/throw — caught errors are structured maps with :type, :message, :value, and :stack-trace.

Async

async/await + channels — a deterministic cooperative scheduler, not OS threads.

Data types

Classic types, plus first-class LLM types.

Prompts, messages, conversations, tools, and agents are values — the same as integers and strings. They can be bound, passed, inspected, and stored. That's the defining difference.

Scalars

Integer · Float · String · Boolean · Nil · Symbol · Keyword · Character

Collections

List (vector-backed) · Vector · Map (sorted BTreeMap) · HashMap (unordered) · Bytevector

LLM primitives (first-class)

Prompt · Message · Conversation · Tool · Agent

Special

Promise (lazy) · Record · Async Promise · Channel (bounded FIFO)

The toolchain

What's in the box.

One binary, sema, gives you the REPL, script runner, bytecode compiler, standalone executable builder, formatter, LSP, DAP debugger, notebook server, and MCP server.

REPL

Interactive prompt with history, auto-completion, and syntax highlighting. sema with no args.

Script runner

Run .sema files, inline expressions with --eval, and shebang scripts.

Bytecode compiler

Lowering → optimization → resolution → compilation. Inspect with sema compile / sema disasm.

Standalone executables

sema build traces imports, bundles assets, emits a self-contained binary. No toolchain needed at runtime.

Formatter

sema fmt — opinionated code formatter for .sema files.

LSP server

Completions, hover, go-to-definition, references, rename, semantic tokens, folding, inlay hints. sema lsp.

DAP debugger

Breakpoints, step in/over/out, stack traces, variable inspection via VM debug hooks. sema dap.

Notebook server

Jupyter-inspired .sema-nb format, shared-env cells, REST API, browser UI. sema notebook serve.

MCP server

Model Context Protocol server exposing Sema eval/build/notebook tools to AI agents. sema mcp.

Editor plugins

VS Code, Vim/Neovim, Emacs, Helix, Zed, IntelliJ — syntax highlighting, formatting, LSP integration.

WASM playground

Runs in the browser at sema.run and embeddable in web apps via wasm-bindgen.

Embedding

Rust crate sema-lang with a builder API. Interpreter::new().eval_str("(+ 1 2)").

The LLM layer

Not an SDK. A language.

LLM operations are forms and values, not library calls wrapped in boilerplate. The runtime handles retries, caching, cost tracking, provider fallback, and rate limiting — so your code stays the size of its idea.

Eight chat providers. Anthropic, OpenAI, Gemini, Groq, xAI, Mistral, Moonshot, Ollama — auto-configured from environment variables. Plus Jina, Voyage, and Cohere for embeddings.
Tools & agents. deftool defines a function with a schema. defagent defines a system prompt + tools + turn limit. agent/run handles the loop.
Conversations as data. Immutable, forkable, inspectable. conversation/say returns a new value — the old one is untouched.
Cassettes. Record LLM calls to a file, replay them forever. Deterministic tests without API keys.
Cost & budgets. llm/with-budget caps spend for a scope. Token usage tracked per call and per session.
Observability. Built-in OpenTelemetry tracing with GenAI conventions. Off by default.
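
The "conversations as data" point can be illustrated outside Sema. The Rust below is a minimal sketch of the value semantics only, not Sema's implementation: the `Conversation` type and its `say` and `len` methods are hypothetical names chosen to echo conversation/say.

```rust
use std::rc::Rc;

/// Hypothetical immutable conversation: a shared, never-mutated message list.
#[derive(Clone, Debug)]
struct Conversation {
    messages: Rc<Vec<(String, String)>>, // (role, text) pairs
}

impl Conversation {
    fn new() -> Self {
        Conversation { messages: Rc::new(Vec::new()) }
    }

    /// Returns a new conversation with one more message; `self` is unchanged.
    fn say(&self, role: &str, text: &str) -> Conversation {
        let mut next = (*self.messages).clone(); // copy, then extend the copy
        next.push((role.to_string(), text.to_string()));
        Conversation { messages: Rc::new(next) }
    }

    fn len(&self) -> usize {
        self.messages.len()
    }
}
```

Calling `say` twice on the same base value yields two independent forks, and the base keeps its original length.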

llm.sema · primitives, not boilerplate

(deftool get-weather
  "Get weather for a city"
  {:city {:type :string}}
  (lambda (city)
    (format "~a: 22°C" city)))

(defagent bot
  {:system "Weather assistant."
   :tools [get-weather]
   :max-turns 3})

(llm/with-budget
  {:max-cost-usd 0.10}
  (lambda ()
    (agent/run bot
      "Weather in Oslo?")))
;; => "It's 22°C in Oslo."
;;    $0.003 · 1 tool call · 2 turns

How it differs from other Lisps

The things that surprise people.

Common Lisp / Scheme → Sema

Only #f and nil are falsy. 0, "", and () are all truthy. In CL, the empty list is false.
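
The truthiness rule is small enough to state as a predicate. The Rust below is a sketch under assumptions: `Val` and `is_truthy` are hypothetical names, and the enum covers only the cases the rule mentions rather than Sema's actual value type.

```rust
/// Hypothetical value type covering just the cases the rule mentions.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Nil,
    Bool(bool),
    Int(i64),
    Str(String),
    List(Vec<Val>),
}

/// Only `#f` and `nil` are falsy; everything else, including 0, "", and (), is truthy.
fn is_truthy(v: &Val) -> bool {
    !matches!(v, Val::Nil | Val::Bool(false))
}
```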

Cons cells → Vector-backed lists

Lists are Rc<Vec<Value>> — O(1) nth, O(n) cons. Prefer map/filter/fold and vector for hot paths.
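
The cost profile follows directly from the representation. The Rust below is a minimal sketch, assuming a list of plain `i64` elements in place of Sema's `Value`; the `List` alias and the `nth` and `cons` helpers are hypothetical names, not Sema's internals.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a vector-backed list (elements are plain i64 here).
type List = Rc<Vec<i64>>;

/// O(1): indexing goes straight into the backing vector.
fn nth(list: &List, i: usize) -> Option<i64> {
    list.get(i).copied()
}

/// O(n): prepending builds a new vector and copies every existing element.
fn cons(head: i64, tail: &List) -> List {
    let mut v = Vec::with_capacity(tail.len() + 1);
    v.push(head);
    v.extend_from_slice(tail);
    Rc::new(v)
}
```

Building a list by repeated `cons` in a loop is therefore quadratic, which is why the bulk operations are the better default on hot paths.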

Clojure atoms → define + set!

Mutable state is (define x 0) + (set! x 1). No atom/swap!/reset!.

One map type → Two map types

{:k v} literals are sorted BTreeMaps — deterministic order, usable as keys. (hashmap/new) is faster and unordered.
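
The two properties claimed for sorted maps (deterministic order, usable as keys) are properties of Rust's `BTreeMap` itself. The sketch below uses only the standard library; the helper names are hypothetical and nothing here is Sema's API.

```rust
use std::collections::{BTreeMap, HashMap};

/// Keys of a sorted map always come back in key order, whatever the insertion order.
fn sorted_keys(pairs: &[(&str, i64)]) -> Vec<String> {
    let map: BTreeMap<String, i64> =
        pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect();
    map.keys().cloned().collect()
}

/// A sorted map is itself orderable, so it can serve as a key in another map.
fn map_as_key() -> BTreeMap<BTreeMap<String, i64>, &'static str> {
    let mut point = BTreeMap::new();
    point.insert("x".to_string(), 1);
    point.insert("y".to_string(), 2);
    let mut outer = BTreeMap::new();
    outer.insert(point, "point");
    outer
}

/// A hash map offers the same lookups but no iteration-order guarantee.
fn unordered_lookup(pairs: &[(&str, i64)], key: &str) -> Option<i64> {
    let map: HashMap<&str, i64> = pairs.iter().copied().collect();
    map.get(key).copied()
}
```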

syntax-rules → auto-gensym

Macros use defmacro with quasiquote. Symbols ending in # inside quasiquote are auto-unique — no variable capture.

No LLM types → LLM as first-class

Prompt, Message, Conversation, Tool, Agent are values — alongside integers and strings. This is why Sema exists.

Under the hood

NaN-boxed values, bytecode VM.

Every value is a single 8-byte struct Value(u64) — encoded in IEEE 754 quiet-NaN payload space. The sole evaluator is a stack-based bytecode VM with intrinsic opcodes and NaN-boxed fast paths. No tree-walking interpreter.

Single-threaded. Rc-based values, no cross-thread sharing. Parallelism is at the LLM-call level, not the compute level.
No GC. Deterministic destruction via reference counting. Memory is freed the moment the last reference drops.
16 crates. Strict dependency ordering: sema-core ← sema-reader ← sema-vm ← sema-eval ← sema. Stdlib and LLM depend on core, not eval — dependency inversion via callbacks.
Bytecode format. .semac files with a 24-byte header, string table, function table, main chunk. sema build embeds the runtime + bytecode into a standalone binary.
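
The "No GC" point can be observed directly with Rust's `Rc`. The sketch below is an illustration of reference-counted destruction in general, not Sema code; `Tracked` and `drop_order` are hypothetical names.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical resource that records, via a shared flag, the moment it is dropped.
struct Tracked {
    freed: Rc<Cell<bool>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.freed.set(true);
    }
}

/// Returns the flag state after the first and after the last reference is dropped.
fn drop_order() -> (bool, bool) {
    let freed = Rc::new(Cell::new(false));
    let first = Rc::new(Tracked { freed: Rc::clone(&freed) });
    let second = Rc::clone(&first); // two references to one value

    drop(first);
    let after_first = freed.get(); // still alive: `second` holds it

    drop(second);
    let after_last = freed.get(); // freed immediately, no collector pass

    (after_first, after_last)
}
```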

Architecture reference →

.sema → reader → lower → VM

.semac file layout

Header (24 bytes)
  00 53 45 4D  magic
  04 00        v4
  00 00        flags
  ········     version
  03 00        3 sections
  ········     reserved

Sections
  0x01  String Table (required): interned strings · Spur remapping
  0x02  Function Table (required): compiled function templates
  0x03  Main Chunk (bytecode): Const, LoadLocal0, Call, Pop, Const, CallGlobal, Return
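
A reader for this header might start as follows. This is a sketch under stated assumptions: the magic bytes and the 24-byte length come from the layout above, while the position and byte order of the version field (a little-endian u16 at bytes 4..6) are guesses from the diagram. The function and error names are hypothetical.

```rust
/// Magic bytes and header size as shown in the layout above.
const MAGIC: [u8; 4] = [0x00, 0x53, 0x45, 0x4D];
const HEADER_LEN: usize = 24;

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
}

/// Checks the magic and returns the format version.
/// Assumption: the version is a little-endian u16 at bytes 4..6.
fn read_format_version(bytes: &[u8]) -> Result<u16, HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if bytes[..4] != MAGIC {
        return Err(HeaderError::BadMagic);
    }
    Ok(u16::from_le_bytes([bytes[4], bytes[5]]))
}
```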

NaN-boxed Value

tag: 6 bits · payload: 45 bits

struct Value(u64) — every type in 8 bytes
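
The encoding can be sketched in a few lines of Rust. The 6-bit tag and 45-bit payload widths come from the diagram; everything else is an assumption made for illustration: the exact bit positions, the use of the sign bit to mark boxed values, the canonicalization of float NaNs, and all method names. It is not Sema's actual layout.

```rust
// Assumed layout: sign + exponent + quiet bit all set marks a boxed value,
// then 6 tag bits, then 45 payload bits (13 + 6 + 45 = 64).
const BOX_MASK: u64 = 0xFFF8_0000_0000_0000;
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000;
const TAG_SHIFT: u32 = 45;
const TAG_MASK: u64 = 0x3F;
const PAYLOAD_MASK: u64 = (1u64 << 45) - 1;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Value(u64);

impl Value {
    /// Floats are stored as their own bits; NaNs collapse to one unboxed pattern.
    fn from_f64(x: f64) -> Value {
        if x.is_nan() { Value(CANONICAL_NAN) } else { Value(x.to_bits()) }
    }

    /// Packs a tag and payload into the quiet-NaN payload space.
    fn boxed(tag: u8, payload: u64) -> Value {
        assert!(u64::from(tag) <= TAG_MASK, "tag must fit in 6 bits");
        assert!(payload <= PAYLOAD_MASK, "payload must fit in 45 bits");
        Value(BOX_MASK | (u64::from(tag) << TAG_SHIFT) | payload)
    }

    fn is_boxed(self) -> bool {
        (self.0 & BOX_MASK) == BOX_MASK
    }

    fn as_f64(self) -> Option<f64> {
        if self.is_boxed() { None } else { Some(f64::from_bits(self.0)) }
    }

    fn tag(self) -> Option<u8> {
        if self.is_boxed() { Some(((self.0 >> TAG_SHIFT) & TAG_MASK) as u8) } else { None }
    }

    fn payload(self) -> Option<u64> {
        if self.is_boxed() { Some(self.0 & PAYLOAD_MASK) } else { None }
    }
}
```

Every boxed pattern is a NaN when reinterpreted as a float, so ordinary floats and tagged values share one 8-byte word without a separate discriminant.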

Naming conventions

Slash-namespaced. Predicates with `?`. Arrows for conversions.

The conventions are the API contract — get these right and the stdlib falls into place.

file/read

Slash-namespaced functions. string/split, http/get, regex/match?, json/encode. Never read-file or split-string.

empty?

Predicates end in ?. null?, list?, file/exists?, equal?.

string->symbol

Conversions use ->. keyword->string, list->vector, string->number.

string-append

Legacy Scheme names kept for a few string ops. string-length, string-ref, substring — no string/ prefix on these.

Now go build something with it.

Install it — or skip the tutorial and hand the docs to your agent.

curl: $ curl -fsSL https://sema-lang.com/install.sh | sh

cargo: $ cargo install sema-lang

agent: $ curl -fsSL https://sema-lang.com/docs/for-agents.md >> AGENTS.md
