Bytecode VM
Opt-In (Alpha)
The bytecode VM is functional and available via the --vm CLI flag. The tree-walking interpreter remains the default execution path. Both runtimes coexist and share the global environment.
Overview
Sema includes a bytecode VM alongside the existing tree-walking interpreter, enabled with the --vm CLI flag. The VM compiles Sema source code into stack-based bytecode for faster execution, delivering up to 17× speedup over the tree-walker on compute-heavy workloads (e.g., TAK benchmark: 1.25s vs 21.4s).
The tree-walking interpreter (sema-eval) is preserved as the default execution path, the macro expansion engine, and eval fallback — the two runtimes coexist, sharing the global environment and EvalContext.
Compilation Pipeline
Source text
→ Reader (tokenize + parse → Value AST)
→ Macro expand (tree-walker evaluates macros)
→ Lower (Value AST → CoreExpr IR)
→ Resolve (CoreExpr → ResolvedExpr with slot/upvalue/global analysis)
→ Compile (ResolvedExpr → bytecode Chunks)
→ VM execution (dispatch loop)
Phase 1: Lowering (Value → CoreExpr)
The lowering pass converts the Value AST into CoreExpr, a desugared intermediate representation. All ~35 special forms are lowered to ~30 CoreExpr variants. Several forms desugar into simpler ones:
| Source Form | Lowers To |
|---|---|
cond | Nested If |
when | If with nil else |
unless | If with swapped branches |
case | Let + nested If with Or comparisons |
defun | Define + Lambda |
Tail position analysis happens during lowering. The Call node carries a tail: bool flag, set based on position:
- Tail: the last expression in a lambda body, begin, let/let*/letrec body, if branches, cond clauses, and the last operand of and/or
- Not tail: try body (the handler must be reachable), do loop body, non-last expressions
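To make the lowering step concrete, here is a minimal Rust sketch of the idea: a toy CoreExpr with hypothetical variants, showing how a form like when desugars to If with a nil else and how a tail flag is threaded to the last expression of a body. The real lower.rs operates on the Value AST and covers all ~30 variants; everything below is illustrative only.

```rust
// Minimal sketch, not the real sema-vm IR: a tiny CoreExpr with hypothetical
// variants, showing `when` desugaring and tail-flag propagation. Inputs are
// assumed to be already-lowered sub-expressions for brevity.
#[derive(Debug)]
enum CoreExpr {
    Nil,
    Var(String),
    If { cond: Box<CoreExpr>, then: Box<CoreExpr>, els: Box<CoreExpr> },
    Call { callee: Box<CoreExpr>, args: Vec<CoreExpr>, tail: bool },
    Begin(Vec<CoreExpr>),
}

// (when c body...) lowers to (if c (begin body...) nil).
fn lower_when(cond: CoreExpr, body: Vec<CoreExpr>, in_tail: bool) -> CoreExpr {
    CoreExpr::If {
        cond: Box::new(cond),
        // The body inherits the tail-position flag of the whole `when` form.
        then: Box::new(lower_begin(body, in_tail)),
        els: Box::new(CoreExpr::Nil),
    }
}

// In a `begin`, only the last expression is in tail position.
fn lower_begin(mut body: Vec<CoreExpr>, in_tail: bool) -> CoreExpr {
    if let Some(last) = body.last_mut() {
        mark_tail(last, in_tail);
    }
    CoreExpr::Begin(body)
}

fn mark_tail(expr: &mut CoreExpr, in_tail: bool) {
    if let CoreExpr::Call { tail, .. } = expr {
        *tail = in_tail;
    }
}

fn main() {
    let lowered = lower_when(
        CoreExpr::Var("ready?".into()),
        vec![CoreExpr::Call {
            callee: Box::new(CoreExpr::Var("loop".into())),
            args: vec![],
            tail: false,
        }],
        true, // the `when` form itself sits in tail position
    );
    println!("{lowered:?}"); // the inner Call now carries tail: true
}
```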
Phase 2: Variable Resolution (CoreExpr → ResolvedExpr)
The resolver walks the CoreExpr tree and classifies every variable reference as one of:
| Resolution | Opcode | Description |
|---|---|---|
Local { slot } | LoadLocal / StoreLocal | Variable in the current function's frame |
Upvalue { index } | LoadUpvalue / StoreUpvalue | Captured from an enclosing function |
Global { spur } | LoadGlobal / StoreGlobal | Module-level binding |
This is the key optimization over the tree-walker: instead of hash-based environment chain lookup (O(scope depth) per access), variables are accessed by direct slot index (O(1)).
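As an illustration of the slot-based approach, the sketch below shows a toy resolver that classifies a name as either a frame-local slot or a global. The names (Resolution, Scope) are assumptions; the real resolve.rs additionally tracks upvalue capture across function boundaries and works on interned Spur symbols rather than strings.

```rust
// Illustrative only: a toy resolver showing the local/global split.
enum Resolution {
    Local { slot: u16 },   // compiles to LoadLocal / StoreLocal
    Global(String),        // compiles to LoadGlobal / StoreGlobal
}

struct Scope {
    // Each function gets a flat frame; the slot is simply the index at
    // which the binding was introduced.
    locals: Vec<String>,
}

fn resolve(name: &str, scopes: &[Scope]) -> Resolution {
    // Innermost scope wins; the search happens once, at compile time,
    // so runtime access is a direct index into the frame.
    for scope in scopes.iter().rev() {
        if let Some(slot) = scope.locals.iter().rposition(|n| n.as_str() == name) {
            return Resolution::Local { slot: slot as u16 };
        }
    }
    // Anything not bound locally falls back to a module-level binding.
    Resolution::Global(name.to_string())
}

fn main() {
    let scopes = vec![Scope { locals: vec!["x".into(), "y".into()] }];
    match resolve("y", &scopes) {
        Resolution::Local { slot } => println!("local slot {slot}"), // prints 1
        Resolution::Global(n) => println!("global {n}"),
    }
}
```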
Upvalue Capture
Closures use the Lua/Steel upvalue model. When a lambda references a variable from an enclosing function:
- The resolver marks the outer local as captured
- An UpvalueDesc entry is added to the inner lambda: ParentLocal(slot) if capturing from the immediate parent, ParentUpvalue(index) if capturing through an intermediate function
(lambda (x) ; x = Local slot 0
(lambda () ; captures x: UpvalueDesc::ParentLocal(0)
(lambda () ; captures through chain: UpvalueDesc::ParentUpvalue(0)
    x)))       ; resolves to Upvalue { index: 0 }
Phase 3: Bytecode Compilation (ResolvedExpr → Chunk)
The compiler (compiler.rs) transforms ResolvedExpr into bytecode Chunks. The Compiler struct wraps an Emitter (bytecode builder) and collects Function templates for inner lambdas.
Compilation strategies:
- Constants: Nil, True, False get dedicated opcodes. All other constants use Const + the constant pool.
- Variables: LoadLocal/StoreLocal for locals, LoadUpvalue/StoreUpvalue for captures, LoadGlobal/StoreGlobal/DefineGlobal for globals.
- Control flow: if uses JumpIfFalse + Jump; short-circuit and/or use Dup + conditional jumps to preserve the last truthy/falsy value (see the jump-backpatching sketch after this list).
- Lambdas: compiled to separate Function templates, referenced by a MakeClosure instruction with upvalue descriptors.
- do loops: compile to a backward Jump with JumpIfTrue for the exit test.
- try/catch: adds entries to the chunk's exception table, no inline opcodes.
- Named let: compiled as MakeClosure + Call — the loop body becomes a function.
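The JumpIfFalse + Jump pattern for if can be sketched with a toy emitter. The Op and Emitter shapes below are hypothetical; the real emit.rs works on byte-encoded operands and tracks source spans, but the backpatching idea is the same.

```rust
// A stripped-down sketch of compiling (if <cond> <then> <else>) with
// forward jumps that are backpatched once the branch targets are known.
#[derive(Debug, Clone, Copy)]
enum Op {
    JumpIfFalse(i32), // relative offset, patched after the branch is emitted
    Jump(i32),
    LoadTrue,         // stand-in for "compile the condition"
    LoadConstA,       // stand-in for "compile the then-branch"
    LoadConstB,       // stand-in for "compile the else-branch"
}

#[derive(Default)]
struct Emitter {
    code: Vec<Op>,
}

impl Emitter {
    fn emit(&mut self, op: Op) -> usize {
        self.code.push(op);
        self.code.len() - 1
    }
    // Backpatch a previously emitted jump so it lands just past the
    // current end of the code.
    fn patch_to_here(&mut self, at: usize) {
        let offset = (self.code.len() - (at + 1)) as i32;
        match &mut self.code[at] {
            Op::JumpIfFalse(o) | Op::Jump(o) => *o = offset,
            _ => unreachable!("patched instruction must be a jump"),
        }
    }
}

fn main() {
    let mut e = Emitter::default();
    e.emit(Op::LoadTrue);                     // <cond>
    let to_else = e.emit(Op::JumpIfFalse(0)); // placeholder offset
    e.emit(Op::LoadConstA);                   // <then>
    let to_end = e.emit(Op::Jump(0));         // skip over <else>
    e.patch_to_here(to_else);                 // else-branch starts here
    e.emit(Op::LoadConstB);                   // <else>
    e.patch_to_here(to_end);                  // end of the whole if
    println!("{:?}", e.code);
}
```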
Runtime-delegated forms — forms that can't be compiled to pure bytecode are compiled as calls to __vm-* global functions registered by sema-eval:
| Source Form | Delegate |
|---|---|
eval | __vm-eval |
import | __vm-import |
load | __vm-load |
defmacro | __vm-defmacro-form (passes entire form as quoted constant) |
define-record-type | __vm-define-record-type |
delay | __vm-delay (passes unevaluated body as quoted constant) |
force | __vm-force |
prompt, message, deftool, defagent, macroexpand | Corresponding __vm-* delegates |
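As a rough sketch of the delegation mechanism (hypothetical Instr and Compiler shapes, not the actual compiler.rs code): a delegated form pushes its unevaluated body as a quoted constant and then calls the registered __vm-* global like any other function.

```rust
// Sketch only: a form the compiler can't lower to pure bytecode becomes an
// ordinary call to a `__vm-*` global registered by sema-eval.
#[derive(Debug)]
enum Instr {
    Const(u16),                             // push constants[index]
    CallGlobal { name: String, argc: u16 }, // fused lookup + call (spur elided here)
}

struct Compiler {
    constants: Vec<String>, // real chunks hold Values; strings keep this runnable
    code: Vec<Instr>,
}

impl Compiler {
    // e.g. (delay <body>) compiles as a call to __vm-delay with the quoted body.
    fn compile_delegated(&mut self, delegate: &str, quoted_form: &str) {
        // The unevaluated form is shipped as a quoted constant argument.
        let idx = self.constants.len() as u16;
        self.constants.push(quoted_form.to_string());
        self.code.push(Instr::Const(idx));
        self.code.push(Instr::CallGlobal { name: delegate.to_string(), argc: 1 });
    }
}

fn main() {
    let mut c = Compiler { constants: vec![], code: vec![] };
    c.compile_delegated("__vm-delay", "(expensive-computation)");
    // At runtime, CallGlobal resolves the delegate in the shared global Env
    // and calls it like any other function.
    println!("{:?} {:?}", c.constants, c.code);
}
```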
Public API: compile(), compile_many(), compile_with_locals(), compile_many_with_locals() — all return CompileResult { chunk, functions }.
Compiler Optimizations
- Intrinsic recognition: Known builtins (+, -, *, /, <, >, <=, >=, =, not) are compiled to inline opcodes instead of function calls, eliminating global lookup, Rc downcast, argument Vec allocation, and function pointer dispatch (see the sketch after this list).
- Peephole (if (not X) ...): The pattern (if (not X) A B) compiles to JumpIfTrue instead of Not + JumpIfFalse, eliminating one instruction.
- Fused CallGlobal: Non-tail calls to global functions use a fused CallGlobal instruction that combines LoadGlobal + Call into a single opcode with (u32 spur, u16 argc) operands.
- Specialized LoadLocal/StoreLocal: Slots 0–3 have dedicated zero-operand opcodes (LoadLocal0..LoadLocal3, StoreLocal0..StoreLocal3), saving 2 bytes per instruction for the most frequently accessed locals.
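A minimal sketch of intrinsic recognition, assuming a hypothetical instruction enum: if the callee is a known builtin with the expected shape, the compiler emits the dedicated opcode; otherwise it falls back to the generic global-call path.

```rust
// Rough sketch of intrinsic recognition; names and shapes are illustrative,
// not the actual compiler.rs code.
#[derive(Debug)]
enum Instr {
    Add,                                    // inline arithmetic opcode
    CallGlobal { name: String, argc: u16 }, // generic path (real code uses a Spur + fused opcode)
}

fn emit_call(callee: &str, argc: u16, code: &mut Vec<Instr>) {
    // Arguments are assumed to already be on the stack at this point.
    match (callee, argc) {
        // Intrinsic: no global lookup, no argument Vec, no dynamic dispatch.
        ("+", 2) => code.push(Instr::Add),
        // Everything else goes through the ordinary global call.
        _ => code.push(Instr::CallGlobal { name: callee.to_string(), argc }),
    }
}

fn main() {
    let mut code = Vec::new();
    emit_call("+", 2, &mut code);    // => Add
    emit_call("list", 3, &mut code); // => CallGlobal { name: "list", argc: 3 }
    println!("{code:?}");
}
```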
Phase 4: VM Execution
The VM (vm.rs) is a stack-based dispatch loop.
Core structs:
VM { stack: Vec<Value>, frames: Vec<CallFrame>, globals: Rc<Env>, functions: Vec<Rc<Function>> }
CallFrame { closure: Rc<Closure>, pc: usize, base: usize, open_upvalues: Vec<Option<Rc<UpvalueCell>>> }
Key design points:
- Unsafe hot path: The dispatch loop uses unsafe unchecked stack operations (pop_unchecked) and raw pointer bytecode reads via read_u16!/read_i32!/read_u32! macros for performance. Opcode decoding uses std::mem::transmute on the #[repr(u8)] Op enum. Debug builds retain bounds checks via debug_assert!.
- Closure interop: VM closures are wrapped as Value::NativeFn values so the tree-walker can call them. Each NativeFn carries an Rc<dyn Any> payload containing VmClosurePayload (closure + function table), and the VM uses raw_tag() + downcast_ref to avoid Rc refcount bumps on the hot path. Each NativeFn wrapper creates a fresh VM instance to execute the closure's bytecode.
- Upvalue cells: UpvalueCell with Rc<RefCell<Value>> for shared mutable state — StoreLocal syncs to open upvalue cells, LoadLocal reads from cells when present.
- Exception handling: The Throw opcode triggers a handler search via the chunk's exception table. The stack is restored to the saved depth, the error value is pushed, and the PC jumps to the handler.
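For orientation, here is a deliberately tiny dispatch loop showing the fetch, decode, execute shape. The real vm.rs adds call frames, closures, NaN-boxed values, unchecked reads, and the full opcode set; the toy encoding below is an assumption made for brevity.

```rust
// A minimal stack-based dispatch loop: fetch a byte, decode it, operate on
// the value stack. Opcodes and encoding are toy stand-ins.
const OP_CONST_SMALL: u8 = 0; // push the next byte as an integer (toy encoding)
const OP_ADD: u8 = 1;
const OP_RETURN: u8 = 2;

fn run(code: &[u8]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0usize;
    loop {
        let op = code[pc];
        pc += 1;
        match op {
            OP_CONST_SMALL => {
                stack.push(code[pc] as i64); // one-byte inline operand
                pc += 1;
            }
            OP_ADD => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            OP_RETURN => return stack.pop().unwrap(),
            _ => panic!("unknown opcode {op}"),
        }
    }
}

fn main() {
    // The rough equivalent of compiling (+ 1 2) and running it.
    let code = [OP_CONST_SMALL, 1, OP_CONST_SMALL, 2, OP_ADD, OP_RETURN];
    assert_eq!(run(&code), 3);
}
```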
Entry points: VM::execute() takes a closure and EvalContext. eval_str() is a convenience that parses, compiles, and runs. compile_program() is the shared pipeline: Value AST → lower → resolve → compile → (Closure, Vec<Function>).
VM Optimizations
- Two-level dispatch loop: An outer loop caches frame-local state (code pointer, constants pointer, base offset) into local variables. The inner loop dispatches opcodes without re-fetching frame data. Frame state is only reloaded when control flow changes frames (Call, TailCall, Return, exceptions).
- NaN-boxed int fast paths: AddInt/SubInt/MulInt/LtInt/EqInt operate directly on raw NaN-boxed bits — sign-extending the payload, performing the arithmetic, and re-boxing, without ever constructing a Value.
- 16-entry direct-mapped global cache: Global lookups are cached in a [(spur, env_version, Value); 16] array indexed by spur & 0xF. Cache hits skip the Env hash-map lookup entirely; cache lines are invalidated by an env version mismatch (see the sketch after this list).
- Raw pointer bytecode reads: The read_u16!, read_i32!, and read_u32! macros read operands via raw pointer arithmetic on the code buffer, avoiding bounds checks in release builds.
- Unsafe unchecked stack operations: pop_unchecked skips length checks (the compiler guarantees stack correctness). debug_assert! guards catch violations in debug builds.
- Cold path factoring: The handle_err! macro factors exception handling out of the hot instruction sequence, keeping the fast path compact for better instruction-cache behavior.
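The direct-mapped global cache can be sketched as follows. The types here are illustrative stand-ins (plain integers for interned symbols and values); in the real VM the cache lines hold Spur symbols and Values, and the version counter tracks the shared Env.

```rust
// Sketch of a 16-entry direct-mapped cache keyed by an interned-symbol id
// ("spur") and an environment version counter.
use std::collections::HashMap;

#[derive(Default)]
struct CacheLine {
    spur: u32,
    env_version: u64,
    value: i64, // stand-in for Value
}

struct Globals {
    map: HashMap<u32, i64>,
    version: u64, // bumped whenever a global binding is (re)defined
    cache: [CacheLine; 16],
}

impl Globals {
    fn load(&mut self, spur: u32) -> Option<i64> {
        let line = &self.cache[(spur & 0xF) as usize];
        // Fast path: same symbol, same environment version.
        if line.spur == spur && line.env_version == self.version {
            return Some(line.value);
        }
        // Slow path: hash lookup, then refill the cache line.
        let value = *self.map.get(&spur)?;
        self.cache[(spur & 0xF) as usize] =
            CacheLine { spur, env_version: self.version, value };
        Some(value)
    }
}

fn main() {
    let mut g = Globals {
        map: HashMap::from([(42u32, 7i64)]),
        version: 1,
        cache: Default::default(),
    };
    assert_eq!(g.load(42), Some(7)); // miss: fills the cache line
    assert_eq!(g.load(42), Some(7)); // hit: skips the hash map
}
```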
Opcode Set
The VM uses a stack-based instruction set with variable-length encoding. Each opcode is one byte, followed by operands (u16, u32, or i32).
Constants & Stack
| Opcode | Operands | Description |
|---|---|---|
Const | u16 index | Push constants[index] |
Nil | — | Push nil |
True | — | Push #t |
False | — | Push #f |
Pop | — | Discard top of stack |
Dup | — | Duplicate top of stack |
Variable Access
| Opcode | Operands | Description |
|---|---|---|
LoadLocal | u16 slot | Push locals[slot] |
StoreLocal | u16 slot | locals[slot] = pop |
LoadUpvalue | u16 index | Push upvalues[index].get() |
StoreUpvalue | u16 index | upvalues[index].set(pop) |
LoadGlobal | u32 spur | Push globals[spur] |
StoreGlobal | u32 spur | globals[spur] = pop |
DefineGlobal | u32 spur | Define new global binding |
LoadLocal0..LoadLocal3 | — | Push locals[0..3] (zero-operand fast path) |
StoreLocal0..StoreLocal3 | — | locals[0..3] = pop (zero-operand fast path) |
Control Flow
| Opcode | Operands | Description |
|---|---|---|
Jump | i32 offset | Unconditional relative jump |
JumpIfFalse | i32 offset | Pop, jump if falsy |
JumpIfTrue | i32 offset | Pop, jump if truthy |
Functions
| Opcode | Operands | Description |
|---|---|---|
Call | u16 argc | Call function with argc args |
TailCall | u16 argc | Tail call (reuse frame for TCO) |
Return | — | Return top of stack |
MakeClosure | u16 func_id, u16 n_upvalues, ... | Create closure from function template |
CallNative | u16 native_id, u16 argc | Direct native function call |
CallGlobal | u32 spur, u16 argc | Fused global lookup + call |
Data Constructors
| Opcode | Operands | Description |
|---|---|---|
MakeList | u16 n | Pop n values, push list |
MakeVector | u16 n | Pop n values, push vector |
MakeMap | u16 n_pairs | Pop 2n values, push sorted map |
MakeHashMap | u16 n_pairs | Pop 2n values, push hash map |
Arithmetic & Comparison
| Opcode | Description |
|---|---|
Add, Sub, Mul, Div | Generic arithmetic (int/float/string) |
Negate, Not | Unary operators |
Eq, Lt, Gt, Le, Ge | Generic comparison |
AddInt, SubInt, MulInt | Specialized int fast paths |
LtInt, EqInt | Specialized int comparison |
Exception Handling
| Opcode | Description |
|---|---|
Throw | Pop value, raise as exception |
Exception handling uses an exception table on the Chunk rather than inline opcodes. Each entry specifies a PC range, handler address, and stack depth to restore.
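A sketch of the table-driven handler search follows the description above: find the entry covering the throwing PC, truncate the stack to the recorded depth, push the error value, and jump to the handler. The field names here are assumptions, not the actual chunk.rs layout.

```rust
// Sketch of table-driven exception handling for a Throw (or a failing opcode).
struct ExceptionEntry {
    start_pc: usize,    // protected region: [start_pc, end_pc)
    end_pc: usize,
    handler_pc: usize,  // where the catch code begins
    stack_depth: usize, // value-stack depth to restore before entering it
}

fn handle_throw(
    pc: usize,
    error_value: i64, // stand-in for the thrown Value
    stack: &mut Vec<i64>,
    table: &[ExceptionEntry],
) -> Option<usize> {
    // Find a handler whose protected range covers the throwing instruction.
    let entry = table.iter().find(|e| e.start_pc <= pc && pc < e.end_pc)?;
    stack.truncate(entry.stack_depth); // unwind the value stack
    stack.push(error_value);           // the handler receives the error on top
    Some(entry.handler_pc)             // caller sets pc to this and continues
}

fn main() {
    let table = [ExceptionEntry { start_pc: 0, end_pc: 10, handler_pc: 20, stack_depth: 1 }];
    let mut stack = vec![111, 222, 333];
    let new_pc = handle_throw(5, -1, &mut stack, &table);
    assert_eq!(new_pc, Some(20));
    assert_eq!(stack, vec![111, -1]);
}
```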
Crate Structure
The bytecode VM lives in the sema-vm crate, which sits between sema-reader and sema-eval in the dependency graph:
sema-core ← sema-reader ← sema-vm ← sema-eval
sema-vm depends on sema-core (for Value, Spur, SemaError) and sema-reader (for parsing in test helpers). It does not depend on sema-eval — the evaluator depends on the VM, not the other way around.
Source Files
| File | Purpose |
|---|---|
opcodes.rs | Op enum — 51 bytecode opcodes |
chunk.rs | Chunk (bytecode + constants + spans), Function, UpvalueDesc |
emit.rs | Emitter — bytecode builder with jump backpatching |
disasm.rs | Human-readable bytecode disassembler |
core_expr.rs | CoreExpr and ResolvedExpr IR enums |
lower.rs | Value AST → CoreExpr lowering pass |
resolve.rs | Variable resolution (local/upvalue/global analysis) |
compiler.rs | Bytecode compiler (ResolvedExpr → Chunk) |
vm.rs | VM dispatch loop, call frames, closures, exception handling |
Current Limitations
- VM closures create a fresh VM per call (no true TCO across closure boundaries)
- try/catch doesn't catch runtime errors from native functions (e.g., division by zero)
- try/catch error value format may differ from the tree-walker's
- define-record-type bindings are not visible to subsequent compiled code
- The compiler emits inline opcodes for common builtins (+, -, *, /, <, >, <=, >=, =, not) via intrinsic recognition. If a user redefines these globals, the intrinsic will still fire (these are treated as non-rebindable primitives).
- The CallNative opcode is defined but not yet used (no native function registry)
CLI Usage
The --vm flag enables the bytecode compilation path. The tree-walker remains the default.
# Run a file with the bytecode VM
sema --vm examples/hello.sema
# REPL: toggle VM mode with ,vm
sema
> ,vm
Both paths share the global Env and EvalContext.
Design Decisions
Why Keep the Tree-Walker?
The tree-walking interpreter remains for:
- Macro expansion — macros can call eval at expansion time, requiring a full evaluator before compilation
- eval fallback — dynamic eval needs to parse, expand, compile, and execute at runtime
- Debugging — tree-walking is easier to step through and inspect
Why Not Delete CoreExpr After Resolution?
The pipeline uses two IR types: CoreExpr (variables as names) and ResolvedExpr (variables as slots). This provides type-level safety — the compiler can only receive resolved expressions, preventing accidental use of unresolved variable references.