
Bytecode VM

Opt-In (Alpha)

The bytecode VM is functional and available via the --vm CLI flag. The tree-walking interpreter remains the default execution path. Both runtimes coexist and share the global environment.

Overview

Sema includes a bytecode VM alongside the existing tree-walking interpreter, enabled with the --vm CLI flag. The VM compiles Sema source code into stack-based bytecode for faster execution, delivering up to 17× speedup over the tree-walker on compute-heavy workloads (e.g., TAK benchmark: 1.25s vs 21.4s).

The tree-walking interpreter (sema-eval) is preserved as the default execution path, the macro expansion engine, and the eval fallback. The two runtimes coexist, sharing the global environment and EvalContext.

Compilation Pipeline

Source text
  → Reader        (tokenize + parse → Value AST)
  → Macro expand  (tree-walker evaluates macros)
  → Lower         (Value AST → CoreExpr IR)
  → Resolve       (CoreExpr → ResolvedExpr with slot/upvalue/global analysis)
  → Compile       (ResolvedExpr → bytecode Chunks)
  → VM execution  (dispatch loop)

Phase 1: Lowering (Value → CoreExpr)

The lowering pass converts the Value AST into CoreExpr, a desugared intermediate representation. All ~35 special forms are lowered to ~30 CoreExpr variants. Several forms desugar into simpler ones:

Source Form   Lowers To
cond          Nested If
when          If with nil else
unless        If with swapped branches
case          Let + nested If with Or comparisons
defun         Define + Lambda
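As a sketch of the first two desugarings, here is a toy version in Rust. The `Core` type and its variant names are illustrative stand-ins, not sema-vm's real CoreExpr:

```rust
// Toy CoreExpr; variant names are illustrative, not sema-vm's real IR.
#[derive(Debug, Clone, PartialEq)]
enum Core {
    Nil,
    Var(String),
    If { cond: Box<Core>, then: Box<Core>, els: Box<Core> },
}

// (when c body) lowers to (if c body nil): the missing else becomes Nil.
fn lower_when(cond: Core, body: Core) -> Core {
    Core::If { cond: Box::new(cond), then: Box::new(body), els: Box::new(Core::Nil) }
}

// (unless c body) lowers to (if c nil body): the branches are swapped.
fn lower_unless(cond: Core, body: Core) -> Core {
    Core::If { cond: Box::new(cond), then: Box::new(Core::Nil), els: Box::new(body) }
}

fn main() {
    let w = lower_when(Core::Var("ready?".into()), Core::Var("go".into()));
    assert!(matches!(w, Core::If { .. }));
    println!("{:?}", w);
}
```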

Tail position analysis happens during lowering. The Call node carries a tail: bool flag, set based on position:

  • Tail: last expression in lambda body, begin, let/let*/letrec body, if branches, cond clauses, and/or last operand
  • Not tail: try body (handler must be reachable), do loop body, non-last expressions
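The rules above can be sketched as a recursive pass over a toy expression type (the types here are hypothetical; the real lowering sets the flag while building Call nodes):

```rust
// Toy expression type; only the pieces needed to show tail propagation.
#[derive(Debug)]
enum Expr {
    Call { name: String, tail: bool },
    If { cond: Box<Expr>, then: Box<Expr>, els: Box<Expr> },
    Begin(Vec<Expr>),
    Try(Box<Expr>),
}

fn mark_tails(e: &mut Expr, tail: bool) {
    match e {
        Expr::Call { tail: t, .. } => *t = tail,
        Expr::If { cond, then, els } => {
            mark_tails(cond, false); // the test itself is never in tail position
            mark_tails(then, tail);  // both branches inherit tail-ness
            mark_tails(els, tail);
        }
        Expr::Begin(exprs) => {
            let last = exprs.len().saturating_sub(1);
            for (i, sub) in exprs.iter_mut().enumerate() {
                mark_tails(sub, tail && i == last); // only the last expr can be tail
            }
        }
        Expr::Try(body) => mark_tails(body, false), // handler must stay reachable
    }
}

fn main() {
    let mut e = Expr::If {
        cond: Box::new(Expr::Call { name: "p".into(), tail: false }),
        then: Box::new(Expr::Call { name: "f".into(), tail: false }),
        els: Box::new(Expr::Try(Box::new(Expr::Call { name: "g".into(), tail: false }))),
    };
    mark_tails(&mut e, true);
    println!("{:?}", e);
}
```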

Phase 2: Variable Resolution (CoreExpr → ResolvedExpr)

The resolver walks the CoreExpr tree and classifies every variable reference as one of:

Resolution          Opcode                       Description
Local { slot }      LoadLocal / StoreLocal       Variable in the current function's frame
Upvalue { index }   LoadUpvalue / StoreUpvalue   Captured from an enclosing function
Global { spur }     LoadGlobal / StoreGlobal     Module-level binding

This is the key optimization over the tree-walker: instead of hash-based environment chain lookup (O(scope depth) per access), variables are accessed by direct slot index (O(1)).
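A sketch of the three outcomes and the slot-indexed fast path (values simplified to i64; the real VM stores Values):

```rust
// Resolution outcomes; field names follow the table above.
#[derive(Debug, PartialEq)]
enum Resolution {
    Local { slot: u16 },    // LoadLocal / StoreLocal
    Upvalue { index: u16 }, // LoadUpvalue / StoreUpvalue
    Global { spur: u32 },   // LoadGlobal / StoreGlobal
}

// O(1) local access: the frame's base offset plus the slot indexes the
// value stack directly; no environment chain, no hashing.
fn load_local(stack: &[i64], base: usize, slot: u16) -> i64 {
    stack[base + slot as usize]
}

fn main() {
    let stack = [10, 20, 30]; // a frame with base 1 owns the last two slots
    assert_eq!(load_local(&stack, 1, 0), 20);
    assert_eq!(load_local(&stack, 1, 1), 30);
    println!("{:?}", Resolution::Local { slot: 1 });
}
```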

Upvalue Capture

Closures use the Lua/Steel upvalue model. When a lambda references a variable from an enclosing function:

  1. The resolver marks the outer local as captured
  2. An UpvalueDesc entry is added to the inner lambda: ParentLocal(slot) if capturing from the immediate parent, ParentUpvalue(index) if capturing through an intermediate function
scheme
(lambda (x)           ; x = Local slot 0
  (lambda ()          ; captures x: UpvalueDesc::ParentLocal(0)
    (lambda ()        ; captures through chain: UpvalueDesc::ParentUpvalue(0)
      x)))            ; resolves to Upvalue { index: 0 }

Phase 3: Bytecode Compilation (ResolvedExpr → Chunk)

The compiler (compiler.rs) transforms ResolvedExpr into bytecode Chunks. The Compiler struct wraps an Emitter (bytecode builder) and collects Function templates for inner lambdas.

Compilation strategies:

  • Constants: Nil, True, False get dedicated opcodes. All other constants use Const + constant pool.
  • Variables: LoadLocal/StoreLocal for locals, LoadUpvalue/StoreUpvalue for captures, LoadGlobal/StoreGlobal/DefineGlobal for globals.
  • Control flow: if uses JumpIfFalse + Jump for short-circuit. and/or use Dup + conditional jumps to preserve the last truthy/falsy value.
  • Lambdas: compiled to separate Function templates, referenced by MakeClosure instruction with upvalue descriptors.
  • do loops: compile to backward Jump with JumpIfTrue for exit test.
  • try/catch: adds entries to the chunk's exception table, no inline opcodes.
  • Named let: compiled as MakeClosure + Call — the loop body becomes a function.
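The if strategy relies on jump backpatching: the forward offset isn't known when the jump is emitted, so a placeholder is written and patched once the target is reached. A minimal sketch (the opcode bytes and the little-endian i32 operand encoding here are assumptions, not sema-vm's real layout):

```rust
const JUMP: u8 = 0x10;          // stand-in opcode bytes
const JUMP_IF_FALSE: u8 = 0x11;

struct Emitter { code: Vec<u8> }

impl Emitter {
    // Emit a jump with a placeholder offset; return where the operand lives.
    fn emit_jump(&mut self, op: u8) -> usize {
        self.code.push(op);
        let at = self.code.len();
        self.code.extend_from_slice(&0i32.to_le_bytes());
        at
    }
    // Patch the placeholder so the jump lands at the current end of code.
    // Offsets are relative to the first byte after the operand.
    fn patch_jump(&mut self, at: usize) {
        let offset = (self.code.len() - (at + 4)) as i32;
        self.code[at..at + 4].copy_from_slice(&offset.to_le_bytes());
    }
}

// Compiles the skeleton of (if c A B):
//   [cond] JumpIfFalse ->else [then] Jump ->end [else]
fn compile_if_shape() -> Vec<u8> {
    let mut em = Emitter { code: Vec::new() };
    em.code.push(0xC0);                       // stand-in: compiled condition
    let to_else = em.emit_jump(JUMP_IF_FALSE);
    em.code.push(0xAA);                       // stand-in: then-branch
    let to_end = em.emit_jump(JUMP);
    em.patch_jump(to_else);                   // else starts here
    em.code.push(0xBB);                       // stand-in: else-branch
    em.patch_jump(to_end);                    // end of the if
    em.code
}

fn main() {
    println!("{:02x?}", compile_if_shape());
}
```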

Runtime-delegated forms: some forms can't be compiled to pure bytecode; these are compiled as calls to __vm-* global functions registered by sema-eval:

Source Form                                       Delegate
eval                                              __vm-eval
import                                            __vm-import
load                                              __vm-load
defmacro                                          __vm-defmacro-form (passes entire form as quoted constant)
define-record-type                                __vm-define-record-type
delay                                             __vm-delay (passes unevaluated body as quoted constant)
force                                             __vm-force
prompt, message, deftool, defagent, macroexpand   Corresponding __vm-* delegates

Public API: compile(), compile_many(), compile_with_locals(), compile_many_with_locals() — all return CompileResult { chunk, functions }.

Compiler Optimizations

  • Intrinsic recognition: Known builtins (+, -, *, /, <, >, <=, >=, =, not) are compiled to inline opcodes instead of function calls, eliminating global lookup, Rc downcast, argument Vec allocation, and function pointer dispatch.
  • Peephole: (if (not X) ...): The pattern (if (not X) A B) compiles to JumpIfTrue instead of Not + JumpIfFalse, eliminating one instruction.
  • Fused CallGlobal: Non-tail calls to global functions use a fused CallGlobal instruction that combines LoadGlobal + Call into a single opcode with (u32 spur, u16 argc) operands.
  • Specialized LoadLocal/StoreLocal: Slots 0–3 have dedicated zero-operand opcodes (LoadLocal0..LoadLocal3, StoreLocal0..StoreLocal3), saving 2 bytes per instruction for the most frequently accessed locals.
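The (if (not X) ...) peephole can be sketched as a small pattern match over the condition before the branch is emitted (hypothetical types):

```rust
#[derive(Debug, PartialEq)]
enum Cond { Var(String), Not(Box<Cond>) }

#[derive(Debug, PartialEq)]
enum BranchOp { JumpIfFalse, JumpIfTrue }

// Decide what to compile as the test and which conditional jump to emit.
// (if (not x) A B) tests x directly with an inverted branch, saving the Not.
fn branch_plan(cond: Cond) -> (Cond, BranchOp) {
    match cond {
        Cond::Not(inner) => (*inner, BranchOp::JumpIfTrue),
        other => (other, BranchOp::JumpIfFalse),
    }
}

fn main() {
    let (test, op) = branch_plan(Cond::Not(Box::new(Cond::Var("x".into()))));
    assert_eq!(op, BranchOp::JumpIfTrue);
    println!("{:?} {:?}", test, op);
}
```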

Phase 4: VM Execution

The VM (vm.rs) is a stack-based dispatch loop.

Core structs:

rust
VM { stack: Vec<Value>, frames: Vec<CallFrame>, globals: Rc<Env>, functions: Vec<Rc<Function>> }
CallFrame { closure: Rc<Closure>, pc: usize, base: usize, open_upvalues: Vec<Option<Rc<UpvalueCell>>> }

Key design points:

  • Unsafe hot path: The dispatch loop uses unsafe unchecked stack operations (pop_unchecked) and raw pointer bytecode reads via read_u16!/read_i32!/read_u32! macros for performance. Opcode decoding uses std::mem::transmute on the #[repr(u8)] Op enum. Debug builds retain bounds checks via debug_assert!.
  • Closure interop: VM closures are wrapped as Value::NativeFn values so the tree-walker can call them. Each NativeFn carries an Rc<dyn Any> payload containing VmClosurePayload (closure + function table), and the VM uses raw_tag() + downcast_ref to avoid Rc refcount bumps on the hot path. Each NativeFn wrapper creates a fresh VM instance to execute the closure's bytecode.
  • Upvalue cells: UpvalueCell with Rc<RefCell<Value>> for shared mutable state — StoreLocal syncs to open upvalue cells, LoadLocal reads from cells when present.
  • Exception handling: Throw opcode triggers handler search via the chunk's exception table. Stack is restored to saved depth, error value pushed, PC jumps to handler.

Entry points: VM::execute() takes a closure and EvalContext. eval_str() is a convenience that parses, compiles, and runs. compile_program() is the shared pipeline: Value AST → lower → resolve → compile → (Closure, Vec<Function>).
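The shape of the dispatch loop, reduced to a safe-Rust toy (enum-with-payload opcodes instead of byte-encoded ones, i64 instead of Value; the real loop adds call frames, unchecked stack ops, and raw-pointer operand reads):

```rust
#[derive(Clone, Copy)]
enum Op { Const(i64), Add, Return }

// Fetch-decode-execute over a flat instruction slice and a value stack.
fn run(code: &[Op]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        let op = code[pc];
        pc += 1;
        match op {
            Op::Const(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop().expect("compiler guarantees stack depth");
                let a = stack.pop().expect("compiler guarantees stack depth");
                stack.push(a + b);
            }
            Op::Return => return stack.pop().expect("return value on stack"),
        }
    }
}

fn main() {
    // (+ 2 3) compiled to toy bytecode
    let result = run(&[Op::Const(2), Op::Const(3), Op::Add, Op::Return]);
    assert_eq!(result, 5);
    println!("{result}");
}
```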

VM Optimizations

  • Two-level dispatch loop: An outer loop caches frame-local state (code pointer, constants pointer, base offset) into local variables. The inner loop dispatches opcodes without re-fetching frame data. Frame state is only reloaded when control flow changes frames (Call, TailCall, Return, exceptions).
  • NaN-boxed int fast paths: AddInt/SubInt/MulInt/LtInt/EqInt operate directly on raw NaN-boxed bits — sign-extending the payload, performing the arithmetic, and re-boxing, without ever constructing a Value.
  • 16-entry direct-mapped global cache: Global lookups are cached in a [(spur, env_version, Value); 16] array indexed by spur & 0xF. Cache hits skip the Env hash-map lookup entirely; cache lines are invalidated by env version mismatch.
  • Raw pointer bytecode reads: read_u16!, read_i32!, and read_u32! macros read operands via raw pointer arithmetic on the code buffer, avoiding bounds checks in release builds.
  • Unsafe unchecked stack operations: pop_unchecked skips length checks (the compiler guarantees stack correctness). debug_assert! guards catch violations in debug builds.
  • Cold path factoring: The handle_err! macro factors exception handling out of the hot instruction sequence, keeping the fast path compact for better instruction-cache behavior.
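The direct-mapped global cache can be sketched like this (values simplified to i64; the array layout follows the description above, with index = spur & 0xF):

```rust
// 16-entry direct-mapped cache: (spur, env_version, value) per line.
struct GlobalCache {
    lines: [(u32, u64, i64); 16],
}

impl GlobalCache {
    fn new() -> Self {
        // u32::MAX as the key marks a line that can never hit.
        GlobalCache { lines: [(u32::MAX, 0, 0); 16] }
    }

    // Hit requires both the interned symbol and the env version to match;
    // bumping the version invalidates every stale line at once.
    fn get(&self, spur: u32, env_version: u64) -> Option<i64> {
        let line = self.lines[(spur & 0xF) as usize];
        (line.0 == spur && line.1 == env_version).then(|| line.2)
    }

    fn put(&mut self, spur: u32, env_version: u64, value: i64) {
        self.lines[(spur & 0xF) as usize] = (spur, env_version, value);
    }
}

fn main() {
    let mut cache = GlobalCache::new();
    cache.put(7, 1, 42);
    assert_eq!(cache.get(7, 1), Some(42));
    assert_eq!(cache.get(7, 2), None); // env version bumped: stale line misses
    println!("{:?}", cache.get(7, 1));
}
```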

Opcode Set

The VM uses a stack-based instruction set with variable-length encoding. Each opcode is one byte, followed by operands (u16, u32, or i32).

Constants & Stack

Opcode   Operands    Description
Const    u16 index   Push constants[index]
Nil      -           Push nil
True     -           Push #t
False    -           Push #f
Pop      -           Discard top of stack
Dup      -           Duplicate top of stack

Variable Access

Opcode                     Operands    Description
LoadLocal                  u16 slot    Push locals[slot]
StoreLocal                 u16 slot    locals[slot] = pop
LoadUpvalue                u16 index   Push upvalues[index].get()
StoreUpvalue               u16 index   upvalues[index].set(pop)
LoadGlobal                 u32 spur    Push globals[spur]
StoreGlobal                u32 spur    globals[spur] = pop
DefineGlobal               u32 spur    Define new global binding
LoadLocal0..LoadLocal3     -           Push locals[0..3] (zero-operand fast path)
StoreLocal0..StoreLocal3   -           locals[0..3] = pop (zero-operand fast path)

Control Flow

Opcode        Operands     Description
Jump          i32 offset   Unconditional relative jump
JumpIfFalse   i32 offset   Pop, jump if falsy
JumpIfTrue    i32 offset   Pop, jump if truthy

Functions

Opcode        Operands                           Description
Call          u16 argc                           Call function with argc args
TailCall      u16 argc                           Tail call (reuse frame for TCO)
Return        -                                  Return top of stack
MakeClosure   u16 func_id, u16 n_upvalues, ...   Create closure from function template
CallNative    u16 native_id, u16 argc            Direct native function call
CallGlobal    u32 spur, u16 argc                 Fused global lookup + call

Data Constructors

Opcode        Operands      Description
MakeList      u16 n         Pop n values, push list
MakeVector    u16 n         Pop n values, push vector
MakeMap       u16 n_pairs   Pop 2n values, push sorted map
MakeHashMap   u16 n_pairs   Pop 2n values, push hash map

Arithmetic & Comparison

Opcode                   Description
Add, Sub, Mul, Div       Generic arithmetic (int/float/string)
Negate, Not              Unary operators
Eq, Lt, Gt, Le, Ge       Generic comparison
AddInt, SubInt, MulInt   Specialized int fast paths
LtInt, EqInt             Specialized int comparison

Exception Handling

Opcode   Description
Throw    Pop value, raise as exception

Exception handling uses an exception table on the Chunk rather than inline opcodes. Each entry specifies a PC range, handler address, and stack depth to restore.
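Handler search on Throw might look like this (field names are assumptions based on the description; values simplified to i64):

```rust
// One exception-table entry: a PC range, where its handler lives, and
// how deep the stack was when the try body was entered.
struct Handler { start_pc: usize, end_pc: usize, handler_pc: usize, stack_depth: usize }

// Find the first entry whose PC range covers the faulting PC.
fn find_handler(table: &[Handler], pc: usize) -> Option<&Handler> {
    table.iter().find(|h| h.start_pc <= pc && pc < h.end_pc)
}

// Restore the stack to the saved depth, push the error, return the new PC.
fn unwind(stack: &mut Vec<i64>, h: &Handler, err: i64) -> usize {
    stack.truncate(h.stack_depth); // drop values pushed inside the try body
    stack.push(err);               // handler receives the error value
    h.handler_pc
}

fn main() {
    let table = [Handler { start_pc: 2, end_pc: 8, handler_pc: 20, stack_depth: 1 }];
    let mut stack = vec![1, 2, 3]; // values pushed inside the try body
    let h = find_handler(&table, 5).expect("pc 5 is covered");
    let pc = unwind(&mut stack, h, 99);
    assert_eq!((pc, stack), (20, vec![1, 99]));
    println!("handler at pc {pc}");
}
```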

Crate Structure

The bytecode VM lives in the sema-vm crate, which sits between sema-reader and sema-eval in the dependency graph:

sema-core ← sema-reader ← sema-vm ← sema-eval

sema-vm depends on sema-core (for Value, Spur, SemaError) and sema-reader (for parsing in test helpers). It does not depend on sema-eval — the evaluator will depend on the VM, not the other way around.

Source Files

File           Purpose
opcodes.rs     Op enum with 51 bytecode opcodes
chunk.rs       Chunk (bytecode + constants + spans), Function, UpvalueDesc
emit.rs        Emitter, the bytecode builder with jump backpatching
disasm.rs      Human-readable bytecode disassembler
core_expr.rs   CoreExpr and ResolvedExpr IR enums
lower.rs       Value AST → CoreExpr lowering pass
resolve.rs     Variable resolution (local/upvalue/global analysis)
compiler.rs    Bytecode compiler (ResolvedExpr → Chunk)
vm.rs          VM dispatch loop, call frames, closures, exception handling

Current Limitations

  • VM closures create a fresh VM per call (no true TCO across closure boundaries)
  • try/catch doesn't catch runtime errors from native functions (e.g., division by zero)
  • try/catch error value format may differ from tree-walker
  • define-record-type bindings not visible to subsequent compiled code
  • Intrinsic builtins (+, -, *, /, <, >, <=, >=, =, not) are treated as non-rebindable primitives: the compiler always emits their inline opcodes, so redefining one of these globals has no effect on compiled code
  • CallNative opcode is defined but not yet used (no native function registry)

CLI Usage

The --vm flag enables the bytecode compilation path. The tree-walker remains the default.

bash
# Run a file with the bytecode VM
sema --vm examples/hello.sema

# REPL: toggle VM mode with ,vm
sema
> ,vm

Both paths share the global Env and EvalContext.

Design Decisions

Why Keep the Tree-Walker?

The tree-walking interpreter remains for:

  1. Macro expansion — macros can call eval at expansion time, requiring a full evaluator before compilation
  2. eval fallback — dynamic eval needs to parse, expand, compile, and execute at runtime
  3. Debugging — tree-walking is easier to step through and inspect

Why Not Delete CoreExpr After Resolution?

The pipeline uses two IR types: CoreExpr (variables as names) and ResolvedExpr (variables as slots). This provides type-level safety — the compiler can only receive resolved expressions, preventing accidental use of unresolved variable references.
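A minimal sketch of that guarantee (the type names mirror the document, but the bodies are toys):

```rust
// Two IR types: CoreExpr still refers to variables by name,
// ResolvedExpr only by slot.
enum CoreExpr { Var(String) }
enum ResolvedExpr { Local(u16) }

// Toy resolution: every variable becomes slot 0.
fn resolve(e: CoreExpr) -> ResolvedExpr {
    match e { CoreExpr::Var(_) => ResolvedExpr::Local(0) }
}

// compile() cannot receive a CoreExpr: the signature forbids it, so
// unresolved variable references can never reach code generation.
fn compile(e: ResolvedExpr) -> Vec<u8> {
    match e { ResolvedExpr::Local(slot) => vec![0x01, slot as u8] } // toy opcode
}

fn main() {
    // This compiles only because resolve() runs first; passing a CoreExpr
    // to compile() directly would be a compile-time type error.
    let bytecode = compile(resolve(CoreExpr::Var("x".into())));
    println!("{:?}", bytecode);
}
```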