Bundled Executable Format

Status: Implemented (Alpha)

The bundled executable format is implemented and available via sema build. The format is not yet stable — breaking changes are expected before v1.0.

Overview

sema build compiles a Sema program into a standalone executable by embedding a VFS (Virtual File System) archive into the Sema runtime binary. The resulting binary is self-contained and requires no Sema installation to run.

Entry file (.sema) → Compile to bytecode → Trace imports → Build VFS archive → Inject into runtime binary → Executable

Running a bundled executable skips CLI argument parsing, loads the embedded bytecode from the VFS archive, and executes it directly.

CLI Interface

```bash
# Basic build
sema build script.sema                        # → ./script
sema build script.sema -o myapp               # explicit output path

# Bundle additional files
sema build script.sema --include data.json    # bundle a file
sema build script.sema --include assets/      # bundle a directory (recursive)

# Use a specific runtime binary
sema build script.sema --runtime /path/to/sema

# Run the resulting standalone executable
./myapp --name hello
```

Options

| Option | Description |
| --- | --- |
| -o, --output <path> | Output executable path (default: filename without extension) |
| --include <path>... | Additional files or directories to bundle (repeatable) |
| --runtime <path> | Sema binary to use as runtime base (default: current executable) |

Binary Layout

The injection strategy varies by platform to preserve binary integrity and OS loader compatibility.

Linux (ELF): Raw Append

┌─────────────────────────────┐
│  Original Sema Binary (ELF) │
├─────────────────────────────┤
│  VFS Archive                │
├─────────────────────────────┤
│  Trailer (16 bytes)         │
│    archive_size: u64 LE     │
│    magic: "SEMAEXEC"        │
└─────────────────────────────┘

ELF loaders ignore appended data, so the binary remains valid.

macOS (Mach-O): Section Injection

┌─────────────────────────────┐
│  Modified Mach-O Binary     │
│  ├── Mach-O Header          │
│  ├── Load Commands          │
│  ├── ...segments...         │
│  └── "semaexec" section     │  ← VFS archive injected here
└─────────────────────────────┘

Injected via libsui, which ad-hoc re-signs the binary for macOS ARM64 compatibility.

Windows (PE): Resource Injection

┌─────────────────────────────┐
│  Modified PE Binary         │
│  ├── PE Header              │
│  ├── .text, .data, ...      │
│  └── .rsrc                  │
│       └── "semaexec"        │  ← VFS archive injected here
└─────────────────────────────┘

Injected via libsui. Existing Authenticode signatures are stripped.

Trailer Format

16 bytes, frozen — only used on Linux/ELF.

| Offset | Size | Type | Description |
| --- | --- | --- | --- |
| 0 | 8 | u64 LE | Size of the VFS archive in bytes |
| 8 | 8 | bytes | Magic: SEMAEXEC (0x53 0x45 0x4D 0x41 0x45 0x58 0x45 0x43) |

The trailer format is permanent and will never change. Old loaders can always detect new binaries and reject them if the archive format version is unsupported.

On macOS and Windows, the archive is stored in a named binary section — no trailer is used.
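The raw-append layout and its trailer can be illustrated with a short round trip. This is a minimal sketch in Python rather than the actual implementation; the function names `append_archive` and `extract_archive` are hypothetical, and rejecting a trailer whose size exceeds the available bytes is an assumption about sensible loader behavior.

```python
import struct
from typing import Optional

TRAILER_MAGIC = b"SEMAEXEC"
TRAILER_SIZE = 16  # 8-byte u64 LE archive size + 8-byte magic


def append_archive(runtime: bytes, archive: bytes) -> bytes:
    """Return runtime + archive + trailer, mirroring the ELF layout."""
    trailer = struct.pack("<Q8s", len(archive), TRAILER_MAGIC)
    return runtime + archive + trailer


def extract_archive(binary: bytes) -> Optional[bytes]:
    """Return the embedded archive bytes, or None if no valid trailer exists."""
    if len(binary) < TRAILER_SIZE:
        return None
    size, magic = struct.unpack("<Q8s", binary[-TRAILER_SIZE:])
    if magic != TRAILER_MAGIC:
        return None
    end = len(binary) - TRAILER_SIZE
    if size > end:
        return None  # assumed check: claimed size exceeds the bytes present
    return binary[end - size:end]
```

Because the size is stored next to the magic at a fixed distance from the end of the file, a reader can find the archive without parsing the ELF structure at all.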

VFS Archive Format

The VFS archive is a flat binary format with a versioned header, metadata, table of contents, and file data.

All multi-byte integers are little-endian. All strings are UTF-8.

┌─ Archive Header ──────────────────────┐
│  format_version: u16                  │  Currently v1
│  flags: u16                           │  Reserved bitfield (must be 0)
│  archive_checksum: u32                │  CRC32-IEEE of all bytes after this field
│  metadata_count: u32                  │
│  ┌─ Metadata entries ───────────────┐ │
│  │ key_len(u16) + key(utf8)         │ │
│  │ val_len(u32) + val(bytes)        │ │
│  │ ...repeats metadata_count times  │ │
│  └──────────────────────────────────┘ │
├─ TOC (Table of Contents) ─────────────┤
│  entry_count: u32                     │
│  ┌─ TOC entries ────────────────────┐ │
│  │ path_len(u32) + path(utf8)       │ │
│  │ offset(u64) + size(u64)          │ │
│  │ ...repeats entry_count times     │ │
│  └──────────────────────────────────┘ │
├─ File data ───────────────────────────┤
│  raw bytes for all bundled files      │
│  (offsets relative to file data start)│
└───────────────────────────────────────┘

| Offset | Size | Type | Description |
| --- | --- | --- | --- |
| 0 | 2 | u16 LE | format_version: currently 1 |
| 2 | 2 | u16 LE | flags: reserved for future use, must be 0 |
| 4 | 4 | u32 LE | archive_checksum: CRC32-IEEE of all bytes from offset 8 to end of archive |
| 8 | 4 | u32 LE | metadata_count: number of metadata key-value entries |

Total header: 12 bytes
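As an illustration, the 12-byte header maps directly onto a little-endian struct layout. The sketch below is Python, not the Rust implementation; `parse_header` is a hypothetical name, and raising an error on an unknown version or non-zero flags is an assumption about how a strict loader might behave.

```python
import struct

# format_version (u16), flags (u16), archive_checksum (u32), metadata_count (u32)
HEADER = struct.Struct("<HHII")


def parse_header(archive: bytes) -> dict:
    """Decode the fixed header at offset 0 of a VFS archive."""
    if len(archive) < HEADER.size:
        raise ValueError("archive shorter than the 12-byte header")
    version, flags, checksum, metadata_count = HEADER.unpack_from(archive, 0)
    if version != 1:
        raise ValueError(f"unsupported format_version {version}")
    if flags != 0:
        raise ValueError("reserved flags must be 0")  # assumed strictness
    return {
        "format_version": version,
        "flags": flags,
        "archive_checksum": checksum,
        "metadata_count": metadata_count,
    }
```

The `<` prefix disables padding, so `HEADER.size` is exactly 12 and the metadata entries start at that offset.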

Metadata Entry

Repeated metadata_count times, immediately after the header.

| Field | Size | Type | Description |
| --- | --- | --- | --- |
| key_len | 2 | u16 LE | Length of key string in bytes |
| key | key_len | UTF-8 | Metadata key |
| val_len | 4 | u32 LE | Length of value in bytes |
| val | val_len | bytes | Metadata value (opaque bytes, typically UTF-8) |

Unknown metadata keys are ignored by the loader (forward compatibility).
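The length-prefixed entry layout can be read with a simple cursor loop. This Python sketch is illustrative only: `encode_entry`, `parse_metadata`, and `known_metadata` are hypothetical helpers, and filtering against a fixed key set is just one way to model "unknown keys are ignored".

```python
import struct

V1_KEYS = {"sema-version", "build-timestamp", "entry-point", "build-root"}


def encode_entry(key: str, value: bytes) -> bytes:
    """Serialize one entry: key_len(u16) + key + val_len(u32) + val."""
    k = key.encode("utf-8")
    return struct.pack("<H", len(k)) + k + struct.pack("<I", len(value)) + value


def parse_metadata(buf: bytes, offset: int, count: int):
    """Parse `count` entries starting at `offset`; return (entries, next_offset)."""
    entries = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        key = buf[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (val_len,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        if offset + val_len > len(buf):
            raise ValueError("metadata value runs past end of buffer")
        entries[key] = buf[offset:offset + val_len]  # values stay opaque bytes
        offset += val_len
    return entries, offset


def known_metadata(entries: dict) -> dict:
    """Keep only keys this sketch understands; others are skipped silently."""
    return {k: v for k, v in entries.items() if k in V1_KEYS}
```

The returned `next_offset` is where the TOC begins, since the TOC starts immediately after the last metadata entry.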

v1 Metadata Keys

| Key | Value | Description |
| --- | --- | --- |
| sema-version | e.g. "1.10.0" | Sema version that built the executable |
| build-timestamp | Unix timestamp string | Seconds since epoch when the executable was built |
| entry-point | "__main__.semac" | VFS path of the compiled entry bytecode |
| build-root | absolute path string | Original project root directory |

TOC (Table of Contents)

Starts immediately after the last metadata entry.

| Field | Size | Type | Description |
| --- | --- | --- | --- |
| entry_count | 4 | u32 LE | Number of file entries |

Each TOC entry:

| Field | Size | Type | Description |
| --- | --- | --- | --- |
| path_len | 4 | u32 LE | Length of VFS path in bytes |
| path | path_len | UTF-8 | VFS path (relative, forward-slash separated) |
| offset | 8 | u64 LE | Byte offset from start of file data section |
| size | 8 | u64 LE | Size of file data in bytes |

File Data

Raw concatenated bytes for all files, in TOC order. Offsets in TOC entries are relative to the start of this section (byte 0 = first byte after the last TOC entry).
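To make the relationship between TOC offsets and the file data section concrete, the following Python sketch writes and reads the TOC plus data portion of an archive. The names `build_toc_and_data` and `read_files` are hypothetical, and the sketch covers only this portion, not the header or metadata.

```python
import struct


def build_toc_and_data(files: dict) -> bytes:
    """Serialize entry_count, TOC entries, then file data in TOC order."""
    toc = struct.pack("<I", len(files))
    data = b""
    for path, content in files.items():
        p = path.encode("utf-8")
        # The offset is relative to the start of the file data section.
        toc += struct.pack("<I", len(p)) + p
        toc += struct.pack("<QQ", len(data), len(content))
        data += content
    return toc + data


def read_files(buf: bytes, offset: int = 0) -> dict:
    """Parse the TOC at `offset` and slice each file out of the data section."""
    (entry_count,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    entries = []
    for _ in range(entry_count):
        (path_len,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        path = buf[offset:offset + path_len].decode("utf-8")
        offset += path_len
        file_offset, size = struct.unpack_from("<QQ", buf, offset)
        offset += 16
        entries.append((path, file_offset, size))
    data_start = offset  # byte 0 of file data = first byte after the last TOC entry
    return {
        path: buf[data_start + off:data_start + off + size]
        for path, off, size in entries
    }
```

Storing offsets relative to the data section means the TOC can be written before the final size of the metadata block is known.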

VFS Path Conventions

| VFS Path | Contents |
| --- | --- |
| __main__.semac | Compiled bytecode of the entry file (always present) |
| lib/utils.sema | Auto-traced import (relative to project root) |
| data.json | Asset from --include data.json |
| prompts/system.txt | Asset from --include prompts/ |

All VFS paths must be:

  • Relative (no leading / or \)
  • Forward-slash separated
  • No .. segments
  • No NUL bytes
  • No Windows reserved device names (CON, PRN, AUX, NUL, COM1–COM3, LPT1–LPT3)

Paths are validated at build time. Invalid paths cause a build error.
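The rules above could be checked with a function along these lines. This Python sketch makes several assumptions that the text does not confirm: the COM and LPT entries are read as the ranges 1 through 3, reserved names are matched per path segment, case-insensitively, and with any extension stripped. `vfs_path_error` is a hypothetical name.

```python
from typing import Optional

# Assumed reading of the reserved-name list: COM1..COM3 and LPT1..LPT3.
# Windows itself reserves up to COM9/LPT9, so a stricter check could extend this.
RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL"}
RESERVED_NAMES |= {f"COM{i}" for i in range(1, 4)}
RESERVED_NAMES |= {f"LPT{i}" for i in range(1, 4)}


def vfs_path_error(path: str) -> Optional[str]:
    """Return a reason the path is invalid, or None if it passes every rule."""
    if path.startswith(("/", "\\")):
        return "path must be relative"
    if "\\" in path:
        return "path must use forward slashes"
    if "\x00" in path:
        return "path contains a NUL byte"
    for segment in path.split("/"):
        if segment == "..":
            return "path contains a '..' segment"
        # Assumption: "nul.txt" counts as reserved, as it does on Windows.
        stem = segment.split(".", 1)[0].upper()
        if stem in RESERVED_NAMES:
            return f"reserved device name: {segment}"
    return None
```

A build step would call this for every path before adding it to the TOC and abort on the first non-None result.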

Integrity

The archive_checksum is a CRC32-IEEE checksum (polynomial 0xEDB88320, same as gzip/zlib) computed over all archive bytes from offset 8 (after the checksum field) to the end of the archive.

On load, the runtime recomputes the checksum and rejects the archive if it doesn't match. This detects accidental corruption but is not a cryptographic security feature.
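For illustration, Python's `zlib.crc32` implements the same CRC32-IEEE variant, so the checksum rule can be sketched in a few lines. `seal_archive` and `verify_archive` are hypothetical helper names; `body` stands for everything from offset 8 onward (metadata_count, metadata, TOC, and file data).

```python
import struct
import zlib


def seal_archive(body: bytes, version: int = 1, flags: int = 0) -> bytes:
    """Prefix `body` with format_version, flags, and its CRC32 checksum."""
    checksum = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack("<HHI", version, flags, checksum) + body


def verify_archive(archive: bytes) -> bool:
    """Recompute the CRC32 over bytes 8..end and compare with the stored value."""
    if len(archive) < 12:
        return False
    (stored,) = struct.unpack_from("<I", archive, 4)
    return (zlib.crc32(archive[8:]) & 0xFFFFFFFF) == stored
```

Flipping any bit after offset 8 changes the recomputed value, which is how accidental corruption is caught. The first four bytes (version and flags) are outside the checksummed range.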

Runtime Startup

When a Sema binary starts, before CLI argument parsing:

  1. Try libsui::find_section("semaexec") for named section (macOS/Windows)
  2. If not found: read last 16 bytes, check for SEMAEXEC magic (Linux/ELF)
  3. If archive found:
    • Deserialize and validate CRC32 checksum
    • Populate thread-local VFS with all archive files
    • Read entry-point from metadata (default: __main__.semac)
    • Load and execute the bytecode
    • Exit with appropriate status code
  4. If no archive found: proceed with normal CLI parsing (REPL/interpreter mode)
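The detection order in steps 1, 2, and 4 can be sketched as follows. This is a simplified Python model, not the runtime's code: the `named_section` argument stands in for whatever the section lookup returns (None when absent), and `locate_archive` and `startup_mode` are hypothetical names.

```python
import struct
from typing import Optional

MAGIC = b"SEMAEXEC"


def locate_archive(named_section: Optional[bytes], binary: bytes) -> Optional[bytes]:
    """Find the embedded archive, preferring a named section over the trailer."""
    # Step 1: a named section (macOS/Windows) takes precedence.
    if named_section is not None:
        return named_section
    # Step 2: fall back to the 16-byte trailer (Linux/ELF).
    if len(binary) >= 16 and binary[-8:] == MAGIC:
        (size,) = struct.unpack("<Q", binary[-16:-8])
        end = len(binary) - 16
        if size <= end:
            return binary[end - size:end]
    return None


def startup_mode(named_section: Optional[bytes], binary: bytes) -> str:
    """Return 'bundled' if an archive is present, else 'cli' (step 4)."""
    found = locate_archive(named_section, binary)
    return "bundled" if found is not None else "cli"
```

Checksum validation, VFS population, and bytecode execution would follow once the archive bytes are in hand.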

VFS Interception

When the VFS is active, the following functions check VFS first, then fall back to the real filesystem:

| Function | Behavior |
| --- | --- |
| (file/read path) | Read UTF-8 text from VFS or filesystem |
| (file/read-bytes path) | Read raw bytes from VFS or filesystem |
| (file/read-lines path) | Read lines from VFS or filesystem |
| (file/exists? path) | Check VFS first, then filesystem |
| (import "module") | Resolve relative to VFS if active |
| (load "file.sema") | Resolve relative to VFS if active |

Write operations (file/write, file/append, file/delete, etc.) always target the real filesystem.
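The read-through behavior can be modeled with a small class. This Python sketch is only an analogy for the interception logic: the `Vfs` class and its method names are hypothetical, paths are matched by exact string equality (an assumption; the real lookup may normalize paths), and the thread-local aspect is not modeled.

```python
import os
from typing import Dict, Optional


class Vfs:
    """Read-only overlay: lookups hit the VFS first, then the real filesystem."""

    def __init__(self, files: Optional[Dict[str, bytes]] = None):
        self.files = files  # None means the VFS is inactive

    def read_bytes(self, path: str) -> bytes:
        if self.files is not None and path in self.files:
            return self.files[path]
        with open(path, "rb") as fh:  # fall back to the real filesystem
            return fh.read()

    def read_text(self, path: str) -> str:
        return self.read_bytes(path).decode("utf-8")

    def read_lines(self, path: str) -> list:
        return self.read_text(path).splitlines()

    def exists(self, path: str) -> bool:
        in_vfs = self.files is not None and path in self.files
        return in_vfs or os.path.exists(path)

    def write_text(self, path: str, text: str) -> None:
        # Writes always target the real filesystem and never modify the VFS.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
```

Keeping writes out of the overlay means a bundled program can still produce output files next to itself while its bundled assets stay immutable.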

Build Flow

  1. Compile the entry file to bytecode (.semac format)
  2. Trace all (import ...) and (load ...) dependencies recursively
    • Circular imports are detected and handled
    • Dynamic imports (non-literal paths) emit a warning
  3. Collect --include assets (directories are expanded recursively)
  4. Build VFS archive with metadata and CRC32 checksum
  5. Inject archive into runtime binary (platform-specific)
  6. Set executable permissions on Unix
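Step 2 is essentially a graph traversal with a visited set. The Python sketch below is a simplification under several assumptions: sources are supplied as an in-memory dict keyed by project-relative path, imports are found with a regular expression rather than a real parser, and paths are taken as already resolved. `trace_imports` is a hypothetical name.

```python
import re

# Literal forms such as (import "lib/a.sema") or (load "lib/b.sema").
LITERAL_RE = re.compile(r'\((?:import|load)\s+"([^"]+)"\)')
# Anything else after import/load is treated as a dynamic (non-literal) path.
DYNAMIC_RE = re.compile(r'\((?:import|load)\s+[^"\s]')


def trace_imports(entry: str, sources: dict):
    """Return (files, warnings): every file reachable from `entry`, listed once."""
    files, warnings = [], []
    visited = set()
    stack = [entry]
    while stack:
        path = stack.pop()
        if path in visited:
            continue  # already traced; this is what terminates import cycles
        visited.add(path)
        files.append(path)
        text = sources[path]
        if DYNAMIC_RE.search(text):
            warnings.append(f"{path}: dynamic import cannot be traced")
        stack.extend(LITERAL_RE.findall(text))
    return files, warnings
```

Each traced file would then be added to the VFS archive under its project-relative path, alongside the `--include` assets from step 3.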

Platform Notes

| Platform | Injection | Signing | Notes |
| --- | --- | --- | --- |
| Linux (ELF) | Raw append + trailer | N/A | ELF loaders ignore appended data |
| macOS (Mach-O) | libsui section injection | Ad-hoc re-signed | Re-sign with Developer ID for distribution |
| Windows (PE) | libsui resource injection | Authenticode stripped | Re-sign with signtool if needed |

Implementation

| Component | File |
| --- | --- |
| Archive serialization | crates/sema/src/archive.rs |
| Import tracer | crates/sema/src/import_tracer.rs |
| Build command | crates/sema/src/main.rs |
| VFS core | crates/sema-core/src/vfs.rs |
| VFS I/O interception | crates/sema-stdlib/src/io.rs |
| Import/load VFS interception | crates/sema-eval/src/special_forms.rs |

Future Work

  • Compression — optional zstd/deflate compression for VFS entries
  • Cross-compilation — pre-download runtimes to ~/.sema/cache/runtimes/
  • sema.toml manifest — declare includes, metadata, and build options in config
  • Runtime-only binary — strip tree-walker for smaller executables (requires architectural changes)
  • Code signing — proper Apple notarization / Authenticode signing integration