Feature·LLM·RAG & Vector Store

Retrieval is a language feature.

Four primitives — llm/embed, vector-store/*, llm/rerank, llm/complete — compose into the full RAG pipeline. No framework. No vector database to stand up. No orchestration library. Just functions that fit in one screen.

$ sema examples/llm/rag-docs-search.sema

embed → retrieve → rerank → answer · no external DB · disk-persisted

The pipeline

Embed. Retrieve. Rerank. Answer.

Bi-encoder search for high recall, cross-encoder reranking for precision. The industry-standard pattern, as four function calls.

llm/embed · text → vectors · all docs
  Rust · Python · Lisp · Cooking · ML · Macros · Gardening · Crypto · HTTP · Regex · JSON · PDF
vector-store/search · cosine nearest-k · high recall · top 12
  Lisp · Macros · Homoiconic · file/read · Regex · string/split · Rust · Python · ML · HTTP · JSON · PDF
llm/rerank · cross-encoder · precision · top 4
  Lisp · Macros · Homoiconic · file/read
llm/complete · grounded answer
  "Lisp is homoiconic: code is data."

Embed

Vectors as bytevectors.

llm/embed takes a string or a list of strings (batch) and returns packed f64 bytevectors. Similarity computations take a fast path over the packed bytes, with no per-element unboxing overhead. The function is a first-class value, so (map llm/embed ...) just works.

  • Batch embeddings. (llm/embed ["a" "b" "c"]) returns a list of vectors in one network call.
  • Seven providers. Jina, Voyage, Cohere, Nomic, Together AI, Fireworks AI, OpenAI. Auto-configured from env vars. Separate from your chat provider.
  • Async-aware. Inside (async/...), embeddings offload to the scheduler so sibling tasks overlap.
embed.sema · batch + similarity
(define texts
  (list "Rust is a systems language"
        "Python is great for ML"
        "Lisp is homoiconic"))

;; Batch: one network call
(define vecs (llm/embed texts))

;; Similarity between two vectors
(llm/similarity
  (first vecs)
  (last vecs))
;; => 0.72
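
For readers who want to see what "packed f64" and "similarity" mean in concrete terms, here is a minimal Python sketch. It is not Sema's implementation: the little-endian layout and the helper names (pack_f64, unpack_f64, cosine_similarity) are assumptions made for illustration only.

```python
import math
import struct

def pack_f64(values):
    """Pack floats into a byte string, 8 bytes each (little-endian is an assumption)."""
    return struct.pack(f"<{len(values)}d", *values)

def unpack_f64(blob):
    """Inverse of pack_f64."""
    return struct.unpack(f"<{len(blob) // 8}d", blob)

def cosine_similarity(a, b):
    """Cosine similarity between two packed f64 vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    x, y = unpack_f64(a), unpack_f64(b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0

u = pack_f64([1.0, 0.0])
v = pack_f64([1.0, 1.0])
print(len(u))                             # 16 bytes: two 8-byte floats
print(round(cosine_similarity(u, v), 4))  # 0.7071
```

A packed layout keeps a whole vector in one contiguous buffer, which is the property the fast path relies on.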

Store & retrieve

No database. Just a file.

An in-memory vector store with disk persistence. Index once, query forever. vector-store/open loads from disk automatically. The JSON format is portable across platforms — base64-encoded embeddings, full metadata, diffable in git.

  • Index once. vector-store/save writes to disk. Next run loads instantly — no re-embedding.
  • Metadata on every doc. Store source paths, page numbers, timestamps alongside the vector.
  • Dimension-mismatch safety. Mixing embedding models in one store raises a clear error at search time.
store.sema · index + search
(vector-store/open "docs" "my-docs.json")

(for-each
  (lambda (text)
    (vector-store/add "docs" text
      (llm/embed text)
      {:text text}))
  texts)

(vector-store/save "docs")

(define hits
  (vector-store/search
    "docs"
    (llm/embed "Which is homoiconic?")
    5))

;; => ({:id "Lisp is homoiconic"
;;     :score 0.94
;;     :metadata {:text "Lisp is homoiconic"}}
;;    ...)
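
To make the storage model concrete, the sketch below shows one way an in-memory store with a JSON file behind it could work, in Python. It is an illustration under assumptions, not Sema's code: the class TinyVectorStore, the field names in the file, and the f64 packing are all hypothetical.

```python
import base64
import json
import math
import os
import struct
import tempfile

class TinyVectorStore:
    """Hypothetical in-memory store; not Sema's implementation."""

    def __init__(self):
        self.docs = {}  # id -> (vector as list of floats, metadata dict)

    def add(self, doc_id, vector, metadata):
        self.docs[doc_id] = (list(vector), metadata)

    def save(self, path):
        # Embeddings are packed as f64 and base64-encoded so the file stays plain JSON.
        rows = [
            {
                "id": doc_id,
                "embedding": base64.b64encode(
                    struct.pack(f"<{len(vec)}d", *vec)).decode("ascii"),
                "metadata": meta,
            }
            for doc_id, (vec, meta) in self.docs.items()
        ]
        with open(path, "w", encoding="utf-8") as f:
            json.dump(rows, f, indent=2)

    @classmethod
    def open(cls, path):
        store = cls()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for row in json.load(f):
                    blob = base64.b64decode(row["embedding"])
                    vec = struct.unpack(f"<{len(blob) // 8}d", blob)
                    store.add(row["id"], vec, row["metadata"])
        return store

    def search(self, query, k):
        def cosine(a, b):
            if len(a) != len(b):
                raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.hypot(*a) * math.hypot(*b)
            return dot / norm if norm else 0.0

        scored = [
            {"id": doc_id, "score": cosine(query, vec), "metadata": meta}
            for doc_id, (vec, meta) in self.docs.items()
        ]
        return sorted(scored, key=lambda hit: hit["score"], reverse=True)[:k]


path = os.path.join(tempfile.mkdtemp(), "my-docs.json")
store = TinyVectorStore.open(path)      # file absent: starts empty
store.add("lisp", [0.9, 0.1], {"text": "Lisp is homoiconic"})
store.add("rust", [0.1, 0.9], {"text": "Rust is a systems language"})
store.save(path)

reloaded = TinyVectorStore.open(path)   # vectors come back from disk, no re-embedding
print([hit["id"] for hit in reloaded.search([1.0, 0.0], 1)])  # ['lisp']
```

The two-dimensional vectors are toy values chosen by hand; real embeddings have hundreds or thousands of dimensions, which is why a query from a different model trips the dimension check.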

Rerank

Retrieve many. Rerank to a few.

Bi-encoders embed query and document independently — fast, but coarse. Cross-encoders read them together — slow, but precise. Sema's llm/rerank calls a hosted cross-encoder (Cohere, Jina, Voyage, Nomic, Together AI, or Fireworks AI) to reorder your candidates. The :index field maps back to the original list.

vector-store/search · top-12 by cosine similarity
  file-read-lines     0.82
  read-line           0.79
  file-for-each-line  0.77
  io-read-line        0.74
  string/split        0.71
  ... 7 more
llm/rerank · cross-encoder · top-4 by relevance
  file-read-lines     0.467
  read-line           0.304
  file-for-each-line  0.293
  io-read-line        0.239
rerank.sema · six providers, per-call override
(define candidates
  (vector-store/search "docs" (llm/embed question) 12))

(define reranked
  (llm/rerank
    question
    (map (lambda (c) (:text (:metadata c))) candidates)
    {:top-k 4
     :provider :cohere}))

;; => ({:index 0 :score 0.467
;;     :document "file-read-lines"}
;;    {:index 1 :score 0.304
;;     :document "read-line"} ...)

;; Override per call:
(llm/rerank q docs {:top-k 5 :provider :voyage
                    :model "rerank-2.5"})
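
The role of the index field is easiest to see with a toy example. The Python sketch below swaps the hosted cross-encoder for a word-overlap scorer (toy_rerank, a hypothetical stand-in, nowhere near a real model) and shows how each result's index leads back to the full search hit. The candidate ids and texts are made up for illustration.

```python
def toy_rerank(query, documents, top_k):
    """Stand-in for a hosted cross-encoder: scores by shared words.

    Hypothetical scorer for illustration only; a real reranker is a model call.
    Returns dicts shaped like the results above: index, score, document.
    """
    query_words = set(query.lower().split())
    results = []
    for index, doc in enumerate(documents):
        doc_words = set(doc.lower().split())
        score = len(query_words & doc_words) / max(len(query_words), 1)
        results.append({"index": index, "score": score, "document": doc})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]


# Candidates as a vector search would return them: text lives under metadata.
candidates = [
    {"id": "string/split", "metadata": {"text": "split a string into parts"}},
    {"id": "file/read-lines", "metadata": {"text": "read a file into lines"}},
    {"id": "http/get", "metadata": {"text": "fetch a url"}},
]
texts = [c["metadata"]["text"] for c in candidates]

reranked = toy_rerank("read a file", texts, top_k=2)

# index points into the list handed to the reranker, so it recovers the full hit.
best = [candidates[r["index"]]["id"] for r in reranked]
print(best)  # ['file/read-lines', 'string/split']
```

Because the reranker only sees plain strings, the index is the one link back to ids, scores, and metadata from the retrieval step.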

The whole pipeline

Four functions. One screen.

Index a directory of docs, embed the query, retrieve candidates, rerank for precision, build context, generate a grounded answer. This is the code from examples/llm/rag-docs-search.sema — run it with sema and it indexes Sema's own documentation.

rag-docs-search.sema · the complete pipeline
;; 1. Index (run once, cached to disk)
(vector-store/open "docs" "/tmp/sema-docs.vec")
(when (= (vector-store/count "docs") 0)
  (let* ((files (file/glob "crates/sema-docs/entries/stdlib/**/*.md"))
         (docs  (map (lambda (p)
                        {:name (path/stem p) :path p
                         :text (string/take (file/read p) 900)})
                      files))
         (vecs  (flat-map llm/embed
                          (list/chunk 64 (map :text docs)))))
    (map (lambda (doc vec)
           (vector-store/add "docs" (:name doc) vec doc))
         docs vecs)
    (vector-store/save "docs")))

;; 2. Retrieve
(define question "How do I read a file and split it into lines?")
(define hits (vector-store/search "docs" (llm/embed question) 12))

;; 3. Rerank
(define reranked
  (llm/rerank question
    (map (lambda (c) (:text (:metadata c))) hits)
    {:top-k 4}))

;; 4. Answer
(define context
  (string/join
    (map (lambda (r) (:text (:metadata (nth hits (:index r))))) reranked)
    "\n\n---\n\n"))

(println (llm/complete
  (prompt (system "Answer using only the context.")
          (user (format "Context:\n~a\n\nQ: ~a" context question)))
  {:max-tokens 400}))

;; => "Use file/read-lines to read all lines,
;;     then string/split or map over the result."
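
The indexing step embeds texts in chunks of 64 and flattens the results back into one list. A rough Python equivalent of that batching pattern, with a fake embedding function (fake_embed_batch, hypothetical) standing in for the network call:

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fake_embed_batch(texts):
    """Hypothetical stand-in for one embedding request: one vector per text."""
    return [[float(len(t))] for t in texts]

def embed_all(texts, batch_size=64):
    """Embed every text, one request per chunk, preserving input order."""
    vectors = []
    calls = 0
    for batch in chunked(texts, batch_size):
        vectors.extend(fake_embed_batch(batch))  # flatten into a single list
        calls += 1
    return vectors, calls

texts = [f"doc {i}" for i in range(150)]
vectors, calls = embed_all(texts)
print(len(vectors), calls)  # 150 3
```

Order preservation matters here: the vectors are later paired with their documents by position.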

No infrastructure

No Pinecone. No pgvector. No Chroma.

The vector store is in-process with disk persistence. No connection strings, no Docker Compose, no infrastructure to maintain. Seven embedding providers and six reranker providers, all auto-configured from environment variables.

  • Embedding providers. Jina, Voyage, Cohere, Nomic, Together AI, Fireworks AI, OpenAI — or any OpenAI-compatible endpoint via :base-url.
  • Reranker providers. Cohere, Jina, Voyage, Nomic, Together AI, Fireworks AI — same API key, per-call override.
  • Separate from chat. llm/configure-embeddings lets you use Voyage for embeddings and Anthropic for chat.
embed: Jina · Voyage · Cohere · Nomic · Together · Fireworks · OpenAI
rerank: Cohere · Jina · Voyage · Nomic · Together · Fireworks
store: in-memory · disk (JSON)
answer: Anthropic · OpenAI · Gemini · Ollama

The argument

What you'd assemble without it.

A typical Python RAG stack: LangChain for orchestration, Chroma or Pinecone for vectors, sentence-transformers for embeddings, a separate Cohere call for reranking, and prompt templates to glue it together. Sema replaces all of it with four function calls.

rag.py · 5 libraries
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import \
    HuggingFaceEmbeddings
from langchain_text_splitters import \
    RecursiveCharacterTextSplitter
from cohere import Client as Cohere
from langchain_openai import ChatOpenAI

embeddings = HuggingFaceEmbeddings()
splitter = RecursiveCharacterTextSplitter()
# from_texts, because split_text returns plain strings
vectorstore = Chroma.from_texts(
    splitter.split_text(docs), embeddings)
cohere = Cohere(api_key=...)
llm = ChatOpenAI()

def rag(question):
    docs = vectorstore.similarity_search(
        question, k=12)
    # Cohere reranks strings; the model name is illustrative
    results = cohere.rerank(
        model="rerank-english-v3.0", query=question,
        documents=[d.page_content for d in docs],
        top_n=4).results
    context = "\n\n".join(
        docs[r.index].page_content
        for r in results)
    return llm.invoke(
        f"Context: {context}\nQ: {question}").content
5 imports. 3 objects to configure. Glue code to connect them. The pipeline is hidden behind abstractions.
rag.sema · 4 functions
(define hits
  (vector-store/search "docs"
    (llm/embed question) 12))

(define reranked
  (llm/rerank question
    (map (lambda (c)
           (:text (:metadata c))) hits)
    {:top-k 4}))

(define context
  (string/join
    (map (lambda (r)
           (:text (:metadata
                    (nth hits (:index r)))))
         reranked) "\n\n---\n\n"))

(llm/complete
  (prompt
    (system "Answer using context.")
    (user (format "Context:\n~a\n\nQ: ~a"
                  context question)))
  {:max-tokens 400})
4 function calls. Zero imports. Zero configuration objects. The pipeline is the code.

Search your first document.

Run the example. It indexes Sema's own docs and answers questions.

run: $ sema examples/llm/rag-docs-search.sema
install: $ cargo install sema-lang