Bundled with Meaning Memory Studio

Hybrid retrieval, in the box.

Vexilon is the production retrieval layer in Meaning Memory Studio: dense vectors, BM25 sparse search, cross-encoder reranking, contextual chunking, and a knowledge graph, all bundled and MCP-native. Memory and corpus retrieval in one install.

Building production retrieval is a five-system problem. You need a vector store. You need a sparse-search index. You need a way to fuse their results. You need a reranker to push semantic precision past nearest-neighbor noise. You need a chunking strategy that preserves document scope. And if your fleet asks questions like "what does this entity connect to?", you need a graph layer too.

Most RAG stacks ship one or two of these and leave you to wire the rest. LlamaIndex hands you a framework. Pinecone gives you a vector store. Vespa is heavy but capable. pgvector is a Postgres extension. None of them ship the full retrieval pipeline as a single deployable unit.

Vexilon does: hybrid retrieval, fusion, reranking, contextual chunking, and a knowledge graph, bundled into one deployable, MCP-native unit and integrated natively with Meaning Memory's STARE-scored cognitive memory. Memory and corpus retrieval in one bundled install.

Hybrid retrieval, fully assembled.

Vexilon is a production retrieval pipeline you can install and run, not a kit of pieces. The vector store, embedding service, sparse index, fusion ranker, cross-encoder reranker, chunker, and knowledge graph all ship together. You point it at your corpus and queries return ranked results through the same MCP tool surface your agents already use for memory operations.

Built on proven open-source components: Qdrant for vectors, BGE-M3 for embeddings, BGE-reranker-v2-m3 for cross-encoder reordering. Designed to scale from prototype to production without rewiring.

Hybrid retrieval

Dense vector search (BGE-M3 on Qdrant) + BM25 sparse search, fused via Reciprocal Rank Fusion. Catches both semantic and exact-match signal.
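To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. It is an illustration rather than Vexilon's internal code: the function name, the document IDs, and the constant k=60 (a value commonly used for RRF) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a Vexilon setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical results from the dense and sparse retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Documents that rank well in both lists (doc_a, doc_c) rise above documents that appear in only one, which is how fusion captures both semantic and exact-match signal.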

Cross-encoder reranker

BGE-reranker-v2-m3 reorders top candidates for semantic precision. Blends with RRF scores (configurable alpha) so you tune the precision/recall axis.
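One plausible way to read "configurable alpha" is a weighted blend of the two scores. The sketch below assumes min-max normalization followed by a convex combination; the formula, the function names, and the sample scores are all illustrative assumptions, not Vexilon's documented behavior.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def blend_scores(candidates, alpha=0.7):
    """Blend reranker and RRF scores for (doc_id, rrf_score, rerank_score) tuples.

    Assumed formula: alpha * rerank + (1 - alpha) * rrf, on normalized scores.
    alpha=1.0 trusts the reranker alone; alpha=0.0 keeps the RRF order.
    """
    if not candidates:
        return []
    rrf = min_max([c[1] for c in candidates])
    rerank = min_max([c[2] for c in candidates])
    blended = [
        (c[0], alpha * r + (1 - alpha) * f)
        for c, f, r in zip(candidates, rrf, rerank)
    ]
    return sorted(blended, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical candidates: (doc_id, rrf_score, rerank_score).
candidates = [
    ("doc_a", 0.0325, 0.20),
    ("doc_c", 0.0323, 0.90),
    ("doc_b", 0.0161, 0.55),
]

reranker_only = [d for d, _ in blend_scores(candidates, alpha=1.0)]
rrf_only = [d for d, _ in blend_scores(candidates, alpha=0.0)]
print(reranker_only, rrf_only)
```

Sweeping alpha between the two extremes is what lets an operator trade the reranker's precision against the broader agreement captured by fusion.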

Contextual chunking

LLM-generated context preambles prepended to each chunk before embedding. Chunks carry their document scope, so retrieval doesn't lose the forest for the trees.
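The preamble step can be sketched as follows. The LLM call is replaced by a stub so the example runs on its own; the function names, the word-based splitter, and the preamble wording are assumptions for illustration only.

```python
def split_into_chunks(text, max_words=50):
    """Naive fixed-size splitter on whitespace (a stand-in for a real chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def contextualize(document, chunks, generate_preamble):
    """Prepend a document-scoped preamble to each chunk before embedding."""
    return [f"{generate_preamble(document, chunk)}\n\n{chunk}" for chunk in chunks]


def stub_preamble(document, chunk):
    """Stand-in for an LLM call; a real preamble would summarize the chunk's place in the document."""
    title = document.strip().splitlines()[0]
    return f"From the document titled '{title}'."


# Hypothetical document: first line is the title.
document = (
    "Refund Policy\n"
    "Customers may return items within 30 days of delivery for a full refund."
)

chunks = split_into_chunks(document, max_words=8)
contextual_chunks = contextualize(document, chunks, stub_preamble)
for text in contextual_chunks:
    print(text, end="\n---\n")
```

The text that gets embedded is the preamble plus the chunk, so a fragment such as "days of delivery for a full refund." still carries the fact that it belongs to the refund policy.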

Knowledge graph

Entities and relationships extracted across the corpus and stored as a queryable graph. Multi-hop reasoning over connected concepts, not just nearest-neighbor lookup.
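Multi-hop lookup over extracted triples can be sketched with a breadth-first traversal. The triples below are drawn from the component list on this page; the data layout and function names are assumptions, not Vexilon's graph schema.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Index (subject, relation, object) triples as an undirected adjacency list."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((relation, subject))
    return graph


def neighbors_within(graph, start, max_hops):
    """Return {entity: hop_distance} for entities reachable within max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


# Illustrative triples.
triples = [
    ("Vexilon", "uses", "Qdrant"),
    ("Vexilon", "uses", "BGE-M3"),
    ("BGE-M3", "produces", "dense vectors"),
    ("Qdrant", "stores", "dense vectors"),
]

graph = build_graph(triples)
one_hop = neighbors_within(graph, "Qdrant", max_hops=1)
two_hops = neighbors_within(graph, "Qdrant", max_hops=2)
print(one_hop, two_hops)
```

A one-hop query answers "what does this entity connect to?"; raising the hop limit surfaces entities that are related only through an intermediate concept.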

Three-signal boost profiles

Tune ranking per query intent: recent (fresh wins), significant (rich content wins), procedural (authoritative docs win). Auto-detected from query or set explicitly.
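As a rough sketch of how per-intent boosting might work, the example below weights three per-document signals by profile and multiplies the result into the base score. The weights, the keyword-based intent detection, and the multiplicative formula are hypothetical; only the three profile names come from the description above.

```python
# Hypothetical weights: each profile emphasizes one of three signals.
PROFILES = {
    "recent": {"recency": 0.6, "significance": 0.2, "authority": 0.2},
    "significant": {"recency": 0.2, "significance": 0.6, "authority": 0.2},
    "procedural": {"recency": 0.2, "significance": 0.2, "authority": 0.6},
}


def detect_profile(query):
    """Toy keyword heuristic for query intent (an assumption, not Vexilon's detector)."""
    q = query.lower()
    if any(cue in q for cue in ("latest", "recent", "today", "this week")):
        return "recent"
    if any(cue in q for cue in ("how do i", "how to", "steps", "procedure")):
        return "procedural"
    return "significant"


def boosted_score(base_score, signals, profile):
    """Scale a base relevance score by the profile-weighted signals (each in [0, 1])."""
    weights = PROFILES[profile]
    boost = sum(weights[name] * signals.get(name, 0.0) for name in weights)
    return base_score * (1.0 + boost)


# Two hypothetical documents with equal base relevance.
fresh_note = {"recency": 1.0, "significance": 0.2, "authority": 0.1}
official_runbook = {"recency": 0.1, "significance": 0.2, "authority": 1.0}

profile = detect_profile("How do I rotate the API keys?")
print(profile)
print(boosted_score(0.5, fresh_note, profile))
print(boosted_score(0.5, official_runbook, profile))
```

Under the procedural profile the runbook outranks the fresh note; under the recent profile the order flips, even though both documents start from the same base score.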

MCP-native

Same MCP tool surface as your Meaning Memory deployment. vexilon_search, vexilon_sources, vexilon_graph, vexilon_reindex. No separate SDK.
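Because MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call, a call to vexilon_search might look like the payload built below. The tool name comes from the list above; the argument names (query, top_k, profile) are assumptions, since the tool's input schema is not given here.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names are hypothetical; check the tool's published input schema.
request = build_tool_call(
    1,
    "vexilon_search",
    {"query": "how do I rotate the API keys?", "top_k": 5, "profile": "procedural"},
)

print(json.dumps(request, indent=2))
```

In practice an MCP client library builds and sends this message for you; the sketch only shows that no Vexilon-specific SDK is involved in the exchange.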

Why bundled beats assembled.

Capability Vexilon LlamaIndex Pinecone Vespa pgvector
Dense vector search via plugin
BM25 sparse search via plugin
RRF fusion (bundled) DIY
Cross-encoder reranker (bundled) via integration DIY
Contextual chunking
Knowledge graph layer via integration via integration
MCP-native interface
Self-host (no SaaS lock-in) SaaS only
Integrated with cognitive memory

Ready to bundle retrieval with memory.

Vexilon ships with Meaning Memory Studio: license, install, deploy. Tell us about your corpus and your fleet, and we'll scope the right deployment shape.


Common questions.

What is Vexilon?

Vexilon is the production hybrid retrieval layer bundled with Meaning Memory Studio. It combines dense vector search (BGE-M3 embeddings on a Qdrant vector store), BM25 sparse search, and Reciprocal Rank Fusion to unify results across both retrieval modes. A cross-encoder reranker (BGE-reranker-v2-m3) reorders the top candidates for semantic precision. Contextual chunking uses an LLM to generate context preambles before embedding, so chunks carry the document scope they came from. A lightweight knowledge graph tracks entities and relationships across the corpus. Three-signal boost profiles (recency, significance, procedural authority) let operators tune ranking per query intent.

How does Vexilon compare to LlamaIndex, Pinecone, Vespa, or pgvector?

Most RAG stacks ship one or two of the pieces and leave you to wire the rest. LlamaIndex is a framework: you choose the vector stores, rerankers, and chunking strategies. Pinecone is a managed vector database, with no BM25, no bundled reranker, and no contextual chunking. Vespa is a production search engine, capable but heavy; it handles sparse and dense retrieval in one system but has no contextual preamble layer or knowledge graph. pgvector is a Postgres extension, so you build everything else yourself. Vexilon bundles all of it (hybrid retrieval + RRF + reranker + contextual chunking + knowledge graph + Qdrant + embeddings) as a single MCP-native install, then integrates natively with Meaning Memory's STARE-scored cognitive memory.

Why is Vexilon part of Studio and not bundled with Engine?

Vexilon carries genuine deployment weight: a Qdrant vector store, an embedding service running on GPU for BGE-M3 inference, and a cross-encoder reranker container. Bundling it with Engine would force every Engine customer to operate that stack whether they need corpus retrieval or not. Engine customers who already run their own RAG (LlamaIndex, LangGraph, Vespa, pgvector, or custom) wire retrieved context in via the BYO-RAG adapter; Studio customers get Vexilon bundled and ready.