Vexilon is the production retrieval layer in Meaning Memory Studio: dense vectors, BM25 sparse search, cross-encoder reranking, contextual chunking, and a knowledge graph, all bundled and MCP-native. Memory and corpus retrieval in one install.
Building production retrieval is a five-system problem. You need a vector store. You need a sparse-search index. You need a way to fuse their results. You need a reranker to push semantic precision past nearest-neighbor noise. You need a chunking strategy that preserves document scope. And if your fleet asks questions like "what does this entity connect to?", you need a graph layer too.
Most RAG stacks ship one or two of these and leave you to wire the rest. LlamaIndex hands you a framework. Pinecone gives you a vector store. Vespa is heavy but capable. pgvector is a Postgres extension. None of them ship the full retrieval pipeline as a single deployable unit.
Vexilon does. Hybrid retrieval, fusion, reranking, contextual chunking, and a knowledge graph: bundled, deployable, MCP-native, and integrated natively with Meaning Memory's STARE-scored cognitive memory. Memory and corpus retrieval in one install.
Vexilon is a production retrieval pipeline you can install and run, not a kit of pieces. The vector store, embedding service, sparse index, fusion ranker, cross-encoder reranker, chunker, and knowledge graph all ship together. You point it at your corpus, and queries return ranked results through the same MCP tool surface your agents already use for memory operations.
Built on proven open-source components: Qdrant for vectors, BGE-M3 for embeddings, and BGE-reranker-v2-m3 for cross-encoder reordering. Designed to scale from prototype to production without rewiring.
Dense vector search (BGE-M3 on Qdrant) plus BM25 sparse search, fused via Reciprocal Rank Fusion. Catches both semantic and exact-match signals.
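As a rough illustration of the fusion step, the sketch below implements plain Reciprocal Rank Fusion over two ranked lists of document IDs. It is not Vexilon's code: the function name `rrf_fuse`, the sample IDs, and the constant `k=60` (a value commonly used with RRF) are assumptions made for the example.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is an assumed default, not a documented Vexilon setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical result lists from the dense and sparse retrievers.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_c", "doc_a", "doc_d"]

fused = rrf_fuse([dense, sparse])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Documents that rank well in both lists (here `doc_a` and `doc_c`) rise above documents that appear in only one, which is how fusion keeps both semantic and exact-match evidence.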
BGE-reranker-v2-m3 reorders the top candidates for semantic precision. The reranker score is blended with the RRF score using a configurable alpha, so you can tune the precision/recall trade-off.
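One plausible reading of the alpha blend is a linear interpolation between normalized reranker scores and normalized RRF scores. The sketch below assumes exactly that. The min-max normalization, the function names, and the sample scores are all illustrative assumptions, since the actual blending formula is not specified here.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping into the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def blend(rrf_scores, rerank_scores, alpha=0.7):
    """Rank doc ids by alpha * reranker + (1 - alpha) * RRF (assumed formula).

    alpha=1.0 trusts the cross-encoder alone; alpha=0.0 keeps the RRF order.
    Both mappings must cover the same doc ids.
    """
    rrf_n = min_max(rrf_scores)
    rer_n = min_max(rerank_scores)
    blended = {d: alpha * rer_n[d] + (1 - alpha) * rrf_n[d] for d in rrf_scores}
    return sorted(blended, key=lambda d: (-blended[d], d))

# Hypothetical scores for three candidates.
rrf_scores = {"a": 0.0325, "b": 0.0161, "c": 0.0323}
rerank_scores = {"a": 0.2, "b": 0.9, "c": 0.6}

reranker_only = blend(rrf_scores, rerank_scores, alpha=1.0)
rrf_only = blend(rrf_scores, rerank_scores, alpha=0.0)
balanced = blend(rrf_scores, rerank_scores, alpha=0.5)
```

At the extremes the order follows one signal entirely. In between, a candidate that is strong on both signals (here `c`) can overtake candidates that lead on only one.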
LLM-generated context preambles prepended to each chunk before embedding. Chunks carry their document scope, so retrieval doesn't lose the forest for the trees.
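The idea can be sketched in a few lines: split a document into chunks, ask a model for a short preamble that situates each chunk, and embed the preamble together with the chunk. In this sketch the LLM call is replaced by a stub, and `split_paragraphs`, `contextualize`, and `stub_preamble` are hypothetical names, not part of Vexilon.

```python
from typing import Callable, List

def split_paragraphs(document: str) -> List[str]:
    """Naive chunker: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def contextualize(
    document: str,
    title: str,
    make_preamble: Callable[[str, str, str], str],
) -> List[str]:
    """Return the text to embed for each chunk: preamble, blank line, chunk."""
    chunks = split_paragraphs(document)
    return [f"{make_preamble(title, document, c)}\n\n{c}" for c in chunks]

def stub_preamble(title: str, document: str, chunk: str) -> str:
    """Stand-in for the LLM call; a real preamble would summarize the scope."""
    return f"From the document '{title}':"

doc = "Rotate keys every 90 days.\n\nRevoke keys when staff leave."
to_embed = contextualize(doc, "Key management policy", stub_preamble)
for text in to_embed:
    print(repr(text))
```

Because the preamble is part of the embedded text, a short chunk such as "Revoke keys when staff leave." still carries the signal that it belongs to a key management policy.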
Entities and relationships extracted across the corpus and stored as a queryable graph. Multi-hop reasoning over connected concepts, not just nearest-neighbor lookup.
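To make "multi-hop" concrete, here is a minimal sketch that stores extracted (subject, relation, object) triples as an adjacency list and walks it breadth-first up to a hop limit. The storage layout, the function names, and the sample triples are assumptions for illustration and say nothing about how Vexilon stores its graph.

```python
from collections import defaultdict, deque

def build_graph(triples):
    """Index (subject, relation, object) triples by subject."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph

def neighbors_within(graph, start, max_hops):
    """Breadth-first walk; returns {entity: hop distance} up to max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for _rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen

# Hypothetical triples of the kind an extraction pass might produce.
triples = [
    ("Vexilon", "uses", "Qdrant"),
    ("Vexilon", "uses", "BGE-M3"),
    ("BGE-M3", "produces", "dense vectors"),
    ("Qdrant", "stores", "dense vectors"),
]
graph = build_graph(triples)
one_hop = neighbors_within(graph, "Vexilon", 1)
two_hop = neighbors_within(graph, "Vexilon", 2)
```

A one-hop query answers "what does this entity connect to?" directly. Raising the hop limit surfaces concepts that are related only through an intermediate entity, which nearest-neighbor lookup alone would not expose.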
Tune ranking per query intent: recent (fresh wins), significant (rich content wins), procedural (authoritative docs win). Auto-detected from query or set explicitly.
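A boost profile can be pictured as a set of weights over three per-document signals (recency, significance, authority) applied on top of the base relevance score. The sketch below is only that picture. The weights, the keyword-based detection, and the multiplicative formula are invented for illustration and are not Vexilon's actual profiles or detection logic.

```python
# Hypothetical weights per profile; each profile favors one signal.
PROFILES = {
    "recent": {"recency": 0.6, "significance": 0.2, "authority": 0.2},
    "significant": {"recency": 0.1, "significance": 0.7, "authority": 0.2},
    "procedural": {"recency": 0.1, "significance": 0.2, "authority": 0.7},
}

def detect_profile(query, default="significant"):
    """Toy intent detection from keywords (an assumption, not the real logic)."""
    q = query.lower()
    if any(word in q for word in ("latest", "recent", "today")):
        return "recent"
    if any(phrase in q for phrase in ("how do i", "how to", "steps")):
        return "procedural"
    return default

def boosted_score(base, signals, profile):
    """Scale a base relevance score by the profile-weighted signals in [0, 1]."""
    weights = PROFILES[profile]
    boost = sum(weights[name] * signals.get(name, 0.0) for name in weights)
    return base * (1.0 + boost)

# Two hypothetical documents with equal base relevance.
fresh_note = {"recency": 1.0, "significance": 0.2, "authority": 0.1}
runbook = {"recency": 0.1, "significance": 0.3, "authority": 1.0}

profile = detect_profile("How to rotate API keys")
note_score = boosted_score(1.0, fresh_note, profile)
runbook_score = boosted_score(1.0, runbook, profile)
```

Setting the profile explicitly corresponds to skipping `detect_profile` and passing the profile name directly. The same two documents swap places under `"recent"`.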
Same MCP tool surface as your Meaning Memory deployment. vexilon_search, vexilon_sources, vexilon_graph, vexilon_reindex. No separate SDK.
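For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The sketch below builds such a request for `vexilon_search`. The argument names `query` and `top_k` are placeholders, because the tool's actual input schema is not given here. A real client would read the schema from the server's `tools/list` response.

```python
import json

def tool_call_request(request_id, tool, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# `query` and `top_k` are assumed argument names, not a documented schema.
request = tool_call_request(
    1,
    "vexilon_search",
    {"query": "how do we rotate API keys?", "top_k": 5},
)
payload = json.dumps(request)
print(payload)
```

In practice an MCP client library handles this framing and the transport. The point is that corpus search goes through the same call shape an agent already uses for memory tools.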
| Capability | Vexilon | LlamaIndex | Pinecone | Vespa | pgvector |
|---|---|---|---|---|---|
| Dense vector search | ✓ | via plugin | ✓ | ✓ | ✓ |
| BM25 sparse search | ✓ | via plugin | — | ✓ | — |
| RRF fusion (bundled) | ✓ | DIY | — | ✓ | — |
| Cross-encoder reranker (bundled) | ✓ | via integration | — | DIY | — |
| Contextual chunking | ✓ | — | — | — | — |
| Knowledge graph layer | ✓ | via integration | — | via integration | — |
| MCP-native interface | ✓ | — | — | — | — |
| Self-host (no SaaS lock-in) | ✓ | ✓ | SaaS only | ✓ | ✓ |
| Integrated with cognitive memory | ✓ | — | — | — | — |
Vexilon ships with Meaning Memory Studio: license, install, deploy. Tell us about your corpus and your fleet, and we'll scope the right deployment shape.
Request access
Vexilon is the production hybrid retrieval layer bundled with Meaning Memory Studio. It combines dense vector search (BGE-M3 embeddings on a Qdrant vector store), BM25 sparse search, and Reciprocal Rank Fusion to unify results across both retrieval modes. A cross-encoder reranker (BGE-reranker-v2-m3) reorders the top candidates for semantic precision. Contextual chunking uses an LLM to generate context preambles before embedding, so chunks carry the document scope they came from. A lightweight knowledge graph tracks entities and relationships across the corpus. Three-signal boost profiles (recency, significance, procedural authority) let operators tune ranking per query intent.
Most RAG stacks ship one or two of the pieces and leave you to wire the rest. LlamaIndex is a framework: you choose the vector stores, rerankers, and chunking strategies. Pinecone is a managed vector database with no BM25, no bundled reranker, and no contextual chunking. Vespa is a production search engine: capable but heavy, with sparse and dense retrieval in one system but no contextual preamble layer or knowledge graph. pgvector is a Postgres extension, so you build everything else yourself. Vexilon bundles all of it (hybrid retrieval + RRF + reranker + contextual chunking + knowledge graph + Qdrant + embeddings) as a single MCP-native install, then integrates natively with Meaning Memory's STARE-scored cognitive memory.
Vexilon carries genuine deployment weight: a Qdrant vector store, an embedding service running on GPU for BGE-M3 inference, and a cross-encoder reranker container. Bundling it with Engine would force every Engine customer to operate that stack whether they need corpus retrieval or not. Engine customers who already run their own RAG (LlamaIndex, LangGraph, Vespa, pgvector, or custom) wire retrieved context in via the BYO-RAG adapter; Studio customers get Vexilon bundled and ready.